ABSTRACT
Collaborative environments, such as Wikipedia, often have low barriers to entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (e.g., contact information).
Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight into the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove the most problematic of the behaviors studied.
Index Terms
- What Wikipedia deletes: characterizing dangerous collaborative content