In: Investigating Wikipedia: Linguistic corpus building, exploration and analysis
Edited by Céline Poudat, Harald Lüngen and Laura Herzberg
[Studies in Corpus Linguistics 121] 2024
pp. 178–204
Chapter 7. Live exploration of Wikipedia editing dynamics with visual analytics
WhoColor and Interactive Wikipedia Article Analysis Notebooks
Published online: 31 October 2024
https://doi.org/10.1075/scl.121.07flo
Abstract
The revision histories of Wikipedia articles are a rich source of data about how editors interact with each other and with the content, yet they are not straightforward to mine or understand. We describe two visual analytics tools that support this effort: (i) an interactive browser extension for studying word authorship, age, and conflict dynamics, which overlays live Wikipedia articles; and (ii) a novel interactive Jupyter Notebook package that runs analyses of editorial dynamics out of the box and is easily modifiable. Both draw live data for any article on demand from several Web APIs, chiefly our own WikiWho service, which provides the most accurate mining of live word-level changes currently available. We show how these tools enable exploration of content survival, editor productivity, conflict dynamics, and other metrics through low-barrier interfaces, while also supporting more quantitative investigations through access to the notebooks’ underlying data structures.
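The notions of word authorship and content survival in the abstract rest on token-level provenance data. What follows is a minimal Python illustration under stated assumptions, not the WikiWho API or the chapter's own metrics: the `Token` record, its fields, and the helpers `authorship_shares` and `churn` are hypothetical. It assumes each token stores its original author plus the revisions in which it was deleted (`outs`) and re-inserted (`ins`), so that a token is present in the current revision when both lists have the same length. The `churn` count is only a naive stand-in for a conflict measure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical word-level provenance record (field names are assumptions).

    text:   the token string
    editor: the editor who first added the token
    ins:    revision ids in which the token was re-inserted after a deletion
    outs:   revision ids in which the token was deleted
    """
    text: str
    editor: str
    ins: list[int] = field(default_factory=list)
    outs: list[int] = field(default_factory=list)

    @property
    def alive(self) -> bool:
        # Assumption: a token survives in the current revision if every
        # deletion has been followed by a re-insertion.
        return len(self.ins) == len(self.outs)


def authorship_shares(tokens: list[Token]) -> dict[str, float]:
    """Fraction of currently present tokens attributed to each original author."""
    authors = [t.editor for t in tokens if t.alive]
    total = len(authors)
    if total == 0:
        return {}
    return {editor: n / total for editor, n in Counter(authors).items()}


def churn(token: Token) -> int:
    """Naive conflict proxy: total number of deletions and re-insertions."""
    return len(token.ins) + len(token.outs)
```

For example, if an article's current text consists of three surviving tokens, two of them first added by editor A, `authorship_shares` attributes two thirds of that text to A. A real analysis would obtain such provenance from a service like WikiWho rather than building it by hand.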
Keywords: Wikipedia revision history, visual analytics, WikiWho, IWAAN
Article outline
- 1. Introduction
- 1.1 Related work
- 1.2 What is in this chapter?
- 2. Analyzing articles “in situ”: WhoColor
- 3. IWAAN: Visual analytics with Jupyter Notebooks in the cloud
- 3.1 Jupyter Notebooks as a versatile analysis, sharing and documentation tool
- 3.2 IWAAN content structure
- 4. Use case: “Genetically modified organism”
- 4.1 Authorship distribution and conflict with WhoColor
- 4.2 IWAANs — Templates and protection
- 4.3 IWAANs — Actions over time
- 4.4 IWAANs — Talk topics
- 4.5 IWAANs — Impactful editors and token ownership
- 4.6 IWAANs — Editor activity drilldown
- 4.7 IWAANs — Text change overall
- 4.8 Focus on the “volatile phase”
- 4.9 Further large-scale changes and conflicts
- 4.10 Beyond predesigned interface modules
- 4.11 A note on privacy
- 5. Conclusion
References (39)
Borra, Erik, Weltevrede, Esther, Ciuccarelli, Paolo, Kaltenbrunner, Andreas, Laniado, David, Magni, Giovanni, Mauri, Michele, Rogers, Richard & Venturini, Tommaso. 2015. Societal controversies in Wikipedia articles. In CHI ’15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 193–196. New York NY: ACM.
Brandes, Ulrik, Kenis, Patrick, Lerner, Jürgen & van Raaij, Denise. 2009. Network analysis of collaboration structure in Wikipedia. In WWW ’09: Proceedings of the 18th International Conference on World Wide Web, 731–740. New York NY: ACM.
Bryant, Susan L., Forte, Andrea & Bruckman, Amy. 2005. Becoming Wikipedian: Transformation of participation in a collaborative online encyclopedia. In GROUP ’05: Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work, 1–10. New York NY: ACM.
Bykau, Siarhei, Korn, Flip, Srivastava, Divesh & Velegrakis, Yannis. 2015. Fine-grained controversy detection in Wikipedia. In 2015 IEEE 31st International Conference on Data Engineering, Seoul, Korea (South), 1573–1584. IEEE.
Ferschke, Oliver, Gurevych, Iryna & Chebotar, Yevgen. 2012. Behind the article: Recognizing dialog acts in Wikipedia talk pages. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Walter Daelemans (ed.), 777–786. Stroudsburg PA: ACL.
Flöck, Fabian & Acosta, Maribel. 2014. WikiWho: Precise and efficient attribution of authorship of revisioned content. In WWW ’14: Proceedings of the 23rd International Conference on World Wide Web, 843–854. New York NY: ACM.
Flöck, Fabian & Acosta, Maribel. 2015. Whovis: Visualizing editor interactions and dynamics in collaborative writing over time. In WWW ’15 Companion: Proceedings of the 24th International Conference on World Wide Web, 191–194. New York NY: ACM.
Flöck, Fabian, Erdogan, Kenan & Acosta, Maribel. 2017. Toktrack: A complete token provenance and change tracking dataset for the English Wikipedia. Proceedings of the International AAAI Conference on Web and Social Media 11(1): 408–417.
Flöck, Fabian, Laniado, David, Stadthaus, Felix & Acosta, Maribel. 2015. Towards better visual tools for exploring Wikipedia article development — The use case of “Gamergate Controversy”. Proceedings of the International AAAI Conference on Web and Social Media 9(5): 48–55.
Flöck, Fabian, Vrandečić, Denny & Simperl, Elena. 2011. Towards a diversity-minded Wikipedia. In Proceedings of the 3rd International Web Science Conference, 1–8.
Flöck, Fabian, Vrandečić, Denny & Simperl, Elena. 2012. Revisiting reverts: Accurate revert detection in Wikipedia. In HT ’12: Proceedings of the 23rd ACM Conference on Hypertext and Social Media, 3–12. New York NY: ACM.
García-Gavilanes, Ruth, Mollgaard, Anders, Tsvetkova, Milena & Yasseri, Taha. 2017. The memory remains: Understanding collective memory in the digital age. Science Advances 3(4).
Halfaker, Aaron & Geiger, R. Stuart. 2020. ORES: Lowering barriers with participatory machine learning in Wikipedia. Proceedings of the ACM on Human-Computer Interaction 4(CSCW2): 1–37.
Halfaker, Aaron, Kittur, Aniket & Riedl, John. 2011. Don’t bite the newbies: How reverts affect the quantity and quality of Wikipedia work. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration, 163–172. New York NY: ACM.
Hamilton, William L., Clark, Kevin, Leskovec, Jure & Jurafsky, Dan. 2016. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Jian Su, Kevin Duh & Xavier Carreras (eds), 595–605. Stroudsburg PA: ACL.
Ho-Dac, Lydia-Mai, Laippala, Veronika, Poudat, Céline & Tanguy, Ludovic. 2016. French Wikipedia talk pages: Profiling and conflict detection. In 4th Conference on CMC and Social Media Corpora for the Humanities, Darja Fišer & Michael Beißwenger (eds). Ljubljana University Press, Faculty of Arts.
Iba, Takashi, Nemoto, Keiichi, Peters, Bernd & Gloor, Peter A. 2010. Analyzing the creative editing behavior of Wikipedia editors: Through dynamic social network analysis. Procedia-Social and Behavioral Sciences 2(4): 6441–6456.
Jurgens, David & Lu, Tsai-Ching. 2012. Temporal motifs reveal the dynamics of editor interactions in Wikipedia. Proceedings of the International AAAI Conference on Web and Social Media 6(1): 162–169.
Keegan, Brian C., Gergle, Darren & Contractor, Noshir. 2013. Hot off the Wiki: Structures and dynamics of Wikipedia’s coverage of breaking news events. American Behavioral Scientist 57(5): 595–622.
Keegan, Brian C., Lev, Shakked & Arazy, Ofer. 2016. Analyzing organizational routines in online knowledge collaborations: A case for sequence analysis in CSCW. In CSCW ’16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, 1065–1079. New York NY: ACM.
Kittur, Aniket, Suh, Bongwon, Pendleton, Bryan A. & Chi, Ed H. 2007. He says, she says: Conflict and coordination in Wikipedia. In CHI ’07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 453–462. New York NY: ACM.
Lerner, Jürgen & Lomi, Alessandro. 2020. The free encyclopedia that anyone can dispute: An analysis of the micro-structural dynamics of positive and negative relations in the production of contentious Wikipedia articles. Social Networks 60: 11–25.
Liu, Jun & Ram, Sudha. 2009. Who does what: Collaboration patterns in the Wikipedia and their impact on data quality. In 19th Workshop on Information Technologies and Systems, 175–180. 〈[URL]〉 (2 June 2024).
Nemoto, Keiichi, Gloor, Peter & Laubacher, Robert. 2011. Social capital increases efficiency of collaboration among Wikipedia editors. In HT ’11: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, 231–240. New York NY: ACM.
Pentzold, Christian, Weltevrede, Esther, Mauri, Michele, Laniado, David, Kaltenbrunner, Andreas & Borra, Erik. 2017. Digging Wikipedia: The online encyclopedia as a digital cultural heritage gateway and site. Journal on Computing and Cultural Heritage 10(1): 1–19.
Polanyi, Livia & Zaenen, Annie. 2006. Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications [The Information Retrieval Series 20], James G. Shanahan, Yan Qu & Janyce Wiebe (eds), 1–10. Berlin: Springer.
Poudat, Céline, Vanni, Laurent & Grabar, Natalia. 2016. How to explore conflicts in French Wikipedia talk pages? JADT 2016 — Statistical Analysis of Textual Data: 645–656.
Qu, Iris, Thain, Nithum & Hua, Yiqing. 2019. WikiDetox visualization. Wiki Workshop 2019. 〈[URL]〉 (2 June 2024).
Rogers, Richard & Sendijarevic, Emina. 2012. Neutral or national point of view? A comparison of Srebrenica articles across Wikipedia’s language versions. Wikipedia Academy: Research and Free Knowledge, Berlin, 29 June–1 July 2012. 〈[URL]〉 (2 June 2024).
Shi, Feng, Teplitskiy, Misha, Duede, Eamon & Evans, James A. 2019. The wisdom of polarized crowds. Nature Human Behaviour 3(4): 329–336.
Suh, Bongwon, Chi, Ed H., Kittur, Aniket & Pendleton, Bryan A. 2008. Lifting the veil: Improving accountability and social transparency in Wikipedia with Wikidashboard. In CHI ’08: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1037–1040. New York NY: ACM.
Suh, Bongwon, Chi, Ed H., Pendleton, Bryan A. & Kittur, Aniket. 2007. Us vs. them: Understanding social dynamics in Wikipedia with revert graph visualizations. In 2007 IEEE Symposium on Visual Analytics Science and Technology, 163–170. IEEE.
Sumi, Róbert, Yasseri, Taha, Rung, András, Kornai, András & Kertész, János. 2011. Edit wars in Wikipedia. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, 724–727. IEEE.
Thom-Santelli, Jennifer, Cosley, Dan & Gay, Geri. 2009. What’s mine is mine: Territoriality in collaborative authoring. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1481–1484. New York NY: ACM.
Tsvetkova, Milena, García-Gavilanes, Ruth & Yasseri, Taha. 2016. Dynamics of disagreement: Large-scale temporal network analysis reveals negative interactions in online collaboration. Scientific Reports 6(1): 36333.
Viégas, Fernanda B., Wattenberg, Martin & Dave, Kushal. 2004. Studying cooperation and conflict between authors with history flow visualizations. In CHI ’04: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 575–582. New York NY: ACM.
Vuong, Ba-Quy, Lim, Ee-Peng, Sun, Aixin, Le, Minh-Tam & Lauw, Hady W. 2008. On ranking controversies in Wikipedia: Models and evaluation. In WSDM ’08: Proceedings of the International Conference on Web Search and Web Data Mining, 171–182. 〈[URL]〉 (2 June 2024).
