In: The Documentarist Turn: From observable linguistic behaviour to typological generalizations
Edited by Sonja Riesberg, Uta Reinöhl and Birgit Hellwig
[Studies in Language Companion Series 240] 2026
pp. 134–148
Chapter 6. Language archives in context
This content is being prepared for publication; it may be subject to changes.
Abstract
Language archives are a type of research infrastructure that grew out of the requirements and efforts
of language documentation. Since language documentation emerged as a separate research paradigm, it has
been accompanied by a strong and vibrant methodological discourse. This discussion of methods and data has not only
led to a clear framework of documentary linguistics, but also specified requirements for data and metadata formats,
software tools, language archives and ontologies. A whole ecosystem of data infrastructure along with new data types
has grown out of this methodological discourse and in close dialogue with language documentation practice.
Article outline
- 1. Introduction
- 2. The emergence of documentary linguistics
- 3. The technical developments and method
- 4. Data quality and corpus creation
- 5. In the research infrastructure landscape
- 6. In a network of relations
- 7. Conclusion and outlook
