Article published in: International Journal of Corpus Linguistics
Vol. 28:3 (2023), pp. 430–459
Annotation uncertainty in the context of grammatical change
Published online: 20 June 2023
https://doi.org/10.1075/ijcl.20113.mer
Abstract
This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora, specifically focusing on (but not limited to) historical languages. Such uncertainty might be due to inherent properties of the language, for example, linguistic ambiguity and overlapping categories of linguistic description, but could also be caused by a lack of annotation expertise. By examining annotation uncertainty in more detail, we identify the sources, deepen our understanding of the nature and different types of uncertainty encountered in daily annotation practice, and discuss practical implications of our theoretical findings. This paper can be seen as an attempt to reconcile the perspectives of the main scientific disciplines involved in corpus projects, linguistics and computer science, to develop a unified view and to highlight the potential synergies between these disciplines.
Keywords: annotation, uncertainty, fuzziness, grammatical change
Article outline
- 1. Introduction
- 2. Current annotation practice and limitations
- 3. Uncertainty in historical (corpus) linguistics
- 3.1 Project context and underlying corpus
- 3.2 Annotation uncertainties
- 3.2.1 Overlapping categories and the gradualness of change
- 3.2.2 Types of categorical gradience
- 3.2.3 The human annotator as a source of (subjective) uncertainty
- 4. Mathematical modeling of uncertainty
- 4.1 Frame of discernment and ground-truth
- 4.2 Uncertainty measures and calculi
- 4.3 Vagueness, fuzziness, and graded notions of truth
- 4.4 Granularity
- 5. A unified view of uncertainty
- 5.1 Fuzziness and ambiguity
- 5.2 Incompleteness and lack of knowledge
- 6. Practical implications
- 6.1 Types of annotation
- 6.2 Experience from the annotation practice: Tagging ambiguities and uncertainties
- 6.2.1 Human expert annotator
- 6.2.2 Machine annotator
- 7. Conclusions
- Notes
