In: Reference: From conventions to pragmatics
Edited by Laure Gardelle, Laurence Vincent-Durroux and Hélène Vinckel-Roisin
[Studies in Language Companion Series 228] 2023
pp. 107–126
A linear approach of chain composition
Published online: 2 February 2023
https://doi.org/10.1075/slcs.228.06fed
Abstract
This corpus-based approach to coreference chains analyzes recurrences in the patterns of chains, providing new insights into conventions or preferences in the forms of referential expressions. By taking into account the linearity of discourse and the succession of mentions, it goes beyond the more commonly implemented analysis of global characteristics. We analyze 581 reference chains from the French corpus AnnoDis. Using clustering methods, we first show that the resulting clusters are linguistically interpretable. We then demonstrate that animacy and genre influence chain composition. Finally, we identify the main patterns of coreference chains in the corpus. These patterns highlight different types of chains and discourse strategies, which vary across genres, and confirm a major influence of referent type.
Keywords: coreference chains, linear approach, corpus-based analysis
Article outline
- 1. Introduction
- 2. Application to the analysis of an annotated French corpus: The AnnoDis Corpus
- 3. A sequence analysis approach to coreference chains
- 3.1 Parameters of the sequence analysis
- 3.2 From mentions to states
- 4. Clustering coreference chains according to their sequence of mentions
- 5. Impact of animacy and text genre on chain composition
- 6. Patterns of coreference chains
- 7. Discussion
- Notes
- References
References (41)
Abbott, Andrew. 1995. Sequence analysis: New methods for old ideas. Annual Review of Sociology 21(1): 93–113.
Ariel, Mira. 2001. Accessibility theory: An overview. In Text Representation: Linguistic and Psycholinguistic Aspects [Human Cognitive Processing 8], Ted J. M. Sanders, Joost Schilperoord & Wilbert Spooren (eds), 29–87. Amsterdam: John Benjamins.
Asher, Nicholas, Muller, Philippe, Bras, Myriam, Ho-Dac, Lydia-Mai, Benamara, Farah, Afantenos, Stergos & Vieu, Laure. 2017. ANNODIS and related projects: Case studies on the annotation of discourse structure. In Handbook of Linguistic Annotation, Nancy Ide & James Pustejovsky (eds), 1241–1264. Dordrecht: Springer.
Biber, Douglas, Connor, Ulla & Upton, Thomas A. 2007. Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure [Studies in Corpus Linguistics 28]. Amsterdam: John Benjamins.
Corblin, Francis. 1995. Les formes de reprise dans le discours. Anaphores et chaînes de référence. Rennes: Presses universitaires de Rennes.
Dietrich, Julia, Andersson, Håkan & Salmela-Aro, Katariina. 2014. Developmental psychologists’ perspective on pathways through school and beyond. In Advances in Sequence Analysis: Theory, Method, Applications, Vol. 2, Philippe Blanchard, Felix Bühlmann & Jacques-Antoine Gauthier (eds), 129–150. Cham: Springer.
Fasang, Annette Eva. 2014. New perspectives on family formation: What can we learn from sequence analysis? In Advances in Sequence Analysis: Theory, Method, Applications, Vol. 2, Philippe Blanchard, Felix Bühlmann & Jacques-Antoine Gauthier (eds), 107–128. Cham: Springer.
Federzoni, Silvia, Ho-Dac, Lydia-Mai & Rebeyrolle, Josette. 2020. Les chaînes topicales dans la ressource ANNODIS. In 7e Congrès Mondial de Linguistique Française (Montpellier, France) [SHS Web of Conferences 78], Franck Neveu, Bernard Harmegnies, Linda Hriba, Sophie Prévost & Agnes Steuckardt (eds), #11005. Les Ulis: EDP Sciences.
Fossard, Marion, Achim, Amélie M., Rousier-Vercruyssen, Lucie, Gonzalez, Sylvia, Bureau, Alexandre & Champagne-Lavau, Maud. 2018. Referential choices in a collaborative storytelling task: Discourse stages and referential complexity matter. Frontiers in Psychology 9: 176.
Gabadinho, Alexis, Ritschard, Gilbert, Müller, Nicolas S. & Studer, Matthias. 2011. Analyzing and visualizing state sequences in R with TraMineR. Journal of Statistical Software 40(4): 1–37.
Gabadinho, Alexis, Ritschard, Gilbert, Studer, Matthias & Müller, Nicolas S. 2009. Mining Sequence Data in R with the TraMineR Package: A User’s Guide. Department of Econometrics and Laboratory of Demography. Geneva: University of Geneva.
Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron. 1993. Cognitive status and the form of referring expressions in discourse. Language 69(2): 274–307.
Halpin, Brendan. 2010. Optimal matching analysis and life course data: The importance of duration. Sociological Methods & Research 38(3): 365–388.
Kunz, Kerstin & Lapshinova-Koltunski, Ekaterina. 2015. Cross-linguistic analysis of discourse variation across registers. Nordic Journal of English Studies 14(1): 258–288.
Landragin, Frédéric. 2015. Description, modélisation et détection automatique des chaînes de référence (DEMOCRAT). Bulletin de l’Association Française pour l’Intelligence Artificielle (AFIA) 92: 11–15.
Lapshinova-Koltunski, Ekaterina & Kunz, Kerstin. 2020. Exploring coreference features in heterogeneous data. In Proceedings of the First Workshop on Computational Approaches to Discourse, Chloé Braud, Christian Hardmeier, Junyi Jessy Li, Annie Louis & Michael Strube (eds), 53–64. Stroudsburg PA: ACL.
Lesnard, Laurent. 2014. Using optimal matching analysis in sociology: Cost setting and sociology of time. In Advances in Sequence Analysis: Theory, Method, Applications, Vol. 2, Philippe Blanchard, Felix Bühlmann & Jacques-Antoine Gauthier (eds), 39–50. Cham: Springer.
Levenshtein, Vladimir I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8): 707–710.
Longo, Laurence & Todirascu, Amalia. 2014. Vers une typologie des chaînes de référence dans des textes administratifs et juridiques. Langages 195(3): 79–98.
Longo, Laurence & Todirascu, Amalia. 2010. Genre-based reference chains identification for French. Investigationes Lingvisticae 21: 57–75.
Nedoluzhko, Anna & Lapshinova-Koltunski, Ekaterina. 2016. Contrasting coreference in Czech and German: From different frameworks to joint results. In Computational Linguistics and Intellectual Technologies: Proceedings of the 22nd International Conference Dialogue. Moscow, Russia. <[URL]> (12 August 2022).
Obry, Vanessa, Glikman, Julie, Guillot-Barbance, Céline & Pincemin, Bénédicte. 2017. Les chaînes de référence dans les récits brefs en français: Étude diachronique (XIIIe-XVIe s.). Langue Française 195(3): 91–110.
Péry-Woodley, Marie-Paule, Afantenos, Stergos, Ho-Dac, Lydia-Mai & Asher, Nicholas. 2011. La ressource ANNODIS, un corpus enrichi d’annotations discursives. Revue TAL 52(3): 71–101.
Péry-Woodley, Marie-Paule, Ho-Dac, Lydia-Mai, Rebeyrolle, Josette, Tanguy, Ludovic & Fabre, Cécile. 2017. A corpus-driven approach to discourse organisation: From cues to complex markers. Dialogue & Discourse 8(1): 66–105.
Quignard, Matthieu, Le Mené, Marine & Landragin, Frédéric. 2021. Élaboration du corpus DEMOCRAT: Procédures d’annotation et d’évaluation. Langages 224: 25–46.
Recasens Potau, Marta. 2010. Coreference: Theory, Annotation, Resolution and Evaluation. PhD dissertation, Universitat de Barcelona.
Robette, Nicolas. 2011. Explorer et décrire les parcours de vie: Les typologies de trajectoires [Les collections du CEPED]. Paris: CEPED.
Rousier-Vercruyssen, Lucie & Landragin, Frédéric. 2019. Interdistance et instabilité au sein des chaînes de référence: Indices textuels? Discours 25.
Schnedecker, Catherine. 2021. Les chaînes de référence en Français [Collection L’essentiel Français]. Paris: Ophrys.
Schnedecker, Catherine. 2005. Les chaînes de référence dans les portraits journalistiques: Éléments de description. Travaux de Linguistique 51(2): 85–133.
Schnedecker, Catherine & Landragin, Frédéric. 2014. Les chaînes de référence: Présentation. Langages 195(3): 3–22.
Studer, Matthias & Ritschard, Gilbert. 2014. A comparative review of sequence dissimilarity measures. LIVES Working Paper 133.
Urieli, Assaf. 2013. Analyse syntaxique robuste du Français: Concilier méthodes statistiques et connaissances linguistiques dans l’Outil Talismane. Unpublished PhD dissertation. Université Toulouse 2 Le Mirail.
Uryupina, Olga, Kabadjov, Mijail & Poesio, Massimo. 2016. Detecting non-reference and non-anaphoricity. In Anaphora Resolution [Theory and Applications of Natural Language Processing], Massimo Poesio, Roland Stuckardt & Yannick Versley (eds), 369–392. Berlin: Springer.
