Article published in: The Wealth and Breadth of Construction-Based Research
Edited by Timothy Colleman, Frank Brisard, Astrid De Wit, Renata Enghels, Nikos Koutsoukos, Tanja Mortelmans and María Sol Sansiñena
[Belgian Journal of Linguistics 34] 2020
pp. 5–16
Construction mining
Identifying construction candidates for the German constructicon
Published online: 28 May 2021
https://doi.org/10.1075/bjl.00030.bar
Abstract
The German Constructicon Project (www.german-constructicon.de) aims to document grammatical constructions in contemporary standard German on the basis of annotated corpus examples, including relations between constructions and between constructions and the semantic frames they evoke. So far, research has focused mainly on developing and computationally implementing a constructicographic workflow (including a parsing pipeline) that can address constructions of any kind at varying levels of schematicity, idiomaticity, and abstractness. However, such an exemplar-driven procedure precludes us from systematically identifying construction candidates. In this article, we scrutinize ways to operationalize and implement data-mining procedures to inductively identify construction candidates.
Keywords: German constructicon, construction mining, patterns, n-gram, UIF-PMI, constructicography
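The abstract describes the mining step only at a high level. As a rough illustration of what generating a list of patterns and ranking them as construction candidates can look like, here is a minimal Python sketch. It is not the authors' cxnMiner implementation: the (form, POS) input format, the function names `patterns` and `mine`, the `min_freq` threshold, the UD-style tags in the toy corpus, and the use of plain pointwise mutual information (PMI) in place of the UIF-PMI measure listed in the keywords are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

# Assumed (hypothetical) input format: each token is a (word form, POS tag) pair.
Token = tuple[str, str]


def patterns(window: list[Token]) -> list[tuple[str, ...]]:
    """Return every form/POS combination for one n-gram window.

    Each slot is either the concrete word form or an abstract POS
    placeholder such as "<ADJ>", so a bigram yields 4 patterns at
    different levels of schematicity, e.g. ("desto", "<ADJ>").
    """
    slots = [(form, f"<{pos}>") for form, pos in window]
    return [tuple(choice) for choice in product(*slots)]


def mine(sentences: list[list[Token]], n: int = 2, min_freq: int = 2):
    """Rank contiguous n-gram patterns by plain PMI (a stand-in, not UIF-PMI).

    Returns (pmi, frequency, pattern) triples, highest PMI first.
    """
    unit_counts: Counter = Counter()     # single forms and "<POS>" slots
    pattern_counts: Counter = Counter()  # mixed form/POS n-gram patterns
    n_tokens = n_windows = 0

    for sent in sentences:
        for form, pos in sent:
            unit_counts[form] += 1
            unit_counts[f"<{pos}>"] += 1
            n_tokens += 1
        for i in range(len(sent) - n + 1):
            n_windows += 1
            pattern_counts.update(patterns(sent[i:i + n]))

    scored = []
    for pat, freq in pattern_counts.items():
        if freq < min_freq:
            continue  # frequency threshold drops one-off patterns
        p_pattern = freq / n_windows
        # Probability expected if the slots occurred independently of each other
        p_indep = 1.0
        for unit in pat:
            p_indep *= unit_counts[unit] / n_tokens
        scored.append((log2(p_pattern / p_indep), freq, pat))
    return sorted(scored, reverse=True)


# Toy corpus with illustrative tags (hypothetical data, not from the project)
corpus = [
    [("je", "ADV"), ("mehr", "ADV"), ("desto", "ADV"), ("besser", "ADJ")],
    [("je", "ADV"), ("länger", "ADJ"), ("desto", "ADV"), ("schöner", "ADJ")],
    [("das", "DET"), ("Haus", "NOUN"), ("ist", "AUX"), ("schön", "ADJ")],
]

if __name__ == "__main__":
    for pmi, freq, pat in mine(corpus, n=2, min_freq=2):
        print(f"{pmi:5.2f}  {freq}  {' '.join(pat)}")
```

On this toy corpus the top-ranked candidate is the partially schematic pattern ("desto", "<ADJ>"). Because the sketch only builds contiguous n-grams, a discontinuous construction such as je ... desto surfaces only in fragments; capturing it as a whole would require skip-grams or dependency-based (syntactic) n-grams.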
Article outline
- 1. What’s out there in the constructicon?
- 2. Identifying constructions
- 3. Construction mining: Operationalization and implementation of a computational framework
- 3.1 Generating a list of patterns
- 3.2 From patterns to construction candidates
- 3.3 cxnMiner: A framework for mining constructions
- 3.4 Future work: From construction candidates to a constructicon
- 4. Conclusions
- Acknowledgements
- Notes
