Article published in: Lexical semantic approaches to terminology
Edited by Pamela Faber and Marie-Claude L'Homme
[Terminology 20:2] 2014
pp. 198–224
Hunting for a linguistic phantom
A corpus-linguistic study of knowledge-rich contexts
Published online: 31 October 2014
https://doi.org/10.1075/term.20.2.04sch
The importance of semantic descriptions of concepts by means of defining statements is a commonplace tenet of scientific and practical approaches to terminology. While the current understanding of defining statements remains bound to classical concepts of defining, little is known about the types of conceptual information that may ease the transfer of knowledge. Furthermore, there is little insight into how defining statements differ from non-defining (generic) statements, either epistemologically or linguistically. Finally, it remains unclear how practical terminology work can benefit from corpus-based research on the description of defining statements. This paper aims to shed light on some of these questions by describing a corpus-linguistic study of knowledge-rich contexts in German and Russian web corpora. Hypotheses about linguistic features of knowledge-rich contexts are derived in a theory-driven manner and examined by means of corpus-linguistic methods. Features found to be significant are then investigated further for the German data, using a multivariate method.
Keywords: corpus linguistics, knowledge-rich contexts, Russian, German, web corpora
