Measuring the degree of specialisation of sub-technical legal terms through corpus comparison: A domain-independent method

Pérez, María José Marín

doi:10.1075/term.22.1.04mar

Article published in: Terminology
Vol. 22:1 (2016), pp. 80–102



Published online: 19 May 2016


One of the most remarkable features of the legal English lexicon is its use of sub-technical vocabulary, that is, words frequently shared by the general and specialised fields that either retain a legal meaning in general English or acquire a specialised meaning in the legal context. Testing has shown that almost 50% of the terms extracted from BLaRC, an 8.85-million-word legal corpus, were found among the 2,000 most frequent word families of West’s (1953) GSL, Coxhead’s (2000) AWL or the BNC (2007), which underlines the importance of this type of vocabulary in this variety of English. Because of their peculiar statistical behaviour in both contexts, sub-technical terms are particularly difficult to identify, and their termhood is hard to measure on the basis of parameters such as their frequency or distribution in general and specialised environments. This research proposes a novel termhood measure intended to quantify this lexical phenomenon objectively by applying Williams’ (2001) lexical network model, which incorporates contextual information to compute the degree of specialisation of sub-technical terms.
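The abstract argues that comparing frequencies across general and specialised corpora works poorly for sub-technical vocabulary. As a hedged illustration of that kind of frequency-based statistic (Dunning’s 1993 log-likelihood, which appears in the reference list), the Python sketch below computes a signed G² keyness score for a single word in a specialised corpus versus a reference corpus. It is not the article’s lexical-network method; the function names `log_likelihood` and `keyness` and every count in the usage lines are hypothetical placeholders, not BLaRC or BNC data.

```python
import math


def log_likelihood(freq_spec: int, size_spec: int, freq_ref: int, size_ref: int) -> float:
    """Dunning's (1993) G2 statistic for one word compared across two corpora.

    Builds the full 2x2 contingency table (this word vs. all other tokens,
    specialised vs. reference corpus) and returns G2 = 2 * sum(O * ln(O / E))
    summed over the non-empty cells.
    """
    if size_spec <= 0 or size_ref <= 0:
        raise ValueError("corpus sizes must be positive")
    if not (0 <= freq_spec <= size_spec and 0 <= freq_ref <= size_ref):
        raise ValueError("frequencies must lie between 0 and the corpus size")
    observed = [
        [freq_spec, freq_ref],                          # occurrences of the word
        [size_spec - freq_spec, size_ref - freq_ref],   # all other tokens
    ]
    total = size_spec + size_ref
    row_totals = [sum(row) for row in observed]
    col_totals = [size_spec, size_ref]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # O * ln(O / E) tends to 0 as O -> 0
            e = row_totals[i] * col_totals[j] / total  # expected count under independence
            g2 += o * math.log(o / e)
    return 2.0 * g2


def keyness(freq_spec: int, size_spec: int, freq_ref: int, size_ref: int) -> float:
    """Signed G2: positive if the word is relatively more frequent in the
    specialised corpus, negative if it is relatively more frequent in the reference one."""
    g2 = log_likelihood(freq_spec, size_spec, freq_ref, size_ref)
    overused = freq_spec / size_spec >= freq_ref / size_ref
    return g2 if overused else -g2


# Usage with invented counts (hypothetical, not real corpus frequencies).
SPEC_SIZE, REF_SIZE = 1_000_000, 10_000_000
technical = keyness(200, SPEC_SIZE, 10, REF_SIZE)            # rare outside the domain
sub_technical = keyness(2_000, SPEC_SIZE, 15_000, REF_SIZE)  # common in both corpora
print(f"technical term: {technical:.1f}  sub-technical word: {sub_technical:.1f}")
```

With these invented counts, the hypothetical technical term scores more than six times higher than the hypothetical sub-technical word, even though the latter occurs ten times as often in the specialised corpus. A score of this kind sees only frequency, which is the limitation the abstract points to when it turns to contextual information.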

Keywords: Legal English, sub-technical terms, lexical networks, ESP, corpus linguistics

References (54)

Ahmad, Khurshid, Andrea Davies, Heather Fulford, and Monika Rogers. 1994. “What is a Term? The Semi-automatic Extraction of Terms from Text.” In Translation Studies: An Interdiscipline, ed. by M. Snell-Hornby, F. Pöchhacker, and K. Kaindl, 267–278. Amsterdam: John Benjamins.

Alcaraz Varó, Enrique. 1994. El Inglés Jurídico: Textos y Documentos. Madrid: Derecho.

Alcaraz Varó, Enrique. 2000. El Inglés Profesional y Académico. Madrid: Alianza Editorial.

Ananiadou, Sophia. 1988. A Methodology for Automatic Term Recognition. PhD Thesis, University of Manchester, Institute of Science and Technology, United Kingdom.

Aronson, Alan, and François-Michel Lang. 2010. “An Overview of MetaMap: Historical Perspective and Recent Advances.” Journal of the American Medical Informatics Association 17 (3): 229–236.

Baker, Mona. 1988. “Sub-technical Vocabulary and the ESP Teacher: An Analysis of some Rhetorical Items in Medical Journal Articles.” Reading in a Foreign Language 4 (2): 91–105.

Barrón-Cedeño, Alberto, Gerardo Sierra, Patrick Drouin, and Sophia Ananiadou. 2009. “An Improved Automatic Term Recognition Method for Spanish.” In Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009), ed. by A. Gelbukh, 125–136. Berlin: Springer-Verlag. ([URL]). Accessed January 2016.

Borja Albí, Anabel. 2000. El Texto Jurídico en Inglés y su Traducción. Barcelona: Ariel.

Bourigault, Didier. 1992. “Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases.” In Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), 977–981. Nantes, France.

Cabré, María Teresa, Rosa Estopà, and Jorge Vivaldi. 2001. “Automatic Term Detection: A Review of Current Systems.” In Recent Advances in Computational Terminology, ed. by D. Bourigault, C. Jacquemin, and M.C. L’Homme, 53–87. Amsterdam: John Benjamins.

Chung, Teresa M. 2003. “A Corpus Comparison Approach for Terminology Extraction.” Terminology 9 (2): 221–246.

Chung, Teresa M., and Paul Nation. 2003. “Technical Vocabulary in Specialised Texts.” Reading in a Foreign Language 15 (2): 103–116.

Church, Kenneth W., and Patrick Hanks. 1990. “Word Association Norms, Mutual Information, and Lexicography.” Computational Linguistics 16 (1): 22–29.

Church, Kenneth W., and William Gale. 1995. “Inverse Document Frequency (IDF): A Measure of Deviations from Poisson.” In Proceedings of the Third Workshop on Very Large Corpora, ed. by D. Yarowsky and K. Church, 121–130. Cambridge: Massachusetts Institute of Technology Press.

Cowan, Ronayne. 1974. “Lexical and Syntactic Research for the Design of EFL.” TESOL Quarterly 8: 389–399.

Coxhead, Averyl. 2000. “A New Academic Word List.” TESOL Quarterly 34 (2): 213–238.

Dagan, Ido, and Kenneth Church. 1994. “TERMIGHT: Identifying and Translating Technical Terminology.” In Proceedings of the 4th Conference on Applied Natural Language Processing, 34–40. Stuttgart, Germany. ([URL]). Accessed January 2016.

Daille, Béatrice. 1996. “Study and Implementation of Combined Techniques for Automatic Extraction of Terminology.” In The Balancing Act: Combining Symbolic and Statistical Approaches to Language, ed. by J.L. Klavans and P. Resnik, 29–36. Cambridge: Massachusetts Institute of Technology Press.

David, Sophie, and Pierre Plante. 1990. Termino 1.0. Research Report of Centre d’Analyse de Textes par Ordinateur. Université du Québec, Montréal.

Drouin, Patrick. 2003. “Term Extraction Using Non-technical Corpora as a Point of Leverage.” Terminology 9 (1): 99–117.

Dunning, Ted. 1993. “Accurate Methods for the Statistics of Surprise and Coincidence.” Computational Linguistics 19 (1): 61–74.

Fahmi, Ismail, Gosse Bouma, and Lonneke van der Plas. 2007. “Improving Statistical Method Using Known Terms for Automatic Term Extraction.” In Proceedings of Computational Linguistics in the Netherlands (CLIN 17), ed. by F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste, 1–8. Belgium: University of Leuven.

Farrell, Paul. 1990. Vocabulary in ESL: A Lexical Analysis of the English of Electronics and a Study of Semi-technical Vocabulary. Dublin: Centre for Language and Communication Studies.

Flowerdew, John. 2001. “Concordancing as a Tool in Course Design.” In Small Corpus Studies and ELT: Theory and Practice, ed. by M. Ghadessy, A. Henry, and R. Roseberry, 71–92. Amsterdam: John Benjamins.

Frantzi, Katerina T., and Sophia Ananiadou. 1999. “The C/NC Value Domain Independent Method for Multi-word Term Extraction.” Journal of Natural Language Processing 3 (2): 115–127.

Frantzi, Katerina, Sophia Ananiadou, and Hideki Mima. 2000. “Automatic Recognition of Multi-Word Terms: The C-value/NC-value Method.” International Journal on Digital Libraries 3 (2): 115–130.

Geffet, Maayan, and Ido Dagan. 2005. “The Distributional Inclusion Hypotheses and Lexical Entailment.” In Proceedings of the Annual Meeting of the ACL, 107–114. Michigan, USA.

Heatley, Alex, and Paul Nation. 2002. Range. Computer software. Wellington, New Zealand: Victoria University of Wellington.

Jacquemin, Christian. 2001. Spotting and Discovering Terms through NLP. Cambridge: Massachusetts Institute of Technology Press.

Joslyn, Cliff, Patrick Paulson, and Karin Verspoor. 2008. “Exploiting Term Relations for Semantic Hierarchy Construction.” In Proceedings of the IEEE International Conference on Semantic Computing, 42–49. Santa Clara (CA), USA.

Justeson, John S., and Slava M. Katz. 1995. “Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text.” Natural Language Engineering 1 (1): 9–27.

Kit, Chunyu, and Xiaoyue Liu. 2008. “Measuring Mono-word Termhood by Rank Difference via Corpus Comparison.” Terminology 14 (2): 204–229.

Lemay, Chantal, Marie-Claude L’Homme, and Patrick Drouin. 2005. “Two Methods for Extracting ‘Specific’ Single-Word Terms from Specialised Corpora.” International Journal of Corpus Linguistics 10 (2): 227–255.

Loginova, Elizabeta, Anita Gojun, Helena Blancafort, María Guegan, Tatiana Gornostay, and Ulrich Heid. 2012. “Reference Lists for the Evaluation of Term Extraction Tools.” In Proceedings of TKE 2012: Terminology and Knowledge Engineering, 177–192. Madrid: Universidad Politécnica de Madrid. ([URL]). Accessed January 2016.

Marín, María José. 2014. “Evaluation of Five Single-word Term Recognition Methods on a Legal Corpus.” Corpora 9 (1): 83–107.

Marín, María José, and Camino Rea. 2012. “Structure and Design of the BLRC: A Legal Corpus of Judicial Decisions from the UK.” Journal of English Studies 10: 131–145.

Maynard, Diana, and Sophia Ananiadou. 2000. “TRUCKS: A Model for Automatic Multi-word Term Recognition.” Journal of Natural Language Processing 8 (1): 101–125.

Mellinkoff, David. 1963. The Language of the Law. Boston: Little, Brown & Co.

Nakagawa, Hiroshi, and Tatsunori Mori. 2002. “A Simple but Powerful Automatic Term Extraction Method.” In COLING-02 on COMPUTERM. Proceedings of the Second International Workshop on Computational Terminology, 1–7. Taipei, Taiwan.

Nazar, Rogelio, and María Teresa Cabré. 2012. “Supervised Learning Algorithms Applied to Terminology Extraction.” In Proceedings of the 10th Terminology and Knowledge Engineering Conference TKE 2012, ed. by G. Aguado de Cea, M.C. Suárez-Figueroa, R. García-Castro, and E. Montiel-Ponsoda, 209–217. Madrid: Ontology Engineering Group, Association for Terminology and Knowledge Transfer.

Orts, María Ángeles. 2006. Aproximación al Discurso Jurídico en Inglés: Las Pólizas de Seguro Marítimo de Lloyd’s. Madrid: Edisofer.

Pazienza, Maria Teresa, Marco Pennacchiotti, and Fabio Massimo Zanzotto. 2005. “Terminology Extraction: An Analysis of Linguistic and Statistical Approaches.” Studies in Fuzziness and Soft Computing 185: 225–279.

Park, Youngja, Roy Byrd, and Branimir Boguraev. 2002. “Automatic Glossary Extraction: Beyond Terminology Identification.” In Proceedings of COLING’02 19th International Conference on Computational Linguistics, ed. by S.C. Zeng, 1–7. Taipei, Taiwan.

Sclano, Francesco, and Paola Velardi. 2007. “A Web Application to Learn the Common Terminology of Interest Groups and Research Communities.” In Proceedings of the Conference TIA-2007, ed. by C. Engehard and R.D. Kuntz, 85–94. Grenoble: Presses Universitaires de Grenoble.

Scott, Mike. 2008. WordSmith Tools Version 5. Liverpool: Lexical Analysis Software.

Sparck Jones, Karen. 1972. “A Statistical Interpretation of Term Specificity and its Application in Retrieval.” Journal of Documentation 28 (1): 11–21.

Tiersma, Peter. 1999. Legal Language. Chicago: The University of Chicago Press.

Trimble, Louis. 1985. English for Science & Technology: A Discourse Approach. Cambridge: Cambridge University Press.

Vivaldi, Jorge. 2001. Extracción de Candidatos a Término mediante Combinación de Estrategias Heterogéneas. PhD Thesis. Universidad Politécnica de Cataluña.

Vivaldi, Jorge, Luis Adrián Cabrera-Diego, Gerardo Sierra, and María Pozzi. 2012. “Using Wikipedia to Validate the Terminology Found in a Corpus of Basic Textbooks.” In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 3820–3827. Istanbul: Istanbul Lütfi Kırdar Convention and Exhibition Centre. ([URL]). Accessed January 2016.

Wang, Karen, and Paul Nation. 2004. “Word Meaning in Academic English: Homography in the Academic Word List.” Applied Linguistics 25 (3): 291–314.

Weeds, Julie, David Weir, and Diana McCarthy. 2004. “Characterising Measures of Lexical Distributional Similarity.” In Proceedings of COLING-04, 1–7. Geneva, Switzerland.

West, Michael. 1953. A General Service List of English Words. London: Longman.

Williams, Geoffrey. 2001. “Mediating between Lexis and Texts: Collocational Networks in Specialised Corpora.” ASp, la Revue du GERAS 31–33: 63–76.

Cited by (3)


Marín, María José

2023. Automatic term recognition and legal language. In Handbook of Terminology, Vol. 3, pp. 511 ff.

Pérez, María José Marín & Ángela Almela

2022. The representation of migrants in Spanish judicial decisions: using corpus data to refute hate speech. Corpora 17:2, pp. 167 ff.

Llopis, María Ángeles Orts

2017. Terror at Home: On the Rhetoric of Domestic Violence Legislation in the United Kingdom and Spain. Journal of Intercultural Communication 17:2, pp. 1 ff.

This list is based on CrossRef data as of 6 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.