Automatic acquisition of verb subcategorization information by exploiting minimal linguistic resources

Kermanidis, Katia Lida; Fakotakis, Nikos; Kokkinakis, George

doi:10.1075/ijcl.9.1.01ker

Article published in: International Journal of Corpus Linguistics
Vol. 9:1 (2004), pp. 1–28

Katia Lida Kermanidis | University of Patras, Greece

Nikos Fakotakis | University of Patras, Greece

George Kokkinakis | University of Patras, Greece

Published online: 29 April 2004

https://doi.org/10.1075/ijcl.9.1.01ker

A set of well-known statistical filtering methods (binomial hypothesis testing, log-likelihood ratio, t-test, thresholds on relative frequencies) is used on Modern Greek and English corpora in order to automatically acquire verb subcategorization frames that are not limited in number and are not known beforehand. As sophisticated linguistic resources and tools are not available for most languages (including Modern Greek), pre-processing of our corpora reaches merely the stage of elementary, intrasentential, non-embedded phrase chunking. By forming, permuting and counting subsets of the verb's neighboring set of phrases, and by applying the statistical filters mentioned previously, valid syntactic frames of verbs are detected. The results achieved were comparable to and, in several cases, better than those of previous approaches, even approaches utilizing richer resources. When the extracted list of frames is incorporated into a shallow parser, the performance of the parser increases by almost 6%, which shows the importance of the acquired knowledge.

Keywords: Modern Greek, subcategorization, hypothesis testing, shallow parsing
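
As a rough illustration of the filtering idea summarized in the abstract, the following minimal sketch forms subsets of the chunk labels found near each verb occurrence, counts them, and keeps only those frames whose count is unlikely to arise from noise under a binomial hypothesis test. It is not the authors' implementation. The function names, the chunk labels, the toy data, the assumed error probability (`error_rate`), the significance level (`alpha`), the cap on subset size, and the decision to treat subsets as unordered are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import comb


def candidate_frames(phrases, max_size=2):
    """Return all non-empty subsets (up to max_size) of the chunk labels
    next to a verb. Labels are sorted first, so order is ignored here
    (an assumption made for this sketch)."""
    labels = sorted(phrases)
    frames = set()
    for size in range(1, max_size + 1):
        frames.update(combinations(labels, size))
    return frames


def binomial_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def filter_frames(observations, error_rate=0.05, alpha=0.05):
    """Keep a (verb, frame) pair only if seeing the frame m or more times
    in n verb occurrences would be unlikely (probability < alpha) when the
    frame shows up purely by error with probability error_rate.
    Both error_rate and alpha are hypothetical values."""
    verb_totals = Counter()
    frame_counts = Counter()
    for verb, phrases in observations:
        verb_totals[verb] += 1
        # Each candidate frame is counted once per verb occurrence.
        for frame in candidate_frames(phrases):
            frame_counts[(verb, frame)] += 1

    accepted = {}
    for (verb, frame), m in frame_counts.items():
        n = verb_totals[verb]
        if binomial_tail(n, m, error_rate) < alpha:
            accepted.setdefault(verb, set()).add(frame)
    return accepted


# Invented toy data: 20 occurrences of one verb with their neighboring chunks.
toy = (
    [("give", ["NP", "PP"])] * 10
    + [("give", ["NP"])] * 9
    + [("give", ["ADVP"])]
)
print(filter_frames(toy))
```

In this toy run the single ADVP occurrence is rejected as probable noise, while the frames built from NP and PP pass the test. The other filters named in the abstract (log-likelihood ratio, t-test, relative-frequency thresholds) would replace the acceptance condition in `filter_frames`.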

Cited by (4)

Orasmaa, Siim

2013. Verb Subcategorisation Acquisition for Estonian Based on Morphological Information. In Text, Speech, and Dialogue [Lecture Notes in Computer Science, 8082], pp. 583 ff.

Lee, EunJoo

2008. An analysis of corpus-based research on TEFL and applied linguistics. English Teaching 63:2, pp. 283 ff.

Kermanidis, Katia, Manolis Maragoudakis, Nikos Fakotakis & George Kokkinakis

2008. Learning verb complements for Modern Greek: balancing the noisy dataset. Natural Language Engineering 14:1, pp. 71 ff.

Forsberg, Markus, Harald Hammarström & Aarne Ranta

2006. Morphological Lexicon Extraction from Raw Text Data. In Advances in Natural Language Processing [Lecture Notes in Computer Science, 4139], pp. 488 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.