Chapter 4. How to compare speed and accuracy of syntactic parsers

van Noord, Gertjan

doi:10.1075/z.210.04van

In: Crossroads Semantics: Computation, experiment and grammar
Edited by Hilke Reckman, Lisa Lai-Shen Cheng, Maarten Hijzelendoorn and Rint Sybesma
[Not in series 210] 2017
pp. 57–76

Gertjan van Noord | University of Groningen

Published online: 12 April 2017

https://doi.org/10.1075/z.210.04van

Abstract

This paper introduces both a methodological and a practical innovation. First, two scenarios are introduced for comparing accurate but slow parsers with faster but less accurate ones. Second, a corpus-based technique is described for improving the efficiency of wide-coverage, high-accuracy parsers. By keeping track of the derivation steps that lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered out without significant loss of parsing accuracy but with a substantial gain in parsing efficiency. Experimental results with the Alpino parser for Dutch indicate that the technique yields much faster parsers that perform at almost the same level of accuracy. An interesting characteristic of the approach is that it is self-learning, in the sense that it uses unannotated corpora.
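The general idea of learning a filter from the derivation steps of best parses can be illustrated with a minimal Python sketch. It is not the chapter's implementation, which operates on left-corner splines inside Alpino. Here a derivation step is assumed to be any hashable value (plain rule strings in the toy data), and the function names and the `min_count` threshold are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set

Step = Hashable  # assumption: any hashable description of a derivation step


def learn_allowed_steps(best_parse_steps: Iterable[Iterable[Step]],
                        min_count: int = 1) -> Set[Step]:
    """Collect the steps that occur in at least `min_count` best parses."""
    counts: Counter = Counter()
    for steps in best_parse_steps:
        counts.update(set(steps))  # count each step at most once per sentence
    return {step for step, n in counts.items() if n >= min_count}


def filter_steps(candidates: Iterable[Step], allowed: Set[Step]) -> List[Step]:
    """Keep only the candidate steps that were observed often enough."""
    return [step for step in candidates if step in allowed]


# Toy data (invented): each inner list holds the steps of one best parse.
observed = [
    ["np->det n", "s->np vp", "vp->v np"],
    ["np->det n", "s->np vp", "vp->v"],
    ["np->n", "s->np vp", "vp->v"],
]
allowed = learn_allowed_steps(observed, min_count=2)
kept = filter_steps(["np->det n", "np->n", "vp->v", "vp->v pp"], allowed)
print(kept)
```

Raising `min_count` makes the filter stricter, so fewer steps are tried and parsing is faster, at the risk of discarding a step that a correct parse needs. Because the counts come from the parser's own best parses, no manually annotated corpus is required in this sketch.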

Keywords: Parsing, Alpino, Dutch, Efficiency, Accuracy, HPSG

Article outline

1. Introduction
2. Background: The Alpino parser for Dutch
3. Methodology: Balancing efficiency and accuracy
- 3.1 On-line and off-line parsing scenarios
  - 3.1.1 On-line scenario
  - 3.1.2 Off-line scenario
- 3.2 Accuracy: Comparing sets of dependencies
4. Learning efficient parsing
- 4.1 Left-corner parsing
- 4.2 Left-corner splines
- 4.3 Filtering left-corner splines
  - 4.3.1 Context size
  - 4.3.2 Required evidence
- 4.4 Comparison with link table
- 4.5 Implementation detail
5. Experimental results
- 5.1 Results on Alpino Treebank
- 5.2 Effect of the amount of training data
- 5.3 Experiment with D-Coi data
6. Specializing lexical categories
7. Discussion
Acknowledgements
Note
References
