In: The Progressive Revisited: Historical and Quantitative Studies in Germanic and Romance Languages
Edited by Alessandro Carlucci and Jerzy Nykiel
[Studies in Language Companion Series 236] 2025
pp. 190–229
Searching for the progressive in treebanks
Published online: 12 September 2025
https://doi.org/10.1075/slcs.236.07de
Abstract
Research on the progressive aspect in Germanic and Romance languages has benefited from corpus data.
Identifying such constructions in corpora in a transparent, objective, reliable and replicable way is,
however, challenging. This chapter presents preliminary methodological work on automatically retrieving and counting
authentic examples from treebanks, that is, grammatically annotated corpora. It demonstrates how selected
constructions that mark the progressive in Italian and Norwegian are collected from treebanks accessible through the
INESS platform. Deep syntactic relations, such as those between predicates and arguments, are factored in and
quantified. Corpus queries that exploit syntactic dominance relations are potentially more powerful than queries using
only linear precedence, but there is a relative shortage of treebank resources.
Keywords: treebank, dominance relations, progressive, Norwegian, Italian, periphrasis, corpus research, automatic search
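The contrast the abstract draws between queries over linear precedence and queries over syntactic relations can be illustrated with a minimal sketch. It is not the chapter's method and does not use INESS query syntax. The sentence, the Token type and both functions are hypothetical, and the annotation (stare as an aux dependent of the gerund) is an assumption modelled on Universal Dependencies conventions.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token of a toy dependency parse (UD-like fields, hypothetical data)."""
    id: int
    form: str
    lemma: str
    upos: str
    feats: str
    head: int      # id of the syntactic head, 0 for the root
    deprel: str


# "Maria sta ancora mangiando" ('Maria is still eating'); an invented example,
# annotated by hand under the assumption that "stare" attaches as aux to the gerund.
SENTENCE = [
    Token(1, "Maria", "Maria", "PROPN", "_", 4, "nsubj"),
    Token(2, "sta", "stare", "AUX", "VerbForm=Fin", 4, "aux"),
    Token(3, "ancora", "ancora", "ADV", "_", 4, "advmod"),
    Token(4, "mangiando", "mangiare", "VERB", "VerbForm=Ger", 0, "root"),
]


def is_gerund(tok: Token) -> bool:
    """True if the token carries the feature VerbForm=Ger."""
    return "VerbForm=Ger" in tok.feats.split("|")


def linear_matches(sent: list[Token]) -> list[tuple[str, str]]:
    """Precedence-only query: 'stare' immediately followed by a gerund."""
    return [
        (a.form, b.form)
        for a, b in zip(sent, sent[1:])
        if a.lemma == "stare" and is_gerund(b)
    ]


def dependency_matches(sent: list[Token]) -> list[tuple[str, str]]:
    """Relation-based query: 'stare' as an aux dependent of a gerund head."""
    by_id = {t.id: t for t in sent}
    hits = []
    for t in sent:
        if t.lemma == "stare" and t.deprel == "aux":
            head = by_id.get(t.head)
            if head is not None and is_gerund(head):
                hits.append((t.form, head.form))
    return hits
```

On this toy sentence the precedence-only query finds nothing, because the adverb ancora intervenes between the auxiliary and the gerund, while the relation-based query still pairs sta with mangiando. Widening the linear window would recover this case but would also risk pairing words that are not syntactically related.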
Article outline
- 1. Introduction
- 2. Italian
- 2.1 Data sourcing in previous work on Italian
- 2.2 Retrieval of Italian progressives from UD treebanks
- 3. Norwegian
- 3.1 Overview of Norwegian progressives and previous work
- 3.2 NorGramBank
- 3.3 Semi-fixed constructions with med and the infinitive
- 3.4 Pseudo-coordinations
- 4. Conclusion
- Notes
- References
