Article published in: International Journal of Learner Corpus Research
Vol. 7:2 (2021), pp. 259–274
Materials & Methods Report
fsca: French syntactic complexity analyzer
Published online: 11 October 2021
https://doi.org/10.1075/ijlcr.20018.van
Abstract
This article reports on an open-source R package for extracting syntactic units from dependency-parsed
French texts. To evaluate the reliability of the package, syntactic units were extracted automatically from a corpus of
L2 French and compared with units extracted manually from the same corpus. The F-score of the extracted units ranged
from 0.53 to 0.97. Although the units were not always identical across the two methods, syntactic complexity measures
derived manually and automatically were strongly and significantly correlated (ρ = 0.62–0.97, p < 0.001). This suggests
that the package may be a suitable substitute where manual annotation is not feasible, but that measures based on
these units should be interpreted with care.
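As a rough illustration of the evaluation described in the abstract, the sketch below computes precision, recall and a balanced F-score for automatically extracted units against manually extracted ones. It is not taken from the package: the function name, the exact-match criterion, the span-based unit identifiers and the example data are all assumptions made for illustration, and the abstract does not state which variant of the F-score was used.

```python
def precision_recall_f1(manual_units, automatic_units):
    """Compare two collections of extracted units, treating each as a set.

    Assumption: a unit counts as correct only if it matches a manual unit exactly.
    """
    manual = set(manual_units)
    automatic = set(automatic_units)
    true_positives = len(manual & automatic)

    # Precision: share of automatic units that also appear in the manual set.
    precision = true_positives / len(automatic) if automatic else 0.0
    # Recall: share of manual units that the automatic method recovered.
    recall = true_positives / len(manual) if manual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical units, each identified as (text_id, start_token, end_token).
manual = [("t1", 0, 5), ("t1", 6, 12), ("t1", 13, 20), ("t2", 0, 9)]
automatic = [("t1", 0, 5), ("t1", 6, 11), ("t1", 13, 20), ("t2", 0, 9), ("t2", 10, 14)]

p, r, f = precision_recall_f1(manual, automatic)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Under exact matching, a unit whose boundary is off by a single token counts as both a false positive and a false negative, so a looser matching criterion would yield higher scores.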
Article outline
- 1. Introduction
- 2. Methodology
- 2.1 Manual annotation
- 2.2 Automatic extraction of syntactic units
- 3. Results
- 3.1 Precision and recall of automatically identified units
- 3.2 Correlation between manual and automatic methods
- 3.3 Sources of error
- 4. Discussion and conclusion
- Disclosures
- Acknowledgements
- Notes
- References
