Article published in: Selected Papers from the 37th Annual Symposium on Arabic Linguistics
Edited by Reem Khamis and Mira Goral
[Arabic Linguistics 1:2] 2025
pp. 240–263
Ellipsis in Arabic
Using machine learning to detect and predict elided words
Published online: 27 February 2026
https://doi.org/10.1075/arli.00013.abd
Abstract
This paper introduces the Hoosiers Arabic Ellipsis Corpus, a novel dataset targeting syntactic ellipsis in Arabic. Addressing the significant challenge ellipsis poses to natural language processing (NLP) technologies, the Hoosiers Arabic Ellipsis Corpus leverages the Corpus Query Language (CQL) to extract ellipsis instances from the ArTenTen corpus. To the best of our knowledge, this is the first comprehensive dataset of its kind, filling a critical gap in resources for Arabic, which remains under-resourced in NLP studies. We evaluate the corpus through three computational experiments: detecting sentences with ellipsis, predicting the location of elided elements, and generating missing words using state-of-the-art large language models (LLMs). Results demonstrate that few-shot prompting significantly improves LLM performance, with Gemini 2.5 Pro achieving the highest accuracy in ellipsis detection (95.6%). However, LLMs struggled with precisely locating and reconstructing elided elements. The findings highlight the challenges of ellipsis processing in Arabic and point to the need for larger, more balanced datasets and further refinement of NLP models to handle structural inference.
Keywords: ellipsis, large language models, Arabic NLP, computational syntax
Article outline
- 1. Introduction
- 2. Ellipsis in Arabic
- 2.1 NP Ellipsis
- 2.2 VP Ellipsis
- 2.3 Gapping
- 2.4 Stripping
- 2.5 Sluicing
- 2.6 Fragment answer
- 3. Related works
- 4. Rationale
- 5. Corpus creation and annotation
- 5.1 Corpus description
- 6. Experiments and results
- 7. Limitations of the study
- 8. Conclusion
- Note
References
References (33)
Abdelali, Ahmed, Darwish, Kareem, Durrani, Nadir, & Mubarak, Hamdy (2016). Farasa: A fast and furious segmenter for Arabic. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations (pp. 11–16). Association for Computational Linguistics.
AbuOdeh, Muhammed, Phan, Long, Elshabrawy, Ahmed, & Habash, Nizar (2024). Palmyra 3.0: A User-Friendly Cloud-Based Platform for Morphology and Dependency Syntax Annotation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 12585–12591).
Al Mana, Suaad (1986). Poetic necessity from the perspective of the medieval Arab critics and rhetoricians (Al-Darurah, Sibawayhi, poetic language, Qudamah Ibn Ja’far) [Doctoral dissertation, University of Michigan].
Algryani, Ali (2019). The syntax of sluicing in Omani Arabic. International Journal of English Linguistics, 9(6), 337–346.
Alhalalmeh, Bahjat (2020). Nominal ellipsis in Jordanian Arabic Advertisements. Journal of the Faculty of Arts and Humanities, Suez Canal University, 3(32), 1–31.
Al-Horais, Nasser (2000). Arabic negation marker (Laysa) with bare argument ellipsis and its association with information structure. Argument, 20011, 2006–2008.
Al-Khawalda, Mohammad (2002). Ellipsis in Arabic and English. International Journal of Arabic-English Studies, 3(1), 183–199.
Al-Liheibi, Fahd (1999). Aspects of sentence analysis in the Arabic linguistic tradition, with particular reference to ellipsis [Doctoral dissertation, Durham University].
Antoun, Wissam, Baly, Fady, & Hajj, Hazem (2020). AraBERT: Transformer-based model for Arabic language understanding. In Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, & Hamdy Mubarak (Eds.), Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 9–15). Marseille, France: European Language Resources Association.
Arts, Tressy, Belinkov, Yonatan, Habash, Nizar, Kilgarriff, Adam, & Suchomel, Vit (2014). arTenTen: Arabic Corpus and Word Sketches. Journal of King Saud University-Computer and Information Sciences, 26(4), 357–371.
Assiri, Ahmed (2021). Gapping in Modern Standard Arabic: An Agree-Based Analysis. Umm Al-Qura University Journal for Languages & Literature, (27).
Bouzid, Saoussen, & Zribi, Chiraz (2021). Efficient learning approach for pronominal anaphora and ellipsis identification and resolution in Arabic texts. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 291, 3335–3348.
Cavar, Damir, Mompelat, Ludovic, & Abdo, Muhammad (2024). The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications. In Michael Hahn, Alexey Sorokin, Ritesh Kumar, et al. (Eds.), Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (pp. 46–54). Association for Computational Linguistics.
Elshabrawy, Ahmed, AbuOdeh, Muhammed, Inoue, Go, & Habash, Nizar (2023). CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic. In Proceedings of ArabicNLP 2023 (pp. 170–180).
Fatani, Afnan (2010). Al-Zarkashī on Ellipsis in the Qur’ān: A Translation & Critical Synopsis. Journal of Arabic Linguistics, 81.
Green, Spence, & Manning, Christopher (2010). Better Arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010) (pp. 394–402).
Habash, Nizar, & Roth, Ryan (2009). CATiB: The Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (pp. 221–224). Association for Computational Linguistics.
Haddar, Kais, & Hamadou, Abdelmajid (1998). An Ellipsis Detection Method Based on a Clause Parser for Arabic Language. In Proceedings of the Eleventh International Florida Artificial Intelligence Research Society Conference (pp. 270–274).
Hawkins, Roger (2012). Knowledge of English verb phrase ellipsis by speakers of Arabic and Chinese. Linguistic Approaches to Bilingualism, 2(4), 404–438.
Homerin, Thomas (2007). [Review of the book The Diwan of Ibn al-Farid: Readings of its Text Throughout History, by G. Scattolin]. Mamlūk Studies Review, 11(1), [243].
Johnson, Kyle (2001). What VP Ellipsis Can Do, and What it Can’t, But Not Why. The Handbook of Contemporary Syntactic Theory, 439–479. Portico.
Kilgarriff, Adam, Rychly, Pavel, Smrz, Pavel, & Tugwell, David (2008). The Sketch Engine. In P. Fontenelle (Ed.), Practical Lexicography: A Reader (pp. 297–306). Oxford University Press.
Maamouri, Mohamed, Bies, Ann, Buckwalter, Tim, & Mekki, Wigdan (2004). The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. In NEMLAR Conference on Arabic Language Resources and Tools (Vol. 271, pp. 466–467).
Manning, Christopher, Surdeanu, Mihai, Bauer, John, Finkel, Jenny, Bethard, Steven, & McClosky, David (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60).
Mansour, Mohamed (2007). Semantic constraints on licensing VP-ellipsis and VP-gapping in Arabic. Bulletin of the Faculty of Arts, Assiut University.
Ellipsis: A survey of analytical approaches. In Jeroen van Craenenbroeck & Tanja Temmerman (Eds.), The Oxford Handbook of Ellipsis (2018). Oxford Handbooks. Accessed 2 June 2024.
Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey, & Svartvik, Jan (1985). A comprehensive grammar of the English language. Longman.
