Article published in: Recent Advances in Automatic Readability Assessment and Text Simplification
Edited by Thomas François and Delphine Bernhard
[ITL - International Journal of Applied Linguistics 165:2] 2014
pp. 194–222
Readability assessment for text simplification
From analysing documents to identifying sentential simplifications
Published online: 23 January 2015
https://doi.org/10.1075/itl.165.2.04vaj
Readability assessment can play a role in the evaluation of a simplification algorithm as well as in the identification of what to simplify. While some previous research used traditional readability formulas to evaluate text simplification, there is little research into the utility of readability assessment for identifying and analyzing sentence-level targets for text simplification. We explore this aspect in our paper by first constructing a readability model that is generalizable across corpora and across genres and then adapting this model to make sentence-level readability judgments.
First, we report on experiments establishing that the readability model, which integrates a broad range of linguistic features, works well at the document level, performing on par with the best systems on a standard test corpus. Next, the model is confirmed to be transferable to different text genres. Moving from documents to sentences, we investigate the model’s ability to correctly identify the difference in reading level between a sentence and its human-simplified version. We conclude that readability models can be useful for identifying simplification targets for human writers and for evaluating machine-generated simplifications.
Cited by (18)
Elysia, Aurellia Gita & Yulyani Arifin
Nanyonga, Aziida, Hassan Wasswa, Keith Joiner, Ugur Turhan & Graham Wild
Karaca, Mehmet Fatih & Münir Şahin
Kostadimas, Dimitris, Katia Lida Kermanidis & Theodore Andronikos
Kong, Nancy, Uwe Dulleck, Adam B. Jaffe, Shupeng Sun & Sowmya Vajjala
Li, Zhenzhen, Han Ding & Shaohong Zhang
Jena, Om Prakash, Alok Ranjan Tripathy, Sudhansu Sekhar Patra, Manas Ranjan Chowdhury & Rajesh Kumar Sahoo
Sharoff, Serge Aleksandrovich
Xu, Rui, Wenjing Pan, Canhua Chen, Xiaoyin Chen, Shilin Lin & Xia Li
Andreessen, Lena M., Peter Gerjets, Detmar Meurers & Thorsten O. Zander
Alva-Manchego, Fernando, Carolina Scarton & Lucia Specia
Brysbaert, Marc
Berger, Cynthia, Eric Friginal & Jennifer Roberts
Hartmann, Nathan, Livia Cucatto, Danielle Brants & Sandra Aluísio
Vágvölgyi, Réka, Andra Coldea, Thomas Dresler, Josef Schrader & Hans-Christoph Nuerk
De Ruvo, Giuseppe & Antonella Santone
Collins-Thompson, Kevyn
2014. Computational assessment of text readability. ITL - International Journal of Applied Linguistics 165:2, pp. 97 ff.
This list is based on CrossRef data as of 30 March 2026. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
