In: Directions in Empirical Literary Studies: In honor of Willie van Peer
Edited by Sonia Zyngier, Marisa Bortolussi, Anna Chesnokova and Jan Auracher
[Linguistic Approaches to Literature 5] 2008
pp. 175–191
Computationally Discriminating Literary from Non-Literary Texts
Published online: 15 May 2008
https://doi.org/10.1075/lal.5.16lou
Three computational linguistic methods are presented to discriminate literary from non-literary texts. In the first study, hierarchical clustering of results obtained from Latent Semantic Analysis grouped literary texts apart from non-literary texts. The second study used the frequencies of bigrams shared across texts, resulting in a 100% correct classification of literary versus non-literary texts. The third study used unigrams, yielding a 94% correct classification into literary versus non-literary texts. The final two studies, using a larger sample of texts, showed that the high classification performance cannot be attributed to specific texts. These findings provide evidence that literature can be distinguished from non-literature with high accuracy using relatively simple computational linguistic techniques.
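To make the shared-bigram idea concrete, here is a minimal sketch in Python. The abstract does not say how bigram frequencies were compared or how texts were assigned to a class, so everything below is an assumption made for illustration: whitespace tokenization, a multiset intersection of bigram counts as the similarity score, and assignment to the class with the highest mean score. The function names (`bigrams`, `shared_bigram_score`, `classify`) are hypothetical and do not come from the chapter.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs, using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_score(text_a, text_b):
    """Number of bigram occurrences the two texts share (multiset intersection)."""
    return sum((bigrams(text_a) & bigrams(text_b)).values())


def classify(text, labeled_texts):
    """Return the label whose reference texts share the most bigrams on average.

    labeled_texts is a list of (label, reference_text) pairs. This averaging
    rule is an illustrative assumption, not the procedure from the chapter.
    """
    totals, counts = {}, {}
    for label, reference in labeled_texts:
        totals[label] = totals.get(label, 0) + shared_bigram_score(text, reference)
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda label: totals[label] / counts[label])


if __name__ == "__main__":
    # Toy reference texts, invented for this example only.
    references = [
        ("literary", "the moon rose over the silent sea"),
        ("non-literary", "the committee approved the annual budget report"),
    ]
    print(classify("over the silent sea the moon rose", references))
```

A real replication would need a proper tokenizer, length normalization, and held-out evaluation; the sketch only shows the shape of a bigram-overlap comparison.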
