Article published in: Analyse Lexicale et Syntaxique: Le système INTEX
Edited by Cédrick Fairon
[Lingvisticæ Investigationes 22:1/2] 1999
pp. 291–307
Normalisation des textes anglais
Article language: French
Published online: 3 October 2000
https://doi.org/10.1075/li.22.1-2.18zel
This study deals with the pre-processing of English texts, which is performed in three steps: segmenting the text into textual units (sentences), rewriting contracted forms into their standard forms, and tagging unambiguous compounds. We describe two of these three steps here: text segmentation and the rewriting of contracted forms. Segmentation is carried out by the transducer Sentence, and contracted forms are rewritten by applying the transducer Normalisation. We describe in detail the steps involved in developing both transducers.
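As a rough illustration of the two steps described, the sketch below splits a text into sentences and then rewrites a few contracted forms. It is not the INTEX transducers Sentence or Normalisation: it uses plain regular expressions, and the names `ABBREVIATIONS`, `CONTRACTIONS`, `split_sentences`, `expand_contractions` and `normalise` are hypothetical, as are the small word lists, which cover only a handful of unambiguous cases.

```python
import re

# Hypothetical abbreviation list: a period after one of these tokens
# is not treated as a sentence boundary.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e."}

# Hypothetical, deliberately tiny table of unambiguous contracted forms.
# Ambiguous forms such as "it's" or "he'd" are left out on purpose.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "doesn't": "does not",
    "isn't": "is not",
    "aren't": "are not",
    "i'm": "I am",
    "we'll": "we will",
    "they're": "they are",
}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(form) for form in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace and a capital letter."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+(?=[A-Z])", text):
        candidate = text[start:match.end()].strip()
        # Skip the boundary if the candidate ends in a known abbreviation.
        if candidate.split()[-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def expand_contractions(sentence):
    """Rewrite listed contracted forms, keeping an initial capital letter."""
    def rewrite(match):
        form = match.group(0)
        expanded = CONTRACTIONS[form.lower()]
        if form[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return _CONTRACTION_RE.sub(rewrite, sentence)


def normalise(text):
    """Segment first, then rewrite contractions, in the order the abstract gives."""
    return [expand_contractions(s) for s in split_sentences(text)]


if __name__ == "__main__":
    for sentence in normalise("Dr. Smith can't come. We'll start without him! Don't wait."):
        print(sentence)
```

The sketch only handles the straight apostrophe (') and would miss typographic apostrophes, sentence boundaries before lowercase words, and any contraction whose expansion depends on context, which is the kind of case a hand-built transducer can be designed to address.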
