In: Patterns of Text: In honour of Michael Hoey
Edited by Mike Scott and Geoff Thompson †
[Not in series 107] 2001
pp. 213–237
Lexical segments in text
Published online: 27 February 2001
https://doi.org/10.1075/z.107.11ber
Editors’ introduction
Berber Sardinha’s paper deals with the problem of text segmentation, which connects at several points with
the problems addressed by the other contributors to this volume. Like Scott, Sinclair and Coulthard, Berber Sardinha is
interested in understanding how the computer understands text, or rather how it fails to handle the
complexities of text satisfactorily. Like the other contributors who have been influenced by Hoey’s work on
text patterning, he is also concerned with the problem of identifying the stages which a text goes through as it
moves from one component of a pattern to the next.
The problem is not trivial. Computer methods for processing text have already led to an explosion of text
retrieval methods which anyone who uses Internet search engines knows, needs and curses. That is, a fairly simple
technology is there to help us find all instances of a desired word or phrase in a database, or in the whole Internet, or
on a given computer, and this technology can be put to two uses: text retrieval, to find the text
one is searching for, and pedagogy, to learn about word collocation and colligation. But as
Sinclair’s paper shows, such a technology may be efficient in its own terms but disconnected from the way
human users relate to the world and to each other. Thus, a very large number of irrelevant hits are typically found,
which usually hinder text retrieval as much as they help it and may also obscure and frustrate collocational inference.
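
The simple retrieval described above, and the irrelevant hits it tends to produce, can be illustrated with a minimal sketch. The function name `keyword_in_context`, the `width` parameter and the three sample sentences are assumptions made purely for illustration; none of them comes from the papers in this volume, and the sketch is not any contributor's method.

```python
import re

def keyword_in_context(texts, phrase, width=30):
    """Return (text_index, left, match, right) for every occurrence of phrase.

    Hypothetical helper: a plain whole-word, case-insensitive search
    that knows nothing about what each text is about.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for index, text in enumerate(texts):
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            hits.append((index, left, match.group(0), right))
    return hits

# Invented mini "database": only the first text is about finance.
texts = [
    "The bank raised its interest rates again.",
    "We had a picnic on the river bank.",
    "Do not bank on the weather staying fine.",
]

hits = keyword_in_context(texts, "bank")
for index, left, word, right in hits:
    print(f"{index}: {left:>30} [{word}] {right}")
```

Every text is returned, although a reader looking for material on finance would want only the first. The search answers “which texts contain word x?” and nothing more.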
It is likely that these problems will be best tackled by refinements to the techniques used, refinements which
are very likely to involve questions central to the rest of this volume, concerning the aboutness of individual text
segments, and the relations between text segments or elements. Thus, for information retrieval and language learning
we certainly need to go beyond “which texts contain word x or phrase y?” and move
towards “which texts are about z?” and “which segments of which texts are about p and not
q?” and “where does the text change from explaining r to evaluating it?”. It is probable that as
we learn to answer questions such as these, we shall be that much nearer to truly useful text retrieval.
Berber Sardinha’s paper proposes a detailed and ingenious method for getting at the boundaries
within a text, identifying its segments in the sense of changes in aboutness.
As with the other contributors using computer methods, the problems are as yet greater than the solutions
encountered. It is therefore important to view the method being proposed in the right light: the purpose here, as in so
much else, is to model the world; it is through insights arising from model-making, model application and
model-testing that progress is eventually made.
