In: Challenges in Corpus Linguistics: Rethinking corpus compilation and analysis
Edited by Mark Kaunisto and Marco Schilk
[Studies in Corpus Linguistics 118] 2024
pp. 171–172
Subject index
Published online: 19 September 2024
https://doi.org/10.1075/scl.118.si
A
- accessibility 5, 30, 73, 75, 77–79, 81, 89, 91–97, 100, 109–110, 112
- affinity propagation 153
- annotation 11–12, 16, 28–29, 40, 52, 55–58, 61–64, 75, 83, 90, 94–95, 142–143, 145, 148, 153–155, 161, 164–165
  - contact influence 148, 153–154, 161–165
  - compounds 17
  - learner errors 61–62
  - multilingual language use 57–58, 64
  - named entities 35–40, 42
  - part-of-speech (POS) 11–17, 28, 75, 83
B
- balance 11, 22–23, 29, 74, 76, 78, 84, 93
- big data (see very large corpora)
- bilingualism 145, 149, 154, 156, 160–163
- British Library Newspapers database 69–85
- British National Corpus (BNC) 4, 21, 40–43, 45–48, 51, 98, 100, 127, 135
- Brown Corpus 39, 72
C
- calquing 56, 58, 61, 63
- Clean Corpus of Historical American English 51
- codeswitching 56, 58, 158, 160, 165
- collocation 36–37, 48–51, 83, 126, 137
- common nouns 37–38, 40–45, 158
- compilation 2, 4–6, 9–10, 16, 22, 29–30, 56, 59, 63–65, 68–69, 76–79, 85, 93–94, 98–101, 109–110, 131, 135, 137–138
- computational linguistics 39, 51
- Constituent Likelihood Automatic Word-tagging System (CLAWS) 12, 15, 40, 45
- contact-induced semantic shifts 62, 142–165
- copyright 92, 96–100
- corpus exploration 151
- Corpus of Contemporary American English (COCA) 14, 21, 29, 45, 94, 127, 134–135
- Corpus of Global Web-based English (GloWbE) 42–44
- Corpus of Historical American English (COHA) 13–14, 18–23, 29, 94, 127, 134
D
- databases 4–5, 10–11, 23, 29–30, 69, 72, 74–75, 85
- deep learning 27
- diachronic corpora 9, 12, 18, 22, 29, 45, 129, 131, 147
- Digital Humanities 24, 30, 68–73, 75–76, 81, 85, 130
- discourse of deficit 55–56, 61
- dispersion 2, 20–21, 29, 42, 52, 116
E
- Eighteenth Century Collections Online (ECCO) 11, 23–27, 30
- EF Cambridge Open Language Database (EFCAMDAT) 59
F
- false positives 57, 91, 143, 154–155, 157–162, 165
- formality 134, 136, 145, 154, 156, 158, 160
- foreignizing 56, 58, 61, 63
G
- genre 4, 6, 10–11, 22–23, 29, 58, 78–79, 93, 108–110, 115, 121, 126–139
- genre categorization/classification 18, 126–139
- genre evolution 15–16, 22, 132
- God’s truth fallacy 1, 3, 24, 36, 52, 89–90, 93, 99
H
- hapax legomena 25–26
- Helsinki Corpus 9, 22
- historical corpora 9–13, 15–16, 22, 27–30, 69, 131
- historical corpus linguistics 4, 9–30, 36, 69–85
- historical lexis / historical spelling 12, 17, 26–27
- historical text databases 10–11, 23–27, 29–30
- homographs 143, 156, 158–162
I
- information extraction 39
- interlanguage 56–58, 60–63
- International Corpus of English (ICE) 59, 62
- International Corpus of Learner English (ICLE) 59, 62, 127
K
- keyword analysis 74, 126, 137
L
- language contact 56, 143, 145, 148, 155–165
- language variation 10, 142–143, 146, 165
- learner corpus (research) 55–65, 127
- lengthwise analysis 106, 111, 118–120
- lengthwise scaling 118–120
- lexical bias 60–61
- lexical diversity 108, 114, 122–123
- lexical innovation 62–63
- literary corpora 127
- literary studies 130
- loan translations 144
- Lost Generation Corpus 129, 138
- Louvain International Database of Spoken English Interlanguage (LINDSEI) 57
M
- metadata 11, 22–24, 26–27, 29–30, 64–65, 69, 75, 81, 93, 95, 98, 116–117, 146, 156
- multilingualism 56–58, 61–64
- multiple correspondence analysis 120–121
- multi-word units 36–37, 39–41, 45
- Mystery of the Vanishing Reliability 1, 89–90, 94, 99
N
- n-grams 59, 75, 100, 126, 137
- named entities 36–51
- named entity recognition 36, 39, 51, 75, 127
- natural language processing (NLP) 6, 27, 36, 39, 51, 60, 147
- neural networks 27, 147, 152, 164
- neural word embeddings 142–144, 147–148, 152–153
- News on the Web Corpus (NOW) 49, 51
- normalization 26, 107–109, 118, 121
O
- Open American National Corpus 98, 100
- optical character recognition (OCR) 11–12, 24–27, 29, 74, 81–85
- open research 89–102
P
- Parsed Corpus of Early English Correspondence (PCEEC) 15–17, 28
- Philologist’s dilemma 1, 9–11, 28, 89, 92–93, 99
- POS (part-of-speech) category change (see word-class change)
- precision 2, 25, 29, 35, 39, 82–83
- proper nouns / proper names 12, 17, 36–48, 158–160
Q
- Quebec English 142–165
- query building 15–16, 25–27, 29, 51
- quotations 36–37, 58, 97–98, 101
R
- recall 25–26, 29, 82–83
- reference corpora 93, 96, 128
- regional variation 43, 49, 143, 145–146, 155, 164–165
- register 69, 73, 78–81, 84–85, 112, 136
- register analysis 79, 84, 111, 120
- reliability 1, 11, 24, 36, 51, 90, 94, 115, 120, 146
- remediation 69, 71
- replicability 64, 73, 89–92, 99, 101–102
- representativeness 2–3, 10, 22–23, 29, 36, 68, 76–81, 84–85, 90, 92–94, 109, 127
- resampling 93, 120–121, 123
S
- sampling 2, 9–11, 18, 20–22, 24–27, 29, 59, 65, 69, 76–78, 80, 84, 92–93, 121–122, 145
- semantic change 144–145, 147–148, 150–151, 165 (see also contact-induced semantic shifts)
- Spanish Learner Language Oral Corpora (SPLLOC) 58
- special corpora 85, 127
- stylistics 130, 137
- (near-)synonyms 41, 45–50, 156
- syntactic parsing 16
T
- task instruction / task effects 55–56, 59–61, 64
- text categorization (see text type)
- text length 106–123
- text type 46–47, 127–135
- text sampling (see sampling)
- transparency 91–94
- Twitter 110, 117, 146, 149
- Twitter corpora 101, 121, 142–165
- type-token ratio 122
V
- vector space models (VSMs) 147–149
- very large corpora 9–11, 18, 27–28, 30, 93, 112, 120
- Vocabulary-Based Discourse Unit 117
W
- word-class change 12–15, 28
- writing prompt 55–56, 59, 64
