In: Challenges in Corpus Linguistics: Rethinking corpus compilation and analysis
Edited by Mark Kaunisto and Marco Schilk
[Studies in Corpus Linguistics 118] 2024
pp. 171–172

Subject index

Published online: 19 September 2024

https://doi.org/10.1075/scl.118.si

A

accessibility 5, 30, 73, 75, 77–79, 81, 89, 91–97, 100, 109–110, 112

affinity propagation 153

annotation 11–12, 16, 28–29, 40, 52, 55–58, 61–64, 75, 83, 90, 94–95, 142–143, 145, 148, 153–155, 161, 164–165
- contact influence 148, 153–154, 161–165
- compounds 17
- learner errors 61–62
- multilingual language use 57–58, 64
- named entities 35–40, 42
- part-of-speech (POS) 11–17, 28, 75, 83

B

balance 11, 22–23, 29, 74, 76, 78, 84, 93

big-data (see very large corpora)

bilingualism 145, 149, 154, 156, 160–163

British Library Newspapers database 69–85

British National Corpus (BNC) 4, 21, 40–43, 45–48, 51, 98, 100, 127, 135

Brown Corpus 39, 72

C

calquing 56, 58, 61, 63

Clean Corpus of Historical American English 51

codeswitching 56, 58, 158, 160, 165

collocation 36–37, 48–51, 83, 126, 137

common nouns 37–38, 40–45, 158

compilation 2, 4–6, 9–10, 16, 22, 29–30, 56, 59, 63–65, 68–69, 76–79, 85, 93–94, 98–101, 109–110, 131, 135, 137–138

computational linguistics 39, 51

Constituent Likelihood Automatic Word-tagging System (CLAWS) 12, 15, 40, 45

contact-induced semantic shifts 62, 142–165

copyright 92, 96–100

corpus exploration 151

Corpus of Contemporary American English (COCA) 14, 21, 29, 45, 94, 127, 134–135

Corpus of Global Web-based English (GloWbE) 42–44

Corpus of Historical American English (COHA) 13–14, 18–23, 29, 94, 127, 134

D

databases 4–5, 10–11, 23, 29–30, 69, 72, 74–75, 85

deep learning 27

diachronic corpora 9, 12, 18, 22, 29, 45, 129, 131, 147

Digital Humanities 24, 30, 68–73, 75–76, 81, 85, 130

discourse of deficit 55–56, 61

dispersion 2, 20–21, 29, 42, 52, 116

E

Eighteenth Century Collections Online (ECCO) 11, 23–27, 30

EF Cambridge Open Language Database (EFCAMDAT) 59

F

false positives 57, 91, 143, 154–155, 157–162, 165

formality 134, 136, 145, 154, 156, 158, 160

foreignizing 56, 58, 61, 63

G

genre 4, 6, 10–11, 22–23, 29, 58, 78–79, 93, 108–110, 115, 121, 126–139

genre categorization/classification 18, 126–139

genre evolution 15–16, 22, 132

God’s truth fallacy 1, 3, 24, 36, 52, 89–90, 93, 99

H

hapax legomena 25–26

Helsinki Corpus 9, 22

historical corpora 9–13, 15–16, 22, 27–30, 69, 131

historical corpus linguistics 4, 9–30, 36, 69–85

historical lexis / historical spelling 12, 17, 26–27

historical text databases 10–11, 23–27, 29–30

homographs 143, 156, 158–162

I

information extraction 39

interlanguage 56–58, 60–63

International Corpus of English (ICE) 59, 62

International Corpus of Learner English (ICLE) 59, 62, 127

K

keyword analysis 74, 126, 137

L

language contact 56, 143, 145, 148, 155–165

language variation 10, 142–143, 146, 165

learner corpus (research) 55–65, 127

lengthwise analysis 106, 111, 118–120

lengthwise scaling 118–120

lexical bias 60–61

lexical diversity 108, 114, 122–123

lexical innovation 62–63

literary corpora 127

literary studies 130

loan translations 144

Lost Generation Corpus 129, 138

Louvain International Database of Spoken English Interlanguage (LINDSEI) 57

M

metadata 11, 22–24, 26–27, 29–30, 64–65, 69, 75, 81, 93, 95, 98, 116–117, 146, 156

multilingualism 56–58, 61–64

multiple correspondence analysis 120–121

multi-word units 36–37, 39–41, 45

Mystery of the Vanishing Reliability 1, 89–90, 94, 99

N

n-grams 59, 75, 100, 126, 137

named entities 36–51

named entity recognition 36, 39, 51, 75, 127

natural language processing (NLP) 6, 27, 36, 39, 51, 60, 147

neural networks 27, 147, 152, 164

neural word embeddings 142–144, 147–148, 152–153

News on the Web Corpus (NOW) 49, 51

normalization 26, 107–109, 118, 121

O

Open American National Corpus 98, 100

optical character recognition (OCR) 11–12, 24–27, 29, 74, 81–85

open research 89–102

P

Parsed Corpus of Early English Correspondence (PCEEC) 15–17, 28

Philologist’s dilemma 1, 9–11, 28, 89, 92–93, 99

POS (part-of-speech) category change (see word-class change)

precision 2, 25, 29, 35, 39, 82–83

proper nouns / proper names 12, 17, 36–48, 158–160

Q

Quebec English 142–165

query building 15–16, 25–27, 29, 51

quotations 36–37, 58, 97–98, 101

R

recall 25–26, 29, 82–83

reference corpora 93, 96, 128

regional variation 43, 49, 143, 145–146, 155, 164–165

register 69, 73, 78–81, 84–85, 112, 136

register analysis 79, 84, 111, 120

reliability 1, 11, 24, 36, 51, 90, 94, 115, 120, 146

remediation 69, 71

replicability 64, 73, 89–92, 99, 101–102

representativeness 2–3, 10, 22–23, 29, 36, 68, 76–81, 84–85, 90, 92–94, 109, 127

resampling 93, 120–121, 123

S

sampling 2, 9–11, 18, 20–22, 24–27, 29, 59, 65, 69, 76–78, 80, 84, 92–93, 121–122, 145

semantic change 144–145, 147–148, 150–151, 165 (see also contact-induced semantic shifts)

Spanish Learner Language Oral Corpora (SPLLOC) 58

special corpora 85, 127

stylistics 130, 137

(near-)synonyms 41, 45–50, 156

syntactic parsing 16

T

task instruction / task effects 55–56, 59–61, 64

text categorization (see text type)

text length 106–123

text type 46–47, 127–135

text sampling (see sampling)

transparency 91–94

Twitter 110, 117, 146, 149
- Twitter corpora 101, 121, 142–165

type-token ratio 122

V

vector space models (VSMs) 147–149

very large corpora 9–11, 18, 27–28, 30, 93, 112, 120

Vocabulary-Based Discourse Unit 117

W

word-class change 12–15, 28

writing prompt 55–56, 59, 64