In: Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 137–144
The perils of big data
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.09emb
Abstract
The use of large amounts of data, and of the technologies to process them, is characteristic of modern research. However, such practices risk misleading the researcher. While much could be said on this topic, here we briefly offer a cautionary tale to others, based on our direct experiences.
Keywords: big data, research practices, statistical packages, Romanian, dialects, Crișana, shibboleths
Article outline
- 1. Motivation
- 2. Some background
- 3. The muddle in the middle
- 4. Faith and reason
- 5. Data, and more data
- 6. In short
- Acknowledgements
- References
