Article published in: International Journal of Corpus Linguistics
Vol. 24:4 (2019), pp. 522–535
Short paper
Phonological CorpusTools
Software for doing phonological analysis on transcribed corpora
Published online: 1 November 2019
https://doi.org/10.1075/ijcl.18009.hal
Abstract
Phonological analysis increasingly involves the quantification of various lexical and/or usage statistics, such as
phonotactic probabilities, the functional loads of various phonemic contrasts, or neighbourhood densities. This paper presents
Phonological CorpusTools, a free, open-source software package for conducting such phonological analyses on
transcribed corpora. The motivations for creating the software are given, along with an overview of the structure of the program,
its analysis algorithms, and its applications within phonology.
Keywords: phonology, frequency, software, functional load, predictability of distribution
Article outline
- 1. Introduction
- 2. Types of corpora
- 3. Corpora, transcriptions, and features
- 3.1 Transcriptions and inventories
- 3.2 Using tiers
- 4. Analyses and output
- 5. Applications and access
- Acknowledgements
- Notes
- References
