Machine learning in diachronic corpus phonology: mining verse data to infer trajectories in English phonotactics
Abstract
Machine learning is a powerful method for working with large data sets such as diachronic corpora. However, unlike standard techniques from inferential statistics such as regression modeling, machine learning is still less commonly used among phonological corpus linguists. This paper discusses three machine learning techniques (k-nearest neighbors classifiers, Naïve Bayes classifiers, and artificial neural networks) and shows how they can be applied to diachronic corpus data to address specific phonological questions. To illustrate the methodology, I investigate Middle English schwa deletion, and when and how it may have triggered the reduction of final /mb/ clusters in English.
This is an Open Access journal. All material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence, unless otherwise stated.