Citation: Macklin-Cordes, J. L. & E. R. Round. 2016. Reflections of linguistic history in quantitative phonotactics. Paper presented at the Australian Linguistic Society Annual Conference, Monash University, Caulfield, Australia, 7 December 2016. DOI: https://dx.doi.org/10.6084/m9.figshare.4299365
Abstract:
Advanced quantitative methods are at the cutting edge of historical linguistics; however, these methods ideally require many hundreds of data points per language. To generate reliable inferences at ever greater time depths, there is a need for typological datasets which are not only broader in coverage but also contain a deeper store of information. We explore one avenue by extracting large numbers of high-definition phonotactic ‘traits’ per language. We show that these traits contain phylogenetic signal, thus demonstrating an important path towards the high-powered methods of the near future.
Methodology: Languages may be compared in terms of which two-segment sequences (biphones) they permit. Moreover, such biphones have distinct lexical frequencies, which can also be compared. We examined whether such data contain information about family-tree structure, i.e., phylogenetic signal. We used two standard statistics: D [1] tests coarse-grained biphone ‘permissibility’ data, and K [2] tests higher-definition transition probabilities. We examined two subgroups of the Australian Pama-Nyungan family: 10 languages of Ngumpin-Yapa [3] and 7 of Yolngu [4], represented by phonemically standardised lexicons from the CHIRILA database [5]. Phylogenetic signal was calculated with reference to phylogenies from C. Bowern (updated from [6]). Australian languages present a tough challenge, since phonotactically they are notoriously uniform [7–9]. Moreover, Ngumpin-Yapa has some of the world’s highest borrowing rates [10–11]. Thus we hypothesized that the coarse-grained D test would fail. The key question is whether the high-definition K test succeeds.
Results: D attempts to reject two null hypotheses: that the traits’ distributions are (A) too uniform to reveal structure present in the reference tree; and (B) random. We extracted 184 (Ngumpin-Yapa) and 164 (Yolngu) traits per language.
We were surprised to reject both hypotheses for Yolngu (Stouffer’s Z>100, p=0.00): thus, even binary permissibility data revealed some phylogenetic signal. For N-Y only the second null hypothesis could be reject...
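As an illustration of the two kinds of trait described under Methodology, the following minimal sketch derives binary biphone permissibility and forward transition probabilities from a toy lexicon. It is not the authors' pipeline: the function name `biphone_traits`, the toy words, the choice of forward (rather than backward) probabilities, and the omission of word-boundary symbols are all assumptions made for illustration.

```python
from collections import Counter

def biphone_traits(lexicon):
    """Hypothetical helper: lexicon is a list of words, each a list of phoneme symbols.

    Returns (permitted, transition):
      permitted  - set of biphones attested at least once (binary permissibility)
      transition - dict mapping biphone (a, b) to the share of a's
                   word-internal continuations that are b
    """
    biphone_counts = Counter()
    first_counts = Counter()
    for word in lexicon:
        # Slide a two-segment window over the word.
        for a, b in zip(word, word[1:]):
            biphone_counts[(a, b)] += 1
            first_counts[a] += 1
    permitted = set(biphone_counts)
    transition = {
        (a, b): n / first_counts[a] for (a, b), n in biphone_counts.items()
    }
    return permitted, transition

# Toy data, invented for illustration only.
toy_lexicon = [["p", "a", "t", "a"], ["t", "a", "p"], ["p", "a", "p", "a"]]
permitted, transition = biphone_traits(toy_lexicon)
```

In this framing, each biphone yields one binary trait (the kind of data a D-style test takes) and one continuous trait (the kind a K-style test takes) per language, so the trait count grows with the number of biphones compared across the languages.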