UvA-DARE (Digital Academic Repository)

The phonetics of NCh in Tumbuka and its implications for diachronic change

The phonetic motivation for the synchronic and diachronic development of post-nasal voicing (*NT > ND) is well understood. Less well understood is the phonetic motivation for other common synchronic and diachronic developments from *NT, widely attested in Bantu languages, such as aspiration of the voiceless plosive and subsequent loss of either the nasal or the plosive portion of the sequence: *NT > NTh > Th, Nɦ. In this paper we first review the existing (scarce) phonetic literature on these developments. Then we present the results of a phonetic study of NC sequences in Tumbuka, a Bantu language where NT > NTh, as a way of exploring how the acoustic and perceptual properties of NTh sequences could motivate the development, found in other Bantu languages, of Th or Nɦ from NTh. We conclude by proposing that a perceptual cue approach, rather than a gestural or other articulatory approach, provides the most persuasive phonetic account, not only of the motivation for post-nasal aspiration of voiceless stops, but also of the instability of nasals and of voiceless stops in the NTh context, which leads to other sound changes.


Introduction
As Kerremans' (1980) thorough survey shows, a wide range of reflexes of Proto-Bantu *NT are found in modern Bantu languages. While voicing of the post-nasal obstruent (*NT > ND) might be the best-known development (see, e.g., Pater 1999), it is also extremely common for the post-nasal obstruent to undergo aspiration: *NT > NTh (see Hamann & Downing 2017 for detailed discussion).
In a number of Bantu languages, we find other developments from *NT: either the nasal or the stop portion of the *NT sequence is deleted, as illustrated in (1).
(2) *NT > N̥T > N̥Th > Nɦ / Th

Note first that neither *NT > Nɦ nor *NT > Th is considered to result from a one-step change. Rather, these outcomes are taken to have developed through a phonologisation 'seriation' (see Hyman 1976, 2013, and Barnes 2006 for detailed discussion of the role of phonologisation in sound change). Note further that these proposals (Givón 1974; Hinnebusch 1973, 1975; Kerremans 1980) crucially assume that the nasal in the intermediate N̥Th sequence is voiceless, in order to motivate the further developments to Nɦ or Th.
In this paper, we evaluate the plausibility of this historical scenario, based on a careful phonetic study of NC sequences in Tumbuka, a Bantu language (N.21, Malawi) where NT > NTh (both diachronically and synchronically). After surveying previous work on the topic in section 2, we go on to present the results of our phonetic study in section 3 and, in section 4, we discuss how the acoustic and perceptual properties of NTh sequences could motivate the development, found in other Bantu languages, of Th or Nɦ from NTh sequences.

Background to our study
While the basic path of diachronic development of the modern reflexes of *NT is uncontroversial (it could be a Historical Phonology 101 problem, if Table 1 included cognates from languages where *NT > NT), the motivation for each step in the seriation is more controversial.
It is commonly agreed that natural sound changes should have a phonetic basis (see, e.g., Barnes 2006, Kiparsky 2003). Therefore, each step in the phonologisation seriation given in (2), from *NT to its modern reflexes, should be grounded in phonetics. In this section, we critically review the phonetic motivations that have been offered in the literature.

Accounting for the phonetic naturalness of *NT > N̥ Th
The first step in the seriation is for a voiceless stop following a nasal to become aspirated. Givón (1974, 110) suggests the following phonetic hypothesis (which he ascribes to John Ohala, via Leon Jacobson, via Tom Hinnebusch); the underlining is found in the original: Natural assimilation would de-voice the nasal before a voiceless homorganic stop [.] Since voiceless stops tend to be universally aspirated […], the presence of a 'breath' effect before the voiceless consonant creates a rather understandable perceptual confusion. This in turn gives rise to a perceptually motivated metathesis, whereby the speaker interprets the voiceless nasal as an aspiration on the following voiceless stop.
In short, aspiration is the result of two natural processes, first, assimilation of the nasal to the voiceless stop, then metathesis (or assimilation) of the stop and the 'breathiness' of the nasal.
This interesting proposal cries out for phonetic investigations, and we do find a few. Huffman & Hinnebusch (1998), Ladefoged & Maddieson (1996), and Maddieson (1991) carried out phonetic studies of Bantu languages with either aspirated post-nasal voiceless stops (NTh) or aspirated nasals (Nɦ). They found that the nasal in these contexts is not (systematically) devoiced. Indeed, Maddieson (1991, 152) concludes that the "diachronic development of aspirated nasals did not involve any stage in which the nasal portion became devoiced." This implies that postnasal aspiration cannot be conditioned by breathiness (or voicelessness) of the nasal.
Maddieson and others following him, like Huffman & Hinnebusch (1998) and Halpert (2010, 2012), argue instead for a gestural alignment account of postnasal aspiration, schematised in Figure 1 (from Hamann & Downing 2017, Figure 3). That is, in the unmarked case, all of the gestures in an NC sequence should be aligned. To avoid a marked voiceless nasal, the open glottis gesture is misaligned with the sequence and is instead left-aligned only with the voiceless stop. As a result, the open glottis gesture (its duration determined by the original sequence) spills over beyond the release of the stop, resulting in aspiration.
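The misalignment account can be made concrete with a little interval arithmetic. The Python sketch below is illustrative only: it assumes (our reading of "its duration determined by the original sequence") that the open-glottis gesture keeps the duration of the whole NT sequence but starts at the stop onset instead of the nasal onset. The class and function names and the example durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A gesture or acoustic phase as a [start, end) span in milliseconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def misaligned_aspiration(nasal_ms: float, stop_ms: float) -> float:
    """Aspiration predicted by a simple gestural misalignment sketch.

    Assumption (hypothetical): the open-glottis gesture has the duration of
    the whole NT sequence but is left-aligned with the stop closure only.
    The part of the open glottis after the stop release is aspiration.
    """
    nasal = Interval(0.0, nasal_ms)
    stop = Interval(nasal.end, nasal.end + stop_ms)
    # Aligned case would span nasal + stop (a voiceless nasal).
    glottis_duration = nasal.duration + stop.duration
    # Misaligned case: same duration, but starting at the stop onset.
    glottis = Interval(stop.start, stop.start + glottis_duration)
    return max(0.0, glottis.end - stop.end)


if __name__ == "__main__":
    # Hypothetical durations: 80 ms nasal, 20 ms oral closure.
    print(misaligned_aspiration(80.0, 20.0))
```

Under this assumption the predicted aspiration always equals the duration of the nasal (80 ms in the usage example), so the sketch makes explicit how strongly the account ties aspiration to the timing of the original sequence.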
As Huffman & Hinnebusch (1998) point out, one problem with this misalignment account is that aspiration requires an additional aspiration gesture. Gestural (mis-)alignment alone is not enough to lead to post-NT aspiration. Hamann & Downing (2017) provide a detailed critique of the gestural alignment account and argue in favor of a perceptual cue account of postnasal aspiration. In brief, they propose that the postnasal NT vs. ND contrast is hard to perceive without enhancement of the voiceless consonant. Aspiration enhances the phonetic cues to the laryngeal quality of NT. (See Hamann & Downing for the complete analysis.) These observations lead to our research question 1: The critiques of the alignment account assume that aspiration in NTh is distinctive and comparable to that found in Th: but is it?

Accounting for the phonetic naturalness of NTh > Th
If the nasal in NTh sequences is not voiceless, though, what other phonetic quality of the nasal could make it susceptible to deletion? We review some possibilities which emerge from previous work on NC sequences in this section.
Note first that there appears to be no articulatory motivation for deleting the nasal. The gestural alignment approach cannot account for the loss of a gesture (e.g., complete loss of the velic opening gesture), as gesture deletion is not a legitimate 'move' in this approach. Cohn & Riehl's (2012) study of the internal duration of the nasal and stop portions of NC sequences shows that the nasal portion is typically quite long, in some languages even longer than a plain nasal. Short duration therefore cannot be what makes the nasal unstable.
Turning to possible perceptual accounts, Stanton's (2016) crosslinguistic survey of the distribution of NC sequences proposes that NC is best perceived intervocalically. It follows from this that nasals would be most susceptible to deletion in utterance-initial or utterance-final position. Could position explain the loss of the nasal? The problem with this potential perceptual motivation for deletion is that the NTh sequences in the Bantu language data often occur intervocalically. In none of the languages with NTh > Th is the nasal deleted only in word-initial position. Rather, deletion is across-the-board (or at least, position in the word is not a factor).
These observations lead to our research question 2: What phonetic quality does the nasal in NTh have (compared to ND or N) that could make it susceptible to deletion?

Accounting for the phonetic naturalness of NTh > Nɦ
Maddieson (1991) also proposes to account for NTh > Nɦ in terms of gestural alignment. This is illustrated in Figure 2 (from Maddieson 1991, 152): in (a), the wide velic gesture is not perfectly right-aligned with the lip closure gesture, while in (b) the two gestures are aligned. That is, the velic opening gesture expands to align itself with the closure gesture, eliminating the non-nasal stop and release portion of the original NC sequence.
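The realignment in Figure 2 can likewise be expressed as interval overlap: the oral-only part of the closure is whatever portion of the lip closure is not overlapped by the velic opening. The sketch below assumes that gestures are simple start/end intervals in milliseconds; the function name and the numbers are hypothetical.

```python
def oral_only_closure(closure_start: float, closure_end: float,
                      velic_start: float, velic_end: float) -> float:
    """Duration (ms) of the lip closure during which the velum is raised.

    Hypothetical helper: the closure portion overlapped by the velic
    opening is nasal; the remainder is the non-nasal (oral) stop portion.
    """
    overlap = max(0.0, min(closure_end, velic_end) - max(closure_start, velic_start))
    return (closure_end - closure_start) - overlap


if __name__ == "__main__":
    # (a) velic opening ends 20 ms before the release: an oral stop portion remains.
    print(oral_only_closure(0, 100, 0, 80))
    # (b) velic opening extended to the release: no oral stop portion is left.
    print(oral_only_closure(0, 100, 0, 100))
```

In (a) the sketch yields a 20 ms oral portion; in (b) it yields zero, which is how the realignment account derives NTh > Nɦ.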
There are a couple of problems with this account of the loss of the non-nasal stop closure. First, it provides no motivation for why only NTh loses its oral closure, and not ND. As we can see in Table 2, *ND sequences in class 9/10 words do not undergo any change in the languages where we found changes in *NT sequences (cf. Table 1). Secondly, Stanton's (2016) survey of phonetic studies of NC shows that the stop portion of ND is usually very short compared to the stop portion of NT, cf. Figure 3.

[Figure 3: excerpt from Stanton (2016), "(2) Internal timing of NCs"]

One would, then, expect the non-nasal portion of the stop closure to be more susceptible to deletion in ND sequences than in NT(h). This is the opposite of what we find.
A final problem with a gestural realignment account like that schematized in Figure 2 is that the 'aspirated nasal' (Nɦ) is often breathy voiced or acts like a depressor consonant. As phonetic studies like Traill & Jackson's (1988) investigation of breathy voiced nasals in Tsonga show, Nɦ (< *NT) is not a simple aspirated nasal (a rare sort of consonant in any case), as implied by the gestural alignment account.
These observations lead to our research question 3: What makes the non-nasal portion of the closure in NTh vulnerable to deletion, especially compared to ND?
In the remainder of this paper, we first present the results of our phonetic study of NC sequences in Tumbuka, and then discuss why a perceptual cue account, rather than gestural realignment, is the most promising approach to account for diachronic reductions of NTh > Nɦ/Th. Our three research questions structure the investigation.
Tumbuka provides an ideal testing ground for possible phonetic support for a phonologisation seriation, because it has a three-way laryngeal contrast in plosives, but (synchronically and diachronically) NT is realized as NTh. Thus, this language has already undergone the initial part of the development in the seriation scenario.

Experimental support
In order to test whether present-day Tumbuka provides phonetic support for any of the three proposed diachronic developments described in section 2, we performed an acoustic study. The three hypothetical diachronic developments and the resulting research questions are summarized in (3).
(3) Diachronic developments and research questions
 a. *NT > NTh. RQ1: Is there acoustic evidence that the aspiration in NTh is due to misalignment and therefore less strong than the one in Th?
 b. NTh > Th. RQ2: Is there acoustic evidence that the nasal in NTh is weaker and therefore more likely to be deleted than in ND?
 c. NTh > Nɦ. RQ3: Is there acoustic evidence that the plosive in NTh is weaker and therefore more likely to be deleted than in ND?
In this section, we first give some background information on the relevant phoneme inventory and co-occurrence restrictions in Tumbuka (section 3.1), describe the acoustic study we performed (section 3.2), and then provide the results of this study (section 3.3) and a discussion thereof with respect to the research questions (section 3.4).

Tumbuka nasal plosive sequences
Example (4) illustrates the three-way laryngeal contrast for the velar plosives.

(4) ku-kama 'to squeeze, to milk'
    ku-kʰala 'to dwell, to sit'
    ku-ganda 'to bump'

T and Th only robustly contrast in root-initial position; elsewhere only T occurs. NT does not occur: an underlying N+T is obligatorily realized as NTh. This is illustrated by the data in (5), where the tense-aspect marker /-ka-/ is realized as aspirated if it follows a nasal prefix, cf. (5b).
(5) a. wa-ka-ndi-tumila 's/he sent me for'
    b. ŋ-kʰa-tumikila 'I was sent for'

Because root-initial position realizes all phonemic contrasts, we consider it a position of prominence, following work like Beckman (1997). Tumbuka is a phrasal stress language, with the correlates of stress being lengthening of the phrase-penult syllable and association of a High tone with the penult syllable (Downing 2006, to appear). The penult is, then, also considered a position of prominence when it realizes these stress correlates. (Vowel length is not contrastive in Tumbuka.) As work like Hubbard (1994) has shown for other Bantu languages, both root-initial consonants and consonants in the onset of syllables with phrasal stress are commonly longer in duration than other consonants in the word, as one would expect if they are in prominent positions.
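The obligatory post-nasal aspiration illustrated in (5) can be sketched as a rewrite over a list of segments. This is a toy model: the nasal and voiceless plosive sets are a simplified, hypothetical inventory rather than a full description of Tumbuka, and each character is treated as one segment.

```python
# Simplified, hypothetical segment classes (one character = one segment).
NASALS = {"m", "n", "ɲ", "ŋ"}
VOICELESS_PLOSIVES = {"p", "t", "k"}


def postnasal_aspiration(segments):
    """Aspirate a voiceless plosive that directly follows a nasal (N+T -> NTh)."""
    out = []
    for i, seg in enumerate(segments):
        after_nasal = i > 0 and segments[i - 1] in NASALS
        if seg in VOICELESS_PLOSIVES and after_nasal:
            out.append(seg + "ʰ")
        else:
            out.append(seg)
    return out


if __name__ == "__main__":
    # Underlying forms from (5), with morpheme boundaries removed.
    for form in ("wakanditumila", "ŋkatumikila"):
        print(form, "->", "".join(postnasal_aspiration(list(form))))
```

Applied to the forms in (5), the sketch leaves (5a) unchanged, since its only nasal precedes voiced d, and maps (5b) to ŋkʰatumikila.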

Participants and stimuli
We recorded 7 native speakers of Tumbuka (3 male, 4 female) reading sentences that contained D, T, Th, ND, NTh and N at the beginning of prominent syllables (mostly stressed, often root-initial). Each target segment or sequence was preceded and followed by a vowel. An example sentence is given in (6); the prominent syllable is marked with ˈ.

(6) [ŋkʰatunga ˈmpʰáasa] 'I sewed the mats'
The set contained 108 sentences, and the speakers were asked to produce at least four repetitions of each. The recordings were made in Malawi (in Zomba and Mzuzu) under fieldwork conditions; several tokens therefore had to be excluded because of background noise.
In the following, we present the analysis of the data from four of the seven speakers.

What we measured and why
We used the harmonics-to-noise ratio (HNR) to evaluate the degree of voicing/aspiration of the nasals in our data. HNR expresses, in dB, the ratio between the periodic (voiced) and the aperiodic (noise) energy in the signal. A vowel has a very high HNR, with values above 20 dB, because it is almost purely periodic, while a voiced fricative has an HNR of around 5 dB, since it is both periodic and noisy. A voiceless fricative has negative HNR values of around -2 dB because it consists only of aperiodic noise. Nasals are voiced and not noisy, and therefore have high, vowel-like HNR values (see Boersma 1993). Preaspiration in nasal-plosive sequences is expected to lower the HNR of the nasal considerably because of its frication noise. Since HNR combines voicing and noisiness, we preferred it to the more common measure of periodicity, which only considers voicing.
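In Boersma's (1993) formulation, HNR is derived from the height r of the normalised autocorrelation peak, read as the fraction of the signal energy that is periodic: HNR = 10·log10(r / (1 - r)). The Python sketch below implements only this conversion and its inverse, not the estimation of r itself; the example r values are our own choice, picked to reproduce the approximate reference values mentioned above.

```python
import math


def hnr_db(r: float) -> float:
    """HNR in dB from a normalised autocorrelation peak r (0 < r < 1).

    r is read as the periodic energy fraction, 1 - r as the noise fraction.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


def periodic_fraction(hnr: float) -> float:
    """Inverse conversion: periodic energy fraction r for an HNR in dB."""
    ratio = 10.0 ** (hnr / 10.0)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Roughly vowel-like, voiced-fricative-like and voiceless-fricative-like r values.
    for r in (0.99, 0.76, 0.39):
        print(r, round(hnr_db(r), 1))  # 20.0, 5.0, -1.9
```

Because the scale is logarithmic, an HNR of 0 dB corresponds to equal periodic and noise energy (r = 0.5); noise-dominated signals such as voiceless fricatives therefore fall below zero.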
In order to answer research question 1 (whether present-day Tumbuka supports a misalignment account), we compared the duration and the HNR of the aspiration in NTh sequences with those of the aspiration in Th. Misalignment predicts shorter and less intense aspiration for NTh sequences than for Th.
To answer research question 2 (whether there is support for the development NTh > Th in Tumbuka), we compared the HNR and duration of nasals in three contexts: before Th, before D, and without a following plosive. The nasal in NTh is expected to be weaker (i.e. less voiced and possibly preaspirated) than in ND or plain nasals, and therefore more likely to be deleted.
In order to test research question 3 (whether there is support in Tumbuka for the development NTh > Nɦ), we compared the duration of the closure phase in NTh with that in the plain plosives and in ND. The oral closure phase in NTh is expected to be shorter than in ND and in the plosives without a preceding nasal (Th/T/D), which would account for why a later loss of the plosive occurs only in NTh sequences. Figure 4 illustrates the three acoustic events of a nasal-plosive sequence, i.e. nasal closure, oral closure, and burst with possible aspiration. These events were labeled in Praat (Boersma & Weenink 2017), as were the preceding and following vowels (V). Burst and aspiration noise, though usually easily distinguishable, were not labeled or measured separately in our study.

How we measured HNR and duration
In Figure 4, the nasal /m/ is clearly distinct from the preceding vowel (by its lower amplitude and weaker formants) and from the following closure phase of the plosive /p/ (where the amplitude is even lower). However, our data contained some instances where the nasal and the closure phase were indistinguishable, as illustrated in Figure 5. In the example in Figure 5, no oral plosive is discernible. Instead, voicing and nasal formants spread throughout the whole closure phase, while friction noise already starts before the release of the plosive. The latter could be an instance of preaspiration. Items like these without visible closure phase were excluded from the analysis.
Several items showed vowel nasalisation, and in such instances we employed amplitude and change in formant visibility to determine the boundary between vowel and nasal segment.
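The labelling scheme (nasal closure, oral closure, burst plus aspiration) and the exclusion of nasal-plosive tokens without a visible oral closure can be sketched as follows. The label names, the millisecond representation and the `measure` function are hypothetical; in practice the intervals would be read from the Praat annotations.

```python
from typing import Dict, List, Optional, Tuple

# A token is a list of labelled intervals (label, start_ms, end_ms).
# Hypothetical labels: "V" vowel, "N" nasal closure, "C" oral closure,
# "R" release (burst plus any aspiration, not separated, as in the study).
Interval = Tuple[str, int, int]


def measure(token: List[Interval], category: str) -> Optional[Dict[str, int]]:
    """Return consonantal durations in ms, or None if the token is excluded."""
    durations: Dict[str, int] = {}
    for label, start, end in token:
        if end <= start:
            raise ValueError(f"interval {label!r} has no positive duration")
        if label == "V":
            continue  # vowels are labelled but not measured here
        durations[label] = durations.get(label, 0) + (end - start)
    # Nasal-plosive sequences without a visible oral closure are excluded.
    if category in {"ND", "NTh"} and "C" not in durations:
        return None
    return durations


if __name__ == "__main__":
    token = [("V", 0, 100), ("N", 100, 180), ("C", 180, 200),
             ("R", 200, 260), ("V", 260, 350)]
    print(measure(token, "NTh"))  # {'N': 80, 'C': 20, 'R': 60}
```

Returning None rather than raising keeps excluded tokens easy to count, which matters when reporting how many tokens per category survive the exclusion criteria.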
We measured the duration of the nasal, the plosive closure and, if present, the aspiration. For the nasal and the aspiration noise, HNR was calculated in Praat with a time step of 0.01 s, a minimum pitch of 75 Hz, a silence threshold of 0.1 and 1 period per window. This calculation could only be performed if the nasal or aspiration noise lasted at least 0.026 s. Figure 6 summarizes the duration measurements of all segments. As is obvious from Figure 6, the nasals in ND and NTh are almost equal in duration, and both are shorter than plain N. The oral stop in both nasal-stop sequences, on the other hand, is extremely short compared to Th, T and D.
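The parameter values listed above appear to correspond to the defaults of Praat's "To Harmonicity (cc)..." analysis. The numpy sketch below is a heavily simplified stand-in for that kind of frame-wise cross-correlation analysis, not Praat's actual algorithm: it applies no window correction or peak interpolation, and the 600 Hz upper pitch limit and the function name `hnr_cc` are our own assumptions. It also illustrates why very short segments cannot be analysed: in this sketch every frame must hold one analysis window plus the longest lag, about 2/75 s (0.027 s) at a 75 Hz minimum pitch.

```python
import numpy as np


def hnr_cc(x, fs, time_step=0.01, min_pitch=75.0, max_pitch=600.0,
           silence_threshold=0.1, periods_per_window=1.0):
    """Frame-wise HNR in dB from normalised cross-correlation (simplified).

    Hypothetical helper: for each frame, correlate one analysis window with
    lagged copies of the signal and convert the best peak r to HNR.
    """
    x = np.asarray(x, dtype=float)
    win = int(round(periods_per_window * fs / min_pitch))  # window length (samples)
    min_lag = int(fs / max_pitch)           # shortest period considered
    max_lag = int(np.ceil(fs / min_pitch))  # longest period considered
    span = win + max_lag                    # samples one frame needs
    if len(x) < span:
        raise ValueError("segment too short for this analysis")
    hop = int(round(time_step * fs))
    global_peak = np.max(np.abs(x))
    values = []
    for start in range(0, len(x) - span + 1, hop):
        frame = x[start:start + span]
        # Skip frames whose peak is below the silence threshold (relative to global peak).
        if np.max(np.abs(frame)) < silence_threshold * global_peak:
            continue
        a = frame[:win]
        best = 0.0
        for lag in range(min_lag, max_lag + 1):
            b = frame[lag:lag + win]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
            if denom > 0:
                best = max(best, float(np.dot(a, b) / denom))
        r = min(max(best, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        values.append(10.0 * np.log10(r / (1.0 - r)))
    return np.array(values)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.2 * fs)) / fs
    rng = np.random.default_rng(0)
    voiced = np.sin(2 * np.pi * 200 * t) + 0.05 * rng.standard_normal(t.size)
    noise = rng.standard_normal(t.size)
    print(hnr_cc(voiced, fs).mean(), hnr_cc(noise, fs).mean())
```

On synthetic input, a slightly noisy 200 Hz tone yields high positive values and white noise yields negative values, mirroring the vowel/voiceless-fricative contrast described earlier; for real measurements, Praat's own implementation should be used.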

Results
The following results have not been statistically analysed, as they are preliminary and based on only four of the seven speakers. We added standard deviations for each measure to give an impression of the variation and possible overlap in values. Table 4 gives the results of the comparison of the nasals in NTh, ND and N with respect to duration and HNR. These results show that the nasal in NTh is minimally shorter than the nasals in the other positions, while its HNR of 16 dB indicates that it is less voiced and noisier than the nasals in ND and N.

We saw already in Figure 6 that there is a considerable difference between the closure phase in nasal-plosive sequences and that in plain plosives. This is confirmed by the values in Table 5: the difference in closure duration between nasal-plosive sequences and plosives without a preceding nasal is considerable, while the difference between NTh and ND is minimal (5 ms, and thus below the perceptual threshold).⁴ When comparing the duration of the nasal and the oral closure in the nasal-stop sequences, one can observe that the nasal is about four times longer than the oral closure in Tumbuka. Similar relations have been reported for other Bantu languages such as Ikalanga (S.10, Botswana: Beddor & Onsuwan 2003) or Sukuma (F.21, Tanzania: Maddieson 1993, Maddieson & Ladefoged 1993).

The duration and HNR measures for aspiration are given in Table 6. Note that we averaged across all following vowel contexts and therefore ignored the fact that following high vowels cause longer aspiration noise. As we can see from the mean values in Table 6, the aspiration in NTh is shorter than that in Th (by 15 ms), and there is a considerable difference in HNR between the two: aspiration in NTh has a mean HNR of 5 dB, indicating less noise than expected for voiceless friction,⁵ cf. the mean value of 0.9 dB for the aspiration in Th.
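To make the reported mean HNR values more concrete, Boersma's (1993) relation HNR = 10·log10(r / (1 - r)) can be inverted to give the periodic energy fraction r. This worked arithmetic assumes the idealised reading of r as the periodic fraction and applies it to mean dB values, which is only an approximation, since averaging in dB is not the same as averaging r.

```latex
\begin{aligned}
\mathrm{HNR} &= 10\log_{10}\frac{r}{1-r}
\quad\Longleftrightarrow\quad
r = \frac{10^{\mathrm{HNR}/10}}{1+10^{\mathrm{HNR}/10}}\\[4pt]
\text{nasal in NTh (16 dB):}\quad r &= \frac{39.81}{40.81} \approx 0.975\\
\text{aspiration in NTh (5 dB):}\quad r &= \frac{3.162}{4.162} \approx 0.760\\
\text{aspiration in Th (0.9 dB):}\quad r &= \frac{1.230}{2.230} \approx 0.552
\end{aligned}
```

On this reading, roughly three quarters of the energy in NTh aspiration is periodic, against a little over half in Th aspiration, consistent with interpreting the NTh aspiration as less noisy.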
Note that we found strong individual differences in aspiration duration: for one speaker, the aspiration in NTh was longer than in Th, the reverse of the pattern shown by all other speakers. Table 6 also includes the mean burst duration for the unaspirated voiceless T (20 ms), which is quite long. For 53 of the 223 ND tokens we could also measure distinct bursts, with a mean duration of 10 ms, and for 70 of the 257 D tokens we measured a mean burst duration of 13 ms.

Discussion
Let us consider our findings in the light of the research questions, repeated in (7) for convenience, with the outcome added:

(7) Research questions and outcome
 a. *NT > NTh. RQ1: Is there acoustic evidence that the aspiration in NTh is due to misalignment and therefore less strong than the one in Th?
 b. NTh > Th. RQ2: Is there acoustic evidence that the nasal in NTh is weaker and therefore more likely to be deleted than in ND?
 c. NTh > Nɦ. RQ3: Is there acoustic evidence that the plosive in NTh is weaker and therefore more likely to be deleted than in ND? Outcome: NO
With respect to research question 1, the aspiration in NTh showed a shorter duration and higher HNR values, indicating less frication, thereby providing acoustic support for a possible misalignment account.
With respect to research question 2, acoustic support for weakness of the nasal in NTh, which would make it more prone to deletion in this context, could also be found: the nasal in NTh has a mean HNR of 16 dB, indicating that it is less voiced and/or noisier than the nasal in ND or the nasal without following plosive. Furthermore, the nasal in NTh was shorter (though minimally) than the other two nasals.
With respect to research question 3, whether the plosive in NTh is weaker/shorter than in ND, the closure durations measured in this study did not provide evidence for this: the closures in NTh and ND are both very short compared to those of plosives without a preceding nasal.

Conclusion
Our acoustic measurements of the present-day Tumbuka laryngeal contrast in plosives provide partial support for the historical phonologisation scenario proposed by Givón (1974), Hinnebusch (1973, 1975) and Kerremans (1980), presented in (2). We split and adjust this scenario in (8), omitting the step of nasal devoicing, because previous experimental studies did not support this assumption (see the discussion in section 2.1) and because the nasal in Tumbuka NTh shows no devoicing.
(8) a. NT > NTh > Nɦ
    b. NT > NTh > Th

With respect to the motivation of the first development, common to both (8a) and (8b), we found that aspiration in NTh is on average weaker (shorter and with higher HNR values, i.e. less frication noise) than in Th, though one speaker showed the reverse pattern. Weakness of aspiration in NTh could support a misalignment account, but then we would not expect such speaker-specific variation. We prefer a perceptual account of the development of aspiration, as proposed by Hamann & Downing (2017): the plosives in NT and ND are difficult to distinguish, especially given their short duration, and therefore languages prolong the burst and employ aspiration to perceptually enhance this contrast.
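The two paths in (8) can be sketched as ordered string rewrites. This models only the outcomes, not their phonetic motivation, which is what the argument here is about; the regular expressions, the simplified segment classes and the form "ampa" are hypothetical illustrations, not reconstructed Bantu data.

```python
import re

# Hypothetical, simplified segment classes.
NASAL = "[mnɲŋ]"
VLESS = "[ptk]"


def aspirate(form: str) -> str:
    """*NT > NTh: a voiceless plosive after a nasal gains aspiration."""
    return re.sub(f"({NASAL})({VLESS})(?!ʰ)", r"\1\2ʰ", form)


def lose_stop(form: str) -> str:
    """(8a) NTh > Nɦ: the oral stop is lost, the aspiration stays with the nasal."""
    return re.sub(f"({NASAL}){VLESS}ʰ", r"\1ɦ", form)


def lose_nasal(form: str) -> str:
    """(8b) NTh > Th: the nasal is lost, the aspirated stop remains."""
    return re.sub(f"{NASAL}({VLESS}ʰ)", r"\1", form)


def seriation(form: str, second_step) -> list:
    """Apply the shared first step, then one of the two second steps."""
    stage1 = aspirate(form)
    return [form, stage1, second_step(stage1)]


if __name__ == "__main__":
    print(seriation("ampa", lose_stop))   # ['ampa', 'ampʰa', 'amɦa']
    print(seriation("ampa", lose_nasal))  # ['ampa', 'ampʰa', 'apʰa']
```

The negative lookahead in `aspirate` keeps the first step from re-applying to an already aspirated stop, so the shared step NT > NTh is applied once before the paths diverge.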
Our acoustic study furthermore showed that the oral stop in Tumbuka NTh is similar to that in ND; hence we found no acoustic support for the claim that the stop in NTh is more prone to deletion than the stop in ND, cf. the second step in (8a). Again following Hamann & Downing (2017), we propose that there are perceptual reasons why the stop in NTh can disappear. The presence of aspiration as a perceptual cue keeps NTh and N distinct, while the deletion of D in ND would render ND and N perceptually indistinguishable.
Our acoustic study further showed that the nasal in NTh is weaker than any other nasal in Tumbuka, and it is therefore more likely to be deleted than the nasal in ND, cf. the second step in (8b). As we argued in section 2.2, there is no articulatory explanation that could account for this development. Future studies will have to show whether the loss of the nasal in (8b) is due to the aspiration of the stop or whether it is also observable in NT sequences. Ideally, this would be tested by comparing ND, NT and NTh within a single language, though, as far as we know, no language with this three-way contrast exists. However, comparisons of ND and NT sequences in languages such as Zulu could shed light on this question.
In sum, we need more phonetic studies of NC sequences in languages representing different points in the phonologisation seriation to arrive at a better understanding of the development of the range of attested synchronic correlates of Proto-Bantu *NT.

Comments invited
PiHPh relies on post-publication review of the papers that it publishes. If you have any comments on this piece, please add them to its comments site. You are encouraged to consult this site after reading the paper, as there may be comments from other readers there, and replies from the author. This paper's DOI is 10.2218/pihph.3.2018.2825.