Onaka et al.
Domain-initial strengthening in Japanese
ACOUSTIC AND ARTICULATORY DIFFERENCE OF SPEECH SEGMENTS AT DIFFERENT PROSODIC POSITIONS

Akiko Onaka1, Sallyanne Palethorpe1, Catherine Watson1 & Jonathan Harrington2
1 Macquarie Centre for Cognitive Science and Speech, Hearing and Language Research Centre, Macquarie University
2 Institute of Phonetics and Digital Speech Processing, University of Kiel, Germany
ABSTRACT: It has been claimed that the phonetic and acoustic properties of consonants are influenced by high-level linguistic structure such as their prosodic position. This study presents acoustic and articulatory comparisons of the consonant /t/ at different prosodic positions in Japanese. Tokens were obtained from native Japanese speakers using electropalatography and electromagnetic articulography. The results show that the domain-initial strengthening effect is present in Japanese: consonants in higher prosodic domains are longer and have more linguopalatal contact than those in lower prosodic domains. Nonetheless, these effects appear to be less systematic in Japanese than has been reported for Korean.

INTRODUCTION

Previous studies have found that the phonetic and acoustic properties of individual speech segments are sensitive to high-level linguistic structure such as their prosodic position (e.g., Dilley et al., 1996). Keating et al. (1999) found that consonants at the beginning of higher prosodic domains, such as intonational phrases, had more linguopalatal contact, and thus a more constricted articulation, than consonants at the beginning of lower prosodic domains, such as intermediate phrases and words. This domain-initial strengthening effect was found in English, French, Taiwanese and Korean (Keating et al., 1999), but it was not realised in the same manner in each language. For example, Korean showed a more systematic strengthening effect across all prosodic domains than the other languages studied. This indicates that some aspects of domain-initial strengthening may be language specific: the effect might be associated with the kind of prosodic organisation in a given language and might therefore be realised in different ways in different languages.
In Korean, the left edge of a prosodic domain is reinforced, and this could contribute to the strong domain-initial strengthening effect found in this language. It is therefore possible that other typologically similar languages would also show this relationship between strengthening and lengthening domain-initially. One way to test this is to examine Japanese, which is typologically similar to Korean. Like Korean, Japanese is a predicate-final and head-final language and thus has left-edge reinforcement. In addition, Japanese has a prosodic structure similar to that of Korean in that it has an accentual phrase level but no lexical stress or lexical tone. Since Korean and Japanese have similar linguistic structure, they should show similar effects if the effect of prosodic position is linguistically driven.

The present study aims to examine whether domain-initial strengthening is a physiological effect or a linguistic effect driven by the prosodic organisation of a given language. To this end, the study compares both acoustic and articulatory characteristics of consonants in different prosodic positions in Japanese, in order to see whether domain-initial strengthening occurs and how it is realised.

METHOD

Materials

The speech materials used in this study consisted of sets of test sentences containing the target consonant /t/ at 5 prosodic levels: utterance, intonational phrase, accentual phrase, word and mora. The target consonant was placed in initial position at each of the 5 prosodic levels. It was preceded and followed by the same vowel, /a/, and placed at the same mora count position to factor out possible declination effects (Keating et al., 1999). The test sentences contained real words with the phoneme /t/ in utterance-initial (Ui), intonational phrase-initial (IPi), accentual phrase-initial (APi), word-initial (Wi) and mora-initial (Mi) positions.

Since it was difficult to construct meaningful sentences for all the prosodic domains with the consonant in the same mora count position, the stimulus set was divided into two subsets: Ui, IPi, APi and Wi positions are grouped together as Subset 1, and Wi and Mi positions as Subset 2. The surrounding consonants were controlled as far as possible, given the need to formulate meaningful sentences. Two different types of stimuli were included in this study. Examples from one type of stimuli are shown in Table 1.

Proceedings of the 9th Australian International Conference on Speech Science & Technology, Melbourne, December 2 to 5, 2002. Australian Speech Science & Technology Association Inc. Accepted after abstract review.

Table 1. One type of stimulus sentences, covering the different prosodic positions. The target consonant and surrounding vowels are underlined.
Prosodic Domain   Test Sentence

Subset 1 (Ui, IPi, APi and Wi level)
Ui     Koko wa Hamadera. Takenoko ga yuumei desu.
       Here is Hamadera. This is famous for bamboo shoot.
IPi    Koko wa Hamadera, takenoko no sanchi desu.
       Here is Hamadera, (which is) a producing district of bamboo shoot.
APi    Koko de wa tashika takenoko o tabemasu.
       People here probably eat bamboo shoot.
Wi     Koko wa Hamadera takenoko ga yuumei desu.
       Here is famous for Hamadera-bamboo shoot.

Subset 2 (Wi and Mi level)
Wi     Arisa wa nureta tatami o fukimashita.
       Arisa wiped a wet tatami mat.
Mi     Risa wa nureta tatami o fukimashita.
       Risa wiped a wet tatami mat.
Subjects and procedure

Electropalatography (EPG) was used to measure linguopalatal contact as an index of the degree of oral constriction, and electromagnetic articulography (EMA) was used to measure jaw and lip movement. In addition, acoustic measurements were taken from the audio recordings: duration of the target consonant, voice onset time (VOT), duration of the surrounding vowels, and RMS energy. Only the acoustic and EPG results are presented in this paper.

The speech samples analysed in this study were collected from 2 female speakers of standard Japanese. The speech data were recorded in a sound-treated studio at the Speech Hearing and Language Research Centre, Macquarie University. Acoustic and articulatory data were collected simultaneously using a Carstens electromagnetic articulography system and the Laryngograph EPG3 electropalatograph. The speech data were sampled at 20 kHz and quantized to 16 bits, and the EMA data were sampled at 100 Hz. The sensor coils were placed in the midsagittal plane on the vermilion border of the upper and lower lips and at the apex of the mandible. A sensor on the bridge of the nose and a reference sensor at the incisors provided a frame of reference for the movement coordinate system.

The stimuli were presented to the subjects one sentence at a time, in blocks: each block contained the target sentence, repeated 5 times in succession, for all the prosodic positions. The blocks were repeated twice, with the sentences in the second block given in reverse order to avoid list effects. Thus, each sentence was repeated at least 10 times in total, giving a minimum of 120 tokens per subject.

The speech data were segmented and labelled phonetically by the first author in EMU, a speech data management system (Cassidy & Harrington, 2001). The labelling criteria follow standard acoustic labelling criteria such as those described in Croot & Taylor (1995). Auditory analysis revealed that the intended prosodic structure was achieved for all utterances. This will be further verified using the Japanese ToBI system (Venditti, 1995).

RESULTS

Acoustic measures

Table 2 presents the duration of the consonant /t/ and of its preceding and following vowels at different prosodic positions. For Subset 1, a one-way analysis of variance (ANOVA) was used (α < 0.01), with post-hoc Scheffé multiple comparisons (α < 0.001). For Subset 2, with only two levels for comparison, two-tailed t-tests were used (α < 0.01).

Table 2. Mean duration of /t/ and neighbouring vowels, with standard deviations in parentheses, in different prosodic positions. Significant differences for Subset 2 (Wi and Mi) are indicated by * (α < 0.001); all alpha values were adjusted for multiple testing.
Duration (ms)   total /t/ duration   [t] closure     VOT            V1              V2

Prosodic Domain Subset 1
Ui              180.6 (78.4)         144.6 (75.6)    35.9 (12.8)    101.1 (27.8)    37.5 (9.7)
IPi             159.8 (31.4)         124.9 (32.5)    34.8 (8.0)     91.9 (29.8)     31.7 (7.9)
APi             119.1 (42.5)         86.3 (38.4)     32.8 (12.1)    92.1 (44.1)     32.0 (10.3)
Wi              88.2 (13.5)          58.2 (8.7)      30.0 (11.0)    47.1 (5.9)      44.9 (11.2)

Prosodic Domain Subset 2
Wi              78.8 (12.4)          56.4 (8.9)      22.3* (6.3)    55.7* (10.2)    43.4* (13.7)
Mi              70.4 (11.1)          54.0 (10.3)     16.4* (4.6)    28.5* (13.9)    57.9* (11.7)
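The means in Table 2 are consistent with the total /t/ duration being the sum of the closure duration and VOT. The sketch below checks this on the tabulated means only. The dictionary layout and the function name are illustrative assumptions and are not part of the original analysis, which was carried out on the labelled tokens.

```python
# Mean durations in ms copied from Table 2: (total /t/, closure, VOT).
TABLE2_MEANS = {
    "Subset 1 Ui": (180.6, 144.6, 35.9),
    "Subset 1 IPi": (159.8, 124.9, 34.8),
    "Subset 1 APi": (119.1, 86.3, 32.8),
    "Subset 1 Wi": (88.2, 58.2, 30.0),
    "Subset 2 Wi": (78.8, 56.4, 22.3),
    "Subset 2 Mi": (70.4, 54.0, 16.4),
}


def decomposition_residuals(means):
    """Return total - (closure + VOT) in ms for each prosodic position."""
    return {
        position: round(total - (closure + vot), 2)
        for position, (total, closure, vot) in means.items()
    }


residuals = decomposition_residuals(TABLE2_MEANS)
for position, diff in residuals.items():
    # Residuals of about 0.1 ms are what rounding of the means would produce.
    print(f"{position}: residual {diff:+.2f} ms")
```

Every residual is within rounding error of zero, so lengthening of the whole consonant at higher positions can be read off the closure and VOT columns directly.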
These acoustic measurements were taken to see whether prosodic position affects the acoustic properties of the target consonant and the surrounding sounds. For most of the durational measures, domain-initial strengthening was found: that is, the higher the prosodic position, the longer the duration. Both the total duration of /t/ and the /t/ closure duration increased from lower to higher prosodic domains. Both measures were significantly longer at the upper prosodic levels (Ui and IPi) than at Wi position, and significantly longer at Ui position than at APi position. In contrast, for Subset 2, there was no significant difference in the /t/ closure duration between Wi and Mi, and the significance level for the duration of /t/ closure was at p < 0.002, slightly above the chosen level of 0.001.

As shown in Table 2, VOT differed little across prosodic domains for Subset 1. This was also found in other languages, except for Korean, which showed significantly different VOT at all levels above the word. For Subset 2, on the other hand, VOT was significantly different between Wi and Mi positions. Although the difference between Subset 1 and Subset 2 was not tested, Table 2 also shows a clear distinction between the subsets (Ui, IPi, APi vs. Wi, Mi) for VOT.

Vowels preceding the consonant (V1) were also longer in higher prosodic domains than in lower prosodic domains. The preceding vowels at Ui, IPi and APi were significantly longer than that at Wi. For Subset 2, there was also a significant difference in the duration of the preceding vowel, with Wi being longer than Mi.

Vowels following the consonant (V2), however, displayed a different durational pattern. The following vowels were shorter in higher prosodic domains and became longer towards lower prosodic domains. Statistical analysis also showed a significant difference in these durations between the upper levels of prosodic domain (IPi and APi) and Wi for Subset 1. V2 in Subset 2 also exhibited a significant difference in duration. One possible reason why the vowel following the consonant was shorter in higher domains and longer in lower domains is that Japanese is a mora-timed language, so that durational compensation between adjacent segments within a mora could be stronger in Japanese (Minagawa-Kawai, 1999). Nonetheless, this issue is itself controversial and requires further exploration (cf. Warner & Arai, 2001).

Table 3. Mean RMS, with standard deviations in parentheses, of the consonant /t/ burst for different prosodic positions.
                RMS

Prosodic Domain Subset 1
Ui              423.0 (160.2)
IPi             477.0 (208.3)
APi             563.7 (140.8)
Wi              675.6 (259.0)

Prosodic Domain Subset 2
Wi              728.6 (317.3)
Mi              999.6 (660.9)
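For readers unfamiliar with the measure, the RMS values in Table 3 are root-mean-square amplitudes of the waveform samples in the burst. A minimal sketch follows. The paper does not state how the burst window was delimited, so the fixed 10 ms window and the function names are assumptions made purely for illustration; only the 20 kHz sampling rate used in the usage example comes from the text.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of waveform samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def burst_rms(signal, sample_rate_hz, burst_onset_s, window_s=0.010):
    """RMS over a window starting at the labelled burst onset.

    window_s is a hypothetical analysis window, not a value from the paper.
    """
    start = int(round(burst_onset_s * sample_rate_hz))
    stop = start + int(round(window_s * sample_rate_hz))
    return rms(signal[start:stop])


# Synthetic example: 10 ms of silence, 10 ms of a +/-100 square wave, silence.
demo_signal = [0] * 200 + [100, -100] * 100 + [0] * 200
demo_value = burst_rms(demo_signal, sample_rate_hz=20000, burst_onset_s=0.010)
print(demo_value)
```

Because RMS depends on recording gain, values such as those in Table 3 are comparable only within one recording setup.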
Means and standard deviations for the RMS values of the /t/ burst are given in Table 3. The results showed that the higher the prosodic position, the lower the RMS energy, as reported in Cho & Keating (2001). The RMS at Ui and IPi was significantly smaller than that at Wi. However, for Subset 2 (Wi and Mi) the difference in RMS was non-significant.

Articulatory measures
Figure 1. Contact profiles of /t/ at maximum linguopalatal contact in different prosodic positions. Palatograms were averaged across all tokens and subjects.

The linguopalatal contact was measured from the EPG data. The frame with the most contact was identified by counting the number of electrodes contacted. Palatograms of maximum palatal contact in different prosodic domains are shown in Figure 1. As can be seen, more contact was found in the higher prosodic domains than in the lower domains. There is a domain-initial strengthening effect for linguopalatal contact (F(5,243)=6.65, p=0.001): post-hoc tests indicated that linguopalatal contact in Ui and IPi was significantly greater than that in Wi position; that is, the higher the prosodic position, the greater the contact. For Subset 2, however, there was no significant effect of prosodic position on linguopalatal contact.

The centre of gravity (COG) index was calculated to examine the distribution of linguopalatal contact over the palate. The COG index is a measure of the concentration of linguopalatal contact across the palate (Hardcastle, Gibbon & Nicolaidis, 1991). A high COG value indicates that the place of articulation is towards the anterior region, while a low COG value indicates that it is towards the posterior of the palate. The consonant in higher domains had more linguopalatal contact in the front region, as can be seen in Figure 1. Table 4 shows that COG values did not differ significantly either among the higher prosodic domains (Ui, IPi, APi) or between the lower prosodic domains (Wi and Mi), although for Subset 1 there was a slight difference between the upper domains (Ui, IPi and APi) and Wi.

Table 4. Mean COG values, with standard deviations in parentheses, of the consonant /t/ in different prosodic positions.
                COG

Prosodic Domain Subset 1
Ui              4.94 (0.13)
IPi             4.95 (0.12)
APi             4.96 (0.13)
Wi              4.85 (0.19)

Prosodic Domain Subset 2
Wi              4.67 (0.16)
Mi              4.70 (0.24)
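To make the COG index concrete, the sketch below uses one common formulation: a weighted mean of the per-row contact counts, with the most anterior row weighted 7.5 and the most posterior row 0.5, so that higher values mean more anterior contact. The exact weighting used in this study is not stated, so the weights, the front-to-back row order and the function name are assumptions.

```python
def cog_index(frame):
    """Centre-of-gravity index of one EPG frame.

    frame: list of rows ordered from front (index 0) to back; each row is a
    list of 0/1 contact values. Returns None when nothing is contacted.
    """
    n_rows = len(frame)
    row_totals = [sum(row) for row in frame]
    total = sum(row_totals)
    if total == 0:
        return None
    # With 8 rows the weights run 7.5 (front row) down to 0.5 (back row).
    weighted = sum((n_rows - i - 0.5) * count
                   for i, count in enumerate(row_totals))
    return weighted / total


# Hypothetical frame: full contact in the 6-electrode front row plus the two
# lateral electrodes of the second row; the remaining six rows are empty.
demo_frame = [[1] * 6, [1, 0, 0, 0, 0, 0, 0, 1]] + [[0] * 8 for _ in range(6)]
print(cog_index(demo_frame))
```

Under this formulation a token whose contact is confined to the front rows scores close to 7.5, while added contact further back pulls the index down.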
Figure 2 presents the amount of linguopalatal contact at the point of maximum contact. The total number of electrodes contacted at that point was counted and expressed as a percentage of all electrodes on the palate. The results show that /t/ in the higher domains tends to be produced with more linguopalatal contact. For example, Ui and IPi had significantly greater contact than Wi. For Subset 2, Mi had more linguopalatal contact than Wi, but the difference was not significant. One notable feature of our data is that the overall linguopalatal contact found for Japanese /t/ appears to be much smaller than that found in other languages (cf. Keating et al., 1999). Whether this is a characteristic of Japanese /t/ or is due to some temporal effect in our data, such as voicing of /t/ in some tokens, requires further investigation.
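The two steps described above, finding the frame of maximum contact and expressing its contact as a percentage, can be sketched as follows. The electrode total of 62 is an assumption based on the usual layout of EPG3-type palates and is not stated in the paper; the function names are likewise illustrative.

```python
N_ELECTRODES = 62  # assumed palate size; not given in the paper


def contact_count(frame):
    """Number of contacted electrodes in one frame (rows of 0/1 values)."""
    return sum(sum(row) for row in frame)


def max_contact_frame(frames):
    """Index of the frame with the most contacted electrodes.

    When several frames tie, the earliest one is returned.
    """
    if not frames:
        raise ValueError("need at least one frame")
    return max(range(len(frames)), key=lambda i: contact_count(frames[i]))


def percent_contact(frame, n_electrodes=N_ELECTRODES):
    """Contacted electrodes as a percentage of the whole palate."""
    return 100.0 * contact_count(frame) / n_electrodes


# Hypothetical three-frame token with rising then falling contact.
demo_frames = [
    [[1, 0, 0, 0, 0, 1]] + [[0] * 8 for _ in range(7)],
    [[1] * 6, [1] * 8] + [[1, 0, 0, 0, 0, 0, 0, 1] for _ in range(6)],
    [[1, 1, 0, 0, 1, 1]] + [[0] * 8 for _ in range(7)],
]
peak = max_contact_frame(demo_frames)
print(peak, round(percent_contact(demo_frames[peak]), 1))
```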
Figure 2. Amount of linguopalatal contact (in percentage) of /t/ at maximum contact time: Subset 1 on the left and Subset 2 on the right. Error bars represent standard deviations.

A strong correlation between linguopalatal contact and acoustic duration was found in Korean (Cho & Keating, 2001). To see whether there was a similar relationship in Japanese, the correlation between acoustic duration and linguopalatal contact was tested using the non-parametric Spearman's rho correlation coefficient with a one-tailed test of significance. Like Korean, Japanese showed significant (p=0.001) positive correlations between maximum linguopalatal contact and the durations of /t/ (rho=0.447), /t/ closure (rho=0.479), VOT (rho=0.297) and V1 (rho=0.361). Similarly, significant correlations were found between COG and the acoustic durational measures. The exception was V2, which correlated negatively with maximum linguopalatal contact (rho=-0.318) and with COG (rho=-0.353). This suggests that, in Japanese, differences in linguopalatal contact could result from differences in duration, as in Korean.

CONCLUSION

The results of this study show that prosodic position affected the articulatory and acoustic properties of the target consonant, suggesting that domain-initial strengthening is present in Japanese. Acoustic duration and linguopalatal contact varied according to prosodic position. Strong correlations between linguopalatal contact and durations were also found. Nonetheless, the domain-initial strengthening effect found in Japanese was less clear than in Korean, despite the fact that the two languages have similar linguistic properties.

This has been a preliminary investigation of domain-initial strengthening in Japanese, and further work is needed to describe the effect more fully. An analysis of the other physiological measures collected, such as jaw and lip movement, needs to be carried out in order to verify the results of the present study. There might be language-specific aspects of articulatory organisation in Japanese with regard to domain-initial strengthening.

REFERENCES

Cassidy, S. and Harrington, J. (2001). Multi-level annotation in the Emu speech database management system, Speech Communication 33 (1-2), 61-77.
Cho, T. and Keating, P. (2001). Articulatory and acoustic studies on domain-initial strengthening in Korean, Journal of Phonetics 29, 155-190.
Croot, K. and Taylor, B. (1995). Criteria for Acoustic-Phonetic Segmentation and Word Labelling in the Australian National Database of Spoken Language, [http://www.shlrc.mq.edu.au/criteria.html].
Dilley, L., Shattuck-Hufnagel, S. and Ostendorf, M. (1996). Glottalization of word-initial vowels as a function of prosodic structure, Journal of Phonetics 24, 423-444.
Hardcastle, W. J., Gibbon, F. and Nicolaidis, K. (1991). EPG data reduction methods and their implications for studies of lingual coarticulation, Journal of Phonetics 19, 251-266.
Keating, P., Cho, T., Fougeron, C. and Hsu, C. (1999). Domain-initial articulatory strengthening in four languages, UCLA Working Papers in Linguistics 97, 139-156. (Also to appear in Papers in Laboratory Phonology VI, Cambridge, U.K.: Cambridge University Press.)
Minagawa-Kawai, Y. (1999). Preciseness of temporal compensation in Japanese mora timing, Proceedings of the 14th International Congress of Phonetic Science, 365-368.
Venditti, J. J. (1995). Japanese ToBI Labelling Guidelines, [http://ling.ohio-state.edu/Phonetics/J_ToBI/jtobi_homepage.html].
Warner, N. and Arai, T. (2001). Japanese mora-timing: a review, Phonetica 58, 1-25. [http://www.karger.com/journals/pho].
