Comparative Analysis of Speech Rhythm Measures for Persian Speaker Identification: Duration vs. Intensity

Asadi, Homa

doi:10.22051/jlr.2023.45448.2370

Document Type: Research

Author

Homa Asadi

Assistant Professor of Linguistics, University of Isfahan, Isfahan, Iran

Abstract

Previous studies have demonstrated the efficacy of speech rhythm measures in speaker identification across languages with different phonotactic structures. For Persian in particular, two categories of speech rhythm metrics have been examined: duration-based and intensity-based. Building on this prior work, the current study examines more closely how well the two types of measure (duration-based versus intensity-based) discriminate between Persian speakers. To this end, a multinomial logistic regression model was fitted to a dataset of 20 male Persian speakers, each reciting 100 sentences at a normal speaking pace. The findings show that duration-based measures outperform intensity-based ones in distinguishing between Persian speakers, although the advantage is very slight. This observation is significant because it sheds light on the suitability of specific rhythm metrics for Persian speaker identification. I postulate that the difference in performance may be attributed to the simple syllable structure of Persian and to its lesser reliance on intensity as a primary indicator of lexical stress. This research offers insight into the choice of rhythm metrics for Persian speaker identification and underscores the importance of considering linguistic features when developing speaker recognition systems.
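
The abstract names multinomial logistic regression as the classifier but gives no implementation details. As a rough illustration only, the Python sketch below fits a softmax (multinomial logistic) model by stochastic gradient descent to synthetic two-dimensional features for four hypothetical speakers. The feature values, the number of speakers, the learning rate, and all function names are assumptions made for this example, not details of the study. The resulting accuracies say nothing about the reported comparison of duration-based and intensity-based measures.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, features):
    """Linear score per class; the last weight in each row is the bias."""
    extended = features + [1.0]
    return [sum(w * x for w, x in zip(row, extended)) for row in weights]


def fit(X, y, n_classes, rng, lr=0.05, epochs=50):
    """Fit multinomial logistic regression with plain SGD (hypothetical settings)."""
    weights = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training samples in a new order each epoch
        for i in order:
            extended = X[i] + [1.0]
            probs = softmax(class_scores(weights, X[i]))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class k.
                error = probs[k] - (1.0 if k == y[i] else 0.0)
                for j, x in enumerate(extended):
                    weights[k][j] -= lr * error * x
    return weights


def predict(weights, features):
    """Return the index of the highest-scoring class (speaker)."""
    scores = class_scores(weights, features)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, X, y):
    hits = sum(predict(weights, f) == label for f, label in zip(X, y))
    return hits / len(y)


def make_data(rng, spread, per_speaker=50):
    """Synthetic 2-D features for 4 hypothetical speakers (not real measurements)."""
    centres = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
    X, y = [], []
    for speaker, (cx, cy) in enumerate(centres):
        for _ in range(per_speaker):
            X.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
            y.append(speaker)
    return X, y


def run_demo(seed=0):
    """Compare held-out accuracy for a tightly clustered and a noisy feature set."""
    rng = random.Random(seed)
    results = {}
    for name, spread in (("tight", 0.3), ("noisy", 1.5)):
        X_train, y_train = make_data(rng, spread)
        X_test, y_test = make_data(rng, spread)
        weights = fit(X_train, y_train, n_classes=4, rng=rng)
        results[name] = accuracy(weights, X_test, y_test)
    return results


if __name__ == "__main__":
    for name, acc in run_demo().items():
        print(f"{name}: held-out accuracy = {acc:.2f}")
```

In a comparison of this kind, the same classifier is fitted once per feature set and the held-out accuracies are compared. Features with less between-speaker separation should yield lower accuracy, as the "noisy" set does here.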

Main Subjects

Phonetics and Phonology

ZABANPAZHUHI (Journal of Language Research)

Volume 15, Issue 49 - Serial Number 49
February 2024
Pages 61-82
