Comparative Analysis of Speech Rhythm Measures for Persian Speaker Identification: Duration vs. Intensity

Document Type: Research

Author

Assistant Professor of Linguistics, University of Isfahan, Isfahan, Iran

Abstract

Previous studies have demonstrated the efficacy of speech rhythm measures for speaker identification across languages with different phonotactic structures. For Persian in particular, two categories of speech rhythm metrics have been examined: duration-based and intensity-based. Building on this prior work, the current study examines more closely the discriminative power of the two measurement types, duration-based versus intensity-based, for Persian speakers. To this end, a multinomial logistic regression model was fitted to a dataset of 20 male Persian speakers, each reciting 100 sentences at a normal speaking pace. The findings show that duration-based measures outperform intensity-based ones in distinguishing between Persian speakers, although the advantage is very slight. This observation is significant because it sheds light on the suitability of specific rhythm metrics for Persian speaker identification. I postulate that the difference in performance may be attributed to the simple syllable structure of Persian and to its lesser reliance on intensity as a primary cue to lexical stress. This research offers insight into the choice of rhythm metrics for Persian speaker identification and underscores the importance of considering linguistic features when developing speaker recognition systems.
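To make the contrast between the two metric families concrete, the following is a minimal sketch of how one duration-based and one intensity-based feature vector might be computed for a single sentence. The abstract does not name the specific metrics used, so the choice of the normalized pairwise variability index (nPVI) and the variation coefficient (Varco) is an assumption made here for illustration, as are the function names, the use of the population standard deviation, and the input values, which are hypothetical and not data from the study.

```python
from statistics import mean, pstdev


def npvi(values):
    """Normalized pairwise variability index over successive intervals.

    Averages the absolute difference between neighbouring values,
    each normalized by the mean of the pair, and scales by 100.
    """
    if len(values) < 2:
        raise ValueError("nPVI needs at least two values")
    pairs = zip(values, values[1:])
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in pairs)


def varco(values):
    """Variation coefficient: 100 * standard deviation / mean.

    Assumption: population standard deviation; some studies may use
    the sample standard deviation instead.
    """
    return 100 * pstdev(values) / mean(values)


# Hypothetical measurements for one sentence (illustrative only).
vowel_durations_ms = [80, 120, 60, 100]   # duration-based input
syllable_peak_db = [70, 74, 68, 72]       # intensity-based input

duration_features = {
    "nPVI_V": npvi(vowel_durations_ms),
    "VarcoV": varco(vowel_durations_ms),
}
intensity_features = {
    "nPVI_peak": npvi(syllable_peak_db),
    "Varco_peak": varco(syllable_peak_db),
}

print(duration_features)
print(intensity_features)
```

Under this sketch, each sentence would yield one row of features labeled with its speaker, and the two feature sets could then be passed separately to a multinomial logistic regression classifier to compare how well each family separates the 20 speakers.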
