Acoustic Models for the Automatic Identification of Prosodic Boundaries in Spontaneous Speech / Modelos acústicos para a identificação automática de fronteiras prosódicas na fala espontânea

Bárbara Helohá Falcão Teixeira, Maryualê Malvessi Mittmann

Abstract


Abstract: This work presents the results of an analysis of multiple acoustic parameters aimed at building a model for the automatic segmentation of speech into tone units. Based on a literature review, we defined sets of acoustic parameters related to the signaling of terminal and non-terminal boundaries. For each parameter, we extracted a series of measurements: 6 for speech rate and rhythm; 34 for duration; 65 for fundamental frequency; 4 for intensity; and 2 related to pauses. These measurements were extracted from spontaneous speech fragments that had previously been segmented manually into tone units by 14 human annotators. We used two statistical classification methods, Random Forest (RF) and Linear Discriminant Analysis (LDA), to generate models for the identification of prosodic boundaries. After several phases of training and testing, both methods were relatively successful in identifying terminal and non-terminal boundaries. LDA predicted terminal and non-terminal boundaries more accurately than RF, so the model obtained with LDA was further refined. As a result, the terminal boundary model is based on 20 acoustic measurements and shows a convergence of 80% with the boundaries identified by the annotators in the speech sample. For non-terminal boundaries, we arrived at three models that, combined, showed a convergence of 98% with the boundaries identified by the annotators in the sample.

Keywords: speech segmentation; prosodic boundaries; spontaneous speech.

Resumo: Este trabalho apresenta os resultados da análise de múltiplos parâmetros acústicos para a construção de um modelo para a segmentação automática da fala em unidades tonais. A partir da investigação da literatura, definimos conjuntos de parâmetros acústicos relacionados à identificação de fronteiras terminais e não terminais. Para cada parâmetro, uma série de medidas foram extraídas: 6 medidas de taxa de elocução e ritmo; 34 de duração; 65 de frequência fundamental; 4 de intensidade e 2 medidas relativas às pausas. Tais parâmetros foram extraídos de fragmentos de fala espontânea previamente segmentada em unidades tonais de forma manual por 14 anotadores humanos. Utilizamos dois métodos de classificação estatística, Random Forest (RF) e Linear Discriminant Analysis (LDA), para gerar modelos de identificação de fronteiras prosódicas. Após diversas fases de treinamentos e testes, ambos os métodos apresentaram sucesso relativo na identificação de fronteiras terminais e não terminais. O método LDA apresentou maior índice de acerto na previsão de fronteiras terminais e não terminais do que o RF, portanto, o modelo obtido com este método foi refinado. Como resultado, o modelo para as fronteiras terminais baseia-se em 20 medidas acústicas e apresenta uma convergência de 80% em relação às fronteiras identificadas pelos anotadores na amostra de fala. Para as fronteiras não terminais, chegamos a três modelos que, combinados, apresentaram uma convergência de 98% em relação às fronteiras identificadas pelos anotadores na amostra.

Palavras-chave: segmentação da fala; fronteiras prosódicas; fala espontânea.
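
The abstract reports model quality as a "convergence" with the boundaries marked by the annotators, but it does not define the measure. The following is a minimal sketch of one possible reading, assuming convergence means the share of annotator-marked boundaries that the model also marks. The function name `convergence`, the per-position 0/1 label encoding, and the toy labels are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def convergence(predicted: Sequence[int], annotated: Sequence[int]) -> float:
    """Share of annotator-marked boundaries that the model also marks.

    Both sequences hold one label per candidate position (for example, per
    word end): 1 for a boundary, 0 for no boundary. This definition is an
    assumption made for illustration, not the authors' stated formula.
    """
    if len(predicted) != len(annotated):
        raise ValueError("sequences must have the same length")
    # Positions where the annotators placed a boundary.
    reference = [i for i, label in enumerate(annotated) if label == 1]
    if not reference:
        raise ValueError("no annotated boundaries to compare against")
    # Count the annotated boundaries that the model also predicted.
    hits = sum(1 for i in reference if predicted[i] == 1)
    return hits / len(reference)


# Toy labels invented for this example (not data from the study).
annotated = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]
predicted = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
score = convergence(predicted, annotated)
print(f"convergence: {score:.0%}")
```

Under this reading the measure behaves like recall: boundaries that the model adds where annotators marked none (index 3 in the toy data) do not lower the score, so a fuller evaluation would probably also report how many predicted boundaries are spurious.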





DOI: http://dx.doi.org/10.17851/2237-2083.26.4.1455-1488




Copyright (c) 2018 Bárbara Helohá F. Teixeira, Maryualê Malvessi Mittmann

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

e-ISSN 2237-2083

FAPEMIG grant #APL-00427-17 (2018-2019)