A Comparative Annotator-agreement Analysis of Emotional Speech Corpora

Piyawat Sukhummek; Jessada Karnjana; Sawit Kasuriya; Chai Wutiwiwatchai; Thanaruk Theeramunkong

Authors

Piyawat Sukhummek, Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5, Tiwanon Rd., Bangkadi, Muang, Pathum Thani, 12000, Thailand
Jessada Karnjana, National Science and Technology Development Agency, 112 Thailand Science Park, Phahonyothin Rd., Klong Luang, Pathum Thani, 12120, Thailand
Sawit Kasuriya, National Science and Technology Development Agency, 112 Thailand Science Park, Phahonyothin Rd., Klong Luang, Pathum Thani, 12120, Thailand
Chai Wutiwiwatchai, National Science and Technology Development Agency, 112 Thailand Science Park, Phahonyothin Rd., Klong Luang, Pathum Thani, 12120, Thailand
Thanaruk Theeramunkong, Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5, Tiwanon Rd., Bangkadi, Muang, Pathum Thani, 12000, Thailand

Keywords:

annotator-agreement analysis, inter-annotator reliability measurement, IEMOCAP corpus, EMOLA corpus, HMM-based emotion recognition

Abstract

This paper proposes three methods for filtering out emotionally ambiguous utterances: filtering based on first-label preference and majority vote, filtering based on full consensus, and filtering based on first-label preference and full consensus. We investigate two corpora: the Interactive Emotional Dyadic Motion Capture Database (IEMOCAP) and the Emotional Tagged Corpus on Lakorn (EMOLA). The first is an English-language corpus and the second is a Thai-language corpus; both are annotated by six annotators. We focus primarily on four emotions: anger, happiness, neutral, and sadness. The experimental results show that when emotionally ambiguous utterances are removed from a corpus by the proposed methods and the filtered corpora are used to train and test emotion recognition models, accuracy improves considerably compared with models trained and tested on the original corpora. In the best case, accuracy improves by 37.47 percent. The proposed methods also considerably improve the reliability of agreement among annotators.
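
The abstract names the filtering rules but does not define them, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes each annotator gives an ordered list of labels per utterance, that "first-label preference" means keeping only the first label in that list, that "majority vote" means more than half of the annotators, and that "full consensus" means every annotator agrees. Fleiss' kappa is used here as one common measure of inter-annotator reliability; whether the paper reports exactly this statistic is also an assumption. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

EMOTIONS = ("anger", "happiness", "neutral", "sadness")

def first_labels(annotations):
    """Assumed 'first-label preference': keep each annotator's first label only."""
    return [labels[0] for labels in annotations if labels]

def majority_label(labels):
    """Label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def consensus_label(labels):
    """Label chosen by every annotator, else None."""
    return labels[0] if labels and len(set(labels)) == 1 else None

def filter_corpus(corpus, decide):
    """Keep utterances whose first labels resolve to one of the four emotions."""
    kept = {}
    for utt_id, annotations in corpus.items():
        label = decide(first_labels(annotations))
        if label in EMOTIONS:
            kept[utt_id] = label
    return kept

def fleiss_kappa(ratings):
    """Fleiss' kappa for per-utterance label lists with equal numbers of raters."""
    if not ratings:
        raise ValueError("no ratings given")
    n_items, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each utterance needs the same number (>= 2) of raters")
    totals = Counter()
    p_bar = 0.0
    for labels in ratings:
        counts = Counter(labels)
        totals.update(counts)
        # Observed agreement among rater pairs for this utterance.
        agree = sum(c * c for c in counts.values()) - n_raters
        p_bar += agree / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the overall label proportions.
    p_e = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    if math.isclose(p_e, 1.0):
        return math.nan  # undefined when only one label is ever used
    return (p_bar - p_e) / (1.0 - p_e)

# Toy data (hypothetical): three annotators, ordered labels per annotator.
TOY_CORPUS = {
    "u1": [["anger"], ["anger"], ["anger"]],
    "u2": [["happiness"], ["happiness", "neutral"], ["neutral"]],
    "u3": [["sadness"], ["neutral"], ["anger"]],
    "u4": [["neutral"], ["neutral"], ["neutral", "sadness"]],
}

by_majority = filter_corpus(TOY_CORPUS, majority_label)
by_consensus = filter_corpus(TOY_CORPUS, consensus_label)
kappa_all = fleiss_kappa([first_labels(a) for a in TOY_CORPUS.values()])
kappa_kept = fleiss_kappa([first_labels(TOY_CORPUS[u]) for u in by_consensus])
print(by_majority, by_consensus, kappa_all, kappa_kept)
```

On this toy data the stricter rule keeps fewer utterances but yields higher agreement on what remains, which mirrors the trade-off the abstract describes between corpus size and label reliability.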

