Evaluation of implicit emotion in the message through emotional speech processing based on Mel-Frequency Cepstral Coefficient and Short-Time Fourier Transform features

Ravanbakhsh, Mahsa; Setayeshi, Saeed; Pedram, Mir Mohsen; Mirzaei, Azadeh

doi:10.30699/icss.22.2.71

Volume 22, Issue 2 (Summer 2020), Advances in Cognitive Sciences 2020, 22(2): 71-81

Ravanbakhsh M, Setayeshi S, Pedram MM, Mirzaei A. Evaluation of implicit emotion in the message through emotional speech processing based on Mel-Frequency Cepstral Coefficient and Short-Time Fourier Transform features. Advances in Cognitive Sciences 2020; 22(2): 71-81.
URL: http://icssjournal.ir/article-1-1082-en.html

Mahsa Ravanbakhsh¹, Saeed Setayeshi*², Mir Mohsen Pedram³, Azadeh Mirzaei⁴

1- PhD Student of Cognitive Linguistics, Institute for Cognitive Science Studies (ICSS), Tehran, Iran
2- Associate Professor, Department of Physics and Energy Engineering, Amirkabir University of Technology, Tehran, Iran
3- Associate Professor, Department of Electrical and Computer Engineering, Kharazmi University, Tehran, Iran
4- Assistant Professor of Linguistics, Department of Linguistics, Faculty of Persian Literature and Foreign Languages, Allameh Tabataba'i University, Tehran, Iran

Abstract:

Introduction: Speech is the most effective way to exchange information. Beyond the words and grammatical content of an utterance, a speaker's voice carries additional information such as age, gender, and emotional state. Many studies have examined the emotional content of speech using various approaches. These studies show that the emotional content of speech is dynamic, which makes it difficult to extract the emotion hidden in speech. This study aimed to evaluate the implicit emotion in a message through emotional speech processing using Mel-Frequency Cepstral Coefficient (MFCC) and Short-Time Fourier Transform (STFT) features.
Methods: The input data came from the Berlin Emotional Speech Database, which covers seven emotional states: anger, boredom, disgust, anxiety/fear, happiness, sadness, and neutral. The audio files of the database were read into MATLAB, and the MFCC and STFT features were extracted. For each method, feature vectors were calculated from seven statistical values: minimum, maximum, mean, standard deviation, median, skewness, and kurtosis. These vectors were then used as input to an Artificial Neural Network. Finally, emotional states were recognized by training the network with training functions based on different algorithms.
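As an illustration of the statistical summarization step, the following is a minimal sketch and not the authors' MATLAB implementation. It assumes that the seven statistics are computed for each STFT frequency bin across frames and then concatenated; the abstract does not say how the values were aggregated. The function names, frame length, hop size, and synthetic test signal are all hypothetical, and skewness and kurtosis use the population (biased) moment definitions, with kurtosis not offset by 3.

```python
import cmath
import math
import statistics


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT: adequate for a sketch, too slow for real recordings.
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def seven_stats(values):
    """Minimum, maximum, mean, standard deviation, median, skewness, kurtosis."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        skew = kurt = 0.0  # assumption: report 0 for constant input
    else:
        n = len(values)
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return [min(values), max(values), mean, std,
            statistics.median(values), skew, kurt]


def feature_vector(signal, frame_len=64, hop=32):
    """Concatenate the seven statistics of each frequency bin across frames."""
    frames = stft_magnitudes(signal, frame_len, hop)
    vector = []
    for k in range(len(frames[0])):
        track = [frame[k] for frame in frames]  # one bin followed over time
        vector.extend(seven_stats(track))
    return vector


# Synthetic stand-in for an utterance: a rising chirp, 512 samples at 8 kHz.
chirp = [math.sin(2 * math.pi * (500 + 2 * t) * t / 8000) for t in range(512)]
features = feature_vector(chirp)
print(len(features))
```

With these hypothetical settings the signal yields 15 frames of 33 bins each, so the vector holds 33 × 7 = 231 values. The same `seven_stats` helper could be applied to MFCC tracks in place of STFT bins, and the resulting vector would then serve as the input to a classifier.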
Results: Emotional states were recognized with higher average accuracy and greater robustness using STFT features than using MFCC features. Anger and sadness were recognized at higher rates than the other emotions.
Conclusion: STFT features proved better than MFCC features for extracting implicit emotion from speech.

Keywords: Emotional speech, Emotion recognition, Short-time Fourier transform, Mel-frequency cepstral coefficients, Emotional speech processing

Type of Study: Research
Received: 2019/04/28 | Accepted: 2019/12/10 | Published: 2020/06/30

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
