Volume 21, Issue 4 (Winter 2019) | Advances in Cognitive Sciences 2019, 21(4)

Farhoudi Z, Setayeshi S, Razazi F, Rabiee A. Emotion Recognition Based on Multimodal Fusion Using Mixture of Brain Emotional Learning. Advances in Cognitive Sciences. 2019; 21 (4)
URL: http://icssjournal.ir/article-1-1067-en.html
1- Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2- Department of Energy Engineering and Physics, Amirkabir University of Technology, Tehran, Iran
3- Department of Electrical and Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
4- Department of Computer Science, Dolatabad Branch, Islamic Azad University, Isfahan, Iran
Abstract:
Introduction: Multimodal emotion recognition from video, which must combine information from different sensory sources (modalities), poses many challenges and has attracted many researchers as a new approach to human-computer interaction. The purpose of this paper is to recognize emotion automatically from emotional speech and facial expressions, based on the neural mechanisms of the brain.

Method: Drawing on studies of brain-inspired models, we present a general framework for bimodal emotion recognition inspired by the functionality of the auditory and visual cortices and the brain's limbic system. The proposed hybrid, hierarchical model consists of two learning phases: first, deep learning models build the audio and visual feature representations; second, a Mixture of Brain Emotional Learning (MoBEL) model fuses the audio-visual information obtained in the first phase. For the visual representation, a 3D convolutional neural network (3D-CNN) learns both the spatial relationships between pixels and the temporal relationships between video frames. For the audio representation, the speech signal is first converted to a log Mel-spectrogram image and then fed to a convolutional neural network (CNN). Finally, the features from these two streams are given to the MoBEL neural network, which improves the performance of the emotion recognition system by modeling the correlation between the visual and auditory modalities and fusing the information at the feature level.

Results: The proposed method achieves an average emotion recognition accuracy of 82% on video from the eNTERFACE'05 database.

Conclusion: Experimental results on this database show that the proposed method outperforms hand-crafted feature extraction methods and other fusion models for emotion recognition.
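To illustrate the audio front end described in the Method section, the sketch below computes a log Mel-spectrogram from a raw waveform using only NumPy. This is a minimal, self-contained illustration of the general technique; all parameter values (sample rate, FFT size, hop length, number of Mel bands) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Return a (frames x mel-bands) log Mel-spectrogram. Illustrative parameters."""
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular Mel filterbank spanning 0 .. sr/2, spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    mel = power @ fb.T
    # Log compression, with a small floor to avoid log(0)
    return np.log(mel + 1e-10)

# Example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (n_frames, n_mels)
```

In the pipeline described above, the resulting time-by-frequency image would then be fed to a CNN for audio feature learning, in parallel with the 3D-CNN visual stream.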

Received: 2020/01/16 | Accepted: 2020/01/16 | Published: 2020/01/16
