1. Swain M, Routray A, Kabisatpathy P. Databases, features and classifiers for speech emotion recognition: a review. International Journal of Speech Technology, 2018, 21(1): 93–120 DOI:10.1007/s10772-018-9491-z
2. Vryzas N, Kotsakis R, Liatsou A, Dimoulas C, Kalliris G. Speech emotion recognition for performance interaction. Journal of the Audio Engineering Society, 2018, 66(6): 457–467 DOI:10.17743/jaes.2018.0036
3. Kotsakis R, Dimoulas C, Kalliris G, Veglis A. Emotional prediction and content profile estimation in evaluating audiovisual mediated communication. International Journal of Monitoring and Surveillance Technologies Research, 2014, 2(4): 62–80 DOI:10.4018/ijmstr.2014100104
4. Gideon J, McInnis M, Mower Provost E. Improving cross-corpus speech emotion recognition with adversarial discriminative domain generalization (ADDoG). IEEE Transactions on Affective Computing, 2019: 1 DOI:10.1109/taffc.2019.2916092
5. Schuller B, Vlasenko B, Eyben F, Wollmer M, Stuhlsatz A, Wendemuth A, Rigoll G. Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Transactions on Affective Computing, 2010, 1(2): 119–131 DOI:10.1109/t-affc.2010.8
6. Ntalampiras S. Toward language-agnostic speech emotion recognition. Journal of the Audio Engineering Society, 2020, 68(1/2): 7–13 DOI:10.17743/jaes.2019.0045
7. Sahu S, Gupta R, Espy-Wilson C. On enhancing speech emotion recognition using generative adversarial networks. In: Interspeech 2018. ISCA, 2018 DOI:10.21437/interspeech.2018-1883
8. Bao F, Neumann M, Vu N T. CycleGAN-based emotion style transfer as data augmentation for speech emotion recognition. In: Interspeech 2019. ISCA, 2019, 35–37 DOI:10.21437/interspeech.2019-2293
9. Han J, Zhang Z, Ren Z, Ringeval F, Schuller B. Towards conditional adversarial training for predicting emotions from speech. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2018, 6822–6826 DOI:10.1109/ICASSP.2018.8462579
10. Salamon J, Bello J P. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 2017, 24(3): 279–283 DOI:10.1109/lsp.2017.2657381
11. Abdelwahab M, Busso C. Domain adversarial for acoustic emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(12): 2423–2435 DOI:10.1109/taslp.2018.2867099
12. Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 2011, 22(2): 199–210 DOI:10.1109/tnn.2010.2091281
13. Ververidis D, Kotropoulos C. Emotional speech recognition: resources, features, and methods. Speech Communication, 2006, 48(9): 1162–1181 DOI:10.1016/j.specom.2006.04.003
14. Fayek H M, Lech M, Cavedon L. Evaluating deep learning architectures for speech emotion recognition. Neural Networks, 2017, 92: 60–68 DOI:10.1016/j.neunet.2017.02.013
15. Schuller B, Vlasenko B, Eyben F, Rigoll G, Wendemuth A. Acoustic emotion recognition: a benchmark comparison of performances. In: 2009 IEEE Workshop on Automatic Speech Recognition & Understanding. 2009, 552–557 DOI:10.1109/ASRU.2009.5372886
16. Jeon J H, Xia R, Liu Y. Sentence level emotion recognition based on decisions from subsentence segments. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2011, 4940–4943 DOI:10.1109/ICASSP.2011.5947464
17. Vryzas N, Vrysis L, Matsiola M, Kotsakis R, Dimoulas C, Kalliris G. Continuous speech emotion recognition with convolutional neural networks. Journal of the Audio Engineering Society, 2020, 68(1/2): 14–24
18. Zhang S, Huang T, Gao W. Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching. IEEE Transactions on Multimedia, 2018, 20(6): 1576–1590 DOI:10.1109/TMM.2017.2766843
19. Vrysis L, Tsipas N, Thoidis I, Dimoulas C. 1D/2D deep CNNs vs. temporal feature integration for general audio classification. Journal of the Audio Engineering Society, 2020, 68(1/2): 66–77
20. Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. MIT Press, 2014, 2672–2680
21. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 2016, 17(1): 2096–2030
22. Schuller B, Steidl S, Batliner A, Burkhardt F, Devillers L, Müller C, Narayanan S. The INTERSPEECH 2010 paralinguistic challenge. In: Interspeech 2010. ISCA, 2010, 2794–2797
23. Eyben F, Wöllmer M, Schuller B. openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy, Association for Computing Machinery, 2010, 1459–1462 DOI:10.1145/1873951.1874246
24. Qian K, Zhang Y, Chang S, Cox D, Hasegawa-Johnson M. Unsupervised speech decomposition via triple information bottleneck. In: Proceedings of the 37th International Conference on Machine Learning (ICML). 2020
25. Metallinou A, Wollmer M, Katsamanis A, Eyben F, Schuller B, Narayanan S. Context-sensitive learning for enhanced audiovisual emotion classification. IEEE Transactions on Affective Computing, 2012, 3(2): 184–198 DOI:10.1109/T-AFFC.2011.40
26. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014
27. Vinyals O, Kaiser L, Koo T, Petrov S, Sutskever I, Hinton G. Grammar as a foreign language. In: Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2. Montreal, Canada, MIT Press, 2015, 2773–2781
28. Busso C, Bulut M, Lee C-C, Kazemzadeh A, Mower E, Kim S, Chang J N, Lee S, Narayanan S S. IEMOCAP: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, 42(4): 335 DOI:10.1007/s10579-008-9076-6
29. The selected speech emotion database of Institute of Automation, Chinese Academy of Sciences (CASIA). http://www.datatang.com/data/39277
30. Busso C, Parthasarathy S, Burmania A, AbdelWahab M, Sadoughi N, Provost E M. MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 2017, 8(1): 67–80 DOI:10.1109/TAFFC.2016.2515617
31. Li H, Tu M, Huang J, Narayanan S, Georgiou P. Speaker-invariant affective representation learning via adversarial training. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020, 7144–7148 DOI:10.1109/ICASSP40776.2020.9054580
32. Ghosh S, Laksana E, Morency L-P, Scherer S. Representation learning for speech emotion recognition. In: Interspeech 2016. ISCA, 2016, 3603–3607
33. Xu Y, Xu H, Zou J. HGFM: A hierarchical grained and feature model for acoustic emotion recognition. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020, 6499–6503 DOI:10.1109/ICASSP40776.2020.9053039