Encoding Temporal Information for Automatic Depression Recognition from Facial Analysis

Wheidima Carneiro De Melo, Eric Granger, Miguel Bordallo Lopez

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Depression is a mental illness that may be harmful to an individual's health. Using deep learning models to recognize the facial expressions of individuals captured in videos has shown promising results for automatic depression detection. Typically, depression levels are recognized using 2D Convolutional Neural Networks (CNNs) that are trained to extract static features from video frames, which impairs the capture of dynamic spatio-temporal relations. As an alternative, 3D CNNs may be employed to extract spatio-temporal features from short video clips, although the risk of overfitting increases due to the limited availability of labeled depression video data. To address these issues, we propose a novel temporal pooling method to capture and encode the spatio-temporal dynamics of video clips into an image map. This approach allows fine-tuning a pre-trained 2D CNN to model facial variations, thereby improving the training process and model accuracy. Our proposed method is based on a two-stream model that performs late fusion of appearance and dynamic information. Extensive experiments on two benchmark AVEC datasets indicate that the proposed method is efficient and outperforms state-of-the-art schemes.
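The abstract does not specify the pooling function, so the following is only a minimal sketch of the general idea: collapsing a clip into a single image map with a temporal weighting, then fusing the scores of two streams. The linear weights (2t - T - 1), the names `temporal_weights`, `encode_clip` and `late_fusion`, and the equal fusion weight are all assumptions made for illustration; they are not the authors' method.

```python
from typing import List, Sequence

# One grayscale frame: rows of pixel intensities (hypothetical representation).
Frame = List[List[float]]


def temporal_weights(num_frames: int) -> List[float]:
    """Assumed linear weights alpha_t = 2t - T - 1 for t = 1..T.

    The weights sum to zero, so static content cancels out and only
    changes over time remain in the pooled map.
    """
    return [2.0 * t - num_frames - 1.0 for t in range(1, num_frames + 1)]


def encode_clip(clip: Sequence[Frame]) -> Frame:
    """Collapse a clip of T equally sized frames into one image map."""
    weights = temporal_weights(len(clip))
    height, width = len(clip[0]), len(clip[0][0])
    return [
        [
            sum(w * frame[r][c] for w, frame in zip(weights, clip))
            for c in range(width)
        ]
        for r in range(height)
    ]


def late_fusion(appearance_score: float, dynamic_score: float,
                weight: float = 0.5) -> float:
    """Late fusion as a weighted average of the two streams' predictions.

    The 0.5 default is an assumption; the abstract gives no fusion rule.
    """
    return weight * appearance_score + (1.0 - weight) * dynamic_score


if __name__ == "__main__":
    static_clip = [[[5.0, 5.0]] for _ in range(3)]  # nothing changes
    moving_clip = [[[0.0]], [[1.0]], [[2.0]]]       # pixel brightens over time
    print(encode_clip(static_clip))
    print(encode_clip(moving_clip))
    print(late_fusion(10.0, 20.0))
```

In a pipeline of this kind, the resulting image map would be fed to a pre-trained 2D CNN as the dynamic stream, while a representative frame feeds the appearance stream; the two predicted scores are combined only at the end.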

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: IEEE Institute of Electrical and Electronic Engineers
Pages: 1080-1084
Number of pages: 5
ISBN (Electronic): 978-1-5090-6631-5
DOIs: https://doi.org/10.1109/ICASSP40776.2020.9054375
Publication status: Published - May 2020
MoE publication type: A4 Article in a conference publication
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Series: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN: 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country: Spain
City: Barcelona
Period: 4/05/20 to 8/05/20

Keywords

  • Affective Computing
  • Depression Detection
  • Expression Recognition
  • Temporal Pooling
  • Two-stream Model

  • Cite this

    De Melo, W. C., Granger, E., & Lopez, M. B. (2020). Encoding Temporal Information for Automatic Depression Recognition from Facial Analysis. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 1080-1084). IEEE Institute of Electrical and Electronic Engineers. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Vol. 2020-May. https://doi.org/10.1109/ICASSP40776.2020.9054375