Depression recognition from facial videos: Preprocessing and scheduling choices hide the architectural contributions

  • Manuel Lage Cañellas*
  • Constantino Álvarez Casado
  • Le Nguyen
  • Miguel Bordallo López

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review

2 Citations (Scopus)

Abstract

Deep learning models have been widely applied in video-based depression detection. However, the diversity of preprocessing, data augmentation, and optimization techniques makes it difficult to compare model architectures fairly. In this study, a standard ResNet-50 model is enhanced with specific face alignment methods, improved data augmentation, optimization, and scheduling techniques. Extensive experiments on two popular benchmark datasets (AVEC2013 and AVEC2014) yield single-stream results that are competitive with sophisticated spatio-temporal models. Moreover, a score-level fusion approach based on two texture streams outperformed the state-of-the-art methods, achieving mean square errors of 5.82 and 5.50 on AVEC2013 and AVEC2014, respectively. These findings suggest that preprocessing and training configurations account for noticeable improvements that had originally been attributed to network architectures.

Original language: English
Article number: e12992
Number of pages: 4
Journal: Electronics Letters
Volume: 59
Issue number: 20
Publication status: Published - 22 Oct 2023
MoE publication type: A1 Journal article-refereed

Keywords

  • cameras
  • learning (artificial intelligence)
