TY - JOUR
T1 - Facial expression analysis using Decomposed Multiscale Spatiotemporal Networks
AU - de Melo, Wheidima Carneiro
AU - Granger, Eric
AU - Lopez, Miguel Bordallo
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/2
Y1 - 2024/2
AB - Video-based analysis of facial expressions has been increasingly applied to infer health states of individuals, such as depression and pain. Among the existing approaches, deep learning models composed of structures for multiscale spatiotemporal processing have shown strong potential for encoding facial dynamics. However, such models have high computational complexity, which makes these solutions difficult to deploy. To address this issue, we introduce a new technique to decompose the extraction of multiscale spatiotemporal features. Specifically, we present a building block called Decomposed Multiscale Spatiotemporal Network (DMSN), along with three variants: the DMSN-A, DMSN-B, and DMSN-C blocks. The DMSN-A block generates multiscale representations by analyzing spatiotemporal features at multiple temporal ranges, while the DMSN-B block analyzes spatiotemporal features at multiple ranges, and the DMSN-C block analyzes spatiotemporal features at multiple spatial sizes. Using these variants, we design our DMSN architecture, which can explore a variety of multiscale spatiotemporal features, favoring adaptation to different facial behaviors. Our extensive experiments on challenging datasets show that the DMSN-C block is effective for depression detection, whereas the DMSN-A block is efficient for pain estimation. Results also indicate that our DMSN architecture achieves competitive performance while requiring 3.51× and 26.55× fewer parameters than the current state-of-the-art models for depression detection and pain estimation, respectively. The code is publicly available at https://github.com/wheidima/DMSN.
KW - Convolutional Neural Networks
KW - Deep learning
KW - Depression detection
KW - Facial expression analysis
KW - Pain estimation
UR - http://www.scopus.com/inward/record.url?scp=85170570242&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.121276
DO - 10.1016/j.eswa.2023.121276
M3 - Article
AN - SCOPUS:85170570242
SN - 0957-4174
VL - 236
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 121276
ER -