Abstract
The growth of labeled data for remote healthcare analysis lags far behind the rapid expansion of raw data, creating a significant bottleneck. To address this, we propose a multimodal self-supervised learning (SSL) framework for 1D signals that leverages unlabeled physiological data. Our architecture fuses heart and respiration waveforms from three sensors – mmWave radar, RGB camera, and depth camera – processing and augmenting each modality separately, and then uses contrastive learning to extract robust features from the data. This architecture enables effective downstream task training with reduced labeled data, even in scenarios where certain sensors or modalities are unavailable. We validate our approach on the OMuSense-23 multimodal biometric dataset and evaluate its performance on tasks such as breathing pattern recognition and physiological classification. Our results show that the models perform comparably to fully supervised methods when using large amounts of labeled data and outperform them when using only a small percentage. In particular, with 1% of the labels, the model achieves 64% accuracy in breathing pattern classification, compared to 24% with a fully supervised approach. This work highlights the scalability and adaptability of self-supervised learning for physiological monitoring, making it particularly valuable for healthcare and well-being applications with limited labels or sensor availability. The code is publicly available at: https://gitlab.com/manulainen/ssl-physiological.
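To make the contrastive step concrete: frameworks of this kind typically train by pulling embeddings of two augmented views of the same signal together while pushing apart embeddings of different signals. Below is a minimal sketch of one widely used such objective, the SimCLR-style NT-Xent loss, written in NumPy. This is an illustration of the general technique, not the paper's exact loss; the function name, batch shapes, and temperature value are assumptions for the example.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent contrastive loss (illustrative sketch).

    z1, z2: (N, d) arrays of embeddings for two augmented views of the
    same N signals (e.g. two augmentations of a respiration waveform).
    Row i of z1 and row i of z2 form a positive pair; every other row
    in the combined batch is a negative.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise rows
    sim = z @ z.T / temperature                        # cosine similarities
    n = len(z1)
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # The positive for index i is its other view at index (i + n) mod 2n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

As a sanity check, feeding two identical views (perfectly aligned embeddings) should yield a lower loss than feeding two unrelated random batches, since the positive-pair similarity then dominates the softmax.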
| Field | Value |
|---|---|
| Original language | English |
| Article number | 103397 |
| Journal | Information Fusion |
| Volume | 124 |
| DOIs | |
| Publication status | Published - Dec 2025 |
| MoE publication type | A1 Journal article-refereed |
Funding
The research was supported by the Research Council of Finland (former Academy of Finland) 6G Flagship Programme (Grant Number: 346208), and Infotech Oulu. The authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.
Keywords
- Breathing pattern recognition
- Physiological analysis
- Self-supervised learning