Abstract
Voice pathology analysis plays an important role in identifying vocal disorders, but traditional diagnostic methods are expensive, time-consuming, and subjective. This study proposed the automated identification of normal and pathological voices using the Arabic Voice Pathology Database (AVPD). It evaluated three approaches: a Support Vector Machine (SVM), a hybrid deep learning model, and a transfer learning model. All models were trained on Mel spectrogram features extracted from the AVPD voice recordings. The transfer learning model performed best, with an accuracy of 96.88%, a precision of 0.96 and 0.98, and a recall of 0.98 and 0.96 for normal and pathological voices, respectively, giving an F1 score of 0.97 for both classes. The hybrid model achieved an accuracy of 92.71%. The SVM model achieved an accuracy of 86.46% and had the weakest classification metrics of the three. The deep learning models, particularly the transfer learning model, outperformed the SVM across all evaluation metrics. The proposed transfer learning model achieved a 1.53% increase in accuracy over state-of-the-art approaches to identifying voice disorders. The proposed solution has applications in medical diagnosis, where it addresses the cost, time, and subjectivity issues of traditional approaches to identifying vocal disorders.
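The F1 scores reported in the abstract can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The sketch below is only an arithmetic check on the published per-class figures. The helper name `f1_score` and the dictionary layout are illustrative assumptions, not part of the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall for the transfer learning model,
# copied from the abstract (the dictionary layout is an assumption).
reported = {
    "normal": {"precision": 0.96, "recall": 0.98},
    "pathological": {"precision": 0.98, "recall": 0.96},
}

f1_by_class = {
    label: f1_score(m["precision"], m["recall"])
    for label, m in reported.items()
}

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.2f}")
```

Both classes round to 0.97, which agrees with the F1 score stated in the abstract.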
| Field | Value |
|---|---|
| Original language | English |
| Article number | 909 |
| Journal | Signal, Image and Video Processing |
| Volume | 19 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - Nov 2025 |
| MoE publication type | A1 Journal article-refereed |
Funding
The study is funded by the VTT-Technical Research Centre of Finland Ltd. This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R760), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend their appreciation to Northern Border University, Saudi Arabia, for supporting this work through project number (NBU-CRP-2025-2225).
Fingerprint
Dive into the research topics of 'Voice pathology identification using mel spectrogram features and deep learning'. Together they form a unique fingerprint.