Optimization of Multimodal Deep Learning for Depression Detection
Aditiya Hermawan(1*), Benny Daniawan(2), Edy Edy(3), Joese Nathaniel(4)
(1) Universitas Buddhi Dharma
(2) Universitas Buddhi Dharma
(3) Universitas Buddhi Dharma
(4) Universitas Buddhi Dharma
(*) Corresponding Author
Copyright (c) 2025 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.