Developing a Hybrid Model for Detecting Depression Cases Based on the Speech Signal
Keywords:
Speech signal analysis, machine learning, depression detection, clinical depression, audio databases, feature extraction, Gammatone Cepstral Coefficients (GTCC), XGBoost algorithm.
Abstract
Depression is one of the most prevalent mental health disorders; it adversely affects quality of life and is associated with significant health risks. Given the limitations of traditional diagnostic methods, artificial intelligence techniques have emerged as promising tools for the early detection of depression through the analysis of biomarkers such as voice. In this study, a system for detecting clinical depression from Arabic speech signals was developed, with a focus on integrating contextual data and improving diagnostic accuracy.
An Arabic-Audio Database for Depression (AADD) was created, comprising recordings from 73 volunteers. Their voices were recorded in a professional environment while they read texts with positive, negative, and neutral emotional content. In addition to the audio recordings, the database includes demographic information (age, gender, residence, smoking status) and Beck Depression Inventory (BDI) scores, which serve as the gold standard for diagnosis. Audio features were extracted with the Gammatone Cepstral Coefficients (GTCC) algorithm, supplemented by further audio features such as shimmer and duration, to capture patterns associated with mental states.
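The two supplementary features named above can be illustrated with a minimal sketch. The abstract does not state which shimmer variant was used, so this assumes the common "local shimmer" definition (mean absolute difference between the peak amplitudes of consecutive glottal cycles, divided by the mean peak amplitude). The function names, the amplitude values, and the 16 kHz sample rate are hypothetical and chosen only for illustration; GTCC extraction is not shown.

```python
from statistics import mean


def local_shimmer(peak_amplitudes):
    """Local shimmer (assumed variant): mean absolute difference between
    consecutive cycle peak amplitudes, divided by the mean peak amplitude."""
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes[:-1], peak_amplitudes[1:])]
    return mean(diffs) / mean(peak_amplitudes)


def duration_seconds(num_samples, sample_rate_hz):
    """Duration of a recording given its sample count and sample rate."""
    return num_samples / sample_rate_hz


# Hypothetical per-cycle peak amplitudes from one voiced segment.
amps = [0.50, 0.52, 0.49, 0.51]
shimmer = local_shimmer(amps)

# Hypothetical recording: 48,000 samples at an assumed 16 kHz sample rate.
duration = duration_seconds(48000, 16000)
```

In practice the per-cycle amplitudes would come from a pitch-marking step on the recorded speech, which is outside the scope of this sketch.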
The proposed system used the XGBoost algorithm for classification, chosen for its efficiency in handling multi-dimensional data and its ability to mitigate the risk of overfitting. To assess the model's reliability, it was evaluated on two datasets: the AADD database and MODMA, an international standard dataset of Chinese speech. Multiple evaluation metrics were used, including accuracy, precision, recall, F1-score, cross-validation accuracy, and total error rate, with an 80/20 split between training and testing data in both cases.
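The evaluation protocol described above can be sketched in plain Python. This is an illustrative sketch, not the study's code: the helper names `split_80_20` and `binary_metrics` are hypothetical, the shuffle seed and the labels are made up, and splitting at the level of the 73 speakers is an assumption (the abstract does not say whether the split was per speaker or per recording). The XGBoost classifier itself and cross-validation are not shown.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% testing (assumed protocol)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(0.8 * len(items)))
    return items[:cut], items[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1-score, and total error rate
    computed from the binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - accuracy,
    }


# Hypothetical speaker IDs for 73 volunteers, split 80/20.
train_ids, test_ids = split_80_20(range(73))

# Made-up labels (1 = depressed, 0 = not depressed) to exercise the metrics.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Total error rate is taken here as one minus accuracy, which is the usual reading of the term; the abstract does not define it explicitly.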
The results demonstrated that the system achieved an accuracy of 77% on the AADD dataset and 85% on the MODMA dataset, highlighting the impact of linguistic and cultural diversity on performance. Feature analysis revealed that shimmer and duration were significant indicators of depression. This study underscores the potential of machine learning in psychological diagnosis and emphasizes the importance of developing localized databases to generalize models across diverse cultural contexts.