Evaluating the effect of Parkinson's disease on jitter and shimmer speech features
Hamid Azadi1, Mohammad-R Akbarzadeh-T1, Ali Shoeibi2, Hamid Reza Kobravi3
1 Department of Electrical Engineering, Biomedical Engineering Group, Center of Excellence on Soft Computing and Intelligent Information Processing, Ferdowsi University of Mashhad, Mashhad, Iran
2 Department of Neurology, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
3 Department of Biomedical Engineering, Islamic Azad University of Mashhad, Mashhad, Iran
|Date of Submission||15-Aug-2021|
|Date of Decision||22-Aug-2021|
|Date of Acceptance||31-Aug-2021|
|Date of Web Publication||25-Dec-2021|
Dr. Mohammad-R Akbarzadeh-T
Department of Electrical Engineering, Center of Excellence on Soft Computing and Intelligent Information Processing, Ferdowsi University of Mashhad, Azadi Square, Mashhad
Source of Support: None, Conflict of Interest: None
Background: Parkinson's disease (PD) is a neurological disorder caused by decreasing dopamine in the brain. Speech is one of the first functions to be disrupted. Accordingly, speech features are a promising indicator for PD diagnosis in telemedicine applications. The purpose of this study is to investigate the impact of PD on a minimal set of Jitter and Shimmer voice indicators and to study the difference between male and female speech features in noisy and noiseless environments. Materials and Methods: Our data include voice samples from 47 participants recruited from nursing homes and neurology clinics, comprising 23 patients and 24 healthy individuals. The optimal feature for each category is studied separately for the men's and women's samples. The focus here is on phonation, in which the participants sustain the vowel /a/. The main features, including Jitter and Shimmer perturbations, are extracted. To find an optimal pair under both noisy and noiseless circumstances, we use the Relief feature selection strategy. Results: The selected Jitter feature for men and women with PD is 21 and 33.4, respectively, while the selected Shimmer feature is 0.1 and 0.06. In addition, by using these two features alone, we reach a correct diagnosis rate of 79% and 81% for the noisy and noiseless states, respectively. Conclusion: The effects of PD on the speech features can be accurately identified. Evaluating the extracted features suggests that the absolute value of the selected feature is higher in men with PD than in healthy men, whereas the opposite holds for women.
Keywords: Classification, dysphonia, Parkinson disease, phonation, speech disorders
|How to cite this article:|
Azadi H, Akbarzadeh-T MR, Shoeibi A, Kobravi HR. Evaluating the effect of Parkinson's disease on jitter and shimmer speech features. Adv Biomed Res 2021;10:54
|How to cite this URL:|
Azadi H, Akbarzadeh-T MR, Shoeibi A, Kobravi HR. Evaluating the effect of Parkinson's disease on jitter and shimmer speech features. Adv Biomed Res [serial online] 2021 [cited 2022 Jan 23];10:54. Available from: https://www.advbiores.net/text.asp?2021/10/1/54/333577
| Introduction|| |
Parkinson's disease (PD) is considered the second most common neurodegenerative disease after Alzheimer's. It is a progressive disease traditionally diagnosed by movement symptoms such as muscle tremors, stiffness and slowness of movement, and imbalance when walking. The disease results from the loss of cells in the substantia nigra, in the midbrain, that produce a substance called dopamine. Pathologically, the cause of the death of these dopamine-producing cells is unknown, and the disease usually affects older people. So far, no definitive cure has been found for PD, and most existing methods only slow its progression rather than cure it. Therefore, diagnosis in the early stages of this disease can be very effective in improving the patient's quality of life.
Researchers have proposed many non-invasive methods to diagnose PD, and among them, special attention has been paid to the acoustic analysis of voice signals. Most people with Parkinson's have a type of voice disorder called hypokinetic dysarthria. Dysarthria is a type of speech disorder that occurs due to damage to the central or peripheral nervous system and the resulting disturbances in the muscular control of the speech mechanism. This disorder may affect breathing, vocalization, amplification, production, and speech. It makes the person's voice incomprehensible, slower, monotonous, and harsh. The aspects of speech affected by PD include phonation, prosody, and articulation. Most studies focus only on phonation and examine the sustained vowel /a/, since it is the most straightforward and unadorned sound to produce and much useful medical information can be obtained from it. Physiologically, a subtle combination of muscles in the vocal cords is involved in producing /a/. Therefore, if there is any neurological defect, the probability of detecting it is increased. Furthermore, when producing the vowel /a/, the mouth is much more open than for other vowels, which causes a minimal return of air to the vocal cords.
In clinical applications, movement disorder specialists are responsible for diagnosing PD in the early stages, which is usually done by assessing a criterion called the Unified Parkinson's Disease Rating Scale (UPDRS). Recent studies, however, have introduced speech analysis as a cost-effective, targeted, and fully accessible approach that can effectively screen patients with PD. According to Tetrud, changes in speech appear several years before the definitive diagnosis of PD. Therefore, voice changes are considered an attractive basis for the initial diagnosis and for tracking the progression of PD. A wide range of speech tests, including syllable expression, sustained phonation, and various passage readings, are designed to assess the occurrence of these speech disorders. In particular, several studies have investigated phonation features to distinguish people with Parkinson's (PWP) from healthy individuals. Azadi et al. proposed a new hybrid method called Safir, which uses a combination of type-2 fuzzy logic and AHP to select features that are approved by different feature selection criteria. They achieved an accuracy of about 90% in noisy conditions using ten of the most prominent voice features from among 339 acoustic parameters. Benmalek et al. divided patients based on their severity and considered different classifiers to reach 93% accuracy in separation. Tsanas et al. compared several feature selection methods and found that Relief gave the best performance in this field.
Despite the existence of numerous such methods for analyzing changes in the voice, researchers still face several issues. These include differences between acoustically controlled and conventional environments, as well as differences in the quality of sound recorded by professional microphones and by telephone lines. Furthermore, evaluating a large number of acoustic parameters deteriorates the classifier's performance. Hence, the selection of the optimal feature set is considered another critical issue. Considering the possibility of remote telemedicine and diagnosis, the remaining question is how to find the “minimal” set of features that gives the least computational cost and the most reliable diagnosis.
Sustained phonation is less affected by dialect and linguistic structures. Therefore, in the present study, the phonation of vowels is investigated. In addition, because changes in amplitude and frequency have been observed in patients with Parkinson's, we focus on these two categories of features and introduce two optimal features. Furthermore, with a simple support vector machine (SVM) classifier, we determine the diagnostic accuracy of each feature so that they can be used more efficiently in diagnostic applications.
| Materials and Methods|| |
In summary, the proposed methodology has four main stages consisting of data acquisition, feature extraction, feature selection, and evaluation of the classifier performance in noise-free and noisy conditions. These steps are explained in [Figure 1].
The present study is a descriptive-analytical cross-sectional study that compares healthy individuals with Parkinson's patients. The required samples were collected from Khorasan Razavi Welfare Elderly Care Centers and from neurologists' clinical offices. The inclusion criteria for PWP were as follows: a physician's diagnosis of PD, Persian monolingualism, and the absence of dementia or other mental problems. Furthermore, in this study, PD severity is divided into mild, moderate, and severe, as illustrated in part B of [Figure 2]. Voice samples were recorded using Audacity software installed on a laptop. Each participant was given a complete explanation of how to perform the experiment and how to pronounce the vowel /a/ before the test began. After one or two practice trials, several (4 or 5) voice samples were recorded for each participant.
|Figure 2: (a) Age distribution in healthy subjects and PWP, and (b) distribution of disease severity in PWP.|
Our data consist of 224 voice phonation samples from 26 females and 21 males, of whom 23 have PD, as shown in part A of [Figure 2]. The number of participants (47) was chosen to be comparable with related research. The age of the PWP (mean ± standard deviation) is 70 ± 8.2 years, and a total of 111 voice samples were recorded from them. Similarly, the age of the remaining 24 healthy subjects is 70 ± 8.2 years, with 113 voice samples. A professional head-mounted microphone (AKG, model C544 L) was positioned approximately 3 cm from the subject's mouth for recording. Therefore, possible vibrations and head movements do not affect the quality of the received signal. The sustained phonation was recorded with a sampling frequency of 44.1 kHz and a resolution of 16 bits, and MATLAB 2016b (MathWorks, Natick, Massachusetts, USA) software was used to extract the relevant features.
Adding noise to the signal
Any unwanted oscillation or change superimposed on the measured signal is called noise. Since one of the main objectives of this research is to determine features that can be extracted from a voice signal recorded remotely, we simulate the noise on telephone lines. Therefore, we add the following disturbances to the noiseless signal:
- Telephone speech is sampled at approximately 8000 Hz, so we reduce the sampling rate to this value
- Add Gaussian white noise to the down-sampled data to reach a signal-to-noise ratio of 30 dB.
In this way, by receiving the patient's voice through telephone lines, the specialists may make an initial assessment. To find out which feature in each category performs better and is more robust against noise, we extract features from the noise-free and noisy signals separately. Therefore, we obtain two N × M matrices, in which each row (N = 223) is an observation, i.e., a recorded sample, and each column (M = 44) represents a feature.
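The two disturbances listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB code used in the study: the function name `simulate_phone_line`, the use of SciPy's polyphase resampler, and the fixed random seed are all choices made for this example.

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly


def simulate_phone_line(signal, fs_in=44100, fs_out=8000, snr_db=30.0, seed=0):
    """Down-sample to fs_out and add white Gaussian noise at the given SNR.

    Hypothetical helper for illustration; returns (noisy, clean_downsampled).
    """
    signal = np.asarray(signal, dtype=float)
    g = gcd(fs_in, fs_out)
    # Polyphase resampling applies an anti-aliasing filter before decimation.
    down = resample_poly(signal, fs_out // g, fs_in // g)
    # Choose the noise power so that 10*log10(P_signal / P_noise) = snr_db.
    signal_power = np.mean(down ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=down.shape)
    return down + noise, down
```

Applying the function to each recording before feature extraction would give the noisy counterpart of the noise-free feature matrix.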
Perturbation measures such as jitter and shimmer are usually used to evaluate speech signals. The jitter is considered a parameter to measure frequency changes from cycle to cycle, and the shimmer is related to measuring changes in the amplitude of the speech wave. [Figure 3] provides a better illustration of this explanation. Therefore, we examine these two categories of features that align with the nature of the voice signal produced by PWP. In this way, we would have a quantitative criterion for separating patients and healthy people in noiseless and noisy conditions.
|Figure 3: Representation of Jitter and Shimmer perturbation measures in speech signal|
Types of frequency perturbation (Jitter)
According to the definition, jitter quantifies perturbations in successive cycles; in other words, it indicates a small deviation from exact periodicity. Starting from this basic concept, many types of frequency perturbation measures can be defined. Jitter can be calculated using either the fundamental frequency (F0) or the fundamental period (T0), which is the inverse of F0. We extract 22 features from this category (Jitter), which are listed in part A of [Table 1].
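As a minimal illustration of the cycle-to-cycle idea, the sketch below computes two widely used jitter variants from a sequence of estimated periods T0. It is not the full set of 22 features used here, and the function name and the choice of these two variants are assumptions made for the example; estimating the periods from the waveform is a separate step that is not shown.

```python
import numpy as np


def jitter_measures(periods):
    """Return (absolute jitter, relative jitter in percent).

    periods: sequence of consecutive fundamental periods T0 in seconds.
    Illustrative helper; the study's own feature definitions may differ.
    """
    periods = np.asarray(periods, dtype=float)
    # Mean absolute difference between consecutive periods.
    jitter_abs = np.abs(np.diff(periods)).mean()
    # The same quantity normalized by the mean period, in percent.
    jitter_rel = 100.0 * jitter_abs / periods.mean()
    return jitter_abs, jitter_rel
```

A perfectly periodic signal gives zero for both values, and larger values indicate stronger deviation from exact periodicity.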
|Table 1: Name and index of features extracted from the voice signal a. Jitter and b. Shimmer|
Types of intensity or amplitude perturbations (Shimmer)
In the previous section, we defined the frequency perturbations across different cycles of the fundamental frequency (F0). In this section, we introduce a second type of measurement based on amplitude variation. Therefore, instead of the fundamental frequency (F0), we use A0, the largest amplitude in each cycle. We extract 22 features from this category (Shimmer), which are listed in part B of [Table 1].
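The amplitude counterpart can be sketched in the same way. The example below computes two common shimmer variants from the per-cycle peak amplitudes A0; the function name and the selection of these two variants are assumptions for illustration and do not reproduce the 22 Shimmer features of this study.

```python
import numpy as np


def shimmer_measures(amplitudes):
    """Return (relative shimmer as a ratio, shimmer in dB).

    amplitudes: consecutive per-cycle peak amplitudes A0 (must be positive).
    Illustrative helper; the study's own feature definitions may differ.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    # Mean absolute difference of consecutive amplitudes over the mean amplitude.
    shimmer_rel = np.abs(np.diff(amplitudes)).mean() / amplitudes.mean()
    # Mean absolute log-ratio of consecutive amplitudes, in decibels.
    ratios = amplitudes[1:] / amplitudes[:-1]
    shimmer_db = np.abs(20.0 * np.log10(ratios)).mean()
    return shimmer_rel, shimmer_db
```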
Relief feature selection
Kira and Rendell proposed Relief as an innovative feature selection algorithm in 1992. Features selected by the Relief method help to separate the samples of different classes. Relief is a weight-based method that uses a K-nearest-neighbor search to select an optimal feature. It assigns a weight to each feature based on how effectively the feature discriminates between the classes, according to equation (1).
In (1), w (fj) is the weight of the jth feature, q is the number of samples, xi is the selected sample, and ║ ║ is the Euclidean distance. In addition, we set both │NH (xi)│ and │NM (xi)│, the numbers of nearest hits and nearest misses of xi, to 10.
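To make the weighting idea concrete, the following sketch implements one common formulation of Relief with k nearest hits and k nearest misses. It is a hedged illustration and not necessarily identical to equation (1): the min-max scaling of the features, the use of per-feature absolute differences in the update, and the function name `relief_weights` are assumptions made for this example.

```python
import numpy as np


def relief_weights(X, y, k=10):
    """Relief-style feature weights using k nearest hits and k nearest misses.

    X: (q, m) feature matrix, y: (q,) class labels.
    Assumes every class has at least two samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so that differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    q, m = Xs.shape
    w = np.zeros(m)
    for i in range(q):
        dist = np.linalg.norm(Xs - Xs[i], axis=1)  # Euclidean distance
        dist[i] = np.inf  # exclude the sample itself
        same = np.flatnonzero((y == y[i]) & np.isfinite(dist))
        other = np.flatnonzero(y != y[i])
        hits = same[np.argsort(dist[same])[:k]]
        misses = other[np.argsort(dist[other])[:k]]
        # Reward features that differ on misses, penalize those that differ on hits.
        w += np.abs(Xs[misses] - Xs[i]).mean(axis=0)
        w -= np.abs(Xs[hits] - Xs[i]).mean(axis=0)
    return w / q
```

Ranking the features by descending weight and keeping the top one in each category corresponds to the selection step described in the text.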
Least squares-support vector machine
The SVM method was first proposed by Vapnik in 1995 to separate two classes of data. It has become one of the most popular and widely used classification methods in recent years. Least squares SVM (LS-SVM) classifiers were proposed by Suykens and Vandewalle in 1999. They are a class of kernel-based learning methods that solve both classification and regression problems. LS-SVM with the Gaussian radial basis function (RBF) kernel has been shown to perform better in separating PWP from healthy subjects. Accordingly, we also use the LS-SVM method.
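For readers unfamiliar with LS-SVM, the sketch below shows how the classifier reduces to solving a single linear system, following the formulation of Suykens and Vandewalle. It is a minimal illustration and not the toolbox used in this study; the function names and the default values of the regularization parameter `gamma` and kernel width `sigma` are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM system for the bias b and multipliers alpha.

    y must contain the labels -1 and +1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                             # constraint: sum(alpha_k * y_k) = 0
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma    # regularized kernel block
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Classify new samples with sign(sum_k alpha_k y_k K(x, x_k) + b)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    K = rbf_kernel(np.asarray(X_new, dtype=float), X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)
```

Unlike the standard SVM, which requires a quadratic program, this variant uses equality constraints and a squared error term, so training is a single call to a linear solver.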
Validation is done by examining data that have not previously been used for classifier training. We use the ten-fold cross-validation method and apply it separately to the male and female sample sets. In this method, the samples are randomly divided into training and testing subsets: in each fold, about 90% of them are used to train the classifier, and the remaining 10% are used to test it. Then, the classifier accuracy is calculated using (2). In addition to accuracy, the following measures are used to evaluate the current work:
Where TP, FP, TN, and FN are true positive, false positive, true negative, and false negative, respectively.
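The validation procedure can be sketched as follows. The fold generator and the metric function are illustrative helpers whose names are assumptions; the formulas are the standard confusion-matrix definitions of accuracy, sensitivity, and specificity, which are assumed here to be the measures referred to above.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

Averaging the metrics over the ten test folds gives the cross-validated estimate for each sample set.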
The Research Ethics Committee of the Ferdowsi University of Mashhad, Iran approved the above-mentioned sampling protocols (Ethical code: IR.UM.REC.1400.043). All the participants in the study provided written informed consent.
| Results|| |
Here, the results and statistical analyses of the proposed method are reported step by step. It is worth mentioning that the process is repeated separately for both groups of samples, men and women, in noiseless and noisy conditions. Therefore, in the first step, following the procedure described in the Materials and Methods section, we add the phone line noise to the signal to obtain both a noisy and a noise-free data set.
Feature ranking using relief
[Table 2] and [Table 3] show the overall ranking of the top five features selected by Relief in noiseless and noisy circumstances, respectively. The feature weights for women are larger than those for men in both the Jitter and Shimmer feature sets. In each group, we select the feature that has been assigned the first rank by the Relief method as a diagnostic criterion. [Table 4] shows the selected acoustic parameters (features) and their defined indexes for men and women. As [Table 4] shows, the selected features for the noisy and noiseless signals are identical.
|Table 2: The top 5 features selected by the relief for samples of women and men in noiseless condition for a. jitter and b. Shimmer|
|Table 3: The top 5 features selected by the relief for samples of women and men in noisy condition for a. Jitter and b. Shimmer|
|Table 4: Selected acoustic parameters (features) and their defined indexes for men and women|
To make a quantitative comparison, the average value of the selected feature for the male and female samples in noiseless and noisy conditions is given for the Jitter in [Figure 4], and the Shimmer in [Figure 5]. As illustrated, the Jitter and Shimmer values for healthy men are about 16 and 0.05, respectively, and for men with PD are about 20 and 0.1, respectively. The corresponding values for healthy women are about 35 and 0.1, and for women with Parkinson's are about 32 and 0.05, respectively.
|Figure 4: Quantitative value of Shimmer for healthy individuals and PWP for (a) men's samples and (b) women's samples|
|Figure 5: Classifier performance for selected features in noiseless (left column) and noisy (right column) conditions for: (a) Jitter category (b) Shimmer category, and (c) both Jitter and Shimmer together|
The performance of the classifier
To evaluate the performance of the selected features, we use an LS-SVM classifier. Each of the selected features from the Jitter and Shimmer categories was used separately to determine its performance. Moreover, the performance was evaluated using both features simultaneously, as shown in the left columns of [Figure 5] for noiseless signals and in the right columns of [Figure 5] for noisy signals. Each of the selected features achieves about 70% accuracy in separating PWP from healthy controls in noiseless and noisy conditions for each gender. In addition, when both selected features are used together, the accuracy improves moderately and reaches about 80%.
| Discussion|| |
In this study, we have chosen the acoustic parameters that differ most between the healthy individuals and PWP as the basis for deciding about a person's health. By statistical evaluation of the jitter and shimmer parameters, we found that the accuracy of the diagnosis remained almost constant across noiseless and noisy situations. This suggests the strength of the Relief method in selecting noise-resistant features. Furthermore, the results showed that the values of the extracted features increased for men with PD compared to healthy men. For women, the opposite holds: with PD, the values of the extracted features decrease compared to the healthy group.
| Conclusion|| |
The main goal of this research is to determine the minimal set of voice features to distinguish patients with Parkinson's from healthy individuals. Although several studies have considered this problem, they have not reduced the size of the feature set or the program run time. Moreover, in contrast to other studies, we considered the effect of noise on the signal. Furthermore, this method could be valuable since it not only enables specialists to screen PWP remotely but also helps underprivileged populations benefit from social health services.
In this article, we consider various essential measurements of phonation disorders, including 44 acoustic parameters that capture different properties of the voice signals. A statistical mechanism named Relief is then applied to select the optimal feature in each category of Jitter and Shimmer separately. We also study the effect of the poor signal quality of analog phone lines on the diagnosis. Since there are significant differences in speech characteristics between men and women, we also use the ten-fold cross-validation method to study the classifier performance separately for populations of (a) male-only and (b) female-only. Overall, the results in all states, regardless of the noisy/noiseless state of the voice recording, maintain adequate performance (accuracy of around 70%). More specifically, when we used both selected features simultaneously, the classifier performance reached 81% accuracy. This result is very close to that of other studies that have used more than ten features and/or complex classifiers. Furthermore, the presence of noise deteriorates the diagnosis accuracy by only <2% in all situations, indicating the robustness and utility of this approach in telemedicine applications.
The authors would like to sincerely thank Dr. Athanasios Tsanas for sharing his Ph.D. dissertation and related source codes, and Ms. Nina Shahsavanpour for her valuable help with data gathering, comments on the early drafts of the paper, and English editing.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
| References|| |
Braga D, Madureira AM, Coelho L, Ajith R. Automatic detection of Parkinson's disease based on acoustic analysis of speech. Eng Appl Artif Intell 2019;77:148-58.
Simonet C, Schrag A, Lees AJ, Noyce AJ. The motor prodromes of Parkinson's disease: From bedside observation to large-scale application. J Neurol 2021;268:2099-108.
Eskidere Ö, Ertaş F, Hanilçi C. A comparison of regression methods for remote tracking of Parkinson's disease progression. Expert Syst Appl 2012;39:5523-8.
Pah ND, Motin MA, Kempster P, Kumar DK. Detecting effect of levodopa in Parkinson's disease patients using sustained phonemes. IEEE J Transl Eng Health Med 2021;9:1-9.
Azadi H, Akbarzadeh-T MR, Kobravi HR, Shoeibi A. Robust voice feature selection using interval type-2 fuzzy AHP for automated diagnosis of Parkinson's disease. IEEE/ACM Trans Audio Speech Lang Process 2021;29:2792-802. [doi: 10.1109/TASLP.2021.3097215].
Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Behari M, et al. Relevant 3D local binary pattern based features from fused feature descriptor for differential diagnosis of Parkinson's disease using structural MRI. Biomed Signal Process Control 2017;34:134-43.
Azadi H, Zade MA, Toutounchi MR, Kobravi HR, Talab FR, Bagherzade SA, et al. Optimal feature selection and comparison for automatic detection of Parkinson's disease using speech signal. Iran J Biomed Eng 2016;10:41-7.
Hariharan M, Polat K, Sindhu R. A new hybrid intelligent system for accurate detection of Parkinson's disease. Comput Methods Programs Biomed 2014;113:904-13.
Cernak M, Orozco-Arroyave JR, Rudzicz F, Christensen H, Vásquez-Correa JC, Nöth E. Characterisation of voice quality of Parkinson's disease using differential phonological posterior features. Comput Speech Lang 2017;46:196-208.
Despotovic V, Skovranek T, Schommer C. Speech based estimation of Parkinson's disease using gaussian processes and automatic relevance determination. Neurocomputing 2020;401:173-81.
Ho AK, Iansek R, Marigliani C, Bradshaw JL, Gates S. Speech impairment in a large sample of patients with Parkinson's disease. Behav Neurol 1998;11:131-7.
Moro-Velazquez L, Gómez-García JA, Godino-Llorente JI, Villalba J, Orozco-Arroyave JR, Dehak N. Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson's disease. Appl Soft Comput 2018;62:649-66.
Ackermann H, Ziegler W. Die dysarthrophonie des Parkinson-syndroms. [The dysarthrophonia of Parkinson's disease]. Fortschr Neurol Psychiatr 1989;57:149-60.
Palacios-Alonso D, Meléndez-Morales G, López-Arribas A, Lázaro-Carrascosa C, Gómez-Rodellar A, Gómez-Vilda P. MonParLoc: A speech-based system for Parkinson's disease analysis and monitoring. IEEE Access 2020;8:188243-55.
Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease. IEEE Trans Biomed Eng 2012;59:1264-71.
Tetrud JW. Preclinical Parkinson's disease: Detection of motor and nonmotor manifestations. Neurology 1991;41:69-71.
Gürüler H. A novel diagnosis system for Parkinson's disease using complex-valued artificial neural network with k-means clustering feature weighting method. Neural Comput Appl 2017;28:1657-66.
Sakar BE, Serbes G, Sakar CO. Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson's disease. PLoS One 2017;12:e0182428.
Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. IEEE Trans Biomed Eng 2009;56:1015.
Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, et al. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inform 2013;17:828-34.
Ali L, Zhu C, Zhang Z, Liu Y. Automated detection of Parkinson's disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized Neural network. IEEE J Transl Eng Health Med 2019;7:1-10.
Benmalek E, Elmhamdi J, Jilbab A. Multiclass classification of Parkinson's disease using different classifiers and LLBFS feature selection algorithm. Int J Speech Technol 2017;20:179-84.
Forrest K, Weismer G, Turner GS. Kinematic, acoustic, and perceptual analyses of connected speech produced by parkinsonian and normal geriatric adults. J Acoust Soc Am 1989;85:2608-22.
Moran RJ, Reilly RB, de Chazal P, Lacy PD. Telephony-based voice pathology assessment using automated speech analysis. IEEE Trans Biomed Eng 2006;53:468-77.
Attuluri N, Pushpavathi M, Pandey P, Mahapatra S. Voice perturbations in repaired cleft lip and palate. Global J Otolaryngol 2017;8:555729.
Teixeira JP, Oliveira C, Lopes C. Vocal acoustic analysis–jitter, shimmer and hnr parameters. Procedia Technol 2013;9:1112-22.
Teixeira JP, Gonçalves A. Accuracy of jitter and shimmer measurements. Procedia Technol 2014;16:1190-9.
McNeil MR, Ballard KJ, Duffy JR, Wambaugh JU, van Lieshout P, Maassen B, Terband H. Apraxia of speech theory, assessment, differential diagnosis, and treatment: Past, present, and future. Speech motor control in normal and disordered speech: Future developments in theory and methodology. 2017:195-221.
Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. Boston, MA, USA: Cengage Learning; 1999.
Kira K, Rendell LA. A practical approach to feature selection. In: Machine Learning Proceedings. San Francisco, CA, USA: Morgan Kaufmann; 1992. p. 249-56.
Kononenko I. Estimating attributes: Analysis and extensions of RELIEF. In: European Conference on Machine Learning. Berlin, Heidelberg: Springer; 1994. p. 171-82.
Vapnik V. The Nature of Statistical Learning Theory. Springer Science and Business Media; 2013.
Wroge TJ, Özkanca Y, Demiroglu C, Si D, Atkins DC, Ghomi RH. Parkinson's disease diagnosis using machine learning and voice. In: 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). Philadelphia, PA, USA. IEEE; 2018. p. 1-7.
Suykens JA, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett 1999;9:293-300.
Gunduz H. Deep learning-based Parkinson's disease classification using vocal feature sets. IEEE Access 2019;7:115540-51.
Kuresan H, Masunda S, Samiappan D. Analysis of Jitter and Shimmer for Parkinson's disease diagnosis using telehealth. In: Cognitive Informatics and Soft Computing. Singapore: Springer; 2019. p. 711-21.
Hand DJ. The elements of statistical learning: Data mining, inference, and prediction. Biometrics 2002;58:252.