I’m Jorge Andrés Gómez García, a researcher at the Instituto Cajal of the Spanish National Research Council (CSIC). I received my degree in Electronics Engineering and my MEng from Universidad Nacional de Colombia, Manizales, in 2008 and 2010 respectively. In 2018 I received my PhD from Universidad Politécnica de Madrid, Spain. From 2018 to 2020 I worked as a researcher at Universidad Politécnica de Madrid. I’m currently a researcher in the Neural Rehabilitation group, where I use artificial intelligence to solve problems in rehabilitation and bioengineering.
This is the third of a three-part series devoted to reviewing the current state of the art of automatic voice condition analysis systems. It is a direct continuation of “On the design of automatic voice condition analysis systems. Part I: review of concepts and an insight to the state of the art” and of “On the design of automatic voice condition analysis systems. Part II: review of speaker recognition techniques and study on the effects of different variability factors”, both already published in this journal. The goal of this paper is to compile the most significant parameterisation approaches used in the literature for automatic voice condition analysis systems, along with a critical discussion of their usefulness, providing the user with a comprehensive review of the most important techniques used for acoustic modelling in the field. The paper presents the mathematical formulation and physical interpretation of a series of perturbation and fluctuation parameters, noise features, complexity-based parameters, modulation spectra, morphological parameters, and spectral-cepstral coefficients. It is complemented by a library written in MATLAB®, which has been made available to readers in an online software repository.
EAAI
Emulating the perceptual capabilities of a human evaluator to map the GRB scale for the assessment of voice disorders
Jorge Andrés Gómez-García, Laureano Moro-Velázquez, Janaina Mendes-Laureano, and 2 more authors
Engineering Applications of Artificial Intelligence, Jun 2019
This paper presents the design of an automatic voice quality analysis system for the assessment of voice pathologies, which emulates the perceptual capabilities of a human evaluator according to the GRB scale. For this purpose, a novel methodology based on multiple sets of characteristics, ordinal classification and Gaussian regression is proposed. In particular, a reduced subset of characteristics is identified, and the regressor is used to convert the discrete perceptual scale to a continuum, which is more in agreement with the nature of the problem under study. The robustness of the system is evaluated in several cross-dataset experiments. A clinical evaluation of the predictions provided by the system is also carried out. Results indicate that the proposed methodology is proficient in modelling the perceptual capabilities of the human evaluator. They also show that it is possible to extend the GRB scale to a continuum through regression techniques while maintaining the consistency of the results. On average, the deviation between the labels assigned by the expert and those provided by the system is about 0.5 units (on a scale from 0 to 3) for G and B, and 0.7 units for R. Similarly, the deviation of the labels predicted by the system in the clinical assessment trials is about 0.3 units for G, 0.4 units for B, and 0.5 units for R.
ASC
Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson’s Disease
The diagnosis of Parkinson’s Disease is a challenging task which might be supported by new tools to objectively evaluate the presence of deviations in patients’ motor capabilities. In this respect, the dysarthric nature of patients’ speech has been exploited in several works to detect the presence of this disease, but none of them has studied in depth the use of state-of-the-art speaker recognition techniques for this task. In this paper, two classification schemes (GMM-UBM and i-Vectors-GPLDA) are employed separately with several parameterization techniques, namely PLP, MFCC and LPC. Additionally, the influence of the kinetic changes, described by their derivatives, is analysed. With the proposed methodology, an accuracy of 87% with an AUC of 0.93 is obtained in the optimal configuration. These results are comparable to those obtained in other works employing speech for Parkinson’s Disease detection and confirm that the selected speaker recognition techniques are a solid baseline against which to compare future works. Results suggest that Rasta-PLP is the most reliable parameterization for the proposed task among all the tested features, while the two classification schemes perform similarly. Additionally, results confirm that kinetic changes provide a substantial performance improvement in automatic Parkinson’s Disease detection systems and should be considered in the future.
PlosOne
Towards the identification of Idiopathic Parkinson’s Disease from the speech. New articulatory kinetic biomarkers
Juan Ignacio Godino-Llorente, Stephanie Shattuck-Hufnagel, Jeung-Yoon Choi, and 2 more authors
Although a large number of acoustic indicators have already been proposed in the literature to evaluate the hypokinetic dysarthria of people with Parkinson’s Disease, the goal of this work is to identify and interpret new reliable and complementary articulatory biomarkers that could be applied to predict or evaluate Parkinson’s Disease from a diadochokinetic test, contributing to the possibility of a further multidimensional analysis of the speech of parkinsonian patients. The new biomarkers proposed are based on the kinetic behaviour of the envelope trace, which is directly linked with the articulatory dysfunctions introduced by the disease from its early stages. The interest of these new articulatory indicators lies in their ease of identification and interpretation, and in their potential to be translated into computer-based automatic methods to screen for the disease from speech. Throughout this paper, the accuracy provided by these acoustic kinetic biomarkers is compared with that obtained with a baseline system based on speaker identification techniques. Results show accuracies around 85%, in line with those obtained with complex state-of-the-art speaker recognition techniques but with an easier physical interpretation, which opens the possibility of transferring them to a clinical setting.