Diagnosing Speech Disorders: Can Voice Stress Analysis Be Used in the Medical Field?
Although Computer Voice Stress Analysis (CVSA) is most commonly used to interview suspects and witnesses in criminal investigations, the technology’s ability to identify vocal tremors could also make it a promising diagnostic tool for individuals with speech and hearing disorders. The studies examining this application do not directly associate vocal tremor with deception, but they do show that vocal tremor appears in speech outside of casual, low-stress conversation. Since vocal tremor is a key element of the science behind CVSA, this research supports the validity of the technology and points to possible uses for it within the medical field.
One such study, “An Acoustic Method to Quantify Perceived Vocal Tremor,” was conducted in 2010 by Marios Fourakis as part of the University of Wisconsin’s Phonology Project, whose objective is to gather more information about eight speech disorders of unknown origin. In the experiment described in his technical report, Fourakis sought to develop a method to validate and quantify vocal tremor in children with speech disorders, a measurement that could potentially become a criterion for diagnosis.
Impetus for the Study
Currently, the main method used to diagnose speech disorders is the Speech Disorders Classification System (SDCS), which uses both perceptual and acoustic methods. With perceptual methods, experts listen to a patient speak and classify what they hear using specific codes, as defined in the Prosody-Voice Screening Profile (PVSP). While perceptual methods are useful, they depend on the listener’s judgment, so the Phonology Project researchers have focused their efforts on acoustic methods, which measure properties of the recorded speech signal directly and are therefore more objective.
Fourakis’ goal was to find an acoustic method that could validate the perceptual method then in use for identifying vocal tremor. He also wanted to quantify vocal tremor precisely enough that the measurement could serve as a diagnostic tool.
Two other researchers, Buder and Strand, had previously developed a general acoustic method for quantifying vocal tremor, which Fourakis used as the basis for his study. Their multiple analysis technique, which they called a modulogram, relied on a frequency analysis algorithm called the Fast Fourier Transform (FFT). The modulogram allowed them to measure both the amplitude and the frequency of modulations in specific sounds. They classified low-frequency voice modulations into three categories: wow (0.2 to 2 Hz), tremor (2 to 10 Hz), and flutter (10 to 20 Hz).
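The three bands amount to a simple lookup from a measured modulation rate to a category. The sketch below is illustrative only: the function name classify_modulation is hypothetical, and because the bands as listed share their endpoints (2 Hz and 10 Hz each appear in two bands), it assumes half-open intervals, with 20 Hz itself counted as flutter.

```python
# Modulation bands (Hz) as listed for Buder and Strand's modulogram work.
# Assumption: bands are half-open [low, high), except that the top edge of
# flutter (20 Hz) is included, since the source gives touching endpoints.
BANDS = [
    ("wow", 0.2, 2.0),
    ("tremor", 2.0, 10.0),
    ("flutter", 10.0, 20.0),
]


def classify_modulation(rate_hz: float) -> str:
    """Return the band name for a modulation rate, or 'out of range'."""
    for name, low, high in BANDS:
        if low <= rate_hz < high:
            return name
    # Count the upper edge of the last band (20 Hz) as flutter.
    last_name, _, last_high = BANDS[-1]
    if rate_hz == last_high:
        return last_name
    return "out of range"
```

For example, classify_modulation(8.3) returns "tremor" and classify_modulation(12.0) returns "flutter".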
The problem for Fourakis was that Buder and Strand’s method did not work well for diagnosing speech disorders in young children. Because of the complexity of some of its calculations, their multiple analysis method only worked for voice segments 8 to 40 seconds long. Other acoustic methods required at least 2 to 3 seconds of a prolonged vowel sound. Either way, Fourakis noted, young children find it hard to understand what it means to hold a vowel sound and keep their voice as steady as possible, which was necessary for reliable results. The method he developed in this experiment was therefore designed to detect and quantify vocal tremor in regular speech, where vowel sounds can be as short as 250 milliseconds, about the time it takes to say “what.”
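Segment length matters for Fourier-based analysis in a general way, offered here as background rather than as the specific reason Buder, Strand, or Fourakis gave. A discrete Fourier transform over a segment lasting T seconds has frequency bins spaced 1/T Hz apart, and a segment must span at least one full cycle of a modulation to capture it. The helper names below are hypothetical; the arithmetic shows that a 250 ms vowel gives 4 Hz bin spacing and holds only two cycles of an 8 Hz tremor, while one cycle of the slowest wow modulation (0.2 Hz) takes 5 seconds.

```python
def fft_bin_spacing_hz(duration_s: float) -> float:
    """Spacing between adjacent DFT frequency bins for a segment of this length."""
    return 1.0 / duration_s


def cycles_captured(modulation_hz: float, duration_s: float) -> float:
    """Number of modulation cycles that fit inside the segment."""
    return modulation_hz * duration_s


# Compare a short vowel, a sustained vowel, and a long voice segment.
for duration in (0.25, 3.0, 8.0):
    print(
        f"{duration:>5.2f} s: bin spacing {fft_bin_spacing_hz(duration):.3f} Hz, "
        f"{cycles_captured(8.0, duration):.0f} cycles of 8 Hz tremor, "
        f"{cycles_captured(0.2, duration):.2f} cycles of 0.2 Hz wow"
    )
```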
The Experimental Analysis
For this study, Fourakis drew from two databases of audio samples from children with speech disorders. The first contained samples from 13 children between the ages of 3 and 6 who had been diagnosed with speech delay. The second contained samples from 17 children between the ages of 4 and 16 years old who had been diagnosed with galactosemia and either speech delay or childhood apraxia of speech. These audio samples were recordings of either regular conversational speech or a basic speech repetition task.
Fourakis then narrowed the audio samples down to those in which vocal tremor had been detected perceptually: an expert using the PVSP had assigned them the code PV26 (Break/Shift/Tremulous). Five samples received this code: 1 was from a child with mild speech delay, 2 were from children with mild-to-moderate speech delay, 1 was from a child with moderate-to-severe speech delay, and 1 was from a child with severe speech delay.
Within the audio samples coded PV26, Fourakis located words containing a long vowel sound followed by a consonant. He used audio editing software to extract each word and measure the duration of its vowel. Then, using a series of calculations similar to those of Buder and Strand, he produced a “power spectrum,” which shows how the signal’s energy is distributed across frequencies, including the low frequencies where tremor occurs. However, unlike Buder and Strand, he first performed a calculation to extrapolate the data from the short vowel, so he did not need the longer audio clip required by the earlier analysis method.
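The details of Fourakis’ extrapolation step are not given here, so the sketch below does not reproduce it. Instead it shows one generic way to get a modulation power spectrum from a short contour: remove the mean, apply a Hann window, and zero-pad before the FFT. Zero-padding is a different, standard technique; it evaluates the spectrum on a finer grid but does not add true frequency resolution. The function name modulation_peak_hz, the 2 to 20 Hz search range, the 1 kHz contour sampling rate, and the synthetic 250 ms input are all assumptions for illustration.

```python
import numpy as np


def modulation_peak_hz(contour, fs_hz, fmin=2.0, fmax=20.0, n_fft=4096):
    """Return the strongest modulation frequency (Hz) in a short contour.

    contour: hypothetical amplitude (or F0) values of one vowel, sampled at fs_hz.
    """
    x = np.asarray(contour, dtype=float)
    x = x - x.mean()                 # remove the steady (DC) level
    x = x * np.hanning(len(x))       # taper the edges to reduce spectral leakage
    # Zero-pad to n_fft points: a finer frequency grid, not more resolution.
    power = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)
    band = (freqs >= fmin) & (freqs <= fmax)   # search tremor/flutter range only
    return float(freqs[band][np.argmax(power[band])])


# Synthetic 250 ms "vowel" envelope with an 8 Hz, 30% amplitude modulation.
fs = 1000.0
t = np.arange(int(0.25 * fs)) / fs
envelope = 1.0 + 0.3 * np.sin(2 * np.pi * 8.0 * t)
print(f"Estimated modulation rate: {modulation_peak_hz(envelope, fs):.1f} Hz")
```

On this clean synthetic input the peak should land close to 8 Hz; real vowel contours are noisier, and with only two or three modulation cycles in the window, estimates from a single vowel should be treated with caution.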
After applying his method to 13 sample words, he found that the average lowest frequency was 8.3 Hz, which is within the tremor range defined by Buder and Strand (2 to 10 Hz). Of the 13 words, 10 fell in this range, while 3 fell in the flutter range (10 to 20 Hz). Although Fourakis notes that the results could be strengthened with a larger sample size, he was able to show that his acoustic method was consistent with the perceptual method for identifying vocal tremor.
Relevance of the Results
In the end, Fourakis concluded that his new method provided an acoustic way to confirm perceptual judgments of vocal tremor. He postulated that, with further research, this quantitative measure could eventually serve as a marker in the diagnosis of motor speech disorders, especially in young children.
Just like CVSA technology, Fourakis’ technique went beyond human perception to provide an objective analysis of speech. His results suggest that rigorous analysis of physiological tremor can be a reliable way to evaluate abnormal speech, whether for speech disorder diagnosis in the medical field or for deception detection in law enforcement. They also show that vocal tremor can be detected in a wide range of speakers, including young children with speech disorders, which adds further evidence for the versatility of CVSA technology across a broad range of law enforcement purposes.