
Voice Assessment Procedures

There are three main procedures for assessing potential voice disorders:

1. laryngoscopic examination
2. perceptual assessment

3. instrumental measurement

This article provides an outline of procedure 2, perceptual assessment of voice.

Perceptual Assessment of Voice

This involves describing the voice solely by listening to it, i.e. using auditory perception. Speech therapists who specialize in working with people with voice disorders have received training in describing the relevant characteristics (see Paralinguistic Features in 'Voice') of a disordered voice. Perceptual evaluation can be either informal or formal.

Informal perceptual evaluation takes place throughout the whole meeting between a therapist and a client. It is usual for the therapist to engage the client in spontaneous conversation and to conduct a case history designed to gather information about the commencement and history of the voice difficulty, relevant medical information, the domestic situation, client lifestyle and any traumas that may affect the voice. Whilst it may appear to the client that the therapist is merely chatting on occasion, in reality the therapist is evaluating the voice on a principled basis. That is to say, the therapist will have some descriptive scheme in mind that allows them to appraise performance under various categories of concern (e.g. voice quality, pitch, pitch range, loudness, nasal resonance, flexibility, stamina, breathing method, and so on).

Formal perceptual evaluation typically involves the use of a protocol: a standard procedure for systematically describing and quantifying a voice difficulty. Often, specific time is set aside during an assessment appointment with the therapist for this aspect of evaluation, i.e. it is often presented as a separate activity from any informal conversation and case history taking. There is no universally agreed upon method of conducting a formal perceptual evaluation. In fact, there are many schemes/protocols to choose from, each with its own strengths and weaknesses. A popular scheme is the Buffalo III Voice Profile developed in 1987. This scheme rates: laryngeal tone, pitch, loudness, nasal resonance, oral resonance, breath supply, muscles, voice abuse, rate, speech anxiety, speech intelligibility and an overall voice rating. Each parameter is quantified using a 5-point scale, where 1 = normal, 2 = mild, 3 = moderate, 4 = severe, and 5 = very severe. Another popular scheme, which has been adopted by the UK Royal College of Speech and Language Therapists as the minimum knowledge and skills set for therapists working with voice difficulties, is the GRBAS Scale. Developed in 1981, this scheme is not a complete perceptual evaluation protocol but specifically evaluates voice quality. It assesses: Grade (the overall degree of voice abnormality), Roughness, Breathiness, Asthenia (voice weakness), and Strain. Under this scheme, each parameter is quantified on a 4-point scale, where 0 = normal, 1 = mild, 2 = moderate, and 3 = severe.

There are several other similar schemes but I will exemplify the approach by referring to a scheme known as CVE2 which I originally developed in 2003.

Clinical Voice Evaluation 2 (CVE2)

CVE2 was originally written as a Windows-based software program to assist in the perceptual assessment of voice difficulties in both adults and children, guiding the clinician through a systematic voice evaluation. Two types of assessment can be conducted. The first is a detailed assessment that gathers data in relation to: client perception, contextual speech, voice quality, S/Z ratio, MPT (maximum phonation time), endurance, breathing, pitch, loudness, prosody, cough, coup, glottal attack, resonance, motor speech, and musculoskeletal tension. The second type of assessment is a screening assessment that takes approximately 10 minutes to administer: the data gathered is a subset of that collected for the detailed assessment. Screening assessments are used to determine whether or not the voice is sufficiently different from the norm to warrant a complete, in-depth assessment. I will now describe the screening assessment protocol. Other than the aerodynamic measures and related observations, all screening parameters described below are rated on a 4-point scale: 0 = normal, 1 = mild, 2 = moderate, and 3 = severe.


Client Perception
Ask the client to rate their voice today, right now, as they are speaking to you.

Voice quality
During informal conversation, first determine if the voice quality is (1) normal, (2) breathy, (3) hoarse, (4) husky or (5) whispered. Then, if the quality is other than normal, rate the particular quality as either mild, moderate or severe.

Pitch
Again during informal conversation, determine if the pitch is (1) normal, (2) too high, or (3) too low. If the pitch is other than normal, rate the too high or too low pitch as either mild, moderate or severe.

Pitch range
Ask the client to take a normal breath and, starting from a relatively low note, sing up a musical scale using the sound 'lah' as high as they can comfortably reach without straining. Now ask them to reverse this procedure, singing down the scale, starting from a relatively high note. Judge whether or not the pitch range is restricted and rate it on the 4-point scale.

Loudness
Ask the client to take a normal breath and to count from 1 to 10, starting fairly softly and increasing the loudness with each number. Then ask them to reverse this procedure, counting backwards from 10 to 1, starting loudly and reducing the loudness with each descending number. From the client's performance on this task and their performance during informal conversation, judge whether the loudness of their voice is (1) normal, (2) too loud, or (3) too quiet. As usual, if it is other than normal, rate the too loud or too quiet voice as mild, moderate or severe.

Nasal resonance
Have the client read aloud the Standard Text.


The first paragraph contains no nasal consonants and helps in detecting hypernasality. This is because, as all the consonants are oral consonants, the soft palate should normally be raised preventing air escaping through the nose. If there is genuine hypernasality this will be detectable because, as the soft palate does not fully seal off the nasal cavity, the oral consonants will be nasalized. The second paragraph does contain nasal consonants (m, n and ng) and so it helps in detecting hyponasality. This is the reverse of the above argument, i.e. we would expect the nasal consonants to be nasalized but if the soft palate is not lowered sufficiently (or there is a blockage in the nasal cavity) then insufficient air escapes through the nose and this is particularly noticeable on the nasal consonants. Having determined if the nasal resonance is (1) normal, (2) hypernasal, or (3) hyponasal, once again rate it as mild, moderate or severe if it is other than normal.

Oral resonance
This is sometimes known as throatiness. The voice seems to be focused too deep in the throat, producing what some people call guttural speech. Typically the tongue is held quite flat in the mouth and is retracted towards the back of the throat. Again, if this is not normal, rate it as a mild, moderate or severe deviation.

Rate
An average rate of speech is around 120-150 syllables per minute. Each of the paragraphs in the Standard Text contains exactly 150 syllables, so the text can be used to judge the rate of speech, i.e. a person speaking at 150 syllables per minute would take one minute to speak each paragraph. Be careful here, however, as people will often read at a different rate to their everyday spontaneous speech. You can also draw on the informal conversation(s) you have had with the client to assist your judgment of this parameter. Determine if the rate is (1) normal, (2) too fast, or (3) too slow, and quantify this as appropriate.
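The arithmetic behind the timing check can be sketched as follows. This is only an illustration: the function names are hypothetical, and treating 120 and 150 syllables per minute as hard cut-offs is an assumption made for the example, since the text describes them as an approximate average and the final judgment remains perceptual.

```python
def syllables_per_minute(reading_time_seconds: float, syllables: int = 150) -> float:
    """Convert the time taken to read one 150-syllable paragraph into a rate."""
    if reading_time_seconds <= 0:
        raise ValueError("reading time must be positive")
    return syllables * 60.0 / reading_time_seconds


def classify_rate(rate: float, low: float = 120.0, high: float = 150.0) -> str:
    """Label a rate against the 120-150 band (cut-offs are an assumption)."""
    if rate < low:
        return "too slow"
    if rate > high:
        return "too fast"
    return "normal"


# A client who takes 75 seconds to read one paragraph
rate = syllables_per_minute(75.0)
print(rate, classify_rate(rate))
```

A reading time of 60 seconds corresponds to 150 syllables per minute, matching the one-minute example in the text.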

Prosody
Rhythm and intonation give spontaneous speech a characteristic rhythmical beat and an attendant musical quality. A flexible voice should possess good coloring that makes it easy to listen to. Lack of coloring leads to a monotonous voice. From your informal conversation(s) and the various assessment tasks performed so far, judge whether the voice (1) is normal, (2) has inadequate variability, or (3) has excessive variability, and rate as appropriate. Remember that parameters such as prosody are culturally bound. That is to say, what is considered to be suitable prosody for a typist (who does not have to answer phones) may not be considered suitable for a professional voice user, such as a teacher, a minister of religion or an actor. Thus, one should judge this (and arguably all other parameters) in relation to the person's vocal demands, occupation, any hobbies that rely on use of the voice, and so on.

Aerodynamic measures

S/Z ratio: This ratio is explained elsewhere. In summary, it is an indicator of laryngeal dysfunction. It is obtained by dividing the longest time in seconds that the client can sustain the sound 's' by the longest time in seconds that they can sustain the sound 'z'. Clients who have difficulty phonating will likely have an S/Z ratio greater than 1.4. The higher the ratio, the more difficulty the client is experiencing when phonating.

MPT: This measure is explained elsewhere. In summary, it is simply the longest time that a client can sustain a vowel sound at a comfortable pitch and loudness on a deep breath. Adult females should achieve between 15 and 25 seconds, whereas adult males exceed this at between 25 and 35 seconds.
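The two aerodynamic measures above reduce to simple arithmetic, which can be sketched as follows. The function names and the idea of passing several timed trials are assumptions made for illustration. The 1.4 threshold and the MPT ranges are the figures quoted in the text.

```python
# Expected MPT ranges in seconds, as quoted in the text
MPT_RANGE_SECONDS = {"female": (15.0, 25.0), "male": (25.0, 35.0)}


def sz_ratio(s_trials, z_trials):
    """Divide the longest sustained 's' by the longest sustained 'z'."""
    longest_s = max(s_trials)
    longest_z = max(z_trials)
    if longest_s <= 0 or longest_z <= 0:
        raise ValueError("durations must be positive")
    return longest_s / longest_z


def suggests_phonation_difficulty(ratio, threshold=1.4):
    """True when the ratio is greater than the 1.4 threshold."""
    return ratio > threshold


def mpt_below_expected(mpt_seconds, sex):
    """True when the MPT falls short of the lower bound for adults of that sex."""
    lower, _upper = MPT_RANGE_SECONDS[sex]
    return mpt_seconds < lower


# Two timed attempts at each sound, in seconds
ratio = sz_ratio([18.0, 21.0], [12.0, 14.0])
print(ratio, suggests_phonation_difficulty(ratio))
print(mpt_below_expected(12.0, "female"))
```

Taking the maximum of several trials reflects the "longest time" wording; how many trials to record is left to the clinician.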

Related observations
The following observations are not rated on the 4-point scale but simply recorded as either (1) present, or (2) absent.

glottal fry: this is characterized by a series of rapid low-pitched 'pops' and a creaky quality.

diplophonia: this is characterized by the perception of two simultaneous pitches in the voice; it may result from involvement of the false vocal folds during normal phonation.

phonation breaks: these are characterized by uncontrolled, short-duration cessations of vocal fold vibrations during speech, heard as short periods of no voice.

fluctuations in quality: the quality of the voice (normal, breathy, hoarse, husky, whispered) may not be stable; there may be wide fluctuations from one quality to another and back again.

The screening assessment data can be used to create a Voice Profile with a Severity Rating which is obtained by adding the scores for all the parameters, with the exception of the aerodynamic measures, related observations and client perception.

Vocal Profiles such as this are useful as a baseline measure and for monitoring changes over time. Remember that this profile is just a screening assessment: if the client receives a high severity rating, this would suggest that a more complete and detailed assessment should be carried out.
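The Severity Rating calculation described above can be sketched as a simple sum. This is a minimal illustration, not the CVE2 software itself: the parameter labels and the function name are assumptions, chosen to match the screening parameters that are rated on the 4-point scale.

```python
from typing import Mapping

# Screening parameters rated 0-3 (labels are assumptions for this sketch)
RATED_PARAMETERS = (
    "voice quality", "pitch", "pitch range", "loudness",
    "nasal resonance", "oral resonance", "rate", "prosody",
)


def severity_rating(scores: Mapping[str, int]) -> int:
    """Sum the 0-3 scores of the rated parameters only.

    Client perception, aerodynamic measures and related observations
    are ignored even if they appear in `scores`.
    """
    total = 0
    for name in RATED_PARAMETERS:
        score = scores[name]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{name}: score must be 0, 1, 2 or 3")
        total += score
    return total


example = {
    "voice quality": 2, "pitch": 1, "pitch range": 1, "loudness": 0,
    "nasal resonance": 0, "oral resonance": 0, "rate": 1, "prosody": 2,
    "client perception": 3,  # recorded, but excluded from the sum
}
print(severity_rating(example))
```

With eight parameters each scored 0 to 3, the rating in this sketch ranges from 0 to 24.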

Instrumental Measurement
There are many high-technology instruments available for assisting the assessment of voice. Many, perhaps the majority, are used not by voice therapists per se but by suitably qualified physiologists, radiographers, and Ear, Nose and Throat surgeons. There are too many to be considered in detail here, so I will briefly outline three instrumental techniques with which most people will likely have some familiarity. I will then focus on a discussion of acoustic analysis.

Vocal tract imagers

These instruments allow us to visualize the internal structure of the vocal tract. Radiography, which uses X-rays, is probably the best-known technique. A related technique is videofluoroscopy, which is popularly used to image swallowing and is now regularly used in the assessment of dysphagia. Magnetic Resonance Imaging (MRI) uses magnetism rather than X-rays to visualize internal structures that may not be readily visualized with X-rays. There are several other instruments and techniques, each with their advantages and disadvantages, but the common feature is their ability to visualize internal body structures. Owing to the highly specialized nature of these techniques, they are typically carried out by suitably qualified radiographers.

Electromyography (EMG)
When a muscle contracts, a small electrical current is produced which is typically proportional to the strength of the muscle activity. EMG measures this electrical activity of muscles. There are two types. Surface EMG involves placing two electrodes on the skin overlying the muscles to be investigated. Intramuscular EMG involves inserting a small needle electrode into the muscle itself. In both instances, the electrical activity is typically displayed on an oscilloscope. This technique has limited application but does assist in detecting levels of laryngeal muscle tension and may be used in cases of identified vocal fold paralysis. EMG is typically carried out by trained physiologists or physicians.

Electroglottograph (EGG)
This is a non-invasive device that measures the contact between the vocal folds. Two electrodes are placed either side of the larynx and a small electrical current is passed between them. As the vocal folds open (abduct) and close (adduct) (see anatomy of the larynx), the resistance to the flow of the current alters. The variations in resistance are displayed as an image on a computer screen which represents the movement/contacts of the vocal folds. This technique is also useful in gathering information about the fundamental frequency of the voice, and the voice quality. Unlike vocal tract imaging and EMG techniques, electroglottography is commonly carried out by voice therapists (speech therapists).

Acoustic Analysis
Acoustic analysis is, in some ways, the objective counterpart of perceptual assessment of voice, in that it measures several of the same vocal characteristics that are explored using just auditory perception, e.g. pitch, pitch range, loudness, degree of hoarseness. Measurements can be made by dedicated instruments designed for a particular task (e.g. a sound level meter, which displays the intensity level of speech sounds; a spectrograph, which displays the frequency and intensity of single speech sounds, syllables, words or connected speech) or by integrated software programs capable of measuring and displaying several parameters at once (e.g. KayPENTAX's Computerized Speech Lab (CSL)).

One integrated package that I have used in my clinics for several years is Praat. Written by Paul Boersma and David Weenink at the University of Amsterdam, Praat is a computer program with which you can analyze, synthesize, and manipulate speech. It also has an in-built Voice Report tool. It is available for many different computer operating systems and can be downloaded for free from

When conducting any formal assessment/measurement it is important to follow a protocol, i.e. a standard procedure for systematically gathering the relevant data which will subsequently be used to describe and quantify the voice characteristics. There is insufficient space here to describe the protocol I use in detail. However, I always record the client phonating long sustained vowels in isolation and also performing so-called solo connected speech, as follows:

1. Ask the client to take a normal breath and then to sustain the vowel sound 'ah' (as in the words 'art' and 'heart') for about five seconds at a comfortable pitch and loudness on one exhalation, without straining. [Praat has its own built-in digital sound recorder.]
2. Repeat Step 1 above but, this time, sustaining the vowel 'ee'.
3. Record the client speaking a short stretch of talk, e.g. 'My name is Graham Williamson and I live in Billingham.'

Praat is a sophisticated program which can perform complex analyses (including spectrograms) or be used to measure relatively straightforward characteristics such as pitch. When investigating the vowels in isolation, I routinely measure:

JITTER
This is also known as pitch perturbation and refers to the minute involuntary variations in the frequency of adjacent vibratory cycles of the vocal folds. In essence, it is a measure of frequency variability in comparison to the client's fundamental frequency. Pathological voices often exhibit a higher percentage of jitter.

SHIMMER
Whereas jitter is a measure of the percentage irregularity in the pitch of the vocal note (pitch perturbation), shimmer is a measure of the percentage irregularity in the amplitude of the vocal note. It is often referred to as amplitude perturbation. Shimmer, therefore, measures the variability in the intensity of adjacent vibratory cycles of the vocal folds. As with jitter, pathological voices will typically exhibit a higher percentage of shimmer.

HARMONICS-TO-NOISE RATIO (HNR)
The vocal note produced by the vibrations of the vocal folds is complex and made up of periodic (regular and repetitive) and aperiodic (irregular and non-repetitive) sound waves. The aperiodic waves are random noise introduced into the vocal signal owing to irregular or asymmetric adduction (closing) of the vocal folds. Noise impairs the clarity of the vocal note and too much noise is perceived as hoarseness. Praat is capable of measuring the proportions of periodic and aperiodic waves (noise) in the vocal note and displaying this as a Harmonics-to-Noise Ratio (HNR).
Laryngeal pathology may lead to poor adduction of the vocal folds and, therefore, increase the amount of random noise in the vocal note. The greater the proportion of noise, the greater the perceived hoarseness, and the lower the HNR figure will be, i.e. a low HNR indicates a high level of hoarseness, and a high HNR indicates a low level of hoarseness. Figure 1 represents jitter, shimmer and HNR diagrammatically.
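To make the three measures concrete, here is a rough sketch of how they can be computed from cycle-by-cycle data. It is not Praat's implementation: the function names are hypothetical, the inputs (a list of cycle durations, a list of cycle peak amplitudes, and the fraction of signal energy that is periodic) are assumed to have been extracted already, and the formulas follow the commonly used "local" perturbation definitions and the decibel form of the harmonics-to-noise ratio.

```python
import math


def _relative_perturbation(values):
    """Mean absolute difference between adjacent cycles, as a % of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_seconds):
    """Cycle-to-cycle variability in period (frequency perturbation)."""
    return _relative_perturbation(periods_seconds)


def shimmer_percent(peak_amplitudes):
    """Cycle-to-cycle variability in amplitude (amplitude perturbation)."""
    return _relative_perturbation(peak_amplitudes)


def hnr_db(periodic_fraction):
    """HNR in dB from the fraction of signal energy that is periodic."""
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return 10.0 * math.log10(periodic_fraction / (1.0 - periodic_fraction))


# A perfectly regular voice has zero jitter; more noise gives a lower HNR
print(jitter_percent([0.008, 0.008, 0.008]))
print(hnr_db(0.99), hnr_db(0.90))
```

In this sketch a signal whose energy is 99% periodic gives an HNR of about 20 dB, and the figure falls as the noise share grows, in line with the relationship between HNR and hoarseness described above.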

Figure 1. Jitter, Shimmer and HNR

When analysing the solo connected speech, I routinely measure the following:

MEAN PITCH
This is the Speaking Fundamental Frequency (SF0), i.e. the average speaking pitch. For adult males this is around 128 Hz (cycles per second), for adult females it is about 225 Hz, and for children under the age of 10 years it can average 260 Hz.

STANDARD DEVIATION
This is a statistical measure of how much the pitch varies from the mean pitch.

MINIMUM PITCH
This is simply the lowest pitch recorded in the spoken sample.

MAXIMUM PITCH
The highest pitch recorded in the spoken sample.

PITCH RANGE
Subtracting the minimum pitch from the maximum pitch gives the pitch range. This is an indicator of flexibility, representing a measure of how much the client varies their pitch during speech. Typical pitch ranges are 85-196 Hz for adult males and 155-334 Hz for adult females.
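The five connected-speech measures can be derived from a list of pitch values sampled across the recording, as in the sketch below. The function name is hypothetical, and the convention that unvoiced frames are coded as 0 Hz is an assumption made for the example, so check how your own pitch listing marks them. The sample standard deviation is used here, which is also an assumption.

```python
import statistics


def pitch_summary(f0_hz):
    """Summarize pitch over a spoken sample.

    `f0_hz` holds one fundamental-frequency value per analysis frame.
    Frames coded as 0 are assumed to be unvoiced and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    lowest, highest = min(voiced), max(voiced)
    return {
        "mean": statistics.mean(voiced),
        "sd": statistics.stdev(voiced),  # sample standard deviation
        "min": lowest,
        "max": highest,
        "range": highest - lowest,  # maximum pitch minus minimum pitch
    }


summary = pitch_summary([100.0, 0.0, 120.0, 140.0])
print(summary)
```

For the three voiced frames in this toy input, the mean is 120 Hz and the range is 40 Hz.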
