A Systematic Review of Voice Therapy: What Effectiveness Really Implies
Maude Desjardins, Lucinda Halstead, Melissa Cooke, and Heather Shaw Bonilha, Charleston, South Carolina

Summary: Introduction. Behavioral voice therapy guided by a speech-language pathologist is recommended as
the main treatment approach for many kinds of voice disorders. Encouraging evidence of good outcomes from
voice therapy has been found in two previous reviews on broad patient populations. However, no definitive conclusion
on the effectiveness of direct voice therapy can be drawn from these reviews due to limitations of the included studies.
Aims. To review recent literature on voice therapy; to provide clinicians with a list of evidence-based voice therapy
techniques; to incorporate the therapy components in a physiologically based model; to assess the limitations and pro-
gress achieved in the recent research on voice therapy.
Methods. A literature search was conducted using three electronic databases: PubMed, Scopus, and CINAHL. A similar
strategy was used in all three databases to highlight the concepts of therapy and voice disorders. Only randomized
controlled trials were included in the review.
Results. Fifteen papers met the inclusion criteria, covering five categories of voice disorders (functional, Parkinson
induced, GERD induced, presbyphonia, unilateral vocal fold paresis) and seven specific behavioral voice therapy ap-
proaches. Statistically significant improvements were found postintervention on at least one outcome variable in all
but one study. Clinical significance of the results was rarely discussed. Discrepancies in reported outcome measures
were found across studies, making comparisons between interventions challenging.
Conclusion. Behavioral voice therapy generally leads to significant improvements in voice outcomes, but further
research considering the clinical meaningfulness of the results is needed to establish what is really meant by the term
effectiveness when it comes to voice therapy.
Keywords: Voice; Dysphonia; Voice disorder; Voice therapy; Review.

Accepted for publication October 4, 2016.
Institution where the study was performed: Medical University of South Carolina, Department of Health Sciences and Research, College of Health Professions.
Grant support/acknowledgement of financial support: None/nothing to disclose.
From the Department of Health Sciences and Research, College of Health Professions, Medical University of South Carolina, Charleston, South Carolina; Department of Otolaryngology–Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina; and the Evelyn Trammell Institute of Voice and Swallowing, Medical University of South Carolina, Charleston, South Carolina.
Address correspondence and reprint requests to Heather Shaw Bonilha, Department of Health Sciences and Research, College of Health Professions, Medical University of South Carolina, 77 President St. MSC 700, Charleston, SC 29425. E-mail:
Journal of Voice, Vol. , No. , pp. -
0892-1997
2016 Published by Elsevier Inc. on behalf of The Voice Foundation.

INTRODUCTION
Almost 30% of the adult population will experience voice difficulties at some point in their life, of either chronic (21.5%) or acute (78.5%) nature.1 Their voices will not perform or sound as they usually do and this could impact their communication and work, as well as their overall quality of life.1,2 Reported symptoms may include hoarseness, breathiness, aphonia, vocal fatigue and pain, all of which can lead to difficulties in performing daily activities. Voice disorders can negatively impact a person's social relationships, emotional state, and health to a degree comparable to other chronic disorders such as heart failure, angina, and chronic obstructive pulmonary disease.2 In a recent study, Cohen and his colleagues3 calculated the costs associated with short-term disability claims related to dysphonia. They found that voice disorders lead to productivity losses of $4437.89 per person each year. At a national scale, the potential direct healthcare costs related to assessment and management of laryngeal diseases and disorders have been estimated at $13 billion in 12 months.4
Behavioral voice therapy guided by a speech-language pathologist (SLP) is often the recommended primary approach for treating voice disorders and, when not the primary approach, is recommended in addition to a medical or surgical treatment.5 Voice therapy is often categorized as direct, focusing on the physiological components of the disorder, or indirect, focusing on the actions and the environmental factors that may contribute to the disorder.6 Van Stan and his colleagues7 have recently developed a taxonomy of voice therapy that further subdivides direct and indirect treatment into more specific elements. Direct intervention is divided into five categories: auditory, somatosensory, musculoskeletal, respiratory, and vocal function. The indirect intervention consists of pedagogy and counseling sections. This taxonomy encourages clinicians and researchers to consider therapy approaches in terms of their physiological targets.
In the last decade, two systematic reviews of broad patient populations have found encouraging evidence for the efficacy of voice therapy. A systematic literature review conducted by Ruotsalainen et al6 led to the conclusion that a combination of direct and indirect voice therapy should be considered the best available intervention for functional dysphonia, when compared to no intervention. These results are based on the results of three studies, and on the outcomes of self-assessment measures only: Vocal Performance Questionnaire (VPQ) and Voice-Related Quality of Life (V-RQOL). A review by Speyer8 included functional and organic dysphonia and found that direct voice therapy leads to more positive outcomes than indirect voice therapy.8 This same review also found that when study populations were restricted to groups of patients with specific diagnoses and assigned to well-defined voice therapy techniques (eg,
laryngeal manipulation, Accent Method), they had more success compared to studies where groups and treatments were less explicit.8
Although it is important to recognize that direct voice therapy leads to positive voice outcomes, from the results of these reviews it is still unclear what specific treatment approaches within direct therapy are effective. In addition, despite the trends stemming from the results of his review, Speyer8 states that no definitive conclusion on voice therapy effectiveness can be drawn. He highlights several limitations in the literature that have reduced the conclusions that could be made from the research on voice therapy.8 Some of the mentioned issues concern the use of subjective instruments for voice assessment, small groups of patients, and the lack of appropriate control groups.8 Moreover, the number of experimental studies on voice therapy was reported as being limited and heterogeneous in terms of diagnoses, assessment instruments, and provided therapy.8 Ruotsalainen and his colleagues6 point out the fact that most studies do not provide a specific description of the administered treatment. Instead, voice therapy is characterized with generic terms such as "traditional treatment," "classical therapy," or "direct voice training,"6 leaving room for substantial variation in the prescribed treatments based on the literature.9 A consequence of this lack of specificity is the perpetuation of the "black box" phenomenon.10,11 The rehabilitation field has been compared to a black box because of its many grey areas, one of them being the unclear process that links clinical treatment with functional outcomes.10 Without specifying the content of therapy, the possibility of gaining a better understanding of the mechanisms of action responsible for the treatment outcome is limited.11
In the last 8 years since Speyer's literature review,8 the voice research community has emphasized the importance of using rigorous methodology for treatment outcome studies. Thus, it is expected that more recent research has overcome many of the limitations identified by Speyer8 and that a review of this research would provide clinicians with valuable information about voice treatment techniques that have solid evidence to support their use in clinical practice. To further inform clinicians on the studied interventions, this review will describe the components of the treatment approaches and will evaluate where these components fit within the theoretical model created by Van Stan and his colleagues.7 Incorporating the results of the literature search into a physiologically based model will lead to a better understanding of which physiological targets are used in the studied approaches and which targets lead to positive outcomes.10
This article aims to review the current body of knowledge in voice therapy, giving clinicians access to a comprehensive overview of evidence-based voice techniques that have been studied since Speyer's review.8 By reporting, discussing, and comparing the results of voice therapy studies, this review could help generate hypotheses concerning what treatment, combination of treatments, or alternative treatment is most efficient for a certain disorder. This could help enhance the efficacy of voice therapy, limit the duration of treatment, and reduce healthcare costs associated with voice rehabilitation.
The present study's aims are as follows:
(1) Review the literature on voice therapy administered by SLPs to treat voice disorders.
(2) Provide clinicians with a list of evidence-based voice therapy techniques.
(3) Incorporate the therapy components in a physiologically based model.
(4) Assess the limitations of and progress achieved in the recent research in voice therapy.

METHODS
Search procedures
A review of the literature was conducted following the PRISMA guidelines.12 To answer the research questions, three electronic databases were searched: PubMed, Scopus, and CINAHL. A similar strategy was used in all three databases to highlight the concepts of therapy and voice disorders. In PubMed, the MeSH terms "treatment outcome" and "voice training" were added to the key words "voice training," "voice therapy," "voice treatment," "voice rehabilitation," and "treatment outcome" to establish the therapy component of the search. The MeSH term "voice disorders" was linked to the subheadings "therapy" and "rehabilitation" and subsequently combined with the key words "aphonia," "dysphonia," "hoarseness," and "voice disorder" to establish the voice disorders part of the search. These two groups of terms were then combined to form the complete PubMed search strategy. The same key words and principles were applied in Scopus and CINAHL. In CINAHL, the exploded heading "voice disorders" was also added to the list of key words. The search was conducted on January 29, 2016. The reference lists of all relevant articles, those included in the study, were screened to identify any articles that were not retrieved from the database searches.

Screening procedures
Inclusion and exclusion criteria
I. Publication:
   a. English and French articles published beginning January 2007 were included in the search.
   b. Only publications that were peer reviewed and issued in an indexed journal were considered for the review. Master's or doctorate theses and conference proceedings were excluded.
II. Participant characteristics:
   a. For a study to be included in the review, its subjects had to have been diagnosed with dysphonia. Studies that included participants based on the presence of voice symptoms only or without a specific dysphonia diagnosis were excluded. Studies evaluating the effects of an intervention on healthy subjects compared to dysphonic subjects were included if the data for dysphonic subjects were reported separately. Voice disorders of all origins (functional [maladaptive], organic, neurological, and psychogenic) as well as dysphonia related to aging were considered for this review.
   b. Studies including participants of all ages were considered.
   c. Studies that were conducted on animals were excluded.
III. Therapy:
   a. All behavioral therapies were included unless they were used as complementary approaches to a medical, surgical, pharmaceutical, or electrical nerve stimulation treatment intervention (eg, voice therapy with BOTOX, therapy following a laryngectomy, etc.). These interventions were included only if they were compared to an exclusively behavioral therapy group. Exceptions were made for participants on reflux medication; these studies were included because it is possible that most voice patients are on reflux medication, even when it is not mentioned in the study's methodology.
   b. Indirect therapies (vocal hygiene strategies, vocal rest, etc.) were included only if they were administered jointly with direct voice therapy.
   c. Voice training programs for voice professionals without voice disorders and prevention interventions were excluded.
   d. Therapy had to be conducted or supervised by SLPs or students in speech and language pathology.
IV. Study design:
   a. Only randomized controlled trials reporting pre- and posttreatment voice outcomes were considered for inclusion.
   b. Studies reporting only long-term outcome measures (more than 1 month after the end of the treatment period) were excluded.
   c. Interventions had to involve at least five participants in each group.
   d. Tool development and validity studies were excluded.
V. Outcome measures: Studies reporting any subjective or objective voice-related outcome measures were considered, including: self-reported measures (self-assessment questionnaires), observer-rated measures (perceptual judgment of voice quality and visual examinations), and instrumental measures (acoustic analysis and aerodynamic measures or phonation time).13

Following the initial search, duplicates were deleted and remaining articles were screened by the first author based on the title and abstract. To assess screening reliability, 15% of these articles were selected and screened by the last author. Reliability of the screening was 100%. Studies that did not meet the inclusion criteria or that met any exclusion criterion were deleted after screening title and abstract. Those that met all inclusion criteria or were ambiguous underwent the second screening phase, full-text review. At this stage, all articles were screened by the first author and 20% of these articles were selected and screened by the last author. Reliability of the second screening phase was 100%. The results of the screening are presented in the Results section.

Critique procedures
Data elements of interest, extracted by the first author and last author, are reported in Tables 1 and 2.

Study characteristics
Every study was assessed according to the Jadad scale for methodological quality14,15 and was given an overall score. This standardized scale was developed by Jadad et al15 to assess the quality of randomized controlled trials and is based on three main criteria: method of randomization, blinding, and attrition consideration. The maximum score that can be earned is five points. The participant characteristics extracted were age range and average, gender distribution, and total sample size. The participants' voice disorders were also reported.

Voice therapy characteristics
Extracted characteristics of the administered treatments consisted of the type of intervention for each group (eg, flow phonation, no-intervention control group, etc.) and the duration and frequency of the therapy. Regarding outcome measures, only those that were used to compare the pre- and posttreatment voice outcomes were reported. The key results of the studies were also described in the same table. If statistical analyses were performed, the P values, and when available effect sizes, were reported. Otherwise, descriptive key findings were summarized.

RESULTS
Search results
Figure 1 shows the results of the search and screening process in an adaptation of the PRISMA flow chart.12 The numbers of articles included and excluded at each step of the process are provided as well as the reasons for exclusion following the full-text screening phase. After completing the screening process, 15 papers met all of the inclusion criteria and were evaluated in this review.

Study characteristics
Study characteristics are reported in Table 1. All studies were randomized controlled trials and were rated using the Jadad scale.15 The total quality score of each study is displayed in Table 1, along with the participants' characteristics and their voice disorders. The studies in this review had Jadad scale scores ranging from 2 to 4 points, with a mean of 3.33 points.

Participant characteristics
The age range of participants across all studies was 16 to 91 years old. Two studies had participants under 18 years old,16,17 with a minimum age of 16 years old in both cases. Four studies had exclusively female participants18–21 and one study did not mention the gender distribution.22 For the 10 studies containing both sexes, the cumulative female to male ratio was 2:1. Two studies had more males than females and one had equal gender representation. These include both articles on Parkinson's disease23,24 and the article on presbyphonia.25 Total sample sizes varied from 14 to 162 participants with a median sample size of 34 participants.
TABLE 1. Study Characteristics

| Study | Jadad Score | Age (years) | Gender | Total N | Voice Disorder for Inclusion |
|---|---|---|---|---|---|
| Alves Silverio et al18 | 4 | 18–45 | 20 females | N = 20 | Bilateral vocal nodules |
| Behrman et al19 | 3 | Over 18 | 62 females | N = 62 | Bilateral, fairly symmetric, mid-membranous, benign, free-edge vocal fold lesions (nodules, prenodules, mid-fold swelling) |
| Constantinescu et al23 | 4 | 54–87 (70.12 ± 8.56) | 7 females, 27 males | N = 34 | Parkinson's disease with the presence of hypokinetic dysarthria impacting |
| Halpern et al24 | 4 | 54–81 | 8 females, 8 males | N = 16 (+ historical control group n = 13) | Parkinson's disease |
| Nguyen and Kenny20 | 3 | G1: 22–54 (42.5 ± 9); G2: 24–54 (43.7 ± 7.6) | 40 females | N = 40 | Muscle tension dysphonia (organic lesions excluded) |
| Pedrosa et al34 | 4 | G1: 34.5 ± 9.03; G2: 36.05 ± 10.37 | 56 females, 24 males | N = 80 | Behavioral dysphonia (organic lesions excluded) |
| Ptok and Strack22 | 2 | G1: 27–83 (57 ± 12.6); G2: 20–84 (54.1 ± 16.1) | Not reported | N = 69 | Unilateral recurrent laryngeal nerve paresis/unilateral vocal fold paresis |
| Rangarathnam et al16 | 4 | 16–81 (50.86) | 11 females, 3 males | N = 14 | Muscle tension dysphonia (organic lesions excluded) |
| Rodriguez-Parra et al17 | 4 | 16–65 (33.1 ± 10.9) | 39 females, 3 males | N = 42 | Functional alteration or lesion in the larynx. The types of pathology included were vocal nodules, polyps, angiomatous polyps, Reinke's edema, and hypotonic dysphonia. |
| Teixeira and Behlau21 | 4 | Inclusion criteria: 18–50 (actual range not reported) | 162 females | N = 162 | Behavioral dysphonia (organic lesions included) |
| van Leer and Connor26 | 2 | G1: 41.1 ± 12.7; G2: 42.8 ± 14 | 26 females, 9 males | N = 35 | Adducted hyperfunction with or without lesion thought secondary to hyperfunction |
| Vashani et al33 | 2 | 18 and older | 18 females, 14 males | N = 32 | Gastroesophageal reflux disease + having symptoms of hoarseness of voice or change of voice quality |
| Watts et al35 | 3 | 22–74 (46.85) | 16 females, 4 males | N = 20 | Muscle tension dysphonia (organic lesions excluded) |
| Wenke et al27 | 4 | G1: 32–75 (50.7 ± 14.3); G2: 39–76 (58.4 ± 10.88) | 16 females, 1 man | N = 17 | Functional dysphonia arising from musculoskeletal etiologies and/or occupational voice use (including vocal fold nodules) |
| Ziegler et al25 | 3 | 60–91 (75.4 ± 7.2) | 7 females, 9 males | N = 16 | Presbyphonia |
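The Jadad scores in Table 1 come from the three criteria named in the Methods (randomization, blinding, and attrition), with a maximum of five points. As a rough illustration of how such a score can be tallied, here is a minimal sketch. The item breakdown follows the commonly described form of the Jadad scale (one point each for reporting randomization, double blinding, and withdrawals, plus or minus one point for the appropriateness of the randomization and blinding methods), which the source does not spell out; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadItems:
    """Answers to the Jadad items for one trial (hypothetical structure)."""
    randomized: bool
    # True = appropriate method, False = inappropriate, None = not described
    randomization_appropriate: Optional[bool]
    double_blind: bool
    blinding_appropriate: Optional[bool]
    withdrawals_described: bool


def jadad_score(items: JadadItems) -> int:
    """Return a 0-5 score; assumes the commonly described scoring rules."""
    score = 0
    if items.randomized:
        score += 1
        if items.randomization_appropriate is True:
            score += 1
        elif items.randomization_appropriate is False:
            score -= 1
    if items.double_blind:
        score += 1
        if items.blinding_appropriate is True:
            score += 1
        elif items.blinding_appropriate is False:
            score -= 1
    if items.withdrawals_described:
        score += 1
    return max(score, 0)


# Hypothetical trial: randomization reported with an appropriate method,
# no blinding, withdrawals accounted for.
example = jadad_score(JadadItems(True, True, False, None, True))
print(example)
```

Under these rules a behavioral trial that cannot blind participants or clinicians tops out at three points, which is one plausible reason scores in this kind of literature cluster below the maximum.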
TABLE 2. Voice Therapy and Results

Alves Silverio et al18
  Intervention groups: G1: Transcutaneous electrical nerve stimulation (TENS) (n = 10); G2: Laryngeal Manual Therapy (LMT) (n = 10)
  Duration of behavioral therapy: 12 sessions; 20 minutes; 2/week
  Voice-related outcome measures: Self-assessment: vocal and laryngeal symptom questionnaire. Perceptual judgment of voice quality: overall, roughness, breathiness, strain, instability, resonance
  Results (statistical significance): In the questionnaire, 2 symptoms improved in the TENS group (high-pitched voice, P = 0.023; effort to speak, P = 0.035); 1 symptom improved in the LMT group (sore throat, P = 0.045). Strain was the only parameter that improved significantly, and only in the TENS group during vowel phonation (P = 0.031). In spontaneous speech, no significant change was found in either therapeutic group.
  Results (effect sizes): N/A

Behrman et al19
  Intervention groups: G1: Voice production therapy (LMRVT) (n = 31); G2: Vocal hygiene + education (n = 31)
  Duration of behavioral therapy: 6 sessions; 45 minutes; 1/week; 4 weeks of self-study period
  Voice-related outcome measures: Self-assessment: VHI
  Results (statistical significance): Both groups achieved a decrease in VHI scores (G1: P < 0.0001; G2: P = 0.018). The improvement was significantly greater for the voice production group (P < 0.007).
  Results (effect sizes): Standardized effect sizes from baseline to post-self-study period for VHI scores: G1: 1.01; G2: 0.44

Constantinescu et al23
  Intervention groups: G1: Online LSVT (n = 17); G2: Face-to-face LSVT (n = 17)
  Duration of behavioral therapy: 16 sessions; 60 minutes; 4/week
  Voice-related outcome measures: Acoustic analysis: SPL, maximum fundamental frequency range. Aerodynamic measure and phonation time: duration of phonation. Perceptual judgment of voice quality: breathiness, roughness, loudness level, loudness variability, pitch variability
  Results (statistical significance): Increases in mean SPL and maximum fundamental frequency range were evident from pre- to post-LSVT (P < 0.05) for both groups combined. No significant effect was found for duration of phonation. All perceptual measures were significantly improved following intervention (P < 0.05). There was no significant main effect for the LSVT environment (face-to-face vs online).
  Results (effect sizes): N/A

Halpern et al24
  Intervention groups: LSVT LOUD modified through use of the Companion: G1: Immediate group (n = 8); G2: Delayed group-No intervention control (n = 8). LSVT LOUD standard: Historical group (n = 13)
  Duration of behavioral therapy: 16 sessions (9 sessions with SLP in the clinic and 7 sessions independently at home); 60 minutes; 4/week
  Voice-related outcome measures: Acoustic analysis: VocSPL. Perceptual judgment of voice quality: worse or better. Self-assessment: VHI. Significant others assessment: CETI-M, VAS
  Results (statistical significance): Significant differences from baseline to post-therapy were found in all groups for VocSPL (P < 0.0001). The overall mean posttreatment listener rating score was 19.8 (SD = 14.3), on the −50 to +50 scale. There was significant improvement in almost all aspects of VAS (P < 0.05) and CETI-M (P < 0.01), but not in VHI.
  Results (effect sizes): N/A

Nguyen and Kenny20
  Intervention groups: Vocal Function Exercises (VFE) adapted for tonal language. G1: Full exercises (n = 22); G2: Partial exercise-only first part of the exercises (n = 18)
  Duration of behavioral therapy: 56 sessions; 10 minutes; 2/day (14/week)
  Voice-related outcome measures: Acoustic analysis: jitter, shimmer, HNR and tonal acoustic parameters (mean F0 for T3-broken tone; mean F0, target F0, rise time, rise size, and rise speed for T5-rising tone). Perceptual judgment of voice quality: rating of severity
  Results (statistical significance): Jitter, shimmer, and HNR improved significantly in G1 after treatment (P = 0.000) but not in G2. Tonal acoustic parameters also improved only in G1 after treatment (P < 0.05), except rise time, which did not improve. Mean severity scores improved significantly after treatment only for G1 (P = 0.000).
  Results (effect sizes): Effect sizes (Cohen's d) from baseline to posttreatment. Mean T3: G1: 0.64, G2: 0.06. Mean T5: G1: 0.35, G2: 0.05. Target T5: G1: 0.66, G2: 0.25. Rise time: G1: 0.15, G2: 0.09. Rise size: G1: 0.62, G2: 0.32. Rise speed: G1: 1.09, G2: 0.57

Pedrosa et al34
  Intervention groups: G1: Comprehensive Voice Rehabilitation Program (CVRP) (n = 40, n = 37 after withdrawals); G2: VFE (n = 40, n = 35 after withdrawals)
  Duration of behavioral therapy: 6 sessions; 40 minutes; 1/week
  Voice-related outcome measures: Self-assessment: V-RQOL, VHI. Perceptual judgment of voice quality: degree of dysphonia. Visual examination: laryngeal pattern (LP) (including glottal closure, presence and size of lesion, and degree of supraglottic
  Results (statistical significance): There was an improvement in VHI scores, V-RQOL scores, LP outcomes and perceptual evaluation for both groups from baseline to posttreatment. The percentage of progress was higher for the CVRP group for laryngeal pattern (P = 0.003), VHI (P = 0.006), V-RQOL (P = 0.644), and perceptual evaluation (P = 0.071).
  Results (effect sizes): Effect size from baseline to posttreatment. V-RQOL: G1: 1.09, G2: 0.86. VHI: G1: 1.17, G2: 0.62. Perceptual evaluation: G1: 0.79, G2: 0.48. Laryngeal pattern: G1: 1.01, G2: 0.51

Ptok and Strack22
  Intervention groups: G1: Voice exercises therapy (VE) (n = 36); G2: Electrical stimulation-supported voice therapy (ES) (n = 33)
  Duration of behavioral therapy: G1: 6–28 sessions (mean: 15.29); G2: 92–268 (mean: 191); 12 weeks
  Voice-related outcome measures: Glottographic signals: irregularity index CFx. Aerodynamic measure and phonation time: MPT
  Results (statistical significance): CFx was reduced by 18.6% in the VE group and 53.3% in the ES group, and this difference was significant (P < 0.012). Neither the MPT increase in absolute terms (VE: 4.6 ± 8.3 seconds; ES: 3.6 ± 5.4 seconds) nor percentage change differed significantly within or between groups.
  Results (effect sizes): N/A
Rangarathnam et al16
  Intervention groups: Voice therapy: Vocal hygiene + flow phonation. G1: Voice therapy in person (n = 7); G2: Voice therapy with telepractice (n = 7)
  Duration of behavioral therapy: 12 sessions; 2/week
  Voice-related outcome measures: Perceptual judgment of voice quality: CAPE-V (only overall severity was used for analysis). Self-assessment: VHI. Acoustic analysis: NHR, VTI. Aerodynamic measures and phonation time: mean airflow for MSP (maximum sustained phonation) and CSP (comfortable sustained phonation), Rlaw (ratio of peak pressure/
  Results (statistical significance): There were no significant differences in outcomes between the two intervention groups. Pre-post comparisons: there were statistically significant improvements in the CAPE-V (P = 0.001), VHI (P = 0.002), mean airflow during CSP (P = 0.04) and MSP (P = 0.034). Changes in NHR, VTI, and Rlaw did not reach statistical significance.
  Results (effect sizes): Effect sizes (Cohen's d) from baseline to posttreatment (for both groups combined). Overall severity: 1.73. NHR: 0.59. VTI: 0.82. Mean airflow in CSP: 0.29. Mean airflow in MSP: 0.27. Rlaw: 0.41. VHI: 1.24

Rodriguez-Parra et al17
  Intervention groups: G1: Voice-therapy group (n = 21); G2: Vocal-hygiene group (n = 21)
  Duration of behavioral therapy: 24 sessions; 45 minutes; 2/week
  Voice-related outcome measures: Acoustic analysis: jitter, spectrographic analysis. Aerodynamic measures and phonation time: MPT, MET, MPTS. Perceptual judgment of voice quality: GRBAS. Self-assessment: Well-Being, Self-Voice, HYGIENE. Visual examination: amplitude, glottal closure, mucosal wave, periodicity, symmetry, ventricular folds
  Results (statistical significance): G1 had significantly better outcomes than G2 at the posttreatment assessment for MET (P = 0.034), MPTS (P = 0.039), Well-Being (P = 0.000), Self-Voice (P = 0.000), and HYGIENE (P = 0.005). Improvements in qualitative dimensions (perceptual, laryngoscopic, and spectrographic) were also in favor of direct voice therapy.
  Results (effect sizes): N/A
Teixeira and Behlau21
  Intervention groups: G1: Vocal Function Exercises (VFEG) (n = 54); G2: Vocal amplification (VAG) (n = 54); G3: No intervention control (n = 54)
  Duration of behavioral therapy: 6 sessions; 6 weeks
  Voice-related outcome measures: Perceptual judgment of voice quality: overall severity rating. Visual examination: size of lesion, glottal closure. Self-assessment: VAPP (Voice Activity and Participation Profile). Acoustic analysis: shimmer, jitter, NHR, F0
  Results (statistical significance): Considering the number of patients that improved, VFEG had better outcomes than CG in auditory-perceptual results (P = 0.012) and better outcomes than the CG and VAG in laryngeal results (P = 0.005; P = 0.05). Total VAPP score improved for VFEG (P = 0.001) and VAG (P = 0.003); posttreatment means for VFEG and CG and for VFEG and VAG were significantly different (P < 0.001; P = 0.05). In the VFEG, all acoustic parameters improved (P < 0.001) but only F0 was statistically different when compared to CG and VAG at posttreatment (P = 0.002; P = 0.013). In the VAG, only shimmer (P = 0.009) and NHR (P = 0.024) were significantly
  Results (effect sizes): N/A

van Leer and Connor26
  Intervention groups: G1: Resonant Voice Therapy (RVT) supported with digital videos on MP4 players (n = 17); G2: RVT without MP4 (n = 18)
  Duration of behavioral therapy: 4 sessions
  Voice-related outcome measures: Self-assessment: VHI. Perceptual judgment of voice quality: CAPE-V (overall severity only)
  Results (statistical significance): Average reduction in VHI scores was greater for the MP4 participants than for the control group. CAPE-V scores were, on average, lower for MP4 participants than for control group. However, these differences were not significant. For both groups combined, reductions in VHI scores and CAPE-V ratings were significant (P = 0.0001; P = 0.001).
  Results (effect sizes): N/A

Vashani et al33
  Intervention groups: G1: omeprazole + voice therapy (n = 16); G2: omeprazole + placebo for voice treatment (n = 16)
  Duration of behavioral therapy: 12 sessions; 20–25 minutes; 2/week
  Voice-related outcome measures: Acoustic analysis: jitter, shimmer, NNE (normalized noise energy), HNR. Perceptual judgment of voice quality: hoarseness, breathiness
  Results (statistical significance): Shimmer and HNR improved only in the voice therapy group (P = 0.001; P = 0.006). Jitter and NNE improved in both treatment groups, but by a greater magnitude in the voice therapy group (G1: P = 0.018, P = 0001; G2: P = 0.001, P = 0.001). Perceptual assessment revealed significant improvements for G1 in hoarseness (P = 0.002) and breathiness (P = 0.001); no significant change was observed for G2.
  Results (effect sizes): N/A
TABLE 2. (Continued)

Duration of Voice-related Outcome

Study Intervention Groups Behavioral Therapy Measures Results (Statistical Significance) Results (Effect Sizes)
Watts et al G1: Stretch and flow 6 sessions Self-assessment: VHI A greater improvement from pre- to Effect sizes (Cohens d)
phonation + vocal 6 weeks Aerodynamic measures posttreatment was found for G1 in from baseline to
hygiene (n = 10) and phonation time: VHI scores (P = 0.003), MPT posttreatment for G1:
G2: Vocal hygiene MPT, s/z ratio (P = 0.013), CPP in connected speech VHI: 1.6
only (n = 10) Acoustic analysis: (P = 0.025), and CPP in vowels MPT: 1.2
cepstral peak (P = 0.017). Sentence CPP: 1.2

Effectiveness of Voice Therapy

prominence (CPP) from Vowel CPP: 1.1
sustained vowel, CPP s/z ratio: 0.5
from connected speech

Wenke et al27 G1: Intensive voice G1: 8 sessions Self-assessment: VHI Significant improvements from pre- to N/A
therapy (n = 8, n = 7 60 minutes Other: Australian therapy posttreatment were found in G1 for
after withdrawals) 4/week outcome measures the total VHI score (P = 0.008) and
G2: Standard voice G2: 8 sessions (AusTOMs) physical score (P = 0.002).
therapy (n = 9; n = 7 60 minutes No statistically significant differences
after withdrawals) 1/week between groups were found for the
All participants total VHI score or the 3 subscales at
attended a single any time point. No significant results
1-hour vocal were found for any of the AusTOMs
hygiene ratings.
session before
Ziegler et al25 G1: VFE (n = 6) 4 sessions Self-assessment: V-RQOL, Results revealed that the VFE and Effect sizes (Cohens d)
G2: PhoRTE 45 minutes PPE (perceived PhoRTE groups experienced a from baseline to
(Phonation 1/week phonatory effort) significant improvement in mean post-treatment
Resistance Training All participants V-RQOL scores (P = 0.054 and V-RQOL
Exercise) (n = 5) were briefly P = 0.049) (alpha level was set at 0.1). G1: 0.8 G2: 0.96
G3: No-intervention counseled on The control group did not PPE
control group voice hygiene demonstrate a significant change in G1: 0.76 G2: 1.06
(n = 5) mean V-RQOL scores. PPE ratings
decreased significantly in the
PhoRTE group only (P = 0.077) (alpha
level was set at 0.1).
Abbreviations: CAPE-V, Consensus Auditory-Perceptual Evaluation of Voice; CETI-M, modified Communication Effectiveness Index; CFx, irregularity index; GRBAS, Grade, Roughness, Breathiness, Asthenia, Strain; HNR, harmonics-to-noise ratio; LMRVT, Lessac-Madsen Resonant Voice Therapy; LSVT, Lee Silverman voice treatment; MET, maximum exhalation time; MPT, maximum phonation time; MPTS, maximum phonation time during connected speech; NHR, noise-to-harmonics ratio; RVT, resonant voice therapy; SLP, speech-language pathologist; SPL, sound pressure level; V-RQOL, Voice-Related Quality of Life; VAS, visual analog scale; VHI, Voice Handicap Index; VocSPL, vocal sound pressure level.
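
The effect sizes in the last column of the table are reported as Cohen's d. As a rough illustration only, and not the computation used in any of the reviewed studies, the sketch below derives d from two sets of scores using a pooled standard deviation, which is one common definition among several. The function names `cohens_d` and `magnitude` and the sample scores are hypothetical; the 0.2/0.5/0.8 labels are Cohen's conventional benchmarks, not thresholds taken from this review.

```python
from statistics import mean, stdev


def cohens_d(baseline, post):
    """Cohen's d using a pooled standard deviation (assumed definition)."""
    n1, n2 = len(baseline), len(post)
    s1, s2 = stdev(baseline), stdev(post)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(post) - mean(baseline)) / pooled_sd


def magnitude(d):
    """Label an effect size with Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical questionnaire scores for five participants (not study data).
baseline_scores = [60, 65, 70, 75, 80]
post_scores = [70, 75, 80, 85, 90]

d = cohens_d(baseline_scores, post_scores)
print(round(d, 2), magnitude(d))
```

Under these benchmarks, the d values of 0.8 to 1.06 listed for Ziegler et al25 would be labeled large and the value of 0.76 medium. As the review argues later, such labels describe the size of a change and not its clinical value.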

Records identified through database searching: N = 3265
Additional records identified through other sources: N = 25
Records after duplicates removed: N = 1308
Records screened: N = 1333
Records excluded: N = 1149
Full-text articles assessed for eligibility: N = 184
Full-text articles excluded:
8 failed criterion Ia
10 failed criterion IIa
20 failed criterion IIIa
3 failed criterion IIIb
2 failed criterion IIIc
2 failed criterion IIId
112 failed criterion IVa
4 failed criterion IVb
3 failed criterion IVd
4 failed criterion V
1 article was not
(N = 1149)
Studies included in qualitative synthesis: N = 15

FIGURE 1. PRISMA 2009 flow diagram.12
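
The counts in Figure 1 can be checked with simple arithmetic. The short sketch below, offered as an illustration and not as part of the review's method, recomputes the number of full-text articles and included studies from the figures in the flow diagram. The variable names are hypothetical, and the first check assumes that the 25 additional records were added after duplicates had been removed from the database results.

```python
# Counts taken from the PRISMA flow diagram (Figure 1).
records_after_duplicates = 1308
additional_records = 25
records_screened = 1333
records_excluded = 1149

# Full-text exclusions, keyed by the criterion that was not met.
full_text_exclusions = {
    "Ia": 8, "IIa": 10, "IIIa": 20, "IIIb": 3, "IIIc": 2, "IIId": 2,
    "IVa": 112, "IVb": 4, "IVd": 3, "V": 4,
    "other (1 article)": 1,
}

# Assumption: screened records = de-duplicated records + additional records.
screened_check = records_after_duplicates + additional_records

full_text_assessed = records_screened - records_excluded
full_text_excluded = sum(full_text_exclusions.values())
studies_included = full_text_assessed - full_text_excluded

print(screened_check, full_text_assessed, full_text_excluded, studies_included)
```

The itemized exclusions account for the difference between the full-text articles assessed and the studies included in the synthesis.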

Voice disorders
Five categories of voice problems are represented in the studies: functional voice disorders, dysphonia due to Parkinson's disease, dysphonia due to gastroesophageal reflux disease (GERD), presbyphonia, and unilateral vocal fold paresis (UVFP). Different terminology was used across studies to designate functional voice disorders (behavioral dysphonia, muscle tension dysphonia [MTD], functional alteration, and adducted hyperfunction). In all cases, the authors were referring to maladaptive voice use by the patients. In 6 of the 10 studies concerned with functional dysphonia, the authors included patients with organic benign lesions (nodules, prenodules, midfold swelling, polyps, angiomatous polyps, Reinke's edema, and vocal process lesions).17-19,21,26,27 In the studies by Alves Silverio et al18 and Behrman et al,19 the presence of bilateral nodules represented the main criterion for inclusion.

Voice therapy

Content
Seven specific behavioral voice therapy approaches emerged from the review (Table 3). When the treatment comprised an amalgam of direct techniques, these were listed in the description column of Table 3.

TABLE 3. Interventions Classified by Disorder Category
Disorder Category | Intervention | Description
Functional voice disorder | Laryngeal Manual Therapy28 |
Functional voice disorder | Resonant Voice Therapy/Lessac-Madsen Resonant Voice Therapy29 |
Functional voice disorder | (Stretch and) Flow phonation35 |
Functional voice disorder | Vocal Function Exercises30 |
Functional voice disorder | Comprehensive voice rehabilitation program31 | Focuses on five characteristics: body-voice integration, glottal source, resonance, coordination of subsystems, communicative attitude
Functional voice disorder | Direct techniques17,27 | Yawn-sigh, chewing, pitch variation and control, elimination and reduction of loudness, elimination of glottal attacks, and voice placing17; stretch, resonant voice, sob, twang, silent giggle, onset of tone, gentle onset27
Parkinson disease-related voice disorder | Lee Silverman Voice Treatment32 |
GERD-related voice disorder | Direct techniques33 | Relaxation and breathing exercises, and facilitating techniques such as yawn-sigh, glottal fry, chewing exercises, chant talk, humming
Presbyphonia | Vocal Function Exercises30 |
Presbyphonia | PhoRTE (Phonation Resistance Training Exercise)25 |
UVFP | Direct techniques22 | Therapists were asked to follow the guidelines outlined by Schwartz, Stengel, and Strauch (Schwartz et al, 1998; cited in Reference 22)
Abbreviations: GERD, gastroesophageal reflux disease; UVFP, unilateral vocal fold paresis.

Group comparisons
Two studies compared two different behavioral approaches,25,34 whereas three studies compared one behavioral approach to an alternative treatment method. Alternative methods consisted of electrical nerve stimulation18,22 and reflux medication.33 Two studies compared behavioral therapy to no intervention24,25 and four studies compared a voice therapy group to a vocal hygiene only group.17,19,21,35 In five articles, two conditions of a treatment approach were compared. Constantinescu et al23 and Rangarathnam et al16 both tested face-to-face intervention against online intervention. Van Leer and Connor26 tested whether the treatment effect could be enhanced with the support of home practice videos on MP4. Wenke et al27 tested a standard therapy program against an intensive version of the same program and, finally, Nguyen and Kenny20 tested a vocal function exercise (VFE) treatment against a partial administration of the same treatment.

Frequency and duration of interventions
All studies reported the number of voice therapy sessions and 11 reported the treatment frequency (number of sessions per week). The frequency was generally once or twice a week, with some exceptions. In the Lee Silverman Voice Treatment (LSVT) studies, participants attended therapy four times a week.23,24 In Wenke et al's27 study of an intensive voice therapy program, the intensive condition group also received four intervention sessions per week. The experiment with the highest frequency was Nguyen and Kenny's20 study on female teachers with MTD. Participants had to participate in 10-minute practices of adapted VFE twice daily, for a total of 56 sessions. This far exceeds the average number of sessions across the other studies, which was 10.5. Only in one protocol did the number of sessions vary among the participants, ranging from 6 to 28 sessions.22
Each session had a mean duration of 40.75 minutes, calculated using data from the 10 studies that provided length information. Among these 10 studies, the shortest session lasted 10 minutes and the longest 60 minutes. The median was 45 minutes. By combining the session durations and frequencies, the total amount of intervention received by the participants could be obtained. The subjects in Rodriguez-Parra et al's17 study spent the most time in therapy (18 hours), whereas those in Ziegler et al's25 study spent the least time in therapy (3 hours). The average obtained from the available duration and frequency data was 8.73 hours of therapy.
Only one article provided detailed information regarding treatment intensity, in terms of the difficulty level at which exercises were executed.35 In Watts et al's35 study, the gradation is clearly illustrated for both the complexity of the vocal task (from voiceless to dialogue) and the degree of muscle activity (from sigh to reduced flow). In the study, the criterion for progression to the next level was standardized to 90% correctness on 10 consecutive trials.

Outcome measures
Outcome measures related to voice were grouped in six categories (Table 4). The most common categories of outcome measures were self-assessment (present in 11 studies), followed by perceptual judgment of voice quality (present in 10 studies) and then acoustic analysis (present in eight studies). Five studies reported aerodynamic measures or phonation time and three considered visual examination of the larynx in their results.
The specific outcome measures found in each category are listed in Table 4. The most common were Voice Handicap Index (VHI) and overall severity or degree of dysphonia, present in seven of the reviewed articles. Maximum phonation time (MPT) and perceptual judgment of breathiness were reported in four studies. Jitter and harmonics-to-noise ratio (HNR) or noise-to-harmonics ratio (NHR) were also measured in four studies. Shimmer and perceptual judgment of roughness were reported in three studies. Results associated with the voice outcomes are reported in Table 2.

Outcomes of therapy
In general, studies demonstrated that voice therapy is superior to a vocal hygiene program or no intervention in treating different types of voice disorders. Results also showed that online and intensive modalities of voice therapy could represent alternative service delivery methods.27 Specific details on voice therapy outcomes are provided below.
TABLE 4. Outcome Measures (Reported in the Reviewed Studies)
Category | Outcome Measures
Self-assessment | Vocal and Laryngeal Symptom Questionnaire; Voice Handicap Index; Voice-Related Quality of Life; Voice Activity and Participation Profile; Perceived Phonatory Effort
Perceptual judgment of voice quality | Overall severity/severity rating/degree of dysphonia; Hoarseness; Loudness level; Loudness variability; Pitch variability; Worse or better
Acoustic analysis | (Vocal) sound pressure level (SPL); Maximum fundamental frequency range; Harmonics-to-noise ratio (HNR)/noise-to-harmonics ratio (NHR); Voice Turbulence Index (VTI); Fundamental frequency (F0); Normalized noise energy (NNE); Cepstral peak prominence (CPP); Tonal acoustic parameters (mean F0, target F0, rise time, rise size, rise speed); Spectrographic analysis
Visual examination | Glottal closure; Presence of lesion; Size of lesion; Degree of supraglottic constriction; Mucosal wave
Aerodynamic measures and phonation time | Maximum phonation time; Mean airflow for maximum sustained phonation; Mean airflow for comfortable sustained phonation; Ratio of peak pressure/airflow (Rlaw); Maximum exhalation time; Maximum phonation time during connected speech; s/z ratio
Other | Significant other's assessment: modified Communication Effectiveness Index, visual analog scale; Glottography: irregularity index (CFx); Australian therapy outcome measures (AusTOMs)
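
The category counts given in the text (11, 10, 8, 5, and 3 of the 15 reviewed studies) can be expressed as shares of the reviewed studies. The sketch below is a minimal illustration of that conversion; the dictionary and variable names are hypothetical, and the counts are those stated for the five named categories.

```python
TOTAL_STUDIES = 15

# Number of reviewed studies reporting each outcome measure category.
studies_per_category = {
    "Self-assessment": 11,
    "Perceptual judgment of voice quality": 10,
    "Acoustic analysis": 8,
    "Aerodynamic measures and phonation time": 5,
    "Visual examination": 3,
}

# Share of the 15 studies, in percent, rounded to one decimal place.
share = {
    category: round(100 * count / TOTAL_STUDIES, 1)
    for category, count in studies_per_category.items()
}

for category, percent in share.items():
    print(f"{category}: {percent}%")
```

These shares correspond to the proportions quoted in the Discussion: roughly three quarters of the studies for self-assessment, two thirds for perceptual judgment, about half for acoustic analysis, and a third for aerodynamic measures and phonation time.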
The studies included in this review reveal that many different voice treatments have been shown to lead to positive results for participants with functional dysphonia (with or without organic lesions). Good outcomes of VFE have been demonstrated in three studies20,21,34 and resonant voice therapy (RVT) has been shown to be successful, beyond vocal hygiene alone or no intervention, in two studies.19,26 Flow phonation exercises were also shown to lead to greater improvements than vocal hygiene,35 and their online application was supported in Rangarathnam et al's16 study with MTD patients. In Pedrosa et al's34 investigation, results demonstrated that the Comprehensive Voice Rehabilitation Program (CVRP) was also a useful treatment approach for functional voice disorders. Finally, the positive effects of voice therapy were supported by Wenke et al27 and Rodriguez-Parra et al,17 who used an amalgam of direct voice techniques as the treatment approach in their studies. Conversely, Laryngeal Manual Therapy (LMT) alone did not improve the voice quality in patients with bilateral benign lesions and appeared better suited as a complementary approach to other voice therapy techniques.18 The authors of the LMT study mentioned that participants remained silent during the digital manipulations and they suggested the addition of vocal training in the intervention protocol for better results.
Results of the studies also supported the use of voice therapy for other types of voice disorders. VFE and Phonation Resistance Training Exercise (PhoRTE) were tested by Ziegler et al25 with presbyphonic patients, who responded positively to both treatments. As for patients with Parkinson's disease treated with LSVT, the noninferiority hypothesis of an online condition compared to a face-to-face condition was supported in Constantinescu et al's23 study for patients with mild-moderate dysarthria. Moreover, Halpern et al24 demonstrated that the use of the Companion (a computer program) to support LSVT treatment led to therapeutic gains similar to standard LSVT. Voice exercises alone were shown to be less successful than voice exercises paired with electrical stimulation to treat UVFP in Ptok and Strack's study22; however, the exact exercises that were used in their protocol are not known. Finally, Vashani et al33 found superior improvements in the voice therapy and medication group when compared to the medication only group for GERD-related dysphonia.
Conclusions of the studies were mostly based on the attainment of statistical significance for at least one voice parameter. Statistical analysis was performed and P values were reported in all the studies, either to compare pre- and posttreatment scores within groups, posttreatment scores across groups, or pre- to posttreatment changes across groups. In six studies, standardized effect sizes were also reported16,19,20,25,34,35 (Table 2). When reporting conclusions, eight authors used the terms "effectiveness" or "effective,"17,19-22,26,33,34 whereas two described the results in terms of "efficacy."20,24 In the other studies, authors referred to "non-inferiority,"23 "complementary treatment method,"18 "utility,"16 "practical clinical effect,"35 and "viable delivery option."27

Duration of treatment effects
Five studies reported long-term outcomes in addition to immediate posttreatment results.17,19,24,27,34 Follow-up assessments were performed at 1 month posttreatment in three studies19,27,34 and at 6 months posttreatment in one study.24 In one study, posttreatment evaluations were carried out twice, at 1 month and 4 months after the end of the treatment.17 Behrman et al's19 experiment was the only one in which participants had to follow specific guidelines during the period between the end of treatment and the follow-up assessments, described as the "self-study period." All five studies found that outcomes observed at follow-up remained improved when compared to baseline. Moreover, Rodriguez-Parra et al17 and Behrman et al19 found that further improvements were made over time for some of the outcome variables (VHI, perceptual judgment of voice quality, amplitude of vibration, and glottal closure). However, significant reductions in sound pressure level and in the "always loud" item of a voice questionnaire were found at 6 months follow-up in Halpern et al's24 experiment. These declines were small, and the improvements from pretreatment to long-term follow-up remained significant despite them. A slight reduction was also noticed in perceptual judgments of voice quality at 1 month posttreatment for the VFE group in Pedrosa et al's34 study, as well as in some of the self-assessment measures in Rodriguez-Parra et al's17 study. The statistical significance of these changes was not discussed.

DISCUSSION
The aim of this review was to assess the literature on voice therapy administered by SLPs to treat voice disorders and to provide clinicians with an overview of evidence-based voice therapy techniques. Fifteen studies were included and all but one demonstrated that voice therapy can lead to statistically significant improvement in at least one outcome measure. All articles supported the use of voice therapy as a primary approach, with the exception of one that described LMT as a complementary method that should be used with other voice therapy techniques for patients with nodules.18 Moreover, one study found that treatment outcomes in participants with UVFP were enhanced when voice therapy was combined with electrical stimulation, based on one outcome measure.22
Despite these encouraging findings, the results of the reviewed articles need to be interpreted carefully. Most importantly, some thought must be given to the actual definition of effectiveness in the voice literature.

Outcome measures
The definition of an effective voice therapy depends highly on the measurement methods that are used to assess treatment outcomes. If treatment outcomes are measured in a standardized manner that relates to a clinically meaningful change, then an evaluation of effectiveness is relatively straightforward. However, the results of this review reinforce the common belief in the field that our use and interpretation of voice outcome measures could be refined and standardized to allow for a more clinically meaningful interpretation of treatment study results.
In fact, the results of this systematic review confirm that there is a large amount of inconsistency among outcome measures used in the studies. This may be due to the multidimensional nature of voice.36 The search for an optimal combination of outcome measures that would allow for accurate and reliable measures of voice therapy effectiveness has been undertaken by many
researchers.37 However, no standardized protocol has been unanimously adopted. Hopefully, recent advocacy through scientific meetings and publications will change this.38

Self-assessment questionnaires
At least one self-assessment questionnaire was administered in 73% of the studies, which makes it the most popular outcome measure category. In total, eight different questionnaires were employed, and they all led to the detection of a significant improvement in at least one of their subcategories. Self-assessment measures include, but are not limited to, the VHI, the V-RQOL, the Voice Activity and Participation Profile, and perceived phonatory effort (PPE). The VHI was the most commonly used and was present in seven studies.
It is widely acknowledged that the patient's perspective on his/her voice disorder and its improvements following treatment are valuable to monitor.36 Unlike instrumental assessments, self-rating questionnaires assess the patient's feelings and voice-related quality of life and can reveal everyday voice patterns.36 Even though this subjective evaluation method does not provide information on the physiological changes, it can be used to measure treatment efficacy in reducing the patient's perceived handicap. This is also one of the easiest and least time-consuming measurements used.

Perceptual judgment of voice quality
Perceptual judgment of voice quality was used in 66% of the studies. Even though standardized instruments are available to measure parameters of voice quality, the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) and Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) were employed in only three of the reviewed studies.16,17,26 The CAPE-V instrument allows for the judgment of voice quality in terms of six parameters: overall severity, breathiness, roughness, strain, pitch, and loudness. However, the two studies16,26 that used CAPE-V included only the parameter overall severity in their analysis. Overall severity is the feature that has previously been associated with the highest reliability39 and is also the one that most authors used in the reviewed studies. Different terms were used across studies to describe this feature, such as rating of severity, degree of dysphonia, and grade. It is interesting to note that all studies, except one, that involved a statistical analysis on this parameter revealed a significant improvement from pre- to posttreatment or between treatment groups. As for the GRBAS instrument, it includes five features to describe voice quality: grade, roughness, breathiness, asthenia, and strain. This measure was used in only one study,17 in which an increase in the number of subjects with better-quality ratings was observed. No statistical analysis was conducted on perceptual judgments of voice quality in this study.
Aside from the three studies mentioned above, most investigators did not employ a standardized instrument to assess changes in voice quality. However, most of them assessed voice characteristics that are also present in the CAPE-V and/or GRBAS instruments, which include breathiness, roughness, strain, loudness level and variability, pitch variability, and overall severity. The only additional parameters that are not present in the standardized instruments, but were used as an outcome measure in the included studies, are hoarseness, instability, and resonance, although comments on these measures can be included in the CAPE-V.

Visual examination
Although visual examination was used in most of the protocols to identify and/or confirm the participants' diagnosis for inclusion in the study, it was used in only three studies to assess the changes following therapy and for comparison across treatment groups.17,21,34 This could be explained by the lack of standardization concerning visual examination of the larynx, leading to challenges in its use for assessing voice therapy outcomes despite it being the only direct assessment of the anatomy and physiology of the larynx. As a matter of fact, none of the authors employed a standardized instrument or a previously published scale for the assessment of laryngeal features.
All three studies based their visual examination on glottal closure, two of them also evaluated the presence or size of lesion,21,34 and two of them observed the degree of supraglottic constriction.17,34 Finally, Rodriguez-Parra and his colleagues17 are the only ones to have considered amplitude, mucosal wave, periodicity, and symmetry in their visual examination. Whereas Pedrosa et al34 summed the scores of the individual parameters into a single variable, named "laryngeal pattern," the two other studies17,21 based their results on individual features. Improvements were found in all three studies; however, statistical significance was not assessed in Rodriguez-Parra et al's17 study.

Acoustic analysis
Acoustic analysis was used in half of the reviewed articles. Of the four studies that included jitter and HNR or NHR as outcome measures, three found statistically significant differences between the pre- and posttreatment results in the experimental group.20,21,33 Those same studies also found significant differences in shimmer values. Both studies that used measures of vocal loudness in their analysis obtained significant results.23,24 Normalized noise energy (NNE), voice turbulence index (VTI), and fundamental frequency (F0) were each assessed in only one study, respectively.16,21,33 NNE and F0 detected significant improvement following voice therapy, whereas VTI did not.
This review revealed that acoustic measures detected significant improvements in most of the experiments where they were utilized.20,21,23,24,33,35 These results do not coincide with Halawa et al's40 study, in which acoustic parameters did not significantly differ between the participants who clinically improved and those who did not, based on the overall results of perceptual judgment of voice quality, visual examination, and VHI questionnaire.40 Future refinement of acoustic analysis measures that reflect voice quality in both sustained vowels and continuous speech segments will likely improve the diagnostic validity of these measurements.41 For example, future use of cepstral measures may aid in improving the reliability of the detection and quantification of voice treatment outcomes. In fact, Maryn et al41 found that cepstral peak prominence was the main predictor of overall voice quality. In this review, cepstral peak
prominence was used as an outcome measure in only one of the 15 reviewed studies.35

Aerodynamic measures and phonation time
Aerodynamic measures and phonation time were used in a third of the reviewed studies. The most common one, MPT, detected a significant difference between the experimental and comparison groups in one out of three studies,35 revealing an effect of stretch and flow phonation on MTD patients. Even though Rodriguez-Parra et al's17 groups did not differ in terms of MPT, a significant difference was found between the voice therapy group and the vocal hygiene group when compared using maximum exhalation time and MPT during connected speech. Rangarathnam et al,16 when comparing pre- and posttreatment aerodynamic outcomes in MTD patients treated with flow phonation, found that mean airflow during comfortable and maximum sustained phonation improved significantly, but laryngeal resistance did not reveal the same statistically significant improvement.
Aerodynamic measures and phonation time provide valuable information about treatment effect, but they should nonetheless be interpreted carefully. For example, MPT is often the result of more than one physiologic component, such as respiratory muscle activity, integrity of vocal fold tissue, glottal competence,42 or phonatory style (pressed, balanced, breathy), and a lack of improvement in MPT does not necessarily imply that the treatment had no impact on the patient. As an example, it is possible that the respiratory muscle activity was improved, but not enough to overcome poor glottal competence. This is why all physiologic components that are involved in voicing should be monitored closely and taken into consideration to make an accurate interpretation of aerodynamic outcomes and phonation times.

Outcome Measure Sensitivity to Change
In the articles assessed, some measures were found to be more sensitive to change following treatment. Self-assessment and acoustic measurements both improved between 85% and 90% of the time, whereas perceptual judgments of voice quality and aerodynamic measures or phonation time significantly improved approximately 50% of the time. It is difficult to determine whether the perceptual judgment of voice quality and aerodynamic measures or phonation time are more difficult to change with treatment or whether the variability in these outcome measures is obscuring the results. Furthermore, these percentages are based on statistically significant results and do not necessarily reflect a meaningful clinical improvement in the participants' voices. Conversely, a participant may have had a meaningful improvement in his/her voice, but it may not have reached the level of statistical significance.

Clinical significance
Whereas statistical significance is used to answer the question "Do the groups differ?", clinical significance helps interpret the meaning of the change.43 In other words, was the improvement important enough to be beneficial for the patient? One central concept that is utilized to answer this question is the minimal clinically important difference, which can be defined as the smallest change or difference in an outcome measure that is perceived as beneficial and would lead to a change in the patient's medical management.44 However, finding consensus for such thresholds can be challenging, and assessment instruments that provide cutoff values for clinically relevant changes are rare in the voice field. Conclusions of voice studies are therefore often based exclusively on statistical significance, although some articles in this review also reported effect sizes, which give an estimate of the magnitude of the change. While P values and effect sizes are both important and provide relevant information concerning the effect of the intervention, they do not establish the clinical value of the results.43 For the purpose of this review, and to facilitate the comparison between interventions, results have been reinterpreted in light of their clinical significance.

Functional dysphonia
Intervention comparisons
The heterogeneity in the outcome measures makes it challenging to compare interventions across different studies. For this reason, comparisons were made exclusively between studies using the same outcome measures.

VHI
Six out of the 10 studies concerning functional dysphonia used the VHI questionnaire to assess intervention outcomes. VHI is a questionnaire that measures the degree of handicap experienced by the patient related to voice use. It is composed of 30 items in which the degree of perceived handicap is rated on a 5-point scale, from 0 (no problem) to 4 (always a problem).45 A reduction of 18 points in the total score has been associated with a significant clinical improvement.45 Moreover, Behrman et al19 determined, after calculating normative data with a sample of 100 healthy subjects, that 11.5, the upper limit of the 95% confidence interval, was the cutoff value for a score expected from a vocally healthy subject. This value is consistent with Arffa et al's46 normative study, which found that a VHI score over 11 should be considered abnormal.
Although significant improvements in VHI scores were found for every group of functional dysphonia patients undergoing behavioral voice therapy in this review, none of them reached a normal VHI score at posttreatment assessment. The group that was closest to the 11.5 cutoff value at the end of the treatment was the one studied by Rangarathnam et al,16 treated with flow phonation (onsite condition). Flow phonation also appeared to be the intervention leading to the greatest mean VHI score reduction,16,35 followed by RVT,19,26 CVRP,34 Wenke et al's27 voice therapy program, and, lastly, VFE.34 However, because initial severity can influence the ability to decrease VHI scores, baseline mean scores were compared across groups (online and telepractice groups as well as standard and intensive groups of the same treatment were combined). Means varied between 42.5 and 45.5, with the exceptions of Watts et al's35 flow therapy group, which had a baseline mean score of 62.3, and Pedrosa et al's34 VFE group, which had a baseline mean score of 32.52. It is possible that the high VHI baseline score in Watts et al's group could explain the larger score reduction when compared to other treatment groups.
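
The two VHI thresholds described above (an 18-point reduction in total score and a normal cutoff of 11.5) can be applied mechanically to a pair of scores. The sketch below is a minimal illustration under that reading, not a validated clinical tool. The function name `interpret_vhi` and the example scores are hypothetical, and treating a score at or below 11.5 as within the normal range is an assumption about how the cutoff is applied.

```python
VHI_ITEMS = 30                 # number of questionnaire items
VHI_MAX_ITEM_SCORE = 4         # each item is rated from 0 to 4
VHI_MAX_TOTAL = VHI_ITEMS * VHI_MAX_ITEM_SCORE

CLINICAL_REDUCTION = 18        # reduction associated with clinical improvement
NORMAL_CUTOFF = 11.5           # cutoff reported by Behrman et al


def interpret_vhi(pre_total, post_total):
    """Compare pre- and posttreatment VHI totals against both thresholds."""
    for score in (pre_total, post_total):
        if not 0 <= score <= VHI_MAX_TOTAL:
            raise ValueError(f"VHI total must be between 0 and {VHI_MAX_TOTAL}")
    reduction = pre_total - post_total
    return {
        "reduction": reduction,
        "clinically_significant": reduction >= CLINICAL_REDUCTION,
        # Assumption: a total at or below the cutoff counts as normal.
        "within_normal_range": post_total <= NORMAL_CUTOFF,
    }


# Hypothetical group means, not values taken from the reviewed studies.
example = interpret_vhi(pre_total=45, post_total=25)
print(example)
```

In this hypothetical case the 20-point reduction exceeds the 18-point threshold, yet the posttreatment score remains above 11.5. This is the pattern described in this section: clinically relevant improvement without a return to a normal VHI score.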
Perceptual judgment of voice quality
Of the seven studies on functional dysphonia that included a perceptual judgment of voice quality, two used the CAPE-V scale (0-100) as the rating instrument,16,26 one used the GRBAS scale,17 one used a 5-point ordinal scale,20 one used a visual analog scale ranging from 0 to 100,34 and two studies classified the participants in terms of improvement or nonimprovement.18,21
Comparisons between studies using a 0-100 scale revealed that flow phonation leads to the largest reduction in overall severity score, followed by CVRP, VFE, and RVT. VFE and RVT resulted in similar severity score reductions. Moreover, groups of patients who underwent flow phonation therapy16 and RVT26 reached a mean severity score within the normal range when using norms from a study in which dysphonic and nondysphonic participants were assessed using the CAPE-V.47 Comparisons of the studies reporting percentages of participants who improved after therapy revealed a slight difference between LMT and VFE. The percentage of participants who were rated as better after LMT varied between 30% and 40%, depending on whether the judgment was made on sustained vowels or continuous speech.18 The overall severity of 27.8% of the participants treated with VFE improved following the intervention, according to the blinded judges.21 The voice therapy program used in Rodriguez-Parra et al's experiment resulted in 33% of the participants reaching a normal score on the Grade parameter of the GRBAS scale.

Acoustic analysis
Three studies on functional dysphonia measured the outcome of voice therapy using jitter.17,20,21 However, comparisons could not be made on the basis of this parameter because one study reported only the posttreatment values20 and in the other two studies the participants were already within a normal range at baseline assessment. In addition, the type of jitter that was measured is mentioned only in Nguyen and Kenny's study,20 in which frequency perturbation was used.
A comparison could be made between flow phonation and VFE based on two studies that evaluated treatment outcomes with NHR.16,21 Results revealed that no statistically significant difference was found in the flow phonation group16 and that a significant improvement was found in the VFE group following treatment.21 However, in both studies, none of the experimental groups reached a normal value, based on the reference value provided by the software Computerized Speech Lab 4500. It is also noteworthy that the absolute change in NHR in the flow phonation study16 was higher than in the VFE study.21 The sample size in the former study was sub-

periority of flow phonation when compared to other intervention methods. These results should be tempered by the fact that none of the approaches led the participants to a normal NHR value. In summary, a more robust study of the effect of flow phonation is important to clarify these results. Moreover, comparison with other treatments should be made while controlling for baseline voice characteristics of the participants.

Therapy components for functional dysphonia
All intervention programs described in the studies of functional dysphonia, with the exception of the LMT study,18 included aspects of vocal function and respiratory support and coordination, as described in Van Stan et al's7 taxonomy of voice therapies. Most intervention programs also included somatosensory elements, mostly related to discrimination (eg, placement, semi-occluded vocal tract), and used a modification of the auditory input (eg, pitch and loudness monitoring, voice quality effects).7 The CVRP intervention34 and the treatment described in Rodriguez-Parra et al's17 study contained musculoskeletal elements related to postural alignment (body posture) or oral modification (eg, yawn-sigh, chewing). However, even with well-defined exercise programs such as VFE and Lessac-Madsen Resonant Voice Therapy (LMRVT),29 it is not possible to determine exactly which component had a positive effect on the voice outcomes. In all cases, efficacious voice therapy programs included elements of vocal function, respiratory support and coordination, as well as somatosensory feedback. The role of musculoskeletal elements of therapy in voice outcomes still needs more investigation, because they were used in very few treatment approaches. Two studies that included postural alignment or oral modification in their behavioral therapies obtained clinical improvements.17,34 However, the study using digital manipulation as the only approach (LMT) did not lead to conclusive results.

Other voice disorders
The results of this review also depict the positive effects of voice therapy on presbyphonia, UVFP, and on voice disorders associated with GERD and Parkinson's disease. The interpretation of these results was fraught with challenges similar to those previously mentioned, in particular regarding the assessment of their clinical significance.

Presbyphonia
Ziegler et al's25 study on presbyphonia is one of the few that
stantially smaller than in the latter (14 and 54, respectively), and compared the effect of two behavioral approaches: VFE and
this could explain why no statistical significance was found despite PhoRTE. Those interventions both comprise elements of vocal
a greater absolute change. function (pitch range and flexibility), respiratory support (ab-
dominal breathing), and auditory monitoring (pitch monitoring).7
Based on the analysis of the VHI and overall severity scores, The main element that differentiates the treatments is loudness
flow phonation seems to be the most successful approach for modification and monitoring, present only in the PhoRTE in-
the treatment of functional dysphonia. However, Rangarathnam tervention. Both groups of participants improved their score on
et als16 study on flow phonation should be interpreted careful- the V-RQOL questionnaire. However, based on the instru-
ly considering the small sample size for each group (n = 7). The ments validation study, an increase of 15 to 20 points in V-RQOL
articles tables showed the presence of outliers that may have raw score is correlated with an improvement of perceived voice
influenced the results, leading to important score reduction in quality (fair to good, good to excellent).48 Neither VFE nor
VHI and CAPE-V. This could partly explain the observed su- PhoRTE led to a change of this magnitude, even though the im-
Maude Desjardins, et al Effectiveness of Voice Therapy 17

provement was statistically significant in both cases. It is better voice outcomes than reflux medication alone is based on
noteworthy that participants in Ziegler et als25 study had lower the analysis of four acoustic parameters (jitter, shimmer, NNE,
baseline scores than participants in the validation study,48 which and HNR) and two perceptual parameters (hoarseness and
may have had an impact on the observed reduction in scores from breathiness). NNE outcomes reached the normal cutoff value in
pre- to posttherapy. Treatment outcomes were also measured the voice therapy group. However, jitter values were already within
through perceived phonatory effort, a self-assessment outcome normal values at baseline and shimmer values were very close
for which only the PhoRTE group showed significant changes. to normal. Improvements in perceived voice quality were dif-
However, both groups approached a perceived phonatory effort ficult to verify from a clinical perspective because the authors
close to a comfortable level. In light of these considerations, did not use a standardized assessment protocol. Additionally, it
further studies are needed to be able to definitively state that is not possible to determine which of the voice therapy com-
PhoRTE is more effective than VFE in patients with presbyphonia. ponents were responsible for the improvement, because the
Nevertheless, one of the strengths of this article rests in that the intervention program was inclusive of many different treat-
hypothesis and interpretation of the results are based on a phys- ment techniques.
iologic model that helps the reader understand and compare the
underlying mechanisms targeted by the two different interventions. Parkinsons disease
The studies on Parkinsons disease-induced dysphonia23,24 tested
Unilateral vocal fold paresis specific conditions of a treatment already well supported by the
Ptok and Strack22 compared voice therapy alone to voice therapy literature, LSVT. LSVT has been shown to improve vocal loud-
combined with electrical stimulation in participants with UVFP. ness in patients with Parkinsons disease and therefore reduce
The causes of the paresis were not specified in the study. Ef- the functional impact of hypokinetic dysarthria on
fectiveness was defined as a decrease in vocal fold vibration communication.23,32 The LSVT studies included in this review
irregularity and an increased MPT. The results indicate that voice tested the feasibility and noninferiority hypotheses of technology-
therapy combined with electrical stimulation is more effective assisted interventions. Contantinescu et al23 demonstrated that
than voice therapy alone, based on significant differences between an online modality of the treatment led to outcomes compara-
groups on one outcome measure: irregularity index (CFx). ble to those obtained with face-to-face LSVT. Halpern et al24
However, MPT changes were not significant within or between established that the use of an interactive computer program was
groups. When compared to Zraick et als42 normative data, the effective to support at-home practice sessions. These alterna-
posttreatment MPT means for both groups are lower than the tives may help overcome treatment accessibility barriers such
means obtained in healthy adults, females and males, meaning as geographical and mobility constraints, the shortage of LSVT-
that none of the groups reached a normal phonation duration fol- certified SLPs, and costs associated with standard treatment.23,24
lowing the intervention. This study is an interesting example of
how the choice of outcome measure and the importance that is Temporal variables
assigned to it can have a great impact on the conclusion that is No formal guidelines are currently available concerning optimal
drawn by the investigators: whereas CFx outcomes indicate an duration, frequency, and intensity of voice therapy. However,
advantage for the group that received electrical stimulation in average session length and treatment frequency obtained from
addition to voice therapy, MPT outcomes indicate that voice the included studies are consistent with those published in a recent
therapy supported by electrical stimulation does not lead to better systematic review on temporal variables of voice therapy.49 De
results than voice therapy alone. Moreover, Ptok and Stracks22 Bodt et al49 analyzed data from a total of 140 studies and cal-
design did not include a no-intervention control group. The pos- culated an average of 10.87 sessions with a duration of 30 minutes
sibility that voice improvements could have happened because in 36.36% of the articles and a duration of 60 minutes in 27.27%
of spontaneous recovery can therefore not be rejected, especial- of the articles. Frequency was mostly once or twice a week.
ly because the participants onset of paresis occurred within 6 However, the review did not assess treatment outcomes and, there-
months before treatment started. Lastly, the voice therapy tech- fore, no correlation was established between temporal variables
niques used in the study are not described, making it difficult and intervention efficacy. Moreover, because most of the thera-
to interpret the results. Further studies are needed to evaluate pies were conducted following a fixed study design, the number
the clinical effect of voice therapy with and without electrical of sessions was predetermined and does not necessarily reflect
stimulation on UVFP. the typical clinical setting or the full potential for positive ther-
apeutic outcomes.49 This observation also applies to the present
GERD-related voice disorders review. It is possible that treatment outcomes would be differ-
Vashani et al33 tested if reflux medication could improve voice ent in daily clinical practice, where many factors determine
quality and if a further improvement could be made by adding treatment duration, such as diagnosis, insurance policy, otolar-
voice therapy. Voice therapy consisted of vocal hygiene and in- yngologists recommendation, distance from clinic, and patients
tervention covering a broad range of technique categories: voice demands (eg, voice professional).49
musculoskeletal (neck rotation, laryngeal massages, yawn- The aspect of intensity was not addressed by most of the studies
sigh, chewing); respiratory (breathing exercises); somatosensory included in the review, although some of them mentioned that
(humming); and vocal function (glottal fry). The authors con- the therapy program followed a task hierarchy. Voice exercises
clusion that medication combined with voice therapy generates should be executed at a difficulty level that is sufficient to induce
18 Journal of Voice, Vol. , No. , 2016

change. Although hard to measure and variable across pa- of the same treatment on two very different categories of voice
tients, intensity is a component of voice therapy that is believed disorders, presenting with distinctive physiological manifesta-
to impact voice outcomes. Thus, it should be taken into con- tions. It is possible that this treatment could be optimized for
sideration in voice therapy studies. one specific group of patients and result in even better outcomes.

Limitations Literature
Studies This review assessed the body of literature on voice therapy since
This review was limited to randomized controlled trials and, there- January 2007 and provided insights for clinical practice and future
fore, included studies with a high level of evidence. This is not research. Firstly, evidence was not found for all categories of
a guarantee of flawless methodology and some recurring weak- patients and voice disorders: behavioral treatments for paradox-
nesses were noted across studies. Some of the common limitations ical vocal fold movement disorders, puberphonia, and dysphonia
included the following: small sample sizes, the absence of a no- in children have not been tested in a published randomized con-
intervention control group, and the absence of an experimental trolled trial in the last 9 years. As for voice disorders that have
control group to rule out placebo effect. The lack of internal va- been studied, it remains unrealistic for clinicians to easily find
lidity was an issue for studies in which interventions were the best intervention for a specific disorder, for many reasons.
conducted by more than one SLP. Another threat for internal va- First of all, there is a lack of standardization in outcome mea-
lidity concerns the intervention itself: some studies were based sures that prevents meaningful comparisons between studies.
on a very specific protocol whereas others used a more flexible Secondly, even when the outcome measures are the same, there
approach, varying exercises and duration of each treatment phase are major discrepancies in the way results are reported. For
according to the individual participants needs. A less rigid therapy example, some authors may report results by comparing the per-
protocol may enhance the external validity of the study and lead centage of change for a certain outcome, whereas others might
to conclusions that can be generalized to different clinical set- compare posttreatment values across groups or report the number
tings. However, it can also decrease the studys internal validity of patients that improved and did not improve following inter-
and affect the results cogency, which is why a proper balance vention. A third challenge for clinicians was highlighted in this
must be sought. This could be attained by a clearer delineation review: clinical significance of the results is very rarely ad-
between early phase studies that focus on biological mecha- dressed. Instead, main interpretations are often based on statistical
nisms of action and efficacy of voice therapy, and mid to late- significance and effect sizes, when available. Normative and cutoff
development studies that focus on the effectiveness of a more values for voice outcome measures are frequently nonexistent
individualized voice therapy protocol. or difficult to locate, adding to the challenge for a clinician who
The lack of long-term follow-up is another limitation of most would want to interpret the studys results with a clinical per-
of the studies. In fact, 10 out of 15 studies did not perform follow- spective. This underscores the necessity for the publication of
up evaluations and limited the analysis to values obtained more minimal clinically important difference values for voice
immediately following the end of treatment. However, in the outcome measures. In sum, the literature is successful in re-
studies that did assess long-term outcomes, the follow-up periods vealing which interventions result in statistically significant
were limited to 1, 4, or 6 months. Studies that focus exclu- changes, and in many cases the effect sizes related to the changes,
sively on long-term evaluation of voice therapy are present in but not in informing the reader about the clinical significance
the literature but have been excluded from this review, because of these results. This is an important distinction because it reveals
it is important that both short-term and long-term outcomes be an opportunity for the voice research community to make the
reported to determine the impact of a voice treatment. scientific literature more informative and applicable to clinical
Most of the studies did not control for the participants mo- practice.
tivational characteristics and adherence to therapy. Behrman et al19 Despite those limitations, significant progress was made in
found that patients who were more compliant with the treat- voice research since the last published reviews.6,8 One of the main
ment plan had better voice outcomes when compared to non- critiques made by Ruotsalainen et al6 was the unspecific de-
adherent patients. Ziegler et als25 experiment confirmed the fact scription of the interventions, often labeled with generic terms.
that therapy outcomes were a result of biomechanical learning This observation seems to have been taken into consideration:
and adherence.29 Furthermore, van Leer and Connors26 study most of the reviewed studies depicted the treatment compo-
revealed that self-efficacy and therapeutic alliance were predic- nents, permitting a better understanding of the mechanisms
tors for adherence to voice therapy. These results stress the underlying the observed changes. Unfortunately, few articles dis-
importance of controlling for patients motivational and behav- cussed the underlying physiological processes in their hypothesis
ioral characteristics in addition to their demographic and voice or when interpreting the experimental results. Future research
characteristics. should continue to investigate the physiological underpinnings
As a final point, the broad definition of the voice disorder, par- responsible for changes induced by various voice therapy
ticularly in studies concerning functional dysphonia, led to techniques.
considerable heterogeneity across participants. For example, In his review, Speyer8 stated that no definitive conclusion on
Rodriguez-Parra et als17 study included diagnoses of both hy- voice therapy effectiveness could be drawn because of the re-
perfunctional (vocal nodules, polyps, angiomatous polyps, Reikes stricted number of experimental studies and their poor
edema) and hypofunctional dysphonia, therefore testing the effect methodological quality. The present review included 15 ran-

