Published in:
Journal of the Acoustical Society of America
DOI:
10.1121/1.402397
Published: 01/01/1991
Document Version
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
J. Acoust. Soc. Am. 90 (1), July 1991 0001-4966/91/070097-06$00.80 © 1991 Acoustical Society of America 97
the two, he concludes that, "whenever intervals in pitch must be compared at different frequencies, a log scale is to be preferred." Only Traunmüller et al. (1989) have so far considered the possibility that pitch movements in speech intonation may best be expressed in a scale derived from the frequency selectivity of the auditory system. Following Graddol (1986), they showed preference for the logarithmic frequency scale, however.

This problem has consequences for various applications. If in synthetic speech, e.g., one wants to give the same prominence to accented syllables in male as in female speech, the excursion of the pitch movements must be the same on the frequency scale in which the prominence of pitch movements is perceived. As female and male voices differ by almost 1 oct, the difference between these approaches can cause considerable discrepancies. For example, an excursion of 120 to 180 Hz in a male voice would correspond to an excursion of 240 to 300 Hz in a female voice if a linear scale were used, whereas an excursion of 240 to 360 Hz would provide equal prominence if a logarithmic scale were used. On an ERB-rate scale, equal prominence would require an excursion of 240 to 325 Hz.

In order to decide on which scale the excursions of pitch movements are perceived, subjects adjusted the variable excursion size of a pitch movement in a comparison stimulus to the fixed excursion size of a pitch movement in a test stimulus resynthesized in a different frequency register. This was done both in sessions in which the test stimulus was in a low register, while the comparison stimulus was in a high register, and for sessions in which the test stimulus was in a high register, while the comparison stimulus was in a low register. Furthermore, this was done for six different excursion sizes, and for three different prominence-lending pitch movements, a rise, a rise-fall, and a fall.

I. EXPERIMENT

A. Materials

The stimuli consisted of modified versions of one utterance, /mamfima/, spoken by a male speaker. Its duration was 0.77 s. The second syllable carried an accent. Pitch modifications were applied with the pitch-synchronous overlap and add (PSOLA) technique (Hamon et al., 1989), resulting in very natural sounding speech stimuli. Duration and amplitude relations were kept constant.

These stimuli were resynthesized with one of three prominence-lending pitch movements, a rise, a rise-fall, and a fall, superimposed on declination lines as displayed in Fig. 1. These pitch contours consisted of lines that were straight on either a linear frequency scale, an ERB-rate scale, or a logarithmic frequency scale, resulting in three different versions, which will be referred to as LIN, ERB, and LOG, respectively. All versions had a declination end point of either 75 or 180 Hz, defining the low versions and the high versions. The high versions sounded like a male falsetto voice.

All sessions consisted of adjustment runs in which two stimuli, a low and a high version, were repeatedly presented to the subject with an interstimulus interval of 1 s. The stimulus presented first will be referred to as the test stimulus. This was fixed within one run. The second stimulus, referred to as the comparison stimulus, was presented in another register and had a variable excursion size. In the first trial of a run, the excursion size in the comparison stimulus was zero. Subjects were asked first to increase the excursion size in the comparison stimulus to such an extent that the prominence of its accented syllable clearly exceeded the prominence of the accented syllable in the test stimulus. In the next trials, the subjects were asked to decrease and increase the excursion size in the comparison stimulus until it was judged to give the same prominence as the pitch excursion in the test stimulus. When the subject had done this, the next run started. In each session, there were six runs for six different excursion sizes in the test stimulus. The six different excursion sizes in the test stimulus were presented in random order with a different order in each session.

As mentioned, the two stimuli matched in prominence were presented in different registers. In one set of sessions, the test stimuli were presented in the high register, while the comparison stimuli were presented in the low register (see Fig. 2). These will be referred to as downward sessions. In another set of sessions, the test stimuli were low in register, and the comparison stimuli high. These will be referred to as upward sessions (see Fig. 3). These upward and downward sessions took place for all three pitch movements and for each of the three different frequency scales, giving eighteen different sessions.

In each session, the comparison stimuli formed a set of ten stimuli with increasing excursion size. They were constructed in such a way that, within one register, they were almost identical in all three frequency scales [compare Fig. 2(b), (d), and (f), and Fig. 3(b), (d), and (f)]. Exact equality was impossible as the lines that made up the pitch con-

[Figure 1: panels "Rise" and "Rise-fall" (remaining panel label not recoverable); horizontal axis: time (s).]

FIG. 1. The three different prominence-lending pitch movements. The pitch movements superimposed on the declination line give prominence to the second syllable, the vowel onset of which is indicated by the crossbar. The rise starts from low declination 70 ms before vowel onset and ends at high declination 50 ms after the vowel onset of the second syllable. The rising part of the rise-fall has the same timing and is followed by the falling part, which starts 80 ms and ends 200 ms after the vowel onset. The fall starts from high declination 20 ms before the vowel onset of the second syllable, and ends at low declination 100 ms after the vowel onset. The same timing is used in the IPO text-to-speech system.
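As an illustration of what "straight on a given frequency scale" means for the LIN, ERB, and LOG versions, the following minimal sketch computes the pitch during a rise, using the rise timing quoted in the Fig. 1 caption (start 70 ms before, end 50 ms after vowel onset). Several things here are assumptions and not taken from the text: the ERB-rate approximation E = 16.7 log10(1 + f/165.4), the function and variable names, and the simplification that the low and high levels are constant (declination is ignored).

```python
import math

# Rise timing relative to vowel onset, as quoted in the Fig. 1 caption.
RISE_START_S = -0.070
RISE_END_S = 0.050

# Each scale maps Hz to scale units and back.
# The ERB-rate constants are an assumption, not given in this excerpt.
SCALES = {
    "LIN": (lambda f: f, lambda v: v),
    "LOG": (lambda f: math.log2(f), lambda v: 2.0 ** v),
    "ERB": (lambda f: 16.7 * math.log10(1.0 + f / 165.4),
            lambda v: 165.4 * (10.0 ** (v / 16.7) - 1.0)),
}


def rise_pitch_hz(t_s, vowel_onset_s, low_hz, high_hz, scale):
    """Pitch in Hz at time t_s for a rise that is a straight line on `scale`.

    Simplification (assumption): the low and high levels are constant,
    i.e., the declination of the original stimuli is ignored.
    """
    to_scale, to_hz = SCALES[scale]
    t0 = vowel_onset_s + RISE_START_S
    t1 = vowel_onset_s + RISE_END_S
    if t_s <= t0:
        return low_hz
    if t_s >= t1:
        return high_hz
    # Interpolate linearly in scale units, then convert back to Hz.
    frac = (t_s - t0) / (t1 - t0)
    lo, hi = to_scale(low_hz), to_scale(high_hz)
    return to_hz(lo + frac * (hi - lo))


# Hypothetical example: halfway through a rise from 100 to 150 Hz.
for name in ("LIN", "ERB", "LOG"):
    print(name, round(rise_pitch_hz(0.29, 0.3, 100.0, 150.0, name), 1))
```

Halfway through the rise, the LIN version passes through the arithmetic mean of the two levels and the LOG version through their geometric mean; with the assumed ERB-rate formula, the ERB version lies between the two.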
J. Acoust. Soc. Am., Vol. 90, No. 1, July 1991. D. J. Hermes and J. C. van Gestel: Frequency scale of intonation
[Figure residue, apparently Figs. 2 and 3: columns "test stimuli" and "comparison stimuli"; row label "parallel on scale of: ERB" (other row labels and the captions are not recoverable); panels labeled (a) to (f); horizontal axis 0 to 0.9.]
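The notion of an excursion of equal size in another register, which underlies the comparison of test and comparison stimuli, can be checked numerically against the example given in the introduction (120 to 180 Hz in a male voice, transposed to start at 240 Hz). The sketch below is illustrative only. The ERB-rate approximation E = 16.7 log10(1 + f/165.4) is an assumption that does not appear in this excerpt, and the function names are hypothetical.

```python
import math


def erb_rate(f_hz):
    """ERB-rate of a frequency in Hz (assumed approximation)."""
    return 16.7 * math.log10(1.0 + f_hz / 165.4)


def erb_rate_to_hz(e):
    """Inverse of erb_rate."""
    return 165.4 * (10.0 ** (e / 16.7) - 1.0)


def equal_excursion_top(src_low, src_high, dst_low, scale):
    """Upper frequency (Hz) of an excursion starting at dst_low that is as
    large, on the given scale, as the excursion src_low -> src_high."""
    if scale == "LIN":
        # Equal difference in Hz.
        return dst_low + (src_high - src_low)
    if scale == "LOG":
        # Equal frequency ratio, i.e., equal number of semitones.
        return dst_low * (src_high / src_low)
    if scale == "ERB":
        # Equal difference in ERB-rate units.
        return erb_rate_to_hz(
            erb_rate(dst_low) + erb_rate(src_high) - erb_rate(src_low)
        )
    raise ValueError(f"unknown scale: {scale}")


for scale in ("LIN", "LOG", "ERB"):
    top = equal_excursion_top(120.0, 180.0, 240.0, scale)
    print(f"{scale}: 240 Hz to {top:.0f} Hz")
```

The linear and logarithmic cases give 300 and 360 Hz, and with the assumed formula the ERB-rate case comes out close to the 325 Hz quoted in the introduction.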
[Figure 4: "AVERAGES ACROSS ALL SUBJECTS"; columns "downward sessions" and "upward sessions"; panels (a) to (f); recoverable axis labels: semitones, Hz.]

FIG. 4. The average results across all nine subjects who participated in the experiments. The coordinates represent the frequency scale in which the pitches of the low and the high stimulus ran parallel. Thus (a) and (b) show the results of the sessions with the LOG versions, (c) and (d), those for the ERB versions, and (e) and (f) for the LIN versions. The results for the downward sessions are presented in (a), (c), and (e), while (b), (d), and (f) present the results of the upward sessions. The abscissa of the diamond shows the end point of the lower declination line of the test stimulus, while the ordinate of the diamond shows the end point of the lower declination line of the comparison stimulus. The six different excursions of the test stimuli are presented as the interval between the abscissa of the data points and the abscissa of the diamond. The averages of the matchings by all subjects are presented as the intervals between the ordinate of the data points and the ordinate of the diamond. The vertical bars represent the standard deviation of the results. The straight line with a slope of 45 deg represents the expected outcome if the subject had matched prominence in the frequency scale in which the results are plotted.

and the high stimulus ran parallel. The abscissa of the diamond shows the end point of the lower declination line of the test stimulus (180 Hz for the downward sessions, and 75 Hz for the upward sessions), while the ordinate of the diamond shows the end point of the lower declination line of the comparison stimulus (75 Hz for the downward sessions, and 180 Hz for the upward sessions). The data points represent the averages across all subjects. The abscissa of a data point represents the end point of the upper declination line of the test stimulus, while its ordinate represents the average end point of the upper declination line of the comparison stimulus. This means that the interval between a coordinate of a data point and the corresponding coordinate of the diamond gives the excursion size of the stimulus. The vertical bars represent the standard deviation of the results. The straight line with a slope of 45 deg represents the expected outcome if the subject had matched prominence in the frequency scale in which the results are plotted. In Fig. 4(a) and (b), the results are presented for the LOG versions, in 4(a) for the downward adjustments, and in 4(b) for the upward adjustments. In Fig. 4(c) and (d), the results for the ERB versions are presented, and in Fig. 4(e) and (f), the results for the LIN versions. The results show that for the sessions with the LOG versions, a test stimulus presented in a high register is matched to a comparison stimulus, which, in semitones, has a higher excursion size [Fig. 4(a)]. On the other hand, when the test stimulus is presented in the low register, it is matched to a comparison stimulus which, in semitones, has a lower excursion size [Fig. 4(b)]. For the ERB versions, the test stimuli are matched to a comparison stimulus that has about an equal excursion size in the other register, in this frequency scale. There is a tendency to deviate from the ERB-rate scale for the lowest and the highest excursions of the test stimuli, but it will be shown that this could be attributed to a tendency to match the excursion size of the comparison stimulus to the average of all excursion sizes. For the sessions with the LIN versions, a test stimulus presented in a high register is matched to a comparison stimulus, which, in Hz, has a lower excursion size. When the test stimulus is presented in the low register, there is a tendency to match it to a comparison stimulus which, in Hz, has a higher excursion size.

C. Some comments

Not every subject performed the experiment equally consistently. Some complained about the difficulty of the task. Such subjects produced matchings that tended more to the average of the comparison stimuli, producing a higher variance in the results. These subjects could be selected by comparing their responses in the upward and the downward sessions. If a test stimulus in the low register is matched to a comparison stimulus with some specific excursion size in the high register, a test stimulus in the high register with such an excursion size should, in its turn, be matched to a comparison stimulus in the low register with about the same excursion size as the original test stimulus. This amounts to combining the results of a downward session with those of a corresponding upward session. When the excursion sizes of the lower stimuli are plotted against the excursion sizes of the higher stimuli, there should be a high correlation. This correlation coefficient was calculated for all sessions and for all subjects. This resulted in a quantitative measure that could be used to select subjects responding consistently. All subjects participated in 18 sessions, while for the determination of one such correlation coefficient two sessions were necessary. So, a total of nine correlation coefficients was obtained for each subject. The variances and the bias in the direction of the average were much less when only those five subjects were selected for whom this correlation coefficient exceeded 0.75 in more than six of the nine sessions. Figure 5 shows the average results for the five consistently responding subjects.

After being told that they had matched prominence of pitch movements on an ERB-rate scale, a few subjects felt
[Figure 5: "AVERAGES ACROSS CONSISTENT SUBJECTS"; panels (a) to (f).]

FIG. 5. Average results of the matchings by five subjects showing consistency in their responses; otherwise, as Fig. 4.

[Figure 6: "ONE "MUSICALLY" LISTENING SUBJECT"; panels (a) to (f).]

FIG. 6. Results of six sessions with a fall as pitch movement, in which the subject matched the stimuli on a musical scale, and ignored the prominence of the syllables; otherwise, as Fig. 4.
ties characteristic of speech" (Klatt, 1973, p. 8). Their data are not accurate enough, however, to draw any conclusions concerning the scale on which these jnd's are constant.

Also in vowel perception, there are some discussions on whether formant frequency is perceived on a logarithmic scale or on a scale derived from the frequency selectivity of the auditory system (e.g., Nearey, 1989; Miller, 1989). No conclusive experiments have been reported, however. This is probably partly due to the fact that no operational experimental paradigm is known in which equality in different frequency regions can be established.

From the results obtained in this study, some conclusions can be drawn on the way in which pitch is coded in the central nervous system. In speech intonation, the prominence that a pitch movement lends to a syllable appears to be well defined perceptually, so that excursion sizes in different frequency regions can be compared. This was used to determine on which frequency scale pitch movements in speech intonation are judged equal. A frequency scale derived from the frequency selectivity of the auditory system fitted the results best. Since, in speech, most harmonics have frequencies higher than 500 Hz, and also the ERB-scale is nearly logarithmic above 500 Hz, prominence of pitch movements would be perceived on an approximately logarithmic frequency scale, if perceived prominence were based on a combination of the excursions of the harmonics. Since this appears not to be the case, it must be concluded that perceived prominence is based on the course of pitch itself and not of its harmonics. This means that there is a pitch-coding array in the human speech processor. It has now been shown that this array has the same linear organization as the array of filters in the peripheral auditory system.

ACKNOWLEDGMENTS

This work was supported by the Instituut voor Doven, Sint-Michielsgestel. We are grateful to Hans 't Hart and Jacques Terken for their fruitful discussions and constructive comments on the manuscript.

Cooper, W. E., and Sorensen, J. M. (1981). Fundamental Frequency in Sentence Production (Springer-Verlag, New York).
Fletcher, H. (1953). Speech and Hearing in Communication (Van Nostrand, New York), pp. 153-175.
Houtsma, A. J. M., Durlach, N. I., and Braida, L. D. (1980). "Intensity perception XI. Experimental results on the relation of intensity resolution to loudness matching," J. Acoust. Soc. Am. 68, 807-813.
Klatt, D. H. (1973). "Discrimination of fundamental frequency contours in synthetic speech: implications for models of pitch perception," J. Acoust. Soc. Am. 53, 8-16.
Liberman, M. C. (1982). "The cochlear frequency map for the cat: Labeling auditory-nerve fibers of known characteristic frequency," J. Acoust. Soc. Am. 72, 1441-1449.
Miller, J. D. (1989). "Auditory-perceptual interpretation of the vowel," J. Acoust. Soc. Am. 85, 2114-2134.
Moore, B. C. J., and Glasberg, B. R. (1983). "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
Moore, B. C. J., and Glasberg, B. R. (1986). "The role of frequency selectivity in the perception of loudness, pitch and time," in Frequency Selectivity in Hearing, edited by B. C. J. Moore (Academic, London), pp. 251-308.
Moore, B. C. J., and Glasberg, B. R. (1989). "Mechanisms underlying the frequency discrimination of pulsed tones and the detection of frequency modulation," J. Acoust. Soc. Am. 86, 1722-1732.
Nearey, T. M. (1989). "Static, dynamic, and relational properties in vowel perception," J. Acoust. Soc. Am. 85, 2088-2113.
Patterson, R. D. (1976). "Auditory filter shapes derived with noise stimuli," J. Acoust. Soc. Am. 59, 640-654.
Rietveld, A. C. M., and Gussenhoven, C. (1985). "On the relation between pitch excursion size and prominence," J. Phon. 13, 299-308.
Stevens, S. S., Volkmann, J., and Newman, E. B. (1937). "A scale for the measurement of the psychological magnitude pitch," J. Acoust. Soc. Am. 8, 185-190.
Stevens, S. S., and Volkmann, J. (1940). "The relation of pitch to frequency: a revised scale," Am. J. Psychol. 53, 329-353.
't Hart, J. (1981). "Differential sensitivity to pitch distance, particularly in speech," J. Acoust. Soc. Am. 69, 811-821.
Suchowerskyj, W. von (1977). "Beurteilung von Unterschieden zwischen aufeinanderfolgenden Schallen," Acustica 38, 131-139.
't Hart, J., Collier, R., and Cohen, A. (1990). A Perceptual Study of Intonation (Cambridge U. P., Cambridge, England).
Traunmüller, H., Branderud, P., and Bigestans, A. (1989). "Paralinguistic speech signal transformations," Phonetic Experimental Research, Institute of Linguistics, University of Stockholm (PERILUS) 10, 47-64.
Traunmüller, H. (1990). "Analytical expressions for the tonotopic sensory scale," J. Acoust. Soc. Am. 88, 97-100.
Wilson, J. P., and Evans, E. F. (1977). "Cochlear frequency map for the cat," in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, New York), p. 69.
Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). "Critical bandwidth in loudness summation," J. Acoust. Soc. Am. 29, 548-557.
Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.
Zwicker, E., and Terhardt, E. (1980). "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Am. 68, 1523-1525.
Zwislocki, J. (1965). "Analysis of some auditory characteristics," in Handbook of Mathematical Psychology, Vol. III, edited by R. D. Luce, R. R. Bush, and E. Galanter (Wiley, New York), pp. 1-97.