
Perceived Key Movement in Four-Voice Harmony and Single Voices

William F. Thompson
Atkinson College, York University

Lola L. Cuddy
Queen's University at Kingston

Listeners with a moderate amount of musical training rated the distance
between the first and final key of short chorale excerpts under one of
four presentation conditions. The distance between keys, or modulation
distance, was either zero, one, or two steps in either the clockwise or
counterclockwise direction on the cycle of fifths. Presentation conditions
were four-voice harmonic sequences excerpted from the complete set of
Bach chorales, single voices of the latter sequences, four-voice harmonic
sequences simplified to block chords, and single voices of the latter
sequences. Consistent with earlier findings (Thompson & Cuddy, 1989),
judgments for both four-voice harmonic presentations and single-voice
presentations revealed a close correspondence between modulation distance and judged distance. Ratings for harmonic sequences within a
given key distance, however, showed influences of direction of modulation and of harmonic progression that were not reflected in ratings
for single voices. The findings suggest that harmony and melody follow
somewhat different principles in the process of identifying key change.

In harmonized music, influences on the perception of key structure and
key change are notably complex. Both melodic and harmonic structure
contribute to a listener's sense of key. It is difficult to isolate their separate
influences, however, because the tonal implications of melody and harmony are highly correlated.
In a previous report, we addressed this problem by investigating a strict
hierarchical description of key (Thompson & Cuddy, 1989). In this description, a melodic line implicates key structure by first implicating an


Requests for reprints may be sent to W. F. Thompson at the Department of Psychology,
Atkinson College, York University, North York, Ontario, Canada, M3J 1P3 or to L. L.
Cuddy at the Department of Psychology, Queen's University, Kingston, Ontario, Canada,
K7L 3N6.




underlying harmonic progression, which, in turn, implicates key structure
and key movement. Because the process of deriving an implied harmonic
progression from a sequence of tones may be subject to error or ambiguity,
the strict hierarchical description predicts that judgments of key structure
should in general be more difficult for single voices than for harmonic sequences.
Before Thompson and Cuddy (1989), investigators had not explicitly
tested a hierarchical conception of tones, chords, and keys by asking
listeners directly to judge key movement in both harmonic sequences and
in single voices. However, demonstrations of top-down influences involving a variety of other judgments have generally supported a hierarchical
description of key structure. For example, it has been shown that key
context influences judgments of chords or chord sequences (Krumhansl,
Bharucha, & Castellano, 1982; Krumhansl, Bharucha, & Kessler, 1982;
Krumhansl & Kessler, 1982) and melodies (e.g., Cuddy, Cohen, & Mewhort, 1981; Cuddy, Cohen, & Miller, 1979; Dowling, 1978; Krumhansl,
1979). Moreover, judgments of melodies are influenced by the implied
harmonic progression (Bharucha, 1984; Cuddy, Cohen, & Mewhort,
1981; Cuddy & Lyons, 1981). These results are consistent with the notion
that, within a hierarchy of musical structure, keys are represented at the
highest level, chords at an intermediate level, and single tones at the lowest
level (Bharucha, 1987; Krumhansl, Bharucha, & Kessler, 1982, p. 34;
Lerdahl, 1988).
Thompson and Cuddy (1989) investigated the perception of key structure and key movement by asking listeners to judge the distance between
the first and final key of excerpts adapted from Bach chorales. Five types
of sequences were included in the presentations: nonmodulating,
modulating to the dominant, modulating to the subdominant, modulating to
the supertonic, and modulating to the flattened seventh. Judgments of key
distance in four-voice sequences were compared with judgments of key
distance in the individual voices of those sequences. There were two main
findings: first, both harmonic and single-voice presentations reliably conveyed a similar degree of key movement to listeners; second, for harmonic
presentations, but not for single-voice presentations, greater distance was
associated with modulations moving in the counterclockwise, rather than
in the clockwise, direction. This second finding suggests that the key movement conveyed by harmonic sequences, but not by single voices, was
asymmetric with respect to direction of modulation.
Together, the findings suggest that judgments of key distance in harmony and single voices are not adequately modeled by a strict hierarchical
system. Within a strict hierarchical processing system, some information
loss is expected between the levels of harmony and single voices. Therefore, judgments of single voices should be less reliable than judgments of



harmonic presentations. An exception to this latter outcome could occur
only if listeners were able to abstract with complete accuracy the underlying harmonic progression from presentations of single voices: in that
case, however, one would expect effects such as the directional asymmetry,
mentioned earlier, to be evident for both judgments of harmonic sequences
and judgments of single voices. Because our results were inconsistent with
either possible outcome, we concluded that melody and harmony do not
implicate key structure within a strict hierarchical system. Rather, the
processes of abstracting key structure from melody may operate somewhat
independently from the processes of abstracting key structure from harmony.
The present investigation was designed to replicate and elaborate on the
earlier findings. For both harmonic and single-voice presentations, we
compared judgments of key distance for three versions of the chorale
sequences. The first version was the original form as notated by Leuchter
(1968). The other versions were simplifications of the sequences [simplifications (1) and (2)]. Simplified versions were sequences of eight four-voice chords, that is, without passing notes or ornamentation.
Simplification (1) was provided by Professor Fred Lerdahl. It retains the
overall key movement theoretically deemed present in the original chorale
sequences but alters the harmonic progressions in order to equate the point
of modulation across all sequences. Lerdahl (personal communication,
1986) commented that some of the modulations occurred either "too soon
or too late," particularly those in the counterclockwise direction. This
imbalance may have contributed to the directional asymmetry noted in
Thompson and Cuddy (1989) for ratings of harmonic presentations. Simplification (2) was the version used in our earlier research: it closely preserved the harmonic progression of the original chorales.
Judgments of key distance in original sequences and Simplification (1)
sequences were collected and compared with judgments previously collected for Simplification (2) sequences (Thompson & Cuddy, 1989). Since
the three versions of each chorale sequence always maintained the same
key structure, similarities among the results for the three versions should
reveal the overall contribution of key structure to perceived key movement. Differences among the versions should reveal influences by local
melodic and harmonic details.

The method described refers to experimentation with original and Simplification (1)
sequences. Procedures for Simplification (2) sequences were similar (Thompson & Cuddy, 1989).





Four groups of listeners, each consisting of 20 undergraduate students from the University of Queensland,
Australia, were tested. Listeners were selected from a first-year
subject pool and were given course credit for participating. The participants had little or
no formal training in traditional music theory, but all listened to classical music on a regular
basis, or currently played a musical instrument. All subjects reported normal hearing.
Tones were triangular waveforms produced by a Yamaha DX-11 synthesizer, controlled
by a Macintosh SE-20 computer. The synthesizer was set to the system of equal temperament with A4 equal to 440 Hz. Sequences were prepared with the music software
package "Professional Composer" and presented under the control of the software package
"Performer." The tempo of each presentation was set to 120 quarter notes
per minute.
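As an illustrative aside, the tuning and tempo settings just described can be expressed numerically. The sketch below is not part of the original apparatus; it assumes the common MIDI convention in which A4 is note number 69, and the function names `midi_to_hz` and `quarter_note_seconds` are hypothetical.

```python
def midi_to_hz(midi_note: int, a4_hz: float = 440.0) -> float:
    """Frequency of a note in 12-tone equal temperament.

    Assumes the MIDI convention that A4 is note number 69
    (an assumption; the text only states that A4 = 440 Hz).
    """
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def quarter_note_seconds(tempo_qpm: float = 120.0) -> float:
    """Duration of one quarter note, in seconds, at the given tempo."""
    return 60.0 / tempo_qpm


if __name__ == "__main__":
    print(midi_to_hz(69))          # A4
    print(midi_to_hz(60))          # middle C under the same assumption
    print(quarter_note_seconds())  # one quarter note at 120 per minute
```

At 120 quarter notes per minute each quarter note lasts half a second, so a sequence of eight quarter-note chords would last about four seconds.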
Listeners were tested in soundproof booths. Tones and chords were delivered binaurally
through Sennheiser headphones (HDH 424), and responses were entered on the keyboard
of a computer terminal. Before the experimental session began, each listener was allowed
to adjust the average SPL to a comfortable listening level within the range 65 to 75 dB.
Ten phrases were excerpted from the complete set of Bach chorales (Leuchter, 1968).
The original sources are listed in Thompson and Cuddy (1989, appendix). The 10 excerpts
provided two examples of each of five modulation conditions. The five conditions were
as follows: nonmodulating [Condition NM], modulating to the key of the dominant
[Condition M(V)], modulating to the key of the subdominant [Condition M(IV)], modulating to the key of the supertonic [Condition M(II)], and modulating to the key of the
flattened seventh [Condition M(VIIb)].
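The relation between these five conditions and distance on the cycle of fifths can be made concrete with a short sketch. It is offered only as an illustration: it assumes major keys, equates enharmonic spellings, and adopts the usual convention that clockwise movement (toward the dominant) is positive. The names `PITCH_CLASS` and `modulation_steps` are hypothetical.

```python
# Pitch classes in semitones above C (enharmonic spellings share a value).
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
    "Bb": 10, "B": 11,
}


def modulation_steps(first_key: str, final_key: str) -> int:
    """Signed number of steps on the cycle of fifths between two major keys.

    Positive values are clockwise (toward the dominant); negative values
    are counterclockwise (toward the subdominant). A tritone relation is
    returned as +6, although it is equally far in both directions.
    """
    semitones = (PITCH_CLASS[final_key] - PITCH_CLASS[first_key]) % 12
    # One step on the cycle is 7 semitones, and 7 * 7 = 49 = 1 (mod 12),
    # so multiplying by 7 converts a semitone interval into fifth-steps.
    steps = (semitones * 7) % 12
    return steps - 12 if steps > 6 else steps


if __name__ == "__main__":
    for label, target in [("NM", "C"), ("M(V)", "G"), ("M(IV)", "F"),
                          ("M(II)", "D"), ("M(VIIb)", "Bb")]:
        print(label, modulation_steps("C", target))
```

Starting from C major, the five conditions correspond to 0, +1, -1, +2, and -2 steps under these assumptions, which matches the design of zero, one, or two steps in either direction.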
All excerpts ended with a perfect cadence to the tonic chord of the final key. Two
versions of each sequence were used in the investigation: the original excerpts and a
simplified version of the excerpts [Simplification (1), prepared by Professor F. Lerdahl].
The simplified (1) version of each sequence consisted of eight chords with no ornamental
or passing notes. For each sequence, the key structure, the root of the first chord, and the
roots of the final two chords were always the same in the original and simplified versions.
However, harmonic progressions were sometimes altered in the simplified (1) sequences
in order to equate the point of modulation across all sequences.
Figure 1 shows the original versions of the 10 chorale sequences in musical notation.
Figure 2 shows the simplified (1) versions of the 10 sequences. The types of key movement
displayed in Figures 1 and 2 are as follows: nonmodulating (Sequences 1 and 2);
modulating one step on the cycle of fifths to the dominant or to the subdominant
(Sequences 3, 4, 5, and 6); and modulating two steps on the cycle of fifths to the
supertonic or to the flattened seventh (Sequences 7, 8, 9, and 10).¹
Subjects were randomly assigned to one of four groups. Presentations for the four
groups were as follows: Group 1, harmonic sequences excerpted from Leuchter (1968);
Group 2, single voices of the latter sequences; Group 3, harmonic presentations of
Simplification (1) sequences; Group 4, single voices of the latter sequences. Each trial
consisted of an initial melodic pattern of five quarter notes, followed by a pause equal to two
quarter notes, and then a harmonic or single-voice presentation. The initial melodic pattern
outlined the tonic triad of the initial key of the sequence and was included to give the
listener a strong sense of the initial key. Presented in ascending order, the five notes of
the pattern were tonic, tonic one octave above the first tone, mediant, dominant, and tonic
two octaves above the first tone. For harmonic presentations, each harmonic sequence was
presented once. For single-voice presentations, each voice from each harmonic sequence
was presented once. The order of presentation was randomly and independently determined for each listener.

Listeners were informed that they should rate the distance between the first and final
key of each presentation on a scale of one to seven. For listeners not familiar with a formal
definition of key, an explanation was provided. This explanation included reference to the
scale, do-re-mi-fa-sol-la-ti-do, and to the sense of stability associated with the first note
of the scale. Listeners were informed that there were no right or wrong answers and that
they should try to use the entire range of the response scale.

1. Sequences 3 and 4 of Thompson and Cuddy (1989) were dropped for reasons explained in that paper, and the sequences of this paper have been renumbered accordingly.

Fig. 1. The original versions of the 10 sequences, excerpted from Bach Chorales (Leuchter, 1968).

Fig. 2. The simplified (1) versions of the 10 sequences.

Results and Discussion

Table 1 displays mean ratings of perceived key distance for three types
of key structure and three versions of the sequences. The three types of
key structure in the table are nonmodulating (Nonmod), modulating one
step on the cycle of fifths (Mod 1 step), and modulating two steps on the
cycle of fifths (Mod 2 steps). The upper part of the table shows mean
ratings for four-voice harmonic presentations; the lower part shows mean
ratings for single voices averaged across voices (soprano, alto, tenor, bass).
Table 1 reveals similar patterns of results for each of the three versions.
Single voices, on the average, conveyed as much information about key
change as four-voice harmonic presentations: As the theoretical distance
on the cycle of fifths increased, ratings of perceived distance increased by
about the same amount for four-voice harmony and for single voices.
Analysis of variance yielded strong effects of modulation distance and no
interactions between modulation distance and version of the sequences.
For four-voice harmonic presentations, nonmodulating sequences were
rated lower than modulating sequences [F(1, 57) = 57.19, p < .01], and
key changes of one step on the cycle of fifths were associated with lower
ratings than key changes of two steps [F(1, 57) = 97.18, p < .01]. Similarly, for single-voice presentations, nonmodulating sequences were rated
lower than modulating sequences [F(1, 61) = 190.93, p < .01], and key
changes of one step were associated with lower ratings than key changes
of two steps [F(1, 61) = 154.20, p < .01]. Thus, ratings of key distance
were as reliable for single voices as they were for full harmonic sequences.

Table 1
Mean Ratings of Key Distance for Four-Voice Harmony and Single Voices
[Columns: Original, Simplified (1), Simplified (2). Rows, given separately for four-voice harmony and for the mean of four single voices: Nonmod, Mod 1 step, Mod 2 steps. Cell values are not recoverable from this copy.]



Table 2
Mean Ratings of Key Distance in Soprano, Alto, Tenor, and Bass Voices, Averaged across Modulating Sequences


The asymmetry of modulation direction reported for Simplification (2)
by Thompson and Cuddy (1989) was replicated for the original sequences
but was much less evident in the simplified (1) sequences provided by
Lerdahl. For four-voice harmonic presentations of the original sequences,
key changes involving clockwise movement on the cycle of fifths were
associated with lower ratings (mean, 3.1) than key changes involving
counterclockwise movement on the cycle of fifths (mean, 4.4). For the
simplified (1) harmonic sequences, however, mean ratings were 3.9 and
4.2 for clockwise and counterclockwise movement, respectively. Thus,
original and simplified (1) harmonic sequences differed significantly with
respect to the presence of an asymmetry of modulation direction [F(1,
38) = 4.43, p < .05]. This finding suggests that directional asymmetry is
related to the specific characteristics of the harmonic progressions, that is, the
means by which modulation is effected. It is not necessarily a property
of the psychological representation of key relationships.
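The size of the asymmetry in each version follows directly from the four means reported above. The sketch below simply subtracts the clockwise mean from the counterclockwise mean; the dictionary layout and the name `asymmetry` are hypothetical, and only the four reported means are taken from the text.

```python
# Mean ratings for four-voice harmonic presentations, as reported in the text.
MEAN_RATINGS = {
    "original": {"clockwise": 3.1, "counterclockwise": 4.4},
    "simplified_1": {"clockwise": 3.9, "counterclockwise": 4.2},
}


def asymmetry(version: str) -> float:
    """Counterclockwise mean minus clockwise mean, rounded to two decimals."""
    means = MEAN_RATINGS[version]
    return round(means["counterclockwise"] - means["clockwise"], 2)


if __name__ == "__main__":
    for version in MEAN_RATINGS:
        print(version, asymmetry(version))
```

Under this simple difference score, the asymmetry is 1.3 rating points for the original sequences and 0.3 for the simplified (1) sequences. The significance test itself is the analysis of variance reported in the text and is not reproduced here.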
There was no asymmetry of modulation direction for judgments of
single voices, for either version tested in this investigation. This finding
is consistent with the results reported in Thompson and Cuddy (1989):
In both investigations, asymmetry of modulation direction, when it
emerged for four-voice harmonic sequences, was not evident in judgments
of single voices. This difference between judgments of harmonic presentations and judgments of single-voice presentations occurred in spite of
the fact that ratings of key distance were equally reliable for single-voice
and harmonic presentations. Evidently, processes underlying the asymmetry effect reported for judgments of harmonic presentations were not
operating for judgments of single voices. More generally, harmony and
single voices may be partially independent in the implication of key structure.

Table 2 displays mean ratings assigned to soprano, alto, tenor, and bass
voices, for each of the three versions, averaged across modulating sequences only. For single-voice presentations, the three versions differed
with respect to the amount of key change conveyed by the four voices
contained in modulating sequences. For the original versions, ratings of
key distance were lower for soprano voices than for other voices. For the
two simplified versions, ratings of key distance for the soprano voice of
most sequences were similar to ratings of key distance for other voices.
In other words, the degree of key change conveyed by single voices was
less balanced across voices in the original versions than in the simplified
versions. These differences were supported by an overall interaction of Voice × Sequence-type × Version [F(24, 732) = 5.42, p < .01].
Not surprisingly, voices that introduced the new note or notes involved
in a key change were most informative that a key change had occurred.
For example, ratings of the soprano voice of Sequence 9 were much lower
for the original version (mean, 2.45) than for the simplified (1) version
(mean, 5.15) [t(19) = 5.88, p < .001]. This difference appears to be related to the presence of accidentals in the voice: the original soprano voice
involves no accidentals, but the simplified (1) soprano voice involves two
accidentals (i.e., notes from the new key) in the last bar. For the alto of
the same sequence, mean ratings for the original and simplified (1) version
were 5.55 and 3.25, respectively [t(19) = 4.67, p < .001]. Again, the difference appears to be related to the presence of an accidental. In the
original alto voice, an accidental is introduced by the new key at the
penultimate note. In the simplified (1) alto voice, there are no accidentals.


As noted in Table 1, mean ratings of key distance for four-voice harmony and for single voices increased by similar amounts as theoretical
distance on the cycle of fifths increased. However, examination of ratings
for the individual sequences within a given modulation distance revealed
that judgments of harmonic presentations and judgments of single-voice
presentations did not always correspond. The musical changes made to
the original sequences to create simplified (1) sequences did not always
have the same effect on judgments of harmonic presentations as they did
on judgments of single-voice presentations. As noted earlier, for example,
ratings of original harmonic sequences contained a directional asymmetry
that was significantly reduced in ratings of simplified (1) harmonic sequences. However, no corresponding influence of the musical changes was
found for the comparison of ratings of original single voices and simplified
(1) single voices.
For each sequence, we examined the effect of the changes in musical
detail introduced by Simplification (1) on both judgments of harmonic
presentations and judgments of single voices. Differences in ratings between original and simplified (1) versions of each harmonic sequence were
calculated and compared with differences in ratings between original and
simplified (1) versions of the single voices contained in each harmonic
sequence. The changes introduced by Simplification (1) sometimes increased and sometimes decreased ratings both for single-voice presentations and for harmonic presentations. However, differences for judgments
of harmonic presentations were uncorrelated with differences for judgments of single voices [r(8) = -.324, n.s.].²
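The comparison described in this paragraph amounts to correlating two sets of difference scores, one per sequence. The sketch below shows the computation in outline. The ten pairs of ratings are hypothetical placeholders (the per-sequence means are not given in the text), so the printed coefficient illustrates the procedure only and does not reproduce the reported value. The names `difference_scores` and `pearson_r` are likewise hypothetical.

```python
import math


def difference_scores(original, simplified):
    """Simplified-minus-original rating for each sequence."""
    return [s - o for o, s in zip(original, simplified)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL mean ratings for 10 sequences (placeholders, not the study's data).
harmony_original = [1.8, 2.1, 3.0, 3.4, 4.6, 4.1, 4.9, 5.2, 5.6, 5.9]
harmony_simplified = [2.0, 1.9, 3.6, 3.8, 4.0, 4.3, 5.1, 4.8, 5.0, 5.2]
voices_original = [2.2, 2.0, 3.3, 3.1, 3.9, 4.2, 4.7, 5.0, 4.4, 4.6]
voices_simplified = [2.1, 2.3, 3.0, 3.5, 4.3, 4.0, 4.5, 5.3, 5.1, 5.4]

harmony_diff = difference_scores(harmony_original, harmony_simplified)
voices_diff = difference_scores(voices_original, voices_simplified)

r = pearson_r(harmony_diff, voices_diff)
df = len(harmony_diff) - 2  # 10 sequences give 8 degrees of freedom

if __name__ == "__main__":
    print(f"r({df}) = {r:.3f}")
```

With 10 sequences the correlation has 8 degrees of freedom, which matches the r(8) notation in the text.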
In a few cases, the direction of the difference between original and
simplified (1) versions for judgments of harmonic sequences was opposite
to the direction of the difference between these two versions for judgments
of single voices. This interaction between Version and Condition of Presentation was most evident for Sequence 10. Figure 3 shows mean ratings
for original and simplified (1) versions of Sequence 10, for harmonic and
single-voice presentations.
Sequence 10 involves a key change from C major to B♭ major. In
comparison to the original version of Sequence 10, the simplified (1)
version involves earlier introduction of chords from the final key. For
judgments of harmonic presentations, the mean ratings displayed in Figure
3 suggest that the changes introduced by Simplification (1) reduced the
perceived degree of key movement. For single voices, however, the opposite effect was found. Changes introduced by Simplification (1) appeared to enhance or emphasize the sense of key change.
The reliability of this interaction was established by comparing ratings
of single voices of Sequence 10 (averaged across voices) with ratings of
the full harmonic presentations of Sequence 10. Analysis of variance revealed a significant interaction of Version [original vs. simplified
(1)] × Condition of Presentation (single voice vs. full harmony) [F(1,
76) = 8.15, p < .01]. The effect is consistent with the notion that there
are different processes by which key structure is abstracted from single
voices and full harmony: musical details that may reduce key movement
in a harmonic context may actually enhance key movement in single voices.

2. Analysis of variance conducted for judgments of original and simplified (1) harmonic
presentations indicated a significant contrast within the triple interaction of Modulation
Condition × Example × Version [F(1, 38) = 4.35, p < .05]. In addition, judgments of
original and simplified (1) harmonic sequences were significantly different with respect to
the presence of an asymmetry of modulation direction. Thus, the lack of correlation
between differences for harmonic presentations and single-voice presentations was not
merely the result of correlating two random distributions.

Fig. 3. Mean ratings of key distance for original and simplified (1) versions of Sequence
10, for harmonic and single-voice presentations.


Judgments of key movement in original and simplified (1) sequences
confirm earlier findings concerning the sensitivity of listeners to key change
in short chorale excerpts. Results obtained with simplified (2) sequences
(Thompson & Cuddy, 1989) may be generalized to their original sources.
Moreover, further evidence is provided that melody and harmony may
follow different principles in the process of identifying key change. Asymmetry of modulation direction, when it occurs, emerges for four-voice
harmonic presentations only, and not for single voices. Moreover, effects
of local musical changes on judgments of harmonic sequences were not
similar, and in some cases were in an opposite direction, to the effects
of those changes on judgments of single voices.
Comparison of judgments for four-voice harmony and single voices
suggests that melody and harmony are partially independent in the implication of key. [In a somewhat different context, Schmuckler (1989)
found that melody and harmony contributed independently and additively
to expectancy generation.] This partial independence is not well modeled
by a strict hierarchical system, in which a melodic line implicates a key
by first implicating an underlying harmonic progression. The results are
consistent with a partially hierarchical system, however, in which musical
relations at each level can be evaluated with or without reference to other
levels in the hierarchy. Although listeners can evaluate key relationships in a melodic line with reference to an implied harmonic progression,
other aspects of melodic structure are available, and they may provide
listeners with an important source of information about key and key
movement.³ ⁴

A (De)Composable Theory of Rhythm Perception


Nijmegen University, The Netherlands

A definition is given of expectancy of events projected into the future
by a complex temporal sequence. The definition can be decomposed into
basic expectancy components projected by each time interval implicit
in the sequence. A preliminary formulation of these basic curves is
proposed and the (de)composition method is stated in a formalized,
mathematical way. The resulting expectancy of complex temporal patterns can be used to model such diverse topics as categorical rhythm
perception, clock and meter inducement, rhythmicity, and the similarity
of temporal sequences. Besides expectancy projected into the future, the
proposed measure can be projected back into the past as well, generating
reinforcement of past events by new data. The consistency of the predictions of the theory with some findings in categorical rhythm perception is shown.

Many incompatible theories about temporal perception and memory
exist, which explain a number of phenomena well, but fail to predict
others. A common theoretical basis for such work would be desirable.
Connectionism might be an attractive paradigm in the search for such a
basis, but most of its models lack compositionality. This means that the
model as a monolithic whole might perform well, but it is impossible to
decompose its complex behavior into meaningful smaller parts. Chandrasekaran (1990) argues that composability is a condition for successful cognitive modeling, even in the connectionist paradigm. In Desain
(1990), the behavior of a subsymbolic (connectionist) model of temporal
quantization was described such that it could be compared with an incompatible symbolic model from the traditional artificial intelligence paradigm. The paper concluded with an abstraction of the behavior of the
quantizer in the form of an "expectancy of events" with a temporal pattern
as prior context. Expectancy turned out to be (de)composable, which
makes it possible to base a theory of perception of complex stimuli on
a simple model for the perception of their constituting components. Be-

Requests for reprints may be sent to Peter Desain, NICI, Nijmegen University, P.O.
Technische Universität München

Information processing is characterized by conditional decisions on hierarchically organized levels. In biological systems, this principle is manifest in the phenomena of contourization
and categorization, which are
more or less synonymous. Primary contourization, such as in the visual
system, is regarded as the first step of abstraction. Its auditory equivalent is formation of spectral pitches. Hierarchical processing is characterized by the principles of immediate processing, open end, recursion,
distributed knowledge, forward processing, autonomy, and viewback.
In that concept, perceptual phenomena such as illusion, ambiguity, and
similarity turn out to be essential and typical. With respect to perception
of musical sound, those principles and phenomena readily explain pitch
categorization, tone affinity, octave equivalence (chroma), root, and
tonality. As a particular example, an explanation of the tritone paradox
is suggested.

I regard it a mistake if the theory of consonance is considered as the
essential basis for the theory of music, and I had felt that I had expressed this in the book with sufficient clarity. The essential basis of
music is melody.
H. von Helmholtz
foreword to 3rd ed., The Sensations of Tone (1870)

Among music experts there is wide agreement on the notion that a
melody is more than just a sequence of tones. What makes a tone sequence
a melody is harmonic and rhythmic organization. So, if a melody really
is a melody, it includes both harmony and rhythm. Instead of "includes,"
one may as well say "suggests." The harmonic and rhythmic implications
of a melody ordinarily form a background that is created by, and dependent on, the melody itself.
Requests for reprints may be sent to E. Terhardt,
and Audiocommunication, Technische Universität München, P.O. Box 20 24 20, D-8000 München 2,



Ernst Terhardt

Here emerges an analogy between melody and figure (Gestalt). When
in a visual pattern a set of elements is seen as a figure, the remainder acts
as background. It is impossible to decide what is the figure without deciding what is the background.
It is in this sense that a figure may
"suggest" its background. Remarkably, it is the method of
suggesting harmony by melody that provides for making harmonic implications and progressions particularly rich and flexible, as was so strikingly demonstrated by J. S. Bach. The analogy drawn by Hofstadter (1979)
between Bach's polyphonic compositions and M. C. Escher's famous
interactions intuitively fits well.
From these considerations one may draw two conclusions. The first is
just formal: As melody actually includes both harmony and rhythm, it is
misleading to say that the basic components of tonal music are melody,
harmony, and rhythm. One should better say it is temporal pitch contour
(also termed melodic contour, cf. Dowling, 1978), harmony, and rhythm.
The second conclusion is about analogies. Analogies are helpful for
understanding. In fact, analogies are both a tool for, and the essence of,
Drawing analogies means making special observations
more general. While music is a medium that is both abstract and nonsemantic, analogies between musical structures and perceptual principles
on the one hand, and visual Gestalt principles and the structure of language on the other hand, have often been discussed (e.g., Bregman &
Campbell, 1971; Carterette, Kohl, & Pitt, 1986; Deutsch, 1969, 1982b;
1988; Helmholtz, 1954; Jackendoff & Lerdahl, 1982;
Koehler, 1933; McAdams, 1985; Minsky, 1982; Stumpf, 1965; Wellek,
Most (if not all) of those earlier considerations refer to what these days
are termed cognitive processes - as opposed to psychophysical ones. Indeed, on the cognitive levels-that
is, higher levels of abstractionanalogies between perceptual processes in different sensory modes often
suggest themselves readily. This may be attributed to the notion that with
ascending leve] of abstraction the particularities of sensory modes lose
However, ir is a serious drawback of singular high-level analogies that
they can only to a limited extent be experimentally verified and adequately
modeled. They are akin to metaphors rather than systematic, verifiable
analogies. The essential difference between metaphors and "real" analogies is thar the former include a significantly higher degree of
more alternatives-than
the latter. For instance, the
metaphoric equivalence of a musical piece's tonic (in Hindemith's terms,
the "tonal center," cf. Hindemith, 1940) to the vanishing point of a perspective picture is intuitively appealing. Yet, from a scientific point of view
it appears arbitrary to an unsatisfactorily large extent. The same applies

Music Perception and Sensory Information Acquisition


to the aforementioned
analogy between melody and figure-background
On low levels of the sensory hierarchy, where intermodal particularities
are most pronounced, analogies do not so easily become apparent but
require a high amount of abstraction from a great variety of empirical data
and observations. If successful, that effort is rewarded by providing both
an equivalent amount of information and a basis for better understanding
high-level processes.
In an attempt to take advantage of those ideas, the present study puts
auditory perception of musical tones and chords into the general conceptual frame of sensory information processing. This requires critical
discussion of some pertinent concepts such as hierarchy, discretization,
information, contour, illusion, ambiguity, and similarity. As a result, it will
become apparent that a number of basic principles of tonal music such
as pitch categorization, tone affinity, chroma, root, and tonality readily
emerge from natural principles of sensory information acquisition. Moreover, an explanation of the so-called tritone paradox will be suggested.


of Hierarchical


Although it is apparent that sensory perception must be hierarchically
organized, the main principles and implications of that notion are as yet
vague. According to a common concept, a basic distinction is made between "psychophysical"
and "cognitive" processes. The former are associated with basic sensory attributes such as pitch, loudness, and timbre,
whereas the latter are seen as related to extraction of information. In the
literature, one can observe a tendency to regard psychophysical processes
as a kind of trivial low-level interface to the only significant part, namely,
cognitive processes. Although some aspects of this attitude are plausible,
it is dangerous, as it encourages neglect of essential interdependencies of
physical stimulus characteristics and low-level sensory processes. Ignoring
those interdependencies would imply no less than throwing away an invaluable key to understanding sensory information acquisition, including
perception of music. This is elucidated by the following considerations.
By inspecting sensory systems and processes from the viewpoint of biological evolution, it becomes apparent that any sensory system has been
"designed" to enable an organism to respond most efficiently to external
events. An elucidating treatise on this view was given, for example, by
Konrad Lorenz (1959). Considering many examples, he arrived at the
conclusion that "intelligence" and "knowledge" are distributed on all
levels of the hierarchy, such that a sensory system most efficiently and
extracts from a stimulus "what it means" rather than


Ernst Terhardt

what its objective details are. He pointed out that in a highly developed
sensory system, a huge amount of knowledge must be included even on
low levels; that is, knowledge about structures, relationships, and constraints of physical stimulus parameters that carry information on external
objects and events. Moreover, he pointed out that most of that knowledge
has emerged by biological evolution, that is, through interaction with the
physical conditions of the external world. And he clearly expressed the
notion that this process essentially is equivalent to learning by trial and
error by an individual organism. With these notions it becomes apparent
that, for example, Gestalt principles such as proximity, closure, and common fate are not principles per se but have evolved through interaction
with the conditions of the external world, that is, to respond optimally
to any external challenge. This implies that much can be learned about
perception by studying those physical conditions and constraints and their
psychophysical effects. And it implies further that on low levels of the
hierarchy a considerable amount of active, "intelligent" processing must
go on "automatically," that is, unconsciously. Another implication is that
the question of whether the knowledge implemented in a sensory system
is innate or learned is in many respects of minor relevance, because in
either case it has been acquired by trial-and-error interaction with the
external world.
With the following list of principles, an attempt is made to express both
typical characteristics and constraints of sensory hierarchical processing:
Immediate processing: Sensory information processing begins
right at the lowest possible level, that of the peripheral sense
Open end: The hierarchy does not have a definite number of
levels but is open ended.
Recursion: The basic principles of information processing are
the same on all levels.
Distributed knowledge: The "knowledge" necessary for optimal processing is distributed on all levels so that on each level
the particular type of knowledge that is required for the job
is available.
Forward processing: Information processing is predominantly
forward, that is, from peripheral to central levels.
Autonomy: On a given level, the input that comes from the
preceding level is processed according to the knowledge available on that particular level. On a short-term scale (i.e., leaving
out long-term learning processes), processing is not affected
from any higher level.
Viewback: While (on a short-term time scale) decisions made
on a particular level cannot be changed from a higher level,



their results can be directly inspected from any higher level,
not only from the next one.
The validity of these principles will be discussed in the subsequent sections.

Discretization, Conditioned Decision, and Information

The most important aspect of cognitive processes is that they depend
on decisions and are concerned with discrete objects. It is this aspect that
determines the main difference between cognitive processes and continuous sensory attributes. While in speech, for instance, timbre is a continuous function of time, the continuous flow of the speech signal is "automatically"
dissected into discrete units, for example, phonemes.
Although auditory pitch varies through a continuous low-high dimension,
in music there exist discrete pitch categories ("pitch classes") that are
organized in tone scales.
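The step from a continuous pitch dimension to discrete pitch categories can be illustrated with a minimal sketch. It assumes twelve-tone equal temperament with a reference of A4 = 440 Hz; neither the tuning system nor the reference is specified in the text, and the function name `pitch_category` is hypothetical.

```python
import math

# Assumption: 12-tone equal temperament, reference A4 = 440 Hz.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_category(freq_hz, ref_a4=440.0):
    """Assign a continuous frequency to the nearest pitch category.

    Returns the pitch-class name and the deviation from the category
    center in cents (hundredths of a semitone).
    """
    semitones_from_a4 = 12.0 * math.log2(freq_hz / ref_a4)
    nearest = round(semitones_from_a4)            # the discrete decision
    deviation_cents = 100.0 * (semitones_from_a4 - nearest)
    pitch_class = NOTE_NAMES[(nearest + 9) % 12]  # "A" sits at index 9
    return pitch_class, deviation_cents

for f in (440.0, 450.0, 220.0, 261.63):
    name, cents = pitch_category(f)
    print(f"{f:8.2f} Hz -> {name} ({cents:+.1f} cents)")
```

Many different input frequencies (here 440, 450, and 220 Hz) end up in the same category, which is the many-to-one character of a conditional decision.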
The distinction between continuous sensory attributes and discrete cognitive objects matches that between signal and information, a common
concept in communication theory. Physical magnitudes, such as sound
pressure as a function of time, are virtually continuous; they are regarded
as carriers of information and are termed signals. Generalizing this concept
into the science of perception, such attributes as brightness and color
distribution in vision and loudness, pitch, and timbre in hearing can be
regarded as psychophysical signals.
Shannon's information theory does not claim to say what information
is. It rather is confined to the quantitative aspects of information and
evaluates them in terms of probabilities. In the qualitative sense, information may be characterized as "something that is dependent on decisions," namely, conditional decisions. It is by conditional decisions that
categories are assigned to signal patterns. The categories in turn are physically and psychophysically represented by new signals that are subject to
more decisions, and so on. It is thus typical of information processing that
from one step to the next, the shape of information-carrying
changes radically. Ordinarily, many details of the signal input to a particular decision-making layer no longer exist in the output. And many
different input signals may be assigned to one and the same category, that
is, output.
With that qualitative definition of information, and on the basis of
evolution theory, one can "predict" that, for biological sensory systems,
discretization and categorization must be a predominant and quite natural
behavior. In fact it is a prominent (perhaps even the most typical) aspect
of any living organism, from amoeba to human, that it is a "decision
machine" that is busy from the first to the last instant of its lifetime. And
sensory systems are essential parts of that machinery.


Even a superficial inspection, from this point of view, of the visual and
auditory system reveals their "decision machine character" to an overwhelming extent. Most remarkably, both in vision and audition, decision
making actually begins right at the periphery, that is, in the eye's retina
and the ear's organ of Corti. Physiologically this is evident, for example,
in the transformation of the stimulus into sequences of discrete neural
action potentials (nerve impulses) that propagate on discrete nerve fibers.
Psychophysically, it is the phenomenon of contourization that provides
pertinent evidence. In this author's view, contourization provides a key
to a unifying concept of sensory information processing on any level.

Primary Contour and Spectral Pitch

Perception of a visual Gestalt entirely depends on the existence of primary contours. Without contour there is no Gestalt. Formation of contour
and synthesis of Gestalt are complementary, mutually dependent, active
processes. For the understanding of sensory information processing it is
a key notion that even primary contour (i.e., on the retina level) is not
trivial in the sense that it is fully determined by the stimulus alone. What
a stimulus essentially produces on the eye's retina is continuous brightness
and color distribution. Assignment of contours to such a distribution
requires active decisions on the part of the peripheral sensory system. That
type of decision can be regarded as the first step of information processing,
that is, cognitive abstraction. It is in this sense that cognition begins right
at the periphery (cf. the principle of immediate processing). And it is this
notion, in conjunction with the principles of forward processing and
explains to a considerable extent the enormous efficacy
and speed of sensory information processing (e.g., Minsky, 1975). Visual
contours are so important because they represent the most typical and
invariant characteristics of external objects. Formation of contours implies
abstraction from many details of the incoming stimulus-in
those that are dependent on intensity and color of illumination-and
extracts the typical shape of external objects.
Most remarkably, it is auditory spectral pitch (i.e., the pitch of part
tones) that, with respect to external "acoustical objects," plays exactly
the same role. Of the basic physical parameters of a sound-source signal
(i.e., amplitudes, phases, and frequencies of part tones), it is only the
frequencies that are transmitted with highest fidelity; amplitudes and
phases ordinarily are to a considerable extent corrupted.
Consider, for example, listening to a symphony in a concert hall. One
may without difficulty distinguish one or the other individual instrument,
although its sound signal on traveling to the listener's ears is heavily
affected by room acoustics. Moreover, the signal is corrupted by the
sounds of other instruments. Yet there cannot be any doubt that considerable information on that particular instrument's audible characteristics is still present in the signal the ear receives, and
that the auditory system is capable of extracting it.
Equally fundamental and striking is the fact that the pitches of musical
tones are conveyed with high precision from musician to listener. As a
daily-life experience this is so evident that one seldom-if
why it is so and what its scientific implications are. Nevertheless, this
phenomenon is highly significant and deserves careful analysis.
The key to understanding these and other achievements of auditory
communication is provided by the principle of contourization. It is based
on the notions:

That there is at least one type of source-signal parameter that
is not affected by transmission from source to listener, namely,
spectral frequency.
That the peripheral auditory system is an efficient Fourier-spectrum analyzer followed by a contourization
that "reads" discrete part-tone pitches from the continuous
spectral-intensity distribution.
As a psychophysical representation of spectral frequencies, the part-tone
pitches include most of the information that comes from sound sources
and is carried by sound signals such as musical tones. In auditory
perception of music-spectral
pitch plays the role of
primary contour on which any auditory Gestalt entirely depends.
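The idea of a spectrum analyzer followed by contourization can be sketched in a few lines. The following is only an illustration under strong assumptions: a plain discrete Fourier transform stands in for the peripheral frequency analysis, contourization is reduced to picking local maxima above a relative threshold, and the test tone, sampling rate, and function names are hypothetical. It is not a model of the auditory periphery.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Amplitude spectrum via a plain DFT (bins 0 .. N/2 - 1).

    The 2/N scaling gives sinusoid amplitudes for bins above 0.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(2.0 * abs(acc) / n)
    return spectrum

def contourize(spectrum, bin_width_hz, rel_threshold=0.1):
    """Read discrete part tones from a sampled spectral distribution.

    A bin counts as a part tone if it is a local maximum and exceeds
    rel_threshold times the largest value (an assumed criterion).
    """
    floor = rel_threshold * max(spectrum)
    peaks = []
    for k in range(1, len(spectrum) - 1):
        if (spectrum[k] > floor
                and spectrum[k] > spectrum[k - 1]
                and spectrum[k] >= spectrum[k + 1]):
            peaks.append((k * bin_width_hz, spectrum[k]))
    return peaks

# Hypothetical test tone: three harmonics of 200 Hz.
SAMPLE_RATE = 8000
N = 400  # 20-Hz bin spacing, so each harmonic falls exactly on a bin
PARTIALS = [(200.0, 1.0), (400.0, 0.5), (600.0, 0.25)]
tone = [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, a in PARTIALS)
        for t in range(N)]

part_tones = contourize(magnitude_spectrum(tone), SAMPLE_RATE / N)
for freq, amp in part_tones:
    print(f"{freq:6.1f} Hz  amplitude {amp:.2f}")
```

The output of the second stage is a short list of discrete frequency-amplitude pairs rather than a continuous distribution, which is the sense in which the contour abstracts from the stimulus.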
By simulating auditory extraction of spectral pitches on a computer, we
have verified that the contour-time patterns actually include all aurally
relevant information. In Figure 1 an example is shown of the so-called
part-tone-time pattern. It includes the first three notes of the song "Summertime," sung by a trained woman. Both the musical and text information included in that sample are represented by the frequency-time
contours (part-tone amplitudes are coded in line thickness). The musical
information (i.e., pitch classes, vibrato, intonation) is included in the time
course of harmonic frequencies. The text information is included in enhancement and suppression, respectively, of certain harmonics by vocal-tract resonances (formants) and in noisy and plosive cues.
Heinbach (1988) has demonstrated that from this type of part-tone-time
pattern another audio signal can be synthesized that is aurally almost
indistinguishable from the original. This is convincing evidence for the
conclusion that the contourized representation includes practically all aurally relevant information. As the above example includes only one voice,
Fig. 1. Part-tone pattern as a function of time, of a solo soprano singer (first three notes
of "Summertime" by G. Gershwin). Part-tone amplitudes are coded in line thickness. The
information displayed is sufficient to synthesize an audio signal that is aurally almost
indistinguishable from the original. For technical reasons, only the frequency band 0-5
kHz was analyzed. The diagram illustrates the tonal information present on the first level
of abstraction, that is, primary auditory contour.

it should be noted that these results apply to polyphonic music and multivoice speech as well.
In addition to the analogy between visual primary contour and spectral
pitch, there exist a number of psychophysical phenomena that strongly
support the analogy and further reduce its arbitrariness. First, there are
some accompanying effects such as contrast enhancement (Mach bands;
cf. Carterette, Friedman, & Lovell, 1969; Small & Daniloff, 1967; Summerfield, Haggard, & Foster, 1984; Viemeister, 1980); after-contours
(Fastl, 1986; Wilson, 1970; Zwicker, 1964); and the type of "illusion"
in which perceptions of shape, length, or direction of visual contours are
systematically different from corresponding objective parameters; its auditory equivalent is subjective shift of spectral pitch (e.g., by superimposed
noise), and octave enlargement of pure tones (Stumpf, 1965; Terhardt,
1971, 1989; Walliser, 1969; Ward, 1954).


To evaluate those analogous phenomena adequately, it must be taken
into account that in the eye there is an extra spatial dimension as compared
with the ear. While on the eye's retina the three-dimensional external
world is represented by a two-dimensional continuous distribution of light
energy, on the inner ear's cochlear partition it is a one-dimensional distribution of sound energy. A visual contour is an abstraction of a line in
a (two-dimensional) plane; the "line" itself is one dimensional. It is thus
only logical that its auditory equivalent (i.e., spectral pitch) is null-dimensional, that is, a "point" on the low-high dimension. Therefore, to
a curvature or bending of a visual contour there corresponds a linear shift
of its auditory equivalent on the low-high dimension.
Second, it is precisely the role of spectral pitches as important carriers
of information that throws an explanatory light on the precision and
durability of short-term memory for pitch (e.g., Rakowski, 1972). If in
auditory communication, spectral pitches were of no particular significance, one could hardly understand why the auditory system spends any
effort to extract them so efficiently and precisely, and why they are kept
in short-term memory for a considerable time interval (which actually is
on the order of a minute). This conceptual problem is immediately resolved
by the notion that spectral pitch is of high functional importance for
acquisition of information from acoustic signals whose parameters are
time variant, in particular, speech. The relevance of these notions for the
perception of music is apparent: Perception of tonal music can hardly be
imagined without the aforementioned characteristics of short-term memory for pitch.
A musical tone ordinarily is composed of a number of harmonic part
tones of which the lower 8 to 12 evoke spectral pitches that correspond
to their frequencies (Thurlow, 1959; Plomp, 1964; Terhardt, 1972). Thus,
on the lowest level of the cognitive hierarchy an isolated musical tone must
be regarded as an auditory Gestalt, a "molecule" rather than an "atom"
of music. While on higher levels of conscious perception the tone ordinarily may appear as a holistic unit to the listener, by drawing attention
to the lowest level one can hear the part-tone pitches too. This perceptual
dualism may be regarded as evidence for the principles of autonomy,
forward processing, and viewback: On presentation of a musical tone, the
forward processing hierarchy spontaneously and readily produces higher-level holistic representations of the perceived object (i.e., the tone), while
through the viewback channel the individual spectral pitches present on
the lowest level can be accessed as well.

Secondary Contour and Virtual Pitch

The term "secondary contour" refers to contourization processes on the
second level of the hierarchy; it does not imply minor relevance. The
existence of secondary contours is evident both in vision and audition. In
vision, they ordinarily are termed "illusory contours" (for a review, see
Parks, 1984). The choice of the term "illusory" is both misleading and
elucidating. It is misleading because it suggests interpretation of the phenomenon as a kind of artifact or even malfunction. It is elucidating because
it reveals the principles of autonomy and viewback.
As illustrated by the example in Figure 2, virtual contours spontaneously emerge from presentation of appropriate configurations of primary
contours (autonomy). When viewed from a higher cognitive level, it is
recognized that the virtual contours are not "real." However, that recognition does not change anything in what is seen. The viewback function
indeed is confined to just noticing an interpretation of the stimulus that
has been autonomously established. As an essential principle of hierarchical information acquisition, the viewback function's purpose and advantage obviously is that it enables drawing more conclusions on the
higher level, that is, after the autonomous low-level decision mechanisms
have quickly and efficiently finished their job. Apparently this is one of
the tricks by which sensory systems reconcile efficacy with flexibility.
As illustrated in Figure 2, the autonomous decision process on the
second level creates both a number of virtual contours and a virtual Gestalt, namely, a white square that partly covers a black frame. The prominence of that virtual percept naturally depends on the amount of primary
information that is compatible with such an interpretation. In that sense,
the second-level interpretation integrates the four separate black angles
into a holistic object. So this is a visual example of the aforementioned
dualism of "synthetic" autonomous interpretation and "analytic" viewback. It appears conclusive that this example reveals another fundamental
and important principle of sensory information acquisition.
Auditory perception of a musical tone can be explained by analogous
principles, at least where musical pitch is concerned. A pertinent theory
does already exist, namely, the virtual-pitch theory (Terhardt, 1972,

Fig. 2. Illustration of virtual contours and virtual figures as an analogy to virtual pitch
and root.


1974). Although this theory originally was not explicitly based on the
intermodal analogies and general principles discussed here, it readily fits
into them. Taking into account the aforementioned analogies between
visual primary contour and spectral pitch, and between visual virtual
contour and virtual pitch, the virtual-pitch theory turns out to be a natural
part of the comprehensive concept of sensory information acquisition.
The significance of the virtual-pitch theory for music perception has
been found to extend far beyond the pitch of single musical tones. The
theory includes explanations for such basic musical phenomena as tone
affinity and pitch ambiguity (e.g., octave equivalence), octave stretch and
stretch of the tone scale, the root phenomenon
(Rameau's "basse fondamentale"), and equivalence of chord inversions. The theory, and its
algorithmic implementations, can be recommended as a tool for music
theory (cf. Terhardt, 1974, 1978, 1979, 1982; Terhardt, Stoll, & Seewann, 1982a,b; Parncutt, 1988, 1989).
Both visual virtual contour and auditory virtual pitch can be regarded
as samples of the immediate, forward processing, autonomous tendency
of low cognitive levels to extract straightforwardly "what a stimulus
means." Of course, this requires "knowledge," that is, use of certain
reasonable criteria. In visual perception, those criteria are dependent on
what type and configuration of objects ordinarily would produce the given
type of stimulus. And the same applies to auditory perception. As was
outlined in the virtual-pitch theory, it is the human speech signal that
probably provides an important reference for aural evaluation of tonal
sounds. As, for physical reasons, voiced speech elements are composed of
harmonic part tones, the low-level mechanisms of the auditory system
operate on the presumption that this is so for any sound. According to
the theory, this is the reason why auditory creation of virtual pitch consistently obeys the principle of "subharmonic coincidence detection" (Terhardt, 1972, 1974).
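The principle of subharmonic coincidence can be made concrete with a toy sketch. It is not the published virtual-pitch algorithm: the weighting of part tones is omitted, and the number of subharmonics considered, the 1% tolerance, the tie-breaking rule, and the function name `virtual_pitch` are all assumptions chosen for illustration.

```python
def virtual_pitch(spectral_pitches_hz, max_subharmonic=12, tolerance=0.01):
    """Toy subharmonic coincidence detector (not the published algorithm).

    Each part tone f proposes its subharmonics f/1, f/2, ..., f/max_subharmonic.
    The candidate on which the most part tones agree is returned.
    """
    # Every subharmonic of every part tone is a candidate virtual pitch.
    candidates = [f / m
                  for f in spectral_pitches_hz
                  for m in range(1, max_subharmonic + 1)]

    def coincidences(candidate):
        # Count part tones that have a subharmonic close to the candidate.
        count = 0
        for f in spectral_pitches_hz:
            m = round(f / candidate)
            if (1 <= m <= max_subharmonic
                    and abs(f / m - candidate) <= tolerance * candidate):
                count += 1
        return count

    # Most coincidences wins; ties go to the higher candidate (an assumption).
    return max(candidates, key=lambda c: (coincidences(c), c))

# Part tones 600, 800, 1000 Hz: harmonics 3, 4, 5 of a missing 200-Hz fundamental.
print(virtual_pitch([600.0, 800.0, 1000.0]))
# A tone whose fundamental is present.
print(virtual_pitch([220.0, 440.0, 660.0]))
```

In the first call, 100 Hz collects as many coincidences as 200 Hz and loses only through the tie-breaking rule, a small instance of the pitch ambiguity discussed in this paper.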
That behavior is assumed to have been acquired and settled either at
an early age by an individual or through biological evolution. As was
discussed earlier, the relationships between subjective pitch shifts and
interval stretch suggest that the former is the case, that is, learning in early
life. The assumption was made that development of the aural mechanism
that creates virtual pitch is an essential part of the system that processes
speech, that is, abstracts linguistic information from the highly redundant
speech signal. This conclusion is supported by experimental evidence
showing that aural capabilities to normalize phonetic characteristics of
speech exist in early infancy (Kuhl, 1979; Miller, Younger, & Morse,
1982). Moreover, it has been established that a human fetus can hear
already several months before birth (in particular, the mother's voice).
With regard to general principles of biological development, it is very likely



that such acoustic stimulation has a pronounced conditioning effect on
the fetus's auditory system. Plasticity of the pitch-evaluation mechanism (which of course is required for that type of low-level learning) was
still to exist in adults (Hall & Peters, 1982; Hall & Soderquist, 1982).
Whether or not individual conditioning and learning is the basis of
auditory cognitive achievements, it is not surprising that many of those
achievements exist already in early infancy. So it is not surprising that
evidence for both virtual pitch perception and a sense for octave equivalence
has been found in young infants (Clarkson & Clifton, 1985; Demany &
Armand, 1984). With regard to the aforementioned intimate relationships
between principles of pitch perception and basic musical phenomena, it
is evident that in the auditory system of very young-probably
the basic cognitive mechanisms to which tonal music
may appeal are implemented.
Strictly speaking, the question of whether those mechanisms are innate or
learned has not been decided yet. However, for gaining many basic insights
into auditory information acquisition and music perception, that question
is of minor relevance anyhow, as
mentioned in the second section.
Concluding this section on secondary contour, it should be mentioned
that the tendency of any sensory system to extract "meaningful" second-level representations of primary contour configurations is so pronounced
that it cannot be stopped if, and while, any stimulus is given. A striking
example of this tendency was provided by Houtgast (1976). He demonstrated that under certain experimental conditions subjects assign subharmonic virtual pitches even to single pure tones. For example, with a
1000-Hz tone as stimulus, virtual pitches corresponding to 500, 333, 250,
and 200 Hz were heard.
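The four reported pitches are simply the second through fifth subharmonics of the stimulus frequency, as the following short check shows. The rounding to whole hertz follows the values quoted in the text; the variable names are illustrative only.

```python
STIMULUS_HZ = 1000

# Subharmonics f/n of the pure tone for n = 2 .. 5, rounded to whole hertz.
subharmonics = [round(STIMULUS_HZ / n) for n in range(2, 6)]
print(subharmonics)  # [500, 333, 250, 200]
```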
This finding illustrates both the principle of subharmonic coincidence,
where virtual pitch is concerned, and the fundamental "decision machine"
character of living organisms. Obviously, evolution has very deeply implanted in any living organism the principle that "any decision is (on the
average) better than no decision." Validity of this principle can indeed be
observed on any level of perception and behavior. In music it can for
instance be found in the tendency to assign, in the context of tonal
music, to practically any pitch a certain pitch category, no matter how
much the actual pitch deviates from "ideal" intonation.
An apparent implication of these notions is ambiguity. In the above
example, the pitch of the 1000-Hz tone was ambiguous such that any
of the equivalent frequencies 1000, 500, 333, 250, and 200 Hz was
offered. The 1000-Hz frequency indicates, on the primary level, what
"really" was present as a stimulus, while the others indicate, on the second
level, "what it reasonably could mean." The role of ambiguity, both in
general and with regard to music, deserves closer inspection, as follows.

Ambiguity and Similarity

Ambiguity has often been recognized as an important ingredient of
music (e.g., Bernstein, 1976; Thomson, 1983). Where sensory information
acquisition in general is concerned, ambiguity is both typical and essential.
When one takes into account that in any case the effective stimulus of a
sensory organ can include only incomplete information on external objects
and events, it is apparent that the "meaning" of any given stimulus can
never be unambiguous.
In the hierarchical system of conditioned decisions considered here,
ambiguity implies that the number of "solutions" achieved on either level
is greater than one. From the present point of view this indeed is essential,
as the solutions achieved on one level provide the input to the next. If that
input did not have any alternatives, there would be nothing left to decide. As
with "illusions," ambiguity becomes noticed through viewback, that is,
inspection of ready-made solutions on lower levels, and drawing additional conclusions on a high level.
One can make a distinction between two basic sources of ambiguity.
The first is insufficiency of structural information included in the stimulus.
A pertinent example was just discussed, that is, the case of reducing the
auditory stimulus to just one pure tone. The second source of ambiguity
is the presence in the stimulus of contradictory structural information. In vision, pertinent examples are provided by the class of "impossible figures,"
for example, the Necker cube and most of M. C. Escher's graphics. Most
of the melodic, harmonic, and rhythmic ambiguity of tonal music is analogous to visual impossible figures.
Even in the simple visual example shown in Figure 2, there is considerable ambiguity. First, the black angles may be seen just as what they
are: black bars on white background. That visual interpretation is
achieved most easily when the black bars are narrow. Second, one may
see a white square that floats above a (supposedly closed) black frame.
Third, one may see two white squares (one rotated by 45 degrees and
floating above the other) on a black background. Remarkably, the amount
of ambiguity does not seem to be systematically dependent on the prominence of the virtual contours and figures, which in turn is governed by
the width of the black frame. With increasing width of the frame, prominence of virtual squares increases, but ambiguity remains the same, or
even increases as well.
The same type of ambiguity is involved in auditory perception of a
musical tone. There is no such thing as "the" pitch of a complex tone.
On the lowest level, there is a set of (ordinarily harmonic) spectral pitches.
On the second level, virtual pitches are created. What is consciously perceived in the "spontaneous"
or synthetic mode is a holistic tonal object
that is primarily characterized by virtual pitches. By viewback, the low-level spectral pitches become recognized also. The virtual-pitch theory
accounts for that multiambiguity by assigning weights to the individual
pitches, either spectral or virtual. This is illustrated by Figure 3, where the
theoretical pitch distributions of two harmonic complex tones are shown.
The assumed fundamental frequencies are 440 Hz (upper diagram) and
220 Hz (lower). The 440-Hz tone is "higher in pitch" than the 220-Hz
tone, not only in the sense that its predominant pitch is higher, but in the
sense that the whole pattern is higher. So when a melodic sequence of
musical tones is considered, the pitch-time contour in the sense of Dowling's (1978) concept should be discussed in terms of the corresponding
sequence of pitch patterns rather than of single pitches.
As can also be seen in Figure 3, in the particular case of a 2:1 fundamental frequency ratio there appears a type of similarity between the
two tones that is determined by identity of some pitches. When these two
tones are played one after the other, a portion of the pitches of the first will
be "echoed" by the second. By visual analogy one can express this by
saying that the second Gestalt shares a number of contours with the first
one. It is apparent that this effect will promote a tendency to perceive the
second tone just as a replication of the first.
This is no less than a simple and straightforward explanation of octave
equivalence, that is, chroma. This explanation is very similar to that suggested already by Helmholtz (1954). However, an important new aspect
is provided by the presence of virtual pitches in the patterns. Whereas
Helmholtz had considered only the pattern of spectral pitches (which
extends from the fundamental frequency to several harmonics), the
present pitch patterns include a considerable number of virtual pitches that
are below the tone's fundamental frequency. This enhances the chance of
higher-level pitches of successive tones coinciding and thus provides considerable additional evidence for Helmholtz's conclusion.
Although most pronounced for a 2:1 frequency ratio, the effect of
similarity by coincidence of pitches applies to the ratio 2:3 as well, that
is, the fifth. For that ratio, the number and weight of coinciding pitches
is less than for the octave. This accounts both for the existence of "fifth
equivalence" and for the fact that it is less pronounced than octave equivalence. Taking advantage of the described principles, one can design models for quantitative evaluation of tone affinities, another contribution to
a scientifically based music theory. The recent work of Parncutt (1988,
1989) provides solutions of that type.
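The coincidence argument can be illustrated with a deliberately crude count. The sketch below is not the Terhardt et al. (1982a) procedure: it assumes that spectral pitches sit exactly at the harmonics and virtual pitches exactly at the subharmonics of the fundamental, it ignores pitch weights entirely, and the function names and the limit of eight harmonics and subharmonics are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def pitch_pattern(f0, n_harmonics=8, n_subharmonics=8):
    """Return a set of candidate pitch frequencies for one complex tone.

    Assumption: spectral pitches at n * f0, virtual pitches at f0 / m.
    Pitch weights (prominence) are deliberately left out.
    """
    f0 = Fraction(f0)
    spectral = {f0 * n for n in range(1, n_harmonics + 1)}
    virtual = {f0 / m for m in range(2, n_subharmonics + 1)}
    return spectral | virtual


def shared_pitches(f_a, f_b):
    """Pitches that occur in the patterns of both tones (exact coincidence)."""
    return sorted(pitch_pattern(f_a) & pitch_pattern(f_b))


octave = shared_pitches(220, 440)  # frequency ratio 2:1
fifth = shared_pitches(220, 330)   # frequency ratio 2:3

print("octave coincidences:", [float(p) for p in octave])
print("fifth coincidences: ", [float(p) for p in fifth])
```

Under these assumptions the octave pair shares more pitches than the fifth pair, which is the qualitative ordering described in the text. A quantitative model would additionally need the weights of the coinciding pitches.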
While the above considerations were made on the basis of data provided
by the virtual-pitch theory, the ambiguous second-level pitch patterns of
individual musical tones have been experimentally verified. Figure 4 shows
a number of pitch histograms that were obtained by pitch matches to
harmonic complex tones with the fundamental frequencies indicated (from

Music Perception and Sensory Information


Fig. 3. Theoretical pitch patterns, on the second level of abstraction, of two harmonic
complex tones with fundamental frequencies 220 and 440 Hz. Virtual pitches: v; spectral
pitches: s. Note ambiguity of pitch and partial coincidence of pitches in the two patterns.
Calculation of pitches and pitch weights as described by Terhardt et al. (1982a).

Terhardt, Stoll, Schermbach, & Parncutt, 1986). In those experiments,
harmonic complex tones were binaurally presented through earphones,
at 60 dB SPL and with 0.2 sec duration. After each presentation, a pure tone
of arbitrary frequency was presented, and the subject adjusted its frequency such that it matched any spontaneously heard pitch of the previous
complex tone. Eight subjects took part, and each subject did six matches.
The histograms shown in Figure 4 represent the number of matches accumulated within a continuously shifted window with a width of 0.2
critical bands. The abscissa is matching frequency, that is, pitch-equivalent
frequency. As expected, the highest peaks are at the complex tone's fundamental frequency. The same type of ambiguity as theoretically predicted
can be seen. For a very low fundamental frequency (60 Hz), a pronounced
tendency was found for alternative matches one octave higher. At higher
fundamental frequencies, alternative matches to subharmonic frequencies
were found. By and large, the experimental data are well in line with
theoretical predictions (Figure 3).
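The accumulation of matches within a continuously shifted window of 0.2 critical bands can be sketched as follows. The conversion from frequency to critical-band rate uses the analytical approximation of Zwicker and Terhardt (1980), which is an assumption here, since the text does not say which critical-band scale was used. The match frequencies are invented for illustration and are not data from the study.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt, 1980, approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def sliding_histogram(matches_hz, grid_hz, width_bark=0.2):
    """Count the matches lying within +/- width_bark / 2 of each grid frequency."""
    z_matches = [hz_to_bark(f) for f in matches_hz]
    half = width_bark / 2.0
    counts = []
    for g in grid_hz:
        z_g = hz_to_bark(g)
        counts.append(sum(1 for z in z_matches if abs(z - z_g) <= half))
    return counts


# Hypothetical pure-tone matches (Hz) to a 200-Hz complex tone.
matches = [199, 200, 201, 202, 100, 101, 400, 198, 200, 66]
grid = [66, 100, 200, 400]  # window centers to evaluate (Hz)
counts = sliding_histogram(matches, grid)

for g, c in zip(grid, counts):
    print(f"{g:5d} Hz: {c} matches")
```

In this made-up example the largest count falls at the fundamental frequency, with smaller counts at the octave and at subharmonic frequencies, mirroring the kind of ambiguity the histograms are said to show.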

Ernst Terhardt

Fig. 4. Accumulated distributions of pure-tone matches to harmonic complex tones with
the fundamental frequencies indicated (from Terhardt et al., 1986). Note ambiguity of pitch.

The Tritone Paradox: Another Exercise in Ambiguity of Pitch

Although the ambiguity of pitch of a "normal" musical tone is sufficient
to explain octave equivalence (chroma) and fifth similarity, with a particular type of harmonic complex tone, ambiguity of pitch can be made
particularly pronounced. That type of complex tone includes only harmonics the frequencies of which are defined by

$$ f_n = 2^n f \qquad (1) $$

where n = 0, 1, 2, 3, ..., and f is a low base frequency, for example, in
the region below 100 Hz. The part tones either cover the entire audible
frequency range (so that the lowest and highest merely fall below the
threshold of hearing), or they are limited by a bandpass filter to a certain
frequency band (for details see Shepard, 1964; Deutsch, 1986).
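As a small illustration of how such a stimulus is composed, the sketch below lists the part-tone frequencies of one tone. It assumes the octave-spaced form f_n = 2^n f that the definitions of n and f imply. The function name and the upper frequency limit (taken here as 6 kHz) are illustrative choices, and amplitudes and spectral envelope are not modeled.

```python
def shepard_components(base_hz, f_max_hz=6000.0):
    """Part-tone frequencies f_n = 2**n * base_hz for n = 0, 1, 2, ...

    Components are listed up to f_max_hz (an assumed upper band limit).
    """
    freqs = []
    n = 0
    while base_hz * 2 ** n <= f_max_hz:
        freqs.append(base_hz * 2 ** n)
        n += 1
    return freqs


components = shepard_components(16.35)  # base frequency of pitch class C
print([round(f, 1) for f in components])
```

Every component is one octave above the previous one, so all of them belong to the same pitch class, while no single component is singled out as the fundamental.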
In that type of complex-tone stimulus, pitch information is reduced in
such a way that the auditory pitch-evaluation mechanism cannot reasonably assign to it one dominant virtual pitch. What is heard is rather
a set of virtual pitches that are in an octave relationship to each other and
do not differ much in prominence. So, while the "chroma" or "pitch class"
of such a complex tone is well defined, its height is not. In a number of
experiments, Shepard (1964), Burns (1981), and Ohgushi (1985) have
demonstrated several aspects of the "circularity" of perceived pitch that
is typical for that type of stimulus.

The pronounced ambiguity of "Shepard tones" has suggested a number
of experiments to find out how the auditory system behaves when that
type of tone is used in a simple quasimusical context (Deutsch, 1986,
1988; Deutsch, Kuyper, & Fisher, 1987; Deutsch, Moore, & Dolson,
1984). One of the effects found is called the tritone paradox. Manifestation of this paradox starts out with the notion that for two successive
Shepard tones that differ in pitch class by a tritone interval, it is impossible
to find an objective criterion for deciding if the first is higher than the
second, or vice versa.
The first remarkable finding by Deutsch et al. was that subjects to whom
successive tritone intervals of Shepard tones were presented were able, with
considerable consistency, to make a decision on whether the interval
was "ascending" or "descending" in pitch height, although not all subjects
responded the same way. The second significant finding was that the
"ascending/descending" decisions were consistently dependent on pitch
class. For example, one and the same subject would consistently hear the
interval C-F# as descending but the interval E-A# as ascending.
That dependency of judgments on pitch class was termed the tritone paradox, as at first sight there appears to be no simple psychophysical basis
for that type of response.
The tritone paradox as a phenomenon is indeed striking, and one can
hardly doubt that some kind of absolute pitch recognition must
be involved. Moreover, the phenomenon suggests that the subject's decisions must be dependent on cognitive processes. The following explanation of the tritone paradox will reveal that both these conclusions are
true. However, both "absolute pitch recognition" and "cognitive processes" turn out to play their role on a surprisingly low level of the
The key to the explanation is the simple fact that pitch-height ambiguity
of Shepard tones turns out to be distinctly limited. When pitch matches
such as described above for "normal" complex tones are carried out with
Shepard tones, it turns out that a certain absolute region of pitch-equivalent frequencies is systematically preferred, namely, the frequency
region extending roughly from 200 to 1000 Hz, with a maximum of
"preference" at about 300 Hz. This was verified both theoretically (Terhardt et al., 1982b) and experimentally (Terhardt et al., 1986). The source
of this effect is the combined influence of spectral dominance (Ritsma,
1967; Plomp, 1967) and subharmonic evaluation in the formation of virtual
pitch (Terhardt, 1972).
Figure 5 illustrates the relationship between the latter type of "absolute
pitch recognition" and the tritone paradox. With the algorithm described
by Terhardt et al. (1982a), the virtual pitches were computed for Shepard
tones. The pitch classes are denoted on the abscissa, while the theoretical
pitches pertinent to each tone are lined up vertically. The theoretical prominence (pitch weight) of the pitches is indicated by the area of black
squares. The ordinate is scaled in semitones, and the reference frequency
both for the pitch classes on the abscissa and the semitone scale at the
ordinate is 440 Hz for A4. The Shepard tones were assumed to be composed according to Eq. (1) with base frequencies f of 16.35-32.70 Hz
(depending on pitch class). The SPLs of part tones were assumed to be
50 dB, and part tones were included from f up to maximally 6 kHz.
One can see in Figure 5 that, for example, the Shepard tone with pitch
class C (first item at the abscissa; base frequency f = 16.35 Hz) produces
pitches at 24, 36, 48, etc. semitones, corresponding to the pitch heights
C2, C3, C4, etc. The area of squares indicates that the pitches C4 and C5
(262 and 525 Hz equivalent frequency) are most prominent. The second
pitch class, F#, was created with the base frequency f = 23.12 Hz, that
is, higher than that of C by a tritone ratio. One can see, however, that
this is virtually irrelevant where the region of most prominent pitches is
concerned: the pitches corresponding to the base frequency and its second
harmonic are more or less ignored both by the ear and the theory.
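The semitone bookkeeping used for the ordinate can be checked with a few lines of arithmetic. The sketch assumes equal temperament with C at 16.35 Hz as the zero point of the semitone scale, and the helper names are hypothetical.

```python
import math

C0_HZ = 16.35  # C at 0 semitones, the reference of the ordinate
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def base_frequency(pitch_class):
    """Base frequency of a pitch class, between 16.35 Hz and just under 32.70 Hz."""
    k = PITCH_CLASSES.index(pitch_class)
    return C0_HZ * 2 ** (k / 12)


def semitones_above_c0(f_hz):
    """Pitch-equivalent frequency expressed in semitones above 16.35 Hz."""
    return 12 * math.log2(f_hz / C0_HZ)


def note_name(semitones):
    """Nearest note name, e.g. 48 semitones -> 'C4'."""
    s = round(semitones)
    return f"{PITCH_CLASSES[s % 12]}{s // 12}"


print(round(base_frequency("F#"), 2))           # base frequency a tritone above C
print([note_name(s) for s in (24, 36, 48)])     # octave-spaced pitches of class C
print(note_name(semitones_above_c0(440.0)))     # the 440-Hz reference
```

With these conventions the base frequency a tritone above C comes out at 23.12 Hz, and 24, 36, and 48 semitones map to C2, C3, and C4, in agreement with the values quoted in the text.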
To explain the tritone paradox on that basis (and within the general
approach put forward in the present study), one merely needs to take into
account that the pitch patterns shown in Figure 5 visualize a second-level
cognitive representation of the corresponding Shepard tones. Evaluation
of whether the tritone intervals C-F#, C#-G, etc., are ascending or
descending is thus a challenge to the third cognitive level. Although the
criteria for that evaluation are not a priori evident, visual inspection of
the diagram (Figure 5) suggests a reasonable solution, namely, to trace the
direction in which the most prominent pitches move from the first to the
second Shepard tone. When (arbitrarily) the two most prominent pitches
are selected, one arrives at the solution indicated by arrows. The latter
indeed reflect what was termed the tritone paradox: that the direction
of perceived pitch height is dependent on pitch class.
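The tracing rule can be turned into a toy model. The sketch below does not use the Terhardt et al. (1982a) algorithm. Instead it assumes a hypothetical bell-shaped preference on a logarithmic frequency axis, centered at 300 Hz with a width of one octave, picks the two octave-spaced pitches of each tone that this preference ranks highest, and compares their mean log frequency between the first and second tone of a tritone pair. All function names and parameter values are assumptions made for illustration.

```python
import math

C0_HZ = 16.35
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def preference(f_hz, peak_hz=300.0, width_octaves=1.0):
    """Hypothetical preference weight: Gaussian on a log2 frequency axis."""
    d = math.log2(f_hz / peak_hz)
    return math.exp(-0.5 * (d / width_octaves) ** 2)


def prominent_pitches(pc_index, k=2, n_octaves=9):
    """The k octave-spaced pitches of a Shepard tone with the highest weight."""
    base = C0_HZ * 2 ** (pc_index / 12)
    candidates = [base * 2 ** n for n in range(n_octaves)]
    return sorted(sorted(candidates, key=preference, reverse=True)[:k])


def mean_log2(pitches):
    return sum(math.log2(p) for p in pitches) / len(pitches)


def tritone_direction(first_index):
    """Direction in which the prominent pitches move across a tritone step."""
    second_index = (first_index + 6) % 12
    first = prominent_pitches(first_index)
    second = prominent_pitches(second_index)
    return "ascending" if mean_log2(second) > mean_log2(first) else "descending"


directions = {pc: tritone_direction(i) for i, pc in enumerate(PITCH_CLASSES)}
for pc, d in directions.items():
    print(f"{pc:2s} -> {PITCH_CLASSES[(PITCH_CLASSES.index(pc) + 6) % 12]:2s}: {d}")
```

Even this crude model yields a direction that depends on the pitch class of the first tone: with the assumed 300-Hz peak, the pair starting on C comes out as descending and the pair starting on E as ascending. Shifting the assumed peak changes which pitch classes fall on which side.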
While the phenomenon that is decisive for this explanation, namely,
preference of a particular absolute pitch region, is included in the virtual-pitch theory, individual variations are not, of course. It is not difficult to
see that a small deviation from the average, of the position and/or shape
of the "preference characteristics" of a particular subject, can considerably
affect the "ascending/descending" judgments. As "preference characteristics" are just a parameter of auditory cognitive strategy, there may indeed
exist systematic individual differences. For example, with respect to the
aforementioned theoretical relationships between virtual-pitch evaluation
and speech perception, one may speculate that acoustic parameters of both
external voices and one's own voice, by exposure in early life, may have









Fig. 5. Theoretical pitch patterns of Shepard tones. Pitch class is indicated on the abscissa.
Pitches pertinent to each tone are vertically lined up. Ordinate: Pitch-equivalent frequency,
expressed in semitones above C (16.35 Hz). Area of squares is proportional to calculated
pitch weight, that is, represents prominence. Note that for all Shepard tones, the most
prominent pitches are in the height region of about 36 to 60 semitones, corresponding
to 131-525 Hz. On the abscissa the Shepard tones are arranged in pairs of tritone intervals.
The existence of the "preference region" causes systematic, pitch-class-dependent ascending or descending of the two most prominent pitches within each pair (arrows). This
explains the tritone paradox. Computation as described by Terhardt et al. (1982a).

a conditioning effect. In any case, in the light of the present explanation
of the tritone paradox, it is not surprising that the experimental results
found by Deutsch et al. show systematic intersubject differences.



These notions particularly emphasize the relevance of research on music
perception by animals. The remarkable performance in perception of musical stimuli of, for example, pigeons (Porter & Neuringer, 1984) and
starlings (Hulse & Page, 1988) can be seen as a natural consequence of
the above principles. Those principles provide a conceptual link between
research on animals and on humans.


After the foregoing considerations of the information processing hierarchy and some of its low-level implications for the perception of music,
it would be only consistent to proceed with a discussion of third- and higher-level processes. It is essentially on the third level that temporally coded
information comes into play, that is, the musical information included in
melodic contour, root progression, and rhythm. As much
experimental and theoretical research has already been done on those topics (e.g., Deutsch,
1982a), it is an interesting challenge to incorporate them into the present
general approach. Of course, this would by far exceed the scope of the
present study.
There exist a number of experimental investigations pertinent to the
borderline between low-level, static, and higher-level dynamic cognitive
processes, providing a kind of interface. Of particular relevance here are:
- The "continuity effect" and "pulsation threshold" (e.g., Thurlow, 1957; Warren, Obusek, & Ackroff, 1972; Houtgast,
- Relationships between perceived rhythm and spectral- and
virtual-pitch patterns such as demonstrated by van Noorden
- Virtual pitch evoked by nonsimultaneous harmonics (Hall &
Peters, 1981).
Finally, some biological aspects of the present approach deserve to be
mentioned. Although the present approach does not give an immediate
answer to the question of a possible "survival value of music" (Roederer,
1984), it provides a pertinent message. That message essentially is that
there exist a number of fundamental principles of sensory information
acquisition that are decisive for survival:
- Immediate conditioned reaction to an external challenge.
- As a provision of appropriate reaction: Immediate, autonomous processing of the information included in any sensory
stimulus (i.e., abstraction).
- As the basic element of abstraction: Discretization (contourization) by conditioned decision.
- As a tool and provision for efficient abstraction: Evolution,
acquisition, and utilization of distributed knowledge.
As outlined in the present study, it is those survival-relevant principles that
govern auditory perception of tonal music as well.



Bernstein, L. The unanswered question. Cambridge, MA: Harvard University Press, 1976.
Bregman, A. S., & Campbell, J. Primary auditory stream segregation and the perception of order in rapid sequences of tones. Journal of Experimental Psychology, 1971, 89,
Burns, E. Circularity in relative pitch judgments for inharmonic tones: The Shepard demonstration revisited, again. Perception & Psychophysics, 1981, 30, 467-472.
Carterette, E. C., Friedman, M. P., & Lovell, J. D. Mach bands in hearing. Journal of the Acoustical Society of America, 1969, 45, 986-998.
Carterette, E. C., Kohl, D. V., & Pitt, M. A. Similarities among transformed melodies: The abstraction of invariants. Music Perception, 1986, 3, 393-410.
Clarkson, M. G., & Clifton, R. K. Infant pitch perception: Evidence for responding to pitch categories and the missing fundamental. Journal of the Acoustical Society of America, 1985, 77, 1521-1528.
Demany, L., & Armand, F. The perceptual reality of tone chroma in early infancy. Journal of the Acoustical Society of America, 1984, 76, 57-66.
Deutsch, D. Music recognition. Psychological Review, 1969, 76, 300-307.
Deutsch, D. (Ed.) The psychology of music. New York: Academic Press, 1982a.
Deutsch, D. Grouping mechanisms in music. In D. Deutsch (Ed.), The psychology of music. New York: Academic Press, 1982b.
Deutsch, D. A musical paradox. Music Perception, 1986, 3, 275-280.
Deutsch, D. The semitone paradox. Music Perception, 1988, 6, 115-132.
Deutsch, D., Kuyper, W. L., & Fisher, Y. The tritone paradox: Its presence and form of distribution in a general population. Music Perception, 1987, 5, 79-92.
Deutsch, D., Moore, F. R., & Dolson, M. Pitch classes differ with respect to height. Music Perception, 1984, 2, 265-271.
Dowling, W. J. Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 1978, 85, 341-354.
Fastl, H. Auditory after-images produced by complex tones with a spectral gap. In Proceedings of the 12th International Congress on Acoustics. Bl:--5. Toronto: Beauregard Press, 1986.
Hall, J. W., & Peters, R. W. Pitch for nonsimultaneous successive harmonics in quiet and noise. Journal of the Acoustical Society of America, 1981, 69, 509-513.
Hall, J. W., & Peters, R. W. Change in the pitch of a complex tone following its association with a second complex tone. Journal of the Acoustical Society of America, 1982, 71,
Hall, J. W., & Soderquist, D. R. Transient complex and pure tone pitch changes by adaptation. Journal of the Acoustical Society of America, 1982, 71, 665-670.
