
Journal of Memory and Language 45, 526–545 (2001)

doi:10.1006/jmla.2000.2780, available online at http://www.academicpress.com on

Episodic Memory for Musical Prosody

Caroline Palmer and Melissa K. Jungers

Ohio State University

and

Peter W. Jusczyk

Johns Hopkins University

Prosodic cues are often informative in speech perception; similar acoustic features distinguish music performances. Three experiments addressed the role of prosodic cues in memory for music. In Experiment 1, musically trained and untrained listeners were familiarized with performances of short musical excerpts and later heard the familiarized performances as well as novel performances of the same music. All listeners correctly identified the familiarized performances. In Experiment 2, 10-month-old infants were familiarized with the same performances. In a head-turn preference procedure, infants oriented longer to the familiarized performances than to the novel performances. In Experiment 3, musically experienced listeners identified familiarized excerpts placed in different melodic contexts; identification was more accurate for excerpts whose prosodic cues (intensity and articulation) conflicted with the structure of the melodic context. These findings support episodic memory for music that incorporates stimulus-specific acoustic features as well as abstract structural features. © 2001 Academic Press
Key Words: episodic memory; music perception; prosody.

When listeners recall a familiar musical tune, do they remember the acoustic features of a particular performance, or only an abstract structural pattern of pitches and durations? This question is similar to the issue of talker identity in speech recognition: whether listeners' memory for and identification of lexical items incorporates specific nonlexical, prosodic features of an utterance. Prosody can refer both to an abstract level of phonological structure and to its acoustic realization in speech (Cutler, Dahan, & Donselaar, 1997; Shattuck-Hufnagel & Turk, 1996). Prosodic cues, including acoustic variations in fundamental frequency, spectral information, amplitude, and relative durations of speech, are said to characterize the uniqueness of each utterance (Speer, Crowder, & Thomas, 1993). The same acoustic features differentiate one music performance from another; variations in frequency, timbre, amplitude, and relative durations, often termed performance "expression," form the microstructure of a music performance and differentiate it from another performance of the same music (cf. Palmer, 1997). Performance expression can aid listeners in differentiating among structural ambiguities (such as phrasal and metrical boundaries) that arise in music (Palmer, 1996; Sloboda, 1985), just as prosodic features of speech can clarify the intended meaning of a syntactically ambiguous sentence (Lehiste, Olive, & Streeter, 1976; Price, Ostendorf, Shattuck-Hufnagel, & Fong, 1991; Scott, 1982). Thus, prosodic features may play similar roles in aiding segmentation processes in music and speech. We investigate here whether musical "prosody," or expression that characterizes particular music performances, plays a similar function in music perception and recognition as in speech.

This research was supported in part by NIMH grant MH45764 to the first author, and by NICHD grant HD-15795 and a Research Scientist Award from NIMH (MH-01490) to the third author. We gratefully acknowledge the assistance of Grant Baldwin, Zeb Highben, Ann Marie Jusczyk, and Kristina Krasnov in conducting the experiments, and Rosalee Meyer, Peter Pfordresher, and Diana Raffman for comments on an earlier draft. Address correspondence and reprint requests to Caroline Palmer, Psychology Dept., Ohio State University, 1885 Neil Ave., Columbus, OH 43210. E-mail: palmer.1@osu.edu.

526 0749-596X/01 $35.00


Copyright © 2001 by Academic Press
All rights of reproduction in any form reserved.
Although the structures of music and speech differ, the perceptual issues that researchers address in both domains are similar: specifically, to account for the perceptual constancy that listeners experience in the presence of a range of physical change. For example, as early as infancy, listeners can recognize a melody as the same when it is transposed to a different set of frequencies, performed at a different rate or tempo, or performed in a different instrumental timbre or "voice" (Chang & Trehub, 1977; Trehub & Thorpe, 1989). Even music whose pitch or duration content has changed, as in variations on musical themes, permits perceptual constancy (Dowling & Harwood, 1986; Large, Palmer, & Pollack, 1995; Serafine, Glassman & Overbeeke, 1989). Despite the many types of stimulus variability that music displays across performers, rates, timbres, and contexts, music perception is robust and adaptable over a range of physical changes in the acoustic signal.

Consistent with perceptual constancy, early studies of music perception took a view similar to that of speaker normalization: stimulus variability (including differences across music performances) was treated as noise, to be filtered out in an abstract memory representation (cf. Large, Palmer, & Pollack, 1995). Normalization refers to the process of converting or transforming physically different tokens into some common representational format that is the ultimate goal of perceptual analysis, and these standardized representations are stored in memory (Pisoni, 1997). Normalization views led to the search for an idealized symbolic representation for spoken language in which stimulus variability is treated as noise; the result of normalization processes was a reduction in stimulus variability and often a loss of prosodic information. Linguistic properties of speech that carry the intended message were considered separate from indexical or extralinguistic (nonstructural) features such as the particular talker's voice, gender, or dialect (Gerstman, 1968; Shankweiler, Strange, & Verbrugge, 1977; Studdert-Kennedy, 1974). Just as this theoretical perspective guided speech researchers' choices of stimuli that reduced or eliminated variability on unwanted dimensions, music perception researchers concentrated on normalized musical standards that reduced stimulus variability in perceptual experiments (cf. Dowling & Harwood, 1986).

More recently, prosodic cues have been considered a source of information in speech that aids lexical recognition, rather than as stimulus variation to be filtered out. In this view, extralinguistic properties, such as gender, dialect, and speaking rate, can help listeners identify linguistic categories (Nygaard, Sommers, & Pisoni, 1994). Talker-specific characteristics of speech can provide information about the talker's gender, identity, or age (Fellowes, Remez, & Rubin, 1997; Remez, Fellowes, & Rubin, 1997). Talker and rate information also influence listeners' memory for speech; previously heard words are more accurately recognized when the production rate or talker's voice is retained (Bradlow, Nygaard, & Pisoni, 1999; Palmeri et al., 1993). In addition, previously presented sentences are recognized more accurately when the same prosodic cues are retained, and prosodic cues aid listeners' memory for syntactically ambiguous sentences (Speer, Crowder, & Thomas, 1993). These findings suggest that highly detailed nonlexical features of the acoustic signal are encoded in memory representations along with lexical content (Goldinger, 1997; Jusczyk, 1997; Pisoni, 1997).

Patterns of stimulus variability in music performances suggest that performance expression is systematic and intentional, and likely to be perceptually informative. Musicians can replicate expressive patterns of timing and dynamics (correlated with intensity) across their performances with high precision (Henderson, 1936; Seashore, 1938), and attempts to play mechanically or without expression significantly dampen these expressive features (Bengtsson & Gabrielsson, 1983; Palmer, 1989). Performers' use of prosodic cues to mark musical units such as metrical and phrase structure increases with short-term practice and overall experience (Palmer, 1996b; Sloboda, 1983). Expression in music performance can be systematically affected by both structural dimensions (harmony, melody, rhythm, meter, etc.) and nonstructural dimensions (affect, tempo, other interpretive decisions), and it is often difficult to separate the two. Attempts to formalize a rule-based syntax of the relationship between intended musical structure and the resulting performance expression, called the "structure–expression mapping," have met with some success (Clynes, 1986; Sundberg, Askenfelt, & Fryden, 1983; Todd, 1985), as measured by listeners' judgments of computationally generated simulations of performances (Clynes, 1995; Thompson, 1989) and comparisons with human performances (Repp, 1989). However, the results suggest that both piece- and performance-specific factors influence performance expression as much as the structural features modeled by the simulations. The experiments reported here test whether prosodic features of music performance encode contextual information that listeners can use in memory tasks.

Fewer studies have tested which types of stimulus variability in music performance are perceptually informative. Musically trained listeners are able to identify performers' intended structure (such as metrical and melodic structure) on the basis of intensity, articulation (interstimulus intervals between musical tones), and timing cues (Nakamura, 1987; Palmer, 1996b; Sloboda, 1985). For example, musically trained listeners can correctly identify the intended phrase structure in piano performances more often when acoustically varying features of interonset intervals and intensities are present than when they are normalized in the performances (Palmer, 1996a). Listeners are also accurate at identifying performances of the same music intended as musically expressive (normal), mechanical (deadpan), or exaggerated (Kendall & Carterette, 1990). In addition, listeners can correctly identify a performer's intended emotion on the basis of prosodic features; for example, a slow performance tempo often signals a sad emotional state (Gabrielsson & Juslin, 1996). The relative importance of different prosodic cues is unknown; only recently have systematic analyses of relations among sources of performance variability been undertaken (Juslin, 1997).

These perceptual studies rely heavily on musically trained listeners, perhaps reflecting an implicit assumption that only through extensive training can one learn the relationship between intended structure and acoustic variation in performance. This contrasts with a common view in speech that prosodic features provide a low-level cue to aid segmentation for even inexperienced listeners (Cooper & Sorensen, 1977; Lehiste, 1970; Streeter, 1978). Research in infant perception also suggests that musical expertise is not necessary for perception of prosodic features. Infants show preferences for infant-directed songs such as lullabies, which show heightened cues such as higher pitch, slower tempo, distinctive timbre, and changes in fundamental frequency (Trainor, Clark, Huntley, & Adams, 1997; Trehub & Trainor, 1998), over non-infant-directed songs (Trainor, 1996). Four-month-old infants prefer listening to music with pauses inserted at phrase boundaries over the same music with pauses inserted in the middle of phrases (Krumhansl & Jusczyk, 1990). Further experiments confirmed that drops in pitch contour and duration lengthening are acoustic variables that can mark unit or phrase boundaries for infants (Jusczyk & Krumhansl, 1993; Trainor & Adams, 2000), the same prosodic markers of clausal units that infants respond to in speech (Hirsh-Pasek et al., 1987). One of the goals of this paper is to address whether musical training (in adults) or experience (in infants) is a prerequisite for prosodic features to be incorporated in memory for music.

Although there is little study of infants' memory for music, some studies suggest that prosodic features can be retained in infants' memory for speech. Two-month-old infants can use the prosody of a sentence to organize and remember words (Mandel, Jusczyk, & Kemler-Nelson, 1994). Six-month-old infants are better able to locate prosodically well-formed speech units than ill-formed units in speech passages, suggesting their ability to use prosodic cues to parse speech is operative at an early age (Nazzi, Kemler Nelson, Jusczyk, & Jusczyk, 2000; Soderstrom, Jusczyk, & Kemler Nelson, 2000). Moreover, when they begin to segment words, infants appear to rely heavily on prosodic cues to word boundaries (Jusczyk, Houston, & Newsome, 1999). Infants also appear to be storing information about what they hear; nine-month-old infants can retain, over a 2-week period, words that were presented in sentences (Jusczyk & Hohne, 1997). Thus, long-term memory for word units is operative by this age. One reason that a perceptual system might preserve fine details about speech would be to allow early learners to accurately imitate and produce patterns heard in their language environment (Studdert-Kennedy, 1983). We test here the hypothesis that listeners as early as infancy (10 months old) have the ability to remember and later recognize music performances on the basis of particular prosodic features.

It is possible that prosodic cues aid music segmentation but are not retained in memory once segmentation processes are complete. Raffman (1993) proposes an explanation for why prosodic cues in music can be discriminated but not recognized or remembered later. In her view, expressive nuances are features of the surface of a performance; they can be influential in the formation of a structural representation, but they are not retained as part of it. Based on findings such as Siegel and Siegel's (1977), that musically trained listeners cannot accurately identify out-of-tune pitch intervals (intervals that fall between well-learned pitch categories in equal temperament scales), Raffman proposes that only information at a categorical level (such as notated pitches and durations) can be encoded in a lasting memory representation. She argues that expressive nuances in music are subcategorical: below the level at which listeners possess schemas. Because a primary function of schemas is to reduce the information load imposed by the expressive nuances, there would be no advantage (and likely a memory capacity disadvantage) to acquiring schemas whose resolutions were as fine as that of discrimination of prosodic cues. Although listeners can discriminate among musical prosodic features over short time periods, they cannot retain a lasting memory for such features over longer time periods in this view.

We address listeners' retention of prosodic features in music performance in three experiments. The first experiment investigates whether musically trained and untrained adult listeners can learn and later identify short music performances on the basis of their subcategorical acoustic cues. The second experiment addresses the same question with 10-month-old infants; using the Headturn Preference Procedure (Kemler Nelson et al., 1995), we consider whether musical acculturation is required for listeners to remember particular prosodic features of music. The final experiment examines whether prosodic cues are useful during segmentation, specifically, in the identification of musical units embedded in larger musical contexts. The contextual relationship between the prosodic cues of the embedded musical unit and the larger melodic context is altered, by presenting the same unit embedded in different melodies. Together, these experiments provide evidence as to whether prosodic features of music performances can be stored in memory and later identified by a variety of listeners.

EXPERIMENT 1: MUSICIANS' AND NONMUSICIANS' MEMORY FOR PROSODIC CUES

We first examine whether adult listeners can identify specific performances of a short musical sequence they heard earlier on the basis of prosodic cues. The experiment contains a familiarization stage and a subsequent test stage, to parallel as closely as possible the infant paradigm of Experiment 2. Listeners were familiarized with performances of short melodic sequences, and later were presented with both familiar sequences and novel performances of the same melodies: sequences that contained the same melodic and rhythmic structure (same categorical pitch/duration pattern as notated in a musical score) but different prosodic cues. The particular prosodic cues in piano performances included intensity, interonset interval, and articulation cues. If listeners can identify the sequences heard at familiarization, that would indicate that listeners are using cues other than the melodic (pitch) or rhythmic (duration) structure of the performances: specifically, that listeners are capable of using the subcategorical acoustic cues to later identify a particular sequence of musical events.
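The distinction between categorical structure and subcategorical prosodic cues can be made concrete with a small sketch. This is our own illustration, not the authors' materials: the event fields, helper names, and all numeric values below are invented. Each note event carries categorical fields (pitch, notated duration) and prosodic fields (MIDI velocity, articulation), and two performances count as the "same melody" when only the categorical fields match.

```python
# Illustrative sketch (not from the paper): two performances of the same
# notated melody share categorical fields (pitch, duration in beats) but
# differ in prosodic fields (MIDI velocity, articulation in ms).

def note(pitch, duration, velocity, articulation):
    return {"pitch": pitch, "duration": duration,
            "velocity": velocity, "articulation": articulation}

# Hypothetical five-note sequence as performed in two melodic contexts.
perf_a = [note(60, 1, 72, 40), note(62, 1, 64, -15), note(64, 1, 80, 25),
          note(65, 1, 60, -30), note(67, 2, 76, 55)]
perf_b = [note(60, 1, 66, -20), note(62, 1, 70, 35), note(64, 1, 62, -10),
          note(65, 1, 78, 50), note(67, 2, 68, 15)]

def same_category(p1, p2):
    """True if the notated pitch/duration pattern matches."""
    return len(p1) == len(p2) and all(
        a["pitch"] == b["pitch"] and a["duration"] == b["duration"]
        for a, b in zip(p1, p2))

def same_prosody(p1, p2):
    """True if the intensity and articulation patterns also match."""
    return len(p1) == len(p2) and all(
        a["velocity"] == b["velocity"] and
        a["articulation"] == b["articulation"]
        for a, b in zip(p1, p2))
```

In these terms, a "novel performance" in the test stage is one for which `same_category` holds but `same_prosody` does not.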
We also examine whether the ability to identify subcategorical acoustic cues requires musical training. Musically trained listeners are compared with listeners who had no musical training. If both groups can identify performances on the basis of prosodic cues, this would suggest a stronger analogy with the speech literature, in that a wide range of listeners of varying musical experience show sensitivity to the same acoustic features of auditory events. If only musically trained listeners are able to remember and later identify music on the basis of these cues, it would suggest that sensitivity to acoustic cues is acquired with expertise. This finding would limit an analogy to how the same acoustic cues are perceived in speech, because both nonmusician and musician (adult) listeners display advanced speech abilities.

Method

Participants. Twenty-four musically trained listeners and 16 listeners without musical training from the Ohio State University community participated in this study. Musician listeners had between 4 and 13 years of private instruction on a musical instrument (mean = 8.1 years). Nonmusician listeners had less than one year of musical training (mean = 1 month). None of the listeners reported any hearing problems. Listeners received either course credit in an introductory psychology course or a nominal sum for their participation.

Apparatus. The stimuli were created on a computer-monitored Yamaha Disklavier MX100 acoustic upright piano. Optical sensors detected keypress velocities without affecting the touch or sound of the acoustic piano. The timing resolution was 2 ms for note events, with precision (measured by the standard deviation of interonset intervals during recording) within 0.8% for interonset intervals in the range of the performances. The pitch, timing, and hammer velocity values (correlated with intensity) for each note event were recorded on the computer. The musical stimuli used in the experiment were first recorded on the piano and later played back on the same piano and recorded to compact disc at a 44.1-kHz sampling rate for use in the perceptual experiments. All stimuli were sounded over JBL Studio Monitor 4410 speakers placed around the piano in the same room and at the same decibel level (within 3 dB SPL) at which they were originally recorded on the piano. The sound level averaged 75 dB SPL, about 25 dB above ambient noise levels.

Materials. Two performances each of two different five-note melodic sequences, referred to as "short sequences," served as stimulus materials. Each short sequence appeared in musical notation, embedded in two different melodic contexts: one was a melody in 3/4 time signature (a ternary meter, with metrical accents on every third beat) and one was in 4/4 time signature (a binary meter, with metrical accents on every second and fourth beats). The melodic contexts were designed to elicit changes in the relative importance of note events in the short sequences, including (but not limited to) the location of strong and weak metrical accents. One of the five-note sequences and its two melodic contexts are shown in Fig. 1; only the short sequences, marked A′ and B′ in Fig. 1, served as stimuli in this experiment. The melodic contexts, composed by a musician for the experiment, had the following properties: each melodic context for a given short sequence was in the same key (C-Major or G-Major) and in the same frequency range (female vocal range), and each was 13–15 beats long (each melody ended on a strong metrical beat). The first tone of each short sequence began in the middle of a melody on a strong metrical beat (indicated by the metrical barline or vertical line preceding each bracketed excerpt in Fig. 1), and the event immediately preceding the first tone of the short sequence was the same in notated pitch and duration across the two melodic contexts. Except for the short sequences embedded within them, the melodic contexts were composed to be as different in pitch contour and duration pattern as possible, again, to elicit changes in the relative importance of note events in the short sequences.

FIG. 1. Example of a short musical sequence, indicated in brackets, and its two melodic contexts, A and B, used in the experiments. Intensity (hammer velocity units) and articulation values (in ms) for the short sequence appear below the musical notation.

Piano performances of the entire melodic contexts were collected from an experienced pianist, who had 30 years of performing experience and 18 years of private instruction. The pianist was instructed to perform the melodies in an exaggerated fashion, "as if playing to a child," to mimic instructions to speakers in infant studies of speech perception (Nazzi et al., 2000). The performances were recorded to a metronome set to 152 beats per minute (quarter-note = beat), or 394 ms/quarter-note, to ensure similar tempi across performances. When asked afterward, the pianist reported not being aware that an identical short sequence was embedded

in each melody pair. The MIDI event information (pitch, interonset interval, and hammer velocity values (correlated with intensity) as recorded by the optical sensors in the piano) denoting the short sequences was excised from the melodic contexts, beginning at the onset time of the first tone, up to the offset time of the last tone.

Analyses were conducted on the interonset interval, intensity, and articulation values in the performances. Analyses of each event's interonset (IOI) values indicated that the mean quarter-note interval was 394 ms (equivalent to the indicated metronomic value of 394 ms), and there were no significant differences in mean quarter-note IOIs by stimulus melody (4), meter (3/4 or 4/4), or beats within each measure (1–3, the maximum number of beats that can be compared across all melodies). Thus, interonset values did not differ significantly from the durational categories notated in the score, and there were no differences in tempo (largely because of the presence of the metronome). Analyses of the hammer velocity (intensity) values, in arbitrary MIDI units which range from 0 (silent) to 127 (loudest value), indicated that one of the four performances was louder than the others, F(3,36) = 3.29, p < .05. To avoid stimulus differences in absolute intensity level, the MIDI hammer velocity values of this melody were lowered by a constant value so that the mean of the revised values was equal to the mean value of the remaining three short sequences, but the melody retained its note-by-note intensity differences. Analyses of the adjusted intensities indicated that intensities were higher on metrically strong beats; as shown in Fig. 1, performances of melodies in 3/4 meter had higher intensities only on beat 1 (strong metrical accent), whereas performances of melodies in 4/4 meter had higher intensities on beats 1 and 3 (strong metrical accents) (Dunn-Bonferroni, p < .05). Event intensities across the short sequences were correlated with music-theoretic predictions of metrical accent strength (Lerdahl & Jackendoff, 1983), based on the number of accents in a metrical grid predicted by the time signature that aligned with each note event (see Palmer & Krumhansl, 1987, for further details). These correlations were significant, r = .72, p < .01, confirming that intensity cues in these performances increased for events at strong metrical accents.

Articulation values were measured by the offset time of the current event (key release time) minus the onset time of the next event (keypress) in ms, as in previous studies of piano performance (Palmer, 1989, 1996b; Sloboda, 1983); a positive value denotes a smooth or "legato" style, whereas a negative value denotes a gap or a "staccato" style. Articulation values differed across performances, F(3,36) = 33.3, p < .01, and beats, F(2,36) = 10.8, p < .01, reflecting primarily a melody by beat interaction, F(6,36) = 6.8, p < .01; as shown in Fig. 1, legato/staccato patterns were unique within each performance. There was no correlation of articulation values with music-theoretic predictions of metrical accent strength, and articulation values did not correlate with intensity values in the short sequences or in the entire melodies (p > .10). Thus, the short sequences associated with each melody were identical in terms of the pitch and duration categories in the musical score, but each performance had different patterns of intensity and articulation cues, which reflected in part the metrical structure and the specific melodic context, respectively.

Design and Procedure. In the familiarization stage of the experiment, listeners heard two of the four short sequences, the ternary-meter (3/4) performance of one sequence and the binary-meter (4/4) performance of the other sequence, to ensure that meter was not confounded with familiarization. Which short sequences were present at familiarization and the order in which sequences were presented were counterbalanced across subjects. In the test stage of the experiment, listeners heard all four short sequences. Each block of trials contained the four short sequences presented in random order, and there were 9 blocks of test trials, yielding a total of 36 trials. The independent variable of whether each sequence in the test trials was presented at familiarization was a within-subjects variable.

At the beginning of the experiment, listeners were instructed that they would hear different performances of the same melody during the experiment and they would be tested to see if they could recognize different performances.

the familiarization stage of the experiment, one tification (“yes”) responses across trial blocks
short sequence was sounded 12 times, separated were computed for each listener. A two-way
by 750 ms of silence, followed by the second analysis of variance (ANOVA) on listeners’ per-
short sequence sounded 12 times. Listeners centages of identification responses by familiar-
were told to listen carefully to the familiariza- ization condition (was test sequence familiar-
tion trials because they would be asked to recog- ized/not familiarized) and musical training
nize those particular performances later. They (musician/nonmusician) yielded a significant ef-
were also told they would hear a number of per- fect of familiarization, F (1,38) ⫽ 78.0, p ⬍ .01.
formances afterward, some of which were from There were no effects of training or interactions
the familiarization stage. with familiarization. As shown in Fig. 2, both
During the test stage of the experiment, a groups of listeners were able to identify the par-
short sequence was sounded twice in a row (with ticular performances with which they had been
1 s of silence between the two hearings) on each familiarized, when presented among other per-
trial. Listeners were instructed to respond “yes” formances of the same pitch and rhythmic (du-
or “no” as to whether the test trial was identical rational pattern) content; mean responses for
to one of the sequences they had heard during each condition shown in Fig. 2 differed signifi-
the familiarization stage. Listeners were told cantly from 50%, the chance estimate (p ⬍ .01).
that some of the test sequences would have the The d-prime values were computed to verify
same notes as the familiarization sequences, but that musicians did not have greater sensitivity
they would not be identical because they were than nonmusicians; musicians’ (d⬘ ⫽ 1.73) and
from a different performance. Listeners were nonmusicians’ mean d-prime values (d⬘ ⫽ 1.58)
also asked to mark their degree of confidence in were both greater than zero (p ⬍ .01) and did
their answer on a three-point scale (1 ⫽ not con- not differ significantly from each other (non-
fident, 3 ⫽ confident) for the test trials. There parametric Mann-Whitney U ⫽ 218, p ⬎ .10).
were 5 s of silence between test trials. The confidence scale ratings did not differ by fa-
Nonmusician listeners were given additional miliarization or musical training.
instructions before the experiment, to clarify These results indicate that listeners are able to
the nature of the task. These listeners were told encode prosodic cues associated with a particu-
they would hear two different performances of lar music performance and use them later to
the same musical piece, which the experi- identify familiar performances from a set of per-
menter then sounded for them. Then they were formances of the same music. Acoustic features
told they would hear two identical perform- of intensity and articulation were likely to have
ances of the same musical piece, which the ex- influenced listeners’ identification judgments;
perimenter then played. The reason for the ad- interonset intervals did not differ from the cate-
ditional instructions was to help listeners gorical notated durations in these performances,
distinguish the goal of recognizing the same due to the presence of a metronome. In addition,
performance (same prosodic cues), the aim of musical training was not necessary to perform
this experiment, from the separate goal of rec- this task; listeners with no musical training per-
ognizing the same melody (same pitches and formed as well as listeners with musical train-
durations but different prosodic cues). The mu- ing. Thus, prosodic features in music perform-
sical examples used in the additional instruc- ance were useful for a wide range of listeners
tions were from a similarly constructed musical and could be encoded and later used to identify
sequence that was not included in any of the particular musical sequences.
experiments.
EXPERIMENT 2: ROLE OF MUSICAL
Results and Discussion ACCULTURATION
The sequences that were heard at familiariza- Musical training does not appear to be neces-
tion, the random order of test trials, and trial sary for identification of musical sequences
block were not significant factors in listeners’ based on their prosodic cues. However, identifi-
identification responses, so percentages of iden- cation may require musical acculturation:
534 PALMER, JUNGERS, AND JUSCZYK

FIG. 2. Experiment 1: mean percentage of yes responses for musically trained listeners (black bars) and untrained listeners (white bars) by familiarization condition (was test sequence familiarized/not familiarized), with standard error bars.

knowledge of performance styles and genres that listeners acquire passively (without explicit training) over years of exposure to musical forms. If, however, the capacity to store and remember acoustic features specific to music performances is available as early as the first year of life, then even infants, who do not have years of passive exposure to music, may be capable of identifying performances they have heard before. Indeed, evidence from speech perception studies suggests that infants store indexical properties (such as talker voice information) along with the phonetic properties of utterances. Houston and Jusczyk (2000) found that 7.5-month-olds are affected by talker voice characteristics in recognizing words in fluent speech, suggesting that such information is included in their representations of the sound patterns of words. Moreover, Jusczyk, Hohne, Jusczyk, and Redanz (1993) found evidence that infants remember the voice characteristics of an unfamiliar talker reading stories for as long as 2 weeks. Thus, infants appeared to encode not only information about what was said but also information about how it was said. Might they also show the same abilities for music performances?

We use the Headturn Preference Procedure with 10-month-old infants to test their ability to identify performances heard earlier on the basis of prosodic features. The procedure takes advantage of the finding that, when varied stimuli are used, preferences among infants over 4.5 months of age tend to be correlated with familiarity. Specifically, the amount of time spent listening to stimulus items, a measure of preference, often correlates with amount of familiarity with those items (Jusczyk, 1998; Kemler Nelson et al., 1995). In this experiment, infant listeners were familiarized with the same short musical sequences as used in Experiment 1, and the infants were presented with all performances (familiar and novel) at test. The amount of time spent looking (as measured by orienting toward the loudspeaker over which the test sequence was sounded) provides a measure of familiarity with the stimulus. If prosodic cues in music performance are remembered and later used for identification as early as 10 months of age, then infant listeners should exhibit larger orientation times for familiar sequences than for novel sequences.

Method

Participants. Sixteen 10-month-old infants (mean age = 44 weeks, 4 days; range, 43 weeks, 3 days to 46 weeks, 2 days) were recruited from the Baltimore, MD, community to participate in the study. Two additional infants were tested but not included because of restlessness (1) and parental interference (1).

Materials. The same musical stimuli were used in this study as in Experiment 1.

Apparatus. A Macintosh Centris 650 computer controlled the presentation of the stimuli and recorded the observer's coding of the infants' responses. The stimuli were stored in digitized form (at a 20-kHz sampling rate) on the computer. A 16-bit D/A converter was used to recreate the audio signals, which were sounded at an amplitude of 72 ± 2 dB SPL, approximately 20 dB above the ambient noise level.

The stimuli were played out through antialiasing filters and a Kenwood audio amplifier (KA 5700) over Cambridge Soundworks loudspeakers mounted behind the side walls of the testing booth. A red light was mounted in view on each of the side walls, and a green light was mounted on the center panel of the testing booth, all approximately at the seated infant's eye level. Directly below the center light, a 5-cm hole accommodated the lens of a video camera used to record each test session. A white curtain suspended around the top of the booth shielded the infant's view of the rest of the room. A computer terminal and response box were located behind the center panel, outside the infant's view. The response box, connected to the computer, was equipped with buttons that were used to start and stop the flashing center and side lights and to record the direction and duration of headturns. Information about the direction and duration of headturns and the total trial duration was stored in a data file on the computer.

Design and procedure. The experimental design was as similar as possible to that in Experiment 1. Two of the four short sequences were presented to infants during the familiarization stage, one from each meter; which two of the four test sequences were presented at familiarization was counterbalanced across infants. During the test stage, infants heard all four test sequences; whether each test sequence had been presented at familiarization was a within-subjects variable. The dependent variable was the amount of time spent listening (with head oriented toward the sound source) to each test sequence, summed across test trials.

A modified version of the Headturn Preference Procedure (Kemler Nelson et al., 1995) was used to measure orientation times for test sequences. The infant sat in the middle of the testing room on the lap of the caretaker; an observer hidden behind the center panel watched the infant through a peephole and recorded the direction and duration of the infant's headturns using a response box. The observer and the caregiver wore earplugs and listened to masking music over tight-fitting Peltor Aviation-7050 headphones. Each trial proceeded as follows: the center green light flashed, and when the infant oriented toward it, the observer pushed a button to extinguish the center light and to start flashing one of the red side lights above a loudspeaker (the program controlled randomization of the side lights). When the infant made a headturn of at least 30° in the direction of the speaker, the observer pushed another button to initiate the sounding of the sequence, which continued until it ended or until the infant looked away for two consecutive seconds. The observer pressed a button whenever the infant looked away or reoriented toward the speaker. The computer program kept track, based on the button presses, of actual looking times to the side lights; total looking time for a given trial excluded any time the infant looked away. If the infant looked away for more than 2 s, the side light was extinguished, the trial ended, and the next trial began. The flashing red light remained on for the duration of the trial.

During the familiarization stage, each infant heard two of the four sequences. Each trial consisted of one of the sequences repeated eight times, yielding a total trial length of approximately 21 s. Each infant had to achieve 30 s of accumulated listening time to each of the familiarization sequences in order to progress to the
test stage, and trials containing the two familiarization stimuli alternated until the infant had achieved this threshold. If the infant met the 30-s criterion for one sequence before the other, the two trial types (containing the different sequences) continued to alternate until the criterion was met for both. The presentation of each stimulus was assigned randomly to a particular loudspeaker on each trial.

During the test stage, each infant heard all four sequences on different trials. There were 3 blocks of 4 trials each, yielding a total of 12 trials. Each of the four sequences was sounded within each block, and trials within a block were randomly ordered. Each trial was constructed in the same way as in the familiarization stage, and trial onset and offset times were determined exactly as in the familiarization phase. The dependent variable was each infant's total orientation time for each sequence summed across the test trials, which represented the time spent looking at the light during the sounding of each sequence, with time looking away excluded.

Results and Discussion

A one-way ANOVA on infants' mean listening times by familiarization condition (was test sequence familiarized/not familiarized) yielded a significant effect of familiarization, F(1,15) = 7.77, p < .05. As shown in Fig. 3, infants listened longer to the particular performances with which they had been familiarized, when presented among other performances with identical pitch/duration content. Thirteen of the 16 infants had longer mean listening times for the test sequences heard at familiarization than for the novel test sequences. Neither the sequences included at familiarization nor the random order in which subjects heard trials was a significant factor.

These results indicate that further musical acculturation beyond 10 months of age is not required for listeners to use prosodic features in identifying particular music performances. This result is consistent with exemplar views of language acquisition that consider prosodic features as important not only to the identification of spoken words but also to the formation of memories that distinguish one unit or word from another (Houston & Jusczyk, 2000; Jusczyk, 1993, 1997). As in speech, musical prosody carries information that may help to identify a particular performance.

What purpose could episodic memory for music performance serve? It seems likely that prosodic features aid segmentation of musical sequences into meaningful units. To test whether prosodic features serve to mark units in a continuous stream of music, the short melodies were embedded in the next experiment in computer-generated musical contexts that either matched or mismatched the features of the short sequences. When the short sequences are embedded in a larger melodic context, their prosodic features may be more salient when their structural correlates do not match the structure of the

FIG. 3. Experiment 2: infants' mean orientation times by familiarization condition (was test sequence familiarized/not familiarized), with standard error bars.
context. We test this possibility in the following experiment, as an example of the role that prosodic features play in music perception.

EXPERIMENT 3: ROLE OF MELODIC CONTEXT

The first two experiments indicate that listeners of a wide range of ages and musical experience were able to encode prosodic features of short musical sequences (shorter than most melodies), which were chosen to allow comparisons among infants and adults. Are these prosodic cues useful in larger musical contexts? Findings in speech perception suggest that listeners as young as 6 months old can use prosodic information to encode clausal units and later recognize those units in new contexts (Nazzi et al., 2000; Soderstrom et al., 2000). We test a similar segmentation issue in this study: whether listeners can identify the short sequences embedded in larger melodic contexts, on the basis of their prosodic features.

The perceptual salience of musical events depends largely on the regularities of the context in which they appear. Western tonal music contains a complex system of structural regularities; both music-theoretic and psychological approaches suggest that listeners generate expectations for future musical events based on those regularities, and that these expectations influence the perception of musical events (Jones & Boltz, 1989; Meyer, 1956; Narmour, 1990; Palmer & Krumhansl, 1990). One of the most regular dimensions of Western tonal music is its meter, an alternating pattern of strong and weak beats; metrical regularity influences listeners' perception of and memory for music (Palmer & Krumhansl, 1990; Yee, Holleran, & Jones, 1994). We take advantage of this fact in Experiment 3 by placing each short sequence in two different metrically regular contexts, one that matches and one that mismatches the prosodic cues in the short sequence. We hypothesize that performances whose prosodic features do not match the metrically regular context may be more salient perceptually (when expectations are inconsistent) than those performances whose features do match the metrical context (when expectations are consistent). For example, the short sequences that were originally performed in a 3/4 metrical context may be more salient when they are embedded in a 4/4 context than when they are embedded in a 3/4 context. In addition, based on the prior two experiments, we hypothesize that listeners will identify all performances heard at familiarization (whether consistent or inconsistent with the melodic context) more often than performances not heard at familiarization.

The experimental task is similar to that of the previous experiments. Adult listeners are familiarized with particular performances of the same short sequences used in the first two experiments; at test, they are presented with the sequences embedded in larger melodies and asked to identify those performances they heard at familiarization. The melodic contexts are constructed by computer and contain constant (unvarying) prosodic cues: all intensities (hammer velocities), interonset intervals, and articulation values are the same across all events in the melodic context. Only the short sequences contain prosodic cues. Because of the increased difficulty of the task (listeners must find the short sequence in the melodic context and decide whether its prosodic features match those heard at familiarization), only musically trained listeners were recruited for the experiment.

We also address the relationship between the prosodic cues and their structural correlates. It is possible, for example, that listeners remember the metrical structure or other structural correlates of meter and do not retain the subcategorical acoustic cues, as Raffman (1993) suggests. In order to claim that listeners encode performance-specific details of the excerpts, we address the possibility that listeners could recognize the familiarized short sequences based solely on an abstract representation of metrical structure. For this purpose, we include a control condition in which a different group of musically trained listeners attempted to identify the meter of each short sequence (heard out of context), to test whether an abstract representation of meter could be formed from the short sequences alone.

Method

Participants. Sixty-four musically trained listeners from the Columbus, Ohio, community
participated in this study. Listeners had between 4 and 15 years of musical training, with a mean of 7.6 years of private lessons on a musical instrument. None of the listeners reported any hearing problems. Listeners either received course credit or were paid a nominal fee for their participation. None of the listeners had participated in the previous experiments.

Materials. The same sequences from Experiment 1 were used in this experiment, as well as the musical melodies from which they were originally excised. Two different melodic contexts were created for each short sequence: a context that matched the prosodic cues of the sequence ("matched context") and a context that did not match the prosodic cues of the sequence ("mismatched context"). The matched context is one in which both the short sequence and the melodic context share the same meter (binary or ternary); the mismatched context is one in which the short sequence and melodic context differ in meter.

The melodic contexts were generated from computerized (mechanical) versions of both melodies for each short sequence, in which no prosodic cues were present (all event interonset intervals and intensities were equivalent across the melody). The intensities and interonset intervals were set equal to the mean values in the original performed melodies of Experiment 1, from which the short sequences were taken. When the short sequence was placed in the matched context (the melody from which it originated), the event onsets and offsets at the boundaries were the same as in the original performance. When the short sequence was placed in the mismatched context (the alternate melody), the event onsets and offsets at the short sequence boundaries were aligned with the times at which the original short sequence (as performed in its original melodic context) began and ended. That is, the bracketed excerpts A′ and B′ in Fig. 1 were exchanged between the melodic contexts A and B. Thus, performances A-A′-A and B-B′-B reflect matched contexts, and A-B′-A and B-A′-B reflect mismatched contexts.

Metrical control condition. To ascertain whether listeners could form an abstract representation of the metrical structure from the short sequences alone, a control experiment was first conducted in which 14 different listeners (not included in the experiment) were asked to identify the meter of each excerpt. The musically trained listeners (4–14 years of training, mean of 6.8 years) were first given examples of melodies in a binary meter (defined as a pattern of alternating accents with a strong accent on every second beat, as in 2/4 or 4/4 meter) and a ternary meter (defined as a pattern of alternating accents with a strong accent on every third beat, as in 3/4 or 6/8 meter). These melodies were not included in the control experiment. Then listeners heard on each test trial two repetitions of a short sequence from Experiment 1, and they indicated on paper whether the short sequence would fit or sound best in a binary or a ternary meter. The control experiment included five repetitions of each of the four short sequences from Experiment 1 (two from a binary meter and two from a ternary meter), totaling 20 trials. Listeners' percentage of correct responses across trials indicated no significant difference from chance (mean percentage correct = 54%, p > .70); responses were equally (in)accurate for sequences that were originally performed in binary meter (52% correct) and in ternary meter (55% correct). Thus, it is unlikely that listeners simply formed an explicit abstract representation of meter for the short sequences.

Design and procedure. The familiarization stage was identical to that of Experiment 1. During familiarization, two of the four short sequences (one from each metrical context) were presented, 12 times each in succession with 750 ms of intervening silence. During the test stage, all four sequences were heard in each block of trials in random order, each one presented within a melodic context. Half of the listeners heard the sequences presented only in matched melodic contexts, and half heard the sequences presented only in mismatched melodic contexts. The four sequences were presented in random order within each of nine blocks, yielding a total of 36 trials. Half of the trials contained sequences that were heard at familiarization and half of the trials contained sequences not heard at familiarization.
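The block structure just described, together with the d-prime measure reported under Results, can be sketched in a few lines of Python. This is an illustrative sketch only: the sequence labels and function names are hypothetical (not part of the original experiment software), and d′ is computed with the standard signal-detection formula z(hit rate) - z(false-alarm rate), with rates clipped away from 0 and 1.

```python
import random
from statistics import NormalDist

def build_test_trials(sequences, familiarized, n_blocks=9, seed=None):
    """Each sequence appears once per randomly ordered block
    (9 blocks x 4 sequences = 36 test trials)."""
    rng = random.Random(seed)
    trials = []
    for block in range(1, n_blocks + 1):
        order = list(sequences)
        rng.shuffle(order)  # random order within each block
        for seq in order:
            trials.append({"block": block,
                           "sequence": seq,
                           "familiarized": seq in familiarized})
    return trials

def d_prime(hit_rate, false_alarm_rate, eps=0.01):
    """Signal-detection sensitivity: d' = z(hits) - z(false alarms),
    with rates clipped away from 0 and 1 before the z-transform."""
    z = NormalDist().inv_cdf
    def clip(p):
        return min(max(p, eps), 1 - eps)
    return z(clip(hit_rate)) - z(clip(false_alarm_rate))

# Hypothetical labels: four sequences, two (one per meter) familiarized.
sequences = ["binary_1", "binary_2", "ternary_1", "ternary_2"]
familiarized = {"binary_1", "ternary_1"}
trials = build_test_trials(sequences, familiarized, seed=1)
```

For example, d_prime(0.62, 0.31) yields a positive sensitivity value (about 0.8), on the same scale as the d′ values reported in the Results.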
Thus, familiarization of the test sequences (present/absent at familiarization) was a within-subjects factor and was crossed with melodic context (matched meter/mismatched meter), which was a between-subjects factor.

On each test trial a melody was sounded twice, with 1 s of silence between repetitions. Instructions to listeners were the same as in Experiment 1, with the following exceptions. Listeners were told that each melody at test would contain an embedded short sequence with the same pitch/duration pattern as one they heard during the familiarization stage of the experiment. They were asked to respond yes or no as to whether the embedded sequence was identical to one of the performances they heard during the familiarization stage. As in Experiment 1, listeners were told that some of the test sequences would have the same notes as the familiarization sequences, but they would not be identical because they were from a different performance. The additional musical examples played for nonmusician listeners in Experiment 1 were included in this study as well, to reduce the increased task difficulty in this experiment (listeners must find the embedded sequence as well as judge its familiarity). Listeners also marked their degree of confidence in their decision on a three-point scale (1 = not confident, 3 = confident).

Results and Discussion

An ANOVA on the percentage of identification ("yes") responses by familiarization condition (was test sequence familiarized/not familiarized) and melodic context (matched/mismatched meter) indicated a significant effect of familiarization, F(1,62) = 21.0, p < .01. As shown in Fig. 4, listeners were able to correctly identify those performances they heard at familiarization, even when they were embedded in a melodic context. Listeners' responses to the familiarized and unfamiliarized sequences differed significantly within both the matched and mismatched conditions (p < .05). In addition, identification was better in the mismatched context than in the matched context; the interaction between familiarization and melodic context approached significance, F(1,62) = 3.56, p = .06. The d-prime values confirmed listeners' increased identification accuracy for performances in mismatched (d′ = 0.868) relative to matched (d′ = 0.358) melodic contexts (nonparametric Mann–Whitney U = 724, p < .01); only the d-primes for mismatched contexts differed from zero (p < .01). Neither the sequences that were included at familiarization nor the random order in which subjects heard trials was a significant factor. Analyses indicated higher percentages of "no" responses in block 1, F(8,496) = 3.2, p < .01, but block effects disappeared when the first block of trials was removed, possibly indicating an initial difficulty of locating the embedded sequence. Confidence ratings did not differ significantly across familiarization or context variables.

Two additional analyses were conducted to ascertain whether listeners' musical training influenced the differences found between the matched versus the mismatched melodic context conditions, which was a between-subjects variable. First, the musical training of the 32 listeners in each between-subjects group (matched and mismatched contexts) was compared; there were no significant differences in number of years of musical training between the matched context group (mean = 7.4 years) and the mismatched context group (mean = 7.8 years). Second, listeners were divided into two groups within each familiarization by melodic context condition, based on a median split on amount of musical experience. Analyses on mean responses by familiarization condition, melodic context, and musical training indicated no effects of musical training or interactions with the other variables. Thus, it is unlikely that the findings are based on different degrees of musical training.

These results indicate that prosodic features for short melodic sequences can be used to identify those sequences in larger melodic contexts. Listeners were able to locate the familiarized sequences in a larger melodic context and identify whether they contained the same prosodic cues as heard earlier. Furthermore, listeners' sensitivity to prosodic features was influenced by the metrical context in which they appeared; short sequences presented in mismatched metrical
FIG. 4. Experiment 3: adult listeners' mean percentage of yes responses by familiarization condition (familiarized/not familiarized) and melodic context (matched/mismatched prosodic cues), with standard error bars.

contexts were easier to recognize than sequences in matched contexts. However, it is unlikely that identification judgments were based solely on an abstract metrical structure, as indicated by listeners' inability to identify explicitly the meter of the short sequences out of context. Together, these findings suggest that listeners identified the embedded performances based partially on performance-specific (nonstructural) features and partially on acoustic features that correlated with an abstract metrical structure.

GENERAL DISCUSSION

Several experiments demonstrated that listeners' memory for musical sequences can incorporate detailed, instance-specific acoustic features that differentiate one performance from another. Listeners were able to identify a particular performance from a set of performances that retained the same melodic and rhythmic structure (i.e., the same musical composition) but differed in expressive features. This finding disconfirms the view that only structural categories in music, such as pitch and duration categories, can be retained in memory. More consistent with these findings is an episodic view of memory for music, in which individual representations for acoustic events encode stimulus-specific features in addition to abstract structural features. This view is also consistent with recent perspectives on speech perception and recognition that consider stimulus-specific nonlinguistic (indexical) properties to be perceived and encoded episodically along with abstract linguistic properties (Goldinger, 1997; Jusczyk, 1993; Luce & Lyons, 1998; Nygaard, Burt, & Queen, 2000; Nygaard, Sommers, & Pisoni, 1995; Pisoni, 1997).

How is memory for specific music performances related to memory for abstract musical structure? This question raises the relationship between episodic and generic (abstract) memories. Episodic memory has been defined as a recollection of a specific experience that preserves spatial and/or temporal properties of the experience (Tulving & Thomson, 1973). Whether young children have episodic memory is in debate; episodic memory usually entails conscious awareness of the experience (Tulving, 1985). Repeated exposure to exemplars may produce not only traces of the individual events in episodic memory but also a single abstract representation in a functionally (and possibly anatomically) separate memory system (cf. Tulving, 1983). Multiple-trace theories propose, in contrast, that only traces of individual episodes are stored; traces acting in concert at retrieval represent the abstract memory (cf. Hintzman, 1986). We do not distinguish between these alternatives here; rather, we posit that both infants and adults can form memories that encode subcategorical, temporally specified acoustic properties of specific musical experiences.

Do listeners encode only those acoustic features that are correlated with abstract structure (such as meter in these experiments)? Although the acoustic features of the short performances were not manipulated separately, the experimental evidence suggests the answer is no. First, infants as well as musically trained and untrained adults, whose familiarity and training with abstract musical structures varied widely, were able to identify familiarized performances in Experiments 1 and 2. The familiarized performances were balanced within and across individuals so that metrical structure alone could not be a defining feature of which performances were heard earlier. Second, musically trained listeners were unable to identify explicitly the metrical structure of the musical sequences in a control experiment, although implicit knowledge may be present (cf. Reber, 1989). Third, listeners were able to identify all familiarized sequences when they were placed in metrical contexts in Experiment 3, not just those sequences that were placed in a mismatched metrical context, as might be expected if the metrical structure alone were retained. The fact that identification was aided by a mismatched metrical context rules out the possibility that the context was insufficient to induce a perception of meter. Thus, listeners appear to be identifying music performances based both on performance-specific (nonstructural) features and on abstract structural features.

Musical training or experience does not appear to be a prerequisite for listeners' ability to encode and recall features of particular music performances. Experiment 1 demonstrated that both musically trained and untrained adult listeners could identify particular performances of the musical sequences with high accuracy. Experiment 2 demonstrated that 10-month-old infants were able to recognize and distinguish specific music performances they had heard earlier. Although some work suggests that sensitivity to complex abstract musical structures, such as tonality and implied harmony, changes from infancy to adulthood (Cuddy & Badertscher, 1987; Krumhansl & Keil, 1982; Lynch & Eilers, 1992; Trainor & Trehub, 1992), other findings suggest that children's basic auditory perception abilities function along principles similar to those of adults (Baruch & Drake, 1997; Demany, 1982; Drake & Penel, 1999; Trehub, 2000; Trehub, Bull, & Thorpe, 1984). Analogies between musical motion and principles of physical motion also suggest that musical training may not be necessary for sensitivity to acoustic features of performances. For example, the rate of slowing at phrase boundaries in music resembles deceleration patterns of physical motion, as in running and walking (Kronman & Sundberg, 1987). The similarity between infants' and adults' identification of music suggests that an ability to encode stimulus-specific acoustic features, at least for short musical sequences, is present early in life, by 10 months of age. This functionality is important if the acoustic features serve to mark salient units in music, speech, and other auditory domains (Drake & Penel, in press; Trehub, 2000); an ability to encode the acoustic features that mark units is necessary for the recognition of those units.

One limitation of the current studies is that individual acoustic features were not isolated in stimulus presentation; whether one acoustic cue is more memorable than another is unknown. Further research addresses the extent to which one acoustic dimension dominates another in memory for these performances (Jungers & Palmer, 2000). Another limitation is the reduced length and number of musical sequences, which were chosen to allow comparisons among infant and adult perception. Finally, stimulus variability in these experiments was reduced on several acoustic dimensions, including pitch, timbre, and time; durational categories and tempi were kept constant, in order to keep the total stimulus duration equivalent across performances. Many studies indicate that variability in event timing in music performance correlates with musical structure (Palmer & Kelly, 1992; Sloboda, 1983; Todd, 1985), and listeners are sensitive to that variability (Juslin, 1997; Palmer, 1996b); whether memory for auditory features of music
would differ in the presence of these forms of stimulus variability is unknown.

Why would episodic representations for music be useful? One reason is their bootstrapping potential in perceptual learning. A common view in speech is that prosodic features provide a low-level cue to aid segmentation, particularly during acquisition of knowledge about the units of speech (Gleitman & Wanner, 1982; Hirsh-Pasek et al., 1987; Jusczyk & Kemler Nelson, 1996; Morgan, 1986; Peters, 1983). Listeners' abilities to identify smaller musical units may serve to bootstrap their ability to apprehend higher-order relationships among those units. This research indicates that listeners can incorporate prosodic features in memory for music in the absence of extensive musical experience. Other studies have shown that acoustic features of music can influence infants' perception of musical units (Jusczyk & Krumhansl, 1993; Krumhansl & Jusczyk, 1990). However, this study is the first to demonstrate infants' ability to retain in memory those acoustic features, an important prerequisite for a bootstrapping explanation of listeners' acquisition of complex musical relationships.

A related reason that episodic memory for music performances would be useful is the relationship between structural ambiguity and acoustic variability in music. Music is often ambiguous in structure. Each musical piece allows multiple interpretations of phrasing, voicing, and segment boundaries, and musicians typically use multiple acoustic features to mark

of the performance-specific features. Last, episodic memory for music performances may enable listeners to identify individual performers in much the same way that voice characteristics allow identification of individual talkers.

Music, like speech, is a domain in which stimulus variability offers a primary resource for aiding memory and learning. This perspective is supported by several points. First, most musical instruments permit variability on multiple dimensions, and the inherent ambiguities in musical structure not only allow but require performers to make use of acoustic variability to encode structure. Sources of stimulus variability, such as the performers' emotional state, production rate, and syntactic/interpretive effects, produce large changes in the acoustic signal. Second, communication of structural information in music is typically maximized through redundancy of cues; multiple acoustic cues carry information about the same structural content. For example, reductions in tempo and amplitude often mark phrase boundaries. These cues may provide reliable information to facilitate communication when the signal is presented under degraded conditions (cf. Palmer, 1996b). Finally, music is multidimensional and, like speech, it has a complex mapping among its different instantiations in the physical domain, the production domain, and the perceptual domain. Its complexity is evidenced by the difficulty of devising rule-based systems that map acoustic cues onto structural categories. Episodic memory for at least some acoustic features may be
those structures. Performances of the same necessary to facilitate the recognition of struc-
music can differ greatly, even when performed ture in auditory sequences of this complexity.
by the same musician. These studies and many
others show that prosodic features of perform- REFERENCES
ances are correlated with performers’ structural Baruch, C., & Drake, C. (1997). Tempo discrimination in in-
intentions, and listeners are sensitive to that re- fants. Infant Behavior and Development, 20, 573–577.
Bengtsson, I., & Gabrielsson, A. (1983). Analysis and syn-
lationship. Thus, prosodic features may be
thesis of musical rhythm. In J. Sundberg (Ed.), Studies
necessary for a memory representation that dis- of music performance (pp. 27–60). Stockholm: Royal
tinguishes one structural interpretation of a mu- Swedish Academy of Music.
sical piece from another. This view is consis- Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Ef-
tent with Raffman’s (1993) perspective that fects of talker, rate, and amplitude variation on recogni-
tion memory for spoken words. Perception & Psy-
expressive features help listeners to distinguish
chophysics, 61, 206–219.
among multiple structural representations for a Chang, H. W., & Trehub, S. E. (1977). Auditory processing
given musical piece; we posit further that the of relational information by young infants. Journal of
memory representations encode at least some Experimental Child Psychology, 24, 324–331.
EPISODIC MEMORY FOR MUSIC 543

Clynes, M. (1986). When time is music. In J. R. Evans & M. Clynes (Eds.), Rhythm in psychological, linguistic, and musical processes (pp. 169–224). Springfield, IL: Thomas.
Clynes, M. (1995). Microstructural musical linguistics: Composers' pulses are liked most by the best musicians. Cognition, 55, 269–310.
Cooper, W. E., & Sorensen, J. (1977). Fundamental frequency contours at syntactic boundaries. Journal of the Acoustical Society of America, 62, 683–692.
Cuddy, L. L., & Badertscher, B. (1987). Recovery of the tonal hierarchy: Some comparisons across age and levels of musical experience. Perception & Psychophysics, 41, 609–620.
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141–201.
Demany, L. (1982). Auditory stream segregation in infancy. Infant Behavior and Development, 5, 261–276.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. Orlando: Academic Press.
Drake, C., & Penel, A. (1999). Learning to play music: Rhythm and timing. Parole, 9, 49–62.
Fellowes, J. M., Remez, R. E., & Rubin, P. E. (1997). Perceiving the sex and identity of a talker without natural vocal timbre. Perception & Psychophysics, 59, 839–849.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention and the listener's experience. Psychology of Music, 24, 68–91.
Gerstman, L. (1968). Classification of self-normalized vowels. IEEE Transactions on Audio and Electroacoustics (ACC-16), 78–80.
Gleitman, L. R., & Wanner, E. (1982). The state of the state of the art. In E. Wanner & L. R. Gleitman (Eds.), Language acquisition: The state of the art (pp. 3–48). Cambridge: Cambridge Univ. Press.
Goldinger, S. D. (1997). Words and voices: Perception and production in an episodic lexicon. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech processing (pp. 33–66). San Diego: Academic Press.
Henderson, M. T. (1936). Rhythmic organization in artistic piano performance. In C. E. Seashore (Ed.), Objective analysis of musical performance, Vol. 4 (pp. 281–305). Iowa City: Univ. of Iowa Press.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411–428.
Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Wright-Cassidy, K., Druss, B., & Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition, 26, 269–286.
Houston, D. M., & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26, 1570–1582.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459–491.
Jungers, M. K., & Palmer, C. (2000). Episodic memory for music performance. Presented at the Meeting of the Psychonomic Society, New Orleans, Nov.
Jusczyk, P. W. (1993). From language-general to language-specific properties: The WRAPSA model of how speech perception develops. Journal of Phonetics, 21, 3–28.
Jusczyk, P. W. (1997). The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P. W. (1998). Using the headturn preference procedure to study language acquisition. In C. Rovee-Collier, L. P. Lipsitt, & H. Hayne (Eds.), Advances in infancy research (Vol. 12, pp. 188–204). Stamford, CT: Ablex.
Jusczyk, P. W., Hohne, E. A., Jusczyk, A. M., & Redanz, N. J. (1993). Do infants remember voices? Paper presented at the 125th Meeting of the Acoustical Society of America, Ottawa, Canada, May.
Jusczyk, P. W., & Hohne, E. A. (1997). Infants' memory for spoken words. Science, 277, 1984–1986.
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207.
Jusczyk, P. W., & Kemler Nelson, D. G. (1996). Syntactic units, prosody, and psychological reality during infancy. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax (pp. 389–408). Mahwah, NJ: Erlbaum.
Jusczyk, P. W., & Krumhansl, C. L. (1993). Pitch and rhythm patterns affecting infants' sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance, 19, 627–640.
Juslin, P. N. (1997). Emotional communication in music performance: A functionalist perspective and some data. Music Perception, 14, 383–418.
Kemler Nelson, D. G., Jusczyk, P. W., Mandel, D. R., Myers, J., Turk, A., & Gerken, L. A. (1995). The Headturn Preference Procedure for testing auditory perception. Infant Behavior and Development, 18, 111–116.
Kendall, R. A., & Carterette, E. C. (1990). The communication of musical expression. Music Perception, 8, 129–163.
Kronman, U., & Sundberg, J. (1987). Is the musical ritard an allusion to physical motion? In A. Gabrielsson (Ed.), Action and perception in rhythm and music (No. 55, pp. 57–68). Stockholm: Royal Swedish Academy of Music.
Krumhansl, C. L., & Jusczyk, P. W. (1990). Infants' perception of phrase structure in music. Psychological Science, 1, 70–73.
Krumhansl, C. L., & Keil, F. C. (1982). Acquisition of the hierarchy of tonal functions in music. Memory & Cognition, 10, 243–251.
Large, E. W., Palmer, C., & Pollack, J. B. (1995). Reduced memory representations for music. Cognitive Science, 19, 53–96.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Lehiste, I., Olive, J. P., & Streeter, L. (1976). The role of duration in disambiguating syntactically ambiguous sentences. Journal of the Acoustical Society of America, 60, 1199–1202.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Luce, P. A., & Lyons, E. A. (1998). Specificity of memory representations for spoken words. Memory & Cognition, 26, 708–715.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning. Perception & Psychophysics, 52, 599–608.
Mandel, D. R., Jusczyk, P. W., & Kemler Nelson, D. G. (1994). Does sentence prosody help infants organize and remember speech information? Cognition, 53, 155–180.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: Univ. of Chicago Press.
Morgan, J. L. (1986). From simple input to complex grammar. Cambridge, MA: MIT Press.
Nakamura, T. (1987). The communication of dynamics between musicians and listeners through musical performance. Perception & Psychophysics, 41, 525–533.
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication-realization model. Chicago: Univ. of Chicago Press.
Nazzi, T., Kemler Nelson, D. G., Jusczyk, P. W., & Jusczyk, A. M. (2000). Six-month-olds' detection of clauses embedded in continuous speech: Effects of prosodic well-formedness. Infancy, 1, 124–147.
Nygaard, L. C., Burt, S. A., & Queen, J. S. (2000). Surface form typicality and asymmetric transfer in episodic memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1228–1244.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42–46.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1995). Effects of stimulus variability on perception and representation of spoken words in memory. Perception & Psychophysics, 57, 989–1001.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331–346.
Palmer, C. (1996a). Musical communication and theories of the stimulus. Paper presented at the 131st Meeting of the Acoustical Society of America, Indianapolis, April.
Palmer, C. (1996b). On the assignment of structure in music performance. Music Perception, 14, 23–56.
Palmer, C. (1997). Music performance. Annual Review of Psychology, 48, 115–138.
Palmer, C., & Kelly, M. H. (1992). Linguistic prosody and musical meter in song. Journal of Memory and Language, 31, 525–542.
Palmer, C., & Krumhansl, C. L. (1990). Mental representations for musical meter. Journal of Experimental Psychology: Human Perception and Performance, 16, 728–741.
Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 309–328.
Peters, A. (1983). The units of language acquisition. Cambridge: Cambridge Univ. Press.
Pisoni, D. B. (1997). Some thoughts on "normalization" in speech perception. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech processing (pp. 9–32). San Diego: Academic Press.
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956–2970.
Raffman, D. (1993). Language, music, and mind (Bradford Book ed.). Cambridge, MA: MIT Press.
Reber, A. S. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology: General, 118, 219–235.
Remez, R. E., Fellowes, J. M., & Rubin, P. E. (1997). Talker identification based on phonetic information. Journal of Experimental Psychology: Human Perception and Performance, 23, 651–666.
Repp, B. H. (1989). Expressive microstructure in music: A preliminary perceptual assessment of four composers' "pulses." Music Perception, 6, 243–274.
Scott, D. R. (1982). Duration as a cue to the perception of a phrase boundary. Journal of the Acoustical Society of America, 71, 996–1007.
Seashore, C. E. (1938). Psychology of music. New York: McGraw–Hill.
Serafine, M. L., Glassman, N., & Overbeeke, C. (1989). The cognitive reality of hierarchic structure in music. Music Perception, 6, 347–430.
Shankweiler, D., Strange, W., & Verbrugge, R. (1977). Speech and the problem of perceptual constancy. In R. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing: Toward an ecological psychology (pp. 315–345). Hillsdale, NJ: Erlbaum.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25, 193–247.
Siegel, J. A., & Siegel, W. (1977). Categorical perception of tonal intervals: Musicians can't tell sharp from flat. Perception & Psychophysics, 21, 399–407.
Sloboda, J. A. (1983). The communication of musical metre in piano performance. Quarterly Journal of Experimental Psychology, 35, 377–396.
Sloboda, J. A. (1985). Expressive skill in two pianists: Metrical communication in real and simulated performances. Canadian Journal of Psychology, 39, 273–293.
Soderstrom, M., Jusczyk, P. W., & Kemler Nelson, D. G. (2000). Evidence for the use of phrasal packaging by English-learning 9-month-olds. In S. C. Howell, S. A. Fish, & T. Keith-Lucas (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development (Vol. 2, pp. 708–718). Somerville, MA: Cascadilla Press.
Speer, S. R., Crowder, R. G., & Thomas, L. M. (1993). Prosodic structure and sentence recognition. Journal of Memory and Language, 32, 336–358.
Streeter, L. A. (1978). Acoustic determinants of phrase boundary perception. Journal of the Acoustical Society of America, 64, 1582–1592.
Studdert-Kennedy, M. (1974). The perception of speech. In T. A. Sebeok (Ed.), Current trends in linguistics (pp. 2349–2385). The Hague: Mouton.
Studdert-Kennedy, M. (1983). On learning to speak. Human Neurobiology, 2, 191–195.
Sundberg, J., Askenfelt, A., & Fryden, L. (1983). Musical performance: A synthesis-by-rule approach. Computer Music Journal, 7, 37–43.
Thompson, W. F. (1989). Composer-specific aspects of musical performance: An evaluation of Clynes' theory of pulse for performances of Mozart and Beethoven. Music Perception, 7, 15–42.
Todd, N. P. M. (1985). A model of expressive timing in tonal music. Music Perception, 3, 33–58.
Trainor, L. J. (1996). Infant preferences for infant-directed versus non-infant-directed play songs and lullabies. Infant Behavior and Development, 19, 83–92.
Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of preferences for infant-directed singing. Infant Behavior and Development, 20, 383–396.
Trainor, L. J., & Adams, B. (2000). Infants' and adults' use of duration and intensity cues in the segmentation of tone patterns. Perception & Psychophysics, 62, 333–340.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity to Western musical structure. Journal of Experimental Psychology: Human Perception and Performance, 18, 394–402.
Trehub, S. E. (2000). Human processing predispositions and musical universals. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music. Cambridge, MA: MIT Press.
Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The role of melodic contour. Child Development, 55, 821–830.
Trehub, S. E., & Thorpe, L. A. (1989). Infants' perception of rhythm: Categorization of auditory sequences by temporal structure. Canadian Journal of Psychology, 43, 217–229.
Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. In C. Rovee-Collier & L. Lipsitt (Eds.), Advances in infancy research (pp. 43–77). Norwood, NJ: Ablex.
Tulving, E. (1983). Elements of episodic memory. Oxford: Oxford Univ. Press.
Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26, 1–12.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373.
Yee, W., Holleran, S., & Jones, M. R. (1994). Sensitivity to event timing in regular and irregular sequences: Influences of musical skill. Perception & Psychophysics, 56, 461–471.

(Received July 6, 2000)
(Revision received December 1, 2000)
(Published online June 20, 2001)
