
Origins of Sound Change



Origins of Sound
Change

Approaches to Phonologization

Edited by
ALAN C. L. YU

OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© editorial matter and organization Alan C. L. Yu 2013
© the chapters their several authors 2013
The moral rights of the authors have been asserted
First Edition published in 2013
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
ISBN 978-0-19-957374-5
Printed in Great Britain by
MPG Books Group, Bodmin and King's Lynn
Contents
Preface vii
Acknowledgements xii
Notes on Contributors xiii

Part I. What is phonologization?


1 Enlarging the scope of phonologization 3
Larry M. Hyman
2 The role of entropy and surprisal in phonologization
and language change 29
Elizabeth Hume and Frédéric Mailhot

Part II. Phonetic considerations


3 Phonetic bias in sound change 51
Andrew Garrett and Keith Johnson
4 From long to short and from short to long: Perceptual motivations
for changes in vocalic length 98
Heike Lehnert-LeHouillier
5 Inhibitory mechanisms in speech planning maintain and
maximize contrast 112
Sam Tilsen
6 Developmental perspectives on phonological typology and
sound change 128
Chandan Narayan

Part III. Phonological and morphological considerations


7 Lexical sensitivity to phonetic and phonological pressures 149
Abby Kaplan
8 Phonologization and the typology of feature behavior 165
Jeff Mielke
9 Rapid learning of morphologically conditioned phonetics: Vowel
nasalization across a boundary 181
Rebecca Morley

Part IV. Social and computational dynamics


10 Individual differences in socio-cognitive processing and the actuation
of sound change 201
Alan C. L. Yu
11 The role of probabilistic enhancement in phonologization 228
James Kirby
12 Modeling the emergence of vowel harmony through iterated learning 247
Frédéric Mailhot
13 Variation and change in English noun/verb pair stress: Data and
dynamical systems models 262
Morgan Sonderegger and Partha Niyogi

References 285
Language Index 331
Subject Index 333
Preface
The content of this volume grew out of a workshop on phonologization held at the
University of Chicago in April 2008. The majority of the chapters in this volume
are based on papers presented at the workshop. In an attempt to broaden the range
of perspectives presented in this volume, however, other chapters were added.
The term 'phonologization', which Larry Hyman defined in 1976 as 'what begins as
an intrinsic byproduct of something, predicted by universal phonetic principles, ends
up unpredictable, and hence, extrinsic' (Hyman 1976: 408), gained prominence as a
result of the publication of Hyman's seminal article of the same name. As Hyman
reviews in his contribution to this volume, however, defining 'phonologization' is not
so straightforward given the complexity in delineating the boundary between what
is phonetic and intrinsic and what is phonological and extrinsic. He considers the
role of contrast in the phonologization process and suggests that the term 'phonol-
ogization' needs to be extended to cover other ways that phonological structure
either changes or comes into being. He ultimately concludes that phonologization is
but one aspect of the larger issue of how (phonetic, semantic, pragmatic) substance
becomes linguistically codified into form. Elizabeth Hume and Frédéric Mailhot, on
the other hand, seek to conceptualize the phenomenon of phonologization from the
perspective of information theory (Shannon 1948). In particular, they argue that
information-theoretic concepts such as entropy (which models a cognitive state of
the language user associated with the amount of uncertainty regarding the outcome
of some linguistic event) and surprisal (which is context-dependent and is associated
with individual elements of the system) are useful tools for understanding how
external factors, individually and together, influence the progression of sound change.
Phonologization, for example, is predicted to preferentially affect elements linked to
extreme degrees of surprisal.
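In standard information-theoretic terms (a formulation added here for concreteness, following Shannon 1948, rather than quoted from Hume and Mailhot's chapter), the entropy of a linguistic variable X and the surprisal of a particular outcome x in a context c can be written as

\[ H(X) = -\sum_{x \in X} p(x)\,\log_2 p(x) \qquad\qquad s(x \mid c) = -\log_2 p(x \mid c) \]

so that an element with low conditional probability in its context carries high surprisal; it is such elements that are predicted to be preferential targets of phonologization.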
Many issues are intertwined when discussing the phenomenon of phonologization.
As such, the task of arranging the chapters into coherent sections was made all the
more difficult. In the end, I have settled on four broad themes, corresponding to
different facets of phonologization research. It is important to point out, however,
that many chapters touch on themes that would have made them just as appropriate
under a different heading.
Much energy has been dedicated to understanding sound change by identifying
the very early inception of change, that is, the identification of perturbations of the
speech signal, conditioned by physiological constraints on articulatory and/or audi-
tory mechanisms, which affect the way sounds are analyzed by the listener. While
this emphasis on identifying the intrinsic variation in speech has provided important
insights into the origins of widely attested cross-linguistic sound changes, the nature
of phonologization has remained largely unexplored. Several factors, however, have
been implicated in the phonologization process, chief among them are channel and
analytic biases (Wilson 2006; Zuraw 2007; Moreton 2008, 2010; Yu 2011). Channel
bias refers to the relative likelihood of a phonetic precursor to sound change becom-
ing phonologized into full-fledged sound patterns (e.g. Hyman 1976; Ohala 1993;
Lindblom et al. 1995; Hume and Johnson 2001; Blevins 2004). The four chapters in
Part II consider the nature of the channel bias. Andrew Garrett and Keith Johnson
review the state of the art of channel bias research, showing that most typologies of
sound change have drawn either a two-way distinction between changes grounded in
articulation and perception or a three-way distinction among perceptual confusion,
hypocorrective changes, and hypercorrective changes. Heike Lehnert-LeHouillier
explores the role of language-specific perceptual cues in sound changes involving
vowel length and tone/accent on the one hand, and vowel length and vowel height
on the other. Based on the results of a cross-linguistic perception experiment, which
tested the influence of a falling f0 and vowel height on the perception of vocalic
length, she argues that spectral differences (as acoustic correlates of vowel height)
are more tightly linked to the perception of vowel duration than f0 (as the acoustic
correlate of tone/accent). Sam Tilsen, on the other hand, focuses on the contribution
of motor planning in sound change. He argues that contrast-maintaining inhibitory
interactions during contemporaneously planned articulation play a role in contrast
maintenance on diachronic timescales and bias productions toward maximal con-
trast. Sound change is often assumed to result from listeners having few a priori
assumptions about the language to which they are exposed (e.g. Ohala 1993). Such
an approach emphasizes the role of first language acquisition in shaping the course
of phonologization. Chandan Narayan presents a survey of work addressing devel-
opmental processes and the nature of phonological systems and change. He argues
that the types of phonetic contrasts that infants fail to discriminate are those that
are rare in the world's sound systems, which is in part due to their fragile acoustic-
perceptual salience. He also surveys recent research into the fine-grained phonetics
of infant-directed speech in English, which shows acoustic conditions similar to those
targeted in well-known sound changes in the world's languages. These findings suggest
that the ambient language input to infants has the potential to provide the seeds of
phonological change.
Analytic biases are limitations in computation or markedness relations and con-
straints imposed by Universal Grammar. An analytic bias might render certain
patterns difficult to acquire even from perfect learning data. The nature of analytic
biases is a matter of much debate. The three chapters in Part III wrestle with
this debate. Abby Kaplan argues for the importance of phonological markedness in
shaping the nature of the lexicon. She examines two cases of 'underphonologization',
one where a phonetic pattern is known to influence phonological patterns, and
one where it doesn't. She concludes that phonology rather than phonetics directly
influences patterns of lexical frequency. While Kaplan argues for the primacy of
phonology over phonetics, Jeff Mielke argues that phonological features are deriva-
tive of phonetic effects that are phonologized into sound patterns. He measures
the crosslinguistic frequency of occurrence of classes defined by particular features
and examines the phonological behavior of these classes. The characteristic behav-
ior profiles of features suggest that different features behave differently (e.g. more
or less assimilation or dissimilation, different behavior of + and - values, etc.),
often because the need for a particular feature is dominated by a particular type
of phonetically-motivated phonological pattern (e.g. voicing assimilation for classes
defined by [voice] and [sonorant]). He argues that the prevalence of these charac-
teristic phonological patterns is best attributed to the phonologization of phonetic
effects.
Phonological patterns often show effects of non-derived environment blocking.
That is, some sound alternations only obtain at morphological boundaries but not
in non-derived environments. How phonetic precursors to sound patterns come to
be phonologized only at morphological boundaries has not been previously explored.
Rebecca Morley tests the ability of participants to learn an association that was con-
ditioned on a morphological boundary, but that consisted of acoustic information
that was sub-phonemic in nature (degree of nasalization on a pre-nasal vowel, which
is never contrastive in English), using an artificial grammar learning paradigm. The
results show that listeners are successful in learning the morphological association
with novel phonetic cues even over short time periods and that grammatical and sub-
grammatical components of the linguistic system have the ability to interact. These
results thus offer supportive evidence for a historical phonetic origin for phonological
processes that only apply (or only fail to apply) in derived environments.
Understanding the emergence of new speech norms requires more than under-
standing the constraints and biases that shape the trajectory of change. The phonetic
and systematic bias factors delineate the preconditions for change, but they do not
explain why a change emerges at a particular moment in history, in one community
and not others.
The last part of this volume contains chapters that address the issue of the social
and computational dynamics of variation and change, a crucial facet of the phonolo-
gization process. To bridge the gap between the emergence of new variants and their
eventual propagation, a linking theory is needed. Two perspectives are offered in
this volume. Alan Yu argues for the potential role systematic individual differences
in modes of speech perception may play in the initiation and propagation of sound
change. He contends that individuals with different cognitive processing styles, and by
extension, different social and personality traits, might arrive at different perceptual
and production norms in speech. He suggests that individuals who are most likely
to introduce new variants in a speech community (the 'innovators' à la Milroy and
Milroy 1985) might also be the same individuals who are most likely to be imitated
by the rest of the speech community due to their personality traits and other social
characteristics. Conversely, individuals with yet other cognitive processing styles and
personality traits might be more susceptible to the linguistic influence of others (the
so-called 'early adopters' à la Milroy and Milroy 1985) and might lead the early phase
of linguistic convergence. Andrew Garrett and Keith Johnson, on the other hand,
attribute the point of entry to differences in sociolinguistic awareness, that is, how
individuals may differ in how they assign social meaning to linguistic differences. They
hypothesize that some individuals in a language community, but crucially not others,
may attend to linguistic variation within their own subgroup but not to variation in
other subgroups. If such individuals become aware of a particular phonetic variant
in their subgroup, but are unaware that it is also present in other subgroups, they
may interpret the variant as a group identity marker, and they may then use it more
often.
While the fact that language change requires variation is undisputed, how vari-
ation leads to change is a matter of much debate. Three authors investigate the
diachronic dynamics of linguistic variation from a computational perspective. At the
level of phonetic cues, the phonologization process often results in transphonolo-
gization (Hyman 1976). That is, the phonologization of one phonetic cue is often
accompanied by the dephonologization of another. Given that most phonological
distinctions are supported by multiple phonetic cues, what factors determine which
cues are selected for phonologization and which cue should dephonologize? James
Kirby argues for the role of probabilistic enhancement in phonologization through
computational simulation of an ongoing sound change in Seoul Korean. He proposes
that cues are targeted for enhancement as a probabilistic function of their statisti-
cal reliability in signaling a contrast. Simulation results using empirically derived
cue values are taken to support the idea that loss of contrast precision may drive
transphonologization.
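As a purely illustrative rendering of this idea (not Kirby's own model; the categories, cue values, and midpoint classifier below are all hypothetical), cue reliability can be operationalized as how well each cue alone separates two categories, with a cue then targeted for enhancement in proportion to that reliability:

import random

# Hypothetical tokens of two stop categories, each carrying two cues:
# VOT (ms) and onset f0 (Hz). All means and SDs are invented for illustration.
def make_tokens(n, vot_mu, f0_mu):
    return [{'vot': random.gauss(vot_mu, 10), 'f0': random.gauss(f0_mu, 15)}
            for _ in range(n)]

lenis = make_tokens(200, 60, 200)
aspirated = make_tokens(200, 80, 230)

def reliability(cue):
    """Proportion of tokens correctly classified by this cue alone,
    using the midpoint between the two category means as the boundary."""
    mu_l = sum(t[cue] for t in lenis) / len(lenis)
    mu_a = sum(t[cue] for t in aspirated) / len(aspirated)
    boundary = (mu_l + mu_a) / 2
    hits = sum(t[cue] < boundary for t in lenis) + \
           sum(t[cue] >= boundary for t in aspirated)
    return hits / (len(lenis) + len(aspirated))

cues = ['vot', 'f0']
weights = [reliability(c) for c in cues]

# A cue is picked for enhancement with probability proportional to its
# reliability: as VOT overlap grows, f0 is increasingly the enhanced cue,
# eventually carrying the contrast on its own (transphonologization).
target = random.choices(cues, weights=weights)[0]
print(dict(zip(cues, weights)), '-> enhance', target)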
In addition to the transfer of linguistic contrast from one cue dimension to another,
phonologization often leads to the establishment of sound patterns. A prime example
is the emergence of vowel harmony from vowel-to-vowel coarticulation. Frédéric
Mailhot shows that the emergence of a categorical pattern of lexical harmony from
vowel-to-vowel coarticulation can be simulated using a simple model of a language
transmission/acquisition feedback loop iterated over multiple generations (a minimal
sketch of such a loop is given below). The progression of sound change does not stop
at the introduction of a new variant. Understanding the behavior of a new variant
once it is introduced in the speech stream is crucial to explaining the trajectory of
sound change. From this perspective, it is intriguing that linguistic systems are replete
with cases where multiple variants coexist within the system. Why do some new
variants coexist with old ones, while others take over and become the dominant
patterns? Morgan Sonderegger and Partha Niyogi explore this issue of stability of
variation computationally, using dynamic modeling. Through a case study of stress
shift in English noun/verb pairs, they show that changes in stability of variation
(i.e. bifurcation in dynamic modeling) occur only under certain models of learning
by individuals in a linguistic population.
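A minimal sketch of such an iterated transmission/acquisition loop, assuming a single gradient 'frontness' parameter and a simple thresholding learner rather than Mailhot's actual model (all parameter values below are invented), might look as follows. Because words whose two vowels already agree are stable under coarticulation, while disagreeing words are occasionally reanalyzed, harmony ratchets up across generations:

import random

COART = 0.35   # strength of V1-to-V2 coarticulation (hypothetical value)
NOISE = 0.1    # production/perception noise (hypothetical value)

def produce(v1, v2):
    """Vowel 'frontness' in [0,1]; V2 gradiently assimilates toward V1."""
    out2 = v2 + COART * (v1 - v2) + random.gauss(0, NOISE)
    return v1, min(max(out2, 0.0), 1.0)

def learn(heard):
    """The next learner stores categorical forms: frontness thresholded at 0.5."""
    return [(round(a), round(b)) for a, b in heard]

# Initial lexicon: 50 two-vowel 'words' with random frontness values.
lexicon = [(random.random(), random.random()) for _ in range(50)]
for generation in range(30):
    lexicon = learn(produce(v1, v2) for v1, v2 in lexicon)

# Harmonic words (matching vowels) are absorbing states, so their share grows.
harmonic = sum(a == b for a, b in lexicon)
print(f'{harmonic}/{len(lexicon)} words harmonic after 30 generations')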
Phonologization has emerged as one of the central topics in phonological research
in recent years. Many of the recent advances are made possible by researchers cross-
ing disciplinary boundaries and drawing on ideas from other research traditions to
address difficult questions previously thought unanswerable. The original call for
papers stated that the goal of this workshop was 'to facilitate collaboration among
phonologists as well as specialists from neighboring disciplines seeking unified the-
oretical explanations for the origins of sound patterns in language, as well as to
move toward a new and improved synthesis of synchronic and diachronic phonology'.
The present collection includes perspectives from phonetics, laboratory and theoret-
ical phonology, computer science, psycholinguistics, language acquisition, cognitive
neuroscience, cognitive and social psychology, and sociolinguistics. I hope that this
volume will serve as a stimulus to furthering the discussion and cross-pollination of
ideas.

This volume is dedicated to the memory of Partha Niyogi, a highly esteemed colleague
and a contributor to this volume, who passed away unexpectedly during the course
of preparation of the volume.
Chicago, IL Alan Yu
December 2011
Acknowledgements
Many thanks to the following reviewers of chapters for their valuable comments:
Adam Albright, Matt Carlson, Cynthia Clopper, Katie Drager, Edward Flemming,
Andrew Garrett, Peter Graff, David Harrison, Vsevolod Kapatsinski, Jelena
Krivokapić, Roger Levy, Lauren Hall-Lew, Björn Lindblom, Fang Liu, Alexis Michaud,
Andrew Nevins, Lisa Pearl, Anne Pycha, Yvan Rose, Joe Salmons, Ryan Shosted,
Morgan Sonderegger, Rachel Walker, Dominic Watts, Charles Yang, and Kie Zuraw.
Caroline Crouch and Alison Thumel also provided much-appreciated assistance with
preparing this manuscript. Thanks also go to Julia Steer and John Davey, linguistics
editors at Oxford University Press, for their continued support during the preparation
of this volume.
Notes on Contributors
ANDREW GARRETT is Professor of Linguistics and Nadine M. Tang and Bruce L. Smith
Professor of Cross-Cultural Social Sciences at the University of California, Berkeley,
where he also directs the California Language Archive. In historical linguistics he has
published on general topics in sound change and morphological change as well as the
dialectology, diversification, and prehistory of Yurok (an Algic language of California)
and Western Numic (Uto-Aztecan), the dialectology and diachronic syntax of English,
and the syntax and morphology of Anatolian, Greek, and Latin.
ELIZABETH HUME is Professor of Linguistics at the University of Canterbury,
New Zealand, formerly of the Department of Linguistics at The Ohio State University.
She has published on topics including consonant/vowel interaction, feature theory,
information theory and phonology, language variation, metathesis, markedness,
segmental structure, and the interplay of speech perception and phonology.
LARRY M. HYMAN received his PhD in Linguistics from UCLA in 1972. He taught
at the University of Southern California from 1971 to 1988. He came to Berkeley's
Department of Linguistics in 1988, which he chaired from 1991 to 2002. He has
worked extensively on phonological theory and other aspects of language structure,
concentrating on the Niger-Congo languages of Africa, especially Bantu. He has pub-
lished several books as well as over 120 articles in both theoretical and Africanist
journals.
KEITH JOHNSON is Professor of Linguistics and Director of the Phonology Laboratory
at the University of California, Berkeley. He has published two phonetics textbooks, a
textbook on quantitative linguistics, and two edited collections on speech perception
and phonology. His research focuses on the effects of phonetic and social experience
on speech perception.
ABBY KAPLAN is Assistant Professor (Lecturer) at the University of Utah. Her research
focuses on the phonology-phonetics interface, using a combination of experimen-
tal and corpus data to study the phonetic grounding of phonological patterns. She
received her PhD in 2010 from the University of California, Santa Cruz; her disserta-
tion research investigated the perceptual and articulatory basis of lenition.

JAMES KIRBY is Lecturer in Phonetics at the University of Edinburgh. He received his
PhD in Linguistics from the University of Chicago in 2010. His research interests
include computational models of language acquisition and transmission, the evolu-
tion of tone and voice quality, and the languages of Southeast Asia.

HEIKE LEHNERT-LEHOUILLIER is currently a visiting Research Assistant Professor in
the Department of Communication Sciences and Disorders at the University at Buf-
falo. Her research interests straddle the areas of experimental phonetics, laboratory
phonology, and psycholinguistics. She is particularly interested in the interaction of
suprasegmental, prosodic features and segmental features, in both synchronic and
diachronic contexts.

FRÉDÉRIC MAILHOT received his PhD in Cognitive Science from Carleton University,
and now works in the Speech team at Google. He is interested in information-theoretic
and modeling-based accounts of sound change, as well as exemplar-based modeling
of generalization in phonological acquisition and use.

JEFF MIELKE is Associate Professor of Linguistics at the University of Ottawa and
Director of the Sound Patterns Laboratory. He uses laboratory and computational
techniques to study linguistic sound patterns as a testing ground for studying the
interaction of physiological, cognitive, social, and other factors.
REBECCA MORLEY is currently a postdoctoral researcher at The Ohio State University.
She received her PhD from Johns Hopkins University in 2008. She is interested in the
cognitive bases for linguistic universals, typology, and learning theory.
CHANDAN NARAYAN is Assistant Professor of Linguistics at the University of Toronto.
His research is focused on the relationship between perceptual development, acous-
tics, and the nature of sound systems.

PARTHA NIYOGI was Professor of Computer Science and Statistics at The University
of Chicago. He obtained his undergraduate degree from IIT Delhi and SM and PhD
from MIT, and worked at Bell Laboratories before joining the University of Chicago.
His research spanned statistical inference, machine learning, speech and signal pro-
cessing, computational linguistics, and artificial intelligence. He wrote two books
(including The Computational Nature of Language Learning and Evolution) and many
journal and conference papers on these subjects.
MORGAN SONDEREGGER is a PhD candidate in Computer Science and Linguistics at
the University of Chicago. He received his BS from MIT and a master's degree from
Cambridge University. His research addresses stability and change in phonetics
and phonology, both within individuals and at the population level, using corpora
and computational and mathematical methods. He is also interested in quantitative
approaches to linguistics more generally, particularly phonetics, phonology, language
change, and sociolinguistics.
SAM TILSEN is Assistant Professor in the Department of Linguistics at Cornell Univer-
sity. He received his PhD from the University of California, Berkeley in 2009. He is
interested in how speech movements are represented, planned, and coordinated, with
the aim of understanding the relation between long-term memory representations
and real-time speech production.

ALAN C. L. YU is Associate Professor of Linguistics and the College at the University of
Chicago. He also directs the Phonology Laboratory and the Washo Documentation
Project. His research focuses on phonological theory, phonetics, language typology,
and language variation and change. He is the author of A Natural History of Infixation
(2007, Oxford University Press) and co-editor of the Blackwell Handbook of Phono-
logical Theory, 2nd edition (Wiley-Blackwell, 2011).
Part I

What is phonologization?
1

Enlarging the scope of phonologization*

LARRY M. HYMAN

"... the original cause for the emergence of all alternants is always purely anthropo-
phonic"
Baudouin de Courtenay (1895 [i9/2a: 184])

1.1 Introduction
It is hard to remember a time, if ever, when phonologists were not interested in the
relation between synchrony and diachrony. From the very founding of the discipline,
a constant, if not always central, issue has been the question of how phonology comes
into being. As can be seen in the above quotation from Baudouin de Courtenay, the
strategy has usually been to derive phonological structure from phonetic substance.
The following list of movements dating from the early generative period provides a
partial phonological backdrop of the wide-ranging views and interest in the relation
between synchrony and diachrony, on the one hand, and phonetics and phonology,
on the other:
(i) a. classical generative phonology (Chomsky and Halle 1968)
b. diachronic generative phonology (Kiparsky 1965, 1968; King 1969)
c. natural phonology (Stampe 1972, Donegan and Stampe 1979)
d. natural generative phonology (Vennemann 1972a, b, 1974; Hooper 1976a)
e. variation and sound change in progress (Labov 1971; Labov et al. 1972)
f. phonetic explanations of phonological patterning and sound change (Ohala
1974, 1981; Thurgood and Javkin 1975; Hombert, Ohala and Ewan 1979)
g. intrinsic vs. extrinsic variations in speech (Wang and Fillmore 1961; Chen
1970; Mohr 1971)

* Earlier versions of this chapter were presented at the Symposium on Phonologization at the University
of Chicago, UC Berkeley, the Laboratoire Dynamique du Langage (Lyon), MIT, SOAS, and the Univer-
sity of Toronto. I would like to thank the audiences there, and especially my colleagues, Andrew Garrett,
Sharon Inkelas, and Keith Johnson, for their input and helpful discussions of the concepts in this chapter.
Thanks also to Paul Newman and Russell Schuh for discussions on Chadic.
For some of the above scholars the discovery of phonetic and/or diachronic moti-
vations of recurrent phonological structures entailed the rejection of some or all of
the basic tenets of classical generative phonology, as represented by Chomsky and
Halle's (1968) Sound Pattern of English (SPE). As a generative phonologist, I found
myself conflicted between a commitment to the structuralist approach to phonology
as reflected in the Prague School (e.g. Trubetzkoy 1939; Martinet 1960) and in SPE,
and a desire to explain this structure in terms of its phonetic and historical under-
pinnings. The resolution I opted for was to focus on the process of phonologization,
which is concerned not only with these underpinnings, but also with what happens
to phonetic properties once they become phonological. Thus, although resembling
Jakobson's (1931) term phonologization (Phonologisierung), which is better translated
as phonemicization (whereby an already phonological property changes from allo-
phonic to phonemic), I intended the term to refer to the change of a phonetic property
into a phonological one. Definitions of phonologization from this period include the
following:

A universal phonetic tendency is said to become 'phonologized' when language-specific refer-


ence must be made to it, as in a phonological rule. (Hyman 1972: 170)
phonologization, whereby a phonetic process becomes phonological.... (Hyman 1975: 171)
... what begins as an intrinsic byproduct of something, predicted by universal phonetic prin-
ciples, ends up unpredictable, and hence, extrinsic. (Hyman 1976: 408)

As opposed to Jakobson's term, which referred to the development of contrasts, my


specific interest was in the development of allophony. However, as seen in the last
quotation above, I explicitly referred to Wang and Fillmore's intrinsic vs. extrinsic
terminology, which they identify as follows:
... in most phonetic discussion, it is useful to distinguish those secondary cues which reflect
the speech habits of a particular community from those which reflect the structure of the
speech mechanism in general. The former is called extrinsic and the latter, intrinsic. (Wang
and Fillmore 1961: 130)

Since a clear distinction was not always made at the time between allophonic varia-
tions which might be captured by phonological rule and language-specific phonetics,
the two were often lumped together. The result is a potential ambiguity, depending on
whether one makes a distinction between allophonics and language-specific phonet-
ics and, if so, whether the latter is identified as 'phonology' or as phonetics.

I have two goals in this chapter. First, I wish to explore the above notion of phonol-
ogization further, specifically addressing the role of contrast in the phonologization
process. Second, I wish to show how phonologization fits into the overall scheme
of the genesis and evolution of grammar. Extending the concept of phonologization
to a wider range of phonological phenomena, I shall propose that it be explicitly
considered as a branch of grammaticalization or what Hopper (1987: 148) refers to as
'movements toward structure'.

1.2 Phonologization and contrast

As stated in section 1.1, discussions of phonologization have focused on intrinsic


phonetic variations which tend to become extrinsic and phonological. The most
transparent of these concern cases of what Cohn (1998: 30) refers to as phonetics
and phonology 'doublets'. Processes such as those listed in (2) may be phonetic in one
language, but phonological in another:

(2)     process                           subsequent developments (incl. loss of trigger)

     a. lengthening before voiced Cs:   /ab/ → [aːb]    (> aːp)
     b. palatalization:                 /ki/ → [kʲi]    (> či, ši, tsi, si)
     c. high vowel frication:           /ku/ → [kˣu]    (> kxu, kfu, pfu, fu)
     d. anticipatory nasalization:      /an/ → [ãn]     (> ãN, ãː, ã)
     e. umlaut, metaphony:              /aCi/ → [æCi]   (> eCi, eCa, eC)
     f. tonogenesis from coda:          /aʔ/ → [áʔ]     (> á)
     g. tonogenesis from phonation:     /a̤/ → [à̤]      (> à)
     h. tonal bifurcation from onset:   /bá/ → [bǎ]     (> pǎ)

In order for there to be a phenomenon of phonologization and such doublets, it


is of course necessary to recognize a difference between phonetics and phonology.
Some of the characterizations of phonetics vs. phonology by those who assume a
difference (e.g. Cohn 1998, 2007; Keating 1996; Keyser and Stevens 2001; Kingston
2007; Pierrehumbert 1990; Stevens and Keyser 1989, etc.) are presented in (3).

(3)     phonetics          phonology

     gradient        >  categorical
     continuous      >  discrete, quantal
     quantitative    >  qualitative
     physical        >  symbolic
     analog          >  digital
     semantic        >  syntactic

As seen, phonetics and phonology can have very different properties. As one pro-
ponent of the distinction puts it, 'The relationship of phonology to phonetics is
profoundly affected by the fact that it involves disparate representations.' (Pierrehum-
bert 1990: 378). While most of the above descriptors are well-known and straightfor-
bert 1990: 378). While most of the above descriptors are well-known and straightfor-
ward, others are intended as analogies, e.g. analog vs. digital, semantic vs. syntactic
(Pierrehumbert 1990). It should be noted that the phonetics-phonology relationship
is not one of universal vs. language-specific, since much of phonetics is itself language-
specific (cf. below).
Two diagnostics were proposed for determining that phonologization has
occurred: (i) A phonetic effect is exaggerated beyond what can be considered uni-
versal. (ii) A 'categorical' rule of phonology must refer to the phonologized property.
As an example of the first diagnostic, the vowel length difference in English words
such as bat [bæt] and bad [bæːd] exceeds any intrinsic tendency for vowel duration
to vary as a function of the voicing of a following consonant (Chen 1970). Another
example comes from the intrinsic pitch-lowering effect of voiced obstruents which
produces the so-called 'depressor consonant' effects in many tone languages: 'Tonal
depression in Nguni languages has become phonologized. This means that there is no
longer a transparent phonetic explanation for it, and secondly that the phonetic effect
has been exaggerated.' (Traill 1990: 166).
The second diagnostic can also be illustrated via the effects of depressor consonants
in Ikalanga (Hyman and Mathangwane 1998: 197, 204). As seen in (4a), when the L-L
noun cì-thù 'thing' is followed by L-H cà-có 'your sg.' there is no tone change:

(4)  a. [cì-thù cà-có] 'your thing'      c. [zvì-thù zvà-zó] 'your things'
     b. [cì-pó câ-có] 'your gift'        d. [zvì-pó zvà-zó] 'your gifts'
In (4b), however, the H of the L-H noun cì-pó 'gift' spreads onto the pronoun, pro-
ducing a HL-H sequence. In (4c), the corresponding plural of (4a), there again is no
tone change, as expected, since the input is a L-L + L-H sequence. In (4d), the plural
and tonal correspondent to (4b), we do expect the H of -pó to spread onto the plural
prefix zvà-, as it did in the singular in (4b). However, this does not occur, because the
voiced obstruent [zv] belongs to the class of depressor consonants which block H tone
spreading in Ikalanga. Since the depressor effect must be referred to by a categorical
phonological rule (H tone spreading) the second diagnostic has been met. As is well
known to Africanist tonologists, there is a tug-of-war between the natural tendency
for tone to spread vs. the intrinsic effects of consonants on pitch:
Since L-H and H-L tend to become L-LH and H-HL as a natural horizontal assimilation
[tone spreading], it can now be observed that the natural tendency of tones to assimilate
sometimes encounters obstacles from intervening consonants. Voiceless obstruents are adverse
to L-spreading, and voiced obstruents are adverse to H-spreading. The inherent properties
of consonants and tones are thus often in conflict with one another. In some languages (e.g.
Nupe, Ngizim, Ewe, Zulu), the consonants win out, and tone spreading occurs only when the
consonants are favorably disposed to it. In other languages (e.g. Yoruba, Gwari), the tones
win out, as tone spreading takes place regardless of the disposition of intervening consonants.
(Hyman 1973: 165-6)

In the terms of Archangeli and Pulleyblank (1994: 211), voiced obstruents are 'antag-
onistic' to H tone spreading, while other consonants are 'sympathetic'.
Two questions concerning what phonologization was (is) supposed to be are:
(i) Does 'intrinsic' mean unavoidable, i.e. 'universally present', or 'universal tendency'?
(ii) Does phonologization require that the phonetic feature of the trigger be con-
trastive? As mentioned earlier, it is widely accepted that one must distinguish between
universal and language-specific phonetics (Keating 1988, 1990; Cohn 1993; Kingston
and Diehl 1994, etc.). What this means is that there are two diachronic reanalyses
which need to be recognized, as in (5):
(5)  a. universal phonetics  >  b. language-specific phonetics  >  c. phonology
        ('automatic')               ('speaker-controlled')             ('structured')
First, a perhaps unavoidable universal phonetic property takes on a language-specific
form which cannot be said to be strictly automatic or mechanical. The result is still
phonetic in the sense of (3), e.g. it may still be gradient rather than categorical. The
second diachronic reanalysis occurs when the language-specific property becomes
phonological in the traditional sense, i.e. structured, categorical.
This brings us to the question: What does it mean to be 'phonological'? This will
determine where 'phonology' begins in (5). For some, anything language-specific,
hence (5b), is phonology by definition: '... any rule, gradient or binary, phonologized
or categorical, to the extent that it appears in the grammar is fully phonological'
(Hajek 1997: 16). The generative approach is to view phonology as a module of gram-
mar. However, there is a notoriously fuzzy boundary between postlexical phonology
(Kiparsky 1982) and phonetic implementation (Pierrehumbert 1980): 'The fact that
it is difficult to draw a line follows in part from the conception of phonologization
(Hyman 1976), whereby over time low-level phonetic details are enhanced to become
phonological patterns' (Cohn 2006: 30). Even some of the basic distinctions in (3)
have come under scrutiny. Cohn (2006) and Chitoran and Cohn (2009) consider the
possibility of categorical phonetics and gradient phonology, while Silverman (2006a:
214) apparently considers all of phonology to be gradient:

... there is no such thing as 'phonologization': at the proper level of description, all phonological
patterns are sound changes in progress, as they are all gradiently and variably implemented, and
they are all ever-changing... gradience and variation are the very stuff of phonology and sound
change...

If the boundary between phonetics and phonology is elusive, perhaps one can less
ambiguously characterize phonologization in terms of contrastiveness, the hallmark
of structuralist phonology. Here the central question is: What does it mean to be
'contrastive'? As summarized in (6), the term has been used to refer to different levels
of representation and to different domains:

(6)  a. contrastive at what level?     b. contrastive within what domain?

        morphophonemic (URs)              within morphemes
        phonemic                          within words (or at stem or word boundaries)
        phonetic                          across words (or at phrase or utterance
                                          boundaries)

Even if we limit ourselves to the quest for minimal pairs, hence words, it is still
necessary to distinguish between underlying and surface contrasts. Many of the exam-
ples of phonologization discussed in the 1970s concerned the 'redundant' effects of
contrastive features, e.g. [voice] in the following two examples:

(7)     voice contrast          redundant effect          contrastive effect

     a. /bæt/, /bæd/     →    [bæt], [bæːd]       >    [bæt], [bæːt]
     b. /pá/, /bá/       →    [pá], [bǎ]          >    [pá], [pǎ]

(7a) concerns the oft-reported vowel length difference observed before voiced vs.
voiceless stops in English (see Purnell et al. 2005 for updated findings and more
subtle discussion). Since vowels are also longer before fricatives and sonorants, e.g.
gas [gæːs], man [mæːn], the process appears to be one of shortening before voiceless
stops (House 1961). Be that as it may, the durational differences are first phonologized
and then potentially phonemicized by final devoicing, as seen in the outputs. Concep-
tualized this way, the underlying voice contrast would correspond to a surface length
contrast in English.
The second case, (7b), has been much discussed in both the phonologization and
tonogenesis literature. Here we start with a H tone on syllables whose obstruent
onset differs in voicing. As seen, the intrinsic lowering effect of voicing on f0 is first
phonologized to create a rising tone on [bǎ], whose consonant subsequently under-
goes devoicing. The result is a 'tonal bifurcation' whereby the rising tone becomes
phonemic.
Much of the work on phonologization concerns such cases of re- or transphonol-
ogization of contrasts (Jakobson 1931; Hagège and Haudricourt 1978). There are at
least two possible interpretations of the voicing effects on duration and f0. The first is
that the phonologizations in (7) represent an enhancement of phonetic voicing. The
second is that they instead enhance the phonological [voice] CONTRAST. The latter
view of phonologization is explicitly adopted by a number of researchers:

... because no other articulation is likely to produce the F0 depression as an automatic byprod-
uct, the depression must itself be a product of an independently controlled articulation, whose
purpose is to enhance the [voice] contrast. (Kingston and Diehl 1994: 425)

Enhancement of the type we are considering here can be considered as a form of 'fine-tuning'
of a basic phonological contrast. (Keyser and Stevens 2001: 287)

While it is possible to view such 'redundant' effects of voicing as enhancements


which provide additional cues of the voicing contrast, the question is whether this
strengthens vs. weakens the contrasting feature, here [±voice]. It is quite striking how
allophonic variations such as in (7) often lead to the loss of the original contrast. In
fact, some have seen transphonologization as having the purpose of maintaining a
contrast which is being threatened:
transphonologization: an opposition with distinctive value is threatened with suppression;
it maintains itself through the displacement of one of the two terms, or of the entire opposition,
a pertinent feature continuing, in any case, to distinguish these terms. (Hagège and Haudricourt
1978: 75)

On the other hand, phonologization need not imply transphonologization:

I will use the term phonologization throughout to mean specifically the innovation of changes
to phonological representations, whether these result in neutralization of contrasts or not.
(Barnes 2006: 16)

However, is phonologization always motivated by contrastiveness? In the present


context the question is: What can contrastive [voice] do that phonetic voicing can't?
This question will be further examined in section 1.2.1.

1.2.1 Voiced prenasalized consonants and tone


Recall that we are concerned with determining whether it is only contrastive [±voice] which
may trigger phonologization. As a hypothetical test case, consider a language which
has /t, k, b, d, g/, but no /p/. As seen in (8a), we begin with CV inputs with H tone:

(8)       input         phonologized       transphonologized

     a.  tá, ká       tá, ká             tá, ká
     b.  dá, gá       dǎ, gǎ             tǎ, kǎ
     c.  bá           bǎ ?               pǎ ?

In (8b) these H tones become rising after [d] and [g], a phonologization which could
be seen as an enhancement either of phonetic voicing or of their contrast with /t/ and
/k/. The real question is what would happen in (8c), where /b/ is phonetically voiced,
but does not contrast with /p/. Would the redundant voicing of [b] have an f0 effect,
as shown, or would this phonologization be blocked because there is no contrast with
[p] ? The phonological enhancement theories of Kingston and Diehl (1994) and Keyser
and Stevens (2001) would need to be tweaked by some notion of phonetic analogy
(Vennemann 1972a) if (8c) does develop the rising tone. On the other hand, (8c)
seems to be allowed, if not predicted, by Ohala's (1981, 1992, 1993b) theory of sound
change, which involves a reinterpretation of the phonetic signal, as well as Kiparsky's


(1995: 656) 'priming effect': 'Redundant features are likely to be phonologized if
the language's phonological representations have a class node to host them'. That is,
the intrinsic f0 effect of voiced obstruents is most likely to become phonologized in
languages which already have a tonal contrast (Matisoff 1973; Svantesson 1989).1
While the above example and discussion are hypothetical, a real test case can be
derived from the following characteristic effects of 'depressor consonants' in African
tone systems:
(9)  a. trigger                                      b. block

     i.   lowering of H or L                         i.   raising of H or L
     ii.  conversion of H to LH or L                 ii.  H tone spreading (cf. (4d))
     iii. delinking of H (esp. if followed by H)     iii. H tone plateauing

To account for the relation between consonant types and tone in synchronic phonolo-
gies, Halle and Stevens (1971) and Halle (1972) proposed the following distinctive
feature analysis, where [stiff] = stiff vocal cords and [slack] = slack vocal cords:
(10)            tones             voiceless obstruents    sonorants     voiced obstruents
            H     M     L         p t k f s               m n l w y     b d g v z

   stiff    +     -     -         +                       -             -
   slack    -     -     +         -                       -             +

As seen, both H tone and voiceless obstruents are [+stiff, -slack], while L tone and
voiced obstruents are [-stiff, +slack]. Both M tone and sonorants are [-stiff, -slack].
Like vowels, sonorant consonants readily accept any tone, while obstruents have the
tonal affinities indicated above. While these features are often assumed to this day,
there are additional complications, as noted in the observations in (11).
(11) a. The above three-way distinction is not sufficient for tone (there can be a
fourth or fifth contrastive pitch level).
b. The above three-way distinction is not complete for consonants (Hombert
1978), e.g.:

i. implosives are often pitch-raisers, hence expected to pattern with voice-


less obstruents
ii. breathiness and creak are typically pitch lowerers; aspiration is more
complex.
c. While the 'best' pitch depressors are fully or breathy voiced obstruents,
and although the phonetics of voice is complex (Kingston and Diehl 1994),
depressor consonants readily become unvoiced, e.g. in Nguni (Schachter
1976; Traill 1990; Downing 2009).
d. Prenasalized voiced stops [mb, nd, ŋg] are sometimes depressors, sometimes
not.

¹ Nick Clements has brought Ewe to my attention, where /b/ and /ɖ/ are depressors even though /p/
occurs only in borrowings, and there is no voiceless counterpart to /ɖ/ at all (see Clements 2005).
It is the observation in (11d) which potentially bears on the question with which we
are concerned: Is it phonetic voicing or enhancement of CONTRASTIVE [voice] that
causes depressor effects? The following quotations show that there is a widespread
belief that the voicing on depressor consonants is necessarily contrastive:
... F0 will only vary with the presence of voicing in stops that contrast for [voice].... (Kingston
and Diehl 1994: 436)

Since implosives and prenasalized stops are not contrastively voiced [in Suma], they are
assumed to be unspecified for the feature [voice] and, therefore, naturally excluded from the
depressor consonant group. (Bradshaw 1995: 263)
It should be emphasized that only phonologically voiced consonants (that is, those opposed
to voiceless consonants of the same place and manner of articulation) exert a lowering effect
[in Yulu], which is never the case for the phonetically voiced consonants of the (partially)
glottalized, prenasalized, nasal, continuant, and vibrant series. This state of affairs proves, if
proof were needed, the relevance of a phonological approach to articulatory units. (Boyeldieu
2009: 199n; emphasis my own)

While Bradshaw and Boyeldieu assume that implosives fail to lower pitch because they
are non-contrastively voiced, the prevalent view has been that rapid lowering of the
larynx and tensing of the vocal cords provide quite adequate phonetic explanations
for why implosives tend to pattern with voiceless obstruents and H tone.² On the
other hand, the ambivalent behavior of the voiced prenasalized stops [mb, nd, rjg],
which are sometimes depressors and sometimes not, is indeed puzzling. The question
is whether their ambivalence has anything to do with contrastiveness.
As a practicing structuralist phonologist, my initial hypothesis was that /mb, nd, ŋg/
would function as depressor consonants only in languages where they contrast with
/mp, nt, ŋk/. In order to test this hypothesis, I examined the relatively small group
of African tone languages which have both depressor consonant effects and voiced
prenasalized consonants (ND), whether contrastive with their voiceless counterparts
(NT) or not. The results are presented in the following table:
(12)                     ND contrasts with NT      ND doesn't contrast with NT

     ND = depressor      Nguni*                    Lamang, Musey, Ngizim,
                                                   Ouldémé, Podoko, Mbuko

     ND ≠ depressor                                Bole, Geji, Miya, Zar; Yulu,
                                                   Suma, Mijikenda*

(*Nguni includes Swati, Zulu, Ndebele, Xhosa; Mijikenda includes Giryama,
Digo, Kauma, Ribe.)

² More recently, Tang (2008) has argued that the tonal effects of implosives can pattern with those
of voiceless obstruents, voiced obstruents, or sonorants in different languages. While implosives do not
contrast in voicing in these languages, it is yet to be determined to what extent these differences can be
attributed to differences in phonetic production. The same conclusion will be reached with respect to voiced
prenasalized stops.

As seen, three of the four logical combinations of the two properties ([±contrast],
[±depressor]) were found. Setting aside borrowings (see below), the only languages
with a /NT, ND/ contrast were the Bantu Nguni languages of Southern Africa. Of
the remaining languages, all of those in the upper right quadrant are Chadic, as are
the first four languages of the lower right quadrant. Yulu is Central Sudanic, Suma is
Ubangian, and the Mijikenda languages are Bantu.
From (12) we conclude the following: (i) If /ND/ contrasts with /NT/, /ND/ will
have the same f0 effects as /D/. (ii) If ND does not contrast with NT, ND may have
the same f0 effects as /T/ or /D/. As mentioned, the first group consists solely of the
Nguni languages, e.g. Swati:
In all cases [in Swati], the prenasalized counterparts of depressor consonants are themselves
depressor consonants, while the prenasalized counterparts of non-depressor consonants are
themselves nondepressors. (Schachter 1976: 213)

It may be relevant to note that the Nguni languages have a rule of postnasal deaspi-
ration (NTʰ → NT). The alternations in (13) illustrate the application of this rule in
Ndebele (Pelling 1971; Galen Sibanda, pers. comm.):
(13)  a. u-pʰondo 'horn'        pl. im-pondo     cf. impisi 'hyena'
         u-pʰawu 'sign, mark'   pl. im-pawu          imbiza 'pot, pan'
      b. u-tʰango 'fence'       pl. in-tango     cf. intaba 'hill, mountain'
         u-tʰungo 'rafter'      pl. in-tungo         indaba 'matter, news'
      c. u-kʰuni 'firewood'     pl. iŋ-kuni      cf. inkalo 'waists, hill passes'
         u-kʰalo 'waist'        pl. iŋ-kalo          ingalo 'arm'
As seen in the forms to the right, this distributional constraint produces (near) min-
imal pairs involving unaspirated [mp, nt, ŋk] vs. voiced [mb, nd, ŋg]. The latter's
depressor effect on tone may therefore be a welcome cue for the voicing contrast.
It is interesting to note in this context that a much larger group of Bantu languages
have a rule of postnasal aspiration (NT → NTʰ), e.g. Mwiini, Zigula, Pokomo, Pare,
Shambala, Ngulu, Bondei, Namwanga, Chichewa. This process may then lead to the
transphonologization of aspiration (NT > Tʰ), as in Swahili, Yaa, Giryama, Digo,


Yaka, Cokwe, Makua, and Venda. As a result, the Mijikenda languages Giryama and
Digo have a surface contrast between Tʰ and ND consonants, the latter of which are
not depressors.
Turning to the languages in the right-hand column of (12), where ND does not
contrast with NT, it should be noted that the difference between the two groups of
languages cannot be attributed to the nature of the tonal property in question. Quite
comparable tonal processes occur in languages which treat ND differently, e.g. register
lowering after ND in Podoko, but not in Yulu, blocking of H tone spreading by ND in
Ngizim, but not in Bole or Zar.
The question is how to explain the inconsistent depressor status of ND when voicing
is non-contrastive. We will mention four potential accounts. The first is to seek an
explanation in phonetic terms: NDs may have slightly different phonetic properties
in languages where they function as depressors vs. those languages in which they
function as non-depressors. Perhaps ND is fully voiced in one language, but partially
devoiced in another. Or perhaps there are slightly different phonations associated with
ND in the different languages. Another phonetic difference could be in the timing of
the nasal vs. oral portions of the unit: depressor NDs might have a longer D phase than
non-depressor NDs. Since Cohn and Riehl (2008) have recently argued that there is
no phonetic difference between a prenasalized stop (ᴺD) and a post-stopped nasal
(Nᴰ), pointing out that the D phase is universally short, this does not seem likely,
nor is there any motivation for recognizing monosegmental ᴺD vs. bisegmental ND.
In the absence of instrumental evidence, speculations on phonetic differences are
simply that.
A second approach is to seek an explanation in the history of the different lan-
guages. For instance, perhaps ND behaves as a depressor when it derives from *D,
perhaps as 'hypervoicing' (Iverson and Salmons 1996), but as a non-depressor when
it derives from *N via partial denasalization (Wetzels 2007). Although such sources
have been documented in Mexico, Amazonia, New Guinea, and other parts of the
world, the history is less clear in Chadic, which we have seen to be inconsistent in
how it treats ND and tone. A different kind of history might be one involving analogy:
Perhaps languages with depressor ND have (or had) processes by which D and ND
were morphophonemically related, which then caused the pitch-lowering effect of D
to extend to ND. Perhaps this relationship was missing in the other languages, which
may instead have had a relation between N and ND. Like the first two accounts, this
one also is speculative in the absence of historical evidence.
A third strategy is to recognize ND as a separate category from the three conso-
nant types distinguished in (10). Perhaps the high-to-low hierarchy of consonant-f0
relations should be T > N > ND > D (where N = sonorants), with languages
drawing the depressor line in different places, as in (14).
(14)                     H          M                  L

     ND = depressor      T          N     |     ND     D
     ND ≠ depressor      T          N           ND  |  D

The problem is that we do not know what the intrinsic effects of ND on f0 really are.
The hierarchy in (14) suggests that ND has more of a depressor effect than N, but less
than D. We don't really know this other than from the phonological facts, which are
inconsistent. What is needed are instrumental studies of ND in languages which have
not phonologized depressor consonant effects. We need to do this both for languages
which have a phonetic NT/ND contrast, e.g. Luganda, and which don't, e.g. Kinande,
ultimately establishing what the intrinsic effects of ND are expected to be even in
non-tone languages.
The fourth and last account seeks an explanation in terms of contrast, but in the
absence of/NT/ suggests that it is a different contrast that is being enhanced: /ND/ vs.
/N/. Languages which treat ND as a depressor do so to distinguish it further from N.
Particularly if the oral phase is minimal, there could be perceptual confusion between
ND and N, and hence transphonologization via the tone of the following vowel. Such
has happened in Masa, a Chadic language closely related to Musey. While /H/ tone
can occur after any consonant, there is a (near-) predictability of L vs. M tones as in
(15) (Caïtucoli 1978: 77):

(15)     initial root segments                                      tone

     a.  b, d, g, v, z, ʒ, ɮ                                        L
     b.  p, t, k, f, s, c, ɬ, h, ɓ, ɗ, l, r, w, y, a, e, i, o, u    M
     c.  m, n, ŋ                                                    L, M

As seen in (15a), L tone appears after a voiced obstruent, while M tone appears if
the root-initial segment is a voiceless obstruent, an implosive, or an oral sonorant,
including vowels. While several Chadic languages have similar distributions of L and
M tones, the originality of Masa is that it has a L vs. M contrast after nasals. The
reason, of course, is that there has been a sound change of *mb, *nd, *ŋg > m,
n, ŋ, with the original contrast being transphonologized in terms of L vs. M pitch.
Crucially, those roots which had historical *ND now have L tone, while those which
began with *N have M tone. Since closely related Musey treats ND as a depressor
(cf. (12)), we can be reasonably certain that the same was true in pre-Masa before
the prenasalized consonants lost their oral release. While we cannot predict which
nasals will be depressors, it is possible to say that contrastive [+voice] necessarily
conditions L tone: 'Mid tone is incompatible with voiced consonants that have a
voiceless counterpart...' (Caïtucoli 1978: 77).

Transphonologization of an earlier ND vs. N contrast is not without parallel. As


seen in (16), such a merger, either complete or in progress, has been transphonolo-
gized as a contrast in vowel nasalization in several Western Austronesian languages
(Court 1970):

(16)                          *NDV                           *NV

     a. Sea Dayak             [naŋa] 'to set up ladder'      [nãŋã] 'to straighten'
     b. Sundanese             [mandi] 'to bathe'             [mãnĩ] 'very'
     c. Ulu Muar Malay        [məŋoet] 'to twitch'           [mə̃ŋõẽʔ] 'to bellow'
     d. Məntu Land Dayak      [amᵇak] 'gong stick'           [amãk] 'sleeping mat'
                              [ɲinᵈaʔ] 'to love'             [ɲĩnãʔ] 'snake (sp.)'

As seen, progressive vowel nasalization appears to set in before *ND completely


loses its oral release, just as we can assume the depressor effect of ND to precede its
simplification to N in Masa. Western Austronesian and Chadic are thus quite parallel,
the difference being the feature that is chosen for the transphonologization. While
Western Austronesian is sensitive to the nasal vs. oral release of N vs. ND, the contrast
which is enhanced in Chadic is the sonorant vs. obstruent release of N vs. ND. As we
have seen, it is the combination of obstruent and [+voice] that produces the pervasive
f0 lowering seen both in African tone systems and elsewhere. The problem, of course,
is to show with certainty that the ND depressor languages in the upper right quadrant
of (12) have a shaky ND vs. N contrast in need of reinforcement as against a more
robust ND vs. N contrast in the languages of the lower right quadrant.
In (12) the different African languages were classified according to whether they have a contrast between /NT/ and /ND/. One complication concerns what to do about languages which have NT only in borrowings, e.g. Ikalanga kampa 'camp', pente 'paint', donki 'donkey'. In this language of the Shona group, inherited *mp, *nt, *ŋk become pʰ, tʰ, kʰ, whereas in Shona proper *mp, *nt, *ŋk > mʰ, nʰ, ŋʰ. In both languages the resulting consonants lower pitch, thereby illustrating that *NT can also develop into depressor consonants.
To summarize, we have seen that depressor NDs suggest that the effects of a non-contrastive [+voice] trigger may also be phonologized. As Sharon Inkelas (pers. comm.) has reminded me, this is reminiscent of the interaction of predictable postnasal voicing with Lyman's Law in Japanese (Itô, Mester, and Padgett 1995). As in the Japanese case, we are still faced with how to formalize the synchronic differences between the two groups of languages in the right-hand column of (12). This turns out not to be a problem, but rather a case of having too many possibilities. First, since postnasal voicing is redundant, one could analyze the non-depressor vs. depressor difference as /NT/ vs. /ND/. Or, one could use different features or feature geometries for the two kinds of NDs, underspecification, or perhaps different contrast hierarchies (Dresher 2003, 2009; Mackenzie 2008), as in (17).

(17)  a. Ngizim                              b. Miya

          [+voice]       [-voice]               [+prenasalized]    [-prenasalized]
          /      \                                                    /        \
   [+prenasal]  [-prenasal]                                     [+voice]    [-voice]
As seen in (17a), the primary contrast is [±voice], which is further differentiated into [±prenasalized] (or whatever feature/representation distinguishes D and ND). In (17b), however, the first cut is [±prenasalized], and only [-prenasalized] consonants are further distinguished for [±voice]. If tone is sensitive to [+voice], ND consonants will be depressors in Ngizim, but non-depressors in Miya.
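The asymmetry can be checked mechanically. Below is a minimal sketch in Python, assuming the specifications in (17) and schematic segment labels T, D, ND; the dictionary encoding is illustrative only, not a claim about either language's full inventory:

# Contrast hierarchies from (17): a segment is specified only for the
# features that are contrastive on its branch of the hierarchy.

NGIZIM = {                          # (17a): first cut is [±voice]
    'T':  {'voice': '-'},
    'D':  {'voice': '+', 'prenasal': '-'},
    'ND': {'voice': '+', 'prenasal': '+'},
}

MIYA = {                            # (17b): first cut is [±prenasalized]
    'ND': {'prenasal': '+'},        # no [voice] specification at all
    'D':  {'prenasal': '-', 'voice': '+'},
    'T':  {'prenasal': '-', 'voice': '-'},
}

def is_depressor(segment, inventory):
    """Tone lowering is sensitive only to a [+voice] specification."""
    return inventory[segment].get('voice') == '+'

for name, inv in (('Ngizim', NGIZIM), ('Miya', MIYA)):
    print(name, 'ND is a depressor:', is_depressor('ND', inv))
# Ngizim ND is a depressor: True
# Miya   ND is a depressor: False

Since Miya's ND bears no [voice] specification under (17b), a tone rule referring to [+voice] simply cannot see it.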
The issue of providing different underlying representations for the 'same' segment types in different languages is an old tradition, and it has come in handy in treating nasality (see Piggott 1992 and Rice 1993, for instance). In order for such a move to be compelling it must not appear circular or ad hoc, but rather have implications that hold throughout the language in question. So far this has turned out to be a problem. Schuh (1998: 13), for example, treats the non-depressor NDs of Miya as [+sonorant], but recognizes that this poses a problem for one of his rules:

... if the last consonant in a word is an obstruent, it must be followed by /a/, whereas if the last consonant is a sonorant, e.g. a nasal, it cannot... Here, prenasalized consonants pattern with obstruents (gambaɗa 'gourd' vs. gwagəm 'dove').

While he proposes to account for the inconsistency by proposing that ND begins as a sonorant (hence a non-depressor) and ends as an obstruent (hence requiring schwa), it has already been pointed out that the same tonal process may occur in both types of languages in the right-hand column in (12). Since my interest here is in the nature and motivation of the phonologization process, I will leave further implementational issues to another time.³

1.2.2 ATR harmony in Punu


In the preceding subsection we have seen that it is possible for phonologization to
be triggered by a non-contrastive feature. In this section I present a perhaps even
more striking case of this involving ATR vowel harmony in Punu, a Bantu language

³ Louis Goldstein has suggested to me that when the voicing of ND is non-contrastive, speakers need not invoke articulatory mechanisms that result in lowered pitch, whereas such mechanisms are unavoidable when there is a contrast with NT. It is significant that all of the examples cited by Lee (2008) involve depressor consonants whose voicing is contrastive. Most striking is Tsonga (Baumbach 1987), where NDs do not contrast with NT and are not depressors, but their contrastive breathy counterparts NDʱ are. In such a case, there is a disincentive for ND to exploit the gesture(s) which result in the lowering of f0. Thanks to both Louis Goldstein and Maria-Josep Solé for helpful discussions of these matters.

spoken in Gabon. It is useful to distinguish two prototypes of vowel harmony (VH), each of which shows clear structure-dependency. The first is root-controlled VH (Clements 1981) whereby harmony expands out from a root vowel to affixes. This type of harmony is often bidirectional, feature-filling, and structure-preserving. The second prototype is non-root-controlled and is often referred to as 'metaphony' or 'Umlaut'. In this case VH is anticipatory, hence unidirectional. Suffixes can be triggers, while prefixes rarely, if ever, are (Hall and Hall 1980; Hyman 2002, 2008a; Krämer 2003). Prefix-triggered VH on a following vowel is rare or non-occurring because it is neither root-controlled nor anticipatory (Hyman 2002). Attempts to attribute VH to the phonologization of vowel coarticulation (Ohala 1994b; Beddor and Yavuz 1995; Przezdziecki 2005) must account for why VH is typically unbounded and word-delimited (cf. Barnes 2006: 197-200).
In this section we are concerned with non-contrastive ATR harmony in Punu (Kwenzi Mikala 1980; Fontaney 1980). In Punu the five vowels /i, ɛ, u, ɔ, a/ contrast within the first CV of a root, most of which are CVC-. Prefixes, suffixes, and non-initial root vowels are limited to /i, u, a/. Although /ɛ, ɔ/ are limited to the first syllable of a root, they become tense or [+ATR] in the following contexts (Kwenzi Mikala 1980: 9):

(18) a. /ɛ, ɔ/ → [e, o] / __ Ci
     b. /ɛ, ɔ/ → [e, o] ~ [ɛ, ɔ] / __ Cu
     c. /ɛ, ɔ/ → [ɛ, ɔ] / __ Ca
Other than occurring in occasional ideophones, the only other occurrences of [e] and [o] result from the fusion of /a+i/ and /a+u/, respectively, which succeed each other only in prefixes:

(19) a. /a-i-lab-i/ → [e-lab-i] 'he sees' (-lab- 'see' is the root)
     b. /a-u-lab-a/ → [o-lab-ə] 'he will see'

Finally, an /a/ which occurs 'post-radically', i.e. after the first syllable of the root, is automatically realized as [ə].
With the above vowel processes established, the distribution of underlying and surface vowels can be summarized by position, as in (20):

(20)              Prefixes           Root                     Suffixes/post-radical vowels
     Underlying:  /i, u, a/          /i, ɛ, u, ɔ, a/          /i, u, a/
     Surface:     [i, e, u, o, a]    [i, e, ɛ, u, o, ɔ, a]    [i, u, ə]
Since the fusions in (19) result in [e, o], not *[ɛ, ɔ], a feature such as [+tense] or [+ATR] can be assumed to be phonologically 'active' on /i, u/. In (21) I assume a privative feature analysis, where each of the features A, F, R, and O is phonologically active (Hyman 2002, 2003):

(21)               underlying vowels         derived vowels
                   i    ɛ    u    ɔ    a     e    o    ə
     ATR     A     x         x               x    x
     Front   F     x    x                    x
     Round   R               x    x               x
     Open    O          x         x    x     x    x

As seen, the postradical process /a/ → [ə] would be interpreted as the deletion of the Open feature (which technically yields [ɨ], from which Punu [ə] is non-distinct). The crucial point concerns the assimilation of /ɛ/ and /ɔ/ to [e] and [o] before /i/ and /u/. This clearly has to be viewed as a phonologization of the common tendency to tense mid vowels when they are followed by a high vowel in the next syllable. However, it can be observed from the feature specifications in (21) that the ATR feature, although active, is non-contrastive on the input vowels: without ATR, /i/ and /u/ would still be distinct from /ɛ, ɔ, a/ in not having an Open feature. Thus, the tensing process involves the phonologization of a non-contrastive feature.
Recall from section 1.2.1 that we allowed for the possibility that non-contrastively voiced ND might exert a depressor effect by virtue of its contrast with plain nasals. It is hard to make a similar case for Punu. Since post-radical /i/ and /u/ contrast only with /a/, which is realized as [ə], there seems to be little, if any, need to enhance this highly redundant, minimal contrast. In fact, there are additional processes which further obscure post-radical vowels. The first two in (22a, b) concern R- and F-VH, while /a/-reduction is repeated in (22c).

(22) a. a, i → u / __ Cu                  i      u      a
     b. a → i / __ Ci               i    i-i    u-u    i-ə
     c. a → ə                       u    u-i    u-u    u-ə
                                    a    i-i    u-u    ə-ə

The rules in (22a, b) result in considerable loss of contrast. As seen in the distributions to the right, nine phonological inputs result in only six distinct outputs. What's worse, when /CɛC-aC-i/ and /CɛC-aC-u/ are realized as CeCiCi and CeCuCu, the input /a/ is no longer recoverable. The inescapable conclusion is that phonologization is not necessarily triggered by contrastiveness, nor does it necessarily lead to transphonologization (cf. Blevins 2004: 43). While Punu may ultimately develop an underlying seven- (or eight-) vowel system, the mid-vowel ATR harmony appears to have been phonologized as a 'mere' articulatory convenience!
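The 'nine inputs, six outputs' arithmetic can be verified mechanically. The following minimal sketch in Python assumes that the rules in (22) apply anticipatorily (V1 harmonizes to V2) before /a/-reduction; the encoding is an illustration, not part of the analysis:

def punu_postradical(v1, v2):
    # (22a) a, i -> u / __ Cu
    if v2 == 'u' and v1 in ('a', 'i'):
        v1 = 'u'
    # (22b) a -> i / __ Ci
    elif v2 == 'i' and v1 == 'a':
        v1 = 'i'
    # (22c) remaining post-radical /a/ reduces to schwa
    v1 = 'ə' if v1 == 'a' else v1
    v2 = 'ə' if v2 == 'a' else v2
    return v1 + '-' + v2

inputs = [(a, b) for a in 'iua' for b in 'iua']
outputs = {punu_postradical(a, b) for a, b in inputs}
print(sorted(outputs))                    # i-i, i-ə, u-i, u-u, u-ə, ə-ə
print(len(inputs), '->', len(outputs))    # 9 -> 6

As the output set shows, /a/ before a high vowel neutralizes completely with the corresponding high-vowel inputs, which is exactly the unrecoverability of /a/ noted above.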
In the following section we will extend these findings to other phonological phe-
nomena and then turn to their relation to grammaticalization in general.

1.3 Phonologization and grammaticalization


In section 1.2 it was established that phonologization is not necessarily dependent on contrastiveness. In this section I first compare this result with other types of phonologization and then suggest that phonologization should be viewed as one aspect of grammaticalization.

1.3.1 Other types of phonologization


While most of the discussion has centered around the phonologization of phonetic processes, the terms 'phonologization' and 'grammaticalization' have both been invoked to refer to the activation of any formal property within a phonology. The question, then, is whether other phonologizations such as those listed in (23) are dependent on contrastiveness.
(23) a. distinctive features (redundant or contrastive)

It will be generally assumed that the inventory of phonological features is identical to the inventory of phonetic features, and that languages implement these universal phonetic features in various linguistic ways. In other words, phonetic features can be 'phonologized'. (Hyman 1975: 58-9)

b. prosodic constituents: syllable, foot, phonological word

We may interpret the existence of the prosodic unit 'syllable' as a grammaticalization of one of the planning units for the coordination of muscular gestures. [Re the foot:] ...for each language this general rhythmic tendency is grammaticalized into particular phonological rules of foot construction. (Booij 1984: 274)

Since stress has these intrinsic properties associated with it, it is not surprising to find languages phonologizing... these properties into rules of the language. Numerous cases of strengthening in stressed syllables and weakening in unstressed syllables are attested.... (Hyman 1975: 207-8; cf. Barnes 2006: ch. 2)

c. distributional constraints on morphemes, stems, words, ultimately templatic, e.g. the maximum CVCVCV 'prosodic stem' in Tiene, where C2 must be coronal and C3 must be labial or velar (Ellington 1977; Hyman 2010a)

d. demarcation: initiality-/finality-effects (Keating et al. 2003; Barnes 2006), final glottalization (Henton and Bladon 1988; Hyman 1988); also root-affix asymmetries, stem-initial prominence (Beckman 1997; Smith 2002)

e. intonation based on the grammaticalization of three biological codes (Gussenhoven 2004: ch. 5)

f. 'boundary narrowing': pause > phrase > word; phonologization of prepausal effects, which can include final devoicing, debuccalization, glottalization, lengthening, 'nasal pause' (Aikhenvald 1996: 511-12), and loss

I would like to suggest that the 'pronunciation in isolation' form of a word is its lexical representation. At the pause... words may undergo phonetic modifications; in particular, final oral stops may become unreleased as in English and thereby lose their aspiration, and vocal cord vibration may cease early, leading to devoicing. Since they occur at the pause, and the ad-pausal variants are registered in the lexicon according to my proposal, these ad-pausal variants may next appear in connected speech and may cause or undergo further changes in their new context. (Vennemann 1974: 364)

Even a cursory glance over (23a-f) will reveal that contrastiveness is involved in some aspects of the above phonological issues, but not others. Thus, it has long been observed that syllable structure is never underlyingly contrastive; its very redundancy or predictability in fact kept syllable boundaries (and syllable constituents) out of early generative phonology:

One argument which has been raised against phonological syllables is that, unlike segments, the location of a syllable boundary within a morpheme can never be phonemic. That is, two morphemes such as /a$pla/ and /ap$la/ cannot differ only in their syllable structure.... Because syllable boundaries can be determined automatically from universal principles and language-specific facts about the segments contained in the syllables, generative phonologists have largely worked under the assumption that the syllable is unnecessary in phonology. (Hyman 1975: 192)

The syllable would thus appear to have more of an organizational function, rather than a contrastive one, as presumably do the metrical foot and higher level prosodic domains. The phonologization of prepausal effects is perhaps less clear. It is tempting to interpret languages which insert prepausal glottal stops as having phonologized utterance-final creakiness, as in British English (Henton and Bladon 1988):

... final GS may be conditioned by a number of disparate factors from all parts of the grammar. Since the common denominator appears to be 'before pause in declarative utterances', it is tempting to conclude that such GS's result, historically, from the PHONOLOGIZATION of an intrinsic variation in the speech signal. In the case of prepausal vowels, the speaker is expected to cease voicing with the completion of the vowel. When GS is not present, this cessation is smooth, in many cases giving the impression of a final slight breathiness. On the other hand, when GS is present, the cessation of voicing is abrupt, giving the impression of a non-syllabic articulation, i.e. a final 'consonant'. (Hyman 1988: 124)

While some languages suspend the final glottal stop in questions, suggesting a contrastive function between declaratives and interrogatives, the situation can be much more complex. Thus, in Dagbani (Gur; Ghana), a prepausal glottal stop is inserted if a complex set of conditions is met (Hyman 1988: 122):

(24) a. phonetic condition: before pause
     b. pragmatic condition: 'declarative' utterance (i.e. not interrogative)
     plus either:
     c. syntactic condition: final word is within scope of negation
     or:
     d. phonological condition: after a short, stem-final vowel
     e. morphological condition: final word is [-Noun]

In fact, final glottal stops do not always derive from prepausal phonologizations. In
certain Akan and Guang languages to the south of Dagbani glottal stops transparently
derive from apocope:

(25)  Akuapem/Asante   Fante                 Chumburung   Gonja
      jiri             jirʔ   'overflow'     wuri         ka-wulʔ   'skin'
      homɛ             humʔ   'breathe'      ki-furi      ku-fulʔ   'moon'
      tunɔ             tunʔ   'forge'        o-nari       e-ɲinʔ    'man'
      Akan (Schachter and Fromkin 1968: 204)   Guang (Snider 1986: 136)

In Tikar, glottal stops are restricted to prepausal position (Jackson and Stanley 1977, Stanley 1991). As proposed in Hyman (2008b), these final glottal stops result from the debuccalization of coda *t and *k, which are realized as glottal stops before a pause, but as Ø before a consonant. As part of the process, back vowels were fronted before *t, while front vowels became backed before *k, hence transphonologizing the F2 properties of the two coda consonants as per Thurgood and Javkin (1975).
Concerning boundary narrowing, although Luganda must originally have shortened bimoraic long vowels before pause, present-day final vowel shortening is subject to a number of complex factors and no longer requires pause (Hyman and Katamba 1990). It seems that while contrast can become implicated in a phonologization process, it is typically not the driving force of the phenomena enumerated in (23). If the analysis of Punu in (21) is correct, even a redundant distinctive feature, e.g. ATR, may first become activated for allophonic effects and only later become contrastive.
While a phonetic motivation has been assumed for all of the phonologizations in
sections 1.1 and 1.2, at least some of the phonological properties in (23) raise the
question of whether phonetics is the only source of phonology, i.e. the only input to
phonologization. At least three other sources of phonology have been proposed in
the literature. First, phonology has been claimed to occasionally arise from frequency
distributions:

... it is possible for a phonological generalization to arise from frequency distributions in the lexicon rather than from pure coarticulation effects. However, the former type are much less frequent, since the conditions for coarticulation effects are always present in spoken language. (Bybee 2001: 94-5)

Second, certain phonological properties have been said to derive from analogical processes:

... new phonemes can arise through morphophonemic analogy.... In all such cases... no new distinctive features are added.... morphophonemic genesis merely leads to a combination of distinctive features which had not previously been used. (Moulton 1967: 1405)

... phonetically unnatural patterns can also arise by analogical processes. Since they are phonetically unnatural, they do not have purely phonological origins, but reflect instead the generalization of fortuitous morphological patterns... even the most regular morphophonological patterns may lack phonetic origins. (Garrett and Blevins 2009: 543)

Finally, phonological distributions and alternations can be due to borrowing. For example, many phonologists assume that English has a rule of velar softening responsible for such alternations as in electric vs. electricity, where the k~s alternation is clearly borrowed from French.
The above three non-phonetic sources of phonology are of course more indirect and less frequent producers of phonology than phonetics. If 'phonologization' is interpreted literally as the creation or genesis of phonology, then all of the above can be referred to as such. However, most phonogenetic work has been concerned with the phonetics > phonology sense of the term, with simultaneous focus on the structural codification and dephoneticization of possibly universal phonetic tendencies. I return to this dual notion of phonologization + dephoneticization in section 1.4.

1.3.2 Phonologization as grammaticalization


In this section I address the question of how phonologization fits into the overall scheme of grammar and grammar change. As indicated in (26), phonologization can be identified with the second of the four stages which Baudouin de Courtenay (1895 [1972a: 197]) proposes for the development of an alternation:

(26) 1. embryonic alternation          intrinsic         takes conscious effort to perceive
     2. neophonetic alternation        extrinsic,        minimal effort to perceive
        or divergence                  phonologized
     3. paleophonetic or               phonemicized      Phonologisierung
        traditional alternation                          (Jakobson 1931)
     4. psychophonetic                 morphologized,    exceptional, arbitrary
        alternation or                 lexicalized
        correlation
i. Enlarging the scope of phonologization 23

In the above I have provided Baudouin's terminology, the modern equivalences, and a few descriptive notes. Baudouin's insights are clearly mirrored in the work of Vennemann (1972a, b), Dressler (1976, 1985), Joseph and Janda (1988), and others on the rise-and-fall 'life cycle' of phonology, where the stages in (27) are distinguished:

(27) phonetic > phonologized > phonemicized > morphologized > lexicalized > LOSS

We have already discussed phonologization and phonemicization, the latter typically being the product of transphonologization. Morphologization refers to the loss of the phonological condition on an alternation, while lexicalization comes in when specific morphemes have to be marked as undergoing vs. not undergoing the alternation. As the alternation develops greater exceptionality, one arrives at a stage where there are only relics of the original rule, followed by its loss entirely.
While intended only to capture the natural history of phonological processes, the stages in (27) are strikingly similar to the stages of Givón's (1979) proposal for the rise and fall of syntax and morphology, which I slightly reword as in (28):

(28) pragmatic > syntactic > morphological > morphophonemic > lexical > LOSS

As seen, Givón was primarily concerned with the development of syntax from pragmatics, which he refers to as 'syntacticization'. Once a property has become syntactic, it can then become morphological, as when an original independent word becomes a concatenated affix, perhaps with phonological reduction or erosion. Givón's morphophonemic stage arises when the original source is obscured, ultimately producing a phonological alternation which is morphologically conditioned or morphologized. This alternation may then become lexicalized and lost as in (26).

While phonology plays a role in Givón's view of the rise and fall of grammar, he is mainly interested in the first three stages of (28), for which he had established the mantra, 'Today's morphology is yesterday's syntax' (Givón 1971: 413). In fact, the parallel in (29) is something that phonologists readily acknowledged during this period:

(29) Phonetics : phonology :: pragmatics : syntax

... it is... very much part of the business of phonologists to look for 'phonetic explanation' of phonological phenomena just as when syntacticians look for pragmatic accounts of aspects of sentence structure, the reason is to determine what sorts of facts the linguistic system proper is not responsible for... (Anderson 1981: 497)

Phonetics provides much of the substance of phonology, and pragmatics provides much of the substance of syntax. However, the ever-present phenomena of phonologization and grammaticalization cannot be explained by reference to the origin of the substance. (Hyman 1984: 83)

Two examples of the syntacticization of pragmatic tendencies concern the following subject-object asymmetries which, as pointed out in much of the literature of the time (e.g. in various papers in Li 1976), tend to have the properties indicated in (30):

(30) subjects          vs.   direct objects
     given (old)             new
     presupposed             asserted (focused)
     definite                indefinite
     animate                 inanimate
     1st/2nd person          3rd person
     actor                   non-actor
The first example is the tendency for subjects to be definite. While in some languages the correlation between subjecthood and definiteness is statistical, in others it becomes a requirement imposed by the grammar. Looking at different discourse genres in English, Givón (1979: 52) reports the following counts of definite subjects and direct objects in declarative-affirmative-active clauses:

(31)  subject                       direct object
      definite      indefinite     definite      indefinite
      302 (91%)     33 (9%)        193 (56%)     156 (44%)
As seen, the skewing between definite and indefinite is dramatic in subject position, or, as Givón notes, 156/189 of the indefinite noun phrases occur as direct object. What is important is the relationship between English, which tends to have definite subjects, vs. various Austronesian languages which REQUIRE the subject to be definite (Keenan 1976; Schachter 1976). Put differently, English is at the pragmatic/phonetic stage, while Malagasy and Tagalog are at the syntactic/phonological stage.
The second example concerns the tendency for the direct object site to double as a focus position: '...the basic position for the focused or emphasized constituent is that position which is filled by the object in a neutral sentence' (Harries-Delisle 1978: 464). While, again, this tends to hold pragmatically in discourse, SOV languages may syntacticize the immediate-before-verb (IBV) position and SVO languages the immediate-after-verb (IAV) position for focused elements. A case of the latter comes from Aghem (Watters 1979; Hyman 1984). The sentence in (32a) shows the 'neutral' word order S AUX V O ADV:

(32) a. ɨnaʔ mo zɨ kɨ-bɛ nɛ          'Inah ate fufu today'
        Inah PAST1 eat fufu today
     b. ɨnaʔ mo zɨ nɛ bɛ kɔ          'Inah ate fufu TODAY'
        Inah PAST1 eat today fufu DET
     c. a mo zɨ ɨnaʔ bɛ kɔ nɛ        'INAH ate fufu today'
        ES PAST1 eat Inah fufu DET today
i. Enlarging the scope of phonologization 25

Example (32b) shows that when informational or contrastive focus is placed on the adverb nɛ 'today', it appears in the IAV position that would otherwise be occupied by the direct object. Similarly, when the subject is in focus in (32c), it too appears in the IAV position, with an expletive subject holding its place. WH-elements also normally go in the IAV position, as expected, as do other constituents of the sentence, particularly when they are singled out for exclusive focused information.
The above examples are intended to illustrate the similarities involved in quite different domains when 'substance' becomes grammaticalized as 'form': the phonologization of phonetics and the syntacticization of pragmatics are exactly parallel. Interestingly, reinforcement of a paradigmatic contrast, which has been assumed in enhancement versions of phonologization and transphonologization, does not seem applicable here. When the grammar requires a subject to be definite or a focused element to appear in the superficial object slot, there is the suppression of a paradigmatic contrast in the one case (subjects no longer contrast in definiteness) vs. the establishment of a syntagmatic contrast in the other. (To simplify considerably, an element in the IAV is in a privileged position vis-à-vis other elements in the sentence. For recent statements on the IAV and focus in Aghem, see Hyman 2010b and Hyman and Polinsky 2009.)
Having established that phonologization bears resemblance to Givón's syntacticization, it seems reasonable to incorporate it under the general heading of grammaticalization. In (33) I have added phonologization at the bottom of the list of the common linguistic effects of grammaticalization presented by Heine et al. (1991: 213):

(33) Semantic         Concrete meaning     >  Abstract meaning
                      Lexical content      >  Grammatical content
     Pragmatic        Pragmatic function   >  Syntactic function
                      Low text frequency   >  High text frequency
     Morphological    Free form            >  Clitic
                      Clitic               >  Bound form
                      Compounding          >  Derivation
                      Derivation           >  Inflection
     Phonological     Full form            >  Reduced form
                      Reduced form         >  Loss in segmental status
     ADD:             Phonetic substance   >  Phonological form

Although Heine et al. see phonologization as an accompanying reduction or erosion following on the heels of the other effects of grammaticalization, phonologization meets the literal definition of grammaticalization: something which is not grammar (phonetics) BECOMES grammar (phonology). It seems appropriate, therefore, to recognize parallels such as in (29) and adopt phonologization as one of grammaticalization's 'movements toward structure' (Hopper 1987: 148).

1.4 Conclusion
In the preceding sections I have established that phonologization need not involve contrast, nor even be limited to cases where something phonetic becomes phonological. Taken literally to mean 'the processes by which phonology comes into being', phonologization becomes one branch of the more general phenomenon of grammaticalization: 'the processes by which grammar comes into being', i.e. Hopper's 'movements toward structure'. Unfortunately this is not the usual meaning of 'grammaticalization', which often refers to the historical development of grammatical morphemes: 'Grammaticalization consists in the increase of the range of a morpheme advancing from a lexical to a grammatical or from a less grammatical to a more grammatical status' (Kuryłowicz 1965 [1972: 52], cited by Heine et al. 1991: 3). Thus, the linguistic effects of grammaticalization indicated above in (33) mostly have to do with what happens when a lexical morpheme (e.g. a word) becomes a grammatical morpheme (e.g. an enclitic or affix). In my use of the term, grammaticalization refers more generally to the development of any aspect or component of grammar (syntax, morphology, phonology).
This is but one of two terminological problems. The first is that there is no generally accepted term meaning 'conversion of substance to form'. While 'grammaticalization' would have been an excellent and transparent choice, it has been pre-empted for specific phenomena, namely, the creation of grammatical morphemes. Other terms I have heard are either inexplicit or awkward, e.g. codification, coding strategies, linguistification, grammatogenesis, movements toward structure. The second terminological problem is that terms such as phonologization, grammaticalization, syntacticization, lexicalization, etc. are potentially ambiguous, since they only indicate the end product, not the source. This issue arose in the discussion in section 1.3.1 of whether the possible development of phonology from non-phonetic sources should be included under phonologization. As has been pointed out by others, alternative terminology might instead refer to the source, hence dephoneticization, dephonologization, demorphologization, etc. (Dressler 1985; Janda 2003; Joseph and Janda 1988).
I would like therefore to conclude by making the following modest and totally impractical proposals: (i) We should create terms which indicate both the input and the output of the process. (ii) The input should be indicated by the prefix de- (indicating a change in status) or re- (indicating a restructuring with the same status). (iii) The output should be indicated by a prefix placed on the base -grammaticalization (or -grammatogenesis?). (iv) Grammaticalization should be taken to mean that the output is grammar, whether phonology, morphology, or syntax. With these proposals, a systematic terminology of a catalogue of different types of grammaticalization (in the broader sense) might look like (34).

(34)               Input          Output         Term
  a. widespread    phonetics      phonology      dephonetic phonogrammaticalization
                   phonology      phonology      rephonogrammaticalization
                   lexical        grammatical    delexical morphogrammaticalization
                   morpheme       morpheme
                   grammatical    grammatical    remorphogrammaticalization
                   morpheme       morpheme
                   syntax         syntax         resyntactogrammaticalization
                   pragmatics     syntax         depragmatic syntactogrammaticalization
  b. 'sporadic'    grammatical    lexical        demorpho-lexicogrammaticalization
                   morpheme       morpheme
                   grammatical    phonemic       demorpho-phonogrammaticalization
                   morpheme       material

Since the resulting terms are a bit clumsy, perhaps we would refer to them by three-letter codes: DPP, RPG, DLM, RMG, RSG, DPS, DML, and DMP.
Whatever one thinks about the terminological issue, I hope I have established the
following:

(35) a. phonology is grammar; therefore:
     b. phonologization is grammaticalization
     c. as with other aspects of grammaticalization, one can have greater interest in...
        i. the beginning point (articulatory, perceptual, conceptual) to determine what is or isn't available for phonologization, how, and why (Hombert 1977; Moreton 2008a, b; Yu 2011)
        ii. the end point (phonology), e.g. how the structured version ultimately diverges from the phonetics
        iii. the diachronic correspondences between the beginning and end points
        iv. the logical or actual stages of the changes in input/output, their diffusion, social significance, etc.
     d. there is overlap and unclarity as to where phonetics ends and phonology begins
     e. however, there is a difference between phonetics (substance) and phonology (form), just as there is a difference between pragmatics (substance) and syntax (form)

Much of the interest in phonologization (and Heine et al.'s notion of grammaticalization) has been in determining the nature of the substance that underlies grammar. This has led certain scholars to seek ways of reducing phonology to phonetics and morphology/syntax to semantics and pragmatics. While no one can deny such relationships, establishing the sources of grammar is only part of the story. The rest has to do with why the intrinsic phonetic, semantic, and pragmatic properties do not remain intrinsic rather than becoming structured within the grammar. This in turn reduces to the question of why there is grammar at all. On the one hand grammar necessarily underspecifies the substantive sources: a language cannot provide a structural analogue for every aspect of phonetic naturalness, semantic transparency, or pragmatic coherence. What it does do is impose strictly formal linguistic structures which take over from where the extralinguistic sources leave off. A full account must therefore be concerned with both the beginning and endpoints of phonologization (and, more generally, grammaticalization), and ultimately recognize that phonologies/grammars have properties that are not reducible to the natural tendencies in speech and communication:

... it is necessary to assume a considerable degree of independence between linguistic principles proper and the principles that obtain in those extralinguistic domains that appear to underlie them. (Anderson 1981: 496)

... the concerns of Grammar... are not derivable from extragrammatical factors. (Hyman 1984: 71)

Or, as I like to put it, Grammar has a mind of its own.


2

The role of entropy and surprisal in phonologization and language change

ELIZABETH HUME AND FRÉDÉRIC MAILHOT*

'What are the laws of motion but the expectations of reason concerning the position of bodies in space? We are thus justified, not only in saying that all complete knowledge involves anticipation, but also in affirming that all rational expectation is knowledge.' (Hitchcock 1903: 673)

2.1 Introduction
Traditionally, the term phonologization has been used to describe a diachronic
change within a given language system from a state of phonetic variation to that of
phonological generalization (Hyman 1976). More specifically, we take this to mean a
diachronic shift from variation across a large number of uncorrelated dimensions to
correlated variation of lower dimensionality. Such transitions are relevant both to the
creation of new categories and patterns (e.g. phoneme, stress pattern), as well as to the
change from one existing category into another. Many factors external to a language's
grammatical system have been shown to play an influential role in this process. Some
of these external factors are listed below (for relevant discussion see Archangeli and
Pulleyblank 1994; Blevins 2004; Bybee 2001; Culicover and Nowak 2002; Davidson
2007; Guion 1998; Hayes and Londe 2006; Hume and Johnson 2001a; Hyman 1976; Joseph and Janda 2003; Jeffers and Lehiste 1979; Lindblom 1990; Moreton and
* We owe a debt of gratitude to Kathleen Currie Hall, Dahee Kim, Adam Ussishkin and Andrew Wedel for much lively discussion regarding the ideas in this chapter. We would also like to thank the following people for their input on aspects of this research: Paul Boersma, Chris Brew, Joan Bybee, Jennifer Cole, Peter Culicover, Alex Francis, John Goldsmith, John Hale, Ilana Heintz, Robert Kirchner, Kate Kokhan, Jeff Mielke, William Schuler, Andrea Sims, Rory Turnbull, Mike White, Alan Yu, members of the Ohio State phonetics/phonology and socio-historical linguistics discussion groups, and two anonymous reviewers.

Thomas 2007; Ohala 1981, 1993c, 2003; Peperkamp, Vendelin and Nakamura 2008; Yu 2007, inter alia).
Grammar-external factors influencing phonologization include:

a. phonetic factors, e.g. perceptual distinctiveness, articulatory difficulty;


b. usage factors, e.g. familiarity, frequency;
c. processing factors due to, e.g., structural complexity.
While there is ample evidence showing the impact of these diverse forces on language
systems, they are often treated independently of one another (though see Blevins
and Wedel 2009). As such, the literature on language change is replete with argu-
ments for why one factor, as opposed to another, underlies a particular modifica-
tion. In this chapter we propose that a unified account of the influence of these
and other factors is possible when we view the phenomena of phonologization
through the lens of information theory (Shannon 1948), in particular making use
of the concepts of surprisal and entropy. Not only do the tools of information
theory allow us a deeper understanding of why these factors influence language
systems in the way that they do, they also provide insight into the process of
phonologization.
In the current context, entropy models a cognitive state of the language user asso-
ciated with the amount of uncertainty regarding the outcome of identifying or pro-
ducing some linguistic event, e.g. the next word in a sentence (Townsend and Bever
2001; Hale 2003; Levy 2008), or the vowel that is epenthesized or deleted (Hume and Bromberg 2005; Hume et al. 2011). All linguistic elements have an associated
(context-dependent) surprisal, and contribute individually to an overall measure of
uncertainty in selecting among outcomes in a system (the entropy) associated with
the outcome of some event, e.g. which vowel will be epenthesized. As we show, each
element can contribute to entropy as a function of factors such as those discussed
above, e.g. perceptual distinctiveness, usage frequency.
Entropy and surprisal are of particular relevance to phonologization for a number
of reasons. The first is linked to learning. The mind's attentional focus is drawn to
contextually informative, or higher surprisal, elements, e.g. auditory cues (Grossberg
2003; Baldi and Itti 2010), and attentional focus is known to be a crucial component
of learning (McKinley and Nosofsky 1996; Kruschke 2003). Given that speaker-
hearers must learn to associate phonological meaning to particular phonetic details
in order for phonologization to occur, surprisal (and by extension system entropy)
is likely to play a key role. The second reason, and the main focus of this chapter,
is that the approach advocated here brings clarity to phonologization by making
strong predictions about both the likely targets of change, as well as the nature of the
resultant change.
Surprisal is a continuous measure, taking values in the interval [0, ∞), with increasing surprisal being a function of decreasing probability. As elaborated below, elements falling toward each pole of the range of surprisal are unstable, making them more prone to change than elements occurring away from the extremes. Phonologization is thus predicted to preferentially affect elements linked to extreme degrees of surprisal, i.e. that have a small entropic contribution. Interestingly, while the mechanisms that affect elements with very low or very high surprisal may differ, they pattern together in being prone to change given their low contribution to predicting outcomes in a system.
The current approach also speaks to the nature of change. Unstable elements with
high surprisal are biased to change in the direction of a similar element or pattern
with lower surprisal, consistent with observations regarding analogical change (see
e.g. Phillips 2006; Wedel 2007). In other words, change affecting high surprisal ele-
ments is predicted to preserve structures that the speaker-hearer is already familiar
with. Conversely, as developed below, change in patterns with low surprisal need not
be structure preserving, and such patterns are typically prone to production-based
reduction processes (Bybee 2001), which can introduce novel patterns into a speaker-
hearer's linguistic system.
Before delving into these points in more detail, we define the information-theoretic concepts of surprisal and entropy more rigorously, then briefly discuss the cognitive state modeled by surprisal, which we call 'expectedness'. With this groundwork in place, we turn to the heart of the chapter: the relevance of entropic contribution and surprisal for phonologization and language change. Section 2.3 outlines in general terms the effects of surprisal on language systems. The section also focuses on the linguistic consequences of two key properties of our approach: instability and bias. In doing so, we take a closer look at the potential for a given element to undergo change or be the outcome of change given the degrees of surprisal associated with it.

2.2 Information, surprisal, and entropy


In this section we introduce the basic notions of information theory that we shall make use of in the approach to phonologization and language change developed further below. While information-theoretic concepts are foundational to the field of computational linguistics, they are less familiar in linguistics more generally, though see Cherry, Halle and Jakobson (1953); Hockett (1955); Broe (1996); Hale (2003); Goldsmith (1998, 2002); Aylett and Turk (2004); Hume (2006); Hall (2009); Jaeger (2010); Jaeger and Tily (2011); Levy and Jaeger (2007); Goldsmith and Riggle (to appear). For further coverage of information-theoretic concepts, the reader is referred to Shannon (1948), the founding document of the field, which remains an excellent introduction, or to Cover and Thomas (2006), the currently standard text, for extensive and mathematically rigorous coverage of information-theoretic concepts.

2.2.1 Entropy and surprisal


Information theory is concerned with representing mathematically how much information is needed to convey a message given the constraints imposed on a communication system. Entropy, H, can be understood in terms of making a decision over a range of outcomes related to the message, e.g. identifying the quality of an epenthetic vowel in context C__C. It is a probabilistic measure of the amount of uncertainty associated with selecting among outcomes, e.g. a set of vowels. Higher uncertainty correlates with higher entropy. Studying system entropy is useful for determining mathematically how much an element in the system contributes to uncertainty in predicting probabilistic outcomes. As such, it can provide a measure of the element's contribution to the language's effectiveness as a system of communication. Elements that contribute more to predicting an outcome are more crucial for successful communication.

In information-theoretic terms, an element's contribution to system entropy is its probability multiplied by its surprisal (also referred to as information content). Every element in a system has an associated surprisal, S, which is the negative logarithm¹ of its probability:

(1) S(X = xᵢ) = −log₂ P(X = xᵢ)

where X is an event² ranging over a set of possible outcomes {x₁, x₂, ..., xᵢ, ...}, each with an associated probability, P(X = xᵢ). In the general case, these probabilities are defined contextually, e.g. phonologically, morphologically, etc.
Figure 2.1 illustrates the relation between probability and surprisal. Surprisal varies
continuously between zero and positive infinity; the occurrence of a highly likely
event (e.g. observing some vowel in a context where it is the only permissible one)
has low surprisal, while a highly unlikely event (e.g. observing some phonotactically
prohibited sequence of segments) has high surprisal. This reflects the intuition that the
occurrence of improbable events is highly surprising, while the occurrence of highly
likely events is not surprising.
As noted above, an element's contribution to the uncertainty (i.e. entropy) associated with predicting the outcome of an event is its probability multiplied by its surprisal, as given in Equation 2,

(2) Hc(xᵢ) = P(X = xᵢ) · S(X = xᵢ) = −P(X = xᵢ) log₂ P(X = xᵢ)

where X, as above, is an event whose outcome can take one of several values in the vocabulary set VX (e.g. outcomes of X could be any vowel in a language under consideration), P(X = xᵢ) is the probability that outcome xᵢ will be observed, and the quantity −log₂ P(X = xᵢ), as discussed above, is the surprisal of outcome X = xᵢ. We label Hc(x) the entropic contribution of x.

¹ We follow convention here and use a logarithmic base of 2, which allows us to express surprisal and entropy in units of bits. Using a different logarithmic base is equivalent to a multiplicative scaling.
² Formally, a random variable.

FIGURE 2.1 Plot of probability vs. surprisal

The entropy of a system is the sum of its elements' entropic contributions, as in Equation 3. Thus, it is a measure of the average surprisal of the system.

(3) H(X) = Σᵢ Hc(xᵢ) = −Σᵢ P(X = xᵢ) log₂ P(X = xᵢ)
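For concreteness, Equations 1-3 can be rendered as a minimal Python sketch; the function names and the dictionary encoding of a distribution are incidental choices, not part of the formal model:

import math

def surprisal(p):
    # Equation 1: S(x) = -log2 P(x), in bits
    return -math.log2(p)

def entropic_contribution(p):
    # Equation 2: Hc(x) = P(x) * S(x); taken to be 0 at p = 0 and p = 1
    return p * surprisal(p) if 0 < p < 1 else 0.0

def entropy(distribution):
    # Equation 3: H(X) = sum of entropic contributions over all outcomes
    return sum(entropic_contribution(p) for p in distribution.values())

These definitions suffice to reproduce the vowel-system calculations in (4) and (5) below.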
Probabilistic notions are clearly relevant to the study of language acquisition, use, change and representation, as discussed in works such as: Bod, Hay, and Jannedy (2003); Boersma and Hayes (2001); Bybee (1985, 2001); Coleman and Pierrehumbert (1997); Frisch, Pierrehumbert and Broe (2004); Goldsmith (2007); Greenberg (1966); Hooper (1976b); Hume (2004a, b); Jurafsky et al. (2001); Phillips (1984, 2006); Luce and Pisoni (1998); Pitt and McQueen (1998); Trubetzkoy (1969); Vitevitch and Luce (1999); Zipf (1932), inter alia. Hence, the cognitive state modeled by surprisal correlates with probability. The notion 'probability' here may be approximately equated to subjective degree of belief, as in a Bayesian approach to cognition (Pearl 1988; Chater, Tenenbaum and Yuille 2006), in which prior states of knowledge are taken into consideration when computing the probability of some future event or state.
To illustrate, consider a hypothetical language, ℒ, with the following vowels: Vℒ = {i, e, a, o, u, ə}. We wish to compute the entropy of ℒ's system of vowels; more specifically, we want a measure of the amount of uncertainty associated with e.g. predicting the observation of some vowel in a given phonological context, an event we label L. First we take the case where each vowel is assumed to be, ceteris paribus, equiprobable; then each v ∈ Vℒ has a probability of observation P(L = v) = 1/6. The entropy computation is then as follows:

(4) H(L) = −Σ P(L = v) log₂ P(L = v) = −6 · (1/6) · log₂(1/6) = log₂ 6 ≈ 2.58 bits
Of course, since the entropy of a system is its average surprisal, and each vowel in this case has the same surprisal value (since they are equiprobable), the entropy of this system is equal to each vowel's surprisal. To illuminate the relationship between surprisal and entropy more clearly, we can examine how the entropy of this system changes as we alter the probability estimates for particular vowels. As a simple initial case, assume that one vowel, e.g. /ə/, is more probable in some context than the others, which are all equiprobable. For concreteness let us assume that the probability of observing a schwa, P(L = ə), is 3/8, hence the surprisal S(L = ə) = −log₂(3/8) ≈ 1.4. Then the surprisal of observing any of the remaining vowels is S(L = v ≠ ə) = −log₂(1/8) = 3. The entropy of the system under this distribution is then

(5) H(L) = −(3/8) log₂(3/8) − 5 · (1/8) log₂(1/8) ≈ 2.41 bits

Note that the entropy in this case is lower than when all vowels are equiprobable. This is because there is now less uncertainty about which vowel will occur in the context under consideration, due to schwa's higher probability of observation. We state here without proof the theorem that the entropy of a system is maximized when all of its outcomes are equally probable (Shannon 1948: 11).
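The calculations in (4) and (5) can be verified directly, as in the following self-contained sketch (again purely an illustration; the vowel labels stand in for Vℒ):

import math

def H(dist):
    # Equation 3: system entropy in bits
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

vowels = ['i', 'e', 'a', 'o', 'u', 'ə']
uniform = {v: 1/6 for v in vowels}
skewed = {v: (3/8 if v == 'ə' else 1/8) for v in vowels}

print(round(-math.log2(3/8), 2))   # 1.42: surprisal of schwa, the ~1.4 above
print(round(-math.log2(1/8), 2))   # 3.0: surprisal of each remaining vowel
print(round(H(uniform), 2))        # 2.58 bits, as in (4)
print(round(H(skewed), 2))         # 2.41 bits, as in (5)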
Consider finally a slight generalization of the previous case, where we examine all possible values for the probability of schwa occurring in some context, assuming the remaining vowels are equiprobable. In lieu of additional calculations of entropy, consider the graphs in Figure 2.2 and Figure 2.3: the first is of the entropy of ℒ's vowel system versus the probability of observing schwa, the second is of schwa's contribution to the entropy of ℒ versus its probability of observation. Note that entropic contribution goes to zero in Figure 2.3 for both low and high probabilities. That is, outcomes known to be either (near) certain or (near) impossible contribute little to the entropy of the system. As will be discussed further below, the fact that surprisal extremes contribute little to system entropy is crucial to our model
2. Entropy and sur frisai in phonologization and language change 35

FIGURE 2.2 Entropy of ℒ's vowel system, as a function of the probability of observing /ə/, assuming equiprobability of other vowels

FIGURE 2.3 Contribution of /ə/ to the entropy of ℒ, as a function of its probability of observation, assuming equiprobability of other vowels

of phonologization. In Figure 2.2, the entropy of the system does not go to zero for
P(L = a), since there is still maximal uncertainty about which of the remaining five
vowels will be observed. Before turning to the details of our model, we discuss more
specifically the measures relevant to the calculation of surprisal.

2.2.2 Bases of phonological surprisal (and entropy)


Our discussion of surprisal thus far is compatible with the use of maximum likelihood
estimates of probability. If we use such estimates, probability is calculated in terms of
the frequency of occurrence of some element; a more frequent element has lower sur-
prisal than a less frequent one. In this manner, frequency can be viewed as condition-
ing the outcome of some linguistic event which, as noted above, is strongly supported
by evidence showing that frequency impacts the learning, use and representation of
sound patterns. Yet frequency is not the only factor that conditions phonological
patterns. As stated in the introduction, it is well-established that other factors are
also relevant, including a pattern's perceptual distinctiveness and the precision with
which a sound sequence is produced (e.g. Blevins 2004; Davidson 2007; Guion 1998; Hume and Johnson 2001a; Joseph and Janda 2003; Jeffers and Lehiste 1979; Lindblom 1990; Ohala 1981, 1993c, 2003). An adequate model of phonologization and language
change must then also provide a means of integrating these factors.
The concepts of surprisal and entropy allow for precisely this. While both concepts
are formulated probabilistically, it is important to bear in mind that on the view
adopted here, probability is simply an arbitrary mathematical measure of the sub-
jective degree of belief ascribed to some outcome on the basis of a set of observations.
Probability says nothing about which observations are relevant to phonologization
and sound change. For this, we must draw on the results of linguistic study, such as
those expressed by taking into account, for example, phonetic as well as statistical
information. As we sketch just below, expressing results relating to these factors in
terms of a combined measure of surprisal allows for the development of a unified
model of language change.
We begin by considering how to incorporate perceptual distinctiveness into the measure of surprisal. For this, we follow the information-theoretic account of French epenthetic and deleted vowels in Hume et al. (2011). Of interest is the observation that the vowels in question are non-back and rounded [ø, œ], an apparent anomaly in the world's languages given that deleted/epenthetic vowels are typically front or central unrounded vowels. Hume et al. (2011) show that the patterning of the French vowels is consistent with universal patterns when we take seriously the view of language as a system shaped to meet the competing demands of efficiency and robustness in communication. In this approach, both deletion and epenthesis contribute to communicative effectiveness. Deleting a vowel enhances system efficiency by removing elements that contribute little to conveying the message. Conversely, epenthesis enhances system
robustness by helping to disambiguate low frequency structures, those with otherwise


perceptually-masked cues, and/or those with a low probability of being accurately
produced. As with deletion, the epenthetic sound contributes little to system entropy.
Perceptual distinctiveness is modeled as a function of miscategorization probability: the more a vowel's acoustic space overlaps with those of other vowels in the system, the higher the probability that the vowel will be miscategorized. Put another way, a high degree of overlap is correlated with poor perceptual distinctiveness and high confusability. A modified version of Nosofsky's (1986) Generalized Context Model, with frequency information factored out, was used for deriving categorization probabilities from a set of vowel tokens.
The result of applying the modified GCM is a ranking of sounds in a given context in terms of confusability. This is reminiscent of the P-map (Steriade 2008), though note crucially that we express an element's confusability in probabilistic terms and define confusability on a language-specific basis. By taking the negative logarithm of the resultant probabilities, we derive a surprisal value for each segment in question: an element with a high probability of being confused is associated with low surprisal, while an element with extremely noticeable cues is associated with high surprisal. In terms of entropic contribution, elements with extremely high or extremely low surprisal contribute little to system entropy.
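The step from confusability to surprisal can be illustrated with invented numbers; the probabilities below are placeholders, not fitted GCM values from Hume et al. (2011) or from any actual language:

import math

# Hypothetical miscategorization probabilities for three vowels in a fixed context
p_confuse = {'ə': 0.60, 'e': 0.25, 'i': 0.02}

for v, p in p_confuse.items():
    s = -math.log2(p)      # highly confusable -> low surprisal
    print(v, round(p, 2), round(s, 2))

# ə: P(confused) = 0.60, surprisal = 0.74 bits (low)
# e: P(confused) = 0.25, surprisal = 2.00 bits
# i: P(confused) = 0.02, surprisal = 5.64 bits (high)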
We can take a similar approach to production. Consider a scenario in which we are interested in evaluating the stability of word-final consonants C𝒢 in a language 𝒢 which includes the set of sounds {t, s, ǃ}. Assuming, perhaps non-trivially, the availability of an independent measure of articulatory complexity, an 'A-map' of sorts, on the basis of which the members of C𝒢 may be ranked in terms of probability of accurate production from least to most probable, ǃ ≺ s ≺ t, we predict that those elements with very complex or very simple articulations will be less stable than mid-range elements. Very simple elements will have very low surprisal associated with accurate production, and very complex elements will have high surprisal; in both cases they have small entropic contributions.
While our discussion above has briefly sketched out how some factors that condition phonological patterns can be recast within a model of surprisal, it seems reasonable to assume that other factors could also be defined in probabilistic terms. Moreover, we can go one step further and combine the various factors to create a unified model. In fact, in Hume et al.'s (2011) study of French epenthesis and deletion, it is only when the factors of frequency and perceptual distinctiveness are combined that the model correctly predicts the non-back rounded vowels to contribute least to the entropy of the system. When calculated independently, frequency and perceptual distinctiveness were only weakly predictive.

A unified model may take the following form. Let Vℒ = {v₁, ..., vᵢ, ..., vₙ} represent the set of vowels from a language user's experience, ℒ, and eᵢ represent a context in which vᵢ may be observed (i.e. produced or perceived), an event we label

X. We assume that { is defined by grammatical factors relevant to v/ (e.g. 'between


obstruents', in coda position, etc.) as well as statistically (e.g. n-gram frequencies of
the grammatical elements). As described in Equation 6, the surprisal associated with
V{ being accurately produced or identified is determined by the set of conditioning
factors noted above, perhaps among others: confusability k> articulatory precision a,
contextual frequency/, conditioning context 6f. Hence, as a first approximation, the
surprisal S(X = v/) associated with the observation of a given element v/ is:

(6)
A segment's entropic contribution, Hc(vᵢ), provides a measure of the degree to which that element is a factor in ℒ's effectiveness as a system of communication.
How the various factors interact and contribute to the overall surprisal associated
with a particular system is an important line of research yet beyond the scope of this
chapter (though see Hume et al. 2011). As we discuss below, however, it is surprisal
extremes that are of particular relevance to the present discussion, since elements
at these ends are least stable and thus good candidates for phonologization. In this
regard, it is reasonable to assume that extreme degrees of surprisal typically arise when
the impact of several factors points to a common end of the continuum, although
a single factor could potentially contribute sufficiently to determine the surprisal
on its own.
One might ask why we need to talk about surprisal and entropie contribution,
rather than simply limiting our discussion to probability itself. We can think of at
least three reasons. First, although it is a formal measure, the quasi-metaphoric term
'surprisal' helps to evoke and preserve the intuition that we are discussing human cog-
nition, and the impact of (socio)cognitive factors on phonologization and language
change. Second, surprisal is a key component of the entropy of a set of possible
outcomes (e.g. in a linguistic system), and it is the notion of entropy that allows us
to provide a unified account of those elements that are prone to change. Third, Hume
et al. (2011) show that probabilities based on confusability and frequency alone cannot
predict the quality of the epenthetic vowel or deleted vowel in French. Rather, it is the
entropic contribution based on these combined measures that correctly predicts the
observed patterns.

2.2.3 Surprisal and expectedness


To the extent that we are correct in using surprisal to model a cognitive state, we might
call this state (inverse) expectedness.3 That is, a low degree of surprisal associated
with some linguistic outcome in production, perception, and/or processing correlates
with a high degree of expectedness. For example, a sound sequence that has a high
³ We previously (cf. Hume and Broomberg 2005) used the term expectation for this notion, but since this term overlaps with a concept from probability theory relevant to our discussion, we adopt the neologism expectedness in its stead.

probability of occurring, an articulation that is easy to produce accurately, and weak
perceptual distinctiveness will, all else being equal, be associated with a low
degree of surprisal (whether in production or perception) and greater expectedness.
Conversely, high surprisal sequences (e.g. due to extreme perceptual distinctiveness,
low frequency, complex articulation, etc.) will have weaker expectedness. These points
are developed in greater detail below.
Expectedness has been studied (under a variety of names) extensively in fields such
as psychology (e.g. Feather 1982; Hitchcock 1903; Kirsch 1999; Reading 2004), music
cognition (e.g. Huron 2006; Jones, Johnston and Puente 2006), vision (e.g. Haith,
Hazan, and Goodman 1988; Puri and Wojciulik 2008), and on language topics relating
to sentence processing (e.g. Kutas and Hillyard 1984), computational modeling of
language (e.g. Hale 2003; Jurafsky 2003; Levy 2008), and markedness (Hume 2004a,
2008).
Huron (2006) describes the biological roots of this notion as follows:

Expectation refers to the cognitive function that helps fine-tune our minds and bodies to
upcoming events... The biological purpose of expectation is to prepare an organism for the
future... The capacity for forming accurate expectations about future events confers significant
biological advantages. Those who can predict the future are better prepared to take advantage of
opportunities and sidestep dangers. Over the past 500 million years or so, natural selection has
favored the development of perceptual and cognitive systems that help organisms to anticipate
future events... Accurate expectations are adaptive mental functions that allow organisms to
prepare for appropriate action and perception.

Grossberg (2003) represents expectation in a neural network model as a resonant
state of the brain:

Such a resonance develops when bottom-up signals that are activated by environmental events
interact with top-down expectations, or prototypes, that have been learned from prior experi-
ences. The top-down expectations carry out a matching process that selects those combinations
of bottom-up features that are consistent with the learned prototype while inhibiting those
that are not. In this way, an attentional focus starts to develop that concentrates processing on
those feature clusters that are deemed important on the basis of past experience. The attended
feature clusters, in turn, reactivate the cycle of bottom-up and top-down signal exchange. This
reciprocal exchange of signals eventually equilibrates in a resonant state that binds the attended
features together into a coherent brain state. Such resonant states, rather than the activations
that are due to bottom-up processing alone are proposed to be the brain events that represent
conscious behavior.

Expectedness, and thus surprisal, have considerable explanatory force when it
comes to understanding how phonetically variable material is transformed into
phonologically meaningful units, an explanation that lies in the connection between
expectedness/surprisal and attentional focus. As expressed in the quote from Gross-
berg (2003) above, expected outcomes yield an attentional focus that concentrates

on those elements (e.g. auditory cues) considered important on the basis of past
experience (cf. Kirby (this volume) for a model of a diachronic shift in the weights
given to various acoustic cues). Given that attentional focus is a crucial component
of learning (e.g. Kruschke 2003; McKinley and Nosofsky 1996), it is directly relevant
to phonologization, since for change to take place, the user must learn to associate
phonological meaning with some phonetic detail. Further, since the resonant states
that result from the interaction of expected outcomes and perceptual input are 'the
brain events that represent conscious behavior', they are instrumental in shaping the
form that behavior takes. This is of particular relevance for our understanding of
phonologization, since although we often refer to the way that languages behave, it
is in fact the behavior of the language user that is at issue. It is the individual who,
for example, perceives the auditory cues that are subsequently phonologized as an
epenthetic vowel, or fails to produce the gestures involved in making one sound as
opposed to another.
It is perhaps worthwhile pointing out that while the discussion above has focused
on phonetic, processing and usage factors, an additional advantage of the approach
developed here is that it can be easily expanded to take into account other factors
including e.g. sociolinguistic attributes and attitudes. For example, if a language vari-
able, such as the pronunciation of [n] in e.g. running, has a specific social meaning
(Campbell-Kibler 2005), there are expectations associated with when and by whom
the variable is used which can influence behavior including an individual's attitudes
regarding its usage. We leave this topic open for future consideration.

2.3 Phonological effects of surprisal


We turn now to discuss more specifically why we believe surprisal is fundamental to
phonologization and language change. Two properties of the current approach are
particularly important: the relation between surprisal and instability, which provides
insight into which elements are likely to be the targets of change, and the relation
between surprisal and direction of change.

2.3.1 Instability associated with the target of change


An important prediction of the current approach is that change preferentially affects
elements associated with extreme degrees of surprisal. The core insight here is that
such extremes create phonological instability, as elaborated on just below. As is clear
from Figures 2.1 and 2.3, what unifies these seemingly divergent cases is that elements
with extreme degrees of surprisal, whether high or low, contribute little to system
entropy. So the key prediction we derive is that elements that contribute little to
predicting an outcome are less crucial for effective communication. As a result, they
are more likely to be unstable, and thus prone to be the targets of diachronic change.
They are, in a sense, more expendable.

In order to answer the question of why this might be so, we take any token
of language use (i.e. any speaker-hearer interaction) to be an instantiation of a
communication system striving (perhaps implicitly) to meet the competing demands
of efficiency and reliability. The reliability of a communication system is a function
of the degree of redundancy in transmitted elements. If symbols are on average
highly redundant (i.e. recapitulating information available elsewhere), then they
are more predictable/probable, and hence less informative (i.e. lower surprisal).
Efficiency, conversely, is a function of a communication system's rate of transmission
of information; increasing efficiency corresponds to transmitting more informative
(i.e. higher surprisal) items on average. Consider now the effects of noise; a reliable
system will in general be able to recover from an error in transmission, as the
built-in redundancy ensures that the information lost is likely to be predictable
from context, whereas a maximally efficient system, being non-redundant, makes
no such guarantees, and hence is more adversely affected by transmission errors.
The net result of striking a balance between the demands of reliability (maximal
redundancy/predictability) and efficiency (minimal redundancy/predictability) is
that elements that contribute significantly to the entropy of the system, those that
are neither too surprising, nor too expected, are most important for effective or
successful communication (see Lindblom 1990; Aylett and Turk 2004; Levy and
Jaeger 2007; Jaeger 2010, for related discussion). Interestingly, while elements at
opposite ends of the continuum pattern together in terms of being unstable, the
cause of the instability differs, as discussed below.
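The reliability/efficiency trade-off just described can be illustrated with a toy noisy-channel simulation; the triple-repetition code and the 10% noise rate below are our own illustrative choices, not anything proposed in this chapter.

```python
import random

random.seed(0)

def noisy(bits, p):
    """Flip each transmitted bit with probability p (a crude noise model)."""
    return [b ^ (random.random() < p) for b in bits]

def send_efficient(bits, p):
    # Maximally efficient: one transmitted bit per message bit, no redundancy.
    return noisy(bits, p)

def send_reliable(bits, p):
    # Redundant: transmit each bit three times, decode by majority vote.
    coded = [b for b in bits for _ in range(3)]
    received = noisy(coded, p)
    return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]

message = [random.randint(0, 1) for _ in range(10_000)]
for name, send in [('efficient', send_efficient), ('reliable', send_reliable)]:
    out = send(message, p=0.1)
    errors = sum(a != b for a, b in zip(message, out))
    print(f"{name}: residual error rate {errors / len(message):.1%}")
```

The redundant code transmits three times as many symbols (lower efficiency) but recovers from most transmission errors (roughly 3% residual error at 10% noise), while the maximally efficient code passes every error through.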

2.3.1.1 Low surprisal Low surprisal elements are associated with high frequency,
weak perceptual distinctiveness and simple articulations, among other properties. As
is well documented, elements associated with these properties tend to be unstable.
We acknowledge that isolating the effects of these properties may be a non-trivial
enterprise.
In terms of perception, elements with poor perceptual distinctiveness can result
in a failure to correctly parse the signal, which may result in assimilation or deletion
(Jun 1995) and subsequent sound change. This is consistent with Ohala's (1981) thesis
that an ambiguous signal can cause misperception, giving rise to language change.
In fact, the present account subsumes Ohala's proposal as a special case, given that
low surprisal, on our account, can result not only from confusability, but from any of
the factors listed immediately above, presumably among others. Production-related
instability in cases of low surprisal may lead to, for example, reduction, deletion, or
assimilation, a claim supported by the phonetic, phonological and psycholinguistic
literature.
For example, words that occur frequently tend to be reduced, and high frequency
sounds and sequences are prone to processes such as lenition, deletion, and assimi-
lation, among others (cf. Bybee 2001, 2002; Bybee and Hopper 2001; Fosler-Lussier

and Morgan 1999; Frank and Jaeger 2008; Hooper 1976b; Jurafsky, Bell, Gregory, and
Raymond 2001; Jurafsky 2003; Munson 2001; Neu 1980; Patterson and Connine 2001;
Phillips 1984, 2001, 2006; Pierrehumbert 2001a; Raymond, Dautricourt, and Hume
2006; Tabor 1994; Zuraw 2003). Further, high frequency function words in English
such as just and and have been found to undergo deletion of /t, d/ at significantly
higher rates than less frequent words containing alveolar stops in comparable contexts
(cf. Bybee 2001, 2002; Guy 1992; Jurafsky et al. 2001; Raymond et al. 2006). The
results of phonological processes such as metathesis are also conditioned by frequency
(Hume 2004b). Consistent with the current approach, changes often have their start in
high frequency forms, subsequently spreading to other similar forms (see, e.g., Bybee
2001; Phillips 2006, inter alia).
It is worth pointing out that this approach is consistent with the observation that
the more a routine is used, the more fluent it becomes (Bybee 2001, 2002; Phillips
2006; Zipf 1932). However, in the current approach changes are viewed as more than
a practice effect. On our view, production, perception, and processing are guided by
surprisal and expectedness, and we hypothesize that this grounds the physiological
reflexes of practice in a cognitive explanation.

2.3.1.2 High surprisal High surprisal is associated with elements that occur with
very low frequency, have complex articulations, and/or have extremely noticeable
perceptual cues, among other factors. Given the link between surprisal and expected-
ness, when an element has high surprisal, its realization will correspondingly be only
weakly expected by the language user. This, we suggest, gives rise to instability from
both the speaker's and hearer's perspectives.
From a production perspective, it is well established that articulatory complexity
can create instability, with phonological consequences taking the form of deletion,
metathesis, assimilation, or other repairs to the unstable form. We provide an example
from metathesis further below.
Very low frequency sequences are also unstable. Treiman et al. (2000), for example,
found that English speakers made more errors in pronouncing syllables with less
common rimes than those with more common rimes. Similarly, Dell (1990) reports
that low frequency words are more vulnerable to errors in production than high
frequency ones. Interestingly, when a form is unstable because aspects of its realization
are unexpected, a speaker may also 'choose' to compensate by producing it more
slowly and carefully. In this regard, Whalen (1991) found that infrequent words were
longer in duration than frequent ones. The current approach is also consistent with
the observation that low frequency is a factor associated with forms that undergo
analogical change.4 Phillips (2001, 2006), for example, presents numerous examples
of change affecting low frequency items such as the case of [h] deletion in Old English

⁴ In her study of analogical change in Croatian morphology, Sims (2005) shows frequency as well as social salience to be contributing factors, findings that are consistent with the current approach.

(Toon 1978): low frequency words underwent deletion first, giving rise to nut, ring,
loaf, from OE hnutu, hring, hlaf.
With respect to frequency, an interesting consequence of the current approach is
that it provides a unified account of the observation that high and low frequency
elements tend to lead language change (Bybee 2001; Phillips 1984, 2000). As discussed
in subsection 2.2.1, frequency is a determinant of, and in direct proportion to, the
probability assigned to a linguistic outcome, hence to its surprisal. To the extent that,
all else being equal, low frequency correlates with high surprisal and high frequency
corresponds to low surprisal (recall Figure 2.1), the current theory makes the strong
and apparently correct prediction that high and low frequency elements will both be
prone to change.
Metathesis provides an apt example showing low frequency and articulatory com-
plexity contributing to instability, thus promoting change. In Hume's (2004b) study
of 37 cases of consonant/consonant metathesis, low frequency of occurrence and
similarity emerged as significant predictors of metathesis. In all cases, a consonant
sequence that underwent metathesis was a non-occurring or infrequent structure
in the language. In some cases, the word in which the sequence occurred was also
uncommon, contributing an additional layer of surprisal to the sequence. Further,
in over a third of the cases, the sounds involved were similar. Some shared the same
manner or place of articulation, or agreed in sonorancy, differing only in place and/or
manner, as attested in Georgian (Hewitt 1995; Butskhrikidze and Van de Weijer
2001), Chawchila (Newman 1944), and Aymara and Turkana (Dimmendaal 1983),
among other languages. The significance of similarity in the present context relates
to the probability of accurate production. To the extent that sounds in a sequence are
articulatorily similar, it is reasonable to expect an increase in the effort required to
accurately produce and thus render each sound distinct.
A further prediction of the current approach is that elements with extremely dis-
tinctive cues will also be unstable. Clicks would seem to be an example of this type. The
observation that clicks are typologically rare and do not seem to be spreading among
language communities may provide some evidence for this prediction (A. Miller,
p.c.).5 However, our understanding of variable processes involving clicks and other
high surprisal elements is incomplete at this time and thus, we leave this issue for
future consideration. It is worth noting, however, that the patterning of sequences that
are neither overly noticeable nor unnoticeable lends support to the present approach in
that they are predicted to be more stable than sounds/sequences at the extreme ends
of the noticeability pole. We thus hypothesize that common sound sequences such
as stop+vowel, sC, and other perceptually well-formed sequences would be situated
away from surprisal extremes.

⁵ It is likely that articulatory complexity is also a factor, meaning that both articulatory and perceptual factors contribute to their high surprisal.

To summarize, in this section we have suggested that an approach drawing on considerations of communicative effectiveness provides a unified account of the pattern-
ing of elements with very high and very low degrees of surprisal. In both cases, they
ing of elements with very high and very low degrees of surprisal. In both cases, they
are predicted to contribute little to the entropy of the system and thus be less crucial
for effectively communicating the message in question. In the following section, we
focus on the role that surprisal plays in biasing the outcomes of phonological change.

2.3.2 The output of change


The current approach also speaks to the nature of the change affecting unstable lan-
guage patterns. As stated above, the degree to which particular linguistic elements
are expected guides processing, perception, and production. As a result, to the extent
that these expectations are biased in one direction or another, we would expect there
to be linguistic consequences (for related discussion see Pierrehumbert 2001 a; Wedel
2007). For example, if a linguistic item has properties that are strongly expected in
a given context, processing should be faster since the listener will be biased toward
perceiving the item. This is supported by findings that high frequency words and
words containing frequent sound sequences are processed more rapidly than infre-
quent ones (see Oldfield and Wingfield 1965; Jescheniak and Levelt 1994; Vitevitch,
Luce, Charles-Luce, and Kemmerer 1997, among others).
The observation that expectations bias perception is not limited to language. Kirsch
(1999) presents an amusing case relating to visual perception.
When stimuli are ambiguous enough, sets of expectancies can lead to their being misperceived,
even when they are examined slowly and carefully. For example, when 17th- and 18th-century
biologists who believed in preformation examined sperm under the microscope, they reported
seeing fully formed miniature beings. They saw miniature horses in the sperm of a horse,
tiny chickens in the sperm of a rooster, and minuscule human babies in human sperm. The
ambiguity of the stimulus allowed them to see whatever they expected to see. (Kirsch, 1999: 6)

As in the vision example above, the influence of bias is particularly strong in
contexts of ambiguity, such as low surprisal sequences with weak perceptual distinc-
tiveness. Bias also influences the outcome of high surprisal sequences, such as those
associated with very low frequency or considerable articulatory complexity. In both
cases, bias drives the sequences away from the surprisal extremes. That is, a high
surprisal sequence due to, for example, articulatory complexity will be realized as
one with less complexity. Conversely, a low surprisal sequence due to weak perceptual
distinctiveness will generally be replaced by one with more distinct cues. In each case,
the sequences in question end up contributing more to system entropy and thus, to
communicative effectiveness.
Pitt and McQueen (1998), for example, found that the transitional probabilities
of voiceless alveolar and postalveolar fricatives at the end of nonwords influenced
listeners' identification of an ambiguous fricative as well as that of the following stop

consonant; subjects were biased toward the fricative with the highest transitional
probability. This is also consistent with the findings of Vitevitch and Luce (1999),
which reveal segment and sound sequence probabilities to be most influential when
listeners are presented with unfamiliar words; that is, high surprisal words. The obser-
vation that bias is especially strong in cases of high surprisal is of particular relevance
to understanding phonologization. It predicts that if an item is unstable because of
high surprisal, it will be prone to subsequent change to a pattern with lower surprisal;
that is, it will be biased in the direction of a more expected pattern. This is exactly the
pattern of change observed in cases of analogical change.
The study of metathesis once again provides an appropriate example. As noted
above, sequences prone to metathesis are those associated with high surprisal due to
a low probability of accurate production, and the user's limited experience or lack of
experience with the sequence (and perhaps the word it occurs in as well). As predicted,
the direction of change is biased toward a more expected structure with lower sur-
prisal. As the study of metathesis shows, the resultant structure is not only more com-
mon than the form that undergoes metathesis, but it has a higher probability of being
accurately produced, resulting in better perceptual cues. Building on Hume (2004b),
the reason why improved perceptual salience is a characteristic of so many results of
metathesis is thus simply an artifact of the nature of sequences that undergo metathe-
sis (those associated with high surprisal) and those that influence how the speech
signal is parsed (those associated with low surprisal); in short, unstable sequences
that undergo metathesis are biased toward phonologically similar patterns with lower
surprisal. Variable pronunciations of the word chipotle provide a simple illustration:

The influence of native language patterns on metathesis can also be heard in some varieties
of American English in the variable pronunciation of t-l in the word, chipotle, the [Náhuatl]
name for a particular kind of pepper and, recently, for a chain of Mexican restaurants. Both
orders of the final two consonants can be heard, even in the speech of the same individual:
chipotle (the original order) or chipolte (the innovative order) [... ] The two sounds involved are
archetypical 'metathesis sounds' and thus contribute to indeterminacy: /t/ with perceptually
vulnerable cues and /l/ with stretched out features [...] Another factor [...] is unfamiliarity
with the borrowed word [...] With indeterminacy, the order of sounds is inferred based on
experience, with the bias towards the most robust order. As predicted, although both /tl/ and
/lt/ occur intervocalically in English [...] /tl/, in the original form, occurs in 67 words, while
the innovative /lt/ sequence occurs in 356 words. (Hume, 2004b: 223)

An interesting corollary of the influence of bias on the outcome of change concerns
the notion of structure preservation (Kiparsky 1985, 1995). When change occurs in the
direction of a low surprisal pattern, as it does with unstable high surprisal elements,
such changes will, ceteris paribus, be structure preserving; for a pattern to have rela-
tively low surprisal, i.e. to be relatively more expected, a user must already be familiar
with it, that is, it must already be part of the user's linguistic experience. Cases of

analogical change and the observation that the output of metathesis is an existing
structure in the relevant language support this view.
Conversely, the result of change involving unstable patterns with low surprisal
need not be structure preserving. In such cases, the linguistic consequence of high
expectedness is under-realization; that is, a pattern contributes little to the entropy of
the system and is thus less crucial to the message. As discussed above, these elements
can thus be reduced in the interests of communicative efficiency without sacrificing
reliability.
An example of non-structure-preservation comes from the observation that reduc-
tion processes involving low surprisal segments, such as English schwa, can create
syllable structures not otherwise occurring in the language. Schwa can be considered
a low surprisal element given its simple articulation, its poor distinctiveness, its pre-
dictability in unstressed syllables, and its overall high frequency of occurrence in the
language (Hume and Broomberg 2005). As such, a native speaker will have strong
expectations concerning the occurrence of schwa in the initial unstressed syllable of a
word such as telepathy, thus licensing its omission, i.e. [tlɛpəθi]. While schwa deletion
can result in phonotactically licit syllable onsets (e.g. police [plis]), it can also create
onsets such as [tl], which do not otherwise occur word-initially in the language.

2.3.3 Summary
The ideas presented above are summarized in Table 2.1. It is proposed that a language
pattern is prone to change when, as listed in column I, it has a very low or very high
degree of surprisal and thus contributes little to the entropy of the linguistic system.
Column II identifies some of the factors that can give rise to the relevant level of
surprisal. The rightmost column summarizes the discussion above concerning bias
and the nature of the outcome of language change. For patterns that are unstable due to

TABLE 2.1 Overview of relations between surprisal, conditioning factors, and change

I: Surprisal   II: Influencing factors                III: Outcome of change

high           low familiarity,                       change biased toward similar
               low frequency,                         low-surprisal pattern
               strong perceptual distinctiveness,     (structure preserving)
               complex articulation

low            high familiarity,                      change can be unbiased (need
               high frequency,                        not be structure preserving)
               weak perceptual distinctiveness,
               simple articulation

high surprisal, bias influences the direction of change, while for unstable low surprisal
elements, the outcome of change takes some form of reduction which can result in an
increase in entropic contribution.

2.4 Conclusion
As we hope to have shown in the preceding pages, taking into account communicative
effectiveness, as formally expressed in terms of surprisal and entropy, allows us a
deeper understanding of phonologization and language change. To the extent that
this approach is on the right track, it has the potential to provide a unified model
of the factors conditioning an individual's language system. Given that the preceding
pages offer only a sketch of the current theory, many important aspects remain unre-
solved. These include at least the following fundamental issues: (a) understanding how
the diverse factors interact and contribute to cognitively and linguistically plausible
estimates of an element's surprisal, and (b) identifying the consequences of differing
degrees of surprisal and entropy for language systems, at the segmental level and
beyond.
Part II

Phonetic considerations
3

Phonetic bias in sound change


ANDREW GARRETT AND KEITH JOHNSON

3.1 Introduction
Interest in the phonetics of sound change is as old as scientific linguistics (Osthoff
and Brugman 1878).1 The prevalent view is that a key component of sound change is
what Hyman (1976) dubbed PHONOLOGIZATION: the process or processes by which
automatic phonetic patterns give rise to a language's phonological patterns. Sound
patterns have a variety of other sources, including analogical change, but we focus here
on their phonetic grounding.2 In the study of phonologization and sound change, the
three long-standing questions in (i) are especially important.
(1) a. Typology: Why are some sound changes common while others are rare or
nonexistent?
b. Conditioning: What role do lexical and morphological factors play in sound
change?
c. Actuation: What triggers a particular sound change at a particular time and
place?
In this chapter we will address the typology and actuation questions in some detail;
the conditioning question, though significant and controversial, will be discussed only
briefly (in section 3.5.3).
¹ For helpful discussion we thank audiences at UC Berkeley and UC Davis, and seminar students in 2008 (Garrett) and 2010 (Johnson). We are also very grateful to Juliette Blevins, Joan Bybee, Larry Hyman, John Ohala, and Alan Yu, whose detailed comments on an earlier version of this chapter have saved us from many errors and injudicious choices, though we know they will not all agree with what they find here.
² Types of analogical change that yield new sound patterns include morphophonemic analogy (Moulton 1960, 1967) and analogical morphophonology (Garrett and Blevins 2009). Of course, the source of a pattern is not always clear. For example, patterns like the linking [ɹ] of many English dialects have been attributed to a type of analogical change called 'rule inversion' (Vennemann 1972a), perhaps not phonetically grounded, but work by Hay and Sudbury (2005) and others calls this into question. Note that some phonological patterns, while phonetically grounded in a broader sense, correspond to no specific phonetic patterns because they arise through the telescoping of multiple phonetically grounded sound changes. Again, it is not always easy to identify such cases confidently.

The typology question concerns patterns like those in (2-3). In each pair of exam-
ples in (2), one is a common sound change while the other is nonexistent. The ultimate
causes of these patterns are clear enough where there are obvious phonetic correlates,
but the mechanisms explaining the relationship (that is, the precise mechanisms of
phonologization) are still disputed.
(2) Typologically common vs. nonexistent sound changes
a. Common: [k] > [tʃ] before front vowels (Guion 1998)
Nonexistent: [k] > [q] before front vowels
b. Common: vowel harmony involving rounding (Kaun 2004)
Nonexistent: vowel harmony involving length
c. Common: vowel reduction restricted to unstressed syllables (Barnes 2006)
Nonexistent: vowel reduction restricted to stressed syllables
d. Common: consonant metathesis involving sibilants (Blevins and Garrett
2004)
Nonexistent: consonant metathesis involving fricatives generally
Our typological point can be sharpened further. Not only are there generalizations
about patterns of sound change, but the typology is overwhelmingly asymmetric. For
example, the inverse of each of the common changes in (3) is nonexistent.
(3) Asymmetries in sound change
a. Common: [k] > [tʃ] before front vowels
Nonexistent: [tʃ] > [k] before front vowels
b. Common: intervocalic stop voicing (Kirchner 2001; Lavoie 2001)
Nonexistent: intervocalic stop devoicing
c. Common: [t] > [ʔ] word-finally (Blevins 2004: 120-1)
Nonexistent: [ʔ] > [t]
It is uncontroversial that such asymmetries in sound change must (somehow) reflect
asymmetries in phonetic patterns. We will refer to these as BIASES.
Our approach to the typology question, then, is grounded in processes of speech
production and perception and in the phonetic knowledge of language users. The bulk
of our chapter is devoted to an evaluation of various components of speech production
and perception, with an eye to identifying asymmetries (biases) that should be associ-
ated with each component. Our hypothesis is that various types of sound change can
be grounded in the various speech components based on their typological profiles.
We hope this approach yields a useful framework for discussing the relation between
patterns of sound change and their phonetic correlates.
From a broader perspective the typology question can be seen as a facet of what
Weinreich et al. (1968) call the CONSTRAINTS PROBLEM: determining 'the set of

possible changes and possible conditions for change' (p. 183). The second main ques-
tion we address in this chapter is what they call the ACTUATION PROBLEM: Why does
a change take place in one language where its preconditions are present, but not
in another? Historical linguists sometimes defer this question to sociolinguists by
assuming that its answer involves contingencies of social interaction, but a compre-
hensive model of phonologization should explain how phonetic patterns uniformly
characterizing all speakers of a language can give rise to phonological patterns that
serve as speech variants or norms for some of them.
Our approach highlights the three elements of phonologization shown in (4).
(4) a. Structured variation: Speech production and perception generate variants
(see sections 3.3-3.4)
b. Constrained selection: Linguistic factors influence the choice of variants (see
section 3.5)
c. Innovation: Individuals initiate and propagate changes (see section 3.6)

Processes of speech production and perception generate what Ohala (1989) memo-
rably describes as a 'pool of variation' from which new phonological patterns emerge;
we emphasize that this variation is structured in ways that help determine phonolog-
ical typology. Other processes contribute to the phonologized outcome; for exam-
ple, Kiparsky (1995) and Lindblom et al. (1995) refer to 'selection' from the pool
of variants. But our first goal is to understand how the underlying variation itself
is structured by bias factors, even if selectional processes also contribute bias (see
section 3.5). Finally, actuation begins with innovation; our second goal is to under-
stand why individual innovators would increase their use of certain speech variants
from the pool of variation.
This chapter is organized as follows. In sections 3.2-3.5, we address the constraints
problem of Weinreich et al. (1968). We begin with a review of sound change typologies
in section 3.2; despite differences of detail, many share a taxonomy inherited from
the neogrammarians. In section 3.3, we examine elements of speech production and
perception and evaluate possible bias factors in each case; we suggest in section 3.4
that certain patterns of sound change may be correlated with certain bias factors based
on their phonological typology. We discuss selection in section 3.5, describing facets
of phonologization that are system-dependent; they may involve bias factors, but only
relative to language-specific or universal systematic constraints.
In section 3.6 we turn to the actuation question, sketching a theory of mecha-
nisms that link bias factors and sound changes. While the former are often universal,
the latter are language-specific and at first perhaps even speaker-specific. Successful
changes must propagate from innovators before eventually becoming community
speech norms; we present the results of simulating aspects of this process. We con-
clude in section 3.7 with a brief summary and some questions for future research.

3.2 Typologies of sound change


Historical linguistics textbooks (e.g. Hock 1991, Hock and Joseph 1996, Campbell
2004, Crowley and Bowern 2009) classify sound changes according to a superfi-
cial typology often naming very specific categories: apocope, cluster simplification,
metathesis, palatalization, umlaut, etc. Of course it is important for students to learn
what these terms mean. But more sophisticated work has always recognized that
an explanatory classification of surface patterns should reflect a typology of causes.
Two typologies have been especially influential within historical linguistics: a tradi-
tional two-way division into articulatorily-grounded and other sound changes, and
a newer three-way division into listener-oriented categories; see Tables 3.1-3.2. We
will briefly describe each approach, as well as Grammont's (1939) more elaborated
scheme.
The traditional typology is due to the neogrammarians. According to this account,
most types of sound change originate through processes of articulatory reduction,
simplification, or variability; dissimilation, metathesis, and a few other types comprise
a residual type with other origins. Osthoff and Brugman (1878) themselves only
briefly comment, indicating that most changes have 'mechanical' (i.e. articulatory)
causes while dissimilation and metathesis are 'psychological' in origin. It was Paul
(1880, 1920) who suggested specifically that the first type originates in articulatory
reduction, speculating as well that the second may have its basis in speech errors.
Crucially, in any case, the neogrammarians and Bloomfield (1933) held that the major
type of sound change was phonetically gradual, imperceptible while under way, and

TABLE 3.1 Several influential traditional typologies of sound change

AUTHOR                       ORIGIN OF MOST SOUND CHANGES   RESIDUAL TYPE

Osthoff and Brugman (1878)   'mechanical' (articulatory)    ORIGIN: 'psychological'
                                                            EXAMPLES: dissimilation; metathesis

Paul (1880, 1920)            articulatory reduction         ORIGIN: speech errors?
                                                            EXAMPLES: metathesis; non-local
                                                            assimilation and dissimilation

Bloomfield (1933)            articulatory simplification?   ORIGIN: unclear
                                                            EXAMPLES: articulatory leaps;
                                                            dissimilation; haplology; metathesis;
                                                            non-local assimilation

Kiparsky (1995)              variation in production        ORIGIN: 'perception and acquisition'
                                                            EXAMPLES: compensatory lengthening;
                                                            dissimilation; tonogenesis; context-
                                                            free reinterpretation, e.g. [kʷ] > [p]

TABLE 3.2 Two recent listener-based typologies of sound change

OHALA (1981; 1993b)                    BLEVINS (2004; 2006a; 2008b)

LABEL: Hypocorrection                  LABEL: 'CHOICE'
EXAMPLES: umlaut; many other           EXAMPLES: vowel reduction and syncope; vowel shifts;
assimilations                          stop debuccalization; final devoicing; umlaut; etc.

LABEL: Hypercorrection                 LABEL: 'CHANCE'
EXAMPLE: dissimilation                 EXAMPLES: dissimilation; metathesis

LABEL: Confusion of acoustically       LABEL: 'CHANGE'
similar sounds                         EXAMPLES: [θ] > [f]; [anpa] > [ampa]; [akta] > [atta]
EXAMPLES: [θ] > [f]; [gi] > [di]

regular.3 This theory was couched by Paul (1880, 1920) in a surprisingly modern
exemplar-based view of phonological knowledge (see section 3.6 below).
More recently, a similar two-way scheme has been defended by Kiparsky (1995).
He writes that the first sound change type originates as speech variation with artic-
ulatory causes; certain variants are then selected by linguistic systems, subject to
further (linguistic) constraints.4 The residual type consists of changes that originate
as perceptually-based reinterpretations, possibly in the course of language acquisition.
The role of the listener was already crucial for Paul (1880, 1920), according to
whom the major type of sound change occurs when articulatory processes create
variants that are heard by listeners, stored in exemplar memory, and in turn give rise to
new, slightly altered articulatory targets. But in emphasizing the articulatory basis of
sound change, neither the neogrammarians nor their successors explored the possible
details of listener-based innovation. In recent decades, two influential accounts of
sound change have done precisely this. These accounts, due to John Ohala and Juliette
Blevins, share comparable three-way typologies. We highlight the similarities between
them in Table 3.2, though they also have important differences.
For Ohala, most explicitly in a 1993 paper (Ohala 1993b), there are three main
mechanisms of sound change.5 The one corresponding most closely to the traditional
category of articulatorily grounded change is what he calls HYPOCORRECTION. This is
rooted in correction, the normalization that listeners impose on a signal: for example,
factoring out coarticulatory effects to recover a talker's intention. In hypocor-
rection, a listener undercorrects for some coarticulatory effect, assuming that it is
³ Bloomfield (1933) suggests with some uncertainty that articulatory simplification may underlie the major type of sound change; he expresses no view of the cause(s) of the residual type.
⁴ The role of articulatory reduction in sound change has also been emphasized by other modern linguists (e.g. Mowrey and Pagliuca 1995; Bybee 2001, 2007), but they have not yet presented an overall account of how various types of sound change fit together.
⁵ It is hard to select one or even a few of Ohala's contributions from within his influential and insightful oeuvre in this area; see linguistics.berkeley.edu/phonlab/users/ohala/index3.html for a full list.

phonologically intended; this leads to the phonologization of coarticulatory patterns.


(One of the most important features of this account is that it explains why articulato-
rily driven changes are not even more widespread: through correction, articulatorily
motivated variants are usually reinterpreted as intended.) A second mechanism is
called HYPERCORRECTION: a listener overcorrects, assuming that a phonologically
intended effect is coarticulatory; this leads to a dissimilatory change. Ohala's third
mechanism of sound change is the confusion of acoustically similar sounds, which he
attributes to the listener's failure to recover some feature found crucially in one sound
but not the other.6
Most recently, Blevins (2004, 2006a, 2008b) uses the terms CHOICE, CHANCE, and
CHANGE for what she views as the three basic mechanisms of sound change. In
principle they are distinct from Ohala's mechanisms; extensionally they are similar.
For example, CHOICE refers to innovations grounded in articulatory variation along
the hypospeech-hyperspeech continuum, for which Blevins (2006a: 126) assumes
'multiple phonetic variants of a single phonological form'. Mostly these correspond
to the major type of change recognized by the neogrammarians, and to cases of what
Ohala treats as hypocorrection, though he does not refer to a continuum of phonetic
variants from which hypocorrection operates.
Blevins's term CHANCE refers to innovations based on intrinsic phonological ambi-
guity. For example, a phonological sequence /aʔ/ might be realized phonetically as
[a̰], permitting listeners to interpret it phonologically either as (intended) /aʔ/ or as
/ʔa/; if /ʔa/ is chosen, a metathesis sound change has occurred. Dissimilatory changes
described by Ohala as hypercorrection are understood as a special case of CHANCE.
Finally, the term CHANGE refers to innovations in which some perceptual bias leads
to misperception. For example, in an /anpa/ > /ampa/ assimilation, it is hypothesized
that the speaker crucially did not produce [mp]; rather, a listener perceived [anpa]
as [ampa] and interpreted it phonologically as /ampa/. Other examples of this type
include context-free place of articulation shifts like [θ] > [f], also mentioned by Ohala
as the parade example of confusion of acoustically similar sounds.7

⁶ Ohala (1993b: 258) suggests that this can be viewed as a type of hypocorrection; the difference 'is whether the disambiguating cues that could have been used by the listener (but were not) are temporally co-terminous with the ambiguous part [as in the confusion of acoustically similar sounds] or whether they are not', as in hypocorrection.
⁷ On this sound change see section 3.5.1 below. A potential criticism is that of Blevins's three mechanisms, only CHANGE is intrinsically asymmetric (assuming that perceptual biases and constraints on misperception are asymmetric). By contrast, nothing about CHOICE or CHANCE per se predicts any directionality; for example, Blevins (2004: 35) notes, in CHANCE 'there is no language-independent phonetic bias' and 'the signal is inherently ambiguous'. Therefore the explanation for any observed asymmetries must be sought elsewhere. This criticism is not germane to Ohala's system. In that system, however, since hypocorrection and hypercorrection are mirror-image processes, there is no immediate explanation for their many asymmetries (for example, nonlocal laryngeal-feature dissimilation is common but nonlocal laryngeal-feature assimilation is rare).

Perhaps the fullest typology is that of Grammont (1939), the first author to present
a theory based on a survey of all known sound change patterns.8 For him, sound
changes emerge through competition between constraints (he called them 'laws';
Grammont 1939: 176) favoring effort reduction and clarity, as well as other factors.
Given in (5) is his full scheme; he distinguishes changes where the conditioning
environment is adjacent or local (5b) from those where it is nonlocal (5c). Grammont's
typology cannot readily be adapted to the present day, but it is notable that he invoked
articulatory reduction, perceptual clarity, and motor planning as key ingredients
in sound change. His theory of nonlocal dissimilation is especially interesting (see
already Grammont 1895): he argues that the segment undergoing dissimilation is
always in a 'weaker' position than the trigger; positional strength is defined with
reference to accent, syllable position, and, if all else is equal, linear order, in which
case the first segment is weaker. He suggests that nonlocal dissimilation occurs when
planning for a segment which is in a more prominent position distracts a talker who
is producing a similar segment in a weaker position.
(5) Grammont's (1939) typology of sound changes
a. Unconditioned changes: explanation unclear (in some cases language
contact?)
b. Locally conditioned changes
ASSIMILATION: motivated by articulatory ease
DISSIMILATION: motivated by perceptual clarity
METATHESIS: motivated by perceptual clarity and phonotactic
optimization
c. Nonlocally conditioned changes
ASSIMILATION: explanation unclear, but evidently articulatory
in origin
DISSIMILATION: originates in motor-planning errors
METATHESIS: motivated by perceptual clarity and phonotactic
optimization
Our own presentation draws much from the approaches of earlier authors, but it
crucially differs from them. With its reference to 'articulatory' reduction and variabil-
ity, the traditional dichotomy inherited from the neogrammarians is too simplistic,
even in its modern avatars, and fails to reflect the true complexity of speech produc-
tion. On the other hand, the listener-oriented typologies of Ohala and Blevins leave

⁸ That is, all patterns known to him over 75 years ago. The only comparable works are by Hock (1991), whose textbook classifies surface patterns without a theory of causes, Blevins (2004), whose broad coverage is exhaustive for certain patterns but is not meant to be complete for all types of sound change, and Kümmel (2007), whose coverage is restricted to a few language families. Today it would be almost impossible to be as thorough as Grammont tried to be; useful modern sources are Blevins's (2008b) 'field guide' and Hansson's (2008) overview.

essential questions about speech production unanswered; for example, what processes
generate and constrain the variable input to Blevins's CHOICE? Finally, while thorough
and replete with interesting observations, Grammont's account is too inexplicit and
stipulative to be used without change today.
The typology we present is deductive rather than inductive. That is, rather than
surveying sound changes, we examine components of speech production and percep-
tion, seeking relatively complete coverage, and we ask what biases each component is
likely to yield. We propose that biases emerging from the various systems of speech
production and perception, respectively, underlie various types of sound change with
corresponding phonological profiles. What emerges from this approach has elements
of previous typologies, therefore, but cannot be directly mapped onto any of them.

3.3 Biases in speech production and perception


There are several sources of variability in the speech communication process that may
lead to sound change. For example, a listener may misperceive what the talker says
because the talker is speaking softly or at a distance, or there is some background
noise. Similarly, the talker may misspeak, accidentally producing a different sound from
that intended or a variant of the sound that is different from usual. Further, children
may come to language acquisition with bias to organize linguistic knowledge in ways
that turn out to differ from the organization used by their parents.
Variability introduced in these ways by the communication process could be ran-
dom. For example, misperception of a vowel would result in hearing any other vowel
in the language with equal probability. However, most sources of variability are far
from random, and instead introduce bias into the process of sound change so that
some outcomes are more likely than the others.
For example, when the English lax vowel [ɪ] is misperceived (Peterson and Barney
1952), not all of the other vowels of English are equally likely to be heard. Instead,
as Table 3.3 shows, the misperception is likelier to be [ɛ] than any other vowel. This
lack of randomness in perceptual variation is one property of bias factors in sound
change.

TABLE 3.3 Identification of [ɪ] and [ɛ] in Peterson
and Barney (1952). The speaker's intended vowel is
shown in the row label, and the listener's perceived
vowel in the column label

        i      ɪ       ɛ      æ     ɑ     ɝ     other

[ɪ]    0.06   92.9    6.75   0.02  0.01  0.25  0.01
[ɛ]    0       2.5   87.71   9.23  0.01  0.5   0.05
3- Phonetic bias in sound change 59

TABLE 3.4 Identification of [ʊ] and [ʌ] in Peterson and Barney
(1952). The speaker's intended vowel is shown in the row label,
and the listener's perceived vowel in the column label

        u      ʊ       ʌ      ɔ     ɑ     other

[ʊ]    0.93  96.55    1.66   0.5   0.16  0.2
[ʌ]    0      1      92.21   1.24  5.25  0.3

A second (defining) property of bias factors in sound change is that bias is direc-
tional. For example, given that [ɪ] is most often misperceived as [ɛ], one might suppose
that [ɛ] would reciprocally be misperceived as [ɪ]. As the second line in Table 3.3
shows, this is not the case. Although [ɛ] is misperceived as [ɪ] at a rate that is greater
than chance, the most common misperception of [ɛ] was as [æ]. Table 3.4 indicates
that the lax back vowels tested by Peterson and Barney showed a similar asymmetric
confusion pattern, where [ʊ] was confused with [ʌ] while [ʌ] was more often confused
with [ɑ]. Labov (1994) observed that in vowel shifts, lax vowels tend to fall in the
vowel space; the perceptual data in Tables 3.3-3.4 suggest that one source of the
directionality of the sound change may be a perceptual asymmetry. In any case, our
main point is that phonetic bias factors are directional.
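The directionality can be read straight off the confusion matrices; the short sketch below uses the misidentification percentages from Tables 3.3-3.4 to compare the two directions of each confusion pair.

```python
# Misidentification percentages from Tables 3.3-3.4 (Peterson and Barney 1952);
# a key (a, b) means 'intended a, perceived b'.
confusions = {
    ('ɪ', 'ɛ'): 6.75, ('ɛ', 'ɪ'): 2.5,
    ('ʊ', 'ʌ'): 1.66, ('ʌ', 'ʊ'): 1.0,
}

def asymmetry(a, b):
    """How many times likelier the a -> b misperception is than b -> a."""
    return confusions[(a, b)] / confusions[(b, a)]

for a, b in [('ɪ', 'ɛ'), ('ʊ', 'ʌ')]:
    print(f"{a} -> {b} is {asymmetry(a, b):.1f}x likelier than {b} -> {a}")
```

With these figures [ɪ] is misheard as [ɛ] about 2.7 times as often as the reverse, and [ʊ] as [ʌ] about 1.7 times as often, which is the sense in which the bias is directional.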
Phonetic bias factors thus produce a pool of synchronic phonetic variation (Ohala
1989; Kiparsky 1995; Lindblom et al. 1995) which forms the input to sound change;
this is sketched in Figure 3.1. The structure imposed on the phonetic input to sound
change, via the directionality of phonetic variation, is a key source of the typological
patterns of sound change.
In the following subsections, we will consider potential bias factors arising from
the phonetics of speaking and listening, and the extent to which they may provide
both non-randomness and directionality in sound change. Speaking and listening
as a whole can be said to contain four elements that might provide bias factors in
sound change. We will discuss these in turn: motor planning (3.3.1); aerodynamic
constraints (3.3.2); gestural mechanics (3.3.3), including gestural overlap and gestu-
ral blend; and perceptual parsing (3.3.4). The order roughly mimics the order from
thought to speech, and from a talker to a listener. In Section 3.4 we will turn to
discuss representative types of sound change that stem from the various bias factors
we identify in this section.

3.3.1 Motor planning


Motor planning is the process of constructing or retrieving motor plans that will
later be executed by speaking. In this process, speech errors may occur as planning
elements (syllables, segments, gestures, etc.) influence each other through priming or

[Figure 3.1 schematic: four phonetic bias factors (motor planning, speech aerodynamics, gestural mechanics, speech perception) feed into a pool of phonetic variation, which in turn feeds a change mechanism (speech perception).]

FIGURE 3.1 Phonetic bias factors produce a pool of synchronic phonetic variation which can
be taken up in sound change

coactivation, or through the inhibition of one segment by the activation of another.


Sound changes may then emerge if such speech errors are incorporated into a lan-
guages phonology. Two basic speech error patterns could lead to sound change. The
first of these, BLENDING, has been extensively studied in the speech error literature.
The second, INHIBITION, appears to be much less common, though it is a focus of
language play.9
In motor plan blending, plans for nearby similar segments may influence each other
as they are activated; this is motor PRIMING (see Tilsen 2009a with further references).
The blending or interaction of similar, nearby sounds is exemplified in interchange
errors (snow flurries → flow snurries), anticipations (reading list → leading list), and

⁹ On motor plan blending see Boomer and Laver (1968), MacKay (1970), Fromkin (1971), Fromkin (1973), Dell (1986), and Shattuck-Hufnagel (1987). Pouplier and Goldstein (2010) have also shown that speech planning and articulatory dynamics interact with each other in complex ways, so that the specific phonetic results of some speech errors may be outside the speaker's ordinary inventory of articulatory routines.
In addition to blending and inhibition, bias may also emerge from what Hume (2004b) calls ATTESTATION, suggesting that some metathesis patterns point to 'a bias towards more practiced articulatory routines' (p. 229). Undoubtedly, there is a tendency for the articulators to be drawn to familiar, routinized patterns. This can be seen in loan word adaptation as words are nativized, and probably also exerts a type of phonotactic leveling as Hume suggests. We consider attestation to be a systematic constraint (section 3.5), different in kind from the phonetic bias factors, though in this case the difference between linguistically universal and language-specific biases is particularly fine.

preservations (waking rabbits → waking wabbits). Blending of segmental plans due to
adjacency of those plans results in bias toward non-randomness in speech production
errors. People are more likely to blend plans that are in proximity to each other: in
time, in phonetic similarity, and in articulatory planning structure (that is, onsets interact
with onsets, nuclei with nuclei, etc.).
The effects of motor plan inhibition can be seen in tongue twisters where an alter-
nating pattern is interrupted by a repetition (Goldinger 1989). In the sequence unique
New York we have a sequence of onset consonants [j . . . n ... n . . . j] and when the
phrase is repeated the sequence is thus [ . . . j n n j j n n j j n n j j . . . ], an aa bb pattern.
Other tongue twisters are like this as well. For example, she sells sea shells by the sea
shore is [ʃ . . . s . . . s . . . ʃ . . . s . . . ʃ]. Typically, in these sequences the error is toward
an alternating pattern [ j . . . n ... j . . . n] instead of the repetition of one of the onsets.
It may be worth noting in this context that repeated tongue motion is dispreferred
in playing a brass instrument like a trombone or trumpet. With these instruments
(and perhaps others) rapid articulation of notes is achieved by 'double tonguing'
(alternating between coronal and dorsal stops), rather than 'single tonguing' (using a
sequence of coronal stops to start notes).
In both motor plan blending and motor plan inhibition, it is likely that rhythm and stress play a significant role in determining that prominent segments will be preserved while non-prominent segments will be altered, because the prosodic organization of language is extremely important in motor planning (Port 2003; Saltzman et al. 2008).

3.3.2 Aerodynamic constraints


Speech production is constrained by aerodynamics even in the absence of interactions
among articulators. Aerodynamic bias factors are characterized by a tendency toward
phonetic change as a result of changing aerodynamic parameters even when all else
(e.g. the position of the articulators) remains constant. Two laws of speech aerody-
namics are involved, among others. The first is the aerodynamic voicing constraint:
in order to produce vocal fold vibration, air pressure below the glottis must be greater
than air pressure above the glottis (Ohala 1983). This physical law governing voicing
sets up an 'ease of voicing' hierarchy among the phonetic manner classes: stops are the
hardest to voice, since air passing through the glottis will raise supraglottal air pressure
in the closed vocal tract, and vowels are the easiest to voice. Thus, the aerodynamic
voicing constraint introduces phonetic bias into sound change, biasing voiced stops
to become voiceless.10
10 One might wonder if the aerodynamic voicing constraint biases vowels to be voiced. Despite the symmetry of it, we are reluctant to say so. It seems to us that voiced speech may have some inherent advantages for spoken communication. For example, voicing provides resistance to air flow so voiced breath-groups extend over a longer time than voiceless (e.g. whispered) breath-groups. Voiced speech is also louder than voiceless speech, which is a communicative advantage in most situations. We see the aerodynamic voicing constraint as a constraint against voicing, providing a phonetic bias toward the elimination of voicing in segments where voicing is difficult.
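Schematically (our formalization; this notation is not from the sources cited above), the constraint says that vocal fold vibration requires a transglottal pressure drop above some minimal threshold:

P_sub(t) − P_supra(t) > ΔP_min

where P_sub and P_supra are subglottal and supraglottal air pressure. During a stop closure, glottal airflow into the sealed oral cavity raises P_supra toward P_sub, so the drop shrinks over time and voicing dies out unless some adjustment (such as oral cavity expansion) intervenes; in a vowel, the open vocal tract keeps P_supra near atmospheric pressure and the inequality is easily satisfied.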

Linguists have noted a number of different linguistic responses to the phonetic bias against voiced stops. Voiced stops have lost their voicing and neutralized with
voiceless stops, but in some languages maintenance of a contrast between voiced
and voiceless stops is achieved by altering the phonetic properties of the voiced
series in some way. Such 'repair strategies' include prenasalization, implosion, and
spirantization. In our view, the phonetic bias imposed by the aerodynamic voicing
constraint should impel voiced stops to become voiceless, all else being equal. The
further development of repair strategies is motivated by contrast maintenance; see
section 3.5.1 below on perceptual enhancement.11
A second law of speech aerodynamics that provides phonetic bias in sound change
is a constraint on frication, which can only be achieved when air pressure behind
the fricative constriction is sufficient. This introduces a bias against voiced fricatives
because airflow is impounded by the vocal folds, reducing oral pressure (Ohala 1983;
Johnson 2003: 124; Ohala and Solé 2010). Without any articulatory adjustments,
therefore, voiced fricatives will tend to become glides. As with the aerodynamic voic-
ing constraint, the frication constraint introduces a repelling force (a bias against a particular combination of phonetic features) and a direction of change if no contrast-maintaining repair strategy is applied.
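In the same schematic terms (again our own gloss, not the authors' notation), audible frication requires a sufficient pressure drop across the oral constriction:

P_oral − P_atm > ΔP_fric

With voicing, part of the total drop P_sub − P_atm is consumed at the glottis, so P_oral ≈ P_sub − ΔP_glottis is reduced; sustained voicing thus tends to push the oral drop below the frication threshold, and the voiced fricative drifts toward a glide.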

3.3.3 Gestural mechanics


The actual movements of articulators introduce variability in speech, and may intro-
duce bias for sound change. Two types of interaction among articulators have been
implicated in language sound patterns.
In the first type, GESTURAL OVERLAP, independent articulators like lips and tongue
tip are moving at the same time and their movements may obscure each other. For
example, in the utterance hand grenade the tip and body of the tongue are moving to
make stop consonants in rapid succession in the [dɡ] sequence at the word boundary.
Cho (2001) found that the relative timing of gestures across a word boundary is more
variable than is gestural timing within words, so in some productions of hand grenade
the tongue body gesture for [g] may precede the tongue tip gesture for [d]. Though the
[d] closure is made, it has very little impact on the acoustic output of the vocal tract, so
the utterance sounds like [hæŋ ɡrəneɪd]. The coronal gesture of [nd] is hidden by the
dorsal gesture which now covers it. A hearer of such an overlapped utterance would
think that the alveolar gesture has been deleted (Byrd 1994) and so may not include
the hidden gesture in their plan for the word. In gestural overlap, the movement for

11 On prenasalization and voicing see e.g. Iverson and Salmons (1996). It may be helpful to note also that contrast maintenance (a basic factor that we appeal to in accounting for sound change) is similar to 'faithfulness' constraints in Optimality Theory, whose 'markedness' constraints likewise correspond almost exactly to our phonetic bias factors.

a constriction can be completely obscured by another. This mechanism introduces a directional bias in sound changes involving sequences such that back gestures are
more likely to hide front gestures. Debuccalization is an example of this, where a
glottalized coda may be replaced by glottal stop, but we very rarely see glottal stops
become oral.
In the second type of interaction between articulators, GESTURAL BLEND, the pho-
netic plan for an utterance places competing demands upon a single articulator. For
example, in the word keep the tongue body is required to move back toward the soft
palate for the velar [k], and very soon after move forward for the front vowel [i]. Thus,
in this word the location of the tongue during the [k] closure is farther forward in the
mouth than it is during the [k] of words with back vowels like cop.
Several factors determine the outcome of a gestural blend. One of these comes from
the quantal theory of speech production (Stevens 1989). Some gestures are more stable
under perturbation, in the sense that the output of an acoustically stable gesture will
not be much affected by blending with another gesture. In blending a quantally stable
gesture with an unstable gesture, the more stable gesture will tend to determine the
acoustics of the output. For instance, the more constricted gesture usually shows a
greater acoustic change in gestural blending, while less constricted gestures are less
impacted. In this way, even though Stevens and House (1963) and later researchers
(Strange et al. 1976, Hillenbrand et al. 2001) found that vowels are significantly influ-
enced by the consonants that surround them, the blending of tongue body gestures
when a vowel follows velar /k/ or /g/ consonants results in a more noticeable change of
the consonant gesture than of the vowel gesture; this yields fronted /k/ and /g/ adjacent
to front vowels.
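The quantal idea can be glossed in equally schematic terms (our notation, not Stevens's): if A(x) is the acoustic output as a function of an articulatory parameter x, a blending perturbation Δx changes the acoustics by roughly

ΔA ≈ (dA/dx) · Δx

so a gesture sitting in a quantal region, where |dA/dx| is small, is acoustically stable under blending, while a gesture on a steep part of the curve shows the larger acoustic effect.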
Because patterns of gestural interaction in blending and overlap are language-
specific, languages develop different or even complementary patterns in phonologiza-
tion.12 For example, while Japanese has vowel devoicing in words like /kusuri/ → [kɯ̥sɯɾi] 'medicine' (Hasegawa 1999), other languages instead have intervocalic fricative voicing in similar contexts, as in northern Italian /kasa/ → [kaza] 'house' (Krämer 2009: 213).

3.3.4 Perceptual parsing


The role of listeners and (mis)perception in sound change has been a major research
theme in the three decades since Ohala's 'The listener as a source of sound change'
(1981). Some changes have been explained as a by-product of perceptual similarity:
because two segment types sound similar, they are sometimes confused by listeners.
All else being equal, if the likelihood of misperception is symmetrical, the resulting
sound changes should be symmetrical; if X and Y are confusable sounds, X > Y should
be just as likely as Y > X. If this were true of perceptual confusions generally we would
12 See Bladon and Al-Bamerni (1976) for a discussion of language-specific coarticulation patterns, and Nolan (1985) on individual differences in coarticulation.

expect perceptual parsing to produce symmetric rather than asymmetric patterns of change. Simple perceptual confusability would then yield no bias factor favoring one direction of change over another.
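The reasoning can be restated in terms of a perceptual confusion matrix (our formalization): if C(x, y) is the probability of hearing y when x was said, then symmetric confusability, C(x, y) = C(y, x), creates variation without directional pressure, whereas an asymmetry C(x, y) > C(y, x) would bias x > y over y > x.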
As noted in section 3.1 above, however, sound change is typically asymmetric. For
changes grounded in perceptual parsing, this would mean that listeners sometimes
introduce bias and thus asymmetrical patterns of sound change. In principle this
could happen in at least two ways, though more research is needed in both cases to
determine the nature of the mechanisms. First, in some cases asymmetric misper-
ception may be a bias factor. For instance, in Tables 3.3-3.4 we illustrated perceptual
confusions among lax vowels. These reveal a distinct pattern of perceptual asymmetry
in vowel perception, suggesting that the tendency for lax vowels to lower in the
vowel space (Labov 1994) could have its phonetic roots in asymmetric misperception.
Another such case is studied by Chang et al. (2001), who focussed on sounds that
differ in the presence or absence of some acoustic element (e.g. a particular band of
energy in a stop release burst). They suggest that sounds differing in this way may
be asymmetrically misperceived: listeners are more likely to fail to notice the element than to erroneously imagine it to be present. They relate the asymmetry to
patterns of stop palatalization. Asymmetric misperception could also stem from other
acoustic properties of segments, like the temporal distribution of retroflexion cues
(Steriade 2001), or from properties of the auditory system, like the temporal spread
of masking (Wright 1996; Wright and Ladefoged 1997); more research is needed.
A second class of perceptual bias factors, perceptual hypercorrection, was first
identified by Ohala (1981). This arises when correction (perceptual compensation for
coarticulation) applies to undo coarticulation that is actually absent. For instance,
Beddor et al. (2001) found that listeners are relatively insensitive to vowel nasality
variation when a nasal segment followed. They attributed this perceptual insensitivity
to compensation for coarticulation, and noted that it correlates with a crosslinguistic
tendency for vowel nasality contrasts to be suspended before nasal consonants.13 We
shall note cases in section 3.4.4 where hypercorrection may be a plausible explanation
of sound change.
To forestall misunderstanding, we should comment on the relation between
hypocorrection (in Ohala's sense) and perceptual parsing bias factors for sound
change. As noted above, hypocorrection is Ohala's term for a listener's failure to
correct for coarticulation, which may then lead to sound change. A classic example
involves interactions between vowels and coronal consonants. In a sequence like /ut/,
the coronal tends to front the vowel so that its phonetic realization is closer to [yt].
13 Beddor (2009) argues that, in addition to compensation for coarticulation, the sound change VN > Ṽ is influenced by natural patterns of gestural mechanics in the coordination of oral and nasal gestures.
Interestingly, while there are a number of laboratory demonstrations of correction, there are almost no controlled observations suggesting that listeners hypercorrect in speech perception. The only example known to us is presented by Shriberg (1992); cf. Ohala and Shriberg (1990). This may be a gap in the literature, but it is an important one.

This mechanical effect (the overlap of consonant and vowel tongue gestures) does not
ordinarily lead to sound change, due to a perceptual mechanism that compensates
for coarticulation (Mann and Repp 1980); coarticulation is perceptually corrected.
But if, for some reason, the listener fails to correct for coarticulation, a change may
result: /u/ > [y] / __ [cor], with no change before other consonants. Something just
like this seems to have happened in Central Tibetan (Dbus), as illustrated in (6).
Final consonants were debuccalized or lost; the examples in (6a) show vowels that
were unaffected, while those in (6b) show fronting when the final consonant was a
coronal.
(6) Central Tibetan precoronal vowel fronting (Tournadre 2005: 28-32)
a. Written Tibetan (WT) brag 'rock' > Central Tibetan (CT) ʈʂaʔ
WT dgu 'nine' > CT gu
WT phjug po 'rich' > CT tɕhukpo
b. WT bal 'wool' > CT phɛ:
WT bod 'Tibet' > CT phøʔ
WT khol 'to boil' > CT khø:
WT bdun 'seven' > CT dỹ
WT sbrul 'snake' > CT ʈʂy:
Hypocorrection is a key ingredient of change, both in Ohala's and our account, but it
is important to add that hypocorrection per se does not involve a specific bias factor.
The bias factor in cases like (6), the phonetic force that introduces variability and determines the direction of the change, is gestural. It is coarticulation that deter-
helps determine whether or not a change will occur on a specific occasion, and as
such it is part of a model of actuation; cf. section 3.6.

3.4 Bias factors in sound change


In this section we consider types of sound change that may reflect the bias factors
summarized in section 3.3. Sound changes that can be attributed to motor planning,
aerodynamic constraints, and gestural mechanics are well documented; perceptual
parsing is somewhat harder to substantiate but remains a possible source of sound
change.

3.4.1 Motor planning


Sound changes that have their origins in motor planning bias factors are, in effect,
speech errors that catch on. In recent decades (indeed, since the classic studies of Meringer and Mayer 1895 and Meringer 1908) it has been démodé to suggest
that speech errors result in change. But while speech error research shows clearly

that sound change in general cannot be explained as conventionalized speech errors, it does not exclude the possibility that some types of sound change do have precisely
it does not exclude the possibility that some types of sound change do have precisely
that origin. This is our contention here.14
In section 3.3.1 we discussed two kinds of motor planning errors: blending and
inhibition. We surmise that there is one common sound change type whose roots may
lie in motor planning inhibition errors: nonlocal dissimilation. Since dissimilation is
complex and its analysis is controversial, we discuss it separately in section 3.4.5. As
for motor planning blending errors, we expect that sound changes emerging from
them should tend to be anticipatory rather than perseverative, and should tend to
involve an interaction between relatively similar segments and segments in relatively
similar prosodic positions; greater similarity should favor the interaction. At least two
types of sound change may conform to our expectations: consonant harmony and
long-distance displacement (nonlocal metathesis).
Consonant harmony is illustrated by the Navajo patterns in (7). Note that harmony
is symmetric; cf. /ʃ/ → [s] in (7a) and /s/ → [ʃ] in (7b).
(7) Navajo (Athabaskan) sibilant harmony (McDonough 1991, cited by Hansson 2010: 44)
a. /j-iʃ-mas/ → [jismas] 'I'm rolling along'
/ʃ-is-ná/ → [sisná] 'he carried me'
b. /si-dʒe:ʔ/ → [ʃidʒe:ʔ] 'they lie (slender stiff objects)'
/dz-iʃ-l-ta:l/ → [dʒiʃta:l] 'I kick him [below the belt]'
The data in (7) illustrate one common feature of consonant harmony: it is more
typically anticipatory than perseverative.
A second common feature of sibilant harmony in particular is a 'palatal bias': many languages have /s/ → [ʃ] assimilation but no /ʃ/ → [s] assimilation, while the reverse asymmetry is rare (Hansson 2010: 352-67). In Aari, for example, as seen in (8), affixal /s/ → [ʃ] when added to a root with /ʃ/; only /s/ is affected.
(8) Aari (Omotic) sibilant harmony: Causative formation (Hayward 1990)
BASE  CAUSATIVE
mer- 'forbid'  mer-sis- 'cause to forbid'
du:k- 'bury'  du:k-sis- 'cause to bury'
di:b- 'steal'  di:b-zis- 'cause to steal'
ʃen- 'buy'  ʃen-ʃiʃ- 'cause to buy'
ʔuʃ- 'cook'  ʔuʃ-ʃiʃ- 'cause to cook'
ʃa:m- 'urinate'  ʃa:m-ʃiʃ- 'cause to urinate'

14 In any theory positing occasional events (e.g. misperceptions or failures of perceptual correction) as sources of the variation that becomes conventionalized in change, it is hard to see what would exclude occasional speech errors from contributing to the same variation.

The same asymmetry is found in speech errors (Shattuck-Hufnagel and Klatt 1979). Stemberger (1991) relates this to 'addition bias' (Stemberger and Treiman 1986), whereby complex segments are anticipated in planning simple segments; [ʃ] is more complex because it uses the tongue blade and body.
As Hansson (2010) notes, consonant harmony patterns also resemble speech errors in being typically similarity-based: more similar segments interact with each other. In view of this and their other parallels (the nonlocality of consonant harmony, its typically anticipatory nature, and addition bias), Hansson suggests, and we agree, that phonological consonant harmony patterns are likely to have originated diachronically in motor planning errors.
Long-distance displacement (nonlocal metathesis) is a second type of sound change
that may have its origin in motor planning. In the typology of metathesis sound
changes (Blevins and Garrett 1998,2004), it is notable that long-distance displacement
commonly affects only some segment types. Often, for example, liquids undergo
displacement leftward to the word-initial syllable onset. This is especially well doc-
umented in Romance varieties and languages influenced by Romance; Old Sardinian
examples are shown in (9).15
(9) Latin (L) > Old Sardinian (OS) liquid displacement (Geisler 1994: 110-11)
L castrum 'fort' > OS crastu
L cochlea 'snail' > OS clocha
L complēre 'fill' > OS clompere
L dextra 'right (hand)' > OS dresta
L februārium 'of February' > OS frevariu
L pigrum 'slow' > OS prigu
L pūblicum 'public' > OS plubicu
Such displacements are usually anticipatory and tend to involve comparable sylla-
ble positions. For example, as in (9), displacement is often restricted to interchange
between obstruent-liquid clusters. We take it that such phonologized patterns are
rooted in motor planning. Independent support for this view comes from the fact
that such displacements are a well-documented speech error pattern, as in German
Brunsenbenner for Bunsenbrenner 'Bunsen burner' (Meringer and Mayer 1895: 91).16

15 In Old Sardinian, as Geisler (1994: 112) notes, the displacement is restricted to adjacent syllables. In modern dialects, longer-distance displacements are also found: Latin fenestra 'window' > Old Sardinian fenestra > modern dialectal fronesta. This chronological difference between one-syllable and longer displacement patterns undermines an argument by Blevins and Garrett (2004: 134-5), based on comparable data in southern Italian dialects of Greek, that certain details of the longer displacement patterns favor the view that such changes originate through misperception.
16 While displacements of this type are not rare in speech error corpora, we have not studied the data carefully enough to judge whether other displacement patterns that are unattested as sound changes might also correspond to rarer speech error patterns. If they do, as Juliette Blevins points out to us, we would face the problem of explaining why such errors do not sometimes yield sound changes.

3.4.2 Aerodynamic constraints


The aerodynamic constraints on voicing and on frication summarized in section 3.3.2
have consequences for sound change. For example, the familiar change of final obstru-
ent devoicing can be interpreted as an effect of the aerodynamic voicing constraint in
a position where voicing is especially vulnerable.17
The aerodynamic frication constraint is likewise responsible for changes whereby
voiced fricatives become glides. An example of the latter is the common pattern of
[z] > [ɹ] rhotacism (Solé 1992a, Catford 2001). This change is known from many languages, including Latin and West and North Germanic. Its Old English (OE) effects are seen in words like xerian 'to praise', ma:ra 'more', and xord 'treasure' (cf. Gothic hazjan, maiza, and huzd respectively; OE r was probably [ɹ]). In the change in (10), OE [ʝ] and [ɣ] became glides [j] and [w] when surrounded by voiced segments in Middle English (ME). When preceded by a liquid, ME w remained intact, but in other positions the glides in (10) became diphthong offglides or underwent further changes. (The Middle English forms in (10) are not given in IPA.)
(10) Middle English (ME) voiced dorsal fricative gliding (Luick 1940: vol. 2, pp. 945-6; the earlier forms shown in each case are from late OE or early ME)
a. kæ:ʝ > ME keie 'key'
e:ʝe > ME eye 'eye' (cf. German Auge)
pleʝian > ME pleien 'play'
b. laɣe > ME lawe 'law'
jeoɣuθ > ME youth 'youth' (cf. German Jugend)
c. borɣian > ME borwen 'borrow' (cf. German borgen)
folɣian > ME folwen 'follow' (cf. German folgen)
morɣe > ME morwe '(to)morrow' (cf. German Morgen)
sorɣe > ME sorwe 'sorrow' (cf. German Sorge)
The precise mechanism by which aerodynamic constraints yield new pronun-
ciations warrants consideration. We prefer to avoid teleological formulations (e.g.
[ɣ] > [w] 'to avoid the combination of frication and voicing'), and we find it more appealing to assume that aerodynamic factors give rise to a biased distribution of variants. In voiced fricatives, for example, the tendency toward reduced airflow behind the
fricative constriction will automatically yield occasional glide variants. Sound changes
like the ones illustrated above then take place when these variants become individual
or community speech norms.

17 Other changes indirectly attributable to this constraint are noted in section 3.5.1.

3.4.3 Gestural mechanics


In section 3.3.3 we discussed two types of interaction among articulations: gestural
overlap and gestural blending. The latter occurs when segments place competing
requirements on a single articulator; gestural overlap involves interaction between
independent articulators. Some very common types of sound change are rooted in
gestural overlap, including those in (11):

(11) a. VN > nasalized vowel


b. Cluster simplifications that originate in gestural masking, e.g. [ktm] > [km]
c. Stop debuccalizations that originate in glottal coarticulation
For debuccalizations as in (11c), we assume that /k/ > [ʔ] changes may have an intermediate [kʔ] realization. If the glottal closure then masks the oral closure, the audible result is [ʔ].
A less common change originating in gestural overlap is the first stage of the English
development in (12):
(12) English velar fricative labialization: [x] > [f] / round V __
Old English *koxxian > Middle English kouxe > cough
Old English xlæxxan > Middle English lauxe > laugh
Old English ru:x > Middle English rouxe > rough

Note that (as the modern ou, au spellings indicate) all three English words in (12) had a round vowel [u] before [x]. We follow Luick (1940: vol. 2, pp. 1046-53) and Catford (1977) in assuming that en route from [x] to [f] there was a realization like [xʷ], resulting from overlap of the round vowel and [x]. Catford notes that a strongly rounded pronunciation can still be heard in southern Scotland: [la:xʷ] 'laugh', [rʌuxʷ] 'rough', etc. The remaining [xʷ] > [f] change is not due to gestural mechanics and will be discussed in section 3.5.1 below.
Typical changes due to gestural blend are coronal or velar palatalization (see further
3.4.4 below), the Tibetan precoronal vowel fronting pattern in (6) above, and vowel
coalescence. Shown in (13), for example, are Attic Greek coalescence patterns for non-
high non-identical short vowels. Here the coalescence of mid vowels preserves height;
directionality is relevant in some but not all cases.18

18 Omitted in (13) are the coalescence of identical vowels as long vowels and of glide sequences as diphthongs. Note in relation to palatalization that not all 'palatalization' is the same: whereas coronal palatalization can be interpreted as an effect of gestural blend, labial palatalization would reflect gestural overlap.

(13) Selected Attic Greek vowel contraction patterns (Rix 1992: 52-3, Smyth 1956: 19)
INPUT  CONTRACTION  EXAMPLE
e + o  o:  phileomen > philo:men
o + e  o:  *dze:loeton > dze:lo:ton
a + o  ɔ:  *ti:maomen > ti:mɔ:men
o + a  ɔ:  *aidoa > aidɔ:
a + e  a:  *ti:mae > ti:ma:
e + a  ε:  genea > genε:
Such examples are categorized as gestural blend because, in terms of vowel height and
backness (not rounding), they involve a single articulator, the tongue body, on which
successive vowel segments place conflicting demands.

3.4.4 Perceptual parsing


In section 3.3.4 we described three perceptual parsing phenomena that might yield
sound change: symmetric misperception; asymmetric misperception; and perceptual
hypercorrection. As we noted, symmetric misperception cannot generate asymmetric
bias factors as such; in fact, it is rarely correlated with well-established (bidirectional)
sound change patterns. Perceptual hypercorrection and dissimilation will be dis-
cussed separately in section 3.4.5. In this section, we discuss three types of sound
change that have been attributed to asymmetric misperception: velar palatalization;
unconditioned [0] > [f ] changes; and obstruent + [w] > labial obstruent shifts. In each
case, there is some evidence that perceptual parsing underlies the change but other
evidence pointing elsewhere.19 We regard the matter as unsettled.
Velar palatalization is the best-studied case where there may be a meaningful corre-
lation between a sound change type and asymmetric misperception. One of numerous
examples of this type of change is found in English, as shown in (14), where the highlighted examples of k and tʃ are from original *k.20
(14) OE palatalization: *k > tʃ in syllables with front vowels (Sievers 1898: 101-5)
a. Word-initial palatalization
tʃæf 'chaff'
tʃe:ap 'cheap'
tʃild 'child'
19 For example, Babel and McGuire (2010) report that [θ] perception is more variable than [f] perception in both audio and audio-visual stimuli.
20 Only voiceless *k palatalization is illustrated, because the interaction of spirantization with the palatalization of *g would require more detailed exposition.

b. Internal onset palatalization
*drenki- > drentʃ 'a drink'
ortʃeard 'orchard'
ri:tʃe 'rich'
c. Coda palatalization
ditʃ 'ditch'
pitʃ 'pitch'
swiltʃ 'such'
d. No palatalization in syllables with back vowels
ku:θ 'known'; cf. (un)couth
sak 'sack'
Of course, as noted in section 3.4.3 above, gestural blending is implicated in velar
palatalization, which arises from the interaction of articulatory instructions for a front
vowel and a velar consonant. But a coarticulatorily palatalized velar is far from being
an alveopalatal affricate; it is that distance that perceptual parsing accounts are meant
to bridge. For example, Guion (1998) studied the perceptual similarities of velar stops and alveopalatal affricates and found that when stimuli are degraded by gating or noise masking, tokens of [ki] are significantly often misperceived as [tʃi], while tokens of [ka], [ku], [tʃi], [tʃa], and [tʃu] are more often perceived accurately. In a nutshell, [ki] is misperceived as [tʃi] but [tʃi] is not misperceived as [ki]. Guion suggests that velar palatalization leads to alveopalatal affricates because of this asymmetric misperception.21
Our main reservation regarding this argument is that it is not yet supported by phonetic studies of ongoing changes that show a clear articulatory leap from [kʲ] to [tʃ]. We hesitate not only because gestural blending is involved, but because it remains possible that the transition from [kʲ] to [tʃ] is mediated not by perceptual parsing but by processes that include perceptual enhancement (section 3.5.1). In Modern Greek, for example, velar palatalization yields palatals: /k g x ɣ/ → [c ɟ ç ʝ] before front vowels (Arvaniti 2007); some dialects have a further [c ɟ] > [tʃ dʒ] change. If this is a typical pathway for [k] > [tʃ] palatalization, we would want to evaluate the possibility that affrication of [c] reflects perceptual enhancement. But insofar as clear cases of asymmetric misperception are identified, and are correlated with sound changes that do seem to have originated as articulatory leaps between the relevant segment types, it is likely that they are a source of sound change.
We are also uncertain about the asymmetric-misperception account of the [θ] > [f] change found in English and Scots dialects and some other languages.22 A point
21 The nature of Guion's argument is similar to that of Chang et al. (2001), but they discussed asymmetric misperception of [ki] and [ti], which does not correspond to a well-attested sound change pattern.
22 For what it is worth, the change itself is not very common. Though it has occurred in several languages (Blevins 2004: 134-5, Kümmel 2007: 193), it is less common than the superficially comparable change /s/ > [θ], which evidently targets dental [s] and thus seems to have an articulatory basis.

in favor of this account, to be sure, is that experimental studies (Miller and Nicely 1955, Babel and McGuire 2010) show that [θ] is misperceived as [f] significantly more often than the reverse; this is consistent with the fact that a [f] > [θ] change is unknown.23 But we suspect that the change may involve first the development of labialization on [θ], i.e. [θ] > [θʷ], with a further [θʷ] > [f] change that is similar to the English [xʷ] > [f] change mentioned in section 3.4.3. We have three reasons for our suspicion. First, in Glasgow, to which the English [θ] > [f] change has spread in recent decades, there is a variant that Stuart-Smith et al. (2007) describe as a labialized dental fricative, perceptually intermediate between [θ] and [f].24 Second, in South Saami and Latin there are cases where an interdental > labiodental fricative change is limited to labial contexts (Kümmel 2007: 193); we interpret these as shifts targeting phonetically labialized interdentals, equivalent to the [θʷ] > [f] step that we assume for [θ] > [f] shifts generally. Third, within Northern Athabaskan, as analyzed by Howe and Fulop (2005) and Flynn and Fulop (2008), a reconstructible series of interdental fricatives and affricates has the outcomes in (15):
(15) Selected reflexes of Northern Athabaskan interdental fricatives and affricates
a. Interdentals: Dene Tha dialect of South Slavey
b. Labials ([p], [pʰ], [pʔ], [f], [v]): Tulita Slavey
c. Labial-velars (e.g. [kʷ], [kʷʰ], [kʷʔ], [ʍ], [w]): Dogrib, Hare, Gwich'in
d. Velars: Dene Tha and Gwich'in dialects
e. Pharyngealized sibilants: Tsilhqot'in
Howe and Fulop (2005) argue that the Tsilhqot'in development in (15e) was as in (16), and that all the outcomes in (15b-15e) passed through a labialized interdental stage.
(16) Northern Athabaskan interdental fricatives and affricates in Tsilhqot'in
[*tθ, *tθʰ, *tθʔ, *θ, *ð] > [*tθʷ, *tθʷʰ, *tθʷʔ, *θʷ, *ðʷ] > [tsˤ, tsʰˤ, tsʔˤ, sˤ, zˤ]
If so, two of the best-documented [θ] > [f] cases (in English and Scots dialects, and in Athabaskan) show evidence for an intermediate [θʷ] stage. Howe and Fulop (2005) and Flynn and Fulop (2008) suggest that the reason labialization emerges is that it enhances the acoustic feature [grave], which, they contend, characterizes interdentals; in their Jakobsonian formulation, [flat] enhances [grave]. In short, on this view of [θ] > [f] shifts, the initial bias factor driving them is not perceptual parsing but perceptual enhancement (section 3.5.1).
23 As Nielsen (2010: 10) points out, however, if it is asymmetric misperception that explains [θ] > [f] shifts, we might expect [θ] > [f] substitutions in English second-language learning; in fact other substitutions appear to be more common.
24 We are not aware of detailed phonetic studies of the ongoing [θ] > [f] change in other dialects. Note that an independent earlier [θw] > [f] change is documented in Scots dialects: Old English θwi:tan > Buchan Scots fəit 'cut' (Dieth 1932). Of course this does not prove that the same change happened later, but it establishes the change as a natural one within the phonological context of English and Scots.

A final common type of sound change where asymmetric misperception has been assumed is the fusion of obstruent + [w] sequences as labial obstruents. In the typical examples in (17), sequences with stops fuse as bilabial stops and those with fricatives fuse as labiodental fricatives.25 Two other examples were mentioned above: the Buchan Scots θw > f change in note 24 and the hypothesized Tulita Slavey interdental > labial shift in (15b).
(17) a. Stop-glide fusion: Latin dw > b / # __
dwellum > bellum 'war'
dwenos > bonus 'good'
*dwis > bis 'twice'
b. Stop-glide fusion: Ancient Greek kʷ > p
*wekʷos > epos 'word'
*leikʷo: > leipo: 'I leave'
*kʷolos > polos 'pivot'
c. Fricative-glide fusion: Old English xw > Buchan Scots f (Dieth 1932)
xwa: > fa: 'who'
xwæt > fat 'what'
xwi:t > fəit 'white'
xwonne > fan 'when'
Significantly, the fricative changes involve a bilabial > labiodental place of articulation
shift. Note also that the Slavey change is non-neutralizing (the phonological inventory
previously lacked labials) while the others are neutralizing.
In essence, the perceptual parsing account of changes like these is that [kʷ] is sufficiently likely to be misheard as [p], and [θʷ] or [xʷ] is sufficiently likely to be misheard as [f], for such misperceptions occasionally to give rise to a new phono-
logical representation. Though we do not know of any relevant experimental work,
we would not be surprised to learn that asymmetric misperception patterns such as
these can be confirmed in the laboratory. Still, one or two points are worth making.
First, competing with the perceptual parsing account is one based on articulatory change: an account in which the glide [w] becomes a stop or fricative before the immediately preceding stop or fricative articulation is lost. For example, according to the competing view, [kʷ] > [p] via intermediate [kp] (or the like) and [xʷ] > [f] via intermediate [xɸ] (or the like). That such an intermediate stage is possible has support from several sources. For the stop changes in (17), Catford (1977) mentions examples like that of Lak and Abkhaz, where, for example in Lak, /kʷʔ/ is realized as [kpʔ]. Catford writes that 'the labial element is an endolabial stop: the lips are pushed

25 In some cases the glide is printed as a secondary articulation, in other cases as a distinct segment. This reflects the standard phonological analyses of the languages and probably does not signify any relevant phonetic difference.

forward, but kept flat (not rounded)', and suggests that the Greek change in (17b) may have passed through the same stage. As Larry Hyman reminds us, labialized velar > labial-velar changes are also well documented in Africa, for example in the Eastern Beboid (Niger-Congo) language Noone (Hyman 1981; Richards 1991). To confirm the perceptual parsing account of [kʷ] > [p] changes, it would be desirable to identify an ongoing case where such a change involves no intermediate variants.
For fricative changes such as [xʷ] > [f], Catford (1977) compares Scots dialects:
The labialisation becomes quite intense, towards the end of the sound, and, intervocalically, almost completely masks the sound of the velar component. Anyone who heard a South Scot saying 'What are you laughing at', [ˈxwʌt ər i ˈla:xwən ət], can have no further doubts about how [x] developed to [f].

It is important to note the difference between [ɸ] and [f]. It may be that the shift to a labiodental place of articulation is due to perceptual parsing, but since labiodental fricatives are noisier than bilabial fricatives it may alternatively be possible to assume auditory enhancement (of continuancy). In any case, for the stop changes (e.g. [kʷ] > [kp]) and the fricative changes (e.g. [xʷ] > [xɸ]), we are left with the question of whether the emergence of [p] and [ɸ] respectively is due to perceptual parsing (e.g. [kʷ] misperceived as [kp]), articulatory variability (e.g. [w] occasionally pronounced with lip closure or near-closure), or some other cause.26 The question strikes us
as unresolved, and with it the role of perceptual parsing in sound changes of the
three broad types examined in this section, which target palatalized and labialized
obstruents. We turn in the next section to a final type of sound change that has been
attributed to perceptual parsing.

3.4.5 Nonlocal dissimilation


Broadly speaking, there are two competing explanations of nonlocal dissimilation.27
As discussed above, the well-known model of Ohala (1981, 1993b) explains dis-
similation as an effect of perceptual hypercorrection; cf. Gallagher's (2010) recent
study invoking perceptual processing. A traditional competing explanation appeals
to motor planning errors (Grammont 1895; Carnoy 1918; Grammont 1939; Frisch
2004; Frisch et al. 2004; Alderete and Frisch 2007). For example, Carnoy (1918: 104)
writes that 'when two sounds or two syllables coincide and have to be visualized
together and articulated after one another . . . the image of one of them easily crowds
out the image of the other'; we take this as a reference to planning. Somewhat less
obscurely, Alderete and Frisch (2007: 387, citing Berg 1998 and Frisch et al. 2004)
26 Dialect variation in the realization of Swedish 'sj' may be fertile ground for studying fricative place of articulation change. This sound, which is described by the IPA as a voiceless simultaneous palatal-velar fricative, has a variety of realizations in dialects of Swedish, including a velarized labiodental variant [fˣ] (Lindblad 1980; Ladefoged and Maddieson 1996).
27 See Alderete and Frisch (2007) and Bye (2011) for overviews and general discussion with reference to further literature.

refer to a 'functional motivation... in the difficulty of processing words containing repeated segments during speech production'.
We believe that it is worth re-examining the motor planning account of nonlocal
dissimilation. As background we begin by presenting four typical dissimilatory sound
changes. The first is a less celebrated case of the most famous example of dissimila-
tion, Grassmann's Law in Indo-European. This term refers to independent changes
(in Greek and Sanskrit) whereby the first of two nonadjacent aspirated stops was
deaspirated. It has been suggested that the same change may also have happened in
the prehistory of Latin; examples are shown in (18).
(18) Grassmann's Law in Latin (Weiss 2010: 156)
a. *bʰardʰa > *bardʰa (> barba 'beard'; cf. OCS brada, English beard)
b. *gʰladʰros > *gladʰros (> glaber 'smooth'; cf. German, Yiddish glatt)
The crucial change in (18) was prehistoric: *bʰ > *b in (18a), *gʰ > *g in (18b). The change is shown by the eventual Latin outcomes, with initial b and g in (18a-b) respectively. Without dissimilation, regular Latin sound changes would have yielded initial *bʰ > f in (18a), i.e. farba, and probably initial *gʰl > l in (18b), i.e. laber (just as *gʰr > r in rāvus 'gray'; cf. English gray).
Another laryngeal feature is targeted by a Secwepemctsin (Shuswap) change that
has been called a Salish Grassmann's Law (Thompson and Thompson 1985). Dissim-
ilatory deglottalization is shown in (19) with diachronic and synchronic examples.
(19) Secwepemctsin dissimilatory deglottalization
a. Diachronic examples (Thompson and Thompson 1985)
PROTO-INTERIOR-SALISH  SECWEPEMCTSIN
*kʔipʔ 'pinch'  kipʔ-m
*qwʔatsʔ 'full'  qwetsʔ-t
*tsʔekwʔ 'shine'  tsəkw-tsəkwʔ-t
b. Synchronic examples: Reduplication and infixation (Kuipers 1974)
NO DISSIMILATION TRIGGER  DISSIMILATION
kʔij 'be cold, freeze'  t-kj-kʔij-t 'chilled'
qʔix-t 'strong'  qə-qi-qʔx-t 'stronger'
qʔiw-t 'to break'  qw-qʔ-iw 'brittle'
stʔekw 'to show off'  ste-tʔ-kw 'smarty'
kwʔinx 'how many?'  kwi-kwʔ-nx 'how many (animals)?'
Finally, in (20-21) we illustrate typical sonorant dissimilations. Liquids are the
most common segment type to be affected by nonlocal dissimilation, as in Sundanese,
where an infix /-ar-/ surfaces as /-al-/ when it is followed somewhere in the word by

r; examples are in (20).28 Dissimilatory changes involving l and r in morphology are crosslinguistically common.
(20) Liquid dissimilation in Sundanese (Western Malayo-Polynesian; Cohn 1992)
BASE  PLURAL
A poho  p<ar>oho  'forget'
gɨlis  g<ar>ɨlis  'beautiful'
ayim  <ar>ayim  'patient'
di-visualisasi-kɨn  di-v<ar>isualisasi-kɨn  'visualized'
B dahar  d<al>ahar  'eat'
parceka  p<al>arceka  'handsome'
motret  m<al>otret  'take a picture'
In (21), we see cases in Italian where original n... n sequences dissimilated to l...n.
The first of two nasals lost its nasality and became another coronal sonorant.

(21) Lexically irregular nasal dissimilations in Italian
SOURCE  ITALIAN
Celtic Bononia  Bologna 'Bologna'
Greek Panormos  Palermo 'Palermo'
Latin venenum  veleno 'poison'
Latin unicornis  licorno 'unicorn'
The examples in (18-21) are typical of the featural and positional typology of dissimilation. In featural typology, typical dissimilation targets include secondary features such as aspiration as in (18), glottalization as in (19), labialization, and palatalization, as well as some sonorant features, including nasality as in (21) and most especially liquid features as in (20). This profile has been interpreted in two
main ways. First, Ohala (1981:193) writes that 'only those consonantal features should
participate in dissimilation which have important perceptual cues spreading onto
adjacent segments'. (This view has the potential problem that in cases like (21), it is
necessary to assume that velum lowering in unicornis spanned an intervening [k].)
Second, Carnoy (1918) suggests that dissimilation typically targets features that are
either articulatorily complex (he mentions the trill [r]) or 'more fugacious and more
inconspicuous' (including aspiration and glottalization).
In any case, we are struck by parallels, having to do with liquids, between the
featural profiles of dissimilation and of motor-planning speech errors. The speech
errors in (22) are unambiguously dissimilatory in nature; in (22a-c) the output of
liquid dissimilation is also a liquid, while the output in (22d) is a nasal.

28 An additional pattern is that with an l-initial base, the infix undergoes assimilation and surfaces as /-al-/: lɨtik 'little' → plural l<al>ɨtik.

(22) Liquid dissimilations in speech errors


a. Das ist doch ungrau... unglaublich 'that's incredible' (Meringer 1908: 93)
b. Eine Partei muss auch in den verschiedenen Gle... Gremien die Fragen der
Zeit diskutieren. (Berg 1998: 182-3)

'A political party also has to discuss the current issues in its various committees.'
c. the blide of Frankenstein (for the bride of Frankenstein) (Fromkin 2000:
no. 1711)
d. zwei Fliegen mit einer Knapp... Klappe schlagen (Berg 1998: 178)

'to kill two birds with one stone'
These two speech error outcomes correspond to the two most common diachronic
liquid dissimilation patterns.
The examples in (23) are ambiguous because l and r are both present in the immediate context in each example, so the errors might in principle be assimilatory; but in each case positional parallelism (la/le in (23a), gr/gr in (23b), bl/fl in (23c)) suggests that dissimilation is a likelier interpretation.
(23) Liquid dissimilations in speech errors: Ambiguous examples of planning inhi-
bition
a. Kravierlehrer (for Klavierlehrer 'piano teacher') (Meringer and Mayer 1895: 96)
b. ein grosser Gleu... Greuel 'a great abomination' (Meringer 1908: 93)
c. übergebri... gebliebenes Fleisch 'left-over meat' (Meringer 1908: 93)
Dissimilatory speech errors are admittedly uncommon; those involving liquids are
less than ten per cent as frequent as nonlocal assimilatory errors involving liquids.29
But they are well enough documented, as illustrated in (22-23), that a theory of speech
production should take account of them. And if dissimilatory speech errors are a clear
pattern, they might in some cases lead to sound change.30
We next consider positional typology: In what positions are segments the targets
of dissimilation? A traditional generalization is that nonlocal consonant dissimila-
tion is more often anticipatory, as in (18-21), than perseverative. This view is not
supported in recent work (Bye 2011), but it is worth noting that the latter does
not count lexically irregular cases or distinguish surface-true patterns from affixal
alternations. In any case, based on a range of (mostly Indo-European) examples,

29 The examples in (22-23) include the complete dossier of reasonably persuasive cases in the published corpora of Meringer (Meringer and Mayer 1895; Meringer 1908) and Fromkin (2000).
30 We do not know of speech error studies for languages with phonological glottalization, aspiration, etc. The motor-planning account of dissimilation predicts the existence in such languages of dissimilatory speech errors involving those features.

Grammont (1895, 1939) argues that dissimilation tends to target segments in unac-
cented positions and in 'weaker' syllable positions (e.g. onsets rather than codas).
The idea that typical targets of dissimilation are 'weak' positions and perhaps 'weak'
features (secondary features such as aspiration) is consistent with a motor-planning
approach. In interactions between nearby segments with identical features, motor
plan inhibition (section 3.3.1) eliminates repetition by preserving the more salient
(anticipated or positionally 'stronger') segment.31

3.5 Systemic constraints on phonologization


As discussed in sections 3.3-3.4, biases in speech production and perception pro-
vide the starting point in sound change, but they do not exhaust the processes of
phonologization. Rather, as noted in section 3.1, they generate a pool of structured
variation from which phonological patterns emerge; other processes too contribute to
the outcome. In this section we identify some additional elements of phonologization
that a full account will need to treat in detail, and we comment on possible associated
bias factors.

3.5.1 Enhancement
The initial stages of sound changes that emerge from the bias factors discussed in
sections 3.3-3.4 are either categorical or incremental. They are categorical if they are
already phonetically complete in their initial stage. For example, if motor planning
errors are a source of sibilant harmony, the erroneous pronunciation of [s] may
already have been a fully changed [ʃ]. Our expectation is that changes rooted in motor
planning and perceptual parsing are often categorical.
By contrast, in changes emerging from aerodynamic constraints and gestural
mechanics, the structured variation found in the initial stage of phonologization may
involve pronunciation variants that differ considerably from the eventual outcome.
For example, the first stages of adjacent-vowel coalescence might involve only partial
gestural overlap, with complete coalescence resulting only after several generations or
longer. Similarly, there is apparently a range of intermediate pronunciations between
[Vwx] and [Vxʷ], or between the latter and [f]. We use the term ENHANCEMENT
to refer to processes by which a relatively small initial bias effect is amplified to its
eventual categorical result.32 This in turn has two distinct profiles.

31 Tilsen (this volume) proposes a connection between motor-planning inhibition and dissimilatory effects, grounded in the following experimental observations (from areas outside language): 'when movement A to one target location is prepared in the context of planning a distractor movement B to a sufficiently different target location, then the executed trajectory of movement A deviates away from the target of movement B... In addition, more salient distractors induce greater deviations.'
32 This use of the term is not what Stevens and Keyser (1989) meant when they wrote about featural enhancement, but there are parallels. Some phonetic property is made more recoverable by changes in pronunciation that highlight the phonetic essence of the sound. Stevens and Keyser noted that featural enhancement may be language-specific; this is consistent with phonetic enhancement in sound change.

First, in what we call ARTICULATORY ENHANCEMENT, the magnitude of an existing feature is enhanced. For instance, in a typical umlaut change targeting /uCi/, the shift
from a partly fronted [u] (the result of gestural blending) to a fully fronted [y] is a
shift of gestural magnitude. Numerous changes driven by gestural mechanics can be
described in comparable terms. In some such cases a secondary feature may become
prominent. For example, the distinction between long and short vowels in English was
enhanced by the promotion of redundant vowel quality differences between long and
short vowels, yielding the modern tense/lax distinction, cued by both vowel length and vowel quality (e.g. [i:] vs. [ɪ]). In a sense, this is a perceptual phenomenon; a
contrast is perceptually strengthened by exaggerating a redundant cue (Stevens and
Keyser 1989; Whalen 1990; Kingston and Diehl 1994). But articulatory enhancement
has not introduced any new phonetic cues, and thus has no place in a list of phonetic
bias factors.
In some cases a feature is temporally realigned, yielding greater perceptual dis-
tinctness, rather than having its magnitude as such enhanced. For example, in the
development of English [f] from earlier round vowels followed by [x], a crucial step was evidently a shift such as [wx] > [xʷ], in which labialization is realigned with the end of the fricative.33 Presumably this timing change served to enhance the perceptual distinctness of labialization. Similarly, in the development of /ai/ diphthong centralization before voiceless consonants ('Canadian Raising'), Moreton and Thomas (2007) argue from age-graded phonetic data that the effect first emerged in the offglide and subsequently spread to and was enhanced in the nucleus. As they schematize the shift in tight vs. tied, a [tʰait] vs. [tʰaid] difference evolved into [tʰʌit] vs. [tʰaid].
Another more dramatic case of temporal realignment is described in Bessell's
(1998) study of anticipatory consonant-vowel harmony in Interior Salish. In
Sncicuumscn (Coeur d'Alne Salish), this process targets vowels that are fol-
lowed in the word by so-called FAUCALS: uvulars, pharyngeals, or /r/. Examples are
given in (24).
(24) Sncicuumscn anticipatory harmony (Reichard 1938; Bessell 1998)
NO HARMONY TRIGGER  HARMONY
[tsij-t] 'it is long'  [tsej-cəqw] 'he is tall'
[settj-nts] 'he twisted it'  [nʔ-sattj-ʔqs-n] 'crank (on a car)'
Crucially, as Bessell demonstrates, this process cannot be analyzed as phonetic spread-
ing, because intervening consonants are demonstrably unaffected phonetically. She
33
Cf. Silvermans (ioo6b) account of a Trique sound change whereby velars became labialized after [u]:
*uk > [ukw], *ug > [ugw] (e.g. [nukwah] 'strong', [rugwi] 'peach'); non-velar consonants were unaffected
(e.g. [uta] 'to gather', [duna] 'to leave something'). Silverman suggests that labialization emerged because
those velar tokens that happened to be slightly labialized would have been more likely to be categorized
correctly by listeners, and in this fashion labialized velars gradually evolved.

suggests that this pattern (which amounts to long-distance agreement) arose directly
from the purely local vowel-consonant coarticulation found in closely related Interior
Salish languages. She writes that the root cause of the shift is 'that faucal features
are maximally compatible with vocalic rather than consonantal structure... [T]he
phonologisation of local coarticulation [in related languages] lays the ground for
a more general assignment of faucal features to vocalic structure, so that faucal
features appear on any preceding vowel' (Bessell 1998: 30). Note that in this as in
other cases of articulatory enhancement, the basic direction of change is determined
by articulatory factors; the bias emerges from gestural mechanics, not perceptual
enhancement.
Second, in what we call AUDITORY ENHANCEMENT, a new articulatory feature is
introduced with the effect of enhancing the auditory distinctness of a contrast. A
classic example is lip rounding on back vowels, which positions vowels in the acoustic
vowel space in a maximally dispersed way (Liljencrants and Lindblom 1972), thus
enhancing the overall perceptual contrast in the vowel system. Other redundant sec-
ondary features that can be analyzed in a similar way include the labialization of [ʃ].
In our discussions of individual sound changes above, we have also identified several
developments, listed in (25), that may be attributable to auditory enhancement.
(25) Possible examples of sound change due to auditory enhancement
a. Prenasalization in voiced stops enhances voicing (section 3.3.2)
b. [θ] > [θʷ] enhances [flat] (section 3.4.4)
c. [xɸ] > [f] enhances continuancy (section 3.4.4)
The emergence of auditory enhancement could be envisioned in at least two ways.
One possibility is that talkers possess linguistic knowledge of acoustic targets, and that
new articulatory features are sometimes introduced in speech when a contrast is insuf-
ficiently salient. Such new features then spread like any other linguistic innovations.
Another possibility is that features that emerge through auditory enhancement are
occasionally present in natural speech, simply by chance along with other phonetic
variants, but that because they enhance a contrast they have a privileged status in
listeners' exemplar memories, and are then more frequently propagated. We cannot
judge which account is likelier. But whether the speaker-oriented or the listener-
oriented approach ultimately proves more satisfactory, it is worth noting that auditory
enhancement, unlike articulatory enhancement, does define a set of bias factors for
linguistic change: new features may arise that auditorily enhance existing contrasts.
This is a bias factor, but unlike those described in sections 3.3-3.4, it is system-
dependent.34
34 Note that enhancement need not be regarded as teleological. For example, Blevins and Wedel's (2009) account of anti-homophony effects may generate articulatory (and perhaps even auditory) enhancement effects as an automatic by-product of phonetic categorization.
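The listener-oriented possibility lends itself to a toy simulation (ours, purely illustrative; the function and parameter values are invented and nothing here is a model proposed in this chapter): tokens carrying a contrast-enhancing feature are categorized correctly slightly more often, so they are slightly more likely to be stored as exemplars and reproduced, and the feature's frequency drifts upward across generations without any speaker-side teleology.

import random

def simulate(generations=200, tokens=500, start_rate=0.05, base=0.7, bonus=0.2):
    # start_rate: initial fraction of productions carrying the enhancing feature;
    # base: storage probability for plain tokens; base + bonus: for enhanced tokens.
    rate = start_rate
    for _ in range(generations):
        stored = []
        for _ in range(tokens):
            enhanced = random.random() < rate
            # Enhanced tokens are categorized correctly a bit more often,
            # so they enter the listener's exemplar store a bit more often.
            if random.random() < base + (bonus if enhanced else 0.0):
                stored.append(enhanced)
        # The next generation's production rate mirrors the exemplar store.
        rate = sum(stored) / len(stored) if stored else rate
    return rate

random.seed(1)
print(f"final rate of enhanced variants: {simulate():.2f}")  # drifts toward 1.0

On these assumptions the enhanced variant spreads simply because storage is biased, which is one way of cashing out the idea that such variants have a privileged status in listeners' exemplar memories.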

3.5.2 Selectional bias


Since phonologization involves the transformation of a phonetic pattern into a cat-
egorical speech norm, part of a language's phonological system, it is possible that
some selectional constraints intervene in this transformation. This could happen in
several ways. For example, given a language whose pool of variation includes two
equally robust patterns, both corresponding to known sound changes, the phonolog-
ical structure of the language might make one pattern likelier to be selected. So, in a
language with intervocalic lenition of some segment types, perhaps it is likelier that
lenition of other segment types will be phonologized. Arguments along these general
lines have been made by Martinet (1955): the structure of a language favors certain
selections.
Another possible profile for selectional bias is that the phonologization of a pho-
netic pattern may be disfavored by the structure of a language or by universal proper-
ties of language, even if the precursor pattern is phonetically robust. This position is
defended by Kiparsky (2006), who argues that final obstruent voicing never emerges
as a sound change despite what he contends is the possibility that natural changes
could conspire to yield a suitable phonetic precursor; his explanation is that there
is a universal constraint against final obstruent voicing. Wilson (2006) also suggests
that learning biases favor phonetically natural patterns. Similarly, Moreton (2008a, 2010) argues that comparably robust types of phonetic pattern are phonologized at
different rates. For instance, phonologized dependencies between adjacent-syllable
vowel heights are common, while interactions between vowel height and consonant
voicing are rarely phonologized despite being phonetically robust. Moreton (2008a)
attributes this to a learning constraint: single-feature dependencies are easier to learn.
Concerning these possibilities, we should emphasize two points. The first is that
if selectional constraints exist, they constitute a second-order bias type, operating
on patterns that are already structured along the lines we have discussed above. We
have focused here on first-order bias types because we think it is helpful to sort these
out first. Our approach thus differs from that of Kiparsky (2006), who acknowledges
that selection (constrained by universal properties of language) operates on a pool of
phonetic variation, but does not emphasize that phonetic variation is already struc-
tured. One of the key questions in phonological theory concerns the relative burden
of selectional bias, as opposed to production and perception biases, in determining
patterns of phonological typology.
The second point is that many aspects of selectional bias remain unclear. For exam-
ple, it seems plausible that a language's phonological system could make some patterns
likelier to be selected in phonologization, and it is easy to point to examples that
can be interpreted in such terms after the fact. It is harder to show that this is what
happened, and we think it is fair to say that the jury is still out. It is likewise obvious
in principle that any universal constraints on grammar in general must also constrain

selection in particular, and that the discovery of selectional bias patterns with no other
explanation may be evidence for universal constraints. But the details are debated; on
final voicing compare Yu (2004), Blevins (2006a, b), and Kiparsky (2006). Finally,
Moreton's suggestion of general constraints on learning seems reasonable
a priori, but requires more investigation to be securely established as a source of
linguistic asymmetries (cf. Yu 2011).

3.5.3 Lexical and morphological effects


A final system-dependent aspect of phonologization is worthy of brief discussion (we
have little to add to existing literature) because it concerns the question of conditioning
in (1b) on p. 51 above. This question has been a source of controversy since the
neogrammarian era: what role do a language's lexical and morphological patterns play
in sound change?
Concerning morphology the question is whether the neogrammarians and many
later historical linguists are right to claim that when morphological patterns seem to
have played a role in sound change, what actually happened is that a later (indepen-
dent) analogical change has interfered with its effects. It is often possible to reana-
lyze supposed cases of morphologically conditioned sound change along these lines.
Nonetheless, the fact remains that apparently 'analogical' effects can be discerned
before a phonological innovation has become categorical. First described by Bloom-
field (1933: 364-6), who called it SUBPHONEMIC ANALOGY, this phenomenon has
been studied by Trager (1940), Steriade (2000), and others in the recent laboratory
phonology literature. At this point, we do not know in general how early in their life-
cycle, and under what circumstances, morphological patterning plays a role in strictly
phonological changes.
Concerning a language's lexical patterns, the main question has to do with the role
of word frequency in sound change.35 In the experimental literature, lexical effects
on pronunciation variation are well established. For example, less frequent words
tend to be pronounced with greater duration or greater articulatory effort than their
more frequent homophones (Guion 1995); see Gahl (2008) and Bell et al. (2009)
with references to other earlier work describing a range of leniting effects. This leads
to an expectation that leniting sound changes should show frequency conditioning
across a range of languages and historical contexts, but this expectation is not yet
well supported in the literature. To be sure, cases of the expected type have been
35
Another question concerns homophony avoidance; it has been suggested that a sound change is less
likely if it neutralizes a contrast that distinguishes relatively many words (cf. Jakobson 1931; Martinet 1955
vs. King 1967), or that a sound change can be blocked in words where it would yield homophony (Gessner
and Hansson 2004; Blevins 2005; Blevins and Wedel 2009). Research in this area is intriguing but not yet
definitive. Hume's (2004b: 229) idea that 'more practiced articulatory routines' may influence sound change
raises yet another possibility; she suggests that language-specific phonotactic frequencies may influence
the direction of changes such as metathesis. This idea is attractive, though its overall role in the typology of
sound changes remains to be assessed.

described in changes such as English vowel reduction (Fidelholtz 1975) and flapping
(Rhodes 1992), among others summarized by Bybee (2001, 2002) and Phillips (2006),
but three problems remain. First, many well-studied leniting changes show no fre-
quency effects; examples include Latin rhotacism, Verner's Law, and the degemination
of Latin geminate stops in languages like Spanish.36 If word frequency effects are
implicated in sound changes from their earliest stages, the difference between changes
where these effects vanish and changes where they persist is unexplained. Second,
the nature of the effects identified experimentally (a gradient relationship between
frequency and duration) and in studies of phonological patterns (where words may
fall into two frequency-determined groups, only one of which shows a change) are not
precisely the same, and the relationship between them is not clear. And third, more
than one sociolinguistic study has found, echoing the classical view of Bloomfield
(1933: 352-362), that ongoing changes tend to exhibit lexical irregularities only late
in their development, after they have become sociolinguistically salient, whereas 'the
initial stages of a change' are regular (Labov 1994: 542-3; cf. Labov 1981; Harris 1985).
In our judgment not enough is understood yet about the emergence of frequency
effects in sound change to build a coherent picture out of the contradictory facts.
In any case, the role played by lexical and morphological patterns in grammar and
usage is independent of the role played by bias factors for asymmetric sound change.
Important as the question is, it falls outside the scope of this chapter.

3.6 A model of actuation


Weinreich et al.'s (1968) actuation question (1c), on p. 51, concerns the historically
contingent appearance of a sound change in a particular place at a particular time.
The phonetic and systemic bias factors identified above represent preconditions for
change, and determine the direction of change if it does occur, but they do not explain
why a change emerges in one community rather than another, or in one decade rather
than another. What causes actuation?
Among the elements of actuation it seems necessary to distinguish two phenomena.
First, given that bias factors are in principle present throughout a language commu-
nity, in the speech of one or more individuals there must be a deviation from the norm
for some reason. Whatever the phonetic precursor(s) of a change, someone must first
use it (or them) more often or to a greater degree than is the community norm. Second,
based on this, some other individuals must then modify their speech, or the nascent
change will not endure. Milroy and Milroy (1985) refer to the two types of individuals
as INNOVATORS and EARLY ADOPTERS, identifying social differences between them.
36
Latin rhotacism comprised an intervocalic *s > *z change followed by a *z > r change. Verner's Law was
a Germanic process of intervocalic fricative voicing (also conditioned by accent); notably, Verner (1877:
102-3) himself evaluated and rejected a frequency-based explanation of the exceptions to Grimm's Law
that motivated his discovery.

Of course it is hard to observe innovators in the wild, but we can still ask the crucial
question: What causes them to deviate from the norm? Why do some individuals
speak differently from all the people around them?
To this first part of the actuation question there are several possible answers.37
One answer, following Yu (2010a, this volume), appeals to individual differences in
perceptual compensation. As discussed in section 3.3.4, perceptual compensation
ordinarily leads listeners to ignore coarticulation effects. In an exemplar model of
linguistic knowledge, this would have the effect of focusing an exemplar cloud more
closely on its phonological target. Individuals with systematically attenuated per-
ceptual compensation would therefore have more divergent exemplars in memory,
mirroring the bias patterns discussed in section 3.4, and might then produce such
variants more often.
A second possible answer would appeal to individual differences in linguistic devel-
opment and experience. For example, language learners may develop different articu-
latory strategies for realizing the 'same' acoustic target. It may be that two such strate-
gies yield perceptibly different outcomes in some contexts, such as coarticulation; this
could be the point of entry of a sound change.38 Or perhaps small random differences
in experience – differences in what are sometimes called 'primary linguistic data' –
yield differences in the phonetic systems that learners develop.39
A third possible answer, which we explore here, appeals to differences in sociolin-
guistic awareness. The basic idea is that individuals (or groups) may differ in how they
assign social meaning to linguistic differences. We speculate that some individuals
in a language community, but crucially not others, may attend to linguistic variation
within their own subgroup but not to variation in other subgroups. If such individuals
become aware of a particular phonetic variant in their subgroup, but are unaware that
it is also present in other subgroups, they may interpret the variant as a group identity
marker, and they may then use it more often. One social parameter that may give
rise to such a dynamic is power; Galinsky et al. (2006: 1071) suggest that power may
'inhibit the ability to pay attention to and comprehend others' emotional states'. To this
we might add a converse linguistic principle: lack of power sharpens one's attention to
linguistic variation (Dimov et al. 2012). What follows is meant as a proof of concept.

37
The truth may involve a combination of answers. Or perhaps there is no answer; Labov (2010: 90-
91) compares mass extinctions caused by a meteor: there is nothing biologically interesting about the causes
of a meteor collision. But for linguistic innovation, we can at least hope to find some underlying linguistic
or psychological causes.
38
Individual phonetic differences without sociolinguistic salience have been identified in English vowel
production (Johnson et al. 1993b), rhotic production (Westbury et al. 1998), and coarticulation patterns
(Mielke et al. 2010); other such differences undoubtedly exist.
39
This view of how change is triggered is common in the historical syntax literature (Lightfoot 1999);
cf. Blevins's (2006a: 126) comment that sound change of the type she calls CHOICE 'can depend on simple
frequency changes of variants across generations, as well as differential weightings of variants based on
social factors...'.

We are aware that it makes sociolinguistic assumptions that remain to be tested; we
hope that this will stimulate future discussion of the details of linguistic innovation.
The approach we take, simulating the behavior of a collection of autonomous
agents, has been used by previous researchers studying language change (Klein 1966;
Pierrehumbert 2001a; Culicover and Nowak 2003; Galantucci 2005; Wedel 2006).
Common to these and other models of phonological systems is the assumption that
speakers are generally faithful in their reproductions of the phonetic forms of lan-
guage, perhaps with the involvement of a phonetic retrenchment mechanism (Pierre-
humbert 2001a); most also assume phonetic bias factors like those discussed above.40
In addition to these model parameters, the simulations presented below add social
variation so that social identity is a filter on variation.41
The bias factors discussed in sections 3.3-3.4 are sources of variance in linguistic
performance. Ordinarily, in the course of speaking and hearing, the phonetic distor-
tions introduced by these factors (whether in speech production or perception) do
not result in sound change. This is because listeners usually disregard the phonetic
variants introduced by bias factors. For example, as a result of categorical perception,
listeners are less likely to notice small phonetic variations within phonetic regions
associated with a phonetic category, while the same amount of variation is much
more noticeable for sounds near a category boundary (Liberman et al. 1957; Kuhl
1991). Perceptual compensation for coarticulation is also known to 'remove' phonetic
variation due to coarticulation; for example, nasalized vowels sound more nasal in
isolation than when immediately followed by a nasal segment (Beddor et al. 2001).
Similarly, listeners are able to detect mispronunciations and other speech errors and
may disregard them. Even simple misperceptions may be disregarded by listeners
when the speaker's intent is discernible from context, as in the similarity of can or can't
in normal conversational English. 'Corrected' misperceptions, like speech errors, may
be disregarded by listeners.
Given all this, if the usual pattern is for the variants introduced by bias factors to
be filtered out by perceptual processing, how can bias factors play a role in initiating
sound change? We suggest that at one level of representation bias variants are not
filtered out, and that they are available for reanalysis in sound change. We will further
suggest that social factors interact with bias variation in ways that lead to sound
change. Our theory linking bias factors to sound change is based on the assumption
that linguistic categories are represented by clouds of exemplars, and that speech
production is based on such constellations of remembered instances.

40
Within the framework of Optimality Theory the two assumptions correspond generally to faithful-
ness and markedness constraints (Prince and Smolensky 2004).
41
Another mechanism that has been utilized recently in multi-agent modeling of sound change is the
'probabilistic enhancement' proposed by Kirby (this volume).

The rest of section 3.6 has three parts, first establishing some parameters for the
multi-agent modeling of sound change and then presenting a set of simulations. In
subsection 3.6.1, we review exemplar models of linguistic memory and relate them to
the study of sound change. In subsection 3.6.2, we review research on imitation and a
variety of factors that influence it. Finally, subsection 3.6.3 presents the simulations.

3.6.1 Exemplar memory


Exemplar-based models of phonology (Johnson 1997a, 2006; Pierrehumbert 2001a)
are based on the idea that the cognitive representation of a phonological object con-
sists of all experienced instances of that object. This view of phonology is compatible
with traditional theories of sound change that have referred to similar notions in
explaining articulatory drift. Thus, already Paul (1880 [1920: 49]) wrote that sound
change is mediated by a set of 'representations in memory':

Even after the physical excitement [the direct experience of articulation and perception] has
disappeared, an enduring psychological effect remains, representations in memory, which are
of the greatest importance for sound change. For it is these alone that connect the intrinsically
separate physiological processes and bring about a causal relation between earlier and later
production of the same utterance.

In his view, random variation in the cloud of representations yields gradual articula-
tory drift. Similarly, Hockett (1965: 201) wrote about a density distribution in acoustic
space measured over years:

In the long run (measured in years), the time-dependent vector that is the speech signal for
a given speaker – both for what he himself says and for what he hears from others – spends
more time in some regions of acoustic space than in others. This yields a density distribution
defined for all points of the space. The density distribution is also time-dependent, since the
speech signal keeps moving; we may also imagine a decay effect whereby the importance for
the density distribution of the position of the speech signal at a given time decreases slowly as
that time recedes further and further into the past.

The key aspect of exemplar memory models for sound change is that, in such
models, the representation of a category includes variants. This is important because
the cloud of exemplars may gradually shift as new variants are introduced. Exemplar
theory provides an explicit model of how variability maps to linguistic categorization,
and for sound change this model is important because it permits the accumulation
of phonetically biased clouds of exemplars that serve as a basis for sound change.
Exemplars retain fine phonetic details of particular instances of speech, so phonetic
drift or sudden phonological reanalysis are both possible (as will be discussed in more
detail below). Other models of the mapping between phonetic detail and linguistic
categorization assume that phonetic detail is discarded during language use, and

therefore these theories offer no explanation of how phonetic detail comes to play
a role in sound change.
There is a central tension in exemplar theory, however, which relates directly to
sound change. We mentioned above several mechanisms (categorical perception,
compensation for coarticulation, and mispronunciation detection) that lead listeners
to disregard exemplars. More generally, it has become evident that not all exemplars
have the same impact on speech perception or production. One particularly obvious
point concerns differences between the phonetic space for listening and the phonetic
space for speaking. Listeners may be perfectly competent in understanding speech
produced in accents or dialects that they cannot themselves produce. For example,
we are able to understand our young California students at Berkeley perfectly well,
but neither of us can produce a plausible imitation of this variety of American English.
The space of familiar exemplars utilized for speech perception is thus, evidently, larger
and more diverse than the space of exemplars utilized for speech production. When
we say, as above, that specific exemplars may be disregarded by listeners, this can be
interpreted to mean that the variants introduced by bias factors are not added to the
set of variants used in speech production.
Building on this idea that speech production and perception are based on different
sets of phonetic exemplars, following Johnson (1997a) we posit that the perceptual
phonetic space is populated with word-size exemplars for auditory word recogni-
tion. We follow Wheeldon and Levelt (1995) and Browman and Goldstein (1990a)
in assuming that the speech production phonetic space is populated with smaller
(segmental or syllabic) exemplars used in calculating speech motor plans. These
articulatory exemplars are also recruited in certain speech perception tasks, and in
imitation.
Evidence for this dual-representation model comes from a number of different
areas of research. For example, in neurophonetics Blumstein et al. (1977) noted
the dissociation of segment perception from word recognition in certain forms of
aphasia. Hickok and Poeppel (2004) fleshed out a theory of speech reception in
which two streams of processing may be active. A DORSAL stream involves the
speech motor system in perception (Liberman et al. 1967; Liberman and Mattingly
1985), and is engaged in certain segment-focussed listening tasks. More commonly
in speech communication, speech reception is accomplished by a VENTRAL stream
of processing that involves more direct links between auditory and semantic areas of
representation.
Speech errors and perceptual errors differ qualitatively as a dual representation
model would predict. In the most common type of (sound-based) slips of the tongue,
segments in the speech plan interact with each other, to transpose or blend with the
main factors being the articulatory similarity and structural position similarity of the
interacting segments. For example, the [f] and [t] in the speech error delayed auditory
feedback > '... audif- auditory ...' share voicelessness and are in the onsets of adjacent

stressed syllables. Slips of the ear, on the other hand, do not usually involve interaction
of segments in an utterance, but are much more sensitive to whole-word similarity
and availability of an alternative lexical parse (Bond 1999). For example, He works in
an herb and spice shop was misheard as He works at an urban spice shop and at the
parasession was misheard as at the Paris session.
Another source of support for a dual-representation model comes from the study of
phonetic variation in conversational speech (Pitt and Johnson 2003). Johnson (2004)
studied phonetic variation in conversational speech and found that segment and
syllable deletion is extremely common. He concluded that auditory word recognition
models that rely on a prelexical segment processing stage would not actually be able to
perform accurate (human-like) word recognition and that whole-word matching is a
better approach to deal with the massive phonetic variation present in conversational
speech.
Proponents of the Motor theory of speech perception (Liberman et al. 1967)
argued for a special SPEECH MODE of segment perception. We can now hypothe-
size that in experiments that require listeners to pay careful attention to phonetic
segments, this mode will dominate (Burton et al. 2000). But when listeners are
mainly attuned to the meaning of utterances, the speech mode of listening will not
be engaged (as much) and a LANGUAGE MODE of word perception will dominate.
Lindblom et al. (1995) refer to the contrast as the 'how'-mode vs. the 'what'-mode of
perception.
A dual-representation model of phonology is also consistent with several strands
of thinking in psycholinguistics. For example, Cutler and Norris's (1979) dual-route
model of phoneme monitoring (as implemented in Norris 1994) holds that phonemes
may be detected by a phonetic route, in a speech mode of listening, or via a lexical
route where the presence of the phoneme is deduced from the fact that a word
containing the phoneme has just been detected. They identified a number of factors
that influence which of these two routes will be fastest. Two modes of perception
were also implemented in Klatt's (1979) model of speech perception. Ordinary word
recognition in his approach was done using a whole-word matching system that he
called LAFS (lexical access from spectra), and new words were incorporated into the
lexicon using a segmental spell-out system that he called SCRIBER. This approach
recognizes that reception of speech may call on either of these systems (or perhaps
both of them in a race).
Dual representation is important in our model of sound change because articu-
latory targets tend to be resistant to change, and in particular sound change is not
dominated by pronunciations found in conversational speech, as a naive exemplar
model might predict given the predominance of 'massive reduction' (Johnson 2004)
in conversational speech. This resistance to change is consistent with the idea that the
speech mode of perception (and the consequent activation of articulatory represen-
tations) is somewhat rare in most speech communication.

3.6.2 Imitation
Laboratory studies of phonetic accommodation have shown that speakers adjust
their speech on the basis of recent phonetic experience, i.e., that phonetic targets
are sensitive to variation. In phonetic accommodation studies, subjects simply repeat
words that they hear and are seen to adopt phonetic characteristics of words pre-
sented to them (Babel 2009 on vowel formant changes; Nielsen 2008 on consonant
aspiration changes). Speech motor plans are maintained by feedback, comparing
expected production with actual production, and evidently in phonetic accommo-
dation the expected production (the target) is computed on the basis of one's prior
speech exemplars, together with phonetic representations derived from hearing other
speakers.
The feedback tuning of speech motor control can also be seen in the laboratory in
studies of altered auditory feedback (Katseff et al. 2012). In altered feedback exper-
iments, the talker hears (in real time) re-synthesized copies of his/her speech with
the pitch (Jones and Munhall 2000), formants (Purcell and Munhall 2006; Houde and
Jordan 1998; Katseff et al. 2012), or fricative spectra (Shiller et al. 2009) altered. Talkers
respond by reversing the alterations introduced by the experimenter, even though
they don't notice that a change was introduced. In both phonetic accommodation
and altered auditory feedback studies, we see the operation of a phonetic mechanism
that may be responsible for sound change: a feedback control mechanism that incor-
porates phonetic exemplars that the speaker hears others produce, or in other words
a subconscious phonetic imitation mechanism.
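
To make this feedback mechanism concrete, the following minimal sketch (in Python) simulates a talker under altered auditory feedback. It is an illustration only, not a model drawn from the studies just cited: the formant values, shift size, and correction gain are hypothetical, and real talkers compensate only partially.

    # Minimal sketch of feedback-tuned motor control under altered feedback.
    # All parameter values are hypothetical.
    target = 500.0   # expected F1 (Hz) of the intended vowel
    shift = 100.0    # upward formant shift imposed by the experimenter
    gain = 0.2       # fraction of the perceived error corrected per trial

    motor = target   # current motor plan, initially matching the target
    for trial in range(30):
        heard = motor + shift    # the talker hears a shifted copy of the output
        error = heard - target   # mismatch between perceived and expected output
        motor -= gain * error    # adjust the plan to oppose the alteration

    print(round(motor))  # settles near 400 Hz: the alteration has been 'reversed'

If the term heard instead mixed in tokens produced by other talkers, the same update rule would implement accommodation: the motor plan would drift toward what has recently been perceived.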
Studies of phonetic accommodation and altered auditory feedback have found a
number of parameters that are relevant for a theory of imitation in sound change.
First, imitation is constrained by prior speaking experience. People do not imitate
perfectly and do not completely approximate their productions to those of others
(Pardo 2006; Babel 2009). Some of the inhibition is due to the speaker's own personal
phonetic range; Babel (2009) found that vowels with the most variation in a subject's
own speech showed the greatest accommodation. We speculate, though this has not
been tested, that the degree of match between voices may influence imitation.
Second, imitation is socially constrained. People do not automatically or uncon-
trollably imitate others, but are more likely to imitate someone they identify with
at some level (Bourhis and Giles 1977; Babel 2009). This has implications for sound
change because it indicates that the use of bias variants in speech production is socially
conditioned.
Third, imitation generalizes. Thus instances of long VOT influence speech in words
or segments not heard; for example, /p/ with long VOT produces long (imitative)
VOT in /k/ (Nielsen 2008). This finding has important implications for the regular-
ity of sound change. The 'speech mode' system that we propose, by virtue of using
segment-sized chunks, provides an account of the regularity of sound change (where
the receptive whole-word exemplar space would not). Interestingly, Nielsen's results

suggest that phonetic features, or gestural timing relations, may be represented in a
way that they can be imitated in different segmental contexts.
Fourth, imitation is constrained by feedback in both auditory and propriocep-
tive sensory domains (Katseff et al. 2012). This finding is important because it
helps define the range of phonetic imitation that is possible with 'self-exemplars' –
namely, that proprioceptive feedback is involved. The implication of this is that imi-
tation may be limited by sensory factors that are not immediately apparent to the
linguist.
In addition to these observations drawn from prior research on imitation in pho-
netic accommodation, there are two general properties of imitation that we assume
in our model of sound change. First, the only exemplars produced by others that have
an impact on imitation are those that are processed in the speech mode of perception.
Our dual-representation model entails that articulatory phonetic analysis of items
does not always take place, thus not all instances of heard speech contribute to the
pool of exemplars used in computing a motor plan.42
Finally, speech production targets are calculated from a population of phonetic
exemplars as a sort of weighted average where the 'activation' of each exemplar deter-
mines its weight in the calculation. Among the many factors that determine exemplar
activation the intended linguistic category obviously matters a great deal, and there
will also be residual activation from exemplars that have just been said (priming) and
exemplars activated by what you have just heard.
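
One simple way to state this computation is sketched below; the cloud size, activation values, and priming boost are all hypothetical, and serve only to make the weighted-average idea explicit.

    import numpy as np

    rng = np.random.default_rng(0)

    # One category's exemplar cloud: phonetic values in arbitrary units.
    values = rng.normal(0.0, 1.0, size=200)

    # Baseline activation for category members, plus residual activation
    # for exemplars just heard or said (priming); the boost is hypothetical.
    activation = np.ones_like(values)
    activation[-5:] += 3.0   # the five most recent exemplars are primed

    # The production target is the activation-weighted mean of the cloud.
    target = np.average(values, weights=activation)
    print(target)

On this formulation, anything that changes which exemplars are stored, or how strongly they are activated, changes the production target; the simulations below exploit exactly this property.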
It may be objected that imitation does not provide a link between bias factors and
sound change, because the phonetic accommodation mechanism must presume that
some speaker in the community has already undergone a sound change toward which
other speakers are 'drifting'. According to this objection, imitation is a mechanism
for the spread but not the actuation of sound change. This fails to take account of
two facts. First, listeners do not know whether the speech they are hearing is what
the speaker intended to say, or if it has been altered by a bias factor. The listener's
inclination to imitate applies regardless of whether other speakers intend to produce
changed variants or not. Second, listeners do not know whether they are hearing
what the speaker actually produced or a perceptually distorted variant of the speaker's
pronunciation. In this case, the listener may imitate a figment of her own imagination.
In either case, phonetic accommodation yields sound change, whether the target of
accommodation is the result of a production or perception 'error' or not.

3.6.3 Simulating sound change


We implemented the assumptions discussed above in three simulations. They are
in the spirit of Labov's (1994: 586-7) suggestion that 'misunderstood tokens may
42
Several researchers studying exemplar phonology have noted that word frequency effects are not as
strong as a single-representation exemplar model would predict: Morgan et al. (under review); Pierrehum-
bert (2001a).

FIGURE 3.2 Simulating Labov's (1994) conception of how 'misunderstanding' is involved in
sound change. The starting distribution graph shows histograms of vowel second formant (F2)
values of three vowels in a crowded low vowel space. The vowels overlap slightly because of
articulatory phonetic variability. The remaining panels show how the vowels shift in acoustic
space as we add heard exemplars to each vowel space. Each cycle involves sampling the space
1000 times, and then recalculating the mean vowel target for each vowel category. The model
has two assumptions: (1) F2 below 1000 Hz is unlikely, and (2) perceptually misidentified
tokens are not added to a category's exemplar cloud

never form part of the pool of tokens that are used', so that if a listener 'fail[s] to
comprehend [a] word and the sentence it contains... this token will not contribute to
the mean value' of the target segment.43 According to this view, perceptual confusion
may result in conservation of a boundary between confusable phonemes, by limiting
the exemplars of adjacent categories to only those that are correctly identified. The
results of the simulation, shown in Figure 3.2, illustrate this. We created hypothetical
vowel formant distributions that overlapped slightly and took a random sample of
one thousand tokens from each distribution. Each vowel token was classified as an

43
Simulations by Pierrehumbert (2001a) and Wedel (2006) echo in various ways the simulations pre-
sented here; see also Kirby (this volume). Like many authors, Labov assumes that the mean value of a cloud
of exemplars is a rough indicator of a vowel target. This view may not be accurate (Pierrehumbert 2001a),
but serves as a viable simplifying assumption in our model.

example of one of the three vowel categories based on its distance to the category
centers. The category centers were then recomputed, with the misrecognized vowel
tokens removed, and a new random sample of one thousand tokens was then drawn
from each vowel category. In order to make the simulation more realistic we limited
the possible vowel space and started the simulation with the back vowel (lowest F2
value) located at the back edge of the space. This essentially fixed it in place with
a mean of about 1200 Hz. As the figure indicates, after several cycles of selective
exclusion of exemplars in the vowel categories, the category centers of the front vowel
and the mid vowel shift so that they no longer overlap. This simulation illustrates a
mechanism in speech perception that results in vowel dispersion (Liljencrants and
Lindblom 1972).
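
The logic of this simulation can be reconstructed in a few lines. The sketch below is a minimal reconstruction, not the code behind Figure 3.2; the category means, the standard deviation, and the number of cycles are hypothetical, but the two stated assumptions (an F2 floor at 1000 Hz and the exclusion of misidentified tokens) are implemented directly.

    import numpy as np

    rng = np.random.default_rng(1)

    # Three overlapping categories in a crowded low-vowel space, given as
    # mean F2 values (Hz); the back vowel starts at the low-F2 edge.
    means = np.array([1200.0, 1500.0, 1800.0])
    SD = 120.0       # assumed within-category production variability
    FLOOR = 1000.0   # assumption (1): F2 below 1000 Hz is unlikely

    for cycle in range(10):
        for i in range(3):
            tokens = np.clip(rng.normal(means[i], SD, size=1000), FLOOR, None)
            # Classify each token by distance to the current category centers.
            labels = np.argmin(np.abs(tokens[:, None] - means[None, :]), axis=1)
            heard = tokens[labels == i]   # assumption (2): drop misheard tokens
            if heard.size:
                means[i] = heard.mean()   # recompute the category target

    print(np.round(means))  # the centers separate until they no longer overlap

Nothing in the sketch aims at dispersion; the separation falls out of discarding misidentified tokens near the category boundaries.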
In extending this style of simulation to study how bias factors result in sound
change we included a social component. This was because we wanted to study not
only how sound change might emerge from simple assumptions about exemplar-
based phonological categories, but we also wanted a better understanding of the
normal case where bias does not result in sound change. Therefore, the remaining
simulations in this section track the development of phonetic categories in adjacent
speech communities, where a sound change occurs in the system for one group while
the other group does not experience the change. For both groups of speakers, we
constructed phonetic categories that were represented by clouds of exemplars which
include both normal variants and, crucially in both communities, a few exemplars (ten
per cent) that have been altered by a bias factor. The key difference between the groups
is whether or not the bias variants are disregarded. It seems reasonable to assume that
variants produced by phonetic bias factors are usually 'corrected', either by perceptual
processes like compensation or by rejection of speech errors. Stability of phonetic
categories is thus the norm. As we shall discuss, we assumed that these correction
processes were not implemented to the same degree by all speakers; one group of
speakers more actively applied perceptual compensation mechanisms than the other.
Thus, the difference between groups is modeled as a difference in the exemplars
selected by group members to define the phonetic category.
The top row of Figure 3.3 shows the starting phonetic and social distributions of
our first simulation of social stratification and sound change. The simulation tracks
the pronunciation of /z/ in two social groups. As discussed above, voiced fricatives
like /z/ are biased by aerodynamic constraints, and sometimes are realized with
reduced frication (more like an approximant). This simulation of a gradient phonetic
effect is appropriate for modeling many types of sound change including context-
free vowel shifts, the despirantization of voiced fricatives, vowel fronting near coro-
nal consonants, vowel nasalization, and vowel coalescence, among other changes. In
this simulation, a bias factor produced a slightly skewed phonetic distribution. Most
productions (ninety per cent) clustered around the phonetic target value, which was
arbitrarily set to zero. A few productions (ten per cent), however, were a little biased

FIGURE 3.3 Simulation of a gradient phonetic bias. The starting phonetic and social identity
distributions are shown in the histograms. The results of a bivariate random selection from
these distributions are shown in the top right panel. Social group differences are indicated on
the vertical axis, which measures an arbitrary 'social identity' parameter. Phonetic output is
shown on the horizontal axis, where a value of zero indicates a voiced fricative production, and
a value of four indicates a voiced approximant production. The bottom panels show the gradual
phonetic drift, from iteration 0 to iteration 50 of the simulation, as the phonetic target includes
approximated variants for one social group, and persistent phonetic instability for the other
group, who do not allow the inclusion of approximated variants to influence the target

so that the phonetic distribution has a longer tail in one direction than it does in the
other. The speech community in this simulation was also characterized by a bimodal
social stratification with fifty per cent of exemplars produced by one social group and
fifty per cent by another group of talkers. Each dot in the top right graph represents an
exemplar in the sociophonetic space defined by phonetic output and social identity.
At the start of the simulation there is no correlation between the phonetic and social
values; the bias factor is equally likely to affect the speech of each population group.
The bottom row of graphs shows how this phonetic system evolved over the course of
fifty iterations of simulated imitation.
As seen in Figure 3.3, the phonetic output of the two simulated groups of speakers
diverges. One group (centered around social identity index value 0) maintained the
starting phonetic realizationa situation of persistent phonetic instability, where an
aerodynamic bias factor influences about ten per cent of all /z/ productions, but
this bias factor does not induce phonetic drift. The other group (centered around
social identity index value 6) shows gradual phonetic drift, so that by the end of the

simulation the original /z/ is now /r/. Speakers in both groups are assumed to base
their productions on a cloud of exemplars (using the mean value of a set of exemplars
as a target). The difference is in the selection of exemplars to include in the cloud.
The '0' group, who did not experience a sound change, disregarded the phonetic
bias variants: they successfully compensated for the bias and removed it from their
exemplar-based phonetic definition of /z/. The '6' group, who did experience the
sound change, INCLUDED the bias variants in /z/, and thus the phonetic target was
changed by the bias.
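
A stripped-down version of this simulation is sketched below. It is a reconstruction under stated assumptions, not the code behind Figure 3.3: the ten per cent bias rate and the fricative (0) and approximant (4) values follow the text, while the group sizes, production noise, and random seed are arbitrary. The two groups differ only in whether biased tokens are admitted to the exemplar cloud from which the next target is computed.

    import numpy as np

    rng = np.random.default_rng(2)

    APPROXIMANT = 4.0   # phonetic output value of a fully approximated /z/

    def produce(cloud, n=200):
        # Sample productions around the current target (the cloud mean);
        # ten per cent are nudged toward the approximant by the bias.
        tokens = rng.normal(cloud.mean(), 0.5, size=n)
        biased = rng.random(n) < 0.10
        tokens[biased] = np.minimum(tokens[biased] + 1.0, APPROXIMANT)
        return tokens, biased

    cloud_0 = rng.normal(0.0, 0.5, size=200)   # '0' group: compensating listeners
    cloud_6 = cloud_0.copy()                   # '6' group: accepts bias variants

    for iteration in range(50):
        tokens, biased = produce(cloud_0)
        cloud_0 = tokens[~biased]   # bias variants filtered out: stability
        tokens, biased = produce(cloud_6)
        cloud_6 = tokens            # bias variants retained: gradual drift

    # Expect the '0' group to remain near 0 and the '6' group to drift toward 4.
    print(round(cloud_0.mean(), 2), round(cloud_6.mean(), 2))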
Why would different groups of speakers treat bias variants in different ways?
Although bias variants occur with equal frequency for both groups of speakers, we
assume that phonetically unusual productions may take on indexical meaning for the '6'
group. Speakers who seek to identify with the group may be more likely to notice
phonetic variation among group members and thus include it as a group-indexical
property, even though that same variation exists in the population as a whole.
Prospective group members may thus notice variants when they are produced by the
target group even though they disregard those same variants when produced by other
speakers. Considered from another point of view, a group that is aware of some social
distance from another group may attend to phonetic deviations from the norm as
marks of social differentiation.
It has to be admitted, though, that change caused by gradient bias may also be more
inevitable than change induced by more discontinuous bias factors, in that listeners
may be less likely to disregard bias variants that are only very minimally different from
unbiased variants. Thus, variation introduced by a gradient phonetic bias may be less
useful for social differentiation than a more discontinuous bias factor because it may
fuel sound change regardless of social identity factors.44 It is important, therefore, to
study the link between discontinuous bias factors (such as those introduced by speech
production or perception errors) and sound change.
To model more discontinuous phonetic bias factors such as the motor planning
errors that we posited for cases of consonant harmony, the same basic model can
be used. However, discontinuous bias is often structure preserving in the sense that
speech errors often result in sounds already present in the language, so we assume
that the basic mechanism is one of probability or frequency matching (Vulkan 2000;
Gaissmaier 2008; Koehler 2009; Otto et al. 2011). For example, we can model the
harmony process that results in a change from [s] to [ʃ] by assuming that one group
includes harmonized instances of [ʃ] in the exemplar cloud for /s/ while the other
group does not. Then, following Pierrehumbert (2001a), we assume that speech pro-
duction involves a process that results in frequency matching so that the likelihood of
drawing from one or the other mode in the phonetic distribution (that is, [s] or [ʃ])
matches the frequency of exemplars in those regions of phonetic space.

44
But note that this is definitely not Labov's (1994) view.

FIGURE 3.4 Simulation of a sound change caused by a discontinuous phonetic bias (such as a
motor planning error that results in a consonant harmony)

The simulation (Figure 3.4) was structured in much the same way as the previous
one. We have a population of individuals who are evenly divided into two social
groups. We also have a phonetic distribution in which ten per cent of the output tokens
are mutated by a phonetic bias factor. In this case, though, the bias factor produces a
discontinuous jump in phonetic space. However, we cannot suppose that acceptance
of the bias variants into a phonological category would result in gradual phonetic
drift because the intermediate phonetic space may be unpronounceable, or the bias
variants are good instances of an existing phonetic category. So the average phonetic
target centered around /s/ (phonetic output equal to zero in the model) stays as it was,
as does the average phonetic target centered around /ʃ/ (the bias variant, modeled with
phonetic output equal to 6). However, speakers in one group are willing to accept bias
variants as acceptable ways to say forms with an /s...ʃ/ sequence, while speakers in the
other group do not accept bias variants. Thus with a frequency matching production
model, where the speaker's produced distribution of variants matches the distribution
of the exemplar cloud, the bias factor may lead to wholesale change.45
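
The frequency-matching dynamic can be sketched in the same style. The following is an illustration, not the code behind Figure 3.4; the cloud size, the ten per cent error rate, and the number of rounds are hypothetical, while the phonetic values 0 for [s] and 6 for [ʃ] follow the text.

    import numpy as np

    rng = np.random.default_rng(3)

    S, SH = 0.0, 6.0          # phonetic output values for [s] and [ʃ]
    cloud = np.full(200, S)   # the accepting group's /s/ cloud starts as all [s]

    for iteration in range(50):
        # Frequency matching: draw each token from the [s] or [ʃ] mode with
        # probability equal to that mode's share of the exemplar cloud.
        p_sh = (cloud == SH).mean()
        tokens = np.where(rng.random(cloud.size) < p_sh, SH, S)
        # The motor-planning (harmony) error mutates 10% of [s] tokens to [ʃ].
        slips = (tokens == S) & (rng.random(cloud.size) < 0.10)
        tokens[slips] = SH
        cloud = tokens        # the accepting group keeps the bias variants

    print((cloud == SH).mean())   # approaches 1.0: the change goes to completion

A non-accepting group would exclude the mutated tokens, keeping the [ʃ] share of its cloud at zero and its /s/ target stable; and, as footnote 45 notes, a fuller model needs an error-correction mechanism to keep the accepting group from oscillating.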
These simulations of the link between phonetic bias factors and sound change have
shown that exemplar-based models provide a useful, explicit method for studying the
45
This simulation provides a useful reminder of the importance of compensation mechanisms for
phonetic stability. If the simulation is allowed to run over thousands of epochs, the frequency matching
mechanism, plus the phonetic bias factor, leads to oscillation between [s] and [ʃ]. The model does not
stabilize unless the group who shifted from [s] to [ʃ] begin to treat instances of [s] as errors which should
be corrected and thus removed from the exemplar cloud.

role of bias factors in sound change. We have also shown, with citations from Paul,
Hockett, and Labov, that an exemplar-based conception of human phonetic memory
is the mainstream view.46
The simulations also identified a crucial role for exemplar selection in sound
change, and in particular concluded that socially motivated exemplar selection rules
make it possible to model both sound change and phonetic stability. Building on this
finding, we speculate that a group who tend to accept bias variants (phonetic variants
caused by bias factors) is likely to be engaged in a project of social differentiation,
and are looking for cultural material that could be of value in this project. Thus,
bias variants, though phonetically confusing, may be socially useful. Although this
is stated as if it is a phonetically conscious activity, it need not be. To the extent that
changes are 'involuntary' and 'unconscious' (Paul 1880; Paul 1920: ch. 2; Strong et al.
1891: ch. 1), we can speculate that a low-status group who seek social identity with
each other, against some other group, may be more attentive to phonetic detail than a
group who feel secure in their social standing.
Finally, although we used an exemplar memory in all of the simulations, we used
two kinds of mechanism to model sound change: phonetic target recalculation for
gradient bias factors (Figure 3.3) and frequency matching for discontinuous bias
factors (Figure 3.4). This difference relies on what Hockett (1965) called the 'Quantization
hypothesis' – the idea that the continuous range of phonetics is, for speakers,
divided into discontinuous quanta of phonetic intentions. In the exemplar model, the
difference boils down to whether the bias factor should be interpreted as changing
the articulatory plan for a specific gesture, or changing the production rule used to
select gestures in word production. One is tempted to associate this difference also
with neogrammarian sound change, as against lexical diffusion (as Labov 1981 did).
But there is no reason to believe that frequency matching is any less regular than
target changing – that is to say, there is no reason to think that the shifting frequency
distributions of [s] and [ʃ] would not affect all tokens of [s].

3.7 Conclusion
In this chapter we have outlined a framework for categorizing and understanding
some key features of sound change. Much remains to be examined from this point
of view, of course, including questions only touched on above. For example, how do
processes of enhancement (section 3.5.1) work? How do we interpret lexical and mor-
phological effects in sound change (section 3.5.3)? And what actual sociolinguistic
and psychological evidence bears on the specific theories of actuation discussed in
section 3.6?

46
That is to say, the exemplar approach is mainstream in that part of linguistic research that Strong et al.
(1891: 1) called the 'science of language', as opposed to 'descriptive grammar'.

TABLE 3.5 Well-established bias factors and representative changes


BIAS FACTORS                         REPRESENTATIVE SOUND CHANGES

PRODUCTION AND PERCEPTION BIAS
Motor planning (3.3.1)               Consonant harmony; anticipatory displacement (3.4.1)
Aerodynamic constraints (3.3.2)      Rhotacism, other fricative-to-glide shifts; final devoicing (3.4.2)
Gestural mechanics (3.3.3)           Palatalization; umlaut; VN > Ṽ; vowel coalescence (3.4.3)

SYSTEMIC BIAS
Auditory enhancement (3.5.1)         Interdental fricative labialization; back vowel rounding

We have described two broad classes of bias factors that may help explain asym-
metries in sound change. The first, our main focus (sections 3.3-3.4), consists of
bias factors emerging in speech production and perception through motor plan-
ning, aerodynamic constraints, gestural mechanics, and perceptual parsing. Despite
its familiarity, we suggested that perceptual parsing is the least securely established
factor; its prototypical examples may have other interpretations. More research is in
order on this and all the other production and perception bias factors we discussed.
Systemic constraints (section 3.5) are a second broad class of bias factors, aris-
ing from language-specific or universal features of a phonological system. This class
includes perceptual enhancement and in particular auditory enhancement, which
can yield asymmetries in sound change; selectional bias (favoring certain variants,
universally or in certain phonological systems); and perhaps lexical effects. Since some
of the bias factors in this broad class are less well established at this point, the eventual
dossier may be smaller than what we have identified. In Table 3.5 we summarize some
of the best established bias factor types in both broad classes, with a few representative
sound changes that we have mentioned.
Finally, since any full account of phonologization must address the emergence of
speech norms (in an individual or community) from occasional phonetic variants,
we have sketched the outline of a linking theory that relates them (section 3.6).
Whether this sketch and our discussion of bias factors are on the right track or in
need of substantial revision, we hope in any case to stimulate further discussion of
the phonetic bases of phonologization.
4

From long to short and from short to long: Perceptual motivations for changes in vocalic length

HEIKE LEHNERT-LEHOUILLIER

4.1 Introduction
The fact that sound change can be motivated by phonetic factors is rather uncon-
troversial (e.g. Ohala 1993). In particular, perceptual motivations have been invoked
and proven useful in the study of phonologization patterns (Ohala 1981, 1992, 1993;
Hume and Johnson 2001; Kavitskaya 2002).
According to Ohala's (1981) proposal, which has been widely adopted, sound
change may arise in cases when listeners misparse certain properties of the speech
signal and reinterpret what has been heard. For example, a listener may misperceive
a vowel with a falling tone, which is phonetically longer than other vowels, as phone-
mically long (see section 4.2.2), or a vowel length contrast may be reinterpreted by a
listener as a tonal contrast because certain tonal patterns consistently co-occur with
vowels of a certain quantity (see section 4.2.1). In this example, the sound change
involving vocalic length and tonal pattern may go in either direction (i.e. from tonal
contrast to length contrast or from length contrast to tonal contrast). I will call this
scenario bidirectional sound change.
However, not all sound changes are bidirectional. For example, sound changes
involving vowel height and vowel length seem to be unidirectional; accounts of
a difference in vowel length developing into a difference of vowel height do exist
(see section 4.2.3); however, a difference in vowel height has not been shown to
develop into a length contrast.1
The current study investigates this asymmetry in directionality of sound changes
involving vocalic length and tone on the one hand, and vocalic length and vowel
1
Possible counterexamples, which are extremely rare, seem to be instances of hypercorrection rather
than to be motivated by phonetic factors.

height on the other hand. In particular, the hypothesis that this asymmetry arises
from differences in the perception of tonal and spectral cues will be investigated by
drawing on the results of a cross-linguistic perception study. This perception study
was designed to test how tightly spectral cues (as acoustic correlates of vowel height)
and fundamental frequency (as the acoustic correlate of tone and pitch accent) are
associated with the perception of vowel duration. The rationale of the experiment was
that if listeners are sensitive to a cue regardless of whether or not that cue is used in
vowel length perception in their native language, this cue is intrinsically more tightly
associated with vowel duration than a cue that impacts only those listeners with a
specific language background, namely a language in which the cue is known to co-
occur systematically with vowel duration (i.e. extrinsically associated). The associ-
ation strength of two cues (intrinsic vs. extrinsic), in turn, can be linked to
phonologization patterns in the following way: If a cue impacts the perception of a
given dimension, such as vocalic length, in the same way for all listeners regardless
of language background, phonologization patterns will presumably reflect this by
allowing changes only in the direction that does not force tightly linked cues to
separate. For two cues that are less tightly associated, we would expect more variability
in phonologization patterns, hence allowing for bidirectionality in sound changes.
The remainder of the chapter is organized as follows: Examples of changes in vocalic
length and their interaction with tone and pitch accent as well as vowel height are
discussed in section 4.2. Section 4.3 reports on the cross-linguistic perception exper-
iment, and section 4.4 discusses the results and argues that the difference between
perceptual cues that are intrinsically linked and those that are extrinsically linked
at the very least correlates withif not motivatesthe asymmetry in sound change
patterns found in sound changes involving vowel length.

4.2 Patterns of changes in vocalic length


4.2.1 The development of a vowel length contrast from a tonal contrast
The change from a tonal contrast into a quantity contrast was reportedly the case in
the development from Middle Korean to Modern Seoul Korean (Kwon 2003). Middle
Korean was a tone language with three tones, a low tone (L), a high tone (H), and a
rising tone (LH). Whether or not Middle Korean also had a vowel length distinction
is controversial, but most likely vowel length was allophonic at best. Modern Seoul
Korean has no tonal contrast. However, it does have a vowel length contrast, even
though this contrast seems to be disappearing. There is a strong correspondence
between syllables which had a rising tone in Middle Korean, and syllables with a
long vowel in Modern Seoul Korean. Therefore, the vowel length contrast in Modern
Seoul Korean is assumed to have arisen from the tonal contrast in Middle Korean.
The examples in (1), taken from Kwon (2003: 68-73), illustrate this vowel change in
Korean.

(1) Middle Korean    Modern Seoul Korean
    nun (L)          nun     'eye'
    nun (LH)         nuːn    'snow'
    mal (H)          mal     'unit of measure'
    mal (LH)         maːl    'word'
Another case where a vowel length distinction has arisen from a tonal contrast has
been reported for the Dutch Limburgian dialect spoken in Weert (Heijmans 2003).
The Weert dialect, which is spoken in an area located at the periphery of a dialectal
region with lexical tone, has developed long vowels where most other dialects in the
area have the so-called Accent II, and short vowels where Accent I is found in the
neighboring dialects. Vowels carrying an Accent II are phonetically longer than those
carrying Accent I. The main difference between the two accents is the alignment of
the f0 peak with respect to the syllable onset. This development of vowel length and
its relation to the accentual patterns is illustrated in (2) with examples from Heijmans
(2003: 15).
(2) Baexem dialect (tonal)    Weert dialect (non-tonal)
    kniːn (Accent II)         kniːn    'rabbit'
    kniːn (Accent I)          knin     'rabbits'
A similar change to that reported by Heijmans (2003) has also been reported for the
Huldingen dialect spoken in Northern Luxembourg. In Huldingen, younger speakers
have replaced the tonal opposition found in the speech of older speakers with a vowel
length contrast (Goudaillier 1987).
Assuming a view of sound change in which the listener is the source of sound
change (Ohala 1981, 1992, 1993; Kavitskaya 2002), the change from a tonal distinction
into a length distinction is easily accounted for. In the case of Korean, vowels with a
rising tone were most likely phonetically longer than vowels with a level tone due to
articulatory requirements. Consequently, listeners could have interpreted the tonal
distinction as a length distinction in vowels, and – by adjusting their pronunciation
accordingly – initiated the sound change from tonal contrast to vowel length contrast.
Similar scenarios are assumed to be responsible for the change from accent to vowel
length in the Dutch Limburgian dialect spoken in Weert (Heijmans 2003) and the
Huldingen dialect spoken in Northern Luxembourg (Goudaillier 1987) since in both
dialects the vowels associated with the phonetically longer accent developed into
long vowels. The only difference between the sound change in Korean as opposed to
the Dutch Limburgian and Northern Luxemburgian dialects is that in these dialects
the phonological category of vowel length already existed whereas in the case of
Korean, phonemic length emerged as a phonological category at the expense of the
tonal contrast. However, this difference is orthogonal to the current discussion of the
phonetic motivation for the sound change involving vowel length and tonal/accentual
patterns.

4.2.2 The development of a tonal contrast from a length contrast


An account of a tonal contrast developing from a vowel length contrast is given by
Svantesson (1989) for the Mon-Khmer languages Hu and U. The conditioning factor
of tonogenesis in Hu was vowel length, which was subsequently lost. A high tone
developed in words with an original short vowel, and a low tone in words with original
long vowel. This development is illustrated in (3) with examples from Hu and cognates
from Lamet, a closely related language that has preserved vowel length. The examples
are from Svantesson (1989: 68).
(3) Hu          Lamet
    jam (H)     jam     'to die'
    jam (L)     jaːm    'to cry'
A diachronic change in U also resulted in the development of tones with vowel
length as a conditioning factor. However, in U, the nature of the final consonant played
an important role as well. High tones developed in syllables with a short vowel and
an obstruent coda consonant, while a rising tone developed in syllables with a long
vowel and an obstruent coda. A low tone emerged in syllables containing a short vowel
closed by a sonorant coda consonant, while a falling tone developed in syllables with
a long vowel closed by a sonorant coda. The vowel length distinction in U was also
subsequently lost.
More recently, Lehiste (2003, 2004) has argued that a similar development is tak-
ing place in Estonian. As noted before, Estonian has three vowel quantities, short,
long, and overlong. Lehiste (2004) argues that a tonal contrast is currently emerging
between long and overlong vowels in disyllabic words. While the f0 in disyllabic words
with a short or a long vowel in the initial syllable rises in the first syllable and falls
in the second syllable, the f0 in disyllables with an initial overlong vowel shows a
rising-falling contour in the first syllable and a level f0 in the second syllable. In a
perception experiment Estonian listeners could not distinguish between long and
overlong vowels in stimuli with a level f0 differing in duration alone; only when the
respective f0 patterns were present could the Estonian listeners distinguish long from
overlong vowels. Based on these results, Lehiste (2004) argues that a tonal contrast is
developing in Estonian between long and overlong vowels.
The phonetic motivation for the change from vowel length to tone is at least in
the case of Estonian comparable to that described for the change from tone to vowel
length. Assuming again Ohala's view that the listener is the source of sound change, the
consistent co-occurrence of the overlong vowels with the rising-falling f0 may very
well have led Estonian listeners to reinterpret the length distinction between long and
overlong vowels as one in the f0 contour, resulting in an adjustment of the production
patterns which ultimately is necessary for the sound change to happen. The phonetic
motivation for the development of a tonal contrast out of the vowel length distinction
in U and Hu is not quite as easily explained. This is mainly due to the fact that in U
and Hu level tones rather than falling or rising tones interact with vowel length, and
that the interaction of different f0 heights and vowel duration is not yet very well
understood. Whatever the exact mechanism underlying the pattern that short vowels
are often associated with high tones and long vowels with low tones (see Yu 2010c
for more examples and speculations), Ohala's view of sound change may still apply to
the tonogenesis in the two Mon-Khmer languages, since the length distinction must
have been reinterpreted at some point as a tonal distinction. Otherwise the consistent
occurrence of high tones in syllables containing historically short vowels and low
tones in syllables containing historically long vowels cannot be explained at all.

4.2.3 Development of a vowel quality contrast from a vowel length contrast


A well-known example of the loss of vowel length driven by a change in the quality
between short and long vowels occurred in Late Spoken Latin. Lloyd (1987) suggests
that the length distinction in Latin vowels was lost after the high and mid short vowels
lowered in vowel height, resulting in the short high vowels /i/ and /u/ having nearly the
same vowel quality as the long mid vowels /e:/ and /o:/. Evidence from inscriptions is
cited (Lloyd 1987: 74) to show that listeners started to perceive the short high vowels
as mid vowels, which resulted in the merger of the short high vowels with the long
mid vowels and the loss of the length distinction. The changes in the vowel system
from Classical Latin to Late Spoken Latin are summarized in (4):

(4)  /i:/ → /i/              /u:/ → /u/
     /i/  ↘                  /u/  ↘
     /e:/ → /e/              /o:/ → /o/
     /e/  → /ɛ/              /o/  → /ɔ/
     /a:/ and /a/ → /a/

Changes in vowel length and vowel quality similar to those described for Latin are
also attested in Iranian Persian, albeit with a remaining vowel length contrast in
the low vowels /a:/ and /a/ (Windfuhr 1997: 687).
This sound change can also easily be accounted for assuming the listener as source
of the change. Since shorter vowels often tend to be more centralized and, therefore,
somewhat lower in vowel height than the corresponding long vowels, listeners may
come to reinterpret the vowel quality rather than the vowel length as the most promi-
nent feature. Consequently, listeners will adjust the production of these vowels such
that they are not produced with a shorter duration any more, which, in turn, will then
result in a sound change of the type observed in Latin.

4.2.4 Summary
In analogy to the interaction between tone/pitch accent and vowel quantity, we would
expect to find languages in which a quality contrast has developed into a vowel length
contrast. Such a change could also be phonetically motivated by the well known
fact that high vowels are intrinsically shorter than mid vowels, which in turn are
intrinsically shorter than low vowels (Lehiste 1970). Given this, it would be reason-
able to expect a scenario in which listeners come to reinterpret the length difference
between a high vowel and a mid vowel to be the most prominent characteristic that
distinguishes these vowels, and consequently adjust the production of the vowels such
that the original high vowel turns into a short high vowel and the vowel that was
originally a mid vowel turns into a long high vowel, as illustrated in (5):
(5)  /i/ → /i/               /u/ → /u/
     /e/ → /i:/              /o/ → /u:/

However, this scenario is extremely uncommon (see footnote 1 on p. 98 above). Here,
it seems, the explanatory power of Ohala's model of the listener as source of sound
change has reached its limit. As Ohala (1993) points out, listeners do in many cases
correct or normalize predictable perturbations in the speech signal. However, why
listeners would normalize for vowel length due to vowel height but not for vowel
height due to vowel length cannot be explained by the mechanisms of misperception
and reinterpretation.

4.3 Phonetic motivations for the asymmetry in patterns of sound change


4.3.1 The influence of spectral cues and fundamental frequency on
vowel length perception
This section explores the possibility that the asymmetric patterning of tone and vowel
height in changes of vocalic length, described above, may be rooted in differences in
the relationship between duration and the phonetic cues associated with tone/pitch
accent on the one hand and vowel height on the other. There is abundant evidence
that both f0 cues (as acoustic correlates of tone/pitch accent), as well as spectral cues
(as acoustic correlates of vowel height), influence the perception of vowel duration.
The interaction between vowel quality and vowel duration finds its most prominent
theoretical account in Lindblom's (1963) target undershoot model. 'Target under-
shoot' refers to a situation in which the articulators fail to reach the target position
for the production of a given vowel, resulting in a formant structure that places
the shorter vowel in a more central position in the acoustic vowel space. Lindblom
(1963) found that the amount of undershoot as determined by the first three formants
was directly related to the duration of a vowel: the shorter the vowel the more the
target undershoot. The original target undershoot model, which was inspired by a
damped mass-spring model of the articulators (Lindblom 1983), is rather automatic
in nature, as it assumes that the target undershoot is the result of power limitations on
the movement of the articulators. In other words, target undershoot occurs because
more articulatory effort would be required in order to reach a given target in less
time.
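Lindblom's mass-spring idea can be made concrete with a minimal simulation. The sketch below integrates a critically damped second-order system moving from rest toward a target; the stiffness value, the durations, and the one-dimensional setup are illustrative assumptions rather than parameters from Lindblom's work, but the qualitative result is the same: the shorter the movement time, the greater the undershoot.

```python
import numpy as np

def fraction_reached(duration, k=900.0, dt=0.0005):
    """Critically damped mass-spring movement from rest at 0 toward a
    target at 1; returns how far the articulator gets in `duration` s."""
    omega = np.sqrt(k)             # natural frequency (illustrative value)
    b = 2.0 * omega                # critical damping coefficient
    x, v = 0.0, 0.0
    for _ in range(int(duration / dt)):
        a = k * (1.0 - x) - b * v  # restoring force toward target, damped
        v += a * dt
        x += v * dt
    return x

# Shorter vowels cover less of the distance to the target: undershoot.
for dur in (0.05, 0.10, 0.20):
    print(f"{dur * 1000:.0f} ms vowel: {fraction_reached(dur):.2f} of target reached")
```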
Target undershoot is often linked to vowel reduction processes, such as reduction
of vowels in unstressed syllables (Lindblom 1963; Engstrand 1988; van Bergem 1993;
Crosswhite 2004). In addition to being linked to vowel reduction processes, the target
undershoot model has also been called upon to account for the quality differences
between long and short vowels in languages with a vowel length contrast. For example,
Johnson and Martin (2001: 82) note about the vowels in the Muskogean language
Creek that 'short vowels are centralized relative to long vowels because of vowel target
"undershoot" in short vowels'. Although it has been shown that target undershoot is
neither a completely automatic coarticulatory process (Manuel 1987; Whalen 1990)
nor a mechanism that can be found in all languages to the same degree (Delattre
1969), many languages with a vowel length contrast exhibit vowel centralization of
the short vowel in a long/short vowel pair. Studies on the influence of spectral cues
on vowel length perception have found that listeners (Heike 1972; Sendelmeier 1981
for German; Abramson and Ren 1990; Roengpitya 2001 for Thai) are influenced in
their judgment of vowel length by spectral cues, such that the more central vowels are
judged shorter than the corresponding peripheral vowels.
Investigations of the perception of dynamic f0, such as Lehiste (1976), Pisoni
(1976), and Wang et al. (1976), found that listeners perceive vowels with a dynamic
f0 (i.e. a falling, a rising, or a falling-rising f0) as longer than vowels with a level f0.
All these studies used synthetic stimuli consisting of a single vowel (Lehiste 1976 and
Pisoni 1976) or isolated vowels and non-speech (Wang et al. 1976). While Lehiste's
stimuli compared the perception of a vowel with either a rising-falling or a falling-
rising f0 contour to the perception of a vowel with a level f0 of the same length, Pisoni
(1976) and Wang et al.'s (1976) stimuli compared vowels with a falling and a rising
f0 to stimuli with a level f0. Wang et al. found that vowels with a rising f0 contour
are perceived as longer than those with a falling f0. This can be accounted for by the
results found in production studies where vowels with falling tones are shorter than
those with rising tones. Vowels with falling tones were perceived in Wang et al.'s study
as longer than the vowels with a level fundamental frequency. This result was recently
replicated by Yu (2010c), with the additional finding that vowels with a low level tone
were perceived as shorter than those with a high level tone.
However, other perception studies either failed to replicate these results (Rosen
1977) or found that an increase in perceived vowel duration due to a dynamic f0
was context dependent (van Dommelen 1993). Using monosyllabic and disyllabic
words, presented either in isolation or embedded in a sentence, van Dommelen (1993)
found that German listeners only perceived vowels with a dynamic f0 as longer when
they occurred in isolated monosyllabic words. In all other conditions the perceptual
lengthening effect was reversed.

4.3.2 Cross-linguistic experimental investigations


4.3.2.1 Motivations for the perception study As discussed in the previous section,
spectral cues as well as f0 cues impact vowel length perception. The discussion also
pointed out that the impact of a dynamic f0 on vowel length perception may depend on
the language background of the listener. The cross-linguistic perception experiment
(for further detail and additional experimental conditions see Lehnert-LeHouillier
2010) was motivated by the hypothesis that the differences in the phonological pattern
of tone and vowel height in sound changes involving vowel length may arise out
of differences in the perception of the phonetic cues associated with f0 and vowel
height. More specifically, it was hypothesized that spectral cues (as correlates of vowel
height) would be more tightly associated with the perception of vowel duration than
f0 cues. The association strength of the two cues is assessed in this study by whether or not
listeners, regardless of language background, show sensitivity to the investigated
cues. A cue that is used by all listeners is taken to be more tightly associated with
vowel duration perception than a cue that requires a certain language background
in order to impact the perception of vowel length. Note that spectral and f0 cues are
assessed only with respect to vowel duration perception; these cues may, of course,
show different patterns of association strength when investigated in relation to the
perception of other categories.
In order to test the impact of spectral and f0 cues on the perception of vowel dura-
tion, a cross-linguistic perception experiment was conducted with native speakers of
Japanese, Thai, German, and Latin American Spanish. Thai, Japanese, and German
were chosen for this study since all these languages have a phonemic vowel length
contrast, and the Spanish listeners served as a control group. Furthermore, the lan-
guages differ with respect to the extent of the vowel height differences between long
and short vowels as well as with respect to the restrictions on the occurrence of a
falling fundamental frequency.
For the languages investigated here, data showing that short vowels are located
more centrally in the acoustic vowel space compared to long vowels is available for
all languages with a vowel length contrast. Most prominent is the spectral difference
between long and short vowels in German, where all long/short pairs with the excep-
tion of the low vowels [a] and [a:] are reported to exhibit spectral differences. This has
been found in numerous studies since the first investigation of spectral differences
between long and short vowels by Jørgensen in 1969. Acoustic measurements on Thai
long and short vowels have also shown that short vowels are more centralized than
long vowels (Abramson 1962; Abramson and Ren 1990; Roengpitya 2001). While
Japanese is traditionally viewed as a language in which the contrast between long and
short vowels is exclusively durational (cf. Vance 1987: 13), a few acoustic studies that
measured formant values of Japanese long and short vowels did find slight differences
in the quality between long and short vowels (Nishi et al. 2008; Hirata and Tsukada
2003). The fourth language investigated, Spanish, does not have a vowel length
contrast.
The four languages also differ in the co-occurrence restrictions on falling f0 and
vowel length. In Japanese, the occurrence of a falling f0 is restricted by the phonology:
long vowels consist of two morae while short vowels consist only of one mora. Since
each mora can maximally be specified for one f0 target, a falling f0 contour (high
f0 target on the first mora and low f0 target on the second mora) may only occur
with long vowels (McCawley 1968; Vance 1987). While phonological restrictions
on the distribution of tones in Thai look very similar to Japanese on the surface
(a falling tone may occur only in CV syllables containing long vowels but not short
unstressed vowels; Abramson 1962; Morén and Zsiga 2006), the phonetic realization
of the tones of Thai reveals an important difference between Thai and Japanese. The
falling tone (HL) is phonetically realized by a rising-falling f0 contour, and the low
tone (L) is realized by a falling f0 from the mid-range to the low range (Abramson
1962; Gandour et al. 1991). This means that the only tone in Thai that is realized
by a falling f0 contour is the low tone, which may occur on both long and short
vowels in any syllable context (Abramson 1962; Morén and Zsiga 2006). For German
and Spanish, the occurrence of a falling f0 is not restricted to either long or short

4.3.2.2 Study design The participants in this study were twelve native speakers of
Thai, twelve native speakers of German, twelve native speakers of Japanese, and twelve
native speakers of Latin American Spanish. All participants were presented with vowel
continua progressing from a short to a long vowel in three different experimental
conditions. All listeners performed a categorial AXB forced choice task. The stimuli in
the first condition differed in duration only, those in the second condition contained
in addition a falling f0 from 260 Hz to 180 Hz over each of the vowels. The third
condition contained conflicting cue stimuli, in which the spectral cues remained that
of the short vowel throughout the continuum while the long comparison vowel had
different spectral properties.
The stimuli for this experiment were based on the speech of a 22-year-old female
Estonian talker, who produced the vowels in the context of CV(ː) syllables, where the
initial consonant was a voiceless unaspirated alveolar stop. Estonian was chosen as the
language from which the stimuli were drawn in order to avoid a native language
bias for the listeners of any of the investigated languages. Vowel continua for each of
the vowel pairs [ta]-[ta:], [te]-[te:], and [ti]-[ti:] were created for each experimental
condition, such that there were seven stimuli on each continuum. Stimulus 1 on each
continuum was equivalent in duration to the original short vowel, and stimulus 7 was
equivalent in duration to the original long vowel, as produced by the Estonian talker.
The vowels were lengthened in equidistant steps from the duration of the short vowel
to that of the long vowel using PSOLA (Moulines and Charpentier 1990).
The stimuli for the Duration Only condition were based on the short vowel in the
original [ta], [te], and [ti] utterances. For each of the stimuli, the f0 contour of the
original short vowels was manipulated by removing the original f0 contour and by
replacing it with a level f0 of 180 Hz. Then these stimuli were lengthened in equidistant
steps.
The stimuli for the continua testing the influence of a falling f0 were also based on
the short vowel in the original [ta], [te], and [ti] utterances. However, unlike in the
Duration Only condition, the f0 contour of all stimuli in these continua was replaced
by a falling f0 from 260 to 180 Hz. This means that all seven stimuli on each of the
three continua in this condition differed in duration of the vowel and in the steepness
of the f0 contour. Since the slope of the f0 contour depends on the duration of the
vowel over which the fall from 260 Hz to 180 Hz is realized, the shortest stimuli on
each continuum had the steepest f0 slope and the longest stimulus on the continuum
had the slope with the least degree of steepness.
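The inverse relation between stimulus duration and f0 slope can be made explicit with a short calculation. In the sketch below the seven continuum durations are hypothetical, since only the 260-180 Hz fall and the equidistant lengthening steps are specified above; the numbers illustrate the design rather than reproduce the actual stimuli.

```python
import numpy as np

# Hypothetical short-to-long continuum durations (s); only the 260-180 Hz
# fall and the equidistant steps are given in the text above.
durations = np.linspace(0.150, 0.250, 7)
f0_start, f0_end = 260.0, 180.0

for step, dur in enumerate(durations, start=1):
    slope = (f0_end - f0_start) / dur   # Hz/s; steeper for shorter vowels
    print(f"stimulus {step}: {dur * 1000:3.0f} ms, f0 slope = {slope:7.1f} Hz/s")
```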
The design of the stimuli in the vowel height condition was a conflicting-cue design.
It tested whether listeners judge vowel length by durational cues alone or whether they
also take the quality of the vowel into account to make their judgments. If the spectral
cues did not influence the listeners, we would expect the same category boundary
judgments as in the Duration Only condition. However, if listeners are influenced
by the spectral cues, we should see a difference in the category boundary judgments
between the Duration Only condition and the Vowel Height condition. Just as for the
continua in the other two conditions, the stimuli in this condition were based on the
short vowel in the original CV utterances. The lengthening procedure for the vowels
was the same as in the previous two conditions. The stimuli for the continua in the
Vowel Height condition had the spectral properties of the short vowels, and a level f0.
However, unlike in the Duration Only condition, the vowel quality of the long flanking
vowel presented in either the A or B position of the AXB triad had the spectral cues of
the original long vowels, which were more peripheral in the acoustic vowel space. In
other words, the cues that conflicted were the duration and the quality of the vowel,
such that the stimuli in steps 4, 5, 6, and 7 (the stimuli with longer durations) had
the quality of the short vowel but the duration of a longer vowel.
The three experimental conditions (Duration Only, Falling f0, Vowel Height) were
presented in randomized order in three separate experimental blocks. Within each
block, the AXB triads containing the stimuli from the three continua for each of the
vowel pairs [a]-[a:], [e]-[e:], and [i]-[i:] were also presented in randomized order.
Each stimulus was presented six times, yielding 126 trials for each block. Participants
completed a practice block of seven trials before completing the experimental blocks.
All stimuli were played over headphones directly from a PC. Each participant was
instructed to complete the task as accurately and quickly as possible. All participants
were instructed to press 'A' on the computer keyboard if they felt that the first and the
second stimuli in the triad sounded more alike, and 'B' if the second and the third
stimuli sounded more alike.
4.3.2.3 Results For the analysis of the results, the number of 'short' responses, i.e. the
number of times out of the six repetitions that participants identified the stimulus as
a short vowel, was recorded. The data were then expressed in terms of a percentage of
'short' responses, as a function of the stimulus. For each subject the crossover point
from the 'short' category to the 'long' category on each continuum was determined by
first transforming the sigmoid function yielded by the raw data into a probit function.
This was done using SPSS. The 50 per cent crossover point, which is taken to be the
location of the category boundary, was then calculated using the formula in (6):
(6) x = (y - b)/m
In this formula, x is the point along the stimulus continuum where y (the percentage
of 'short' responses) is 50 per cent, b is the intercept with the y-axis, and m is the slope
of the probit function. A two-way (language × vowel continuum) repeated measures
ANOVA was performed on the category boundary results from each experimental
condition, and post-hoc tests of significance using a Bonferroni paired t-test proce-
dure were performed where significant interactions were found.
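The boundary estimation can be sketched as follows. The response proportions below are invented for illustration, and the probit (inverse cumulative normal) transform from scipy stands in for the SPSS procedure used in the study.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical proportions of 'short' responses for one listener on a
# seven-step continuum (illustrative values only).
steps = np.arange(1, 8)
p_short = np.array([0.98, 0.95, 0.85, 0.55, 0.20, 0.05, 0.02])

# The probit transform linearizes the sigmoid: Phi^-1(p) = m*x + b
z = norm.ppf(p_short)
m, b = np.polyfit(steps, z, 1)

# Formula (6): the boundary is the x at which the fitted function passes
# through the 50 per cent point, i.e. y = Phi^-1(0.5) = 0.
y = norm.ppf(0.5)
boundary = (y - b) / m
print(f"category boundary at continuum step {boundary:.2f}")
```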
In order to assess the impact of the spectral cues and the f0 cues on the perception
of vowel duration, the location of the category boundary (50 per cent cross-over)
in each of the two test conditions (Vowel Height or f0) was respectively compared
to the category boundary in the baseline (Duration Only) condition by subtracting
the category boundary values from the Duration Only condition from the respective
values in the test condition. If the difference in category boundary was significant (as
assessed by a post-hoc Bonferroni paired t-test on the category boundary results in
the two conditions being compared), it was concluded that the cue had a significant
impact on the perception of vocalic length.
Comparing the results of the category boundaries in the Vowel Height and the
Duration Only conditions, we find a significant influence of spectral cues on vowel
length perception for all listeners (F(3, 44) = 3.37; p = .02), regardless of language
background. All listeners judged vowels that they had judged to be long vowels
in the Duration Only condition, as short vowels in the Vowel Height condition.
In other words, they judged the vowels in the middle of the continuum predomi-
nantly based on spectral cues rather than duration. The paired t-tests comparing the
category boundary in the baseline condition to those in the Vowel Height condition
yielded statistical significance for all four languages: Thai (p < .0001), Japanese (p <
.0001), German (p < .0001), and Spanish (p < .0001). These results are shown in
Figure 4.1.
However, there is some language specificity with respect to how much spectral
cues influenced listeners' judgments of vowel length: the German listeners were
affected most by the spectral cues, while Japanese listeners showed the least sensitivity
to spectral cues, and Thai listeners were influenced somewhat more than Japanese
listeners. Post-hoc Bonferroni paired f-tests showed that German was significantly
different from Thai (p = .003) and from Japanese (p = .002), while Spanish was not
significantly different from either of the other languages. These language specific
differences in the exploitation of spectral cues in the perception of vowel duration
suggest that listeners did not simply respond in a psychoacoustic mode. In partic-
ular, the fact that the spectral cues also influenced the vowel duration perception of
the Spanish listeners, a group whose native language does not have a vowel length
contrast, lends strong support to the hypothesis that spectral cues are more tightly
associated with vowel duration, and that no experience with a phonemic long/short
vowel contrast is needed in order to exploit this cue for the perception of vowel
length.
If we now turn to the impact of a falling f0 on the perception of vowel duration, we
find a quite different state of affairs. As shown in Figure 4.2, the falling f0 significantly
affected the perception of vocalic length only for the Japanese listeners (p < .001). The Japanese listen-
ers judged the vowels in the mid-region of the continuum (the ones they had judged
as short in the Duration Only condition) as long in the f0 condition.

FIGURE 4.1 The difference in the location of the category boundary between long and short
vowels in the Duration Only and the Vowel Height conditions averaged across the three vowel
continua [a]-[a:], [e]-[e:], and [i]-[i:]. Asterisks indicate significance at the .0001 level

FIGURE 4.2 The difference in the location of category boundary between long and short vowels
in the Duration Only and the f0 conditions averaged across the three vowel continua [a]-[a:],
[e]-[e:], and [i]-[i:]. Asterisk indicates significance at the .001 level

Unlike spectral cues, a falling f0 seems to impact the perception of vowel duration
only for those listeners whose native language associates a falling f0 with vowel length.
As discussed in section 4.3.2.1, in Japanese, the occurrence of a falling f0 is restricted
such that it may only occur with long vowels. This co-occurrence restriction seems
to bias listeners towards a long vowel judgment when a vowel of ambiguous duration
contains a falling f0.
Furthermore, we notice that although the difference between the Duration Only
and the f0 condition was not significant for the other language groups, there is not
even the same trend apparent in the direction of how f0 impacts length perception.
While Thai and German listeners, following the statistically significant trend of the
Japanese listeners, tend to interpret the vowels with a falling f0 as longer than those
with a level f0, the Spanish listeners show a (non-significant) trend in the other
direction.

4.4 Discussion and conclusion


The results of the perception experiment reported here suggest that the two cues,
f0 and spectral cues, differ in how tightly they are associated with the perception of
vowel duration. This difference in association strength between the two investigated
cues, in turn, patterns with the asymmetry (unidirectional vs. bidirectional) in sound
changes involving vocalic length: The more tightly associated cues (spectral cues and
durational cues) allow only for sound changes that do not separate these cues, while
less tightly associated cues such as f0 and vowel duration are more susceptible to
separation through sound change. Note that a further criterion contributing to the
directionality in sound change is the existence of some inherent phonetic/articulatory
motivations. In the example at hand, the fact that there is some inherent directionality
in how vowel height and duration pattern (the more central a vowel the shorter it is;
see 4.3.1) motivates to some degree the patterns we see in sound changes involving
vowel height and vowel length. However, a similar phonetic motivation exists for
vowel length and tonal contour (vowel duration is longer in vowels with a falling f0
contour, compared to vowels with a rising or level f0; see 4.3.1), yet we find a different
pattern in sound changes involving f0 and vowel length as well as different (although
statistically non-significant) trends in how f0 impacts vowel length perception in the
experimental study (see 4.3.2.3). In other words, while association strength in cue
perception might not be solely responsible for the asymmetry in the directionality of
sound change, it is certainly one factor in explaining the puzzle.
The question that remains is why f0 is less tightly associated with the perception
of vowel duration than spectral cues. A possible explanation for why some cues are
readily perceived by all listeners, regardless of language background, and other cues
only impact the judgment of listeners with a specific language background, may be
rooted in the articulatory organization of speech. In particular, an explanation for the
difference in the influence of f0 and spectral differences on the perception of vowel
length could be grounded in articulation. Spectral differences arise from a difference
in the shape of the vocal tract. In vowel production, these differences are predomi-
nantly caused by gestures of the tongue body. In other words, a tongue body gesture
is an intrinsic requirement for vowel production (with the exception of a targetless
schwa). If we assume, as proposed for example by Goldstein and Fowler (2003), that
perception tracks articulation, we would expect that all listeners are sensitive to slight
spectral differences in vowels. A dynamic f0, unlike an intrinsic f0, is not essential
to the articulation of a vowel, and, therefore, the implicit knowledge that a certain f0
pattern is associated with a vowel or syllable has to be established, maybe by means of
categorizing speech events via an exemplar mechanism.
5

Inhibitory mechanisms in speech planning maintain and maximize contrast

SAM TILSEN*

5.1 Introduction
This chapter proposes that an inhibitory speech planning mechanism is involved in
the maintenance and maximization of phonological contrast. The maintenance of
contrast is of central importance to the understanding of phonologization. Generally
speaking, assimilatory coarticulation will, unchecked, lead to contrast neutralization.
Yet loss of contrast is far from the inevitable consequence of coarticulation; this
implies that there exist cognitive mechanisms that oppose the phonologization of
coarticulation. A complete theory of phonological change requires an account not
only of the mechanisms that lead to loss of contrast, but also the ones that preserve
contrast.
Limits on coarticulatory variation are commonly attributed to forces or constraints
that maximize the perceptual distinctiveness of contrast. Dispersion theories (Lil-
jencrants and Lindblom 1972; Lindblom 1986, 1990, 2003; Flemming 1996, 2004)
assert that there exist cognitive mechanisms which function to make speech targets
less perceptually similar. The reader should keep in mind that sound systems never
literally maximize perceptual differences between sounds, because other things, like
coarticulation, often oppose the maximization of perceptual distinctiveness.
Recent experimental work on speech motor planning suggests an alternative view
of how contrast is maintained: inhibitory interactions between contemporaneously
planned articulatory targets result in dissimilatory effects, and over time these effects

* Thanks to Keith Johnson for discussions of this research. Two anonymous reviewers contributed to
the improvement of this chapter. Thanks to Yao Yao and Ron Sprouse for assistance in the University of
California, Berkeley Phonology Lab. This work was supported by the National Science Foundation under
Award No. 0817767.
can prevent speech targets from becoming perceptually indistinct. For example,
experimental observations show that speakers tend to produce an [i] with more
peripheral F1 and F2 values when they have very recently planned an [a] (Tilsen
2009b). Likewise, experimental results presented in this chapter show that Mandarin
speakers dissimilate tones that are planned in parallel. Findings of this sort suggest
that the planning of a speech target is influenced by other simultaneously planned
targets. These dissimilatory effects can be understood to arise from inhibitory
motor planning mechanisms, and can explain how speakers maintain and maximize
contrast.
Here the phonologization of vowel-to-vowel coarticulation into vowel harmony
will serve as a representative example of phonologization processes associated with
assimilatory phonetic patterns. This sort of phonologization falls under a general cate-
gory of sound changes considered to arise from hypocorrection (Ohala 1981, 1993b).
Section 5.2 describes how Ohala's listener-oriented theory of hypocorrective sound
change applies to coarticulation, contextualizes this theory in an exemplar-based
model of memory, and discusses how dispersion theories model the forces counter-
acting this process via maximization of perceptual contrast. Section 5.3 will describe
experimental evidence for dissimilation between contemporaneously planned vowels
in speech, and will present new experimental evidence that indicates tones in Man-
darin exhibit the same effect. Section 5.4 discusses these experimental results, argues
that they arise from an inhibitory mechanism in the planning of articulatory targets,
and explains the importance of this mechanism for understanding phonologization:
i.e. inhibition functions to maintain and maximize contrast.

5.2 Background
To exemplify how hypocorrection leads to sound change, and how dispersion theory
models the forces opposing this process, we use carryover vowel-to-vowel coartic-
ulation as an example. Vowel-to-vowel coarticulation is an assimilatory influence
upon the articulatory movements of one vowel due to the presence of a nearby vowel.
Vowel-to-vowel (henceforth V-V) coarticulation is either anticipatory or carryover,
and both types have been observed in a variety of languages (Öhman 1966; Gay
1974, 1977; Bell-Berti and Harris 1976; Fowler 1981; Parush et al. 1983; Recasens
1984; Recasens et al. 1997; Manuel and Krakow 1984; Manuel 1990). Carryover
coarticulation in V1-V2 sequences may arise from a combination of several factors.
Mechanical constraints on the movement from the articulatory posture for V1 to
the posture for V2 may give rise to coarticulation (Recasens 1984; Recasens et al.
1997). Another potential source of coarticulation is gestural overlap, which in the
task dynamic framework of articulatory phonology (Saltzman and Munhall 1989;
Browman and Goldstein 1986, 1988, 1990b) would arise when the gestural activation
interval for V1 extends into the time during which V2 is active.
However, mechanical constraints and gestural overlap cannot be the only sources of
V-V coarticulation because they are not expected over the observed temporal range of
V-V coarticulation, which can span up to three syllables (Fowler 1981; Magen 1997;
Grosvald 2009). A third possibility is that when the articulatory targets for V1 and V2
are planned contemporaneously those targets may interact, resulting in assimilatory
shifts in the target of V2 toward V1, or vice versa (cf. Whalen 1990). In other words,
prior to articulation, there may be variation in the formation of vowel targets that
is influenced by other vowel targets in the preceding and subsequent utterance con-
text, which are planned in parallel. Interestingly, the experimental evidence indicates
that these interactions are predominantly dissimilatory in nature, and hence tend to
oppose the effects of mechanical factors and gestural overlap.
In the highly influential model developed by Ohala (1981, 1993b, 1994b), V-V
coarticulation, and more generally any form of assimilatory coarticulation, can lead
to sound change through hypocorrection. In this process, sound change begins with
a 'phonetic perturbation' that frequently occurs in a given linguistic context. The
sources of such perturbations can be mechanical, aerodynamic, motoric, and/or per-
ceptual. Carryover V-V coarticulation is one example. The normal functioning of the
perceptual apparatus, in this view, is to compensate for the contextually conditioned
perceptual similarity of V2 to V1. In a sense, compensation 'corrects' or 'normalizes'
for the perturbation in V2, undoing its effects on the perception and memory of the
sound.
Hypocorrection occurs when the compensatory mechanism under-corrects for
phonetic perturbations: 'in the vast majority of cases the listener (somehow) parses
the signal correctly and infers the speaker's intended pronunciation. But occasionally a
listener may misparse the signal' (Ohala 1994b). The key idea here is that the perturba-
tion is 'parsed as independent of the perturbing vowel'. The correction mechanism fails
to compensate for coarticulation, and so a subtle phonetic assimilation is reinterpreted
as a new pronunciation norm. In the case of V-V coarticulation, hypocorrection leads
to vowel harmony, a contrast neutralization in which the vowels in some structural
domain (e.g. a root, stem, or word) covary in some of their features (cf. Vergnaud
1980; Rennison 1990; Krämer 2001; Finley 2008).
It is important to note that for phonologization to occur a new 'pronunciation norm'
must be established both within an individual speaker and across a group of speakers.
Exemplar theories (Goldinger 1992, 1996, 1998; Johnson 1997b, 2006; Pierrehumbert
2001a, 2002) provide a useful way to understand how sound change occurs within a
given speaker. In the exemplar model of perception developed in Johnson (1997b),
every perceived speech sound is stored in memory as a separate exemplar. The exem-
plars incorporate phonetic details of the particular instantiation of the sound, along
with a variety of contextual information and associations to categorial labels. Each
exemplar is assumed to have an activation level (its relative salience in memory),
which is influenced by its recency and potentially many other contextual factors, such
as the word in which it occurred, nearby segments, the listener, speaker, etc. Hence
the memory of a sound is not an abstract category, but a large collection of detailed
exemplars that include, among other things, spectrotemporal information.
On the production side, the exemplar model described in Pierrehumbert (2001a,
2002) uses the collection of stored exemplars to form a production target in the
following way. First, an exemplar is randomly selected, then a weighted average of the
phonetic values of similar exemplars is taken in order to form a production target.
The activation level is a factor in the weighting, and hence more recent exemplars
will play a greater role in target formation. The phonetic values are considered to be
perceptually or articulatorily relevant variables, which for vowels includes formant
values. Moreover, the categorial labels and phonetic values can be used to define a
similarity metric, allowing for a notion of 'similar' exemplars.
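A minimal sketch of this target-formation step is given below; the activation decay rate, the similarity window, and the F1/F2 values are illustrative assumptions, not parameters from Pierrehumbert's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# An exemplar cloud for one vowel category: each exemplar stores phonetic
# values (F1, F2 in Hz) plus an activation level that decays with age.
n = 200
exemplars = rng.normal(loc=[700.0, 1200.0], scale=[40.0, 60.0], size=(n, 2))
age = rng.uniform(0.0, 100.0, n)         # arbitrary recency units
activation = np.exp(-age / 30.0)         # more recent -> more active

def production_target(window=80.0):
    """Select a random exemplar, then return the activation-weighted mean
    of the phonetic values of all exemplars within `window` Hz of it."""
    i = rng.integers(n)
    dist = np.linalg.norm(exemplars - exemplars[i], axis=1)
    w = activation * (dist < window)     # 'similar' = inside the window
    return (w[:, None] * exemplars).sum(axis=0) / w.sum()

print(production_target())               # a freshly formed (F1, F2) target
```

Because more active (more recent) exemplars dominate the weighted mean, a run of coarticulated tokens pulls subsequent targets in their direction, which is the gradual drift described in the next paragraph.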
In the context of this model, regularly present phonetic perturbations can gradually
shift the distribution of exemplars in phonetic space. For example, frequent carryover
V-V coarticulation will tend to assimilate the target of V2 to V1 in that context.
This happens because each time a production target is formed, previously stored
exemplars influence the weighted averaging. Furthermore, the exemplar memory of a
given speaker is part of a network of interacting agents, each with their own exemplar
memory. If the phonetic perturbations occur with sufficient frequency across the
population, then memories of both self-generated and other-generated sounds will
feed into the sound change (cf. Oudeyer 2006a; Pierrehumbert 2004; Wedel 2004a).
Left unchecked, this will lead to partial contrast neutralization, and in the present
example, vowel harmony. What, then, opposes these tendencies?
Dispersion theories describe a formal approach to understanding the maintenance
and maximization of contrast, but these approaches do not explain how speakers
accomplish these things. There are two prominent dispersion theories we consider
here. The adaptive dispersion theory of Liljencrants and Lindblom (1972), cf. also
Lindblom (1986, 1990, 2003), models vowels as mutually repelling objects in a per-
ceptual space (e.g. a 2-D F1, F2 space), and models vowel system organization as an
optimization problem. In contrast, the constraint-based approach of Flemming (1996,
2004) employs three goals, implemented as constraints: minimize articulatory effort,
maximize the number of contrasts, and maximize the perceptual distinctiveness of
contrasts.
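The repelling-objects idea can be sketched as a small optimization; the energy function (summed inverse-square distances), the step size, and the unit square are illustrative choices, not the settings of Liljencrants and Lindblom's simulations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five vowels at random positions in a normalized F1 x F2 unit square.
n = 5
vowels = rng.uniform(0.1, 0.9, size=(n, 2))

# Gradient descent on a dispersion energy E = sum over pairs of 1/d^2.
lr = 1e-4
for _ in range(5000):
    grad = np.zeros_like(vowels)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = vowels[i] - vowels[j]
                r2 = float((d ** 2).sum())
                grad[i] += -2.0 * d / r2 ** 2       # gradient of 1/r^2
    vowels = np.clip(vowels - lr * grad, 0.0, 1.0)  # stay in the space

print(np.round(vowels, 2))  # the vowels end up dispersed toward the edges
```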
Both approaches have in common an appeal to a cognitive mechanism which
functions to make perceptual contrasts maximally distinct, and both require that this
mechanism coexists with factors that indirectly reduce perceptual distinctiveness. In
the case of V-V coarticulation, both theories correctly predict that in languages with
more vowels, those vowels will exhibit a lower degree of V-V coarticulation because
there is more pressure to maximize perceptual contrast (cf. Manuel and Krakow
1984; Manuel 1990, 1999; Magen 1989). However, adaptive dispersion and constraint-
based dispersion do not explain, nor purport to explain, how speakers implement
the repulsive forces or constraints in real time; rather, they describe patterns that are
fairly removed from individual speakers and utterances. In that regard, dispersion
theories fall short of describing how contrast is maintained. Experimental evidence
presented in the next section points to an alternative understanding of contrast
maintenance and maximization, one that utilizes a well-motivated motor planning
mechanism.

5.3 Experimental evidence of dissimilation in motor planning


Recent experimental work indicates that contemporaneously planned vowel and tone
targets are dissimilated. It is argued that these dissimilatory effects arise from an
inhibitory motor planning mechanism. The experimental methodology reported on
here, as well as the theoretical analysis of results, was inspired by studies of reaching
and oculomotor control which have probed the interaction between movements
planned simultaneously. In short, with numerous variations, the nonspeech studies
show the following: when movement A to one target location is prepared in the con-
text of planning a distractor movement B to a sufficiently different target location, then
the executed trajectory of movement A deviates away from the target of movement
B (cf. Sheliga et al. 1994; Doyle and Walker 2001; Van der Stigchel and Theeuwes
2005; Van der Stigchel et al. 2006; Welsh and Elliott 2005; Houghton and Tipper 1996;
Ghez et al. 1997). In addition, more salient distractors induce greater deviations away
(Tipper et al. 2000). As we will see, these experiments are relevant to understanding
analogous effects observed in speech.

5.3.1 Dissimilation between vowels in a primed shadowing task


Tilsen (2009b) reports dissimilation between the vowels /a/ and /i/ in a primed vowel-
shadowing task. In this paradigm, the subject hears a prime vowel, then after a delay
of several hundred milliseconds the subject hears a target stimulus, which is either
a vowel or a beep. There are three types of trials: concordant trials, in which the
prime and target vowels belong to the same phonemic category; discordant trials,
in which the prime and target vowels belong to different phonemes, and no-target
trials, in which the target is a beep. On the concordant and discordant trials, the
speaker shadows (repeats) the target vowel as quickly as possible. On the no-target
trials, the speaker produces the prime vowel as quickly as possible. In order to respond
quickly, the speaker must pre-plan the prime vowel on every trial. Hence on all trials,
the speaker first plans to produce either /a/ or /i/, but on one-third of the trials (the
discordant ones), the speaker subsequently produces the other vowel. Importantly, the
paradigm allows one to investigate speech target planning interactions in a V1-V2
sequence without the mechanical and motoric confounds associated with articula-
tion of V1.
Acoustic analyses comparing response vowel F1 and F2 on concordant and dis-
cordant trials revealed quasi-dissimilatory effects: /a/ responses after /i/ primes were
acoustically less similar to /i/ than were /a/ responses after /a/ primes, and vice versa
for /i/ responses. In other words, on discordant trials, /a/ and /i/ responses were
more peripheral in F1, F2 vowel space, as if dissimilated from the /i/ and /a/ primes,
respectively. Figure 5.1 shows normalized bivariate mean F1 and F2 95 per cent
confidence regions for productions on concordant and discordant trials. Formant
confidence regions for productions on concordant and discordant trials. Formant
trajectories were obtained using a Matlab implementation of a robust LPC algorithm,
and a dynamic formant tracking algorithm developed at the University of California,
Berkeley Phonology Laboratory. Formants were averaged over the middle third of
each vowel, and normalized within subjects. Each subject produced approximately
80-120 vowels in each of the conditions. The figure shows normalized values com-
bined across all twelve native speakers of American English (six male, six female) who
participated in the experiment.
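The two summary steps of that analysis, middle-third averaging and within-speaker normalization, can be sketched as follows. The chapter does not name the normalization method, so z-scoring is assumed here as one common within-speaker choice, and the formant track is invented.

```python
import numpy as np

def middle_third_mean(track):
    """Average a formant track (one value per analysis frame) over the
    middle third of the vowel, as in the analysis described above."""
    n = len(track)
    return float(np.mean(track[n // 3:2 * n // 3]))

def z_normalize(values):
    """Normalize one subject's formant means; z-scoring is assumed here,
    since the text does not specify the within-subject normalization."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

f1_track = np.linspace(640.0, 700.0, 30)   # a made-up 30-frame F1 track
print(middle_third_mean(f1_track))         # mean F1 over frames 10-19
```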
Figure 5.1 shows that discordant trial productions of /a/ had significantly higher
F1 than concordant trial productions. Discordant productions of /i/ had significantly
lower F1 and higher F2 than concordant trial productions. It should be noted that,
although not all subjects exhibited these patterns, dissimilation was the predominant
trend across the population. For more detailed information on the design, analysis,
and subject-specific variation, the reader should consult Tilsen (2009b). The 'dissim-
ilation' observed here should be understood in a literal, phonetic sense, entailing less
similarity. These dissimilatory effects, although relatively subtle, are fairly remarkable
in that they point to a mechanism that subtly alters a vowel target as a function of
other targets that are planned in parallel.

FIGURE 5.1 Comparisons of primed vowel shadowing responses on concordant and discor-
dant trials. Ellipses represent 95 per cent confidence regions for within-speaker normalized F1,
F2 bivariates averaged over the middle third of each response
5.3.2 Dissimilation between Mandarin tones


5.3.2.1 Methodology In most respects, the experimental design of the primed tone-
shadowing task reported here is identical to the primed vowel-shadowing design
described above and in Tilsen (2009b), with the following important differences.
Stimuli were the vowel [ai] with Mandarin Tone 1 (55 high), Tone 2 (35 rising), and
Tone 4 (53 falling). To construct stimuli, 100 samples of each tone were recorded by
a female native speaker of Mandarin. These tokens were subjected to automated f0
analysis (described below), and the ones most similar in f0 to the mean contours for
each set were selected as the experimental stimuli. The stimuli were windowed to 250
ms and amplitude-normalized. Participants were twelve native speakers of Mandarin
Chinese, ages 18-30. Each speaker participated in two one-hour sessions, and only
produced two of the three tones. There were four speakers for each combination of
tones. In instruction and practice phases, it was emphasized that subjects should
produce the correct tone, should avoid starting the response with one tone then
switching to the other, and should avoid producing the tones too rapidly.
In processing the data for analysis, responses were excluded which were initiated
early (i.e. with an RT of less than 150 ms after the onset of the target tone), initiated late
(with an RT more than 2.2 s.d. greater than the mean for each subject), or which were
duration outliers (more than ± 2.2 s.d. from the mean for each subject). f0 analysis
was conducted using a robust automated pitch tracking algorithm implemented in the
Voicebox Speech Processing Toolbox (Mike Brookes, Department of Electrical and
Electronic Engineering, Imperial College) for Matlab. Analysis frames of 10 ms were
used. For each subject and tone, f0 contours were normalized by linear interpolation or
compression to the median number of frames, and then unweighted moving-average
smoothing with a five-frame window was applied. Because subjects occasionally pro-
duce incorrect tones, or switch from one to another during the response, it is necessary
to identify such occurrences and exclude them from the analysis. To accomplish this,
average contours and first-difference (Δf0) contours were calculated for each target
tone. Then for each frame of each response, the number of standard deviations of f0
and Δf0 from the target and non-target averages were calculated. If more than fifteen
per cent of the frames in a response were outliers (f0 or Δf0 more than two s.d. away
from the target mean), or if there were more outliers relative to the target than the
non-target, the response was considered an errorful production or mis-analysis, and
was excluded. The total number of excluded responses was about 9.5 per cent of the
total number of responses.
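The contour normalization and the outlier screen just described can be sketched as follows. The simplified screen below checks only f0 against the target average (the Δf0 check and the target/non-target comparison are omitted), and all input values are invented.

```python
import numpy as np

def normalize_contour(f0, n_frames):
    """Resample an f0 contour (10 ms frames) to n_frames by linear
    interpolation or compression, then apply unweighted moving-average
    smoothing with a five-frame window."""
    x_old = np.linspace(0.0, 1.0, len(f0))
    x_new = np.linspace(0.0, 1.0, n_frames)
    resampled = np.interp(x_new, x_old, np.asarray(f0, dtype=float))
    return np.convolve(resampled, np.ones(5) / 5.0, mode="same")

def is_errorful(f0, target_mean, target_sd, max_outlier_frac=0.15):
    """Flag a response whose f0 lies more than two s.d. from the average
    target contour on more than fifteen per cent of frames."""
    z = np.abs(np.asarray(f0, dtype=float) - target_mean) / target_sd
    return float(np.mean(z > 2.0)) > max_outlier_frac

demo = normalize_contour([220, 231, 235, 228, 210, 192], 30)
print(demo.shape)   # (30,)
```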

5.3.2.2 Results Eight of the twelve subjects exhibited significant or marginally signif-
icant dissimilation on discordant trials compared to concordant trials. However, the
interpretation of dissimilation is sometimes ambiguous due to the dynamic nature
of f0 in contour tones. Figure 5.2 shows within-speaker comparisons of f0- and
duration-normalized tone contours for each of the three tone combinations. Average
concordant trial contours are shown with a solid line, average discordant trial con-
tours with a dotted line. Both contours are accompanied by 95 per cent confidence
standard error regions. Statistical tests comparing f0 on concordant and discordant
trials were conducted for the first, middle, and last third of each tone. Significant dif-
ferences (p < 0.05) are indicated with '*', marginally significant differences (p < 0.15)
are indicated with '+'.
Figure 5.2a shows results for subjects who produced Tone 1 (high) and Tone 2
(rising). Subjects s05, s15, and s06 show dissimilation in one or both tones, i.e. the
discordant contour for a given tone is less similar to the other tone than the concor-
dant contour. Subject s11 exhibits an anomalous average discordant trial contour, in
which the high tone responses appear to initially assimilate to the non-target rising
tone (which begins lower), and then subsequently dissimilate from the rising tone.
Since the non-target tone rises toward the end, it is possible to see the dissimilation
in Tone 1 as a form of assimilation to the rising pattern of Tone 2. In other words, the
similarity between tones can be assessed on the basis of relative f0 values, or on the
basis of a pattern of change in f0. However, this latter form of assimilation does not
appear to occur generally across the subject population.
Figure 5.2b shows results for Tone 1 (high) and Tone 4 (falling). Subjects s10 and
s12 exhibit dissimilatory patterns, while s08 and s14 exhibit assimilatory patterns.
Note that s14, who had the largest assimilatory pattern in the experiment, produced
anomalously short tones. The interpretation of dissimilation in s12 is based upon the
observation that the f0 in the final third of the falling discordant trial contour is further
away from the high tone contour than the concordant trial one. This is more suggestive
of dissimilation than the pattern produced by s08, for whom the discordant falling
tone both begins and ends lower than the concordant one. In the s08 case, the contour
is most readily viewed as the consequence of an assimilatory contour-wide lowering
of f0; in the s12 case, the relative fall in f0 in the final third of the falling tone is more
straightforwardly interpreted as a propensity to exaggerate the fall in f0.
Figure 5.2c shows results for Tone 2 (rising) and Tone 4 (falling). Subjects s03,
s07, and s09 exhibit a dissimilatory pattern in one of the tones. Subject s13 exhibited
no differences between the discordant and concordant conditions for either tone.
Subjects s07 and s09 tended to dissimilate Tone 2 from Tone 4 on discordant trials
by lowering f0; the effect was highly significant for s07, but marginally significant for
s09 and localized to the middle third of the contour. The dissimilation observed in s03
is of the sort identified in s12, where the final third of the falling contour falls lower on
discordant trials, making it less similar to the rising pattern of the non-target rising
Tone 2.
Table 5.1 shows mean duration and RT data by subject, for each tone-concordance
condition. There were no significant differences in duration or RT between con-
cordant and discordant trials. One subject, s07, appears to have responded anoma-
lously slowly compared to the others. The absence of any effects of discordance on
FIGURE 5.2 Within-speaker comparisons of f0- and duration-normalized tone contours for
each of the three combinations of Mandarin Tone 1 (H), Tone 2 (LH), and Tone 4 (HL). Average
concordant trial contours are shown with a solid line, and average discordant trial contours are
shown with a dotted line. Statistical tests comparing f0 on concordant and discordant trials
were conducted for averages taken over the first, middle, and last third of each tone. Significant
differences (p < 0.05) are indicated with '*', marginally significant differences (p < 0.15) are
indicated with '+'. For each tone combination, all panels employ the same normalized f0 and
duration scales
FIGURE 5.2 Continued

TABLE 5.1 Mean durations and RTs for each tone and concordance condition

                            Tone A                            Tone B
                Tone A-B    concordant      discordant        concordant      discordant
                            mean (s.d.)
DUR. (s)   s05     1-2      0.321 (0.029)   0.316 (0.029)     0.314 (0.027)   0.311 (0.028)
           s06     1-2      0.137 (0.019)   0.135 (0.020)     0.142 (0.019)   0.138 (0.018)
           s11     1-2      0.272 (0.027)   0.269 (0.032)     0.241 (0.028)   0.247 (0.023)
           s15     1-2      0.341 (0.027)   0.342 (0.030)     0.327 (0.025)   0.334 (0.029)
           s03     1-4      0.325 (0.028)   0.332 (0.028)     0.310 (0.037)   0.312 (0.040)
           s07     1-4      0.351 (0.029)   0.373 (0.034)     0.409 (0.042)   0.394 (0.040)
           s09     1-4      0.289 (0.023)   0.296 (0.023)     0.271 (0.035)   0.273 (0.034)
           s13     1-4      0.276 (0.030)   0.277 (0.031)     0.218 (0.031)   0.226 (0.025)
           s08     2-4      0.278 (0.063)   0.287 (0.062)     0.245 (0.057)   0.242 (0.056)
           s10     2-4      0.260 (0.032)   0.274 (0.033)     0.239 (0.021)   0.238 (0.025)
           s12     2-4      0.270 (0.024)   0.275 (0.022)     0.263 (0.024)   0.255 (0.021)
           s14     2-4      0.165 (0.017)   0.160 (0.013)     0.152 (0.012)   0.146 (0.011)
RT (s)     s05     1-2      0.434 (0.065)   0.417 (0.064)     0.400 (0.058)   0.428 (0.062)
           s06     1-2      0.304 (0.062)   0.295 (0.069)     0.298 (0.068)   0.307 (0.067)
           s11     1-2      0.231 (0.074)   0.230 (0.076)     0.228 (0.071)   0.221 (0.072)
           s15     1-2      0.245 (0.066)   0.249 (0.062)     0.247 (0.056)   0.253 (0.064)
           s03     1-4      0.262 (0.088)   0.268 (0.088)     0.266 (0.082)   0.269 (0.089)
           s07     1-4      0.513 (0.052)   0.510 (0.051)     0.505 (0.055)   0.521 (0.049)
           s09     1-4      0.284 (0.072)   0.281 (0.061)     0.285 (0.065)   0.284 (0.067)
           s13     1-4      0.294 (0.077)   0.299 (0.081)     0.288 (0.085)   0.283 (0.074)
           s08     2-4      0.423 (0.053)   0.412 (0.048)     0.394 (0.046)   0.410 (0.050)
           s10     2-4      0.304 (0.083)   0.307 (0.080)     0.292 (0.078)   0.286 (0.071)
           s12     2-4      0.307 (0.044)   0.308 (0.043)     0.304 (0.047)   0.308 (0.042)
           s14     2-4      0.346 (0.059)   0.344 (0.059)     0.340 (0.058)   0.343 (0.071)
duration or RT indicates that the dissimilatory f0 patterns cannot be interpreted as
indirect consequences of differences in reaction time or tone duration between the
two conditions.

5.4 Discussion
To summarize, a majority of subjects exhibited dissimilation on discordant trials, in
at least one of the tones. However, substantial inter-subject variability was observed
in this regard, along with instances of assimilatory patterns. Section 5.4.1 will address
the potential sources of this variability. Section 5.4.2 will argue that the dissimilatory
patterns arise from an inhibitory motor planning mechanism, and section 5.4.3 will
explain how this inhibitory mechanism may be responsible for the maintenance and
maximization of contrast.

5.4.1 Intersubject variability


Not all subjects exhibited dissimilation in both responses of the tone-shadowing task,
and some of them produced assimilatory patterns. This variation is consistent with
the results of primed vowel-shadowing in Tilsen (2009b), and does not negate the
importance of the dissimilatory behavior. If one views the mechanism of dissimilation
as learned, or perhaps, as innate but modulated by context and experience/learning,
then one should expect speaker-specific variation in its effects. The mere presence
of the dissimilation in some speakershere the majoritybegs for an explanation.
Moreover, there are a number of factors which may mask the output of the dissimila-
tory mechanism in some cases.
For one thing, there may be ceiling or floor effects attributable to f0 register. Some speakers may not normally produce f0 above or below a certain range; thus where a dissimilatory pattern would raise or lower f0 beyond that range, no dissimilation is produced. This could account for the near-absence of dissimilation in the initial third of Tone 4 (falling), since this tone tends to begin at the top of the normal f0 range.
Stimuli and speaker gender may also have an influence on dissimilatory behavior,
although the current design was not well-suited to analysis of such effects. It is also
possible that variation results from differences in attention to the task. In Tilsen
(2009b), subjects who produced assimilatory patterns either responded abnormally slowly or with high error rates, indicative of a lack of focus; here, however, no such correlation was observed.
It is important to consider why dissimilatory patterns are not generally observed
in paradigms where speakers execute both elements of a sequence. For example, in
studies of articulated VCV sequences or tonal coarticulation (e.g. Shen 1990; Xu 1997; Gandour et al. 1994), assimilatory patterns are by far the predominant ones. This
is presumably because mechanical factors, gestural overlap, or other sources of assimi-
latory coarticulation tend to overwhelm the dissimilatory effects of contemporaneous
target planning. The primed-shadowing task circumvents these effects by inducing the speaker to plan, but not articulate, the first element of the sequence. The
assimilatory patterns in fully articulated sequences are, like dissimilation in primed
vowel-shadowing, only tendencies. There is, indeed, one study that has reported a
dissimilatory effect between articulated vowels: Fletcher (2004) found a slight dissim-
ilation between /a/ and /i/ in Southern British English /a kaki/ sequences, particularly
for one subject. Furthermore, on a token-by-token basis, dissimilation is still observed
in articulated sequences, and the extent to which assimilation or dissimilation occurs
in natural speech, where various phonological, prosodic, semantic, and discourse
factors are uncontrolled, is not well known. One should not conclude, just because
assimilation is the tendency observed in the lab, that only assimilation occurs outside
the lab.

5.4.2 Dissimilation is caused by inhibitory mechanisms


Inhibitory mechanisms have been broadly implicated in the control of sequential
movement, and are necessary for understanding how action sequences are performed
when actions are planned contemporaneously. Lashley (1951), on the basis of obser-
vations of anticipatory and perseveratory errors in movement sequences, argued that
plans associated with each element in a sequence must be activated in parallel. Par-
allel activation has found experimental support in studies of prepared movement
sequences, for example in a series of experiments conducted by Sternberg et al.
(1978, 1988). They showed that the number of syllables and number of interstress
units (or feet) in an utterance have independent, additive effects on the latency to
initiate the utterance. Similar results were obtained for typing, and in a related speech
paradigm by Wheeldon and Lahiri (1997, 2002). The theoretical interpretation Stern-
berg and colleagues offer for these findings is as follows. Prior to the initiation of
an utterance, all action units are active in working memory. All but the first unit
must be suppressed to initiate the utterance. Hence the more units there are, the
longer it takes to inhibit the non-initial ones and begin the first one. The concept
of competition between movement plans activated in parallel has been modeled as
competitive queuing in neural networks (Grossberg 1978; Bullock and Rhodes 2003;
Bullock 2004).
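The core of the competitive-queuing idea can be made concrete in a few lines. The sketch below is illustrative only, assuming a simple activation gradient with winner-take-all selection and self-inhibition; the syllable labels and activation values are invented and do not come from the cited models.

```python
import numpy as np

def competitive_queuing(activations, labels):
    """Produce a sequence from plans that are all active in parallel.

    On each cycle the most active plan wins the competition and is
    executed, then suppressed (inhibited) so that the next-most-active
    plan can surface.  All values here are illustrative.
    """
    act = np.array(activations, dtype=float)
    sequence = []
    while np.any(act > 0):
        winner = int(np.argmax(act))   # strongest plan is produced next
        sequence.append(labels[winner])
        act[winner] = 0.0              # executed plan is inhibited
    return sequence

# A hypothetical three-unit utterance: earlier units start out more
# active, so the activation gradient encodes serial order in parallel.
print(competitive_queuing([0.9, 0.6, 0.3], ["ba", "di", "gu"]))
# -> ['ba', 'di', 'gu']
```

In this scheme, the more plans are co-active, the more competition must be resolved before the first element can be produced, which is one way of reading the latency effects described above.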
The dissimilation of movement targets and trajectories from competing ones has
been theoretically related to inhibition. Houghton and Tipper (1996) and Tipper et al.
(2000) argue that deviation away from distractors is the result of selective inhibition
of motor plans associated with the distractor. In this view, the trajectories and targets
of movements are represented by activity patterns in overlapping populations of neu-
rons. In order to move to one target, other movement plans that are simultaneously
active in working memory must be selectively inhibited. Moreover, because the neu-
ral populations encoding for these plans overlap to some degree, inhibition of one
population can have an effect on the population encoding the target movement.

FIGURE 5.3 Simulation of the effects of intergestural inhibition on concordant and discordant trials with an /i/ target. Stage 1 shows excitation functions after the prime vowel. Stage 2 shows excitation and intergestural inhibition functions after the target stimulus. Stage 3 shows the activation function from which a production target is derived. For comparison, the concordant (black •) and discordant (white ○) centers of activation are shown in both activation functions

Dissimilatory patterns observed in primed vowel- and tone-shadowing can be understood to arise from intergestural inhibition in the context of an exemplar theory
of production. Figure 5.3 uses schematized planning stages to model the effect of
inhibition in a vowel-shadowing task. The figure compares /i/-exemplar activation
in F1, F2 space on a concordant (left) and discordant (right) trial. Since each vowel is associated with many exemplars, it is reasonable to approximate the excitation of exemplars having any particular pair of F1 and F2 values with a smooth function. In this case a bivariate Gaussian is used, though the concept generalizes to any relatively smooth function. The excitation function, minus any inhibition, constitutes a net activation function which can be viewed as a probability that a given F1, F2 bivariate target will be produced. In Stage 1, after the prime vowel stimulus, Figure 5.3 shows
that the excitation function is substantially greater for prime vowel exemplars than
for non-prime exemplars. Since the probability of producing the prime is 2/3, the
summed excitation of prime vowel exemplars is twice the excitation of the non-prime
exemplars.

In Stage 2, when the target is known, intergestural inhibition is applied and the
target exemplars are fully excited. The inhibition function, shown to the right of the
Stage 2 excitation function, is modeled as a bivariate Gaussian located on the center of
mass/activation of the non-target excitation function for /a/-exemplars. There are two
important aspects of the inhibition. First, inhibition of the non-target /a/-exemplars
is greater on the discordant trial than on the concordant trial. This is justified by
the observation that more salient distractors produce stronger dissimilatory effects
(Tipper et al. 2000). In other words, more inhibition is necessary on the discordant
trials because the non-target prime was more highly excited. Second, the inhibition
function is non-zero throughout the region of F1, F2 space where the target /i/-exemplars are located, and crucially, the inhibition is greater on the side of that region closer to the non-target /a/-exemplars. From these two characteristics, it follows that the center of mass of the activation function (excitation minus inhibition, shown in Stage 3) is shifted further away from the non-target on the discordant trial, compared to the concordant trial. In Stage 3, the concordant (black •) and discordant (white ○) centers of activation are shown in both activation functions, for purposes of comparison. The F1, F2 difference between discordant and concordant trials is about [-30,
55] Hz. Model equations and further details of implementation can be found in Tilsen
(2007).
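For readers who want the Stage 1-3 computation spelled out, the following is a minimal numerical sketch of the excitation-minus-inhibition scheme just described. It is not Tilsen's (2007) implementation: the grid, Gaussian parameters, and inhibition gains are invented for illustration, and only the qualitative behavior (a larger dissimilatory shift on discordant trials) is meant to carry over.

```python
import numpy as np

# F1 x F2 grid (Hz); ranges and all parameters below are illustrative
f1 = np.linspace(200, 900, 141)
f2 = np.linspace(800, 2600, 181)
F1, F2 = np.meshgrid(f1, f2, indexing="ij")

def gauss(mu1, mu2, s1, s2, gain=1.0):
    """Bivariate Gaussian over the F1 x F2 grid (no covariance term)."""
    return gain * np.exp(-0.5 * (((F1 - mu1) / s1) ** 2 + ((F2 - mu2) / s2) ** 2))

def center_of_mass(act):
    act = np.clip(act, 0.0, None)          # negative activation is not produced
    w = act / act.sum()
    return (F1 * w).sum(), (F2 * w).sum()

excitation_i = gauss(300, 2200, 60, 150)   # Stage 2 excitation of /i/ exemplars

# Stage 2 inhibition: centred on the non-target /a/ region, broad enough
# to be non-zero under the /i/ region, and stronger on discordant trials
for trial, inhib_gain in [("concordant", 0.3), ("discordant", 0.6)]:
    inhibition = gauss(700, 1200, 300, 800, gain=inhib_gain)
    activation = excitation_i - inhibition  # Stage 3: excitation minus inhibition
    c1, c2 = center_of_mass(activation)
    print(f"{trial}: production target at ({c1:.0f}, {c2:.0f}) Hz")
# The discordant target lies at lower F1 and higher F2 than the
# concordant one, i.e. it is shifted away from the /a/ distractor.
```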
A more complicated version of this model would treat a larger number of phonetic
variables, as well as dynamical aspects of speech targets. After all, vowel formants
are often dynamic, and Mandarin tones exhibit substantial change over time; this
must be incorporated into target planning and should therefore be subject to dis-
similation. Exemplar theory allows for modeling time as an additional dimension of
exemplar space (cf. Johnson 1997b), so that memories incorporate spectrotemporal
information. Hence the model proposed above should be generalizable to higher-
dimensional exemplar spaces with a temporal dimension. It is also noteworthy that
the model does not require one to commit to representation in either perceptual
or motoric coordinate space. Acoustic coordinates were used here for expository
purposes only.

5.4.3 Intergestural inhibition, coarticulation, and contrast


In the context of an exemplar model, intergestural inhibition can function to maintain
contrast and maximize the use of a phonetic space. Consider once more the phonol-
ogization of V-to-V coarticulation to vowel harmony. First, the dissimilatory effect of
intergestural inhibition to some extent opposes assimilatory coarticulation between V1 and V2 by subtly dissimilating the target of V2 from V1, and perhaps vice versa.
On average, the tendency in VCV sequences appears to be assimilatory coarticulation,
due perhaps to some combination of mechanical factors and gestural overlap. This
suggests that these factors tend to outweigh the effects of intergestural inhibition. Over
time, if unconstrained, this situation could lead to loss of contrast, i.e. phonologization
of vowel harmony.
However, the inhibition model also predicts that as V1 and V2 exemplar distributions shift closer in phonetic space, the strength of intergestural inhibition will become
greater on the region of V2 exemplar space (this follows as long as the inhibition
function remains constant over time). In other words, closer targets are more strongly
dissimilated. In some cases, this stronger inhibition will not dissimilate the target of
V2 enough to prevent loss of contrast, but in other cases, the dissimilation may be
strong enough to do so. The exemplar distribution in the latter case will come to reflect
a balance between the assimilation from coarticulatory forces and the dissimilation
from inhibitory ones. This balance is precisely what is described by dispersion theo-
ries. Indeed, intergestural inhibition can be seen as a mechanism through which the
speaker attempts to maximize contrast on an utterance-by-utterance basis. Whether
or not a relatively stable balance occurs in any given language is likely to depend on
many factors, particularly on the vowel and consonant inventories of a language and the co-occurrence frequencies of the units in VCV sequences. Ultimately, what intergestural
inhibition provides is a real-time, utterance-anchored mechanism for maintaining
and maximizing contrast. Contrast is never fully maximized because highly variable
coarticulatory forces are always influencing the exemplar distribution, but dispersion
theories likewise do not predict that a phonetic space is actually maximally used; they only posit a tendency toward this.
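The balance just described can be illustrated with a one-dimensional toy simulation in which the V2 target drifts under a constant assimilatory pull and a distance-dependent dissimilatory push. All constants below are invented; the point is only that the two forces can settle at a stable, non-zero distance.

```python
import numpy as np

# One-dimensional toy: the mean of the V2 exemplar cloud drifts under two
# opposing forces.  All constants are invented for illustration.
v1, v2 = 0.0, 2.0   # fixed V1 target and initial V2 target (arbitrary units)
k_assim = 0.08      # pull of assimilatory coarticulation toward V1
k_inhib = 0.25      # strength of intergestural inhibition
width = 1.0         # spread of the (fixed) inhibition function

for generation in range(200):
    d = v2 - v1
    assimilation = -k_assim * d                            # always toward V1
    # inhibition pushes V2 away, and grows as the targets approach
    dissimilation = k_inhib * np.sign(d) * np.exp(-(d / width) ** 2)
    v2 += assimilation + dissimilation

print(f"equilibrium V1-V2 distance: {abs(v2 - v1):.2f}")
```

With these (arbitrary) settings the system settles at a distance of roughly one unit: contrast is maintained, but the space is not maximally used, just as the text describes.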
Hence intergestural inhibition is not a priori mutually exclusive with perceptual
dispersion or perceptual correction. It can be seen in two ways, either as operating
alongside perceptual mechanisms, or as the underlying basis for them. It is also
reasonable to see inhibition both as an intrinsic aspect of how working memory
operates and as something modulated by experience. Whenever articulatory plans
are brought into working memory, the serial ordering of those plans is accomplished
by interacting excitatory and inhibitory processes; the production of one articulation
requires the simultaneous suppression of others, yet the extent to which inhibition
is exerted between plans is inferred and learned from the linguistic experience of a
speaker.
One problem with dispersion theories is that they lack an account of how articu-
latory targets are planned so as to maximize perceptual contrast. These theories hold
that the speaker, for functional reasons, produces sounds that maximize perceptual
contrast. However, there is limited evidence for a real-time perceptual dispersion
mechanism. The most suggestive evidence to date is the hyperspace effect reported
in Johnson, Flemming, and Wright (1993a) and in Johnson (2000). In Johnson et al. (1993a), listeners identified the 'best' examples of a range of synthetic vowel stimuli
as the ones that were more peripheral than their own productions. The source of this
difference can be interpreted as a consequence of target undershoot in production, or
as the result of an active perceptual process. An alternative account of the hyperspace
effect is suggested by mounting evidence that the perception of a sound involves simulation of the corresponding motor activity that speakers would use to produce the
sound themselves. It is well-established that activity in cortical premotor and motor
regions, via the mirror system, accompanies the perception of actions (including the
production of speech sounds), and that this motor activity plays an important role in
accurate and quick perception (D'Ausilio et al. 2009; Watkins et al. 2003; Fadiga et al. 2002; Pulvermüller et al. 2006; Rizzolatti and Craighero 2004; Galantucci et al. 2006;
Gallese et al. 1996). Hence, the 'best' example of a speech sound may correspond to
the motor simulation which involves maximal inhibition of related speech targets. In
other words, the best /i/ target would be the one formed when other vowel sounds
are maximally suppressed, and hence, the most dissimilated /i/ target. This reasoning
could extend to the selection of 'as you say it' examples, and to the stimuli which were
used to avoid consonantal context and talker-unfamiliarity confounds in Johnson
(2000). In sum, the hyperspace effect could very well involve a perceptuo-motor
mechanism which relies heavily on intergestural inhibition in the motor simulation
of sensory stimuli.
Worthy of mention is an alternative account of perceptual correction that involves
lower-level perception, advocated by Holt, Lotto, and Kluender (2000). They suggest
that on very short timescales, a general mechanism of neural adaptation to a perceptual
stimulus results in a subsequently diminished perceptual response to acoustically
similar stimuli. It is likely that both low-level perceptual adaptation of this sort, and
higher-level inhibitory interactions associated with more categorial speech percepts,
are involved in perceptual compensation.
In sum, intergestural inhibition in motor planning is important for understanding
what limits assimilatory coarticulation and its phonologization. The effects of inhibi-
tion are manifest on two timescales: in the real-time planning of speech targets, and
indirectly on a diachronic timescale by virtue of dissimilating exemplar distributions.
Perceptual dispersion can be seen as a pattern emerging from intergestural inhibition, exemplar memories, and interacting agents, as opposed to a cognitive mechanism in
and of itself. It is of course possible to see the production-perception interaction in a
causal loop, whereby the diachronic selection of more contrastive sounds reinforces
the extent of motor planning dissimilation. Ultimately, we must conclude that there
is another domain of constraints on phonological change that is neither strictly per-
ceptual nor strictly articulatory. Such constraints arise not from perceptual discrim-
inability, nor from physical forces or temporal overlap of articulations, but rather,
from cognitive mechanisms governing the planning, suppression, and execution of
sequential movement.
6

Developmental perspectives
on phonological typology
and sound change
CHANDAN NARAYAN

6.1 Introduction
The relationship between first language acquisition and phonologization lies at the
crossroads of developmental psychology and historical phonology, disciplines not often considered in the same breath when addressing the nature of sound patterns
and change. Despite these traditional boundaries I believe that this combined research
program can make significant contributions to a more nuanced understanding of why
sound systems look the way they do and change in particular directions. The present
chapter deals with the relationship between the earliest stages of language acquisition
and the shape of phonological systems and phonological processes including sound
change. The term 'developmental' encompasses both the dynamic nature of the cogni-
tive mechanisms underlying infants' and very young children's emerging organization
of their acoustic-phonetic environment as well as the nature of the linguistic environ-
ment itself. Of particular significance here is the potential contribution of develop-
mental processes (infant speech perception and caregiver speech production) to the
phonologization of acoustic variance in the input. The scope of these developmental
contributions is not limited to the infant and her abilities but also characteristics of
the unique register used by caregivers when interacting with infants. This research
program asks two questions:

1. Is there a relationship between patterns in developmental speech perception and the relative rarity of sounds in the phonological systems of the world's languages?
2. What is the role of caregiver-infant interactions in providing acoustic conditions
which could potentially lead to sound change?

My approach to these questions is guided throughout by the notions that phonological inventories reflect, to some degree, sufficient acoustic-perceptual salience between
contrasts (Liljencrants and Lindblom 1972; Lindblom and Maddieson 1988; de Boer 2001; Oudeyer 2006b; Narayan 2008), that misperception can lead to sound change
(Ohala 1981), and that the adaptive nature of speech production (Lindblom 1990) has
the potential to create less-than-ideal learning situations for infants. I argue (1) that those contrasts for which infants require language experience in order to discriminate are those that are rare in phonological systems and often the targets of change, and (2) that the relationship between developmental speech perception performance and the shapes of phonological systems is mediated by infants' initial psychoacoustic biases
and the acoustic salience of the contrasts in question. With respect to sound change,
it must be made clear, at the outset, that the present discussion does not deal with
the diffusion of sound change across a community of learners (children), but rather
with what some have termed the 'seeds' of sound change (Hombert et al. 1979). In
examining the structured variability inherent in the acoustic signal available to very
young children (e.g. Foulkes et al. 2005) and the perceptual biases children bring to the
language-learning table, I explore possible developmental influences on the directions
of phonologization.

6.1.1 Children's productive phonology


History has provided us with numerous examples of how linguists have approached
the connection between development, phonological typology, and sound change
(Herzog 1904; Sweet 1913; Baudouin de Courtenay 1895 [1972a]; Grammont 1933;
Jakobson 1968; Jakobson and Waugh 1979). The approach in much of this litera-
ture examines performance factors in development rather than children's compe-
tence. That is, linguists have asked whether children's emerging productive phonology
resembles well-known sound changes. For the most part, the answer to this question
has been no, for the types of phonological changes that are reflected in children's
productions are too varied to be reflections of real, phonetically motivated sound
changes. An early appraisal of this state of affairs is provided by Baudouin de Courtenay (1972a):
... when the child has not yet begun to talk but is already aware of the properties of the native
language and can understand it within certain limits, that is, when the child has reached the
state of advanced audition and perception, but without phonation, there naturally cannot be
any question of neophonetic alternations or divergences, since these depend on individual
pronunciation.

Others have viewed the relationship between children's productive phonology and
phonological change more sympathetically and directly (e.g. Labov 1989). Grammont
(1933) suggests that children's productions are a 'microcosm' of historical change,
while Stampe went a step further in suggesting that children are the prime agents
in phonological change (Stampe 1972). Greenlee and Ohala (1980) argue that both
children and adults are responsible for the type of phonetic variation that can lead to
sound change. Under the rubric of Ohala's misperception-based sound change (Ohala
1981), where physical constraints on articulatory and perceptual dynamics lead to the
phonologization of variation, Greenlee and Ohala (1980) outline the shifts of child
phonology that are similar to diachronic processes (e.g. French Ṽ > Vŋ in French loans in Vietnamese and children learning French).
More recently, linguists have suggested that the relationship between child phonol-
ogy and historical phonology should be played down precisely because 'typical or
potential sound changes' do not match observed phonological states in children's
production (Kiparsky 1988). Blevins (2004) argues that the mismatch between chil-
dren's productive phonology and sound changes (i.e. the types of production mistakes
made by children do not always look like typical sound changes) is a non-problem,
as the enterprise of child phonology does not necessarily assess competence (in the
form of perceptual acuity) but rather performance factors very likely governed by
physiological development (see also Hale and Reiss 1998). She outlines children's
productions, described in terms of phonological rules, as falling under two categories: those resulting from immature articulatory development, or, secondly, true 'mini-sound changes' which may spread through a community of speakers. As Blevins and
others have argued, the problem with looking to children's productive phonology for
clues to directions in phonologization is that articulatory and perceptual capacities
in the first few years of life mature along differing time scales, with motor control
and oral tract development lagging behind the shaping of perceptual competence.
At the earliest stages of language acquisition, production is not necessarily a reflec-
tion of competence (perceptual discrimination and categorization), with perceptual
acuity becoming honed well before infants' production of their first word at around
twelve months.
The clearest demonstration of the connection, and perhaps influence, of children's
productions and phonological phenomena can be seen in typological inventories.
Sound change aside, linguists have recognized the connection between the age of
productive acquisition of phonologically relevant phones and the relative rarity of
these sounds in phonological systems (Ferguson 1973). In general, age of successful
production can be described as exponentially related to frequency of occurrence in
the world's sound systems, that is, the rarer the consonant, the later its productive acquisition.¹ Figure 6.1 plots the age of productive 'mastery'² of consonants
¹ While there is certainly a relationship between the frequency of occurrence and the emergence of certain phonological structures (see Levelt et al. 1999; Demuth and Johnson 2003; Rose 2009), the relationship between accurate production of individual phones and the frequency of those phones in the ambient language of the child is less clear than the overall typological frequency across languages. Table 6.3 (p. 146 below) provides a table of the frequency of consonants in the Brent corpus of infant-directed speech (Brent and Siskind 2001).
² Mastery of English consonants in Templin's (1957) study is described as 75 per cent accuracy, while a stricter criterion of 90 per cent is used in Hua and Dodd's (2000) and Amayreh and Dyson's (1998) studies.

FIGURE 6.1 Age of production mastery according to frequency in the UPSID (Maddieson
1984) in American English (Templin 1957), Putonghua (Hua and Dodd 2000), and Jordanian
Arabic (Amayreh and Dyson 1998). Values are jittered within each year

by American English-, Putonghua Chinese-, and Jordanian Arabic-learning children against the consonants' frequency of occurrence in the UPSID (Maddieson 1984).
The plots show a general cross-linguistic trend with simple, oral obstruents being
produced very early in productive development, intermediate acquisition of voice-
less sibilants, and late production of interdentals, dorsals, affricates, and obstruents
with secondary articulations (e.g. pharyngealized stops in Arabic). This relationship
suggests that languages are less likely to exhibit sounds which are articulatorily more
difficult for children (as measured by the relative lateness of their mastery).
We turn next to the flip side of the production/typology connection. In the next sec-
tion, I ask whether typological generalizations can be derived from the performance
of infants in speech perception tasks, that is, is there a relationship between the types
of contrasts that infants fail to discriminate and contrasts that are rare in the world's
sound systems?

6.2 Infant speech perception and phonological typology


The literature on infant speech perception presents remarkable evidence of the
capacity for infants to discriminate non-native phonetic contrasts across a host of
genetically unrelated languages. Infants as young as one month have been shown
to discriminate non-native contrasts that adults find difficult to discriminate (Eimas et al. 1971; Trehub 1976). For example, Trehub (1976) showed that English-hearing
infants aged 5-17 weeks successfully discriminated a French oral-nasal vowel contrast ([pa]-[pã]) and a Czech fricative contrast ([za]-[řa]). When English-speaking
adults were asked to discriminate the Czech contrast, they performed at chance
levels. This phenomenon is perhaps best captured by the work of Werker and col-
leagues, who showed that discrimination of non-native contrasts follows a distinc-
tive developmental pattern. Werker and Tees (1984) showed that English-learning
infants aged 6-8 months discriminated non-native consonant contrasts (Hindi voiceless dental/retroflex [t̪a]-[ʈa]; Hindi dental voiceless aspirated/breathy voiced [t̪ʰa]-[d̪ʱa]; Nlaka'pamux velar-uvular ejective [k'i]-[q'i]). By 10-12 months, however, English-learning infants failed to reliably show discrimination of the same contrasts. At 10-12 months, infants from both Hindi-speaking and Nlaka'pamux-speaking homes discriminated their native contrasts. This same pattern of perceptual reorganization was subsequently found for other consonants as well (e.g. /ɹ/-/l/ with Japanese- and English-learning infants, Tsushima et al. 1994). The perceptual reorganization has
been shown to occur earlier for vowels, with infants' non-native discrimination abil-
ities declining by six months (Kuhl et al. 1992; Polka and Werker 1994). Results like
these, showing the effects of experience, led infant speech perception research to
converge on the idea that infants come into the world with language-general speech
perception, which becomes tuned to the particular phonetic characteristics of their
native language by the end of their first year. While the generalization that infants are
born citizens of the world is compelling from a neural plasticity point of view (e.g. Hut-
tenlocher 2002), it is certainly not the complete picture of the nature of infant speech
perception.
Infants' performance on speech perception tasks generally lies on a cline of more
or less discriminability, with developmental profiles being mediated by the inter-
section of innate perceptual bias, psychoacoustic salience, and language experience
(Aslin and Pisoni 1980; Narayan et al. 2010). Recent work has demonstrated that
certain contrasts follow a path of facilitation, whereby initially poor discrimination
is enhanced with native language experience (Polka et al. 2001; Kuhl et al. 2006;
Narayan et al. 2010). The facilitation of discrimination highlights the fact that initial
speech perception abilities are poor (or undetectable using behavioral methods) for
certain contrasts; not all phonetic contrasts are perceptually equal for the infant and
the relative difficulty is mediated by acoustic salience. I outline below some instances
in the infant speech perception literature that reflect a connection between relatively
poor discrimination, acoustic salience, and typological frequency.

6.2.1 Case studies


Nasal place of articulation. In Narayan (2008) I examined the relationship between
nasal-place acoustics and nasal place typology in the world's languages. In general,

FIGURE 6.2 Proportion RMS energy change from nasal murmur to post-nasal vowel in Bark 5-7 and 11-14 for [ma] (•), [na] (*), and [ŋa] (○) in Filipino. Used with permission from Narayan (2008)

languages are more likely to exhibit a two-way /m/-/n/ contrast than a three-way /m/-/n/-/ŋ/ contrast in syllable-initial position (Maddieson 1984; Anderson 2008). I argued, based on static (F1 × F3 frequencies at the onset of the NV transition) and dynamic acoustic properties (RMS energy change from nasal murmur to V) of the three nasal places in Filipino (Figure 6.2) and corresponding discrimination tests with adult Filipino-speaking listeners, that the acoustic-perceptual salience of the /m/-/n/ distinction is more robust than that of /n/-/ŋ/. Both static and dynamic acoustic measurements showed better classification (with discriminant analyses) of the /m/-/n/ and /m/-/ŋ/ distinctions than of the /n/-/ŋ/ contrast, where tokens showed significant overlap along the critical acoustic dimensions. Consequently, the /n/-/ŋ/ distinction is disproportionately affected by adverse listening conditions. In the noisiest listening condition (~5 dB SNR), discrimination of the [na]-[ŋa] contrast fell to chance while discrimination of both [ma]-[na] and [ma]-[ŋa] remained near ceiling.
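The discriminant-analysis logic at work here can be sketched as follows. The feature values below are fabricated stand-ins for the static and dynamic cues (they are not the Filipino measurements); the sketch only reproduces the qualitative asymmetry, with /m/-/n/ and /m/-/ŋ/ well separated and /n/-/ŋ/ overlapping.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def tokens(mean, n=60):
    """Fabricated tokens in a 2-D cue space (stand-ins for the static
    and dynamic nasal-place measurements)."""
    return rng.normal(mean, 1.0, size=(n, 2))

# /m/ far from /n/ and /ng/; /n/ and /ng/ heavily overlapping
X = np.vstack([tokens([0.0, 0.0]), tokens([4.0, 3.0]), tokens([4.8, 3.6])])
y = np.repeat(["m", "n", "ng"], 60)

for a, b in [("m", "n"), ("m", "ng"), ("n", "ng")]:
    mask = np.isin(y, [a, b])
    lda = LinearDiscriminantAnalysis().fit(X[mask], y[mask])
    acc = lda.score(X[mask], y[mask])   # training accuracy, for illustration
    print(f"{a}-{b}: {acc:.2f}")
# -> near-perfect classification for m-n and m-ng, much worse for n-ng
```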
In a follow-up study I examined the perception of nasal place contrasts in Filipino- and English-learning infants (Narayan et al. 2010) using the Visual Habituation technique.³ Following on from the typological and acoustic-perceptual results from Narayan (2008), the [na]-[ŋa] contrast proved difficult for both groups of infants. English-hearing infants at 10-12 and 6-8 months discriminated the acoustically robust and typologically common [ma]-[na] contrast. English-learning infants did not reliably discriminate the acoustically fragile [na]-[ŋa] contrast, even at 6-8 months, an age when other non-native (oral) consonant contrasts are successfully
discriminated. Even very young English-learning infants (4-5 months) were unable to discriminate [na]-[ŋa] while they successfully discriminated the acoustically robust [ma]-[na] contrast. When 10-12- and 6-8-month-old Filipino-learning infants' discrimination of the native [na]-[ŋa] contrast was tested, only the older group showed discrimination.
³ See Werker et al. 1998 for details regarding infant speech perception methods.
The results from Narayan et al. (2010) are suggestive of a role for acoustic salience in developmental speech perception. The [na]-[ŋa] contrast, which is acoustically fragile (relative to the robust [ma]-[na] contrast), is poorly discriminated in early infancy and only successfully discriminated with appropriate language experience by the end of the first year. I would suggest that infants' difficulty discriminating the perceptually similar syllable-initial [n]-[ŋ] contrast contributes to the typological restrictions on nasal onsets and the directions of sound change patterns observed in nasals in the world's languages (i.e. Proto-Austronesian syllable-initial *m, n, ŋ > Thao, Malagasy, Tetun, Hawaiian, Tahitian m, n).

Fricative contrasts: /f/-/θ/ and /s/-/z/. Dental, non-sibilant fricatives are rare in the
world's sound systems, occurring in only 3.99 per cent of the languages surveyed in the
UPSID (Maddieson 1984). In the WALS database of 567 genetically diverse languages,
they occur in just 43 (7.6 per cent) (Maddieson 2008). Correspondingly, contrasts
involving dental fricatives have been shown to pose problems for infants in speech
discrimination tasks. In a series of studies in the 1970s, Eilers and colleagues showed
that English-hearing infants at both 6-8 months and 10-12 months fail to accurately
discriminate the English labiodental-interdental fricative place distinction ([fa]-[θa])
(Eilers 1977) using the Conditioned Head Turn procedure (CHT). The older group
showed discrimination of the contrast only when the fricative was followed by [i].
This result proved highly controversial and led to two subsequent studies, both of
which showed English-learning infants discriminating the [fa]-[θa] contrast. While
Holmberg et al. (1977) showed that six-month-olds discriminated the contrast, they
noted that subjects required twice as many trials to achieve criterion (an indirect
measurement of perceptual difficulty) than they did to reach criterion on the /s/-/ʃ/
contrast. Further, at two months, infants were shown to successfully discriminate
[fa]-[θa] using the High-Amplitude Sucking (HAS) procedure (Levitt et al. 1988).
The conflicting reports of labiodental-interdental fricative discrimination in English-hearing infants suggest the perceptual difficulty of the contrast relative to plosive
obstruent place contrasts.
I suggest that there is an acoustic source for infants' difficulty in discriminating /f/-/θ/, which potentially contributes to the relative rarity of the contrast in sound
systems. In a recent acoustic study of twenty American English speakers, the fricative
noise in both sounds was shown to have similar duration (165 ms), spectral peak
locations (8 kHz), mean spectral moments (5.1 kHz), kurtosis, and skewness (Jong-
man et al. 2000), all of which contribute to place perception in fricatives (Behrens and
Blumstein 1988; Jongman 1988; Hedrick and Ohde 1993). In Jongman et al.'s (2000)
study, when 21 acoustic predictors were used in a discriminant analysis classification,
27 per cent of labiodental tokens were classified as interdentals, and 26 per cent of
interdentals as labiodentals. This rate of confusion is consistent with human percep-
tual confusions between the two fricative places. In Miller and Nicely (1955), at the
highest signal-to-noise ratio and with the broadest band of frequency information
(+12 dB SNR, 200-6500 Hz), listeners identified /θ/ as /f/ at a rate of 26 per cent. Further, in several varieties of English (e.g. working-class London speech), /f/ and /θ/
are merging.
Another fricative contrast that has proved difficult for infants to discriminate is the
alveolar voicing contrast. In a HAS procedure, English-learning infants (1-4 mos.)
failed to discriminate the [sa]-[za] contrast (Eilers and Minifie 1975; Eilers et al.
1977). There is corresponding asymmetry in the distribution of voiced and voiceless
alveolar fricatives in the world's languages, as well. In UPSID, 69 per cent of alveolar fricatives are voiceless. While there is a clear articulatory/aerodynamic reason behind the preference for voiceless (over voiced) fricatives⁴ (Ohala 1983) and corresponding devoicing of /z/ (Smith 1997), there is no clear acoustic-perceptual reason for infants' failure to discriminate /s/-/z/ at such an early age.⁵ Indeed, English-speaking adults'
perception of the contrast leads to little confusion (Miller and Nicely 1955), perhaps
owing in part to differences in voice onset time and fricative duration and amplitude
(Jongman et al. 2000).

VOT. Discrimination of voice onset time (VOT) distinctions provided the earliest


demonstration of infants' ability to perceive speech-like sounds categorically (Eimas
et al. 1971) and laid the groundwork for research in the 1970s and 1980s testing the
limits of infants' perceptual abilities. A series of studies by Lasky, Streeter, Eimas, Eil-
ers, and colleagues revealed interesting patterns with respect to distinctions between
lead/lag VOT contrasts and short/long lag contrasts. In general, there is an asym-
metry in infants' perception of the two types of distinctions. In all studies where
stimuli mimicked a short-lag vs. long-lag VOT distinction (similar to the English
implementation of voicing), infants succeeded in discriminating the contrast (Eimas
et al. 1971; Lasky et al. 1975; Streeter 1976; Eilers et al. 1979). Interestingly, infants
whose native language background did not contrast short vs. long lag also successfully
discriminated the distinction. Kikuyu-learning infants discriminated a +10/+40 ms VOT contrast (Streeter 1976) and Spanish-learning infants discriminated a +20/+60
ms contrast (Lasky et al. 1975).
⁴ The turbulent noise of fricatives requires a high volume velocity of pulmonic airflow, which is necessarily impeded by the oscillating vocal cords during voicing.
⁵ A drawback in interpreting early infant speech perception results is that precise acoustic measurements of stimuli are often unavailable, thus precluding robust comparisons across studies. Eilers's studies likewise provided minimal acoustic data. Data which were provided suggest that /s/ and /z/ differed along the perceptually critical parameters of voice onset time and fricative duration.

TABLE 6.1 Summary of infants' discrimination of lead vs. short-lag VOT contrasts

VOT contrast (ms)   Discrimination successful?   Age (mos.)   Language background   Author
-30/0               +                            2            Kikuyu                Streeter (1976)
-30/0               -                            6            English               Eilers et al. (1979)
-60/+10             -                            6            English               Eilers et al. (1979)
-20/+10             -                            6            English               Eilers et al. (1979)
-20/+20             -                            4-6.5        Spanish               Lasky et al. (1975)
-40/+20             -                            2, 3         English               Eimas (1974)
-20/0               -                            1, 4         English               Eimas et al. (1971)
-70/+10             +                            2, 3         English               Eimas (1974)
-60/+10             -                            6            English               Eilers et al. (1979)

The results of infants' perception of lead vs. short-lag VOT is quite different, how-
ever. The overwhelming majority of studies investigating this distinction suggest that
infants' discrimination is quite poor. Only two studies (Eimas 1974; Streeter 1976)
have shown infants' successful discrimination of the prevoicing/short-lag contrast
(Table 6.1). Kikuyu-learning infants discriminated both a prevoiced/simultaneous (-30/0 ms) VOT distinction as well as the short/long-lag distinction. It remains
unclear, however, whether the prevoiced discrimination results from experience with
Kikuyu or the psychophysical salience of the contrast, for English-learning infants
do not show discrimination of a similar distinction (Eimas et al. 1971; Eilers et al.
1979). These studies suggest that the lead/short-lag implementation of voicing is
disadvantageous from the infant's point of view (but see Aslin et al. 1981). The lag
region of the VOT continuum is most likely privileged by the perceptual system for
psychophysical reasons (Pisoni 1977) as it provides more robust cues to a voicing contrast (aspiration, F1 onset) than does the lead/short-lag distinction.
The perceptual advantage afforded to the short/long lag distinction in infancy has
an analogue in production as well, where mastery of prevoicing occurs relatively
late compared to short-lag VOT in languages like Spanish (Eilers and Benito-Garcia 1984), French (Allen 1985), and Thai (Gandour et al. 1986) (but see Whalen et al. 2007
for VOT in babbling). The connection between infants' greater success at discriminat-
ing short/long-lag contrasts versus lead/short-lag contrasts and typological patterns
remains unclear, owing to the lack of a comprehensive cross-linguistic survey (similar to UPSID or WALS) of voicing implementation along the VOT dimension. Keating et al.'s (1983) survey of 51 languages shows that a voicing contrast always utilizes a 'voiceless unaspirated' (short-lag) stop. Keating (1984) suggests that in contrast to the short-lag implementation of VOT, languages which feature stop voicing contrasts are equally likely to use 'fully voiced' (lead) or 'voiceless aspirated' stops. The perceptual patterns
of infants would predict, however, that languages more often utilize a short/long-lag
implementation of voicing than lead/short-lag VOT.

/l/-/ɹ/. Perception of the [la]-[ɹa] (English /ɹ/) contrast has recently been shown to be facilitated by native language experience (Kuhl et al. 2006). Kuhl et al. (2006) investigated English- and Japanese-learning infants' perception of the (naturally produced) contrast at 6-8 and 10-12 months of age using the CHT procedure. At 6-8 months, both English- and Japanese-learning infants discriminated the contrast at a rate of 65 per cent correct, well below native levels of discrimination ability (approximately 80 per cent correct for synthetic stimuli in Miyawaki et al. 1975). By the end of their first year, English-learning infants' perception of the contrast improved to approximately 75 per cent correct. Further supporting the relative difficulty of /l/-/ɹ/ discrimination in infancy, Kuhl et al.'s (2006) results revealed a directional asymmetry, whereby facilitation of the contrast occurs only when infants are conditioned to discriminate a change in one direction of the contrast.
The English /ɹ/ is rare among the world's sound systems (occurring in roughly two per cent of the languages in UPSID, compared with 39 per cent of languages with /l/) and notoriously difficult to produce and perceive for non-native speakers (e.g. Goto 1971; Miyawaki et al. 1975; Polka and Strange 1985). Acoustically, /l/ and /ɹ/ have very similar spectral profiles, differing primarily in F3, which is characteristically low in /ɹ/ (Fant 1960; Dalston 1975; Espy-Wilson 1992).

/d/-/ð/. Typologically, interdental fricatives are rare relative to their nearest plosive counterparts. For example, 44 per cent of UPSID languages exhibit alveolar stops, while 7 per cent show interdental fricatives. The asymmetry becomes even more apparent when compared to similar stop-fricative contrasts at other places of articulation: 96 per cent of languages have bilabial stops vs. 44 per cent labiodental fricatives; 97 per cent with velar stops vs. 28 per cent velar fricatives; 70 per cent with dental or alveolar stops vs. 7 per cent with interdental fricatives. L2 speakers of a language exhibiting the fricative often resort to substitution of /ð/ with /d/ or /z/ (e.g. Dutch speakers' production of English /ð/ as [d] and French speakers' production as [z]). Similar substitutions for /ð/ are found in non-standard Englishes: /ð/ > African American Vernacular English [d]; Cockney English [v].
Infants' discrimination of the /d/-/ð/ contrast was the first to show the developmental profile of 'facilitation' (Polka et al. 2001). Two groups of infants were tested: French-learning infants, for whom the /d/-/ð/ contrast is non-native, and English-learning infants, for whom the contrast is native to their ambient phonology. At 6-8 months, both French- and English-learning infants showed mean A′ scores only slightly better than chance,⁶ less than that of the control contrast (/b/-/v/), which is native to both groups. In addition, infants' A′ scores showed more variation for the /d/-/ð/ contrast than
⁶ A′ is a nonparametric index of sensitivity similar to d′, the difference between z-scored proportions of hits and false alarms in discrimination.
the control. By 10-12 months, mean A′ scores for English-learning infants increased slightly, while remaining unchanged for French-learning infants. Adult English speakers showed A′ scores reflecting ceiling levels of discrimination. Adult French speakers' A′ scores remained unchanged from the infant groups. These results are suggestive of the interpretation that language experience serves to facilitate (or improve) native contrast discrimination. Further, they also show that the initial state of /d/-/ð/ perception is less accurate than that of a similar stop-fricative contrast (here /b/-/v/).
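Footnote 6 glosses A′ by analogy with d′. For concreteness, both indices can be computed from hit and false-alarm rates; the sketch below uses the standard formulas (the A′ formula commonly attributed to Grier 1971, which assumes the hit rate is at least the false-alarm rate), with made-up rates.

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def a_prime(hit_rate, fa_rate):
    """Grier's (1971) nonparametric index; assumes hit_rate >= fa_rate."""
    h, f = hit_rate, fa_rate
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))

# Made-up rates for a slightly-better-than-chance discriminator
print(a_prime(0.60, 0.50))   # ~0.59; 0.5 is chance
print(d_prime(0.60, 0.50))   # ~0.25
```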
In clean listening conditions, English-speaking adults discriminate the /d/-/ð/ contrast quite well (Polka et al. 2001), but with the addition of noise, confusion patterns result that are consistent with both the infants' relatively poor discrimination and the substitution patterns observed in L2 speakers attempting to produce /ð/ (Miller and Nicely 1955). Taken together, results from native infant and adult perception are suggestive of a low-level acoustic source for the relatively poor discrimination of the /d/-/ð/ contrast and the substitutions observed.

6.2.2 Implications
Infants' perceptual sensitivities are far from language universal. The outline presented
above, highlighting instances where infants' perceptual performance falls short of the
language-general perceptual specification often cited by linguists and psychologists,
corresponds to the typological regularities found across the world's languages. I would
suggest that these contrasts, which are fragile in terms of their acoustic distinctiveness,
are prone to misperception at the earliest stages of phonological development.
Another stage in the acoustics/development/typology story is type frequency in the lexicon and token frequency in ambient speech. It is often the case that phones in a weak acoustic-perceptual salience relationship are rather infrequent exemplars in the lexicon (such as /ð/, restricted to demonstrative articles in English) as well as in token frequency (as in syllable-initial /ŋ/ in Filipino) in a language (Narayan 2008). I would suggest that if infants have minimal evidence (in terms of a stochastic mechanism for category formation; Johnson 1997b; Pierrehumbert 2001a; Maye et al. 2002) for an already acoustically weak contrast, which is then coupled with a low functional load (Martinet 1933), they have the potential to effect misperception-based change (Greenlee and Ohala 1980). This argument is further bolstered by the fact that, in some children, production patterns suggest an effect of perception on early lexical representations (Macken 1980; Rose 2009).

6.3 Infant-directed speech and phonologization


The natural state of the acoustic-phonetic input to infants (or infant-directed speech
or IDS) and its relation to emerging phonology has been the focus of a growing
body of literature on speech category learning (Kuhl et al. 1997; Andruski et al. 1999;
Liu et al. 2003; Werker et al. 2007). Much of this work is driven by recent models
of category learning as a function of the frequency of the input, where infants are
shown to discriminate phonetic categories when familiarized to tokens comprising
different modes in an artificially created stimulus continuum (Maye et al. 2002, 2008).
Researchers have found such modally distributed cues in the acoustic input to infants.
For example, Werker et al. (2007) showed that Japanese- and English-speaking mothers, when teaching new words to their young infants, consistently produced acoustically distinct modes of vowel quality (/i/ vs. /ɪ/ and /ɛ/ vs. /e/ for English) and vowel duration (/i/ vs. /iː/ and /e/ vs. /eː/ for Japanese). Much of the research examining IDS
has highlighted its enhancing hallmarks, where categorical phonetic distinctions are exaggerated (i.e. vowel duration, vowel quality, tone: Tang and Maidment 1996; Liu
et al. 2007). The present section considers an often overlooked acoustic consequence
of the IDS register, namely the reduced clarity of contrast in the speech to very
young infants (Baran et al. 1977; Malsheen 1980; Sundberg and Lacerda 1999), and
its implications for the directions of sound change.
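The distributional-learning logic behind this body of work (Maye et al. 2002, 2008) can be made concrete with a toy model: a learner that compares one- and two-category fits to a VOT continuum will posit two categories only when the input is bimodal. The sketch below is a stand-in for the learner's sensitivity to modes, not a claim about the infants' actual mechanism, and all values are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy VOT values (ms); the numbers are invented for illustration
bimodal = np.concatenate([rng.normal(15, 5, 200), rng.normal(60, 5, 200)])
unimodal = rng.normal(38, 12, 400)

for name, data in [("bimodal", bimodal), ("unimodal", unimodal)]:
    x = data.reshape(-1, 1)
    # Compare one- vs two-category fits by BIC, a stand-in for the
    # learner's decision about how many categories the input supports
    bic = {k: GaussianMixture(n_components=k, random_state=0).fit(x).bic(x)
           for k in (1, 2)}
    print(name, "->", min(bic, key=bic.get), "category/categories")
# bimodal input supports two categories; unimodal input only one
```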

6.3.1 From emotional and social to linguistic function


Much like the developing perceptual system, caregivers' speech changes over the
course of an infant's first year. In early infancy (before infants' one-word stage) IDS is very much a biologically relevant acoustic signal, serving to assuage, arouse, and regulate infants' attentional state (Sachs 1977; Fernald 1992), stimulating them to 'calm awareness' (Cooper and Aslin 1989). By the time infants begin producing their first
words, the communicative intent underlying IDS is said to take on a more linguistic
function. Psychologists and linguists have arrived at this conclusion by examining the
changing acoustic clarity (the distance between phonemes in some acoustic space)
of IDS over the course of development. For example, early research on the prosodic
quality of English IDS showed that the distinctive pitch excursions characteristic of
stereotypical IDS decreased as the child's age increased (Garnica 1977). Fernald et al.
(1989) examined prosodic characteristics of IDS in English, French, German, Italian, and Japanese, and found similar results: higher mean f0 with wider range, longer
pauses, shorter utterances, and more repetitions compared to adult-directed speech
(ADS). These acoustic characteristics were more exaggerated in speech to very young
infants and decreased as children became more skilled in language use (Fernald 1992).
Exaggerated prosodic features such as intonational patterns and syllable duration
contribute to the IDS effect, which has been shown to be preferred by young infants.
Infants younger than six months show more attentional and affective response to
IDS than do infants at nine months (Werker and McLeod 1989). More recently IDS
has been shown to facilitate word segmentation, which has implications for other
aspects of language learning. Infants (seven months old) were exposed to either IDS
or ADS nonsense sentences where the statistical structure of the syllables served as
the only cue to word boundaries. Only infants exposed to the IDS input were able to
distinguish words from part-words (Thiessen et al. 2005).
Interestingly, acoustic features of IDS at the level of the segment also seem to change
over the course of an infant's development. Malsheen (1980) examined voicing in a
longitudinal study of English IDS spoken to children ranging from six months to five
years of age, and found that only when infants were 15-16 months old did mothers
significantly separate the voiced and voiceless categories along the VOT dimension.
At 15-16 months, mothers implemented longer VOTs for voiceless tokens than in
their voiceless tokens to younger infants. Baran et al. (1977) found no significant
differences in VOT between IDS and ADS when infants were twelve months old.
Sundberg and Lacerda (1999) found that in the IDS addressed to three-month-old
Swedish infants, VOT was significantly shorter in both voiced and voiceless stops
than in ADS. This resulted in more overlap between the voicing categories in IDS. The
authors provide a developmental account of their findings by suggesting that acoustic
properties of obstruents are less 'specified' in the IDS to young infants and gradually
reach adult-directed VOT values at around the time infants produce their first word.
More recently, in a study of Norwegian IDS, Englund (2005) found that alveolar and
velar stops have longer VOTs during infants' first six months than in ADS. While
there were no differences in the voiced/voiceless distinction along the VOT dimension
between the two registers, the developmental profile of the data suggested that VOT
in IDS becomes more like ADS as infants get older. The developmental account is
consistent with studies of IDS vowel production as well, where acoustic clarity is found
only in those lexical categories used by the child (Bernstein Ratner 1984).
What I argue in the case study below is that the not-so-careful speech to very young
infants has acoustic consequences which have the potential to become phonologized
by infants in this perceptually sensitive stage of development (Werker and Tees 1984).
The interaction between the socially driven imperatives of early IDS and contrastive
phonetic salience can provide the learner with the kind of structured acoustic vari-
ability associated with misperception-based sound changes (Ohala 1981).

6.3.2 Modeling voicing in English IDS and ADS⁷


The covariation between voice onset time (VOT) and post-consonantal f0 is a well-known source of tone in languages that have historically lost voicing contrasts (Matisoff 1973; Hombert et al. 1979). As a result of naturally conditioned pitch perturbation, where voiced consonants exert a lower f0 than do voiceless consonants on a following vowel (Abramson and Lisker 1985), a relatively low tone develops on vowels
following previously voiced consonants and a relatively higher tone develops on vow-
els following previously voiceless consonants. Hombert and Ohala (1979) note: 'The
historical development of tones (tonogenesis) can result from the reinterpretation by
⁷ Research reported in this section was conducted in close collaboration with Kyle Gorman and Daniel Swingley at the Institute for Research in Cognitive Science at the University of Pennsylvania.
listeners of a previously intrinsic cue after recession and disappearance of the main
cue.' While the primary cue to voicing in English (VOT) has not 'disappeared' as hap-
pened in many cases of tonogenesis, I would argue that the IDS register contributes
to acoustic ambiguity in voicing that is consistent with the development of tone.
Previous studies have shown that the distribution of voiced and voiceless tokens
along VOT are more similar in the IDS to infants under twelve months than in the
IDS to older infants or in ADS (in American English and Swedish). Voiceless VOTs
are generally shorter in IDS, resulting in more overlap with voiced VOTs (Baran et al.
1977; Malsheen 1980; Sundberg and Lacerda 1999) compared to ADS or IDS to older
infants. In a recent study of word-initial voicing in American English IDS and ADS, we (myself together with Kyle Gorman and Daniel Swingley from the University of Pennsylvania) examined VOT and post-consonantal f0 in the hope of understanding (i) the regularity of the acoustic features of voicing available to young infants and (ii) the relative weights of VOT and f0 in predicting voicing in IDS and ADS. In examining the covariation of VOT and f0 in voicing in two different registers, we hope to shed light
ambiguous and ultimately misinterpretable cues.

6.3.3 Methodological approach: Logistic regression modeling


Voicing in English IDS and ADS was modeled using binary logistic regression (Hosmer and Lemeshow 1989; Gelman and Hill 2007). Logistic regression is a linear modeling technique that generates coefficients (β) for predictor variables that contribute to the classification of binary data (here, voiced or voiceless). The predictors of voicing
were the VOT and f0 characteristics for word-initial plosives in the speech of eight
women: four from the Brent corpus of infant-directed speech (Brent and Siskind
2001) speaking to their infants at nine months, and four from the Buckeye corpus
of conversational (adult-directed speech) (Pitt et al. 2007). The speakers from the
Buckeye corpus were selected on the basis of their being new mothers or soon-to-
be mothers. Forced-phoneme alignment of the audio from the Brent corpus (Quam
et al. 2008) allowed us to examine the acoustic characteristics of word-initial conso-
nants and following vowels. The Buckeye corpus includes a phoneme-aligned parse.
A trained phonetician (CRN) measured VOTs by hand for all speakers. Five hundred
utterances per Brent speaker were examined, and approximately twenty minutes of
speech from each Buckeye speaker were examined.
VOT was calculated as the time between the word-initial stop burst, characterized
by a brief high-frequency noise, and the onset of periodic laryngeal vibration of the
post-stop vowel measured at the first zero crossing. The VOT of prevoiced tokens was
calculated as the time between the onset of periodic vibration during stop closure
and the release of the stop into the following vowel. In general, the release of the stop
was simultaneous with the onset of periodic voicing of the vowel. In keeping with the
literature on VOT, prevoiced tokens were assigned a negative value (e.g. Keating et al.

1981). In order to control for varying speech rate, which is known to be slower in IDS
compared to ADS (Kuhl et al. 1997), VOT was normalized by dividing the raw VOT
measurement (ms) by the duration of the following vowel. This ratio has been shown
to serve as a perceptual criterion for voicing category affiliation (Boucher 2002).
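In code, this normalization is a one-line ratio; the function and example values below are illustrative, not drawn from the corpus scripts.

```python
def vot_ratio(vot_ms, vowel_duration_ms):
    """Normalize raw VOT (ms) by the duration of the following vowel (ms).
    Prevoiced tokens carry negative VOT, so their ratio is negative too."""
    return vot_ms / vowel_duration_ms

print(vot_ratio(60.0, 120.0))   # 0.5: a long-lag (voiceless) token
print(vot_ratio(-40.0, 160.0))  # -0.25: a prevoiced token
```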
Voiced regions inside the post-stop vowel region were extracted and pitch tracks
obtained (at 1 ms time steps) using SWIPE′ (Camacho 2007). The pitch extraction
algorithm required that the voiced region be at least 10 ms. Tokens with less than
10 ms of post-stop voicing were discarded. The procedure yielded 1200 IDS and 1058
ADS CV tokens. A visual inspection of all the pitch tracks confirmed that there were
no obvious halving errors in the extraction. In order to control for individual speakers'
pitch ranges, raw f0 measurements were normalized by speaker using the standard z
calculation. Following Umeda (1981), peak (or maximum) f0 (in the first half of the
post-stop vowel) was computed for analysis.
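The two f0 steps can be sketched as follows; this is a hypothetical reconstruction of the procedure, not the original analysis code.

```python
import numpy as np

def z_normalize(f0_values):
    """Standard z-transform over one speaker's f0 measurements, making
    speakers with different pitch ranges comparable."""
    f0_values = np.asarray(f0_values, dtype=float)
    return (f0_values - f0_values.mean()) / f0_values.std()

def peak_f0(pitch_track):
    """Maximum f0 in the first half of the post-stop vowel (after Umeda 1981)."""
    pitch_track = np.asarray(pitch_track, dtype=float)
    return pitch_track[: max(1, len(pitch_track) // 2)].max()
```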

6.3.4 Results
Analyses of mean VOTs according to register and voicing were consistent with pre-
vious reports: there was a voicing × register interaction suggesting that voiced and
voiceless stops in IDS showed more overlap along VOT than in ADS (F(1, 2258) =
1552, p < 0.0001), that is, the modes of VOT were more separable for voiced and
voiceless tokens in ADS than in IDS. There was also a general pitch perturbation effect
(with no interaction of register) suggesting that voiced stops were followed by a lower
pitch than voiceless stops.8
The logistic regression models of IDS and ADS were fitted using VOT, f0, and
their interaction as predictors of voicing. Table 6.2 presents regression models of
voicing in IDS and ADS. Both registers show a significant main effect of VOT, with a
negative slope (β) indicating that an increase in VOT results in less voiced prediction.
Fundamental frequency is significant in both registers as well, again with a negative
slope confirming the pitch perturbation effect.
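A model of this shape can be fitted in one line with standard software. The sketch below uses Python's statsmodels and a hypothetical DataFrame `tokens` with one row per CV token; the columns `voiced`, `vot`, `f0`, and `register` are assumptions.

```python
import statsmodels.formula.api as smf

# `vot * f0` expands to both main effects plus their interaction,
# matching the model structure reported in Table 6.2.
ids_fit = smf.logit("voiced ~ vot * f0",
                    data=tokens[tokens.register == "IDS"]).fit()
ads_fit = smf.logit("voiced ~ vot * f0",
                    data=tokens[tokens.register == "ADS"]).fit()
print(ids_fit.summary())
```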
The interaction between VOT and f0 is significant in only the IDS model. The
interactions (plotted in Figure 6.3) suggest that as VOT increases, f0 has a greater effect
on voicing prediction. In IDS, f0 becomes more and more predictive of voicing as VOT
increases. As a result, where VOT is most ambiguous in the signal (VOT ratio between
0 and 0.5), f0 becomes more useful as an indicator of voicing. No such effect is present
in ADS. For example, given a VOT ratio of 0.25 (where there is significant overlap
between voiced and voiceless tokens) and an f0 value at the 10 per cent quantile
8
Both the VOT and f0 analyses were conducted using 2 (register: IDS, ADS) × 2 (voicing: voiced,
voiceless) × 3 (place of articulation: velar, apical, bilabial) ANOVAs. There were considerably more pre-
voiced tokens in the ADS sample than in IDS. We explored the possibility that the interaction between
voicing and register was driven by the more negative mean VOT for voiced tokens in the ADS sample.
This interpretation was not supported, as the interaction was also significant when prevoiced tokens were
removed from the analysis. There was an expected effect of place on VOT, with velars having the longest,
followed by alveolars, then bilabials.

TABLE 6.2 Register-specific logistic regression models of voicing as a function of
VOT and f0. The table shows the significant contribution of the VOT × f0 interaction
only in the IDS model of voicing

IDS            β        Std. Error   95% CI            z        Sig.
(Intercept)    0.01     0.09         (-0.16, 0.17)     0.11     0.91
VOT            -6.39    0.40         (-7.21, -5.64)    -15.91   <0.0001
f0             -0.59    0.10         (-0.79, -0.40)    -5.82    <0.0001
VOT × f0       -2.03    0.44         (-2.91, -1.17)    -4.59    <0.0001

ADS
(Intercept)    0.35     0.11         (0.15, 0.56)      3.32     <0.0005
VOT            -6.82    0.47         (-7.79, -5.95)    -14.58   <0.0001
f0             -0.31    0.12         (-0.54, -0.08)    -2.62    <0.01
VOT × f0       0.22     0.52         (-0.86, 1.18)     0.41     0.61

FIGURE 6.3 VOT × f0 interaction in voicing prediction in IDS and ADS. The two panels
show probability curves overlaid on a jittered VOT × voicing scatterplot. The curves represent
quantile values of the distributions of f0. For example, the median curve (solid line) represents
the f0 value below which 50 per cent of the data lie in each register. 'High' and 'Low' f0 represent
the highest and lowest f0 values in each corpus

(a relatively low pitch), the probability of the token being voiced is approximately
55 per cent in IDS. Given the same VOT and a median (50 per cent quantile) f0
value, the probability of the token being voiced is approximately 30 per cent. In the
ADS model, similar shifts in f0 do not substantially change voicing predictions.9
Voicing is therefore more consistently implemented with VOT in the ADS corpus,
thus minimizing the effect of f0 as a cue to voicing.
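To make the arithmetic behind these probabilities concrete: a predicted probability is the inverse logit of the fitted linear combination. The snippet below plugs the IDS coefficients from Table 6.2 into that formula; the two z-scored f0 values standing in for the 10 and 50 per cent quantiles are illustrative assumptions, chosen to land near the figures cited above.

```python
import math

def p_voiced(vot, f0, b0, b_vot, b_f0, b_int):
    """Inverse logit: probability of a voiced token under the fitted model."""
    eta = b0 + b_vot * vot + b_f0 * f0 + b_int * vot * f0
    return 1.0 / (1.0 + math.exp(-eta))

ids = dict(b0=0.01, b_vot=-6.39, b_f0=-0.59, b_int=-2.03)  # Table 6.2 (IDS)
print(p_voiced(0.25, -1.6, **ids))  # low f0: p(voiced) is about 0.54
print(p_voiced(0.25, -0.7, **ids))  # nearer the median: about 0.31
```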

6.3.5 Implications
At nine months, infants in English-speaking environments are being exposed to
highly variable VOT, with considerable overlap between categories, thus highlighting
the regularity of pitch perturbation as a reliable cue to voicing. While we do not expect
English-learning infants to reinterpret the regularity of pitch perturbation as tone,
what this study points up is the acoustic instability of the IDS to young infants, par-
ticularly at an age when they are thought to be developing and honing their perceptual
sensitivity (Werker and Tees 1984; Narayan et al. 2010). The models suggest that, at
this young age, consistency of the pitch perturbation effect essentially prevents the
learner from making incorrect predictions of voicing.
VOT as the primary cue to voicing in English is salvaged, however, for by the
time infants produce their first words, VOT as a cue to voicing in IDS resembles the
consistency of ADS, thereby precluding infants' reinterpretation of f0 as the primary
cue to voicing.

6.4 Conclusions
Linguists have long speculated on the potential role for infants and children in histor-
ical phonology processes (Baudouin de Courtenay 1895 [1972a]). For the most part,
these speculations have been relegated to the domain of speech production and the
nuances of child phonology. As Blevins (2004) has noted, the enterprise of locating
phonetically motivated historical processes in child phonology is inherently con-
founded with inter- and intra-speaker variability associated with the quickly maturing
vocal apparatus. That is, language-specific phonological patterns may be obscured in
children's productions by changing physical constraints. Unfettered by these non-
cognitive limitations, behavioral studies of speech perception offer a window into
the earliest sensitivities of infants, which in turn allow us to assess infant biases and
connections to directions of phonological change and typological regularities.

9
Voicing implementation was also modeled by considering individual speaker variation using hierar-
chical logistic regression (Gelman and Hill 2007; Gorman 2009). Results similar to the models presented in
Table 6.2 were obtained, with three out of the four IDS speakers showing a significant interaction between
VOT and f0. The interaction was not significant for any of the ADS speakers.

Despite the remarkable cognitive abilities exhibited by young infants, the earli-
est stages of their speech perception are less than ideal. Infants, rather than being
'citizens of the world', are more likely members of the majority party, with initial
perceptual abilities reflecting acoustically robust, and typologically common, pho-
netic contrasts. I argued that ubiquitous contrasts in the world's languages reflect,
to some degree, the natural perceptual biases infants bring to the language-learning
table. Conversely, contrasts which are typologically rare reflect the relative difficulty of
their discrimination by young infants. The consistency between patterns in develop-
mental speech perception and phonological typology is consistent with functionally
based approaches to the phonetics-phonology interface such as Lindblom's disper-
sion theory (Liljencrants and Lindblom 1972; Lindblom 1986; Johnson et al. 1993a),
which proposes that phonological contrasts are sufficiently distinctive perceptually
(in order to be learned and remain stable). While the question of how and why
certain perceptually difficult contrasts remain in sound systems cannot be answered
within the present research program, we appeal to general learning mechanisms (i.e.
statistical learning in terms of frequency of occurrence, cf. Maye et al. 2002) for their
persistence.
Directions for future research might include exploring infant biases in the per-
ception of typologically asymmetric distributions of suprasegmental features such
as tone. Recent work suggests that infants' discrimination of certain tone contrasts
follows the typical profile of perceptual reorganization (Werker and Tees 1984). Mat-
tock and Burnham (2006) showed that English-learning infants discriminated Thai
rising vs. falling and rising vs. low tones more accurately at six months than at nine
months, suggesting that infants' perceptual sensitivities have reorganized in the direc-
tion of privileging native contrasts. Given the connections between the earliest biases
in infant speech perception and typological patterns, we might next ask if infants
discriminate acoustically similar tones (e.g. tones 22 vs. 33 from Cantonese) as well
as tone contrasts with a robust salience (e.g. Cantonese 21 vs. 25) (Khouw and Ciocca
2007).
Finally, the connection between development, typology and phonologization is
not limited to child behavior. This chapter also outlined the type of variability
associated with infant-directed speech in English and the similarity between its
acoustic characteristics and the phonetic conditions giving rise to tone from the
loss of voicing contrasts. While this analysis does not claim to capture a sound
change in progress, it provides evidence for treating infant-directed speech as criti-
cal input to the developing speech perception system, particularly when emotional
affect results in either hyper- or hypo-articulation (Lindblom 1990) which can
potentially be reinterpreted as phonologically different from the intended linguistic
gesture.

TABLE 6.3 Frequency of consonants within the Brent corpus of
infant-directed speech. Proportions are rounded to two decimal
places. Consonants are listed in decreasing order of frequency

Consonant          Proportion in Brent
t                  0.13
n                  0.10
ɹ, s               0.07
k, d, l            0.06
m, j, ð, w         0.05
b, g, h            0.04
z, p               0.03
f, ŋ               0.02
v, θ, ʃ, tʃ, dʒ    0.01
ʒ                  <0.01
Part III

Phonological and morphological considerations
7

Lexical sensitivity to phonetic and phonological pressures
ABBY KAPLAN*

7.1 Introduction
Cross-linguistically, some phonological structures are preferred over others. For
example, voiceless obstruents are favored relative to voiced obstruents in that many
languages place special restrictions on voiced obstruents or lack them altogether.1
When confronted with asymmetries such as these, linguists have long referred to the
dispreferred structure as 'marked' (Croft 1990).
Markedness is often closely tied to phonetic facts. Voiced obstruents, for example,
are disfavored for well-documented reasons to do with the aerodynamic properties
of the vocal tract (Ohala 1983: 194). However, some linguists would argue for the
existence of an abstract cognitive representation of markednessone that may reflect
phonetic realities but cannot ultimately be reduced to them. For example, this is the
role played by markedness constraints in the view of some, but not all, practitioners of
Optimality Theory (Prince and Smolensky 2004). The role of these abstract represen-
tations is often to mediate between gradient phonetic patterns and their categorical
analogues. In this chapter, I will refer to low-level causes such as the aerodynamics of
the vocal tract as 'phonetic' pressures on marked structures, and to the hypothesized
coarse-grained cognitive representations of markedness as 'phonology'. Although the

* Many thanks to Elliott Moreton, whose questions led to this chapter; Colin Wilson, for statistical advice
and other helpful discussion; Paul Willis, for collaboration on the Korean database (and Yongeun Lee for
technical assistance), and Aaron Kaplan, Grant McGuire, Armin Mester, Jeremy O'Brien, Jaye Padgett, Matt
Tucker, and audiences at the 2008 OCP and the Symposium on Phonologization for helpful comments at
various stages of the project. All shortcomings are my own. This research was funded by an NSF Graduate
Research Fellowship and conducted at the University of California, Santa Cruz.
1
This is, of course, a simplification; there are also certain environments in which voiced obstruents are
preferred to voiceless onesfor example, postnasally (Pater 2004).

existence and nature of the phonology/phonetics distinction is much debated, there is


some evidence that these cognitive entities do have an independent existence (Wilson
2006; Moreton 2010, 2008a; Hayes and Wilson 2008). Even if the boundary between
the two ultimately turns out to be difficult to define, there do seem to be qualitative
differences between the two ends of the scale.
Markedness is reflected in various measures of frequency (lexical, textual, and so
on: Greenberg 1966: 14) such that phonologically marked configurations are less
frequent than their unmarked counterparts. For example, voiceless obstruents are less
marked than voiced obstruents, and are also more frequent in corpora and lexicons.
This chapter addresses the nature of these asymmetries. Essentially, the question is
this: are marked phonological structures less frequent because they are phonologically
marked, or because they are phonetically dispreferred? Figures 7.1a-c sketch, in a
radically simplified way, three possible answers to this question. The arrows represent
influence among the three relevant components of human linguistic systems; the
influence of phonetic precursors on phonological markedness is represented by an
arrow from the phonetic component to the phonological component. The question
addressed here is that of what other arrows are present: is the lexicon (that is, patterns
of lexical frequency) influenced by phonology (Figure 7.1a), phonetics (Figure 7.1b),
or both (Figure 7.1c)?
Teasing apart the effects of phonological markedness and phonetic pressures is
difficult precisely because of the powerful influence of phonetics on phonology. The
strategy of this chapter is to examine two cases of 'underphonologization' (Moreton
2010): cases in which the categorical analogue of some robust low-level phonetic

FIGURE 7.1a Lexicon affected by phonology

FIGURE 7.1b Lexicon affected by phonetics



FIGURE 7.1c Lexicon affected by phonology and phonetics

pattern is unexpectedly absent from phonological patterns.2 Because instances of


underphonologization are places where phonology and phonetics diverge, they offer
a valuable window onto the separate behavior of the two (in addition to constituting
evidence that they are distinct in the first place). In other words, they allow us to dis-
tinguish between phonetic patterns that do have categorical phonological analogues,
and those that do not.
The first case study pairs a categorical pattern of vowel fronting adjacent to coronals
with a gradient pattern of backing due to perceptual overcompensation; the second
case study pairs a categorical pattern of vowel height harmony with a gradient pattern
of raising by voiced obstruents. In each case, I investigate which of the two patterns is
actually reflected in patterns of lexical frequency, that is, type frequency computed
over a language's lexicon, and find that it is the phonological pattern that is the best
match to the lexical frequency facts. These findings suggest that we should rule out the
scenarios sketched in Figures 7.1b and 7.1c, in which phonetics has an effect on the
lexicon independent of its effect on phonology. We are left with Figure 7.1a: lexical
frequency is related to phonological markedness rather than its phonetic precursors.
Alternatively, as discussed in section 7.5.2, these results are also consistent with
models such as exemplar theory in which lexical frequency influences categorical
phonological patterns, rather than the other way around.

7.2 Method
7.2.1 Corpora
In each of the two cases of underphonologization discussed below, the phonetic and
phonological patterns are tested against actual patterns of lexical frequency in seven
languages: English, German, Dutch, French, Spanish, Serbo-Croatian, and Korean.

2
Note that the term 'underphonologization' is also used in the literature to refer to cases in which some
phonetic pattern is realized as a categorical phonological pattern less often than expected. To avoid circu-
larity (cross-linguistic frequency defines markedness, which is in turn found to affect lexical frequency), I
restrict my attention to phonetic patterns that appear never to be phonologized.

Data for English, German, and Dutch was obtained from the CELEX lexical database
(Baayen et al. 1996); data for French from Lexique (New et al. 2001); data for Spanish
from BuscaPalabras (Davis and Perea 2005); and data for Serbo-Croatian from the
Ukrstenko corpus (Sipka 2002).
Data for Korean was obtained from the Korean National Database (Lee 2006).
The entries in this database are listed in Korean orthography, which is largely mor-
phophonemic. However, for the sake of consistency with the results from the other
databases (which contain broad phonetic transcriptions), the database was postpro-
cessed with a basic SPE-style phonology of Korean with the goal of yielding more
surface-like representations of the lexical entries. The phonological grammar was
written in collaboration with Paul Willis at UC Santa Cruz and based on the descrip-
tions of Sohn (1999). There is reason to be cautious in interpreting the results for
Korean: since morphological boundaries were not available in the original database,
we were unable to implement any morphophonological rules.
For each language, lexical frequency was calculated over monomorphemic lemmas.
None of the languages has phonologized any of the patterns discussed below. (Korean
does have vowel harmony in affixes, but affixes were not included in the corpus and
thus did not affect the results.)

7.2.2 Modeling procedure


In both of the cases of underphonologization examined below, the patterns that have
the potential to be reflected in lexical frequency are more subtle than the asymmetries
between, for example, voiced and voiceless obstruents; thus, standard metrics such as
observed-expected ratios may not be sensitive enough to detect whether the relevant
effects are present. (The subtlety of the relationships being examined here may explain
why some previous work, such as Maddieson and Precoda 1992 and Walter 2008, has
had difficulty finding significant patterns of interaction between adjacent or nearby
segments.)
The strategy adopted here is to build logistic regression models (LRMs) over the
sequences of interest, while factoring out possible confounds. For example, in section
7.4, the models that investigate the interaction between the height of adjacent vowels
also account for the effects of vowel frontness and tenseness. The procedure by which
these models were built was as follows for each language (a schematic implementation appears after the list):

1. Extract all relevant sequences from the lexicon (e.g. all vowel-vowel sequences
in which the two vowels are separated only by consonants).
2. Build an LRM predicting the dependent variable (e.g. vowel height) from
other potentially relevant factors (e.g. vowel tenseness) and their two-way
interactions.
3. Run an ANOVA on the model to identify non-significant factors or
interactions.

4. Remove the least significant factor or interaction and rebuild the model.
=> Exception: never remove the factor of interest; where this factor is not sig-
nificant, this fact is reflected in its p-value.
5. Repeat steps 3-4 until all factors are significant at p < .05.
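One way to render this procedure in code is backward elimination driven by likelihood-ratio tests, never dropping the factor of interest. The sketch below is schematic rather than the scripts actually used: it assumes a pandas DataFrame `data` with hypothetical column names, tests one term at a time, and ignores the marginality conventions for main effects under interactions.

```python
from scipy import stats
import statsmodels.formula.api as smf

def backward_select(data, dv, keep, candidates, alpha=0.05):
    """Drop the least significant removable term until every remaining term
    is significant at `alpha`. Terms in `keep` (the factor of interest) are
    never removed, mirroring the exception in step 4."""
    terms = list(keep) + list(candidates)
    while True:
        full = smf.logit(f"{dv} ~ {' + '.join(terms)}", data=data).fit(disp=0)
        worst, worst_p = None, alpha
        for term in terms:
            if term in keep:
                continue  # the factor of interest stays in regardless
            rest = [t for t in terms if t != term]
            reduced = smf.logit(f"{dv} ~ {' + '.join(rest)}", data=data).fit(disp=0)
            lr = 2 * (full.llf - reduced.llf)  # likelihood-ratio statistic
            p = stats.chi2.sf(lr, full.df_model - reduced.df_model)
            if p > worst_p:
                worst, worst_p = term, p
        if worst is None:
            return full  # every removable term is significant at alpha
        terms.remove(worst)

# e.g. backward_select(cvc, "front", keep=["c1_place", "c2_place"],
#                      candidates=["height", "tense", "stress"])
```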
When the resulting models are reported below, I indicate which factors were actually
included in the final version of each model. For reasons of space, the individual inter-
actions included in each final model are not listed. See the appendix for information
on how the vowels and consonants of these corpora were coded for the relevant
features.

7.3 Case one: Coronal fronting


7.3.1 Two patterns
There is a gradient phonetic tendency for vowels to be articulated in a more front
position when they are adjacent to coronal consonants; it is possible that this pattern
is articulatorily motivated (because the production of coronals involves the front part
of the tongue, as does the production of front vowels). This pattern has been phonol-
ogized in languages such as Cantonese (Flemming 2001): vowels between coronal
consonants are neutralized and required to be realized as front.
Ohala (1981) shows that listeners are apparently aware of the fronting effect of coro-
nals on adjacent vowels, and compensate for it: an ambiguous vowel that is adjacent to
coronals is more likely to be perceived as back than the same signal when it is adjacent
to non-coronals. In other words, listeners attribute some of the 'frontness' in the signal
to the effect of the coronals rather than to the vowel itself.
In the same paper, Ohala argues that this type of perceptual overcompensation
is responsible for a wide range of phonological dissimilation processes. For exam-
ple, under this model, a language in which a labialized consonant cannot be fol-
lowed by a round vowel might result from a pattern of systematic misperception in
which listeners attribute the roundness of such labialized consonants to the follow-
ing vowel, reinterpreting the labialized consonants as plain ones. On analogy with
labial dissimilation, then, we might expect to see a similar 'dissimilation' of vowels in
coronal contexts: a pattern in which vowels adjacent to coronals are required to be
back.
However, this perceptually-driven pattern of vowel backing in coronal contexts
does not appear to be attested; rather, attested phonologies exhibit the Cantonese
pattern, in which coronals cause adjacent vowels to become front. In other words, this
looks like a case of underphonologization: the articulatory pressure to front vowels
adjacent to coronals is sometimes phonologized, while the perceptual pressure to back
them never is.

Phonetic Pressure                                               Phonologized?
Vowels next to coronal consonants become front (articulation?)  Yes
Vowels next to coronal consonants become back (perception)      No

The phonetic and phonological patterns therefore make distinct predictions for lex-
ical frequency: there are phonetic precursors for both fronting and backing next to
coronals, but only fronting matches attested phonological patterns.

7.3.2 Lexical frequency


Contiguous CVC sequences were extracted from the lexicon of each language; the
analysis was limited to CVC sequences in which both Cs were oral stops. Non-stops
were eliminated because the representation of place for some sonorants (such as [ɹ]
and [l]/[ɫ] in English) is non-trivial, and because the range of place contrasts is limited
for many non-stops (such as affricates and nasals). Stops, by contrast, exhibit a robust
three-way contrast among labials, coronals, and dorsals in all seven languages, thus
giving each place of articulation a chance to make its effects felt.
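Extraction of this kind reduces to scanning each transcription for stop-vowel-stop triples. The sketch below assumes one-symbol-per-segment transcriptions and an illustrative segment coding; the real coding follows the appendix.

```python
STOPS = set("pbtdkg")
VOWELS = set("iyeɛæaɑʌouʊɔ")  # illustrative; see the appendix for the coding used

def cvc_stop_sequences(transcription):
    """Yield (C1, V, C2) triples in which both flanking consonants are oral
    stops, as in the analysis described above."""
    for c1, v, c2 in zip(transcription, transcription[1:], transcription[2:]):
        if c1 in STOPS and c2 in STOPS and v in VOWELS:
            yield c1, v, c2

print(list(cvc_stop_sequences("tikap")))  # [('t', 'i', 'k'), ('k', 'a', 'p')]
```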
The dependent variable in each model was the frontness of the vowel, coded as
a two-way distinction (front vs. back). There were two predictors of interest, the
place of the first consonant (C1) and the place of the second consonant (C2). Other
factors included in the models were the height, tenseness, and stress of the vowel (V).
Stress data (primary/secondary/none) was available only in CELEX. In Spanish and
Serbo-Croatian, tenseness is restricted to non-low (Spanish) or high (Serbo-Croatian)
vowels and thus provides no additional information.
In most of these languages, front/back vowel pairs also differ in rounding. Ohala's
(1981) experiment noted above tested English listeners with the vowels [i] and [u],
showing that perceptual compensation for frontness adjacent to coronals can even
overcome differences in rounding.
Table 7.1 summarizes the resulting LRMs. These models, dealing with categorical
factors (e.g. front vs. back), model the behavior of the dependent variable in terms
of likelihood: for each level of each predictor variable, the coefficient assigned by
the model represents its effect on the likelihood that the dependent variable will
be realized with the default level; here, front vowels and coronals are taken as the
default. For the English model, the coefficient associated with a labial Ci is 3.2.
This means that in CVC sequences in the English lexicon, if Ci is labial, the vowel
is less likely to be front than if it were adjacent to a coronal. The p-value associated
with each coefficient represents a test against the null hypothesis that the coefficient
is zero.
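Because these coefficients are changes in log-odds, exponentiating them yields odds ratios, a convenient way to read the tables that follow; the figure below is the English labial C1 coefficient just cited.

```python
import math

# A log-odds coefficient of -3.2 means the odds of a front vowel after a
# labial C1 are exp(-3.2), about 4 per cent of the odds after a coronal,
# holding the model's other factors constant.
print(math.exp(-3.2))  # ~0.0408
```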

TABLE 7.1 Prediction of vowel frontness from place of surrounding consonants

                Factors in Final Model       C Place Coefficients
Language        Height Tenseness Stress  Place    C1        p       C2        p
English         ✓      ✓         ✓       Labial   -3.2276   .0000   1.6404    .0000
                                         Velar    -2.8289   .0000   -.8348    .0001
German          ✓      ✓         ✓       Labial   -1.7912   .0000   2.6052    .0000
                                         Velar    -3.0134   .0000   -1.4746   .0000
Dutch           ✓      ✓         ✓       Labial   .8100     .0000   -1.6108   .0000
                                         Velar    1.0035    .0000   -1.3896   .0000
French          ✓      ✓         NA      Labial   -.2355    .0704   -.1580    .2308
                                         Velar    .6644     .0000   .8881     .0000
Spanish         ✓      NA        NA      Labial   -.1419    .2267   1.1030    .0000
                                         Velar    -1.8278   .0000   .8963     .0000
Serbo-Croatian  ✓      NA        NA      Labial   -1.1341   .0000   -1.1878   .0000
                                         Velar    1.0506    .0000   -.5346    .0000
Korean          ✓      ✓         NA      Labial   2.2451    .0000   -1.1334   .0000
                                         Velar    .4668     .0038   -2.0443   .0000

It would be unwise to compare the magnitudes of the various coefficients across


models, especially since the models contain different factors. Of interest instead is
the sign of each coefficient: do non-coronal consonants increase or decrease the
likelihood that the vowel is front? The graphical summary in Figure 7.2 shows that
the coefficients are overwhelmingly negative: a non-coronal consonant almost always
decreases the likelihood that the adjacent vowel is front. Although there are some
exceptions to this pattern, they have every appearance of being sporadic: they are
not concentrated in any particular language, nor are they consistently associated with
particular types of consonants. In addition, if we consider the effects of interactions
(examining all possible combinations of various levels of the non-place factors; omit-
ted here for reasons of space), in the majority of cases coronals are associated with
front vowels and non-coronals with back ones. It seems safe to conclude that in pat-
terns of lexical frequency, coronals are associated with front vowels and non-coronals
with back vowels. This is consistent with the phonologized pattern; no language in
this sample exhibits the backing effect of coronals that is predicted by the perceptual
dissimilation facts discussed above.

FIGURE 7.2 Coefficients of C place factors (*: p < .05; ·: p < .1)

Although both fronting and backing next to coronals have phonetic precursors,
we do not know whether those precursors are equally strong. Ohala does not pre-
dict how often dissimilation will occur, only that it can result from misperception;
indeed, a reviewer suggests that since knowledge of the articulatory fronting effect is
a prerequisite for hypercorrection, we might expect the former to be stronger. Thus,
an alternative interpretation of these results is that lexical frequency reflects phonetic
precursors in proportion to their strength; further research would be required to rule
out (or confirm) this possibility. The case of underphonologization in the next section
is less susceptible to this kind of explanation.

7.4 Case two: Voiced obstruent raising


7.4.1 Two influences on vowel height
Moreton (2008a) observes that gradient phonetic variation in vowel height is influ-
enced about equally by two environmental factors (among others). One is the height
of the adjacent vowel: the higher the first of a pair of vowels, the higher the second.
The second is the voicing of the following obstruent: a voiced obstruent raises the
preceding vowel.
Moreton also documents that the two effects are acoustically equal in their
magnitudes, but not phonologically equal. Phonological interactions between
the height of adjacent vowels (i.e. height harmony) are very common. However,
phonological interactions between the height of a vowel and the voicing of the

following obstruent are rare or possibly nonexistent: in an extensive typological


survey Moreton finds only three marginal cases; one of these (Canadian raising)
involves an interaction in the wrong direction (voiced obstruents associated with
lower preceding diphthongs), and the other cases are not clearly voicing-related or
are of doubtful productivity.

Phonetic Pressure                                       Phonologized?
High vowels tend to be adjacent to other high vowels    Yes
Voiced obstruents tend to raise preceding vowels        No

This is therefore another case of underphonologization: there are articulatory pho-


netic patterns such that vowels are likely to be high before voiced obstruents and after
high vowels, while attested phonological patterns involve only the latter interaction
with vowel height.

7.4.2 Height-height interactions in the lexicon


By way of comparison, first consider the height-height interaction, where both pho-
netic and phonological patterns predict vowels of similar height to co-occur. To test
this, V(C*)V sequences were extracted from the lexicon of each language (that is,
pairs of vowels separated by zero or more consonants, but crucially not by another
vowel). For reasons explained below, two models were built for each language: one
with all the vowels, and one excluding the low vowels. The dependent variable in
each model is the height of the second vowel (V2), coded as a three-way distinction
(low ~ mid ~ high). The predictor variable of interest is the height of the first vowel
(V1); other factors included the frontness of V1 and the frontness, tenseness, and
stress of V2.
Table 7.2 summarizes the resulting models. Here, the lowest vowels in each model
are taken to be the default (that is, low vowels for the models with all vowels and
mid vowels for the models with only non-low vowels). Each coefficient represents the
effect of a vowel of the given height, relative to a low/mid vowel, on the likelihood
that the following vowel is mid/high vs. low, and on the likelihood that the following
vowel is high vs. mid/low (see Baayen 2008: 208-13 for a discussion of LRMs with
ordered factors). Thus, for the English model, the coefficient for mid vowels in the
model with all the vowels is -.47, meaning that a mid vowel is less likely than a low
vowel to 'raise' the following vowel. If these languages exhibit a static harmony-like
pattern, then all of the coefficients should be positive (meaning that non-low V1s
are associated with high V2s more than low V1s are), and the coefficients for high vowels
should be larger than those for mid vowels (meaning that high V1s raise V2s more than
mid V1s do).
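One simple way to operationalize an ordered three-level outcome (the chapter follows Baayen 2008: 208-13 on LRMs with ordered factors) is as two cumulative binary splits, mid/high vs. low and high vs. mid/low. The sketch below assumes a hypothetical DataFrame `pairs` of V(C*)V sequences with `v2_height` coded 0 = low, 1 = mid, 2 = high, and with illustrative column names for the other factors.

```python
import statsmodels.formula.api as smf

# Two cumulative splits, mirroring the two likelihoods described above.
pairs["v2_nonlow"] = (pairs.v2_height >= 1).astype(int)  # mid/high vs. low
pairs["v2_high"] = (pairs.v2_height == 2).astype(int)    # high vs. mid/low

split1 = smf.logit("v2_nonlow ~ C(v1_height) + v1_front + v2_front + v2_tense",
                   data=pairs).fit()
split2 = smf.logit("v2_high ~ C(v1_height) + v1_front + v2_front + v2_tense",
                   data=pairs).fit()
```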

TABLE 7.2 Prediction of vowel height from height of previous vowel

                        Factors in Final Model                   Coefficients of V1
Language        Vowels  V2 Front V1 Front V2 Tense V2 Stress  Mid      p      High    p
English         All     ✓        ✓        ✓        ✓          -.4740   .0000  -.1824  .0185
                Nonlow  ✓        ✓        ✓        ✓          -               .3768   .0001
German          All     ✓        ✓        ✓        ✓          -.3438   .0000  -.2616  .0000
                Nonlow  ✓        ✓        ✓        ✓          -               .0707   .2758
Dutch           All     ✓        ✓        ✓        ✓          -.0384   .1333  .1902   .0000
                Nonlow  ✓        ✓        ✓        ✓          -               .2213   .0000
French          All     ✓        ✓        ✓        NA         -.2721   .0000  -.2765  .0000
                Nonlow  ✓        ✓        NA                  -               .6906   .0000
Spanish         All     ✓        ✓        NA       NA         -.3150   .0000  -.3318  .0000
                Nonlow  ✓        ✓        NA       NA         -               .1067   .0936
Serbo-Croatian  All     ✓        NA       NA                  .1701    .0000  .6461   .0000
                Nonlow  NA       NA                           -               .1503   .0000
Korean          All     ✓        ✓        ✓        NA         .7998    .0000  .9541   .0000
                Nonlow  ✓        ✓        ✓        NA         -               .1277   .0001

FIGURE 7.3 Coefficients of V1 height factors, all vowels (*: p < .05; ·: p < .1)

The results for the models with all vowels are summarized graphically in Figure 7.3.
They paint a surprising picture: most of the coefficients are negative, suggesting some-
thing like an OCP effect over vowel height. In addition, the pattern of Korean, which is
the only language to display effects in the predicted direction, may well be a relic of the
vowel harmony system that existed in older forms of the language. This apparent anti-
harmony pattern is predicted neither by the attested phonological patterns of vowel
height harmony nor by their well-documented phonetic precursors.

FIGURE 7.4 Coefficients of V1 height factors, non-low vowels (*: p < .05; ·: p < .1)

Further investigation, however, suggests that a more subtle effect of height-height


harmony can still be found in these languages. In addition to the models of lexical
frequency that incorporated all of the vowels of each language, I built a second model
for each language, this time excluding low vowels. The starting point for this second set
of models was the observation that in the original models for some languages (English,
German, and Korean), the coefficient for high vowels is 'larger' (more positive) than
the one for mid vowelsin other words, high vowels are slightly more likely than mid
vowels to be associated with a following high vowel. This suggests, in turn, that vowel
harmony might be making itself felt among non-low vowels only.3
The results are given in Table 7.2 and summarized graphically in Figure 7.4. Here,
most of the coefficients (and all of the significant coefficients) are positive, as expected:
in other words, a high vowel is more likely to be followed by a high vowel than a mid
vowel is. Thus, although it does not exactly look the way it might have been expected
to, a static kind of height-height harmony does seem to have an effect on these
languages' lexicons. This is the pattern that matches both phonetic and phonological
facts.

7.4.3 Height-voicing interactions in the lexicon


The interaction between the height of adjacent vowels found in the previous section
stands in telling contrast with the interaction between vowel height and obstruent
3
This observation does not hold true for the other four languages under investigation. In an earlier
version of this work that used slightly less sophisticated modeling techniques, the appearance of harmony
among non-low vowels only was true for a larger proportion of the languages sampled. Although the facts
for the models with all vowels have changed, the results of the models with non-low vowels only have
not: they still strongly suggest some form of height harmony, as discussed below.

voicing. To investigate this latter pattern, VC sequences in which C was an obstruent


were extracted from the lexicon of each language. To ensure comparability with the
results for height-height interactions, two models were once again built for each
language: one with low vowels and one without.
The dependent variable was the height of the vowel, and the predictor was the
voicing4 of the consonant. Other factors included the manner of the consonant and
the frontness, tenseness, and stress of the vowel.
Table 7.3 gives the results of each model, and Figure 7.5 summarizes the models
graphically. The defaults are low vowels (mid vowels in the models without low vow-
els) and voiced obstruents; thus, the coefficient of .071 in the English model with
all vowels means that a voiceless obstruent is slightly more likely to be preceded
by a high vowel than a voiced obstruent is. If lexical frequency reflects the relevant
phonetic facts, we should expect to see mostly negative coefficients (meaning voiceless
obstruents are associated with lower preceding vowels than voiced obstruents are).
Figure 7.5 shows that, if anything, we see the opposite of the expected effect: in
many cases, voiced obstruents are associated with lower preceding vowels rather than
higher ones. Excluding low vowels as was done for the height-height pattern does
produce a few negative coefficients, but no overall trend consistent with the phonetic
facts. In other words, the underphonologized interaction between vowel height and

TABLE 7.3 Prediction of vowel height from voicing of following obstruent

                        Factors in Final Model
Language        Vowels  Frontness Tenseness Stress Manner  Coefficient   p
English         All     ✓         ✓         ✓      ✓       .0709         .0393
                Nonlow  ✓         ✓         ✓      ✓       1.0632        .0000
German          All     ✓         ✓         ✓      ✓       .2025         .6607
                Nonlow  ✓         ✓         ✓      ✓       -.7546        .1647
Dutch           All     ✓         ✓         ✓      ✓       .7243         .0000
                Nonlow  ✓         ✓         ✓      ✓       -.0386        .4621
French          All     ✓         ✓         NA     ✓       .0259         .4819
                Nonlow  ✓         ✓         NA     ✓       .5087         .0000
Spanish         All     ✓         NA        NA     ✓       .9183         .0000
                Nonlow  ✓         NA        NA     ✓       -.1942        .0027
Serbo-Croatian  All     NA        NA               ✓       .0116         .4789
                Nonlow  NA        NA               ✓       0.0000        1.0000
Korean          All     ✓         ✓         NA     ✓       .7658         .0000
                Nonlow  ✓         ✓         NA     ✓       1.3187        .0000

4
'Voicing' is here a cover term for a range of laryngeal contrasts: voiced/voiceless, unaspirated/aspirated,
lax/tense. In each pair, obstruents of the former type are expected to raise preceding vowels. Korean data
is reported for the unaspirated/aspirated obstruent series.

FIGURE 7.5 Coefficients of voiceless C factors (*: p < .05; ·: p < .1)

obstruent voicing does not consistently affect lexical frequency, unlike its phonolo-
gized cousin involving height-height interactions. If we examine the various combi-
nations of levels of other factors and consider the effects of interaction coefficients, the
difference between the height-height pattern and the height-voicing pattern remains:
the height-height trends are far more consistent with the phonetic pattern than the
height-voicing ones.

7.5 Discussion and conclusion

Cases of underphonologization provide a useful opportunity for studying separately


the behavior of phonetic and phonological patterns in natural language. In the two
cases examined here, some phonetic patterns, but not others, are reflected in patterns
of lexical frequency: only those phonetic patterns that have categorical ('phonologi-
cal') analogues seem to affect the lexicon. This result suggests the model in Figure 7.1a.
The implication is that lexical frequency is not affected by phonetics directly; rather,
it is influenced by whatever cognitive mechanisms are responsible for what we call
'phonology', which are themselves heavily influenced (but not completely determined)
by phonetic factors.

7.5.1 Other interpretations of the results


An alternative interpretation of these results might argue that they show, not that
phonetics has no influence on lexical frequency, but rather that phonetics influences
phonology and lexical frequency in the same way. For example, one might propose

that some phonetic patterns are 'stronger' than others, and it is only the strong
phonetic patterns that are able to influence other areas of natural languageamong
them, phonology and the lexicon. Although a priori plausible, it is likely that this
idea cannot be the whole story, given Moreton's (2008a) finding that the acoustic
effects of vowel height and obstruent voicing on vowel height are of comparable
magnitudes.
It is certainly possible that this case of underphonologization could be accounted
for in terms of cue strengthfor example, by showing that although the two pre-
cursors are acoustically equivalent, one is perceptually stronger than the other. (See
Yu 2011 for an approach along these lines.) However, even if it proves possible to
construct a universal measure of the strength of phonetic precursors, we must still find
a way of identifying which precursors are ineligible for phonologization. For example,
are they precursors that fall below a certain threshold of strength? Or are they the
precursors that are the weakest within some small comparison set? It is by no means
clear that this criterion, whatever it is, emerges spontaneously from purely phonetic
considerations; even specifying a simple threshold seems to also require some kind of
higher-level, more coarse-grained cognitive mechanism; in other words, what here
I call 'phonology'.

7.5.2 Lexical frequency: explanandum or explanans?


By asking which components of the grammar influence lexical frequency, this chapter
takes for granted that lexical frequency itself is something to be explained by other
parts of the human linguistic system. This assumption is far from uncontroversial; for
example, theories of sound patterns that fall under the umbrella of 'exemplar models'
(Pierrehumbert 2002) take the construction of phonological categories and patterns
to be based on the statistical distribution of the incoming data. In other words, in
many theories of this type, it is patterns of lexical frequency that drive phonological
facts, rather than the other way around (as in Figure 7.6).
The present results are consistent with exemplar-type theories as well; indeed, to
my knowledge, the architecture of such models predicts essentially the generalization
observed here: a close match between phonological patterns and patterns of lexical
frequency. If a given phonetic pattern is able to influence the lexicon, that same pattern
should go on to influence phonology; thus, the apparent absence of cases in which

FIGURE 7.6 Lexicon affected by phonetics, affects phonology



the lexicon contains a pattern that is never realized in phonology is good for these
models.5 Of course, the reason why some phonetic patterns influence the lexicon
while others do not is an important topic for future research within exemplar-type
frameworks, just as the reason why some phonetic patterns influence the phonology
while others do not is a topic for research in the more traditional framework adopted
above (for example, see Moretn 2008a for a proposal in the latter framework). The
contribution of this chapter is merely to rule out a model in which phonetics has one
relationship with phonology and an independent relationship with the lexicon:
these results show that whatever mechanisms selectively allow some phonetic patterns
but not others to be reflected in phonology, the same selectivity applies to which
phonetic patterns are reflected in the lexicon.

7.5.3 Future research


Future research should focus on investigating a broader range of data, both in terms
of languages and in terms of cases of underphonologization (or, more generally,
of phonetics-phonology mismatch). In addition, the question of what mechanisms
actually lead to these frequency imbalances is an important one that appears to be
understudied (although see Martin 2007). Finally, I have not addressed the question
of what determines the potential size of a lexical frequency bias. For example, it has
been noted (Maddieson and Precoda 1992; Walter 2008) that the lexical frequency
effects associated with consonant place OCP pressures are far larger than some other
effects involving vowels.
If the claim made here turns out to be more broadly supported, and we find that
lexical frequency is truly selectively sensitive to phonological pressures alone, then
patterns of lexical frequency offer linguists a valuable window onto the phonetics-
phonology distinction and a new tool for investigating it.

Appendix: Feature coding


Vowels in the databases were coded for height, frontness, rounding, tenseness, and
stress. Height, frontness, rounding, and tenseness were coded as shown in Table 7.4.
The diphthongs [ɪə], [ʊə], and [ɛə] were included because they are used in CELEX
to represent r-colored vowels; they were coded high, high, and mid, respectively.
The rising mid diphthongs [ei], [ɛi], [œy], and [əʊ] were included as well. All other
diphthongs (e.g., [ai]) were excluded, as was schwa.

5
Pierrehumbert (2001b) argues that if a pattern of lexical frequency is sufficiently non-robust, it can fail
to surface as a phonological pattern because not enough speakers will happen to have a lexicon that contains
the right information. Thus, an exemplar theory with this assumption could handle cases in which patterns
of lexical frequency do not match phonological patterns, but only if the lexical patterns are sufficiently
weak.

TABLE 7.4 Vowel Features

             Front                         Back
             Nonround        Round         Nonround            Round
High Tense   i, i:, i::      y, y:, y::                        u, u:
High Lax     ɪ, ɪə           ʏ             ɯ                   ʊ, ʊə
Mid Tense    e, e:, ei, ɛi   ø, ø:, øy     ɤ, ɨ                o, o:, əʊ
Mid Lax      ɛ, ɛ:, ɜ:, ɛ̃    œ, œ:, œ̃      ɜ, ʌ                ɔ, ɔ̃, ɔ:
Low Lax      æ, æ̃, æ:        a, ã, a:, ɑ, ɑ̃, ɑ:                ɒ, ɒ:, ɒ̃

TABLE 7.5 Consonant Features

            Lab.       Dent.   Alv.        Retr.     Pal.       Vel.       Glot.
Stop        pʰ p b ɓ           tʰ t d ɗ                         kʰ k g ɠ
Fricative   f v        θ ð     s z         ʂ ʐ       ʃ ʒ        x ɣ        h
Affricate   pf         ts tsʰ dz           tʂʰ tʂ    tʃ tʃʰ ʥ ʥ'

Consonants were coded for voicing, place, and manner. Place and manner coding
are shown in Table 7.5. Sonorants (not listed) play no role in the analyses above.
8

Phonologization and the typology of feature behavior
JEFF MIELKE

8.1 Introduction
One of the successes of distinctive feature theory has been the identification of a num-
ber of phonetically defined features which are able to describe the groups of sounds
that are phonologically active in many unrelated languages. This study measures the
crosslinguistic frequency of occurrence of classes defined by particular features which
have been proposed and examines the phonological behavior of these classes. The
characteristic behavior profiles of particular features are explored in terms of two
approaches to feature effects, one of which draws on representations for explanation,
and the other of which draws upon phonologizable phonetic effects for explanation.
Innatist approaches to distinctive features have accounted for the crosslinguistic
recurrence of particular types of sound patterns by building crosslinguistic gener-
alizations into the representations used for phonological patterns (Chomsky and
Halle 1968; Clements 1985; Sagey 1990). In this view, representations are explanatory.
Recurrent classes are definable using innate features, and the behavior of particular
classes is attributed to the organization of the mental representation of phonology.
The observation that only some logically possible classes of sounds are frequently
active in sound patterns is accounted for by positing that only the features which
define the active classes exist in a universal feature set. More specific observations
are accounted for by positing that certain feature values do or do not exist, and that
features are organized in a hierarchy that restricts the ways in which they can interact.
This approach may be summarized with the slogan 'Things happen because of features'.
Another view is emergent features (Mielke 2008), in which feature effects are
accounted for in terms of the historical development of sound patterns, as marked-
ness generalizations and other patterns are accounted for in Evolutionary Phonology
(Blevins 2004). Recurrent phonologically active classes are defined by features whose
phonetic correlates are involved in commonly-phonologized phonetic effects, and the

FIGURE 8.1 Relationships between phonetics, features, and phonological patterns (Mielke
2008: 8)

behavior of particular classes is attributed to the nature of the phonetic effects from
which they developed. This approach can be summarized with the slogan 'Features
happen because of things'.
These two approaches to feature effects are schematized in Figure 8.1. In innate
feature theories, recurrent sound patterns are built out of distinctive features from the
universal feature set, which are in turn grounded in phonetics, so that features serve
as a link between phonetics and sound patterns. In the case of sound patterns which
are not easily captured using a particular feature set (e.g. sound patterns involving
unnatural classes of sounds), recourse can be made to phonetic effects or historical
accidents (the dotted line connecting 'sound pattern' and 'phonetics'). In emergent
feature theory, this is the only connection between sound patterns and phonetics, i.e.
all sound patterns are historical accidents, but some of these accidents, such as the
phonetically natural ones which form the primary data for innate feature theories,
are more frequent than others. Features, in emergent feature theory, are posited by
learners in response to observed sound patterns.
The purpose of this study is to investigate whether the features that are frequently
required to describe sound patterns are attributable to frequently phonologized pho-
netic effects, i.e. to the dotted lines in Figure 8.1. Among the goals is to tease apart
features from their phonetic correlates as sources of explanation. Emergent feature
theory predicts that features are needed in rules only insofar as they are related to the
origin of the particular sound patterns they are involved in. In this view, features are a
component of a language user's grammar, as the formalization of a sound pattern that
is evident in the ambient language. Only features that are useful for characterizing
rules that occur due to real diachronic changes are useful for the grammar. This is

a prediction of any approach to phonology in which sound patterns are attributed to


phonologized phonetic effects rather than to synchronic representations. For exam-
ple, if a language undergoes a change in which vowels become allophonically nasal-
ized next to nasal consonants, as a consequence of the phonologization of naturally-
occurring overlap between velum lowering and other gestures, then this language
will have a sound pattern recognizable as nasal assimilation, and a learner of this
language would posit a rule that refers to the classes of nasal consonants and vowels
involved. This would make use of feature values traditionally described by linguists as
something along the lines of [+nasal] and [-consonantal].
To explore the connection between phonetic effects and features involved in sound
patterns, the features needed to describe a sample of phonologically active classes
were counted and categorized according to their behavior. The next sections describe
how the counting and classification were performed, and what the results show. The
results bear on questions such as how often particular feature values are active in
phonological patterns, and what the classes defined by these features typically do.
Further, they are relevant for the question of what aspects of these patterns can be
accounted for in terms of the relationship between the features' phonetic correlates
and phonologization precursors.

8.2 Methods
Counting and categorization of classes defined by particular features were conducted
on the sound patterns included in P-base1 (Mielke 2008), a database of sound patterns
found in language grammars available on library shelves. It includes 628 language
varieties, which are grouped into 549 languages. Dialects were considered to be one
language if they shared an entry in Ethnologue (Grimes, Grimes, and Pittman 2000).
All phonologically active classes involving more than one but fewer than all of the
segments in an inventory reported in these grammars were recorded. The definition of
a phonologically active class given in (i) is based entirely on phonological patterning,
as opposed to the traditional definition of natural class which also involves phonetic
or featural naturalness.
(1) Phonologically active class (Mielke 2008: 48-9): any group of sounds which, to
the exclusion of all other sounds in a language's inventory, do at least one of the
following:
a. undergo a phonological process;
b. trigger a phonological process; or
c. exemplify a static distributional restriction.

1
P-base is freely available on the web: <http://www.oup.com/uk/companion/mielke>. See Mielke (2008:
ch. 3) for a more detailed description of the survey methods.

The 6077 phonologically active classes matching (1a, b) (excluding the static distri-
butional restrictions2) were classified according to features based on those proposed
in The Sound Pattern of English (Chomsky and Halle 1968), listed in (2). The features
[syllabic], [long], and [extra (long/short)] were used to capture prosodic distinctions
which are not considered to be the responsibility of the segmental feature system, and
the classes defined by these features are not discussed here.
(2) Features used for categorization of classes and changes:
[consonantal] [anterior] [delayed primary release]
[vocalic] [distributed] [delayed release of secondary closure]
[sonorant] [strident] [glottal (tertiary) closure]
[continuant] [lateral] [heightened subglottal pressure]
[voice] [back] [movement of glottal closure]
[nasal] [low]
[tense] [high] ([syllabic])
[coronal] [round] ([long])
[covered] ([extra (long/short)])
Features are being used as a familiar descriptive labeling convention in order to group
together different phonologically active classes of sounds from different languages,
for the purposes of counting them. The use of features for classificatory purposes is
orthogonal to the question of whether features are primitives in phonological pat-
terns. The particular feature set in (2) was chosen because it was able to represent
the greatest number of phonologically active classes using conjunctions of distinctive
feature values (Mielke 2008: ch. 7), compared to features from Preliminaries to Speech
Analysis (Jakobson, Fant, and Halle 1952) and Unified Feature Theory (Clements and
Hume 1995). Some of these features are no longer widely used, but this is of little
consequence here, because the primary concern of this study is the behavior of very
frequent classes, which are easily handled, often in the same way, by many different
feature systems. See Mielke, Magloughlin, and Hume (2011) for a comparison of six
feature systems using the same database.
Featural descriptions for phonologically active classes were generated by an algo-
rithm that constructs a feature matrix for the segment inventory of a language and
produces the minimal set of feature values that can define the class, if the class is defin-
able in this way. See Mielke (2008: 47-55) for details of how this algorithm works.3
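For concreteness, a brute-force version of such an algorithm fits in a few lines. The sketch below uses hypothetical data structures (a dict from segments to feature-value dicts) and searches bundles in order of increasing size, so the first hit is minimal; it is an illustration, not the survey's actual implementation.

```python
from itertools import combinations

def minimal_description(cls, inventory, features):
    """Smallest conjunction of feature values picking out exactly `cls` from
    `inventory`, or None if no conjunction of values defines the class.
    `features[seg]` is a dict like {'voice': '+', 'nasal': '-', ...}."""
    cls = set(cls)
    # Only values shared by every class member can appear in the conjunction.
    shared = set.intersection(*(set(features[s].items()) for s in cls))
    for size in range(1, len(shared) + 1):
        for bundle in combinations(sorted(shared), size):
            matches = {s for s in inventory
                       if all(features[s].get(f) == v for f, v in bundle)}
            if matches == cls:
                return bundle
    return None

inv = {'p': {'voice': '-', 'nasal': '-'},
       'b': {'voice': '+', 'nasal': '-'},
       'm': {'voice': '+', 'nasal': '+'}}
print(minimal_description({'b', 'm'}, inv, inv))  # (('voice', '+'),)
```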
Of the 6077 classes, 4313 (71.0 per cent) could be represented by a conjunction of the
2
Static distributional restrictions have been excluded in order to avoid mixing productive phonolog-
ical patterns with patterns that are more likely to be fossilized remnants. Distributional restrictions are
nonetheless an interesting object of study, and this is set aside for future investigation.
3
The algorithm selects an analysis which requires the minimum number of features. In cases where one
feature is implied by another one (e.g. [+lateral] implies [+coronal] in the SPE system), both features were
counted. In cases where more than one minimal feature bundle was possible, one of these was selected
arbitrarily.

features in (2) (Mielke 2008: 147), and these classes are considered further. The residue
of 'unnatural' classes, discussed in Mielke (2008: 118-33), is a mixture of phonetically
unnatural classes and phonetically natural classes that are not handled well by the
feature theory. Since distinctive features were used to classify the sound patterns, the
analysis in this chapter focuses on the classes that were handled well by SPE features.
In addition to defining the classes automatically, the changes involved in the sound
patterns were defined featurally by hand.
Occurrences of features were categorized according to the types of behavior in
(3). These examples illustrate the four types of feature behavior using [+voice] as an
example. In (3a), [+voice] is involved in the change and also present in the environ-
ment triggering the change. This is classified as spread. In (3b), [+voice] is involved
in the change, and its opposite value ([-voice]) defines the environment triggering
the change, so this would be classified as dissimilation. In (3c(i)), [+voice] defines a
class of sounds undergoing a change but is not involved in the change itself, making
this an example of a feature being used to partition an inventory into undergoers
and nonundergoers of a sound pattern. Partitioning an inventory into triggers and
nontriggers of a sound pattern without being involved in the change is also classified as
partitioning, as in (3c(ii)), where [+voice] sounds trigger a change which is unrelated
to voicing. Any use of a feature that does not fit into one of these three categories, such
as being involved in a change that is neither dissimilatory nor assimilatory, as in (3d),
is classified as other. (A schematic classifier follows the examples in (3).)

(3) Categorizing feature occurrences by phonological behavior

a. Spread:       [-son] → [+voice] / ___ [+voice]
b. Dissimilate:  [-son] → [+voice] / ___ [-voice]
c. Partition:
   i.  [+voice] → [+cont] / [+syl] ___ [+syl]
   ii. [+cont] → [+strident] / ___ [+voice]
d. Other:        [-son] → [+voice] / ___ [+cont], etc.
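The classification in (3) can be made concrete with a small sketch, under the simplifying assumption that each occurrence is coded by whether the value appears in the structural change and by which values define the triggering environment (this representation is an assumption of the sketch, not the coding scheme used for the database):

def classify(value, in_change, env):
    """Classify one occurrence of a feature value in a sound pattern.

    value:     a (feature, polarity) pair, e.g. ('voice', '+')
    in_change: True if the value is part of the structural change
    env:       set of (feature, polarity) pairs defining the trigger
    """
    feature, polarity = value
    opposite = (feature, '-' if polarity == '+' else '+')
    if in_change and value in env:
        return 'spread'        # (3a): the change matches the trigger
    if in_change and opposite in env:
        return 'dissimilate'   # (3b): the change opposes the trigger
    if not in_change:
        return 'partition'     # (3c): the value only delimits a class
    return 'other'             # (3d): a change unrelated to the trigger

# (3a): [-son] -> [+voice] / ___ [+voice]
print(classify(('voice', '+'), True, {('voice', '+')}))   # spread
# (3c(ii)): [+cont] -> [+strident] / ___ [+voice]
print(classify(('voice', '+'), False, {('voice', '+')}))  # partition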

8.3 Results
The results are presented here in terms of the features used to define classes and
changes, beginning with a look at the frequency of the most frequently-used features,
and proceeding to their behavior.
Figure 8.2 shows the eighteen features that are used in the descriptions of most
sound patterns, and the activity of their + and − values. The dark bars represent cases
where a single feature value defines a class or change, and the light bars represent
cases where the feature value is used in conjunction with other feature values to

FIGURE 8.2 The most frequently-used features

define a class or change. The features [voice] and [high] are used the most, followed by [back], [nasal], [continuant], and [sonorant]. Occurrences of [voice] are divided roughly equally into [+voice] alone, [-voice] alone, [+voice] as part of a larger feature bundle, and [-voice] as part of a larger feature bundle. Occurrences of [high], however, are dominated by cases where it is used as part of a larger bundle (such as [+high, +vocalic]).
The features are sorted according to the total number of occurrences of either feature value, although it is apparent that some features are more symmetrical than others in the occurrence of their + and − values. The feature [voice] is quite symmetrical, but other features are used more for one value than for the other. For instance, [+nasal] is more than three times as frequent as [-nasal], and [-sonorant] and [+distributed] are also much more frequent than their opposites.
Figure 8.3 shows the number of occurrences of spreading, dissimilating, parti-
tioning, and other behavior for classes defined using the most frequently-used fea-
ture values. The chart is based on all of the occurrences of each value, not just

FIGURE 8.3 Total number of occurrences of each type of feature behavior for the most frequent
feature values

the ones where the feature is used by itself. Both are interesting things to count.
Counting only single-feature bundles (instances where a feature is used by itself)
yields more differences between features, but counting all occurrences of each fea-
ture (as in the figure) provides results that are more applicable to whether phonetic
effects can account for the need for a particular feature, because the figure shows
all of the instances where the feature is needed to describe a phonological pattern.
Figures 8.6-8.7, in the appendix, display the same information for single-feature
bundles.

As seen in Figure 8.3, much of the spreading is concentrated among a small number of feature values. The feature [+voice] is involved in 20.1 per cent of all spreading, followed by [+nasal] (12.1 per cent), [+back] (8.7 per cent), and [+continuant] (8.2 per cent). Other feature values, such as [-sonorant] and [-continuant], seldom if ever spread, but are required in other capacities (partitioning, other). Dissimilation is much less frequent than assimilation, and is concentrated primarily among [-sonorant] and [+consonantal], followed by [-continuant], [-voice], [±nasal], and [+high]. All of the features are used extensively in partitioning, and there are features, many of them major class features, which do almost nothing else.
Figure 8.4 shows the rates of spreading, dissimilating, and partitioning for the
same features, as percentages of the occurrences of each feature. The overall average

FIGURE 8.4 Rate of spreading, dissimilating, partitioning, and other behavior for the most
frequently-used feature values

rates of spreading, dissimilating, partitioning, and other behavior are indicated. Some
feature values, such as [+sonorant], [+low], and [-consonantal], have high rates of
spreading, although they do not account for a very large proportion of all the instances
of spreading, because the feature values are used less overall.

8.4 Discussion

There are several ways in which the results show different behavior for different fea-
tures, including differences in spreading, dissimilation, partitioning, and in the way
particular values of the same feature behave.

8.4.1 Spreading values vs. non-spreading values


In representational approaches, spreading has been taken as particularly strong evi-
dence that a feature exists, and features that do not seem to spread have had their
existence called into question. For example, Hume and Odden (1996) argue that
[consonantal] lacks clear cases of being involved in a change, and that instances where
the feature has been used to define segmental contrasts or define classes involved
in patterns (partitioning) can be reanalyzed in terms of other features. There is a
practical reason for favoring spreading in the establishment of the need for a fea-
ture. The reason is that changes, particularly changes appearing to involve only one
feature, are difficult to reanalyze. On the other hand, there are often many alter-
natives for defining a phonologically active class. The result is that it is difficult to
argue for the necessity of particular features whose main activities do not involve
spreading.
One thing that is clear in the results above is that most of the work done by
features involves partitioning rather than spreading, and many feature values that are
not involved in spreading are needed for their role in defining classes. The classes
involved in phonological patterns most likely cannot be reduced to just the spreading
feature values in Figure 8.3. Therefore the distinction between robustly-spreading
features and non-spreading features cannot be handled by positing a universal feature
set which only includes features with evidence of spreading. This distinction can,
however, be made by appealing to the origins of sound patterns involving spreading,
if spreading feature values have phonetic correlates which are involved in coarticu-
latory effects that are precursors for phonologization. In other words, if assimilation
is treated as phonologized coarticulation, then it is expected that some feature val-
ues (those whose phonetic correlates are prone to noticeable coarticulation, perhaps
those with long, simple gestures that can overlap a lot of segments and be prone to
misparsing) will be spread more often in phonological patterns. Other feature values
may be necessary for other purposes but simply not be involved in sound patterns in
this way.

The feature values that are responsible for the most spreading depicted in Figure 8.3 have phonetic correlates that are known to be involved in coarticulatory effects. These features include [+voice] and [+nasal], as well as [+high] and [±back], and [+sonorant] and [+continuant], which are involved in intervocalic lenition. The feature values that almost never spread ([-son], [-voc], and [+cons]) do not have phonetic correlates that are involved in coarticulation, so they lack phonologization precursors for assimilatory patterns. By attributing the difference between spreading and non-spreading values to the phonetic effects that tend to get phonologized, it is possible to account for this distinction while leaving a role in the theory for non-spreading feature values.
The features that have the highest ratios of partitioning to spreading are [-sonorant], [-vocalic], and [+consonantal]. These features are frequently used to define phonologically active classes, partitioning inventories into undergoers and non-undergoers or triggers and non-triggers. These features have a history of being difficult to define. For example, Kenstowicz and Kisseberth (1979: 21) observe that some features are hard to define phonetically but are still necessary to describe sound systems:
There are no truly satisfactory articulatory or acoustic definitions for the bases of these two
different partitions [consonant and sonorant]. Nevertheless, they are crucial for the description
of the phonological structure of practically every language.

Chomsky and Halle (1968: 318) observe that it is not obvious how laterals should
be defined with respect to the feature [continuant]. Mielke (2005) gives evidence that
they pattern as continuants and as noncontinuants, and suggests that what has been
treated as a single feature [continuant] may be better treated as a bundle of related
phonetic parameters that oppose stops to fricatives and/or vowels but treat phonet-
ically ambiguous sounds (e.g. laterals and nasals) differently. This is consistent with
Kenstowicz and Kisseberth's (1979) comments about [consonantal] and [sonorant].
Since there is often more than one way to define a class, crucial evidence about the
definitions of partitioning features is hard to find. Consequently, it is more difficult
to make a case for the universality of partitioning features, and Mielke (2005) argues
that [continuant] is only as predictable crosslinguistically as the phonetic properties of
the sounds involved, and that the feature effects which are the most consistent across
languages seem to be the ones with the most direct phonetic basis.
The connection between being hard to define phonetically and not spreading can
be treated as an issue of analysis, i.e. that it is easier to identify phonetic correlates
of features that spread, because the assimilated and unassimilated segments can be
compared directly. When features define a partition rather than spread, it is often
in conjunction with other features, and often there are multiple alternative feature
bundles which can define the same class.
The murkier phonetic dimensions are also less straightforwardly involved in coarticulation, or less straightforwardly reinterpreted as phonological patterns when they are involved in coarticulation. The spreading-partitioning distinction coincides with
Halle and Stevens's (1991) distinction between articulator-bound and articulator-free


features. It is potentially harder to have coarticulation that corresponds to a feature
that is not tied to a particular articulatory implementation. McCarthy (1988) accounts
for the lack of spreading by [consonantal] and [sonorant] by placing them inside the
root node in feature geometry so that they do not project their own tiers.
So far the class of partitioning features has been identified negatively, in the failure of these values to be spread in assimilatory patterns (due perhaps to the failure of their phonetic correlates to be involved in coarticulation or to result in phonologization) and the failure of the classes they define to submit easily to unambiguous feature analysis. There are also positive reasons for being involved in partitioning, such as when a feature's phonetic correlates cause sounds to be more susceptible to certain changes. The value [-sonorant] has such a relationship with voicing: certain supralaryngeal configurations require more effort to produce vocal fold vibration, amplifying the tendency of vocal fold vibration to spread between segments, and so the cases of [-sonorant] partitioning include a lot of cases involving [voice].
In proposing the feature [sonorant], Chomsky and Halle (1968: 300-2) draw a dis-
tinction between spontaneous voicing (for sonorants) and non-spontaneous voicing
(for obstruents). A cluster of a voiced sonorant and a voiceless obstruent is a cluster
containing two segments with voicing specifications that are most compatible with
their supralaryngeal configuration. A (non-spontaneously) voiced obstruent, which
requires more effort to voice, is more likely to affect or be affected in a cluster with a
voiceless obstruent. See also Halle and Stevens (1967); Westbury and Keating (1986);
Jansen (2004) on voicing.
The feature [sonorant] is used 658 times as part of a larger feature bundle to define classes. The value [+sonorant] spreads in some lenition patterns, but [-sonorant], which accounts for the bulk of the uses of [sonorant], is almost exclusively involved in partitioning. This is consistent with other major class/manner features with murky phonetic correlates. The only instances of [+sonorant] being used by itself are nine cases where it defines assimilatory or dissimilatory changes. Thus [+sonorant] defines changes, but the class of sonorants is never active in the database. The value [-sonorant] is used by itself 81 times to define the active class of obstruents. Close to half of these involve voicing or devoicing: in 32 cases the class is voiced or devoiced, and in six cases the class triggers devoicing. In addition to having reasons not to spread, [-sonorant] has particular reasons to partition. This also accounts for part of the asymmetry in the behavior of the + and − values: [-sonorant] is associated with phonetic properties that have a phonologizable effect on voicing, and [+sonorant] is not.

8.4.2 Dissimilating feature values

Many of the instances of dissimilation involve [-sonorant], [+consonantal], and [-continuant]. These are primarily fortition patterns, triggered by vowels, in which glides become fricatives (dissimilating to [-sonorant, +consonantal]) or fricatives become stops (dissimilating to [-continuant]). Other fortition patterns classified as dissimilation include aspiration of consonants ([+heightened subglottal pressure], which is not frequent enough to appear in Figures 8.3-8.4). The remaining feature values involved in a substantial number of cases of dissimilation are [-voice], [±nasal], and [+high]. The values [-voice] and [-nasal] are notable because they are involved in a lot of single-feature dissimilatory changes (whereas [+high] is often involved in patterns where it is not the only feature changing). The values [-voice] and [-nasal] are the opposites of the feature values which do the most spreading. The value [-nasal] does not define very many changes, but it does define a relatively large number of dissimilatory changes. The fact that [+nasal] spreads frequently may be a clue to why the rarely active [-nasal] is involved more than most feature values in the uncommon phenomenon of dissimilation.
A connection between dissimilation and spreading by opposite feature values is consistent with Ohala's (1981) account of dissimilation as the phonologization of mistakenly undone assimilation. The instances of [-voice] and [-nasal] dissimilation are cases where a class of segments becomes denasalized next to a nasal segment or devoiced next to a voiced segment. Ohala argues that dissimilation arises as the result of a listener misapplying a correction for coarticulation or assimilation that did not actually occur. This means, for example, that dissimilation to [-nasal] would arise as a result of a listener hearing a sequence such as [ãn] and understanding it as /an/ with coarticulatory or assimilatory nasalization, and subsequently pronouncing it as [an] in careful speech, thus 'undoing' nasalization which did not occur in the first place. In order for a listener to attempt to correct an apparently coarticulated sequence, there must be reason to suspect coarticulation in the first place. If this is right, then some types of dissimilation are dependent on assimilation or coarticulation, because they result from the misapplication of a correction for assimilation or coarticulation.
To examine the relationship between the two types of phenomena, a Pearson
product-moment correlation was used to compare dissimilating feature values with
spreading feature values. There is no correlation between dissimilation and spreading
of the same feature value [R = 0.080, p = 0.704], but there is a significant positive
correlation between dissimilation and spreading of the opposite value [R = 0.515,
p = 0.008]. This is illustrated in Figure 8.5, which shows the number of occurrences
of single-feature dissimilatory changes plotted against the number of occurrences of
single-feature spreading by the same feature value (left) and the opposite feature value
(right).
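The test itself is a plain Pearson correlation over paired counts per feature value and can be reproduced on any such table of counts; the numbers below are placeholders rather than the survey's actual figures:

from scipy.stats import pearsonr

# One entry per feature value; illustrative placeholder counts only --
# the real values are the ones plotted in Figure 8.5.
dissimilation   = [6, 4, 3, 2, 1, 0]   # single-feature dissimilatory changes
spread_same     = [1, 0, 2, 5, 0, 3]   # spreading by the same value
spread_opposite = [9, 7, 4, 3, 1, 0]   # spreading by the opposite value

r_same, p_same = pearsonr(dissimilation, spread_same)
r_opp,  p_opp  = pearsonr(dissimilation, spread_opposite)
print(f"same value:     R = {r_same:.3f}, p = {p_same:.3f}")
print(f"opposite value: R = {r_opp:.3f}, p = {p_opp:.3f}")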
The dissimilatory patterns involving feature values that do not fit this pattern (e.g. [+heightened subglottal pressure]), while formally similar to the other dissimilatory patterns, may originate from domain-initial strengthening rather than overcompensation for coarticulation.

FIGURE 8.5 Instances of single-feature dissimilation compared with instances of spreading by the same and opposite feature values

8.4.3 Different behavior of different values


The asymmetrical behavior of + and − feature values may also be accounted for in terms of phonetic effects that can become phonologized. For example, while [+voice] and [-voice] are involved in similar numbers and types of sound patterns, [+sonorant] and [-sonorant] are asymmetrical, i.e. the class [-sonorant] is active in many languages, but the class [+sonorant] is not. Both feature values are used as part of larger bundles, but the − value is used more often. The observation that [+nasal] is more active than [-nasal] and [-sonorant] is more active than [+sonorant] could potentially inform feature models in which one pole of an opposition is treated as marked and the other as unmarked. Asymmetrical behavior has also been accounted for by positing that certain features have only one value, although this leaves the question of why some features have different numbers of values than others in the first place (see Flemming 2002: 131-2 for discussion), and does not address relative activity, since the absence of a value predicts no activity. In the case of [sonorant], an explanation is that the phonetic correlates of [-sonorant] are associated with patterns such as voicing assimilation, while the phonetic correlates of [+sonorant] do not predispose sounds to such patterns. However, while [+sonorant] is less active than its opposite value and defines no classes on its own, it is still involved (in conjunction with other features) in defining phonologically active classes. Appealing to the relationship between phonetic effects and feature behavior allows for a more nuanced account of feature behavior than is available from positing the presence or absence of particular feature values.

8.4.4 The appearance of universality


The similar patterning of segments in unrelated languages has given the impression of
a universal feature set. Drawing upon phonetic effects for explanation, the answer to
the question of why there seem to be universal features varies from feature to feature.
The value [+voice] is needed to describe sound patterns in language after language,
in large part due to the tendency for the vocal fold vibration of one segment to overlap
another segment. This is seen in the large number of cases of assimilation involving
this feature value. The value [-sonorant] is active in many languages because certain supralaryngeal configurations require more effort to produce vocal fold vibration, amplifying the tendency of voicing to spread, and making [-sonorant] a frequent partitioning feature. The value [+high] is common for more than one reason, including
the tendency for tongue height gestures to overlap, and the tendency for non-sonorous
vowels to be reinterpreted as glides.
Features are good for referring to groups of sounds that have the same behavior
within a particular language. What the sounds do depends on what sound patterns
they are likely to be involved in in the first place. Tracing the appearance of universality
to particular phonetic effects is consistent with the observation that in synchronic
patterns the segments with the most consistent behavior with respect to a particular
phonetic dimension are the segments which are most unambiguous with respect to
that dimension (Mielke 2005, 2008).

8.4.5 Parallels in sound pattern typology


Accounting for the behavior of different features in terms of the phonetic effects which
lead to particular sound patterns has parallels in the explanation of different types of
sound patterns in terms of their development or phonetic basis. For example, assimila-
tion often involves recurrent phonetically-related sets of features, and this motivated
feature hierarchies in which articulatorily-independent features are segregated from
one another (Clements 1985). The feature hierarchies can be interpreted as models
of the articulators involved in the coarticulation which could give rise to assimilatory
patterns. Dissimilation is usually structure-preserving, because it is driven by listeners
who are positing plausible lexical representations as they mistakenly undo assimila-
tion (Ohala 1981), and metathesis is also usually structure-preserving, for the same
reason (Blevins and Garrett 1998; Hume 2004b). Epenthesis is usually traceable to a few natural sound changes (Blevins 2008a).

8.5 Conclusion
In summary, the survey has provided evidence that different features have different
behavior, which in many cases can be attributed to their phonetic correlates' involve-
ment in phonologization precursors. This is expected if features are abstractions from

sound patterns, and different features have different reasons for existing (Mielke
2008), but it is surprising if features are treated as explanatory primitives. Understand-
ing how phonologization gives rise to certain sound patterns is key to understanding
the sound patterns themselves. Representational approaches have been developed
with the purpose of accounting for feature behavior, but universal representation is
often too blunt an object to account for the behavior of features, often placing too
much emphasis on whether a particular feature or value does or does not exist. Rather,
the life of features seems to be richer than can be compressed into a model based on
presence or absence of universal features.

FIGURE 8.6 Behavior of single-feature bundles (totals)



FIGURE 8.7 Behavior of single-feature bundles (percentage)

Appendix: Single-feature bundles


Figures 8.6-8.7 show the behavior of feature values acting alone. These are the same
as Figures 8.3-8.4 but with all the occurrences of the feature value in conjunction with
other features removed. This results in the appearance of more dramatically different
behavior between features, but excludes instances of spreading, dissimilating, parti-
tioning, and other behavior which these features are involved in in conjunction with
other features.
9

Rapid learning of morphologically conditioned phonetics: Vowel nasalization across a boundary*

REBECCA MORLEY

9.1 Introduction
The speed and ease with which young children converge on seemingly complicated
and abstract linguistic knowledge has long been taken as support for the hypothesis
that many aspects of grammar must be innate (Chomsky 1986). However, there has
been a growing body of work showing success in pattern extraction by associationist
models (cf. Rumelhart and McClelland 1986; Elman 2003), as well as statistical learn-
ing by infants and adults (cf. Jusczyk et al. 1999; Maye et al. 2002; Newport and Aslin
2004; Wilson 2006). This work has re-opened the question of how much information
is in fact contained within the auditory input, and how much of that can be attended
to and extracted by listeners.
The fact that many aspects of phonetics cannot be attributed to universal processes
of articulation and motor planning, but that individual languages adopt individual
phonetic implementations, is evidence that these phonetic facts must be learned, and
therefore, that learners must be able to induce them from the speech stream. Research
suggests that among the relations that listeners must encode are the degree to which
their language nasalizes vowels before nasal consonants, and lengthens low vowels
relative to high ones (Keating 1985; Solé 1992b; Beddor and Krakow 1999). There is,
additionally, considerable evidence that speakers can, at least for certain tasks, access
highly detailed representations of particular words and sounds (Summerfield 1981;
Goldinger 1996; Remez et al. 1997; Clopper and Pisoni 2004; Allen and Miller 2004).

* This work was supported by a Department of Education Javits fellowship as well as a National Science
Foundation IGERT grant to the Johns Hopkins Cognitive Science department. I would like to thank Paul
Smolensky, Colin Wilson, and the members of the Johns Hopkins IGERT lab.

9.1.1 Evolutionary Phonology


That this fine-grained acoustic and articulatory level information also plays a central
role in how languages change over time is a tenet of the theory of Evolutionary Phonol-
ogy which situates language change in the process of transmission of the physical
speech signal (Ohala 1981, 1990, 1993b; Blevins 2004). Actual speech cannot readily
be broken up into its constituent elements; adjacent sounds overlap and sometimes
merge or disappear completely when produced at fast, or even normal speaking rates.
The listener, in successfully reconstructing the speaker's intended utterance, must
somehow be able to subtract out this noisiness in the channel. This task, in turn,
must rely on a familiarity with the way acoustic cues are likely to appear in particular
environments: a degree of phonetic expertise. For example, some features possess a
certain long-range character, such that, in production, they can be distributed over
multiple segments in the acoustic signal. This is the case for the nasalization that
occurs on vowels adjacent to nasal consonants (phonetically [Ṽn]). The listener who
is, at some level, aware of this property will be able to hypothesize that the only
underlying nasal feature belongs to the final nasal consonant, phonemically /Vn/.
This reconstruction might be broadly classed as compensation for coarticulation, and
various experimental results support our ability to perform it (Mann and Repp 1980;
Alfonso and Baer 1982; Whalen 1991).
If, however, listeners fail to correctly compensate, either by under-correcting or
over-correcting, that is where change can occur. The stage at which acoustics, or
phonetics, becomes part of the phonology is the point at which a persistent difference
arises between the original source of the signal (the speaker's intention), and what
the listener encodes. This may happen either because what the listener perceives runs
counter to their phonetic expectation, or because the signal is inherently ambiguous
in some way. Consider again the case of nasalized vowels. In some languages, such
as Portuguese, nasality is contrastive on vowels, and minimally distinct word pairs
can be found: /vi/, meaning 'saw', and /vĩ/, meaning 'came'; /mudu/, meaning 'mute', and /mũdu/, meaning 'world'. In other languages, such as English, nasal vowels have
a completely predictable distribution; they occur only in environments before nasal
consonants, and are usually attributed to the phonetics rather than the phonology (see
Cohn 1993 for articulatory evidence of gradient nasalization in English). The standard
account for the historical emergence of phonemic nasal vowels attributes their origins
to these phonetic allophones. Furthermore, it is usually assumed that the conditioning
nasal must be lost in order for the nasality on the vowel to become phonemicized (see
Ohala 1993b).
In terms of the Evolutionary Phonology account outlined above, listeners who are
aware of the expected degree of phonetic nasalization may use it to predict the pres-
ence of a following nasal consonant. Conversely, the presence of the nasal consonant
can be used to assess the quality of the preceding vowel. The case in which phonetic
nasality is present on the vowel but the conditioning nasal is missing represents a

mismatch between perception and expectation. There is no source in the signal for
the nasality perceived on the vowel, other than the vowel itself. Thus, in the absence
of the originally conditioning nasal consonant, the nasality cues on the vowel may
first become more salient to the listener, leading them to be encoded as part of the
underlying representation of the vowel, and ultimately to an oral/nasal vowel contrast
in the language as a whole. This argument gains support from experimental work by
Kawasaki (1986) who found that, for a series of one syllable stimuli, English speaking
participants rated the vowel as more nasal, the lower the amplitude (and thus the lower
the perceptibility) of the final nasal consonant.1
The ultimate story, of course, must be significantly more complicated than this,
given the multifarious character of natural language. As one example of this com-
plexity, consider the prosodic hierarchy, nested levels of constituent structure each
of which may have its own associated phonetic and phonological rules (see Selkirk
1980; Byrd and Saltzman 1998). Furthermore, a large body of psycholinguistic work
on processing of sentence, word, and morpheme size units exists to motivate a cog-
nitively plausible model of phonological competence. As far as I am aware, there
exists no study of how these known factors should or could be incorporated into a
diachronically-based theory. The current work represents a first step in integrating
these linguistic areas, laying the groundwork for consideration of phonetic cues within
structured domains and how they might become phonological cues over time.
The particular cue investigated in this article is vowel nasalization, and the methodology is experimental: an artificial grammar learning task in which vowel nasality
is linked with morphological inflection. In sections 9.2 and 9.3, I will describe the
experiment in detail, and I will argue that success in learning phonetically conditioned
alternations across a morpheme boundary provides necessary support for a phonetic
origin of this type of domain restricted process. In section 9.4 I will consider more
detailed analyses of the experimental results, looking carefully at the degree of pho-
netic nasality in individual stimulus items. Finally, in section 9.5 I will summarize my
conclusions, and situate them within an Evolutionary Phonology account, suggesting
a special role that morphological decomposition might play in the process of phonol-
ogization.

9.1.2 Morphemes and derived environment effects


Evidence for the activity of phonological rules comes largely from alternations; many
of these alternations occur, in turn, at morphological boundaries. In some cases the
morphophonology reflects rules or constraints that are also observable in the static
phonotactics of the language. In other cases, the alternation reflects a restriction
¹ But see Beddor (2009) for an alternative account that situates this type of phonologization in the inherent ambiguity of the signal, where trading relations in the localization of phonetic cues are the source of the discrepancy between speaker and listener (this closely resembles Blevins's (2004) CHANCE route to change).

that does not apply to mono-morphemes (over-application). Finally, the absence of


an expected alternation indicates a restriction that holds only for simplex and not
complex words (under-application).
Consider the following example of an over-application derived environment effect.
Stop consonants preceding high front vowels surface as affricates in Korean, as shown in (1a). This rule, however, fails to apply when the conditioning environment and the undergoer occur within the same morpheme, as shown in (1b) (from McCarthy 2002).

(1) a. /patʰ/ + /i/ → [patʃʰi]
       field-COP
       /mat/ + /i/ → [matʃi]
       eldest-NOM
    b. /mati/ → [mati]
       knot
       /katʃʰi/ → [katʃʰi]
       value
Translating this phenomenon into Evolutionary Phonology terms, we could hypothesize a point in the history of Korean at which a discrepancy arose between the
amount of palatalization perceived across the morpheme boundary, and that antici-
pated by the listener. An additional factor that makes this hypothesis an interesting
one, is the general assumption of productivity in morphology. A difference between
off-line processing (monomorphemes), and on-line processing (polymorphemes)
could provide the basis for derived environment effects. These, in turn, positing a
gradual spread of the rule or constraint from one environment to another, could
potentially provide the basis for the emergence of any new phonological pattern (see
the conclusion for further discussion of this idea).

9.2 Experiment
The present experiment is designed to investigate the relation between phonetic and
phonological patterns, in particular, the hypothesis that phonologization is initiated
due to a discrepancy in perception or production, either of which can lead to a differ-
ence between the listener's analysis of their input and the speaker's intended output.
An example of this, as discussed previously, is the genesis of phonemically nasal
vowels (/V/) from phonetically nasal vowels ([Vn]) that have lost their final nasals. A
clearly necessary pre-condition for a failure of expectation is the establishment of such
an expectation in the first place. The experiment described here will be concerned
with testing the hypothesis that such an expectation can be induced: that listeners
are able to attend to sub-phonemic vowel nasalization, as well as learn novel rules
that link those cues with the grammatical structure of morpheme boundaries.

Previous experimental work has found both production and perception differences
related to the presence or absence of morphological boundaries. Measurements of
Korean speakers' productions show that there is a difference in amount of variability
in gestural timing with regards to palatalization across versus within a morpheme
boundary (Cho 2001). Work on English has demonstrated a correlation between
morphological boundary strength and degree of phonetic reduction (Hay 2003). And
work by Frazier (2005), also in English, shows reliable differences in vowel length
for monosyllabic words depending on whether they are monomorphemic or bimor-
phemic (e.g. passed/past). These differences have also been shown, in some cases, to be
accessible to hearers. Frazier reports a perceptual effect correlated with vowel length
in terms of participants' likelihood of selecting the mono- or bimorphemic variants in
a forced choice task. Another set of experiments have examined the effect of different
boundary types (phonological phrase, prosodic word) on word selection (Salverda
et al. 2003; Christophe et al. 2004), providing evidence that listeners can make use of
phonetic cues associated with domain structure, cues which can include differences
in segment length, pitch accent, and degree of coarticulation.
These perception experiments employed experimental tasks that were centered
around the explicit disambiguation of semantically distinct minimal pairs. The cur-
rent experiment, on the other hand, involves explicit training on a novel mor-
phological alternation (the presence or absence of the suffix -/m/), and only
implicit training on the associated (redundant) phonetic difference of interest (the
degree of nasalization on pre-nasal vowels). Furthermore, the task is discrimina-
tion between two words which are phonologically identical, but one of which is
the correct word phonetically (given the participants' training), and one of which
is not.
Similarly, the work described above has shown that listeners are sensitive to dif-
ferences in the realizations of the phonetic cues associated with the productions
of different speakers, and in different environments. But that work dealt with cues
which were otherwise contrastive in the given language (such as VOT), or were
robust phonetic indicators of phonemic contrast (such as vowel length differences,
which signal voicing distinctions on stops in English). In the current case, how-
ever, the feature under investigation is nasality: never contrastive on English vowels,
and, as far as I am aware, almost always redundant in signaling a subsequent nasal
consonant.
The present paradigm is an approach that combines the power of the statistical learning paradigm (cf. Newport and Aslin 2004), namely the ability to carefully control the listener's input and test associations learned implicitly, with the type of experimental phonology advocated by Ohala (1974, 1981) and exemplified in Kawasaki's (1986) work investigating the acoustic-level correlates of language change. This combined
approach is a very promising avenue to testing a number of hypotheses about language
learning and change.

9.2.1 Procedure
The experimental design was as follows. Participants were told that they would be
hearing words in a new language, words spoken by somebody named Frank,2 and
that they would later be asked questions about those words. Each word would appear
in the singular, accompanied by a picture of a single object, and then in the plural,
accompanied by a picture of two of the same object. What followed was a passive train-
ing stage in which participants listened to words over the headphones and looked at
pictures on the computer monitor. There was a 1200 ms pause between each picture-
word pair. All training items were presented in such pairs, with the singular appearing
first. The singular and the plural differed in that the plural ended with the suffix -/m/.
For example, participants heard the word 'skimtu' over the headphones at the same
time they saw a picture on the screen of a single key; this was followed by the word
'skimtum' heard over the headphones, accompanied by a picture of two keys.
Participants were trained on twelve distinct singular-plural word pairs, repeated in
six randomized blocks. Once midway through training, and again after training had
completed, a practice block occurred. Each practice block consisted of presentation
of twelve pictures (a random selection from the set of all singular and plural items
seen during training). Each of these pictures was presented once, and accompanied
by two auditorily presented words. Six hundred ms after the picture appeared, the
first word was played; this was followed 800 ms later by the second word. Participants
were instructed to select (via key press) the spoken word that matched the picture
('1' for the first word; '2' for the second). This was a test of the singular/plural dis-
tinction, such that, of the two word choices per picture, one was a singular inflec-
tion, and the other a plural inflection. As soon as the participants pressed a key,
the picture disappeared. Participants received feedback during these practice trials,
seeing either correct' or 'incorrect' appear on the screen, and hearing a buzzer noise
in the latter case. Two hundred ms later the next picture appeared. Participants'
performance in the second practice block was used as a criterion test for inclusion of
their results.
The alternation of interest related to the behavior of pre-nasal vowels. Since all
stems were vowel-final, all plural words contained such a vowel before the plural suffix
(-/m/), that is, at the morpheme boundary. Half of the words also contained stem-
internal nasals. In both conditions the degree of regressive nasalization on the vowel
contrasted in these two environments. In training for the ORAL-NASAL condition,
there was 0 per cent regressive nasalization within morphemes and 100 per cent
across; in the NASAL-ORAL condition, those values were reversed. See Table 9.1
for example stimuli. It should be noted that the ORAL-NASAL condition presents
an alternation to the learner on the word final vowel, whereas the NASAL-ORAL
condition shows no alternation.

² The speaker was identified in the hopes of priming speaker identification, a task known to support the encoding of sub-phonemic cues (Remez et al. 1997; Allen and Miller 2004).

Half of the stems ended in /i/ and half in /u/. 3 Half of the words also contained
/m/ in the stem, which was preceded in three stems by /i/ and in the other three by
/u/. Thus, in both conditions, subjects heard six instances of /im/ and six of /um/
word-internally, and six instances of each word-finally; what differed was whether
the vowel was nasalized in the word-internal (tautomorphemic) or word-final (het-
eromorphemic) /Vm/ sequences.
At test, subjects were asked to identify which of two words was the one spoken
by Frank. A two-alternative forced-choice task consisted of two auditorily presented
words, one with the high degree of nasal coarticulation, and one without any nasal-
ization on the pre-nasal vowel, but otherwise identical. Test items included both old
words (heard during training), and new words, both singular and plural. These words
were accompanied by pictures, as in the training phase. The six stems that lacked an
internal nasal were tested only in the plural; the six nasal stems were tested both in
the singular and plural. Each test item was presented twice in either order, for a total
of twelve singular test items and twenty-four plural test items each for old and new
words. The order of items was randomized across participants. Table 9.1 gives example
stimuli for each condition.
The phonetic cues present in the stimuli are natural (regressive nasalization asso-
ciated with nasal consonants) and redundant (only occurring with the cue of the
accompanying nasal consonant). Furthermore, vowel nasalization of any degree is
non-contrastive in English, the native language of the experimental participants. For
these reasons, we might not even expect listeners to reliably hear differences along
this dimension. To check for accuracy of perception, the final task of the experi-

TABLE 9.1 Example training and test items for the two experimental conditions

                    ORAL-NASAL                                NASAL-ORAL

        Training                 Test              Training                 Test
   Singular   Plural                          Singular   Plural

   oski       oskĩm       oskĩm/oskim         oski       oskim       oskĩm/oskim
   skimtu     skimtũm     skimtũm/skimtumᵃ    skĩmtu     skĩmtum     skĩmtũm/skĩmtumᵃ
                          skimtu/skĩmtu                              skimtu/skĩmtu

ᵃ The plural nasal-stem test items were different across the ORAL-NASAL and NASAL-ORAL conditions. This was so that only one position was being tested at a time. In the plural nasal-stem items the trained degree of nasalization always appeared on the stem-internal vowel, and only the degree of nasalization on the stem-final vowel alternated across the test items.

³ The full list of stems:
Old stems (heard both in training and test): 'hæzi, 'tʃu, 'skimtu, ai'gimdu, 'oski, 'spi, ja'tumbi, ə'dʒumpu, 'hu, 'gəu, 'θumzi, 'twimtʃi
New stems (heard only at test): 'di, 'fi, ja'dimfu, 'tʃumgu, 'ploksu, 'ipi, 'humʃi, glau'dumki, 'nɪdu, 'stu, fra'bimsi, 'imdʒi

ment was an AXB test to assess participants' auditory discrimination of the phonetic
nasalization cue. Test items consisted of a subset of the words participants had been
tested on earlier in the experiment. For each triplet, participants had to choose which
two words were identical; either the first word was the same as the second, or the third
word was the same as the second. The non-identical token differed only in degree of
nasalization, e.g. [skimtu] [skĩmtu] [skĩmtu].

9.2.2 Stimuli
The stimuli consisted of words of varying length and syllable structure. The nasal
vowel at the morpheme edge was always post-tonic (except for the one-syllable roots);
the nasal vowel within the morpheme was always tonic, but sometimes occurred in
the first and sometimes in the second syllable. Natural speech tokens were used; each
token was recorded separately. All words were produced by a phonetically trained
male native American English speaker who was instructed to pronounce unstressed
vowels as full vowels (rather than reducing them). All stimuli were recorded in a
sound-attenuated booth at a 22 kHz sampling rate. Also recorded were monomor-
phemes with final oral consonants, and monomorphemes with final nasal consonants,
e.g. /hæzim/, /hæzi/, /zib/, /zim/. Nasalized vowel tokens were created by splicing a portion of the vowel from a stressed nasal coda environment ([zim], generating [hæzĩm]). Since these stressed vowels were longer in duration it was possible to
select only the part of the vowel that was nasalized. This was determined by visual
inspection, verifying that a nasal formant (around 1000 Hz) was visible throughout
the duration of the vowel. See Figure 9.1 for an example spectrogram. Non-nasalized
tokens were created by splicing from a non-nasalized environment, e.g. [zib], generating [hæzim]. For these items, no nasal formant was visible in any part of the

FIGURE 9.1 Example spectrogram showing nasal formant



vowel. Vowels were normalized for length, such that root-internal front vowels did not
significantly differ across oral and nasal tokens (similarly for root-final front vowels,
root-internal back vowels, and root-final back vowels). To reduce auditory artifacts,
splicing was always done at zero crossings, and effort was taken to produce a smooth
intensity and frequency contour by splicing from multiple parts of the replacement
vowel (beginning, middle, and end). Intensity was adjusted where necessary to avoid
the percept of stress on the spliced vowel.
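The zero-crossing constraint on cut points can be sketched as follows; this is a minimal illustration of the splicing idea only (the single-channel numpy representation is an assumption), not the editing procedure actually used, which also involved manual duration and intensity matching as described above.

import numpy as np

def nearest_zero_crossing(signal, index):
    """Index of the sign change in `signal` closest to `index`."""
    crossings = np.where(np.diff(np.sign(signal)) != 0)[0]
    return int(crossings[np.argmin(np.abs(crossings - index))])

def splice(carrier, replacement, start, end):
    """Replace carrier[start:end] with `replacement`, snapping both cut
    points to the nearest zero crossings to avoid audible clicks."""
    s = nearest_zero_crossing(carrier, start)
    e = nearest_zero_crossing(carrier, end)
    return np.concatenate([carrier[:s], replacement, carrier[e:]])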

9.2.3 Participants
Fifty-eight undergraduates at Johns Hopkins University were given course credit
to complete the thirty minute experiment. All participants reached criterion in the
second practice block (>95 per cent correct). Only the results from the thirteen
participants in each condition who reached threshold on the AXB task (>70 per cent, singular and plural items separately) are plotted. See Figure 9.2. Analyses were
subsequently carried out for the entire set of participants (adding back in the thirty-
two who fell below threshold).

9.2.4 Results
The results discussed here are for responses in the 2-Alternative Forced-Choice task.
Only participants who performed above threshold on the AXB discrimination task are
included for the first analyses. Test items consisted of two types: singular and plural.
The choice at test was always between a nasalized variant and a non-nasalized variant
(that is, the two possible responses differed only in nasality on the critical vowel).
Two conditions were contrasted: ORAL-NASAL: nasalization across boundary, none
within; and NASAL-ORAL: no nasalization across boundary, nasalization within.
Each participant was run in only one condition.
The dependent variable in the first analysis was the percentage of responses for
which participants selected the nasalized variant (as opposed to the non-nasalized).
Under successful learning, the value of this variable should be large for plural items
in the ORAL-NASAL condition, and small for singular items (small for plural items
in the NASAL-ORAL condition, large for singular items). As just described, the
prediction is that there should be a significant interaction between condition and test
type. See Figure 9.2a.
Since the dependent variable was a proportion response, varying between 0 and 1, a logistic regression analysis was performed (Jaeger 2008; Agresti 1996). Each
model term was assessed for its reduction in the residual deviance of the logistic fit as
compared to the model without that term. The significance of the reduction was then
evaluated using a chi-square test of significance, producing the p values shown here
and in the final column of the tables in the Appendix.
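The model-comparison logic amounts to fitting nested binomial GLMs and referring the reduction in residual deviance to a chi-square distribution. A sketch along these lines using statsmodels (the data frame layout and column names are assumptions, not the study's actual analysis script):

import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

def interaction_test(df):
    """Test the Condition x Type interaction by nested-model comparison.

    df: one row per response, with columns
        chose_nasal (0/1), condition (ORAL-NASAL / NASAL-ORAL),
        item_type (singular / plural).
    """
    reduced = smf.glm('chose_nasal ~ condition + item_type', data=df,
                      family=sm.families.Binomial()).fit()
    full = smf.glm('chose_nasal ~ condition * item_type', data=df,
                   family=sm.families.Binomial()).fit()
    lr = reduced.deviance - full.deviance     # reduction in residual deviance
    dof = reduced.df_resid - full.df_resid    # 1 here: the interaction term
    return lr, dof, chi2.sf(lr, dof)          # chi-square p value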
For the first analysis two separate regressions were performed, one for old items,
to establish firstly discriminability and attention, and one for new items, to show true
generalization. Model terms were Condition (ORAL-NASAL or NASAL-ORAL) and

FIGURE 9.2 Experiment 1: (a) Percent Nasal variant chosen by Condition (= ORAL-NASAL or NASAL-ORAL) and Type (= Singular or Plural), plotted separately for old and new words. (b) Percent Correct chosen by trained degree of nasality, plotted separately for old and new words; Conditions and inflection types combined

Type (plural or singular), as well as a term for the interaction between Condition and Type (the critical test of difference). Old items: there was no main effect of Type, but Condition was significant (χ²(1) = 6.38, p < .05). Adding the critical interaction term improved the model fit by a significant factor (χ²(1) = 39.92, p < .0005). New items: there was no main effect for Condition or Type. Adding the critical interaction term improved the model fit by a significant factor (χ²(1) = 11.46, p < .005).
Alternatives to the morphological hypothesis were also considered. One possibility
is that the participants were learning a single degree of nasalization (rather than
two differing by morphological location); another possibility is that nasalization was
encoded by stress rather than morphological condition (since stress location is largely
confounded). To test these hypotheses, a second regression model was run by item for
plurals alone, with participant accuracy as the dependent variable, and model terms
for stem type (nasal stem/oral stem, which provides a test of the single-association
hypothesis by comparing performance on nasal stem plurals with other plurals), and
stress location (pre-final/final). No effect was found for the interaction of stem type with stress position.
As can be seen from Figure 9.2b, response levels hovered near chance for items
that were nasalized during training. A regression model of participant accuracy with a
term for training type showed that responses were significantly more likely to be accu-
rate for trained Oral (ORAL-NASAL singular and NASAL-ORAL plural) than for
trained Nasal (ORAL-NASAL plural and NASAL-ORAL singular) (χ²(1) = 12.33, p < .0005). No effect, however, was found for condition, and the results of the two
conditions are combined in Figure 9.2b.
A final analysis involved pooling the data from all fifty-eight participants (including
those previously excluded due to their below-threshold performance on the AXB
task). The results of analysis 1 hold when performed over all participants, implying that learning has taken place. This in turn suggests that the 70 per cent discrimination threshold is an unnecessarily stringent criterion. Discrimination level (averaged over all test types in the AXB task) was added as a continuous term to the regression model of accuracy. This term was a significant predictor of accuracy over all test items (χ²(1) = 7.48, p < .05). This shows, perhaps unsurprisingly, that the better
participants were at the AXB task (at reliably discriminating the difference to be
learned), the more accurately they performed at test.

9.2.5 Discussion
A robust interaction effect (Condition x Type) indicates an effect of training on
participant response. For old items, learners were able to encode detailed phonetic
representations for words they had heard before, representations that included infor-
mation about sub-phonemic vowel quality features for at least two different positions
in the word. This same interaction effect for new items indicates that learners were able
to make an association between those phonetic features and some other property of
the training words such that they were able to correctly generalize to novel test items.

A main effect of Condition for old items indicates that the nasalized variant was
less likely to be chosen overall for the NASAL-ORAL condition than the ORAL-
NASAL condition. This main effect goes away, however, for new items. As a result,
it is not entirely clear how to interpret this finding. Furthermore, accuracy modeled
against condition for both old and new items indicates that there was no difference in
performance between the ORAL-NASAL and the NASAL-ORAL condition (p > .5).
There was, however, a consistent difference in accuracy between oral and nasal
items, an asymmetry such that learners were more permissive of oral forms when they
were trained on nasal, but were less likely to accept nasal forms when trained on oral.
This might reflect a bias such that, prior to any training, there is an expectation for a
low (or close to zero) degree of nasal coarticulation in all contexts. To test this hypoth-
esis, a control experiment was run. The results are described in the next section.

9.3 Experiment 2: Control


Experiment 2 was run as a control condition in which participants were exposed to
variable input (equal numbers of oral and nasal tokens in both singular and plural
contexts). The results will show whether a bias exists for English speakers to pick
oral forms, thus accounting for the asymmetry found in Experiment 1. If the bias
hypothesis is correct, then oral items should be chosen significantly more often than
nasal in Experiment 2. Otherwise, in the absence of such a bias, we should expect
participants to be at chance.
The stimuli and procedure were the same as for Experiment 1, except now participants heard both variants (Nasal and Oral) of each word type. For nasal-stem plurals, there were four possible variants: e.g. {skimtum, skĩmtum, skimtũm, skĩmtũm}; for singulars, there were two: {skimtu, skĩmtu}. Over four blocks of training data, participants heard each possible nasal-stem plural once, and each possible singular twice.
Twenty-one Johns Hopkins undergraduates were given course credit to participate
in the thirty minute experiment. The thirteen with the highest scores on the second
practice trial (>9O per cent correct) and the AXB task (>/5 per cent correct overall;
average = 88 per cent) were included in analysis. The results are shown in Figure 9.3.
The results show that there is no oral bias for this group of English-speaking participants. If anything, there is a slight tendency to pick the nasal variant, at least for singular items, that is, stressed vowels. This is consistent with other work characterizing English as a language with a relatively high degree of phonetic nasalization (Solé 1992b; Tanowitz and Beddor 1997). Regression analyses comparing the results of Experiment 1 to Experiment 2 show that there is no significant difference between the
trained Nasal items and participants' responses after hearing unpredictable nasaliza-
tion (although ORAL-NASAL plural old items are marginally different from control).
In other words, participants seem to act as though they had received no consistent

FIGURE 9.3 Experiment 2: Percent Nasal variant chosen by Type (= Singular or Plural), plotted separately for old and new words

training for those items, and the learning effect found in Experiment 1 is carried by
the trained Oral items. In order to determine the reason for this asymmetry, I took a
closer look at the phonetic characteristics of the experimental stimuli in question.

9.4 Degree of nasalization

As described in previous sections, nasalized items were created by splicing nasalized


vowels from a stressed pre-nasal environment. The only criterion used was that a nasal
formant (around 1000 Hz) be visible in the spectrogram throughout the length of the
spliced vowel portion. This measure was based on previous work referring only to
the proportion of a vowel that was nasalized (rather than to degree) (Tanowitz and
Beddor 1997; Beddor and Krakow 1999). This method, however, relying as it does on
visual inspection, is clearly quite crude. Furthermore, there has been work suggesting
a standardized approach to measuring degree of vowel nasalization. In the following
sections I will be following the methodology of Chen (1997).

9.4.1 Measurements
Coarticulation with a neighboring nasal segment can be observed in nasal formants
which are often visible in the spectrogram of a nasalized vowel. The amplitude of these

formants is correlated with degree of nasalization. Also correlated with nasalization is
a lowering of the amplitude of the vowel's first oral formant. Chen's metric combines
the amplitude of the first nasal peak in the vicinity of the first oral formant (≈950
Hz): P1, and the amplitude of the first oral formant: A1 (Chen 1997). A higher
value of P1 (as compared to the oral vowel) indicates a higher level of nasalization,
whereas a higher value of A1 (as compared to the oral vowel) indicates a lower level
of nasalization. The quantity A1−P1 is computed for both members (oral and nasal) of
each pair of tokens, with the difference, Δ, giving the relative amount of nasalization.
A1 and P1 were measured in each instance using a spectral window over the entire
length of the vowel. A pair of example spectral windows are shown in Figure 9.4b for
the token 'oskim' (cf. Figure 9.1), with approximate P1 location indicated.
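In code, the computation reduces to a few lines. The following sketch (in R; the dB
values are hypothetical and purely illustrative, not measurements from the actual
stimuli) computes A1−P1 for each member of an oral/nasal pair and their difference
Δ, taking Δ as oral minus nasal so that larger values indicate greater relative
nasalization of the nasal variant:

  # Hypothetical A1 and P1 measurements (in dB) for one oral/nasal token pair
  a1 <- c(oral = 38.2, nasal = 30.5)   # amplitude of the first oral formant
  p1 <- c(oral = 22.0, nasal = 25.1)   # amplitude of the nasal peak near 950 Hz
  a1p1  <- a1 - p1                     # A1 - P1 for each variant
  delta <- unname(a1p1["oral"] - a1p1["nasal"])  # relative nasalization, Delta (dB)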
This measure was taken for the stimuli of Experiments 1 and 2, calculating the
nasalization difference between the Nasal and Oral variants for each word. Box plots
of the results are given in Figure 9.4a (where nasalization for nasal-stem plurals was
measured on the final vowel). As can be seen, degree of nasalization varied consid-
erably, with a minimum value of 1.5 dB (gumzi vs. gũmzi) and a maximum of 21.6
dB (aigimdum vs. aigimdũm).⁴ Average nasalization for [i] vowels was somewhat
higher than that for [u] vowels (11 dB vs. 9 dB).
If Chen's measurement can be taken as an objective degree of nasalization, then
the spread in Figure 9.4a indicates a non-uniform distribution over the Nasal tokens
participants heard in each of the experimental conditions. The oral bias reported in the
results section becomes less surprising when viewed in this light. Training conditions
are more accurately characterized as Oral-Variable Nasal, and Variable Nasal-Oral.
The lack of measurable variation in the nasality of the Oral items correlates with
participants' more consistent rejection of nasal test variants. Oral variants, however,
are more acceptable responses to a category of training items that exhibited a range of
nasalization degrees. Another way to express these results is as participants' sensitivity
to a departure from English-like coarticulation, found only in the oral tokens.

9.5 Conclusions
This chapter has been a description of initial experimental work within a paradigm
that combines the strengths of the artificial grammar learning apparatus with the
insights of work investigating the phonetic bases of phonological sound changes. The
results of Experiments 1 and 2 suggest that listeners can perceive and encode different
phonetic associations for boundary versus non-boundary environments. This might
be characterized as the development of an expectation for a degree of coarticulation,
or perhaps degree of variability of coarticulation. What is more, adult speakers can
learn new associations of this type with very little training (less than thirty minutes).
⁴ The measured values fall within the range observed by Chen in her study of the natural productions
of eight English speakers (monosyllables, either completely oral or nasal in context, e.g. 'bed' vs. 'men').
FIGURE 9.4 (a) Box plot of Degree of nasalization: Δ(A1−P1). Separately by vowel (i or u),
and separately by Old and New words. The extreme outlier tokens are indicated. (b) Example
Spectral Slice from token 'oskim' (see Figure 9.1). Top: Nasal; Bottom: Oral. The location of the
nasal formant is indicated by P1

This result may seem surprising for a couple of reasons. One is the body of
work suggesting that discriminating and producing phonemic distinctions in a sec-
ond language which are allophonic in the native language is quite difficult (Goto 1971;
Dupoux et al. 1998). The second is that making an association based on morphology
requires a level of abstraction on the part of the learner, one that is not always observed
in artificial grammar learning experiments, in which participants fail to generalize to
novel segments or novel words, or unseen members of a natural class (see Peperkamp
2003 for a review of some of the literature).
For these reasons alone, this result is a significant one. However, it also has a
broader relevance. Listeners may use the phonetic cue of nasality on a preceding
vowel to predict the nasality of a following consonant, or the nasality of a following
consonant to assess the quality of the preceding vowel. And on an Evolutionary
Phonology account it is a mismatch between an expected and an observed degree of
this coarticulation which can lead to the genesis of a phonemically nasal vowel over
time. The experimental results presented here satisfy a necessary condition for this
basic misparsing story: the ability to develop such an expectation in the first place,
in particular, an association tied to the linguistically active domain of the morpheme
boundary. However, there is much further to go in developing a complete theory of
the route from phonetics to phonology.
One question that immediately comes to mind is under what circumstances we
might expect compensation for coarticulation to fail. In other words, what factors
might make it more likely for a mismatch to arise between the listener's expectation
and their perception of the speech signal? Intuitively, these circumstances seem to
be provided in cases in which the conditioning environment is somehow lost. But
what about situations in which the conditioning environment remains? The derived
environment effects in Korean, discussed at the beginning of this chapter in (1), would
be an example of this type.
We could start by assuming, for the moment, that with perfect knowledge of the
correct class of element (the correct boundary) and the expected range of feature
spread (or coarticulation) due to that boundary, we are practically perfect in our
ability to correctly reconstruct the constituent phonemes. If, on the other hand, our
ability to recognize the correct boundary is diminished in some way, then our ability
to apply the appropriate degree of compensation for coarticulation is automatically
compromised. In this way might a listener attribute a phoneme to a different category
than that intended by the speaker, either by interpreting its degree of coarticulation
with the neighboring segments as insufficient (and thus subtracting the relevant fea-
ture), or analyzing the degree as exceeding expectation (and thus adding the relevant
feature).
The way in which morphological junctures might become important to this story
beyond demarcating a specific class of phonological processes is by providing a
mechanism for initiating the process of phonologization. That is, we could advance
9- Rapid learning of morphologically conditioned phonetics 197

the preliminary hypothesis that all internal phonological change originates at the
morpheme boundary. This move allows us to make use of independent work in
psycholinguistics related to the question of word level processes. In this framework,
the ability to reconstruct a morpheme boundary can be related to the representational
status of the morphologically complex word. If a particular complex form, due to its
high frequency of use, achieved a lexicalized, undecomposed status, then the original
morpheme boundary could be thought of as weakening or disappearing.
Lexical access models of morphologically complex words often describe a competi-
tion between two routes to word meaning. One route achieves access via composition
of the constituent morphemes, and the other through the word as a whole. Sensitivity
of response times to word frequency in lexical decision tasks is taken as evidence for
access via the whole-word route. This is expected to occur above a certain frequency
threshold, such that the compositional route only wins when the whole word fre-
quency is relatively low (see Gordon and Alegre 1999 for discussion of these models).
We don't yet know how sub-phonemic information might enter into the picture. But
we can imagine a situation in which a speaker produces the appropriate high degree
of nasalization across a boundary, but a listener for whom the whole word, rather than
the compositional route has become the predominant one fails to compensate for the
boundary effect. This story may not be the right one, but it provides us with some
account of how this original failure to correctly reconstruct the source of the speech
signal might systematically occur, an element often missing from historically-based
accounts. Furthermore, it raises an intriguing hypothesis about the relation between
domain-limited effects in linguistics and general processes. If the former is a necessary
stage on the way to the latter, then a more careful study of a seemingly marginal
phenomenon like derived environment effects might prove fruitful for insights into
linguistic phenomena of all kinds.
Appendix

TABLE 9.2 Experiment 1: % nasal response fit to Condition, Test Type, and Condition x Type; above-threshold participants only

OLD         Df   Dev     Resid. Df   Resid. Dev   P(>|Chi|)
NULL                     51          120.23
Cond        1    6.38    50          113.85       0.01*
Type        1    2.73    49          111.11       0.09
Cond:Type   1    39.92   48          71.18        2.64e-10*

NEW         Df   Dev     Resid. Df   Resid. Dev   P(>|Chi|)
NULL                     51          61.91
Cond        1    0.61    50          61.30        0.43
Type        1    0.98    49          60.32        0.32
Cond:Type   1    11.46   48          48.85        0.001*

TABLE 9.3 Experiment 1: % correct response by item for plurals only; model fit to Stress location (pre-final/final) and Nasality of Stem (2 nasals/1 nasal); above-threshold participants only

OLD & NEW combined   Df   Dev       Resid. Df   Resid. Dev   P(>|Chi|)
NULL                                23          32.282
NasalStem            1    0.58031   22          31.702       0.4462
Stress               1    0.04912   21          31.653       0.8246

TABLE 9.4 Experiment 1: % correct response fit to Condition, Test Type, Training type (Oral/Nasal), and AXB results; all participants

OLD and NEW combined   Df   Dev       Resid. Df   Resid. Dev   P(>|Chi|)
NULL                                  231         278.42
Cond                   1    0.3761    230         278.04      0.54
Type                   1    1.0147    229         277.02      0.31
Training               1    12.3306   228         264.69      0.0004 ***
AXB results            1    7.4753    227         257.22      0.006**
Part IV

Social and computational dynamics


10

Individual differences in socio-cognitive processing
and the actuation of sound change

ALAN C. L. YU*

10.1 Introduction
What motivates the introduction of new linguistic variants, such as a new sound or
a new sound pattern, and how do these variants flourish and propagate throughout the
speech community? These questions are at the heart of research in phonologization
and the origins of sound change. Many theorists draw inspiration from biological
evolution and conceptualize the actuation of sound change in terms of a two-step
process of variation and selection (Lindblom et al. 1995; Kiparsky 1995; Mufwene
2001; Blevins 2004; Mufwene 2008). New variants propagate across a speech com-
munity as a result of a process of selection and rejection by language users who
evaluate all variations with respect to their social, articulatory, perceptual, and lexical-
systematic dimensions. The sources of variation are many (Ohala 1993b; Lindblom
et al. 1995; Mufwene 2008; Beddor 2009). Setting aside the influence of language
contact, new variants are commonly assumed to be introduced as the results of the
effects of channel biases that are inherent in the modalities of speech communication
(e.g. biases in motor planning, speech aerodynamics, gestural dynamics, perceptual
parsing; see Garrett and Johnson's chapter in this volume for more discussion) and
analytic biases that come from presumed universal computational mechanisms such
as Universal Grammar (Wilson 2006; Moreton 2008a). When members of a speech
* I thank Penny Eckert, Andrew Garrett, Peter Graff, Lauren Hall-Lew, Tyler Schnoebelen, and Tom
Wasow for their insightful comments and discussions. Attendees of the Variation and Language Processing
workshop at University of Chester and audiences at the Chinese University of Hong Kong, University of
Ottawa, and University of California, Berkeley provided useful feedback. Naturally, all errors are my own.
This material is based upon work partially supported by the National Science Foundation under Grant no.
0949754. Any opinions, findings, and conclusions or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the views of the National Science Foundation.

community come to share these new perceptual and production targets, sound change
obtains. How a speech community, or a community of practice (Eckert 2000), comes
to adopt a new norm is a matter of much debate, however. Proponents of exemplar-
based models of sound change, for example, argue that sound change may be mod-
eled in terms of drifts of exemplar clouds' (e.g. Pierrehumbert 2001 a; Wedel 2006,
2007; see also Garrett and Johnson, this volume). That is, assuming that exemplars
in such models retain fine phonetic details of particular instances of speech, new
variants introduced by persistent bias factors would accumulate in such a fashion
that eventually moves the distributions of exemplars in the direction of the biased
variants, presumably as a consequence of convergence via imitation. That is, speak-
ers' production targets are altered along some phonetic dimensions to become more
similar to those of their fellow interlocutors (Babel 2009; Goldinger 1998; Nielsen
2007; Pardo 2006; Shockley et al. 2004). While the ability to imitate is assumed to be
innate (Dijksterhuis and Bargh 2001), imitation is not likely to be the lone driving
force behind the systematic propagation of new variants throughout the speech com-
munity, since phonetic imitation is not an entirely automatic or unrestricted process.
Social factors have been suggested as important motivators for imitation (Giles and
Powesland 1975; Clark and Murphy 1982; Bell 1984; Dijksterhuis and Bargh 2001;
Babel 2009). Gender difference is the one that is most commonly observed, although
there are conflicting results regarding which gender is more likely to imitate. Pardo
(2006), for example, found that men were more likely to converge in a map task than
women, yet Namy et al. (2002) found female participants converged more than male
participants in a shadowing experiment. Speaker attitude toward the interlocutor
(Babel 2010; Abrego-Collier et al. 2011) and perceived sexual orientation (Yu et al.
2011) have also been associated with degree of phonetic convergence and divergence.
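The drift dynamic implied by such models can be made concrete with a toy simulation
(a sketch only, not a reimplementation of any of the cited models), in which a small
persistent bias applied to each imitated token gradually shifts the stored distribution:

  # Toy exemplar-drift sketch: each production samples a stored exemplar,
  # adds a small persistent bias plus noise, and is itself stored.
  set.seed(1)
  cloud <- rnorm(200, mean = 0, sd = 1)   # exemplars along one phonetic dimension
  bias  <- 0.05                           # persistent channel bias
  for (t in 1:500) {
    token <- sample(cloud, 1) + bias + rnorm(1, sd = 0.1)
    cloud <- c(cloud[-1], token)          # store the new token, forget the oldest
  }
  mean(cloud)                             # the cloud has drifted toward the bias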
Rather than propagating aimlessly and blindly as implied by a simplistic concep-
tion of an exemplar-based model of sound change, these findings suggest that new
variants are spread across the speech community when they come to be associated
with social significance (Eckert 2000; Labov 2001). It is often argued that social sig-
nificance may be associated with new variants via the influence of socially-relevant
innovators within the speech community (Labov 2001). That is, the propagation of
change happens when the sound patterns of an individual or a group of linguistic
innovators (i.e. the 'leader(s)' of change) who occupy sociolinguistically influential
positions within the community are adopted by members of the speech community.
Given that the question of selection hinges on the role of the innovator, research in
the selection aspect of sound change actuation has focused on uncovering the social
dynamics that facilitate the promotion of an innovator (e.g. the network configuration
of the social group, the social profile of the innovator, the stylistic practice of the
individual, etc.).
The twin questions of where variants come from and how they come to acquire
social significance via the role of the linguistic innovator within the speech

community have traditionally been investigated separately, however. Yet, a truly


explanatory theory of sound change, and of language change in general, not only must
explain the origins of variation, it should also take into account the orderly differen-
tiation in a language serving a community (Weinreich et al. 1968), as reflected in the
associations between linguistic variation and social structures and meanings. Social
meanings may be locally-defined (Eckert 2000) or are reflected in macrosocial mem-
berships, such as socioeconomic class, ethnicity, and gender (Labov 2001). Despite
these connections, research on the origins of variation is often pursued without the
consideration of the sociolinguistic aspects of change. While past research has identi-
fied much covariation of linguistic variables with social variables, it remains unclear
what factors, if any, there might be to allow or facilitate the coupling of linguistic and
social variables in the first place.
In this chapter, I explore the hypothesis of individual variability in cognitive pro-
cessing as a conduit for linking the introduction of new variants and their eventual
spread throughout a community. The proposal advanced in this work consists of
three parts. First, I argue that variability in cognitive processing style is an important
contributing factor to variation in perceptual and, by extension, production norms
across individuals. Second, such variability in cognitive processing style can be shown
to correlate with individual differences in social traits. These social traits may in turn
influence how an individual interacts with other members of his/her social network.
Taken together, it is argued that individuals who are most likely to introduce new
variants in a speech community might also be the same individuals who are most
likely to be imitated by the rest of the speech community due to their personality
traits and other social characteristics.
This article begins with a brief review of factors that might contribute to individual
variability in speech perception and production in section 10.2. Section 10.3 motivates
the idea that variability in cognitive processing style is associated with variability in
cognitive traits that have social significance. Section 10.4 presents data establishing a
significant association between socially-relevant cognitive traits such as empathizing
and systemizing drives and how individuals perceive and classify speech sounds in
a context-specific manner. A discussion of the implications of these findings appears
in section 10.5. Section 10.6 concludes with a discussion of the limits of the theory
advocated in this study.

10.2 Background
Individual differences in cognitive processing styles are evident at all levels of human
cognition, including vision (Stoesz and Jakobson 2008), learning (Riding and Rayner
2000), and sentence processing (Daneman and Carpenter 1983; King and Just 1991).
Note that 'individual differences' here are taken to mean variability in cognitive pro-
cessing that is systematic (i.e. governed by some fixed factors), rather than the results

of chance. Before diving into the effects of cognitive processing style on speech pro-
cessing, I briefly consider individual-level factors that could contribute to variation
in phonetic and phonological processing. Broadly speaking, there are two primary
sources: experiential and cognitive-biological.

10.2.1 Speaker background and past experience


A primary source of individual variability comes from speakers' prior experience (lin-
guistic or otherwise), as evidenced in how foreign language learners learn to produce
non-native sounds and sound sequences and how language borrowers incorporate
these sounds and sound sequences into their native language. English speakers, for
example, have been shown to have difficulties with non-native contrasts such as the
Czech retroflex vs. palatal fricatives (Trehub 1976), Korean aspirated, weak, vs. strong
laryngeal contrasts (Francis and Nusbaum 2002), Thai voiced vs. voiceless unaspirated
stops (Lisker and Abramson 1970), Hindi dental vs. retroflex stops, and Salish velar
vs. uvular ejectives (Polka 1991; Werker and Tees 1984, 1994). Difficulties are found
in the production of non-native sounds as well. In addition to having great difficulty
perceiving the English /r/-/l/ contrast (Bradlow et al. 1997; Goto 1971;
MacKain et al. 1981; Miyawaki et al. 1975; Mochizuki 1981; Sheldon and Strange 1982;
Yamada and Tohkura 1992), Japanese speakers, for example, have difficulties with
producing such a contrast as well (Sheldon and Strange 1982; Bradlow et al. 1997).
Many studies have observed that listeners' perceptual responses are influenced by
their knowledge of what are possible and impossible sound sequences in the language
(Davidson 2011; Dupoux et al. 1999; Hall et al. 1998; Massaro and Cohen 1983;
Kabak and Idsardi 2007; Pitt 1998). Massaro and Cohen (1983), for example, found
that, when listeners were asked to classify a synthetic /r/-/l/ continuum embedded in a
C_i context where C = {t, p, v, s}, they were most likely to report the ambiguous liquid
as /r/ when C = /t/ and the least likely when C = /v/ or /s/, presumably due to the fact
that tl- and vr-/sr- sequences are phonotactically ill-formed in English. Phonotactic
influence can be found in speech production as well. Davidson, in a series of studies
(Davidson 2005, 2006a, b), demonstrated that speakers' knowledge of the phonology
and phonetics of their native language strongly affects the way they articulate non-
native sequences of sounds. For example, she showed that English speakers often
repair unattested word-initial sequences (e.g. /zg/, /vz/) by producing the consonants
with a less overlapping coordination (Davidson 2005, 2006a).
Effects of prior experience on speech perception and production are not limited to
linguistic experience per se. Recent behavioral and neurophysiological studies have
demonstrated superior processing of lexical tones in musicians (Chandrasekaran et al.
2009; Wong et al. 2007; Wong and Perrachione 2007). To be sure, speakers of a
tone language, such as Chinese, show larger mismatch negativity (MMN) responses
than musicians, suggesting that cortical plasticity to pitch contours varies depending
on the type of long-term pitch-processing experience individuals have;

English-speaking musicians, as well as native speakers of tone languages, are nonethe-


less more sensitive to pitch changes, measured in terms of MMN and discrimination
judgments, than English-speaking non-musicians (Chandrasekaran et al. 2009). Indi-
viduals with extensive music training are also better at acquiring words cued by tonal
contrasts than non-musicians (Wong et al. 2007; Wong and Perrachione 2007).

10.2.2 Cognitive-biological differences


Notwithstanding the prevalence of prior experience effects on speech perception and
production, such factors are not likely to contribute to sound changes that do not
involve language contact if speakers within the same linguistic community, all else
being equal, have access to the same range of linguistic experience. All else is not equal,
however. Chandrasekaran et al. (2010), for example, observed that variability in the
likelihood of success in learning lexical tonal contrasts is influenced by pre-training
differences in cue-weighting. That is, individuals who attend more to pitch direction
as a cue for tonal contrast are better learners than those who do not. Given that these
subjects have no prior knowledge of tonal languages and have little or no musical
training, the source of such pre-training differences in prior cue-weighting might
originate in non-experientially-driven sources. What are these non-experientially-
driven sources of individual variability of speech perception and production?

10.2.2.1 Neurophysiological factors Díaz et al. (2008) found neurophysiological evi-


dence for individual differences in sensitivity to phonetic contrast even within the
perceiver's native language. Their study found that early, proficient Spanish-Catalan
bilinguals who differed in their mastery of the Catalan (L2) phonetic contrast /e/-/ɛ/
showed corresponding differences in discrimination accuracy of Spanish vowels
(/o/-/e/), reflected electrically as a mismatch negativity (MMN). That is, good perceivers
of the Catalan /e/-/ɛ/ contrast showed larger MMN responses to both native (/o/-/e/)
and non-native (/o/-/ö/) phonetic contrasts than poor perceivers. Two aspects of
this study are particularly noteworthy. First, their findings show that the observed
individual variability stems not from variation in the general psychoacoustic abilities
of the perceivers, but is linked rather to speech-specific abilities. That is, no difference
between the two test groups was observed in the participants' response to acoustic
conditions such as frequency, duration, and pattern (i.e. sequences of two alternating
pure tones). Second, the two groups appear to differ in the way their perception system
is able to extract relevant features of speech sounds, as evidenced by the difference
in the amplitude of the MMN between the groups that are present only at frontal
electrodes, but absent at supratemporal ones. The front generator is associated with
the triggering of involuntary attention, while the temporal generator is associated with
sensory processing and the comparison of sensory information with memory repre-
sentations. Assuming that the capacity to behaviorally discriminate between sounds
depends on two stages (i.e. the automatic generation of a neural signal indicating

stimulus change followed by the process to 'read' the neural signal and to create
new perceptual categories: Näätänen 2001; Tremblay et al. 1998), Díaz et al. (2008)
interpreted this to mean that, while both groups are equally able to represent the
phonetic auditory sensory information and to integrate this information into mem-
ory representations (i.e. processing at Stage 2), they may differ in the strength and
sensitivity of Stage 1 processing such that the activation of the neural code necessary
for the processing at the temporal areas might be hampered.
Individual variability may also come from differences in the regulation of neuro-
chemistry across individuals. Motivated by the association of striatal function and
phonological processing, as evidenced in the linguistic performance of patients with
Parkinson's Disease (Abdullaev and Melnichuk 1997), Tettamanti et al. (2005) mea-
sured modulations of the dopaminergic system using [11C]raclopride and positron
emission tomography while (Italian-speaking) participants judged the acceptability
of pseudowords that were made to either conform to or violate the phonotactics of
Italian. Crucially, participants in Tettamanti et al.'s (2005) study were drawn from
a healthy non-pathological population (eight healthy right-handed male university
students, ranging from 22 to 29 years old). Nonetheless, they found significant corre-
lations between performance in the pseudoword judgement task and dopaminergic
input to the left dorsal basal ganglia. In particular, better individual performances
correlate with less dopamine release in the left dorsal caudate nucleus while faster
response time correlates negatively with dopamine release in the left dorsal putamen.

10.2.3 'Autistic traits' and speech perception


The type of individual variability of interest here concerns differences in cognitive
processing style. Cognitive processing style refers to psychological dimensions repre-
senting preferences and consistencies in an individual's particular manner of cognitive
functioning, with respect to acquiring and processing information (Ausburn and
Ausburn 1978; Messick 1976; Witkin et al. 1977). A particularly intriguing type of
cognitive processing style effect is the association between levels of 'autistic traits' and
speech perception abilities in humans. Stewart and Ota (2008), for example, found
that total Autism-Spectrum Quotient (AQ; Baron-Cohen et al. 2001b) taken from
within a neurotypical population correlates significantly negatively with the extent of
identification shift associated with the 'Ganong effect' (i.e. the bias in categorization
in the direction of a known word). The AQ is a short, self-administered scale for
identifying the degree to which any individual adult of normal IQ may have traits asso-
ciated with the Autism-Spectrum Condition, of which classic autism and Asperger's
Syndrome are the clearest subgroups. The AQ is not a diagnostic measure, although
it has been clinically tested as a screening tool; traits as assessed by the AQ show high
heritability and are stable cross-culturally. The test consists of fifty items, made up of
ten questions assessing five subscales: social skills, communication, attention to detail,
attention-switching, and imagination. The identification shift associated with the bias

toward a known word is shown to relate to the 'Attention Switching' and 'Imagination'
components of the AQ in particular. These findings suggest that individuals with cer-
tain 'autistic traits' are less likely to be affected by lexical knowledge in their phonetic
perception, possibly due to their heightened sensitivity to actual acoustic differences.
The authors ruled out higher auditory sensitivity, retardation of lexical access, and
verbal intelligence as potential alternative explanations for the observed correlation.
They found no correlation of AQ with the performance in a VOT discrimination task,
accuracy and speed in a lexical decision task, or individual verbal IQ. Similar findings
have been reported for native speakers of Mandarin Chinese from Taiwan (Huang 2007).
To further examine the extent of the association between 'autistic traits' and vari-
ability in human speech perception abilities, Yu (2010) investigated the association
between 'autistic traits' and the perceptual compensation for vocalic context and talker
voice. Previous studies show that listeners generally perceive more instances of [s]
than [ʃ] in the context of [u] than in the context of [a] (Mann and Repp 1980; Mitterer
2006), presumably because listeners take into account the lowered noise frequencies
of /s/ in a rounded vowel context. Similarly, when listeners encounter ambiguous
sibilants, they more often report hearing /s/ when the talker is male than when the
talker is female (Strand 1999), possibly due to the lower peak frequency of /s/ (i.e.
more /ʃ/-like) when produced by male talkers than by female talkers.
In Yu's (2010) study, sixty subjects (32 females; age ranging from 18 to 47, with a
mean of 22 (SD = 4.7)) performed a 2-Alternative Forced-Choice task by listening
to a series of CV syllables (C = a synthesized 7-step /s/-/ʃ/ continuum; V = /a/ or /u/
in either a female or a male voice) and deciding whether the fricative was /s/ or /ʃ/.
After the identification task, participants took the Autism-Spectrum Quotient (AQ;
Baron-Cohen et al. 2001b), Empathy Quotient (EQ; Baron-Cohen and Wheelwright
2004) and Systemizing Quotient (SQ; Baron-Cohen et al. 2003). All three quotients
are short, self-administered scales for identifying the degree to which any individual
adult of normal IQ may have traits associated with Autism-Spectrum Condition. Only
the effects of AQ were reported in Yu 2010, given that that article was focused on
establishing, for the first time, a significant association between 'autistic traits' and
perceptual compensation in speech.
Yu (2010a) found that the magnitude of the compensation (i.e. context-dependent
identification shifts akin to that of the 'Ganong effect') is modulated by the listener's
sex as well as by the level of 'autistic traits' s/he exhibits. In particular, individuals with
low AQ, particularly women with low AQ, show the least amount of identification
shift, but this effect of overall AQ score on identification shift is only evidenced in
the perceptual compensation for vocalic coarticulation, not in the case of talker voice
compensation. That is, individuals' overall AQ scores mediate the processing of lin-
guistic information (i.e. vocalic context), but do not seem to influence the processing
of socio-indexical information such as the (perceived) sex of the talker. The author
did observe that the magnitude of talker voice compensation is modulated by the

perceiver's AQ subscores, including the components of Social Skills, Attention to


Details, Attention Switching, and Communication. The magnitude of the subscore
effects on talker voice compensation is much weaker than the effects of the total AQ
and AQ subscores (Attention Switching and Communication) on perceptual compensa-
tion for vocalic coarticulation, however.

10.3 The social relevance of individual variation in cognitive processing style
The association between 'autistic traits' and perceptual compensation for vocalic
coarticulation in speech is of particular relevance to understanding the connection
between the creation of new linguistic variants and their eventual propagation across
the speech community. To begin with, given the systematicity of individual variability
in perceptual compensation across cognitive processing style, individuals who con-
sistently do not compensate for coarticulatory effects in speech, i.e. the persistent
minimal compensators, would presumably have different perceptual and pronuncia-
tion norms from individuals who succeed in perceptual compensation, assuming that
perceptual experience informs articulatory production. If such persistent minimal
compensators also occupy socially significant stations within the speech community,
the perceptual and production norms of these individuals might come to be associated
with social significance and spread to the rest of the speech community (Eckert 2000;
Labov 2001; Milroy and Milroy 1985).
Research on the social and personal characteristics of leaders in linguistic change
has found that leaders are more often women than men, and that they are core members
of their social network. Such leaders also have intimate contacts throughout their local
groups as well as in the wider neighborhood and the wider contacts often include
people of different social statuses such that their influence spreads downward and
upward from the central group (Labov 2001: 360). In light of these characteristics, it
is interesting to note that the association of 'autistic traits' with degree of perceptual
compensation not only raises questions about the neurocognitive mechanisms under-
lying such a linkage, it also points to potential sociolinguistic ramifications. Building
on the observation that minimal compensation is gender-differentiated (i.e. females
are more likely to under-compensate than males and females with lower AQ under-
compensate more than females with higher AQ), Yu (2010a) hypothesizes that one
contributing factor to reports of women making use of a wider range of variation than
men (Eckert 1988, 1989, 2000; Labov 2001) and females being more often the more
active agents of the diffusion of sound change compared to men (see Labov 1990,
2001; cf. Schilling-Estes 2002) might be related to women's superior ability, relative to
men, to retain variants in speech (i.e. minimal compensation for coarticulation). That is,
given that low AQ women are least likely to compensate for coarticulatory influence

in speech perception, it is hypothesized that their perceptual exemplar space would


encompass a wider array of marginal exemplars (i.e. variants) than individuals who
compensate robustly.
To be sure, it remains to be demonstrated that individuals who are minimal com-
pensators have different perceptual and production norms than those who are robust
compensators. Also, the gender effect mentioned should be taken with caution as
biological sex is only one of many potential factors that influence a person's gender
role in society. Notwithstanding these caveats, a link between variation in percep-
tual compensation with sociolinguistically-relevant factors, such as gender, points
to a possible deeper connection between individual variability in speech processing
and socio-cognitive traits. Further evidence corroborating this hypothesis regarding
the connection between individual socio-cognitive variability and the emergence
of sociolinguistic differentiation in sound change comes from studies that establish
statistically significant associations between 'autistic traits' and personality traits. The
AQ, for example, has been shown to correlate with differences in personality traits
such as neuroticism, extraversion, agreeableness, and conscientiousness (Austin 2005;
Wakabayashi et al. 2006).¹ In particular, high AQ individuals are associated with high
neuroticism, low extraversion, and low agreeableness (Austin 2005) or conscientious-
ness (Wakabayashi et al. 2006). In addition, Baron-Cohen (2002, 2003), who advances
the empathizing-systemizing (E-S) theory of typical psychological sex differences,
including autism, proposes that individuals differ in their drives to empathize (i.e.
the ability to identify another person's emotions and thoughts, and to respond to
these with an appropriate emotion) and to systemize (i.e. the ability to analyze or
construct rule-based systems, whether mechanical, abstract, natural, etc.), which can
be measured by the Empathy Quotient (EQ; Baron-Cohen and Wheelwright 2004)
and the Systemizing Quotient (SQ; Baron-Cohen et al. 2003; Wheelwright et al. 2006)
respectively. Goldenfeld et al. (2005) further propose to determine an individual's
brain type (i.e. Types E, S, E(xtreme)E, and ES) using the measure D, which is derived
based on a normalized difference between standardized EQ and SQ scores.²
¹ The Five Factor Model of personality consists of five broad personality dimensions: openness, consci-
entiousness, extraversion, agreeableness, and neuroticism (John et al. 2008). Openness refers to a general
appreciation for art, emotion, adventure, unusual ideas, imagination, curiosity, variety of experience. People
with low scores on openness tend to have more conventional and traditional interests. Conscientiousness
is a tendency to show self-discipline and aim for achievement. Individuals who are conscientious tend to
show a preference for planned rather than spontaneous behavior. Extraversion is characterized by positive
emotions and a tendency to seek out stimulation and the company of others. Individuals who are intro-
verted generally lack the social exuberance and activity levels of extraverts and may seem quiet, low-key,
and deliberate. However, their lack of social involvement should not be interpreted as shyness or depression.
Agreeableness is a tendency to be compassionate and cooperative rather than suspicious and antagonistic
towards others. Finally, neuroticism, sometimes called emotional instability, is the tendency to experience
negative emotions, such as anger, anxiety, or depression. Individuals who score low in neuroticism are less
easily upset and less emotionally reactive.
² Standardized quotient scores were transformed using the formulae S = (SQ − ⟨SQ⟩)/max(SQ) and
E = (EQ − ⟨EQ⟩)/max(EQ), where ⟨…⟩ denotes the typical population mean (see Table 10.1). That is,
the difference between the score and the population mean is divided by the maximum possible score of
the quotient (80 for the EQ and 150 for the SQ). The original EQ and SQ axes were then rotated by 45°,
essentially factor-analyzing S and E, and were normalized by the factor of 1/2 to produce the new
measure, D (= 1/2((SQ − ⟨SQ⟩)/150 − (EQ − ⟨EQ⟩)/80)).

A positive D score indicates a brain type of Type S (i.e. D scores between the 65th and 97.5th
percentile), or Extreme Type S (ES; the top 2.5 per cent), while a negative score
indicates brain type of Type E (scores between the 2.5th and 35th percentiles) or
Extreme Type E (EE; the lowest scoring 2.5 per cent). Scores close to zero indicate
a balanced brain type (i.e. Type B; D scores between the 35th and 65th percentile).
Females are said to have a stronger drive to empathize than to systemize (E > S,
also referred to as Type E), while males have a stronger drive to systemize than to
empathize (S > E, or Type S). According to this typology, individuals with Autism-
Spectrum Condition (ASC) have an extreme male brain cognitive profile (S ≫ E, or
Extreme Type S: Baron-Cohen 2002). Of particular interest here are findings suggest-
ing that individual differences in empathizing and systemizing abilities also closely
associate with differences in personality traits. Nettle (2007), for example, found that
EQ correlates significantly with agreeableness as well as with extraversion. SQ is found
to correlate moderately with openness. Such differences in personality traits may have
consequences for how an individual might interact with other members of his/her
social network. EQ, for example, has been shown to be a significant predictor of social
network characteristics (Nettle 2007). Individuals with higher EQ are associated with
a large sympathy group (i.e. close friends) and a larger support clique (i.e. individuals
to whom one turns in a time of major personal problems), as measured by a self-
reported amount of social contacts and social support.
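For concreteness, the computation of D described in footnote 2 can be sketched as
follows (in R; the raw scores are hypothetical, and the population means are taken
from Table 10.1):

  # Compute D from raw EQ and SQ scores; means follow Table 10.1.
  eq <- 52; sq <- 70            # hypothetical raw scores
  S <- (sq - 65.5) / 150        # standardized SQ (maximum score 150)
  E <- (eq - 45.5) / 80         # standardized EQ (maximum score 80)
  D <- (S - E) / 2
  # Brain type follows from D's percentile in the population: roughly Type B
  # between the 35th and 65th percentiles, Type E/S below and above, and the
  # extreme types in the bottom and top 2.5 per cent.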
The connection between personality traits and empathy, systemizing drive, and
brain type is further strengthened in light of the results of a recent survey study
conducted with 116 respondents (70 females, age range = 18-36) at the Univer-
sity of Chicago. As shown in Figure 10.1, the EQ scores of the respondents were
found to significantly correlate with four personality traits, in order of decreas-
ing magnitude of correlation: Agreeableness (r = 0.606, p < 0.0001), Conscien-
tiousness (r = 0.324, p < 0.001), Extraversion (r = 0.248, p < 0.01), and Openness
(r = 0.198, p < 0.05). EQ is also weakly correlated with respondents' sympathy
group (r = 0.185, p = 0.053) and support clique (r = 0.208, p < 0.05). Unlike what is
observed in Nettle's findings, SQ only significantly correlates with Conscientiousness
(r = 0.238, p < 0.05). Of particular interest are the significant correlations between
Brain Type and personality traits. D scores correlate significantly negatively with
Agreeableness (r = −0.484, p < 0.0001) and Extraversion (r = −0.252, p < 0.01),
suggesting that individuals who are Type E (i.e. low D score) are more likely to be
more agreeable and extraverted, while Type S (high D score) individuals are likely to
be less agreeable and more introverted.


FIGURE 10.1 Significant correlations between individual-difference dimensions (EQ, SQ, and
D) and personality traits. A = Agreeableness, N = Neuroticism, C = Conscientiousness,
E = Extraversion, O = Openness, SG = Sympathy Group, SC = Support Clique

Individuals with a balanced brain type, which comprises the bulk of the respondents,
tend to exhibit more neutral personality traits, at least with respect to agreeableness
and extraversion.
Given the associations between individual-difference dimensions such as EQ,
SQ, and brain type, which capture individual differences in cognitive processing
styles, personality traits, and other social characteristics, might they also covary
with differences in perceptual compensation responses across individuals, as in the
case of the AQ? If such an association were established, it would go a long way
to establishing a firm link between individual differences in cognitive processing

style and the emergence and propagation of sociolinguistically-motivated sound
changes.
To this end, in what follows I explore this question through a reanalysis of the
data of Yu's (2010) original study by considering the effects of the three additional
individual-difference dimensions mentioned above, the Empathy Quotient (EQ: Baron-
Cohen and Wheelwright 2004), the Systemizing Quotient-Revised (SQ: Baron-Cohen
et al. 2003; Wheelwright et al. 2006), and Brain Type (Goldenfeld et al. 2005), on lis-
teners' ability to perceptually compensate for vocalic context in sibilant identification.

10.4 The model

This section lays out the results of a linear mixed-effects model testing for the effects,
if any, EQ, SQ, and brain type might have on sibilant perception. As reviewed in
section 10.2.3, the data comes from Yu 2010, which tested sixty native speakers of
American English (32 females; age range from 18 to 47, with a mean of 22 (SD = 4.7))
on the classification of an /s/-/ʃ/ continuum by identifying each initial sibilant as
either /s/ or /ʃ/. The experiment was implemented in E-Prime. Subjects heard the
test stimuli over headphones in a soundproof booth. Subjects made their selection
by pressing one of two labeled keys on a response box. The session consisted of
three trial blocks. In each block, all 28 tokens (= 2 vowels × 2 talkers × 7 steps)
were presented four times in random order. Each subject categorized 336 tokens
(= 2 vowels × 2 talkers × 7 steps × 3 blocks × 4 times). After the identification task,
participants took the Autism-Spectrum Quotient questionnaire (AQ: Baron-Cohen
et al. 2001b), the Empathy Quotient (EQ: Baron-Cohen and Wheelwright 2004),
and the Systemizing Quotient (SQ: Baron-Cohen et al. 2003). A more detailed
account of the setup of the experiment and the preparation of the stimuli can be found
in the Materials and Methods section in Yu 2010.

10.4.1 Descriptive statistics


Descriptive statistics of the quotient scores are summarized in Table 10.1. Recall that
the AQ consists of fifty questions. As in Yu (2010), the AQ items were scored on a
Likert scale (1-4). The total AQ score was calculated by summing all of the scores
for each of the items, with a maximum score of 200 and a minimum score of 50.
Scores for the subscales (SS, CM, AD, AS, IM) have a maximum score of 40 and
a minimum score of 10. All scales were scored in such a way that a high score is
more 'autistic', i.e. lower social skills, difficulty in attention switching/strong focus of
attention, high attention to detail and patterns, lower ability to communicate, and
low imagination. Like the AQ, the EQ and SQ were self-administered and have a
forced-choice format. Participants were asked to indicate whether they 'strongly agree',
'slightly agree', 'slightly disagree', or 'strongly disagree' with a statement. Approximately

TABLE 10.1 Descriptive statistics of measured factors. Scores averaged across the sexes
are given in the first row for each factor. The AQ was scored in such a way that a high
score is more 'autistic', i.e. lower social skills, difficulty in attention switching, high
attention to detail and patterns, lower ability to communicate, low imagination. The
EQ and SQ were scored in such a way that individuals with high scores are more
empathetic and more systemizing respectively

Factor                     Sex   Mean     Median   Range    SD
Overall AQ                       110.00   108      78-155   17.79
                           f     109.00   105      78-155   18.33
                           m     111.20   111      80-151   17.41
Social Skills (SS)               19.82    19       12-33    5.82
                           f     20.21    19       12-33    5.58
                           m     19.38    17       12-31    6.14
Attention Switching (AS)         24.31    24       15-36    4.79
                           f     24.12    24       17-34    4.70
                           m     24.52    25       15-36    4.97
Attention to detail (AD)         26.74    27       15-37    5.24
                           f     26.42    26       15-37    5.04
                           m     27.10    27       18-37    5.51
Communication (CM)               19.23    18       10-33    4.96
                           f     19.12    18       10-33    5.48
                           m     19.34    18       11-27    4.38
Imagination (IM)                 19.65    19.50    10-30    4.44
                           f     18.91    19       10-28    4.56
                           m     20.48    21       13-30    4.22
EQ                               45.50    45.50    10-74    13.29
                           f     46.67    47       11-74    12.58
                           m     44.17    44       10-71    14.16
SQ                               65.5     64.5     36-144   20.12
                           f     62.40    60       36-103   17.41
                           m     69.03    65       36-144   22.60

half the items on each questionnaire are worded so that a high scorer will agree with
the item, to avoid response bias. The EQ comprises 40 items and the SQ 75 questions;
two points are given for a 'strongly' response and one point for an appropriate 'slightly'
response. The maximum scores for EQ and SQ are 80 and 150 respectively, while their
minimum is zero.
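A minimal sketch of the two scoring schemes (in R; the response coding, 1 =
'strongly disagree' through 4 = 'strongly agree', and the keying arguments are
assumptions about how the items might be represented):

  # AQ: Likert scoring (1-4), with reverse-keyed items flipped so that a high
  # score is always the more 'autistic' response.
  score_aq_item <- function(resp, reverse = FALSE) if (reverse) 5 - resp else resp
  # EQ/SQ: 2 points for a 'strongly' response in the keyed direction, 1 point
  # for a 'slightly' response in that direction, 0 otherwise.
  score_eqsq_item <- function(resp, agree_keyed = TRUE) {
    agrees <- resp >= 3                  # 3 = slightly agree, 4 = strongly agree
    if (agrees != agree_keyed) return(0) # response is in the wrong direction
    if (resp %in% c(1, 4)) 2 else 1      # 'strongly' = 2 points, 'slightly' = 1
  }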
The distribution of AQ scores was typical of normally developing populations. As
a general comparison, the mean total AQ of individuals with ASC (N = 58) in Baron-
Cohen et al.'s (2001) study was 35.8 (SD = 6.5), while the mean total AQ of the Cam-
bridge University students they surveyed (N = 840) was 17.6 (SD = 6.4).
FIGURE 10.2 Correlations between individual-difference dimensions as measured by the AQ subcomponents, EQ, and SQ for all participants with
regression lines superimposed. The Pearson correlation coefficient, given on top of each subplot, corresponds to the overall correlation irrespective
of sex

Applying Baron-Cohen et al.'s scoring method (they did not calculate the AQ on a
Likert scale as in the present study), subjects in the present study have a mean total AQ
of 18.45 (SD = 8.25). The distributions of EQ and SQ scores are typical of normally
developing populations as well. Wheelwright et al. (2006) reported that the average
AQ, EQ, and SQ of the neurotypicals in their study were 16.3 (SD = 5.9), 44.3 (12.2),
and 55.6 (19.7)
respectively. Figure 10.2 summarizes the correlations between the individual quotients.
SQ correlates significantly only with Attention-to-detail (r = 0.496, p < 0.001) and
marginally so with Imagination (r = 0.226, p = 0.08). EQ correlates significantly with
Attention-Switching (r = −0.391, p < 0.01), Social Skills (r = −0.645, p < 0.001),
Imagination (r = −0.535, p < 0.001), and Communication (r = −0.679, p < 0.001).
SQ and EQ do not correlate significantly (r = 0.169, p = 0.193).
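Coefficients of this kind can be computed with standard Pearson tests, as in the
following sketch (the data frame d and its column names are assumptions):

  # Pairwise Pearson correlations among quotient scores, as in Figure 10.2
  cor.test(d$sq, d$ad)    # SQ vs. Attention-to-detail
  cor.test(d$eq, d$as)    # EQ vs. Attention-Switching
  round(cor(d[, c("eq", "sq", "ss", "as", "ad", "cm", "im")]), 3)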
Subjects' /ʃ/-responses were modeled using a mixed-effects model with a logit link
function. The model was fitted in R (R Development Core Team 2010), using the
lmer() function from the lme4 package for mixed-effects models. Positive regression
weights indicate a positive correlation between a predictor variable and the likelihood
of a /ʃ/ response. The
current model was selected from a full model containing all individual-difference
predictors and their interactions with vocalic context and the subject's biological
sex by eliminating predictors that do not significantly improve model likelihood. In
addition to EQ, SQ, BRAIN TYPE, and the subject's biological sex, the five AQ subscores
were entered into the model, in lieu of the overall AQ, to determine whether the effects
of EQ, SQ, and BRAIN TYPE, whatever they may be, are independent of the effects of the
AQ components on perceptual compensation. Given that the number of individuals
with extreme brain types (EE and ES) was small in this sample population, only
three brain types were considered (i.e. B, E, and S). Exploratory data analysis further
revealed that only the contrast between balanced (B) and imbalanced brain types
(E and S) was relevant, thus the BRAIN TYPE predictor was recoded as a binary pre-
dictor (balanced vs. imbalanced). With the changes to the model predictors described
above, three AQ subscores (IM, AD, CM) dropped out. The final model contains
ten fixed input variables: TRIAL (1-336), STEP (1-7), SUBJECT.SEX (male vs. female),
VOWEL (/a/ vs. /u/), TALKER (male vs. female), AS (1-50), SS (1-50), EQ (0-80), SQ
(0-150), BRAIN TYPE (balanced vs. imbalanced), as well as a by-subject random slope
for TRIAL.
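A sketch of the model specification is given below, using the current lme4 interface,
in which a binomial mixed-effects model is fitted with glmer(); the data frame d, its
column names, and the inclusion of a random intercept alongside the by-subject
TRIAL slope are all assumptions:

  library(lme4)
  m <- glmer(sh_response ~ trial + step + subject.sex + vowel + talker +
               as + ss + eq + sq + brain.type +
               vowel:subject.sex + vowel:talker + vowel:step + talker:step +
               as:subject.sex + vowel:as + vowel:ss + vowel:sq + vowel:eq +
               vowel:brain.type + vowel:as:subject.sex +
               (1 + trial | subject),      # by-subject random slope for TRIAL
             data = d, family = binomial)
  summary(m)   # fixed-effect estimates as reported in Table 10.2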
Categorical variables were sum-coded (i.e. female = 1, male = −1; a = 1, u = −1;
balanced = 1, imbalanced = −1). Following Gelman (2008), EQ, SQ, and the AQ
subscores were centered and standardized by dividing the difference between the
input variable and its mean by two times its standard deviation in order to facilitate the
comparisons of the magnitude of effects across categorical and continuous factors.
Each unit of difference in a standardized quotient score corresponds to a difference
of two standard deviations. Overall collinearity of predictors was low. The average
partial correlation of fixed effects was 0.014 and the highest variance inflation factor
was 2.479.
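In code, the coding and standardization just described amount to something like the
following (a sketch; column names as assumed above):

  # Sum coding of the categorical predictors
  d$vowel       <- ifelse(d$vowel == "a", 1, -1)
  d$subject.sex <- ifelse(d$subject.sex == "female", 1, -1)
  d$brain.type  <- ifelse(d$brain.type == "balanced", 1, -1)
  # Gelman-style standardization: centre, then divide by two standard deviations
  std2 <- function(x) (x - mean(x)) / (2 * sd(x))
  for (v in c("eq", "sq", "as", "ss")) d[[v]] <- std2(d[[v]])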

TABLE 10.2 Estimates for all predictors in the analysis of listener response in the
identification task. '***' = p < 0.001; '**' = p < 0.01; '*' = p < 0.05

Task-specific factors        Coeff. β   SE(β)
Intercept                    −0.576     0.202 **
TRIAL                        −0.059     0.148
STEP                         3.467      0.054 ***
VOWEL                        0.466      0.023 ***
TALKER                       0.644      0.022 ***
SUBJECT.SEX                  0.310      0.191
VOWEL x SUBJECT.SEX          −0.034     0.021
VOWEL x TALKER               0.152      0.021 ***
VOWEL x STEP                 0.199      0.046 ***
TALKER x STEP                0.222      0.046 ***

Cognitive factors
AS                           0.040      0.461
SS                           0.154      0.596
SQ                           0.464      0.404
EQ                           0.350      0.528
BRAIN TYPE                   0.189      0.213
AS x SUBJECT.SEX             0.043      0.385
VOWEL x AS                   0.206      0.052 ***
VOWEL x SS                   0.251      0.067 ***
VOWEL x SQ                   0.217      0.044 ***
VOWEL x EQ                   0.512      0.059 ***
VOWEL x BRAIN TYPE           0.096      0.023 ***
VOWEL x AS x SUBJECT.SEX     0.162      0.044 ***

Table 10.2 summarizes the parameter estimate for each of the fixed effects
in the model, as well as the estimate of its standard error SE(β), and the signifi-
cance level. Consistent with previous studies on the perceptual compensation for
vocalic coarticulation (Mann and Repp 1980; Mitterer 2006) and the sex of the
talker (Strand 1999), the model shows the expected main effects of vocalic context
and talker voice on sibilant perception. There is approximately a 20 per cent drop
in /ʃ/ response when the following vowel is /u/ (β = 0.466, SE = 0.023, p < 0.0001:
Figure 10.3a), rather than /a/, while the drop in /ʃ/ response is about 30 per cent
when the talker is male rather than female (β = 0.644, SE = 0.022, p < 0.0001: see
Figure 10.3b). There is an interaction effect of vocalic context and talker voice
(β = 0.152, SE = 0.021, p < 0.0001); /ʃ/-response is least likely when the talker is
male and the following vowel is /u/ (see Figure 10.3c).

FIGURE 10.3 Effects of (a) vocalic context, (b) talker, and (c) their interaction, on sibilant
identification

There are also significant effects of the continuum step on both the vocalic-context
and the talker-voice compensation. Beyond these canonical effects, individuals with a
low AS subscore (i.e. better attention-switching skills: Figure 10.4a) or a low SS
subscore (better social skills: Figure 10.4b) are less influenced by the effects of vocalic
context in sibilant identification. The model likelihood is improved significantly in a
model with a VOWEL x ATTENTION-SWITCHING interaction (χ²(2) = 16.161, p < 0.001)
or with a VOWEL x SOCIAL SKILLS interaction (χ²(2) = 21.473, p < 0.001) relative to
a model without these interactions. The interaction between VOWEL and ATTENTION-
SWITCHING was mediated by SUBJECT.SEX. Unlike Yu (2010), the three-way interaction
between VOWEL, SOCIAL SKILLS, and SUBJECT.SEX did not improve data likelihood
significantly (χ²(4) = 8.325, p = 0.080).
Drive to empathize and to systemize. To test the effects of EQ and SQ on the
perceptual compensation for vocalic coarticulation, the significance of data likeli-
hood improvement of models with and without two-way interactions between these
cognitive traits and vocalic contexts was examined. The interaction between EQ and
VOWEL significantly improved the model's likelihood (χ²(2) = 74.472, p < 0.001:
Figure 10.4c); individuals with lower EQ (i.e. poor empathizers) are less affected
by the vocalic context in sibilant classification (β = 0.512, SE = 0.059, p < 0.001).
The interaction between SQ and VOWEL significantly improves data likelihood as
well (χ²(2) = 24.128, p < 0.0001: Figure 10.4d); this interaction indicates that the
lower SQ an individual scores (i.e. the less driven an individual is to systemize),
the less affected the person is by the vocalic context during sibilant perception
(β = 0.217, SE = 0.044, p < 0.001).
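Each of these comparisons corresponds to a likelihood-ratio test between nested
models, along the following lines (a sketch, continuing the assumed names above):

  # Chi-squared likelihood-ratio test for, e.g., the VOWEL x EQ interaction
  m0 <- update(m, . ~ . - vowel:eq)   # refit without the interaction
  anova(m0, m)                        # reports the chi-squared statistic and df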
Recall in Figure 10.2 that EQ correlates significantly negatively with both the
Attention-Switching (AS) and Social Skills (SS) subcomponents of the AQ score.
This suggests that poor empathizers (individuals with low EQ) tend to be highly

focused (high AS score) and have poor social skills (high SS score). Yet, the results
of our statistical analysis thus far suggest that individuals who are less influenced by
vocalic contexts in sibilant perception (the minimal compensators) tend to be poor
empathizers with good social skills (low SS), and are also easily distractible (i.e. low
AS score). These cognitive traits thus appear to be in conflict with each other: given
the correlations above, a minimal compensator is not likely to be simultaneously a
poor empathizer, a person with good social skills (a low SS score), and a person with
distractible attention (a low AS score), and vice versa. This conflict is resolved once
BRAIN TYPE is taken into account.
The interaction between BRAIN TYPE and VOWEL significantly improved the
model's likelihood (χ²(2) = 20.983, p < 0.001; Figure 10.4e), suggesting that individuals
with imbalanced empathy and systemizing traits (i.e. Types E, EE, S, and ES) are
less affected by the vocalic context in sibilant classification than those with a more
balanced brain type. This finding helps to explain the puzzle above, since it suggests
that not all strong empathizers compensate for vocalic coarticulation equally robustly.
Strong empathizers with a weak systemizing drive are less likely to engage in perceptual
compensation for vocalic context, as are poor empathizers with a strong systemizing
drive. On the other hand, individuals with a balanced drive toward empathy
and systemizing (i.e. strong empathizers with a strong drive to systemize or poor
empathizers with a weak drive to systemize) are more likely to compensate for vocalic
coarticulation.

FIGURE 10.4 Perceptual compensation for vocalic coarticulation as mediated by (a) attention-switching skills, (b) social skills, (c) empathy, (d) systemizing drive, and (e) brain type (balanced (B) vs. unbalanced (U))

10.5 General discussion


I have shown evidence that individual differences in cognitive processing style, as
measured by the EQ, SQ, and the AS and SS subscores of the AQ, as well as their
derivatives such as Baron-Cohen's brain type typology, significantly influence listeners'
perceptual responses with respect to sibilant perception in context-specific
settings. These findings suggest that individuals with different cognitive processing
styles might have different perceptual norms. While further research is needed to
ascertain whether differences in perceptual norms as modulated by cognitive process-
ing style also correspond to differences in speech production norms, recent studies at
least suggest a plausible linkage between individual variability in perceptual norm
and individual differences in production targets (Beddor et al. 2002; Harrington et al.
2008; cf. Galantucci et al. 2009; Watkins and Paus 2004). Kataoka (2010, 2011), for
example, shows that an individual's context-specific production targets for vowels
are correlated with her context-specific perceptual responses. While Kataoka's (2010,
2011) study focuses on the production and perception of English high vowels in
different consonantal contexts, her findings nonetheless point to the feasibility of a
perception-production feedback loop (Pierrehumbert 2001a; Oudeyer 2006) and to
the idea that differences in perceptual norms attributed to differences in cognitive
processing style would be reflected in differences in production norms as well.

10.5.1 The cognitive profile of linguistic innovators


Another vexing question that remains unaddressed so far is how the findings of
this study shed light on our understanding of sound change. As alluded to earlier,
many researchers of sound change, most notably Ohala (1993b) and Blevins (2004),
attribute a primary endogenous source of innovative linguistic variants to listeners
failing to properly compensate for variation from coarticulation (e.g. the vocalic effect
on neighboring sibilants). Errors in perception may lead to adjustments in percep-
tual and production norms. Thus in the case of sibilants, speakers might mistake a
lexical item, say /su/, for /Ju/ by not taking the coarticulatory rounding effect of /u/
into account, and might subsequently start producing the same lexical item as [Ju].
Repeated errors of this nature could result in a drastic reduction of /s/ exemplars
before /u/ and an overwhelming number of /// before /u/ and an s > // u sound
change would obtain.3 However, it is not clear how perception and production norms
driven by listener errors could acccumulte in a systematic fashion and result in sound
3
An example of such a sound change can be found in certain speakers of Modern Cantonese. Underlying
/s/ is palatalized before Ijl but not before /i/ (i.e. [Jyi] ] 'book' vs. [sk] ] 'silk'), suggesting that s-palatalization
change, since many experimental investigations have shown that listeners are on average
quite effective in compensating for the effects of coarticulation (Mann and Repp
1980; Mitterer and Blomert 2003; Mitterer 2006; Beddor and Krakow 1999; Beddor
et al. 2002; Viswanathan et al. 2010). This has led to the hypothesis that only listeners
with minimal knowledge of the language, such as children and second language
learners, are likely to repeatedly commit such perceptual errors (Ohala 1993b; see
also Kiparsky 1995).

³ An example of such a sound change can be found in certain speakers of Modern Cantonese. Underlying /s/ is palatalized before /y/ but not before /i/ (i.e. [ʃyː] 'book' vs. [siː] 'silk'), suggesting that s-palatalization is due to the rounding of the high front vowel, rather than frontness alone. Note, however, that this change is only triggered by high vowels, since non-high rounded vowels do not trigger this palatalization (e.g. [sɔː] 'comb'). The high back rounded vowel /u/ is not permitted after coronals in Cantonese.
The H & H theory of phonetic variation (Lindblom 1990; Lindblom et al. 1995), on
the other hand, advocates a more speaker-oriented approach to sound change. The
H & H theory proposes that speakers adaptively tune their performance along the
H(yper)-H(ypo) continuum according to their estimates of the listener's needs in that
particular situation. These needs include preferences to maximize the distinctiveness
of contrasts and to minimize articulatory effort. Speakers hyper-articulate when listeners
require maximum acoustic information; they reduce articulatory efforts, hence
hypo-articulate, when listeners can supplement the acoustic input with information
from other sources. From this perspective, sound change occurs when intelligibility
demands are redundantly met or when the listeners focus their attention on the
'how' (signal-dependent) mode rather than the 'what' (signal-independent) mode
of listening (Lindblom et al. 1995). New phonetic variants accumulate during the
'how' mode of listening. When these newly accumulated variants are selected by the
listener-turned-speaker, sound change obtains. However, little is known about the
circumstances under which individuals would focus their attention on the signal-
dependent 'how' mode of listening and away from the signal-independent 'what'
mode.
The discovery of individuals with different 'autistic traits' exhibiting variable
degrees of lexical influence in speech perception and perceptual compensation for
coarticulation provides a promising solution to the seemingly opposing views of the
H & H and the listener-misperception approaches to sound change. Recall that indi-
viduals who exhibit minimal compensation for coarticulation (i.e. low AQ individ-
uals) also exhibit strong lexical effects in speech perception (Stewart and Ota 2008),
while those who compensate for coarticulations strongly (high AQ individuals) tend
to exhibit weak lexical influence. This trade-off between the influence from low-level
phonetic variation and higher order lexical information is in concert with cognitive
theories of autism that argue that autistic individuals have superior abilities with
respect to the processing of low-level perceptual information but exhibit difficulties
with the integration of higher-order information (Bonnel et al. 2003; Happé and Frith
2006, Mottron et al. 2006). In light of these findings, from the perspective of the H &
H model, high AQ individuals can be seen as individuals whose cognitive processing
style favors attending to lower order information (i.e. the 'how' mode of listening),
while low AQ individuals tend to focus more on higher order information, such as
lexical information, and place less emphasis on the low-level detail of the incoming
signal (i.e. the so-called 'what' mode of listening). From this point of view, individuals
who favor attending to the 'what' mode of listening should be the ones who register
more new variants in their phonetic memory 'pool', contrary to Lindblom et al.'s
assumption, since the 'what' mode listeners (i.e. low AQ individuals) exhibit lesser
perceptual compensation for coarticulation. That is, when a speaker produces /su/,
perhaps intending to call out for her dog, but the utterance ends up sounding more
like [ʃu], a high AQ individual (the 'how' mode listener) would compensate for the
vocalic coarticulation and categorize the [ʃ] as another instance of /s/, as intended by
the speaker. On the other hand, a low AQ individual (the 'what' mode listener) might
be inclined to accept the percept [ʃu] at face value and treat /ʃu/ as an acceptable
phonological variant for the name of this dog. Under this scenario, two individuals,
one with high AQ and the other with low AQ, upon hearing the same utterance, might
arrive at very different conclusions as to the name of the dog being called. For the low
AQ individual, who starts calling the dog /ʃu/ regularly, this might be seen as a mini-sound
change.

10.5.2 The personality and social profile of the innovator


The presence of a mini-sound change does not guarantee the eventual propagation
of this sound change throughout the language. Given that propagation of linguis-
tic innovation crucially hinges on how the linguistic innovator is embedded within
his/her social environment, whether a minimal compensator (the low AQ 'what' mode
listener) becomes the source of linguistic innovation ultimately depends on what
social role she occupies within her social reality and how such roles could facilitate her
potentials as a linguistic innovator. As noted earlier, sociolinguists have suggested that
linguistic innovators tend to have weak social ties within the local speech community
(Labov 1973; Milroy and Milroy 1985), while leaders in linguistic change, who may
or may not also be linguistic innovators themselves, are more often women rather than
men who are centrally located in the socioeconomic hierarchy. Leaders also tend to
have a diffused network structure, often with contacts throughout their local groups
as well as in the wider neighborhood. The wider contacts often include people of
different social statuses such that their influence spreads downward and upward
from the central group (Labov 2001: 360). How might minimal compensators be
distributed within the social network and hierarchy relative to the above-mentioned
characteristics of linguistic innovators and leaders in change? Might the minimal
compensators' personality profile and social distribution contribute to the socially-
structured distribution of linguistic innovation (cf. Cheshire et al. 2008; Stuart-Smith
and Timmins 2009)?
Recall that the individual-difference dimensions considered in this study are also
significant indicators of personality traits and other social characteristics. For exam-
ple, AQ is correlated positively with neuroticism and conscientiousness and negatively
with extraversion and agreeableness (Austin 2005; Wakabayashi et al. 2006; see also
discussion regarding Figure 10.1). Jobe and White (2007) found that, with a sample
of non-clinical undergraduate students from a large, urban university (N = 97; mean
age = 19.4 ± 2 years), overall AQ correlates significantly negatively with length of
best friendship (r = −0.23, p = 0.02), and total AQ score is also a valid predictor
in a linear regression of loneliness (β = 0.48, p < .001), as measured by the UCLA
loneliness scale (version 3: Russell 1996). Given that Yu (2010) found that individuals
with low AQ are more likely to compensate less for coarticulatory influences in
speech, this suggests that such minimal compensators tend to be less neurotic and less
conscientious but more extraverted and agreeable. They also tend to have longer
best friendships and stronger feelings of loneliness.
Similar inferences might be made with respect to other individual-difference
dimensions. In the correlation study with 116 respondents discussed above
(Figure 10.1), the Attention-Switching (AS) and Social Skills (SS) subcomponents of
the AQ correlate significantly with various personality and social traits (Figure 10.5).
The AS subscore, for example, significantly correlates positively with neuroticism but
negatively with extraversion, suggesting that individuals who are easily distracted (a
trait of minimal compensators) are not very neurotic and are more extroverted. AS scores
also correlate marginally significantly with agreeableness (r = 0.202, p = 0.053).
The SS subscores significantly correlate negatively with agreeableness, conscientious-
ness, extraversion, openness, the size of sympathy group and the size of support clique.
SS subscores also positively correlate with neuroticism. Taken together, individuals
with low SS subscores (another trait of minimal compensators) tend to be more
agreeable, less neurotic, more conscientious, more extraverted and more open to new
ideas. Crucially, such individuals also have more social contacts (as measured by the
size of the sympathy group) and more close friends (as measured by the size of the
support clique).
Likewise, EQ correlates positively with agreeableness, conscientiousness, extraver-
sion, and openness; SQ correlates positively with conscientiousness and openness but
negatively with neuroticism (Nettle 2007). Recall also that individuals with higher
EQ are also associated with a larger sympathy group and a larger support clique (see
discussion with respect to Figure 10.1; see also Nettle 2007).
Finally, minimal compensators generally have imbalanced brain types, that is, of
Type E/EE and Type S/ES. Type E and EE individuals, who have a stronger drive
to empathize, are likely to be highly agreeable, extraverted, and neurotic, but may
also be less conscientious and open; Type S and ES individuals, who are superb
systemizers, are not likely to be neurotic and are likely to be conscientious and
open, even though they might be quite introverted. To the extent that personality
traits have consequences for how individuals interact in the social world, it seems
at least plausible that individuals with imbalanced brain types might have different
social network profiles than individuals with balanced brain types. In particular,
I would conjecture that minimal compensators who are superior empathizers might
be at an advantage in exerting their speech patterns on others within their social
network(s).

FIGURE 10.5 Significant correlations between the Attention-Switching (AS) and Social Skills (SS) subcomponents of the AQ and personality traits. Only significant correlations (p < 0.05) are shown here. A = Agreeableness, N = Neuroticism, C = Conscientiousness, E = Extraversion, O = Openness, SG = Sympathy Group, SC = Support Clique
That women have been argued to be better empathizers than men (Baron-Cohen
2003) is, for example, consistent with the general characteristics of leaders in linguistic
change. The fact that good empathizers tend to have a larger sympathy group and
support clique is also consistent with the observation that leaders in change often
have more contacts and have access to a wider network. What is not clear is to what
extent highly systemizing individuals (i.e. Type S or ES individuals) also contribute to
the propagation of sound change. Might the fact that Type S or ES individuals tend to
be more introverted and less agreeable (on account of their low EQ) lead them to have
fewer close friends and fewer social contacts with others? If so, the speech patterns
of Type S or ES minimal compensators are not likely to influence the speech patterns
of the rest of the speech community. On the other hand, Type S/ES individuals are
also likely to be more conscientious and open. Labov (1973) suggests that the 'lames'
(i.e. individuals who are social outcasts or isolates during their formative years) tend
to carry fewer local features in their speech and are least capable of evaluating the
complexity of the in-group features on account of their exposure to more features of
other dialects and varieties. Could these characteristics (i.e. using fewer local features
and diminished capabilities in evaluating the complexity of the in-group features) be
a reflection of their Type S/ES brain type? Perhaps paradoxically, Labov concludes
that, to the extent that they are the kinds of 'lames' who eventually manage to break
out of their own niche and succeed in life, they might still manage to propagate their
1985). It should also be noted that the innovators ultimately do not need to be socially
central themselves. Provided that they play the right role in a social network and exert
an effect on the influential individual(s) in that network, their innovations might still
spread.

10.6 Conclusion
In this work, I have offered support for the idea that, in addition to differences in
individual experiences, a major source of variability in speech comes from inherent
differences in the individual's cognitive makeup (as measured by individual-difference
dimensions such as AQ, EQ, and SQ). Crucially, variation in cognitive processing style
can be shown to covary with differences in listeners' response patterns during speech
perception, particularly in the case of perceptual compensation for coarticulation.
To the extent that such differences in perceptual response may ultimately lead to
individual differences in perceptual and production norms, variability in cognitive
processing style stands to be a major contributor to the creation of new linguistic
variants in sound change. To be sure, covariation between differences in cognitive
processing style and speech processing does not imply a direct causal link. Individual
differences in cognitive processing style and variability in speech processing might
ultimately be reflexes of deeper cognitive mechanisms. Further neuropsychological
research might shed light on this issue.
Notwithstanding the significance of identifying a new source of linguistic variants,
the present findings also shed light on how the creation of new variants might be tied
to the sociolinguistic aspect of sound change propagation. Variationist research in the
past decades has demonstrated time and again the ordered heterogeneity that exists
in language. In particular, linguistic variables are found to covary with sociolinguistic
variables. The research reported in this article shows that such covariation extends
even to the level of speech perception. Whether and how robustly a person takes
coarticulatory contexts into account in sound categorization covaries with differences
in individual-difference dimensions, such as empathy and drive to systemize, as well as
general 'autistic traits', such as attention-switching and social skills. These individual-
difference dimensions are in turn associated with individual differences in personality
and social traits. In particular, it is shown that, while low-AQ individuals are most
likely to discount coarticulatory context in speech perception, their empathizing and
systemizing drives seem to play a significant role as well. Crucially, the effect of empa-
thy and the drive to systemize is not all or nothing. Whether perceptual compensation
is attenuated in low-AQ individuals is not determined by whether the person is
or is not able to empathize or systemize. Rather, individuals showing an imbalance
between empathizing and systemizing abilities (the so-called imbalanced brain types)
are more likely to exhibit minimal perceptual compensation than individuals who
exhibit balanced individual-difference dimensions.
To be sure, the discovery of significant associations between patterns of speech
perception and sociolinguistically-relevant individual-difference dimensions must be
treated with care. To begin with, the extent to which empathizing and systemizing
abilities and general 'autistic traits' are appropriate proxies for capturing the social
characteristics of an individual within a speech community must be investigated
further. As noted earlier, differences in AQ, EQ, and SQ have been associated with
different personality traits, and differences in EQ have also been found to significantly
predict certain aspects of an individual's social network. Yet, it bears emphasizing
that variation in social and cognitive processing style is undoubtedly only one of
many factors contributing to the eventual emergence of a linguistic variant and its
subsequent propagation. Many forces, as documented in the large body of literature
in sociolinguistic and variationist research, may conspire to propel or restrict the
propagation of a new variant. Individual variation in cognitive processing style may
serve as only one of many potential earlier inputs toward what might be a long process
of sound change actuation. Detailed ethnographic studies of individuals with different
cognitive processing styles might be able to reveal in more detail how these individual-
difference dimensions might manifest themselves during an individual's interpersonal
interactions and how they facilitate sound change.
In addition, the identification of a variable as a significant predictor does not nec-
essarily suggest a direct causal relation. What might be the causal relationship, if any,
between the individual-difference dimensions measured in this study and variation
in perceptual responses in speech? Might there be an adaptive significance of such
linguistic variation? It seems reasonable to hypothesize that variations in cognitive
processing style as captured by the various individual-difference dimensions reviewed
here are not directed at creating linguistic variation per se. Rather, linguistic variation
(as a consequence of variation in perceptual abilities) is likely an unintended by-
product of this aspect of human diversity. After all, variation in cognitive processing
manifests itself in domains far beyond the confines of language. For example, indi-
vidual differences in AQ have been shown to predict performance in both typical and
ASC populations on tasks such as self-focused attention (Lombardo et al. 2007), local
versus global processing (Grinter et al. 2009), inferring others' mental states from
the eyes (Baron-Cohen et al. 2001a), and attentional cueing from gaze (Bayliss and
Tipper 2005). Individuals with high AQ have been found to show global perceptual
hemineglect (i.e. a significant reduction in global perception when the stimulus was
presented in left hemifield; Crewther et al. 2010). Variation in 'autistic traits' is asso-
ciated with changes in structure and patterns of activation in typical participants'
brains (von dem Hagen et al. 2010). Differences in perceptual compensation might
just be another 'broader phenotype' (Bailey et al. 1995) that characterizes differences
among individuals along the autism spectrum. Such facts point to an interpretation
of linguistic variation as essentially an accidental by-product of the cognitive and
biological diversity of humans. How might such a cognitive accident contribute to
sound change that is sociolinguistically motivated? Sociolinguists have taught us to
focus on the resources that are available in the linguistic marketplace (Eckert 2000;
Chambers 2003). It is, however, equally important to attend to what type of resources
the individual brings to the table. That is, not everyone is equally receptive to utilizing
the linguistic resources put before him or her. The socio-cognitive processing abilities
of an individual thus provide an important conduit through which the likelihood of
propagation can be discerned. If a well-liked or well-respected individual happened
to be a minimal compensator, the type of perceptual and production norms such
an individual promulgates might be adopted more readily by fellow members of a
speech community. Yu (2010) suggests that low AQ women are least likely to percep-
tually compensate for coarticulatory context in speech perception, and that might be
associated with their increased likelihood to be leaders in change. However, such an
interpretation might be unnecessarily overreaching, as suggested by the present work.
That is, cognitive processing styles may vary along a highly multi-dimensional space.
The importance of isolating such differences in cognitive processing style is to discern
what trait combinations might simultaneously underlie both individual variability in
speech perception and individual differences in social behavior in the real world.
Establishing such a correlation would strengthen the idea that sociolinguistically-
motivated language change might ultimately have a cognitive and biological foundation,
to the extent that differences in cognitive processing style ultimately reflect differences
in the neurobiological diversity in humans.
11
The role of probabilistic
enhancement in phonologization
JAMES KIRBY*

11.1 Introduction
PHONOLOGIZATION, the process by which intrinsic phonetic variation gives rise to
extrinsic phonological encoding, is often invoked to explain the acquisition and
transmission of sound patterns (Jakobson 1931; Hyman 1976; Ohala 1981; Blevins
2004). A familiar example is the idea that lexical tone contrasts can trace their origins
to the pitch perturbations conditioned by differences in obstruent voicing (Matisoff
1973; Hombert et al. 1979). A phonologization account of tonogenesis is sketched in
Table 11.1. First, intrinsic differences in vowel f0 (Stage I) become a perceptual cue to
the identity of the initial consonant (Stage II). If other cues to the contrast between
initial consonants are lost, the contrast may be maintained solely by differences in
f0 (Stage III), setting the stage for a reanalysis of pitch as a contrastive phonological
feature.

TABLE 11.1 Stages of phonologization (after Hyman 1976). In the original, sparklines show the time course of f0 production; they are summarized here as bracketed descriptions

                 Stage I           Stage II          Stage III
   pa [higher f0]     pa [higher f0]     pa [higher f0]
   ba [lowered f0]    ba [lowered f0]    pa [lowered f0]

* Portions of this work have appeared previously in Kirby (2010). I would like to thank Bob Ladd,
Bob McMurray, Morgan Sonderegger, Alan Yu, and Yuan Zhao for helpful comments and suggestions on
previous versions of this chapter.
TABLE 11.2 Phonologization of f0 in Seoul Korean

   manner       Hangul    1960s     2000s     gloss
   fortis       뿔         ppul      pul       'horn'
   lenis        불         pul       pʰùl      'fire'
   aspirated    풀         pʰul      pʰúl      'grass'

This process can be observed in vivo in Seoul Korean, a language which maintains
a three-way phonological contrast between initial stops (Table 11.2). While studies of
Korean stop acoustics conducted during the 1960s and 1970s found this contrast to
be signaled primarily by differences in voice onset time (VOT: Lisker and Abramson
1964; Kim 1965; Han and Weitzman 1970), subsequent studies have reported that lenis
and aspirated stops are no longer distinguished solely by VOT in either production
or perception, but rather that f0 has come to play a more central role (Kim et al.
2002; Silva 2006a, b; Wright 2007; Kang and Guion 2008). One way to describe
this change is as the phonologization of previously intrinsic, mechanical phonetic
variation, conditioned here by initial obstruent voicing.
While the phonologization model provides a useful descriptive framework for this
type of sound change, it also raises several new questions. First, while it is known that
multiple acoustic-phonetic cues are available to signal any given phonological contrast
(Lisker 1986), there has been relatively little discussion of how and why certain cues
are targeted for phonologization. In Seoul Korean, for instance, it has been established
that, in addition to VOT and f0, spectral tilt, vowel duration, and the amplitude of the
release burst are relevant perceptual cues to the initial onset contrast
(Cho et al. 2002; Kim et al. 2002; Wright 2007). So why was f0, and not some other
cue, phonologized in this case?
A related issue is Hyman's (1976) observation that the phonologization of one
cue often entails dephonologization of another, a process sometimes referred to
as TRANSPHONOLOGIZATION (Hagège and Haudricourt 1978). In the case of Seoul
Korean, as f0 has become an increasingly important acoustic correlate of the contrast
between lenis and aspirated stops, VOT has become correspondingly less informative.
Given that contrasts are almost always redundantly cued, this shift is somewhat
unexpected. What might cause an increase in the informativeness of one cue to be
accompanied by a decrease in the informativeness of another?
This chapter proposes to answer these questions by arguing that phonologiza-
tion is an emergent consequence of adaptive enhancement in speech (Lindblom
1990; Diehl 2008). In particular, it is proposed that as contrast precision is reduced,
cues are enhanced to compensate. The degree of enhancement is argued to be a
230 James Kirby

probabilistic function of contrast precision, while the probability with which a given
cue is enhanced is related directly to its informativeness, the degree to which it con-
tributes to accurate identification of a speech sound (what Hume and Mailhot, this
volume, refer to as CUE QUALITY). To explore this hypothesis, phonetic categories are
modeled as finite mixtures (Nearey and Hogan 1986; Toscano and McMurray 2010),
and a case study, the phonologization of f0 in Seoul Korean, is explored in detail
through the use of agent-based computational simulations. The results suggest that
both probabilistic enhancement and loss of contrast precision interact to drive the
process of phonologization.
The remainder of this chapter is structured as follows. Section 11.2 reviews the
roles of the speaker and listener in sound change and motivates an adaptive notion
of enhancement. Section 11.3 discusses the mixture model of phonetic categories, and
section 11.4 describes the algorithm used to simulate speaker-hearer interaction.
These are used to explore the phonologization of f0 in Seoul Korean in section 11.5.
The results and implications are discussed in section 11.6, and section 11.7 provides
a general conclusion.

11.2 Bias and enhancement in sound change


Even under relatively ideal conditions, successful speech communication is a chal-
lenge. Along with contextual and coarticulatory effects, a range of physiological,
social, and cognitive BIAS FACTORS can introduce variability into the acoustic realiza-
tion, potentially obscuring the speaker's intended message. Garrett and Johnson (this
volume) provide a thorough overview of such factors, which include details of motor
planning, aerodynamic constraints, and the effects of gestural overlap and perceptual
hypercorrection. Moreover, cognitive-selectional biases favoring the transmission of
certain sound patterns and speaker-specific social and indexical characteristics may
introduce additional asymmetric variability into the speech signal (Wilson 2006;
Moreton 2008a; Yu, this volume).
What is most important to note for present purposes is that, regardless of their
source, different types of bias factors may have a similar effect: namely, they reduce
the precision with which the phonetic category intended by the speaker is accurately
identified by the listener. In this chapter, the term PRECISION will be used to refer to
the accuracy with which a listener can distinguish between members of a phonetic
contrast, and the term BIAS will be used specifically to refer to factors which reduce
this precision. One example of this type of bias is the aerodynamic voicing constraint
(Ohala 1997), which conditions a loss of precision between voiced and voiceless
stop categories; the neutralization of place cues by high front vowels conditioning
asymmetric misperception of [ki] as [ti] is another (Chang et al. 2001).
In the context of the case study examined in section 11.5, the bias in question
involves hypoarticulation of a phonetic cue, but it is worth abstracting away from
details of particular bias factors in order to ask how speakers and hearers might
respond to loss of precision more generally. Researchers such as Ohala (1981 et seq.)
often assume, tacitly or otherwise, that speakers produce phonetic targets more or less
as they are intended (modulo contextual effects such as coarticulation). The response
to a loss of precision may then be a reanalysis on the part of the listener. For example,
on this view, phonologization of a cue such as f0 might come about due to listeners'
failure to compensate for the intrinsic perturbation effects of an initial consonant
on the pitch contour of the following vocalic segment. After these effects have been
phonologized, the initial conditioning environment (here, obstruent voicing), now a
redundant cue to the contrast, is free to dephonologize. However, it is not clear what
motivates this dephonologization, given that phonetic distinctions are rarely signalled
by a single cue. It is also not immediately clear why listeners would fail to compensate
for intrinsic variation along one dimension but not another.
A different account is suggested by more broadly functional approaches to sound
change, which hypothesize a more active role for the speaker (Liljencrants and Lind-
blom 1972; Kingston and Diehl 1994; Boersma 1998). A common theme in these
treatments is the idea that the acoustic realization of a phonetic target may be mod-
ulated both by TALKER-ORIENTED constraints enforcing efficiency in speech commu-
nication ('be efficient') as well as LISTENER-ORIENTED constraints requiring speech
sounds to be sufficiently distinctive ('be understood'). Talker-oriented constraints
are often implemented by penalizing gestures in terms of the energy or precision
required for their realization. Listener-oriented constraints are usually implemented
in such a way as to maximize distinctiveness between contrasts, although this takes
on a variety of forms: combining articulatory gestures which have mutually reinforc-
ing acoustic consequences (Kingston and Diehl 1994), adding redundant features
or secondary gestures to reinforce contrast perception (Stevens 1989; Keyser and
Stevens 2006), encoding a preference for accuracy in the approximation of phonetic
targets (Lindblom 1990; Johnson et al. 1993a; Boersma 1998), or imposing systemic
constraints to maximize the distance between contrasts (Liljencrants and Lindblom
1972; Flemming 2002).
A common thread in all of these treatments is the notion of enhancement of
phonetic targets. In this chapter, the term ENHANCEMENT will be used specifically to
refer to those actions taken on the part of the speaker which increase the precision of
a phonetic contrast. For example, a talker might enhance the contrast between two
initial obstruent categories by producing them with hyperarticulated VOT values,
or by reducing the variability in their productions of those values. These notions of
enhancement and precision will be more rigorously formalized in sections 11.3 and
11.4 below.
Functional approaches predict enhancement to be more likely in situations where it
would improve intelligibility for the listener. This suggests at least a partial explanation
for why any particular phonetic property might be phonologized: all else being equal,
cues which more reliably signal a difference between categories are more likely to be
enhanced. However, it is still not clear why phonologization should be accompanied
by dephonologization. Why should the promotion of intrinsic variance to an extrinsic
indicator of contrast be accompanied by the reverse process?
The answer advanced here is that phonologization is itself a response to loss of
contrast precision. If cues are enhanced as a probabilistic function of the current
contrast precision (measured as the classification accuracy of the listener) and cue
informativeness (measured as a function of their reliability), this means that more
informative cues are more likely to be enhanced than less informative cues, and cues
will be enhanced to a greater extent when categorization error is high than when it is
low (the PROBABILISTIC ENHANCEMENT HYPOTHESIS). Viewed in this way, phonolo-
gization can be understood as an emergent consequence of the interaction between
bias and enhancement in speech communication.

11.3 A mixture model of phonetic categories


In order to evaluate this proposal, the notions of precision, informativeness, and
enhancement must be rigorously quantified. To this end, it is useful to consider a
representational scheme for phonetic categories that encodes the multidimensional
variability inherent in the speech signal. One formal representation meeting this
description is a FINITE MIXTURE MODEL (McLachlan and Peel 2000), which models a
statistical distribution as a weighted sum (or mixture) of other distributions. Mixture
models have a long history in speech research and have been used in work on speech
perception (Lisker and Abramson 1970; Nearey and Hogan 1986; Pierrehumbert
2001a; Clayards 2008), the perceptual integration of acoustic cues (McMurray et al.
2009; Toscano and McMurray 2010), and the unsupervised induction of phonetic
category structure (de Boer and Kuhl 2003; Vallabha et al. 2007; Feldman et al. 2009).
Following previous researchers, it is assumed that the underlying probability distributions
of the mixture components (i.e. the cue dimensions) are normal (Gaussian).
In a GAUSSIAN MIXTURE MODEL (GMM), an observation vector x = {x1, ..., xD} is
assumed to be independently generated by an underlying distribution with a probability
density function

$p(\mathbf{x} \mid \Theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}; \mu_k, \Sigma_k)$  (1)

where the structure Θ = ((π1, μ1, Σ1), ..., (πK, μK, ΣK)) contains the component
weights πk, mean vectors μk, and covariance matrices Σk of the D-variate component
Gaussian densities N1, ..., NK. Figure 11.1a shows how these three parameters
describe a given mixture component.
To make this more concrete, think of x as a bundle of cue values representing an
instance of phonetic category c; of D as representing the number of cue dimensions
(m1, m2, ..., mD) relevant to the perception of that category; and of K as representing
the total number of category labels (c1, c2, ..., cK) competing over the region of
phonetic space defined by D. For example, for a language like Korean with three
initial stops (K = 3) cued along five dimensions (D = 5), we might have c1 = /p/, c2 =
/pp/, c3 = /pʰ/ and m1 = VOT, m2 = burst amplitude, m3 = f0, m4 = spectral tilt, and
m5 = following vowel length. A given observation x will thus consist of five elements,
each one providing a value for one of these cues.
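As a minimal sketch of this representation (with illustrative rather than empirical parameter values, diagonal covariances, and only two of the five cue dimensions), producing a token of a category amounts to choosing a component by its weight πk and sampling from the corresponding Gaussian:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy GMM for K = 3 Korean stop categories over D = 2 cue dimensions
# (VOT in ms, f0 in Hz); all parameter values are illustrative.
categories = ["p", "pp", "ph"]
pi = np.array([1/3, 1/3, 1/3])                  # component weights pi_k
mu = np.array([[35.0, 162.0],                   # mean vectors mu_k
               [12.0, 180.0],
               [93.0, 227.0]])
sigma = np.array([np.diag([11.0, 14.0]) ** 2,   # covariance matrices Sigma_k
                  np.diag([ 5.0, 12.0]) ** 2,
                  np.diag([15.0, 21.0]) ** 2])

k = rng.choice(len(categories), p=pi)           # choose a category
x = rng.multivariate_normal(mu[k], sigma[k])    # sample an observation x
print(categories[k], x.round(1))
```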
Figure 11.1b illustrates a GMM where K = 2 and D = 1. The individual component
densities are shown in gray, while the mixture density is outlined in black. Although
more difficult to visualize, the mixture modeling approach extends straightforwardly
to the multivariate case where D > 1.

FIGURE 11.1 (a) Parameters of a Gaussian distribution for a single component (adapted from McMurray et al. 2009). (b) Two class-conditional Gaussians (dotted grey lines) and their mixture (solid black line)
In the GMMs for phonetic categories used in this chapter, experience forms the
basis for both production and perception. The speaker's task is to produce an instance
of a phonetic category; this may be modeled by sampling cue values from the relevant
class-conditional mixture component Nk. The listener's task is to assign this utterance
a category label c. If we assume that listeners weight information in the speech signal
by its quality (informativeness), we can construct a model of their behavior that would
optimize this task. Such models are sometimes referred to as IDEAL OBSERVER models
(Geisler 2003; Clayards 2008). The following section provides a brief overview; for a
more in-depth treatment, see Clayards (2008) or Kirby (2010).

11.3.1 The ideal observer


In order to come to a decision about whether or not a given utterance x = {x1, ..., xD}
is a member of category c, the ideal observer requires access to two sources of information:
p(c) (the prior probability of the category c) and p(x|c) (the probability of
the observation, given that it is a member of category c). These probabilities may be
estimated from the statistical distributions of speech cues (Maye et al. 2002; Clayards
2008). The probability that the speaker intended an instance of category c given the
evidence that cue m takes on value x can then be evaluated using Bayes' rule, as
shown in (2):

$p(c \mid x) = \dfrac{p(x \mid c)\, p(c)}{\sum_{k=1}^{K} p(x \mid c_k)\, p(c_k)}$  (2)

If contrasts are represented in a high-dimensional space, posterior probabilities
can still be computed using (2), but are instead conditional on the entire utterance
vector, i.e. p(c | x). As D increases, however, the number of observations required to
obtain robust parameter estimates begins to grow quickly. Under the assumption that
cues are conditionally independent (Clayards 2008; Toscano and McMurray 2010),
the probability that an utterance x bears category label c is simply the product of
the conditional probabilities p(x1 | c), p(x2 | c), ..., p(xD | c) normalized over all K categories
competing over the D-dimensional phonetic space, as shown in (3):

$p(c \mid \mathbf{x}) = \dfrac{p(c) \prod_{d=1}^{D} p(x_d \mid c)}{\sum_{k=1}^{K} p(c_k) \prod_{d=1}^{D} p(x_d \mid c_k)}$  (3)
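A minimal sketch of Equation (3) for two categories and two conditionally independent cues, with illustrative Gaussian parameters:

```python
import numpy as np
from scipy.stats import norm

priors = np.array([0.5, 0.5])        # p(c_k)
mu = np.array([[35.0, 162.0],        # rows: categories; columns: cue means
               [93.0, 227.0]])
sd = np.array([[11.0, 14.0],
               [15.0, 21.0]])

def posterior(x):
    """Equation (3): p(c_k | x) under conditional independence of cues."""
    lik = norm.pdf(x, loc=mu, scale=sd).prod(axis=1)  # prod_d p(x_d | c_k)
    joint = priors * lik
    return joint / joint.sum()                        # normalize over k

print(posterior(np.array([60.0, 200.0])).round(3))    # an ambiguous token
```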

11.3.2 Cue informativeness


The ideal observer model predicts that listeners should make use of the probability
distribution of all cues when attempting to identify a speaker's intended utterance.
The existence of multiple cues to phonetic categories does not, however, imply their
equivalence: some cues provide more information about the perceptual identity of
a sound than do others. The informativeness of a cue can be approximated as its
statistical reliability, although other factors may also contribute (Holt and Lotto 2006).
Intuitively, the less distributional overlap between two categories, the more informa-
tive the cue in determining the perceptual identity of an input.
Figure 11.2 illustrates this concept along a single cue dimension. The solid lines
in Figure 11.2a show the distribution for two categories with little overlap along cue
m, while the dotted lines show the distribution for two categories with more overlap.
The categorization functions in Figure 11.2b show the probability of categorizing a
stimulus as c1 given the value of m, computed using Equation (3). Note that while the
value of m at which a stimulus is equally likely to belong to category c1 or c2 (i.e. the
point where the function crosses 0.5) is the same for both mixtures, the slope of the
functions differs, reflecting increased uncertainty in the case of the dotted distributions
in Figure 11.2a. In other words, cue m is more informative in distinguishing between
the solid distributions than it is in distinguishing between the dotted distributions.

FIGURE 11.2 (a) Probability distributions of a cue dimension m for two categories c1 (dark lines) and c2 (light lines). Solid lines show a mixture where there is little overlap between the components, dashed lines a mixture with more overlap. (b) Optimal categorization functions given the distributions in (a). (Adapted from Clayards et al. 2008)
While the reliability of a cue can be expressed as an identification function, it is also
useful to have an index of a cue's informativeness relative to other cues. One way to
accomplish this is based on the detection-theoretic d′ statistic (Green and Swets 1966),
the absolute value of the difference in category means divided by the square root of
the average variance:

$d'_m = \dfrac{|\mu_{c_1,m} - \mu_{c_2,m}|}{\sqrt{(\sigma^2_{c_1,m} + \sigma^2_{c_2,m})/2}}$  (4)

The informativeness ωm for an individual cue can then be expressed as

$\omega_m = \dfrac{d'_m}{\sum_{d=1}^{D} d'_d}$  (5)
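Applied to the 1960s parameters in Table 11.3, Equations (4) and (5) recover the ω row of that table up to rounding; a short sketch:

```python
import numpy as np

# 1960s means and s.d.s from Table 11.3; cue order: VOT, VLEN, H1-H2, BA, f0
mu_len = np.array([35., 337., 6.0, 48., 162.])
sd_len = np.array([11., 8., 2.0, 8., 14.])
mu_asp = np.array([93., 340., 7.5, 64., 227.])
sd_asp = np.array([15., 15., 1.0, 9., 21.])

# Equation (4): d' = |difference in means| / sqrt(average variance)
d_prime = np.abs(mu_len - mu_asp) / np.sqrt((sd_len**2 + sd_asp**2) / 2)

# Equation (5): omega_m = d'_m / sum_d d'_d
omega = d_prime / d_prime.sum()
print(omega.round(2))  # ~ [0.40 0.02 0.09 0.17 0.33], cf. Table 11.3
```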
11.3.3 Categorization and contrast precision
Equation (3) allows the listener to compute the probability of category membership, but
it does not determine how such information should be used to assign a category label.
The approach taken here is to assign utterances a category label with probability proportional
to their relative strength of group membership (Nearey and Hogan 1986).
For example, an utterance which has probability 0.9 of belonging to category c1 and
probability 0.1 of belonging to category c2 will be assigned label c1 90 per cent of the
time, and label c2 10 per cent of the time. However, the statistically optimal classifier,
the model which maximizes classification accuracy, assigns the category label with
the highest maximum a posteriori probability. To continue with the previous example,
an utterance which has probability 0.9 of belonging to category c1 and probability 0.1
of belonging to category c2 will always be assigned label c1 by the optimal classifier.
Although optimal classifiers make strong assumptions and their predictions are not
always in line with human classification behavior (Ashby and Maddox 1993), they
provide a lower bound on the error rate that can be obtained for a given classification
problem. In this work, contrast precision ε is defined as the current error rate of the
optimal classifier for that contrast, i.e.:

$\varepsilon = P(\hat{c}(\mathbf{x}) \neq c), \quad \text{where } \hat{c}(\mathbf{x}) = \arg\max_{k} p(c_k \mid \mathbf{x})$  (6)
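A sketch of both decision rules, and a Monte Carlo estimate of ε as the error rate of the MAP classifier (Equation (6)), for a single illustrative cue dimension:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

priors = np.array([0.5, 0.5])
mu, sd = np.array([35.0, 93.0]), np.array([11.0, 15.0])  # illustrative VOT

def posterior(x):
    joint = priors * norm.pdf(x, mu, sd)
    return joint / joint.sum()

def label_matching(x):              # probability-matching labeler
    return rng.choice(2, p=posterior(x))

def label_map(x):                   # optimal classifier: MAP label
    return int(np.argmax(posterior(x)))

# Equation (6): estimate epsilon as the MAP classifier's error rate.
true = rng.choice(2, size=5000, p=priors)
tokens = rng.normal(mu[true], sd[true])
preds = np.array([label_map(x) for x in tokens])
print("epsilon ~", (preds != true).mean())
```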

11.4 Modeling probabilistic enhancement


The previous section has provided an overview of how speech production and per-
ception can be modeled in a probabilistic mixture model framework, allowing for
the quantification of the notions of contrast precision and cue informativeness. This
section explores how the hypothesis of probabilistic enhancement can be tested using
computational simulation.
In section 11.2, enhancement was informally described as any action taken by the
speaker to increase contrast precision. In light of the previous discussion, we can now
begin to give a more precise definition: if contrast precision is defined in terms of
statistical reliability, enhancing a cue means effecting an increase in informativeness
along that cue dimension. If the probabilistic enhancement hypothesis is correct, then
the targeting of cues for enhancement should be to some extent predictable based on
their informativeness.
One way to explore the predictions of this hypothesis is through the use of com-
putational simulation. The framework described here is broadly exemplar-based, in
that it tracks the production and perception of individual utterances, but it differs
from previous models in several ways. In treatments such as Pierrehumbert (2001a)
or Wedel (2006), agents map speech tokens onto a granular similarity space based on
the token's similarity to a stored exemplar prototype; exemplars which fall between
the cracks of this space are then encoded as identical. Thus, a stored exemplar need
not correspond to a unique perceptual experience per se, but rather to an 'equivalence
class' of perceptual experiences. The present implementation differs slightly in that
exemplars are used to estimate the parameters of the cue distributions relevant for
some phonetic contrast. Instead of being mapped to prototypes, experienced tokens
are stored together with decay weights, which are used to determine when an exemplar
should be deleted from the list of tokens associated with a given category label. Once
the decay weight of a token falls below a user-defined threshold, it is deleted from the
list and is no longer referenced during parameter estimation. When simulating speech
production, values for each cue are simply sampled from each conditional density in
the usual fashion. In this way, the same exemplar list may be referenced in both the
production and perception of phonetic categories. A more detailed discussion of the
framework described below can be found in Kirby (2010).

11.4.1 Architecture
Simulations are run for a fixed number of iterations. Each agent is characterized by
a lexicon, a set of exemplar lists L1, ..., LK corresponding to their experience with
phonetic categories c1, ..., cK. Before the simulation begins, these lists are populated
by sampling from the conditional densities of a GMM representation of each category.
For simplicity here we consider agents with lexica containing just two categories.
Subsequently each iteration consists of a single interaction between two agents,
one acting as speaker and the other as listener (the framework can also be extended
to accommodate more than two agents). Each iteration contains four steps: production,
enhancement, bias, and categorization. All agents use the same production and
categorization strategies described in section 11.3. However, the strength of bias and
the degree of enhancement can be altered by manipulating two tuning parameters:

1. a vector λ = {λ1, ..., λD}, encoding the strength of the phonetic bias affecting
each cue dimension; and
2. a constant β ∈ [0, 1] representing the functional load or system-wide importance
of the contrast (Martinet 1952; Hockett 1955).

Each iteration then proceeds through the following steps (a schematic sketch of the loop follows the list):

1. Production. In the production phase, the talker agent selects a target category
ck based on the mixture weights πk, and samples a series of values x1, ..., xD
from the conditional densities Nd(x | k, Θ) to form a PRODUCTION TARGET
x = (x1, ..., xD).

2. Enhancement. Enhancement contains two sub-steps: first, determining if a cue
will be enhanced, and second, determining which cue is enhanced. The probability
that any particular dimension md will be enhanced is an exponential function
of the current contrast precision and the functional load constant β ∈ [0, 1]. The
likelihood of enhancement at any iteration is inversely proportional to the contrast
precision (ε) scaled by the importance of the contrast (β), i.e. P(enhance) = ε^β.
In the event that an utterance is selected for enhancement during a given
iteration, each cue has its distributionally-defined informativeness ωd chance
of being enhanced in that iteration (see section 11.3.2). Once a specific cue has
been targeted for enhancement, its production target value xd is modified by
sampling from a modified distribution with an exaggerated mean and a reduced
variance, thereby potentially increasing the statistical reliability of the dimension.
The degree to which the mean value is increased and variance reduced is
attenuated by the precision and functional load of the contrast as well as by the
informativeness of the cue dimension selected (see Kirby 2010 for details). The
end result is that more reliable cues are more likely to be produced with extreme
(hyperarticulated) values than less reliable cues, and cues will be enhanced
to a greater extent when error (ε) is high and β is low (i.e. functional load
is high).
3. Bias. Next, the (potentially enhanced) production target is modified along one
or more cue dimensions by adding the bias vector λ. In order to ensure that cue
values stay within a well-defined range, each bias term λd may be scaled relative
to the distance between category means before being applied, approaching
zero when the means become identical (i.e. when the dimension is no longer
informative in distinguishing the contrast).

4. Categorization. Finally, the modified production target x′ is presented to the
listener agent for classification, who assigns it a category label as described
in section 11.3.1. Once labeled, x′ is added to the appropriate exemplar list.
Both agents then recompute the memory decay weights for each exemplar in
their lexicon, and delete exemplars whose weights have fallen below the decay
threshold. In the next iteration, the role of speaker is assumed by the listener
agent and vice versa.
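The sketch below compresses one talker turn into a single function, under simplifying assumptions not in the original: two categories, independent cue dimensions summarized by their means and standard deviations, and fixed hyperarticulation factors standing in for the attenuated enhancement scheme of Kirby (2010); the listener's categorization and exemplar decay (step 4) are reduced to a comment.

```python
import numpy as np

rng = np.random.default_rng(1)

def iterate(lex, lam, beta, omega, epsilon):
    """One talker turn: production -> enhancement -> bias. Returns the
    modified production target x' for the listener to categorize."""
    # 1. Production: pick a category and sample a target from its densities.
    k = rng.choice(len(lex))
    mu, sd = lex[k]["mu"], lex[k]["sd"]
    x = rng.normal(mu, sd)

    # 2. Enhancement: with P(enhance) = epsilon ** beta, select cue d with
    #    probability omega_d and resample it with an exaggerated mean and
    #    reduced variance (fixed 10%/10% factors here, for illustration).
    if rng.random() < epsilon ** beta:
        d = rng.choice(len(x), p=omega)
        away = mu[d] - lex[1 - k]["mu"][d]   # direction away from rival mean
        x[d] = rng.normal(mu[d] + 0.1 * away, 0.9 * sd[d])

    # 3. Bias: shift the (possibly enhanced) target by the bias vector.
    # 4. Categorization, exemplar storage, and decay happen on the
    #    listener's side, as described in section 11.3.1.
    return x + lam

lex = [{"mu": np.array([35.0, 162.0]), "sd": np.array([11.0, 14.0])},
       {"mu": np.array([93.0, 227.0]), "sd": np.array([15.0, 21.0])}]
print(iterate(lex, lam=np.array([2.0, 0.0]), beta=0.5,
              omega=np.array([0.55, 0.45]), epsilon=0.1))
```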
In summary, the architecture provides two tuneable parameters (λ and β), corresponding
to phonetic bias and functional load, respectively. Varying these parameters
allows us to explore the effects of probabilistic enhancement in different scenarios, and
to see what parameter values best approximate observed data patterns. In the following
section, the probabilistic enhancement hypothesis is explored in this framework
using empirical data from the phonologization of f0 in Seoul Korean.

11.5 Transphonologization in Seoul Korean


Armed with the computational framework described above, it is now possible to test
the probabilistic enhancement hypothesis using empirical language data. Here, we
consider the case of the phonologization of f0 in Seoul Korean described in section
11.1. Apparent time studies suggest that while the distinction between lenis and
aspirated stops in the Seoul Korean of the 1960s was mainly cued by VOT, this
distinction is now cued chiefly by f0 at the onset of the following vowel and has been
accompanied by a loss of contrast along the VOT dimension (Silva 2006a, b; Kang
and Guion 2008). This is a classic instance of transphonologization, where reduction
of informativeness along one cue dimension is accompanied by enhancement of a
previously redundant dimension. The goal of these simulations was to determine if
these shifts in the distribution of cues could be replicated without making specific
reference to f0 as a target of enhancement.

The proposal advanced here holds that phonologization is driven by loss of contrast
precision, and there exists considerable evidence for a systemic production bias affecting
VOT in Seoul Korean (Silva 1992, 1993, 2006a). In particular, lenis /p t k/ and
aspirated /pʰ tʰ kʰ/ stops tend to be produced with similar VOT in initial position.
On Silva's analysis, fortis stops would not be subject to this same bias, since they
are phonologically geminate (2006a: 303). Since this proposed bias factor would not
have affected the production of fortis stops, the following discussion is limited to the
contrast between lenis and aspirated stops for expository clarity.
The simulations described here considered five cues which have been argued to
be relevant for the perception of the Korean stop contrast: voice onset time (VOT),
f0, duration of the following vowel (VLEN), the difference in amplitude between
the first two harmonics of the vowel (H1−H2), and the amplitude of the burst (BA).
Data on each of these cues reported in Cho et al. (2002), Kim et al. (2002), Silva
(2006a), and Kang and Guion (2008) were used to seed the initial exemplar lists of
two ideal observer agents with a simple lexicon consisting of just two syllables, lenis
/pa/ and aspirated /pʰa/. This state corresponds to the cue distributions reported for
Seoul Korean speakers in the 1960s. The initial parameters and their corresponding
informativeness values are shown in Table 11.3; two-dimensional scatterplots
showing the joint distributions of VOT and each of the cues are shown in the first
row of Figure 11.3. The second row of Figure 11.3 shows distributions based on the
parameters shown in the second half of Table 11.3, estimated on the basis of the speech
of younger speakers gathered in the 2000s. It is to these distributions that the state of
the agents will be compared at the end of each simulation run. In other words, we
want to see under what circumstances the agents' states will evolve from the top row
of Figure 11.3 to the bottom row.
Three series of simulations are reported, each seeded with the same initial configu-
ration. The first round of simulations considered the effects of applying probabilistic
enhancement in the absence of phonetic bias (section 11.5.1); the second considered
the effect of applying phonetic bias to the production of a single cue, but without
enhancement (section 11.5.2); and the third explored the effects of applying both
enhancement and bias (section 11.5.3).
The simulations reported here are representative runs of 25,000 iterations, at which
point the statistical reliability of the cue targeted by the bias factor and/or the probabil-
ity of enhancement approached zero. Goodness of fit between the target distributions

TABLE 11.3 Mean (s.d.) and informativeness ω of cues to Korean stops, estimated from data in Cho et al. (2002), Kim et al. (2002), Silva (2006a), and Kang and Guion (2008). VOT = voice onset time (in ms); VLEN = vowel length (in ms); H1−H2 = spectral tilt (in dB); BA = burst amplitude (in dB); f0 (in Hz)

            Category     VOT       VLEN       H1−H2     BA       f0
   1960s    lenis        35 (11)   337 (8)    6 (2)     48 (8)   162 (14)
            aspirated    93 (15)   340 (15)   7.5 (1)   64 (9)   227 (21)
            ω            0.40      0.03       0.09      0.16     0.32
   2000s    lenis        65 (11)   338 (10)   5.5 (1)   48 (8)   170 (10)
            aspirated    73 (15)   343 (12)   7.5 (1)   64 (9)   250 (11)
            ω            0.06      0.03       0.16      0.14     0.61
FIGURE 11.3 Row 1: distribution of five cues to the laryngeal contrast in Korean (gray = lenis /pa/, black = aspirated /pʰa/) used to seed the
simulations, based on the speech recorded in the 1960s. Row 2: distribution of the same cues based on the speech recorded in the 2000s. Data
estimated from Cho et al. (2002); Kim et al. (2002); Silva (2006a); Kang and Guion (2008). Captions give cue informativeness as computed by
Equation (5). VOT = voice onset time (in ms); VLEN = vowel length (in ms); H1-H2 = spectral tilt (in dB); BA = burst amplitude (in dB); f0 (in Hz)
and the results of the various simulations was quantified by the KULLBACK-LEIBLER
(KL) DIVERGENCE (Kullback and Leibler 1951) between each target and simulated
cue dimension. This is a non-symmetric measure of the dissimilarity between two
distributions; KL divergence equals zero when two distributions are identical and
grows with the dissimilarity between them.
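For cue dimensions modeled as univariate Gaussians, the KL divergence has a closed form. The following is a minimal sketch of the comparison (my own illustration, not the author's code; the chapter's reported values are presumably estimated from the agents' exemplar distributions, so exact numbers will differ):

```python
import numpy as np

def kl_gaussian_bits(mu_p, sd_p, mu_q, sd_q):
    """D(P || Q) for univariate Gaussians P and Q, in bits.

    Zero when the two distributions are identical, growing with their
    dissimilarity; note that it is not symmetric in P and Q.
    """
    nats = (np.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)
    return nats / np.log(2)

# e.g. compare a simulated lenis VOT distribution against a target one
print(kl_gaussian_bits(66, 12, 65, 11))
```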

11.5.1 Enhancement without bias


As can be seen in the top rows of Table 11.3 and Figure 11.3, it would appear that a
contrast along the f0 dimension already existed in the Seoul Korean of the 1960s, albeit
covertly. One interpretation of the phonologization model is that active enhancement
of cues on the part of speakers itself conditions the transition of a cue from a covert
to overt indicator of contrast. This interpretation may be tested by considering the
application of probabilistic enhancement in the absence of any external bias. In this
set of simulations, the constant κ was set to zero, meaning that some cue was always
enhanced at each timestep. Each element in the bias vector λ was also set to zero, meaning
that no phonetic bias factors were applied.
The first row of Figure 11.4 shows the results of a representative simulation run
using these parameter settings. In each case, the most informative cue at initialization
(here, VOT) maintained its relative dominance throughout the simulation. The overall
degree of enhancement was extremely small, reflecting the fact that the precision of
the contrast is never in jeopardy, although as shown in Figure 11.5, the error rate
does fluctuate somewhat over time. In short, these parameter settings result in few
or no changes to the cue structure of the categories over time, demonstrating that
probabilistic enhancement alone is insufficient to induce phonologization of a pho-
netic dimension along which categories may be only weakly separated. Furthermore,
it shows that enhancement along one cue dimension does not in and of itself entail
loss of contrast along another. This suggests that some other mechanism is necessary
to drive the process of phonologization.

11.5.2 Bias without enhancement


The second set of simulations considered the inverse of the above interpretation. If
two categories are redundantly (if perhaps weakly) distinguished along some cue
dimension, it is possible that this cue will become more informative simply as a result
of continuous application of systemic bias to a highly informative cue. To test this
hypothesis, simulations were run in which the VOT element of the bias vector λ
was computed dynamically as |log(μ_c1/μ_c2)|, a range of 0 to about 4 ms. This
had the effect that VOT values for category c2 words (/pʰa/) were produced with
slightly shorter VOTs at each timestep, while values for category c1 words (/pa/) were
produced with slightly longer VOTs. No cues were enhanced in these simulations, i.e.
P(enhance) was set to zero.
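As a concrete illustration, a minimal sketch of this bias scheme follows (my own reconstruction, not the author's code; the function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def vot_bias(mu_c1, mu_c2):
    """Dynamic VOT bias |log(mu_c1 / mu_c2)|: largest while the two
    category means are far apart, shrinking to zero as they merge."""
    return abs(np.log(mu_c1 / mu_c2))

def produce_vot(mu, sd, bias, category):
    """Sample one VOT token: lenis (c1) tokens are shifted slightly
    longer and aspirated (c2) tokens slightly shorter by the bias."""
    shift = bias if category == "c1" else -bias
    return rng.normal(mu + shift, sd)

b = vot_bias(35.0, 93.0)              # initial means from Table 11.3
token = produce_vot(35.0, 11.0, b, "c1")
```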
FIGURE 11.4 Cue distributions (gray = lenis /pa/, black = aspirated /pʰa/) after 25,000 iter-
ations. Row 1: enhancement without bias. Row 2: bias without enhancement. Row 3: bias and
enhancement. Row 4: empirical targets. Captions give cue informativeness as computed by Eq.
(5). VOT = voice onset time (in ms); VLEN = vowel length (in ms); H1-H2 = spectral tilt (in
dB); BA = burst amplitude (in dB)

The results of a representative simulation run are shown in the second row of
Figure 11.4. As evidenced both by the scatterplots as well as the ω values, VOT has
ceased to be informative in distinguishing this contrast; to the extent that a contrast
between the two categories still exists, it is supported chiefly by a difference in f0 (row
2, panel 4). This differs slightly from the attested modern Korean situation (row 4) in
FIGURE 11.5 Comparison of contrast precision as measured by classification error rate at each
simulation timestep for simulations reported in sections 11.5.1-11.5.3

that the actual parameters characterizing the distributions of f0 have not changed for
either category: f0 has become the most informative cue simply because all other cues
have become less informative. However, the empirical Korean data indicate that the f0
means for aspirated and lenis obstruents have shifted slightly away from one another,
suggesting that they have been enhanced both in terms of a shift in means as well as
a reduction in variance (compare rows 1 and 2 of Figure 11.3).
As shown in panel 2 of Figure 11.5, in the absence of any kind of enhancement,
the precision of the contrast degrades steadily over time as bias is applied. These
simulation results indicate that while a redundant or covert contrast may become
exposed by a systemic production bias, at least in the present case, bias alone cannot
account for the shifts in cue distributions that are empirically observed.

11.5.3 Bias and enhancement


The third and final series of simulations considered the effect of applying VOT bias
while allowing for probabilistic enhancement of cues. Here, the constant κ was arbi-
trarily fixed at 0.5, and the same dynamic VOT bias described in section 11.5.2 was
applied. Thus, while bias was applied at each iteration, the likelihood of enhancement
covaried with contrast precision.
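The exact function linking contrast precision to the probability of enhancement is defined earlier in the chapter and is not reproduced here; the sketch below therefore uses an assumed form that matches the behaviour described in this section and in section 11.5.1 (κ = 0 forces enhancement at every timestep; otherwise the probability grows with the classification error rate), with the enhanced cue sampled in proportion to its informativeness ω, as the chapter proposes. All names and the formula itself are my assumptions, not the chapter's definitions:

```python
import numpy as np

rng = np.random.default_rng(1)

def maybe_enhance(omega, error_rate, kappa=0.5):
    """Return the index of the cue to enhance, or None.

    Assumed form: with kappa = 0, some cue is always enhanced;
    otherwise enhancement becomes more likely as the classification
    error rate rises. The chosen cue is sampled with probability
    proportional to its informativeness omega.
    """
    p = 1.0 if kappa == 0 else min(1.0, error_rate / kappa)
    if rng.random() < p:
        w = np.asarray(omega, dtype=float)
        return rng.choice(len(w), p=w / w.sum())
    return None

# e.g. the 1960s informativeness values from Table 11.3
cue = maybe_enhance([0.4, 0.03, 0.09, 0.16, 0.32], error_rate=0.05)
```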
A representative agent state after 25,000 iterations is shown in the third row of
Figure 11.4. Of the three types of simulations run, these results most closely resemble
the empirical data, as evidenced by the small KL divergences shown in Table 11.4 and
the high ω value for f0 (compare rows 3 and 4 of Figure 11.4). While both spectral tilt
and burst amplitude are somewhat more informative relative to their initial values,
f0 is the most informative cue to the contrast. Crucially, the phonologization of f0
was an adaptive, probabilistic response to the continued application of a bias in the
production of VOT, resulting in an increasing loss of informativeness along that
TABLE 11.4 Comparison of mean (s.d.), cue informativeness, and KL
divergence (in bits) for three simulation scenarios. VOT = voice onset time
(in ms); VLEN = vowel length (in ms); BA = burst amplitude (in dB);
H1-H2 = spectral tilt (in dB); f0 (in Hz)

Source         Category    VOT      VLEN      H1-H2      BA        f0
enhancement    lenis       36 (10)  336 (8)   5.6 (2.4)  48 (7.4)  159 (15)
only           aspirated   92 (13)  342 (10)  7.6 (0.9)  62 (8.7)  225 (20)
               ω           0.4      0.06      0.1        0.14      0.31
               KL          0.2      0.002     0.27       0.05      0.01
bias only      lenis       65 (11)  340 (8)   6.3 (1.8)  48 (7)    162 (12)
               aspirated   65 (16)  340 (9)   7.7 (0.9)  64 (8)    227 (20)
               ω           0        0         0.13       0.29      0.57
               KL          0.09     0.002     0.16       0.05      0.01
bias +         lenis       66 (12)  338 (7)   4.7 (2.5)  49 (7.6)  152 (12)
enhancement    aspirated   67 (19)  341 (10)  7.3 (0.9)  65 (9.6)  248 (17)
               ω           0        0.04      0.16       0.19      0.62
               KL          0.09     0.002     0.09       0.06      0.008
target         lenis       65 (11)  338 (10)  5.5 (1)    48 (8)    170 (10)
(cf. initial)  aspirated   73 (15)  343 (12)  7.5 (1)    64 (9)    250 (11)
               ω           0.06     0.03      0.16       0.14      0.61
               KL          0.16     0.002     0.12       0.06      0.008

dimension. At no point was f0, or any other cue, specifically targeted for enhancement.
As seen in panel 3 of Figure 11.5, while the error rate increased slightly in the early
iterations of this simulation, it was quickly reduced by the countervailing force of
probabilistic enhancement.

11.6 General discussion


The simulation results presented above demonstrate how phonologization may be
predicted in a model where probabilistic enhancement is an adaptive response to
a loss of contrast precision. This is not to say that phonologization must always be
driven exclusively by loss of contrast precision, or that loss of precision will invariably
result in phonologization; to be sure, there are cases in which bias leads to contrast
neutralization (Kirby 2011). Nevertheless, these results indicate that at least some
cases of phonologization may be the result of enhancement in response to a systemic
production bias, and that both the presence of a redundant or covert contrast and the
reduction of primary cues need to be present simultaneously in order for phonolo-
gization to take place.
As measured by KL divergence, the distributions resulting from the application
of both enhancement and bias were most similar to the target Korean distributions,
compared with those resulting from the application of only enhancement or only bias.
While the KL divergences reported in Table 11.4 are generally quite small, it is worth
noting that the KL divergences between the initial and final (target) distributions are
quite small as well. The KL divergences for various dimensions should thus not be
interpreted in an absolute sense, but instead relative to other values for the same cue
dimension.
It is important to note that it is not simply the presence of both bias and probabilistic
enhancement that allows for accurate modeling of phonologization: together, they also allow
us to understand how different parameter settings can give rise to different outcomes
for simulations of differing lengths. This is precisely the strength of the present
account, which provides a framework in which to map out under what circumstances
phonologization is more or less likely, given an empirical characterization of language-
specific biases and cue distributions. This model goes beyond the observation that a
system biased against one cue will choose another by arguing that precisely which cue
takes over can be predicted with some accuracy. In this formulation, the speaker plays
an important role in sound change, enhancing phonetic cues in a fashion optimally
suited to accommodate the communicative needs of listeners. In other words, the
present model provides a principled explanation for why f0, and not H1-H2 or burst
amplitude, was the cue which transphonologized in Seoul Korean.
However, depending on the distributional patterns and bias factors involved, the
outcome could well be different for another contrast or another language. The results
obtained in section 11.5 are dependent on the initial state of the agents when the
simulation begins, and similar results may not necessarily obtain for other initial
states. In particular, if all cues are equally balanced in terms of their informativeness
at the start of a simulation, then all will maintain their relative informativeness on this
scheme if a constant bias is applied. Similarly, a strong bias (or low κ) can overwhelm
the probabilistic enhancement strategy, leading to neutralization even in cases where
both bias and enhancement are applied.
The present model makes two assumptions which deserve further mention. The
first is that all cues are conditionally independent in perception. While the structure of
the acoustic cues available to the listener may be consistent with a linear model (Clayards
2008), this does not necessarily mean that they are treated as such by listeners, as
other factors such as task and saliency may play a role in determining how these
dimensions are ultimately weighted (Holt and Lotto 2006; Toscano and McMurray
2010). To a certain extent, this assumption is orthogonal to the issues discussed in the
present chapter, as probabilistic enhancement could just as easily be applied regardless
of whether cue perception is represented by a linear or a multivariate model. However,
the range of potential outcomes in a model which does not make this assumption has
yet to be fully explored.
The second assumption is that any acoustic-phonetic dimension serving as a per-
ceptual cue is amenable to enhancement in speech production. This is a somewhat
stronger version of the phonetic knowledge hypothesis than that originally proposed
by Kingston and Diehl (1994), who argued that cues are enhanced based on the degree
to which they contribute to the perception of an INTEGRATED PERCEPTUAL PROPERTY
(IPP) which reinforces an existing phonological contrast. In the case of a voicing
contrast for initial stops, for example, Kingston and Diehl would predict that cues
with similar auditory properties, such as F1 and f0, would integrate, while cues such as
closure duration and f0 would not, because they do not both contribute to the amount
of low-frequency energy present near a stop consonant (Kingston 2008). If cues are
enhanced based on the degree to which they contribute to IPPs, this predicts that
certain cues might not be enhanced regardless of their distributional informativeness
in signaling a contrast. In contrast, the probabilistic enhancement model predicts that cues
will be targeted based on informativeness and contrast precision, regardless of their rela-
tionship to IPPs. The different predictions made by these two theories await further
experimental investigation.

11.7 Conclusion

This chapter has argued for the role of probabilistic enhancement in phonologization
through computational simulation of an ongoing sound change in Seoul Korean. Two
challenges faced by a phonologization model of sound change were addressed: deter-
mining how cues are selected, and explaining why phonologization is often accompa-
nied by dephonologization. It was proposed that cues are targeted for enhancement
as a probabilistic function of their informativeness, so a cue which may be targeted
for enhancement in one language may be ignored in another. Simulation results using
empirically derived cue values were presented, providing strong support for the idea
that loss of contrast precision may drive the phonologization process. Depending on
the distribution of cues, the interaction of phonetic bias and probabilistic enhance-
ment can set the stage for a reorganization of the system of phonological contrasts.
12

Modeling the emergence of vowel harmony through iterated learning

FRÉDÉRIC MAILHOT

12.1 Explanation in phonology


In explaining the existence of typologically frequent synchronic sound patterns,
generative phonologists typically suggest that humans come to the task of language
acquisition equipped with a rich base of innate, domain-specific knowledge (Chom-
sky and Halle 1968; Prince and Smolensky 2004). That is, the preponderance of
common patterns is a consequence of humans' common, genetically endowed, initial
state. This rich initial state in turn leads to the development of (representationally
and/or procedurally) rich synchronic grammars.
However, several phonologists and phoneticians (Ohala 1992; Hale and Reiss 2000;
Hayes et al. 2004, inter alios) have noted that many recurrent patterns can be given
explanations grounded in phonetic factors (cf. Moreton 2008a for an overview of the
debate). Researchers who adopt this functional approach divide according to whether
they take the functional pressures to operate synchronically or diachronically, that is,
whether speakers actively adapt their outputs to maximize articulatory ease (Kirchner
1998) and minimize risk of listener error (Lindblom 1990), or whether these biases
are more latent and ateleological, manifesting only through the multiplicative effect of
successive iterations of slightly biased transmission and acquisition (Blevins 2004).1
In this chapter, I focus on a particular instance of diachronic explanation of a
synchronic pattern: the emergence of vowel harmony. I will show that lexical har-
mony can emerge diachronically from interactions between synchronic coarticulation
and a biased transmission-acquisition feedback loop. Depending on the amounts of
coarticulation and channel noise, the amount of lexical harmony is seen to stabilize
1 In fact, this kind of historical phonological explanation has a long history, dating back at least to the
work of Baudouin de Courtenay (1895 [1972a]), who explicitly suggested that misperception of form due
to persistent physical biases in production and perception could result in sound change and the emergence
of regular synchronic patterns.
at intermediate levels between a baseline amount of harmony (given a uniform dis-
tribution over features) and full harmony. The chapter is organised as follows: in
the following section, I give an overview of the relevant linguistic background; in
section 12.3 I give a brief overview of language change modeling and some related
work; in section 12.4 I present my model in detail, and subsection 12.4.2 discusses
the simulations and results. I conclude with some discussion and remaining issues
and questions in section 12.5 and section 12.6.

12.2 Background
12.2.1 Vowel harmony
Across a wide variety of languages and in virtually all language families, one finds
vowel co-occurrence restrictions operating over particular phonological domains.
These constraints on which vowels may appear together in a word are typically con-
sidered a unitary phenomenon and called vowel harmony. The vowels in a language
with vowel harmony can be classified into disjoint sets2 such that vowels from only
one of the sets are found within the relevant domain, typically a phonological word.
A standard example from the literature involves the front/back distinction in Finnish
vowels (van der Hulst and van de Weijer 1995):

TABLE 12.1 Finnish backness harmony

surface form    gloss
tyhmä           'stupid'
tyhmästä        'stupid' (elative)
tuhma           'naughty'
tuhmasta        'naughty' (elative)

The point to note in Table 12.1 is that the root vowels are either all front {y, ä} or
all back {u, a}, and that elative case has two exponents, [-stä] and [-sta], whose vowel
backness depends on whether the stem has front or back vowels.

12.2.2 Phonologization
Phonologization is a term used to describe the diachronic process whereby linguistic
variation that is under physical/physiological (i.e. 'phonetic') control comes to be
under cognitive (i.e. 'phonological') control. The term was introduced by Jakobson
and was most recently reintroduced by Hyman (1972).

2 I am glossing over the issue of neutral vowels, as they are not addressed by the simulations reported
here. See section 12.6 for some discussion of ongoing work dealing with neutrality.
For the purposes of this chapter, I will take phonologization to simply mean that
some detectable variation that is not due to any properties of the target phonological
grammar (i.e. the grammar that produces the data that the acquirer learns from),
becomes encoded in the acquirer's phonology.

12.2.3 Coarticulation
Coarticulation is the label given to the predictable effects that segments have on their
neighbours in running speech. Coarticulation may affect adjacent consonants, as
when an English speaker says [limbejkən] for lean bacon, anticipating the bilabial
closure (Kingston 2007), or between vowels and consonants, as when an English
speaker produces a nasalised vowel before a nasal consonant, as in pit ~ /pit/ ~ [pit]
versus pin ~ /pin/ ~ [pĩn]. Finally, it has been known since the work of Öhman (1966)
that vowels may coarticulate with other vowels across intervening consonants.
This vowel-to-vowel (V-to-V, henceforth) coarticulation underlies one of the best-
known explanations for the existence and typological distribution of vowel harmony.
Ohala (1994b) proposes that vowel harmony is a result of the phonologization of this
V-to-V coarticulation. In particular, he argues that harmony results when listeners
are unable to 'parse out' or compensate for the acoustic effects of distal segments (viz.
neighbouring vowels) and misattribute contextual variation to the proximal segment.

12.2.4 Diagnosing vowel harmony


Within the context of an account of the emergence of harmony from phonetic factors,
how do we decide whether a language has vowel harmony? Three criteria have been
acknowledged in the literature.

Lexical statistics The proportion of harmonic roots in a language's lexicon may
deviate significantly from the expected amount (as measured by some statistical
or information-theoretic criterion) given its inventory of vowels (Goldsmith and
Riggle to appear; Denis 2010), or may increase or decrease measurably on historical
timescales.
Loanword adaptation When disharmonic loanwords are borrowed into a language,
do they become harmonized over time or otherwise behave as expected in vowel
harmony (Zimmer 1985; Kornai 1990; Kertész 2003)? For example, Turkish has
clearly identifiable sets of French and Arabic borrowings, from distinct historical
periods. Both sets trigger suffix alternations, but only the historically much older
Arabic borrowings have become harmonised within roots as well.
Synchronic alternations Are there productive, general surface alternations, such
as the alternations in the Finnish case suffixes shown above? For many, alterna-
tions like these are the only true diagnostic of active harmony within a language.
Mahanta, for example, argues that '[t]he only criteri[on] that may play a role is the
presence of two alternating sets of vowels in the inventory. When one set induces
the other to change, vowel harmony exists in that language' (Mahanta 2007: 14,
emphasis my own).3
As stated in section 12.1, the work presented here focuses on the diachronic emer-
gence of lexical harmony. Although this may seem surprising in light of the preceding
discussion of the perceived importance of synchronic alternations, I believe the work
described here is nonetheless a valuable first step in getting a computational handle
on diachronic explanations. Moreover, there is at least some evidence that lexical har-
mony in the absence of alternations may be used by the phonological system (Denis
2010). In the closing sections I discuss ongoing work addressing the acquisition and
emergence of productive alternations in vowel harmony.

12.2.5 The origins of phonological assimilation


Ohala (1993a) provided a commonly accepted answer to the question of how assim-
ilatory phonological phenomena originate; gradient patterns of coarticulation are
misperceived and/or misparsed by acquirers, and over historical timescales become
incorporated into learners' lexicons and grammars (e.g. as categorical patterns of
phonological assimilation). In some cases, the conditioning context is independently
lost, generating a phonemicized (Hyman 1976) contrast, as in the development of
French nasal vowels. For the particular case of vowel harmony, Ohala (1994a) pro-
posed that the exact same scenario plays out in the domain of V-to-V coarticulation.
One of the aims of the work discussed in this chapter is to address the means
available for independently verifying this type of diachronic claim. In order to have a
viable Ohalian explanation of this type, at minimum the following need to be given
(Ohala 1981, 1989):
1. a demonstration of synchronic variation in production
2. a demonstration that this variation is detectable by listeners
3. a relatively worked out model of synchronic linguistic knowledge
4. a relatively worked out model of language acquisition, and finally
5. a demonstration that the previous items can bring about the phenomenon under
consideration, given sufficient time
If the claims in Ohala (1981) and Ohala (1989) are correct about the role of the listener
in sound change, and about sound change being a product of synchronic variation,
then items 1 and 2 acknowledge that phonologization is essentially a form of
Neogrammarian sound change, and items 3 and 4 are simply requirements on the
explicitness of auxiliary assumptions. These are relatively uncontroversial, and are the
bread and butter of experimental and theoretical phonologists. On the other hand,
3 Additionally, an anonymous reviewer notes in regards to the current focus on lexical harmony: 'When
people think of "vowel harmony" they usually mean alternations [...], not static lexical patterns.'
item 5 leads to difficulties. There is no obvious way to verify or test the diachronic
dimension which is crucial to this kind of explanation. To be sure, one can make
and record some predictions and trust that their confirmation or refutation will
be followed up on by future generations, but this is a rather unrewarding way of
doing research. Moreover, it is almost impossible that specific, falsifiable predictions
would ever pan out, given the sheer amount of uncontrollable factors, e.g. patterns
of connectivity and communication in social networks, language contact situations,
etc.4 Of course, rather than making predictions about specific occurrences of change,
diachronic explanations make typological predictions and retrodictions that are in
principle open to verification. In other words, if a particular change is predicted to be
likely or frequent, one assumes that its outcome will be typologically well-represented.
Of course, typological data are as subject to noise and extraneous factors as any others,
and in fact are probably more subject to arbitrary types of noise that are difficult to take
into account (e.g. how funding gets distributed, and which languages are considered
'interesting' or worthy of study, which language groups are accessible, etc.).
An implicit claim of this chapter is that computational modeling is a viable, useful,
and perhaps soon necessary tool to have in a diachronic linguist's arsenal. Modeling
gives the researcher a 'virtual lab' in which to test explanations, with tight control
over parameters of interest, as well as perfect repeatability. In addition, computational
models generate quantitative data, which at least in principle allows for the possibility
of theory comparison and choice. Finally, implementation of a particular diachronic
explanation or model forces a rarely seen degree of explicitness and precision with
respect to the necessary auxiliary assumptions and parameters.

12.3 Modeling language change


Modeling strategies can be broadly classified as being either analytic or synthetic. The
former are models based on closed-form mathematical equations, such as systems of
differential equations, which are solved numerically by computer, and are typically
focused on population-level properties, e.g. the proportion of a speech community
adopting a particular variant of some linguistic form. A good example of this is the
work by Komarova and Nowak (2003), examining the emergence of coherence in a
group of language users. Synthetic models, conversely, are primarily concerned with
modeling individuals, and population-level properties, to the extent that they are of
interest, emerge from the local interactions.
In a synthetic approach to language modeling, often called agent-based or multi-
agent models, agents with individual-level properties are explicitly modeled. Agents
interact with the world by means of sensors and effectors, have some internal state
that carries persistent information (about the world, the agent, or both), and generally
4 But see Niyogi (2006) and references discussed in Hruschka et al. (2009) for recent attempts to address
some of these issues.
possess some kind of learning algorithm, which can be viewed as a function mapping
the internal state to itself (Russell and Norvig 1995).
Synthetic models can be further subclassified according to constraints on the flow
of information between agents. In a horizontal model, any pair of agents can interact
and all agents can update their internal state. An example of this type of model in a
language-based context is in de Boer (2001). In a model with vertical information flow,
there are restrictions on which pairs of agents may communicate and which agents
may change their internal state.
Kirby (1999) introduced and popularised linguistic agent-based models with ver-
tical information flow as iterated learning models.5 In an iterated learning model, the
population of agents is partitioned into two disjoint classes, one with fixed internal
state (modeling 'adults') and the other with modifiable internal state (modeling 'chil-
dren' or 'learners'). Agents may only communicate across classes, and most typically,
children are listeners and adults speakers. The adult grammars serve as approximate
targets to which the child grammars are meant to converge. Upon convergence, or
after some predetermined amount of time, the adults are replaced by the children,
whose internal states become fixed, a new generation of children is introduced, and
the process is repeated. This feedback loop, iterated over several generations, is meant
to explicitly capture the interaction between I- and E-language (Chomsky 1986) in
language transmission and acquisition. There are two potential drivers of change in
these models: noisy data transmission and the information bottleneck that obtains
when learners are exposed to only a subset of the data.
The model presented here is in a sense the simplest possible iterated learning model,
with one adult and one child per generation. Notwithstanding this simplicity, the
model shows how noisy language transmission coupled with a form of probabilistic
learning can change a gradient pattern of V-to-V co articulation into a pattern of lexical
vowel harmony.

12.3.1 Related work


With few exceptions (cf. Klein et al. 1969), computational and mathematical modeling
of language change is a relatively recent development in linguistics, beginning approx-
imately a decade ago, and entering the mainstream only in the last few years (Niyogi
2006). Previous work has mostly dealt with syntactic (Niyogi and Berwick 1998)
or morphological (Hare and Elman 1995) change, with little work on phonological
change until quite recently; I review some of the contributions to this domain here.
Wedel (2004b, 2006, 2007) uses exemplar-based models of synchronic knowledge
coupled with an explicitly Darwinian evolutionary model to study the emergence
of categorical patterns and regularity in phonology. Rather than studying specific
patterns that may be of interest to phonologists, Wedel examines how selective
5 Hare and Elman (1995) is a clear precursor to these models.
pressures derived from the production-perception feedback loop, coupled with the
dynamics of lexically-biased exemplar models, lead to (i) the appearance of cate-
goricity from initially gradient phenomena, (ii) general patterns of contrast main-
tenance, and (iii) something akin to the strict constraint domination in Optimality
Theory.
Dras and Harrison (2002) create multi-agent simulations of the emergence of
backness harmony in the Turkish lexicon, with a particular focus on modeling the
'S-shaped' trajectory that has been claimed to characterize historical language change
(Kroch 1989). They model a population of interacting Turkic speakers (i.e. horizon-
tal information flow) initiated with 50 per cent harmonic 1,000-word lexicons. At
each interaction, an agent can choose with some fixed probability to harmonize or
disharmonize a word that is transmitted to it. The single parameter which controls an
agent's decision to (dis)harmonize a word conflates several properties, some of which
I am interested in keeping apart (coarticulation, lexicon structure, ...). Additionally,
children in this model directly inherit a subset of their parents' lexicon, eliminating
the particular interaction that is key in the account developed here. Although these
simulations are clearly related to the present work with respect to content (vowel
harmony), the choices that the authors make in designing their models prevent them
from addressing the issues with which I am concerned here.
Choudhury (2007) is concerned with creating computational models of real-world
phonological change, specifically changes in Bengali verbal inflections, and the devel-
opment of a schwa-deletion rule in Hindi. One of these models is a multi-agent simu-
lation of the development of a schwa-deletion rule in Hindi. Choudhury's agents have
a stochastic bias toward schwa-reduction, and interact by means of an 'imitation game'
(de Boer 2000), in which there is explicit feedback about communicative success
or failure. This is plainly an unrealistic (albeit not uncommon) model of linguistic
interaction, especially between parent and child. Perhaps most troubling, however,
is that the built-in stochastic tendency for context-free schwa reduction seems to
build the looked-for behaviour right into the model. Given a steady stochastic bias
towards the shortening of schwa, it seems inevitable that schwa-deletion should be
the outcome (cf. section 12.5 on the 'actuation problem'). In sum, Choudhury's model
provides a good example of synthetic modeling, but is still not addressing the issues
that I hope to explore.

12.4 Modeling the emergence of harmony


12.4.1 The agent
The architecture of the linguistic agents in my models6 is derived from the generic
agent architecture outlined in Russell and Norvig (1995). Agents have comprehension

6 The models were programmed in Python, making heavy use of the numerical and scientific packages
NumPy and SciPy (Oliphant et al. 2001). Source code may be obtained from the author.
FIGURE 12.1 Architecture of a linguistic agent (adapted from Russell and Norvig 1995)

and production modules in lieu of sensors and effectors, and an internal knowledge
state, which essentially models a lexicon (cf. Figure 12.1).
The chief building blocks of the lexicon are two binary phonological features which
model the standard phonological features [HIGH] and [BACK].7 Lexical items are
sequences of four vowels,8 and there is no morphophonology. This is clearly a highly
impoverished 'grammar', and yet it will be shown to suffice for the induction of lexical
harmony, given the learning algorithm discussed below. To model more sophisticated
aspects of harmony (e.g. alternations, neutrality) will require additional (or different)
entities and operations, cf. the discussion in section 12.5.
In producing outputs, discrete phonological features are transduced to continu-
ous articulatory parameters [HIGH], [BACK], and [ROUND] on the real interval [0, 1].
These articulatory specifications are Beta distributed (see the appendix for details
concerning the parameters) over the front/back space modeling individual-level
hypo/hyperarticulation (Lindblom 1990). The articulatory parameters are in turn fed
to the following equations from de Boer (2001) to synthesize F1 and F2 formant
values.9

F1 = ((-392 + 392r)h² + (596 - 668r)h + (-146 + 166r))p²
     + ((348 - 348r)h² + (-494 + 606r)h + (141 - 175r))p
     + ((340 - 72r)h² + (-796 + 108r)h + (708 - 38r))

F2 = ((-1200 + 1208r)h² + (1320 - 1328r)h + (118 - 158r))p²
     + ((1864 - 1488r)h² + (-2644 + 1510r)h + (-561 + 221r))p
     + ((-670 + 490r)h² + (1355 - 697r)h + (1517 - 117r))

where p, h, and r are the backness ([BACK]), height ([HIGH]), and rounding ([ROUND])
parameters, respectively.
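Since the chapter's models were written in Python with NumPy and SciPy (footnote 6), a direct transcription of these synthesis equations is straightforward. The sketch below is mine, not the released source; the assumed orientation (p = 0 front, p = 1 back; h = 0 low, h = 1 high) follows de Boer (2001):

```python
def synthesize(p, h, r):
    """F1/F2 formant synthesis from de Boer (2001).

    p, h, r are the backness, height, and rounding parameters on
    [0, 1] (r was fixed at zero in these simulations, footnote 9).
    Returns (F1, F2) in Hz.
    """
    f1 = (((-392 + 392*r)*h**2 + (596 - 668*r)*h + (-146 + 166*r)) * p**2
          + ((348 - 348*r)*h**2 + (-494 + 606*r)*h + (141 - 175*r)) * p
          + ((340 - 72*r)*h**2 + (-796 + 108*r)*h + (708 - 38*r)))
    f2 = (((-1200 + 1208*r)*h**2 + (1320 - 1328*r)*h + (118 - 158*r)) * p**2
          + ((1864 - 1488*r)*h**2 + (-2644 + 1510*r)*h + (-561 + 221*r)) * p
          + ((-670 + 490*r)*h**2 + (1355 - 697*r)*h + (1517 - 117*r)))
    return f1, f2

print(synthesize(0.0, 1.0, 0.0))   # a high front unrounded vowel, roughly [i]
```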

7 Whether these features are learned or innate is orthogonal to the discussion here, although I find the
arguments by Mielke (2008) persuasive. I assume their availability here for convenience.
8 I abstract away from consonants, since the focus here is on vowel-to-vowel coarticulation and
harmony.
9 [ROUND] was unused and consistently set to zero.
12. Modeling the emergence of vowel harmony through iterated learning 255

The parameter of interest in these models is front-back coarticulation, modeled
here by parametric variation in F2, which is a key acoustic correlate of front-back
articulatory variation. Both anticipatory and perseverative coarticulation were mod-
eled by adding or subtracting a user-specified value from F2 according to whether the
preceding or following vowel had an opposite backness specification. Additionally,
a small amount of Gaussian noise (μ = 0, σ = 30 Hz) was added to the acoustic
outputs to model general noise such as articulatory fatigue, ambient acoustic inter-
ference, etc.10 This additional noise turned out to be of significance in modeling the
diachronic development of harmony (see subsection 12.4.2).
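A minimal sketch of this production perturbation follows (my reconstruction; the parameter names, the sign convention [BACK] = 0 for front, and the per-vowel treatment are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def perturb_f2(f2, back, coart_a, coart_p, noise_sd=30.0):
    """Apply front-back coarticulation and Gaussian noise to a word's
    F2 track (one value per vowel).

    A vowel's F2 is pulled toward a neighbouring vowel with the
    opposite [BACK] value: raised (fronted) next to a front vowel,
    lowered (backed) next to a back one.
    """
    f2 = np.array(f2, dtype=float)
    for i in range(len(f2)):
        if i + 1 < len(f2) and back[i + 1] != back[i]:   # anticipatory
            f2[i] += coart_a if back[i + 1] == 0 else -coart_a
        if i > 0 and back[i - 1] != back[i]:             # perseverative
            f2[i] += coart_p if back[i - 1] == 0 else -coart_p
    return f2 + rng.normal(0.0, noise_sd, size=len(f2))
```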
The comprehension and learning modules are folded together in another multi-
step system. The learner's inputs are sequences of four (F1, F2) pairs, i.e. the adult's
outputs. As an initial step, the learner uses k-means clustering (MacKay 2002: 285)
to find acoustic prototypes in the data.11 Given the found acoustic prototypes (cluster
centres), the learner inverts the articulation-acoustics mapping to recover the artic-
ulatory parameters responsible for the data.12 From the articulatory descriptions, the
learner uses Maximum A Posteriori learning to infer the underlying representations
of each prototype vowel:

v̂ = argmax_v P(V = v | D = d) = argmax_v P(D = d | V = v) P(V = v) / z

Here, v represents the learner's hypothesis about the underlying structure (i.e. feature
description) of the vowel under consideration, P(D = d | V = v) is the likelihood of
the observed acoustic form, given the learner's hypothesis, P(V = v) is the prior prob-
ability of that hypothesis being correct, and z is a normalizing constant to ensure that
the calculation generates well-behaved probabilities. Since I only investigated uniform
priors (i.e. each underlying representation is equally probable, a priori), this algorithm
reduces to Maximum Likelihood learning, whereby the underlying representation
that gives highest likelihood to the observed acoustic form is the one chosen.
Given the articulatory specifications for the vowel cluster centres, the learner then
assigns underlying representations to entire lexical entries by means of a simple vector
quantization algorithm; each vowel in a word is assigned the underlying representa-
tion of the acoustic prototype nearest to it.
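The clustering and categorization steps are easy to sketch with SciPy (the articulatory inversion and feature-inference steps are elided; this is my own illustration, not the released source, and cluster indices stand in for the inferred underlying representations):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def learn(adult_outputs, k=4):
    """Learner sketch: find k acoustic prototypes over all vowel
    tokens, then assign each vowel in each word to its nearest
    prototype (the vector-quantization step).

    adult_outputs: list of words, each an array of four (F1, F2) rows.
    """
    tokens = np.vstack(adult_outputs)
    centres, _ = kmeans2(tokens, k, minit='++', seed=0)
    lexicon = [tuple(vq(np.asarray(w, dtype=float), centres)[0])
               for w in adult_outputs]
    return centres, lexicon
```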

10 Because this noise models a sum of presumably independent sources, a Gaussian is a reasonable
hypothesis for its shape.
11 The variable k was set explicitly to 4 in these simulations. The simplification of essentially telling the
learners how many vowels to look for was mainly in the interests of computational tractability, although it
is also not implausible given the assumed availability of two binary features. Some attempts were made at
clustering the acoustic data using a mixture of Gaussians trained with the EM algorithm, and finding the
appropriate number of clusters with the Bayesian Information Criterion, but the addition of coarticulatory
effects renders the data non-Gaussian and so the number of clusters was consistently overestimated.
12 This is clearly an unrealistic assumption, which could presumably be addressed in future research
with an 'analysis-by-synthesis' approach (Stevens and Halle 1967).
Algorithm 1 Iterated learning algorithm for emergence of lexical vowel harmony

Require: gens ∈ ℕ as number of iterations, coart_a, coart_p ∈ ℝ+ as degrees of antici-
patory and perseverative coarticulation
Initialize zeroth ADULT agent with 4⁴-word lexicon (all length-4 permutations of
two binary features)
while gens > 0 do
    new CHILD (empty lexicon)
    ADULT outputs full lexicon via formant synthesis, perturbed by coart_a, coart_p
    CHILD finds means of adult vowels via k-means clustering (k = 4)
    CHILD finds MLE articulatory values for acoustic cluster centres
    CHILD assigns underlying reps to adult outputs from prototype articulations via
    vector quantization
    ADULT ← CHILD (delete previous ADULT)
    gens ← gens - 1
end while

Algorithm 1 sketches the sequence of steps carried out for each generation of the
iterated learning model incorporating the production and comprehension modules
discussed above.
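For concreteness, the generational loop of Algorithm 1 can be written in a few lines of Python (a sketch, not the released source; produce() is a hypothetical helper composing articulation sampling, formant synthesis, and the coarticulatory perturbation sketched earlier, and learn() is the learner sketched above):

```python
def iterate(adult_lexicon, gens, coart_a, coart_p):
    """Run Algorithm 1: each generation, the adult produces the whole
    lexicon, a new child learns from those outputs, and the child
    then becomes the next generation's adult."""
    history = []
    for _ in range(gens):
        outputs = [produce(word, coart_a, coart_p)
                   for word in adult_lexicon]
        _, child_lexicon = learn(outputs)
        history.append(child_lexicon)
        adult_lexicon = child_lexicon
    return adult_lexicon, history
```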

12.4.2 Simulations
For each degree of coarticulation (anticipatory or perseverative, from 0 Hz to 400 Hz
in 50 Hz increments), the model was run 15 times for 250 iterations. The graphs in
Figure 12.2 show the results for some of these parameter settings with anticipatory
coarticulation.13 In particular, they show the increase over time (measured in 'gen-
erations') of the proportion of lexical items in the learners' lexicons that have fully
harmonic underlying feature specifications, i.e. full agreement of [BACK] across all
vowels in a word.
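The harmony measure plotted in Figure 12.2 is thus just the fraction of words whose vowels agree in [BACK]; a sketch (assuming each lexical entry is a sequence of (HIGH, BACK) feature pairs; the names are mine):

```python
def harmonic_proportion(lexicon):
    """Fraction of lexical items whose vowels all share one value of
    [BACK], i.e. fully harmonic underlying forms."""
    words = list(lexicon)
    n_harmonic = sum(len({back for _, back in word}) == 1
                     for word in words)
    return n_harmonic / len(words)

# e.g. one harmonic and one disharmonic four-vowel word
lex = [((1, 0), (0, 0), (1, 0), (0, 0)),
       ((1, 0), (0, 1), (1, 0), (0, 0))]
print(harmonic_proportion(lex))   # 0.5
```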
Of interest is the fact that there appear to be two stable levels of harmony between
absence of harmony and full harmony. Figure 12.2a shows the lexicon asymptoting
toward a harmonic proportion in the neighbourhood of 0.33, while Figure 12.2c
and Figure 12.2d show another region of stability around 0.66. Additionally, in
Figure 12.2a and Figure 12.2c we see clearly that for any given parametric set-
tings, a subset of the runs may 'escape' the principal region of stability and end
13 The results with perseverative coarticulation were qualitatively and quantitatively similar and will not
be discussed. Also, runs with intermediate coarticulatory values are not shown or discussed here, as they
had qualitatively similar dynamics, and varied only in the speed at which they achieved stability.
FIGURE 12.2 Effects of varying degrees of anticipatory coarticulation. Fifteen runs per figure.
Gaussian noise ~ N(μ = 0, σ = 30) on post-articulatory outputs
with a higher proportion of harmonic forms in the lexicon (or else reach a plateau
much more quickly than other runs with the same parametric specification). This
variability across different runs of a particular parametric configuration is due to
post-coarticulatory Gaussian noise. The randomness of the distribution in acous-
tic space interacts synergistically with coarticulation, increasing the likelihood that
in the assignment of underlying forms, any particular vowel will be categorized
in an 'incorrect' cluster, i.e. assigned to an acoustic prototype different from that
which generated it. Because coarticulatory noise is anisotropic and biased in the
direction of the opposite articulatory specification, this misclassification is more
likely to happen in the direction of increased local harmony (viz. in agreement with
the immediately preceding or following vowel). Depending on where a particular
speaker lies along the hypo/hyperarticulatory continuum (recall that the parameters
α and β which control this are normally distributed), several of these misclassifications
may occur together within a generation and conspire to drive a language toward
harmony much more quickly than is typical, as seen in e.g. five of the runs in
Figure 12.2a.

12.5 Discussion and future directions


In the face of a stochastic pressure towards harmonization, all runs of the model might
be expected to inexorably evolve toward fully harmonic forms across the lexicon (cf.
the discussion of Choudhury (2007) in subsection 12.3.1). This intuition is essentially
a version of the 'actuation problem' introduced by Weinreich et al. (1968), who noted:
'[...] the question always remains as to why the change was not actuated sooner, or
why it was not simultaneously actuated wherever identical functional properties pre-
vailed. The unsolved actuation riddle [...] creates the opposite problem, of explain-
ing why language fails to change' (Weinreich et al. 1968: 112, emphasis mine).
As shown in Figure 12.2, rather than runaway harmonization, what in fact happens
is that particular runs stop at one of a few stable intermediate levels of harmony, taking
more or less time to reach these plateaux as a function of the degree of coarticulatory
influence. This can be explained in terms of differential resistance to coarticulation
(Recasens 1984). Experimental work (Beddor et al. 2002) has shown that some vowels,
particularly high front vowels, are less prone to coarticulatory effects than others.14
In the context of the model described here, this is a straightforward consequence
of having vowels that are more dispersed than the average 'reach' of coarticulation.
Whatever the mechanism underlying dispersion in human vowel systems (e.g. syn-
chronic or diachronic pressures toward contrast preservation or homophony avoid-
ance), it is sufficiently strong to ensure that its effects are greater than the amounts
14 Ohala (1994a) largely attributes the prevalence of high front transparent vowels in harmony to this
stability, arguing that their coarticulatory effects are easier for listeners to 'parse out'.
of coarticulation characteristic of human speech (see Beddor et al. 2002 for some
data). This in turn results (by hypothesis) in only sporadic opportunities, abetted
by anisotropic noise from other factors, for phonologization of the variety described
here.
There remains much work to be done in fleshing out this model to more accurately
reflect the conditions that obtain in real-world examples of sound change and phonol-
ogization. The model as presented here fits broadly into the view of phonological
diachrony espoused by Hale (2007), whereby sound changes and phonologization
are initiated within the heads of individual (particular) speaker-hearers. Of course,
individuals acquire their language from multiple sources (hence more variable input
forms), and children's language is often shaped as much by their peers as their parents
(Labov 1994), so even for a diachronician who ascribes to Hale's viewpoint, it seems
unwise to ignore the influence of external actors. An incarnation of the model currently
in development incorporates acquisition from multi-source data.

12.6 Conclusions
The work presented in this chapter represents a first step in demonstrating that
computational modeling can support, and even be a crucial component of,
diachronic explanation of synchronic phonological patterns. Given the recently
increasing focus on this style of explanation (Blevins 2004; Hale and Reiss 2008), and
the obstacles to empirically investigating phenomena which arise over timescales
potentially spanning centuries or millennia, the usefulness of computational models
in putting diachronic functional explanations on a sound theoretical and empirical
footing is clear.
In this chapter, I focused on a particular instance of diachronic explanation: Ohala's
(1994b) claim that vowel harmony emerges from the phonologization of vowel-to-
vowel coarticulation. Using a simple model of the language transmission/acquisition
feedback loop iterated over multiple generations, I showed how a gradient pattern of
front/back coarticulation coupled with anisotropic noise arising from external factors
(fatigue, noise, etc.) could eventually become phonologized as a categorical pattern of
lexical harmony.

12.6.1 On the role of coarticulation


The work presented in this chapter serves as an existence proof for the Ohalian theory
of the origins of vowel harmony, namely that it results from the phonologization of
vowel-to-vowel coarticulation. However, there is a recent convergence of evidence
that casts some doubt on this view of the origins of vowel harmony, specifically with
respect to the role that coarticulation plays.
Beddor and colleagues (Beddor et al. 2002, 2007; Beddor 2009) have recently
demonstrated that coarticulation (in V-to-V and VN sequences) and perceptual
compensation for coarticulation are highly language-specific, in particular that antic-
ipatory and perseverative coarticulation vary widely in degree across languages, and
that compensation for coarticulation is largely attuned to a language's amount of
coarticulation. This immediately puts the 'phonologization of coarticulation' account,
at least as it has been implemented here, on a less certain footing. If listeners gen-
erally compensate as much as speakers tend to coarticulate, it is unclear whether
failures of compensation happen frequently enough for phonologization to gain any
traction.
Independently of this, there is a line of research giving increasing evidence that
language users have access to highly detailed episodes of linguistic experiences
(Goldinger 1996; Johnson 1997b; Pierrehumbert 2001a; Hawkins 2003, inter alia),
and in particular that language users store acoustically-detailed 'word-sized' exem-
plars of linguistic experiences (Silverman 2006a; Johnson 2007; Välimaa-Blum 2009).
But if humans' lexical representations are acoustic and word-sized, then there is no
meaningful sense in which coarticulation, within words at least, happens at all. Con-
sider a very basic example, in which the difference between the (relatively palatal)
[k] in keep versus the (relatively velar) [k] in coop is highlighted as an example of
anticipatory coarticulation. According to the 'phonetically detailed exemplars' view,
this difference is (at least synchronically) not attributable to coarticulation, but instead
is a product of the fact that these forms have only ever been heard in their respective
palatalized and velarized forms by the language learner.
In ongoing research (Mailhot 2010), I am modeling the synchronic acquisition and
diachronic emergence of vowel harmony within such an exemplar-based approach
to phonetics/phonology. In these models, agents explicitly store word-sized for-
mant (F1, F2) sequences. Individual word tokens are synchronically subject only
to isotropic (Gaussian) noise, modeling the sum of 'external' noise sources, and
the emergence of vowel harmony comes about due to synchronic perceptual biases.
Synchronically, the model acquires productive alternations (e.g. in affixal morphol-
ogy) successfully, and preliminary results on the diachronic model indicate that
this acquisition model embedded into an iterated learning simulation can in some
instances give rise to such alternations over time, from an initial state lacking such
alternations.
Appendix: The Beta distribution

The Beta distribution (Weisstein 2009) models events which are constrained to take
place within an interval, e.g. the probability density of hitting an instance of an artic-
ulatory target.

FIGURE 12.3 The Beta distribution, for various values of shape parameter α (β = 5)

The shape parameters were distributed α ~ N(40, 5) and β ~ N(5, 1) for the
simulations discussed here.
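In code, an articulatory target could be sampled as follows (a sketch under the appendix's parameter distributions; the handling of targets at the opposite end of the interval is my assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_articulation(high_target=True):
    """Draw one articulatory parameter value on [0, 1].

    Shape parameters are speaker-level draws, alpha ~ N(40, 5) and
    beta ~ N(5, 1), giving a distribution sharply peaked near one end
    of the interval; swapping alpha and beta aims at the other end.
    """
    alpha = rng.normal(40, 5)
    beta = rng.normal(5, 1)
    if not high_target:
        alpha, beta = beta, alpha
    return rng.beta(alpha, beta)

print(sample_articulation())   # typically near 0.9
```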
13

Variation and change in English noun/verb pair stress: Data and dynamical systems models

MORGAN SONDEREGGER AND PARTHA NIYOGI*

13.1 Introduction
In every language, change is ubiquitous and variation is widespread. Their interac-
tion is key to understanding language change because of a simple observation: every
linguistic change seems to begin with variation, but not all variation leads to change.
What determines whether, in a given linguistic population, a pattern of variation leads
to change or not? This is essentially the actuation problem (Weinreich et al. 1968),1
which we rephrase as follows: why does language change occur at all, why does it
arise from variation, and what determines whether a pattern of variation is stable
or unstable (leads to change)? This chapter addresses these questions by combining
two approaches to studying the general problem of why language change occurs: first,
building and making observations from datasets, in the tradition of sociolinguists and
historical linguists (such as Labov and Wang); second, building mathematical models
of linguistic populations, to model the diachronic, population-level consequences
of assumptions about the process of language learning by individuals (Niyogi and
Berwick 1995 et seq.; Niyogi 2006).
We describe the diachronic dynamics of an English stress shift, based on a
diachronic dataset (1600-2000) which shows both variation and change. This stress
shift has several interesting properties which can be explored using computational
models. We focus here on a pattern characterizing much language change, the

* We thank two anonymous reviewers for comments on an earlier draft of this chapter, John Goldsmith,
Jason Riggle, and Alan Yu for insightful discussion, and Max Bane for both. Audiences at LabPhon 11, the
University of Chicago, and Northwestern University provided useful feedback.
1 'Why do changes in a structural feature take place in a particular language at a given time, but not in
other languages with the same feature, or in the same language at other times?' (p. 102).
existence of periods of long-term stability punctuated by periods of change. We dis-
cuss several proposed causes of change from the literature (listener-based misper-
ception, word frequency, analogy) and their application to our dataset; we then link
observed dynamics and proposed causes by determining the diachronic dynamics of
three models of learning by individuals in a linguistic population.2 Based on these
models, we argue that bifurcations in the dynamics of linguistic populations are a
possible explanation for actuation, and that the presence or absence of bifurcations
can be used to evaluate proposed mechanisms of language change.

13.2 Data
The data considered here are English disyllabic noun-verb pairs such as convict, con-
crete, exile, referred to as N/V pairs throughout. As a rough count of the number of
N/V pairs in current use, 1143 are listed in CELEX (Baayen et al. 1996).3 N/V pairs are
a productive class (YouTube, google).
All current N/V pairs for which N and V have categorical stress follow one of the
three patterns shown in Table 13.1.4 The fourth logically possible pattern, {2,1}, does
not occur; as discussed below, this pattern is also never observed diachronically. At
any given time, variation exists in the pronunciation of some N/V pairs, e.g. research,
address in present-day American English.
Variation and change in the stress of N/V pairs have a long history. Change in N/V
pair stress was first studied in detail by Sherman (1975), and subsequently by Phillips
(1984). Sherman (1975) found that many words have shifted stress since the first
dictionary listing stress appeared (1570), largely to {1,2}.5 On the hypothesis that this
was lexical diffusion to {1,2}, he counted 149 pairs listed with {1,2} or possible {1,2}
pronunciation in two contemporary dictionaries, one British and one American, and
TABLE 13.1 English noun/verb pair stress patterns

Pattern    N     V     Examples
{1,1}      σ́σ    σ́σ    anchor, fracture, outlaw
{1,2}      σ́σ    σσ́    consort, protest, refuse
{2,2}      σσ́    σσ́    cement, police, review

2 These models are sampled from a larger project (Sonderegger 2009; Sonderegger and Niyogi 2010),
whose goal is to determine which model properties lead to dynamics consistent with the stress data, and
with observations about variation and change more generally.
3 The number of N/V pairs in current use depends on the method used to count. Many examples are
clear, but others have rarely-used N or V forms (e.g. collect) which are still listed in dictionaries.
4 We use curly brackets to denote N and V stress, where 1 = initial stress and 2 = final stress.
5 However, most words are not first listed until 1700 or later.
examined when the shift for each N/V pair took place. We call these 149 words List 1
(Appendix A). Sherman found the stress of all words in List 1 for all dictionaries listing
stress information published before 1800, and concluded that many words were {1,2}
by 1800, and those that were not must have shifted at some point by 1975. We will
revisit the hypothesis of lexical diffusion to {1,2} below, after examining the dynamics
of an expanded dataset.
Because Sherman's study only considers N/V pairs which are known to have
changed to {1,2} by 1975, it does not tell us about the stability of the {1,1}, {2,2},
and {1,2} pronunciations in general. Over a random set of N/V pairs in use over
a fixed time period, is it the case that most pairs pronounced {1,1} and {2,2} shift
stress to {1,2}?
List 2 (Appendix B) is a set of 110 N/V pairs, chosen at random from all N/V pairs
which (a) have both N and V frequency of at least one per million words in the British
National Corpus; (b) have both N and V forms listed in a dictionary from 1700 (Boyer
1700); (c) have both N and V forms listed in a dictionary from 1847 (James and Mole
1847). These criteria serve as a rough check that the N and V forms of each word have
been in use since 1700.
In List 2, only 11.8 per cent of the words have changed stress at all between 1700 and
2007. Those stress shifts observed are mostly as described by Sherman, from {2,2} to
{1,2}, and mostly for words from List 1. But this quick look suggests that when the set
of all N/V pairs is sampled over a 300 year period, most words do not change stress:
{1,1}, {1,2}, and {2,2} are all 'stable states', to a first approximation. From this perspec-
tive, both sides of the actuation problem are equally puzzling for the dataset: why do
the large majority of N/V pairs not change, and what causes change in those that do?

13.2.1 Diachronic: Dictionary data


To get a better idea of the diachronic dynamics, Sherman's data on N/V stress for List 1 words from 33 British dictionaries were extended to the present using 29 additional British and 14 additional American dictionaries, published between 1800 and 2003.6 Words from List 1 were used rather than a list of N/V pairs controlled for first attestation and non-zero frequency (such as List 2) for two reasons. First, we wish to use the large dataset already collected by Sherman for List 1 pronunciations up to 1800. Second, we are interested in the dynamics of change, and would therefore like to focus on words which have changed by the present. Because most pairs do not change stress over time and most change is to {1,2}, List 1 will include most pairs which have undergone a stress shift.
For the 149 N/V pairs of List 1 in 76 dictionaries, each of N and V was recorded as 1 (initial stress), 2 (final stress), 1/2 (both listed, 1 first), 2/1 (both listed, 2 first), 1.5 (level stress), or 0 (not listed). We assume 1/2, 1.5, and 2/1 reflect variation in the

6 The dictionary list is in Sonderegger (2009); the stress data are available on the first author's web page.

TABLE 13.2 Observed types of complete stress shift, ordered by decreasing frequency of occurrence

Change            Examples
{2,2} → {1,2}     concert, content, digest, escort, exploit, increase, permit, presage, protest, suspect
{1,1} → {1,2}     combat, dictate, extract, sojourn, transfer
{1,2} → {1,1}     collect, prelude, subject
{1,2} → {2,2}     cement

population, either due to variation within individuals (e.g. the dictionary's author(s))
or variation across individuals (each using initial or final stress exclusively). At a given
time, the N or V forms for many words in List 1 are rare, archaic, or not in use. The
pattern {2,1} is never observed.
Changes in individual N/V pairs' pronunciations can be visualized by plotting the moving average of their N and V form stresses. To represent averages of reported stresses on a scale, we need to map reported stresses s to numbers f(s) in [1,2]. We use f(1) = 1, f(2) = 2, and f(1/2) = f(1.5) = f(2/1) = 1.5. This measure overestimates variation between 1 and 2 by interpreting 1/2 and 2/1 as meaning equal variation between 1 and 2.7
For a word w at time t, the average of pronunciations reported in the time window (t − 25, t + 25) (years) was plotted if at least one dictionary in this time window listed pronunciation data for w. So that the trajectories would reflect change in one dialect of English, only data from British dictionaries were used. Figures 13.1-13.2 show a sample of the resulting 149 stress vs. time trajectories.8
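This construction is simple enough to sketch in a few lines of Python (the mapping F and the function name are ours, for illustration; this is not the code used to produce the figures):

# Map a reported stress s to f(s) in [1, 2]: categorical listings
# map to the endpoints; all variable listings map to the midpoint.
F = {'1': 1.0, '2': 2.0, '1/2': 1.5, '2/1': 1.5, '1.5': 1.5}

def moving_average(entries, t, half_window=25):
    """Average f(s) over dictionary entries dated in (t-25, t+25).
    entries: list of (year, reported_stress) pairs for one word;
    returns None if no dictionary in the window lists the word."""
    vals = [F[s] for (year, s) in entries
            if t - half_window < year < t + half_window and s != '0']
    return sum(vals) / len(vals) if vals else None

# e.g. moving_average([(1790, '2'), (1800, '2/1'), (1820, '1')], 1805)
# averages all three listings: (2.0 + 1.5 + 1.0) / 3 = 1.5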
Four types of complete stress shift, defined as a trajectory moving from one endpoint ({1,1}, {1,2}, or {2,2}) to another, are observed, ordered by decreasing frequency in Table 13.2. The types differ greatly in frequency: {2,2} → {1,2} is by far the most common, while there are only 1 or 2 clear examples of {1,2} → {2,2}. For both the {1,1} and {2,2} patterns, change to {1,2} occurs more frequently than change from {1,2}. Change directly between {1,1} and {2,2} never occurs. A sample of each type is shown in Figure 13.1.

7 In fact, dictionary authors often state that the first listed pronunciation is 'primary', so that 1/2, 2/1, and 1.5 could represent different types of variation in the population, in view of which we might want to set f(1/2) < 1.5 and f(2/1) > 1.5. In practice, 1/2 and 2/1 are uncommon enough that trajectories plotted with f(1/2) and f(2/1) changed look similar, at least with respect to the qualitative terms in which we describe trajectory dynamics below.
8 All trajectories are given in Sonderegger (2009), and posted on the first author's web page.

FIGURE 13.1 Sample trajectories 1: change between endpoints. Solid/dotted lines are moving averages of N/V stress respectively




TABLE 13.3 Distribution (in %) of data with both N and V stresses listed. 'Var' means 1.5, 1/2, or 2/1

          V = 1    V = var    V = 2
N = 1     7.1      4.6        57.1
N = var   0        2.2        7.1
N = 2     0        0          21.8

For a given N or V stress trajectory, variation (a moving average value greater than 1 and less than 2) could either be due to dictionary entries reporting variation, or a mix of dictionary entries without variation reporting (exclusively) initial or final stress. To give an idea of how often variation is reported in individual dictionary entries, Table 13.3 shows the percentages of entries (with both N and V stresses listed) reporting variation in N, V, or neither. Variation occurs within N or V in 13.9 per cent of entries, but variation in both N and V at once is relatively uncommon (2.2 per cent of entries).
What is the diachronic behavior of the variation observed in the stress trajectories? Examining all trajectories, we can make some impressionistic observations. Short-term variation near endpoints (converse; Figure 13.2a) is relatively common. Long-term variation in one of the N or V forms (exile; Figure 13.2b) is less common; long-term variation in both the N and V forms at once (rampage; Figure 13.2c) is rare. The pattern {2,1} is never observed in the dataset, and we argue it is in fact 'unstable' in the following sense. Entries 'near' {2,1}, such as (N=2/1, V=1/2), are very rare (nine entries), and are scattered across different words and dictionaries. This means that the few times the N form of an N/V pair comes close to having a higher probability of final stress than the V form, its trajectory quickly changes so this is no longer the case. In the language of dynamical systems (section 13.4.1), this suggests that the region pron_N > pron_V contains an unstable fixed point (one which repels trajectories), {2,1}.
We can summarize the observed diachronic facts as follows:

1. {1,1}, {1,2}, {2,2} are 'stable states', but short-term variation around them often occurs.
2. Long-term variation occurs, but rarely in both N and V forms simultaneously.
3. Trajectories largely lie on or near a 1D axis in the 2D (pron_N, pron_V) space: {1,1} ↔ {1,2} ↔ {2,2}. Both variation and change take place along this axis.
4. Changes to {1,2} are much more common than changes from {1,2}.
5. {2,1} never occurs, and is an 'unstable state'.

Returning to the question of what kind of change is taking place, we see that to a first approximation and restricted to List 1, Sherman was correct: most change takes

FIGURE 13.2 Sample trajectories 2. (a) Short-term variation; (b) long-term variation in the V
form; (c) long-term variation in both N and V forms. Solid/dotted lines are moving averages of
N/V stress respectively

FIGURE 13.3 Schematic of observed changes. Each oval represents a stable state: {1,1}, {1,2}, and {2,2} are the observed N/V pair stress patterns, and {1,0} and {0,2} indicate disyllabic words without V and N forms, respectively. Solid lines indicate observed N/V pair stress shifts, with line thickness indicating the relative frequency of each shift; e.g. {2,2} → {1,2} is the most frequent and {1,2} → {2,2} the least frequent. Dotted lines indicate all ways in which an N or V form can come into or fall out of use

place to {1,2}. But taking into account that change from {1,2} also occurs, and that most words in stable states never change, the diachronic picture is more completely schematized as in Figure 13.3. The observed dynamics are thus more complicated than diffusion to {1,2}. To understand their origin, we consider below (section 13.3) proposed mechanisms driving stress shift in N/V pairs.

13.2.2 Synchronic: Radio data


We can infer from the dictionary data that significant population-level variation exists
in the pronunciation of many N/V pairs at a given time. However, to build realistic
models, we must also know whether pronunciation variation exists in individuals or
not: do individuals learn gradient (a probability α ∈ [0,1] of using one form versus another) or categorical (each speaker uses one form exclusively) forms? We call these options within-speaker and between-speaker variation.9
One place to check the type of variation is on the radio, by observing how an
individual speaker pronounces different tokens of words known to show variation
at the population level. For a sample of 34 stories from National Public Radio, the
American public radio network, Table 13.4 lists the number of speakers (31 total, 18
male) who pronounced the noun form of research, address, or perfume, exclusively
with initial stress, exclusively with final stress, or used both. Each speaker listed for a
word used it at least five times.10
Within-speaker variation thus does occur for N/V pairs, at least in this rela-
tively small dataset. This finding has important consequences for modeling. As has
been pointed out in both dynamical systems (Niyogi 2006) and other computational
models of language change (e.g. Liberman 2000; Troutman et al. 2008), the choice
of whether learners' target is a gradient or categorical form profoundly affects the
population-level dynamics.
Based on the radio data, we can also make an observation about the structure
of within-speaker variation for modeling: although within-speaker variation exists,
two thirds of speakers show no variation at all. This could be taken to suggest that
learners are not simply probability matching (assuming their input includes both
N=i and N=2 examples), and that the learning procedure can terminate in gradient
or categorical output, given gradient input. We do not pursue this possibility in the
models presented below.

TABLE 13.4 Summary of radio pronunciation data (see text)

Word       # N = 1    # Var    # N = 2
research   9          6        2
perfume    2          3        4
address    2          2        1

9 The terminology is slightly misleading because the structure of variation (the α stored) differs between speakers in 'within-speaker' variation as well.
10 See Sonderegger (2009) for details, including the list of stories.

13.3 Motivations for change


We outline several proposed types of causes of phonological change, and for each one
explore its relevance for the observed diachronic dynamics of N/V pair stress.

13.3.1 Mistransmission
An influential line of research holds that many sound changes are based in asymmetric transmission errors: because of articulatory factors (e.g. coarticulation), perceptual biases (e.g. confusability between sounds), or ambient distortion between production and perception, listeners systematically mishear some sound α as β, but rarely mishear β as α.11 Such asymmetric mistransmission is argued to be a necessary condition for the change α → β at the population level, and an explanation for why the change α → β is common, while the change β → α is rarely (or never) observed. Mistransmission-based explanations were pioneered by Ohala (1981 et seq.), and have been the subject of much recent work (reviewed by Hansson 2008).
Although N/V pair stress shifts are not sound changes, their dynamics are
potentially amenable to mistransmission-based explanation. There is significant
experimental evidence for perception and production biases in English listeners
consistent with the most commonly-observed diachronic shifts ({2,2}, {1,1}>>{i,2}).
English listeners strongly prefer the typical stress pattern (N=i or =2) in novel
English disyllables (Guin et al. 2003), show higher decision times and error rates
(in a grammatical category assignment task) for atypical (N=2 or V=i) than for
typical disyllables (Arciuli and Cupples 2003), and produce stronger acoustic cues
for typical stress in (real) English N/V pairs (Sereno and Jongman 1995).12 It is also
known that for English disyllables, word stress is misperceived more often as initial
in 'trochaic-biasing' contexts, where the preceding syllable is weak or the following
syllable is heavy; and more often as final in analogously 'iambic-biasing' contexts.
This effect is more pronounced for nouns than for verbs; and nouns occur more
frequently in trochaic contexts (Kelly and Bock 1988; Kelly 1988,1989). Michael Kelly
and collaborators have argued these facts are responsible for both the N/V stress
asymmetry and the directionality of N/V pair stress shifts.

13.3.2 Frequency
Stress shift in English N/V pairs, in particular the most common change, the diatonic stress shift (DSS; {2,2} → {1,2}), has been argued to be a case of analogical
11 A standard example is final obstruent devoicing, a common change cross-linguistically. Blevins (2006) summarizes the evidence that there are several articulatory and perceptual reasons why final voiced obstruents could be heard as unvoiced, but no motivation for the reverse process (final unvoiced obstruents heard as voiced).
12 For example, Sereno and Jongman find that the ratio of amplitudes of the first and second syllables (an important cue to stress) is greater for initially-stressed N/V pairs (e.g. police) read in noun context, compared to verb context.

change (Hock 1991; Kiparsky 1995) or lexical diffusion (Sherman 1975; Phillips 1984, 2006); indeed, the relationship between the two is controversial (see Phillips 2006 vs. Kiparsky 1995; Janda and Joseph 2003). For both types of change, frequency has been argued to play a role in determining which forms change first; in particular, lower-frequency forms are said to be more susceptible to analogical change (e.g. Mańczak 1980), or to change first in cases of lexical diffusion which require 'lexical analysis' (Phillips 2006), such as N/V stress shifts. This type of effect has been demonstrated for the most common N/V stress shift: words with lower frequencies are more likely to undergo the DSS (Phillips 1984; Sonderegger in press). More precisely, among a set of N/V pairs pronounced as {2,2} in 1700, those with lower present-day combined N+V frequency are more likely to have changed to {1,2} by the present.13
There is, however, an important ambiguity to this finding: present-day frequencies
are used, under the implicit assumption that they have changed little diachroni-
cally. We must therefore distinguish between (at least) two hypotheses for why low-
frequency words change (on average) earlier:

1. Words' relative frequencies stay approximately constant diachronically. In a given year, word a is more likely than word b to change if a is less frequent than b.
2. A word changes when its frequency drops below a (possibly word-specific) critical value.

Under Hypothesis 2, the reason present-day frequencies are on average lower for
words which have changed is that their frequencies have decreased diachronically.
We can begin to differentiate between these hypotheses by examining diachronic
frequency trajectories for N/V pairs which have changed, and checking whether they
show negative trends. Real-time frequency trajectories (combined N+V frequencies)
were found for six N/V pairs (combat, decrease, dictate, perfume, progress, protest)
which have shifted stress since 1700.14 Figure 13.4 shows frequency trajectories along-
side pronunciation trajectories for these pairs.
Frequencies were found by sampling from prose written by British authors in the
Literature Online (LiOn) database, then normalizing against frequency trajectories
for a set of four reference words. Details and some justification for this normalization
step are given in Appendix C.15

13 Sonderegger (in press) argues that frequency and phonological structure interact to influence which words undergo the DSS first. Here we refer to the finding that there is a significant main effect of frequency once prefix class is taken into account.
14 A reviewer suggests that either N or V frequency alone would be a more relevant measure for particular changes, i.e. change in the stress of the N form might be triggered by change in its frequency or in the V form's frequency. This seems plausible, and we plan to consider frequency trajectories more carefully in future work; here we consider N+V frequency rather than N or V frequency alone for compatibility with previous work (Phillips 1984; Sonderegger in press), where N+V frequency is used.
15 lion.chadwyck.com. Only six words/four reference words were considered because finding trajectories is time-intensive.

FIGURE 13.4 Frequency (lower) and pronunciation (upper) trajectories for combat, decrease,
dictate, perfume, progress, protest

All words show negative correlations between year and N+V frequency, four out of six of which are significant (p < 0.05).16 Although any conclusion must be tentative in view of the small number of frequency trajectories considered, these
16 Alphabetically: r = −0.78 (p < 0.001), r = −0.78 (p < 0.1), r = −0.79 (p < 0.05), r = −0.32 (p > 0.25), r = −0.76 (p < 0.05), r = −0.74 (p < 0.01).

negative correlations lend support to Hypothesis 2, and rule out the hypothesis that
the frequency trajectories for N/V pairs show no long-term trends. We thus adopt
the working hypothesis that change occurs in an N/V pair when its frequency drops
below a critical level.
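The trend test amounts to a correlation between year and normalized frequency; a minimal sketch in Python, with made-up trajectory values rather than the LiOn counts:

from scipy.stats import pearsonr

# (year, normalized N+V frequency) samples for one N/V pair;
# the values below are illustrative only.
years = [1700, 1750, 1800, 1850, 1900, 1950, 2000]
freqs = [5.1, 4.4, 3.9, 3.0, 2.6, 2.1, 1.8]

r, p = pearsonr(years, freqs)
# A significantly negative r (p < 0.05) is the pattern consistent
# with Hypothesis 2: frequency declines for pairs that shift stress.
print(round(r, 2), round(p, 4))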

13.3.3 Analogy/Coupling
A very broad explanation often invoked in language change is analogy: linguistic
elements which are similar by some criterion change to become more similar. In the
case of N/V pairs, it has been suggested that the most common stress shift, from {2,2}
to {1,2}, could be due to analogical pressure: given the strong tendency in English for
nouns to have earlier stress than verbs (e.g. Ross 1973), speakers 'regularize' {2,2} pairs
to follow the dominant pattern of stress in the lexicon (Phillips 2006: 37-9).
In the context of our N/V diachronic pronunciation trajectories, we restate analogy
as coupling between trajectories. We can check for coupling effects at two levels: within
N/V pairs, and within prefix classes:

Within N/V pairs  We have shown that to a first approximation, trajectories move along the {1,1} ↔ {1,2} ↔ {2,2} axis (only one of the N or V forms changes at a time), and the pronunciation {2,1} never occurs. These facts are strong evidence for coupling between the N and V forms of each pair: if there were no coupling, there would be no reason why {2,1} could not occur, since N=2 and V=1 do occur independently in the dataset. There would also be no reason for trajectories to mostly move along this axis.

Within prefix classes  Impressionistically, over all N/V pair trajectories, those for pairs sharing a prefix often seem more similar than would be expected by chance. For example, many re- pairs were historically {2,2}, then began to change sometime between 1875 and 1950. We would like a principled way to test the hypothesis of coupling between the trajectories for words in the same prefix class; to do so, we need a way to test how much two words 'change like' each other, or how similar their trajectories are. We use a simple measure of trajectories' dissimilarity ('distance'), denoted d(w, w′) (for N/V pairs w and w′).17
Finding d(w, w′) for all possible word pairs defines a graph G(d) with nodes w_1, ..., w_149, and edges d(w_i, w_j) equal to the distance between w_i's and w_j's trajectories. This structure suggests a way of testing whether, given a group of words which are linguistically related, their trajectories are similar: check the goodness of the cluster formed by their vertices in G. For a subset of vertices C ⊆ [n] of G = (V, E), define

17 Over both N and V trajectories, the sum of the mean trajectory difference and the mean difference between trajectory first differences. Details are given in Sonderegger (2009).

R(C) to be the mean in-degree of C minus the mean out-degree of C.18 R(C) will be high if most vertices of C are on average closer to each other than to vertices in V \ C. This quantity is adapted from a common metric for finding community structure in networks (Newman and Girvan 2004), with the important difference that here we are only evaluating one hypothesized community rather than a partitioning of G into communities.
As a measure of the goodness of a cluster C, let p(C) ∈ [0,1] be the empirical p-value, defined as the location of R(C) on the distribution of R for all communities of size |C| in G. The closer the value of p(C) to zero, the more similar the trajectories for words in C are, compared to trajectories of a random set of words of size |C|. This setup can be used to test whether words in List 1 which share a prefix have similar trajectories. Table 13.5 shows p(C) for all prefix classes of size |C| ≥ 2.
Many potential prefix classes have small p(C), confirming the initial intuition that N/V pairs sharing a prefix tend to have more similar trajectories. The com-/con- and im-/in- categories are particularly interesting because they suggest that it is a shared morphological prefix rather than simply shared initial segments which correlates with trajectory similarity. The value of p(C) for combined com- and con- is lower than for either alone, and the same holds for im-/in-; this makes sense under the assumption that in- and im- are allomorphs of a single underlying prefix.

TABLE 13.5 Prefix class p(C) values, |C| ≥ 2. 'Bound' = re-μ, where μ is a bound morpheme

C               |C|    p(C)       C                |C|    p(C)
a-              10     0.270      out-             10     0.055
com-            5      0.067      per-             3      0.263
comp-           3      0.032      pre-             5      0.065
con-            17     0.001      pro-             4      0.078
cont-           4      0.266      re-              24     0.011
conv-           4      0.033      re- (bound)      8      0.576
com-/con-       22     0.0005     re- (unbound)    16     0.0017
de-             7      0.285      sub-             3      0.710
de- w/o des-    5      0.050      sur-             2      0.475
dis-            5      0.746      trans-           3      0.173
ex-             6      0.981      up-              7      0.196
im-             4      0.021
in-             12     0.029
im-/in-         16     0.004


We also find that larger classes have lower p(C): there is a significant negative relationship between |C| and log(p(C)) (r = −0.72, p < 10^−4) for the data in Table 13.5. That is, larger classes show stronger analogical effects, in the sense of trajectory similarity considered here.
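A sketch of this cluster test in Python; the distance matrix D is assumed to be precomputed with the metric of footnote 17, and the sign convention for R(C) (out-distance minus in-distance, so that high R(C) means a tight cluster) is our reading of the definition above:

import numpy as np

def R(D, C):
    """Cluster goodness for an n x n distance matrix D and a set C of
    vertex indices: mean distance from C to the rest of the graph minus
    mean distance within C, so R(C) is high when members of C are
    closer to each other than to vertices outside C."""
    C = list(C)
    rest = [v for v in range(len(D)) if v not in C]
    d_in = np.mean([D[i][j] for i in C for j in C if i != j])
    d_out = np.mean([D[i][j] for i in C for j in rest])
    return d_out - d_in

def p_value(D, C, n_samples=10000, seed=0):
    """Empirical p-value: the location of R(C) in the distribution of
    R over random vertex sets of the same size |C|."""
    rng = np.random.default_rng(seed)
    r_obs = R(D, C)
    draws = [R(D, rng.choice(len(D), size=len(C), replace=False))
             for _ in range(n_samples)]
    return np.mean([r >= r_obs for r in draws])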

13.4 Modeling
We have so far described the diachronic dynamics of variation and change in the
stress of N/V pairs, and proposed causes for these dynamics. We now build dynamical
systems models to test whether some proposed causes, implemented in the learn-
ing algorithm used by individuals, lead to one aspect of the observed dynamics at
the population level: change following long-term stability, which in the language of
dynamical systems corresponds to the presence of a bifurcation. This is only one of the
multiple patterns observed in the data; the remainder are in part addressed elsewhere
(Sonderegger 2009; Sonderegger and Niyogi 2010) and in part left to future work.

13.4.1 The dynamical systems approach


We derive models in the dynamical systems framework, which over the past fifteen
years has been used to model the interaction between language learning and language
change in a variety of settings (Niyogi and Berwick 1995, 1996; Komarova et al.
2001; Yang 2001, 2002; Mitchener 2005; Niyogi 2006; Pearl and Weinberg 2007). This
framework is not a theory of language change, but a formalism to test theories of how
change occurs. More precisely, it allows us to determine the diachronic, population-
level consequences of assumptions about the learning algorithm used by individuals,
as well as assumptions about population structure, the input received by learners, etc.
Our models are discrete dynamical systems, or iterated maps.19 Given a domain X, an iterated map is a function f : X → X that 'iterates' the system by one step. If a system has value α_t ∈ X at step t, it has value α_{t+1} = f(α_t) ∈ X at step t + 1. In models considered here, X = [0,1]. For example, α_t ∈ [0,1] will mean that at time t, the probability that a random N example from the population (for a particular N/V pair) is produced with final stress is P(N = 2) = α_t, and the probability it is produced with initial stress is P(N = 1) = 1 − α_t. Because we have assumed no coupling between the N and V forms, the situation for V, represented for example by β_t ∈ [0,1], would be the same.20
Example 13.4.1 Let X = [0,1], and let f(x) = x^a, where a > 0, so that α_{t+1} = α_t^a. Solving for α_t gives α_t = α_0^{a^t}. However, unlike in this example, for a given f it is usually impossible to explicitly solve for α_t as a function of the initial state α_0. The
19 See Strogatz (1994) for an introduction to dynamical systems.
20 In models of coupling between the N and V forms of a pair, the domain is [0,1]^2, with (α_t, β_t) corresponding to the N and V probabilities at t.

dynamical systems viewpoint is to instead look at the system's long-term behavior as


a function of the initial state.
Definition 13.4.2 α* ∈ X is a fixed point of f if α* = f(α*).
In the example, 0 and 1 are fixed points. However, when a is fixed, there is a qualitative difference between them. For a fixed 0 < a < 1, for any initial state α_0 ≠ 0, lim_{t→∞} α_t = 1. 0 is 'unstable' in the sense that perturbing α_0 from 0 gives different long-term behavior (t → ∞), while 1 is 'stable' in the sense that perturbing α_0 from 1 does not.
Definition 13.4.3 A fixed point α* is stable if lim_{t→∞} α_t = α* for α_0 near α*, and unstable otherwise.
Stability turns out to be equivalent to a simple condition on f: a fixed point α* is stable if and only if |f′(α*)| < 1, where f′ denotes the derivative of f.
Definition 13.4.4 A bifurcation occurs when the number or stability of fixed points
changes as a system parameter is changed.
For example, in Ex. 13.4.1, there is a bifurcation at a = 1 where the fixed points 0 and 1 exchange stabilities.
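These definitions are easy to explore numerically; a minimal sketch for the map of Example 13.4.1:

def iterate(f, x0, steps=50):
    """Iterate the map x_{t+1} = f(x_t), returning the final state."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

# f(x) = x**a on [0, 1]: for 0 < a < 1 every x0 > 0 is attracted to
# the stable fixed point 1; for a > 1 the stabilities exchange and
# every x0 < 1 is attracted to 0 instead -- the bifurcation at a = 1.
print(iterate(lambda x: x**0.5, 0.1))   # ~1.0
print(iterate(lambda x: x**2.0, 0.9))   # ~0.0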
A central insight of the dynamical systems approach to modeling language change
is that the pattern characterizing much language change (the sudden onset of change following a long period of stability) can be understood as a bifurcation in which a fixed point loses stability as some system parameter drifts past a critical value (Niyogi
2006). In linguistic populations, system parameters could be the frequency of a word
or cue, the probability of misperceiving one segment as another, or the relative fre-
quency of contact with speakers of one dialect versus another.
Although we mostly do not give derivations here, the task of a dynamical systems
analysis is determining how the location, number, and stability of fixed points vary as
a function of system parameters.
We make the following assumptions in all models considered below:

• Learners in generation n learn from generation n − 1.
• Each example a learner in generation n hears is equally likely to come from any member of generation n − 1.
• Each generation has infinitely many members.
• Each learner receives an identical number of examples.

These are idealizations, adopted here to keep models relatively simple. The effects of
dropping each assumption are explored in Niyogi (2006) and Sonderegger (2009).
We also assume here that probabilities of producing initial vs. final stress for nouns
and verbs are learned separately: that is, there is no 'coupling' between them. However,

a range of models for the N/V case incorporating coupling are considered in Son-
deregger (2009) and Sonderegger and Niyogi (2010).

13.4.2 Model 1: Probability matching, mistransmission


Consider a population of learners following the above assumptions. Member i of generation t learns a probability p_{i,t} ∈ [0,1], which characterizes the probability with which she uses form 2, versus form 1. As input to learners of the next generation, she produces form 2 examples with probability p_{i,t} and form 1 examples with probability 1 − p_{i,t}.
Let α_t be the mean value of p_{i,t}. We add in mistransmission errors (section 13.3.1) via mistransmission probabilities that one form is heard when the other was intended:

a = P(1 heard | 2 intended),  b = P(2 heard | 1 intended)

In generation t + 1, learner i sets p_{i,t+1} by probability matching, as follows:

• Draw N examples from generation t. Let k_{i,t+1} be the number of examples heard as form 2.
• Set p_{i,t+1} = k_{i,t+1}/N.
The evolution equation in this case works out to

α_{t+1} = f(α_t) = α_t(1 − a) + (1 − α_t)b     (13.1)

(see Sonderegger 2009: section 5.2.2). Solving f(α*) = α*, there is a unique, stable fixed point at α* = b/(a + b). The location of α* does not depend on N, meaning word frequency plays no role in the dynamics.
The dynamics show no bifurcations: as system parameters a, b, N are varied, the fixed point's location changes smoothly as a function of the ratio of the mistransmission probabilities. Thus, this model does not show the desired property of change following long-term stability, as a system parameter passes a critical value.
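The behavior of Eqn 13.1 can be checked numerically; a minimal sketch with illustrative parameter values:

def model1_step(alpha, a, b):
    """One generation of Model 1 (Eqn 13.1): the population-level
    probability of form 2 after probability matching over input
    subject to mistransmission probabilities a and b."""
    return alpha * (1 - a) + (1 - alpha) * b

a, b = 0.10, 0.05   # illustrative mistransmission probabilities
alpha = 0.9
for _ in range(200):
    alpha = model1_step(alpha, a, b)
print(alpha, b / (a + b))   # both ~0.3333: the fixed point b/(a+b)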

13.4.3 Model 2: Discarding


Model 1 assumes that each learner hears N examples, every one of which is heard as form 1 or form 2. We now consider a population of learners where each example can be heard as form 1, form 2, or discarded. Learners then probability match based on only a subset of the data, the non-discarded examples.21 For the case of N/V pair stress, the experimental literature suggests one (speculative) reason English learners might discard some examples. Suppose learners discard examples where they are uncertain about stress placement. Given that the acoustic cues to stress in typically-stressed
21 This model of the learner is similar in spirit to the idea of 'input filtering' suggested in Lisa Pearl's computational studies of English acquisition and change (Pearl 2007 et seq.), where learners consider only examples relevant to the cue currently being set.

examples (N=i, V=2) are stronger than in atypically-stressed examples (N=2, V=i)
for at least some speakers (Sereno and Jongman 1995), some atypically-stressed exam-
ples might be discarded by learners.
We define discarding probabilities that form 1 or form 2 examples are discarded:

r_1 = P(discarded | 1 intended),  r_2 = P(discarded | 2 intended)

and define p_{i,t} as above. For learner i in generation t + 1, the algorithm is:

• Draw N examples from generation t, of which k^(2)_{i,t+1} are heard as form 2, k^(1)_{i,t+1} as form 1, and N − k^(1)_{i,t+1} − k^(2)_{i,t+1} are discarded.
• Set p_{i,t+1} = k^(2)_{i,t+1} / (k^(1)_{i,t+1} + k^(2)_{i,t+1}) if at least one example was not discarded, and p_{i,t+1} = r otherwise, where r ∈ [0,1].

That is, the learner's default strategy when all examples are discarded is to set p = r (for r fixed). For any N and non-zero discarding probabilities, there is always some chance (though possibly very small) that all examples are discarded. Where r comes from is left ambiguous; for example, it could be the percentage of known disyllabic words with final stress.
The evolution equation works out to

α_{t+1} = (1 − δ_t^N) · α_t(1 − r_2) / (α_t(1 − r_2) + (1 − α_t)(1 − r_1)) + δ_t^N · r,  where δ_t = α_t r_2 + (1 − α_t) r_1     (13.2)

(see Sonderegger 2009: section 5.5.1). In the high-frequency (N → ∞) limit, this reduces to:

α_{t+1} = α_t(1 − r_2) / (α_t(1 − r_2) + (1 − α_t)(1 − r_1))     (13.3)

In practice, the long-term dynamics of Eqn 13.3 (in particular the location of the unique stable fixed point) are extremely similar to the true (frequency-dependent) evolution equation (Eqn 13.2) for N greater than a small value (≈3-5, depending on the values of r_1 and r_2). That is, the long-term dynamics are only affected by frequency for very small N. We thus only consider Eqn 13.3 here.
Solving α_{t+1} = α_t in Eqn 13.3 gives two fixed points: α_− = 0 and α_+ = 1. There is a bifurcation at r_1 = r_2: for r_1 < r_2, α_− is stable and α_+ is unstable; for r_2 < r_1, α_− is unstable and α_+ is stable. Intuitively, the form with a higher probability of being discarded is eliminated.
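The bifurcation is easy to see in simulation; a minimal sketch of Eqn 13.3 with illustrative discarding probabilities:

def model2_step(alpha, r1, r2):
    """One generation of Model 2 in the high-frequency limit
    (Eqn 13.3): probability matching over non-discarded examples."""
    kept2 = alpha * (1 - r2)          # examples heard as form 2
    kept1 = (1 - alpha) * (1 - r1)    # examples heard as form 1
    return kept2 / (kept2 + kept1)

# Bifurcation at r1 = r2: the form more likely to be discarded dies out.
for r1, r2 in [(0.2, 0.3), (0.3, 0.2)]:
    alpha = 0.5
    for _ in range(500):
        alpha = model2_step(alpha, r1, r2)
    print(r1, r2, round(alpha, 3))   # -> 0.0 when r2 > r1; 1.0 when r1 > r2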

13.4.4 Model 3: Discarding + mistransmission


Consider now a simple model incorporating both mistransmission and discarding. For a given example, define a, b, R ∈ [0,1] such that:

P(H = 1 | I = 1) = 1 − b        P(H = 2 | I = 2) = 1 − a
P(H = 2 | I = 1) = bR           P(H = 1 | I = 2) = aR
P(discarded | I = 1) = b(1 − R)    P(discarded | I = 2) = a(1 − R)

where H = 'heard', I = 'intended'. Values a and b are now the probabilities of not hearing a form 2 or form 1 example, respectively, as the form intended. When this occurs, the probability that the example is heard as the wrong form, rather than being discarded, is R.
The learning algorithm for member i of generation t + 1 is the same as in Model 2, but now k^(2)_{i,t+1} may include some mistransmitted form 1 examples (and similarly for k^(1)_{i,t+1}).
Analysis of the resulting evolution equation (Sonderegger 2009: section 5.5.2) shows there is a single fixed point, α*, and thus no bifurcations. Similarly to Model 2, there is essentially no effect of frequency on long-term dynamics for N above a relatively small value, and we thus consider the high-frequency limit of the evolution equation. The location of α* as a function of b − a as R is varied is plotted in Figure 13.5. R controls how 'bifurcation-like' the curve is: for R small, α* changes rapidly

FIGURE 13.5 Location of α* vs. b − a, with a + b = 0.5, for different values of R



at a = b; for R → 1, α* varies smoothly as a function of b − a. However, there is no bifurcation, only bifurcation-like behavior: adding any mistransmission R > 0 eliminates the bifurcation seen in Model 2.
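The qualitative pattern of Figure 13.5 can be reproduced from the definitions above; the following sketch uses our reconstruction of the high-frequency limit, in which learners probability match over the examples heard as form 1 or form 2:

def model3_step(alpha, a, b, R):
    """One generation of Model 3 in the high-frequency limit:
    a, b are the probabilities of not hearing a form 2 / form 1
    example as intended; R is the chance such an example is misheard
    rather than discarded."""
    h2 = alpha * (1 - a) + (1 - alpha) * b * R   # heard as form 2
    h1 = (1 - alpha) * (1 - b) + alpha * a * R   # heard as form 1
    return h2 / (h2 + h1)

# alpha* as b - a varies (with a + b = 0.5): small R gives a sharp,
# bifurcation-like jump near a = b; R near 1 varies smoothly.
for R in (0.05, 0.9):
    for b in (0.20, 0.24, 0.26, 0.30):
        a, alpha = 0.5 - b, 0.5
        for _ in range(2000):
            alpha = model3_step(alpha, a, b, R)
        print(R, round(b - a, 2), round(alpha, 2))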

13.5 Discussion
We have described a diachronic corpus of N/V pair stress, the dynamics of stress
shifts observed in the corpus, and several proposed factors driving this change: mis-
transmission, word frequency, and analogy. We then determined the population-level,
diachronic dynamics of three models of learning, to explore which models show
bifurcations, i.e. which give stability followed by sudden change as system parameters
are varied. We did not evaluate models with respect to the frequency or analogical
effects observed in the corpus (sections 13.3.2-13.3.3); however, both are considered
in the larger set of models described elsewhere (Sonderegger 2009; Sonderegger and
Niyogi 2010).
Following an idea proposed by Niyogi (2006), we suggest that bifurcations in the
diachronic dynamics of a linguistic population are a possible explanation for the core
of the actuation problem: how and why does language change begin in a community,
following long-term stability? This viewpoint suggests a powerful test of theories of the
causes of language change: do their diachronic dynamics show bifurcations? We found
that mistransmission alone (Model 1) does not give bifurcations, while discarding
alone (Model 2) does: the form more likely to be discarded is eliminated from the
population. Combining mistransmission and discarding (Model 3) eliminates bifur-
cations, but gives more or less bifurcation-like behavior as the relative probability of
mistransmission and discarding is varied.
In line with other computational work on population-level change where several
models are considered (e.g. Liberman 2000; Daland et al. 2007; Baker 2008; Troutman
et al. 2008), the different dynamics of Models 1-3 illustrate that different proposed
causes for change at the individual level, each of which seems plausible a priori, can
have very different population-level diachronic outcomes. Among models tested here,
only those including discarding showed bifurcations (Model 2) or bifurcation-like behavior (Model 3); the model including only mistransmission (Model 1) did not.
Given the popularity of mistransmission-based explanations of phonological change,
this result illustrates an important point: because of the non-trivial map between
individual learning and population dynamics, population-level models are necessary
to evaluate any theory of why language change occurs.

Appendix A: Word list from Sherman (1975) (List 1)


Script indicates first reported pronunciation: {1,1}, {2,2}, {1,2}

abstract concrete defile export legate premise redress survey


accent conduct descant extract misprint presage refill suspect
addict confect desert ferment object present refit torment
address confine detail impact outcast produce refund transfer
affect conflict dictate import outcry progress refuse transplant
affix conscript digest impress outgo project regress transport
alloy conserve discard imprint outlaw protest rehash transverse
ally consort discharge incense outleap purport reject traverse
annex content discord incline outlook rampage relapse undress
assay contest discount increase outpour rebate relay upcast
bombard contract discourse indent outspread rebel repeat upgrade
cement contrast egress infix outstretch rebound reprint uplift
collect converse eject inflow outwork recall research upright
combat convert escort inlay perfume recast reset uprise
commune convict essay inlet permit recess sojourn uprush
compact convoy excerpt insert pervert recoil subject upset
compound decoy excise inset postdate record sublease
compress decrease exile insult prefix recount sublet
concert defect exploit invert prelude redraft surcharge

Appendix B: Sample of words in use 1700-2007 (List 2)


Script indicates pronunciation from Boyer (1700), as above. Asterisk indicates that
1700, 1847 (James and Mole 1847), and 2007 (Cambridge Advanced Learner's Dic-
tionary, OED) entries are not identical. Sample selection is described in Section 13.2.

abuse buckle decrease* forecast* levy premise* repeal thunder


accent bundle decree forward licence present repose title
advance butter diet gallop license proceed* reserve torment
affront cement* digest* glory matter protest* review travel
ally* challenge dispatch hammer measure purchase rival treble
anchor channel dissent handle mention puzzle saddle triumph
arrest command distress harbour merit quarry second trouble
assault concern double hollow motion reason shiver value
assay conduct envy import* murder redress shoulder visit
attack consort exile* increase* muster reform squabble vomit
bellow contest express interest order regard stable whistle
blunder contract favour iron outlaw relapse* stomach witness
bottom convict ferret journey pepper relish table
breakfast cover flourish level plaster remark tally

Appendix C: Frequency trajectory normalization


Because LiOn only gives absolute counts, we normalized the N/V pair frequency tra-
jectories in section 13.3.2 by the (summed) frequency trajectories of four words from
the Swadesh list: red, walk, man, flower. For the conclusion reached in section 13.3.2

(that the N/V pairs considered decrease in frequency over time) to be valid, it must be
the case that this set of reference words remains approximately constant in frequency
over time. We checked that these words' frequencies show no time trends in two ways.
First, when normalized by the LiOn frequency trajectory of one extremely frequent word (the), whose frequency presumably is approximately constant diachronically, the sum frequency of the reference words shows no time trend (p > 0.1 for both Pearson and Spearman correlations). Second, the summed relative frequencies of the set of reference words (i.e. occurrences per million) show no time trends (p > 0.15, Pearson and Spearman) in the Corpus of Historical American English (COHA), which
includes 400 million words from 1810-2000.22 Although COHA covers a different
dialect of English and a somewhat different time period than the N/V pair frequency
trajectories, it is the largest available diachronic corpus of English, and thus provides
some reassurance that the summed frequency of the set of reference words chosen is
not especially volatile.

22 corpus.byu.edu/coha/, beta version.


References
Abdullaev, Yalchin G. and Melnichuk, Konstantin V. (1997). Cognitive operations in the human
caudate nucleus. Neuroscience Letters, 234,151-5.
Abramson, Arthur (1962). The vowels and tones of Standard Thai: Acoustical measurements
and experiments. Indiana University Research Center in Anthropology, Folklore, and
Linguistics, Bloomington.
and Lisker, Leigh (1985). Relative power of cues: F0 shift versus voice timing. In Phonetic
Linguistics (ed. V. A. Fromkin), pp. 25-33. Academic Press, San Diego.
and Ren, Nianqi (1990). Distinctive vowel length: Duration versus spectrum in Thai.
Journal of Phonetics, 18, 79-92.
Abrego-Collier, Carissa, Grove, Julian, Sonderegger, Morgan, and Yu, Alan C. L. (2011). Effects
of speaker evaluation on phonetic convergence. In Proceedings of the International Congress
of the Phonetic Sciences. ICPhS.
Agresti, Alan (1996). An introduction to categorical data analysis. Wiley, New York.
Aikhenvald, Alexandra Y. (1996). Words, phrases, pauses and boundaries: evidence from South
American Indian Languages. Studies in Language 20, 487-517.
Alderete, John D. and Frisch, Stefan A. (2007). Dissimilation in grammar and the lexicon. In
The Cambridge handbook of phonology (ed. P. de Lacy), pp. 379-98. Cambridge University
Press, Cambridge.
Allen, George D. (1985). How the young French child avoids the pre-voicing problem for word-
initial voiced stops. Journal of Child Language, 12, 37-46.
Allen, J. Sean and Miller, Joanne L. (2004). Listener sensitivity to individual talker dif-
ferences in voice-onset-time. Journal of the Acoustical Society of America, 115(6),
3171-83.
Alfonso, P. and Baer, T. (1982). Dynamics of vowel articulation. Language and Speech 25,
151-73.
Amayreh, Mousa M. and Dyson, Alice T. (1998). The acquisition of Arabic consonants. Journal
of Speech, Language, and Hearing Research, 41, 642-53.
Anderson, Gregory D. S. (2008). The velar nasal. In The world atlas of language structures
online (eds. M. Haspelmath, M. Dryer, D. Gil, and B. Comrie). Max Planck Digital Library,
Munich.
Anderson, Stephen R. (1981). Why phonology isn't 'natural'. Linguistic Inquiry 12, 493-539.
Andruski, Jean E., Kuhl, Patricia K., and Hayashi, A. (1999). The acoustics of vowels in Japanese women's speech to infants and adults. In Proceedings of the 14th International Congress of
Phonetic Sciences, Berkeley, pp. 2177-9. University of California.
Archangeli, Diane and Pulleyblank, Douglas (1994). Grounded phonology. MIT Press,
Cambridge, MA.
Arciuli, Joanne and Cupples, Linda (2003). Effects of stress typicality during speeded grammat-
ical classification. Language and Speech, 46(4), 353-74.

Arvaniti, Amalia (2007). Greek phonetics: The state of the art. Journal of Greek Linguistics, 8,
97-208.
Ashby, F. Gregory and Maddox, W. Todd (1993). Relations between prototype, exemplar,
and decision bound models of categorization. Journal of Mathematical Psychology, 37,
372-400.
Aslin, Richard N. and Pisoni, David B. (1980). Some developmental processes in speech per-
ception. In Child phonology: Perception (eds. G. Yeni-Komshian, J. Kavanaugh, and C. Fer-
guson), Volume 2, pp. 67-96. Academic Press, New York.
Hennessy, Beth L., and Perey, Alan J. (1981). Discrimination of voice onset time
by human infants: New findings and implications for the effects of early experience. Child
Development, 52,1135-45.
Ausburn, Lynna J. and Ausburn, Floyd B. (1978). Cognitive styles: Some information and
implications for instructional design. Educational Communication and Technology, 26(4),
337-54.
Austin, Elizabeth J. (2005). Personality correlates of the broader autism phenotype as assessed
by the autism spectrum quotient (AQ). Personality and Individual Differences, 38, 451-60.
Aylett, Matthew and Turk, Alice (2004). The smooth signal redundancy hypothesis: A func-
tional explanation for relationships between redundancy, prosodic prominence, and dura-
tion in spontaneous speech. Language and Speech, 47(1), 31-56.
Baayen, R. Harald (2008). Analyzing linguistic data: A practical introduction to statistics. Cam-
bridge University Press, Cambridge.
Piepenbrock, Richard, and Gulikers, Leon (1996). CELEX2 (CD-ROM). Linguistic Data
Consortium, Philadelphia.
Babel, Molly (2009). Phonetic and social selectivity in speech accomodation. PhD thesis,
University of California, Berkeley.
(2010). Dialect convergence and divergence in New Zealand English. Language in Soci-
ety, 39(4), 437-56.
and McGuire, Grant (2010). A cross-modal account for synchronic and diachronic patterns of /f/ and /θ/. Unpublished manuscript, University of British Columbia and University
of California, Santa Cruz.
Bailey, Anthony, Couteur, Ann Le, Gottesman, Irving, Bolton, Patrick, Simonoff, Emily, Yuzda,
E., and Rutter, Michael (1995). Autism as a strongly genetic disorder: evidence from a British
twin study. Psychological Medicine, 25, 63-77.
Baker, Adam (2008). Addressing the actuation problem with quantitative models of sound
change. Penn Working Papers in Linguistics, 14(1), 1-13.
Baldi, Pierre and Itti, Laurent (2010). Of bits and wows: A Bayesian theory of surprise with
applications to attention. Neural Networks, 23(5), 649-66.
Baran, Jane A., Zlatin Laufer, Marsha, and Daniloff, Ray (1977). Phonological contrastivity in
conversation: a comparative study of voice onset time. Journal of Phonetics, 5, 339-50.
Barnes, Jonathan (2006). Strength and weakness at the interface: Positional neutralization in
phonetics and phonology. Mouton de Gruyter, Berlin.
Baron-Cohen, Simon (2002). The extreme male brain theory of autism. Trends in Cognitive
Sciences, 6, 248-54.
(2003). The essential difference: Men, women and the extreme male brain. Penguin, London.

Baron-Cohen, Simon, Richler, Jennifer, Bisarya, Dheraj, Gurunathan, Nhishanth, and Wheel-
wright, Sally (2003). The Systemising Quotient (SQ): An investigation of adults with
Asperger Syndrome or high functioning autism and normal sex differences. Philosophical
Transactions of the Royal Society, Series B, 358, 361-74.
and Wheelwright, Sally (2004). The Empathy Quotient: An investigation of adults with
Asperger Syndrome or High Functioning Autism and normal sex differences. Journal of
Autism and Developmental Disorders, 34(2), 163-75.
Hill, Jacqueline, Raste, Yogini, and Plumb, Ian (2001a). The Reading the Mind in the
Eyes Test revised version: a study with normal adults, and adults with Asperger syndrome
or high-functioning autism. Journal of Child Psychology and Psychiatry, 42, 241-51.
Skinner, Richard, Martin, Joanne, and Clubley, Emma (2001b). The Autism-
Spectrum Quotient (AQ): evidence from Asperger syndrome/high-functioning autism,
males, females, scientists and mathematicians. Journal of Autism and Developmental Dis-
orders, 31, 5-17.
Baudouin de Courtenay, Jan N. (1972a). An attempt at a theory of phonetic alternations [origi-
nally published in 1895]. In A Baudouin de Courtenay anthology: The beginnings of structural
linguistics (ed. E. Stankiewicz), Indiana University Studies in the History and Theory of
Linguistics, pp. 144-212. Indiana University Press, Bloomington. Edited and translated by
E. Stankiewicz (1972).
(1972b). The difference between phonetics and psychophonetics [originally published in 1927]. In A Baudouin de Courtenay anthology: The beginnings of structural linguistics (ed. E. Stankiewicz). Indiana University Press.
Baumbach, Ernst J. M. (1987). Analytical Tsonga grammar. University of South Africa, Pretoria.
Bayliss, Andrew P. and Tipper, Steven P. (2005). Gaze and arrow cueing of attention reveals
individual differences along the autism spectrum as a function of target context. British
Journal of Psychology, 96, 95-114.
Beckman, Jill (1997). Positional faithfulness, positional neutralization, and Shona vowel har-
mony. Phonology 14,1-46.
Beddor, Patrice S. (2009). A coarticulatory path to sound change. Language, 85(4), 785-832.
Brasher, Anthony, and Narayan, Chandan (2007). Applying perceptual methods to the
study of phonetic variation and sound change. In Experimental approaches to phonology (eds.
M.-J. Solé, P. S. Beddor, and M. Ohala), pp. 125-43. Oxford University Press.
Harnsberger, James D., and Lindemann, Stephanie (2002). Language-specific patterns of
vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates. Journal of
Phonetics, 30, 591-627.
and Krakow, Rena A. (1999). Perception of coarticulatory nasalization by speakers of
English and Thai: Evidence for partial compensation. Journal of the Acoustical Society of
America, 106(5), 2868-87.
and Goldstein, Louis (1986). Perceptual constraints and phonological change: a study
of nasal vowel height. In Phonology yearbook 3 (eds. C. Ewen and J. Anderson), pp. 197-217.
Cambridge University Press.
and Lindemann, Stefanie (2001). Patterns of perceptual compensation and their
phonological consequences. In The role of perceptual phenomena in phonology (eds. E. Hume
and K. Johnson), pp. 55-78. Academic Press.

Behrens, Susan and Blumstein, Sheila (1988). On the role of amplitude of the fricative noise
in the perception of place of articulation in voiceless fricative consonants. Journal of the
Acoustical Society of America, 84(3), 861-7.
Bell, Allan (1984). Language style as audience design. Language in Society, 13,145-204.
Bell, Alan, Brenier, Jason M., Gregory, Michelle, Girand, Cynthia, and Jurafsky, Dan (2009).
Predictability effects on durations of content and function words in conversational English.
Journal of Memory and Language, 60, 92-111.
Bell-Berti, Fredericka and Harris, Katherine (1976). Some aspects of coarticulation. Haskins
Laboratories Status Report on Speech Research, SR45/46,197-204.
Berg, Thomas (1998). Linguistic structure and change: An explanation from language processing.
Oxford University Press, Oxford.
Bergem, Dick R. Van (1993). Acoustic vowel reduction as a function of sentence accent, word
stress, and word class. Speech Communication, 12,1-23.
Bernstein Ratner, N. (1984). Patterns of vowel modification in mother-child speech. Journal of
Child Language, 11, 557-78.
Bessell, Nicola J. (1998). Local and non-local consonant-vowel interaction in Interior Salish.
Phonology, 15,1-40.
Bladon, Richard A. W. and Al-Bamerni, Ameen (1976). Coarticulation resistance in English /l/.
Journal of Phonetics, 4,137-50.
Blevins, Juliette (2004). Evolutionary phonology: The emergence of sound patterns. Cambridge
University Press, Cambridge.
(2005). Understanding antigemination: Natural or unnatural history. In Linguistic diver-
sity and language theories (eds. Z. Frajzyngier, D. Rood, and A. Hodges), pp. 203-34.
Benjamins, Amsterdam.
(2006a). A theoretical synopsis of Evolutionary Phonology. Theoretical Linguistics, 32(2),
117-66.
(2006b). Reply to commentaries. Theoretical Linguistics, 32, 245-56.
(2008a). Consonant epenthesis: Natural and unnatural histories. In Proceedings of the
Workshop on Explaining Linguistic Universals (ed. J. Good), pp. 79-107. Oxford University
Press, Oxford.
(2008b). Natural and unnatural sound patterns: A pocket field guide. In Naturalness and
iconicity in language (eds. K. Willems and L. D. Cuypere), pp. 121-48. John Benjamins,
Amsterdam.
and Garrett, Andrew (1998). The origins of consonant-vowel metathesis. Language, 74,
508-56.
(2004). The evolution of metathesis. In Phonetically based phonology (ed. B. Hayes,
R. Kirchner, and D. Steriade), pp. 117-56. Cambridge University Press, Cambridge.
and Wedel, Andrew (2009). Inhibited sound change: An evolutionary approach to lexical
competition. Diachronica, 26,143-83.
Bloomfield, Leonard (1933). Language. H. Holt and Company, New York.
Blumstein, Sheila E., Baker, Errol, and Goodglass, Harold (1977). Phonological factors in audi-
tory comprehension in aphasia. Neuropsychologia, 15,19-30.
Bod, Rens, Hay, Jennifer, and Jannedy, Stephanie (2003). Probabilistic linguistics. MIT Press.

Boersma, Paul (1998). Functional phonology. PhD thesis, University of Amsterdam.


and Hayes, Bruce (2001). Empirical tests of the gradual learning algorithm. Linguistic
Inquiry, 32, 45-86.
Bond, Zinny S. (1999). Slips of the ear: Errors in the perception of casual conversation. Academic
Press, San Diego.
Booij, Geert (1984). Principles and parameters in prosodic phonology. In Explanation of language universals (eds. B. Butterworth, B. Comrie, and Ö. Dahl), Linguistics 21, 249-80.
Bonnel, Anna, Mottron, Laurent, Peretz, Isabelle, Trudel, Manon, Gallun, Erick, and Bonnel,
Anne-Marie (2003). Enhanced pitch sensitivity in individuals with autism: A signal detec-
tion analysis. Journal of Cognitive Neuroscience, 15, 226-35.
Boomer, Donald S. and Laver, John D. M. (1968). Slips of the tongue. Disorders of Communi-
cation, 3, 1-12.
Boucher, Victor J. (2002). Timing relations in speech and identification of voice-onset times:
A stable perceptual boundary for voicing categories across speaking rates. Perception and
Psychophysics, 64(1), 121-30.
Bourhis, Richard Y. and Giles, Howard (1977). The language of intergroup distinctiveness.
In Language, ethnicity and intergroup relations (ed. H. Giles), pp. 119-35. Academic Press,
London.
Boyeldieu, Pascal (2009). Le quatrième ton du yulu. Journal of African Languages and Linguis-
tics, 30(2), 193-230.
Boyer, Abel (1700). The Royal Dictionary abridged. In two parts. I. French and English. II. English
and French. Containing near five thousand words more than any French and English dictionary
yet extant, besides the Royal. To which is added, the accenting of all English words, to facilitate
the pronunciation of the English tongue to foreigners. Printed for R. Clavel, H. Mortlock,
S. Lowndes, J. Robinson, D. Brown, W. Hensman, S. Crouch, E. Evets, J. Lawrence, R. Sare,
A. Churchill, S. Smith, T. Home, J. Taylor, T. Bennet, J. Knapton, J. Wyat, R. Wilkins, E.
Castle, D. Midwinter, London.
Bradlow, Ann R., Pisoni, David B., Akahane-Yamada, R., and Tohkura, Y. (1997). Training
Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on
speech production. Journal of the Acoustical Society of America, 101, 2299-310.
Bradshaw, Mary (1995). Tone on verbs in Suma. In Theoretical approaches to African linguistics
(ed. A. Akinlabi), pp. 255-72. Africa World Press, Inc., Trenton, NJ.
(1999). A crosslinguistic study of consonant-tone interaction. PhD thesis, Ohio State Uni-
versity.
Brent, Michael and Siskind, Jeffrey M. (2001). The role of exposure to isolated words in early
vocabulary development. Cognition, 81, B33-B44.
Broe, Michael B. (1996). A generalized information-theoretic measure for systems of phono-
logical classification and recognition. In Computational phonology in speech technology: Sec-
ond meeting of the ACL special interest group in computational phonology (ed. R. Sproat),
pp. 17-24. Association for Computational Linguistics.
Browman, Catherine P. and Goldstein, Louis (1986). Towards an articulatory phonology. In
Phonology Yearbook (eds. C. Ewen and J. Anderson), Volume 3, pp. 219-52. Cambridge
University Press, Cambridge.

Browman, Catherine P. and Goldstein, Louis (1988). Some notes on syllable structure in artic-
ulatory phonology. Phonetica, 45, 140-55.
(1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
(1990b). Tiers in articulatory phonology, with some implications for casual
speech. In Papers in laboratory phonology I: Between the grammar and the physics of
speech (eds. M. Beckman and J. Kingston), pp. 341-76. Cambridge University Press,
Cambridge.
Bullock, Daniel (2004). Adaptive neural models of queuing and timing in fluent action. Trends
in Cognitive Sciences, 8(9), 426-33.
and Rhodes, Bradley J. (2003). Competitive queuing for serial planning and performance.
In Handbook of brain theory and neural networks (ed. M. Arbib), pp. 241-4. MIT Press,
Cambridge, MA.
Burton, Martha W., Small, Steven L., and Blumstein, Sheila E. (2000). The role of seg-
mentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuro-
science, 12, 679-90.
Butskhrikidze, Marika and van de Weijer, Jeroen (2001). On v-metathesis in modern Geor-
gian. In Surface syllable structure and segment sequencing, pp. 91-101. Holland Institute of
Linguistics.
Bybee, Joan (1985). Morphology: a study of the relation between meaning and form. John Ben-
jamins, Amsterdam.
(2001). Phonology and language use. Cambridge University Press, Cambridge.
(2002). Word frequency and context of use in the lexical diffusion of phonetically condi-
tioned sound change. Language Variation and Change, 14, 261-90.
(2007). Frequency of use and the organization of language. Oxford University Press, New
York.
Chakraborti, Paromita, Jung, Dagmar, and Scheibman, Joanne (1998). Prosody and segmental
effect: Some paths of evolution for word stress. Studies in Language, 22, 267-314.
and Hopper, Paul (eds.) (2001). Frequency and the emergence of linguistic structure. John
Benjamins, Amsterdam.
Bye, Patrik (2011). Dissimilation. In The Blackwell companion to phonology (eds. M. van Oos-
tendorp, C. J. Ewen, E. Hume, and K. Rice), Chapter 63, pp. 1408-33. Wiley-Blackwell,
Oxford.
Byrd, Dani (1994). Articulatory timing in English consonant sequences. Volume 86, Working
Papers in Phonetics. Department of Linguistics, UCLA, Los Angeles.
and Saltzman, Elliot (1998). Intragestural dynamics of multiple prosodic boundaries.
Journal of Phonetics, 26, 173-99.
Caïtucoli, Claude (1978). Schèmes tonals et morphologie du verbe en masa. In Préalables à
la reconstruction du proto-tchadique (eds. J.-P. Caprile and H. Jungraithmayr), pp. 67-93.
SELAF, Paris.
Camacho, Arturo (2007). SWIPE': A sawtooth waveform inspired pitch estimator for speech and
music. PhD thesis, University of Florida.
Campbell, Lyle (2004). Historical linguistics: An introduction (2nd edn). MIT Press, Cambridge,
Mass.
Campbell-Kibler, Kathryn (2005). Listener perceptions of sociolinguistic variables: The case of
(ING). PhD thesis, Stanford University.
Carnoy, Albert J. (1918). The real nature of dissimilation. Transactions and Proceedings of the
American Philological Association, 49, 101-113.
Catford, John C. (1977). Diachronic phonetics. Department of Linguistics, University of Michi-
gan (originally intended to be Chapters 13 and 14 of Catford, Fundamental Problems in
Phonetics, 1977).
(2001). On Rs, rhotacism and paleophony. Journal of the International Phonetic Associa-
tion, 31, 171-85.
Chambers, Jack K. (2003). Sociolinguistic theory: Linguistic variation and its social significance
(2nd edn). Blackwell, Oxford.
Chandrasekaran, Bharath, Krishnan, Ananthanarayan, and Gandour, Jackson T. (2009).
Relative influence of musical and linguistic experience on early cortical processing of pitch
contours. Brain and Language, 108(1), 1-9.
Sampath, Padma D., and Wong, Patrick C. M. (2010). Individual variability in cue-
weighting and lexical tone learning. Journal of the Acoustical Society of America, 128(1),
456-65.
Chang, Steve, Plauché, Madeleine, and Ohala, John J. (2001). Markedness and consonant
confusion asymmetries. In The role of speech perception in phonology (eds. E. Hume and
K. Johnson), pp. 79-101. Academic Press, San Diego.
Chater, Nick, Tenenbaum, Josh B., and Yuille, Alan (2006). Probabilistic models of cognition:
Conceptual foundations. Trends in Cognitive Sciences, 10(7), 287-91.
Chen, Marilyn Y. (1997). Acoustic correlates of English and French nasalized vowels. Journal
of the Acoustical Society of America, 102(4), 2360-70.
Chen, Matthew (1970). Vowel length variation as a function of the voicing of the consonant
environment. Phonetica, 22, 129-59.
Cherry, Colin, Halle, Morris, and Jakobson, Roman (1953). Toward the logical description of
languages in their phonemic aspect. Language, 29(1), 34-46.
Cheshire, Jenny, Fox, Sue, Kerswill, Paul, and Torgersen, Eivind (2008). Ethnicity, friendship
network and social practices as the motor of dialect change: linguistic innovation in Lon-
don. In Sociolinguistica: International yearbook of European sociolinguistics, pp. 1-23. Max
Niemeyer Verlag.
Chitoran, loana and Cohn, Abigail C. (2009). Complexity in phonetics and phonology: gradi-
ence, categoriality, and naturalness. In Approaches to phonological complexity (eds. C. Coupé,
E. Marsico, F. Pellegrino, and I. Chitoran), pp. 19-46. Walter de Gruyter, Berlin and New
York.
Cho, Taehong (2001). Effects of morpheme boundaries on intergestural timing: Evidence from
Korean. Phonetica, 58,129-62.
Jun, Sun-Ah, and Ladefoged, Peter (2002). Acoustic and aerodynamic correlates of Korean
stops and fricatives. Journal of Phonetics, 30, 193-228.
Chomsky, Noam (1986). Knowledge of language: Its nature, origins and use. Praeger,
New York.
and Halle, Morris (1968). The sound pattern of English. Harper and Row, New York.
Choudhury, Monojit (2007). Computational models of real world phonological change. PhD
thesis, Indian Institute of Technology, Kharagpur, India.
Christophe, Anne, Peperkamp, Sharon, Pallier, Christophe, Block, Eliza, and Mehler, Jacques
(2004). Phonological phrase boundaries constrain lexical access: I. Adult data. Journal of
Memory and Language, 51, 523-47.
Clark, Herbert H. and Murphy, Gregory L. (1982). Audience design in meaning and reference.
In Language and comprehension (eds. J.-F. Le Ny and W. Kintsch), Vol. 9, pp. 287-99. North-
Holland, Amsterdam.
Clayards, Meghan (2008). The ideal listener: Making optimal use of acoustic-phonetic cues for
word recognition. PhD thesis, University of Rochester.
Tanenhaus, Michael K., Aslin, Richard, and Jacobs, Robert A. (2008). Perception of speech
reflects optimal use of probabilistic speech cues. Cognition, 108, 804-9.
Clements, George N. (1985). The geometry of phonological features. Phonology Yearbook, 2,
225-52.
(2005). Universal trends vs. language-particular variation in feature specification: Com-
ments on a paper by Elan Dresher. Handout of presentation at the Workshop on Phonolog-
ical Features, CUNY, New York, March 10-11, 2005.
and Hume, Elizabeth V. (1995). The internal organization of speech sounds. See Gold-
smith (1995), pp. 245-306.
Clopper, Cynthia G. and Pisoni, David B. (2004). Some acoustic cues for the perceptual cate-
gorization of American English regional dialects. Journal of Phonetics, 32(1), 111-40.
Cohn, Abigail C. (1992). The consequences of dissimilation in Sundanese. Phonology, 9,
199-220.
(1993). Nasalisation in English: phonology or phonetics. Phonology, 10, 43-81.
(1998). The phonetics-phonology interface revisited: where's phonetics? Texas Linguistic
Forum, 41, 25-40.
(2006). Is there gradient phonology? In Gradience in grammar: generative perspectives
(eds. G. Fanselow, C. Féry, and M. Schlesewsky), pp. 25-44. Oxford University Press.
(2007). Phonetics in phonology and phonology in phonetics. Working Papers of the
Cornell Phonetics Lab, 16,1-31.
and Riehl, Anastasia (2008). The internal structure of nasal-stop sequences: Evidence
from Austronesian. Paper presented at Laboratory Phonology 11, post-conference draft,
August 22, 2008.
Coleman, John and Pierrehumbert, Janet B. (1997). Stochastic phonological grammars and
acceptability. In Proceedings of the 3rd Meeting of the ACL Special Interest Group in Com-
putational Phonology, pp. 49-56. Association for Computational Linguistics.
Cooper, Robin P. and Aslin, Richard N. (1989). The language environment of the young
infant: implications for early perceptual development. Canadian Journal of Psychology, 43,
247-65.
Court, Christopher (1970). Nasal harmony and some Indonesian sound laws. In Pacific Lin-
guistics, Series C, No. 13 (eds. S. Wurm and D. Laycock). Australian National University,
Canberra.
Cover, Thomas and Thomas, Joy (2006). Elements of information theory (2nd edn). Wiley-
Interscience, New York.
Crewther, David, Crewther, Daniel, Ashton, Melanie, and Kuang, Ada (2010). Left global
visual hemineglect in high Autism-spectrum Quotient (AQ) individuals. Journal of Vision,
10, 358.
Croft, William (1990). Typology and universals, Chapter 3. Markedness in typology, pp. 64-94.
Cambridge University Press, Cambridge.
Crosswhite, Katherine M. (2004). Vowel reduction. In Phonetically based phonology
(eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 191-231. Cambridge University Press,
Cambridge.
Crowley, Terry and Bowern, Claire (2009). Introduction to historical linguistics. Oxford Univer-
sity Press, Oxford.
Culicover, Peter W. and Nowak, Andrzej (2002). Markedness, antisymmetry and complexity of
constructions. In Linguistic variation yearbook, pp. 5-30. John Benjamins, Amsterdam.
(2003). Dynamical grammar: Minimalism, acquisition, and change. Oxford Univer-
sity Press, Oxford.
Cutler, Anne and Norris, Dennis (1979). Monitoring sentence comprehension. In Psycholin-
guistic studies presented to Merrill Garrett (eds. W. E. Cooper and E. C. T. Walker),
pp. 113-34. Erlbaum, New Jersey.
Daland, Robert, Sims, Andrea D., and Pierrehumbert, Janet B. (2007). Much ado about nothing:
A social network model of Russian paradigmatic gaps. In Proceedings of the 45th Annual
Meeting of the Association of Computational Linguistics, pp. 936-43. Association for Com-
putational Linguistics, Prague, Czech Republic.
Dalston, Rodger M. (1975). Acoustic characteristics of English /w,r,l/ spoken correctly by young
children and adults. Journal of the Acoustical Society of America, 57(2), 462-9.
Daneman, Meredyth and Carpenter, Patricia A. (1983). Individual differences in integrating
information between and within sentences. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 9, 561-84.
D'Ausilio, Alessandro, Pulvermüller, Friedemann, Salmas, Paola, Bufalari, Ilaria, Begliomini,
Chiara, and Fadiga, Luciano (2009). The motor somatotopy of speech perception. Current
Biology, 19, 381-5.
Davidson, Lisa (2005). Addressing phonological questions with ultrasound. Clinical Linguistics
and Phonetics, 19, 619-33.
(2006a). Phonology, phonetics, or frequency: Influences on the production of non-native
sequences. Journal of Phonetics, 34(1), 104-37.
(2006b). Phonotactics and articulatory coordination interact in phonology: evidence
from non-native production. Cognitive Science, 30(5), 837-62.
(2007). The relationship between the perception of non-native phonotactics and loanword
adaptation. Phonology, 24, 261-86.
(2011). Phonetic, phonemic, and phonological factors in cross-language discrimination
of phonotactic contrasts. Journal of Experimental Psychology: Human Perception and Perfor-
mance, 37(1), 270-82.
Davis, Colin J. and Perea, Manuel (2005). Buscapalabras: A program for deriving orthographic
and phonological neighborhood statistics and other psycholinguistic indices in Spanish.
Behavior Research Methods, 37(4), 665-71.
de Boer, Bart (2000). Self-organization in vowel systems. Journal of Phonetics, 28, 441-65.
(2001). The origins of vowel systems. Oxford University Press, Oxford.
and Kuhl, Patricia (2003). Investigating the role of infant-directed speech with a computer
model. Acoustics Research Letters On-line, 4(4), 129-34.
Delattre, Pierre (1969). An acoustic and articulatory study of vowel reduction in four languages.
International Review of Applied Linguistics and Language Teaching, VII, 295-325.
Dell, Gary S. (1986). A spreading-activation theory of retrieval in sentence production. Psycho-
logical Review, 93, 283-321.
(1990). Effects of frequency and vocabulary type on phonological speech errors. Language
and Cognitive Processes, 5(4), 313-49.
Demuth, Katherine and Johnson, Mark (2003). Truncation to subminimal words in early
French. Canadian Journal of Linguistics, 48(3/4), 211-41.
Denis, Derek (2010). Passive diagnostics of contrast. Presented at Montreal-Ottawa-Toronto
Phonology Workshop, Carleton University, Ottawa, Ontario, March 2010.
Díaz, Begoña, Baus, Cristina, Escera, Carles, Costa, Albert, and Sebastián-Gallés, Núria (2008).
Brain potentials to native phoneme discrimination reveal the origin of individual differences
in learning the sounds of a second language. Proceedings of the National Academy of Sci-
ences, 105(42), 16083-8.
Diehl, Randy L. (2008). Acoustic and auditory phonetics: The adaptive design of speech sound
systems. Philosophical Transactions of the Royal Society, 363, 965-78.
Dieth, Eugen (1932). A grammar of the Buchan dialect (Aberdeenshire), descriptive and histori-
cal. W. Heffer and Sons, Cambridge.
Dijksterhuis, Ap and Bargh, John A. (2001). The perception-behavior expressway: Automatic
effects of social perception on social behavior. In Advances in experimental social psychology
(ed. M. P. Zanna), Volume 33, pp. 1-40. Academic Press, San Diego.
Dimmendaal, Gerrit J. (1983). The Turkana language. Foris Publications, Dordrecht.
Dimov, Svetlin, Katseff, Shira, and Johnson, Keith (in press). Social and personality variables
in compensation for altered auditory feedback. In The initiation of sound change: Percep-
tion, production, and social factors (eds. M. J. S. Sabater and D. Recasens). John Benjamins,
Amsterdam.
Donegan, Patricia and Stampe, David (1979). The study of natural phonology. In Current
approaches to phonological theory (ed. D. Dinnsen), pp. 126-73. Indiana University Press,
Bloomington.
Downing, Laura J. (2009). On pitch lowering not linked to voicing: Nguni and Shona group
depressors. Language Sciences, 31(2-3), 179-98.
Doyle, Melanie and Walker, Robin (2001). Curved saccade trajectories: Voluntary and reflex-
ive saccades curve away from irrelevant distractors. Experimental Brain Research, 139,
333-44.
Dras, Mark and Harrison, K. David (2002). Emergent behavior in phonological pattern change.
In Artificial Life VIII (eds. R. K. Standish, M. A. Bedau, and H. A. Abbass), pp. 390-3. Oxford
University Press, Oxford.
Dresher, Elan (2003). Contrast and asymmetry in inventories. In Asymmetry in gram-
mar: Morphology, phonology, acquisition (ed. A. di Sciullo), pp. 237-59. John Benjamins,
Amsterdam.
(2009). The contrastive hierarchy in phonology. Cambridge University Press.
Dressler, Wolfgang (1976). Morphologization of phonological processes (are there distinct
morphonological processes?) In Linguistic studies presented to Joseph H. Greenberg (ed.
A. Juilland), pp. 313-37. Anma Libri, Saratoga.
(1985). Morphophonology: The dynamics of derivation. Karoma Publishers, Ann Arbor.
Dupoux, Emmanuel, Kakehi, Kazuhiko, Hirose, Yuki, Pallier, Christophe, and Mehler, Jacques
(1999). Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psy-
chology: Human Perception and Performance, 25, 1568-78.
Eckert, Penelope (1988). Adolescent social structure and the spread of linguistic change. Lan-
guage in Society, 17(2), 183-207.
(1989). The whole woman: Sex and gender differences in variation. Language Variation
and Change, 1(3), 245-67.
(2000). Linguistic variation as social practice. Blackwell Press, London.
Eilers, Rebecca E. (1977). Context-sensitive perception of naturally produced stop and fricative
consonants by infants. Journal of the Acoustical Society of America, 61(5), 1321-36.
and Benito-Garcia, Carmen R. (1984). The acquisition of voicing contrasts in Spanish and
English learning infants and children: A longitudinal study. Journal of Child Language, 11,
313-36.
and Minifie, Fred D. (1975). Fricative discrimination in early infancy. Journal of Speech
and Hearing Research, 18, 158-67.
Wilson, W. R., and Moore, J. M. (1977). Developmental changes in speech discrimination
in infants. Journal of Speech and Hearing Research, 20, 766-80.
(1979). Speech discrimination in the language-innocent and the language-
wise: a study in the perception of voice onset time. Journal of Child Language, 6,
1-18.
Eimas, Peter D. (1974). Linguistic processing of speech by young infants. In Language per-
spectives: Acquisition, retardation and intervention (eds. R. Schiefelbusch and L. Lloyd),
pp. 164-92. University Park Press, Baltimore.
Siqueland, Einar R., Jusczyk, Peter W., and Vigorito, James (1971). Speech perception in
infants. Science, 171, 303-6.
Ellington, John (1977). Aspects of the Tiene language. PhD thesis, University of Wisconsin,
Madison.
Elman, Jeffrey L. (2003). Generalization from sparse input. In Proceedings of the 38th Annual
Meeting of the Chicago Linguistic Society. University of Chicago Press.
Englund, Kjellrun T. (2005). Voice onset time in infant directed speech over the first six months.
First Language, 25(2), 219-34.
Engstrand, Olle (1988). Articulatory correlates of stress and speaking rate in Swedish CVC
utterances. Journal of the Acoustical Society of America, 85, 1863-75.
Espy-Wilson, Carol (1992). Acoustic measures for linguistic features distinguishing the
semivowels /w j r l/ in American English. Journal of the Acoustical Society of America, 92(2),
736-51.
Fadiga, Luciano, Craighero, Laila, Buccino, Giovanni, and Rizzolatti, Giacomo (2002). Speech
listening specifically modulates the excitability of tongue muscles: a TMS study. European
Journal of Neuroscience, 15, 399-402.
Fant, Gunnar (1960). Acoustical theory of speech production. Mouton, The Hague.
Feather, Norman T. (1982). Expectations and actions: Expectancy-value models in psychology.
Lawrence Erlbaum, Hillsdale, New Jersey.
Feldman, Naomi H., Griffiths, Thomas L., and Morgan, James L. (2009). Learning phonetic
categories by learning a lexicon. In Proceedings of the 31st Annual Conference of the Cognitive
Science Society (eds. N. Taatgen and H. van Rijn), pp. 2208-13. Cognitive Science Society,
Austin, TX.
Ferguson, Charles A. (1973). Fricatives in child language acquisition. Papers and Reports on
Child Language Development, 6, 61-85.
Fernald, Anne (1992). Human maternal vocalizations to infants as biologically relevant sig-
nals: An evolutionary perspective. In The adapted mind (eds. J. Barkow, L. Cosmides, and
J. Tooby), pp. 391-428. Oxford University Press, New York.
Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., and Fukui, I. (1989). A
cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal
infants. Journal of Child Language, 16(3), 477-501.
Fidelholtz, James L. (1975). Word frequency and vowel reduction in English. CLS, 11, 200-13.
Finley, Sara (2008). Formal and cognitive restrictions on vowel harmony. PhD thesis, Johns
Hopkins University.
Flemming, Edward (1996). Evidence for constraints on contrast: the dispersion theory of
contrast. UCLA Working Papers in Phonology, 1, 86-106.
(2001). Scalar and categorical phenomena in a unified model of phonetics and phonology.
Phonology, 18, 7-44.
(2002). Auditory representations in phonology. Routledge, New York.
(2004). Contrast and perceptual distinctiveness. In Phonetically based phonology
(eds. B. Hayes, R. Kirchner, and D. Steriade). Cambridge University Press, Cambridge.
Fletcher, Janet (2004). An EMA/EPG study of vowel-to-vowel articulation across velars in
Southern British English. Clinical Linguistics and Phonetics, 18(6), 577-92.
Flynn, Darin and Fulop, Sean (2008). Dentals are grave. Unpublished manuscript, University
of Calgary and California State University, Fresno.
Fontaney, Louise (1980). Le verbe. In Eléments de description du punu (ed. F. Nsuka-Nkutsi),
pp. 51-114. CRLS, Université Lyon II.
Fosler-Lussier, Eric and Morgan, Nelson (1999). Effects of speaking rate and word fre-
quency on pronunciations in conversational speech. Speech Communication, 29(2-4),
137-58.
Foulkes, Paul, Docherty, Gerry, and Watt, Dominic (2005). Phonological variation in child-
directed speech. Language, 81(1), 177-206.
Fowler, Carol A. (1981). A relationship between coarticulation and compensatory shortening.
Phonetica, 38, 35-50.
Francis, Alexander L. and Nusbaum, Howard C. (2002). Selective attention and the acquisi-
tion of new phonetic categories. Journal of Experimental Psychology: Human Perception and
Performance, 28(2), 349-66.
Frank, Austin F. and Jaeger, T. Florian (2008). Speaking rationally: Uniform information density
as an optimal strategy for language production. In Proceedings of the 30th Annual Meeting
of the Cognitive Science Society (eds. B. C. Love, K. McRae, and V. M. Sloutsky), pp. 933-8.
Cognitive Science Society, Austin, TX.
Frazier, Melissa (2005). Output-output faithfulness to moraic structure: evidence from Ameri-
can English. In North East Linguistics Conference, U Mass, Amherst (eds. C. Davis, A. R. Deal,
and Y. Zabbal), pp. 1-14. GLSA, Amherst, Mass.
Frisch, Stefan A. (2004). Language processing and segmental OCP effects. In Phonetically based
phonology (eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 346-71. Cambridge University
Press, Cambridge.
Pierrehumbert, Janet B., and Broe, Michael B. (2004). Similarity avoidance and the OCP.
Natural Language and Linguistic Theory, 22, 179-228.
Fromkin, Victoria A. (1971). The non-anomalous nature of anomalous utterances. Lan-
guage, 47, 27-52.
(ed.) (1973). Speech errors as linguistic evidence. Mouton, The Hague.
(2000). Fromkin's speech error database. Online database, Max Planck Institute
for Psycholinguistics, Nijmegen (http://www.mpi.nl/resources/data/fromkins-speech-
error-database/).
Gahl, Susanne (2008). Time and thyme are not homophones: The effect of lemma frequency on
word durations in spontaneous speech. Language, 84, 474-96.
Gaissmaier, Wolfgang (2008). The smart potential behind probability matching. Cognition, 109,
416-22.
Galantucci, Bruno (2005). An experimental study of the emergence of human communication
systems. Cognitive Science, 29, 737-67.
Fowler, Carol A., and Goldstein, Louis (2009). Perceptuomotor compatibility effects in
speech. Attention, Perception, and Psychophysics, 71(5), 1138-49.
and Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic
Bulletin and Review, 13, 361-77.
Galinsky, Adam D., Magee, Joe C., Inesi, M. Ena, and Gruenfeld, Deborah H. (2006). Power
and perspectives not taken. Psychological Science, 17, 1068-74.
Gallagher, Gillian (2010). The perceptual basis of long-distance laryngeal restrictions. PhD thesis,
MIT.
Gallese, Vittorio, Fadiga, Luciano, Fogassi, Leonardo, and Rizzolatti, Giacomo (1996). Action
recognition in the premotor cortex. Brain, 119, 593-609.
Gandour, Jack, Petty, Soranee H., Dardarananda, Rochana, Dechongkit, Sumalee, and Mukn-
goen, Sunee (1986). The acquisition of the voicing contrast in Thai: A study of voice onset
time in word-initial stop consonants. Journal of Child Language, 13, 561-72.
Potisuk, Siripong, and Dechongkit, Sumalee (1994). Tonal coarticulation in Thai. Journal
of Phonetics, 22(4), 477-92.
Garnica, Olga Kaunoff (1977). Some prosodic and paralinguistic features of speech to young
children. In Talking to children (eds. C. Snow and C. Ferguson), pp. 63-88. Cambridge
University Press, Cambridge.
Garrett, Andrew and Blevins, Juliette (2009). Analogical morphophonology. In The nature of
the word: Studies in honor of Paul Kiparsky (eds. K. Hanson and S. Inkelas), pp. 527-45. MIT
Press, Cambridge, Mass.
Gay, Thomas (1974). A cinefluorographic study of vowel production. Journal of Phonetics, 2,
255-66.
(1977). Articulatory movements in VCV sequences. Journal of the Acoustical Society of
America, 62, 183-93.
Geisler, Hans (1994). Metathese im Sardischen. Vox Romanica, 53, 106-37.
Geisler, Wilson S. (2003). Ideal observer analysis. In The visual neurosciences (eds. L. M.
Chalupa and J. S. Werner), Volume 1, pp. 825-37. The MIT Press.
Gelman, Andrew (2008). Scaling regression inputs by dividing by two standard deviations.
Statistics in Medicine, 27, 2865-73.
and Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cam-
bridge University Press, New York.
Gessner, Suzanne and Hansson, Gunnar Ólafur (2004). Anti-homophony effects in Dakelh
(Carrier) valence morphology. In Proceedings of the 30th Annual Meeting of the Berkeley
Linguistics Society (eds. M. Ettlinger, N. Fleisher, and M. Park-Doob), pp. 93-104. Berkeley
Linguistics Society, Berkeley.
Ghez, C., Favilla, M., Ghilardi, M., Gordon, J., Bermejo, R., and Pullman, S. (1997). Discrete
and continuous planning of hand movements and isometric force trajectories. Experimental
Brain Research, 115, 217-33.
Giles, Howard and Powesland, Peter F. (1975). Speech styles and social evaluation. Academic
Press, New York.
Givón, Talmy (1971). Historical syntax and synchronic morphology: An archeologist's field trip.
Chicago Linguistic Society, 7, 394-415.
(1979). On understanding grammar. Academic Press, New York.
Goldenfeld, Nigel, Baron-Cohen, Simon, and Wheelwright, Sally (2005). Empathizing and
systemizing in males, females, and autism. Clinical Neuropsychiatry, 2, 338-45.
Goldinger, Stephen D. (1989). Movement dynamics and the nature of errors in tongue twisters:
An observation and research proposal. No. 15 in Research on Speech Perception, Progress
Reports. Speech Research Laboratory, Indiana University.
(1992). Words and voices: Implicit and explicit memory for spoken words. PhD thesis,
Indiana University.
(1996). Words and voices: Episodic traces in spoken word identification and recogni-
tion memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22,
1166-83.
(1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105,
251-79.
Goldsmith, John (ed.) (1995). The handbook of phonological theory. Blackwell, Cambridge,
Mass.
(1998). On information theory, entropy, and phonology in the 20th century. In Royau-
mont CTIP II Round Table on Phonology in the 20th Century, Royaumont (June 26, 1998).
(2002). Probabilistic models of grammar: Phonology as information minimization.
Phonological Studies, 5, 21-46.
(2007). Probability for linguists. MS. University of Chicago.
and Riggle, Jason (to appear). Information theoretic approaches to phonological structure:
the case of Finnish vowel harmony. Natural Language and Linguistic Theory.
Goldstein, Louis and Fowler, Carol (2003). Articulatory phonology: a phonology for public lan-
guage use. In Phonetics and phonology in language comprehension and production: Differences
and similarities (eds. A. Meyer and N. Schiller), pp. 159-207. Mouton de Gruyter, Berlin.
Gordon, Peter and Alegre, Maria (1999). Is there a dual system for regular inflections? Brain
and Language, 68, 212-17.
Gorman, Kyle (2009). Hierarchical regression modeling for language research. Technical
report, Institute for Research in Cognitive Science, University of Pennsylvania.
Goto, Hiromu (1971). Auditory perception by normal Japanese adults of the sounds 'L' and 'R'.
Neuropsychologia, 9, 317-23.
Goudaillier, Jean-Pierre (1987). Einige Spracheigentümlichkeiten der Lëtzebuergeschen
Mundarten im Licht der instrumentellen Phonetik. In Aspekte des Lëtzebuergeschen (ed. J.-P.
Goudaillier), pp. 197-230. Buske Verlag, Hamburg.
Grammont, Maurice (1895). La dissimilation consonantique dans les langues indo-européennes
et dans les langues romanes: Thèse présentée à la Faculté des Lettres de Paris. Darantière, Dijon.
(1933). Traité de phonétique. Delagrave, Paris.
(1939). Traité de phonétique (2nd edn.). Delagrave, Paris.
Green, David M. and Swets, John A. (1966). Signal detection theory and psychophysics. Wiley,
New York.
Greenberg, Joseph H. (1966). Language universals, with special reference to feature hierarchies.
Mouton de Gruyter, Berlin.
Greenlee, Mel and Ohala, John J. (1980). Phonetically motivated parallels between child
phonology and historical sound change. Language Sciences, 2(2), 283-308.
Grimes, Barbara F., Grimes, Joseph E., and Pittman, Richard S. (eds.) (2000). Ethnologue:
Languages of the world, 14th Edition. Summer Institute of Linguistics, Dallas, TX.
Grinter, Emma J., Maybery, Murray T., Van Beek, Pia L., Pellicano, Elizabeth, Badcock,
Johanna C., and Badcock, David R. (2009). Global visual processing and self-rated autistic-
like traits. Journal of Autism and Developmental Disorders, 39, 1278-90.
Grossberg, Stephen (1978). A theory of human memory: Self-organization and performance
of sensory-motor codes, maps, and plans. In Progress in theoretical biology (eds. R. Rosen
and F. Snell), Volume 5, pp. 233-374. Academic Press, New York.
(2003). Resonant neural dynamics of speech perception. Journal of Phonetics, 31(3-4),
423-45.
Grosvald, Michael (2009). Interspeaker variation in the extent and perception of long-distance
vowel-to-vowel coarticulation. Journal of Phonetics, 37(2), 173-88.
Guion, Susan G. (1995). Word frequency effects among homonyms. In Texas Linguistic Forum
35: Papers in Phonetics and Phonology (eds. T. C. Carleton, J. Elorrieta, and M. J. Moosally),
pp. 103-15. Department of Linguistics, University of Texas at Austin, Austin.
(1998). The role of perception in the sound change of velar palatalization. Phonetica,
55, 18-52.
Clark, J. J., Harada, Tetsuo, and Wayland, Ratree P. (2003). Factors affecting stress place-
ment for English nonwords include syllabic structure, lexical class, and stress patterns of
phonologically similar words. Language and Speech, 46(4), 403-27.
Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge University Press,
Cambridge.
Guy, Gregory (1992). Explanation in variable phonology: An exponential model of morpho-
logical constraints. Language Variation and Change, 3, 1-22.
Hagège, Claude and Haudricourt, André (1978). La phonologie panchronique. Presses Univer-
sitaires de France, Paris.
Haith, Marshall, Hazan, Cindy, and Goodman, Gail (1988). Expectation and anticipation of
dynamic visual events by 3.5-month-old babies. Child Development, 59, 467-97.
Hajek, John (1997). Universals of sound change in nasalization. Blackwell, Oxford and Boston.
Hale, John (2003). Grammar, uncertainty and sentence processing. PhD thesis, Johns Hopkins
University.
Hale, Mark (2007). Theory and method in historical linguistics. Oxford University Press,
Oxford.
and Reiss, Charles (1998). Formal and empirical arguments concerning phonological
acquisition. Linguistic Inquiry, 29, 656-83.
(2000). Phonology as cognition. In Phonological knowledge (eds. N. Burton-Roberts,
P. Carr, and G. Docherty), pp. 161-84. Oxford University Press.
(2008). The phonological enterprise. Oxford University Press, Oxford.
Hall, Beatrice L. and Hall, Richard M. R. (1980). Nez Perce vowel harmony: an Africanist
explanation and some theoretical questions. In Issues in vowel harmony (ed. R. Vago),
pp. 201-36. Benjamins, Amsterdam.
Hall, Kathleen Currie (2009). A probabilistic model of phonological relationships from contrast
to allophony. PhD thesis, The Ohio State University.
Halle, Morris (1972). Theoretical issues in phonology in the 1970s. In Proceedings of the
Seventh International Congress of Phonetic Sciences (eds. A. Rigault and R. Charbonneau),
pp. 179-205. Mouton, The Hague.
and Stevens, Kenneth N. (1967). On the mechanism of glottal vibration for vowels and
consonants. MIT Research Laboratory of Electronics Quarterly Progress Report, 85, 267-70.
(1971). A note on laryngeal features. MIT Research Laboratory of Electronics Quar-
terly Progress Report, 101, 198-213.
(1991). Knowledge of language and the sounds of speech. In Music, language, speech
and brain (eds. J. Sundberg, L. Nord, and R. Carlson), pp. 1-19. Macmillan Press, London.
Hallé, Pierre, Segui, Juan, Frauenfelder, Uli, and Meunier, Christine (1998). Processing of illegal
consonant clusters: A case of perceptual assimilation. Journal of Experimental Psychology:
Human Perception and Performance, 24(2), 592-608.
Han, Mieko S. and Weizman, Raymond S. (1970). Acoustic features of Korean /P, T, K/, /p, t, k/
and /ph, th, kh/. Phonetica, 22, 112-28.
Hansson, Gunnar (2008). Diachronic explanations of sound patterns. Language and Linguistics
Compass, 2, 859-93.
(2010). Consonant harmony: Long-distance interactions in phonology. University of Cali-
fornia Press, Berkeley.
Happé, Francesca and Frith, Uta (2006). The weak coherence account: Detail-focused cognitive
style in autism spectrum disorders. Journal of Autism and Developmental Disorders, 36, 5-25.
Hardcastle, William J. (1985). Some phonetic and syntactic constraints on lingual coarticula-
tion in stop consonant sequences. Speech Communication, 4, 247-63.
Hare, Mary and Elman, Jeff (1995). Learning and morphological change. Cognition, 56, 61-98.
Harries-Delisle, Helga (1978). Contrastive emphasis and cleft sentences. In Universals of human
language, Vol. 4: Syntax (ed. J. H. Greenberg), pp. 419-86. Stanford University Press.
Harrington, Jonathan, Kleber, Felicitas, and Reubold, Ulrich (2008). Compensation for coar-
ticulation, /u/-fronting, and sound change in standard southern British: An acoustic and
perceptual study. Journal of the Acoustical Society of America, 123(5), 2825-35.
Harris, John (1985). Phonological variation and change: Studies in Hiberno-English. Cambridge
University Press, New York.
Hasegawa, Yoko (1999). Pitch accent and vowel devoicing in Japanese. In Proceedings of the
XIVth International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999 (eds.
J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, and A. C. Bailey), pp. 523-6. ICPhS.
Hawkins, Sarah (2003). Roles and representations of systematic fine phonetic detail in speech
understanding. Journal of Phonetics, 31, 373-405.
Hay, Jennifer (2003). Causes and consequences of word structure. Routledge, New York and
London.
and Sudbury, Andrea (2005). How rhoticity became /r/-sandhi. Language, 81, 799-823.
Hayes, Bruce, Kirchner, Robert, and Steriade, Donca (2004). Phonetically based phonology.
Cambridge University Press, Cambridge.
and Londe, Zsuzsa C. (2006). Stochastic phonological knowledge: the case of Hungarian
vowel harmony. Phonology, 23(1), 59-104.
and Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic
learning. Linguistic Inquiry, 39(3), 379-440.
Hayward, Richard J. (1990). Notes on the Aari language. In Omotic language studies (ed.
R. J. Hayward), pp. 425-93. School of Oriental and African Studies, University of London.
Hedrick, M. and Ohde, R. N. (1993). Effect of relative amplitude of frication on perception of
place of articulation. Journal of the Acoustical Society of America, 94(4), 2006-26.
Heijmans, Linda (2003). The relationship between tone and vowel length in two neighbor-
ing Dutch Limburgian dialects. In Development in prosodic systems (eds. P. Fikkert and
H. Jacobs), pp. 7-45. Mouton de Gruyter, New York.
Heike, Georg (1972). Quantitative und qualitative Differenzen von /a(:)/-Realisationen im
Deutschen. In Proceedings of the VIIth International Congress of Phonetic Sciences, Prague,
pp. 725-9.
Heine, Bernd, Claudi, Ulrike, and Hünnemeyer, Friederike (1991). Grammaticalization: a con-
ceptual framework. University of Chicago Press.
Henton, Caroline and Bladon, Anthony (1988). Creak as a sociophonetic marker. In Language,
speech and mind (eds. L. M. Hyman and C. N. Li), pp. 3-29. Routledge, London and New
York.
Herzog, Eugen (1904). Streitfragen der romanischen Philologie. M. Niemeyer, Halle.
Hewitt, B. George (1995). Georgian: A structural reference grammar. John Benjamins, Amster-
dam and Philadelphia.
Hickok, Gregory and Poeppel, David (2004). Dorsal and ventral streams: A framework for
understanding aspects of the functional anatomy of language. Cognition, 92, 67-99.
Hillenbrand, James M., Clark, M. J., and Nearey, Terence M. (2001). Effects of consonantal
environment on vowel formant patterns. Journal of the Acoustical Society of America, 109,
748-63.
Hirata, Yukari and Tsukada, Kimiko (2003). The effects of speaking rate and vowel length on
the formant movements in Japanese. In Proceedings of the 2003 Texas Linguistics Society
Conference: Coarticulation in Speech Production and Perception (eds. A. Agwuele, W. Warren,
and S.-H. Park), Somerville, pp. 73-85. Cascadilla Proceedings Project.
Hitchcock, Clara (1903). The psychology of expectation. The Psychological Review, 5(3), 1-78.
Hock, Hans Henrich (1991). Principles of historical linguistics (2nd edn). Mouton de Gruyter,
Berlin.
and Joseph, Brian D. (1996). Language history, language change, and language rela-
tionship: An introduction to historical and comparative linguistics. Mouton de Gruyter,
Berlin.
Hockett, Charles F. (1955). A manual of phonology. International Journal of American Linguis-
tics, memoir 11.
(1965). Sound change. Language, 41, 185-202.
Holmberg, Tristan L., Morgan, Kathleen A., and Kuhl, Patricia K. (1977). Speech perception in
early infancy: Discrimination of fricative consonants. Presented at the 94th Meeting of the
Acoustical Society of America.
Holt, Lori L. and Lotto, Andrew J. (2006). Cue weighting in auditory categorization: Implica-
tions for first and second language acquisition. Journal of the Acoustical Society of America,
119, 3059-71.
and Kluender, Keith (2000). Neighboring spectral context influences vowel identifi-
cation. Journal of the Acoustical Society of America, 108(2), 710-22.
Hombert, Jean-Marie (1977). Development of tones from vowel height. Journal of Phonetics,
5, 9-16.
(1978). Consonant types, vowel quality, and tone. In Tone: a linguistic survey (ed. V. A.
Fromkin), pp. 77-111. Academic Press, New York.
Ohala, John J., and Ewan, William G. (1979). Phonetic explanations for the development
of tones. Language, 55, 37-58.
Hooper, Joan B. (1976a). Introduction to natural generative phonology. Academic Press, New
York.
(1976b). Word frequency in lexical diffusion and the source of morphophonological
change. In Current progress in historical linguistics (ed. W. Christie), pp. 95-105, North-
Holland, Amsterdam.
Hopper, Paul (1987). Emergent grammar. Berkeley Linguistics Society, 13, 139-57.
Hosmer, David W. and Lemeshow, Stanley (1989). Applied logistic regression. John Wiley and
Sons, New York.
Houde, John F. and Jordan, Michael I. (1998). Sensorimotor adaptation in speech production.
Science, 279, 1213-16.
Houghton, George and Tipper, Steven (1996). Inhibitory mechanisms of neural and cognitive
control: Applications to selective attention and sequential action. Brain and Cognition, 30,
20-43.
House, Arthur S. (1961). On vowel duration in English. Journal of the Acoustical Society of
America, 33, 1174-8.
Howe, Darin and Fulop, Sean (2005). Acoustic features in Athabascan. Unpublished
manuscript, University of Calgary and California State University, Fresno.
Hruschka, Daniel, Christiansen, Morten, Blythe, Richard, Croft, William, Heggarty, Paul,
Mufwene, Salikoko, Pierrehumbert, Janet B., and Poplack, Shana (2009). Building social
cognitive models of language change. Trends in Cognitive Sciences, 13, 464-9.
Hua, Zhu and Dodd, Barbara (2000). The phonological acquisition of Putonghua (Modern
Standard Chinese). Journal of Child Language, 27(1), 3-42.
Huang, Hui-Chun (2007). Lexical context effects on speech perception in Chinese people with
autistic traits. Master's thesis, University of Edinburgh.
Hume, Elizabeth (2004a). Deconstructing markedness: A predictability-based approach. In
Proceedings of the Berkeley Linguistics Society 13, pp. 182-98.
(2004b). The indeterminacy/attestation model of metathesis. Language, 80, 203-37.
(2006). Language specific and universal markedness: An information-theoretic approach.
Paper presented at the 80th Linguistic Society of America Annual Meeting, Symposium on
Information Theory and Phonology, Albuquerque.
(2008). Markedness and the language user. Phonological Studies, 11, 295-310.
and Bromberg, Ilana (2005). Predicting epenthesis: An information-theoretic account.
Paper presented at the 7th Annual Meeting of the French Network of Phonology, Aix-en-
Provence.
and Johnson, Keith (2001a). A model of the interplay of speech perception and phonology.
In The role of perception in phonology (eds. E. Hume and K. Johnson), pp. 3-26. Academic
Press, New York.
(2001b). The role of speech perception in phonology. Academic Press, New York.
Mailhot, Frédéric, Wedel, Andrew, Hall, Kathleen Currie, Kim, D., Ussishkin, Adam, Adda-
Decker, Martine, Gendrot, Cédric, and Fougeron, Cécile (2011). Anti-markedness patterns
in French declension and epenthesis: an information-theoretic account. In Proceedings of
the 37th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA.
and Odden, David (1996). Reconsidering [consonantal]. Phonology, 13, 345-76.
Huron, David (2006). Sweet anticipation: Music and the psychology of expectation. MIT
Press.
Huttenlocher, P. R. (2002). Neural plasticity: The effects of environment on the development of
the cerebral cortex. Harvard University Press.
Hyman, Larry M. (1972). Nasals and nasalization in Kwa. Studies in African Linguistics, 4,
167-206.
(1973). The role of consonant types in natural tonal assimilations. In Consonant types
and tone (ed. L. M. Hyman), Southern California Occasional Papers in Linguistics 1,
pp. 151-79. University of Southern California, Los Angeles.
(1975). Phonology: theory and analysis. Rinehart and Winston, New York.
(1976). Phonologization. In Linguistic studies presented to Joseph H. Greenberg (ed. A. Juil-
land), pp. 407-18. Anma Libri, Saratoga, Calif.
(1981). Noni grammatical structure, with special reference to verb morphology. Department
of Linguistics, University of Southern California, Los Angeles.
(1984). Form and substance in language universals. In Explanation of language universals
(eds. B. Butterworth, B. Comrie, and O. Dahl), pp. 67-85. Stanford University Press.
(1988). The phonology of final glottal stops. In Proceedings of W.E.C.O.L. 1988,
pp. 113-30. CSU, Fresno.
Hyman, Larry M. (2002). Is there a right-to-left bias in vowel harmony? Paper presented at
9th International Phonology Meeting, Vienna, Nov. 1, 2002. To appear in John R. Rennison,
Friedrich Neubarth, and Markus A. Pöchtrager (eds.), Phonologica 2002. Berlin: Mouton.
(2003). 'Abstract' vowel harmony in Kàlɔ̀ŋ: A system-driven account. In Typologie
des langues d'Afrique et universaux de la grammaire (eds. P. Sauzet and A. Zribi-Hertz),
pp. 85-112. L'Harmattan, Paris.
(2008a). Directional asymmetries in the morphology and phonology of words, with spe-
cial reference to Bantu. Linguistics, 46, 309-49.
(2008b). Universals in phonology. The Linguistic Review, 25, 83-137.
(2010a). Affixation by place of articulation: the case of Tiene. In Rara and rarissima:
Collecting and interpreting unusual characteristics of human languages (eds. M. Cysouw and
J. Wohlgemuth), pp. 145-84. Mouton de Gruyter, Berlin and New York.
(2010b). Focus marking in Aghem. In Information structure in African languages: Typolog-
ical studies in language (TSL) (eds. I. Fiedler and A. Schwartz), pp. 95-116. John Benjamins,
Amsterdam and Philadelphia.
and Katamba, Francis X. (1990). Final vowel shortening in Luganda. Studies in African
Linguistics, 21, 1-59.
and Mathangwane, Joyce (1998). Tonal domains and depressor consonants in
Ikalanga. In Theoretical aspects of Bantu tone (eds. L. M. Hyman and C. Kisseberth),
pp. 195-229. C.S.L.I., Stanford.
and Polinsky, Maria (2009). Focus in Aghem. In Information structure: theoret-
ical, typological, and experimental perspectives (eds. M. Zimmermann and C. Féry),
pp. 206-33. Oxford University Press.
Idiatov, Dmitry (2008). Antigrammaticalization, antimorphologization and the case of Tura.
In Theoretical and empirical issues in grammaticalization (eds. E. Seoane and M. J. López-
Couso), pp. 151-69. John Benjamins, Amsterdam.
Itô, Junko, Mester, Armin, and Padgett, Jaye (1995). Licensing and redundancy: underspecifi-
cation in optimality theory. Linguistic Inquiry, 26, 571-614.
Iverson, Gregory K. and Salmons, Joseph C. (1996). Mixtec prenasalization as hypervoicing.
International Journal of American Linguistics, 62, 165-75.
Jackson, Ellen and Stanley, Carol (1977). Description phonologique du tikar (parler de
Bankim). Ms., S.I.L., Yaoundé.
Jaeger, T. Florian (2008). Categorical data analysis: Away from ANOVAs (transforma-
tion or not) and towards logit mixed models. Journal of Memory and Language, 59(4),
434-46.
(2010). Redundancy and reduction: Speakers manage syntactic information density. Cog-
nitive Psychology, 61(1), 23-62.
and Tily, Harry (2011). On language 'utility': Processing complexity and communicative
efficiency. WIREs: Cognitive Science, 2(3), 323-35.
Jakobson, Roman (1931). Prinzipien der historischen Phonologie. Travaux du Cercle Linguis-
tique de Prague, 4, 247-67.
(1931 [1972]). Principles of historical phonology. In A reader in historical and comparative
linguistics (ed. A. R. Keiler), pp. 121-38. Rinehart and Winston, New York.
(1968). Child language aphasia and phonological universals. Mouton, The Hague.
Jakobson, Roman, Fant, C. Gunnar M., and Halle, Morris (1952). Preliminaries to speech anal-
ysis: the distinctive features and their correlates. MIT Press, Cambridge, Mass.
and Waugh, Linda (1979). The sound shape of language. Indiana University Press, Bloom-
ington.
James, William and Mole, A. (1847). Dictionary of the English and French languages for
general use with the accentuations and a literal pronunciation of every word in both lan-
guages, comp. from the best and most approved English and French authorities. B. Tauchnitz,
Leipzig.
Janda, Richard (2003). 'Phonologization' as the start of dephoneticization - or, on sound
change and its aftermath: Of extension, generalization, lexicalization, and morphologization.
In Handbook of historical linguistics (eds. B. Joseph and R. Janda), pp. 401-22. Blackwell,
Malden, MA.
and Joseph, Brian (2003). On language, change, and language change - or, of history,
linguistics, and historical linguistics. In Handbook of historical linguistics (eds. B. Joseph and
R. Janda), pp. 3-180. Blackwell, Oxford.
Jansen, Wouter (2004). Laryngeal contrast and phonetic voicing: A laboratory phonology
approach to English, Hungarian, and Dutch. PhD thesis, University of Groningen.
Jeffers, R. and Lehiste, I. (1979). Principles and methods for historical linguistics. MIT Press,
Cambridge, MA.
Jescheniak, Jörg D. and Levelt, Willem J. M. (1994). Word frequency effects in speech produc-
tion: Retrieval of syntactic information and of phonological form. Journal of Experimental
Psychology: Learning, Memory and Cognition 20(4), 824-43.
Jobe, Lisa E. and White, Susan Williams (2007). Loneliness, social relationships, and a broader
autism phenotype in college students. Personality and Individual Differences, 42, 1479-89.
John, Oliver P., Naumann, Laura P., and Soto, Christopher J. (2008). Paradigm shift to the inte-
grative big-five trait taxonomy: history, measurement, and conceptual issues. In Handbook of
personality: Theory and research (eds. O. P. John, R. W. Robins, and L. A. Pervin), pp. 114-58.
Guilford Press, New York, NY.
Johnson, Keith (1997a). The auditory/perceptual basis for speech segmentation. Ohio State
University Working Papers in Linguistics, 50, 101-13.
(1997b). Speech perception without speaker normalization: an exemplar model. In Talker
variability in speech processing (eds. K. Johnson and J. Mullennix), pp. 145-66. Academic
Press, San Diego.
(2000). Adaptive dispersion in vowel perception. Phonetica, 57, 181-8.
(2003). Acoustic and auditory phonetics (2nd edn). Blackwell, Malden, Mass.
(2004). Massive reduction in conversational American English. In Spontaneous speech:
Data and analysis. Proceedings of the 1st Session of the 10th International Symposium
(eds. K. Yoneyama and K. Maekawa), pp. 29-54. National Institute for Japanese Language,
Tokyo.
(2006). Resonance in an exemplar-based lexicon: The emergence of social identity and
phonology. Journal of Phonetics, 34, 485-99.
(2007). Decision and mechanisms in exemplar-based phonology. In Experimental
approaches to phonology (eds. M.-J. Sol, P. Beddor, and M. Ohala), Chapter 3, pp. 25-40.
Oxford University Press, Oxford.
Johnson, Keith, Flemming, Edward, and Wright, Richard (1993a). The hyperspace effect: Pho-
netic targets are hyperarticulated. Language, 69, 505-28.
Ladefoged, Peter, and Lindau, Mona (1993b). Individual differences in vowel production.
Journal of the Acoustical Society of America, 94, 701-14.
and Martin, Jack (2001). Acoustic vowel reduction in Creek: Effects of distinctive length
and position in the word. Phonetica, 58, 81-102.
Jones, J. A. and Munhall, K. G. (2000). Perceptual calibration of F0 production: Evidence from
feedback perturbation. Journal of the Acoustical Society of America, 108, 1246-51.
Jones, M. R., Johnston, H. M., and Puente, J. (2006). Effects of auditory pattern structure on
anticipatory and reactive attending. Cognitive Psychology, 53, 59-96.
Jongman, Allard (1988). Duration of frication noise required for identification of English
fricatives. Journal of the Acoustical Society of America, 85(4), 1718-25.
Wayland, Ratree, and Wong, Serena (2000). Acoustic characteristics of English fricatives.
Journal of the Acoustical Society of America, 108(3), 1252-63.
Jørgensen, Hans Peter (1969). Die gespannten und ungespannten Vokale in der norddeutschen
Hochsprache mit einer spezifischen Untersuchung der Struktur der Formantenfrequenzen.
Phonetica, 19, 217-45.
Joseph, Brian and Janda, Richard (1988). The how and why of diachronic morphologization
and demorphologization. In Theoretical morphology (eds. M. Hammond and M. Noonan),
pp. 193-210. Academic Press, San Diego.
(eds.) (2003). The handbook of historical linguistics. Blackwell, Oxford.
Jun, Jongho (1995). Place assimilation as the result of conflicting perceptual and articula-
tory constraints. In Proceedings of West Coast Conference on Formal Linguistics, Volume 14,
pp. 221-37.
Jurafsky, Dan (2003). Probabilistic modeling in psycholinguistics. In Probabilistic Linguistics
(eds. R. Bod, J. Hay, and S. Jannedy), pp. 39-95. MIT Press, Cambridge, Mass.
Bell, Alan, Gregory, Michelle, and Raymond, W. (2001). Probabilistic relations between
words: Evidence from reduction in lexical production. In Frequency and the emergence of
linguistic structure (eds. J. Bybee and P. Hopper), pp. 229-54. John Benjamins, Amsterdam.
Jusczyk, Peter W., Goodman, Mara B., and Baumann, Angela (1999). Nine-month-olds' atten-
tion to sound similarities in syllables. Journal of Memory and Language, 40, 62-82.
Kabak, Baris and Idsardi, William J. (2007). Perceptual distortions in the adaptation of English
consonant clusters: Syllable structure or consonantal contact constraints. Language and
Speech, 50, 23-52.
Kang, Kyoung-Ho and Guion, Susan G. (2008). Clear speech production of Korean stops:
Changing phonetic targets and enhancement strategies. Journal of the Acoustical Society of
America, 124(6), 3909-17.
Kataoka, Reiko (2010). Individual variation in speech perception as a source of 'apparent'
hypo-correction. Paper presented at the 12th Conference on Laboratory Phonology, Albu-
querque, New Mexico, July 10.
(2011). Phonetic and cognitive bases of sound change. PhD thesis, University of California,
Berkeley.
Katseff, Shira, Houde, John, and Johnson, Keith (in press). Partial compensation for altered
auditory feedback: A tradeoff with somatosensory feedback? Language and Speech.
Kaun, Abigail R. (2004). The typology of rounding harmony. In Phonetically based phonology
(eds. B. Hayes, R. Kirchner, and D. Steriade), pp. 87-116. Cambridge University Press,
Cambridge.
Kavitskaya, Darya (2002). Compensatory lengthening: Phonetics, phonology, and diachrony.
Routledge, New York.
Kawasaki, Haruko (1986). Phonological universals of vowel nasalization. In Experimental
phonology (eds. J. J. Ohala and J. J. Jaeger), pp. 81-98. Academic Press, Orlando, FL.
Kaye, Jonathan (1974). Morpheme structure constraints live! In Montreal Working Papers in
Linguistics, Volume 3, pp. 55-62.
Keating, Patricia A. (1984). Phonetic and phonological respresentations of stop consonant
voicing. Language, 60(2), 286-319.
(1985). Universal phonetics and the organization of grammars. In Phonetic linguistics:
Essays in honor of Peter Ladefoged (ed. V. A. Fromkin), pp. 115-32. Academic Press,
Orlando.
(1988). The phonology-phonetics interface. In Linguistics: The Cambridge survey, Vol-
ume I: Grammatical theory (ed. F. J. Newmeyer), pp. 281-302. Cambridge University
Press.
(1990). Phonetic representations in a generative grammar. Journal of Phonetics 18,
321-34.
(1996). The phonology-phonetics interface. In Interfaces in phonology (ed. U. Kleinhenz),
pp. 262-78. Akademie Verlag, Berlin.
Cho, Taehong, Fougeron, Cécile, and Hsu, Chai-Shune (2003). Domain-initial
articulatory strengthening in four languages. Papers in Laboratory Phonology, 6,
143-61.
Linker, Wendy, and Huffman, Marie (1983). Patterns of allophone distribution for voiced
and voiceless stops. Journal of Phonetics, 11, 277-90.
Mikos, M. J., and Ganong III, W. F. (1981). A cross-language study of range of voice onset
time in the perception of initial stop voicing. Journal of the Acoustical Society of America, 70(5),
1261-71.
Keenan, Edward L. (1976). Towards a universal definition of 'subject'. In Subject and topic
(ed. C. N. Li), pp. 303-33. Academic Press.
Kelly, Michael H. (1988). Rhythmic alternation and lexical stress differences in English. Cogni-
tion, 30, 107-37.
(1989). Rhythm and language change in English. Journal of Memory and Language,
28, 690-710.
and Bock, J. Kathryn (1988). Stress in time. Journal of Experimental Psychology: Human
Perception and Performance, 14(3), 389-403.
Kenstowicz, Michael and Kisseberth, Charles (1979). Generative phonology. Academic Press,
San Diego.
Kertész, Zsuzsa (2003). Vowel harmony and the stratified lexicon of Hungarian. In The odd
yearbook, 7. ELTE Press, Budapest.
Keyser, Samuel Jay and Stevens, Kenneth N. (2001). Enhancement revisited. In Ken Hale: A life
in language (ed. M. Kenstowicz), pp. 271-91. MIT Press, Cambridge, MA.
(2006). Enhancement and overlap in the speech chain. Language, 82(1), 33-63.
Khouw, Edward and Ciocca, Victor (2007). Perceptual correlates of Cantonese tones. Journal
of Phonetics, 35, 104-17.
Kim, Chin-Wu (1965). On the autonomy of the tensity feature in stop classification (with special
reference to Korean stops). Word, 21, 339-59.
King, Jonathan and Just, Marcel Adam (1991). Individual differences in syntactic processing:
The role of working memory. Journal of Memory and Language, 30, 580-602.
King, Robert D. (1967). Functional load and sound change. Language, 43, 831-52.
(1969). Historical linguistics and generative grammar. Prentice-Hall, Englewood Cliffs, N.J.
Kingston, John (2007). The phonetics-phonology interface. In The Cambridge handbook of
phonology (ed. P. de Lacy), pp. 401-34. Cambridge University Press, Cambridge.
and Diehl, Randy L. (1994). Phonetic knowledge. Language, 70, 419-54.
Kirk, Cecilia J., and Castleman, Wendy A. (2008). On the internal perceptual struc-
ture of distinctive features. Journal of Phonetics, 36, 28-54.
Kiparsky, Paul (1965). Phonological change. PhD thesis, M.I.T.
(1968). Linguistic universals and language change. In Universals in linguistic the-
ory (eds. E. Bach and R. T. Harms), pp. 171-202. Rinehart and Winston, New
York.
(1982). Lexical phonology and morphology. In Linguistics in the morning calm (ed. In-
Seok Yang), pp. 3-91. Hanshin, Seoul.
(1985). Some consequences of lexical phonology. In Phonology Yearbook, 2, pp. 85-138.
Cambridge University Press, Cambridge.
(1988). Phonological change. In Linguistics: The Cambridge Survey (ed. F. Newmeyer),
Volume 1: Theoretical foundations, pp. 363-415. Cambridge University Press, Cambridge.
(1995). The phonological basis of sound change. In Handbook of phonological theory (ed.
J. Goldsmith), pp. 640-70. Basil Blackwell, Oxford.
(2006). Amphichronic linguistics vs. Evolutionary Phonology. Theoretical Linguistics, 32,
217-36.
Kirby, James P. (2010). Cue selection and category restructuring in sound change. PhD thesis,
University of Chicago.
(2011). Modeling the acquisition of covert contrast. In Proceedings of the Seventeenth
International Congress of Phonetic Sciences, Hong Kong.
Kirby, Simon (1999). Function, selection and innateness: The emergence of language universals.
Oxford University Press, Oxford.
Kirchner, Robert (1998). An effort-based approach to consonant lenition. PhD thesis, UCLA.
(2001). An effort based approach to consonant lenition. Routledge, New York.
Kirsch, Irving (1999). Response expectancy: an introduction. In How expectancies shape
experience, (ed. I. Kirsch), pp. 3-13. American Psychological Association, Washington,
DC.
Klatt, Dennis H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical
access. In Perception and production of fluent speech (ed. R. A. Cole), pp. 243-88. Erlbaum,
Hillsdale, N.J.
Klein, Sheldon (1966). Historical change in language using Monte Carlo techniques. Mechani-
cal Translation and Computational Linguistics, 9, 67-82.
Klein, Sheldon, Kuppin, Michael, and Meives, Kirby (1969). Monte Carlo simulation of language
change in Tikopia and Maori. In Proceedings of the 1969 Conference on Computational
Linguistics (COLING), pp. 699-729.
Koehler, Derek J. (2009). Probability matching in choice under uncertainty: Intuition versus
deliberation. Cognition, 113, 123-7.
Komarova, N. L., Niyogi, Partha, and Nowak, M. A. (2001). The evolutionary dynamics of
grammar acquisition. Journal of Theoretical Biology, 209(1), 43-60.
and Nowak, Martin (2003). Language dynamics in finite populations. Journal of Theoret-
ical Biology, 221, 445-57.
Kornai, András (1990). Hungarian vowel harmony. In Approaches to Hungarian, Volume Three:
Structures and Arguments (ed. I. Kenesei), pp. 183-240. JATE, Szeged.
Krämer, Martin (2001). Vowel harmony and correspondence theory. PhD thesis, University of Düsseldorf.
(2003). Vowel harmony and correspondence theory. Mouton de Gruyter, Berlin.
(2009). The phonology of Italian. Oxford University Press, Oxford.
Kroch, Anthony (1989). Reflexes of grammar in patterns of language change. Language Varia-
tion and Change, 1, 199-244.
Kruschke, John (2003). Attention in learning. Current Directions in Psychological Science, 12(5),
171-5.
Kuhl, Patricia K. (1991). Human adults and human infants show a 'perceptual magnet effect' for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50, 93-107.
, Andruski, Jean E., Chistovich, Inna A., Chistovich, Ludmilla A., Kozhevnikova, Elena V.,
Ryskina, Viktoria, Stolyarova, Elvira I., Sundberg, Ulla, and Lacerda, Francisco (1997).
Cross-language analysis of phonetic units in language addressed to infants. Science, 277,
684-6.
, Stevens, Erica, Hayashi, Akiko, Deguchi, Toshisada, Kiritani, Shigeru, and Iverson, Paul
(2006). Infants show a facilitation effect for native language phonetic perception between 6
and 12 months. Developmental Science, 9, F13-F21.
, Williams, Karen A., Lacerda, Francisco, Stevens, Kenneth N., and Lindblom, Björn
(1992). Linguistic experience alters phonetic perception in infants by 6 months of age.
Science, 255, 606-8.
Kuipers, Aert H. (1974). The Shuswap language. Mouton, The Hague.
Kullback, Solomon and Leibler, Richard A. (1951). On information and sufficiency. Annals of
Mathematical Statistics, 22(1), 79-86.
Kümmel, Martin (2007). Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Reichert, Wiesbaden.
Kuryłowicz, Jerzy (1965 [1972]). The evolution of grammatical categories. In Esquisses linguis-
tiques II, pp. 38-54. Fink, Munich.
Kutas, Marta and Hillyard, Steven A. (1984). Brain potentials during reading reflect word
expectancy and semantic association. Nature, 307, 161-3.
Kwenzi Mikala, J. (1980). Esquisse phonologique du punu. In Éléments de description du punu (ed. F. Nsuka-Nkutsi), pp. 51-114. CRLS, Université Lyon II.
Kwon, Kyung-Keun (2003). Prosodic change from tone to vowel length in Korean. In Development in prosodic systems (eds. P. Fikkert and H. Jacobs), pp. 67-89. Mouton de Gruyter, New York.
Labov, William (1971). Methodology. In A survey of linguistic science (ed. W. O. Dingwall),
pp. 412-97. University of Maryland.
(1973). The linguistic consequences of being a lame. Language in Society, 2(1), 81-115.
(1981). Resolving the Neogrammarian controversy. Language, 57, 267-308.
(1989). The child as linguistic historian. Language Variation and Change, 1, 85-97.
(1990). The intersection of sex and social class in the course of linguistic change. Language
Variation and Change, 2(2), 205-54.
(1994). Principles of linguistic change, Volume 1: Internal factors. Blackwell, Oxford.
(2001). Principles of linguistic change, Volume 2: Social factors. Blackwell, Oxford.
(2010). Principles of linguistic change, Volume 3: Cognitive and cultural factors. Wiley-
Blackwell, Malden, Mass.
Yaeger, Malka, and Steiner, Richard (1972). A quantitative study of sound change in
progress. U.S. Regional Survey, Philadelphia.
Ladefoged, Peter and Maddieson, Ian (1996). The sounds of the world's languages. Blackwell
Publishers, Oxford.
Lashley, Karl S. (1951). The problem of serial order in behavior. In Cerebral mechanisms in
behavior (ed. L. Jeffress). Wiley, New York.
Lasky, R., Syrdal-Lasky, A., and Klein, D. (1975). VOT discrimination by four- to six-month-
old infants from Spanish environments. Journal of Experimental Child Psychology, 20,
215-25.
Lavoie, Lisa (2001). Consonant strength: Phonological patterns and phonetic manifestations.
Routledge, New York.
Lee, Seunghun Julio (2008). Consonant-tone interaction in optimality theory. PhD thesis, Rutgers
University.
Lee, Yongeun (2006). Sub-syllabic constituency in Korean and English. PhD thesis, Northwestern
University.
Lehiste, Ilse (1970). Suprasegmentals. MIT Press, Cambridge.
(1976). Influence of fundamental frequency pattern on the perception of duration. Journal
of Phonetics, 4, 113-17.
(2003). Prosodic change in progress: from quantity language to accent language. In Development in prosodic systems (eds. P. Fikkert and H. Jacobs), pp. 47-65. Mouton de Gruyter, New York.
(2004). Bisyllabicity and tone. In Proceedings of the International Symposium on Tonal
Aspects of Language, pp. 111-14.
Lehnert-LeHouillier, Heike (2010). A cross-linguistic investigation of cues to vowel length
perception. Journal of Phonetics, 38(3), 472-82.
Levelt, C., Schiller, N., and Levelt, W. (1999). The acquisition of syllable types. Language Acqui-
sition, 8, 237-64.
Levitt, Andrea G., Jusczyk, Peter W., Murray, Janice, and Carden, Guy (1988). Context effects
in two-month-old infants' perception of labiodental/interdental fricative contrasts. Journal
of Experimental Psychology: Human Perception and Performance, 14(3), 361-8.
Levy, Roger (2008). Expectation-based syntactic comprehension. Cognition, 106, 1126-77.
and Jaeger, T. Florian (2007). Speakers optimize information density through syntactic
reduction. In Advances in neural information processing systems (eds. B. Schölkopf, J. Platt, and T. Hofmann), Volume 19, pp. 849-56. MIT Press.
Li, Charles N. (1976). Subject and topic. Academic Press, New York.
Liberman, Alvin M., Cooper, Franklin S., Shankweiler, Donald P., and Studdert-Kennedy,
Michael (1967). Perception of the speech code. Psychological Review, 74, 431-61.
, Harris, Katherine S., Hoffman, Howard S., and Griffith, Belver C. (1957). The discrim-
ination of speech sounds within and across phoneme boundaries. Journal of Experimental
Psychology, 54, 358-68.
and Mattingly, Ignatius G. (1985). The motor theory of speech perception revised. Cogni-
tion, 21, 1-36.
Liberman, Mark (2000). The 'lexical contract': Modeling the emergence of word pronuncia-
tions. MS., University of Pennsylvania.
Lightfoot, David (1999). The development of language: Acquisition, change, and evolution. Black-
well, Malden, Mass.
Liljencrants, Jonas and Lindblom, Björn (1972). Numerical simulation of vowel quality: The
role of perceptual contrast. Language, 48(4), 839-62.
Lindblad, Per (1980). Svenskans sje- och tje-ljud i ett allmänfonetiskt perspektiv. Volume 16,
Travaux de l'Institut de Linguistique de Lund. C. W. K. Gleerup, Lund.
Lindblom, Björn (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35, 1773-81.
(1983). Economy of speech gestures. In The production of speech (ed. P. MacNeilage),
pp. 217-46. Springer Verlag, New York.
(1986). Phonetic universals in vowel systems. In Experimental phonology (eds. J. Ohala and
J. Jaeger), pp. 13-44. Academic Press, Orlando.
(1990). Explaining phonetic variation: a sketch of the H&H theory. In Speech production
and speech modelling (eds. W. Hardcastle and A. Marchal), pp. 403-39. Kluwer Academic,
Dordrecht.
(2003). Patterns of a phonetic contrast: Towards a unified explanatory framework. In
Proceedings of the 15th International Congress of Phonetic Sciences (eds. M. Solé, D. Recasens,
and J. Romero), pp. 39-42.
Guion, Susan, Hura, Susan, Moon, Seung-Jae, and Willerman, Raquel (1995). Is sound change adaptive? Rivista di Linguistica, 7, 5-36.
and Maddieson, Ian (1988). Phonetic universals in consonant systems. In Language,
speech and mind (eds. L. Hyman and C. Li). Routledge, London.
Lisker, Leigh (1986). 'Voicing' in English: A catalogue of acoustic features signaling /b/ versus
/p/ in trochees. Language and Speech, 29, 3-11.
and Abramson, Arthur (1964). A cross-language study of voicing in initial stops: Acous-
tical measurements. Word, 20, 384-422.
(1970). The voicing dimension: some experiments in comparative phonetics. In Proceedings of the 6th International Congress of Phonetic Sciences (eds. B. Hala, M. Romportl, and P. Janota), pp. 569-73. Academia, Publishing House of the Czechoslovak Academy of Sciences, Prague.
Liu, Huei-Mei, Kuhl, Patricia K., and Tsao, Feng-Ming (2003). The association between mothers' clarity and infants' speech discrimination skill. Developmental Science, 6, F1-F10.
Tsao, Feng-Ming, and Kuhl, Patricia K. (2007). Acoustic analysis of lexical tone in Man-
darin infant-directed speech. Developmental Psychology, 43(4), 912-17.
Lloyd, Paul M. (1987). From Latin to Spanish: Historical phonology and morphology of the
Spanish language. The American Philosophical Society, Philadelphia.
Lombardo, Michael V., Barnes, Jennifer L., Wheelwright, Sally J., and Baron-Cohen, Simon (2007). Self-referential cognition and empathy in autism. PLoS One, 2, e883.
Luce, Paul and Pisoni, David (1998). Recognizing spoken words: The neighborhood activation
model. Ear and Hearing, 19, 1-36.
Luick, Karl (1921-40). Historische Grammatik der englischen Sprache. Tauchnitz, Leipzig.
MacKain, Kristine S., Best, Catherine T., and Strange, Winifred (1981). Categorical perception
of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-90.
MacKay, Donald G. (1970). Spoonerisms: The structure of errors in the serial order of speech.
Neuropsychologia, 8, 323-50.
MacKay, David J. (2002). Information theory, inference and learning algorithms. Cambridge
University Press, Cambridge.
Macken, Marlys A. (1980). The child's lexical representation: The 'puzzle-puddle-pickle' evi-
dence. Journal of Linguistics, 16, 1-17.
Mackenzie, Sara (2008). Contrast and similarity in consonant harmony processes. PhD thesis,
University of Toronto.
Maddieson, Ian (1984). Patterns of sounds. Cambridge University Press, Cambridge.
(2008). Presence of uncommon consonants. In The world atlas of language structures
online (eds. M. Haspelmath, M. Dryer, D. Gil, and B. Comrie), Chapter 19. Max Planck
Digital Library, Munich.
and Precoda, Kristin (1992). Syllable structure and phonetic models. Phonology, 9, 45-60.
Magen, Harriet S. (1989). An acoustic study of vowel-to-vowel coarticulation in English. PhD
thesis, Yale University.
(1997). The extent of vowel-to-vowel coarticulation in English. Journal of Phonetics,
25, 187-205.
Mahanta, Shakuntala (2007). Directionality and locality in vowel harmony. PhD thesis, Utrecht
University.
Mailhot, Frdric (2010). Modelling the acquisition and evolution of vowel harmony. PhD thesis,
Carleton University.
Malsheen, Bathsheba J. (1980). Two hypotheses for phonetic clarification in the speech of moth-
ers to children. In Child phonology (eds. G. Yeni-Komshian, J. Kavanagh, and C. Ferguson),
Volume 2: Perception. Academic Press, San Diego.
Mańczak, Witold (1980). Laws of analogy. In Historical morphology (ed. J. Fisiak), pp. 283-8.
Mouton, The Hague.
Mann, Virginia A. and Repp, Bruno H. (1980). Influence of vocalic context on perception of
the [ʃ]-[s] distinction. Perception and Psychophysics, 28, 213-28.
Manuel, Sharon (1987). Acoustic and perceptual consequences of vowel-to-vowel coarticulation
in three Bantu languages (Zimbabwe). PhD thesis, Yale University.
Manuel, Sharon (1990). The role of contrast in limiting vowel-to-vowel coarticulation in differ-
ent languages. Journal of the Acoustical Society of America, 88, 1286-98.
(1999). Cross-linguistic studies: relating language-particular coarticulation patterns to
other language-particular facts. In Coarticulation: Theory, data and techniques (eds. W. Hard-
castle and N. Hewlett), pp. 179-98. Cambridge University Press, Cambridge.
and Krakow, Rena (1984). Universal and language particular aspects of vowel-to-vowel
coarticulation. Haskins Laboratories Status Report on Speech Research, SR-77/78, 69-78.
Martin, Andrew Thomas (2007). The evolving lexicon. PhD thesis, University of California, Los
Angeles.
Martinet, André (1933). Remarques sur le système phonologique du français. Bulletin de la Société de Linguistique de Paris, 34, 191-202.
(1952). Function, structure, and sound change. Word, 8(1), 1-32.
(1955). Économie des changements phonétiques. Francke, Berne.
(1960). Éléments de linguistique générale. Colin, Paris.
Massaro, Dominic W. and Cohen, Michael M. (1983). Evaluation and integration of visual
and auditory information in speech perception. Journal of Experimental Psychology: Human
Perception and Performance, 9, 753-71.
Matisoff, James (1973). Tonogenesis in Southeast Asia. In Consonant types and tone (ed.
L. Hyman), pp. 71-95. University of Southern California.
Mattock, Karen and Burnham, Denis (2006). Chinese and English infants' tone perception:
Evidence for perceptual reorganization. Infancy, 10(3), 241-65.
Maye, Jessica, Weiss, Daniel J., and Aslin, Richard N. (2008). Statistical phonetic learning in
infants: facilitation and feature generalization. Developmental Science, 11(1), 122-34.
Werker, Janet F., and Gerken, LouAnn (2002). Infant sensitivity to distributional infor-
mation can affect phonetic discrimination. Cognition, 82(3), B101-B111.
McCarthy, John J. (1988). Feature geometry and dependency: a review. Phonetica, 45, 84-108.
(2002). Comparative markedness. In Papers in optimality theory II (eds. A. C. Carpenter, A. W. Coetzee, and P. de Lacy), pp. 171-246. GLSA, Amherst, MA.
McCawley, James D. (1968). The phonological component of a grammar of Japanese. Mouton,
The Hague.
McDonough, Joyce (1991). On the representation of consonant harmony in Navajo. Proceedings
of WCCFL, 10, 319-35.
McKinley, Stephen C. and Nosofsky, Robert (1996). Selective attention and the formation
of linear decision boundaries. Journal of Experimental Psychology: Human Perception and
Performance, 22, 294-317.
McLachlan, Geoffrey J. and Peel, David (2000). Finite mixture models. Wiley, New York.
McMurray, Bob, Aslin, Richard N., and Toscano, Joseph C. (2009). Statistical learning of pho-
netic categories: Insights from a computational approach. Developmental Science, 12(3),
369-78.
Meringer, Rudolf (1908). Aus dem Leben der Sprache: Versprechen, Kindersprache, Nach-
ahmungstrieb. Behr, Berlin.
and Mayer, Karl (1895). Versprechen und Verlesen: Eine psychologisch-linguistische Studie.
Göschen, Stuttgart.
Messick, Samuel (1976). Individuality in learning. Jossey-Bass, Oxford.
Mielke, Jeff (2005). Ambivalence and ambiguity in laterals and nasals. Phonology, 22(2),
169-203.
(2008). The emergence of distinctive features. Oxford Studies in Typology and Linguistic
Theory. Oxford University Press.
Baker, Adam, and Archangeli, Diana (2010). Variability and homogeneity in American
English /r/ and /s/ retraction. In Laboratory phonology 10 (eds. C. Fougeron, B. Kühnert, M. D'Imperio, and N. Vallée), pp. 699-719. Mouton de Gruyter, Berlin.
Magloughlin, Lyra, and Hume, Elizabeth (2011). Evaluating the effectiveness of Unified
Feature Theory and three other feature systems. In Tones and features: In honor of G. Nick
Clements. Mouton de Gruyter, Berlin.
Miller, George A. and Nicely, Patricia (1955). An analysis of perceptual confusions among some
English consonants. Journal of the Acoustical Society of America, 27, 338-52.
Milroy, James and Milroy, Lesley (1985). Linguistic change, social network and speaker inno-
vation. Journal of Linguistics, 21(2), 339-84.
Mitchener, W. Garrett (2005). Simulating language change in the presence of non-idealized
syntax. In Proceedings of the Workshop on Psychocomputational Models of Human Language
Acquisition, Ann Arbor, Michigan, pp. 10-19. Association for Computational Linguistics.
Mitterer, Holger (2006). On the causes of compensation for coarticulation: Evidence for phono-
logical mediation. Perception and Psychophysics, 68(7), 1227-40.
and Blomert, Leo (2003). Coping with phonological assimilation in speech perception:
Evidence for early compensation. Perception and Psychophysics, 65(6), 956-69.
Miyawaki, Kuniko, Strange, Winifred, Verbrugge, Robert, Liberman, Alvin M., Jenkins,
James J., and Fujimura, Osamu (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics, 18,
331-40.
Mochizuki, Michiko (1981). The identification of /r/ and /l/ in natural and synthesized speech.
Journal of Phonetics, 9, 283-303.
Mohr, Burkhart (1971). Intrinsic variations in the speech signal. Phonetica 23, 69-93.
Morén, Bruce and Zsiga, Elisabeth (2006). The lexical and post-lexical phonology of Thai tones. Natural Language and Linguistic Theory, 24, 113-78.
Moreton, Elliott (2008a). Analytic bias and phonological typology. Phonology, 25(1), 83-127.
(2008b). Learning bias as a factor in phonological typology. In Proceedings of the 26th
Meeting of the West Coast Conference on Formal Linguistics (WCCFL) (eds. C. Chang and
A. Haynie), pp. 393-401. Cascadilla Proceedings Project, Somerville, MA.
(2010). Underphonologization and modularity bias. In Phonological argumentation:
Essays on evidence and motivation (ed. S. Parker). Equinox, London.
and Thomas, Erik R. (2007). Origins of Canadian Raising in voiceless-coda effects: A
case study in phonologization. In Laboratory phonology 9 (eds. J. S. Cole and J. I. Hualde),
pp. 37-64. Mouton, Berlin.
Morgan, James L., White, Katherine, Singh, Leher, and Bortfield, Heather (under review).
DRIBBLER: A developmental model of spoken word recognition. Psychological Review.
Mottron, Laurent, Dawson, Michelle, Soulières, Isabelle, Hubert, Bénédicte, and Burack, Jake
(2006). Enhanced perceptual functioning in autism: An update, and eight principles of
autistic perception. Journal of Autism and Developmental Disorders, 36, 27-43.
Moulines, Eric and Charpentier, Francis (1990). Pitch synchronous waveform processing tech-
niques for text-to-speech synthesis using diphones. Speech Communication, 9, 453-67.
Moulton, William (1960). The short vowel systems of northern Switzerland: A study in struc-
tural dialectology. Word, 16, 155-82.
(1967). Types of phonemic change. In To honor Roman Jakobson: Essays on the occasion
of his seventieth birthday, Volume 2, pp. 1393-407. Mouton, The Hague.
Mowrey, Richard and Pagliuca, William (1995). The reductive character of articulatory evolu-
tion. Rivista di Linguistica, 7, 37-124.
Mufwene, Salikoko S. (2001). The ecology of language evolution. Cambridge University Press,
Cambridge.
(2008). Language evolution: Contact, competition, and change. Continuum Press, London
and New York.
Munson, Benjamin (2001). Phonological pattern frequency and speech production in adults
and children. Journal of Speech, Language, and Hearing Research, 44, 778-92.
Näätänen, Risto (2001). The perception of speech sounds by the human brain as reflected by the
mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38,
1-21.
Namy, Laura L., Nygaard, Lynne C., and Sauerteig, Denise (2002). Gender differences in vocal
accommodation: The role of perception. Journal of Language and Social Psychology, 21(4),
422-32.
Narayan, Chandan R. (2008). The acoustic-perceptual salience of nasal place contrasts. Journal of Phonetics, 36, 191-217.
Werker, Janet F., and Beddor, Patrice S. (2010). The interaction between acoustic salience
and language experience in developmental speech perception: Evidence from nasal place
discrimination. Developmental Science, 13(3), 407-20.
Nearey, Terrance and Hogan, John T. (1986). Phonological contrast in experimental phonetics:
Relating distributions of production data to perceptual categorization curves. In Experimen-
tal phonology (eds. J. J. Ohala and J. J. Jaeger), pp. 141-62. Academic Press, Orlando.
Nettle, Daniel (2007). Empathizing and systemizing: What are they, and what do they con-
tribute to our understanding of psychological sex differences? British Journal of Psychol-
ogy, 98, 237-55.
Neu, Hélène (1980). Ranking of constraints on /t,d/ deletion in American English. In Locating language in time and space (ed. W. Labov), pp. 37-54. Academic Press.
New, Boris, Pallier, Christophe, Ferrand, Ludovic, and Matos, Rafael (2001). Une base de données lexicales du français contemporain sur internet: LEXIQUE. L'Année Psychologique, 101(3-4), 447-62.
Newman, Mark E. J. and Girvan, Michelle (2004). Finding and evaluating community structure
in networks. Physical Review E, 69(2), 026113.
Newman, Stanley (1944). Yokuts language of California. Viking Fund Publication in Anthro-
pology, no. 2, New York.
Newport, Elissa L. and Aslin, Richard N. (2004). Learning at a distance I: Statistical learning of nonadjacent dependencies. Cognitive Psychology, 48, 127-62.
Nielsen, Jimmi (2010). Lexical frequency effects in the spread of TH-fronting in Glaswegian:
A cue to the origins of sound change? Master's thesis, University of Edinburgh.
Nielsen, Kuniko Y. (2007). Implicit phonetic imitation is constrained by phonemic contrast. In Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, pp. 1961-4.
(2008). The specificity of allophonic variability and its implications for accounts of speech
perception. PhD thesis, UCLA.
Nishi, Kanae, Strange, Winifred, Akahane-Yamada, Reiko, Kubo, Rieko, and Trent-Brown,
Sonja A. (2008). Acoustic and perceptual similarity of Japanese and American English vow-
els. Journal of the Acoustical Society of America, 124, 576-88.
Niyogi, Partha (2006). The computational nature of language learning and evolution. MIT Press,
Cambridge.
and Berwick, Robert C. (1995). The logical problem of language change. AI Memo 1516,
MIT.
(1996). A language learning model for finite parameter spaces. Cognition, 61(1-2),
161-93.
(1998). The logical problem of language change: A case study of European Por-
tuguese. Syntax, 1, 192-205.
Nolan, Francis (1985). Idiosyncrasy in coarticulatory strategies. Cambridge Papers in Phonetics
and Experimental Linguistics, 4, 1-9.
Norris, Dennis G. (1994). Shortlist: A connectionist model of continuous speech recognition.
Cognition, 52, 189-234.
Nosofsky, Robert (1986). Attention, similarity, and the identification-categorization relation-
ship. Journal of Experimental Psychology: General, 115(1), 39-57.
Ohala, John J. (1974). Experimental historical phonology. In Historical linguistics II: theory and
description in phonology (eds. J. M. Anderson and C. Jones), pp. 353-79. North-Holland,
Amsterdam.
(1981). The listener as a source of sound change. In Papers from the Parasession on
Language and Behavior (eds. C. S. Masek, R. A. Hendrick, and M. F. Miller), pp. 178-203.
Chicago Linguistic Society, Chicago.
(1983). The origin of sound patterns in vocal tract constraints. In The production of speech
(ed. P. F. MacNeilage), pp. 189-216. Springer-Verlag, New York.
(1989). Sound change is drawn from a pool of synchronic variation. In Language change:
Contributions to the study of its causes (eds. L. E. Breivik and E. H. Jahr), pp. 173-98. Mouton
de Gruyter, Berlin.
(1990). There is no interface between phonology and phonetics: a personal view. Journal
of Phonetics, 18, 153-71.
(1992). What's cognitive, what's not, in sound change. In Diachrony within synchrony
(eds. M. Morrissey and G. Kellermann), pp. 309-55. Peter Lang, Frankfurt.
(1993a). Coarticulation and phonology. Language and Speech, 36, 155-70.
(1993b). The phonetics of sound change. In Historical linguistics: problems and perspectives (ed. C. Jones), pp. 237-78. Longman Academic, London.
(1993c). Sound change as nature's speech perception experiment. Speech Communication, 13, 155-61.
(1994a). Hierarchies of environments for sound variation; plus implications for 'neutral' vowels in vowel harmony. Acta Linguistica Hafniensia, 27, 371-82.
(1994b). Towards a universal, phonetically-based, theory of vowel harmony. In Proceedings of the Third International Conference on Spoken Language Processing, Yokohama, Japan, pp. 491-4.
(1997). Aerodynamics of phonology. In Proceedings of the 4th Seoul International Conference on Linguistics [SICOL], Seoul, pp. 92-7.
(2003). Phonetics and historical phonology. In The handbook of historical linguistics (eds.
B. D. Joseph and R. D. Janda), pp. 669-86. Blackwell.
and Shriberg, Elizabeth E. (1990). Hypercorrection in speech perception. In Proceedings
of the International Conference on Spoken Language Processing, Volume 1, Kobe, pp. 405-8.
Acoustical Society of Japan.
and Solé, Maria-Josep (2010). Turbulence and phonology. In Turbulent sounds: An inter-
disciplinary guide (eds. S. Fuchs, M. Toda, and M. Zygis), pp. 37-101. Mouton De Gruyter,
Berlin.
Öhman, Sven E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-68.
Oldfield, Richard C. and Wingfield, Arthur (1965). Response latencies in naming objects. The
Quarterly Journal of Experimental Psychology, 17(4), 273-81.
Oliphant, Travis, Jones, Eric, Peterson, Pearu and others (2001). SciPy: Open source scientific
tools for Python, http://www.scipy.org.
Osthoff, Hermann and Brugmann, Karl (1878). Vorwort. Morphologische Untersuchungen, 1, iii-xx.
Otto, A. Ross, Taylor, Eric G., and Markman, Arthur B. (2011). There are at least two kinds of
probability matching: Evidence from a secondary task. Cognition, 118, 274-9.
Oudeyer, Pierre-Yves (2006). Self-organization in the evolution of speech: Studies in the evolution
of language. Oxford University Press, Oxford.
Pardo, Jennifer S. (2006). On phonetic convergence during conversational interaction. Journal
of the Acoustical Society of America, 119(4), 2382-93.
Parush, Avraham, Ostry, David J., and Munhall, Kevin G. (1983). A kinematic study of lin-
gual coarticulation in VCV sequences. Journal of the Acoustical Society of America, 74,
1115-23.
Pater, Joe (2004). Austronesian nasal substitution and other *NC effects. In Optimality the-
ory in phonology: A reader (ed. J. J. McCarthy), Chapter 14, pp. 271-89. Blackwell,
Oxford.
Patterson, David and Connine, Cynthia M. (2001). Variant frequency in flap production: A
corpus analysis of variant frequency in American English flap production. Phonetica, 58(4),
254-75.
Paul, Hermann (1880). Principien der Sprachgeschichte (1st edn.). Max Niemeyer, Halle.
(1920). Prinzipien der Sprachgeschichte (5th edn.). Max Niemeyer, Halle.
Pearl, Judea (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference.
Morgan Kaufmann, San Francisco, CA.
Pearl, Lisa (2007). Necessary bias in language learning. PhD thesis, University of Maryland.
and Weinberg, Amy (2007). Input filtering in syntactic acquisition: Answers from lan-
guage change modeling. Language Learning and Development, 3(1), 43-72.
Pelling, J. N. (1971). A practical Ndebele dictionary. Longman Zimbabwe, Harare.
Peperkamp, Sharon (2003). Phonological acquisition: Recent attainments and new challenges.
Language and Speech, 46(2-3), 97-113.
Vendelin, Inga, and Nakamura, Kimihiro (2008). On the perceptual origin of loanword
adaptations: experimental evidence from Japanese. Phonology, 25(1), 129-64.
Peterson, Gordon E. and Barney, Harold L. (1952). Control methods used in the study of vowels. Journal of the Acoustical Society of America, 24, 175-84.
Phillips, Betty S. (1984). Word frequency and the actuation of sound change. Language, 60(2),
320-42.
(2000). Fast words, slow words. American Speech, 75(4), 414-16.
(2001). Lexical diffusion, lexical frequency, and lexical analysis. In Frequency and the
emergence of linguistic structure (eds. J. L. Bybee and P. J. Hopper), pp. 123-36. John Ben-
jamins, Amsterdam.
(2006). Word frequency and lexical diffusion. Palgrave Macmillan, New York.
Pierrehumbert, Janet B. (1980). The phonetics and phonology of English intonation. PhD thesis,
M.I.T.
(1990). Phonological and phonetic representation. Journal of Phonetics, 18, 375-94.
(2001a). Exemplar dynamics: Word frequency, lenition and contrast. In Frequency and the
emergence of linguistic structure (eds. J. L. Bybee and P. Hopper), pp. 137-57. John Benjamins,
Amsterdam.
(2001b). Why phonological constraints are so coarse-grained. Language and Cognitive
Processes, 16(5-6), 691-8.
(2002). Word-specific phonetics. In Laboratory phonology (eds. C. Gussenhoven and
N. Warner), Vol. VII, Phonology and phonetics, pp. 101-39. Mouton de Gruyter, Berlin.
(2004). Phonetic diversity, statistical learning, and acquisition of phonology. Language
and Speech, 46(2-3), 115-54.
Piggott, Glyne (1992). Variability in feature dependency: The case of nasality. Natural Language and Linguistic Theory, 10, 33-77.
Pisoni, David B. (1976). Fundamental frequency and perceived vowel duration. Journal of the Acoustical Society of America, 59(S1), S39.
(1977). Identification and discrimination of the relative onset time of two component
tones: Implications for voicing perception in stops. Journal of the Acoustical Society of America, 61(5), 1352-62.
and Aslin, Richard N. (1982). Some effects of laboratory training on identification and
discrimination of voicing contrasts in stop consonants. Journal of Experimental Psychology:
Human Perception and Performance, 8, 297-314.
Pitt, Mark (1998). Phonological processes and the perception of phonotactically illegal conso-
nant clusters. Perception and Psychophysics, 60(6), 941-51.
Dilley, Laura, Johnson, Keith, Kiesling, Scott, Raymond, William, Hume, Elizabeth,
and Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release).
www.buckeyecorpus.osu.edu. Columbus, OH: Department of Psychology, Ohio State Uni-
versity (distributor).
Pitt, Mark A. and Johnson, Keith (2003). Using pronunciation data as a starting point in
modeling word recognition. Manuscript, The Ohio State University.
and McQueen, James (1998). Is compensation for coarticulation mediated by the lexicon?
Journal of Memory and Language, 39, 347-70.
Polka, Linda (1991). Cross-language speech perception in adults: Phonemic, phonetic and
acoustic contributions. Journal of the Acoustical Society of America, 89(6), 2961-77.
Colantonio, Connie, and Sundara, Megha (2001). A cross-language comparison of
/d/-/ð/ perception: Evidence for a new developmental pattern. Journal of the Acoustical
Society of America, 109(5), 2190-201.
and Strange, Winifred (1985). Perceptual equivalence of acoustic cues that differentiate /r/ and /l/. Journal of the Acoustical Society of America, 78(4), 1187-97.
and Werker, Janet F. (1994). Developmental changes in perception of non-native vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance, 20, 421-35.
Port, Robert F. (2003). Meter and speech. Journal of Phonetics, 31, 599-611.
Pouplier, Marianne and Goldstein, Louis (2010). Intention in articulation: Articulatory timing
of alternating consonant sequences and its implications for models of speech production.
Language and Cognitive Processes, 25, 616-49.
Prince, Alan and Smolensky, Paul (2004). Optimality Theory: Constraint interaction in genera-
tive grammar. Blackwell, Malden, Mass.
Przezdziecki, Marek A. (2005). Vowel harmony and coarticulation in three dialects of Yoruba:
phonetics determining phonology. PhD thesis, Cornell.
Pulvermüller, Friedemann, Huss, Martina, Kherif, Ferath, Moscoso del Prado Martín, Fermín, Hauk, Olaf, and Shtyrov, Yury (2006). Motor cortex maps articulatory features of
speech sounds. In Proceedings of the National Academy of Sciences, USA, Volume 103,
pp. 7865-70.
Purcell, David W. and Munhall, Kevin G. (2006). Adaptive control of vowel formant frequency:
Evidence from real-time formant manipulation. Journal of the Acoustical Society of Amer-
ica, 120, 966-77.
Puri, Amrita and Wojciulik, Ewa (2008). Expectation both helps and hinders object perception.
Vision Research, 48, 589-97.
Purnell, Thomas, Salmons, Joseph, Tepeli, Dilara, and Mercer, Jennifer (2005). Structured heterogeneity and change in laryngeal phonetics. Journal of English Linguistics, 33,
307-38.
Quam, Carolyn, Yuan, Jiahong, and Swingley, Daniel (2008). Relating intonational pragmatics
to the pitch realizations of highly frequent words in English speech to infants. In Proceedings
of the 30th Annual Conference of the Cognitive Science Society (eds. B. C. Love, K. McRae, and
V. M. Sloutsky), pp. 217-22. Cognitive Science Society, Austin, TX.
R Development Core Team (2010). R: A language and environment for statistical computing.
Technical report, R Foundation for Statistical Computing, Vienna.
Raymond, William, Dautricourt, Robin, and Hume, Elizabeth (2006). Word-medial /t, d/ dele-
tion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonologi-
cal factors. Language Variation and Change, 18, 55-97.
Reading, Anthony (2004). Hope and despair: How perceptions of the future shape human behav-
ior. The Johns Hopkins University Press.
Recasens, Daniel (1984). Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76(6), 1624-35.
Pallarès, Maria Dolors, and Fontdevila, Jordi (1997). A model of lingual coarticulation
based on articulatory constraints. Journal of the Acoustical Society of America, 102(1),
544-61.
Reichard, Gladys (1938). Coeur d'Alène. In Handbook of American Indian languages (ed. F. Boas), Volume 3, pp. 515-707. J. J. Augustin, New York.
Remez, Robert E., Fellowes, Jennifer M., and Rubin, Philip E. (1997). Talker identification
based on phonetic information. Journal of Experimental Psychology: Human Perception and
Performance, 23, 651-66.
Rennison, John Richard (1990). On the elements of phonological representations: The evidence
from vowel systems and vowel processes. Folia Linguistica, 24, 175-244.
Rhodes, Richard A. (1992). Flapping in American English. In Proceedings of the Seventh Inter-
national Phonology Meeting, pp. 217-32. Rosenberg and Sellier, Turin.
Rice, Keren (1993). A re-examination of the feature [sonorant]: The status of 'sonorant obstruents'. Language, 69, 308-44.
Richards, Russell M. (1991). Phonologie de trois langues beboides du Cameroun: Noone, Ncanti
et Sari. PhD thesis, Université de la Sorbonne Nouvelle Paris III.
Riding, Richard J. and Rayner, Stephen (2000). International perspectives on individual differences, Vol. 1: Cognitive styles. Ablex Publishing Corporation.
Rix, Helmut (1992). Historische Grammatik des Griechischen: Laut- und Formenlehre (2nd edn).
Wissenschaftliche Buchgesellschaft, Darmstadt.
Rizzolatti, Giacomo and Craighero, Laila (2004). The mirror-neuron system. Annual Review of
Neuroscience, 27, 169-92.
Roengpitya, Rungpat (2001). A study of vowels, diphthongs, and tones in Thai. PhD thesis,
University of California, Berkeley.
Rose, Yvan (2009). Internal and external influences on child language productions. In
Approaches to phonological complexity (eds. I. Chitoran, F. Pellegrino, and E. Marsico), pp.
329-51. Mouton de Gruyter.
Rosen, Stuart M. (1977). Fundamental frequency patterns and the long-short vowel distinc-
tion in Swedish. Speech Transmission Laboratory, Quarterly Progress and Status Report, 1,
31-7.
Ross, John R. (1973). Leftward, ho! In A Festschrift for Morris Halle (eds. S. Anderson and
P. Kiparsky), pp. 166-73. Holt, Rinehart and Winston, New York.
Rumelhart, David E. and McClelland, James M. (1986). On learning the past tenses of English
verbs. In Parallel distributed processing: Explorations of the microstructure of cognition, Vol-
ume 2, pp. 216-71. MIT Press, Cambridge, MA.
Russell, Dan (1996). UCLA loneliness scale (Version 3): Reliability, validity, and factor struc-
ture. Journal of Personality Assessment, 66(1), 20-40.
Russell, Stuart and Norvig, Peter (1995). Artificial intelligence: A modern approach (1st edn.).
Prentice Hall, N.J.
Sachs, Jacqueline (1977). The adaptive significance of linguistic input to prelinguistic children.
In Talking to children (eds. C. Snow and C. Ferguson). Cambridge University Press.
Sagey, Elizabeth C. (1986 [1990]). The representation of features and relations in non-linear
phonology. Garland Publishing, New York.
Saltzman, Elliot and Munhall, Kevin G. (1989). A dynamical approach to gestural patterning
in speech production. Ecological Psychology, 1, 333-82.
Nam, Hosung, Krivokapic, Jelena, and Goldstein, Louis (2008). A task-dynamic toolkit
for modeling the effects of prosodic structure on articulation. In Proceedings of the Speech
Prosody 2008 Conference, Campinas, Brazil (eds. P. A. Barbosa, S. Madureira, and C. Reis).
Salverda, Anne Pier, Dahan, Delphine, and McQueen, James M. (2003). The role of prosodic boundaries in the
resolution of lexical embedding in speech comprehension. Cognition, 90, 51-89.
Schachter, Paul (1976). An unnatural class of consonants in Siswati. Studies in African Linguis-
tics, Supplement 6, 211-20.
and Fromkin, Victoria (1968). A phonology of Akan. UCLA Working Papers in
Phonetics, 9.
Schilling-Estes, Natalie (2002). American English social dialect variation and gender. Journal
of English Linguistics, 30(2), 122-37.
Schuh, Russell G. (1998). A grammar of Miya. University of California Press, Berkeley.
Selkirk, Elizabeth O. (1980). Prosodic domains in phonology: Sanskrit revisited. In Juncture
(eds. M. Aronoff and M.-L. Kean), pp. 107-29. Anma Libri, Saratoga.
Sendlmeier, Werner F. (1981). Der Einfluß von Qualität und Quantität auf die Perzeption
betonter Vokale im Deutschen. Phonetica, 38, 291-308.
Sereno, Joan A. and Jongman, Allard (1995). Acoustic correlates of grammatical class. Language
and Speech, 38(1), 57-76.
Shannon, Claude (1948). A mathematical theory of communication. The Bell System Technical
Journal, 27, 379-423, 623-56.
Shattuck-Hufnagel, Stefanie (1987). The role of word-onset consonants in speech production
planning: New evidence from speech error patterns. In Motor and sensory processes of language (eds. E. Keller and M. Gopnik). Erlbaum, Englewood Cliffs, N.J.
and Klatt, Dennis H. (1979). The limited use of distinctive features and markedness in
speech production: Evidence from speech error data. Journal of Verbal Learning and Verbal
Behavior, 18, 41-55.
Sheldon, Amy and Strange, Winifred (1982). The acquisition of /r/ and /l/ by Japanese learn-
ers of English: Evidence that speech production can precede speech perception. Applied
Psycholinguistics, 3, 243-61.
Sheliga, Boris M., Riggio, Lucia, and Rizzolatti, Giacomo (1994). Orienting of attention and eye
movements. Experimental Brain Research, 98, 507-22.
Shen, Xiaonan (1990). Tonal coarticulation in Mandarin. Journal of Phonetics, 18(2), 281-95.
Sherman, Donald (1975). Noun-verb stress alternation: An example of the lexical diffusion of
sound change in English. Linguistics, 159, 43-71.
Shiller, Douglas M., Sato, Marc, Gracco, Vincent L., and Baum, Shari R. (2009). Perceptual
recalibration of speech sounds following speech motor learning. Journal of the Acoustical
Society of America, 125, 1103-13.
Shockley, Kevin, Sabadini, Laura, and Fowler, Carol A. (2004). Imitation in shadowing words.
Perception and Psychophysics, 66(3), 422-9.
Shriberg, Elizabeth E. (1992). Perceptual restoration of filtered vowels with added noise. Language and Speech, 35, 127-36.
Sievers, Eduard (1898). Angelsächsische Grammatik (3rd edn.). Max Niemeyer, Halle.
Silva, David J. (1992). The phonetics and phonology of stop lenition in Korean. PhD thesis, Cornell
University.
(1993). A phonetically based analysis of [voice] and [fortis] in Korean. In Japanese/Korean
Linguistics (ed. P. M. Clancy), Volume 2, pp. 164-74. CSLI, Stanford.
(2006a). Acoustic evidence for the emergence of tonal contrast in contemporary Korean.
Phonology, 23, 287-308.
(2006b). Variation in voice onset time for Korean stops: A case for recent sound change. Korean Linguistics, 13, 1-16.
Silverman, Daniel (2006a). A critical introduction to phonology: Of sound, mind, and body.
Continuum.
(2006b). The diachrony of labiality in Trique, and the functional relevance of gradience
and variation. In Papers in laboratory phonology VIII (eds. L. M. Goldstein, D. H. Whalen,
and C. T. Best), pp. 135-54. Mouton de Gruyter, Berlin.
Sims, Andrea (2005). Declension hopping in dialectal Croatian: Two predictions of frequency.
In Yearbook of Morphology 2005 (eds. G. Booij and J. van Marle), pp. 201-25. Springer,
Dordrecht.
Šipka, Danko (2002). Enigmatski Glosar. Alma, Belgrade.
Smith, Caroline L. (1997). The devoicing of /z/ in American English: effects of local and
prosodic context. Journal of Phonetics, 25(4), 471-500.
Smith, Jennifer L. (2002). Phonological augmentation in prominent positions. PhD thesis, Uni-
versity of Massachusetts.
Smyth, Herbert Weir (1956). Greek grammar. Revised by Gordon M. Messing. Harvard Uni-
versity Press, Cambridge, Mass.
Snider, Keith L. (1986). Apocope, tone and the glottal stop in Chumburung. Journal of African
Languages and Linguistics, 8, 133-44.
Sohn, Ho-Min (1994). Korean. Routledge, New York.
(1999). The Korean Language. Cambridge University Press, Cambridge.
Solé, Maria-Josep (1992a). Experimental phonology: The case of rhotacism. In Phonologica 1988 (eds. W. U. Dressler, H. C. Luschützky, O. E. Pfeiffer, and J. R. Rennison), pp. 259-71.
Cambridge University Press, Cambridge.
(1992b). Phonetic and phonological processes: the case of nasalization. Language and
Speech, 35(1-2), 29-43.
Sonderegger, Morgan (2009). Dynamical systems models of language variation and change:
An application to an English stress shift. Master's thesis, Department of Computer Science,
University of Chicago.
(in press). Testing for frequency and structural effects in an English stress shift. In Pro-
ceedings of the Berkeley Linguistics Society 36 (eds. J. Cleary-Kemp, C. Cohen, S. Farmer,
L. Kassner, J. Sylak, and M. Woodley). Berkeley Linguistics Society.
and Niyogi, Partha (2010). Combining data and mathematical models of language change.
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics,
Uppsala, Sweden, pp. 1019-29. Association for Computational Linguistics.
Stampe, David (1972). A dissertation on natural phonology. PhD thesis, University of Chicago.
Stanley, Carol (1991). Description morpho-syntaxique de la langue tikar (parlée au Cameroun). SIL, Épinay-sur-Seine.
Stemberger, Joseph Paul (1991). Apparent anti-frequency effects in language production: The
Addition Bias and phonological underspecification. Journal of Memory and Language, 30,
161-85.
and Treiman, R. (1986). The internal structure of word-initial consonant clusters. Journal of Memory and Language, 25, 163-80.
Steriade, Donca (2000). Paradigm uniformity and the phonetics-phonology boundary. In
Papers in laboratory phonology V: Acquisition and the lexicon (eds. M. Broe and J. Pierre-
humbert), pp. 313-34. Cambridge University Press, Cambridge.
(2001). Directional asymmetries in place assimilation: A perceptual account. In Perception
in phonology (eds. E. Hume and K. Johnson), pp. 219-50. Academic Press, San Diego.
(2008). The phonology of perceptibility effects: The P-map and its consequences for constraint
organization. In The nature of the word: Essays in honor of Paul Kiparsky (eds. K. Hanson and
S. Inkelas), pp. 151-80. MIT Press, Cambridge, Mass.
Sternberg, Saul, Knoll, Ronald, Monsell, Stephen, and Wright, Charles E. (1988). Motor programs and hierarchical organization in the control of rapid speech. Phonetica, 45, 175-97.
Sternberg, Saul, Monsell, Stephen, Knoll, Ronald, and Wright, Charles E. (1978). The latency and
duration of rapid movement sequences: Comparisons of speech and typing. In Information
Processing in Motor Control and Learning (ed. G. E. Stelmach), pp. 117-52. Academic Press,
New York.
Stevens, Kenneth N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-46.
and Halle, Morris (1967). Remarks on analysis by synthesis and distinctive features. In
Models for the perception of speech and visual form (ed. W. Wathen-Dunn), pp. 88-102. MIT
Press.
and House, Arthur S. (1963). Perturbation of vowel articulations by consonantal context:
An acoustical study. Journal of Speech and Hearing Research, 6, 111-28.
and Keyser, Samuel Jay (1989). Primary features and their enhancement in consonants.
Language, 65, 81-106.
Stewart, Mary E. and Ota, Mitsuhiko (2008). Lexical effects on speech perception in individuals
with 'autistic' traits. Cognition, 109, 157-62.
Stoesz, Brenda M. and Jakobson, Lorna S. (2008). The influence of processing style on face
perception. Journal of Vision, 8(6), 1138.
Strand, Elizabeth A. (1999). Uncovering the role of gender stereotypes in speech perception.
Journal of Language and Social Psychology, 18, 86-99.
Strange, Winifred, Verbrugge, Robert R., Shankweiler, Donald P., and Edman, Thomas R.
(1976). Consonant environment specifies vowel identity. Journal of the Acoustical Society
of America, 60, 213-24.
Streeter, Lynn A. (1976). Language perception of two-month-old infants shows effects of both
innate mechanisms and experience. Nature, 259, 39-41.
Strogatz, Steven H. (1994). Nonlinear dynamics and chaos. Addison-Wesley, Reading, MA.
Strong, Herbert A., Logeman, Willem S., and Wheeler, Benjamin Ide (1891). Introduction to
the study of the history of language. Longmans, Green, & Co., New York.
Stuart-Smith, Jane and Timmins, Claire (2009). The role of the individual in language variation
and change. In Language and Identities (eds. C. Llamas and D. Watt), pp. 39-54. Edinburgh
University Press, Edinburgh.
and Tweedie, Fiona (2007). 'Talkin' Jockney'? Variation and change in Glaswegian
accent. Journal of Sociolinguistics, 11, 221-60.
Summerfield, Quentin (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 7, 1074-95.
Sundberg, Ulla and Lacerda, Francisco (1999). Voice onset time in speech to infants and adults.
Phonetica, 56, 186-99.
Svantesson, Jan-Olof (1989). Tonogenetic mechanisms in Northern Mon-Khmer. Phonet-
ica, 46, 60-79.
Sweet, Henry (1913). Collected papers of Henry Sweet, ed. by H. C. Wyld. Clarendon Press,
Oxford.
Tabor, Whitney T. (1994). Syntactic innovation: A connectionist model. PhD thesis, Stanford
University, Stanford, CA.
Tang, Joanne S.-Y. and Maidment, John A. (1996). Prosodic aspects of Cantonese child-directed
speech. Speech, Hearing and Language, 9, 257-76.
Tang, Katrina Elizabeth (2008). The phonology and phonetics of consonant-tone interaction.
PhD thesis, UCLA.
Tanowitz, Jill and Beddor, Patrice Speeter (1997). Temporal characteristics of coarticula-
tory vowel nasalization in English. The Journal of the Acoustical Society of America, 101,
3194A.
Tees, Richard C. and Werker, Janet F. (1984). Perceptual flexibility: Maintenance or recovery
of the ability to discriminate nonnative speech sounds. Canadian Journal of Psychology, 38,
579-90.
Templin, Mildred C. (1957). Certain language skills in children: Their development and interre-
lationships. Greenwood, Westport, Conn.
Tettamanti, Marco, Moro, Andrea, Messa, Cristina, Moresco, Rosa M., Rizzo, Giovanna,
Carpinelli, Assunta, Matarrese, Mario, Fazio, Ferruccio, and Perani, Daniela (2005). Basal
ganglia and language: phonology modulates dopaminergic release. Neuroreport, 16(4),
397-401.
Thiessen, Erik D., Hill, Emily A., and Saffran, Jenny R. (2005). Infant-directed speech facilitates
word segmentation. Infancy, 7(1), 53-71.
Thompson, Laurence C. and Thompson, M. Terry (1985). A Grassmann's Law for Salish.
Oceanic Linguistics Special Publications, 20,134-47.
Thurgood, Graham and Javkin, Hector (1975). An acoustic explanation of a sound change: *-ap to -o, *-at to -e, and *-ak to -ae. Journal of Phonetics, 3, 161-5.
Tilsen, Sam (2007). Vowel-to-vowel coarticulation and dissimilation in phonemic-response
priming. In UC-Berkeley Phonology Lab Annual Report, pp. 416-58. Berkeley Phonology
Laboratory.
(2009a). Interactions between speech rhythm and gesture. PhD thesis, University of
California, Berkeley.
Tilsen, Sam (2009b). Subphonemic and cross-phonemic priming in vowel shadowing: evidence
for the involvement of exemplars in production. Journal of Phonetics, 37(3), 276-96.
Tipper, Steven P., Howard, Louise A., and Houghton, George (2000). Behavioral consequences
of selection from neural population codes. In Attention and performance XVIII: Control of
cognitive processes (eds. S. Monsell and J. Driver), pp. 223-45. MIT Press, Cambridge, MA.
Toon, Thomas E. (1978). Lexical diffusion in Old English. In Chicago Linguistics Society: Papers
from the parasession on the lexicon, pp. 357-64. Chicago Linguistics Society.
Toscano, Joseph C. and McMurray, Bob (2010). Cue integration with categories: Weighting
acoustic cues in speech using unsupervised learning and distributional statistics. Cognitive
Science, 34, 434-64.
Tournadre, Nicolas (2005). L'Aire linguistique tibétaine et ses divers dialectes. LALIES, 25,
7-56.
Townsend, David and Bever, Thomas (2001). Sentence comprehension: The integration of habits
and rules. MIT Press, Cambridge, MA.
Trager, George L. (1940). One phonemic entity becomes two: The case of 'short a'. American
Speech, 15, 255-8.
Traill, Anthony (1990). Depression without depressors. South African Journal of African Lan-
guages, 10, 166-72.
Trehub, Sandra E. (1976). The discrimination of foreign speech contrasts by infants and adults.
Child Development, 47, 466-72.
Treiman, Rebecca, Kessler, Brett, Knewasser, Stephanie, Tincoff, Ruth, and Bowman, Margo
(2000). English speakers' sensitivity to phonotactic patterns. In Papers in laboratory phonol-
ogy V: Acquisition and the lexicon (eds. M. Broe and J. Pierrehumbert), pp. 269-82.
Cambridge University Press, Cambridge.
Tremblay, Kelly, Kraus, Nina, and McGee, Thérèse (1998). The time course of auditory percep-
tual learning: Neurophysiological changes during speech-sound training. NeuroReport, 9,
3557-60.
Troutman, Celina, Goldrick, Matthew, and Clark, Brady (2008). Social networks and
intraspeaker variation during periods of language change. University of Pennsylvania Work-
ing Papers in Linguistics, 14(1), 325-38.
Trubetzkoy, Nikolai Sergeevich (1969). Principles of phonology [originally published in 1939;
English translation by Christiane A. M. Baltaxe]. University of California Press, Berkeley,
CA.
Tsushima, Teruaki, Takizawa, Osamu, Sasaki, Midori, Shiraki, Satoshi, Nishi, Kanae, Kohno,
Morio, Menyuk, Paula, and Best, Catherine T. (1994). Discrimination of English /r-l/ and
/w-y/ by Japanese infants at 6-12 months: Language-specific developmental changes in
speech perception abilities. In Proceedings of the International Conference on Spoken Lan-
guage Processing, Volume 4, pp. 1695-8. Acoustical Society of Japan.
Umeda, N. (1981). Influence of segmental factors on fundamental frequency in fluent speech.
Journal of the Acoustical Society of America, 70(2), 350-5.
Välimaa-Blum, Riitta (2009). The phoneme in cognitive phonology: episodic memories of both meaningful and meaningless units? Cognitextes, 2. Retrieved from http://cognitextes.revues.org/211 on 2010-07-16.
Vallabha, Gautam K., McClelland, James L., Pons, Ferran, Werker, Janet F., and Amano, Shigeaki
(2007). Unsupervised learning of vowel categories from infant-directed speech. Proceedings
of the National Academy of Sciences, 104(33), 13273-8.
van der Hulst, Harry and van de Weijer, Jeroen (1995). Vowel harmony. In Handbook of phono-
logical theory (ed. J. Goldsmith). Blackwell, Cambridge, MA and Oxford.
Van der Stigchel, Stefan, Meeter, Martijn, and Theeuwes, Jan (2006). Eye movement trajectories
and what they tell us. Neuroscience Biobehavioral Review, 30(5), 666-79.
and Theeuwes, Jan (2005). The influence of attending to multiple locations on eye move-
ments. Vision Research, 45(15), 1921-7.
van Dommelen, Wim A. (1993). Does dynamic F0 increase perceived duration? New light on
an old issue. Journal of Phonetics, 21, 367-86.
Vance, Timothy J. (1987). An introduction to Japanese phonology. State University of New York
Press, Albany.
Vennemann, Theo (1972a). Phonetic analogy and conceptual analogy. In Schuchardt, the Neogrammarians, and the transformational theory of phonological change: Four essays by Hugo Schuchardt, Theo Vennemann, Terence H. Wilbur (eds. T. Vennemann and T. H. Wilbur), No. 26 in Linguistische Forschungen, pp. 115-79. Athenäum, Frankfurt am Main.
(1972b). Rule inversion. Lingua, 29, 209-42.
(1974). Words and syllables in natural generative phonology. In Parasession on natural
phonology (eds. A. Bruck, R. Fox, and M. La Galy), pp. 346-74. Chicago Linguistic Society.
Vergnaud, Jean-Roger (1980). A formal theory of vowel harmony. In Issues in vowel harmony
(ed. R. M. Vago), pp. 49-62. John Benjamins, Amsterdam.
Verner, Karl (1877). Eine Ausnahme der ersten Lautverschiebung. Zeitschrift für vergleichende
Sprachforschung, 23, 97-130.
Viswanathan, Navin, Magnuson, James S., and Fowler, Carol A. (2010). Compensation for
coarticulation: Disentangling auditory and gestural theories of perception of coarticula-
tory effects in speech. Journal of Experimental Psychology: Human Perception and Perfor-
mance, 36(4), 1005-15.
Vitevitch, Michael and Luce, Paul (1999). Probabilistic phonotactics and neighborhood activa-
tion in spoken word recognition. Journal of Memory and Language, 40, 374-408.
Charles-Luce, Jan, and Kemmerer, David (1997). Phonotactics and syllable stress:
Implications for the processing of spoken nonsense words. Language and Speech, 40(1),
47-62.
von dem Hagen, Elisabeth A. H., Nummenmaa, Lauri, Yu, Rongjun, Engell, Andrew D.,
Ewbank, Michael P., and Calder, Andrew J. (2010). Autism spectrum traits in the typical
population predict structure and function in the posterior superior temporal sulcus. Cerebral
Cortex, 21(3), 493-500.
Vulkan, Nir (2000). An economist's perspective on probability matching. Journal of Economic Surveys, 14, 101-18.
Wakabayashi, Akio, Baron-Cohen, Simon, and Wheelwright, Sally (2006). Are autistic traits an
independent personality dimension? A study of the Autism-Spectrum Quotient (AQ) and
the NEO-PI-R. Personality and Individual Differences, 41, 873-83.
Walter, Mary Ann (2008). Heading toward harmony? Vowel cooccurrence in the Croatian
lexicon. Paper presented at the Symposium on Phonologization, University of Chicago.
Wang, William S.-Y. and Fillmore, Charles J. (1961). Intrinsic cues and consonant perception. Journal of Speech and Hearing Research, 4, 130-6.
Lehiste, I., Chuang, C. K., and Darnovsky, N. (1976). Perception of vowel duration. Journal
of the Acoustical Society of America, 60, 892.
Watkins, Kate and Paus, Tomáš (2004). Modulation of motor excitability during speech perception: The role of Broca's area. Journal of Cognitive Neuroscience, 16(6), 978-87.
Strafella, A., and Paus, T. (2003). Seeing and hearing speech excites the motor system
involved in speech production. Neuropsychologia, 41, 989-94.
Watters, John Robert (1979). Focus in Aghem: A study of its formal correlates and typology. In
Aghem grammatical structure, Southern California Occasional Papers in Linguistics 7, pp.
137-97. University of Southern California, Los Angeles.
Wedel, Andrew (2004a). Category competition drives contrast maintenance within an
exemplar-based production/perception loop. In Proceedings of the seventh meeting of the ACL
special interest group in computational phonology, Barcelona, Spain, pp. 1-10. Association for
Computational Linguistics.
(2004b). Self-organization and categorical behavior in phonology. PhD thesis, UC Santa
Cruz.
(2006). Exemplar models, evolution and language change. The Linguistic Review, 23,
247-74.
(2007). Feedback and regularity in the lexicon. Phonology, 24, 147-85.
Weinreich, Uriel, Labov, William, and Herzog, Marvin I. (1968). Empirical foundations for a
theory of language change. In Directions for historical linguistics: A symposium (eds. W. P.
Lehmann and Y. Malkiel), pp. 95-188. University of Texas Press, Austin, TX.
Weiss, Michael (2010). Outline of the historical and comparative grammar of Latin. Beech Stave
Press, Ann Arbor, Mich.
Weisstein, Eric (2009). Beta distribution. From MathWorld-A Wolfram Web Resource.
Retrieved on 2009-03-15.
Welsh, Timothy and Elliott, Digby (2005). The effects of response priming on the planning and execution of goal-directed movements in the presence of a distracting stimulus. Acta Psychologica, 119, 123-42.
Werker, Janet F. and McLeod, P. J. (1989). Infant preference for both male and female infant-
directed talk: A developmental study of attentional and affective responses. Canadian Journal
of Psychology, 43(2), 230-46.
Pons, Ferran, Dietrich, Christiane, Kajikawa, Sachiyo, Fais, Laurel, and Amano, Shigeaki
(2007). Infant-directed speech supports phonetic category learning in English and Japanese.
Cognition, 103, 147-62.
Werker, Janet F., Shi, R., Desjardins, R., Pegg, J. E., Polka, L., and Patterson, M. (1998). Three methods for testing infant speech perception. In Perceptual development: Visual, auditory, and speech perception in infancy (ed. A. Slater), pp. 389-420. Psychology Press, Hove, East Sussex, UK.
Werker, Janet F. and Tees, Richard C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.
Westbury, John and Keating, Patricia A. (1986). On the naturalness of stop consonant voicing. Journal of Linguistics, 22, 145-66.
Westbury, John, Hashi, Michiko, and Lindstrom, Mary J. (1998). Differences among speakers in lingual articulation for American English /r/. Speech Communication, 26, 203-26.
Wetzels, W. Leo (2007). On the representation of nasality in Maxacalí: Evidence from Portuguese loans. Appeared (2006) in Portuguese translation as Sobre a representação da nasalidade em Maxacalí: evidências de empréstimos do Português. In Descrição, História e Aquisição do Português Brasileiro, pp. 217-40. Pontes/FAPESP, Campinas.
Whalen, Douglas H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18, 3-35.
Whalen, Douglas H. (1991). Subcategorical phonetic mismatches and lexical access. Perception and Psychophysics, 50, 351-60.
Whalen, Douglas H., Levitt, Andrea G., and Goldstein, Louis M. (2007). VOT in the babbling of French- and English-learning infants. Journal of Phonetics, 35(3), 341-52.
Wheeldon, Linda R. and Lahiri, Aditi (1997). Prosodic units in speech production. Journal of Memory and Language, 37, 356-81.
Wheeldon, Linda R. and Lahiri, Aditi (2002). The minimal unit of phonological encoding: prosodic or lexical word. Cognition, 85(2), B31-B41.
Wheeldon, Linda R. and Levelt, Willem J. M. (1995). Monitoring the time course of phonological encoding. Journal of Memory and Language, 34, 311-34.
Wheelwright, Sally, Baron-Cohen, Simon, Goldenfeld, Nigel, Delaney, Joe, Fine, Debra, Smith,
Richard, Weil, Leonora, and Wakabayashi, Akio (2006). Predicting Autism Spectrum Quo-
tient (AQ) from the Systemizing Quotient-Revised (SQ-R) and Empathy Quotient (EQ).
Brain Research, 1079, 47-56.
Wilson, Colin (2006). Learning phonology with substantive bias: An experimental and com-
putational study of velar palatalization. Cognitive Science, 30, 945-82.
Windfuhr, Gernot L. (1997). Persian phonology. In Phonologies of Asia and Africa (ed. A. Kaye), Volume 2, pp. 675-90. Eisenbrauns, Winona Lake.
Witkin, Herman A., Moore, Carol A., Goodenough, Donald R., and Cox, Patricia W. (1977). Field-dependent and field-independent cognitive styles and their educational implications. Review of Educational Research, 47(1), 1-64.
Wong, Patrick C. M. and Perrachione, Tyler K. (2007). Learning pitch patterns in lexical iden-
tification by native English-speaking adults. Applied Psycholinguistics, 28, 565-85.
Wong, Patrick C. M., Perrachione, Tyler K., and Parrish, Todd B. (2007). Neural characteristics of successful and less successful speech and word learning in adults. Human Brain Mapping, 28, 995-1006.
Wright, Jonathan (2007). Laryngeal contrasts in Seoul Korean. PhD thesis, University of Penn-
sylvania, Philadelphia, PA.
Wright, Richard (1996). Consonant clusters and cue preservation in Tsou. PhD thesis, UCLA.
Wright, Richard and Ladefoged, Peter (1997). A phonetic study of Tsou. Bulletin of the Institute of History and Philology, Academia Sinica, 68, 987-1028.
Xu, Yi (1997). Contextual tonal variation in Mandarin. Journal of Phonetics, 25(1), 65-83.
Yamada, Reiko A. and Tohkura, Yoh'ichi (1992). The effects of experimental variables on the perception of American English /r/ and /l/ by Japanese listeners. Perception and Psychophysics, 52, 376-92.
Yang, Charles D. (2001). Internal and external forces in language change. Language Variation
and Change, 12(3), 231-50.
(2002). Knowledge and learning in natural language. Oxford University Press, New York.
Yu, Alan C. L. (2004). Explaining final obstruent voicing in Lezgian: Phonetics and history. Language, 80(1), 73-97.
Yu, Alan C. L. (2007). Understanding near mergers: The case of morphological tone in Cantonese. Phonology, 24(1), 187-214.
Yu, Alan C. L. (2010a). Perceptual compensation is correlated with individuals' 'autistic' traits: Implications for models of sound change. PLoS ONE, 5(8), e11950.
Yu, Alan C. L. (2010b). Tonal effects on perceived vowel duration. In Laboratory Phonology 10 (eds. C. Fougeron, B. Kühnert, M. D'Imperio, and N. Vallée), pp. 151-68. Mouton de Gruyter, Berlin.
Yu, Alan C. L. (2011). On measuring phonetic precursor robustness: A response to Moreton 2008. Phonology, 28(3), 491-518.
Yu, Alan C. L., Abrego-Collier, Carissa, Baglini, Rebekah, Grano, Tommy, Martinovic, Martina, Otte, Charles III, Thomas, Julia, and Urban, Jasmin (2011). Speaker attitude and sexual orientation affect phonetic imitation. Penn Working Papers in Linguistics, 17(1), 235-42.
Zimmer, Karl (1985). Arabic loanwords and Turkish phonological structure. International Jour-
nal of American Linguistics, 51, 623-5.
Zipf, George Kingsley (1932). Selected studies of the principle of relative frequency in language.
Harvard University Press, Cambridge, MA.
Zuraw, Kie (2003). Probability in language change. In Probabilistic Linguistics (eds. Rens Bod, Jennifer Hay, and Stefanie Jannedy), pp. 139-76. MIT Press.
Zuraw, Kie (2007). The role of phonetic knowledge in phonological patterning: Corpus and survey evidence from Tagalog infixation. Language, 83(2), 277-316.
Language Index
Aari 66
Aghem 24-5
Akuapem/Asante 21
Athabaskan 66, 72
Bole 12-13
Bondei 12
Cantonese 145, 153, 219-20
Central Tibetan 65
Chichewa 12
Chumburung 21
Cokwe 13
Creek 104
Czech 132, 204
Dagbani 21
Digo 12-13
Dutch 100, 137, 151-2, 155-6, 158-61
  Limburgian 100
English 6, 8, 20, 22, 24, 42, 45-6, 51, 58, 73-8, 75, 79, 83-5, 87, 117, 123, 130-41, 144-5, 151-2, 154-61, 168, 182-3, 185, 187-8, 192, 194, 204-5, 212, 219, 249, 262-3, 265, 267, 269, 271-3, 275, 277, 279, 281, 283-4
  African American Vernacular 137
  British 20, 123, 263-5
  Cockney 137
  Middle English 68-9
  Old English 42, 68-9, 72-3
Estonian 101, 106-7
Ewe 6, 10
Fante 21
Filipino 133-4, 138
Finnish 248-9
French 22, 36-8, 130, 132, 136-9, 151-2, 155-6, 158-61, 249-50
German 67-8, 75, 83, 104-6, 109-10, 139, 151-2, 155-6, 158-61
Giryama 12-13
Gonja 21
Greek 67, 69-71, 73-6
Hindi 132, 204, 253
Hu 101-2
Ikalanga 6, 15
Japanese 15, 63, 105-6, 109-10, 132, 137, 139, 204
Kauma 12
Kinande 14
Korean 99-100, 149, 151-2, 155-6, 158-61, 184-5, 196, 204, 229-30, 233, 238-43, 245-6
  Middle 99-100
  Modern Seoul 99-100
Latin 67-8, 72-3, 75-6, 83, 103
  Late Spoken 102
Luganda 14, 21
Makua 13
Malagasy 24, 134
Masa 14-15
Mentu Land Dayak 15
Mijikenda 12-13
Miya 12, 16
Musey 12, 14
Mwiini 12
Namwanga 12
Navajo 66
Ndebele 12
Ngizim 6, 12-13, 16
Ngulu 12
Nlaka'pamux 132
Nupe 6
Old Sardinian 67
Pare 12
Persian-Iranian 102
Podoko 12-13
Pokomo 12
Portuguese 182
Punu 16-19, 21
Rihe 12
Sea Dayak 15
Secwepemctsin 75
Serbo-Croatian 151-2, 154-6, 158-60
Shambala 12
Sncicuʔumscn 79
Spanish (Latin American) 105-6
Suma 11-12
Sundanese 15, 75-6
Swahili 13
Swati 12
Tagalog 24
Thai 104-6, 109-10, 136, 145, 204
Tiene 19
Tikar 21
Tsonga 16
Turkish 249, 253
U 101
Ulu Muar Malay 15
Venda 13
Xhosa 12
Yaa 13
Yaka 13
Yulu 11-13
Zar 12-13
Zigula 12
Zulu 6, 12
Subject Index
actuation 51, 53, 65, 83-4, 90, 96, 201-2, 226, 253, 258, 262-4, 282
altered auditory feedback 89
analogical change 31, 42, 45-6, 51, 82, 272-3
analogy 9, 13, 22, 51, 82, 263, 275, 282
articulatory complexity 37, 42-4
articulatory exemplars 87
Artificial Grammar Learning 183, 194, 196
aspiration 10, 12-13, 20, 76-8, 89, 136, 176
assimilation 6, 18, 41-2, 54-7, 66, 76, 114, 119, 123, 126, 167, 172-3, 176-8, 250
ATR 16-19, 21
Autism 206, 209, 220
Autism spectrum disorder 206-7, 210, 226
Autism-Spectrum Quotient (AQ) 206-9, 211-15, 217, 219-26
autonomous agent simulation 85
Bayes 33, 234, 255
bias 31, 44-7, 52-3, 56, 58-68, 70, 72, 78-87, 89-90, 92-7, 107, 110, 129, 132, 144-5, 163, 192, 194, 201-2, 206, 213, 230, 232, 237-9, 241-7, 253, 258, 260, 272
bifurcations 263, 279, 281-2
borrowings 10, 12, 15, 22, 249
boundaries 8, 20-1, 62, 140, 152, 183-6, 189, 194, 196-7
breathiness 10, 20
Canadian raising 79, 157
CELEX 152, 154, 163
change:
  target 30, 40, 55, 71-2, 74-80, 84, 88-96, 103-4, 106, 111-19, 123-7, 129, 202, 219, 229, 231, 236-9, 241-2, 244-6, 249, 252, 261, 271
  trigger 5, 7, 9-10, 15-18, 51, 57, 75, 79, 84, 167, 169, 174-5, 205, 220, 249, 273
coarticulation (vowel-to-vowel) 113, 125, 249-50, 252, 254, 259-60
confusability 37-8, 41, 64, 272
consonant harmony 66-7, 94-5, 97
consonant-vowel harmony 79
constraints 12, 19, 31, 52-3, 55-7, 59-62, 65, 68, 78, 81-2, 85, 92, 97, 112-16, 127, 130, 144, 149, 183-4, 230-1, 248, 252-3
contrast 4-5, 7-21, 25-6, 62, 64, 79-80, 82, 98-106, 109, 112-16, 122, 125-7, 129, 131-40, 145, 154, 159-60, 173, 182-3, 185-7, 189, 204-5, 215, 220, 228-32, 234-46, 250, 253, 258, 283
corpora 25, 36, 60, 67, 77, 88-9, 114, 125, 130, 141, 143-4, 146, 150-3, 159, 183, 204, 250, 256, 259, 264, 279-80, 282, 284
  British National Corpus 264
  Corpus of Historical American English 284
coupling 203, 275, 277-9
cue informativeness 229-30, 232-40, 242-6; see also cue quality
deletion 18, 36-7, 41-3, 46, 88, 253
demarcation 19
depressor consonants 6, 10-16, 18
Derived Environment Effects 183-4, 196-7
diatonic stress shift 262, 264-5, 270, 272-3, 275, 282
dictionaries 263-5, 268, 271, 283
discarding 279-80, 282
discontinuous phonetic bias 94-6
discrete dynamical systems 277
dispersion 92, 112-13, 115-16, 126-7, 145, 258
dissimilation 54-7, 66, 70, 74-8, 113, 116-19, 122-3, 125-7, 153, 155-6, 169, 171-3, 175-80
distributional constraints 12, 19
dual-representation model 87-8, 90
dynamical systems 268, 271, 277-8
efficiency 36, 41, 46, 231
emergence 3, 74, 80, 83, 97, 130, 182, 184, 209, 212, 225, 247, 249-53, 255-7, 259-61
empathy 207, 209-10, 212, 218, 225
Empathy Quotient (EQ) 207, 209-19, 222, 224-5
English stress 262-72, 275, 277-80, 282
enhancement 8-9, 11, 25, 62, 71-2, 74, 78-80, 85, 96-7, 229-32, 236-9, 241-6
  adaptive 229-30, 244
  probabilistic 85, 230, 232, 236, 238-9, 241, 243-6
entropic contribution 31-4, 37-8, 47
entropy 29-38, 40-1, 44, 46-7
epenthesis/epenthetic 32, 36-8, 40, 178
exemplars 55, 80, 84-7, 89-96, 111, 114-15, 124-7, 138, 202, 209, 219, 236, 238-9, 260
exemplar-based model 84, 86, 88, 90, 95-6, 113-15, 125, 162, 202, 236, 252-3, 260
exemplar-based phonology 55, 92
exemplar theory 86-7, 114, 124-5, 162-3
expectation 38-40, 44, 46, 182-4, 192, 194, 196
expectedness 31, 38-9, 42, 46
f0 8-16, 100-11, 118-22, 139-44, 228-31, 233, 238-46
features
  feature effects 165-6, 174
  phonological 7-11, 15-19, 21-2, 39, 56, 62, 72, 75-81, 90, 102, 114, 140-1, 145, 153, 163-80, 182, 185, 191, 196, 205, 228, 231, 248, 254-6, 262
fixed point 268, 278-81
focus 30, 39-40, 122, 212, 218, 220-1, 226
frequency:
  fundamental 99, 103-5, 135, 141-2, 189, 205, 207, 246
  lexical 21, 25, 30, 36-9, 41-4, 46, 82-4, 90-1, 93-6, 115, 130-2, 138-9, 145-6, 150-2, 154-6, 159-63, 165, 169, 195, 197, 263-5, 270, 272-5, 278-84
fricatives 8, 44-5, 52, 62-3, 68-9, 72-4, 79, 83, 89, 92-3, 97, 132, 134-5, 137-8, 164, 174, 176, 204, 207
functional explanation (diachronic) 259
gestural mechanics 59-60, 62, 64-5, 69, 78-80, 97
gestures 16, 19, 40, 59, 62-5, 96, 111, 145, 167, 173, 178, 231
gliding 68
glottal stop 20-1, 63
gradient phonetic bias 93-4, 96
grammaticalization 5, 19-20, 22-3, 25-8
H & H theory 220
habituation 133
harmony 16-17, 19, 52, 66-7, 78-9, 94-5, 97, 113-15, 125-6, 151-2, 156-9, 247-61
historical linguistics 29, 54
hypercorrection 55-6, 64, 70, 74, 98, 156, 230
hypocorrection 55-6, 64-5, 113-14
ideal observer contrast precision 233-4, 239
imitation 86-7, 89-90, 93, 104, 144, 202, 253
implosives 10-11, 14
individual differences 63, 84, 203, 205, 210-11, 215, 219, 225-7
infant-directed speech 130, 138, 141, 145-6
information 25, 31-2, 36-7, 41, 114-15, 125, 135, 154, 163, 181-2, 191, 197, 205-7, 220-1, 233-5, 251-3, 255
inhibition 60-1, 66, 77-8, 89, 113, 115, 117, 119, 123-7
innovator 53, 83-4, 202, 219, 221, 224
input filtering 279
instability 31, 40-3, 93, 144, 209
intonation 20, 139
Italian 63, 67, 76, 139, 206
iterated learning 252, 256, 260
iterated maps 277
Kullback-Leibler (KL) divergence 241, 243-5
leader 202, 208, 221, 224, 226
levels (of representation) 7-8, 85
lexical diffusion 96, 263-4, 273
lexicon 20-1, 88, 138, 150-2, 154, 157, 159-63, 236, 238-9, 249-50, 253-4, 256, 258, 275
linguistic population 262-3, 278
listener-based misperception 263
Literature Online 273
long-distance displacement (nonlocal metathesis) 66-7
Lyman's law 15
markedness 39, 62, 85, 149-51, 165
metathesis 42-3, 45-6, 52, 54-7, 60, 66-7, 82, 178; see also long-distance displacement
misperception 41, 56, 58-9, 63-4, 66-7, 70-3, 85, 103, 129-30, 138, 140, 153, 156, 220, 230, 247, 263
mistransmission 272, 279-80, 282
mixture model:
  finite 230, 232
  Gaussian 232
modeling (computational) 39, 251, 253, 259, 262, 271
motor control 89, 116, 130
motor planning 57, 59-61, 65-7, 74-5, 78, 94-5, 97, 112-13, 116, 122, 127, 181, 201, 230
[nasal] 167-8, 170-2, 174, 176-7, 179-80
nasals 5, 9, 11-16, 18, 20, 62, 64, 69, 76, 80, 85, 92, 132-4, 149, 154, 167-8, 170-2, 174, 176-7, 179-98, 249-50
nasal assimilation 167
natural classes 166-7, 169, 196
noun/verb pairs (English) 263-6, 268-73, 275-7, 279, 282-4
obstruent-glide fusion 73
P-base 167
palatalization 5, 54, 64, 69-71, 76, 97, 184-5, 219-20
partitioning 169-70, 172-5, 178, 180, 276
perception 38-9, 41-2, 44, 52-4, 56, 58-60, 63-4, 66-7, 70-3, 78, 81, 85-8, 90, 92, 94, 97, 99, 101, 103-5, 108-11, 114, 127-38, 140, 144-5, 153-4, 156, 183-5, 187, 196, 203-7, 209, 212, 216-20, 224-7, 229-33, 236, 239, 245-7, 253, 263, 272
perceptual compensation 64, 84-5, 92, 127, 154, 207-9, 211, 215-18, 220-1, 224-6, 260
perceptual distinctiveness 30, 36-7, 39, 41, 44, 46, 112, 115
perceptual parsing 59, 63-5, 70-4, 78, 97, 201
perceptual reorganization 132, 145
phonetic accommodation 89-90
phonetic bias 56, 59-62, 79, 85, 92-5, 237-9, 241, 246
phonetic convergence 202
phonetics
  language-specific 4, 6-7, 37, 53, 60, 63, 79, 82, 97, 144, 260
  universal 4, 6-7, 13, 19-20, 22, 36, 53, 60, 81-2, 97, 138, 162, 165-6, 173-4, 178-9, 181, 201
  vs. phonology 5
phonologically active class 165, 167-8, 173-4, 177
phonologization 1, 3-9, 14-23, 25-31, 36, 38, 40, 45, 47, 51-3, 56, 63, 78, 80-2, 97-9, 112-14, 125-30, 138, 145, 149-53, 156-7, 161-3, 165, 167, 173-9, 183-4, 196, 201, 228-33, 238, 241, 243-6, 248-50, 259-60
planning 19, 57, 59-61, 65-7, 74-8, 94-5, 97, 112-13, 116, 122-5, 127, 181, 201, 230
population dynamics 282
prefix 6, 17, 26, 273, 275-6, 283
prenasalized consonants 9, 11, 14, 16
probability 30, 32-9, 43, 45, 58, 94, 124, 143-4, 230, 232-5, 237, 239, 253, 255, 261, 268, 271, 277-82
prosodic constituents 19
radio speech 271
reduction 18, 23, 25, 31, 41, 46-7, 52, 54-5, 57, 83, 88, 104, 185, 253
reliability 41, 46, 232, 234, 236-7, 239
selectional bias 81-2, 97, 230
simulation of sound change 85-6, 90-6, 124, 127, 230, 236-46, 248, 253, 255-6, 260-1
social clique 210-11, 222-4
social network 203, 208, 210, 221, 224-5, 251
sociolinguistic awareness 84
sociolinguistics 40, 83-5, 96, 202-3, 208-9, 212, 225-7
[sonorant] 16, 168, 170, 172-5, 177-8
speech:
  aerodynamics 60-2, 201
  errors 54, 59-60, 65-7, 76-7, 85, 87, 92, 94
  mode of perception 88, 90
  motor plans 87, 89, 112
stress shift stability 263-4, 277-9, 282
structure preservation 45
subphonemic analogy 82
support clique 210-11, 222, 224
surprisal 30-4, 36-47
syllable 8, 17-20, 42, 46, 52, 57, 59, 67, 70-1, 74, 78, 81, 88, 99-102, 104, 106, 111, 114, 123, 133-4, 138-9, 183, 188, 194, 207, 239, 272
sympathy group 210-11, 222-4, 230
systemizing:
  skills 203, 209-10, 213, 218-19, 222, 224-5
  Systemizing Quotient (SQ) 207, 209-19, 222, 224-5
tone 6-16, 77, 98-106, 113, 116, 118-23, 124-5, 139-41, 144-5, 153, 204-5, 219-20, 228
transphonologization 8-9, 13-15, 19, 23, 25, 229, 238
typology 43, 51-5, 57-9, 67, 76-7, 81-2, 129-34, 136-8, 144-5, 157, 178, 210, 219, 247, 249, 251
uncertainty 30, 32-4, 36, 55, 234
underphonologization 150-3, 156-7, 161-3
underspecification 15
variation:
  between speakers 271
  within speakers 117-18, 120, 271
[voice] 8-9, 11, 15-16, 143, 168-72, 174-8
voice onset time 89, 135-7, 140-4, 185, 207, 229, 231, 233, 238-44
voicing 6, 8-9, 11-13, 15-16, 20, 52, 55, 61-3, 68, 80-3, 97, 135-7, 140-5, 156-7, 159-62, 164, 169, 175, 177-8, 185, 228-31, 246, 272
vowel harmony 16-17, 52, 113-15, 125-6, 152, 158-9, 247-53, 259-60
word frequency 82-3, 90, 197, 263, 282