Kristian Berg
kristian.berg@uni-oldenburg.de
1. Overview
Consonant doubling in English shows regularities on many levels. Some regularities are graphotactic1, some are
morphological, some are prosodic, and some seem to be related to etymology. At first glance, the main
regularity seems to be phonographic2: There is a close correlation between double consonant letters and short
(or ‘lax’) vowel phonemes3. The main problem that arises in the description is that this regularity holds only in
one direction: Almost all double consonants indicate that the preceding vowel has a short correspondence (e.g.
<hammer>4, <getting>), but the opposite implication is not valid: Far from all short vowels are marked with
double consonants (cf. e.g. <limit>, <divinity>). For the writer, this poses a major problem: Exactly when are
short vowels marked by double consonants and when not? The reader is affected as well, albeit indirectly. Since
some short vowels are not marked by consonant doubling (e.g. <lemon>), the resulting words are isomorphic to
words with long vowels (e.g. <demon>). Exactly when does a single vowel letter followed by a single
consonant letter correspond to a long vowel, and when does it correspond to a short vowel? These questions are
This paper sets out to answer them, both from the reading and from the writing perspective, using
correspondences between graphemic and phonological forms that are established on the basis of the large lexical
database CELEX (Baayen et al. 1995). The first insight into the matter is that morphologically complex words
1 Graphotactic regularities are the written analog to phonotactic regularities in spoken language. They pertain to the written
side of words alone and capture the combinatorial principles of letters and graphemes. For example, the fact that <x> is,
unlike most other letters, never doubled is an instance of a graphotactic regularity. It cannot be explained by reference to any
other linguistic level.
2 Phonographic regularities describe the relation between units of writing and units of sound. The standard case is that of
phoneme-grapheme correspondences, i.e. the relation between phonemes and graphemes. The grapheme <m>, for example, regularly
corresponds to the phoneme /m/.
3 In this paper, I will use British English as a point of reference. Following Jones et al. (18th ed., 2011), the vowels [ɪ], [æ], [ʊ], [ɒ],
[ʌ] and [e] will be referred to as short. The debate around short vs. long (e.g. Jones et al., 18th ed., 2011), tense vs. lax (e.g. Halle &
Mohanan 1985), free vs. checked (e.g. Kurath 1964) vowels, how to define these categories, and which category is the most
suitable phonologically or phonographically, will be ignored here. As Cummings (1988: 53f.) demonstrates, at least the tense
vs. lax contrast is rooted in spelling. This highlights the crucial importance of keeping the representational levels apart
analytically: If lax vowels are defined phonographically, i.e. as those phonemes that correspond to written vowels in
graphemically closed syllables (as e.g. <bit>, <dinner>, <pudding>), and we use this category to describe phonographic
correspondences (e.g. as in ‘vowel letters before double consonants correspond to lax vowel phonemes’), we end up with a
perfectly tautological statement.
4 Written and spoken words and segments are distinguished as follows: Written words and parts of words are presented in
angled brackets (e.g. <word>); spoken words and parts of words are presented in square brackets (e.g. [wəːd]). If the medial
realization is irrelevant, the word or part of word is italicized (e.g. word).
and morphologically simple words behave differently. Double consonants in morphologically complex words
can be motivated with reference to graphemic word-formation rules: Under certain well-defined conditions, the
stem-final consonant of a base is doubled when a vowel-initial suffix operates on that base (<bet> – <betting>).
As will be argued below, these conditions may be framed in purely graphemic terms without reference to
phonology. Morphologically simple words, on the other hand, are different in that the occurrence (or non-
occurrence) of double consonants hinges on the word ending. In this paper, the term word ending denotes
recurring word-final entities without meaning, but with distributional properties (see section 4, Fn. 7 for a more
thorough definition). For example, words which end with <-it> are very likely to occur with single intervocalic
consonant letters (<limit>, <profit>, <spirit>), while words which end with <-ow> are likely to occur with
double consonants (e.g. <sorrow>, <willow>). Perhaps surprisingly, both kinds of word endings sometimes
have the same phonological correspondence: Words that end with <-ic> almost never double the preceding
consonant letter (e.g. <panic>), while words that end with <-ick> almost always do (e.g. <derrick>). Graphotactic
constraints operate across both morphologically simple and complex words. For example, not all single
consonant letters can be doubled. These three areas will be covered in the following, starting with graphotactics
(section 2), moving on to morphologically complex words (section 3) and finally to morphologically simple
words (section 4). The last part (section 5) is a summary and discussion of the main findings.
2. Graphotactic constraints
The first observations come from a purely graphemic perspective. Generalizing over the set of all graphemic
words in the lexical database CELEX (Baayen et al. 1995), we get (1) and (2):
Of these, double <k>, <v> and <z> are comparatively rare (for <v> and <z> cf. Venezky 1999: 6).5
(2) Double consonants occur after a single vowel letter and before one or more vowel letters. There are
(2a) Exception 1: Consonant doubling can occur preceding or following a consonant letter in compounds
(e.g. <headdress>, <granddaughter>) or words with Latinate prefixes (e.g. <apply>, <suppress>).
5 Additionally, <ck>, <dg>, and <tch> have a distribution very similar to that of double consonants; Venezky
(1999: 14 et passim) calls them ‘pseudogeminates’. Synchronically, <ck> can be argued to be the doubled
variant of <c>, cf. e.g. <picnic> – <picnicked> – *<picnicced>.
Double consonants in this position could be utilized to determine the morpheme boundaries: The fact that the
probability of <ndd> occurring morpheme-internally is low is a viable cue for a morpheme boundary. The
number of Latinate prefix words with double consonant followed by a consonant is quite low (<acclaim>,
The word ending <-le> is related to <-el>; as suffixes (e.g. sparkle, as opposed to non-functional word-endings
as in bottle), they are allomorphs6. <-le> is the only formative that shows this “inverted” behavior (Carney 1994:
124f., 277). Other suffixes that correspond to syllabic consonants are spelled differently; the spellings *<bakre>
b. <block>, <trick>, <stiff>, <gaff>, <mass>, <fuss>, <bell>, <hill>; <staff>, <class>, <call>
The words in (3a) can be accounted for by a constraint demanding that lexical words in English have at least
three letters (cf. Jespersen [1909] 1928: §4.96). This is not specific to final double consonants; it also applies to
words like <bee>, <toe>, <rye> etc. Thus the double consonants in (3a) are independently motivated.
The words in (3b) show the regularly occurring word-final double consonants, <ck>, <ff>, <ss>, and <ll> (cf.
There are two respects in which this group (or a sub-group of it) differs from the other consonants.
1.: In the context where <ck>, <ff>, <ss> and <ll> occur - directly after a single vowel letter - none of the
single consonant letters <c> (or <k>), <f>, <s> or <l> can occur (4a). This sets the group apart from the other
consonant letters (4b); there are only a few exceptions to this (most notably the suffix <-ic>/*<-ick>):
6 The condition for their distribution seems to be the stem-final phoneme: <-el> appears “after v, th, ch, n, as in hovel,
brothel, hatchel, kernel” (OED).
(4) a. *<bac>/*<bak>, *<rif>, *<kis>, *<hil>
Likewise, Brame (1983b) points to the fact that the letter names for <f>, <s>, and <l> are <eff>, <ess>, and
<ell> respectively, as opposed to <em>/*<emm> and <en>/*<enn> for example. There is thus no opposition in
this context. This does not explain word-final consonant doubling for <c>/<k>, <f>, <s>, and <l> – but it makes
2.: Moreover, the vowel letters preceding word-final double consonants in (3b) do not always correspond to short
vowel phonemes; this holds for <staff>, <class>, <call> and similar words (cf. Carney 1994: 124f.)7. This again
sets the group <f>, <s>, <l> apart from other consonant letters.
These are the graphotactic constraints on consonant doubling in English: Not all consonant letters occur as
geminates. Double consonants occur after a single vowel letter and before one or more vowel letters. There are
some well-defined exceptions to these constraints. They hold across both morphologically simple and
Consonant doubling regularly occurs at morpheme boundaries. The following requirements hold (cf. e.g.
(5) Consonant doubling occurs before a suffix with an initial vowel letter if a) there is a single base-final
consonant letter from the set in (1) before the boundary and b) this consonant letter is preceded by a
This regularity is purely graphemic. It captures almost all double consonants in inflectional products (like
<swimming>, <pinned>, <hotter>) and most of the derivational ones (like <baggage>, <potter>, <fattish>). As
noted above, the statement refers to the graphemic structure of the base. This explains spellings like the
7 This only holds for British English; in American English, the vowels in staff and class are short.
(6) a. <come> – <coming>
b. <look> – <looking>
c. <stem> – <stemmed>
In (6a), final <e> is dropped because the suffix begins with a vowel (see below). This leads to a potentially
misleading correspondence for <o> (cf. <homing>, <zoning>), but consonant doubling does not occur because
condition a) of (5) is not met (in the base word, the <m> is not graphotactically final). Likewise, in (6b) the
consonant is not doubled because there is more than one preceding vowel letter (<oo>). In (6c), although the
<e> in the suffix is silent, it still triggers doubling of the preceding consonant letter because -ed is a suffix with
What is the reason for this reference to base forms, which leads to deviations from (more or less) regular
phonographic correspondences? The systematic reason for consonant doubling can be found in <e>-deletion
before suffixes with initial vowel letters like <-ing>, <-er>, <-ance> etc. For example, Carney (1994: 129) and
Palmer et al. (2002: 1577) observe that while in monomorphemic words the differences in the corresponding
vowel qualities are expressed by the presence or absence of final <e> (e.g. <hope> – <hop>), in morphologically
complex forms (with vowel-letter-initial suffixes), the same difference is expressed by the presence or absence
of a double consonant (e.g. <hoping> – <hopping>). Carney (1994: 129) adds that a constant marking of long or
diphthongic correspondences by silent <e> (even in morphologically complex words) would lead to spellings
with vowel letter clusters <ei>, <ee> like *<takeing> and *<takeer>. Carney supposes that this spelling is not an
option because it would interfere with the correspondences these clusters have in monomorphemic words (cf.
<heir>, <peel>).
While that is certainly true, it is possible to go further. The second syllable in *<shakeer>, for example, could
potentially be interpreted as stressed and long (cf. <career>, <veneer>, <engineer>). The framework of Evertz &
Primus (2013) offers a suitable explanation for this: graphemic syllables with complex vowel letters are ‘heavy’,
The need to mark short vowels in morphologically complex forms thus follows from the impossibility of retaining
final <e> before vowel-letter-initial suffixes. If the argument sketched above is correct, we would expect final
<e> and consonant doubling in essentially the same positions. For final <e>, there are two major constraints: It
does not occur after complex vowel letters (7a), and it rarely occurs after consonant letter clusters (7b) (cf.
(7) a. <bake>, *<baike>; <note>, *<noute>
As noted above in (5), consonant doubling at morpheme boundaries follows exactly these two constraints: there
is no doubling after a complex vowel letter (8a) or after another consonant letter (8b).
Consonant doubling is thus subject to the same constraints as final <e> is. This is further evidence for the close
relation of both markers. Moreover, when words with final <e> and consonant-initial suffixes are combined, the
<e> is not dropped (cf. <state> – <stately>, *<statly>), and accordingly stem-final consonants are not doubled
either (cf. <bad> – <badly>, *<baddly>). Consonant doubling thus seems like a backup solution: Whenever
final <e> is not available in a morphological process, consonant doubling helps out.
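The interplay of <e>-deletion and consonant doubling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in this paper: the function name attach_suffix is hypothetical, the set DOUBLABLE stands in for the set in (1) and is an assumption, <e>-deletion is simplified, and the sketch is meant for monosyllabic bases only (it knows nothing about stress or morphological structure).

```python
import re

# ASSUMPTION: this set stands in for the paper's set (1); it leaves out <x>
# (never doubled) and the comparatively rare <k>, <v> and <z>.
DOUBLABLE = set("bdfglmnprst")
VOWELS = "aeiou"


def attach_suffix(base: str, suffix: str) -> str:
    """Attach a suffix to a monosyllabic base using graphemic information only."""
    # Consonant-initial suffixes leave the base untouched: no <e>-deletion
    # and no doubling (<state> + <ly>, <bad> + <ly>).
    if suffix[0] not in VOWELS + "y":
        return base + suffix
    # Simplified <e>-deletion: final <e> after vowel letter + consonant letter(s).
    if re.search(rf"[{VOWELS}y][^{VOWELS}]+e$", base):
        return base[:-1] + suffix
    # Requirement (5): a single base-final consonant letter that is preceded
    # by a single vowel letter is doubled before a vowel-initial suffix.
    match = re.search(rf"(?:^|[^{VOWELS}])[{VOWELS}]([a-z])$", base)
    if match and match.group(1) in DOUBLABLE:
        return base + match.group(1) + suffix
    return base + suffix


for base, suffix in [("hope", "ing"), ("hop", "ing"), ("look", "ing"), ("bad", "ly")]:
    print(base, "+", suffix, "->", attach_suffix(base, suffix))
```

Note that the function never consults pronunciation: <stem> + <ed> yields <stemmed> simply because the suffix begins with a vowel letter.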
Summing up, double consonants are motivated by the need to mark vowel quality in morphologically complex
words with vowel-initial suffixes. This need in turn arises because final <e> cannot be employed for reasons of
graphemic syllable weight. As a consequence, morphologically complex words are formed very regularly on the
graphemic form of the respective base. This may lead to deviating correspondences in the morphologically
complex form. In the following, I will examine both inflectional (3.1) and derivational (3.2) cases in more detail.
3.1 Inflection
If an inflectional suffix (except -s) operates on a monosyllabic base, statement (5) is exceptionless. This leads to
8 Note that in most cases involving -ed, the suffix is phonologically a single consonant (/t/ or /d/), and the inflected form has
the same number of syllables as the base. So phonographically, <band> would be a good spelling for /bænd/. Yet the suffix
is graphemically vowel-initial, and thus the requirement in (5) holds, which leads to consonant doubling. This is further
evidence for the graphemic nature of consonant doubling. It also serves to give the suffix a distinct (and almost unique)
graphemic form (cf. Berg et al. 2014).
The reference to the graphemic form of the base also serves to explain why some spellings are excluded for
irregular past tense forms like kept and slept: Phonographically, *<kepped> and *<slepped> should be possible
(cf. e.g. <prepped>), but the fact that there are no bases <kep> and <slep> prevents these spellings.
There are two ways to deal with these exceptions. The data in (10) can be straightforwardly explained by
hypothesis (11), which I will call the prosodic hypothesis and which is an additional requirement to (5) above:
(11) Consonant doubling occurs if the base-final syllable receives phonological stress.
The bases in (10) and many others are trochees with unstressed ultimates, and accordingly the base-final
consonants should not be doubled. On the other hand, for iambic bases – i.e. bisyllabic bases with an unstressed
Cases like <panicking> and <picnicking> are prima facie counterexamples – the consonant is doubled even
though the base-final syllable is unstressed. But as e.g. Venezky (1999: 83) and Carney (1994: 223) have pointed
out, this is a very special case: Without consonant doubling, the stem-final <c> in *<picnicing> would be
wrongly assumed to correspond to [s]. Thus, <ck> can be motivated here by the pursuit of phonographic
consistency.
However, compounds (and pseudo-compounds) are not covered by the prosodic hypothesis:
The words in (13) should not have a double consonant according to the prosodic hypothesis because the base-
final syllable is unstressed. One could advocate that in these cases the stem-final syllables bear secondary stress
(cf. Cummings 1988: 165). But it is far from clear that the second syllables in sandbag, babysit and airdrop are
intonationally categorically different from those in visit and author. The difference might just lie in the full vs.
reduced vowel quality (Bolinger 1986: 351 N.3; 1989: 215f.; Fudge 1984: 31)9. The intonational difference
between e.g. outfit vs. trumpet seems to be a matter of degree and personal preference.
9 As Bolinger (1986: 351 N.3) puts it: “Normally, syllables after the stress behave intonationally the same regardless of
whether they are full or reduced. The pitch contour is the same in both máypole and máple […]”.
An alternative way to capture the data in (13) is (14), which I will call the morphological hypothesis:
(14) Consonant doubling occurs if the base-final graphemic syllable is a monosyllabic root.
This easily captures cases like sandbag, babysit, airdrop. Cases like <visiting>/*<visitting> and
<authored>/*<authorred> are also covered by the morphological hypothesis (no base-final monosyllabic root →
no consonant doubling), as well as cases like <inferred>, <beginning>, <remitted>, and <occurring> (base-final
monosyllabic root → consonant doubling). But the morphological hypothesis also explains cases with a varying
degree of transparency from words like kidnap, bootleg, handicap to completely opaque forms like humbug,
zigzag, hobnob (which all occur dominantly with doubled consonants), and variation in cases like
morphological analysis: If a writer analyzes -ship in worship as a root, the consonant is doubled; if worship is
monomorphemic for her, it remains single. If a writer analyzes com- and for- in combat and format as
prefixes and thus -bat and -mat as roots, the consonants are doubled; otherwise, they remain single consonants.
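To make the difference between the two hypotheses concrete, the following Python sketch states (11) and (14) as predicates over a hand-annotated base. It is a minimal illustration only: the class Base, its fields, and the segmentations and stress values in the examples are assumptions supplied by hand, not CELEX annotations, and the last element of morphs is assumed to be a root.

```python
import re
from dataclasses import dataclass


@dataclass
class Base:
    morphs: list[str]     # hypothetical segmentation; the last element is taken to be a root
    final_stress: bool    # True if the base-final syllable carries primary stress


def graphemic_syllables(morph: str) -> int:
    """Count groups of vowel letters as a rough proxy for graphemic syllables."""
    return len(re.findall(r"[aeiouy]+", morph))


def prosodic_doubling(base: Base) -> bool:
    """Hypothesis (11): doubling if the base-final syllable is stressed."""
    return base.final_stress


def morphological_doubling(base: Base) -> bool:
    """Hypothesis (14): doubling if the base ends in a monosyllabic root."""
    return graphemic_syllables(base.morphs[-1]) == 1


examples = {
    "visit": Base(["visit"], final_stress=False),
    "begin": Base(["be", "gin"], final_stress=True),
    "sandbag": Base(["sand", "bag"], final_stress=False),
    "worship (root -ship)": Base(["wor", "ship"], final_stress=False),
    "worship (monomorphemic)": Base(["worship"], final_stress=False),
}
for label, base in examples.items():
    print(label, prosodic_doubling(base), morphological_doubling(base))
```

The two predicates agree on visit and begin and diverge on sandbag; the two analyses of worship show how the prediction of (14) depends on the segmentation a writer assumes.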
The prosodic hypothesis has problems with these data – at least in its prosodic formulation with reference to
secondary stress. A possible reformulation could be in terms of vowel quality: after all, this is a respect in which
compounds like sandbag, bootleg and kidnap differ from polysyllabic roots like summer, orbit or blanket.
Ultimately, however, this difference could also be argued to be morphological. Only vowels in affixes and in
non-initial syllables of polysyllabic roots are reduced, so the full vowel in the final syllable in sandbag, babysit,
and airdrop indicates some sort of lexical content. Both phonology (in form of vowel quality) and graphemics
(in form of consonant doubling/non-doubling) thus operate on similar kinds of morphological information.
But why are consonants doubled only in monosyllabic roots? This may have to do with the distribution of <e>-
deletion and the functional need to mark vowel quality. There are 121 pairs of phonologically monosyllabic
words which only differ in the presence or absence of final <e> (e.g. <bid> – <bide>, <mop> – <mope>, <stag>
morphologically complex forms (e.g. <biding>, <moping>, <staged>). For phonologically bisyllabic words,
there are only 12 such pairs, and almost all involve semantically closely related words (e.g. <artist> – <artiste>,
<ballad> – <ballade>, <human> – <humane>; but also <unit> – <unite>). So the functional motivation to
Generally, dictionaries of American English tend to indicate post-primary secondary stress in compounds (e.g. Webster’s
Third), while dictionaries of British English do not (e.g. OED).
One final (minor) sub-regularity: In British English, <l> is doubled even in cases where it is unstressed/not a
monosyllabic root (15a); in American English, it is not (15b) (cf. e.g. Carney 1994: 251).
In British English, one consonant thus behaves oddly; American English is more coherent in this respect (but cf.
Brame 1983a,b).
3.2 Derivation
Products of derivational word formation rules also comply with the constraint (5) and the morphological
hypothesis. This holds for the doubling (16a) and the non-doubling (16b) of consonant letters in the following
words:
In the cases in (16a) and similar cases, the stem ends with a monosyllabic root, and the final consonant letter is
doubled. The cases in (16b), on the other hand, involve polysyllabic roots and show single stem-final consonant
letters.
There are only a handful of exceptions in CELEX. A number of words occur with unexpected <l>-doubling
(e.g. <crystallize>, <bimetallism>, <panellist>, <marvellous>). As noted above for inflected words with <ll>,
this seems to be a case of diatopic variation, with British English favouring the <ll>-forms and American English
favouring the <l>-forms. The other exceptions involve unexpected single consonants (17a) or unexpected
doubling (17b):
b. <clarinettist>, <carburettor>
According to both the prosodic and the morphological hypothesis, we would expect consonant doubling in (17a)
if the words are indeed formed on the bases par, gas, and scar. For gasify, the form <gassify> was historically a
variant as the OED notes, attested from the 18th century onwards; it would be interesting to see whether this
spelling still occurs (there are no instances of <gassify> in CoCA, though). Scarify nicely captures the whole
point of consonant doubling: It can be formed on two bases, scar and scare, and both meanings are attested, both
with their own unique phonological form – but without consonant doubling, both senses and pronunciations
collapse.
The two words with unexpected double consonants in (17b) are British spellings, as a comparison between
CoCA and BNC shows: The <tt>-forms are the dominant ones in the BNC, while the <t>-forms are the dominant
The number of words with double consonants varies greatly between suffixes. For example, there are 212 words
with consonant doubling followed by <er> in CELEX (e.g. <beginner>, <cropper>, <nagger>), but only four
words with consonant doubling followed by <ance> (<admittance>, <quittance>, <remittance>, <riddance>). At
first glance, this appears to be a feature of suffixes – -er occurs with double consonants, -ance only marginally
does so. But the reason for this may very well be morphological: If the suffix operates on monosyllabic bases,
the amount of consonant doubling is higher; if it operates only on polysyllabic bases, the amount is smaller.
Finally, independent of whether the prosodic or the morphological hypothesis is preferred, there seems to be a
constraint relating all double consonants in derived words to stressed syllables before the corresponding
consonant phonemes. This usually holds in cases like (18a), but not in those in (18b, cf. also Cummings 1988:
169):
The forms in (18b) should be preferred on the grounds of both the prosodic and the morphological hypothesis:
the stem has final stress, and the stem contains a final monosyllabic root (fer). Yet it is unclear how systematic
these spellings are. They are certainly exceptions within the set of -ence and -able formations. The reason for the
spellings in (18b) may be thought to be prosodic: After all, the double consonants wrongly indicate word stress
on the second phonological syllable. But this explanation does not hold for a number of other words: In e.g.
<muggee>, <floggee>, <plannee>, <allottee>, <chattee> and <submittee>, the ult is stressed, not the penult (as
the double consonants indicate). For the time being, I will treat the spellings in (18b) as idiosyncratic.
4. The periphery: Monomorphemic words
In monomorphemic words, consonant doubling is much less regular. The first class consists of words of Latin origin.
They are not strictly monomorphemic, but they are not unequivocally morphologically complex either; they are
In these cases, the vowel letter before the double consonant corresponds to a short vowel phoneme, but it is
mostly not stressed (there are exceptions like comment, however). To capture this distribution, one can list
prefixes that often correlate with consonant doubling. The following list is from Rollings (2004: 83):
Additionally, assimilations in the original Latin words have to be accounted for as well (ad + facere => affect, in
+ mobilis => immobile). The biggest problem for writers, however, is that at least some etymological knowledge
seems to be required to predict these double consonants (cf. e.g. Carney 1994: 119ff., Rollings 2004: 83f.).
Without it, the spellings <atone> (at + one) and <attain> (ad + tangere) are purely idiosyncratic. There may be
certain distributional cues (cf. Carney 1994: 120, Rollings 2004: 84), but on the whole, some knowledge about
Apart from words with Latin prefixes, the most important observation is that consonant doubling is highly
correlated with the graphemic shape of the word ending10. A review of the pertinent literature leads to the
inventory in (21a) (examples in 21b) for those word endings that occur with double consonants and the inventory
in (21c) (examples in 21d) for those that follow a single consonant (Carney 1994: 116f; Rollings 2004: 81f):
(21) a. <-ic>, <-id>, <-it>, <-ish>, <-ace>, <-ous>, <-al>, verbal <-age>, <-ule>
This correlation of certain word endings with either single or double consonants is often discussed in terms of
the words’ etymology: For Rollings (2004: 81f.), this behavior is an indicator of whether a word belongs to the
10 The term word ending is used in the following to describe recurring letter strings. In bisyllabic words, the word ending is
the part of the word starting with the vowel (letter or phoneme) of the second syllable (e.g. <it> in <limit> or <er> in
<hammer>). It is the reverse unit to Taft’s (1979) BOSS: If you subtract the BOSS from a word, the remainder is the word
ending.
‘native’ or the ‘Latin’ part of the lexicon. The basic insight is that consonant doubling is rarer in words of Latin
origin and more frequent in ‘native’ words, but that it is hard to model this behavior synchronically (cf. also
To test the effect of word endings on consonant doubling, three corpus analyses were carried out.
1. The first analysis is a purely graphemic investigation: Which word endings occur with double
consonants (e.g. <er> as in <summer>), which do not (e.g. <it> as in <visit>), and which occur with
2. The second analysis takes the speller’s perspective and asks: How are short vowels marked, and how
3. The third analysis takes the reader’s perspective and asks whether words with single intervocalic
consonant letters (e.g. <paper>, <limit>) correspond to words with a short or a long vowel phoneme (or
These three analyses will be described in the following. They are all based on bisyllabic words: As Carney
(1994: 123) states, apart from the Latin prefixes mentioned above, monomorphemic three or more syllable words
To determine the relation between word endings, single/double consonants and single vowels/vowel letter
clusters, we use CELEX. More specifically, we use the set of all words in CELEX that meet the following
requirements:
the word is not annotated as morphologically complex in CELEX (remaining morphologically complex
the word is graphemically bisyllabic (<limit>), or trisyllabic with single final <e> (<palace>)
the word contains a single or double consonant letter after the first vowel letter or vowel letter cluster
This leads to a sub-corpus of 2,324 words. On a purely graphemic basis, we determined for each word ending
how many words contain a single vowel letter followed by a single consonant letter (<VC>, as in limit); or a
single vowel letter followed by a double consonant (<VCC>, as in hammer); or a cluster of vowel letters
followed by a single consonant letter (<VVC>, as in eager). The pattern <VVCC>, though logically possible, is
very rare. It occurs only three times in the corpus (<caisson>, <bouffant>, <pierrot>). Apparently, the
constraint found for morphologically complex words extends to morphologically simple words: No consonant
The following table summarizes the results. It indicates whether a pattern is systematically attested for a given
word ending, and (if more than one pattern is attested) which pattern is dominant. ‘Systematically attested’ in
this context means that at least 10% of the words with a given ending fall into the respective category (<VC>,
<VCC>, <VVC>). This is indicated by the symbol ‘’. Accordingly, for -er in table 1 below this means that all
three patterns occur with more than 10% of the -er-words. Information about the respective dominant pattern is
included in the next line: ‘80%’ means a dominant pattern occurs in more than 80% of the cases; ‘60%’ means it
occurs in more than 60%, and ‘40%’ means the dominant pattern occurs in more than 40% of the cases. For the
ending <-er>, for example, VCC is dominant with more than 60% of all words with <-er> falling in this
category. Only word endings which occur 15 times or more are listed in the following table.
Group   Word endings   Dominant pattern
1       -er, -y        >60%
2       -on            >40%
3       …              >60%
4       -ing, -et      >60%
5       -in            >40%
6       …              >40%
7       …              >60%
8       …              >80%
9       -our, -id      >80%
10      -ow, -ock      >80%
Table 1: Graphemic patterns (<VC>, <VCC>, <VVC>) associated with different word endings. A pattern counts as
systematically attested if more than 10% of the words with this ending occur with it. >80%: dominant pattern >
80%; >60%: dominant pattern > 60%; >40%: dominant pattern > 40%.
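The classification behind table 1 can be sketched as follows. This is a minimal Python illustration under several assumptions, not the script used for the analysis: the names analyse and summarise are hypothetical, the input is assumed to be pre-filtered to graphemically bisyllabic, morphologically simple words, and pseudogeminates such as <ck> are not treated as double consonants here.

```python
import re
from collections import Counter, defaultdict

# Onset, first vowel letter cluster, intervocalic consonant letter(s), word ending.
SHAPE = re.compile(r"^[^aeiou]*([aeiou]+)([^aeiou]+)([aeiouy].*)$")


def analyse(word: str):
    """Return (pattern, ending) for a word, or None if it falls outside the scheme."""
    m = SHAPE.match(word)
    if not m:
        return None
    vowels, consonants, ending = m.groups()
    if len(consonants) == 1:
        cons = "C"
    elif len(consonants) == 2 and consonants[0] == consonants[1]:
        cons = "CC"
    else:
        return None  # consonant clusters such as <nt> in <winter> are excluded
    return ("V" if len(vowels) == 1 else "VV") + cons, ending


def summarise(words, min_count=15):
    """Tally patterns per ending; report attested patterns and the dominance band."""
    tallies = defaultdict(Counter)
    for word in words:
        result = analyse(word)
        if result:
            pattern, ending = result
            tallies[ending][pattern] += 1
    table = {}
    for ending, counts in tallies.items():
        total = sum(counts.values())
        if total < min_count:
            continue  # only endings that occur often enough are listed
        attested = sorted(p for p, n in counts.items() if n / total >= 0.10)
        dominant, n = counts.most_common(1)[0]
        share = n / total
        band = next((b for b in (0.8, 0.6, 0.4) if share > b), None)
        table[ending] = (attested, dominant, band)
    return table


toy = ["summer", "hammer", "dinner", "letter", "paper", "eager"]
print(summarise(toy, min_count=1))
```

On this toy list, <-er> comes out with all three patterns attested and <VCC> as the dominant pattern in the >60% band, which mirrors the description of <-er> above; the toy counts themselves are of course not the CELEX counts.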
Twelve word endings are correlated strongly (i.e. >80%) with one pattern: groups 8 and 9 (<-ic>, <-i>, <-us>,
<-um>, <-ile>, <-ate>, <-ent>, <-our>, <-it>, <-id>) occur predominantly with <VC> patterns, and group 10
(<-ow>, <-ock>) occurs predominantly with <VCC> patterns. In other words, if a speller knows that the word
ending is one of these twelve, the dominant pattern already follows on a graphemic basis. For the great
majority of word endings, however, there is graphemic variation – they occur with both single and double
consonants.
The second analysis takes phonology into account and investigates how short vowels are marked in spelling. Do
words with a single intervocalic consonant phoneme and a short vowel in the first syllable correspond to words
with consonant doubling (e.g. <summer>) or with single intervocalic consonants (e.g. <metal>)? The data base
to answer this question is the set of all words in CELEX that meet the following requirements:
the word is not annotated as morphologically complex in CELEX (remaining morphologically complex
the word contains a single intervocalic consonant phoneme which corresponds to a consonant letter that
can be doubled11
This leads to a set of 1,583 words. Cross-classifying vowel quality (short vs. long/diphthong) over single vs.
11 This excludes the consonant phonemes /ð, θ, ŋ, ʃ, ʒ, v, z/, which all correspond to complex graphemes that have no doubled
equivalent (*<thth>, *<shsh>). Moreover, the following non-doubled complex graphemes are also taken into account: <ck>
(as a doubled variant of <c> or <k>), <dg> (as a doubled variant of <g> if it corresponds to /d͡ʒ/, cf. e.g. Venezky 1999: 14),
and <tch> (as a doubled variant of <ch> if it corresponds to /t͡ʃ/, cf. e.g. Venezky 1999: 14). /z/ can correspond to <z>, which
can be doubled (e.g. <buzz>); however, <zz> is rather marginal and limited to recent borrowings (cf. Venezky 1999: 45).
The doubled variant for /v/ (<vv>) is marginal as well.
                        <C>    <CC>    total
short vowel             323     708    1,031
long vowel/diphthong    549       3      552
total                   872     711    1,583
Table 2: The relation between consonant doubling and phonological vowel quality. Data base: trochaic CELEX
The majority of words with short vowel phonemes have a doubled consonant in their corresponding graphemic
form (708 of 1,031, or 69%), as table 2 shows. Words with long or diphthong vowel phonemes almost never
occur with doubled graphemic consonants (3 of 552, <1%). The three words which do occur are the ones
mentioned in the last section, <bouffant>, <caisson>, and <pierrot>, which are all of French origin. So words
with short vowels are often spelled with doubled consonants; long vowels or diphthongs almost never are.
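The proportions just cited can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (708 of 1,031 and 3 of 552); the <C> cells are derived by subtraction, and the variable names are purely illustrative.

```python
# Counts stated in the text; the single-consonant cells follow by subtraction.
short_total, short_double = 1031, 708
long_total, long_double = 552, 3

short_single = short_total - short_double    # short vowel, single consonant
long_single = long_total - long_double       # long vowel/diphthong, single consonant

short_double_share = short_double / short_total   # roughly 0.69
long_double_share = long_double / long_total      # well below 0.01
corpus_size = short_total + long_total            # size of the whole word set

print(short_single, long_single, corpus_size)
print(f"{short_double_share:.1%}", f"{long_double_share:.1%}")
```

The two row totals add up to the 1,583 words of the data set, so the figures in the text are internally consistent.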
Focusing on the 1,031 words with short vowels, we ask what determines the distribution in table 2. As noted by
Carney (1994), Rollings (2004) and others, the ratio of doubled consonants varies depending on the word ending.
The following table presents single vs. double consonants for word endings which occur at least ten times. 20
spellings with vowel letter clusters were excluded, e.g. treadle, zealot, meadow, flourish; in these cases,
ending   <C>   <CC>   total   % <C>   examples
er         3     93      96      3%   trigger, hammer
ow         2     34      36      6%   sorrow, mellow
ot         4      6      10     40%   maggot, spigot
Table 3: Marking of short vowels with single/double consonants according to word ending. All word endings
Taking 20%/80% as arbitrary thresholds, the word endings in table 3 fall into three groups, those with mostly
doubled consonants (22.a), those with mostly single consonants (22.b) and those in between the two groups
(22.c):
(22) a. >80% <VCC>: <-ock>, <-le>, <-er>, <-ow>, <-y>, <-ey>, <-et>, <-a>
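The grouping by thresholds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the counts are the three rows of table 3 reproduced above, and the treatment of values that fall exactly on 20% or 80% is our own choice, since the text does not specify it.

```python
def doubling_group(single, double, low=0.20, high=0.80):
    """Assign a word ending to (22a), (22b) or (22c) by its share of <CC> spellings."""
    cc_share = double / (single + double)
    if cc_share > high:
        return "22a"   # mostly doubled consonants
    if cc_share < low:
        return "22b"   # mostly single consonants
    return "22c"       # in between (boundary values land here by assumption)

# (<C> count, <CC> count) for three endings from table 3
rows = {"er": (3, 93), "ow": (2, 34), "ot": (4, 6)}
for ending, (single, double) in rows.items():
    print(ending, doubling_group(single, double))
```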
From the purely graphemic overview given above (table 1 above) it follows that the distributions of <-id>, <-it>,
and <-ic> on the one hand and of <-ock> and <-ow> on the other hand are hardly surprising: In all those cases,
there are no graphemic alternatives, e.g. no graphemic words ending with a double consonant followed by <ic>, and
no graphemic words ending with a single consonant followed by <ow>. The other word endings in (22) extend
the list determined purely graphemically, and also the one from the pertinent literature (21 above). Moreover,
what is striking about the word endings in (22) is that all endings that prefer single consonants involve the vowel
letter <i>. The letter <e>, on the other hand, is found in many of the word endings that prefer double
consonants. This is true for the whole corpus as well: Overall, 200 of 232 words with <e> following a single or
double consonant (e.g. <summer>, <bonnet>, <hockey>, <wicked>) have a double consonant (86%). This
resonates well with Evertz & Primus (2013) who attribute a special theoretical status to this structure (which they
call the ‘canonical trochee’ – a bisyllabic graphemic word with <e> in the second syllable).
It follows that there are indeed patterns (in the sense of recurring word endings) which correlate with the
presence or absence of double consonants. Note that the graphemic form of the word ending is the determinant,
not the phonological form. For the phonological word ending [ɨk] for example, there are at least three different
spellings, <ic>, <ick>, and <ock>. While <ic> predominantly occurs with preceding single consonants, <ick>
and <ock> occur exclusively with double consonants. The following table lists this and similar cases:
ending   spelling   <CC>   <C>   % <CC>   example
ɨk                     7    19      27%
ɨl
         <al>          0     3       0%   moral
         <yl>          0     2       0%   beryl
         <il>          0     2       0%   peril
ɨt
         <ate>         0     6       0%   palate
ər
əʊ
         <ot>          0     2       0%   depot
         <eau>         0     3       0%   plateau
ɨs
         <us>          3     0     100%   cirrus
         <ise>         0     3       0%   promise
Table 4: Homophonous word endings which differ graphemically, and which show a different amount of
For the speller, this is an unfortunate situation: To deduce whether or not a consonant is doubled, she must know
which of many possible written forms a phonological word ending has. In this respect, the written forms are
doubly coded. This correlation can be termed graphemic harmony: One choice of graphemic options determines
another choice.
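Graphemic harmony can be pictured as a lookup from the spelling of the word ending to the expected spelling of the preceding consonant. The sketch below is a minimal illustration, not the procedure used for the corpus analysis: the table covers only the three spellings of [ɨk] discussed above, the function name is hypothetical, and doubling is detected naively as two identical letters before the ending.

```python
# Doubling preference per graphemic ending, as described in the text for [ɨk]:
# True = preceding consonant doubled, False = single. Illustrative subset only.
PREFERS_DOUBLE = {"ick": True, "ock": True, "ic": False}

def harmony_check(word):
    """Return (ending, expected_double, observed_double), or None if no ending matches."""
    # Try longer endings first so that <ick> is matched before <ic>.
    for ending in sorted(PREFERS_DOUBLE, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending) + 1:
            stem = word[: -len(ending)]
            observed = stem[-1] == stem[-2]   # e.g. <mm> in <gimmick>
            return ending, PREFERS_DOUBLE[ending], observed
    return None

for w in ["comic", "gimmick"]:
    print(w, harmony_check(w))
```

For both example words, the observed spelling agrees with the preference of the ending: the speller who knows that the ending is <ic> or <ick> also knows whether to double.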
At least partly, graphemic harmony correlates with the words’ etymology: Words of French origin, for example,
tend to have single consonants (e.g. <ic>, <ate>, <our>, <ot> for /əʊ/, <eau>, <ace>, <ise>, <ice>); words of
Germanic origin tend to have double consonants (e.g. <ick>, <ock>, <er>, <ow>). One notable exception is
<et>; the respective words are mostly of French origin, but occur mostly with doubled consonants.
The third analysis takes the reader’s perspective. To understand the patterning of consonant doubling and word
endings (table 3 above), it is important to understand the ‘functional load’ for each word ending. For example,
as shown above, words which end with <id> are only rarely spelled with a preceding double consonant. But if
<id>-words never contained long/diphthong vowel phonemes, the marking of vowel quality would be negligible.
If, on the other hand, a significant fraction of <id> words contained long/diphthong vowel phonemes, the
graphemic forms would be a lot more idiosyncratic – the reader would just have to know how to pronounce this
The data base to tackle this question is the set of all words in CELEX that meet the following requirements:
the word is not annotated as morphologically complex in CELEX (remaining morphologically complex
the word is graphemically bisyllabic (<limit>), or trisyllabic with single final <e> (<palace>)
the word contains one single intervocalic consonant letter between the first and the second syllable
This leads to a corpus of 1,114 words. Classifying for short (‘/V/’) or long/diphthong (‘/VV/’) vowel phonemes
Table 5: Cross-classification of vowel quality (‘/V/’: short; ‘/VV/’: long/diphthong) over foot structure
(iamb/trochee). Data base: all graphemically bisyllabic words (and trisyllabic words with single final <e>) in
CELEX with a single vowel letter in the first syllable followed by a single consonant letter.
As table 5 shows, phonological foot structure co-varies with vowel quality: If <VC>-words correspond to
trochaic phonological forms, the vowel phoneme is a long vowel or diphthong 58% of the time, e.g. <tiger>. If
they correspond to iambic words, the vowel phoneme is short (and often reduced) 83% of the time, e.g.
<lament>.
The correspondence of a given graphemic word to an iambic or trochaic phonological word depends on many
factors, e.g. the presence or absence of prefixes like <de-> or <re-> (e.g. <demand>, <report>) and the word
category (e.g. pro’test (V) vs. ‘protest (N)). In the following, we will only investigate the 758 trochaic words
from table 5.
For these words, we get the following distribution according to their word ending (only word endings with ten or
more occurrences):
12 This excludes the consonant letters <h, j, q, v, w, x, y, z>. In words with these letters, there is no potential opposition (as in
e.g. <dinner>/<diner>). Thus, intervocalic <v> (for example) cannot code vowel quality, and both a short and a
long/diphthong reading are possible for structurally similar words (cf. <never>/<fever>).
13 As noted above, vowel letter clusters in the first syllable usually correspond to long or diphthong vowel phonemes; the
ending   /V/   /VV/   total   % /V/   examples
us         0     27      27      0%   opus, bonus
a          3     38      41      7%   drama, schema
um         1     11      12      8%   velum, datum
Table 6: Reading of words with single intervocalic consonant letters as containing short or long/diphthong
vowel phonemes, according to word ending. All word endings that occur 10 times or more in the sub-corpus.
An in-depth analysis of the word lists that serve as the basis of table 6 may lead to interesting – and potentially
clearer – results. For example, almost all words with <-ic>, <-id>, and <-it> that correspond to long vowels
involve <u> (e.g. <humid>, <cupid>, <music>, <tunic>, <unit>), and no <u> before these endings corresponds
to a short vowel. In this sense, <u> is special (cf. e.g. Cummings 1988). It is conceivable that similar sub-
patterns exist that can explain some – though far from all – variation in table 6.
As in the last section, we can classify these endings: Those in (23a) are clearly associated with a
long/diphthong reading of the respective vowel, those in (23b) are clearly associated with a short reading; those
(23) a. >80% /VV/: <-us>, <-a>, <-er>, <-um>, <-o>, <-ent>, <-ey>
c. 20-80% /V/: <-ar>, <-al>, <-y>, <-i>, <-ile>, <-our>, <-on>, <-ate>, <-id>, <-or>
4.4 Synopsis
Many of the word endings in (23) are in the same group as in (22); there seems to be a connection. Functionally,
this makes sense: If a short vowel is often encoded with consonant doubling (e.g. <summer>, group 22a above),
then a single consonant letter can correspond to a long/diphthong vowel phoneme (e.g. <paper>, group 23a). If,
on the other hand, a short vowel phoneme is often encoded with a single consonant letter (e.g. <limit>, group
22b), then the same structure should not be used to encode long/diphthong vowels.
Figure 1 shows this connection. For each word ending from (22) and (23) above, the proportion of consonant
doubling (horizontal axis) is plotted against the proportion of long/diphthong readings in words with a single
intervocalic consonant letter (vertical axis). For example, the position of <-age> in the middle of the bottom of
figure 1 indicates that it occurs with consonant doubling 47% of the time (e.g. <village>, <message>),
while at the same time a short vowel reading is dominant (92%) in <…VCage> words (e.g. <manage>,
<damage>).
[Scatter plot of word endings; horizontal axis: %<CC>, vertical axis: %/VV/, both from 0% to 100%]
Figure 1: Percentage of <CC> spellings (as a ratio of combined <CC> and <C> spellings) for each word
ending combined with the percentage of /VV/ (long vowel or diphthong) reading for single vowel letters in words
These results have to be taken with a grain of salt because the actual numbers are sometimes rather small. For
example, <um> occurs only three times with a short vowel in the corpus; one of the words is spelled with a
single consonant (<alum>), two are spelled with a doubled consonant (<vellum>, <possum>). This leads to a
67% ratio of <CC> spellings, but it should be clear that this figure is of a different quality than e.g. the 19 instances
of <a>.
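One way to make this caveat concrete is to attach an interval to each ratio. This is not part of the original analysis; the sketch below uses the Wilson score interval as an illustrative choice and compares the <um> case (2 of 3 <CC> spellings) with <er> from table 3 (93 of 96). The function name and the choice of a 95% level are assumptions.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# <CC> spellings out of all short-vowel words with the given ending
um_low, um_high = wilson_interval(2, 3)     # <vellum>, <possum> vs. <alum>
er_low, er_high = wilson_interval(93, 96)   # counts from table 3

print(f"<um>: {um_low:.2f} to {um_high:.2f}")
print(f"<er>: {er_low:.2f} to {er_high:.2f}")
```

The interval for <um> spans most of the range between 0 and 1, whereas the one for <er> is narrow; the two percentages are thus not comparable in reliability.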
With that in mind, we can identify four groups of word endings in figure 1:
Group 1: no consonant doubling, no long/diphthong reading (<-id>, <-ic>, <-ish>, <-it>; lower left corner of
figure 1). This relation is functional, as sketched above: If words that end with <ic> are never read as having a
long/diphthong vowel phoneme, the shortness of the vowel in turn does not have to be indicated.
Group 2: mostly consonant doubling, mostly long/diphthong reading (<-us>, <-a>, <-er>, <-le>, <-ey>; with
limitations also <-um>, <-ent>, <-ar>, <-or>, <-o>, <-y>; upper right corner of figure 1). This relation is also
(mostly) functional: If vowels preceding single consonant letters are likely to be read as long/diphthong vowel
phonemes (e.g. <paper>), then the shortness of the vowel phoneme should in turn be indicated (e.g. <summer>).
However, there are some idiosyncratic cases (e.g. <scholar> vs. <molar>).
Group 3: some amount of consonant doubling, some amount of long/diphthong reading (<-ate>, <-our>, <-al>,
<-i>, <-ile>, <-on>, <-ot>; in between groups 1 and 2, towards the center of figure 1). These word endings do
not systematically encode shortness, even though a considerable number of words with a single intervocalic
consonant letter get a long/diphthong reading; shortness is “under-coded”, so to speak. As a result, this group
shows a greater number of idiosyncratic words. This leads to problems for the reader (consider the long/short
pairs <demon>/<lemon>; <facile>/<docile>; <robot>/<spigot>), but also for the speller (consider the
Group 4: some amount of consonant doubling, no long/diphthong reading (<-in>, <-et>, <-ow>, <-ock>; with
limitations <-age>; bottom right of figure 1). These word endings systematically encode shortness by consonant
doubling, and they do so even without a functional need: Vowels before single intervocalic consonant letters
hardly ever correspond to long/diphthong vowels. In a way, this group “over-codes” shortness.14
Note that we do not find word endings in the upper left corner. The respective spellings would be highly
idiosyncratic.
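The four groups can be approximated as regions of figure 1. The following Python sketch is a rough illustration only: the cut-off values (20%, 80%, and 50% for "mostly doubled") are hypothetical and chosen by us, whereas the groups themselves are identified as clusters, so endings assigned to a group "with limitations" may fall on the other side of such cut-offs.

```python
def figure1_group(cc_pct, vv_pct, low=20, high=80, mostly=50):
    """Map (%<CC>, %/VV/) to one of the four groups, using hypothetical cut-offs."""
    if vv_pct < low:
        # hardly any long/diphthong reading
        return "group 1" if cc_pct < low else "group 4"
    if vv_pct > high:
        # mostly long/diphthong reading
        return "group 2" if cc_pct > mostly else "upper left (unattested)"
    return "group 3"

# <-age>: 47% <CC>; 92% short readings, i.e. 8% /VV/
print(figure1_group(47, 8))
# <-um>: 67% <CC>; 8% short readings in table 6, i.e. 92% /VV/
print(figure1_group(67, 92))
```

With these cut-offs, <-age> falls into group 4 and <-um> into group 2, in line with the assignment in the text.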
So what is the condition for consonant doubling in monomorphemic words? Obviously, word endings have a
strong effect (although only the most frequent ones are accounted for in figure 1): Some endings are correlated
with consonant doubling, some with single consonants, and some are in between. If we take figure 1 as a basis,
we can at least formulate a sufficient condition for doubling: Consonant doubling occurs with word endings that
are also associated with a long/diphthong reading. This condition is not necessary: Group 4 also contains double
5. Conclusion
Consonant doubling is regular in morphologically complex words. It can be motivated with reference to e-
deletion before vocalic suffixes, and for morphologically complex words it can possibly be described in
graphemic and morphological terms alone, without reference to phonology. Of course, the resulting spellings are
14 With the exception of group 4, the relation between the two dimensions could also be interpreted as being linear.
However, to my mind it makes more sense to think of the distribution in terms of clusters and outliers/inbetweeners.
also phonographically plausible, and there are regular correspondences on a suprasegmental level (cf. e.g.
Rollings 2004, Evertz & Primus 2013). Phonological terms are, however, not necessary to capture the graphemic
Consonant doubling is far less regular in morphologically simple words, where a short vowel phoneme is a
necessary condition. Doubling in these words varies with the respective word’s ending. Some word endings
trigger consonant doubling (e.g. <-er>, <-a>, <-y>, <-ow>, <-ock>), some do not (e.g. <-it>, <-id>, <-ic>). This
is an effect of the graphemic form of the word ending, not one of the phonological form. As a matter of fact, the
same phonological ending (e.g. [ɨk]) can be spelled in different ways, and the presence of consonant doubling
hinges on the choice of this spelling (cf. e.g. <comic>/<gimmick>). This phenomenon was dubbed graphemic
harmony. Word endings are thus recurring entities with distributional properties – they correlate with consonant
doubling or non-doubling. That makes them very similar to suffixes. But unlike suffixes, they have no
morphosyntactic or semantic function. It is an interesting question whether they are psychologically “real”. Do
proficient readers strip them off the word just like they do with suffixes (cf. e.g. Rastle, Davis & New, 2004)?
There is a functional relation between consonant doubling and the number of words with single intervocalic
consonants that correspond to words with long/diphthong vowel phonemes: Doubling is only necessary if the
alternative spelling would be prone to misreading. The systematic occurrence of words like <paper> makes the
References
Baayen, H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (release 2). Philadelphia:
Berg, K., Buchmann, F., Dybiec, K., & Fuhrhop, N. (2014). Morphological spellings in English. Written
Bolinger, D. (1986). Intonation and its parts: melody in spoken English. Stanford: Stanford University Press.
Bolinger, D. (1989). Intonation and its uses: melody in grammar and discourse. London: Arnold.
Brame, M. K. (1983a). Ungrammatical Notes, 2: Doubling trouble and the peccable British. Linguistic Analysis
Brame, M. K. (1983b). Ungrammatical Notes, 3: Undoubling and the impeccable British. Linguistic Analysis
Cummings, D.W. (1988). American English Spelling: An informal description. Baltimore, London: Johns
Evertz, M. & Primus, B. (2013). The graphematic foot in English and German. Writing Systems Research, 5(1),
1-23.
Halle, M. & Mohanan, K. P. (1985). Segmental phonology of modern English. Linguistic Inquiry, 16(1), 57-116.
Jespersen, O. ([1909] 1928). A modern English grammar on historical principles, Part I: Sounds and Spellings.
Heidelberg: Winter.
Jones, D., Roach, P., Setter, J., & Esling, J. (2011). Cambridge English Pronouncing Dictionary (18th ed.). Cambridge:
Kurath, H. (1964). A phonology and prosody of modern English. Ann Arbor, U. of Michigan.
Palmer, F., Huddleston, R., & Pullum, G. K. (2002). Inflectional morphology and related matters. In:
Huddleston, R. & Pullum, G. K. (Eds.), The Cambridge Grammar of the English Language. Cambridge:
Rastle, K., Davis, M., & New, B. (2004). The broth in my brother’s brothel: Morpho-orthographic segmentation
Taft, M. (1979). Lexical access via an orthographic code: The basic orthographic syllabic structure (BOSS). In:
Venezky, R. (1999). The American Way of Spelling. The Structure and Origins of American English