

This course will enable students to recognize, transcribe and describe English sounds in general phonetic terms and to master the basic phonetic characteristics of the English language. At the same time, they will be able to improve their knowledge of English pronunciation in relation to English spelling. Phonetics and phonology are two closely related branches of linguistics, the science which studies human language in all its aspects.

Speech sounds

Both phonetics and phonology deal with human speech sounds. Speech sounds are the sounds we produce when we want to communicate, that is, the sounds that build up our words and sentences. Unlike animals, which use sets of sounds at random to transmit brief, uncomplicated messages (e.g., a honey-bee dancing in front of its hive), human beings can combine their sounds in a precise order so as to form larger units and convey much richer and more abstract meanings. When speaking a language, we are intuitively aware that in order to pronounce it correctly (or accurately) we have to follow a certain pattern and pick the sounds that characterize it. This is because, as already stated, each language uses a closed set of sounds, and native speakers have the built-in ability to identify the sounds and combinations of sounds which normally occur in their language and to distinguish them from ‘alien’ ones. It is usually when we try to learn a foreign language that we start to realize what is typical of it. For example, a Romanian learner will have difficulty mastering the difference between the initial sound in the word there [ð] and the corresponding sound in dare [d], because the former sound does not belong to the sound inventory of his own language.

Although each language makes use of only a finite set of sounds, each set is different, so there is no natural language that employs, has employed or probably ever will employ exactly the same sounds as another. The sound system of any language also changes over time. This is because the human vocal tract is sophisticated enough to produce an amazingly large variety of speech sounds, so that as generations of speakers succeed one another, the sounds they use also change, even if only imperceptibly, under various conditioning factors. Over centuries, small changes turn into big shifts. This explains, for instance, why the sets of sounds of related languages, e.g., Romanian, Italian and French, are neither identical to one another nor identical to the sounds of the mother language they all emerged from, in our example Latin.


The linguistic sign


When we communicate through language, we actually use sounds to convey meanings. The Swiss linguist Ferdinand de Saussure gave a coherent and scientific interpretation of language as a system of signs. In Saussure’s theory, linguistic signs have a dual structure: any linguistic sign is made up of a signifiant (English: signifier), that is, an ‘acoustic image’ (the phonological skeleton of the word), and a signifié (English: signified), or the concept to which the acoustic image refers. The acoustic image is primarily a psychological rather than a material reality, as shown by the fact that we can speak to ourselves without articulating the words, whose acoustic image is present only in our mind. In Saussure’s view, the linguistic sign has two essential features: its arbitrariness and the linearity of the signifier.

1. Arbitrariness of the sign: the relation holding between the signifier and the signified is arbitrary, because there is no reason why a certain acoustic image should be associated with a certain concept or meaning. As far as the arbitrariness of the linguistic sign is concerned, there are two situations in which we can talk about some sort of match between the acoustic image and the concept: onomatopoeias and exclamations.

The International Phonetic Alphabet

As a means of communication, language is fundamentally oral. Writing is subordinate to speech and thought, as its role is to fix ideas in a more or less durable material by means of symbols. An alphabet is a much more economical system of writing than one using a separate symbol for each word, as it starts from the idea that every sound should be represented by one symbol, a letter. Nowadays, the most frequently employed alphabet is the Latin one, which many languages have adapted to their own phonetic systems. In English spelling, for instance, the relationship between the pronunciation and the spelling of words has become so loose that learners have to memorize strings of letters whose value differs from context to context: think, e.g., of the English words ghost, laugh and thought. In the first word, the letter sequence gh is pronounced [g], in the second [f], and in the third it is not pronounced at all.

Faced with the imperfections and irregularities of the alphabets of natural languages, linguists have designed special phonetic alphabets in order to refer to speech sounds unambiguously and rigorously. Nowadays, the best known in the scientific world is the alphabet of the International Phonetic Association (in short: IPA), which can be used to note the speech sounds of all natural languages. The IPA was first devised at the end of the 19th century, and it has since been regularly revised and updated so as to accommodate new sound features and sounds from languages that are still being studied. Like any alphabet, the IPA makes use of letters and of small symbols attached to them (diacritics), which can express the finest nuances of pronunciation. For instance, numerous shades of [t] can be noted in the IPA: aspirated [tʰ] (as in top), labialised [tʷ] (as in twitter), palatalised [tʲ] (as in tune), etc.

Such detailed notations are necessary in ‘narrow’ phonetic transcription, which tends to be exhaustive in its description. If, on the contrary, we need to be economical, we may note the sound with a simple symbol, without any detail (i.e., in ‘broad’ phonetic transcription), in our example as [t]. By convention, the symbols used in phonetic transcription are placed within square brackets.
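The step from narrow to broad transcription can be illustrated with a short Python sketch. It assumes a hypothetical, hand-picked table containing only the three modifier letters mentioned above (ʰ, ʷ, ʲ), not the full IPA inventory of diacritics; the names `NARROW_DIACRITICS`, `to_broad` and `describe_diacritics` are illustrative and not part of any standard tool.

```python
# Illustrative subset of IPA modifier letters used in narrow transcription.
# This is NOT the full IPA diacritic inventory, only the three examples above.
NARROW_DIACRITICS = {
    "\u02b0": "aspirated",    # ʰ as in [tʰ] (top)
    "\u02b7": "labialised",   # ʷ as in [tʷ] (twitter)
    "\u02b2": "palatalised",  # ʲ as in [tʲ] (tune)
}


def to_broad(narrow: str) -> str:
    """Remove the listed diacritics, keeping every other character."""
    return "".join(ch for ch in narrow if ch not in NARROW_DIACRITICS)


def describe_diacritics(narrow: str) -> list[str]:
    """Name the listed diacritics found in a narrow transcription."""
    return [NARROW_DIACRITICS[ch] for ch in narrow if ch in NARROW_DIACRITICS]


for narrow in ["[t\u02b0]", "[t\u02b7]", "[t\u02b2]"]:
    print(narrow, "->", to_broad(narrow), describe_diacritics(narrow))
```

Each narrow form is printed next to its broad equivalent and the name of the detail that was dropped, e.g. `[tʰ] -> [t] ['aspirated']`.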

The nature of phonology and phonetics (Definitions)

Both phonetics and phonology deal with human speech sounds. Speech sounds are the sounds we produce when we want to communicate, that is, the sounds that build up our words and sentences. Human speakers can produce an indefinite number of words and sentences, while using a limited number of sound units and a restricted set of rules according to which these sounds are organized. Speech sounds are different from other sorts of vocal sounds (vocalisations) because they make regular, meaningful patterns. Speech is a series of meaningful sounds and silences.

Phonology is the study of the sound systems of languages and of the general properties displayed by these systems. It studies the distinctive sounds of a language, the so-called phonemes, and examines the functions of sounds within a language, as well as the way they combine into syllables and other stretches of speech.
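The course does not spell out how distinctive sounds are identified; one standard technique is the minimal-pair test: two words that differ in exactly one sound position, such as there and dare, show that the two sounds are distinctive in the language. Below is a minimal Python sketch, assuming a small hypothetical lexicon of broad RP transcriptions (length written with the IPA mark ː); the names `LEXICON` and `minimal_pairs` are illustrative.

```python
from itertools import combinations

# Hypothetical mini-lexicon: each word maps to a tuple of RP phonemes.
LEXICON = {
    "there": ("ð", "eə"),
    "dare": ("d", "eə"),
    "fit": ("f", "ɪ", "t"),
    "feet": ("f", "iː", "t"),
    "bit": ("b", "ɪ", "t"),
}


def minimal_pairs(lexicon: dict[str, tuple[str, ...]]) -> list[tuple[str, str, str, str]]:
    """Return (word1, word2, sound1, sound2) for words differing in exactly one phoneme."""
    pairs = []
    for w1, w2 in combinations(lexicon, 2):
        p1, p2 = lexicon[w1], lexicon[w2]
        if len(p1) != len(p2):
            continue  # different number of phonemes: cannot be a minimal pair
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, *diffs[0]))
    return pairs


for w1, w2, s1, s2 in minimal_pairs(LEXICON):
    print(f"{w1} / {w2}: /{s1}/ vs /{s2}/")
```

With this lexicon the sketch finds there/dare (/ð/ vs /d/), fit/feet (/ɪ/ vs /iː/) and fit/bit (/f/ vs /b/); feet and bit differ in two positions, so they do not form a minimal pair.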

Phonetics is the study of the ways humans make, transmit and receive speech sounds, that is, of the physical aspect of speech sounds: their production, transmission and reception.

Vocal tract and articulatory organs

Human speech is a combination of sounds and silences, generated by the speech mechanism of the vocal tract and organized into meaningful patterns.

All speech begins as a silent breath of air, created by muscular activity in the chest. The air comes up from the lungs, passes through the vocal tract and exits as a sound wave.

Basically, the speech mechanism has four components: muscular activity; air; some type of resistance or obstruction to the air, which causes a sound to be made; and amplification, which makes the sound loud enough to be heard.


Changes to the air flow between the lungs and the mouth and nose produce different sounds. Air starts off in the lungs, flows up through the trachea (or windpipe), through the larynx, past the epiglottis and through the pharynx. From there, the air can go either through the mouth or through the nose.

The vocal tract is the channel of air flow between the larynx and the mouth and nose.

Overview: organs of articulation


English Phonetics — Classifying English Sounds


English Consonants

Sounds you can feel!


Obstruents: defined by manner of articulation as involving obstruction or blockage of the air flow: plosives, fricatives, affricates.

Sonorants: defined by manner of articulation as involving an obstruction that is bypassed (or incomplete): nasals, approximants, lateral (approximant).

There are 24 consonants in standard British English.

A consonant is a speech sound which obstructs the flow of air through the vocal tract. Some consonants do this a lot and some very little: the ones that make maximum obstruction (i.e., plosives, which involve a complete stoppage of the air stream) are the most consonantal. Nasal consonants are less obstructive than plosives: they stop the air completely in the oral cavity but allow it to escape through the nasal cavity. Fricatives obstruct the air flow considerably, causing friction, but do not involve total closure. Laterals obstruct the air flow only in the centre of the mouth, not at the sides, so the obstruction is slight. Some other sounds, classed as approximants, obstruct the air flow so little that they could almost be classed as vowels if they occurred in a different context (e.g., /w/ or /j/).

Classification of Consonants

Most English consonants can be classified using three articulatory parameters (or, in everyday terms, three features):

Place of articulation: the point at which the air stream is most restricted (which parts of the mouth are used).

Manner of articulation: what happens to the moving column of air (how the sound is made).

Voicing: vibration or lack of vibration of the vocal folds (whether or not the vocal folds vibrate while the sound is made).


Places of Articulation

Lips: bilabial consonants /p, b, m, w/
Lips and teeth: labiodental consonants /f, v/
Teeth: interdental consonants /θ, ð/
Alveolar ridge: alveolar consonants /t, d, s, z, n, l/
Central palate (or hard palate): palatal consonants /ʃ, ʒ, r, tʃ, dʒ, y/ (/y/ is written /j/ in the IPA)
Velum (or soft palate): velar consonants /k, g, ŋ/
Glottis: glottal fricative /h/

Fortis consonants are produced with greater articulatory effort and more air pressure required by stronger resistance at the place of articulation. Lenis consonants are more lax: they require less intensity and tension. The duration of articulation is also longer in the case of fortis consonants than in the case of lenis ones. In a voiced/voiceless pair (e.g., [d]/[t]), the voiced consonant is lenis and the voiceless consonant is fortis.


Manner of Articulation

The process by which the moving column of air is shaped is called the manner of articulation. For English, these are:

Plosives: /p, t, k, b, d, g/
Fricatives: /f, v, θ, ð, s, z, ʃ, ʒ, h/
Affricates: /tʃ, dʒ/
Nasals: /m, n, ŋ/ (sometimes called “nasal stops”)
Liquids: /l, r/
Glides: /w, y, hw/

Plosives (stops) occur when the air stream stops completely for an instant before it exits the vocal tract. Voiceless stops in English are the /p/ in pour and slap, the /t/ in time and adept, and the /k/ in cold and poke. Voiced stops are the /b/ in bow and crab, the /d/ in dock and blood, and the /g/ in game and bag.

Fricatives occur when the air stream is audibly disrupted but not stopped completely. Voiced fricatives are the /v/ in very and shove, the /ð/ in them and bathe, the /z/ in zoo and wise, and the /ʒ/ in measure. Voiceless fricatives are the /f/ in fool and laugh, the /θ/ in thin and bath, the /ʃ/ in shock and nation, the /s/ in soup and miss, and the /h/ in hope and home.

Affricates start out as a stop but end as a fricative. There are two affricates in English, both of which are palatal, so we do not need to mention the place of articulation to describe affricates.

The voiceless affricate is the /tʃ/ in lunch and chapter. The voiced affricate is the /dʒ/ in germ, journal and wedge.

Nasals occur when the velum is lowered, allowing the air stream to pass through the nasal cavity instead of the mouth. The air stream is stopped in the oral cavity, so nasals are sometimes called “nasal stops”; we will just call them “nasals”. Nasals are the /m/ in mind and sum, the /n/ in now and son, and the /ŋ/ in sing and longer.

Liquids occur when the air stream flows continuously through the mouth with less obstruction than that of a fricative. Both liquids in English are voiced, so we don’t need to mention voicing when we describe liquids. The “lateral” liquid, /l/, is pronounced with the restriction in the alveolar region at the beginning of syllables, as in low and syllable, but in the velar region at the ends of syllables, as in call, halter, and (optionally) syllable. It is called “lateral” because air flows around the sides of the tongue. The “central” liquid is the /r/ in rough and chore. This also has various pronunciations. It is called “central” because air flows over the center of the tongue. So the terms “central” and “lateral” replace the place of articulation in descriptions of the liquids.


Approximants occur when the air stream is unobstructed, producing an articulation that is vowel-like but moves quickly to another articulation, which makes it a consonant. Approximants are sometimes described as semivowels. The approximants in English include the labio-velar /w/, which shares the articulatory features of [u] (the lips are rounded and the back of the tongue is raised towards the soft palate), as in witch, water and away; and the palatal /j/, whose articulation is similar to that of the vowel [i] (the front of the tongue is raised close to the palate), as in year and yes.
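The three parameters of voicing, place and manner combine mechanically into the usual three-term consonant labels. The Python sketch below assumes a hypothetical table covering only the plosives, fricatives, affricates and nasals listed above, with places taken from the list of places of articulation; it also pairs each voiceless (fortis) consonant with the voiced (lenis) consonant of the same place and manner. The names `CONSONANTS`, `describe` and `voicing_pairs` are illustrative.

```python
# Three-term labels (voicing, place, manner) for the English plosives,
# fricatives, affricates and nasals, following this section's lists.
# /ʃ, ʒ, tʃ, dʒ/ are labelled "palatal" as above; many textbooks call
# them palato-alveolar instead.
CONSONANTS = {
    # symbol: (voiced, place, manner)
    "p": (False, "bilabial", "plosive"),
    "b": (True, "bilabial", "plosive"),
    "t": (False, "alveolar", "plosive"),
    "d": (True, "alveolar", "plosive"),
    "k": (False, "velar", "plosive"),
    "g": (True, "velar", "plosive"),
    "f": (False, "labiodental", "fricative"),
    "v": (True, "labiodental", "fricative"),
    "θ": (False, "interdental", "fricative"),
    "ð": (True, "interdental", "fricative"),
    "s": (False, "alveolar", "fricative"),
    "z": (True, "alveolar", "fricative"),
    "ʃ": (False, "palatal", "fricative"),
    "ʒ": (True, "palatal", "fricative"),
    "h": (False, "glottal", "fricative"),
    "tʃ": (False, "palatal", "affricate"),
    "dʒ": (True, "palatal", "affricate"),
    "m": (True, "bilabial", "nasal"),
    "n": (True, "alveolar", "nasal"),
    "ŋ": (True, "velar", "nasal"),
}


def describe(symbol: str) -> str:
    """Return a label such as 'voiceless bilabial plosive'."""
    voiced, place, manner = CONSONANTS[symbol]
    return f"{'voiced' if voiced else 'voiceless'} {place} {manner}"


def voicing_pairs() -> list[tuple[str, str]]:
    """Pair each voiceless (fortis) consonant with the voiced (lenis) one
    that has the same place and manner of articulation."""
    pairs = []
    for a, (voiced_a, place_a, manner_a) in CONSONANTS.items():
        if voiced_a:
            continue
        for b, (voiced_b, place_b, manner_b) in CONSONANTS.items():
            if voiced_b and (place_a, manner_a) == (place_b, manner_b):
                pairs.append((a, b))
    return pairs


print(describe("θ"))  # the initial sound of "thin"
print(voicing_pairs())
```

For example, `describe("θ")` gives "voiceless interdental fricative". The pairing yields eight fortis/lenis pairs; /h/ has no voiced partner in the table, and the nasals, being voiced only, have no voiceless partners.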

Overview: table of English consonants




English Vowels

For the sake of simplicity, the most common representation of the vowel space takes the stylized, arbitrary shape of a quadrilateral (a trapezoid), as first proposed by Daniel Jones in the 1920s under the name of the Cardinal Vowel chart (Figure 1.1). The shape of the chart is modelled on the shape of the phonetic space, i.e., the shape of the oral cavity produced by the various positions of the tongue. For English, the phonetic space is represented as a trapezoid.

In Figure 1.1, the upper left corner represents the tongue position for the (ideally) highest and furthest forward vowel [i], while the lower right corner shows the tongue position for the lowest and furthest back vowel [ɑ]. Six other sounds, placed approximately equidistantly from each other, are also indicated, giving a series of eight cardinal vowels, of which 1 to 5 are unrounded and 6 to 8 rounded. These are known as the primary cardinal vowels.

Figure 1.1 (chart): the primary cardinal vowels 1 [i], 2 [e], 3 [ɛ], 4 [a], 5 [ɑ], 6 [ɔ], 7 [o], 8 [u].

The Cardinal Vowel chart is a schematic representation of the vowel space and its limits. It establishes reference points (hence the label ‘cardinal’) to which vowels in specific languages can be compared and described as, for instance, ‘higher than the cardinal vowel X’, ‘further back than the cardinal vowel Y’, or ‘more rounded than the cardinal vowel Z’. In this sense, the vowels in the words sea and shoe are said to illustrate the high cardinal vowels [i] and [u], respectively.


There are also central vowels which do not belong to the inventory of cardinal vowels but are included in the IPA chart: the central low-mid unrounded vowel [ʌ], the central mid (half) unrounded vowel [ɜ], the central mid (half) unrounded vowel [ə], etc. [ə] is shaped like an inverted ‘e’ and is usually called ‘schwa’ (pronounced [ʃwa]), which is the old Hebrew term for a diacritic indicating a missing vowel (Hebrew writing usually only includes consonants).

The RP variety of British English, with twenty vowel phonemes (standard American English has fifteen), has a relatively large vowel system, which is characteristic of Germanic languages (Swedish has even more vowels). There are seven short vowels, five long vowels and eight diphthongs.

The vowels and their corresponding phonemic symbols are shown in the table below:

Figure 1.1



Figure 1.2


Figure 1.3



Criteria for classifying vowels

Vowels are usually described according to their ‘quality’ within a three-term system: vowel height, vowel backness, and vowel roundness. Vowel height is a ‘vertical’ parameter, corresponding more or less to the consonantal criterion of manner, based on the distance between the articulators. Vowels vary from close (that position in which the tongue body is as near the palate as it can be without causing audible friction) to half-close, half-open and open (where the tongue body is as far from the palate as possible). Vowel frontness/backness is a ‘horizontal’ criterion, parallel to consonantal place. It refers to the part of the tongue which is raised highest in the articulation of the vowel, varying from front (equivalent to palatal) (through central) to back (equivalent to velar). Vowel roundness: a vowel may be either rounded – articulated with the corners of the lips brought towards each other and the lips pushed forwards, e.g., [u] – or unrounded.

Traditionally, in describing English vowels we use the ‘quantity’ distinction ‘long’ vs. ‘short’. Long vowels can be 50 to 100 percent longer than short vowels. For example, there is an obvious difference in length between the vowel in feet [i:] (the colon indicates a long vowel) and the one in fit [ɪ]. However, in most English varieties length is never the only feature which distinguishes two vowels: [i:] and [ɪ] also differ in quality. This is not the case in other languages (e.g., Danish) or even in a number of Scottish and Northern Irish English varieties, where length is sometimes the only criterion of distinction between pairs of words such as daze [dez] and days [de:z]. Long vowels are always associated with a higher degree of muscular tension in the articulatory organs; consequently, they are described as tense. Short vowels are produced with less tension, in a more relaxed manner, hence their description as lax. The number of vowels and their positions on the vowel chart differ considerably from one English variety to another. Among English varieties, the RP vowel system is particularly rich: conservative RP is said to have 20 vowel sounds (12 monophthongs and 8 diphthongs).

RP front vowels

[i:] – close (high), long, tense, unrounded (e.g., in see).
[ɪ] – half-close (high), more central and lower than [i:]; short, lax, unrounded (e.g., in bit).
[e] – half-open (low-mid), short, lax, unrounded (e.g., in check).
[æ] – open (low), short, lax, unrounded (e.g., in cat).

RP back vowels

[u:] – close (high), long, tense, rounded (e.g., in boot).
[ʊ] – half-close (high), more central and lower than [u:]; short, lax, rounded (e.g., in put).
[ɔ:] – half-open (low-mid), long, tense, rounded (e.g., in taught).
[ɒ] – open (low), short, lax, rounded (e.g., in got).
[ɑ:] – open (low), long, tense, unrounded (e.g., in father).

RP central vowels

[ʌ] – half-open, short, lax, unrounded (e.g., in cut); it is closer to the IPA vowel [ɐ] than to the cardinal [ʌ].
[ə] – mid, short, lax, unrounded (e.g., in about, woman; always in unstressed syllables).
[ɜ:] – mid, long, tense, unrounded (e.g., in fur, bird).
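The twelve RP monophthongs described above can be stored as feature records and queried, for example to confirm the count of seven short and five long vowels given earlier for RP. A Python sketch follows; the `Vowel` class, its field names and the helper names are illustrative assumptions, and length is written with the IPA length mark ː rather than a colon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vowel:
    symbol: str
    height: str      # close, half-close, mid, half-open, open
    backness: str    # front, central, back
    rounded: bool
    long: bool       # long vowels are tense, short vowels lax


# The RP monophthongs, with the features given in the lists above.
RP_MONOPHTHONGS = [
    Vowel("iː", "close", "front", False, True),
    Vowel("ɪ", "half-close", "front", False, False),
    Vowel("e", "half-open", "front", False, False),
    Vowel("æ", "open", "front", False, False),
    Vowel("uː", "close", "back", True, True),
    Vowel("ʊ", "half-close", "back", True, False),
    Vowel("ɔː", "half-open", "back", True, True),
    Vowel("ɒ", "open", "back", True, False),
    Vowel("ɑː", "open", "back", False, True),
    Vowel("ʌ", "half-open", "central", False, False),
    Vowel("ə", "mid", "central", False, False),
    Vowel("ɜː", "mid", "central", False, True),
]


def describe(v: Vowel) -> str:
    """Build a label in the style of the lists above."""
    rounding = "rounded" if v.rounded else "unrounded"
    length = "long, tense" if v.long else "short, lax"
    return f"[{v.symbol}] {v.height}, {v.backness}, {rounding}, {length}"


short_vowels = [v.symbol for v in RP_MONOPHTHONGS if not v.long]
long_vowels = [v.symbol for v in RP_MONOPHTHONGS if v.long]

print(len(short_vowels), "short:", " ".join(short_vowels))
print(len(long_vowels), "long:", " ".join(long_vowels))
print(describe(RP_MONOPHTHONGS[0]))
```

Running it lists seven short vowels (ɪ e æ ʊ ɒ ʌ ə) and five long ones (iː uː ɔː ɑː ɜː), which together with the eight diphthongs give the twenty RP vowel phonemes.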



Feature changing rules

Feature-changing rules are those rules which affect one feature or a small group of features.

These include assimilation and deletion, as well as lenition, flapping, glottalisation, etc.

Sounds in connected speech


Sometimes a sound is assimilated (absorbed / incorporated) into an adjacent, similar sound.

The faster a person speaks, the more likely this will happen.


Examples of Assimilation:

/s/ + /ʃ/ becomes /ʃː/: horseshoe, less shocking
Some people think a horseshoe is a good-luck charm.
That horror movie was less shocking than I thought it would be.

/z/ + /ʃ/ becomes /ʃː/: his shirt, Wayne’s shadow
His shirt is really quite attractive.
Wayne’s shadow fills a large space.

Certain consonants followed by /y/ (IPA /j/) often result in a new sound. This depends on several factors:

  • - speaking habits

  • - speed of speech

  • - casual vs. formal speech
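The assimilations in this section can be modelled as rewrite rules applied to a sequence of phonemes, trying the longest match first. The Python sketch below is a simplification: the rule table `RULES` and the function `assimilate` are hypothetical, /y/ is written with the IPA symbol /j/, and the code applies every rule whenever it can, whereas in real speech assimilation is optional and depends on the factors just listed.

```python
# Hypothetical rewrite rules: a run of adjacent phonemes -> its assimilated form.
RULES = {
    ("t", "s", "j"): ["tʃ"],  # hates your
    ("d", "z", "j"): ["dʒ"],  # needs your
    ("s", "ʃ"): ["ʃː"],       # horseshoe, less shocking
    ("z", "ʃ"): ["ʃː"],       # his shirt, Wayne's shadow
    ("s", "j"): ["ʃ"],        # this year
    ("z", "j"): ["ʒ"],        # does your
    ("t", "j"): ["tʃ"],       # that your
    ("d", "j"): ["dʒ"],       # would you
}


def assimilate(phonemes: list[str]) -> list[str]:
    """Apply RULES left to right, preferring three-phoneme matches."""
    out, i = [], 0
    while i < len(phonemes):
        for size in (3, 2):
            key = tuple(phonemes[i:i + size])
            if len(key) == size and key in RULES:
                out.extend(RULES[key])
                i += size
                break
        else:  # no rule matched at position i: copy the phoneme unchanged
            out.append(phonemes[i])
            i += 1
    return out


EXAMPLES = {
    "this year": ["ð", "ɪ", "s", "j", "ɪə"],
    "would you": ["w", "ʊ", "d", "j", "uː"],
    "hates your": ["h", "eɪ", "t", "s", "j", "ɔː"],
    "horseshoe": ["h", "ɔː", "s", "ʃ", "uː"],
}

for phrase, phonemes in EXAMPLES.items():
    print(f"{phrase}: /{''.join(phonemes)}/ -> /{''.join(assimilate(phonemes))}/")
```

Word boundaries are not represented, so the same rules apply inside words (horseshoe) and across them (this year).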

Examples of Assimilation:

S + y = ʃ: I’m turning 21 this year.
Z + y = ʒ: Does your teacher know?
T + y = tʃ: Is that your best guess?
TS + y = tʃ: He hates your style.
D + y = dʒ: Would you mind not smoking?
DZ + y = dʒ: He needs your support.

Deletion

Sometimes a sound is deleted when it’s part of a consonant cluster. This can happen when connecting words, or within a single word.

Examples of deletion: east side, blind man, restless, kindness, wind tunnel. Notice that /t/ and /d/ are the most commonly deleted consonants.
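A simplified version of this deletion can be written as a rule that drops /t/ or /d/ when it stands between two other consonants. The Python sketch below is only an illustration under assumptions: the function name is hypothetical, transcriptions are broad RP, and the rule is applied every time, whereas speakers delete these sounds optionally.

```python
# The 24 consonant phonemes of standard British English, one string each.
CONSONANTS = {
    "p", "b", "t", "d", "k", "g", "f", "v", "θ", "ð", "s", "z",
    "ʃ", "ʒ", "h", "tʃ", "dʒ", "m", "n", "ŋ", "l", "r", "w", "j",
}


def delete_alveolar_stops(phonemes: list[str]) -> list[str]:
    """Drop /t/ or /d/ when both neighbours in the input are consonants."""
    kept = []
    for i, p in enumerate(phonemes):
        inside = 0 < i < len(phonemes) - 1
        if (
            p in {"t", "d"}
            and inside
            and phonemes[i - 1] in CONSONANTS
            and phonemes[i + 1] in CONSONANTS
        ):
            continue  # e.g. the /t/ of "east side", the /d/ of "kindness"
        kept.append(p)
    return kept


EXAMPLES = {
    "east side": ["iː", "s", "t", "s", "aɪ", "d"],
    "blind man": ["b", "l", "aɪ", "n", "d", "m", "æ", "n"],
    "kindness": ["k", "aɪ", "n", "d", "n", "ə", "s"],
    "wind tunnel": ["w", "ɪ", "n", "d", "t", "ʌ", "n", "ə", "l"],
}

for phrase, phonemes in EXAMPLES.items():
    print(f"{phrase}: /{''.join(phonemes)}/ -> /{''.join(delete_alveolar_stops(phonemes))}/")
```

Neighbours are checked in the input sequence, so in wind tunnel only the /d/ is dropped: the following /t/ is followed by a vowel and survives.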


Example of sounds in connected speech:

I wouldn’t be able to speak English smoothly

Connect 2 stops (wouldn’t be)
Reduce “to” (able to)
Connect the “sh” & “s” (English smoothly)
Connect consonant & vowel (speak English)



Consonants and vowels can be described (as we have learnt) both from a phonetic point of view (how they are produced) and from a phonological point of view (where they occur). Phonetically, a syllable can be described as having a centre, also called peak or nucleus (its main, most prominent core), which is produced with little or no obstruction of the air and is usually formed by a vowel (either a monophthong or a diphthong). Vowels are the most sonorous sounds human beings produce, so a vowel, or a sound with a high degree of sonority, is an obligatory element in the configuration of the syllable. The minimal syllable is typically a single, isolated vowel, as in the words are [ɑː] and I [aɪ]. The few consonants that can occur in isolation, such as the interjections mm [m] (expressing agreement) and sh [ʃ] (used to ask for silence), are not regarded as minimal syllables by all linguists. The sounds preceding or following the vowel are necessarily less sonorous, and they are optional elements in the make-up of the syllable, so the basic configuration of an English syllable is (C)V(C). The part of the syllable preceding the nucleus is called the onset and is produced with greater obstruction of the air, so the onset is formed by one or more consonants, e.g., bar [bɑː], stir [stɜː], my [maɪ].



The non-vocalic elements coming after the nucleus are called the coda of the syllable, which is also produced with greater obstruction of air, and is therefore formed by one or more consonants.

E.g. art [ɑːt], urge [ɜːdʒ], ice [aɪs].

A syllable that ends in a vowel (i.e., ends with the nucleus) is called an open syllable (it has no coda).

A syllable that ends in a consonant (i.e., ends with a coda) is called a closed syllable. A syllable is a peak of sonority, often surrounded by less sonorous segments.


A syllable consists of an obligatory rhyme, preceded by an optional onset. A rhyme consists of an obligatory nucleus, followed by an optional coda.

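Diagram: σ (syllable) branches into an optional Onset (On) and a Rhyme (Rh); the Rhyme branches into a Nucleus (N) and an optional Coda (Co).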

A syllable nucleus consists of an obligatory sonorant (or resonant) segment, usually a vowel:

Diagram: σ (sigma) branches into (O) and R; R branches into N and (C); N is filled by a sonorant segment, usually a vowel.



The onset and coda, when present, may consist of one or more less sonorant segments:



Diagram: in sprint /sprɪnt/, the onset (O) is CCC /spr/, the nucleus (N) is V /ɪ/, and the coda (C) is CC /nt/.
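The onset-nucleus-coda analysis can be sketched in Python for a single syllable: the nucleus is located as the only vowel, everything before it is the onset and everything after it is the coda, which also tells us whether the syllable is open or closed. The vowel inventory below (RP monophthongs and diphthongs) and the function names are assumptions for illustration; syllabic consonants such as mm [m] are deliberately not handled.

```python
# RP vowel phonemes: 12 monophthongs and 8 diphthongs (length written with ː).
VOWELS = {
    "iː", "ɪ", "e", "æ", "uː", "ʊ", "ɔː", "ɒ", "ɑː", "ʌ", "ə", "ɜː",
    "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə",
}


def split_syllable(phonemes: list[str]) -> tuple[list[str], str, list[str]]:
    """Return (onset, nucleus, coda) for a one-syllable word."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel, i.e. a single syllable")
    n = vowel_positions[0]
    return phonemes[:n], phonemes[n], phonemes[n + 1:]


def is_open(phonemes: list[str]) -> bool:
    """An open syllable ends with its nucleus (empty coda)."""
    _, _, coda = split_syllable(phonemes)
    return not coda


for word, phonemes in {
    "sprint": ["s", "p", "r", "ɪ", "n", "t"],
    "are": ["ɑː"],
    "my": ["m", "aɪ"],
    "ice": ["aɪ", "s"],
}.items():
    onset, nucleus, coda = split_syllable(phonemes)
    kind = "open" if is_open(phonemes) else "closed"
    print(f"{word}: onset={onset} nucleus={nucleus} coda={coda} ({kind})")
```

For sprint the sketch returns the onset /s p r/, the nucleus /ɪ/ and the coda /n t/; are and my come out as open syllables, ice as a closed one.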

Syllables are clusters of segments grouped around a sonority peak (usually a vowel). The most widespread syllable structure in the languages of the world is a CV sequence (i.e., a consonant followed by a vowel), e.g., Rom. bărbat ‘man’. The phonological definition of the syllable takes into account the structure of a particular language, analysing the rules by which phonemes combine into syllables, as well as their place and order in the syllable structure. Each language has restrictions on the combination of phonemes into syllables.

The distribution of sounds in sound patterns is not arbitrary, but follows some constraints called phonotactics.

Phonotactics: the set of constraints on the permissible combinations of sounds in a language, which is part of a speaker’s phonological knowledge.

Still, the most serious restrictions concern the combination of consonants into clusters (two or more consonants together) in the onset and the coda.
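Phonotactic constraints lend themselves to being written as explicit checks. The Python sketch below encodes the onset and coda constraints given under "Phonotactic constraints" in this section: no single-consonant onset /ŋ/ or /ʒ/, no single-consonant coda /h, r, w, j/, two-consonant onsets of the /s/ + consonant type or of the "one of fifteen consonants + /l, r, w, j/" type, three-consonant onsets of the /s/ + voiceless stop + approximant type, and at most four coda consonants. The set of consonants allowed after /s/ is an assumption read off the examples given there (stay, spoon, skin, small, snow), the function names are hypothetical, and, like the rule as stated, the check also admits a few clusters English does not actually use, such as /tl/.

```python
# Sets taken from the constraints described in this section.
CONSONANTS = {
    "p", "b", "t", "d", "k", "g", "f", "v", "θ", "ð", "s", "z",
    "ʃ", "ʒ", "h", "tʃ", "dʒ", "m", "n", "ŋ", "l", "r", "w", "j",
}
NO_SINGLE_ONSET = {"ŋ", "ʒ"}
NO_SINGLE_CODA = {"h", "r", "w", "j"}
APPROXIMANTS = {"l", "r", "w", "j"}
FIFTEEN = {"p", "t", "k", "b", "d", "g", "f", "θ", "s", "ʃ", "m", "n", "l", "h", "v"}
AFTER_S = {"p", "t", "k", "m", "n"}  # assumption: read off the examples
VOICELESS_STOPS = {"p", "t", "k"}


def onset_problems(onset: list[str]) -> list[str]:
    """Return the constraints an onset violates (an empty list means allowed)."""
    if any(c not in CONSONANTS for c in onset):
        return ["an onset may contain only consonants"]
    if len(onset) == 1 and onset[0] in NO_SINGLE_ONSET:
        return [f"/{onset[0]}/ does not begin English syllables"]
    if len(onset) == 2:
        first, second = onset
        s_type = first == "s" and second in AFTER_S
        approximant_type = first in FIFTEEN and second in APPROXIMANTS
        if not (s_type or approximant_type):
            return [f"/{first}{second}/ is not a permitted two-consonant onset"]
    if len(onset) == 3:
        first, second, third = onset
        if not (first == "s" and second in VOICELESS_STOPS and third in APPROXIMANTS):
            return ["a three-consonant onset must be /s/ + voiceless stop + approximant"]
    if len(onset) > 3:
        return ["an onset has at most three consonants"]
    return []


def coda_problems(coda: list[str]) -> list[str]:
    """Return the constraints a coda violates (an empty list means allowed)."""
    if any(c not in CONSONANTS for c in coda):
        return ["a coda may contain only consonants"]
    if len(coda) == 1 and coda[0] in NO_SINGLE_CODA:
        return [f"/{coda[0]}/ does not end syllables in RP"]
    if len(coda) > 4:
        return ["a coda has at most four consonants"]
    return []


print(onset_problems(["s", "t", "r"]))       # street
print(onset_problems(["ŋ"]))
print(coda_problems(["m", "p", "t", "s"]))   # prompts
```

An empty list means that the onset or coda passes every check; otherwise the list names the violated constraint.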

Phonotactic constraints

The syllable onset:

If the syllable begins with a vowel, it has a zero onset, as in ‘am’ /æm/ and ‘ease’ /iːz/. If a syllable begins with one consonant, the initial consonant can be any consonant phoneme except /ŋ/ and /ʒ/. Examples: ‘key’ /kiː/; ‘kick’ /kɪk/.

If a syllable begins with two or three consonants, such a sequence of consonants is called a consonant cluster. Examples: ‘play, stay, street, split, etc’.

Consonant clusters in the onset:

Initial two-consonant clusters are of two types:

  • - Composed of (/s/ + one of a small set of consonants)

  • - (pre-initial + initial)

Examples: ‘stay, spoon, skin, small, snow, sleep, swim, etc’.



Initial two-consonant clusters:

Composed of one of a set of fifteen consonants + /l, r, w, j/ (initial + post-initial). Examples: ‘fly, green, three, twin, pride, blind, try, quick, swim’. The set of 15 consonants: /p, t, k, b, d, g, f, θ, s, ʃ, m, n, l, h, v/. Further examples: [preɪ] [pleɪ] [twɪn] [tjuːnə] [kwɪk] [kjuː] [blæk] [bjuːtɪ] [dwel] [gluː] [grɪn] [flaɪ] [fjuː] [θrəʊ] [slɪp] [swɪm] [sjuː] [ʃrʌŋk] (shrunk, past participle of shrink) [hjuːdʒ] [vjuː] [mjuːz] [njuːz] [ljuːd] (lewd: indecent)

Initial three-consonant clusters are:

Composed of /s/ + voiceless stop + approximant (pre-initial + initial + post-initial). Examples: ‘splash, spread, string, screen, squeeze’, etc.

The syllable coda:

If the syllable ends with a vowel, it has a zero coda, as in ‘car’ /kɑː/ and ‘see’ /siː/. If a syllable ends with one consonant, the final consonant can be any consonant phoneme except /h, r, w, j/. Examples: ‘at’ /æt/, ‘kick’ /kɪk/, ‘catch’ /kætʃ/, ‘seen’ /siːn/. If a syllable ends with two, three or four consonants, such a sequence of consonants is called a consonant cluster (you already know the clusters, but just in case). Examples: books, six, bank, banks, prompts.

Consonant clusters in the coda:

Final two-consonant clusters:

Examples: ‘help, bank, edge, belt, blind, books, six’, etc.

Final three-consonant clusters:

Examples: helped, against, adjunct, text [kst], seconds, fifths.

Final four-consonant clusters:

Examples: prompts, sixths, thousandth, instincts, glimpsed.

Strong (stressed) syllables have greater prominence than weak (unstressed) syllables because they can be:

  • - longer in duration

  • - louder

  • - sources of pitch changes (high or low)

Weak (unstressed) syllables can have reduced vowels and are less prominent than strong syllables.


The End of this Boring Course/Story

When you were a baby, did you understand your own language? You learned to understand by listening 24 hours a day, 7 days a week. After that, you learned to speak. Then you learned to read. And then you learned to write. But listening came first! TO LISTEN is active.

TO HEAR is passive. Sometimes it is better only to HEAR. Let the radio, TV, iPod, iPhone or DVD play, but DON'T listen. Just HEAR. Your subconscious will listen for you, and you will still learn. And you, you do nothing. Your brain will HEAR, your subconscious will LISTEN and you will LEARN!