00334545/84 $3.00+0.00
Pergamon Press Ltd.
CONTENTS
Introduction
NOMENCLATURE
Part 1
Part 1, Section A: Amino-Acid Nomenclature
3AA-1 Names of Common α-Amino Acids
3AA-2
3AA-3
3.1 Use of D and L
3.2 Position of prefix
3.3 Omission of prefix
3.4 Subscripts to D and L
3.5 The RS system
3.6 Amino acids derived from amino sugars
3.7 Use of meso
3.8 Use of DL
3AA-4
3AA-10 Carboxyl Group Modifications other than Ester and Amide Formation
10.1 Removal of the carboxyl group
10.2 Ketones
10.3 Aldehydes and alcohols
These are recommendations of the IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN), whose members are
H. B. F. Dixon (chairman), A. Cornish-Bowden (secretary), C. Libecq (as chairman of the IUB Committee of Editors of Biochemical Journals),
K. L. Loening, G. P. Moss, J. Reedijk, S. F. Velick, and J. F. G. Vliegenthart. Comments may be sent to any member of the commission, or to its
secretary: A. Cornish-Bowden, Department of Biochemistry, University of Birmingham, P.O. Box 363, Birmingham, England, B15 2TT. JCBN
thanks many who helped with drawing up the recommendations, especially P. Karlson, its former chairman, B. Keil, a former member of the
Nomenclature Committee of IUB (NCIUB), other members and former members of NCIUB, namely H. Bielka, N. Sharon and E. C. Webb,
and also W. E. Cohn, J. T. Edsall, J. S. Morley, G. T. Young, and members of the IUPAC Commission on Nomenclature of Organic Chemistry
(CNOC).
INTRODUCTION
The traditional and well-known names of the common α-amino acids were, in general, given to them by their discoverers and
bear no relationship to their chemical structures [1, 2]. The modification of these names to accommodate derivatives and to
designate configuration was codified in 1947 [3] and revised in 1960 [4]. After proposals for the revision of the rules for naming
α-amino acids with two centres of chirality had appeared in 1963 [5], a complete revision of the rules was made in 1974 [6] on the
basis of a report by a committee convened by H. B. Vickery. Recommendations for symbols for amino-acid residues in peptide
sequences made by Brand & Edsall (p. 224 in [8]) were revised in 1966 [9] and 1971 [10], and recommendations for a one-letter
notation were approved in 1968 [11]. Recommendations for naming and symbolizing sequences derived from those of named
peptides were made in 1966 [12].
The present revision combines all these documents. In Part 1, on nomenclature, the main changes are to propose names for
particular ionic forms of residues (3AA-12.2) and to apply the stereochemical rules [13] more fully (3AA-3). Part 2, on symbolism,
introduces a few new symbols (3AA-15.2.7), simplifies the designation of ionized forms of peptides (3AA-19.3), explains the
principles for giving symbols to reagents (3AA-18.2), presents a method for showing how parts of residues react (3AA-17.4), and
describes the one-letter system for representing long sequences (3AA-20 and -21). Part 3, on the modification of named peptides, is
extended to cover enantiomers and reversed sequences (3AA-22.7) and peptide analogues (3AA-22.8). Symbols for the twenty
ribosomally incorporated (coded) amino acids are given in Table 1, and symbols used in these recommendations for other amino
acids are mostly listed in the Appendix, although a few others are given in 3AA-15.2. Substantially new recommendations are
marked by triangles in the margins.
Part 1. Nomenclature
The trivial names of the α-amino acids that are commonly found in proteins and are represented in the genetic code, together
with their symbols, systematic names [14] and formulas, are given in Table 1. Some other common amino acids are listed in the
Appendix.
When the phrase 'amino acid' is a qualified noun it contains no hyphen; a hyphen is inserted when it becomes an adjective so
as to join its components in qualifying another noun, e.g. amino-acid sequence.
Table 1
The systematic names and formulas given refer to hypothetical forms in which amino groups are unprotonated and carboxyl groups are
undissociated. This convention is useful to avoid various nomenclatural problems but should not be taken to imply that these structures represent
an appreciable fraction of the amino-acid molecules.

Trivial name(a)          Symbol    One-letter    Systematic name(c)                             Formula
                                   symbol(b)
Alanine                  Ala       A             2-Aminopropanoic acid                          CH3-CH(NH2)-COOH
Arginine                 Arg       R             2-Amino-5-guanidinopentanoic acid              H2N-C(=NH)-NH-[CH2]3-CH(NH2)-COOH
Asparagine               Asn(d)    N(d)          2-Amino-3-carbamoylpropanoic acid              H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid            Asp(d)    D(d)          2-Aminobutanedioic acid                        HOOC-CH2-CH(NH2)-COOH
Cysteine                 Cys       C             2-Amino-3-mercaptopropanoic acid               HS-CH2-CH(NH2)-COOH
Glutamine                Gln(d)    Q(d)          2-Amino-4-carbamoylbutanoic acid               H2N-CO-[CH2]2-CH(NH2)-COOH
Glutamic acid            Glu(d)    E(d)          2-Aminopentanedioic acid                       HOOC-[CH2]2-CH(NH2)-COOH
Glycine                  Gly       G             Aminoethanoic acid                             CH2(NH2)-COOH
Histidine                His       H             2-Amino-3-(1H-imidazol-4-yl)propanoic acid     (1H-imidazol-4-yl)-CH2-CH(NH2)-COOH
Isoleucine               Ile       I             2-Amino-3-methylpentanoic acid(e)              C2H5-CH(CH3)-CH(NH2)-COOH
Leucine                  Leu       L             2-Amino-4-methylpentanoic acid                 (CH3)2CH-CH2-CH(NH2)-COOH
Lysine                   Lys       K             2,6-Diaminohexanoic acid                       H2N-[CH2]4-CH(NH2)-COOH
Methionine               Met       M             2-Amino-4-(methylthio)butanoic acid            CH3-S-[CH2]2-CH(NH2)-COOH
Phenylalanine            Phe       F             2-Amino-3-phenylpropanoic acid                 C6H5-CH2-CH(NH2)-COOH
Proline                  Pro       P             Pyrrolidine-2-carboxylic acid                  (pyrrolidin-2-yl)-COOH
Serine                   Ser       S             2-Amino-3-hydroxypropanoic acid                HO-CH2-CH(NH2)-COOH
Threonine                Thr       T             2-Amino-3-hydroxybutanoic acid(e)              CH3-CH(OH)-CH(NH2)-COOH
Tryptophan               Trp       W             2-Amino-3-(1H-indol-3-yl)propanoic acid        (1H-indol-3-yl)-CH2-CH(NH2)-COOH
Tyrosine                 Tyr       Y             2-Amino-3-(4-hydroxyphenyl)propanoic acid      4-HO-C6H4-CH2-CH(NH2)-COOH
Valine                   Val       V             2-Amino-3-methylbutanoic acid                  (CH3)2CH-CH(NH2)-COOH
Unspecified amino acid   Xaa       X

(a) The trivial name refers to the L or D or DL-amino acid; for those that are chiral only the L-amino acid is used for protein biosynthesis.
(b) Use of the one-letter symbols should be restricted to the comparison of long sequences (3AA-20).
(c) The fully systematic forms ethanoic, propanoic, butanoic and pentanoic may alternatively be called acetic, propionic, butyric and valeric,
respectively. Similarly, butanedioic = succinic, 3-carbamoylpropanoic = succinamic, pentanedioic = glutaric, and 4-carbamoylbutanoic = glutaramic.
(d) The symbol Asx denotes Asp or Asn; likewise B denotes N or D. Glx and Z likewise represent glutamic acid or glutamine or a substance,
such as 4-carboxyglutamic acid, Gla (3AA-15.2.6), or 5-oxoproline, Glp (3AA-16.5), that yields glutamic acid on acid hydrolysis of peptides.
(e) See 3AA-3 and -4 for stereochemical designation.

3AA-2. FORMATION OF SEMISYSTEMATIC NAMES FOR AMINO ACIDS AND DERIVATIVES
terms like 'α-amino acids' and 'α-carbon atom' are retained. Example:
              6   5   4   3   2       1
lysine   H3N+-CH2-CH2-CH2-CH2-CH(NH2)-COO-
              ε   δ   γ   β   α
A heteroatom has the same number as the carbon atom to which it is attached, e.g. N-2 is on C-2. When such numerals are used as
        CH3
        |
CH3-CH2-CH-CH(NH3+)-COO-
5   4   3  2        1
The word 'methyl' can be italicized for use as a locant for substitution on (or isotopic modification {Section H in [14]} of) the
methyl group of methionine, e.g. [methyl-14C]methionine. The nitrogen atoms of arginine are designated as shown for the
arginine(1+) cation:
ω  H2N                α
      \
       C-NH-[CH2]3-CH(NH3+)-COO-
      /
ω' H2N
It should be noted that the ω and ω' atoms of this cation are equivalent because of resonance. The carbon atom in the guanidino
group may be called guanidino-C (it may be needed as a locant for isotopic replacement although it cannot carry a substituent).
2.2.2. Proline
The carbon atoms in proline are numbered as in pyrrolidine, the nitrogen atom being numbered 1, and proceeding towards
the carboxyl group.
[Structural formula of proline showing the ring numbering]
2.2.3. Aromatic Rings
The carbon atoms in the aromatic rings of phenylalanine, tyrosine and tryptophan are numbered as in systematic
nomenclature, with 1 (or 3 for tryptophan) designating the carbon atom bearing the aliphatic chain. The carbon atoms of this
chain are designated α (for the carbon atom attached to the amino and carboxyl groups) and β (for the atom attached to the ring
system).
Note. This numbering should also be used for decarboxylated products (e.g. tryptamine).
[Structural formulas of phenylalanine, tyrosine and tryptophan showing the ring numbering and the α and β carbon atoms of the aliphatic chain]
2.2.4. Histidine
The nitrogen atoms of the imidazole ring of histidine are denoted by pros ('near', abbreviated π) and tele ('far', abbreviated τ)
to show their position relative to the side chain. This recommendation [6, 10] arose from the fact that two different systems of
numbering the atoms in the imidazole ring of histidine had both been used for a considerable time (biochemists generally
numbering as 1 the nitrogen atom adjacent to the side chain, and organic chemists designating it as 3). The carbon atom between
the two ring nitrogen atoms is numbered 2 (as in imidazole), and the carbon atom next to the τ nitrogen is numbered 5. The carbon
atoms of the aliphatic chain are designated α and β as in 2.2.1 and 2.2.3 above. This numbering should also be used for the
decarboxylation product histamine and for substituted histidine.
[Structural formula of histidine showing the π and τ nitrogen atoms and the α and β carbon atoms]
2.2.5.
When amino acids are combined in proteins and peptides, C-1, C-2 and N-2 of each residue (the numbering being that of
aliphatic amino acids) form the repeating unit of the main chain ('backbone') and the remainder forms a 'side chain'. Hence the
words 'side chain' refer to C-3 and higher numbered carbon atoms and their substituents.
The prefix 'nor' denotes removal of a methylene group (Sections F-4.2 and F-4.4 of [15]), but this is not the sense in which it
has been used in the names 'norvaline' and 'norleucine'. Such names, although widely used, may therefore be misinterpreted, so
[Projection formulas: three equivalent representations of the configuration at the α-carbon atom]
The relationship between serine and glyceraldehyde may therefore be represented as:
    CO2-                 CHO
    |                    |
H - C - NH3+         H - C - OH
    |                    |
    CH2OH                CH2OH
  D-serine          D-glyceraldehyde
Where confusion is possible between the use of the small capital letter prefix for the configuration of the α-carbon atom in
amino-acid nomenclature and for that of the highest numbered chiral carbon atom in carbohydrate nomenclature [17], a
subscript (lower case Roman letter) is added to the small capital letter prefix. If the prefix is used in the amino-acid sense, the
subscript is s (for serine); if the prefix is used in the carbohydrate sense, the subscript is g (for glyceraldehyde).
Examples: Ls-threonine, for which the synonym in carbohydrate nomenclature is 2-amino-2,4-dideoxy-Dg-threonic acid;
Ds-threonine, for which the synonym is 2-amino-2,4-dideoxy-Lg-threonic acid; Ls-allothreonine, for which the synonym is
2-amino-2,4-dideoxy-Lg-erythronic acid; Ds-allothreonine, for which the synonym is 2-amino-2,4-dideoxy-Dg-erythronic acid.
Note that the subscripts are essential only in discussions where both amino-acid names and those of carbohydrate derivatives
occur. Nevertheless, these subscripts are highly desirable if D or L is used in naming α-amino acids that possess more than one
centre of chirality (see 3AA-4).
3AA-3.5. The RS System
A more general system of stereochemical designation, which is especially convenient when there is no simple way of relating a
compound to a defined standard, is the RS system of Cahn, Ingold & Prelog [13, 18]. In this system the ligands of a chiral atom
are placed in an order of preference, based largely on atomic number. If the first three ligands appear clockwise in this order when
viewed from the side remote from the least-preferred (fourth) ligand, the chiral centre is R; if anticlockwise, it is S.
The L-configuration, possessed by the chiral α-amino acids found in proteins, nearly always corresponds to S in the RS system.
The most important exceptions are L-cysteine and L-cystine (see Appendix), which are R (in most amino acids the order of
preference of the groups around C-2 is NH3+, COO-, R, H, but in cysteine and cystine the group R takes precedence over
carboxylate because the atomic number of sulfur attached to C-3 is higher than that of oxygen attached to C-1).
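The cysteine exception can be checked with a short sketch. The Python below is an illustrative simplification and not a full Cahn-Ingold-Prelog implementation: it compares only the atoms directly attached to C-1 and C-3, which is sufficient for the cases discussed here. The function and variable names are hypothetical and are not part of these recommendations.

```python
# Atomic numbers of the atoms that occur in the examples below.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}

# Atoms attached to C-1 of the carboxylate, other than C-2. The doubly bonded
# oxygen is counted twice, following the usual duplicate-atom convention.
CARBOXYLATE = ("O", "O", "O")


def ranked(atoms):
    """Return the atomic numbers of a substituent set, highest first."""
    return sorted((ATOMIC_NUMBER[a] for a in atoms), reverse=True)


def descriptor_for_l_amino_acid(c3_atoms):
    """R/S descriptor at C-2 of an L-alpha-amino acid (simplified).

    c3_atoms: the three atoms attached to C-3 other than C-2.
    With the usual order NH3+ > COO- > R > H the L form is S; if the side
    chain R outranks the carboxylate, the descriptor becomes R.
    """
    side_chain_first = ranked(c3_atoms) > ranked(CARBOXYLATE)
    return "R" if side_chain_first else "S"


print(descriptor_for_l_amino_acid(("S", "H", "H")))  # L-cysteine
print(descriptor_for_l_amino_acid(("O", "H", "H")))  # L-serine
```

Sulfur on C-3 outranks oxygen on C-1 at the first point of comparison, so L-cysteine is R, whereas the hydroxyl oxygen of serine only ties with the first carboxylate oxygen and then loses to the second.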
3AA-3.6. Amino Acids Derived from Amino Sugars
Amino acids that are derived from amino sugars and contain five or more carbon atoms are named in conformity with the
system of carbohydrate nomenclature [17] or with a recommended trivial name.
Examples: (1) Dg-glucosaminic acid for 2-amino-2-deoxy-Dg-gluconic acid, the α-carbon of which has the configuration of
that in D-serine, and in which C-5, the highest numbered chiral centre, also has the D-configuration; (2) Dg-mannosaminic acid
for 2-amino-2-deoxy-Dg-mannonic acid, the α-carbon of which has the configuration of that in L-serine, but in which C-5 has the
D configuration. The subscript g may be omitted unless confusion with the amino-acid use of the designations D and L is likely.
3AA-3.7. Use of meso
The prefix meso-, in lower case italic letters, is used to denote those amino acids or derivatives that, although they contain
chiral groups, are achiral, usually because of a plane of symmetry, e.g. meso-lanthionine.
3AA-3.8. Use of DL
A mixture of equimolar amounts of D and L compounds is termed racemic and is designated by the prefix DL (no comma), e.g.
DL-leucine. It may alternatively be designated by the prefix rac- (e.g. rac-leucine) or by the prefix (±)- (see 3AA-5).
3AA-4. CONFIGURATION AT CHIRAL CENTRES OTHER THAN THE α-CARBON
The RS system (3AA-3.5) is preferred for designating configuration at centres other than α-C, e.g. (2S,3R)-threonine. To
avoid using two different systems of designation in the same name, (2S,4S)-4-hydroxyproline may be used instead of (4S)-4-hydroxy-L-proline.
[Structural formulas of cis-4-hydroxy-L-proline, trans-4-hydroxy-L-proline, cis-4-hydroxy-D-proline, trans-4-hydroxy-D-proline, cis-3-hydroxy-L-proline and trans-3-hydroxy-L-proline]
The prefixes cis and trans refer to the relative positions of the hydroxyl and carboxyl groups in each compound.
Comment. The hydroxyprolines found in collagen are trans-4-hydroxyproline (predominantly) and trans-3-hydroxyproline.
The prefixes may be omitted when no ambiguity arises (cf. 3AA-3.3).
3AA-4.4. Use of 'allo'
Amino acids with two chiral centres were named in the past by allotting a name to the first diastereoisomer to be discovered.
The second diastereoisomer, when found or synthesized, was then assigned the same name but with the prefix allo-. This method
can be used only with trivial names (see 2.1) but not with semisystematic or systematic names. It is now recommended that allo
[Structural formulas of L-isoleucine, D-isoleucine, L-alloisoleucine, D-alloisoleucine, L-threonine, D-threonine, L-allothreonine and D-allothreonine]
If the configuration is known at one centre but not at a second, the RS system is used for the known centre, with a Greek
xi (ξ),
meaning 'unknown configuration' for the other, e.g. (2S,5ξ)-2-amino-5-hydroxyhexanoic acid (a single stereoisomer). If the
configuration at two centres is unknown, the ξ may be used as in the example (2ξ,5ξ)-2-amino-5-hydroxyhexanoic acid. If a
racemate is to be designated, this is done by reference to its optical activity (3AA-5), e.g. (±)-(2ξ,5ξ)-2-amino-5-hydroxyhexanoic acid. If the relative configuration of two centres is known, but the absolute is unknown, 'R*' and 'S*' may be
If it is desired to indicate the direction of rotation of plane polarized light of specified wavelength in a specified solvent, this
can be done with a 'plus' or 'minus' sign in parentheses (E-4.4 of reference [13]), e.g. (+)-6-hydroxytryptophan. This may be
particularly useful if the configuration at C-2 is not known, but it may also be done for emphasis, with or without a
configurational symbol D or L, when this configuration is known, e.g. (+)-glutamic acid, or (+)-L-glutamic acid. A racemic
Part
(±)-leucine.
If an amino acid is substituted on a saturated carbon, it remains an amino acid. Its naming is therefore described in 3AA-2.
This section extends some of the procedures described there, and also covers modification of functional groups.
A number of special procedures are given below to allow names to be based on the trivial names of the α-amino acids, so that
they may indicate biochemical relationships. These procedures, which yield names such as N6-lysino (3AA-7), alanin-3-yl (3AA-8), leucinamide (3AA-9.2), phenylalanylchloromethane (3AA-10.2), alaninol (3AA-10.3), etc., should not be extended to other
areas.
3AA-6. IONIZATION OF FUNCTIONAL GROUPS AND NAMING OF SALTS
aminopropanoic acid
rather than as 2-ammoniopropanoate. This is particularly so for representing the isoelectric form of amino
acids that contain other ionizing groups. A solution of lysine, for example, would contain appreciable amounts of both NH
When it is desirable to mention or stress the ionic nature of an amino acid, the three kinds of ions possible for a mono-amino-mono-carboxylic compound may be indicated as follows:
H3N+-CH2-COO- (zwitterion)
H3N+-CH2-COOH (cation)
H2N-CH2-COO- (anion)
In indicating an anion the ending 'ate' replaces 'ic acid' or the final 'e' of the trivial name, or is added to the name tryptophan.
Further forms are required for amino acids that contain ionizing side chains. The singly charged anions of aspartic and
glutamic acids (strictly each has two negative and one positive charge, but this nomenclature refers to net charge) may be
distinguished from the doubly charged anions by placing the charge after the name, or by stating the number of neutralizing ions.
Thus the form of glutamate (glutamate refers to glutamic acid; glutaminate is the anion from glutamine) with a charge of minus
one, -OOC-CH2-CH2-CH(NH3+)-COO-, may be called glutamate(1-), glutamic acid monoanion, or hydrogen glutamate, and
its sodium salt may be called sodium glutamate(1-), sodium hydrogen glutamate, or monosodium glutamate. The corresponding
terms for the dianion, -OOC-CH2-CH2-CH(NH2)-COO-, include glutamate(2-), glutamic-acid dianion, and disodium
glutamate. Unqualified, the word glutamate systematically means the dianion; hence the usage 'sodium hydrogen glutamate'; in
normal use, however, it means the ion of net charge -1, since this is the form that predominates in neutral solution, and it is used in
this way in, for example, 'a glutamate-dependent reaction' and 'glutamate dehydrogenase'.
Similarly, forms such as lysinium(1+) or lysine monocation may be used for the ion of unit net charge derived from lysine. Its
salts may be indicated by adding the name of the anion to the lysinium form, e.g. lysinium(1+) chloride, or by naming it lysine
monohydrochloride. The fully protonated form is the lysine dication or lysinium(2+).
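The rules for anion names and for appending the net charge are mechanical enough to sketch in code. The Python below is only an illustration under the rules stated in this section; the function names are hypothetical, and the helper handles only the name endings mentioned here ('ic acid', a final 'e', and tryptophan).

```python
def anion_name(trivial_name):
    """Form the anion name from a trivial amino-acid name.

    'ate' replaces 'ic acid' or the final 'e'; for a name such as
    tryptophan, which has neither ending, 'ate' is simply added.
    """
    if trivial_name.endswith("ic acid"):
        return trivial_name[: -len("ic acid")] + "ate"
    if trivial_name.endswith("e"):
        return trivial_name[:-1] + "ate"
    return trivial_name + "ate"


def with_net_charge(ion_name, charge):
    """Append the net charge after the name, e.g. glutamate(1-)."""
    if charge == 0:
        raise ValueError("a net charge of zero is not written after the name")
    sign = "+" if charge > 0 else "-"
    return f"{ion_name}({abs(charge)}{sign})"


print(anion_name("glutamic acid"))                       # glutamate
print(anion_name("glutamine"))                           # glutaminate
print(with_net_charge(anion_name("glutamic acid"), -1))  # glutamate(1-)
print(with_net_charge("lysinium", 2))                    # lysinium(2+)
```

Keeping the charge as a signed integer makes the distinction between glutamate(1-) and glutamate(2-) explicit, which is the ambiguity this section is concerned with.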
3AA-7. AMINO ACIDS SUBSTITUTED ON NITROGEN
Since N-2 is the atom most easily modified in many amino acids, the locant can often be omitted without ambiguity, e.g.
acetylglycine for N-acetylglycine.
It is sometimes convenient to use the name of a group derived by loss of hydrogen from a nitrogen atom of an amino acid as a
prefix in forming another name. Such prefixes are formed by substituting 'o' for the terminal 'e' in those names that end in 'e' (by
analogy with amine → amino); e.g. alanino, valino. Tryptophan adds the 'o' directly, and the two dicarboxylic acids become
asparto and glutamo. Where there is more than one nitrogen atom in the amino acid, a locant of the form Nx must precede the
group name, e.g. N6-lysino, Nω-arginino, N5-glutamino, N'-histidino.
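As an illustration of the prefix rule above, the following Python sketch builds the N-substituent prefix from a trivial name. The function name and the optional `locant` parameter are hypothetical conveniences introduced for this example; only the endings described in this section are handled.

```python
def n_substituent_prefix(trivial_name, locant=None):
    """Prefix for a group formed by loss of hydrogen from a nitrogen atom.

    'o' replaces a terminal 'e'; tryptophan adds 'o' directly; the two
    dicarboxylic acids become asparto and glutamo. An optional locant is
    written in the form N<locant>- before the group name.
    """
    if trivial_name.endswith("ic acid"):
        stem = trivial_name[: -len("ic acid")] + "o"
    elif trivial_name.endswith("e"):
        stem = trivial_name[:-1] + "o"
    else:
        stem = trivial_name + "o"
    if locant is None:
        return stem
    return f"N{locant}-{stem}"


print(n_substituent_prefix("alanine"))         # alanino
print(n_substituent_prefix("aspartic acid"))   # asparto
print(n_substituent_prefix("lysine", 6))       # N6-lysino
```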
3AA-8. SIDE-CHAIN MODIFICATIONS (excluding modifications of carboxyl or nitrogen)
Most modified amino acids can be named according to 3AA-2, e.g. S-(carboxymethyl)-L-cysteine. Groups formed by loss of
hydrogen atoms from carbon, sulfur or oxygen atoms (excluding the carboxylic oxygen atoms, which are dealt with under 3AA-9) are named by substituting '-x-yl' for the terminal 'e' of the trivial name, where 'x' is the locant of the atom from which the
hydrogen atom has been lost, e.g. cystein-S-yl, threonin-O3-yl, alanin-3-yl, or by adding '-x-yl' to aspartic, glutamic and
tryptophan, e.g. aspartic-2-yl, tryptophan-2-yl (see 3AA-2.2.3).
Comment. tryptophan-1-yl should be named 1-tryptophano according to 3AA-7.
A common side-chain modification is the oxidation of cysteine to yield cystine (formula in Appendix). Hydrogen atoms are
removed from the SH groups of two molecules, which are joined by an SS bond. The term 'half cystine' refers to each half. It
occurs seldom in naming compounds, since half a cystine molecule is a substituted cysteine and is named as such. In stating
amount of substance, however, any specified entity may be used, so moles or numbers of residues of half cystine may usefully be
compared with these quantities of other amino acids in stating protein composition.
3AA-9. ESTERS AND AMIDES OF THE CARBOXYL GROUP
3AA-9.1. Esters
Esters are named with the anion name (3AA-6), e.g. methyl prolinate, methyl cysteinate, or from the amino acid, e.g. proline
methyl ester, cysteine methyl ester.
3AA-9.2. Amides, Anilides and Analogous Derivatives (H2NCHRCONHR')
In amides, anilides and analogous derivatives of α-amino acids the hydroxyl group of the carboxyl has been replaced by an
amino, anilino, or analogous group. They may be named by replacing the final 'e' of the trivial name of the amino acid by the
word 'amide', 'anilide', etc., e.g. glycinamide, leucinamide, argininanilide. Alternatively, these compounds may be described as
glycine amide, leucine amide, etc.
Note that the 4-amide of aspartic acid and the 5-amide of glutamic acid have specific trivial names, asparagine and glutamine.
Their 1-amides are named aspartic 1-amide and glutamic 1-amide, or isoasparagine and isoglutamine.
3AA-9.3. Acyl Groups
The acyl group of an α-amino-mono-carboxylic acid is a structure that lacks the hydroxyl group of the carboxyl (H2N-CHR-CO-).
The names of such groups are formed by replacing the ending 'ine' (or 'an' in tryptophan) by 'yl' (C-421 of reference [14]),
e.g. alanyl, arginyl, tryptophyl. 'Cysteinyl' is used instead of 'cysteyl', because of potential confusion with the group from cysteic
acid. 'Cystyl' is the diacyl group of cystine, and 'half-cystyl' is the acyl group of cysteine lacking also the H of its SH group.
The monoacyl groups derived from aspartic acid, HOOC-CH2-CH(NH2)-CO- and -CO-CH2-CH(NH2)-COOH, are
designated α-aspartyl (or aspart-1-yl) and β-aspartyl (or aspart-4-yl) respectively; the corresponding groups derived from
glutamic acid are α-glutamyl (or glutam-1-yl) and γ-glutamyl (or glutam-5-yl) (C-421.3 of reference [14]). The diacyl groups
formed from the dicarboxylic amino acids are aspartoyl and glutamoyl. The acyl groups derived from asparagine and glutamine
are termed asparaginyl and glutaminyl respectively.
3AA-10. CARBOXYL GROUP MODIFICATIONS OTHER THAN ESTER AND AMIDE FORMATION
Several decarboxylated amino acids have trivial names terminated with 'amine' : tyramine, histamine, cysteamine,
tryptamine, methioninamine. Similarly cystine (see Appendix) forms cystamine.
3AA-10.2. Ketones
If the hydroxyl group of the 1-carboxyl is replaced by an alkyl group, the name of the ketone formed can use the name of the
amino acid by naming the compound as a substituted hydrocarbon, e.g. phenylalanylchloromethane for C6H5-CH2-CH(NH2)-CO-CH2Cl, 3-amino-1-chloro-4-phenylbutan-2-one (see also 3AA-18.2). This type of name is based on the trivial names of
amino acids (or peptides), so does not place the substituents of methane in alphabetical order (as systematic nomenclature does),
but places 'chloromethane' at the end because this indicates C-terminal modification (see 3AA-13.1). The practice of using names
such as 'phenylalanine chloromethyl ketone' is discouraged, because they erroneously specify the carbonyl group twice.
3AA-10.3. Aldehydes and Alcohols
Aldehydes and alcohols obtained by successive stages of reduction of the carboxyl group of α-amino acids are named by
replacing the final 'e' of a trivial name ending in 'ine' (or the 'ic acid' of aspartic and glutamic acids) with the endings 'al' and 'ol'
respectively.
Examples. RCH(NH2)CHO : alaninal, leucinal, lysinal, serinal, aspart-1-al, glutaminal. RCH(NH2)CH2OH: alaninol,
leucinol, lysinol, serinol, aspart-1-ol, glutaminol. The aldehyde and alcohol derivatives of tryptophan take the names
tryptophanal and tryptophanol. The name glycinol is little used, because the systematic name 2-aminoethanol is short, and this
already has the trivial name ethanolamine [19].
Note. The derivative of lysine in which the CH2NH2 group is replaced by CHO has the trivial name allysine (see
Appendix).
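The naming rule of 3AA-10.3 can be sketched as a small function. This Python example is illustrative only: the function name is hypothetical, and the default locant 1 for the dicarboxylic acids is an assumption drawn from the examples aspart-1-al and aspart-1-ol given above.

```python
def reduced_name(trivial_name, ending, locant=1):
    """Name the aldehyde ('al') or alcohol ('ol') derived from an amino acid.

    The final 'e' of a name ending in 'ine' is replaced by the ending; the
    'ic acid' of aspartic and glutamic acids is replaced by '-<locant>-'
    plus the ending; tryptophan takes the ending directly.
    """
    if ending not in ("al", "ol"):
        raise ValueError("ending must be 'al' or 'ol'")
    if trivial_name.endswith("ic acid"):
        return f"{trivial_name[: -len('ic acid')]}-{locant}-{ending}"
    if trivial_name.endswith("ine"):
        return trivial_name[:-1] + ending
    return trivial_name + ending


for name in ("alanine", "glutamine", "aspartic acid", "tryptophan"):
    print(reduced_name(name, "al"), reduced_name(name, "ol"))
```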
A peptide is any compound produced by amide formation between a carboxyl group of one amino acid and an amino group
of another. The amide bonds in peptides may be called peptide bonds. The word peptide usually applies to compounds whose
amide bonds are formed between C-1 of one amino acid and N-2 of another (sometimes called eupeptide bonds), but it includes
compounds with residues linked by other amide bonds (sometimes called isopeptide bonds). Peptides with fewer than about
10-20 residues may also be called oligopeptides; those with more, polypeptides. Polypeptides of specific sequence of more than about
50 residues are usually known as proteins, but authors differ greatly on where they start using this term.
3AA-12. AMINO-ACID RESIDUES
When two or more amino acids combine to form a peptide, the elements of water are removed, and what remains of
each amino acid is called an amino-acid residue. α-Amino-acid residues are therefore structures that lack a hydrogen atom of
the amino group (-NH-CHR-COOH), or the hydroxyl moiety of the carboxyl group (NH2-CHR-CO-), or both (-NH-CHR-CO-);
all units of a peptide chain are therefore amino-acid residues. (Residues of amino acids that contain two amino groups or
two carboxyl groups may be joined by isopeptide bonds, and so may not have the formulas shown.)
The residue in a peptide that has an amino group that is free, or at least not acylated by another amino-acid residue (it may, for
example, be acetylated or formylated), is called N-terminal; it is at the N-terminus. The residue that has a free carboxyl group, or
at least does not acylate another amino-acid residue (it may, for example, acylate ammonia to give -NH-CHR-CO-NH2), is
called C-terminal. If the amino group of the N-terminal residue is free, the residue may be named as an acyl group under 3AA-9.3;
When it is desirable to mention or emphasize the particular ionic form of a residue, this may be done as follows:

Name of residue      Protonated form             Deprotonated form
arginine residue     argininium residue          arginine (base) residue
histidine residue    histidinium residue         histidine (base) residue
lysine residue       lysinium residue            lysine (base) residue
aspartic residue     aspartic (acid) residue     aspartate residue
cysteine residue     cysteine (acid) residue     cysteinate residue
glutamic residue     glutamic (acid) residue     glutamate residue
tyrosine residue     tyrosine (acid) residue     tyrosinate residue
To name peptides, the names of acyl groups ending in 'yl' (3AA-9.3) are used. Thus if the amino acids glycine, H3N+-CH2-COO-, and alanine, H3N+-CH(CH3)-COO-, condense so that glycine acylates alanine, the dipeptide formed,
H3N+-CH2-CO-NH-CH(CH3)-COO-, is named glycylalanine. If they condense in the reverse order, the product,
H3N+-CH(CH3)-CO-NH-CH2-COO-, is named alanylglycine. Higher peptides are named similarly, e.g. alanylleucyltryptophan. Thus the name of the
peptide begins with the name of the acyl group representing the N-terminal residue, and this is followed in order by the names of
the acyl groups representing the internal residues. Only the C-terminal residue is represented by the name of the amino acid, and
this ends the name of the peptide. Formulas should normally be written in the same order, with the N-terminal residue on the left,
A multiplicative affix (p. 5 of reference [14]) placed before 'peptide' gives the total number of residues in the peptide, e.g.
hexapeptide. Since the higher affixes are not well known, they may be replaced by numerals, e.g. a 22-peptide.
Higher oligopeptides and polypeptides of biological origin often have trivial names ; their sequences are usually described
more conveniently by symbols (3AA-14 to 3AA-19 below) than by constructing long names.
3AA-13.2. Use of Prefixes in Peptide Names
Configurational prefixes (3AA-3) are placed immediately before the trivial names of the residues they refer to. The prefixes
are set off from the names before and after them with hyphens. Examples: L-alanyl-L-leucine; L-alanyl-D-leucine; glycyl-L-alanine; L-alanylglycine; L-leucyl-L-phenylalanyl-L-leucylglycine; L-alanylglycyl-L-leucine.
The mixture of diastereoisomers formed by condensations between DL-amino acids will contain unspecified proportions of
each pair of enantiomers. Names such as DL-alanyl-DL-leucine have been used in the past, but they are misleading because they
contradict the accepted meaning of the prefix DL as signifying a racemate; here the racemate of L-alanyl-L-leucine and D-alanyl-D-leucine (which may be designated as rac-L-alanyl-L-leucine) is mixed in unspecified proportions with the racemate of L-alanyl-D-leucine and D-alanyl-L-leucine (which may similarly be designated as rac-L-alanyl-D-leucine). This is better indicated by the
name ambo-alanyl-ambo-leucine (item 12c of reference [20]).
A mixture of L-alanyl-L-alanyl-L-alanine and L-alanyl-D-alanyl-L-alanine may likewise be called L-alanyl-ambo-alanyl-L-alanine.
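The way peptide names are assembled from acyl-group names and configurational prefixes can be sketched as follows. This Python example is an illustration only: the function names and the exception table are hypothetical constructs for this sketch, and the dicarboxylic acids are assumed to be linked through C-1 (the α-aspartyl and α-glutamyl forms of 3AA-9.3).

```python
# Acyl names that do not follow the plain 'ine' -> 'yl' replacement (3AA-9.3).
ACYL_EXCEPTIONS = {
    "tryptophan": "tryptophyl",
    "cysteine": "cysteinyl",
    "asparagine": "asparaginyl",
    "glutamine": "glutaminyl",
    "aspartic acid": "aspartyl",  # assumed alpha-linked
    "glutamic acid": "glutamyl",  # assumed alpha-linked
}


def acyl_name(trivial_name):
    """Acyl-group name: the ending 'ine' is replaced by 'yl'."""
    if trivial_name in ACYL_EXCEPTIONS:
        return ACYL_EXCEPTIONS[trivial_name]
    if trivial_name.endswith("ine"):
        return trivial_name[:-3] + "yl"
    raise ValueError(f"no acyl rule known for {trivial_name!r}")


def peptide_name(residues):
    """Build a peptide name from (prefix, trivial_name) pairs, N-terminus first.

    prefix is a configurational prefix such as 'L', 'D' or 'ambo', or None.
    Only the C-terminal residue keeps its amino-acid name; prefixes are set
    off by hyphens from the names before and after them.
    """
    parts = []
    for index, (prefix, name) in enumerate(residues):
        is_last = index == len(residues) - 1
        word = name if is_last else acyl_name(name)
        if prefix:
            word = f"{prefix}-{word}"
            if parts:
                word = "-" + word
        parts.append(word)
    return "".join(parts)


print(peptide_name([(None, "glycine"), (None, "alanine")]))
print(peptide_name([("L", "alanine"), (None, "glycine"), ("L", "leucine")]))
```

The two calls reproduce glycylalanine and L-alanylglycyl-L-leucine from the examples in 3AA-13.1 and 3AA-13.2.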
3AA-13.3. Name of Simple Polymers of α-Amino Acids
Simple polymers of amino acids may, if preferred, be named with prefixes to indicate the number of amino-acid residues
present, e.g. tetraglycine. Mixtures of polymers with varying numbers of residues may be given names like oligoglycine,
polyglycine, poly(L-lysine), etc. [21].
The atoms of a peptide may need to be numbered as locants for substitution or isotopic replacement. Often no more
numbering is required than that of atoms within a residue (see 3AA-2.2), e.g. alanyl-3-chloroalanylalanine. It may sometimes be
convenient to indicate substitution of the peptide as a whole. This may be done by adding the residue number, obtained by
numbering residues from the N-terminus, after the atom number, and separated from it by a point. The above compound may
therefore be called 3.2-chloro(alanylalanylalanine). Thus the atom C-3.2 is C-3 of the second residue of the peptide. Example:
Alanylthreonylglycylaspartylglycine 4.4-3.2-lactone for the compound that can be represented (3AA-16, -17 and -19 below) as
Ala-Thr-Gly-Asp-Gly, with a bond drawn between the side chains of the Thr and Asp residues.
Such numbering is especially useful for peptides with trivial names (see 3AA-22.5), e.g. N5.4-methyloxytocin would indicate a
methyl substituent on N-5 of the glutamine residue at position 4 of oxytocin. If the peptide name that follows a substituent
indicated in this way is constructed residue by residue, it must be placed in parentheses to show that the numbering applies to the
Part 2. Symbolism
Part 2, Section A: THE THREE-LETTER SYSTEM (a revision and updating of [10])
3AA-14. GENERAL CONSIDERATIONS ON THREE-LETTER SYMBOLS
14.1. The symbol chosen for an amino acid (Table 1) is derived from its trivial name, and is usually the first three letters of this
name. It is written as one capital letter followed by two lower-case letters, e.g. Gln (not GLN or gln), regardless of its position in a
sentence or structure. If any other convention is used in representing residues, e.g. to emphasize homology, this should be stated
clearly whenever it is used. When the symbol is used for a purpose other than representing an amino-acid residue, e.g. to designate
a genetic factor, three lower-case italic letters may be used, e.g. gln.
14.2. The main use of the symbols is in representing amino-acid sequences. Inasmuch as the symbols by themselves represent
the unsubstituted amino acids, they are modified (3AA-1 6) by hyphens to represent residues. We do not recommend use of the
symbols to represent free amino acids in textual material, but such use may be desirable in tables, diagrams or figures. It may also
be convenient to use them for indicating residue numbers, e.g. Tyr-110 for tyrosine residue 110. For substituents, supplementary
symbols are used (3AA-17 and -18).
14.3. A symbol may represent either the name or the formula of a compound.
14.4. Heteroatoms of amino-acid residues (e.g. O-3 of serine, N-6 of lysine) do not explicitly appear in the symbol, as it
represents the whole molecule including them (but see 3AA-17.4).
14.5. Amino-acid symbols denote the L configuration of chiral amino acids unless otherwise indicated by the presence of D or
DL before the symbol and separated from it with a hyphen (see also 3AA-19.2). L may similarly be inserted for emphasis.
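The conventions of 14.1 and 14.5 can be illustrated with a minimal sketch. The function name parse_symbol is hypothetical and not part of these recommendations; the sketch assumes plain three-letter symbols only (it does not handle symbols such as aIle, and it does not check whether the amino acid is chiral).

```python
import re

# Hypothetical helper: one capital followed by two lower-case letters (14.1),
# optionally preceded by D, L or DL and a hyphen (14.5).
_SYMBOL = re.compile(r"^(?:(DL|D|L)-)?([A-Z][a-z]{2})$")


def parse_symbol(text):
    """Return (configuration, symbol); configuration defaults to 'L' (14.5)."""
    match = _SYMBOL.match(text)
    if match is None:
        raise ValueError(f"not a valid three-letter symbol: {text!r}")
    configuration, symbol = match.groups()
    return (configuration or "L", symbol)
```

Under these assumptions, parse_symbol("D-Ala") gives ("D", "Ala"), whereas "GLN" and "gln" are rejected.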
14.6. Structural formulas may be used together with symbols to make complicated features or reactions clear (for examples
see 3AA-17.4).
3AA-15. SYMBOLS FOR AMINO ACIDS
The symbols for the amino acids that are coded for by mRNA are listed in column 2 of Table 1.
3AA-15.2. Symbols for Less Common Peptide Constituents
Symbols for less common amino acids should be defined in each publication in which they appear. The following principles
and notations are recommended.
41
or Pro
15.2.2.
Alloisoleucine and allothreonine (3AA-4.4) may be symbolized by aIle and aThr respectively.
trivial name of a branched-chain compound to designate a straight-chain compound, its use for amino acids should be
progressively abandoned (3AA-2.4), along with the earlier symbols Nva and Nle. Appropriate symbols for these compounds, 2-aminovaleric and 2-aminohexanoic acids, based on symbols proposed for the unsubstituted acids [19], are Avl and Ahx (see also
3AA-15.2.5).
Hse
Homocysteine
Hcy
Examples
Symbol
βAla
Aad
Note
Abu
Ape
Ahx
εAhx
βAad
Apm
A2pr or Dpr
A2bu or Dab
Orn
A2pm or Dpm
ii, iii, iv
ii
ii, iii
Notes
i) This symbol is recommended in place of the previous εAcp, in which 'cp' stood for caproic, which may be confused with capric and
caprylic.
ii) The previous edition of these recommendations [10] discouraged abbreviations starting 'D' for 'di' or 'T' for 'tri' or 'tetra', because these
letters were overused. We concur in preferring subscripts when these can be applied to well-known symbols, so that Me2SO is preferable to
DMSO, Me3Si- to TMS-, and H4folate to THF. Nevertheless we are not convinced that 'A2' easily suggests 'diamino', so alternative symbols are
presented.
iii) 'Dap' should not be used as a symbol, since it could be construed to mean either diaminopropanoic acid or diaminopimelic acid.
iv) 2,3-Diaminopropanoic acid can be regarded as 3-aminoalanine, and so may be symbolized by 'side-chain substitution' (3AA-17.3 below)
as Ala(NH2) or Ala, but users should beware of the possibility that the former may be confused with Ala-NH2 (3AA-17.1), the symbol for
alaninamide.
I
NH2
15.2.6.
Symbols are recommended for two amino acids that have an additional acidic group and may occur in polypeptide sequences.
They are:
4-Carboxyglutamic acid Gla
Cysteic acid
Cya
15.2.7.
Symbols for sugar residues (e.g. Glc, Gal) have been proposed [22], as have ones for nucleoside residues (e.g. Ado, Cyd) [23],
and these may be combined with amino-acid symbols to represent glycopeptides, etc. These symbols include [22] Neu for
neuraminic acid, Neu5Ac for N-acetyl neuraminic acid, and Mur for muramic acid. Depsipeptides (3AA-19.6) contain hydroxyacid residues; when symbols are used for these they should be defined.
3AA-16. SYMBOLISM OF AMINO-ACID RESIDUES
(normally as NHCH2-CO-)
(normally as -NH-CH2-COO)
Thus the hyphen, which represents the peptide bond, removes OH from the 1-carboxyl group of the amino acid (written in the
conventional un-ionized form) when it is placed on the right of the symbol (i), and removes H from the 2-amino group of the
amino acid when it is placed on the left of the symbol (ii); both modifications can apply to one symbol (iii).
Thus the peptide Gly-Glu (without hyphens at its ends) is distinguished from the sequence -Gly-Glu- (with hyphens at its
ends).
3AA-16.2. Lack of Hydrogen on the 2-Amino Group
A hyphen on the left of the symbol signifies removal of a hydrogen atom from the 2-amino group, as well as representing the
bond formed by the group thus produced. If it should prove necessary to draw a bond to N-2 on the right of the symbol (e.g. in a
cyclic peptide, 3AA-19.4 below), then the hyphen must be replaced by an arrow, which points from CO to NH within the peptide
bond.
If both atoms on N-2 are replaced, two lines can be drawn on the left of the symbol, e.g.
then the hyphen must be replaced by an arrow, which has the same effect.
3AA-16.4. Removal of Groups from Side Chains
0
CH2
Ser
means NH-CH-COO
NH
[CH2]4
Lys
means NH-CH-COO
CH-CH2-NH
[CH2]2
J5
Lys
means NH-CH-COO
Notes. (a) H is removed from N-ω rather than N-5 of arginine unless otherwise indicated; (b) a locant, π or τ (3AA-2.2.4), is
always required for histidine.
A vertical line drawn above or below either of the symbols Asp and Glu represents removal of OH from the side-chain
carboxyl group, as well as representing a bond to a substituent. If a hydrogen has to be removed from a saturated carbon of the
side chain, then a vertical line may be used, but it must be accompanied by a locant. Examples:
CO
CH2
Asp
means NHCH-COO
CHC00
13
Asp
means NH-CH-COO
Combination of horizontal lines, indicating removal of H from N-2 (3AA-16.1, 3AA-16.2) or OH from C-1 (3AA-16.1, 3AA-16.3), with the vertical lines that indicate removal of side-chain atoms (3AA-16.4) allows formation of symbols for 5-oxoproline
(systematically 5-oxopyrrolidine-2-carboxylic acid, also known as pyroglutamic acid or pyrrolidonecarboxylic acid) and for
HNCH-CO[CH2]2-O
-NH-CH---CO
-Hse-1 or -Hse>
or Hsl
This follows logically from 3AA-16.1, 3AA-16.2 and 3AA-16.3 by using symbols for atoms or groups to represent the
substituents. Examples (see also 3AA-18.2):
Ac-Gly          N-Acetylglycine
Gly-OEt         Glycine ethyl ester
Ac-Lys          N2-Acetyllysine
Ser-OMe         Serine methyl ester
Ac-Glu-OEt      O1-Ethyl N-acetylglutamate
Glu-NH2         Isoglutamine
Asp-OMe
A second substituent on N-2, when the first is shown with a line as above, may be represented with a second line to the left of the
symbol for the substituted residue: Xaa. It may be convenient to print this as a vertical line joining the first: 'Xaa.
Example: alanyl-N-methylvaline may be represented
Me
Me
Me
I
/
Ala
Val
or AlaVal or Ala1Val
A substituent in parentheses immediately after the amino-acid symbol. When the symbol for a substituent, such as an
oligosaccharide, has a hyphen on its right-hand side to indicate the bond to the amino acid [22], then this symbol should be placed
in parentheses before the amino-acid symbol rather than after it, e.g. -(Galβ1-4Xylβ1-)Ser- [22].
Symbols within parentheses written on one line should normally be used only in textual material and when the symbol for the
substituent is short; otherwise the two-line symbols containing a vertical line will be clearer. Note that the substituents
represented replace hydrogen except when the amino acid is aspartic acid or glutamic acid, when they replace the OH of the
carboxyl group unless otherwise specified (3AA-16.4.2).
If a locant is required, it is placed beside the vertical line that represents side-chain substitution, or is joined to the substituent
symbol within the parentheses by a hyphen.
Notes
Examples
Symbols
OMe
OEt
O5-Ethyl hydrogen N-acetylglutamate
AcGlu or AcGlu(OEt)
Ac
N6-Acetyllysine
Lys or Lys(Ac)
Ac
O3-Acetylserine
Ser or Ser(Ac)
Tyr or Tyr(SO3H)
Et
S-Ethylcysteine
Cys or Cys(Et)
OH
SO3H
3-Sulfenoalanine
ii
Cys or Cys(OH)
SO3H
ii, iii
Cys or Cys(SO3H)
CN
S-Cyanocysteine
Cys or Cys(CN)
II
Cys
I
Cystine
D-Cystine
D-Cys D-Cys
meso-Cystine
Met or MetO
02
Met or MetO2
I
ii, iv
ii, iv
P
Phosphoserine (O3-phosphonoserine)
Ser or Ser(P)
Me
His or His(τ-Me)
Nτ-Methylhistidine (telemethylhistidine, see 3AA-2.2.4)
Notes
i) Asp-OMe represents the O1-methyl ester of aspartic acid (C-1 modification by 3AA-16.3), whereas Asp(OMe) represents the O4-methyl
ester (side-chain modification by 3AA-16.4.2).
ii) Names based on cysteine and symbols based on Cys already indicate sulfur in the molecule, and similarly with methionine and Met.
Indication of modification of this sulfur should not suggest the addition of further sulfur. Hence calling 3-sulfenoalanine by the name
cysteinesulfenic acid, 3-sulfinoalanine by the name cysteinesulfinic acid, methionine S-oxide by the name methionine sulfoxide, and methionine
S,S-dioxide by the name methionine sulfone may be confusing and is not recommended.
iii) Care should be taken with this symbol because readers who fail to realize that the symbol Cys contains the sulfur may confuse it with
cysteic acid, now symbolized Cya (3AA-15.2.6). The earlier [10] symbol Cys for cysteic acid has the disadvantage that the vertical line in it does
03H
iv) The vertical lines or parentheses previously [10] in the symbols for MetO and MetO2 are now omitted, because they wrongly implied
removal of hydrogen.
v) P represents P03H2 [24].
3AA-17.3.
This may use the same convention as 3AA-17.2, with the addition of locant numerals where necessary (see 3AA-16.4), e.g.
COOH
4j
4-Carboxyglutamic acid
emphasize carboxylation
NH2
Ala or Ala(NH2)
2,3-Diaminopropanoic acid
(3-aminoalanine, see 3AA-15.2.5)
3,5-Diiodotyrosine
3-Nitrotyrosine
Tyr
The symbols are designed primarily to indicate sequence, and care must be taken to avoid confusion when they are adapted to
other uses.
Although the conversion of a cysteine residue in a protein into an S-carboxymethylcysteine can be adequately represented as
-Cys- → -Cys(CH2-COOH)-
writers may wish to show the sulfur atom in order to indicate the chemistry of the reaction. Although it would be perfectly
legitimate to write
-Ala(SH)- → -Ala(S-CH2-COOH)-
this would be confusing since the residue is thought of as cysteine rather than as modified alanine.
We therefore recommend putting the residue symbol into quotation marks if one of its groups is to be depicted separately (or,
alternatively, using the symbol Raa). Hence the thiol group of a cysteine residue may be shown as:
SH
SH
'Cys' or
Terminal amino and carboxyl groups can be shown similarly, e.g. H2N-'Ala'- to show explicitly the amino group present in Ala- (in contrast with -Ala-NH2, which shows the amide of C-terminal alanine). This convention allows mechanisms to be drawn
out, e.g.
cO2-
CO2-
CH- p
'Cys'
with the
SCH2
'Cys'
quotation marks to alert readers to the fact that the symbol here does not include the sulfur atom. Sequences may also be
R-CO-X
GlyAsp--'Ser'Gly
R-CO + HX
GlyAsp'Ser'GIy
If an unusual residue is to be symbolized within a particular context, it may be helpful to modify (e.g. with an asterisk) the
symbol for the ribosomally incorporated residue, e.g. Ser* for 2-aminopropenoic acid (formed within a peptide chain by
dehydration of a serine residue). Such an asterisk may be placed above the residue rather than after it to allow alignment with
other 3-letter symbols. Symbols modified in this way should be defined when used.
3AA-17.6. Lack of Substitution
If it is desired to emphasize lack of substitution, H or OH may be added to the hyphen or vertical line that represents removal
of one of these groups. Thus H-Ala may be contrasted with Ac-Ala, and Ala-OH with Ala-OMe.
Groups substituted for hydrogen or hydroxyl may be indicated by their formulas or by symbols or by combination of
both, e.g.
Benzoylglycine (hippuric acid)
PhCO-Gly or C6H5CO-Gly
Note: the symbol Bz is often used for benzoyl in organic chemistry, and Bzl for benzyl, but because these symbols are so
similar, the alternative PhCO and PhCH2 are preferable.
Glycine methyl ester
Gly-OCH3 or Gly-OMe
Trifluoroacetylglycine
CF3CO-Gly (Table 3, Note ii)
Suggestions for symbols to designate substituent (or protecting) groups common in peptide and protein chemistry are given in
Tables 2, 3 & 4.
Z- or CbzBpoc-
Fmoc
Z(OMe)
Z(NO2)
Pz
Acm
Acetyl
Ac-
Benzoyl
Benzyl
(C6H5CO)
(C6H5CH2)
Carbamoyl
(3-Carboxy-4-nitrophenyl)thio
3-Carboxypropanoyl (HOOCCH2CH2CO)
Dansyl, 5-(dimethylamino)naphth-1-ylsulfonyl
2,4-Dinitrophenyl
Formyl
4-Iodophenylsulfonyl (pipsyl)
Dns
or Mal-z(C--404.l of [14])
Mal--
2-Nitrophenylthio
Phenyl(thiocarbamoyl)
Phthaloyl
Phthalyl (o-carboxybenzoyl)
Succinyl (OCCH2CH2CO)
Tosyl
Trifluoroacetyl
Trityl (triphenylmethyl)
CF3CO-
Mal
Pht-
Tos-
(see
Note i)
Ph3C or Trt
Notes
i) In organic nomenclature (C-404.1 of [14]), 'succinyl' signifies the bivalent group formed from succinic acid by removal of both hydroxyl
groups, but in biochemical usage it usually signifies the 3-carboxypropanoyl group, e.g. succinyl-CoA.
ii) The use of D for 'di' and T for 'tri' and 'tetra' is discouraged if these apply to atoms or groups for which simple symbols exist, e.g. in
CF3CO, Me3Si and H4folate. We feel less strongly when their avoidance involves giving unusual meanings to symbols, e.g. N for nitro, so Dnp
and N2ph are offered as alternative symbols for dinitrophenyl. See also Note ii of 3AA-15.2.5.
iii) The symbol HCO is preferred to CHO for the formyl group, because CHO has sometimes been used to indicate the attachment of
carbohydrate.
Symbol
OBt
1-(Glycyloxy)benzotriazole
Glycine benzyl ester
OCH2Ph
tert-Butoxy
Diphenylmethoxy
Ethoxy
Methoxy
4-Nitrobenzyloxy
4-Nitrophenoxy
4-Nitrophenylthio
Pentachlorophenoxy
Phenylthio
Quinolin-8-yloxy
Succinimido-oxy
2,4,5-Trichlorophenyloxy
OEt
OMe
ONb
ONp
SNp
OPcp
SPh
OQu
ONSu or OSu
OTcp
Note. Carboxyl substituents will not normally appear as prefixes in the names of derivatives of amino acids or peptides, so the name of the
group, its prefix name, given in column 1, is little used in naming compounds. Column 3 is therefore given to show how derivatives containing the
group are named (by one of the alternative methods of 3AA-9.1).
indiscriminate use of such abbreviations is discouraged, especially when the accepted trivial name of the reagent is short, e.g. tosyl
- Dnp-NH-R+H +F
For this reason DnsCl is usually preferred to DNS for dansyl chloride (although the full name is short enough for most textual
use), and DnpF to the original FDNB for 1-fluoro-2,4-dinitrobenzene, and similarly Nbs2 in place of DTNB for 3,3'-dithiobis(6-nitrobenzoic acid) (Ellman's reagent) and (PrO)2POF or DipF for diisopropyl fluorophosphate.
Symbols constructed from known elements are more readily understood than arbitrary abbreviations, e.g. Tos-Arg-OMe
rather than TAME for tosylarginine methyl ester, and Tos-Phe-CH2Cl rather than TPCK for 'tosylphenylalanine chloromethyl
ketone', a name incorrectly used for tosylphenylalanylchloromethane (3AA-10.2), but misleading because it erroneously
specifies the carbonyl group twice.
Gly-Gly
Glu-Gly
N-γ-Glutamylglycine
Glu
Thyroliberin
Angiotensin II
Glp-His-Pro-NH2
Asp-Arg-Val-Tyr-Ile-His-Pro-Phe
Glutathione
Glu
-Gly
or Giu
LGly
or G u Cys-Gly or Glu(-Cys-Gly)
LcysGly
Note. Glu
would represent the corresponding thiol ester with a bond between the γ-carboxyl of glutamic acid and the
Cys-Gly
thiol group of cysteine.
N2-α-Glutamyllysine
Glu-Lys
N6-α-Glutamyllysine
Lys
N2-γ-Glutamyllysine
or Glu Lys
Glu
LLys
N6-γ-Glutamyllysine
II
for modified residues or names of compounds may be used in such formulas. Thus a peptide with a C-terminal
aldehyde may be shown using either a name or a symbol constructed according to 3AA-16.3. Example:
Symbols
Ac-Leu-Leu-argininal or Ac-Leu-Leu-Arg-H
(If the second method is used, the symbol should be explained to avoid confusion.)
If part of a sequence is unknown, but its composition can be specified, this may be indicated by parentheses, with commas
between the residues listed as present, e.g. Ala-Lys-(Ala,Gly3,Val2)-Glu-Val.
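The parenthesized composition notation can be read mechanically, as the following minimal sketch suggests. The function names expand_composition and residue_count are hypothetical; the sketch assumes plain three-letter symbols, with an optional trailing count (printed as a subscript in the original) inside the parentheses.

```python
import re


def expand_composition(group):
    """Return {symbol: count} for a group such as '(Ala,Gly3,Val2)'."""
    counts = {}
    for item in group.strip("()").split(","):
        match = re.fullmatch(r"([A-Z][a-z]{2})(\d*)", item.strip())
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        symbol, number = match.groups()
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def residue_count(sequence):
    """Count residues in a sequence that may contain composition groups."""
    total = 0
    # A parenthesized group is matched as a whole; anything else symbol by symbol.
    for part in re.findall(r"\([^)]*\)|[A-Z][a-z]{2}", sequence):
        if part.startswith("("):
            total += sum(expand_composition(part).values())
        else:
            total += 1
    return total
```

For the example above, the group contributes six residues of unknown order, so the whole peptide has ten residues.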
If a peptide must be written on more than one line, we advise placing a hyphen at the end of each line to be continued (where it
has its usual meaning of a continuation symbol), and also at the start of the next line (where it represents the peptide bond), e.g.
Ala-Ser-Tyr-Phe-Ser-Gly-Pro-Gly-Trp-Arg
Ala-Ser-Tyr-Phe-LGlyproGlyTrpArg'
but such a break may also be needed in textual material where this is not possible.
(Greek
Gly-Glu-Ala-Lys-Cys-Val
in the one-line system. No lines or parentheses are used, since they would imply removal of H. In earlier [10] recommendations 'H' was added
with a vertical line or parentheses, but again (cf. i) the line represented no single bond.
iv) Deprotonation of Side-Chain Acidic Groups. The symbols Asp and Glu may have O− placed at the end of a vertical line above or below
them, or in parentheses after them (cf. ii), since O− replaces the OH removed. Other acidic residues, e.g. Cys, have the charge alone at the end of
the vertical line or in parentheses, since the group removed here is H.
Hence the two ionic forms shown above for a peptide could be drawn as
H
+ HG1y-Glu-Ala-Lys-Cys-Val-0
An isoelectric form of Gly-Lys-Gly could be drawn as
I
and Gly-Glu-Ala-Lys-Cys-Val-0
I
H+
Gly-Lys-Gly-0
HGly-Lys-Gly-OH 2 Cl
NO
NO
or Gly__LGly
Me
Glycylnitrosoglycine
Gly Gly
Glycyl-N-acetylglycine
(dy Gly
or Gly
Me
ly
Gly\
Gly-1
Gly--Gly or Gly1Gly
N,N-diglycylglycine
ii) The sequence is again written in one line, but the residues at each end of the line are joined by a lengthened bond, e.g.
LValOrnLeuDpheproyalOrnLeuDpheprol
or (3AA-19.2, sentence 2)
r0rneuProa0rneu0Ph0i
iii) The residues are written on two lines, so that the sequence is reversed on one of them. Hence the CO to NH direction
within the peptide bond must be indicated by arrows (3AA-16.2 and -16.3). Hence gramicidin S may be written (using the option of
Heterodetic cyclic peptides are peptides consisting only of amino-acid residues, but the linkages forming the ring are not
solely eupeptide bonds; one or more is an isopeptide, disulfide, ester, or other bond.
Their symbolic representation follows logically from that of substituted amino acids (3AA-16.4). Examples:
Oxytocin
Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly-NH2
3AA-19.6. Depsipeptides
Depsipeptides are oligomers formed from amino acids and other bifunctional acids, usually hydroxy acids. They are often
cyclic. In symbolic representation, any special symbols used for the hydroxy acids should be defined.
3AA-19.7. Peptide Analogues
Analogues of peptides in which the CO-NH group that joins residues is replaced by another grouping may be indicated [25]
by placing a Greek psi, followed by the replacing group in parentheses, between the residue symbols where the change occurs.
Examples:
Ala-ψ(NH-CO)-Ala for NH-CHMe-NH-CO-CHMe-COO
Ala-ψ(CH=CH, trans)-Ala for NH-CHMe-CH=CH-CHMe-COO
Although hyphens between residues are important in representing peptide sequences (3AA-16), they may be omitted (I) if it is
necessary to align sequences with those of nucleic acids; this is an alternative to separating triplets (II):
(I)
   MetSerIleGlnHis
AGTATGAGTATTCAACAT
TCATACTCATAAGTTGTA
(II)
Met-Ser-Ile-Gln-His
There are difficulties in using the three-letter system (3AA-14 to 3AA-19) in presenting long protein sequences. A one-letter
code is much more concise, and is helpful in summarizing large amounts of data, in aligning and comparing homologous
sequences, and in computer techniques for these processes. It may also be used to label residues in three-dimensional pictures of
protein molecules.
The possibility of using one-letter symbols was mentioned by Gamow & Yčas [26] in 1958. Šorm et al. [27] systematized the
idea in 1961 (see, for example, [28]), and Dayhoff and Eck used one-letter symbols derived partly from the code of Šorm et al. in
their compilations of protein sequences ([29], latest edition [30]). IUB-IUPAC recommendations [11] were approved in 1968 on
the basis of proposals of a subcommittee of W. E. Cohn, M. O. Dayhoff, R. V. Eck, and B. Keil, and these recommendations are
given here with no substantial change.
3AA-20.2. Limits of Application of the One-Letter System
The one-letter system is less easily understood than the three-letter system by those not familiar with it, so it should not be used
in simple text or in reporting experimental details of sequence determination. It is therefore recommended for comparisons of
long sequences in tables and lists, and in other special uses where brevity is important. If both it and the single-letter system for
nucleotide sequences [23] are used in the same paper, particular care should be taken to avoid confusion.
3AA-21. DESCRIPTION OF THE ONE-LETTER SYSTEM
The symbols are listed, in alphabetical order of amino-acid names, in Table 1. Table 5 gives them in alphabetical order of
symbols.
Note on the Choice of Symbols
Initial letters of the names of the amino acids were chosen where there was no ambiguity. There are six such cases: cysteine, histidine,
isoleucine, methionine, serine and valine. All the other amino acids share the initial letters A, G, L, P or T, so arbitrary assignments were made.
These letters were assigned to the most frequently occurring and structurally most simple of the amino acids with these initials, alanine (A),
glycine (G), leucine (L), proline (P) and threonine (T).
Other assignments were made on the basis of associations that might be helpful in remembering the code, e.g. the phonetic associations of F
for phenylalanine and R for arginine. For tryptophan the double ring of the molecule is associated with the bulky letter W. The letters N and Q
were assigned to asparagine and glutamine respectively; D and E to aspartic and glutamic acids respectively. K and Y were chosen for the two
remaining amino acids, lysine and tyrosine, because, of the few remaining letters, they were close alphabetically to the initial letters of the names.
U and O were avoided because U is easily confused with V in handwritten material, and O with G, Q, C and D in imperfect computer print-outs,
and also with zero. J was avoided because it is absent from several languages.
Two other symbols are often necessary in partly determined sequences, so B was assigned to aspartic acid or asparagine when these have not
been distinguished; Z was similarly assigned to glutamic acid or glutamine. X means that the identity of an amino acid is undetermined, or that the
amino acid is atypical.

Table 5
One-letter   Three-letter   Amino acid
symbol       symbol
A            Ala            alanine
B            Asx            aspartic acid or asparagine
C            Cys            cysteine
D            Asp            aspartic acid
E            Glu            glutamic acid
F            Phe            phenylalanine
G            Gly            glycine
H            His            histidine
I            Ile            isoleucine
K            Lys            lysine
L            Leu            leucine
M            Met            methionine
N            Asn            asparagine
P            Pro            proline
Q            Gln            glutamine
R            Arg            arginine
S            Ser            serine
T            Thr            threonine
V            Val            valine
W            Trp            tryptophan
X            Xaa            unknown or 'other' amino acid
Y            Tyr            tyrosine
Z            Glx            glutamic acid or glutamine (or substances such as 4-carboxyglutamic acid and
                            5-oxoproline that yield glutamic acid on acid hydrolysis of peptides)
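The assignments described in the note above can be collected into a lookup table, as in the following minimal sketch. The names ONE_TO_THREE and to_three_letter are hypothetical; the function ignores blanks between letters and does not interpret any punctuation.

```python
# One-letter to three-letter symbols, as described in the note above.
ONE_TO_THREE = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu",
    "F": "Phe", "G": "Gly", "H": "His", "I": "Ile", "K": "Lys",
    "L": "Leu", "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp",
    "X": "Xaa", "Y": "Tyr", "Z": "Glx",
}


def to_three_letter(one_letter_sequence):
    """Convert one-letter symbols (blanks ignored) to a hyphenated sequence."""
    letters = [ch for ch in one_letter_sequence if not ch.isspace()]
    # J, O and U are deliberately unassigned, so they raise KeyError here.
    return "-".join(ONE_TO_THREE[letter] for letter in letters)
```

Because J, O and U were avoided in the code, a sequence containing them is rejected rather than silently converted.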
3AA-21.3. Spacing
An important use of the one-letter notation is in presenting alignment of homologous sequences. It is therefore vital not to
destroy alignment by variable punctuation or variable width of letters. A single space is therefore left between symbols, as a blank
if not occupied by punctuation (3AA-21.4 and 21.5), so that such punctuation can be inserted without destroying alignment.
Exactly the same spacing is given to each letter, each blank and each punctuation mark, as in typewritten material or, if printed, as
in 'typewriter type font'.
3AA-21.4. Known Sequences
A blank between letters indicates that the sequence was determined experimentally. For example,
A C D E F G H I K L M N P Q means Ala-Cys-Asp-Glu-Phe-Gly-His-Ile-Lys-Leu-Met-Asn-Pro-Gln
Parentheses are used to indicate regions of a sequence in which the composition is known, but the sequence undetermined;
they are also placed round the symbol for a single residue to show that its identification is tentative. The one-space symbol '=' can
be used for ')(' to indicate the end of one unknown sequence and the beginning of another.
If the residue inside parentheses can be positioned with confidence by homology with related proteins, the letters are
separated by dots. If their position is arbitrary for lack of even indirect evidence the letters are separated by commas. A slash (/)
may be used to separate the symbols for residues that have not been shown experimentally to be connected, because they are derived
from different peptides. A slash before or after a sequence shows that termination has not been demonstrated (3AA-21.1).
This punctuation is illustrated in the comparison of three sequences, where two partly known (a, c) are aligned with a known
one (b):
a) (A,C,D)E F G(H.I.K.L=M,N)P Q
b)  R S T E F G H I K L A D P Q
c)  A C D E F/G H I K L(M,N)P Q
Thus the sequence of one of the fragments (H.I.K.L) can be inferred with confidence for (a) whereas that of fragments (A,C,D) and
(M,N) cannot. Two fragments were sequenced independently in (c). Their positioning is made only by analogy with (b).
If more elaborate punctuation is required for special circumstances, it is essential that only one character (or a blank of similar
size) should appear between the letters of the code.
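The punctuation rules of this section lend themselves to mechanical interpretation. The following minimal sketch, with the hypothetical function name parse_one_letter, splits a punctuated one-letter sequence into segments and labels each with the status that the punctuation implies; the status labels are illustrative wording, not terms from these recommendations.

```python
def parse_one_letter(line):
    """Split a punctuated one-letter sequence into (status, letters) segments."""
    segments = []
    letters, separators, inside = [], set(), False

    def close():
        # Emit the residues collected so far, labelled by their punctuation.
        nonlocal letters, separators
        if letters:
            if not inside:
                status = "sequenced"
            elif len(letters) == 1:
                status = "tentative"
            elif "." in separators:
                status = "ordered by homology"
            else:
                status = "order unknown"
            segments.append((status, "".join(letters)))
        letters, separators = [], set()

    for ch in line:
        if ch.isalpha():
            letters.append(ch)
        elif ch == "(":
            close()
            inside = True
        elif ch == ")":
            close()
            inside = False
        elif ch == "=":      # stands for ')(' and keeps us inside parentheses
            close()
        elif ch == "/":      # fragments not shown to be connected
            close()
        elif ch in ".,":
            separators.add(ch)
    close()
    return segments
```

Applied to sequence (a) above, the sketch yields five segments, of which only (H.I.K.L) is labelled as ordered by homology.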
Part 3. Modification of Named Peptides (a revision and updating of [12])
3AA-22. NAMES AND SYMBOLS FOR DERIVATIVES OF NAMED PEPTIDES
It is often convenient to specify the structure of a peptide by reference to a named sequence of which it is a variant. The
recommendations that follow allow this, but they apply only to modifications of the sequence involving normal amide links
between residues.
Note. To exemplify any named peptide, the imaginary peptide 'iupaciubin' (to symbolize the harmonious co-operation of
IUPAC and IUB), Ala-Lys-Glu-Tyr-Leu, is used in formulating the recommendations below.
3AA-22.1. Replacement of Residues
In a peptide of trivial name iupaciubin, if the qth amino-acid residue, starting from the N-terminal end of the chain, is replaced by the amino acid Xaa, the semitrivial name of the modified peptide is [q-amino acid]iupaciubin, and the abbreviated form
is [Xaaq]iupaciubin, with q as a superior index. A designation of the chain may be placed before the residue number, e.g. [AlaB12]insulin(cattle) (see
Comment f). Examples:
[8-Citrulline]vasopressin,
[Cit8]vasopressin;
[5-Isoleucine,7-alanine]angiotensin II,
[Ile5,Ala7]angiotensin II.
Notes
a) In the full name, the replacement amino acid is designated by its residue name, not the name of its acyl group (e.g. glycine,
not glycyl). This name, and the position of replacement, are given in square brackets.
b) In the abbreviated form, the amino-acid residues are designated by standard 3-letter symbols (Table 1), the first letter only
being a capital (3AA-14), in square brackets.
c) In the abbreviated form the position of substitution is indicated in a special fashion, i.e. by a superior numeral, to indicate
that it is a residue, not an individual atom, that is being replaced.
d) The residue replaced is not designated in these semitrivial names in order to keep the names short, and because this form of
nomenclature of 3AA-22. 1 clearly differs from ordinary substitution nomenclature.
e) The replacement of an amino-acid residue by its enantiomer may be shown by application of this rule as follows: the
replacement in iupaciubin of L-tyrosine at position 4 by D-tyrosine results in [4-D-tyrosine]iupaciubin with the abbreviation
[D-Tyr4]iupaciubin. A mixture of this with iupaciubin gives [4-ambo-tyrosine]iupaciubin or [ambo-Tyr4]iupaciubin (3AA-13.2 and
3AA-19.2). Examples: [D-Ser1]corticotropin; [D-Asp1]angiotensin II.
f) Specification of a sequence may require the species as well as the peptide to be named. If so, the name of the species should be
attached, in parenthesis, to the name of the peptide whenever a modifying prefix is present. Thus a substitution in cattle insulin
could give [AlaB12]insulin(cattle). (We prefer 'cattle' as the adjective for this species, since 'bovine' is not used in common speech to
designate the species, but to compare human attributes with those of the cow, and 'ox' can be misleading.)
g) It may be convenient to represent replacement of Gln by Glu or of Asn by Asp with the prefix 'desamido'. Thus
[Glu30]corticotropin(pig) could be called desamido30-corticotropin(pig). Similarly replacement of Gla by Glu can be designated
with the prefix 'decarboxy'.
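The replacement rule of 3AA-22.1 may be illustrated by a minimal sketch. The function name replace_residues is hypothetical; the sketch takes the parent sequence as a list of residue symbols, ignores chain designations such as B12, and writes the superior numeral on the line because plain text cannot show superscripts.

```python
def replace_residues(residues, replacements, peptide_name):
    """Return (abbreviated name, new sequence) for replacements {position: symbol}."""
    modified = list(residues)
    labels = []
    for position in sorted(replacements):
        new_symbol = replacements[position]
        modified[position - 1] = new_symbol   # positions count from the N-terminus
        labels.append(f"{new_symbol}{position}")
    return f"[{','.join(labels)}]{peptide_name}", modified
```

For iupaciubin, replacing residue 4 by Phe gives [Phe4]iupaciubin with the sequence Ala-Lys-Glu-Phe-Leu.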
3AA-22.2. Extension of the Peptide Chain
The compounds obtained by extension of a peptide at either the N-terminus or the C-terminus are designated by the kinds of
names and abbreviations shown below; these are in accordance with general principles of peptide nomenclature (3AA-13.1).
Examples:
a) Extension at the N-terminus
Aminoacyliupaciubin        Xaa-iupaciubin
Valyliupaciubin            Val-iupaciubin
Valylglycyliupaciubin      Val-Gly-iupaciubin      (for extension by two residues)
b) Extension at the C-terminus
                           Iupaciubinyl-Xaa
Iupaciubinylleucine        Iupaciubinyl-Leu
This rule is not directly applicable to extension at the C-terminus of natural peptides that possess a terminal amide group, such
as oxytocin and α-melanotropin. For these, a new name should be given to the corresponding peptide with a free carboxyl group
by adding 'oic acid' to the trivial name, e.g. oxytocinoic acid from oxytocin, so that extension can then be denoted as above, e.g.
oxytocinoyl-Xaa.
Note
The enkephalins are the two peptides Tyr-Gly-Gly-Phe-Leu and Tyr-Gly-Gly-Phe-Met. Designations such as Leu-, leucyl-
and leucine-enkephalin have been given to the former, with corresponding terms for the latter. These all wrongly imply
N-terminal extension, and could not be used together with any indication of such extension. Morley [25] has advocated
[Leu]enkephalin, if necessary [Leu5]enkephalin, in accordance with 3AA-22.1, implying that enkephalin means Tyr-Gly-Gly-Phe-Xaa. We believe that [Leu5]enkephalin is the best designation.
3AA-22.3. Insertion of Residues
The compound obtained by insertion of an additional amino-acid residue Xaa in the position between the qth and the (q + 1)th
endo-Tyr4-angiotensin II.
Notes
a) This form has analogies in other fields where endo implies the insertion of something into a structure (e.g. endo-methylene).
The prefix or index qa is based on analogies with the steroids where the atoms inserted into a ring after atom no. q are designated
The compound obtained by the formal removal of an amino-acid residue from the peptide iupaciubin in position q is
designated by the name des-q-amino acid-iupaciubin, abbreviated des-Xaaq-iupaciubin. Example:
des-7-proline-oxytocin,
des-Pro7-oxytocin
Notes
a) Removal of a whole residue is indicated in a way similar to that for removal of a ring in steroids, e.g. des-A-androstane.
b) The form 'de' is not suitable as a prefix because it is easily confused, in speaking, with D (for configuration).
c) Multiple deletions are designated similarly, e.g. des-Ile3,Asn5-oxytocin. If a complete sequence is to be removed, the first
and last loci of this sequence are all that need be specified, and they should be put in parentheses with a hyphen between them, e.g.
des-(B24-B28)-insulin(mouse).
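The removal rule can be sketched in the same way. The function name delete_residues is hypothetical; the sketch handles individual positions only (not the parenthesized range form, nor chain designations) and writes the superior numeral on the line.

```python
def delete_residues(residues, positions, peptide_name):
    """Return (abbreviated name, remaining sequence) after removing positions."""
    removed = [f"{residues[p - 1]}{p}" for p in sorted(positions)]
    remaining = [
        symbol
        for index, symbol in enumerate(residues, start=1)
        if index not in positions
    ]
    return f"des-{','.join(removed)}-{peptide_name}", remaining
```

With the oxytocin sequence Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly, removing positions 3 and 5 reproduces the name des-Ile3,Asn5-oxytocin given in Note c.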
3AA-22.5. Substitution of Side Chains of Residues
The compound formed by introducing an additional amino-acid residue as a substituent of the side chain of a residue in a
peptide is named by applying the rules of peptide nomenclature (3AA-7, -9 and -13) to the trivial name, as follows.
Ala-Lys-Glu-Tyr-Leu
derived by acylation of the ε-amino group of the lysine residue at position 2 of iupaciubin (Ala-Lys-Glu-Tyr-Leu) with a valyl
group is named Nε2-valyliupaciubin (abbreviated Nε2-Val-iupaciubin).
Other substituents that can be named as prefixes are treated similarly to amino acyl groups. Examples:
Me
Nε2-Methyliupaciubin
for
Ala-Lys-Glu-Tyr-Leu
COOH
N' NB29bis(Boc)insulin
or
N2hl,NSB29bis(Boc)insulin
valine is acylated by the γ-carboxyl group of a glutamic residue in position 3 (Glu-3) of iupaciubin (Ala-Lys-Glu-Tyr-Leu), is named N-(iupaciubin-C5.3-yl)valine, or N-(iupaciubin-Cδ3-yl)valine, abbreviated to iupaciubin-Cδ3-yl-Val.
Prefixes may also need to be formed from peptide names for use as substituents in other types of compound, as described in
3AA-13.5.
3AA-22.6. Partial Sequences
A peptide derived from a named peptide iupaciubin by removal of all residues before the pth and all after the qth is named as
iupaciubin-(p-q)-peptide. Examples:
From α-melanotropin
Ac-Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys-Pro-Val-NH2
    1   2   3   4   5   6   7   8   9  10  11  12  13
we may have
Met-Glu-His-Phe-Arg-Trp-Gly
 4                      10
or
α-melanotropin-(4-10)-peptide
or, to illustrate the naming of a peptide that contains two fragments, and also a C-terminal amide group:
His-Phe-Arg-Lys-Pro-Val-NH2 may be called
α-melanotropin-(6-8)-(11-13)-peptide amide.
For oligopeptides it may be convenient to state the length of peptide, e.g. iupaciubin-(2-4)-tripeptide, but this is not normally
useful for peptides of over about twelve residues, because the larger multiplying affixes are not widely known. Such a check on the
number of residues is particularly useful when two or more sequences are joined, e.g. α-melanotropin-(6-8)-(11-13)-hexapeptide
amide.
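The partial-sequence rule lends itself to a short Python sketch. The function `fragment_name` and the affix table are hypothetical helpers introduced for illustration only; the table stops at twelve residues, following the remark above that larger multiplying affixes are not widely known, and the caller states explicitly whether the fragment ends in a C-terminal amide.

```python
# Multiplying affixes up to twelve residues; longer fragments get no affix.
MULTIPLYING_AFFIXES = {
    2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa", 7: "hepta",
    8: "octa", 9: "nona", 10: "deca", 11: "undeca", 12: "dodeca",
}


def fragment_name(parent, sequence, ranges, amide=False, state_length=False):
    """Return (name, residues) for the fragment(s) of a named peptide.

    `ranges` is a list of (p, q) pairs of 1-based, inclusive positions.
    """
    residues = []
    for p, q in ranges:
        if not 1 <= p <= q <= len(sequence):
            raise ValueError(f"invalid range ({p}-{q})")
        residues.extend(sequence[p - 1:q])
    locants = "-".join(f"({p}-{q})" for p, q in ranges)
    # Optionally state the length as a check on the number of residues.
    affix = MULTIPLYING_AFFIXES.get(len(residues), "") if state_length else ""
    name = f"{parent}-{locants}-{affix}peptide"
    if amide:
        name += " amide"
    return name, residues


# Residues 1-13 of α-melanotropin as given in the text (Ac- and -NH2 omitted).
MELANOTROPIN = ["Ser", "Tyr", "Ser", "Met", "Glu", "His", "Phe",
                "Arg", "Trp", "Gly", "Lys", "Pro", "Val"]

name, residues = fragment_name("α-melanotropin", MELANOTROPIN,
                               [(6, 8), (11, 13)], amide=True,
                               state_length=True)
print(name, "-".join(residues))
```

Stating the length is optional here, mirroring the text: it acts as a check on the residue count when two or more sequences are joined.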
3AA-22.7. Peptides with Reversed Sequence and Enantiomers
The peptide whose sequence is the reverse of a named peptide may itself be named with the prefix 'retro-', giving
retro-iupaciubin from iupaciubin. The enantiomer of a named peptide may be specified with the prefix 'ent-' (a contracted form of
enantio-, F-6.4 of [15]), giving ent-iupaciubin from iupaciubin.
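As a rough illustration of the two prefixes, the following Python sketch derives the sequences of the retro- and ent- forms from a parent sequence. The helper names are hypothetical, and the sketch assumes that an unprefixed symbol denotes the L form, that 'D-' marks the D form, and that glycine, being achiral, is left unchanged.

```python
def retro(sequence):
    """Sequence of the retro- peptide: same residues in reverse order."""
    return list(reversed(sequence))


def ent(sequence):
    """Sequence of the ent- peptide: every chiral residue inverted."""
    inverted = []
    for res in sequence:
        if res == "Gly":            # achiral, nothing to invert
            inverted.append(res)
        elif res.startswith("D-"):  # D becomes L (written without prefix)
            inverted.append(res[2:])
        else:                       # L (unprefixed) becomes D
            inverted.append("D-" + res)
    return inverted


IUPACIUBIN = ["Ala", "Lys", "Glu", "Tyr", "Leu"]

print("retro-iupaciubin:", "-".join(retro(IUPACIUBIN)))
print("ent-iupaciubin:  ", "-".join(ent(IUPACIUBIN)))
```

The two operations are independent: reversing the sequence leaves the configuration of each residue untouched, and inverting configuration leaves the order untouched.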
3AA-22.9.
Section  Operation                Short name*                 Structure
                                  Iupaciubin                  Ala-Lys-Glu-Tyr-Leu
22.1     Replacement              [Phe4]iupaciubin            Ala-Lys-Glu-Phe-Leu
22.2a    Extension (N-terminal)   Arg-iupaciubin              Arg-Ala-Lys-Glu-Tyr-Leu
22.2b    Extension (C-terminal)   Iupaciubinyl-Met            Ala-Lys-Glu-Tyr-Leu-Met
22.3     Insertion                Endo-Thr2a-iupaciubin       Ala-Lys-Thr-Glu-Tyr-Leu
22.4     Removal                  Des-Glu3-iupaciubin         Ala-Lys-Tyr-Leu
                                                                  Val
                                                                   |
22.5.1   Side-chain substitution  Nε2-Val-iupaciubin          Ala-Lys-Glu-Tyr-Leu
22.5.2   Side-chain substitution  Cδ3-Iupaciubinyl-Val        Ala-Lys-Glu-Tyr-Leu
                                                                       |
                                                                      Val
22.6     Partial sequence         Iupaciubin-(2-4)-peptide    Lys-Glu-Tyr
22.7     Reversal of sequence     retro-Iupaciubin            Leu-Tyr-Glu-Lys-Ala
22.7     Enantiomer               ent-Iupaciubin              D-Ala-D-Lys-D-Glu-D-Tyr-D-Leu
22.8                              [3ψ4,CH2S]iupaciubin        Ala-Lys-Glu-ψ(CH2S)-Tyr-Leu
* Square brackets are required to indicate replacement, but are not used for most other modifications.
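The replacement and extension rows of the summary can be reproduced with a small Python sketch. The function names are hypothetical and superscript locants are written inline; only the name patterns shown in the summary (square brackets for replacement, a prefixed symbol for N-terminal extension, the '-yl' form for C-terminal extension) are assumed.

```python
IUPACIUBIN = ["Ala", "Lys", "Glu", "Tyr", "Leu"]


def replace_residue(parent, sequence, q, new):
    """Replace the residue at 1-based position q; square brackets are required."""
    if not 1 <= q <= len(sequence):
        raise ValueError(f"position {q} is outside the peptide")
    modified = list(sequence)
    modified[q - 1] = new
    return f"[{new}{q}]{parent}", modified


def extend_n_terminal(parent, sequence, new):
    """Add a residue at the N-terminus: the new symbol precedes the name."""
    return f"{new}-{parent}", [new] + list(sequence)


def extend_c_terminal(parent, sequence, new):
    """Add a residue at the C-terminus: the parent is cited as an acyl group."""
    return f"{parent}yl-{new}", list(sequence) + [new]


for name, seq in (
    replace_residue("iupaciubin", IUPACIUBIN, 4, "Phe"),
    extend_n_terminal("iupaciubin", IUPACIUBIN, "Arg"),
    extend_c_terminal("iupaciubin", IUPACIUBIN, "Met"),
):
    print(f"{name:20s} {'-'.join(seq)}")
```

Each helper returns a new list and leaves the parent sequence unchanged, so several modifications can be derived from the same parent.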
REFERENCES
1. Vickery, H. B. & Schmidt, C. L. A. (1931) Chem. Rev. 9, 169-318.
2. Vickery, H. B. (1972) Adv. Protein Chem. 26, 81-171.
3. Vickery, H. B. (1947) J. Biol. Chem. 169, 237-245.
4. International Union of Pure and Applied Chemistry (IUPAC), Definitive Rules for the Nomenclature of Amino Acids (1960) J. Amer. Chem.
Soc. 82, 5575-5577.
5. Vickery, H. B. (1963) J. Org. Chem. 28, 291-293.
6. IUPAC Commission on the Nomenclature of Organic Chemistry (CNOC) and IUPAC-IUB Commission on Biochemical Nomenclature
(CBN), Nomenclature of α-Amino Acids, Recommendations 1974, Biochem. J. 149, 1-16 (1975); Biochemistry, 14, 449-462 (1975); Eur.
J. Biochem. 53, 1-14 (1975); also pp. 64-77 in [7].
7. International Union of Biochemistry (1978) Biochemical Nomenclature and Related Documents, The Biochemical Society, London.
8. Brand, E. & Edsall, J. T. (1947) Annu. Rev. Biochem. 16, 223-272.
9. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Abbreviated Designation of Amino-Acid Derivatives and Peptides,
Recommendations 1966, Arch. Biochem. Biophys. 121, 1-5 (1967); Biochem. J. 102, 23-27 (1967); Biochemistry, 5, 2485-2489 (1966);
Biochim. Biophys. Acta, 121, 1-7 (1967); Bull. Soc. Chim. Biol. 49, 121-129 (1968) (in French); Eur. J. Biochem. 1, 375-378 (1967);
Hoppe-Seyler's Z. Physiol. Chem. 348, 256-261 (1967) (in German); J. Biol. Chem. 241, 2491-2495 (1966); Mol. Biol. 2, 282-288 (1968)
(in Russian).
10. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Symbols for Amino-Acid Derivatives and Peptides, Recommendations
1971, Arch. Biochem. Biophys. 150, 1-8 (1972); Biochem. J. 126, 773-780 (1972), corrected 135, 9 (1973); Biochemistry, 11, 1726-1732
(1972); Biochim. Biophys. Acta, 263, 205-212 (1972); Eur. J. Biochem. 27, 201-207 (1972), corrected 45, 2 (1974); J. Biol. Chem. 247,
977-983 (1972); Pure Appl. Chem. 40, 315-331 (1974); also pp. 78-84 in [7].
11. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), A One-Letter Notation for Amino Acid Sequences, 1968, Arch. Biochem.
Biophys. 125(3), i-v (1968); Biochem. J. 113, 1-4 (1969); Biochemistry, 7, 2703-2705 (1968); Biochim. Biophys. Acta, 168, 6-10 (1968);
Bull. Soc. Chim. Biol. 50, 1577-1582 (1968) (in French); Eur. J. Biochem. 5, 151-153 (1968); Hoppe-Seyler's Z. Physiol. Chem. 350,
793-797 (1969) (in German); J. Biol. Chem. 243, 3557-3559 (1968); Mol. Biol. 3, 473-477 (1969) (in Russian); Pure Appl. Chem. 31,
641-645 (1972); also pp. 91-93 in [7].
12. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Rules for Naming Synthetic Modifications of Natural Peptides, 1966,
Arch. Biochem. Biophys. 121, 6-8 (1967); Biochem. J. 104, 17-19 (1967), corrected 135, 9 (1973); Biochemistry, 6, 362-364 (1967);
Biochim. Biophys. Acta, 133, 1-5 (1967); Bull. Soc. Chim. Biol. 49, 325-330 (1967) (in French); Eur. J. Biochem. 1, 379-381 (1967),
corrected 45, 3 (1974); Hoppe-Seyler's Z. Physiol. Chem. 348, 262-265 (1967) (in German); J. Biol. Chem. 242, 555-557 (1967); Mol.
Biol. 2, 466-469 (1968) (in Russian); Pure Appl. Chem. 31, 647-653 (1972); also pp. 85-87 in [7].
13. IUPAC Commission on Nomenclature of Organic Chemistry (CNOC), Nomenclature of Organic Chemistry, Section E: Stereochemistry,
Recommendations 1974, Pure Appl. Chem. 45, 11-30 (1976); also pp. 1-18 in [7] and pp. 473-490 in [14].
14. International Union of Pure and Applied Chemistry (1979) Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, Pergamon
Press, Oxford.
15. IUPAC Commission on the Nomenclature of Organic Chemistry (CNOC), Nomenclature of Organic Chemistry, Section F: Natural
Products and Related Compounds, Recommendations 1976, Eur. J. Biochem. 86, 1-8 (1978); also pp. 19-26 in [7] and pp. 491-511 in [14].
Appendix. Amino Acids with Trivial Names (excluding those listed in Table 1)
It is often helpful to use trivial names in order to avoid cumbersome systematic or semisystematic names, particularly if the substance has to be
named frequently. Coining of trivial names is treated in 3AA-2.1, and a number of existing trivial names are listed in the appendices to the
previous edition of recommendations on amino-acid nomenclature [6]; only the commoner are listed here.
Trivial name         Symbol    Formula (form predominating at neutral pH)
β-Alanine            βAla      ⁺NH3-CH2-CH2-COO⁻
Allysine                       HCO-[CH2]3-CH(NH3⁺)-COO⁻
Citrulline           Cit       NH2-CO-NH-[CH2]3-CH(NH3⁺)-COO⁻
Cystathionine        Ala       CH2-CH(NH3⁺)-COO⁻
                     |         |
                     Hcy       S-[CH2]2-CH(NH3⁺)-COO⁻
Cysteic acid         Cya       ⁻O3S-CH2-CH(NH3⁺)-COO⁻
Cystine              Cys       S-CH2-CH(NH3⁺)-COO⁻
                     |         |
                     Cys       S-CH2-CH(NH3⁺)-COO⁻
Dopa                           3,4-(HO)2C6H3-CH2-CH(NH3⁺)-COO⁻
Homocysteine         Hcy       HS-CH2-CH2-CH(NH3⁺)-COO⁻
Homoserine           Hse       HO-CH2-CH2-CH(NH3⁺)-COO⁻
Homoserine lactone   Hsl       O-CH2-CH2-CH(NH3⁺)-CO (ring closed between O and CO)
Lanthionine          Ala       CH2-CH(NH3⁺)-COO⁻
                     |         |
                     Cys       S-CH2-CH(NH3⁺)-COO⁻
Ornithine            Orn       ⁺NH3-[CH2]3-CH(NH3⁺)-COO⁻
5-Oxoproline         Glp       NH-CO-CH2-CH2-CH-COO⁻ (ring closed between N and CH)
Sarcosine            Sar       CH3-NH2⁺-CH2-COO⁻
Thyronine                      4-(4-HO-C6H4-O)-C6H4-CH2-CH(NH3⁺)-COO⁻
Thyroxine                      4-(4-HO-3,5-I2-C6H2-O)-3,5-I2-C6H2-CH2-CH(NH3⁺)-COO⁻