
Computational Biology and Chemistry 33 (2009) 445–450
Research article
Predicting melting temperature directly from protein sequences
Tienhsiung Ku (a), Peiyu Lu (b), Chenhsiung Chan (b), Tsusheng Wang (b), Szuming Lai (b), Pingchiang Lyu (b), Naiwan Hsiao (c, *)
(a) Department of Anesthesiology, Changhua Christian Hospital, Changhua, Taiwan
(b) Department of Life Sciences, National Tsing Hua University, Hsinchu, Taiwan
(c) Institute of Biotechnology, National Changhua University of Education, No. 1, Jin-De Road, Changhua, Taiwan
Article info
Article history:
Received 18 May 2009
Received in revised form 9 October 2009
Accepted 10 October 2009
Keywords:
Dipeptide
Hyperthermophiles
Genome
Prediction
Abstract
Proteins of both hyperthermophilic and mesophilic microorganisms are generally composed of the same 20 amino acids; however, the extent of thermal tolerance of any given protein is an inherent property of its amino acid sequence. The present study is the first to report a rapid method for predicting Tm (melting temperature), the temperature at which 50% of the protein is unfolded, directly from protein sequences (the Tm Index program is available at http://tm.life.nthu.edu.tw/). We examined 75 complete microbial genomes using the Tm Index, and the analysis clearly differentiated hyperthermophilic from mesophilic microorganisms on this global genomic basis. These results are consistent with the previous hypothesis that hyperthermophiles express a greater number of high Tm proteins than mesophiles do. The Tm Index will be valuable for modifying existing proteins (enzymes, protein drugs and vaccines) or designing novel proteins having a desired melting temperature.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction
Understanding the molecular basis of the thermodynamic stability of proteins is a fundamental problem with clear practical applications. The stability of proteins (enzymes, vaccines and protein drugs) is a formulation challenge (Brown, 2005; Frokjaer and Otzen, 2005; Salmaso et al., 2006), and modifying protein stability is a very useful line of study (Borghouts et al., 2005; Marr et al., 2006; Rosenberg and Goldblum, 2006). Thermodynamic stability is defined by the protein's free energy of stabilization and its melting temperature (Tm, the temperature at which 50% of the protein is unfolded) (Vieille and Zeikus, 2001). One approach to understanding the molecular basis of the thermodynamic stability of proteins involves comparing the structures of homologous proteins from hyperthermophilic and mesophilic organisms (Gianese et al., 2002; Haney et al., 1999). In recent years there has been sufficient experimental evidence on hyperthermophilic proteins (e.g., sequence, mutagenesis, structure, and thermodynamics) to conclude that no single mechanism is responsible for the remarkable thermodynamic stability of these proteins' structures (van den Burg and Eijsink, 2002; Vieille and Zeikus, 2001; Zhou et al., 2008).
Abbreviations: Tm, melting temperature; TI, Tm Index; HTPP, high Tm protein
percentage; OGT, optimal growth temperature.
Corresponding author. Tel.: +886 47232105; fax: +886 47128758.

E-mail address: nady@cc.ncue.edu.tw (N. Hsiao).
The thermal tolerance of any given protein is a function of its composition and sequence, as illustrated by the fact that both hyperthermophilic and mesophilic proteins generally contain the same 20 amino acids (Reddy et al., 1998a). Thermodynamics governs most aspects of biomolecular interactions, and several theoretical methods have been proposed to predict changes in thermodynamic stability (Liang et al., 2005; Mombelli et al., 2002; Szilagyi and Zavodszky, 2000; Zavodszky et al., 1998). Most of these methods are based on detailed atomic models coupled with semi-empirical force fields, simplified energy criteria, and empirical methods that use the change in free energy between the denatured state and the compact native state. However, the main non-covalent interactions, including hydrophobic, van der Waals, electrostatic and hydrogen-bonding interactions, are the most important in stabilizing protein structures (Prevost et al., 1991; Spector et al., 2000; Vogt et al., 1997).
The number of known protein sequences has exploded following the publication of the human genome and other sequencing projects, and the number of protein sequences now far exceeds the number of known protein structures. The folding problem has prevented the development of any method that predicts the exact three-dimensional structure of a protein from its amino acid sequence (Sanchez et al., 2000). Moreover, predicting the thermodynamic stability of a protein from its amino acid sequence is one of the most difficult challenges in molecular biology and bioinformatics (Richards, 1997). Analyses of protein sequences from complete genomes of hyperthermophiles and mesophiles have revealed some of the factors responsible for enhancing the thermodynamic stability of these proteins (Cambillau and Claverie, 2000; Chakravarty and Varadarajan, 2000; Suhre and Claverie, 2003). Several differences exist between hyperthermophilic and mesophilic proteins in terms of amino acid composition, size and secondary structure (Chakravarty and Varadarajan, 2000; Szilagyi and Zavodszky, 2000); however, these differences do not provide sufficient information to determine the Tm value of a protein from its sequence.
1476-9271/$ – see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiolchem.2009.10.002
This study addresses a critical point of protein denaturation, the melting temperature (Tm) of the protein. Our data establish a correlation between the dipeptides of a protein and its Tm value at neutral pH. Thus, our results suggest that Tm values may be predicted using a newly identified, fundamental factor, namely the composition of dipeptides within the amino acid sequence.
2. Materials and methods
2.1. Database and materials
The most important step in developing an accurate predictive method is to obtain an ideal data set. Unfortunately, it is difficult to obtain a large set of proteins, each of which has a single transition state, was analyzed at approximately neutral pH in a similar buffer, shares low sequence identity with the others, and has a Tm value similar to that of the wild type. The 35 proteins we chose, listed in Table 1, satisfy the above limitations (i.e., single transition state under thermal denaturation, low inter-sequence identity, and approximately neutral pH during analyses). Their Tm values (Table 1) separate them into two groups: the high Tm proteins (Tm > 65 °C) and the low Tm proteins (Tm < 55 °C).
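The grouping rule above can be written as a small Python function. This is a minimal sketch: `tm_group` is a hypothetical name, and treating proteins with 55 °C ≤ Tm ≤ 65 °C as belonging to neither group is our assumption, since the text only names the two thresholds.

```python
from typing import Optional

HIGH_TM_MIN = 65.0  # °C; the high-Tm group requires Tm above this value
LOW_TM_MAX = 55.0   # °C; the low-Tm group requires Tm below this value


def tm_group(tm_celsius: float) -> Optional[str]:
    """Return 'high', 'low', or None (neither group) for a measured Tm in °C."""
    if tm_celsius > HIGH_TM_MIN:
        return "high"
    if tm_celsius < LOW_TM_MAX:
        return "low"
    return None  # assumption: proteins with 55 <= Tm <= 65 °C are left out


# Single-valued Tm entries from Table 1, plus one hypothetical value
for name, tm in [("Carboxylesterase Est2", 92.0), ("Barnase", 53.0), ("hypothetical", 60.0)]:
    print(name, tm_group(tm))
```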
2.2. Statistical method
Statistical inferences were made to establish a correlation between the Tm of a protein and the composition of dipeptides within its sequence (Guruprasad et al., 1990). The statistical method we used to distinguish the high Tm and low Tm groups was modified from the method previously used to calculate the II (Instability Index) (Guruprasad et al., 1990). First, the chi-square test was used to evaluate the statistical significance of the relationship between Tm value and dipeptide occurrence in each group of proteins. The mathematical expectation of each group is defined by
$$E(X) = \sum_{x=-\infty}^{\infty} x\, f_X(x)$$
In the case of two independent random variables, the expected value of their product can be calculated from
$$E(XY) = E(X)\, E(Y)$$
Since there are 20 amino acids, the above equation may be written for a dipeptide xy as
$$E(xy) = \frac{N_{obs}(x)}{T} \times \frac{N_{obs}(y)}{T} \times \sum_{x=1}^{20} \sum_{y=1}^{20} N_{obs}(xy)$$
where T is the total number of amino acids in a particular group; $N_{obs}(x)$ and $N_{obs}(y)$ are the observed occurrences of amino acids x and y, respectively; and $N_{obs}(xy)$ is the observed occurrence of dipeptide xy. From the definition of chi-square,
$$\chi^2(xy) = \frac{[N_{obs}(xy) - E(xy)]^2}{E(xy)}$$
The average chi-square for each group is
$$\chi^2_{avg} = \frac{1}{400} \sum_{xy=1}^{400} \chi^2(xy)$$
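The expectation and chi-square steps above can be sketched in Python for one group of sequences. This is an illustration under stated assumptions, not the authors' program: `dipeptide_chi_square` is a hypothetical name, residues outside the 20 standard amino acids are ignored when computing T and the dipeptide total, and a dipeptide with E(xy) = 0 is given a chi-square of 0 to avoid division by zero.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_chi_square(sequences):
    """Return (expected, chi2, chi2_avg) for one group of protein sequences.

    expected[xy] = N_obs(x)/T * N_obs(y)/T * (total observed dipeptides)
    chi2[xy]     = (N_obs(xy) - expected[xy])**2 / expected[xy]
    chi2_avg     = sum of chi2 over the 400 dipeptides / 400
    """
    residues, pairs = Counter(), Counter()
    for seq in sequences:
        seq = seq.upper()
        residues.update(seq)
        # overlapping dipeptides x_i y_{i+1} within one sequence
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))

    total_residues = sum(residues[a] for a in AMINO_ACIDS)  # T
    total_pairs = sum(pairs[x + y] for x, y in product(AMINO_ACIDS, repeat=2))

    expected, chi2 = {}, {}
    for x, y in product(AMINO_ACIDS, repeat=2):
        xy = x + y
        e = residues[x] / total_residues * residues[y] / total_residues * total_pairs
        expected[xy] = e
        # assumption: dipeptides with zero expectation contribute nothing
        chi2[xy] = (pairs[xy] - e) ** 2 / e if e > 0 else 0.0
    chi2_avg = sum(chi2.values()) / 400
    return expected, chi2, chi2_avg


expected, chi2, chi2_avg = dipeptide_chi_square(["ACAC"])
print(expected["AC"], chi2["AC"], chi2_avg)
```

In the toy sequence ACAC, T = 4 and there are three dipeptides, so E(AC) = (2/4)(2/4)(3) = 0.75 while AC is observed twice.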
The average value of chi-square was then used as the confidence limit to select significant dipeptides for the high Tm and low Tm groups of proteins, respectively:
$$\chi^2_{H} \geq \chi^2_{Havg} \quad \text{and} \quad \chi^2_{L} \geq \chi^2_{Lavg}$$
The potential occurrence P(xy) for each dipeptide is given by
$$P(xy) = \frac{N_{obs}(xy)}{E(xy)}$$
After comparing each chi-square value with the confidence limit, the potential occurrence P(xy) of the significant dipeptides was retained and all other P(xy) were set to zero. The relative occurrence $P_{rev}(xy)$ of the significant dipeptides was given by
$$P_{rev}(xy) = P(xy) - 1$$
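The selection step can be sketched in Python as follows. `relative_occurrence` is a hypothetical name, and the code reflects one reading of the text: a dipeptide counts as significant when its chi-square is at least the group average, and non-significant dipeptides receive $P_{rev}(xy) = 0$, so that only significant dipeptides end up with values above or below zero in the matrix described next.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_occurrence(observed, expected, chi2, chi2_avg):
    """Return P_rev[xy] for all 400 dipeptides of one group.

    observed, expected and chi2 map dipeptide strings such as "HC" to
    N_obs(xy), E(xy) and chi2(xy); missing keys are treated as 0.
    """
    p_rev = {}
    for x, y in product(AMINO_ACIDS, repeat=2):
        xy = x + y
        e = expected.get(xy, 0.0)
        significant = e > 0 and chi2.get(xy, 0.0) >= chi2_avg
        if significant:
            p_rev[xy] = observed.get(xy, 0) / e - 1.0  # P(xy) - 1
        else:
            p_rev[xy] = 0.0  # assumption: non-significant dipeptides carry no weight
    return p_rev


# Toy inputs: AC is enriched, AA is absent but expected, CA falls below the limit
p = relative_occurrence(
    observed={"AC": 2, "CA": 1},
    expected={"AC": 0.75, "CA": 0.75, "AA": 0.75},
    chi2={"AC": 2.08, "CA": 0.08, "AA": 0.75},
    chi2_avg=0.75,
)
print(p["AC"], p["CA"], p["AA"])
```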
A 20 × 20 matrix of relative potential occurrence is obtained for each group. A value > 0 indicates that the dipeptide is significantly enriched in that group of proteins, and a value < 0 indicates that it is significantly depleted. These two matrices are then combined into one 20 × 20 matrix, named the Tm weight value table ($P_{index}$), using the equation
$$P_{index}(xy) = [P_{revHighTm}(xy) - P_{revLowTm}(xy) + 1] \times 100$$
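Combining the two groups is a per-dipeptide arithmetic step. The Python sketch below is illustrative only. `tm_weight_table` is a hypothetical name, and the P_rev inputs are made-up values chosen to show the three cases: enriched in the high-Tm group, enriched in the low-Tm group, and significant in neither.

```python
def tm_weight_table(p_rev_high, p_rev_low):
    """Combine two P_rev matrices into P_index = (P_rev_high - P_rev_low + 1) * 100.

    Both arguments map dipeptide strings (e.g. "HC") to P_rev values;
    a dipeptide missing from one group is treated as non-significant (0).
    """
    dipeptides = set(p_rev_high) | set(p_rev_low)
    return {
        xy: (p_rev_high.get(xy, 0.0) - p_rev_low.get(xy, 0.0) + 1.0) * 100.0
        for xy in dipeptides
    }


# Hypothetical P_rev values, not derived from the paper's data
table = tm_weight_table({"HC": 0.5, "AA": 0.0, "WC": -1.0}, {"MM": 0.5, "WC": 0.5})
print(table["HC"], table["MM"], table["AA"], table["WC"])  # → 150.0 50.0 100.0 -50.0
```

A dipeptide that is significant in neither group gets exactly 100. Because P_rev can be as low as −1, the formula can produce a weight below zero when a dipeptide is depleted in the high-Tm group and enriched in the low-Tm group.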
Here, 100 is a scaling factor. Dipeptides with a Tm weight value > 100 may contribute to the thermostability of proteins, whereas those with a value < 100 may reduce it. The Tm weight values ($P_{index}$) for all 400 possible dipeptides are presented as a matrix in Table 2. Finally, Table 2 was applied to predict Tm from the protein sequence: the Tm Index (TI) of a protein was computed from $P_{index}$ by the equation
$$TI = \frac{(100/L) \sum_{i=1}^{L-1} P_{index}(x_i y_{i+1}) - 9372}{398}$$
where $x_i y_{i+1}$ designates a specific dipeptide within the sequence, L is the number of amino acid residues in the sequence and 100 is a scaling factor. The numbers 9372 and 398 are empirical values.
The TI values of the proteins in the high and low Tm groups are given in the right-most column of Table 1.
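The TI equation translates directly into code. In the minimal Python sketch below, `tm_index` and the weight dictionary are hypothetical, and the weights are made up rather than taken from Table 2. Giving any dipeptide missing from the table (for example, one containing a non-standard residue) the neutral weight 100 is an assumption and is not part of the published method.

```python
def tm_index(sequence, weights, default_weight=100.0):
    """Tm Index of one sequence.

    TI = ((100 / L) * sum_{i=1..L-1} P_index(x_i y_{i+1}) - 9372) / 398
    `weights` maps dipeptides such as "HC" to Tm weight values; dipeptides
    absent from it get `default_weight` (an assumption, see lead-in).
    """
    seq = sequence.upper()
    length = len(seq)  # L
    if length < 2:
        raise ValueError("the sequence must contain at least one dipeptide")
    total = sum(weights.get(seq[i:i + 2], default_weight) for i in range(length - 1))
    return ((100.0 / length) * total - 9372.0) / 398.0


# Hypothetical weights for illustration only (not the values of Table 2)
example_weights = {"HC": 300.0, "MM": 40.0}
print(round(tm_index("MKHCAMMG", example_weights), 3))  # → 2.834
```

When every dipeptide has the neutral weight 100, the sum is 100(L − 1). For long sequences, the first term therefore approaches 10,000, and TI approaches (10000 − 9372)/398 ≈ 1.58.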
3. Results and discussion
3.1. Dipeptides involved in melting temperature of proteins
Dipeptides constitute the smallest unit that defines order in an amino acid sequence. We calculated the propensities of amino acids to interact with the residues preceding and following them in the amino acid sequence. The calculation of relative potential occurrence reveals that each amino acid may have a different tendency to occur next to a particular amino acid on its N-terminal and C-terminal sides. As shown in Table 2, each dipeptide is suggested to contribute differently to the Tm Index (TI) of a given protein. Dipeptides with relatively high Tm weight values (i.e., >100) may contribute to a higher Tm, whereas those with lower values (<100) may reduce the Tm. For example, the occurrence of His-Cys, Trp-Met and Cys-Pro in a protein may contribute to a higher TI, whereas the occurrence of Met-Met, Trp-Cys, Asp-Trp and Trp-Pro may reduce the TI.
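To see which dipeptides in a particular sequence raise or lower its TI, each dipeptide's weight can be compared with the neutral value 100. The Python sketch below does this. `dipeptide_contributions` is a hypothetical helper, and the two weights are illustrative stand-ins that are above and below 100 for His-Cys and Met-Met, as the text describes. They are not the Table 2 entries.

```python
def dipeptide_contributions(sequence, weights, default_weight=100.0):
    """List each dipeptide of `sequence` with its weight relative to 100.

    Returns (dipeptide, position of first residue, weight - 100) tuples,
    sorted from the most favourable (raises TI) to the least favourable.
    """
    seq = sequence.upper()
    rows = [
        (seq[i:i + 2], i + 1, weights.get(seq[i:i + 2], default_weight) - 100.0)
        for i in range(len(seq) - 1)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical weights: His-Cys above 100, Met-Met below 100
for row in dipeptide_contributions("AHCMMA", {"HC": 250.0, "MM": 40.0}):
    print(row)
```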
Table 1
Properties of the 35 high and low Tm proteins used in the analysis.
No.(a) Protein Experimental pH Tm range (°C) TI(b)
High Tm proteins (Tm > 65 °C)
1 Odorant binding protein 6.6 68–77 (Burova et al., 1999) 2.915
2 Alpha-chymotrypsin 7.0–8.0 86 (Bae and Sturtevant, 1995) 2.777
3 Gamma-crystallin 6.8–7.0 68–78 (Sen et al., 1992) 5.807
4 Glutamate dehydrogenase 6.0–8.0 89 (Lebbink et al., 1999) 2.985
5 DsbA 7.0 68–77 (Moutiez et al., 1999) 3.420
6 FLT3 6.1–7.4 78–80 (Remmele et al., 1999) 3.121
7 Procarboxypeptidase A 7.5 88–89 (Sanchez-Ruiz et al., 1988) 2.276
8 Carboxylesterase Est2 7.5 92 (Del Vecchio et al., 2002) 3.198
9 Pyrophosphatase 7.0 65–99 (Leppanen et al., 1999) 3.371
10 Thioredoxin bacillus 7.0 85 (Pedone et al., 1999) 1.455
11 Superoxide dismutase 7.8 88.9 (Leveque et al., 2000) 1.890
12 Ribonuclease H 6.0 66–86 (Hollien and Marqusee, 1999) 3.455
13 Thrombin 7.4 71–81 (Lentz et al., 1994) 3.180
14 Tumor suppressor P53 6.0–7.0 81–84 (Johnson et al., 1995) 2.559
15 Bacteriorhodopsin 7.5 95.1 (Azuaga et al., 1996) 1.001
16 Thioredoxin 6.5–7.5 85.3 (Bolon and Mayo, 2001) 1.350
Low Tm proteins (Tm < 55 °C)
1 Cro protein 6.0 30–55 (Padmanabhan et al., 1999) -1.585
2 Beta lactamase 7.5 41 (Rahil and Pratt, 1994) -1.697
3 Barnase 6.0–7.0 53 (Kellis et al., 1989) -2.018
4 Aldolase 7.0 42.5–44 (Rudolph et al., 1992) -0.924
5 Adrenodoxin 6.5–7.4 47.5–53.7 (Burova et al., 1995) -0.598
6 Fibroblast growth factor 6.6 25–39 (Culajay et al., 2000) -1.463
7 Ribonuclease T1 7.0 50.8 (Giletto and Pace, 1999) -1.256
8 C-Myb DNA-binding domain 7.5 39 (Morii et al., 1999) -2.085
9 Staphylococcal nuclease 6.0–8.0 52 (Leung et al., 2001) -1.732
10 Tropomyosin 7.5 45–53 (Ishii, 1994) -0.001
11 Tumor suppressor protein P16 7.5 42 (Boice and Fairman, 1996) -1.862
12 Myoglobin 7.0 52 (Staniforth et al., 2000) -1.724
13 Myosin 7.5–8.0 31–45 (Masino et al., 2000) -3.333
14 Chymotrypsin inhibitor 6.3 46 (Ruiz-Sanz et al., 1995) -2.724
15 Histone 6.5–7.5 46–47 (Karantza et al., 2001) -0.681
16 Tryptophan synthase 7.8 46 (Ahmed et al., 1988) -0.637
17 Glucanohydrolase 6.0 48.7 (Wele et al., 1996) -2.868
18 Cro repressor protein 7.0 39 (Pakula and Sauer, 1990) -2.473
19 Alpha lactalbumin 7.0 36–43 (Harushima and Sugai, 1989) -4.680
(a) The serial number of the protein.
(b) Tm Index.
Table 2
Tm weight values for 400 possible dipeptides. The rows denote the first residue of the dipeptide, and the columns denote the second residue. Weight values >100 are highlighted in light gray, and those <100 are highlighted in deep gray.
Dipeptide Tm weight values
A C D E F G H I K L M N P Q R S T V W Y
A 100 100 48.6 41.8 100 100 100 100 126 130 150 58.8 100 36 54.4 100 141 168 100 87
C 28.2 100 78 29.3 100 161 100 32.1 100 100 100 100 255 100 100 100 62.1 100 100 100
D 100 100 100 100 142 12 24.7 55.8 100 134 25.1 52.4 100 165 100 143 48 140 99 100
E 100 26.7 100 110 100 178 100 100 100 64.4 102 143 62.6 54.5 142 83.1 115 100 100 63.5
F 100 168 100 100 3.45 100 248 87.2 153 16.7 100 100 3.45 168 100 144 100 26.3 197 100
G 100 100 100 174 107 124 100 150 66.5 105 51.9 138 100 15 61.3 138 100 2.2 151 67.3
H 100 402 93 100 177 100 100 59 24.5 33.4 100 36 189 100 28.6 184 100 100 100 100
I 100 43 35.3 100 37 136 244 100 100 100 100 100 100 100 100 100 158 100 100 153
K 100 29 100 100 105 100 100 100 141 47.5 100 165 58.3 100 137 57.5 100 132 100 121
L 100 56.3 100 132 100 63.7 100 100 100 133 32.2 100 100 54.2 100 100 100 100 100 173
M 164 100 226 23.4 100 153 100 100 100 100 209 219 100 100 100 46.6 v37.2 37.1 100 100
N 100 76 100 102 100 100 100 100 3.49 100 164 2.25 100 116 166 100 100 161 100 88.1
P 154 159 100 100 100 61 100 100 100 116 100 100 139 168 34 100 52.3 88.6 100 100
Q 100 8.5 147 156 42.1 132 188 60.1 100 95.1 100 100 100 253 15.8 63.7 100 43.6 40.6 149
R 62.1 100 140 66.3 100 100 25.9 16.3 137 100 100 100 159 47.5 144 67.6 100 93.6 100 100
S 68.8 100 159 68.6 149 100 88.5 100 100 136 100 100 100 100 100 44.6 100 55.6 100 100
T 129 100 100 106 100 100 100 100 34.1 100 100 100 62.8 100 100 100 100 100 26.8 100
V 100 173 100 100 100 73.4 24.5 100 174 158 100 100 43.6 100 53.3 100 87.3 146 100 41.2
W 100 153 100 25 100 151 100 211 100 69 306 100 84 100 158 25.5 100 100 100 23
Y 100 100 151 140 100 11.7 191 100 100 100 100 88.1 197 212 100 40.1 100 30.9 100 21.6
Table 3
The predicted high Tm protein percentage (HTPP) of 75 genomes, ranked by decreasing HTPP. Mesophiles (OGT, <55 °C) are highlighted in white, thermophiles (OGT, 55–80 °C) in light gray, and hyperthermophiles (OGT, >80 °C) in deep gray.
Genome Kingdom OGT(a) (°C) HTPP(b) (%)
Aquifex aeolicus B 90 66.9
Pyrococcus abyssi A 97 62.3
Thermotoga maritima B 80 62.2
Thermoanaerobacter tengcongensis B 75 61.1
Pyrococcus horikoshii A 95 60.5
Pyrococcus furiosus A 98 60.1
Aeropyrum pernix A 90 58.1
Archaeoglobus fulgidus A 82 57.8
Methanococcus jannaschii A 85 57.7
Pyrobaculum aerophilum A 98 57.4
Bacillus halodurans B 30 57.3
Methanopyrus kandleri A 98 56.8
Sulfolobus solfataricus A 78 56.7
Sulfolobus tokodaii A 80 55.7
Helicobacter pylori 26695 B 37 55.6
Bacillus subtilis B 30 55.2
Chlamydia muridarum B 37 54.5
Campylobacter jejuni B 37 54.1
Chlamydia trachomatis B 37 53.4
Pasteurella multocida B 37 52.9
Listeria monocytogenes strain EGD B 37 52.5
Mycoplasma pulmonis B 37 52.4
Lactococcus lactis subsp. lactis B 30 52.3
Streptococcus pneumoniae R6 B 37 52.2
Listeria innocua Clip11262 B 37 52.1
Methanosarcina mazei strain Goe1 A 37 52.0
Thermoplasma volcanium A 60 51.9
Fusobacterium nucleatum subsp. B 37 51.2
Chlamydophila pneumoniae CWL029 B 37 50.7
Methanobacterium thermoautotrophicum A 65 50.6
Thermoplasma acidophilum A 58 50.4
Clostridium perfringens B 37 50.3
Staphylococcus aureus N315 B 37 49.8
Chlamydophila pneumoniae AR39 B 37 49.4
Staphylococcus aureus strain Mu50 B 37 49.3
Methanosarcina acetivorans str. C2A A 39 49.2
Staphylococcus aureus MW2 B 37 49.1
Haemophilus influenzae Rd B 37 49.0
Streptococcus pyogenes B 37 48.7
Neisseria meningitidis Z2491 B 37 48.2
Buchnera aphidicola str. APS B 26 47.5
Synechocystis sp. PCC 6803 B 37 46.7
Neisseria meningitidis MC58 B 37 46.6
Mycoplasma genitalium B 37 46.1
Clostridium acetobutylicum ATCC824 B 37 46.0
Borrelia burgdorferi B 35 45.9
Mycoplasma pneumoniae B 37 45.8
Nostoc sp. PCC 7120 B 30 44.3
Vibrio cholerae B 28 44.1
Salmonella typhimurium LT2 B 37 43.2
Chlorobium tepidum TLS B 48 43.1
Yersinia pestis KIM B 37 42.5
Deinococcus radiodurans B 30 42.2
Escherichia coli K12 B 37 42.0
Ureaplasma urealyticum B 37 41.6
Corynebacterium glutamicum B 30 40.9
Agrobacterium tumefaciens strain C58 B 30 40.6
Sinorhizobium meliloti 1021 B 26 40.5
Escherichia coli O157:H7 B 37 40.4
Escherichia coli O157:H7 EDL933 B 37 40.2
Pseudomonas aeruginosa B 37 40.1
Caulobacter crescentus B 30 39.5
Streptomyces coelicolor A3(2) B 28 38.2
Halobacterium sp. NRC-1 A 37 37.4
Xylella fastidiosa 9a5c B 26 37.3
Brucella melitensis B 37 37.3
Rickettsia conorii Malish 7 B 37 37.3
Mesorhizobium loti B 26 36.5
Rickettsia prowazekii B 37 36.4
Ralstonia solanacearum B 30 35.2
Mycobacterium leprae B 37 35.2
Mycobacterium tuberculosis H37Rv B 37 34.5
Treponema pallidum B 37 34.4
Mycobacterium tuberculosis CDC1551 B 37 34.3
Xanthomonas campestris pv. campestris B 26 33.2
(a) Optimal growth temperature.
(b) High Tm (Tm > 65 °C, Tm Index > 1) protein percentages.
We applied the data in Table 2 to calculate TI values, which were then used to predict a Tm range for various proteins (Table 1 and Fig. 1). A TI > 1 implies that the Tm of the protein may exceed 65 °C (high Tm protein), whereas a TI < 0 implies that the Tm may be below 55 °C (low Tm protein). As summarized in Table 1 and Fig. 1, the predictions are 100% accurate in each experimental group of the 35 proteins used in the analysis. We thus propose that the analysis of dipeptides within a given amino acid sequence may yield a reliable range for the Tm value of the corresponding protein. Moreover, these data demonstrate the propensity of certain dipeptides to significantly affect the melting temperature of a protein in its native state.
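The decision rule and the per-group accuracy check can be written in a few lines of Python. The function names are hypothetical. Labelling TI values between 0 and 1 as "unassigned" is our choice, because the text gives no Tm range for that band. The sample TI values are invented and are not taken from Table 1.

```python
def predict_class(ti):
    """Map a Tm Index to the Tm range described in the text."""
    if ti > 1.0:
        return "high"        # Tm may exceed 65 °C
    if ti < 0.0:
        return "low"         # Tm may be below 55 °C
    return "unassigned"      # 0 <= TI <= 1: the text states no range for this band


def group_accuracy(labelled):
    """labelled: iterable of (ti, true_group) pairs, true_group 'high' or 'low'.

    Returns the fraction of proteins in each group whose predicted class matches.
    """
    hits, totals = {}, {}
    for ti, group in labelled:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(predict_class(ti) == group)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical TI values
print(group_accuracy([(2.9, "high"), (0.5, "high"), (-1.2, "low")]))
```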
3.2. Prediction of complete microbial genomes
Proteins from hyperthermophilic microorganisms can generally withstand temperatures higher than the boiling point of water. These proteins are also generally strongly piezostable (Mombelli et al., 2002; van den Burg, 2003). This observation has theoretical relevance, since understanding the effects of pressure and temperature on the stability of a protein is as important as understanding its structural aspects when developing a comprehensive model of its thermodynamic stability. However, the structural features that explain the correlation between resistance to heat and resistance to pressure are complex and not adequately understood (Mombelli et al., 2002). From a biotechnological perspective, hyperthermophilic enzymes seem to be more suitable than their mesophilic counterparts for bioprocesses at high temperature and pressure (Mombelli et al., 2002; van den Burg, 2003). Hyperthermophilic microorganisms grow optimally in the range 80–110 °C, and thus these organisms express proteins whose structure and biological activity can withstand very high temperatures.
Fig. 1. TI (Tm Index) of the high and low Tm proteins used in the analysis. Filled squares denote high Tm proteins and hollow squares denote low Tm proteins. The proteins may be identified by their serial number given in column 1 of Table 1.
Fig. 2. The distribution of the high Tm (Tm > 65 °C, TI > 1) protein percentages (HTPP) of the 75 analyzed microbial genomes. Mesophiles (OGT < 55 °C) are highlighted in white, thermophiles (55 °C < OGT < 80 °C) in gray, and hyperthermophiles (OGT > 80 °C) in black.
In our test samples, we also included proteins encoded by completely sequenced microbial genomes, including those of hyperthermophiles, deposited on the NCBI web site. Table 3 lists the 75 genomes in our test set, which together encode about 150,000 proteins whose Tm we predicted. For each genome, we calculated a high Tm protein percentage (HTPP), defined as the percentage of encoded proteins whose predicted Tm is greater than 65 °C (i.e., TI > 1).
All mesophiles had an HTPP of less than 56%, whereas the HTPP of the hyperthermophilic genomes exceeded 56%. The only exception was Bacillus halodurans, a facultative alkaliphile and extremophile from deep-sea environments (Takami and Horikoshi, 2000). This unique high-pressure niche may be the major factor behind this unexpected prediction.
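Once each protein in a genome has a TI, computing the HTPP is a simple count. The Python sketch below computes it and ranks genomes by it, in the same order as Table 3. `htpp` and `rank_genomes` are hypothetical helpers, and the TI lists are invented, not taken from the 75 analyzed genomes.

```python
def htpp(ti_values):
    """High-Tm protein percentage: share of proteins with TI > 1, in percent."""
    values = list(ti_values)
    if not values:
        raise ValueError("no proteins given")
    high = sum(1 for ti in values if ti > 1.0)
    return 100.0 * high / len(values)


def rank_genomes(ti_by_genome):
    """Return (genome, HTPP) pairs sorted by decreasing HTPP, as in Table 3."""
    scored = ((genome, htpp(tis)) for genome, tis in ti_by_genome.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical TI values for two tiny 'genomes'
print(rank_genomes({"genome_1": [2.0, 0.5, 1.5, -1.0], "genome_2": [2.0, 3.0]}))
```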
The clear boundary (Table 3) between the hyperthermophiles and the mesophiles indicates that the rapid TI method can distinguish between these two types of organisms. This analysis included both bacteria and archaea, suggesting that phylogenetic relationships do not drive the results.
The optimal growth temperature (OGT) of the first 14 genomes listed in Table 3 is at least 75 °C, except for Bacillus halodurans. The OGTs of the three thermophiles Thermoplasma volcanium, Methanobacterium thermoautotrophicum and Thermoplasma acidophilum are 60, 65 and 58 °C, respectively. The fact that the OGT of these organisms is below 75 °C may explain their lower HTPP (Table 3). The average HTPP of the mesophiles is 45.1%, whereas that of the hyperthermophiles is markedly higher, at 59.8%. The average HTPP of the thermophiles is 55.6%.
Fig. 2 shows the distribution of HTPP among the 75 genomes. The HTPP of the hyperthermophiles ranges from 56 to 67%. Apparently, an HTPP of about 56% is a necessary requirement for the OGT of hyperthermophiles. The HTPP of the thermophiles ranges from 50 to 62%; by inspection, an HTPP of 50% is the minimum requirement for the OGT of thermophiles. For most mesophiles, the HTPP ranges from 45 to 50%. However, the distribution of the HTPP of mesophiles is relatively wide, ranging from 33 to 56%, suggesting that an HTPP of 33% represents the lowest requirement for mesophiles, given their OGT. The significant differences in the predicted distribution of HTPP among the hyperthermophiles, thermophiles and mesophiles reflect the OGT distribution (Table 3 and Fig. 2).
3.3. Applications in protein engineering
These data suggest that the stability of proteins is possibly determined by the order of certain amino acids in their sequences. This is consistent with previous studies (Reddy, 1996; Reddy et al., 1998a,b). Recently, several different methods for calculating protein stability have been proposed (Vondrasek et al., 2007; Ghosh and Dill, 2009; Persikov et al., 2005; Nigsch et al., 2006; Folch et al., 2008). Together, these studies and ours support the view that no single mechanism is responsible for the remarkable thermodynamic stability of these proteins (Zhou et al., 2008).
Numerous studies have shown that protein inactivation becomes significant only a few degrees below the Tm (Vieille and Zeikus, 2001). In a previous study, an increase in the Tm of a modified esterase always resulted in an increase in the enzyme's temperature of maximal activity (Giver et al., 1998). Our analysis using the rapid TI method for predicting the Tm of a protein indicates that Tm is likely an intrinsic property of the primary structure. This is the first report to demonstrate that sequence-specific elements are significant with regard to Tm. Our data on the specific characteristics of dipeptides may contribute to the modification of existing proteins (enzymes, protein drugs and vaccines) or the design of novel protein molecules having a desired Tm (Hsiao et al., 2003). A recent application of our TI method is the stabilization of the TS-23 alpha-amylase by replacing histidine-436 with aspartate (Lo et al., 2005); in this case, the Tm Index of H436D is higher than that of the wild-type enzyme. Based on this experimental validation, the results presented in Table 2 can guide protein engineers in changing individual residues so that dipeptides with low Tm weight values are converted into dipeptides with high values, thereby increasing the Tm of the target protein. Table 2 also highlights the dipeptides that most affect Tm. The TI (Tm Index) method is provided as a free software platform composed of a dipeptide Tm weight value table and a web-based interface (Online TmPredictor, see also http://tm.life.nthu.edu.tw/).
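As one illustration of how a weight table could guide this kind of engineering, the Python sketch below scores every single-residue substitution by the change it would cause in TI. `substitution_scan`, `tm_index` and the one-entry weight table are hypothetical, and no Table 2 values are used. Only the two dipeptides that contain position i change, so ΔTI = (100/L) × (change in their summed weights)/398, which avoids recomputing the whole sum. Mutation labels follow the old-residue/position/new-residue convention of H436D.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tm_index(seq, weights, default=100.0):
    """TI = ((100 / L) * sum of dipeptide weights - 9372) / 398."""
    length = len(seq)
    total = sum(weights.get(seq[i:i + 2], default) for i in range(length - 1))
    return ((100.0 / length) * total - 9372.0) / 398.0


def substitution_scan(seq, weights, default=100.0, top=5):
    """Rank single-residue substitutions by the TI change they cause (largest first)."""
    seq = seq.upper()
    length = len(seq)

    def w(pair):
        return weights.get(pair, default)

    results = []
    for i, old in enumerate(seq):
        for new in AMINO_ACIDS:
            if new == old:
                continue
            delta = 0.0
            if i > 0:  # dipeptide formed with the preceding residue
                delta += w(seq[i - 1] + new) - w(seq[i - 1] + old)
            if i < length - 1:  # dipeptide formed with the following residue
                delta += w(new + seq[i + 1]) - w(old + seq[i + 1])
            results.append((f"{old}{i + 1}{new}", (100.0 / length) * delta / 398.0))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top]


# Hypothetical weight table in which only Ala-Asp is favourable
print(substitution_scan("AHA", {"AD": 250.0}, top=3)[0][0])  # → H2D
```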
Acknowledgements
We thank the National Science Council, Taiwan, for financial support of this research under Contract No. NSC98-3112-B-007-006, and the computational proteomics service(s) provided by the GMBD Bioinformatics Core, NRPGM, Taiwan (http://www.tbi.org.tw/).
References
Ahmed, S.A., Kawasaki, H., Bauerle, R., Morita, H., Miles, E.W., 1988. Site-directed mutagenesis of the alpha subunit of tryptophan synthase from Salmonella typhimurium. Biochem. Biophys. Res. Commun. 151, 672–678.
Azuaga, A.I., Sepulcre, F., Padros, E., Mateo, P.L., 1996. Scanning calorimetry and Fourier-transform infrared studies into the thermal stability of cleaved bacteriorhodopsin systems. Biochemistry 35, 16328–16335.
Bae, S.J., Sturtevant, J.M., 1995. Thermodynamics of the thermal unfolding of eglin c in the presence and absence of guanidinium chloride. Biophys. Chem. 55, 247–252.
Boice, J.A., Fairman, R., 1996. Structural characterization of the tumor suppressor p16, an ankyrin-like repeat protein. Protein Sci. 5, 1776–1784.
Bolon, D.N., Mayo, S.L., 2001. Polar residues in the protein core of Escherichia coli thioredoxin are important for fold specificity. Biochemistry 40, 10047–10053.
Borghouts, C., Kunz, C., Groner, B., 2005. Current strategies for the development of peptide-based anti-cancer therapeutics. J. Pept. Sci. 11, 713–726.
Brown, L.R., 2005. Commercial challenges of protein drug delivery. Expert Opin. Drug Deliv. 2, 29–42.
Burova, T.V., Choiset, Y., Jankowski, C.K., Haertle, T., 1999. Conformational stability and binding properties of porcine odorant binding protein. Biochemistry 38, 15043–15051.
Burova, T.V., Bernhardt, R., Pfeil, W., 1995. Conformational stability of bovine holo and apo adrenodoxin: a scanning calorimetric study. Protein Sci. 4, 909–916.
Cambillau, C., Claverie, J.M., 2000. Structural and genomic correlates of hyperthermostability. J. Biol. Chem. 275, 32383–32386.
Chakravarty, S., Varadarajan, R., 2000. Elucidation of determinants of protein stability through genome sequence analysis. FEBS Lett. 470, 65–69.
Culajay, J.F., Blaber, S.I., Khurana, A., Blaber, M., 2000. Thermodynamic characterization of mutants of human fibroblast growth factor 1 with an increased physiological half-life. Biochemistry 39, 7153–7158.
Del Vecchio, P., Graziano, G., Granata, V., Barone, G., Mandrich, L., Manco, G., Rossi, M., 2002. Temperature- and denaturant-induced unfolding of two thermophilic esterases. Biochemistry 41, 1364–1371.
Folch, B., Rooman, M., Dehouck, Y., 2008. Thermostability of salt bridges versus hydrophobic interactions in proteins probed by statistical potentials. J. Chem. Inf. Model. 48, 119–127.
Frokjaer, S., Otzen, D.E., 2005. Protein drug stability: a formulation challenge. Nat. Rev. Drug Discov. 4, 298–306.
Ghosh, K., Dill, K.A., 2009. Computing protein stabilities from their chain lengths. Proc. Natl. Acad. Sci. U.S.A. 106, 10649–10654.
Gianese, G., Bossa, F., Pascarella, S., 2002. Comparative structural analysis of psychrophilic and meso- and thermophilic enzymes. Proteins 47, 236–249.
Giletto, A., Pace, C.N., 1999. Buried, charged, non-ion-paired aspartic acid 76 contributes favorably to the conformational stability of ribonuclease T1. Biochemistry 38, 13379–13384.
Giver, L., Gershenson, A., Freskgard, P.O., Arnold, F.H., 1998. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. U.S.A. 95, 12809–12813.
Guruprasad, K., Reddy, B.V., Pandit, M.W., 1990. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4, 155–161.
Haney, P.J., Badger, J.H., Buldak, G.L., Reich, C.I., Woese, C.R., Olsen, G.J., 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. U.S.A. 96, 3578–3583.
Harushima, Y., Sugai, S., 1989. Hydrogen exchange of the tryptophan residues in bovine, goat, guinea pig, and human alpha-lactalbumin. Biochemistry 28, 8568–8576.
Hollien, J., Marqusee, S., 1999. A thermodynamic comparison of mesophilic and thermophilic ribonucleases H. Biochemistry 38, 3831–3836.
Hsiao, N.W., Samuel, D., Liu, Y.N., Chen, L.C., Yang, T.Y., Jayaraman, G., Lyu, P.C.,
2003. Mutagenesis study on the zebra sh SOX9 high-mobility group: com-
parison of sequence and non-sequence specic HMGdomains. Biochemistry 42,
1118311193.
Ishii, Y., 1994. The local and global unfolding of coiled-coil tropomyosin. Eur. J.
Biochem. 221, 705712.
Johnson, C.R., Morin, P.E., Arrowsmith, C.H., Freire, E., 1995. Thermodynamic analysis
of the structural stability of the tetrameric oligomerizationdomainof p53tumor
suppressor. Biochemistry 34, 53095316.
Karantza, V., Freire, E., Moudrianakis, E.N., 2001. Thermodynamic studies of the core
histones: stability of the octamer subunits is not altered by removal of their
terminal domains. Biochemistry 40, 1311413123.
Kellis Jr., J.T., Nyberg, K., Fersht, A.R., 1989. Energetics of complementary side-chain
packing in a protein hydrophobic core. Biochemistry 28, 49144922.
Lebbink, J.H., Knapp, S., van der Oost, J., Rice, D., Ladenstein, R., de Vos, W.M., 1999.
Engineering activity andstability of Thermotoga maritima glutamate dehydroge-
nase. II: construction of a 16-residue ion-pair network at the subunit interface.
J. Mol. Biol. 289, 357369.
Lentz, B.R., Zhou, C.M., Wu, J.R., 1994. Phosphatidylserine-containing membranes
alter the thermal stability of prothrombins catalytic domain: a differential scan-
ning calorimetric study. Biochemistry 33, 54605468.
Leppanen, V.M., Nummelin, H., Hansen, T., Lahti, R., Schafer, G., Goldman, A., 1999.
Sulfolobus acidocaldarius inorganic pyrophosphatase: structure, thermosta-
bility, and effect of metal ion in an archael pyrophosphatase. Protein Sci. 8,
12181231.
Leung, K.W., Liaw, Y.C., Chan, S.C., Lo, H.Y., Musayev, F.N., Chen, J.Z., Fang, H.J.,
Chen, H.M., 2001. Signicance of local electrostatic interactions in staphylococ-
cal nuclease studied by site-directed mutagenesis. J. Biol. Chem. 276, 46039
46045.
Leveque, V.J., Stroupe, M.E., Lepock, J.R., Cabelli, D.E., Tainer, J.A., Nick, H.S., Silver-
man, D.N., 2000. Multiple replacements of glutamine 143 in human manganese
superoxide dismutase: effects onstructure, stability, andcatalysis. Biochemistry
39, 71317137.
Liang, H.K., Huang, C.M., Ko, M.T., Hwang, J.K., 2005. Amino acid coupling patterns
in thermophilic proteins. Proteins 59, 5863.
Lo, H.F., Chen, Y.H., Hsiao, N.W., Chen, H.L., Hu, H.Y., Hsu, W.H., Lin, L.L., 2005. Sta-
bilization of a truncated Bacillus sp. strain TS-23 alpha-amylase by replacing
histidine-436 with aspartate. World J. Microbiol. Biotechnol. 21, 411416.
Marr, A.K., Gooderham, W.J., Hancock, R.E., 2006. Antibacterial peptides for therapeutic use: obstacles and realistic outlook. Curr. Opin. Pharmacol. 6, 468–472.
Masino, L., Martin, S.R., Bayley, P.M., 2000. Ligand binding and thermodynamic stability of a multidomain protein, calmodulin. Protein Sci. 9, 1519–1529.
Mombelli, E., Shehi, E., Fusi, P., Tortora, P., 2002. Exploring hyperthermophilic proteins under pressure: theoretical aspects and experimental findings. Biochim. Biophys. Acta 1595, 392–396.
Morii, H., Uedaira, H., Ogata, K., Ishii, S., Sarai, A., 1999. Shape and energetics of a cavity in c-Myb probed by natural and non-natural amino acid mutations. J. Mol. Biol. 292, 909–920.
Moutiez, M., Burova, T.V., Haertle, T., Quemeneur, E., 1999. On the non-respect of the thermodynamic cycle by DsbA variants. Protein Sci. 8, 106–112.
Nigsch, F., Bender, A., van Buuren, B., Tissen, J., Nigsch, E., Mitchell, J.B., 2006. Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization. J. Chem. Inf. Model. 46, 2412–2422.
Padmanabhan, S., Laurents, D.V., Fernandez, A.M., Elias-Arnanz, M., Ruiz-Sanz, J., Mateo, P.L., Rico, M., Filimonov, V.V., 1999. Thermodynamic analysis of the structural stability of phage 434 Cro protein. Biochemistry 38, 15536–15547.
Pakula, A.A., Sauer, R.T., 1990. Reverse hydrophobic effects relieved by amino-acid substitutions at a protein surface. Nature 344, 363–364.
Pedone, E., Cannio, R., Saviano, M., Rossi, M., Bartolucci, S., 1999. Prediction and experimental testing of Bacillus acidocaldarius thioredoxin stability. Biochem. J. 339 (Pt 2), 309–317.
Persikov, A.V., Ramshaw, J.A., Brodsky, B., 2005. Prediction of collagen stability from amino acid sequence. J. Biol. Chem. 280, 19343–19349.
Prevost, M., Wodak, S.J., Tidor, B., Karplus, M., 1991. Contribution of the hydrophobic effect to protein stability: analysis based on simulations of the Ile-96-Ala mutation in barnase. Proc. Natl. Acad. Sci. U.S.A. 88, 10880–10884.
Rahil, J., Pratt, R.F., 1994. Characterization of covalently bound enzyme inhibitors as transition-state analogs by protein stability measurements: phosphonate monoester inhibitors of a beta-lactamase. Biochemistry 33, 116–125.
Reddy, B.V., 1996. Structural distribution of dipeptides that are identified to be determinants of intracellular protein stability. J. Biomol. Struct. Dyn. 14, 201–210.
Reddy, B.V., Datta, S., Tiwari, S., 1998a. Use of propensities of amino acids to the local structural environments to understand effect of substitution mutations on protein stability. Protein Eng. 11, 1137–1145.
Reddy, B.V., Ramesh, P., Tiwari, S., 1998b. MEICPS: substitution mutations to engineer intracellular protein stability. Bioinformatics 14, 225–226.
Remmele Jr., R.L., Bhat, S.D., Phan, D.H., Gombotz, W.R., 1999. Minimization of recombinant human Flt3 ligand aggregation at the Tm plateau: a matter of thermal reversibility. Biochemistry 38, 5241–5247.
Richards, F.M., 1997. Protein stability: still an unsolved problem. Cell Mol. Life Sci. 53, 790–802.
Rosenberg, M., Goldblum, A., 2006. Computational protein design: a novel path to future protein drugs. Curr. Pharm. Des. 12, 3973–3997.
Rudolph, R., Siebendritt, R., Kiefhaber, T., 1992. Reversible unfolding and refolding behavior of a monomeric aldolase from Staphylococcus aureus. Protein Sci. 1, 654–666.
Ruiz-Sanz, J., de Prat Gay, G., Otzen, D.E., Fersht, A.R., 1995. Protein fragments as models for events in protein folding pathways: protein engineering analysis of the association of two complementary fragments of the barley chymotrypsin inhibitor 2 (CI-2). Biochemistry 34, 1695–1701.
Salmaso, S., Bersani, S., Semenzato, A., Caliceti, P., 2006. Nanotechnologies in protein delivery. J. Nanosci. Nanotechnol. 6, 2736–2753.
Sanchez, R., Pieper, U., Melo, F., Eswar, N., Marti-Renom, M.A., Madhusudhan, M.S., Mirkovic, N., Sali, A., 2000. Protein structure modeling for structural genomics. Nat. Struct. Biol. 7 (Suppl.), 986–990.
Sanchez-Ruiz, J.M., Lopez-Lacomba, J.L., Mateo, P.L., Vilanova, M., Serra, M.A., Aviles, F.X., 1988. Analysis of the thermal unfolding of porcine procarboxypeptidase A and its functional pieces by differential scanning calorimetry. Eur. J. Biochem. 176, 225–230.
Sen, A.C., Walsh, M.T., Chakrabarti, B., 1992. An insight into domain structures and thermal stability of gamma-crystallins. J. Biol. Chem. 267, 11898–11907.
Spector, S., Wang, M., Carp, S.A., Robblee, J., Hendsch, Z.S., Fairman, R., Tidor, B., Raleigh, D.P., 2000. Rational modification of protein stability by the mutation of charged surface residues. Biochemistry 39, 872–879.
Staniforth, R.A., Giannini, S., Bigotti, M.G., Cutruzzola, F., Travaglini-Allocatelli, C., Brunori, M., 2000. A new folding intermediate of apomyoglobin from Aplysia limacina: stepwise formation of a molten globule. J. Mol. Biol. 297, 1231–1244.
Suhre, K., Claverie, J.M., 2003. Genomic correlates of hyperthermostability, an update. J. Biol. Chem. 278, 17198–17202.
Szilagyi, A., Zavodszky, P., 2000. Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure 8, 493–504.
Takami, H., Horikoshi, K., 2000. Analysis of the genome of an alkaliphilic Bacillus strain from an industrial point of view. Extremophiles 4, 99–108.
van den Burg, B., 2003. Extremophiles as a source for novel enzymes. Curr. Opin. Microbiol. 6, 213–218.
van den Burg, B., Eijsink, V.G., 2002. Selection of mutations for increased protein stability. Curr. Opin. Biotechnol. 13, 333–337.
Vieille, C., Zeikus, G.J., 2001. Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev. 65, 1–43.
Vogt, G., Woell, S., Argos, P., 1997. Protein thermal stability, hydrogen bonds, and ion pairs. J. Mol. Biol. 269, 631–643.
Vondrasek, J., Kubar, T., Jenney Jr., F.E., Adams, M.W., Kozisek, M., Cerny, J., Sklenar, V., Hobza, P., 2007. Dispersion interactions govern the strong thermal stability of a protein. Chemistry 13, 9022–9027.
Welfle, K., Misselwitz, R., Politz, O., Borriss, R., Welfle, H., 1996. Individual amino acids in the N-terminal loop region determine the thermostability and unfolding characteristics of bacterial glucanases. Protein Sci. 5, 2255–2265.
Zavodszky, P., Kardos, J., Svingor, A., Petsko, G.A., 1998. Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc. Natl. Acad. Sci. U.S.A. 95, 7406–7411.
Zhou, X.X., Wang, Y.B., Pan, Y.J., Li, W.F., 2008. Differences in amino acids composition and coupling patterns between mesophilic and thermophilic proteins. Amino Acids 34, 25–33.

Corresponding author. Tel.: +886 47232105; fax: +886 47128758.