
Structural Prediction through sequence analysis using bioinformatics tools:

Prediction of the species with the most distant fmtA protein among 10 species
using multiple sequence alignment

Nemerisza I. ARAN, Relyando DE FIESTA, Dane Nathalie T. MIRANDA and Justine
Rose F. SANTOS
Chemical Engineering Department, Technological Institute of the Philippines, Manila
1001, Philippines


Abstract

Various bacterial species containing the fmtA protein were obtained from NCBI GenBank.
This paper covers fmtA protein sequences from gram-positive and gram-negative bacteria
with different functions and characteristics. Using ProtParam, the researchers determined
the molecular weight and chemical formula of each protein. B. pseudomallei, with a
molecular weight of 82492.3 Da, has the heaviest fmtA of all the bacteria, and S. aureus,
at 46067.4 Da, the lightest. All the species showed partially hydrophobic profiles in
ProtScale, yet all are hydrophilic according to ProtParam; G. lozoyensis has the highest
number of major hydrophobic peaks and S. epidermidis the lowest. MotifScan provided the
motifs matching each sequence: only two bacteria gave no match, and the rest gave
possible matches. Two domains were identified, β-lactamase and TonB, each shared by
several species. PeptideCutter identified the proteases and chemicals capable of
cleaving each fmtA protein.
Secondary structure analysis using PSIPRED determined how many α-helices, β-sheets
and random coils each fmtA protein has. S. sanguinis has the most α-helices, B.
pseudomallei and H. alvei are tied for the most β-sheets, and Y. regensburgei has the
most random coils.
B. pseudomallei was found to be the farthest from the reference ancestral fmtA protein,
with a distance of 19.57, and was therefore used to determine the tertiary and quaternary
structures.

Keywords: fmtA protein, protein structure, multiple sequence alignment, bacteria,
bioinformatics


Introduction

Protein sequences emerging from genome sequencing projects are of greatest value to
medicine and biology when their structure and function can be identified. With the growing
number of annotated sequences, associating a new sequence with a protein of known
structure can be a significant step towards identifying its biological role. Simple
sequence search methods such as FASTA (Pearson and Lipman, 1988) or the search tools
available at NCBI readily identify close homologs of protein sequences.

Multiple sequence alignment improves the detection of distantly related homologous
proteins. ClustalW and T-Coffee are the most commonly used multiple sequence alignment
programs. In this progressive approach, the sequences are first grouped according to
their similarity into a tree (hierarchical cluster analysis). Starting with the most similar
pairs, all the sequences are then aligned stepwise to each other using dynamic
programming. Both the aligned sequences and the cluster analysis are output, but these
procedures normally do not include any statistical analysis of the significance of the
alignment.
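The stepwise dynamic-programming alignment described above can be illustrated with a minimal global (Needleman-Wunsch) pairwise alignment. This is a sketch under simplifying assumptions: a hypothetical match/mismatch score and a linear gap penalty stand in for the substitution matrices and affine gap penalties that ClustalW and T-Coffee actually use, and only a single pair is aligned rather than a whole guide tree.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b). Scores are illustrative assumptions."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap inserted in b
                              score[i][j - 1] + gap)   # gap inserted in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_align("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: the gap is placed where it costs least under these scores.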

The fmtA gene was identified as a methicillin resistance factor in Staphylococcus
aureus. Inactivation of fmtA increases the sensitivity of methicillin-resistant S. aureus
(MRSA) strains to Triton X-100 and β-lactams and decreases the level of highly
cross-linked peptidoglycan (PG) (Komatsuzawa et al.). In this study, ten fmtA protein
sequences from different bacterial species were submitted to protein structure
prediction tools to compare their primary and secondary structures. The species with
the most distant fmtA was then used to predict the tertiary and quaternary structures.

Methods

Control dataset
A dataset of fmtA protein from different bacterial species obtained from NCBI GenBank
(http://www.ncbi.nlm.nih.gov/) is listed here: EEV75348.1, EFH09490.1, EHL01514.1,
WP_009496590.1, YP_001066210, WP_004191820.1, WP_002496510.1, EHM51293.1,
EHL97443.1, EHM40511.1. The protein sequences of these 10 accession numbers were
used throughout this paper.
Physico-chemical properties. For the computation of physico-chemical properties,
ProtParam (http://web.expasy.org/protparam/) was used. The computed parameters
include the molecular weight, theoretical pI, amino acid composition, atomic composition,
estimated half-life and grand average of hydropathicity (GRAVY). Extinction coefficient,
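absorbance and aliphatic index were calculated using the following formulas:
E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)
Absorb(Prot) = E(Prot) / Molecular_weight
Aliphatic index = X(Ala) + a * X(Val) + b * [ X(Ile) + X(Leu) ]
where Ext(Tyr) = 1490, Ext(Trp) = 5500 and Ext(Cystine) = 125 M^-1 cm^-1 at 280 nm;
X(Ala), X(Val), X(Ile) and X(Leu) are mole percents; and a = 2.9 and b = 3.9 are the
side-chain volumes of valine and of isoleucine/leucine relative to that of alanine.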
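The three formulas can be written out directly as a minimal Python sketch. The coefficients are the ones documented for ProtParam; the assumption that every pair of Cys residues forms a cystine is labeled in the code, and the absorbance function takes the molecular weight as an input rather than computing it. The function names are illustrative, not part of any tool.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as documented for ProtParam.
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125


def extinction_coefficient(seq, all_cys_paired=True):
    """E(Prot) = nTyr*Ext(Tyr) + nTrp*Ext(Trp) + nCystine*Ext(Cystine).
    Assumption: when all_cys_paired is True, every two Cys residues form one cystine."""
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return seq.count("Y") * EXT_TYR + seq.count("W") * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance(ext_coeff, mol_weight):
    """Absorb(Prot) = E(Prot) / molecular weight (absorbance of a 1 g/l solution)."""
    return ext_coeff / mol_weight


def aliphatic_index(seq, a=2.9, b=3.9):
    """X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), with X given in mole percent."""
    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + a * mole_pct("V") + b * (mole_pct("I") + mole_pct("L"))
```

For the S. aureus fmtA in Fig. 1, `absorbance(56160, 46067.4)` gives about 1.219, matching the reported value.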
Identifying Protein Domains. InterProScan (www.ebi.ac.uk/InterProScan/) combines
different protein signature recognition methods and allows a query sequence to be
compared against InterPro, a domain database that integrates most of the major domain
collections available online.
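InterProScan can also be run locally, writing its matches as tab-separated values. The sketch below assumes the standard TSV column order (protein accession, MD5 digest, length, analysis, signature accession, signature description, start, stop, score, status, ...) and a hypothetical E-value cutoff of 1e-5. It keeps only matches with a reported, non-negative E-value, so entries such as the signal-peptide and transmembrane rows scored -1.0 in Table 1 are skipped. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def significant_hits(tsv_text, max_evalue=1e-5):
    """Group InterProScan TSV matches by protein, keeping those with E-value <= max_evalue.
    Assumes the standard column order; the cutoff is an illustrative choice."""
    hits = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 9:
            continue  # skip blank or truncated lines
        protein, analysis, signature, description = row[0], row[3], row[4], row[5]
        start, stop = int(row[6]), int(row[7])
        try:
            evalue = float(row[8])
        except ValueError:
            continue  # "-" means no score was reported for this method
        if 0.0 <= evalue <= max_evalue:  # negative scores such as -1.0 carry no E-value
            hits[protein].append((analysis, signature, description, start, stop, evalue))
    return dict(hits)
```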
Motifs determination. MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan) was used to
find all known motifs that occur in each sequence. When using this tool, the results must
be filtered to show PROSITE profiles.
Transmembrane Segment Prediction.
TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) is used to predict transmembrane
segments in proteins. It also predicts which portions of a protein are likely to lie inside
and outside the cell.
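TMHMM summarizes its prediction as a topology string such as i7-26o (inside, a helix from residue 7 to 26, then outside). The minimal sketch below, assuming that string format, expands it into the inside, transmembrane-helix and outside residue ranges used in Table 3. The function name is hypothetical.

```python
import re


def topology_segments(topology, length):
    """Expand a TMHMM-style topology string (e.g. 'i7-26o') into
    (location, start, end) segments covering residues 1..length."""
    names = {"i": "inside", "o": "outside"}
    segments = []
    side = names[topology[0]]  # side on which the sequence starts
    pos = 1                    # first residue not yet assigned to a segment
    for start, end, next_side in re.findall(r"(\d+)-(\d+)([io])", topology):
        start, end = int(start), int(end)
        if start > pos:
            segments.append((side, pos, start - 1))
        segments.append(("TMhelix", start, end))
        side, pos = names[next_side], end + 1
    if pos <= length:
        segments.append((side, pos, length))
    return segments
```

For example, `topology_segments("i7-26o", 397)` yields the S. aureus row of Table 3: inside 1-6, helix 7-26, outside 27-397. A protein with no predicted helix (topology "o" or "i") comes back as a single segment.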
Locating Coiled-coil Region. To determine the coiled-coil regions of each protein,
COILS (http://ch.embnet.org/software/COILS_form.html) was used. COILS compares a
sequence to a database of known parallel two-stranded coiled-coils and derives a
similarity score.
Hydrophobicity prediction. ProtScale (http://web.expasy.org/protscale/) was used to
compute and represent the profile produced by an amino acid scale on each protein. It
produces a two-dimensional plot of hydrophobicity along the protein sequence, whose
major and minor peaks identify the hydrophobic sites. In analyzing the plot, the
hydrophobic and hydrophilic peaks were counted at scores of +1 and -1, respectively.
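A ProtScale-style profile is a sliding-window average of a per-residue scale. The sketch assumes the Kyte and Doolittle hydropathy scale (the scale on which GRAVY is based), an unweighted window of 9 residues, and counts a "major peak" as each continuous run of the profile above +1 (hydrophobic) or below -1 (hydrophilic). The window settings and peak definition used for Fig. 4 are not stated, so all three are assumptions.

```python
# Kyte & Doolittle (1982) hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over the sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def window_profile(seq, window=9):
    """Unweighted sliding-window average, one value per window position."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def count_peaks(profile, threshold=1.0):
    """Count runs above +threshold (hydrophobic) and below -threshold (hydrophilic)."""
    hydrophobic = hydrophilic = 0
    previous = 0  # state of the previous point: 1 above, -1 below, 0 in between
    for value in profile:
        state = 1 if value > threshold else (-1 if value < -threshold else 0)
        if state != previous:
            if state == 1:
                hydrophobic += 1
            elif state == -1:
                hydrophilic += 1
        previous = state
    return hydrophobic, hydrophilic
```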
Detecting PROSITE signature matches. To detect the functional sites that contribute to
the functional diversity of the proteome, the trusted PROSITE database was searched
with ScanProsite (http://prosite.expasy.org/scanprosite/). ScanProsite is a web-based
tool that determines which PROSITE patterns occur in a given sequence. It can also
check whether other proteins contain the same pattern.
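ScanProsite matches PROSITE patterns, which use a compact syntax: x is any residue, [ST] allows either residue, {P} excludes a residue, x(2) or x(2,4) repeats a position, hyphens separate positions, and < or > anchor the N- or C-terminus. The minimal sketch below translates such a pattern into a Python regular expression and reports overlapping hits. It ignores rarer syntax (for example > inside brackets), and the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a Python regular expression."""
    regex = pattern.rstrip(".")
    regex = regex.replace("{", "[^").replace("}", "]")    # {P}    -> [^P]   (exclusion)
    regex = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", regex)  # x(2,4) -> x{2,4} (repetition)
    regex = regex.replace("x", ".").replace("-", "")      # x -> any residue; drop separators
    regex = regex.replace("<", "^").replace(">", "$")     # N- and C-terminal anchors
    return regex


def find_sites(pattern, seq):
    """1-based start positions of all, possibly overlapping, matches of the pattern."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(seq)]
```

For instance, the N-glycosylation pattern N-{P}-[ST]-{P} becomes N[^P][ST][^P], and the PKC phosphorylation pattern [ST]-x-[RK] becomes [ST].[RK].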
Predicting cleavage sites. PeptideCutter (http://web.expasy.org/peptide_cutter/) was
used to predict potential sites cleaved by proteases or chemicals in a given protein
sequence. This tool can help determine whether the chosen protein can be cleaved by
the enzymes available in the database. Enzymes can be selected all at once or one at
a time.



Prediction of the secondary structure
PSIPRED (www.bioinf.cs.ucl.ac.uk/psipred/) is a popular and accurate method for
predicting the secondary structure of a protein. It predicts which residues form
α-helices, β-strands and coils, from which the number of α-helices and β-sheets in a
protein structure can be determined.
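PSIPRED reports one state per residue (H for helix, E for strand, C for coil). Assuming the counts in the PSIPRED summary table are the numbers of continuous segments of each state (the paper does not say how they were counted), they can be obtained as in this minimal sketch; the function name is hypothetical.

```python
from itertools import groupby


def count_ss_elements(ss):
    """Count contiguous helix (H), strand (E) and coil (C) segments in a
    PSIPRED-style per-residue prediction string."""
    counts = {"H": 0, "E": 0, "C": 0}
    for state, _run in groupby(ss):  # each group is one uninterrupted segment
        if state in counts:
            counts[state] += 1
    return counts
```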

Prediction of the tertiary structure
Dihedral angles about the Cα-C bond (ψ, psi) and the N-Cα bond (φ, phi) of the amino
acid residues, and the empirical distribution of data points in a protein structure, are
determined using Rampage (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php).
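Each backbone dihedral angle is defined by four consecutive atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below applies the standard atan2 formula for a signed dihedral angle to Cartesian coordinates (for example, atom positions read from a PDB file). It is a minimal illustration; Rampage's own implementation may differ.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) defined by four 3-D points,
    viewed along the central bond p1 -> p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Passing the four atoms of a planar trans arrangement returns ±180°, while a planar cis arrangement returns 0°.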

Prediction of the quaternary structure
SWISS-MODEL (http://swissmodel.expasy.org/) is a structural bioinformatics web-
server dedicated to homology modeling of protein 3D structures. Homology modeling is
currently the most accurate method to generate reliable three-dimensional protein
structure models and is routinely used in many practical applications.
Results and Discussion
Primary sequence analysis
















[Sporosarcina newyorkensis]
Number of amino acids 625
MW 72531.6 Da GRAVY: -0.1
Instability Index 43.79, protein is unstable Aliphatic Index 99.28
Extinction Coefficients 113110 Absorbance 1.559
[Burkholderia pseudomallei 1106a]
Number of amino acids 753
MW 82434.2 Da
Instability Index 31.03, protein is stable Aliphatic Index 75.34
Extinction Coefficients 153795 Absorbance 1.864
[Streptococcus sanguinis]
Number of amino acids 592
MW 67379.2 Da GRAVY: -0.235
Instability Index 26.40, protein is stable Aliphatic Index 88.26
Extinction Coefficients 130640 Absorbance 1.939
[Staphylococcus aureus A8115]
Number of amino acids 397
MW 46067.4 Da GRAVY: -0.561
Instability Index 28.01, protein is stable Aliphatic Index 85.08
Extinction Coefficients 56160 Absorbance 1.219
[Roseomonas cervicalis ATCC 49957]
Number of amino acids 731
MW 79740.9 Da GRAVY: -0.312
Instability Index 36.20, protein is stable Aliphatic Index 76.51
Extinction Coefficients 122620 Absorbance 1.538
[Glarea lozoyensis 74030]
Number of amino acids 513
MW 56389.9 Da GRAVY: -0.127
Instability Index 29.45, protein is stable Aliphatic Index 92.76
Extinction Coefficients 55350 Absorbance 0.982
[Staphylococcus epidermidis]
Number of amino acids 400
MW 46620.6 Da GRAVY: -0.584
Instability Index 28.36, protein is stable Aliphatic Index 81.45
Extinction Coefficients 57190 Absorbance 1.227
[Yokenella regensburgei ATCC 43003]
Number of amino acids 733
MW 81182.9 Da GRAVY: -0.521
Instability Index 36.68, protein is stable Aliphatic Index 60.95
Extinction Coefficients 120560 Absorbance 1.485
[Acetobacteraceae bacterium AT-5844]
Number of amino acids 720
MW 79079.1 Da GRAVY: -0.327
Instability Index 29.54, protein is stable Aliphatic Index 75.67
Extinction Coefficients 106120 Absorbance 1.342
[Hafnia alvei ATCC 51873]
Number of amino acids 728
MW 80466.0 Da GRAVY: -0.531
Instability Index 31.68, protein is stable Aliphatic Index 63.69
Extinction Coefficients 106120 Absorbance 1.319
Fig.1. Physico-chemical properties of the fmtA protein in
different bacteria.

Identifying Protein Domains

Fig.2a. Resulting domain for gram-positive bacteria

Fig.2b. Resulting domain for gram-negative bacteria



Match | E-value | Position | Status
[Staphylococcus aureus A8115]
Beta-lactamase | 5.0E-62 | 88-383 | T
Beta-lactamase | 5.0E-62 | 88-383 | T
PENICILLIN-BINDING PROTEIN | 5.9E-33 | 15-295 | T
transmembrane_regions | -1.0 | 9-27 | ?
signal-peptide | -1.0 | 1-31 | ?
[Roseomonas cervicalis ATCC 49957]
G3DSA:2.40.170.20 | 0.0 | 195-731 | T
TonB_dep_Rec | 5.3E-28 | 487-730 | T
TAT | 0.0 | 1-40 | T
G3DSA:2.170.130.10 | 4.1E-38 | 38-192 | T
Plug | 4.4E-22 | 82-181 | T
PTHR32552 | 0.0 | 7-731 | T
PTHR32552:SF0 | 0.0 | 7-731 | T
SSF56935 | 0.0 | 55-731 | T
[Glarea lozoyensis 74030]
Beta-lactamase | 1.1E-48 | 3-354 | T
G3DSA:3.40.710.10 | 3.0E-60 | 3-358 | T
PBP_transp_fold | 3.0E-61 | 5-369 | T
PTHR22935 | 3.5E-40 | 1-337 | T
[Sporosarcina newyorkensis]
Beta-lactamase | 1.2E-47 | 68-379 | T
G3DSA:3.40.710.10 | 9.1E-61 | 56-393 | T
PBP_transp_fold | 1.2E-59 | 51-403 | T
PTHR22935 | 4.0E-41 | 68-378 | T
[Burkholderia pseudomallei 1106a]
Beta-lactamase | 1.2E-47 | 68-379 | T
G3DSA:3.40.710.10 | 9.1E-61 | 56-393 | T
PBP_transp_fold | 1.2E-59 | 51-403 | T
PTHR22935 | 4.0E-41 | 68-378 | T
[Streptococcus sanguinis]
Beta-lactamase | 4.0E-51 | 56-355 | T
G3DSA:3.40.710.10 | 2.1E-66 | 57-355 | T
PBP_transp_fold | 7.4E-66 | 31-371 | T
PTHR22935 | 3.8E-41 | 36-352 | T
[Staphylococcus epidermidis]
Beta-lactamase | 3.9E-49 | 87-368 | T
no description | 1.0E-54 | 70-368 | T
beta-lactamase/transpeptidase-like | 4.5E-68 | 62-387 | T
PENICILLIN-BINDING PROTEIN | 3.9E-29 | 73-386 | T
signal-peptide | -1.0 | 1-26 | ?
transmembrane_regions | -1.0 | 9-29 | ?
[Yokenella regensburgei ATCC 43003]
G3DSA:2.40.170.20 | 0.0 | 197-733 | T
TonB_dep_Rec | 1.7E-27 | 489-732 | T
TonB-siderophor | 0.0 | 79-733 | T
TONB_DEPENDENT_REC_1 | 0.0 | 1-48 | T
TONB_DEPENDENT_REC_2 | 0.0 | 716-733 | T
G3DSA:2.170.130.10 | 3.1E-40 | 32-193 | T
Plug | 1.3E-22 | 78-182 | T
[Acetobacteraceae bacterium AT-5844]
G3DSA:2.40.170.20 | 0.0 | 188-720 | T
TonB_dep_Rec | 1.9E-32 | 494-719 | T
TonB-siderophor | 9.9E-130 | 76-718 | T
G3DSA:2.170.130.10 | 8.8E-38 | 53-182 | T
Plug | 2.1E-21 | 75-174 | T
PTHR32552 | 0.0 | 39-720 | T
PTHR32552:SF0 | 0.0 | 39-720 | T
SSF56935 | 0.0 | 29-720 | T
[Hafnia alvei ATCC 51873]
G3DSA:2.40.170.20 | 0.0 | 196-728 | T
TonB_dep_Rec | 2.0E-25 | 489-727 | T
TonB-siderophor | 0.0 | 80-728 | T
TONB_DEPENDENT_REC_2 | 0.0 | 711-728 | T
G3DSA:2.170.130.10 | 1.0E-40 | 33-192 | T
Plug | 5.1E-24 | 79-181 | T
PTHR32552 | 0.0 | 52-728 | T
PTHR32552:SF0 | 0.0 | 52-728 | T
SSF56935 | 0.0 | 52-728 | T
Table 1. Summary of domain matches and E-values obtained using InterProScan
The functions of unknown proteins can be recognized by matching their motifs with those
of known proteins using InterProScan, a tool from EMBL-EBI. The fmtA proteins of S.
aureus, S. newyorkensis, S. sanguinis and S. epidermidis share the beta-lactamase-related
and beta-lactamase/transpeptidase-like domains. Beta-lactamase catalyzes the opening
and hydrolysis of the beta-lactam ring of beta-lactam antibiotics such as penicillins and
cephalosporins. Most of these antibiotics work by preventing the biosynthesis of the
bacterial cell wall. Having beta-lactamase as the primary domain is consistent with the
ability of staphylococcal and streptococcal bacteria to resist many important antibiotics.
G. lozoyensis also shares this domain and has Peptidase S12, Pab87-related, C-terminal
as another domain. Apart from G. lozoyensis, which is a fungus, these organisms are all
gram-positive bacteria.
The fmtA proteins of R. cervicalis, B. pseudomallei, Y. regensburgei, A. bacterium and
H. alvei share the TonB-dependent receptor (beta-barrel), siderophore receptor and plug
domains. Y. regensburgei and H. alvei also share a TonB-dependent receptor conserved
site, and H. alvei additionally has a TonB box conserved site. TonB is responsible for
interacting with outer membrane receptor proteins, which carry out high-affinity binding
and energy-dependent uptake of specific substrates into the periplasmic space. The
periplasmic space lies between the inner (cytoplasmic) membrane and the outer
membrane of gram-negative bacteria. The bacteria with a TonB domain, R. cervicalis, B.
pseudomallei, Y. regensburgei, A. bacterium and H. alvei, are all gram-negative.





Motif information | Status | Position | Raw-score | N-score | E-value
Fmta [Staphylococcus aureus A8115]
Big-1 (bacterial Ig-like domain 1) domain (BIG1) | Weak match | 1-9 | 33 | 4.128 | 1.6e+03
Ferric malleobactin receptor fmta [Roseomonas cervicalis ATCC 49957]
NHL repeat profile (NHL) | Weak match | 306-318 | 27 | 4.009 | 2.1e+03
Twin arginine translocation (TAT) | Weak match | 1-40 | 858 | 7.882 | 0.28
Putative protein fmta [Glarea lozoyensis 74030]
No match
Fmta family protein [Sporosarcina newyorkensis]
LDL-receptor class B repeat profile (LDLRB) | Weak match | 617-625 | 153 | 5.329 | 99
Ferric malleobactin transporter [Burkholderia pseudomallei 1106a]
Alanine-rich region (ALA_RICH) | Strong match | 27-83 | 48 | 9.353 | 0.0094
Fmta family protein [Streptococcus sanguinis]
No match
Fmta family protein [Staphylococcus epidermidis]
Lysine-rich region (LYS_RICH) | Weak match | 52-83 | 40 | 6.918 | 2.6
Bipartite nuclear localization signal profile (NLS_BP) | Weak match | 52-66 | 3 | 3.000 | 2.1e+04
Putative ferric malleobactin receptor fmta [Yokenella regensburgei ATCC 43003]
Threonine-rich region | Weak match | 67-136 | 38 | 6.929 | 2.5
Putative ferric malleobactin receptor fmta [Acetobacteraceae bacterium AT-5844]
No match
Putative ferric malleobactin receptor fmta [Hafnia alvei ATCC 51873]
MVP (vault) repeat (MVP) | Weak match | 61-114 | 446 | 5.366 | 91
Table 2. Motifs matched in each fmtA sequence, with position, raw score, N-score and E-value


Motifs determination.
Among the resulting motifs, only B. pseudomallei gave a strong match, to the
alanine-rich region motif (ALA_RICH), with the highest N-score observed (9.353). Both S.
sanguinis and A. bacterium gave no match to any motif in the MotifScan database. All
remaining bacteria gave only weak matches, which means that the assigned motifs are
uncertain and may not genuinely occur in the corresponding protein sequences.
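The E-value estimates the number of matches at least as good that would be expected by
chance, that is, the expected number of false positives; the larger the E-value, the more
likely a match is spurious. Among the weak matches, NLS_BP in S. epidermidis has the
largest E-value (2.1 × 10^4) and is therefore the most likely to be a false positive. LDLRB
in S. newyorkensis has a smaller E-value of 99, which is still far too high for the match to
be considered significant.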
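Because an E-value is the expected number of chance matches, larger values mean a match is more likely to be a false positive. Assuming chance hits follow a Poisson distribution (the usual interpretation of database-search E-values; MotifScan's exact statistics may differ), the probability of at least one such hit is 1 - exp(-E). The sketch below ranks the weak matches from Table 2 on that basis; the helper name is hypothetical.

```python
import math

# Weak MotifScan matches from Table 2: (sequence, motif, E-value).
WEAK_MATCHES = [
    ("S. aureus", "BIG1", 1.6e3),
    ("R. cervicalis", "NHL", 2.1e3),
    ("R. cervicalis", "TAT", 0.28),
    ("S. newyorkensis", "LDLRB", 99),
    ("S. epidermidis", "LYS_RICH", 2.6),
    ("S. epidermidis", "NLS_BP", 2.1e4),
    ("Y. regensburgei", "Threonine-rich region", 2.5),
    ("H. alvei", "MVP", 91),
]


def chance_probability(evalue):
    """P(at least one chance hit this good), assuming Poisson-distributed false hits."""
    return 1.0 - math.exp(-evalue)


# Most suspicious (largest E-value) first.
for seq, motif, e in sorted(WEAK_MATCHES, key=lambda m: m[2], reverse=True):
    print(f"{motif:>22} ({seq}): E = {e:g}, P(chance) = {chance_probability(e):.3f}")
```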
S. epidermidis is part of the normal human bacterial flora and is associated with
foreign-body infections. This coccus has low pathogenic potential in people with a strong
immune system. NLS_BP describes a bipartite nuclear localization signal, a specific
sequence within a protein that directs its import into the cell nucleus. Given the weak
NLS_BP match, the S. epidermidis sequence may contain only a few amino acids capable
of mediating transfer into, or infection of, other life forms.
S. newyorkensis is an endospore-forming bacterium capable of transferring some of its
DNA to a host. Endospores can survive for long periods and are commonly found in soil
and water. The function of LDLRB (LDL-receptor class B repeats) is to help regulate and
maintain cholesterol homeostasis in mammalian cells. The difference between the two is
that S. newyorkensis does not need to modify the endospores it forms and may let them
lie dormant for a very long time, whereas the LDL receptor must continually check
cholesterol levels to keep them at equilibrium.




Transmembrane Segment Prediction
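Table 3. Predicted inside, transmembrane-helix and outside segments (residue ranges) of each fmtA protein, based on TMHMM posterior probabilities
Species | Inside | Transmembrane helix | Outside
Staphylococcus aureus A8115 | 1-6 | 7-26 | 27-397
Roseomonas cervicalis ATCC 49957 | 1-731 (entire sequence; no transmembrane helix predicted)
Glarea lozoyensis 74030 | 1-513 (entire sequence; no transmembrane helix predicted)
Sporosarcina newyorkensis | 1-25, 524-531, 590-601 | 26-48, 501-523, 532-554, 569-589, 602-624 | 49-500, 555-568, 625
Burkholderia pseudomallei 1106a | 1-753 (entire sequence; no transmembrane helix predicted)
Streptococcus sanguinis | 1-4, 481-492, 553-563 | 5-24, 458-480, 493-515, 530-552, 564-586 | 25-457, 516-529, 587-592
Staphylococcus epidermidis | 1-6 | 7-29 | 30-400
Yokenella regensburgei ATCC 43003 | 1-733 (entire sequence; no transmembrane helix predicted)
Acetobacteraceae bacterium AT-5844 | 1-720 (entire sequence; no transmembrane helix predicted)
Hafnia alvei ATCC 51873 | 1-728 (entire sequence; no transmembrane helix predicted)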

To determine where each residue lies relative to the cell, the TMHMM plots should be
used as the source of information rather than the segments listed above: the plots show
the probability of each location along the sequence, whereas the table only gives the
predicted location of each segment (inside the cell, within the membrane, or outside the
cell).
Based on the resulting plot for each species, the fmtA proteins of S. aureus, B.
pseudomallei, A. bacterium, H. alvei, Y. regensburgei and S. epidermidis are inside the
cell. G. lozoyensis showed no TM helix, and R. cervicalis is outside the cell. S.
newyorkensis showed inside positions at 25-50, 495-520, 525-550, 555-580 and 600-625.
Lastly, S. sanguinis showed inside positions at 1-25, 450-480, 485-510, 520-560 and
570-592.

Determining the coiled-coil region

Fig 3. Number of coiled-coil regions for different species.
Coiled coils are built from two or more alpha-helices that wind around each other to form
a supercoil. In essence, coiled coils are built from sequence elements of three and four
residues whose hydrophobicity pattern and residue composition are compatible with the
structure of amphipathic alpha-helices. S. newyorkensis has the most coiled-coil regions
(8), and A. bacterium the fewest (2).
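COILS reports a coiled-coil probability for each residue, so the counts in Fig. 3 depend on how a region is delimited. The minimal sketch below assumes a region is a run of residues whose probability is at least 0.5; both the threshold and the minimum run length are hypothetical parameters, since the paper does not state them.

```python
def coiled_coil_regions(probabilities, threshold=0.5, min_length=1):
    """Return 1-based (start, end) ranges where the per-residue coiled-coil
    probability stays >= threshold for at least min_length residues."""
    regions = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last residue is closed.
    for i, p in enumerate(list(probabilities) + [float("-inf")]):
        if p >= threshold:
            if start is None:
                start = i + 1  # 1-based position where the run begins
        elif start is not None:
            if i - start + 1 >= min_length:
                regions.append((start, i))  # residue i (1-based) ends the run
            start = None
    return regions
```

The number of regions for a species is then `len(coiled_coil_regions(probs))`.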

Hydrophobicity prediction
Based on the GRAVY values calculated using ProtParam, the species were considered
hydrophilic because of their low scores. To determine how hydrophobic or hydrophilic
each species is, the ProtScale plot was used as the basis for analysis. Peaks above zero
indicate hydrophobic residues, and peaks below zero indicate hydrophilic residues.



Fig 4. Number of major hydrophobic and hydrophilic peaks in the ProtScale plot of each
species, using ±1 as the base point.

According to the figure, G. lozoyensis, with 13 major peaks, has the most hydrophobic
residues, and S. epidermidis, with 4 major peaks, the fewest. Hydrophilic peaks outnumber
hydrophobic peaks. H. alvei, with 20 peaks, is considered the most hydrophilic of all the
species, while S. aureus, S. epidermidis and S. sanguinis, with 12 peaks each, are the
least hydrophilic.

Determining the post-translational modification
Some post-translational modifications (PTMs) are shared among the bacteria, while others
are unique to one. Fig. 5 shows the PTMs for each bacterium, arranged according to the
similarity of their modifications. Only two bacteria have active sites, namely Y.
regensburgei ATCC 43003 and H. alvei ATCC 51873: their active sites are TonB-dependent
receptor protein signatures 1 and 2 (TonB-DRPS 1 & 2) and TonB-dependent receptor
protein signature 2 (TonB-DRPS 2), respectively. Without TonB, the receptors still bind
their substrates but cannot actively transport them. Active transport is important in
moving ions through
membranes against their electrochemical gradient. Table 4 summarizes how many PROSITE
patterns match the fmtA sequence of each bacterium.





Fig. 5. Post-translational modifications for each bacterium. The name of each bacterium
is placed outside the circle. PTMs shared by all 10 bacteria are located at the center,
PTMs positioned between two colors are shared by two bacteria, and PTMs inside a single
segment of the circle are unique to that bacterium.
[Fig. 5 labels: Glarea lozoyensis 74030; CK2, PKC phosphorylation; N-glycosylation; N-myristoylation; leucine zipper pattern; TonB-DRPS 1 & 2; TonB-DRPS 2]

Detecting PROSITE signature matches
Table 4. Number of PROSITE pattern hits in the fmtA sequence of each bacterium

Protein kinases are the enzymes responsible for phosphorylation: they catalyze the
transfer of a phosphate group from ATP to a substrate. They are commonly involved in
transmitting signals and controlling processes in cells, and their known function is to
modify the activities of proteins. Phosphorylation regulates protein function by forming
and disrupting protein-protein interaction surfaces and by inducing conformational
changes. Because phosphorylation is reversible, it allows cells to respond to stimuli,
making it an ideal tool for signal transduction. The functions of the individual types of
phosphorylation predicted for the given bacteria are as follows. Phosphorylation of
tyrosine residues controls the enzymatic activity of a protein and generates binding sites
for downstream signaling proteins. Casein kinase II (CK2) is mainly concerned with
regulating cellular processes; its activity is independent of cyclic nucleotides and
calcium. Protein kinase C (PKC) preferentially phosphorylates serine or threonine residues
close to a C-terminal basic residue. Additional basic residues near the target residue
increase the maximal rate and lower the Km of the phosphorylation reaction.
fmtA sequence | Hits by pattern
Staphylococcus aureus A8115 | 16 out of 397 amino acids
Roseomonas cervicalis ATCC 49957 | 47 out of 731 amino acids
Glarea lozoyensis 74030 | 31 out of 513 amino acids
Sporosarcina newyorkensis | 24 out of 625 amino acids
Burkholderia pseudomallei 1106a | 35 out of 753 amino acids
Streptococcus sanguinis | 23 out of 592 amino acids
Staphylococcus epidermidis | 22 out of 400 amino acids
Yokenella regensburgei ATCC 43003 | 48 out of 733 amino acids
Acetobacteraceae bacterium AT-5844 | 42 out of 720 amino acids
Hafnia alvei ATCC 51873 | 56 out of 728 amino acids

N-myristoylation serves as a conformational localization switch: conformational changes
in the protein strongly affect the availability of the myristoyl handle for membrane
attachment. It also increases the protein's hydrophobicity and membrane affinity because
the myristoyl group attaches to the N-terminal amino acid of the polypeptide.
N-myristoyltransferase (NMT) is the enzyme that catalyzes this modification.
N-glycosylation improves the functional diversity of the proteome. It acts as a checkpoint
that determines whether folded proteins are transferred to the Golgi. N-linked
glycosylation contributes important properties to protein folding, conformation,
distribution, stability and activity.
Amidation improves the activity of peptides and lengthens their shelf life. Together with
N-terminal acetylation, it reduces the overall charge of peptides, which decreases their
solubility. Moreover, it enhances the resistance of peptides to enzymatic degradation and
increases their stability by making them more closely mimic native proteins. Therefore,
amidation boosts the biological activity of peptides.
S. sanguinis uniquely matches the leucine zipper pattern, a structural motif responsible
for binding DNA within the promoters of genes. Proteins containing it regulate gene
expression, which is important in the development of complex organisms.



Predicting cleavage sites

Fig. 6. Number of possible cleavage sites obtained using PeptideCutter.

With respect to the number of cleavages, Proteinase K cleaved the most sites in each
bacterial sequence. This enzyme preferentially cleaves at aliphatic or aromatic amino
acids such as tyrosine, phenylalanine, tryptophan and histidine (Keil, 1992). It also plays
a major role in destroying proteins in cell lysates (tissue, cell culture cells) and in
releasing nucleic acids. Its large number of cleavage sites reflects the many aromatic
amino acids in almost all of the chosen sequences. Next on the list is Thermolysin, which
cleaves at sites with bulky and aromatic residues: isoleucine, leucine, valine, alanine,
methionine and phenylalanine (Keil, 1992).
Trypsin, in the middle, preferentially cleaves at Arg and Lys in position P1, with higher
rates for Arg, especially at high pH (Keil, 1992). Hydroxylamine, with single-digit counts,
cleaves Asn-Gly bonds (Bornstein & Balian). Lastly, enterokinase, which gave no cleavage
sites at all, is a serine protease that recognizes the
amino acid sequence -Asp-Asp-Asp-Asp-Lys-|-X (Roche) with high specificity.
Enterokinase activates its natural substrate, trypsinogen, releasing trypsin by cleavage
at the C-terminal end of this sequence.
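Several of these cleavage rules can be expressed as regular expressions. This minimal sketch uses simplified rules: the basic trypsin rule (after K or R, not before P), hydroxylamine at Asn-Gly bonds, and enterokinase after DDDDK. PeptideCutter's own trypsin model includes further exceptions, so counts from this sketch need not match Fig. 6. Positions are reported as the 1-based residue just before the cut, and the function names are hypothetical.

```python
import re


def trypsin_sites(seq):
    """Simplified trypsin rule: cut after K or R unless the next residue is P."""
    return [m.start() + 1 for m in re.finditer(r"[KR](?=[^P])", seq)]


def hydroxylamine_sites(seq):
    """Hydroxylamine: cut between Asn and Gly (N|G)."""
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", seq)]


def enterokinase_sites(seq):
    """Enterokinase: cut after the Lys of Asp-Asp-Asp-Asp-Lys (DDDDK|X)."""
    # Zero-width lookahead finds overlapping motifs; +5 points at the Lys (1-based).
    return [m.start() + 5 for m in re.finditer(r"(?=DDDDK.)", seq)]
```

Counting sites for a sequence is then, for example, `len(trypsin_sites(seq))`.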
Like enterokinase, caspases 1-10 gave no cleavage sites in fmtA from any of the bacteria
except Sporosarcina newyorkensis and Streptococcus sanguinis, whose sequences contain
caspase-8 cleavage sites. Enterokinase and granzyme B have cleavage-site preferences
that fmtA lacks, which is why these enzymes gave no cleavage sites in the sequences.
Factor Xa sites were predicted only in Roseomonas cervicalis. Tobacco etch virus
protease sites were predicted in fmtA from Roseomonas cervicalis, Burkholderia
pseudomallei, Yokenella regensburgei and Acetobacteraceae bacterium. Thrombin sites
were likewise predicted only in Roseomonas cervicalis.

Prediction of secondary structure
Table 4. Summary of PSIPRED results: number of α-helices, β-sheets and random coils in the structure of each fmtA protein in different bacteria
Species | α-helix | β-sheet | Random coils
[Staphylococcus aureus A8115] | 11 | 9 | 21
[Roseomonas cervicalis ATCC 49957] | 4 | 36 | 41
[Glarea lozoyensis 74030] | 11 | 19 | 30
[Sporosarcina newyorkensis] | 16 | 17 | 33
[Burkholderia pseudomallei 1106a] | 4 | 38 | 42
[Streptococcus sanguinis] | 15 | 17 | 32
[Staphylococcus epidermidis] | 11 | 9 | 21
[Yokenella regensburgei ATCC 43003] | 3 | 39 | 43
[Acetobacteraceae bacterium AT-5844] | 3 | 36 | 40
[Hafnia alvei ATCC 51873] | 3 | 38 | 42


Secondary protein structure is the specific geometric shape caused by intramolecular
and intermolecular hydrogen bonding of amide groups. It consists of α-helices, β-sheets
and sometimes random coils.
Based on the PSIPRED results for the 10 sequences, β-sheets outnumber α-helices in
most of the sequences. In the α-helix structure, the backbone of the peptide forms the
inner part of the coil while the side chains extend outward from it. β-sheets are more
numerous because not all amino acids favor α-helix formation, owing to steric constraints
of their R-groups. Amino acids such as A, D, E, I, L and M favor the formation of α-helices,
whereas G and P favor disruption of the helix. This is particularly true for P, since it is a
pyrrolidine-based imino acid whose ring structure significantly restricts movement about
the peptide bond in which it is present, thereby interfering with extension of the helix.
Whereas an α-helix is composed of a single linear array of helically disposed amino acids,
β-sheets are composed of two or more different regions, each a stretch of at least 5-10
amino acids.











Fig. 7. Multiple sequence alignment of fmtA. The letter colors indicate the degree of sequence identity at each position.


Multiple sequence alignment is used to detect related proteins and to study the
relationship between the sequences. It has been clearly shown that using multiple
sequence alignments improve upon the detection of distantly related homologous
proteins.
Fig. 7 shows the multiple sequence alignment of the submitted sequences. Each sequence
is listed with its name and alignment, and the color of each letter indicates how good or
bad the alignment is at that position. The first and last parts of the protein show high
sequence identity, indicating that these regions have been conserved through evolution.
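Alignment viewers commonly shade each column by how many sequences share its most common residue (for example, Jalview's percentage-identity coloring). The program used to draw Fig. 7 is not named, so the sketch below is only a minimal illustration of that idea, assuming gaps count as non-matching; the function name is hypothetical.

```python
from collections import Counter


def column_identity(alignment):
    """For each alignment column, the fraction of sequences that carry the
    column's most common residue (gaps are never counted as that residue)."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores
```

Columns scoring close to 1.0 correspond to the well-conserved regions described above.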

Fig. 8. Phylogeny of the fmtA protein from different bacteria, divided into three
clusters. Distances were calculated from percent identity.

The proteins in the alignment can be grouped according to different sequence features.
This allows the proteins to be further sorted into subgroups whose members are most
closely related to each other. A protein that is not well conserved is more evolutionarily
distant.
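Fig. 8 uses distances based on percent identity. The minimal sketch below assumes distance = 100 - percent identity, computed over aligned columns where neither sequence has a gap, relative to a chosen reference sequence. It does not build the tree itself, so it need not reproduce the distance of 19.57 reported for B. pseudomallei; the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def most_distant(aligned, reference):
    """Name and distance (100 - % identity) of the sequence farthest from the reference."""
    ref = aligned[reference]
    distances = {name: 100.0 - percent_identity(ref, seq)
                 for name, seq in aligned.items() if name != reference}
    name = max(distances, key=distances.get)
    return name, distances[name]
```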
Sequences that are more closely related lie closer together on the branches of the tree.
Based on Fig. 8, the ancestor (base point of zero) of the protein is closely related to S.
aureus and S. epidermidis. In contrast, B. pseudomallei is the most distant from the
ancestral protein. B. pseudomallei is a gram-negative bacterial pathogen that normally
survives as a saprophyte in soil and water, but it is also capable of infecting most
mammals and causing serious infections that result in the multifaceted disease
melioidosis. Very little is known about iron acquisition mechanisms in B. pseudomallei.
The bacterium produces a hydroxamate-type siderophore, malleobactin, that can remove
iron from lactoferrin and transferrin, allowing it to grow under iron-limiting conditions.

Prediction of the tertiary structure of B. pseudomallei
The partial double-bond character of the peptide bond makes the peptide group planar: it
limits rotation around the C-N bond, so that the two alpha-carbons and the C, O, N and H
atoms between them lie in one plane. As a result, the third backbone dihedral angle, omega
(ω), is essentially constant at 180°, leaving phi (φ) and psi (ψ) as the freely rotating
backbone angles. These three angles are the most significant local structure parameters in
protein folding. The allowed and favored regions of the dihedral angles in B. pseudomallei
1106a can be seen in Fig. 9. Table 5 shows the name, position and coordinates of the
residues that lie in non-core (outlier) regions.
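The φ, ψ and ω angles discussed above are dihedral (torsion) angles, each defined by four consecutive backbone atoms (φ: C(i-1), N, Cα, C; ψ: N, Cα, C, N(i+1); ω: Cα, C, N(i+1), Cα(i+1)). A minimal sketch of the calculation follows, assuming the atom coordinates are already available as (x, y, z) tuples (reading them from a model file is not shown). The coordinates in the example are hypothetical and simply place four atoms in a planar trans arrangement, as for ω ≈ 180°.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (range -180 to 180)."""
    b0 = _sub(p0, p1)            # bond pointing back toward the first atom
    b1 = _sub(p2, p1)            # central bond
    b2 = _sub(p3, p2)            # bond toward the last atom
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical planar trans arrangement of four atoms.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # prints 180.0
```

Moving the last atom to (1, 0, 1) gives the cis arrangement and an angle of 0°.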

Fig. 9. Ramachandran plot of B. pseudomallei. Residues in outlier regions are numbered
accordingly.

Table 5. Name, position and coordinates of the residues that lie in non-core (outlier) regions

Residue               Position   Coordinates (φ, ψ), °
Threonine (Thr)          96      -175.32,  -11.24
Valine (Val)            157       133.40,  157.46
Proline (Pro)           165      -115.66,  152.39
Tryptophan (Trp)        173        46.95,  151.16
Alanine (Ala)           201       158.76, -166.87
Aspartic acid (Asp)     236       136.98, -175.22
Proline (Pro)           261         4.87,   40.87
Histidine (His)         328       154.55,  128.87
Asparagine (Asn)        345      -173.79, -140.22
Threonine (Thr)         407       -65.02,  -85.34
Alanine (Ala)           432      -173.28, -149.78
Proline (Pro)           448       -62.94, -139.77
Glycine (Gly)           454      -177.83,  -58.14
Proline (Pro)           455        56.81,  105.40
Arginine (Arg)          482       161.14,  -43.91
Lysine (Lys)            545       -65.87, -150.16
Glycine (Gly)           549        31.09,   30.59
Proline (Pro)           568       -44.89,  163.52
Proline (Pro)           569       -43.24,  -73.29
Serine (Ser)            620        -7.65,  -78.12
Valine (Val)            651       -55.41,  180.00
Proline (Pro)           652         6.62,   54.3
Arginine (Arg)          711      -159.69, -128.76
Alanine (Ala)           735      -172.95,  -69.74

Glycine has only a hydrogen atom as its side chain, so it can adopt a wider range of φ/ψ
combinations, while proline is restricted in the Ramachandran plot because its cyclic side
chain limits φ to a range of about -35° to -85°. Illustrations of the favored and allowed
regions for glycine, pre-proline and proline residues are provided in Fig. 10.
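Fig. 10 treats glycine, proline and pre-proline residues separately from all other residues because their allowed φ/ψ regions differ. A minimal sketch of how a residue could be assigned to one of these four plot types from a one-letter sequence is shown below. The precedence rule (glycine and proline are classified as themselves even when followed by proline) is an assumption about the convention, and the sequence is hypothetical.

```python
def ramachandran_category(sequence, i):
    """Return the Ramachandran plot type for residue i (0-based)
    of a one-letter amino-acid sequence."""
    residue = sequence[i]
    if residue == "G":      # glycine: only a hydrogen side chain
        return "Glycine"
    if residue == "P":      # proline: phi constrained by the ring
        return "Proline"
    # Assumed rule: any other residue directly followed by proline is pre-proline.
    if i + 1 < len(sequence) and sequence[i + 1] == "P":
        return "Pre-proline"
    return "General"

toy = "MAVPGPK"  # hypothetical sequence, not the B. pseudomallei model
print([ramachandran_category(toy, i) for i in range(len(toy))])
# prints ['General', 'General', 'Pre-proline', 'Proline', 'Glycine', 'Proline', 'General']
```

The precedence matters for pairs such as Gly454-Pro455 and Pro568-Pro569 in Table 5, where a residue could otherwise fall into two categories.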


Fig. 10. Ramachandran plots for glycine, pre-proline and proline residues. Dark colors
indicate the favored region, while lighter colors designate the allowed region.

The Ramachandran plot illustrates the φ and ψ angles of the residues in B. pseudomallei
1106a. The percentages of residues in the favored, allowed and outlier regions are 88.9%,
7.5% and 3.6%, respectively, which is somewhat far from the expected values of ~98%
(favored) and ~2% (allowed). Out of 573 residues, 24 lie in outlier regions. The outlier
fraction indicates how well the structure fits the expected main-chain distribution of
torsional angles.
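The percentages quoted above are ratios of residue counts. A minimal sketch, assuming each residue has already been labelled favored, allowed or outlier by a validation tool such as RAMPAGE (the labelling itself needs reference φ/ψ distributions and is not shown). The labels below are hypothetical, and the 98% threshold is the approximate expectation stated in the text.

```python
from collections import Counter

EXPECTED_FAVORED = 98.0  # approximate expectation quoted in the text (%)

def region_percentages(regions):
    """Percentage of residues falling in each Ramachandran region."""
    counts = Counter(regions)
    total = len(regions)
    return {name: 100.0 * counts[name] / total
            for name in ("favored", "allowed", "outlier")}

# Hypothetical per-residue labels for a 200-residue model.
labels = ["favored"] * 180 + ["allowed"] * 14 + ["outlier"] * 6
pct = region_percentages(labels)
print(pct)  # prints {'favored': 90.0, 'allowed': 7.0, 'outlier': 3.0}
print("below expectation" if pct["favored"] < EXPECTED_FAVORED else "ok")
```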

Prediction of quaternary structure


Fig. 11. Results obtained using SWISS-MODEL, giving the QMEAN Z-score, C-beta interaction
energy and all-atom interaction energy.

Fig. 12a. 3-dimensional structure of B.
pseudomallei 1106a.
Fig. 12b. 3-dimensional structure of B.
pseudomallei 1106a colored by
temperature factor (B-factor). High values
are colored in warmer (red) colors and
low values in colder (blue) colors.


Fig. 13. Assessment of the quality of the homology model.


QMEAN (Qualitative Model Energy Analysis) calculates global and local quality
estimates on the basis of a single model. The data shown allow us to inspect the
differences between the models and help us understand the expected accuracy of the
model.
(a) Represents the QMEAN scores of the reference structures from the PDB. The Z-score
indicates how many standard deviations the model score differs from the expected value.
The model has a Z-score of -5.42, which is far from the mean. (b) Is a projection of the
first plot for the given protein size; it also shows the number of reference models used in
the calculation. (c) Shows that a low-quality model has a strongly negative QMEAN Z-score.
Good structures are said to lie in the light red to blue region. The data show that the model
is of very low quality because of its strongly negative Z-score.
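The Z-score described in (a) is the number of standard deviations between the model's score and the mean score of the reference structures. A minimal sketch of that arithmetic follows. The reference scores and model score are hypothetical, and whether QMEAN uses the sample or the population standard deviation is an assumption here (the sketch uses the sample version).

```python
from statistics import mean, stdev

def z_score(model_score, reference_scores):
    """Number of standard deviations model_score lies from the reference mean."""
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)  # sample standard deviation (assumed)
    return (model_score - mu) / sigma

# Hypothetical reference scores for proteins of similar size, and a model score.
reference = [0.70, 0.72, 0.74, 0.76, 0.78]
z = z_score(0.58, reference)
print(round(z, 2))
```

A model scoring well below the reference mean, as here, yields a strongly negative Z-score, which is the situation described for the B. pseudomallei model.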


Fig. 14. ANOLEA evaluates the packing quality of the model. The y-axis shows the
energy for each amino acid of the protein chain. QMEAN estimates the global quality for
all models.

ANOLEA (Atomic Non-Local Environment Assessment), an atomic empirical mean force
potential, performs energy calculations on a protein chain. Negative energy values, shown in
green, signify a favorable energy environment, while positive energy values, shown in red,
signify an unfavorable energy environment for a given amino acid. The plot immediately
reveals regions where atoms come too close to each other and where the energy is very
high. In protein structures, amino acid residues have preferred locations, and the red regions
show where the energy of the model is much higher because residues make bad contacts.
We can conclude that the green regions are favorable, whereas the red regions indicate
where the model is likely to be incorrect.
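Reading an ANOLEA-style profile amounts to finding runs of residues with positive (unfavorable) energy. A minimal sketch, assuming the per-residue energies have already been extracted as a list of numbers (parsing the server output is not shown). The energies and the minimum run length `min_length` are hypothetical choices for illustration.

```python
def unfavorable_stretches(energies, min_length=3):
    """Return (start, end) index pairs (0-based, inclusive) of runs where the
    per-residue energy is positive for at least min_length residues."""
    stretches = []
    start = None
    for i, e in enumerate(energies):
        if e > 0:
            if start is None:
                start = i                      # a positive run begins
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(energies) - start >= min_length:
        stretches.append((start, len(energies) - 1))
    return stretches

# Hypothetical per-residue energies (negative = favorable, positive = unfavorable).
energies = [-2.1, -0.5, 1.2, 3.4, 0.8, -1.0, 0.3, -0.2, 2.0, 2.5, 1.1, 0.9]
print(unfavorable_stretches(energies))  # prints [(2, 4), (8, 11)]
```

Requiring a minimum run length ignores isolated positive residues and highlights the longer red regions that are more likely to reflect a modelling problem.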

Conclusion
All the species used contain the fmtA protein; the bacteria are of different kinds and have
different functions. Some of the proteins belong to the fmtA family, while others are ferric
malleobactin receptors and transporters. Various databases were used to compare the
species with one another, starting with molecular weight, GRAVY and number of amino
acids. The matching domains and motifs were all different, and the predicted locations of
the transmembrane helices had different posterior probabilities.
In the multiple alignment, the average distance was calculated using % identity, and
B. pseudomallei came out as the most distant species among all the species used.

References:
[1] Bornstein P., Balian G. Cleavage at Asn-Gly bonds with hydroxylamine. Methods in
Enzymology (1977) 47: 132-144
[2] Roche. Enterokinase product description.
http://www.roche-applied-science.com/proddata/gpip/3_1_3_7_10_1.html
[3] Keil, B. Specificity of proteolysis. Springer-Verlag Berlin-Heidelberg-New York,
pp.335. (1992)
[4] Komatsuzawa H, et al. 1997. Cloning and characterization of the fmt gene which
affects the methicillin resistance level and autolysis in the presence of Triton X-100 in
methicillin-resistant Staphylococcus aureus. Antimicrob. Agents Chemother. 41:2355-2361.
[5] Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL
Workspace: A web-based environment for protein structure homology
modelling. Bioinformatics, 22, 195-201.
[6] Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated
protein homology-modeling server. Nucleic Acids Research 31: 3381-3385.
Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An
environment for comparative protein modelling. Electrophoresis 18: 2714-2723.
[7] Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444.
A comparison of sequence and structure protein domain families as a basis for structural
genomics. Arne Elofsson and Erik L. L. Sonnhammer. Department of Biochemistry,
Stockholm University, November 11, 1998
[8] Crystal structure of a D-aminopeptidase from Ochrobactrum anthropi, a new member
of the 'penicillin-recognizing enzyme' family.
[9] Bompard-Gilles C, Remaut H, Villeret V, Prange T, Fanuel L, Delmarcelle M, Joris B,
Frere J, Van Beeumen J.
Structure 8 971-80 2000
PMID: 10986464
[10] EstB from Burkholderia gladioli: a novel esterase with a beta-lactamase fold reveals
steric factors to discriminate between esterolytic and beta-lactam cleaving activity.
Wagner UG, Petersen EI, Schwab H, Kratky C.
Protein Sci. 11 467-78 2002
PMID: 11847270
[11] Understanding the acylation mechanisms of active-site serine penicillin-recognizing
proteins: a molecular dynamics simulation study.
Oliva M, Dideberg O, Field MJ.
Proteins 53 88-100 2003
PMID: 12945052
[12] Beta-lactamase of Bacillus licheniformis 749/C. Refinement at 2 Å resolution and
analysis of hydration.
Knox JR, Moews PC.
J. Mol. Biol. 220 435-55 1991
PMID: 1856867
[13] The active-site-serine penicillin-recognizing enzymes as members of the
Streptomyces R61 DD-peptidase family.
Joris B, Ghuysen JM, Dive G, Renard A, Dideberg O, Charlier P, Frere JM, Kelly JA,
Boyington JC, Moews PC.
Biochem. J. 250 313-24 1988
PMID: 3128280
[14] The phototrophic bacterium Rhodopseudomonas capsulata sp108 encodes an
indigenous class A beta-lactamase.
Campbell JI, Scahill S, Gibson T, Ambler RP.
Biochem. J. 260 803-12 1989
PMID: 2788410
[15] X-ray structure of Streptococcus pneumoniae PBP2x, a primary penicillin target
enzyme.
Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O.
Nat. Struct. Biol. 3 284-9 1996
PMID: 8605631
[16] Crystal structure of the outer membrane active transporter FepA from Escherichia
coli.
Buchanan SK, Smith BS, Venkatramani L, Xia D, Esser L, Palnitkar M, Chakraborty R,
van der Helm D, Deisenhofer J.
Nat. Struct. Biol. 6 56-63 1999
PMID: 9886293
[17] Transmembrane signaling across the ligand-gated FhuA receptor: crystal structures
of free and ferrichrome-bound states reveal allosteric changes.
[18] Locher KP, Rees B, Koebnik R, Mitschler A, Moulinier L, Rosenbusch JP, Moras D.
Cell 95 771-8 1998
PMID: 9865695
[19] Structural basis of gating by the outer membrane transporter FecA.
Ferguson AD, Chakraborty R, Smith BS, Esser L, van der Helm D, Deisenhofer J.
Science 295 1715-9 2002
PMID: 11872840
[20] Substrate-induced transmembrane signaling in the cobalamin transporter BtuB.
Chimento DP, Mohanty AK, Kadner RJ, Wiener MC.
Nat. Struct. Biol. 10 394-401 2003
PMID: 12652322
[21] Three paradoxes of ferric enterobactin uptake.
Klebba PE. Front. Biosci. 8 s1422-36 2003. PMID: 12957833
[22] The Escherichia coli outer membrane cobalamin transporter BtuB: structural
analysis of calcium and substrate binding, and identification of orthologous transporters
by sequence/structure conservation.
Chimento DP, Kadner RJ, Wiener MC.
[23] Swiss Institute of Bioinformatics. Available:
<http://prosite.expasy.org/cgi-bin/prosite/ScanView.cgi?scanfile=623843423385.scan.gz>.
Accessed 16 May 2013.
[24] Thermo Fisher Scientific Inc. (2013).
Available: <http://www.piercenet.com/browse.cfm?fldID=7CE3FCF5-0DA0-4378-A513-2E35E5E3B49B>.
Accessed 12 May 2013.
[25] Hubbard SR, Till JH. (2000).
Available: <http://www.ncbi.nlm.nih.gov/pubmed/10966463>. Accessed 12 May 2013.
[26] Manning G, Whyte DB, et al. (2002). "The protein kinase complement of the human
genome". Science 298 (5600): 1912-1934. doi:10.1126/science.1075762. PMID 12471243.
[27] Francis SH, Corbin JD (August 1999). "Cyclic nucleotide-dependent protein
kinases: intracellular receptors for cAMP and cGMP action". Crit Rev Clin Lab
Sci 36 (4): 275-328. doi:10.1080/10408369991239213. ISSN 1040-8363. PMID 10486703.
[28] American Association for Cancer Research (cAMP-responsive Genes and Tumor
Progression). Available:
<https://en.wikipedia.org/wiki/Cyclic_adenosine_monophosphate>. Accessed 12 May
2013.
[29] LifeTein. Free Modifications: N-Terminal Acetylation and C-Terminal Amidation.
Available: <http://www.lifetein.com/Peptide-Synthesis-Amidation-Acetylation.html>.
Accessed 16 May 2013.
[30] Landschulz WH, Johnson PF, McKnight SL (1988-06-24). "The leucine zipper: a
hypothetical structure common to a new class of DNA-binding
proteins". Science 240 (4860): 1759-1764. doi:10.1126/science.3289117. PMID 3289117
[31] Berger-Bächi B, Strässle A, Gustafson J E, Kayser F H. Mapping and
characterization of multiple chromosomal factors involved in methicillin resistance
in Staphylococcus aureus. Antimicrob Agents Chemother. 1992;36:1367-1373.
[32] Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444.
[33] Springer 2013
http://www.springerreference.com/docs/html/chapterdbid/34498.html Jeon H., Meng W.,
Takagi J., Eck M.J., Springer T.A., Blacklow S.C. Implications for familial
hypercholesterolemia from the structure of the LDL receptor YWTD-EGF domain pair.
Nat. Struct. Biol. 8:499-504 (2001). PubMed ID 11373616. DOI
10.1038/88556

[34] S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant,
J.S. Richardson and D.C. Richardson (2002) Structure validation by Calpha geometry:
phi,psi and Cbeta deviation. Proteins: Structure, Function & Genetics. 50: 437-450.
Available: < http://mordred.bioc.cam.ac.uk/~rapper/rampage2.php>. Accessed 16 May
2013
[35] Karadaghi, S.A. (2012). Available:
<http://www.proteinstructures.com/Structure/Structure/Ramachandran-plot.html>.
Accessed 12 May 2013
[36] Kleywegt, G.J., Jones, A.T. (1996). Phi/Psi-chology: Ramachandran revisited.
Available: < http://www.greeley.org/~hod/papers/ByAuthor/Jones/s4_1996_1395.pdf >.
Accessed 12 May 2013
[36] Swiss Institute of Bioinformatics. Swiss-Model Workspace. Available:
<http://swissmodel.expasy.org/workspace/index.php?userid=arsza_01@yahoo.com&key=3e0d577ab36ad95cd472ca012a0defe0&func=workspace_modelling&prjid=P000008>.
Accessed 16 May 2013
