Trypsinogen-nuc.txt
ChymoProtSeq.txt
(b) The authors are: Szenthe, B., Frost, C., Szilagyi, L., Patthy, A., Naude, R. and
Graf, L. Journal: Biochim. Biophys. Acta 1748 (1), 35-42 (2005).
(c)
TrypProtSeq.txt.
PGDADDDKIVGGYNCPAHSVPYQVSLNAGYHFCGGSLINSQWVLSAAHCYKSSIQVRLGEYNIDVREDSEVVRSSAAVIRHPK
YSSRSLDNDIMLIKLASPVAYSADVQPIALPSSCVKAGTKCLISGWGNTLSSGSSFPEILQCLQAPVLSDRECRNAYPGEISS
NMICVGFLEGGKDSCQGDSGGPVVCDGTLQGIVSWGIGCAQKGYPGVYTKVCNYVSWIQETIAAY
ChymoProtSeq.txt.
FFFWPQAHALLLCNKVSPALCSAPCWHCWSEAIPTALPLGHNLVDPVSEVADAGVDSGRADVAVAASPGDDANQGPGVPVLVH
EGTPGVTLAGRGSSPSGAQHGAGDAGAPVLLALALRDQGQGDLLQARRQSVGGGAGASPAGGDALQGVGQNQVGSSQAHGGHA
GAQLGGRGELQQRDVVVEGVGVPVGVRDGLRHGLHLDGLSLGVEVVLPEDDDVGIWAELAVGGCDHPVLVDQRTSTEVGSRAG
LQGHLPGPGAGHRVLPIDDPLAVLHRGADQGHPAA&NGAGQGQAGDCPQQRH
To obtain the protein sequences above, I first went to the NCBI (National Center for
Biotechnology Information) site at https://www.ncbi.nlm.nih.gov/. In the search bar
I entered "Ostrich trypsinogen"; for red fowl chymotrypsinogen I searched by
accession number. I opened each nucleotide sequence in FASTA format, a text-based
format for representing nucleotide or peptide sequences in which bases or amino
acids are written as single-letter codes. I copied the nucleotide sequence and
pasted it into Gene Runner, a simple software tool for analysing nucleotide
sequences, including translating them into protein sequences. In Gene Runner I
opened the open reading frame tool and, after clicking Search, examined the
translation frames. I chose the +3 translation because it had the highest molecular
weight. A higher molecular weight indicates a longer translated product, which
makes it more likely that the frame encodes the desired protein sequence.
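The frame-by-frame translation described above can be sketched in Python. This is
only an illustration using the standard genetic code, not Gene Runner's own method.
The function names and the toy FASTA record are assumptions made for the example,
and frames are compared by their longest stop-free stretch rather than by molecular
weight.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_fasta(text):
    """Return the sequence of a single-record FASTA string (header dropped)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(line for line in lines if not line.startswith(">")).upper()


def translate(seq, frame):
    """Translate seq in forward frame 1, 2 or 3; '*' marks a stop codon."""
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein):
    """Return the longest run of residues between stop codons."""
    return max(protein.split("*"), key=len)


# Hypothetical toy record, not one of the sequences used in this assignment.
record = ">toy example (hypothetical)\nCCATGAAAGTG\nTTTTAA\n"
seq = parse_fasta(record)
for frame in (1, 2, 3):
    protein = translate(seq, frame)
    print(frame, protein, longest_open_stretch(protein))
```

Only the three forward frames are covered; the reverse-strand frames would need the
reverse complement of the sequence first.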
(d)
TrypTopFive.prt
ChymoTopFive.prt.
To obtain the similar protein sequences above, I went to the NCBI site and, under
"Analyze these sequences", clicked Run BLAST and then protein BLAST. I checked the
algorithm parameters and clicked BLAST. I then selected the top five similar
protein sequences by clicking the accession number shown as a link on the
right-hand side of the page. BLAST (Basic Local Alignment Search Tool) finds
regions of similarity between biological sequences. The program compares
nucleotide or protein sequences against sequence databases and calculates the
statistical significance of the matches.
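As a rough illustration of what "statistical significance" means here: under
Karlin-Altschul statistics, a bit score S' is commonly converted to an expect value
as E = m * n * 2^(-S'), where m is the query length and n the database length. The
sketch below applies that formula only; the function name and all numbers are
hypothetical, and the real BLAST program additionally uses effective lengths and
other corrections that are not modelled here.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits with at least this bit score.

    Simplified relation E = m * n * 2**(-S'); no effective-length correction.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers for illustration only.
query_length = 250           # residues in the query protein
database_length = 1_000_000  # total residues in the searched database
for bit_score in (30, 50, 100):
    e = expect_value(bit_score, query_length, database_length)
    print(f"bit score {bit_score}: E = {e:.3g}")
```

Each extra bit of score halves the E-value, which is why hits with high bit scores
are treated as highly significant.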
(e)
ChymoEvolProt.txt.
TrypEvolProt.txt
(f)
TrypProtAlign.aln
AAT11803.2 ------------------------------------------------------------
XP_013809425.1 ------------------------------------------------------------
XP_013813435.1 MKVFLILSCLGAVVAVPGDADDDKIVGGYNCPAHSVPYQVSLNAGYHFCGGSLINNQWVV
XP_013026274.1 ------------------------------------------------------------
ALA65723.1 ------------------------------------------------------------
XP_015149651.1 ------------------------------------------------------------
AAT11803.2 ------------------------------------------------------------
XP_013809425.1 ------------------------------------------------------------
XP_013813435.1 SAAHCYKSYIQVRLGEYNIDVQEDSEVVRSSAAIIRHPKYSSRSLDNDIMLIKLASPVAY
XP_013026274.1 -------------------------------------------------MALELMAQTDL
ALA65723.1 ------------------------------------------------------------
XP_015149651.1 ------------------------------------------------------------
AAT11803.2 ---------------------------------------PGDADDDKIVGGYNCPAHSVP
XP_013809425.1 -----------------------MKVFLILSCLGAVVAVPGDADDDKIVGGYNCPAHSVP
XP_013813435.1 SADIQPIALPSTCVKAELGGRGTMKVFLILSCLGAVVAVPGDADDDKIVGGYNCPAHSVP
XP_013026274.1 KWTTQRSSRRQAPSEKNSGFSSETLKDERTPLCSASVAFPGDADDDKIVGGYNCPRHSVP
ALA65723.1 -------------------MHSL---FLLLSCLGAAVAFPRAADDDKIVGGYNCPEHSVP
XP_015149651.1 -------------------MNSL---FLILSCLGAAVAFPGGADDDKIVGGYTCPEHSVP
* **********.** ****
AAT11803.2 YQVSLNAGYHFCGGSLINSQWVLSAAHCYKSSIQVRLGEYNIDVREDSEVVRSSAAVIRH
XP_013809425.1 YQVSLNAGYHFCGGSLINNQWVVSAAHCYKSYIQVRLGEYNIDVQEDSEVVRSSAAIIRH
XP_013813435.1 YQVSLNAGYHFCGGSLINNQWVVSAAHCYKSYIQVRLGEYNIDVQEDSEVVRSSAAIIRH
XP_013026274.1 YQVSLNAGYHFCGGSLINNQWVVSAAHCYKYNIQVRLGEYNIDVQEDSEVVRSSSVIIRH
ALA65723.1 YQVSLNAGYHFCGGSLINNQWVVSAAHCYKSRIQVRLGEYNIDVQEDSEVVRSSSVIIRH
XP_015149651.1 YQVSLNSGYHFCGGSLINSQWVLSAAHCYKSRIQVRLGEYNIDVQEDSEVVRSSSVIIRH
******:***********.***:******* ************:*********:.:***
AAT11803.2 PKYSSRSLDNDIMLIKLASPVAYSADVQPIALPSSCVKAGTKCLISGWGNTLSSGSSFPE
XP_013809425.1 PKYSSRSLDNDIMLIKLASPVAYSADIQPIALPSTCVKAGTGCLISGWGNTLSSGSSFPE
XP_013813435.1 PKYSSRSLDNDIMLIKLASPVAYSADIQPIALPSTCVKAGTGCLISGWGNTLSSGSSFPE
XP_013026274.1 PNYSSRTIDNDIMLIKLASAVDYSADVQPIALPTSCAKAGTECLISGWGNTLSSGINYPE
ALA65723.1 PNYSSRTLNNDIMLIKLASAVDYSADVQPIALPTSCAKAGTECLISGWGNTLSSGTYYPE
XP_015149651.1 PKYSSITLNNDIMLIKLASAVEYSADIQPIALPSSCAKAGTECLISGWGNTLSNGYNYPE
*:*** :::********** * ****:******::*.**** ***********.* :**
AAT11803.2 ILQCLQAPVLSDRECRNAYPGEISSNMICVGFLEGGKDSCQGDSGGPVVCDGTLQGIVSW
XP_013809425.1 IVQCLQAPVLSDQECRDAYPGQISSNMMCVGFLEGGKDSCQGDSGGPVVCDGTLQGIVSW
XP_013813435.1 IVQCLQAPVLSDQECRDAYPGQISSNMMCVGFLEGGKDSCQGDSGGPVVCDGTLQGIVSW
XP_013026274.1 ILQCLQAPILSDQECQEAYPGQITSNMICVGFLQGGKDSCQGDSGGPVACNGELQGIVSW
ALA65723.1 LLQCLQAPILTNQECQDAYPGEITSNMICIGFLEGGKDSCQGDSGGPVVCNGELQGIVSW
XP_015149651.1 LLQCLNAPILSDQECQEAYPGDITSNMICVGFLEGGKDSCQGDSGGPVVCNGELQGIVSW
::***:**:*:::**::****:*:***:*:***:**************.*:* *******
AAT11803.2 GIGCAQKGYPGVYTKVCNYVSWIQETIAAY
XP_013809425.1 GIGCALKGYPGVYTKVCNYVNWIQETIAAY
XP_013813435.1 GIGCALKGYPGVYTKVCNYVNWIQETIAAY
XP_013026274.1 GIGCALKGYPGVYTKVCNYVDWIQETIAAY
ALA65723.1 GIGCALQGYPGVYTKVCNYVDWIQETIAAY
XP_015149651.1 GIGCALKGYPGVYTKVCNYVDWIQETIAAY
***** :*************.*********
ChymoProtAlign.aln
XP_005029771.1 -MAFLWAVACLALASAVSGCGVPSISPSVQYNERIINGQNAVSGSWPWQVSLQTRTGSHF
EOA94198.1 RMAFLWAVACLALASAVSGCGVPSISPSVQYNERIINGQNAVSGSWPWQVSLQTRTGSHF
XP_010177298.1 -MAFLWAVACLALASTVSGCGVPTISPSVHYSERIINGQNAVSGSWPWQVSLQTRSGSHF
XP_015729276.1 -MAFLWAVTCLALASTVSGCGVPMITPSVQYNERIINGQNAVSGSWPWQVSLQSRSGSHF
NP_001264554.1 -MALLWAVTCLALASTVSGCGVPLISPSVQYSERIINGQNAVSGSWPWQVSLQTRSGSHF
XP_010716164.1 -MAFLWAVTCLALASTVSGCGVPLISPSVQYSERIINGQNAVSGSWPWQVSLQTRSGSHF
**:****:******:******* *:***:*.*********************:*:****
XP_005029771.1 CGGSLINEYWVVTAAHCEFNPYSHVVVLGEYDRYSGSEAVQVKTVTKAVTHPNWDSYNLN
EOA94198.1 CGGSLINEYWVVTAAHCEFNPYSHVVVLGEYDRYSGSEAVQVKTVTKAVTHPNWDSYNLN
XP_010177298.1 CGGSLINENWVVTAAHCEFNPYSHVVVLGEYNLNSNTESVQVKTVTKAITNPSWNAYTLN
XP_015729276.1 CGGSLINENWVVTAAHCEFSPYSHVVVLGEYNLASQTESVQVKTVSKVITHPNWNSNTLN
NP_001264554.1 CGGSLINENWVVTAAHCEFSPYSHVVVLGEYNLNSQTESVQVKTVSKAVTHPNWNSYTLN
XP_010716164.1 CGGSLINANWVVTAAHCEFNPFSHVVVLGEYNLGSQTESVQVKTVSKAITHPNWNAYTLN
******* **********.*:*********: * :*:******:*.:*:*.*:: .**
XP_005029771.1 NDITLLKLSSPAQLGPRVAPVCLAPANLALPSDLQCVTTGWGRTNTNSNALAVRLQQVTL
EOA94198.1 NDITLLKLSSPAQLGPRVAPVCLAPANLALPSDLQCVTTGWGRTNTNSNALAVRLQQVTL
XP_010177298.1 -DITLLKLSSPAQLGPRVSPICLAPANLALPTNLQCVTTGWGRTNTNSQALAARLQQVTL
XP_015729276.1 NDITLLKLSSPAQLNSRVSPVCLAPANLALSTSTECVTTGWGRTSTISNAPATRLQQVSL
NP_001264554.1 NDITLLKLSSPAQLGSRVSPVCLAAANLVLSNSLQCVTTGWGRTSTTSNALASRLQQVSL
XP_010716164.1 NDITLLKLSSSAQLGTRVSPVCLAAANLALSDSQQCVTTGWGRISTTSNALASRLQQVSL
********* ***. **:*:*** ***.* . :******** .* *:* * *****:*
XP_005029771.1 PLVSSSQCMQYWGSSITSSMLCAGGVGASSCQGDSGGPLVYQNGNVWTLIGIVSWGSSNC
EOA94198.1 PLVSSSQCMQYWGSSITSSMLCAGGVGASSCQGDSGGPLVYQNGNVWTLIGIVSWGSSNC
XP_010177298.1 PLISQSQCMQYWGNRITSSMLCAGGVGASSCQGDSGGPLVYQNGNVWTLIGIVSWGNSNC
XP_015729276.1 PLISQSQCQQYWGNRITSSMLCAGGAGASSCQGDSGGPLVYKNGNVWTLIGIVSWGSTNC
NP_001264554.1 PLISQSQCQQYWGTRITSSMLCAGGAGASSCQGDSGGPLVYQNGNAWTLIGIVSWGSSNC
XP_010716164.1 PLVSQSRCQQYWGTRITSAMLCAGGAGASSCQGDSGGPLVYQSGNTWTLIGIVSWGNSNC
**:*.*:* ****. ***:******.***************:.**.**********.:**
XP_005029771.1 NIRTPAVYTRVSQFRNWIDYIVAQG
EOA94198.1 NIRTPAVYTRVSQFRNWIDYIVAQG
XP_010177298.1 NVHTPAIYTRVSQFRSWIDYVVAQG
XP_015729276.1 NIRIPAVYTRVSHFRSWIDQTVAQG
NP_001264554.1 NVRTPAVYTRVSHFRNWIDQIVAQG
XP_010716164.1 NVHTPAVYTRVSHFRNWIDQTVAQ-
*:: **:*****:**.*** ***
I obtained the global alignments above from the EBI (European Bioinformatics
Institute) site at http://www.ebi.ac.uk/. I clicked Services and went to Clustal
Omega, a multiple sequence alignment program for DNA or protein sequences.
Clustal Omega is a software package that replaces the older ClustalW
alignment tools. The scoring matrix used is BLOSUM62.
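To show how the consensus line under each alignment block can be read, the sketch
below recomputes the "*" marks (fully conserved columns) for the last block of
TrypProtAlign.aln and the percent identity between two aligned rows. The function
names are assumptions made for the example, and the ":" and "." marks for
strongly and weakly similar residues are not reproduced, so those columns are left
blank.

```python
def identity_line(aligned):
    """Return '*' for columns where every row has the same residue, else ' '."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


def percent_identity(row_a, row_b):
    """Percent of identical residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Last block of TrypProtAlign.aln, rows in the order shown in the alignment.
block = [
    "GIGCAQKGYPGVYTKVCNYVSWIQETIAAY",  # AAT11803.2
    "GIGCALKGYPGVYTKVCNYVNWIQETIAAY",  # XP_013809425.1
    "GIGCALKGYPGVYTKVCNYVNWIQETIAAY",  # XP_013813435.1
    "GIGCALKGYPGVYTKVCNYVDWIQETIAAY",  # XP_013026274.1
    "GIGCALQGYPGVYTKVCNYVDWIQETIAAY",  # ALA65723.1
    "GIGCALKGYPGVYTKVCNYVDWIQETIAAY",  # XP_015149651.1
]
print(identity_line(block))
print(round(percent_identity(block[0], block[1]), 1))
```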
(g)
(i) Catalytic triad in the active site: Histidine 48, Aspartic acid 92, Serine 185
Residues in the substrate binding site: Aspartic acid 179, Serine 200,
Glycine 202.
To find the catalytic triad in the active site and the residues of the substrate
binding site, I used the NCBI site at https://www.ncbi.nlm.nih.gov/. In the
information displayed for trypsinogen I clicked the Site features annotated
/site_type="active" for the active site and "substrate binding sites [chemical
binding]" for the substrate binding site. I did this to pick out the three amino
acids of the catalytic triad from those of the whole structure of trypsinogen.
(ii) [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[LIVM]-[ST]-A-[STAG]-H-C
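The two patterns above can be checked against the ostrich trypsin sequence from (c)
by converting them to regular expressions. This is a minimal sketch: the helper
names are assumptions made for the example, and the converter handles only the
pattern syntax needed here (residue sets, x, repeat counts and {..} exclusions),
not the full pattern language.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression."""
    regex = pattern.replace("-", "")                  # drop position separators
    regex = re.sub(r"\{([A-Z]+)\}", r"[^\1]", regex)  # {AB} means "not A or B"
    regex = regex.replace("x", ".")                   # x means any residue
    return regex.replace("(", "{").replace(")", "}")  # x(2) becomes .{2}


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for the first hit, or None."""
    match = re.search(prosite_to_regex(pattern), sequence)
    if match is None:
        return None
    return match.start() + 1, match.group()


SER_PATTERN = (
    "[DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-"
    "[LIVMFYWH]-[LIVMFYSTANQH]"
)
HIS_PATTERN = "[LIVM]-[ST]-A-[STAG]-H-C"

# Ostrich trypsin protein sequence as listed under TrypProtSeq.txt in (c).
OSTRICH_TRYPSIN = (
    "PGDADDDKIVGGYNCPAHSVPYQVSLNAGYHFCGGSLINSQWVLSAAHCYKSSIQVRLGEYNIDVREDSEVVRSSAAVIRHPK"
    "YSSRSLDNDIMLIKLASPVAYSADVQPIALPSSCVKAGTKCLISGWGNTLSSGSSFPEILQCLQAPVLSDRECRNAYPGEISS"
    "NMICVGFLEGGKDSCQGDSGGPVVCDGTLQGIVSWGIGCAQKGYPGVYTKVCNYVSWIQETIAAY"
)

print(find_motif(HIS_PATTERN, OSTRICH_TRYPSIN))
print(find_motif(SER_PATTERN, OSTRICH_TRYPSIN))
```

In this sequence the histidine of the first hit falls at position 48 and the serine
in the G-[DE]-S-G part of the second hit at position 185, which agrees with the
active-site residues listed in (i).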
(h) Catalytic triad in the active site: Histidine 75, Aspartic acid 121, Serine
214.
Residues in the substrate binding site: Serine 208, Serine 233, Glycine 235.
To find the catalytic triad in the active site and the residues of the substrate
binding site, I used the NCBI site at https://www.ncbi.nlm.nih.gov/. In the
information displayed for chymotrypsinogen I clicked the Site features annotated
/site_type="active" for the active site and "substrate binding sites [chemical
binding]" for the substrate binding site. I did this to pick out the three amino
acids of the catalytic triad from those of the whole structure of
chymotrypsinogen.