BIOL 5272 M Sequencing Assignment

BIOL 5272 M: Data Analysis I: DNA
Sequencing (Chromatogram 010.abi)

Interpreting a sequencing chromatogram can yield information on the
function, structure, localisation, and evolution of the encoded protein.
Bioinformatics methods, databases, and tools enable the task to be done at
home. Dedicated software converts the chromatogram into FASTA format. BLAST
algorithms then help to identify the source species, locate conserved
domains, establish phylogeny, map the DNA, and perform other tasks. Other
tools provide homology modelling of protein structure, while publicly
accessible databases hold a wealth of literature-derived information on the
evolution and biological functions of the protein. Ultimately, the goal is
to improve our understanding of biological processes and to direct further
study of them.
Sequence 010.abi
Sequencing chromatogram 010.abi was assigned for interpretation. Figure 1
shows the chromatogram as opened in the free software FinchTV
(http://mac.softpedia.com/get/Math-Scientific/FinchTv/shtml). The sequence
corresponds to part of a gene encoding a protein, so further analysis is
required to retrieve as much information as possible on the function,
structure, localisation, and evolution of that protein.
Figure 1. Chromatogram of sequence 010.abi opened with the free software
FinchTV.
Conversion to FASTA format using the same software renders the following
result:
GACATAATGCGATTGGGTTTGATTCTTGGTCAGAATGGACTCGTATATATTAGTAAGTAAAGTGTGACA
CTTAGCCAACTGCCCATAAGCCTTCCTGCTCTTGTAAACTGGAATAACAAGTTCTAAAATGCTTGCACA
AAAATGGTAGAGCTCAGCTTGAGAGAAAAGCTTATTAGCAAGCTGTAAATACTTGACGGCGGAGTCAAC
TGTTAGCTTTGAAGCACCATAACCCTCGACTTCAGCTGCAGATGCCTCTGTTGTAAACTCGCCACTTAC
CATCGGGCAAATCTTACGCAAGGCAGACACATGATCCTTGCTCCACACACCATCATTCCTAGCCACCAA
GGCCTGCATTATCACACCTGCAACCGCCACAGCACACTGTGCAGCTTCTGCCCAAGACTGCATTTCCTG
GTGGGCATCACATAGATGCAATAACCACATTATGTGAAGATCCGGAACAGGAGCAAACGCCATTCCAAG
TTTGTAGAAGCTCTCTGCAGCAGCATATCGATCCATAGCCATCACAGATCCCAAAAGGGCATGCCCGAG
GCTGGCATCAAGAGCAAGAACAAGGCTGTCAGAAAGATGTTTGACTTCAGCCCATGACCAGCGGTTCTC
TGTAAATTTCTCGGGAATAATCAACAGTGTATCATCAGGAAGACCACACTCCCTCAACAAATTGACACT
CTTAGCTTCATCAGCCATTTCGCTTAATGACTGTTGAAGTCGCCGTGCTTCTCCACTCTCTTCTAATGT
ATTATCTGACTTCATATGAGTCACTTGGACATCCGACATGAGTTCTGACAGTGTAATCGTCAGCAAAGC
CCTCAGCCTAGCAGTCTGCATAAAGTATAGCGAGCTCTTAACTAGTATTTGAAGACCAATAACAGCCCT
TTTCCTGACACTGTCATTGCGGTAAACAGCAAGCCGGAGAAGATGGAAGGCTATCTGTTTTAAGAAGCG
ATCATTTTCCCTGGCCCATCAGTGTAGCCCCATGAAGATCAAAGATTCTGTTGAAAATTGGAAAAGAAG
CTTTCCAGAAAGCTAATGACTGGTTTCGGGAGAGAAACTCGTCAGTATTGTAGTAATGCAGTCCAATTT
GCCATAGTCGGTCGCAATATTGTGAGGAGCTGCATGATGAAAAATTTTCAGTGATCTCAAAACTGCAGC
TAACAGTGCACTACATCTCCTCCCAAAGTTCAGTTTTGCTCAAATGGGTGCAGCGATTCTTCTTGTTGA
CTTGGTCTGGAGCTTCATTCGGAAAGAATTGGGCTGACTAAGCCATCTCACTTAGGTACAATCCAGGTG
TGACGATAAGCGA
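The conversion step can be illustrated with a minimal sketch. It assumes the base calls have already been exported from the chromatogram as a plain string; the function name `to_fasta`, the record name, and the 70-character line width are illustrative choices, not FinchTV's own behaviour.

```python
def to_fasta(name, sequence, width=70):
    """Return a FASTA record: a '>' header line followed by wrapped sequence lines."""
    # Remove any whitespace or line breaks and normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    # Break the sequence into fixed-width lines.
    chunks = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + name] + chunks) + "\n"


# Usage with the first bases of the sequence above (record name is hypothetical).
record = to_fasta("010.abi", "GACATAATGCGATTGGGTTTGATTCTTGGTCAGAATGGACTCG")
print(record)
```

A FASTA header line is needed by most downstream tools, which is why the sketch always writes one even when the name carries no biological meaning.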
Screening for Contamination

A contaminated sequence does not faithfully represent the genetic information
of the source organism because it contains foreign segments such as vectors,
adapters, linkers, and PCR primers. Contamination affects data analysis because
it may elongate or add an open reading frame and thereby alter the predicted
translation product(s). Tools that screen nucleic acid sequences for
contamination, such as NCBI's VecScreen, are therefore valuable before further
analysis: they help to avoid time and effort wasted on fruitless analyses,
wrong conclusions about sequence significance, misassembly of sequences, false
clustering of expressed sequence tags, delayed release of sequences in public
databases, and, finally, pollution of those databases. Figure 2 indicates that
sequence 010.abi has no identified contamination and is thus ready for further
analysis.
Figure 2. VecScreen detected no contaminants in the given sequence.
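The idea behind contamination screening can be shown with a toy sketch. VecScreen itself runs a BLAST search against the UniVec database; the code below only looks for exact matches on either strand, and the contaminant name and sequence are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_contaminants(query, known):
    """Return (name, strand, position) for each exact occurrence of a known contaminant."""
    query = query.upper()
    hits = []
    for name, contaminant in known.items():
        patterns = (("+", contaminant.upper()), ("-", reverse_complement(contaminant)))
        for strand, pattern in patterns:
            start = query.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = query.find(pattern, start + 1)
    return hits


# Hypothetical contaminant library and toy query, for illustration only.
library = {"example_primer": "GTAAAACGACGGCCAGT"}
toy_query = "TTTT" + "GTAAAACGACGGCCAGT" + "ACGTACGT"
print(find_contaminants(toy_query, library))
```

An empty list from such a screen would correspond to the "no contamination" outcome, although a real screen also has to tolerate mismatches and partial matches at the ends of a read.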
Finding the Best Possible Sequence Producing Significant Alignment
BLAST (Basic Local Alignment Search Tool) compares primary biological
sequence information with a library or database of sequences and identifies the
sequences producing significant alignments above a certain threshold. Figure 3
shows the top three possible sequences producing significant alignment to the
query sequence.
Figure 3. The top three possible sequences producing significant alignment
to the assigned sequence 010.abi
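Two of the columns in a BLAST hit table, identity and query cover, can be sketched for a single gapped alignment. The function name and the toy alignment are assumptions for illustration; BLAST combines several aligned segments per hit, which this sketch does not attempt.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query cover) for one gapped alignment.

    Identity is counted over all alignment columns, gaps included.
    Query cover is the share of the full query that takes part in the alignment.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    query_bases = sum(1 for q in aligned_query if q != "-")
    identity = 100.0 * matches / columns
    coverage = 100.0 * query_bases / query_length
    return identity, coverage


# Toy alignment: 10 columns, 8 identical, 9 of 12 query bases aligned.
print(alignment_stats("ACGT-ACGTA", "ACGTTACGAA", 12))
```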
The three sequences share the same maximum score (2093), total score (2093),
query cover (96%), E value (0.0), and identity (97%) but have different
accessions, which calls for further analysis. Table 1 shows that, although each
of the three sequences has a different set of open reading frames, the most
probable reading frame is, surprisingly, identical for all three sequences. It
is shown in FASTA format as follows:
MENNNLGLRFRKLPRQPLALPKLDPLLDENLEQWPHLNQLVQCY
GTEWVKDVNKYGHYENIRPDSFQTQIFEGPDTDTETEIRLASARSATIEEDVASISGR
PFSDPGSSKHFGQPPLPAYEPAFDWENERAMIFGQRTPESPAASYSSGLKISVRVLSL
AFQSGLVEPFFGSIALYNQERKEKLSEDFYFQIQPTEMQDAKLSSENRGVFYLDAPSA
SVCLLIQLEKTATEEGGVTSSVYSRKEPVHLTEREKQKLQVWSRIMPYRESFAWAVVP
LFDNNLTTNTGESASPSSPLAPSMTASSSHDGVYEPIAKITSDGKQGYSGGSSVVVEI
SNLNKVKESYSEESIQDPKRKVHKPVKGVLRLEIEKHRNGHGDFEDLSENGSIINDSL
DPTDRLSDLTLMKCPSSSSGGPRNGCSKWNSEDAKDVSRNLTSSCGTPDLNCYHAFDF
CSTTRNEPFLHLFHCLYVYPVAVTLSRKRNPFIRVELRKDDTDIRKQPLEAIYPREPG
VSLQKWVHTQVAVGARAASYHDEIKVSLPATWTPSHHLLFTFFHVDLQTKLEAPRPVV
VGYASLPLSTYIHSRSDISLPVMRELVPHYLQESTKERLDYLEDGKNIFKLRLRLCSS
LYPTNERVRDFCLEYDRHTLQTRPPWGSELLQAINSLKHVDSTALLQFLYPILNMLLH
LIGNGGETLQVAAFRAMVDILTRVQQVSFDDADRNRFLVTYVDYSFDDFGGNQPPVYP
GLATVWGSLARSKAKGYRVGPVYDDVLSMAWFFLELIVKSMALEQARLYDHNLPTGED
VPPMQLKESVFRCIMQLFDCLLTEVHERCKKGLSLAKRLNSSLAFFCYDLLYIIEPCQ
VYELVSLYMDKFSGVCQSVLHECKLTFLQIISDHDLFVEMPGRDPSDRNYLSSILIQE
LFLSLDHDELPLRAKGARILVILLCKHEFDARYQKAEDKLYIAQLYFPFVGQILDEMP
VFYNLNATEKREVLIGVLQIVRNLDDTSLVKAWQQSIARTRLYFKLMEECLILFEHKK
AADSILGGNNSRGPVSEGAGSPKYSERLSPAINNYLSEASRQEVRLEGTPDNGYLWQR
VNSQLASPSQPYSLREALAQAQSSRIGASAQALRESLHPILRQKLELWEENVSATVSL
QVLEITENFSSMAASHNIATDYGKLDCITTILTSFFSRNQSLAFWKAFFPIFNRIFDL
HGATLMARENDRFLKQIAFHLLRLAVYRNDSVRKRAVIGLQILVKSSLYFMQTARLRA
LLTITLSELMSDVQVTHMKSDNTLEESGEARRLQQSLSEMADEAKSVNLLRECGLPDD
TLLIIPEKFTENRWSWAEVKHLSDSLVLALDASLGHALLGSVMAMDRYAAAESFYKLG
MAFAPVPDLHIMWLLHLCDAHQEMQSWAEAAQCAVAVAGVIMQALVARNDGVWSKDHV
SALRKICPMVSGEFTTEASAAEVEGYGASKLTVDSAVKYLQLANKLFSQAELYHFCAS
ILELVIPVYKSRKAYGQLAKCHTLLTNIYESILDQESNPIPFIDATYYRVGFYGEKFG
KLDRKEYVYREPRDVRLGDIMEKLSHIYESRMDSNHILHIIPDSRQVKAEDLQAGVCY
LQITAVDAVMEDEDLGSRRERIFSLSTGSVRARVFDRFLFDTPFTKNGKTQGGLEDQW
KRRTVLQTEGSFPALVNRLLVTKSESLEFSPVENAIGMIETRTTALRNELEEPRSSDG
DHLPRLQSLQRILQGSVAVQVNSGVLSVCTAFLSGEPATRLRSQELQQLIAALLEFMA
VCKRAIRVHFRLIGEEDQEFHTQLVNGFQSLTAELSHYIPAILSEL
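The search for the most probable reading frame can be sketched as a six-frame translation that keeps the longest stretch running from a methionine to a stop codon. This is a minimal illustration using the standard genetic code; it is not the tool used for Table 1, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def translate(seq):
    """Translate complete codons; codons with unknown bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(dna):
    """Return (strand, frame, protein) for the longest ATG-to-stop ORF in six frames."""
    dna = "".join(dna.split()).upper()
    best = ("+", 0, "")
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # The last piece has no stop codon after it, so it is skipped.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best[2]):
                    best = (strand, frame + 1, segment[start:])
    return best


print(longest_orf("CCATGAAATTTTAGGG"))
```

Requiring a stop codon is a simplification: for a partial read, an open reading frame may run off the end of the sequence, so a fuller search would also report open-ended frames.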
Table 1. A comparison between the best open reading frames (ORFs) for
the top three possible sequences producing significant alignment to the
query sequence.
[Table 1: columns Top 3 Possible Sequences | Open Reading Frames | The Best
Open Reading Frame; rows 1*, 2*, 3*]
1* Arabidopsis thaliana DOCK family guanine nucleotide exchange factor
SPIKE1 mRNA, complete cds = Arabidopsis thaliana putative guanine
nucleotide exchange factor (SPK1) mRNA, complete cds
2* Arabidopsis thaliana mRNA for hypothetical protein, clone: RAFL16-07F02
3* Arabidopsis thaliana putative guanine nucleotide exchange factor
(SPK1) mRNA, complete cds
The Identification and the Characterisation of the Query
Protein from the Identical ORF of the Top Three Possible
Sequences Producing Significant Alignment to the Original
Query Sequence 010.abi
Because all three sequences producing significant alignment to the query
sequence share the same open reading frame, the next task is to determine
whether the amino acid sequence of that open reading frame produces significant
alignments to known proteins. Putative conserved domains and the sequences that
align best with the open reading frame's amino acid sequence can be identified
with the BLASTP 2.2.30+ program, as seen in Figure 4.
Figure 4. The identification of the query protein from the identical open
reading frame of the top three possible sequences producing significant
alignment to the query sequence
The DOCK family guanine nucleotide exchange factor SPIKE1 from Arabidopsis
thaliana (accession: NP_193367.7) has the most significant alignment, with a
maximum score of 3793, a total score of 3793, 100% query cover, an E value of
0.0, and 100% identity.
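An E value printed as 0.0 follows from how the E value relates to the score. As a sketch, and assuming the reported maximum score is a bit score S', the expected number of chance hits in a search space of query length m and database length n (symbols introduced here for illustration) is:

```latex
E = m \, n \, 2^{-S'}
\qquad\text{so for } S' = 3793:\quad
2^{-3793} \approx 10^{-1142}
```

Even for a very large search space, the product m n cannot offset a factor this small, so the E value falls below the smallest number the program prints and is shown as 0.0.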
DOCK family guanine nucleotide exchange factor SPIKE1
from Arabidopsis thaliana (accession: NP_193367.7)
Putative guanine nucleotide exchange factor
Gene Names
SPK1 (Ordered Locus Names: At4g16340)
Also known as: DL4200C; FCAALL.346; SPIKE1; SPK1
Summary
Mutant is seedling lethal; trichome, leaf-shape, and cotyledon defects;
putative cytoskeletal protein
Proteomes
UP000006548: Chromosome 4
Figure 5. The genomic context of the SPK1 gene on chromosome 4 of
Arabidopsis thaliana
The Arabidopsis Information Resource (TAIR)
AT4G16340
Functions
GTPase binding
GTP binding
Guanyl-nucleotide exchange factor activity
Located in
Cytosol, plasma membrane, extrinsic component of membrane, endoplasmic
reticulum exit site, nucleus
Domain hits
DHR2_DOCK (accession: cd11684): Dock Homology Region 2, a GEF
domain, of Dedicator of Cytokinesis proteins
DHR2 is one of the two domains of DOCK proteins, a family of atypical guanine
nucleotide exchange factors (GEFs) that lack the usual Dbl homology (DH)
domain. As GEFs, they activate the small GTPases Rac and Cdc42 by exchanging
bound GDP for free GTP. DHR2 carries the catalytic GEF activity for Rac and/or
Cdc42.
Marchler-Bauer A et al. (2011), "CDD: a Conserved Domain Database for the functional
annotation of proteins.", Nucleic Acids Res. 39(D):225-9.
Marchler-Bauer A et al. (2009), "CDD: specific functional annotation with the
Conserved Domain Database.", Nucleic Acids Res. 37(D):205-10.
Marchler-Bauer A, Bryant SH (2004), "CD-Search: protein domain annotations on the
fly.", Nucleic Acids Res. 32(W):327-331.
Marchler-Bauer A et al. (2013), "CDD: conserved domains and protein
three-dimensional structure.", Nucleic Acids Res. 41(D1):D348-52.
C2_DOCK180_related (accession: cd08679): C2 domains found in Dedicator
of CytoKinesis1 (DOCK 180) and related proteins
Dock180 was first identified as a c-Crk-interacting protein that is important
in actin cytoskeletal changes. It is now known that many C2 domains are
calcium-dependent membrane-targeting modules that bind a variety of substances.
Most C2 domain proteins are either signal transduction enzymes, such as protein
kinases, or membrane trafficking proteins, such as synaptotagmin 1.
Cellular signaling of Dock family proteins in neural function. Cell. Signal. 2010 Feb; 22(2):175-182
[Regulation of cell morphology and motility by Dock family proteins]. Seikagaku 2009 Aug; 81(8):711-716
Structural basis of membrane targeting by the Dock180 family of Rho family guanine exchange factors
(Rho-GEFs). J. Biol. Chem. 2010 Apr 23; 285(17):13211-13222
Ded_cyto (accession: pfam06920): Dedicator of cytokinesis
This family represents a conserved region of around 200 residues found in
Dedicator of cytokinesis proteins, which are potential guanine nucleotide
exchange factors that activate several small GTPases by exchanging bound GDP
for free GTP.
DOCK-C2 (accession: pfam14429): C2 domain in Dock 180 and Zizimin
proteins
They are atypical GTP/GDP exchange factors for the GTPases Rac and Cdc42 and
are implicated in phagocytosis and cell migration.
Structural basis of membrane targeting by the Dock180 family of Rho family guanine exchange factors
(Rho-GEFs). J. Biol. Chem. 2010 Apr 23; 285(17):13211-13222
Identification of novel families and classification of the C2 domain superfamily elucidate the origin and
evolution of membrane targeting activities in eukaryotes. Gene 2010 Dec 1; 469(1-2):18-30
Model Structure
(Provided by ModBase)
Template: the best match after the second BLAST search
Copy the accession number into NCBI for more information on the query protein
TAIR (bottom of the page) -- Arabidopsis thaliana (function, localisation)
UniProt -- enter the protein name
