
ABSTRACT

Bioinformatics is the application of statistics and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.

Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data.

Over the past few decades rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to glean understanding of biological processes from this information.

Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.

The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to achieve this goal. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.

Introduction:

At the beginning of the "genomic revolution", bioinformatics was applied in the creation and maintenance of databases to store biological information such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could both access existing data and submit new or revised data.

In order to study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:

• the development and implementation of tools that enable efficient access to, and use and management of, various types of information.
• the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.

Major research areas:

Since the phage Φ-X174 was sequenced in 1977, the DNA sequences of thousands of organisms have been decoded and stored in databases. This sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs such as BLAST are used daily to search the genomes of thousands of organisms, containing billions of nucleotides. These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related, but not identical.
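To make the idea of tolerating exchanged, deleted or inserted bases concrete, the following is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman recurrence). It is not how BLAST itself works internally, since BLAST relies on faster heuristics; the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (an assumption, not a standard):
    +2 for a matching base, -1 for an exchanged base,
    -2 for each deleted or inserted base.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or exchanged base).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Leave a base of a unpaired (gap in b), or a base of b unpaired (gap in a).
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("ACGTACGT", "ACGTACGT"))  # identical sequences
    print(local_alignment_score("ACGTACGT", "ACGTTCGT"))  # one exchanged base
    print(local_alignment_score("ACGTACGT", "ACGTCGT"))   # one deleted base
```

Under this scoring, related sequences that differ by a single mutation still score far higher than unrelated ones, which is the property that lets a search tool rank "related, but not identical" sequences.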

A variant of sequence alignment is also used in the sequencing process itself. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae) does not produce entire chromosomes, but instead generates the sequences of many thousands of small DNA fragments (ranging from 35 to 900 nucleotides long, depending on the sequencing technology). The ends of these fragments overlap and, when aligned properly by a genome assembly program, can be used to reconstruct the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly will usually contain numerous gaps that have to be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is annotation, which involves computational gene finding to search for protein-coding genes, RNA genes, and other functional sequences within a genome. Not all of the nucleotides within a genome are part of genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequences for protein identification.

Genome annotation:

In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team at The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.
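In its most basic form, computational gene finding can be sketched as a scan for open reading frames (ORFs): stretches that begin with a start codon and run, in the same reading frame, to a stop codon. The sketch below is only an illustration under simplifying assumptions (forward strand only, no introns, no statistical scoring); it is not the annotation system described above, and the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) tuples for ORFs on the forward strand.

    start is inclusive and end is exclusive; the reported sequence includes
    the stop codon. min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                    # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for start, end, seq in find_orfs("CCATGAAATTTGGGTAACC"):
        print(start, end, seq)
```

Practical gene finders go well beyond this, for example by scanning both strands and by using statistical models of coding sequence, so the scan above should be read as a starting point rather than a usable annotator.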

Computational evolutionary biology:

Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:

• trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone,
• more recently, compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, horizontal gene transfer, and the prediction of factors important in bacterial speciation,
• build complex computational models of populations to predict the outcome of the system over time,
• track and share information on an increasingly large number of species and organisms.

Future work endeavours to reconstruct the now more complex tree of life. The area of research within computer science that uses genetic algorithms is sometimes confused with computational evolutionary biology, but the two areas are not necessarily related.

Analysis of gene expression:

The expression of many genes can be determined by measuring mRNA levels with multiple techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), or various applications of multiplexed in-situ hybridization. All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.

Analysis of regulation:

Regulation is the complex orchestration of events starting with an extracellular signal such as a hormone and leading to an increase or decrease in the activity of one or more proteins. Bioinformatics techniques have been applied to explore various steps in this process. For example, promoter analysis involves the identification and study of sequence motifs in the DNA surrounding the coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Expression data can be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements.

Analysis of protein expression:

Protein microarrays and high throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is very much involved in making sense of protein microarray and HT MS data. The former approach faces problems similar to those of microarrays targeted at mRNA; the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete, peptides from each protein are detected.
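The mass-matching step mentioned for HT MS data can be sketched as follows: predict the mass of each candidate peptide from its amino acid sequence, then report the candidates that fall within a tolerance of each observed mass. This is a minimal sketch under stated assumptions: the residue values are the commonly tabulated monoisotopic residue masses in daltons, the peptides are unmodified and uncharged, and the function names and the absolute tolerance `tol` are hypothetical choices for illustration.

```python
# Monoisotopic residue masses in daltons (commonly tabulated values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water molecule is added for the free peptide termini


def peptide_mass(sequence: str) -> float:
    """Predicted monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def match_masses(observed, peptides, tol: float = 0.01):
    """Map each observed mass to the candidate peptides within +/- tol daltons."""
    predicted = {p: peptide_mass(p) for p in peptides}
    return {
        mass: [p for p, pm in predicted.items() if abs(pm - mass) <= tol]
        for mass in observed
    }


if __name__ == "__main__":
    candidates = ["GA", "AG", "SV"]  # hypothetical database peptides
    print(match_masses([146.069], candidates))
```

In this toy example the peptides GA and AG have the same composition and therefore the same mass, so a single observed mass matches both. Ambiguity of this kind is one reason the statistical analysis of such data is complicated.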

Analysis of mutations in cancer:

In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways. Massive sequencing efforts are used to identify previously unknown point mutations in a variety of genes in cancer. Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of human genome sequences and germline polymorphisms. New physical detection technologies are employed, such as oligonucleotide microarrays to identify chromosomal gains and losses (called comparative genomic hybridization), and single-nucleotide polymorphism arrays to detect known point mutations. These detection methods simultaneously measure several hundred thousand sites throughout the genome, and when used in high-throughput to measure thousands of samples, generate terabytes of data per experiment. Again the massive amounts and new types of data generate new opportunities for bioinformaticians. The data is often found to contain considerable variability, or noise, and thus Hidden Markov model and change-point analysis methods are being developed to infer real copy number changes.

Another type of data that requires novel informatics development is the analysis of lesions found to be recurrent among many tumors.

Prediction of protein structure:

Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence on the gene that codes for it. In the vast majority of cases, this primary structure uniquely determines a structure in its native environment. (Of course, there are exceptions, such as the prion of bovine spongiform encephalopathy, also known as Mad Cow Disease.) Knowledge of this structure is vital in understanding the function of the protein. For lack of better terms, structural information is usually classified as one of secondary, tertiary and quaternary structure. A viable general solution to such predictions remains an open problem. As of now, most efforts have been directed towards heuristics that work most of the time.

One of the key ideas in bioinformatics is the notion of homology. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. This currently remains the only way to predict protein structures reliably.

One example of this is the similar protein homology between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting oxygen in the organism. Though both of these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes.

Other techniques for predicting protein structure include protein threading and de novo (from scratch) physics-based modeling.

Comparative genomics:

The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms. It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes. A multitude of evolutionary events acting at various organizational levels shape genome evolution. At the lowest level, point mutations affect individual nucleotides. At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion. Ultimately, whole genomes are involved in processes of hybridization, polyploidization and endosymbiosis, often leading to rapid speciation. The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectrum of algorithmic, statistical and mathematical techniques, ranging from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Markov Chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models.

Many of these studies are based on homology detection and the computation of protein families.

Modeling biological systems:

Systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes which comprise metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.

High-throughput image analysis:

Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving accuracy, objectivity, or speed. A fully developed analysis system may completely replace the observer. Although these systems are not unique to biomedical imagery, biomedical imaging is becoming more important for both diagnostics and research. Some examples are:

• high-throughput and high-fidelity quantification and sub-cellular localization (high-content screening, cytohistopathology, bioimage informatics)
• morphometrics
• clinical image analysis and visualization

• determining the real-time air-flow patterns in breathing lungs of living animals
• quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury
• making behavioral observations from extended video recordings of laboratory animals
• infrared measurements for metabolic activity determination
• inferring clone overlaps in DNA mapping, e.g. the Sulston score

Software and tools:

Software tools for bioinformatics range from simple command-line tools to more complex graphical programs and standalone web services available from various bioinformatics companies or public institutions.

Web services in bioinformatics:

SOAP and REST-based interfaces have been developed for a wide variety of bioinformatics applications, allowing an application running on one computer in one part of the world to use algorithms, data and computing resources on servers in other parts of the world. The main advantages derive from the fact that end users do not have to deal with software and database maintenance overheads. Basic bioinformatics services are classified by the EBI into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment) and BSA (Biological Sequence Analysis). The availability of these service-oriented bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions, which range from a collection of standalone tools with a common data format under a single, standalone or web-based interface, to integrative, distributed and extensible bioinformatics workflow management systems.

