
INDEX

Introduction to Bioinformatics
Introduction to MATLAB
Introduction to Biological Data and their Physical, Chemical and
Biological Properties.
Introduction to BI Toolbox of MATLAB.
Introduction to SEQtools and list of functions under SEQtools.
Conclusion
INTRODUCTION TO BIOINFORMATICS
During the last ten years, molecular biology has witnessed an
information revolution as a result both of the development of rapid DNA
sequencing techniques and of the corresponding progress in
computer-based technologies, which allow us to manage this information
overflow in increasingly efficient ways. The broad term coined in the
mid-1980s to cover computer applications in the biological sciences is
bioinformatics. The term can be taken to mean information technology
applied to the management and analysis of biological data.
This has implications in diverse areas, ranging from
artificial intelligence and robotics to genome analysis.

Bioinformatics is an interdisciplinary field that develops methods
and software tools for understanding biological data. As
an interdisciplinary field of science, bioinformatics combines computer
science, statistics, mathematics, and engineering to analyze and
interpret biological data. Bioinformatics has been used for in
silico analyses of biological queries using mathematical and statistical
techniques.

Bioinformatics is both an umbrella term for the body of biological
studies that use computer programming as part of their methodology and
a reference to specific analysis "pipelines" that are repeatedly
used, particularly in the field of genomics. Common uses of
bioinformatics include the identification of candidate genes
and single nucleotide polymorphisms (SNPs). Often, such identification
is made with the aim of better understanding the genetic basis of
disease, unique adaptations, desirable properties (especially in
agricultural species), or differences between populations. In a less
formal way, bioinformatics also tries to understand the organisational
principles within nucleic acid and protein sequences; the large-scale
study of proteins is called proteomics.
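
As a rough illustration of what identifying single-nucleotide differences involves, the sketch below compares a sample sequence with a reference, position by position. It assumes the two sequences are already aligned and of equal length; the sequences and the function name `single_base_differences` are made up for this example, and real SNP calling works on many reads and includes statistical filtering that is not shown here.

```python
def single_base_differences(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatching position.

    Assumes both sequences are already aligned and have equal length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if r != s
    ]


reference = "ATGGCCATTGTAATGGGCCGC"  # made-up reference sequence
sample = "ATGGCCATTGTCATGGGCCGT"     # same sequence with two substitutions
differences = single_base_differences(reference, sample)
print(differences)
```

Each tuple gives the position of a candidate difference together with the reference base and the observed base.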

Bioinformatics has become an important part of many areas of biology.
In experimental molecular biology, bioinformatics techniques such
as image and signal processing allow extraction of useful results from
large amounts of raw data. In the field of genetics and genomics, it aids
in sequencing and annotating genomes and their observed mutations. It
plays a role in the text mining of biological literature and the
development of biological and gene ontologies to organize and query
biological data. It also plays a role in the analysis of gene and protein
expression and regulation. Bioinformatics tools aid in the comparison of
genetic and genomic data and more generally in the understanding of
evolutionary aspects of molecular biology. At a more integrative level, it
helps analyze and catalogue the biological pathways and networks that
are an important part of systems biology. In structural biology, it aids in
the simulation and modeling of DNA, RNA, proteins as well as
biomolecular interactions.
Historically, the term bioinformatics did not mean what it means
today. Paulien Hogeweg and Ben Hesper coined it in 1970 to refer to
the study of information processes in biotic systems.

This definition placed bioinformatics as a field parallel
to biophysics (the study of physical processes in biological systems)
or biochemistry (the study of chemical processes in biological systems).

BIOINFORMATICS DEFINITION

Bioinformatics derives knowledge from computer analysis of
biological data. It is a rapidly developing branch of biology and is
highly interdisciplinary, using techniques and concepts from
informatics, statistics, mathematics, chemistry, biochemistry, physics,
and linguistics. It has many practical applications in different areas of
biology and medicine.

Roughly, bioinformatics describes any use of computers to handle
biological information. In practice the definition used by most people is
narrower; bioinformatics to them is a synonym for "computational
molecular biology"- the use of computers to characterize the molecular
components of living things.
INTRODUCTION TO MATLAB

Millions of engineers and scientists worldwide use MATLAB to
analyze and design the systems and products transforming our world.
MATLAB is in automobile active safety systems, interplanetary
spacecraft, health monitoring devices, smart power grids, and LTE
cellular networks. It is used for machine learning, signal processing,
image processing, computer vision, communications, computational
finance, control design, robotics, and much more.

MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an
easy-to-use environment where problems and solutions are expressed in
familiar mathematical notation. Typical uses include:

Math and computation

Algorithm development

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including Graphical User Interface building

MATLAB is an interactive system whose basic data element is an array
that does not require dimensioning. This allows you to solve many
technical computing problems, especially those with matrix and vector
formulations, in a fraction of the time it would take to write a program
in a scalar noninteractive language such as C or Fortran.
The name MATLAB stands for matrix laboratory. MATLAB was
originally written to provide easy access to matrix software developed
by the LINPACK and EISPACK projects, which together represent the
state-of-the-art in software for matrix computation.
MATLAB has evolved over a period of years with input from many
users. In university environments, it is the standard instructional tool for
introductory and advanced courses in mathematics, engineering, and
science. In industry, MATLAB is the tool of choice for high-productivity
research, development, and analysis.
MATLAB features a family of application-specific solutions called
toolboxes. Very important to most users of MATLAB, toolboxes allow
you to learn and apply specialized technology. Toolboxes are
comprehensive collections of MATLAB functions (M-files) that extend
the MATLAB environment to solve particular classes of problems. Areas
in which toolboxes are available include signal processing, control
systems, neural networks, fuzzy logic, wavelets, simulation, and many
others.

The MATLAB System


The MATLAB system consists of five main parts:

The MATLAB language.
This is a high-level matrix/array language with control flow statements,
functions, data structures, input/output, and object-oriented
programming features. It allows both "programming in the small" to
rapidly create quick and dirty throw-away programs, and "programming
in the large" to create complete large and complex application
programs.

The MATLAB working environment.
This is the set of tools and facilities that you work with as the MATLAB
user or programmer. It includes facilities for managing the variables in
your workspace and importing and exporting data. It also includes tools
for developing, managing, debugging, and profiling M-files, MATLAB's
applications.

Handle Graphics.
This is the MATLAB graphics system. It includes high-level commands
for two-dimensional and three-dimensional data visualization, image
processing, animation, and presentation graphics. It also includes low-
level commands that allow you to fully customize the appearance of
graphics as well as to build complete Graphical User Interfaces on your
MATLAB applications.
The MATLAB mathematical function library.
This is a vast collection of computational algorithms ranging from
elementary functions like sum, sine, cosine, and complex arithmetic, to
more sophisticated functions like matrix inverse, matrix eigenvalues,
Bessel functions, and fast Fourier transforms.

The MATLAB Application Program Interface (API).
This is a library that allows you to write C and Fortran programs that
interact with MATLAB. It includes facilities for calling routines from
MATLAB (dynamic linking), calling MATLAB as a computational
engine, and for reading and writing MAT-files.

MATH. GRAPHICS. PROGRAMMING.

The MATLAB platform is optimized for solving engineering and
scientific problems. The matrix-based MATLAB language is the world's
most natural way to express computational mathematics. Built-in
graphics make it easy to visualize and gain insights from data. A vast
library of prebuilt toolboxes lets you get started right away with
algorithms essential to your domain. The desktop environment invites
experimentation, exploration, and discovery. These MATLAB tools and
capabilities are all rigorously tested and designed to work together.
SCALE. INTEGRATE. DEPLOY.

MATLAB helps you take your ideas beyond the desktop. You can run
your analyses on larger data sets and scale up to clusters and clouds.
MATLAB code can be integrated with other languages, enabling you to
deploy algorithms and applications within web, enterprise, and
production systems.
BIOLOGICAL DATA AND THEIR PHYSICAL, CHEMICAL AND BIOLOGICAL PROPERTIES

Biological data are data or measurements collected from biological
sources, which are often stored or exchanged in a digital form.
Biological data are commonly stored in files or databases. Examples of
biological data are DNA base-pair sequences, and population data used
in ecology.

Data File Formats

Each file format has been designed with specific needs and outputs in
mind.

AB1 - In DNA sequencing, chromatogram files used by instruments from
Applied Biosystems

ACE - A sequence assembly format

BAM - Binary (compressed) Alignment/Map format, based on the SAM
(Sequence Alignment/Map) format

BED - The browser extensible display format is used for
describing genes and other features of DNA sequences

CAF - Common Assembly Format for sequence assembly

EMBL - The flatfile format used by the EMBL to represent
database records for nucleotide and peptide sequences from EMBL
databases

FASTA - The FASTA file format, for sequence data. Sometimes also
given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).

FASTQ - The FASTQ file format, for sequence data with quality.
Sometimes also given as QUAL.

GenBank - The flatfile format used by the NCBI to represent
database records for nucleotide and peptide sequences from the
GenBank and RefSeq databases

GFF - The General feature format is used for describing genes
and other features of DNA, RNA and protein sequences

GTF - The Gene transfer format is used to hold information about
gene structure.

NEXUS - The Nexus file encodes mixed information about genetic
sequence data in a block structured format.

NWK - The Newick tree format is a way of representing
graph-theoretical trees with edge lengths using parentheses and
commas. It is useful to hold phylogenetic trees.

PDB - Structures of biomolecules deposited in the Protein Data Bank.
Also used for exchanging protein/nucleic acid structures.

PHD - Phred output, from the basecalling software Phred

SAM - Sequence Alignment/Map format, in which the results of the
1000 Genomes Project will be released.

SCF - Staden chromatogram files used to store data from DNA
sequencing

SBML - The Systems Biology Markup Language is used to store
biochemical network computational models

SFF - Standard Flowgram Format

Stockholm - The Stockholm format for representing multiple
sequence alignments

Swiss-Prot - The flatfile format used to represent database records
for protein sequences from the Swiss-Prot database

VCF - Variant Call Format, a standard created by the 1000
Genomes Project that lists and annotates the entire collection of
human variants (with the exception of approximately 1.6 million
variants).
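
To make one of these formats concrete, here is a minimal sketch of reading FASTA data in Python. It relies only on the basic FASTA convention that a line starting with `>` begins a new record and that the following lines hold the sequence; the function name `read_fasta` and the two example records are invented for illustration, and the sketch skips the validation that established parsers perform.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


# Made-up example data; a real file would be opened with open(path).
example = io.StringIO(
    ">seq1 hypothetical example record\n"
    "ATGGCCATTG\n"
    "TAATGGGCC\n"
    ">seq2\n"
    "GGGTTTAAA\n"
)
records = list(read_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```

Because the function accepts any iterable of text lines, the same code works on an open file and on an in-memory string.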

Biological databases are libraries of life sciences information,
collected from scientific experiments, published literature,
high-throughput experiment technology, and computational analysis.[2]
They contain information from research areas
including genomics, proteomics, metabolomics, microarray gene
expression, and phylogenetics.[3] Information contained in biological
databases includes gene function, structure, localization (both cellular
and chromosomal), clinical effects of mutations, as well as similarities
of biological sequences and structures.

Biological databases can be broadly classified into sequence, structure
and functional databases. Nucleic acid and protein sequences are stored
in sequence databases, and structure databases store solved structures
of RNA and proteins. Functional databases provide information on the
physiological role of gene products, for example enzyme activities,
mutant phenotypes, or biological pathways. Model organism databases are
functional databases that provide species-specific data. Databases are
important tools in assisting scientists to analyze and explain a host of
biological phenomena, from the structure of biomolecules and their
interaction, to the whole metabolism of organisms, to understanding
the evolution of species. This knowledge helps facilitate the fight
against diseases and assists in the development of medications, in
predicting certain genetic diseases, and in discovering basic
relationships among species in the history of life.

Biological knowledge is distributed among many different general and
specialized databases. This sometimes makes it difficult to ensure the
consistency of information. Integrative bioinformatics is one field
attempting to tackle this problem by providing unified access. One
solution is for biological databases to cross-reference other databases
by accession number, linking their related knowledge together.
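
The idea of cross-referencing by accession number can be sketched with two small in-memory "databases". Everything in this example is hypothetical: the accession numbers, the field names and the `resolve` helper are invented for illustration and do not correspond to any real database schema.

```python
# Two toy "databases" keyed by made-up accession numbers.
sequence_db = {
    "SEQ0001": {"organism": "Example organism", "sequence": "ATGGCC",
                "protein_ref": "PRT0001"},
    "SEQ0002": {"organism": "Example organism", "sequence": "GGGTTT",
                "protein_ref": None},  # no cross-reference recorded
}
protein_db = {
    "PRT0001": {"name": "hypothetical protein", "length": 2},
}


def resolve(accession):
    """Return a sequence record with its cross-referenced protein attached."""
    record = sequence_db[accession]
    # Follow the stored accession number into the second database.
    protein = protein_db.get(record["protein_ref"])
    return {"accession": accession, **record, "protein": protein}


linked = resolve("SEQ0001")
print(linked["protein"])
```

The sequence record stores only the protein's accession number, so each database can be maintained separately while related records remain linked.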

Relational database concepts from computer science and information
retrieval concepts from digital libraries are important for
understanding biological databases. Biological database design,
development, and long-term management is a core area of the discipline
of bioinformatics.[4]
Data contents include gene sequences, textual descriptions, attributes
and ontology classifications, citations, and tabular data. These are often
described as semi-structured data, and can be represented as tables,
key-delimited records, and XML structures.
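
As a small illustration of the three representations just mentioned, the sketch below writes one made-up record as a table row (CSV), as a key-delimited record, and as XML, using only the Python standard library. The record contents and field names are assumptions chosen for the example, not a real database entry.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up record; the field names are illustrative only.
record = {"accession": "SEQ0001", "organism": "Example organism",
          "sequence": "ATGGCC"}

# 1. Table: a header row followed by one data row, written as CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(record)
as_table = buffer.getvalue()

# 2. Key-delimited record: one "key: value" pair per line.
as_key_value = "\n".join(f"{key}: {value}" for key, value in record.items())

# 3. XML structure: one child element per field.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_table)
print(as_key_value)
print(as_xml)
```

The information is the same in all three forms; what differs is how easily each form accommodates optional or nested fields.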
Effective use of agricultural residual biomass may be beneficial for
both local and global ecosystems. Recently, biochar has received
attention as a soil enhancer, and its effects on plant growth and soil
microbiota have been investigated. However, there is little information
on how the physical, chemical, and biological properties of soil
amended with biochar are affected. In this study, we evaluated the
effects of the incorporation of torrefied plant biomass on physical and
structural properties, elemental profiles, initial plant growth, and
metabolic and microbial dynamics in aridisol from Botswana.
Hemicellulose in the biomass was degraded while cellulose and lignin
were not, owing to the relatively low-temperature treatment in the
torrefaction preparation. Water retentivity and mineral availability for
plants were improved in soils with torrefied biomass. Furthermore,
fertilization with 3% and 5% of torrefied biomass enhanced initial
plant growth and elemental uptake. Although the metabolic and
microbial dynamics of the control soil were dominantly associated
with a C1 metabolism, those of the 3% and 5% torrefied biomass soils
were dominantly associated with an organic acid metabolism.
Torrefied biomass was shown to be an effective soil amendment by
enhancing water retentivity, structural stability, and plant growth and
controlling soil metabolites and microbiota.
BIOINFORMATICS TOOLS:

MATLAB (The MathWorks, Inc.) is a general-purpose technical
computing language and development environment that is widely used in
scientific and engineering applications. The Bioinformatics Toolbox for
MATLAB is a library of functions that adds bioinformatics capabilities
to the MATLAB environment. This article describes the functions in the
Bioinformatics Toolbox and gives an example of how these can be used
for phylogenetic analysis of Human Immunodeficiency Virus (HIV) and
Simian Immunodeficiency Virus (SIV) sequence data. Keywords:
MATLAB; sequence alignment; phylogenetic trees; clustering;
visualization; algorithm; data analysis.
Bioinformatics Toolbox provides algorithms and apps for
Next Generation Sequencing (NGS), microarray analysis, mass
spectrometry, and gene ontology. Using toolbox functions, you can read
genomic and proteomic data from standard file formats such as SAM,
FASTA, CEL, and CDF, as well as from online databases such as the
NCBI Gene Expression Omnibus and GenBank. You can explore and
visualize this data with sequence browsers, spatial heatmaps, and
clustergrams. The toolbox also provides statistical techniques for
detecting peaks, imputing values for missing data, and selecting
features.
You can combine toolbox functions to support common bioinformatics
workflows. You can use ChIP-Seq data to identify transcription factors;
analyze RNA-Seq data to identify differentially expressed genes; identify
copy number variants and SNPs in microarray data; and classify protein
profiles using mass spectrometry data.

Getting Started
Learn the basics of Bioinformatics Toolbox

High-Throughput Sequencing
Gene expression, transcription factor, and methylation analysis of NGS
data, including RNA-Seq and ChIP-Seq

Microarray Analysis
Gene expression and genetic variant analysis of microarray data

Sequence Analysis
Genomic and proteomic sequences, alignment, and phylogenetics

Structural Analysis
Visualize and manipulate 3-D structures of proteins and other
biomolecules; RNA secondary structure prediction and visualization

Mass Spectrometry and Bioanalytics
Data from separation techniques that produce traces with peaks,
including MS, LC/MS, NMR, chromatography, and electrophoresis

Bioinformatics tools are software programs designed for extracting
meaningful information from the mass of molecular biology and
biological databases and for carrying out sequence or structural
analysis. Factors that must be taken into consideration when designing
bioinformatics tools, software and programs are:

The end user (the biologist) may not be a frequent user of
computer technology

These software tools must be made available over the internet,
given the global distribution of the scientific research community

MAJOR CATEGORIES OF BIOINFORMATICS TOOLS:

Bioinformatics Tools can be classified as

Homology and similarity tools.

Protein functional analysis tools.

Sequence analysis tools.

Miscellaneous tools.
Bioinformatics is done with sequence search programs like
BLAST, sequence analysis programs like the EMBOSS and Staden
packages, structure prediction programs like THREADER or PHD, and
molecular imaging/modelling programs like RasMol and WHATIF.

SOME EXAMPLES OF BIOINFORMATICS TOOLS:

BLAST:
BLAST (Basic Local Alignment Search Tool) comes under the
category of homology and similarity tools. It is a set of search
programs, available through the NCBI web service and as standalone
software for several platforms, used to perform fast similarity
searches regardless of whether the query is for protein or DNA.
A nucleotide query can be compared against the nucleotide sequences in
a database, and a protein database can be searched to find a match
against the queried protein sequence. NCBI has also introduced a
queuing system for BLAST (QBLAST) that allows users to retrieve results
at their convenience and format their results multiple times with
different formatting options.
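
The notion of local alignment behind a similarity search can be illustrated with the Smith-Waterman dynamic-programming algorithm. This is not how BLAST itself works: BLAST uses a much faster heuristic, and the sketch below computes the exact score that the heuristic approximates. The scoring values (match 2, mismatch -1, gap -2), the query and the three database entries are arbitrary choices made for this example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# Made-up query and database entries.
query = "ACGT"
database = {"entry1": "TTACGTTT", "entry2": "GGGGCCCC", "entry3": "TACGA"}
scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Ranking database entries by such a score is the essence of a similarity search; heuristic tools avoid filling the full table for every entry, which is what makes them fast on large databases.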
INTRODUCTION TO SEQTOOLS AND LIST OF FUNCTIONS UNDER SEQTOOLS
1.1 About SEQtools
1.1.1 Special features
1.1.2 User interaction
1.2 SEQtools help sources
1.3 About registration and licenses
1.4 User interaction (update policy)
1.5 Support (bug reports)

1.1 ABOUT SEQTOOLS

SEQtools 8.3 is a win32 software package for handling and analysis of
nucleotide and protein sequences. The program includes a series of
trivial functions to help you carry out common operations. In addition
SEQtools will assist you with more demanding tasks like unattended
batch blast search at NCBI. SEQtools includes advanced facilities for
retrieving, storing, handling and listing search results.

1.1.1 SPECIAL FEATURES

Special functions are included for design of microarray gene expression
analysis experiments, for expression analyses with the SAGE procedure
and for managing small EST projects. Utilities are included for primer
design and ordering, renaming files, creating codon usage tables,
building local searchable databases, aligning nucleotide and protein
sequences, comparing sequences and a lot more. Recently an option to
export sequence data to a Microsoft Excel spreadsheet has been included.
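
To show what a codon usage table contains, the following sketch counts the codons of a short coding sequence in Python. It is not SEQtools code and makes no claim about how SEQtools computes its tables; the function name `codon_usage` and the input sequence are made up, and the sequence is assumed to start in frame at its first base.

```python
from collections import Counter


def codon_usage(coding_sequence):
    """Count codons in a coding sequence read in frame from the first base."""
    sequence = coding_sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = (sequence[i:i + 3] for i in range(0, usable, 3))
    return Counter(codons)


counts = codon_usage("ATGGCCGCCAAATAA")  # made-up sequence: ATG GCC GCC AAA TAA
for codon, count in counts.most_common():
    print(codon, count)
```

Dividing each count by the total number of codons would give relative frequencies, the form in which codon usage is usually compared between genes or organisms.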

1.1.2 USER INTERACTION

SEQtools is a very responsive software package. User comments and
suggestions are highly appreciated and play a key role in keeping the
program bug-free and up to date. You can use SEQtools free of charge
for as long as you wish if you keep your registration alive by confirming
the registration every 60 days.

1.2 SEQTOOLS HELP SOURCES

SEQtools does not come with a printed manual. As the whole SEQtools
organisation consists of a single person, it is simply not possible to
maintain the code, the context-sensitive help and the web help. Although
I try to keep the context-sensitive help, which is built into the
program, up to date, the updating usually lags several revisions behind.
Pressing F1 brings up context-sensitive help information relating to the
currently active program item.

The SEQtools homepage includes a fairly comprehensive manual which
is currently being revised to cover the latest changes to the program. I
will attempt to keep this source of help information up to date with
relevant illustrations covering the different topics.

1.3 REGISTRATION AND LICENSES

You can access the SEQtools registration form either from the program
as shown below or by visiting www.seqtools.dk

Providing SEQtools to users free of charge has the dual advantage that
users all over the world get free access to a fairly comprehensive
software package for sequence handling and analysis. In return I get
information about bugs and receive useful user input in the form of
suggestions and comments from a large number of users.
The difficult economic situation of many students and scientists in third
world countries is an additional argument for making the use of
SEQtools free of charge. The only condition for free access to
SEQtools is that users are requested to register after a testing period of
60 days and thereafter to keep their registration alive by renewing their
license every 60 days.

Old users of SEQtools already know that SEQtools is updated very
frequently. Unlike most other authors of software packages I prefer to
correct bugs right away and upload the corrected version. This used to
create the problem that users often complained about bugs that were
already corrected but not yet downloaded on their pc.

Recently I have included an "update-tester" in SEQtools. Every time you
start SEQtools it visits the download page to see if new updates are
available - and notifies you if there are. You may experience that your
license no longer works after upgrading to a newer version of SEQtools.
In this case you just have to renew your license to cover the upgraded
version.

The user name and the registration key are entered in the form shown
below. Note that this information must be entered exactly as in the
license agreement. The user name is case and "space" sensitive.
Entering incorrect information will terminate SEQtools immediately.
You can extend your license at www.seqtools.dk or by sending
an email to me.

1.4 USER INTERACTION

SEQtools has evolved in close association with its users. Numerous users
have contributed significantly to the program by suggesting new
functions to be included in the suite and, not least, by testing functions
and reporting the results to the author.

As SEQtools is maintained by a small organisation there is a very short
distance between coding a program revision and the publication of the
update. This has the advantage that bug fixes are made available to the
users very rapidly, usually the same day the bug is reported.
The disadvantage of the frequent revisions is that you need to update the
program often. Each time SEQtools is opened it will contact the
download page on the web to check if an update is available. If a
revision is available you are informed as SEQtools loads.

It is strongly recommended that you update your SEQtools installation
when a new update is available. As the auto-update process does not
require reinstallation of SEQtools I believe that this is a minor
inconvenience to ensure that you always work with a version of
SEQtools without known bugs or other problems.

1.5 SUPPORT - BUG REPORTS

You can find the latest additions and corrections to SEQtools in
the revision history section of the homepage. As a last resort, write
an email to me describing the problem (please include as many details as
possible) and I will do my best to assist you.

It is also possible to submit a bug report directly from SEQtools. Look
under the help menu to load the bug report form.
CONCLUSION:
I have studied some functions for examining the chemical, physical and
biological characteristics of data using SEQtools, available under the
BI Toolbox of MATLAB, which is beneficial because MATLAB provides a
good environment.
