Project Report On Hepatitis Virus

Introduction
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins. Bioinformatics is limited to sequence,
structural, and functional analysis of genes and genomes and their corresponding
products and is often considered computational molecular biology. It consists of
two subfields: the development of computational tools and databases and the application
of these tools and databases in generating biological knowledge to better understand
living systems. These tools are used in three areas of genomic and molecular biological
research: molecular sequence analysis, molecular structural analysis and molecular
functional analysis. The areas of sequence analysis include sequence alignment, sequence
database searching, motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, and genome assembly and comparison.
Structural analyses include protein and nucleic acid structure analysis, comparison,
classification and prediction. Functional analysis includes gene expression profiling,
protein-protein interaction prediction, protein subcellular localization prediction,
metabolic pathway reconstruction, and simulation. The three aspects of bioinformatics
analysis are not isolated but often interact to produce integrated results. For example,
protein structure prediction depends on sequence alignment data; clustering of gene
expression profiles requires the use of phylogenetic tree construction methods derived
in sequence analysis. Sequence-based prediction is related to the functional analysis of
coexpressed genes. The first major bioinformatics project was undertaken by Margaret
Dayhoff in 1965, who developed the first protein sequence database, called Atlas of Protein
Sequence and Structure. Subsequently, in the early 1970s, the Brookhaven National
Laboratory established the Protein Data Bank for archiving three-dimensional protein
structures. At its onset, the database stored less than a dozen protein structures, compared
to more than 30,000 structures today. The first sequence alignment algorithm was
developed by Needleman and Wunsch in 1970. This was a fundamental step in the
development of the field of bioinformatics, which paved the way for the routine sequence
comparisons and database searching practiced by modern biologists.
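The dynamic-programming idea behind the Needleman and Wunsch global alignment can be illustrated with a short Python sketch. The scoring values (match +1, mismatch -1, gap -1) and the function name needleman_wunsch are assumptions chosen for illustration; they are not taken from the original publication or from this report.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Practical tools use substitution matrices and affine gap penalties rather than the fixed values assumed here, but the table-filling and traceback steps follow the same pattern.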
A recent advance in bioinformatics is molecular modeling, which is aimed at
understanding structure-function and structure-property relationships in physico-chemical
processes and pharmaceuticals, and has thus become increasingly important for finding and
designing new drugs. Computers now play an important role in new drug
discovery and drug design.
HEPATITIS:-
Hepatitis (plural hepatitides) implies injury to the liver characterized by the presence of
inflammatory cells in the liver tissue. The word derives from the ancient Greek hepar or
hepato-, meaning 'liver', and the suffix -itis, denoting 'inflammation'. The condition can be
self-limiting, healing on its own, or can progress to scarring of the liver. Hepatitis is acute
when it lasts less than 6 months and chronic when it persists longer. A group of viruses
known as the hepatitis viruses cause most cases of liver damage worldwide. Hepatitis can
also be due to toxins (notably alcohol), other infections, or an autoimmune process.
It may run a subclinical course, in which the affected person may not feel ill. The patient
becomes unwell and symptomatic when the disease impairs liver functions that include,
among other things, screening of harmful substances, regulation of blood composition,
and production of bile to help digestion.
Causes
Acute hepatitis
• Viral hepatitis: Hepatitis A to E (more than 95% of viral
causes), Herpes simplex, Cytomegalovirus, Epstein-Barr, Yellow fever
virus, Adenoviruses.
• Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky
Mountain spotted fever
• Alcohol
• Toxins: Amanita toxin in mushrooms, carbon
tetrachloride, asafetida
• Drugs: Paracetamol, Amoxicillin, antituberculosis
medicines, Minocycline and many others.
• Ischemic hepatitis (circulatory insufficiency)(1)
• Pregnancy
• Autoimmune conditions, e.g. Systemic Lupus Erythematosus (SLE)
• Metabolic diseases, e.g. Wilson's disease
Chronic hepatitis
• Viral hepatitis: Hepatitis B with or without hepatitis D, hepatitis C
(Hepatitis A and E do not lead to chronic disease)
• Autoimmune: Autoimmune hepatitis
• Alcohol
• Drugs: Methyldopa, Nitrofurantoin, Isoniazid, Ketoconazole
• Non-alcoholic steatohepatitis
• Heredity: Wilson's disease, alpha 1-antitrypsin deficiency
• Primary biliary cirrhosis and primary sclerosing
cholangitis occasionally mimic chronic hepatitis[4]
Viral hepatitis
A virus is a particle that is smaller than a bacterium and carries its genetic
information as DNA or RNA. This genetic material allows the virus to infect bacteria
or living cells and set up the machinery to reproduce itself, which can lead to destruction of the cell
in which it resides. To date, five viruses, labeled A through E, have been identified which
appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated
water or food (by mouth), while viruses B, C and D are transmitted by direct injection
into the bloodstream (through any method of injection under the skin). The term viral
hepatitis describes any one of the illnesses caused by the five viruses mentioned, and
consists of an infection of liver cells which leads to damage of the liver over days in
some cases, but over many years in others. Thirty years ago, none of the hepatitis viruses
had been identified. In the 1960s, transfusion-related viral hepatitis was extremely
common, with 30% of patients receiving blood products becoming infected. By 1970, a
blood test for the Australia antigen had been developed, which appeared to identify those
infected with one hepatitis virus, which we now call hepatitis B. The
investigator who discovered the Australia antigen, the protein which makes up the coat of
the virus and which is now called the hepatitis B surface antigen (HBsAg), was awarded
the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the
discovery of the Australia antigen.
Currently 11 viruses are recognized as causing hepatitis. Two are
herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine are
hepatotropic viruses.
EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent
hepatic damage. Both viruses cause the typical infectious mononucleosis symptoms of fatigue,
nausea, and malaise.
Of the nine human hepatotropic viruses, only five are well characterized;
hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses.
Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called
enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The most
important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly
called non-A, non-B hepatitis), and hepatitis D (formerly called delta hepatitis).
Hepatitis A
Incubation period: 3-5 weeks (mean 28 days).
Milder disease than hepatitis B; asymptomatic infections are very common, especially in
children.
Adults, especially pregnant women, may develop more severe disease.
Although convalescence may be prolonged, there is no chronic form of the disease.
Fulminant hepatitis is rare: 0.1% of cases.
The virus enters via the gut, replicates in the
alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.
Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset
of symptoms.
Worldwide distribution; endemic in most countries. The incidence in first world
countries is declining. There is an especially high incidence in developing countries and
rural areas. In rural areas of South Africa, the seroprevalence is 100%.
Hepatitis E
Incubation period: 30-40 days.
Acute, self-limiting hepatitis; no chronic carrier state.
Age: predominantly young adults, 15-40 years.
Fulminant hepatitis occurs in pregnant women. Mortality rate is high (up to 40%).
Similar to hepatitis A: the virus replicates in the gut
initially, before invading the liver, and virus is shed in the stool prior to the onset of
symptoms. Viraemia is transient. A large inoculum of virus is needed to establish
infection.
Little is known yet. The incidence of infection appears to be low in first world
countries.
Hepatitis C
Member of the family Flaviviridae (genus Hepacivirus), related to the flaviviruses and pestiviruses.
Enveloped, with an ssRNA genome.
Does not grow readily in cell culture, but can infect chimpanzees.
Incubation period: 6-8 weeks.
Causes a milder form of acute hepatitis than does hepatitis B,
but 50% of individuals develop chronic infection following exposure.
Chronic infection can lead to:
1) Chronic liver disease
2) Hepatocellular carcinoma
Endemic worldwide; high incidence in Japan, Italy and Spain.
In South Africa, 1% of blood donors have antibodies.
Hepatitis D
Defective virus which requires hepatitis B as a helper virus in order to replicate.
Infection therefore only occurs in patients who are already infected with hepatitis B.
Increased severity of liver disease in hepatitis B carriers.
Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.
Delta antigen is associated with virus particles.
ssRNA genome.
Identified in intravenous drug abusers.
Hepatitis G
A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C
hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally
transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has
been classified in the family Flaviviridae.
Hepatitis B
What is the Hepatitis B Virus?
The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting
human liver cells and other cells in the body, once it gains access to the blood stream.
One of the most interesting features of the hepatitis B virus is that the virus itself does not
damage the liver, the damage being caused by the individual's own immune system
attacking the virus-infected cells. Since liver damage from the virus may be very little,
many patients are called healthy carriers. This means that although they may transmit the
disease to others, they have normal-appearing livers and normal liver function tests.
While many individuals remain healthy for many years or a lifetime, others develop
chronic hepatitis, cirrhosis, and occasionally liver cell cancer. These outcomes are linked
to the virus and its effects, although it is unlikely that the virus directly causes cancer.
Those patients who develop hepatitis (damage to liver cells with inflammation), do so on
account of the body's normal inclination to attack the foreign proteins contained in
viruses, and in the cells in which the viruses are found. This process, called the immune
response, determines the pace and the severity of the liver cell injury in this condition,
and will be described in more detail below.
Since the identification of the hepatitis B virus, several other nearly identical viruses
have been identified in Eastern woodchucks, ground squirrels and Peking
ducks. The members of this virus family, termed the 'Hepadna' viruses, have life
cycles similar to that observed in man and can serve as animal models, allowing further study of
these unique disease-causing agents.
Classification and general features:
Family: Hepadnaviridae
Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g.
duck hepatitis B virus)
Size: 42 nm. Virions (also known as "Dane particles") contain a circular dsDNA genome.
Fig. Hepatitis B virus structure
HBV Antigens
HBsAg = surface (coat) protein produced in excess as small spheres and tubules
HBcAg = inner core protein
HBeAg = secreted protein; function unknown.
Clinical Features
Incubation period 2 - 5 months
Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.
Asymptomatic infections occur frequently.
Pathogenesis
Infection is parenterally transmitted. The virus replicates in the liver and virus
particles, as well as excess viral surface protein, are shed in large amounts into the blood.
Viraemia is prolonged and the blood of infected individuals is highly infectious.
Complications
1) Persistent infection:-
Following acute infection, approximately 5% of infected individuals fail to eliminate the
virus completely and become persistently infected.
Those who are at particular risk include:
babies, young children
immunocompromised patients
males > females
The virus persists in the hepatocytes and ongoing liver damage occurs because of the
host immune response against the infected liver cells.
Chronic infection may take one of two forms:
Chronic persistent hepatitis - the virus persists, but there is minimal liver damage
Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid
progression to cirrhosis or liver failure.
2) Hepatocellular carcinoma:-
Patients who become persistently infected are at
risk of developing hepatocellular carcinoma (HCC).
HBV is thought to play a role in the development of this malignancy because:
a) 80% of patients with HCC are carriers of hepatitis B.

b) Virus DNA can be identified in hepatocellular carcinoma cells.
c) Virus DNA can integrate into the host chromosome.
3) Fulminant Hepatitis
Rare; accounts for 1% of infections.
Epidemiology
Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom
are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is
much more common in rural communities than in the cities. Hepatitis B is parenterally
transmitted through:
1) Blood:
• Blood transfusions, serum products,

• sharing of needles, razors
• Tattooing, acupuncture
• Renal dialysis
• Organ donation
2) Sexual intercourse
3) Horizontal transmission in children, families, 'close personal contact'.
This is the major mode of transmission in South Africa, where the majority of individuals
become infected at between three and nine years of age.
Horizontal transmission also occurs in children's institutions and mental homes.
4) Vertical transmission - perinatal transmission from a carrier mother to her baby
• Transplacental (rare)
• During delivery
• Postnatal (possibly through breast feeding or close contact)
(This is the major mode of transmission in South East Asia)
Diagnosis: Serology
Acute infection with resolution
Viral antigens:
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver
2) 'e' antigen (HBeAg) secreted protein is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver
3) core antigen (HBcAg) core protein is not found in blood
Antibody response:
1) Surface antibody (anti-HBs) becomes detectable late in convalescence, and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection
4) Core IgG rises soon after IgM, and remains present for life in both chronic carriers as
well as those who clear the infection. Its presence indicates exposure to HBV.
Fig. Hepatitis B virus in serum.
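The marker-by-marker reading of HBV serology described above can be summarised as a small lookup, sketched here in Python. The function name interpret_hbv_serology and its boolean parameters are hypothetical names introduced for illustration; the interpretations are only the ones stated in this section, and the sketch is not a diagnostic tool.

```python
def interpret_hbv_serology(*, hbsag=False, hbeag=False, anti_hbs=False,
                           anti_hbe=False, core_igm=False, core_igg=False):
    """Return the textbook meaning of each positive HBV serum marker.

    Each argument is True when the marker is detected in serum.
    Core antigen (HBcAg) is omitted because it is not found in blood.
    """
    meanings = []
    if hbsag:
        meanings.append("HBsAg: virus replication is occurring in the liver")
    if hbeag:
        meanings.append("HBeAg: high level of viral replication")
    if anti_hbs:
        meanings.append("anti-HBs: immunity following infection")
    if anti_hbe:
        meanings.append("anti-HBe: low infectivity in a carrier")
    if core_igm:
        meanings.append("core IgM: recent infection")
    if core_igg:
        meanings.append("core IgG: exposure to HBV")
    if not meanings:
        meanings.append("none of the listed markers detected")
    return meanings


if __name__ == "__main__":
    # Pattern described in the text for someone who has cleared the infection
    for line in interpret_hbv_serology(anti_hbs=True, core_igg=True):
        print(line)
```

Keeping one rule per marker mirrors the list in the text; combining markers into an overall clinical status would need rules that this section does not give.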
Prevention
1) Active Immunization
Two types of vaccine are available:

Serum derived - prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg - made by genetic engineering in yeasts
Both vaccines are equally safe and effective. The administration of three doses induces
protective levels of antibodies in 95% of vaccine recipients.
Universal immunization of infants was introduced in April 1995. Infants receive 3 doses
at 6, 10 and 14 weeks of age.
Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers
2) Passive Antibody
Hepatitis B immune globulin should be administered to non-immune individuals
following single-episode exposure to HBV-infected blood, for example needlestick
injuries.
What is Hepatitis B Infection Like?
When most individuals become infected with the hepatitis B virus, they are not aware of
the infection for several weeks, until they develop symptoms of acute hepatitis, such as
nausea, fatigue and jaundice (yellowing of the skin and eyes). The acute hepatitis phase may last
for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves
completely in 95% of those infected.
Others who do not develop significant symptoms following exposure
may not be aware of the infection. These individuals may also overcome the infection
completely and develop immunity, but frequently become chronic carriers.
The outcome of hepatitis B infection depends to a great extent on
the status of the person's immune system at the time of exposure. Most chronic carriers or
those with chronic hepatitis B are not aware of their on-going infection, although some
have persistent fatigue.
Molecular virology
Genome: circular, double-stranded and 3.2 kb in size. It has a compact
organization, with four overlapping reading frames running in one direction and no
noncoding regions. The minus strand is unit length and has a protein covalently attached
to the 5' end. The other strand, the plus strand, is variable in length, but is less than unit
length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed
and circularity is maintained by cohesive ends (Strauss, 2002). The four overlapping open
reading frames (ORFs) in the genome are responsible for the transcription and expression
of seven different hepatitis B proteins. The transcription and translation of these proteins
is through the use of multiple in-frame start codons. The HBV genome also contains
elements that regulate transcription, determine the site of polyadenylation and mark a specific
transcript for encapsidation into the nucleocapsid.
Fig. Hepatitis B virus genome
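How several proteins can arise from one open reading frame through multiple in-frame start codons can be illustrated with a small Python sketch. The function name nested_products and the toy sequence are hypothetical and are not HBV data; the sketch also treats the sequence as linear and reads a single frame on one strand, which is a simplification of a circular genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_products(dna, frame=0):
    """Group in-frame ATG start codons by the stop codon they share.

    Returns a list of (stop_position, [coding_sequences]) tuples, where each
    coding sequence runs from one ATG to the shared stop codon (inclusive).
    """
    dna = dna.upper()
    results = []
    starts = []  # positions of ATG codons seen since the last stop codon
    for pos in range(frame, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        if codon == "ATG":
            starts.append(pos)
        elif codon in STOP_CODONS:
            if starts:
                results.append((pos, [dna[s:pos + 3] for s in starts]))
            starts = []
    return results


if __name__ == "__main__":
    toy = "ATGAAAATGCCCATGGGGTAA"  # made-up sequence with three in-frame ATGs
    for stop, products in nested_products(toy):
        for seq in products:
            print(stop, len(seq), seq)
```

In the toy sequence the three ATG codons share one stop codon, so the three coding sequences differ only in how far they extend at the 5' end, which is the general pattern the paragraph describes.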
Life cycle
In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of
supporting its replication. Although hepatocytes are known to be the most effective cell type for
replicating HBV, other types of cells in the human body have been found to support
replication to a lesser degree.
The initial steps following HBV entry are not clearly defined, although it is
known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell
surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the
nucleus, where it is known to form a covalently closed circular form called cccDNA.
The (-) strand of cccDNA is the template for transcription
by RNA pol II of a longer-than-genome-length RNA called the pregenome and shorter subgenomic
transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes
attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV
surface antigens in the viral envelope are assembled.
The pregenome RNA is translated to produce a polymerase protein, P, which then
binds to a specific site at the 5' end of its own transcript, where viral DNA synthesis eventually
occurs. At the same time as capsid formation, the RNA-P protein complex is packaged
and reverse transcription begins.
At early times after infection, the DNA is recirculated to the nucleus,
where the process is repeated, resulting in the accumulation of 10 to 30 molecules of
cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
Fig. HBV life cycle
The hepatitis B virion, also known as the Dane particle, is the one infectious particle
found within the body of an infected patient. This virion has a diameter of 42 nm and its
outer envelope contains a high quantity of hepatitis B surface proteins. The envelope
surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins
in an icosahedral arrangement. The nucleocapsid also contains at least one
hepatitis B polymerase protein (P) along with the HBV genome.
In infected people, virions actually compose a small minority of HBV-derived particles.
Large numbers of smaller subviral particles are also present, and usually outnumber the
virions in the ratio of 100:1. These two subviral particles, the hepatitis B filament and the
hepatitis B sphere, are often referred to as a group as surface antigen particles. The
sphere contains both middle and small surface proteins, whereas the filament also
includes large hepatitis B surface protein.
The absence of the hepatitis B core, polymerase, and genome makes these particles
non-infectious. High levels of these non-infectious particles can be found
during the acute phase of the infection. Since the non-infectious particles present the
same sites as the virion, they induce a significant immune response and are thought to be
disadvantageous for the virus. However, it is also believed that the presence of high
levels of non-infectious particles may allow the infectious viral particles to travel
undetected by antibodies through the blood stream (Garces, HBVP).
Hepatitis B Antigens:
There are three different types of hepatitis B antigens encoded by the HBV genome.
Hepatitis B Surface Antigen (HBsAg)- There are three different types of hepatitis B
surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis
B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg
is the smallest of the hepatitis B surface proteins and has historically been known
as the Australia antigen (Au antigen). It is very hydrophobic, containing four
transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B
particle forms and appears to be manufactured by the virus in high quantities. It also
contains a highly antigenic epitope which may be responsible for triggering an immune
response. Despite the high antigenicity and prevalence of these particles, the
immune system appears largely oblivious to their presence.
Hepatitis B Core Antigen (HBcAg)- The only HBV antigen that cannot be detected
directly by blood test; this antigen can only be isolated by analyzing an infected
hepatocyte. It is a 185 amino acid protein expressed in the cytoplasm of infected cells and
is closely associated with nucleocapsid assembly (Strauss, 2002).
Hepatitis B e Antigen (HBeAg)- The e antigen is named for its "early" appearance
during an acute HBV infection. Thought to be located in the core structure of the virus
particle, this antigen can be detected by blood test. If found, it is usually indicative of
complete virus particles in circulation (Strauss, 2002).
REVIEW OF LITERATURE
Approximately 5% of the world population is infected by the hepatitis B virus (HBV) that
causes a necroinflammatory liver disease of variable duration and severity. Chronically
infected patients with active liver disease carry a high risk of developing cirrhosis and
hepatocellular carcinoma.
Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of
complex structure. HBV is classified as an orthohepadnavirus within the family
Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigenic
particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase)
called the Dane particle, and tubular or filamentous particles that vary in length. Of these, the
Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood
transfusion, contaminated equipment, drug users' unsterile needles, or any body secretion.[1]
The immune response to HBV-encoded antigens is responsible both for viral clearance
and for disease pathogenesis during this infection. While the humoral antibody response
to viral envelope antigens contributes to the clearance of circulating virus particles, the
cellular immune response to the envelope, nucleocapsid, and polymerase antigens
eliminates infected cells.
The dominant cause of viral persistence during HBV infection is the development of a
weak antiviral immune response to the viral antigens. While neonatal tolerance probably
plays an important role in viral persistence in patients infected at birth, the basis for poor
responsiveness in adult-onset infection is not well understood and requires further
analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may
contribute to the worsening of viral persistence in the setting of an ineffective immune
response, as can the incomplete downregulation of viral gene expression and the infection
of immunologically privileged tissues. Chronic liver cell injury and the attendant
inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for
the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of
the immunological and virological basis for
HBV persistence may yield immunotherapeutic and antiviral strategies to terminate
chronic HBV infection and reduce the risk of its life-threatening sequelae.[2]
Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver.
Transient infections run a course of several months, and chronic infections are often
lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular
carcinoma. The replication strategy of these viruses has been described in great detail, but
virus-host interactions leading to acute and chronic disease are still poorly understood.
Studies on how the virus evades the immune response to cause prolonged transient
infections with high-titer viremia and lifelong infections with an ongoing inflammation of
the liver are still at an early stage, and the role of the virus in liver cancer is still elusive.
The state of knowledge in this very active field is therefore reviewed with an emphasis on
past accomplishments as well as goals for the future [3]
1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and
tubules. Its presence in serum indicates that virus replication is occurring in the liver
2) 'e' antigen (HBeAg) secreted protein is shed in small amounts into the blood. Its
presence in serum indicates that a high level of viral replication is occurring in the liver
3) core antigen (HBcAg) core protein is not found in blood
Antibody
1) Surface antibody (anti-HBs) becomes detectable late in convalescence, and indicates
immunity following infection. It remains detectable for life and is not found in chronic
carriers .
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low
infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection
4) Core IgG rises soon after IgM, and remains present for life in both chronic carriers as
well as those who clear the infection. Its presence indicates exposure to HBV.[4]
Homology or comparative modeling involves the prediction of the structure of a query
sequence from the structures of one or more structural templates. The procedure involves
the identification of possible templates that have a clear sequence relationship to the
query, the assembly of the model, the prediction of regions of the structure that are likely
to have different conformations than the templates (e.g., loops), and ultimately, the
refinement of the structure in an attempt to account for inherent differences between the
template and query structures. As mentioned above, homology modeling figures heavily
as a rationale for structural genomics initiatives under the stated assumption that accurate
models can be built for query sequences that have a greater than 30% sequence identity
with their best template.
The quality of the alignment of the query to the template sequence is a major factor in
determining the quality of homology models. This is one of the sources of the 30% rule,
because alignment quality usually decreases dramatically below about 30% sequence
identity. (A structural explanation for this observation has been offered by Chung and
Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based
profile methods such as those described above should result in continuing improvements
in the quality of homology models. [5,6]
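The 30% rule discussed above depends on how sequence identity is measured from an alignment. The following Python sketch shows one common convention, counting identical residues over aligned (non-gap) positions. The function names percent_identity and meets_identity_rule are hypothetical, and other definitions of the denominator (for example, the length of the shorter sequence) are also in use, so the choice here is an assumption.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def meets_identity_rule(identity, threshold=30.0):
    """True when identity exceeds the threshold (30% by default)."""
    return identity > threshold


if __name__ == "__main__":
    ident = percent_identity("AC-DE", "ACGDF")  # toy alignment, not real data
    print(ident, meets_identity_rule(ident))
```

A query whose best template falls below the threshold is not necessarily unmodelable, but, as the text notes, alignment quality and hence model quality usually drop sharply in that range.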
With the number of protein-ligand complexes available in the Protein Data Bank
constantly growing, structure-based approaches to drug design and screening have
become increasingly important. Alongside this explosion of structural information, a
number of molecular docking methods have been developed over the last years with the
aim of maximally exploiting all available structural and chemical information that can be
derived from proteins, from ligands, and from protein-ligand complexes. In this respect,
the term 'guided docking' is introduced to refer to docking approaches that incorporate
some degree of chemical information to actively guide the orientation of the ligand into
the binding site. To reflect the focus on the use of chemical information, a classification
scheme for guided docking approaches is proposed. In general terms, guided docking
approaches can be divided into indirect and direct approaches. Indirect approaches
incorporate chemical information implicitly, having an effect on scoring but not on
orienting the ligand during sampling. In contrast, direct approaches incorporate chemical
information explicitly, thus actively guiding the
orientation of the ligand during sampling. Direct approaches can be further divided into
protein-based, mapping-based, and ligand-based approaches to reflect the source used to
derive the features capturing the chemical information inside the protein cavity. Within
each category, a representative list of docking approaches is discussed. In view of the
limitations of current scoring functions, it was generally found that making optimal use of
chemical information represents an efficient knowledge-based strategy for improving
binding affinity estimations, ligand binding-mode predictions, and virtual screening
enrichments obtained from protein-ligand docking. [7]
This review gives an introduction into ligand - receptor docking and illustrates the basic
underlying concepts. An overview of different approaches and algorithms is provided.
Although the application of docking and scoring has led to some remarkable successes,
there are still some major challenges ahead, which are outlined here as well. Approaches
to address some of these challenges and the latest developments in the area are presented.
Some aspects of the assessment of docking program performance are discussed. A
number of successful applications of structure-based virtual screening are described. [8]
Material and methods
Bioinformatics is an interdisciplinary research area at the interface between computer
science and biological science. It involves the technology that uses computers for storage,
retrieval, manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
Bioinformatics is limited to sequence, structural, and functional analysis of genes and
genomes and their corresponding products and is often considered computational
molecular biology. It consists of two subfields: the development of computational tools
and databases and the application of these tools and databases in generating biological
knowledge to better understand living systems. These tools are used in three areas of
genomic and molecular biological research: molecular sequence analysis,
molecular structural analysis and molecular functional analysis.
1. NCBI
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information,
all for the better understanding of molecular processes affecting human health and
disease.
Swiss-Prot
A curated protein sequence database which strives to provide a high level of
annotation (such as the description of the function of a protein, its domain
structure, post-translational modifications, variants, etc.), a minimal level of
redundancy and a high level of integration with other databases.
2. Protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8; EC 2.7.1.31; from human. Subcellular location: cytoplasm.
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.
3. FASTA
FASTA is a DNA and Protein sequence alignment software package first described (as
FASTP) by David J. Lipman and William R. Pearson in 1985 in the article Rapid and
sensitive protein similarity searches. The original FASTP program was designed for
protein sequence similarity searching. FASTA, described in 1988 (Improved Tools for
Biological Sequence Comparison) added the ability to do DNA:DNA searches, translated
protein:DNA searches, and also provided a more sophisticated shuffling program for
evaluating statistical significance. There are several programs in this package that allow
the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye"
and stands for "FAST-All", because it works with any alphabet, an extension of
"FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
The current FASTA package contains programs for protein:protein, DNA:DNA,
protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.
Recent versions of the FASTA package include special translated search algorithms that
correctly handle frameshift errors (which six-frame-translated searches do not handle
very well) when comparing nucleotide to protein sequence data.
In addition to rapid heuristic search methods, the FASTA package provides SSEARCH,
an implementation of the optimal Smith-Waterman algorithm. A major focus of the
package is the calculation of accurate similarity statistics, so that biologists can judge
whether an alignment is likely to have occurred by chance, or whether it can be used to
infer homology. The FASTA package is available from fasta.bioch.virginia.edu
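The optimal local alignment computed by Smith-Waterman can be illustrated with a minimal sketch. This is not the SSEARCH implementation: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas real searches typically use a substitution matrix and affine gap penalties. The function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not FASTA/SSEARCH defaults.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only the best score is returned here; recovering the aligned residues would additionally require a traceback through the full matrix.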
4. BLAST
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences of
different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
5. Primary & secondary structure analysis
Using ProtParam for primary structure
ProtParam computes various physico-chemical properties that can be deduced from a
protein sequence. No additional information is required about the protein under
consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession
number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you
provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with
an intermediary page that allows you to select the portion of the sequence on which you
would like to perform the analysis. The choice includes a selection of mature chains or
peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking
on the positions), as well as the possibility to enter start and end position in two boxes.
By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.
Among others, it calculates the following parameters:
• extinction coefficient
• half-life
• instability index
• aliphatic index
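As an illustration of how one of these parameters is derived from composition alone, the sketch below computes the aliphatic index using the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name and the dictionary input format are assumptions made for this example.

```python
def aliphatic_index(counts):
    """Aliphatic index from a dict mapping one-letter residue codes to counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty composition")

    def mole_percent(residue):
        return 100.0 * counts.get(residue, 0) / total

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy composition, not taken from the report.
    print(aliphatic_index({"A": 2, "V": 1, "L": 1, "G": 6}))
```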
Using SOPMA for secondary structure analysis
The self-optimized prediction method (SOPM) was described to improve the success rate in
the prediction of the secondary structure of proteins. Its authors reported further
improvements brought about by predicting all the sequences of a set of aligned proteins
belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5%
of amino acids for a three-state description of the secondary structure (α-helix, β-sheet
and coil) in a whole database containing 126 chains of non-homologous (less than 25%
identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD)
correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are
available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
PROTOCOL FOLLOWED
Obtained the receptor (target protein) for the HBV strain from literature references,
journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure using a BLAST similarity search: PDB ID 2B8N.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others
available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and found the structure of the drug molecule.
In protein struct
modeling, is a cla
from its amino ac
conserved, which may in turn lead to experiments to test those hypotheses. For example,
the spatial arrangement of conserved residues may suggest whether a particular residue is
conserved to stabilize the folding, to participate in binding some small molecule, or to
foster association with another protein or nucleic acid.
Figure: First, the known template 3D structures are aligned with the target sequence to be
modelled. Second, spatial features, such as Cα-Cα distances, hydrogen bonds, and main chain and
side chain dihedral angles, are transferred from the templates to the target. Thus, a number of
spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the
restraints as well as possible.
Homology modeling can produce high-quality structural models when the target and
template are closely related, which has inspired the formation of a structural genomics
consortium dedicated to the production of representative experimental structures for all
classes of protein folds. The chief inaccuracies in homology modeling, which worsen
with lower sequence identity, derive from errors in the initial sequence alignment and
from improper template selection. Like other methods of structure prediction, current
practice in homology modeling is assessed in a biennial large-scale experiment known as
the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Template selection and sequence alignment
The critical first step in homology modeling is the identification of the best template
structure, if indeed any are available. The simplest method of template identification
relies on serial pairwise sequence alignments aided by database search techniques such as
FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of
which PSI-BLAST is the most common example, iteratively update their position-specific
scoring matrix to successively identify more distantly related homologs. This
family of methods has been shown to produce a larger number of potential templates and
to identify better templates for sequences that have only distant relationships to any
solved structure. Protein threading, also known as fold recognition or 3D-1D alignment,
can also be used as a search technique for identifying templates to be used in traditional
homology modeling methods. When performing a BLAST search, a reliable first
approach is to identify hits with a sufficiently low E-value, which are considered
sufficiently close in evolution to make a reliable homology model. Other factors may tip
the balance in marginal cases; for example, the template may have a function similar to
that of the query sequence, or it may belong to a homologous operon. However, a
template with a poor E-value should generally not be chosen, even if it is the only one
available, since it may well have a wrong structure, leading to the production of a
misguided model. A better approach is to submit the primary sequence to fold-recognition
servers or, better still, consensus meta-servers which improve upon individual
fold-recognition servers by identifying similarities (consensus) among independent
predictions.
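The E-value screening described above can be sketched as a simple filter. The hit identifiers, scores and E-values are the ones listed in the similarity search results of this report; the cut-off of 1e-5 and the function name are assumptions for illustration only, since the appropriate threshold depends on the database and the search.

```python
# (PDB chain, score, E-value) as listed in the report's search results.
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def candidate_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cut-off, best first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


if __name__ == "__main__":
    for name, score, evalue in candidate_templates(hits):
        print(name, score, evalue)
```

Hits that pass this filter are only candidates; coverage and functional similarity still need to be weighed before a template is chosen.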
Often several candidate template structures are identified by these approaches. Although
some methods can generate hybrid models from multiple templates, most methods rely
on a single template. Therefore, choosing the best template from among the candidates is
a key step, and can affect the final accuracy of the structure significantly. This choice is
guided by several factors, such as the similarity of the query and template sequences, of
their functions, and of the predicted query and observed template secondary structures.
Perhaps most important are the coverage of the aligned regions (the fraction of the query
sequence structure that can be predicted from the template) and the plausibility of the
resulting model. Thus, sometimes several homology models are produced for a single
query sequence, with the most likely candidate chosen only in the final step.
It is possible to use the sequence alignment generated by the database search technique as
the basis for the subsequent model production; however, more sophisticated approaches
have also been explored.
7. Molecular Docking
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a
ligand and its binding site.
There are two classes of protein docking:
1) Protein-protein docking
2) Protein receptor-ligand docking
Protein-Protein Docking interactions
Protein-protein interactions occur between two proteins that are similar in size. The
interface between the two molecules tends to be flatter and smoother than those in
protein-ligand interactions. Protein-protein interactions are usually more rigid; the
interfaces of these interactions do not have the ability to alter their conformation in
order to improve binding and ease movement. Conformational changes are limited by steric
constraints and thus the interfaces are said to be rigid.
Fig: Protein-Protein docking.

Protein Receptor–Ligand docking
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and
key mechanism. There is both high specificity and induced fit within these interfaces with
specificity increasing with rigidity. Protein receptor-ligand can either have a rigid ligand
and a flexible receptor, or a flexible ligand with a rigid receptor.
Fig: Protein Ligand-Receptor Docking
Rigid Ligand with a Flexible Receptor
The native structure of a rigid ligand with a flexible receptor often maximizes the interface
area between the molecules. The two molecules move with respect to one another in a
direction perpendicular to the interface. This allows for binding of a receptor with a
larger than usual ligand. Normally, when there is ligand overlap in the docking interface,
energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in
the system will be minimized. This can be accomplished by allowing flexibility in the
receptor. Flexible receptors allow for docking of a larger ligand than would be allowed for
with a rigid receptor.
Flexible Ligand with a Rigid Receptor
When the fit between the ligand and receptor does not need to be induced, the receptor
can retain its rigidity while maintaining the free energy of the system. For successful
docking, the parameters of the ligand need to be maintained and the ligand must be
slightly smaller in size than that of the receptor interface. No docking is completely rigid
though; there is intrinsic movement which allows for small conformational adaptation for
ligand binding. When the six degrees of freedom for protein movement are taken into
consideration (three rotational, three translational), the amount of inherent flexibility
allowed to the receptor is even greater. This further offsets any energy penalty between
the receptor and ligand, allowing for easier, more energetically favorable binding between
the two.
Aim of docking
The aim of docking is to identify new drug targets, which will open new vistas for further
drug development. The findings of our docking will be useful in finding a cure for the
infectious disease bird flu; they will also open new avenues for finding other possible drug
targets in influenza A virus. The docking results can be used to design new lead
compounds and hence can aid the drug discovery process.
Receptor
A residue on the surface of the cell that serves as a recognition or binding site for
antigens, antibodies or other cellular or immunological components. It is a molecule within
or on the surface of a cell to which a substance (such as a hormone or a drug) selectively
binds, causing a change in the activity of the cell.
Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through
the interaction of many weak, noncovalent bonds formed with the binding site of a protein,
the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid
residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the
cofactor, if any). It also contains the residues that directly participate in the making and
breaking of bonds. These residues are called the catalytic groups. In essence, the
interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the
activation free energy (ΔG‡) of the reaction, which results in the rate enhancement
characteristic of enzyme action.
Amino acids in protein active sites:
It is difficult to generalize which amino acids are likely to be in a protein
active/functional site as this greatly depends on the type of function. With that in mind,
below are preferences for the 20 amino acids to lie within functional regions on proteins.
These were worked out by considering how often particular amino acids were in contact
with bound non-protein atoms in protein three-dimensional structures. Positive values
mean that the amino acid makes more contacts than one would expect by chance;
negative values mean that it makes fewer. The list below does not include protein-protein or
protein-peptide interactions, where many of the amino acids with negative values (e.g.
tryptophan or proline) can play critical roles.
His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
RAMACHANDRAN PLOT
A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram),
developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral
angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the
possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main
chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations are described by
the torsion angles φ and ψ, respectively, and the plot is drawn between them. Ramachandran
used computer models of small polypeptides to systematically vary φ and ψ with the objective
of finding stable conformations. For each conformation, the structure was examined for close
contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to
their van der Waals radii. The angles that cause spheres to collide correspond to sterically
disallowed conformations of the polypeptide backbone.
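Each point on the plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The following is a minimal sketch of that calculation for arbitrary Cartesian coordinates; the helper names are hypothetical and the sign follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)   # central bond
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane p0-p1-p2
    n2 = _cross(b1, b2)  # normal of the plane p1-p2-p3
    b1_len = math.sqrt(_dot(b1, b1))
    y = _dot(b1, _cross(n1, n2)) / b1_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates, not taken from any structure in the report.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For φ, the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i); for ψ, those of N(i), Cα(i), C(i) and N(i+1).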
SAVS (Structure analysis and validation server)
SAVS is a server for analyzing protein structures for validity and assessing how correct
they are. Depending on how many programs one selects to use, the server can take several
minutes to run. The run time also depends on how many residues there are in the protein
that is submitted.
PROCHECK
The aim of PROCHECK is to assess how normal, or conversely how unusual, the
geometry of the residues in a given protein structure is, as compared with stereochemical
parameters derived from well-refined, high-resolution structures. The checks also make
use of 'ideal' bond lengths and bond angles, as derived from a comprehensive
analysis of small molecule structures in the Cambridge Structural Database (CSD).
INPUT
The input to PROCHECK is a single file containing the coordinates of the protein
structure. One of the by-products of running PROCHECK is that the coordinate file will be
"cleaned up" by the first of the programs. The cleaning-up process corrects any
mislabelled atoms and creates a new coordinate file which has a file extension of
.new. The .new file will have the atoms labelled in accordance with the IUPAC naming
convention.
OUTPUT
The output comprises a number of plots, together with a detailed residue-by-residue listing.
It generates a number of output files in the default directory which have the same name as
the original PDB file, but with different extensions.
The residue-by-residue listing has a .out extension and lists all the computed
stereochemical properties, by residue, in a printable ASCII text file.
ENERGY MINIMIZATION
Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and
dihedrals). Energy minimization can repair distorted geometries by moving atoms to release
internal constraints. Energy minimization is good at releasing local constraints for a
residue, but it will not pass through high energy barriers and will stop in a local minimum.
The potential energy calculated by summing the energies of various interactions is a
numerical value for a single conformation. This number can be used to evaluate a
particular conformation, but it may not be a useful measure of a conformation because it
can be dominated by a few bad interactions. For instance, a large molecule with an
excellent conformation for nearly all atoms can have a large overall energy because of a
single bad interaction, for instance two atoms too near each other in space and having a
huge van der Waals repulsion energy. It is often preferable to carry out energy minimization
on a conformation to find the best nearby conformation. Energy minimization is usually
performed by gradient optimization: atoms are moved so as to reduce the net forces on
them. The minimized structure has small forces on each atom and therefore serves as an
excellent starting point for molecular dynamics simulations.
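The idea of moving atoms along the force until the net force is small can be shown on the simplest possible system: two atoms interacting through a Lennard-Jones potential, starting too close together. This is a toy sketch, not the minimizer of any modeling package; the reduced units (epsilon = sigma = 1), the step size and the convergence tolerance are all assumptions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_separation(r_start, step=0.001, tol=1e-10, max_steps=100000):
    """Steepest descent on the separation until the force is below tol."""
    r = r_start
    for _ in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g   # move against the gradient, i.e. along the force
    return r


if __name__ == "__main__":
    r_min = minimize_separation(1.0)  # start with the atoms too close
    print(r_min, lj_energy(r_min))
```

Because the descent only follows the local slope, it settles in the nearest minimum, which is the behaviour described above for real structures.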
Result and discussion
1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)
Entry Information
Entry name                 GLCTK_HUMAN
Primary accession number   Q8IVS8
Name and origin of the protein
Protein name               Glycerate kinase
Synonyms                   EC 2.7.1.31
                           HBeAg-binding protein 4
Gene name                  Name: GLYCTK
                           Synonyms: HBEBP4
                           ORFNames: LP5910
From                       Homo sapiens (Human) [TaxID: 9606]
Taxonomy                   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
                           Euteleostomi; Mammalia; Eutheria; Euarchontoglires;
                           Primates; Haplorrhini; Catarrhini; Hominidae; Homo.
Protein existence          2: Evidence at transcript level
BLAST result:
List of potentially matching sequences:
Db   AC      Description                                                            Score  E-value
pdb  1QGT-C  Chain C,(Hbcag)Human Hepatitis B Viral Capsid >gi|5                    206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina...               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C,Human T4 Capsid, Strain Ad...                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT.. Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                  27     6.0
Graphical overview of the alignments
Primary structure prediction
By ProtParam:
GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4).
The computation has been carried out on the complete sequence (523 amino acids).
Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25
Amino acid composition:

Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%
Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17
Formula: C2435H3967N711O719S17
Total number of atoms: 7849
Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
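The two extinction coefficients above can be reproduced from the amino acid composition. The sketch below assumes the commonly used molar extinction values at 280 nm of 1490 for Tyr, 5500 for Trp and 125 per cystine; these constants and the function name are assumptions of this example, while the residue counts (Tyr 5, Trp 4, Cys 5) and the molecular weight (55252.6) are the values reported above.

```python
def extinction_coefficients(n_tyr, n_trp, n_cys):
    """Return (all Cys paired as cystines, no cystines) in M-1 cm-1 at 280 nm."""
    base = n_tyr * 1490 + n_trp * 5500
    cystines = n_cys // 2          # an odd Cys count leaves one residue unpaired
    return base + cystines * 125, base


if __name__ == "__main__":
    molecular_weight = 55252.6
    with_cystines, reduced = extinction_coefficients(n_tyr=5, n_trp=4, n_cys=5)
    # Absorbance of a 1 g/l solution is the coefficient divided by the mass.
    print(with_cystines, round(with_cystines / molecular_weight, 3))
    print(reduced, round(reduced / molecular_weight, 3))
```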
Secondary structure prediction
By SOPMA, result for: UNK_158250
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct
Sequence length : 523
SOPMA :
Alpha helix (Hh) : 235 is 44.93%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 80 is 15.30%
Beta turn (Tt) : 36 is 6.88%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 172 is 32.89%
Ambigous states (?) : 0 is 0.00%
Other states : 0 is 0.00%
Parameters :
Window width : 17
Similarity threshold : 8
Number of states : 4
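The percentages in the SOPMA summary are simple counts over the per-residue state string, in which h, e, t and c stand for alpha helix, extended strand, beta turn and random coil. A minimal sketch of that tally follows; the function name is hypothetical and the short state string in the usage example is made up for illustration rather than taken from the prediction.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_summary(states):
    """Return {state letter: (count, percent)} for a predicted state string."""
    counts = Counter(states)
    total = len(states)
    return {
        letter: (counts.get(letter, 0),
                 round(100.0 * counts.get(letter, 0) / total, 2))
        for letter in STATE_NAMES
    }


if __name__ == "__main__":
    for letter, (count, percent) in state_summary("hhhhcceett").items():
        print(f"{STATE_NAMES[letter]} ({letter}): {count} is {percent}%")
```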
Multiple sequence alignment
ClustalW2 Results
1. Number of sequences   10
2. Alignment score       28565
3. Sequence format       Pearson
4. Sequence type         aa
5. Output file           clustalw2-20080510-09552541.output
6. Alignment file        clustalw2-20080510-09552541.aln
7. Guide tree file       clustalw2-20080510-09552541.dnd
8. Your input file       clustalw2-20080510-09552541.input
Scores Table
SeqA Name Len(aa) SeqB Name

Len(aa) Score
=======================================================================
====
1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3
1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3
1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3
1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3
1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3
1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3
1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3
1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3
1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2
2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305
21
2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305
21
2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217
91
2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214
66
2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214
65
(45)
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214
65
2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214
65
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214
65
3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305
97
3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217
24
3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214
26
3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214
27
3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214
25
3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214
26
48
3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214
26
4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217
25
4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214
26
4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214
27
4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214
25
4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214
26
4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214
26
5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214
70
5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214
69
5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214
69
5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214
69
5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214
69
6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214
98
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214
98
6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214
98
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214
98
7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214
98
7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214
97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214

98
8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214
97
8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214
98
9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214
97
===========================================================================
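The pairwise scores in the table can be read as percent identities between each pair of sequences. The sketch below shows one way such a value could be computed from two aligned sequences, counting identical residues over the positions where neither sequence has a gap; treating this as the exact definition used by ClustalW2 is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue            # positions with a gap are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # N-terminal fragments of two HBeAg sequences from the alignment.
    print(round(percent_identity("MQLFHLCLIISCT", "MYLFHLCLVFACV"), 1))
```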
Alignment
CLUSTAL 2.0.5 multiple sequence alignment
P17099|HBEAG_HBVA4
------------------------------------------------------------
Q91C37|HBEAG_HBVA6
------------------------------------------------------------
P0C692|HBEAG_HBVA2
------------------------------------------------------------
P0C625|HBEAG_HBVA3
------------------------------------------------------------
Q81105|HBEAG_HBVA5
------------------------------------------------------------
Q64896|HBEAG_ASHV
------------------------------------------------------------
P03153|HBEAG_GSHV
------------------------------------------------------------
P03154|HBEAG_DHBV1
------------------------------------------------------------
P0C6J9|HBEAG_DHBV3
------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60
P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-
CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-
CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV
----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV
----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1
----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3
----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN
LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120
: : : . . . . * . :
P17099|HBEAG_HBVA4 DIDP------------------------
YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------
YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------
YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------
YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------
YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV DIDP------------------------
YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV DIDP------------------------
YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------
YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------
YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN
ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180
: : :: : ..
P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------
PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------
PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------
PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------
PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------
PEHCSPHHTALR 85
Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------
REHCSPHHTAIR 86
P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------
REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------
KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------
KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN
ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240
** : *
:
P17099|HBEAG_HBVA4
QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6
ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2
QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3
QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5
QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV
QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-
TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1
QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3
QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN
LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300
: : :
P17099|HBEAG_HBVA4
-------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6
-------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2
P0C625|HBEAG_HBVA3
-------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5
Q64896|HBEAG_ASHV
-------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV
-------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1
TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3
TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN
CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360
: : . * :: * . : : : :
*
P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------
RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------
RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------
RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------
RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------
RSPRRRTPSPR 195
Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------
RSPRRRTPSPR 199
P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------
RSPRRRTPSPR 198
P03154|HBEAG_DHBV1
DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3
DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN
PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420
: :: . : . : . :
.
P17099|HBEAG_HBVA4
RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6
P0C692|HBEAG_HBVA2
P0C625|HBEAG_HBVA3
RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5
Q64896|HBEAG_ASHV
RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV
RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1
RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3
RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480
. . ..
P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV -------------------------------------------
P03153|HBEAG_GSHV -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
Guide Tree
(
(
(
(
(
(
Q8IVS8|GLCTK_HUMAN:0.59519,
(
P03154|HBEAG_DHBV1:0.01176,
P0C6J9|HBEAG_DHBV3:0.01119)
:0.36054)
:0.21341,
(
Q64896|HBEAG_ASHV:0.05849,
P03153|HBEAG_GSHV:0.02446)
:0.12844)
:0.14364,
Q81105|HBEAG_HBVA5:0.01168)
:0.00175,
P0C625|HBEAG_HBVA3:0.00818)
:0.00110,
P0C692|HBEAG_HBVA2:0.00445)
:0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
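The guide tree above is written in Newick notation: nested parentheses group sequences, and the number after each colon is a branch length. As a minimal sketch in plain Python (the helper name `leaf_branch_lengths` is hypothetical, and the tree string is copied from the output above with the line breaks removed), the terminal branch lengths can be extracted to see which sequence lies furthest from its neighbours:

```python
import re

# Guide tree copied from the output above, joined onto one logical line.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def leaf_branch_lengths(newick):
    """Return {leaf name: terminal branch length} for a Newick string.

    A leaf is a name that directly follows '(' or ','; internal branch
    lengths follow ')' and are therefore skipped by the pattern.
    """
    pairs = re.findall(r"[(,]([^(),:;]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), "leaves; longest terminal branch:", longest, lengths[longest])
```

In this tree the human glycerate kinase entry (Q8IVS8|GLCTK_HUMAN) has by far the longest terminal branch, which fits its being the only non-HBeAg sequence in the alignment.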
Phylogram
Tertiary structure prediction:
• PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity
with the target sequence. The template structure was downloaded from the PDB.
• Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.
Steps involved in SPDBV:
• Open the template structure from file (.pdb file).
• Choose 'Swiss model' - 'load the raw target sequence'.
• Choose 'fit' - 'fit raw sequence', then 'magic fit', then 'iterative fit'.
• Choose 'file' - 'save' - 'layer' (".pdb").
• Choose 'file' - 'save' - 'project' (".pdb").
• Choose 'Swiss model' - 'submit modeling request'. (A new browser window opens to load
the pdb file; give the email ID at which the modeled structure is to be received.)
• Open the new structure (received by email) and remove the template by selecting the
target.
• Choose 'file' - 'save' - 'layer' (".pdb").
• Open Swiss model and select the 'load raw sequence' option to load the target molecule.
• Perform 'magic fit' and 'iterative fit', provided under FIT, in order to fit the two
sequences.
• Save the file as the project.
• Select 'submit modeling request' under Swiss model to submit it for modeling.
Homology modeling:
Optimise Mode Request submission form
Please fill these fields:
Your Email address: Gunjan300@gmail.com (MUST be correct!)

Your Name : Gunjan
Request title : Gunjan project (Will be added to the results header.)
Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb
Workunit: P000044 Title:Q8IVS8
SWISS MODEL WORKSPACE
Model information
modelled residue range 83 to 514
based on template 2b8nA (2.53 Å)
Sequence Identity [%]: 34
Evalue: 2.70e-52
Fig. Structure of the template after modeling.
Alignment
TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh
TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA

2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh

2b8nA hhhhhhhhh ssssss sssss hhhhh
TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA

2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss

2b8nA h hhhhhh hhh sss hhhhhh sssssss
TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV

2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss

2b8nA h hhhhhhhhhh hhh hhhh ss
TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV

2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL

2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss

2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss
TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ

2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh
TARGET 455 DGPTEAAGAW VTPELASQAA AEGL

2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss

2b8nA sss sssss ssss
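The workspace reports a sequence identity of 34% between the target and template 2b8nA. As a rough sketch of how such a figure is obtained from a pairwise alignment, the Python below counts identical residues over the aligned (non-gap) columns. The helper `percent_identity` is hypothetical, tools differ in which columns they put in the denominator, and the input is only the ten-residue stretch ISGGGSALLP / lsgggsslfe taken from the alignment above, not the full alignment, so the value it prints is not the reported 34%.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted (an assumption)
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Fragment copied from the TARGET / 2b8nA alignment above.
target_fragment = "ISGGGSALLP"
template_fragment = "lsgggsslfe"
print(percent_identity(target_fragment, template_fragment))
```

This conserved glycine-rich stretch scores well above the overall figure, which shows how local identity can differ from the whole-alignment value.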
Model Validation:
INTRODUCTION
The Structure Analysis and Validation Server (SAVS) simplifies computational analysis of
the molecular structure and sequence of proteins. Stereochemical validation of model
structures is an important part of the comparative molecular modeling process. A
Ramachandran plot displays the backbone dihedral angles φ and ψ of each amino acid
residue in a protein structure, plotted against each other, and shows which combinations
of φ and ψ are sterically possible for a polypeptide. Residues whose (φ, ψ) pair falls
outside the allowed regions indicate possible errors in the backbone geometry of the
model.
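Each angle on a Ramachandran plot is a dihedral defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is an illustrative sketch only, not part of the SAVS procedure: the function name `dihedral` is hypothetical and the coordinates are made-up test points rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the first three are fixed, the fourth is rotated about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # eclipsed arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # quarter turn
```

Applying such a function to every residue of a model yields the (φ, ψ) pairs that a validation tool then sorts into favoured, allowed and disallowed regions.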
Software
SAVS: http://nihserver.mbi.ucla.edu/SAVS/
Procedure
The target protein structure obtained after homology modeling with DeepView and the
modeler is given as input to SAVS.
SAVES results for proj_gunjan.pdb
Procheck summary
RAMACHANDRAN PLOT:
Result -----
Plot statistics                                          SCORE    %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351
Based on an analysis of 118 structures o

and R-factor no greater than 20%, a good quality model would be expected
to have over 90% in the most favoured regions.
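The percentages in the Procheck summary are each region count divided by the number of non-glycine, non-proline residues (1156). As a small worked check, the sketch below recomputes them from the counts reported above and compares the most favoured fraction with the 90% guideline; the variable names are illustrative only.

```python
# Region counts copied from the Procheck summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Denominator: non-glycine, non-proline residues.
total = sum(counts.values())

percentages = {
    region: round(100.0 * n / total, 1) for region, n in counts.items()
}

# Guideline quoted in the summary: over 90% in the most favoured regions.
meets_guideline = percentages["most favoured"] > 90.0

for region, pct in percentages.items():
    print(f"{region:20s} {counts[region]:5d} {pct:5.1f}%")
print("meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline quoted in the summary.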
Docking result (Hex software)
Fig. Ligand and receptor (2B8N)
Fig. After docking
DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059.
Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac

Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0/examples\2B8N.pdb is a PDB file...
Opened PDB file: C:\Program Files\Hex 5.0/examples\2B8N.pdb, ID = 2B8N

*Warning* Can't add all hydrogens to incomplete residue: A 253:LYS
*Warning* Can't add all hydrogens to incomplete residue: A 316:HIS
*Warning* Can't add all hydrogens to incomplete residue: B 8:LYS
*Warning* Can't add all hydrogens to incomplete residue: B 36:ASN
*Warning* Can't add all hydrogens to incomplete residue: B 65:ARG
*Warning* Can't add all hydrogens to incomplete residue: B 316:HIS
*Warning* Can't add all hydrogens to incomplete residue: B 380:TYR
*Warning* Can't add all hydrogens to incomplete residue: B 404:THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file: C:\Program Files\Hex 5.0/examples\2B8N.pdb, (927 residues, 7597
atoms, 1 models)
*Warning* Fractional charge (0.35) for non-terminal residue: A 52:MSE
MSE:N Radius = 1.40, Charge = -0.52
MSE:CA Radius = 1.50, Charge = 0.14
MSE:C Radius = 1.40, Charge = 0.53
MSE:O Radius = 1.50, Charge = -0.50
MSE:CB Radius = 1.70, Charge = 0.04
MSE:CG Radius = 1.70, Charge = 0.09
MSE:SE Radius = 1.90, Charge = 0.32
MSE:CE Radius = 1.90, Charge = 0.01
MSE:H Radius = 0.00, Charge = 0.25
*Warning* Fractional charge (0.41) for non-terminal residue: A 82:ASP
ASP:N Radius = 1.40, Charge = -0.52
ASP:CA Radius = 1.50, Charge = 0.25
ASP:C Radius = 1.40, Charge = 0.53
ASP:O Radius = 1.50, Charge = -0.50
ASP:CB Radius = 1.70, Charge = -0.21
ASP:CG Radius = 1.40, Charge = 0.62
ASP:H Radius = 0.00, Charge = 0.25
*Warning* Fractional charge (0.34) for non-terminal residue: A 318:LYS
LYS:N Radius = 1.40, Charge = -0.52
LYS:CA Radius = 1.50, Charge = 0.23

LYS:C Radius = 1.40, Charge = 0.53
LYS:O Radius = 1.50, Charge = -0.50
LYS:CB Radius = 1.70, Charge = 0.04
LYS:CG Radius = 1.70, Charge = 0.05
LYS:CD Radius = 1.70, Charge = 0.05
LYS:CE Radius = 1.70, Charge = 0.22
LYS:H Radius = 0.00, Charge = 0.25
*Warning* Fractional charge (0.12) for non-terminal residue: B 8:LYS

*Warning* Fractional charge (0.35) for non-terminal residue: B 52:MSE
*Warning* Fractional charge (-0.21) for non-terminal residue: B 81:ASP
*Warning* Fractional charge (0.23) for non-terminal residue: B 404:THR
THR:N Radius = 1.40, Charge = -0.52
THR:CA Radius = 1.50, Charge = 0.27
THR:C Radius = 1.40, Charge = 0.53
THR:O Radius = 1.50, Charge = -0.50
THR:CB Radius = 1.50, Charge = 0.21
THR:H Radius = 0.00, Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues: Net formal charge: -10
*Warning* Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAY
EVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLN
ENDTVLFLLSG
GGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPA
KVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQE
TPKHLSNV
EIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDR
PLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTD
GPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKS
GALLITGPTGTNVNDLIIGLIV
>2B8N B
ENDTVLFLLSG
TPKHLSNV
GPTDAAGGI
VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0/examples\2B8N.pdb is a PDB file...
Opened PDB file: C:\Program Files\Hex 5.0/examples\2B8N.pdb, ID = 2B8N

*Warning* Can't add all hydrogens to incomplete residue: A 316:HIS
*Warning* Can't add all hydrogens to incomplete residue: B 36:ASN
*Warning* Can't add all hydrogens to incomplete residue: B 65:ARG
*Warning* Can't add all hydrogens to incomplete residue: B 316:HIS
*Warning* Can't add all hydrogens to incomplete residue: B 380:TYR
*Warning* Can't add all hydrogens to incomplete residue: B 404:THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file: C:\Program Files\Hex 5.0/examples\2B8N.pdb, (927 residues, 7597
atoms, 1 models)

*Warning* Fractional charge (0.41) for non-terminal residue: A 82:ASP
ASP:CG Radius = 1.40, Charge = 0.62
*Warning* Fractional charge (0.34) for non-terminal residue: A 318:LYS

*Warning* Fractional charge (-0.21) for non-terminal residue: B 81:ASP

*Warning* Fractional charge (0.23) for non-terminal residue: B 404:THR
THR:N Radius = 1.40, Charge = -0.52
THR:CA Radius = 1.50, Charge = 0.27
THR:C Radius = 1.40, Charge = 0.53
THR:O Radius = 1.50, Charge = -0.50
THR:CB Radius = 1.50, Charge = 0.21
THR:H Radius = 0.00, Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues: Net formal charge: -10
*Warning* Using PDB CONECT records to define non-standard bonds.
>2B8N A
ENDTVLFLLSG
TPKHLSNV
GPTDAAGGI
>2B8N B
ENDTVLFLLSG
TPKHLSNV
GPTDAAGGI
Found 223 MB main memory: setting N_MAX=33.
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal).
Using intermolecular distance R12 = 0.00, rounded to 0.00

Setting distance range = 0.00 to 19.50, with steps of 0.75
Calculating surface skins: Grid = 0.60A
Contouring surface for molecule 2B8N.

Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.

Culled 128559 short edges in 6 cycles in 4.34 seconds.
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments.
Primary surface: Area = 26350.96, Volume = 157921.76.
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time: 6.14 seconds.
Contouring surface for molecule 2B8N.

Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.
Culled 128559 short edges in 6 cycles in 4.36 seconds.
[71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments.
Primary surface: Area = 26350.96, Volume = 157921.76.
vm: 50.00 MB.
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time: 6.14 seconds.
Sampling surface and interior volumes for molecule 2B8N.

Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 43420.10; interior skin volume = 46839.17.
Volume sampling done in 2.34 seconds.
Sampling surface and interior volumes for molecule 2B8N.
Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 43420.10; interior skin volume = 46839.17.
Volume sampling done in 1.36 seconds.
Calculating skin coefficients to N = 25...

Integration applied to 417867 cells: 4.64 per cent of the total grid volume.
Skin integration to N = 25 done in 43.95 seconds.
Docking will output a maximum of 500 solutions per pair...
------------------------------------------------------------------------------
Docking 1 pair of starting orientations...
Docking receptor: 2B8N and ligand: 2B8N...
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N

Working buffer for 1000000 orientations: (27Mb)
Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672.

Initial rotational increments (N=16) Receptor: 812 (19Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory...
Coefficient rotations done in 0.91 seconds.
Starting 3D FFT: N=16.

Using Kiss FFT for multi-dimensional DFTs.
3D FFT setup: 0.00 s. 66 Mb memory.
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s.
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess.
Done 21924 3D FFTs for 1616412672 orientations in 7 min, 0 sec (3848574/s).
Best start orientation [alpha=0] (Energy=0.00) is at 1/1.

Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling.

Working buffer for 3 orientations: (1Mb)
Surviving rotational steps (N=25) Receptor: 1 (1Mb), Ligand: 1 (1Mb)
Loading all coefficient vectors into memory...
Coefficient rotations done in 0.00 seconds.
Starting docking search with N=25, Nalpha=64/64.

Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess.
Main pass done in 0 min, 0 sec (1761/s).
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search.
Docked structures 2B8N:2B8N in a total of 7 min, 5 sec.
Best start orientation [alpha=0] (Energy=0.00) is at 1/1.

Energy range: Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes

-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations.
Docking done in a total of 8 min, 11 sec.
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.

Clustering found 1 clusters from 1 docking solutions in 0.00 seconds.
---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000:000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
---------------------------------------------------------------------------
1 1 000:000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
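The size of the search reported in the log can be checked by hand. The distance range of 0.00 to 19.50 in steps of 0.75 gives 27 radial steps, matching the 27 "R =" lines and the first value in Iterate[27,812,1]. The sketch below treats the bracketed Iterate and FFT values simply as factors of a product, which is an assumption about the log's notation rather than a statement about Hex internals, and the helper name `radial_steps` is hypothetical.

```python
def radial_steps(start, stop, step):
    """Number of sampled distances from start to stop inclusive."""
    return int(round((stop - start) / step)) + 1


# Values copied from the Hex log above.
n_radial = radial_steps(0.00, 19.50, 0.75)
iterate = (n_radial, 812, 1)   # Iterate[27,812,1]
fft_grid = (64, 24, 48)        # FFT[64,24,48]

n_ffts = iterate[0] * iterate[1] * iterate[2]
grid_points = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * grid_points

print("radial steps:", n_radial)
print("3D FFTs:", n_ffts)
print("orientations:", total_orientations)
```

The products agree with the log lines "Done 21924 3D FFTs" and "Total 6D space ... = 1616412672".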
Conclusion
After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are
all closely related, they have an important role in survival in different species. It would be
interesting to look more closely at the matter at the gene level. A phylogenetic analysis can be
very helpful in understanding the evolutionary pattern.
We noticed that the same genes are present in all strains, which suggests that they evolved
together.
Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to
draw firm conclusions about the true picture of evolution in the near future, and that the genes
responsible for pathogenesis can also be identified.
A complete inference can only be drawn from a comprehensive list of the gene products and
their functions.
To find the unknown structure of a protein present in the different species, we carried out
homology modelling and present a theoretical model built with available online modelling
tools.
In our study, glycerate kinase (HBeAg-binding protein 4) is regarded as one of the secondary
causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate
ligand in order to inhibit its activity, as a basis on which drugs could be developed.
Future prospects
The work presented in this report may serve as a stepping stone for further discoveries; it is a
small finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the
relationships between the many different kinds of life on earth. It includes methods for
collecting and analysing data, as well as interpreting the results as new biological
information.
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in the search for a cure for hepatitis B, and may open new
avenues for finding other possible drug targets in HBV. The docking results can be used to
design new lead compounds and hence can aid the drug discovery process.
Finally, a similar process can be applied to other pathogens and infectious diseases to identify
possible therapeutic sites in them, so that we can look forward to a better, disease-free world.
The work presented is just a small part of a big issue, and much work still needs to be done to
establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope
that these findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY
[1] - Lansing M. Prescott, John P. Harley and Donald A. Klein, Microbiology, 6th edition,
McGraw-Hill Higher Education, "Human diseases caused by viruses"
[2] - F V Chisari, C Ferrari

Department of Molecular and Experimental Medicine, Scripps Research Institute, La
Jolla, California 92037, USA.
[3] -C Seeger, W S Mason

Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu
[4]- plumbed
[5]- Howard Hughes Medical Institute, Department of Biochemistry and Molecular

Biophysics, Columbia University, New York, New York 10032, USA
Reprint requests to: Barry Honig, Howard Hughes Medical Institute, Department of
Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032,
USA
[6]- Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple
structure and sequence alignments to improve sequence detection and alignment:
Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed].
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based
prediction of functional sites in proteins: Applications to assessing the validity of
inheriting protein function from homology in genome annotation and to protein docking.
J. Mol. Biol. 311: 395–408. [PubMed].
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and
Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed].
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher,
P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro—An integrated
documentation resource for protein families, domains and functional sites. Bioinformatics
[7]- Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut

Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la
Barceloneta, 37-49, 08003 Barcelona (Catalonia), Spain.
[8]- Computational Sciences, Department of Chemistry, Nerviano Medical Sciences,

Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com
Abbreviation
• CSA: Catalytic Site Atlas
• EMBOSS: European Molecular Biology Open Software Suite
• NCBI: National Center for Biotechnology Information
• NDB: Nucleic Acid Database
• ORF: Open Reading Frame
• OTU: Operational Taxonomic Unit
• PDB: Protein Data Bank
• PHYLIP: Phylogeny Inference Package