
NCBI News

NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION


National Library of Medicine
National Institutes of Health

August 1997

Vice President Launches PubMed, Lauds Free MEDLINE Access


“MEDLINE...will henceforth be available free to the American people.” With those words, Vice President Al Gore inaugurated the PubMed search system at a Capitol Hill press conference on June 26. PubMed, which provides Web access to the National Library of Medicine’s (NLM) database of the biomedical journal literature, MEDLINE, was heralded by Senator Tom Harkin (IA) as “...the model of a smart, creative government initiative.” The Vice President viewed free access to MEDLINE as consistent with the Clinton administration’s other “empowerment” initiatives stating, “This development...may do more to reform and improve the quality of health care in the United States than anything else we’ve done in a long time.”

Searching PubMed

PubMed grew out of NCBI’s Entrez project which, since 1992, has offered a subset of MEDLINE records related to molecular biology. In addition to encompassing all of MEDLINE and PreMEDLINE, PubMed retains Entrez’s ability to use one article as a “seed” to find other similar articles. By traversing the “See Related Articles” links, a user can find articles similar in concept with speed and precision. PubMed expands upon Entrez by linking MEDLINE articles to full-text Web sites maintained by publishers. Currently, 95 journals are linked to PubMed, including Cell, Journal of Biological Chemistry, Journal of Cell Biology, New England Journal of Medicine, and Science. Access to publishers’ Web sites may require subscriptions or registration.

PubMed Options

PubMed offers the option to search MEDLINE or any of NCBI’s molecular biology databases. Users can select from a variety of search fields, including but not limited to: text words, author names, and journal titles. A MEDLINE citation for which there is a corresponding online, full-text article will have a button at the top of the abstract page that links to the publisher’s Web site. Additional links point to
Continued on page 2

NCBI Director David Lipman (far left) coaches Vice President Gore (seated) as he searches PubMed. NIH Director Harold Varmus (center) and NLM Director Donald Lindberg (far right) look on.

IN THIS ISSUE

PubMed Launched ....................... 1
Using Sequin ............................... 2
Structure Neighbors ..................... 3
NCBI Data by FTP ...................... 3
ORF Finder .................................. 4
Electronic PCR ............................ 4
Recent Publications ..................... 4
CGAP Revolutionizes Research .. 5
Frequently Asked Questions ....... 6
PubMed, continued from page 1

other NCBI databases, including sequences, 3D structures or OMIM. Advanced query options allow for the creation of more complex Boolean search expressions, and a special clinical query page is optimized to perform searches for studies relating to the etiology, diagnosis, prognosis, or treatment of human diseases.

PubMed is available from the NCBI World Wide Web home page (http://www.ncbi.nlm.nih.gov). Comments and questions about PubMed are welcome. Send e-mail to info@ncbi.nlm.nih.gov or call (301) 496-2475. ■

NCBI News

NCBI News is distributed two to three times a year. We welcome communication from users of NCBI databases and software and invite suggestions for articles in future issues. Send correspondence and suggestions to NCBI News at the address below.

NCBI News
National Library of Medicine
Bldg. 38A, Room 8N-803
8600 Rockville Pike
Bethesda, MD 20894
Phone: (301) 496-2475
Fax: (301) 480-9241
E-mail: info@ncbi.nlm.nih.gov

Editors
Dennis Benson
Barbara Rapp

Design Consultant
Troy M. Hill

Photography
Karlton Jackson

Writing, Editing, Graphics, and Production
Veronica Johnson
Donna Roscoe

In 1988, Congress established the National Center for Biotechnology Information as part of the National Library of Medicine; its charge is to create information systems for molecular biology and genetics data, and to perform research in computational molecular biology.

The contents of this newsletter may be reprinted without permission. The mention of trade names, commercial products, or organizations does not imply endorsement by NCBI, NIH, or the U.S. Government.

NIH Publication No. 97-3272
ISSN 1060-8788

Using Sequin to Submit Sets of Related Sequences

Sequin is a program developed at the NCBI for submitting DNA sequences to GenBank, EMBL, or DDBJ. Both Sequin and BankIt, NCBI’s Web-based sequence submission tool, can be used to submit simple mRNA or genomic sequences along with associated coding sequences. However, Sequin has been outfitted with a number of advanced sequence analysis capabilities. Unlike other sequence submission tools, Sequin can process sets of related sequences such as segmented sets and those generated by phylogenetic, population, or mutation studies.

Like other World Wide Web submission tools, Sequin can be used to annotate single sequences. However, it is usually easiest to annotate related sequences when they are part of a multiple sequence alignment. Sequin can import the individual sequences, as well as the alignment itself, from alignments that have been saved in FASTA+GAPs, PHYLIP, or NEXUS format. If the sequences are related, but not yet aligned, Sequin will generate an alignment from a file of FASTA-formatted sequences. Each new sequence in the alignment will receive its own accession number.

After the alignment is generated, annotation features, such as a coding sequence or rRNA, can be marked just once on a single master sequence. These features can then be propagated from the master sequence to other sequences in the alignment. The proper location of the feature will be calculated by Sequin for each sequence individually after taking into account any gaps or insertions. The Entrez nucleotide database is now accessible from Sequin, allowing sequences from GenBank to be directly downloaded into Sequin from Entrez. If the GenBank sequence is related to the sequences in the alignment, it can be brought into the alignment as the master sequence. Features can then be copied from this master sequence onto the new sequences. The GenBank record will not receive a new accession number, but rather serves only to facilitate annotation of the newly submitted sequences.

Further information about Sequin, including downloading instructions and help documentation, is available from the Sequin Web page (http://www.ncbi.nlm.nih.gov/Sequin). ■
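The Sequin article says a feature marked on a master sequence is relocated on each aligned sequence after gaps and insertions are taken into account. The Python sketch below illustrates that general idea only; it is not Sequin's code. The function name `propagate_feature`, the use of "-" as the gap character, and the 0-based inclusive coordinates are all assumptions made for this example.

```python
GAP = "-"  # assumed gap character in the aligned sequences


def propagate_feature(master_aln, target_aln, start, end):
    """Map a feature from the master sequence onto an aligned target sequence.

    `start` and `end` are 0-based, inclusive positions on the UNGAPPED master.
    Returns (start, end) on the ungapped target, or None when every master
    position of the feature is aligned to a gap in the target.
    """
    if len(master_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")

    master_pos = -1  # index of the last master residue seen so far
    target_pos = -1  # index of the last target residue seen so far
    covered = []     # target positions aligned to master positions in the feature

    for master_char, target_char in zip(master_aln, target_aln):
        if master_char != GAP:
            master_pos += 1
        if target_char != GAP:
            target_pos += 1
        in_feature = master_char != GAP and start <= master_pos <= end
        if in_feature and target_char != GAP:
            covered.append(target_pos)

    if not covered:
        return None
    # Target insertions inside the feature fall between the first and last
    # covered positions, so the returned span includes them.
    return covered[0], covered[-1]


master = "GG--ATGAAATAA"   # feature ATGAAATAA at ungapped positions 2..10
target = "GGCCATG---TAA"
print(propagate_feature(master, target, 2, 10))
```

A real submission tool would also need to handle features that are only partially present in the target; this sketch simply reports the covered span.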

2 NCBI News • August 1997


Structure Neighbors in Web Entrez

WWW Entrez now contains “neighbors” for proteins in its 3D structure database. Structure neighbors are other proteins that have a similar 3D structure or shape. As with the protein sequence neighbors in Entrez, structure neighbors are most often homologs with similar biological functions. However, since protein evolution conserves 3D structure to a greater extent than sequence, a protein’s structure neighbors may include more distant relatives not present among its sequence neighbors. These additional similarities may provide further insight into a protein’s properties and biological function. By incorporating structure neighbors in Entrez, these distant relationships, detectable only by 3D structure comparison, are readily accessible to molecular biologists.

An example is provided by the globular domain of chicken histone H5 (PDB accession code 1HST). The sequence neighbors of histone H5 are all other histones from a variety of eukaryotic species. But the structure neighbors of histone H5 are diverse and include a number of DNA binding proteins from bacteria. One of these is the E. coli catabolite gene activator protein, or CAP, in complex with DNA (PDB accession code 1CGP). The structures are remarkably similar, with 46 amino acid residues of histone
Continued on page 7

Chicken histone H5

NCBI Data by FTP

The NCBI FTP site contains a variety of directories with publicly available databases and software. The available directories include ‘repository’, ‘genbank’, ‘entrez’, ‘toolbox’, ‘pub’, and ‘sequin’.

The repository directory makes a number of molecular biology databases available to the scientific community. This directory includes databases such as PIR 53.0, Swiss-Prot, CarbBank, AceDB, and FlyBase.

The genbank directory contains files with the latest full release of GenBank, the daily cumulative updates, and the latest release notes.

The entrez directory contains the client software for Network Entrez.

The toolbox directory contains a set of software and data exchange specifications that are used by NCBI to produce portable software, and includes ASN.1 tools and specifications for molecular sequence data.

The pub directory offers public-domain software, such as BLAST (sequence similarity search program). Client software for Network BLAST and PowerBlast is also included in this directory.

The sequin directory contains the new Sequin submission software for Mac, PC, and UNIX platforms.

Data in these directories can be transferred through the Internet by using the Anonymous FTP program. To connect, type: ftp ncbi.nlm.nih.gov. Enter anonymous as the login name, and enter your e-mail address as the password. Then change to the appropriate directory. For example, change to the repository directory (cd repository) to download specialized databases.
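The anonymous FTP steps described in "NCBI Data by FTP" (connect, log in as anonymous with an e-mail address as the password, change directory) can also be scripted. The sketch below uses Python's standard `ftplib` module. The helper names `list_directory` and `fetch_listing` are invented for this example, and the host name and directory layout are taken from the article as printed, so they are assumptions that may not match the site as it exists today.

```python
from ftplib import FTP


def list_directory(ftp, directory, email):
    """Log in anonymously, change to `directory`, and return its file names.

    `ftp` is an already-connected ftplib.FTP object (or anything exposing
    the same login/cwd/nlst methods).
    """
    ftp.login(user="anonymous", passwd=email)  # e-mail address as password
    ftp.cwd(directory)                         # same as typing: cd repository
    return ftp.nlst()                          # names of the entries found


def fetch_listing(directory, email, host="ncbi.nlm.nih.gov"):
    """Open a connection to `host` and list one directory (needs network access)."""
    with FTP(host) as ftp:
        return list_directory(ftp, directory, email)


# Example call, left commented out because it contacts the network:
# print(fetch_listing("repository", "you@example.org"))
```

Keeping the connection separate from the login-and-list step makes it possible to try the logic against a stand-in object before touching a live server.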

NCBI News • August 1997 3


Mapping Unique Genome Sites by Electronic PCR

It is possible to determine the gene map location of a new sequence using NCBI’s software tool, Electronic PCR (e-PCR), located on the NCBI World Wide Web home page. Electronic PCR simulates conventional PCR methods for identifying sequence tagged sites (STSs) by searching for sites in a query sequence which match the sequence and orientation of a set of primers. STSs are unique DNA landmarks used in the construction of genetic and physical maps of the human genome. The e-PCR tool searches for matches between a user’s query sequence and STS primer sequences in the STS database (dbSTS). Researchers can use e-PCR to assign
Continued on page 7

Selected Recent Publications by NCBI Staff

Altschul, SF, TL Madden, AA Schaffer, J Zhang, Z Zhang, W Miller, and DJ Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–402, 1997.

Baxevanis, AD and D Landsman. Histones and histone fold sequences and structures: a database. Nucleic Acids Res 25:272–3, 1997.

Galperin, MY. Sequence analysis of an exceptionally conserved operon suggests enzymes for a new link between histidine and purine biosynthesis. Mol Microbiol 24:443–5, 1997.

Leipe, DD. Biodiversity, genomes and DNA sequence databases. Curr Opin Genet Dev 6:686–91, 1996.

Makalowski, W. Mermaid: a not-so-new family of human repetitive elements. Hum Genet 99:696–7, 1997.

Mushegian, AR and EV Koonin. Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Genetics 144:817–28, 1996.

Neuwald, AF, DJ Liu, DJ Lipman, and CE Lawrence. Extracting protein alignment models from the sequence database. Nucleic Acids Res 25:1655–77, 1997.

Schuler, GD. Sequence mapping by electronic PCR. Genome Res 7:541–50, 1997.

Schuler, GD, MS Boguski, EA Stewart, LD Stein, G Gyapay, et al. A gene map of the human genome. Science 274:540–6, 1996.

Wolfsberg, TG and D Landsman. A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res 25:1626–32, 1997.

Zhang, Z and TL Madden. PowerBLAST: A new network BLAST application for interactive or automated sequence analysis and annotation. Genome Res 7:649–56, 1997.

Hunting For Open Reading Frames With ORF Finder

Searching for open reading frames is possible with NCBI’s software tool, ORF Finder, accessible from the NCBI World Wide Web home page (http://www.ncbi.nlm.nih.gov). ORF Finder is a graphical analysis tool which finds all open reading frames in a user’s sequence or in a sequence retrieved from a database. ORF Finder also provides easy access to the BLAST search page and allows the deduced amino acid sequence to be compared against additional amino acid sequence databases using the BLAST options.

To use ORF Finder, enter the accession or GI number of the sequence of interest, or enter your query sequence directly into the text box in FASTA format. ORF Finder will identify all open reading frames using the standard genetic code or an alternative one for translation. Users can limit the search for open reading frames to a portion of the query sequence by specifying the positions (in base pairs) in the “From” and “To” boxes. Press the ORF Find button to retrieve a graphic display of ORFs and their location in the sequence in 6 reading frames. Users have the option to change the minimum ORF length to 50 or 300 nucleotides (in base pairs) and Redraw the query sequence. The Six Frames option features a graphic of all start and stop codons. Select a particular ORF by clicking on it to see the amino acid sequence with all alternative start codons. After selecting a particular ORF of interest, click on the Accept button and have the option to view the ORF in various formats: GenBank flat-file, FASTA nucleotide, or FASTA amino acid sequence. Selecting View retrieves the full GenBank record with its annotated sequence information.

For those scientists submitting sequence data, ORF Finder is also packaged with the Sequin sequence submission software. ORF Finder can be used in conjunction with Sequin’s Sequence Editor to annotate new coding regions on the record, perform basic editing, and translate nucleotide sequences. The Sequin program can be downloaded from NCBI’s FTP site accessible from the NCBI WWW home page. ■
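For readers curious about what a six-frame search for open reading frames involves, the ORF Finder article on this page can be illustrated with a short Python sketch. It is not the ORF Finder implementation. It assumes the standard genetic code, treats an ORF as the stretch from the first ATG to the next in-frame stop codon, and reports minus-strand coordinates relative to the reverse complement. The function names and the default `min_len` of 50 nucleotides are choices made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code stops
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_len):
    """Yield (start, end) half-open intervals for ORFs in one reading frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i                      # first start codon opens the ORF
        elif start is not None and codon in STOP_CODONS:
            if i + 3 - start >= min_len:   # length counts the stop codon
                yield (start, i + 3)
            start = None                   # look for the next ORF


def find_orfs(seq, min_len=50):
    """Search all six reading frames; return (strand, frame, start, end) tuples."""
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_len):
                results.append((strand, frame, start, end))
    return results


print(find_orfs("ccATGAAATAGcc", min_len=9))
```

Raising `min_len` corresponds to the minimum ORF length option described in the article: fewer, longer candidates are reported.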

4 NCBI News • August 1997


CGAP Revolutionizes Cancer Research

The knowledge that genetic mutations are central to the development of cancerous cells has prompted the National Cancer Institute, in partnership with NCBI, and other government, academic and industry leaders, to initiate the Cancer Genome Anatomy Project, or CGAP.

CGAP merges state-of-the-art technologies in pathology, molecular biology and bioinformatics, to catapult a new strategic attack on cancer. It is an unprecedented assemblage of sequence information characterizing the genetic constitution of cells at various stages: normal, precancerous, and tumor.

Worldwide, participants in CGAP are collecting a variety of tissue samples from cells at different stages; generating cDNA libraries from the tissue samples; and sequencing the cDNA libraries. NCBI uses powerful sequence similarity searching tools, such as BLAST, to make electronic comparisons between the libraries of a given tissue type at different stages, and generate discrete lists of genetic candidates as causative components of the carcinogenic process.

CGAP has collected over 40,000 DNA sequences so far in its trek over the next few years toward a complete index of genes expressed in tumors, referred to as the Tumor Gene Index. Initially, this index will be compiled from five major cancers: prostate, breast, lung, colon, and ovarian. NCBI will continue to map new index sequences to the Human Genome, building upon NCBI’s unique collection of human gene sequences (UniGene) used to construct the Human Transcript Map.

The focal point of the CGAP project is its Web site, located at www.ncbi.nlm.nih.gov/ncicgap. Managed and supported by NCBI members Mark Boguski, Ken Katz, Greg Schuler, and Carolyn Tolstoshev, the CGAP Web site is the central repository for all of the information generated by the project. This includes tissues, libraries, sequences, and links to additional value-added information, such as related DNA and protein sequences, genome mapping data, and biomedical references.

Ultimately, the resourceful use of information housed in the CGAP Web site is expected to lead to innovative diagnostic, preventative, and curative technologies which will forever alter the way scientists conduct cancer research. ■

NCI CGAP Cancer Genome Anatomy Project NCBI

Comparison of Normal versus Tumor Prostate Cell Gene Expression

Normal   Precancerous   Malignant   Gene index   Gene description
0.0163   0.0330         0.0078      Hs.1548      Prostate specific antigen (APS)
0.0163   0.0024         0.0104      Hs.73487     Beta-microseminoprotein (prostate secreted) (MSMB)
0.0000   0.0000         0.0156      Hs.82186     V-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 {alternative products} (ERBB3)
0.0000   0.0000         0.0130      Hs.5417      ESTs, Weakly similar to F43E2.7 [C.elegans]
0.0069   0.0071         0.0000      Hs.62954     Ferritin heavy chain (FTH1)
0.0050   0.0071         0.0000      Hs.38972     ESTs, Weakly similar to CD63 ANTIGEN [H.sapiens]
0.0000   0.0047         0.0000      Hs.18910     ESTs

CGAP displays a list of genes with statistically significant expression differences. Dot intensity is proportional to relative frequency of EST expression.

NCBI News • August 1997 5


Frequently Asked Questions

Q: I submitted a sequence using BankIt one month ago; however, I have not yet received a GenBank accession number. Why?

A: Since BankIt submissions normally receive GenBank accession numbers within 24-48 hours, an error in the submission process most likely occurred. Submitting sequence information by BankIt involves completing, reviewing, and submitting your BankIt file. Once you have finished entering your data and information, and have reviewed it for accuracy and completeness, switch the selection from “Modify Submission” to “Submit to GenBank” on the final BankIt page and click on the BankIt button one last time. Users will receive a return message indicating receipt of the submission and a GenBank accession number soon thereafter.

Q: I have a list of interesting titles I retrieved using PubMed. How do I obtain the full documents?

A: The PubMed system provides access to bibliographic citations and corresponding abstracts, but does not contain the full text of articles. However, PubMed offers links to a number of publishers who provide access to full-text journal articles from their Web sites. PubMed displays the link at the top of the Display title/abstract page when available. Currently the number of participating journals is small and the journals may require a user to subscribe before being able to view the full text. A list of PubMed journals that offer full text can be retrieved from the URL: http://www.ncbi.nlm.nih.gov/PubMed/fulltext. If the journal you are interested in is not on the list, contact the nearest library for information about obtaining articles.

Q: How can I import references from PubMed into Endnote?

A: First choose the MEDLINE format on the “document summaries page” and then display. Select the appropriate save format at the bottom of the display screen and press the “Save” button.

Q: I recently read an article that referenced a GenBank accession number, but I can’t find the sequence record in the database. Why?

A: Sometimes authors request that a sequence be held confidential until publication. Once GenBank is informed that the sequence has been published, GenBank staff will verify the publication and release the sequence. GenBank encourages users who are unable to retrieve a record to send the accession number and complete citation in which it appeared to update@ncbi.nlm.nih.gov.

Q: How can I find out if a particular gene has been mapped?

A: Conduct a GenBank search using common names for the gene. If the gene has been sequenced, its sequence record can be retrieved from GenBank. The accession number obtained from that record can be entered into the text search tool of the Human Transcript Map (available from the NCBI home page). Please note that the number of transcripts mapped in this study is estimated to represent one-fifth of the total number of genes in the human genome, so the odds are that a gene has not yet been mapped.

Continued on next page

6 NCBI News • August 1997


Structure Neighbors, continued from page 3

H5 superimposing to 1.7 angstroms residual, even though sequence identity is only 10%. These proteins would appear to be homologs, and the protein-DNA structure of CAP suggests a model for the interaction of histone H5 with DNA.

Structure neighbors may be accessed from the “Structure Summary” of a protein in WWW Entrez’s 3D structure database. Neighbor lists are displayed when one clicks the button Protein 3D Structures in the line reading “Protein 3D Structures similar to <Chain A> computed by VAST.” Here <Chain A> is a pull-down menu that allows one to select the individual polypeptide chain, or compact domain within that chain, for which structure neighbors are to be retrieved. Structure neighbor lists have been computed by the VAST algorithm (Vector Alignment Search Tool [1]), and are sorted according to a VAST similarity score. VAST compares the relative orientations of helices and beta-strands in two protein domains, and if similarity is more extensive than one would expect by chance, produces a detailed structure alignment by comparison of atomic coordinates.

WWW Entrez also supports visualization of protein structures by molecular graphics. The Cn3D viewer may be started by the View button on the “Structure Summary” page. The 3D superpositions of structure neighbors can be viewed using the Kinemage button on neighbor-list pages (viewing programs such as Cn3D and Kinemage are helper applications that may be downloaded to your computer by following hotlinks on the Structure home page). 3D viewing is important for interpretation of structural similarities. In the 1HST protein, for example, one may see that histone H5 contains positively charged residues in the region that superimposes onto the DNA-binding interface of CAP as does CAP itself. 3D viewing thus supports the inference that these proteins “dock” with DNA in a similar manner. Possible functional similarities of structure neighbors may also be explored, of course, by examining the MEDLINE citations and sequence neighbors associated with each protein.

WWW Entrez’s structure neighbor service is the work of NCBI researchers T. Madej, C. Hogue, J.-F. Gibrat, J. Spouge, H. Ohkawa and S. Bryant. Improvements in the visualization of structure similarities are still in progress, and comments and suggestions are welcome.

[1] Gibrat J-F, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6:377–85. ■

Electronic PCR, continued from page 4

sequence database records to map positions, test primer feasibility, and integrate and anchor genetic maps and sequence data.

To map sequences by e-PCR, enter the sequence of interest in FASTA format into the text box or retrieve a sequence from GenBank using an accession or GI number. The “Retrieve STS from” setting can be used to limit the search to STSs from specific organisms. Press the Submit Query button to begin. The results list the STSs found with their relevant identifiers, position of the primer binding sites within the query sequence, chromosome number (if known), and the expected and observed size of the amplicon. Hypertext links to GenBank and dbSTS records (linked to Entrez) are provided for more detailed information.

The number of STS results one can expect for a typical search depends on a variety of factors, such as length of the query sequence and size of the STS database. Results that are reported are unequivocal and more reliable than those identified using the general-purpose database search tool BLAST. Chances of obtaining STS matches will improve as the number of sequences in the STS database continues to increase dramatically.

For more information contact info@ncbi.nlm.nih.gov. ■
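The Electronic PCR article describes searching a query sequence for sites that match the sequence and orientation of a primer pair, then reporting the expected and observed amplicon size. The Python sketch below illustrates that principle in its simplest form and is not NCBI's e-PCR program. It assumes exact primer matches on the given strand only, and the function name `e_pcr`, the `margin` parameter, and the half-open coordinates are inventions of this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def e_pcr(query, forward, reverse, expected_size, margin=50):
    """Find candidate STS hits for one primer pair in `query`.

    The forward primer must match the query as given; the reverse primer must
    match as its reverse complement downstream of the forward primer.
    Returns (start, end, observed_size) tuples, with `end` exclusive, for
    hits whose size is within `margin` of `expected_size`.
    """
    query = query.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    hits = []

    start = query.find(forward)
    while start != -1:
        # Look for the reverse primer site after the forward primer.
        site = query.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            observed = end - start
            if abs(observed - expected_size) <= margin:
                hits.append((start, end, observed))
            site = query.find(reverse_site, site + 1)
        start = query.find(forward, start + 1)
    return hits


query = "GGACGTACTTTTTTTTCTTCAGAA"
print(e_pcr(query, "ACGTAC", "CTGAAG", expected_size=20, margin=2))
```

A fuller treatment would also scan the reverse complement of the query and tolerate a small number of primer mismatches; both are omitted here to keep the idea visible.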

Frequently Asked Questions, continued from page 6

Q: I am interested in performing nonredundant BLAST searches on some sequences I have. Is there a way to do this for new GenBank entries on a regular basis?

A: Perform one BLAST search using the nonredundant database, nr. After that, you can use BLAST to search the month database only. The month database has all new records from the last month and is obviously much smaller than nr. It was intended for this kind of surveillance blasting.

NCBI News • August 1997 7


DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service, National Institutes of Health
National Library of Medicine
National Center for Biotechnology Information
Bldg. 38A, Room 8N-803
8600 Rockville Pike
Bethesda, Maryland 20894

FIRST-CLASS MAIL
POSTAGE & FEES PAID
PHS/NIH/NLM
BETHESDA, MD
PERMIT NO. 13166

Official Business
Penalty for Private Use $300

NATIONAL INSTITUTES OF HEALTH • National Library of Medicine NCBI News


August 1997
