
A unified platform for biological big data sharing and application.

CNGBdb Handbook

CNGBdb search
CNGBdb hosts a large volume of molecular data and related information, all indexed by CNGBdb
Search. These data include literature, projects, samples, experiments, runs, assemblies, variations,
genes, proteins, sequences, etc.
On the CNGBdb homepage, you can enter any meaningful word or number to find relevant
information, for example a gene name (TP53), a species, or a disease. CNGBdb Search matches
whole words: a search for "homo" returns results that match "homo" and does not return results
that match only "ho" or "hom". More complex query syntax will be added in a later version of
CNGBdb.
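The whole-word behaviour described above can be illustrated with a minimal Python sketch. It is not CNGBdb's implementation: the tokenizer, the function names, and the sample records are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_whole_word(query, record_text):
    """Return True if every query token appears as a whole token in the record."""
    record_tokens = set(tokenize(record_text))
    query_tokens = tokenize(query)
    return bool(query_tokens) and all(t in record_tokens for t in query_tokens)


# Made-up record texts, used only to show the matching rule.
records = ["Homo sapiens", "hom", "ho"]
hits = [r for r in records if matches_whole_word("homo", r)]
```

Here only the record containing the complete word "homo" is kept; records that contain just "hom" or "ho" are not matched.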

Query examples

Following the query syntax described above, users can search by data content and
characteristics.
A few examples of queries that CNGBdb Search supports are listed below.
Search for literature PMID24971553.
Search for gene TP53.
Search for protein Ovarian cancer-related protein 1.
Search for project CNP0000028.
Search for sample CNS0000027.
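The identifiers in these examples carry recognisable prefixes. The following Python sketch guesses the data structure from such a prefix. The patterns are inferred only from the three identifier examples above and are an assumption, not a documented CNGBdb rule; the function name is hypothetical.

```python
import re

# Prefix patterns inferred from the handbook's examples only (assumption).
ID_PATTERNS = {
    "literature": re.compile(r"PMID\d+"),
    "project": re.compile(r"CNP\d+"),
    "sample": re.compile(r"CNS\d+"),
}


def guess_structure(term):
    """Return the data structure an identifier-like term probably refers to, or None."""
    cleaned = term.strip()
    for structure, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(cleaned):
            return structure
    return None  # free-text terms such as gene or protein names


guesses = {t: guess_structure(t) for t in ["PMID24971553", "CNP0000028", "CNS0000027", "TP53"]}
```

A free-text term such as TP53 matches none of the patterns, so it would simply be searched across all data structures.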

Search results

CNGBdb organizes its information into 10 data structures: project, sample, experiment, run, assembly,
literature, variation, protein, sequence, and gene. A keyword search from the homepage returns
results from all 10 data structures by default. The search results page shows the 3 most relevant
results for each sub-database. To view more, click "More results" below that sub-database. To restrict
a search to one sub-database, select it from the drop-down list on the left side of the homepage
search bar; only results from that sub-database are returned. As you scroll down the page, more
results are loaded, up to a maximum of 100; results beyond the first 100 are not displayed. If you
still cannot find what you want within those 100 results, modify the search terms and search again.
You can also search again from the search bar on the search results page, which works the same
way as the search bar on the homepage.
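The two display limits described above (the top 3 results per sub-database on the overview, and at most 100 results when scrolling) can be sketched in Python as follows. This is an illustration under stated assumptions, not CNGBdb code: the hit dictionaries, the `score` field, and the function names are hypothetical.

```python
from collections import defaultdict

PREVIEW_PER_DATABASE = 3   # top results shown per sub-database
MAX_LOADED_RESULTS = 100   # results beyond this are not displayed


def preview_by_database(hits, per_database=PREVIEW_PER_DATABASE):
    """Group hits by sub-database and keep the most relevant few in each group."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["database"]].append(hit)
    return {
        database: sorted(group, key=lambda h: h["score"], reverse=True)[:per_database]
        for database, group in grouped.items()
    }


def loadable_results(hits, limit=MAX_LOADED_RESULTS):
    """Return the hits a single sub-database view would load, most relevant first."""
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:limit]


# Hypothetical hits: 5 gene records and 1 project record with made-up scores.
hits = [{"database": "gene", "id": f"gene-{i}", "score": i / 10} for i in range(5)]
hits.append({"database": "project", "id": "CNP0000028", "score": 0.7})

preview = preview_by_database(hits)
```

The cap in `loadable_results` is the reason the handbook recommends refining the search terms when the wanted record is not among the first 100 results.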

CNGBdb - db.cngb.org

Filter

The navigation filter on the left gives users a compact view of the results and easy navigation
across databases. It groups the search results by relevant database and lets users narrow the
scope of the results.

Detailed data and Related data

If you click the accession number of a record, you go to its details page, which shows more
detailed information. For example, clicking the literature number (PMID24971553) on the search
results page opens the literature details page (/search/literature/PMID24971553/).
If you click Related data for a particular entry, you can explore its cross-references to other
database resources in CNGBdb. For example, clicking the gene listed in a variation record in the
search results links to that gene's information page in the gene database.
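The details-page path shown above follows a simple pattern. The Python sketch below builds such a URL. Only the literature path is given in this handbook; the `https` scheme, the idea that other sub-databases use the same pattern, and the function name are assumptions.

```python
from urllib.parse import quote, urljoin

# Host taken from the handbook; the https scheme is an assumption.
BASE_URL = "https://db.cngb.org"


def details_url(database, accession):
    """Build a details-page URL of the form /search/<database>/<accession>/."""
    path = f"/search/{quote(database)}/{quote(accession)}/"
    return urljoin(BASE_URL, path)


literature_url = details_url("literature", "PMID24971553")
```

For the literature example, this produces the path /search/literature/PMID24971553/ on db.cngb.org.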

Synonym conversion for CNGBdb search

CNGBdb Search is configured with synonyms for organisms (the synonym table comes mainly from the
taxonomy database) and for medical subject terms (the synonym table comes mainly from MeSH). When
you search for a keyword, records matching its synonyms are retrieved as well. For example, for
Oryza sativa the scientific name is Oryza sativa L, the GenBank common name is rice, and the
inherited blast name is monocots. A search for Oryza sativa therefore also retrieves records
matching Oryza sativa L, rice, and monocots.
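Synonym conversion of this kind can be pictured as expanding the query before matching. The Python sketch below uses a tiny hand-written synonym table containing only the Oryza sativa example from this section; the table layout and function name are assumptions, not CNGBdb's actual data model.

```python
# Minimal synonym table built from the handbook's single example (assumption:
# the real table is derived from the taxonomy database and MeSH).
SYNONYMS = {
    "oryza sativa": ["Oryza sativa L", "rice", "monocots"],
}


def expand_query(term):
    """Return the search term followed by any configured synonyms."""
    return [term] + SYNONYMS.get(term.strip().lower(), [])


expanded = expand_query("Oryza sativa")
```

Each term in the expanded list would then be searched, so records annotated only as "rice" are still found by a query for Oryza sativa.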

Search fields

The 10 data structures of CNGBdb support different search fields. The search fields are as follows.

Structure    Search fields

Literature   Title, Author, Journal, Publication type, Source, Abstract, Keywords

Gene         Source, Organism, Symbol, Title, Also known as

Variation    HGVS/Genome variation, Organism, Gene(s), Condition(s), Condition ID,
             Phenotype(s), Phenotype ID, Project ID, Literature ID, Source

Protein      Protein name(s), Source, Entry name, Organism, Gene(s), Keywords

Sequence     Gene(s), Source, Keywords, Literature ID, Molecule Type, Organism, Title

Project      Project ID, Accession in other database, Title, DOI

Sample       EBB accession ID, Sample ID, Accession in other database, Data type,
             Organism, Related accession, Sample name, Sample model, Deposited in,
             Organism ID

Experiment   Experiment ID, Accession in other database, Related accession, Platform, Title,
             Library name

Assembly     Source, Assembly ID, Assembly name, Synonyms, Related accession,
             Submitter, RefSeq category
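As an illustration of how the field table can be used, the Python sketch below stores three of its rows in a dictionary and looks up which data structures can be searched by a given field. The dictionary is a partial copy made for this example, and the function name is hypothetical.

```python
# Subset of the search-field table above; the other structures are omitted here.
SEARCH_FIELDS = {
    "Literature": ["Title", "Author", "Journal", "Publication type",
                   "Source", "Abstract", "Keywords"],
    "Gene": ["Source", "Organism", "Symbol", "Title", "Also known as"],
    "Project": ["Project ID", "Accession in other database", "Title", "DOI"],
}


def structures_with_field(field):
    """List the data structures (from the subset above) searchable by `field`."""
    return [name for name, fields in SEARCH_FIELDS.items() if field in fields]


title_structures = structures_with_field("Title")
doi_structures = structures_with_field("DOI")
```

A lookup like this shows, for example, that Title is a search field in several structures, while DOI appears only under Project in this subset.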

Data source references & Database links

1. Millet: Jia G, Huang X, Zhi H, et al. A haplotype map of genomic variations and genome-wide
association studies of agronomic traits in foxtail millet (Setaria italica). Nature genetics.
2013;45(8):957-61.
2. 1KP: Matasci N, Hung LH, Yan Z, et al. Data access for the 1,000 Plants (1KP) project.
GigaScience. 2014;3:17.
3. 1KITE: Misof B, Liu S, Meusemann K, et al. Phylogenomics resolves the timing and pattern of
insect evolution. Science. 2014;346(6210):763-7.
4. HPO: Kohler S, Vasilevsky NA, Engelstad M, et al. The Human Phenotype Ontology in 2017.
Nucleic acids research. 2017;45(D1):D865-D76.
5. NCBI: Coordinators NR. Database resources of the National Center for Biotechnology
Information. Nucleic acids research. 2018;46(D1):D8-D13.
6. dbSNP: Smigielski EM, Sirotkin K, Ward M, et al. dbSNP: a database of single nucleotide
polymorphisms. Nucleic acids research. 2000;28(1):352-5.
7. SRA: Kodama Y, Shumway M, Leinonen R, et al. The Sequence Read Archive: explosive growth
of sequencing data. Nucleic acids research. 2012;40(Database issue):D54-6.
8. Assembly: Kitts PA, Church DM, Thibaud-Nissen F, et al. Assembly: a resource for assembled
genomes at NCBI. Nucleic acids research. 2016;44(D1):D73-80.

9. RefSeq: Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current
status, new features and genome annotation policy. Nucleic acids research. 2012;40(Database
issue):D130-5.
10. Gene: Brown GR, Hem V, Katz KS, et al. Gene: a gene-centered information resource at NCBI.
Nucleic acids research. 2015;43(Database issue):D36-42.
11. Taxonomy: Federhen S. The NCBI Taxonomy database. Nucleic acids research.
2012;40(Database issue):D136-43.
12. GEO: Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data
sets--update. Nucleic acids research. 2013;41(Database issue):D991-5.
13. dbVar: Lappalainen I, Lopez J, Skipper L, et al. dbVar and DGVa: public archives for genomic
structural variation. Nucleic acids research. 2013;41(Database issue):D936-41.
14. ClinVar: Landrum MJ, Lee JM, Benson M, et al. ClinVar: public archive of interpretations of
clinically relevant variants. Nucleic acids research. 2016;44(D1):D862-8.
15. OMIM: Amberger JS, Bocchini CA, Schiettecatte F, et al. OMIM.org: Online Mendelian
Inheritance in Man (OMIM(R)), an online catalog of human genes and genetic disorders. Nucleic
acids research. 2015;43(Database issue):D789-98.
16. dbGaP: Mailman MD, Feolo M, Jin Y, et al. The NCBI dbGaP database of genotypes and
phenotypes. Nature genetics. 2007;39(10):1181-6.
17. EBI: Park YM, Squizzato S, Buso N, et al. The EBI search engine: EBI search as a
service-making biological data accessible for all. Nucleic acids research. 2017;45(W1):W545-W9.
18. UniProt: The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic acids
research. 2019;47(D1):D506-D515.
19. WoRMS Editorial Board (2019). World Register of Marine Species. Available from
http://www.marinespecies.org at VLIZ. Accessed 2019-03-06. doi:10.14284/170.
20. Zou Y, Xue W, Luo G, et al. 1,520 reference genomes from cultivated human gut bacteria
enable functional microbiome analyses. Nat Biotechnol. 2019;37(2):179-185.
21. Almeida A, Mitchell AL, Boland M, Forster SC, Gloor GB, Tarkowska A, Lawley TD, Finn RD.
A new genomic blueprint of the human gut microbiota. Nature. 2019. doi:10.1038/s41586-019-0965-1
22. Qin J, Li R, Raes J, et al. A human gut microbial gene catalogue established by metagenomic
sequencing. Nature. 2010;464(7285):59-65.
23. Qin J, Li Y, Cai Z, et al. A metagenome-wide association study of gut microbiota in type 2
diabetes. Nature. 2012;490(7418):55-60.
24. Le Chatelier E, Nielsen T, Qin J, et al. Richness of human gut microbiome correlates with
metabolic markers. Nature. 2013;500(7464):541-6.
25. Li J, Jia H, Cai X, et al. An integrated catalog of reference genes in the human gut microbiome.
Nat Biotechnol. 2014;32(8):834-41.
26. Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell
Host Microbe. 2015;17(6):852.
27. Zhang X, Zhang D, Jia H, et al. The oral and gut microbiomes are perturbed in rheumatoid
arthritis and partly normalized after treatment. Nat Med. 2015;21(8):895-905.

28. Feng Q, Liang S, Jia H, et al. Gut microbiome development along the colorectal
adenoma-carcinoma sequence. Nat Commun. 2015;6:6528. Published 2015 Mar 11.
doi:10.1038/ncomms7528.
29. Yu J, Feng Q, Wong SH, et al. Metagenomic analysis of faecal microbiome as a tool towards
targeted non-invasive biomarkers for colorectal cancer. Gut. 2015;66(1):70-78.
30. Xie H, Guo R, Zhong H, et al. Shotgun Metagenomics of 250 Adult Twins Reveals Genetic and
Environmental Impacts on the Gut Microbiome. Cell Syst. 2016;3(6):572-584.e3.
31. Liu R, Hong J, Xu X, et al. Gut microbiome and serum metabolome alterations in obesity and
after weight-loss intervention. Nat Med. 2017;23(7):859-868.
32. He Q, Gao Y, Jie Z, et al. Two distinct metacommunities characterize the gut microbiota in
Crohn's disease patients. Gigascience. 2017;6(7):1-11.
33. Kuang YS, Lu JH, Li SH, et al. Connections between the human gut microbiome and gestational
diabetes mellitus. Gigascience. 2017;6(8):1-12.
34. Jie Z, Xia H, Zhong SL, et al. The gut microbiome in atherosclerotic cardiovascular disease. Nat
Commun. 2017;8(1):845. Published 2017 Oct 10. doi:10.1038/s41467-017-00900-1.
35. Gu Y, Wang X, Li J, et al. Analyses of gut microbiota and plasma bile acids enable stratification
of patients for antidiabetic treatment. Nat Commun. 2017;8(1):1785. Published 2017 Nov 27.
doi:10.1038/s41467-017-01682-2
36. Shah SP, Roth A, Goya R, et al. The clonal and mutational evolution spectrum of primary
triple-negative breast cancers. Nature. 2012;486(7403):395-9. Published 2012 Apr 4.
doi:10.1038/nature10933.
37. Banerji S, Cibulskis K, Rangel-Escareno C, et al. Sequence analysis of mutations and
translocations across breast cancer subtypes. Nature. 2012;486(7403):405-9. Published 2012 Jun
20. doi:10.1038/nature11154
38. Pereira B, Chin SF, Rueda OM, et al. The somatic mutation profiles of 2,433 breast cancers
refines their genomic and transcriptomic landscapes. Nat Commun. 2016;7:11479. Published 2016
May 10. doi:10.1038/ncomms11479
39. Nik-Zainal S, Alexandrov LB, Wedge DC, et al. Mutational processes molding the genomes of
21 breast cancers. Cell. 2012;149(5):979-93.
40. Stephens PJ, Tarpey PS, Davies H, et al. The landscape of cancer genes and mutational
processes in breast cancer. Nature. 2012;486(7403):400-4. Published 2012 May 16.
doi:10.1038/nature11017
41. Schulze K, Imbeaud S, Letouzé E, et al. Exome sequencing of hepatocellular carcinomas
identifies new mutational signatures and potential therapeutic targets. Nat Genet.
2015;47(5):505-511.
42. Kan Z, Zheng H, Liu X, et al. Whole-genome sequencing identifies recurrent mutations in
hepatocellular carcinoma. Genome Res. 2013;23(9):1422-33.
43. Wu K, Zhang X, Li F, et al. Frequent alterations in cytoskeleton remodelling genes in primary
and metastatic lung adenocarcinomas. Nat Commun. 2015;6:10131. Published 2015 Dec 9.
doi:10.1038/ncomms10131
44. Imielinski M, Berger AH, Hammerman PS, et al. Mapping the hallmarks of lung adenocarcinoma
with massively parallel sequencing. Cell. 2012;150(6):1107-20.

45. Li C, Gao Z, Li F, et al. Whole Exome Sequencing Identifies Frequent Somatic Mutations in
Cell-Cell Adhesion Genes in Chinese Patients with Lung Squamous Cell Carcinoma. Sci Rep.
2015;5:14237. Published 2015 Oct 27. doi:10.1038/srep14237
46. Krishnan VG, Ebert PJ, Ting JC, et al. Whole-genome sequencing of asian lung cancers:
second-hand smoke unlikely to be responsible for higher incidence of lung cancer among Asian
never-smokers. Cancer Res. 2014;74(21):6071-81.
47. Fernandez-Cuesta L, Peifer M, Lu X, et al. Frequent mutations in chromatin-remodelling genes
in pulmonary carcinoids. Nat Commun. 2014;5:3518. Published 2014 Mar 27.
doi:10.1038/ncomms4518
48. Ren S, Wei GH, Liu D, et al. Whole-genome and Transcriptome Sequencing of Prostate Cancer
Identify New Genetic Alterations Driving Disease Progression [published online ahead of print, 2017
Sep 18]. Eur Urol. 2017;S0302-2838(17)30720-0. doi:10.1016/j.eururo.2017.08.027
49. Zehir A, Benayed R, Shah RH, et al. Mutational landscape of metastatic cancer revealed from
prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23(6):703-713.
50. Jiang L, Huang J, Higgs BW, et al. Genomic Landscape Survey Identifies SRSF1 as a Key
Oncodriver in Small Cell Lung Cancer. PLoS Genet. 2016;12(4):e1005895. Published 2016 Apr 19.
doi:10.1371/journal.pgen.1005895
51. George J, Lim JS, Jang SJ, et al. Comprehensive genomic profiles of small cell lung cancer.
Nature. 2015;524(7563):47-53.
52. Umemura S, Mimaki S, Makinoshima H, et al. Therapeutic priority of the PI3K/AKT/mTOR
pathway in small cell lung cancers as revealed by a comprehensive genomic analysis. J Thorac
Oncol. 2014;9(9):1324-31.
53. Rudin CM, Durinck S, Stawiski EW, et al. Comprehensive genomic analysis identifies SOX2 as
a frequently amplified gene in small-cell lung cancer. Nat Genet. 2012;44(10):1111-6.
54. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from
high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164.
55. Li MM, Datto M, Duncavage EJ, et al. Standards and Guidelines for the Interpretation and
Reporting of Sequence Variants in Cancer: A Joint
Consensus Recommendation of the Association for Molecular Pathology, American Society of
Clinical Oncology, and College of American Pathologists. J Mol Diagn. 2017;19(1):4-23.
56. Fadista J, Oskolkov N, Hansson O, et al. LoFtool: a gene intolerance score based on
loss-of-function variants in 60 706 individuals. Bioinformatics. 2017;33(4):btv602.
57. Landrum MJ, Lee JM, Benson M, et al. ClinVar: public archive of interpretations of clinically
relevant variants. Nucleic Acids Res. 2015;44(D1):D862-8.
58. 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human
genetic variation. Nature. 2015;526(7571):68-74.
59. Cann HM, de Toma C, Cazes L, et al. A human genome diversity cell line panel. Science.
2002;296(5566):261-2.
1. https://www.ncbi.nlm.nih.gov
2. https://www.ebi.ac.uk/
3. https://www.ddbj.nig.ac.jp/index-e.html
4. http://www.internationalgenome.org/

5. https://phytozome.jgi.doe.gov
6. HPO: https://hpo.jax.org/app/
7. CHPO: http://www.chinahpo.org/
8. https://sites.google.com/site/jpopgen/dbNSFP
9. http://evs.gs.washington.edu/EVS/
10. https://genomics.scripps.edu/browser/
11. https://www.ebi.ac.uk/gwas/homeCF
12. http://www.plantkingdomgdb.com
13. http://coffee-genome.org
14. http://banana-genome-hub.southgreen.fr
15. http://202.127.18.221/bamboo/index.php
16. http://www.herbal-genome.cn/index.php
17. http://web.malab.cn
18. http://bioinfo.bti.cornell.edu/cgi-bin/kiwi/home.cgi
19. http://chi.mpipz.mpg.de/assembly.html
20. http://citrus.hzau.edu.cn/orange/download/index.php
21. http://cucurbitgenomics.org
22. http://ibi.zju.edu.cn/RiceWeedomes/Echinochloa
23. http://strawberry-garden.kazusa.or.jp
24. http://treegenesdb.org/Drupal
25. http://ngs-data-archive.psc.riken.jp
26. http://www.kazusa.or.jp
27. https://solgenomics.net
28. http://treegenesdb.org/Drupal
29. http://marinegenomics.oist.jp
30. https://genomevolution.org/CoGe

Data archiving
The data archive services of CNGBdb include the CNGB Nucleotide Sequence Archive (CNSA), the Pan
immune repertoire database (PIRD), and GigaDB. They handle the submission, storage, and sharing
of data for biological sequencing research: projects, samples, experiments, assemblies,
variations, etc. They are designed to give researchers around the world comprehensive,
authoritative data and information resources.

CNSA: CNGB Nucleotide Sequence Archive

PIRD: Pan immune repertoire database

GigaDB

Scientific database
The CNGBdb scientific databases build data applications for different fields on top of the
underlying data structures and data of CNGBdb. They aim to provide scientific data services for
research areas such as biodiversity, microbes, cancer, immunity, reproductive health, and
pathogens, to meet the needs of researchers in those fields, enhance the value of the data, and
promote data development and application.

PIRD: Pan immune repertoire database

GDRD: Genetic Disease and Rare Disease

DISSECT: Data Integration Solution for Systematic Exploration of Cancer Traits

MDB: Microbiome Database

PVD: Pathogen Variation Database

Data analysis
Based on the underlying data, CNGBdb builds a distributed high-performance computing platform
and deploys application services such as BLAST, Cancer Data Analysis, and Pathogen Identification.

BLAST: BLAST service of CNGB

DISSECT: Data Integration Solution for Systematic Exploration of Cancer Traits

PVD: Pathogen Variation Database

Data visualization

The visualization service presents the biological data of CNGBdb using multiple visualization
techniques, covering genomes, transcriptomes, proteomes, and so on.

Data management
CNGB Data Access (CDA) provides users with the services of approval, authorization, and
distribution of controlled data. Whether data is authorized for access is determined by the data
owner/organization.

Data standard
CNGBdb integrates the data structures and standards of international omics, health, and medicine
organizations, such as the International Nucleotide Sequence Database Collaboration (INSDC), the
Global Alliance for Genomics and Health (GA4GH), the Global Genome Biodiversity Network (GGBN),
and the American College of Medical Genetics and Genomics (ACMG), and builds widely compatible
data standards and structures.
