Bioinformatics. Ai Duiyuan: 13893660097, aidy@gsau.edu.cn, Gansu Agricultural University. QQ: 156797555. http://blog.sciencenet.cn/u/eddy7777


1 Bioinformatics. Ai Duiyuan: 13893660097, aidy@gsau.edu.cn, Gansu Agricultural University. QQ: 156797555

2 Chapter 3: Bioinformatics Network Resources. Introduction to NCBI (special topic)

3 National Center for Biotechnology Information (NCBI)

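4 The Entrez system: NCBI's integrated database
The National Center for Biotechnology Information (NCBI) was founded in 1988. In 1991, NCBI developed the Entrez database query system for searching databases such as GenBank (molecular biology) and Medline (biomedical literature abstracts) (Schuler et al., 1996).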

5

6

7

8 How to use the Entrez system

9

10

11 All Database integrates…
the scientific literature; DNA and protein sequence databases; 3D protein structure data; population study data sets; assemblies of complete genomes

12 All Database is a search and retrieval system that integrates NCBI databases

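13 NCBI molecular data sub-databases
1. Single nucleotide polymorphism database: dbSNP
2. Genome database: Genome
3. Human genome databases: Ensembl, UCSC
4. Expressed sequence tag database: dbEST
5. Sequence tagged site database: dbSTS
6. Gene-oriented cluster database: UniGene
7. Genome survey sequences: dbGSS (tag sequences near the restriction sites used during sequencing)
8. Protein structure classification databases: SCOP, Pfam
9. Protein secondary structure database: DSSP
10. Protein homologous sequence alignment databases: HSSP, HomoloGene
11. OMIM (Online Mendelian Inheritance in Man): a catalog of human genes and genetic disorders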

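14 GenBank division codes
Name: Code
Primate sequences: PRI
Rodent sequences: ROD
Other mammalian sequences: MAM
Other vertebrate sequences: VRT
Invertebrate sequences: INV
Plant, fungal, and algal sequences: PLN
Bacterial sequences: BCT
Viral sequences: VRL
Bacteriophage sequences: PHG
Synthetic sequences: SYN
Unannotated sequences: UNA
Expressed sequence tags: EST
Patent sequences: PAT
Sequence tagged sites: STS
Genome survey sequences: GSS
High-throughput genomic sequences: HTG
Unfinished high-throughput cDNA sequences: HTC
High-throughput cDNA sequences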

15 Accessing information on molecular sequences

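16 Database query vs. database search
1. Database query, e.g. (interleukin 18); (3f62) = PDB; Q14116 (IL18_HUMAN) = EBI/UniProt; RefSeq = NP_ ; NM_ ; NP_ ; NM_ ; UniGene = Hs ; Gene ID: 3606
2. Database search: using a specific sequence-similarity alignment algorithm to find sequences in a database that share some degree of similarity with the query sequence.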

17 Accession numbers are labels for sequences
NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data.

18 What is an accession number?
An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
Examples (all for retinol-binding protein, RBP4):
X: GenBank genomic DNA sequence
NT_: Genomic contig
Rs: dbSNP (single nucleotide polymorphism)
N: An expressed sequence tag (1 of 170)
NM_: RefSeq DNA sequence (from a transcript)
NP_: RefSeq protein
AAC02945: GenBank protein
Q: SwissProt protein
1KT7: Protein Data Bank structure record
These examples span DNA, RNA, and protein records.

19 Four ways to access DNA and protein sequences
[1] Entrez Gene with RefSeq
[2] UniGene
[3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
[4] ExPASy Sequence Retrieval System (separate from NCBI)

20 Four ways to access protein and DNA sequences
[1] Entrez Gene with RefSeq
Entrez Gene is a great starting point: it collects key information on each gene/protein from major databases. It covers all major organisms.
RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_006735).

21 NCBI’s important RefSeq project: best representative sequences
RefSeq (accessible via the main page of NCBI) is the set of reference sequences for the NCBI databases: a curated, non-redundant collection that includes genomic DNA contigs and the mRNAs and proteins of known genes.
RefSeq accession number formats:
Complete genome: NC_######
Complete chromosome: NC_######
Genomic contig: NT_######
mRNA (DNA format): NM_###### (e.g. NM_006744)
Protein: NP_###### (e.g. NP_006735)
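
The prefix scheme above can be checked mechanically. The following is a minimal Python sketch, not an official NCBI tool: the function name `classify_refseq` and the lookup table are hypothetical, and only the four prefixes listed on this slide are handled (RefSeq defines others that are ignored here).

```python
import re

# Prefix meanings copied from the slide; other RefSeq prefixes exist
# but are deliberately left out of this sketch.
REFSEQ_PREFIXES = {
    "NC": "complete genome or chromosome",
    "NT": "genomic contig",
    "NM": "mRNA (DNA format)",
    "NP": "protein",
}

# Two-letter prefix, underscore, digits, optional ".version" suffix.
_REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return the record type for a RefSeq-style accession, or None."""
    match = _REFSEQ_PATTERN.match(accession.strip().upper())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NM_006744", "NP_006735", "AAC02945"):
        print(acc, "->", classify_refseq(acc))
```

An accession without the underscore form, such as the GenBank protein AAC02945, is reported as unrecognized rather than guessed at.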

22 From the NCBI home page, type “rbp4” and hit “Go”

23

24

25

26

27 By applying limits, there are now just two entries

28

29

30

31 Sequence information in a GenBank-format record: code, source organism, references

32 Expert comments; features

33

34

35 FASTA format
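
As a rough illustration of the FASTA layout (a header line starting with ">" followed by one or more sequence lines), here is a minimal Python parser. The function name `parse_fasta` and the two demo records are made up for this sketch and do not come from any database.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records for illustration only; not real database entries.
EXAMPLE = """>example_1 hypothetical demo sequence
ATGGCGTACG
TTAGC
>example_2 another demo
MKWVTF
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(EXAMPLE):
        print(header, len(sequence))
```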

36 Entrez Gene (top of page)
Note that links to many other RBP4 database entries are available

37 Entrez Gene (middle of page)

38 Entrez Gene (bottom of page)

39 Example of how to access sequence data: HIV-1 pol
There are many possible approaches. Begin at the main page of NCBI, and type an Entrez query: hiv-1 pol

40

41 Searching for HIV-1 pol: following the “genome” link yields a manageable four results

42 Example of how to access sequence data: HIV-1 pol
For the Entrez query hiv-1 pol there are about 40,000 nucleotide or protein records (and >100,000 records for a search for “hiv-1”), but these can be reduced in two easy steps:
-- specify the organism, e.g. hiv-1[organism]
-- limit the output to RefSeq
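
The two narrowing steps can be written as a single query string. The sketch below is only an illustration: `build_entrez_query` is a hypothetical helper, and the `refseq[filter]` tag is an assumption about how the RefSeq limit is spelled in the Entrez search box, so it should be checked against the Entrez help pages before being relied on.

```python
def build_entrez_query(keywords, organism=None, refseq_only=False):
    """Combine keywords with optional organism and RefSeq restrictions."""
    clauses = []
    if keywords:
        clauses.append(keywords)
    if organism:
        # Field tag shown on the slide: hiv-1[organism]
        clauses.append(f"{organism}[organism]")
    if refseq_only:
        # Assumed spelling of the RefSeq limit; verify before use.
        clauses.append("refseq[filter]")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_entrez_query("hiv-1 pol"))
    print(build_entrez_query("pol", organism="hiv-1", refseq_only=True))
```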

43 Over 100,000 nucleotide entries for HIV-1; only 1 RefSeq

44 Four ways to access DNA and protein sequences
[1] Entrez Gene with RefSeq
[2] UniGene
[3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
[4] ExPASy Sequence Retrieval System (separate from NCBI)

45 DNA RNA protein complementary DNA (cDNA) UniGene

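46 UniGene: unique genes via ESTs
• Find UniGene at NCBI:
UniGene holds ESTs and full-length mRNA sequences organized into clusters. Each cluster represents one specific known or putative gene, with mapping and expression information and cross-references to other resources. A record mainly lists the sequences related to the gene (cDNA, ESTs, etc.), its chromosomal location, and its expression profile. The constituent ESTs come from complete cDNA libraries.
The UniGene database automatically partitions GenBank sequences into many clusters; each record represents one cluster, and each cluster represents one unique gene.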

47 Cluster sizes in UniGene
This is a gene with 1 EST associated; the cluster size is 1

48 Cluster sizes in UniGene
This is a gene with 10 ESTs associated; the cluster size is 10

49 Cluster sizes in UniGene (human)
[Table: cluster size vs. number of clusters; the values are not recoverable from this transcript.] UniGene build 172, 8/04

50 UniGene: unique genes via ESTs
Conclusion: UniGene is a useful tool to look up information about expressed genes. UniGene displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver).

51 Exercise: Use Entrez to find the nucleotide and protein RefSeq sequences of the human CCL18 gene, save them in FASTA format, and record the RefSeq accession numbers.
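
Once an accession number is known, the FASTA record can also be retrieved programmatically. The sketch below assumes NCBI's E-utilities `efetch` endpoint, which is not part of these slides; the helper names are hypothetical, and the accessions shown are the RBP4 examples from the earlier slides, not the answer to the exercise.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, database):
    """Build an efetch URL that requests one record as FASTA text.

    database is "nuccore" for nucleotide records or "protein" for proteins.
    """
    params = {
        "db": database,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def save_fasta(accession, database, path):
    """Download the record and write it to path (needs network access)."""
    with urlopen(efetch_fasta_url(accession, database)) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)


if __name__ == "__main__":
    # Only the URLs are printed here; call save_fasta() to download.
    print(efetch_fasta_url("NM_006744", "nuccore"))
    print(efetch_fasta_url("NP_006735", "protein"))
```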

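52 NCBI www.ncbi.nlm.nih.gov
The National Center for Biotechnology Information (NCBI) was established in 1988. Its main work is to develop databases, with GenBank as the flagship, to conduct research in computational biology, to develop software tools for analyzing genome data, and to disseminate biomedical information.
Entrez is NCBI's well-known tool for retrieving sequence information. It brings the scientific literature, DNA and protein sequence databases, 3D protein structure data, population study data sets, and assemblies of complete genomes together into one tightly integrated system. Like EBI's SRS, it is a query, retrieval, and display system.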

53 NCBI: The original version of Entrez (1991) had just 3 nodes; it has now grown to nearly 20 nodes

54 NCBI

55 NCBI

56 NCBI

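57 Databases: http://www.ebi.ac.uk/ http://www.ncbi.nlm.nih.gov/
1. Become familiar with the NCBI GenBank Entrez retrieval system.
2. Become familiar with the SRS (EBI-EMBL) retrieval system: UniProtKB, Ensembl, ArrayExpress, PDBe, BLAST+, PMC-E.
3. Become familiar with the DBGET (NIG-DDBJ) retrieval system.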

58 Thank you. The end.
April 18, 2014

59 Access to Biomedical Literature

60 PubMed is…
National Library of Medicine's search service
12 million citations in MEDLINE
links to participating online journals
PubMed tutorial (via “Education” on side bar)

61 PubMed at NCBI to find literature information

62 PubMed is the NCBI gateway to MEDLINE.
MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.

63

64

65

66 PubMed search strategies
Try the tutorial (“Education” on the left sidebar)
Use boolean queries (capitalize AND, OR, NOT), e.g. lipocalin AND disease
Try using “limits”
Try “Links” to find Entrez information and external resources
Obtain articles on-line via Welch Medical Library (and download pdf files):
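
The boolean-query advice can be sketched in Python as follows. This assumes NCBI's E-utilities `esearch` endpoint, which the slides do not mention, and the helper names are hypothetical. The date restriction uses the `datetype`, `mindate`, and `maxdate` parameters, with both bounds supplied together.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(*terms, operator="AND"):
    """Join search terms with a capitalized boolean operator."""
    operator = operator.upper()
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    return f" {operator} ".join(terms)


def pubmed_search_url(term, min_year=None, max_year=None, retmax=20):
    """Build an esearch URL for PubMed, optionally limited by publication year."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    if min_year is not None and max_year is not None:
        # Both bounds are given together when limiting by publication date.
        params.update({"datetype": "pdat", "mindate": min_year, "maxdate": max_year})
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    query = boolean_query("lipocalin", "disease")
    print(query)
    print(pubmed_search_url(query, min_year=2000, max_year=2014))
```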

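67
· Journal Database: browse journals.
· MeSH Database: browse Medical Subject Headings hierarchically.
· Single Citation Matcher: enter journal information to find a single article or the contents of an entire journal.
· Batch Citation Matcher: enter journal information in a specific format to search for several articles at once.
· Clinical Queries: intended for clinicians; filters restrict the retrieved literature to four categories: therapy, diagnosis, etiology, and prognosis.
Related Resources
· Order Documents: lets users obtain the full text of articles locally, although some of them carry a fee.
· NLM Mobile: a link to another NLM web-based query system.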

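68 Exercise: Search PubMed for reports on human CCL18 gene research (published after 2000), list the articles retrieved, and try to obtain the full text of one or two of them.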

69 BLAST is… Basic Local Alignment Search Tool
NCBI's sequence similarity search tool
supports analysis of DNA and protein databases
80,000 searches per day

70

71

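72 Table 7. BLAST programs: query sequence and database types
Blastp (protein query, protein database): searches a protein sequence database with a protein query.
Blastn (nucleotide query, nucleotide database): searches a nucleotide sequence database with a nucleotide query.
Blastx (nucleotide query, protein database): translates the nucleotide query in all six reading frames and searches a protein sequence database.
Tblastn (protein query, nucleotide database): searches, with a protein query, a nucleotide sequence database translated in all six reading frames.
Tblastx (nucleotide query, nucleotide database): translates the nucleotide query in all six reading frames and searches a nucleotide sequence database translated in all six reading frames.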
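
The table can be read as a lookup from (query type, database type) to program name. The Python sketch below encodes that lookup and shows what "six reading frames" means at the nucleotide level. All names are hypothetical; the frames are returned as nucleotide strings without being translated to protein, and only plain A/C/G/T input is handled.

```python
# (query type, database type) -> program, following the table above.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program; translate_both selects tblastx."""
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError("types must be 'protein' or 'nucleotide'") from None


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frames(dna):
    """Return the six reading frames: 3 offsets on each strand."""
    forward = dna.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]  # reverse complement
    return [strand[offset:] for strand in (forward, reverse) for offset in range(3)]


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
    print(six_frames("ATGCCC"))
```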

73

74

75

76

77

78

79

80

81

82 OMIM is… Online Mendelian Inheritance in Man
catalog of human genes and genetic disorders
edited by Dr. Victor McKusick and others at JHU

83

84

85

86

87

88

89 Books is… searchable resource of on-line books

90 TaxBrowser is…
browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
taxonomy information such as genetic codes
molecular data on extinct organisms

91 Structure site includes…
Molecular Modelling Database (MMDB)
biopolymer structures obtained from the Protein Data Bank (PDB)
Cn3D (a 3D-structure viewer)
vector alignment search tool (VAST)

92

93

94

95

96

97

98

99

100

101

102

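103 Homework
Use Entrez to find the nucleotide and protein RefSeq sequences of the human CCL18 and human CXCL1 genes, save them in FASTA format, and record the sequence information obtained from GenBank.
Search PubMed for reports on human CCL18 gene research (published after 2000), list the articles retrieved, and try to obtain the full text of one or two of them.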

