Data analysis of 1,888 full-length cDNA sequences from wild rice O. rufipogon W1943
NCGR, 2008-09-03
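Background
Wild rice O. rufipogon (AA genome) is the ancestral species most closely related to cultivated rice [1,2].
It carries many agronomic traits superior to those of cultivated rice, such as drought tolerance and salt tolerance [3,4].
Public databases hold extensive genomic sequence data for cultivated rice [5,6], along with extensive cDNA resources [7,8].
Sequence and clone resources for wild rice are very scarce; the most sizable set is the 5,211 leaf ESTs from Oryza minuta (BBCC genome) [9].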
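Current status and aims
NCGR wild rice resource: 1,888 unique O. rufipogon W1943 cDNA clones have been cloned and accurately sequenced.
By comparing the W1943 cDNA sequences with indica and japonica cDNA sequences, we aim to:
compile novel rice genes, genes potentially specific to wild rice, genes with W1943-specific splicing patterns, genes highly expressed in a tissue-specific manner, and miRNA-related genes;
provide leads for further study by interested researchers.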
[Figure: BLAST of the 1,888 W1943 cDNAs against cultivated rice genomic sequences and cDNAs]
[Figure: SSR comparison of the 1,888 W1943 cDNAs with indica and japonica cDNAs]
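I. Genes not matched to the japonica genome
Definition: a cDNA that cannot be located on the O. sativa japonica Nipponbare genome sequence but is homologous to the indica 93-11 genome sequence, to rice ESTs, or to ESTs of other grasses (Poaceae). Genes with homology to bacterial sequences were removed.
Interpretation: such a gene falls in a gap of the japonica genome sequence, is specific to indica, or is specific to wild rice.

| name | 93-11 contig | EST or mRNA hit | protein |
|---|---|---|---|
| CT842002 | Contig005912 | AK241925 | - |
| CU406895 | Contig003011 | CT859459 | |
| CT842007 | Contig008507 | CT856206 | |
| CU861744 | Contig000750 | AK099287 | ring-box protein |
| CU405940 | Contig001402 | AK103326 | |
| CU405657 | | CT856885 | |
| CU406172 | Contig014596 | AK242967 | |
| CT841712 | | CA766528 | |
| CT842006 | Contig000383 | AK111647 | GTP-binding protein |
| CU405768 | | CT836656 | 60S ribosomal protein L7A |
| CU861753 | | | |
| CU405675 | | CA756235 | 60S ribosomal protein L17 |
| CU406308 | Contig000444 | AK070131 | |
| CU406202 | | NM_001063334 | |
| CT841996 | Contig002576 | CT834800 | |
| CU406924 | | AC145809 | |
| CU406568 | Contig003848 | AK064050 | Bowman Birk trypsin inhibitor |
| CU405898 | | CN130755.1 (Sorghum bicolor) | ribulose-bisphosphate carboxylase |
| CU406582 | | AK107776 | |
| CU406778 | | BE429292.1 (Triticum turgidum) | |
| CU406596 | Contig001277 | AK242711 | |
| CU861677 | | FF534517.1 (Manihot esculenta) | |
| CT842008 | -- | | |
| CT841912 | | EH277383.1 (Spartina alterniflora) | |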
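II. Novel rice genes
Definition: a cDNA that can be located on the cultivated rice genome sequence but has no homolog among known rice expressed sequences. A search against rice MPSS data found almost no matching signatures.
Interpretation: novel rice genes. Either their expression in cultivated rice is too low for them to be cloned easily, or they are specific to wild rice.

| name | Len (bp) | Chr location | Identity (%) | Antisense | protein |
|---|---|---|---|---|---|
| CU406910 | 656 | 10 | 99 | | |
| CU405785 | 727 | 05 | | CA764081 | DNA-directed RNA polymerase 3 |
| CU406138 | 568 | 02 | | | |
| CU861795 | 475 | 09 | 79 | CT858901 | - |
| CU406022 | 543 | 12 | | | |
| CU406355 | 837 | | 97 | AK107125 | AP2 domain, putative |
| CU405757 | 477 | 04 | 100 | | |
| CU406396 | 520 | | | AK103485 | |
| CU406921 | 414 | | | | |
| CT841800 | 941 | 11 | | AK121962 | patatin, putative |
| CU406535 | 389 | | | | |
| CU861688 | 693 | 08 | | AK109182 | |
| CU406832 | 530 | | 92 | | |
| CT841937 | 1552 | | 98 | AK106713 | |
| CU406871 | 458 | 01 | 84 | | |
| CU861804 | 383 | 06 | | | |
| CU861721 | 554 | | | | |

Note: no protein homolog was found for any of these 17 genes. The 7 genes with an entry in the Antisense column form antisense RNA pairs with known rice expressed sequences.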
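III. Genes with W1943-specific splicing patterns
Definition: a cDNA that is fully identical (100% identity) to the cultivated rice japonica genome sequence and homologous to cultivated rice expressed sequences, but with a splicing pattern of its own (a unique alternative splicing, AS, pattern).
Interpretation: either this AS isoform has simply not yet been cloned from cultivated rice, or it is unique to wild rice.

| name | Len (bp) | Chr location | No. of exons | protein |
|---|---|---|---|---|
| CT841942 | 978 | 07 | 6 (1st intron: GC-AG) | - |
| CU406810 | 958 | 06 | 6 (1st intron: GT-TG) | dual-specificity phosphatase protein |
| CT841893 | 1011 | 01 | 6 | drought-induced protein |
| CT841874 | 1369 | | 4 | vesicle transport protein |
| CU405853 | 1377 | 05 | 1 | dehydration-responsive protein |
| CU405923 | 639 | | | IAA amidohydrolase |
| CU406279 | 648 | | | |
| CU406025 | 839 | 02 | | |
| CT841561 | 740 | | 2 | |
| CU406579 | 468 | 09 | | |
| CU406935 | 1345 | | | |
| CU406600 | 1107 | | | |
| CU405570 | 952 | | | |
| CU406091 | 893 | | 3 | |
| CU406134 | 665 | 10 | | |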
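The three definitions used so far (unmatched to the japonica genome, novel rice gene, W1943-specific splicing) amount to a small decision procedure over similarity-search results. The sketch below is only an illustration of that logic: the `CdnaHits` record, its field names, and the category labels are hypothetical and not taken from the original analysis, and "located on the cultivated rice genome" is simplified here to "located on the Nipponbare genome".

```python
from dataclasses import dataclass


@dataclass
class CdnaHits:
    """Hypothetical per-clone summary of similarity-search results."""
    maps_to_japonica: bool        # located on the Nipponbare genome sequence
    japonica_identity: float      # percent identity of that alignment (0 if unmapped)
    hits_indica_genome: bool      # homolog in the indica 93-11 genome sequence
    hits_rice_expressed: bool     # homolog among known rice ESTs / cDNAs
    hits_other_grass_est: bool    # homolog among ESTs of other grasses
    hits_bacteria: bool           # homolog in bacterial sequences
    splicing_matches_known: bool  # exon structure agrees with a known transcript


def classify(h: CdnaHits) -> str:
    """Assign a clone to one of the categories defined in sections I to III."""
    if not h.maps_to_japonica:
        # Section I: absent from japonica, supported elsewhere, not bacterial.
        supported = (h.hits_indica_genome or h.hits_rice_expressed
                     or h.hits_other_grass_est)
        if supported and not h.hits_bacteria:
            return "unmatched_to_japonica"
        return "other"
    if not h.hits_rice_expressed:
        # Section II: on the genome, but no known rice expressed sequence.
        return "novel_rice_gene"
    if h.japonica_identity == 100.0 and not h.splicing_matches_known:
        # Section III: identical to the genome, expressed, new splicing pattern.
        return "w1943_specific_splicing"
    return "other"
```

Checking the categories in this order keeps them mutually exclusive, which matches the way the three tables list disjoint sets of clones.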
[Figure: unique splicing patterns of some W1943 cDNAs. Panels: an intron spliced out of an exon, together with an exon arising within an intron; an intron spliced out of an exon; an exon arising in an intergenic region; two exons merged into a single exon.]

IV. Tissue-specific highly expressed genes
Selection criteria, based on rice MPSS data (http://mpss.udel.edu/rice/) [10]:
i: The expression level of the gene must exceed 100 TPM (transcripts per million) in at least one tissue.
ii: If the gene is expressed in several different tissues, the highest expression level must account for more than 75% of the total across all tissues.
iii: The ratio of the highest to the second-highest expression level must be over 10.
These criteria gave 41 putative tissue-specific genes.

| name | len (bp) | ORF (bp) | tissue | protein |
|---|---|---|---|---|
| orw1943s101k15 | 619 | 51-434 | leaf | light-regulated protein |
| orw1943c102c24 | 833 | 62-463 | | subunit of ribulose-1,5-bisphosphate carboxylase |
| orw1943s101p08 | 1544 | 111-650 | | glycolate oxidase |
| orw1943s101h18 | 912 | 120-752 | | H+-transporting ATP synthase chain 9-like protein |
| orw1943c113b17 | 837 | 108-623 | | retrotransposon protein, putative, unclassified |
| orw1943c002g13 | 843 | 58-576 | | alanine aminotransferase |
| orw1943c003d24 | 404 | 18-227 | | ferredoxin-NADP(H) oxidoreductase |
| orw1943c104a05 | 985 | 70-777 | | photosystem-1 F subunit precursor |
| orw1943c006o21 | 625 | 89-310 | root | metallothionein-like protein type 1 |
| orw1943c103g16 | 916 | 110-553 | | MAPEG family |
| orw1943c003o09 | 619 | 111-311 | | Potato inhibitor I family |
| orw1943c112g14 | 1008 | 58-825 | | receptor-like protein kinase |
| orw1943c104g04 | 805 | 115-618 | | pathogenesis-related protein PR-1a |
| orw1943c006h22 | 664 | 1-453 | | pathogenesis-related protein 4b |
| orw1943s101p06 | 512 | 36-353 | | N-carbamyl-L-amino acid amidohydrolase |
| orw1943c111d19 | 1399 | 59-1312 | germinating seedling | putative alpha-galactosidase |
| orw1943c002o05 | 769 | 82-432 | | nonspecific lipid-transfer protein 2 precursor |
| orw1943c002p22 | 518 | 92-331 | meristematic tissue | metallothionein |
| orw1943c109d02 | 682 | 45-230 | mature pollen | putative lipase |
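The three expression criteria for tissue specificity can be written as a single filter over per-tissue MPSS abundances. This is a minimal sketch rather than the original pipeline: the function name and the dictionary input are hypothetical, the three criteria are assumed to be required jointly, and a gene detected in only one tissue is assumed to pass criteria ii and iii automatically.

```python
def is_tissue_specific(tpm_by_tissue, min_tpm=100.0, min_share=0.75, min_ratio=10.0):
    """Apply criteria i-iii to a mapping of tissue name -> expression in TPM."""
    levels = sorted(tpm_by_tissue.values(), reverse=True)

    # Criterion i: more than 100 TPM in at least one tissue.
    if not levels or levels[0] <= min_tpm:
        return False

    # Criteria ii and iii only matter when the gene is seen in several tissues
    # (assumption: a single-tissue gene is trivially specific).
    expressed = [v for v in levels if v > 0]
    if len(expressed) == 1:
        return True

    # Criterion ii: the top tissue holds more than 75% of the summed expression.
    if levels[0] / sum(levels) <= min_share:
        return False

    # Criterion iii: highest level is more than 10 times the second highest.
    return levels[0] / levels[1] > min_ratio
```

For instance, a gene at 500 TPM in leaf, 20 TPM in root and 5 TPM in pollen passes all three tests, while one at 500 TPM in leaf and 100 TPM in root fails criterion iii because the ratio is only 5.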
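V. Putative miRNAs and miRNA target genes
Identification workflow and criteria [11]:
microRNAs: small RNAs of 21-23 nt, processed by the Dicer enzyme from single-stranded RNA precursors of 70-90 nt that form a hairpin structure. They are conserved across species.
Mode of action: a miRNA binds its target mRNA (mostly in the 3' UTR) through imperfect complementarity and triggers repression of protein translation without affecting transcript stability; a few miRNAs may induce degradation of the target mRNA in an siRNA-like manner. The effect depends on the degree of complementarity.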
Identification of putative wild rice miRNAs

| putative miRNA | Length (bp) | pre-miRNA len (bp) | hit_miRNA | Chr location | miRNA sequence |
|---|---|---|---|---|---|
| CU406292 | 1416 | 262 | osa-MIR159a | 01 | uuuggauugaagggagcucug |
| CU405943 | 1511 | 101 | osa-MIR156j | 06 | ugacagaagagagugagcac |
| CU861819 | 561 | 80 | osa-miR818e | 04 | aaucccuuauauuuugggacgg |
| CU861752 | 727 | 150 | osa-miR446 | 10 | aucaauaugaaugugggaaau |

[Figures: CU406292 pre-miRNA; CU405943 pre-miRNA]
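Because a miRNA acts through its degree of complementarity to the target, a candidate target site is usually scored by how far it departs from the perfect reverse complement of the miRNA. The sketch below counts plain Watson-Crick mismatches only. The function names are hypothetical, G:U wobble pairs and gaps are not modelled, and no mismatch cutoff is implied because the slides do not state one.

```python
COMPLEMENT = {"a": "u", "u": "a", "g": "c", "c": "g"}


def normalize(seq: str) -> str:
    """Lower-case the sequence and read cDNA-style 't' as RNA 'u'."""
    return seq.lower().replace("t", "u")


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(rna)))


def count_mismatches(mirna: str, site: str) -> int:
    """Count positions where a same-length target site (5'->3') fails to pair."""
    site = normalize(site)
    if len(normalize(mirna)) != len(site):
        raise ValueError("miRNA and target site must have the same length")
    perfect_site = reverse_complement(mirna)
    return sum(1 for expected, seen in zip(perfect_site, site) if expected != seen)


# Mature sequence listed for CU406292 (hit: osa-MIR159a) in the table above.
mir159 = "uuuggauugaagggagcucug"
perfect = reverse_complement(mir159)
```

A site equal to `perfect` scores zero mismatches; each substituted base raises the count by one, so the score gives a rough handle on where a site sits between siRNA-like full complementarity and looser pairing.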
References
1. Wang, Z. Y., Second, G., and Tanksley, S. D. 1992, Polymorphism and phylogenetic relationships among species in the genus Oryza as determined by analysis of nuclear RFLPs, Theor. Appl. Genet., 83, 565-581.
2. Londo, J. P., Chiang, Y. C., Hung, K. H., Chiang, T. Y., and Schaal, B. A. 2006, Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa, Proc. Natl. Acad. Sci. USA, 103, 9578-83.
3. Zhang, X., Zhou, S., Fu, Y., Su, Z., Wang, X., and Sun, C. 2006, Identification of a drought tolerant introgression line derived from Dongxiang common wild rice (O. rufipogon Griff.), Plant Mol. Biol., 62, 247-59.
4. Tian, F., Zhu, Z., Zhang, B., et al. 2006, Fine mapping of a quantitative trait locus for grain number per panicle from wild rice (Oryza rufipogon Griff.), Theor. Appl. Genet., 113, 619-29.
5. International Rice Genome Sequencing Project. 2005, The map-based sequence of the rice genome, Nature, 436, 793-800.
6. Yu, J., Hu, S., Wang, J., et al. 2002, A draft sequence of the rice genome Oryza sativa L. ssp. indica, Science, 296, 92-100.
7. The Rice Full-Length cDNA Consortium. 2003, Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice, Science, 301, 376-379.
8. Liu, X. H., Lu, T. T., Yu, S. L., et al. 2007, A collection of 10,096 indica rice full-length cDNAs reveals highly expressed sequence divergence between Oryza sativa indica and japonica subspecies, Plant Molecular Biology, (accepted).
9. Cho, S. K., Ok, S. H., Jeung, J. U., et al. 2004, Comparative analysis of 5,211 leaf ESTs of wild rice (Oryza minuta), Plant Cell Rep., 22, 839-47.
10. Nakano, M., Nobuta, K., Vemaraju, K., Tej, S. S., Skogen, J. W., and Meyers, B. C. 2006, Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA, Nucleic Acids Research, 34, D731-D735.
11. Xie, F. L., Huang, S. Q., Guo, K., Xiang, A. L., Zhu, Y. Y., Nie, L., and Yang, Z. M. 2007, Computational identification of novel microRNAs and targets in Brassica napus, FEBS Letters, 581, 1464-1474.