Data sharing of HLPP Yunping Zhu State Key Laboratory of Proteomics


1 Data sharing of HLPP Yunping Zhu State Key Laboratory of Proteomics
Beijing Proteome Research Center Beijing Institute of Radiation Medicine Beijing,

2 Outline
Introduction of HLPP
Data sharing of HLPP
Summary

3 Introduction of proteomics and HLPP
Why proteomics? Same genome, different proteome: the Black Swallowtail as larva and as butterfly.

4 Milestones
Milestones of HGP: genetic map; physical map; sequence map
Milestones of HPP: Expression Profile (Proteome); Proteins Linkage Map (Interactome); Subcellular Localization Profile; Protein Modification Profile

5 Human Proteome Project
Initiatives: HPPP (plasma); HGPI (glyco); HLPP (liver); HAI (antibody); PSI (standards); MRPP (models); HBPP (brain)
Countries shown: USA; China; Japan; Sweden; UK; Germany; Canada

6 Scientific objectives of the HUPO HLPP
Generation of Compiled Expression Profile of Liver (Proteome): comprehensive analysis of human liver protein constituents in health and disease states.
Establishment of Subcellular Localization Profile: determination of protein localization and profile of the sub-cellular proteome.
Networking of Liver Proteins Linkage Map (Interactome): comprehensive analysis of liver protein-protein interactions and networks of the liver proteome.
Elucidation of Protein Modification Profile: systematic analysis of post-translational modifications of the liver proteome.
Bridging Liver Proteome Project and Plasma Proteome Project: parallel coordination of these two initiatives with respect to resources, technology, and knowledge database will be achieved in order to discover biomarkers.
Construction of a Knowledge Database: integration and correlation of the human liver proteome with the liver transcriptome and the human genome.

7 New strategy of HPP
Nature 452, 920 (2008). After several years of argument and discussion, people seem to have come to a consensus. This is the new strategy of HPP. It is largely consistent with the overall "two profiles, two maps, three databases" strategy that we proposed in HLPP. The cost "is absolute peanuts when you consider the importance of mapping the building blocks of life" (Mathias Uhlen).

8 Proteomics - globally investigate the biological system
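Proteome: the whole protein map of a biological system, such as a cell, tissue, or body fluid.
Topics: protein expression, modification, quantification, etc.
Methods: experimental observations plus background knowledge, collection and integration.
The basic research strategy of proteomics is to observe a biological system with experimental techniques such as mass spectrometry, electrophoresis, and chips, collecting information on protein expression, quantification, and modification, and then to integrate these observations with protein knowledge bases and databases in order to infer the state of the system, for example a disease state. Mass spectrometry is the most important of these observation techniques, offering high throughput and high sensitivity. Analyzing mass spectrometry data means resolving the state of proteins from the spectra, and it is a major research topic in proteome bioinformatics.
The mass spectrometer is one of the most important instruments in proteomics.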

9 Technical routes employed for proteome analysis
Each protocol has to be performed by more than two labs, and more than two times in each lab.
Technical routes and their separation methods:
2DLC-ESI (Bottom-up): Digestion_SCXLC_RPLC
3DLC-ESI: SCX_Digestion_SCXLC_RPLC; SAX_Digestion_SCXLC_RPLC; SEC_Digestion_SCXLC_RPLC
2DE-MALDI (Top-down): 2DE_Digestion
1DE-LC-ESI: SAX_SDS_Digestion_RPLC; SDS_Digestion_RPLC
Mass spectrometry: ESI_Qq-TOF; ESI_ITMS; MALDI_TOFTOF
Searching algorithm: Mascot; Sequest
LABS: F He & X Qian (BPRC); P Yang & F He (FUDAN); Rong Zeng (SIBS); Siqi Liu (BGI)

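10 The usage of MS data
Biological aspects: identification of peptides and proteins; PTMs (types and sites); quantification of peptides and proteins.
Technical aspects: improvement of experimental strategy and instrument design; discovery of the physical and chemical principles of experiments.
Uses of mass spectrometry data: (1) Biology: identification of peptides and proteins; analysis of post-translational modifications, including their types and sites; quantitative analysis. (2) Technology: improving experimental strategies; improving and designing instrument platforms; analyzing the physical and chemical principles of experiments and building models of how experimental data are produced, for example MS/MS spectrum prediction and ESI ionization models, all of which can be used for database-search identification.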

11 Data management and sharing
HLPP Proteome Data Standard
Data management platform
Data process and integration
dbLEP: A Database of Liver Proteome Expression Profile
Problems arising

12 HLPP Proteome Data Standard
Sample Information
Protein extraction
Concentration Measurement
Separation, gel-based: 2DE; Gel1D-IEF; Gel1D-SDS
Separation, LC-based: RPLC; SCXLC; SAXLC; SEC; Trap; Desalting
Digestion
Ion source, MALDI: Autoflex; Ultraflex; ABI 4700
Ion source, ESI: Qstar; QTofMicro; LCQ; LTQ
Mass spectrometry: TOF; TOFTOF; QTOF; ITMS
Peak list generation: ABI 4700; QStar; Autoflex; TurboSequest; Masslynx; FlexAnalysis
Peak list preprocessing: GPS peak list preprocess
Protein/peptide identification: Mascot; Sequest
Protein list generation: GPS; Biotools; Qstar; BuildSummary; DTASelect; Bioworks; AutoQuest
Design features: modules match the experimental steps; clear structure; easy data reuse; easy customization of data by users; little nesting; adaptable as technology evolves.
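To illustrate what "modules match the experimental steps" and "little nesting" could look like in practice, here is a minimal sketch of a flat experiment record. The class names, field names, and sample values are hypothetical and are not taken from the actual HLPP data standard; only the module and method labels come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One experimental step; 'module' mirrors a module name on the slide."""
    module: str                      # e.g. "Separation", "Ion source"
    method: str                      # e.g. "SCXLC", "ESI"
    parameters: dict = field(default_factory=dict)  # hypothetical free-form settings


@dataclass
class ExperimentRecord:
    """A flat, ordered list of steps instead of deeply nested sections."""
    sample: str
    steps: list = field(default_factory=list)

    def methods_for(self, module):
        # Collect the methods recorded under one module, in workflow order.
        return [s.method for s in self.steps if s.module == module]


# Hypothetical record for a 2DLC-ESI style workflow.
record = ExperimentRecord(
    sample="hypothetical liver sample",
    steps=[
        Step("Digestion", "in-solution"),
        Step("Separation", "SCXLC"),
        Step("Separation", "RPLC"),
        Step("Ion source", "ESI"),
        Step("Mass spectrometry", "ITMS"),
        Step("Protein/peptide identification", "Sequest"),
    ],
)
print(record.methods_for("Separation"))
```

Keeping every step at the same level means a new technique only adds a new `method` value rather than a new nested structure, which is one way to read the "adaptable as technology evolves" point.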

13 Data management platform for HLPP
[Architecture diagram. Labels recovered from the figure:]
Participating labs: Experimental Database; MWS Data Client (GUI, Web Services, API); Clients for Labs
Data submission: experiment reports, data, references, files…; Submission Packages; Files Interchange, DBMS Dump; Data synchronization
Data Center: MWS Server; Data Center Server; Repository; Administrator; Data Center Mirror; Data exhibition
Project management: Project Management System; Project Management Database
Revision note on the slide: draw a box around each system and add Chinese labels.

14 Data process and integration

15 Overview of protein identification in CAL
99% peptide confidence
Total identifications: 515,728
Total non-redundant peptides: 19,509 (without considering PMF and Combine data)
Total non-redundant proteins: 5,454
Proteins with 2 or more peptides (or identified by both tandem MS and PMF containing data): 3,013
95% peptide confidence
Total identifications: 807,394
Total non-redundant peptides: 37,914 (without considering PMF and Combine data)
Total non-redundant proteins: 12,951
Proteins with 2 or more peptides (or identified by both tandem MS and PMF containing data): 6,788
Ratios compared (95% vs. 99%): 807,394/515,728; 37,914/19,509; 12,951/5,454
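The slide sets up three ratios between the 95% and 99% confidence results without giving their values. A small sketch that computes them from the counts on the slide follows; the fourth ratio (proteins with 2 or more peptides) is not on the slide and is added here only for comparison.

```python
# Counts copied from the slide, as (99% confidence, 95% confidence).
counts = {
    "total identifications": (515_728, 807_394),
    "non-redundant peptides": (19_509, 37_914),
    "non-redundant proteins": (5_454, 12_951),
    # Not among the slide's ratios; included for comparison only.
    "proteins with >= 2 peptides": (3_013, 6_788),
}


def fold_increase(at_99, at_95):
    """How many times larger the 95% result is than the 99% result."""
    return at_95 / at_99


ratios = {name: fold_increase(*pair) for name, pair in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Relaxing the peptide confidence threshold from 99% to 95% increases the protein count proportionally more than the identification count, which is the comparison the three ratios on the slide appear to be aimed at.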

16 dbLEP: A Database of Liver Proteome Expression Profile
The Liver Expression Profile database (dbLEP) aims to be an information center for the liver protein expression profile.
For each dataset, dbLEP provides all identification results, including the non-redundant identified proteins, all possible identified proteins, peptides, and their spectra.
Detailed annotation for each identified protein.
A large number of intact data resources, abundant links, and flexible search functions.

17 Structure of dbLEP

18

19 ProjectDetail

20 ProteinList

21 proteinDetail

22 proteinAnnotation

23 spectrumDetail

24 filterSearch

25 proteinSearch

26 FAQ

27 SiteMap

28 AboutUs

29 Data Sets of HLPP Expression Profile
Expression profile of Chinese fetal liver
Expression profile of French adult liver
Expression profile of Chinese adult liver
Expression profile of Chinese adult liver organelle
Expression profile of C57 mice liver organelle
Expression profile of Chinese adult liver cell
ORFs of Chinese adult liver

30 Availability of HLPP MS data
Exchange data with EBI, ISB and NIST
Data available at: BPRC dbLEP; EBI PRIDE DB; NIST Peptide Atlas

31 Problems arising
Data packed or unpacked
Large amount of small files
Data transfer
…
Data integrity
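One common way to handle both the "many small files" and the "data integrity" problems is to pack a directory into a single archive and ship a checksum manifest with it. The sketch below is only an illustration under that assumption, not the procedure actually used by HLPP; the function names and the file layout are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_with_manifest(src_dir, archive_path):
    """Pack all files under src_dir into one .tar.gz and checksum them.

    Returns (manifest, archive_checksum), where manifest maps each
    relative file path to its SHA-256 digest. archive_path should be
    outside src_dir so the archive does not try to include itself.
    """
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    manifest = {p.relative_to(src).as_posix(): sha256_of(p) for p in files}
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(src).as_posix())
    return manifest, sha256_of(archive_path)
```

The receiving side can recompute the archive checksum before unpacking and the per-file digests afterwards, so a damaged DVD or an interrupted transfer is detected rather than silently propagated.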

32 Data categories
Raw data
Peak list
Mascot file
Peptide list
Protein list
Data annotation
File size: Raw data < Peak list < Mascot file

33 Data transfer examples
Data to EBI (PRIDE): expression profile of human fetal liver (~40 GB), by DVD
Data to NIST (Peptide Atlas): expression profile of human fetal liver (~40 GB), by DVD
Data to ISB: expression profile of French adult liver (~80 GB), by DVD
Data to McGill University: 4 expression profiles (~430 GB), by USB disk

34 Data transfer example
Data to McGill University: ~430 GB (Mascot files)
Data transfer rate: max 1.25 Mbps (~156 kB/s); avg 888 kbps (~111 kB/s)
Time needed for data transfer: ~47 days
Internet connection at BPRC: 100 Mbps (by CNC/China Unicom)
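The ~47 days figure can be checked with a short calculation. The sketch below assumes binary units (1 GB = 1024 x 1024 kB), which is an assumption on my part; it happens to reproduce the slide's estimate at the average rate of 111 kB/s.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_gb, rate_kb_per_s):
    """Days needed to move size_gb at a sustained rate, using binary units."""
    size_kb = size_gb * 1024 * 1024   # assumption: 1 GB = 1024 * 1024 kB
    return size_kb / rate_kb_per_s / SECONDS_PER_DAY


days_at_avg = transfer_days(430, 111)       # average rate from the slide
days_at_max = transfer_days(430, 156.25)    # 1.25 Mbps = 156.25 kB/s

print(f"at average rate: {days_at_avg:.1f} days")
print(f"at maximum rate: {days_at_max:.1f} days")
```

Even at the maximum observed rate the transfer would take over a month, which is consistent with the choice to send the data on a USB disk.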

35 Exhibition of data analysis

36 Acknowledgements
Beijing Proteome Research Center: Prof. Fuchu He, Dr. Songfeng Wu, Dr. Dong Li, Dr. Lei Dou, Dr. Jiyang Zhang, Dr. Jianqi Li, Ling Tang (Engineer), Xiuhe Wang (Engineer), Wei Liu, Jie Ma, Prof. Xiaohong Qian, Prof. Dongsheng Zhao, Dr. Wantao Ying, Yufeng Wang, Xiaolei Wang, Yiqing Mao, Lin Li
Participating labs of CNHLPP: Beijing Genome Institute; Beijing Institute of Radiation Medicine; Central-South University; Fudan University; Hunan Normal University; Peking University; Shanghai Institute for Biological Sciences
Funds: National High Technology Research and Development Program of China (2006AA02A312); Chinese National Key Program of Basic Research (2006CB910803); Beijing Municipal Science and Technology Project (H )

37 Thanks!

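38 Basic information
Connected to the Internet in 1995. About 8,000 network points, about 2,500 networked computers, and about 350 servers and switches of various kinds. Backbone 1000M, desktop 100M.
Outbound links: 100M (Beijing Teletron Telecom Engineering Co., Ltd.) plus 20M (Hengchuan Tech, currently dedicated). Peak concurrent users: 800 to 900. Peak bandwidth is close to 100M (daily from 19:00 to 24:00, and all day on weekends and holidays). Daily traffic (upstream and downstream combined) is about 800 GB.
Security: firewall, behavior-management gateway, auditing, IDS, mail gateway, web gateway, network antivirus, MRTG traffic monitoring, antivirus wall for the office area, and proxy for the office area. The behavior-management gateway limits each user's online activity by group (P2P is restricted but not blocked), as well as connection counts, bandwidth, and so on.
Connected to the CSTNET CNGI project. The National Meteorological Information Center currently uses the international link provided by CSTNET (China Science and Technology Network) for real-time data transfer with NOAA (the US National Oceanic and Atmospheric Administration); it uses 60 to 80 Mb of bandwidth and the results are good. That cooperation was also initiated by the US side, which recommended that the National Meteorological Information Center use the GLORIAD link. At the same time, CSTNET had approached the national meteorological administration about cooperating on a next-generation Internet test network (CNGI), so the Meteorological Information Center was also connected to CSTNET's v4 network and uses that international link, with good results so far.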
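For a rough sense of scale, the reported daily traffic of about 800 GB can be converted into an average bit rate. The sketch below assumes decimal units (1 GB = 10^9 bytes) and simply spreads the traffic evenly over 24 hours; because the 800 GB figure combines upstream and downstream traffic, the result is not directly comparable with the capacity of the link in one direction.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mbps(bytes_per_day):
    """Average bit rate in Mbps if the traffic were spread evenly over a day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


daily_traffic_bytes = 800e9   # ~800 GB per day, assuming decimal gigabytes
avg_mbps = average_rate_mbps(daily_traffic_bytes)

print(f"average rate: {avg_mbps:.1f} Mbps")
```

An average in this range, next to a 100M outbound link that already peaks near capacity every evening, leaves little headroom for a sustained multi-hundred-gigabyte upload.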

39 Basic information
Internet access since 1995
~8,000 network points; ~2,500 computers with Internet access
Backbone: 1000M; desktop: 100M
ISP: 100M from BEIJING TELETRON TELECOM ENGINEERING CO.,LTD.; 20M, exclusive use, from Hengchuan Tech. CO.,LTD.
Access to the CSTNET CNGI project

40 Network topology

41 Network traffic on

42 Statistic of network traffic

43 Trend of network traffic

