Data sharing of HLPP Yunping Zhu State Key Laboratory of Proteomics


Data sharing of HLPP Yunping Zhu State Key Laboratory of Proteomics Beijing Proteome Research Center Beijing Institute of Radiation Medicine Beijing, 2009-10-19

Outline
Introduction of HLPP
Data sharing of HLPP
Summary

Introduction of proteomics and HLPP
Why proteomics? Same genome, different proteome: the black swallowtail larva and butterfly.

Milestones
Milestones of HGP: genetic map, physical map, sequence map
Milestones of HPP: Expression Profile (Proteome), Proteins Linkage Map (Interactome), Subcellular Localization Profile, Protein Modification Profile

Human Proteome Project
Initiatives: HPPP (plasma), HLPP (liver), HBPP (brain), HGPI (glyco), HAI (antibodies), PSI (standards), MRPP (models)
Participating countries shown: USA, China, Germany, Japan, Sweden, UK, Canada

Scientific objectives of the HUPO HLPP
1. Generation of a compiled expression profile of the liver (proteome): comprehensive analysis of human liver protein constituents in health and disease states.
2. Establishment of a subcellular localization profile: determination of protein localization and profiling of the subcellular proteome.
3. Networking of the liver protein linkage map (interactome): comprehensive analysis of liver protein-protein interactions and networks of the liver proteome.
4. Elucidation of the protein modification profile: systematic analysis of post-translational modifications of the liver proteome.
5. Bridging the Liver Proteome Project and the Plasma Proteome Project: parallel coordination of the two initiatives with respect to resources, technology, and knowledge databases, in order to discover biomarkers.
6. Construction of a knowledge database: integration and correlation of the human liver proteome with the liver transcriptome and the human genome.

New strategy of HPP (Nature, 452 (920), 2008)
After several years of argument and discussion, the community seems to have reached a consensus on a new strategy for the HPP. This strategy is essentially consistent with the overall approach we proposed for the HLPP: "two profiles, two maps, and three databases". The cost "is absolute peanuts when you consider the importance of mapping the building blocks of life" (Mathias Uhlen).

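Proteomics: global investigation of the biological system
Proteome: the complete protein map of a biological system such as a cell, tissue, or body fluid.
Topics: protein expression, modification, quantification, etc.
Methods: experimental observation plus background knowledge, collection, and integration. The basic research strategy of proteomics is to observe the biological system with experimental techniques such as mass spectrometry, electrophoresis, and protein chips, capturing protein expression, quantification, and modification, and then to integrate these observations with protein knowledge bases and databases to infer the state of the system, for example a disease state.
Mass spectrometry is the most important of these observation techniques, offering high throughput and high sensitivity. Analyzing MS data means inferring the state of the proteins present from the spectra, which is a key research topic in proteome bioinformatics.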

Technical routes employed for proteome analysis
Requirement: each protocol has to be performed by more than two labs, and more than two times in each lab.
Technical routes and their separation methods:
2DLC-ESI (Bottom-up): Digestion_SCXLC_RPLC
3DLC-ESI: SCX_Digestion_SCXLC_RPLC, SAX_Digestion_SCXLC_RPLC, SEC_Digestion_SCXLC_RPLC
2DE-MALDI (Top-down): 2DE_Digestion
1DE-LC-ESI: SAX_SDS_Digestion_RPLC, SDS_Digestion_RPLC
Mass spectrometry: ESI_Qq-TOF, ESI_ITMS, MALDI_TOFTOF
Searching algorithms: Mascot, Sequest
Labs: F. He & X. Qian (BPRC); P. Yang & F. He (FUDAN); Rong Zeng (SIBS); Siqi Liu (BGI)
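The replication rule above can be checked mechanically once runs are logged. This is a minimal sketch. It assumes each run of a protocol is recorded simply as the name of the lab that performed it. The function name and the run log are hypothetical. Reading "more than two" as "at least three" is also an interpretation, not part of the HLPP specification, so both thresholds are parameters.

```python
from collections import Counter
from typing import Iterable


def meets_replication(runs: Iterable[str], min_labs: int = 3,
                      min_runs_per_lab: int = 3) -> bool:
    """Check the replication rule for one protocol.

    `runs` holds one lab name per experimental run of the protocol. The
    defaults read "more than two" literally, i.e. at least three labs that
    each ran the protocol at least three times.
    """
    runs_per_lab = Counter(runs)
    qualifying_labs = [lab for lab, n in runs_per_lab.items()
                       if n >= min_runs_per_lab]
    return len(qualifying_labs) >= min_labs


# Hypothetical run log for one protocol, using the lab codes from the slide.
log = ["BPRC"] * 3 + ["FUDAN"] * 3 + ["SIBS"] * 3 + ["BGI"] * 2
print(meets_replication(log))  # True
```

If the intended reading is "at least two", call it with `min_labs=2, min_runs_per_lab=2`.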

The usage of MS data
Biological aspects: identification of peptides and proteins; PTMs, including their types and sites; quantification of peptides and proteins; …
Technical aspects: improvement of experimental strategies and instrument design; discovery of the physical and chemical factors in experiments, i.e., building models of how experimental data are produced (for example MS/MS spectrum prediction and ESI ionization models), all of which can be used in database-search identification.
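The "MS/MS spectrum prediction" mentioned above builds on a simpler step: computing the theoretical fragment-ion ladder of a peptide. Database-search engines compare such ladders against observed spectra. This is a minimal sketch that assumes unmodified peptides, singly charged b and y ions only, and no intensity model. Predicting intensities is the harder problem the slide refers to. The function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ladder(peptide: str):
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    Unmodified residues only; no neutral losses, no intensities.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ladder("PEPTIDE")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

A useful sanity check is that each complementary pair b_i + y_(n-i) equals the precursor's [M+2H] mass sum.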

Data management and sharing
HLPP Proteome Data Standard
Data management platform
Data process and integration
dbLEP: A Database of Liver Proteome Expression Profile
Problems arising

HLPP Proteome Data Standard
Sample Information
Protein extraction
Concentration Measurement
Separation: Gel-based (2DE, Gel1D-IEF, Gel1D-SDS); LC-based (RPLC, SCXLC, SAXLC, SEC, Trap, Desalting)
Digestion
Ion source: MALDI (Autoflex, Ultraflex, ABI 4700); ESI (Qstar, QTofMicro, LCQ, LTQ)
Mass spectrometry: TOF, TOFTOF, QTOF, ITMS
Peak list generation: ABI 4700, QStar, Autoflex, TurboSequest, Masslynx, FlexAnalysis
Peak list preprocessing: GPS peak list preprocess
Protein/peptide identification: Mascot, Sequest
Protein list generation: GPS, Biotools, Qstar, BuildSummary, DTASelect, Bioworks, AutoQuest
Design principles: modules mirror the experimental steps; clear structure; easy data reuse; easy user customization of data; minimal nesting; adaptable to evolving technology.
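The module list above maps naturally onto a flat record type with controlled vocabularies. This is a minimal sketch that assumes a Python representation. The class names, field names, and the `validate` helper are hypothetical, and only a subset of the modules is modeled. Keeping one field per experimental step, rather than deep nesting, follows the design principles listed on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Controlled vocabularies copied from the slide (not exhaustive).
SEPARATION_METHODS = {
    "Gel-based": {"2DE", "Gel1D-IEF", "Gel1D-SDS"},
    "LC-based": {"RPLC", "SCXLC", "SAXLC", "SEC", "Trap", "Desalting"},
}
ION_SOURCES = {"MALDI", "ESI"}
MS_TYPES = {"TOF", "TOFTOF", "QTOF", "ITMS"}
SEARCH_ENGINES = {"Mascot", "Sequest"}


@dataclass
class SeparationStep:
    category: str  # "Gel-based" or "LC-based"
    method: str    # e.g. "SCXLC"


@dataclass
class ExperimentRecord:
    """Flat record: one field per experimental module, in workflow order."""
    sample_information: str
    separation: List[SeparationStep] = field(default_factory=list)
    ion_source: Optional[str] = None
    mass_spectrometry: Optional[str] = None
    identification: Optional[str] = None  # search engine used


def validate(record: ExperimentRecord) -> List[str]:
    """Return human-readable problems; an empty list means the record is valid."""
    problems = []
    for step in record.separation:
        allowed = SEPARATION_METHODS.get(step.category)
        if allowed is None:
            problems.append(f"unknown separation category: {step.category}")
        elif step.method not in allowed:
            problems.append(f"unknown {step.category} method: {step.method}")
    checks = [
        ("ion source", record.ion_source, ION_SOURCES),
        ("mass spectrometry type", record.mass_spectrometry, MS_TYPES),
        ("search engine", record.identification, SEARCH_ENGINES),
    ]
    for label, value, vocabulary in checks:
        if value is not None and value not in vocabulary:
            problems.append(f"unknown {label}: {value}")
    return problems


record = ExperimentRecord(
    sample_information="Chinese adult liver",
    separation=[SeparationStep("LC-based", "SCXLC"),
                SeparationStep("LC-based", "RPLC")],
    ion_source="ESI",
    mass_spectrometry="QTOF",
    identification="Mascot",
)
print(validate(record))  # []
```

Adding a new instrument or search engine only extends a vocabulary set, which is one way to "adapt to evolving technology" without changing the record shape.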

Data management platform for HLPP
[Figure: architecture of the data management platform. Components shown: Data Center Server (Experimental Database, Project Management Database, Project Management System, Submission Packages Repository, MWS Server, data exhibition, project management, administrator); Data Center Mirror (data synchronization); Clients for Labs at the participating labs (MWS Data Client, …; GUI, Web Services, API; file interchange, DBMS dump). Data submission covers experiment reports, data, references, files, …]
Revision note: frame each system in a box and add Chinese labels.

Data process and integration

Overview of protein identification in CAL
Peptide confidence                      99%        95%        Ratio (95%/99%)
Total identifications                   515,728    807,394    1.565542
Total non-redundant peptides (1)        19,509     37,914     1.943411
Total non-redundant proteins            5,454      12,951     2.374587
Proteins with 2 or more peptides (2)    3,013      6,788
(1) Without considering PMF and Combine data.
(2) Or identified by both tandem MS and PMF (PMF-containing data).
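The counts above come from one pass over the identification results at each peptide-confidence threshold. This is a minimal sketch that assumes each identification is a (peptide, protein, confidence) tuple. It deliberately ignores PMF data and shared-peptide protein grouping, so it simplifies whatever pipeline produced the CAL numbers. The names are hypothetical. Only the six counts fed to the ratio loop come from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical identification record: (peptide sequence, protein accession, confidence 0-1)
Identification = Tuple[str, str, float]


def summarize(ids: Iterable[Identification], min_confidence: float) -> Dict[str, int]:
    """Count identifications, non-redundant peptides and proteins above a threshold."""
    total = 0
    peptides = set()
    peptides_per_protein = defaultdict(set)
    for peptide, protein, confidence in ids:
        if confidence < min_confidence:
            continue
        total += 1
        peptides.add(peptide)
        peptides_per_protein[protein].add(peptide)
    return {
        "total_identifications": total,
        "non_redundant_peptides": len(peptides),
        "non_redundant_proteins": len(peptides_per_protein),
        "proteins_with_2plus_peptides": sum(
            1 for peps in peptides_per_protein.values() if len(peps) >= 2),
    }


# Ratios between the two thresholds, using the CAL numbers from the slide.
cal_99 = {"total_identifications": 515728, "non_redundant_peptides": 19509,
          "non_redundant_proteins": 5454}
cal_95 = {"total_identifications": 807394, "non_redundant_peptides": 37914,
          "non_redundant_proteins": 12951}
for key in cal_99:
    print(key, round(cal_95[key] / cal_99[key], 6))
```

The ratios grow from identifications to peptides to proteins. Relaxing the threshold therefore adds proportionally more distinct proteins than raw identifications.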

dbLEP: A Database of Liver Proteome Expression Profile
The Liver Expression Profile database (dbLEP) aims to be an information center for liver protein expression profiles. For each dataset, dbLEP provides all identification results, including the non-redundant identified proteins, all possible identified proteins, and the peptides and their spectra.
Detailed annotation for each identified protein
Large number of intact data resources, abundant links, and flexible search functions
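The slide distinguishes "non-redundant identified proteins" from "all possible identified proteins" but does not say how dbLEP derives the former. One common approach is parsimony: keep the smallest set of proteins that explains every identified peptide. It is shown here as a swapped-in illustration rather than dbLEP's actual procedure. This is a minimal greedy sketch, and the function name and data are hypothetical.

```python
from typing import Dict, List, Set


def minimal_protein_list(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover over identified peptides.

    Repeatedly picks the protein that explains the most still-unexplained
    peptides; ties go to the alphabetically first accession so the result
    is deterministic. Proteins whose peptides are all explained by earlier
    picks are treated as redundant.
    """
    unexplained = set().union(*protein_to_peptides.values())
    candidates = sorted(protein_to_peptides)
    chosen = []
    while unexplained:
        best = max(candidates,
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        gain = protein_to_peptides[best] & unexplained
        if not gain:  # defensive; cannot happen since unexplained is the union
            break
        chosen.append(best)
        unexplained -= gain
    return chosen


# Hypothetical "all possible identified proteins" for a handful of peptides.
all_possible = {
    "P1": {"AAK", "LLR", "VVK"},
    "P2": {"AAK", "LLR"},       # subset of P1, so redundant
    "P3": {"GGR"},
}
print(minimal_protein_list(all_possible))  # ['P1', 'P3']
```

Greedy set cover is not guaranteed to be minimal in every case, but it is simple and deterministic. Tools such as DTASelect and BuildSummary, named earlier, apply their own grouping rules.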

Structure of dbLEP

http://dblep.hupo.org.cn

ProjectDetail

ProteinList

proteinDetail

proteinAnnotation

spectrumDetail

filterSearch

proteinSearch

FAQ

SiteMap

AboutUs

Data Sets of HLPP Expression Profile
Expression profile of Chinese fetal liver
Expression profile of French adult liver
Expression profile of Chinese adult liver
Expression profile of Chinese adult liver organelle
Expression profile of C57 mice liver organelle
Expression profile of Chinese adult liver cell
ORFs of Chinese adult liver

Availability of HLPP MS data Exchange data with EBI, ISB and NIST Data available at: BPRC dblep EBI Pride DB NIST peptide atlas

Problems arising
Data packed or unpacked
Large numbers of small files
Data transfer
…
Data integrity
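One common answer to the "many small files" and "data integrity" problems is to pack the files into a single archive and ship a checksum list with it. The receiver can then confirm that nothing was lost or corrupted on DVD, USB disk, or network transfer. This is a minimal sketch using Python's standard `tarfile` and `hashlib` modules. The function names and the `MANIFEST.sha256` member name are hypothetical and not part of the HLPP platform.

```python
import hashlib
import io
import tarfile
from pathlib import Path
from typing import BinaryIO, Dict

MANIFEST_NAME = "MANIFEST.sha256"  # hypothetical name for the embedded checksum list


def _sha256(stream: BinaryIO) -> str:
    """Hash a binary stream in 1 MiB chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        digest.update(chunk)
    return digest.hexdigest()


def pack_with_manifest(src_dir: Path, archive: Path) -> Dict[str, str]:
    """Bundle all files under src_dir into one .tar.gz plus a SHA-256 manifest.

    Returns {relative_path: sha256}. The archive must not be inside src_dir.
    """
    manifest = {}
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(p for p in src_dir.rglob("*") if p.is_file()):
            rel = path.relative_to(src_dir).as_posix()
            with path.open("rb") as f:
                manifest[rel] = _sha256(f)
            tar.add(path, arcname=rel)
        text = "".join(f"{h}  {rel}\n" for rel, h in manifest.items()).encode()
        info = tarfile.TarInfo(MANIFEST_NAME)
        info.size = len(text)
        tar.addfile(info, io.BytesIO(text))
    return manifest


def verify(archive: Path) -> bool:
    """After transfer, re-hash every listed member and compare with the manifest."""
    with tarfile.open(archive, "r:gz") as tar:
        listing = tar.extractfile(MANIFEST_NAME).read().decode()
        for line in listing.splitlines():
            expected, rel = line.split("  ", 1)
            try:
                member = tar.extractfile(rel)
            except KeyError:
                return False  # listed in the manifest but missing from the archive
            if member is None or _sha256(member) != expected:
                return False
    return True
```

The manifest uses the same "digest, two spaces, path" layout as `sha256sum`, so it can also be checked by hand after extraction.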

Data categories
Raw data, peak list, Mascot file, peptide list, protein list, data annotation
File size: raw data < peak list < Mascot file

Data transfer examples
Data to EBI (PRIDE): expression profile of human fetal liver (~40 GB), by DVD
Data to NIST (Peptide Atlas): expression profile of human fetal liver (~40 GB), by DVD
Data to ISB: expression profile of French adult liver (~80 GB), by DVD
Data to McGill University: 4 expression profiles (~430 GB), by USB disk

Data transfer example
Data to McGill University: ~430 GB (Mascot files)
Data transfer rate: max 1.25 Mbps (~156 kB/s); average 888 kbps (~111 kB/s)
Time needed for the transfer: ~47 days
Internet connection at BPRC: 100 Mbps (CNC/China Unicom)
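The ~47-day figure can be reproduced from the average rate. This is a minimal sketch of the arithmetic. Its only assumption is the unit convention, since the slide does not say whether GB and kB are decimal or binary units.

```python
SECONDS_PER_DAY = 86_400
KiB, GiB = 1024, 1024 ** 3


def transfer_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# 888 kbit/s average is 888 / 8 = 111 kB/s.
avg_rate_kB = 888 / 8

# Reading GB and kB as binary units (GiB, KiB) reproduces the slide's ~47 days.
binary_days = transfer_days(430 * GiB, avg_rate_kB * KiB)   # about 47.0
# Reading them as decimal units gives a somewhat shorter estimate.
decimal_days = transfer_days(430e9, avg_rate_kB * 1e3)       # about 44.8
# At the full nominal 100 Mbit/s of the BPRC link the same data would need:
full_link_hours = 430 * GiB / (100e6 / 8) / 3600             # about 10.3 hours

print(f"{binary_days:.1f} days, {decimal_days:.1f} days, {full_link_hours:.1f} h")
```

The gap between about 47 days and about 10 hours is why shipping a USB disk was the practical choice for this dataset.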

Exhibition of data analysis

Acknowledgements
Beijing Proteome Research Center: Prof. Fuchu He, Dr. Songfeng Wu, Dr. Dong Li, Dr. Lei Dou, Dr. Jiyang Zhang, Dr. Jianqi Li, Ling Tang (Engineer), Xiuhe Wang (Engineer), Wei Liu, Jie Ma, Prof. Xiaohong Qian, Prof. Dongsheng Zhao, Dr. Wantao Ying, Yufeng Wang, Xiaolei Wang, Yiqing Mao, Lin Li
Participating labs of CNHLPP: Beijing Genome Institute, Beijing Institute of Radiation Medicine, Central-South University, Fudan University, Hunan Normal University, Peking University, Shanghai Institute for Biological Sciences
Funds: National High Technology Research and Development Program of China (2006AA02A312); Chinese National Key Program of Basic Research (2006CB910803); Beijing Municipal Science and Technology Project (H030230280590)

Thanks! zhuyp@hupo.org.cn

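Basic information
Connected to the Internet in 1995; about 8,000 information points, about 2,500 networked computers, and about 350 servers and switches of various kinds; 1000M backbone, 100M to the desktop.
Egress: 100M (Beijing Teletron, BEIJING TELETRON TELECOM ENGINEERING CO.,LTD.) + 20M (Hengchuan Tech, currently for dedicated use).
Peak concurrent online users: 800-900; peak bandwidth close to 100M (daily 19:00-24:00, and all day on weekends and holidays). Daily traffic (uplink and downlink combined): about 800 GB.
Security: firewall, behavior-management gateway, auditing, IDS, mail gateway, web gateway, network antivirus, MRTG traffic monitoring, an antivirus wall for the office area, and a proxy for the office area. The behavior-management gateway limits each user's Internet activity by group (P2P is restricted but not shut off), as well as the number of connections and bandwidth.
Connected to the CNGI project of the China Science and Technology Network (CSTNET).
The National Meteorological Information Center currently uses the international egress provided by CSTNET to exchange real-time data with NOAA (the US National Oceanic and Atmospheric Administration), sustaining 60-80 Mbps with good results. This cooperation was also initiated by the US side, which introduced the National Meteorological Information Center to the GLORIAD link. At that time CSTNET happened to approach the China Meteorological Administration about cooperating on a next-generation Internet test network (CNGI), so the Meteorological Information Center was also connected to CSTNET's IPv4 network and uses that international egress, which is working well.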
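For scale, the daily volume can be converted into an average bit rate. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes). The 800 GB figure combines both directions, so the result is an aggregate average, not a per-direction load on the 100M + 20M egress.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(bytes_per_day: float) -> float:
    """Average bit rate (Mbit/s, decimal units) implied by a daily traffic volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


avg = average_mbps(800e9)  # ~800 GB/day, uplink + downlink combined
print(f"{avg:.1f} Mbit/s")  # 74.1 Mbit/s
```

Because traffic concentrates in the 19:00-24:00 window, the evening load sits well above this all-day average, which is consistent with the near-100M peaks reported.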

Basic information
Internet access: since 1995
~8000 info points; ~2500 with Internet access
Backbone: 1000M; desktop: 100M
ISP: 100M, BEIJING TELETRON TELECOM ENGINEERING CO.,LTD.; 20M, exclusive use, Hengchuan Tech. CO.,LTD.
Access to CERN/CNGI project

Network topology

Network traffic on 20091026

Statistic of network traffic

Trend of network traffic