Improving peptide identification for tandem mass spectrometry by incorporating translatomics information Chuan-Le Xiao (肖传乐) State Key Laboratory of Ophthalmology, Sun Yat-sen University E-mail: xiaochuanle@126.com
Background 1 3 steps in protein identification:
Background 1 J. Proteome Res. 2014, 13, 4113−4119
Background 1 Translatomics (Ribosome profiling, Ribo-seq)
Background 1 Sequencing reads: 50-150 bp. Sequencing currently produces millions of short reads, and each read must be located accurately on the genome. Mapping difficulties: the genome sequence is very long, and the number of reads to align is large. Requirements for computational methods: • speed and accuracy • acceptable memory usage
Background 1 FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads. Nucleic Acids Res. 2012. FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications. PLoS ONE. High sensitivity! High speed! High accuracy!
Background 1 Relationship between the amount of mRNA being transcribed and protein abundance in A549 cells
Background 1 [Spectrum figure: m/z vs. peak intensity] Problem: protein identification efficiency is low (about 10-30%)
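Background 1 ProVerB: high identification power and high accuracy, broad applicability, and high reliability
Background 1 1) Statistical analysis of amino acid pairs and peak intensity (b- and y-ion intensity matrices), i = A, C, …; j = A, C, …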
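Background 1 2) Generating the theoretical spectrum. Rules for generating theoretical peaks: 1. b and y fragment ions are always generated. 2. Fragment ions containing S, T, E, or D generate b-H2O and y-H2O. 3. Fragment ions containing R, K, Q, or N generate b-NH3 and y-NH3. 4. Precursor charge greater than 1 and containing S, H, or K: doubly charged ions are generated.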
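The peak generation rules on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not ProVerB's actual code: the function name `theoretical_ions` is hypothetical, the residue masses are standard monoisotopic values, and rule 4 is read here as "the precursor charge is greater than 1 and the fragment itself contains S, H, or K", which is an assumption. Doubly charged variants are emitted only for the plain b and y ions, also an assumption.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549


def theoretical_ions(peptide, precursor_charge):
    """Return {ion label: m/z} following the four rules on the slide."""
    ions = {}
    n = len(peptide)
    for i in range(1, n):
        fragments = (("b", peptide[:i]), ("y", peptide[n - i:]))
        for series, frag in fragments:
            neutral = sum(RESIDUE_MASS[aa] for aa in frag)
            if series == "y":
                neutral += WATER  # y ions keep the C-terminal water
            label = f"{series}{i}"
            residues = set(frag)
            # Rule 1: b and y ions are always generated (singly charged).
            ions[label] = neutral + PROTON
            # Rule 2: water loss if the fragment contains S, T, E or D.
            if residues & set("STED"):
                ions[label + "-H2O"] = neutral - WATER + PROTON
            # Rule 3: ammonia loss if the fragment contains R, K, Q or N.
            if residues & set("RKQN"):
                ions[label + "-NH3"] = neutral - AMMONIA + PROTON
            # Rule 4 (assumed reading): doubly charged b/y ion.
            if precursor_charge > 1 and residues & set("SHK"):
                ions[label + "++"] = (neutral + 2 * PROTON) / 2
    return ions
```

For example, `theoretical_ions("SAK", 2)` yields b1, b1-H2O, and b1++ for the N-terminal serine, and y1, y1-NH3, and y1++ for the C-terminal lysine.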
Background 1 3) Scoring model (example). b, y ion match scoring model: P0=0.06. Consecutive match scoring model (b3 and b4, b4 and b5): r=0.09083. Total score and background removal.
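A binomial match score of the kind named on this slide can be sketched as follows. The sketch assumes that P0=0.06 is the probability that a single theoretical peak matches by chance, and that the score is -10·lg of the binomial tail probability of seeing at least the observed number of matches. The function names are hypothetical, and the consecutive match model and background removal are not covered.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_theoretical, n_matched, p0=0.06):
    """-10*lg(P) where P is the chance of >= n_matched random matches.

    p0=0.06 is the value shown on the slide; treating it as the
    per-peak random match probability is an assumption.
    """
    p = binomial_tail(n_theoretical, n_matched, p0)
    # Guard against log10(0) if the tail probability underflows.
    return -10 * log10(max(p, 1e-300))
```

With 20 theoretical peaks, `match_score(20, 10)` is larger than `match_score(20, 5)`, since ten chance matches are far less likely than five.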
Background 1 IPomics: We propose a novel strategy and develop a software system called IPomics for peptide identification that incorporates prior abundance information from translatomics
Materials and method 2 1. Data resources: five paired Ribo-seq and MS/MS datasets
Materials and method 2 2. Analysis pipeline (ProVerB, FANSe2) The analysis pipeline of IPomics is made up of five key steps
RESULTS 3 1. The prior information of FPKM for protein identification 2. The incorporation of translatomic FPKM in scoring model 3. Comparison of IPomics with Mascot, OMSSA, X!Tandem and pFind 4. Computational validation with SILAC and Tyrosine phosphorylation datasets
RESULTS 3 1. The prior information of FPKM for protein identification
RESULTS 3 We established a quantification model that transforms translatomic FPKM into the corresponding probability of protein identification
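The slide does not give the form of the quantification model. Purely as an illustration of the idea, one simple way to turn FPKM into an identification probability is to bin proteins by order of magnitude of FPKM and take the fraction identified in each bin. The function name, the log10 binning, and the toy values in the usage note are all assumptions, not the IPomics model.

```python
from collections import defaultdict
from math import floor, log10


def identification_probability_by_bin(fpkm_by_protein, identified):
    """Empirical P(identified) per floor(log10(FPKM)) bin.

    fpkm_by_protein: dict mapping protein ID -> translatomic FPKM
    identified: set of protein IDs identified by MS/MS
    Proteins with FPKM <= 0 are skipped (log10 is undefined).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for protein, fpkm in fpkm_by_protein.items():
        if fpkm <= 0:
            continue
        bin_index = floor(log10(fpkm))
        totals[bin_index] += 1
        if protein in identified:
            hits[bin_index] += 1
    return {b: hits[b] / totals[b] for b in totals}
```

On made-up toy input such as `{"P1": 0.5, "P2": 5, "P3": 7, "P4": 50, "P5": 80}` with `{"P3", "P4", "P5"}` identified, the bins -1, 0, and 1 get probabilities 0.0, 0.5, and 1.0.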
RESULTS 3 2. The incorporation of translatomic FPKM in scoring model. The PF derived from the prior FPKM information was incorporated into the binomial scoring model in two ways: simple fragment matching and consecutive ion matching. We evaluated the distributions of peptide scores under the two scoring methods, -10·lg(P) and -10·lg(Psimple)
RESULTS 3 3. Comparison of IPomics with Mascot, OMSSA, X!Tandem and pFind
Comparison_peptides 3
Comparison_high-confidence peptides 3 Table 2. Fractions of high confidence peptides of the five algorithms
Type: Peptide (columns are datasets)
Algorithm  Human          Youngbrain     Oldbrain       Youngliver     Oldliver
Total      37300          43357          42154          34667          34675
Mascot     29042 (77.9%)  36691 (84.6%)  35917 (85.2%)  29635 (85.5%)  29763 (85.8%)
OMSSA      24780 (66.4%)  36803 (84.9%)  35880 (85.1%)  29682 (85.6%)  29838 (86.1%)
X!Tandem   30862 (82.7%)  39161 (90.3%)  38202 (90.6%)  31636 (91.3%)  31647 (91.3%)
pFind      33879 (90.8%)  36006 (83.1%)  35240 (83.6%)  30124 (86.9%)  30379 (87.6%)
IPomics    36444 (97.7%)  42748 (98.6%)  41575 (98.6%)  34103 (98.4%)  34084 (98.3%)
RESULTS 3 4. Computational validation with SILAC and Tyrosine phosphorylation datasets
RESULTS 3 4. Computational validation with Tyrosine phosphorylation datasets Table S8. The identified spectra and peptides in tyrosine dataset Of the 304 tyrosine sites identified by IPomics, 175 were also identified by both Mascot and OMSSA, and the high-confidence tyrosine peptides identified by at least two engines accounted for as much as 85.5% in IPomics (Fig. 7). The remaining 14.5% (44) of tyrosine phosphorylation peptides were identified uniquely by IPomics, with no overlap. However, all of these peptides with tyrosine phosphorylation sites have been experimentally verified in PhosphoSitePlus
Thanks!