Authors: Saumil Mehta and Deendayal Dinakarpandian


1 ConsDiff: an algorithm for the detection of conserved differences between protein sequences
Authors: Saumil Mehta and Deendayal Dinakarpandian
Source: Data & Knowledge Engineering, vol. 53, pp. 31-43, 2005
Speaker: Shu-Fen Chiou (邱淑芬)
Date: 2005/02/17

2 Outline
Introduction
ConsDiff
Implementation
Conclusion
Comment

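3 Amino acid
The center of the molecular structure is a carbon atom, called the α carbon.
Attached to the α carbon are an amino group and an acid group (hence the name amino acid).
A hydrogen atom and a side group (R) are also attached to the α carbon.
The properties of each amino acid depend on its R group; the different R groups give the twenty amino acids.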

4 Amino acids
Amino acid       Three-letter symbol   One-letter symbol
Alanine          Ala                   A
Arginine         Arg                   R
Asparagine       Asn                   N
Aspartic acid    Asp                   D
Asn + Asp        Asx                   B
Cysteine         Cys                   C
Glutamine        Gln                   Q
Glutamic acid    Glu                   E
Gln + Glu        Glx                   Z
Glycine          Gly                   G
Histidine        His                   H
Isoleucine       Ile                   I
Leucine          Leu                   L
Lysine           Lys                   K
Methionine       Met                   M
Phenylalanine    Phe                   F
Proline          Pro                   P
Serine           Ser                   S
Threonine        Thr                   T
Tryptophan       Trp                   W
Tyrosine         Tyr                   Y
Valine           Val                   V

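5 Amino acid
Many amino acids link together to form a protein.
Each link is formed by a dehydration reaction (loss of H2O).
Each amino acid unit in the chain is called a residue.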

6 Protein
Protein family
Protein sequence (FASTA format)
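Since the slide only names the FASTA format, a minimal sketch of reading it may help. It assumes the common layout in which a header line starts with ">" and the following lines hold the sequence; the function name `parse_fasta` and the sample records are hypothetical and do not come from the presentation.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence.

    Assumption: each record starts with a '>' header line and the
    sequence may be wrapped over several following lines.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input for illustration only
example = ">seq1 example protein\nMKT\nAIL\n>seq2\nMKV\n"
sequences = parse_fasta(example)
```

Here `sequences` maps each header (without the leading ">") to the joined sequence string.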

7 Multiple Sequence Alignment
e.g.: ClustalW

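8 Blosum
The matrix records, for sequence alignment, how likely two residues are to be aligned with each other. Once the matrix is defined, it can be used to align similar residues together as far as possible and so reach the best alignment.
The Blosum matrix is computed from the data in the Blocks database. For example, the Blosum 62 matrix is derived from sequence blocks in the Blocks database clustered at a 62% identity threshold.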
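How a substitution matrix of this kind is used to score an alignment can be sketched as follows. This is only an illustration: the matrix `TOY_MATRIX` holds a handful of made-up example scores rather than a real Blosum table, and the function name `score_alignment` and the gap penalty of -4 are assumptions.

```python
# Toy symmetric substitution scores (illustrative values, NOT a real Blosum table).
# Keys are frozensets so that (a, b) and (b, a) share one entry.
TOY_MATRIX = {
    frozenset("L"): 4,
    frozenset("I"): 4,
    frozenset("D"): 6,
    frozenset("LI"): 2,
    frozenset("LD"): -4,
    frozenset("ID"): -3,
}


def score_alignment(seq_a, seq_b, matrix=TOY_MATRIX, gap_penalty=-4):
    """Sum the substitution scores over the columns of two aligned sequences.

    Assumption: both sequences are already aligned to the same length,
    with '-' marking a gap; a column with a gap costs gap_penalty.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            total += gap_penalty
        else:
            total += matrix[frozenset((a, b))]
    return total


print(score_alignment("LID", "LLD"))  # 4 + 2 + 6 = 12
print(score_alignment("L-D", "LID"))  # 4 - 4 + 6 = 6
```

Similar residues (L and I in this toy table) score higher than dissimilar ones (L and D), which is what lets an alignment program favor placing them in the same column.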

9 Blosum

10

11 Problem
Two protein subsets A and B are given, together with their residues.
The residues that form conserved differences between A and B are candidate residues responsible for the difference in the property of interest.
The problem is to find these residues in A and in B.

12 ConsDiff
MIS (minimum internal score)
MES (maximum external score)
CDS (conserved difference score) = max(MIS(A), MIS(B)) - MES
Sensitivity (S): default = 0; determines the minimum value of the CDS that is considered significant
Conserved difference
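The CDS formula on this slide can be sketched for a single alignment column as below. The slide gives only the names of MIS and MES, so their definitions here are assumptions: MIS is taken as the lowest substitution score between any two residues inside one subset, and MES as the highest score between a residue of A and a residue of B. The scoring function `toy_score` is a made-up stand-in for a Blosum matrix, the strict comparison against S is also an assumption, and columns containing a gap are skipped, following the speaker's comment.

```python
from itertools import combinations, product

HYDROPHOBIC = set("ILVM")
ACIDIC = set("DE")


def toy_score(a, b):
    """Toy substitution score (hypothetical values, NOT a Blosum matrix)."""
    if a == b:
        return 4
    for group in (HYDROPHOBIC, ACIDIC):
        if a in group and b in group:
            return 2
    return -3


def min_internal_score(column, score=toy_score):
    """Assumed MIS: lowest score of any residue pair within one subset."""
    if len(column) == 1:
        return score(column[0], column[0])
    return min(score(a, b) for a, b in combinations(column, 2))


def max_external_score(col_a, col_b, score=toy_score):
    """Assumed MES: highest score of any pair taken across the two subsets."""
    return max(score(a, b) for a, b in product(col_a, col_b))


def conserved_difference_score(col_a, col_b, score=toy_score):
    """CDS = max(MIS(A), MIS(B)) - MES, or None if the column has a gap."""
    if "-" in col_a or "-" in col_b:
        return None
    mis = max(min_internal_score(col_a, score), min_internal_score(col_b, score))
    return mis - max_external_score(col_a, col_b, score)


def is_conserved_difference(col_a, col_b, sensitivity=0, score=toy_score):
    """Flag the column when CDS exceeds the sensitivity S (strictness assumed)."""
    cds = conserved_difference_score(col_a, col_b, score)
    return cds is not None and cds > sensitivity


# One column of subset A (L, L, I) against one column of subset B (D, D, E)
print(conserved_difference_score("LLI", "DDE"))  # 2 - (-3) = 5
print(is_conserved_difference("LLL", "LLL"))     # False: CDS is 0
```

Raising `sensitivity` above the default of 0 keeps only columns where the within-subset similarity exceeds the between-subset similarity by a wider margin.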

13 Algorithmic complexity

14 Implementation

15 Implementation

16 Conclusion
The paper presents an algorithm and a prototype implementation for the automated discovery of conserved differences between two sets of protein sequences.

17 Comment
Identical sequences are also not considered.
Whenever a gap (-) is found, that position is not considered.

