Computer Vision Chapter 4


1 Computer Vision Chapter 4
Statistical Pattern Recognition. Tsun-An Hsieh, Dr. Fuh Chiou Shann. Notes: statistical pattern recognition means letting a computer interpret and automatically process patterns by statistical methods; for example, a computer that can recognize the digits 0 through 9 will, on seeing a digit, identify it and assign it to one of the classes 0 through 9. Digital Camera and Computer Vision Laboratory, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

2 Summary Introduction. Bayesian approach. Maximin decision rule.
Misidentification and false-alarm error rates. Nearest neighbor rule. Construction of decision trees. Estimation of decision rule error. Neural networks.

3 Introduction
Notes: a data set whose true categories are known (the training set) is first used to give the computer the ability to recognize and classify units. Assigning an observed pattern to a category is called an assignment.

4 Introduction (cont.) Units: image regions and projected segments.
Each unit has an associated measurement vector, e.g. { circular arc, a hole, ... }. A decision rule is used to assign each unit to the optimal class or category. Notes: a unit carries a (possibly multidimensional) vector of measurements, and the decision rule is what maps measurements to an assignment: it fixes the features used for classification, takes the measurements those features require, and assigns the unit accordingly. For instance, to recognize whether a figure is a circle, one might use features such as "the segment is a circular arc" or "the segments enclose a hole," which require measurements of how the figure's points are distributed. Training on a set whose true categories are known amounts to choosing features and constructing the best decision rule under known, correct classifications.

5 Introduction (cont.) Feature selection and extraction techniques.
Decision rule construction techniques. Techniques for estimating decision rule error. Notes: feature selection picks the features that best support assignment of units whose true category is unknown, and reduces the dimensionality of the measurements. For example, to distinguish the digits 0 and 1 (the slide's "has hole" vs. "has no hole" diagram), the single feature "has a hole" suffices: a unit with a hole is assigned to class 0, one without to class 1, and no measurements of size or brightness are needed. Decision rule construction decides how the quantified feature measurements determine the assignment of a unit. Since assignments are almost never 100% correct, the rule's error must be estimated and the rule with minimum error / maximum gain chosen.

6 Simple Pattern Discrimination
Also called the pattern identification process: a unit is observed or measured, and a category assignment is made that names or classifies the unit as a type of object. The category assignment is made only on the observed measurements (the pattern).

7 Simple Pattern Discrimination (cont.)
a: assigned category, from a set of categories C. t: true category identification, from C. d: observed measurement, from a set of measurements D. (t, a, d): the event that the observed unit truly belongs to category t, is assigned to category a, and has measurement d. P(t, a, d): probability of the event (t, a, d). Notes: for example, if a unit that is really the digit 6 is assigned to class 8, then t = 6 and a = 8.

8 Economic Gain Matrix e(t, a): economic gain/utility when the true category is t and the assigned category is a; a mechanism to evaluate a decision rule. Notes: the identity gain matrix has 1 on the diagonal (where t and a are the same category, i.e. the unit is assigned correctly) and 0 elsewhere.

9 An Instance Inspection of jet-engine turbine blades for cracks.
Notes: when the true state is "crack" and the detected state is "crack," the economic gain is -$500 (the cost of replacing the blade) plus -$10 (the cost of inspection).

10 Another Instance P(g, g): probability of true good, assigned good;
P(g, b): probability of true good, assigned bad; and so on. e(g, g): economic consequence of the event (g, g). A positive e is a profit; a negative e is a loss.

11 Another Instance (cont.)
The probabilities of every combination of true category and assigned category.

12 Another Instance (cont.)
The economic gain for every combination of true category and assigned category.

13 Another Instance (cont.)
Fraction of good objects manufactured: P(g) = P(g, g) + P(g, b). Fraction of bad objects manufactured: P(b) = P(b, g) + P(b, b). Expected profit per object: E = Σ over (t, a) of P(t, a) e(t, a). Notes: P(g), the probability that the true category is good, is the yield (with only the two categories g and b, P(x) can also be read as a production rate). The expected gain sums, over every true/assigned combination, the probability of that event times its economic gain.
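A minimal Python sketch of this formula; the probabilities and gains below are illustrative placeholders, not values given on this slide:

```python
# Expected profit per object: E = sum over (t, a) of P(t, a) * e(t, a).
P = {("g", "g"): 0.76, ("g", "b"): 0.19,    # true good
     ("b", "g"): 0.005, ("b", "b"): 0.045}  # true bad
e = {("g", "g"): 2000, ("g", "b"): -100,
     ("b", "g"): -10000, ("b", "b"): -100}

E = sum(P[key] * e[key] for key in P)
print(E)  # 1446.5
```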

14 Conditional Probability
P(b|g): false-alarm rate — the unit is truly good but is assigned to bad (a good one taken for bad). P(g|b): misdetection rate — the unit is truly bad but is assigned to good (a bad one taken for good).

15 Conditional Probability (cont.)
Another formula for the expected profit per object: E = P(g|g)P(g)e(g,g) + P(b|g)P(g)e(g,b) + P(g|b)P(b)e(b,g) + P(b|b)P(b)e(b,b). Notes: starting from the expected-value formula of slide 13, replace each P(x, y) by P(y|x)P(x) using the conditional-probability formula, collect the terms sharing the same P(x), and finally express everything through P(b|g) and P(g|b), writing P(b) as 1 - P(g).

16 Example 4.1
Table 4.4: Machine performance (P(g) = 0.95, P(b) = 0.05)
             Detected Good    Detected Bad
True Good    P(g|g) = 0.8     P(b|g) = 0.2
True Bad     P(g|b) = 0.1     P(b|b) = 0.9
Table 4.5: Economic consequence
             Detected Good        Detected Bad
True Good    e(g, g) = $2000      e(g, b) = -$100
True Bad     e(b, g) = -$10,000   e(b, b) = -$100
E = $1446.50

17 Example 4.1 (cont.) Notes: given (misdetection rate P(g|b), false-alarm rate P(b|g)) = (0.1, 0.2), apply the expected-value formula of slide 15, with P(g|g) = 1 - P(b|g) = 0.8 and P(b|b) = 1 - P(g|b) = 0.9 as in the table of slide 16.

18 Example 4.2
Table 4.6: Machine performance (P(g) = 0.95, P(b) = 0.05)
             Detected Good    Detected Bad
True Good    P(g|g) = 0.85    P(b|g) = 0.15
True Bad     P(g|b) = 0.12    P(b|b) = 0.88
Table 4.7: Economic consequence
             Detected Good        Detected Bad
True Good    e(g, g) = $2000      e(g, b) = -$100
True Bad     e(b, g) = -$10,000   e(b, b) = -$100
E = $1536.35

19 Example 4.2 (cont.) Notes: unlike Example 4.1, here (misdetection rate P(g|b), false-alarm rate P(b|g)) = (0.12, 0.15), i.e. the strictness (inspection quality) differs. P(g|b) = 0.12 means a higher probability of taking a bad one for good, i.e. lower strictness. The pair of error rates changes the expected gain E; in this example E is higher than in Example 4.1.
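Both examples can be checked with a short script. This sketch assumes the gain table reads e(b, g) = -$10,000 and uses the rates quoted above:

```python
# Expected profit from conditional rates and the economic-consequence
# table: E = sum over (t, a) of P(a|t) P(t) e(t, a).
def expected_profit(p_g, cond, gain):
    prior = {"g": p_g, "b": 1.0 - p_g}
    return sum(cond[t][a] * prior[t] * gain[t][a]
               for t in ("g", "b") for a in ("g", "b"))

gain = {"g": {"g": 2000, "b": -100}, "b": {"g": -10000, "b": -100}}
ex41 = {"g": {"g": 0.80, "b": 0.20}, "b": {"g": 0.10, "b": 0.90}}
ex42 = {"g": {"g": 0.85, "b": 0.15}, "b": {"g": 0.12, "b": 0.88}}

print(expected_profit(0.95, ex41, gain))  # 1446.5
print(expected_profit(0.95, ex42, gain))  # 1536.35
```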

20 Decision Rule Construction
P(t, a): obtained by summing P(t, a, d) over every measurement d. Therefore, the average economic gain is E[e] = Σ over (t, a) of P(t, a) e(t, a). Notes: (t, a) covers all events in which a unit truly in category t is assigned to category a, whatever the measurement d; the expectation multiplies the probability of each true/assigned combination by its gain under the economic gain matrix e.

21 Decision Rule Construction (cont.)
The expected gain is the sum, over all cells, of each entry of the economic gain matrix times the entry in the same position of the probability matrix.

22 Decision Rule Construction (cont.)
We can use the identity matrix as the economic gain matrix to compute the probability of correct assignment. Notes: with 1s on the diagonal and 0s elsewhere, the expected gain equals the sum of the diagonal entries of the probability matrix, i.e. the probability that units are assigned to their correct categories.

23 Fair Game Assumption The decision rule uses only the measurement data in making assignments; nature and the decision rule are not in collusion. In other words, P(a | t, d) = P(a | d). Notes: the probability of assigning a unit to a given both its true category t and its measurement d equals the probability of assigning it to a given the measurement d alone. It follows that P(t, a, d) = P(a | t, d) P(t, d) (by conditional probability) = P(a | d) P(t, d) (by the fair game assumption).

24 Fair Game Assumption (cont.)
From the definition of conditional probability, P(t, a, d) = P(a | t, d) P(t, d).

25 Fair Game Assumption (cont.)
P(t, a, d) = P(a | t, d) P(t, d)   (by conditional probability)
           = P(a | d) P(t, d)      (by the fair game assumption)
By definition, P(t, a | d) = P(t, a, d) / P(d) = P(a | d) P(t, d) / P(d) = P(a | d) P(t | d), using P(t | d) = P(t, d) / P(d). Notes: given a unit's measurement d, the probability that its true category is t and it is assigned to a equals the product of the probability that its true category is t and the probability that it is assigned to a.

26 Deterministic Decision Rule
We use the notation f(a | d) to completely define a decision rule; f(a | d) presents all the conditional probabilities associated with the rule, i.e. the probability of assigning to category a when the measurement is d. A deterministic decision rule assigns every unit with measurement d to one fixed category a, so each f(a | d) is 0 or 1. Decision rules that are not deterministic are called probabilistic, nondeterministic, or stochastic: given measurement d, such a rule may assign the unit to a or to some other category (e.g. f(a | d) = 0.7, f(b | d) = 0.3).
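A small sketch of how f(a | d) might be represented as a table; the measurement and category names are illustrative:

```python
# f(a|d) as a nested table: deterministic rules put all mass on one
# category per measurement, stochastic rules spread it.
f_deterministic = {"d1": {"c1": 0.0, "c2": 1.0},
                   "d2": {"c1": 1.0, "c2": 0.0}}
f_stochastic    = {"d1": {"c1": 0.7, "c2": 0.3},
                   "d2": {"c1": 1.0, "c2": 0.0}}

def is_deterministic(f):
    # every conditional distribution is a point mass on some category
    return all(max(dist.values()) == 1.0 for dist in f.values())
```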

27 Expected Value on f(a|d)
Starting from the previous formula, replace P(t, a) by the sum of P(t, a, d) over all d (slide 20), then rewrite P(t, a, d) as P(t | d) f(a | d) P(d) by conditional probability and the result of slide 25:
E[e; f] = Σ_d Σ_a f(a | d) Σ_t e(t, a) P(t | d) P(d).

28 Expected Value on f(a|d) (cont.)
Notes: replace P(t | d) P(d) by P(t, d) via conditional probability. Because f(a | d) does not depend on t, it can be pulled out of the sum over t in C, giving
E[e; f] = Σ_d Σ_a f(a | d) [Σ_t e(t, a) P(t, d)].
For a deterministic rule each f(a | d) is 0 or 1, with exactly one category per d receiving 1. So to maximize the expected gain under a deterministic rule, consider each measurement d in turn, set f(a | d) = 1 for the category a that maximizes the inner sum Σ_t e(t, a) P(t, d), and add up these maxima over all measurements to obtain the maximal E. Rearranging the summations yields this final form, which exhibits how the expected gain depends on the decision rule.

29 Bayes Decision Rules Constructing f to maximize the expected economic gain.
Notes: a Bayes decision rule is a rule whose expected gain is maximal. As noted on slide 28, it satisfies, for every measurement d: f(a | d) = 1 for some category a maximizing Σ_t e(t, a) P(t, d), and f(a | d) = 0 whenever a does not attain that maximum.

30 E[e; f] = Σ_{d ∈ D} Σ_{a ∈ C} f(a | d) [Σ_{t ∈ C} e(t, a) P(t, d)]
Joint probabilities P(c, d):   d1     d2     d3
  True category c1             0.12   0.18   0.3
  True category c2             0.2    0.16   0.04
Economic gain e: the identity matrix (e(c1, c1) = e(c2, c2) = 1, off-diagonal 0).
Bayes rule f(a | d):           d1     d2     d3
  Assigned category c1         0      1      1
  Assigned category c2         1      0      0
E[e; f] = 0.68
Figure 4.2: Calculation of the Bayes decision rule and calculation of the expected gain.

31 Bayes Decision Rules (cont.)
Notes: P(c, d) is the probability that the unit's true category is c and its measurement is d; f(a | d) is the probability of assigning to a given measurement d; e(x, y) is the gain when the true category is x and the assigned category is y (here e is the identity matrix).
Start with d1. Since f(c2 | d1) = 1 and f(c1 | d1) = 0, only assignments to c2 contribute, and with identity gain only the true-category-c2 term survives: P(c2, d1) e(c2, c2) f(c2 | d1) = 0.2 × 1 × 1 = 0.2; the true-category-c1 branch gives P(c1, d1) e(c1, c1) f(c1 | d1) = 0.12 × 1 × 0 = 0 and P(c1, d1) e(c1, c2) f(c2 | d1) = 0.12 × 0 × 1 = 0.
Likewise d2 contributes P(c1, d2) e(c1, c1) f(c1 | d2) = 0.18 and d3 contributes P(c1, d3) e(c1, c1) f(c1 | d3) = 0.3, so E = 0.2 + 0.18 + 0.3 = 0.68.
A faster reading of the tables: for each d, find the category a with f(a | d) = 1, take the nonzero gain entry e(a, a) = 1, and multiply by the matching joint probability P(a, d); then sum over d.
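Figure 4.2 can be reproduced in a few lines. This sketch hard-codes the joint table above and exploits the identity gain matrix, under which the maximizing category for each d is simply the one with the largest P(t, d):

```python
# Bayes rule for Fig. 4.2: for each measurement d, put f(a|d) = 1 on the
# category a maximizing sum_t e(t, a) P(t, d); with identity gain this is
# just the category with the largest joint probability P(a, d).
P = {"c1": {"d1": 0.12, "d2": 0.18, "d3": 0.30},
     "c2": {"d1": 0.20, "d2": 0.16, "d3": 0.04}}
cats, meas = ["c1", "c2"], ["d1", "d2", "d3"]

rule = {d: max(cats, key=lambda a: P[a][d]) for d in meas}
gain = sum(P[rule[d]][d] for d in meas)
print(rule)  # {'d1': 'c2', 'd2': 'c1', 'd3': 'c1'}
print(gain)  # 0.68
```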

32 Bayes Decision Rules (cont.)
The P(t, a) table: true categories c1, c2 against assigned categories c1, c2 (values computed on the next slide).

33 Bayes Decision Rules (cont.)
Notes: this table is the P(t, a) matrix. P(t, a) is the probability that a unit truly in category t is assigned to a, equal to Σ_d P(t, d) f(a | d). The diagonal entries P(c1, c1) and P(c2, c2) are correct assignments; the off-diagonal entries are incorrect assignments.
P(c1, c1) = 0.12 × 0 + 0.18 × 1 + 0.3 × 1 = 0.48.
P(c1, c2) = P(c1, d1) f(c2 | d1) = 0.12, since among the f(c2 | d) only f(c2 | d1) is nonzero.
P(c2, c1) = P(c2, d2) f(c1 | d2) + P(c2, d3) f(c1 | d3) = 0.16 + 0.04 = 0.20.
P(c2, c2) = P(c2, d1) f(c2 | d1) = 0.20.

34 Continuous Measurement
For the same example, try the continuous density functions of the measurements given for t1 and t2. Measurements lie in the closed interval [0, 1]. Prove that they are indeed density functions. Notes: here the measurement is continuous, whereas in the example of slide 30 it was discrete (d1, d2, d3); we now examine how a continuous measurement affects the maximal expected gain. Evaluating the density for true category t at a measurement d gives P(d | t), the probability density of measuring d given that the true category is t. One verifies that the given functions are density functions by checking that each integrates to 1 over its domain.

35 Continuous Measurement (cont.)
Suppose the prior probability of t1 is P(t1) and the prior probability of t2 is P(t2). When the weighted density for t1 is at least that for t2, a Bayes decision rule assigns the observed unit to t1. Notes: P(t), the prior probability, is the probability that a unit truly belongs to t before anything else is known, i.e. the production rate. By the expected-gain formula of slide 28, a deterministic rule is optimal when, for each measurement, f(a | d) = 1 for the category a maximizing Σ_t e(t, a) P(t, d); here the gain matrix is the identity, so only the term with assigned category equal to true category survives, and the rule reduces to choosing the larger P(t, d). Since P(t, d) = P(d | t) P(t) and there are only the two categories t1 and t2, one finds the range of x where P(t1) P(x | t1) ≥ P(t2) P(x | t2) and sets f(t1 | d) = 1 there, sets f(t2 | d) = 1 on the complementary range, and integrates each weighted density over its range to obtain the maximal expected gain.

36 Continuous Measurement (cont.)
E[e; f] = 0.805 > 0.68: the continuous measurement has a larger expected economic gain than the discrete one.

37 Continuous Measurement (cont.)
0.805 > 0.68: the continuous measurement has the larger expected economic gain. Notes: P(t1, d) ≥ P(t2, d) holds for x in the range 0.707 to 1, and P(t1, d) < P(t2, d) for x in 0 to 0.707; over these two ranges integrate f(t1 | d) e(t1, t1) P(t1, d) and f(t2 | d) e(t2, t2) P(t2, d) respectively.

38 Prior Probability The Bayes rule: replace P(t, d) with P(d | t) P(t).
The Bayes rule can then be determined by assigning any category that maximizes Σ_t e(t, a) P(d | t) P(t). Notes: restricted to deterministic rules, this states how to choose the rule maximizing the expected gain; see the note to slide 28.

39 Economic Gain Matrix Suppose e1 and e2 are two different economic gain matrices related by e2 = k1 e1 + k2, with constant k1 > 0. According to the construction rule, given a measurement d, the category maximizing Σ_t e2(t, a) P(t, d) also maximizes Σ_t e1(t, a) P(t, d), because Σ_t [k1 e1(t, a) + k2] P(t, d) = k1 Σ_t e1(t, a) P(t, d) + k2 P(d), and the added term does not depend on a. Notes: under the same decision rule and otherwise unchanged conditions, multiplying the gain matrix by a positive constant k1 and adding a constant k2 does not change which deterministic rule attains the maximal expected gain.

40 Economic Gain Matrix Some economic gain matrices:
Identity matrix: a correct assignment gains 1 (everything else 0). "Incorrect loses 1": an incorrect assignment loses 1. A more balanced instance: a correct assignment gains 1 and an incorrect assignment loses 1.

41 Economic Gain Matrix Bayes decision rules that do not maximize the probability of correct identification are obtained from economic gain matrices in which the diagonal terms are not all equal or the off-diagonal terms are not all equal.

42 Maximin Decision Rule Maximizes the average gain over the worst prior probability. Notes: writing P(t, d) = P(d | t) P(t), the expected gain becomes Σ_j Σ_d Σ_k f(c_k | d) e(c_j, c_k) P(d | c_j) P(c_j). Abbreviate E[e | c_j; f] = Σ_d Σ_k f(c_k | d) e(c_j, c_k) P(d | c_j), the conditional expected gain under rule f for units truly in category c_j. The priors satisfy P(c_1) + ... + P(c_K) = 1. The worst prior probability for a rule is the prior distribution minimizing its expected gain; the resulting value, min over priors of Σ_j E[e | c_j; f] P(c_j), is the rule's min gain. A maximin decision rule is a rule whose min gain is at least the min gain of every other decision rule: compare the min gains of all decision rules and take the rule with the largest (the maximum of the minima).

43 Example 4.3 Notes: a P(d | c) table is given; the economic gain matrix in this example is the identity.

44 Example 4.3 (cont.)
P(d | c):   d1    d2    d3
  c1        .2    .3    .5
  c2        .5    .4    .1
Notes: steps (A)-(C) outline the search for the deterministic maximin rule.
(A) For any decision rule g, E[e; g, P] ≥ min_j E[e | c_j; g]: the expected gain under any prior distribution is at least the smallest conditional expected gain. Indeed E[e; g, P] = E[e | c_1; g] P(c_1) + ... + E[e | c_K; g] P(c_K), with the P(c_j) nonnegative and summing to 1, so this mixture can never fall below its smallest component; shifting prior mass p from a category c_j to a category c_i with E[e | c_i; g] ≥ E[e | c_j; g] can only raise the mixture, and equality is approached by putting all prior mass on the worst category. This bound holds for every decision rule, deterministic or stochastic.
(B) Consider deterministic rules only. A deterministic rule assigns each of the 3 possible measurements wholly to one of the 2 categories, so there are 2 × 2 × 2 = 8 rules, f1 through f8. By (A), a rule's min gain is min(E[e | c1; f], E[e | c2; f]), where E[e | c_j; f] = Σ_d Σ_k f(c_k | d) e(c_j, c_k) P(d | c_j) as on slide 42. With identity gain, E[e | c_j; f] is simply the total P(d | c_j) over the measurements the rule assigns to c_j. For f1 (all three measurements assigned to c1): E[e | c1; f1] = .2 + .3 + .5 = 1 and E[e | c2; f1] = 0. Similarly E[e | c1; f2] = .2 + .3 = .5 and E[e | c2; f2] = .1; E[e | c1; f3] = .2 + .5 = .7 and E[e | c2; f3] = .4; and so on for f4 through f8.
(C) For each rule take the smaller of its two conditional gains, then pick the rule(s) with the largest such minimum. Here the deterministic maximin rules are f5 and f7 (drawn with heavy dashes on slide 47).

45 Example 4.3 (cont.)

46 Example 4.3 (cont.) Conditional gains:
Decision rule   E[e|c1;f]   E[e|c2;f]   min
  f1            1           0           0
  f2            .5          .1          .1
  f3            .7          .4          .4
  f4            .2          .5          .2
  f5            .8          .5          .5
  f6            .3          .6          .3
  f7            .5          .9          .5
  f8            0           1           0
The maximin gain for deterministic rules is .5, attained by f5 and f7. (The slide also plots E[e; f] against P(c1) and marks the maximin gain.)
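A sketch that enumerates the eight deterministic rules and recovers the table above, along with the maximin rules f5 and f7:

```python
# With identity gain, E[e|c;f] is the total P(d|c) over the measurements
# the rule assigns to c; the maximin rule maximizes the smaller of the
# two conditional gains.
from itertools import product

P = {"c1": {"d1": 0.2, "d2": 0.3, "d3": 0.5},
     "c2": {"d1": 0.5, "d2": 0.4, "d3": 0.1}}
meas = ["d1", "d2", "d3"]

results = []
for assign in product(["c1", "c2"], repeat=3):  # f1 .. f8
    rule = dict(zip(meas, assign))
    cond = {c: sum(P[c][d] for d in meas if rule[d] == c) for c in P}
    results.append((min(cond.values()), rule))

results.sort(key=lambda x: -x[0])
print(results[0])  # min gain 0.5, attained by f5 and f7
```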

47 Example 4.3 (cont.)
Notes:
(A) Plotting: since P(c1) + P(c2) = 1, E[e; f] = E[e | c1; f] P(c1) + E[e | c2; f] (1 - P(c1)) is linear in P(c1); each decision rule plots as the line segment joining (0, E[e | c2; f]) and (1, E[e | c1; f]).
(B) Bayes gain: at each prior, the Bayes gain is the largest expected gain over all decision rules, and on different intervals of P(c1) different deterministic rules attain it; the upper envelope (marked with light dashes on the slide) is the Bayes decision rule as a function of the prior. The lowest Bayes gain occurs where the f5 and f7 segments cross. Using the data of slide 46: 0.8 P(c1) + 0.5 (1 - P(c1)) = 0.5 P(c1) + 0.9 (1 - P(c1)) gives P(c1) = 4/7, and the lowest Bayes gain is (4/7)(0.8) + (3/7)(0.5) ≈ 0.671. The Bayes gain can be obtained from deterministic rules alone, without considering stochastic rules, by (C).
(C) Stochastic vs. deterministic rules: a stochastic rule differs only in its f(a | d), and its conditional gains are convex combinations of those of deterministic rules. For instance, a rule f9 with f9(c1 | d1) = 0.7, f9(c2 | d1) = 0.3 and f9(c1 | d2) = f9(c1 | d3) = 1 satisfies E[e; f9, P] = 0.7 E[e; f1, P] + 0.3 E[e; f5, P] at every prior P. Hence at any prior a stochastic rule's gain lies between the smallest and largest gains of the deterministic rules.
(D) The general maximin rule fM plots as a horizontal line (see slide 49): its gain does not change with the prior, and its min gain is ≥ the Bayes decision rule's min gain, since the maximin rule's worst-prior gain is the largest among all rules' worst-prior gains. At the lowest Bayes gain, fM is a combination of f5 and f7: E[e; fM] = p E[e; f5] + (1 - p) E[e; f7] = p (0.3 P(c1) + 0.5) + (1 - p)(-0.4 P(c1) + 0.9). For fM to be horizontal the coefficient of P(c1) must vanish, 0.7 p - 0.4 = 0, so p = 4/7, which yields fM(a | d) for each measurement (the table at the bottom of the slide).

48 Example 4.3 (cont.) The lowest Bayes gain is achieved where f5 and f7 intersect.
Notes: write the f5 and f7 segments of slide 47 as lines in E and P(c1) and find their intersection; its E-coordinate is the lowest Bayes gain. From the slide 46 data: P(c1) × 0.8 + (1 - P(c1)) × 0.5 = P(c1) × 0.5 + (1 - P(c1)) × 0.9, giving P(c1) = 4/7 and lowest Bayes gain = (4/7)(0.8) + (3/7)(0.5) ≈ 0.671.

49 Example 4.3 (cont.)
Notes: this figure shows the expected gains of all decision rules; the middle horizontal line fM is the maximin decision rule.
(A) The enclosed region is the set of all decision rules: with E[e | c1; f] and E[e | c2; f] as the X and Y coordinates, plot f1-f8 from slide 46 and connect the outermost points; the region they enclose contains every rule, deterministic and stochastic, since a stochastic rule is a convex combination of deterministic rules and so lies on or inside the segments joining their points.
(B) fM lies on the line X = Y, so E[e | c1; fM] = E[e | c2; fM]; with P(c1) + P(c2) = 1, the expected gain E[e; fM, P] = E[e | c1; fM] P(c1) + E[e | c2; fM] P(c2) is a constant independent of the prior, which is why fM is horizontal on slide 47.
(C) If fM were not on X = Y, then by the bound E[e; g, P] ≥ min_j E[e | c_j; g] of slide 44 its worst-prior gain would equal the smaller of its two coordinates, and some point on X = Y inside the region would have a larger minimum. Among the points of the region on X = Y, moving away from the origin increases both coordinates, so fM is the point on X = Y farthest from the origin.

50 Example 4.4
P(d | c):   d1     d2
  c1        3/4    1/4
  c2        1/8    7/8
e:          c1     c2
  c1        11/3   -1
  c2        0      20/7
Rule   d1   d2   E[e|c1;f]   E[e|c2;f]
  f1   c1   c1   11/3        0
  f2   c1   c2   5/2         5/2
  f3   c2   c1   1/6         5/14
  f4   c2   c2   -1          20/7

51 Example 4.4 (cont.)
Notes: take rule f1 (both measurements assigned to c1) as an example.
E[e | c1; f1]: from d1, P(d1 | c1) e(c1, c1) f(c1 | d1) = (3/4)(11/3)(1) = 11/4; from d2, (1/4)(11/3)(1) = 11/12; total 11/4 + 11/12 = 11/3. E[e | c2; f1]: every term contains either e(c2, c1) = 0 or f(c2 | d) = 0, so it is 0.
For f2 (d1 to c1, d2 to c2): E[e | c1; f2] = (3/4)(11/3) + (1/4)(-1) = 11/4 - 1/4 = 5/2, and E[e | c2; f2] = (7/8)(20/7) = 5/2. The remaining rows follow in the same way.

52 Example 4.4 (cont.)

53 Example 4.4 (cont.) Notes: in this example the general maximin rule happens to be the deterministic maximin rule f2.

54 Example 4.5
P(d | c):   d1     d2     d3
  c1        0.3    0.4    0.3
  c2        0.25   0.45   0.3
Notes: the general maximin rule can always do as well as the worst case of Bayes, but it is not the Bayes decision rule. Nor need it equal the deterministic maximin rule: under some prior probability functions the deterministic maximin rule's gain exceeds the general maximin rule's, and vice versa; what is certain is that the general maximin rule's min gain is ≥ the deterministic maximin rule's min gain.
Taking rule f1 as an example (identity gain): E[e | c1; f1] = P(d1 | c1) + P(d2 | c1) + P(d3 | c1) = 0.3 + 0.4 + 0.3 = 1, and E[e | c2; f1] = 0, since every term contains either e(c2, c1) = 0 or f1(c2 | d) = 0.

55 Example 4.5 (cont.)

56 Example 4.5 (cont.) E[e; f1] = E[e | c1; f1] P(c1) + E[e | c2; f1] P(c2) = P(c1). E[e; f4] = E[e | c1; f4] P(c1) + E[e | c2; f4] P(c2) = 0.3 P(c1) + 0.75 (1 - P(c1)) = -0.45 P(c1) + 0.75. At the intersection, P(c1) = -0.45 P(c1) + 0.75, so P(c1) = 0.75 / 1.45 = 15/29 ≈ 0.517.

57 Example 4.5 (cont.) f1 and f4 form the lowest Bayes gain.
Find some p that eliminates P(c1): E[e; fM] = p E[e; f1] + (1 - p) E[e; f4] = p P(c1) + (1 - p)(-0.45 P(c1) + 0.75) = (1.45 p - 0.45) P(c1) + 0.75 (1 - p). The P(c1) term vanishes when 1.45 p - 0.45 = 0, i.e. p = 9/29 ≈ 0.31 (the slide's condition (1.45 p - 0.45) ≤ 0 gives p ≤ 9/29).

58 Example 4.5 (cont.)

59 Decision Rule Error The misidentification error αk: the fraction of units truly in category ck that the rule assigns to some other category.
The false-identification error βk: the fraction of units truly in other categories that the rule assigns to ck.

60 An Instance α1(f) = [ f(c2|d1) P(d1,c1) + f(c2|d2) P(d2,c1) + f(c2|d3) P(d3,c1) + f(c2|d4) P(d4,c1) + f(c2|d5) P(d5,c1) + f(c2|d6) P(d6,c1) ] / P(c1) = 0.25 / 0.5 = 0.5. β1(f) = [ f(c1|d1) P(d1,c2) + f(c1|d2) P(d2,c2) + f(c1|d3) P(d3,c2) + f(c1|d4) P(d4,c2) + f(c1|d5) P(d5,c2) + f(c1|d6) P(d6,c2) ] / P(c2) = 0.1 / 0.5 = 0.2. (The individual f and P values are given in a table on the slide.)
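A sketch of the two rates in code, following the structure of the instance above; the table `P_joint` and the rule table `f` stand in for the slide's values:

```python
# Error rates of a two-category rule: alpha_1 = P(assigned c2 | true c1),
# beta_1 = P(assigned c1 | true c2), each normalized by the prior.
def alpha_1(f, P_joint, meas):
    # misidentification: truly c1 but assigned c2
    p_c1 = sum(P_joint[d, "c1"] for d in meas)
    return sum(f[d]["c2"] * P_joint[d, "c1"] for d in meas) / p_c1

def beta_1(f, P_joint, meas):
    # false identification: truly c2 but assigned c1
    p_c2 = sum(P_joint[d, "c2"] for d in meas)
    return sum(f[d]["c1"] * P_joint[d, "c2"] for d in meas) / p_c2
```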

61 Reserving Judgment The decision rule may withhold judgment for some measurements. The rule is then characterized by the fraction of time it withholds judgment and by the error rate on the measurements it does assign. This is an important technique for controlling the error rate.

62 Reserving Judgment (cont.) Let αk be the maximum Type I error we can tolerate for category k, and βk the maximum Type II error we can tolerate for category k. A measurement that will not be rejected lies in the acceptance region. Notes (a sketch, following slide 25): (1) if f(ck | d) = P(ck | d) ≤ βk (1 - P(ck)), then P(d, ck) ≤ βk (1 - P(ck)) P(d), so P(d, cj) = P(d) - P(d, ck) ≥ P(d) - βk (1 - P(ck)) P(d); (2) if f(cj | d) = P(cj | d) ≤ αk P(ck), then P(d, cj) ≤ αk P(ck) P(d).

63 Nearest Neighbor Rule Assign pattern x to the closest vector in the training set, where "closest" means minimal distance under a metric ρ on the measurement space. Chief difficulty: the brute-force nearest neighbor algorithm has computational complexity proportional to the number of patterns in the training set.
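A brute-force sketch; the training pairs and the Euclidean metric are illustrative:

```python
# Assign x the category of the closest training vector; the linear scan
# over the training set is the complexity issue noted above.
import math

def nearest_neighbor(x, training):  # training: [(vector, category)]
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(training, key=lambda pair: dist(x, pair[0]))[1]

train = [((0.0, 0.0), "c1"), ((1.0, 1.0), "c2"), ((0.9, 0.2), "c1")]
print(nearest_neighbor((0.8, 0.9), train))  # 'c2'
```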

64 Binary Decision Tree Classifier
Assigns by a hierarchical decision procedure.

65 Major Problems Choosing the tree structure.
Choosing the features used at each non-terminal node. Choosing the decision rule at each non-terminal node.

66 Decision Rules at the Non-terminal Node
Thresholding the measurement component. Fisher's linear decision rule. Bayes quadratic decision rule. Bayes linear decision rule. Linear decision rule from the first principal component. Notes: for thresholding, send units with the measurement component ≤ threshold to the left child and the rest to the right, then score each candidate threshold by the entropy purity of the split and keep the best, as sketched below.

67 Error Estimation An important way to characterize the performance of a decision rule. The training data set must be independent of the testing data set. Hold-out method, a common technique: construct the decision rule with half the data set and test with the other half (see the sketch below).
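A sketch of the hold-out method; `fit` and `classify` are stand-ins for whatever rule-construction procedure is being evaluated:

```python
# Build the rule on a random half of the data, estimate its error on
# the independent other half.
import random

def hold_out_error(samples, fit, classify):  # samples: [(x, true_cat)]
    random.shuffle(samples)
    half = len(samples) // 2
    train, test = samples[:half], samples[half:]
    rule = fit(train)
    wrong = sum(classify(rule, x) != t for x, t in test)
    return wrong / len(test)
```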

68 Neural Network A set of units, each of which takes a linear combination of values from either an input vector or the outputs of other units.

69 Neural Network (cont.) Has a training algorithm: responses are observed, and
reinforcement algorithms or back-propagation are used to change the weights.
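A minimal sketch of one such unit with a single weight-update step standing in for back-propagation; the sigmoid squashing function and learning rate are assumptions, not from the slides:

```python
# One unit: linear combination of inputs, squashed to (0, 1); the update
# is one gradient step on the squared error between output and target.
import math

def unit(weights, bias, x):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid output

def train_step(weights, bias, x, target, lr=0.1):
    out = unit(weights, bias, x)
    grad = (out - target) * out * (1.0 - out)  # d(error)/d(sum)
    new_w = [w - lr * grad * xi for w, xi in zip(weights, x)]
    return new_w, bias - lr * grad
```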

70 Summary Bayesian approach. Maximin decision rule.
Misidentification and false-alarm error rates. Nearest neighbor rule. Construction of decision trees. Estimation of decision rule error. Neural networks.

