1
Chapter 5 Segmentation Presenter: 林婷涵 Mail: ivyyhere@gmail.com
Segmentation is usually about finding image boundaries (lines, curves, etc.) that separate foreground from background, assigning the same label to pixels of the same type.
2
Segmentation (1/2) 5.1 Active Contours 5.2 Split and Merge
5.3 Mean Shift and Mode Finding 5.4 Normalized Cuts 5.5 Graph Cuts and Energy-based Methods Early methods: region splitting and merging
3
Segmentation (2/2) Particle filter + snake 5.1 Level set 5.1
Graph-based merging 5.2.4 Mean shift 5.3 Normalized cuts 5.4 Graph cuts 5.5
4
5.1 Active Contours Snakes Scissors Level Sets
Snake: a two-dimensional closed energy-minimizing curve; during optimization the curve gradually moves toward the object contour, and segmentation is complete when the energy is minimal. Scissors: semi-automatic segmentation; the user supplies a few points on the contour, and the system links them along strong edges, much like Photoshop's lasso tool. Level set: imagine the image as a 3D surface; the cross-section obtained as a horizontal water level rises from the lowest point gives the segmentation result.
5
5.1.1 Snakes Snakes are a two-dimensional generalization of the 1D energy-minimizing splines. One may visualize the snake as a rubber band of arbitrary shape that is deforming with time trying to get as close as possible to the object contour. Think of the object boundary as a continuous curve. The active contour model combines a deformable closed curve with an energy function; deformation is driven by minimizing the energy, and the minimum-energy configuration is the object contour. A snake is defined by a set of n points together with an internal energy and an external energy. Internal energy: the energy of the contour itself, which keeps the curve smooth. External energy: image energy (gradients, etc.) plus user-supplied constraint forces, which pull the snake toward contour features. The elastic energy quickly shrinks the contour into a smooth circle, the bending energy drives it toward a smooth curve or straight line, and the image force pulls it toward high-gradient locations; the basic snake model works under the joint action of these three forces.
6
Snakes (cont’d) Internal spline energy:
s: arc length. vs, vss: first-order and second-order derivatives of the curve. α, β: first-order and second-order weighting functions. Discretized form of internal spline energy: cont (contour) is the elastic energy term, curv (curvature) the bending energy term. The first derivative measures the contour's continuity; the second derivative measures its curvature. The larger α is, the faster the snake shrinks; the larger β is, the smoother the curve becomes. These weights determine whether the curve can stretch or bend at each point. The noisier the image, the larger α and β should be; choosing them well lets the contour converge to a reasonable position.
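The formula itself was lost with the slide image; the standard Kass-Witkin-Terzopoulos internal energy, reconstructed to match the α, β, vs, vss definitions above, is:

```latex
E_{\text{int}} = \int \alpha(s)\,\|v_s(s)\|^2 + \beta(s)\,\|v_{ss}(s)\|^2 \, ds
```

with the discretized form (h is the sampling step along the curve f):

```latex
E_{\text{int}} = \sum_i \frac{\alpha(i)}{h^2}\,\|f(i+1)-f(i)\|^2
              + \frac{\beta(i)}{h^4}\,\|f(i+1)-2f(i)+f(i-1)\|^2
```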
7
Snakes (cont’d) External spline energy:
Line term: attracting to dark ridges. Edge term: attracting to strong gradients. Term term: attracting to line terminations. In practice, most systems only use the edge term, which can be estimated directly from the gradient. External energy = image forces + constraint forces; the image force is composed of the line, edge, and termination terms.
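The external energy formula is likewise missing from the transcript; the usual weighted combination, with the edge term estimated directly from the image gradient, is (a reconstruction):

```latex
E_{\text{ext}} = w_{\text{line}} E_{\text{line}} + w_{\text{edge}} E_{\text{edge}} + w_{\text{term}} E_{\text{term}},
\qquad E_{\text{edge}} = -\,\|\nabla I(x, y)\|^2
```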
8
Snakes (cont’d) User-placed constraints can also be added.
f: the snake points. d: anchor points. Wiggle (to sway) and slither (to slide) describe how the contour moves, hence the name "snake".
9
Snake (cont’d) Because regular snakes have a tendency to shrink, it is usually better to initialize them by drawing the snake outside the object of interest to be tracked.
10
Elastic Nets and Slippery Springs
Applying to TSP (Traveling Salesman Problem): to reach a better optimum, the elastic net is allowed to pass by some points without visiting them. Blue circles: the attraction (probability) of each city. White: the points to travel. Black: the snake, initialized at the mean of the travel points.
11
Elastic Nets and Slippery Springs (cont’d)
Probabilistic interpretation: i: each snake node j: each city σ: standard deviation of the Gaussian dij: Euclidean distance between a tour point f(i) and a city location d(j)
12
Elastic Nets and Slippery Springs (cont’d)
The tour f(s) is initialized as a small circle around the mean of the city points and σ is progressively lowered. Slippery spring: this allows the association between constraints (cities) and curve (tour) points to evolve over time.
13
B-spline Approximations
Snakes sometimes exhibit too many degrees of freedom, making it more likely that they get trapped in local minima during their evolution. The solution is to use B-spline approximations, which control the snake with fewer degrees of freedom.
14
Shape Prior When the target shape is known in advance, a shape template can be imposed on the active contour.
15
5.1.2 Dynamic snakes and CONDENSATION
In many applications of active contours, the object of interest is being tracked from frame to frame as it deforms and evolves. In this case, it makes sense to use estimates from the previous frame to predict and constrain the new estimates. CONDENSATION stands for Conditional Density Propagation.
16
Kalman Filtering and Kalman Snakes (1/3)
Kalman filter uses estimates from the previous frame to predict and constrain the new estimates. xt: current state variable. xt-1: previous state variable. A: linear transition matrix. w: noise vector, often modeled as a Gaussian. Suppose you have recorded a system's state since some initial time, but each value is corrupted by random noise and cannot be used directly. Estimating the state at some past time from all data up to the present is called smoothing; estimating the current state is called filtering; estimating a future state is called prediction. The Kalman filter performs the filtering task, as an optimized algorithm that carries out least-squares estimation in a stochastic environment.
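The predict/update cycle described here can be sketched in a few lines. The scalar model below (transition A, process noise Q, measurement noise R, and the sample measurements) is a hypothetical illustration, not the book's tracker:

```python
def kalman_step(x_prev, P_prev, z, A=1.0, Q=0.01, H=1.0, R=0.5):
    """One predict/update cycle of a scalar Kalman filter.

    x_prev, P_prev: previous state estimate and its variance
    z: new measurement from the current frame
    A: linear transition (drift), Q: process noise (diffusion)
    H: measurement matrix, R: measurement noise variance
    """
    # Predict: deterministic drift plus stochastic diffusion
    x_pred = A * x_prev
    P_pred = A * P_prev * A + Q
    # Update: the new measurement restores some certainty
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1 - K * H) * P_pred
    return x_new, P_new

# Noisy measurements of a state whose true value is about 1.0
x, P = 0.0, 1.0
for z in [0.9, 1.1, 1.0, 0.95]:
    x, P = kalman_step(x, P, z)
```

Each step shrinks the variance P: the drift/diffusion prediction inflates uncertainty, and the measurement update contracts it again.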
17
Kalman Filtering and Kalman Snakes (2/3)
The linear dynamic model causes a deterministic change (drift) in the previous estimate, while the process noise causes a stochastic diffusion that increases the system entropy. New measurements from the current frame restore some of the certainty in the updated estimate.
18
Kalman Filtering and Kalman Snakes (3/3)
In many situations, however, such as when tracking in clutter, applications require more general multi-modal distributions.
19
Particle Filter Particle filtering techniques represent a probability distribution using a collection of weighted point samples; each density distribution is represented as a superposition of weighted particles. CONDENSATION (Conditional Density Propagation) is then used for estimation.
20
CONditional DENSity propagATION (1/2)
The drift-diffusion-measurement cycle is implemented using random sampling, perturbation, and re-weighting stages.
21
CONditional DENSity propagATION (2/2)
Figure 5.8a shows what a factored sample of a head tracker might look like, drawing a red B-spline contour for each of (a subset of) the particles being tracked. Figure 5.8b shows why the measurement density itself is often multi-modal: the locations of the edges perpendicular to the spline curve can have multiple local maxima due to background clutter. Finally, Figure 5.8c shows the temporal evolution of the conditional density (x coordinate of the head and shoulder tracker centroid) as it tracks several people over time.
22
5.1.3 Scissors (1/2) Scissors can draw a better curve (an optimal curve path) that clings to high-contrast edges as the user draws a rough outline. Semi-automatic segmentation. Algorithm: Step 1: Associate edges that are likely to be boundary elements (zero crossings, strong gradients) with low cost. Step 2: Continuously recompute the lowest cost path between the starting point and the current mouse location using Dijkstra’s algorithm, which solves the single-source shortest path problem.
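Step 2's shortest-path computation can be sketched as follows. The 4-connected grid, the cost values, and the function name are illustrative assumptions; a real scissors tool derives its costs from gradients and zero crossings:

```python
import heapq

def lowest_cost_path(cost, start, goal):
    """Dijkstra's algorithm over a 4-connected pixel grid.

    cost[y][x] is the price of stepping onto pixel (x, y); in
    intelligent scissors this would be low on strong edges.
    Returns the list of pixels on the cheapest start->goal path.
    """
    h, w = len(cost), len(cost[0])
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    while pq:
        d, (x, y) = heapq.heappop(pq)
        if (x, y) == goal:
            break
        if d > dist[(x, y)]:
            continue  # stale queue entry
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h:
                nd = d + cost[ny][nx]
                if nd < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = nd
                    prev[(nx, ny)] = (x, y)
                    heapq.heappush(pq, (nd, (nx, ny)))
    # Walk back from the goal to recover the path
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# A cheap "edge" runs along the top row and right column
# (cost 1) while the interior is expensive (cost 9).
grid = [[1, 1, 1],
        [9, 9, 1],
        [9, 9, 1]]
p = lowest_cost_path(grid, (0, 0), (2, 2))
```

The returned path hugs the cheap cells, just as the scissors curve clings to high-contrast edges.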
23
Scissors (2/2) White: the user's mouse path. Orange: the curve the scissors compute.
As the user draws a rough outline, the system computes and draws a better curve that clings to high-contrast edges.
24
5.1.4 Level Sets (1/4) If the active contour is based on a parametric curve of the form f(s), curve reparameterization may be required as the shape changes dramatically. Level sets use a 2D embedding function φ(x, y) instead of the curve f(s). The level set method lifts an N-dimensional problem to N+1 dimensions, treating the N-dimensional contour as a level of an (N+1)-dimensional function. Advantages: topological changes that are difficult in low dimensions are handled naturally; no periodic reparameterization is needed; the computation is more accurate and robust; and the method extends easily to higher dimensions.
25
Level Sets (2/4) An example is the geodesic active contour:
g(I): snake edge potential (gradient). φ: signed distance function away from the curve. "Geodesic" refers to geodesy / geodesic curves. The first term straightens the curve, and the second term steers it toward lower values of g(I).
26
Level Sets (3/4) According to g(I), the first term straightens the curve and the second term encourages the curve to migrate towards minima of g(I). Level sets change topology easily but are still susceptible to local minima. An alternative approach is to measure the energy (color, texture, etc.) inside and outside the segmented regions.
27
Level Sets (4/4) The circle gradually evolves into the precise segmentation result.
28
5.2 Split and Merge Watershed Region splitting and merging
Graph-based Segmentation k-means clustering Mean Shift The simplest method is direct thresholding, but a single threshold rarely carries enough information.
29
5.2.1 Watershed (1/2) An efficient way to compute such regions is to start flooding the landscape at all of the local minima and to label ridges wherever differently evolving components meet. Watershed segmentation is often used with user-supplied markers at the centers of the desired components. It operates on a grayscale image, which can be viewed as a 3D landscape whose low areas are basins. Imagine rain falling: water rises from the basins, and wherever two different components meet, the meeting line is labeled as a ridge.
30
Watershed (2/2) Confocal microscope image; watershed may cause over-segmentation.
31
5.2.2 Region Splitting (Divisive Clustering)
Step 1: Compute a histogram for the whole image. Step 2: Find a threshold that best separates the large peaks in the histogram. Step 3: Repeat until regions are either fairly uniform or below a certain size. Simple image features such as gray levels or texture can be used for the splitting.
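One split step can be sketched with a between-class-variance threshold (Otsu's criterion, used here as a stand-in for "the threshold that best separates the large peaks"); the pixel list and function name are hypothetical:

```python
def best_threshold(pixels):
    """Find the threshold that best separates the two main peaks
    of an intensity histogram, by maximizing the between-class
    variance (Otsu's criterion)."""
    n = len(pixels)
    best_t, best_var = None, -1.0
    for t in range(min(pixels), max(pixels)):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            continue
        w0, w1 = len(low) / n, len(high) / n   # class weights
        m0 = sum(low) / len(low)               # class means
        m1 = sum(high) / len(high)
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal intensity sample: a dark region around 10
# and a bright region around 200
pixels = [8, 10, 12, 9, 11, 198, 200, 202, 199, 201]
t = best_threshold(pixels)
```

Splitting at t separates the two peaks; region splitting then recurses on each side until the regions are fairly uniform.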
32
5.2.3 Region Merging (Agglomerative Clustering)
Various criteria can be used for merging regions: the relative boundary lengths and the strength of the visible edges at these boundaries; the distance between the closest and farthest points of the two regions; the average color difference of neighboring regions being below a threshold, or a region being too small.
33
5.2.4 Graph-based Segmentation (1/3)
This algorithm uses relative dissimilarities between regions to determine which ones should be merged. Internal difference for any region R: MST(R): minimum spanning tree of R w(e): intensity differences of an edge in MST(R)
34
Graph-based Segmentation (2/3)
Difference between two adjacent regions: Minimum internal difference of these two regions: τ(R): heuristic region penalty. Dif is the smallest dissimilarity between the two adjacent regions; MInt is the smallest internal difference the two regions would have if merged.
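The formulas were lost with the slide images; the conventional Felzenszwalb-Huttenlocher definitions matching the terms above are (a reconstruction):

```latex
\mathrm{Int}(R) = \max_{e \in \mathrm{MST}(R)} w(e), \qquad
\mathrm{Dif}(R_1, R_2) = \min_{v_i \in R_1,\, v_j \in R_2,\, (v_i, v_j) \in E} w\big((v_i, v_j)\big)
```

```latex
\mathrm{MInt}(R_1, R_2) = \min\big(\mathrm{Int}(R_1) + \tau(R_1),\;
\mathrm{Int}(R_2) + \tau(R_2)\big), \qquad \tau(R) = k / |R|
```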
35
Graph-based Segmentation (3/3)
If Dif(R1, R2) < Mint(R1, R2), merge these two adjacent regions. In the example, the variation inside the square is larger than that of the bright line through the middle, yet the image is still segmented automatically into three regions: judged against each region's own internal dissimilarity, the middle line can be cut out.
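A minimal sketch of this merge rule with union-find; the edge list, the value of k, and the class names are hypothetical. Edges are processed in increasing weight order, so each region's stored value is the largest edge in its minimum spanning tree:

```python
class DisjointSet:
    """Union-find over pixel indices; each set is a region."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # max MST edge weight inside region

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

def segment(n_pixels, edges, k=1.0):
    """Felzenszwalb-Huttenlocher style merging.

    edges: (weight, i, j) tuples, weight = intensity difference.
    Merge two regions whenever Dif(R1, R2) < MInt(R1, R2),
    with the heuristic penalty tau(R) = k / |R|."""
    ds = DisjointSet(n_pixels)
    for w, i, j in sorted(edges):
        a, b = ds.find(i), ds.find(j)
        if a == b:
            continue
        mint = min(ds.internal[a] + k / ds.size[a],
                   ds.internal[b] + k / ds.size[b])
        if w < mint:            # Dif(R1, R2) < MInt(R1, R2): merge
            ds.parent[b] = a
            ds.size[a] += ds.size[b]
            ds.internal[a] = w  # w is the new largest MST edge
    return ds

# Four pixels in a row: two similar pairs separated by a strong edge
edges = [(0.1, 0, 1), (0.1, 2, 3), (5.0, 1, 2)]
ds = segment(4, edges, k=1.0)
roots = {ds.find(i) for i in range(4)}
```

The weak edges (0.1) merge each pair, while the strong edge (5.0) fails the Dif < MInt test, leaving two regions.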
36
5.2.5 Probabilistic Aggregation (1/3)
Gray level similarity: Minimal external difference between Ri and Rj: ∆i+ = mink |∆ik|, where ∆ik is the difference in average intensities between regions Ri and Rk. Average intensity difference: ∆i- = Σk(τik ∆ik) / Σk(τik), where τik is the boundary length between regions Ri and Rk. Probabilistic aggregation combines gray-level similarity and texture similarity.
37
Probabilistic Aggregation (2/3)
Texture similarity: The pairwise statistics σlocal+ and σlocal- are used to compute the likelihoods pij that two regions should be merged. Definition of strong coupling: C: a subset of V φ: usually set to 0.2
38
Probabilistic Aggregation (3/3)
39
5.3 Mean Shift and Mode Finding
L: luminance. U, V: chrominance.
40
5.3.1 K-means and Mixtures of Gaussians (1/2)
Step 1: Given the number of clusters k it is supposed to find, choose k samples as the centers of the clusters; call this set of centers Y. Step 2: With Y fixed, compute the squared error for all pixels to obtain the clustering U with least squared error Emin. K-means is a clustering algorithm that requires the number of clusters k in advance and optimizes the clustering objective as follows: 1. Randomly assign cluster centers (pick k points as the initial centers). 2. Form initial clusters by assigning every other point to whichever of the k centers is nearest. 3. Recompute each cluster's centroid and use it to replace the initial center. 4. Adjust the cluster boundaries: reassign every point to its nearest center by distance, and repeat.
41
K-means and Mixtures of Gaussians (2/2)
Step 3: With Y and U fixed, compute the squared error Emin’. If Emin = Emin’, stop: we have the final clusters. Step 4: If Emin ≠ Emin’, use U to find new cluster centers Y’. Go to Step 2 and find a new clustering U’, iterating. Mixtures of Gaussians can model the superposition of density distributions (think of each component as a Gaussian, with the image formed by overlaying many different Gaussians), and k-means is then adopted to find the clusters.
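The alternation in Steps 1-4 can be sketched on 1-D points; the data, initial centers, and function name are hypothetical:

```python
def kmeans(points, centers, iters=20):
    """Plain k-means on 1-D points.

    Alternates: assign each point to its nearest center (fixed Y),
    then move each center to the mean of its assigned points
    (fixed U), until the centers stop changing."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda c: (p - centers[c]) ** 2)
            clusters[i].append(p)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # squared error stopped changing
            break
        centers = new_centers
    return centers, clusters

# Two well-separated groups of points
points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centers, clusters = kmeans(points, [0.0, 5.0])
```

With well-separated data the centers converge to the two group means in a couple of iterations.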
42
5.3.2 Mean Shift (1/8) Mean shift segmentation is the inverse of the watershed algorithm: find the peaks (modes) and then expand the regions. After finding a peak, the pixels whose ascent paths climb to that peak are identified as one component.
43
Mean Shift (2/8) Step 1: Use kernel density estimation to estimate the density function given a sparse set of samples. f(x): density function xi: input samples k(r): kernel function or Parzen window h: width of kernel
44
Mean Shift (3/8) Step 2: Starting at some guess for a local maximum yk, mean shift computes the gradient of the density estimate f(x) at yk and takes an uphill step in that direction. ∇f(x): gradient of the density estimate. m(x): the mean shift vector, obtained from the local gradient.
45
Mean Shift (4/8) The location of yk in each iteration can be expressed by the following formula. Repeat Step 2 until full convergence, i.e., where the gradient vanishes, or after a finite number of steps. Step 3: The remaining points can then be classified based on the nearest evolution path.
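The uphill iteration can be sketched in 1-D with a Gaussian kernel; the samples, bandwidth h, and starting guess are hypothetical:

```python
import math

def mean_shift_mode(samples, y, h=1.0, steps=100, tol=1e-6):
    """Climb from the starting guess y to the nearest mode of the
    kernel density estimate built from `samples`.

    Each iteration replaces y with the kernel-weighted mean of the
    samples; the mean-shift vector points uphill in density."""
    for _ in range(steps):
        weights = [math.exp(-((x - y) / h) ** 2 / 2) for x in samples]
        y_new = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(y_new - y) < tol:   # gradient has (numerically) vanished
            return y_new
        y = y_new
    return y

# Two clusters of samples, with modes near 0 and near 10
samples = [-0.3, 0.0, 0.3, 9.7, 10.0, 10.3]
mode = mean_shift_mode(samples, y=1.5)
```

Starting at 1.5, the iteration climbs to the mode near 0; a start near 9 would converge to the other mode, which is how points are later grouped by their evolution paths.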
46
Mean Shift (5/8)
47
Mean Shift (6/8) Other kernels can also be used:
Epanechnikov kernel: converges in a finite number of steps. Gaussian (normal) kernel: slower, but gives better results.
48
Mean Shift (7/8) Joint domain: use the spatial domain and range domain together to segment color images. Kernel of the joint domain (five-dimensional): xr: (L*, u*, v*) in the range domain. xs: (x, y) in the spatial domain. hr, hs: color and spatial widths of the kernel. The color-only kernel discussed earlier can leave isolated pixels unassigned; combining color and location, hr and hs control the influence of color and space on the kernel.
49
Mean Shift (8/8) M: regions containing fewer pixels than the threshold M are eliminated.
50
Intuitive Description
Region of interest. Center of mass. Mean shift vector. Objective: find the densest region in a distribution of identical billiard balls.
5.4 Normalized Cuts (1/8) Normalized cuts examine the affinities between nearby pixels and try to separate groups that are connected by weak affinities. Pixel-wise affinity weight for pixels within a radius ∥xi - xj∥ < r: Fi, Fj: feature vectors consisting of intensities, colors, or oriented filter histograms. xi, xj: pixel locations. The method examines the affinity (similarity) between pixels and breaks the weaker edges.
58
Normalized Cuts (2/8) To find the minimum cut between two groups A and B: a better measure of segmentation is the minimum normalized cut, where assoc(X, Y) = Σi ∈ X, j ∈ Y wij. 1. The plain cut is defined as the sum of the weights of all severed edges, which is not ideal: minimizing it tends to produce isolated pixels. 2. Dividing by assoc, the sum of all the weights attached to each group, normalizes the cut first.
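Written out (the slide's formulas were images; this is the standard Shi-Malik form consistent with the assoc definition above):

```latex
\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}, \qquad
\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)}
                    + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}
```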
59
Normalized Cuts (3/8)
60
Normalized Cuts (4/8) Computing the optimal normalized cut is NP-complete; the following is a faster approximate method. Minimize the Rayleigh quotient: W: weight matrix [wij]. D: diagonal matrix whose diagonal entries are the corresponding row sums of W. The problem can thus be converted into an eigenvalue problem.
61
Normalized Cuts (5/8) y = ((1 + x) - b(1 - x)) / 2 is a vector consisting of all 1s and -bs such that y‧d = 0. x is the indicator vector where xi = +1 iff i ∈ A and xi = -1 iff i ∈ B. It is equivalent to solving a regular eigenvalue problem: N = D-1/2WD-1/2 and N is called normalized affinity matrix.
62
Normalized Cuts (6/8)
63
Normalized Cuts (7/8)
64
Normalized Cuts (8/8)
65
5.5 Graph Cuts and Energy-based Methods (1/5)
Energy corresponding to a segmentation problem: Region term: region statistics can be mean gray level or color. Boundary term: Think of each image pixel as a graph node, with edges to its neighboring nodes; each edge weight is an energy, and the energy splits into a region term and a boundary term. The region term is the penalty for assigning a pixel to a given label (foreground or background); it can be obtained by comparing the pixel against given foreground and background histograms, i.e., the probability that the pixel belongs to that label. Since we want the energy to be smallest when a pixel receives its most probable label, the negative log of the probability is typically used. The boundary term is a penalty on discontinuity between adjacent pixels p and q: the more similar p and q are (e.g., in gray level), the larger this energy is, and the more different they are, the closer it is to zero.
66
Graph Cuts and Energy-based Methods (2/5)
Use binary Markov random field optimization. Red: foreground; blue: background.
67
Graph Cuts and Energy-based Methods (3/5)
The problem can also be represented as a graph. The nodes are the image pixels plus two extra terminal nodes, the source s and the sink t; s usually represents the foreground and t the background. Each edge weight is the energy described earlier: weights between ordinary nodes are set by the boundary term, and weights to s and t by the region term. A cut partitions the nodes into two disjoint sets S and T; the minimum cut is the partition whose severed edges have the smallest total weight, which by duality equals the maximum flow. Clearly, the cut that falls on the boundary between object and background is exactly the one we want (max-flow / min-cut).
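A minimal max-flow / min-cut sketch (Edmonds-Karp with BFS augmenting paths) on a hypothetical two-pixel graph; production graph-cut systems use specialized solvers such as Boykov-Kolmogorov rather than this generic implementation:

```python
from collections import deque

def min_cut(capacity, s, t):
    """Edmonds-Karp max-flow; returns the set of nodes on the
    source side of the minimum cut (the 'foreground' label).

    capacity: dict-of-dicts, capacity[u][v] = edge weight (energy)."""
    nodes = set(capacity) | {v for u in capacity for v in capacity[u]}
    # Build a mutable residual graph, zero-filling missing edges
    res = {u: dict(capacity.get(u, {})) for u in nodes}
    for u in nodes:
        for v in nodes:
            res[u].setdefault(v, 0)
    while True:
        # BFS for a shortest augmenting path s -> t
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in nodes:
                if v not in parent and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break               # no augmenting path: flow is maximal
        # Find the bottleneck capacity along the path
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow, updating residual capacities
        v = t
        while parent[v] is not None:
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
    # Min cut = nodes still reachable from s in the residual graph
    return set(parent)

# Two pixels p, q: p looks like foreground, q like background,
# with a weak boundary term between them.
cap = {"s": {"p": 9, "q": 1},      # region terms from the source
       "p": {"q": 2, "t": 1},      # boundary term and sink link
       "q": {"p": 2, "t": 9},
       "t": {}}
fg = min_cut(cap, "s", "t")
```

The cut severs the cheap edges (s-q, p-t, p-q) and labels p as foreground, exactly the boundary-hugging cut described above.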
68
Graph Cuts and Energy-based Methods (4/5)
Another major extension to the original binary segmentation formulation is the addition of directed edges. This method prefers light to dark transitions or vice versa.
69
Graph Cuts and Energy-based Methods (5/5)
GrabCut image segmentation: this allows the system to operate given minimal user input, such as a single bounding box; the background color model is initialized from a strip of pixels around the box outline, and the foreground color model from the interior pixels. The process then iteratively re-estimates the region statistics until they converge.