Convolutional Neural Network

Hung-yi Lee

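Why CNN for Image?
The image is represented as pixels. First-layer neurons are the most basic classifiers; the second layer uses the 1st layer as modules to build classifiers; the third layer uses the 2nd layer as modules, and so on. [Zeiler, M. D., ECCV 2014]
Can the network be simplified by considering the properties of images?
CNNs are often used for image processing, but an ordinary neural network can do the job too. To classify images, you train a neural network whose input is the image, represented by its pixels as one very long vector, and whose output has one dimension per class, so 1000 classes give a 1000-dimensional output.
When such a network is trained, each neuron acts as a basic classifier, and results reported in the literature support this view. First-layer neurons are the simplest classifiers: is green present, is yellow present, are there slanted stripes? The second layer detects more complex things from the first layer's output: straight and horizontal lines are part of a window frame, brown horizontal stripes are wood grain, and slanted stripes plus gray could be many things, such as part of a tire. The third hidden layer builds on the second layer's output: one neuron is activated by a honeycomb, another by a car, another by the upper body of a person.
The problem is that a fully connected network needs too many parameters. Take a 100 x 100 color image. Each pixel needs three values (R, G, B), so the flattened vector has 100 x 100 x 3 = 30,000 dimensions. Even with only 1000 neurons in the first hidden layer, that layer alone has 30,000 x 1000 weights, which is too many. A CNN simplifies the network architecture: from what we know about images, some weights are not needed and can be removed from the start, so the image is processed with fewer parameters than a fully connected network would use. The operation of a CNN looks complicated, but its model is simpler than a fully connected DNN.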
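The parameter count in the 100 x 100 example can be checked with a few lines of arithmetic. This is only a sketch of the calculation described above; the variable names are made up for the illustration and bias terms are left out.

```python
# Size of a 100 x 100 RGB image once it is flattened into one vector.
height, width, channels = 100, 100, 3
input_dim = height * width * channels

# First hidden layer size used in the example.
hidden_neurons = 1000

# In a fully connected layer every neuron has one weight per input value
# (bias terms are not counted here).
fc_weights = input_dim * hidden_neurons

print(input_dim)   # 30000
print(fc_weights)  # 30000000
```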

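Why CNN for Image
Some patterns are much smaller than the whole image.
A neuron does not have to see the whole image to discover the pattern. Connecting to a small region needs fewer parameters. [Figure: a “beak” detector connected to a small region of a bird image.]
Why can we drop some parameters and still process images? There are a few observations. The first: if the job of the first hidden layer is to detect whether some pattern appears, most patterns are much smaller than the whole image. To decide whether a pattern is present, a neuron does not need to look at the whole image, only a small part of it. Take a picture of a bird. One neuron in the first hidden layer detects a beak, others detect wings, claws, or a tail, and together they detect the bird. The neuron that detects the beak only needs to be shown the small region around it to tell whether it is a beak, just as a person can tell from that small region alone. So each neuron only needs to connect to a small patch, not to the whole image.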

Why CNN for Image
The same patterns appear in different regions.
An “upper-left beak” detector and a “middle beak” detector do almost the same thing, so they can use the same set of parameters.

Why CNN for Image
Subsampling the pixels will not change the object.
[Figure: a bird image and its subsampled version; both are still a bird.]
We can subsample the pixels to make the image smaller, so the network needs fewer parameters to process the image.

The whole CNN
Input image -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> Flatten -> Fully Connected Feedforward network -> cat, dog, ……
The Convolution and Max Pooling steps can repeat many times.

The whole CNN
Property 1: Some patterns are much smaller than the whole image. (Convolution)
Property 2: The same patterns appear in different regions. (Convolution)
Property 3: Subsampling the pixels will not change the object. (Max Pooling)
Convolution and Max Pooling can repeat many times, followed by Flatten.

CNN – Convolution
[Figure: a 6 x 6 image and two 3 x 3 filter matrices, Filter 1 and Filter 2, with entries 1 and -1.]
The filter values are the network parameters to be learned.
Each filter detects a small pattern (3 x 3). (Property 1)

CNN – Convolution
[Figure: Filter 1 slides over the 6 x 6 image with stride=1; the first two outputs are 3 and -1.]

CNN – Convolution
[Figure: Filter 1 on the 6 x 6 image; if stride=2, the first two outputs are 3 and -3.]
We set stride=1 below.

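CNN – Convolution
[Figure: with stride=1, Filter 1 turns the 6 x 6 image into a 4 x 4 grid of outputs. Its first row is 3, -1, -3, -1 and its last row is 3, -2, -2, -1, so the same filter gives its highest response, 3, at two different positions.] (Property 2)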

CNN – Convolution
Do the same process for every filter (stride=1).
[Figure: Filter 2 produces a second 4 x 4 grid of outputs. The outputs of all filters together form the Feature Map: the 6 x 6 image becomes a 4 x 4 image.]
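The sliding-window computation on these slides can be sketched in plain Python. The 6 x 6 binary image and the 3 x 3 filter below are illustrative values assumed for this example (the picture itself did not survive in the transcript), and `convolve` is a hypothetical helper name, not a library function.

```python
# Assumed 6 x 6 binary image with diagonal strokes (illustrative values).
IMAGE = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]

# Assumed 3 x 3 filter that responds to a top-left to bottom-right diagonal.
FILTER_1 = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]


def convolve(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and take inner products."""
    k = len(kernel)
    positions = range(0, len(image) - k + 1, stride)
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in positions
        ]
        for r in positions
    ]


stride_1 = convolve(IMAGE, FILTER_1, stride=1)  # 4 x 4 output
stride_2 = convolve(IMAGE, FILTER_1, stride=2)  # 2 x 2 output

for row in stride_1:
    print(row)
print(stride_2)
```

With these assumed values the filter responds with 3 wherever the diagonal it encodes appears, which is the point of Property 2: one set of weights detects the pattern at every position.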

CNN – Colorful image
[Figure: a color image has several channels, so Filter 1 and Filter 2 are each drawn as a stack of matrices with entries 1 and -1.]

Convolution v.s. Fully Connected
[Figure: the same image is processed by a convolution layer and, after being flattened, by a fully-connected layer.]

[Figure: the 6 x 6 image is flattened into numbered inputs (1, 2, 3, 4, …, 7, 8, 9, 10, …, 13, 14, 15, 16, …). The neuron with output 3 is connected to the image only through the weights of Filter 1.]
Less parameters! Each neuron only connects to 9 inputs, not fully connected.

[Figure: the same flattened 6 x 6 image with two neurons, with outputs 3 and -1, each connected to 9 inputs through Filter 1.]
Less parameters! Each neuron only connects to 9 inputs.
Shared weights: the two neurons use the same Filter 1 weights. Even less parameters!
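One way to see convolution as a pruned, weight-sharing fully connected layer is to list which flattened inputs each neuron is wired to. The sketch below assumes a 6 x 6 image, a 3 x 3 filter with illustrative values, stride 1, and 1-based input numbering as on the slide; `connections` is a hypothetical helper.

```python
# Illustrative 3 x 3 filter (assumed values) on a 6 x 6 image.
FILTER = [[1, -1, -1],
          [-1, 1, -1],
          [-1, -1, 1]]
SIZE, K = 6, 3
OUT = SIZE - K + 1  # 4 output positions per row and per column


def connections(out_row, out_col):
    """Map each flattened input index (1-based) to the weight it is wired with."""
    wiring = {}
    for i in range(K):
        for j in range(K):
            flat_index = (out_row + i) * SIZE + (out_col + j) + 1
            wiring[flat_index] = FILTER[i][j]
    return wiring


first = connections(0, 0)   # neuron looking at the upper-left patch
second = connections(0, 1)  # neuron looking one step to the right

print(sorted(first))   # [1, 2, 3, 7, 8, 9, 13, 14, 15]
print(sorted(second))  # [2, 3, 4, 8, 9, 10, 14, 15, 16]

# Weight counts for one filter's worth of neurons (biases ignored).
fully_connected_weights = (SIZE * SIZE) * (OUT * OUT)  # every neuron sees all 36 inputs
sparse_weights = (K * K) * (OUT * OUT)                 # every neuron sees only 9 inputs
shared_weights = K * K                                 # all neurons reuse one filter
print(fully_connected_weights, sparse_weights, shared_weights)  # 576 144 9
```

Both neurons carry the same nine weights in the same order; only the inputs they attach to differ.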

CNN – Max Pooling
[Figure: the two 4 x 4 outputs produced by Filter 1 and Filter 2 in the convolution step, which Max Pooling takes as its input.]

CNN – Max Pooling
[Figure: 6 x 6 image -> Conv -> Max Pooling -> 2 x 2 image.]
The result is a new image, but smaller. Each filter is a channel.
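Max pooling over non-overlapping 2 x 2 blocks can be sketched as follows. The 4 x 4 feature map is an assumed example input (its values are illustrative), and `max_pool` is a hypothetical helper rather than a library call.

```python
def max_pool(feature_map, pool=2):
    """Keep the largest value in each non-overlapping pool x pool block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(pool) for j in range(pool))
            for c in range(0, cols - pool + 1, pool)
        ]
        for r in range(0, rows - pool + 1, pool)
    ]


# Assumed 4 x 4 output of one filter (illustrative values).
feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]

pooled = max_pool(feature_map)
print(pooled)  # [[3, 0], [3, 1]]
```

Applying the same function to the output of every filter gives one 2 x 2 channel per filter.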

The whole CNN
After Convolution and Max Pooling we get a new image, smaller than the original image. The number of channels is the number of filters.
Convolution and Max Pooling can repeat many times; each round again gives a new image, smaller than the one before.

The whole CNN
Input image -> Convolution -> Max Pooling -> (a new image) -> Convolution -> Max Pooling -> (a new image) -> Flatten -> Fully Connected Feedforward network -> cat, dog, ……

Flatten
[Figure: the values of every channel of the pooled image (3, 1, -1, …) are flattened into one vector and fed to the Fully Connected Feedforward network.]

CNN in Keras
Only modified the network structure and input format (vector -> 3-D tensor)
input -> Convolution -> Max Pooling -> Convolution -> Max Pooling
The first convolution layer has 25 3x3 filters.
Input_shape = ( 1 , 28 , 28 ): the first number is the number of channels (1: black/white, 3: RGB), and 28 x 28 is the number of pixels.

CNN in Keras
Only modified the network structure and input format (vector -> 3-D tensor)
input: 1 x 28 x 28
Convolution: 25 x 26 x 26. How many parameters for each filter? 9
Max Pooling: 25 x 13 x 13
Convolution: 50 x 11 x 11. How many parameters for each filter? 225
Max Pooling: 50 x 5 x 5
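The shapes and per-filter parameter counts on this slide follow from simple arithmetic. The sketch below assumes 3 x 3 filters with stride 1 and no padding, 2 x 2 max pooling, and no bias terms; `conv` and `max_pool` are hypothetical helpers that only track shapes and do not call Keras.

```python
def conv(shape, n_filters, kernel=3):
    """Output shape of a stride-1, unpadded convolution and weights per filter."""
    channels, height, width = shape
    weights_per_filter = channels * kernel * kernel  # bias not counted
    out_shape = (n_filters, height - kernel + 1, width - kernel + 1)
    return out_shape, weights_per_filter


def max_pool(shape, pool=2):
    """Output shape of non-overlapping pool x pool max pooling."""
    channels, height, width = shape
    return (channels, height // pool, width // pool)


x = (1, 28, 28)
conv1, weights1 = conv(x, n_filters=25)       # (25, 26, 26), 9 weights per filter
pool1 = max_pool(conv1)                       # (25, 13, 13)
conv2, weights2 = conv(pool1, n_filters=50)   # (50, 11, 11), 225 weights per filter
pool2 = max_pool(conv2)                       # (50, 5, 5)
flattened = pool2[0] * pool2[1] * pool2[2]    # 1250

print(conv1, weights1)
print(pool1)
print(conv2, weights2)
print(pool2, flattened)
```

The second layer's filters have 225 weights rather than 9 because each one spans all 25 input channels: 25 x 3 x 3 = 225.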

CNN in Keras
Only modified the network structure and input format (vector -> 3-D tensor)
input (1 x 28 x 28) -> Convolution (25 x 26 x 26) -> Max Pooling (25 x 13 x 13) -> Convolution (50 x 11 x 11) -> Max Pooling (50 x 5 x 5) -> Flatten (1250) -> Fully Connected Feedforward network -> output

Live Demo

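What does CNN learn?
input -> Convolution (25 3x3 filters) -> Max Pooling -> Convolution (50 3x3 filters, output 50 x 11 x 11) -> Max Pooling
The output of the k-th filter is a 11 x 11 matrix.
Degree of the activation of the k-th filter: $a^k = \sum_{i=1}^{11} \sum_{j=1}^{11} a^k_{ij}$
$x^* = \arg\max_x a^k$ (gradient ascent)
The first convolution layer has 25 filters of size 3 x 3. The input to the second layer is no longer raw pixels, and the region of the image that each of its filters covers is larger than 3 x 3. Each of these filters is a cube. Take out the k-th filter of the second layer and write its output values as $a^k_{ij}$, where k is the filter index and i, j are the row and column. Then define a degree of activation $a^k$ that says how strongly the k-th filter is activated, in other words how well the current input matches that filter; it is the sum of all the $a^k_{ij}$. To find out what the k-th filter detects, we look for the image x that activates it the most. This can be done with gradient ascent, the reverse of the gradient descent used in training.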
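A toy version of this search can be written without any deep learning library. Everything here is an assumption made for illustration: a single hand-written 3 x 3 filter with a ReLU stands in for the k-th filter of a trained network, the image is 8 x 8, and pixel values are clipped to [0, 1] so the ascent stays bounded.

```python
import random

# Illustrative 3 x 3 filter (assumed values, not weights from a trained network).
FILTER = [[1, -1, -1],
          [-1, 1, -1],
          [-1, -1, 1]]
SIZE, K = 8, 3
OUT = SIZE - K + 1


def filter_output(x):
    """ReLU of the stride-1 convolution of image x with FILTER."""
    return [
        [
            max(0.0, sum(FILTER[i][j] * x[r + i][c + j]
                         for i in range(K) for j in range(K)))
            for c in range(OUT)
        ]
        for r in range(OUT)
    ]


def degree_of_activation(x):
    """a^k: the sum of every entry of the filter's output."""
    return sum(sum(row) for row in filter_output(x))


def gradient(x):
    """Gradient of a^k with respect to x: active outputs pass FILTER back to their patch."""
    grad = [[0.0] * SIZE for _ in range(SIZE)]
    out = filter_output(x)
    for r in range(OUT):
        for c in range(OUT):
            if out[r][c] > 0:
                for i in range(K):
                    for j in range(K):
                        grad[r + i][c + j] += FILTER[i][j]
    return grad


def gradient_ascent(x, steps=50, lr=0.1):
    """Move x uphill on a^k, keeping every pixel inside [0, 1]."""
    for _ in range(steps):
        g = gradient(x)
        x = [[min(1.0, max(0.0, x[r][c] + lr * g[r][c])) for c in range(SIZE)]
             for r in range(SIZE)]
    return x


random.seed(0)
start = [[random.random() for _ in range(SIZE)] for _ in range(SIZE)]
result = gradient_ascent(start)
print(degree_of_activation(start), degree_of_activation(result))
```

In a real network the gradient with respect to the input image would come from backpropagation through all the layers below the filter, not from a hand-written formula like the one in `gradient`.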

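What does CNN learn?
input -> Convolution (25 3x3 filters) -> Max Pooling -> Convolution (50 3x3 filters, output 50 x 11 x 11) -> Max Pooling
The output of the k-th filter is a 11 x 11 matrix. Degree of the activation of the k-th filter: $a^k = \sum_{i=1}^{11} \sum_{j=1}^{11} a^k_{ij}$, with $x^* = \arg\max_x a^k$ found by gradient ascent.
[Figure: the image found for each filter.]
Here are the results for 12 of the filters. Each image is some texture repeated over and over. The third image, for example, is slanted stripes, so the third filter detects whether slanted stripes are present. The pattern repeats across the image because that is what makes the degree of activation largest.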

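What does CNN learn?
input -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> flatten
Find an image maximizing the output of neuron: $x^* = \arg\max_x a^j$
[Figure: nine images; each figure corresponds to a neuron.]
As before, pick a neuron with output $a^j$ and find the image that activates it the most; nine such images are shown. A neuron after the flatten step looks at the whole image, so the result is no longer a repeated texture but a complete picture. These neurons detect larger patterns, as the fourth figure shows.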

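What does CNN learn?
input -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> flatten -> output
Can we see digits?
[Figure: the images found for the digit outputs.]
Now do the same for the output layer: find the image that makes the output for one digit as large as possible, hoping it will look like that digit. Instead, the images look like this. What the CNN has learned is not quite what a person learns.
Deep Neural Networks are Easily Fooled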

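What does CNN learn?
$x^* = \arg\max_x \left( y - \sum_{i,j} |x_{ij}| \right)$, where y is the output for the chosen digit and the sum runs over all pixel values.
[Figure: the images found for the digit outputs, without and with the constraint.]
In the figures, the white parts are where ink has been drawn, so ink should not appear all over the image. We therefore constrain x: take the absolute value of every pixel and add them up, then look for an x that makes y as large as possible while keeping that sum as small as possible, so that most of the image has no strokes. The result is shown here. Something is faintly recognizable, for example the 6 looks a little like a 6. Adding further constraints might give better results.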

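Deep Dream
Given a photo, machine adds what it sees ……
photo -> CNN -> modify image
The idea above is what Deep Dream does: it adds to the picture whatever the network sees in it. Feed the photo into the CNN and take the output of some hidden layer. Make every positive value larger and every negative value smaller, then use the result as the new target for the image. In other words, CNN exaggerates what it sees.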

Deep Dream Given a photo, machine adds what it sees ……

Deep Style Given a photo, make its style like famous paintings

Deep Style Given a photo, make its style like famous paintings
The result is as follows.

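Deep Style
[Figure: the photo goes through a CNN to give the content; the painting goes through a CNN to give the style; a third CNN is used to find the unknown image "?" that matches both.]
The idea is as follows. Feed the original photo to the CNN and take the values of its filter outputs. Feed the painting, The Scream, to the CNN as well, but instead of the output values themselves, look at the correlation between one filter's output and another's. Then look for an image whose filter output values resemble those of the photo on the left and whose correlations between filter outputs resemble those of the painting on the right, so that the content comes from the left and the style from the right. The image that matches both at once as well as possible is the result.
A Neural Algorithm of Artistic Style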
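The two quantities being matched can be sketched with toy numbers. This assumes that the correlation between filters is measured with a Gram matrix (dot products between flattened filter outputs) and that matching is expressed as a loss to minimize; the feature values, the helper names, and `style_weight` are all made up for the illustration, and no real CNN is involved.

```python
def flatten(grid):
    """Turn a 2-D list into a flat list of values."""
    return [v for row in grid for v in row]


def gram_matrix(feature_maps):
    """Entry [a][b] is the dot product of filter a's output with filter b's output."""
    flat = [flatten(fm) for fm in feature_maps]
    return [[sum(p * q for p, q in zip(fa, fb)) for fb in flat] for fa in flat]


def squared_distance(a, b):
    """Sum of squared differences between two equally shaped 2-D lists."""
    return sum((p - q) ** 2 for p, q in zip(flatten(a), flatten(b)))


def total_loss(candidate, content, style, style_weight=1.0):
    """Small when the candidate keeps the content values and the style correlations."""
    content_loss = sum(squared_distance(c, t) for c, t in zip(candidate, content))
    style_loss = squared_distance(gram_matrix(candidate), gram_matrix(style))
    return content_loss + style_weight * style_loss


# Toy 2 x 2 outputs of two filters for the photo and for the painting (assumed values).
content_features = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
style_features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]

print(gram_matrix(style_features))
print(total_loss(content_features, content_features, style_features))
```

In the actual method the candidate image itself is updated by gradient descent on a loss of this kind, with the features recomputed by the CNN at every step.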

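More Application: Playing Go
Network input: 19 x 19 vector (19 x 19 positions). Black: 1, white: -1, none: 0. Network output: next move, also a 19 x 19 vector.
Fully-connected feedforward network can be used, but CNN performs much better. For the CNN the input is a 19 x 19 matrix (image).
Why can a CNN be used to play Go? A CNN is not actually required; any neural network will do as long as its input is the 19 x 19 board and its output is too. Feed in a board, with black as 1, white as -1 and empty points as 0, and the network outputs the position of the next move. That is all. But a CNN works better: treat the board as an image, since a 19 x 19 board is a 19 x 19 matrix.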

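More Application: Playing Go
Training: record of previous plays, for example Black: 5之五, White: 天元, Black: 五之5, …
The CNN takes the current board as input. For the first position the target is “天元” = 1, else = 0; for the next position the target is “五之5” = 1, else = 0.
This is supervised learning.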

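Why CNN for playing Go?
Some patterns are much smaller than the whole image. Alpha Go uses 5 x 5 for first layer.
The same patterns appear in different regions.
To apply a CNN, the data should have the properties an image has. Go shares some of them. Some patterns are much smaller than the whole picture: to find a beak you only need to look at the bird's head, and in Go a pattern such as atari (叫吃) can be recognized from a small area. Alpha Go uses 5 x 5 filters in its first layer, which assumes that the basic patterns of Go fit in a 5 x 5 region. The same pattern also appears in different places: an atari can occur in the upper left or the lower right and means the same thing, so the same detector can be used for both.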

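Why CNN for playing Go?
Subsampling the pixels will not change the object -> Max Pooling. How to explain this???
Alpha Go does not use Max Pooling …… A very good example for designing your network.
The third property, subsampling, which is the reason for max pooling, is hard to justify for Go. Some people have said that Alpha Go must have a weakness that can be attacked because it uses max pooling, but it clearly does not use it. Look at the Alpha Go architecture: every position is described by 48 values, including black or white, atari, and other state information. The first layer zero-pads the input to 23 x 23 and then applies 5 x 5 filters, followed by hidden layers 2 to 12. There is no max pooling anywhere! So the architecture should be designed according to the application.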
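The board encoding and the first-layer padding can be sketched in a few lines. This assumes the single-plane encoding described earlier (black 1, white -1, empty 0) instead of Alpha Go's 48 values per position, a stride of 1, and hypothetical helper names `encode` and `zero_pad`.

```python
BOARD = 19   # a Go board has 19 x 19 positions
PAD = 2      # zeros added on every side: 19 + 2 * 2 = 23
KERNEL = 5   # 5 x 5 filters in the first layer


def encode(stones):
    """stones maps (row, col) to 'black' or 'white'; empty points become 0."""
    value = {"black": 1, "white": -1}
    board = [[0] * BOARD for _ in range(BOARD)]
    for (row, col), color in stones.items():
        board[row][col] = value[color]
    return board


def zero_pad(board, pad):
    """Surround the board with pad rows and columns of zeros."""
    width = len(board[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + row + [0] * pad for row in board]
    bottom = [[0] * width for _ in range(pad)]
    return top + middle + bottom


board = encode({(9, 9): "white", (4, 4): "black"})
padded = zero_pad(board, PAD)

# A stride-1 5 x 5 filter on the 23 x 23 input gives back a 19 x 19 output,
# so no board position is lost and no subsampling takes place.
out_size = len(padded) - KERNEL + 1
print(len(padded), len(padded[0]), out_size)  # 23 23 19
```

Keeping the output at 19 x 19 fits the observation that every point on the board matters, which is one way to read the absence of max pooling.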

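More Application: Speech
[Figure: a spectrogram treated as an image and fed to a CNN. Horizontal axis: Time; vertical axis: Frequency.]
The filters move in the frequency direction.
CNNs are for image processing, and this is sound. In the spectrogram the horizontal axis is time, the vertical axis is frequency, and the color shows the energy; this one is actually the phrase “你好”. When the machine has to recognize it, the filters basically only move along the frequency direction. Moving them along the time direction does not help much, because other modules that handle time come after the CNN in a speech system. When a man and a woman both say “你好”, the frequencies differ but the energy pattern is the same. Again, the design depends on the characteristics of the application.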

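More Application: Text
?
Processing text: the input is a sequence of words and the task is to decide whether the sentiment is positive or negative. Each word is represented by a vector, and words with closer meanings have more similar vectors. Lining up the vectors, one per word, gives an image. Each dimension of the vectors is independent.
Source of image: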

To learn more ……
The methods of visualization in these slides: networks-see-the-world.html
More about visualization
Very cool CNN visualization toolkit
The 9 Deep Learning Papers You Need To Know About: The-9-Deep-Learning-Papers-You-Need-To-Know- About.html

To learn more ……
How to let machine draw an image:
PixelRNN
Variational Autoencoder (VAE)
Generative Adversarial Network (GAN)
