Self-Attention
Motivation: A plain CNN that stacks convolutional layers cannot model long-range dependencies well.
The non-local operation: for each position in a feature map, take a weighted sum of the transformed features of all positions, normalize it, and use the result as that position's new feature. Self-attention: enhance the current position using the other positions in the same image.
Formulation (the non-local operation):

$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$

- $x$: input feature
- $i$: index of an output position
- $j$: index that enumerates all possible positions
- $f$: computes a pairwise relationship between positions $i$ and $j$
- $g$: computes a representation of $x_j$
- $\mathcal{C}(x)$: normalization factor
Instantiation (choice of $f$): the embedded dot-product version, $f(x_i, x_j) = \theta(x_i)^{\top} \phi(x_j)$, with normalization $\mathcal{C}(x) = N$, the number of positions.
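To make this instantiation concrete, here is a minimal PyTorch sketch of a non-local block with dot-product affinity and $\mathcal{C}(x) = N$. The class name, the channel-halving bottleneck, and the residual connection follow common choices from the Non-local Networks paper, but this is an illustrative sketch rather than the authors' code.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal non-local block, embedded dot-product version (sketch)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)   # query embedding
        self.phi = nn.Conv2d(channels, reduced, 1)     # key embedding
        self.g = nn.Conv2d(channels, reduced, 1)       # value embedding
        self.out = nn.Conv2d(reduced, channels, 1)     # restore channel count

    def forward(self, x):
        n, c, h, w = x.shape
        s = h * w                                      # number of positions N
        q = self.theta(x).flatten(2).transpose(1, 2)   # n x s x c'
        k = self.phi(x).flatten(2)                     # n x c' x s
        v = self.g(x).flatten(2).transpose(1, 2)       # n x s x c'
        affinity = q @ k                               # f(x_i, x_j): n x s x s
        y = (affinity / s) @ v                         # weighted sum, C(x) = N
        y = y.transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)                         # residual connection
```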
Core idea: first gather key features from the entire space into a compact set, then distribute them to each location adaptively.
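A sketch of this gather-then-distribute idea in PyTorch, following the A²-Nets double-attention block; the names theta/phi/rho, the 1×1 projections, and the residual connection are my assumptions for illustration, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleAttention(nn.Module):
    """Minimal double-attention (A^2) block: gather, then distribute (sketch)."""
    def __init__(self, channels, c_m, c_n):
        super().__init__()
        self.theta = nn.Conv2d(channels, c_m, 1)   # features to gather
        self.phi = nn.Conv2d(channels, c_n, 1)     # gathering attention maps
        self.rho = nn.Conv2d(channels, c_n, 1)     # distribution attention maps
        self.out = nn.Conv2d(c_m, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        A = self.theta(x).flatten(2)                    # n x c_m x s
        B = F.softmax(self.phi(x).flatten(2), dim=-1)   # softmax over space
        D = F.softmax(self.rho(x).flatten(2), dim=1)    # softmax over descriptors
        # Gather: c_n global descriptors, each of dimension c_m
        G = A @ B.transpose(1, 2)                       # n x c_m x c_n
        # Distribute: each location adaptively reads from the descriptors
        Z = G @ D                                       # n x c_m x s
        return x + self.out(Z.reshape(n, -1, h, w))     # residual connection
```

The gather step compresses s locations into c_n descriptors; the distribute step is a second softmax attention that lets each location read an adaptive mixture of those descriptors.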
Method
Computational graph
Comparison (Chen Y., Rohrbach M., Yan Z., et al., Graph-Based Global Reasoning Networks, CVPR 2019)
Experiments: 5 extra A²-blocks at Res3 and Res4
6.5 GFLOPs and 33.0 M parameters
Experiments (Chen Y., Rohrbach M., Yan Z., et al., Graph-Based Global Reasoning Networks, CVPR 2019)
Method
A Generic Formulation of Self-Attention
X: feature maps as a matrix of size s × c
K(·): key function, Q(·): query function, V(·): value function, implemented as linear layers:

$$S = XK\,(XQ)^{\top}\,XV$$

K, Q: c × b matrices; V: c × c matrix
X: feature maps as a matrix of size s × c; K, Q: c × b matrices; V: c × c matrix

Left associativity: $S = \left(XK\,(XQ)^{\top}\right) XV$
- $XK\,(XQ)^{\top}$: (s × b)(b × s) = s × s, the similarity between all pairs of spatial locations (the idea of Non-local)
- $\left(XK\,(XQ)^{\top}\right) XV$: (s × s)(s × c) = s × c
- Complexity: s · b · s + s · s · c = s²(b + c)

Right associativity: $S = XK\left((XQ)^{\top} XV\right)$
- $(XQ)^{\top} XV$: (b × s)(s × c) = b × c, which gives b global descriptors of dimension c (the idea of Double Attention)
- $XK\left((XQ)^{\top} XV\right)$: (s × b)(b × c) = s × c
- Complexity: b · s · c + s · b · c = 2sbc
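A short numerical check of this associativity argument (a sketch; the sizes s, c, b are arbitrary and the variable names mirror the slide's notation): both groupings give the same S up to floating-point error, but the intermediate shapes and costs differ.

```python
import torch

s, c, b = 1024, 256, 64                     # positions, channels, descriptors
X = torch.randn(s, c)
K, Q = torch.randn(c, b), torch.randn(c, b)
V = torch.randn(c, c)

# Left associativity: build the s x s affinity matrix first (Non-local style)
left = ((X @ K) @ (X @ Q).T) @ (X @ V)      # cost ~ s^2 (b + c)

# Right associativity: build b global descriptors first (Double Attention style)
right = (X @ K) @ ((X @ Q).T @ (X @ V))     # cost ~ 2 s b c

rel_err = (left - right).abs().max() / right.abs().max()
print(f"max relative error: {rel_err:.2e}")  # tiny: same result, different cost
print(s * s * (b + c), 2 * s * b * c)        # multiply-add counts for each order
```

With s much larger than b and c, the right-associated order is far cheaper, which is exactly the saving that Double Attention exploits.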
Framework
Experiments
Comparison with Non-local
Criss-cross attention module

Figure: the input feature map (H × W × C1) is projected to query and key maps (H × W × C2, reduced channels) and a value map (H × W × C1); the attention map has shape (H+W−1) × H × W.

For each position u:
- Q_u: C2 × 1, the query vector at u
- Ω_u: (H+W−1) × C2, the key vectors on u's criss-cross path (its row and column)
- Φ_u: (H+W−1) × C1, the value vectors on u's criss-cross path

The affinities between Q_u and Ω_u are softmax-normalized into attention weights that aggregate Φ_u, and the result is added back to the input feature at u.
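A compact PyTorch sketch of one criss-cross attention pass. The C/8 channel reduction, the -inf diagonal mask that keeps position u from being counted twice, and the zero-initialized gamma follow the official CCNet design; the einsum formulation is my simplification for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Each position attends to the H+W-1 positions in its row and column (sketch)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 8
        self.query = nn.Conv2d(channels, reduced, 1)   # H x W x C2
        self.key = nn.Conv2d(channels, reduced, 1)     # H x W x C2
        self.value = nn.Conv2d(channels, channels, 1)  # H x W x C1
        self.gamma = nn.Parameter(torch.zeros(1))      # learnable residual scale

    def forward(self, x):
        n, _, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Affinity of each position (h, w) with every position in its
        # column (g, w) and in its row (h, g).
        e_col = torch.einsum('nchw,ncgw->nghw', q, k)   # n x H x H x W
        e_row = torch.einsum('nchw,nchg->nghw', q, k)   # n x W x H x W

        # Mask the column diagonal so u itself is only counted once (H+W-1 total).
        diag = torch.eye(h, dtype=torch.bool, device=x.device).view(1, h, h, 1)
        e_col = e_col.masked_fill(diag, float('-inf'))

        # Joint softmax over the criss-cross path of each position.
        attn = F.softmax(torch.cat([e_col, e_row], dim=1), dim=1)
        a_col, a_row = attn[:, :h], attn[:, h:]

        out = torch.einsum('nghw,ncgw->nchw', a_col, v) \
            + torch.einsum('nghw,nchg->nchw', a_row, v)
        return x + self.gamma * out                     # residual connection
```

Applying the module recurrently, e.g. x = cca(cca(x)), gives the two loops discussed next.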
Why 2 loops: a single criss-cross pass only propagates information along each position's own row and column; a second pass lets every position receive information from the whole image, since any two positions are connected by at most two criss-cross hops.
Experiments
Assessment: Using Non-local directly (or with a light wrapper?) does not add many parameters, but accuracy may not improve much. Alternatively, analyzing and reducing the complexity of Non-local, and understanding its computation graph from another direction, is the more insightful contribution.