Self-Attention huitr 2019.03.16.


1 Self-Attention huitr

2

3 Motivation Plain CNNs that stack convolutional layers cannot capture long-range dependencies well.
The paper proposes a non-local operation: for each pixel in the feature map, take a weighted sum of the transformed features of all other pixels, normalize it, and use the result as that pixel's new feature. This is self-attention: other pixels in the same image are used to enhance the current pixel.

4 Formulation i: index of an output position; X: input feature
j: index that enumerates all possible positions; f: computes the pairwise relationship between i and j; g: computes a representation of X; C(X): normalization factor. The non-local operation is y_i = (1/C(X)) Σ_j f(x_i, x_j) g(x_j).

5 Instantiation Choice of f: embedded dot-product version, f(x_i, x_j) = θ(x_i)ᵀφ(x_j).
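A vectorized sketch of this instantiation (shapes are my assumptions; θ, φ, g are the usual 1×1-conv embeddings, written here as plain matrices, and for the dot-product version the normalization C(x) is simply the number of positions N):

```python
import numpy as np

def embedded_dot_product(x, w_theta, w_phi, w_g):
    """Embedded dot-product instantiation: f(x_i, x_j) = theta(x_i)^T phi(x_j).
    x: (n, c); w_theta, w_phi: (c, d); w_g: (c, c)."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    scores = theta @ phi.T            # (n, n) pairwise relations f(x_i, x_j)
    n = x.shape[0]
    return (scores / n) @ g           # C(x) = N for the dot-product version
```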

6

7 Core idea First gather key features from the entire space into a compact set, then distribute them to each location adaptively.
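A minimal sketch of this gather-then-distribute idea (the shapes and softmax placement are my assumptions for illustration, not the official A²-Net code):

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gather_distribute(x, w_gather, w_distr, w_v):
    """x: (s, c) flattened spatial features; w_gather, w_distr: (c, b); w_v: (c, c)."""
    gather_w = softmax(x @ w_gather, axis=0)   # (s, b): where to gather from, per descriptor
    distr_w = softmax(x @ w_distr, axis=1)     # (s, b): how each location reads descriptors back
    descriptors = gather_w.T @ (x @ w_v)       # (b, c): compact set of global features
    return distr_w @ descriptors               # (s, c): adaptively distributed to each location
```

Note the bottleneck: all global information is squeezed into b descriptors of dimension c, so the cost no longer scales with s².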

8 Method

9 Computational graph

10 Comparison Chen Y., Rohrbach M., Yan Z., et al. Graph-Based Global Reasoning Networks.

11 Experiments 5 extra A2-blocks at Res3 and Res4
6.5 GFLOPs and 33.0 M parameters

12 Experiments Chen Y., Rohrbach M., Yan Z., et al. Graph-Based Global Reasoning Networks.

13

14 Method

15 A Generic Formulation of Self-Attention
X: feature maps as a matrix of size s × c. K(X): key function; Q(X): query function; V(X): value function; each implemented as a linear layer, so K, Q are c × b matrices and V is a c × c matrix. S = (XK)(XQ)ᵀ(XV).

16 X: feature maps as a matrix of size s × c; K, Q: c × b matrices; V: c × c matrix
Left associativity: S = [(XK)(XQ)ᵀ](XV). (XK)(XQ)ᵀ: (s × b)(b × s) = s × s, which can be read as the similarity between every pair of spatial locations, i.e. the Non-local idea. Then [(XK)(XQ)ᵀ](XV): (s × s)(s × c) = s × c.
Right associativity: S = (XK)[(XQ)ᵀ(XV)]. (XQ)ᵀ(XV): (b × s)(s × c) = b × c, which can be read as b global descriptors of dimension c, i.e. the Double Attention idea. Then (XK)[(XQ)ᵀ(XV)]: (s × b)(b × c) = s × c.

17 X: feature maps as a matrix of size s × c; K, Q: c × b matrices; V: c × c matrix
Left associativity: S = [(XK)(XQ)ᵀ](XV). (XK)(XQ)ᵀ: (s × b)(b × s) = s × s, the similarity between all spatial locations (the Non-local idea); then (s × s)(s × c) = s × c. Complexity: s × b × s + s × s × c = s²(b + c).
Right associativity: S = (XK)[(XQ)ᵀ(XV)]. (XQ)ᵀ(XV): (b × s)(s × c) = b × c, b global descriptors of dimension c (the Double Attention idea); then (s × b)(b × c) = s × c. Complexity: b × s × c + s × b × c = 2sbc.
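Because matrix multiplication is associative, both groupings give the same S; only the cost differs. A quick NumPy check under the shapes above (the particular s, b, c values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
s, c, b = 64, 32, 8
X = rng.standard_normal((s, c))
K, Q = rng.standard_normal((c, b)), rng.standard_normal((c, b))
V = rng.standard_normal((c, c))

left = ((X @ K) @ (X @ Q).T) @ (X @ V)    # Non-local grouping: s x s affinity first, cost s^2(b+c)
right = (X @ K) @ ((X @ Q).T @ (X @ V))   # Double-Attention grouping: b x c descriptors first, cost 2sbc
assert np.allclose(left, right)           # same S either way
```

With b, c ≪ s (high-resolution feature maps), 2sbc is far cheaper than s²(b + c), which is exactly the point of the right-associative reading.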

18

19 Framework

20 Experiments

21

22 Comparison with Non-local

23 Criss-cross attention module
Figure: the H × W × C1 input is projected to query and key maps of shape H × W × C2 and a value map of shape H × W × C1; the attention map has shape (H+W-1) × H × W.

24 Criss-cross attention module
๐‘ธ ๐’– : C2 x 1 H x W x C2 (H+W-1) x H x W H x W x C1 H x W x C2 H x W x C1

25 Criss-cross attention module
๐‘ธ ๐’– : C2 x 1 H x W x C2 (H+W-1) x H x W H x W x C1 H x W x C2 ๐›€ ๐’– : (H+W-1) x C2 H x W x C1

26 Criss-cross attention module
๐‘ธ ๐’– : C2 x 1 H x W x C2 (H+W-1) x H x W H x W x C1 H x W x C1 H x W x C2 ๐›€ ๐’– : (H+W-1) x C2 H x W x C1 H x W x C1 ๐šฝ ๐’– : (H+W-1) x C1

27 Why 2 loops After a single criss-cross pass, each position has only aggregated information from its own row and column; a second pass propagates that information one step further, so every position can receive context from the full image.

28 Experiments

29 Assessment Using Non-local directly (or with a light repackaging) is not very novel, but if it pushes accuracy up, that may be enough.
Analyzing and reducing Non-local's complexity, and understanding the computational graph from another direction, shows more insight.

