Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multi-threaded Algorithm 3

Similar presentations


Presentation on theme: "Multi-threaded Algorithm 3"โ€” Presentation transcript:

1 Multi-threaded Algorithm 3
Michael Tsai 2014/1/2

2 Multithreaded matrix multiplication
P-SQUARE-MATRIX-MULTIPLY(A,B) n=A.rows let C be a new n x n matrix parallel for i=1 to n parallel for j=1 to n ๐‘ ๐‘–๐‘— =0 for k=1 to n ๐‘ ๐‘–๐‘— = ๐‘ ๐‘–๐‘— + ๐‘Ž ๐‘–๐‘˜ โˆ™ ๐‘ ๐‘˜๐‘— return C ๐‘‡ 1 (๐‘›)=ฮ˜ ๐‘› 3 ๐‘‡ โˆž ๐‘› =ฮ˜ log n +ฮ˜ log ๐‘› +ฮ˜ ๐‘› =ฮ˜(๐‘›) ๐‘‡ 1 ๐‘› ๐‘‡ โˆž ๐‘› = ฮ˜ ๐‘› 3 ฮ˜ ๐‘› =ฮ˜ ๐‘› 2

3 Divide-and-conquer Multithreaded Algorithm for Matrix Multiplication (ๅ‹’ๅ‹’้•ท)
๐ด= ๐ด 11 ๐ด 12 ๐ด 21 ๐ด 22 ๐ต= ๐ต 11 ๐ต 12 ๐ต 21 ๐ต 22 ๐ถ= ๐ถ 11 ๐ถ 12 ๐ถ 21 ๐ถ 22 = ๐ด 11 ๐ด 12 ๐ด 21 ๐ด ๐ต 11 ๐ต 12 ๐ต 21 ๐ต = ๐ด 11 ๐ต 11 ๐ด 12 ๐ต 12 ๐ด 21 ๐ต 11 ๐ด 22 ๐ต ๐ด 12 ๐ต ๐ด 12 ๐ต 22 ๐ด 22 ๐ต ๐ด 22 ๐ต 22 C T

4 ๐‘€ 1 ๐‘› ๐‘€ โˆž ๐‘› = ฮ˜ ๐‘› 3 ฮ˜ log 2 ๐‘› =ฮ˜ ๐‘› 3 log 2 ๐‘›
P-MATRIX-MULTIPLY-RECURSIVE(C,A,B) n=A.rows if n==1 ๐‘ 11 = ๐‘Ž 11 ๐‘ 11 else let T be a new n x n matrix partition A,B,C, and T into n/2 x n/2 submatrices ( ๐ด ๐‘–๐‘— , ๐ต ๐‘–๐‘— , ๐ถ ๐‘–๐‘— , ๐‘‡ ๐‘–๐‘— , ๐‘–,๐‘—={1,2}) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐ถ 11 , ๐ด 11 , ๐ต 11 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐ถ 12 , ๐ด 11 , ๐ต 12 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐ถ 21 , ๐ด 21 , ๐ต 11 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐ถ 22 , ๐ด 21 , ๐ต 12 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐‘‡ 11 , ๐ด 12 , ๐ต 21 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐‘‡ 12 , ๐ด 12 , ๐ต 22 ) spawn P-MATRIX-MULTIPLY-RECURSIVE( ๐‘‡ 21 , ๐ด 22 , ๐ต 21 ) P-MATRIX-MULTIPLY-RECURSIVE( ๐‘‡ 22 , ๐ด 22 , ๐ต 22 ) sync parallel for i=1 to n parallel for j=1 to n ๐‘ ๐‘–๐‘— = ๐‘ ๐‘–๐‘— + ๐‘ก ๐‘–๐‘— ๐‘€ 1 ๐‘› =8 ๐‘€ 1 ๐‘› 2 +ฮ˜ ๐‘› 2 =ฮ˜ ๐‘› 3 ๐‘€ 1 ๐‘› ๐‘€ โˆž ๐‘› = ฮ˜ ๐‘› 3 ฮ˜ log 2 ๐‘› =ฮ˜ ๐‘› 3 log 2 ๐‘› ๐‘€ โˆž ๐‘› = ๐‘€ โˆž ๐‘› 2 +ฮ˜ log ๐‘› +ฮ˜ log ๐‘› =ฮ˜ log 2 ๐‘›

5 How about Strassenโ€™s method?
Reading assignment: p Parallelism: ฮ˜( ๐‘› log log 2 ๐‘› ), slightly less than the original recursive version!

6 Multithreaded Merge Sort
A: Array to be sorted p and r: Start and end index of the range to be sorted Merge-Sortโ€™(A,p,r) if p<r q= (๐‘+๐‘Ÿ)/2 spawn MERGE-SORTโ€™(A,p,q) MERGE-SORTโ€™(A,q+1,r) sync MERGE(A,p,q,r) Merge() is the bottleneck! ฮ˜(๐‘›) ๐‘€ ๐‘† 1 โ€ฒ ๐‘› =2๐‘€ ๐‘† 1 โ€ฒ ๐‘› 2 +ฮ˜ ๐‘› =ฮ˜ ๐‘› log ๐‘› Parallelism= ฮ˜( log ๐‘› ) ๐‘€ ๐‘† โˆž โ€ฒ ๐‘› =๐‘€ ๐‘† โˆž โ€ฒ ๐‘› 2 +ฮ˜ ๐‘› =ฮ˜ ๐‘›

7 ๆŠŠMergeไนŸ็”จDivide & Conquerไพ†่งฃ!
(ๆ–นไพฟไบค็ตฆไธๅŒ็š„threadๅŽปๅš) 2. ๆ‰พๅ‡บ ๐‘ 2 ~ ๐‘Ÿ 2 ไธญ ๐‘ž 2 ้€™ๅ€‹ไฝ็ฝฎไฝฟๅพ— ๐‘ 2 ~ ๐‘ž 2 โˆ’1้ƒฝ< x, ๐‘ž 2 ~ ๐‘Ÿ 2 ้ƒฝโ‰ฅ x 1. ๆŒ‘ๅ‡บ ๐‘ 1 ~ ๐‘Ÿ 1 ็š„ไธญไฝๆ•ธx 3. Copy x ๅˆฐๆ–ฐๅœฐๆ–น(ๆ นๆ“šxๅ’Œ ๐‘ž 2 ็š„ไฝ็ฝฎ) Base case: ๐‘ 1 ~ ๐‘Ÿ 1 ๅ’Œ ๐‘ 2 ~ ๐‘Ÿ 2 ้ƒฝๆ˜ฏ็ฉบ็š„ 4. ้–‹ๅˆ†่บซๅŽปmerge ๐‘ 1 ~ ๐‘ž 1 โˆ’1ๅ’Œ ๐‘ 2 ~ ๐‘ž 2 โˆ’1, ๅŠ ๐‘ž 1 +1~ ๐‘Ÿ 1 ๅ’Œ ๐‘ž 2 ~ ๐‘Ÿ 2 ๅ…ฉๅ€‹ๅ€ๆฎต

8 Multithreaded Merge P-MERGE(T, ๐‘ 1 , ๐‘Ÿ 1 , ๐‘ 2 , ๐‘Ÿ 2 ,A, ๐‘ 3 ) ๐‘› 1 = ๐‘Ÿ 1 โˆ’ ๐‘ 1 +1 ๐‘› 2 = ๐‘Ÿ 2 โˆ’ ๐‘ 2 +1 if ๐‘› 1 < ๐‘› 2 exchange ๐‘ 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘ 2 exchange ๐‘Ÿ 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘Ÿ 2 exchange ๐‘› 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘› 2 if ๐‘› 1 ==0 return else ๐‘ž 1 = ( ๐‘ 1 + ๐‘Ÿ 1 )/2 ๐‘ž 2 =BINARY-SEARCH(T[ ๐‘ž 1 ],T, ๐‘ 2 , ๐‘Ÿ 2 ) ๐‘ž 3 = ๐‘ 3 + ๐‘ž 1 โˆ’ ๐‘ 1 +( ๐‘ž 2 โˆ’ ๐‘ 2 ) A[ ๐‘ž 3 ]=T[ ๐‘ž 1 ] spawn P-MERGE(T, ๐‘ 1 , ๐‘ž 1 โˆ’1, ๐‘ 2 , ๐‘ž 2 โˆ’1,A, ๐‘ 3 ) P-MERGE(T, ๐‘ž 1 +1, ๐‘Ÿ 1 , ๐‘ž 2 , ๐‘Ÿ 2 ,A, ๐‘ž 3 +1) sync T: Array to be merged A: Array to save the merged result ๐‘ 1 , ๐‘Ÿ 1 , ๐‘ 2 , ๐‘Ÿ 2 : Start and end index of the range to be merged ๐‘ 3 : The index of the median in A

9 ๐‘› 1 โ‰ฅ ๐‘› 2 , ๐‘›= ๐‘› 1 + ๐‘› 2 ๐‘› 2 = 2 ๐‘› 2 2 โ‰ค ๐‘› 1 + ๐‘› 2 2 = ๐‘› 2 ๆœ€็ณŸ็š„็‹€ๆณไธ‹, xๆฏ”ๆ‰€ๆœ‰ ๐‘ 2 ~ ๐‘Ÿ 2 ็š„้ƒฝๅคง ่ฆmerge ๐‘› ๐‘› 2 ๅ€‹ๅ…ƒ็ด  ๐‘› ๐‘› 2 โ‰ค ๐‘› ๐‘› ๐‘› 2 2 = ๐‘› 1 + ๐‘› ๐‘› 2 2 โ‰ค ๐‘› 2 + ๐‘› 4 = 3๐‘› 4

10 Multithreaded Merge P-MERGE(T, ๐‘ 1 , ๐‘Ÿ 1 , ๐‘ 2 , ๐‘Ÿ 2 ,A, ๐‘ 3 ) ๐‘› 1 = ๐‘Ÿ 1 โˆ’ ๐‘ 1 +1 ๐‘› 2 = ๐‘Ÿ 2 โˆ’ ๐‘ 2 +1 if ๐‘› 1 < ๐‘› 2 exchange ๐‘ 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘ 2 exchange ๐‘Ÿ 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘Ÿ 2 exchange ๐‘› 1 ๐‘ค๐‘–๐‘กโ„Ž ๐‘› 2 if ๐‘› 1 ==0 return else ๐‘ž 1 = ( ๐‘ 1 + ๐‘Ÿ 1 )/2 ๐‘ž 2 =BINARY-SEARCH(T[ ๐‘ž 1 ],T, ๐‘ 2 , ๐‘Ÿ 2 ) ๐‘ž 3 = ๐‘ 3 + ๐‘ž 1 โˆ’ ๐‘ 1 +( ๐‘ž 2 โˆ’ ๐‘ 2 ) A[ ๐‘ž 3 ]=T[ ๐‘ž 1 ] spawn P-MERGE(T, ๐‘ 1 , ๐‘ž 1 โˆ’1, ๐‘ 2 , ๐‘ž 2 โˆ’1,A, ๐‘ 3 ) P-MERGE(T, ๐‘ž 1 +1, ๐‘Ÿ 1 , ๐‘ž 2 , ๐‘Ÿ 2 ,A, ๐‘ž 3 +1) sync ๐‘ƒ ๐‘€ โˆž ๐‘› =๐‘ƒ ๐‘€ โˆž 3๐‘› 4 +ฮ˜ log ๐‘› =ฮ˜ log 2 ๐‘› ๐‘ƒ ๐‘€ 1 ๐‘› =ฮฉ ๐‘› , ๅ› ็‚บ่‡ณๅฐ‘่ฆcopy n elements ๐‘ƒ ๐‘€ 1 ๐‘› =O ๐‘› , ่ฆ‹่ชฒๆœฌp.802 โ‡’๐‘ƒ ๐‘€ 1 ๐‘› =ฮ˜ ๐‘› ฮ˜ log ๐‘›

11 P-MERGE-SORT(A,p,r,B,s) n=r-p+1 if n==1 B[s]=A[p] else let T[1
P-MERGE-SORT(A,p,r,B,s) n=r-p+1 if n==1 B[s]=A[p] else let T[1..n] be a new array ๐‘ž= ๐‘+๐‘Ÿ 2 ๐‘ž โ€ฒ =๐‘žโˆ’๐‘+1 spawn P-MERGE-SORT(A,p,q,T,1) P-MERGE-SORT(A,q+1,r,T,qโ€™+1) sync P-MERGE(T,1,qโ€™,qโ€™+1,n,B,s) A: Array to be merged p,r: Index of the range to be sorted B: Array to save the result ๐‘ƒ๐‘€ ๐‘† 1 ๐‘› =2๐‘ƒ๐‘€ ๐‘† 1 ๐‘› 2 +๐‘ƒ ๐‘€ 1 ๐‘› =2๐‘ƒ๐‘€ ๐‘† 1 ๐‘› 2 +ฮ˜ ๐‘› =ฮ˜ ๐‘› log ๐‘› ๐‘ƒ๐‘€ ๐‘† โˆž ๐‘› =๐‘ƒ๐‘€ ๐‘† โˆž ๐‘› 2 +๐‘ƒ ๐‘€ โˆž ๐‘› =๐‘ƒ๐‘€ ๐‘† โˆž ๐‘› 2 +ฮ˜ log 2 ๐‘› =ฮ˜ log 3 ๐‘› Parallelism = ฮ˜ ๐‘› log ๐‘› log 3 ๐‘› =ฮ˜ ๐‘› log 2 ๐‘› Much better now!

12 ๆ€Ž้บผ่‡ชๅทฑๅญธๆผ”็ฎ—ๆณ• ้žๅธธ้‡่ฆ ้€™ๅ ‚่ชฒๆฒ’่พฆๆณ•ๆŠŠๆ‰€ๆœ‰โ€้‡่ฆโ€็š„ๆผ”็ฎ—ๆณ•้ƒฝๆ•™ๅฎŒ ไฝ†ๆ˜ฏๅธŒๆœ›้Ž็จ‹ไธญไฝ ๅทฒ็ถ“ๅปบ็ซ‹ไบ†่‡ชๅทฑๅญธ็ฟ’ๆผ”็ฎ—ๆณ•็š„่ƒฝๅŠ›
ๆบ–ๅ‚™็ ”็ฉถๆ‰€่€ƒ่ฉฆ็š„ๆ™‚ๅ€™ๅฏ่ƒฝๆœƒ้œ€่ฆ

13 ๆˆ‘็š„็ถ“้ฉ— ๆˆ‘ไนŸ่ทŸ่‘—ๅคงๅฎถไธ€่ตทๅญธ็ฟ’ ่ณ‡ๆ–™็ตๆง‹ ่ˆ‡ ๆผ”็ฎ—ๆณ• ไธ€ไบ›ๅฅฝๆ–นๆณ•(่‡ชไปฅ็‚บ)่ทŸๅคงๅฎถๅˆ†ไบซ: ็•ซๅœ–, ็”จ็ฐกๅ–ฎ็š„ไพ‹ๅญๅœ–่งฃๆญฅ้ฉŸ
ๅ…ˆไบ†่งฃๆฆ‚ๅฟต(ๅคงๆ–นๅ‘), ไธๆ€ฅ่‘—็œ‹่ค‡้›œ็š„ๆ•ธๅญธ (ๆˆ‘ๅธธ่ชช: ๅฆ‚ๆžœไป€้บผ้ƒฝๆฒ’ๅผ„ๆ‡‚, ๅ…ˆๅผ„ๆ‡‚้€™ๅ€‹) ไธ€ๆฌกไบ†่งฃไธ€ๅฐ้ƒจๅˆ†ๅฐฑๅฅฝ (ๆจก็ต„ๅŒ–ๅญธ็ฟ’), ไธๆ€ฅ่‘—ๅผ„ๆ‡‚ๆ‰€ๆœ‰็š„้ƒจๅˆ† ็›ธไฟก่‡ชๅทฑไธ€ๅฎšๅฏไปฅๅผ„ๆ‡‚ (้žๅธธ้‡่ฆ) ไธ็Ÿฅ้“่‡ชๅทฑๅˆฐๅบ•ๆ‡‚ไบ†ๆฒ’? ็”จไธ€ไบ›ๅคๆ€ช็š„ไพ‹ๅญไพ†่ฉฆ่ฉฆ (boundary case) ็ทด็ฟ’่‡ชๅทฑ็œ‹่ชฒๆœฌ (้žๅธธ้‡่ฆ)

14 ๆœŸๆœซ่€ƒๅ…งๅฎน&ๅž‹ๅผ Closed book, 2 x A4 cheat sheets (double-sided) ไฝ”ๅญธๆœŸๆˆ็ธพ26%
ๅŒ…ๅซๆ•ดๅญธๆœŸไธŠ่ชฒๅ…งๅฎน, ไฝ†ไปฅๆœŸไธญ่€ƒ้ŽๅพŒ็š„ๅ…งๅฎน็‚บไธป 180 minutes ้กŒๅž‹่ˆ‡ๆœŸไธญ่€ƒ้กžไผผ (ๆ˜ฏ้ž+้ธๆ“‡+่งฃ้‡‹) Extra office hour ??


Download ppt "Multi-threaded Algorithm 3"

Similar presentations


Ads by Google