Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture on High Performance Processor Architecture (CS05162)

Similar presentations


Presentation on theme: "Lecture on High Performance Processor Architecture (CS05162)"— Presentation transcript:

1 Lecture on High Performance Processor Architecture (CS05162)
Dynamic Branch Prediction Scheme An Hong Fall 2007 University of Science and Technology of China Department of Computer Science and Technology CS258 S99

2 Outline Bimodal Branch Prediction Scheme
Two-Level Branch Prediction Scheme 混合预测算法的例子:Alpha 21264的分支预测器 2018/9/20 CS of USTC AN Hong

3 Dynamic Prediction(1): 利用单个分支自身历史(基于模式的预测)
Dynamic Prediction :Use run-time information to make prediction example: Branch Prediction Buffer(1-位预测器) 2018/9/20 CS of USTC AN Hong

4 Dynamic Prediction: 1-bit BHT
Branch History Table Lower bits of PC address index table of 1-bit values Says whether or not branch taken last time No address check Problem: in a loop, 1-bit BHT will cause two mispredictions : End of loop case, when it exits instead of looping as before First time through loop on next time through code, when it predicts exit instead of looping 2018/9/20 CS of USTC AN Hong

5 Dynamic Prediction: 1-bit BHT (Branch Prediction Buffer)
Pros: Small. 1 bit per entry can fit lots of entries Always returns a prediction Cons: aliasing between branches one bit of state mispredicts many branches for (I =0; I<10; I++) { a = a + 1; } Two mispredictions per loop invocation 2018/9/20 CS of USTC AN Hong

6 2-bit Saturating Up-down Counter
Dynamic Prediction(2):Bimodal Branch Prediction Scheme(2bits BHT, 2-位饱和预测器) Solution: 2-bit predictor where change prediction only if get misprediction twice: Use extra state to reduce mispredictions at loop ends Red: stop, not taken Green: go, taken Adds hysteresis to decision making process T NT Predict Taken Predict Not Taken BHT T = Taken N = Not Taken 11 10 00 01 2-bit Saturating Up-down Counter 2018/9/20 CS of USTC AN Hong

7 Bimodal Branch Prediction Scheme
Strategy: Based on the direction the branch went the last few times it was executed. Based on a little self-history pattern Based on a counter Works well: when each branch is strongly biased in a particular direction. For scientific/engineering applications where program execution is dominated by inner-loops. 2018/9/20 CS of USTC AN Hong

8 Bimodal Branch Prediction Scheme
例1:…NNNTNNN… 1-位预测器,出现2次预测错 2-位预测器,出现1次预测错 例2:TNTNTN…., 初始状态为01的2-位预测器,出现100%预测错 BHT方法准确度 Mispredict because either: Wrong guess for that branch Got branch history of wrong branch when index the table 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% 4096 about as good as infinite table (in Alpha 21164) 2-bit已经足够, n-bit (n>2)与2-bit效果差不多 2018/9/20 CS of USTC AN Hong

9 Pros/Cons Cons: Pros: Now only mispredicts once on each loop
Also good for data-dependent branches where most data points the same way ex: checking for termination character at the end of a string Pros: Still have aliasing problem between branches only uses information about history of current branch(self-history, or local-history) But, sequences of branches often correlate 2018/9/20 CS of USTC AN Hong

10 Dynamic Prediction(3): 利用多个分支全局历史(基于相关的预测)
例子:Taken from the equntott benchmark if (aa == 2) aa = 0; if (bb == 2) bb = 0; if (aa != bb) {……} If first two branches taken, third always taken 在许多整型应用中,控制流常常是很复杂的,并且一个分支的结果经常受已执行的其它分支的结果影响。也就是说,分支是相关的。由于这种相关性,如果只看分支本身的历史,单个分支的历史看上去缺少规律性,如果直接用它来做局部预测,精度就不会高。 2018/9/20 CS of USTC AN Hong

11 基于相关的动态分支预测技术 预测原理 预测器结构:全部采用二级结构
根据若干条相邻分支最近执行历史的组合——相邻历史(neighbor-history),或称全局历史(global-history)来做预测,因此又称全局预测技术 预测器结构:全部采用二级结构 第一级为一个分支历史寄存器(Branch History Register, BHR),用于记录所有相关分支最近的执行历史; 第二级为一个全局的模式历史表(Global pattern history table,PHT),用于记录每条分支的行为状态。 典型的全局预测器是Pan, So和Rahmeh提出的基于相关的预测器,McFarling称之为gselect预测器;McFarling在[Pan, So和Rahmeh]的基础上为解决命名冲突提出了gshare预测器; Yeh和Patt提出的两级自适应预测器中的其中三种配置:GAg/GAp /GAs预测器也属全局预测器。 2018/9/20 CS of USTC AN Hong

12 Two-Level Branch Prediction (GAg)
Use history of recent branches 2018/9/20 CS of USTC AN Hong

13 Global Branch Prediction Scheme: Global BHR/pre-address PHTs
相应于每个路径的子历史表 k-bits per-address PHTs global BHR (Shift left when update) n-bits 2-bits Branch Address(PC) k=4 n = full address 16 entries Denotation GAp(k) G: “Global” BHR A: adaptive p: “per-address” PHT k: BHR length GAp(k) = GAp(4),又称(k,n)-gselect预测器 2018/9/20 CS of USTC AN Hong

14 Global Branch Prediction Scheme: 设计思想
例:考虑下面的代码段: if (aa == 2) // branch b1 aa =0; if (bb == 2) // branch b2 bb =0; if (aa != bb) { // branch b3 …… } 这段代码由编译器转换为以下汇编程序(假定aa和bb分别分配为R1和R2): SUBI R3,R1,#2 BNEZ R3,L1 ;branch b1 (aa != 2) , taken ADD R1, R0, R0 ;aa == 0, not taken L1: SUBI R3,R2,#2 BNEZ R3,L2 ;branch b2 (bb != 2) , taken ADD R2, R0, R0 ;bb == 0, not taken L2:SUB R3,R1,R2 ;R3=aa-bb BEQZ R3,L3 ;branch b3 (aa = = bb) , taken 通向b3的路径 A:0-0 B:0-1 C:1-0 D:1-1 在b3执行之前可以推测 aa=0 bb=0 bb2 aa2 2018/9/20 CS of USTC AN Hong

15 Global Branch Prediction Scheme: 设计思想
aa bb aa' bb' 到达b3的路径 PHT的当前状态 b3预测结果 b3的实际结果 正确:c 错误:w PHT的下一个状态 2 C N T w 1 A B c 3 D 2018/9/20 aa’和bb’是b1和b2执行后aa和bb的新值 CS of USTC AN Hong

16 Global Branch Prediction Scheme: 设计思想
路径 B1 和 B2的方向 B3的实际结果 2018/9/20 CS of USTC AN Hong

17 Pros/Cons of Two-Level Branch Prediction
Predicts correlated branch behavior that breaks other predictors eqntott example Better overall performance than purely address-based predictors Cons: Interference between unrelated branches with same history example: all loop-end branches will map to same entry in pattern history table sometimes this is a good thing 2018/9/20 CS of USTC AN Hong

18 Global Branch Prediction Scheme
GAp(k) 又称(k,n)-gselect预测器 (0,n)- gselect 预测器,即为2-位(双峰)预测器 (M,0)- gselect预测器,即为GAg(M)预测器 k-bits per-address PHTs global BHR n-bits 2-bits Branch Address(PC) k=4 n = full address 16 entries Denotation GAp(k) G: “Global” BHR A: adaptive p: “per-address” PHT k: BHR length 2018/9/20 CS of USTC AN Hong

19 Global Branch Prediction Scheme: Global BHR/pre-set PHTs
k-bits per-set PHTs global BHR n-bits 2-bits Branch Address(PC) k=4 n =10 16 entries Denotation GAs(n, 2k) G: “Global” BHR A: adaptive s: “per-set” PHT k: BHR length GAs(4,1024) 2018/9/20 CS of USTC AN Hong

20 Global Branch Prediction Scheme: Gselect Two-level predictor(另一种画法)
by S.T.Pan 1992 PHT k-bits BHR n-bits Branch Address(PC) n k n+k + 2018/9/20 CS of USTC AN Hong

21 Global Branch Prediction Scheme: Gshare Two-level predictor
by Scott McFarling 1993 Global Branch Prediction Scheme: Gshare Two-level predictor PHT k-bits BHR n-bits Branch Address n k nk XOR 2018/9/20 CS of USTC AN Hong

22 Global Branch Prediction Scheme: Global BHR/Global PHT
Global branch predictor Denotation GAg(k) G: “Global” BHR A: adaptive g: “global” PHT k: BHR length k-bits 2-bits PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) BHR = Branch History Register (Shift left when update) 2k entries GAg(k) 2018/9/20 CS of USTC AN Hong

23 Global Branch Prediction Scheme: Global BHR/Global PHT
Problem: Two code sequences may have the same bit pattern in the BHR and thus index the same pattern in the PHT. 2-bits global PHT global BHR 4096 entries Global branch Path History, Shift left when update k = 12 GAg(12) 2018/9/20 CS of USTC AN Hong

24 Global Branch Prediction Scheme: Gshare Two-level predictor
by Scott McFarling 1993 Global Branch Prediction Scheme: Gshare Two-level predictor n = 8 k = 8 n = 4 k = 4 n k Branch Address BHR gselect 4/4 gselect 8/8 hashed 2018/9/20 CS of USTC AN Hong

25 Global Branch Prediction Scheme
Strategy Based on the combined history of all recent branches Based on a Shift register and a counter Works well when the direction taken by sequentially executed branches is highly correlated. 11% additional accuracy (compared with 2-bit scheme) at the extra hardware cost of one shift register Example: if (x<1) {...} if (x>1) {...} if ( a = = 2) a=0; // b1 if ( b = = 2) b = 0; // b2 if (a != b ) { // b3 } 2018/9/20 CS of USTC AN Hong

26 Two-level adaptive predictors:Variations
基于多个分支全局历史的(基于相关的)预测方法 Global PHT per-address PHTs per-set PHTs Global BHR GAg GAp GAs per-address BHT PAg PAp PAs per-set BHT SAg SAp SAs 基于单个分支局部历史的(基于模式的)预测方法 2018/9/20 CS of USTC AN Hong

27 Local Branch Prediction Scheme: pre-address BHR/global PHT
1100 per-address BHT N-bits Branch Address (PC) full address k-bits 2-bits global PHT 16 entries k=4 PAg(4) 2018/9/20 CS of USTC AN Hong

28 Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor
n-bits Branch Address (PC) n k-bits 2-bits PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) BHT = Branch History Table (Shift left when update) 2k entries 2n entries k SAg(k) 2018/9/20 CS of USTC AN Hong

29 Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor
1100 BHT n-bits Branch Address (PC) n=10 k-bits 2-bits 1024 entries PHT 16 entries k=4 SAg(4) 2018/9/20 CS of USTC AN Hong

30 Local Branch Prediction Scheme: pre-address BHR/per-address PHTs
Branch Address (PC) b1 b2 per-address PHTs N-bits 2-bits 2-bits 2-bits full address n-bits b1 k=4 1100 b2 1100 16 entries per-address BHT PAp(4) 2018/9/20 CS of USTC AN Hong

31 Local Branch Prediction Scheme
Strategy: considers the history of each branch independently and takes advantage of repetitive patterns. Works well: branches with simple repetitive patterns. Example: for (I=1, I<=4; I++){ } //its pattern is (1110)n 2018/9/20 CS of USTC AN Hong

32 性能比较 @SPEC89 2018/9/20 CS of USTC AN Hong

33 性能比较 @SPEC89 2018/9/20 CS of USTC AN Hong

34 混合预测算法的例子:Alpha 21264的分支预测器
per-set BHT Global history prediction global PHT Global PHT GAg(12) k =10 n=10 k = 12 Selector Local history prediction (112) SAg(10) Global BHR 2018/9/20 CS of USTC AN Hong


Download ppt "Lecture on High Performance Processor Architecture (CS05162)"

Similar presentations


Ads by Google