Lecture on High Performance Processor Architecture (CS05162)

Slides:



Advertisements
Similar presentations
高考英语阅读分析 —— 七选五. 题型解读: 试题模式: 给出一篇缺少 5 个句子的文章, 对应有七个选项,要求同学们根据文章结构、 内容,选出正确的句子,填入相应的空白处。 考查重点: 主要考查考生对文章的整体内容 和结构以及上下文逻辑意义的理解和掌握。 (考试说明) 选项特点: 主旨概括句(文章整体内容)
Advertisements

卢泱琏 冯晓盼 罗丽萍 黄麒蓉 韦妍 第三节词的减译法. I can do it,and so can you. 我能做,(并且)你也能做。(省略并列连词) A nurse should have patience in her work. ( 一个)当护士的应当在工作上有耐心。(省略不定冠词 A.
CHAPTER 9 虛擬記憶體管理 9.2 分頁需求 9.3 寫入時複製 9.4 分頁替換 9.5 欄的配置法則 9.6 輾轉現象
Have you ever been to a zoo? zoo water park Have you ever been to a water park?
Healthy Breakfast 第四組 電子一甲(電資一) 指導老師:高美玉 組長:B 侯昌毅
-CHINESE TIME (中文时间): Free Response idea: 你周末做了什么?
破舊立新(三) 人生召命的更新 使徒行傳廿六章19-23節.
中四 升學講座 中五 2007年12月8日.
专题八 书面表达.
Chapter 17 數位革命與全球電子市場 Global Marketing Warren J. Keegan Mark C. Green.
完形填空技巧 CET4.
二維品質模式與麻醉前訪視滿意度 中文摘要 麻醉前訪視,是麻醉醫護人員對病患提供麻醉相關資訊與服務,並建立良好醫病關係的第一次接觸。本研究目的是以Kano‘s 二維品質模式,設計病患滿意度問卷,探討麻醉前訪視內容與病患滿意度之關係,以期分析關鍵品質要素為何,作為提高病患對醫療滿意度之參考。 本研究於台灣北部某醫學中心,通過該院人體試驗委員會審查後進行。對象為婦科排程手術住院病患,其中實驗組共107位病患,在麻醉醫師訪視之前,安排先觀看麻醉流程衛教影片;另外對照組111位病患,則未提供衛教影片。問卷於麻醉醫師
广德二中2006届高考 英语专题复习 单项填空 答题指导.
加快数据中心运转速度 — 加速业务发展 约翰•福勒 甲骨文公司系统事业部执行副总裁. 加快数据中心运转速度 — 加速业务发展 约翰•福勒 甲骨文公司系统事业部执行副总裁.
第4章 VHDL设计初步.
2012高考英语书面表达精品课件:话题作文6 计划与愿望.
Leftmost Longest Regular Expression Matching in Reconfigurable Logic
Taipei Wonderful! 龍騰版英文第三冊第六課.
CH1 Number Systems and Conversion
Been During the Vacation?
An Adaptive Cross-Layer Multi-Path Routing Protocol for Urban VANET
IV. Implementation IV-A Method 1: Direct Implementation 以 STFT 為例
Module 5 Shopping 第2课时.
How often do you exercise?
! 温故知新 上下文无关文法 最左推导 最右推导 自上而下 自下而上 句柄 归约 移进-归约冲突 移进-归约分析 递归下降预测分析
關聯式資料庫.
Excellence in Manufacturing 卓 越 制 造
Web-based cooperation + Data Intelligence for Malaysian SME
加州協調護理計畫 洛杉磯縣.
Lecture on High Performance Processor Architecture (CS05162)
第 17 章 數位革命與 全球電子市場 © 2005 Prentice Hall.
Fundamentals of Physics 8/e 27 - Circuit Theory
第4章(2) 空间数据库 —关系数据库 北京建筑工程学院 王文宇.
微程序控制器 刘鹏 Dept. ISEE Zhejiang University
创建型设计模式.
组合逻辑3 Combinational Logic
Dì二十課 看bìng Dì二十课 看bìng
製程能力分析 何正斌 教授 國立屏東科技大學工業管理學系.
第14章 竞争市场上的企业 上海杉达学院 国贸系.
子博弈完美Nash均衡 我们知道,一个博弈可以有多于一个的Nash均衡。在某些情况下,我们可以按照“子博弈完美”的要求,把不符合这个要求的均衡去掉。 扩展型博弈G的一部分g叫做一个子博弈,如果g包含某个节点和它所有的后继点,并且一个G的信息集或者和g不相交,或者整个含于g。 一个Nash均衡称为子博弈完美的,如果它在每.
JTAG INTERFACE SRAM TESTER WITH C-LCM
第十五课:在医院看病.
Single’s Day.
IBM SWG Overall Introduction
汉英翻译对比练习.
BORROWING SUBTRACTION WITHIN 20
Safety science and engineering department
5-6 串列埠模式0輸出埠擴充實習.
中国科学技术大学计算机系 陈香兰 2013Fall 第七讲 存储器管理 中国科学技术大学计算机系 陈香兰 2013Fall.
虚 拟 仪 器 virtual instrument
中国科学技术大学计算机系 陈香兰 Fall 2013 第三讲 线程 中国科学技术大学计算机系 陈香兰 Fall 2013.
中央社新聞— <LTTC:台灣學生英語聽說提升 讀寫相對下降>
关联词 Writing.
True friendship is like sound health;
從 ER 到 Logical Schema ──兼談Schema Integration
成才之路 · 英语 人教版 · 必修1 路漫漫其修远兮 吾将上下而求索.
计算机问题求解 – 论题 算法方法 2016年11月28日.
Inter-band calibration for atmosphere
高考应试作文写作训练 5. 正反观点对比.
Distance Vector vs Link State
An Efficient MSB Prediction-based Method for High-capacity Reversible Data Hiding in Encrypted Images 基于有效MSB预测的加密图像大容量可逆数据隐藏方法。 本文目的: 做到既有较高的藏量(1bpp),
计算机问题求解 – 论题1-5 - 数据与数据结构 2018年10月16日.
Chapter 10 Mobile IP TCP/IP Protocol Suite
Distance Vector vs Link State Routing Protocols
常州市教育学会学业水平监测 九年级英语试卷分析 常州市第二十四中学 许喆 2012年2月.
簡單迴歸分析與相關分析 莊文忠 副教授 世新大學行政管理學系 計量分析一(莊文忠副教授) 2019/8/3.
Principle and application of optical information technology
Train Track and Children
ppt宝藏提供 中国银行业信息化系统建设研讨会
Presentation transcript:

Lecture on High Performance Processor Architecture (CS05162) Dynamic Branch Prediction Scheme An Hong han@ustc.edu.cn Fall 2007 University of Science and Technology of China Department of Computer Science and Technology CS258 S99

Outline Bimodal Branch Prediction Scheme Two-Level Branch Prediction Scheme 混合预测算法的例子:Alpha 21264的分支预测器 2018/9/20 CS of USTC AN Hong

Dynamic Prediction(1): 利用单个分支自身历史(基于模式的预测) Dynamic Prediction :Use run-time information to make prediction example: Branch Prediction Buffer(1-位预测器) 2018/9/20 CS of USTC AN Hong

Dynamic Prediction: 1-bit BHT Branch History Table Lower bits of PC address index table of 1-bit values Says whether or not branch taken last time No address check Problem: in a loop, 1-bit BHT will cause two mispredictions : End of loop case, when it exits instead of looping as before First time through loop on next time through code, when it predicts exit instead of looping 2018/9/20 CS of USTC AN Hong

Dynamic Prediction: 1-bit BHT (Branch Prediction Buffer) Pros: Small. 1 bit per entry can fit lots of entries Always returns a prediction Cons: aliasing between branches one bit of state mispredicts many branches for (I =0; I<10; I++) { a = a + 1; } Two mispredictions per loop invocation 2018/9/20 CS of USTC AN Hong

2-bit Saturating Up-down Counter Dynamic Prediction(2):Bimodal Branch Prediction Scheme(2bits BHT, 2-位饱和预测器) Solution: 2-bit predictor where change prediction only if get misprediction twice: Use extra state to reduce mispredictions at loop ends Red: stop, not taken Green: go, taken Adds hysteresis to decision making process T NT Predict Taken Predict Not Taken BHT T = Taken N = Not Taken 11 10 00 01  2-bit Saturating Up-down Counter 2018/9/20 CS of USTC AN Hong

Bimodal Branch Prediction Scheme Strategy: Based on the direction the branch went the last few times it was executed. Based on a little self-history pattern Based on a counter Works well: when each branch is strongly biased in a particular direction. For scientific/engineering applications where program execution is dominated by inner-loops. 2018/9/20 CS of USTC AN Hong

Bimodal Branch Prediction Scheme 例1:…NNNTNNN… 1-位预测器,出现2次预测错 2-位预测器,出现1次预测错 例2:TNTNTN…., 初始状态为01的2-位预测器,出现100%预测错 BHT方法准确度 Mispredict because either: Wrong guess for that branch Got branch history of wrong branch when index the table 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% 4096 about as good as infinite table (in Alpha 21164) 2-bit已经足够, n-bit (n>2)与2-bit效果差不多 2018/9/20 CS of USTC AN Hong

Pros/Cons Cons: Pros: Now only mispredicts once on each loop Also good for data-dependent branches where most data points the same way ex: checking for termination character at the end of a string Pros: Still have aliasing problem between branches only uses information about history of current branch(self-history, or local-history) But, sequences of branches often correlate 2018/9/20 CS of USTC AN Hong

Dynamic Prediction(3): 利用多个分支全局历史(基于相关的预测) 例子:Taken from the equntott benchmark if (aa == 2) aa = 0; if (bb == 2) bb = 0; if (aa != bb) {……} If first two branches taken, third always taken 在许多整型应用中,控制流常常是很复杂的,并且一个分支的结果经常受已执行的其它分支的结果影响。也就是说,分支是相关的。由于这种相关性,如果只看分支本身的历史,单个分支的历史看上去缺少规律性,如果直接用它来做局部预测,精度就不会高。 2018/9/20 CS of USTC AN Hong

基于相关的动态分支预测技术 预测原理 预测器结构:全部采用二级结构 根据若干条相邻分支最近执行历史的组合——相邻历史(neighbor-history),或称全局历史(global-history)来做预测,因此又称全局预测技术 预测器结构:全部采用二级结构 第一级为一个分支历史寄存器(Branch History Register, BHR),用于记录所有相关分支最近的执行历史; 第二级为一个全局的模式历史表(Global pattern history table,PHT),用于记录每条分支的行为状态。 典型的全局预测器是Pan, So和Rahmeh提出的基于相关的预测器,McFarling称之为gselect预测器;McFarling在[Pan, So和Rahmeh]的基础上为解决命名冲突提出了gshare预测器; Yeh和Patt提出的两级自适应预测器中的其中三种配置:GAg/GAp /GAs预测器也属全局预测器。 2018/9/20 CS of USTC AN Hong

Two-Level Branch Prediction (GAg) Use history of recent branches 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Global BHR/pre-address PHTs 相应于每个路径的子历史表 k-bits per-address PHTs global BHR (Shift left when update) n-bits 2-bits Branch Address(PC) k=4 n = full address 16 entries Denotation GAp(k) G: “Global” BHR A: adaptive p: “per-address” PHT k: BHR length GAp(k) = GAp(4),又称(k,n)-gselect预测器 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: 设计思想 例:考虑下面的代码段: if (aa == 2) // branch b1 aa =0; if (bb == 2) // branch b2 bb =0; if (aa != bb) { // branch b3 …… } 这段代码由编译器转换为以下汇编程序(假定aa和bb分别分配为R1和R2): SUBI R3,R1,#2 BNEZ R3,L1 ;branch b1 (aa != 2) , taken ADD R1, R0, R0 ;aa == 0, not taken L1: SUBI R3,R2,#2 BNEZ R3,L2 ;branch b2 (bb != 2) , taken ADD R2, R0, R0 ;bb == 0, not taken L2:SUB R3,R1,R2 ;R3=aa-bb BEQZ R3,L3 ;branch b3 (aa = = bb) , taken 通向b3的路径 A:0-0 B:0-1 C:1-0 D:1-1 在b3执行之前可以推测 aa=0 bb=0 bb2 aa2 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: 设计思想 ③ ④ aa bb aa' bb' 到达b3的路径 PHT的当前状态 b3预测结果 b3的实际结果 正确:c 错误:w PHT的下一个状态 2 C N T w 1 A B c 3 D ⑥ ⑤ 2018/9/20 aa’和bb’是b1和b2执行后aa和bb的新值 ① CS of USTC AN Hong ②

Global Branch Prediction Scheme: 设计思想 路径 B1 和 B2的方向 B3的实际结果 2018/9/20 CS of USTC AN Hong

Pros/Cons of Two-Level Branch Prediction Predicts correlated branch behavior that breaks other predictors eqntott example Better overall performance than purely address-based predictors Cons: Interference between unrelated branches with same history example: all loop-end branches will map to same entry in pattern history table sometimes this is a good thing 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme GAp(k) 又称(k,n)-gselect预测器 (0,n)- gselect 预测器,即为2-位(双峰)预测器 (M,0)- gselect预测器,即为GAg(M)预测器 k-bits per-address PHTs global BHR n-bits 2-bits Branch Address(PC) k=4 n = full address 16 entries Denotation GAp(k) G: “Global” BHR A: adaptive p: “per-address” PHT k: BHR length 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Global BHR/pre-set PHTs k-bits per-set PHTs global BHR n-bits 2-bits Branch Address(PC) k=4 n =10 16 entries Denotation GAs(n, 2k) G: “Global” BHR A: adaptive s: “per-set” PHT k: BHR length GAs(4,1024) 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Gselect Two-level predictor(另一种画法) by S.T.Pan 1992 PHT k-bits BHR n-bits Branch Address(PC) n k n+k + 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Gshare Two-level predictor by Scott McFarling 1993 Global Branch Prediction Scheme: Gshare Two-level predictor PHT k-bits BHR n-bits Branch Address n k nk XOR 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Global BHR/Global PHT Global branch predictor Denotation GAg(k) G: “Global” BHR A: adaptive g: “global” PHT k: BHR length k-bits 2-bits PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) BHR = Branch History Register (Shift left when update) 2k entries GAg(k) 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Global BHR/Global PHT Problem: Two code sequences may have the same bit pattern in the BHR and thus index the same pattern in the PHT. 111100001111 2-bits global PHT global BHR 4096 entries Global branch Path History, Shift left when update k = 12 GAg(12) 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme: Gshare Two-level predictor by Scott McFarling 1993 Global Branch Prediction Scheme: Gshare Two-level predictor n = 8 k = 8 n = 4 k = 4 n k Branch Address BHR gselect 4/4 gselect 8/8 0000 0000 0000 0001 00000001 00000000 1111 1111 1111 0000 11111111 1000 0000 01111111  hashed 2018/9/20 CS of USTC AN Hong

Global Branch Prediction Scheme Strategy Based on the combined history of all recent branches Based on a Shift register and a counter Works well when the direction taken by sequentially executed branches is highly correlated. 11% additional accuracy (compared with 2-bit scheme) at the extra hardware cost of one shift register Example: if (x<1) {...}  if (x>1) {...} if ( a = = 2) a=0; // b1 if ( b = = 2) b = 0; // b2 if (a != b ) { // b3 … } 2018/9/20 CS of USTC AN Hong

Two-level adaptive predictors:Variations 基于多个分支全局历史的(基于相关的)预测方法 Global PHT per-address PHTs per-set PHTs Global BHR GAg GAp GAs per-address BHT PAg PAp PAs per-set BHT SAg SAp SAs 基于单个分支局部历史的(基于模式的)预测方法 2018/9/20 CS of USTC AN Hong

Local Branch Prediction Scheme: pre-address BHR/global PHT 1100 per-address BHT N-bits Branch Address (PC) full address k-bits  2-bits global PHT 16 entries k=4 PAg(4) 2018/9/20 CS of USTC AN Hong

Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor n-bits Branch Address (PC) n k-bits 2-bits  PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) BHT = Branch History Table (Shift left when update) 2k entries 2n entries k SAg(k) 2018/9/20 CS of USTC AN Hong

Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor 1100 BHT n-bits Branch Address (PC) n=10 k-bits 2-bits  1024 entries PHT 16 entries k=4 SAg(4) 2018/9/20 CS of USTC AN Hong

Local Branch Prediction Scheme: pre-address BHR/per-address PHTs Branch Address (PC) b1 b2 per-address PHTs N-bits 2-bits 2-bits 2-bits full address n-bits  b1 k=4 1100  b2 1100  16 entries per-address BHT PAp(4) 2018/9/20 CS of USTC AN Hong

Local Branch Prediction Scheme Strategy: considers the history of each branch independently and takes advantage of repetitive patterns. Works well: branches with simple repetitive patterns. Example: for (I=1, I<=4; I++){ } //its pattern is (1110)n 2018/9/20 CS of USTC AN Hong

性能比较 @SPEC89 2018/9/20 CS of USTC AN Hong

性能比较 @SPEC89 2018/9/20 CS of USTC AN Hong

混合预测算法的例子:Alpha 21264的分支预测器 per-set BHT Global history prediction global PHT Global PHT GAg(12) k =10 n=10 k = 12 Selector Local history prediction (112) SAg(10) Global BHR 2018/9/20 CS of USTC AN Hong