隱藏之投影片 2016/8/22 2016/8/29 Streamlining Inter-operation Memory Communication via Data Dependence Prediction 胡連鈞 Hu, Lien Chun 電機系, Department of Electrical.

(Hidden slide) 2016/8/22 2016/8/29 Streamlining Inter-operation Memory Communication via Data Dependence Prediction. 胡連鈞 Hu, Lien Chun, Department of Electrical Engineering, National Cheng Kung University (國立成功大學), Tainan, Taiwan, R.O.C. Phone: 06-2757575 ext 62365. Office: Chi Mei Building (奇美樓), 6F, 95601. Email: seido5310@gmail.com. Website: http://j92a21b.ee.ncku.edu.tw/broad/index.html or http://www.ee.ncku.edu.tw/chinese/member/professor/T405-jou/T0000000c.htm

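Content: Abstract; Introduction; Memory as an Inter-operation Communication Agent; Memory Traffic Analysis; Method 1: Speculative Memory Cloaking; Method 2: Speculative Memory Bypassing; Method 3: Transient Value Cache; Experimental Evaluation; Summary and Conclusions.
Notes: 1. Section 2 discusses how memory serves as an inter-operation communication agent and the problems this raises. 2. Section 3 presents a quantitative analysis of memory traffic. 3. Section 4 covers cloaking. 4. Section 5 covers bypassing. 5. Section 6 covers building a TVC. 6. Section 7 evaluates these techniques quantitatively. 7. Section 9 concludes.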

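Abstract. Method 1 (Cloaking): We use data dependence prediction to identify and link dependent loads and stores, so that they can communicate without incurring the overhead of address calculation, disambiguation and data cache access. Method 2 (Bypassing): We also use data dependence prediction to convert DEF-store-load-USE chains within the instruction window into DEF-USE chains prior to address calculation and disambiguation.
Notes: 1. Cloaking uses data dependence prediction to identify load and store instructions that are likely to be dependent, so their communication does not wait for address calculation. This effectively reduces memory communication latency. (When a load follows a store to the same address, the load depends on the store, and this dependence normally incurs memory communication latency.) 2. Bypassing also uses data dependence prediction, shortening the original DEF-store-load-USE chain into a DEF-USE chain, which again reduces memory communication latency.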

Abstract. Method 3 (Building a TVC): We use true and output data dependence status prediction to introduce and manage a small storage structure called the Transient Value Cache (TVC). The TVC captures memory values that are short-lived.
Notes: The values held in the TVC are short-lived; it quickly stores data that is about to be used. The TVC is a separate structure placed alongside the data cache, not another level of the existing memory hierarchy.

Abstract. Method 1 (Cloaking) and Method 2 (Bypassing) aim to reduce the effective communication latency. Method 3 (Building a TVC) aims to reduce data cache bandwidth requirements, which increases the effective memory bandwidth. All three methods serve to streamline inter-operation memory communication.

Abstract. Experimental analysis of the proposed techniques shows that: (i) the proposed speculative communication methods correctly handle a large fraction of memory dependences; (ii) when the TVC is in place, a large number of loads and stores never have to reach the data cache.

1. Introduction. With an implicit specification, communication can take place only after address calculation and disambiguation. With an explicit specification, communication can take place as soon as the two instructions are encountered and the value is available, which is faster.
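The difference between the two specifications can be illustrated with a toy store queue. This is a minimal sketch under stated assumptions: `StoreQueueEntry`, `implicit_forward` and `explicit_forward` are hypothetical names, and "explicit" is modeled as the load already knowing the PC of its producing store (for example from a prediction), which is a simplification of the paper's mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreQueueEntry:
    pc: int
    addr: Optional[int]   # None until the store's address has been calculated
    value: Optional[int]  # None until the store's data has been produced


def implicit_forward(load_addr: Optional[int], queue: list[StoreQueueEntry]) -> Optional[int]:
    """Implicit: the load can only find its producer by comparing addresses."""
    if load_addr is None:
        return None  # load address not yet calculated: no communication possible
    for entry in reversed(queue):  # youngest older store first
        if entry.addr is None:
            return None  # an older store address is unknown: disambiguation must wait
        if entry.addr == load_addr:
            return entry.value
    return None  # no match: the load has to read the data cache


def explicit_forward(producer_pc: Optional[int], queue: list[StoreQueueEntry]) -> Optional[int]:
    """Explicit: a link (hypothetically the producer's PC) names the store directly."""
    for entry in reversed(queue):
        if entry.pc == producer_pc:
            return entry.value  # usable as soon as the store's data exists
    return None
```

With a store whose data (7) is ready but whose address is not, `implicit_forward(0x1000, [StoreQueueEntry(0x40, None, 7)])` must return `None`, while `explicit_forward(0x40, ...)` already returns 7.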

1. Introduction. Goal: We are primarily concerned with methods of converting the traditional implicit specification of memory communication into an explicit form. Method: To do so, we use data dependence prediction to explicitly link loads and stores that are likely to be dependent. These loads and stores can then communicate via a dynamically created name space without incurring the overhead of address calculation, disambiguation and data cache access.

2. Memory as an Inter-operation Communication Agent. Memory communication can be viewed as a two-step process: 1. dependences are established; 2. actual values are communicated. To streamline memory communication we need to: (i) establish the dependences as quickly as possible; (ii) provide storage structures that best meet the communication requirements (low latency/high bandwidth).

2. Memory as an Inter-operation Communication Agent. In this paper we do not consider a static approach, since it would require static knowledge of the dependences and would also involve changing the program representation completely. Instead, we investigate dynamic approaches. We use speculative (predicted) dependences to create a dynamic name space through which the dependent loads and stores can communicate without incurring the overhead of address calculation. The dynamically collected information can also be used to develop and manage novel memory hierarchies.

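3. Memory Traffic Analysis. [Figure: SPECint95; x-axis: store distance (ticks from 8 to 8K); top: percentage of load/store dependences; bottom: percentage of store/store dependences.]
Notes: Several SPECint95 programs were used as benchmarks. Within a distance of 256 stores, about 50% of loads have a data dependence on a preceding store, and about 60% of stores have a store/store dependence. If these dependences can be predicted in advance, latency can be reduced and performance improved.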
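The store-distance measurement behind this figure can be sketched over a memory trace. This is a minimal sketch, not the paper's tooling: the trace format, the names `dependence_distances` and `fraction_within`, and the definition of distance (stores executed since the producing or overwritten store, counting that store itself) are assumptions.

```python
def dependence_distances(trace):
    """trace: iterable of ("ST" | "LD", address). Returns (load_dists, store_dists).

    A distance is None when no earlier store to the same address exists.
    """
    last_store = {}  # address -> ordinal number of the last store to it
    n_stores = 0
    load_dists, store_dists = [], []
    for op, addr in trace:
        if op == "ST":
            n_stores += 1
            # store/store: how many stores ago was this address last written?
            prev = last_store.get(addr)
            store_dists.append(None if prev is None else n_stores - prev)
            last_store[addr] = n_stores
        elif op == "LD":
            # load/store: distance 1 means the most recent store produced the value
            prev = last_store.get(addr)
            load_dists.append(None if prev is None else n_stores - prev + 1)
    return load_dists, store_dists


def fraction_within(dists, window):
    """Fraction of operations whose dependence lies within `window` stores."""
    hits = sum(1 for d in dists if d is not None and d <= window)
    return hits / len(dists) if dists else 0.0
```

For the trace `ST 0x10, LD 0x10, ST 0x20, ST 0x10, LD 0x30, LD 0x20`, the load distances are `[1, None, 2]`, so two of three loads fall within a 256-store window.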

4. Speculative Memory Cloaking. First, the first method. The purpose of cloaking is to streamline memory communication by dynamically converting the implicit specification of dependences into an explicit form. In part (a), detecting a load-store dependence results in an association among the load, the store and a synonym.

4. Speculative Memory Cloaking. When a subsequent instance of the store is encountered and a dependence is predicted (action 1), this association results in the generation of a new version of the synonym (action 2), for use by the next dependent load. The synonym selects an entry in the Synonym File (SF), a small, low-latency/high-bandwidth storage structure. Upon value reception, the synonym file entry is updated and marked as full (action 3). Finally, when the store computes its address, it accesses memory (action 4).

4. Speculative Memory Cloaking. When the appropriate instance of the load is brought into the instruction window, the association is used again to derive the synonym (action 5) and to locate the appropriate element in the synonym file (action 6). Instructions that use the load value may at this point execute speculatively using this value (action 7). When the load data address becomes available, the memory system is accessed to read the actual value (action 8).

4. Speculative Memory Cloaking. Verification: The actual value is compared with the value obtained earlier via the cloaking mechanism. If the two values are the same, cloaking was successful and no further action is required. Otherwise, a data value mis-speculation has occurred, and any instructions that used wrong data have to be re-executed. Conclusion: Speculative memory cloaking has the following requirements: (1) predicting dependences; (2) creating synonyms, associating them with the dependent instructions and assigning storage for the communication; (3) verifying the speculatively communicated values.
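The verification rule ("any instructions that used wrong data have to be re-executed") amounts to finding every transitive consumer of the mis-speculated load value. The sketch below assumes selective re-execution over an explicit consumer graph; the name `instructions_to_reexecute` and the `consumers` mapping are hypothetical, and a real core might instead squash everything younger than the load.

```python
from collections import deque


def instructions_to_reexecute(load, speculative, actual, consumers):
    """Return the set of instructions that consumed wrong data.

    consumers maps an instruction id to the ids that read its result.
    The load itself simply takes the actual value from memory.
    """
    if speculative == actual:
        return set()  # cloaking succeeded: no further action
    tainted = set()
    work = deque(consumers.get(load, []))
    while work:  # breadth-first walk over the dependence graph
        inst = work.popleft()
        if inst in tainted:
            continue
        tainted.add(inst)
        work.extend(consumers.get(inst, []))
    return tainted
```

For example, with `consumers = {"LD": ["I5", "I6"], "I5": ["I7"]}`, a mismatch marks `I5`, `I6` and `I7` for re-execution, while a match marks nothing.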

4. Speculative Memory Cloaking. Cloaking relies on three structures:
(a) Dependence detection table (DDT), which detects dependences: (1) data address (ADDR), (2) store PC (STPC), (3) a valid bit.
(b) Dependence prediction and naming table (DPNT), which predicts dependences and names them: (1) instruction address (PC), (2) dependence status predictor (PRED), (3) dependence tag (DTAG), (4) a valid bit.
(c) Synonym file (SF), which holds the value written by the store and supplies it to the load: (1) name, (2) value, (3) full/empty bit, (4) valid bit.
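The three structures can be written down as records with the fields listed above. This is a sketch under assumptions: the class names are hypothetical, and the slides do not say what kind of predictor PRED is, so a 2-bit saturating counter is used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DDTEntry:        # dependence detection table
    addr: int          # ADDR: data address updated by a recent store
    stpc: int          # STPC: PC of that store
    valid: bool = True


@dataclass
class DPNTEntry:       # dependence prediction and naming table
    pc: int            # instruction address of the load or store
    dtag: int          # DTAG: tag shared by a dependent load/store pair
    pred: int = 0      # PRED: assumed 2-bit saturating counter (0..3)
    valid: bool = True

    def predicts_dependence(self) -> bool:
        return self.valid and self.pred >= 2

    def train(self, dependent: bool) -> None:
        # move towards 3 on an observed dependence, towards 0 otherwise
        self.pred = min(3, self.pred + 1) if dependent else max(0, self.pred - 1)


@dataclass
class SFEntry:         # synonym file
    name: int          # the synonym
    value: int = 0
    full: bool = False  # full/empty bit: False until the store's data arrives
    valid: bool = True
```

A fresh `DPNTEntry` predicts no dependence; two observed dependences push the counter to 2 and turn the prediction on.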

4. Speculative Memory Cloaking. In parts (b) and (c) we show the actions that lead to the detection of the dependence. In part (b), the first instance of the store executes and records in the DDT its PC and the data address it updated (action 1). Later on, in part (c), the first instance of the load uses its data address to probe the DDT (action 2); if the address matches the one recorded by the store, a dependence exists. In reaction to this detection, two entries are allocated in the DPNT (action 3), one for the store and one for the load.
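Actions 1 to 3 can be sketched as follows. Assumptions: the class name `DependenceDetector` is hypothetical, both tables are unbounded dictionaries (a real DDT and DPNT are small, fixed-size tables), and the tag-allocation policy (reuse an existing tag of either instruction, otherwise take a fresh one) is one plausible choice rather than the paper's.

```python
import itertools


class DependenceDetector:
    def __init__(self):
        self.ddt = {}    # data address (ADDR) -> PC of the last store to it (STPC)
        self.dpnt = {}   # instruction PC -> dependence tag (DTAG)
        self._tags = itertools.count()

    def on_store(self, pc, addr):
        # action 1: the store records its PC and the address it updated
        self.ddt[addr] = pc

    def on_load(self, pc, addr):
        # action 2: the load probes the DDT with its data address
        store_pc = self.ddt.get(addr)
        if store_pc is None:
            return None  # no recent store to this address: no dependence
        # action 3: allocate DPNT entries for the store and the load with a common tag
        tag = self.dpnt.get(store_pc)
        if tag is None:
            tag = self.dpnt.get(pc)
        if tag is None:
            tag = next(self._tags)
        self.dpnt[store_pc] = tag
        self.dpnt[pc] = tag
        return store_pc, tag
```

After `on_store(0x40, 0x1000)`, a load at PC 0x80 reading 0x1000 detects the dependence and both PCs receive tag 0 in the DPNT.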

4. Speculative Memory Cloaking. A later instance of the store enters the instruction window. The PC of the store is used to probe the DPNT for a matching entry (action 4). If the predictor indicates a dependence, a synonym is generated based on the tag recorded in the DPNT entry, and it is used to allocate space in the SF (action 5). The full/empty bit of the SF entry is set to empty to indicate that the value is not yet available.

4. Speculative Memory Cloaking. Meanwhile, the store also records the location of the SF entry, since the actual data value, when it becomes available, will have to be written in the SF entry (action 6); writing it sets the full/empty bit to full. Eventually, the store also accesses the traditional memory hierarchy (action 7).

4. Speculative Memory Cloaking. When the next instance of the load enters the window, its PC is used to probe the DPNT (action 8). After a dependence status prediction is made, the tag recorded in the DPNT entry leads to the generation of the same synonym generated previously for the store. This synonym is used to access the appropriate SF entry (action 9) and to obtain the data left there by the store. At this point the load may use this data to execute (action 10). When the data address becomes available, the load accesses the traditional memory hierarchy to obtain the actual data value (action 11).

4. Speculative Memory Cloaking. Verification: The actual value is compared against the value read previously from the SF, and appropriate action is taken if the two values differ; the traditional memory hierarchy thus serves as the check on the speculative value. Update: At this point we may also update the predictors in the DPNT entries for both the load and the store.
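Actions 4 to 11, plus verification and predictor update, can be tied together in a toy functional model. This is a minimal sketch with hypothetical names (`CloakingUnit`, `store`, `load`). It assumes the synonym is simply the DPNT tag (no versioning), 2-bit predictor counters that start at "weakly dependent", and a dictionary standing in for the data cache. It models only values, not pipeline timing.

```python
class CloakingUnit:
    """Toy model of speculative memory cloaking (illustrative only)."""

    def __init__(self, dpnt):
        self.dpnt = dict(dpnt)              # PC -> dependence tag (DTAG)
        self.pred = {pc: 2 for pc in dpnt}  # assumed 2-bit counters, start weakly "dependent"
        self.sf = {}                        # synonym -> [value, full, store PC]
        self.memory = {}                    # stands in for the data cache

    def _predicted(self, pc):
        return pc in self.dpnt and self.pred[pc] >= 2

    def _train(self, pc, correct):
        c = self.pred[pc]
        self.pred[pc] = min(3, c + 1) if correct else max(0, c - 1)

    def store(self, pc, addr, value):
        if self._predicted(pc):                # action 4: probe the DPNT
            syn = self.dpnt[pc]                # assumption: synonym == tag
            self.sf[syn] = [None, False, pc]   # action 5: allocate SF entry, mark empty
            self.sf[syn][0] = value            # action 6: data arrives, write it
            self.sf[syn][1] = True             #           and mark the entry full
        self.memory[addr] = value              # action 7: normal memory write

    def load(self, pc, addr):
        """Return (actual value, speculative value or None)."""
        spec = None
        if self._predicted(pc):                # action 8: probe the DPNT
            entry = self.sf.get(self.dpnt[pc])  # action 9: same synonym -> SF entry
            if entry is not None and entry[1]:
                spec = entry[0]                # action 10: dependents may use it
        actual = self.memory.get(addr, 0)      # action 11: read the actual value
        if spec is not None:                   # verify, then update both predictors
            correct = spec == actual
            self._train(pc, correct)
            self._train(self.sf[self.dpnt[pc]][2], correct)
        return actual, spec
```

With `CloakingUnit({0x40: 0, 0x80: 0})`, a store at PC 0x40 writing 7 to 0x1000 followed by a load at PC 0x80 from 0x1000 returns `(7, 7)`: the value was communicated through the SF and verified. If the store next writes 9 to a different address, the load receives the speculative 9, the check against memory fails, and both counters are weakened.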

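4. Speculative Memory Cloaking. Block diagram: when an instruction enters the pipeline, its PC probes the DPNT for a matching entry. On a match, the tag recorded in the DPNT entry yields a synonym that links the instruction to an SF entry. The instruction is then decoded and renamed; in parallel, the SF supplies the speculative value, so a dependent instruction can obtain its value early. The execute (EX) stage later verifies the speculative value, and the data is updated at commit.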

5. Speculative Memory Bypassing. Consider the I1–store–load–I4 chain shown in part (a). Drawback: the value has to travel through the store and the load before it can reach I4, which is slower. When the dependent load and store co-exist in the instruction window, further reduction in the communication latency is possible with speculative memory bypassing.
Notes: Section 5 explains the second method, bypassing. It applies when the dependent load and store are both in the same instruction window.

5. Speculative Memory Bypassing. As shown in part (b), with speculative memory bypassing the value can be sent directly from I1 to I4, which is faster. As was the case with speculative memory cloaking, this communication is speculative and has to be verified. Speculative memory bypassing can be implemented as a simple extension to speculative memory cloaking.

5. Speculative Memory Bypassing. At step (1), instruction I1 is decoded and register renaming creates a new name TAG1 for the target register R1. At step (2), the store instruction is decoded and determines the current name of its source register R1. In parallel, via cloaking, a synonym is created for the memory communication; we also record in the synonym the current name TAG1 of the store's source register R1.

5. Speculative Memory Bypassing. At step (3), the load instruction is decoded and register renaming creates a new name TAG2 for the destination register R2. At step (4), I4 is decoded and determines that its source register R2 has two names: one actual (TAG2) and one speculative (TAG1). By using the speculative name TAG1, I4 can link directly to I1 and execute speculatively as soon as I1 produces its value. Later on, after the load has accessed the traditional memory hierarchy, the integrity of the communication can be verified.
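Steps (1) to (4) amount to a rename table that, besides the normal mapping, keeps a speculative alias for a predicted-dependent load's destination. A minimal sketch, assuming a hypothetical `Renamer` class, string tags `TAG1`, `TAG2`, ..., and synonyms that have already been chosen by cloaking:

```python
class Renamer:
    """Toy rename stage with speculative aliases for bypassing (illustrative only)."""

    def __init__(self):
        self.map = {}         # architectural register -> current (actual) tag
        self.spec_alias = {}  # architectural register -> speculative tag
        self.synonyms = {}    # synonym -> tag recorded by the store
        self._next = 1

    def new_tag(self, reg):
        # step (1): a producer such as I1 gets a fresh name for its target register
        tag = f"TAG{self._next}"
        self._next += 1
        self.map[reg] = tag
        self.spec_alias.pop(reg, None)  # a new definition clears any old alias
        return tag

    def rename_store(self, src_reg, synonym):
        # step (2): record the current name of the store's source register in the synonym
        self.synonyms[synonym] = self.map[src_reg]

    def rename_load(self, dst_reg, synonym):
        # step (3): normal renaming of the load's destination register ...
        tag = self.new_tag(dst_reg)
        # ... plus a speculative alias to the store's source name, if one was recorded
        if synonym in self.synonyms:
            self.spec_alias[dst_reg] = self.synonyms[synonym]
        return tag

    def source_names(self, reg):
        # step (4): a consumer such as I4 sees the actual and the speculative name
        return self.map[reg], self.spec_alias.get(reg)
```

Renaming R1 (I1), then the store of R1 and the load into R2 under the same synonym, leaves R2 with the actual name `TAG2` and the speculative name `TAG1`, so I4 can wait on I1 directly.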

6. Transient Value Cache. Most of the values stored to memory are quickly killed (Section 3). Motivated by this observation, we extend the memory hierarchy by introducing a small storage structure, the Transient Value Cache (TVC). The TVC holds stored values until they are communicated or killed. It is used for: 1. stores whose values are likely to be killed soon; 2. loads that are likely to access values written by recent stores.

6. Transient Value Cache. Stores that are likely to be killed soon are initially sent only to the TVC, in the hope that they will be killed in it before they are forced to go to the data cache (part (a)). Other stores are sent to both caches to keep them coherent (part (b)).

6. Transient Value Cache. Loads that are likely to have true dependences with recent stores are initially sent only to the TVC. Such a load is directed to the data cache only if it misses in the TVC (part (c)). Other loads have to access both the TVC and the data cache in parallel (part (d)). Purpose: reducing data cache bandwidth requirements, which increases the effective memory bandwidth.
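The routing rules of parts (a) to (d) can be expressed as two small functions and a counter of data cache accesses. This is a sketch with hypothetical names (`route_store`, `route_load`, `dcache_accesses`). The predictions and TVC hit/miss outcomes are given as inputs, and it ignores write-backs of values that get evicted from the TVC before being killed, which a real design would have to handle.

```python
def route_store(likely_killed: bool) -> set[str]:
    # part (a): stores predicted to be killed soon go only to the TVC
    # part (b): other stores go to both, keeping TVC and data cache coherent
    return {"TVC"} if likely_killed else {"TVC", "DCACHE"}


def route_load(likely_dependent: bool, hits_in_tvc: bool) -> set[str]:
    # part (c): predicted-dependent loads try the TVC, the data cache only on a miss
    if likely_dependent:
        return {"TVC"} if hits_in_tvc else {"TVC", "DCACHE"}
    # part (d): other loads probe both structures in parallel
    return {"TVC", "DCACHE"}


def dcache_accesses(ops):
    """Count data cache accesses for (kind, predicted, tvc_hit) tuples."""
    count = 0
    for kind, predicted, tvc_hit in ops:
        if kind == "ST":
            targets = route_store(predicted)
        else:
            targets = route_load(predicted, tvc_hit)
        count += int("DCACHE" in targets)
    return count
```

In a five-operation example with one short-lived store and one predicted load that hits in the TVC, only three of the five operations reach the data cache. The bandwidth saving comes from exactly those filtered accesses.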

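7. Experimental Evaluation. Dependence prediction accuracy. [Figure legend: 1. prediction bars with limited hardware resources; 2. correct predictions (the majority); 3. mispredictions; 4. infinite resources.]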

7. Experimental Evaluation. Percentage of true dependences communicated correctly via cloaking. The dark bar is for an infinite DPNT; the gray bars are for 512, 1K, 2K and 4K entries. The majority of all dynamic dependences is correctly communicated, which reduces the effective communication latency.

8. Summary and Conclusions. (1) We show that the data dependence status of most memory operations can be predicted with high accuracy on a per-instruction basis, based solely on the history of previous data dependences. (2) We show that the traditional implicit specification of memory communication can be dynamically converted into an explicit specification. (3) We propose speculative memory cloaking and its extension, speculative memory bypassing, to take the address calculation and the load and store instructions themselves off the communication path. (4) We propose the Transient Value Cache, a storage structure managed by dependence status prediction that can reduce the contention for data cache resources.
Notes: The TVC is a separate cache that temporarily holds values involved in data dependences (such as store values that are about to be killed), which reduces data cache usage.