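Modern Computer Architecture (现代计算机体系结构). Instructor: Prof. Zhang Gang (张钢), School of Computer Science, Tianjin University. Course slides, assignments, and discussion site: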

Presentation transcript:

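Modern Computer Architecture (现代计算机体系结构). Instructor: Prof. Zhang Gang (张钢), School of Computer Science, Tianjin University. Course slides, assignments, and discussion site: http://glearning.tju.edu.cn/ Email: gzhang@tju.edu.cn 2018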

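Main reference (1): Computer Architecture: A Quantitative Approach, 5th edition (English), John L. Hennessy and David A. Patterson, China Machine Press (机械工业出版社). E-book: http://www.doc88.com/p-112663203506.html

Main reference (2): 计算机体系结构:量化研究方法 (Chinese translation of Computer Architecture: A Quantitative Approach, 5th edition), John L. Hennessy and David A. Patterson, translated by Jia Hongfeng (贾洪峰), Posts & Telecom Press (人民邮电出版社).

Main reference (3): Computer Architecture: A Quantitative Approach, 4th edition (English), John L. Hennessy and David A. Patterson, China Machine Press (机械工业出版社).

Main reference (4): 计算机系统结构:一种定量的方法 (Chinese translation of Computer Architecture: A Quantitative Approach, 4th edition), John L. Hennessy and David A. Patterson, translated by Bai Yuebin (白跃彬), Publishing House of Electronics Industry (电子工业出版社).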

Hennessy's profile on the Stanford website


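Main reference (5): Scalable Parallel Computing: Technology, Architecture, Programming (可扩展并行计算:技术、结构与编程), Kai Hwang (黄铠) and Zhiwei Xu (徐志伟), translated by Lu Xinda (陆鑫达) et al., China Machine Press (机械工业出版社).

Main reference (6): 计算机系统结构 (Computer System Architecture), 2nd edition, Zheng Weimin (郑纬民) et al., Tsinghua University Press (清华大学出版社).

Course schedule. Start date: March 8, 2018. Class time: weeks 1-8, Thursdays, 1:30-5:00 pm. Location: Room 117, Zone A, Building 55.

Course Contents. Chapter 1. Fundamentals of Quantitative Design and Analysis; Chapter 2. Memory Hierarchy Design; Chapter 3. Instruction-Level Parallelism and Its Exploitation; Chapter 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures; Chapter 5. Thread-Level Parallelism; Chapter 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism; Appendix A. Pipelining: Basic and Intermediate Concepts.

Prerequisites (undergraduate courses): Computer Organization, Computer Architecture, Operating Systems, Computer Networks.

Exams and grading. Attendance (including quizzes and answering questions in class): 20%. Assignments (submitted online): 20%. Final exam (closed book): 60%. Assignment submission requirements: state your name and the assignment number clearly (for example, "Zhang XX, Assignment N"); submit the assignment as an attachment, and do not use WPS format. Deadline: before 8:00 am on Monday.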


Computer Technology. Performance improvements come from improvements in semiconductor technology (feature size, clock speed) and improvements in computer architectures (enabled by high-level language (HLL) compilers and UNIX, leading to RISC architectures). Together these have enabled lightweight computers and productivity-based managed/interpreted programming languages.

Uniprocessor Performance (from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006). VAX: 25%/year, 1978 to 1986. RISC + x86: 52%/year, 1986 to 2002. RISC + x86: 20%/year, 2002 to present.
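To get a feel for what these annual rates mean cumulatively, the sketch below compounds each rate over its era. The function name `cumulative_gain` is an illustrative choice, and the era lengths are simply the year ranges quoted on the slide.

```python
def cumulative_gain(rate_per_year: float, years: int) -> float:
    """Total improvement factor after compounding an annual growth rate."""
    return (1.0 + rate_per_year) ** years


# Era lengths are taken from the year ranges on the slide.
eras = [
    ("VAX, 1978 to 1986", 0.25, 8),
    ("RISC + x86, 1986 to 2002", 0.52, 16),
]

for name, rate, years in eras:
    print(f"{name}: about {cumulative_gain(rate, years):.0f}x overall")
```

The same function applied with a 20% rate shows how much more slowly performance accumulates after 2002.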

Single Processor Performance [figure; annotations: RISC, move to multi-processor]

Original Food Chain: big fish eating little fish

1986 Computer Food Chain: mainframe, workstation, PC, minicomputer, mini-supercomputer, supercomputer, massively parallel processors.

2002 Computer Food Chain: mainframe, workstation, PC, server, supercomputer, minicomputer, mini-supercomputer, massively parallel processors. Now who is eating whom?

Why Such Change in 16 Years? Performance. Technology advances: CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance. Computer architecture advances improve the low end: RISC, superscalar, RAID, …

Assignment 1: List the new technologies that have emerged in computer architecture over the past 20 years.

Why Such Change in 16 Years? Price: lower costs due to simpler development (CMOS VLSI: smaller systems, fewer components), higher volumes (CMOS VLSI: the same development cost spread over 10,000 vs. 10,000,000 units), and lower margins by class of computer, due to fewer services.

Why Such Change in 16 Years? Function: the rise of networking/local interconnection technology.

Moore's Law: exponential growth, with the number of transistors doubling every couple of years.

Growth in CPU Transistor Count [figure]


Moore's Law Graph. In 1965, Gordon Moore made the prediction, popularly known as Moore's Law, that the number of transistors on a chip will double about every two years.

Moore's Law Graph. Is a larger die better, or a smaller one? In the figure, the gray circle is the wafer and the yellow dots are impurities.

Moore's Law Graph. Consider: what would happen if a wafer yielded only a single chip?

Moore's Law Graph. An appropriate number of chips per wafer minimizes total cost.

Do you want to be a millionaire? You double your investment every day, starting with one cent. How long does it take to become a millionaire? (a) 20 days (b) 27 days (c) 37 days (d) 365 days (e) Lifetime++

Do you want to be a millionaire? You double your investment every day, starting with one cent. How long does it take to become a millionaire? 20 days: one million cents. 27 days: millionaire. 37 days: billionaire. Transistor counts double every 18 months; this growth rate is hard to imagine.
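The three answers on this slide can be checked with a few lines of code. This is a minimal sketch; the function name `days_to_reach` is illustrative, and amounts are kept in integer cents to avoid rounding.

```python
def days_to_reach(target_cents: int, start_cents: int = 1) -> int:
    """Count how many daily doublings are needed to reach the target."""
    days = 0
    amount = start_cents
    while amount < target_cents:
        amount *= 2
        days += 1
    return days


print(days_to_reach(10**6))   # one million cents
print(days_to_reach(10**8))   # one million dollars
print(days_to_reach(10**11))  # one billion dollars
```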


Uniprocessor Performance (from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006). VAX: 25%/year, 1978 to 1986. RISC + x86: 52%/year, 1986 to 2002. RISC + x86: 20%/year, 2002 to present.

Why has the improvement dropped? The end of the uniprocessor era: the single biggest change in the history of computing systems.

Current Trends in Architecture. We cannot continue to leverage instruction-level parallelism (ILP): single-processor performance improvement ended in 2003. New models for performance: data-level parallelism (DLP), thread-level parallelism (TLP), request-level parallelism (RLP). These require explicit restructuring of the application.

Trends in Technology. Integrated circuit technology: transistor density 35%/year; die size 10-20%/year; overall integration 40-55%/year. DRAM capacity: 25-40%/year (slowing). Flash capacity: 50-60%/year; 15-20X cheaper per bit than DRAM. Magnetic disk technology: 40%/year; 15-25X cheaper per bit than Flash; 300-500X cheaper per bit than DRAM.

Memory Capacity (Single-Chip DRAM)
Year | Size (Mb) | Cycle time
1980 | 0.0625 | 250 ns
1983 | 0.25 | 220 ns
1986 | 1 | 190 ns
1989 | 4 | 165 ns
1992 | 16 | 145 ns
1996 | 64 | 120 ns
2000 | 256 | 100 ns

Bandwidth and Latency. Bandwidth or throughput: total work done in a given time; 10,000-25,000X improvement for processors; 300-1200X improvement for memory and disks. Latency or response time: time between the start and completion of an event; 30-80X improvement for processors; 6-8X improvement for memory and disks.

Log-log plot of bandwidth and latency milestones [figure]

Transistors and Wires. Feature size: the minimum size of a transistor or wire in the x or y dimension; from 10 microns in 1971 to 0.032 microns in 2011. Transistor performance scales linearly. Wire delay does not improve with feature size! Integration density scales quadratically.
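The quadratic scaling of integration density can be made concrete with the two feature sizes on this slide. This is a rough sketch that assumes ideal scaling in both dimensions and ignores layout overheads; `density_gain` is an illustrative name.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Ideal increase in transistors per unit area when feature size shrinks.

    Area per transistor shrinks in both x and y, so density grows with the
    square of the linear shrink factor.
    """
    linear_shrink = old_feature_um / new_feature_um
    return linear_shrink ** 2


# 10 microns (1971) to 0.032 microns (2011), as quoted on the slide.
print(f"linear shrink: {10 / 0.032:.1f}x")
print(f"ideal density gain: {density_gain(10, 0.032):,.0f}x")
```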

Power and Energy. Thermal design power (TDP) characterizes sustained power consumption. It is used as the target for the power supply and cooling system, and is lower than peak power but higher than average power consumption. The clock rate can be reduced dynamically to limit power consumption. Energy per task is often a better measurement.

Power and Energy. For the Core i7 processor, Intel quotes the maximum TDP (Max TDP), not the TDP.

Dynamic Energy and Power. Dynamic energy, for a transistor switching from 0 -> 1 or 1 -> 0: $\text{Energy}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$. Dynamic power: $\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$. Reducing the clock rate reduces power, not energy.
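The last point on this slide follows directly from the two formulas, and the sketch below illustrates it. The capacitance, voltage, and frequency values are made-up inputs chosen only for illustration, not measurements of any real chip.

```python
def dynamic_energy(cap_load: float, voltage: float) -> float:
    """Energy of one 0->1 or 1->0 transition: 1/2 * C * V^2."""
    return 0.5 * cap_load * voltage ** 2


def dynamic_power(cap_load: float, voltage: float, freq_switched: float) -> float:
    """Dynamic power: 1/2 * C * V^2 * f."""
    return dynamic_energy(cap_load, voltage) * freq_switched


# Hypothetical values: 1 nF effective load, 1.0 V, 3 GHz.
c, v, f = 1e-9, 1.0, 3e9

# Halving the clock rate halves power but leaves energy per transition unchanged.
print(dynamic_power(c, v, f), dynamic_power(c, v, f / 2))
print(dynamic_energy(c, v))

# Lowering the voltage by 15% reduces energy to 0.85^2 of the original.
print(dynamic_energy(c, 0.85 * v) / dynamic_energy(c, v))
```

Because voltage enters squared, lowering voltage is the only one of the three knobs that reduces the energy of the task itself.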

Power. The Intel 80386 consumed ~2 W; a 3.3 GHz Intel Core i7 (1st generation) consumes 130 W. That heat must be dissipated from a 1.5 cm x 1.5 cm chip. This is the limit of what can be cooled by air.

Static Power. Static power consumption: $\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$. It scales with the number of transistors. To reduce it, use power gating: turning off the power supply to idle circuits to reduce leakage.

Energy Saving. Do nothing well: turn off the clock of inactive modules (e.g. the floating-point unit, cores). Dynamic voltage-frequency scaling (DVFS). Design for the typical case. Overclocking.

Energy Saving. Dynamic voltage-frequency scaling (DVFS): when the CPU is at only 3% utilization, does it really need to run at full speed?

Energy Saving. Why is it called DVS rather than DVFS? “Figure 5.11 shows the potential power savings of CPU dynamic voltage scaling (DVS) for that same server by plotting the power usage across a varying compute load for three frequency-voltage steps.”

Energy Saving. Design for the typical case: memory and storage offer low-power modes; "emergency slowdown". Overclocking: Intel has offered Turbo mode in its chips since 2008. In Turbo mode, a few cores are allowed to run for a short time at a frequency higher than the nominal clock rate. For example, the 3.3 GHz Core i7 is a multicore microprocessor; different Core i7 models have from 2 to 8 cores, and the Core i7 can run some of its cores at 3.6 GHz for a short period.

Energy Saving. The primary evaluation metric is now tasks per joule or performance per watt, not performance per mm² of silicon.

Question to consider. An observed phenomenon: when the same program runs on the same computer, changes in room temperature affect the program's execution speed. Why does room temperature affect the execution speed of a program, or in other words, the performance of the computer system?

Trends in Cost. Cost is driven down by the learning curve. DRAM: price closely tracks cost. Microprocessors: price depends on volume; 10% less for each doubling of volume.
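The "10% less for each doubling of volume" rule can be written as a small function. This is a sketch of that rule of thumb only; the base price and volumes are hypothetical, and `price_at_volume` is an illustrative name.

```python
import math


def price_at_volume(base_price: float, base_volume: float, volume: float,
                    drop_per_doubling: float = 0.10) -> float:
    """Price after volume grows from base_volume, falling 10% per doubling."""
    doublings = math.log2(volume / base_volume)
    return base_price * (1.0 - drop_per_doubling) ** doublings


# Hypothetical part priced at 100 at a volume of 1,000 units.
for volume in (1_000, 2_000, 4_000, 8_000):
    print(volume, round(price_at_volume(100.0, 1_000, volume), 2))
```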

Dependability. Module reliability: mean time to failure (MTTF); mean time to repair (MTTR); mean time between failures (MTBF) = MTTF + MTTR. Availability = MTTF / MTBF.
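The availability formula is easy to evaluate directly. In the sketch below, the MTTF and MTTR figures are hypothetical inputs chosen for illustration, not data for any particular device.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Hypothetical module: fails on average every 1,000,000 hours, 24 hours to repair.
a = availability(1_000_000, 24)
print(f"availability: {a:.6f}")
print(f"expected downtime per year: {(1 - a) * 365 * 24:.2f} hours")
```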

Conventional Wisdom in Computer Architecture. Old conventional wisdom (CW): power is free, transistors are expensive. New CW: the "power wall": power is expensive, transistors are free (we can put more on a chip than we can afford to turn on).

Conventional Wisdom in Computer Architecture. Old CW: we can keep sufficiently increasing instruction-level parallelism via compilers and innovation (out-of-order execution, speculation, VLIW, …). New CW: the "ILP wall": the law of diminishing returns on more hardware for ILP.

Conventional Wisdom in Computer Architecture. Old CW: multiplies are slow, memory access is fast. New CW: the "memory wall": memory is slow, multiplies are fast (200 clock cycles to DRAM memory, 4 clocks for a multiply).

Conventional Wisdom in Computer Architecture. Old CW: uniprocessor performance 2X / 1.5 years. New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall; uniprocessor performance is now 2X / 5(?) years. The result is a sea change in chip design: multiple "cores" (2X processors per chip / ~2 years). More, simpler processors are more power efficient.

Content of the computer architecture course over the decades. 1950s to 1960s: computer arithmetic. 1970s to mid-1980s: instruction set design, especially ISAs appropriate for compilers. 1990s: design of the CPU, memory system, I/O system, multiprocessors, networks. 2010s: self-adapting systems? Self-organizing structures? DNA systems/quantum computing?

Research topics in computer architecture. Further improving the performance of a single microprocessor (the speed-of-light limit). Microprocessor-based multiprocessor architectures. Improving overall system qualities: availability, maintainability, scalability. Processors built from new kinds of devices, such as optical computers, and computers based on new principles (biological and molecular computing, and more recently DNA computers).

What is Computer Architecture? Between the application and the physics there is a gap too large to bridge in one step (but there are exceptions, e.g. the magnetic compass). In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

Abstraction Layers in Modern Systems (top to bottom): Application; Algorithm; Programming Language; Operating System/Virtual Machine; Instruction Set Architecture (ISA); Microarchitecture; Gates/Register-Transfer Level (RTL); Circuits; Devices; Physics. Annotations in the figure: original domain of the computer architect ('50s-'80s); domain of recent computer architecture ('90s); reinvigoration of computer architecture, mid-2000s onward (parallel computing, security, …; reliability, power, …).

Computer Engineering Methodology (a cycle): evaluate existing systems for bottlenecks; simulate new designs and organizations; implement the next-generation system. Inputs shown in the figure: benchmarks, workloads, implementation complexity, technology trends.

Types of Computers. Traditional computers come in many shapes and sizes: supercomputers; mainframes; minicomputers; microcomputers, also known as PCs; palm computers, also known as PDAs; embedded computers.

Supercomputers. Designed for ultra-high-performance tasks such as weather analysis; large, expensive, massively parallel processing.

Mainframes. Required to deliver high performance; generate and process large numbers of transactions. Example: IBM S/390, 126 MIPS in a single-processor configuration.

Minicomputers. Designed for real-time dedicated applications or as high-performance, multi-user systems. Examples: Digital Alpha, IBM RS/6000, Sun Ultra.

Microcomputers. The most prevalent form, sitting on a standard desktop or even taking the form of a laptop. The original "PC", the IBM PC, was built by IBM.

Apple [figure]

Palm computers. These computers are about the size of a human hand. Uses: word processing, spreadsheet calculations, handwriting recognition, game playing, faxing.

Types of Computers Now. Personal mobile devices (PMD); desktop computing; servers; clusters/warehouse-scale computers (WSC): many desktop computers or servers connected by local area networks to act as a single larger computer, with the largest of the clusters called warehouse-scale computers; embedded computers. What are embedded computers?

Types of Computers Now [figure]

Classes of Parallelism and Parallel Architectures. In applications: data-level parallelism (DLP), task-level parallelism (TLP). Hardware support: instruction-level parallelism; vector architectures and graphics processing units (GPUs); thread-level parallelism; request-level parallelism.

Flynn Categories. Single instruction stream, single data stream (SISD); single instruction stream, multiple data streams (SIMD); multiple instruction streams, single data stream (MISD); multiple instruction streams, multiple data streams (MIMD).

Flynn Categories [figure]

Flynn Categories. Some further divide the MIMD category into SPMD (single program, multiple data) and MPMD (multiple program, multiple data). SPMD: multiple autonomous processors simultaneously executing the same program on different data. MPMD: multiple autonomous processors simultaneously running at least two independent programs.

Flynn's Web Page (copied from Stanford University)

Intel 4004 [figure]

Intel 8008 [figure]

Intel 80286 [figure]

Intel 80386 [figure]

Intel 80486 [figure]

Intel Pentium [figure]

Intel Pentium Pro [figure]

Intel Pentium II [figure]

Pentium Evolution (1). 8080: first general-purpose microprocessor; 8-bit data path; used in the first personal computer, the Altair. 8086: much more powerful; 16-bit; instruction cache that prefetches a few instructions; the 8088 variant (8-bit external bus) was used in the first IBM PC. 80286: 16 MB of addressable memory, up from 1 MB. 80386: 32-bit; support for multitasking.

Pentium Evolution (2). 80486: sophisticated, powerful cache and instruction pipelining; built-in math coprocessor. Pentium: superscalar; multiple instructions executed in parallel. Pentium Pro: increased superscalar organization; aggressive register renaming; branch prediction; data flow analysis; speculative execution.

Pentium Evolution (3). Pentium II: MMX technology; graphics, video, and audio processing. Pentium III: additional floating-point instructions for 3D graphics. Pentium 4: note the Arabic rather than Roman numeral; further floating-point and multimedia enhancements.

Sea Change in Chip Design. Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip. RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip. A 125 mm² chip in 0.065 micron CMOS = 2312 × (RISC II + FPU + Icache + Dcache). RISC II shrinks to ~0.02 mm² at 65 nm. Caches via DRAM or 1-transistor SRAM? Is the processor the new transistor?

Problems with Sea Change. Algorithms, programming languages, compilers, operating systems, architectures, libraries, … are not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs per chip.

Problems with Sea Change. Architectures are not ready for 1000 CPUs per chip. Unlike instruction-level parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of architects.

Problems with Sea Change. This edition of our course and the 4th edition of the textbook "Computer Architecture: A Quantitative Approach" explore the shift from instruction-level parallelism to thread-level parallelism and data-level parallelism.

Measurement and Evaluation. Architecture is an iterative process: searching the space of possible designs, at all levels of computer systems.

Measurement and Evaluation. Creativity generates ideas; cost/performance analysis sorts them into good ideas, mediocre ideas, and bad ideas. Note: the English term "cost/performance" and the customary Chinese term 性能/价格 (performance/price) are exact inverses of each other!


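Performance and Cost. "X is n times faster than Y" means $\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$.

Amdahl's Law. $\text{Speedup} = \frac{\text{Performance for entire task using the enhancement}}{\text{Performance for entire task without using the enhancement}} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement}}$

Amdahl's Law Depends on Two Factors. The first is $\text{Fraction}_{\text{enhanced}}$: the fraction of the computation time in the original machine that can be converted to take advantage of the enhancement, that is, (time taken by the improvable part) / (execution time of the whole task before the enhancement), which is less than 1. Example: if the whole task takes 60 seconds before the enhancement and the improvable part takes 20 seconds of that, then $\text{Fraction}_{\text{enhanced}} = 20/60$.

Amdahl's Law Depends on Two Factors. The second is $\text{Speedup}_{\text{enhanced}}$: the improvement gained by the enhanced execution mode, that is, (execution time of the improved part before the enhancement) / (execution time of the improved part after the enhancement), which is greater than 1. Example: if the improved part takes 5 seconds before the enhancement and 2 seconds after it, then $\text{Speedup}_{\text{enhanced}} = 5/2$.

Conclusion (1) from Amdahl's Law. $\frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} = \frac{\text{Execution time of the improved part after the enhancement}}{\text{Execution time of the whole task before the enhancement}}$, which follows from dividing the definition of $\text{Fraction}_{\text{enhanced}}$ by the definition of $\text{Speedup}_{\text{enhanced}}$.

Conclusion (2) from Amdahl's Law. From conclusion (1): $\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$

Amdahl's Law: example (1). Floating-point instructions are improved to run 2X faster, but only 10% of actual instructions are FP. $\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$, so $\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$.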
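The formula and the floating-point example can be reproduced in a few lines. This is a minimal sketch; `overall_speedup` is an illustrative name, and its two arguments correspond to Fraction enhanced and Speedup enhanced as defined on the preceding slides.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's example: 10% of instructions are FP and FP runs 2X faster.
print(round(overall_speedup(0.10, 2.0), 3))

# Even an enormous speedup of the enhanced part is bounded by 1 / (1 - F).
print(round(overall_speedup(0.10, 1e9), 3))
```

The second call shows why the law matters: with only 10% of the time enhanced, the overall speedup can never exceed about 1.11.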

Amdahl's Law: example (2)

Amdahl's Law: example (3)

CPU Time. $\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$, or equivalently $\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$.

Cycles Per Instruction (Throughput). "Average cycles per instruction": $\text{CPI} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}} = \frac{\text{Cycles}}{\text{Instruction count}}$. Therefore $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$.
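The two relations on this slide are inverses of each other, which a short sketch can confirm. The instruction count, CPI, and clock rate below are hypothetical numbers chosen for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(cpu_time_s: float, clock_rate_hz: float,
                  instruction_count: float) -> float:
    """CPI = (CPU time * clock rate) / instruction count."""
    return cpu_time_s * clock_rate_hz / instruction_count


# Hypothetical program: 1e9 instructions, CPI of 1.5, on a 2 GHz clock.
t = cpu_time(1e9, 1.5, 2e9)
print(t)                            # seconds
print(cpi_from_time(t, 2e9, 1e9))   # recovers the CPI
```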

Cycles Per Instruction (Throughput) [equation not recoverable from the transcript]

Cycles Per Instruction (Throughput). "Instruction frequency". Invest resources where time is spent!

Example:


Aspects of CPU Performance
$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
Factor | Inst Count | CPI | Clock Rate
Program | X | |
Compiler | X | (X) |
Inst. Set | X | X |
Organization | | X | X
Technology | | | X

Example: Calculating CPI. Base machine (Reg / Reg), typical mix:
Op | Freq | Cycles | CPI(i) | (% Time)
ALU | 50% | 1 | 0.5 | (33%)
Load | 20% | 2 | 0.4 | (27%)
Store | 10% | 2 | 0.2 | (13%)
Branch | 20% | 2 | 0.4 | (27%)
Total CPI | | | 1.5 |
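The table can be recomputed from the instruction mix: each class contributes frequency times cycles to the average CPI, and its share of execution time is that contribution divided by the total. The sketch below uses the slide's numbers; the function names are illustrative.

```python
# Instruction mix from the slide: operation -> (frequency, cycles).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Average CPI is the frequency-weighted sum of per-class cycle counts."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix):
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


print(f"CPI = {average_cpi(MIX):.1f}")
for op, share in time_shares(MIX).items():
    print(f"{op}: {share:.0%} of time")
```

Note that ALU operations are half of all instructions but only a third of the time, which is the point behind "invest resources where time is spent".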

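Performance metrics: MIPS (million instructions per second). $\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$. Drawbacks: it depends on the instruction set; it varies from program to program on the same machine; it can vary inversely with actual performance.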
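The third drawback is the least intuitive, so here is a small sketch of it. The two "builds" of the same program are hypothetical: build B executes more but simpler instructions than build A on the same 2 GHz clock. The execution time uses CPU time = instruction count × CPI / clock rate.

```python
def exec_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds: instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK = 2e9  # hypothetical 2 GHz machine
builds = {"A": (1.0e9, 1.5), "B": (1.6e9, 1.0)}  # (instruction count, CPI)

for name, (count, cpi) in builds.items():
    t = exec_time(count, cpi, CLOCK)
    print(f"build {name}: {t:.2f} s, {mips(count, t):.0f} MIPS")
```

Build B reports the higher MIPS rating yet takes longer to finish, which is why execution time remains the measure to trust.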

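Performance metrics: MFLOPS (million floating-point operations per second). $\text{MFLOPS} = \frac{\text{Number of floating-point operations in the program}}{\text{Execution time} \times 10^6}$. Advantage: it can be used to compare different machines. Drawbacks: it does not reflect overall performance; it depends on the types of floating-point operations.

Performance metrics: benchmark programs. The only consistent and reliable measure of performance is the execution time of real programs. Kinds of benchmarks: real applications, kernels, toy benchmarks, synthetic benchmarks.

Benchmark Suites. Desktop: SPEC CPU2006 (12 integer, 17 floating-point); SPECviewperf and SPECapc (graphics benchmarks). Server: SPEC CPU2006 running multiple copies (SPECrate); SPECSFS (NFS performance); SPECWeb (Web server benchmark); TPC-x (measures transaction processing, queries, and decision-making database applications). Embedded processors: a new area; EEMBC (EDN Embedded Microprocessor Benchmark Consortium).

Comparing performance. Execution times of two programs on three computers [table not recoverable from the transcript]. Total execution time: a consistent measure.

Comparing performance: mean execution time. The arithmetic mean of the execution times: $\frac{1}{n} \sum_{i=1}^{n} T_i$, where $T_i$ is the execution time of the $i$-th program.

Comparing performance: harmonic mean of execution rates. $\frac{n}{\sum_{i=1}^{n} \frac{1}{R_i}}$, where $R_i = 1/T_i$ and $T_i$ is the execution time of the $i$-th program.

Comparing performance: weighted execution time. The weighted arithmetic mean: $\sum_{i=1}^{n} W_i \times T_i$, where $W_i$ is the weight of the $i$-th program in the workload and $T_i$ is that program's execution time.

Comparing performance: geometric mean. $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$, where each execution time ratio is normalized to a base machine. It is used to compute SPECrate.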
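The four summary statistics from the preceding slides can be compared side by side. This is a sketch under stated assumptions: the execution times for machines A and B are invented for illustration, and the function names are not from the course material.

```python
import math


def arithmetic_mean(times):
    """Mean execution time: (1/n) * sum of T_i."""
    return sum(times) / len(times)


def weighted_mean(weights, times):
    """Weighted execution time: sum of W_i * T_i (weights should sum to 1)."""
    return sum(w * t for w, t in zip(weights, times))


def harmonic_mean_rate(times):
    """Harmonic mean of rates R_i = 1/T_i: n / sum of (1/R_i)."""
    rates = [1.0 / t for t in times]
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios):
    """n-th root of the product of normalized execution time ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution times (seconds) of two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

print(arithmetic_mean(times_a), arithmetic_mean(times_b))
print(weighted_mean([0.5, 0.5], times_b))
print(harmonic_mean_rate(times_b))

# Geometric mean of B's times normalized to A as the base machine.
print(geometric_mean([b / a for a, b in zip(times_a, times_b)]))
```

With these made-up numbers the arithmetic mean favors B while the geometric mean of the normalized ratios rates the two machines as equal, which illustrates why the choice of summary statistic matters.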

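Assignment 2. Read English-language literature on the Power Wall, ILP Wall, and Memory Wall. Requirements: each student reads at least one English-language paper; write a reading report in the style of an extended abstract (in Chinese or English) and cite the source; submit the paper you read together with the report (file name: 作业2+姓名, that is, "Assignment 2 + name").

Assignment 3. Case Studies 1.4 from the 5th edition. The full problem statement is on the next page.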
