周学海 xhzhou@ustc.edu.cn 0551-63601556, 63492149 中国科学技术大学 2018/11/9 计算机体系结构周学海 xhzhou@ustc.edu.cn 0551-63601556, 63492149 中国科学技术大学 11/9/2018 中国科学技术大学.

Slides:

Advertisements

Similar presentations

1 I/O 设备访问方式和类型. 2 Overview n The two main jobs of a computer: l I/O (Input/Output) l processing n The control of devices connneted to the computer is.

Advertisements

程序的执行程序执行和指令执行概述数据通路基本结构和工作原理流水线方式下指令的执行

第 2 章中央處理單元.

Chapter 10 效能測量與分析.

2017年3月5日单片机原理与应用背景知识调查.

建構 Beta電腦 – Fall /29/0.

第3章流水线技术.

Performance Evaluation

資料庫設計 Database Design.

第二章指令系统.

大连理工大学软件学院软件工程系赖晓晨计算机组成与结构大连理工大学软件学院软件工程系赖晓晨

Operators and Expressions

CH1 Number Systems and Conversion

Arithmetic for Computers

Population proportion and sample proportion

CH.2 Introduction to Microprocessor-Based Control

第 2 章中央處理單元.

微处理器设计1 刘鹏 College of ISEE Zhejiang University

臺北市立大學資訊科學系(含碩士班) 賴阿福 CS TEAM

Quiz 3 假设各种分支占所有指令数的百分比如下表所示：

周学海 , 中国科学技术大学 2018/11/15 计算机体系结构周学海 , 中国科学技术大学 11/15/2018 中国科学技术大学.

Lecture on High Performance Processor Architecture (CS05162)

第4章处理器（CPU） 4.1 引言 4.2 逻辑设计的一般方法 4.3 建立数据通路 4.4 一个简单的实现机制 4.5 多周期实现机制.

计算机系统结构第一章基本概念第二章指令系统第三章存储系统第四章输入输出系统** 第五章标量处理机第六章向量处理机

指令集架構計算機也跟人類一樣，需要提供一套完整的語言讓人們跟它充分溝通，以完成正確的計算工作。

1-1 微電腦系統單元 1-2 微電腦系統架構 1-3 微控制器（單晶片微電腦） 1-4 類比與數位訊號介面

1-1 微電腦系統單元 1-2 微電腦系統架構 1-3 微控制器（單晶片微電腦） 1-4 類比與數位訊號介面

5 Computer Organization (計算機組織).

C 程式設計— 語言簡介台大資訊工程學系資訊系統訓練班.

Lecture on High Performance Processor Architecture (CS05162)

The Processor: Datapath and Control

微程序控制器刘鹏 Dept. ISEE Zhejiang University

第2章计算机指令集结构曲冠南

创建型设计模式.

C++ 程式設計— 語言簡介台大資訊工程學系資訊系統訓練班.

嵌入式微处理器系统第二章处理器技术（1）北京大学软件与微电子学院.

微处理器设计2 刘鹏 College of ISEE Zhejiang University

C 語言簡介 - 2.

7.1 机器指令 7.2 操作数类型和操作类型 7.3 寻址方式 7.4 指令格式举例 7.5 RISC 技术.

第14章其它DSP设计库 14.1 总线控制库 14.2 复数信号库 14.3 Gates库 14.4 状态机函数库

第二章计算机指令集结构指令集结构是系统结构的主要内容之一，其功能设计实际上就是确定软/硬件功能分配。

重點資料結構之選定會影響演算法選擇對的資料結構讓您上天堂程式.

陳慶瀚機器智慧與自動化技術(MIAT)實驗室國立中央大學資工系 2013年5月28日

校園網路架構介紹與資源利用主講人：趙志宏圖書資訊館網路通訊組.

逆向工程-汇编语言

CPU结构和功能.

第二章计算机指令集结构指令集结构是系统结构的主要内容之一，其功能设计实际上就是确定软/硬件功能分配。

計算機概論第3章計算機組織與結構概觀.

Instructions： Language of the Machine

CISC vs. RISC 複雜指令集電腦(Complex Instruction Set Computer: CISC)

第2章计算机指令集结构授课教师：车喜龙

C语言程序设计主讲教师：陆幼利.

Speaker: Liu Yu-Jiun Date: 2009/4/29

The Processor: Datapath and Control (Multi-cycle implementation)

中国科学技术大学计算机系陈香兰 2013Fall 第七讲存储器管理中国科学技术大学计算机系陈香兰 2013Fall.

虚拟仪器 virtual instrument

中国科学技术大学计算机系陈香兰 Fall 2013 第三讲线程中国科学技术大学计算机系陈香兰 Fall 2013.

周学海中国科学技术大学 2019/4/19 计算机体系结构周学海中国科学技术大学.

Lightweight Data-flow Analysis for Execution-driven Constraint Solving

Distance Vector vs Link State

BiCuts: A fast packet classification algorithm using bit-level cutting

College of Computer Science & Technology

第4章指令级并行授课教师：车喜龙

临界区问题的硬件指令解决方案（Synchronization Hardware）

3. 逻辑运算指令 A、简单逻辑操作指令 CLR A. (不影响CY、AC、 OV标志) CPL A

Distance Vector vs Link State Routing Protocols

2 Number Systems, Operations, and Codes

清华大学计算机科学与技术系高性能计算研究所郑纬民教授 2005年5月

C++语言程序设计 C++语言程序设计第一章 C++语言概述第十一组 C++语言程序设计.

Introduction to Computer Security and Cryptography

Presentation transcript:

周学海 xhzhou@ustc.edu.cn 0551-63601556, 63492149 中国科学技术大学 2018/11/9 计算机体系结构周学海 xhzhou@ustc.edu.cn 0551-63601556, 63492149 中国科学技术大学 11/9/2018 中国科学技术大学

ISA的分类寻址方式操作数的类型、表示和大小 ISA的操作控制类指令指令编码编译技术与计算机体系结构 MIPS指令集 2018/11/9 中国科学技术大学

ISA instruction set software hardware 2018/11/9 中国科学技术大学

ISA的概念软件子系统与硬件子系统的关键界面一组直接由硬件执行的指令，包括应具备的特性程序员可见的指令集合程序员可见的机器状态(registers and memory) 应具备的特性可持续多代，以保持向后（backward) 兼容可用于不同应用领域（desktops, servers, embedded applications）为高层软件的设计与开发提供方便的功能方便低层硬件子系统高效实现 2018/11/9 中国科学技术大学

ISA 的演进 2018/11/9 中国科学技术大学

ISA 的基本分类早期的ISA 0 address add ; tos ¬ tos + next 堆栈型（Stack）: 0 address add ; tos ¬ tos + next 累加器型(Accumulator) (1 register): 1 address add A ;acc ¬ acc + mem[A] 被通用寄存器型ISA取代通用寄存器型 (General Purpose Register): Register-memory 2 address add A B ；EA[A] ¬ EA[A] + EA[B] 3 address add A B C ；EA[A] ¬ EA[B] + EA[C] Load/Store: 3 address add Ra Rb Rc ；Ra ¬ Rb + Rc load Ra Rb ；Ra ¬ mem[Rb] store Ra Rb ；mem[Rb] ¬ Ra 存储器-存储器型（目前已经没有） 2018/11/9 中国科学技术大学

不同的指令格式 Zero-address format: Stack machines ADD Stack[SP-1] Stack[SP] + Stack[SP-1] Usually top of stack is kept in high-speed registers One-address format: Accumulator machines ADD (X) AC AC + Memory[X] Two-address format: destination = first source operand ADD R1, R2 Reg[R1] Reg[R1] + Reg[R2] ADD R1, (X) Reg[R1] Reg[R1] + Memory[X] ADD (X), (Y) Memory[X] Memory[X] + Memory[Y] Three-address format: most RISC architectures ADD R3, R1, R2 Reg[R3] Reg[R1] + Reg[R2] Variety 2018/11/9 中国科学技术大学

ISA: 必须说明哪些东西? 指令格式或编码方式。即如何编码? 操作数和操作结果的存放位置数据类型和大小寻址方式支持哪些操作 Instruction Fetch Decode Operand Execute Result Store Next 指令格式或编码方式。即如何编码? 操作数和操作结果的存放位置存放位置? 多少个显式操作数? 存储器操作数如何定位? 哪些操作数可以或不可以放到存储器中? 数据类型和大小寻址方式支持哪些操作下一条指令地址 jumps, conditions, branches fetch-decode-execute is implicit! 2018/11/9 中国科学技术大学

有关ISA的若干问题存储器寻址操作数的类型与大小所支持的操作控制转移类指令指令格式 2018/11/9 中国科学技术大学

存储器寻址 80年以来几乎所有机器的存储器都是按字节编址一个存储器地址可以访问：不同体系结构对字的定义是不同的问题：一个字节、2个字节、4个字节、更多字节….. 不同体系结构对字的定义是不同的 16位字（Intel X86）32位字（MIPS）如何读32位字，两种方案每次一个字节，四次完成；每次一个字，一次完成问题：（1）如何将字节地址映射到字地址（尾端问题）（2）一个字是否可以存放在任何字节边界上（对齐问题）即尾端（Endian)和对齐问题 2018/11/9 中国科学技术大学

尾端问题 little endian, big endian, 在一个字内部的字节顺序问题，如地址xxx00指定了一个字（int）, 存储器中从xxx00处连续存放ffff0000, 则有两种方式： Little endian 方式下xxx00位置是字的最低字节，整数值为0000ffff, Intel 80x86, DEC Vax, DEC Alpha (Windows NT) Big endian 方式下xxx00位置是字的最高字节，整数值为ffff0000, IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA 2018/11/9 中国科学技术大学

对齐问题对s字节的对象访问地址为A，如果A mod s =0 称为边界对齐。边界对齐的原因是存储器本身读写的要求，存储器本身读写通常就是边界对齐的，对于不是边界对齐的对象的访问可能要导致存储器的两次访问，然后再拼接出所需要的数。（或发生异常） 2018/11/9 中国科学技术大学

寻址方式寻址方式：如何说明要访问的对象地址有效地址：由寻址方式说明的某一存储单元的实际存储器地址。有效地址 vs. 物理地址 2018/11/9 中国科学技术大学

各种寻址方式的使用情况? (忽略寄存器直接寻址) 三个SPEC89程序在VAX结构上的测试结果：立即寻址，偏移寻址使用较多 2018/11/9 中国科学技术大学

偏移寻址主要问题：偏移的范围（偏移量的大小） Alpha Architecture with full optimization for Spec CPU2000, showing the average of integer programs(CINT2000) and the average of floating-point programs (CFP2000) 2018/11/9 中国科学技术大学

立即数寻址 Alpha Architecture with full optimization for Spec CPU2000, showing the average of integer programs(CINT2000) and the average of floating-point programs (CFP2000) 2018/11/9 中国科学技术大学

立即数的大小 The distribution of immediate values. About 20% were negative for CINT2000 and about 30% were negative for CFP2000. These measurements were taken on a Alpha, where the maximum immediate is 16 bits, for the spec cpu2000 programs. A similar measurement on the VAX, which supported 32-bit immediates, showed that about 20% to 25% of immediates were longer than 16 bits. 2018/11/9 中国科学技术大学

寻址方式小结重要的寻址方式: 偏移寻址方式, 立即数寻址方式, 寄存器间址方式 SPEC测试表明，使用频度达到 75%--99% 偏移字段的大小应该在 12 - 16 bits 可满足75%-99%的需求立即数字段的大小应该在 8 -16 bits 可满足50%-80%的需求 03/11 lecture 3 1, 性能综合（第6节） 2，第2章第1次（第7节）ISA 分类、memory addressing 2018/11/9 中国科学技术大学

操作数的类型、表示和大小操作数类型和操作数表示也是软硬件的主要界面之一。操作数类型：是面向应用、面向软件系统所处理的各种数据类型。整型、浮点型、字符、字符串、向量类型等类型由操作码确定或数据附加硬件解释的标记，一般采用由操作码确定数据附加硬件解释的标记，现在已经不采用操作数的表示：操作数在机器中的表示，硬件结构能够识别，指令系统可以直接使用的表示格式整型：原码、反码、补码浮点：IEEE 754标准十进制：BCD码/二进制十进制表示 2018/11/9 中国科学技术大学

常用操作数类型 ASCII character = 1 byte (64-bit register can store 8 characters Unicode character or Short integer = 2 bytes = 16 bits （half word) Integer = 4 bytes = 32 bits (word size on many RISC Processors) Single-precision float = 4 bytes = 32 bits (word size) Long integer = 8 bytes = 64 bits (double word) Double-precision float = 8 bytes = 64 bits (double word) Extended-precision float = 10 bytes = 80 bits (Intel architecture) Quad-precision float = 16 bytes = 128 bits 2018/11/9 中国科学技术大学

操作数的大小基准测试的结论：（1）对单字、双字的数据访问具有较高的频率（2）定义操作数字段长度为64位，更具有一般性 2018/11/9 中国科学技术大学

ISA的操作 CISC 指令集结构的功能设计 RISC 指令结构的功能设计 2018/11/9 中国科学技术大学

典型操作类型一般计算机都支持前三类所有的操作；不同计算机系统对系统支持程度不同，但都支持基本的系统功能。对最后四类操作的支持程度差别也很大，有些机器不支持，有些机器还在此基础上做一些扩展，这些指令有时作为可选的指令。 2018/11/9 中国科学技术大学

Top 10 80x86 Instructions 2018/11/9 中国科学技术大学

ISA对操作类型的选择需考虑的因素：速度、价格和灵活性基本要求：指令系统的完整性、规整性、高效率和兼容性完整性设计：具备基本指令种类兼容性：系列机高效率：指令执行速度快、使用频度高规整性让所有运算部件都能对称、均匀的在所有数据存储单元之间进行操作。对所有数据存储单元都能同等对待，无论是操作数或运算结果都可以无约束地存放到任意数据存储单元中正交性数据类型独立于寻址方式寻址方式独立于所要完成的操作当前对这一问题的处理有两种截然不同的方向 CISC和RISC 2018/11/9 中国科学技术大学

CISC计算机ISA的功能设计目标：强化指令功能，减少指令条数，以提高系统性能基本优化方法 1. 面向目标程序的优化是提高计算机系统性能最直接方法优化目标缩短程序的长度缩短程序的执行时间优化方法统计分析目标程序执行情况，找出使用频度高，执行时间长的指令或指令串优化使用频度高的指令用新的指令代替使用频度高的指令串 2018/11/9 中国科学技术大学

优化目标程序的主要途径（1/2) 1）增强运算型指令的功能如sin(x), Cos(x), SQRT(X)，甚至多项式计算如用一条三地址指令完成 P(X) = C(0)+C(1)X+C(2)X2+C(3)X3+….. 2) 增强数据传送类指令的功能主要是指数据块传送指令 R-R, R-M, M-M之间的数据块传送可有效的支持向量和矩阵运算，如IBM370 R-Stack之间设置数据块传送指令，能够在程序调用和程序中断时，快速保存和恢复程序现场，如 VAX-11 2018/11/9 中国科学技术大学

优化目标程序的主要途径（2/2) 3) 增强程序控制指令的功能在CISC中，一般均设置了多种程序控制指令，正常仅需要转移指令和子程序控制指令 2. 面向高级语言和编译程序改进指令系统优化目标：主要是缩小HL-ML之间的差距优化方法： 1）增强面向HL和Compiler支持的指令功能统计分析源程序中各种语句的使用频度和执行时间增强相关指令的功能优化使用频度高、执行时间长的语句增加专门指令，以缩短目标程序长度，减少目标程序执行时间，缩短编译时间 2018/11/9 中国科学技术大学

FORTRAN语言和COBOL语言中各种主要语句的使用频度一元赋值其他赋值 IF GOTO I/O DO CALL 其他 FORTRAN 31.0 15.0 11.5 10.5 6.5 4.5 6.0 COBOL 42.1 7.5 19.1 8.46 0.17 3.4 观察结果：（1）一元赋值在其中比例最大，增强数据传送类指令功能，缩短这类指令的执行时间是对高级语言非常有力的支持，（2）其他赋值语句中，增1操作比例较大，许多机器都有专门的增1指令（3）条件转移和无条件转移占22％，38.2%,因此增强转移指令的功能，增加转移指令的种类是必要的 2018/11/9 中国科学技术大学

2）高级语言计算机系统 3）支持操作系统的优化实现－些特权指令缩小HL和ML的差别时，走到极端，就是把HL和ML合二为一，即所谓的高级语言计算机。在这种机器中，高级语言不需要经过编译，直接由机器硬件来执行。如LISP机，PROLOG机 3）支持操作系统的优化实现－些特权指令任何一种计算机系统通常需要操作系统，而OS又必须用指令系统来实现，指令系统对OS的支持主要有处理器工作状态和访问方式的转换进程的管理和切换存储管理和信息保护进程同步和互斥，信号量的管理等 2018/11/9 中国科学技术大学

03/01-Review ISA需考虑的问题 ISA的类型寻址方式 Class of ISA Memory addressing Types and sizes of operands Operations Control flow instructions Encoding an ISA ISA的类型通用寄存器型占主导地位寻址方式重要的寻址方式: 偏移寻址方式, 立即数寻址方式, 寄存器间址方式 SPEC测试表明，使用频度达到 75%--99% 偏移字段的大小应该在 12 - 16 bits, 可满足75%-99%的需求立即数字段的大小应该在 8 -16 bits, 可满足50%-80%的需求 03/04 1 chapter 1 性能综合评估算术平均调和平均几何平均 2 Chapter 2 ISA的分类、寻址方式 2018/11/9 中国科学技术大学

03/01-Review (续) ISA的功能设计 CISC（Complex Instruction Set Computer）任务：确定硬件支持哪些操作方法：统计的方法两种类型：CISC和RISC CISC（Complex Instruction Set Computer）目标：强化指令功能，减少运行的指令条数，提高系统性能方法：面向目标程序的优化，面向高级语言和编译器的优化 RISC（Reduced Instruction Set Computer）目标：通过简化指令系统，用高效的方法实现最常用的指令方法：充分发挥流水线的效率，降低（优化）CPI 2018/11/9 中国科学技术大学

RISC指令集结构的功能设计采用RISC体系结构的微处理器 SUN Microsystem: SPARC, SuperSPARC, Ulta SPARC SGI: MIPS R4000, R5000, R10000, IBM: Power PC Intel: 80860, 80960 DEC: Alpha Motorola 88100 HP HP300/930系列，950系列 ARM，MIPS 2018/11/9 中国科学技术大学

从CISC到RISC 1974 John Cocke 和他的团队在给电话交换机设计控制器时，开始研究指令系统的合理性问题，John Cocke (1987年图灵奖获得者）70年代末 IBM推出 801CPU, 1986年推出IBM RT PC（或称6150） RISC architecture The first prototype computer to use reduced instruction set computer (RISC) architecture was designed by IBM researcher John Cocke and his team in the late 1970s. For his efforts, Cocke received the Turing Award in 1987, the US National Medal of Science in 1994, and the US National Medal of Technology in 1991. IBM RT PC The RISC Technology Personal Computer (RT PC) was introduced in 1986, and featured the 32-bit RISC architecture. 2018/11/9 中国科学技术大学

1979 David Patterson 研究了CISC指令系统主要存在以下几方面的问题： VLSI设计困难，不利于单片集成；许多复杂指令操作复杂，运行速度慢；各条指令不规整，不利于先进计算机体系结构技术来提高系统的性能。 1981: Patterson等人研制成功了32位RISC I 微处理器。31种指令，三种数据类型，只有变址和相对寻址两种寻址方式，字长 32位 1983：RISC II 2018/11/9 中国科学技术大学

RISC的定义和特点 RISC是一种计算机体系结构的设计思想，它不是一种产品。RISC是近代计算机体系结构发展史中的一个里程碑，直到现在，RISC还没有一个确切的定义早期对RISC特点的描述大多数指令在单周期内完成采用Load/Store结构硬布线控制逻辑减少指令和寻址方式的种类固定的指令格式注重代码的优化从目前的发展看，RISC体系结构还应具有如下特点：面向寄存器结构十分重视流水线的执行效率－尽量减少断流重视优化编译技术 2018/11/9 中国科学技术大学

减少指令平均执行周期数是RISC思想的精华 90年代，IEEE的Michael Slater对RISC的定义做了如下描述： RISC为使流水线高效执行，应具有如下特征：简单而统一格式的指令译码大部分指令可以单周期执行完成只有Load/Store指令访存简单寻址方式采用Load延迟技术 RISC为使优化编译便于产生优化代码，应具有如下特征：三地址指令格式较多的寄存器对称的指令格式减少指令平均执行周期数是RISC思想的精华 2018/11/9 中国科学技术大学

问题 RISC的指令系统精简了，CISC中的一条指令可能由一串指令才能完成，那么为什么RISC执行程序的速度比CISC还要快？ ExecuteTime = CPI X IC X T IC CPI T CISC 1 2~15 33ns~5ns RISC 1.3~1.4 1.1~1.4 10ns~2ns IC : 实际统计结果，RISC的IC只比CISC 长30％~40% CPI: CISC CPI一般在4~6之间，RISC 一般CPI =1 , Load/Store 为2 T: RISC采用硬布线逻辑，指令要完成的功能比较简单 2018/11/9 中国科学技术大学

硬件方面：硬布线控制逻辑，减少指令和寻址方式的种类，使用固定格式，采用Load/Store，指令执行过程中设置多级流水线。 RISC为什么会减少CPI 硬件方面：硬布线控制逻辑，减少指令和寻址方式的种类，使用固定格式，采用Load/Store，指令执行过程中设置多级流水线。软件方面：十分强调优化编译的作用 2018/11/9 中国科学技术大学

RISC的关键技术延时转移技术指令取消技术指令流调整技术重叠寄存器窗口技术硬件为主固件为辅处理器中设置较多的寄存器堆，并划分很多窗口每个过程使用其中相邻的三个窗口和一个公共的窗口一个与前一个过程共用，一个与下一个过程共用，一个作为局部寄存器硬件为主固件为辅 2013-03-08 2018/11/9 中国科学技术大学

重叠寄存器窗口技术图示 2018/11/9 中国科学技术大学

操作类型小结以下指令类型使用频度最高，指令系统应该支持这些类型的指令 load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return; 2018/11/9 中国科学技术大学

控制类指令四种类型的控制流改变：条件分支( Conditional branch) 、跳转(Jump)、过程调用(Procedure calls)、过程返回(Procedure returns) Alpha Architecture with full optimization for Spec CPU2000, showing the average of integer programs(CINT2000) and the average of floating-point programs (CFP2000) 2018/11/9 中国科学技术大学

控制流类指令中的寻址方式 PC-relative 方式（相对寻址）说明动态的转移地址方式: 编译时不知道转移地址，程序执行时动态确定转移地址放到某一寄存器中 ……. 2018/11/9 中国科学技术大学

转移目标地址与当前指令的距离 Alpha Architecture with full optimization for Spec CPU2000, showing the average of integer programs(CINT2000) and the average of floating-point programs (CFP2000) 建议：PC-relative 寻址，偏移地址至少8位 2018/11/9 中国科学技术大学

分支比较类型比较 Alpha Architecture with full optimization for Spec CPU2000, showing the average of integer programs(CINT2000) and the average of floating-point programs (CFP2000) 2018/11/9 中国科学技术大学

指令编码 Variable: Fixed: Hybrid: … 2018/11/9 中国科学技术大学

指令格式选择策略 • 如果代码长度最重要，那么使用变长指令格式如果性能至关重要，使用固定长度指令有些嵌入式CPU附加可选模式，由每一应用自己选择性能还是代码量有些机器使用边执行边解压的方式 2018/11/9 中国科学技术大学

指令格式如果每条指令存在多个存储器操作数，或有多种寻址方式 =>每一操作数都要说明其寻址方式如果每条指令存在多个存储器操作数，或有多种寻址方式 =>每一操作数都要说明其寻址方式对于Load-store型机器，每条指令只有一个存储器地址，并且只有较少的寻址方式 => 可以将寻址方式反映在操作码中 2018/11/9 中国科学技术大学

MIPS 寻址方式/指令格式所有指令都是32位宽 Register Indirect? Register (direct) op rs rt rd register Immediate op rs rt immed Base+index op rs rt immed Memory register + PC-relative op rs rt immed Memory PC + Register Indirect? 2018/11/9 中国科学技术大学

Review 操作数的类型、大小 ISA的功能设计：任务为确定硬件支持哪些操作。方法是统计的方法。存在CISC和RISC两种类型单字、双字的数据访问具有较高的频率，定义操作数字段长度为64位，更具有一般性 ISA的功能设计：任务为确定硬件支持哪些操作。方法是统计的方法。存在CISC和RISC两种类型 CISC（Complex Instruction Set Computer）目标：强化指令功能，减少指令的指令条数，以提高系统性能基本方法：面向目标程序的优化，面向高级语言和编译器的优化 RISC（Reduced Instruction Set Computer）目标：通过简化指令系统，用最高效的方法实现最常用的指令主要手段：充分发挥流水线的效率，降低（优化）CPI 控制转移类指令指令编码（指令格式） 2018/11/9 中国科学技术大学

ISA的演进 2018/11/9 中国科学技术大学

-Review 操作数的类型、大小 ISA的功能设计：任务为确定硬件支持哪些操作。方法是统计的方法。存在CISC和RISC两种类型单字、双字的数据访问具有较高的频率，定义操作数字段长度为64位，更具有一般性 ISA的功能设计：任务为确定硬件支持哪些操作。方法是统计的方法。存在CISC和RISC两种类型 CISC（Complex Instruction Set Computer）目标：强化指令功能，减少指令的指令条数，以提高系统性能基本方法：面向目标程序的优化，面向高级语言和编译器的优化 RISC（Reduced Instruction Set Computer）目标：通过简化指令系统，用最高效的方法实现最常用的指令主要手段：充分发挥流水线的效率，降低（优化）CPI 03/09 1st 1、操作数的类型和大小 2、ISA的功能设计 CISC- 面向目标机器的优化 2nd CISC-面向高级语言和编译器的优化 RISC 基本特性、关键技术 2018/11/9 中国科学技术大学

编译技术与计算机体系结构设计使用汇编语言编程已不是主流编译器的设计目标现在使用汇编的人很少因此编译器更需要ISA 正确性编译后代码的速度其他目标编译的速度调试的支持语言间的互操作性等语言互操作性是一种代码与使用其他编程语言编写的另一种代码进行交互的能力。语言互操作性可以有助于最大程度地提高代码的重复使用率，从而提高开发过程的效率 2018/11/9 中国科学技术大学

现代编译器的典型结构 2018/11/9 中国科学技术大学

优化类型 Major types of optimizations and examples in each class. These data tell us about the relative frequency of occurrence of various optimizations. The third column lists the static frequency with which some of the common optimizations are applied in a set of 12 small FORTRAN and Pascal programs. There are nine local and global optimizations done by the compiler included in the measurement. Six of these optimizations are covered in the figure, and the remaining three account for 18% of the total static occurrences. The abbreviation N.M. means that the number of occurrences of that optimization was not measured. Processor-dependent optimizations are usually done in a code generator, and none of those was measured in this experiment. The percentage is the portion of the static optimizations that are of the specified type. Data from Chow [1983] (collected using the Stanford UCODE compiler). 2018/11/9 中国科学技术大学

Level 0 is the same as unoptimized code Level 0 is the same as unoptimized code. Level 1 includes local optimizations, code scheduling, and local register allocation. Level 2 includes global optimizations, loop transformations (software pipelining), and global register allocation. Level 3 adds procedure integration. These experiments were performed on the Alpha compilers. Level 0 is the same as unoptimized code. Level 1 includes local optimizations, code scheduling, and local register allocation. Level 2 includes global optimizations, loop transformations (software pipelining), and global register allocation. Level 3 adds procedure integration. These experiments were performed on the Alpha compilers. Change in instruction count for the programs lucas and mcf from the SPEC2000 as compiler optimization levels vary. 2018/11/9 中国科学技术大学

Stanford Ucode Compiler 优化的效果 2018/11/9 中国科学技术大学

ISA设计需了解的有关Compiler的问题如何分配变量如何寻址变量需要多少寄存器优化技术对指令使用频度有何影响使用哪些控制结构这些控制结构使用频度 2018/11/9 中国科学技术大学

编译器中数据的组织（运行时环境）栈（Stack）全局数据区（global data area）堆（heap）用来分配局部变量用于存储活动记录（过程调用和返回），通过堆栈指针访问其中的内容。全局数据区（global data area）用来分配被静态说明的对象，如常量和全局变量。其中数组和其他聚集类型的数据结构占很大比例堆（heap）用于分配动态对象，用指针访问。全局变量和堆变量因为存在别名问题而无法分配到寄存器 2018/11/9 中国科学技术大学

寄存器分配问题栈中对象的寄存器分配相对简单全局变量的寄存器分配相对比较困难堆对象的寄存器分配几乎是不可能的原因：一般用指针来访问，使得几乎无法分配有些全局变量和栈对象由于存在别名问题也无法分配原因：存在多条路径来访问变量的地址。寄存器分配是优化的主要手段，因此在这方面的努力是非常重要的 2018/11/9 中国科学技术大学

关于编译优化的一些研究的结论减少分支语句非常困难大量的优化使得存储器访问操作减少可以减少一些ALU操作结果是控制类指令在统计上占较大比例控制类指令很难加速 2018/11/9 中国科学技术大学

编译技术与计算机体系结构设计小结有利于编译器的ISA 寄存器分配是关键问题保证所有的寻址方式可用于各种数据传送指令最小指令集规整性和正交性: 没有特殊的寄存器，例外情况尽可能少, 所有操作数模式可用于任何数据类型和指令类型,要求操作、数据类型和寻址方式必须正交，以简化代码生成过程。完整性: 支持基本的操作和满足目标的应用系统需求帮助编译器设计者了解各种代码序列的执行效率和代价，有助于编译器的优化对于在编译时就已经可确定的量，提供能够将其变为常数的指令寄存器分配是关键问题寄存器数目多有利于编译器的设计与实现提供至少16个通用寄存器和独立的浮点寄存器保证所有的寻址方式可用于各种数据传送指令最小指令集 2018/11/9 中国科学技术大学

一些统计比较结果 2018/11/9 中国科学技术大学

一些结论 Load-Store 型ISA，move类指令比例大于M-M型ISA Load-Store型ISA，Branch类指令比例小于M-M型ISA Load-Store型ISA，平均CPI和T较小 2018/11/9 中国科学技术大学

MIPS指令集结构 MIPS64的一个子集，简称为MIPS MIPS的寄存器 32个64位通用寄存器（GPRs） R0，R1，…，R31 也被称为整数寄存器 R0的值永远是0 32个64位浮点数寄存器（FPRs） F0，F1，…，F31 用来存放32个单精度浮点数（32位），也可以用来存放32个双精度浮点数（64位）。存储单精度浮点数（32位）时，只用到FPR的一半，其另一半没用一些特殊寄存器它们可以与通用寄存器交换数据。例如，浮点状态寄存器用来保存有关浮点操作结果的信息。 2018/11/9 中国科学技术大学

MIPS的数据表示整数字节（8位）半字（16位）字（32位）双字（64位）浮点数单精度浮点数（32位）双精度浮点数（64位）字节、半字或者字在装入64位寄存器时，用零扩展或者用符号位扩展来填充该寄存器的剩余部分。装入以后，对它们将按照64位整数的方式进行运算。 2018/11/9 中国科学技术大学

MIPS的寻址方式立即数寻址与偏移量寻址寄存器间接寻址是通过把0作为偏移量来实现的立即数字段和偏移量字段都是16位的。寄存器间接寻址是通过把0作为偏移量来实现的 16位绝对寻址是通过把R0（其值永远为0）作为基址寄存器来完成的 MIPS的存储器是按字节寻址的，地址为64位所有存储器访问都必须是边界对齐的 2018/11/9 中国科学技术大学

存储器中的数据存放（存储字长为 32 位）边界对准边界未对准 4 8 12 16 20 24 28 32 36 4 8 地址（十进制） 4 8 12 16 20 24 28 32 36 双字双字（地址32）双字（地址24）半字（地址20）半字（地址22）半字（地址16）半字（地址18）字节（地址 8）字节（地址 9）字节（地址10）字节（地址11）字（地址 4）字（地址 0）字节（地址14）字节（地址15）字节（地址13）字节（地址12）边界对准 ▲ ▲ 地址（十进制） 4 8 字节( 地址7) 字节( 地址6) 字( 地址2) 半字( 地址10) 半字( 地址8) 半字( 地址0) 字( 地址4) 边界未对准 2018/11/9 中国科学技术大学

MIPS的指令格式寻址方式编码到操作码中所有的指令都是32位的操作码占6位 3种指令格式 I类 R类 J类 2018/11/9 中国科学技术大学

I类指令包括所有的load和store指令、立即数指令，分支指令、寄存器跳转指令、寄存器链接跳转指令。立即数字段为16位，用于提供立即数或偏移量。 2018/11/9 中国科学技术大学

具体的I类指令 load指令 store指令立即数指令分支指令寄存器跳转、寄存器跳转并链接访存有效地址：Regs[rs]＋immediate 从存储器取来的数据放入寄存器rt store指令要存入存储器的数据放在寄存器rt中立即数指令 Regs[rt] ← Regs[rs] op immediate 分支指令转移目标地址：Regs[rs]＋immediate，rt无用寄存器跳转、寄存器跳转并链接转移目标地址为Regs[rs] 2018/11/9 中国科学技术大学

R类指令包括ALU指令、专用寄存器读/写指令、move指令等 ALU指令 Regs[rd]← Regs[rs] funct Regs[rt] func为具体的运算操作编码 2018/11/9 中国科学技术大学

J类指令包括跳转指令、跳转并链接指令、自陷指令、异常返回指令在这类指令中，指令字的低26位是偏移量，它与PC值相加形成跳转的地址 2018/11/9 中国科学技术大学

MIPS的操作 MIPS指令可以分为四大类符号的意义 load和store、ALU操作、分支与跳转、浮点操作 x←ny：从y传送n位到x x，y←z：把z传送到x和y 下标：表示字段中具体的位；对于指令和数据，按从最高位到最低位（即从左到右）的顺序依次编号，最高位为第0位，次高位为第1位，依此类推。下标可以是一个数字，也可以是一个范围。例如：　Regs[R4]0：R4的符号位；Regs[R4]56..63：R4的最低字节 2018/11/9 中国科学技术大学

Mem：表示主存；按字节寻址，可以传输任意个字节。上标：用于表示对字段进行复制的次数。例如：0 32：一个32位长的全0字段符号##：用于两个字段的拼接，并且可以出现在数据传送的任何一边。举例：R8、R10：64位的寄存器，则 Regs[R8]32..63 ←32（Mem [Regs[R6]]0）24 ## Mem [Regs[R6]] 表示的意义是：以R6的内容作为地址访问内存，得到的字节按符号位扩展为32位后存入R8的低32位，R8的高32位（即 Regs[R8]0..31）不变。 2018/11/9 中国科学技术大学

load和store指令指令举例指令名称含义 LD R2，20(R3) 装入双字 Regs[R2]←64 Mem[20+Regs[R3]] LW R2，40(R3) 装入字 Regs[R2]←64 (Mem[40+Regs[R3]]0)32 ## Mem[40+Regs[R3]] LB R2，30(R3) 装入字节 Regs[R2]←64 (Mem[30+Regs[R3]]0)56 ## Mem[30+Regs[R3]] LBU R2，40(R3) 装入无符号字节 Regs[R2]←64 056 ## Mem[40+Regs[R3]] LH R2，30(R3) 装入半字 Regs[R2]←64 (Mem[30+Regs[R3]]0)48 ## Mem[30+Regs[R3]]## Mem[31+Regs[R3]] L.S F2，60(R4) Regs[F2]←64 Mem[60+Regs[R4]] ## 032 L.D F2，40(R3) 装入双精度浮点数 Regs[F2]←64 Mem[40+Regs[R3]] SD R4，300(R5) 保存双字 Mem[300+Regs[R5]]←64 Regs[R4] SW R4，300(R5) 保存字 Mem[300+Regs[R5]]←32 Regs[R4] S.S F2，40(R2) 保存单精度浮点数 Mem[40+Regs[R2]]←32 Regs[F2] 0··31 SH R5，502(R4) 保存半字 Mem[502+Regs[R4]]←16 Regs[R5] 48··.63

ALU指令指令举例指令名称含义寄存器-寄存器型（RR型）指令或立即数型算术和逻辑操作：加、减、与、或、异或和移位等 DADDU R1，R2，R3 无符号加 Regs[R1]← Regs[R2]+ Regs[R3] DADDIU R4，R5，#6 加无符号立即数 Regs[R4]← Regs[R5]+6 LUI R1，#4 把立即数装入到一个字的高16位 Regs[R1]← 032 ## 4 ## 016 DSLL R1，R2，#5 逻辑左移 Regs[R1]← Regs[R2]<<5 DSLT R1，R2，R3 置小于 If(Regs[R2]< Regs[R3]) Regs[R1]← 1 else Regs[R1]←0 R0的值永远是0，它可以用来合成一些常用的操作。例如： DADDIU R1，R0，#100 //给寄存器R1装入常数100 DADD R1，R0，R2 //把寄存器R2中的数据传送到寄存器R1

MIPS控制类指令指令举例指令名称含义 J name 跳转 PC 36··63← name<<2 JAL name 跳转并链接 Regs[R31]←PC+4；PC 36··63←name<<2； ((PC+4)－227)≤name＜((PC+4)+227) JALR R3 寄存器跳转并链接 Regs[R31]←PC+4；PC← Regs[R3] JR R5 寄存器跳转 PC← Regs[R5] BEQZ R4，name 等于零时分支 if(Regs[R4]== 0) PC←name ； ((PC+4)－217)≤name＜((PC+4)+217) BNE R3，R4，name 不相等时分支 if(Regs[R3]!= Regs[R4]) PC←name MOVZ R1，R2，R3 等于零时移动 if(Regs[R3]==0) Regs[R1]← Regs[R2] 2018/11/9 中国科学技术大学

跳转指令根据跳转指令确定目标地址的方式不同以及跳转时是否链接，可以把跳转指令分成4种。确定目标地址的方式跳转的两种类型把指令中的26位偏移量左移2位后，替换程序计数器的低28位。间接跳转：由指令中指定的一个寄存器来给出转移目标地址。跳转的两种类型简单跳转：把目标地址送入程序计数器。跳转并链接：把目标地址送入程序计数器，把返回地址（即顺序下一条指令的地址）放入寄存器R31。 2018/11/9 中国科学技术大学

分支指令（条件转移）分支条件由指令确定。提供一组比较指令，用于比较两个寄存器的值。例如：测试某个寄存器的值是否为零提供一组比较指令，用于比较两个寄存器的值。例如：“置小于”指令有的分支指令可以直接判断寄存器内容是否为负，或者比较两个寄存器是否相等。分支的目标地址。由16位带符号偏移量左移两位后和PC相加的结果来决定一条浮点条件分支指令：通过测试浮点状态寄存器来决定是否进行分支。 2018/11/9 中国科学技术大学

MIPS的浮点操作由操作码指出操作数是单精度（SP）或双精度（DP）浮点操作浮点数比较指令后缀S：表示操作数是单精度浮点数包括加、减、乘、除，分别有单精度和双精度指令。浮点数比较指令根据比较结果设置浮点状态寄存器中的某一位，以便于后面的分支指令BC1T（若真则分支）或BC1F（若假则分支）测试该位，以决定是否进行分支。 2018/11/9 中国科学技术大学

本章小结 90年代ISA的主要变化： ISA设计的发展趋势 Address size doubles Optimization of conditional branches via conditional execution Optimization of cache performance via prefetch Support for multimedia Faster floating-point operations ISA设计的发展趋势 Long instruction words Increased conditional execution Blending of general-purpose and DSP architectures 80x86 emulation 2018/11/9 中国科学技术大学

Acknowledgements These slides contain material developed and copyright by: John Kubiatowicz (UCB) Krste Asanovic (UCB) David Patterson (UCB) Chenxi Zhang (Tongji) UCB material derived from course CS152、CS252、CS61C KFUPM material derived from course COE501、COE502 11/9/2018 中国科学技术大学

附件 MIPS I Operation Overview 2018/11/9 中国科学技术大学

MIPS I Operation Overview Arithmetic Logical: Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI SLL, SRL, SRA, SLLV, SRLV, SRAV Memory Access: LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR 2018/11/9 中国科学技术大学

Multiply / Divide Start multiply, divide MULT rs, rt MULTU rs, rt DIV rs, rt DIVU rs, rt Move result from multiply, divide MFHI rd MFLO rd Move to HI or LO MTHI rd MTLO rd Why not Third field for destination? (Hint: how many clock cycles for multiply or divide vs. add?) Registers HI LO 2018/11/9 中国科学技术大学

Data Types Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word Character: ASCII 7 bit code UNICODE 16 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: 2's Complement Floating Point: Single Precision Double Precision Extended Precision exponent How many +/- #'s? Where is decimal pt? How are +/- exponents represented? E M x R base 2018/11/9 mantissa 中国科学技术大学

Operand Size Usage Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers 2018/11/9 中国科学技术大学

MIPS arithmetic instructions Instruction Example Meaning Comments add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no exceptions multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder Hi = $2 mod $3 divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder Move from Hi mfhi $1 $1 = Hi Used to get copy of Hi Move from Lo mflo $1 $1 = Lo Used to get copy of Lo Which add for address arithmetic? Which add for integers? 2018/11/9 中国科学技术大学

MIPS logical instructions Instruction Example Meaning Comment and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR xor xor $1,$2,$3 $1 = $2 Å $3 3 reg. operands; Logical XOR nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend) shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable 2018/11/9 中国科学技术大学

MIPS data transfer instructions Instruction Comment SW 500(R4), R3 Store word SH 502(R2), R3 Store half SB 41(R3), R2 Store byte LW R1, 30(R2) Load word LH R1, 40(R3) Load halfword LHU R1, 40(R3) Load halfword unsigned LB R1, 40(R3) Load byte LBU R1, 40(R3) Load byte unsigned LUI R1, 40 Load Upper Immediate (16 bits shifted left by 16) Why need LUI? LUI R5 R5 0000 … 0000 2018/11/9 中国科学技术大学

When does MIPS sign extend? When value is sign extended, copy upper bit to full value: Examples of sign extending 8 bits to 16 bits: 00001010  00000000 00001010 10001100  11111111 10001100 When is an immediate value sign extended? Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions! Logical instructions do not sign extend Load/Store half or byte do sign extend, but unsigned versions do not. 2018/11/9 中国科学技术大学

Methods of Testing Condition Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. ex: add r1, r2, r3 bz label Condition Register Ex: cmp r1, r2, r3 bgt r1, label Compare and Branch Ex: bgt r1, r2, label 2018/11/9 中国科学技术大学

Conditional Branch Distance • 25% of integer branches are 2 to 4 instructions 2018/11/9 中国科学技术大学

Conditional Branch Addressing PC-relative since most branches are relatively close to the current PC At least 8 bits suggested (128 instructions) Compare Equal/Not Equal most important for integer programs (86%) 2018/11/9 中国科学技术大学

MIPS Compare and Branch BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch BNE rs, rt, offset <> Compare to zero and Branch BLEZ rs, offset if R[rs] <= 0 then PC-relative branch BGTZ rs, offset > BLT < BGEZ >= BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31) BGEZAL >=! Remaining set of compare and branch ops take two instructions Almost all comparisons are against zero! 2018/11/9 中国科学技术大学

MIPS jump, branch, compare instructions Instruction Example Meaning branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2’s comp. set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2’s comp. set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; natural numbers set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; natural numbers jump j 10000 go to 10000 Jump to target address jump register jr $31 go to $31 For switch, procedure return jump and link jal 10000 $31 = PC + 4; go to 10000 For procedure call 2018/11/9 中国科学技术大学

Signed vs. Unsigned Comparison After executing these instructions: slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0 What are values of registers r4 - r7? Why? r4 = ; r5 = ; r6 = ; r7 = ; two two two 2018/11/9 中国科学技术大学

Calls: Why Are Stacks So Great? Stacking of Subroutine Calls & Returns and Environments: A A: CALL B CALL C C: RET B: A B A B C A B A Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS) 2018/11/9 中国科学技术大学

Memory Stacks Useful for stacked environments/subroutine call & return even if operand stack not part of architecture Stacks that Grow Up vs. Stacks that Grow Down: inf. Big 0 Little Next Empty? grows up grows down Memory Addresses c b Last Full? SP a 0 Little inf. Big How is empty stack represented? Little --> Big/Last Full POP: Read from Mem(SP) Decrement SP PUSH: Increment SP Write to Mem(SP) Little --> Big/Next Empty POP: Decrement SP Read from Mem(SP) PUSH: Write to Mem(SP) Increment SP 2018/11/9 中国科学技术大学

Call-Return Linkage: Stack Frames High Mem ARGS Reference args and local variables at fixed (positive) offset from FP Callee Save Registers (old FP, RA) Local Variables FP Grows and shrinks during expression evaluation SP Low Mem Many variations on stacks possible (up/down, last pushed / next ) Compilers normally keep scalar variables in registers, not memory! 2018/11/9 中国科学技术大学

MIPS: Software conventions for Registers 0 zero constant 0 1 at reserved for assembler 2 v0 expression evaluation & 3 v1 function results 4 a0 arguments 5 a1 6 a2 7 a3 8 t0 temporary: caller saves . . . (callee can clobber) 15 t7 16 s0 callee saves . . . (callee must save) 23 s7 24 t8 temporary (cont’d) 25 t9 26 k0 reserved for OS kernel 27 k1 28 gp Pointer to global area 29 sp Stack pointer 30 fp frame pointer 31 ra Return Address (HW) 2018/11/9 中国科学技术大学

MIPS / GCC Calling Conventions fact: addiu $sp, $sp, -32 sw $ra, 20($sp) sw $fp, 16($sp) addiu $fp, $sp, 32 . . . sw $a0, 0($fp) ... lw $31, 20($sp) lw $fp, 16($sp) addiu $sp, $sp, 32 jr $31 FP SP ra low address FP SP ra ra old FP FP SP ra old FP First four arguments passed in registers. 2018/11/9 中国科学技术大学

Details of the MIPS instruction set Register zero always has the value zero (even if you try to write it) Branch/jump and link put the return addr. PC+4 or 8 into the link register (R31) (depends on logical vs physical architecture) All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …) Immediate arithmetic and logical instructions are extended as follows: logical immediates ops are zero extended to 32 bits arithmetic immediates ops are sign extended to 32 bits (including addu) The data loaded by the instructions lb and lh are extended as follows: lbu, lhu are zero extended lb, lh are sign extended Overflow can occur in these arithmetic and logical instructions: add, sub, addi it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu 2018/11/9 中国科学技术大学

Delayed Branches li r3, #7 sub r4, r4, 1 bz r4, LL addi r5, r3, 1 subi r6, r6, 2 LL: slt r1, r3, r5 In the “Raw” MIPS, the instruction after the branch is executed even when the branch is taken? This is hidden by the assembler for the MIPS “virtual machine” allows the compiler to better utilize the instruction pipeline (???) 2018/11/9 中国科学技术大学

Branch & Pipelines Time li r3, #7 execute sub r4, r4, 1 ifetch execute bz r4, LL ifetch execute Branch addi r5, r3, 1 ifetch execute Delay Slot LL: slt r1, r3, r5 ifetch execute Branch Target By the end of Branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken. Why not execute it? 2018/11/9 中国科学技术大学

Filling Delayed Branches Inst Fetch Dcd & Op Fetch Execute execute successor even if branch taken! Inst Fetch Dcd & Op Fetch Execute Then branch target or continue Inst Fetch Single delay slot impacts the critical path add r3, r1, r2 sub r4, r4, 1 bz r4, LL NOP ... LL: add rd, ... Compiler can fill a single delay slot with a useful instruction 50% of the time. try to move down from above jump move up from target, if safe Is this violating the ISA abstraction? 2018/11/9 中国科学技术大学

Miscellaneous MIPS I instructions break A breakpoint trap occurs, transfers control to exception handler syscall A system trap occurs, transfers control to exception handler coprocessor instrs. Support for floating point TLB instructions Support for virtual memory: discussed later restore from exception Restores previous interrupt mask & kernel/user mode bits into status register load word left/right Supports misaligned word loads store word left/right Supports misaligned word stores 2018/11/9 中国科学技术大学

review ISA的分类寻址方式指令格式累加器型、堆栈型和寄存器型立即寻址、偏移寻址和寄存器寻址偏移量字段：12 ～ 16 bits 立即数字段： 8 to 16 bits 指令格式变长指令格式、定长指令格式、以上两种格式的折中 2018/11/9 中国科学技术大学

review ISA功能设计－确定提供哪些操作类型。 CISC RISC 目标：强化指令功能，减少指令的指令条数，以提高系统性能基本方法：面向目标程序的优化，面向高级语言和编译器的优化 RISC 目标：通过简化指令系统，用最高效的方法实现最常用的指令主要手段：充分发挥流水线的效率，提高CPI 2018/11/9 中国科学技术大学

Summary: Salient features of MIPS I 32-bit fixed format inst (3 formats) 32 32-bit GPR (R0 contains zero) and 32 FP registers (and HI LO) partitioned by software convention 3-address, reg-reg arithmetic instr. Single address mode for load/store: base+displacement no indirection, scaled 16-bit immediate plus LUI Simple branch conditions compare against zero or two registers for =, no integer condition codes Delayed branch execute instruction after a branch (or jump) even if the branch is taken (Compiler can fill a delayed branch with useful work about 50% of the time) 2018/11/9 中国科学技术大学

Summary: Instruction set design (MIPS) Use general purpose registers with a load-store architecture: YES Provide at least 16 general purpose registers plus separate floating-point registers: 31 GPR & 32 FPR Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred; : YES: 16 bits for immediate, displacement (disp=0 => register deferred) All addressing modes apply to all data transfer instructions : YES Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size : Fixed Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers: YES Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8-bits long), jump, call, and return: YES, 16b Aim for a minimalist instruction set: YES 2018/11/9 中国科学技术大学

review ISA 研究的问题研究方法：基于统计的方法研究的一些结论 ISA的分类操作数部分理解存储器地址（尾端和对齐问题）寻址方式操作数的类型、表示和大小问题操作码部分支持哪些类型的操作指令格式研究方法：基于统计的方法研究的一些结论常用寻址方式：立即数、偏移寻址、寄存器间址偏移字段的大小应该在 12 - 16 bits，可满足75%-99%的需求立即数字段的大小应该在 8 -16 bits，可满足50%-80%的需求操作数大小：单字、双字的数据访问具有较高的频率，定义操作数字段长度为64位，更具有一般性 2018/11/9 中国科学技术大学

ISA的功能设计控制类指令 CISC 任务：确定硬件支持哪些操作方法：统计的方法两种类型：CISC和RISC 条件分支最常用寻址方式：PC-relative 和偏移地址至少8位，说明动态的转移地址方式 CISC 强化指令功能，减少程序中指令条数，以提高系统性能 2018/11/9 中国科学技术大学

Review 功能设计指令格式编译技术与计算机系统结构编译优化-4个层次变长指令格式、定长指令格式、 RISC vs. CISC 以上两种格式的折中编译技术与计算机系统结构现代编译器的基本结构编译优化-4个层次高层优化：一般在源码上进行，同时把输出传递给以后的优化扫描步骤局部优化：仅在一系列代码片断之内（基本块）将代码优化全局优化：将局部优化扩展为跨越分支，并且引入一组针对优化循环的转换与机器相关的优化：充分利用特定的系统结构 2018/11/9 中国科学技术大学