The Processor: Datapath and Control


Presentation on theme: "The Processor: Datapath and Control"— Presentation transcript:

1 The Processor: Datapath and Control
Computer Organization & Design 5th. Chapter 4, The Processor: Datapath and Control. ROBERT CHEN

2 Outline
- Introduction
- Logic Design Conventions
- Building a Datapath
- A Simple Implementation Scheme
- A Multicycle Implementation
- Exceptions

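3 Introduction
The performance of a computer is determined by three factors:
- Instruction count
- Clock cycles per instruction (CPI), for integer, arithmetic-logical, memory-reference, and branch instructions
- Clock cycle time
The compiler and the instruction set architecture (ISA) determine how many instructions a program needs.
The clock cycle time and the CPI are determined by how the processor itself is implemented.
In this chapter we build the datapath and the control unit for two different implementations of the MIPS instruction set:
- A single-cycle implementation
- A multicycle implementation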
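The three factors combine multiplicatively into CPU time. The following is a minimal Python sketch of that relationship; the function name and the sample numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical program of 100 instructions on two hypothetical machines.
single_cycle_time = cpu_time_ns(100, cpi=1.0, cycle_time_ns=8.0)  # long cycle, CPI = 1
multi_cycle_time = cpu_time_ns(100, cpi=4.0, cycle_time_ns=2.0)   # short cycle, higher CPI
print(single_cycle_time, multi_cycle_time)
```

A lower CPI does not help if it is bought with a proportionally longer clock cycle, which is why the implementation matters as much as the instruction count.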

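4 Introduction
The functional units of a MIPS implementation are built from two kinds of logic elements:
- Elements that operate on data values, e.g. the ALU. These are combinational: the outputs depend only on the current inputs.
- Elements that contain state, e.g. memories and the register file. These are sequential: the outputs depend on both the inputs and the internal state (sequential logic).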

5 Introduction
Stages of instruction execution:
- Instruction fetch
- Decode
- Operand fetch
- Execute
- Write back
Figure 4.1 gives a high-level overview of a MIPS implementation.

6 Introduction
We're ready to look at an implementation of the MIPS, simplified to contain only:
- memory-reference instructions: lw, sw
- arithmetic-logical instructions: add, sub, and, or, slt
- control flow instructions: beq, j

7 Introduction
State elements: unclocked vs. clocked
Clocks are used in synchronous logic to answer the question: when should an element that contains state be updated?
[Figure: clock waveform showing cycle time, rising edge, falling edge]

8 Introduction
An unclocked state element
- The set-reset latch: the output depends on present inputs and also on past inputs.
Latches and flip-flops
- Latches and flip-flops are the simplest memory elements.
- The output is equal to the stored value inside the element (there is no need to ask for permission to look at the value).
- The change of state (value) is based on the clock.
- Latch: the state changes whenever the inputs change and the clock is asserted.
- Flip-flop: the state changes only on a clock edge (edge-triggered methodology).
- A clocking methodology defines when signals can be read and written. We would not want to read a signal at the same time it is being written.

9 Introduction
D latch
Two inputs:
- the data value to be stored (D)
- the clock signal (C), indicating when to read and store D
Two outputs:
- the value of the internal state (Q) and its complement
When the latch is open (C asserted), the value of Q changes as D changes; this is why it is called a transparent latch.
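The transparent behaviour of the D latch can be sketched as a small behavioural model in Python. This is only an illustration under simplifying assumptions (no gate delays); the class and method names are made up for the example.

```python
class DLatch:
    """Behavioural model of a D latch: transparent while C is asserted."""

    def __init__(self) -> None:
        self.q = 0  # stored internal state

    def update(self, d: int, c: int) -> tuple[int, int]:
        # While the clock C is asserted the latch is open and Q follows D.
        if c:
            self.q = d
        # Otherwise the latch is closed and keeps its previous value.
        return self.q, 1 - self.q  # (Q, complement of Q)


latch = DLatch()
print(latch.update(d=1, c=1))  # open: Q follows D
print(latch.update(d=0, c=0))  # closed: Q holds the old value
```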

10 Introduction
D flip-flop
- Flip-flops are not transparent: the output changes only on the clock edge.
- The first latch, called the master, is open and follows the input D when C is asserted.
- When the clock input falls, the first latch is closed, but the second latch, called the slave, is open and gets its input from the output of the master latch.
[Figure: D flip-flop built from two D latches, with inputs D and C and output Q]
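The master/slave arrangement described above can be modelled in a few lines of Python. This is a behavioural sketch that ignores propagation delays; the class and attribute names are assumptions for illustration.

```python
class DFlipFlop:
    """Master-slave D flip-flop: Q changes only when the clock falls."""

    def __init__(self) -> None:
        self.master = 0  # state of the first (master) latch
        self.q = 0       # state of the second (slave) latch, the visible output

    def update(self, d: int, c: int) -> int:
        if c:
            # Clock asserted: master is open and follows D, slave is closed.
            self.master = d
        else:
            # Clock low: master is closed, slave is open and copies the master.
            self.q = self.master
        return self.q


ff = DFlipFlop()
ff.update(d=1, c=1)         # master captures 1, output still 0
print(ff.update(d=0, c=0))  # falling edge: output becomes the captured value
```

Because the master is closed while the slave is open, changes on D after the falling edge cannot reach Q until the next cycle.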

11 Introduction
Set-up time and hold time
- Set-up time: the minimum time that the input must remain valid before the clock edge.
- Hold time: the minimum time that the input must remain valid after the clock edge (usually very small).
[Figure: D and C waveforms with set-up time and hold time marked around the clock edge]

12 Introduction
An edge-triggered methodology
- Determines when signals are read and when they are written.
- Typical execution: read the contents of some state elements, send the values through some combinational logic, and write the results to one or more state elements.
[Figure: state element 1, combinational logic, state element 2, within one clock cycle]
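One way to picture the read, compute, write pattern is a function that takes a snapshot of all state, runs the combinational logic on it, and only then commits the results. The Python sketch below is illustrative; `clock_edge` and `swap_logic` are hypothetical names.

```python
def clock_edge(state: dict, combinational) -> dict:
    """One clock cycle: read every state element, compute, then write at the edge."""
    current = dict(state)          # values read during the cycle
    return combinational(current)  # values written together at the clock edge


def swap_logic(s: dict) -> dict:
    # Pure combinational logic: each next value depends only on the current values.
    return {"r1": s["r2"], "r2": s["r1"]}


state = {"r1": 1, "r2": 2}
state = clock_edge(state, swap_logic)
print(state)
```

Since every element is read before any element is written, the same register can be both a source and a destination in a single cycle.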

13 Introduction
Register file
- A register file consists of a set of registers that can be read and written by supplying the number of the register to be accessed.
- It is built from D flip-flops and decoders (which select the register number).
- Read part: supply a register number as input, and the output is the information stored in that register.
[Figure: left, read ports implemented with multiplexors; right, a register file with 2 read ports and 1 write port]

14 Introduction
Register file
- Write part: needs 3 inputs: a register number, the data to write, and a clock that controls the writing into the register.
- Note: we still use the real clock to determine when to write.
[Figure: write port built from an n-to-1 decoder driving the C input of each register, with the register data on D]
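A register file with two read ports and one write port can be sketched as follows. This is a behavioural Python model, not a gate-level one; the `RegisterFile` class and the `reg_write` enable argument are illustrative names, and hard-wiring register 0 to zero follows the MIPS convention rather than anything stated on the slide.

```python
class RegisterFile:
    """32 registers of 32 bits, two read ports and one write port."""

    def __init__(self, count: int = 32) -> None:
        self.regs = [0] * count

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Reading is combinational: supply two register numbers, get two values.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, data: int, reg_write: bool) -> None:
        # Writing happens only when the write control is asserted at the clock edge.
        # Register 0 stays zero (MIPS convention, assumed here).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(9, 5, reg_write=True)    # $t1 = 5
rf.write(10, 7, reg_write=False)  # ignored: write control not asserted
print(rf.read(9, 10))
```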

15 Introduction
Simple implementation: basic components
- Two state elements, the instruction memory and the program counter (PC), are needed to store and access instructions.
- An adder is needed to compute the next instruction address.
- Since the instruction memory is read-only, we can treat it as combinational logic.
[Figure: a. instruction memory; b. program counter; c. adder]

16 Introduction
Fetching instructions and incrementing the PC
- A portion of the datapath used for fetching instructions and incrementing the program counter.
- As soon as the PC has supplied the address to read the instruction, it is incremented by 4 so that it points to the next instruction.
[Figure: PC, instruction memory, and an adder computing PC + 4]
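The fetch step can be sketched in Python as a function that returns both the instruction and the next PC. The word-indexed list standing in for instruction memory and the function name are assumptions made for the example.

```python
def fetch(instruction_memory: list[int], pc: int) -> tuple[int, int]:
    """Read the instruction at byte address pc and compute PC + 4."""
    instruction = instruction_memory[pc // 4]  # memory is modelled as a list of 32-bit words
    next_pc = pc + 4                           # each MIPS instruction is 4 bytes long
    return instruction, next_pc


# Two sample words: add $8, $17, $18 and lw $t0, 40($t1).
program = [0x02324020, 0x8D280028]
instruction, pc = fetch(program, 0)
print(hex(instruction), pc)
```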

17 Introduction
R-format ALU operations
- An R-format instruction has 3 register operands: 2 registers are read and 1 is written. E.g. add $t0, $t1, $t2
- Register numbers are 5 bits wide to select among 32 registers, the data buses are 32 bits wide, and the ALU control has 4 bits.
[Figure: a. register file; b. ALU with 4-bit ALU control input and Zero output]

18 Introduction
Datapath for R-type instructions
E.g. add $t0, $t1, $t2
[Figure: instruction fields feeding the register file, whose read data feed the ALU; the ALU result is written back to the register file]

19 Introduction
Load and store instructions
- Load and store instructions compute a memory address by adding the base register to the 16-bit signed offset field contained in the instruction.
- The sign extension unit extends the 16-bit value to 32 bits by replicating the high-order sign bit into the upper 16 bits.
- E.g. lw $t0, 40($t1) and sw $t0, 32($t1)
[Figure: a. data memory unit; b. sign extension unit (16 bits in, 32 bits out)]
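Sign extension and the address calculation can be sketched in Python as below. Values are kept as unsigned 32-bit integers; the function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16(value: int) -> int:
    """Extend a 16-bit field to 32 bits by replicating its sign bit (bit 15)."""
    value &= 0xFFFF
    if value & 0x8000:
        return value | 0xFFFF0000  # negative: fill the upper 16 bits with ones
    return value                   # non-negative: upper 16 bits stay zero


def effective_address(base: int, offset16: int) -> int:
    """Memory address for lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend_16(offset16)) & MASK32


print(hex(effective_address(0x1000, 40)))      # offset +40, as in lw $t0, 40($t1)
print(hex(effective_address(0x1000, 0xFFFC)))  # offset -4 encoded in 16 bits
```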

20 Introduction
Datapath for load and store instructions
- A register access takes place first, followed by the memory address calculation.
- Memory is then read (or written, for a store).
- For a load, the value read is finally written into the register file.
- E.g. lw $t0, 40($t1) and sw $t0, 32($t1)
[Figure: instruction fields feeding the register file ($t1, $t0), sign extension of the offset (40), the ALU, and the data memory]

21 Introduction
Branch instruction (beq, I-format): branch datapath
- Needs to compute the branch target address.
- PC + 4 is the address of the next instruction.
- The offset field is a word offset; it is shifted left by two bits to turn it into a byte offset. (PC[27:0] ← Offset)
- Needs to compare the register contents.
- E.g. beq $t1, $t2, offset
[Figure: sign extension, shift left 2, adder producing the branch target from PC + 4, and ALU Zero output going to the branch control logic]
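The two jobs of the branch datapath, computing the target and comparing the registers, can be sketched in Python. The function names are illustrative, and the comparison is modelled as a subtraction whose result is tested for zero, which is how the Zero output of the ALU is used for beq.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # interpret the 16-bit field as a signed number
    return (pc + 4 + (offset << 2)) & MASK32


def next_pc_for_beq(pc: int, offset16: int, rs_value: int, rt_value: int) -> int:
    """Take the branch only when the two register values are equal."""
    zero = ((rs_value - rt_value) & MASK32) == 0  # ALU subtract, then test for zero
    return branch_target(pc, offset16) if zero else pc + 4


print(hex(next_pc_for_beq(0x100, 3, 5, 5)))  # registers equal: branch taken
print(hex(next_pc_for_beq(0x100, 3, 5, 6)))  # registers differ: fall through
```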

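22 Introduction
Combining the datapaths
Use multiplexors (MUX), also called data selectors, to combine the datapaths for R-type instructions and memory instructions without duplicating functional units.
[Figure: combined datapath for R-type and memory instructions]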
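A multiplexor simply picks one of its inputs according to a select signal. The Python sketch below shows how a single ALU can serve both R-type and memory instructions when a 2-to-1 multiplexor chooses its second operand. The select name `alu_src` follows the textbook's ALUSrc signal and is an assumption here, since this slide does not name it.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexor: output in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def alu_second_operand(alu_src: int, read_data2: int, sign_extended_offset: int) -> int:
    # R-type: the operand is the second register value.
    # lw/sw:  the operand is the sign-extended offset from the instruction.
    return mux2(alu_src, read_data2, sign_extended_offset)


print(alu_second_operand(0, read_data2=7, sign_extended_offset=40))
print(alu_second_operand(1, read_data2=7, sign_extended_offset=40))
```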

23 Introduction
Combining the datapaths
Add the instruction fetch portion of the datapath.

24 Introduction
Combining the datapaths
Add the branch portion of the datapath.
Branch target address = offset from the instruction + address of the branch instruction (more precisely, the offset is shifted left by 2 bits and added to PC + 4).

25 Introduction
All done? The hardest part is the design of the control unit.

26 A Simple Implementation Scheme
This simple implementation covers:
- load word (lw) and store word (sw)
- branch equal (beq)
- ALU instructions: add, sub, and, or, and set on less than (slt)
Depending on the instruction class, the ALU must perform one of the following:
- Addition, to compute the memory address for lw and sw
- Subtraction, for branch equal
- AND, OR, subtract, add, or slt, for R-type instructions (selected by the 6-bit funct field)
ALU control inputs:
  0000  AND
  0001  OR
  0010  add
  0110  subtract
  0111  set on less than
  1100  NOR (for other MIPS instructions)
[Figure: ALU with inputs a and b and a 4-bit ALU operation input; outputs Zero, Result, Overflow, CarryOut]
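The six ALU control codes listed above can be turned into a small behavioural model. The Python sketch below is an illustration only: it returns the result and the Zero flag and leaves out Overflow and CarryOut; the function names are assumptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple[int, int]:
    """Return (result, zero) for the 4-bit ALU control codes on the slide."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:
        result = a & b                                    # AND
    elif control == 0b0001:
        result = a | b                                    # OR
    elif control == 0b0010:
        result = (a + b) & MASK32                         # add
    elif control == 0b0110:
        result = (a - b) & MASK32                         # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0  # set on less than
    elif control == 0b1100:
        result = ~(a | b) & MASK32                        # NOR
    else:
        raise ValueError(f"unsupported ALU control: {control:04b}")
    return result, int(result == 0)


print(alu(0b0010, 2, 3))  # add
print(alu(0b0110, 7, 7))  # subtract equal values: Zero is set, as beq needs
```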

27 A Simple Implementation Scheme
Purpose of the control:
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
How you get these control signals:
- The information comes from the 32 bits of the instruction.
- The ALU's operation is based on the instruction type and the function code.
Example: add $8, $17, $18
Instruction format:
  op      rs     rt     rd     shamt  funct
  000000  10001  10010  01000  00000  100000
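Extracting the fields is a matter of shifting and masking. The Python sketch below decodes the example word for add $8, $17, $18; the function name is an illustrative assumption.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit R-format instruction into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
    }


# add $8, $17, $18, written field by field as on the slide.
word = 0b000000_10001_10010_01000_00000_100000
fields = decode_r_format(word)
print(fields)
```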

28 What Control Signals Do We Need?

29 Design Method for Control
Multi-level control (decoding)
- Instruction opcode: main control unit (first level)
- ALU control: sub-control for arithmetic
MUX control
- Which source registers and destination registers
- ALU input source
- Input source of the destination register
- Input source of the PC
Result for the first level
- Seven 1-bit control lines
- One 2-bit ALUOp control signal
- These control signals can be set based solely on the opcode field of the instruction.
- Exception: PCSrc (depends on the beq result)

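30 A Simple Implementation Scheme
The ALU control bits are determined by the ALUOp control bits (together with the funct field for R-type instructions).
ALUOp identifies the instruction class.
  Opcode        ALUOp  Operation         Funct field  ALU action        ALU control input
  LW            00     load word         XXXXXX       add               0010
  SW            00     store word        XXXXXX       add               0010
  Branch equal  01     branch equal      XXXXXX       subtract          0110
  R-type        10     add               100000       add               0010
  R-type        10     subtract          100010       subtract          0110
  R-type        10     AND               100100       and               0000
  R-type        10     OR                100101       or                0001
  R-type        10     set on less than  101010       set on less than  0111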
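The mapping from ALUOp and funct to the 4-bit ALU control input can be written as a short two-level lookup. The Python sketch below follows the table on this slide; the names `FUNCT_TO_ALU_CONTROL` and `alu_control` are illustrative.

```python
# Funct field -> ALU control input, used only for R-type instructions (ALUOp = 10).
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Second-level decoder: ALUOp (2 bits) plus funct (6 bits) -> 4-bit ALU control."""
    if alu_op == 0b00:
        return 0b0010  # lw/sw: add for the address calculation, funct ignored
    if alu_op == 0b01:
        return 0b0110  # beq: subtract for the comparison, funct ignored
    if alu_op == 0b10:
        return FUNCT_TO_ALU_CONTROL[funct]  # R-type: decided by the funct field
    raise ValueError(f"unused ALUOp value: {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```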

31 ALU Control
Instructions using the ALU:
- Load/store: address calculation, add. E.g. lw $t1, offset($t2)
- Branch equal: subtract for the comparison ("taken" or "not taken"); add/subtract for the address calculation. E.g. beq $t1, $t2, offset
- R-type: add, subtract, and, or, set on less than
[Figure: ALU control block with a 2-bit ALUOp input and a 6-bit function field input, producing the 4-bit ALU operation]

32 ALU Control
Multi-level control (decoding)
- First level: the main control unit decodes the instruction opcode into ALUOp: 00 = lw, sw; 01 = beq; 10 = arithmetic.
- Second level: the sub-control decodes the function code for arithmetic instructions.
- The main control unit generates the ALUOp bits as inputs to the ALU control unit.
- This reduces the size of the main control but may increase the delay.

33 ALU Control
Truth table
- X: don't-care term
- All-zero and don't-care terms are eliminated.
[Table: truth table with inputs (ALUOp, funct field) and output (ALU operation)]
Notes:
1. ALUOp currently has no "11" entry, so the original "10" is changed to "1X".
2. In the funct field, F5 F4 are always "10", so they are changed to "XX".
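Don't-care terms can be handled in software by matching bit strings against patterns in which X accepts either value. The Python sketch below reconstructs the truth table from the two notes above and from the textbook's version of the table; the rows themselves are therefore an assumption, since the table is not preserved in this transcript.

```python
# (ALUOp pattern, funct pattern F5..F0, ALU control output); X means don't care.
TRUTH_TABLE = [
    ("00", "XXXXXX", "0010"),  # lw / sw: add
    ("X1", "XXXXXX", "0110"),  # beq: subtract
    ("1X", "XX0000", "0010"),  # R-type add
    ("1X", "XX0010", "0110"),  # R-type subtract
    ("1X", "XX0100", "0000"),  # R-type AND
    ("1X", "XX0101", "0001"),  # R-type OR
    ("1X", "XX1010", "0111"),  # R-type set on less than
]


def matches(pattern: str, bits: str) -> bool:
    """True when every pattern position is X or equals the corresponding bit."""
    return len(pattern) == len(bits) and all(p in ("X", b) for p, b in zip(pattern, bits))


def lookup(alu_op: str, funct: str) -> str:
    """Return the ALU control output of the first matching row."""
    for op_pattern, funct_pattern, output in TRUTH_TABLE:
        if matches(op_pattern, alu_op) and matches(funct_pattern, funct):
            return output
    raise ValueError("no matching row")


print(lookup("10", "101010"))  # slt
```

Fewer specified bits per row means less logic after minimisation, which is the reason for replacing fixed bits with X wherever the remaining bits already identify the row.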

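34 Designing the Main Control Unit
Instruction formats:
- The op field: Op[5:0], instruction bits 31:26.
- For R-type, branch equal (beq), and store instructions, the registers to be read are given by the rs field (bits 25:21) and the rt field (bits 20:16).
- The base register for load and store instructions is in bits 25:21 (rs).
- The 16-bit offset for branch equal (beq), load, and store instructions is in bits 15:0.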

35 A Simple Implementation Scheme
- Seven single-bit control lines and one 2-bit ALUOp control signal.
- Except for PCSrc, the control signals can be set solely based on the opcode field of the instruction.
- To generate PCSrc, we AND together a signal from the control unit, which we call Branch, with the Zero signal out of the ALU.
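The opcode-to-control mapping can be sketched as a lookup table, with PCSrc derived from Branch and Zero. In the Python sketch below only Branch, PCSrc, Zero and ALUOp are named on the slides; the other six signal names and all settings follow the textbook's single-cycle design and should be read as assumptions. `None` stands for a don't-care.

```python
# opcode -> seven 1-bit control lines plus the 2-bit ALUOp.
MAIN_CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def pc_src(opcode: int, zero: int) -> int:
    """PCSrc = Branch AND Zero: the only signal that needs more than the opcode."""
    return MAIN_CONTROL[opcode]["Branch"] & zero


print(pc_src(0b000100, zero=1))  # beq with equal registers: take the branch target
print(pc_src(0b000100, zero=0))  # beq with unequal registers: PC + 4
print(pc_src(0b000000, zero=1))  # R-format: Zero is ignored
```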

36 The Simple Datapath with the Control Unit
[Figure: the simple datapath with the control unit; instruction bits 31:26 feed the control unit, which drives the multiplexor selects, the memory and register write controls, Branch, and ALUOp]

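37 A Simple Implementation Scheme
Why is the single-cycle implementation not used?
- The clock cycle must have the same length for every instruction (hence CPI = 1).
- The longest path among all instructions determines the length of the clock cycle.
- Overall performance is unlikely to be very good.
Example: performance of single-cycle machines. Assume the functional units have the following operation times:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file (read or write): 1 ns
Which of the following implementations would be faster?
1. Every instruction completes in one clock cycle of fixed length.
2. Every instruction completes in one clock cycle, but the clock cycle length is variable.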

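38 A Simple Implementation Scheme
Example (continued)
To compare performance, assume the following instruction mix: 24% loads, 12% stores, 44% R-format instructions, 18% branches, and 2% jumps.
Functional units used by each instruction class, with their times in ns:
  Instruction class  Instruction memory  Register read  ALU operation  Data memory  Register write  Total
  R-format           2                   1              2              0            1               6 ns
  Load word          2                   1              2              2            1               8 ns
  Store word         2                   1              2              2            0               7 ns
  Branch             2                   1              2              0            0               5 ns
  Jump               2                   0              0              0            0               2 ns
Answer
1. With a fixed clock, the CPU clock cycle is 8 ns (the longest instruction, load word).
2. With a variable clock, the average CPU clock cycle = 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns.
The performance ratio is 8/6.3 = 1.27.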
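The arithmetic of this example can be checked with a short Python sketch. The per-class list of functional units is an assumption reconstructed from the unit times (memory 2 ns, ALU 2 ns, register file 1 ns) and the totals given in the example; all names are illustrative.

```python
# Operation time of each functional unit in ns.
UNIT_NS = {"instr_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Functional units used by each instruction class (assumed, see lead-in).
UNITS_USED = {
    "load": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_format": ["instr_mem", "reg_read", "alu", "reg_write"],
    "branch": ["instr_mem", "reg_read", "alu"],
    "jump": ["instr_mem"],
}

# Instruction mix from the example.
MIX = {"load": 0.24, "store": 0.12, "r_format": 0.44, "branch": 0.18, "jump": 0.02}

latency = {name: sum(UNIT_NS[u] for u in units) for name, units in UNITS_USED.items()}
fixed_cycle = max(latency.values())                         # fixed clock: slowest instruction
variable_cycle = sum(latency[n] * MIX[n] for n in MIX)      # variable clock: weighted average
speedup = fixed_cycle / variable_cycle

print(latency)
print(f"fixed {fixed_cycle} ns, variable {variable_cycle:.2f} ns, ratio {speedup:.2f}")
```

The unrounded average is 6.34 ns, which gives a ratio of about 1.26; the 1.27 on the slide results from rounding the average to 6.3 ns before dividing.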

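39 A Simple Implementation Scheme
Example
Suppose we also have a floating-point unit:
- A floating-point add takes 8 ns.
- A floating-point multiply takes 16 ns.
All other functional unit times are as in the previous example. Which of the following implementations would be faster?
1. Every instruction completes in one clock cycle of fixed length.
2. Every instruction completes in one clock cycle, but the clock cycle length is variable.
To compare performance, assume the following instruction mix: 31% loads, 21% stores, 27% R-format instructions, 5% branches, 2% jumps, 7% floating-point adds, and 7% floating-point multiplies.
Answer
1. The longest instruction is the floating-point multiply, so the clock cycle is 2 + 1 + 16 + 1 = 20 ns.
2. A floating-point add takes 2 + 1 + 8 + 1 = 12 ns. The average CPU clock cycle = 8*31% + 7*21% + 6*27% + 5*5% + 2*2% + 20*7% + 12*7% = 8.1 ns.
The performance ratio is 20/8.1 ≈ 2.5.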

40 Design Main Control Unit

