The Processor: Datapath and Control (Multi-cycle Implementation)
Chapter 5-2
Department of Electrical Engineering, National Taiwan University, Prof. An-Yeu Wu (吳安宇)
V1, 11/17/2004
NTU EE, Prof. An-Yeu Wu, Computer Architecture (臺大電機吳安宇教授-計算機結構)

Review of Single-cycle Implementation

Single-cycle Implementation
Why isn't a single-cycle implementation used today?
- The cycle time is long for every instruction (load takes the longest).
- All instructions take as much time as the slowest one.

Outline
- 5.1 Introduction
- 5.2 Logic Design Conventions
- 5.3 Building a Datapath
- 5.4 A Simple Implementation Scheme
- 5.5 A Multi-cycle Implementation

A Multi-cycle Implementation
- Each step in the execution takes one clock cycle.
- A functional unit (e.g., the ALU) can be used more than once per instruction, as long as it is used on different clock cycles.
- Advantages:
  - Instructions can take different numbers of clock cycles.
  - Functional units are shared within the execution of a single instruction.
- Differences from the single-cycle implementation:
  - A single memory unit is used for both instructions and data.
  - A register saves the instruction after it is read from memory. It is called the Instruction Register (IR).
  - A single ALU is used, rather than an ALU plus two adders.

Added Temporary Registers
- The Instruction Register (IR) and the Memory Data Register (MDR) save the output of memory for an instruction read and a data read, respectively.
- Two separate registers are used because both values are needed during the same clock cycle. The IR must hold the instruction until the end of that instruction's execution, so it requires a write control signal.
- The A and B registers hold the register operand values read from the register file.
- The ALUOut register holds the output of the ALU.

A Multi-cycle Implementation
- Two sources for a memory address, selected by a MUX:
  - the PC (for instruction access)
  - ALUOut (for data access: lw, sw)
- A single ALU must accommodate all the inputs, which requires two changes to the datapath:
  - An additional multiplexor on the first ALU input chooses between the A register and the PC.
  - A four-way multiplexor on the second ALU input chooses among:
    - the B register
    - the constant 4
    - the sign-extended offset field
    - the sign-extended offset field shifted left by 2 bits

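
The two ALU input multiplexors described above can be sketched in a few lines of Python. This is only an illustration: the select-signal names `alu_src_a` and `alu_src_b` and their encodings (0 = PC, 1 = A; 0 = B, 1 = constant 4, 2 = sign-extended offset, 3 = shifted offset) are assumptions chosen for the example, not taken from these slides.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operands(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Return the two ALU inputs chosen by the two multiplexors.

    alu_src_a and alu_src_b are hypothetical select signals.
    """
    # Two-way mux on the first input: 0 selects the PC, 1 selects register A.
    first = a if alu_src_a else pc
    # Four-way mux on the second input.
    ext = sign_extend16(imm16)
    choices = [
        b & MASK32,           # 0: the B register
        4,                    # 1: the constant 4
        ext & MASK32,         # 2: the sign-extended offset field
        (ext << 2) & MASK32,  # 3: the sign-extended offset shifted left by 2
    ]
    return first & MASK32, choices[alu_src_b]


# PC + 4 during instruction fetch: first input is the PC, second is 4.
print(alu_operands(0, 1, 0x00400000, 0, 0, 0))
```

With these encodings, the same ALU serves PC + 4, the branch target computation, the memory address computation, and R-type operations simply by changing the two selects.
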
Multi-cycle Datapath

Adding Control Signals to Datapath

Program Counter Control
- With the jump and branch instructions, there are three possible values to be written into the PC:
  - Normal: the output of the ALU, PC + 4, which is stored directly into the PC.
  - Branch: the register ALUOut, which holds the branch target address.
  - Jump: the lower 26 bits of the IR shifted left by 2 and concatenated with the upper 4 bits of the incremented PC.
- PCWrite: causes an unconditional write of the PC.
- PCWriteCond: causes a write of the PC only if the branch condition is also true.

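
The PC update rule above can be sketched as a small Python function. The function name, the argument names, and the select encoding (0 = ALU output, 1 = ALUOut, 2 = jump target) are assumptions made for this illustration; `zero` stands for the branch condition coming from the ALU.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, alu_result, alu_out, ir, pc_source, pc_write, pc_write_cond, zero):
    """Return the PC value after one clock edge.

    pc_source is a hypothetical select: 0 = ALU output (PC + 4),
    1 = ALUOut (branch target), 2 = jump target.
    """
    # Jump target: upper 4 bits of the incremented PC, then the low
    # 26 bits of the IR shifted left by 2.
    jump_target = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    candidates = [alu_result & MASK32, alu_out & MASK32, jump_target]
    # PCWrite is unconditional; PCWriteCond also needs the branch condition.
    write = pc_write or (pc_write_cond and zero)
    return candidates[pc_source] if write else pc


# Normal fetch: the ALU has computed PC + 4 and PCWrite is asserted.
print(hex(next_pc(0x00400000, 0x00400004, 0, 0, 0, True, False, False)))  # 0x400004
```
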
Complete Datapath

Actions of the Control Signals
- Priority: PCWrite > PCWriteCond

Actions of the 2-bit Control Signals

Breaking the Instruction Execution into Clock Cycles
The limitation of one ALU operation, one memory access, and one register file access per clock cycle determines what can fit in one step. The steps are:
1. Instruction fetch step
2. Instruction decode and register fetch step
3. Execute, memory address computation, or branch completion step
4. Memory access or R-type instruction completion step
5. Memory read completion step
Each MIPS instruction needs 3 to 5 of these steps.

Complete Datapath

Instruction Fetch and Decode
Instruction fetch step
   IR <= Memory[PC];
   PC <= PC + 4;
Instruction decode and register fetch step
   A <= Reg[IR[25:21]];   # get rs
   B <= Reg[IR[20:16]];   # get rt
   ALUOut <= PC + (sign-extend(IR[15:0]) << 2);   # precompute branch target address

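
The fetch and decode register transfers can be traced with a small Python model. The dictionary-based `state`, the word-addressed `memory` dictionary, and the function names are assumptions made for this sketch; only the register transfers themselves come from the slide.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch(state, memory):
    """IR <= Memory[PC]; PC <= PC + 4."""
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & MASK32


def decode(state, regs):
    """Read rs and rt, and precompute the branch target into ALUOut."""
    ir = state["IR"]
    state["A"] = regs[(ir >> 21) & 0x1F]  # rs = IR[25:21]
    state["B"] = regs[(ir >> 16) & 0x1F]  # rt = IR[20:16]
    state["ALUOut"] = (state["PC"] + (sign_extend16(ir) << 2)) & MASK32


# Example: beq $1, $2 with offset 3, stored at address 0x100.
memory = {0x100: 0x10220003}
regs = [0] * 32
regs[1], regs[2] = 5, 5
state = {"PC": 0x100}
fetch(state, memory)
decode(state, regs)
print(hex(state["PC"]), hex(state["ALUOut"]))  # 0x104 0x110
```

Note that the branch target uses the already incremented PC, which is why it is 0x104 + 12 rather than 0x100 + 12.
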
Execution Cycle
Execute, memory address computation, or branch completion
   Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
   R-type: ALUOut <= A op B;
   Branch: if (A == B) PC <= ALUOut;
   Jump: PC <= {PC[31:28], IR[25:0], 2'b00};
   # {x, y} is the Verilog notation for concatenation of bit fields x and y

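
The four alternatives of the execution step can be sketched as one Python function. The `kind` strings, the `op` callback for the R-type operation, and the dictionary-based `state` are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute(kind, state, op=None):
    """Apply the execution-step register transfer for one instruction class."""
    ir = state["IR"]
    if kind == "mem":
        # Memory address computation: ALUOut <= A + sign-extend(IR[15:0])
        state["ALUOut"] = (state["A"] + sign_extend16(ir)) & MASK32
    elif kind == "rtype":
        # ALUOut <= A op B, where op is supplied by the caller
        state["ALUOut"] = op(state["A"], state["B"]) & MASK32
    elif kind == "branch":
        # Branch completion: ALUOut already holds the target
        if state["A"] == state["B"]:
            state["PC"] = state["ALUOut"]
    elif kind == "jump":
        # PC <= {PC[31:28], IR[25:0], 2'b00}
        state["PC"] = (state["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    else:
        raise ValueError("unknown instruction class: " + kind)


# Example: address computation for a load with base 0x1000 and offset -4.
state = {"IR": 0x8D28FFFC, "A": 0x1000, "B": 0, "PC": 0x104, "ALUOut": 0}
execute("mem", state)
print(hex(state["ALUOut"]))  # 0xffc
```
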
Instruction Completion Steps
Memory access or R-type instruction completion step
   Memory reference:
      MDR <= Memory[ALUOut];   # for lw
      or
      Memory[ALUOut] <= B;   # for sw
   R-type:
      Reg[IR[15:11]] <= ALUOut;   # completion of R-type
Memory read completion step
   Load: Reg[IR[20:16]] <= MDR;

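
The two completion steps can be modeled in the same style. As before, the `kind` strings, the dictionary-based `state` and `memory`, and the function names are assumptions for this sketch; the model also ignores the MIPS convention that register 0 always reads as zero.

```python
def memory_or_rtype_completion(kind, state, memory, regs):
    """Fourth step: memory access, or register write for R-type."""
    ir = state["IR"]
    if kind == "lw":
        state["MDR"] = memory[state["ALUOut"]]     # MDR <= Memory[ALUOut]
    elif kind == "sw":
        memory[state["ALUOut"]] = state["B"]       # Memory[ALUOut] <= B
    elif kind == "rtype":
        regs[(ir >> 11) & 0x1F] = state["ALUOut"]  # Reg[IR[15:11]] <= ALUOut
    else:
        raise ValueError("unknown instruction class: " + kind)


def memory_read_completion(state, regs):
    """Fifth step, loads only: Reg[IR[20:16]] <= MDR."""
    regs[(state["IR"] >> 16) & 0x1F] = state["MDR"]


# Example: lw $8, 4($9) after the address 0x2004 has been computed.
regs = [0] * 32
memory = {0x2004: 99}
state = {"IR": 0x8D280004, "ALUOut": 0x2004, "B": 0}
memory_or_rtype_completion("lw", state, memory, regs)
memory_read_completion(state, regs)
print(regs[8])  # 99
```

The load needs both functions, which is why it takes one more clock cycle than a store or an R-type instruction.
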
A Multi-cycle Implementation

CPI in a Multi-cycle CPU
Clock cycles per instruction class:
- Loads: 5
- Stores: 4
- ALU instructions: 4
- Branches: 3
- Jumps: 3
Instruction mix:
- 25% loads (1% load byte + 24% load word)
- 10% stores (1% store byte + 9% store word)
- 11% branches (6% beq, 5% bne)
- 2% jumps (1% jal + 1% jr)
- 52% ALU (all the rest)
CPI = 0.25*5 + 0.10*4 + 0.52*4 + 0.11*3 + 0.02*3 = 4.12
This CPI is better than the worst-case CPI of 5.0, which results when all instructions take the same number of clock cycles.

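
The CPI figure is a weighted average of the cycle counts, which a few lines of Python can reproduce. The dictionary layout and the function name are choices made for this sketch; the fractions and cycle counts are the ones given on the slide.

```python
# Instruction mix: class -> (fraction of instructions, clock cycles)
MIX = {
    "load": (0.25, 5),
    "store": (0.10, 4),
    "alu": (0.52, 4),
    "branch": (0.11, 3),
    "jump": (0.02, 3),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


print(round(average_cpi(MIX), 2))  # 4.12
```

Changing a fraction or a cycle count in `MIX` shows how sensitive the average is to the instruction mix. Loads contribute 1.25 of the 4.12 cycles on their own.
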
Techniques to Specify the Control
Two different techniques to specify the control:
- Finite state machine (state diagram)
- Microprogramming (see Appendix)
Microprogram: a symbolic representation of control in the form of instructions, called microinstructions, that are executed on a simple micromachine.

Finite-state Machine Control
The high-level view of the finite state machine control

Instruction Fetch and Decode

Memory-reference Instructions

R-type

Branch

Jump

Complete State Diagram

Implementation of State Diagram
A. Conventional way to implement the control unit
B. Use Verilog/VHDL to implement the state diagram

Interrupt and Exception
Interrupts were initially created to handle unexpected events like arithmetic overflow and to signal requests for service from I/O devices.
Some events generated internally or externally:

Type of event                                   From where?   MIPS terminology
I/O device request                              External      Interrupt
Invoke the operating system from user program   Internal      Exception
Arithmetic overflow                             Internal      Exception
Using an undefined instruction                  Internal      Exception
Hardware malfunctions                           Either        Exception or interrupt

Interrupt and Exception
- Exception: any unexpected change in control flow, without distinguishing whether the cause is internal or external.
- Interrupt: an exception whose cause is external.
- This chapter discusses only how to handle an undefined instruction or an arithmetic overflow.
How exceptions are handled:
1. Save the address of the offending instruction in the Exception Program Counter (EPC) and transfer control to the operating system at some specified address.
2. The operating system takes some predefined action in response to an overflow, or stops the execution of the program and reports an error (it executes the Interrupt Service Routine).
3. The operating system then terminates the program or continues its execution, using the EPC to determine where to restart the execution of the program.

Interrupt Registers
Two main methods are used to communicate the reason for an exception:
- Cause register: a status register that holds a field indicating the reason for the exception (used in the MIPS architecture).
- Vectored interrupt: an interrupt for which the address to which control is transferred is determined by the cause of the exception. The operating system knows the reason for the exception from the address at which it is initiated. The addresses are separated by 32 bytes (8 instructions), and the operating system must record the reason for the exception and may perform some limited processing in this sequence.

Exception type          Exception vector address (in hex)
Undefined instruction   C000 0000hex
Arithmetic overflow     C000 0020hex

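
The vectored scheme amounts to a base address plus a fixed stride per exception type. The following Python sketch reproduces the two addresses in the table; the index assigned to each exception type and all the names are hypothetical, chosen only so that the arithmetic matches the table.

```python
VECTOR_BASE = 0xC0000000   # address of the first vector, from the table
VECTOR_STRIDE = 32         # 32 bytes = 8 instructions of 4 bytes each

# Hypothetical numbering of the exception types for this example.
EXCEPTION_INDEX = {
    "undefined_instruction": 0,
    "arithmetic_overflow": 1,
}


def vector_address(kind):
    """Address at which the operating system is entered for this exception."""
    return VECTOR_BASE + VECTOR_STRIDE * EXCEPTION_INDEX[kind]


for kind in EXCEPTION_INDEX:
    print(kind, hex(vector_address(kind)))
```

Because each vector has room for only 8 instructions, the code placed there can do little more than record the cause and jump to a longer handler.
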
Handle Interrupt in MIPS
For the MIPS exception system:
- Two additional registers are added to the datapath:
  - EPC (exception program counter): a 32-bit register that holds the address of the affected instruction.
  - Cause: a register that records the cause of the exception. In the MIPS architecture, this register is 32 bits.
- Three additional control signals: EPCWrite, CauseWrite, IntCause.
- The 3-way multiplexor controlled by PCSource is changed to a 4-way multiplexor, with the additional input wired to the constant value 8000 0180hex.

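
The effect of the new registers and control signals can be sketched in Python. Two points here are assumptions rather than statements from the slide: the encoding of `int_cause` (0 for an undefined instruction, 1 for an arithmetic overflow), and the use of PC - 4 as the saved address, which follows from the fetch step having already incremented the PC by the time the exception is detected.

```python
MASK32 = 0xFFFFFFFF
HANDLER_ADDRESS = 0x80000180  # constant wired to the fourth PC mux input

# Assumed encoding of the IntCause signal for this example.
UNDEFINED_INSTRUCTION = 0
ARITHMETIC_OVERFLOW = 1


def raise_exception(state, int_cause):
    """Model one exception state: write Cause and EPC, then redirect the PC."""
    state["Cause"] = int_cause                 # CauseWrite, value from IntCause
    state["EPC"] = (state["PC"] - 4) & MASK32  # EPCWrite: address of the offending instruction
    state["PC"] = HANDLER_ADDRESS              # PCWrite with the new mux input selected


# Example: the instruction at 0x00400020 overflows; the PC already holds 0x00400024.
state = {"PC": 0x00400024, "EPC": 0, "Cause": 0}
raise_exception(state, ARITHMETIC_OVERFLOW)
print(hex(state["EPC"]), state["Cause"], hex(state["PC"]))
```
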
Handle Interrupt in MIPS
Two new states (10 and 11) are shown in Fig. 5.40:
- Undefined instruction (state 10): detected when no next state is defined from state 1 for the op value.
- Arithmetic overflow (state 11): the Overflow signal is used in the modified finite state machine to specify an additional possible next state (11) for state 7.

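
The next-state logic of the controller, including the two exception states, can be sketched as a Python function. Only states 1, 7, 10, and 11 are named on the slide. The numbering of the remaining states (0 = fetch, 2 = address computation, 3 = memory read, 4 = load write-back, 5 = memory write, 6 = R-type execution, 8 = branch completion, 9 = jump completion) is an assumption for this example, since the state diagram figure is not reproduced here.

```python
# MIPS opcodes (IR[31:26]) for the instruction classes covered here.
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02


def next_state(state, opcode, overflow=False):
    """Return the next controller state (assumed state numbering, see above)."""
    if state == 0:                       # fetch -> decode
        return 1
    if state == 1:                       # decode: dispatch on the opcode
        dispatch = {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}
        return dispatch.get(opcode, 10)  # no next state defined: undefined instruction
    if state == 2:                       # address computed: read or write memory
        return 3 if opcode == LW else 5
    if state == 3:                       # memory read -> register write-back
        return 4
    if state == 6:                       # R-type execution -> completion
        return 7
    if state == 7:                       # R-type completion, may overflow
        return 11 if overflow else 0
    return 0                             # states 4, 5, 8, 9, 10, 11 return to fetch


def cycles(opcode):
    """Count clock cycles from fetch until the controller is back at fetch."""
    state, count = 0, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == 0:
            return count


print([cycles(op) for op in (LW, SW, RTYPE, BEQ, J)])  # [5, 4, 4, 3, 3]
```

The cycle counts produced by walking the state machine agree with the 3 to 5 steps per instruction stated earlier.
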
Complete Datapath of MIPS CPU

Complete State Diagram of Controller

HW#6
- Chapter 5 exercises: 5.4, 5.5, 5.6, 5.8, 5.11, 5.13
- Due date: 11/26 (Friday, by 2 pm), to TA Thinking Shen (box in front of E2-232).
- No late submissions.