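Logic and Computer Design Fundamentals
Chapter 8 – Memory Basics
Haifeng Liu, haifengliu@zju.edu.cn
2014 Fall, College of Computer Science and Technology, Zhejiang University
Notes: The basic components of a computer are the CPU, memory, and peripherals. The main purpose of memory is to store data, where data includes code, instructions, and the operands those instructions need. "Memory" is a broad term: in the wide sense it covers main memory, external storage, hard disks, optical discs, and so on; in the narrow sense it means the component inside the computer that holds data. "Memory" refers to internal devices, "storage" to external ones.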
Overview
  Memory definitions
  Random Access Memory (RAM)
  Static RAM (SRAM) integrated circuits
    Cells and slices
    Cell arrays and coincident selection
  Arrays of SRAM integrated circuits
  Dynamic RAM (DRAM) integrated circuits
  DRAM types
    Synchronous (SDRAM)
    Double-Data Rate (DDR SDRAM)
    RAMBUS DRAM (RDRAM)
  Arrays of DRAM integrated circuits
Notes: This chapter covers RAM. The other kind of memory, ROM, was introduced earlier. RAM and ROM share most basic concepts; the key difference is that RAM can be both read and written.
Memory Definitions
  Memory: a collection of storage cells together with the necessary circuits to transfer information to and from them.
  Memory Organization: the basic architectural structure of a memory in terms of how data is accessed.
  Random Access Memory (RAM): a memory organized such that data can be transferred to or from any cell (or collection of cells) in a time that is not dependent upon the particular cell selected.
  Memory Address: a vector of bits that identifies a particular memory element (or collection of elements).
Notes: "Collection" means a set of storage cells arranged in rows and columns. Transferring information to the cells is a write; transferring it from them is a read. Magnetic tape is an example of sequential (non-random) access. An 8-bit address selects 256 locations, a 16-bit address selects 64K, and 4 GB of byte-addressed memory needs a 32-bit address.
Memory Definitions (Continued)
  Typical data elements are:
    bit: a single binary digit
    byte: a collection of eight bits accessed together
    word: a collection of binary bits whose size is a typical unit of access for the memory. It is typically a power-of-two multiple of bytes (e.g., 1 byte, 2 bytes, 4 bytes, 8 bytes).
  Memory Data: a bit or a collection of bits to be stored into or accessed from memory cells.
  Memory Operations: operations on memory data supported by the memory unit. Typically, read and write operations over some data element (bit, byte, word, etc.).
Notes: The term "word" is loosely defined. In some contexts a word is exactly 16 bits; in others it means the data width, that is, how many bits are accessed at once. Total capacity is defined as follows: with N address lines and a data width of M bits, the memory capacity is 2^N × M bits. Operations are normally performed on one word at a time.
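The capacity rule in the notes (N address lines, M-bit words, capacity 2^N × M) and the address-width examples from the previous slide can be checked with a short Python sketch. The function names `capacity_bits` and `address_lines_needed` are illustrative choices, not part of the course material.

```python
def capacity_bits(address_lines: int, word_width: int) -> int:
    """Capacity in bits of a memory with N address lines and M-bit words: 2^N * M."""
    return (2 ** address_lines) * word_width


def address_lines_needed(locations: int) -> int:
    """Smallest number of address bits that can distinguish `locations` cells."""
    if locations < 1:
        raise ValueError("a memory needs at least one location")
    # (locations - 1) is the largest address; its bit length is the width needed.
    return (locations - 1).bit_length()


if __name__ == "__main__":
    print(capacity_bits(3, 8))                 # 8 words of 8 bits
    print(address_lines_needed(256))           # 8-bit address
    print(address_lines_needed(64 * 1024))     # 16-bit address
    print(address_lines_needed(4 * 2 ** 30))   # 4 GB, byte addressed
```

Integer arithmetic is used instead of a floating-point logarithm so that exact powers of two never round the wrong way.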
Memory Organization
  Organized as an indexed array of words. The value of the index for each word is the memory address.
  Often organized to fit the needs of a particular computer architecture. Some historically significant computer architectures and their associated memory organization:
    Digital Equipment Corporation PDP-8: used a 12-bit address to address 4096 12-bit words.
    IBM 360: used a 24-bit address to address 16,777,216 8-bit bytes, or 4,194,304 32-bit words.
    Intel 8080 (8-bit predecessor to the 8086 and the current Intel processors): used a 16-bit address to address 65,536 8-bit bytes.
  [Figures: PDP-8, IBM 360, Intel 8080, Intel 8086]
Notes: Word width has evolved over time. PDP-8: introduced in 1965, the first commercially successful minicomputer. Machines of that era stored each bit in a magnetic core, which was bulky and fragile. IBM 360: a 16 MB address space; by comparison, even a smartphone has 1 GB. Intel 8080: a CPU chip released in April 1974 with 40 pins, which is not many. The pins include power, ground, reset, and the buses, and the chip multiplexes bus signals to reduce the pin count.
Memory Block Diagram
  A basic memory system is shown here:
    k address lines are decoded to address 2^k words of memory.
    Each word is n bits.
    Read and Write are single control lines defining the simplest of memory operations.
  [Figure: memory unit with 2^k words of n bits per word. Inputs: n data input lines, k address lines, Read, Write. Outputs: n data output lines.]
Notes: In practice a chip has only one set of data pins, which are bidirectional (inout). They are implemented with 3-state buffers whose enables are driven by the read/write control.
Memory Organization Example
  Example memory contents: a memory with 3 address bits and 8 data bits has k = 3 and n = 8, so:
    2^3 = 8 addresses labeled 0 to 7
    2^3 = 8 words of 8-bit data

  Memory Address        Memory Content
  Binary   Decimal
  000      0            1 0 0 0 1 1 1 1
  001      1            1 1 1 1 1 1 1 1
  010      2            1 0 1 1 0 0 0 1
  011      3            0 0 0 0 0 0 0 0
  100      4            1 0 1 1 1 0 0 1
  101      5            1 0 0 0 0 1 1 0
  110      6            0 0 1 1 0 0 1 1
  111      7            1 1 0 0 1 1 0 0
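As a rough illustration of the example, the memory can be modeled in Python as an indexed array whose index is the address. The class name `Memory` and its methods are hypothetical, and the content at address 000 is taken to be 10001111.

```python
# Contents of the 8 x 8 example memory; the list index is the address.
EXAMPLE_CONTENTS = [
    0b10001111, 0b11111111, 0b10110001, 0b00000000,
    0b10111001, 0b10000110, 0b00110011, 0b11001100,
]


class Memory:
    """Behavioral model of a memory with k address bits and n data bits."""

    def __init__(self, k: int, n: int, contents=None):
        self.k, self.n = k, n
        self.words = list(contents) if contents is not None else [0] * (2 ** k)
        if len(self.words) != 2 ** k:
            raise ValueError("expected exactly 2^k words")

    def _check(self, address: int) -> None:
        if not 0 <= address < 2 ** self.k:
            raise IndexError("address does not fit in k bits")

    def read(self, address: int) -> int:
        self._check(address)
        return self.words[address]

    def write(self, address: int, data: int) -> None:
        self._check(address)
        if not 0 <= data < 2 ** self.n:
            raise ValueError("data does not fit in n bits")
        self.words[address] = data


if __name__ == "__main__":
    mem = Memory(k=3, n=8, contents=EXAMPLE_CONTENTS)
    print(format(mem.read(0b010), "08b"))  # word stored at address 2
```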
Basic Memory Operations
  Memory operations require the following:
    Data: data written to, or read from, memory as required by the operation.
    Address: specifies the memory location to operate on. The address lines carry this information into the memory. Typically, n address bits specify the locations of 2^n words.
    An operation: information sent to the memory and interpreted as control information which specifies the type of operation to be performed. Typical operations are READ and WRITE. Others are READ followed by WRITE and a variety of operations associated with delivering blocks of data. Operation signals may also specify timing information.
Notes: Some read and write operations are performed in batches: a burst read or burst write accesses consecutive locations back to back.
Basic Memory Operations (continued)
  Read Memory: an operation that reads a data value stored in memory:
    Place a valid address on the address lines.
    Wait for the read data to become stable.
  Write Memory: an operation that writes a data value to memory:
    Place a valid address on the address lines and valid data on the data lines.
    Toggle the memory write control line.
  Sometimes the read or write enable line is defined as a clock with precise timing information (e.g., Read Clock, Write Strobe). Otherwise, it is just an interface signal.
  Sometimes memory must acknowledge that it has completed the operation.
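Memory Operation Timing
  Most basic memories are asynchronous:
    Storage in latches or storage of electrical charge
    No clock
  Controlled by control inputs and address
  Timing of signal changes and data observation is critical to the operation
  Read timing (read cycle is 65 ns):
  [Figure: read-cycle timing diagram. Signals: Clock (20 ns period, T1 to T4), Address (address valid), Memory enable, Read/Write, Data output (data valid). Read cycle: 65 ns.]
Notes: The read cycle time is the access latency. The longer it is, the slower the memory; the shorter it is, the faster the access. Memory modules used to run at 133 MHz, later 333 and 533 MHz, and now around 1 GHz, with the read latency shrinking accordingly.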
Memory Operation Timing
  Write timing (write cycle is 75 ns):
  Critical times are measured with respect to the edges of the write pulse (1-0-1):
    Address must be established at least a specified time before 1-0 and held for at least a specified time after 0-1 to avoid disturbing stored contents of other addresses.
    Data must be established at least a specified time before 1-0 and held for at least a specified time after 0-1 to write correctly.
  [Figure: write-cycle timing diagram. Signals: Clock (20 ns period, T1 to T4), Address (address valid), Memory enable, Read/Write, Data input (data valid). Write cycle: 75 ns.]
Notes: Given n address bits, a decoder enables one of the 2^n locations. Address decoding takes considerable time, which is why the address must be established first. Reads are faster than writes, and the quoted speed of a memory usually refers to its read speed.
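Both timing diagrams span clock periods T1 to T4 with a 20 ns clock. One way to see why is to round each cycle time up to a whole number of clock periods, as in the Python sketch below. The helper `clock_cycles` is an illustrative name, and the assumption that a clocked system must wait whole clock periods is mine rather than something the slides state.

```python
import math


def clock_cycles(cycle_time_ns: float, clock_period_ns: float) -> int:
    """Whole clock periods that must elapse to cover a memory cycle time."""
    return math.ceil(cycle_time_ns / clock_period_ns)


if __name__ == "__main__":
    for name, cycle_ns in (("read", 65), ("write", 75)):
        n = clock_cycles(cycle_ns, 20)
        print(f"{name}: {cycle_ns} ns needs {n} clocks ({n * 20} ns)")
```

With these numbers both operations occupy four 20 ns periods, that is 80 ns, even though the read itself completes in 65 ns and the write in 75 ns.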
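RAM Integrated Circuits
  Types of random access memory:
    Static: information stored in latches
    Dynamic: information stored as electrical charges on capacitors
      Charge "leaks" off
      Periodic refresh of charge required
  Dependence on power supply:
    Volatile: loses stored information when power is turned off
    Non-volatile: retains information when power is turned off
Notes: The preceding slides covered the external structure; next we look at the internal construction. A latch is fast, since it adds only one propagation delay, but it is expensive in transistors: storing one bit takes 5 to 6 transistors, whereas a dynamic cell needs only one. Memory modules sold for PCs are all dynamic RAM. Static RAM is used in caches such as the level 2 cache, which was 256 KB in early designs and is 4 MB to 8 MB in current high-end CPUs. The L2 cache is described here as running at half the CPU clock frequency, while the L1 cache runs at the CPU clock frequency and is very small, 16 KB to 32 KB. A larger cache makes the system run faster. DDR cannot reach the CPU clock frequency, so the CPU is connected to the cache, which holds part of the instructions and runs at a matching speed; if an instruction is not found there, it is brought in from DDR. Related design questions include instruction prefetching, page replacement, and reducing the miss rate. An example of non-volatile memory is flash.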
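Static RAM Cell
  Array of storage cells used to implement static RAM
  SR latch
  Select input for control
  Dual-rail data inputs B and B'
  Dual-rail data outputs C and C'
  [Figure: RAM cell built from an SR latch (S, R, Q, Q') with Select, inputs B and B', and outputs C and C'.]
Notes: The circuit shown is schematic; the real circuit is much more complex. With Select = 0 the latch holds its value. With Select = 1 and B = B' = 0 the cell outputs its value. With Select = 1 and complementary values on B and B' the value is written.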
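The three cases in the notes can be reproduced with a small behavioral model in Python. This is a sketch under assumptions: the gating S = Select AND B, R = Select AND B', C = Select AND Q, C' = Select AND Q' is my reading of the schematic, and the class name `SramCell` is hypothetical.

```python
class SramCell:
    """Behavioral model of an SR-latch RAM cell with a Select input."""

    def __init__(self, q: int = 0):
        self.q = q  # stored bit (latch output Q)

    def step(self, select: int, b: int, b_bar: int):
        """Apply the inputs once and return the dual-rail outputs (C, C')."""
        s = select & b        # set input of the latch
        r = select & b_bar    # reset input of the latch
        if s and r:
            raise ValueError("B and B' must not both be 1 while selected")
        if s:
            self.q = 1
        elif r:
            self.q = 0
        # Outputs are gated by Select, so an unselected cell drives 0 on both rails.
        return select & self.q, select & (1 - self.q)


if __name__ == "__main__":
    cell = SramCell()
    print(cell.step(1, 1, 0))  # write 1
    print(cell.step(0, 0, 0))  # not selected: hold, outputs gated off
    print(cell.step(1, 0, 0))  # selected, B = B' = 0: read
```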
Static RAM Bit Slice
  Represents all circuitry that is required for 2^n 1-bit words:
    Multiple RAM cells
    Control lines:
      Word select i, one for each word
      Read/Write
      Bit select
    Data lines:
      Data in
      Data out
  [Figure: (a) Logic diagram: RAM cells driven by word select 0 through word select 2^n - 1, sharing write logic and read logic with Data in, Data out, Read/Write, and Bit select. (b) Symbol: stacked RAM cells above a Read/Write logic block.]
Notes: When Bit select = 0 the whole bit slice is disabled; a cell can be selected only when Bit select = 1.
2^n-Word × 1-Bit RAM IC
  To build a RAM IC from a RAM slice, we need:
    A decoder that decodes the n address lines to 2^n word select lines
    A 3-state buffer on the data output, which permits RAM ICs to be combined into a RAM with c × 2^n words
  [Figure: (a) Symbol: 16 × 1 RAM with address inputs A3 to A0, Data input, Data output, Read/Write, and Memory enable. (b) Block diagram: 4-to-16 decoder driving word select lines 0 to 15 of a column of RAM cells, with Read/Write logic, Data input, Data output, Read/Write, and Chip select.]
Notes: Chip select enables the whole IC. For an m-bit-wide memory, replicate the column of cells m times, share the word select lines across the columns, tie the chip selects together, and bring out m separate data outputs.
Cell Arrays and Coincident Selection
  Memory arrays can be very large, which implies:
    Large decoders
    Large fanouts for the bit lines
  The decoder size and fanouts can be reduced substantially by using coincident selection in a 2-dimensional array. With the address split evenly, each decoder needs only about the square root of the number of outputs of a single large decoder.
    Uses two decoders, one for words and one for bits
    Word select becomes Row select
    Bit select becomes Column select
  See next slide for example:
    A3 and A2 used for Row select
    A1 and A0 used for Column select
Notes: The address is split into two halves, row and column.
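The address split for the 16-word example (A3 A2 for the row, A1 A0 for the column) and the resulting saving in decoder outputs can be sketched in Python. The function names are illustrative, and decoder cost is approximated here simply by counting output lines.

```python
def split_address(address: int, address_bits: int, column_bits: int):
    """Split an address into (row, column); the low bits select the column."""
    if not 0 <= address < (1 << address_bits):
        raise ValueError("address out of range")
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    return row, column


def decoder_outputs(address_bits: int, column_bits: int = 0) -> int:
    """Total decoder output lines, with or without coincident selection."""
    row_lines = 1 << (address_bits - column_bits)
    column_lines = (1 << column_bits) if column_bits else 0
    return row_lines + column_lines


if __name__ == "__main__":
    print(split_address(0b1001, 4, 2))   # cell 9 sits in row 2, column 1
    print(decoder_outputs(4))            # one 4-to-16 decoder
    print(decoder_outputs(4, 2))         # two 2-to-4 decoders
```

The gap widens quickly with size: for a 20-bit address, one decoder needs 1,048,576 outputs while two 10-bit decoders need 2,048 in total.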
Cell Arrays and Coincident Selection (continued)
  [Figure: 16 × 1 RAM as a 4 × 4 array of RAM cells numbered 0 to 15. A 2-to-4 row decoder on A3 and A2 drives Row select 0 to 3. A 2-to-4 column decoder with enable on A1 and A0, enabled by Chip select, drives Column select 0 to 3. Each column has its own Read/Write logic with Data in, Data out, Read/Write, and Bit select; the circuit has a shared Data input, Read/Write, and Data output.]
Notes: Word select becomes row select, and bit select becomes column select. The data inputs of the columns are tied together, which causes no conflict. The data outputs are combined through an OR gate.
RAM ICs with > 1 Bit/Word
  Word length can be quite high.
  To better balance the number of words and word length, use ICs with > 1 bit/word.
  See the figure on the next page for an example:
    2 data input bits
    2 data output bits
    Row select selects 4 rows
    Column select selects 2 pairs of columns
Making Larger Memories
  Using the CS lines, we can make larger memories from smaller ones by tying all address, data, and R/W lines in parallel, and using the decoded higher-order address bits to control CS.
  Using the 4-word by 1-bit memory from before, we construct a 16-word by 1-bit memory.
Notes: This is address expansion. Each D-out needs a 3-state buffer so that the outputs can share one line.
Making Wider Memories
  To construct wider memories from narrow ones, we tie the address and control lines in parallel and keep the data lines separate.
  For example, we can make a 4-word by 4-bit memory from four 4-word by 1-bit memories.
  Note: both the 16 × 1 and the 4 × 4 memories take 4 chips and hold 16 bits of data.
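Both constructions, the 16 × 1 memory from the previous slide and the 4 × 4 memory here, can be sketched in Python from the same hypothetical 4-word by 1-bit chip. The class names are illustrative, and a disabled chip's high-impedance output is modeled as `None`.

```python
class Chip4x1:
    """Hypothetical 4-word by 1-bit RAM chip with a chip select input."""

    def __init__(self):
        self.cells = [0] * 4

    def read(self, chip_select: bool, address: int):
        return self.cells[address] if chip_select else None  # None models Hi-Z

    def write(self, chip_select: bool, address: int, bit: int) -> None:
        if chip_select:
            self.cells[address] = bit & 1


class Memory16x1:
    """Larger memory: the two high address bits are decoded to chip selects."""

    def __init__(self):
        self.chips = [Chip4x1() for _ in range(4)]

    def write(self, address: int, bit: int) -> None:
        selected, low = address >> 2, address & 0b11
        for i, chip in enumerate(self.chips):
            chip.write(i == selected, low, bit)

    def read(self, address: int) -> int:
        selected, low = address >> 2, address & 0b11
        outputs = [chip.read(i == selected, low) for i, chip in enumerate(self.chips)]
        # Exactly one chip drives the shared output line; the rest are Hi-Z.
        return next(out for out in outputs if out is not None)


class Memory4x4:
    """Wider memory: shared address and control, one data line per chip."""

    def __init__(self):
        self.chips = [Chip4x1() for _ in range(4)]  # chip i stores bit i

    def write(self, address: int, word: int) -> None:
        for i, chip in enumerate(self.chips):
            chip.write(True, address, (word >> i) & 1)

    def read(self, address: int) -> int:
        return sum(chip.read(True, address) << i for i, chip in enumerate(self.chips))
```

For example, writing a 1 to address 13 of a `Memory16x1` lands in chip 3 at local address 1, while writing 0b1010 to address 2 of a `Memory4x4` stores one bit in each of the four chips.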
Dynamic RAM (DRAM)
  Basic principle: storage of information on capacitors.
  Charge and discharge of the capacitor to change the stored value
  Use of a transistor as a "switch" to:
    Store charge
    Charge or discharge
  Characteristics:
    Destructive read
    Periodic refresh required
  [Figure: (a) Circuit: DRAM cell with Select, bit line B, transistor T, and capacitor C. (b) Logical model: DRAM cell modeled as a D latch with Select, B, and C.]
Notes: Refresh is handled by a refresh controller.
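The two characteristics, destructive read and periodic refresh, can be illustrated with a toy Python model. All numbers here (10 charge units when full, a sensing threshold of 5, 1 unit leaked per tick) are invented for illustration and do not describe a real device; the class name `DramCell` is hypothetical.

```python
class DramCell:
    """Toy model of a DRAM cell: charge leaks away and reading drains it."""

    FULL, THRESHOLD, LEAK_PER_TICK = 10, 5, 1  # illustrative units only

    def __init__(self):
        self.charge = 0

    def write(self, bit: int) -> None:
        self.charge = self.FULL if bit else 0

    def tick(self, ticks: int = 1) -> None:
        """Let time pass; the capacitor leaks charge."""
        self.charge = max(0, self.charge - ticks * self.LEAK_PER_TICK)

    def sense(self) -> int:
        """Destructive read: sensing the charge drains the capacitor."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0
        return bit

    def read(self) -> int:
        """Read followed by write-back, so the stored value survives."""
        bit = self.sense()
        self.write(bit)
        return bit

    def refresh(self) -> None:
        """A refresh is a read whose only purpose is the write-back."""
        self.read()
```

In this model a cell refreshed every 4 ticks keeps its 1 indefinitely, while a cell left alone for 6 ticks drops below the threshold and reads back as 0.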
Dynamic RAM - Bit Slice
  C is driven by 3-state drivers.
  A sense amplifier is used to convert the small voltage change on C into H or L.
  In the electronics, B, C, and the sense amplifier output are connected to turn the destructive read into a non-destructive read.
  [Figure: (a) Logic diagram: DRAM cell models driven by word select 0 through word select 2^n - 1, sharing write logic, a sense amplifier, and read logic with Data in, Data out, Read/Write, and Bit select. (b) Symbol: stacked DRAM cells above a Read/Write logic block.]
Dynamic RAM - Block Diagram
  Block diagram elements:
    Refresh controller and refresh counter
    Read and write operations
      Application of row address
      Application of column address
  Why is the address split?
    The address is split to roughly halve the large number of address pins on the typical RAM IC.
  Why is the row address applied first?
    The row address is used to select the row of cells to be read within the memory.
    The column address is used to select the word to be placed on the output from the data read from the row of cells.
    Since the data must be read from the cells before it can be selected, the row address must be applied first.
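The ordering argument can be made concrete with a small Python sketch in which the row address loads a whole row into a buffer and the column address then picks one word out of it. The class `DramArray`, its `ras` and `cas` methods, and the helper `address_pins` are hypothetical names for illustration.

```python
def address_pins(total_address_bits: int) -> int:
    """Pins needed when row and column halves share the same address pins."""
    return (total_address_bits + 1) // 2


class DramArray:
    """Toy DRAM array with a row buffer filled by the row address."""

    def __init__(self, row_bits: int, column_bits: int):
        self.rows = [[0] * (1 << column_bits) for _ in range(1 << row_bits)]
        self.row_buffer = None  # nothing has been read from the cells yet

    def ras(self, row: int) -> None:
        """Apply the row address: read the entire row of cells."""
        self.row_buffer = list(self.rows[row])

    def cas(self, column: int) -> int:
        """Apply the column address: select one word from the row just read."""
        if self.row_buffer is None:
            raise RuntimeError("row address must be applied first")
        return self.row_buffer[column]


if __name__ == "__main__":
    dram = DramArray(row_bits=2, column_bits=2)
    dram.rows[2][1] = 1
    dram.ras(2)
    print(dram.cas(1))
    print(address_pins(20))  # a 1M-word DRAM needs 10 address pins, not 20
```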
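Dynamic RAM Read Timing
  [Figure: DRAM read-cycle timing diagram. Signals: Clock (20 ns period, T1 to T4), Address (row address, then column address), RAS, CAS, Output enable, Read/Write, Data output (Hi-Z, then data valid). Read cycle: 65 ns.]
Notes: RAS is the row address strobe, and CAS is the column address strobe. The row and column addresses are sent separately, which halves the number of address lines. This arrangement also supports burst reads: in a two-level memory, blocks of data are exchanged between DDR and the cache. The column address is the starting address, and from then on one byte is read per clock cycle.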
DRAM Types
  Types to be discussed:
    Synchronous DRAM (SDRAM)
    Double Data Rate SDRAM (DDR SDRAM)
    RAMBUS® DRAM (RDRAM)
  Justification for effectiveness of these types:
    DRAM is often used as a part of a memory hierarchy (see details in chapter 14).
      Reads from DRAM bring data into lower levels of the hierarchy.
      Transfers from DRAM involve multiple consecutively addressed words.
    Many words are internally read within the DRAM ICs using a single row address and captured within the memory.
    This read involves a fairly long delay.
Notes: SDRAM was used in the 586 (Pentium) generation. DDR SDRAM is used in current computers; it can read or write on both the rising and falling edges of the clock, giving twice the speed.
DRAM Types (continued)
  Justification for effectiveness of these types (continued):
    These words are then transferred out over the memory data bus using a series of clocked transfers.
    These transfers have a low delay, so several can be done in a short time.
    The column address is captured and used by a synchronous counter within the DRAM to provide consecutive column addresses for the transfers.
  Burst read: the resulting multiple-word read from consecutive addresses.
Synchronous DRAM
  Transfers to and from the DRAM are synchronized with a clock.
  Synchronous registers appear on:
    Address input
    Data input
    Data output
  Column address counter:
    Addresses the internal data to be transferred on each clock cycle
    Begins with the column address and counts up to column address + burst size – 1
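The column address counter can be sketched in Python as a generator that starts at the captured column address and emits one word per clock until it reaches column address + burst size - 1. The function `burst_read` and its parameters are illustrative names, and the row is assumed to have already been read into the chip.

```python
def burst_read(row_data, start_column: int, burst_size: int):
    """Yield (clock_cycle, column, word) for one burst from an open row."""
    last_column = start_column + burst_size - 1
    if start_column < 0 or last_column >= len(row_data):
        raise ValueError("burst runs past the end of the row")
    column = start_column  # value captured by the column address counter
    for clock_cycle in range(burst_size):
        yield clock_cycle, column, row_data[column]
        column += 1  # the counter increments on every clock


if __name__ == "__main__":
    open_row = [10, 11, 12, 13, 14, 15, 16, 17]
    for cycle, column, word in burst_read(open_row, start_column=2, burst_size=4):
        print(cycle, column, word)
```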
Synchronous DRAM (Continued)
  The memory bandwidth of SDRAM depends on the burst size.
  Example:
    Memory data path width: 1 byte
    Memory clock period: 7.5 ns
    Latency time (from application of row address until first word available): 4 clock cycles
  If burst size is 8 bytes:
    Read cycle time: (4 + 8) × 7.5 ns = 90 ns
    Memory bandwidth: 8 / (90 × 10^-9) = 88.89 Mbytes/sec
  If burst size is 2048 bytes:
    Read cycle time: (4 + 2048) × 7.5 ns = 15390 ns
    Memory bandwidth: 2048 / (15390 × 10^-9) = 133.07 Mbytes/sec
Double Data Rate Synchronous DRAM
  Transfers data on both edges of the clock
  Provides a transfer rate of 2 data words per clock cycle
  Example: same parameters as for synchronous DRAM
    Read cycle time = 15390 ns
    Memory bandwidth: 2 × 2048 / (15390 × 10^-9) ≈ 266.15 Mbytes/sec
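The SDRAM and DDR bandwidth figures can be recomputed with a few lines of Python. The function name and parameter names are illustrative; `data_clocks` is the number of clock cycles spent transferring data after the initial latency, and Mbytes are taken as 10^6 bytes, as in the examples.

```python
def bandwidth_mbytes_per_s(
    data_clocks: int,
    latency_clocks: int = 4,
    clock_period_ns: float = 7.5,
    bytes_per_transfer: int = 1,
    transfers_per_clock: int = 1,  # 1 for SDRAM, 2 for DDR SDRAM
) -> float:
    """Bytes moved in one read cycle divided by the read cycle time."""
    cycle_time_s = (latency_clocks + data_clocks) * clock_period_ns * 1e-9
    total_bytes = data_clocks * transfers_per_clock * bytes_per_transfer
    return total_bytes / cycle_time_s / 1e6


if __name__ == "__main__":
    print(round(bandwidth_mbytes_per_s(8), 2))                            # SDRAM, short burst
    print(round(bandwidth_mbytes_per_s(2048), 2))                         # SDRAM, long burst
    print(round(bandwidth_mbytes_per_s(2048, transfers_per_clock=2), 2))  # DDR SDRAM
```

The short burst reaches only about two thirds of the long-burst figure because the 4-cycle latency is amortized over just 8 transfers; for the long burst the latency is negligible and the bandwidth approaches one transfer per 7.5 ns.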
Rambus DRAM (RDRAM)
  Uses a packet-based bus for interaction between the RDRAM ICs and the memory bus to the processor
  The bus consists of:
    A 3-bit row address bus
    A 5-bit column address bus
    A 16-bit or 18-bit (for error correction) data bus
  The bus is synchronous and transfers on both edges of the clock.
  Packets are 4 clock cycles long, giving 8 transfers per packet, representing:
    A 12-bit row address packet
    A 20-bit column address packet
    A 128-bit or 144-bit data packet
  Multiple memory banks are used to permit concurrent memory accesses with different row addresses.
  The electronic design is sophisticated, permitting very fast clock speeds.
Arrays of DRAM Integrated Circuits
  Similar to arrays of SRAM ICs, but there are differences, typically handled by an IC called a DRAM controller:
    Separation of the address into row address and column address and timing their application
    Providing RAS and CAS and timing their application
    Performing refresh operations at required intervals
    Providing status signals to the rest of the system (e.g., indicating whether the memory is active or is busy performing refresh)
Assignment: 8-1, 8-4, 8-5, 8-8