Computer Organization and Architecture — I/O System and Bus (Lecture 20), 程 旭, 易江芳, 2011.12.21.


1 Computer Organization and Architecture: I/O System and Bus (Lecture 20), 程 旭, 易江芳

2 Where does this lecture fit? Today's topic: the I/O system.
[Figure: the classic five components — Processor (Control, Datapath), Memory, Input, Output — with the network reached through I/O.]
In terms of the overall picture, we started out from the left, showing you how to design a processor's datapath and control. The three lectures before the spring break covered the memory system. Today we will cover the input and output devices. Friday, we will talk about how to interface the I/O devices to the processor and memory via buses and the OS software. Next Wednesday, we will show how multiple computers can be connected together with a network through the I/O devices.

3 Outline: I/O performance and metrics; characteristics of I/O devices; magnetic disks; introduction to buses; bus types and bus operations; bus arbitration and how to design a bus arbiter.

4 Anatomy of a personal computer.
[Figure: bridge architecture — Processor and Cache behind a Cache/DRAM Controller with DRAM; a PCI Bus carrying SCSI, LAN, Base I/O, an Exp Bus Xface, Graphics (VRAM), Audio, and Motion Video (VRAM); an add-in board; and an ISA/EISA-MicroChannel bus behind the expansion-bus interface.]

5 Motivation: who cares about I/O? CPU performance grows about 60% per year, but I/O system performance is limited by mechanical delays (disk I/O): it improves at less than 10% per year (in I/Os per second, or MB/sec). Amdahl's Law: system speedup is limited by the slowest part! With 10% of time in I/O, a 10x faster CPU yields only a 5x overall speedup (half the gain lost); a 100x faster CPU yields only 10x (90% lost). The I/O bottleneck: as the CPU portion of execution time shrinks, I/O squanders the potential of a fast CPU.
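The Amdahl's Law figures above can be checked with a few lines. This is a minimal sketch; the 10% I/O fraction and the CPU speedups are the values quoted on the slide.

```python
# Amdahl's Law: overall speedup when only the CPU (non-I/O) fraction is
# accelerated. Illustrative sketch of the slide's arithmetic.

def amdahl_speedup(io_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when the non-I/O fraction runs cpu_speedup times faster."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(amdahl_speedup(0.10, 10))    # ~5.26x: half the 10x potential is lost
print(amdahl_speedup(0.10, 100))   # ~9.17x: ~90% of the 100x potential is lost
```

As the CPU term shrinks toward zero, the overall speedup saturates at 1/io_fraction = 10x, which is exactly the slide's point.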

6 Issues in I/O system design: performance; expandability; availability in the face of failure.
[Figure: Processor and Cache on a memory-I/O bus with Main Memory and I/O controllers for the disk, graphics, and network; interrupts flow back to the processor.]
This is a more in-depth picture of the I/O system of a typical computer. The I/O devices are connected to the computer via I/O controllers that sit on the memory-I/O bus. We will talk about buses on Friday. For now, notice that the I/O system is inherently more irregular than the processor and the memory system, because of all the different devices (disk, graphics) that can attach to it. So when one designs an I/O system, performance is still an important consideration, but besides raw performance one also has to think about expandability and resilience in the face of failure. For example: (a) Expandability: is there an easy way to connect another disk to the system? (b) If this I/O controller (say, the network's) fails, will it affect the rest of the system?

7 I/O system performance depends on many parts of the system (and is limited by the weakest link): the CPU; the memory system (internal and external caches, main memory); the underlying interconnect (buses); the I/O controllers; the I/O devices; the speed of the I/O software (the operating system); the efficiency with which the software uses the I/O devices. Two common performance metrics: throughput (I/O bandwidth) and response time (latency).
Even looking at performance alone, I/O performance is not as easy to quantify as CPU performance, because it depends on many other aspects of the system. Besides the obvious factors (the last four), such as the speed of the I/O devices, their controllers, and the software, the CPU, the memory system, and the underlying interconnect also play a major role in determining the I/O performance of a computer. Two common I/O performance metrics are I/O throughput, also known as I/O bandwidth, and I/O response time, also known as I/O latency.

8 A simplified producer-server model. Throughput: the number of tasks the server completes per unit time. To achieve the highest possible throughput: the server should never stall, so the queue should never be empty. Response time: measured from the moment a task enters the queue to the moment the server completes it. To minimize response time: the queue should be empty and the server idle.
Response time and throughput are related by this producer-server model. Throughput is the number of tasks completed by the server in unit time, while response time begins when a task is placed in the queue and ends when it is completed by the server. In order to get the highest possible throughput, the server should never be idle, so the queue should never be empty. But in order to minimize response time, you want the queue to be empty, so that the server is idle and can serve you as soon as you place the order. So, like many other things in life, throughput versus response time is a tradeoff.

9 Throughput versus response time.
[Figure: response time in ms (100 to 300) plotted against percentage of maximum throughput (20% to 100%); the curve rises steeply as utilization approaches 100%.]
In order to get the last few percentage points of maximum throughput, you pay a steep price in response time. Notice that the horizontal scale is in percentage of maximum throughput: the tradeoff curve is in terms of relative throughput. The absolute maximum throughput can be increased without sacrificing response time.

10 Increasing throughput. In general, throughput can be improved by: adding hardware at the bottleneck; reducing load-related latency. Response time, by comparison, is hard to reduce: it is ultimately limited by the speed of light (though we are still far from that limit!).
[Figure: a producer feeding two queues, each with its own server.]
For example, one way to improve the maximum throughput without sacrificing response time is to add another server. This brings us to an interesting fact, or joke, in I/O system and network design: throughput is easy to improve, because you can always throw more hardware at the problem. Response time, however, is much harder to reduce, because ultimately it is limited by the speed of light, and you cannot bribe God — even though a lot of people do try by going to church regularly.

11 I/O benchmarks for evaluating disk performance: supercomputer applications (large-scale scientific computing); transaction processing (e.g., airline reservation systems and bank ATMs); file systems (e.g., the UNIX file system).
You cannot talk about performance without also talking about benchmarks. I/O benchmarks that deal with magnetic disks can be divided into three categories: benchmarks for the large-scale scientific problems usually found in supercomputer applications; benchmarks for transaction processing, such as an airline reservation system or your bank's ATM; and benchmarks that measure file system performance.

12 Storage devices: magnetic disks. Purpose: long-term, non-volatile storage; the large, cheap, and slow level of the storage hierarchy. Characteristics: seek time (around 8 ms on average); positional latency (rotational latency); transfer rate (roughly a sector per millisecond, 5-15 MB/s, moved in blocks); capacity (gigabytes, quadrupling every three years). [Figure: track, sector, cylinder, platter, head.]
Example: 7200 RPM = 120 RPS => about 8 ms per revolution, so average rotational latency ≈ 4 ms; with 128 sectors per track, about 0.065 ms per sector; at 1 KB per sector, roughly 16 MB/s.
Response time = queuing + controller + seek + rotation + transfer; the last four make up the service time.
The purpose of the magnetic disk is to provide long-term, non-volatile storage. Disks are large in capacity, inexpensive, but slow, so they reside at the lowest level in the memory hierarchy. There are two types of disks: floppy and hard drives. Both rely on a rotating platter coated with a magnetic surface; a movable head is used to access the disk. The advantages of hard disks over floppy disks are: (a) the platters are made of metal or glass, so they are more rigid and can be larger; (b) a hard disk has higher density because it can be controlled more precisely; (c) it has a higher data rate because it can spin faster; (d) finally, each hard disk drive can incorporate more than one platter.
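The example figures above follow directly from the geometry. A small sketch, using the 7200 RPM spindle, 128 sectors per track, and 1 KB sectors quoted on the slide (illustrative values, not a specific drive):

```python
# Deriving the slide's disk figures from first principles.

RPM = 7200
SECTORS_PER_TRACK = 128
BYTES_PER_SECTOR = 1024

rev_time_ms = 60_000 / RPM                       # one revolution: ~8.33 ms
avg_rotational_ms = rev_time_ms / 2              # wait half a turn on average
sector_time_ms = rev_time_ms / SECTORS_PER_TRACK # time for one sector to pass
transfer_mb_s = BYTES_PER_SECTOR / (sector_time_ms / 1000) / 1e6

print(f"avg rotational latency = {avg_rotational_ms:.2f} ms")  # ~4.17 ms
print(f"media transfer rate    = {transfer_mb_s:.1f} MB/s")    # ~15.7 MB/s
```

The computed ~4.17 ms and ~15.7 MB/s round to the slide's "≈ 4 ms" and "roughly 16 MB/s".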

13 Disk organization. Typical figures (depending on disk size): 500 to 2,000 tracks per surface; 32 to 128 sectors per track; the sector is the smallest unit that can be read or written. Traditionally, all tracks hold the same number of sectors; with constant bit density, more sectors are recorded on the outer tracks. [Figure: platters, tracks, sectors.]
Here is a primitive picture showing how a disk drive can have multiple platters. Each surface is divided into tracks, and each track is further divided into sectors. A sector is the smallest unit that can be read or written. By simple geometry you know the outer tracks have more area, so you would think they would have more sectors. This, however, is not the case in traditional disk design, where all tracks have the same number of sectors. You may say this is dumb, but dumb is the reason they do it: by keeping the number of sectors the same, the disk controller hardware and software can stay simple and need not know which track has how many sectors. With more intelligent disk controller hardware and software, it is becoming more popular to record more sectors on the outer tracks. This is referred to as constant bit density.

14 Disk response latency = queuing time + controller time + seek time + rotation time + transfer time. For a 4 KB transfer: seek: 8 ms or less; rotation: a function of the RPM; transfer: a function of the RPM and recording density. [Figure: sectors, inner and outer tracks, head, platter, actuator arm, drive.]

15 Disk characteristics. Cylinder: all the tracks, across all surfaces, that lie under the heads at a given moment. Reading/writing data involves three phases: seek time: position the arm over the correct track; rotation time: wait for the desired sector to rotate under the read/write head; transfer time: transfer a block of bits (a sector) under the head. Industry-reported average seek time: typically 8 ms to 12 ms, computed as (sum of the times for all possible seeks) / (number of possible seeks). Because of locality in disk accesses, the actual average seek time is often only 25% to 33% of the advertised figure. [Figure: cylinder, sector, track, head, platter.]
To read or write information in a sector, a movable arm containing a read/write head is located over each surface. The term cylinder refers to all the tracks under the read/write heads at a given arm position, across all surfaces. To access data, the operating system must direct the disk through a three-stage process. (a) The first step is to position the arm over the proper track; this is the seek operation, and the time it takes is the seek time. (b) Once the head has reached the correct track, we must wait for the desired sector to rotate under the read/write head; this is the rotational latency. (c) Finally, once the desired sector is under the read/write head, the data transfer can begin. The average seek time reported by the manufacturer is calculated as the sum of the times for all possible seeks divided by the number of possible seeks. This number is usually on the pessimistic side: because of locality of disk references, the actual average seek time may be only 25% to 33% of the published number.

16 Rotation time and transfer time. Rotation time: most disks spin at 3,600 to 7,200 RPM, i.e., about 16 ms down to 8 ms per revolution. Transfer time is a function of: the transfer size (typically one sector: 1 KB); the rotation speed (3,600 to 7,200 RPM); the recording density (bits recorded per inch along a track). Typical rates: 2 to 12 MB per second.
As far as rotational latency is concerned, a disk rotating at 3,600 RPM takes approximately 16 ms per revolution. Since on average the information you want is halfway around the disk, the average rotational latency is 8 ms. The transfer time is a function of transfer size, rotation speed, and recording density. Notice that the transfer time is much shorter than the rotational latency and the seek time. This is similar to the DRAM situation, where the access time is much shorter than the cycle time. Does anybody remember what we did to take advantage of the short access time versus cycle time? We interleaved!
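Putting the three phases together gives the average access time for the running 4 KB example. A sketch under assumed illustrative values (8 ms seek, 7200 RPM, 16 MB/s media rate; queuing and controller time taken as zero):

```python
# Average disk access time for a 4 KB read:
# access = seek + average rotational latency + transfer.

seek_ms = 8.0                 # assumed average seek
rpm = 7200
transfer_rate_mb_s = 16.0     # assumed media rate
request_kb = 4

rotational_ms = 0.5 * 60_000 / rpm                     # half a revolution on average
transfer_ms = request_kb / 1024 / transfer_rate_mb_s * 1000

access_ms = seek_ms + rotational_ms + transfer_ms
print(f"access time ≈ {access_ms:.2f} ms")             # seek + rotation dominate
```

Note that the 0.24 ms transfer is tiny next to the ~12 ms of mechanical delay, which is the slide's point about where disk time goes.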

17 Driving forces behind storage technology. Shifts in the dominant computing paradigm: 1950s: from batch processing to online processing; 1990s: from centralized processing to ubiquitous computing — computers everywhere: phones, electronic books, cars, cameras — over global fiber and wireless networks. Consequences for the storage industry: embedded storage: smaller, cheaper, more reliable, lower power; data usage: high-capacity, hierarchically managed storage systems.

18 A brief history. 1956: the IBM RAMAC; early 1970s: the Winchester drive — developed for mainframes, with proprietary interfaces, shrinking from 27 in. to 14 in. 1970s: the 5.25-inch floppy; industry-standard disk interfaces appear: ST506, SASI, SMD, ESDI. Early 1980s: personal computers and the first workstations. Mid 1980s: client/server computing; centralized storage on file servers; disks shrink faster, from 8 inches to 5.25 inches; a huge disk-drive market becomes reality; industry standards: SCSI, IPI, IDE; in the PC market 5.25-inch drives win out, and proprietary interfaces die off.

19 A brief history (continued). Late 1980s / early 1990s: laptops and notebooks (and palmtops); 3.5-inch and 2.5-inch (even 1.8-inch) drives. Size plus capacity, rather than performance, drives the market. Bandwidth is currently improving at about 40% per year. Challenges come from DRAM and from flash RAM on PCMCIA cards: still too expensive — Intel has promised lower costs but has yet to deliver; megabytes per cubic inch are still unsatisfying. Optical disk performance is not yet ideal, but it has a small niche (CD-ROM).

20 Disk history (continued). 1989: 63 Mbit/sq. in., 60,000 MBytes. 1997: 1,450 Mbit/sq. in.
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces".

21 Processor interface. Processor interface: interrupts; memory-mapped I/O. I/O control structures: polling; interrupts; direct memory access (DMA); I/O controllers; I/O processors. Capacity, access time, bandwidth. Interconnect: buses.

22 I/O interface. [Figure: two organizations. (a) Separate I/O bus: the CPU reaches Memory over a memory bus and the peripherals' interfaces over an independent I/O bus, using separate I/O instructions (in, out); I/O and memory transfers travel over different wires. (b) Common memory & I/O bus (e.g., VME bus, Multibus-II): the CPU, Memory, and peripheral interfaces share one bus; at 40 MB/s (optimistic), a 10 MIPS processor can completely saturate it!]

23 Memory-mapped I/O. A single memory & I/O bus; no separate I/O instructions. [Figure: the CPU with its L1 and L2 caches sits on the memory bus; a bus adaptor connects it to the I/O bus holding ROM, RAM, and peripheral interfaces; ROM, RAM, and I/O registers share one address space.]

24 Programmed I/O (polling). [Figure: the CPU loops over the I/O controller (IOC): is the data ready? If not, loop; if yes, read the data from the device, store it to memory, check whether the transfer is done, and repeat.] Busy-waiting does not use the CPU efficiently unless the device is very fast, and it requires continual checking; the I/O work can, however, be interleaved with computation.
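The polling loop in the figure can be sketched in a few lines. This is a toy model, not a real driver: the device's `ready()`, `read_word()`, and `done()` methods are hypothetical names standing in for status-register reads.

```python
# A minimal sketch of programmed I/O: busy-wait until each word is
# ready, read it, store it, and repeat until the device says "done".

def polled_input(device, memory: list) -> None:
    """Busy-wait transfer loop, mirroring the flowchart on the slide."""
    while True:
        while not device.ready():      # busy-wait: these cycles are wasted
            pass                       # (polls could be interleaved with work)
        memory.append(device.read_word())
        if device.done():
            break

class FakeDevice:
    """Toy device that produces three words, one per poll."""
    def __init__(self):
        self.words = [10, 20, 30]
    def ready(self):
        return True
    def read_word(self):
        return self.words.pop(0)
    def done(self):
        return not self.words

buf = []
polled_input(FakeDevice(), buf)
print(buf)    # [10, 20, 30]
```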

25 Interrupt-driven data transfer. [Figure: (1) an I/O interrupt arrives while the user program (add, sub, and, or, nop) runs; (2) the PC is saved; (3) control jumps to the interrupt service routine (read, store, ..., rti); (4) the user program resumes — it pauses only during the actual transfers.] Transferring 1,000 times at one transfer per millisecond: 1,000 interrupts at 2 µs each, plus 1,000 interrupt services at 98 µs each, = 0.1 CPU seconds. At a device transfer rate of 10 MB/s, that is 0.1 µs per byte, so 1,000 bytes take 100 µs; 1,000 transfers × 100 µs = 100 ms = 0.1 CPU seconds. There is still plenty of headroom below the device's transfer rate, and half of the cost is interrupt overhead.
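The arithmetic above is easy to reproduce. A sketch using the figures quoted on the slide (2 µs interrupt entry plus 98 µs of service per transfer, at 1,000 transfers per second):

```python
# CPU time consumed per second by interrupt-driven transfer.

TRANSFERS_PER_SEC = 1000
INTERRUPT_US = 2      # interrupt entry/exit per transfer
SERVICE_US = 98       # interrupt service routine per transfer

cpu_seconds = TRANSFERS_PER_SEC * (INTERRUPT_US + SERVICE_US) / 1e6
print(f"CPU time per second of I/O: {cpu_seconds:.1f} s")  # 0.1 s, i.e. 10% of the CPU
```

One interrupt per transfer is the crux: the per-transfer overhead is paid 1,000 times, which is what DMA on the next slide eliminates.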

26 Direct memory access (DMA). The DMA controller (DMAC) provides handshake signals to the peripheral controller, and memory addresses and handshake signals to the memory. The CPU sends the DMAC the start address, the direction, and the count (n), then issues the "start" command. [Figure: CPU, ROM, RAM, and memory-mapped I/O on the bus; the DMAC sits between the memory and the IOC/device peripherals.] Time to complete 1,000 transfers at one per millisecond: 1 DMA setup: 50 µs; 1 interrupt: 2 µs; 1 interrupt service: 48 µs — about 100 µs of CPU time in total.
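Comparing the two slides' numbers shows why DMA wins. A sketch of the arithmetic, using the quoted figures (50 µs setup plus one 2 µs interrupt and one 48 µs service for the whole 1,000-transfer block):

```python
# Interrupt-driven I/O versus DMA for 1,000 transfers at one per ms.

interrupt_driven_us = 1000 * (2 + 98)    # one interrupt + service per transfer
dma_us = 50 + 2 + 48                     # one setup + one completion interrupt

print(f"interrupt-driven: {interrupt_driven_us} us of CPU time")  # 100000 us
print(f"DMA:              {dma_us} us of CPU time")               # 100 us
print(f"reduction:        {interrupt_driven_us // dma_us}x")      # 1000x
```

The CPU pays once per block instead of once per transfer, so the overhead drops by the block length.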

27 Input/output processors. [Figure: the CPU and an IOP share the main-memory bus; the IOP drives an I/O bus with target devices D1 ... Dn.] (1) The CPU issues an instruction to the IOP: an opcode and a device address. (2) The IOP fetches its command from memory: what to do (OP), where to put the data (Addr), how much (Cnt), and which target device. (3) Data transfers between the device and memory are controlled directly by the IOP, which steals memory cycles; the CPU is involved only for special requests. (4) The IOP interrupts the CPU when the operation completes.

28 Interactions with processor architecture. I/O instructions have essentially disappeared. The caches added to improve processor performance raise new problems: flushing the cache is expensive, and I/O can pollute it; solutions can be borrowed from the snooping schemes of shared-memory multiprocessors. Virtual memory raises new problems for DMA. Some load/store architectures may need atomic operations: load-locked / store-conditional. Context switching during I/O is difficult for the processor.

29 Types and characteristics of I/O devices. Behavior: how does the device work? Input devices are read-only; output devices are write-only and cannot be read; storage devices can be reread and usually rewritten. Partner: on the other side of the device is either a human or a machine, which either feeds data to an input device or reads data from an output device. Data rate: the peak rate at which data can be transferred, between the device and main memory or between the device and the CPU.
Although there are many different I/O devices, they can all be organized by these three characteristics: behavior, partner, and data rate. Behavior asks how an I/O device behaves: is it an input device such as a mouse, an output-only device such as a printer, or a storage device such as a disk? Partner asks with what the I/O device interacts: a human or a machine? Finally, data rate is the peak rate at which data can be transferred between the I/O device and main memory, or between the I/O device and the CPU.

30 Examples of I/O devices.

Device             Behavior          Partner    Data rate (KB/sec)
Keyboard           input             human            0.01
Mouse              input             human            0.02
Line printer       output            human             —
Laser printer      output            human             —
Graphics display   output            human        30,000.00
LAN                input or output   machine           —
Floppy disk        storage           machine           —
Optical disk       storage           machine           —
Magnetic disk      storage           machine       2,000.00

31 Reliability vs. availability. Two frequently confused concepts. Reliability: is anything broken? Availability: can the user still use the system correctly? Availability can be improved by adding hardware: for example, ECC in memory. Reliability can only be improved by: improving environmental conditions; building more reliable components; reducing the number of components in the system. Availability can thus be improved even with low-cost, low-reliability parts.
This brings us to two terms that are often confused: reliability and availability. Here is the proper distinction: reliability asks the question, is anything broken? Availability asks, is the system still available to the user? Adding hardware can therefore improve availability. For example, an airplane with two engines is more "available" than an airplane with one engine. Reliability, on the other hand, can only be improved by bettering environmental conditions, building more reliable components, or reducing the number of components in the system. Notice that by adding hardware to improve availability, you may actually reduce reliability: an airplane with two engines is twice as likely to have an engine failure as an airplane with only one, so its reliability is lower although its availability is higher.

32 Disk arrays. A new organization of disk storage: an array of many small, inexpensive disks. Potential throughput rises because many drives are used: data is spread across multiple disks, and multiple accesses go to different disks. Reliability is lower than for a single disk, but availability can be improved by adding redundant disks: lost information can be rebuilt from the redundant information. MTTR (mean time to repair): on the order of hours. MTTF (mean time to failure): three to five years.
The discussion of reliability and availability brings us to a new organization of disk storage, in which arrays of small, inexpensive disks are used to increase the potential throughput. Data is spread over multiple disks, so multiple accesses can be made to several disks, either via interleaving or in parallel. While disk arrays improve throughput, latency is not necessarily improved. Also, with N disks in the array, reliability is only 1/N of the reliability of a single disk. But availability can be improved by adding redundant disks, so lost information can be reconstructed from redundant information. Since mean time to repair is measured in hours while MTTF is measured in years, redundancy can make the availability of disk arrays much higher than that of a single disk.
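The hours-versus-years contrast between MTTR and MTTF is what makes redundancy pay off, via the standard relation availability = MTTF / (MTTF + MTTR). A sketch with illustrative values in the ranges the slide quotes (3-year MTTF, 24-hour MTTR — assumptions, not measured figures):

```python
# Steady-state availability from MTTF and MTTR.

HOURS_PER_YEAR = 8760

mttf_hours = 3 * HOURS_PER_YEAR   # assumed mean time to failure: 3 years
mttr_hours = 24                   # assumed mean time to repair: 1 day

availability = mttf_hours / (mttf_hours + mttr_hours)
print(f"availability = {availability:.5f}")   # ~0.99909: repairs barely dent uptime
```

Because MTTF dwarfs MTTR, even a modest repair process keeps availability very close to 1, which is why redundant arrays can be highly available despite lower per-array reliability.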

33 Networked storage. Network bandwidth keeps increasing: 3 Mb/s → 10 Mb/s → 50 Mb/s → 100 Mb/s → 1 Gb/s → 10 Gb/s, with sustained high-bandwidth transfer. Disk sizes keep shrinking: 14" → 10" → 8" → 5.25" → 3.5" → 2.5" → 1.8" → 1.3" ... High-bandwidth disk systems built from disk arrays provide high-performance storage service over a fast network. The network offers a better physical and logical interface: the CPUs and the storage system become independent! Network file service: operating-system structures that support remote file access.

34 The manufacturing advantage of disk arrays. Conventional disk product line: four disk designs (3.5", 5.25", 10", 14"), spanning the low end to the high end. Disk array: one disk design (3.5").

35 Reliability of an array. Reliability of N disks = reliability of one disk ÷ N: 50,000 hours ÷ 70 disks ≈ 700 hours, so the MTTF of the disk system drops from about 6 years to 1 month! An array without redundancy is too unreliable to be usable. Hot spares, with reconstruction proceeding in parallel with normal accesses, can achieve very high media availability.
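The slide's arithmetic in code form: with independent failures, the MTTF of an N-disk array is roughly the single-disk MTTF divided by N.

```python
# Array MTTF = single-disk MTTF / number of disks (independent failures).

single_disk_mttf_h = 50_000
n_disks = 70

array_mttf_h = single_disk_mttf_h / n_disks
print(f"array MTTF  ≈ {array_mttf_h:.0f} hours")               # ~714 h, about a month
print(f"single disk ≈ {single_disk_mttf_h / 8760:.1f} years")  # ~5.7 years
```

714 hours is about 30 days, matching the slide's "from about 6 years to 1 month".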

36 Secondary storage from DRAM. DRAM can serve as secondary storage in two ways: solid-state disks and expanded storage. Solid-state disk: operates like a disk, but faster; costs more; uses a battery so that the stored information is not lost. Expanded storage: a large memory that allows blocks of data to be moved in and out of main memory.
One drawback of magnetic disk as secondary storage is that it is slow. Solid-state disk behaves just like a disk, except that it is much faster because it has no delays caused by mechanical parts, such as seek time and rotational delay. Its drawback is that it is very expensive. Also, unlike disk, which is non-volatile, DRAM loses its contents when power is turned off, so a battery is used to make the system non-volatile. Expanded storage is a large memory, much larger than main memory, that allows only block transfers to or from main memory.

37 Optical disks. Drawback: a read-only medium. Advantages: removable; inexpensive to manufacture; in archival mass storage, the potential to compete with new tape technologies.
The other challenger to magnetic disk as a secondary storage device is the optical disk, or CD. The drawback of using CDs as secondary storage is that they are read-only. The advantages are that they are removable, inexpensive to manufacture, and some are write-once, which means you can make one reliable write to them. The write-once feature gives the CD the potential to compete with new tape technologies for archival storage.

38 I/O system summary. Disk I/O benchmarks: supercomputer applications mainly care about data rate; transaction processing mainly cares about I/O rate; file systems mainly care about file access. Disk access time has three components: seek time (the advertised figure in ms; real workloads often do better); rotation time (7200 RPM: 4.2 ms; 5400 RPM: 5.6 ms); transfer time (a few MB per second).
First we showed the diversity of I/O requirements through three categories of disk I/O benchmarks: a supercomputer application's main concern is data rate, transaction processing's is I/O rate, and a file system's is file access. Then we talked about magnetic disks. One thing to remember is that disk access time has three components. The first two, seek time and rotational latency, involve mechanical moving parts and are therefore very slow compared with the third, the transfer time. One good thing about the seek time is that this is probably one of the few times in life when you can actually do better than the "advertised" result, thanks to the locality of disk accesses. As far as graphics displays are concerned, resolution is the basic measure of how much information is on the screen, usually described as one number by another: the horizontal resolution in pixels by the vertical resolution in scan lines. We also showed how the size and bandwidth requirements of a color frame buffer can be reduced by placing a color map between the frame buffer and the display, and introduced a special memory, the VRAM, used to build frame buffers: a DRAM core with a high-speed shift register attached. That's all for today; we will continue our discussion of I/O on Friday.

39 Buses: connecting I/O to the processor and memory. [Figure: Processor (Control, Datapath), Memory, Input, and Output joined by a bus.] A bus is a shared communication link: a single set of wires connecting multiple subsystems.
In a computer system, the various subsystems must be able to talk to one another: the memory and processor need to communicate, and the processor needs to communicate with the I/O devices. The most common way to do this is with a bus, a shared communication link that uses one set of wires to connect multiple subsystems.

40 Advantages of buses. Versatility: new devices can be added easily; peripherals can be moved between computer systems that use the same bus standard. Low cost: a single set of wires is shared in multiple ways. [Figure: Processor, Memory, and several I/O devices on one bus.]
The two major advantages of the bus organization are versatility and low cost. By versatility we mean that new devices can easily be added; furthermore, a device designed to an industry bus standard can be moved between computer systems that use that standard. The bus organization is a low-cost solution because a single set of wires is shared in multiple ways.

41 Disadvantages of buses. The bus can become a communication bottleneck: its bandwidth limits the maximum I/O throughput. The maximum bus speed is largely limited by: the length of the bus; the number of devices on it; and the need to support a range of devices with widely varying latencies and data rates. [Figure: Processor, Memory, and several I/O devices contending for one bus.]
The major disadvantage of the bus organization is that it creates a communication bottleneck: when all I/O must pass through a single bus, the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is also largely limited by (a) the length of the bus, (b) the number of I/O devices on it, and (c) the need to support a wide range of devices with widely varying latencies and data transfer rates.

42 Typical bus organization. Control lines: signal requests and acknowledgments; indicate the type of information on the data lines. Data lines carry information between the source and the destination: data and addresses; complex commands. A bus transaction has two parts: sending the address, then receiving or sending the data.
A bus generally contains a set of control lines and a set of data lines. The control lines are used to signal requests and acknowledgments and to indicate what type of information is on the data lines. The data lines carry information between the source and the destination; this information may consist of data, addresses, or complex commands. A bus transaction includes two parts: (a) sending the address and (b) receiving or sending the data.

43 Masters and slaves. A bus transaction has two parts: sending the address; receiving or sending the data. The master is the device that initiates the transaction by sending the address. The slave responds to that address by: sending data to the master, if the master requested data; or accepting data from the master, if the master wants to send data. [Figure: the bus master sends the address to the bus slave; data can go either way.]
The bus master is the one who starts the bus transaction by sending out the address. The slave responds either by sending data to the master, if the master asked for data, or by receiving data from the master, if the master wants to send data. In most simple I/O operations the processor is the bus master, but as we will see later in today's lecture, this is not always the case.

44 Output operation. Here output means the processor sending data to an I/O device. Step 1: request memory — control lines: memory read request; data lines: the memory address. Step 2: memory read — the memory accesses the data. Step 3: send the data to the I/O device — control lines: device write request; data lines: the I/O device address, followed by the data. [Figure: processor, memory, and the I/O device (disk) on the bus at each step.]
We define output as the processor sending data to the I/O device; as shown here, it is a three-step process. The first step wakes up the memory by sending a read request on the control lines and the address of the location to be read on the data lines. In the second step the actual read occurs: the memory system accesses the data. Finally, the memory system transfers the data to the I/O device over the bus's data lines, while the control lines (driven by the processor or by the memory system, depending on the memory system's intelligence) tell the I/O device that data is coming.

45 Input operation. Here input means the processor receiving data from an I/O device. Step 1: request memory — control lines: memory write request; data lines: the memory address. Step 2: receive the data — control lines: I/O read request; data lines: the I/O device address, followed by the data. [Figure: processor, memory, and the I/O device (disk) on the bus at each step.]
In our discussion, input is defined as the processor receiving data from the I/O device, and it is only a two-step process. The first step wakes up the memory system by telling it that a write is imminent (control lines) and providing the address of the location to be written. Then the CPU tells the I/O device to start sending data by issuing a read request to it (control lines, with the I/O address on the data lines). The data is then sent to the memory system via the data lines.

46 Bus types. Processor-memory buses (design-specific): short and fast; matched only to the memory system; maximize memory-to-processor bandwidth; connected directly to the processor. I/O buses (industry standard): usually longer and slower; must match a wide range of I/O devices; connect to the processor-memory bus or to a backplane bus. Single bus, also called a backplane bus (industry standard): the backplane is the interconnection structure within the chassis; it lets the processor, memory, and I/O devices coexist; cost advantage: all components share one bus.
Buses are traditionally classified as one of three types: processor-memory buses, I/O buses, or backplane buses. The processor-memory bus is usually design-specific, while I/O and backplane buses are often standard buses. In general, processor-memory buses are short and fast; they are matched to the memory system to maximize memory-to-processor bandwidth and connect directly to the processor. An I/O bus is usually lengthy and slow, because it has to match a wide range of I/O devices; it usually connects to the processor-memory bus or to a backplane bus. The backplane bus gets its name because it was often built into the backplane of the computer: an interconnection structure within the chassis. It is designed to let processors, memory, and I/O devices coexist on a single bus, so it has the cost advantage of one bus for all components.

47 A single-bus computer system: the backplane bus. The single (backplane) bus handles: processor-memory communication; communication between I/O devices and memory. Advantages: simple and low cost. Disadvantages: slow, and the bus can become the system's main bottleneck. Example: the IBM PC. [Figure: Processor, Memory, and I/O Devices sharing one backplane bus.]
Here a single bus, the backplane bus, provides communication between the processor and memory, as well as between I/O devices and memory. The advantage is low cost. One disadvantage is that a bus with so many things attached will be lengthy and slow. Furthermore, the bus can become a major communication bottleneck if everybody wants to use it at the same time. The IBM PC is an example that uses a single backplane bus for all communication.

48 A two-bus system. I/O buses tap into the processor-memory bus via bus adaptors: the processor-memory bus mainly carries processor-memory traffic; the I/O buses provide expansion slots for I/O devices. In the Apple Macintosh-II, the NuBus connects the processor, memory, and a few selected I/O devices; the other I/O devices sit on the SCSI bus. [Figure: the processor-memory bus with bus adaptors fanning out to several I/O buses.]
Here is an example using two bus levels, where multiple I/O buses tap into the processor-memory bus via bus adaptors. The processor-memory bus is used mainly for processor-memory traffic, while the I/O buses provide expansion slots for the I/O devices. The Apple Macintosh-II adopts this organization: the NuBus connects the processor, memory, and a few selected I/O devices, while the rest of the I/O devices reside on an industry-standard bus, the SCSI bus, which is connected to the NuBus via a bus adaptor.

49 A three-bus system. A small number of backplane buses tap into the processor-memory bus: the processor-memory bus carries processor-memory traffic; the I/O buses connect to the backplane bus. Advantage: the load on the processor-memory bus drops sharply. [Figure: the processor-memory bus, a backplane bus behind one adaptor, and I/O buses behind further adaptors.]
In a three-bus system, a small number of backplane buses (here, just one) tap into the processor-memory bus. The processor-memory bus is used mainly for processor-memory traffic, while the I/O buses connect to the backplane bus via bus adaptors. An advantage of this organization is that the load on the processor-memory bus is greatly reduced, because there are only a few taps into the high-speed processor-memory bus.

50 Synchronous and asynchronous buses. Synchronous bus: the control lines include a clock line; communication follows a fixed protocol tied to the clock. Advantages: very little logic is needed, so it can run at very high speed. Disadvantages: every device on the bus must run at the same clock rate; to avoid clock skew, a fast synchronous bus cannot be long. Asynchronous bus: not clocked; can accommodate a wide range of devices; can be lengthened without worrying about clock skew; requires a handshaking protocol.
There are substantial differences between the design requirements for I/O buses and for processor-memory and backplane buses. Consequently, there are two schemes for communication on the bus: synchronous and asynchronous. A synchronous bus includes a clock in the control lines and a fixed communication protocol relative to that clock. Since the protocol is fixed and everything happens with respect to the clock, it involves very little logic and can run very fast; most processor-memory buses fall into this category. Synchronous buses have two major disadvantages: (1) every device on the bus must run at the same clock rate, and (2) if they are fast, they must be short to avoid clock skew. An asynchronous bus, by definition, is not clocked, so it can accommodate a wide range of devices at different rates and can be lengthened without worrying about clock skew. The drawback is that it can be slower and more complex, because a handshaking protocol is needed to coordinate the transmission of data between sender and receiver.

51 A handshaking protocol: the processor reads data from memory. Three control lines: ReadReq: indicates a read request to memory; the address is placed on the data lines at the same time. DataRdy: indicates that the data now on the data lines is valid; the data is placed on the data lines at the same time. Ack: acknowledges the other party's ReadReq or DataRdy. [Figure: timing diagram with events 1-7 on ReadReq, Data (address, then data), Ack, and DataRdy.]
Here is how it works when the processor wants to read from memory. (1) The processor initiates the transaction by asserting ReadReq and putting the address on the data lines at the same time. (2) The memory, upon seeing ReadReq, latches the address and asserts Ack to tell the processor "I got it," (3) so the processor can deassert ReadReq and release the data lines. (4) Upon seeing ReadReq go low, the memory knows the processor is off the bus, so it deasserts Ack. (5) When the requested data is ready, the memory puts it on the data lines and asserts DataRdy. (6) Upon seeing DataRdy, the processor latches the data and asserts Ack to tell the memory it may get off the bus. (7) Once the processor sees DataRdy go low, it deasserts Ack so that another bus transaction can start.
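The seven steps above can be written down as a scripted sequence, which makes the alternation between the two sides explicit. This is a toy rendition, not hardware: each entry just records which party acts at each numbered step of the handshake.

```python
# The 7-step ReadReq/Ack/DataRdy handshake as an event script.
# Step numbers match the description above.

events = []

def step(n: int, actor: str, action: str) -> None:
    events.append((n, actor, action))

step(1, "processor", "assert ReadReq, drive address on Data")
step(2, "memory",    "latch address, assert Ack")
step(3, "processor", "deassert ReadReq, release Data")
step(4, "memory",    "deassert Ack")
step(5, "memory",    "drive data on Data, assert DataRdy")
step(6, "processor", "latch data, assert Ack")
step(7, "processor", "deassert Ack once DataRdy falls")

for n, actor, action in events:
    print(f"{n}. {actor}: {action}")
```

Note the strict alternation of control: each side waits for the other's signal edge before its next move, which is what lets the protocol work without a shared clock.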

52 Increasing bus bandwidth. Separate vs. multiplexed address and data lines: with separate address and data lines, the address and data can be sent in the same bus cycle; cost: (a) more bus wires, (b) added complexity. Data bus width: widening the bus lets a multi-word transfer complete in fewer cycles; e.g., the SPARCstation 20 memory bus is 128 bits wide; cost: more bus wires. Block transfers: let the bus move several words in back-to-back bus cycles; the address is sent only once, at the start; the bus is not released until the last word has been transferred; cost: (a) added complexity, (b) longer response times for other requests.
Our handshaking example in the previous slide used the same wires to transmit the address and the data. The advantage is a saving in signal wires; the disadvantage is that it takes multiple cycles to transmit address and data. With separate lines for addresses and data, we can increase bus bandwidth by transmitting both in the same cycle, at the cost of more bus lines and increased complexity. Another way is to widen the data bus so multiple words can be transferred in a single cycle; for example, the SPARCstation memory bus is 128 bits, or 16 bytes, wide. The cost is again more bus lines. Finally, we can increase bandwidth by allowing the bus to transfer multiple words in back-to-back cycles without resending the address or releasing the bus. The cost of this last approach is increased complexity in the bus controller and an increase in the response time seen by other parties waiting to get onto the bus.
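The block-transfer payoff can be seen with back-of-the-envelope arithmetic: on a multiplexed bus, each transfer spends one cycle on the address, so longer blocks amortize that cycle. The 32-bit width and 20 ns cycle below are assumed illustrative values, not figures from the slide.

```python
# Effective bandwidth on a multiplexed bus:
# 1 address cycle + N data cycles per transfer of N words.

def effective_mb_s(words_per_block: int, width_bytes: int = 4,
                   cycle_ns: float = 20.0) -> float:
    cycles = 1 + words_per_block          # address cycle + data cycles
    bytes_moved = words_per_block * width_bytes
    return bytes_moved / (cycles * cycle_ns * 1e-9) / 1e6

print(f"1-word transfers: {effective_mb_s(1):.0f} MB/s")   # address overhead is 50%
print(f"16-word blocks:   {effective_mb_s(16):.0f} MB/s")  # overhead amortized
```

The raw data-cycle limit here would be 200 MB/s; single-word transfers reach only half of that, while 16-word blocks recover most of it, at the cost the slide notes: other requesters wait longer for the bus.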

53 Obtaining Access to the Bus
The single most important issue in bus design: how does a device that needs the bus obtain, and keep, the right to use it?
[Diagram: Bus Master and Bus Slave. Control: the master initiates requests; data can go either way.]
Chaos is avoided by a master-slave arrangement:
Only the bus master can control access to the bus: the master initiates and controls all bus requests.
The slave responds to read and write requests.
The simplest system: the processor is the one and only bus master, and all bus requests must be controlled by the processor.
Main drawback: the processor must be involved in every bus transaction.
Talking about trying to get onto the bus: how does a device get onto the bus anyway? If everybody tries to use the bus at the same time, chaos will result. Chaos is avoided by a master-slave arrangement where only the bus master is allowed to initiate and control bus requests. The slave has no control over the bus. It just responds to the master's requests. Pretty sad. In the simplest system, the processor is the one and ONLY bus master and all bus requests must be controlled by the processor. The major drawback of this simple approach is that the processor needs to be involved in every bus transaction and can use up too many processor cycles. +2 = 35 min. (Y:15)

54 Multiple Potential Bus Masters: Arbitration Is Needed
Bus arbitration scheme:
A bus master wanting to use the bus asserts a bus request.
It cannot use the bus until the request is granted.
It must signal the arbiter after it has finished using the bus.
Bus arbitration schemes usually try to balance two factors:
Bus priority: the highest-priority device should be serviced first.
Fairness: even the lowest-priority device should never be completely locked out of the bus.
Bus arbitration schemes fall into four broad classes:
Distributed arbitration by self-selection: each device wanting the bus places a code identifying itself on the bus.
Distributed arbitration by collision detection: Ethernet uses this scheme.
Daisy-chain arbitration.
Centralized, parallel arbitration.
A more aggressive approach is to allow multiple potential bus masters in the system. With multiple potential bus masters, a mechanism is needed to decide which master gets to use the bus next. This decision process is called bus arbitration and this is how it works. A potential bus master (which can be a device or the processor) wanting to use the bus first asserts the bus request line and it cannot start using the bus until the request is granted. Once it finishes using the bus, it must tell the arbiter that it is done so the arbiter can allow other potential bus masters to get onto the bus. All bus arbitration schemes try to balance two factors: bus priority and fairness. Priority is self-explanatory. Fairness means even the device with the lowest priority should never be completely locked out from the bus. Bus arbitration schemes can be divided into four broad classes. In the first one: (a) Each device wanting the bus places a code indicating its identity on the bus. (b) By examining the bus, the device can determine the highest priority device that has made a request and decide whether it can get on. In the second scheme, each device independently requests the bus, and collisions will result in garbage on the bus if multiple requests occur simultaneously. Each device detects whether its request resulted in a collision and, if it did, backs off for a random period of time before trying again. The Ethernet you use for your workstation uses this scheme. We will talk about the 3rd and 4th schemes in the next two slides. +3 = 38 min. (Y:18)
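The self-selection class above can be sketched concretely. The model below is an assumption-level illustration (not any specific bus standard): each requester drives its ID onto open-collector lines whose observed value is the wired-OR of all drivers, and a device that sees a 1 on a bit where it drove a 0 drops out, leaving the highest ID as the winner.

```python
# Sketch of distributed arbitration by self-selection: requesters
# examine the wired-OR of their ID codes, most significant bit first,
# and drop out when a higher-priority contender is present.

def self_select(request_ids, width=4):
    """Return the winning device ID among concurrent requesters."""
    contenders = set(request_ids)
    for bit in reversed(range(width)):                 # MSB first
        line = any(d >> bit & 1 for d in contenders)   # wired-OR of this bit
        if line:
            # Devices that drove 0 on a line reading 1 withdraw.
            contenders = {d for d in contenders if d >> bit & 1}
    return contenders.pop() if contenders else None
```

No central arbiter is involved: every device runs this same logic and reaches the same conclusion by watching the bus, which is exactly what makes the scheme "distributed".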

55 Daisy-Chain Bus Arbitration
[Diagram: the Bus Arbiter's Grant line chains from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the Release and Request lines are shared by all devices.]
Advantage: simple.
Disadvantages:
Cannot guarantee fairness: a low-priority device may never be serviced.
The daisy-chained Grant signal limits the bus speed.
The daisy chain arbitration scheme got its name from the structure of the grant line, which chains through each device from the highest priority to the lowest priority. The higher priority device will pass the grant line to the lower priority device ONLY if it does not want it, so priority is built into the scheme. The advantage of this scheme is its simplicity. The disadvantages are: (a) It cannot assure fairness. A low priority device may be locked out indefinitely. (b) Also, the daisy chain grant line will limit the bus speed. +1 = 39 min. (Y:19)
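The grant chain can be modeled in a few lines. In this sketch, list position is the chain order (index 0 is the device closest to the arbiter, i.e. the highest priority), which shows why priority is built in and why a busy high-priority device can starve everything behind it:

```python
# Daisy-chain grant propagation: the grant enters at the highest-
# priority device and is forwarded only by devices that are not
# requesting the bus.

def daisy_chain_grant(requests):
    """requests: list of bools in chain order (0 = highest priority).
    Returns the index of the device that keeps the grant, or None."""
    for i, wants_bus in enumerate(requests):
        if wants_bus:
            return i      # this device keeps the grant and uses the bus
        # otherwise it passes the grant on down the chain
    return None           # nobody requested; the grant goes unused
```

The sequential walk down the chain is also why the slide says the grant signal limits bus speed: the grant must ripple through every idle device before reaching a distant requester.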

56 Centralized Arbitration with a Single Bus Arbiter
[Diagram: devices A, B, and C each run a Request line (ReqA, ReqB, ReqC) into the arbiter and receive a Grant line (GrantA, GrantB, GrantC) back; ReqA has the highest priority and ReqC the lowest.]
[Timing diagram: ReqA and ReqB are asserted together; GrantA is asserted first; when ReqA is deasserted, the arbiter drops GrantA and asserts GrantB.]
In the centralized, parallel arbitration scheme, the devices independently request the bus by using multiple request lines. A centralized arbiter chooses from among the devices requesting bus access and notifies the selected device that it is now the bus master via one of the grant lines. Here is an example with A having the highest priority and C the lowest. Since A has the highest priority, Grant A will be asserted even though both requests A and B are asserted. Device A will keep Request A asserted until it no longer needs the bus, so when Request A goes low, the arbiter will disassert Grant A. Since Request B remains asserted (Device B has not gotten the bus yet) at this time, the arbiter then asserts Grant B to grant Device B access to the bus. Similarly, Device B will not disassert Request B until it is done with the bus. +2 = 41 min. (Y:21)
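The arbiter's per-clock decision is a simple priority encoder. This sketch mirrors the slide's example (ReqA highest, ReqC lowest); each call represents one clock edge, and at most one grant is asserted:

```python
# Fixed-priority centralized arbiter: grant the highest-priority
# pending request, exactly as in the slide's ReqA/ReqB example.

PRIORITY_ORDER = ("A", "B", "C")   # highest priority first

def arbiter(requests):
    """requests: dict like {'A': True, 'B': True}.
    Returns the name of the granted requester, or None if idle."""
    for name in PRIORITY_ORDER:
        if requests.get(name):
            return name
    return None
```

Replaying the slide's waveform: while both A and B request, A is granted; once A drops its request, the same logic grants B on the next clock. A pure fixed-priority arbiter like this has the same fairness problem as the daisy chain, which is why real arbiters often add round-robin rotation on top.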

57 The Operating System's Responsibilities
The OS can be viewed as the interface between two parties: the I/O hardware and the programs that request I/O.
Three characteristics of the I/O system shape these responsibilities:
The I/O system is shared by multiple programs using the processor.
Interrupts must be handled by the OS; an interrupt puts the system into supervisor mode.
Low-level control of an I/O device is very complex: it must manage a set of concurrent events, and the requirements for controlling a device correctly are very detailed.
The OS acts as the interface between the I/O hardware and the program that requests I/O. The responsibilities of the operating system arise from 3 characteristics of the I/O systems: (a) First, the I/O system is shared by multiple programs using the processor. (b) The I/O system, as I will show you, often uses interrupts to communicate information about I/O operations, and interrupts must be handled by the OS. (c) Finally, the low-level control of an I/O device is very complex, so we should leave it to those crazy kernel programmers to handle. +1 = 52 min. (Y:32)

58 Requirements on the Operating System
Provide protection for shared I/O resources: guarantee that a user program can access only the portions of an I/O device it has the right to access.
Provide an abstraction for accessing devices: supply routines that handle low-level device operations.
Handle the interrupts generated by I/O devices.
Provide a policy for fair access to shared I/O resources:
All user programs must have equal access to the I/O resources.
Accesses are scheduled so as to increase system throughput.
Here is a list of the functions the OS must provide. First it must guarantee that a user's program can only access the portion of an I/O device that it has the right to. Then the OS must hide low-level complexity from the user by supplying routines that handle low-level device operations. The OS also needs to handle the interrupts generated by I/O devices. And the OS must be fair: all user programs must have equal access to the I/O resources. Finally, the OS needs to schedule accesses in a way that system throughput is enhanced. +1 = 53 min. (Y:33)

59 OS and I/O System Communication Requirements
The OS must be able to prevent user programs from communicating with I/O devices directly: if user programs could operate on I/O devices directly, protection of shared I/O devices could not be guaranteed.
Three types of communication are required:
The OS must be able to give commands to the I/O devices.
An I/O device must be able to notify the OS when it completes an operation or encounters an error.
Data must be transferred between memory and the I/O devices.
The OS must be able to communicate with the I/O system but at the same time, the OS must be able to prevent the user from communicating with the I/O device directly. Why? Because if user programs could perform I/O directly, we would not be able to provide protection to the shared I/O device. Three types of communications are required: (1) First, the OS must be able to give commands to the I/O devices. (2) Secondly, the device must be able to notify the OS when the I/O device has completed an operation or has encountered an error. (3) Data must be transferred between memory and an I/O device. +2 = 55 min. (Y:35)

60 Sending Commands to I/O Devices
I/O devices can be addressed in two ways: special I/O instructions, or memory-mapped I/O.
A special I/O instruction specifies a device number and a command word:
Device number: the processor communicates it to the device via a set of wires that are part of the I/O bus.
Command word: usually sent on the bus's data lines.
Memory-mapped I/O:
Portions of the address space are assigned to I/O devices.
Reads and writes of those addresses are interpreted as commands to the I/O devices.
User programs are prevented from issuing I/O operations directly: the I/O address space is protected by the address-translation mechanism.
How does the OS give commands to the I/O device? There are two methods: special I/O instructions and memory-mapped I/O. If special I/O instructions are used, the OS will use the I/O instruction to specify both the device number and the command word. The processor then executes the special I/O instruction by passing the device number to the I/O device (in most cases) via a set of control lines on the bus and at the same time sends the command to the I/O device using the bus's data lines. Special I/O instructions are not used that widely. Most processors use memory-mapped I/O, where portions of the address space are assigned to the I/O device. Reads and writes to this special address space are interpreted by the memory controller as I/O commands, and the memory controller will do the right thing to communicate with the I/O device. Why is memory-mapped I/O so popular? Well, it is popular because we can use the same protection mechanism we already implemented for virtual memory to prevent the user from issuing commands to the I/O device directly. +2 = 57 min. (Y:37)
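The memory-mapped route can be sketched as a toy model. The base address, register offsets, and register meanings below are invented for illustration; the point is only the routing decision the memory controller makes on every access:

```python
# Toy model of memory-mapped I/O: the "memory controller" routes
# loads/stores in a reserved address range to a device instead of RAM.

IO_BASE = 0xFFFF0000            # hypothetical start of the I/O region

class SimpleDevice:
    """A device with two registers (offsets are made up for the sketch)."""
    def __init__(self):
        self.regs = {0x0: 0, 0x4: 0}   # 0x0: command, 0x4: status

    def store(self, offset, value):
        self.regs[offset] = value       # a write here acts as a command

    def load(self, offset):
        return self.regs[offset]

def store_word(ram, dev, addr, value):
    """What the memory controller does for every store."""
    if addr >= IO_BASE:
        dev.store(addr - IO_BASE, value)   # interpreted as an I/O command
    else:
        ram[addr] = value                   # ordinary memory write
```

The protection argument from the slide falls out naturally: if the page tables simply never map `IO_BASE` and above into a user program's address space, the user cannot reach `dev.store` at all, and no new protection hardware is needed.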

61 I/O Devices Notifying the OS
The OS needs to know when:
An I/O device completes an operation.
An I/O operation encounters an error.
This can be accomplished in two different ways:
Polling: the I/O device puts information in a status register, and the OS periodically checks that status register.
I/O interrupt: whenever an I/O device needs the processor's attention, it interrupts the processor from the work it is currently doing.
After the OS has issued a command to the I/O device, either via a special I/O instruction or by writing to a location in the I/O address space, the OS needs to be notified when: (a) The I/O device has completed the operation. (b) Or when the I/O device has encountered an error. This can be accomplished in two different ways: polling and I/O interrupt. +1 = 58 min. (Y:38)
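Polling is the simpler of the two notification schemes and fits in a few lines. This sketch assumes bit 0 of the status register is the "done" bit (an invented convention, not from the slide):

```python
# Minimal polling sketch: the OS spins on a device status register
# until the done bit is set.

DONE_BIT = 0x1        # assumed meaning of bit 0 of the status register

def poll_until_done(read_status, max_spins=1000):
    """read_status: callable returning the status register's value.
    Returns the number of polls taken; raises on timeout."""
    for spins in range(1, max_spins + 1):
        if read_status() & DONE_BIT:
            return spins          # device finished; the OS can proceed
    raise TimeoutError("device never signalled completion")
```

Every iteration of that loop is a processor cycle spent doing no useful work, which is exactly the cost the summary slide attributes to polling and the motivation for interrupts.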

62 I/O Interrupts
An I/O interrupt is like an ordinary exception, except that:
An I/O interrupt is asynchronous.
Additional information must be conveyed.
I/O interrupts are asynchronous with respect to instruction execution:
We can handle the interrupt at whatever point we find convenient.
I/O interrupts are more complex than ordinary exceptions:
The identity of the interrupting device must be conveyed.
Interrupt requests have different urgencies, so they must be prioritized.
How does an I/O interrupt differ from the exceptions you already learned about? Well, an I/O interrupt is asynchronous with respect to the instruction execution, while exceptions such as overflow or page fault are always associated with a certain instruction. Also, for an exception, the only information that needs to be conveyed is the fact that an exceptional condition has occurred, but for an interrupt, there is more information to be conveyed. Let me elaborate on each of these two points. Unlike an exception, which is always associated with an instruction, an interrupt is not associated with any instruction. The user program is just doing its thing when an I/O interrupt occurs. So an I/O interrupt does not prevent any instruction from completing, and you can pick your own convenient point to take the interrupt. As far as conveying more information is concerned, the interrupt detection hardware must somehow let the OS know who is causing the interrupt. Furthermore, interrupt requests need to be prioritized. The hardware that can do all this looks like this. +2 = 64 min. (Y:44)

63 Interrupt Logic
Detect and synchronize interrupt requests.
Ignore interrupts that are masked off.
Prioritize pending interrupt requests.
Generate the interrupt microsequence address.
Provide signals to the interrupt microsequence.
[Diagram: asynchronous interrupt requests first pass through synchronizer circuits (two D flip-flops in series on each input, sharing a clock), are gated by the Interrupt Mask Register, then go through the Interrupt Priority Network to the microsequence address and select logic.]
First the interrupt requests have to pass through a synchronizer circuit, basically two flip-flops in series, to avoid getting into metastability. Then, based on the contents of the interrupt mask register, all the interrupts that are currently disabled are ignored. The rest of the interrupt requests are then prioritized, and we generate the interrupt target address based on the highest-priority interrupt we received. +2 = 66 min. (Y:46)
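The mask-and-prioritize stage of that datapath reduces to a couple of bit operations. In this sketch the pending requests and the mask register are plain integers treated as bit vectors, and lower bit index means higher priority (an assumption of the model, not a statement about any particular machine):

```python
# Sketch of the interrupt mask + priority network: AND the pending
# requests with the mask register, then pick the highest-priority
# survivor (lowest set bit in this model).

def resolve_interrupt(pending, mask):
    """pending, mask: integers treated as bit vectors.
    Returns the bit index of the interrupt to service, or None."""
    enabled = pending & mask          # masked-off requests are ignored
    if enabled == 0:
        return None
    # x & -x isolates the lowest set bit; bit_length gives its index.
    return (enabled & -enabled).bit_length() - 1
```

The returned index is what would feed the microsequence address logic: with vectored interrupts it selects which service-routine address to branch to.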

64 Program Interrupt/Exception Hardware
Hardware interrupt servicing:
Save the PC (or, in a pipelined machine, multiple PC values).
Inhibit the interrupt that is being serviced.
Branch to the interrupt service routine.
Options:
Save status, save registers, save interrupt information.
Change state, change operating mode, fetch interrupt information.
A convenient property of I/O interrupts: they are asynchronous, not associated with any particular instruction, so the pipeline can handle them at its most convenient point!
Besides detecting the interrupt and generating the interrupt target address, what else does the hardware need to do? Well, at the minimum, the processor has to save the program counter, or in a pipelined machine with delayed branches, you may have to save multiple program counters. To prevent recursion within an interrupt service routine, the processor must also disable further interrupts from the device it is servicing. This can be done easily by turning the interrupt bit in the mask register off. Then, the processor also has to branch to the interrupt target address. These (first three) are the things the hardware must do. Here is a list of optional things. Well, as I said before, one good thing about the interrupt is that it is asynchronous, so you can pick the most convenient place in the pipeline to take it. +2 = 68 min. (Y:48)

65 The Programmer's View
[Diagram: while the main program executes Add, Div, Sub, an interrupt request arrives (e.g., from the keyboard); (1) the current instruction completes, (2) the PC is saved and control "branches" to the interrupt target address, the processor status/state is saved, the (keyboard) interrupt is serviced, and the status/state is restored, then (3) the saved PC is reloaded.]
Interrupt target address options:
General: branch to a common address for all interrupts; software then decodes the cause and decides what to do next.
Specific: automatically branch to a different address based on the interrupt type and/or level (vectored interrupts).
Let me show you what I mean. For example here, the CPU receives a keyboard interrupt request in the middle of the Divide instruction. If the divide instruction causes a divide-by-0 exception, we have to handle it immediately. But here we have an interrupt, so we can say, well, I have been spending all this time doing this divide and am almost finished, let's wait. So the interrupt is not taken until after the divide instruction has completed. As far as generating the interrupt target address is concerned, we also have two options. First is the low-cost version: we make all interrupts go to a common address and let software figure out what to do next. Then, for those of you who like to build complicated hardware, you can make the hardware automatically branch to different addresses based on the interrupt type and/or level. This is called vectored interrupt. +2 = 70 min. (Y:50)

66 Relieving the CPU of I/O Work: DMA
Direct Memory Access (DMA):
External to the CPU.
Acts as a master on the bus.
Transfers blocks of data to or from memory without CPU intervention.
The CPU sends a starting address, direction, and length count to the DMAC, then issues "start".
The DMAC provides the handshake signals for the peripheral controller, and the memory addresses and handshake signals for memory.
[Diagram: CPU, Memory, DMAC, and an I/O controller (IOC) with its device, all attached to the bus.]
Finally, let's see how we can delegate some of the I/O responsibilities from the CPU. The first option is Direct Memory Access, which takes advantage of the fact that I/O events often involve block transfers: you are not going to access the disk 1 byte at a time. The DMA controller is external to the processor and can act as a bus master to transfer blocks of data to or from memory and the I/O device without CPU intervention. This is how it works. The CPU sends the starting address, the direction, and the length of the transfer to the DMA controller and issues a start command. The DMA controller then takes over from there and provides the handshake signals required to complete the entire block transfer. So DMA controllers are pretty intelligent. If you add more intelligence to the DMA controller, you end up with an I/O processor, or IOP for short. +2 = 72 min. (Y:52)
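The program-then-start sequence described above can be sketched as a toy DMA controller. The field and method names are invented for the sketch; the essential shape is that the CPU supplies three parameters once, and the controller then moves the whole block on its own:

```python
# Toy DMA controller: the CPU programs starting address, length, and
# direction, issues "start", and the DMAC completes the block transfer
# without further CPU involvement.

class DMAC:
    def program(self, start, length, direction):
        """CPU side: set up the transfer parameters."""
        self.start, self.length, self.direction = start, length, direction

    def go(self, memory, device_fifo):
        """'Start' command: run the whole block transfer; returns words moved.
        A real DMAC would raise an interrupt to the CPU at this point."""
        if self.direction == "to_device":
            for i in range(self.length):
                device_fifo.append(memory[self.start + i])
        else:  # "from_device": drain the device FIFO into memory
            for i in range(self.length):
                memory[self.start + i] = device_fifo.pop(0)
        return self.length
```

Everything inside `go` happens without the CPU: in hardware those loop iterations are bus cycles the DMAC performs as bus master, and the CPU only hears about the transfer again via the completion interrupt.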

67 Relieving the CPU of I/O Work: the IOP
[Diagram: the CPU and IOP share the main memory bus with Mem; the IOP drives an I/O bus connecting devices D1, D2, ..., Dn.]
(1) The CPU issues an instruction (OP, Device, Address) to the IOP, naming the target device and where to find further commands.
(2) The IOP looks in memory for its commands.
(3) Memory-resident command blocks (OP, Addr, Cnt, Other) tell the IOP what to do, where to put the data, how much to transfer, and any special requests. Device-to/from-memory transfers are controlled directly by the IOP, which steals memory cycles.
(4) The IOP interrupts the CPU when done.
The IOP is so smart that the CPU only needs to issue a simple instruction (Op, Device, Address) that tells it what the target device is and where to find more commands (Addr). The IOP will then fetch commands such as this (OP, Addr, Cnt, Other) from memory and do all the necessary data transfer between the I/O device and the memory system. The IOP does the transfer in the background, and it will not affect the CPU because it accesses memory only when the CPU is not using it: this is called stealing memory cycles. Only when the IOP finishes its operation will it interrupt the CPU. +2 = 74 min. (Y:54)

68 Bus Summary
Three types of buses: processor-memory buses, I/O buses, and backplane buses.
Bus arbitration schemes:
Daisy-chain arbitration: cannot guarantee fairness.
Centralized parallel arbitration: requires a central arbiter.
Notifying the OS:
Polling: wastes some processor time.
I/O interrupts: similar to exceptions, except that they are asynchronous.
Relieving the CPU of I/O work:
Direct memory access (DMA).
I/O processor (IOP).
Let's summarize what we learned today. First we talked about three types of buses: the processor-memory bus, which is usually the shortest and fastest; the I/O bus, which has to deal with a large range of I/O devices and is usually the longest and slowest; and the backplane bus, a general interconnect built into the chassis of the machine. The processor-memory bus, which runs at high speed, is usually synchronous, while the I/O and backplane buses can be either synchronous or asynchronous. As far as bus arbitration schemes are concerned, I showed you two in detail. The daisy chain scheme is simple, but it is slow and cannot assure fairness; that is, a low priority device may never get to use the bus at all. The centralized parallel arbitration scheme is faster, but it requires a centralized arbiter, so I also showed you how to build a simple arbiter using simple AND gates and JK flip-flops. When we talked about the OS's role, we discussed two ways an I/O device can notify the operating system when data is ready or something goes wrong. Polling is simple to implement, but it may end up wasting a lot of processor cycles. An I/O interrupt is similar to an exception, but it is asynchronous with respect to instruction execution, so we can pick our own convenient point in the pipeline to handle it. Finally, we talked about two ways you can delegate I/O responsibility from the CPU: direct memory access and the I/O processor. +3 = 77 min. (Y:57)

