Chapter 14: Mass-Storage Systems


Chapter 14: Mass-Storage Systems
14.1 Disk Structure
14.2 Disk Scheduling
14.3 Disk Management
14.4 Swap-Space Management
14.5 RAID Structure
14.6 Disk Attachment
14.7 Stable-Storage Implementation
14.8 Tertiary Storage Devices
Operating System Issues
Performance Issues
Operating System Concepts

14.1 Disk Structure
Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
The one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially.
Sector 0 is the first sector of the first track on the outermost cylinder.
Mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.
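The mapping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that the slide does not state: every track holds the same number of sectors, one logical block fits in one sector, and the geometry values (4 tracks per cylinder, 63 sectors per track) are made up for the example.

```python
def block_to_chs(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to (cylinder, track, sector), all 0-based.

    Assumes a uniform geometry: the same sector count on every track
    and one logical block per sector.
    """
    cylinder, rest = divmod(block, tracks_per_cylinder * sectors_per_track)
    track, sector = divmod(rest, sectors_per_track)
    return cylinder, track, sector


# Hypothetical geometry: 4 tracks per cylinder, 63 sectors per track.
print(block_to_chs(0, 4, 63))    # (0, 0, 0): first sector, first track, outermost cylinder
print(block_to_chs(63, 4, 63))   # (0, 1, 0): next track of the same cylinder
print(block_to_chs(252, 4, 63))  # (1, 0, 0): first track of the next cylinder
```

Real drives use more sectors on outer tracks and remap bad sectors, so the actual translation is done inside the drive.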

Disk Structure
Constant linear velocity (CLV)
  The density of bits per track is uniform. The farther a track is from the center of the disk, the greater its length, so the more sectors it can hold.
  The drive increases its rotation speed as the head moves toward the center, to keep the data rate constant.
  CD-ROM and DVD-ROM drives use this method.
Constant angular velocity (CAV)
  The rotation speed of the disk stays constant.
  To keep the data rate constant, the bit density decreases from the inner tracks to the outer tracks.
  Hard disks use this method.
Low-level formatting
  Block size: 512 bytes or 1024 bytes.

14.2 Disk Scheduling
The operating system is responsible for using hardware efficiently; for the disk drives, this means having a fast access time and high disk bandwidth.
Access time has two major components:
Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
Rotational latency is the additional time spent waiting for the disk to rotate the desired sector to the disk head.

Disk Scheduling (Cont.)
Minimize seek time.
Seek time ≈ seek distance.
Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer (that is, the amount of data transferred per unit time).

Disk Scheduling (Cont.)
Several algorithms exist to schedule the servicing of disk I/O requests.
We illustrate them with a request queue of cylinders in the range 0-199: 98, 183, 37, 122, 14, 124, 65, 67
Head pointer: 53

14.2.1 FCFS Scheduling
First-come, first-served (FCFS) is the simplest form of scheduling.
The illustration shows a total head movement of 640 cylinders.
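The 640-cylinder figure can be checked with a short Python sketch that serves the example queue (98, 183, 37, 122, 14, 124, 65, 67, head at 53) in arrival order. The function name is illustrative, not from the slides.

```python
def fcfs_movement(start, requests):
    """Total head movement when requests are served in arrival order."""
    total, pos = 0, start
    for cyl in requests:
        total += abs(cyl - pos)  # distance from the current cylinder to the next request
        pos = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_movement(53, queue))  # 640
```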

14.2.2 SSTF Scheduling
The shortest-seek-time-first (SSTF) algorithm selects the request with the minimum seek time from the current head position.
SSTF scheduling is a form of shortest-job-first (SJF) scheduling; it may cause starvation of some requests.
The illustration shows a total head movement of 236 cylinders.
SSTF is not optimal: the best possible total head movement for this queue is 208 cylinders.
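A minimal Python sketch of SSTF on the example queue (head at 53) reproduces the 236-cylinder total and also shows one ordering that achieves 208. Seek time is approximated by cylinder distance, and the helper names are illustrative.

```python
def movement(start, order):
    """Total head movement for serving the cylinders in the given order."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def sstf_order(start, requests):
    """Repeatedly pick the pending request closest to the current head position."""
    pending, pos, order = list(requests), start, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


queue = [98, 183, 37, 122, 14, 124, 65, 67]
order = sstf_order(53, queue)
print(order)                # [65, 67, 37, 14, 98, 122, 124, 183]
print(movement(53, order))  # 236

# Serving 37 and 14 first, then sweeping upward, is shorter:
print(movement(53, [37, 14, 65, 67, 98, 122, 124, 183]))  # 208
```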

SSTF (Cont.)

14.2.3 SCAN Scheduling
The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.
SCAN is sometimes called the elevator algorithm.
The illustration shows a total head movement of 236 cylinders.
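The 236-cylinder total corresponds to the head starting at 53 and moving toward cylinder 0 first. The sketch below assumes that initial direction (the slide text does not state it) and always travels to the low end before reversing.

```python
def scan_movement(start, requests, low=0):
    """Total head movement for SCAN, assuming the head first moves toward `low`."""
    # Serve everything at or below the head on the way down, touch the end, then sweep up.
    path = sorted((c for c in requests if c <= start), reverse=True) + [low]
    path += sorted(c for c in requests if c > start)
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_movement(53, queue))  # 236 (53 down to 0, then 0 up to 183)
```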

SCAN (Cont.)

14.2.4 C-SCAN Scheduling
Circular SCAN (C-SCAN) scheduling provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
C-SCAN treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
The total head movement is 183 cylinders, not counting the return trip, during which no requests are serviced.
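A small Python sketch shows where the 183 comes from on the example queue (head at 53, cylinders 0-199). It assumes the head sweeps upward first, and the `count_return` flag is an illustrative addition so that both ways of counting can be compared.

```python
def cscan_movement(start, requests, max_cyl=199, count_return=False):
    """Total head movement for C-SCAN, sweeping upward from `start`.

    The return trip from max_cyl to cylinder 0 is counted only if count_return is True.
    """
    wrapped = sorted(c for c in requests if c < start)
    total = max_cyl - start          # sweep up to the last cylinder
    if wrapped:
        if count_return:
            total += max_cyl         # jump back to cylinder 0
        total += wrapped[-1]         # sweep up from 0 to the last pending request
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(cscan_movement(53, queue))                     # 183 = (199 - 53) + 37
print(cscan_movement(53, queue, count_return=True))  # 382
```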

C-SCAN (Cont.)

14.2.5 LOOK/C-LOOK Scheduling
SCAN and C-SCAN always move the arm across the full width of the disk, which is not practical.
LOOK and C-LOOK are improved versions of SCAN and C-SCAN.
The arm goes only as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
For C-LOOK, the total head movement is 153 cylinders, not counting the jump back, during which no requests are serviced.
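The 153-cylinder figure for C-LOOK can be reproduced with the sketch below, again on the example queue with the head at 53 and assuming an upward first sweep. The `count_jump` flag is illustrative and only shows what the total would be if the jump back were counted.

```python
def clook_movement(start, requests, count_jump=False):
    """Total head movement for C-LOOK, sweeping upward from `start`."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    total, pos = 0, start
    if up:
        total += up[-1] - pos        # go only as far as the highest request
        pos = up[-1]
    if wrapped:
        if count_jump:
            total += pos - wrapped[0]      # jump back to the lowest pending request
        total += wrapped[-1] - wrapped[0]  # sweep upward through the remaining requests
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(clook_movement(53, queue))                   # 153 = (183 - 53) + (37 - 14)
print(clook_movement(53, queue, count_jump=True))  # 322
```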

C-LOOK (Cont.)

14.2.6 Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal, although its performance is only average.
SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
Performance depends on the number and types of requests.
Requests for disk service can be influenced by the file-allocation method.
The location of directories and index blocks (innermost, outermost, or middle cylinders) is also important.
The algorithms above consider only seek time, not rotational latency.

Selecting a Disk-Scheduling Algorithm (Cont.)
The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.
Either SSTF or LOOK is a reasonable choice for the default algorithm.

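Disk I/O Scheduling Policies
Disk I/O requests from different processes form a randomly distributed request queue. The main goal of disk I/O scheduling is to reduce the average seek time over the request queue.
First-in, first-out (FIFO)
Priority
Last-in, first-out (LIFO)
Shortest service time first (SSTF)
SCAN
Circular SCAN (C-SCAN)
N-step-SCAN
Two-queue SCAN (FSCAN)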

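First-in, first-out (FIFO): disk I/O requests are carried out in the order in which they arrive.
Last-in, first-out (LIFO): the most recently issued disk I/O request is carried out first.
  LIFO exploits the locality of disk I/O on sequential files in transaction-processing systems, where consecutive accesses tend to be to adjacent locations.
  Its drawback is that under heavy load some processes' disk I/O may never be carried out, so those processes starve.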

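Shortest service time first (SSTF): considers the head position that each request in the disk I/O queue requires, and selects the request that needs the least head movement from the current position.
  The goal is to minimize the head-movement time of each request. It does not necessarily give the shortest average seek time, but it performs better than FIFO.
  SSTF favors the middle tracks, and some processes may starve.
SCAN: selects the request that needs the least movement from the current position in the head's direction of travel, and changes direction only when no requests remain in that direction.
  SCAN improves on SSTF: disk I/O performance is good and no process starves.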

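Circular SCAN (C-SCAN): applies SCAN in one direction only; on reaching the edge, the head moves directly to the first position at the opposite edge.
  C-SCAN reduces SCAN's bias toward the middle tracks. Experiments show that under medium or heavy load its disk I/O performance is better than SCAN's.
N-step-SCAN: splits the disk I/O request queue into segments of length N and processes one segment of N requests at a time using SCAN. When N = 1, the algorithm degenerates to FIFO.
  Its goal is to fix a problem of the preceding algorithms, in which the head can stay on one track (for example, in multi-head systems) so that other processes cannot get their disk I/O serviced in time.
FSCAN: keeps two queues of disk I/O requests. SCAN processes one queue while newly arriving requests go into the other, and the two queues alternate.
  Its goal is the same as that of N-step-SCAN.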
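To make the N-step-SCAN idea concrete, here is a rough Python sketch. It is a simplification, not a definitive implementation: each batch of N requests is served with a LOOK-style sweep that always starts upward from the current head position, and all names are illustrative. With N = 1 the order is exactly the arrival order, matching the FIFO degeneration noted above.

```python
def n_step_scan_order(start, requests, n):
    """Serve the queue in batches of n; within a batch, sweep up and then back down."""
    order, pos = [], start
    for i in range(0, len(requests), n):
        batch = requests[i:i + n]
        up = sorted(c for c in batch if c >= pos)                    # served on the upward sweep
        down = sorted((c for c in batch if c < pos), reverse=True)   # served on the way back
        order += up + down
        pos = order[-1]
    return order


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(n_step_scan_order(53, queue, 1))  # same as the arrival order (FIFO)
print(n_step_scan_order(53, queue, 4))  # [98, 122, 183, 37, 65, 67, 124, 14]
```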

14.3 Disk Management
Disk initialization: low-level formatting, or physical formatting
Booting from disk: boot blocks
Bad-block recovery: bad blocks

14.3.1 Disk Format
Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write.
Data structure of a sector: header + data area (usually 512 bytes) + trailer.
The header and trailer hold information used by the disk controller, such as the sector number and an error-correcting code (ECC).
The ECC is computed when data is written and checked when data is read; it may also allow errors to be corrected.

Disk Format (Cont.)
To use a disk to hold files, the operating system still needs to record its own data structures on the disk.
Steps:
1. Partition the disk into one or more groups of cylinders.
2. Logical formatting, or "making a file system".
There are two ways to use the disk:
Raw I/O (raw disk access)
Regular file-system services and I/O

14.3.2 Boot Block
The boot block initializes the system.
Only a small bootstrap loader program is stored in ROM.
The full bootstrap program is stored on disk in a partition called the boot blocks; a disk that has a boot partition is called a boot disk or system disk.

Fig 14.6 MS-DOS Disk Layout

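14.3.3 Bad Block
Simple disk systems (e.g., IDE interface):
  Bad blocks are handled manually through commands, such as the MS-DOS format and chkdsk commands.
Sophisticated disk systems (e.g., SCSI interface):
  Sector sparing, or forwarding: the bad block is redirected to a spare block reserved by the system. Spare blocks are kept in an area of the disk that is invisible to the user, or in each cylinder to reduce the seek distance.
  Sector slipping (shifting a whole run of sectors): suppose sector 17 is bad and the first available spare is sector 202. Then the data is moved 201→202, 200→201, …, 18→19, 17→18.
A bad block cannot always be recovered.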
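The sector-slipping example can be expressed as a tiny mapping from logical block to physical sector. This Python sketch follows the slide's numbers (bad sector 17, spare at 202); the function name and the 1:1 starting layout are assumptions made for illustration.

```python
def slipped_sector(logical, bad=17, spare=202):
    """Physical sector holding `logical` after slipping sectors bad..spare-1 by one.

    Assumes logical block k originally lived in physical sector k.
    """
    if bad <= logical < spare:
        return logical + 1   # 17 -> 18, 18 -> 19, ..., 201 -> 202
    return logical           # sectors outside the slipped run are unchanged


print(slipped_sector(16))   # 16
print(slipped_sector(17))   # 18
print(slipped_sector(201))  # 202
```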

14.4 Swap-Space Management
Swap space: virtual memory uses disk space as an extension of main memory.
The main goal in using swap space is to improve system throughput.
The swap space in a system can range from a few megabytes to several gigabytes.
Some operating systems, such as UNIX, allow the use of multiple swap spaces.
It is safer to overestimate than to underestimate swap space: overestimating wastes some disk space but does no other harm.

Swap-Space Management
Swap-space location
Swap space can be carved out of the normal file system.
  This is easy to implement but inefficient. Improvement: cache the block location information in main memory.
Or, more commonly, it can be in a separate disk partition.
  Use algorithms optimized for speed, not for storage efficiency.
  Adding more swap space requires repartitioning the disk.
Some operating systems (e.g., Solaris 2) are flexible and can use both raw partitions and file-system space as swap space.

Swap-Space Management (Cont.)
4.3BSD allocates swap space when a process starts; it holds the text segment (the program) and the data segment.
Swapping is done by copying entire processes between contiguous disk regions and memory.
The kernel uses swap maps to track swap-space use.
Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created.

Fig 14.7 4.3 BSD Text-Segment Swap Map
Blocks have a fixed size (512 KB), except for the last block, which is allocated in 1 KB increments.

Fig 14.8 4.3 BSD Data-Segment Swap Map
Allocation is more complicated because the data segment grows dynamically.
The map has a fixed size. Given index i (the map entry number), the block pointed to by entry i has size 2^i × 16 KB, up to a maximum of 2 MB.
When a process tries to grow its data segment beyond the final allocated block in its swap area, the operating system allocates another block, twice as large as the previous one.
Index i = 0, 1, 2, 3, 4, …
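The block-size rule for the data-segment swap map (2^i × 16 KB, capped at 2 MB) is easy to tabulate. The sketch below is illustrative; the function and parameter names are not from 4.3BSD.

```python
def data_swap_block_kb(i, base_kb=16, max_kb=2048):
    """Size in KB of the block referenced by swap-map entry i: 2**i * 16 KB, capped at 2 MB."""
    return min(base_kb * 2 ** i, max_kb)


print([data_swap_block_kb(i) for i in range(9)])
# [16, 32, 64, 128, 256, 512, 1024, 2048, 2048]
```

Each entry doubles the previous block size until the 2 MB cap is reached at i = 7.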

14.5 RAID Structure
RAID: Redundant Arrays of Inexpensive Disks. Multiple disk drives provide reliability via redundancy.
The "I" originally stood for Inexpensive; it now stands for Independent.
RAID is arranged into six different levels.
Mean time to failure
Mean time to repair
Mean time to data loss

RAID (Cont.)
Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.
Improvement of reliability via redundancy
The simplest (but most expensive) approach is to duplicate every disk, called mirroring (or shadowing).
RAID schemes improve performance and improve the reliability of the storage system by storing redundant data.
Block-interleaved parity uses much less redundancy.

RAID (Cont.)
Improvement of performance via parallelism
With multiple disks, we can also improve the transfer rate by striping data (at the bit level or block level) across multiple disks.
Disk data striping uses a group of disks as one storage unit.
Bit-level striping: e.g., with 4 disks, bits i and 4+i of each byte go to disk i.
Block-level striping: e.g., with n disks, block i of a file goes to disk (i mod n) + 1.
Goals:
Increase the throughput of multiple small accesses by load balancing.
Reduce the response time of large accesses.
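The block-level striping rule, block i goes to disk (i mod n) + 1, can be sketched in Python as follows. The function names are illustrative, and disks are numbered from 1 as in the formula.

```python
def block_disk(i, n):
    """Disk (numbered from 1) that holds block i under block-level striping over n disks."""
    return (i % n) + 1


def stripe_layout(num_blocks, n):
    """Group block numbers by the disk they land on."""
    layout = {disk: [] for disk in range(1, n + 1)}
    for i in range(num_blocks):
        layout[block_disk(i, n)].append(i)
    return layout


print([block_disk(i, 4) for i in range(6)])  # [1, 2, 3, 4, 1, 2]
print(stripe_layout(10, 4))
# {1: [0, 4, 8], 2: [1, 5, 9], 3: [2, 6], 4: [3, 7]}
```

Consecutive blocks land on different disks, so a large sequential read can be served by all n disks in parallel.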

Fig 14.9 RAID Levels
P: error-correcting bits
C: a second copy of the data

RAID Levels
RAID Level 0: disk arrays with striping at the level of blocks, but without any redundancy.
RAID Level 1: disk mirroring.
RAID Level 2: memory-style error-correcting code (ECC) organization.
  With a parity bit, the memory system detects all single-bit errors.
  Error-correcting schemes store two or more extra bits.
  It needs many extra disks, so it is not used in practice.

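RAID Levels
RAID Level 3: bit-interleaved parity organization.
  If a sector goes bad, the controller knows exactly which one it is, and can work out whether each bit of the bad sector is 0 or 1 from the corresponding data on the other disks.
  It needs fewer redundant disks than RAID Level 2.
  Performance problem: computing and writing the parity takes time. Improvement: use NVRAM (non-volatile RAM) or a cache.
RAID Level 4: block-interleaved parity organization.
  Similar to RAID Level 3, but a parity block (not bit) is kept on a separate disk.
  Block-level striping: each disk is accessed block by block, which gives good parallelism.
  Problem: one write operation requires four disk accesses (two reads of the old data block and parity block, two writes of the new ones), that is, a read-modify-write cycle.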
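Both ideas above, rebuilding a lost block from parity and the read-modify-write parity update, follow from XOR arithmetic. The Python sketch below uses made-up two-byte blocks; it illustrates the principle and is not a model of any particular controller.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks and their parity block (sample values).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
p = parity(data)

# Reconstruction: XOR of the surviving blocks and the parity recovers the lost block.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])  # True

# Read-modify-write: new parity = old parity XOR old data XOR new data,
# so only the old data block and the old parity block need to be read.
new_block = b"\x00\xff"
new_p = bytes(a ^ b ^ c for a, b, c in zip(p, data[1], new_block))
print(new_p == parity([data[0], new_block, data[2]]))  # True
```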

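RAID Levels
RAID Level 5: block-interleaved distributed parity.
  Difference from RAID Level 4: data and parity are spread across all N+1 disks, rather than storing data on N disks and parity on one disk.
  For example, in an array of 5 disks, the parity for the nth block is stored on disk (n mod 5) + 1, and the data of the nth block is spread over the other 4 disks.
  Improvement: compared with RAID Level 4, this avoids overusing a single parity disk.
RAID Level 6: P+Q redundancy scheme.
  Similar to RAID Level 5, but stores extra redundant information to guard against multiple disk failures.
  It uses error-correcting codes instead of parity, and therefore needs more redundant information.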
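The RAID 5 placement rule from the 5-disk example, parity for block n on disk (n mod 5) + 1, can be sketched as below. Disks are numbered from 1 and the function name is illustrative.

```python
def raid5_layout(n, disks=5):
    """Return (parity_disk, data_disks) for the nth block in a RAID 5 array."""
    parity_disk = (n % disks) + 1
    data_disks = [d for d in range(1, disks + 1) if d != parity_disk]
    return parity_disk, data_disks


for n in range(6):
    print(n, raid5_layout(n))
# 0 (1, [2, 3, 4, 5])
# 1 (2, [1, 3, 4, 5])
# 2 (3, [1, 2, 4, 5])
# 3 (4, [1, 2, 3, 5])
# 4 (5, [1, 2, 3, 4])
# 5 (1, [2, 3, 4, 5])
```

The parity rotates over all disks, which is why no single disk becomes the write bottleneck that the dedicated parity disk is in RAID 4.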

Fig 14.10 RAID (0 + 1) and (1 + 0)

RAID Levels
RAID Level 0+1: a combination of RAID 0 and RAID 1.
  RAID 0 provides the performance; RAID 1 provides the reliability.
  Better performance than RAID 5.
  A set of disks is striped, and then the stripe is mirrored to another, equivalent stripe.
RAID Level 1+0: disks are mirrored in pairs, and then the resulting mirror pairs are striped.
  It has theoretical advantages over RAID 0+1.

RAID Levels
RAID 1+0 has theoretical advantages over RAID 0+1. For example:
If a single disk fails in RAID 0+1, the entire stripe is inaccessible, leaving only the other stripe available.
With a failure in RAID 1+0, the single disk is unavailable, but its mirrored pair is still available, as are all the rest of the disks.

14.5.4 Selecting a RAID Level
RAID Level 0: used in high-performance applications.
RAID Level 1: used where high reliability with fast recovery is needed.
RAID Levels 0+1 and 1+0: used where both high performance and reliability with fast recovery are needed.
RAID Level 5: preferred for storing large volumes of data.
Hot spare disk: a disk that is not used for data, but is configured to be used as a replacement should any other disk fail. Allocating more than one hot spare allows more than one failure to be repaired without human intervention.

14.5.5 Extensions
The concepts of RAID have been generalized to other storage devices, including arrays of tapes, and even to the broadcast of data over wireless systems.
Tape-drive robots

14.6 Disk Attachment
Disks may be attached in one of two ways:
1. Host-attached via an I/O port: IDE, ATA, and SCSI
2. Network-attached via a network connection

Fig 14.11 Network-Attached Storage

Fig 14.12 Storage-Area Network

SAN
A SAN is a private network using storage protocols rather than networking protocols.
Common shortcomings of current SAN systems:
  Protocols are not standardized.
  Device interoperability is poor.
Trend: use IP networking (Gigabit Ethernet) as the switching fabric.

14.7 Stable-Storage Implementation
The write-ahead log scheme requires stable storage.
To implement stable storage:
Replicate information on more than one nonvolatile storage medium with independent failure modes.
Update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery.

14.8 Tertiary Storage Structure
Low cost is the defining characteristic of tertiary storage.
Generally, tertiary storage is built using removable media.
Common examples of removable media are floppy disks, CD-ROMs, and tapes; other types are available.

14.8.1 Tertiary Storage Devices
14.8.1.1 Removable Disks
Floppy disk: a thin flexible disk coated with magnetic material, enclosed in a protective plastic case.
Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB.
Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure.

Removable Disks (Cont.)
A magneto-optic disk records data on a rigid platter coated with magnetic material.
Laser heat is used to amplify a large, weak magnetic field to record a bit.
Laser light is also used to read data (Kerr effect).
The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass; this makes the disk resistant to head crashes.

Removable Disks (Cont.-1)
Optical disks do not use magnetism; they employ special materials that are altered by laser light.
The data on read-write disks can be modified over and over.

WORM Disks
WORM ("Write Once, Read Many Times") disks can be written only once.
A thin aluminum film is sandwiched between two glass or plastic platters.
To write a bit, the drive uses laser light to burn a small hole through the aluminum; information can be destroyed but not altered.
WORM disks are very durable and reliable.
Read-only disks, such as CD-ROM and DVD, come from the factory with the data pre-recorded.

14.8.1.2 Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data and holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library.

Tapes (Cont.)
Stacker: a library that holds a few tapes.
Silo: a library that holds thousands of tapes.
A disk-resident file can be archived to tape for low-cost storage; the computer can stage it back into disk storage for active use.

14.8.1.3 Future Technology
Holographic storage
Micro-electronic mechanical systems (MEMS)

14.8.2 Operating System Jobs
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications.
For hard disks, the OS provides two abstractions:
Raw device: an array of data blocks.
File system: the OS queues and schedules the interleaved requests from several applications.

14.8.2.1 Application Interface
Most OSs handle removable disks almost exactly like fixed disks: a new cartridge is formatted and an empty file system is generated on the disk.
Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device.
Usually the tape drive is reserved for the exclusive use of that application.

Application Interface (Cont.)
Since the OS does not provide file-system services on tape, the application must decide how to use the array of blocks.
Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it.
The basic operations for a tape drive differ from those of a disk drive.

Tape Drives
The locate operation positions the tape to a specific logical block, not an entire track (it corresponds to seek on a disk).
The read position operation returns the logical block number where the tape head is.
The space operation enables relative motion.
Tape drives are "append-only" devices; updating a block in the middle of the tape also effectively erases everything beyond that block.
An EOT mark is placed after a block that is written.

14.8.2.2 File Naming
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer, and then use the cartridge in another computer.
Contemporary OSs generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data.
Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.

14.8.2.3 Hierarchical Storage Management (HSM)
A hierarchical storage system extends the storage hierarchy beyond primary memory and secondary storage to incorporate tertiary storage, usually implemented as a jukebox of tapes or removable disks.
Tertiary storage is usually incorporated by extending the file system.
Small and frequently used files remain on disk.
Large, old, inactive files are archived to the jukebox.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.

14.8.3 Performance Issues
14.8.3.1 Speed
Two aspects of speed in tertiary storage are bandwidth and latency.
Bandwidth is measured in bytes per second.
Sustained bandwidth: the average data rate during a large transfer, i.e., number of bytes / transfer time. This is the data rate when the data stream is actually flowing.

Speed
Effective bandwidth: the average over the entire I/O time, including seek or locate, and cartridge switching. This is the drive's overall data rate.
Access latency: the amount of time needed to locate data.
Access time for a disk: move the arm to the selected cylinder and wait for the rotational latency; less than 35 milliseconds.
Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds.
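The gap between sustained and effective bandwidth is easy to see with numbers. The Python sketch below uses hypothetical figures (a 100 MB transfer, 5 MB/s sustained rate, 30 s to locate the data); none of them come from the slides.

```python
def effective_bandwidth(nbytes, sustained_bps, access_latency_s):
    """Bytes per second averaged over the whole I/O, including the time to locate the data."""
    transfer_time = nbytes / sustained_bps
    return nbytes / (access_latency_s + transfer_time)


# Hypothetical tape access: 100 MB at 5 MB/s sustained, after a 30-second locate.
print(effective_bandwidth(100e6, 5e6, 30.0))  # 2000000.0, i.e. 2 MB/s instead of 5 MB/s
```

The longer the locate or cartridge-switch time is relative to the transfer itself, the further the effective bandwidth falls below the sustained rate.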

Speed (Cont.)
Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives.
A removable library is best devoted to the storage of infrequently used data, because the library can only satisfy a relatively small number of I/O requests per hour.

14.8.3.2 Reliability
A fixed disk drive is likely to be more reliable than a removable disk or tape drive.
An optical cartridge is likely to be more reliable than a magnetic disk or tape.
A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.

14.8.3.3 Cost
Main memory is much more expensive than disk storage.
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.

Fig 14.13 Price per Megabyte of DRAM, From 1981 to 2000

Fig 14.14 Price per Megabyte of Magnetic Hard Disk, From 1981 to 2000

Fig 14.15 Price per Megabyte of a Tape Drive, From 1984 to 2000

Exercises 2, 10