Information Infrastructure for Digital Libraries
Fan Hua (樊华), Storage Architect
2009.06.04
IBM Confidential
Digital library construction in China is gathering momentum
Main advantages of a digital library:
- Small storage footprint; content is not easily damaged
- Convenient search and retrieval
- Fast remote delivery of information
- The same information can be shared by many people at once
- Multimedia materials
- ...
Timeline:
- 2001: The State Planning Commission approved the "National Party School System Digital Library Construction Plan"
- 2000: The Ministry of Culture drew up the "China Digital Library Project Phase I Plan"
- 1999: The National Library completed the "Digital Library Experimental Demonstration System"
- 1998: The Capital Library became the "first demonstration site of the China Digital Library Project"
- 1997: The China Pilot Digital Library project was approved, marking the start of digital library construction in China
- 1996: At the 62nd IFLA General Conference in Beijing, IBM and the Tsinghua University Library jointly demonstrated the "IBM Digital Library solution"
Speaker notes: IBM System Storage N3300 and N3600 disk systems and IBM System Storage DS3200/DS3300/DS3400/DS4200/DS4700 Express. DS3400 ranked #1 as network storage hardware.
- Dynamic data provisioning: on-the-fly provisioning that allows volumes to expand and contract based on application requirements. This allows for better resource utilization.
- Optimized for interoperability with IBM servers: solution for System x, System p, System z and System i servers (running AIX or Linux), and for BladeCenter servers and BladeCenter Power blade servers (running AIX or Linux).
- Rich set of storage deployment options: direct attach external storage and SAN storage for data consolidation, optimization and a single point of management.
- Flexible information access: block I/O (SAN functionality: DS3000, DS4000 and N series) and file serving (NAS functionality: N series).
- Extensive set of SAN configuration options: high-performance, easy-to-use, cost-effective connectivity with pay-as-you-grow scalability using IBM System Storage SAN switches. Examples: IBM System Storage SAN768B fabric backbone; SAN256B, SAN140M and SAN256M directors; IBM System Storage SAN16B-2 Express, SAN32B-3 and SAN64B-3 fabric switches; Cisco MDS 9506, MDS 9509 and MDS 9513 directors for IBM System Storage; Cisco MDS 9124 Express, MDS 9134 and MDS 9222i fabric switches for IBM System Storage.
Business characteristics of digital libraries and the challenges they face
- Massive information: data volumes of tens to hundreds of TB that keep growing rapidly, which puts heavy pressure on investment, information management, and system scalability.
- Multiple information types: unstructured data (documents, images, audio and video files) and structured data (search databases, business function systems).
- Information value: most digital library content is produced by a laborious digitization process. It is costly to create and highly valuable, so it needs proper protection. Compared with a traditional library, digital information is easier to copy and back up using IT, but it is also easier to steal.
- More convenient access: information can be reached over the network, so more people can access it concurrently, 24x7. This places high demands on system performance and availability.
- Long-term preservation: most digital library information must be kept permanently. The lifecycle of the information exceeds the lifecycle of the storage devices.
Goal of digital library construction: on-demand information services
Diagram labels: People, processes, applications; Strategy and implementation; Partners and solutions; Intelligent Management. Protected Information. Smarter Insights.
Information-oriented business value, supported by a flexible infrastructure that stores information securely and reduces business risk.
Speaker notes: IBM's commitment to the cross-company Information On Demand initiative has resulted in a comprehensive portfolio of software, services, hardware and industry-specific solutions to turn the Information On Demand vision into reality for organizations of any size. This includes the infrastructure required for information, such as storage and servers; strategy and implementation services; the broadest partner base to provide integrated applications and tools that take advantage of Information On Demand capabilities; and skills and expertise to help customers achieve the benefits of Information On Demand.
Three principles and four focus areas for building digital library infrastructure
Three principles:
- Improve service
- Control investment
- Reduce risk
Four focus areas:
- Information availability
- Information security
- Information retention
- Information compliance
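Four categories of issues that digital library infrastructure should address
Information "CARS":
- Information availability: provide continuous, reliable access to information
- Information security: protect information and enable secure information sharing
- Information retention: support the customer's information retention policies
- Information compliance: reduce reputational risk and audit deficiencies
Concerns under availability and security:
- Cope with the rapid growth of massive information capacity
- Provide high-performance access 24x7
- Be able to recover from disasters
- Simplify storage management
- Recover from data corruption and system failures
- Avoid the risk of information leakage
Concerns under retention and compliance:
- Reduce the total investment in long-term retention of massive information
- Data compression
- Place long-term information on cheaper storage according to its value
- Meet the information reliability requirements of laws, regulations, and the organization itself
Speaker notes: IBM Information Infrastructure is an initiative that helps clients meet the challenge of the Information Explosion by helping to improve competencies around 4 key areas: Information Availability, Information Security, Information Retention, and Information Compliance. Core competencies enable "Data Tone", or Information as a Service. Information Infrastructure is a supporting technology for business applications.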
Agenda
Information availability (provide continuous, reliable access to information):
- Information high-availability solutions
- Storage virtualization solutions
- Parallel file systems
- Grid storage systems
Information security (protect information and enable secure information sharing):
- Backup and recovery for digital library information
- Disk and tape encryption
Information retention and information compliance (support the customer's information retention policies; reduce reputational risk and audit deficiencies):
- Tiered storage and information lifecycle management
- Data deduplication
- NENR (non-erasable, non-rewritable) storage solutions
A brief introduction to the evolution of storage and its technical characteristics
Speaker notes: Businesses are facing multiple forces that create the need to re-examine the Information Infrastructure and its ability to meet the projected needs of the business: Information Explosion, Business Optimization Opportunities, and Risk and Cost Management.
Background, "Why Now?": Historically, IT professionals have balanced the challenges associated with managing data centers as they increase in cost and complexity with the need to be highly responsive to ongoing demands from the business placed on IT. But never before has the growth of the IT marketplace faced such a "perfect storm" of forces that stimulate the need for true data center transformation. We have captured this visually here by showing what are sometimes opposing forces: operational challenges around cost, service delivery, business resiliency and security, and "green" initiatives that have IT at a breakpoint; and business and technology innovation that can drive competitive advantage but wreak havoc with existing IT infrastructures.
Storage terminology
- DAS: Direct Attached Storage
- NAS: Network Attached Storage
- SAN: Storage Area Network
- iSCSI: Internet SCSI
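Historical development of disk storage technology
[Figure: evolution from DAS to NAS, SAN and iSCSI. Labels: internal disks, external SCSI disk array, RAID, file server, dedicated NAS, LAN, FC switch (SAN), IP storage network with Ethernet switch (iSCSI)]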
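Characteristics of SAN, iSCSI and NAS
SAN:
- A storage technology created to solve the problems of DAS
- In essence, DAS plus a network
- Mostly used for business systems that need high performance
NAS:
- A storage technology created to solve data sharing and to optimize file storage
- The file system lives on the storage device; in essence, file service plus IP
- Mostly used for business systems that need shared file access
iSCSI:
- In essence, SAN plus IP, which is why iSCSI is also called IP SAN
- Mostly used on PC server platforms, Windows (SQL Server, Exchange), Linux, and small to mid-size databases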
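Storage architectures: DAS, SAN, iSCSI, NAS
[Figure: application servers attached to RAID storage in four ways. With DAS (SCSI), SAN (FC switch) and iSCSI (Ethernet switch) the file system sits on the application server; with NAS (NFS, CIFS over an Ethernet switch) the file system sits on the storage device]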
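Deploying SAN, iSCSI and NAS
SAN:
- Buy and install an HBA card in each server
- The network uses FC switches
iSCSI:
- Install the free iSCSI initiator software on each server (supplied at no charge by the operating system vendor)
- The network uses ordinary Ethernet switches
NAS:
- No additional hardware or software is needed on the server
[Figure: application server with FC HBA to FC switch (SAN); application server with iSCSI initiator and Ethernet port to Ethernet switch (iSCSI); application server with Ethernet port using NFS, CIFS to Ethernet switch (NAS)]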
Information Availability
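Information availability, part 1: storage high availability
- Traditional: 2+1 (hosts are highly available, storage is not)
- Today: 2+2 (hosts are highly available, storage is highly available)
Business value:
- 100% resilience for local data access
- No application downtime caused by disk device failures, or downtime kept to a minimum
- Complements a remote disaster recovery system
- Simple, practical data protection and failure recovery procedures
[Figure: active server and backup server in an HA pair, connected through the SAN to a single storage system (2+1) or to two storage systems with synchronous data replication from the primary copy to the target copy (2+2)]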
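Storage high-availability solution 1: volume-level mirroring in the file system
- RPO = 0
- RTO: single storage failure, RTO = 0; data center failure, RTO < 30 minutes
- Conditions: the primary and backup storage are in the same SAN, no more than a few hundred meters apart
- Characteristics: continuous availability; implemented in software; writes go to both disks, with only a slight performance impact
- Main products: AIX LVM, Veritas
- When disk 1 fails, disk 2 does not need to be remounted on the host and the application is not interrupted, so takeover is truly seamless
[Figure: one host writing to disk 1 and disk 2]
Speaker notes: LVM advantage: the only vendor who can support both LVM mirroring and disk replication. LVM data mirroring is a solution with host and storage technologies; IBM provides state-of-the-art solutions with these capabilities. The solution requires less investment and enables easy management without any fail-over operation. LVM is a feature incorporated in AIX, with much less implementation effort and cost.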
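Storage high-availability solution 2: local disk replication
- RPO = 0
- RTO: single storage failure, RTO < 30 minutes; data center failure, RTO < 30 minutes
- Conditions: synchronous data replication between primary and backup storage, no more than 100-300 km apart; the two storage systems must be of the same type or sit behind storage virtualization
- Characteristics: when the production disk fails, a restart action is needed to resume production; implemented in hardware; data mirroring is transparent to the servers; uses disk mirroring/replication technology
- Main products: IBM DS8000/DS5000/4000 Metro Mirror, EMC DMX SRDF, HDS TrueCopy
[Figure: host and standby host in an HA pair on the SAN; synchronous storage replication from the active disk to the backup disk]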
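Information availability, part 2: what is virtualization?
Physical resources:
- Components with their own interfaces and functions
- Usually physical; may be centralized or distributed
- Examples: memory, disks, networks, servers
Virtualization: a substitution process
- Creates virtual resources and maps them to physical resources
- The mapping can be one-to-many or many-to-one
- The mapping is performed by software or microcode
Virtual resources:
- Stand in for physical resources: same interfaces and functions, but free of the limits of the physical resources
- Can draw on all physical resources together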
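What is storage virtualization?
- A virtual layer is added between the physical storage systems and the servers. It manages and controls all storage and presents storage services to the servers.
- Servers no longer deal with storage hardware directly. Adding, removing, swapping, splitting or merging storage hardware is fully transparent to the server layer.
- Hides complexity
- Allows existing functions to be integrated and used together
- Removes the limits of physical capacity
[Figure: logical presentation, virtualization layer, physical devices]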
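Different ways to implement storage virtualization
Host-based (Veritas):
- Virtualization software is installed on the application host
- Virtual volumes are carved from the different storage systems attached to that host
SAN-based (IBM, EMC):
- The virtualization engine runs on a dedicated appliance or on the Fibre Channel switch
- Virtual volumes are carved from the storage attached to the SAN
Array-based (HDS):
- Virtualization software is built into the disk array controller
- Virtual volumes are carved from the storage attached to that disk array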
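The value of storage virtualization for digital libraries
Digital library characteristic | Storage virtualization capability
Massive, fast-growing information | Storage pools deployed across platforms and across storage systems
Information outlives the storage devices | Online data migration
Backup protection for massive information | Flexible snapshots
Disaster recovery protection | Synchronous/asynchronous replication across heterogeneous storage
Tiered storage management to save investment and optimize performance | Tiered storage pools with online data migration
Fast database search alongside high-capacity information retention | Tiered storage pool management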
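Information availability, part 3: what is a parallel file system?
- Supports parallel I/O operations
- Data is physically distributed across multiple storage devices
- Presents a single namespace, so file access is independent of location
- Provides a data access interface to applications
- Reduces bottlenecks caused by disk interfaces and network bandwidth, and optimizes the use of I/O resources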
How a parallel file system works
Examples: IBM GPFS, Quantum StorNext, Apple Xsan 2.0, ...
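How a parallel file system works and scales
- Storage and servers can be added online without affecting running applications
- Both the I/O server tier and the disk storage tier scale close to linearly
[Figure: clients reach the I/O servers over Ethernet or InfiniBand (possible bottleneck 2); the I/O servers reach disk storage over the SAN (possible bottleneck 1)]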
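Parallel file system advantage 1: high availability
- An advanced quorum mechanism maximizes system availability, with no single point of failure
- Management servers fail over automatically within the manager resource pool
- Multipath disk access: if one path fails, access continues over another
- Metadata and user data can both be replicated for stability and reliability
- Rolling update: upgrades without downtime
- Journaling enables fast system recovery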
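Parallel file system advantage 2: performance
- Files are striped: a single file is spread across nodes and storage systems, which improves concurrent access performance
- Intelligent prefetching predicts file access patterns and prefetches accordingly, reducing read/write latency
- Distributed byte-range lock management, at both the file and the directory level, allows the greatest possible concurrency
- Distributed metadata servers keep metadata processing from becoming a system bottleneck
- Client-side data caching is supported, and each node can be given its own cache size as needed
- The block size is configurable: 16K, 64K, 256K, 512K, 1M, 2M, 4M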
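The striping bullet above can be illustrated with a small sketch. It assumes a plain round-robin layout over a fixed number of disks; the function names are hypothetical and this is not a description of how GPFS actually allocates blocks. It only shows why a large sequential read ends up touching every disk in parallel.

```python
def locate_block(offset: int, block_size: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset in a file to (disk index, block index on that disk).

    Assumption: simple round-robin striping, block k goes to disk k % num_disks.
    """
    if offset < 0 or block_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; block_size and num_disks must be > 0")
    block = offset // block_size
    return block % num_disks, block // num_disks


def disks_touched(offset: int, length: int, block_size: int, num_disks: int) -> set[int]:
    """Return the set of disks that serve the byte range [offset, offset + length)."""
    if length <= 0:
        return set()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {b % num_disks for b in range(first, last + 1)}


if __name__ == "__main__":
    block = 256 * 1024  # one of the block sizes listed on the slide
    # A 1 MiB read starting at offset 0 over 4 disks is served by all 4 disks.
    print(sorted(disks_touched(0, 1024 * 1024, block, 4)))
```

With a larger block size the same read touches fewer disks, which is one reason the block size is left configurable.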
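Parallel file system advantage 3: scalability
- Supports clusters of up to several thousand nodes and I/O throughput of hundreds of GB per second
- Nodes can be added to or removed from the cluster without stopping service
- Disks can be added to or removed from a file system without stopping service
- The number of inodes in a file system can be changed without stopping service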
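Parallel file system advantage 4: information lifecycle management
mmapplypolicy:
1. Scan the metadata
2. Match the rules
3. Move the data
The directory structure does not change, and data movement is transparent to users.
[Figure: nodes 1 to 4 share the directory tree /home (..., file1, file2). The file system has the storage pools System, Pool1 and Pool2, backed by Fibre Channel storage and SATA storage; File1 and File2 sit in different pools while /home is unchanged]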
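The three steps listed for mmapplypolicy (scan metadata, match rules, move data) can be sketched in a few lines. This is only an illustration under stated assumptions: the `FileMeta` and `Rule` types, the pool names, and the 90-day threshold are hypothetical, and the code does not use or imitate the real GPFS policy language.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str               # visible path; never changes when data moves
    pool: str               # storage pool currently holding the data
    days_since_access: int


@dataclass(frozen=True)
class Rule:
    from_pool: str
    to_pool: str
    min_idle_days: int      # migrate files idle for at least this many days


def apply_policy(files: list[FileMeta], rules: list[Rule]) -> list[str]:
    """Return the paths whose data was moved to another pool."""
    moved = []
    for f in files:                                   # 1. scan the metadata
        for r in rules:                               # 2. match the rules
            if f.pool == r.from_pool and f.days_since_access >= r.min_idle_days:
                f.pool = r.to_pool                    # 3. move the data
                moved.append(f.path)
                break                                 # first matching rule wins
    return moved


if __name__ == "__main__":
    files = [
        FileMeta("/home/file1", "system", 200),
        FileMeta("/home/file2", "system", 3),
    ]
    rules = [Rule("system", "pool2", 90)]  # hypothetical: idle 90+ days goes to SATA
    print(apply_policy(files, rules), [(f.path, f.pool) for f in files])
```

Only the `pool` field changes; `path` stays the same, which is the sense in which the move is transparent to users.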
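Parallel file system capabilities and their value for digital libraries (using GPFS as the example)
Item | Maximum | Value for digital libraries
Nodes in a cluster | 8192 | High availability and high scalability; lower risk
Capacity of a single file system | 249PB | Massive information kept in one place; simpler storage management and policies; information can span storage systems; lower risk
Parallel file systems in a cluster | 256 |
Files in a single file system | 2,147,483,648 |
Logical volumes (LUNs) usable by a single file system | 268 million |
Capacity of each logical volume | Depends on what the disk array supports |
I/O bandwidth | 134GB/s | High performance; better service
File system information lifecycle management | Supported, runs transparently | Saves investment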
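Information availability, part 4: grid storage systems
Problems faced by traditional storage products (scale up)
Building blocks: disks, cache, controllers, interfaces, internal interconnect
Performance, reliability, scalability = $$$
In a traditional architecture, scalability depends on higher-performance (and more expensive) components:
- Dual-controller clusters
- Custom hardware
- High component cost
- Long, complex product development cycles
- Complex, reactive service
- Repeated rounds of performance tuning
[Figure: interfaces, controllers and cache in front of JBOD disk enclosures]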
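A revolutionary grid architecture that addresses the pain points of traditional storage (scale out)
Design principles:
- Massive parallelism
- Fine-grained data distribution
- Industry-standard components: tightly coupled disk, RAM and CPU
- Virtualized architecture with zero management
- Self-healing and automatic optimization
- Scalable grid node modules
- Nodes are relatively independent of one another
Open:
- Standard hardware modules with economical component costs
- Fast, efficient product development cycles
- A simple, easy service model
[Figure: interface modules and data modules connected through switching]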
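Data distribution algorithm in grid storage
- Every volume is spread across all disk drives, and all data is mirrored
- Data is divided into 1MB "partitions" that are stored on disk
- Partitions are distributed automatically, in a pseudo-random fashion, across all disks in the system
[Figure: data modules and interface modules connected through switching]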
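A minimal sketch of the idea on this slide: cut a volume into 1MB partitions and place each partition, plus a mirror copy, on pseudo-randomly chosen disks. The hash-based ranking and the rule that the mirror lands on a different module are assumptions made for illustration; the slide does not describe the product's actual placement function.

```python
import hashlib

PARTITION_SIZE = 1 << 20  # 1MB partitions, as on the slide

Disk = tuple[int, int]    # (module id, disk id within the module)


def _rank(key: str) -> int:
    """Deterministic pseudo-random rank derived from a hash of the key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def place_partition(volume: str, index: int, disks: list[Disk]) -> tuple[Disk, Disk]:
    """Choose a primary disk and a mirror disk for one partition.

    Assumption: the mirror must sit in a different module than the primary,
    so at least two modules are required.
    """
    ranked = sorted(disks, key=lambda d: _rank(f"{volume}:{index}:{d[0]}:{d[1]}"))
    primary = ranked[0]
    mirror = next(d for d in ranked[1:] if d[0] != primary[0])
    return primary, mirror


def layout(volume: str, size_bytes: int, disks: list[Disk]) -> list[tuple[Disk, Disk]]:
    """Return (primary, mirror) placements for every partition of the volume."""
    partitions = -(-size_bytes // PARTITION_SIZE)  # ceiling division
    return [place_partition(volume, i, disks) for i in range(partitions)]


if __name__ == "__main__":
    disks = [(m, d) for m in range(3) for d in range(4)]  # 3 modules x 4 disks
    for primary, mirror in layout("vol1", 5 * PARTITION_SIZE, disks):
        print(primary, mirror)
```

Because placement depends only on the volume name, the partition index and the disk list, the same inputs always give the same layout, and no central allocation table is needed in this sketch.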
Data distribution in grid storage when the system changes
- The data distribution changes only when the system changes
- Balance is preserved when new hardware is added
- Balance is preserved when old hardware is removed
- Balance is preserved when hardware fails
[Figure: initial state with Data Module 1 (Node 1), Data Module 2 (Node 2), Data Module 3 (Node 3)]
Data distribution in grid storage when the system changes: hardware upgrade
- The data distribution changes only when the system changes
- Balance is preserved when new hardware is added
- Balance is preserved when old hardware is removed
- Balance is preserved when hardware fails
[Figure: hardware upgrade; Data Module 4 (Node 4) joins Data Modules 1 to 3]
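Data distribution in grid storage when the system changes: hardware failure
- The data distribution changes only when the system changes
- Balance is preserved when new hardware is added
- Balance is preserved when old hardware is removed
- Balance is preserved when hardware fails
Complete, automatic data distribution ensures that every disk drive takes part in redistribution whenever the configuration changes. The result is a large performance gain during recovery and optimization.
[Figure: hardware failure among Data Modules 1 to 4]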
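The behaviour described on these slides (data moves only when the configuration changes, and the layout stays balanced afterwards) can be imitated with rendezvous hashing, also known as highest-random-weight hashing. This is a stand-in technique chosen for illustration, not the algorithm used by any particular grid storage product, and the module names are hypothetical.

```python
import hashlib
from collections.abc import Iterable


def _weight(partition: int, module: str) -> int:
    """Pseudo-random weight for a (partition, module) pair."""
    digest = hashlib.sha256(f"{partition}:{module}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(partition: int, modules: list[str]) -> str:
    """The module with the highest weight for this partition owns it."""
    return max(modules, key=lambda m: _weight(partition, m))


def moved(partitions: Iterable[int], before: list[str], after: list[str]) -> dict[int, tuple[str, str]]:
    """Map each relocated partition to (old owner, new owner)."""
    changes = {}
    for p in partitions:
        old, new = owner(p, before), owner(p, after)
        if old != new:
            changes[p] = (old, new)
    return changes


if __name__ == "__main__":
    parts = range(10_000)
    three = ["module1", "module2", "module3"]
    four = three + ["module4"]
    upgrade = moved(parts, three, four)            # hardware upgrade
    failure = moved(parts, four, ["module1", "module2", "module4"])  # module3 fails
    print(len(upgrade), "partitions moved on upgrade;", len(failure), "on failure")
```

In this scheme a new module only receives partitions and a removed module only gives them up, so partitions on unaffected modules never move among themselves.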
Information Security
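Information security, part 1: backup and recovery of digital library information
Centralized backup of enterprise data is becoming common:
- Centralized backup over the LAN
- LAN-free centralized backup
- Serverless backup
- Online data protection for databases
Common backup challenges in a digital library:
- Protecting massive data volumes
- Protecting very large numbers of small files
- Protecting NAS data
- Finding a backup window for a 24x7 service
[Figure: tension between business availability, massive information, and the backup window]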
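Common techniques for backing up massive file data
Incremental-forever backup (TSM):
- Single-instance, incremental-forever retention
- Point-in-time recovery
Synthetic backup (Veritas...):
- Keep incremental backups
- Merge them into a synthetic full backup to speed up recovery
Storage snapshots:
- Enable serverless backup and shrink the backup window
Virtual tape libraries:
- Improve recovery speed
Backing up NAS data:
- NAS-to-NAS backup with SnapVault
- SnapMirror to tape
[Figure: tension between business availability, massive information, and the backup window]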
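Comparison of three tape library technologies
Item | Physical tape library | Virtual tape library (VTL) | Deduplicating VTL
Media type | Tape | Disk | Disk
Manageable capacity | Unlimited | ≤2PB | Usually <1PB; IBM ≤25PB
Typical physical capacity | 50-300TB | Limited, 10-300TB | 5TB-30TB
Number of drives | ≤192 | ≤4096 | ≤512
Backup speed | 120MB/s per drive | ≤4.8GB/s | ≤1GB/s
Strengths | Simple, "inexpensive" | Fast, concurrent backup and recovery | Large capacity, concurrency, fast recovery
Weaknesses | Slow recovery | Capacity is what you physically install; limited performance; data must be exported to tape frequently | Speed has an upper limit
RTO | Long | Short | Short
Best suited to | All kinds of users | Users who demand high backup/recovery performance and are not price-sensitive | Customers balancing backup/recovery speed, management flexibility, and price
Solution pattern | Tape | VTL plus frequent export to tape | VTL plus occasional long-term retention on tape
Price | Low | High | Relatively low
Source: TheInfoPro, Inc. Wave 10 Survey, Jan 2008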
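Backup optimization solution 1: D-D-T (disk-to-disk-to-tape) backup
FlashCopy stage:
- Advantages: 1. The backup window for key servers is extremely short (seconds). 2. Comprehensive backup of whole systems, databases, mail and files, plus continuous data protection for desktops.
- Limitation: 1. Depends on the storage system and the SAN.
Virtual tape library stage:
- Advantages: 1. Fast recovery, up to 1GB/s. 2. A virtual tape library allows more concurrent backup streams and faster backup.
[Figure: database, file, other servers and desktops; storage subsystems with FlashCopy; SAN; backup server (Adv BK Server); virtual tape library; tape drive/library]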
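Backup optimization solution 2: IP storage and backup
- Primary storage: N5000 or N3000 with FC or SAS disks
- Near-line secondary NAS storage: N3000 with SATA disks
- Clone volumes (FlexClone)
- SnapVault for NAS-to-NAS backup
- SnapDiff incremental backup of NAS data
- SnapMirror to Tape
- D2D (backup features built into applications, backup software, and so on)
[Figure: application servers and backup server on the LAN; Gigabit Ethernet switch (IP SAN); primary storage; near-line NAS; tape library]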
Information security, part 2: self-encrypting tape drives
Self-encrypting tape drives:
- All LTO Gen 4 tape drives in the industry provide encryption for the open format
- The IBM TS1130 tape drive holds up to 1TB
- Standards-based key management (Standards-based Encryption Key Manager)
Value of the solution:
- Moving or losing physical media no longer creates a security problem
- Simpler encryption saves time and money
- Hardware encryption and decryption affect backup and recovery performance by less than 1%
- Simplified key management (Tivoli Key Lifecycle Manager)
Speaker notes: In September 2006, IBM introduced the industry's first encrypting tape drive, the IBM System Storage TS1120, and it has been a remarkable success with tens of thousands of units shipped since then. Shortly thereafter, IBM followed with the availability of encrypting LTO drives to support a wider range of lower cost tape environments. In July, IBM announced the third generation of the 3592 enterprise tape drive, the IBM System Storage TS1130 Tape Drive, the first drive to offer 1TB of native capacity (and a 160 MB/s data rate). In addition, several enhancements were announced to the IBM System Storage TS3500 and TS3310 Tape Libraries.
Tape systems supporting encrypting tape drives: Enterprise: TS3400, TS3500, TS7700, TS7500 (no attach to System z); LTO: TS3500, TS3310, TS3100, TS3200, and DR550. Another option is the System z ability to encrypt data written to tape with its Encryption Facility for z/OS.
With this new model of drive-level encryption, customers have come to appreciate the peace of mind when they remove and transport storage tapes. They don't have to worry about losing personal data if one of these tapes gets misplaced or stolen! This doesn't mean they can be more cavalier about managing tapes when they transport them, but the financial and overall business risk from losing them is greatly reduced. Not only are these tapes encrypted and unreadable to unauthorized parties, the IBM Encrypting Tape solution is also very simple to deploy: since the encryption is done within the drive, the process is transparent to the OS, applications, databases, system administrators and end-users.
What this means is that in most cases users can, and are recommended to, deploy the solution within their existing environment, which provides the benefit of encrypting data without the disruption of inserting specialized appliances or hardware accelerators into their networks.
Notes on Tivoli Key Lifecycle Manager (TKLM, formerly EKM): TKLM is part of the IBM Java environment and uses the IBM Java Security components for its cryptographic capabilities. TKLM has three main functions that are used to control its behavior:
- Keystore: customers have a choice of using key stores they've already deployed or installing new key stores, including the TKLM one. SW-based keystore type: JCEKS (file-based). HW-based keystore type: PKCS11IMPLKS (PKCS11 cryptographic device), System z, System i. In total about 40 3rd-party keystores are supported by TKLM's key serving engine (see below).
- Key serving (cryptographic services provider): this is the most valuable component of TKLM today. It transparently detects storage (tape) media, assigns unique encryption keys to each tape cartridge, and automatically serves the keys when a tape cartridge is mounted into the drive. The key serving mechanism supports about 40 3rd-party key stores and uses standard APIs, like PKCS11 and T10, to access 3rd-party key stores. In addition to the 40 3rd-party key stores supported, TKLM also supports 6 different types of key stores for additional implementation flexibility.
- Key management: maintains policies to perform cryptographic services, like what data gets encrypted, which keys to use, synchronization of key stores, and audit capabilities. These policies are stored in different places according to the encryption model the customer deploys (i.e., the tape library or the application itself). [Note: Future versions could include automatic rotation/deletion of keys and other automated policy functions.]
[Figure labels: TS3310 (3576), DR550, TS7740 (mainframe), TS3500 (3584)]
Extending encryption from tape to disk
Full disk encryption (FDE):
- Encrypted storage systems: encrypt the data volumes that require a high security level, combined with trusted key management
- Industry standards: FDE uses the TCG industry encryption standard (Trusted Computing Group security protocol); IBM is actively helping to define industry standards for key management
- Encryption and decryption run in the drive's own hardware, so the performance impact is negligible
[Figure: Enterprise Key Management, Host, Application Servers, System Admin, SAN]
Speaker notes: Last fall, IBM, Seagate, and LSI made a technology announcement that will bring drive-level encryption to disk storage systems in the data center, which is the next logical step to the existing encrypting tape solution. Customers and industry observers have shown great interest, and we're excited about extending storage encryption leadership to further protect our customers from the security threat of losing sensitive information when both tapes and disks are removed from the data center.
At a high level, the solution aims to address key IT requirements for simplicity and manageability for data-at-rest encryption:
- Simplified and proven key management system, operational in the largest banks in the world.
- Unified key management will handle all forms of storage: the encrypting disk solution will use TKLM (formerly EKM), the same key manager used for IBM's tape encrypting solution. As with the encrypting tape solution, the encrypting disk solution will be transparent to the OS, applications, databases, system administrators, and end-users, which will make deployment much simpler than deploying specialized encryption appliances.
- Designed for standards-based manageability: every one of the hard drive vendors is active in the Trusted Computing Group, the organization writing the standards for these self-encrypting drives. Standards drive interoperability, which drives volume and creates competition. Volume and competition drive cost. We expect the other HDD vendors to closely follow us with products.
- Maintains performance and linear scalability: the Seagate Secure drives include ASICs, which maintain I/O speeds, and since encryption is done within each drive, the system scales linearly without the additional hardware accelerators necessary with specialized encryption appliances.
At its core, the self-encrypting disk solution will consist of Seagate Secure full-disk encrypting drives. Each Seagate Secure drive will have an ASIC, which encrypts data as it enters the drive and decrypts data as it leaves the drive.
The role of the storage system is that it owns the disk drives, formats them and manages the data on them to ensure the appropriate data protection, data availability, performance, copy services, partitioning, and zoning. In our encryption model, the storage system is the connection and management point between the disk drives and the key server, so as data is written to or read from the encrypting drives, the storage system manages the interaction between the drives, the applications, and the key manager.
The role of the key management service is to manage the keys associated with encrypting and decrypting the data. The Tivoli Key Lifecycle Manager (previewed in April and scheduled to be available in Q4) will transparently detect encryption-capable media and assign the authorization keys necessary to lock and unlock individual drives. The key manager includes backup and synchronization for high availability and long-term retention, as well as auditing capabilities for both internal and external compliance purposes. Since it is a Java-based application, TKLM can run on most existing server platforms to leverage the resident server's existing access control and high availability/disaster recovery configurations, which greatly simplifies implementation of this model. For the various security reasons mentioned earlier, we recommend deploying TKLM on the mainframe if customers have one.
Lastly, this is intended to be a standards-based solution.
Every one of the hard drive vendors is active in the Trusted Computing Group, the organization writing the standards for these self-encrypting drives. Standards drive interoperability, which, in turn, drives volume and competition. With volume and competition come lower prices. IBM, Seagate, and LSI are actively involved in the development and ratification of these standards, and the whole storage industry is moving aggressively to bring these standards to market.
[Figure labels: NAS Systems, Tape, High-end Storage System, Midrange Storage System]
Information Retention and Information Compliance
The value of tiered storage: lower infrastructure cost
- Archive inactive or rarely accessed data to low-cost storage to raise utilization and cut cost
- According to Forrester research, 85% of production data is inactive, and 68% of data has not been accessed in the past 90 days
- According to an IDC survey, 40% of content is active or frequently accessed
[Chart: $/GB for Production Disk, Archive Disk, Online Tape and Retention Systems at 1, 3, 5 and 20 years; active data (high duty cycle) versus inactive data (low duty cycle). Sources: various research data; SNIA/Source Consulting]
Speaker notes (note to presenter): Sources: SNIA/Source Consulting, Strategic Research, IDC. Try to get your customer to understand what issues exist because of growing file systems, and what analysts have said about these problems. Research from the Storage Networking Industry Association's Source Consulting indicates that 51% of all open systems data is unnecessary, duplicate, or non-business related, and 68% of data has not been accessed in 90 days or more. With the results that 55% of unplanned server outages are the result of running out of storage, and a 60% improvement in management efficiency is needed every year to keep up with storage growth. These industry facts underscore the magnitude of today's data-management challenges.
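Information retention must account for cost and energy consumption
A combined disk and tape solution can cut cost by 50%
Example: 10-year TCO, assuming 250TB of storage growing 25% per year
- SATA disks cost less than FC disks
- Deduplicating virtual tape libraries raise storage efficiency and lower the cost of storing massive data
- Tape costs less than disk and uses less energy
ILM best practices:
- How is data retrieved?
- How do you make sure the retrieved data is usable?
Blended solution:
- Online access to a large volume of recent documents
- Low-cost, low-energy long-term data retention
[Chart: 10-year TCO in millions of dollars (axis $0 to $7) for SATA Disk, Tape, and Blended Disk and Tape, broken down into floor space, power and cooling, maintenance, tape retention (Prod + DR carts), and hardware. Data labels: $6,365,950; $2,255,346; $946,405. Caption: benefiting from the blended solution]
* TCO estimates based on IBM internal studies.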
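To make the sizing assumption behind this TCO example concrete, the sketch below compounds 250TB at 25% per year. It covers capacity only; it does not attempt to reproduce the cost figures in the chart. The function name is hypothetical, and treating "10 years" as ten full growth periods is an assumption.

```python
def projected_capacity_tb(initial_tb: float, annual_growth: float, years: int) -> list[float]:
    """Capacity at the start (year 0) and after each year of compound growth."""
    return [initial_tb * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    for year, tb in enumerate(projected_capacity_tb(250, 0.25, 10)):
        print(f"year {year:2d}: {tb:8.1f} TB")
```

Under these assumptions the footprint grows to roughly 2,328TB after ten years, a little over nine times the starting capacity, which is why the choice of medium for older data dominates the long-term cost.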
Issues a tiered storage solution must consider
Diversity of storage media:
- Several types of storage devices are needed: high performance online, good price/performance near-line, and high scalability offline
Movement and management of information between tiers:
- Information identification, information migration and information recall must be implemented, and the approach differs by type of business data
Special requirements for long-term retention:
- Flexibility: information lives longer than storage hardware
- Scalability: massive data will outgrow the capacity a single storage system can support
- Sustainability: to save space and energy, information should ultimately be able to migrate to tape
- Security: data on near-line and offline storage also needs protection such as backup and encryption
- Compliance: the solution should satisfy internal audit rules or national laws that require information to be tamper-proof
Speaker notes: In addition to the active data that can be pooled and shared, almost every IT environment has inactive data that it must manage. This is the quarter-end financial report that is looked at briefly and then maintained as bookkeeping records. It's the regulatory compliance data that must be maintained. And it's the copies of active data that are maintained for disaster recovery purposes. In most IT environments, the bulk of the gigabytes stored are different types of inactive data. To help manage the cost of storing all this data, IT managers have always been able to buy storage devices with a variety of costs/MB. An On Demand storage environment takes those devices and adds the software that manages them as a hierarchy of progressively lower cost storage. (click) With an On Demand storage environment, inactive files may be moved to progressively lower cost storage as the files get older. For some inactive file types, IT managers may choose to duplicate the files when they move from a media type that is protected against media failure by RAID (like disk) to a media type that is not protected against media failure (like tape). For other types of inactive data, perhaps regulatory compliance data that must be stored for many, many years, IT managers are likely to want to take advantage of newer media technologies before the data expires. An On Demand storage environment can help manage the transition to newer media types, updating the inventory as the data is moved. Some IT managers may also be faced with regulations that dictate certain kinds of WORM media. An On Demand storage environment can assist with compliance management by getting the right files to the right media type.
And finally, IT managers may also need to vault certain types of inactive files for disaster recovery purposes. Comprehensive management of inactive files through a hierarchy of lower cost storage is another characteristic of an On Demand storage environment.
Fully Managed Costs: Storage Options
- Automation is the key: policy-based, automatic data migration
- Data must stay accessible throughout its lifecycle
- Match the value of the data to the storage technology and management applied to it
- Make sure data can be accessed when it is needed
- Today: migrate data to cost-efficient storage according to current data usage policies
- Future: migrate data to energy-efficient storage while still meeting user needs
- Nearly 1000 different regulations impact data retention. Unstructured data represents nearly 80% of all data.
- Leverage automated data migration from tier to tier
[Chart: relative cost / GB / year (scale 10 to 70) for High Perf Disk, Nearline Disk, Online Archive (Disk), Online Tape and Offline Tape; access speed from fast/high to slow/low; time axis: 1 Hour, 1 Day, 2 Months, 3 Years, 5 Years, 20 Years, 50 Years, 100+ Years]
Architecture of the tiered information storage solution
Application and information management layer:
- IBM CommonStore, IBM Content Manager
- IBM FileNet P8 Content Manager, Image Services, SAP Connector
- IBM Records Crawler
- IBM Optim
- EMC Documentum
Information retention layer:
- Tivoli Storage Manager
- System Storage Archive Manager
Storage layer:
- IBM DR550 (includes SSAM)
- Tape systems
- Disk systems
- N series with SnapLock feature
[Figure: applications (Records, Images, Files, E-mail, SAP, DBMS, Siebel, Peoplesoft, PACS); archive applications (TSM Client, CommonStore, FileNet, Optim, GMAS; Enterprise Archive Services); archive / tiered storage infrastructure (SSAM, TSM); information retention systems (DR550, Tape/Optical, Disk / N series, Virtual Tape at a remote site)]
PACS = Picture Archive Communication System
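Stronger protection makes information more secure
Non-erasable, non-rewritable (NENR) storage:
- Policy-managed: IBM DR550, N Series, WORM tape
- EMC Centera
Information encryption:
- System-managed, for disk and tape
- Application-managed
- LTO4 tape drives from all vendors
- IBM TS1130 tape drives
- IBM DS8000/DS5000 encrypting disk
[Figure labels: data protection, compliance, access and security]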
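Improving storage utilization: enterprise-class data deduplication
Key points of deduplication technology:
- Guarantees data integrity
- High performance
- Supports large capacity, at PB scale
- Fits smoothly into the existing environment
[Figure: a new data stream (blocks A, B, C) is checked by a hashing algorithm against a memory-resident index before being written to the repository; backup servers connect through an FC switch to the TS7650G and its disk arrays]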
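The data flow in the figure (new data stream, memory-resident index, repository) can be sketched with a generic hash-based approach. The fixed-size chunking, the SHA-256 digest, and the `DedupStore` class are assumptions made for illustration; this is not a description of the matching algorithm inside the TS7650G.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        # Memory-resident index and repository combined: digest -> chunk bytes.
        self.index: dict[str, bytes] = {}

    def write(self, data: bytes) -> list[str]:
        """Store a data stream and return its recipe (ordered list of digests)."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.index.setdefault(digest, chunk)  # only new chunks consume space
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original stream from its recipe."""
        return b"".join(self.index[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.index.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    recipe = store.write(b"AAAABBBBAAAA")      # the chunk AAAA appears twice
    print(len(recipe), store.stored_bytes())   # 3 chunks referenced, 8 bytes kept
    assert store.read(recipe) == b"AAAABBBBAAAA"
```

Reading back through the recipe and comparing with the input is the simplest form of the data integrity check the slide lists first.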
Q & A
Thank You!
Fan Hua (樊华), Storage Architect
021-60924160
13701645233
fanhua@cn.ibm.com