Lenovo DSS Parallel Storage
Zhang Moqiong (张莫穷), Lenovo HPC Team, zhangmq3@lenovo.com
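DSS/GSS Parallel Storage System
RAID and disk management functions move from the disk array controller to the I/O servers.
- SAN storage solution: clients, File Server 1 and File Server 2, disk array controller, disk expansion enclosures.
- GSS storage solution: clients, I/O servers (File Server 1 and File Server 2, x3650) running GPFS Native RAID, disk expansion enclosures (JBOD*).
* JBOD: just a bunch of disks, i.e. an array of hard drives.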
Lenovo GSS Parallel Storage
Lenovo GSS with NL-SAS: GSS22 (12U), GSS24 (20U), GSS26 (28U)
- GSS v2.5.9 supports 8 TB drives; GPFS version 4.1.1
- Two x3650 M5 servers
- 12 Gb/s SAS connection to the disk JBODs
- HPC network attachment: 10 Gb / 40 Gb / FDR; EDR interoperability since GSS v2.5.8
- Two, four or six JBODs (4U60)
- Drives supported: 3, 4, 6 and 8 TB NL-SAS
- Usable capacity 0.5 / 1.0 / 1.6 PB (6 TB, 8+2P)
- Usable capacity 0.7 / 1.4 / 2.1 PB (8 TB, 8+2P)
- Expands in "building blocks": capacity and performance increase together
Lenovo DSS Model G2x0
Models shown: DSS G210 (9U), DSS G260 (34U)
- DSS-G release date: 1Q CY2017
- Supports Lenovo D3284 JBODs (5U, 84 drive bays)
- Supports Lenovo D1224 JBODs (2U, 24 drive bays)
- Two x3650 M5 servers
- SAS disk attachment (12 Gb/s)
- HPC high-speed network support: Ethernet, InfiniBand, OPA
- One to six D3284 JBODs (5U84, 12 Gb/s)
- Supports 4, 6, 8 and 10 TB NL-SAS disks
- Up to 3.9 PB usable capacity (10 TB, 8+2P)
- Scale-out: add building blocks
- Scale-up: up to six enclosures

Notes for capacity calculations:
- Use 2 disks of spare capacity per enclosure (same as the current 4U60), so 82 drives are usable.
- Use 2 x SSD in enclosure 1, and populate those slots in the remaining enclosures (no holes as in 4U60).
- Use 8+2P for a fair comparison with competitors (8/10 = 80% space efficiency); GSS can also use 8+3P, which is only 8/11 = 73% space efficient.
- GSS28 (44U) would be 5.2 PB usable (10 TB, 8+2P).
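The capacity rule in the notes (82 usable drives per 84-bay enclosure, 8+2P at 80% space efficiency) can be checked with a few lines of Python. This is only a sketch of that arithmetic: the function name and parameters are hypothetical, and it ignores file system overhead and TB/TiB conversion, which the slide does not discuss.

```python
def usable_capacity_tb(enclosures, drive_tb, bays=84, spares=2,
                       data_strips=8, parity_strips=2):
    """Usable capacity in TB, following the slide's rule of thumb.

    Assumption (from the slide notes): every enclosure contributes
    bays - spares drives, and the RAID code keeps
    data_strips / (data_strips + parity_strips) of the raw space.
    """
    usable_drives = enclosures * (bays - spares)
    raw_tb = usable_drives * drive_tb
    return raw_tb * data_strips / (data_strips + parity_strips)


# Six D3284 enclosures with 10 TB drives, 8+2P
print(f"{usable_capacity_tb(6, 10):.0f} TB")   # 3936 TB, i.e. about 3.9 PB
# Eight enclosures (the hypothetical 44U configuration), 8+2P
print(f"{usable_capacity_tb(8, 10):.0f} TB")   # 5248 TB, i.e. about 5.2 PB
# Six enclosures with 8+3P instead of 8+2P
print(f"{usable_capacity_tb(6, 10, parity_strips=3):.0f} TB")
```

Both 8+2P results agree with the 3.9 PB and 5.2 PB figures quoted on the slide.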
De-clustering: a key feature of IBM System x GPFS Storage Server
- Conventional layout: 3 one-fault-tolerant mirrored groups (RAID1) on 6 disks, plus 1 spare disk.
- Each group holds 7 stripes of 2 strips each: 21 stripes (42 strips) in total, plus 7 spare strips, for 49 strips.
- Declustered layout: the 49 strips are spread across all 7 disks.
Rebuild overhead reduced by 3.5x
De-clustering can reduce data rebuild overhead by roughly 4 to 6 times.
- Conventional RAID, failed disk: a large number of stripes is completely contained on a small number of disks. Rebuild activity is confined to just a few disks: slow rebuild, disrupts user programs.
- Declustered RAID, failed disk: rebuild activity is spread across many disks: faster rebuild or less disruption to user programs (nominally 3%).
(Diagrams: read and write activity per disk over time for each layout.)
De-clustered RAID6 enables higher data availability when disks fail
- Traditional: 14 physical disks / 3 traditional RAID6 arrays / 2 spares. With two failed disks, number of stripes with 2 faults = 7.
- Declustered: 14 physical disks / 1 declustered RAID6 array / 2 spares; data, parity and spare are declustered. With two failed disks, number of stripes with 2 faults = 1.
(Diagram: number of faults per stripe for the red, green and blue arrays.)
Critical Rebuild Test with 8+3P and 3 disk failures
Percentage of critical stripes on 8+3P after 3 disk failures in a 58-disk array: (11/58) * (10/57) * (9/56) = 0.5%
(4 TB ~ 24 h; 20 GB ~ 8 min)
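The critical-stripe percentage above can be reproduced as follows. This is a sketch that assumes each stripe's 11 strips sit on distinct disks chosen uniformly at random, so a stripe is critical only when all three failed disks are among them; the function name is hypothetical, and 1 TB is taken as 1000 GB.

```python
from fractions import Fraction


def critical_stripe_fraction(stripe_width, disks, failures):
    """Probability that one stripe has a strip on every failed disk.

    Assumes the stripe's strips are on distinct, uniformly chosen disks.
    """
    frac = Fraction(1)
    for i in range(failures):
        frac *= Fraction(stripe_width - i, disks - i)
    return frac


# 8+3P means 11 strips per stripe; 58-disk array; 3 failed disks
f = critical_stripe_fraction(11, 58, 3)
print(f"{float(f):.2%}")                 # 0.53%
# Data on a 4 TB disk that needs the urgent (critical) rebuild
print(f"{4000 * float(f):.1f} GB")       # 21.4 GB
```

The second figure lines up with the roughly 20 GB quoted on the slide, which is why the critical part of the rebuild is so much shorter than rebuilding a full 4 TB disk.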
GSS v3.1: Enclosure Expansion (Scaling "Up")
Alternative view without animation.

Initial Deployment: GSS22
- Storage enclosures: 2 x 200 GB SSD, 116 x 10 TB NL-SAS, 12 Gb/s
- Raw capacity = 1160 TB; usable capacity < 880 TB
- Streaming performance: write < 10 GB/s, read < 14 GB/s
- Rack layout: GSS Server 1.1 and GSS Server 1.2 (2 x EDR, 1 x GbE); Storage Enclosure 1.1 (58 x NL-SAS + 2 x SSD); Storage Enclosure 1.2 (58 x NL-SAS)
- Comment: The GSS22 is not intended for scaling out; as a best practice, expand to a GSS24 or GSS26 before scaling out.

Second Phase Deployment: GSS24
- Storage enclosures: 2 x 200 GB SSD, 232 x 10 TB NL-SAS, 12 Gb/s
- Raw capacity = 2320 TB; usable capacity < 1761 TB
- Streaming performance: write < 20 GB/s, read < 27 GB/s
- Rack layout: adds Storage Enclosures 1.3 and 1.4 (58 x NL-SAS each)

Final Deployment: GSS26
- Storage enclosures: 2 x 200 GB SSD, 348 x 10 TB NL-SAS, 12 Gb/s
- Raw capacity = 3480 TB; usable capacity < 2653 TB
- Streaming performance: write < 23 GB/s, read < 35 GB/s
- Rack layout: adds Storage Enclosures 1.5 and 1.6 (58 x NL-SAS each)

Comment: This feature also applies to SSD models (GSS21s → GSS22s → GSS24s) and 10K SAS models (GSS22s → GSS24s → GSS26s).
2016 Lenovo Internal. All rights reserved.
DSS Overview

Model      Reads  Writes
DSS-G 240  31     24
DSS-G 260  34     20

- Models shown: DSS G202, DSS G220, DSS G240, DSS G260, DSS G280
- D3284-based configurations (x3650 M5 HPIO servers): 164, 334, 502 and 670 x NL-SAS drives
- D1224-based configurations (x3650 M5 HPIO servers): SSD / SAS option for high performance / IOPS
- Positioning: low cost of entry, performance optimized, capacity optimized

HPIO = High Performance I/O
2017 Lenovo Internal. All rights reserved.
Overall Advantages of DSS Parallel Storage
- Declustered RAID: reduces system load during rebuild; rebuild speed improves by 4 to 8 times.
- High performance: the x3650 M5 far outperforms storage controller chips.
- Data consistency, reliability and flexibility: end-to-end checksum; 2 and 3 fault tolerance; application-optimized RAID.
- High-speed network interconnect: cluster and storage traffic including failover; FDR/EDR/OPA100/10GbE/25GbE/40GbE/100GbE.
- Integrated server and storage packaging: improves density and efficiency.
- Software-based controller (Spectrum Scale RAID): reduces hardware overhead and cost; enables enhanced functionality.
DSS Storage System Roadmap
- Distributed Storage Solution for IBM Spectrum Scale: defined solution especially for large-capacity, high-performance workloads in HPC environments.
- Distributed Storage Solution for SUSE Enterprise Storage: defined solution especially for interaction with Lenovo scale-out HANA solutions.
- Distributed Storage Architecture for SUSE Enterprise Storage / Red Hat Ceph Storage: tested architecture as an entry-point and mid-range Ceph offering in HPC environments.
- Distributed Storage Architecture for Intel Lustre EE: tested architecture as an entry-point and mid-range Lustre offering in HPC environments.