The Birth of a 100-Teraflops Supercomputer
Name: Xiangyu Ye
Title: Senior HPC Consultant, Microsoft China Technology Center
Company: Microsoft China
5/13/2019 12:38 AM
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Agenda: The development of high-performance computers; Dawning 5000 (曙光5000); The Linpack test story
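Top500 Trends over the Past 10 Years: Industry usage is growing quickly; 50% of systems use Gigabit Ethernet; Clusters account for more than 70% of systems; x86 platforms (Pentium/EM64T/Opteron) have reached 68%; High-performance computers are becoming mainstream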
Performance Growth: The high-performance computer market has boomed since 2000; The "Moore's Law" of high-performance computing: performance doubles every year
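The doubling claim on this slide can be turned into a quick projection. The sketch below is illustrative only: the function name `projected_performance` and the 1 Tflops starting value are assumptions made for the example, not figures from the presentation.

```python
def projected_performance(base_tflops: float, years: float,
                          doubling_period_years: float = 1.0) -> float:
    """Performance after `years` if it doubles every `doubling_period_years`."""
    return base_tflops * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point of 1 Tflops, doubling every year.
    for years in (1, 5, 10):
        print(years, "years:", projected_performance(1.0, years), "Tflops")
```

Under yearly doubling, ten years gives a factor of 2^10 = 1024, which is why a one-year doubling period compounds so much faster than a longer one.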
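The Questions Behind Performance: #1: Over the next 12 months, what will limit your use of high-performance cluster systems? #2: What kind of high-performance computing staff do you need most? From: http://www.linux-mag.com/microsites.php?site=business-class-hpc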
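The Evolution of the High-Performance Computing Concept (diagram labels): simple computing; high-performance computing; high-efficiency computing; problem solving; IT management; flexible system architecture; code optimization; high-speed interconnect; parallelization; Run-Time Lib; mathematical models; compilers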
Microsoft High-Performance Computing Product Roadmap
  V1 (Summer 2006): Mainstream High Performance Computing on Windows platform; Simple to set up and manage in familiar environment; Integrated with existing Windows infrastructure
  SP1 & Web (2007):
    Service Pack 1: Performance & Reliability Improvements; Support for Windows Server 2003 SP2; Support for Windows Deployment Services; Vista Support for CCP Client tools
    Web Releases: MOM Pack; PowerShell for CLI; Tools for Accelerating Excel
  Version 2 (H2 2008), Mainstream HPC: Mainstream High Performance Computing on Windows platform
    Interoperability: Web Services for Job Scheduler, Parallel File Systems
    Applications: Service Oriented, Batch, .NET
    Turnkey: Enabling pre-configured OEM solutions
    Scale: Large scale, non-uniform clusters, diagnostics framework
The Dawning 5000A (曙光 5000A) High-Performance Computer
System Overview
Basic Data: 1920 compute nodes; 30720 cores; 233 Tflops theoretical peak; 122.88 TB memory; 20 Gb/s InfiniBand ports
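The headline figures on this slide imply some per-node and per-core values. The following is a small arithmetic sketch using only the numbers above; it assumes the memory total uses decimal units (1 TB = 1000 GB), and the variable names are chosen for illustration.

```python
# Figures taken from the slide.
NODES = 1920
CORES = 30720
PEAK_TFLOPS = 233.0
MEMORY_TB = 122.88

# Derived values (assumption: 1 TB = 1000 GB, 1 Tflops = 1000 Gflops).
cores_per_node = CORES // NODES
memory_per_node_gb = MEMORY_TB * 1000 / NODES
peak_per_core_gflops = PEAK_TFLOPS * 1000 / CORES

if __name__ == "__main__":
    print("cores per node:", cores_per_node)
    print("memory per node (GB):", round(memory_per_node_gb, 2))
    print("peak per core (Gflops):", round(peak_per_core_gflops, 2))
```

The result, 16 cores and roughly 64 GB per node, is consistent with the four-socket blades described later in the deck if each socket holds a quad-core processor.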
Architecture
Overall Architecture
Overall Rendering
Compute Subsystem: Introduction to the TC2600 Blade Server
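TC2600 Blade Product Overview: Third-generation dual-core/quad-core blade server product; The first general-purpose blade server made in China; Independently developed, with full intellectual property rights; High efficiency, high density, high scalability, high stability, high availability; Higher compute density, more integrated functions, and stronger management capabilities; Modular design with an efficient architecture; Simple to use, simplified management, low complexity, low total cost of ownership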
Product Specification Overview: 19" standard rack-mount; 7U; 10 four-socket processor blades; 42.8% higher compute density; Fully modular design
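One way to read the 42.8% density figure is as a comparison against one server per rack unit: 10 blades in 7U is about 1.43 servers per U. That baseline is an assumption, not something the slide states. The sketch below also estimates how many chassis 1920 nodes would need, assuming every compute node is a TC2600 blade.

```python
import math

# Figures from the slides.
BLADES_PER_CHASSIS = 10
CHASSIS_HEIGHT_U = 7
TOTAL_NODES = 1920

# Assumed baseline: conventional rack servers at one server per 1U.
BASELINE_SERVERS_PER_U = 1.0

blades_per_u = BLADES_PER_CHASSIS / CHASSIS_HEIGHT_U
density_gain_pct = (blades_per_u / BASELINE_SERVERS_PER_U - 1) * 100

# Assumes all 1920 compute nodes are blades in full chassis.
chassis_needed = math.ceil(TOTAL_NODES / BLADES_PER_CHASSIS)
rack_units = chassis_needed * CHASSIS_HEIGHT_U

if __name__ == "__main__":
    print("density gain (%):", round(density_gain_pct, 2))
    print("chassis needed:", chassis_needed)
    print("rack units for compute:", rack_units)
```

The computed gain is about 42.86%, which matches the slide's 42.8% if that figure was truncated rather than rounded.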
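TC2600 Blade Main Components (figure labels): 7U; drive bays; compute blades
TC2600 Blade Main Components (figure labels): network module 1; network module 2; IB switch module; primary management module; secondary management module; IOE expansion module (with hot-swappable fans); power module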
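IB High-Speed Switch Module: The blade chassis provides one slot for an IB high-speed switch network module; Each module provides 10 external 4x DDR 20 Gbps InfiniBand ports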
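TC2600 Features: An original I/O expansion approach that removes a long-standing limitation of blades; A shared USB interface that provides the Share Media function; Power modules with an automatic intelligent regulation strategy, SRPM (Self Regulating Power Modules); Cooling modules with a linear pre-compensation strategy, LPCM (Linear Pre-Compensation Cooling Modules); Management modules that provide full-view management and control, FVMM (Full View Management Modules); An IB high-speed switch module with 800 Gb/s of switching bandwidth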
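Compute Network: Introduction to the 288-Port InfiniBand Switch
  Product overview (4X = 20 Gbps DDR): 14U switch chassis; 24 leaf slots and 6 spine slots; supports up to 6 spine modules; 6 power modules as standard (expandable to 12); 8 fans; the switch grows in units of 12 InfiniBand ports, up to a maximum of 288 fully interconnected InfiniBand ports
  Physical characteristics: backplane bandwidth of 11.52 Tbps; 623 mm high x 440 mm wide x 679.4 mm long; weight about 82 kg; power 1489 W; ships with 6 power modules, 8 fan modules, rack rails, a front panel, and power cables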
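The bandwidth figures quoted for the switches can be reproduced from port counts. This is a minimal sketch that assumes the quoted numbers count both directions of every port at the 20 Gb/s signaling rate, and that the blade switch module has 10 internal ports (one per blade) in addition to its 10 external ports; the slides do not state either assumption. The helper name `aggregate_bandwidth_gbps` is hypothetical.

```python
LINK_GBPS = 20  # 4x DDR InfiniBand signaling rate per direction


def aggregate_bandwidth_gbps(ports: int, link_gbps: int = LINK_GBPS,
                             full_duplex: bool = True) -> int:
    """Total switching bandwidth if every port runs at line rate."""
    directions = 2 if full_duplex else 1
    return ports * link_gbps * directions


# Blade chassis module: 10 external ports plus 10 assumed internal ports.
blade_module_gbps = aggregate_bandwidth_gbps(10 + 10)

# Large switch: 24 leaf slots, 12 ports per leaf module.
big_switch_ports = 24 * 12
big_switch_gbps = aggregate_bandwidth_gbps(big_switch_ports)

if __name__ == "__main__":
    print("blade module:", blade_module_gbps, "Gb/s")
    print("288-port switch:", big_switch_gbps / 1000, "Tb/s")
```

Under these assumptions the results are 800 Gb/s and 11.52 Tb/s, matching the slides. Note that 20 Gb/s is the signaling rate; with the 8b/10b encoding used by DDR InfiniBand, the usable data rate of a 4x link is 16 Gb/s.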
The Linpack Test Story
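Challenges: system stability; deployment and management; resource allocation; problem diagnosis
  System stability: 44 hours without a failure is equivalent to a single memory DIMM MTBF of 154.3 years
  Deployment and management: all 1920 servers deployed within 2 days; managing 1920 nodes is as easy as managing 16
  Resource allocation: resource units are node, CPU, and core; distribute, submit, or cancel a 30000-core job within 10 minutes
  Problem diagnosis: keeping all compute nodes consistent; network connectivity and performance; single-node performance; BSOD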
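The stability figure can be checked with a simple series-reliability model: if any single component failure stops the run and failures are independent with a constant rate, the system MTBF is the component MTBF divided by the component count. The sketch below assumes that model and assumes 30720 DIMMs; the slide gives neither, but that count reproduces the 154.3-year figure and would correspond to 4 GB per DIMM for 122.88 TB. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

# Assumption: number of DIMMs in the system (not stated on the slide).
ASSUMED_DIMM_COUNT = 30720


def component_mtbf_years(system_mtbf_hours: float, components: int) -> float:
    """Component MTBF needed for a given system MTBF (series model)."""
    return system_mtbf_hours * components / HOURS_PER_YEAR


def system_mtbf_hours(component_mtbf_yrs: float, components: int) -> float:
    """System MTBF when any one of `components` failures stops the run."""
    return component_mtbf_yrs * HOURS_PER_YEAR / components


if __name__ == "__main__":
    needed = component_mtbf_years(44, ASSUMED_DIMM_COUNT)
    print("required DIMM MTBF (years):", round(needed, 1))
    print("system MTBF (hours):",
          round(system_mtbf_hours(154.3, ASSUMED_DIMM_COUNT), 1))
```

The point of the slide follows directly: at this scale, even components with a mean time between failures of over a century leave the whole machine with less than two days of expected failure-free running.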
Test Process: Accelerate the cycle; Locate the problem; Leverage product features
Questions and Answers
© 2008 Microsoft Corporation. All rights reserved.