LU Decomposition and Eigenvalue Computation for Large Sparse Matrices
Chen Yingshi
2016.1.9 (2013.7.20)
Wide Applications of Sparse Matrix Solving

Matrix solving is at the core of numerical computation [1], and sparse matrix solving is one of its keys: it arises in partial differential equations, integral equations, eigenvalue problems, optimization, and more. Beyond roughly ten thousand unknowns, a dense-matrix treatment is infeasible, and the sparse solve is often the resource bottleneck: time, memory, and out-of-core storage.

Direct methods are based on Gaussian elimination, i.e., computing the LU factorization of A. A usually undergoes a series of permutations, which can be merged into a left permutation matrix P and a right permutation matrix Q. The basic steps are:
1) Symbolic analysis: obtain the permutation matrices P and Q
2) Numerical factorization: PAQ = LU
3) Forward and backward substitution: solve Ly = Pb and Uz = y, then x = Qz

I.S. Duff, A.M. Erisman, and J.K. Reid. Direct Methods for Sparse Matrices. London: Oxford Univ. Press, 1986.
J.J. Dongarra, I.S. Duff, et al. Numerical Linear Algebra for High-Performance Computers.
G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
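The three steps above can be sketched in the dense case (real sparse codes add symbolic analysis and compressed storage; this minimal example uses only row pivoting, so Q is the identity): factor PA = LU with partial pivoting, then forward and backward substitution.

```python
def lu_factor(A):
    """Return row permutation p and packed LU factors (L unit lower, U upper)."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    p = list(range(n))                 # row permutation (the matrix P)
    for k in range(n):
        # partial pivoting: pick the largest entry in column k
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        p[k], p[piv] = p[piv], p[k]
        for i in range(k + 1, n):      # eliminate below the pivot
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p, A

def lu_solve(p, LU, b):
    """Solve Ax = b given PA = LU: forward, then backward substitution."""
    n = len(LU)
    y = [b[pi] for pi in p]            # apply P to b
    for i in range(n):                 # forward: Ly = Pb (L has unit diagonal)
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):       # backward: Ux = y
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
p, LU = lu_factor(A)
x = lu_solve(p, LU, [4.0, 10.0, 24.0])   # exact solution is [1, 1, 1]
```

The factorization is computed once; each additional right-hand side costs only the two cheap triangular solves, which is why solvers separate the steps.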
Sparse Matrices Are Complex and Varied

Basic parameters: symmetry, sparsity, distribution of nonzeros
Sensitivity: ill-conditioned matrices, condition number
Many storage formats, e.g., the Harwell-Boeing Exchange Format
Test collections: Harwell-Boeing Sparse Matrix Collection, UF Sparse Matrix Collection
The Rapid Progress of Solvers

Example: BBMAT (http://www.cise.ufl.edu/research/sparse/matrices/Simon/bbmat.html), a matrix of order 38,744 whose factors hold over forty million entries:
1988: Cray-2 supercomputer, > 1000 s
2003: 4G, UMFPACK 4, 32.6 s [4]
2006: 3.0 GHz, GSS 1.2, 15 s
2012: 3.0 GHz, 4 cores, GSS 2.3, 4 s
2014: i7, 8 cores, GSS 2.4, 1 s

Driving forces: hardware (CPU, memory, etc.); maturing sparse algorithms (sparse ordering, dense submatrices, multifrontal, supernodal, ...); math libraries (BLAS, LAPACK; www.netlib.org).
Keys to Sparse LU Factorization

Sparse LU factorization = LU factorization that ignores zero entries:
1. Zeros are a dynamic notion; sparse ordering is needed to reduce fill-in.
2. Dense submatrices.

By the symbolic analysis and numerical factorization employed, direct methods fall into several classes [15]:
1) General technique: LU factorization driven by structures such as the elimination tree; suited to very sparse, unstructured matrices.
2) Frontal scheme: during LU factorization, several consecutive rows are merged into one dense block (the front), which is factorized with high-performance libraries such as BLAS.
3) Multifrontal method: rows (columns) with identical symbolic structure are merged into multiple dense blocks, and factorization proceeds in units of these blocks; their generation, elimination, assembly, and release require dedicated data structures and algorithms.
Sparse Ordering

Ordering reduces the fill-in produced during LU factorization. Finding the ordering with minimum fill-in is NP-complete (it cannot be solved in reasonable time), and for a concrete matrix there is currently no method or metric to predict which algorithm will do best. The reliable approach is therefore to run several orderings and compare the fill-in they produce. The two main families are minimum degree ordering (e.g., MMD, the multiple minimum degree algorithm) and nested dissection; mature ordering libraries include AMD [12], COLAMD, and METIS [13].

Yannakakis, M. Computing the minimum fill-in is NP-complete. SIAM J. Algebraic Discrete Methods, 1981, 2: 77-79.
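To make fill-in concrete, here is a tiny graph-elimination sketch (assumed symmetric pattern, made up for illustration): eliminating a vertex connects all its remaining neighbors pairwise, and each new edge is a fill-in entry. On an arrow/star pattern, eliminating the hub first is catastrophic, while the minimum-degree order produces no fill at all.

```python
def count_fill(adj, order):
    """Eliminate vertices in 'order'; return the number of fill-in edges."""
    adj = {v: set(nb) for v, nb in adj.items()}   # copy adjacency sets
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for u in adj:
            adj[u].discard(v)
        nbrs = [u for u in nbrs if u in adj]      # still-uneliminated neighbors
        for i, a in enumerate(nbrs):              # connect neighbors pairwise
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill

# Star pattern: vertex 0 is connected to every other vertex.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
bad  = count_fill(star, [0, 1, 2, 3, 4])   # hub first: 6 fill-in edges
good = count_fill(star, [4, 3, 2, 1, 0])   # leaves first (min degree): 0
```

This is exactly the effect the permutations P and Q are chosen to exploit.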
Approximate Minimum Degree Ordering - Quotient Graphs

The approximate minimum degree ordering algorithm (AMD) was introduced around 1996 by Patrick R. Amestoy and coauthors.

Amestoy, P. R., Davis, ... 1996a. An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Applic. 17, 4, 886-905.
Why Dense Submatrices (Dense Matrix)?

http://eigen.tuxfamily.org/index.php?title=Benchmark
The Multifrontal Method: An Overview

History: introduced by Duff and Reid [2]; analyzed and improved by J.W.H. Liu and others [3]; T.A. Davis developed UMFPACK [4].

Basic algorithm: exploit the sparsity of the matrix to obtain a sequence of dense submatrices (fronts), turning LU factorization into assembly, elimination, and update operations on these fronts.

Advantages of the multifrontal method: fronts are dense matrices, so high-performance libraries (BLAS, etc.) can be called directly; dense submatrices save index storage; parallelism improves.

Major solvers today: UMFPACK, GSS, MUMPS, PARDISO, WSMP, HSL MA41, etc.
How LU Factorization Forms Fronts

A matrix of order 10. Blue dots are nonzeros; red dots are fill-in produced by the factorization. Front partition: {a}, {b}, {c}, {d}, {e}, {f,g}, {h,i,j}.
Assembly, Elimination, and Update of the Fronts

[Figure: the elimination tree over the fronts {a}, {b}, {c}, {d}, {e}, {f,g}, {h,i,j}; each node lists the rows/columns assembled and eliminated at that front.]
The Elimination Tree

The elimination tree is the key structure of symbolic analysis. Each node corresponds to a column (row) of the matrix and is connected only to its parent, defined (in terms of the factor L) as

parent(j) = min{ i > j : l(i,j) != 0 }

J.W.H. Liu. The Role of Elimination Trees in Sparse Factorization. SIAM J. Matrix Anal. Applic., 11(1): 134-172, 1990.
J.W.H. Liu. Elimination structures for unsymmetric sparse LU factors. SIAM J. Matrix Anal. Appl., 14(2): 334-352, 1993.
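The elimination tree can be built directly from the pattern of A, without forming L, by Liu's path-compression algorithm; a compact sketch on a small assumed 5x5 symmetric pattern:

```python
def etree(pattern, n):
    """Elimination tree of a symmetric sparse pattern (Liu's algorithm):
    parent[j] = min{ i > j : L[i][j] != 0 }, computed without forming L."""
    parent = [-1] * n
    ancestor = [-1] * n                 # path-compressed ancestor pointers
    for j in range(n):
        for i in pattern[j]:            # nonzeros A[j][i] with i < j
            while i != -1 and i < j:    # climb from i to the current root
                nxt = ancestor[i]
                ancestor[i] = j         # path compression
                if nxt == -1:
                    parent[i] = j       # i's subtree now hangs under j
                i = nxt
    return parent

# Lower-triangular pattern of an assumed 5x5 symmetric matrix:
# pattern[j] lists the columns i < j with A[j][i] != 0.
pattern = {0: [], 1: [0], 2: [], 3: [2], 4: [1, 3]}
parent = etree(pattern, 5)   # -> [1, 4, 3, 4, -1]: node 4 is the root
```

Disjoint subtrees of the result can be factorized independently, which is the basis for the parallel task partition mentioned later.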
About GSS

Users in universities and research institutes: China Electric Power Research Institute, the University of Hong Kong, Nanjing University, Hohai University, China University of Petroleum, University of Electronic Science and Technology of China, China Three Gorges University, Shandong University.

Written in standard C, portable across platforms; faster and more stable than Intel PARDISO; CPU/GPU hybrid computing; overcomes the 32-bit Windows memory limit; 32 user parameters; supports user-customized modules.
Why They Choose GSS

- Crosslight (industry leader in TCAD simulation): the hybrid GPU/CPU version is more than 2x faster than PARDISO, MUMPS, and other sparse solvers.
- SoilVision (the most technically advanced suite of 1D/2D/3D geotechnical software): much faster than their own sparse solver.
- FEM Consulting (among the leading research teams in the finite element method since 1967): GSS is faster than PARDISO and provides many custom modules.
- GSCAD (leading building software in China): GSS provides a user-specific module to deal with ill-conditioned matrices.
- ICAROS (a global turnkey geospatial mapping service provider and developer of state-of-the-art photogrammetric technologies): GSS is faster than PARDISO, and technical help is provided.
- EPRI (China Electric Power Research Institute): 3-4x faster than KLU.
GSS - The Weighted Elimination Tree

The workload-weighted elimination tree: based on the elimination tree structure, GSS estimates the numerical-factorization workload, partitions the LU factorization into many independent tasks, and thereby lays the foundation for efficient parallel computation.
GSS - Dual-Threshold Column Pivoting
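The slide does not spell out GSS's dual-threshold rule, so the following is only a generic threshold-pivoting sketch with invented data: instead of always taking the largest entry (strict partial pivoting), any candidate within a factor tau of the largest is acceptable, which lets a sparsity-preserving row be chosen as pivot.

```python
def choose_pivot(col_vals, row_nnz, tau=0.1):
    """Threshold pivoting sketch. col_vals: {row: value} candidates in the
    pivot column; row_nnz: {row: nonzero count}. Among rows whose magnitude
    is at least tau * max, pick the sparsest row (a Markowitz-style tie-break
    between numerical stability and fill-in)."""
    vmax = max(abs(v) for v in col_vals.values())
    ok = [r for r, v in col_vals.items() if abs(v) >= tau * vmax]
    return min(ok, key=lambda r: row_nnz[r])

# Row 2 holds the largest value but has many nonzeros; row 0 is numerically
# acceptable (|0.5| >= 0.1 * |4.0|) and much sparser, so it wins.
pivot = choose_pivot({0: 0.5, 1: 0.02, 2: 4.0}, {0: 2, 1: 3, 2: 9})
```

Lowering tau trades stability for sparsity; a second threshold (hence "dual") would typically bound how far this relaxation may go.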
GSS - CPU/GPU Hybrid Computing

1) After dividing the LU factorization into many parallel tasks, GSS uses an adaptive strategy to run them on different hardware (CPU, GPU, ...): if the GPU has more computing power, it automatically receives more tasks; if the CPU is more powerful, GSS gives it more tasks.
2) Furthermore, if the CPU is idle (has finished all its tasks) while the GPU is still running one, GSS splits that task into smaller child tasks and assigns some of them to the CPU, so the CPU resumes computing and parallel efficiency rises.
3) GSS also runs calibration tests to find the best parameters for each hardware configuration.
GSS - Solving Matrices from a Frequency-Domain Spectral Element Method

The matrix is fairly dense: about 400,000 unknowns and 1.5 billion nonzeros; GSS needs about 15 GB of memory. One LU factorization: symbolic analysis 50 s, numerical factorization 46 s, substitution 2.5 s.

Multiple right-hand sides must be solved: 32 right-hand sides in 80 s? Estimated 80/4 = 20 s.

Further optimization: CPU/GPU hybrid computing (numerical factorization about 35 s); reuse of the symbolic analysis; exploiting the special structure of the matrix to further reduce the nonzeros.
Benchmark Comparison

The test matrices are all from the UF Sparse Matrix Collection. PARDISO is from Intel Composer XE 2013 SP1; GSS 2.4 uses CPU-GPU hybrid computing. The test machine has an Intel Core i7-4770 (3.4 GHz) with 24 GB memory and an ASUS GTX 780 graphics card (compute capability 3.5); NVIDIA CUDA Toolkit 5.5; Windows 7 64-bit. Both solvers use default parameters.

For large matrices that take a long time to factorize, GSS 2.4 is nearly 3x faster than PARDISO. For matrices with short solve times, PARDISO is faster than GSS; one reason is that the complex synchronization between CPU and GPU adds overhead.
Eigenvalue Computation for Large Sparse Matrices

"150 Years Old and Still Alive: Eigenproblems" - Henk A. van der Vorst, Gene H. Golub, 1997.

The eigenproblem: Ax = lambda * x, where lambda is an eigenvalue and x a corresponding eigenvector.

Sparse LU factorization is, in theory, just Gaussian elimination; sparse eigenvalue computation is much closer to pure mathematics.

J.H. Wilkinson. The Algebraic Eigenvalue Problem.
G.W. Stewart. Matrix Algorithms, Volume II: Eigensystems.
G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
Z. Bai et al. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. 2000.
Power Iteration

(Source: Math 685/CSI 700, Spring 08, George Mason University, Department of Mathematical Sciences)
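A minimal power-iteration sketch on a small assumed matrix: repeatedly apply A and normalize; the iterate converges to the dominant eigenvector, and its Rayleigh quotient to the dominant eigenvalue (3 for this matrix).

```python
def power_iteration(A, x, steps=100):
    """Power iteration: x_{k+1} = A x_k / ||A x_k||; returns the Rayleigh
    quotient of the final iterate and the iterate itself."""
    for _ in range(steps):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]  # y = A x
        norm = max(abs(v) for v in y)                            # normalize
        x = [v / norm for v in y]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    lam = sum(a * b for a, b in zip(Ax, x)) / sum(b * b for b in x)
    return lam, x

A = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 3 and 1
lam, x = power_iteration(A, [1.0, 0.0])   # lam -> 3
```

Convergence is linear with rate |lambda2/lambda1| (here 1/3 per step), which is why the refinements on the following slides exist.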
Power Iteration - An Intuitive Picture
Rayleigh Quotient Iteration
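A sketch of Rayleigh quotient iteration on an assumed symmetric 2x2 matrix: each step uses the current Rayleigh quotient rho as a shift, solves (A - rho*I) y = x (here by Cramer's rule), and normalizes. Near an eigenpair the convergence is cubic.

```python
def rqi(A, x, steps=6):
    """Rayleigh quotient iteration on a 2x2 symmetric matrix."""
    rho = 0.0
    for _ in range(steps):
        Ax = [A[0][0] * x[0] + A[0][1] * x[1],
              A[1][0] * x[0] + A[1][1] * x[1]]
        rho = (Ax[0] * x[0] + Ax[1] * x[1]) / (x[0] ** 2 + x[1] ** 2)
        M = [[A[0][0] - rho, A[0][1]], [A[1][0], A[1][1] - rho]]
        det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
        if abs(det) < 1e-12:           # rho is (numerically) an eigenvalue
            break
        y = [(x[0] * M[1][1] - x[1] * M[0][1]) / det,   # Cramer's rule
             (M[0][0] * x[1] - M[1][0] * x[0]) / det]
        norm = (y[0] ** 2 + y[1] ** 2) ** 0.5
        x = [y[0] / norm, y[1] / norm]
    return rho

A = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 3 and 1
rho = rqi(A, [1.0, 0.3])              # converges cubically toward 3
```

Note the price: each step needs a linear solve with a different shift, i.e., a new factorization, which is exactly where a fast sparse LU pays off.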
Subspace Iteration
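A minimal subspace (orthogonal) iteration sketch on an assumed diagonal test matrix: iterate a block of vectors, re-orthonormalizing with Gram-Schmidt each step; the block converges to the invariant subspace of the largest eigenvalues, and the Rayleigh-Ritz projection recovers them.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def gram_schmidt(V):
    """Orthonormalize the vectors in V (classical Gram-Schmidt)."""
    Q = []
    for v in V:
        w = v[:]
        for q in Q:
            c = sum(a * b for a, b in zip(q, v))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = sum(x * x for x in w) ** 0.5
        Q.append([x / n for x in w])
    return Q

def subspace_iteration(A, V, steps=60):
    for _ in range(steps):
        V = gram_schmidt([matvec(A, v) for v in V])
    # Rayleigh-Ritz projection H = V^T A V; here it is nearly diagonal,
    # so its diagonal approximates the dominant eigenvalues.
    return [[sum(vi * wi for vi, wi in zip(v, matvec(A, w))) for w in V]
            for v in V]

A = [[5.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
H = subspace_iteration(A, [[1.0, 1.0, 1.0], [1.0, -1.0, 0.5]])
# H[0][0] -> 5 and H[1][1] -> 3, the two largest eigenvalues of A
```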
Krylov Subspaces
Arnoldi Iteration
The Basic Arnoldi Algorithm

ARPACK: www.caam.rice.edu/software/ARPACK/

Morgan, R.B. On restarting the Arnoldi method for large nonsymmetric eigenvalue problems.
Lehoucq, R.B. Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration. PhD thesis.
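The basic algorithm can be sketched in a few lines: build an orthonormal Krylov basis Q and the small upper-Hessenberg matrix H satisfying A Q_k = Q_{k+1} H_k; the eigenvalues of H (Ritz values) approximate extreme eigenvalues of A. The example matrix is assumed for illustration.

```python
def arnoldi(matvec, b, k):
    """Arnoldi iteration: returns the orthonormal basis Q (list of vectors)
    and the (k+1) x k upper-Hessenberg matrix H."""
    nb = sum(x * x for x in b) ** 0.5
    Q = [[x / nb for x in b]]                     # q1 = b / ||b||
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        w = matvec(Q[j])
        for i in range(j + 1):                    # orthogonalize w against Q
            H[i][j] = sum(q * v for q, v in zip(Q[i], w))
            w = [wi - H[i][j] * qi for wi, qi in zip(w, Q[i])]
        H[j + 1][j] = sum(x * x for x in w) ** 0.5
        if H[j + 1][j] < 1e-12:                   # breakdown: invariant subspace
            break
        Q.append([x / H[j + 1][j] for x in w])
    return Q, H

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
mv = lambda x: [sum(a * v for a, v in zip(row, x)) for row in A]
Q, H = arnoldi(mv, [1.0, 0.0, 0.0], 3)
# with k = n the leading 3x3 block of H is unitarily similar to A,
# so their traces agree (2 + 3 + 4 = 9)
```

Restarting (ARPACK's implicit restart, or the Krylov-Schur form on the later slides) exists precisely because Q grows with every step and must be compressed.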
Computing Eigenvalues with the Arnoldi Iteration
Krylov Decompositions - Based on Unitary Similarity Transformations

The standard Arnoldi decomposition vs. the general Krylov decomposition (G.W. Stewart, Matrix Algorithms, Volume II: Eigensystems, p. 309), where Q is unitary and such transformations can be chained.

Decompositions based on unitary similarity transformations are backward stable. (Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, p. 173)
Advantages of the Krylov-Schur Decomposition

Easy selection of Ritz values as implicit shifts: the Schur decomposition reduces any matrix to upper triangular form whose diagonal holds its eigenvalues, and the eigenvalues can be reordered arbitrarily along that diagonal by unitary similarity transformations. Thus, after forming the Rayleigh matrix and computing all Ritz values, the wanted Ritz values can be moved to the front and the unwanted ones to the back; after the restart, only the selected Ritz values remain in the sequence.

Easy deflation (locking and purging): the idea is similar, though more involved.

Faster in practice: in tests, both the iteration count and the running time drop by about one third compared with ARPACK.

G.W. Stewart. A Krylov-Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23 (2001), pp. 601-614.
Krylov-Schur and Restarting

After a restart the B matrix becomes more complex, as shown in the figure on the right. For the restarted Krylov-Schur algorithm, see G.W. Stewart, Matrix Algorithms, Volume II: Eigensystems, pp. 329-330.
Faster Convergence

Across many practical examples it is clearly faster than the widely used ARPACK, typically needing only 60%-70% of ARPACK's iterations. Why?
Subspace Iteration with Approximate Spectral Projection

A recent algorithm: subspace iteration accelerated by an approximate spectral projection based on a Cauchy contour integral, which can find all eigenvalues inside a specified region of the complex plane. It mainly targets symmetric matrices and still needs to be extended to the nonsymmetric case.

P.T.P. Tang, E. Polizzi. FEAST as a subspace iteration eigensolver accelerated by approximate spectral projection.
The Shift-Invert Transformation

The standard shift-invert transformation replaces A by (A - sigma*I)^(-1), mapping an eigenvalue lambda of A to 1/(lambda - sigma), so the eigenvalues nearest the shift sigma become the dominant ones.

K. Meerbergen, D. Roose. Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems.
The Cayley Transformation

Properties of the Cayley transformation:
1. A line parallel to the imaginary axis is mapped onto the unit circle.
2. Points to the left of that line are mapped inside the unit circle.
3. Points to the right of that line are mapped outside the unit circle.

K. Meerbergen, D. Roose. Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems.
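The three properties can be checked numerically on scalars, using the eigenvalue map theta = (lambda - mu)/(lambda - sigma) of the Cayley transform (A - sigma*I)^(-1)(A - mu*I); the choice sigma = 1, mu = -1 below is an assumed example, for which the mapped line is Re(lambda) = 0.

```python
def cayley(lam, sigma=1.0, mu=-1.0):
    """Scalar Cayley map: the image of eigenvalue lam under
    (A - sigma*I)^(-1) (A - mu*I)."""
    return (lam - mu) / (lam - sigma)

on_line = abs(cayley(complex(0.0, 5.0)))    # Re = 0  -> |theta| = 1
left    = abs(cayley(complex(-2.0, 1.0)))   # Re < 0  -> |theta| < 1
right   = abs(cayley(complex(2.0, 3.0)))    # Re > 0  -> |theta| > 1
```

This is why the transform suits rightmost-eigenvalue problems: everything right of the chosen line lands outside the unit circle, where power-type iterations find it first.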
Filter Polynomials

G.W. Stewart. Matrix Algorithms, Volume II: Eigensystems, p. 317.
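As a scalar illustration of polynomial filtering (a generic Chebyshev example, not the specific filter discussed by Stewart): applying p(A) to a vector multiplies each eigencomponent by p(lambda), so a polynomial that stays bounded on the unwanted part of the spectrum while growing rapidly outside it enhances the wanted components.

```python
def cheb(k, x):
    """Chebyshev polynomial T_k(x) via the three-term recurrence."""
    t0, t1 = 1.0, x
    for _ in range(k - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1

# |T_10| stays <= 1 on the damped interval [-1, 1] but is huge at 1.5,
# so an eigencomponent at 1.5 is amplified thousands of times relative
# to components inside the interval.
inside  = abs(cheb(10, 0.3))
outside = abs(cheb(10, 1.5))
```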
Appendix 1

Aleksei Nikolaevich Krylov (1863-1945) showed in 1931 how to use sequences of the form {b, Ab, A2b, . . .} to construct the characteristic polynomial of a matrix. Krylov was a Russian applied mathematician whose scientific interests arose from his early training in naval science that involved the theories of buoyancy, stability, rolling and pitching, vibrations, and compass theories. Krylov served as the director of the Physics-Mathematics Institute of the Soviet Academy of Sciences from 1927 until 1932, and in 1943 he was awarded a "state prize" for his work on compass theory. Krylov was made a "hero of socialist labor," and he is one of a few mathematicians to have a lunar feature named in his honor: on the moon there is the "Crater Krylov."

Walter Edwin Arnoldi (1917-1995) was an American engineer who published this technique in 1951, not far from the time that Lanczos's algorithm emerged. Arnoldi received his undergraduate degree in mechanical engineering from Stevens Institute of Technology, Hoboken, New Jersey, in 1937 and his MS degree at Harvard University in 1939. He spent his career working as an engineer in the Hamilton Standard Division of the United Aircraft Corporation, where he eventually became the division's chief researcher. He retired in 1977. While his research concerned mechanical and aerodynamic properties of aircraft and aerospace structures, Arnoldi's name is kept alive by his orthogonalization procedure.
Appendix 2: References

[1] R. Kress. Numerical Analysis. Springer-Verlag, 1991.
[2] I.S. Duff, A.M. Erisman, J.K. Reid. Direct Methods for Sparse Matrices. London: Oxford Univ. Press, 1986.
[3] J.W.H. Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp. 82-109.
[4] T.A. Davis. A column pre-ordering strategy for the unsymmetric-pattern multifrontal method. ACM Trans. Math. Software, 30(2), pp. 165-195, 2004.
[5] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.
[6] G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
[7] J.W.H. Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp. 82-109.
[8] G.M. Del Corso, A. Gullí, F. Romani. Fast PageRank Computation via a Sparse Linear System (Extended Abstract).
[9] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, Boston, 1996.
[10] Y.S. Chen, Simon Li. Application of Multifrontal Method for Doubly-Bordered Sparse Matrix in Laser Diode Simulator. NUSOD, 2004.
[11] Y.S. Chen, W.Y. Wu, et al. Solving large structural equation systems with the multifrontal method. Building Structure, 2007(09). (in Chinese)
[12] X.L. Song, Y.S. Chen, et al. A block algorithm for solving large sparse linear systems in full-process dynamic simulation of power systems. (in Chinese)
Thank you very much!

MAIL: gsp@grusoft.com
QQ: 304718494
www.grusoft.com