
1 LU Factorization and Eigenvalue Computation for Large Sparse Matrices 陈英时 2016.1.9 / 2013.7.20

2 The wide range of applications of sparse matrix solving
Matrix solving is the core of numerical computation [1], and sparse matrix solving is one of its key problems: partial differential equations, integral equations, eigenvalue problems, optimization, ...
Dense-matrix methods become impractical beyond roughly 10,000 unknowns, and sparse solving is often the resource bottleneck: time, memory, and out-of-core storage.
Direct methods are based on Gaussian elimination, i.e. computing the LU factorization of A. A usually undergoes a series of permutations, which can be merged into a left permutation matrix P and a right permutation matrix Q. The basic steps are:
1) Symbolic analysis: obtain the permutation matrices P and Q.
2) Numerical factorization: PAQ = LU.
3) Forward and back substitution: Ly = Pb, Uz = y, x = Qz.
I. S. Duff, A. M. Erisman, J. K. Reid. Direct Methods for Sparse Matrices. London: Oxford Univ. Press, 1986.
J. J. Dongarra, I. S. Duff, et al. Numerical Linear Algebra for High-Performance Computers.
G. H. Golub, C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
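A minimal sketch of these three steps, using SciPy's SuperLU wrapper purely for illustration (the matrix, ordering, and sizes are assumptions; GSS and other direct solvers expose a similar factor-then-solve interface):

```python
# Symbolic analysis + numerical factorization + triangular solves,
# sketched with SciPy's SuperLU wrapper (not a specific solver's API).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

A = sp.random(1000, 1000, density=0.01, format="csc") + sp.eye(1000, format="csc")
b = np.random.rand(1000)

# Steps 1-2: the column ordering (Q) is chosen symbolically, row pivoting (P)
# happens during the numerical factorization, giving  Pr @ A @ Pc = L @ U.
lu = splu(A, permc_spec="COLAMD")
print("nnz(L) + nnz(U) =", lu.L.nnz + lu.U.nnz)

# Step 3: forward/back substitution,  L y = Pr b,  U z = y,  x = Pc z.
x = lu.solve(b)
print("residual:", np.linalg.norm(A @ x - b))
```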

3 Sparse matrices are complex and varied
Basic parameters: symmetry, sparsity, distribution of nonzeros.
Sensitivity: ill-conditioned matrices, condition number.
Many storage formats: Harwell-Boeing Exchange Format, ...
Test collections: Harwell-Boeing Sparse Matrix Collection, UF Sparse Matrix Collection.

4 The rapid progress of solvers
BBMAT: order 38,744, more than 40 million entries after factorization.
Cray-2 supercomputer: > 1000 s
UMFPACK 4: 32.6 s [4]
GSS: … s
GSS 2.3, 4 cores: 4 s
2014, 8-core i7, GSS 2.4: 1 s
Hardware progress: CPU, memory, ...
Sparse algorithms have matured: sparse ordering, dense subblocks, multifrontal, supernodal, ...
Math libraries: BLAS, LAPACK.

5 Keys to sparse LU factorization algorithms
Sparse LU factorization is LU factorization that ignores zero entries.
1. Zeros are a dynamic concept: sparse ordering is needed to reduce the fill-in.
2. Dense subblocks.
Depending on the symbolic analysis and the numerical factorization algorithm, direct methods fall into the following classes [15]:
1) General technique: the LU factorization is driven mainly by structures such as the elimination tree; suited to very sparse, unstructured matrices.
2) Frontal scheme: during the LU factorization, several consecutive rows are merged into one dense block (the front), and this block is factorized with high-performance libraries such as BLAS.
3) Multifrontal method: rows (columns) with the same symbolic structure are merged into multiple dense blocks, and the LU factorization proceeds in units of these blocks. Generating, eliminating, assembling, and releasing these blocks requires dedicated data structures and algorithms.

6 Sparse ordering
The role of an ordering algorithm is to reduce the fill-in produced during LU factorization. Finding the optimal ordering of a matrix is an NP-complete problem (it cannot be solved in reasonable time), and for a given matrix there is currently no method or metric that predicts which algorithm will do best. Testing different algorithms and comparing the fill-in they produce is therefore the reliable way to choose an ordering.
The main ordering algorithms are minimum degree ordering (MMD, minimum degree ordering algorithm) and nested dissection. Mature software libraries include AMD [12], COLAMD, and METIS [13].
Yannakakis M. Computing the minimum fill-in is NP-complete. SIAM J. Algebraic Discrete Methods, 1981, 2:77-79.
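As a hedged illustration of "test the orderings and compare the fill-in", the sketch below factorizes the same matrix under the orderings SuperLU happens to expose (natural, MMD on AᵀA+A, COLAMD; these stand in for the MMD/AMD/METIS libraries mentioned above) and reports nnz(L)+nnz(U):

```python
# Compare the fill-in produced by different column orderings.
# (A rough stand-in for comparing MMD / AMD / METIS; SuperLU only exposes
#  the orderings listed below, and the test matrix is random.)
import scipy.sparse as sp
from scipy.sparse.linalg import splu

A = sp.random(2000, 2000, density=0.002, format="csc") + sp.eye(2000, format="csc")

for ordering in ("NATURAL", "MMD_AT_PLUS_A", "COLAMD"):
    lu = splu(A, permc_spec=ordering)
    print(f"{ordering:>15}: nnz(L)+nnz(U) = {lu.L.nnz + lu.U.nnz}")
```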

7 Approximate minimum degree ordering – the quotient graph
The approximate minimum degree ordering algorithm (AMD, Approximate Minimum Degree Ordering Algorithm) was proposed around 1996 by Patrick R. Amestoy and co-authors.
Amestoy P. R., Davis T. A., Duff I. S. An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Applic. 17(4), 886-905, 1996.

8 Why dense subblocks (dense matrices) are needed

9 Introduction to the multifrontal method
History: introduced by Duff and Reid [2]; analyzed and improved by J. W. H. Liu and others [3]; T. A. Davis developed UMFPACK [4].
Basic algorithm: exploit the structure of the sparse matrix to obtain a sequence of dense subblocks (fronts), and turn the LU factorization into assembly, elimination, and update operations on these fronts.
Advantages of the multifrontal method:
The fronts are dense matrices, so high-performance libraries (BLAS, etc.) can be called directly.
Dense subblocks save index storage.
Parallelism is improved.
Main solvers today: UMFPACK, GSS, MUMPS, PARDISO, WSMP, HSL MA41, etc.

10 How LU factorization forms fronts
A matrix of order 10. Blue dots are nonzeros; red dots are the fill-in produced by the factorization.
Front partition: {a}, {b}, {c}, {d}, {e}, {f,g}, {h,i,j}

11 Assembly, elimination, and update of fronts
[Figure: the elimination tree over the fronts {a}, {b}, {c}, {d}, {e}, {f,g}, {h,i,j}, showing each front's assembled rows/columns and the update contributions passed to its parent.]

12 The elimination tree
The elimination tree is the key structure of the symbolic analysis. Each of its nodes corresponds to a column (row) of the matrix and is connected only to its parent node, where the parent is defined as parent(j) = min{ i > j : l_ij ≠ 0 }.
J. W. H. Liu. The Role of Elimination Trees in Sparse Factorization. SIAM J. Matrix Anal. Applic., 11(1), 1990.
J. W. H. Liu. Elimination structures for unsymmetric sparse LU factors. SIAM J. Matrix Anal. Appl., 14(2), 1993.
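A minimal sketch of computing these parent pointers for a structurally symmetric matrix, following Liu's path-compression idea (the function name and the toy matrix are illustrative only):

```python
# Elimination tree of a structurally symmetric sparse matrix:
# parent[j] = min{ i > j : l_ij != 0 }, computed without forming L.
import numpy as np
import scipy.sparse as sp

def elimination_tree(A):
    A = sp.csc_matrix(A)
    n = A.shape[0]
    parent = np.full(n, -1, dtype=int)
    ancestor = np.full(n, -1, dtype=int)   # path-compressed "virtual" parents
    for j in range(n):
        for i in A.indices[A.indptr[j]:A.indptr[j + 1]]:
            if i >= j:
                continue                   # only entries above the diagonal
            r = i
            while ancestor[r] != -1 and ancestor[r] != j:
                nxt = ancestor[r]
                ancestor[r] = j            # compress the path towards j
                r = nxt
            if ancestor[r] == -1:
                ancestor[r] = j
                parent[r] = j
    return parent

# Toy symmetric pattern (not the 10-by-10 matrix from the slide).
A = sp.csc_matrix(np.array([[4, 1, 0, 1],
                            [1, 4, 1, 0],
                            [0, 1, 4, 1],
                            [1, 0, 1, 4]], dtype=float))
print(elimination_tree(A))   # the last column has parent -1 (the root)
```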

13 GSS overview
Written in standard C, portable across platforms.
Faster and more stable than Intel PARDISO.
Hybrid CPU/GPU computing.
Breaks the 32-bit Windows memory limit.
32 user parameters; supports user-customized modules.
Universities and research institutes using GSS: China Electric Power Research Institute, The University of Hong Kong, Nanjing University, Hohai University, China University of Petroleum, University of Electronic Science and Technology of China, China Three Gorges University, Shandong University.

14 Why they chose GSS
Crosslight: industry leader in TCAD simulation. The hybrid GPU/CPU version is more than 2 times faster than PARDISO, MUMPS, and other sparse solvers.
SoilVision: the most technically advanced suite of 1D/2D/3D geotechnical software. Much faster than their own sparse solver.
FEM Consulting: a leading research team in the finite element method since 1967. GSS is faster than PARDISO and provides many custom modules.
GSCAD: leading building software in China. GSS provides a user-specific module to deal with ill-conditioned matrices.
ICAROS: a global turnkey geospatial mapping service provider and developer of state-of-the-art photogrammetric technologies. GSS is faster than PARDISO and also provides technical help.
EPRI: China Electric Power Research Institute. 3-4 times faster than KLU.

15 GSS – weighted elimination tree
A workload-weighted elimination tree: the cost of the numerical factorization is estimated on the elimination-tree structure, and the LU factorization is split into many independent tasks, laying the foundation for efficient parallel computation.

16 GSS – dual-threshold column pivoting algorithm

17 GSS – hybrid CPU/GPU computing
1) After dividing the LU factorization into many parallel tasks, GSS uses an adaptive strategy to run these tasks on different hardware (CPU, GPU, ...). If the GPU has more computing power, it automatically receives more tasks; if the CPU is more powerful, GSS gives more tasks to the CPU.
2) Furthermore, if the CPU is idle (has finished all of its tasks) while the GPU is still running a task, GSS splits that task into smaller child tasks and assigns some of them to the CPU, so the CPU computes again and the parallel efficiency is higher.
3) GSS also runs some tests to find the best parameters for each hardware configuration.
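The adaptive assignment described above can be pictured as a shared task queue that each device drains at its own speed; the sketch below is a generic illustration of that idea (two thread workers standing in for the CPU and GPU executors), not GSS's actual scheduler:

```python
# Generic work-sharing sketch: "CPU" and "GPU" workers pull tasks from one
# queue, so the faster device automatically ends up doing more of the work.
# (Illustration only; the GPU worker is simulated with a shorter sleep.)
import queue, threading, time

tasks = queue.Queue()
for t in range(32):
    tasks.put(t)                      # 32 independent factorization tasks

done = {"cpu": 0, "gpu": 0}

def worker(name, seconds_per_task):
    while True:
        try:
            tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(seconds_per_task)  # stand-in for the real kernel
        done[name] += 1

cpu = threading.Thread(target=worker, args=("cpu", 0.02))
gpu = threading.Thread(target=worker, args=("gpu", 0.01))  # the "faster" device
cpu.start(); gpu.start(); cpu.join(); gpu.join()
print(done)                           # the GPU worker finishes more tasks
```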

18 GSS – solving matrices from a frequency-domain spectral-element method
The matrix is relatively dense: about 400,000 unknowns and 1.5 billion nonzeros; GSS needs roughly 15 GB of memory.
One LU factorization: symbolic analysis 50 s, numerical factorization 46 s, back substitution 2.5 s.
Multiple right-hand sides must be solved: 32 right-hand sides in 80 s? Estimated 80/4 = 20 s.
Further optimization: hybrid CPU/GPU computing brings the numerical factorization to about 35 s; reuse the symbolic analysis results; exploit the special structure of the matrix to further reduce the nonzeros.
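The "factor once, then solve many right-hand sides" point can be sketched as follows (SciPy's SuperLU stands in for GSS, and the sizes are illustrative, not the 400,000-unknown matrix above):

```python
# Factor once, then reuse the LU factors for many right-hand sides:
# each extra right-hand side only costs a forward/back substitution.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 5000
A = sp.random(n, n, density=0.001, format="csc") + sp.eye(n, format="csc")
B = np.random.rand(n, 32)            # 32 right-hand sides

lu = splu(A)                         # expensive: done once
X = lu.solve(B)                      # cheap: triangular solves only
print(np.linalg.norm(A @ X - B))
```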

19 Comparative tests
The test matrices are all from the UF Sparse Matrix Collection.
PARDISO is from Intel Composer XE 2013 SP1. GSS 2.4 uses CPU-GPU hybrid computing. The test CPU is an Intel Core i7-4770 (3.4 GHz) with 24 GB of memory; the graphics card is an ASUS GTX 780 (compute capability 3.5); the NVIDIA CUDA Toolkit is 5.5; the operating system is Windows. Both solvers use default parameters.
For large matrices that need long computation times, GSS 2.4 is nearly 3 times faster than PARDISO. For matrices with short computation times, PARDISO is faster than GSS; one reason is that the complex synchronization between CPU and GPU needs some extra time.

20 Eigenvalue computation for large sparse matrices
Ax = λx: λ is an eigenvalue and x the corresponding eigenvector.
150 Years Old and Still Alive: Eigenproblems, by Henk A. van der Vorst and Gene H. Golub.
Sparse LU factorization is, in essence, Gaussian elimination; sparse eigenvalue computation is much closer to pure mathematics.
The Algebraic Eigenvalue Problem. J. H. Wilkinson.
Matrix Algorithms, Volume II: Eigensystems. G. W. Stewart.
G. H. Golub, C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Zhaojun Bai et al., 2000.

21 Power iteration
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences
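A minimal power-iteration sketch (the slide body itself is a figure; the matrix and tolerance here are illustrative):

```python
# Power iteration: repeatedly apply A and normalize; the iterate converges
# to the eigenvector of the dominant (largest-magnitude) eigenvalue.
import numpy as np

def power_iteration(A, tol=1e-10, max_iter=1000):
    x = np.random.rand(A.shape[0])
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        x_new = y / np.linalg.norm(y)
        lam_new = x_new @ A @ x_new          # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            break
        x, lam = x_new, lam_new
    return lam_new, x_new

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, x = power_iteration(A)
print(lam)   # ~3.618, the dominant eigenvalue of A
```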

22 Power iteration – an intuitive picture
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences

23 Rayleigh quotient iteration
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences
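A hedged sketch of Rayleigh quotient iteration (shift by the current Rayleigh quotient, solve, normalize); near a simple eigenpair of a symmetric matrix it converges cubically:

```python
# Rayleigh quotient iteration: shift by the current Rayleigh quotient and
# solve a (nearly singular) linear system instead of multiplying by A.
import numpy as np

def rayleigh_quotient_iteration(A, x0, iters=8):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        rho = x @ A @ x                       # current eigenvalue estimate
        try:
            y = np.linalg.solve(A - rho * np.eye(A.shape[0]), x)
        except np.linalg.LinAlgError:
            break                             # shift hit an eigenvalue exactly
        x = y / np.linalg.norm(y)
    return rho, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
rho, x = rayleigh_quotient_iteration(A, np.array([1.0, 1.0, 1.0]))
print(rho, np.linalg.norm(A @ x - rho * x))
```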

24 Subspace iteration

25 Krylov subspaces
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences

26 The Arnoldi iteration
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences

27 The basic Arnoldi algorithm
ARPACK: www.caam.rice.edu/software/ARPACK/
Morgan R. B. On restarting the Arnoldi method for large nonsymmetric eigenvalue problems.
R. B. Lehoucq. Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration. PhD thesis.
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences
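A compact sketch of the basic Arnoldi process (build an orthonormal Krylov basis V and the small upper-Hessenberg matrix H with A V_m = V_{m+1} H); ARPACK's implicitly restarted method adds restarting and deflation on top of this loop. The matrix and subspace size are illustrative:

```python
# Basic Arnoldi iteration: orthonormal basis V of the Krylov subspace
# span{b, Ab, A^2 b, ...} and upper-Hessenberg H with  A @ V[:, :m] = V @ H.
import numpy as np

def arnoldi(A, b, m):
    n = b.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:               # happy breakdown: exact invariant subspace
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

np.random.seed(0)
A = np.random.rand(100, 100)
V, H = arnoldi(A, np.random.rand(100), 20)
# Ritz values = eigenvalues of the leading m-by-m block of H
print(np.sort_complex(np.linalg.eigvals(H[:20, :20]))[-3:])
```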

28 Computing eigenvalues with the Arnoldi iteration
Math 685/CSI 700 Spring 08, George Mason University, Department of Mathematical Sciences

29 Krylov decompositions – based on unitary similarity transformations
The standard Arnoldi decomposition; the generalized Krylov decomposition.
Here Q is a unitary matrix, and such transformations can be composed repeatedly.
Important: decompositions based on unitary similarity transformations are backward stable.
Matrix Algorithms, Volume II: Eigensystems. G. W. Stewart, p. 309.
Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, p. 173.

30 Advantages of the Krylov-Schur decomposition
Easy selection of Ritz values as implicit shifts:
The Schur decomposition reduces any matrix to an upper triangular matrix whose diagonal contains its eigenvalues, and the eigenvalues along this diagonal can be arranged in any order by unitary similarity transformations. In other words, after forming the Rayleigh matrix and computing all of its Ritz values, the wanted Ritz values can be moved to the front and the unwanted ones to the back; after the restart, only the selected Ritz values remain in the sequence.
Easy deflation (lock + purge): the idea behind deflation is similar, though more involved.
Faster in practice: in tests, both the number of iterations and the running time are reduced by about one third compared with ARPACK.
G. W. Stewart. A Krylov-Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23 (2001), pp. 601-614.
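A hedged sketch of the selection step described above: compute a sorted Schur form of the small Rayleigh (Hessenberg) matrix so the wanted Ritz values occupy the leading block, then truncate the transformed decomposition. Here "wanted" is illustratively taken as "in the left half-plane" via SciPy's sort="lhp"; the sorting API and the test matrix are assumptions, and this is only the core algebra of a Krylov-Schur restart, not a full solver:

```python
# Krylov-Schur style selection: sort the Schur form of the small Rayleigh
# matrix so wanted Ritz values lead, then truncate  A W = W S + v b^T.
import numpy as np
import scipy.linalg as la

def arnoldi(A, b, m):
    n = b.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

np.random.seed(1)
n, m = 200, 30
A = np.random.rand(n, n) - 0.5
V, H = arnoldi(A, np.random.rand(n), m)

# Sorted complex Schur form of the m-by-m Rayleigh matrix: H_m = Z S Z^*,
# with the "wanted" (left half-plane) Ritz values in the leading k-by-k block.
S, Z, k = la.schur(H[:m, :m], output="complex", sort="lhp")

# Transform and truncate:  A (V_m Z_k) = (V_m Z_k) S_k + v_{m+1} b_k^T
W = V[:, :m] @ Z[:, :k]                 # new basis for the wanted subspace
b_k = H[m, m - 1] * Z[m - 1, :k]        # coupling to the residual vector
resid = A @ W - W @ S[:k, :k] - np.outer(V[:, m], b_k)
print(k, np.linalg.norm(resid))         # tiny: the Krylov relation still holds
```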

31 Krylov-Schur and restarting
After a restart the B matrix becomes more complicated, as shown on the right. The Krylov-Schur decomposition algorithm with restarting.
Matrix Algorithms, Volume II: Eigensystems. G. W. Stewart.

32 Faster convergence
Verified on many practical cases: it is clearly faster than the widely used ARPACK, typically needing only 60%-70% as many iterations. Why?

33 Subspace iteration with approximate spectral projection
A recent algorithm: subspace iteration with approximate spectral projection, based on a spectral projector formed from a Cauchy (contour) integral. It can compute all eigenvalues inside a specified region of the complex plane. It is mainly used for symmetric matrices and needs to be extended to nonsymmetric matrices.
FEAST as a subspace iteration eigensolver accelerated by approximate spectral projection. P. T. P. Tang, E. Polizzi.
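A minimal sketch of the idea (not the FEAST implementation): approximate the contour-integral spectral projector by a quadrature rule on a circle, apply it to a random block, and do Rayleigh-Ritz on the result. The contour parameters, block size, and test matrix are illustrative assumptions:

```python
# Approximate spectral projector by contour quadrature on a circle with
# center c and radius r:   P ≈ (1/N) Σ_k r e^{iθ_k} (z_k I - A)^{-1},
# applied to a random block Y, followed by Rayleigh-Ritz.
import numpy as np
import scipy.linalg as la

np.random.seed(0)
n = 300
A = np.diag(np.linspace(0.0, 100.0, n)) + 0.01 * np.random.randn(n, n)
A = (A + A.T) / 2                      # symmetric test matrix

c, r, N, m = 2.0, 1.5, 16, 16          # contour center/radius, quad points, block size
Y = np.random.randn(n, m)

Q = np.zeros((n, m), dtype=complex)
for k in range(N):
    theta = 2 * np.pi * (k + 0.5) / N
    z = c + r * np.exp(1j * theta)
    Q += (r * np.exp(1j * theta) / N) * la.solve(z * np.eye(n) - A, Y)
Q = la.orth(Q.real)                    # basis for the approximate eigenspace

# Rayleigh-Ritz: the projected eigenvalues approximate the eigenvalues of A
# inside the contour, i.e. in the interval [c - r, c + r].
evals = np.sort(la.eigvalsh(Q.T @ A @ Q))
print(evals[(evals > c - r) & (evals < c + r)])
```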

34 The shift-invert transformation
The standard shift-invert transformation.
Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems. K. Meerbergen, D. Roose.
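A sketch of the standard shift-invert transform: factor A - σI once, present (A - σI)⁻¹ to an Arnoldi-type solver as a linear operator, and map each computed eigenvalue θ back via λ = σ + 1/θ. SciPy's eigs stands in for the eigensolver, and σ and the matrix are illustrative:

```python
# Shift-invert: eigenvalues of (A - sigma*I)^{-1} are 1/(lambda - sigma), so
# the eigenvalues of A closest to sigma become the dominant ones.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, eigs, LinearOperator

n = 2000
A = sp.diags(np.linspace(1.0, 100.0, n)).tocsc() \
    + 0.01 * sp.random(n, n, density=0.001, format="csc")
sigma = 25.0

lu = splu((A - sigma * sp.eye(n, format="csc")).tocsc())    # factor once
OP = LinearOperator((n, n), matvec=lu.solve, dtype=float)   # x -> (A - sigma I)^{-1} x

theta, V = eigs(OP, k=6, which="LM")        # dominant eigenvalues of the inverse
lam = sigma + 1.0 / theta                   # map back to eigenvalues of A near sigma
print(np.sort(lam.real))
```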

35 The Cayley transform
Properties of the Cayley transform:
1. A line parallel to the imaginary axis is mapped onto the unit circle.
2. Points to the left of that line are mapped into the interior of the unit circle.
3. Points to the right of that line are mapped outside the unit circle.
Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems. K. Meerbergen, D. Roose.
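For reference, one common parameterization of the transform behind these three properties (σ and μ are illustrative symbols for the pole and the zero of the transform, not necessarily the paper's notation):

```latex
T_C \;=\; (A - \sigma I)^{-1}(A - \mu I),
\qquad
\lambda \;\mapsto\; \theta = \frac{\lambda - \mu}{\lambda - \sigma},
\qquad \mu < \sigma \ \text{real}.
% |theta| = 1  iff  |lambda - mu| = |lambda - sigma|, i.e. lambda lies on the
% vertical line Re(lambda) = (mu + sigma)/2, the perpendicular bisector of [mu, sigma].
% Re(lambda) < (mu + sigma)/2  =>  |theta| < 1  (mapped inside the unit circle);
% Re(lambda) > (mu + sigma)/2  =>  |theta| > 1  (mapped outside).
```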

36 Filter polynomials
Matrix Algorithms, Volume II: Eigensystems. G. W. Stewart, p. 317.

37 Appendix 1
Aleksei Nikolaevich Krylov (1863-1945) showed in 1931 how to use sequences of the form {b, Ab, A²b, ...} to construct the characteristic polynomial of a matrix. Krylov was a Russian applied mathematician whose scientific interests arose from his early training in naval science, which involved the theories of buoyancy, stability, rolling and pitching, vibrations, and compass theory. Krylov served as the director of the Physics-Mathematics Institute of the Soviet Academy of Sciences from 1927 until 1932, and in 1943 he was awarded a "state prize" for his work on compass theory. Krylov was made a "hero of socialist labor," and he is one of the few mathematicians to have a lunar feature named in his honor: on the moon there is the "Crater Krylov."
Walter Edwin Arnoldi (1917-1995) was an American engineer who published this technique in 1951, not far from the time that Lanczos's algorithm emerged. Arnoldi received his undergraduate degree in mechanical engineering from Stevens Institute of Technology, Hoboken, New Jersey, in 1937 and his MS degree at Harvard University in […]. He spent his career working as an engineer in the Hamilton Standard Division of the United Aircraft Corporation, where he eventually became the division's chief researcher. He retired in […]. While his research concerned mechanical and aerodynamic properties of aircraft and aerospace structures, Arnoldi's name is kept alive by his orthogonalization procedure.

38 附二 参考文献 [1] Numerical Analysis. Rainer Kress. Springer-Verlag. 1991
[2] I.S.Duff, A.M.Erisman, and J.K.Reid. Direct Methods for Sparse Matrices. London:Oxford Univ. Press,1986. [3] J.W.H.Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp [4] T.A.Davis. A column pre-ordering strategy for the unsymmetric-pattern multifrontal method, ACM Trans. Math. Software, vol 30, no. 2, pp , 2004. [5] N.J.Higham. Accuracy and Stability of Numerical Algorithms. SIAM,2002 [6] G..H.Golob, C.F.Van loan. Matrix Computations. The Johns Hopkins University Press. 1996 [7] J.W.H.Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp [8] Fast PageRank Computation via a Sparse Linear System (Extended Abstract) Gianna M. Del Corso,Antonio Gullí, Francesco Romani. [9] Y.Saad, Iterative Methods for Sparse Linear Systems, PWS, Boston,1996 [10] Y.S. Chen* ,Simon Li. Application of Multifrontal Method for Doubly-Bordered Sparse Matrix in Laser Diode Simulator. NUSOD,2004 [11] 陈英时 吴文勇等. 采用多波前法求解大型结构方程组.建筑结构,2007年09期 [12]宋新立,陈英时等.电力系统全过程动态仿真中大型稀疏线性方程组的分块求解算法

39 Many thanks to all the teachers and students! MAIL: QQ:

