Manifold Learning Kai Yang sadoii@163.com 1.12.2015
Machine Learning Problem (Training Data) f C罗 1.欧式空间 2.我们是否还需要去关注,当数据不处于欧式空间的时候,如何解决? We always think X and Y are in Euclidean space f:X→Y Manifold learning
Outline What’s manifold and manifold learning? What’s classical methods and its application? Summary and thought. Manifold learning
What’s manifold and manifold learning? Background Motivation Manifold learning
Background Manifold Manifold = Many + Fold 流形学习——北大数学系江泽涵教授 2.流形是一个空间,而不是一个形状——流形空间。 3.形象表示流形,不从数学上说明。 Manifold learning
Background Manifold learning Dimensionality reduction The geometry and topology of data manifold Study on machine learning problem under manifold assumption Dimensionality reduction Manifold learning
Motivation Data:Euclidean space Traditional dimensionality reduction Principal Component Analysis(PCA) Linear Discriminant Analysis(LDA) PCA 降维的意义: 1.原始采样空间冗余信息过多 2.数据可视化,特征提取 Data:Euclidean space LDA Manifold learning
Motivation Traditional method in manifold PCA Not Work! Manifold learning
Manifold learning Manifold Dimensionality Reduction It’s an dimensionality reduction method based on manifold space Manifold Dimensionality Reduction Manifold learning
Manifold learning Low-dimensional embedding / coordinate space dimensionality reduction Maintain a certain geometric properties (principle) High-dimensional data / observation space Low-dimensional embedding / coordinate space [王瑞平,流形学习专题介绍] Manifold learning
Outline What’s classical methods and its application? 等距离映射 局部线性嵌入 (Isometric MaPPing,ISOMAP) 局部线性嵌入 (Locally Linear Embedding,LLE) Manifold learning
The manifold ways of perception[H. Sebastian Seung, Daniel D The manifold ways of perception[H. Sebastian Seung, Daniel D. Lee,2000,science] 这篇文章的重要在于,告诉人们所认知的世界,是流形的。 而机器学习是让机器模拟人的行为,因此让机器去研究流形的流形学习就有了现实意义。 Professor of Computer Science and the Princeton Neuroscience Institute Professor in School of Engineering and Applied Science at the University of Pennsylvania Manifold learning
Isometric Feature Mapping J.B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, vol. 290, pp. 2319--2323, 2000. Cited Views: 7424 Manifold learning
Isometric Feature Mapping geodesic distance Embedding space of dimensionality reduction Euclidean distance vs. geodesic distance Shortest path approximate geodesic distance 我们研究的是流形空间,而非欧式空间。那么我们需要找到一种,在流形空间中表示距离的量。 欧式空间的距离,两个点的坐标差。那么在流形空间,应该如何鉴定两个点的距离呢?——测底距离。 测底距离:1.形成紧邻图。2.相邻点用欧氏距离表示测底距离,不相邻点用最短路径表示测底距离。 Manifold learning
Isometric Feature Mapping The basic idea After the reduction, the distance between any two points in low-dimensional space should be same with distance in the original high-dimensional space xi xj gij yi yj dij Mapping gij dij Manifold learning
Isometric Feature Mapping 证明在欧式空间相似性较小,但流形空间相似性大的数据,降维之后仍能保持原有的相似性。 16 Manifold learning
Locally linear Embedding Cited Views: 7660 Locally linear Embedding S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, vol. 290, pp. 2323--2326, 2000. Cited Views: 7660 Manifold learning
Locally linear Embedding The basic idea Sampling data with low-dimensional manifold is linear approximately a Euclidean space locally Data samples on manifold Manifold learning
Locally linear Embedding Manifold learning
Locally linear Embedding Manifold learning
Application Data visualization Information retrieval Image process Pattern recognition …… Manifold learning
Outline Summary and Thoughts. Compare ISOMAP and LLE Conclusion Resources and Reference Manifold learning
ISOMAP vs LLE Similar They all can keep the geometrical properties of manifold for the same purpose. Manifold learning
ISOMAP vs LLE Different Isomap wants to maintain the geodesic distance between any two points while LLE hope to maintain local linear relationship 从保持几何的角度来看,Isomap保持了更多的信息量 LLE算法希望样本集均匀稠密采样于低维流形,因此,对于受噪声污染、样本密度稀疏或相互关联较弱的数据集,在从高维观测 空间到低维嵌入空间的映射过程中,可能会将相互关联较弱的远点映射到局部近邻点的位置,从而破坏了低维嵌入结果。 Manifold learning
ISOMAP vs LLE Different Isomap:global LLE:local Too Hard Better 从复杂度角度来看 Isomap:考虑任意两点之间的关系,这个数量将随着数据点数量的增多而爆炸性增长,从而使得计算难以负荷。 以 LLE 为开端的局部分 析方法的变种和相关的理论基 础研究逐渐受到更多的关注。 Too Hard Better Isomap:global LLE:local Manifold learning
Conclusion Advantage Disadvantages Based on the geometry structure of the manifold, can keep the original information Disadvantages The assumption of manifold structure Neighborhood parameter k 非线性:基于流形内在几何结构,体现现实数据的本质 对观察数据存在流形结构的假设 需要调节较多的算法参数,如k-NN的邻域参数k Manifold learning
Resources Isomap LLE Mani fold Learning Matlab Demo http://isomap.stanford.edu/ LLE http://www.cs.nyu.edu/~roweis/lle/publications.html Mani fold Learning Matlab Demo http://www.math.ucla.edu/~wittman/mani/index.html Comparison of Manifold Learning methods http://scikit-learn.org/stable/auto_examples/ manifold/plot_compare_methods.html http://people.cs.uchicago.edu/~xiaofei/ Manifold learning
Reference Xiaofei He :manifold learning http://www.cad.zju.edu.cn/reports/%C1%F7%D0%CE%D1%A7%CF%B0.pdf Homepage: http://people.cs.uchicago.edu/~xiaofei/ Joshua B. Tenenbaum, Vin de Silva, John C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding Chunguang LI. Manifold Learning and its Application in Pattern Recognition Yingke Lei. The study of Manifold Learning Algorithms and Their Applications Ruiping Wang. Manifold Learning presentations http://blog.csdn.net/xywlpo/article/details/6450632 http://blog.sciencenet.cn/blog-722391-583413.html Manifold learning
Question and Answer? 2011年11月1日
Thanks! Manifold learning