Chapter 7 Dimensionality reduction Prof. Dehan Luo

1 Chapter 7 Dimensionality reduction, Prof. Dehan Luo
Section One: The curse of dimensionality
Section Two: Feature extraction vs. feature selection
Section Three: Principal Components Analysis
Section Four: Linear Discriminant Analysis

2 Section One: The curse of dimensionality
The "curse of dimensionality" refers to the problems associated with multivariate data analysis as the dimensionality increases.
Consider a 3-class pattern recognition problem: three types of objects have to be classified based on the value of a single feature.

3 A 3-class pattern recognition problem (continued)
A simple procedure would be to:
(1) Divide the feature space into uniform bins.
(2) Compute the ratio of examples for each class at each bin.
(3) For a new example, find its bin and choose the predominant class in that bin (see the sketch after this slide).
We decide to start with one feature and divide the real line into 3 bins.
Notice that there exists a lot of overlap between classes ⇒ to improve discrimination, we decide to incorporate a second feature.
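
A minimal sketch of this bin-voting procedure in one dimension. The Gaussian class means, the number of examples, and all names here are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three classes, each a 1D Gaussian around a different mean.
X = np.concatenate([rng.normal(mu, 1.0, 30) for mu in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)

# (1) Divide the feature space into 3 uniform bins.
edges = np.linspace(X.min(), X.max(), 4)

# (2) Find the predominant class among the training examples in each bin.
bins = np.clip(np.digitize(X, edges) - 1, 0, 2)
predominant = np.array([np.bincount(y[bins == b], minlength=3).argmax()
                        for b in range(3)])

# (3) Classify a new example by the predominant class of its bin.
def predict(x_new):
    b = np.clip(np.digitize(x_new, edges) - 1, 0, 2)
    return predominant[b]

print(predict(np.array([0.5, 3.8])))
```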

4 A 3-class pattern recognition problem (continued)
Moving to two dimensions increases the number of bins from 3 to 3² = 9.
QUESTION: Which should we maintain constant?
The density of examples per bin? This increases the number of examples from 9 to 27.
The total number of examples? This results in a 2D scatter plot that is very sparse.

5 A 3-class pattern recognition problem (continued)
Moving to three dimensions increases the number of bins from 3 to 3³ = 27.
To maintain the initial density of examples, the number of required examples grows to 81.
For the same number of examples, the 3D scatter plot is almost empty (see the sketch below).
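
A small numeric sketch of this explosion, assuming a hypothetical uniform dataset: with 3 bins per axis and a fixed budget of 27 examples, most bins are already empty in three dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples = 27
for d in (1, 2, 3):
    n_bins = 3 ** d                          # 3, 9, 27 bins
    X = rng.uniform(0.0, 1.0, (n_examples, d))
    cells = (X * 3).astype(int)              # bin index along each axis
    occupied = len({tuple(c) for c in cells})
    print(f"d={d}: {n_bins} bins, {occupied} occupied, "
          f"{3 * n_bins} examples needed for 3 per bin")
```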

6 A 3-class pattern recognition problem (continued)
Implications of the curse of dimensionality:
Exponential growth with dimensionality in the number of examples required to accurately estimate a function.
In practice, the curse of dimensionality means that, for a given sample size, there is a maximum number of features above which the performance of our classifier will degrade rather than improve.

7 A 3-class pattern recognition problem (continued)
In most cases, the information that is lost by discarding some features is compensated by a more accurate mapping in the lower-dimensional space.

8 Section Two: Feature extraction vs. feature selection
How do we beat the curse of dimensionality?
By incorporating prior knowledge.
By providing increasing smoothness of the target function.
By reducing the dimensionality.

9 Feature extraction vs. feature selection (continued)
Two approaches perform the dimensionality reduction R^N → R^M (M < N):
Feature selection: choosing a subset of all the features.
Feature extraction: creating new features by combining the existing ones.
In either case, the goal is to find a low-dimensional representation of the data that preserves (most of) the information or structure in the data.

10 Feature extraction vs. feature selection (continued)
Linear feature extraction:
The "optimal" mapping y = f(x) is, in general, a non-linear function whose form is problem-dependent.
Hence, feature extraction is commonly limited to linear projections y = Wx (contrasted with feature selection in the sketch below).
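
A minimal sketch contrasting the two reductions R^N → R^M, here with N = 4 and M = 2. The selected indices and the entries of W are arbitrary illustrations:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # one sample in R^4

# Feature selection: keep a subset of the original features.
x_sel = x[[0, 2]]                      # choose features 0 and 2

# Linear feature extraction: y = Wx, each new feature mixes the old ones.
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
y = W @ x

print(x_sel)  # [1. 3.]
print(y)      # [1.5 3.5]
```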

11 Feature extraction vs. feature selection (continued)
Two criteria can be used to find the "optimal" feature extraction mapping y = f(x):
Signal representation: represent the samples accurately in a lower-dimensional space.
Classification: enhance the class-discriminatory information in the lower-dimensional space.

12 Feature extraction vs. feature selection (continued)
Within the realm of linear feature extraction, two techniques are commonly used:
(1) Principal Components Analysis (PCA), based on signal representation.
(2) Fisher's Linear Discriminant Analysis (LDA), based on classification.

13 Section Three: Principal Components Analysis
Let us illustrate PCA with a two-dimensional problem.
The data x follow a Gaussian density as depicted in the figure; vectors can be represented by their 2D coordinates.

14 Principal Components Analysis (continued)
We seek to find a 1D representation x' "close" to x, where "closeness" is measured by the mean squared error over all points in the distribution.

15 Principal Components Analysis (continued)
RESULT: It can be shown that the "optimal" 1D representation consists of projecting the vector x onto the direction of maximum variance in the data (e.g., the longest axis of the ellipse).
This result can be generalized to more than two dimensions.

16 Principal Components Analysis (continued)
Summary: the optimal representation is x' = sum_{k=1..M} (v_k^T x) v_k, where v_k is the eigenvector corresponding to the kth largest eigenvalue of the covariance matrix (implemented in the sketch below).
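
A sketch of this summary on synthetic data: center the samples, take the eigenvector of the covariance matrix with the largest eigenvalue, and measure the mean squared reconstruction error. The 2D Gaussian parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 2.0], [2.0, 2.0]], size=200)

mean = X.mean(axis=0)
Xc = X - mean                          # center the data
C = np.cov(Xc, rowvar=False)           # covariance matrix
vals, vecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]
V = vecs[:, order[:1]]                 # top M = 1 eigenvector v_1

y = Xc @ V                             # 1D representation (v_1^T x)
X_hat = y @ V.T + mean                 # reconstruction x' = (v_1^T x) v_1
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("reconstruction MSE:", mse)
```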

17 Section Four: Linear Discriminant Analysis
The objective of LDA is to perform dimensionality reduction while preserving as much of the class-discriminatory information as possible.
Assume we have a set of P N-dimensional samples (x1, x2, ..., xP), P1 of which belong to class ω1 and P2 to class ω2. We seek to obtain a scalar y by projecting the samples x onto a line: y = w^T x.
Of all the possible lines, we would like to select the one that maximizes the separability of the classes.

18 Linear Discriminant Analysis (continued)
In a nutshell, we want:
Maximum separation between the means of the projections.
Minimum variance within each projected class (see the sketch below).
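
A sketch of the two-class Fisher direction w = Sw^-1 (m1 - m2), the standard closed-form solution to these two goals; the Gaussian class parameters below are illustrative assumptions rather than values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = [[1.0, 0.5], [0.5, 1.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, 50)   # class w1
X2 = rng.multivariate_normal([3.0, 2.0], cov, 50)   # class w2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)           # per-class scatter matrices
S2 = (X2 - m2).T @ (X2 - m2)
Sw = S1 + S2                           # within-class scatter

w = np.linalg.solve(Sw, m1 - m2)       # Fisher direction
w /= np.linalg.norm(w)

y1, y2 = X1 @ w, X2 @ w                # scalar projections y = w^T x
print("projected means:    ", y1.mean(), y2.mean())
print("projected variances:", y1.var(), y2.var())
```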

19 Linear Discriminant Analysis (continued)
PCA versus LDA.

20 Linear Discriminant Analysis (continued)
Limitations of LDA:
(1) LDA assumes unimodal Gaussian likelihoods. If the densities are significantly non-Gaussian, LDA may not preserve any complex structure of the data needed for classification.

21 Linear Discriminant Analysis (continued)
Limitations of LDA (continued):
(2) LDA will fail when the discriminatory information lies not in the means but in the variances of the data.

22 Linear Discriminant Analysis (continued)
Limitations of LDA (continued):
(3) LDA has a tendency to overfit the training data.
To illustrate this problem, we generate an artificial dataset: three classes, 50 examples per class, all with exactly the same likelihood, a multivariate Gaussian with zero mean and identity covariance.

23 Linear Discriminant Analysis (continued)
Limitations of LDA (continued):
(3, continued) As we arbitrarily increase the number of dimensions, the classes appear to separate better, even though they all come from the same distribution (see the sketch below).
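
A sketch of this experiment, following the setup described above (three classes, 50 examples each, identical zero-mean identity-covariance Gaussians). For brevity it uses scikit-learn's LinearDiscriminantAnalysis, which is an assumption; the slides name no library:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)

for d in (2, 10, 50, 100):
    X = rng.standard_normal((150, d))           # identical likelihoods
    acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
    print(f"d={d:3d}: training accuracy {acc:.2f}")  # rises with d
```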

