1
Canonical Correlation Analysis
Chapter 10: Canonical Correlation Analysis. School of Information Technology, Jiangxi University of Finance & Economics. Zhu Yongjun
2
Canonical correlation analysis. Main purpose: to identify and quantify the associations between two sets of variables. Examples of its use:
Relating arithmetic speed and arithmetic power to reading speed and reading power; relating government policy variables to economic goal variables; relating college "performance" variables to precollege "achievement" variables.
3
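How can we realize the idea?
Canonical correlation analysis focuses on the correlation between linear combinations of the variables in the two sets. First, determine the pair of linear combinations having the largest correlation. Next, determine the pair having the largest correlation among all pairs uncorrelated with the first pair, and so on. The procedure is something like PCA.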
4
Main elements of CCA: canonical variables, canonical correlations, the optimization aspect
Canonical variables: pairs of linear combinations used in canonical correlation analysis. Canonical correlations: the correlations between the canonical variables, which measure the strength of association between the two sets of variables. Optimization aspect: an attempt to concentrate a high-dimensional relationship between two sets of variables into a few pairs of canonical variables.
5
Example 10.5: job satisfaction and job characteristics. The answer may have implications for job design.
6
Example 10.5 Job Satisfaction
Job satisfaction, n=784
7
Assumptions of CCA. To measure the association between two groups of variables, we make some assumptions. Stack the two groups into one new random vector, and partition its covariance matrix accordingly.
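The formulas for this slide did not survive in the transcript. A sketch of the usual partition, assuming the notation X(1), X(2), p, q and Σ12 used on the later slides (the symbols μ and Σ for the stacked vector are assumptions here), is:

```latex
X = \begin{bmatrix} X^{(1)} \\ X^{(2)} \end{bmatrix}
\quad ((p+q) \times 1), \qquad
\mu = E(X) = \begin{bmatrix} \mu^{(1)} \\ \mu^{(2)} \end{bmatrix}, \qquad
\Sigma = \operatorname{Cov}(X) =
\begin{bmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix},
\qquad \Sigma_{21} = \Sigma_{12}'
```

Here Σ11 is p × p, Σ22 is q × q, and Σ12 is p × q and holds the covariances between the two sets.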
8
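Points to note about CCA. The covariances between pairs of variables from different sets are contained in S12 or, equivalently, in S21.
When p and q are relatively large, it is difficult to interpret the elements of S12 collectively as a measure of association between the sets. Canonical correlation analysis summarizes the association between the two sets of variables with a few covariances instead of with all of S12.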
9
The main task of CCA. It is often linear combinations of variables that are interesting and useful for predictive or comparative purposes. The main task of CCA is to summarize the associations between the X(1) and X(2) sets in terms of a few carefully chosen covariances (or correlations) rather than the pq covariances in S12.
10
Linear transformations of the original variables
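The slide's formulas are missing from the transcript. Assuming the standard setup, with Σ11 = Cov(X(1)), Σ22 = Cov(X(2)), Σ12 = Cov(X(1), X(2)) and coefficient vectors a and b, the linear combinations and their moments would read:

```latex
U = a' X^{(1)}, \qquad V = b' X^{(2)}

\operatorname{Var}(U) = a' \Sigma_{11} a, \qquad
\operatorname{Var}(V) = b' \Sigma_{22} b, \qquad
\operatorname{Cov}(U, V) = a' \Sigma_{12} b

\operatorname{Corr}(U, V) =
\frac{a' \Sigma_{12} b}{\sqrt{a' \Sigma_{11} a}\,\sqrt{b' \Sigma_{22} b}}
```

Canonical correlation analysis looks for the a and b that make this correlation as large as possible.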
11
Definition of canonical variables
First pair of canonical variables: the pair of linear combinations U1, V1 having unit variances which maximize the correlation. kth pair of canonical variables: the pair of linear combinations Uk, Vk, both having unit variances, which maximize the correlation among all choices uncorrelated with the previous k-1 canonical variable pairs.
12
Definition of canonical correlations. The correlation between the kth canonical variate pair is called the kth canonical correlation. For example, a correlation coefficient equal to 1 represents a perfect linear relationship. The following result gives the necessary details for obtaining the canonical variables and their correlations.
13
Result 10.1. Suppose X(1) and X(2) are as above, with p < q, U = a'X(1), V = b'X(2).
14
Result 10.1
15
Result 10.1
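The statement of Result 10.1 is not legible in this transcript. A minimal numerical sketch, assuming it is the usual eigenvalue characterization (the squared canonical correlations are the eigenvalues of Σ11^(-1/2) Σ12 Σ22^(-1) Σ21 Σ11^(-1/2), with a_k' = e_k' Σ11^(-1/2) and b_k' = f_k' Σ22^(-1/2)), might look as follows. The data are simulated and the names `inv_sqrt` and `canonical_pairs` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

def canonical_pairs(S11, S12, S22):
    """Canonical correlations and coefficient vectors (rows of A and B).

    Assumes all canonical correlations are nonzero, so each f_k can be
    normalised without dividing by zero.
    """
    R11, R22 = inv_sqrt(S11), inv_sqrt(S22)
    M = R11 @ S12 @ np.linalg.inv(S22) @ S12.T @ R11  # p x p
    M = (M + M.T) / 2                                 # remove rounding asymmetry
    lam, E = np.linalg.eigh(M)                        # ascending eigenvalues
    order = np.argsort(lam)[::-1]                     # largest first
    lam, E = lam[order], E[:, order]
    rho = np.sqrt(np.clip(lam, 0.0, None))            # canonical correlations
    A = E.T @ R11                                     # row k is a_k'
    F = R22 @ S12.T @ R11 @ E                         # column k proportional to f_k
    F = F / np.linalg.norm(F, axis=0)                 # unit-length f_k
    B = F.T @ R22                                     # row k is b_k'
    return rho, A, B

# Simulated (hypothetical) data: 2 variables in set 1, 3 in set 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
S = np.cov(X, rowvar=False)
p = 2
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

rho, A, B = canonical_pairs(S11, S12, S22)
print("canonical correlations:", rho)
```

Each row of `A` and `B` gives one pair of coefficient vectors; the pairs have unit variance and their cross-covariance is the corresponding canonical correlation.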
16
Proof of Result 10.1. Use the spectral decomposition of a matrix; see p. 66, (2-22). Professor Zhang Yaoting (张尧庭) has another way to prove this, and Anderson (1984) uses Lagrange multipliers. For the numerator and denominator, see p. 78, (2-48), with c'*sigma*… et al. taken as b.
17
Proof of Result 10.1. For the first bracketed factor on the right-hand side of the inequality, see p. 80, (2-51), as in PCA. Denote it by f1.
18
Proof of Result 10.1. AB and BA have the same nonzero eigenvalues.
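The eigenvalue fact used here can be checked numerically. The sketch below takes the special case A = K and B = K' for a random p × q matrix K, which is presumably the form that arises in the proof (with K = Σ11^(-1/2) Σ12 Σ22^(-1/2)); that choice, the matrix sizes and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.normal(size=(2, 3))          # hypothetical p x q matrix, p = 2, q = 3

# K K' is 2 x 2, K' K is 3 x 3; both are symmetric, so eigvalsh applies.
ev_small = np.sort(np.linalg.eigvalsh(K @ K.T))[::-1]   # 2 eigenvalues
ev_big = np.sort(np.linalg.eigvalsh(K.T @ K))[::-1]     # 3 eigenvalues

print("eigenvalues of K K':", ev_small)
print("eigenvalues of K' K:", ev_big)
```

The larger matrix carries the same nonzero eigenvalues plus q - p extra eigenvalues that are zero up to rounding.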
19
Proof of Result 10.1 orthogonal to
20
Proof of Result 10.1
21
Canonical variables. In application software such as SPSS, the standardized variables are used.
22
Comment Decomposition
23
Comment. Note: if there are repeated roots, the coefficient vectors a and b are not unique.
24
Example 10.1
25
Example 10.1 Choose b by this formula
26
Example 10.1. Change of scale: the canonical correlations are unchanged by standardization.
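The remark about scale can be illustrated with a short check: canonical correlations computed from a covariance matrix and from the corresponding correlation matrix (that is, from standardized variables) should coincide. The data below are simulated, not the numbers of Example 10.1, and `canon_corr` is a hypothetical helper.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

def canon_corr(S, p):
    """Canonical correlations between the first p and the remaining variables."""
    S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
    K = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(K, compute_uv=False)   # singular values, largest first

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # hypothetical data
S = np.cov(X, rowvar=False)          # covariance matrix (original units)
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)               # correlation matrix (standardized units)

rho_cov = canon_corr(S, 2)
rho_cor = canon_corr(R, 2)
print(rho_cov, rho_cor)
```

Only the coefficient vectors change with the units; the correlations themselves do not.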
27
Other solution methods
Multiplying both sides by the inverse square root of Σ11 gives the third result. Why is the correlation the same? Because AB and BA have the same eigenvalues. To obtain the correlation, see Exercise 10.4.
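As a sketch of this alternative route, assuming the "third result" is the non-symmetric eigenproblem Σ11^(-1) Σ12 Σ22^(-1) Σ21 a = ρ² a, the code below compares it with the symmetric version on simulated data. All variable names are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
S = np.cov(X, rowvar=False)
p = 2
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

# Symmetric form: eigenvectors e of S11^(-1/2) S12 S22^(-1) S21 S11^(-1/2).
R11 = inv_sqrt(S11)
M = R11 @ S12 @ np.linalg.inv(S22) @ S12.T @ R11
lam_sym, E = np.linalg.eigh((M + M.T) / 2)     # ascending order
a1_sym = R11 @ E[:, -1]                        # a = S11^(-1/2) e, largest root

# Non-symmetric form: eigenvectors a of S11^(-1) S12 S22^(-1) S21 directly.
N = np.linalg.inv(S11) @ S12 @ np.linalg.inv(S22) @ S12.T
vals, vecs = np.linalg.eig(N)
vals, vecs = vals.real, vecs.real              # eigenvalues are real here
a1 = vecs[:, np.argmax(vals)]
a1 = a1 / np.sqrt(a1 @ S11 @ a1)               # scale so that Var(a'X(1)) = 1

print(np.sort(vals), lam_sym)
```

Both forms give the same squared canonical correlations, and the first coefficient vector agrees up to sign.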
28
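10.3 Interpreting the population canonical variables. Canonical variables are, in general, artificial; that is, they have no obvious physical meaning. If the original variables X(1) and X(2) are used, the canonical coefficients a and b have units proportional to those of the X(1) and X(2) sets.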
29
Identifying the canonical variables
30
Identifying Canonical Variables by Correlation
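The formulas for this slide are missing from the transcript. Assuming the usual expressions, in which the correlations between canonical variables and original variables are A Σ11 V11^(-1/2), B Σ22 V22^(-1/2), A Σ12 V22^(-1/2) and B Σ21 V11^(-1/2) (V11 and V22 being the diagonal matrices of variances), a sketch on simulated data is:

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
p = 2
X1 = X[:, :p]
S = np.cov(X, rowvar=False)
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

# Canonical coefficient matrices via the SVD of S11^(-1/2) S12 S22^(-1/2).
R11h, R22h = inv_sqrt(S11), inv_sqrt(S22)
E, rho, Ft = np.linalg.svd(R11h @ S12 @ R22h, full_matrices=False)
A = E.T @ R11h        # p x p, rows are a_k'
B = Ft @ R22h         # p x q, rows are b_k'

V11_is = np.diag(1.0 / np.sqrt(np.diag(S11)))   # V11^(-1/2)
V22_is = np.diag(1.0 / np.sqrt(np.diag(S22)))   # V22^(-1/2)

R_U_X1 = A @ S11 @ V11_is      # corr(U_i, X(1)_k)
R_V_X2 = B @ S22 @ V22_is      # corr(V_i, X(2)_k)
R_U_X2 = A @ S12 @ V22_is      # corr(U_i, X(2)_k)
R_V_X1 = B @ S12.T @ V11_is    # corr(V_i, X(1)_k)

# Cross-check one block against correlations computed from the scores.
U = (X1 - X1.mean(axis=0)) @ A.T
direct = np.corrcoef(np.hstack([U, X1]), rowvar=False)[:p, p:]
print(R_U_X1)
```

Large entries in a row suggest which original variables a canonical variable mainly reflects, although such correlations describe each variable one at a time.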
31
Example 10.2. Here Az and Bz are the coefficient matrices.
32
Relation of the canonical correlations to other correlation coefficients. This means the first canonical correlation is at least as large as the absolute value of any entry in ρ12. The canonical correlations are also the multiple correlation coefficients of U with X(2).
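Both statements on this slide can be checked on a small simulated example. The sketch assumes standardized variables (a correlation matrix R) and computes the multiple correlation of U1 with X(2) from the covariance vector between X(2) and U1; the data and names are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
R = np.corrcoef(X, rowvar=False)
p = 2
R11, R12, R22 = R[:p, :p], R[:p, p:], R[p:, p:]

E, rho, Ft = np.linalg.svd(inv_sqrt(R11) @ R12 @ inv_sqrt(R22),
                           full_matrices=False)

# Largest absolute correlation between a set-1 and a set-2 variable.
max_entry = np.abs(R12).max()

# Multiple correlation of U1 = a1'Z(1) with the whole set Z(2).
a1 = inv_sqrt(R11) @ E[:, 0]
c = R12.T @ a1                                   # Cov(Z(2), U1); Var(U1) = 1
mult_corr = np.sqrt(c @ np.linalg.solve(R22, c))

print(rho[0], max_entry, mult_corr)
```

The first inequality holds because picking a and b as coordinate vectors is one admissible choice in the maximization.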
33
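The first r canonical correlations summarize the degree of association. The change of coordinates from X(1) to U = AX(1) and from X(2) to V = BX(2) is chosen to maximize corr(U1, V1) and, in turn, corr(U2, V2), and so on, where each pair (Ui, Vi) has zero correlation with the previous pairs. The correlation between the sets X(1) and X(2) is thereby isolated in the canonical correlations.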
35
Sample canonical variates and sample canonical correlations
36
Result 10.2
37
Matrix form
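In matrix form the sample canonical variates are obtained by applying the coefficient matrices to the centered data. The following sketch, on simulated data with hypothetical names, computes the scores and confirms that their sample covariance has the expected block structure (identity blocks on the diagonal, canonical correlations on the diagonal of the off-diagonal block).

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(6)
X = rng.normal(size=(250, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
p = 2
S = np.cov(X, rowvar=False)
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

R11h, R22h = inv_sqrt(S11), inv_sqrt(S22)
E, rho, Ft = np.linalg.svd(R11h @ S12 @ R22h, full_matrices=False)
A = E.T @ R11h          # rows are the sample coefficient vectors a_k'
B = Ft @ R22h           # rows are the sample coefficient vectors b_k'

Xc = X - X.mean(axis=0)           # center each column
U = Xc[:, :p] @ A.T               # n x p matrix of scores U = A x(1)
V = Xc[:, p:] @ B.T               # n x p matrix of scores V = B x(2)

C = np.cov(np.hstack([U, V]), rowvar=False)   # covariance of (U, V)
print(np.round(C, 3))
```

The scores `U[:, 0]` and `V[:, 0]` are what one would plot against each other to inspect the first canonical relationship.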
38
Sample canonical correlation analysis for standardized data
41
Example 10.5 Job Satisfaction
42
Example 10.5 Job Satisfaction
43
Example 10.5: Sample Correlation Matrix Based on 784 Responses
44
Example 10.5: Canonical Variate Coefficients
45
Example 10.5: Sample Correlations between Original and Canonical Variables
47
10.5 Matrices of errors of approximations
48
Matrices of Errors of Approximations
49
The matrices of errors of approximation are
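The matrices themselves are missing from the transcript. Assuming the usual construction, in which S11, S22 and S12 are written as sums of outer products of the columns of A^(-1) and B^(-1) and the error matrices are what remains after keeping the first r terms, a sketch on simulated data is given below. The function `approximations` and all data are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
p, q = 2, 3
S = np.cov(X, rowvar=False)
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

R11h, R22h = inv_sqrt(S11), inv_sqrt(S22)
# Full SVD so that B is a square (q x q) matrix and can be inverted.
E, rho, Ft = np.linalg.svd(R11h @ S12 @ R22h, full_matrices=True)
A = E.T @ R11h                     # p x p
B = Ft @ R22h                      # q x q
Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)

def approximations(r):
    """Approximations to S11, S22, S12 built from the first r canonical pairs."""
    S11_r = Ainv[:, :r] @ Ainv[:, :r].T
    S22_r = Binv[:, :r] @ Binv[:, :r].T
    S12_r = Ainv[:, :r] @ np.diag(rho[:r]) @ Binv[:, :r].T
    return S11_r, S22_r, S12_r

S11_1, S22_1, S12_1 = approximations(1)
err11, err22, err12 = S11 - S11_1, S22 - S22_1, S12 - S12_1   # error matrices
S11_p, S22_p, S12_p = approximations(p)
print(np.round(err12, 4))
```

With r = p the approximations of S11 and S12 are exact, so small entries in the r = 1 error matrix for S12 indicate that the first pair already captures most of the between-set covariance.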
50
Example 10.6
51
Example 10.6
52
Example 10.6
53
Sample correlation matrices between the canonical variates and the original variables
54
Proportion of sample variance explained
55
Proportion of Sample Variances Explained by the Canonical Variables
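The formula on this slide is not legible. Assuming the usual definition for standardized variables (the proportion explained by the first r canonical variates is the sum of their squared correlations with the variables of their own set, divided by the number of variables in that set), a sketch on simulated data is:

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(w ** -0.5) @ P.T

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # hypothetical data
p, q = 2, 3
R = np.corrcoef(X, rowvar=False)
R11, R12, R22 = R[:p, :p], R[:p, p:], R[p:, p:]

E, rho, Ft = np.linalg.svd(inv_sqrt(R11) @ R12 @ inv_sqrt(R22),
                           full_matrices=False)
Az = E.T @ inv_sqrt(R11)     # coefficients for standardized set 1
Bz = Ft @ inv_sqrt(R22)      # coefficients for standardized set 2

R_U_Z1 = Az @ R11            # corr(U_i, z(1)_k), p x p
R_V_Z2 = Bz @ R22            # corr(V_i, z(2)_k), p x q

# Cumulative proportion of standardized variance explained by U_1..U_r
# (set 1) and by V_1..V_r (set 2), for r = 1, ..., p.
prop1 = np.cumsum((R_U_Z1 ** 2).sum(axis=1)) / p
prop2 = np.cumsum((R_V_Z2 ** 2).sum(axis=1)) / q
print(prop1, prop2)
```

For the smaller set all p canonical variates together explain the whole variance; for the larger set the p variates generally explain less than all of it.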
56
Example 10.7
57
Large-sample inference: Result 10.3
58
Bartlett's correction
59
Significance tests for the canonical correlations
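The test statistic is not shown in the transcript. Assuming the usual Bartlett-corrected likelihood-ratio statistic, -(n - 1 - (p + q + 1)/2) ln Π(1 - ρ̂i²) over i = k+1, ..., p, referred to a chi-square distribution with (p - k)(q - k) degrees of freedom, a sketch is given below. The function name and the input values are hypothetical, and the comparison with a chi-square critical value is left to a table or a statistics library.

```python
import math

def bartlett_statistic(rho, n, p, q, k=0):
    """Statistic and degrees of freedom for H0: rho_{k+1} = ... = rho_p = 0.

    rho : sample canonical correlations, largest first (length p)
    n   : sample size; p, q : numbers of variables in the two sets (p <= q)
    k   : number of leading canonical correlations assumed nonzero
    """
    log_lambda = sum(math.log(1.0 - r * r) for r in rho[k:])
    stat = -(n - 1 - 0.5 * (p + q + 1)) * log_lambda
    df = (p - k) * (q - k)
    return stat, df

# Hypothetical values, not taken from Example 10.8.
rho_hat = [0.6, 0.2]
n, p, q = 100, 2, 3

stat0, df0 = bartlett_statistic(rho_hat, n, p, q, k=0)   # all correlations zero?
stat1, df1 = bartlett_statistic(rho_hat, n, p, q, k=1)   # only the first nonzero?
print(stat0, df0, stat1, df1)
```

Testing proceeds sequentially: if the k = 0 hypothesis is rejected, the k = 1 statistic checks whether the remaining correlations can be taken as zero.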
60
Example 10.8
61
Example 10.8
62
SPSS program
MANOVA VAR1 VAR2 VAR3 WITH VAR4 VAR5 VAR6
  /DISCRIM RAW STAN ESTIM CORR ALPHA(1)
  /PRINT SIGNIF(MULT UNIV EIGEN DIMENR) SIGNIF(EFSIZE) CELLINFO(CORR)
  /NOPRINT PARAM(ESTIM)
  /POWER T(.05) F(.05)
  /METHOD=UNIQUE
  /ERROR WITHIN+RESIDUAL
  /DESIGN.
Open a new syntax window and copy this into it. Then replace VAR1 … with your own variables.
63
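The table above lists the standardized and unstandardized coefficients between each canonical variable and the variables in variable group 1. From it we can write the (standardized) transformation formula for the canonical variable as: L1 = 0.05759*var …*var02 …*var03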
64
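The table above gives the correlations of each variable in the first variable group with its own canonical variables and with the canonical variables of the other group. The variables are mainly related to the first pair of canonical variables.
Reference URL: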
65
Relationship between the canonical variables and the original first group of variables
66
Relationships for the second group of variables. The variables in the second group are called covariates here.
68
SAS program
options ls=78;
title "Canonical Correlation Analysis - Sales Data";
/* Read the three sales variables and the four test scores */
data sales;
  infile "D:\Statistics\STAT 505\data\sales.txt";
  input growth profit new create mech abs math;
run;
/* Canonical correlation analysis; canonical variable scores are saved in canout */
proc cancorr out=canout vprefix=sales vname="Sales Variables" wprefix=scores wname="Test Scores";
  var growth profit new;
  with create mech abs math;
run;
/* Plot the first pair of canonical variables with a regression line */
proc gplot;
  axis1 length=3 in;
  axis2 length=4.5 in;
  plot sales1*scores1 / vaxis=axis1 haxis=axis2;
  symbol v=J f=special h=2 i=r color=black;
run;