Chapter 8 Linear Regression and Correlation
2017/9/11 Henan Agricultural University
The statistical methods in the previous chapters considered only one variable (X). Tests of the significance of the difference between two sample means (u test, t test), or analysis of variance and multiple comparisons among several sample means, were used to find out whether different treatments (or treatment combinations) differ significantly in an experimental indicator (trait, X).
In practice we also want to know the quantitative relationship between two or more variables, for example between planting density (or fertilizer rate, hormone concentration, etc.) and one or more experimental indicators (traits). Different kinds of relationships between variables therefore need to be studied.
Relationships between variables
Relationship between two variables (x, y):
Linear relationship: simple linear regression and correlation analysis
Non-linear relationship: curvilinear (non-linear) regression and correlation analysis
Relationship among three or more variables (x1, x2, …, y): multiple regression and correlation analysis
This chapter covers only simple linear regression and correlation analysis.
Section 1 Linear regression and correlation
Suppose an experiment yields n pairs of observations of the variables X and Y: (x1, y1), (x2, y2), …, (xn, yn). Plotted in a rectangular coordinate system, the points fall roughly along a straight line, indicating a linear relationship.
[Figure: scatter plot of the observed (x, y) pairs; x on the horizontal axis, y on the vertical axis]
In statistics there are two models for describing the relationship between X and Y: the regression model and the correlation model.
I. Regression model
In the regression model, X and Y are related as follows:
The two variables have a cause-and-effect relationship: X is the independent variable and Y is the dependent variable, and Y changes as X changes.
X is easy to control and has little or no experimental error. Y carries random error, because it is affected by other factors as well as by changes in X.
Regression analysis establishes a regression equation, tests its significance, and then uses X to predict Y.
II. Correlation model
In the correlation model, X and Y are related as follows:
The two variables are in a parallel relationship, with no distinction between an independent and a dependent variable.
Both X and Y are subject to experimental error.
Correlation analysis calculates a correlation coefficient, tests its significance, and uses the coefficient to describe the relationship between the two variables.
Section 2 Linear regression analysis
I. Linear regression equation
$\hat{y} = a + bx$
a: regression intercept
b: regression coefficient
Calculation of the regression coefficient b and the regression intercept a
$b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{SP}{SS_x}$
where SP is the sum of products of the deviations of the two variables and $SS_x$ is the sum of squares of the deviations of the independent variable.
$a = \bar{y} - b\bar{x}$
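As a minimal sketch, assuming the paired observations are held in two plain Python lists, the formulas above translate directly into code using the computational forms of SP and $SS_x$. The function name `fit_line` is hypothetical, and the toy data in the usage lines are illustrative only, not the orchard data of Example 8.1.

```python
# Hypothetical helper (not from the slides): least-squares fit of y = a + b*x
def fit_line(xs, ys):
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    # SP: sum of products of deviations, computational form
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # SSx: sum of squares of deviations of x, computational form
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    if ss_x == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sp / ss_x                    # regression coefficient
    a = sum_y / n - b * sum_x / n    # intercept: y-bar minus b times x-bar
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # toy points lying exactly on y = 1 + 2x
print(a, b)                                     # prints 1.0 2.0
```

The computational form subtracts two large totals, which matches the hand calculation on the slides; for data with very large values, computing deviations from the means first is numerically safer.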
Characteristics of the regression coefficient
The regression coefficient b is the average number of units by which the dependent variable Y increases (b > 0) or decreases (b < 0) for each one-unit increase in the independent variable X;
b can be positive or negative;
the range of b is not restricted.
Setting up a regression equation (example)
Example 8.1. The relationship between planting density and yield of the Shilang pear (十郎梨) was investigated in 10 orchards with trees 8–12 years old in several regions. The results are shown in the accompanying table. Analyze the relationship between planting density and yield for this cultivar.
1. Calculation of the data
First-level data:
$\sum x = 6090$, $\bar{x} = 609$, $\sum x^2 = 4090500$, $n = 10$
$\sum y = 304.5$, $\bar{y} = 30.45$, $\sum y^2 = 9929.25$, $\sum xy = 200880$
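Second-level data:
$SP = \sum xy - (\sum x)(\sum y)/n = 15439.5$
$SS_x = \sum x^2 - (\sum x)^2/n = 381690$
$SS_y = \sum y^2 - (\sum y)^2/n = 657.225$
$b = SP/SS_x = 0.0405$
$a = \bar{y} - b\bar{x} = 30.45 - 0.0405 \times 609 = 5.7855$
(This intercept uses b rounded to 0.0405; with the unrounded b = 0.040450, a ≈ 5.816.)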
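The second-level quantities can be checked from the first-level sums alone. The sketch below uses nothing beyond the numbers printed on the slide and shows how the intercept depends on whether b is rounded before it is used.

```python
# First-level data for Example 8.1, copied from the slide
n = 10
sum_x, sum_y = 6090, 304.5
sum_x2, sum_y2, sum_xy = 4090500, 9929.25, 200880

sp = sum_xy - sum_x * sum_y / n       # sum of products of deviations
ss_x = sum_x2 - sum_x ** 2 / n        # sum of squares of x
ss_y = sum_y2 - sum_y ** 2 / n        # sum of squares of y
b = sp / ss_x                         # unrounded regression coefficient
a = sum_y / n - b * sum_x / n         # intercept from the unrounded b
a_rounded_b = sum_y / n - 0.0405 * sum_x / n   # intercept with b rounded as on the slide

print(f"SP={sp:.1f}  SSx={ss_x:.0f}  SSy={ss_y:.3f}")
print(f"b={b:.5f}  a={a:.4f}  a(b=0.0405)={a_rounded_b:.4f}")
```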
2. Setting up the regression equation and the regression plot
$\hat{y} = 5.78 + 0.041x$
[Figure: observed points with the fitted line ŷ = 5.78 + 0.041x; x-axis: density (plants/ha); y-axis: yield (t/ha)]
3. Predicting Y from X
$\hat{y} = 5.78 + 0.041x$
For example, when x = 500, $\hat{y} = 5.78 + 0.041 \times 500 \approx 26.3$.
Points to note:
The value of x used for prediction should lie within the observed range of 330 to 960 plants/ha, because the equation was fitted to these data. Outside this range the relationship between x and y may differ; for example, it may no longer be linear.
The regression relationship must be significant before the equation is used for prediction.
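A prediction helper can make the range caveat explicit. This is a sketch only: the name `predict`, the use of a Python warning, and the default bounds 330 and 960 (taken from the slide) are assumptions. Evaluating it with the unrounded coefficients from the second-level data shows that rounding b to 0.041 shifts the prediction at x = 500 by about 0.24 t/ha.

```python
import warnings

# Hypothetical helper (not from the slides): y-hat = a + b*x with a range check
def predict(x, a, b, x_min=330.0, x_max=960.0):
    if not x_min <= x <= x_max:
        # Extrapolation: the fitted line says nothing reliable outside the data range
        warnings.warn(
            f"x={x} is outside the observed range [{x_min}, {x_max}]; "
            "the relationship there may not be linear"
        )
    return a + b * x

y_slide = predict(500, 5.78, 0.041)       # rounded equation from the slide
y_exact = predict(500, 5.8157, 0.040450)  # coefficients before rounding
print(f"{y_slide:.2f} {y_exact:.2f}")
```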
II. Significance test of the linear regression
Three approaches can be used to test whether the regression exists:
F test
t test of the regression coefficient b
Reference to the significance of the correlation coefficient r
1. F test
The total variation of Y can be divided into two parts: the variation caused by X, called the regression variation (denoted U), and the variation caused not by X but by other factors, called the deviation from regression (denoted Q).
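Partitioning of sums of squares and degrees of freedom
Total variation of Y = regression variation U + deviation from regression Q
Total sum of squares of Y = regression sum of squares + deviation-from-regression sum of squares: $SS_y = SS_U + SS_Q$
Total degrees of freedom of Y = regression degrees of freedom + deviation-from-regression degrees of freedom: $DF_y = DF_U + DF_Q$
Only the sums of squares and the degrees of freedom are additive; the variances (mean squares) are not, and each is obtained by dividing an SS by its DF.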
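The additivity of the sums of squares can be checked numerically. The sketch below uses a hypothetical helper `partition_ss` and a small illustrative dataset (not Example 8.1); it fits the line, computes the fitted values ŷ, and confirms that $\sum(y-\bar{y})^2 = \sum(\hat{y}-\bar{y})^2 + \sum(y-\hat{y})^2$.

```python
# Hypothetical helper (not from the slides): split SSy into SSU and SSQ
def partition_ss(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]                        # y-hat for each x
    ss_y = sum((y - mean_y) ** 2 for y in ys)               # total variation of Y
    ss_u = sum((f - mean_y) ** 2 for f in fitted)           # regression variation U
    ss_q = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # deviation from regression Q
    return ss_y, ss_u, ss_q

ss_y, ss_u, ss_q = partition_ss([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(ss_y, 6), round(ss_u, 6), round(ss_q, 6))
```

For this toy data SSU also equals b·SP (0.6 × 6 = 3.6), which is the shortcut used on the next slide.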
Calculation of the sums of squares
SS of the dependent variable Y: $SS_y = \sum (y-\bar{y})^2 = \sum y^2 - (\sum y)^2/n = 9929.25 - 304.5^2/10 = 657.225$
Regression SS: $SS_U = b \cdot SP = SP^2/SS_x = 15439.5^2/381690 = 624.533$
Deviation-from-regression SS: $SS_Q = SS_y - SS_U = 657.225 - 624.533 = 32.692$
Partitioning of the degrees of freedom
Total DF of Y: $DF_y = n - 1 = 10 - 1 = 9$
Regression DF: $DF_U = 1$
Deviation-from-regression DF: $DF_Q = n - 2 = 10 - 2 = 8$
ANOVA table
Source of variation          SS     DF      MS            F          Critical F
Regression                   SSU    1       SSU/1         MSU/MSQ    F0.05 or F0.01
Deviation from regression    SSQ    n − 2   SSQ/(n − 2)
Total variation of Y         SSy    n − 1
If the calculated F is greater than F0.05 with df = (1, n − 2), the regression variation is significant: the regression equation holds and Y can be predicted from X.
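The ANOVA table can be computed from SP, $SS_x$ and $SS_y$ alone, using $SS_U = SP^2/SS_x$. This is a sketch with a hypothetical function name; it does not compute critical values, which must still be read from an F table (for df = (1, 8), F0.05 = 5.32 and F0.01 = 11.26).

```python
# Hypothetical helper (not from the slides): ANOVA for simple linear regression
def regression_anova(sp, ss_x, ss_y, n):
    ss_u = sp ** 2 / ss_x        # regression SS, equal to b * SP
    ss_q = ss_y - ss_u           # deviation-from-regression SS
    df_u, df_q = 1, n - 2
    ms_u, ms_q = ss_u / df_u, ss_q / df_q
    return {"SSU": ss_u, "SSQ": ss_q, "MSU": ms_u, "MSQ": ms_q, "F": ms_u / ms_q}

# Example 8.1: SP, SSx and SSy from the second-level data
table = regression_anova(15439.5, 381690, 657.225, 10)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```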
ANOVA table for Example 8.1
Source of variation          SS        DF   MS        F          F0.05   F0.01
Regression                   624.533   1    624.533   152.83**   5.32    11.26
Deviation from regression    32.692    8    4.086
Total variation of Y         657.225   9
The regression variation is highly significant (F > F0.01): X has a highly significant effect on the variation of Y, and the regression equation is valid.
2. t test of the regression coefficient
This test checks whether the regression coefficient really differs from zero.
H0: β = 0 (no linear regression relationship); HA: β ≠ 0 (a linear regression relationship exists)
$t = b/S_b$, which follows a t distribution with df = n − 2.
Inference: if $|t| \ge t_{\alpha, n-2}$, reject H0.
The standard error of the regression coefficient is
$S_b = S_{y/x}/\sqrt{SS_x}$
where $S_{y/x}$ is the standard error of estimate of the regression (the standard error of deviation from regression):
$S_{y/x} = \sqrt{Q/(n-2)} = \sqrt{SS_Q/(n-2)} = \sqrt{32.692/8} = 2.0215$
For Example 8.1:
$S_b = S_{y/x}/\sqrt{SS_x} = 2.0215/\sqrt{381690} = 0.00327$
$t = b/S_b = 0.040450/0.0032720 = 12.36$**
Since t > t0.01,8 = 3.355, H0 is rejected and HA is accepted: the linear regression relationship is highly significant. (In simple linear regression $t^2 = F$; here $12.36^2 \approx 152.8$, matching the F test.)
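The t test of b can be sketched the same way, again starting from SP, $SS_x$ and $SS_y$. The function name is hypothetical and the critical value 3.355 is the one quoted on the slide; the sketch also makes it easy to confirm that $S_b$ is about 0.00327 and that $t^2$ reproduces the F value.

```python
import math

# Hypothetical helper (not from the slides): t test of the slope b, df = n - 2
def slope_t_test(sp, ss_x, ss_y, n):
    b = sp / ss_x
    ss_q = ss_y - b * sp                 # deviation-from-regression SS
    s_yx = math.sqrt(ss_q / (n - 2))     # standard error of estimate S_y/x
    s_b = s_yx / math.sqrt(ss_x)         # standard error of b
    return b, s_yx, s_b, b / s_b

b, s_yx, s_b, t = slope_t_test(15439.5, 381690, 657.225, 10)
print(f"b={b:.6f}  S_y/x={s_yx:.4f}  S_b={s_b:.6f}  t={t:.2f}")
# t0.01,8 = 3.355 as quoted on the slide
print("reject H0 at 0.01" if abs(t) >= 3.355 else "do not reject H0 at 0.01")
```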
3. Reference to the significance of the correlation coefficient r
See the discussion in Section 3.
Section 3 Linear correlation analysis
I. Calculation of the correlation coefficient
$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$
In Example 8.1, $r = 15439.5/\sqrt{381690 \times 657.225} = 0.9748$
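As a small sketch, assuming the sums of squares and products have already been computed, r follows directly from the formula above. The function name `correlation` is hypothetical.

```python
import math

# Hypothetical helper (not from the slides): r = SP / sqrt(SSx * SSy)
def correlation(sp, ss_x, ss_y):
    return sp / math.sqrt(ss_x * ss_y)

r = correlation(15439.5, 381690, 657.225)   # Example 8.1
print(f"r = {r:.4f}")
```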
II. Significance test of the correlation coefficient
There are two ways to test the significance of a correlation coefficient: the t test, and the table method (comparison with tabulated critical values of r).
1. t test of the correlation coefficient
This test checks whether a correlation really exists in the population.
H0: ρ = 0 (no correlation); HA: ρ ≠ 0 (a correlation exists)
$t = (r - \rho)/S_r = r/S_r$, which follows a t distribution with df = n − 2, where the standard error of the correlation coefficient is $S_r = \sqrt{(1 - r^2)/(n - 2)}$.
Compare |t| with $t_{\alpha, n-2}$ to accept or reject H0.
In Example 8.1, $S_r = \sqrt{(1 - r^2)/(n - 2)} = \sqrt{(1 - 0.9748^2)/8} = 0.0789$
$t = r/S_r = 0.9748/0.0789 = 12.35$**
Inference: t > t0.01,8 = 3.355, so HA is accepted: pear yield is highly significantly correlated with planting density.
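A sketch of the same test, with a hypothetical function name. Here r is recomputed from the summary sums so that no intermediate rounding enters; without rounding, t comes out near 12.36 rather than 12.35.

```python
import math

# Hypothetical helper (not from the slides): t test of H0: rho = 0, df = n - 2
def correlation_t_test(r, n):
    s_r = math.sqrt((1 - r ** 2) / (n - 2))   # standard error of r
    return s_r, r / s_r

r = 15439.5 / math.sqrt(381690 * 657.225)     # r for Example 8.1
s_r, t = correlation_t_test(r, 10)
print(f"S_r = {s_r:.4f}, t = {t:.2f}")        # compare t with t0.01,8 = 3.355
```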
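2. Table method
Look up Appendix Table 9, "Significant values of r and R" (p. 288), to obtain the critical value $r_{\alpha,\nu}$ for the given degrees of freedom ($\nu = n - 2$), significance level (α) and number of independent variables. Compare the calculated |r| with $r_{\alpha,\nu}$: if $|r| \ge r_{\alpha,\nu}$, the correlation coefficient is significant (or highly significant); otherwise it is not significant.
In Example 8.1, the table gives $r_{0.01,8} = 0.765$, so r = 0.9748** is highly significant.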
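For one independent variable, the tabulated critical r is tied to the critical t: solving $t = r\sqrt{\nu}/\sqrt{1 - r^2}$ for r gives $r = t/\sqrt{t^2 + \nu}$. The sketch below, with a hypothetical function name, uses that relation to reproduce the table value from the critical t quoted on the slides (3.355) and from the standard table value t0.05,8 = 2.306.

```python
import math

# Hypothetical helper (not from the slides): critical |r| from a critical t
def critical_r(t_crit, df):
    # From t = r*sqrt(df)/sqrt(1 - r^2) it follows that r = t / sqrt(t^2 + df)
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_01 = critical_r(3.355, 8)   # t0.01,8 as quoted on the slides
r_05 = critical_r(2.306, 8)   # t0.05,8 from a standard t table
print(f"r0.01,8 = {r_01:.3f}, r0.05,8 = {r_05:.3f}")
```

This is why the t test and the table method always agree for simple correlation; the table simply saves the conversion.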
III. Characteristics of the correlation coefficient
The correlation coefficient indicates the degree of linear relationship between two parallel variables;
it can be positive or negative;
its range is $-1 \le r \le 1$.
IV. Coefficient of determination r²
The coefficient of determination is the square of the correlation coefficient.
It represents the part of the joint variation of x and y that can be explained by the correlation; equivalently, $r^2 = SS_U/SS_y$ is the proportion of the total variation of y accounted for by the linear regression on x. It therefore also indicates the degree of relationship between two parallel variables. In Example 8.1, $r^2 = 624.533/657.225 \approx 0.950$.
Its range is $0 \le r^2 \le 1$.
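A short sketch, with a hypothetical function name, showing that $SS_U/SS_y$ and the square of r are the same number for Example 8.1.

```python
# Hypothetical helper (not from the slides): r^2 = SSU / SSy with SSU = SP^2 / SSx
def determination(sp, ss_x, ss_y):
    return (sp ** 2 / ss_x) / ss_y

r2 = determination(15439.5, 381690, 657.225)
r = 15439.5 / (381690 * 657.225) ** 0.5        # correlation coefficient for comparison
print(f"r^2 = {r2:.4f}, r squared = {r ** 2:.4f}")
```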
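V. Relationship between linear regression and correlation
Same sign: b and r always have the same sign, because both are calculated from SP.
Same significance: the tests of b and r always give the same result, because their t statistics are identical. Therefore, in practice, the significance of the correlation coefficient (checked with the table) is often used to infer the significance of the regression coefficient.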
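Both properties can be checked numerically. In the sketch below (hypothetical function name), b and r are computed from the Example 8.1 sums; the assertions confirm that they share a sign, that $b = r\sqrt{SS_y/SS_x}$, and that the two t statistics coincide, since $t_b = r\sqrt{n-2}/\sqrt{1-r^2} = t_r$.

```python
import math

# Hypothetical helper (not from the slides): b, r and both t statistics
def both_tests(sp, ss_x, ss_y, n):
    b = sp / ss_x
    r = sp / math.sqrt(ss_x * ss_y)
    s_b = math.sqrt((ss_y - b * sp) / (n - 2)) / math.sqrt(ss_x)   # standard error of b
    s_r = math.sqrt((1 - r ** 2) / (n - 2))                        # standard error of r
    return b, r, b / s_b, r / s_r

b, r, t_b, t_r = both_tests(15439.5, 381690, 657.225, 10)
assert (b > 0) == (r > 0)                                  # same sign, both from SP
assert math.isclose(b, r * math.sqrt(657.225 / 381690))    # b = r * sqrt(SSy / SSx)
assert math.isclose(t_b, t_r)                              # identical t statistics
print(f"b={b:.5f}, r={r:.4f}, t_b={t_b:.3f}, t_r={t_r:.3f}")
```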
Exercises
p. 194, Exercise 7