1
Chapter 13: Simple Linear Regression, Part 2
2
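Learning Objectives
As a result of this class, you will be able to:
1. Describe the simple linear regression model
2. Explain and apply the least squares method
3. Estimate the parameters of the simple regression model
4. Solve for each sum of squares
5. Estimate each component of variation in the model
6. Compute the coefficient of determination
7. Estimate the linear correlation coefficient
8. Use software and interpret its reports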
3
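Regression Modeling Steps
1. Hypothesize the form of the model relating the response variable to the independent variable
2. Estimate the model parameters
3. Describe the probability distribution of the error term and estimate its variance
4. Evaluate the model
5. Use the model for estimation or prediction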
4
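Simple Linear Regression Model
1. The relationship between the independent variable and the response variable is linear:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$Y_i$: dependent (response) variable
$X_i$: independent (explanatory) variable
$\beta_0$: Y-intercept parameter
$\beta_1$: slope parameter
$\varepsilon_i$: random error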
5
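Linear Regression Model Assumptions
1. The probability distribution of the random error has mean 0
2. The probability distribution of the random error has constant variance $\sigma^2$
3. The probability distribution of the random error is normal
4. The random errors are mutually independent
i.i.d.: independent and identically distributed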
6
Error Probability Distribution
[Figure: density $f(\varepsilon)$ of the random error drawn around the regression line at $X_1$ and $X_2$; axes X and Y]
7
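Sample Linear Regression Model
$\hat{\varepsilon}_i$ = observed error (residual)
[Figure: fitted sample regression line, distinguishing sampled observations from observations not sampled]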
8
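Least Squares Method Graphically
Least squares (LS) chooses the line that minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2$.
[Figure: scatter plot (axes X and Y) with residuals $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ drawn from each point to the fitted line]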
9
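Solving for the Coefficients
Prediction equation: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Slope estimate: $\hat{\beta}_1 = SS_{xy} / SS_x$
Intercept estimate: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
Note: the point $(\bar{X}, \bar{Y})$ always lies on the regression line.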
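As a minimal sketch, the slope and intercept can be computed from summary sums alone. The function name `fit_from_sums` is hypothetical, and the formulas assumed are the standard least-squares ones ($\hat{\beta}_1 = SS_{xy}/SS_x$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$). The inputs are the sums quoted in this deck for the Sales and Advertising example.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares intercept and slope from summary sums (illustrative helper)."""
    ss_x = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of X
    ss_xy = sum_xy - sum_x * sum_y / n    # corrected sum of cross-products
    b1 = ss_xy / ss_x                     # slope estimate
    b0 = sum_y / n - b1 * sum_x / n       # intercept: line passes through (x-bar, y-bar)
    return b0, b1


# Sums from the Sales and Advertising example: n=5, sum x=15, sum y=10, sum xy=37, sum x^2=55
b0, b1 = fit_from_sums(n=5, sum_x=15, sum_y=10, sum_xy=37, sum_x2=55)
print(round(b0, 4), round(b1, 4))
```

The result agrees with the values b0 = -0.1 and b1 = 0.7 used later in the deck for the same example.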
10
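Random Error Variation
1. The variation of the actual Y around the predicted $\hat{Y}$
2. Measured by the standard error of the regression model: the sample standard error $\hat{\sigma}_\varepsilon = s$
3. Affected by:
the correctness of the chosen model
the accuracy of each parameter estimate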
15
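Measures of Variation in Regression
1. Total variation (SST or SSy): the sum of squared differences between the observed values $Y_i$ and the mean $\bar{Y}$
2. Variation explained by the model (SSR): the sum of squared differences between the predicted values $\hat{Y}_i$ and the mean $\bar{Y}$
3. Random variation the model leaves unexplained (SSE): the sum of squared differences between the observed values $Y_i$ and the predicted values $\hat{Y}_i$, arising from factors the model does not account for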
16
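Variation Measures
[Figure: for an observation $Y_i$, the total sum of squares $(Y_i - \bar{Y})^2$ splits into the part explained by the model, $(\hat{Y}_i - \bar{Y})^2$, and the part not yet explained, $(Y_i - \hat{Y}_i)^2$]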
17
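Venn Diagrams and the Explanatory Power of the Regression Model
SST = SSR + SSE
[Figure: two overlapping circles, Sales and Sizes. The overlap (SSR) is the variation in sales explained by sizes, or the variation in sizes used in explaining variation in sales. The rest of the Sales circle (SSE) is the variation in sales explained by the error term.]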
18
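Computing the Sums of Squares
Total variation: $SST = SS_y = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - (\sum Y_i)^2 / n$
Variation explained by regression: $SSR = \sum (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1 SS_{xy}$
Error (residual) variation: $SSE = \sum (Y_i - \hat{Y}_i)^2 = SST - SSR$
Because SST = SSE + SSR, only two of the three need to be computed directly.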
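The identity SST = SSR + SSE can be sketched by splitting each deviation from the mean into an explained and an unexplained part. The derivation below assumes $\hat{Y}_i$ comes from a least-squares fit that includes an intercept, which is what makes the cross term vanish.

```latex
\begin{align*}
Y_i - \bar{Y} &= (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \\
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
   + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
   + 2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) \\
  &= \mathrm{SSR} + \mathrm{SSE} + 0
\end{align*}
```

The last sum is zero because least-squares residuals sum to zero and are uncorrelated with the fitted values.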
21
Computation Table for the Coefficients
22
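Common Formulas for the Coefficients
$SS_x = \sum X_i^2 - (\sum X_i)^2 / n$
$SS_y = \sum Y_i^2 - (\sum Y_i)^2 / n$
$SS_{xy} = \sum X_i Y_i - (\sum X_i)(\sum Y_i) / n$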
23
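Sums of Squares Example
You are a marketing analyst for Ming Chuan Teddy Bears. Past advertising expense and actual sales are related as follows:
Advertising expense (thousand yuan) | Sales (thousand units)
What are the sums of squares for advertising expense and sales?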
24
Scattergram: Sales vs. Advertising
[Figure: scatter plot of sales against advertising expense]
25
Summary Table for the Estimates
26
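Computing the Sums of Squares, Example 1
Open the file: Sales and Advertising
Sums over the 5 observations: $\sum x = 15$, $\sum y = 10$, $\sum xy = 37$, $\sum x^2 = 55$, $\sum y^2 = 26$, $n = 5$
$SS_y = SST = 26 - 10 \times 10 / 5 = 6$
$SS_{xy} = 37 - 15 \times 10 / 5 = 7$
$SS_x = 55 - 15 \times 15 / 5 = 10$
$SSR = 0.7 \times 7 = 4.9$
$SSE = 6 - 0.7 \times 7 = 6 - 4.9 = 1.1$
Then compute the same quantities in Excel from their definitions and compare.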
27
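Computing the Sums of Squares, Example 2
Open the file: Data 2
Sums over the 6 observations: $\sum x = 185$, $\sum y = 210$, $\sum xy = 7400$, $\sum x^2 = 6725$, $\sum y^2 = 9700$, $n = 6$
$SS_x = 1020.833$, $SS_y = SST = 2350$, $SS_{xy} = 925$
$SSR = 0.906 \times 925 = 838.16$
$SSE = 2350 - 0.906 \times 925 = 2350 - 838.16 = 1511.84$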
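The two worked examples can be checked with a short script. This is a sketch under the assumption that SSR is obtained as slope times SSxy and SSE as SST minus SSR. The helper name `sums_of_squares` is hypothetical, and the inputs are the summary sums quoted for the two data files.

```python
def sums_of_squares(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """SSx, SST, SSxy, SSR and SSE from summary sums (illustrative helper)."""
    ss_x = sum_x2 - sum_x ** 2 / n       # SSx
    ss_y = sum_y2 - sum_y ** 2 / n       # SSy, which is SST
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
    b1 = ss_xy / ss_x                    # slope estimate
    ssr = b1 * ss_xy                     # variation explained by the regression
    sse = ss_y - ssr                     # remaining (error) variation
    return {"SSx": ss_x, "SST": ss_y, "SSxy": ss_xy, "SSR": ssr, "SSE": sse}


example1 = sums_of_squares(5, 15, 10, 37, 55, 26)
example2 = sums_of_squares(6, 185, 210, 7400, 6725, 9700)
for name, result in (("Example 1", example1), ("Example 2", example2)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Only two of SST, SSR, and SSE are computed from the data; the third follows from SST = SSR + SSE.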
28
ANOVA Table
Source      df          SS   MS               F        Significance F
Regression  p-1 = 1     SSR  MSR = SSR/(p-1)  MSR/MSE  p-value of the F test
Residuals   n-p = n-2   SSE  MSE = SSE/(n-p)
Total       n-1         SST
29
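Computing the ANOVA Table
Sums of squares: SST = SSR + SSE
Degrees of freedom: (n-1) = (p-1) + (n-p), with p = 2
MSR = SSR/(p-1); MSE = SSE/(n-p)
MSE is the estimate of the error variance $\sigma^2$
Test $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$
Under $H_0$, $F^* = MSR/MSE \sim F(p-1, n-p)$
Coefficient of determination: since SST = SSE + SSR, $R^2 = SSR/SST$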
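The ANOVA arithmetic can be sketched in a few lines. The function name `anova_simple` is hypothetical, and the inputs (SSR = 4.9, SSE = 1.1, n = 5) are taken from the Sales and Advertising example. The p-value is left out because it needs the tail probability of an F distribution, which would normally come from a statistics library.

```python
def anova_simple(ssr, sse, n, p=2):
    """Degrees of freedom, mean squares and F for a regression with p parameters."""
    df_reg = p - 1            # regression (explained) df
    df_err = n - p            # error (residual) df
    msr = ssr / df_reg        # mean square for regression
    mse = sse / df_err        # mean square error, the estimate of sigma^2
    return {"df": (df_reg, df_err, n - 1), "MSR": msr, "MSE": mse, "F": msr / mse}


table = anova_simple(ssr=4.9, sse=1.1, n=5)
print(table["df"], round(table["MSR"], 3), round(table["MSE"], 3), round(table["F"], 2))
```

With these inputs the F statistic is about 13.36 on (1, 3) degrees of freedom.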
30
ANOVA Table, Answer 1
Source  df  SS   MS     F      P-value
Reg     1   4.9  4.9    13.36  0.0353
Error   3   1.1  0.367
Total   4   6.0
The rows give the regression (explained) df with SSR, the error (residual) df with SSE, and the total df with SST. The error mean square, 0.367, is $s^2$, the estimate of the error variance.
31
ANOVA Table, Answer 2
Source  df  SS      MS      F      P-value
Reg     1   816.06          2.133  0.218
Error   4           383.49
Total   5
The rows give the regression (explained) df with SSR, the error (residual) df with SSE, and the total df with SST. The error mean square is $s^2$, the estimate of the error variance.
32
ANOVA Example: Excel Output for Produce Stores
[Figure: Excel regression output for the produce stores data, annotated with the regression (explained) df and SSR, the error (residual) df and SSE, and the total df and SST]
33
The Coefficient of Determination
Measures the proportion of the total variation in Y (the dependent variable) that is explained, and so removed, by bringing the independent variable X into the regression model.
34
Venn Diagrams and the Explanatory Power of the Regression Model
[Figure: overlapping circles labeled Sales and Sizes]
35
Coefficient of Determination
1. The percentage of the total variation to be explained that the model does explain
$0 \le R^2 \le 1$
36
Coefficient of Determination Examples
[Figure: plots of the relationship between X and Y for different values of the coefficient of determination]
37
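Coefficient of Determination, Example 1
Why can $r^2$ not be negative or greater than 1?
Suppose a regression model gives $r^2 = 0.8$. Explain what this means.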
38
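Coefficient of Determination, Example 2
Suppose a regression model gives SSR = 36 and SSE = 4. Find SST and $r^2$, and explain what they mean.
SST = SSR + SSE = 36 + 4 = 40
$r^2$ = SSR/SST = 36/40 = 0.9
The regression model accounts for 90% of the total variation in Y, leaving only 10% as unexplained error variation.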
39
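Coefficient of Determination, Example 3
Suppose a regression model gives SST = 40 and SSE = 10. Find SSR and $r^2$, and explain what they mean.
SSR = SST - SSE = 40 - 10 = 30
$r^2$ = SSR/SST = 30/40 = 0.75
The regression model accounts for 75% of the total variation in Y, leaving only 25% as unexplained error variation.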
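Both examples use the same rule: any two of SST, SSR, and SSE determine $r^2$. The sketch below encodes that rule. The function name `r_squared` and its keyword arguments are hypothetical, and it assumes at least two of the three quantities are supplied.

```python
def r_squared(sst=None, ssr=None, sse=None):
    """Coefficient of determination from any two of SST, SSR, SSE (illustrative)."""
    if sst is None:
        sst = ssr + sse     # SST = SSR + SSE
    elif ssr is None:
        ssr = sst - sse     # SSR = SST - SSE
    return ssr / sst


r2_example2 = r_squared(ssr=36, sse=4)    # SSR and SSE known
r2_example3 = r_squared(sst=40, sse=10)   # SST and SSE known
print(r2_example2, r2_example3)
```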
40
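Solving for the Coefficient of Determination, Example 1
You are a marketing analyst for Ming Chuan Teddy Bears. Past advertising expense and actual sales are related as follows:
Advertising expense (thousand yuan) | Sales (thousand units)
What is the relationship between advertising expense and sales?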
41
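Computing the Coefficient of Determination, Example 1
Open the file: Sales and Advertising
Sums over the 5 observations: $\sum x = 15$, $\sum y = 10$, $\sum xy = 37$, $\sum x^2 = 55$, $\sum y^2 = 26$, $n = 5$
$SS_y = SST = 26 - 10 \times 10 / 5 = 6$
$SS_{xy} = 37 - 15 \times 10 / 5 = 7$
$SS_x = 55 - 15 \times 15 / 5 = 10$
$SSR = 0.7 \times 7 = 4.9$
$SSE = 6 - 0.7 \times 7 = 6 - 4.9 = 1.1$
$R^2 = 4.9 / 6 = 81.7\%$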
42
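Computing the Coefficient of Determination, Example 2
Open the file: Data 2
Sums over the 6 observations: $\sum x = 185$, $\sum y = 210$, $\sum xy = 7400$, $\sum x^2 = 6725$, $\sum y^2 = 9700$, $n = 6$
$SS_x = 1020.833$, $SS_y = SST = 2350$, $SS_{xy} = 925$
$SSR = 0.906 \times 925 = 838.16$
$SSE = 2350 - 0.906 \times 925 = 2350 - 838.16 = 1511.84$
$R^2 = 838.16 / 2350 = 35.7\%$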
43
Standard Error of Estimate
The standard deviation of the variation of observations around the regression line:
$S_{yx} = \sqrt{SSE / (n - 2)}$
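A minimal sketch of this quantity, assuming the usual definition $S_{yx} = \sqrt{SSE/(n-p)}$ with p = 2 parameters in simple regression. The function name `standard_error_of_estimate` is hypothetical, and the inputs come from the Sales and Advertising example (SSE = 1.1, n = 5).

```python
import math


def standard_error_of_estimate(sse, n, p=2):
    """Square root of the mean square error; p is the number of model parameters."""
    return math.sqrt(sse / (n - p))


s_yx = standard_error_of_estimate(sse=1.1, n=5)
print(round(s_yx, 4))
```

This is the square root of the MSE entry in the ANOVA table, so it is in the same units as Y.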
44
Coefficient of Determination and Standard Error: Produce Store Example
[Figure: Excel output for the produce stores data, with $S_{yx}$ and $r^2$ highlighted]
$r^2 = .94$: 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
45
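Coefficient of Determination Example
You are a marketing analyst for Ming Chuan Teddy Bears, and you know that $\hat{\beta}_0 = -0.1$ and $\hat{\beta}_1 = 0.7$.
Advertising (thousand yuan) | Sales (thousand units)
Interpret the coefficient of determination ($R^2 = 0.8167$).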
46
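$R^2$ in Software Output
The report lists Root MSE, R-square, Dep Mean, Adj R-sq, and C.V.
Root MSE: $\hat{\sigma} = S$
R-square: $R^2$
Adj R-sq: $R^2$ adjusted for the sample size and the number of X variables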
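The adjustment can be sketched as follows, assuming the common formula $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p)$, where p counts the model parameters (p = 2 in simple regression). The formula and the function name `adjusted_r_squared` are not given in the slides. The inputs are the R-sq value and sample size from the MINITAB quiz later in this deck.

```python
def adjusted_r_squared(r2, n, p=2):
    """Adjusted R^2 for a model with p parameters fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


# R-sq = 64.3% with n = 16 quarterly observations, as in the MINITAB quiz
adj = adjusted_r_squared(r2=0.643, n=16)
print(adj)
```

The result is close to the reported R-sq(adj) of 61.8%; small differences are expected because the reported R-sq is itself rounded.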
47
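Correlation Models
1. Measure the strength of the linear relationship between two variables
2. Coefficient of correlation: the population (true) correlation coefficient is $\rho$, with values between -1 and +1
3. Used to understand the strength and direction of the linear relationship between two variables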
48
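Sample Coefficient of Correlation
Measures the degree of linear association between two numerical variables
Pearson's coefficient of correlation: $r = SS_{xy} / \sqrt{SS_x \, SS_y}$
Note: r has the same sign as the slope estimate.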
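As a sketch, Pearson's r can be computed from the same summary sums used for the sums of squares, assuming the standard formula $r = SS_{xy}/\sqrt{SS_x \, SS_y}$. The function name `pearson_r_from_sums` is hypothetical, and the inputs are the sums from the Sales and Advertising example.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient from summary sums (illustrative helper)."""
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    # r takes the sign of SSxy, which is also the sign of the slope SSxy / SSx
    return ss_xy / math.sqrt(ss_x * ss_y)


r = pearson_r_from_sums(5, 15, 10, 37, 55, 26)
print(round(r, 4), round(r ** 2, 4))
```

Squaring r recovers the coefficient of determination for the same data, about 0.8167.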
49
Scatter Plot of Two Numerical Variables: Positive Correlation, Example 1
50
Computing the Linear Correlation Coefficient r, Example 1
51
Scatter Plot of Two Numerical Variables: Negative Correlation, Example 2
52
Computing the Linear Correlation Coefficient r, Example 2
53
Features of Correlation Coefficient
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
54
Scatter Plots for Various Linear Correlations
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = .6, and r = 1]
55
Coefficient of Correlation Values
[Figure: scale marked -1.0, -.5, +.5, and +1.0. The midpoint is no correlation, -1.0 is perfect negative correlation, and +1.0 is perfect positive correlation. The degree of negative correlation increases toward -1.0 and the degree of positive correlation increases toward +1.0.]
58
Coefficient of Correlation Examples
59
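Software Output
Open the four files Data 1, Yield and Fertilizer, PHStat Example 4, and PHStat Example 5. Use Excel or PHStat to obtain the ANOVA table, and explain in detail the resulting $r^2$, S, and estimated linear correlation coefficient r.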
60
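Review of Today's Lesson
As a result of this class, you will be able to:
1. Describe the simple linear regression model
2. Explain and apply the least squares method
3. Estimate the parameters of the simple regression model
4. Solve for each sum of squares
5. Estimate each component of variation in the model
6. Compute the coefficient of determination
7. Estimate the linear correlation coefficient
8. Use software and interpret its reports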
61
Quiz and Answer 1
Single choice: The least squares method minimizes which of the following?
a) SSR
b) SSE
c) SST
d) All of the above
ANSWER: b
62
Quiz and Answer 2
Single choice: The Y-intercept (b0) represents the
a) predicted value of Y when X = 0.
b) change in Y per unit change in X.
c) predicted value of Y.
d) variation around the line of regression.
ANSWER: a
63
Quiz and Answer 3
Single choice: The slope (b1) represents
a) predicted value of Y when X = 0.
b) the average change in Y per unit change in X.
c) the predicted value of Y.
d) variation around the line of regression.
ANSWER: b
64
Quiz and Answer 4
Single choice: In performing a regression analysis involving two numerical variables, we are assuming
a) the variances of X and Y are equal.
b) the variation around the line of regression is the same for each X value.
c) that X and Y are independent.
d) all of the above.
ANSWER: b
65
Quiz and Answer 5
Single choice: The residuals represent
a) the difference between the actual Y values and the mean of Y.
b) the difference between the actual Y values and the predicted Y values.
c) the square root of the slope.
d) the predicted value of Y for the average X value.
ANSWER: b
66
Quiz and Answer 6
Single choice: Which of the following assumptions concerning the probability distribution of the random error term is stated incorrectly?
a) The distribution is normal.
b) The mean of the distribution is 0.
c) The variance of the distribution increases as X increases.
d) The errors are independent.
ANSWER: c
67
Quiz and Answer 7
Single choice: The standard error of the estimate is a measure of
a) total variation of the Y variable.
b) the variation around the regression line.
c) explained variation.
d) the variation of the X variable.
ANSWER: b
68
Quiz and Answer 8
Single choice: If the correlation coefficient (r) = 1.00, then
a) the Y-intercept (b0) must equal zero.
b) the explained variation equals the unexplained variation.
c) there is no unexplained variation.
d) there is no explained variation.
ANSWER: c
69
Quiz and Answer 9
Single choice: Assuming a linear relationship between X and Y, if the coefficient of correlation (r) equals -0.30,
a) there is no correlation.
b) the slope (b1) is negative.
c) variable X is larger than variable Y.
d) the variance of X is negative.
ANSWER: b
70
Quiz and Answer 10
Single choice: The coefficient of determination (r2) tells us
a) that the coefficient of correlation (r) is larger than one.
b) whether r has any significance.
c) that we should not partition the total variation.
d) the proportion of total variation that is explained.
ANSWER: d
71
Quiz and Answer 11
Single choice: In a simple linear regression problem, r and b1
a) may have opposite signs.
b) must have the same sign.
c) must have opposite signs.
d) are equal.
ANSWER: b
72
Comprehensive Quiz and Answers
TABLE 16-3
The director of cooperative education at a state college wants to examine the effect of cooperative education job experience on marketability in the work place. She takes a random sample of four students. For these four, she finds out how many times each had a cooperative education job and how many job offers they received upon graduation. These data are presented in the table below.
Student CoopJobs JobOffer 1 4 2 6 3
73
Comprehensive Quiz and Answers 1
Referring to Table 16-3, set up a scatter diagram.
ANSWER: [Figure: scatter diagram]
74
Comprehensive Quiz and Answers 2
Fill in the blanks:
The least squares estimate of the slope is __________. (2.50)
The least squares estimate of the Y-intercept is __________. (1.00)
The prediction for the number of job offers for a person with 2 Coop jobs is __________. (6.00)
The total sum of squares (SST) is __________. (13.00)
75
Comprehensive Quiz 1: ANOVA Table
Source  df  SS    MS    F   P-value
Reg     1   12.5  12.5  50  0.0194
Error   2   0.5   0.25
Total   3   13.0
The rows give the regression (explained) df with SSR, the error (residual) df with SSE, and the total df with SST. The error mean square, 0.25, is $s^2$, the estimate of the error variance.
76
Comprehensive Quiz 1 and Answers, Part 2
Fill in the blanks, referring to Table 16-3:
The total sum of squares (SST) is __________. (13.0)
The regression sum of squares (SSR) is __________. (12.5)
The error or residual sum of squares (SSE) is __________. (0.50)
The coefficient of determination is __________. (0.962)
The standard error of estimate is __________. (0.50)
The coefficient of correlation is __________. (0.981)
77
Comprehensive Quiz 2, Part 1
The managing partner of an advertising agency believes that his company's sales are related to the industry sales. He uses MINITAB to analyze the last four years of quarterly data (i.e., n = 16) with the following results:
The regression equation is
Company = Industry
Predictor Coef Stdev t-ratio p-value
Constant
Industry
s = R-sq = 64.3% R-sq(adj) = 61.8%
78
Comprehensive Quiz 2, Part 2
Analysis of Variance
SOURCE DF SS MS F p
Regression
Error
Total
Durbin-Watson statistic = 1.59
79
Comprehensive Quiz 2 and Answers
Fill in the blanks:
The value of the quantity that the least squares regression line minimizes is ________. (11.912)
The estimates of the Y-intercept and slope are ________ and ________, respectively. (Y-intercept: 3.962)
The prediction for a quarter in which X = 120 is Y = ________. (8.816)
The standard error of the estimate is ________. (0.9224)
The coefficient of determination is ________. (0.643)
The adjusted coefficient of determination is ________. (0.618)
The correlation coefficient is ________. (0.802)