1
Linear Regression Analysis
張玉坤, Professor, Department of Mathematics, Tamkang University; Adjunct Professor, Department of Nursing, National Defense Medical Center
2
Q: When do we need regression analysis?
Association (explore relationships)
Prediction (build a predictive model)
Adjustment (adjust for the effects of confounders)
Q: What do we need to collect (prepare)?
Collect profile data: Y and X1, X2, X3, … e.g. BW and gestation, maternal age, (smoking), …
3
Generalized Linear Models
Explore associations or build a predictive model: Y vs. X1, X2, X3, …
For example:
SBP vs. Age, Sex, Race, …
CHD vs. Age, Sex, SBP, …
Response Rate vs. Treatments, Sex, …
General Linear Model
4
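Statistical methods covered by the General Linear Model (Y ~ Normal)
Two Independent Samples T-Test
Analysis of Variance (ANOVA) (X1, X2, … are all categorical variables)
Regression (X1, X2, … are all continuous variables)
Analysis of Covariance (ANCOVA) (X1, X2, … include both categorical and continuous variables)
Note: commercially available statistical software can handle all of these analyses.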
6
Explore the Association
Y vs. X (X1, X2, X3, …): Y and X have a straight-line relationship.
[Figure: a straight line on Y-versus-X axes, with the slope and intercept marked.]
7
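Explore the Association
Y vs. X (X1, X2, X3, …): the two-point form of a straight line.
Slope = (Y2 − Y1) / (X2 − X1); Intercept = Y1 − Slope × X1
[Figure: the line through (X1, Y1) and (X2, Y2), with the slope and intercept marked.]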
8
Explore the Association
Y vs. X (X1, X2, X3, …) BirthWeight.xlc
Slope = (Y2 − 2420) / (42 − 36), where Y2 is the birth weight observed at 42 weeks
Intercept = 2420 − Slope × 36
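The two-point calculation can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` is made up here, and the two points are the endpoints listed for BW3.dta later in the deck, which may or may not be the pair used on this slide.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# (gestation in weeks, birth weight in grams); assumed pair, see the note above.
slope, intercept = line_through((36, 2420), (42, 3530))
print(f"slope = {slope} g/week, intercept = {intercept} g")
```

With only two points the line fits exactly; with n > 2 points a fitting criterion is needed.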
9
Explore the Association
Slope = (Y2 − 2420) / (42 − 36), where Y2 is the birth weight observed at 42 weeks
Intercept = 2420 − Slope × 36
BW2.SPSS BW.SPSS
10
Explore the Association: usually, the sample size n > 2
BW vs. Gestation, Smoking
ID   Weight   Gestat   Smoking
1    2940     38
2    3130
3    2420     36
4    2450     34
5    2760     39
6    2440     35
7    3226     40
8    3301     42
9    2729     37
10   3410
11   2715
12   3095
13
14   3244
15   2520
BirthWeight.exc
11
Scatter Plot (plot the data first) BW.STATA
12
Scatter Plot (plot the data first) BW.SPSS BW.STATA
13
Note: Always draw a scatter plot before running a regression analysis, or else… Example: Anscombe's Quartet
X1   Y1      X2   Y2     X3   Y3      X4   Y4
10   8.04    10   9.14   10   7.46    8    6.58
8    6.95    8    8.14   8    6.77    8    5.76
13   7.58    13   8.74   13   12.74   8    7.71
9    8.81    9    8.77   9    7.11    8    8.84
11   8.33    11   9.26   11   7.81    8    8.47
14   9.96    14   8.10   14   8.84    8    7.04
6    7.24    6    6.13   6    6.08    8    5.25
4    4.26    4    3.10   4    5.39    19   12.50
12   10.84   12   9.13   12   8.15    8    5.56
7    4.82    7    7.26   7    6.42    8    7.91
5    5.68    5    4.74   5    5.73    8    6.89
Anscombe.dta Anscombe.SPSS
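Why the plot matters can be checked numerically. The sketch below fits a least-squares line to each of the four Anscombe data sets; the helper `fit_line` is an illustrative name, not something from the slides. All four sets give practically the same line, even though their scatter plots look completely different.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets 1 to 3
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f} x")
```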
14
Anscombe.dta Anscombe.SPSS
15
twoway (lfitci y1 x1) (scatter y1 x1, msymbol(circle) mcolor(red) msize(medium))
16
Note: Which scatter plots should be drawn for a regression analysis? Ans.: As far as possible, plot all pairs of Y's and X's (matrix plot).
Anscombe.spss Anscombe.dta
17
Study objective: BW vs. Gestation, Smoking (n > 2)
Q: How does regression analysis use the theoretical framework of statistics to model (describe) this phenomenon? BW.SPSS BW.STATA
18
Y: Dependent Variable
X: Independent Variable(s)
Predicted (fitted) values
19
Other names for the dependent and independent variables
Dependent variable: explained variable, outcome variable, endogenous variable
Independent variable: explanatory variable, predictor, exogenous variable
20
Ans.: The least-squares method (LSE): minimize the sum of squared vertical distances, i.e. the least sum of squared errors
BW.SPSS BW.STATA
21
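Least Squares Estimator (LSE): b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², b0 = Ȳ − b1·X̄
That is, the least-squares (LSE) regression line always passes through the point of means (X̄, Ȳ).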
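As a quick check of the least-squares estimator, the sketch below applies the standard closed-form formulas to the three BW3.dta observations listed later in the deck. The names `b0` and `b1` are labels chosen here for the intercept and slope; the slides' own symbols are not visible in this transcript.

```python
# Three observations from BW3.dta: gestation (weeks) and birth weight (grams).
gestat = [36, 39, 42]
weight = [2420, 3130, 3530]

n = len(gestat)
x_bar = sum(gestat) / n
y_bar = sum(weight) / n

# Least-squares slope and intercept.
sxx = sum((x - x_bar) ** 2 for x in gestat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gestat, weight))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# The fitted line evaluated at the mean of X returns the mean of Y.
fitted_at_mean = b0 + b1 * x_bar
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"fitted value at x_bar = {fitted_at_mean:.4f}, y_bar = {y_bar:.4f}")
```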
22
Using SPSS Linear Regression (BW.SPSS, BW3.dta)
id   weight   gestat   smoking
3    2420     36       1
13   3130     39
21   3530     42       2
23
BW3.dta
Points plotted: (36, 2420), (39, 3130), (42, 3530)
24
ANOVA Table (BW3.dta)
Points: (36, 2420), (39, 3130), (42, 3530)
Residual Sum of Squares = (2420 − 2471.67)² + (3130 − 3026.67)² + (3530 − 3581.67)² = 16016.67
weight   ymean     yhat      Y − Yhat   Yhat − Ymean   Y − Ymean
2420     3026.67   2471.67   −51.67     −555           −606.67
3130     3026.67   3026.67   103.33     0              103.33
3530     3026.67   3581.67   −51.67     555            503.33
ErrorSS = 16016.67; RegSS = 616050; TotalSS = 632066.67
25
ANOVA Table (BW3.dta)
Points: (36, 2420), (39, 3130), (42, 3530)
Regression Sum of Squares = (2471.67 − 3026.67)² + (3026.67 − 3026.67)² + (3581.67 − 3026.67)² = 616050
ErrorSS = 16016.67; RegSS = 616050; TotalSS = 632066.67
26
ANOVA Table (BW3.dta)
Points: (36, 2420), (39, 3130), (42, 3530)
Total Sum of Squares = (2420 − 3026.67)² + (3130 − 3026.67)² + (3530 − 3026.67)² = 632066.67
ErrorSS = 16016.67; RegSS = 616050; TotalSS = 632066.67
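The three sums of squares can be reproduced from the three BW3.dta points. The sketch below is one way to do it in plain Python; the variable names are illustrative, and R² is added only as the usual ratio RegSS / TotalSS.

```python
gestat = [36, 39, 42]
weight = [2420, 3130, 3530]

n = len(gestat)
x_bar = sum(gestat) / n
y_bar = sum(weight) / n

# Least-squares fit of weight on gestation.
sxx = sum((x - x_bar) ** 2 for x in gestat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gestat, weight))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in gestat]

# Decomposition: TotalSS = RegSS + ErrorSS.
error_ss = sum((y - yh) ** 2 for y, yh in zip(weight, y_hat))
reg_ss = sum((yh - y_bar) ** 2 for yh in y_hat)
total_ss = sum((y - y_bar) ** 2 for y in weight)
r_squared = reg_ss / total_ss

print(f"ErrorSS = {error_ss:.2f}")
print(f"RegSS   = {reg_ss:.2f}")
print(f"TotalSS = {total_ss:.2f}")
print(f"R^2     = {r_squared:.4f}")
```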
27
Q: How should the regression coefficients be interpreted? BW.STATA
28
Q: How should the regression coefficients be interpreted?
For each additional week of gestation, birth weight increases by 130.8166 grams on average (ignoring the effects of other factors).
BW.STATA BW.SPSS
29
Note: When the independent variable X is categorical, how should the regression results be interpreted?
30
Two Independent Samples T-Test: comparing the difference between two groups
This can be rewritten as a regression model. FEV.dta FEV.SPSS
31
Two Independent Samples T-Test
Testing Hypothesis; interpretation of the regression coefficients. FEV.SPSS FEV.dta
32
FEV.SPSS
33
Regression Model: FEV.SPSS
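The equivalence between the two-sample t-test and a regression on a 0/1 group indicator can be sketched as follows. The FEV values below are hypothetical, not taken from FEV.dta, and the names are illustrative. The point is that the intercept equals the mean of group 0, the slope equals the difference in group means, and the slope's t statistic matches the pooled-variance t statistic.

```python
import math

# Hypothetical data: group indicator (0/1) and FEV in liters.
group = [0, 0, 0, 0, 1, 1, 1, 1]
fev = [2.1, 2.5, 2.3, 2.7, 3.0, 3.4, 3.1, 3.3]
n = len(fev)

y0 = [y for g, y in zip(group, fev) if g == 0]
y1 = [y for g, y in zip(group, fev) if g == 1]
mean0 = sum(y0) / len(y0)
mean1 = sum(y1) / len(y1)

# Simple regression of FEV on the indicator.
g_bar = sum(group) / n
y_bar = sum(fev) / n
sxx = sum((g - g_bar) ** 2 for g in group)
sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(group, fev))
b1 = sxy / sxx
b0 = y_bar - b1 * g_bar
resid_ss = sum((y - (b0 + b1 * g)) ** 2 for g, y in zip(group, fev))
t_slope = b1 / math.sqrt(resid_ss / (n - 2) / sxx)

# Pooled-variance two-sample t statistic.
ss_within = sum((y - mean0) ** 2 for y in y0) + sum((y - mean1) ** 2 for y in y1)
sp2 = ss_within / (n - 2)
t_pooled = (mean1 - mean0) / math.sqrt(sp2 * (1 / len(y0) + 1 / len(y1)))

print(f"b0 = {b0:.3f} (mean of group 0 = {mean0:.3f})")
print(f"b1 = {b1:.3f} (difference in means = {mean1 - mean0:.3f})")
print(f"t from regression = {t_slope:.4f}, pooled t = {t_pooled:.4f}")
```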
34
One-way Analysis of Variance (ANOVA)
35
One-way Analysis of Variance (ANOVA)
Testing Hypothesis : Catalyst.SPSS
36
Catalyst.SPSS
37
Similarly, the one-way ANOVA can be rewritten as a regression model. Catalyst.SPSS Cat.dta
38
Three-Way ANOVA
Dur_stay: Duration of hospital stay
Age
Sex: 1 Male; 2 Female
Temp: First temperature following admission
WBC: First WBC (×1000) following admission
Antibio: Received antibiotic (1: Yes; 2: No)
Bact_cul: Received bacterial culture (1: Yes; 2: No)
Service: 1 Medication; 2 Surgery
Hospital.SPSS
39
Three-Way ANOVA: 2^3 Factorial Design
Hospital.SPSS
40
Hospital.SPSS
41
Q: How can SPSS be used to apply a natural-log (ln) transformation to the data?
Hospital.SPSS
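The slides do this step in SPSS. As a language-neutral illustration of what the ln transformation does, here is a small Python sketch with hypothetical durations of stay (not the Hospital data). The `skewness` helper is defined here for the example. A right-skewed variable typically becomes more symmetric after taking logs.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical durations of hospital stay in days (right-skewed).
dur_stay = [3, 5, 5, 7, 9, 11, 14, 30]

# Natural-log transformation; requires strictly positive values.
ln_dur_stay = [math.log(d) for d in dur_stay]

print(f"skewness before: {skewness(dur_stay):.2f}")
print(f"skewness after:  {skewness(ln_dur_stay):.2f}")
```

Results on the log scale can be brought back to the original scale with `math.exp`.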
42
Hospital.SPSS
43
Hospital.SPSS
44
Hospital.SPSS
45
Hospital.SPSS
46
Hospital.SPSS
47
Hospital.SPSS
48
Hospital.SPSS Checking.EXC
49
Ex. Multiple Regression (ANCOVA)
Y: Birth Weight (Grams)
X1: Length of Gestation (Weeks)
X2: Smoking Status of Mother (1: Smoker; 2: Nonsmoker)
(Potential) Model: Y = β0 + β1·X1 + β2·X2 + ε
or, with an Interaction Term: Y = β0 + β1·X1 + β2·X2 + β3·X1·X2 + ε
50
Ex. Multiple Linear Regression (ANCOVA)
Y: Birth Weight (Grams)
X1: Length of Gestation (Weeks)
X2: Smoking Status of Mother (1: Smoker; 2: Nonsmoker)
(Potential) Model: Y = β0 + β1·X1 + β2·X2 + ε
or Y = β0 + β1·X1 + β2·X2 + β3·X1·X2 + ε
Q: Which one? BW.SPSS
51
BW.SPSS
52
BW.SPSS
53
Q: If the scatter plot is drawn separately by smoking status, what information does the result provide?
BW.SPSS
54
STATA data file BW.SPSS BW.dta Birth Weight
55
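There is no significant difference in mean newborn birth weight between mothers who did not smoke during pregnancy and mothers who smoked (ignoring the effects of other factors).
BW.SPSS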
56
STATA data file: Birth Weight (BW.SPSS, BW.dta)
Problem: the effect of an important confounder (gestational age) has been ignored!
Q: How can we adjust for the effect of the confounder (gestational age)?
57
BW.SPSS
58
BW.SPSS
59
Using SPSS SPSS.sav Interpretation?
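After adjusting for gestational age, newborns of mothers who did not smoke during pregnancy weigh more on average than newborns of smokers, by the estimated smoking-status coefficient (in grams).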
60
STATA data file BW.SPSS BW.dta Birth Weight
61
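After adjusting for gestational age, newborns of mothers who did not smoke during pregnancy weigh more on average than newborns of smokers, by the estimated NonSM coefficient (in grams).
Application: estimate the birth weight of an infant born at 40 weeks of gestation.
Smoker: NonSMi = 0
Non-Smoker: NonSMi = 1
BW.SPSS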
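The prediction step can be sketched as follows. The data are hypothetical, not BW.SPSS, and the helper `ols` is written here for the example: it solves the normal equations by Gaussian elimination. The indicator `nonsm` follows the slide's 0/1 coding. The difference between the two predictions at 40 weeks is exactly the NonSM coefficient.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


# Hypothetical data: gestation (weeks), non-smoker indicator, birth weight (g).
gestat = [34, 36, 38, 39, 40, 42, 35, 37, 38, 40, 41, 42]
nonsm = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
weight = [2300, 2550, 2800, 2950, 3100, 3350, 2700, 2900, 3100, 3350, 3450, 3600]

# Design matrix columns: intercept, gestation, NonSM.
rows = [[1.0, g, s] for g, s in zip(gestat, nonsm)]
b0, b1, b2 = ols(rows, weight)
residuals = [w - (b0 + b1 * g + b2 * s) for w, g, s in zip(weight, gestat, nonsm)]

pred_smoker = b0 + b1 * 40 + b2 * 0
pred_nonsmoker = b0 + b1 * 40 + b2 * 1
print(f"gestation coefficient: {b1:.2f} g/week, NonSM coefficient: {b2:.2f} g")
print(f"predicted weight at 40 weeks, smoker:     {pred_smoker:.1f} g")
print(f"predicted weight at 40 weeks, non-smoker: {pred_nonsmoker:.1f} g")
```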
62
Ex. Multiple Regression (ANCOVA)
Y: FEV1 (liters)
X1: Age (yrs)
X2: Height (inches)
X3: Sex (0: Female; 1: Male)
X4: Smoking Status (0: Non-current Smoker; 1: Current Smoker)
63
Matrix Scatter Plots FEV.SPSS
64
FEV.SPSS
65
FEV.SPSS
66
FEV.SPSS
67
FEV.SPSS
68
FEV.SPSS
72
Q1: What's your finding?
Q2: What's your next step?
FEV.SPSS
73
Q: In regression analysis, when should an interaction term be included, and how should its regression coefficient be interpreted?
FEV.STATA
74
FEV.STATA FEV.SPSS Sex 0: Female 1: Male
75
FEV.STATA FEV.SPSS
76
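Is the relationship between Factor X and Result Y affected by Factor C (i.e., is C a moderator)?
We need to test whether X and C interact. [Diagram: C acting on the path from X to Y.] Check whether the interaction term is significant.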
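What the interaction coefficient measures can be seen in a small constructed example. The data below are hypothetical and noise-free, built so that the slope of Y on X is 1.0 when C = 0 and 1.5 when C = 1. The helper `ols` is written here for the sketch. The coefficient of X·C then recovers the difference between the two slopes. Testing whether it is significant needs standard errors, which this sketch does not compute.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


# Constructed data: y = 2 + 1.0*x + 0.5*c + 0.5*x*c (no noise).
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
c = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [2 + 1.0 * xi + 0.5 * ci + 0.5 * xi * ci for xi, ci in zip(x, c)]

# Design matrix columns: intercept, X, C, X*C (the interaction term).
rows = [[1.0, xi, ci, xi * ci] for xi, ci in zip(x, c)]
b0, b_x, b_c, b_xc = ols(rows, y)

slope_c0 = b_x           # slope of Y on X when C = 0
slope_c1 = b_x + b_xc    # slope of Y on X when C = 1
print(f"slope when C = 0: {slope_c0:.3f}")
print(f"slope when C = 1: {slope_c1:.3f}")
print(f"interaction coefficient (difference in slopes): {b_xc:.3f}")
```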
77
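Q: After adjusting for age, sex, and smoking, is FEV associated with height?
Interpretation of the height coefficient: "After adjusting for age, sex, and smoking, each additional inch of height is associated with an average increase in FEV equal to the estimated height coefficient, and the association is statistically significant (p < 0.001)." Q: What does that mean?! or Why?! FEV.STATA FEV.SPSS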
78
Least Sum of Squared Errors
Ans.: The least-squares method (minimize the sum of squared vertical distances), i.e. the least sum of squared errors. Regress Y on X, then take the residual. FEV.STATA FEV.SPSS
79
[Figure: residuals of Y plotted against the predicted values from (X1, X2, …, Xk).]
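The pairs plotted in a residual-versus-predicted display can be computed as below. This is a sketch with hypothetical data and a single X for brevity. It also checks two properties of least-squares residuals from a model with an intercept: they sum to zero and are uncorrelated with the predicted values.

```python
# Hypothetical data, single predictor.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Each (predicted, residual) pair is one point of the residual plot.
for p, r in zip(predicted, residuals):
    print(f"predicted = {p:7.3f}   residual = {r:7.3f}")

residual_sum = sum(residuals)
cross_product = sum(r * p for r, p in zip(residuals, predicted))
```

A residual plot with no visible pattern is consistent with the linear model; curvature or a funnel shape suggests that the model or the scale of Y should be reconsidered.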
80
FEV.SPSS
81
Thanks for Your Attention