Multiple Regression Analysis


Outline: Omitted Variable Bias; The Multiple Regression Model; Estimating the Multiple Regression Model; The Multiple Regression Model: An Example; Analysis of Variance and Parameter Tests; Important Issues in the Multiple Regression Model

Omitted variable bias. We no longer assume that the explanatory variables are fixed values; instead we treat them as random variables. In the simple regression model there is only one explanatory variable, but in most situations the dependent variable Y can be explained by more than one variable. For example, income is affected not only by education but possibly also by work experience and other variables.

Omitted variable bias. Moreover, when only one explanatory variable is considered, omitted variable bias may arise. Suppose the explanatory variable (say, education) is correlated with another variable (say, parents' income): in general, the higher the parents' income, the better the education the children can obtain, and hence the higher their education level. Suppose also that this other variable (parents' income) itself directly affects the dependent variable (the child's income): in general, the higher the parents' income, the more resources are devoted to the children, and hence the higher the children's income.

Omitted variable bias. If we leave this variable out of the regression model, omitted variable bias results. Let the original explanatory variable be X, the omitted variable be Z, and the dependent variable be Y. In other words, a variable Z is an omitted variable of the regression model only if it satisfies both of the following conditions: (1) it is correlated with the explanatory variable already in the model, Corr(X, Z) ≠ 0; and (2) it also directly affects the dependent variable Y.

Suppose the true model is Yi = β0 + β1Xi + β2Zi + ui. The estimated model omits Z: Yi = β0 + β1Xi + vi, where vi = β2Zi + ui. The covariance between Xi and the error term is Cov(Xi, vi) = β2 Cov(Xi, Zi).

Therefore, the error term of the short regression is correlated with Xi. Since β2 < 0 (the effect of Pct EL on Test Score) and Cov(Xi, Zi) > 0, we have Cov(Xi, vi) = β2 Cov(Xi, Zi) < 0, so the OLS estimator of β1 is biased downward.

Consequences of an omitted variable. Omitted variable bias does not shrink as the sample size grows. In short, if we ignore the omitted variable, the OLS estimator of the coefficient on the original explanatory variable is no longer a consistent estimator of that parameter. The omitted variable bias depends on the size of |Cov(X, Z)|: if Cov(X, Z) > 0 there is a positive bias (the parameter of interest is overestimated); conversely, if Cov(X, Z) < 0 there is a negative bias (the parameter is underestimated).
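A small simulation makes the point concrete. This is a minimal sketch, not part of the original slides; all numbers are invented for illustration, and the only claim is the textbook one that omitting Z shifts the coefficient on X by roughly β2·Cov(X, Z)/Var(X).

```python
# Minimal sketch: simulating omitted variable bias with made-up numbers.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                    # omitted variable Z
x = 0.8 * z + rng.normal(size=n)          # X correlated with Z: Cov(X, Z) > 0
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # true model with beta1 = 2

# Short regression (Z omitted) vs. long regression (Z included)
X_short = np.column_stack([np.ones(n), x])
X_long  = np.column_stack([np.ones(n), x, z])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long,  *_ = np.linalg.lstsq(X_long,  y, rcond=None)

print("beta1 omitting Z :", b_short[1])   # roughly 2 + 3*Cov(X,Z)/Var(X), i.e. biased upward
print("beta1 including Z:", b_long[1])    # close to the true value 2
```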

An example of omitted variable bias: Mozart Effect? Listening to Mozart for 10-15 minutes could raise IQ by 8 or 9 points. (Nature 1993) Students who take optional music or arts courses in high school have higher English and math test scores than those who don’t.

The multiple regression model. We now extend the simple regression model, which has a single explanatory variable, to the following multiple regression model: Yi = β0 + β1X1i + β2X2i + … + βkXki + ei, where X = {X1, …, Xk} are the k explanatory variables in the model and ei is the random disturbance term, with zero mean and constant variance σ².

Population Multiple Regression Model (bivariate case): the mean response is a plane over (x1, x2), and each observed y deviates from that plane by the random error ei at its point (x1i, x2i). [3-D response-plane figure omitted]

Each βj is an unknown parameter; its meaning is βj = ∂E(Y | X1, …, Xk) / ∂Xj, that is, the net effect of the j-th explanatory variable on Y after controlling for the influence of the other variables.

The multiple regression model: wages, education, and work experience. The multiple regression model is Wage = β0 + β1 × Education + β2 × Experience + ei, while the simple regression model is Wage = α + β × Education + ei. Both β1 and β are used to study the effect of education on wages, but β1 and β are interpreted differently.

β simply measures how education affects wages: when education increases by one unit (say, one more year of schooling), wages increase by β units. However, we know that more than one explanatory variable affects wages, so once we bring other possible explanatory variables into the model (work experience in this example), β1 is interpreted as: "holding work experience fixed, a one-unit increase in education raises wages by β1 units."

The multiple regression model. This is the "other things being equal" (ceteris paribus) relationship between variables that economic research so often examines. For example, other things being equal, how does price affect quantity demanded? Or, other things being equal, how does the wage rate affect labor supply?

Estimating the multiple regression model. To estimate the unknown parameters of the regression model, given that the disturbances ei are mutually independent, the least squares method chooses b0, b1, …, bk to minimize the sum of squared residuals Σi (Yi − b0 − b1X1i − … − bkXki)².

Estimating the multiple regression model. We therefore look for the estimates b0, b1, …, bk that minimize this sum of squares. Setting the k + 1 partial derivatives to zero yields k + 1 normal equations, from which the estimates are solved. Many commercial software packages, such as EXCEL, can find these estimates for you with little effort.
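As a minimal sketch (not from the slides), the normal equations can be solved directly in a few lines; the data below are made up purely to illustrate the mechanics.

```python
# Minimal sketch: solving the k + 1 normal equations (X'X)b = X'y with NumPy.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
y = 1.5 + 2.0 * X1 - 0.5 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])      # design matrix with intercept
b = np.linalg.solve(X.T @ X, X.T @ y)          # normal equations: (X'X)b = X'y
residuals = y - X @ b
sse = residuals @ residuals                    # sum of squared residuals (UV)
print("estimates (b0, b1, b2):", b)
print("SSE:", sse)
```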

Estimation of σ2. For a model with k independent variables:
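Consistent with the residual degrees of freedom n − k − 1 used in the ANOVA discussion below, the usual unbiased estimator is:

```latex
s^{2} \;=\; \frac{SSE}{n-(k+1)} \;=\; \frac{\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i\right)^{2}}{n-k-1}
```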

The multiple regression model: an example. A-Zhong is a delivery driver who spends much of his time on the road delivering goods. His boss suspects that A-Zhong slacks off during his delivery runs, so the boss pulls out his past delivery records.

Based on the multiple regression model Y = β0 + β1X1 + β2X2 + e, where Y = hours on the road, X1 = delivery distance, and X2 = number of delivery stops, A-Zhong's boss estimates the following regression model: predicted hours = −0.39 + 0.066 × X1 + 0.694 × X2.

Holding the number of delivery stops fixed, each additional kilometer of delivery distance increases A-Zhong's hours on the road by 0.066 hours; for the same delivery distance, each additional delivery stop increases his hours on the road by 0.694 hours.

In this example, the standard errors of the two slope estimates yield the t statistics that are compared with the critical values below.

Based on the t distribution with n − (k + 1) = 10 − (2 + 1) = 7 degrees of freedom, the critical values at significance levels γ = 1%, 5%, and 10% are 3.499, 2.365, and 1.895, respectively. Therefore, one slope estimate is significant at the 1% level and the other at the 10% level. "Delivery distance" and "number of delivery stops" are thus both economically and statistically significant; that is, both are important explanatory variables for "hours on the road."

With these estimates in hand, once the boss knows that A-Zhong has 5 delivery stops today and a total distance of 110 kilometers, he can predict that A-Zhong will spend −0.39 + 0.066 × 110 + 0.694 × 5 ≈ 10.35 hours on the road today. If A-Zhong is out for 12 hours today, the boss can reasonably suspect that roughly 2 of those hours were spent slacking off. This example clearly illustrates the two major functions of a regression model: explanation and prediction.
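A short sketch (not from the slides) reproducing the boss's point prediction from the coefficients quoted above:

```python
# Reproducing the point prediction with the estimated coefficients from the slide
# (intercept -0.39, 0.066 hours per km, 0.694 hours per stop).
b0, b1, b2 = -0.39, 0.066, 0.694

def predicted_hours(distance_km: float, stops: int) -> float:
    """Predicted hours on the road for one delivery day."""
    return b0 + b1 * distance_km + b2 * stops

print(predicted_hours(110, 5))        # about 10.3 hours
print(12 - predicted_hours(110, 5))   # the unexplained gap on a 12-hour day
```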

23.1 The Multiple Regression Model A chain is considering where to locate a new restaurant. Is it better to locate it far from the competition or in a more affluent area? Use multiple regression to describe the relationship between several explanatory variables and the response. Multiple regression separates the effects of each explanatory variable on the response and reveals which really matter.

23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores Response variable: sales at stores in a chain of women’s apparel (annually in dollars per square foot of retail space). Two explanatory variables: median household income in the area (thousands of dollars) and number of competing apparel stores in the same mall.

23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores Begin with a scatterplot matrix, a table of scatterplots arranged as in a correlation matrix. Using a scatterplot matrix to understand data can save considerable time later when interpreting the multiple regression results.

23.2 Interpreting Multiple Regression Scatterplot Matrix: Women’s Apparel Stores

23.2 Interpreting Multiple Regression Example: Women’s Apparel Stores The scatterplot matrix for this example confirms a positive linear association between sales and median household income and shows a weak association between sales and number of competitors.

23.2 Interpreting Multiple Regression Correlation Matrix: Women’s Apparel Stores

23.2 Interpreting Multiple Regression Marginal and Partial Slopes Partial slope: slope of an explanatory variable in a multiple regression that statistically excludes the effects of other explanatory variables. Marginal slope: slope of an explanatory variable in a simple regression.

23.2 Interpreting Multiple Regression Partial Slopes: Women’s Apparel Stores

Inference in Multiple Regression Inference for One Coefficient The t-statistic is used to test each slope using the null hypothesis H0: βj = 0. The t-statistic is calculated as shown below.
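In its usual form, the statistic is the estimated slope divided by its standard error:

```latex
t \;=\; \frac{b_j - 0}{\operatorname{se}(b_j)}
```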

Inference in Multiple Regression t-test Results for Women’s Apparel Stores The t-statistics and associated p-values indicate that both slopes are significantly different from zero.

Prediction Intervals An approximate 95% prediction interval is given by ŷ ± 2se. For example, the 95% prediction interval for sales per square foot at a location with median income of $70,000 and 3 competitors is approximately $545.47 ± $136.06 per square foot.

Partial Slopes: Women’s Apparel Stores The slope b1 = 7.966 for Income implies that a store in a location whose median household income is $10,000 higher sells, on average, $79.66 more per square foot than a store in a less affluent location with the same number of competitors. The slope b2 = −24.165 implies that, among stores in equally affluent locations, each additional competitor lowers average sales by $24.165 per square foot.

Marginal and Partial Slopes Partial and marginal slopes only agree when the explanatory variables are uncorrelated. In this example they do not agree. For instance, the marginal slope for Competitors is 4.6352. It is positive because more affluent locations tend to draw more competitors. The MRM separates these effects but the SRM does not.

Checking Conditions Conditions for Inference Use the residuals from the fitted MRM to check that the errors in the model are independent; have equal variance; and follow a normal distribution.

Checking Conditions Calibration Plot Calibration plot: scatterplot of the response y on the fitted values ŷ. R2 is the squared correlation between y and ŷ; the tighter the data cluster along the diagonal line in the calibration plot, the larger the R2 value.

23.3 Checking Conditions Calibration Plot: Women’s Apparel Stores

23.3 Checking Conditions Residual Plots The plot of residuals versus fitted y values is used to identify outliers and to check the similar-variances condition. Plots of residuals versus each explanatory variable are used to verify that the relationships are linear.

23.3 Checking Conditions Residual Plot: Women’s Apparel Stores This plot of residuals versus fitted values of y has no evident pattern.

23.3 Checking Conditions Residual Plot: Women’s Apparel Stores This plot of residuals versus Income has no evident pattern.

Checking Conditions Check Normality: Women’s Apparel Stores The quantile plot indicates that the nearly normal condition is satisfied.

Analysis of variance and parameter tests. We can easily extend the ANOVA table of the simple regression model to the multiple regression framework. The degrees of freedom of the unexplained variation (UV) become n − k − 1, because estimating the parameters β0, β1, …, βk costs (k + 1) degrees of freedom.

We can compute the coefficient of determination as R2 = 1 − UV/TV, that is, the proportion of the total variation (TV) in the dependent variable Y that is explained by the regression model. However, every additional explanatory variable added to the multiple regression model reduces UV (or leaves it unchanged), and therefore increases R2 (or leaves it unchanged).

Why does UV fall every time an explanatory variable is added? Suppose you originally consider two explanatory variables and now want to add a third, so the minimization problem becomes: choose b0, b1, b2, b3 to minimize Σi (Yi − b0 − b1X1i − b2X2i − b3X3i)². If the b3 that is found happens to be exactly zero, the resulting UV equals the UV of the model with only two explanatory variables; if the b3 that is found is not zero, it means the new UV is strictly smaller.

That is, adding one more explanatory variable lowers UV and therefore raises R2. Consequently, if we keep adding explanatory variables to the original model without end, or throw in irrelevant variables, the model's explanatory power will not fall (it rises or stays the same), but doing so is meaningless.

Adjusted coefficient of determination. To remedy this deficiency of the coefficient of determination, we use the adjusted coefficient of determination, defined below.
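In the notation above (n observations, k explanatory variables, UV the unexplained and TV the total variation), the standard definition is:

```latex
\bar{R}^{2} \;=\; 1-\frac{UV/(n-k-1)}{TV/(n-1)} \;=\; 1-\left(1-R^{2}\right)\frac{n-1}{n-k-1}
```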

In the adjusted R2 we penalize the addition of explanatory variables: as explanatory variables are added, R2 rises or stays the same, but the penalty term grows and pulls the adjusted R2 down. Therefore, using the adjusted coefficient of determination to measure the model's goodness of fit does not lead to the conclusion that more explanatory variables are always better.

F test. Finally, as in the previous chapter, for the null hypothesis H0: β1 = β2 = … = βk = 0 we can use an F test: at significance level γ, we reject the null hypothesis when the F statistic exceeds the critical value Fγ(k, n − k − 1).
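Using the ANOVA quantities above, the F statistic takes the standard form:

```latex
F \;=\; \frac{(TV-UV)/k}{UV/(n-k-1)} \;=\; \frac{R^{2}/k}{\left(1-R^{2}\right)/(n-k-1)}
```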

R-squared and se The equation of the fitted model for estimating sales in the women’s apparel stores example is Estimated Sales = 60.3587 + 7.966 Income − 24.165 Competitors.

R-squared and se R2 indicates that the fitted equation explains 59.47% of the store-to-store variation in sales. For this example, R2 is larger than the r2 values for separate SRMs fitted for each explanatory variable; it is also larger than their sum. For this example, se = $68.03.

R-squared and se The adjusted R-squared adjusts for both sample size n and model size k. It is always smaller than R2. The residual degrees of freedom (n − k − 1) is the divisor of se. The adjusted R-squared and se move in opposite directions when an explanatory variable is added to the model (when the adjusted R-squared goes up, se goes down).

Inference for the Model: F-test F-test: test of the explanatory power of the MRM as a whole. F-statistic: ratio of the sample variance of the fitted values to the variance of the residuals.

23.4 Inference in Multiple Regression Inference for the Model: F-test The F-statistic is used to test the null hypothesis that all slopes are equal to zero, i.e., H0: β1 = β2 = … = βk = 0.

F-test Results in Analysis of Variance Table The F-statistic has a p-value of <0.0001; reject H0. Income and Competitors together explain statistically significant variation in sales.

Steps in Fitting a Multiple Regression What is the problem to be solved? Do these data help in solving it? Check the scatterplots of the response versus each explanatory variable (scatterplot matrix). If the scatterplots appear straight enough, fit the multiple regression model. Otherwise find a transformation. Obtain the residuals and fitted values from the regression.

Steps in Fitting a Multiple Regression Use the residual plot of e vs. the fitted values ŷ to check the similar-variances condition. Construct residual plots of e vs. the explanatory variables and look for patterns. Check whether the residuals are nearly normal. Use the F-statistic to test the null hypothesis that the collection of explanatory variables has no effect on the response. If the F-statistic is statistically significant, test and interpret individual partial slopes.

4M Example 23.1: SUBPRIME MORTGAGES Motivation A banking regulator would like to verify how lenders use credit scores to determine the interest rate paid by subprime borrowers. The regulator would like to separate its effect from other variables such as loan-to-value (LTV) ratio, income of the borrower and value of the home.

4M Example 23.1: SUBPRIME MORTGAGES Method Use multiple regression on data obtained for 372 mortgages from a credit bureau. The explanatory variables are the LTV, credit score, income of the borrower, and home value. The response is the annual percentage rate of interest on the loan (APR).

4M Example 23.1: SUBPRIME MORTGAGES Method Find correlations among the variables.

4M Example 23.1: SUBPRIME MORTGAGES Method Check the scatterplot matrix (like APR vs. LTV). The linearity and no-obvious-lurking-variables conditions are satisfied.

4M Example 23.1: SUBPRIME MORTGAGES Mechanics Fit model and check conditions.

4M Example 23.1: SUBPRIME MORTGAGES Mechanics Residuals versus fitted values. Similar variances condition is satisfied.

4M Example 23.1: SUBPRIME MORTGAGES Mechanics Nearly normal condition is not satisfied; data are skewed.

4M Example 23.1: SUBPRIME MORTGAGES Message Regression analysis shows that the characteristics of the borrower (credit score) and loan LTV affect interest rates in the market. These two factors together explain almost half of the variation in interest rates. Neither income of the borrower nor the home value improves a model with these two variables.

Perfect multicollinearity. If the explanatory variables of a multiple regression model satisfy an exact linear relationship, we have perfect multicollinearity; that is, at least one explanatory variable can be written as a linear combination of the other explanatory variables. Once perfect multicollinearity exists, the model contains a redundant variable, so the regression coefficients are not identified and cannot be computed.

Note that perfect multicollinearity, as defined here, concerns linear relationships among the explanatory variables. A nonlinear relationship (for example, one regressor being the square of another) therefore does not create a perfect multicollinearity problem, whereas an exact linear relationship (for example, one regressor being a constant multiple of another, such as the same variable measured in two different units) does (why?).
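As a minimal sketch (not from the slides), the problem is easy to see numerically: with a redundant column the design matrix loses rank and X'X becomes singular, so the normal equations have no unique solution. The data and variable names below are invented.

```python
# Minimal sketch: perfect multicollinearity makes X'X singular.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2                       # exact linear combination -> redundant column
y = 1.0 + x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X
print("columns of X:", X.shape[1])
print("rank of X'X :", np.linalg.matrix_rank(XtX))      # 3, not 4
print("condition number of X'X:", np.linalg.cond(XtX))  # effectively infinite
```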

Dummy variables. So far, the explanatory variables we have considered are all continuous random variables, but sometimes the explanatory variable we care about is discrete. For example, returning to A-Zhong's delivery example, if hours on the road are also affected by the weather, we can add the explanatory variable X3 = 1 if it rains that day and X3 = 0 if it is sunny; such a variable is called a dummy variable.

Dummy variables. Our model becomes Y = β0 + β1X1 + β2X2 + β3X3 + e. Given that the day is sunny (X3 = 0), the conditional expectation of hours on the road is β0 + β1X1 + β2X2; given that the day is rainy (X3 = 1), the conditional expectation of hours on the road is β0 + β1X1 + β2X2 + β3.

Dummy variables. The difference between the two, β3, is the effect of weather on the conditional mean of hours on the road after controlling for the other variables (given the same delivery distance and number of delivery stops). In general, visibility and road conditions are poor on rainy days, so we expect hours on the road to be higher on average, that is, β3 > 0.

Dummy variables. There is an important specification rule for dummy variables when the regression model contains an intercept β0: if there are m different categories to consider, only m − 1 dummy variables may be specified. The reason behind this rule is that if we specify m dummy variables while the intercept β0 is present, the m dummies sum to one for every observation, which duplicates the intercept's regressor and creates a perfect multicollinearity problem.

Dummy variables. Back to A-Zhong's example: if the company has four delivery trucks (trucks I, II, III, and IV), and differences in their condition also affect hours on the road, then we can specify only 3 dummy variables, say D1 = 1 if truck I is used, D2 = 1 if truck II is used, and D3 = 1 if truck III is used (each 0 otherwise), with truck IV serving as the baseline category.

Setting up the dummy variables.
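As a minimal sketch (not from the slides), the m − 1 rule is what the drop_first option of pandas implements; the data frame and truck labels below are invented for illustration.

```python
# Minimal sketch: coding m - 1 = 3 dummies for the m = 4 truck categories.
import pandas as pd

df = pd.DataFrame({
    "hours":    [10.2, 8.5, 11.0, 9.3, 12.1],
    "distance": [110, 80, 120, 95, 130],
    "stops":    [5, 3, 6, 4, 7],
    "truck":    ["I", "II", "III", "IV", "I"],
})

# drop_first=True keeps only m - 1 dummies (truck I becomes the baseline),
# which avoids perfect multicollinearity with the intercept.
X = pd.get_dummies(df[["distance", "stops", "truck"]], columns=["truck"],
                   drop_first=True, dtype=float)
print(X.columns.tolist())   # ['distance', 'stops', 'truck_II', 'truck_III', 'truck_IV']
```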

Interaction Models

Types of Regression Models. This typology is based on the number of explanatory variables and the nature of the relationship between X and Y: with 1 quantitative explanatory variable, 1st-order, 2nd-order, or 3rd-order models; with 2 or more quantitative variables, 1st-order, 2nd-order, or interaction models; with a qualitative variable, dummy-variable models. [tree diagram omitted]

Interaction Model With 2 Independent Variables. Hypothesizes interaction between pairs of x variables: the response to one x variable varies at different levels of another x variable. Contains two-way cross-product terms. Can be combined with other models (example: the dummy-variable model).

Effect of Interaction. Given E(y) = β0 + β1x1 + β2x2 + β3x1x2: without the interaction term, the effect of x1 on y is measured by β1; with the interaction term, the effect of x1 on y is measured by β1 + β3x2, so the effect changes as x2 increases.

Interaction Model Relationships. E(y) = 1 + 2x1 + 3x2 + 4x1x2. At x2 = 1: E(y) = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1. At x2 = 0: E(y) = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1. The effect (slope) of x1 on E(y) depends on the x2 value. [figure of the two lines omitted]

Interaction Model Worksheet
Case i   yi   x1i   x2i   x1i·x2i
1        1    1     3     3
2        4    8     5     40
3        1    3     2     6
4        3    5     6     30
…        …    …     …     …
Multiply x1 by x2 to get x1x2. Run the regression with y, x1, x2, x1x2.

Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x1, and newspaper circulation (000), x2, on the number of ad responses (00), y. Conduct a test for interaction; use α = .05. Is this model specified correctly? What other variables could be used (color, photos, etc.)?

Interaction Model Worksheet
yi   x1i   x2i   x1i·x2i
1    1     2     2
4    8     8     64
1    3     1     3
3    5     7     35
2    6     4     24
4    10    6     60
Multiply x1 by x2 to get x1x2. Run the regression with y, x1, x2, x1x2.
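A minimal sketch (not from the slides) of what this worksheet describes, using the six observations above; it assumes the statsmodels package is available and makes no claim about matching the output values shown later.

```python
# Minimal sketch: build the cross-product term and fit
# y = b0 + b1*x1 + b2*x2 + b3*x1*x2 on the worksheet data.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "y":  [1, 4, 1, 3, 2, 4],     # ad responses (00)
    "x1": [1, 8, 3, 5, 6, 10],    # ad size (sq. in.)
    "x2": [2, 8, 1, 7, 4, 6],     # circulation (000)
})
df["x1x2"] = df["x1"] * df["x2"]          # two-way cross-product term

X = sm.add_constant(df[["x1", "x2", "x1x2"]])
fit = sm.OLS(df["y"], X).fit()
print(fit.params)                 # b0, b1, b2, b3
print(fit.tvalues["x1x2"])        # t statistic for the interaction test H0: b3 = 0
```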

Excel Computer Output Solution. The global F-test indicates that at least one parameter is not zero (see the F statistic and its p-value in the output). [Excel regression output omitted]

Interaction Test Solution. H0: β3 = 0; Ha: β3 ≠ 0; α = .05; df = 6 − 2 = 4; critical values: t = ±2.776 (.025 in each tail). Decision rule: reject H0 if t < −2.776 or t > 2.776.

Interaction Test Solution. Test statistic: t = 1.8528. Decision: do not reject H0 at α = .05. Conclusion: there is no evidence of interaction.

Second–Order Models

Types of Regression Models. This typology is based on the number of explanatory variables and the nature of the relationship between X and Y: with 1 quantitative explanatory variable, 1st-order, 2nd-order, or 3rd-order models; with 2 or more quantitative variables, 1st-order, 2nd-order, or interaction models; with a qualitative variable, dummy-variable models. [tree diagram omitted]

Second-Order Model With 1 Independent Variable. The relationship between the dependent variable and one independent variable is modeled as a quadratic function: E(y) = β0 + β1x + β2x², where β1x is the linear effect and β2x² is the curvilinear effect. This is a useful first model when a non-linear relationship is suspected. Note the potential problem of multicollinearity between x and x²; it is mitigated somewhat by centering x on its mean.

Second-Order Model Relationships. The shape of E(y) as a function of x1 depends on the sign of β2: the curve opens upward when β2 > 0 and downward when β2 < 0. [four-panel figure omitted]

Second-Order Model Worksheet
Case i   yi   xi   xi²
1        1    1    1
2        4    8    64
3        1    3    9
4        3    5    25
…        …    …    …
Create the x² column. Run the regression with y, x, x².

2nd Order Model Example
Errors (y)   Weeks (x)
20           1
18           1
16           2
10           4
8            4
4            5
3            6
1            8
2            10
1            11
0            12
1            12
The data show the number of weeks employed and the number of errors made per day for a sample of assembly-line workers. Find a 2nd-order model, conduct the global F-test, and test whether β2 ≠ 0. Use α = .05 for all tests.

Second-Order Model Worksheet
yi   xi   xi²
20   1    1
18   1    1
16   2    4
10   4    16
…    …    …
Create the x² column. Run the regression with y, x, x².
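A minimal sketch (not from the slides) of this worksheet using the 12 observations from the example above; it assumes the statsmodels package is available and does not claim to reproduce the Excel output that follows.

```python
# Minimal sketch: create the x^2 column and fit errors = b0 + b1*weeks + b2*weeks^2.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "errors": [20, 18, 16, 10, 8, 4, 3, 1, 2, 1, 0, 1],
    "weeks":  [1, 1, 2, 4, 4, 5, 6, 8, 10, 11, 12, 12],
})
df["weeks_sq"] = df["weeks"] ** 2          # curvilinear term

X = sm.add_constant(df[["weeks", "weeks_sq"]])
fit = sm.OLS(df["errors"], X).fit()
print(fit.params)                 # b0, b1, b2
print(fit.f_pvalue)               # global F-test of the model
print(fit.pvalues["weeks_sq"])    # test of the curvilinear effect (beta2)
```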

Excel Computer Output Solution

Overall Model Test Solution. The global F-test indicates that at least one parameter is not zero (see the F statistic and its p-value in the output).

β2 Parameter Test Solution. The test on β2 indicates that a curvilinear relationship exists (see the t statistic and its p-value in the output).