Chapter 19 Linear Patterns
19.1 Fitting a Line to Data
What is the relationship between the price and weight of diamonds? Use regression analysis to find an equation that summarizes the linear association between price and weight. The intercept and slope of the line estimate the fixed and variable costs in pricing diamonds.
19.1 Fitting a Line to Data
Consider two questions about diamonds: What is the average price of diamonds that weigh 0.4 carat? How much more do diamonds that weigh 0.5 carat cost?
19.1 Fitting a Line to Data
Equation of a Line. Using a sample of diamonds of various weights, regression analysis produces an equation that relates weight to price. Let y denote the response variable (price) and let x denote the explanatory variable (weight).
19.1 Fitting a Line to Data
Scatterplot of Price vs. Weight. Linear association is evident (r = 0.66).
19.1 Fitting a Line to Data
Equation of a Line. Identify the line fit to the data by an intercept b0 and a slope b1. The equation of the line is Estimated Price = b0 + b1 Weight.
19.1 Fitting a Line to Data
Least Squares. Residual: the vertical deviation from a data point to the line (e = y − ŷ). The best-fitting line collectively makes the squares of the residuals as small as possible (the choice of b0 and b1 minimizes the sum of the squared residuals).
19.1 Fitting a Line to Data
Residuals
19.2 Interpreting the Fitted Line
Diamond Example. The least squares regression equation for relating diamond prices to weight is Estimated Price = 43 + 2,670 Weight.
19.2 Interpreting the Fitted Line
Diamond Example. The average price of a diamond that weighs 0.4 carat is Estimated Price = 43 + 2,670(0.4) = $1,111. A diamond that weighs 0.5 carat costs 2,670(0.1) = $267 more, on average.
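To make the arithmetic concrete, here is a minimal sketch in Python that evaluates the fitted equation from the slides (Estimated Price = 43 + 2,670 Weight) at a couple of weights; the function name is ours, the coefficients come from the example.

```python
def estimated_price(weight_carats):
    """Fitted line from the diamond example: intercept $43, slope $2,670 per carat."""
    return 43 + 2670 * weight_carats

print(estimated_price(0.4))                          # 1111.0  -> about $1,111
print(estimated_price(0.5) - estimated_price(0.4))   # 267.0   -> $267 more per extra 0.1 carat
```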
19.2 Interpreting the Fitted Line
Diamond Example
19.2 Interpreting the Fitted Line
Interpreting the Intercept. The intercept is the portion of y that is present for all values of x (i.e., a fixed cost of $43 per diamond). The intercept estimates the average response when x = 0 (where the line crosses the y-axis).
19.2 Interpreting the Fitted Line
Interpreting the Intercept. Unless the range of x values includes zero, b0 will be an extrapolation.
19.2 Interpreting the Fitted Line
Interpreting the Slope. The slope estimates the marginal cost, which is used to find the variable cost (here, the marginal cost is $2,670 per carat). While tempting, it is not correct to describe the slope as the change in y caused by changing x.
Example II
Empirical problem: class size and educational output. Policy question: What is the effect of reducing class size by one student per class? By 8 students per class? What is the right output (performance) measure? Parent satisfaction, student personal development, future adult welfare, future adult earnings, performance on standardized tests.
What do the data say about class sizes and test scores?
The California Test Score Data Set: all K-6 and K-8 California school districts (n = 420). Variables: 5th-grade test scores (Stanford-9 achievement test, combined math and reading), district average; student-teacher ratio (STR) = number of students in the district divided by the number of full-time equivalent teachers.
An initial look at the California test score data:
Question: Do districts with smaller classes (lower STR) have higher test scores? And by how much?
The class size/test score policy question: What is the effect of reducing STR by one student per teacher on test scores? Object of policy interest: β1 = ΔTestScore/ΔSTR, the slope of the line relating test score to STR.
This suggests that we want to draw a line through the Test Score vs. STR scatterplot, but how?
Linear Regression: Some Notation and Terminology
The population regression line is TestScore = β0 + β1 STR.
β0 and β1 are "population" parameters. We would like to know the population value of β1, but we don't know it, so we must estimate it using data.
Application to the California Test Score-Class Size data
Estimated slope: β̂1 = −2.28. Estimated intercept: β̂0 = 698.9. Estimated regression line: Estimated TestScore = 698.9 − 2.28 STR.
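A minimal sketch of how estimates like these can be reproduced with the closed-form least-squares formulas. The arrays below are made-up placeholders, since the district-level data are not included in the slides; only the final comment refers to the actual estimates quoted above.

```python
import numpy as np

# Placeholder arrays; the actual data set has one row per district (n = 420).
str_ratio = np.array([15.0, 17.0, 19.0, 21.0, 23.0, 25.0])        # made-up student-teacher ratios
test_score = np.array([680.0, 670.0, 665.0, 655.0, 650.0, 640.0])  # made-up test scores

# OLS slope and intercept from the closed-form formulas.
b1 = np.sum((str_ratio - str_ratio.mean()) * (test_score - test_score.mean())) / \
     np.sum((str_ratio - str_ratio.mean())**2)
b0 = test_score.mean() - b1 * str_ratio.mean()
print(f"Estimated TestScore = {b0:.1f} + ({b1:.2f}) * STR")
# With the real 420-district data these estimates come out to about 698.9 and -2.28.
```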
Scattergram: a plot of all (xi, yi) pairs; it suggests how well the model will fit. [Scatterplot of y versus x.]
Thinking Challenge: How would you draw a line through the points? How do you determine which line "fits best"? [Scatterplot of y versus x.]
Basic Concepts of Regression Analysis
Regression analysis studies the relationship between two or more variables using paired data points. Taking two variables as an example, "paired data" means the observed data are (x1, y1), (x2, y2), ..., (xn, yn). If economic theory tells us that x and y are related in a definite way, we can describe the relationship as y = f(x). For example, personal income is affected by education level, or the inflation rate is affected by the money supply.
Population Linear Regression Model. [Diagram: observed values of y scatter about the population regression line E(y) = β0 + β1 x; the vertical gap between an observed value and the line is the random error εi.]
The Population Regression Line
Simply put, if we had data on the entire population, the population regression line, like the correlation coefficient, could be viewed as a descriptive statistic summarizing that population data.
Population & Sample Regression Models. [Diagram: a random sample is drawn from a population in which the relationship between x and y is unknown.]
Sample Linear Regression Model. [Diagram: the fitted line through the sampled observations; the vertical gap between an observed value and the fitted line is the residual, and unsampled observations are shown for contrast.]
Least Squares minimizes the Sum of the Squared Differences (SSE)
"Best fit" means the differences between the actual y values and the predicted y values are a minimum. But positive differences offset negative ones, so least squares minimizes the sum of the squared differences, SSE = Σ(yi − ŷi)².
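A small illustration of why raw differences are not enough: any line through the point (x̄, ȳ) has residuals that sum to zero, yet the lines fit very differently. The data and candidate slopes below are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
xbar, ybar = x.mean(), y.mean()

for slope in (0.0, 0.9, 2.0):                 # all three lines pass through (xbar, ybar)
    intercept = ybar - slope * xbar
    resid = y - (intercept + slope * x)
    print(slope, round(resid.sum(), 10), round((resid**2).sum(), 3))
# The raw residual sums are all ~0; only the SSE tells the lines apart.
```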
The Method of Least Squares
The quantity Yi − (α + βxi) is a prediction error. We naturally want the error from predicting Yi with xi to be as small as possible; in other words, we want xi to explain as much of Yi as possible. If we regard the prediction error as a loss, then we want to minimize the following loss function, the sum of squared errors: q(α, β) = Σi (Yi − α − βxi)².
In general, the loss function need not be the sum of squared errors. It could instead be the sum of absolute errors, Σi |Yi − α − βxi|, or the pain of the loss could even be measured with a utility function u(·). Whether we use squared errors or absolute errors, what we care about (what we find painful) is the size of the error, not whether it is positive or negative; this is why these two loss functions take the square or the absolute value of the error, respectively.
If the loss function is the sum of squared errors, the estimation method is called the method of least squares, and the estimators obtained from it are called least-squares estimators.
The method of least squares is the most commonly used way of estimating regression coefficients in the statistics literature. We have already used it in solving for the best linear predictor. Least squares is a purely mathematical solution method, so without any probabilistic model we can "solve" for β (β1) and α (β0) from population data. When we estimate the regression line from sample data, least squares gives the "least-squares solution," and under the assumptions of a probability model this solution is called the least-squares estimator.
Using the loss function above, the minimization problem can be written as min over (α, β) of q(α, β) = Σi (Yi − α − βxi)²; that is, we want to find the α and β that minimize the loss function.
From the first-order conditions we know that ∂q/∂α = −2 Σi (Yi − α − βxi) = 0 and ∂q/∂β = −2 Σi xi (Yi − α − βxi) = 0.
Solving these two equations simultaneously gives the α and β that minimize q. We call the resulting least-squares estimators α̂ and β̂:
β̂ = Σi (xi − x̄)(Yi − Ȳ) / Σi (xi − x̄)²,  α̂ = Ȳ − β̂ x̄.
Why is the least-squares solution here also called a least-squares estimator? Because α̂ and β̂ are functions of the random sample!
Substituting the least-squares estimators α̂ and β̂ back into the first-order conditions, we obtain Σi (Yi − α̂ − β̂xi) = 0 and Σi xi (Yi − α̂ − β̂xi) = 0, which are called the normal equations.
From α̂ and β̂ we obtain the estimate of the error ei: êi = Yi − α̂ − β̂xi. The estimated sample regression line is Ŷi = α̂ + β̂xi, so the residual can also be written as êi = Yi − Ŷi.
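A minimal sketch, using nothing beyond the formulas above, of the least-squares solution, the fitted values, and the residuals; the data are placeholders. It also checks the normal equations numerically.

```python
import numpy as np

# Placeholder data for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
alpha_hat = y.mean() - beta_hat * x.mean()

y_hat = alpha_hat + beta_hat * x        # estimated sample regression line
e_hat = y - y_hat                       # estimated errors (residuals)

print(alpha_hat, beta_hat)                                     # 1.4, 0.8 for these numbers
print(round(e_hat.sum(), 10), round((x * e_hat).sum(), 10))    # normal equations: both ~ 0
```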
It is worth noting that, from the formula for β̂, if the x values are all equal, x1 = x2 = x3 = ··· = xn = x̄, the denominator Σi (xi − x̄)² is zero; this is discussed further later.
General Properties of the Least-Squares Estimator
Let di = (xi − x̄) / Σj (xj − x̄)². Then we know that β̂ = Σi di Yi, so β̂ is a linear function of the Yi.
General Properties of the Least-Squares Estimator
Important properties of di: Σi di = 0, Σi di xi = 1, and Σi di² = 1 / Σi (xi − x̄)².
General Properties of the Least-Squares Estimator
Example 1: Estimating a Monopolist's Product Demand
Wu A-Shuai's software company is a monopolist in the Chinese typesetting software market. Wu believes that quantity demanded (Y) and price (x) have a linear relationship, so he observed the quantity demanded at five different prices and used the data to estimate the linear demand line. The data are summarized below.
Example 1: Estimating a Monopolist's Product Demand
From the data we can compute x̄, Ȳ, Σi (xi − x̄)², and Σi (xi − x̄)(Yi − Ȳ); applying the formulas for the least-squares estimators then gives β̂ and α̂.
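Because the original slide's price-quantity table is not reproduced above, here is a hedged sketch with made-up prices and quantities, only to show how the demand line would be estimated; the numbers are not the example's data.

```python
import numpy as np

# Hypothetical data: five prices (x) and the observed quantities demanded (Y).
price = np.array([10.0, 12.0, 14.0, 16.0, 18.0])      # made-up prices
quantity = np.array([88.0, 80.0, 75.0, 66.0, 60.0])   # made-up quantities

beta_hat = np.sum((price - price.mean()) * (quantity - quantity.mean())) / \
           np.sum((price - price.mean())**2)
alpha_hat = quantity.mean() - beta_hat * price.mean()
print(f"Estimated demand: Q = {alpha_hat:.2f} + ({beta_hat:.2f}) * P")
# For these made-up numbers: Q = 122.80 + (-3.50) * P
```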
Least Squares Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find the least squares line relating sales and advertising.
Scattergram: Sales vs. Advertising. [Scatterplot of sales (units) against advertising dollars.]
Parameter Estimation Solution Table
xi    yi    xi²    yi²    xiyi
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
Totals: 15   10    55     26     37
Parameter Estimation Solution
β̂1 = (Σxiyi − (Σxi)(Σyi)/n) / (Σxi² − (Σxi)²/n) = (37 − (15)(10)/5) / (55 − 15²/5) = 7/10 = 0.70
β̂0 = ȳ − β̂1 x̄ = 2 − 0.70(3) = −0.10
Fitted line: ŷ = −0.1 + 0.7x
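A quick check of these hand computations in Python, using the advertising-sales numbers from the table above; np.polyfit with degree 1 is one standard way to get the least squares line.

```python
import numpy as np

ad = np.array([1, 2, 3, 4, 5], dtype=float)      # advertising ($)
sales = np.array([1, 1, 2, 2, 4], dtype=float)   # sales (units)

slope, intercept = np.polyfit(ad, sales, 1)      # degree-1 polynomial fit = least squares line
print(intercept, slope)                          # -0.1, 0.7
```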
Regression Line Fitted to the Data. [Scatterplot of sales vs. advertising with the fitted line ŷ = −0.1 + 0.7x.]
Least Squares Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
Find the least squares line relating crop yield and fertilizer.
Scattergram: Crop Yield vs. Fertilizer. [Scatterplot of yield (lb.) against fertilizer (lb.).]
Parameter Estimation Solution Table
xi    yi     xi²    yi²      xiyi
4     3.0    16     9.00     12
6     5.5    36     30.25    33
10    6.5    100    42.25    65
12    9.0    144    81.00    108
Totals: 32   24.0   296      162.50   218
Parameter Estimation Solution
β̂1 = (Σxiyi − (Σxi)(Σyi)/n) / (Σxi² − (Σxi)²/n) = (218 − (32)(24)/4) / (296 − 32²/4) = 26/40 = 0.65
β̂0 = ȳ − β̂1 x̄ = 6 − 0.65(8) = 0.80
Fitted line: ŷ = 0.8 + 0.65x
Regression Line Fitted to the Data. [Scatterplot of yield vs. fertilizer with the fitted line ŷ = 0.8 + 0.65x.]
The Classical Normal Regression Model
To do statistical inference, we must first specify the probability model from which the random sample comes. Here we introduce the most basic such model: the classical normal regression model (CNRM).
Error Probability Distribution. [Diagram: at each value of x (x1, x2, x3), y has a normal distribution centered on the line E(y) = α + βx, with the same spread.]
The Classical Normal Regression Model I
Consider a paired random sample (x1, Y1), (x2, Y2), ..., (xn, Yn), where Yi | xi ~ N(α + βxi, σ²) and the Yi are independent. Assumptions (A1)-(A5) below are the basic assumptions of the classical normal regression model.
Main Assumptions of the Classical Normal Regression Model
(A1) Linear conditional expectation: E(Yi | xi) = α + βxi. Note that because of assumption (A5), the conditional expectation of Yi equals its unconditional expectation.
(A2) Homoskedasticity: Var(Yi | xi) = σ² for all i. That is, the conditional variance is a constant and does not change with xi. If the conditional variance of Yi is not constant, we have heteroskedasticity; dealing with heteroskedasticity is beyond the scope of this book, and interested readers may consult an econometrics text.
(A3) Independent sample: the Yi are independent random variables. For cross-section data this assumption usually holds. For time series data, such as gross domestic product (GDP), it does not: last year's GDP and this year's GDP are correlated, not independent.
Secondary Assumptions of the Classical Normal Regression Model
(A4) Yi is normally distributed. This assumption can easily be relaxed: even if Yi is not normal, the central limit theorem applies as long as the sample is large enough.
(A5) x is obtained by fixed (stratified) sampling: that is, the xi are non-random variables. If the data come from a laboratory experiment, x is a variable controlled by the experimenter. In economic research, fixed sampling can proceed as follows: first partition the population into several subpopulations according to the value of x, then sample randomly from each subpopulation.
(A5) For example, if I want to study the effect of education on wages, I can first divide the population into subpopulations by education level (elementary school, junior high, senior high, university, graduate school) and then sample from each. This kind of sampling is not feasible in all economic research; however, with a few additional assumptions the xi can be allowed to be random, which we discuss later.
(A5) The intuition behind requiring that the x values not all be equal: if the x values are all the same, x provides no variation and therefore no useful information for predicting Yi. Moreover, even if the x values are not all equal, a model built on an x with very little variation will not be a good one.
(A5) For example, suppose we want to study the relationship between top fashion models' salaries (Yi) and their figures (xi), i.e., whether models with better figures earn more. Because top models' figures are all close to the same standard (see 趙民德, 李紀難 (2005), Table 12.2), x varies very little, and it is actually hard to use a model's figure to explain her salary.
The Classical Normal Regression Model
An Alternative Representation of the Classical Normal Regression Model
If we define ei as ei = Yi − E(Yi | xi) = Yi − α − βxi, the model can be rewritten as Yi = α + βxi + ei, where ei is usually called the residual, error, or disturbance term of the regression model.
An Alternative Representation of the Classical Normal Regression Model
The residual is the difference between Yi and its conditional expectation E(Yi | xi). If we think of E(Yi | xi) as the prediction of Yi given the information in xi, then ei can be thought of as a prediction error. From assumptions (A1) and (A2) we know that E(ei) = 0 and Var(ei) = σ².
The Classical Normal Regression Model II
Yi = α + βxi + ei, where the ei are independent N(0, σ²) errors and the xi are fixed.
The Classical Normal Regression Model II
This is the most common way of writing a regression model. Although it is less compact than Model I, it matches the intuitive decomposition "model plus error": the data we observe can be split into the part captured by the model, α + βxi, and the part the model cannot explain, ei.
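A minimal simulation sketch of the "model plus error" decomposition under assumptions (A1)-(A5); all parameter values here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 2.0, 0.5, 1.0       # made-up population parameters
x = np.repeat([1.0, 2.0, 3.0, 4.0], 25)  # fixed (non-random) x values, as in (A5)
e = rng.normal(0.0, sigma, size=x.size)  # independent N(0, sigma^2) errors: (A2)-(A4)
y = alpha + beta * x + e                 # "model plus error": E(Y|x) = alpha + beta*x, (A1)

# Least-squares estimates recover the parameters up to sampling error.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
print(a, b)   # close to 2.0 and 0.5
```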
What do we want from an estimator? In general, we want an estimator that gets as close as possible to the unknown true value, at least in some average sense. In other words, we want the sampling distribution of an estimator to be as tightly centered around the unknown value as possible. This leads to three specific desirable characteristics of an estimator.
Three desirable characteristics of an estimator. Let θ̂ denote some estimator of θ. Unbiasedness: E(θ̂) = θ. Consistency: θ̂ converges in probability to θ as n grows. Efficiency: let θ̃ be another estimator of θ, and suppose both θ̂ and θ̃ are unbiased; then θ̂ is said to be more efficient than θ̃ if Var(θ̂) < Var(θ̃).
General Properties of the Least-Squares Estimator
Under assumptions (A1)-(A5), α̂ and β̂ are unbiased: E(α̂) = α and E(β̂) = β.
General Properties of the Least-Squares Estimator
Their sampling variances and covariance are Var(β̂) = σ² / Σi (xi − x̄)², Var(α̂) = σ² [1/n + x̄² / Σi (xi − x̄)²], and Cov(α̂, β̂) = −σ² x̄ / Σi (xi − x̄)².
General Properties of the Least-Squares Estimator
Finally, since Var(β̂), Var(α̂), and Cov(α̂, β̂) all contain the unknown parameter σ, their estimators are obtained by replacing σ² with an estimate such as σ̂² = Σi êi² / (n − 2).
The Gauss-Markov Theorem
We have already shown that α̂ and β̂ are linear functions of the Yi (for example, β̂ = Σi di Yi). Since α̂ and β̂ are also unbiased, we can call them linear unbiased estimators.
The Gauss-Markov Theorem
The Gauss-Markov theorem states that, among all possible linear unbiased estimators, α̂ and β̂ have the smallest variance.
The Gauss-Markov Theorem
Consider any other linear unbiased estimator of β, say b = Σi ci Yi. Then one can show that Var(b) ≥ Var(β̂).
The Gauss-Markov Theorem
In short, among all linear unbiased estimators, α̂ and β̂ have the smallest variance, and so they are called the best linear unbiased estimators, abbreviated BLUE.
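A small Monte Carlo sketch of the BLUE idea: compare the sampling variance of the least-squares slope with that of another linear unbiased slope estimator, here the "endpoint" estimator (Yn − Y1)/(xn − x1), which is our choice of comparison and not from the slides; all parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 1.0, 2.0, 1.0
x = np.linspace(0.0, 10.0, 21)              # fixed x values

ols, endpoint = [], []
for _ in range(5000):
    y = alpha + beta * x + rng.normal(0.0, sigma, x.size)
    b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b_end = (y[-1] - y[0]) / (x[-1] - x[0])  # also linear and unbiased, but noisier
    ols.append(b_ols)
    endpoint.append(b_end)

print(np.mean(ols), np.mean(endpoint))       # both near beta = 2.0 (unbiased)
print(np.var(ols), np.var(endpoint))         # OLS variance is smaller (Gauss-Markov)
```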
Relaxing the Assumption about the Explanatory Variable
So far we have assumed that x is non-random. In most economic research, however, x is random. If x is a random variable, the properties of the least-squares estimator discussed above no longer hold as stated. However, with some suitable modifications of the classical regression model, most of those properties carry over.
Relaxing the Assumption about the Explanatory Variable
Modification 1: assume that X is independent of the regression error e. Modification 2: assume a weaker condition relating X and e, such as E(e | X) = 0. It is easy to see that Modification 1 is the stronger assumption.
19.3 Properties of Residuals
Residuals show the variation that remains in the data after accounting for the linear relationship defined by the fitted line. They should be plotted against x to check for patterns.
19.3 Properties of Residuals
Residual Plots. If the least squares line captures the association between x and y, then a plot of residuals versus x should stretch out horizontally with consistent vertical scatter. Can use the visual test for association to check for the absence of a pattern.
19.3 Properties of Residuals
Residual Plot for the Diamond Example. There is a subtle pattern: the residuals become more variable as x (carat weight) increases.
19.3 Properties of Residuals
Standard Deviation of Residuals (se). Measures how much the residuals vary around the fitted line; also known as the standard error of the regression or the root mean squared error (RMSE). For the diamond example, se = $169.
19.3 Properties of Residuals
Standard Deviation of Residuals. Since the residuals are approximately normal, the empirical rule implies that about 95% of the prices are within 2 × se = $338 of the regression line.
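A hedged sketch of how se and the ±2·se band could be computed from residuals. The helper name is ours, and the diamond data are not included in the slides, so placeholder arrays stand in for them.

```python
import numpy as np

def regression_se(x, y):
    """Standard deviation of residuals for the least-squares fit: sqrt(SSE / (n - 2))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    return np.sqrt(np.sum(resid**2) / (len(x) - 2))

# Placeholder data in place of the diamond weights and prices.
weight = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
price = [850, 1000, 1150, 1200, 1400, 1500]
se = regression_se(weight, price)
print(se, 2 * se)   # roughly 95% of prices lie within 2*se of the fitted line
```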
19.5 Conditions for Simple Regression
Checklist. Linear: use the scatterplot to see if the pattern resembles a straight line. Random residual variation: use the residual plot to make sure no pattern exists. No obvious lurking variable: need to think about whether other explanatory variables might better explain the linear association between x and y.
4M Example 19.2: LEASE COSTS
Motivation. How can a dealer anticipate the effect of age on the value of a used car? The dealer estimates that $4,000 is enough to cover the depreciation per year.
4M Example 19.2: LEASE COSTS
Method. Use regression analysis to find the equation that relates y (resale value in dollars) to x (age of the car in years). The car dealer has data on the prices and ages of 218 used BMWs in the Philadelphia area.
4M Example 19.2: LEASE COSTS
Mechanics. Linear association is evident. Mileage of the car may be a potential lurking variable.
4M Example 19.2: LEASE COSTS
Mechanics. The fitted least squares regression line is Estimated Price = 39,851.72 − 2,905.53 Age, with r² = 0.45 and se = $3,367.
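A minimal sketch that evaluates the fitted BMW line from this example at a few ages; the coefficients are from the slide, the function name is ours.

```python
def estimated_resale_value(age_years):
    """Fitted line from 4M Example 19.2: value falls about $2,905.53 per year of age."""
    return 39851.72 - 2905.53 * age_years

for age in (1, 3, 5):
    print(age, round(estimated_resale_value(age), 2))
# Each extra year of age lowers the estimated resale value by about $2,906,
# which is less than the $4,000 per year built into the lease.
```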
4M Example 19.2: LEASE COSTS
Mechanics. The residuals appear random.
4M Example 19.2: LEASE COSTS
Message. The results indicate that used BMWs decline in resale value by about $2,900 per year, so the current lease price of $4,000 per year appears profitable. However, the fitted line leaves more than half of the variation unexplained, and leases longer than 5 years would require extrapolation.
Best Practices
Always look at the scatterplot. Know the substantive context of the model. Describe the intercept and slope using units of the data. Limit predictions to the range of observed conditions.
Pitfalls
Do not assume that changing x causes changes in y. Do not forget lurking variables. Do not trust summaries like r² without looking at plots.