NONCONSTANT ERROR VARIANCES and SERIAL CORRELATION

Nonconstant Error Variances and Serial Correlation: variance-stabilizing transformations, weighted least squares, correlated errors, collinear data

Heteroskedasticity
For regressions with cross-section data it is usually safe to assume the errors are uncorrelated, but often their variances are not constant across individuals. This is known as the problem of heteroskedasticity ("unequal scatter"). Unequal error variances: the usual assumption of constant error variance is referred to as homoskedasticity. Although the mean of the dependent variable may be a linear function of the regressors, the variance of the error terms may also depend on those same regressors, so that the observations "fan out" in a scatter diagram, as illustrated in the following diagrams.
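A minimal R sketch (not from the slides; all names are illustrative) that simulates data whose error standard deviation grows with the regressor, so the fan-out appears in the scatter plot and the funnel shape in the residual plot:

# Simulate heteroskedastic data: error SD proportional to x
set.seed(1)
n <- 100
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.8 * x)   # error variance grows with x

fit <- lm(y ~ x)
plot(x, y, main = "Fan-out pattern under heteroskedasticity")
abline(fit)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")  # characteristic funnel shape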

Heteroskedasticity

Heteroscedasticity
1. Examples: Variance(a large firm's sales) > Variance(a small firm's sales); Variance(a high-income family's expenditure) > Variance(a low-income family's expenditure).
2. Error terms: the error variance is not constant across observations.

Unweighted model
Call: lm(formula = Y ~ X1 + X2 + X3, data = education.table)
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -5.566e+02   1.232e+02   -4.518  4.34e-05 ***
X1           7.239e-02   1.160e-02    6.239  1.27e-07 ***
X2           1.552e+00   3.147e-01    4.932  1.10e-05 ***
X3          -4.269e-03   5.139e-02   -0.083  0.934
---
Residual standard error: 40.47 on 46 degrees of freedom
Multiple R-Squared: 0.5913, Adjusted R-squared: 0.5647
F-statistic: 22.19 on 3 and 46 DF, p-value: 4.945e-09

Boxplot of residuals from the unweighted model
boxplot(rstandard(education.lm) ~ Region)
Remove Alaska: 1975 was a big oil year.

Unweighted model after removing Alaska
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -277.57731   132.42286   -2.096  0.041724 *
X1             0.04829     0.01215    3.976  0.000252 ***
X2             0.88693     0.33114    2.678  0.010291 *
X3             0.06679     0.04934    1.354  0.182591
Residual standard error: 35.81 on 45 degrees of freedom
Multiple R-Squared: 0.4967, Adjusted R-squared: 0.4631
F-statistic: 14.8 on 3 and 45 DF, p-value: 7.653e-07

Properties of Classical Least Squares Under Heteroskedasticity
(1) The least squares estimators of β_0 and β_1 are still unbiased and consistent:
    a) unbiased
    b) consistent
(2) They are not efficient (not "best", i.e. not minimum variance): the variance of the OLS estimator b is no longer the minimum among linear unbiased estimators.

Inherent Heteroscedasticity
If the response variable follows a distribution in which the variance is functionally related to the mean, heteroscedasticity is inherent. When the residuals from the OLS regression are plotted against the predicted values, the characteristic funnel shape is observed. This suggests that a transformation of the dependent variable Y might stabilize the variance. If Y is a Poisson random variable, counting the number of occurrences per unit of time or space, then the variance increases with the mean.

Some Types of Nonnormal Data and Their Variance-Stabilizing Transformations

Type of distribution     Relationship of mean & variance      Transformation
Poisson                  Variance = Mean                      Square root
Binomial proportions     Mean = p, Variance = p(1-p)/n        Arcsine of the square root
Exponential              SD = Mean                            Log (and rank)

Variance-Stabilizing Transformations
We investigate three different transformations: square root, logarithm, and inverse.
• An examination of the residual plots shows that only the inverse transformation improves the heteroscedasticity, and not as well as the WLS regression does.
• Also, the OLS regression using 1/Y as the response is harder to interpret than the WLS regression using Y as the response.
A sketch of the three fits appears below.
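Continuing with the simulated x and y from the earlier sketch (y assumed positive), a hedged sketch of fitting the three candidate transformations and inspecting their residual plots:

# Fit OLS after each candidate variance-stabilizing transformation of y
fit_sqrt <- lm(sqrt(y) ~ x)
fit_log  <- lm(log(y) ~ x)
fit_inv  <- lm(I(1 / y) ~ x)

# Compare residual-versus-fitted plots for remaining heteroscedasticity
par(mfrow = c(1, 3))
for (f in list(fit_sqrt, fit_log, fit_inv)) {
  plot(fitted(f), resid(f), xlab = "Fitted", ylab = "Residuals")
}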

Another remedial measure in such cases is to use weighted least squares.

Generalized multiple regression model
Suppose the error terms ε_i are independent but have unequal variances σ_i². The generalized multiple regression model is

  Y_i = β_0 + β_1 X_{i1} + … + β_{p−1} X_{i,p−1} + ε_i        (M1.1)

where β_0, β_1, …, β_{p−1} are parameters, X_{i1}, …, X_{i,p−1} are known constants, and the ε_i are independent N(0, σ_i²), i = 1, …, n.

For the generalized multiple regression model, the variance-covariance matrix of the error terms is more complicated than the one discussed earlier:

  σ²{ε} = diag(σ_1², σ_2², …, σ_n²).

Weighted Least-Squares Regression
The logic of WLS regression: in WLS, the values of a and b_k are estimated so as to minimize the weighted error sum of squares

  Σ_i w_i (Y_i − a − b_1 X_{i1} − … − b_{p−1} X_{i,p−1})².

This has the effect of reducing the influence of a case with a large error variance on the estimation of a and b_k, and increasing the influence of a case with a small error variance.

The simplest way to express the maximum likelihood and weighted least squares estimators of the regression parameters of model (M1.1) is in matrix form. Let W be the diagonal matrix whose entries are the weights w_i:

  W = diag(w_1, w_2, …, w_n).

The criterion function to be minimized is then

  Q_w = (Y − Xβ)′ W (Y − Xβ).

The weighted least squares and maximum likelihood estimators of the regression parameters are

  b_w = (X′WX)^{−1} X′WY,

where b_w is the vector of estimated regression parameters obtained by weighted least squares. The variance-covariance matrix of this WLS estimator is

  σ²{b_w} = (X′WX)^{−1}.

This covariance matrix is known here because the variances σ_i² are all assumed known.

Finding a weight: error variances known
Observations with small variances provide more reliable information about the regression function than those with large variances, so each observation is weighted differently, inversely proportionally to its variance:

  w_i = 1 / σ_i².
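A sketch of the corresponding lm call when the error variances are known (here the true variances 0.64·x² used in the earlier simulation):

# Known error variances: weight each observation by 1 / sigma_i^2
sigma2  <- 0.64 * x^2            # the variances assumed known in this illustration
w       <- 1 / sigma2
fit_wls <- lm(y ~ x, weights = w)
summary(fit_wls)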

Finding a weight: grouped data
The observed y_i's are actually averages of several (say n_i) observations, so Var(ȳ_i) = σ²/n_i and the natural weights are w_i = n_i.
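A minimal sketch, assuming a hypothetical data frame grouped.data with columns y_bar (group means), n_i (group sizes), and x:

# Grouped data: each y_bar is an average of n_i raw observations, so weight by n_i
fit_grouped <- lm(y_bar ~ x, data = grouped.data, weights = n_i)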

Finding a weight: variance as a function of a predictor

Example: age and blood pressure (BP), OLS fit

Residual plots

Finding a weight

Example: WLS fit, age and blood pressure
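One common way to build the weights for an age/blood-pressure regression is to estimate the standard-deviation function by regressing the absolute OLS residuals on age; a sketch, with bp.data, age, and bp as hypothetical names (this illustrates the general recipe, not necessarily the slides' exact computation):

# Stage 1: OLS fit and an estimated standard-deviation function
fit_ols <- lm(bp ~ age, data = bp.data)
sd_fit  <- lm(abs(resid(fit_ols)) ~ age, data = bp.data)
s_hat   <- fitted(sd_fit)

# Stage 2: WLS with weights inversely proportional to the estimated variances
w       <- 1 / s_hat^2
fit_wls <- lm(bp ~ age, data = bp.data, weights = w)
summary(fit_wls)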

Revisit: education example
Suppose we have a hypothesis about the weights, i.e. they are constant within Region.
1. Fit the model using OLS (ordinary least squares) to get the initial estimate b_OLS.
2. Use the predicted values from this model to estimate the w_i.
3. Refit the model using WLS (weighted least squares).
4. If needed, iterate the previous two steps.
A sketch of this procedure follows the weighted-model output below.

Weighted model after removing Alaska
lm(formula = Y ~ X1 + X2 + X3, weights = weights)
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -3.181e+02   7.833e+01   -4.060  0.000193 ***
X1           6.245e-02   7.867e-03    7.938  4.24e-10 ***
X2           8.791e-01   2.003e-01    4.388  6.83e-05 ***
X3           2.981e-02   3.421e-02    0.871  0.388178
---
Residual standard error: 0.984 on 45 degrees of freedom
Multiple R-Squared: 0.7566, Adjusted R-squared: 0.7404
F-statistic: 46.63 on 3 and 45 DF, p-value: 7.41e-14
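A sketch of the procedure listed above for the education data, assuming Alaska has already been dropped from education.table and that Region is a grouping column; the variable names follow the slides, the rest is illustrative:

# Step 1: initial OLS fit
fit_ols <- lm(Y ~ X1 + X2 + X3, data = education.table)

# Step 2: estimate a constant error variance within each Region
s2_region <- tapply(resid(fit_ols)^2, education.table$Region, mean)
weights   <- 1 / s2_region[as.character(education.table$Region)]

# Step 3: refit by WLS; iterate steps 2-3 if needed
fit_wls <- lm(Y ~ X1 + X2 + X3, data = education.table, weights = weights)
summary(fit_wls)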

WLS in matrix form
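A minimal sketch of the matrix-form computation b_w = (X′WX)^{−1} X′WY in R, assuming X is a design matrix that already contains the intercept column, y the response vector, and w the weight vector:

# WLS estimate and its covariance matrix computed directly from the formulas
W      <- diag(w)
bw     <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
cov_bw <- solve(t(X) %*% W %*% X)    # valid when the weights 1/sigma_i^2 are known

# Equivalent to the built-in fit: coef(lm(y ~ X - 1, weights = w))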

Correlated errors: consequences of the errors being autocorrelated

SERIAL CORRELATION: the errors corresponding to different observations are correlated.

Non-Independence of the Error Variables
Data collected over time constitute a time series. If the errors are independent, no pattern should be observed when the residuals are examined over time. When a pattern is detected, the errors are said to be autocorrelated. Autocorrelation can be detected by graphing the residuals against time.

Non-Independence of the Error Variables
Patterns in the residuals over time indicate that autocorrelation exists.
[Two residual-versus-time plots: in the first, note the runs of positive residuals replaced by runs of negative residuals; in the second, note the oscillating behavior of the residuals around zero.]

The problems caused by autocorrelation

The impacts:
(1) The least squares estimators remain unbiased and consistent, but they are not efficient: b is still unbiased, yet its usual estimated variance is biased and b no longer has minimum variance.
(2) MSE may seriously underestimate the variance of the error terms.
(3) The estimated standard errors s{b_k} of the regression coefficients may underestimate their true variability.
(4) Confidence intervals and tests using the t and F distributions are no longer strictly applicable.

Correlated errors (AR(1) noise)
Suppose that, instead of being independent, the errors in our model follow

  ε_t = ρ ε_{t−1} + u_t,  with the u_t independent.

If ρ is close to 1, the errors are strongly correlated; ρ = 0 corresponds to independence. This is "autoregressive order 1" (AR(1)) noise, the first-order autoregressive error model. Many other models of correlation exist: ARMA, ARIMA, ARCH, GARCH, etc.
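A hedged sketch simulating AR(1) errors in R so the effect of ρ on the residuals can be seen; all names are illustrative:

# Simulate first-order autoregressive errors: e_t = rho * e_{t-1} + u_t
set.seed(1)
n   <- 100
rho <- 0.8
e   <- as.numeric(arima.sim(model = list(ar = rho), n = n))
x   <- 1:n
y   <- 1 + 0.5 * x + e

fit <- lm(y ~ x)
plot(resid(fit), type = "b")   # runs of same-signed residuals when rho is large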

Linear regression with AR(1) errors
With a single predictor, the generalized simple linear regression model when the random error terms follow a first-order autoregressive, AR(1), process is

  Y_t = β_0 + β_1 X_t + ε_t,  ε_t = ρ ε_{t−1} + u_t.

The generalized multiple regression model when the random error terms follow a first-order autoregressive process is

  Y_t = β_0 + β_1 X_{t1} + … + β_{p−1} X_{t,p−1} + ε_t,  ε_t = ρ ε_{t−1} + u_t,

where the u_t are independent N(0, σ²).

4. Corrections for serial correlation
(1) Index the observations by time, t = 1, …, T (T total observations).
(2) Assume the errors follow the first-order autoregressive process ε_t = ρ ε_{t−1} + u_t with |ρ| < 1.

The Durbin-Watson test for autocorrelation
Because correlated error terms in business and economics applications tend to show positive serial correlation, the usual test hypotheses are

  H0: ρ = 0  versus  Ha: ρ > 0.

To obtain the Durbin-Watson test statistic, first fit the regression function by ordinary least squares and compute the ordinary residuals e_t = Y_t − Ŷ_t, then compute the statistic

  D = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²,

where n is the number of cases.
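The statistic can be computed directly from the OLS residuals; a sketch, using the fit from the AR(1) simulation above (any lm object works):

# D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
e <- resid(fit)
D <- sum(diff(e)^2) / sum(e^2)
D                              # values well below 2 suggest positive autocorrelation

# An exact test with p-values is also available, e.g. lmtest::dwtest(fit)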

Exact critical values are difficult to obtain, but Durbin and Watson derived lower and upper bounds d_L and d_U such that a definite conclusion can be reached when D falls outside these bounds. The decision rule for the test of the hypotheses in (12.12), i.e. H0: ρ = 0 versus Ha: ρ > 0, is:

  If D > d_U, conclude H0 (ρ = 0).
  If D < d_L, conclude Ha (ρ > 0).
  If d_L ≤ D ≤ d_U, the test is inconclusive.

Remedial measures for autocorrelation
Consider the transformed dependent variable Y′_t = Y_t − ρ Y_{t−1}. Then

  Y′_t = β′_0 + β_1 X′_t + u_t,

where β′_0 = β_0(1 − ρ), X′_t = X_t − ρ X_{t−1}, and the u_t are independent disturbance terms.

Therefore, using the transformed variables Y′_t and X′_t yields a standard simple linear regression model with independent error terms. This means that ordinary least squares applied to this model has its usual optimal properties.

Estimating ρ: the Cochrane-Orcutt procedure
The Cochrane-Orcutt procedure iterates through three steps. The assumed autoregressive error process is treated as a regression through the origin, so ρ is estimated by regressing the residual e_t on e_{t−1} with no intercept:

  r = Σ_{t=2}^{n} e_{t−1} e_t / Σ_{t=2}^{n} e_{t−1}².
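A sketch of one Cochrane-Orcutt iteration, continuing with the simulated y and x above: ρ is estimated by the through-origin regression, the variables are transformed, and the model is refit.

# Step 1: estimate rho from the OLS residuals (regression through the origin)
e <- resid(lm(y ~ x))
r <- sum(e[-1] * e[-length(e)]) / sum(e[-length(e)]^2)

# Step 2: transform the variables: Y'_t = Y_t - r*Y_{t-1}, X'_t = X_t - r*X_{t-1}
yt <- y[-1] - r * y[-length(y)]
xt <- x[-1] - r * x[-length(x)]

# Step 3: refit by OLS on the transformed variables; iterate steps 1-3 if needed
fit_co <- lm(yt ~ xt)
b1 <- coef(fit_co)["xt"]                        # slope carries over directly
b0 <- coef(fit_co)["(Intercept)"] / (1 - r)     # intercept transformed back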

The Hildreth-Lu procedure for estimating the autocorrelation parameter
The value of ρ chosen by the Hildreth-Lu procedure is the one that minimizes the error sum of squares of the transformed regression model:

  SSE = Σ_t (Y′_t − Ŷ′_t)².

Computer programs are available to find the value of ρ that minimizes SSE.
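A sketch of the Hildreth-Lu grid search, again with the simulated y and x; the chosen ρ is the grid value minimizing the SSE of the transformed regression:

# SSE of the transformed regression as a function of rho
sse_for_rho <- function(rho, y, x) {
  yt <- y[-1] - rho * y[-length(y)]
  xt <- x[-1] - rho * x[-length(x)]
  sum(resid(lm(yt ~ xt))^2)
}

rhos    <- seq(0.01, 0.99, by = 0.01)
sse     <- sapply(rhos, sse_for_rho, y = y, x = x)
rho_hat <- rhos[which.min(sse)]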

The first-difference procedure
Because the autocorrelation parameter ρ is frequently large, and because SSE viewed as a function of ρ is often quite flat for large ρ (as in the Blaisdell Company example), some economists and statisticians suggest setting ρ = 1 in the transformed model:

  Y′_t = β_1 X′_t + u_t,

where Y′_t = Y_t − Y_{t−1} and X′_t = X_t − X_{t−1}. The regression coefficient β_1 can again be estimated directly by ordinary least squares, this time by a regression through the origin.

The fitted regression function in the transformed variables,

  Ŷ′ = b′_1 X′,

can be transformed back to the original variables as

  Ŷ = b_0 + b_1 X,

where b_0 = Ȳ − b′_1 X̄ and b_1 = b′_1.
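A sketch of the first-difference fit and the back-transformation just described, with the simulated y and x:

# First-difference procedure: regress diff(y) on diff(x) through the origin
fit_fd <- lm(diff(y) ~ diff(x) - 1)
b1 <- coef(fit_fd)[1]
b0 <- mean(y) - b1 * mean(x)     # b_0 = Ybar - b_1 * Xbar in the original variables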