Sampling Error and Hypothesis Test

Chapter 4: Sampling Error and Hypothesis Test
宇传华, yuchua@163.com, www.hstathome.com

Contents
§1. Sampling error of the mean
§2. z distribution and t distribution
§3. Estimating the population mean
§4. Principle and procedures of hypothesis testing

§1. Sampling error of the mean
Error, the difference between the true value and the estimate, has two sources:
- systematic error: avoidable;
- random error: unavoidable, comprising random measurement error and random sampling error.

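Statistical analysis is either descriptive or inferential. Inferential analysis uses a sample to draw conclusions about the population (mean $\mu$, SD $\sigma$) by 1. interval estimation and 2. hypothesis testing. Because a sample is only part of the population, its statistics differ from the population parameters; this difference is the sampling error.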

Sampling error of the mean: samples drawn from the same population have means $\bar{X}$ that generally differ from the population mean $\mu$ (population mean $\mu \ne$ sample mean $\bar{X}$). Sampling error is the difference between the sample mean and the population mean, due to the chance selection of individuals.

Example: sampling error of the mean

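Treating the means of the 100 samples as values of a new variable and applying the frequency-distribution method of Chapter 2 gives a histogram of the 100 sample means.
[Figure: histogram of the 100 sample means]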
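The original 100 samples are not reproduced here, so the following is a minimal simulation under assumed values: a normal population with hypothetical $\mu = 5.0$ and $\sigma = 0.5$, and samples of $n = 10$ (the helper name `sample_means` is our own). It draws 100 samples, records each mean's sampling error $\bar{X} - \mu$, and prints a crude text histogram of the means.

```python
import random
import statistics

# Hypothetical population parameters (assumptions, not from the original slide)
MU, SIGMA = 5.0, 0.5
N_PER_SAMPLE, N_SAMPLES = 10, 100

def sample_means(mu, sigma, n, k, seed=1):
    """Draw k random samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(k)]

means = sample_means(MU, SIGMA, N_PER_SAMPLE, N_SAMPLES)
errors = [m - MU for m in means]  # sampling error of each sample mean

print(f"mean of the 100 sample means: {statistics.fmean(means):.3f}")
print(f"SD of the 100 sample means:   {statistics.stdev(means):.3f}")
print(f"largest |sampling error|:     {max(abs(e) for e in errors):.3f}")

# Crude text histogram of the sample means (bin width 0.1)
bins = {}
for m in means:
    lo = round(m // 0.1 * 0.1, 1)
    bins[lo] = bins.get(lo, 0) + 1
for lo in sorted(bins):
    print(f"{lo:4.1f}-{lo + 0.1:4.1f} {'*' * bins[lo]}")
```

The means cluster around $\mu$ but almost never equal it; that scatter is the sampling error.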

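Central Limit Theorem
1) If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
2) If $X$ is not normally distributed (mean $\mu$, variance $\sigma^2$), then $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ when $n$ is large enough. In general $n \ge 30$ or $50$ is large enough, but if $X$ is strongly skewed, larger samples are needed.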

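In words: when the sample size is large enough, the sampling distribution of $\bar{X}$ is approximately normal whatever the distribution of the original variable. How large is "large enough"?
[Figure: sampling distribution of $\bar{X}$]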
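A quick way to see the theorem at work is to draw from a strongly skewed population and compare the skewness of individual values with that of sample means. This sketch assumes an exponential population with mean 1 (population skewness 2) and $n = 50$; the `skewness` helper is a simple moment estimator of our own.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness (moment estimator): mean of cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(2)
# Assumed skewed population: exponential with mean 1 (population skewness = 2)
raw = [rng.expovariate(1.0) for _ in range(2000)]

n = 50  # sample size; the slide's rule of thumb is n >= 30 or 50
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(2000)]

print(f"skewness of individual values: {skewness(raw):.2f}")   # near 2
print(f"skewness of sample means:      {skewness(means):.2f}")  # near 2/sqrt(50), about 0.28
print(f"SD of sample means: {statistics.stdev(means):.3f} vs sigma/sqrt(n) = {1 / n ** 0.5:.3f}")
```

The individual values stay strongly skewed, while the 2000 sample means are nearly symmetric, as the theorem predicts.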

Standard Error of the Mean (SEM): $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, estimated from a sample by $S_{\bar{X}} = S/\sqrt{n}$. This equation implies that sampling error decreases as sample size increases. This matters because, to make sampling error as small as possible, we need to use as large a sample as we can manage.
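The formula $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ can be checked by simulation: the SD of many sample means should track it. This sketch assumes a normal population with mean 0 and a hypothetical $\sigma = 2$; `empirical_se` is an illustrative helper name.

```python
import random
import statistics

SIGMA = 2.0  # assumed population SD for the demonstration
rng = random.Random(3)

def empirical_se(n, reps=2000):
    """SD of `reps` sample means, each from n draws of N(0, SIGMA^2)."""
    means = [statistics.fmean(rng.gauss(0.0, SIGMA) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

results = {}
for n in (4, 16, 64):
    results[n] = (empirical_se(n), SIGMA / n ** 0.5)
    print(f"n={n:3d}  simulated SE={results[n][0]:.3f}  sigma/sqrt(n)={results[n][1]:.3f}")
```

Quadrupling $n$ roughly halves the standard error, which is the $1/\sqrt{n}$ behaviour the formula describes.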

[Figure: distributions of sample means for different sample sizes]

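Example 4.1: A random sample of 140 adult men in a region gave a mean red blood cell count of 4.77×10¹²/L with SD 0.38×10¹²/L. Calculate the standard error of the mean. The standard error is one of the key features of a sampling distribution: it measures the size of the sampling error and, more importantly, is used for interval estimation of parameters and for comparing parameters between groups.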
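Applying $S_{\bar{X}} = S/\sqrt{n}$ to Example 4.1 takes one line; the function name `standard_error` is our own.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of the mean: S / sqrt(n)."""
    return sd / math.sqrt(n)

# Example 4.1: n = 140 adult men, S = 0.38 (units: 10^12/L)
se = standard_error(0.38, 140)
print(f"S_xbar = {se:.4f} x 10^12/L")
```

This gives about 0.032×10¹²/L, far smaller than the SD of 0.38×10¹²/L because it describes the scatter of means, not of individuals.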

The difference between SD and SE
SD ($S$): describes the dispersion of individual observations around the mean; used in descriptive analysis.
SE ($S_{\bar{X}}$): describes the dispersion of sample means around the population mean; used in inferential analysis.

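§2. z distribution and t distribution
Samples 1, 2, …, r are drawn from the same (normal) population. Standardizing each sample mean gives $z = (\bar{X} - \mu)/\sigma_{\bar{X}}$, which follows the standard normal distribution when $\sigma$ is known; replacing $\sigma$ with the sample SD gives $t = (\bar{X} - \mu)/S_{\bar{X}}$, which follows the t distribution with $\nu = n - 1$ degrees of freedom.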

t-distribution

Characteristics of the t distribution
- The density function is symmetric about t = 0.
- The density reaches its maximum at t = 0.
- There is a family of density functions indexed by the degrees of freedom ν: the smaller ν, the more dispersed the t values; the larger ν, the closer the density is to the normal distribution.
- When ν = ∞, the t distribution is identical to the standard normal distribution.
- The total area under the curve is 1.
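The listed properties can be checked numerically from the standard t density $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}(1 + t^2/\nu)^{-(\nu+1)/2}$. The sketch below uses only the Python standard library; `t_pdf` is our own helper, and the chosen ν values are arbitrary.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

z = NormalDist()  # standard normal, the nu = infinity limit
for df in (1, 5, 30, 1000):
    print(f"nu={df:5d}  f(0)={t_pdf(0, df):.4f}  f(3)={t_pdf(3, df):.5f}")
print(f"normal     f(0)={z.pdf(0):.4f}  f(3)={z.pdf(3):.5f}")
```

The peak at t = 0 rises toward the normal value 0.3989 as ν grows, while small ν puts more area in the tails (for example at t = 3), which is why small samples need larger critical values.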

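Table of t critical values (p. 195, Appendix Table 2)
Two-tailed: α = 0.05, with α/2 = 0.05/2 in each tail; for ν = ∞ the critical values are −1.96 and 1.96.
One-tailed: α = 0.05 in one tail; for ν = ∞ the critical value is 1.645 (−1.645 for the lower tail).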

Ex1: For a given probability α and df, find the critical value t(α, df).
Two-tailed (α = 0.05): $t_{0.05/2,20} = 2.086$, $t_{0.05/2,50} = 2.009$, $t_{0.05/2,\infty} = 1.960$. In Excel: =TINV(0.05,20); for ν = ∞, =NormsINV(0.05/2) returns −1.96.
One-tailed (α = 0.05): $t_{0.05,20} = 1.725$, $t_{0.05,50} = 1.676$, $t_{0.05,\infty} = 1.645$. In Excel: =TINV(2*0.05,20); for ν = ∞, =NormsINV(0.05) returns −1.645.
(p. 195, Appendix Table 2)

Ex2: For a given t value and df, find the probability interval (P value).
ν = 10: $0.025 < P(t \ge 1.85) < 0.05$ and $0.025 < P(t \le -1.85) < 0.05$ (one-tailed; Excel: =TDIST(1.85,10,1)); $0.05 < P(|t| \ge 1.85) < 0.10$ (two-tailed; =TDIST(1.85,10,2)).
ν = ∞: $P(t \ge 1.96) = P(t \le -1.96) = 0.025$ (=NormsDIST(-1.96)); $P(|t| \ge 1.96) = 0.05$ (two-tailed; =2*NormsDIST(-1.96)).
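Outside Excel, both lookups can be reproduced with the Python standard library. This is a minimal sketch: `statistics.NormalDist` covers ν = ∞, while for finite ν the t CDF is integrated numerically with Simpson's rule and inverted by bisection. That numerical approach and the helper names `t_pdf`, `t_cdf`, `t_crit`, `p_value` are our own choices, not how Excel or statistical tables compute these values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(alpha, df, two_tailed=True):
    """Upper critical value t(alpha, df), found by bisection on t_cdf."""
    target = 1 - (alpha / 2 if two_tailed else alpha)
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_value(t, df, two_tailed=True):
    """Tail probability beyond |t| (both tails if two_tailed)."""
    upper = 1 - t_cdf(abs(t), df)
    return 2 * upper if two_tailed else upper

z = NormalDist()
print("Ex1 two-tailed:", [round(t_crit(0.05, df), 3) for df in (20, 50)], round(z.inv_cdf(1 - 0.05 / 2), 3))
print("Ex1 one-tailed:", [round(t_crit(0.05, df, two_tailed=False), 3) for df in (20, 50)], round(z.inv_cdf(1 - 0.05), 3))
print("Ex2 P(t >= 1.85), nu=10:", round(p_value(1.85, 10, two_tailed=False), 4))
print("Ex2 P(|t| >= 1.85), nu=10:", round(p_value(1.85, 10), 4))
print("nu=inf P(|z| >= 1.96):", round(2 * z.cdf(-1.96), 4))
```

The printed critical values agree with the table to three decimals, and the Ex2 probabilities fall inside the intervals read from the table.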

§3. Estimating the population mean
The population mean can be estimated in two ways:
- Point estimate: the sample mean $\bar{X}$.
- Interval estimate: a confidence interval, e.g., the 95% CI for μ, written (L, U).

Point Estimator: A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or point.
[Figure: unknown parameter of the population distribution; point estimator from the sampling distribution]

Interval Estimator: An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
[Figure: unknown parameter of the population distribution; interval estimator from the sample distribution]

Confidence Interval (CI) Estimation
$1 - \alpha$ is called the confidence level, usually 95%, sometimes 90% or 99%. The ends of the interval are the confidence limits (CL): the smaller is the lower limit (L) and the larger is the upper limit (U).

Interpreting the CI
Many students want to say that a 95% confidence interval means there is a 95% chance that the interval contains the population mean. But any particular confidence interval either contains the population mean or it doesn't, so a single computed interval shouldn't be read as a probability statement about μ. The correct interpretation is based on repeated sampling: if samples of the same size are drawn repeatedly from a population and a confidence interval is calculated from each sample, then about 95% of these intervals will contain the population mean.
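The repeated-sampling interpretation can be demonstrated directly: draw many samples, build a 95% interval from each, and count how many cover μ. This sketch assumes a normal population with known σ (hypothetical μ = 100, σ = 15, n = 25) so that the simple z interval $\bar{X} \pm 1.96\sigma/\sqrt{n}$ applies.

```python
import random
import statistics
from statistics import NormalDist

MU, SIGMA, N = 100.0, 15.0, 25   # hypothetical population and sample size
REPS = 1000
z = NormalDist().inv_cdf(0.975)  # 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5

rng = random.Random(4)
hits = 0
for _ in range(REPS):
    xbar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    if xbar - half_width <= MU <= xbar + half_width:
        hits += 1
coverage = hits / REPS
print(f"{hits} of {REPS} intervals contain mu (coverage {coverage:.3f})")
```

Each individual interval either contains μ or misses it; only the long-run proportion is close to 95%.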

[Figure: the meaning of a confidence interval]

Information and the Width of the Interval
A wide interval estimator provides little information: where exactly is μ?

A much narrower interval locates μ better: if the confidence level remains unchanged, the narrower interval provides more meaningful information.

The width of the confidence interval is affected by:
1. the population standard deviation (σ);
2. the confidence level (1 − α);
3. the sample size (n).

The Effect of σ on the Interval Width (90% confidence level)
Suppose the standard deviation increases by 50%. To maintain the same level of confidence, a larger standard deviation requires a wider confidence interval.

The Effect of Changing the Confidence Level
Increase the confidence level from 90% to 95%: a higher confidence level produces a wider confidence interval.

The Effect of Changing the Sample Size (90% confidence level)
Increasing the sample size decreases the width of the confidence interval while the confidence level remains unchanged.
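All three effects follow from the width of the z interval, $2 z_{\alpha/2}\sigma/\sqrt{n}$. The sketch below evaluates it for a hypothetical baseline (σ = 10, n = 25, 90% confidence) and then changes one factor at a time; `ci_width` is our own helper name.

```python
from statistics import NormalDist

def ci_width(sigma: float, n: int, confidence: float) -> float:
    """Width of the z interval xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / n ** 0.5

base = ci_width(sigma=10, n=25, confidence=0.90)  # hypothetical baseline
print(f"baseline (sigma=10, n=25, 90%): {base:.2f}")
print(f"sigma up 50%:                   {ci_width(15, 25, 0.90):.2f}")
print(f"confidence 90% -> 95%:          {ci_width(10, 25, 0.95):.2f}")
print(f"n quadrupled to 100:            {ci_width(10, 100, 0.90):.2f}")
```

A 50% larger σ makes the interval 50% wider, raising the confidence level widens it, and quadrupling n halves it.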

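Confidence Interval for the Mean: choice of statistic
- No assumption about the population, n ≥ 30: z (normal approximation)
- Normal population, σ known: z
- Normal population, σ unknown, n < 30: t
In short: use the z distribution when σ is known, or when σ is unknown but n is large enough; use the t distribution when σ is unknown and n is small.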

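The Confidence Interval for μ (σ known)
Because $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ follows the standard normal distribution, $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$. This leads to the equivalent statement $P(\bar{X} - z_{\alpha/2}\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\sigma/\sqrt{n}) = 1 - \alpha$. The confidence interval is therefore $\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$; the 95% CI for μ is $\bar{X} \pm 1.96\sigma/\sqrt{n}$.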

Graphical Demonstration of the Confidence Interval for μ
[Figure: confidence level 1 − α between the lower and upper confidence limits]

Example of the confidence interval for μ (σ known): at the 95% confidence level the interval is $\bar{X} \pm 1.96\sigma/\sqrt{n}$; at the 90% level it is $\bar{X} \pm 1.645\sigma/\sqrt{n}$.

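The Confidence Interval for μ (σ unknown, n small)
Because $t = (\bar{X} - \mu)/(S/\sqrt{n})$ follows the t distribution with $\nu = n - 1$, $P(-t_{\alpha/2,\nu} < t < t_{\alpha/2,\nu}) = 1 - \alpha$. This leads to the equivalent statement $P(\bar{X} - t_{\alpha/2,\nu}S/\sqrt{n} < \mu < \bar{X} + t_{\alpha/2,\nu}S/\sqrt{n}) = 1 - \alpha$. The 95% CI for μ is $\bar{X} \pm t_{0.05/2,\nu}S_{\bar{X}}$.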

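Example 4.2: A physician measured plasma fibrinogen in 25 patients with atherosclerosis and obtained a mean of 3.32 g/L and an SD of 0.57 g/L. Calculate the 95% confidence interval for the population mean plasma fibrinogen of such patients.
Lower limit: $\bar{X} - t_{0.05/2,24}S/\sqrt{n}$; upper limit: $\bar{X} + t_{0.05/2,24}S/\sqrt{n}$.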
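The arithmetic for Example 4.2 is sketched below. The critical value $t_{0.05/2,24} = 2.064$ is taken as a constant from the t table rather than computed, which keeps the example to the standard library.

```python
import math

xbar, s, n = 3.32, 0.57, 25   # Example 4.2 (g/L)
t_crit = 2.064                # t_{0.05/2, 24}, read from the t table
se = s / math.sqrt(n)         # 0.114
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f}) g/L")
```

This prints a 95% CI of (3.08, 3.56) g/L.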

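Example 4.3: Calculate the 95% confidence interval for the population mean red blood cell count of adult men in Example 4.1. This is a large sample, so the normal approximation can be used. Since $S_{\bar{X}} = 0.38/\sqrt{140} \approx 0.032 \times 10^{12}$/L, the 95% confidence interval is:
Lower limit: $\bar{X} - 1.96S_{\bar{X}}$; upper limit: $\bar{X} + 1.96S_{\bar{X}}$.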
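A minimal sketch of the large-sample calculation in Example 4.3, using `statistics.NormalDist` for $z_{0.05/2}$ instead of the rounded 1.96:

```python
import math
from statistics import NormalDist

xbar, s, n = 4.77, 0.38, 140          # Example 4.1 (10^12/L)
se = s / math.sqrt(n)                 # about 0.032
z = NormalDist().inv_cdf(0.975)       # about 1.96
lower, upper = xbar - z * se, xbar + z * se
print(f"95% CI: ({lower:.2f}, {upper:.2f}) x 10^12/L")
```

The result is about (4.71, 4.83)×10¹²/L.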

One-sided confidence interval
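A one-sided interval puts all of α in one tail, so the large-sample limit is $\bar{X} - z_{\alpha}S/\sqrt{n}$ (lower) or $\bar{X} + z_{\alpha}S/\sqrt{n}$ (upper), with $z_{0.05} \approx 1.645$ rather than 1.96. The sketch reuses the Example 4.1 data purely as an illustration; the function `one_sided_limit` and its `side` parameter are our own.

```python
import math
from statistics import NormalDist

def one_sided_limit(xbar, s, n, confidence=0.95, side="lower"):
    """Large-sample one-sided limit: xbar -/+ z_alpha * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)   # z_alpha, about 1.645 for 95%
    margin = z * s / math.sqrt(n)
    return xbar - margin if side == "lower" else xbar + margin

# Illustration reusing the data of Example 4.1
lo = one_sided_limit(4.77, 0.38, 140, side="lower")
hi = one_sided_limit(4.77, 0.38, 140, side="upper")
print(f"one-sided 95% CI: mu > {lo:.3f}   or   mu < {hi:.3f}")
```

Each one-sided limit lies closer to $\bar{X}$ than the corresponding two-sided limit, because only one tail is cut off.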

Difference between the confidence interval for the population mean and the reference range
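The two intervals answer different questions: a 95% reference range (normal method, $\bar{X} \pm 1.96S$, assuming individual values are roughly normal) describes where about 95% of individual values lie, while the 95% CI ($\bar{X} \pm 1.96S/\sqrt{n}$) describes where μ is estimated to lie. This sketch contrasts them on the Example 4.1 data.

```python
import math
from statistics import NormalDist

xbar, s, n = 4.77, 0.38, 140   # Example 4.1 (10^12/L)
z = NormalDist().inv_cdf(0.975)

# 95% reference range (normal method): where about 95% of individual values lie
ref_low, ref_high = xbar - z * s, xbar + z * s
# 95% CI for the population mean: where mu is estimated to lie
ci_low, ci_high = xbar - z * s / math.sqrt(n), xbar + z * s / math.sqrt(n)

ratio = (ref_high - ref_low) / (ci_high - ci_low)
print(f"reference range: ({ref_low:.2f}, {ref_high:.2f})")
print(f"CI for mean:     ({ci_low:.2f}, {ci_high:.2f})")
print(f"width ratio = sqrt(n) = {ratio:.2f}")
```

The reference range uses the SD and does not shrink with more data; the CI uses the SE and is √n times narrower.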

Appendix: sampling error of the variance and its confidence interval (chi-square critical values: p. 206, Appendix Table 7)

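§4. Principle and procedures of hypothesis testing
I. Basic idea: proof by contradiction combined with the small-probability principle.
Proof by contradiction: make an assumption about the population parameters in advance (e.g., H0: μ1 = μ2); if the sample information does not support this assumption, conclude that it does not hold. The aim is to show that the two parameters differ, yet the assumption made is that they do not, much as a court judges a case.
Small probability: fix a small probability in advance, e.g., α = 0.05 = 5%. If, assuming H0 is true, the probability of obtaining the observed sample result (or a more extreme one) is ≤ α, we conclude that H0 does not hold in this trial and reject H0. This does not mean such a result can never occur in a single trial; it means the chance of wrongly rejecting a true H0 is at most α. This error is called a type I error, and α is its probability.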

Principle of hypothesis testing: A null hypothesis (H0) is made about a parameter. Data are then collected and used to estimate that parameter, and the estimate is compared with the hypothesized value. When comparing the observed value with the hypothesized value, the question is: "How likely is the observed value if the hypothesis is true?" If the answer is "very unlikely" (e.g., the probability is less than 5%), we reject the null hypothesis and accept the opposite statement, the alternative hypothesis (H1).

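Example 4.4: Blackcurrant oil soft capsules were used to treat hyperlipidemia. For 30 patients, the difference in serum triglycerides before and after treatment was 1.38 ± 0.76 g/L (mean ± SD). Did serum triglycerides improve after treatment?
Framing the problem: the observed sample effect (the change in triglycerides before and after treatment) = drug effect + chance. How large must the change be before we can conclude that the drug is "effective"?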

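Under H0, the t distribution gives the probability P of a difference at least this large arising by chance. If P is very small, that is, if the calculated t exceeds the given critical value, we lean toward rejecting H0 and conclude that there is a difference before and after treatment.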

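Basic steps of hypothesis testing
1. State the hypotheses and choose the test level. The null hypothesis H0 is the hypothesis to be tested; the alternative hypothesis H1 is the one accepted when the evidence does not support H0. For example, H0: μ = μ0; H1: μ ≠ μ0; α = 0.05.
2. Choose the test method and compute the test statistic. Choose an appropriate method according to the type of data, the study design, and the purpose of the inference; each method has its own test statistic and formula. Many tests are named after their statistic, e.g., the t test, z test, F test, and χ² test.
3. Determine the P value and draw a conclusion. Look up the critical value in a table and compare the computed statistic with the critical value of the rejection region to determine P (or obtain P directly from statistical software). For a two-sided t test, for example, if $|t| \ge t_{\alpha/2,\nu}$, then $P \le \alpha$ and H0 is rejected at level α.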
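The three steps can be traced on Example 4.4. This sketch assumes a paired t test on the before-and-after differences, $t = \bar{d}/(S_d/\sqrt{n})$ with $\nu = n - 1$, and a two-sided H1; both choices are ours, since the slides imply a t test without giving its formula. The critical value $t_{0.05/2,29} = 2.045$ is taken from the t table.

```python
import math

# Step 1: hypotheses and test level (two-sided choice is an assumption here)
# H0: mu_d = 0 (no change in triglycerides); H1: mu_d != 0; alpha = 0.05
alpha = 0.05

# Step 2: paired t statistic from Example 4.4 (differences: mean 1.38, SD 0.76, n = 30)
d_bar, s_d, n = 1.38, 0.76, 30
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1

# Step 3: compare with the critical value t_{0.05/2, 29} read from the t table
t_crit = 2.045
reject = abs(t_stat) >= t_crit
print(f"t = {t_stat:.2f}, df = {df}, critical value = {t_crit}")
print("P <= 0.05: reject H0" if reject else "P > 0.05: do not reject H0")
```

Here t ≈ 9.95, far beyond 2.045, so H0 is rejected at α = 0.05.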

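Hypothesis testing is like a judge deciding whether someone is guilty: the person is first presumed "innocent" (H0); evidence is then gathered through investigation. If the evidence is sufficient, the presumption of innocence (H0) is rejected and the suspect is found guilty; otherwise, the presumption of innocence (H0) is provisionally accepted.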

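One-sided and two-sided tests
- H1: μ ≠ μ0 is two-sided: both μ < μ0 and μ > μ0 are possible.
- H1: μ > μ0 is one-sided.
- H1: μ < μ0 is one-sided.
In this example, medical knowledge indicates that the heart rate of middle-school boys who exercise regularly will not be higher than that of middle-school boys in general, so a one-sided test is used: H0: μ = μ0, H1: μ < μ0. Whether a test is one- or two-sided is determined by subject-matter knowledge.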
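The choice matters because the same statistic gives different P values. For a z statistic, the one-sided P for H1: μ < μ0 is $\Phi(z)$ and the two-sided P is $2\Phi(-|z|)$. The value z = −1.80 below is hypothetical, not taken from the heart-rate example; `p_values` is our own helper.

```python
from statistics import NormalDist

def p_values(z_stat: float):
    """One-sided (H1: mu < mu0) and two-sided P values for a z statistic."""
    phi = NormalDist().cdf
    p_lower = phi(z_stat)                 # P(Z <= z) under H0
    p_two = 2 * phi(-abs(z_stat))         # P(|Z| >= |z|) under H0
    return p_lower, p_two

z_obs = -1.80   # hypothetical statistic, not from the heart-rate example
p_one, p_two = p_values(z_obs)
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

At α = 0.05 this statistic is significant one-sided (P ≈ 0.036) but not two-sided (P ≈ 0.072), which is why the direction must be fixed from subject-matter knowledge before seeing the data.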

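How do α and P differ in hypothesis testing?
α is a threshold fixed in advance: the largest P value regarded as statistically significant. The P value is the probability, assuming H0 is true, of obtaining a test statistic (such as z, t, or F) as extreme as, or more extreme than, the one computed from the current sample. The P value is itself a random variable.
[Figure: standard normal density f(z); the two tails beyond −1.96 and 1.96 together have area α = 0.05]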
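Because P is a random variable, repeating a study when H0 is true produces a different P each time, and the share of P values at or below α is about α: the type I error rate. This sketch simulates two-sided z tests under a true H0 with hypothetical settings (μ0 = 0, σ = 1, n = 20, 4000 repetitions).

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(5)
phi = NormalDist().cdf
MU0, SIGMA, N, ALPHA, REPS = 0.0, 1.0, 20, 0.05, 4000  # hypothetical settings

p_vals = []
for _ in range(REPS):
    xbar = statistics.fmean(rng.gauss(MU0, SIGMA) for _ in range(N))  # H0 true
    z_stat = (xbar - MU0) / (SIGMA / N ** 0.5)
    p_vals.append(2 * phi(-abs(z_stat)))   # two-sided P value

type_i_rate = sum(p <= ALPHA for p in p_vals) / REPS
print(f"share of P <= {ALPHA}: {type_i_rate:.3f}")          # close to alpha
print(f"P values range from {min(p_vals):.4f} to {max(p_vals):.4f}")
```

About 5% of the simulated studies reject a true H0, matching the fixed α, while the individual P values vary widely from study to study.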