1
Sampling Error and Hypothesis Test
Chapter 4: Sampling Error and Hypothesis Test (宇传华)
2
Contents
§1. Sampling error of estimated mean
§2. z distribution & t distribution
§3. Estimate of population mean
§4. Principle and procedures of hypothesis test
3
§1. Sampling error of estimated mean
Error is the difference between the true value and the estimate. It is either
systematic error (avoidable), or
random error (unavoidable), which comprises random measurement error and random sampling error.
4
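Sampling and statistical analysis
Statistical analysis is either descriptive or inferential. Inferential analysis uses a sample to draw conclusions about the population (mean μ, SD σ) by 1. interval estimation and 2. hypothesis testing. Sample statistics differ from the population parameters because of sampling error.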
5
Sampling errors of means
Population → sample, sample, ... The population mean μ ≠ the sample mean $\bar{X}$. Sampling error is the difference between the sample mean and the population mean, due to the chance selection of individuals.
6
Example for sampling error of mean
7
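Treating the means of these 100 samples as the values of a new variable and applying the frequency-distribution method of Chapter 2 gives the following histogram of the 100 sample means: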
8
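Central Limit Theorem
1) If X ~ N(μ, σ²), then $\bar{X} \sim N(\mu, \sigma^2/n)$.
2) If X is not normally distributed, then when n is large enough, $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$. In general n ≥ 30 or 50 is large enough, but if X is strongly skewed the sample size should be larger.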
9
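Central limit theorem: when the sample size is large enough, the sampling distribution of the sample mean $\bar{X}$ is approximately normal, whatever the distribution of the original measurements.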
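The theorem can be checked with a small simulation. The sketch below is only an illustration: the exponential parent distribution (mean 1, SD 1), the sample size of 50, and the 2000 repetitions are assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def draw_exponential(rng):
    """Strongly skewed parent distribution: exponential with mean 1 and SD 1."""
    return rng.expovariate(1.0)

means = sample_means(draw_exponential, n=50, reps=2000)

# The sample means should centre on the population mean (1) and have a
# spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the parent distribution is far from normal.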
10
Standard Error of the Mean (SEM)
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$, estimated from a sample by $S_{\bar{X}} = S/\sqrt{n}$.
This equation implies that sampling error decreases as the sample size (n) increases. To make sampling error as small as possible, we should use as large a sample as we can manage.
11
Figure: Distribution of sample means for different sample sizes
12
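Example 4.1: In a certain area, 140 adult men were sampled at random. Their mean red blood cell count was 4.77×10¹²/L with a standard deviation of 0.38×10¹²/L. Calculate the standard error of the mean.
The standard error is one of the key characteristics of a sampling distribution. It measures the size of the sampling error and, more importantly, is used for interval estimation of parameters and for comparing parameters between groups.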
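Example 4.1 can be worked through in a few lines. This is a minimal sketch; the function name `standard_error` is introduced here for illustration.

```python
import math

def standard_error(sd, n):
    """Estimated standard error of the mean: S / sqrt(n)."""
    return sd / math.sqrt(n)

# Example 4.1: n = 140, S = 0.38 (in units of 10^12/L)
se = standard_error(0.38, 140)
print(round(se, 4))  # 0.0321

# Quadrupling the sample size halves the standard error.
print(round(standard_error(0.38, 4 * 140), 4))  # 0.0161
```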
13
The difference between SD & SE
SD (S): describes the dispersion of the observations around the mean. Used in descriptive analysis.
SE ($S_{\bar{X}}$): describes the dispersion of the sample means around the population mean. Used in inferential analysis.
15
§2. z distribution & t distribution
Population → sample 1, sample 2, ..., sample r
16
t-distribution
17
Characteristics of t-distribution
1) The density function is symmetric about t = 0.
2) The density function reaches its maximum at t = 0.
3) There is a family of density curves, one for each degree of freedom ν. The smaller ν is, the more dispersed the t values; the larger ν is, the closer the curve is to the standard normal distribution.
4) When ν = ∞, the t distribution is identical to the standard normal distribution.
5) The area under the curve is 1.
18
(p195)
19
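Table of t critical values (p. 195, Appendix Table 2)
Two-tailed (ν = ∞): area α/2 = 0.05/2 in each tail, critical values -t = -1.96 and t = 1.96.
One-tailed (ν = ∞): area α = 0.05 in one tail, critical value t = 1.64 (or -t = -1.64).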
20
Ex1: For a given probability α and df, find the critical value t(α, df) (p. 195, Appendix Table 2).
Two-tailed: $t_{0.05/2,\,20} = 2.086$, $t_{0.05/2,\,50} = 2.009$, $t_{0.05/2,\,\infty} = 1.960$.
Excel: =TINV(0.05,20); for ν = ∞, =NormsINV(0.05/2) gives -1.96.
One-tailed: $t_{0.05,\,20} = 1.725$, $t_{0.05,\,50} = 1.676$, $t_{0.05,\,\infty} = 1.645$.
Excel: =TINV(2*0.05,20); for ν = ∞, =NormsINV(0.05) gives -1.64.
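The tabled critical values can also be reproduced numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. This is one possible approach chosen for illustration, not the method behind the Excel functions, and the function names are assumptions.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, two_tailed=True):
    """Upper critical value for a given alpha (0 < alpha < 1), by bisection."""
    p = 1 - (alpha / 2 if two_tailed else alpha)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

two_tail_20 = t_critical(0.05, 20)                    # table value 2.086
two_tail_50 = t_critical(0.05, 50)                    # table value 2.009
one_tail_20 = t_critical(0.05, 20, two_tailed=False)  # table value 1.725
z_two_tail = NormalDist().inv_cdf(1 - 0.05 / 2)       # nu = infinity: 1.960
print(round(two_tail_20, 3), round(two_tail_50, 3),
      round(one_tail_20, 3), round(z_two_tail, 3))
```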
21
Ex2: For a given t value and df, find the range of the P value.
ν = 10, one-tailed: for t ≥ 1.85, 0.025 < P(t ≥ 1.85) < 0.05; for t ≤ -1.85, 0.025 < P(t ≤ -1.85) < 0.05. Excel: =TDIST(1.85,10,1)
ν = 10, two-tailed: for |t| ≥ 1.85, 0.05 < P(|t| ≥ 1.85) < 0.10. Excel: =TDIST(1.85,10,2)
ν = ∞, one-tailed: P(t ≥ 1.96) = 0.025 and P(t ≤ -1.96) = 0.025. Excel: =NormsDIST(-1.96)
ν = ∞, two-tailed: P(|t| ≥ 1.96) = 0.05. Excel: =2*NormsDIST(-1.96)
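The brackets for the P value can be checked by computing the tail areas directly. The following is a rough standard-library sketch that integrates the t density numerically; the helper names are assumptions made for this example.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=1000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule area over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_one = t_upper_tail(1.85, 10)       # one-tailed P for t = 1.85, nu = 10
p_two = 2 * p_one                    # two-tailed P, using symmetry
p_z_two = 2 * NormalDist().cdf(-1.96)  # nu = infinity, two-tailed

print(0.025 < p_one < 0.05)   # True, as read from the table
print(0.05 < p_two < 0.10)    # True
print(round(p_z_two, 3))      # 0.05
```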
22
§3. Estimate of population mean
Point estimate: the sample mean $\bar{X}$.
Interval estimate: a confidence interval, e.g. the 95% CI for μ: (L, U).
23
Point Estimator
A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or point.
[Diagram: population distribution with an unknown parameter, sampling distribution, point estimator]
24
Interval Estimator
An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
[Diagram: population distribution with an unknown parameter, sampling distribution, interval estimator]
25
Confidence Interval (CI) Estimation
(1-α) is called the confidence level; usually (1-α) = 95%, sometimes 90% or 99%.
Confidence limits (CL): the smaller one is the lower limit (L) and the larger one is the upper limit (U).
26
Interpreting the CI
Many students want to say that a 95% confidence interval means that there is a 95% chance that the confidence interval contains the population mean. But any particular confidence interval either contains the population mean or it doesn't, so a computed interval shouldn't be read as a probability statement about μ. The correct interpretation is based on repeated sampling: if samples of the same size are drawn repeatedly from a population, and a confidence interval is calculated from each sample, then about 95% of these intervals will contain the population mean.
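The repeated-sampling interpretation can be illustrated by simulation. In the sketch below the population is assumed normal with known σ, and the values μ = 4.77, σ = 0.38, n = 25 and 2000 repetitions are arbitrary choices for the demonstration.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, reps, level=0.95, seed=1):
    """Fraction of z-based confidence intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage(mu=4.77, sigma=0.38, n=25, reps=2000)
print(rate)  # expected to be close to 0.95
```

Each individual interval either covers μ or misses it; only the long-run proportion is close to the confidence level.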
27
The meaning of the confidence interval
29
Information and the Width of the Interval
A wide interval estimator provides little information. Where is μ?
30
Wide interval estimator provides little information.
Where is μ? Aha! Here is a much narrower interval. If the confidence level remains unchanged, the narrower interval provides more meaningful information.
31
The width of the confidence interval is affected by
1. the population standard deviation (σ)
2. the confidence level (1-α)
3. the sample size (n)
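The three factors can be seen together in the width formula for a z-based interval, $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below assumes σ is known; the baseline values (σ = 10, n = 25, 90% level) are illustrative only.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, level):
    """Width of the z-based confidence interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_width(sigma=10, n=25, level=0.90)
wider_sd = ci_width(sigma=15, n=25, level=0.90)      # sigma increased by 50%
wider_level = ci_width(sigma=10, n=25, level=0.95)   # confidence level 90% -> 95%
bigger_n = ci_width(sigma=10, n=100, level=0.90)     # sample size quadrupled

print(round(wider_sd / base, 2))   # 1.5: width grows in proportion to sigma
print(wider_level > base)          # True: higher confidence, wider interval
print(round(bigger_n / base, 2))   # 0.5: four times the sample, half the width
```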
32
The Effect of σ on the Interval Width
90% confidence level. Suppose the standard deviation has increased by 50%. To maintain the same level of confidence, a larger standard deviation requires a wider confidence interval.
33
The Effect of Changing the Confidence Level
Let us increase the confidence level from 90% to 95%. A larger confidence level produces a wider confidence interval.
34
The Effect of Changing the Sample Size
90% confidence level. Increasing the sample size decreases the width of the confidence interval while the confidence level remains unchanged.
35
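Confidence Interval for the Mean
Population assumption: none | normal
Population standard deviation: σ known | σ unknown
Sample size: n ≥ 30 | n < 30
Statistic used: z | t
If σ is known, or σ is unknown but n is large enough, use the z distribution. If σ is unknown and n is small, use the t distribution.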
36
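The Confidence Interval for μ (σ is known)
$P\left(-z_{\alpha/2} < \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1-\alpha$
This leads to the following equivalent statement:
$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1-\alpha$
The confidence interval: $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
95% CI for μ: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$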
37
Graphical Demonstration of the Confidence Interval for μ
[Figure: sampling distribution with a central area equal to the confidence level 1 - α, bounded by the lower and upper confidence limits]
38
Example of the confidence interval for μ (σ is known)
The confidence interval for a 95% (or 90%) confidence level, e.g. for 95%: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$; for 90%: $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$.
39
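The Confidence Interval for μ (σ is unknown and n is small)
$P\left(-t_{\alpha/2,\nu} < \dfrac{\bar{X}-\mu}{S/\sqrt{n}} < t_{\alpha/2,\nu}\right) = 1-\alpha$, with ν = n - 1
This leads to the following equivalent statement:
$P\left(\bar{X} - t_{\alpha/2,\nu}\,S/\sqrt{n} < \mu < \bar{X} + t_{\alpha/2,\nu}\,S/\sqrt{n}\right) = 1-\alpha$
The confidence interval: $\bar{X} \pm t_{\alpha/2,\nu}\,S/\sqrt{n}$
95% CI for μ: $\bar{X} \pm t_{0.05/2,\nu}\,S/\sqrt{n}$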
40
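Example 4.2: A doctor measured plasma fibrinogen in 25 patients with atherosclerosis and found a mean of 3.32 g/L and a standard deviation of 0.57 g/L. Calculate the 95% confidence interval for the population mean plasma fibrinogen level in patients of this kind.
Here ν = 25 - 1 = 24 and $t_{0.05/2,\,24} = 2.064$.
Lower limit: $\bar{X} - t_{0.05/2,\,24}\,S/\sqrt{n} = 3.32 - 2.064 \times 0.57/\sqrt{25} = 3.085$ (g/L)
Upper limit: $\bar{X} + t_{0.05/2,\,24}\,S/\sqrt{n} = 3.32 + 2.064 \times 0.57/\sqrt{25} = 3.555$ (g/L)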
41
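Example 4.3: Calculate the 95% confidence interval for the population mean red blood cell count of adult men in the area of Example 4.1.
This is a large sample, so the confidence interval can be calculated with the normal approximation. Because $z_{0.05/2} = 1.96$, the 95% confidence interval is:
Lower limit: $\bar{X} - 1.96\,S/\sqrt{n} = 4.77 - 1.96 \times 0.38/\sqrt{140} = 4.707$ (×10¹²/L)
Upper limit: $\bar{X} + 1.96\,S/\sqrt{n} = 4.77 + 1.96 \times 0.38/\sqrt{140} = 4.833$ (×10¹²/L)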
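Examples 4.2 and 4.3 follow the same pattern, mean ± critical value × standard error, and can be sketched with one small helper. The function name `mean_ci` is an assumption for this example, and the critical value 2.064 for $t_{0.05/2,\,24}$ is taken from a standard t table rather than computed.

```python
import math

def mean_ci(xbar, s, n, critical):
    """Confidence interval for the mean: xbar +/- critical * s / sqrt(n)."""
    half_width = critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 4.2: small sample, t distribution with nu = 24 (t table value 2.064)
lo42, hi42 = mean_ci(3.32, 0.57, 25, critical=2.064)
print(round(lo42, 3), round(hi42, 3))  # 3.085 3.555

# Example 4.3: large sample, normal approximation (z = 1.96)
lo43, hi43 = mean_ci(4.77, 0.38, 140, critical=1.96)
print(round(lo43, 3), round(hi43, 3))  # 4.707 4.833
```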
44
One-sided confidence interval
45
Difference between the confidence interval for the population mean and the reference range
46
Appendix: Sampling error and confidence interval of the variance. For chi-square critical values see p. 206, Appendix Table 7.
47
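§4. Principle and procedures of hypothesis test
I. Basic ideas: proof by contradiction and small probability
Proof by contradiction: an assumption is first made about a population parameter (e.g. H0: μ1 = μ2). If the sample information does not support this assumption, it is considered not to hold. The aim is to show that the two parameters differ, but the working assumption is that they do not. This is just how a court tries a case.
Small probability: a small probability is fixed in advance, e.g. α = 0.05 = 5%. If the probability of obtaining the observed sample result under H0 is ≤ α, then H0 is considered not to hold in this particular trial and is rejected. This does not mean that such a result can never occur in a single trial when H0 is true, only that the chance of making this error is no more than α. In statistics, α is the probability of a type I error.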
48
Principle of hypothesis test:
A null hypothesis (H0) is made about a parameter. Data are then collected and used to estimate that parameter, and the estimate is compared with the hypothesized value. When comparing the observed value with the hypothesized value, the following question is asked: "How likely is a value at least as extreme as the observed one if the hypothesis is true?" If the answer is "very unlikely" (i.e. the probability is less than 5%), you reject the null hypothesis in favor of the alternative hypothesis (H1).
49
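Example 4.4: Blackcurrant oil soft capsules were used to treat hyperlipidemia. In 30 patients with hyperlipidemia, the before-after difference in serum triglyceride was 1.38 ± 0.76 (g/L). Has serum triglyceride improved after treatment?
Sample: the change (difference) in triglyceride before and after treatment.
Summary of the problem: sample effect = drug effect + chance.
Question: how large must the effect be before we can conclude that the treatment is "effective"?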
50
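From the t distribution we can calculate the probability P of a difference this large. If P is very small, i.e. the calculated t value exceeds the given critical value, we tend to reject H0 and conclude that there is a difference before and after treatment.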
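For Example 4.4 the calculation can be sketched as a one-sample t test on the differences. The assumptions here are that H0 states a mean difference of 0, that the test is two-sided at α = 0.05, and that the critical value $t_{0.05/2,\,29} = 2.045$ is read from a standard t table; the function name is hypothetical.

```python
import math

def one_sample_t(mean_diff, sd_diff, n, mu0=0.0):
    """t statistic and degrees of freedom for H0: population mean difference = mu0."""
    se = sd_diff / math.sqrt(n)
    return (mean_diff - mu0) / se, n - 1

# Example 4.4: differences of 1.38 +/- 0.76 g/L in n = 30 patients
t_stat, df = one_sample_t(1.38, 0.76, 30)

T_CRIT = 2.045  # t(0.05/2, 29), taken from a standard t table (assumption)
reject_h0 = abs(t_stat) >= T_CRIT
print(round(t_stat, 2), df, reject_h0)  # 9.95 29 True
```

Because the calculated t is far beyond the critical value, P is much smaller than 0.05 and H0 is rejected under these assumptions.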
51
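Basic steps of hypothesis testing
1. State the hypotheses and set the significance level
The null hypothesis H0 is the hypothesis to be tested. The alternative hypothesis H1 is the hypothesis accepted when the evidence for H0 is insufficient. For example, H0: μ = μ0 versus H1: μ ≠ μ0.
2. Choose the test method and calculate the test statistic
Choose an appropriate test according to the type of data, the study design, and the purpose of the inference. Each test has its own test statistic and formula. Many tests are named after their test statistic, such as the t test, z test, F test, and χ² test.
3. Determine the P value and draw the statistical conclusion
Look up the critical value in the table and compare the calculated statistic with the critical value of the rejection region to determine the P value (or obtain the P value directly from software). For a two-sided t test, for example, if $|t| \ge t_{\alpha/2,\nu}$ then $P \le \alpha$, and H0 is rejected at significance level α.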
52
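Testing a hypothesis: when a judge decides whether a person has committed a crime, the person is first presumed "not guilty" (H0), and evidence is then sought through investigation. If the evidence is sufficient, the presumption of "not guilty" (H0) is rejected and the suspect is found guilty; otherwise the presumption of "not guilty" (H0) can only be retained for the time being.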
54
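One-sided and two-sided tests
H1: μ ≠ μ0, two-sided; both μ < μ0 and μ > μ0 are possible.
H1: μ > μ0, one-sided.
H1: μ < μ0, one-sided.
In this example, medical knowledge tells us that the heart rate of middle-school boys who exercise regularly will not be higher than that of middle-school boys in general, so a one-sided test is used: H0: μ = μ0, H1: μ < μ0. Whether to use a one-sided or a two-sided test is decided from subject-matter knowledge.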
55
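How do α and P differ in hypothesis testing?
α is a threshold fixed in advance: the largest P value that is regarded as statistically significant.
The P value is the probability, assuming H0 is true, of obtaining a test statistic (z, t, F, etc.) equal to or more extreme than the one calculated from the current sample. The P value is itself a random variable.
[Figure: density f(z) with α = 0.05 split between the two tails beyond z = -1.96 and z = 1.96; P is the tail area beyond the observed z]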
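The distinction can be made concrete for a two-sided z test: α is fixed before the data are seen, while P is computed from the observed statistic. The sketch below uses made-up z values purely for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level, fixed in advance

def two_sided_p(z):
    """Two-sided P value for an observed z statistic under H0."""
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical observed statistics; P changes with the data, ALPHA does not.
for z in (1.50, 1.96, 2.58):
    p = two_sided_p(z)
    decision = "reject H0" if p <= ALPHA else "do not reject H0"
    print(z, round(p, 4), decision)
```

A different sample would give a different z and therefore a different P, which is why P behaves as a random variable while α stays constant.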