Estimating Chapter 10& Chapter 12 No:1 5/5/2009.

1 Estimating Chapter 10& Chapter 12 No:1 5/5/2009

2 Review
1. Random Variables
2. Sampling Distributions for Sample Proportions
3. Sampling Distributions for Sample Means
4. What to Expect in Other Situations: CLT
5. Sampling Distribution for Any Statistic

3 Learning objectives
We will learn how to use the realized value of a sample statistic to estimate the unknown value of the corresponding population parameter. This is called estimation.
1. Point estimation: a single estimate of a population parameter, computed from a sample statistic.
[1] Method of Moments (MM)
[2] Maximum Likelihood Estimate (MLE)

4 2. Interval estimation: an interval estimate conveys how precise our estimate of the population mean or proportion is. A confidence interval is an interval of estimates that is likely to capture the population value. The primary objective of this chapter is to describe how to calculate and interpret a confidence interval.

5 Learning Contents, Emphases and Difficulties
Confidence interval estimate
[1] for one proportion
[2] for one mean (or mean difference of pairs)
[3] for difference between two means (un-pooled)
[4] for difference between two means (pooled)
[5] for difference between two proportions (independent samples)

6 Teaching methods Both English and Chinese
Both PPT and writing on blackboard No:6 5/5/2009

7 estimating proportions with confidence intervals
Section 1: estimating proportions with confidence intervals No:7 5/5/2009

8 10.1 The Language and Notation of Estimation P330
1.Unit: an individual person or object to be measured. 2.Population (or universe): the entire collection of units about which we would like information or the entire collection of measurements we would have if we could measure the whole population. 3.Sample: the collection of units we will actually measure or the collection of measurements we will actually obtain. 4.Sample size: the number of units or measurements in the sample, denoted by n. No:8 5/5/2009

9 More Language and Notation of Estimation
5. Population proportion: the fraction of the population that has a certain trait/characteristic, or the probability of success in a binomial experiment, denoted by p. The value of the parameter p is not known.
6. Sample proportion: the fraction of the sample that has a certain trait/characteristic, denoted by $\hat{p}$. The statistic $\hat{p}$ is an estimate of p.
7. A randomly selected sample:
8. The Fundamental Rule for Using Data for Inference: available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question(s) of interest.

10 9. Margin of Error
Margin of error: the measure of accuracy reported in sample surveys. It is a number that provides a likely upper limit for the difference between the sample proportion and the unknown population proportion. The margin of error provided in media descriptions of survey results has the following characteristics.

11 Characteristics p331
The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates.
The difference between the sample proportion and the population proportion is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates.
In other words, for most sample estimates, the actual error is quite likely to be smaller than the margin of error.

12 Example 10.1 Teens and Interracial Dating
1997 USA Today/Gallup Poll of teenagers across the country: 57% of the 497 teens who go out on dates say they’ve been out with someone of another race or ethnic group. How can we use sample data to provide an interval of values that the researcher is confident covers the true value for the population? The reported margin of error for this estimate was about 4.5%.

13 In surveys of this size, the difference between the sample estimate of 57% and the true percent is likely* to be less than 4.5% one way or the other. There is, however, a small chance that the sample estimate might be off by more than 4.5%. * The value of how ‘likely’ is often 95%. No:13 5/5/2009

14 10.3 Confidence Intervals P332
Confidence interval: an interval of values computed from sample data that is likely to include the true population value. The phrase confidence level describes the chance that an interval actually contains the true population value, in the following sense: most of the time (quantified by the confidence level), intervals computed in this way will capture the truth about the population, but occasionally they will not. In any given instance, the interval either captures the truth or it does not, but we will never know which is the case.

15 Therefore , our confidence is in the procedure—it works most of the time– and the “confidence level” or “level of confidence” is the percentage of the time we expect it to work. No:15 5/5/2009

16 In summary Interpreting the Confidence Level
The confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter. If we consider all possible randomly selected samples of same size from a population, the confidence level is the fraction or percent of those samples for which the confidence interval includes the population parameter. Note: Often express the confidence level as a percent. Common levels are 90%, 95%, 98%, and 99%. No:16 5/5/2009

17 Note Be careful when giving information about a specific confidence interval computed from an observed sample. The confidence level only expresses how often the confidence interval procedure works in the long run p333 No:17 5/5/2009

18 Interpretation of Confidence Intervals
For repeated samples of size n taken from the same population, (1 − α)100% of the confidence intervals computed in this way will contain the population parameter. OR Based on this sample, we are (1 − α)100% confident that the population parameter falls within the stated confidence interval. Be careful: the confidence level only expresses how often the procedure works in the long run. Any one specific interval either does or does not include the true unknown population value.

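19 Confidence interval
1. An interval estimate of a population parameter constructed from a sample statistic is called a confidence interval. The true value of the population parameter is fixed but unknown, whereas the interval constructed from a sample is not fixed. Different samples yield different intervals under the same method; in this sense a confidence interval is a random interval that varies from sample to sample.
2. Statisticians are confident, to a stated degree, that the interval contains the true population parameter, hence the name confidence interval.
Example: suppose the 95% confidence interval for an exam score is (60, 80). We cannot say that the interval (60, 80) contains the true value of the class's exam score with probability 95%. We only know that, over many repeated samples, 95% of the samples produce intervals that contain the true value.

20 A specific interval either contains the true value or it does not
The real meaning is that if we drew 100 samples, about 95 of the resulting intervals would contain the true value and about 5 would not. The probability therefore does not describe how likely one particular interval is to contain the true value of the population parameter: a particular interval either always contains the parameter or never contains it, and there is no question of "possibly containing" or "possibly not containing" it.
An interval constructed from one specific sample is therefore one specific interval, and we cannot know whether that interval contains the true value of the population parameter.
We can only hope that it is one of the many intervals that contain the true value, but it may also be one of the few that do not.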
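The long-run reading of the confidence level can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true proportion (0.6), the sample size (200), the number of repetitions and the function names are all hypothetical, and the interval used is the approximate 95% interval "sample proportion ± 2 standard errors".

```python
import math
import random


def approx_95_ci(successes, n):
    """Approximate 95% interval: sample proportion +/- 2 standard errors."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, repetitions, seed=0):
    """Fraction of simulated samples whose interval captures true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n from a population with proportion true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = approx_95_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repetitions


# Hypothetical population and sample size, chosen only for illustration.
rate = coverage(true_p=0.6, n=200, repetitions=2000)
print(f"fraction of intervals that captured the true p: {rate:.3f}")
```

Each individual interval either contains 0.6 or does not; only the fraction over many repetitions is close to 95%.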

21 There are three types: 1.conduct the confidence interval
2.determine the sample size 3.using confidence interval to guide decisions No:21 5/5/2009

22 Sample estimate ± Margin of error
10.4 Constructing a 95% Confidence Interval for a Population Proportion
Sample estimate ± Margin of error
In the long run, about 95% of all confidence intervals computed in this way will capture the population value of the proportion, and about 5% of them will miss it.

23 Confidence Interval on p
$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

24 p335 For a 95% confidence level, the approximate margin of error for a sample proportion is
$$2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Note: The “95% margin of error” is simply two standard errors, or 2 s.e.($\hat{p}$).

25 Factors that Determine Margin of Error p335
1. The sample size, n. When sample size increases, margin of error decreases.
2. The sample proportion, $\hat{p}$. If the proportion is close to either 1 or 0, most individuals have the same trait or opinion, so there is little natural variability and the margin of error is smaller than if the proportion is near 0.5.
3. The “multiplier” 2. Connected to the “95%” aspect of the margin of error. Later you’ll learn that the exact value for 95% is 1.96, and how to change the multiplier to change the confidence level.

26 Example 10.3 Pollen Count Must Be High p336
Poll: Random sample of 883 American adults. “Are you allergic to anything?”
Results: 36% of the sample said “yes”, $\hat{p}$ = .36
95% Confidence Interval: .36 ± .032, or about .33 to .39
We can be 95% confident that somewhere between 33% and 39% of all adult Americans have allergies.
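The arithmetic of Example 10.3 can be checked with a few lines of Python. This is a minimal sketch of the approximate 95% interval (sample proportion ± 2 standard errors); the function name is illustrative, not something from the slides.

```python
import math


def approx_95_ci_proportion(p_hat, n):
    """Return (low, high, margin) for the approximate 95% interval."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 2 * standard_error  # the "95% margin of error"
    return p_hat - margin, p_hat + margin, margin


# Example 10.3: 36% of 883 adults said "yes".
low, high, margin = approx_95_ci_proportion(0.36, 883)
print(f"margin of error = {margin:.3f}")
print(f"interval = {low:.2f} to {high:.2f}")
```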

27 The Conservative Estimate of Margin of Error
Conservative estimate of the margin of error = $\frac{1}{\sqrt{n}}$
It usually overestimates the actual size of the margin of error. It works (conservatively) for all survey questions based on the same sample size, even if the sample proportions differ from one question to the next. It is obtained when $\hat{p}$ = .5 in the margin of error formula.

28 Example 10.3 Really Bad Allergies (cont) p337
Poll: Random sample of 883 American adults. 3% of the sample experience “severe” symptoms.
95% (conservative) Confidence Interval: 3% ± 3.4%, or -0.4% to 6.4%
When $\hat{p}$ is far from .5, the conservative margin of error is too conservative. The 95% margin of error using $\hat{p}$ = .03 is just .011 or 1.1%, for an interval from 1.9% to 4.1%.
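The gap between the conservative and the approximate margin of error can be seen numerically. The following sketch uses the two formulas from the slides (1/√n and two standard errors); the function names are illustrative.

```python
import math


def conservative_margin(n):
    """Conservative 95% margin of error: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def approximate_margin(p_hat, n):
    """Approximate 95% margin of error: two standard errors."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)


n = 883
print(f"conservative:          {conservative_margin(n):.3f}")
print(f"approximate, p = .03:  {approximate_margin(0.03, n):.3f}")
# With p_hat = .5 the two formulas coincide, which is why 1/sqrt(n)
# is the largest margin the approximate formula can produce.
print(f"approximate, p = .50:  {approximate_margin(0.5, n):.3f}")
```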

29 10.5 General Format for Confidence Intervals p337 339
For any confidence level, a confidence interval for either a population proportion or a population mean can be expressed as
Sample estimate ± Multiplier × Standard error
The multiplier is affected by the choice of confidence level.

30 More about the Multiplier p340
Note: Increase confidence level => larger multiplier. Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level. No:30 5/5/2009

31 Formula for a Confidence Interval for a Population Proportion p
$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$\hat{p}$ is the sample proportion. z* denotes the multiplier. $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$.

32 Example 10.6 Intelligent Life Elsewhere? 1.conduct the interval p340
Poll: Random sample of 935 Americans. “Do you think there is intelligent life on other planets?”
Results: 60% of the sample said “yes”, $\hat{p}$ = .60
90% Confidence Interval: .60 ± 1.65(.016), or .60 ± .026
98% Confidence Interval: .60 ± 2.33(.016), or .60 ± .037
Note: entire interval is above 50% => high confidence that a majority believe there is intelligent life.

33 Example 10.6 Intelligent Life Elsewhere?
Poll: Random sample of 935 Americans. “Do you think there is intelligent life on other planets?”
Results: 60% of the sample said “yes”, $\hat{p}$ = .60
We want a 50% confidence interval. If the area between -z* and z* is .50, then the area to the left of z* is .75. From Table A.1 we have z* ≈ .67.
50% Confidence Interval: .60 ± .67(.016), or .60 ± .011
Note: Lower confidence level results in a narrower interval.
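The table lookup for z* can be reproduced with Python's standard library. This sketch assumes only the definition given in the slides (the area between -z* and z* equals the confidence level); `z_multiplier` is an illustrative name.

```python
from statistics import NormalDist


def z_multiplier(confidence_level):
    """z* such that the area between -z* and z* equals confidence_level."""
    # The area to the left of z* is the confidence level plus one tail.
    area_left = confidence_level + (1 - confidence_level) / 2
    return NormalDist().inv_cdf(area_left)


for level in (0.50, 0.90, 0.95, 0.98):
    print(f"{level:.0%} confidence -> z* = {z_multiplier(level):.3f}")
```

The 90% value comes out as 1.645, which the slides round to 1.65.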

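34 Interval estimation of a population proportion (worked example)
Example: A city wants to estimate the proportion of women among its laid-off workers. A random sample of 100 laid-off workers is drawn, of whom 65 are women. Estimate the confidence interval for the proportion of women among the city's laid-off workers at the 95% confidence level.
Solution: Given n = 100, $\hat{p}$ = 65%, 1 − α = 95%, $z_{\alpha/2}$ = 1.96:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 65\% \pm 1.96\sqrt{\frac{0.65 \times 0.35}{100}} = 65\% \pm 9.35\%$$
The confidence interval for the proportion of women among the city's laid-off workers is 55.65% to 74.35%.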

35 Conditions for Using the Formula p341
1. Sample is randomly selected from the population. Note: Available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question(s) of interest.
2. Normal curve approximation to the distribution of possible sample proportions assumes a “large” sample size. Both $n\hat{p}$ and $n(1-\hat{p})$ should be at least 10 (although some say these need only to be at least 5).

36 10.6 Choosing a Sample Size (2.determine the sample size) p341
Table provides 95% conservative margin of error for various sample sizes n Important features: 1. When sample size is increased, margin of error decreases. 2. When a large sample size is made even larger, the improvement in accuracy is relatively small. No:36 5/5/2009

37 The Effect of Population Size
For most surveys, the number of people in the population has almost no influence* on the accuracy of sample estimates. Margin of error for a sample size of 1000 is about 3% whether the number of people in the population is 30,000 or 200 million. * As long as the population is at least ten times as large as the sample. No:37 5/5/2009

38 Sample Size Determination for p from an Infinite Population
Proportion: Note e, the bound within which you want to estimate p, is given. The interval half-width is e, also called the maximum likely error:
$$e = z\sqrt{\frac{p(1-p)}{n}}$$
Solving for n, we find:
$$n = \frac{z^{2}\,p(1-p)}{e^{2}}$$

39 Sample Size Determination for p from a Finite Population
Proportion: Note e, the bound within which you want to estimate p, is given.
$$n = \frac{p(1-p)}{\dfrac{e^{2}}{z^{2}} + \dfrac{p(1-p)}{N}}$$
where n = required sample size, N = population size, z = z-score for (1 − α)100% confidence, p = sample estimator of the population proportion.

40 Example 2 (2.determine the sample size)
A student guild wishes to estimate the proportion of students who would support the “voluntary guild fee” proposal being debated. What sample size is necessary to estimate the true level of support to within 5% at the 90% confidence level?

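41 Determining the sample size when estimating a population proportion (worked example)
Example: According to past production statistics, the pass rate of a certain product is about 90%. If the allowable error is 5% and a 95% confidence interval is required, how many products should be sampled?
Solution: Given p = 90%, α = 0.05, $z_{\alpha/2}$ = 1.96, E = 5%, the required sample size is
$$n = \frac{z_{\alpha/2}^{2}\,p(1-p)}{E^{2}} = \frac{1.96^{2} \times 0.9 \times (1-0.9)}{0.05^{2}} = 138.3 \approx 139$$
A sample of 139 products should be drawn.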
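The sample size formula for an infinite population is easy to wrap in a helper. This is a sketch, not the slides' own solution: the function name is illustrative, the result is rounded up to the next whole unit, and using p = 0.5 when no prior estimate of p is available is an assumption (it is the conservative choice, because p(1 − p) is largest at 0.5).

```python
import math


def sample_size_for_proportion(z, e, p=0.5):
    """Smallest n with margin of error at most e: n = z^2 p (1 - p) / e^2."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# Product pass-rate example: p about 90%, E = 5%, 95% confidence.
n_products = sample_size_for_proportion(z=1.96, e=0.05, p=0.9)
print(n_products)

# Student guild question: within 5% at 90% confidence.
# No prior p is given there, so p = 0.5 is ASSUMED here.
n_students = sample_size_for_proportion(z=1.645, e=0.05)
print(n_students)
```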

42 10.7 Using Confidence Intervals to Guide Decisions p344
3. Using confidence intervals to guide decisions
Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion.
Principle 2. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude that the two population proportions are different.

43 Example 10.7 Which Drink Tastes Better?
Taste Test: A sample of 60 people taste both drinks and 55% like the taste of Drink A better than Drink B. Makers of Drink A want to advertise these results. Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A.
95% Confidence Interval: $.55 \pm 2\sqrt{\frac{.55(1-.55)}{60}}$ = .55 ± .13, or about .42 to .68
Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample.

44 Case Study 10.1 ESP Works with Movies p345
ESP Study by Bem and Honorton (1994) Subjects (receivers) described what another person (sender) was seeing on a screen. Receivers shown 4 pictures, asked to pick which they thought sender had actually seen. Actual image shown randomly picked from 4 choices. Image was either a single, “static” image or a “dynamic” short video clip, played repeatedly (additional three choices shown were always of the same type as actual. No:44 5/5/2009

45 Case Study 10.1 ESP Works (cont)
Bem and Honorton (1994) ESP Study Results Is there enough evidence to say that the % of correct guesses for dynamic pictures is significantly above 25%? 95% CI: Can claim the true % of correct guesses is significantly better than what would occur from random guessing. No:45 5/5/2009

46 Case Study 10.2 Nicotine Patches vs Zyban p346
Study (New England Journal of Medicine, 3/4/99): 893 participants randomly allocated to four treatment groups: placebo, nicotine patch only, Zyban only, and Zyban plus nicotine patch. Participants blinded: all used a patch (nicotine or placebo) and all took a pill (Zyban or placebo). Treatments used for nine weeks.

47 Case Study 10.2 Nicotine (cont)
Conclusions: Zyban is effective (no overlap of Zyban and no Zyban CIs) Nicotine patch is not particularly effective (overlap of patch and no patch CIs) No:47 5/5/2009

48 Case Study 10.3 What a Great Personality p347
Would you date someone with a great personality even though you did not find them attractive?
Women: 61.1% of 131 answered “yes.” 95% confidence interval is 52.7% to 69.4%.
Men: 42.6% of 61 answered “yes.” 95% confidence interval is 30.2% to 55%.
Conclusions:
Higher proportion of women would say yes.
CIs slightly overlap.
Women CI narrower than men CI due to larger sample size.

49 In Summary: Confidence Interval for a Population Proportion p
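General CI for p: $\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Approximate 95% CI for p: $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Conservative 95% CI for p: $\hat{p} \pm \frac{1}{\sqrt{n}}$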

50 In summary 1.conduct the interval 2.determine the sample size
3.using confidence interval to guide decisions No:50 5/5/2009

51 Confidence intervals for the sample mean p405
Chapter 12 Section 4 No:51 5/5/2009

52 Teaching methods Both English and Chinese
Both PPT and writing on blackboard No:52 5/5/2009

53 Review Confidence interval, an interval of estimates that is likely to capture the population value. Confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter Confidence interval estimate for one proportion( conduct the confidence interval, determine the sample size, using confidence interval to guide decisions) No:53 5/5/2009

54 12.4 Confidence intervals for the sample mean p405
No:54 5/5/2009

55 Learning objectives In this section, we describe
how to determine a confidence interval for the population mean using a sample of any size, large or small, and with any confidence level No:55 5/5/2009

56 Learning Contents Confidence interval estimate [12.4.1] for a mean
[12.4.2]for the mean difference in paired variables [12.4.3] for difference between two means (un-pooled) [12.4.4] for difference between two means (pooled) No:56 5/5/2009

57 Emphases and Difficulties
Conditions required for using the t confidence interval or z confidence interval confidence interval for difference between two means (paired samples, independent samples, pooled samples, un-pooled samples) No:57 5/5/2009

58 Confidence interval estimate for a mean
Section : Confidence interval estimate for a mean No:58 5/5/2009

59 We can use the general format of a confidence interval :
Sample estimate ± Multiplier × Standard error
The multiplier is affected by the choice of confidence level.

60 A Confidence Interval for a Population Mean
$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}}$$
where the multiplier t* is the value in a t-distribution with degrees of freedom = df = n – 1, or
$$\bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}}$$
where the multiplier z* is the value in a normal distribution

61 Choosing the multiplier
The multiplier is chosen such that the area between -t* and t* (or between -z* and z*) equals the desired confidence level.

62 Conditions for t confidence interval p406-407
1. Population of measurements is bell-shaped and a random sample of any size is measured. In practice, for small samples, the data should show no extreme skewness and should not contain any outliers.
2. Population of measurements is not bell-shaped, but a large random sample is measured, n ≥ 30.

63 Conditions
[Flowchart: choosing between the z and t multipliers according to the sample size n (large or small), whether σ is known (Y/N), and whether the population is bell-shaped.]

64 Exercises Decide the nature of distribution for the following
(1) sample of 10 with a mean of 12 and a standard deviation of 5 taken from a skewed population
(2) sample of 39 with a mean of 60 and a standard deviation of 10 taken from a skewed population
(3) sample of 15 taken from a normal population with a mean of 5 and a standard deviation of 1
(4) sample of 10 with a mean of 16 and a standard deviation of 2 taken from a normal population

65 Example 12.5 Mean Forearm Length p405
Data: Forearm lengths (cm) for a random sample of n = 9 men , 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0
Step 1: checking the conditions. The sample is small (n = 9 < 30), but the dotplot shows no obvious skewness and no outliers, so we can assume the population of measurements is bell-shaped.

66 Step2: calculating the confidence interval
Multiplier t* from Table A.2 with df = 8 is t* = 2.31
95% Confidence Interval: 25.5 ± 2.31(.507) => 25.5 ± 1.17 => 24.33 to 26.67 cm

67 Step 3: interpreting the confidence intervals
Based on this sample, we are 95% confident that the population mean forearm length is somewhere between 24.33 and 26.67 cm.
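The three steps of Example 12.5 reduce to "sample estimate ± multiplier × standard error". The sketch below plugs in the summary values quoted on the slides (mean 25.5, standard error .507, t* = 2.31 for df = 8); the function name is illustrative, and t* is taken from the table rather than computed.

```python
def t_interval(sample_mean, standard_error, t_star):
    """Sample estimate +/- multiplier * standard error."""
    margin = t_star * standard_error
    return sample_mean - margin, sample_mean + margin


# Example 12.5: n = 9 forearm lengths, df = 8, t* = 2.31 from the t table.
low, high = t_interval(sample_mean=25.5, standard_error=0.507, t_star=2.31)
print(f"95% CI for the mean forearm length: {low:.2f} to {high:.2f} cm")
```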

68 Example 12.6 What Students Sleep More? p408
Q: How many hours of sleep did you get last night, to the nearest half hour? Class N Mean StDev SE Mean Stat 10 (stat literacy) Stat 13 (stat methods) Step1: checking the conditions : Bell-shape was reasonable for Stat 10 (with smaller n). No:68 5/5/2009

69 Step2: calculating the confidence interval
No:69 5/5/2009

70 Step 3: interpreting the confidence intervals
Interval for Stat 10 is wider (smaller sample size) Two intervals do not overlap => Stat 10 average significantly higher than Stat 13 average. No:70 5/5/2009

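71 Interval estimation of a population mean (worked example)
Example: A food manufacturer mainly produces bagged food. To monitor product quality, its quality inspection department regularly draws samples to check whether the weight of each bag meets requirements. From one day's production, 25 bags were drawn at random and weighed, with the results shown below. The product weight is known to be normally distributed with a population standard deviation of 10 g. Estimate the confidence interval for the mean weight of this batch at the 95% confidence level.
Weights of the 25 bags: 112.5 101.0 103.0 102.0 100.5 102.6 107.5 95.0 108.8 115.6 100.0 123.5 101.6 102.2 116.6 95.4 97.8 108.6 105.0 136.8 102.8 101.5 98.4 93.3

72 Interval estimation of a population mean (worked example)
Solution: Given X ~ N(μ, 10²), n = 25, 1 − α = 95%, $z_{\alpha/2}$ = 1.96. From the sample data, $\bar{x}$ = 105.36.
The confidence interval for the population mean μ at confidence level 1 − α is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 105.36 \pm 1.96 \times \frac{10}{\sqrt{25}} = 105.36 \pm 3.92$$
The confidence interval for the mean weight of the food is 101.44 g to 109.28 g.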

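73 Interval estimation of a population mean (worked example)
Example: An insurance company collected a random sample of 36 policyholders and recorded each policyholder's age (in full years), as shown below. Construct a 90% confidence interval for the age of policyholders.
Ages of the 36 policyholders: 23 35 39 27 36 44 42 46 43 31 33 53 45 54 47 24 34 28 40 49 38 48 50 32

74 Interval estimation of a population mean (worked example)
Solution: Given n = 36, 1 − α = 90%, $z_{\alpha/2}$ = 1.645. From the sample data, $\bar{x}$ = 39.5.
The confidence interval for the population mean μ at confidence level 1 − α is
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 39.5 \pm 2.13$$
The confidence interval for the mean age of policyholders is 37.37 to 41.63 years.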

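75 Interval estimation of a population mean (worked example)
Example: The lifetime of a certain type of light bulb is known to be normally distributed. From one batch, 16 bulbs were drawn at random and their lifetimes (hours) measured, as shown below. Construct a 95% confidence interval for the mean lifetime of bulbs in this batch.
Lifetimes of the 16 bulbs: 1510 1520 1480 1500 1450 1490 1530 1460 1470

76 Interval estimation of a population mean (worked example)
Solution: Given X ~ N(μ, σ²), n = 16, 1 − α = 95%, $t_{\alpha/2}$ = 2.131. From the sample data, $\bar{x}$ = 1490.
The confidence interval for the population mean μ at confidence level 1 − α is
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 1490 \pm 13.2$$
The confidence interval for the mean lifetime of this type of bulb is 1476.8 to 1503.2 hours.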

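77 Note: Converting Confidence Intervals to Accommodate a Finite Population
Mean: $\bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ or $\bar{x} \pm t^{*}\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$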

78 Confidence interval for the mean difference in paired variables
Section (p409): Confidence interval for the mean difference in paired variables No:78 5/5/2009

79 Paired Data: A Special Case of One Mean
Paired data (or paired samples): when pairs of variables are collected. Only interested in population (and sample) of differences, and not in the original data. Each person measured twice. Two measurements of same characteristic or trait are made under different conditions. Similar individuals are paired prior to an experiment. Each member of a pair receives a different treatment. Same response variable is measured for all individuals. Two different variables are measured for each individual. Interested in amount of difference between two variables. No:79 5/5/2009

80 Paired Data Confidence Interval
Data: two variables for n individuals or pairs; use the difference d = x1 – x2.
Population parameter: $\mu_d$ = mean of differences for the population = $\mu_1 - \mu_2$.
Sample estimate: $\bar{d}$ = sample mean of the differences.
Standard deviation and standard error: $s_d$ = standard deviation of the sample of differences; s.e.($\bar{d}$) = $s_d/\sqrt{n}$.
Confidence interval for $\mu_d$: $\bar{d} \pm t^{*} \times$ s.e.($\bar{d}$), where df = n – 1 for the multiplier t*.

81 Example 12.7 Screen Time: Computer vs TV p409
Data: Hours spent watching TV and hours spent on computer per week for n = 25 students. Task: Make a 90% CI for the mean difference in hours spent using computer versus watching TV. Note: Boxplot shows no obvious skewness and no outliers. No:81 5/5/2009

82 Example 12.7 Screen Time: Computer vs TV
Results: Multiplier t* from Table A.2 with df = 24 is t* = 1.71
90% Confidence Interval: 5.36 ± 1.71(3.05) => 5.36 ± 5.22 => 0.14 to 10.58 hours
Interpretation: We are 90% confident that the average difference between computer usage and television viewing for students represented by this sample is covered by the interval from 0.14 to 10.58 hours per week, with more hours spent on computer usage than on television viewing.

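83 Estimating the difference between two population means (matched large samples)
Assumptions: two matched large samples (n1 ≥ 30 and n2 ≥ 30); the paired differences of the observations from the two populations are normally distributed.
The confidence interval for the difference between the two population means, $\mu_d = \mu_1 - \mu_2$, at confidence level 1 − α is
$$\bar{d} \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n}}$$
where $\bar{d}$ is the mean of the paired differences and $\sigma_d$ is the standard deviation of the paired differences.

84 Estimating the difference between two population means (matched small samples)
Assumptions: two matched small samples (n1 < 30 and n2 < 30); the paired differences of the observations from the two populations are normally distributed.
The confidence interval for the difference between the two population means, $\mu_d = \mu_1 - \mu_2$, at confidence level 1 − α is
$$\bar{d} \pm t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}}$$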

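85 Estimating the difference between two population means (worked example)
Example: A random sample of 10 students each took two test papers, A and B, with the results shown in the table. Construct a 95% confidence interval for the difference between the scores on the two papers, $\mu_d = \mu_1 - \mu_2$.
Scores of the 10 students on the two papers:
Student   Paper A   Paper B   Difference d
1         78        71        7
2         63        44        19
3         72        61        11
4         89        84        5
5         91        74        17
6         49        51        -2
7         68        55        13
8         76        60        16
9         85        77        8
10        55        39        16

86 Estimating the difference between two population means (worked example)
Solution: From the sample data, $\bar{d}$ = 11 and $s_d$ = 6.53, so
$$\bar{d} \pm t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}} = 11 \pm 2.262 \times \frac{6.53}{\sqrt{10}} = 11 \pm 4.67$$
The confidence interval for the difference between the scores on the two papers is 6.33 to 15.67 points.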
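The paired-sample calculation can be sketched as follows. Two things are assumptions here: the scores are read from the flattened table in the transcript (the split into rows is a reconstruction, checked only against the stated result of 6.33 to 15.67), and t* = 2.262 is the t-table value for df = 9 at 95% confidence.

```python
import math
import statistics

# Scores read from the flattened table; the row split is an assumption.
paper_a = [78, 63, 72, 89, 91, 49, 68, 76, 85, 55]
paper_b = [71, 44, 61, 84, 74, 51, 55, 60, 77, 39]

# A paired analysis works only with the per-student differences.
differences = [a - b for a, b in zip(paper_a, paper_b)]
n = len(differences)

d_bar = statistics.mean(differences)   # sample mean of the differences
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1)
standard_error = s_d / math.sqrt(n)

t_star = 2.262                         # t table, df = n - 1 = 9, 95% level
margin = t_star * standard_error
low, high = d_bar - margin, d_bar + margin

print(f"mean difference = {d_bar}, s_d = {s_d:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```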

87 Section 3(p411): Confidence interval for the difference between two means (un-pooled data) Independent Samples No:87 5/5/2009

88 12.5 General CI for Difference Between Two Means (Indep)
A CI for the Difference Between Two Means (Independent Samples):
$$\bar{x}_1 - \bar{x}_2 \pm t^{*}\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}$$
where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level. The df used depends on whether equal population variances are assumed.

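89 Degrees of Freedom p411
The t-distribution is only approximately correct and the df formula is complicated (Welch’s approximation):
$$df \approx \frac{\left(\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}\right)^{2}}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^{2}}{n_1}\right)^{2} + \dfrac{1}{n_2-1}\left(\dfrac{s_2^{2}}{n_2}\right)^{2}}$$
Statistical software can use the above approximation, but if done by hand, use a conservative df = smaller of n1 – 1 and n2 – 1.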

90 Necessary Conditions p412
Two samples must be independent.
Either: populations of measurements are both bell-shaped, and random samples of any size are measured.
Or: large (n ≥ 30) random samples are measured.

91 Example 12.8 Effect of a Stare on Driving p412
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign; Timed how long (sec) it took driver to proceed from sign to a mark on other side of the intersection. No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7 Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8 Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples. No:91 5/5/2009

92 Example 12.8 Effect of a Stare on Driving
Checking Conditions: Boxplots show … No outliers and no strong skewness. Crossing times in stare group generally faster and less variable. No:92 5/5/2009

93 Example 12.8 Effect of a Stare on Driving
Note: The df = 21 was reported by the computer package based on the Welch’s approximation formula. The 95% confidence interval for the difference between the population means is 0.14 seconds to 1.93 seconds . No:93 5/5/2009
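Example 12.8 can be reproduced from the raw crossing times given on the slides. This is a sketch of the unpooled interval with Welch's approximate df; the function name is illustrative, and t* = 2.08 is the t-table value for df = 21 at 95% confidence (it is looked up, not computed).

```python
import math
import statistics

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]


def unpooled_se_and_df(sample_1, sample_2):
    """Unpooled standard error and Welch's approximate degrees of freedom."""
    v1 = statistics.variance(sample_1) / len(sample_1)  # s1^2 / n1
    v2 = statistics.variance(sample_2) / len(sample_2)  # s2^2 / n2
    standard_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (
        v1 ** 2 / (len(sample_1) - 1) + v2 ** 2 / (len(sample_2) - 1)
    )
    return standard_error, df


standard_error, df = unpooled_se_and_df(no_stare, stare)
difference = statistics.mean(no_stare) - statistics.mean(stare)

t_star = 2.08  # t table, df = 21 (Welch df rounded down), 95% level
low = difference - t_star * standard_error
high = difference + t_star * standard_error

print(f"Welch df = {df:.1f}")
print(f"95% CI for the difference: {low:.2f} to {high:.2f} seconds")
```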

94 Section 4(p414): Confidence interval for the difference between two means ( pooled data) Independent Samples No:94 5/5/2009

95 Equal Variance Assumption p414
Often reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^{2} = \sigma_2^{2} = \sigma^{2}$
Estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation:
$$s_p = \sqrt{\frac{(n_1-1)s_1^{2} + (n_2-1)s_2^{2}}{n_1+n_2-2}}$$

96 Pooled Standard Error
$$\text{Pooled s.e.}(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).

97 Pooled Confidence Interval
Pooled CI for the Difference Between Two Means (Independent Samples):
$$\bar{x}_1 - \bar{x}_2 \pm t^{*}\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where t* is found using a t-distribution with df = (n1 + n2 – 2) and $s_p$ is the pooled standard deviation.

98 Example 12.9 Male and Female Sleep Times p415
Q: How much difference is there between how long female and male students slept the previous night? Data: The 83 female and 65 male responses from students in an intro stat class. Task: Make a 95% CI for the difference between the two population means sleep hours for females versus males. Note: We will assume equal population variances. No:98 5/5/2009

99 Example 12.9 Male and Female Sleep Times
Two-sample T for sleep [with “Assume Equal Variance” option]
Sex N Mean StDev SE Mean
Female
Male
Difference = mu (Female) – mu (Male)
Estimate for difference:
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = P = DF = 146
Both use Pooled StDev = 1.72
Notes:
Two sample standard deviations are very similar.
Sample mean for females higher than for males.
95% confidence interval contains 0, so we cannot rule out that the population means may be equal.

100 Example 12.9 Male and Female Sleep Times
Pooled standard deviation and pooled standard error “by-hand”: No: /5/2009

101 Pooled or Unpooled?
If sample sizes are equal, the pooled and unpooled standard errors are equal.
If sample standard deviations are similar, the assumption of equal population variances is reasonable and the pooled procedure can be used.
If sample sizes are very different, the pooled test can be quite misleading unless the sample standard deviations are similar.
If the smaller standard deviation accompanies the larger sample size, we do not recommend using the pooled procedure.
If sample sizes are very different, the standard deviations are similar, and the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative.

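102 Estimating the difference between two population means (large samples)
1. Assumptions
Both populations are normally distributed, with $\sigma_1^{2}$ and $\sigma_2^{2}$ known.
If the populations are not normal, the normal distribution can be used as an approximation (n1 ≥ 30 and n2 ≥ 30).
The two samples are independent random samples.
2. Use the normal-distribution statistic z:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}} \sim N(0,1)$$

103 Estimating the difference between two population means (large samples)
1. When $\sigma_1^{2}$ and $\sigma_2^{2}$ are known, the confidence interval for $\mu_1 - \mu_2$ at confidence level 1 − α is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}$$
2. When $\sigma_1^{2}$ and $\sigma_2^{2}$ are unknown, the confidence interval for $\mu_1 - \mu_2$ at confidence level 1 − α is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}$$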

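104 Estimating the difference between two population means (small samples: $\sigma_1^{2} = \sigma_2^{2}$)
1. Assumptions
Both populations are normally distributed.
The two population variances are unknown but equal: $\sigma_1^{2} = \sigma_2^{2}$.
Two independent small samples (n1 < 30 and n2 < 30).
2. Pooled estimator of the population variance:
$$s_p^{2} = \frac{(n_1-1)s_1^{2} + (n_2-1)s_2^{2}}{n_1+n_2-2}$$
3. Sampling standard deviation of the estimator $\bar{x}_1 - \bar{x}_2$: $\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, estimated by $s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

105 Estimating the difference between two population means (small samples: $\sigma_1^{2} = \sigma_2^{2}$)
Standardized difference between the two sample means:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1+n_2-2)$$
The confidence interval for $\mu_1 - \mu_2$ at confidence level 1 − α is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1+n_2-2)\sqrt{s_p^{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$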

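106 Estimating the difference between two population means (small samples: $\sigma_1^{2} \neq \sigma_2^{2}$)
1. Assumptions
Both populations are normally distributed.
The two population variances are unknown and unequal: $\sigma_1^{2} \neq \sigma_2^{2}$.
Two independent small samples (n1 < 30 and n2 < 30).
2. Use the statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}}} \sim t(v)$$

107 Estimating the difference between two population means (small samples: $\sigma_1^{2} \neq \sigma_2^{2}$)
The confidence interval for $\mu_1 - \mu_2$ at confidence level 1 − α is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(v)\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}$$
with degrees of freedom
$$v = \frac{\left(\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}\right)^{2}}{\dfrac{(s_1^{2}/n_1)^{2}}{n_1-1} + \dfrac{(s_2^{2}/n_2)^{2}}{n_2-1}}$$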

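110 Estimating the difference between two population means (worked example)
Example: A regional education board wants to estimate the difference between the mean English scores of students at two high schools on the college entrance examination. Independent random samples were drawn from the two schools; the data are shown in the table. Construct a 95% confidence interval for the difference between the two schools' mean English scores.
Data for the two samples:
School 1: n1 = 46, S1 = 5.8
School 2: n2 = 33, S2 = 7.2

111 Estimating the difference between two population means (worked example)
Solution: The confidence interval for the difference between the two population means at confidence level 1 − α is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}} = 8 \pm 1.96\sqrt{\frac{5.8^{2}}{46} + \frac{7.2^{2}}{33}} = 8 \pm 2.97$$
The confidence interval for the difference between the two schools' mean English scores is 5.03 to 10.97 points.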

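112 Estimating the difference between two population means (worked example)
Example: To estimate the difference in the time needed to assemble a product by two methods, 12 workers were randomly assigned to each method. The time (minutes) each worker needed to assemble one product is shown in the table. Assume the assembly times for the two methods are normally distributed with equal variances. Construct a 95% confidence interval for the difference between the mean assembly times of the two methods.
Time needed to assemble the product by the two methods:
Method 1: 28.3, 36.0, 30.1, 37.2, 29.0, 38.5, 37.6, 34.4, 32.1, 28.0, 28.8, 30.0
Method 2: 27.6, 31.7, 22.2, 26.0, 31.0, 32.0, 33.8, 31.2, 20.0, 33.4, 30.2, 26.5

113 Estimating the difference between two population means (worked example)
Solution: From the sample data, $\bar{x}_1$ = 32.5, $s_1^{2}$ = 15.996, $\bar{x}_2$ = 28.8, $s_2^{2}$ = 19.358.
The pooled estimate is
$$s_p^{2} = \frac{(12-1) \times 15.996 + (12-1) \times 19.358}{12+12-2} = 17.677$$
$$(32.5 - 28.8) \pm 2.0739\sqrt{17.677\left(\frac{1}{12} + \frac{1}{12}\right)} = 3.7 \pm 3.56$$
The confidence interval for the difference between the mean assembly times of the two methods is 0.14 to 7.26 minutes.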
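The pooled procedure for the assembly-time example can be sketched as below. The assignment of the flattened table values to the two methods is an assumption (a reconstruction, checked only against the stated interval of 0.14 to 7.26 minutes), and t* = 2.074 is the t-table value for df = 22 at 95% confidence.

```python
import math
import statistics

# Column assignment reconstructed from the flattened table (assumption).
method_1 = [28.3, 36.0, 30.1, 37.2, 29.0, 38.5, 37.6, 34.4, 32.1, 28.0, 28.8, 30.0]
method_2 = [27.6, 31.7, 22.2, 26.0, 31.0, 32.0, 33.8, 31.2, 20.0, 33.4, 30.2, 26.5]

n1, n2 = len(method_1), len(method_2)
var_1 = statistics.variance(method_1)  # s1^2
var_2 = statistics.variance(method_2)  # s2^2

# Pooled variance: weighted average of the two sample variances.
pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)
pooled_se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

difference = statistics.mean(method_1) - statistics.mean(method_2)
t_star = 2.074  # t table, df = n1 + n2 - 2 = 22, 95% level
low = difference - t_star * pooled_se
high = difference + t_star * pooled_se

print(f"pooled variance = {pooled_variance:.3f}")
print(f"95% CI: {low:.2f} to {high:.2f} minutes")
```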

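114 Estimating the difference between two population means (worked example)
Example: Continuing the previous example, suppose 12 workers are randomly assigned to the first method and 8 workers to the second, so that n1 = 12 and n2 = 8; the data are shown in the table. Assume the assembly times for the two methods are normally distributed with unequal variances. Construct a 95% confidence interval for the difference between the mean assembly times of the two methods.
Time needed to assemble the product by the two methods:
Method 1: 28.3, 36.0, 30.1, 37.2, 29.0, 38.5, 37.6, 34.4, 32.1, 28.0, 28.8, 30.0
Method 2: 27.6, 31.7, 22.2, 26.5, 31.0, 33.8, 20.0, 30.2

115 Estimating the difference between two population means (worked example)
Solution: From the sample data, $\bar{x}_1$ = 32.5, $s_1^{2}$ = 15.996, $\bar{x}_2$ = 27.875, $s_2^{2}$ = 23.014.
The degrees of freedom are
$$v = \frac{\left(\dfrac{15.996}{12} + \dfrac{23.014}{8}\right)^{2}}{\dfrac{(15.996/12)^{2}}{12-1} + \dfrac{(23.014/8)^{2}}{8-1}} = 13.188 \approx 13$$

116 Estimating the difference between two population means (worked example)
$$(32.5 - 27.875) \pm 2.1604\sqrt{\frac{15.996}{12} + \frac{23.014}{8}} = 4.625 \pm 4.433$$
The confidence interval for the difference between the mean assembly times of the two methods is 0.192 to 9.058 minutes.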

119 Confidence interval for the difference between two proportions (independent samples)
Section 5: p418 Confidence interval for the difference between two proportions (independent samples)

120 12.6 The Difference Between Two Proportions (Independent Samples)
A CI for the Difference Between Two Proportions (Independent Samples): $(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level.

121 Necessary Conditions
Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations. Condition 2: All of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ are at least 5 and preferably at least 10.

122 Example 12.10 Snoring and Heart Attacks p419
Q: Is there a relationship between snoring and risk of heart disease? Data: Of 1105 snorers, 86 had heart disease. Of 1379 nonsnorers, 24 had heart disease.

123 Example 12.10 Snoring and Heart Attacks
Note: the higher the level of confidence, the wider the interval. It appears that the proportion of snorers with heart disease in the population is about 4% to 8% higher than the proportion of nonsnorers with heart disease. Risk of heart disease for snorers is about 4.5 times what the risk is for nonsnorers.
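The snoring figures can be reproduced with a short Python sketch of the two-proportion interval. The function name `two_prop_interval` and the choice of 95% and 99% as the two confidence levels to compare are illustrative assumptions; the counts are those given in the example.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_interval(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 from two independent samples (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Condition 2: every success and failure count should be at least 10.
counts = [86, 1105 - 86, 24, 1379 - 24]
conditions_ok = all(c >= 10 for c in counts)

low95, high95 = two_prop_interval(86, 1105, 24, 1379, conf=0.95)
low99, high99 = two_prop_interval(86, 1105, 24, 1379, conf=0.99)
relative_risk = (86 / 1105) / (24 / 1379)

print(f"conditions met: {conditions_ok}")
print(f"95% CI: {low95:.3f} to {high95:.3f}")
print(f"99% CI: {low99:.3f} to {high99:.3f}")
print(f"relative risk: {relative_risk:.1f}")
```

The 99% interval comes out wider than the 95% interval, which is the point of the note about confidence level and interval width.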

124 Example 12.11 How often do you wear a seatbelt when driving a car?

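125 Estimating the difference between two population proportions (worked example)
[Example] In a ratings survey for a television program, 400 people were randomly surveyed in rural areas, of whom 32% watched the program, and 500 people were randomly surveyed in urban areas, of whom 45% watched the program. Construct a 95% confidence interval for the difference between the urban and rural viewing rates.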

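126 Estimating the difference between two population proportions (worked example)
Solution: Given n1 = 500, n2 = 400, p1 = 45%, p2 = 32%, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96, the 95% confidence interval for $\pi_1 - \pi_2$ is $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = 13\% \pm 6.32\%$. The confidence interval for the difference between the urban and rural viewing rates is 6.68% to 19.32%.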

127 In summary p424

128 Note: 12.1 Examples of Different Estimation Situations p392
Situation 1. Estimating the proportion falling into a category of a categorical variable. Example research questions: What proportion of American adults believe there is extraterrestrial life? In what proportion of British marriages is the wife taller than her husband? Population parameter: p = proportion in the population falling into that category. Sample estimate: $\hat{p}$ = proportion in the sample falling into that category.

129 More Estimation Situations
Situation 2. Estimating the mean of a quantitative variable. Example research questions: What is the mean time that college students watch TV per day? What is the mean pulse rate of women? Population parameter: $\mu$ (spelled “mu” and pronounced “mew”) = population mean for the variable. Sample estimate: $\bar{x}$ = the sample mean for the variable.

130 More Estimation Situations
Situation 3. Estimating the difference between two populations with regard to the proportion falling into a category of a qualitative variable. Example research questions: How much difference is there between the proportions that would quit smoking if taking the antidepressant bupropion (Zyban) versus if wearing a nicotine patch? How much difference is there between men who snore and men who don’t snore with regard to the proportion who have heart disease? Population parameter: $p_1 - p_2$ = difference between the two population proportions. Sample estimate: $\hat{p}_1 - \hat{p}_2$ = difference between the two sample proportions.

131 More Estimation Situations
Situation 4. Estimating the difference between two populations with regard to the mean of a quantitative variable. Example research questions: How much difference is there in average weight loss for those who diet compared to those who exercise to lose weight? How much difference is there between the mean foot lengths of men and women? Population parameter: $\mu_1 - \mu_2$ = difference between the two population means. Sample estimate: $\bar{x}_1 - \bar{x}_2$ = difference between the two sample means.

132 Note: Independent Samples
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample. Independent samples can arise in three ways: (1) random samples are taken separately from two populations and the same response variable is recorded; (2) one random sample is taken and a variable recorded, but units are categorized to form two populations; (3) participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded.

133 Note: Paired Data: A Special Case of One Mean
Paired data (or paired samples): when pairs of variables are collected. Only interested in population (and sample) of differences, and not in the original data. Each person measured twice. Two measurements of same characteristic or trait are made under different conditions. Similar individuals are paired prior to an experiment. Each member of a pair receives a different treatment. Same response variable is measured for all individuals. Two different variables are measured for each individual. Interested in amount of difference between two variables.

134 Note: 12.2 Standard Errors
Rough Definition: The standard error of a sample statistic measures, roughly, the average difference between the statistic and the population parameter. This “average difference” is over all possible random samples of a given size that can be taken from the population. Technical Definition: The standard error of a sample statistic is the estimated standard deviation of the sampling distribution for the statistic.

135 Standard Error of a Sample Proportion
Example: Intelligent Life on Other Planets. Poll: Random sample of 935 Americans. Do you think there is intelligent life on other planets? Results: 60% of the sample said “yes”, $\hat{p}$ = .60, so $\text{s.e.}(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{.6(.4)/935} = .016$. The standard error of .016 is roughly the average difference between the statistic, $\hat{p}$, and the population parameter, p, for all possible random samples of n = 935 from this population.

136 Standard Error of a Sample Mean
Example: Mean Hours Watching TV. Poll: Class of 175 students. In a typical day, about how much time do you spend watching television? Variable N Mean Median TrMean StDev SE Mean TV

137 Standard Error of the Difference Between Two Sample Proportions
Example: Patches vs Antidepressant (Zyban)? Study: n1 = n2 = 244 randomly assigned to each treatment. Zyban: 85 of the 244 Zyban users quit smoking, $\hat{p}_1$ = .348. Patch: 52 of the 244 patch users quit smoking, $\hat{p}_2$ = .213. So, $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{.348(1-.348)}{244} + \frac{.213(1-.213)}{244}} = .040$.

138 Standard Error of the Difference Between Two Sample Means
Example: Lose More Weight by Diet or Exercise? Study: n1 = 42 men on diet, n2 = 47 men on exercise routine. Diet: Lost an average of 7.2 kg with std dev of 3.7 kg. Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg. So, $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{3.7^2}{42} + \frac{3.9^2}{47}} = 0.81$ kg.
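The standard errors in the preceding examples all follow the same pattern: estimate the variance of each sample statistic, add the variances for independent samples, and take the square root. A minimal Python sketch, with illustrative function names and using only the numbers quoted in the examples:

```python
from math import sqrt


def se_proportion(p_hat, n):
    """Standard error of one sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat for independent samples."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_diff_means(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar for independent samples (unpooled)."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


se_poll = se_proportion(0.60, 935)                        # intelligent-life poll
se_quit = se_diff_proportions(85 / 244, 244, 52 / 244, 244)  # Zyban vs patch
se_loss = se_diff_means(3.7, 42, 3.9, 47)                 # diet vs exercise, kg

print(f"{se_poll:.3f} {se_quit:.3f} {se_loss:.2f}")
```

Each value is on the scale of the statistic it describes: the first two are proportions and the third is in kilograms.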

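139 Note: Determining the sample size
1. Determining the sample size when estimating a population mean. 2. Determining the sample size when estimating a population proportion. 3. Determining the sample size when estimating the difference between two population means. 4. Determining the sample size when estimating the difference between two population proportions.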

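140 Determining the sample size when estimating a population mean
The sample size for estimating a population mean is $n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2}$, where $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ is the margin of error. The sample size n is related to the population variance $\sigma^2$, the margin of error E, and the reliability coefficient z (or t) as follows: it is proportional to the population variance, inversely proportional to the square of the margin of error, and proportional to the square of the reliability coefficient.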

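141 Determining the sample size when estimating a population mean (worked example)
[Example] The standard deviation of the annual salary of university graduates holding a bachelor's degree in business administration is about 2000 yuan. Suppose we want a 95% confidence interval for the mean annual salary with a margin of error of 400 yuan. How large a sample should be drawn?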

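142 Determining the sample size when estimating a population mean (worked example)
Solution: Given $\sigma$ = 2000, E = 400, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96, the required sample size is $n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2} = \frac{1.96^2 \times 2000^2}{400^2} = 96.04 \approx 97$. That is, a sample of 97 people should be drawn.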
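The salary example can be checked with a few lines of Python. The function name `sample_size_for_mean` is illustrative, and z = 1.96 is the rounded value used on the slide. The result is rounded up, not to the nearest integer, because a smaller sample would exceed the allowed margin of error.

```python
from math import ceil


def sample_size_for_mean(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return ceil((z * sigma / margin) ** 2)


n_salary = sample_size_for_mean(sigma=2000, margin=400)
print(n_salary)

# Halving the margin of error roughly quadruples the required sample.
n_half_margin = sample_size_for_mean(sigma=2000, margin=200)
print(n_half_margin)
```

The second call illustrates the inverse-square relationship between the sample size and the margin of error.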

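143 Determining the sample size when estimating a population proportion
From the interval-estimate formula for a proportion, the sample size is $n = \frac{(z_{\alpha/2})^2\,\pi(1-\pi)}{E^2}$, where $E = z_{\alpha/2}\sqrt{\pi(1-\pi)/n}$ is the margin of error. E is usually taken to be less than 0.1. When $\pi$ is unknown, use $\pi$ = 0.5, the value that maximizes $\pi(1-\pi)$.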

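144 Determining the sample size when estimating the difference between two population means
Let n1 and n2 be the sizes of the samples from the two populations, and assume n1 = n2. Then $n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{E^2}$, where E is the margin of error.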

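145 Determining the sample size when estimating the difference between two population means (worked example)
[Example] The academic affairs office of a high school wants a confidence interval for the difference between the mean examination scores of an experimental class and a regular class. The required confidence level is 95%, and the variances of the examination scores are estimated in advance as $\sigma_1^2$ = 90 for the experimental class and $\sigma_2^2$ = 120 for the regular class. If the margin of error must not exceed 5 points, how many students should be sampled from each class?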

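146 Determining the sample size when estimating the difference between two population means (worked example)
Solution: Given $\sigma_1^2$ = 90, $\sigma_2^2$ = 120, E = 5, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96, $n_1 = n_2 = \frac{1.96^2 \times (90 + 120)}{5^2} = 32.269 \approx 33$. That is, 33 students should be sampled from each class.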

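147 Determining the sample size when estimating the difference between two population proportions
Let n1 and n2 be the sizes of the samples from the two populations, and assume n1 = n2. Then $n_1 = n_2 = \frac{(z_{\alpha/2})^2\,[\pi_1(1-\pi_1) + \pi_2(1-\pi_2)]}{E^2}$, where E is the margin of error.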

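148 Determining the sample size when estimating the difference between two population proportions (worked example)
[Example] A bottled-beverage manufacturer wants to estimate the effect of advertising on customer awareness of a new drink. Before and after the advertising campaign, it draws a random sample of consumers from its marketing area and asks whether they have heard of the new drink. The manufacturer wants to estimate the difference between the proportions of consumers who know of the drink before and after the campaign, with a margin of error of 10% and a confidence level of 95%. How many people should each of the two samples include? (Assume the two sample sizes are equal.)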

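149 Determining the sample size when estimating the difference between two population proportions (worked example)
Solution: E = 10%, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96. Since there is no information about $\pi_1$ and $\pi_2$, use 0.5 for both: $n_1 = n_2 = \frac{1.96^2 \times [0.5(1-0.5) + 0.5(1-0.5)]}{0.1^2} = 192.08 \approx 193$. That is, 193 consumers should be drawn for each sample.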
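The two equal-sample-size calculations (class scores and beverage awareness) can be sketched together in Python. The function names are illustrative, z = 1.96 is the rounded value used on the slides, and the default of 0.5 for each unknown proportion is the conservative choice described in the solution.

```python
from math import ceil


def size_for_mean_difference(var1, var2, margin, z=1.96):
    """Per-group n (n1 = n2) for estimating mu1 - mu2 within +/- margin."""
    return ceil(z**2 * (var1 + var2) / margin**2)


def size_for_proportion_difference(margin, p1=0.5, p2=0.5, z=1.96):
    """Per-group n (n1 = n2) for estimating pi1 - pi2 within +/- margin."""
    return ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin**2)


n_class = size_for_mean_difference(var1=90, var2=120, margin=5)
n_drink = size_for_proportion_difference(margin=0.10)
print(n_class, n_drink)
```

Using 0.5 for an unknown proportion gives the largest possible value of p(1 - p), so the resulting sample size is sufficient whatever the true proportions turn out to be.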

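150 Note: Interval estimation of the ratio of two population variances
1. To compare two population variances, use the ratio of the two sample variances. If $s_1^2/s_2^2$ is close to 1, the two population variances are close to each other; if $s_1^2/s_2^2$ is far from 1, the two population variances differ. The confidence interval for the population variance ratio at confidence level $1-\alpha$ is $\frac{s_1^2/s_2^2}{F_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{1-\alpha/2}}$.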

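151 Interval estimation of the ratio of two population variances (diagram)
[Figure: schematic of the confidence interval for the variance ratio, showing the F distribution with critical values $F_{1-\alpha/2}$ and $F_{\alpha/2}$ bounding the $1-\alpha$ confidence interval for the population variance ratio.] In this diagram, do the populations have equal or unequal variances? Unequal.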

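152 Interval estimation of the ratio of two population variances (worked example)
[Example] To study the difference between male and female students in living expenses (yuan), 25 male students and 25 female students were randomly sampled at a university, with the following results. Male students: Female students: Estimate the confidence interval for the ratio of the variances of male and female students' living expenses at the 90% confidence level.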

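153 Interval estimation of the ratio of two population variances (worked example)
Solution: With degrees of freedom $n_1 - 1 = 25 - 1 = 24$ and $n_2 - 1 = 25 - 1 = 24$, the F table gives $F_{\alpha/2}(24, 24) = 1.98$ and $F_{1-\alpha/2}(24, 24) = 1/1.98 = 0.505$. The 90% confidence interval for $\sigma_1^2/\sigma_2^2$ is $\frac{s_1^2/s_2^2}{F_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{1-\alpha/2}}$. The confidence interval for the ratio of the variances of male and female students' living expenses is 0.47 to 1.84.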

159 Supplement: graduate entrance examination problems

