Estimating Chapter 10& Chapter 12 No:1 5/5/2009.

Presentation transcript:

Estimating Chapter 10& Chapter 12 No:1 5/5/2009

Review: 1. Random Variables 2. Sampling Distributions for Sample Proportions 3. Sampling Distributions for Sample Means 4. What to Expect in Other Situations: the Central Limit Theorem (CLT) 5. Sampling Distribution for Any Statistic No:2 5/5/2009

Learning objectives: We will learn how to use the realized value of a sample statistic to estimate the unknown value of the corresponding population parameter. This is called estimation. 1. Point estimation: a single estimate of a population parameter computed from a sample statistic. [1] Method of moments (MM) [2] Maximum likelihood estimation (MLE) No:3 5/5/2009

2. Interval estimation: an interval estimate conveys how much confidence we can place in our estimate of the population mean or proportion. A confidence interval is an interval of estimates that is likely to capture the population value. The primary objective of this chapter is to describe how to calculate and interpret a confidence interval. No:4 5/5/2009

Learning Contents, Emphases and Difficulties: Confidence interval estimates [1] for one proportion [2] for one mean (or the mean difference of pairs) [3] for the difference between two means (un-pooled) [4] for the difference between two means (pooled) [5] for the difference between two proportions (independent samples) No:5 5/5/2009

Teaching methods Both English and Chinese Both PPT and writing on blackboard No:6 5/5/2009

Section 1: Estimating proportions with confidence intervals No:7 5/5/2009

10.1 The Language and Notation of Estimation (p. 330)
1. Unit: an individual person or object to be measured.
2. Population (or universe): the entire collection of units about which we would like information, or the entire collection of measurements we would have if we could measure the whole population.
3. Sample: the collection of units we will actually measure, or the collection of measurements we will actually obtain.
4. Sample size: the number of units or measurements in the sample, denoted by n. No:8 5/5/2009

More Language and Notation of Estimation
5. Population proportion: the fraction of the population that has a certain trait or characteristic, or the probability of success in a binomial experiment, denoted by p. The value of the parameter p is not known.
6. Sample proportion: the fraction of the sample that has a certain trait or characteristic, denoted by $\hat{p}$. The statistic $\hat{p}$ is an estimate of p.
7. A randomly selected sample:
8. The Fundamental Rule for Using Data for Inference: available data can be used to make inferences about a much larger group if the data can be considered representative with regard to the question(s) of interest. No:9 5/5/2009

9. Margin of error: a measure of the accuracy of a sample survey. It is a number that provides a likely upper limit for the difference between the sample proportion and the unknown population proportion. The margin of error provided in media descriptions of survey results has the following characteristics. No:10 5/5/2009

Characteristics (p. 331): The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates. The difference is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates. In other words, for most sample estimates, the actual error is quite likely to be smaller than the margin of error. No:11 5/5/2009

Example 10.1 Teens and Interracial Dating. A 1997 USA Today/Gallup Poll of teenagers across the country: 57% of the 497 teens who go out on dates say they've been out with someone of another race or ethnic group. How can we use sample data to provide an interval of values that the researcher is confident covers the true value for the population? The reported margin of error for this estimate was about 4.5%. No:12 5/5/2009

In surveys of this size, the difference between the sample estimate of 57% and the true percent is likely* to be less than 4.5% one way or the other. There is, however, a small chance that the sample estimate might be off by more than 4.5%. * The value of how ‘likely’ is often 95%. No:13 5/5/2009

10.3 Confidence Intervals (p. 332). Confidence interval: an interval of values computed from sample data that is likely to include the true population value. The phrase "confidence level" describes the chance that an interval actually contains the true population value, in the following sense: most of the time (quantified by the confidence level), intervals computed in this way will capture the truth about the population, but occasionally they will not. In any given instance, the interval either captures the truth or it does not, but we will never know which is the case. No:14 5/5/2009

Therefore, our confidence is in the procedure (it works most of the time), and the "confidence level" or "level of confidence" is the percentage of the time we expect it to work. No:15 5/5/2009

In summary: Interpreting the Confidence Level. The confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter. If we consider all possible randomly selected samples of the same size from a population, the confidence level is the fraction or percent of those samples for which the confidence interval includes the population parameter. Note: The confidence level is often expressed as a percent. Common levels are 90%, 95%, 98%, and 99%. No:16 5/5/2009

Note (p. 333): Be careful when giving information about a specific confidence interval computed from an observed sample. The confidence level only expresses how often the confidence interval procedure works in the long run. No:17 5/5/2009

Interpretation of Confidence Intervals: If repeated samples of size n are taken from the same population, (1 - α)100% of the resulting confidence intervals will contain the population parameter. OR: Based on this sample, we are (1 - α)100% confident that the population parameter falls within the stated confidence interval. Be careful: The confidence level only expresses how often the procedure works in the long run. Any one specific interval either does or does not include the true unknown population value. No:18 5/5/2009

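Confidence interval
1. An interval estimate of a population parameter constructed from a sample statistic is called a confidence interval. The true value of the population parameter is fixed but unknown, whereas the interval constructed from a sample is not fixed. Different samples yield different intervals under the same method; in this sense a confidence interval is a random interval that varies from sample to sample.
2. Statisticians are, to some degree, confident that the interval contains the true population parameter, hence the name "confidence interval".
Example: given a 95% confidence interval of (60, 80) for exam scores, we cannot say that the interval (60, 80) contains the true value for the class's exam scores with probability 95%. We only know that, over repeated sampling, 95% of samples produce intervals that contain the true value. No:19 5/5/2009

The real meaning is this: if 100 samples were drawn, about 95 of the resulting intervals would contain the true value and about 5 would not. The probability therefore does not describe how likely one particular interval is to contain the true value of the population parameter. A particular interval either contains the true value or does not; there is no question of "may contain" or "may not contain". An interval constructed from one specific sample is one specific interval, and we cannot know whether it contains the true value of the population parameter. We can only hope that it is one of the many intervals that contain the true value, but it may also be one of the few that do not. No:20 5/5/2009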

There are three types of problems: 1. constructing a confidence interval 2. determining the sample size 3. using confidence intervals to guide decisions No:21 5/5/2009

10.4 Constructing a 95% Confidence Interval for a Population Proportion: Sample estimate ± Margin of error. In the long run, about 95% of all confidence intervals computed in this way will capture the population value of the proportion, and about 5% of them will miss it. No:22 5/5/2009

Confidence Interval on p: $\hat{p} \pm z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ No:23 5/5/2009

(p. 335) For a 95% confidence level, the approximate margin of error for a sample proportion is $2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. Note: The "95% margin of error" is simply two standard errors, or 2 × s.e.($\hat{p}$). No:24 5/5/2009

Factors that Determine Margin of Error (p. 335)
1. The sample size, n. When the sample size increases, the margin of error decreases.
2. The sample proportion, $\hat{p}$. If the proportion is close to either 1 or 0, most individuals have the same trait or opinion, so there is little natural variability and the margin of error is smaller than if the proportion is near 0.5.
3. The "multiplier" 2, which is connected to the "95%" aspect of the margin of error. Later you'll learn that the exact value for 95% is 1.96, and how to change the multiplier to change the confidence level. No:25 5/5/2009

Example 10.3 Pollen Count Must Be High (p. 336). Poll: random sample of 883 American adults. "Are you allergic to anything?" Results: 36% of the sample said "yes", so $\hat{p}$ = .36. 95% Confidence Interval: .36 ± .032, or about .33 to .39. We can be 95% confident that somewhere between 33% and 39% of all adult Americans have allergies. No:26 5/5/2009
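
The arithmetic in Example 10.3 can be checked with a short Python sketch. It assumes the approximate 95% rule from the earlier slide (margin of error = 2 × standard error); the function name `approx_ci_95` is illustrative and does not come from the textbook.

```python
import math

def approx_ci_95(p_hat, n):
    """Approximate 95% CI for a proportion: p_hat +/- 2 * s.e.(p_hat)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    margin = 2 * se                          # approximate 95% margin of error
    return p_hat - margin, p_hat + margin, margin

# Poll values from Example 10.3: n = 883, p_hat = .36
low, high, margin = approx_ci_95(0.36, 883)
print(f"margin of error = {margin:.3f}")
print(f"interval = {low:.2f} to {high:.2f}")
```

The printed margin and interval should agree with the slide's .032 and .33 to .39 after rounding.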

The Conservative Estimate of Margin of Error: Conservative estimate of the margin of error = $\dfrac{1}{\sqrt{n}}$. It usually overestimates the actual size of the margin of error. It works (conservatively) for all survey questions based on the same sample size, even if the sample proportions differ from one question to the next. It is obtained by setting $\hat{p}$ = .5 in the margin of error formula. No:27 5/5/2009

Example 10.3 Really Bad Allergies (cont.) (p. 337). Poll: random sample of 883 American adults; 3% of the sample experience "severe" symptoms. 95% (conservative) Confidence Interval: 3% ± 3.4%, or -0.4% to 6.4%. When $\hat{p}$ is far from .5, the conservative margin of error is too conservative. The 95% margin of error using $\hat{p}$ = .03 is just .011, or 1.1%, for an interval from 1.9% to 4.1%. No:28 5/5/2009
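
To see why the conservative margin is "too conservative" here, the two margins can be compared directly. This is a minimal sketch using only the slide's numbers (n = 883, sample proportion .03); the variable names are illustrative.

```python
import math

n = 883
p_hat = 0.03

# Conservative margin: the approximate 95% formula evaluated at p_hat = .5
conservative = 1 / math.sqrt(n)
# Margin using the observed sample proportion
actual = 2 * math.sqrt(p_hat * (1 - p_hat) / n)

# The conservative formula is exactly the 2 * s.e. formula with p_hat = .5
assert math.isclose(conservative, 2 * math.sqrt(0.5 * 0.5 / n))

print(f"conservative margin = {conservative:.3f}")
print(f"margin using p_hat  = {actual:.3f}")
```

The conservative value is roughly three times the margin based on the observed proportion, which is why the conservative interval extends below zero.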

10.5 General Format for Confidence Intervals (pp. 337-339). For any confidence level, a confidence interval for either a population proportion or a population mean can be expressed as: Sample estimate ± Multiplier × Standard error. The multiplier is affected by the choice of confidence level. No:29 5/5/2009

More about the Multiplier p340 Note: Increase confidence level => larger multiplier. Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level. No:30 5/5/2009

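Formula for a Confidence Interval for a Population Proportion p: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion, z* denotes the multiplier, and $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$. No:31 5/5/2009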

Example 10.6 Intelligent Life Elsewhere? (1. constructing the interval) (p. 340). Poll: random sample of 935 Americans. "Do you think there is intelligent life on other planets?" Results: 60% of the sample said "yes", so $\hat{p}$ = .60. 90% Confidence Interval: .60 ± 1.65(.016), or .60 ± .026. 98% Confidence Interval: .60 ± 2.33(.016), or .60 ± .037. Note: the entire interval is above 50% => high confidence that a majority believe there is intelligent life. No:32 5/5/2009

Example 10.6 Intelligent Life Elsewhere? Poll: random sample of 935 Americans. "Do you think there is intelligent life on other planets?" Results: 60% of the sample said "yes", so $\hat{p}$ = .60. We want a 50% confidence interval. If the area between -z* and z* is .50, then the area to the left of z* is .75. From Table A.1 we have z* ≈ .67. 50% Confidence Interval: .60 ± .67(.016), or .60 ± .011. Note: A lower confidence level results in a narrower interval. No:33 5/5/2009
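
The table lookup for z* can be mirrored in code. The sketch below uses `statistics.NormalDist` from the Python standard library in place of Table A.1; the helper names `multiplier` and `ci_proportion` are illustrative assumptions, not part of the textbook.

```python
import math
from statistics import NormalDist

def multiplier(level):
    """z* such that the area between -z* and z* under the standard
    normal curve equals `level` (area to the left is 0.5 + level / 2)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def ci_proportion(p_hat, n, level):
    """Confidence interval p_hat +/- z* * s.e.(p_hat)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = multiplier(level)
    return p_hat - z_star * se, p_hat + z_star * se

# Example 10.6: n = 935, p_hat = .60, at several confidence levels
for level in (0.50, 0.90, 0.95, 0.98):
    low, high = ci_proportion(0.60, 935, level)
    print(f"{level:.0%}: z* = {multiplier(level):.3f}, "
          f"interval = {low:.3f} to {high:.3f}")
```

The computed multipliers are slightly more precise than the two-decimal table values on the slides (for example, about 1.645 rather than 1.65 for 90%), so the last digit of a margin can differ from a hand calculation.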

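Interval estimation of a population proportion (worked example). [Example] A city wants to estimate the proportion of women among its laid-off workers. A random sample of 100 laid-off workers was drawn, of whom 65 were women. Estimate the confidence interval for the proportion of women among the city's laid-off workers at the 95% confidence level. Solution: Given n = 100, $\hat{p}$ = 65%, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96, the interval is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 65\% \pm 1.96\sqrt{0.65 \times 0.35/100} = 65\% \pm 9.35\%$. The confidence interval for the proportion of women among the city's laid-off workers is 55.65% to 74.35%. No:34 5/5/2009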

Conditions for Using the Formula (p. 341)
1. The sample is randomly selected from the population. Note: Available data can be used to make inferences about a much larger group if the data can be considered representative with regard to the question(s) of interest.
2. The normal curve approximation to the distribution of possible sample proportions assumes a "large" sample size. Both $n\hat{p}$ and $n(1-\hat{p})$ should be at least 10 (although some say these need only be at least 5). No:35 5/5/2009

10.6 Choosing a Sample Size (2. determining the sample size) (p. 341). The table provides the 95% conservative margin of error for various sample sizes n. Important features: 1. When the sample size is increased, the margin of error decreases. 2. When a large sample size is made even larger, the improvement in accuracy is relatively small. No:36 5/5/2009

The Effect of Population Size For most surveys, the number of people in the population has almost no influence* on the accuracy of sample estimates. Margin of error for a sample size of 1000 is about 3% whether the number of people in the population is 30,000 or 200 million. * As long as the population is at least ten times as large as the sample. No:37 5/5/2009

Sample Size Determination for p from an Infinite Population. Proportion: e, the bound within which you want to estimate p, is given. The interval half-width is e, also called the maximum likely error: $e = z\sqrt{\dfrac{p(1-p)}{n}}$. Solving for n, we find: $n = \dfrac{z^2\, p(1-p)}{e^2}$ No:38 5/5/2009

Sample Size Determination for p from a Finite Population. Proportion: e, the bound within which you want to estimate p, is given. $n = \dfrac{p(1-p)}{\dfrac{e^2}{z^2} + \dfrac{p(1-p)}{N}}$, where n = required sample size, N = population size, z = z-score for (1 - α)100% confidence, and p = sample estimator of the population proportion. No:39 5/5/2009
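
The two sample-size formulas can be compared numerically. The sketch below assumes the standard forms $n = z^2 p(1-p)/e^2$ (infinite population) and $n = p(1-p) / (e^2/z^2 + p(1-p)/N)$ (finite population); the finite-population form is a reconstruction of the garbled slide formula, and the inputs (95% confidence, p = 0.5, e = 0.03) are hypothetical values chosen for illustration.

```python
import math

def sample_size_infinite(z, p, e):
    """n = z^2 * p * (1 - p) / e^2, rounded up to a whole unit."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

def sample_size_finite(z, p, e, N):
    """n = p(1 - p) / (e^2 / z^2 + p(1 - p) / N), rounded up."""
    return math.ceil(p * (1 - p) / (e**2 / z**2 + p * (1 - p) / N))

# Hypothetical inputs: z = 1.96 (95% confidence), p = 0.5, e = 0.03
z, p, e = 1.96, 0.5, 0.03
print("infinite population:", sample_size_infinite(z, p, e))
for N in (5_000, 30_000, 200_000_000):
    print(f"N = {N:>11,}:", sample_size_finite(z, p, e, N))
```

As N grows, the finite-population answer approaches the infinite-population one, which matches the earlier remark that population size has almost no influence on accuracy once the population is much larger than the sample.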

Example 2 (2. determining the sample size): A student guild wishes to estimate the proportion of students who would support the "voluntary guild fee" proposal being debated. What sample size is necessary to estimate the true level of support to within 5% at the 90% confidence level? No:40 5/5/2009

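Determining the sample size when estimating a population proportion (worked example). [Example] Based on past production statistics, the pass rate of a product is about 90%. If the allowed margin of error is 5%, how many items should be sampled to obtain a 95% confidence interval? Solution: Given p = 90%, α = 0.05, $z_{\alpha/2}$ = 1.96, E = 5%, the required sample size is $n = \dfrac{z_{\alpha/2}^2\, p(1-p)}{E^2} = \dfrac{1.96^2 \times 0.9 \times 0.1}{0.05^2} \approx 138.3$, which rounds up to 139. A sample of 139 items should be drawn. No:41 5/5/2009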
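
Both sample-size examples can be worked with one helper. This is a sketch under two assumptions: the multiplier is taken from the standard normal distribution via `statistics.NormalDist`, and, because the student guild example gives no prior estimate of support, the conservative value p = 0.5 is used there. The function name `sample_size` is illustrative.

```python
import math
from statistics import NormalDist

def sample_size(p, e, level):
    """Smallest n with z* * sqrt(p(1-p)/n) <= e at the given confidence level."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z_star**2 * p * (1 - p) / e**2)

# Product example: pass rate about 90%, margin 5%, 95% confidence
n_product = sample_size(0.90, 0.05, 0.95)

# Student guild example: within 5% at 90% confidence;
# p = 0.5 is an assumption (worst case), since no prior estimate is given
n_guild = sample_size(0.50, 0.05, 0.90)

print(n_product, n_guild)
```

The product example reproduces the 139 items stated on the slide. The guild figure depends on the assumed p, and any prior estimate away from 0.5 would reduce it.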

10.7 Using Confidence Intervals to Guide Decisions (3. using confidence intervals to guide decisions) (p. 344). Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an "acceptable" possibility for the value of a population proportion. Principle 2. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude that the two population proportions are different. No:42 5/5/2009

Example 10.7 Which Drink Tastes Better? Taste test: A sample of 60 people taste both drinks, and 55% like the taste of Drink A better than Drink B. Makers of Drink A want to advertise these results. Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A. 95% Confidence Interval: $.55 \pm 2\sqrt{.55(.45)/60}$ = .55 ± .128, or about .42 to .68. Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample. No:43 5/5/2009

Case Study 10.1 ESP Works with Movies (p. 345). ESP study by Bem and Honorton (1994): Subjects (receivers) described what another person (sender) was seeing on a screen. Receivers were shown 4 pictures and asked to pick which one they thought the sender had actually seen. The actual image shown was randomly picked from the 4 choices. The image was either a single "static" image or a "dynamic" short video clip, played repeatedly (the additional three choices shown were always of the same type as the actual image). No:44 5/5/2009

Case Study 10.1 ESP Works (cont) Bem and Honorton (1994) ESP Study Results Is there enough evidence to say that the % of correct guesses for dynamic pictures is significantly above 25%? 95% CI: Can claim the true % of correct guesses is significantly better than what would occur from random guessing. No:45 5/5/2009

Case Study 10.2 Nicotine Patches vs Zyban (p. 346). Study (New England Journal of Medicine, 3/4/99): 893 participants were randomly allocated to four treatment groups: placebo, nicotine patch only, Zyban only, and Zyban plus nicotine patch. Participants were blinded: all used a patch (nicotine or placebo) and all took a pill (Zyban or placebo). Treatments were used for nine weeks. No:46 5/5/2009

Case Study 10.2 Nicotine (cont) Conclusions: Zyban is effective (no overlap of Zyban and no Zyban CIs) Nicotine patch is not particularly effective (overlap of patch and no patch CIs) No:47 5/5/2009

Case Study 10.3 What a Great Personality (p. 347). "Would you date someone with a great personality even though you did not find them attractive?" Women: 61.1% of 131 answered "yes"; the 95% confidence interval is 52.7% to 69.4%. Men: 42.6% of 61 answered "yes"; the 95% confidence interval is 30.2% to 55%. Conclusions: A higher proportion of women would say yes. The CIs slightly overlap. The women's CI is narrower than the men's CI due to the larger sample size. No:48 5/5/2009
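
Principle 2 (comparing two intervals for overlap) can be checked for this case study. The sketch assumes the multiplier 1.96 for 95% confidence, which reproduces the reported intervals to within rounding; the helper `ci_95` is illustrative.

```python
import math

def ci_95(p_hat, n, z_star=1.96):
    """95% CI for a proportion using the multiplier z* = 1.96."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

women = ci_95(0.611, 131)
men = ci_95(0.426, 61)

# Two intervals overlap when each one starts before the other ends.
overlap = women[0] <= men[1] and men[0] <= women[1]

print(f"women: {women[0]:.3f} to {women[1]:.3f}")
print(f"men:   {men[0]:.3f} to {men[1]:.3f}")
print("intervals overlap:", overlap)
```

Because the intervals overlap, Principle 2 on its own does not establish that the two population proportions differ, even though the sample proportions are far apart.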

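In Summary: Confidence Interval for a Population Proportion p. General CI for p: $\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. Approximate 95% CI for p: $\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. Conservative 95% CI for p: $\hat{p} \pm \dfrac{1}{\sqrt{n}}$. No:49 5/5/2009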

In summary: 1. constructing the interval 2. determining the sample size 3. using confidence intervals to guide decisions No:50 5/5/2009

Chapter 12 Section 4: Confidence intervals for the population mean (p. 405) No:51 5/5/2009

Teaching methods Both English and Chinese Both PPT and writing on blackboard No:52 5/5/2009

Review: A confidence interval is an interval of estimates that is likely to capture the population value. The confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter. Confidence interval estimate for one proportion (constructing the confidence interval, determining the sample size, using confidence intervals to guide decisions). No:53 5/5/2009

12.4 Confidence intervals for the population mean (p. 405) No:54 5/5/2009

Learning objectives In this section, we describe how to determine a confidence interval for the population mean using a sample of any size, large or small, and with any confidence level No:55 5/5/2009

Learning Contents Confidence interval estimate [12.4.1] for a mean [12.4.2]for the mean difference in paired variables [12.4.3] for difference between two means (un-pooled) [12.4.4] for difference between two means (pooled) No:56 5/5/2009

Emphases and Difficulties Conditions required for using the t confidence interval or z confidence interval confidence interval for difference between two means (paired samples, independent samples, pooled samples, un-pooled samples) No:57 5/5/2009

Section 12.4.1: Confidence interval estimate for a mean No:58 5/5/2009

We can use the general format of a confidence interval: Sample estimate ± Multiplier × Standard error. The multiplier is affected by the choice of confidence level. No:59 5/5/2009

A Confidence Interval for a Population Mean: $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$, where the multiplier t* is the value in a t-distribution with degrees of freedom df = n - 1, or $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$, where the multiplier z* is the value in a standard normal distribution, No:60 5/5/2009

such that the area between -t* and t* (or between -z* and z*) equals the desired confidence level. No:61 5/5/2009

Conditions for the t confidence interval (pp. 406-407): Either the population of measurements is bell-shaped and a random sample of any size is measured (in practice, for small samples, the data should show no extreme skewness and should not contain any outliers), or the population of measurements is not bell-shaped but a large random sample is measured, n ≥ 30. No:62 5/5/2009

Conditions (flowchart): whether to use the z or the t multiplier depends on the sample size n (large or small), whether σ is known, and whether the population is bell-shaped. No:63 5/5/2009

Exercises: Decide the nature of the sampling distribution for each of the following:
(1) a sample of 10 with a mean of 12 and a standard deviation of 5, taken from a skewed population
(2) a sample of 39 with a mean of 60 and a standard deviation of 10, taken from a skewed population
(3) a sample of 15 taken from a normal population with a mean of 5 and a standard deviation of 1
(4) a sample of 10 with a mean of 16 and a standard deviation of 2, taken from a normal population No:64 5/5/2009

Example 12.5 Mean Forearm Length (p. 405). Data: forearm lengths (cm) for a random sample of n = 9 men: 25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0. Step 1: checking the conditions. The sample is small (n = 9 < 30), but the dotplot shows no obvious skewness and no outliers, so we can assume that the population of measurements is bell-shaped. No:65 5/5/2009

Step 2: calculating the confidence interval. The multiplier t* from Table A.2 with df = 8 is t* = 2.31. 95% Confidence Interval: 25.5 ± 2.31(.507) => 25.5 ± 1.17 => 24.33 to 26.67 cm No:66 5/5/2009
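
The numbers in Step 2 can be reproduced from the raw data. The sketch below uses the standard-library `statistics` module for the mean and sample standard deviation, and takes the multiplier t* = 2.31 directly from the slide rather than computing it.

```python
import math
import statistics

# Forearm lengths (cm) from Example 12.5
lengths = [25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0]

n = len(lengths)
mean = statistics.mean(lengths)
s = statistics.stdev(lengths)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)           # standard error of the mean

t_star = 2.31                   # from the slide (Table A.2, df = n - 1 = 8)
margin = t_star * se
low, high = mean - margin, mean + margin

print(f"mean = {mean:.1f}, s.e. = {se:.3f}, margin = {margin:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f} cm")
```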

Step 3: interpreting the confidence interval. Based on this sample, we are 95% confident that the mean forearm length in the population of men is somewhere between 24.33 and 26.67 cm. No:67 5/5/2009

Example 12.6 What Students Sleep More? (p. 408)
Q: How many hours of sleep did you get last night, to the nearest half hour?
Class                      N    Mean  StDev  SE Mean
Stat 10 (stat literacy)    25   7.66  1.34   0.27
Stat 13 (stat methods)     148  6.81  1.73   0.14
Step 1: checking the conditions. A bell shape was reasonable for Stat 10 (the class with the smaller n). No:68 5/5/2009

Step2: calculating the confidence interval No:69 5/5/2009
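
A minimal sketch of the Step 2 calculation, using the summary statistics from the table in Example 12.6. The t* multipliers are assumptions taken from a standard t table for 95% confidence (df = 24 gives about 2.06, df = 147 gives about 1.98); they are not stated in this transcript.

```python
# (n, sample mean, standard error of the mean) from Example 12.6
classes = {
    "Stat 10": (25, 7.66, 0.27),
    "Stat 13": (148, 6.81, 0.14),
}

# Assumed 95% multipliers from a standard t table (df = n - 1)
t_star = {"Stat 10": 2.06, "Stat 13": 1.98}

intervals = {}
for name, (n, mean, se) in classes.items():
    margin = t_star[name] * se
    intervals[name] = (mean - margin, mean + margin)
    print(f"{name}: {mean} +/- {margin:.2f} "
          f"=> {intervals[name][0]:.2f} to {intervals[name][1]:.2f} hours")

# The intervals do not overlap if the higher one starts above the lower one's end.
no_overlap = intervals["Stat 13"][1] < intervals["Stat 10"][0]
print("intervals do not overlap:", no_overlap)
```

Under these assumed multipliers the Stat 10 interval is the wider of the two and sits entirely above the Stat 13 interval, though only by a narrow gap.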

Step 3: interpreting the confidence intervals Interval for Stat 10 is wider (smaller sample size) Two intervals do not overlap => Stat 10 average significantly higher than Stat 13 average. No:70 5/5/2009

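Interval estimation of a population mean (worked example). [Example] A food manufacturer mainly produces bagged food. To monitor product quality, its quality inspection department regularly samples the output to check whether the weight of each bag meets requirements. From one day's production, 25 bags were selected at random and weighed, with the results shown below. The weight is known to be normally distributed with a population standard deviation of 10 g. Estimate the confidence interval for the mean weight of this batch at the 95% confidence level. Weights of the 25 bags (g): 112.5 101.0 103.0 102.0 100.5 102.6 107.5 95.0 108.8 115.6 100.0 123.5 101.6 102.2 116.6 95.4 97.8 108.6 105.0 136.8 102.8 101.5 98.4 93.3 No:71 5/5/2009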

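Interval estimation of a population mean (worked example). Solution: Given $X \sim N(\mu, 10^2)$, n = 25, $1-\alpha$ = 95%, $z_{\alpha/2}$ = 1.96. With $\bar{x}$ computed from the sample data, the confidence interval for the population mean at confidence level $1-\alpha$ is $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. The confidence interval for the mean weight of the food is 101.44 g to 109.28 g. No:72 5/5/2009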

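Interval estimation of a population mean (worked example). [Example] An insurance company collected a random sample of 36 policyholders and recorded the age (in full years) of each, as shown below. Construct a 90% confidence interval for the age of policyholders. Ages of the 36 policyholders: 23 35 39 27 36 44 42 46 43 31 33 53 45 54 47 24 34 28 40 49 38 48 50 32 No:73 5/5/2009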

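Interval estimation of a population mean (worked example). Solution: Given n = 36, $1-\alpha$ = 90%, $z_{\alpha/2}$ = 1.645. With $\bar{x}$ and s computed from the sample data, the confidence interval for the population mean at confidence level $1-\alpha$ is $\bar{x} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}}$. The confidence interval for the mean age of policyholders is 37.37 to 41.63 years. No:74 5/5/2009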

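Interval estimation of a population mean (worked example). [Example] The lifetime of a certain type of light bulb is known to be normally distributed. A random sample of 16 bulbs was drawn from a batch and their lifetimes (hours) were measured, as shown below. Construct a 95% confidence interval for the mean lifetime of this batch of bulbs. Lifetimes of the 16 bulbs: 1510 1520 1480 1500 1450 1490 1530 1460 1470 No:75 5/5/2009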

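Interval estimation of a population mean (worked example). Solution: Given $X \sim N(\mu, \sigma^2)$, n = 16, $1-\alpha$ = 95%, $t_{\alpha/2}$ = 2.131. With $\bar{x}$ and s computed from the sample data, the confidence interval for the population mean at confidence level $1-\alpha$ is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$. The confidence interval for the mean lifetime of this type of bulb is 1476.8 to 1503.2 hours. No:76 5/5/2009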

Note :Converting Confidence Intervals to Accommodate a Finite Population Mean: or No:77 5/5/2009

Section 12.4.2 (p. 409): Confidence interval for the mean difference in paired variables No:78 5/5/2009

Paired Data: A Special Case of One Mean. Paired data (or paired samples) arise when pairs of variables are collected. We are only interested in the population (and sample) of differences, not in the original data. Paired data typically arise in one of three ways:
(1) Each person is measured twice. Two measurements of the same characteristic or trait are made under different conditions.
(2) Similar individuals are paired prior to an experiment. Each member of a pair receives a different treatment. The same response variable is measured for all individuals.
(3) Two different variables are measured for each individual. We are interested in the amount of difference between the two variables. No:79 5/5/2009

Paired Data Confidence Interval. Data: two variables for n individuals or pairs; use the difference d = x1 - x2. Population parameter: $\mu_d$ = mean of the differences for the population = $\mu_1 - \mu_2$. Sample estimate: $\bar{d}$ = sample mean of the differences. Standard deviation and standard error: $s_d$ = standard deviation of the sample of differences; s.e.($\bar{d}$) = $s_d/\sqrt{n}$. Confidence interval for $\mu_d$: $\bar{d} \pm t^* \times$ s.e.($\bar{d}$), where df = n - 1 for the multiplier t*. No:80 5/5/2009

Example 12.7 Screen Time: Computer vs TV p409 Data: Hours spent watching TV and hours spent on computer per week for n = 25 students. Task: Make a 90% CI for the mean difference in hours spent using computer versus watching TV. Note: Boxplot shows no obvious skewness and no outliers. No:81 5/5/2009

Example 12.7 Screen Time: Computer vs TV. Results: The multiplier t* from Table A.2 with df = 24 is t* = 1.71. 90% Confidence Interval: 5.36 ± 1.71(3.05) => 5.36 ± 5.22 => 0.14 to 10.58 hours. Interpretation: We are 90% confident that the average difference between computer usage and television viewing for students represented by this sample is covered by the interval from 0.14 to 10.58 hours per week, with more hours spent on computer usage than on television viewing. No:82 5/5/2009

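Estimating the difference between two population means (matched large samples). Assumptions: two matched large samples ($n_1 \ge 30$ and $n_2 \ge 30$); the paired differences between the observations from the two populations are normally distributed. The confidence interval for the difference $\mu_d = \mu_1 - \mu_2$ at confidence level $1-\alpha$ is $\bar{d} \pm z_{\alpha/2}\dfrac{\sigma_d}{\sqrt{n}}$, where $\bar{d}$ is the mean of the paired differences and $\sigma_d$ is the standard deviation of the paired differences. No:83 5/5/2009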

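Estimating the difference between two population means (matched small samples). Assumptions: two matched small samples ($n_1 < 30$ and $n_2 < 30$); the paired differences between the observations from the two populations are normally distributed. The confidence interval for the difference $\mu_d = \mu_1 - \mu_2$ at confidence level $1-\alpha$ is $\bar{d} \pm t_{\alpha/2}(n-1)\dfrac{s_d}{\sqrt{n}}$. No:84 5/5/2009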

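Estimating the difference between two population means (worked example)
[Example] A random sample of 10 students each took two test papers, A and B, with the results shown in the table below. Construct a 95% confidence interval for the difference $\mu_d = \mu_1 - \mu_2$ between the scores on the two papers.
Scores of the 10 students on the two papers
Student  Paper A  Paper B  Difference d
1        78       71       7
2        63       44       19
3        72       61       11
4        89       84       5
5        91       74       17
6        49       51       -2
7        68       55       13
8        76       60       16
9        85       77       8
10       55       39       16
No:85 5/5/2009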

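Estimating the difference between two population means (worked example). Solution: With $\bar{d}$ and $s_d$ computed from the sample data, the interval $\bar{d} \pm t_{\alpha/2}(n-1)\dfrac{s_d}{\sqrt{n}}$ gives a confidence interval of 6.33 to 15.67 points for the difference between the scores on the two papers. No:86 5/5/2009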
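
The paired-sample interval can be checked from the raw scores. In the sketch below the score lists are a reconstruction of the flattened table (each difference equals Paper A minus Paper B), and the multiplier 2.262 is an assumed value from a standard t table for 95% confidence with df = 9.

```python
import math
import statistics

# Reconstructed scores for students 1..10
paper_a = [78, 63, 72, 89, 91, 49, 68, 76, 85, 55]
paper_b = [71, 44, 61, 84, 74, 51, 55, 60, 77, 39]

# Work only with the paired differences d = A - B
d = [a - b for a, b in zip(paper_a, paper_b)]
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)       # sample standard deviation of the differences
se = s_d / math.sqrt(n)

t_star = 2.262                  # assumed: t table, 95% confidence, df = n - 1 = 9
margin = t_star * se
low, high = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f} points")
```

With these inputs the interval agrees with the 6.33 to 15.67 points reported in the solution.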

Section 3(p411): Confidence interval for the difference between two means (un-pooled data) Independent Samples No:87 5/5/2009

12.5 General CI for the Difference Between Two Means (Independent Samples): $\bar{x}_1 - \bar{x}_2 \pm t^*\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level. The df used depends on whether equal population variances are assumed. No:88 5/5/2009

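Degrees of Freedom (p. 411). The t-distribution is only approximately correct and the df formula is complicated (Welch's approximation): $df \approx \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2}$. Statistical software can use the above approximation, but if the calculation is done by hand, use the conservative df = smaller of n1 - 1 and n2 - 1. No:89 5/5/2009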

Necessary Conditions (p. 412): The two samples must be independent. Either the populations of measurements are both bell-shaped and random samples of any size are measured, or large (n ≥ 30) random samples are measured. No:90 5/5/2009

Example 12.8 Effect of a Stare on Driving p412 Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign; Timed how long (sec) it took driver to proceed from sign to a mark on other side of the intersection. No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7 Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8 Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples. No:91 5/5/2009

Example 12.8 Effect of a Stare on Driving Checking Conditions: Boxplots show … No outliers and no strong skewness. Crossing times in stare group generally faster and less variable. No:92 5/5/2009

Example 12.8 Effect of a Stare on Driving Note: The df = 21 was reported by the computer package based on the Welch’s approximation formula. The 95% confidence interval for the difference between the population means is 0.14 seconds to 1.93 seconds . No:93 5/5/2009
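
The reported df and interval can be reproduced from the raw crossing times. The sketch below computes the unpooled standard error and Welch's approximate df, then uses t* = 2.08 as an assumed 95% multiplier from a standard t table for df = 21.

```python
import math
import statistics

# Crossing times (seconds) from Example 12.8
no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

n1, n2 = len(no_stare), len(stare)
v1 = statistics.variance(no_stare) / n1   # s1^2 / n1
v2 = statistics.variance(stare) / n2      # s2^2 / n2

diff = statistics.mean(no_stare) - statistics.mean(stare)
se = math.sqrt(v1 + v2)                   # unpooled standard error

# Welch's approximation for the degrees of freedom
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_star = 2.08                             # assumed: t table, 95%, df = 21
low, high = diff - t_star * se, diff + t_star * se

print(f"difference = {diff:.2f}, s.e. = {se:.3f}, Welch df = {welch_df:.1f}")
print(f"95% CI: {low:.2f} to {high:.2f} seconds")
```

Rounding the Welch df down gives the 21 reported by the computer package. The conservative by-hand choice, the smaller of n1 - 1 and n2 - 1, would be 12 and would give a somewhat wider interval.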

Section 4(p414): Confidence interval for the difference between two means ( pooled data) Independent Samples No:94 5/5/2009

Equal Variance Assumption (p. 414). It is often reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The estimate of this variance based on the combined or "pooled" data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation: $s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$ No:95 5/5/2009

Pooled Standard Error: pooled s.e.($\bar{x}_1 - \bar{x}_2$) = $s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$. Note: Pooled df = (n1 - 1) + (n2 - 1) = (n1 + n2 - 2). No:96 5/5/2009

Pooled Confidence Interval. Pooled CI for the Difference Between Two Means (Independent Samples): $\bar{x}_1 - \bar{x}_2 \pm t^*\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where t* is found using a t-distribution with df = (n1 + n2 - 2) and $s_p$ is the pooled standard deviation. No:97 5/5/2009

Example 12.9 Male and Female Sleep Times p415 Q: How much difference is there between how long female and male students slept the previous night? Data: The 83 female and 65 male responses from students in an intro stat class. Task: Make a 95% CI for the difference between the two population means sleep hours for females versus males. Note: We will assume equal population variances. No:98 5/5/2009

Example 12.9 Male and Female Sleep Times
Two-sample T for sleep [with "Assume Equal Variance" option]
Sex      N   Mean  StDev  SE Mean
Female   83  7.02  1.75   0.19
Male     65  6.55  1.68   0.21
Difference = mu (Female) - mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62  P = 0.108  DF = 146
Both use Pooled StDev = 1.72
Notes: The two sample standard deviations are very similar. The sample mean for females is higher than for males. The 95% confidence interval contains 0, so we cannot rule out that the population means may be equal. No:99 5/5/2009

Example 12.9 Male and Female Sleep Times Pooled standard deviation and pooled standard error “by-hand”: No:100 5/5/2009
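
The "by-hand" pooled calculation can be sketched from the summary statistics in the software output. The multiplier 1.976 is an assumed 95% value from a standard t table for df = 146. Because the inputs are rounded to two decimals, the endpoints can differ from the software output in the third decimal.

```python
import math

# Summary statistics from the software output in Example 12.9
n1, s1 = 83, 1.75      # females
n2, s2 = 65, 1.68      # males
diff = 0.461           # estimate for the difference, as reported

df = n1 + n2 - 2
# Pooled standard deviation: weighted average of the two sample variances
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
# Pooled standard error of the difference between the sample means
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)

t_star = 1.976         # assumed: t table, 95% confidence, df = 146
low, high = diff - t_star * se_pooled, diff + t_star * se_pooled

print(f"df = {df}, pooled s = {sp:.2f}, pooled s.e. = {se_pooled:.3f}")
print(f"t-value = {diff / se_pooled:.2f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
```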

Pooled or Unpooled? P416-417 If sample sizes are equal, the pooled and unpooled standard errors are equal. If sample standard deviations similar, assumption of equal population variance is reasonable and pooled procedure can be used. If sample sizes are very different, pooled test can be quite misleading unless sample standard deviations are similar. If the smaller standard deviation accompanies the larger sample size, we do not recommend using the pooled procedure. If sample sizes are very different, the standard deviations are similar, and the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative. No:101 5/5/2009

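Estimating the difference between two population means (large samples). 1. Assumptions: both populations are normally distributed with $\sigma_1^2$ and $\sigma_2^2$ known; if they are not normal, the normal approximation can be used ($n_1 \ge 30$ and $n_2 \ge 30$); the two samples are independent random samples. 2. Use the normal-distribution statistic $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$. No:102 5/5/2009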

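Estimating the difference between two population means (large samples). 1. When $\sigma_1^2$ and $\sigma_2^2$ are known, the confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. 2. When $\sigma_1^2$ and $\sigma_2^2$ are unknown, the confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$. No:103 5/5/2009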

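Estimating the difference between two population means (small samples, $\sigma_1^2 = \sigma_2^2$). 1. Assumptions: both populations are normally distributed; the two population variances are unknown but equal, $\sigma_1^2 = \sigma_2^2$; the two samples are independent small samples ($n_1 < 30$ and $n_2 < 30$). 2. Pooled estimator of the population variance: $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$. 3. Sampling standard deviation of the estimator $\bar{x}_1 - \bar{x}_2$: $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. No:104 5/5/2009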

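Estimating the difference between two population means (small samples, $\sigma_1^2 = \sigma_2^2$). Standardizing the difference between the two sample means: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{1/n_1 + 1/n_2}} \sim t(n_1 + n_2 - 2)$. The confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2) \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. No:105 5/5/2009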

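Estimating the difference between two population means (small samples, $\sigma_1^2 \ne \sigma_2^2$). 1. Assumptions: both populations are normally distributed; the two population variances are unknown and unequal, $\sigma_1^2 \ne \sigma_2^2$; the two samples are independent small samples ($n_1 < 30$ and $n_2 < 30$). 2. Use the statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \sim t(v)$. No:106 5/5/2009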

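Estimating the difference between two population means (small samples, $\sigma_1^2 \ne \sigma_2^2$). The confidence interval for $\mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(v) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, with degrees of freedom $v = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$. No:107 5/5/2009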

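Estimating the difference between two population means (matched large samples). Assumptions: two matched large samples ($n_1 \ge 30$ and $n_2 \ge 30$); the paired differences between observations from the two populations are normally distributed. The confidence interval for $\mu_d = \mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $\bar{d} \pm z_{\alpha/2} \frac{\sigma_d}{\sqrt{n}}$, where $\bar{d}$ is the mean of the paired differences and $\sigma_d$ is the standard deviation of the paired differences. No:108 5/5/2009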

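Estimating the difference between two population means (matched small samples). Assumptions: two matched small samples ($n_1 < 30$ and $n_2 < 30$); the paired differences between observations from the two populations are normally distributed. The confidence interval for $\mu_d = \mu_1 - \mu_2$ at confidence level $1 - \alpha$ is $\bar{d} \pm t_{\alpha/2}(n - 1) \frac{s_d}{\sqrt{n}}$. No:109 5/5/2009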

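Estimating the difference between two population means (worked example). [Example] A regional education board wants to estimate the difference between the mean English scores of students at two secondary schools on the college entrance examination. Independent random samples were drawn from the two schools; the data are shown in the table. Construct a 95% confidence interval for the difference between the two schools' mean English scores.
Data for the two samples
School 1     School 2
n1 = 46      n2 = 33
S1 = 5.8     S2 = 7.2
No:110 5/5/2009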

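Estimating the difference between two population means (worked example). Solution: the confidence interval for the difference between the two population means at confidence level $1 - \alpha$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$. The confidence interval for the difference between the two schools' mean English scores is 5.03 to 10.97 points. No:111 5/5/2009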
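The margin of error in this example can be checked with a short sketch. The sample sizes and standard deviations come from the example's table; the point estimate of 8 points is an assumption inferred from the midpoint of the reported interval (5.03 to 10.97), since the sample means are not given here. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_mean_ci_large(diff, s1, n1, s2, n2, conf=0.95):
    """Large-sample z interval for the difference between two means."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{alpha/2}
    margin = z * math.sqrt(s1**2 / n1 + s2**2 / n2)   # z * standard error
    return diff - margin, diff + margin

# Assumption: diff = 8 is the midpoint of the reported interval
lo, hi = two_mean_ci_large(diff=8.0, s1=5.8, n1=46, s2=7.2, n2=33)
print(f"95% CI: {lo:.2f} to {hi:.2f} points")
```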

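Estimating the difference between two population means (worked example). [Example] To estimate the difference in the time needed to assemble a product by two methods, 12 workers were randomly assigned to each of the two assembly methods. The time (in minutes) each worker needed to assemble one product is shown in the table. Assume the assembly times for both methods are normally distributed with equal variances. Construct a 95% confidence interval for the difference between the mean assembly times of the two methods.
Assembly times for the two methods (minutes)
Method 1: 28.3  36.0  30.1  37.2  29.0  38.5  37.6  34.4  32.1  28.0  28.8  30.0
Method 2: 27.6  31.7  22.2  26.0  31.0  32.0  33.8  31.2  20.0  33.4  30.2  26.5
No:112 5/5/2009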

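Estimating the difference between two population means (worked example). Solution: from the sample data, $\bar{x}_1 = 32.5$, $s_1^2 = 15.996$, $\bar{x}_2 = 28.8$, $s_2^2 = 19.358$. The pooled estimator is $s_p^2 = \frac{(12 - 1) \times 15.996 + (12 - 1) \times 19.358}{12 + 12 - 2} = 17.677$. The confidence interval for the difference between the mean assembly times of the two methods is 0.14 to 7.26 minutes. No:113 5/5/2009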
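The pooled small-sample interval for this example can be reproduced with a short sketch. The split of the 24 times into the two methods is a reading of the flattened table (an assumption, though it reproduces the reported interval), and `t_crit` is an assumed t-table value for 22 degrees of freedom.

```python
import math
from statistics import mean, variance

# Assumption: assignment of times to methods as read from the flattened table
method1 = [28.3, 36.0, 30.1, 37.2, 29.0, 38.5, 37.6, 34.4, 32.1, 28.0, 28.8, 30.0]
method2 = [27.6, 31.7, 22.2, 26.0, 31.0, 32.0, 33.8, 31.2, 20.0, 33.4, 30.2, 26.5]

n1, n2 = len(method1), len(method2)

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 = ((n1 - 1) * variance(method1) + (n2 - 1) * variance(method2)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Assumption: t_{0.025}(22) taken from a t table
t_crit = 2.0739
diff = mean(method1) - mean(method2)
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"pooled variance = {sp2:.3f}; 95% CI: {lo:.2f} to {hi:.2f} minutes")
```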

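Estimating the difference between two population means (worked example). [Example] Continuing the previous example, suppose 12 workers are randomly assigned to the first method and 8 workers to the second, so that n1 = 12 and n2 = 8; the data are shown in the table. Assume the assembly times for both methods are normally distributed with unequal variances. Construct a 95% confidence interval for the difference between the mean assembly times of the two methods.
Assembly times for the two methods (minutes)
Method 1: 28.3  36.0  30.1  37.2  29.0  38.5  37.6  34.4  32.1  28.0  28.8  30.0
Method 2: 27.6  31.7  22.2  26.5  31.0  33.8  20.0  30.2
No:114 5/5/2009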

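Estimating the difference between two population means (worked example). Solution: from the sample data, $\bar{x}_1 = 32.5$, $s_1^2 = 15.996$, $\bar{x}_2 = 27.875$, $s_2^2 = 23.014$. The degrees of freedom are $v = \frac{\left( \frac{15.996}{12} + \frac{23.014}{8} \right)^2}{\frac{(15.996/12)^2}{12 - 1} + \frac{(23.014/8)^2}{8 - 1}} = 13.188 \approx 13$. No:115 5/5/2009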

The confidence interval for the difference between the mean assembly times of the two methods is 0.192 to 9.058 minutes. No:116 5/5/2009
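The unequal-variance calculation can be sketched as follows. The assignment of the 20 times to the two methods is a reading of the flattened table (an assumption, though it reproduces the reported interval), and `t_crit` is an assumed t-table value for 13 degrees of freedom.

```python
import math
from statistics import mean, variance

# Assumption: assignment of times to methods as read from the flattened table
method1 = [28.3, 36.0, 30.1, 37.2, 29.0, 38.5, 37.6, 34.4, 32.1, 28.0, 28.8, 30.0]
method2 = [27.6, 31.7, 22.2, 26.5, 31.0, 33.8, 20.0, 30.2]

n1, n2 = len(method1), len(method2)
v1 = variance(method1) / n1   # s1^2 / n1
v2 = variance(method2) / n2   # s2^2 / n2

# Approximate degrees of freedom for the unequal-variance t interval
dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

se = math.sqrt(v1 + v2)
t_crit = 2.1604               # assumption: t_{0.025}(13) from a t table
diff = mean(method1) - mean(method2)
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"df = {dof:.3f}; 95% CI: {lo:.3f} to {hi:.3f} minutes")
```

Rounding the degrees of freedom to 13 before looking up the critical value follows the slide's own solution.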

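Estimating the difference between two population means (worked example). [Example] A random sample of 10 students took both test paper A and test paper B; the results are shown in the table. Construct a 95% confidence interval for the difference between the scores on the two papers, $\mu_d = \mu_1 - \mu_2$.
Scores of 10 students on the two test papers
Student   Paper A   Paper B   Difference d
1         78        71        7
2         63        44        19
3         72        61        11
4         89        84        5
5         91        74        17
6         49        51        -2
7         68        55        13
8         76        60        16
9         85        77        8
10        55        39        16
No:117 5/5/2009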

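Estimating the difference between two population means (worked example). Solution: from the sample data, $\bar{d} = 11$ and $s_d = 6.53$. The confidence interval for the difference between the scores on the two papers is 6.33 to 15.67 points. No:118 5/5/2009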
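The paired-sample interval can be checked with a short sketch. The differences are as read from the example's table; the values for students 9 and 10 (8 and 16) are inferred so that the mean difference matches the midpoint of the reported interval, and should be treated as assumptions. `t_crit` is an assumed t-table value for 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Paired differences d = A - B (last two values are inferred; see lead-in)
d = [7, 19, 11, 5, 17, -2, 13, 16, 8, 16]

n = len(d)
d_bar = mean(d)            # mean of the paired differences
s_d = stdev(d)             # sample standard deviation of the differences
se = s_d / math.sqrt(n)

t_crit = 2.2622            # assumption: t_{0.025}(9) from a t table
lo, hi = d_bar - t_crit * se, d_bar + t_crit * se
print(f"d_bar = {d_bar}, s_d = {s_d:.2f}; 95% CI: {lo:.2f} to {hi:.2f} points")
```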

Section 5 (p418): Difference between two proportions (independent samples) No:119 5/5/2009

12.6 The Difference Between Two Proportions (Indep) A CI for the Difference Between Two Proportions (Independent Samples): $\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$, where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level. No:120 5/5/2009

Necessary Conditions Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations. Condition 2: All of the quantities $n_1 \hat{p}_1$, $n_1 (1 - \hat{p}_1)$, $n_2 \hat{p}_2$, and $n_2 (1 - \hat{p}_2)$ are at least 5 and preferably at least 10. No:121 5/5/2009

Example 12.10 Snoring and Heart Attacks p419 Q: Is there a relationship between snoring and risk of heart disease? Data: Of 1105 snorers, 86 had heart disease. Of 1379 nonsnorers, 24 had heart disease. No:122 5/5/2009

Example 12.10 Snoring and Heart Attacks Note: the higher the level of confidence, the wider the interval. It appears that the proportion of snorers with heart disease in the population is about 4% to 8% higher than the proportion of nonsnorers with heart disease. Risk of heart disease for snorers is about 4.5 times what the risk is for nonsnorers. No:123 5/5/2009
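The figures quoted for Example 12.10 can be reproduced from the counts with a minimal sketch. The 95% confidence level is an assumption (the level used on the slide is not stated here), and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from Example 12.10
x1, n1 = 86, 1105    # snorers with heart disease, total snorers
x2, n2 = 24, 1379    # nonsnorers with heart disease, total nonsnorers

p1, p2 = x1 / n1, x2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumption: 95% confidence level
z_star = NormalDist().inv_cdf(0.975)
lo, hi = (p1 - p2) - z_star * se, (p1 - p2) + z_star * se

relative_risk = p1 / p2
print(f"difference = {p1 - p2:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
print(f"relative risk = {relative_risk:.1f}")
```

At 95% confidence the interval runs from roughly 4% to 8%, and the ratio of the two sample proportions is about 4.5, in line with the slide's statements.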

Example 12.11 How often do you wear a seatbelt when driving a car? 747---1443 677---1572 No:124 5/5/2009

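Estimating the difference between two population proportions (worked example). [Example] In a viewership survey for a television program, 400 people were randomly surveyed in rural areas, of whom 32% had watched the program; 500 people were randomly surveyed in urban areas, of whom 45% had watched it. Construct a 95% confidence interval for the difference between the urban and rural viewing rates. No:125 5/5/2009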

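Estimating the difference between two population proportions (worked example). Solution: given n1 = 500, n2 = 400, p1 = 45%, p2 = 32%, $1 - \alpha = 95\%$, $z_{\alpha/2} = 1.96$, the 95% confidence interval for $\pi_1 - \pi_2$ is $(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} = 13\% \pm 6.32\%$. The confidence interval for the difference between the urban and rural viewing rates is 6.68% to 19.32%. No:126 5/5/2009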
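The arithmetic of this solution can be sketched in a few lines, using the sample sizes and proportions given in the example and the 95% level used in the solution. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_ci(p1, n1, p2, n2, conf=0.95):
    """Large-sample z interval for the difference between two proportions."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

# Urban sample first (n1 = 500, 45%), rural sample second (n2 = 400, 32%)
lo, hi = two_proportion_ci(0.45, 500, 0.32, 400)
print(f"95% CI: {lo:.2%} to {hi:.2%}")
```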

In summary p424 No:127 5/5/2009

Note: 12.1 Examples of Different Estimation Situations p392 Situation 1. Estimating the proportion falling into a category of a categorical variable. Example research questions: What proportion of American adults believe there is extraterrestrial life? In what proportion of British marriages is the wife taller than her husband? Population parameter: p = proportion in the population falling into that category. Sample estimate: $\hat{p}$ = proportion in the sample falling into that category. No:128 5/5/2009

More Estimation Situations Situation 2. Estimating the mean of a quantitative variable. Example research questions: What is the mean time that college students watch TV per day? What is the mean pulse rate of women? Population parameter: $\mu$ (spelled “mu” and pronounced “mew”) = population mean for the variable. Sample estimate: $\bar{x}$ = the sample mean for the variable. No:129 5/5/2009

More Estimation Situations Situation 3. Estimating the difference between two populations with regard to the proportion falling into a category of a qualitative variable. Example research questions: How much difference is there between the proportions that would quit smoking if taking the antidepressant bupropion (Zyban) versus if wearing a nicotine patch? How much difference is there between men who snore and men who don’t snore with regard to the proportion who have heart disease? Population parameter: p1 – p2 = difference between the two population proportions. Sample estimate: $\hat{p}_1 - \hat{p}_2$ = difference between the two sample proportions. No:130 5/5/2009

More Estimation Situations Situation 4. Estimating the difference between two populations with regard to the mean of a quantitative variable. Example research questions: How much difference is there in average weight loss for those who diet compared to those who exercise to lose weight? How much difference is there between the mean foot lengths of men and women? Population parameter: $\mu_1 - \mu_2$ = difference between the two population means. Sample estimate: $\bar{x}_1 - \bar{x}_2$ = difference between the two sample means. No:131 5/5/2009

Note: Independent Samples. Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample. This arises when: (1) random samples are taken separately from two populations and the same response variable is recorded; (2) one random sample is taken and a variable recorded, but units are categorized to form two populations; (3) participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded. No:132 5/5/2009

Note: Paired Data: A Special Case of One Mean. Paired data (or paired samples): pairs of variables are collected, and we are interested only in the population (and sample) of differences, not in the original data. This arises when: (1) each person is measured twice: two measurements of the same characteristic or trait are made under different conditions; (2) similar individuals are paired prior to an experiment: each member of a pair receives a different treatment, and the same response variable is measured for all individuals; (3) two different variables are measured for each individual: the interest is in the amount of difference between the two variables. No:133 5/5/2009

Note: 12.2 Standard Errors Rough Definition: The standard error of a sample statistic measures, roughly, the average difference between the statistic and the population parameter. This “average difference” is over all possible random samples of a given size that can be taken from the population. Technical Definition: The standard error of a sample statistic is the estimated standard deviation of the sampling distribution for the statistic. No:134 5/5/2009

Standard Error of a Sample Proportion Example 12.1 Intelligent Life on Other Planets Poll: Random sample of 935 Americans. Do you think there is intelligent life on other planets? Results: 60% of the sample said “yes”, $\hat{p}$ = .60. So, s.e.($\hat{p}$) = $\sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} = \sqrt{\frac{.6 (.4)}{935}}$ = .016. The standard error of .016 is roughly the average difference between the statistic, $\hat{p}$, and the population parameter, p, for all possible random samples of n = 935 from this population. No:135 5/5/2009

Standard Error of a Sample Mean Example 12.2 Mean Hours Watching TV Poll: Class of 175 students. In a typical day, about how much time do you spend watching television?
Variable   N     Mean   Median   TrMean   StDev   SE Mean
TV         175   2.09   2.000    1.950    1.644   0.124
No:136 5/5/2009

Standard Error of the Difference Between Two Sample Proportions Example 12.3 Patches vs Antidepressant (Zyban)? Study: n1 = n2 = 244 randomly assigned to each treatment. Zyban: 85 of the 244 Zyban users quit smoking, $\hat{p}_1$ = .348. Patch: 52 of the 244 patch users quit smoking, $\hat{p}_2$ = .213. So, s.e.($\hat{p}_1 - \hat{p}_2$) = $\sqrt{\frac{.348 (1 - .348)}{244} + \frac{.213 (1 - .213)}{244}} \approx .040$. No:137 5/5/2009

Standard Error of the Difference Between Two Sample Means Example 12.4 Lose More Weight by Diet or Exercise? Study: n1 = 42 men on diet, n2 = 47 men on exercise routine. Diet: Lost an average of 7.2 kg with std dev of 3.7 kg. Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg. So, s.e.($\bar{x}_1 - \bar{x}_2$) = $\sqrt{\frac{3.7^2}{42} + \frac{3.9^2}{47}} \approx 0.81$ kg. No:138 5/5/2009
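The four standard errors in Examples 12.1 to 12.4 can be computed with a short sketch, using only the numbers stated in those examples. The helper function names are illustrative, not taken from the text.

```python
import math

def se_proportion(p_hat, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(s, n):
    """Standard error of a sample mean."""
    return s / math.sqrt(n)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_diff_means(s1, n1, s2, n2):
    """(Unpooled) standard error of the difference between two sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

se_12_1 = se_proportion(0.60, 935)                    # Example 12.1
se_12_2 = se_mean(1.644, 175)                         # Example 12.2
se_12_3 = se_diff_proportions(0.348, 244, 0.213, 244) # Example 12.3
se_12_4 = se_diff_means(3.7, 42, 3.9, 47)             # Example 12.4

for label, value in [("12.1", se_12_1), ("12.2", se_12_2),
                     ("12.3", se_12_3), ("12.4", se_12_4)]:
    print(f"Example {label}: s.e. = {value:.3f}")
```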

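Note: Determining the sample size. 1. Sample size for estimating a population mean. 2. Sample size for estimating a population proportion. 3. Sample size for estimating the difference between two population means. 4. Sample size for estimating the difference between two population proportions. No:139 5/5/2009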

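Sample size for estimating a population mean. The sample size n needed to estimate a population mean is $n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}$, where $E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is the margin of error. The sample size n is related to the population variance $\sigma^2$, the margin of error E, and the reliability coefficient z (or t) as follows: n is proportional to the population variance, inversely proportional to the square of the margin of error, and proportional to the square of the reliability coefficient. No:140 5/5/2009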

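Sample size for estimating a population mean (worked example). [Example] The standard deviation of annual salaries of university graduates with a bachelor's degree in business administration is about 2000 yuan. Suppose we want a 95% confidence interval for the mean annual salary with a margin of error of 400 yuan. How large a sample should be drawn? No:141 5/5/2009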

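Sample size for estimating a population mean (worked example). Solution: given $\sigma = 2000$, E = 400, $1 - \alpha = 95\%$, $z_{\alpha/2} = 1.96$, the required sample size is $n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} = \frac{1.96^2 \times 2000^2}{400^2} = 96.04 \approx 97$. That is, a sample of 97 people should be drawn. No:142 5/5/2009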
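The sample-size calculation in this example can be sketched as follows, rounding up to the next whole person as the solution does. The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf=0.95):
    """Smallest n whose margin of error for a mean is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)

n = sample_size_for_mean(sigma=2000, margin=400)
print(n)
```

Rounding up rather than to the nearest integer keeps the margin of error no larger than the target.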

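Sample size for estimating a population proportion. From the interval-estimation formula for a proportion, the sample size is $n = \frac{(z_{\alpha/2})^2 \, \pi (1 - \pi)}{E^2}$, where $E = z_{\alpha/2} \sqrt{\frac{\pi (1 - \pi)}{n}}$. E is generally chosen to be less than 0.1. When $\pi$ is unknown, use $\pi = 0.5$, which gives the largest value of $\pi (1 - \pi)$. No:143 5/5/2009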

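Sample size for estimating the difference between two population means. Let n1 and n2 be the sizes of the samples from the two populations, and assume n1 = n2. Then $n_1 = n_2 = \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{E^2}$, where E is the margin of error. No:144 5/5/2009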

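Sample size for estimating the difference between two population means (worked example). [Example] The academic affairs office of a secondary school wants a confidence interval for the difference between the mean examination scores of an experimental class and a regular class. The required confidence level is 95%, and the variances of the two classes' scores are estimated in advance as $\sigma_1^2 = 90$ for the experimental class and $\sigma_2^2 = 120$ for the regular class. If the margin of error must not exceed 5 points, how many students should be sampled from each class? No:145 5/5/2009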

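Sample size for estimating the difference between two population means (worked example). Solution: given $\sigma_1^2 = 90$, $\sigma_2^2 = 120$, E = 5, $1 - \alpha = 95\%$, $z_{\alpha/2} = 1.96$, we have $n_1 = n_2 = \frac{1.96^2 \times (90 + 120)}{5^2} = 32.27 \approx 33$. That is, a sample of 33 students should be drawn from each class. No:146 5/5/2009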
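This calculation can be checked with a minimal sketch that assumes equal sample sizes in the two classes, as the example does. The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_two_means(var1, var2, margin, conf=0.95):
    """Per-group n (with n1 = n2) for a given margin of error on mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * (var1 + var2) / margin**2)

n_per_class = sample_size_two_means(var1=90, var2=120, margin=5)
print(n_per_class)
```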

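Sample size for estimating the difference between two population proportions. Let n1 and n2 be the sizes of the samples from the two populations, and assume n1 = n2. Then $n_1 = n_2 = \frac{(z_{\alpha/2})^2 \left[ \pi_1 (1 - \pi_1) + \pi_2 (1 - \pi_2) \right]}{E^2}$, where E is the margin of error. No:147 5/5/2009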

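Sample size for estimating the difference between two population proportions (worked example). [Example] A bottled-beverage manufacturer wants to estimate the effect of advertising on customer awareness of a new drink. Before and after the advertising campaign, he draws a random sample of consumers from the marketing area and asks whether they have heard of the new drink. The manufacturer wants to estimate the difference between the proportions of consumers aware of the drink before and after advertising, with a margin of error of 10% and a confidence level of 95%. How many people should each of the two samples include? (Assume the two sample sizes are equal.) No:148 5/5/2009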

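Sample size for estimating the difference between two population proportions (worked example). Solution: E = 10%, $1 - \alpha = 95\%$, $z_{\alpha/2} = 1.96$. Since there is no information about $\pi$, use 0.5 in its place: $n_1 = n_2 = \frac{1.96^2 \times [0.5 (1 - 0.5) + 0.5 (1 - 0.5)]}{0.1^2} = 192.08 \approx 193$. That is, a sample of 193 consumers should be drawn each time. No:149 5/5/2009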
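A minimal sketch of this calculation follows, assuming equal sample sizes and defaulting both proportions to 0.5 when nothing is known about them, as the solution does. The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(margin, pi1=0.5, pi2=0.5, conf=0.95):
    """Per-sample n (with n1 = n2) for a given margin of error on pi1 - pi2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * (pi1 * (1 - pi1) + pi2 * (1 - pi2)) / margin**2)

n_per_sample = sample_size_two_proportions(margin=0.10)
print(n_per_sample)
```

Using 0.5 for both proportions is the conservative choice because it maximizes each variance term.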

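Note: Interval estimation of the ratio of two population variances. 1. To compare two population variances, use the ratio of the two sample variances: if $S_1^2 / S_2^2$ is close to 1, the two population variances are close to each other; if $S_1^2 / S_2^2$ is far from 1, the two population variances differ. 2. The confidence interval for the population variance ratio at confidence level $1 - \alpha$ is $\frac{s_1^2 / s_2^2}{F_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2 / s_2^2}{F_{1 - \alpha/2}}$. No:150 5/5/2009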

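Interval estimation of the ratio of two population variances (diagram). [Figure: schematic of the confidence interval for the variance ratio, showing the F distribution with the critical values $F_{1 - \alpha/2}$ and $F_{\alpha/2}$ and the $1 - \alpha$ confidence interval for the population variance ratio.] In this diagram, do the populations have equal or unequal variances? Unequal. No:151 5/5/2009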

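Interval estimation of the ratio of two population variances (worked example). [Example] To study the difference in living expenses (yuan) between male and female students, 25 male students and 25 female students were randomly sampled at a university, with the following results: Male students: Female students: Construct a 90% confidence interval for the ratio of the variances of male and female students' living expenses. No:152 5/5/2009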

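Interval estimation of the ratio of two population variances (worked example). Solution: with degrees of freedom $n_1 - 1 = 25 - 1 = 24$ and $n_2 - 1 = 25 - 1 = 24$, the table gives $F_{\alpha/2}(24, 24) = 1.98$ and $F_{1 - \alpha/2}(24, 24) = 1/1.98 = 0.505$. The 90% confidence interval for $\sigma_1^2 / \sigma_2^2$, the ratio of the variances of male and female students' living expenses, is 0.47 to 1.84. No:153 5/5/2009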


Supplement: graduate entrance examination problems No:159 5/5/2009