Population proportion and sample proportion

Presentation on theme: "Population proportion and sample proportion"— Presentation transcript:

1 Population proportion and sample proportion
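Many everyday surveys simply ask whether respondents approve of or support something, and then compute the proportion of the counts of "approve" and "oppose" answers. This chapter introduces statistical methods for inference about a single proportion. The next chapter covers inference about the distribution of a set of proportions. Social Statistics (I) ©蘇國賢 2007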

2 Population proportion and sample proportion
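Suppose we want to estimate A-bian's vote share in the presidential election, that is, the proportion of all voters who vote for A-bian. We can use an appropriate sampling method to draw a sample of size n and observe the proportion of the sample that supports A-bian. This gives the support rate in the sample, called the sample proportion. If we know the sampling distribution of the sample proportion (its expected value, variance, and shape), we can use the sample proportion to make inferences about the population proportion.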

3 Sampling Distribution of the Sample Proportion
Let p denote the proportion of items in a population that possess a certain characteristic (e.g., unemployed, income below the poverty level). To estimate p, we take a random sample of n observations from the population and count the number X of items in the sample that possess the characteristic. The sample proportion p^ = X/n is used to estimate the population proportion p.

4 Sampling Distribution of the Sample Proportion
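Definition: Suppose a random trial has only two possible outcomes (X=1 supports A-bian, X=0 does not support A-bian), the population has N members in total (all voters), and K people in the population will vote for A-bian. Then the population proportion supporting A-bian is p = K/N (N = population size, K = total number of A-bian supporters).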

5 Sampling Distribution of the Sample Proportion
Definition: In the last presidential election there were 12,664,393 valid votes (N), of which A-bian received 4,977,697 (K). The population proportion is p = K/N = 39.30%.

6 Sampling Distribution of the Sample Proportion
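Definition: Suppose n elements are drawn at random from the population of size N as a sample, written (X1, X2, …, Xn), and k of the n sampled people support A-bian. The proportion supporting A-bian is called the sample proportion: $\hat{p} = \dfrac{k}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$ (n = sample size, k = number of supporters in the sample). Here k is the sum of the observations with X=1 (supports A-bian) in the sample.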

7 Sampling Distribution of the Sample Proportion
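Definition: Before the election, a polling center surveyed 1500 people (n=1500), of whom 573 supported A-bian (k=573), so the sample proportion is p^ = 573/1500 = 38.2%. The sampling error is p^ − p = 38.2% − 39.30% = −1.1 percentage points. Because each sample draws different respondents, the computed sample proportion varies from sample to sample; the sample proportion is therefore itself a random variable.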
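The definitions on slides 4 to 7 can be checked numerically. The following is a minimal Python sketch using the vote counts and poll figures quoted on the slides; the variable names are illustrative, and the sampling error is assumed to mean "sample proportion minus population proportion".

```python
# Population figures quoted on slide 5 and poll figures quoted on slide 7.
N, K = 12_664_393, 4_977_697   # valid votes, votes for A-bian
n, k = 1500, 573               # sample size, supporters in the sample

p = K / N                      # population proportion p = K/N
p_hat = k / n                  # sample proportion p^ = k/n
sampling_error = p_hat - p     # assumed definition: estimate minus true value

print(f"p = {p:.4f}, p_hat = {p_hat:.4f}, error = {sampling_error:+.4f}")
```

A different sample of 1500 voters would give a different `p_hat`, which is why the sample proportion is treated as a random variable.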

8 The Bernoulli Distribution
Definition: P(X=1) = p, P(X=0) = (1−p). If we let q = 1 − p, then the p.f. of X can be written as follows: $f(x) = p^{x} q^{1-x}$ for $x = 0, 1$.

9 The Bernoulli Distribution
Definition: E(X) = 1·p + 0·q = p (the expected value of X equals the population proportion). E(X²) = Σ x² f(x) = 1²·p + 0²·q = p. Var(X) = E(X²) − [E(X)]² = p − p² = p(1−p) = p·q.
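As a quick check of the Bernoulli mean and variance, the sums can be evaluated directly over the two outcomes. This is a minimal sketch; the function name `bernoulli_moments` and the value p = 0.3 are illustrative choices, not part of the original slides.

```python
def bernoulli_moments(p):
    """Return (mean, variance) of a Bernoulli(p) variable by direct enumeration."""
    q = 1 - p
    pf = {1: p, 0: q}                                 # probability function f(x)
    mean = sum(x * f for x, f in pf.items())          # E(X)
    second = sum(x**2 * f for x, f in pf.items())     # E(X^2)
    return mean, second - mean**2                     # Var(X) = E(X^2) - [E(X)]^2

mean, var = bernoulli_moments(0.3)
print(mean, var)   # mean equals p, variance equals p*q up to rounding
```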

10 Sampling Distribution of the Sample Proportion
The Normal Approximation Rule for Proportion: Let p denote the proportion of a population possessing some characteristic of interest. Take a random sample of n observations from the population. Let X denote the number of items in the sample possessing the characteristic. We estimate the population proportion p by the sample proportion p^ = X/n. If np ≥ 5 and nq ≥ 5, the random variable p^ has approximately a normal distribution with mean $E(\hat{p}) = p$ and variance $\mathrm{Var}(\hat{p}) = pq/n$.

11 Sampling Distribution of the Sample Proportion
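Proof (mean): $E(\hat{p}) = E\!\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i\right) = \dfrac{1}{n}\sum_{i=1}^{n} E(X_i) = \dfrac{1}{n}\,np = p$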

12 Sampling Distribution of the Sample Proportion
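Proof (variance): assume X1, X2, …, Xn are independent. Then $\mathrm{Var}(\hat{p}) = \mathrm{Var}\!\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i\right) = \dfrac{1}{n^{2}}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \dfrac{1}{n^{2}}\,npq = \dfrac{pq}{n}$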
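The mean and variance of the sample proportion can also be verified exactly for a given n and p by summing over the binomial distribution of X = X1 + … + Xn. This is a sketch under the independence assumption stated above; the function name and the values n = 50, p = 0.4 are illustrative.

```python
from math import comb

def sample_proportion_moments(n, p):
    """Exact mean and variance of p^ = X/n when X ~ Binomial(n, p)."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * f for x, f in enumerate(pmf))
    var = sum((x / n - mean) ** 2 * f for x, f in enumerate(pmf))
    return mean, var

mean, var = sample_proportion_moments(n=50, p=0.4)
print(mean, var)   # compare with p = 0.4 and pq/n = 0.0048
```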

13 Sampling Distribution of the Sample Proportion
If the distribution of p^ is approximately normal, then the random variable $Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$ is approximately standard normal.

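14 Example
Suppose 55% of voters will support A-bian in this election. We draw a random sample of n=400 people to predict whether A-bian will win. What is the probability that we predict A-bian will lose, that is, that the sample proportion falls below .5?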
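One way to work this example is with the normal approximation p^ ~ N(p, pq/n). The sketch below assumes "predicting a loss" means observing a sample proportion below 0.5; it uses `statistics.NormalDist` from the Python standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.55, 400                      # true support and sample size from the example
sigma = sqrt(p * (1 - p) / n)         # standard deviation of p^ under the approximation

# P(p^ < 0.5): probability that the sample points to a loss.
prob_predict_loss = NormalDist(mu=p, sigma=sigma).cdf(0.5)
print(f"P(p_hat < 0.5) = {prob_predict_loss:.4f}")
```

Here np = 220 and nq = 180, so the np ≥ 5 and nq ≥ 5 conditions hold comfortably.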

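15 Example
Of your first 15 grandchildren, what is the chance there will be more than 10 boys? (Assume equal probability of male/female.) "More than 10 boys" is equivalent to "the proportion of boys is more than 10/15". Use the Normal Approximation Rule: with p = .5 and n = 15, $P(\hat{p} > 10/15) = P\!\left(Z > \dfrac{10/15 - .5}{\sqrt{(.5)(.5)/15}}\right) = P(Z > 1.29) \approx .0985$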
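The normal approximation in this example can be compared with the exact binomial probability. This is a sketch only: no continuity correction is applied, which is an assumption about how the slide intends the rule to be used.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 15, 0.5
threshold = 10 / n                      # "more than 10 boys" as a proportion

# Normal approximation: p^ ~ N(p, pq/n), without continuity correction.
sigma = sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(mu=p, sigma=sigma).cdf(threshold)

# Exact binomial probability P(X >= 11) for comparison.
exact = sum(comb(n, x) for x in range(11, n + 1)) * p**n

print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```

With n = 15 the approximation is rough; the two values move closer for larger n or when a continuity correction is used.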

16 Confidence intervals for proportions (large samples)
We know that p^ ~ N(p, pq/n), where q = 1 − p (provided np ≧ 5 and nq ≧ 5).

17 Value of Zα
$P(Z \ge z_{\alpha/2}) = \alpha/2$
$P(Z \le -z_{\alpha/2}) = \alpha/2$
$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha$
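The critical value z_{α/2} defined by these probabilities can be looked up with the inverse normal CDF. A minimal sketch using Python's standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}, the value with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z = {z_critical(alpha):.3f}")
```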

18 Confidence intervals for proportions (large samples)

19 Confidence intervals for proportions (large samples)
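Because we have no information about p and q, when the sample is large enough we usually use the sample proportion p^ to estimate the standard error: $\hat{\sigma}_{\hat{p}} = \sqrt{\dfrac{\hat{p}\,\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$.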

20 Confidence interval for the population proportion p
Definition: Let p denote the population proportion. Suppose we take a large random sample of n observations and obtain the sample proportion p^. A confidence interval for the population proportion having level of confidence 100(1−α)% is given by $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}\,\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$.


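22 Wilson estimate
Using the sample proportion in place of the population proportion to estimate the standard error is not always accurate. For example, tossing a coin three times and getting heads all three times gives p^ = 3/3 = 1, so the estimated variance is p^(1−p^)/n = 1(1−1)/3 = 0.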

23 Wilson estimate
We must know the s.d. of the population to get a CI for p, but it depends on the unknown p, so it is estimated from p^. Unfortunately, modern computer studies reveal that confidence intervals based on this approach can be quite inaccurate, even for large samples:
-- When the sample is not an SRS.
-- When the sample size is small.

24 Wilson estimate
The Wilson estimate: add 2 successes and 2 failures, so that the sample proportion is moved slightly away from 0 and 1: $\tilde{p} = \dfrac{X + 2}{n + 4}$.
-- Because this estimate was first suggested by Edwin Bidwell Wilson in 1927, we call it the Wilson estimate.

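25 Wilson estimate
An approximate level C confidence interval for p is $\tilde{p} \pm z^{*}\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}$, where $\tilde{p} = \dfrac{X+2}{n+4}$ and $z^{*}$ is the standard normal critical value for confidence level C.
The margin of error is $m = z^{*}\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}$.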
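The "add 2 successes and 2 failures" rule can be sketched as follows, using the three-heads-in-three-tosses case from slide 22. The function name, the standard error with n + 4 in the denominator, and the clipping of the interval to [0, 1] are assumptions of this sketch rather than details confirmed by the slides.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_interval(x, n, confidence=0.95):
    """Approximate CI for p after adding 2 successes and 2 failures."""
    p_tilde = (x + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return p_tilde, margin, max(0.0, p_tilde - margin), min(1.0, p_tilde + margin)

p_tilde, margin, low, high = plus_four_interval(x=3, n=3)
print(f"p_tilde = {p_tilde:.4f}, margin = {margin:.4f}, CI = ({low:.4f}, {high:.4f})")
```

Unlike the plain estimate p^ = 1, which gives a standard error of 0, the adjusted estimate yields a non-degenerate interval.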

26 Confidence interval for the population proportion p
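Example: The government wants to estimate the proportion of households with monthly income below NT$25,000. 500 households were interviewed, and 200 of them had income below this level. Find the 95% confidence interval for p. p^ = 200/500 = .4, so the interval is .4 ± 1.96 √((.4)(.6)/500) = (.3571, .4429).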
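The large-sample interval p^ ± z_{α/2}·√(p^q^/n) can be computed as in the sketch below, applied to the household example (200 of 500). The function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # estimated standard error times z
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(200, 500)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The same function applies to any example of this form, for instance `proportion_ci(312, 500)`.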

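27 Example
A random sample of 500 people from Taipei City was asked whether they favor the referendum, and 312 said yes. Find the 95% confidence interval for the proportion of Taipei City residents who favor the referendum. p^ = 312/500 = .624, so the confidence interval for p is .624 ± 1.96 √((.624)(.376)/500) = (.5815, .6665).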

28 One-sided confidence intervals for the population proportion
Suppose that we take a random sample of n observations from some population having unknown proportion p. Suppose we wish to find the lower confidence limit LCL such that the probability is (1−α) that p exceeds LCL. The one-sided interval (LCL, 1.00) is a left-sided confidence interval. The LCL is given by: $\mathrm{LCL} = \hat{p} - z_{\alpha}\sqrt{\dfrac{\hat{p}\,\hat{q}}{n}}$

29 One-sided confidence intervals for the population proportion
Construct a right-sided 95% CI for the proportion of defective items produced by a machine if 16 items are found to be defective in a random sample of 100 items. With p^ = .16 and z_.05 = 1.645, the upper limit is .16 + 1.645 √((.16)(.84)/100) = .2203, so the 95% right-sided CI for p is (0, .2203). This means that we can be 95% confident that the population proportion is less than .2203.
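A one-sided limit uses z_α rather than z_{α/2}, because all of α sits in one tail. The sketch below computes the upper confidence limit for the defective-items example, assuming the upper limit mirrors the LCL formula with a plus sign; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def upper_confidence_limit(x, n, confidence=0.95):
    """Upper limit of a right-sided CI (0, UCL) for a population proportion."""
    p_hat = x / n
    z_alpha = NormalDist().inv_cdf(confidence)   # one-sided: z_alpha, about 1.645 at 95%
    return p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)

ucl = upper_confidence_limit(16, 100)
print(f"right-sided 95% CI: (0, {ucl:.4f})")
```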

30 Determining the sample size決定樣本大小
Margin of Error: Suppose that we take a random sample from some population. Then a 100(1−α)% confidence interval for the population proportion extends at most a distance m on each side of the sample proportion if the number of observations is $n = \dfrac{0.25\, z_{\alpha/2}^{2}}{m^{2}}$.

31 Determining the sample size決定樣本大小
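(1) We can use a pilot study to obtain an estimate of p. (2) When the sample proportion is unknown, we can use the most conservative estimate, that is, the largest possible variance, .5 × .5 = .25, to determine n.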

32 Sample size and confidence interval for the proportion
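If the population proportion cannot be estimated, the sample size is: $n = \dfrac{z_{\alpha/2}^{2}}{4m^{2}}$. If the population proportion p can be estimated, the sample size is: $n = \dfrac{z_{\alpha/2}^{2}\, p\,q}{m^{2}}$, where m is the margin of error.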

33 Sample size and confidence interval for the proportion
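A polling organization wants to know a presidential candidate's share of the vote. How large a sample is needed, at minimum, so that with 95% confidence the margin of error does not exceed .03?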

34 Sample size and confidence interval for the proportion
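A polling organization wants to know a presidential candidate's share of the vote. Suppose the organization requires that the error between the sample proportion and the population proportion not exceed 0.01, with 95% confidence. What should the sample size be? Since p is unknown, substitute p = .5: n = (1.96)²(.5)(.5)/(.01)² = 9,604, so at least 9,604 sample points should be drawn.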
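The sample-size rule can be sketched as a small helper that rounds up to the next whole observation. The function name and the default p = 0.5 (the most conservative choice, giving the largest variance) are illustrative.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n for which z_{alpha/2} * sqrt(p*q/n) does not exceed the margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(required_sample_size(0.03))   # margin of error .03 at 95% confidence
print(required_sample_size(0.01))   # margin of error .01 at 95% confidence
```

If a pilot study suggests a value of p away from 0.5, passing it as `p` gives a smaller required n.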

35 Tests of the population proportion
Sampling distribution of the sample proportion f(p^): The Normal Approximation Rule for Proportion states that if the population proportion is p, and np ≥ 5 and nq ≥ 5, then the sample proportion p^ has approximately a normal distribution with mean p and variance pq/n, that is, p^ ~ N(p, pq/n).

36 Sampling Distribution of the Sample Proportion
If the distribution of p^ is approximately normal, then the random variable $Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$ is approximately standard normal.

37 Tests of the population proportion
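Assume np ≥ 5 and nq ≥ 5. Test the following hypotheses: H0: p = p0 (or H0: p ≥ p0) vs. H1: p < p0. If H0 is true, the sample proportion p^ ~ N(p0, p0q0/n), where p0 is the population proportion when the null hypothesis is true. With $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}$, reject H0 if Z < −z_α, or equivalently if p^ < p^* (critical value approach), where $\hat{p}^{*} = p_0 - z_{\alpha}\sqrt{p_0 q_0/n}$.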

41 Page 614, Procedure 12.2B (cont.)

42 例:Testing a population Proportion
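Pan-Blue legislators claim that polls show 60% of the public supports Lien Chan's visit to China, while pan-Green groups claim that support does not exceed 60%. You use a sample of 100 to test: H0: p = .6 vs. H1: p < .6. Suppose 55 people in the sample support the visit. At the 5% significance level, can we reject the pan-Blue legislators' claim?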

43 例:Testing a population Proportion
Solution: If H0 is true, then p^ has approximately a normal distribution with mean p = .6 and variance pq/n = (.6)(.4)/100 = .0024. If we use a one-tailed test at the 5% level of significance, the critical region consists of all values of Z less than −z_α = −z_.05 = −1.645. From the sample, p^ = x/n = 55/100 = .55, so Z = (.55 − .6)/√.0024 = −1.02.

44 例:Testing a population Proportion
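We do not reject H0: Z = −1.02 is greater than −1.645. Equivalently, the observed sample proportion .55 is greater than the critical value .519, so we cannot reject the null hypothesis.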
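The left-tailed test in this example can be reproduced with a short sketch that reports both the Z statistic and the critical sample proportion. The function name and return layout are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_proportion_test(x, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p < p0 using the normal approximation."""
    se0 = sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = (x / n - p0) / se0                   # test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    p_crit = p0 - z_alpha * se0              # critical value on the p^ scale
    p_value = NormalDist().cdf(z)            # left-tail area
    return z, p_crit, p_value, z < -z_alpha

z, p_crit, p_value, reject = left_tailed_proportion_test(55, 100, 0.6)
print(f"Z = {z:.2f}, critical p_hat = {p_crit:.3f}, p-value = {p_value:.3f}, reject = {reject}")
```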

45 Sampling distribution of the difference between sample proportions
Suppose we take independent samples of sizes n1 and n2 from two populations. Let p1 and p2 be the proportions of items in each population that possess a certain characteristic, and let q1 = (1−p1), q2 = (1−p2). If n1p1 > 5, n1q1 > 5, n2p2 > 5, n2q2 > 5, then the random variable (p1^ − p2^) is approximately normally distributed with mean $p_1 - p_2$ and variance $\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$.

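46 Example
A marketing company wants to know how popular a television program is among high-income and low-income people. Suppose 40% of high-income people like the program and 50% of low-income people like it. The company draws a sample of 100 from the high-income population and a sample of 200 from the low-income population. What is the probability that the two sample proportions differ by less than .05?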

47 Example
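A possible solution for the television program example uses the normal approximation for the difference of two sample proportions. The sketch assumes "differ by less than .05" means the absolute difference |p1^ − p2^| < .05; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.40, 100      # high-income population proportion and sample size
p2, n2 = 0.50, 200      # low-income population proportion and sample size

mean_diff = p1 - p2                                     # E(p1^ - p2^)
sd_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) # sd of the difference

# P(-0.05 < p1^ - p2^ < 0.05) under the normal approximation.
dist = NormalDist(mu=mean_diff, sigma=sd_diff)
prob = dist.cdf(0.05) - dist.cdf(-0.05)
print(f"P(|p1_hat - p2_hat| < 0.05) = {prob:.4f}")
```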

48 Confidence intervals for the difference of Two population proportion
Let p^1 denote the observed proportion of successes in a random sample of n1 observations from a population with proportion p1 of successes, and let p^2 denote the observed proportion of successes in an independent random sample of n2 observations from a population with proportion p2 of successes. A 100(1−α)% confidence interval for (p1 − p2) is given by the interval $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$. This result holds provided n1p1 ≧ 5, n1q1 ≧ 5, n2p2 ≧ 5, and n2q2 ≧ 5.

49 Tests concerning differences of proportions
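To test whether the difference between two population proportions equals a specified value (for example, whether the proportions are equal), let the proportion in population 1 be p1 and the proportion in population 2 be p2: H0: p1 − p2 = D0. Draw samples of sizes n1 and n2 from the two populations and compute the sample proportions p^1 and p^2.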

50 Tests concerning differences of proportions
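If the null hypothesis H0: p1 − p2 = D0 is true, and n1p1 ≥ 5, n1q1 ≥ 5, n2p2 ≥ 5, n2q2 ≥ 5, then $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$ is approximately standard normal. Usually we want to test the null hypothesis H0: p1 − p2 = 0, that is, H0: p1 = p2.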

51 Tests concerning differences of proportions
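Because p1 and p2 are unknown, we cannot compute the variance. Since we assume p1 = p2, a reasonable way to estimate the population variance is to use the data from both samples together to estimate the common population proportion p = p1 = p2. This estimate is called the pooled sample proportion: $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$, where x1 and x2 are the numbers of successes in the two samples.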

52 Tests concerning differences of proportions
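Test H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0. With the pooled sample proportion p^ and q^ = 1 − p^, the test statistic is $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$; reject H0 if |Z| > z_{α/2}.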

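53 Example
Of 400 items produced by the first machine, 23 are defective; of a sample of 400 items produced by the second machine, 17 are defective. At the 5% significance level, test whether the two machines have the same quality.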

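54 Example (solution)
p^1 = 23/400 = .0575, p^2 = 17/400 = .0425, and the pooled proportion is p^ = (23 + 17)/(400 + 400) = .05. Then Z = (.0575 − .0425)/√((.05)(.95)(1/400 + 1/400)) = .97, which is less than z_.025 = 1.96. Failed to reject H0.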
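The pooled two-sample test for the machine example can be sketched as follows. The function name and return layout are illustrative; the calculation follows the pooled sample proportion described on slide 51.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided test of H0: p1 = p2 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # estimate of the common p
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_crit

z, p_value, reject = pooled_two_proportion_test(23, 400, 17, 400)
print(f"Z = {z:.3f}, p-value = {p_value:.3f}, reject H0 = {reject}")
```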

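60 Example
The numbers of credit card applications and approvals last month at two banks' credit card departments are shown in the table below. At the 5% significance level, test whether the two banks' approval rates are the same. If they differ, find the 95% confidence interval for the difference in approval rates.

Bank          Applications   Approvals
First bank    350            273
Second bank   450            378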

61 Because the test statistic is less than the left-tail critical value, we reject the null hypothesis: the two banks' credit card approval rates differ.
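The bank example combines the pooled test with the confidence interval for p1 − p2. The sketch below assumes the first row of the table (350 applications, 273 approvals) is population 1 and the second row is population 2; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 273, 350      # approvals and applications, first bank
x2, n2 = 378, 450      # approvals and applications, second bank
p1_hat, p2_hat = x1 / n1, x2 / n2
z_crit = NormalDist().inv_cdf(0.975)          # z_{.025} for a 5% two-sided test

# Pooled test of H0: p1 = p2.
pooled = (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# 95% CI for p1 - p2 uses the unpooled standard error.
se_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
low = (p1_hat - p2_hat) - z_crit * se_diff
high = (p1_hat - p2_hat) + z_crit * se_diff

print(f"Z = {z:.2f}, reject H0 = {abs(z) > z_crit}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

The interval lies entirely below zero, which agrees with rejecting H0 in the two-sided test.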

