Chapter 6 Introduction to Inferential Statistics Sampling and Sampling Designs
What are samples?
Sampling 抽樣 σ2 Ѕ2 Generalization 推論 Population Sample 母體 樣本 Parameter 參數 Statistic 統計量
誤差 Differences between parameters and statistics=error sampling error 抽樣誤差 non-sampling error 非抽樣誤差 (also called measurement error)
Sampling error the degree to which a given sample differs from the population sampling error tends to be high with small sample sizes and will decrease as sample size increases
Target Population group to which you wish to generalize the results of the study should be defined as specifically as possible
sampling frame population sample
Sampling Techniques Nonprobability Sampling (nonrandom sampling) 非隨機抽樣 Probability Sampling (random sampling) 隨機抽樣
Nonprobability sampling Convenience sampling 方便抽樣 getting people who are most conveniently available fast & low cost Volunteers 自願樣本 units are self-selected
Characteristics of nonprobability samples members of the population DO NOT have an equal chance of being selected results cannot be generalized beyond the group being tested
Probability Sampling sample should represent the population using random selection methods
Types of Probability Sampling Simple random sampling簡單隨機抽樣 Systematic sampling系統式抽樣 Stratified sampling 分層隨機抽樣 Cluster sampling 部落抽樣
Simple Random Sampling every unit in the population has an equal and known probability of being selected as part of the sample (抽籤) e.g. in obtaining a sample of 10 subjects from a population of 1,000 people, everyone in the population would have a 1/100 chance of being selected (or p of .01)
亂數表 1 2 3 4 5 6 7 8 9 10 49486 93775 88744 80091 92732 38532 41506 54131 44804 43637 94860 36746 04571 13150 65383 44616 97170 25057 02212 41930 10169 95685 47585 53247 60900 20097 97962 04267 29283 07550 12018 45351 15671 23026 55344 54654 73717 97666 00730 89083 45611 71585 61487 87434 07498 60596 36255 82880 84381 30433 89137 30984 18842 69619 53872 95200 76474 67528 14870 59628 94541 12057 30771 19598 96069 10399 50649 41909 09994 75322 89920 28843 87599 30181 26839 02162 56676 39342 95045 60146 32472 32796 15255 39636 90819 54150 24064 50514 15194 41450 63958 47944 82888 66709 66525 67616 75709 56879 29649 07325
Characteristics of simple random sampling Unbiased: 母體內每一個體被抽到的機會均等 Independence : 母體內某一個個體被抽到不會影響其他個體被抽到的機會
Limitations of simple random samples not practical for large populations Simple random sampling becomes difficult when we dont have a list of the population
Systematic Sampling系統性抽樣 a type of probability sampling in which every kth member of the population is selected k=N/n N = size of the population n = sample size
For example: You want to obtain a sample of 200 from a population of 10,000. You would select every 50th (or kth) person from the list. k = 10000/200=50
Advantages/disadvantages of systematic sampling Assuming availability of a list of population members Randomness of the sample depends on randomness of the list periodicity bias: 當母體個體排序出現某一週期性或規則時, systematic sampling 會有週期性誤差(periodicity bias)
Stratified Random Sample分層隨機抽樣 Prior to random sampling, the population is divided into subgroups, called strata, e.g., gender, ethnic groups, professions, etc.依母體特性將個體分層(Strata) & 每一個體只屬一層 Subjects are then randomly selected from each strata再從每一層中隨機抽取樣本(using simple random sampling)
第一層 第二層 第三層 . 第K層 Sample
Should select variables that are related to the dependent variable Homogeneity is very high within the strata. Heterogeneity is very high between the stratas
Why use stratified samples? permits examination of subgroups by ensuring sufficient numbers of subjects within subgroups 確保樣本包含母體中各種不同特性的個體,增加樣本的代表性 generally more convenient than a simple random sample
Potential disadvantages Sometimes the exact composition of the population is often unknown with multiple stratifying variables, sampling designs can become quite complex
Types of Stratified Sampling Proportionate Stratified Random Sampling 比例分層隨機抽樣 Disproportionate Stratified Random Sampling非比例分層隨機抽樣
Proportionate Sampling strata sample sizes are proportional to population subgroup sizes按母體比例抽取樣本 e.g., if a group represents 25% of the population, the stratum representing that group will comprise 25% of the sample
Disproportionate Sampling strata sample sizes are not proportional to population subgroup sizes每層抽出之樣本數不能與母體之特徵比例相呼應 may be used to achieve equal sample sizes across strata
For example: Suppose a researcher plans to conduct a survey regarding various attitudes of Agricultural College Students at Tunghai U. He wishes to compare perceptions across 4 major groups but finds some of the groups are quite small relative to the overall student population. As a result, he decides to over-sample minority students. For example, although Hospitality students only represent 10% of the Agricultural student population, he uses a disproportional stratified sample so that Hospitality students will comprise 25% of his sample.
Cluster Sampling部落抽樣 used when subjects are randomly sampled from within a "cluster" or unit (e.g., classroom, school, country, etc) 將母體分為若干部落 (cluster),在自所有部落中隨機抽取若干部落樣本並對這些抽取的部落作抽查
Population Sample Cluster 1 Cluster 2 Cluster 1 Cluster 4 Cluster 5 Cluster k Population Sample
Example 台中市民眾對薛凱莉事件看法 將台中市依“里”為部落分成許多里 隨機抽取3個里然後對此3個里的居民作全面性的訪問 Compare using cluster sampling technique and simple sampling technique
Why use cluster samples? They're easier to obtain than a simple random or systematic sample of the same size
Disadvantages of Cluster Sampling Less accurate than other sampling techniques (selection stages, accuracy) Generally leads to violation of an assumption that subjects are independent
Sampling Distribution 抽樣分配
For the most part in social science, we want to know about the population. In reality, the parameters are often unknown. The best thing we can do is to “guess” what our population should be like based on the info we get from a sample results of a sample=the results of a population???
Sampling Distributions抽樣分配 The “bridge” b/w information from the sample to the population a theoretical, probabilistic distribution of all possible samples of a given size, 在母體中重複抽取固定大小的隨機樣本,所有隨機樣本的統計值的機率分配稱為抽樣分配
Sampling distribution Population Sampling distribution Sample The relationship b/w population, sampling distribution, and sample.
= 100 etc. for all possible samples of a given N from the population
Sampling Distribution 定理 當母體為normal distribution, 我們重複抽取固定大小的隨機樣本時, 則此一抽樣分配會趨近normal distribution 並且有一平均值及標準差
以五名學生的考試成績(91, 92, 93, 94,95)為母體, 母體的mean 為 93。試比較從5名學生(母體)中隨機抽取2位學生作為樣本(n=2)和隨機抽取3位學生作為樣本之抽樣分配
When n=2 sample Sample mean 91,92 91.5 92,94 93 91,93 92 92,95 93.5 91,94 92.5 93,94 91,95 93,95 94 92,93 94,95 94.5
When n=3 sample Sample mean 91,92,93 92 91,94,95 93.33 91,92,94 92.33 92,93,94 93 91,92,95 92.67 92,93,95 91,93,94 92,94,95 93.67 91,93,95 93,94,95 94
Sampling distribution of sample mean Mean of the sampling distribution = St.D. of the sampling distribution (Standard Error ) = σ2/N Standard error (樣本平均數的標準誤)告訴我們樣本平均數對母體平均數的估計有多準確 N, Standard Error
Central Limit Theorem 中央極限定理 無論母體分配是否為normal distribution, 當我們重複抽取固定大小的隨機樣本時,只要樣本的N夠大 (N100),則此一抽樣分配也會趨近normal distribution If n is sufficiently large X ~N(, 2/n)
Summary of Sampling Distribution 若母體的分配式常態分配,則樣本平均的抽樣分配亦為常態分配 若母體的分配不是常態,則樣本平均的抽樣分配再樣本夠大時會近似常態分配 樣本平均值的平均會等於母體平均值 樣本標準差的平均會比母體標準差小
Exercise 假設王品牛排每位顧客等待主菜的時間呈常態分配,平均等待時間為10分鐘,標準差為2分鐘。某餐旅研究生作服務品質調查,隨機抽選16名顧各瞭解其等待時間,試問該16名顧客平均等待時間超過11分鐘的機率為何?
Sampling distribution of sample proportion( ) Mean of the sampling distribution of = P Standard error of the sampling distribution of =