Descriptive statistics


1 Descriptive statistics for one variable

2 Types of statistical methods
Descriptive Statistics: summarize and describe the sample data clearly, using numbers and graphs.
Inferential Statistics: draw inferences about the population distribution from which the data came.

3 What do we describe?
The "location" or "center" of the data ("measures of location").
The variability of the data ("measures of variability").

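4 Why use statistical methods?
They help summarize information.
They help reveal the intrinsic features of the data at hand.
They help extract "information" from the data.
They help communication.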

5 Types of data
By scale of measurement, data can be divided into:
Nominal scale: values are discrete labels for each level (no ordering).
Ordinal scale: values show rank, but the distances between categories are not meaningful (ordering exists, but not distance).
Interval scale: differences between values are meaningful (distance exists, but no ratios).
Ratio scale: ratios between values are meaningful (ratios exist), so all comparisons among categories are valid.

6 Frequency distribution

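7 Frequency distribution
A frequency distribution is one of the most commonly used (graphical) tools for describing a data set. It is sometimes also presented as a frequency table listing the observed values.
Features:
It can be shown as a histogram, a density histogram, a cumulative frequency plot, and so on.
It describes the distributional characteristics of the data.
It can be used to infer characteristics of the population.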
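
The frequency table mentioned on this slide can be sketched in a few lines of Python. This is only an illustration: the `speeds` sample is made up (it is not the survey data from the next slide), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, count, relative freq, cumulative relative freq)."""
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)
    cumulative = list(accumulate(counts[v] for v in ordered))
    return [(v, counts[v], counts[v] / n, c / n)
            for v, c in zip(ordered, cumulative)]


# Hypothetical sample: reported fastest driving speeds, rounded to 10 mph.
speeds = [90, 100, 100, 110, 90, 120, 100, 80, 110, 100]

for value, count, rel, cum in frequency_table(speeds):
    # The row of '#' characters is a crude text histogram.
    print(f"{value:>5} {count:>3} {rel:6.2f} {cum:6.2f}  {'#' * count}")
```

The count column corresponds to a histogram, the relative-frequency column to a density-style view, and the last column to a cumulative frequency plot.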

8 Example: survey data on fastest driving speed

9 Box plots of the data by category

10 Source: Protecting Children from Harmful Television: TV Ratings and the V-chip
Amy I. Nathanson, PhD, Lecturer, University of California at Santa Barbara; Joanne Cantor, PhD, Professor, Communication Arts, University of Wisconsin-Madison

11 Source:
Web page on cryptography


13 Source: Cornell University website

14 Source:

15 The percentage of online searches done by US home and work web surfers in July 2006

16 NY Times

17 Old Faithful Geyser

18 Duration in seconds of 272 eruptions of the Old Faithful geyser.
> library(datasets)
> faithful[1:10,]     # first ten rows: columns eruptions and waiting
> summary(faithful)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max. per column; Median of waiting: 76.0




22 Normal distribution
Many characteristics of a population are distributed in a "normal" form.
The normal curve has convenient statistical properties.
Parametric statistics, the most commonly used statistics, are based on the assumption that the variables are normally distributed.
This is the famous "bell curve", where many cases fall near the middle of the distribution and few fall very high or very low.
Example: I.Q.
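
To make "many cases fall near the middle" concrete, the sketch below computes the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean (roughly 68%, 95%, and 99.7%). The I.Q. parameters, mean 100 and SD 15, are a common convention assumed here for illustration; they are not stated on the slide. `share_within` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumed parameters for illustration: I.Q. scaled to mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(iq, k):.4f}")
```

These shares depend only on `k`, not on the mean or SD chosen, which is one reason the normal curve is convenient to work with.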

23 Statistical properties of the normal distribution


25 I.Q. distribution


27 Measures of "center"
Mode (Mo): the value that occurs most often in the sample; good for nominal data.
Median (Md): the midpoint of the sample (50% of cases above, 50% of cases below); insensitive to extreme cases; for interval or ratio data.
Source: Reasoning with Statistics, by Frederick Williams & Peter Monge, fifth edition, Harcourt College Publishers.

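28 Measures of "center"
Sample mean: has many good statistical properties, and many statistical methods are based on it, but it is sensitive to extreme values.
Sample quantiles: the 1/4 and 3/4 quantiles are the most common; they are insensitive to extreme values.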
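
A small sketch of these measures using Python's standard `statistics` module. The data are hypothetical, and `center_summary` is an illustrative helper; the quartiles use the module's default "exclusive" method, which is one of several conventions in use.

```python
from statistics import mean, median, multimode, quantiles

# Hypothetical sample; the appended value is a deliberate extreme point.
data = [2, 3, 3, 4, 5, 6, 7]
with_outlier = data + [70]


def center_summary(values):
    """Mode(s), median, mean, and the 1/4 and 3/4 quantiles of a sample."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" method
    return {
        "mode": multimode(values),
        "median": median(values),
        "mean": mean(values),
        "q1": q1,
        "q3": q3,
    }


print(center_summary(data))
print(center_summary(with_outlier))
```

Adding the single extreme value moves the mean a long way while the median and mode barely change, which is the sensitivity contrast described on the slide.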

29 Index of central tendency

30 Example: survey data on fastest driving speed
Summary table by sex (rows: female, male). Columns: N, Mean, Median, TrMean, StDev, SE Mean, Minimum, Maximum, Q1, Q3.

31 Source:

32 Source:

33 Source: CSAP’s Data Pathways

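34 Measures of "spread"
We study how spread out the data are.
Data sets with the same center can have different spread.
To understand the spread of a data set, we need to compute the distance of each data point from the center.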

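35 Measures of "spread"
Range: the distance between the largest and smallest values in the sample; sensitive to extreme values; usually used together with other tools to describe spread.
IQR: the 3/4 quantile minus the 1/4 quantile; usually used together with other tools to describe spread.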
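
The two measures can be sketched as follows. The samples are hypothetical and chosen to share the same center (50) while differing in spread; `sample_range` and `iqr` are illustrative helper names, and the quartiles use the default "exclusive" method of `statistics.quantiles`.

```python
from statistics import quantiles


def sample_range(values):
    """Distance between the largest and smallest observation."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range: 3/4 quantile minus 1/4 quantile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical samples with the same center but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, sample in (("tight", tight), ("wide", wide)):
    print(name, "range:", sample_range(sample), "IQR:", iqr(sample))

# One extreme value inflates the range far more than the IQR.
print("range with outlier:", sample_range(tight + [500]))
print("IQR with outlier:", iqr(tight + [500]))
```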

36 Range Source: statglos/sgrange.htm

37 Source:

38 Measures of dispersion
Sample Variance (S²): the average of the squared distances of individual points from the mean. High variance means that most scores are far away from the mean. Low variance indicates that most scores cluster tightly about the mean.

39 Standard deviation (SD)
A summary statistic of how much scores vary from the mean.
It is the square root of the variance, expressed in the original units of measurement.
It is used in a number of inferential statistics.

40 Variance vs. standard deviation: Variance and Standard Deviation, for Population and Sample
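
The slide's formulas did not survive in the transcript, so the sketch below uses the standard textbook definitions as an assumption: the population variance divides the sum of squared deviations by n, the sample variance divides by n - 1, and each standard deviation is the square root of the corresponding variance. The `scores` data are hypothetical.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical scores for illustration.
scores = [4, 8, 6, 5, 3, 7]


def sum_squared_deviations(values):
    """Sum of squared distances of each point from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


n = len(scores)
ss = sum_squared_deviations(scores)

pop_var = ss / n          # population variance: divide by n
samp_var = ss / (n - 1)   # sample variance: divide by n - 1
pop_sd = sqrt(pop_var)    # SD is back in the original units
samp_sd = sqrt(samp_var)

print("population:", pop_var, pop_sd)
print("sample:    ", samp_var, samp_sd)

# Cross-check against the standard library's implementations.
print(isclose(pop_var, pvariance(scores)), isclose(samp_var, variance(scores)))
print(isclose(pop_sd, pstdev(scores)), isclose(samp_sd, stdev(scores)))
```

Dividing by n - 1 makes the sample variance slightly larger than the population formula applied to the same numbers.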

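41 Skewness of a distribution
Skewness measures how asymmetric the distribution of the data is.
When the median and the mean differ, the distribution of the data is skewed.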
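
One common way to quantify this is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The slide does not say which definition it uses, so this choice is an assumption, and the samples are hypothetical.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (central moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical samples: one with a long right tail, one symmetric.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
symmetric = [1, 2, 3, 4, 5]

for name, sample in (("right-skewed", right_skewed), ("symmetric", symmetric)):
    print(name, "mean:", mean(sample), "median:", median(sample),
          "skewness:", round(skewness(sample), 3))
```

In the right-skewed sample the long tail pulls the mean above the median and the coefficient is positive; in the symmetric sample the mean equals the median and the coefficient is zero.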

42 Different Shapes of Distributions

43 Skewness of distributions

44 Distribution of posting frequency on Usenet

45 Kurtosis


