Chapter 3 Descriptive Statistics: Numerical Methods
Statistics in Practice
Barnes Hospital, at the Washington University Medical Center, is recognized as the best hospital in the United States. The hospital runs a hospice program that helps terminally ill patients and their families improve their quality of life. Through the program, patients and their families receive the guidance and support they need to cope with the stress brought on by illness, isolation, and death.
To coordinate and manage the program, staff use monthly and quarterly reports to review past work and to plan ahead. A sample of 67 patient records gave the following figures for length of stay in the program: mean 8 days, standard deviation 3.2 days, median 5 days, mode 3 days.
This chapter explains what these figures mean and how they are calculated, which helps in extracting valuable information from data.
Main points
1. Central tendency and dispersion.
2. Types of measures of central location:
(1) computed averages: mean, harmonic mean, geometric mean;
(2) positional averages: median, mode, percentiles, quartiles.
3. Measures of variability: range, average deviation, standard deviation, coefficient of variation.
4. Measures of relative location and detecting outliers: z-scores, Chebyshev's theorem, the empirical rule.
Difficult points
1. Mean;
2. Deviation and standard deviation.
Central tendency and dispersion
The distribution of data shows two tendencies:
a. Central tendency: values close to the middle level occur frequently, and values far from it occur rarely.
b. Dispersion: the tendency of values to deviate from the middle level and spread out on both sides.
Definition and characteristics of an average
a. Definition: a representative value that reflects the general level of the population.
b. Characteristics: (1) it smooths out the differences between individual values; (2) it locates the center.
Section 3.1 Measures of Location
1. Mean (P74)
a. Definition: the average value; a measure of central location.
[e.g.] The ages of ten persons are 15, 16, 16, 17, 17, 17, 18, 18, 18, 18. Find the average age.
b. Formula: simple mean $\bar{x} = \frac{\sum x}{n}$; weighted mean $\bar{x} = \frac{\sum x f}{\sum f}$.
c. Notes:
(1) Weight: the quantity that determines how much each value counts in the mean. Here $f$ is the absolute weight and $f / \sum f$ is the relative weight.
(2) Mean of a class-interval series: substitute the class midpoint for the variable $x$ and apply the weighted formula.
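The simple and weighted means can be sketched in Python with the ten ages from the example. This is a minimal illustration; the variable names and the grouping of the ages into value/frequency pairs are assumptions made here, not part of the slides.

```python
# Ages of ten persons from the example.
ages = [15, 16, 16, 17, 17, 17, 18, 18, 18, 18]

# Simple mean: sum of the values divided by the number of values.
simple_mean = sum(ages) / len(ages)

# The same data as (value x, absolute weight f) pairs.
x = [15, 16, 17, 18]
f = [1, 2, 3, 4]

# Weighted mean with absolute weights: sum(x * f) / sum(f).
weighted_abs = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

# Weighted mean with relative weights w = f / sum(f): sum(x * w).
w = [fi / sum(f) for fi in f]
weighted_rel = sum(xi * wi for xi, wi in zip(x, w))

print(simple_mean, weighted_abs, round(weighted_rel, 6))  # 17.0 17.0 17.0
```

All three routes give the same average age, which is the point of note (1): absolute and relative weights are interchangeable.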
(3) The mean of ratios.
Analysis of the components of the mean:
2. Harmonic mean
a. Definition: the reciprocal of the arithmetic mean of the reciprocals of the values.
b. Formula:
Simple harmonic mean: $H = \frac{n}{\sum \frac{1}{x}}$
Weighted harmonic mean: $H = \frac{\sum m}{\sum \frac{m}{x}}$, where $m = xf$ is the weight.
[e.g.] The data for four companies under one industrial bureau are as follows. Calculate the average plan-completion percentage for the bureau.
2. Weighted harmonic mean (used when the values carry different weights).
1. Basic formula: mean = total of the variable values / total number of units in the population.
Conditions for using the arithmetic mean and the harmonic mean:
A. when the denominator of the basic formula is known (and the numerator is not), use the arithmetic mean;
B. when the numerator is known (and the denominator is not), use the harmonic mean.
2. Weighted harmonic mean (used when the values carry different weights).
The harmonic mean is a transformation of the arithmetic mean.
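The choice between the two means can be sketched as follows. The figures are hypothetical (the slide's company table is not preserved): x is a plan-completion rate, f the planned output, and m = xf the actual output.

```python
# Hypothetical plan-completion rates for three companies.
x = [0.90, 1.00, 1.20]

# Case A: the planned output f (the denominator of x) is known,
# so the weighted arithmetic mean applies: sum(x * f) / sum(f).
f = [100, 200, 300]
arithmetic = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

# Case B: only the actual output m = x * f (the numerator of x) is known,
# so the weighted harmonic mean applies: sum(m) / sum(m / x).
m = [xi * fi for xi, fi in zip(x, f)]
harmonic = sum(m) / sum(mi / xi for mi, xi in zip(m, x))

print(round(arithmetic, 4), round(harmonic, 4))  # 1.0833 1.0833
```

Both cases recover the same overall rate (total actual output over total planned output), which is the sense in which the harmonic mean is a transformation of the arithmetic mean.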
3. Geometric mean
a. Definition: the n-th root of the product of n values (ratios).
b. Condition: the population total equals the product of the values of the individual units; it is suitable for averaging ratios or rates of change.
c. Formula: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
d. Note: when an observation is zero or negative, the geometric mean should not be used.
e. When the arithmetic mean, harmonic mean, and geometric mean are computed from the same (positive) data, they satisfy $H \le G \le \bar{x}$.
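The ordering of the three means (harmonic ≤ geometric ≤ arithmetic for positive data) can be checked numerically. This sketch uses Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later); the data values are hypothetical.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical positive observations.
data = [2.0, 4.0, 8.0]

a = mean(data)            # arithmetic mean
g = geometric_mean(data)  # n-th root of the product of the values
h = harmonic_mean(data)   # reciprocal of the mean of the reciprocals

print(round(h, 4), round(g, 4), round(a, 4))  # 3.4286 4.0 4.6667
```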
4. Positional averages
(a) Median (P74)
1. Definition: the value in the middle when the data are arranged in ascending order; it is another measure of central location for a variable. It is denoted Me.
[e.g.] The ages of nine officers in a section office: 24, 25, 25, 26, 26, 27, 28, 29, 55.
Ordered sequence: A1, A2, A3, A4, A5, A6, A7, A8, A9.
2. Calculation:
(1) When the data are not grouped, the middle position is (n + 1)/2.
When n is odd, Me is the value in the middle position.
When n is even, e.g. 24, 25, 25, 26, 26, 27, 28, 29, Me is the average of the two middle values.
(2) When the data are grouped into a single-value frequency series, the middle position is $(\sum f + 1)/2$.
Ex: median position = (180 + 1)/2 = 90.5, so Me is the age at the 90th to 91st position in the ordered series. So Me = 18.
(3) When the data are grouped into a class-interval series:
(A) L is the lower limit of the median class and U is its upper limit;
(B) i is the width of the median class;
(C) $S_{m-1}$ is the cumulative frequency of the classes below the median class;
(D) $S_{m+1}$ is the cumulative frequency of the classes above the median class;
(E) $f_m$ is the frequency of the median class.
[EX]
Lower-limit formula: $Me = L + \frac{\frac{\sum f}{2} - S_{m-1}}{f_m} \times i$
Upper-limit formula: $Me = U - \frac{\frac{\sum f}{2} - S_{m+1}}{f_m} \times i$
Derivation: suppose the values in the median class are uniformly distributed, then interpolate proportionally within the class, so that Me = L + x = U − y.
[Figure: the median class from L to U, with distances x and y, cumulative frequency $S_{m-1}$, and the 90th person marked.]
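The proportional interpolation can be sketched in Python using the lower-limit form Me = L + (Σf/2 − S_{m−1}) / f_m × i. The function name, the equal-width assumption, and the class limits and frequencies in the usage lines are hypothetical, chosen only for illustration.

```python
def grouped_median(lower, width, freqs):
    """Median of class-interval data by linear interpolation.

    lower: lower limit of the first class
    width: common class width i (equal widths are assumed)
    freqs: class frequencies in ascending class order
    """
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency below the current class (S_{m-1})
    for k, fm in enumerate(freqs):
        if fm > 0 and cum + fm >= half:
            L = lower + k * width  # lower limit of the median class
            return L + (half - cum) / fm * width
        cum += fm
    raise ValueError("freqs must contain a positive frequency")

# Hypothetical classes 50-60, 60-70, 70-80, 80-90, 90-100.
me = grouped_median(50, 10, [2, 8, 16, 10, 4])
print(me)  # 76.25
```

The upper-limit form gives the same value here: U = 80, S_{m+1} = 14, so Me = 80 − (20 − 14)/16 × 10 = 76.25.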
3. Points to note:
(1) The median is not affected by extreme values, so it is more stable.
(2) The median depends only on the one or two values in the middle position. It makes insufficient use of the information, ignores the size of the other values, and is not suited to algebraic operations.
(b) Mode (P76)
1. Definition: the mode is the data value that occurs with the greatest frequency. It is denoted Mo.
A. 20, 15, 18, 20, 20, 22, 20, 23; n = 8; Mo = 20.
B. 20, 20, 15, 19, 19, 20, 19, 25; n = 8; Mo = 20 and Mo = 19.
C. 10, 11, 13, 16, 15, 25, 8, 12; n = 8; no mode.
2. Calculation
(1) If the data form a single-value frequency series, first identify the modal group, then read off the mode: Mo = 18.
(2) If the data form a class-interval series, first determine the modal class, then use the following formula:
$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = U - \frac{\Delta_2}{\Delta_1 + \Delta_2} \times i$
Meaning of the symbols:
(A) L is the lower limit of the modal class and U is its upper limit;
(B) i is the class width of the modal class;
(C) $\Delta_1 = f_m - f_{m-1}$ is the difference between the frequency of the modal class and that of the preceding class; $\Delta_2 = f_m - f_{m+1}$ is the difference between the frequency of the modal class and that of the following class.
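A minimal Python sketch of the class-interval mode, using Mo = L + Δ1 / (Δ1 + Δ2) × i with Δ1 and Δ2 defined as the frequency differences with the preceding and following classes. The function name, the equal-width assumption, the midpoint fallback, and the sample frequencies are assumptions made for illustration.

```python
def grouped_mode(lower, width, freqs):
    """Mode of class-interval data: Mo = L + d1 / (d1 + d2) * i."""
    # Index of the modal class (the first one if several tie).
    m = max(range(len(freqs)), key=lambda k: freqs[k])
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_prev  # difference with the preceding class
    d2 = freqs[m] - f_next  # difference with the following class
    L = lower + m * width   # lower limit of the modal class
    if d1 + d2 == 0:
        return L + width / 2  # flat neighbours: use the class midpoint
    return L + d1 / (d1 + d2) * width

# Hypothetical classes 50-60, ..., 90-100.
print(round(grouped_mode(50, 10, [2, 8, 16, 10, 4]), 2))  # 75.71
# Equal neighbouring frequencies put the mode at the class midpoint.
print(grouped_mode(0, 10, [5, 10, 5]))  # 15.0
```

In the first call the following class (10) is more frequent than the preceding one (8), so the mode sits above the midpoint 75, leaning toward the larger neighbour.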
[Figures: two histograms of population (frequency) against grades, with points A to F and origin O; the modal class runs from L to U, with distances x and y, and Mo = L + x = U − y.]
Characteristics of the value of the mode
The mode always leans toward the neighboring class with the larger frequency; when the two neighboring classes have equal frequencies, the mode is the midpoint of the modal class.
3. Points to note:
(1) Advantage: it is not affected by extreme values.
(2) Disadvantage: it does not use all the information, lacks sensitivity, and is not suited to algebraic operations.
4. The relationship among the mean, median, and mode
(a) The relationship among them:
1. Quantitative relations:
(1) Symmetric distribution: $\bar{x} = Me = Mo$. In this example all three equal 35.
(2) Skewed distribution
A. Skewed right (positive skew): $Mo < Me < \bar{x}$
B. Skewed left (negative skew): $\bar{x} < Me < Mo$
(c) Percentiles and Quartiles (P77)
Percentile
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
1. Definition: the pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 − p) percent of the observations are greater than or equal to this value.
2. Formula: index i = (p/100)n
Calculating the pth percentile
Step 1. Arrange the data in ascending order (smallest value to largest value).
Step 2. Compute an index i = (p/100)n.
Step 3. If i is not an integer, round up: the next integer greater than i denotes the position of the pth percentile. If i is an integer, the pth percentile is the average of the data values in positions i and i + 1.
[EX] Data: 2210, 2255, 2350, 2380, 2380, 2390, 2420, 2440, 2450, 2550, 2630, 2825; n = 12.
The 85th percentile: i = (85/100)(12) = 10.2, so the position is 11 and the value is 2630.
The 50th percentile: i = (50/100)(12) = 6, so the value is (2390 + 2420)/2 = 2405 (Me).
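The three steps translate directly into a short Python function. This is a sketch of the textbook index rule described above, not of the interpolation methods used by numerical libraries; the function name is an assumption.

```python
import math

def percentile(data, p):
    """p-th percentile by the index rule i = (p / 100) * n, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                  # Step 1: ascending order
    n = len(values)
    if (p * n) % 100 != 0:                 # Step 3a: i is not an integer
        position = math.ceil(p * n / 100)  # round up to the next position
        return values[position - 1]
    i = int(p * n // 100)                  # Step 3b: i is an integer
    return (values[i - 1] + values[i]) / 2  # average of positions i and i + 1

salaries = [2210, 2255, 2350, 2380, 2380, 2390,
            2420, 2440, 2450, 2550, 2630, 2825]
print(percentile(salaries, 85))  # 2630
print(percentile(salaries, 50))  # 2405.0
```

The integer test is done on p × n rather than on (p / 100) × n to avoid floating-point rounding when deciding whether i is a whole number.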
Quartiles (P78)
Quartiles are specific percentiles. It is often desirable to divide data into four parts, with each part containing approximately one-fourth, or 25%, of the observations.
Q1 = first quartile, or 25th percentile.
Q2 = second quartile, or 50th percentile (also the median).
Q3 = third quartile, or 75th percentile.
Measures of Variability (P83)
1. Concept and function
a. Measures of dispersion: indicators that reflect the degree of difference among the values of a variable.
b. Functions:
(1) They measure how representative the mean is.
(2) They reflect how far the values deviate from the center and how widely they are spread.
(3) They reflect the evenness and stability with which a phenomenon develops.
2. Range (P84)
R = Xmax − Xmin
Interquartile range: IQR = Q3 − Q1
[EX] The starting salaries of 12 business college graduates: 2210, 2255, 2350, 2380, 2380, 2390, 2420, 2440, 2450, 2550, 2630, 2825.
Q1 = (2350 + 2380)/2 = 2365 (dollars)
Q3 = (2450 + 2550)/2 = 2500 (dollars)
IQR = Q3 − Q1 = 2500 − 2365 = 135 (dollars)
R = 2825 − 2210 = 615 (dollars)
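The range and interquartile range of the salary data can be reproduced in a few lines of Python. The sketch handles only the case that arises here, where the quartile indexes i = (25/100)n and i = (75/100)n are whole numbers, so each quartile is the average of positions i and i + 1.

```python
salaries = [2210, 2255, 2350, 2380, 2380, 2390,
            2420, 2440, 2450, 2550, 2630, 2825]

values = sorted(salaries)
n = len(values)

# For n = 12 both indexes are integers (3 and 9).
assert (25 * n) % 100 == 0 and (75 * n) % 100 == 0
i1 = 25 * n // 100
i3 = 75 * n // 100

q1 = (values[i1 - 1] + values[i1]) / 2   # average of the 3rd and 4th values
q3 = (values[i3 - 1] + values[i3]) / 2   # average of the 9th and 10th values
iqr = q3 - q1
data_range = values[-1] - values[0]

print(q1, q3, iqr, data_range)  # 2365.0 2500.0 135.0 615
```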
3. Average deviation (A.D.)
1. Definition: the mean of the absolute deviations of the values from their mean.
2. Derivation of the formula: $A.D. = \frac{\sum |x - \bar{x}|}{n}$
$x - \bar{x}$: −5, −2, 2, 5
$|x - \bar{x}|$: 5, 2, 2, 5; total 14
Example (weighted by frequency f):
$x - \bar{x}$: −5.61, −2.61, 1.39, 4.39
$|x - \bar{x}| f$: 5.61 × 2 = 11.22; 2.61 × 5 = 13.05; 1.39 × 8 = 11.12; 4.39 × 3 = 13.17; total 48.56
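A minimal Python sketch of the average deviation, with optional frequencies for grouped data. The function name and the data values are hypothetical; the first data set was chosen so that its deviations from the mean are −5, −2, 2, 5.

```python
def average_deviation(x, f=None):
    """Mean absolute deviation from the mean; f holds optional frequencies."""
    f = f or [1] * len(x)
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n
    return sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n

# Hypothetical values with mean 15 and deviations -5, -2, 2, 5.
print(average_deviation([10, 13, 17, 20]))  # 3.5  (= 14 / 4)

# Hypothetical frequency-weighted case: value 1 once, value 3 three times.
print(average_deviation([1, 3], [1, 3]))    # 0.75
```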
Standard deviation and variance (P84)
1. Variance $\sigma^2$: the variance is the average of the squared differences between each data value and the mean. The standard deviation $\sigma$ is the positive square root of the variance.
If the data set is a sample, the variance is denoted by $s^2$.
If the data set is a population, the variance is denoted by $\sigma^2$.
Standard deviation and variance
2. Derivation of the formula: $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$, $\sigma = \sqrt{\sigma^2}$
$x - \bar{x}$: −5, −2, 2, 5
$(x - \bar{x})^2$: 25, 4, 4, 25; total 58
Example: calculate the variance and the standard deviation of the grades of the following 40 students:
X | f | Xf | X − x̄ | (X − x̄)² | (X − x̄)² f
55 | 2 | 110 | −21.5 | 462.25 | 924.5
65 | 8 | 520 | −11.5 | 132.25 | 1058
75 | 16 | 1200 | −1.5 | 2.25 | 36
85 | 10 | 850 | 8.5 | 72.25 | 722.5
95 | 4 | 380 | 18.5 | 342.25 | 1369
Total | 40 | 3060 | | | 4110
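The grades example can be worked through in Python. The frequencies are recovered from the table as Xf / X, and the sketch assumes the variance divides by n (the population form, "the average of the squared differences"); with the sample form the divisor would be n − 1 instead.

```python
import math

x = [55, 65, 75, 85, 95]   # class midpoints (grades)
f = [2, 8, 16, 10, 4]      # frequencies, recovered as Xf / X

n = sum(f)                                            # 40 students
mean = sum(xi * fi for xi, fi in zip(x, f)) / n       # 3060 / 40
ss = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f))  # weighted squared deviations

variance = ss / n          # population form (assumed)
sd = math.sqrt(variance)

print(mean, ss, variance, round(sd, 2))  # 76.5 4110.0 102.75 10.14
```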
3. The shortcut method of calculating the variance: $\sigma^2 = \overline{x^2} - (\bar{x})^2$
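A shortcut commonly used for the population variance is "mean of the squares minus the square of the mean". The sketch below checks that it agrees with the definition, using the grades data from the document (frequencies recovered as Xf / X); that this is the exact shortcut shown on the slide is an assumption.

```python
x = [55, 65, 75, 85, 95]
f = [2, 8, 16, 10, 4]
n = sum(f)

mean = sum(xi * fi for xi, fi in zip(x, f)) / n
mean_of_squares = sum(xi ** 2 * fi for xi, fi in zip(x, f)) / n

# Shortcut: variance = mean of x squared minus the squared mean.
shortcut = mean_of_squares - mean ** 2

# Definition: average squared deviation from the mean.
definition = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n

print(shortcut, definition)  # 102.75 102.75
```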
Example:
4. The variance and the standard deviation of a proportion
Example: calculate the variance and the standard deviation of the pass rate in the following computer class:
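For a yes/no attribute coded as 1 and 0, the mean is the proportion p and the population variance is p(1 − p). The sketch below checks this with hypothetical counts (32 of 40 students passing); the slide's own table is not preserved.

```python
import math

# Hypothetical class: 32 students pass (1) and 8 fail (0).
outcomes = [1] * 32 + [0] * 8
n = len(outcomes)

p = sum(outcomes) / n                                  # pass rate
variance = sum((y - p) ** 2 for y in outcomes) / n     # from the definition
shortcut = p * (1 - p)                                 # p(1 - p)
sd = math.sqrt(variance)

print(p, round(variance, 4), round(shortcut, 4), round(sd, 4))  # 0.8 0.16 0.16 0.4
```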
5. The addition theorem of variance (#)
Example: the output of 11 workers (unit: pieces) is as follows: 15, 17, 19, 20, 22, 22, 23, 23, 25, 26, 30. Calculate the total variance.
Example: the output of the 11 workers (unit: pieces), divided into three groups: 15, 17, 19; 20, 22, 22, 23, 23; 25, 26, 30.
(4) The average within-group variance: the average of the variances within the groups.
Example: the output of 11 workers (unit: pieces) is as follows: 15, 17, 19, 20, 22, 22, 23, 23, 25, 26, 30.
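The addition theorem states that the total variance equals the average within-group variance plus the between-group variance. The sketch below checks it on the 11 outputs, split into the three groups given in the document; it assumes population variances (divisor n) and weights each group by its size.

```python
groups = [[15, 17, 19], [20, 22, 22, 23, 23], [25, 26, 30]]

def mean(v):
    return sum(v) / len(v)

def pvar(v):
    """Population variance: average squared deviation from the mean."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

all_values = [x for g in groups for x in g]
n = len(all_values)
grand_mean = mean(all_values)

total = pvar(all_values)
# Average within-group variance, weighted by group size.
within = sum(len(g) * pvar(g) for g in groups) / n
# Between-group variance: spread of group means around the grand mean.
between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n

print(round(total, 4), round(within, 4), round(between, 4))  # 16.1818 2.5455 13.6364
```

Here total = 178/11, within = 28/11, and between = 150/11, so the two parts add up to the total.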
5. Coefficient of variation (P87) (a relative measure of variability)
1. R, A.D., and $\sigma$ are absolute or average measures of variability.
2. They can be used to compare two groups only when the averages of the two groups are the same; otherwise the coefficient of variation is used.
3. Formula: $V_\sigma = \frac{\sigma}{\bar{x}} \times 100\%$
Example: according to the following data, which group is more concentrated (more uniform)?
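A comparison of this kind can be sketched with the coefficient of variation, the standard deviation divided by the mean. Both groups below are hypothetical (the slide's table is not preserved), and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation relative to the mean."""
    return pstdev(values) / mean(values)

# Hypothetical groups with very different average levels.
group_a = [48, 50, 52]
group_b = [495, 500, 505]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(round(pstdev(group_a), 3), round(pstdev(group_b), 3))  # 1.633 4.082
print(round(cv_a, 4), round(cv_b, 4))                        # 0.0327 0.0082
```

Group B has the larger standard deviation but the smaller coefficient of variation, so relative to its level it is the more concentrated group.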
6. Measures of relative location and detecting outliers (P89)
1. z-score: $z = \frac{x - \bar{x}}{s}$
Example: suppose the students' average mark, variance, and standard deviation in each subject are as follows:
2. Chebyshev's theorem (P90)
(1) At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
Example: for a group of waiting customers, the average waiting time is 4 minutes and the standard deviation is 0.9 minutes. Then:
z = 1: at least 0% of the waiting times lie between 3.1 and 4.9 minutes;
z = 2: at least 75% lie between 2.2 and 5.8 minutes;
z = 3: at least 89% lie between 1.3 and 6.7 minutes.
Characteristic: universally applicable, but conservative.
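The Chebyshev bound 1 − 1/z² is simple to tabulate. The sketch below applies it to the waiting-time example (mean 4 minutes, standard deviation 0.9 minutes); the function name is an assumption, and z = 1 is allowed only to show that the bound is then trivially 0.

```python
def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z < 1:
        raise ValueError("the bound is only informative for z >= 1")
    return 1 - 1 / z ** 2

mean_wait, sd_wait = 4.0, 0.9
for z in (1, 2, 3):
    low = mean_wait - z * sd_wait
    high = mean_wait + z * sd_wait
    print(f"z={z}: at least {chebyshev_bound(z):.0%} "
          f"between {low:.1f} and {high:.1f} minutes")
```

The bound holds for any distribution, which is why it is conservative compared with the figures for bell-shaped data.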
(2) The empirical rule (P91): when the data have a symmetric, bell-shaped distribution, approximately 68% of the values lie within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and almost all within 3 standard deviations.
(3) When the data follow a normal distribution, 68.27% of the values lie within 1 standard deviation of the mean, 95.45% within 2 standard deviations, and 99.73% within 3 standard deviations.
[Figure: normal distribution.]
3. Detecting outliers (P92)
Standardized values (z-scores) can be used to help identify outliers. We recommend treating any data value with a z-score less than −3 or greater than +3 as an outlier.
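The ±3 rule can be sketched in Python with z = (x − mean) / s. The data are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption; with the population form the z-scores would be slightly larger in magnitude.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardized values z = (x - mean) / s, using the sample s."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical data: fourteen values near 10 and one extreme value.
data = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11, 10, 10, 11, 9, 30]

outliers = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
print(outliers)  # [30]
```

With small samples no value can reach |z| > 3 under the sample standard deviation (the largest possible |z| is (n − 1)/√n), so the rule is only useful for about 11 or more observations.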
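For example: a sample of NBA results provides the following data on the winning teams and the points they scored.
Winning team: Indiana, Cleveland, Chicago, Portland, Boston. Points: 89, 95, 105, 97, 106.
Winning team: San Antonio, New York, Minnesota, Utah, L.A. Clippers. Points: 84, 88, 109 [two scores missing].
1. Compute the mean and the standard deviation of these data.
2. In another game, Seattle beat Vancouver 120:108. Use the z-score to determine whether Seattle's score is an outlier.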
3. Suppose the distribution of team scores is mound-shaped. Estimate the percentage of all NBA winning scores that are 105 points or more, and the percentage that are less than 80 points.
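Statistic | Method 1 | Method 2 | Method 3
Mean | 165.7 | 129.1 | 126.5
Standard deviation | 2.45 | 1.20 | 0.85
Range | 8 | 4 | 3
Minimum | 162 | 127 | 125
Maximum | 170 | 131 | 128
Median: 165, 129 [one value missing]
Mode: 164, 126 [one value missing]
(1) What would you use to evaluate the merits of the assembly methods? Explain your reasons.
(2) If you had to choose one method, which would you choose? Explain your reasons.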