Dr. Baokun Li, Economics Experimental Teaching Center, Business Data Mining Center

1 SAS Methods for Descriptive Statistics
Dr. Baokun Li, Economics Experimental Teaching Center, Business Data Mining Center

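2 Steps for Creating and Running a SAS Program
1. Create the SAS program in the Program Editor window or in a text editor.
2. Run the SAS program by clicking the icon on the toolbar.
3. Check the log for errors and warnings.
4. If there are errors, return to step 1 and repeat steps 1-3.
5. Once there are no errors, look at the results in the Output window.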

3 SAS Descriptive Procedures
PROC PRINT
PROC MEANS
PROC UNIVARIATE
PROC FREQ
PROC PLOT
PROC CHART
PROC GPLOT
PROC GCHART

4 Syntax for Procedures
PROC PROCNAME DATA=datasetname <options> ;
    substatements / <options> ;
The WHERE statement is a useful substatement available to all procedures.
PROC PRINT DATA=demo ;
    VAR marstat ;
    WHERE state = 'MN';
RUN;

5 DATA demo;
    INFILE DATALINES;
    INPUT gender $ age marstat $ credits state $ ;
    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
    if state = 'MN' then resid = 'Y'; else resid = 'N';
    DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;
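PROC FREQ appears in the list of descriptive procedures but is not demonstrated on the slides. The following is a minimal sketch, assuming the demo data set above has already been created; the choice of tables and the title text are illustrative and not taken from the original slides.

```sas
* Illustrative only: one-way and two-way frequency tables for demo;
PROC FREQ DATA = demo;
    * One-way counts for the two derived flags, then a gender by marstat crosstab;
    TABLES fulltime resid gender*marstat;
    TITLE 'Proc Freq: demo data set (illustrative)';
RUN;
```

Listing several variables in one TABLES statement produces a separate one-way table for each; joining two names with an asterisk requests a two-way table.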

6 * PROGRAM 3;
DATA weight;
    INFILE 'd:\...\tomhs.txt' ;
    INPUT ptid $ clinic $ sex $ height weight;
    bmi = (weight* )/(height*height);
    * bmi is in kg/m2;
RUN;
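The conversion constant in the bmi formula did not survive in this transcript. As a hedged illustration of how such a unit conversion could be written, the sketch below assumes that height is recorded in inches and weight in pounds (an assumption; the slides only state that bmi is in kg/m2). The data set name bmi_check and the single data line are hypothetical.

```sas
* Hypothetical check of a BMI unit conversion (units are an assumption);
DATA bmi_check;
    INPUT height weight;        * assumed: height in inches, weight in pounds;
    kg  = weight * 0.45359237;  * pounds to kilograms;
    m   = height * 0.0254;      * inches to metres;
    bmi = kg / (m*m);           * kg/m2;
    DATALINES;
70 180
;
RUN;

PROC PRINT DATA = bmi_check;
RUN;
```

Converting each measurement separately keeps the units visible in the code, at the cost of two extra variables in the data set.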

7 PROC PRINT DATA = weight (OBS=5) NOOBS;
    TITLE 'Proc Print: Five observations from the TOMHS Study';
RUN;
PROC MEANS DATA = weight;
    VAR height weight bmi;
    TITLE 'Proc Means Example 1';
RUN;
PROC MEANS DATA = weight MEAN MEDIAN STD MAXDEC=2;
    TITLE 'Proc Means Example 2 (specifying options)';
RUN;

8 Proc Print: Five observations from the TOMHS Study
patid  clinic  sex  height  weight  bmi
(five rows; values not preserved in this transcript)
Proc Means Example 1
The MEANS Procedure
Variable  N  Mean  Std Dev  Minimum  Maximum
height
weight
bmi
(values not preserved in this transcript)

9 Proc Means Example 2 (specifying options)
The MEANS Procedure
Variable  Mean  Median  Std Dev
height
weight
bmi
(values not preserved in this transcript)

10 FW = field width
PROC MEANS DATA = weight N MEAN STD MAXDEC=2 FW=8;
    CLASS clinic;
    TITLE 'Proc Means Example 3 (using a CLASS statement)';
RUN;
clinic  N Obs  Variable  N  Mean  Std Dev
A              height
               weight
               bmi
B              height
               weight
               bmi
C              height
               weight
               bmi
D              height
               weight
               bmi
(values not preserved in this transcript)

11 PROC UNIVARIATE DATA = weight PLOT ;
    ID ptid;
    VAR bmi;
    TITLE 'Proc Univariate Example 1';
RUN;
* Note: PROC UNIVARIATE produces a lot of output ;

12 Proc Univariate Example 1
The UNIVARIATE Procedure
Variable: bmi
Moments
    N                  Sum Weights
    Mean               Sum Observations
    Std Deviation      Variance
    Skewness           Kurtosis
    Uncorrected SS     Corrected SS
    Coeff Variation    Std Error Mean
Basic Statistical Measures
    Location           Variability
    Mean               Std Deviation
    Median             Variance
    Mode               Range
                       Interquartile Range
Tests for Location: Mu0=0
    Test           Statistic    p Value
    Student's t    t            Pr > |t|     <.0001
    Sign           M            Pr >= |M|    <.0001
    Signed Rank    S            Pr >= |S|    <.0001
(statistic values not preserved in this transcript)

13 Quantiles
Quantile      Estimate
100% Max      37.5179
99%           37.4385
95%           35.8871
90%           34.3378
75% Q3
50% Median
25% Q1
10%
5%
1%
0% Min
(remaining estimates not preserved in this transcript)
Extreme Observations
    Lowest: Value, patid, Obs
    Highest: Value, patid, Obs
(values not preserved in this transcript)

14 Stem-and-Leaf Plot and Boxplot
[Plot: stem-and-leaf display of bmi (columns Stem, Leaf, #) beside a boxplot, with the 75th percentile, the mean, and the 25th percentile marked on the box.]

15 Normal Probability Plot
The UNIVARIATE Procedure
Variable: bmi
[Plot: normal probability plot of bmi.]
A straight line indicates that the data are normally distributed.

16 * High resolution graphs can also be produced.
  The following makes a histogram ;
PROC UNIVARIATE DATA = weight;
    VAR bmi;
    HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
    INSET N = 'N' (5.0) MEAN = 'Mean' (5.1)
          STD = 'Sdev' (5.1) MIN = 'Min' (5.1)
          MAX = 'Max' (5.1) / POS=lm
          HEADER='Summary Statistics';
    LABEL bmi = 'Body Mass Index (kg/m2)';
    TITLE 'Histogram of BMI';
RUN;

18 Using Comment Statements in SAS
Two purposes:
1. Documenting your program
2. Temporarily deleting part of a program
See Page C & S

19 Examples of Comment Code
PROC UNIVARIATE DATA = weight PLOT ;
* Run proc univariate for variable BMI;
* High resolution graphs can also be produced.
  The following makes a pdf file containing a histogram with the
  best fit normal curve and summary statistics.
  Other types of files such as GIF ;
PROC UNIVARIATE DATA = weight PLOT ;
    * ID patid ;
    VAR bmi;
PROC UNIVARIATE DATA = weight /*PLOT*/;
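The block comment on this slide mentions writing the histogram to a PDF file, but the transcript does not show how. One possible sketch, assuming the ODS PDF destination is used (the slides do not say which mechanism the author had in mind) and a hypothetical output file name:

```sas
* Assumption: route output to a PDF through ODS; the file name is hypothetical;
ODS PDF FILE = 'bmi_histogram.pdf';

PROC UNIVARIATE DATA = weight NOPRINT;   * NOPRINT suppresses the printed tables;
    VAR bmi;
    HISTOGRAM bmi / NORMAL;              * histogram with a fitted normal curve;
    TITLE 'Histogram of BMI';
RUN;

ODS PDF CLOSE;                           * closing the destination writes the file;
```

The file is only complete after ODS PDF CLOSE runs, so the open and close statements should always be submitted as a pair.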

20 Temporarily Removing Code: skip the histogram for now but keep the code so it can be run later
PROC UNIVARIATE DATA = weight;
    VAR bmi;
    /*
    HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
    INSET N = 'N' (5.0) MEAN = 'Mean' (5.1)
          STD = 'Sdev' (5.1) MIN = 'Min' (5.1)
          MAX = 'Max' (5.1) / POS=lm
          HEADER='Summary Statistics';
    */
    LABEL bmi = 'Body Mass Index (kg/m2)';
    TITLE 'Histogram of BMI';
RUN;

