Presentation is loading. Please wait.

Presentation is loading. Please wait.

Dr. Baokun Li 经济实验教学中心 商务数据挖掘中心

Similar presentations


Presentation on theme: "Dr. Baokun Li 经济实验教学中心 商务数据挖掘中心"— Presentation transcript:

1 Dr. Baokun Li 经济实验教学中心 商务数据挖掘中心
优化输出和创建新变量 Dr. Baokun Li 经济实验教学中心 商务数据挖掘中心

2 优化 SAS 输出 TITLE 语句 – 为Proc程序块的输出作标记 LABEL 语句 – 为变量名称作标记
FORMAT 语句 – 为变量的数值作标记

3 Number of Patients by Clinic
标准输出 The FREQ Procedure Cumulative Cumulative clinic Frequency Percent Frequency Percent A B C D 优化输出 Number of Patients by Clinic Clinical Center clinic Frequency Percent Frequency Percent Birmingham Chicago Minneapolis Pittsburgh Output from programs 5 and 6.

4 Patient Report Headaches
标准输出 The FREQ Procedure Cumulative Cumulative sebl_6 Frequency Percent Frequency Percent 优化输出 Patient Report Headaches None Mild Moderate Severe

5 TITLE 语句 PROC FREQ DATA=tdata; TABLES clinic group sex educ sebl_1 sebl_6; TITLE 'Distribution of Selected Variables'; TITLE2 'on the TOMHS Dataset' ; RUN; TITLE 语句可以放在程序的任何地方。最好是放在过程 步里面 title可随时改变 使用格式为:TITLE(n)‘文字'

6 Label 语句可以放在数据步或过程步内的任何地方(但不可在中间)。
LABEL clinic = 'Clinical Center'; LABEL group = 'Drug Treatment Group'; LABEL educ = 'Highest Education Attained'; LABEL sebl_1 = 'Patient Report Drowsiness'; LABEL sebl_6 = 'Patient Report Headaches'; Label 语句可以放在数据步或过程步内的任何地方(但不可在中间)。 If in datastep will apply to all procedures. If only a PROC then will apply only to that PROC.

7 FORMAT brthdate mmddyy10. ; FORMAT clinic $clinic. ;
FORMAT group group. ; FORMAT fever headache se. ; 告诉SAS 根据format(格式)显示变量的数值 Format 语句可以放在数据步或过程步内的任何地方 既有内置format (例如日期),也有用户定义的format. 一个format 可用于多个变量 Format以句号结束 (.) 字符型format以$开始

8 VALUE group 1 = 'Beta Blocker' 2 = 'Calcium Channel Blocker'
如何定义FORMAT PROC FORMAT; VALUE group 1 = 'Beta Blocker' 2 = 'Calcium Channel Blocker' 3 = 'Diuretic' 4 = 'Alpha Blocker' 5 = 'ACE Inhibitor' 6 = 'Placebo'; VALUE gender 1 = 'Men' 2='Women' ; VALUE se = 'None' = 'Mild’ 3 = 'Moderate' 4 = 'Severe'; Format名称并不必是数据集合内一个变量的名称 Format名 Need to distinguish the format name with the variables that are to be formatted.

9 PROC FORMAT; VALUE $clinic 'A' = 'Birmingham' 'B' = 'Chicago' 'C' = 'Minneapolis' 'D' = 'Pittsburgh' ; 请不要混淆format和将要被格式化的变量 Sas单从PROC FORMAT不能知道你想要格式化哪个变量。你需要用format语句对变量进行格式化。

10 * 定义的Format没有使用; PROC FREQ; TABLES clinic sebl_6; RUN; ========================================== * 使用了format; FORMAT clinic $clinic. sebl_6 se. ;

11 INFILE 'd:\data\tomhs3.dat';
Program 6 PROC FORMAT; ... DATA tdata ; INFILE 'd:\data\tomhs3.dat'; INPUT ptid $ clinic $ group sex educ eversmk alcbl sebl_1 sebl_6 ; run; LABEL clinic = 'Clinical Center'; LABEL group = 'Drug Treatment Group'; LABEL educ = 'Highest Education Attained'; LABEL sebl_1 = 'Patient Report Drowsiness'; LABEL sebl_6 = 'Patient Report Headaches'; LABEL alcbl = 'Alcoholic Drinks Per Week'; LABEL eversmk = 'Ever Smoke Cigarettes'; PROC FREQ DATA=tdata; TABLES clinic sebl_6; FORMAT clinic $clinic. sebl_6 se. ;

12 PROC MEANS DATA=tdata N MEAN STD; VAR alcbl; CLASS eversmk; ;
FORMAT eversmk smoke. ; TITLE 'PROC MEANS With Variable and Value Labels'; RUN; The MEANS Procedure Analysis Variable : alcbl Alcoholic Drinks Per Week Ever Smoke N Cigarettes Obs N Mean Std Dev ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Smoker Non-smoker

13 需掌握的要点 Format在使用之前需先定义 (PROC FORMAT). FORMAT 语句可应用定义的format.
数据步里的Label 和 format 语句可用于其后的 PROCs 过程步里面的Label 和 format 语句只能用于该过程步 PROC本身

14 为网络或Word/Excel文件创建HTML 表 ODS LISTING CLOSE; ODS HTML FILE='educ.html';
PROC FREQ DATA=tdata ; TABLES educ; FORMAT educ educ.; TITLE 'HTML Output From PROC FREQ'; RUN; ODS HTML CLOSE; The output will go to the html file rather than the .lst file. This can then be viewed on the web or inserted into word. Can also create RTF files. Note: Run statement is needed here, and ODS statement must be after run.

15 插入Word的html文件

16 教育级别变量educ的解释 创建一个新变量 grad1= 1 如果大学毕业 = 2 否则

17 Program 7 DATA tdata; INFILE '/home/ph5420/data/tomhs.data' ; INPUT @ 1 ptid $10. @ 49 educ @123 sbp ; * 这样的话,缺失值被变换为2; if educ < 7 then grad1 = 2 ; else if educ >=7 then grad1 = 1 ; * 下面的两种方法结果一样且是正确的; if educ < 7 and educ ne . then grad2 = 2; else if educ >=7 then grad2 = 1; * IN 是SAS的一个特殊函数; if educ IN(1,2,3,4,5,6) then grad3 = 2; else if educ IN(7,8,9) then grad3 = 1;

18 PROC FORMAT; VALUE grad = '< College Graduate' 7-9 = 'College Graduate'; VALUE sbpcat LOW = '< 120' = ' ' 140-HIGH = '140+' ; 注意: LOW 不包括缺失值

19 最初的变量值为 1-9 缺失值被变成2 PROC FREQ DATA=tdata;
TABLES grad1 grad2 grad3 educ ; FORMAT educ grad.; RUN; grad1 Frequency Percent grad2 Frequency Percent Frequency Missing = 1 grad3 Frequency Percent educ Frequency Percent < College Graduate College Graduate 最初的变量值为 1-9 缺失值被变成2

20 * 把变量sbp12 变换成 3 个级别; if sbp12 = . then sbp12c = . ; else if sbp12 < 120 then sbp12c = 1 ; else if sbp12 < 140 then sbp12c = 2 ; else if sbp12 >=140 then sbp12c = 3 ; With if-then-else definitions SAS stops reading after the first true statement

21 PROC FREQ DATA=tdata; TABLES sbp12c sbp12; FORMAT sbp12 sbpcat.; RUN;
OUTPUT The FREQ Procedure Cumulative Cumulative sbp12c Frequency Percent Frequency Percent Frequency Missing = 8 sbp12 Frequency Percent Frequency Percent <

22 if sbp12 = . then sbp12c = . ; else
* 代价惨重的简单错误; if sbp12 = . then sbp12c = . ; else if sbp12 < 120 then sbp12c = 1 ; else if sbp12 < 140 then sbp12 = 2 ; else if sbp12 >=140 then sbp12c = 3 ; PROC FREQ DATA=tdata; TABLES sbp12c; RUN; The FREQ Procedure Cumulative Cumulative sbp12cat Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Frequency Missing = 51 How come no values of 2 and why so many missing?

23 创建新变量时需知道: 新变量的初始值是缺失值; 2. 缺失值 < 任何值; 3. 数值变量缺失值的指代值为.; 4. 类别变量缺失值的指代值为‘ ’; if sbp = . then ... if clinic = ' ' then ...

24 如何检查新建立的变量是对的? 显示新变量和原变量 PROC PRINT DATA=temp (OBS=20); VAR educ college ; 在新变量和原变量上运行PROC MEANS,确保两个变量的缺失值个数相同. PROC MEANS DATA=temp; VAR educ college; 在新变量和原变量上运行PROC FREQ. PROC FREQ DATA=temp; TABLES educ college;


Download ppt "Dr. Baokun Li 经济实验教学中心 商务数据挖掘中心"

Similar presentations


Ads by Google