1
Dates and Functions
Economics Experimental Teaching Center, Business Data Mining Center
2
Dates come in many forms:
10/18/04
18/10/04
10/18/2004
18OCT2004
101804
October 18, 2004
We need to understand how to read dates in and how to work with them.
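As a minimal sketch of reading such forms, the step below converts five of the spellings above with the INPUT function and a matching informat. The variable names d1-d5 are hypothetical, and the two-digit year 04 is assumed to resolve to 2004 under the session's YEARCUTOFF setting. The long form "October 18, 2004" is left out of this sketch.

```sas
/* Sketch: convert several text spellings of the same day to SAS dates. */
/* d1-d5 are hypothetical names used only for this illustration.        */
DATA _NULL_;
  d1 = INPUT('10/18/04',   mmddyy8.);   /* month/day/2-digit year */
  d2 = INPUT('18/10/04',   ddmmyy8.);   /* day/month/2-digit year */
  d3 = INPUT('10/18/2004', mmddyy10.);  /* month/day/4-digit year */
  d4 = INPUT('18OCT2004',  date9.);     /* ddMONyyyy              */
  d5 = INPUT('101804',     mmddyy6.);   /* no separators          */
  FORMAT d1-d5 date9.;                  /* display all five the same way */
  PUT d1= d2= d3= d4= d5=;
RUN;
```

If the informats match the text, all five variables should hold the same date value and print identically in the log.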
3
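Questions about dates
How do we display a date?
How do we compare two dates to see which is earlier?
How do we find the number of days between two dates?
Does ndays = date2 - date1 work?
Problem: ordinary subtraction does not work on dates written in calendar form. For example:
date2 =  03/02/2003
date1 =  08/02/2002
         ==========
        -05/00/0001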
4
DATA dates;
  INFILE DATALINES;
  INPUT brthdate mmddyy10.;
  DATALINES;
03/03/1971
02/14/1956
01/01/1960
;
PROC PRINT; VAR brthdate;
PROC PRINT; VAR brthdate; FORMAT brthdate mmddyy10.;

Obs   brthdate
 1    03/03/1971
 2    02/14/1956
 3    01/01/1960

Jan 1, 1960
5
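When you read data with a date informat, SAS stores the value as the number of days since January 1, 1960. This makes subtracting one date from another easy.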
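A minimal sketch of that idea, using the two dates from the earlier subtraction problem and assuming they are written month/day/year (March 2, 2003 and August 2, 2002). The dates are entered here as SAS date literals rather than read from a file.

```sas
/* Sketch: SAS dates are day counts, so plain subtraction gives days. */
DATA _NULL_;
  date1 = '02AUG2002'd;            /* assumed reading of 08/02/2002 */
  date2 = '02MAR2003'd;            /* assumed reading of 03/02/2003 */
  ndays = date2 - date1;           /* days between the two dates    */
  PUT date1= date2= ndays=;        /* raw values: days since 01JAN1960 */
  PUT date1= mmddyy10. date2= mmddyy10.;  /* same values, formatted */
RUN;
```

Under that reading of the dates, ndays is 212.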
6
* About dates;
DATA age;
  INFILE '/home/ph5420/data/tomhs.data';
  INPUT @14 randdate mmddyy10.
        @34 brthdate mmddyy10.
        @74 date12   mmddyy10.;
  agedays  = randdate - brthdate;
  ageyrs   = (randdate - brthdate)/365.25;
  ageint   = INT((randdate - brthdate)/365.25);
  agetoday = (TODAY() - brthdate)/ ;
  ageendst = (MDY(02,28,1992) - brthdate)/365.25;
  daysv12  = date12 - randdate;
  if ABS(daysv ) = . then window12 = .;
  else if ABS(daysv ) < 31 then window12 = 1;
  else if ABS(daysv ) >= 31 then window12 = 2;
  yrrand = YEAR(randdate);
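The following is a self-contained sketch of the same calculations on two made-up records, so it can run without the tomhs.data file. Several pieces are assumptions and not taken from the slide: the data set name age_demo, the inline data values, the 365.25 divisor for agetoday, and the hypothetical 365-day target used to define the visit window.

```sas
/* Sketch only: inline data and the window rule are assumptions. */
DATA age_demo;
  INPUT randdate :mmddyy10. brthdate :mmddyy10. date12 :mmddyy10.;
  agedays  = randdate - brthdate;            /* age in days at randomization */
  ageyrs   = agedays / 365.25;               /* age in years (fractional)    */
  ageint   = INT(ageyrs);                    /* age in whole years           */
  agetoday = (TODAY() - brthdate) / 365.25;  /* assumed divisor              */
  daysv12  = date12 - randdate;              /* days from randomization to visit */
  target   = 365;                            /* hypothetical target day      */
  IF daysv12 = . THEN window12 = .;          /* no visit date: leave missing */
  ELSE IF ABS(daysv12 - target) < 31 THEN window12 = 1;  /* inside window  */
  ELSE window12 = 2;                                     /* outside window */
  yrrand = YEAR(randdate);                   /* calendar year of randomization */
  FORMAT randdate brthdate date12 mmddyy10.;
  DATALINES;
11/10/1987 06/26/1936 11/25/1988
02/13/1988 01/01/1940 .
;
RUN;

PROC PRINT DATA=age_demo;
RUN;
```

The second record has a missing visit date, which shows how a missing date12 carries through to daysv12 and window12.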
7
PROC PRINT DATA=age (OBS=10);
  VAR brthdate randdate agedays ageyrs ageint agetoday;
  TITLE 'Data displayed without a date format';
RUN;
PROC PRINT DATA=age (OBS=10);
  VAR brthdate randdate agedays ageyrs ageint agetoday;
  FORMAT brthdate mmddyy10. randdate mmddyy10.;
  TITLE 'Data displayed with a date format';
RUN;
8
Data displayed without a date format
Obs brthdate randdate agedays ageyrs ageint agetoday
All before 1960 (birth dates earlier than January 1, 1960 are stored as negative values)
9
Data displayed with a date format
Obs   brthdate     randdate
 1    06/26/1936   11/10/1987
[remaining rows not recoverable]
10
PROC PRINT DATA=age (OBS=20);
  VAR randdate date12 daysv12 window12;
  FORMAT randdate date12 mmddyy8.;
  TITLE 'Printing Days From Randomization to 1st Year Visit';
RUN;
PROC FREQ DATA=age;
  TABLES yrrand;
  TITLE 'Frequency Distribution of Year Randomized';
RUN;
11
Obs randdate date12 daysv12 window12
[output rows not recoverable]
12
Frequency Distribution of Year Randomized
The FREQ Procedure
yrrand   Frequency   Percent   Cumulative Frequency   Cumulative Percent
13
* About functions;
DATA example;
  INFILE '/home/ph5420/data/tomhs.data';
  INPUT @058 height 4.1
        @085 weight 5.1
        @172 ursod 3.
        @236 (se1-se10) ( );
  bmi    = (weight* )/(height*height);
  rbmi1  = ROUND(bmi,1);
  rbmi2  = ROUND(bmi,.1);
  lursod = LOG(ursod);
  seavg  = MEAN (OF se1-se10);
  semin  = MIN (OF se1-se10);
  semax  = MAX (OF se1-se10);
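A small runnable sketch of the same functions on one made-up record. The data set name fn_demo, the inline values, the use of only five side-effect scores, and the factor 703 (the usual conversion when weight is in pounds and height in inches) are assumptions for illustration, not values taken from the slide.

```sas
/* Sketch only: inline data and the BMI factor are assumptions. */
DATA fn_demo;
  INPUT height weight ursod se1-se5;
  bmi    = (weight * 703) / (height * height);  /* assumed lb and inch units */
  rbmi1  = ROUND(bmi, 1);       /* round to the nearest whole number */
  rbmi2  = ROUND(bmi, .1);      /* round to the nearest tenth        */
  lursod = LOG(ursod);          /* natural logarithm                 */
  seavg  = MEAN(OF se1-se5);    /* mean of the nonmissing scores     */
  semin  = MIN(OF se1-se5);     /* smallest score                    */
  semax  = MAX(OF se1-se5);     /* largest score                     */
  DATALINES;
70 180 100 1 2 1 4 1
;
RUN;

PROC PRINT DATA=fn_demo;
RUN;
```

For this record bmi is about 25.82, so rbmi1 is 26 and rbmi2 is 25.8; the scores 1, 2, 1, 4, 1 give seavg 1.8, semin 1, and semax 4.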
14
* Using the hyphen "-" notation;
seavg = MEAN (OF se1-se10);
is equivalent to
seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
Note: OF is essential. Without it, SAS assumes you want se1 minus se10.
To use this notation, the variable names must share the same root.
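To make the role of OF concrete, here is a minimal sketch with three made-up scores. The names with_of and without_of are hypothetical.

```sas
/* Sketch: the same text inside MEAN() means two different things. */
DATA _NULL_;
  se1 = 1; se2 = 2; se3 = 4;
  with_of    = MEAN(OF se1-se3);  /* variable list: se1, se2, se3          */
  without_of = MEAN(se1-se3);     /* arithmetic: the single value se1 - se3 */
  PUT with_of= without_of=;
RUN;
```

With OF, the function averages the three variables. Without OF, the argument is the difference 1 - 4, so the function is evaluated on a single number.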
15
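* Two ways to compute a mean;
seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
and
seavg = (se1+se2+se3+se4+se5+se6+se7+se8+se9+se10)/10;
The first method averages the nonmissing values and returns a missing result only when every value is missing.
The second method requires every value to be present; if any value is missing, the result is missing.
if N(of se1-se10) > 5 then seavg = MEAN(of se1-se10);
What does this statement do?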
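The difference can be seen in a short sketch with four made-up scores, two of them missing. The variable names other than se1-se4 are hypothetical, and the threshold of 2 stands in for the 5 used with ten scores above.

```sas
/* Sketch: how missing values affect each way of averaging. */
DATA _NULL_;
  se1 = 2; se2 = .; se3 = 4; se4 = .;
  by_function = MEAN(OF se1-se4);        /* uses only the nonmissing values */
  by_formula  = (se1+se2+se3+se4)/4;     /* any missing operand gives missing */
  n_present   = N(OF se1-se4);           /* count of nonmissing values      */
  /* Compute the mean only when more than 2 scores are present. */
  IF N(OF se1-se4) > 2 THEN guarded = MEAN(OF se1-se4);
  PUT by_function= by_formula= n_present= guarded=;
RUN;
```

Here by_function is 3, by_formula is missing, n_present is 2, and guarded stays missing because the condition is not met. The N function guard is one way to require a minimum amount of data before reporting an average.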
16
Arrays
- Used to shorten code
- Used to run the same code repeatedly
- Used with DO/END loops

ARRAY wtlb(3) wt1 wt2 wt3;
ARRAY wtkg(3) newwt1 newwt2 newwt3;
DO index = 1 to 3;
  wtkg(index) = wtlb(index) / 2.2;
END;
/* Has the same effect as:
newwt1 = wt1 / 2.2;
newwt2 = wt2 / 2.2;
newwt3 = wt3 / 2.2;
*************************************/
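A complete DATA step built around the same two arrays, as a sketch. The data set name wt_demo and the three weights are made up, and DIM is used in place of the literal 3 so the loop bound follows the array size.

```sas
/* Sketch: convert three weights from pounds to kilograms with arrays. */
DATA wt_demo;
  INPUT wt1 wt2 wt3;
  ARRAY wtlb(3) wt1 wt2 wt3;              /* existing variables (pounds) */
  ARRAY wtkg(3) newwt1 newwt2 newwt3;     /* new variables (kilograms)   */
  DO index = 1 TO DIM(wtlb);              /* DIM returns the array size  */
    wtkg(index) = wtlb(index) / 2.2;
  END;
  DROP index;                             /* loop counter is not needed in the output */
  DATALINES;
110 154 220
;
RUN;

PROC PRINT DATA=wt_demo;
RUN;
```

The array is only a temporary grouping of variables for the duration of the DATA step; the output data set contains wt1-wt3 and newwt1-newwt3 as ordinary variables.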
17
ARRAY se(10) se1-se10;
ARRAY hse(10) hse1-hse10;
DO senumber = 1 to 10;
  if se(senumber) = 1 then hse(senumber) = 0;
  else if se(senumber) in(2,3,4) then hse(senumber) = 100;
END;
New variables: hse1-hse10
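The 0/100 coding is convenient because the mean of such a variable is a percentage. The sketch below shows this on four made-up patients and three side effects; the data set name hse_demo and the data values are assumptions, and a score of 1 is taken to mean the side effect is absent, as the recode implies.

```sas
/* Sketch: recode scores to 0/100 so that PROC MEANS reports percents. */
DATA hse_demo;
  INPUT se1-se3;
  ARRAY se(3)  se1-se3;
  ARRAY hse(3) hse1-hse3;
  DO senumber = 1 TO 3;
    IF se(senumber) = 1 THEN hse(senumber) = 0;                /* absent  */
    ELSE IF se(senumber) IN (2,3,4) THEN hse(senumber) = 100;  /* present */
    /* a missing score matches neither branch, so hse stays missing */
  END;
  DROP senumber;
  DATALINES;
1 2 1
1 4 .
3 1 1
1 1 1
;
RUN;

PROC MEANS DATA=hse_demo N MEAN;
  VAR hse1-hse3;
RUN;
```

In this made-up data one of four patients has side effect 1 and two of four have side effect 2, so the means of hse1 and hse2 are 25 and 50. The third variable has one missing score, so its mean is based on three patients.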
18
PROC PRINT DATA = example (OBS=15);
  VAR bmi rbmi1 rbmi2 seavg semin semax;
  TITLE 'Listing of Selected Data for 15 Patients ';
RUN;
PROC FREQ DATA = example;
  TABLES semax;
  TITLE 'Distribution of Worse Side Effect Value';
  TITLE2 'Side Effect Scores Range from 1 to 4';
PROC MEANS DATA = example;
  VAR hse1-hse10;
  TITLE 'Percent of Patients With Condition by Condition';
PROC UNIVARIATE DATA = example PLOT;
  VAR ursod lursod;
  TITLE 'Stem and Leaf Plots for Urine Sodium Data';
RUN;
19
Listing of Selected Data for 15 Patients
Obs bmi rbmi1 rbmi2 seavg semin semax
20
Distribution of Worse Side Effect Value
Side Effect Scores Range from 1 to 4
The FREQ Procedure
semax   Frequency   Percent   Cumulative Frequency   Cumulative Percent
2 patients had at least 1 severe side effect
21
Percent of Patients With Condition by Condition
The MEANS Procedure
Variable   N   Mean   Std Dev   Minimum   Maximum
[rows for hse1-hse10; values not recoverable]
These means are the percent of patients with each side effect.
22
The UNIVARIATE Procedure
Variable: ursod
Normal Probability Plot
[plot not recoverable]
23
Variable: lursod
Normal Probability Plot
[plot not recoverable]
The log-transformed value shows a better linear pattern.
24
* Character functions;
Working with names, addresses, etc.
Suppose we are given:
fname = GREGORY
lname = GRANDITS
MI = A
The goal is to create a new variable:
fullname = Gregory A. Grandits
25
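Functions and operators needed
SUBSTR   extracts a substring from a character variable
LOWCASE  converts characters to lowercase
COMPBL   collapses runs of consecutive blanks into a single blank
||       concatenates variables or strings, for example:
         var1 = 'abc'; var2 = 'def'; var3 = var1||var2;
         The value of var3 is 'abcdef'.
var = SUBSTR(argument,position<,n>): extracts a substring from an argument.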
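Each of these can be tried in isolation with a short sketch. The variable names piece, lower, and squeezed are hypothetical, and the strings are chosen only for illustration.

```sas
/* Sketch: the four character tools used one at a time. */
DATA _NULL_;
  LENGTH var1 var2 $3 var3 $6;
  var1 = 'abc';
  var2 = 'def';
  var3 = var1 || var2;                           /* concatenation: abcdef  */
  piece    = SUBSTR('GREGORY', 2, 3);            /* 3 characters from position 2: REG */
  lower    = LOWCASE('GREGORY');                 /* gregory */
  squeezed = COMPBL('Gregory   A.    Grandits'); /* runs of blanks become one blank */
  PUT var3= piece= lower= squeezed=;
RUN;
```

The LENGTH statement matters for concatenation: if var1 were declared longer than its value, the trailing blanks would be carried into var3, which is one reason COMPBL is useful after joining padded variables.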
26
* Character functions;
DATA names;
  INFILE DATALINES DSD;
  INFORMAT fname $20. lname $20. mi $1.;
  INPUT lname fname mi;
  LENGTH fnamemix $20. lnamemix $20. fullname $44.;
  * Take the first character of the name, then append everything from the second character on in lowercase;
  fnamemix = SUBSTR(fname,1,1) || LOWCASE(SUBSTR(fname,2));
  lnamemix = SUBSTR(lname,1,1) || LOWCASE(SUBSTR(lname,2));
  * Concatenate the three name parts and remove the extra blanks;
  fullname = COMPBL (fnamemix || mi || '. ' || lnamemix);
  DATALINES;
GRANDITS, GREGORY, A
SIU, YI, W
;

Obs   fnamemix   lnamemix   mi   fullname
 1    Gregory    Grandits   A    Gregory A. Grandits
 2    Yi         Siu        W    Yi W. Siu
27
SCAN(argument,n <,delimiters>)

DATA names;
  INFILE DATALINES DSD;
  INFORMAT fullname $44.;
  INPUT fullname;
  LENGTH fname $20. lname $20. mi $2.;
  fname = SCAN(fullname,1);     *Take 1st word;
  mi    = SCAN(fullname,2,' '); *Take 2nd word;
  lname = SCAN(fullname,3);     *Take 3rd word;
  DATALINES;
Gregory A. Grandits
Yi W. Siu
;
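A brief sketch of why the middle initial is scanned with an explicit blank delimiter: by default SCAN also treats the period as a delimiter, so the default call drops the period after the initial. The variable names here are hypothetical.

```sas
/* Sketch: default delimiters versus an explicit blank delimiter. */
DATA _NULL_;
  LENGTH fullname $44 mi_default mi_blank $2 lastword $20;
  fullname   = 'Gregory A. Grandits';
  mi_default = SCAN(fullname, 2);       /* blank and period both split: A  */
  mi_blank   = SCAN(fullname, 2, ' ');  /* only blank splits: A.           */
  lastword   = SCAN(fullname, -1);      /* negative n counts from the right */
  PUT mi_default= mi_blank= lastword=;
RUN;
```

Consecutive delimiters are treated as one, which is why the third word is still Grandits under the default delimiters.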
28
PROC PRINT DATA=names;
  VAR fullname fname mi lname;
  TITLE 'Original variable and the new variables';
RUN;

Obs   fullname              fname     mi   lname
 1    Gregory A. Grandits   Gregory   A.   Grandits
 2    Yi W. Siu             Yi        W.   Siu