Dates and Functions
Economics Experimental Teaching Center, Business Data Mining Center
Dates come in many forms:
    10/18/04
    18/10/04
    10/18/2004
    18OCT2004
    101804
    October 18, 2004
We need to know how to read dates in and how to work with them.
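Questions about dates
- How do we display a date?
- Given two dates, which one is earlier?
- How do we find the number of days between two dates?
      ndays = date2 - date1     Does this work?
Problem: ordinary subtraction does not work on dates written out as text. For example:
      date2 =  03/02/2003
      date1 =  08/02/2002
              ==========
              -05/00/0001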
DATA dates;
  INFILE DATALINES;
  INPUT brthdate mmddyy10.;
  DATALINES;
03/03/1971
02/14/1956
01/01/1960
;
PROC PRINT;
  VAR brthdate;
PROC PRINT;
  VAR brthdate;
  FORMAT brthdate mmddyy10.;
RUN;
------------------------------------------------------
First PROC PRINT (no format):
Obs    brthdate
 1        4079
 2       -1417
 3           0     <- Jan 1, 1960
Second PROC PRINT (FORMAT brthdate mmddyy10.):
Obs    brthdate
 1     03/03/1971
 2     02/14/1956
 3     01/01/1960  <- Jan 1, 1960
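When you read data with a date informat, SAS stores the value as the number of days since January 1, 1960 (negative for earlier dates). This makes subtracting one date from another easy.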
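The subtraction that fails on text dates works once both dates are SAS date values. A minimal sketch, assuming the two dates from the earlier example are written month/day/year (March 2, 2003 and August 2, 2002); the variable names are illustrative only:

```sas
DATA _NULL_;
  date1 = MDY(8, 2, 2002);   /* assumed reading: August 2, 2002 */
  date2 = MDY(3, 2, 2003);   /* assumed reading: March 2, 2003  */
  ndays = date2 - date1;     /* plain subtraction of two day counts */
  PUT date1= date2= ndays=;             /* raw day counts */
  PUT date1= mmddyy10. date2= mmddyy10.; /* same values, formatted */
RUN;
```

The log should show ndays=212, the number of days between the two dates.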
* Working with dates;
DATA age;
  INFILE '/home/ph5420/data/tomhs.data';
  INPUT @14 randdate mmddyy10.
        @34 brthdate mmddyy10.
        @74 date12   mmddyy10. ;
  agedays  = randdate - brthdate;
  ageyrs   = (randdate - brthdate)/365.25;
  ageint   = INT( (randdate - brthdate)/365.25);
  agetoday = (TODAY() - brthdate)/365.25;
  ageendst = (MDY(02,28,1992) - brthdate)/365.25;
  daysv12  = date12 - randdate;
  if ABS(daysv12 - 365) = . then window12 = .;
  else if ABS(daysv12 - 365) < 31 then window12 = 1;
  else if ABS(daysv12 - 365) >= 31 then window12 = 2;
  yrrand = YEAR(randdate);
PROC PRINT DATA=age (obs=10);
  VAR brthdate randdate agedays ageyrs ageint agetoday;
  TITLE 'Data Displayed Without a Date Format';
RUN;
PROC PRINT DATA=age (obs=10);
  VAR brthdate randdate agedays ageyrs ageint agetoday;
  FORMAT brthdate mmddyy10. randdate mmddyy10.;
  TITLE 'Data Displayed With a Date Format';
RUN;
Data Displayed Without a Date Format

Obs   brthdate   randdate   agedays    ageyrs   ageint   agetoday
 1      -8589      10175     18764    51.3730     51     69.0678
 2      -6880      10239     17119    46.8693     46     64.3888
 3     -12572      10002     22574    61.8042     61     79.9726
 4      -9592      10175     19767    54.1191     54     71.8138
 5     -12996      10280     23276    63.7262     63     81.1335

The brthdate values are negative because all birth dates are before 1960.
Data Displayed With a Date Format

Obs    brthdate      randdate
 1    06/26/1936    11/10/1987
 2    03/01/1941    01/13/1988
 3    07/31/1925    05/21/1987
 4    09/27/1933    11/10/1987
 5    06/02/1924    02/23/1988
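A format changes only how a date is displayed, not the stored day count. The sketch below prints one date value with several standard SAS date formats, chosen to match some of the forms listed at the start of the deck; the variable name d is illustrative only:

```sas
DATA _NULL_;
  d = MDY(10, 18, 2004);   /* October 18, 2004 as a SAS date value */
  PUT d=;                  /* unformatted: days since Jan 1, 1960 */
  PUT d mmddyy10.;         /* month/day/4-digit year */
  PUT d ddmmyy8.;          /* day/month/2-digit year */
  PUT d date9.;            /* ddMONyyyy */
  PUT d worddate18.;       /* month name, day, year */
RUN;
```

Each PUT statement writes the same stored value to the log; only the format differs.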
PROC PRINT DATA=age (OBS=20);
  VAR randdate date12 daysv12 window12;
  FORMAT randdate date12 mmddyy8.;
  TITLE 'Printing Days From Randomization to 1st Year Visit';
RUN;
PROC FREQ DATA=age;
  TABLES yrrand;
  TITLE 'Frequency Distribution of Year Randomized';
RUN;
Obs   randdate    date12    daysv12   window12
  1   11/10/87   11/25/88     381        1
  2   01/13/88   01/09/89     362        1
  3   05/21/87          .       .        .
  4   11/10/87   11/30/88     386        1
  5   02/23/88   02/13/89     356        1
  6   11/12/87   11/02/88     356        1
  7   12/05/86   12/03/87     363        1
  8   06/12/87   06/16/88     370        1
  9   01/21/88   01/09/89     354        1
 10   04/16/87   04/04/88     354        1
 11   08/12/87   08/10/88     364        1
 12   04/16/87   05/02/88     382        1
 13   02/02/88   02/08/89     372        1
 14   11/04/86   11/30/87     391        1
 15   05/27/87   06/08/88     378        1
 16   03/29/88   07/13/89     471        2
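The age and visit-window calculations can be tried without access to the study file. A minimal sketch, assuming the first three patients' dates as they appear in the listings in this deck, typed inline; the data set name age_demo and the list-style input are assumptions made for illustration, not the layout of the original file:

```sas
DATA age_demo;
  INFILE DATALINES MISSOVER;   /* a short line leaves date12 missing */
  INPUT brthdate :mmddyy10. randdate :mmddyy10. date12 :mmddyy10.;
  ageint  = INT((randdate - brthdate) / 365.25);  /* age in whole years */
  daysv12 = date12 - randdate;                    /* days to 12-month visit */
  IF daysv12 = . THEN window12 = .;               /* no visit recorded */
  ELSE IF ABS(daysv12 - 365) < 31 THEN window12 = 1;  /* within 30 days of 1 year */
  ELSE window12 = 2;                                  /* outside the window */
  FORMAT brthdate randdate date12 mmddyy10.;
  DATALINES;
06/26/1936 11/10/1987 11/25/1988
03/01/1941 01/13/1988 01/09/1989
07/31/1925 05/21/1987
;
RUN;

PROC PRINT DATA=age_demo;
RUN;
```

If the sketch is right, the printed ageint, daysv12 and window12 values should agree with observations 1 to 3 in the listings (for example 51, 381 and 1 for the first patient).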
Frequency Distribution of Year Randomized

The FREQ Procedure

                                   Cumulative    Cumulative
yrrand    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
  1986           9        9.00             9         9.00
  1987          65       65.00            74        74.00
  1988          26       26.00           100       100.00
* Working with functions;
DATA example;
  INFILE '/home/ph5420/data/tomhs.data';
  INPUT @058 height 4.1
        @085 weight 5.1
        @172 ursod 3.
        @236 (se1-se10) (1.0 +1);
  bmi    = (weight*703.0768)/(height*height);
  rbmi1  = ROUND(bmi,1);
  rbmi2  = ROUND(bmi,.1);
  lursod = LOG(ursod);
  seavg  = MEAN (OF se1-se10);
  semin  = MIN (OF se1-se10);
  semax  = MAX (OF se1-se10);
* Using the dash (-) notation;
seavg = MEAN (OF se1-se10);
This is equivalent to:
seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
Note: OF is essential. Without it, SAS assumes you want se1 minus se10.
To use this notation, the variable names must share the same root.
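* Two ways to compute a mean;
seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
and
seavg = (se1+se2+se3+se4+se5+se6+se7+se8+se9+se10)/10;
The first method averages the non-missing values; the result is missing only when all of the values are missing.
The second method requires every value to be non-missing; otherwise the result is missing.
if N(of se1-se10) > 5 then seavg = MEAN(of se1-se10);
What does this statement do?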
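The difference between the two methods shows up as soon as one value is missing. A small sketch with three made-up scores (the values and the variable names avg_fn, avg_sum and nvalid are hypothetical, for illustration only):

```sas
DATA _NULL_;
  se1 = 1; se2 = .; se3 = 3;          /* hypothetical scores, one missing */
  avg_fn  = MEAN(OF se1-se3);         /* averages the non-missing values */
  avg_sum = (se1 + se2 + se3) / 3;    /* missing, because se2 is missing */
  nvalid  = N(OF se1-se3);            /* count of non-missing values */
  PUT avg_fn= avg_sum= nvalid=;
RUN;
```

The log should show avg_fn=2, avg_sum=. and nvalid=2, along with a note that a missing value was generated by the arithmetic. N is what lets a program insist on a minimum number of non-missing values before taking a mean.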
Arrays
- Used to shorten code
- Used to run the same code repeatedly
- Used with DO/END loops

ARRAY wtlb(3) wt1 wt2 wt3;
ARRAY wtkg(3) newwt1 newwt2 newwt3;
DO index = 1 to 3;
  wtkg(index) = wtlb(index) / 2.2;
END;
/* Has the same effect as:
   newwt1 = wt1 / 2.2;
   newwt2 = wt2 / 2.2;
   newwt3 = wt3 / 2.2;
*/
ARRAY se(10) se1-se10;
ARRAY hse(10) hse1-hse10;    * hse1-hse10 are new variables;
DO senumber = 1 to 10;
  if se(senumber) = 1 then hse(senumber) = 0;
  else if se(senumber) in(2,3,4) then hse(senumber) = 100;
END;
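Recoding to 0 and 100 means that the mean of each new variable is the percent of patients with that side effect. A self-contained sketch with three side effects and four made-up patients (the data and the data set name demo are hypothetical):

```sas
DATA demo;
  INPUT se1-se3;
  ARRAY se(3)  se1-se3;
  ARRAY hse(3) hse1-hse3;            /* new 0/100 indicator variables */
  DO i = 1 TO 3;
    IF se(i) = 1 THEN hse(i) = 0;                    /* score 1: absent */
    ELSE IF se(i) IN (2,3,4) THEN hse(i) = 100;      /* scores 2-4: present */
  END;                               /* a missing score leaves hse missing */
  DROP i;
  DATALINES;
1 2 1
1 1 3
2 1 1
4 1 .
;
RUN;

PROC MEANS DATA=demo N MEAN;
  VAR hse1-hse3;
RUN;
```

In this made-up data two of the four patients have side effect 1, so the mean of hse1 should be 50; hse3 has one missing score, so its N should be 3.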
PROC PRINT DATA = example (OBS=15);
  VAR bmi rbmi1 rbmi2 seavg semin semax ;
  TITLE 'Listing of Selected Data for 15 Patients ';
RUN;
PROC FREQ DATA = example;
  TABLES semax;
  TITLE 'Distribution of Worse Side Effect Value';
  TITLE2 'Side Effect Scores Range from 1 to 4';
PROC MEANS DATA = example;
  VAR hse1-hse10;
  TITLE 'Percent of Patients With Condition by Condition';
PROC UNIVARIATE DATA = example PLOT;
  VAR ursod lursod;
  TITLE 'Stem and Leaf Plots for Urine Sodium Data';
RUN;
Listing of Selected Data for 15 Patients

Obs      bmi     rbmi1   rbmi2   seavg   semin   semax
  1    28.2620     28     28.3    1.1      1       2
  2    35.9963     36     36.0    1.0      1       1
  3    27.0489     27     27.0    1.0      1       1
  4    28.2620     28     28.3    1.1      1       2
  5    33.2008     33     33.2    1.0      1       1
  6    27.7691     28     27.8    1.2      1       2
  7    32.6040     33     32.6    1.0      1       1
  8    22.4057     22     22.4    1.2      1       2
  9    37.2037     37     37.2    1.1      1       2
 10    33.1717     33     33.2    1.7      1       3
Distribution of Worse Side Effect Value
Side Effect Scores Range from 1 to 4

The FREQ Procedure

                                  Cumulative    Cumulative
semax    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    1          33       33.00            33        33.00
    2          52       52.00            85        85.00
    3          13       13.00            98        98.00
    4           2        2.00           100       100.00

Note: 2 patients had at least 1 severe side effect.
Percent of Patients With Condition by Condition

The MEANS Procedure

Variable      N          Mean       Std Dev    Minimum        Maximum
hse1        100    12.0000000    32.6598632          0    100.0000000
hse2        100    21.0000000    40.9360181          0    100.0000000
hse3        100     8.0000000    27.2659924          0    100.0000000
hse4        100    13.0000000    33.7997669          0    100.0000000
hse5        100    10.0000000    30.1511345          0    100.0000000
hse6        100    30.0000000    46.0566186          0    100.0000000
hse7        100    16.0000000    36.8452949          0    100.0000000
hse8        100    31.0000000    46.4823199          0    100.0000000
hse9        100     7.0000000    25.6432400          0    100.0000000
hse10       100    14.0000000    34.8735088          0    100.0000000

Note: these means are the percent of patients with each side effect.
The UNIVARIATE Procedure
Variable: ursod
[Normal Probability Plot of ursod: vertical axis from 15 to 165, horizontal axis from -2 to +2]
Variable: lursod
[Normal Probability Plot of lursod: vertical axis from 2.65 to 5.15, horizontal axis from -2 to +2]
The log-transformed value shows a better linear pattern.
* Character functions;
Working with names, addresses, etc.
Suppose we are given:
    fname = GREGORY
    lname = GRANDITS
    MI    = A
The goal is to create a new variable:
    fullname = Gregory A. Grandits
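Functions and operators needed
    SUBSTR    extracts a substring from a character variable
    LOWCASE   converts characters to lowercase
    COMPBL    compresses runs of multiple blanks into a single blank
    ||        concatenates variables or strings, for example
              var1 = 'abc'; var2 = 'def'; var3 = var1||var2;
              the value of var3 is 'abcdef'
var=SUBSTR(argument,position<,n>): extracts a substring from an argument, starting at position and running for n characters (to the end of the argument if n is omitted).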
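Each of these pieces can be checked on its own before they are combined. A minimal sketch using literal strings; the variable names word, mixed and squeezed are illustrative only:

```sas
DATA _NULL_;
  LENGTH word $10 mixed $10 squeezed $20;
  word = 'GREGORY';
  /* first character as is, the rest converted to lowercase */
  mixed = SUBSTR(word,1,1) || LOWCASE(SUBSTR(word,2));
  /* runs of blanks collapse to one blank; single blanks are kept */
  squeezed = COMPBL('A.     Grandits');
  PUT mixed= squeezed=;
RUN;
```

The log should show mixed=Gregory and squeezed=A. Grandits. COMPBL does not remove blanks entirely, which is why the blank between words survives.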
* Character functions;
DATA names;
  INFILE DATALINES DSD;
  INFORMAT fname $20. lname $20. mi $1. ;
  INPUT lname fname mi ;
  LENGTH fnamemix $20. lnamemix $20. fullname $44.;
  * Take the first character of the name, then append the lowercase of everything from the second character on;
  fnamemix = SUBSTR(fname,1,1) || LOWCASE(SUBSTR(fname,2));
  lnamemix = SUBSTR(lname,1,1) || LOWCASE(SUBSTR(lname,2));
  * Concatenate the three name parts and remove the extra blanks;
  fullname = COMPBL (fnamemix || mi || '. ' || lnamemix ) ;
  DATALINES;
GRANDITS, GREGORY, A
SIU, YI, W
;

Obs   fnamemix   lnamemix   mi   fullname
 1    Gregory    Grandits   A    Gregory A. Grandits
 2    Yi         Siu        W    Yi W. Siu
SCAN(argument,n <,delimiters>)

DATA names;
  INFILE DATALINES DSD;
  INFORMAT fullname $44.;
  INPUT fullname ;
  LENGTH fname $20. lname $20. mi $2.;
  fname = SCAN(fullname,1);      *Take 1st word;
  mi    = SCAN(fullname,2,' ');  *Take 2nd word;
  lname = SCAN(fullname,3);      *Take 3rd word;
  DATALINES;
Gregory A. Grandits
Yi W. Siu
;
PROC PRINT DATA=names;
  VAR fullname fname mi lname;
  TITLE 'Original Variable and the New Variables';
RUN;

Obs   fullname              fname     mi   lname
 1    Gregory A. Grandits   Gregory   A.   Grandits
 2    Yi W. Siu             Yi        W.   Siu
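The third argument of SCAN explains why mi keeps its period while fname and lname come out clean. A minimal sketch on a single literal name; the variable names mi_default, mi_blank and last are illustrative only:

```sas
DATA _NULL_;
  fullname = 'Gregory A. Grandits';
  mi_default = SCAN(fullname, 2);        /* default delimiters include the period */
  mi_blank   = SCAN(fullname, 2, ' ');   /* only blanks separate words */
  last       = SCAN(fullname, -1, ' ');  /* negative n counts from the right */
  PUT mi_default= mi_blank= last=;
RUN;
```

The log should show mi_default=A, mi_blank=A. and last=Grandits. Counting from the right is one way to pick up the last name when some names have no middle initial.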