Stat 302 Statistical Software and Its Applications
SAS Functions
Fritz Scholz
Department of Statistics, University of Washington
Winter Quarter 2013
Creating New Variables
Here we create new variables using functions applied to existing variables. We will do this by example, showing code and output. This is based on Ch. 11 in Cody's book.
A List of More Math Functions

Function          Description
ABS(x)            Absolute value of x, |x|
EXP(x)            Exponential of x, e^x
INT(x)            Truncate x to an integer
LOG(x)            Natural log of x, ln(x) = log_e(x)
LOG10(x)          Log base 10 of x, log_10(x)
MOD(x,d)          Remainder when x is divided by d
ROUND(num)        Round num to the nearest integer
ROUND(num,unit)   Round num to the nearest specified unit
SQRT(x)           Square root of x
NORMAL(seed)      Returns a standard normal random number
UNIFORM(seed)     Returns a uniform[0,1] random number
Example Code for Function Evaluations

    data functions;
        input x;
        rndx   = round(x);
        intx   = int(x);
        sqx    = x**2;
        sqrtx  = sqrt(x);
        log10x = log10(x);
        lnx    = log(x);
        absx   = abs(x);
        expx   = exp(x);
        U      = uniform(123);
        rndU   = round(U,.02);
        Z      = normal(123);
        modZ   = mod(Z,.5);
    datalines;
    4
    9.3
    -9.3
    ;
    run;

    title "Function Evaluations";
    proc print data=functions noobs;
    run;
Output for Function Evaluations

Function Evaluations

   x  rndx  intx    sqx    sqrtx   log10x      lnx  absx      expx        U  rndU        Z     modZ
 4.0     4     4  16.00  2.00000  0.60206  1.38629   4.0     54.60  0.75040  0.76  0.65572  0.15572
 9.3     9     9  86.49  3.04959  0.96848  2.23001   9.3  10938.02  0.90603  0.90  0.25903  0.25903
-9.3    -9    -9  86.49        .        .        .   9.3      0.00  0.78644  0.78  0.96175  0.46175
Here we used seed 123 in U = uniform(123) and Z = normal(123). A fixed seed always generates the same sequence of random deviates from run to run, although the values differ from row to row. Using U = uniform(0) and Z = normal(0) produces different results from run to run, because the system then provides a random seed. The seed must satisfy seed < 2^31 − 1. If seed ≥ 2^31 − 1, a missing value results, but subsequent seeds are OK.
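The difference between a fixed seed and seed 0 can be checked with a small sketch like the one below. The data set names fixed_seed and clock_seed and the loop variable i are made up for this illustration. The two cases are kept in separate DATA steps on purpose, since random number function calls within one DATA step share a single stream.

```sas
/* Fixed seed: rerunning this step should reproduce the same three values */
data fixed_seed;
   do i = 1 to 3;
      U = uniform(123);
      output;
   end;
run;

/* Seed 0: the system supplies the seed, so the values should change from run to run */
data clock_seed;
   do i = 1 to 3;
      U = uniform(0);
      output;
   end;
run;

title "Fixed Seed";
proc print data = fixed_seed noobs;
run;

title "System-Supplied Seed";
proc print data = clock_seed noobs;
run;
```

Submitting the whole program twice and comparing the two listings shows the effect: the first listing should stay the same, the second should not.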
Functions That Work With Missing Values
    libname learn "U:\learn";

    data test_miss;
        set learn.blood;
        if Gender = ' ' then MissGender + 1;
        if WBC = . then MissWBC + 1;
        if RBC = . then MissWBC + 1;
        if Chol < 200 and Chol ne . then Level = 'Low';
        else if Chol ge 200 then Level = 'High';
    run;

    title "Missing Values in Blood Data";
    proc print data = test_miss noobs;
    run;
Output: Missing Values in Blood Data (page 1 of 29)

Gender  BloodType  AgeGroup  Subject   WBC   RBC  Chol  MissGender  MissWBC  Level
Female  AB         Young           1  7710  7.40   258           0        0  Hig
Male    AB         Old             2  6560  4.70     .           0        0
Male    A          Young           3  5690  7.53   184           0        0  Low
Male    B          Old             4  6680  6.85     .           0        0
Male    A          Young           5     .  7.72   187           0        1  Low
Male    A          Old             6  6140  3.69   142           0        1  Low
Female  A          Young           7  6550  4.78   290           0        1  Hig
Male    O          Old             8  5200  4.96   151           0        1  Low
Male    O          Young           9     .  5.66   311           0        2  Hig
Female  O          Young          10  7710  5.55     .           0        2
Male    B          Young          11     .  5.62   152           0        3  Low
Female  O          Young          12  7410  5.85   241           0        3  Hig
Male    O          Young          13  5780  4.37     .           0        3
Female  O          Old            14  5590  6.94   152           0        3  Low
Female  A          Old            15  6520  6.03   217           0        3  Hig
Female  O          Young          16  7210  5.17   193           0        3  Low
Male    A          Old            17     .  5.63     .           0        4
Male    O          Old            18  6410  6.02   224           0        4  Hig
Female  A          Old            19  6360  3.74   211           0        4  Hig
Male    A          Young          20  7580  5.13   179           0        4  Low
Female  A          Old            21  7150  6.35   200           0        4  Hig
Female  A          Young          22  8710  5.12   211           0        4  Hig
Female  O          Young          23  7660  4.91     .           0        4
Female  B          Young          24  8280  6.14     .           0        4
Female  AB         Old            25  7480  4.70   183           0        4  Low
Male    O          Young          26  8320  4.74   186           0        4  Low
Female  A          Old            27  8020  5.03   182           0        4  Low
Functions That Work With Missing Values

    libname learn "U:\learn";
    libname data "U:\data";

    data data.test_miss;
        set learn.blood;
        if missing(Gender) then MissGender + 1;
        if missing(WBC) then MissWBC + 1;
        if missing(RBC) then MissWBC + 1;
        if Chol < 200 and not missing(Chol) then Level = 'Low';
        else if Chol ge 200 then Level = 'High';
    run;

    title "Missing Values in Blood Data";
    proc print data = data.test_miss noobs;
    run;

Same output result, but saved as the permanent data set U:\data\test_miss.sas7bdat.
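In the listing, Level appears as Hig rather than High. SAS fixes the length of a character variable at its first assignment, here the three-character value 'Low'. A minimal sketch of one possible remedy, a LENGTH statement before the assignments, is given below. The data set name level_demo and the three cholesterol values are made up for this illustration.

```sas
data level_demo;
   length Level $ 4;          /* reserve 4 characters so that High is not truncated */
   input Chol;
   if Chol < 200 and not missing(Chol) then Level = 'Low';
   else if Chol ge 200 then Level = 'High';
datalines;
258
.
184
;
run;

title "Level With Explicit Length";
proc print data = level_demo noobs;
run;
```

With the LENGTH statement in place, the first observation should print as High, the second should have a blank Level, and the third should print as Low.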
Comment on MISSING Function
The MISSING function also returns TRUE for the alternative numeric missing values .A, .B, .C, ..., .Z and ._ . Such alternative numeric missing values can be useful when you deal with different types of missing information. For example, .A could stand for "no answer", .B could mean "not applicable", and .C could mean "to be determined".
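As a small sketch of how such alternative missing values might enter a data set, the MISSING statement declares which letters in the raw data are to be read as special missing values. The data set name survey, the variable names, and the data lines are invented for this illustration.

```sas
/* Letters A, B and C in numeric input fields are read as .A, .B and .C */
missing A B C;

data survey;
   input ID $ Answer;
   IsMissing = missing(Answer);   /* 1 for any kind of missing value, 0 otherwise */
datalines;
001 5
002 A
003 .
004 B
;
run;

title "Special Missing Values";
proc print data = survey noobs;
run;
```

IsMissing should be 0 for ID 001 and 1 for the other three observations, since the MISSING function treats the ordinary missing value and the special ones alike.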
Descriptive Statistics Functions

    data psych;
        input ID $ Q1-Q10;
        if n(of Q1-Q10) ge 7 then Score = mean(of Q1-Q10);
        MaxScore = max(of Q1-Q10);
        MinScore = min(of Q1-Q10);
    datalines;
    001 4 1 3 9 1 2 3 5 . 3
    002 3 5 4 2 . . . 2 4 .
    003 9 8 7 6 5 4 3 2 1 5
    ;

    title "Descriptive Stats";
    proc print data = psych noobs;
        * var Score maxScore MinScore;
    run;
Output from Descriptive Statistics Functions

Descriptive Stats

ID   Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10    Score  MaxScore  MinScore
001   4   1   3   9   1   2   3   5   .    3  3.44444         9         1
002   3   5   4   2   .   .   .   2   4    .        .         5         2
003   9   8   7   6   5   4   3   2   1    5  5.00000         9         1
Output from Descriptive Statistics Functions (with the VAR statement activated)

Descriptive Stats

  Score  MaxScore  MinScore
3.44444         9         1
      .         5         2
5.00000         9         1
Comments on Previous Program
The N function returns the number of non-missing numeric values among its arguments. A companion function, NMISS, returns the number of missing numeric values among its arguments. In all such functions you must precede a variable list of the form Var1-Varn with the keyword OF; otherwise SAS assumes that you want to subtract Varn from Var1. The MEAN function ignores missing values, as do the other descriptive statistics functions such as MAX and MIN.
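The role of the keyword OF can be seen in a small sketch. The data set name of_demo, the variable names, and the single data line are made up for this illustration.

```sas
data of_demo;
   input Q1-Q3;
   NumPresent = n(of Q1-Q3);       /* number of non-missing values among Q1, Q2, Q3 */
   NumMissing = nmiss(of Q1-Q3);   /* number of missing values among Q1, Q2, Q3 */
   NoOf       = n(Q1-Q3);          /* without OF: one argument, namely Q1 minus Q3 */
datalines;
4 . 6
;
run;

title "N and NMISS With and Without OF";
proc print data = of_demo noobs;
run;
```

For this data line NumPresent should be 2 and NumMissing should be 1, while NoOf should be 1, because Q1-Q3 is evaluated as the single non-missing number 4 - 6 = -2.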
Working with Order Statistics

    data order_stats;
        set psych;
        * assumes that psych was created previously and is still in the WORK library;
        M3 = mean(largest(1,of Q1-Q10), largest(2,of Q1-Q10), largest(3,of Q1-Q10),
                  smallest(1,of Q1-Q10), smallest(2,of Q1-Q10), smallest(3,of Q1-Q10));
        * computes the average of the 3 extreme observations from each end;
    run;

    title "Mean of Order Statistics";
    proc print data = order_stats noobs;
        var Q1-Q10 M3;
    run;
Working with Order Statistics

Mean of Order Statistics

Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10       M3
 4   1   3   9   1   2   3   5   .    3  3.66667
 3   5   4   2   .   .   .   2   4    .  3.33333
 9   8   7   6   5   4   3   2   1    5  5.00000
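To see what LARGEST and SMALLEST return on their own, a minimal sketch with one made-up data line may help. The data set name order_demo and the variable names Top, Second and Bottom are invented for this illustration.

```sas
data order_demo;
   input Q1-Q5;
   Top    = largest(1, of Q1-Q5);    /* largest non-missing value */
   Second = largest(2, of Q1-Q5);    /* second largest non-missing value */
   Bottom = smallest(1, of Q1-Q5);   /* smallest non-missing value */
datalines;
3 . 7 1 5
;
run;

title "LARGEST and SMALLEST";
proc print data = order_demo noobs;
run;
```

Top should be 7, Second should be 5, and Bottom should be 1. The missing value in the second position is ignored by both functions.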
The SUM Function

    data sum;
        set learn.EndOfYear;
        * assumes that Learn is in the library;
        Total = sum(0, of Pay1-Pay12, of Extra1-Extra12);
    run;

    Title "Total Pay";
    proc print data = sum noobs;
        var Total;
    run;
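Using Total = Pay1+...+Pay12+Extra1+...+Extra12 would give a missing value whenever any one of the variables is missing, and you would need to write out all 24 variables. The SUM function ignores missing values and accepts variable lists such as of Pay1-Pay12. The leading 0 ensures that Total is 0 rather than missing when all of the other arguments are missing.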
Output for the SUM Function

Total Pay

Total
35800
    0
28300
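The contrast between the + operator and the SUM function can be tried on a small made-up example. The data set name sum_demo, the three Pay variables, and the two data lines are invented for this illustration and are not taken from learn.EndOfYear.

```sas
data sum_demo;
   input Pay1-Pay3;
   PlusTotal = Pay1 + Pay2 + Pay3;     /* missing as soon as one term is missing */
   SumTotal  = sum(0, of Pay1-Pay3);   /* ignores missing values, leading 0 avoids a missing total */
datalines;
100 . 300
. . .
;
run;

title "Plus Operator Versus SUM Function";
proc print data = sum_demo noobs;
run;
```

PlusTotal should be missing in both observations, while SumTotal should be 400 in the first and 0 in the second.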