Stat 302 Statistical Software and Its Applications SAS Functions

Author: Amanda Heath
Fritz Scholz
Department of Statistics, University of Washington

Winter Quarter 2013

Creating New Variables

Here we create new variables using functions applied to existing variables. We will do this by example, showing code and output. This is based on Ch. 11 in Cody's book.

A List of Functions

Function          Description                                Math
ABS(x)            Absolute value of x                        |x|
EXP(x)            Exponential of x                           e^x
INT(x)            Truncate x to an integer
LOG(x)            Natural log of x                           ln(x) = log_e(x)
LOG10(x)          Log base 10 of x                           log_10(x)
MOD(x,d)          Remainder when x is divided by d
ROUND(num)        Round num to the nearest integer
ROUND(num,unit)   Round num to the nearest specified unit
SQRT(x)           Square root of x
NORMAL(seed)      Returns a standard normal random number
UNIFORM(seed)     Returns a uniform[0,1] random number
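INT and ROUND differ for negative numbers, and ROUND and MOD both take a second argument. The following is a minimal sketch using a hypothetical data set round_demo (not part of the lecture); the comments give the values that follow from the definitions above, assuming SAS's MOD returns a remainder with the same sign as its first argument.

```sas
/* Hypothetical data set round_demo: compare INT, ROUND and MOD */
data round_demo;
   input x;
   intx     = int(x);         /* drops the fraction: int(-9.7) = -9             */
   rndx     = round(x);       /* nearest integer: round(-9.7) = -10             */
   rnd_half = round(x, 0.5);  /* nearest multiple of 0.5: round(9.7,0.5) = 9.5  */
   mod3     = mod(x, 3);      /* remainder after division by 3, sign of x       */
datalines;
9.7
-9.7
10
;
run;

title "INT, ROUND and MOD";
proc print data=round_demo noobs;
run;
```

For x = -9.7 this gives intx = -9 but rndx = -10, which is the practical difference between truncating and rounding.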

Example Code for Function Evaluations

data functions;
   input x;
   rndx = round(x);
   intx = int(x);
   sqx = x**2;
   sqrtx = sqrt(x);
   log10x = log10(x);
   lnx = log(x);
   absx = abs(x);
   expx = exp(x);
   U = uniform(123);
   rndU = round(U,.02);
   Z = normal(123);
   modZ = mod(Z,.5);
datalines;
4
9.3
-9.3
;
run;

title "Function Evaluations";
proc print data=functions noobs;
run;

Output for Function Evaluations

Function Evaluations

   x  rndx  intx    sqx    sqrtx   log10x      lnx  absx      expx        U  rndU        Z     modZ
 4.0     4     4  16.00  2.00000  0.60206  1.38629   4.0     54.60  0.75040  0.76  0.65572  0.15572
 9.3     9     9  86.49  3.04959  0.96848  2.23001   9.3  10938.02  0.90603  0.90  0.25903  0.25903
-9.3    -9    -9  86.49        .        .        .   9.3      0.00  0.78644  0.78  0.96175  0.46175

Here we used seed 123 in U = uniform(123) and Z = normal(123). A fixed seed always generates the same random deviates from run to run, although the values differ from row to row within a run. Using U = uniform(0) and Z = normal(0) produces different results on each run, because SAS then takes the seed from the time of day. The seed must satisfy seed < 2^31 - 1. If seed >= 2^31 - 1, a missing value results, but the following calls work normally.
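To check that a fixed seed makes results reproducible, one can run the same DATA step twice with seed 123 and compare the two data sets. This is a minimal sketch; the data set names run1 and run2 are hypothetical. Within each DATA step the three values of u differ from one another, but run1 and run2 should be identical, so PROC COMPARE should report no differences.

```sas
/* Two hypothetical DATA steps that use the same nonzero seed */
data run1;
   do i = 1 to 3;
      u = uniform(123);   /* seed 123: the stream restarts in each DATA step */
      output;
   end;
run;

data run2;
   do i = 1 to 3;
      u = uniform(123);
      output;
   end;
run;

/* Should report that all values compare equal */
proc compare base=run1 compare=run2;
run;
```

Replacing 123 by 0 in both steps should make the two data sets differ, since the seed then comes from the clock.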

Functions That Work With Missing Values

libname learn "U:\learn";
data test_miss;
   set learn.blood;
   if Gender = ' ' then MissGender + 1;
   if WBC = . then MissWBC + 1;
   if RBC = . then MissWBC + 1;
   if Chol < 200 and Chol ne . then Level = 'Low';
   else if Chol ge 200 then Level = 'High';
run;
title "Missing Values in Blood Data";
proc print data = test_miss noobs;
run;

Output for Missing Values in Blood Data (page 1 of 29)

Missing Values in Blood Data

Gender  BloodType  AgeGroup  Subject   WBC   RBC  Chol  MissGender  MissWBC  Level
Female  AB         Young           1  7710  7.40   258           0        0  Hig
Male    AB         Old             2  6560  4.70     .           0        0
Male    A          Young           3  5690  7.53   184           0        0  Low
Male    B          Old             4  6680  6.85     .           0        0
Male    A          Young           5     .  7.72   187           0        1  Low
Male    A          Old             6  6140  3.69   142           0        1  Low
Female  A          Young           7  6550  4.78   290           0        1  Hig
Male    O          Old             8  5200  4.96   151           0        1  Low
Male    O          Young           9     .  5.66   311           0        2  Hig
Female  O          Young          10  7710  5.55     .           0        2
Male    B          Young          11     .  5.62   152           0        3  Low
Female  O          Young          12  7410  5.85   241           0        3  Hig
Male    O          Young          13  5780  4.37     .           0        3
Female  O          Old            14  5590  6.94   152           0        3  Low
Female  A          Old            15  6520  6.03   217           0        3  Hig
Female  O          Young          16  7210  5.17   193           0        3  Low
Male    A          Old            17     .  5.63     .           0        4
Male    O          Old            18  6410  6.02   224           0        4  Hig
Female  A          Old            19  6360  3.74   211           0        4  Hig
Male    A          Young          20  7580  5.13   179           0        4  Low
Female  A          Old            21  7150  6.35   200           0        4  Hig
Female  A          Young          22  8710  5.12   211           0        4  Hig
Female  O          Young          23  7660  4.91     .           0        4
Female  B          Young          24  8280  6.14     .           0        4
Female  AB         Old            25  7480  4.70   183           0        4  Low
Male    O          Young          26  8320  4.74   186           0        4  Low
Female  A          Old            27  8020  5.03   182           0        4  Low
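Two details of this output deserve a comment. Level shows Hig instead of High because SAS fixes the length of a character variable where it first appears in the DATA step; here that is the assignment of the 3-character constant 'Low', so 'High' is truncated. Also, the third IF statement adds to MissWBC when RBC is missing, so both kinds of missing values land in one counter. If separate counts were intended (our assumption), a possible sketch, with a hypothetical output data set test_miss2 and a new counter MissRBC, is:

```sas
data test_miss2;
   set learn.blood;
   length Level $ 4;                    /* must precede the first assignment to Level */
   if Gender = ' ' then MissGender + 1;
   if WBC = . then MissWBC + 1;
   if RBC = . then MissRBC + 1;         /* hypothetical separate counter for RBC */
   /* a missing Chol is smaller than any number, so Chol < 200 alone */
   /* would be true for missing values; hence the extra Chol ne . test */
   if Chol < 200 and Chol ne . then Level = 'Low';
   else if Chol ge 200 then Level = 'High';
run;

title "Missing Values in Blood Data (separate counters)";
proc print data=test_miss2 noobs;
run;
```

Placing the LENGTH statement after SET is fine here because Level is not a variable of learn.blood.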

Functions That Work With Missing Values

libname learn "U:\learn";
libname data "U:\data";
data data.test_miss;
   set learn.blood;
   if missing(Gender) then MissGender + 1;
   if missing(WBC) then MissWBC + 1;
   if missing(RBC) then MissWBC + 1;
   if Chol < 200 and not missing(Chol) then Level = 'Low';
   else if Chol ge 200 then Level = 'High';
run;
title "Missing Values in Blood Data";
proc print data = data.test_miss noobs;
run;

The output is the same as before, but the data set is now saved permanently as U:\data\test_miss.sas7bdat. Note that MISSING works for both character variables (Gender) and numeric variables (WBC, RBC, Chol).

Comment on MISSING Function

The MISSING function also returns TRUE for the special numeric missing values .A, .B, .C, ..., .Z and ._ (dot underscore). Such special missing values are useful when you deal with different types of missing information. For example, .A could stand for "no answer", .B for "not applicable", and .C for "to be determined".
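The difference between MISSING and a comparison with . matters once special missing values are present: .A is a different value from ., so a test such as WBC = . does not catch it. A minimal sketch with hypothetical data sets special and special_check:

```sas
/* Hypothetical data: an ordinary missing value, two special ones, a number */
data special;
   length Code $ 2;
   x = .;   Code = '.';  output;
   x = .A;  Code = '.A'; output;   /* e.g. "no answer"      */
   x = .B;  Code = '.B'; output;   /* e.g. "not applicable" */
   x = 7;   Code = '7';  output;
run;

data special_check;
   set special;
   EqDot  = (x = .);      /* 1 only for the ordinary missing value */
   IsMiss = missing(x);   /* 1 for ., .A and .B                    */
run;

title "MISSING versus x = .";
proc print data=special_check noobs;
run;
```

This is one reason the MISSING version of the blood-data program is the safer one.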

Descriptive Statistics Functions

data psych;
   input ID $ Q1-Q10;
   if n(of Q1-Q10) ge 7 then Score = mean(of Q1-Q10);
   MaxScore = max(of Q1-Q10);
   MinScore = min(of Q1-Q10);
datalines;
001 4 1 3 9 1 2 3 5 . 3
002 3 5 4 2 . . . 2 4 .
003 9 8 7 6 5 4 3 2 1 5
;
title "Descriptive Stats";
proc print data = psych noobs;
   * var Score MaxScore MinScore;
run;

Output from Descriptive Statistics Functions

Descriptive Stats

ID   Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10    Score  MaxScore  MinScore
001   4   1   3   9   1   2   3   5   .    3  3.44444         9         1
002   3   5   4   2   .   .   .   2   4    .        .         5         2
003   9   8   7   6   5   4   3   2   1    5  5.00000         9         1

Output from Descriptive Statistics Functions (with the VAR statement uncommented)

Descriptive Stats

  Score  MaxScore  MinScore
3.44444         9         1
      .         5         2
5.00000         9         1

Comments on Previous Program

The N function returns the number of non-missing numeric values among its arguments; the companion function NMISS returns the number of missing numeric values. In the program, n(of Q1-Q10) ge 7 means that Score is computed only when at least 7 of the 10 answers are present, which is why Score is missing for ID 002 (only 6 answers). When a variable list such as Var1-Varn is passed to these functions, it must be preceded by the keyword OF; otherwise SAS reads Var1-Varn as the difference Var1 minus Varn. The MEAN function ignores missing values, and so do MAX, MIN and the other descriptive statistics functions.
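The effect of OF, N and NMISS can be checked on one hypothetical observation with three answers, one of them missing. The data set of_demo and its variables are made up for illustration; the comments show the values implied by the rules above.

```sas
/* Hypothetical one-observation data set */
data of_demo;
   Q1 = 4;  Q2 = .;  Q3 = 3;
   NonMiss = n(of Q1-Q3);      /* 2 non-missing values                  */
   NumMiss = nmiss(of Q1-Q3);  /* 1 missing value                       */
   WithOf  = mean(of Q1-Q3);   /* (4 + 3) / 2 = 3.5, Q2 is ignored      */
   NoOf    = mean(Q1-Q3);      /* single argument 4 - 3 = 1, so mean 1  */
run;

title "Why OF Matters";
proc print data=of_demo noobs;
run;
```

No error or warning flags the missing OF, which is what makes this mistake easy to overlook.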

Working with Order Statistics

data order_stats;
   set psych;
   * assumes that psych was created previously and is still in the WORK library;
   M3 = mean(largest(1,of Q1-Q10), largest(2,of Q1-Q10), largest(3,of Q1-Q10),
             smallest(1,of Q1-Q10), smallest(2,of Q1-Q10), smallest(3,of Q1-Q10));
   * computes the average of the 3 extreme observations from each end;
run;
title "Mean of Order Statistics";
proc print data = order_stats noobs;
   var Q1-Q10 M3;
run;

Working with Order Statistics

Mean of Order Statistics

Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10       M3
 4   1   3   9   1   2   3   5   .    3  3.66667
 3   5   4   2   .   .   .   2   4    .  3.33333
 9   8   7   6   5   4   3   2   1    5  5.00000
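LARGEST and SMALLEST, like MEAN, skip missing values, which is why M3 is still defined for ID 002 with only six answers. If k exceeds the number of non-missing values, the result is missing. A minimal sketch with hypothetical variables a, b, c and d:

```sas
/* Hypothetical values; one of the four is missing */
data ks_demo;
   a = 3;  b = .;  c = 7;  d = 1;
   L1 = largest(1, of a b c d);    /* 7: the missing value is skipped        */
   L2 = largest(2, of a b c d);    /* 3                                      */
   S1 = smallest(1, of a b c d);   /* 1, not the missing value               */
   L4 = largest(4, of a b c d);    /* k = 4 exceeds the 3 non-missing values */
run;

title "LARGEST and SMALLEST with a Missing Value";
proc print data=ks_demo noobs;
run;
```

For ID 002 (values 3 5 4 2 2 4) the three largest are 5, 4, 4 and the three smallest 2, 2, 3, giving M3 = 20/6 = 3.33333 as in the output.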

The SUM Function

data sum;
   set learn.EndOfYear;
   * assumes that the libref learn has been assigned with a LIBNAME statement;
   Total = sum(0, of Pay1-Pay12, of Extra1-Extra12);
run;
title "Total Pay";
proc print data = sum noobs;
   var Total;
run;

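Writing Total = Pay1 + ... + Pay12 + Extra1 + ... + Extra12 instead would give a missing Total whenever any one of the 24 variables is missing, and you would have to write out every variable. The SUM function ignores missing values and accepts variable lists such as of Pay1-Pay12. The leading 0 makes Total equal 0 rather than missing when all 24 values are missing.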

Output from the SUM Function

Total Pay

Total
35800
    0
28300
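The three ways of adding up values can be compared on a small hypothetical data set sum_demo with three variables P1-P3: one complete row, one row with a missing value, and one row that is entirely missing.

```sas
/* Hypothetical data: P1-P3 are complete, partly missing, or all missing */
data sum_demo;
   input P1-P3;
   Plus    = P1 + P2 + P3;      /* missing if any term is missing          */
   SumOnly = sum(of P1-P3);     /* ignores missing; missing if all missing */
   SumZero = sum(0, of P1-P3);  /* leading 0: result is 0 if all missing   */
datalines;
100 200 300
100 . 300
. . .
;
run;

title "SUM versus +";
proc print data=sum_demo noobs;
run;
```

The second row gives Plus missing but SumOnly = SumZero = 400; the third row gives 0 only for SumZero, which mirrors the 0 in the Total Pay output.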