Keywords and Formulas

Author: Jessica Ryan
SAS Elementary Statistics Procedures: Keywords and Formulas


SAS Elementary Statistics Procedures

Simple Statistics

The base SAS procedures use a standardized set of keywords to refer to statistics. You specify these keywords in SAS statements to request the statistics to be displayed or stored in an output data set.

In the following notation, summation is over observations that contain nonmissing values of the analyzed variable and, except where shown, over nonmissing weights and frequencies of one or more:

$x_i$ is the nonmissing value of the analyzed variable for observation $i$.

$f_i$ is the frequency that is associated with $x_i$ if you use a FREQ statement. If you omit the FREQ statement, then $f_i = 1$ for all $i$.

$w_i$ is the weight that is associated with $x_i$ if you use a WEIGHT statement. The base procedures automatically exclude the values of $x_i$ with missing weights from the analysis.

By default, the base procedures treat a negative weight as if it is equal to zero. However, if you use the EXCLNPWGT option in the PROC statement, then the procedure also excludes those values of $x_i$ with nonpositive weights. Note that most SAS/STAT procedures, such as PROC TTEST and PROC GLM, exclude values with nonpositive weights by default.

If you omit the WEIGHT statement, then $w_i = 1$ for all $i$.

$n$ is the number of nonmissing values of $x_i$, $\sum_i f_i$. If you use the EXCLNPWGT option and the WEIGHT statement, then $n$ is the number of nonmissing values with positive weights.

$\bar{x}$ is the mean

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

$s^2$ is the variance

$$s^2 = \frac{1}{d} \sum_i w_i (x_i - \bar{x})^2$$

where $d$ is the variance divisor (the VARDEF= option) that you specify in the PROC statement. Valid values are as follows:

When VARDEF=    $d$ equals . . .
N               $n$
DF              $n - 1$
WEIGHT          $\sum_i w_i$
WDF             $\sum_i w_i - 1$

The default is DF.

$z_i$ is the standardized variable

$$z_i = \frac{x_i - \bar{x}}{s}$$
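As a rough illustration of how the variance divisor changes the result, the following Python sketch computes the weighted mean and variance under each VARDEF= value. The function names are hypothetical, every frequency is assumed to be 1 (so that n is simply the number of values), and negative weights are clamped to zero to mirror the default behavior described above. It is not SAS code and does not model the EXCLNPWGT option.

```python
def variance_divisor(vardef, n, sum_w):
    """Return the divisor d that corresponds to a VARDEF= value."""
    divisors = {
        "N": n,
        "DF": n - 1,
        "WEIGHT": sum_w,
        "WDF": sum_w - 1,
    }
    return divisors[vardef.upper()]


def weighted_mean_variance(x, w=None, vardef="DF"):
    """Weighted mean and variance of x; w defaults to 1 for every value."""
    if w is None:
        w = [1.0] * len(x)
    # Assumption: a negative weight is treated as zero (the default behavior).
    w = [max(wi, 0.0) for wi in w]
    n = len(x)  # assumption: f_i = 1, so n is the number of values
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    css = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    return mean, css / variance_divisor(vardef, n, sum_w)


x = [2.0, 4.0, 4.0, 6.0]
print(weighted_mean_variance(x))               # divisor d = n - 1
print(weighted_mean_variance(x, vardef="N"))   # divisor d = n
```

With unit weights, VARDEF=N and VARDEF=WEIGHT give the same divisor, as do VARDEF=DF and VARDEF=WDF. The four values differ only when a WEIGHT statement is in effect.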

The standard keywords and formulas for each statistic follow. Some formulas use keywords to designate the corresponding statistic.

The Most Common Simple Statistics

PROC MEANS and SUMMARY

PROC UNIVARIATE

PROC TABULATE

PROC REPORT

Number of missing values

X

X

X

X

Number of nonmissing values

X

X

X

X

Number of observations

X

X

Sum of weights

X

X

X

X

X

X

Mean

X

X

X

X

X

X

Sum

X

X

X

X

X

X

Extreme values

X

X

Minimum

X

X

X

X

X

X

Maximum

X

X

X

X

X

X

Range

X

X

X

X

Uncorrected sum of squares

X

X

X

X

X

X

Corrected sum of squares

X

X

X

X

X

X

Variance

X

X

X

X

X

X

Statistic

Covariance

PROC CORR

PROC SQL X

X

X X

X

X


Standard deviation

X

X

X

X

X

X

Standard error of the mean

X

X

X

X

X

Coefficient of variation

X

X

X

X

X

Skewness

X

X

X

Kurtosis

X

X

X

X

X

X

Confidence Limits of the mean of the variance

X

of quantiles

X

Median

X

X

X

X

Mode

X

X

X

X

Percentiles/Deciles/Quartiles

X

X

X

X

X

X

X

X

X

t test for mean=0 for mean=

X

X

Nonparametric tests for location

X

Tests for normality

X

Correlation coefficients

X

Cronbach's alpha

X

Descriptive Statistics

The keywords for descriptive statistics are

CSS is the sum of squares corrected for the mean, computed as

$$\sum_i w_i (x_i - \bar{x})^2$$

CV is the percent coefficient of variation, computed as

$$\frac{100\, s}{\bar{x}}$$

KURTOSIS | KURT is the kurtosis, which measures heaviness of tails. When VARDEF=DF, the kurtosis is computed as

$$c_{4_n} \sum_i z_i^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

where $c_{4_n}$ is $\frac{n(n+1)}{(n-1)(n-2)(n-3)}$. The weighted kurtosis is computed as

$$c_{4_n} \sum_i \left( \frac{x_i - \bar{x}}{\hat{\sigma}_i} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} = c_{4_n} \sum_i w_i^2 \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

When VARDEF=N, the kurtosis is computed as

$$\frac{1}{n} \sum_i z_i^4 - 3$$

and the weighted kurtosis is computed as

$$\frac{1}{n} \sum_i \left( \frac{x_i - \bar{x}}{\hat{\sigma}_i} \right)^4 - 3 = \frac{1}{n} \sum_i w_i^2 \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^4 - 3$$

where $\hat{\sigma}_i^2$ is $\hat{\sigma}^2 / w_i$. The formula is invariant under the transformation $w_i^* = z w_i$, $z > 0$. When you use VARDEF=WDF or VARDEF=WEIGHT, the kurtosis is set to missing.

Note: PROC MEANS and PROC TABULATE do not compute weighted kurtosis.

MAX is the maximum value of $x_i$.

MEAN is the arithmetic mean $\bar{x}$.

MIN is the minimum value of $x_i$.

MODE is the most frequent value of $x_i$.

Note: When QMETHOD=P2, PROC REPORT, PROC MEANS, and PROC TABULATE do not compute MODE.

N is the number of $x_i$ values that are not missing. Observations with $f_i$ less than one, and observations with $w_i$ missing or (when you use the EXCLNPWGT option) $w_i \le 0$, are excluded from the analysis and are not included in the calculation of N.

NMISS is the number of $x_i$ values that are missing. Observations with $f_i$ less than one, and observations with $w_i$ missing or (when you use the EXCLNPWGT option) $w_i \le 0$, are excluded from the analysis and are not included in the calculation of NMISS.

NOBS is the total number of observations and is calculated as the sum of N and NMISS. However, if you use the WEIGHT statement, then NOBS is calculated as the sum of N, NMISS, and the number of observations excluded because of missing or nonpositive weights.

RANGE is the range and is calculated as the difference between the maximum value and the minimum value.

SKEWNESS | SKEW is skewness, which measures the tendency of the deviations to be larger in one direction than in the other. When VARDEF=DF, the skewness is computed as

$$c_{3_n} \sum_i z_i^3$$

where $c_{3_n}$ is $\frac{n}{(n-1)(n-2)}$. The weighted skewness is computed as

$$c_{3_n} \sum_i \left( \frac{x_i - \bar{x}}{\hat{\sigma}_i} \right)^3 = c_{3_n} \sum_i w_i^{3/2} \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^3$$

When VARDEF=N, the skewness is computed as

$$\frac{1}{n} \sum_i z_i^3$$

and the weighted skewness is computed as

$$\frac{1}{n} \sum_i \left( \frac{x_i - \bar{x}}{\hat{\sigma}_i} \right)^3 = \frac{1}{n} \sum_i w_i^{3/2} \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^3$$

The formula is invariant under the transformation $w_i^* = z w_i$, $z > 0$. When you use VARDEF=WDF or VARDEF=WEIGHT, the skewness is set to missing.

Note: PROC MEANS and PROC TABULATE do not compute weighted skewness.

STDDEV | STD is the standard deviation $s$ and is computed as the square root of the variance, $s^2$.

STDERR | STDMEAN is the standard error of the mean, computed as

$$\frac{s}{\sqrt{\sum_i w_i}}$$

when VARDEF=DF, which is the default. Otherwise, STDERR is set to missing.

SUM is the sum, computed as

$$\sum_i w_i x_i$$

SUMWGT is the sum of the weights, $W$, computed as

$$\sum_i w_i$$

USS is the uncorrected sum of squares, computed as

$$\sum_i w_i x_i^2$$

VAR is the variance $s^2$.
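The descriptive keywords above can be checked by hand on a small sample. The Python sketch below is an illustration only, not SAS code: it assumes unweighted data (every weight and frequency equal to 1), the default VARDEF=DF, at least four values, a nonzero mean, and a nonzero standard deviation. The function name `descriptive` is hypothetical.

```python
import math


def descriptive(x):
    """Selected descriptive statistics for unweighted data with VARDEF=DF."""
    n = len(x)
    mean = sum(x) / n
    css = sum((xi - mean) ** 2 for xi in x)   # corrected sum of squares
    uss = sum(xi ** 2 for xi in x)            # uncorrected sum of squares
    var = css / (n - 1)                       # divisor d = n - 1
    std = math.sqrt(var)
    z = [(xi - mean) / std for xi in x]       # standardized values z_i
    c3 = n / ((n - 1) * (n - 2))
    c4 = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    kurtosis = c4 * sum(zi ** 4 for zi in z) - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return {
        "CSS": css,
        "USS": uss,
        "VAR": var,
        "STD": std,
        "STDERR": std / math.sqrt(n),   # sum of weights equals n here
        "CV": 100 * std / mean,
        "SKEWNESS": c3 * sum(zi ** 3 for zi in z),
        "KURTOSIS": kurtosis,
    }


stats = descriptive([1.0, 2.0, 3.0, 4.0, 5.0])
for keyword, value in stats.items():
    print(keyword, round(value, 4))
```

For the symmetric sample 1, 2, 3, 4, 5 the skewness is zero up to rounding and the kurtosis is negative, which is consistent with a distribution that has lighter tails than the normal.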

Quantile and Related Statistics

The keywords for quantiles and related statistics are

MEDIAN is the middle value.

P1 is the 1st percentile.

P5 is the 5th percentile.

P10 is the 10th percentile.

P90 is the 90th percentile.

P95 is the 95th percentile.

P99 is the 99th percentile.

Q1 is the lower quartile (25th percentile).

Q3 is the upper quartile (75th percentile).

QRANGE is the interquartile range and is calculated as

$$Q_3 - Q_1$$

You use the QNTLDEF= option (PCTLDEF= in PROC UNIVARIATE) to specify the method that the procedure uses to compute percentiles. Let $n$ be the number of nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of the variable, such that $x_1$ is the smallest value, $x_2$ is the next smallest value, and $x_n$ is the largest value. For the $t$th percentile, let $p = t/100$. Then define $j$ as the integer part and $g$ as the fractional part of $np$ (or of $(n+1)p$ when QNTLDEF=4), so that

$$np = j + g \quad \text{when QNTLDEF=1, 2, 3, or 5}$$

$$(n+1)p = j + g \quad \text{when QNTLDEF=4}$$

Here, QNTLDEF= specifies the method that the procedure uses to compute the $t$th percentile, as shown in the table that follows.

When you use the WEIGHT statement, the $t$th percentile is computed as

$$y = \begin{cases} \frac{1}{2}(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\ x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j \end{cases}$$

where $w_j$ is the weight associated with $x_j$ and $W = \sum_{i=1}^{n} w_i$ is the sum of the weights. When the observations have identical weights, the weighted percentiles are the same as the unweighted percentiles with QNTLDEF=5.

Methods for Computing Quantile Statistics

QNTLDEF=1: weighted average at $x_{np}$

$$y = (1 - g)\, x_j + g\, x_{j+1}$$

where $x_0$ is taken to be $x_1$.

QNTLDEF=2: observation numbered closest to $np$

$$y = \begin{cases} x_i & \text{if } g \ne \frac{1}{2} \\ x_j & \text{if } g = \frac{1}{2} \text{ and } j \text{ is even} \\ x_{j+1} & \text{if } g = \frac{1}{2} \text{ and } j \text{ is odd} \end{cases}$$

where $i$ is the integer part of $np + \frac{1}{2}$.

QNTLDEF=3: empirical distribution function

$$y = \begin{cases} x_j & \text{if } g = 0 \\ x_{j+1} & \text{if } g > 0 \end{cases}$$

QNTLDEF=4: weighted average aimed at $x_{(n+1)p}$

$$y = (1 - g)\, x_j + g\, x_{j+1}$$

where $x_{n+1}$ is taken to be $x_n$.

QNTLDEF=5: empirical distribution function with averaging

$$y = \begin{cases} \frac{1}{2}(x_j + x_{j+1}) & \text{if } g = 0 \\ x_{j+1} & \text{if } g > 0 \end{cases}$$
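The five percentile definitions differ only in how they treat the integer part j and the fractional part g. The following Python sketch is one possible reading of the table, not SAS code. The function names are hypothetical, and out-of-range indices are clamped to the first or last ordered value, which is an assumption that extends the two boundary rules stated for QNTLDEF=1 and QNTLDEF=4 to the other methods.

```python
import math


def percentile(values, t, qntldef=5):
    """Return the t-th percentile (0 <= t <= 100) for QNTLDEF=1 through 5."""
    x = sorted(values)
    n = len(x)

    def at(k):
        # 1-based access; indices below 1 or above n are clamped (assumption).
        return x[min(max(k, 1), n) - 1]

    # np for methods 1, 2, 3, 5 and (n + 1)p for method 4, with p = t / 100.
    target = (n + 1) * t / 100 if qntldef == 4 else n * t / 100
    j = math.floor(target)
    g = target - j

    if qntldef in (1, 4):
        return (1 - g) * at(j) + g * at(j + 1)
    if qntldef == 2:
        if g != 0.5:
            return at(math.floor(target + 0.5))
        return at(j) if j % 2 == 0 else at(j + 1)
    if qntldef == 3:
        return at(j) if g == 0 else at(j + 1)
    if qntldef == 5:
        return (at(j) + at(j + 1)) / 2 if g == 0 else at(j + 1)
    raise ValueError("qntldef must be 1, 2, 3, 4, or 5")


def weighted_percentile(values, weights, t):
    """Return the t-th percentile when each value carries a weight."""
    pairs = sorted(zip(values, weights))
    target = sum(w for _, w in pairs) * t / 100   # p * W
    running = 0.0
    for k, (xk, wk) in enumerate(pairs):
        running += wk
        if math.isclose(running, target) and k + 1 < len(pairs):
            return (xk + pairs[k + 1][0]) / 2
        if running > target:
            return xk
    return pairs[-1][0]


data = list(range(1, 11))
for method in (1, 2, 3, 4, 5):
    print(method, percentile(data, 25, method), percentile(data, 50, method))
```

Running the loop on the values 1 through 10 shows the methods disagreeing on both the 25th and the 50th percentile, which is why the QNTLDEF= setting matters for small samples. With identical weights, `weighted_percentile` returns the same values as `percentile` with `qntldef=5`.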

Hypothesis Testing Statistics

The keywords for hypothesis testing statistics are

T is the Student's t statistic to test the null hypothesis that the population mean is equal to $\mu_0$ and is calculated as

$$\frac{\bar{x} - \mu_0}{s / \sqrt{\sum_i w_i}}$$

By default, $\mu_0$ is equal to zero. You can use the MU0= option in the PROC UNIVARIATE statement to specify $\mu_0$. You must use VARDEF=DF, which is the default variance divisor; otherwise, T is set to missing.

By default, when you use a WEIGHT statement, the procedure counts the values with nonpositive weights in the degrees of freedom. Use the EXCLNPWGT option in the PROC statement to exclude values with nonpositive weights. Most SAS/STAT procedures, such as PROC TTEST and PROC GLM, automatically exclude values with nonpositive weights.

PROBT | PRT is the two-tailed p-value for Student's t statistic, T, with $\sum_i f_i - 1$ degrees of freedom. This value is the probability under the null hypothesis of obtaining a more extreme value of T than is observed in this sample.
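A short Python sketch of the T statistic for unweighted data follows. It is an illustration rather than SAS code: the function name `t_statistic` is hypothetical, every weight and frequency is assumed to be 1 (so the sum of the weights equals n), and the variance uses the default divisor n - 1. The p-value PROBT is not computed here because it requires the Student's t distribution, which is not part of the Python standard library.

```python
import math


def t_statistic(x, mu0=0.0):
    """Student's t statistic for H0: population mean equals mu0 (unweighted)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # VARDEF=DF
    return (mean - mu0) / (s / math.sqrt(n))


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(t_statistic(sample))           # tests mean = 0, the default
print(t_statistic(sample, mu0=3.0))  # tests mean = 3, as with MU0=3
```

The statistic has n - 1 degrees of freedom, and it is undefined when the standard deviation is zero.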


Confidence Limits for the Mean

The keywords for confidence limits are

CLM is the two-sided confidence limit for the mean. A two-sided $100(1-\alpha)$ percent confidence interval for the mean has upper and lower limits

$$\bar{x} \pm t_{(1-\alpha/2;\, n-1)} \frac{s}{\sqrt{n}}$$

where $s$ is $\sqrt{\frac{1}{n-1} \sum_i (x_i - \bar{x})^2}$, $t_{(1-\alpha/2;\, n-1)}$ is the $(1-\alpha/2)$ critical value of the Student's t statistic with $n-1$ degrees of freedom, and $\alpha$ is the value of the ALPHA= option, which by default is 0.05. Unless you use VARDEF=DF, which is the default variance divisor, CLM is set to missing.

LCLM is the one-sided confidence limit below the mean. The one-sided $100(1-\alpha)$ percent confidence interval for the mean has the lower limit

$$\bar{x} - t_{(1-\alpha;\, n-1)} \frac{s}{\sqrt{n}}$$

Unless you use VARDEF=DF, which is the default variance divisor, LCLM is set to missing.

UCLM is the one-sided confidence limit above the mean. The one-sided $100(1-\alpha)$ percent confidence interval for the mean has the upper limit

$$\bar{x} + t_{(1-\alpha;\, n-1)} \frac{s}{\sqrt{n}}$$

Unless you use VARDEF=DF, which is the default variance divisor, UCLM is set to missing.
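The confidence limits can be sketched in Python for unweighted data as shown below. This is an illustration under stated assumptions, not SAS code: the function name `mean_confidence_limits` is hypothetical, and the critical value of the t distribution is passed in by the caller because the Python standard library does not provide it. The value 2.776 used in the example is the approximate 0.975 critical value for 4 degrees of freedom, taken from a standard t table.

```python
import math


def mean_confidence_limits(x, t_crit):
    """Return (lower, upper) limits for the mean given a t critical value."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # VARDEF=DF
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
# Two-sided 95 percent limits (CLM): alpha = 0.05, n - 1 = 4 degrees of freedom.
lower, upper = mean_confidence_limits(sample, t_crit=2.776)
print(lower, upper)
```

For LCLM or UCLM, the same function can be called with the $(1-\alpha)$ critical value in place of the $(1-\alpha/2)$ value, keeping only the lower or the upper element of the result.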

Using Weights

For more information on using weights and an example, see WEIGHT.

Data Requirements for Summarization Procedures

The following are the minimal data requirements to compute unweighted statistics and do not describe recommended sample sizes. Statistics are reported as missing if VARDEF=DF (the default) and the following requirements are not met:

- N and NMISS are computed regardless of the number of missing or nonmissing observations.
- SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.
- VAR, STD, STDERR, CV, T, PRT, and PROBT require at least two nonmissing observations.
- SKEWNESS requires at least three nonmissing observations.
- KURTOSIS requires at least four nonmissing observations.
- SKEWNESS, KURTOSIS, T, PROBT, and PRT require that STD is greater than zero.
- CV requires that MEAN is not equal to zero.
- CLM, LCLM, UCLM, STDERR, T, PRT, and PROBT require that VARDEF=DF.
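The requirements above can be read as a set of rules that decide which keywords are reported as missing. The Python sketch below encodes only the rules listed here. The function name `reported_missing` and its parameters are hypothetical, and it is not a description of how any SAS procedure is implemented.

```python
def reported_missing(n, std, mean, vardef="DF"):
    """Return the keywords that fail the minimal data requirements.

    n is the number of nonmissing observations; std and mean may be None
    when they cannot be computed.
    """
    missing = set()
    if n < 1:
        missing |= {"SUM", "MEAN", "MAX", "MIN", "RANGE", "USS", "CSS"}
    if n < 2:
        missing |= {"VAR", "STD", "STDERR", "CV", "T", "PRT", "PROBT"}
    if n < 3:
        missing.add("SKEWNESS")
    if n < 4:
        missing.add("KURTOSIS")
    if std is None or not std > 0:
        missing |= {"SKEWNESS", "KURTOSIS", "T", "PROBT", "PRT"}
    if mean is None or mean == 0:
        missing.add("CV")
    if vardef.upper() != "DF":
        missing |= {"CLM", "LCLM", "UCLM", "STDERR", "T", "PRT", "PROBT"}
    return missing


# Three observations centered on zero: no kurtosis and no CV.
print(sorted(reported_missing(n=3, std=1.0, mean=0.0)))
```

N and NMISS never appear in the result because they are computed regardless of the number of observations.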

Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.