Experimental Design and Data Interpretation: The Use of Statistics

ENVE 569 Environmental Risk Assessment

Data are not merely numbers but numbers with a context or meaning


Introduction to Statistics and Data Analysis • “One important use of statistics is in the interpretation of experimental results. The first step in such an examination is the study of the experiment on which the conclusions were based. By understanding and applying the principles of experimental design, a decision can be made regarding the validity of the study under question. The problem of design deals with how to plan the experiment to answer specific questions.”

Introduction to Statistics and Data Analysis • Units of measurement must be chosen with care. THIS TEXT uses the following units:
– Solution concentrations are reported as g/m3 or g/L (where a liter is 0.001 m3, or approximately 1 qt).
– Toxicological doses are cited as g/kg.
– Air concentrations are routinely reported on a mol/mol basis; this cancels the effect of declining air density with altitude. The conversion between g/m3 and mol/mol is:

$\text{air concentration}\ (\mathrm{g/m^3}) = \frac{[\text{air concentration (mol/mol)}]\,[\text{mol wt (g/mol)}]}{22.4 \times 10^{-3}\ \mathrm{m^3/mol}}$

– Pressure is the force exerted per unit area of surface, where force equals mass (kg) times acceleration (m/s2).
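As a quick numeric check of this conversion, here is a minimal Python sketch; the function name, the molar-volume constant name, and the ozone example are illustrative choices, not part of the notes:

```python
MOLAR_VOLUME_M3 = 22.4e-3  # molar volume of an ideal gas, m^3/mol at standard conditions

def mol_per_mol_to_g_per_m3(mole_fraction, mol_wt):
    """Convert an air concentration from mol/mol to g/m^3."""
    return mole_fraction * mol_wt / MOLAR_VOLUME_M3

# Example: 1 ppm (1e-6 mol/mol) of ozone (48 g/mol)
print(mol_per_mol_to_g_per_m3(1e-6, 48.0))  # ~2.14e-3 g/m^3
```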


Introduction to Statistics and Data Analysis • How statistical reasoning works:
– Ask a question.
– Design a study and specify its measures.
– Collect data and describe them.
– Simplify the data.
– Interpret the data.
– Generalize the findings.

Introduction to Statistics and Data Analysis • What is Statistics? – Collecting data, analyzing them, and making meaningful decisions based on the data. • Qualitative data describe attributes, while quantitative data describe numerical observations or measurements.

• In risk assessment, most statistical applications have to do with predicting “safe” levels of exposure to something that has the potential to cause harm. – Information is gathered in the form of samples, or collections of observations. – Samples are collected from populations, which are collections of all individuals or items of a particular type. – Experiments manipulate one or more factors to investigate how the result of interest changes in response. • Observational studies are conducted where the factors cannot be manipulated/controlled.

– Descriptive statistics give a sense of the center of the data, the variability in the data, and the general nature of the observations in the sample. Descriptive statistics alone typically do not support statistical inference.


Introduction to Statistics and Data Analysis
• Simple random sample – any sample of a specified number of units from a population has the same chance of being selected as any other sample of the same size.
– Random sampling guards against the problem of biased sampling.
• Experimental design often involves the invocation of treatments or treatment combinations.
– These treatments become the populations to be studied.
– A completely randomized design is one method of ensuring adequate sample design (e.g., drug testing), but it has complexities to be addressed: whether the medicine works or not may depend on uncontrolled factors.

• Variability will camouflage scientific results.
• Sample Mean (centroid of the data – balance of weights on either side):

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• Sample Median (central tendency of the sample – not as influenced by extreme values or outliers). With the observations sorted in increasing order:
– If n is an odd number: $\tilde{x} = x_{(n+1)/2}$
– If n is an even number: $\tilde{x} = \dfrac{x_{n/2} + x_{n/2+1}}{2}$
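The mean and median formulas translate directly into code. A minimal Python sketch (the function names are illustrative; the data happen to be the sulfuric acid volumes used in a later example):

```python
def sample_mean(xs):
    """Centroid of the data: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle value of the sorted sample; average of the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

data = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
print(sample_mean(data), sample_median(data))  # 10.0 10.0
```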

Introduction to Statistics and Data Analysis • Sample means and medians are used to draw inferences about the underlying population. • Measures of location and central tendency alone do not provide a complete summary of the data set; a measure of variability is also needed. • Sample Range: Range = Xmax − Xmin

• Sample Variance (units of measurement squared):

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

• Sample Standard Deviation (units of measurement):

$s = \sqrt{s^2}$

• Frequency Distribution: groups the data into different classes.
• Symmetry – Symmetric distributions can be folded along a vertical axis so that the two sides coincide. – Otherwise, the distribution is said to be skewed.
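A matching sketch for the spread measures, again with illustrative names and the same data:

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1 (units squared)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Square root of the sample variance, in the original units of measurement."""
    return math.sqrt(sample_variance(xs))

data = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
print(round(sample_variance(data), 3), round(sample_std(data), 3))  # 0.08 0.283
```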


Definitions • Observation – any recording of information, whether it be numerical or categorical • Experiment – any process that generates a set of data • We are particularly interested in the observations obtained by repeating the experiment several times. • Sample Space – set of all possible outcomes of a statistical experiment • [denoted: S (of a coin toss) = {H,T}] • Sample Point, Element, Member – each outcome in the sample space • Inside the {} are all elements of a sample space or a description of them • More than one sample space can be used to describe the outcomes of an experiment. • Event – subset of sample space • Null Set – set that contains no elements [denoted by the symbol Ø].

Definitions • Complement (of an event) – subset of all elements in the sample space that are not part of the event (denoted by an apostrophe after the event name) • Intersection – event that contains the elements common to two or more events – [denoted: Event = (A ∩ B)]

• Mutually Exclusive (Disjoint) Events – occur when the intersection of the events is the null set (A ∩ B = Ø).

• Union – event that contains all elements that are in at least one of the events (belong to either A or B or to both) – [denoted: Event = (A U B)]

• Generalized Multiplication Rule: – If an operation can be performed in n1 ways, and if for each of these a second operation can be performed in n2 ways, and for each of the first two a third operation can be performed in n3 ways, and so forth, then the sequence of k operations can be performed in n1n2…nk ways.

• Permutation – an arrangement of all or part of a set of objects. – Number of Permutations of n distinct objects = n!
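Both counting rules are easy to sanity-check in Python; the 4/3/2 example below is hypothetical:

```python
import math
from itertools import permutations

# Generalized multiplication rule: k operations with n1, n2, ..., nk choices
# can be performed in n1 * n2 * ... * nk ways.
print(4 * 3 * 2)                    # 24 ways (e.g., 4 routes x 3 vehicles x 2 drivers)

# Permutations of n distinct objects: n!
print(math.factorial(3))            # 6
print(list(permutations("ABC")))    # the 6 arrangements of A, B, C
```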


Definitions • Likelihood of event occurrence (where there are either a finite or infinite number of outcomes) can be calculated as weights or probabilities ranging from 0 to 1. – To every point (for a finite set of sample members) in this sample space, assign a probability such that the sum of all probabilities is 1. – If a certain sample point is likely to occur when the experiment is conducted, the probability assigned should be close to 1, and vice versa.

• Probability of an event A is the sum of the weights of all sample points in A. Therefore, 0 ≤ P(A) ≤ 1 P(Ø) = 0 P(S) = 1 • Furthermore, if A1, A2, A3, … is a sequence of mutually exclusive events, then P(A1 U A2 U A3 U …) = P(A1) + P(A2) + P(A3) + … • Distributions of Sample Data: – Discrete – deal with enumeration; most data reported in this fashion (particularly due to rounding) – Continuous – deal with data which can take on any value. Any implication that they cannot take on any value is due to reporting techniques.
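These probability rules can be made concrete with a small sketch; the fair-die sample space is an illustrative choice:

```python
from fractions import Fraction

# Sample space for one roll of a fair die; each sample point gets weight 1/6.
weights = {s: Fraction(1, 6) for s in range(1, 7)}

def prob(event):
    """P(A) = sum of the weights of all sample points in A."""
    return sum(weights[s] for s in event)

A = {2, 4, 6}   # event: roll an even number
B = {4, 5, 6}   # event: roll at least a four
print(prob(A))        # 1/2
print(prob(A | B))    # union A U B = {2, 4, 5, 6}: 2/3
print(prob(A & B))    # intersection = {4, 6}: 1/3
print(prob(set()))    # null set: 0
```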

Continuous Probability Distributions: Normal Distribution • Normal distribution (Gaussian distribution) is the most important continuous probability distribution in statistics. • The normal curve is the bell-shaped curve that describes approximately many phenomena occurring in nature, industry and research (examples: meteorological experiments, rainfall studies, measurements of manufactured parts). • Properties of the normal curve: – The mode (the point on the horizontal axis where the curve is at a maximum) occurs at x = µ. – The curve is symmetric about the mean µ. – The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean. – The total area under the curve and above the horizontal axis is equal to 1.

• For a basic understanding of statistics, assume that µ and σ2 are known. Later, will make statistical inferences when µ and σ2 are unknown and have been estimated from the available experimental data.


Continuous Probability Distributions Areas Under the Normal Curve • The area under the curve bounded by the two ordinates x = x1 and x = x2 equals the probability that the random variable X assumes a value between x = x1 and x = x2. • The normal density is very difficult to integrate directly, so tables are available to aid in the calculation. • Transform all the observations of any normal random variable X to a new set of observations of a normal random variable Z with mean zero and variance 1. The transformation is:

$Z = \frac{X - \mu}{\sigma}$

• Standard normal distribution – distribution of a random variable with mean zero and variance 1.
• Now only one table is needed; it gives areas under the standard normal curve corresponding to P(Z < z) for values of z ranging from −3.49 to 3.49.
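In place of the table, the standard normal CDF can be computed from the error function in Python's standard library; the helper names are illustrative:

```python
import math

def std_normal_cdf(z):
    """P(Z < z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x, mu, sigma):
    """Transform an observation of X ~ N(mu, sigma^2) to the standard normal Z."""
    return (x - mu) / sigma

print(round(std_normal_cdf(1.2), 4))   # 0.8849, matching the tabled value
```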

Continuous Probability Distributions Example:
• Given a random variable X having a normal distribution with µ = 50 and σ = 10, find the probability that X assumes a value between 45 and 62.
• The z-values corresponding to x1 = 45 and x2 = 62 are:

$z_1 = \frac{45 - 50}{10} = -0.5 \qquad z_2 = \frac{62 - 50}{10} = 1.2$

P(45 < X < 62) = P(−0.5 < Z < 1.2) = P(Z < 1.2) − P(Z < −0.5) = 0.8849 − 0.3085 = 0.5764

Example:
• Given that X has a normal distribution with µ = 300 and σ = 50, find the probability that X assumes a value greater than 362.
• Evaluate the area under the normal curve to the right of x = 362. The transform is:

$z = \frac{362 - 300}{50} = 1.24$

P(X > 362) = P(Z > 1.24) = 1 − P(Z < 1.24) = 1 − 0.8925 = 0.1075
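Assuming SciPy is available, both worked examples can be verified in one short script (norm.cdf is the normal CDF; norm.sf is the upper-tail probability 1 − CDF):

```python
from scipy.stats import norm

# P(45 < X < 62) for X ~ N(50, 10^2)
print(round(norm.cdf(62, loc=50, scale=10) - norm.cdf(45, loc=50, scale=10), 4))  # 0.5764

# P(X > 362) for X ~ N(300, 50^2)
print(round(norm.sf(362, loc=300, scale=50), 4))  # 0.1075
```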


Continuous Distributions: Student’s t-Distribution • It has been assumed that the population standard deviation is known. However, neither µ nor σ2 may be known; they may have to be approximated by the sample mean x̄ and sample variance s2. • An estimate of σ may have to be supplied by the same sample information that produced the sample average x̄. As a result, the statistic to consider for inferences on µ is:

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

• The exact statistic has a t-distribution with υ = n − 1 degrees of freedom.
• This distribution is often called the Student’s t-distribution.
• The distribution of T is similar to the distribution of Z in that both are symmetric around a mean of zero. Both distributions are bell-shaped, but the t-distribution is more variable, owing to the fact that the T-values depend on the fluctuations of two quantities, $\bar{X}$ and S2.

Continuous Distributions: Student’s t-Distribution Example:
• A drug manufacturer claims that the population mean yield of a certain batch process is 500 grams per milliliter of raw material. To check this claim he samples 25 batches each month. If the computed t-value falls between −t0.05 and t0.05, he is satisfied with his claim. What conclusion should he draw from a sample that has a mean of x̄ = 518 grams per milliliter and a sample standard deviation s = 40 grams?
• Assume the distribution of yields to be approximately normal.
• From the table, t0.05 = 1.711 for 24 degrees of freedom.
• Therefore, the manufacturer is satisfied with his claim if a sample of 25 batches yields a t-value between −1.711 and 1.711. If µ = 500, then

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{518 - 500}{40/\sqrt{25}} = \frac{18}{8} = 2.25$

• This value is well above 1.711. The probability of obtaining a t-value with υ = 24 equal to or greater than 2.25 is approximately 0.02. Because µ > 500 makes the computed t-value more plausible, the manufacturer is likely to conclude that the process produces a better product than he thought.
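A sketch of the same calculation, assuming SciPy for the t-distribution (t.ppf is the quantile function, t.sf the upper-tail probability):

```python
import math
from scipy.stats import t

n, xbar, s, mu0 = 25, 518.0, 40.0, 500.0
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

print(t_stat)                        # 2.25
print(round(t.ppf(0.95, df), 3))     # critical value t_0.05 = 1.711
print(round(t.sf(t_stat, df), 3))    # P(T >= 2.25) ~ 0.017 (the notes cite ~0.02)
```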


One and Two-Sample Estimation Statistical inference may be divided into two major areas: estimation and hypothesis testing. • Classical Methods of Estimation • An estimator is not expected to estimate the population parameter without error. We do not expect x̄ to estimate µ exactly, but we certainly hope that it is not far off.

One and Two-Sample Estimation Unbiased Estimator • What are the desirable properties of a “good” decision function that would influence us to choose one estimator rather than another? • If two unbiased estimators of the same population parameter θ are available, we would choose the one whose sampling distribution has the smaller variance. • Definition: – If we consider all possible unbiased estimators of some parameter θ, the one with the smallest variance is called the most efficient estimator of θ.

• There are many situations in which it is preferable to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.
• An interval estimate of a population parameter θ is an interval of the form

$\hat{\theta}_L < \theta < \hat{\theta}_U$

• The interval estimate indicates, by its length, the accuracy of the point estimate.


One and Two-Sample Estimation Interpretation of Interval Estimates
• From the sampling distribution of $\hat{\Theta}$, we can determine $\hat{\theta}_L$ and $\hat{\theta}_U$ such that $P(\hat{\theta}_L < \theta < \hat{\theta}_U)$ is equal to any positive fractional value we care to specify. If, for instance, we find $\hat{\theta}_L$ and $\hat{\theta}_U$ such that:

$P(\hat{\theta}_L < \theta < \hat{\theta}_U) = 1 - \alpha$

• for 0 < α < 1, then we have a probability of 1 − α of selecting a random sample that will produce an interval containing θ.
• The interval $\hat{\theta}_L < \theta < \hat{\theta}_U$, computed from the selected sample, is called a (1 − α)100% confidence interval; the fraction 1 − α is called the confidence coefficient or the degree of confidence; and the endpoints $\hat{\theta}_L$ and $\hat{\theta}_U$ are called the lower and upper confidence limits. The wider the confidence interval, the more confident we can be that the interval contains the unknown parameter. Ideally, we prefer a short interval with a high degree of confidence.
• NOTE: Point and interval estimation represent different approaches to gaining information regarding a parameter.

Single Sample: Estimating the Mean • If x̄ is the mean of a random sample of size n from a population with known variance σ2, a (1 − α)100% confidence interval for µ is given by:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

• For small samples selected from nonnormal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size n ≥ 30, regardless of the shape of most populations, sampling theory guarantees good results.



Single Sample: Estimating the Mean • If x̄ is used as an estimate of µ, we can be (1 − α)100% confident that the error will not exceed:

$\frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$

• If x̄ is used as an estimate of µ, we can be (1 − α)100% confident that the error will not exceed a specified amount e when the sample size is:

$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$

• When solving for the sample size n, all fractional values are rounded up to the next whole number. By adhering to this principle, we can be sure that our degree of confidence never falls below (1 − α)100%.
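The sample-size formula, with the required round-up, as a small sketch (illustrative names and numbers, SciPy assumed):

```python
import math
from scipy.stats import norm

def sample_size_for_error(sigma, e, alpha=0.05):
    """Smallest n for which the (1 - alpha)100% error bound on the mean is at most e."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((z * sigma / e) ** 2)   # round UP to preserve the confidence level

# Hypothetical: sigma = 0.3, desired error bound e = 0.05, 95% confidence
print(sample_size_for_error(0.3, 0.05))   # 139
```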


Single Sample: Estimating the Mean The Case of σ Unknown • Earlier, we reviewed that if we have a random sample from a normal distribution, then the random variable:

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

has a Student’s t-distribution with n − 1 degrees of freedom. Here S is the sample standard deviation. In this situation with σ unknown, T can be used to construct a confidence interval on µ. The procedure is the same as that with known σ, except that σ is replaced by S and the standard normal distribution is replaced by the t-distribution.

Single Sample: Estimating the Mean Confidence Interval for µ; σ Unknown • If x̄ and s are the mean and standard deviation of a random sample from a normal population with unknown variance σ2, a (1 − α)100% confidence interval for µ is:

$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$

• where tα/2 is the t-value with υ = n − 1 degrees of freedom, leaving an area of α/2 to the right.
• Quite often statisticians recommend that even when normality cannot be assumed, if σ is unknown and n ≥ 30, s can replace σ and the confidence interval

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$

may be used. This is often referred to as a large-sample confidence interval. The justification lies only in the presumption that with a sample as large as 30, s will be very close to the true σ and the central limit theorem prevails.


Single Sample: Estimating the Mean Example:
• The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean of all such containers, assuming an approximately normal distribution.
• The sample mean and standard deviation for the given data are x̄ = 10.0 and s = 0.283.
• Using Table A.4, we find t0.025 = 2.447 for υ = 6 degrees of freedom.
• Hence the 95% confidence interval for µ is:

$10.0 - (2.447)\frac{0.283}{\sqrt{7}} < \mu < 10.0 + (2.447)\frac{0.283}{\sqrt{7}}$

$9.74 < \mu < 10.26$
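This example can be reproduced with the standard library plus SciPy's t quantile:

```python
import math
from statistics import mean, stdev
from scipy.stats import t

data = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
n = len(data)
xbar, s = mean(data), stdev(data)            # sample mean and standard deviation
t_crit = t.ppf(0.975, n - 1)                 # t_0.025 with 6 degrees of freedom
half = t_crit * s / math.sqrt(n)
print(round(xbar, 2), round(s, 3), round(t_crit, 3))  # 10.0 0.283 2.447
print(round(xbar - half, 2), round(xbar + half, 2))   # 9.74 10.26
```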

Single Sample: Estimating the Mean • A rather sharp distinction exists between the goals of point estimates and confidence interval estimates. • The former supplies a single number extracted from a set of experimental data, and the latter provides an interval given the experimental data that is reasonable for the parameter; that is, 100(1 – α)% of such computed intervals “cover” the parameter. • These two approaches to estimation are related to each other. The “common thread” is the sampling distribution of the point estimator.


Single Sample: Estimating the Mean • The standard deviation of $\bar{X}$, or standard error of $\bar{X}$, is $\sigma/\sqrt{n}$. Simply put, the standard error of an estimator is its standard deviation. For the case of $\bar{X}$, the computed confidence limit for µ with σ known is:

$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{x} \pm z_{\alpha/2}\,\mathrm{s.e.}(\bar{x})$

• The confidence interval for µ with σ unknown is:

$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = \bar{x} \pm t_{\alpha/2}\,\widehat{\mathrm{s.e.}}(\bar{x})$

• The confidence interval is no better (in terms of width) than the quality of the point estimate, in this case through its estimated standard error. Computer packages often refer to estimated standard errors merely as “standard errors.”

Single Sample: Estimating the Mean Prediction Interval • Sometimes, other than the population mean, the experimenter may be interested in predicting the possible value of a future observation. The confidence interval on the mean does not capture this requirement; the customer requires a statement regarding the uncertainty of one single observation. • This type of requirement is nicely fulfilled by the construction of a prediction interval. • Assume that the random sample comes from a normal population with unknown mean µ and known variance σ2. A natural point estimator of a new observation is $\bar{X}$. • The variance of $\bar{X}$ is σ2/n. However, to predict a new observation, we need to account not only for the variation due to estimating the mean but also for the variation of the future observation itself.


Single Sample: Estimating the Mean Prediction Interval • For a normal distribution of measurements with unknown mean µ and known variance σ2, a (1 − α)100% prediction interval of a future observation, x0, is:

$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}$

where zα/2 is the z-value leaving an area of α/2 to the right.

• For a normal distribution of measurements with unknown mean µ and unknown variance, a (1 − α)100% prediction interval of a future observation, x0, is:

$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$

where tα/2 is the t-value with υ = n − 1 degrees of freedom, leaving an area of α/2 to the right.

Single Sample: Estimating the Mean Example:
• Due to decreasing interest rates, the First Citizens Bank received many mortgage applications. A recent sample of 50 mortgage applications resulted in an average loan amount of $128,300. Assume a population standard deviation of $15,000. If a new customer calls in for a mortgage application, find a 95% prediction interval for this customer’s loan amount.
• The point estimate of the next customer’s loan amount is x̄ = $128,300. For a 95% prediction interval, α = 0.05, so α/2 = 0.025, and the z-value for α/2 = 0.025 is 1.96.
• Hence, a 95% prediction interval for a future loan amount is:

$128300 - (1.96)(15000)\sqrt{1 + \frac{1}{50}} < x_0 < 128300 + (1.96)(15000)\sqrt{1 + \frac{1}{50}}$

The interval is ($98,607.46, $157,992.54).
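The same interval as a sketch (SciPy assumed; the code's exact quantile z = 1.95996 differs slightly from the slide's rounded 1.96):

```python
import math
from scipy.stats import norm

def prediction_interval_known_sigma(xbar, sigma, n, alpha=0.05):
    """(1 - alpha)100% prediction interval for one future observation, sigma known."""
    z = norm.ppf(1 - alpha / 2)
    half = z * sigma * math.sqrt(1 + 1 / n)   # extra 1 accounts for the new observation
    return xbar - half, xbar + half

lo, hi = prediction_interval_known_sigma(128300, 15000, 50)
print(round(lo, 2), round(hi, 2))   # approximately (98608, 157992)
```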


Single Sample: Estimating the Mean Outlier Detection • One methodology for outlier detection applies the rule that an observation is an outlier if it falls outside the prediction interval computed without including the questionable observation in the sample.

Distinction Among Confidence Intervals, Prediction Intervals, and Tolerance Intervals
• In real-life applications, these intervals are NOT interchangeable, because their interpretations are quite distinct.
• Confidence Interval – concerns the population mean by itself.
• Tolerance Limits – bound, with a stated confidence, the middle 1 − α proportion of the corresponding normal distribution; interested in the location of the majority of the population.
• Prediction Limits – important when determining bounds on a single future value; neither the mean nor the location of the majority of the population is the issue.

Two Samples: Estimating Difference Between Two Means
• For two populations with means µ1 and µ2 and variances σ1² and σ2², respectively, the point estimator of the difference between µ1 and µ2 is given by the statistic $\bar{X}_1 - \bar{X}_2$. Therefore, to obtain a point estimate of µ1 − µ2, we select two independent random samples, one from each population, of sizes n1 and n2, and compute the difference $\bar{x}_1 - \bar{x}_2$ of the sample means.

Confidence Interval for µ1 − µ2; σ1² and σ2² Known
• If $\bar{x}_1$ and $\bar{x}_2$ are the means of independent random samples of sizes n1 and n2 from populations with known variances σ1² and σ2², respectively, a (1 − α)100% confidence interval for µ1 − µ2 is given by:

$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

where zα/2 is the z-value leaving an area of α/2 to the right.
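A sketch of this two-sample interval (SciPy assumed; the group numbers are hypothetical):

```python
import math
from scipy.stats import norm

def two_mean_ci_known_vars(xbar1, var1, n1, xbar2, var2, n2, alpha=0.05):
    """(1 - alpha)100% confidence interval for mu1 - mu2, population variances known."""
    z = norm.ppf(1 - alpha / 2)
    half = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - half, diff + half

# Hypothetical exposure data for two groups
lo, hi = two_mean_ci_known_vars(xbar1=3.6, var1=0.81, n1=40,
                                xbar2=3.1, var2=0.64, n2=50)
print(round(lo, 3), round(hi, 3))   # an interval around the point estimate 0.5
```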


Two Samples: Estimating Difference Between Two Means • The Experimental Conditions and the Experimental Unit • It is assumed that we have two independent random samples from distributions with means µ1 and µ2, respectively. It is important that experimental conditions emulate this “ideal” described by the assumptions as closely as possible. • If the “ideal” conditions are emulated as closely as possible, the experimenter gains a degree of insurance that the experimental units will not bias the results, provided the conditions that define the two populations are randomly assigned to the experimental units.

Two Samples: Estimating Difference Between Two Means Interpretation of the Confidence Interval • For the case of a single parameter, the confidence interval simply produces error bounds on the parameter. • Values contained in the interval should be viewed as reasonable values given the experimental data. • In the case of a difference between two means, the interpretation can be extended to one of comparing the two means. – For example, if we have a high confidence that a difference µ1 – µ2 is positive, we would certainly infer that µ1 > µ2.

• Equal Sample Sizes • The procedure for constructing confidence intervals for µ1 – µ2 with σ1 = σ2 = σ unknown requires the assumption that the populations are normal. • Slight departures from either the equal variance or normality assumption do not seriously alter the degree of confidence for our interval. • If the population variances are considerably different, we still obtain reasonable results when the populations are normal, provided that n1 = n2. • Therefore, in a planned experiment, one should make every effort to equalize the size of the samples.


One- and Two-Sample Tests of Hypotheses • Often, the problem confronting the scientist or engineer is not so much the estimation of a population parameter, but rather the formation of a data-based decision procedure that can produce a conclusion about some scientific system. – Example: medical research, accuracy of two different gauges

• In each of these cases, the scientist or engineer postulates or conjectures something about a system. In addition, each case involves the use of experimental data and decision making based on those data. • Formally, in each case, the conjecture can be put in the form of a statistical hypothesis; procedures that lead to the acceptance or rejection of such a hypothesis make up hypothesis testing. • Statistical Hypothesis – an assertion or conjecture concerning one or more populations. – The truth or falsity of a statistical hypothesis is never known with absolute certainty unless we examine the entire population. This, of course, would be impractical in most situations. – Instead, we take a random sample from the population of interest and use the data contained in this sample to provide evidence that either supports or does not support the hypothesis.

One- and Two-Sample Tests of Hypotheses Role of Probability in Hypothesis Testing • The decision procedure must be carried out with an awareness of the probability of reaching a wrong conclusion. • The reader must become accustomed to understanding that the acceptance of a hypothesis merely implies that the data do not give sufficient evidence to refute it. – Rejection, on the other hand, implies that the sample evidence refutes the hypothesis. Put another way, rejection implies that there is a small probability of obtaining the sample information observed when, in fact, the hypothesis is true.

• In other words, rejection of the hypothesis tends to all but “rule out” the hypothesis. On the other hand, it is very important to emphasize that the acceptance or, rather, the failure to reject does not rule out other possibilities. As a result, the firm conclusion is established by the data analyst when the hypothesis is rejected. • The formal statement of a hypothesis is often influenced by the structure of the probability of a wrong conclusion. If the scientist is interested in strongly supporting a contention, he or she hopes to arrive at the contention in the form of rejection of the hypothesis.


One- and Two-Sample Tests of Hypotheses The Null and Alternative Hypotheses • The structure of hypothesis testing will be formulated with the use of the term null hypothesis. This refers to any hypothesis we wish to test and is denoted by H0. The rejection of H0 leads to the acceptance of an alternative hypothesis, denoted by H1. • A null hypothesis concerning a population parameter will always be stated so as to specify an exact value of the parameter, whereas the alternative hypothesis allows for the possibility of several values. Hence, if H0 is the null hypothesis p = 0.5 for a discrete population, the alternative hypothesis H1 would be one of the following:

p < 0.5    p > 0.5    p ≠ 0.5

One- and Two-Sample Tests of Hypotheses Error Types
• Type I Error – rejection of the null hypothesis when it is true
• Type II Error – acceptance of the null hypothesis when it is false

              H0 is true          H0 is false
Accept H0     Correct Decision    Type II Error
Reject H0     Type I Error        Correct Decision


One- and Two-Sample Tests of Hypotheses Error Types • The probability of committing a Type I Error, also called the Level of Significance, is denoted by the Greek letter α. Sometimes the Level of Significance is called the Size of the Test. • The probability of committing a Type II Error, denoted by β, is impossible to compute unless we have a specific alternative hypothesis (the alternative hypothesis must set the test value against a specific value). • Ideally, we like to use a test procedure for which the Type I and Type II Error probabilities are both small.

One- and Two-Sample Tests of Hypotheses • Relationship to Sample Size • For a fixed sample size, a decrease in the probability of one error will usually result in an increase in the probability of the other error. – Fortunately, the probability of committing both types of error can be reduced by increasing the sample size.

• After the null and alternative hypotheses are stated, it is important to consider the sensitivity of the test procedure. • By this, we mean that there should be a determination, for a fixed α, of a reasonable value for the probability of wrongly accepting H0 (i.e., the value of β) when the true situation represents some important deviation from H0. • The value of the sample size can usually be determined for which there is a reasonable balance between α and the value β computed in this fashion.


One- and Two-Sample Tests of Hypotheses Important Properties of a Test of Hypothesis
1. The Type I Error and Type II Error are related. A decrease in the probability of one generally results in an increase in the probability of the other.
2. The probability of committing a Type I Error can always be reduced by adjusting the critical region (the region where the null hypothesis is rejected).
3. An increase in the sample size n will reduce α and β simultaneously.
4. If the null hypothesis is false, β is a maximum when the true value of the parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller β will be.
• Power of a Test – probability of rejecting H0 given that a specific alternative is true. The power of a test can be computed as 1 − β.
• Often different types of tests are compared by contrasting power properties. In a sense, the power is a succinct measure of how sensitive the test is for “detecting differences” between means of, say, 68 and 68.5. A sketch of such a power computation follows.
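A sketch of the power of a one-sided z-test, assuming SciPy; all numbers are hypothetical:

```python
import math
from scipy.stats import norm

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of the z-test of H0: mu = mu0 vs H1: mu > mu0
    when the true mean is mu1 and sigma is known."""
    z_crit = norm.ppf(1 - alpha)              # reject H0 when Z > z_crit
    se = sigma / math.sqrt(n)
    # P(reject H0) when Xbar ~ N(mu1, se^2)
    return norm.sf(z_crit - (mu1 - mu0) / se)

# Detecting a shift from 68 to 68.5 with sigma = 2 and n = 64
print(round(power_one_sided_z(68, 68.5, 2.0, 64), 3))   # ~0.639
```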

One- and Two-Sample Tests of Hypotheses One- and Two-Tailed Tests • A test of any statistical hypothesis where the alternative is one-sided, such as

H0: θ = θ0, H1: θ < θ0    or    H0: θ = θ0, H1: θ > θ0

is called a one-tailed test.


One- and Two-Sample Tests of Hypotheses One- and Two-Tailed Tests • Generally, the critical region for the alternative hypothesis θ > θ0 lies in the right tail of the distribution of the test statistic, and vice versa. The inequality symbol points in the direction where the critical region lies. • A test of any statistical hypothesis where the alternative is two-sided, such as

H0: θ = θ0, H1: θ ≠ θ0

is called a two-tailed test, since the critical region is split into two parts. The alternative hypothesis θ ≠ θ0 states that either θ < θ0 or θ > θ0.

One- and Two-Sample Tests of Hypotheses How Are the Null and Alternative Hypotheses Chosen? • The null hypothesis, H0, will always be stated using the equality sign so as to specify a single value.
– In this way, the probability of committing a Type I Error can be controlled.
– Whether one sets up a one-tailed or two-tailed test will depend on the conclusion to be drawn if H0 is rejected.
– The critical region can be determined only after H1 has been stated.

• Guidelines for determining which hypothesis should be stated as H0 and which as H1:
1. Read the problem statement carefully.
2. Determine the claim that you want to test.
   a. Should the claim suggest a single direction such as more than, less than, superior to, inferior to, and so on, then H1 is stated using the inequality symbol (< or >) corresponding to the suggested direction.
   b. Should the claim suggest a compound direction (equality as well as direction) such as at least, equal to or greater, at most, no more than, and so on, then this entire compound direction (≤ or ≥) is expressed as H0, but using only the equality sign, and H1 is given the opposite direction.
   c. Finally, if no direction is suggested by the claim, then H1 is stated using the not-equal (≠) symbol.


One- and Two-Sample Tests of Hypotheses Example:
• A manufacturer of a rice cereal claims that the average saturated fat content does not exceed 1.5 milligrams. State the null and alternative hypotheses to be used in testing this claim.
• The manufacturer’s claim should be rejected only if µ is greater than 1.5 milligrams and should be accepted if µ is less than or equal to 1.5 milligrams. Since the null hypothesis always specifies a single value of the parameter, we test:

H0: µ = 1.5 milligrams
H1: µ > 1.5 milligrams

• Although the null hypothesis has been stated with an equal sign, it is understood that it contains any value not specified by the alternative hypothesis.

Example:
• A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. To test this claim, a large sample of new residences is inspected; the proportion of these homes with three bedrooms is recorded and used as our test statistic. State the null and alternative hypotheses to be used in this test.

H0: p = 0.6
H1: p ≠ 0.6

One- and Two-Sample Tests of Hypotheses The Use of P-Values for Decision Making • In testing hypotheses, the critical region may be chosen arbitrarily and its size determined. If α is too large, it can be reduced by making an adjustment in the critical value. – It may be necessary to increase the sample size to offset the decrease that occurs automatically in the power of the test.

• Over a number of generations of statistical analysis, it had become customary to choose an α of 0.05 or 0.01 and select the critical region accordingly. • This preselection of a significance level α has its roots in the philosophy that the maximum risk of making a Type I error should be controlled. However, the approach does not account for values of test statistics that are “close” to the critical region. • The P-value approach has been adopted extensively by users in applied statistics. The approach is designed to give the user an alternative (in terms of a probability) to a mere “reject” or “do not reject” conclusion. The P-value computation also gives the user important information when the z-value falls well into the ordinary critical region.


One- and Two-Sample Tests of Hypotheses • The P-value can be viewed as the probability of obtaining data at least as extreme as those observed, given that H0 is true (for example, that two samples come from the same distribution). A small P-value therefore refutes H0, and in such a two-sample case the conclusion is that the population means are significantly different. • The P-value approach as an aid in decision making is quite natural, because nearly all computer packages that provide hypothesis-testing computations print out P-values along with the values of the appropriate test statistics. • P-value – the lowest level (of significance) at which the observed value of the test statistic is significant.

One- and Two-Sample Tests of Hypotheses • Approach to Hypothesis Testing with Fixed Probability of Type I Error
– State the null and alternative hypotheses.
– Choose a fixed significance level α.
– Choose an appropriate test statistic and establish the critical region based on α.
– From the computed test statistic, reject H0 if the test statistic is in the critical region. Otherwise, do not reject.
– Draw scientific or engineering conclusions.

• Significance Testing (P-Value Approach)
– State the null and alternative hypotheses.
– Choose an appropriate test statistic.
– Compute the P-value based on the computed value of the test statistic.
– Use judgment based on the P-value and knowledge of the scientific system.


One- and Two-Sample Tests of Hypotheses Single Sample: Test Concerning a Single Mean (Variance Known) • Consider first the hypotheses:

H0: µ = µ0
H1: µ ≠ µ0

• Select the appropriate test statistic. Test Procedure for a Single Mean:

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

• If −zα/2 < z < zα/2, do not reject H0. Rejection of H0, of course, implies acceptance of the alternative hypothesis µ ≠ µ0.
• Tests of one-sided hypotheses on the mean involve the same statistic described in the two-sided case. The difference, of course, is that the critical region is only in one tail of the standard normal distribution.

One- and Two-Sample Tests of Hypotheses Example:
• A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance.
• Define the hypotheses:

H0: µ = 70 years
H1: µ > 70 years
α = 0.05

• Critical region: z > 1.645, where

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$

• This z-value is greater than the critical value. Therefore, reject H0 and conclude that the mean life span today is greater than 70 years.
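The same test as a sketch (SciPy assumed; the helper name is illustrative):

```python
import math
from scipy.stats import norm

def one_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 vs H1: mu > mu0, sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = norm.ppf(1 - alpha)
    p_value = norm.sf(z)               # P(Z > z)
    return z, z_crit, p_value

z, z_crit, p = one_sided_z_test(71.8, 70, 8.9, 100)
print(round(z, 2), round(z_crit, 3), round(p, 4))  # 2.02 1.645 0.0216
```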


One- and Two-Sample Tests of Hypotheses Example:
• A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilograms. Test the hypothesis that µ = 8 kilograms against the alternative that µ ≠ 8 kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.
• Define the hypotheses:

H0: µ = 8 kilograms
H1: µ ≠ 8 kilograms
α = 0.01

One- and Two-Sample Tests of Hypotheses Example:
• Critical region: |z| > 2.575, where

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{7.8 - 8.0}{0.5/\sqrt{50}} = -2.83$

• This z-value falls in the critical region (|−2.83| > 2.575). Therefore, reject H0 and conclude that the mean breaking strength is not equal to 8 but is, in fact, less than 8 kilograms.
• Since the test in this example is two-tailed, the desired P-value is twice the area to the left of z = −2.83. Therefore, using the normal tables, we have

P = P(|Z| > 2.83) = 2P(Z < −2.83) = 0.0046

• This is smaller than the significance level α = 0.01, confirming the rejection of H0.
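And the two-tailed version, verifying the P-value (the slide's table lookup gives 0.0046; the exact computation rounds to 0.0047):

```python
import math
from scipy.stats import norm

def two_sided_z_test(xbar, mu0, sigma, n):
    """Two-sided z-test of H0: mu = mu0 (sigma known); returns z and the P-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * norm.sf(abs(z))      # both tails
    return z, p_value

z, p = two_sided_z_test(7.8, 8.0, 0.5, 50)
print(round(z, 2), round(p, 4))   # -2.83 0.0047
```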


One- and Two-Sample Tests of Hypotheses • Relationship to Confidence Interval Estimation • The reader should realize by now that the hypothesis-testing approach to statistical inference in this chapter is very closely related to the confidence interval approach. • Confidence interval estimation involves computation of bounds for which it is “reasonable” that the parameter in question is inside the bounds. • It turns out that the testing of H0: µ = µ0 against H1: µ ≠ µ0 at a significance level α is equivalent to computing a 100(1 – α)% confidence interval on µ and rejecting H0 if µ0 is not inside the confidence interval.
