Author: Roy Rodger Page
Theory of Hypothesis Testing

Inference is divided into two broad categories:

• Estimation

• Testing

Chapter 7 was devoted to point estimation, and interval estimation will be discussed in Chapter 9. Testing is the subject of Chapter 8.

Definition 14 A hypothesis is a statement about population parameters.

There are usually two hypotheses, called the null and alternative hypotheses.

The null hypothesis is denoted $H_0$, and the alternative is denoted $H_1$. Usually, $H_1$ is the negation, or complement, of $H_0$. Often the two hypotheses have the form

$$H_0 : \theta \in \Theta_0 \qquad H_1 : \theta \in \Theta_0^c,$$

where $\Theta_0 \subset \Theta$ and $\Theta_0^c$ is the complement of $\Theta_0$.

A test of the hypotheses is carried out as follows: We are to observe a value of a random vector $X$ having distribution $f(\cdot \mid \theta)$. Before observing the data, we formulate a decision rule of the form:

Reject $H_0$ if the observed value $x$ of $X$ is in some set $R$. Don't reject $H_0$ if $x \in R^c$.

$R$ is called the rejection region or critical region. We could also phrase the decision rule in terms of a sufficient statistic. Any statistic whose values are used to define the rejection region is called a test statistic.

Suppose that $\Theta_0$ contains only a single element. Then $H_0 : \theta \in \Theta_0$ is called a simple hypothesis. Otherwise it is called a composite hypothesis.

We have two hypotheses, $H_0$ and $H_1$. After observing the data we take either action $a_0$ or action $a_1$:

$a_0$ = "accept, or fail to reject, $H_0$"

$a_1$ = "accept $H_1$"

Consequences of actions

| True state of nature | Action $a_0$ | Action $a_1$ |
|---|---|---|
| $\theta \in \Theta_0$ | Correct | Type I Error |
| $\theta \in \Theta_0^c$ | Type II Error | Correct |

Example 29 Suppose $X_1, \ldots, X_n$ is a random sample from $N(\theta, 1)$. We're interested in testing the hypotheses

$$H_0 : \theta \le 10 \qquad H_1 : \theta > 10.$$

Here $\Theta = (-\infty, \infty)$, $\Theta_0 = (-\infty, 10]$ and $\Theta_0^c = (10, \infty)$.

For example:

• $X_i$ might be a measure of product quality when a new process is used.

• The average quality measure using the old process is 10.

• $H_0$ says that the new process is no better than the old.

• $H_1$ says the new process is better than the old.

A sufficient statistic in this model is

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Also, we know that $\bar{X}$ is both the MLE and the UMVUE of $\theta$. A sensible test would have the following form:

Take action $a_1$ if $\bar{x} \ge c_n$, and take action $a_0$ if $\bar{x} < c_n$,

where $\bar{x}$ is the observed value of $\bar{X}$ and $c_n$ is some constant larger than 10.

Type I error: Conclude new process is better when it isn't.

Type II error: Conclude new process is no better than the old when in fact it is better.

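A general approach to testing

When $\theta$ is the true parameter value, let

$$\beta(\theta) = P_\theta(\text{rejecting } H_0) = P_\theta(X \in R),$$

where $R$ is the rejection region of the test. We have

$$\beta(\theta) = \begin{cases} P_\theta(\text{Type I error}), & \text{if } \theta \in \Theta_0 \\ 1 - P_\theta(\text{Type II error}), & \text{if } \theta \in \Theta_0^c. \end{cases}$$

Note that for $\theta \in \Theta_0^c$, $\beta(\theta) = P_\theta(\text{making correct decision})$. The function $\beta$ restricted to $\Theta_0^c$ is called the power function of the test, and $\beta(\theta)$ is called the power at $\theta$.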

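Define

$$\alpha = \sup_{\theta \in \Theta_0} \beta(\theta).$$

$\alpha$ is called the size of the test. Casella and Berger say that the test is of level $\eta$ if $\alpha \le \eta$.

Notation A test can be characterized by a test function $\phi$:

$$\phi(x) = \begin{cases} 1, & \text{if } x \in R \\ 0, & \text{if } x \in R^c. \end{cases}$$

Note that $\beta(\theta) = E_\theta[\phi(X)]$.

Example 29 (continued) Suppose we use a test function $\phi$ as follows:

$$\phi(x) = \begin{cases} 1, & \text{if } \bar{x} \ge 10 + 1.645/\sqrt{n} \\ 0, & \text{otherwise.} \end{cases}$$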
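The identity $\beta(\theta) = E_\theta[\phi(X)]$ suggests a simple numerical check: simulate many samples from $N(\theta, 1)$, apply the test function of Example 29 to each, and average the results. The following is a minimal sketch along those lines. The function names, the number of replications and the seed are illustrative assumptions, not part of the original notes.

```python
import math
import random


def phi(xs, theta0=10.0, z=1.645):
    """Test function of Example 29: 1 means reject H0, 0 means do not reject."""
    n = len(xs)
    xbar = sum(xs) / n
    return 1 if xbar >= theta0 + z / math.sqrt(n) else 0


def estimate_beta(theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of beta(theta) = E_theta[phi(X)].

    reps and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # One simulated sample X_1, ..., X_n from N(theta, 1).
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        rejections += phi(xs)
    return rejections / reps


if __name__ == "__main__":
    for theta in (9.5, 10.0, 10.5, 11.0):
        print(theta, estimate_beta(theta, n=10))
```

The estimate at $\theta = 10$ should land near 0.05, up to simulation error, and the estimates should grow as $\theta$ moves above 10.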

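For this test,

$$\begin{aligned} \beta(\theta) &= P_\theta\left(\bar{X} \ge 10 + \frac{1.645}{\sqrt{n}}\right) \\ &= P_\theta\left(\frac{\bar{X} - \theta}{1/\sqrt{n}} \ge \sqrt{n}(10 - \theta) + 1.645\right) \\ &= P\left(Z \ge \sqrt{n}(10 - \theta) + 1.645\right), \end{aligned}$$

where $Z \sim N(0, 1)$.

Remarks

• $\beta(10) = 0.05$ for each $n$.

• $\beta(\theta)$ increases monotonically, from 0 at $\theta = -\infty$ to 1 at $\theta = \infty$.

• So, the size of the test is 0.05, no matter the value of $n$.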
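The closed form $\beta(\theta) = P(Z \ge \sqrt{n}(10 - \theta) + 1.645)$ is easy to evaluate directly. The sketch below uses `statistics.NormalDist` from the Python standard library for the normal cdf. The function name `power` and the grid of $\theta$ values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def power(theta, n, theta0=10.0, z=1.645):
    """beta(theta) = P(Z >= sqrt(n) * (theta0 - theta) + z), with Z ~ N(0, 1)."""
    return 1.0 - NormalDist().cdf(math.sqrt(n) * (theta0 - theta) + z)


if __name__ == "__main__":
    thetas = (8.0, 9.0, 10.0, 11.0, 12.0)  # illustrative grid
    for n in (5, 10, 20, 40):
        row = "  ".join(f"{power(t, n):.3f}" for t in thetas)
        print(f"n={n:2d}: {row}")
```

Evaluating `power(10.0, n)` for several values of `n` gives approximately 0.05 each time, in line with the first remark. For a fixed $\theta > 10$ the value increases with `n`.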

[Figure: Power curves for Example 29. Horizontal axis: θ, from 8 to 12. Vertical axis: β(θ), from 0 to 1. The numbers beside the curves (5, 10, 20, 40) indicate sample size, n.]

Goal in constructing a test

Make $\alpha$ as small as possible while making the power as large as possible.

Clearly we can construct a test with $\alpha = 0$ by taking $R$ to be the empty set! Also, we can make the power equal to 1 for all $\theta \in \Theta_0^c$ by using a test with $R$ equal to the sample space. However, neither of these tests attains the goal of making $\alpha$ small and power large: the first never rejects $H_0$, so its power is 0, and the second always rejects $H_0$, so $\alpha = 1$.

Our main approach will be to consider tests with level of significance set at some desired value (such as 0.05), and to try to select from these tests one that maximizes power.
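To make the tension between size and power concrete, the following sketch revisits the normal model of Example 29 and varies $\alpha$. It assumes the cutoff 1.645 is replaced by the upper-$\alpha$ standard normal quantile, which is a generalization not spelled out in the notes. The helper name `power_at` and the chosen values of $n$ and $\theta$ are likewise hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def power_at(theta, n, alpha, theta0=10.0):
    """Power at theta of the test rejecting when xbar >= theta0 + z / sqrt(n).

    z is the upper-alpha standard normal quantile (assumed generalization
    of the 1.645 cutoff, which corresponds to alpha = 0.05).
    """
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(math.sqrt(n) * (theta0 - theta) + z)


if __name__ == "__main__":
    n, theta = 10, 10.5  # illustrative choices
    for alpha in (0.10, 0.05, 0.01):
        pw = power_at(theta, n, alpha)
        print(f"alpha={alpha:.2f}  power={pw:.3f}  P(Type II error)={1 - pw:.3f}")
```

With $n$ and $\theta$ held fixed, shrinking $\alpha$ lowers the power at $\theta$ and so raises the Type II error probability.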

By taking $\alpha$ to be fairly small, this approach implicitly says that a Type I error is more serious than a Type II error. Choosing $\alpha$ to be small makes a Type I error unlikely, but may mean that our test has low power, and hence a high probability of Type II error.

Example 29 (continued) The way we set up the hypotheses, a Type I error means concluding that the new process is better when it really isn't. Taking $\alpha$ small may reflect our reluctance to change to the new process unless there's very convincing evidence that it's really better.

Type II error: missing out on a better process.

Type I error: make a (perhaps costly) switch to a new process that is no better than the old one.

If the latter error is more serious, it would be advisable to choose $\alpha$ small and to live with the resulting Type II error probability.

Testing a simple null against a simple alternative

Definition 15 A test $\phi$ is said to be a most powerful test of size $\alpha$ for testing

$$H_0 : \theta = \theta_0 \qquad H_1 : \theta = \theta_1$$

if $E_{\theta_0}[\phi(X)] = \alpha$ and, for any other test $\phi^*$ of size $\alpha$,

$$E_{\theta_1}[\phi(X)] \ge E_{\theta_1}[\phi^*(X)].$$

We observe a value of $X$, whose distribution is either $f(x \mid \theta_0)$ or $f(x \mid \theta_1)$. Suppose we observe $X$ to be $x$. Consider the likelihood ratio

$$L(x \mid \theta_0, \theta_1) = \frac{f(x \mid \theta_1)}{f(x \mid \theta_0)}.$$

It would be sensible to reject $H_0$ in favor of $H_1$ only when $L(x \mid \theta_0, \theta_1)$ is sufficiently large.

Neyman-Pearson Lemma Let $X$ be a random vector with pdf or pmf $f(x \mid \theta)$, where $\theta \in \Theta = \{\theta_0, \theta_1\}$. Define the hypotheses

$$H_0 : \theta = \theta_0 \qquad H_1 : \theta = \theta_1.$$

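a) Any test $\phi$ of the form

$$\phi(x) = \begin{cases} 1, & \text{if } f(x \mid \theta_1) > k f(x \mid \theta_0) \\ 0, & \text{if } f(x \mid \theta_1) < k f(x \mid \theta_0) \end{cases}$$

with $E_{\theta_0}[\phi(X)] = \alpha$ is a most powerful level $\alpha$ test of $H_0$ vs. $H_1$.

b) If there exists a test of the form in a) with $k > 0$, then any other most powerful level $\alpha$ test, call it $\phi^*$, has size $\alpha$ and is such that $\phi(x) = \phi^*(x)$ for almost all $x$ in $\{x : f(x \mid \theta_1) \ne k f(x \mid \theta_0)\}$.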
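As an illustration of part a), consider a random sample from $N(\theta, 1)$ with simple hypotheses $\theta = \theta_0$ versus $\theta = \theta_1$, where $\theta_1 > \theta_0$. The likelihood ratio then simplifies to $\exp\{n(\theta_1 - \theta_0)(\bar{x} - (\theta_0 + \theta_1)/2)\}$, which is increasing in $\bar{x}$, so rejecting for a large ratio is the same as rejecting for large $\bar{x}$. The sketch below checks this numerically. The function names and the data values are made up for illustration, and the case of equality $f(x \mid \theta_1) = k f(x \mid \theta_0)$ is ignored because it has probability zero in this continuous model.

```python
import math
from statistics import NormalDist


def likelihood_ratio(xs, theta0, theta1):
    """f(x | theta1) / f(x | theta0) for an i.i.d. N(theta, 1) sample.

    The normalizing constants cancel, leaving only the squared deviations.
    """
    log_ratio = sum(0.5 * ((x - theta0) ** 2 - (x - theta1) ** 2) for x in xs)
    return math.exp(log_ratio)


def np_cutoffs(n, theta0, theta1, alpha):
    """Return (c, k) for a size-alpha test, assuming theta1 > theta0.

    Rejecting when xbar > c is equivalent to rejecting when the ratio > k.
    """
    c = theta0 + NormalDist().inv_cdf(1.0 - alpha) / math.sqrt(n)
    k = math.exp(n * (theta1 - theta0) * (c - 0.5 * (theta0 + theta1)))
    return c, k


def np_test(xs, theta0, theta1, alpha):
    """Test function: 1 if the likelihood ratio exceeds k, else 0."""
    _, k = np_cutoffs(len(xs), theta0, theta1, alpha)
    return 1 if likelihood_ratio(xs, theta0, theta1) > k else 0


if __name__ == "__main__":
    sample = [11.5, 11.0, 10.8, 11.2]  # made-up data
    print(np_test(sample, theta0=10.0, theta1=11.0, alpha=0.05))
```

Note that the cutoff `c` does not depend on $\theta_1$: in this model the same rejection region is obtained for every alternative $\theta_1 > \theta_0$.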
