7. Hypothesis testing

620.152 Introduction to Biomedical Statistics

7.1  Introduction and terminology
7.2  Hypothesis testing and confidence intervals
7.3  p-values
7.4  Critical values, types of error and power
7.5  Hypothesis testing for Normal populations

References: Pagano and Gauvreau, Chapter 10.

“I had come to an entirely erroneous conclusion, which shows, my dear Watson, how dangerous it is to reason from insufficient data.” Sherlock Holmes, The Speckled Band, 1892.

7.1 Introduction

We introduce the concepts and terminology for hypothesis testing using the case of inference on the population mean, µ, based on a random sample from a Normal population with known variance. In this case, inference is based on the result

    Z = (X̄ − µ)/(σ/√n) ~ N(0, 1)

(here "~" denotes "is distributed as"). Extension to other cases comes later: not until section 7.5.

Hypothesis testing can be regarded as the "other side" of confidence intervals. We have seen that a confidence interval for the parameter µ gives a set of "plausible" values for µ. Suppose we are interested in whether µ = µ0. In determining whether or not µ0 is a plausible value for µ (using a confidence interval) we are really testing µ = µ0 against the alternative that µ ≠ µ0. If µ0 is not a plausible value, then we would reject µ = µ0.

In this subject, we deal only with two-sided confidence intervals and, correspondingly, with two-sided tests, i.e. tests against a two-sided alternative (µ = µ0 vs µ ≠ µ0). There are circumstances in which one-sided tests and one-sided confidence intervals seem more appropriate. Some statisticians argue that they are never appropriate. In any case, we will use only two-sided tests.

All our confidence intervals are based on the central probability interval for the estimator, i.e. that obtained by excluding probability ½α at each end of the distribution, giving a Q% confidence interval, where Q = 100(1 − α). In the first instance, this means that our tests are based on rejecting µ = µ0 for an event of probability ½α at either end of the estimator distribution.

Note: this is not always the case for other test statistics, i.e. test statistics that are not estimators, for example a test statistic such as U = (X̄ − µ0)² used to test µ = µ0. We will consider such cases in a later chapter.


Example (Serum cholesterol level)
The distribution of serum cholesterol level for the population of males in the US who are hypertensive and who smoke is approximately Normal with an unknown mean µ. However, we do know that the mean serum cholesterol level for the general population of all 20–74-year-old males is 211 mg/100ml. We might wonder whether the mean cholesterol level of the subpopulation of men who smoke and are hypertensive is different. Suppose we select a sample of 25 men from this group and their mean cholesterol level is x̄ = 220 mg/100ml. What can we conclude from this?

A statistical hypothesis is a statement concerning the probability distribution of a population (a random variable X). We are concerned with parametric hypotheses: those where the distribution of X is specified except for a parameter; in the present case the parameter is µ, and the population distribution is N(µ, σ²). The hypotheses can take the form µ = 6, or µ ≠ 4 (or µ > 10, or 6 < µ < 8, or ...).

The hypothesis under test is called the null hypothesis, denoted H0. It has a special importance in that it usually reflects the status quo: the way things were, or should be. Often the null hypothesis represents a "no effect" hypothesis. The onus is on the experimenter to demonstrate that an "effect" exists. We don't reject the null hypothesis unless there is strong evidence against it. We always take H0 to be a simple hypothesis: µ = µ0.

We test the null hypothesis against an alternative hypothesis, denoted by H1. We will always take the alternative hypothesis to be H0′, i.e. the complement of H0 (µ ≠ µ0). It need not be: it may be another simple hypothesis (µ = µ1) or a one-sided alternative (µ > µ0).

Example (Serum cholesterol level)
State the null and alternative hypotheses for this problem.

The "logic" of the hypothesis testing procedure seems a bit back-to-front at first. It is based on the contrapositive: [P ⇒ Q] = [Q′ ⇒ P′]. For example: [sheeP ⇒ Quadruped] = [not Quadruped ⇒ not sheeP]; [(x = 2) ⇒ (x² = 4)] = [(x² ≠ 4) ⇒ (x ≠ 2)]. Our application is rather more uncertain:

    [(µ = µ0) ⇒ (x̄ ≈ µ0)] = [(x̄ ≉ µ0) ⇒ (µ ≠ µ0)]

This logic means that we have a (NQR) "proof" of µ ≠ µ0. (If the signs were all equalities rather than (random) approximations, it would be a proof.) We have no means of "proving" (NQR or otherwise) that µ = µ0.

We observe the sample and compute x̄. On the basis of the sample, we must reach a decision: "reject H0", or not. Statisticians are reluctant to use "accept H0" for "do not reject H0", for the reasons indicated above. Mind you, this does seem a bit odd when they can use "success" to mean "the patient dies". If ever I use "accept H0" (and I'm inclined to), it means only "do not reject H0". In particular, it does not mean that H0 is true, or even that I think it likely to be true!

"I am getting into your involved habit, Watson, of telling a story backward." Sherlock Holmes, The Problem of Thor Bridge, 1927.


To demonstrate the existence of an effect (µ ≠ µ0), the sample must produce evidence against the no-effect hypothesis (µ = µ0). There are several ways of approaching this. The first, and simplest after Chapter 6, is to compute a confidence interval (which is a good idea in any case), and then to check whether or not the null-hypothesis value (µ = µ0) is in the confidence interval.

7.2 Hypothesis testing and confidence intervals

We have seen how to obtain a confidence interval for µ, so there is not much more to do. In fact, a number of the problems in Problem Set 6 had parts that questioned the plausibility of particular values of µ. This is now seen to be equivalent to hypothesis testing.

Example
We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62. Test the null hypothesis H0: µ=10 (against a two-sided alternative).

    95% CI for µ: 11.62 ± 1.96 × 4/√40 = (10.38, 12.86).

Since the 95% confidence interval does not include 10, we reject the null hypothesis µ = 10. There is evidence in this sample that µ > 10.

Example (Serum cholesterol level)
[n=25, µ0=211, σ=46; x̄=220]

    95% CI for µ: 220 ± 1.96 × 46/√25 = (202.0, 238.0).

Since the 95% confidence interval includes 211, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
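The decision here only requires the confidence interval, so the check is easy to automate. A minimal sketch in Python (assuming the scipy library is available; the helper name z_confidence_interval is ours, not part of the notes) reproducing the two intervals above:

```python
from math import sqrt
from scipy.stats import norm

def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for mu when sigma is known (Normal population)."""
    z = norm.ppf(1 - (1 - conf) / 2)          # 1.96 for a 95% interval
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# First example: n=40, sigma=4, xbar=11.62, mu0=10
print(z_confidence_interval(11.62, 4, 40))    # ~(10.38, 12.86): 10 excluded, reject H0
# Serum cholesterol: n=25, sigma=46, xbar=220, mu0=211
print(z_confidence_interval(220, 46, 25))     # ~(202.0, 238.0): 211 included, do not reject
```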

7.3 p-values

We measure the strength of the evidence of the sample against H0 by using the "unlikelihood" of the data if H0 is true. The idea is to work out how unlikely the observed sample is, assuming µ = µ0. If it is "too unlikely", then we reject H0; otherwise, we do not reject H0.

The p-value is the probability (if H0 were true) of observing a value at least as extreme as the one observed:

    p = Pr(X̄0 is at least as far from µ0 as the observed x̄),

where X̄0 denotes a (hypothetical) sample mean assuming H0 true, i.e. X̄0 ~ N(µ0, σ²/n). Therefore:

    p = 2 Pr(X̄0 > x̄)  if x̄ > µ0,
    p = 2 Pr(X̄0 < x̄)  if x̄ < µ0;

the 2 appears because this is a two-sided test, and we must allow for the possibility of being as extreme at the other end of the distribution.

Example
[n=40, µ0=10, σ=4, x̄=11.62; test H0: µ=10 as above]

    p = 2 Pr(X̄0 > 11.62), where X̄0 ~ N(10, 4²/40)
      = 2 Pr(Zs > (11.62 − 10)/(4/√40)) = 2 Pr(Zs > 2.56) = 0.010.


Example
A random sample of fifty observations is obtained from a Normal population with standard deviation 5. The observed sample mean is 8.3. Test the null hypothesis that µ = 10. [n=50, µ0=10, σ=5, x̄=8.3]

    p = 2 Pr(X̄0 < 8.3), where X̄0 ~ N(10, 5²/50)
      = 2 Pr(Zs < (8.3 − 10)/(5/√50)) = 2 Pr(Zs < −2.40) = 0.016.

Since p < 0.05, we reject the null hypothesis µ = 10. There is evidence in this sample that µ < 10.

Example (Serum cholesterol level)
[n=25, µ0=211, σ=46, x̄=220]

    p = 2 Pr(X̄0 > 220), where X̄0 ~ N(211, 46²/25)
      = 2 Pr(Zs > (220 − 211)/(46/√25)) = 2 Pr(Zs > 0.98) ≈ 0.33.

Since p > 0.05, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
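As a rough illustration (Python with scipy assumed; the function name z_pvalue is ours), the two-sided p-values above can be computed directly from the standard Normal distribution:

```python
from math import sqrt
from scipy.stats import norm

def z_pvalue(xbar, mu0, sigma, n):
    """Two-sided p-value for the z-test of mu = mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * norm.sf(abs(z))                # sf = 1 - cdf, i.e. the upper tail

print(round(z_pvalue(11.62, 10, 4, 40), 3))   # 0.010
print(round(z_pvalue(8.3, 10, 5, 50), 3))     # 0.016
print(round(z_pvalue(220, 211, 46, 25), 3))   # 0.328
```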

7.4 Critical values, types of error and power

The critical value approach specifies a decision rule for rejecting H0: for example, for a test of significance level 0.05 the rule is

    reject H0 if x̄ < µ0 − 1.96 σ/√n or x̄ > µ0 + 1.96 σ/√n.

This specifies a rejection rule in terms of the estimate: if the estimate (x̄) is too far away from the H0 value (µ0), then H0 is rejected. The rejection rule is often best expressed in terms of a statistic that has a standard distribution if H0 is true. Here the test statistic is

    Z = (X̄ − µ0)/(σ/√n),

which is such that, if H0 is true, then Z ~ N(0, 1).


The rule then is to compute the observed value of Z and to see if it could plausibly be an observation from a standard Normal distribution. If not, we reject H0. This leads to the name often used for this test: the z-test.

Note that Z involves only X̄ and known constants (the null hypothesis value µ0, the known standard deviation σ, and the sample size n). In particular, Z does not depend on the unknown parameter µ. We compute the observed value of Z:

    z = (x̄ − µ0)/(σ/√n)

and compare it to the standard Normal distribution (where "plausible" is taken to mean within the central 95% of the distribution). Thus the decision rule is

    reject H0 if z < −1.96 or z > 1.96, i.e. if |z| > 1.96,

which corresponds exactly to the rejection region for x̄ given above.

Example
A random sample of fifty observations is obtained from a Normal population with standard deviation 5. The observed sample mean is 8.3. Test the null hypothesis that µ = 10. [n=50, µ0=10, σ=5, x̄=8.3]

    z = (8.3 − 10)/(5/√50) = −2.40,

hence we reject H0 (using significance level 0.05) since z < −1.96. There is evidence in this sample that µ < 10.

Example
We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62. Test the null hypothesis H0: µ=10 (against a two-sided alternative). [n=40, µ0=10, σ=4, x̄=11.62]

    z = (11.62 − 10)/(4/√40) = 2.56.

Since |z| > 1.96, we reject the null hypothesis µ = 10. There is evidence in this sample that µ > 10.

Example (Serum cholesterol level)
[n=25, µ0=211, σ=46; x̄=220]

    z = (x̄ − µ0)/(σ/√n) = (220 − 211)/(46/√25) = 0.978.

Since |z| < 1.96, we do not reject the null hypothesis µ = 211. There is no evidence in this sample that µ ≠ 211.
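The three z-tests above follow one routine, so they are easy to script. A minimal sketch (Python, scipy assumed; the helper name z_test is ours) that reports the observed z and the 5%-level decision:

```python
from math import sqrt
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    crit = norm.ppf(1 - alpha / 2)            # 1.96 when alpha = 0.05
    return z, abs(z) > crit                   # (observed z, reject H0?)

print(z_test(8.3, 10, 5, 50))      # (-2.40..., True)  -> reject H0
print(z_test(11.62, 10, 4, 40))    # ( 2.56..., True)  -> reject H0
print(z_test(220, 211, 46, 25))    # ( 0.98..., False) -> do not reject H0
```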

Types of error

In deciding whether to accept or reject H0, there is a risk of making two types of errors:

                       H0 true                              H1 true
  reject H0            ✗ error of type I                    ✓ correct
                       prob = α (α = significance level)    prob = 1−β (1−β = power)
  don't reject H0      ✓ correct                            ✗ error of type II
                       prob = 1−α                           prob = β

We want α and β to be small. The significance level, α, is usually pre-set at 0.05; we then do what we can to make the power large (and hence β small). This will generally mean taking a bigger sample.


Example (Birth weights)
A researcher thinks that mothers with low socioeconomic status (SES) deliver babies whose birthweights are lower than "normal". To test this hypothesis, a random sample of birthweights from 100 consecutive, full-term, live-born babies from the maternity ward of a hospital in a low-SES area is obtained. Their mean birthweight is found to be 3240 g. We know from nationwide surveys that the population mean birthweight is 3400 g with a standard deviation of 700 g.

• Do the data support her hypothesis?
[n=100, x̄=3240; we assume σ=700; µ0=3400]

    z = (3240 − 3400)/(700/√100) = −2.29.

Since |z| > 1.96, we reject H0 . There is significant evidence in this sample that the mean birthweight of SES babies is less than the national average. • Describe the type I and type II errors in this context. In this context, a type I error is to conclude that “SES babies” are different, when they are actually the same as the rest of the population; a type II error is to conclude that “SES babies” are the same, when they are in fact different. Example (Serum cholesterol level) • Describe the type I and type II errors in this context. A type I error is to conclude that the group of interest (SHM = men who smoke and have hypertension) have different mean serum cholesterol level from the general population, when they actually have the same mean. A type II error is to conclude that the SHM individuals are no different from the general population with respect to serum cholesterol levels, when in fact they are different. • Compute β, the probability of making a type II error, when the true value of µ is 250. [n = 25, µ0 = 211, σ = 46] β = Pr(don’t reject H0 | µ = 250) 2 d ¯ 0 < 211+1.96× √46 ), where X ¯0 = = Pr(211−1.96× √46 < X N(250, 46 ) 25

25

25

¯ s0 < − 39 √ + 1.96) − 1.96 < X 46/ 25 0 ¯ = Pr(−6.20 < Xs < −2.28) = 0.0113 − 0.0000 = 0.011 = Pr(−

39 √ 46/ 25

¯ √ . This calculation can be done more neatly in terms of Z = X−211 46/ 25

2

d d d ¯0 = √ , 1), i.e. Z 0 = N(4.24, 1). ), then Z 0 = N( 250−211 If X N(250, 46 25

46/ 25

2 µ−a [using the result that Y = X−a has mean b and variance σb2 ] b

Then: β = Pr(−1.96 < Z 0 < 1.96) = Pr(−6.20 < Zs0 < −2.28) = 0.011, as we obtained above.
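The same β can be obtained numerically. A small sketch (Python, scipy assumed; the variable names are ours) that mirrors the calculation above:

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 211, 46, 25
mu_true = 250
se = sigma / sqrt(n)

# Acceptance region for the 5%-level z-test of mu = mu0
lower, upper = mu0 - 1.96 * se, mu0 + 1.96 * se

# beta = Pr(sample mean falls in the acceptance region | mu = mu_true)
beta = norm.cdf(upper, loc=mu_true, scale=se) - norm.cdf(lower, loc=mu_true, scale=se)
print(round(beta, 3))          # ~0.011, so power = 1 - beta ~ 0.989
```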


There is a helpful analogy between legal processes, at least in Westminster-style legal systems, and hypothesis testing.

  hypothesis testing                         the law
  null hypothesis H0                         accused is innocent
  alternative hypothesis H1                  accused is guilty
  don't reject H0 without strong evidence    innocent until proven guilty, beyond a reasonable doubt
  type I error                               convict an innocent person
  type II error                              acquit a guilty person
  α = Pr(type I error)                       beyond reasonable doubt
  power = 1 − Pr(type II error)              effectiveness of the system in convicting a guilty person

A simple example to illustrate some of the terms used.

Example
I have a coin which I think may be biased. To test this I toss it five times: if I get all heads or all tails, I will say it is biased, otherwise I'll say it's unbiased. Let θ = probability of obtaining a head; then:

  null hypothesis, H0: θ = ½ (unbiased);
  alternative hypothesis, H1: θ ≠ ½ (biased);
  test statistic, X = number of heads obtained;
  test (decision rule): reject H0 if X ∈ {0, 5}.

  significance level = Pr(reject H0 | H0 true) = Pr(X ∈ {0, 5} | θ = ½) = (½)⁵ + (½)⁵ = 1/16 ≈ 0.06

  power = Pr(reject H0 | H1 true) = Pr(X ∈ {0, 5} | θ ≠ ½)   (this can't be evaluated)

So, we define the power function:

    Q(θ) = Pr(reject H0 | θ) = Pr(X ∈ {0, 5} | θ) = (1 − θ)⁵ + θ⁵

Graph of Q(θ):

[Graph of the power function Q(θ) = (1 − θ)⁵ + θ⁵ for 0 ≤ θ ≤ 1: it has its minimum of 1/16 at θ = 0.5 and rises to 1 at θ = 0 and θ = 1.]

Note 1: Q(0.5) is the significance level of the test.
Note 2: Q(0.75) = 0.25⁵ + 0.75⁵ ≈ 0.24; so this is not a particularly good test. But we knew that anyway!
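For reference, the power function of this five-toss test is simple enough to tabulate directly; a short sketch (plain Python, no external libraries) of the calculation behind the graph and Note 2:

```python
def Q(theta):
    """Power of the test 'reject H0 if all 5 tosses agree', as a function of theta."""
    return (1 - theta) ** 5 + theta ** 5

for theta in (0.5, 0.6, 0.75, 0.9, 1.0):
    print(theta, round(Q(theta), 3))
# 0.5 -> ~0.06 (the significance level), 0.75 -> ~0.24, 1.0 -> 1.0
```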


Example
Suppose that Z ~ N(θ, 1). We observe Z. On the basis of this one observation, we wish to test H0: θ = 0 against H1: θ ≠ 0. This relatively trivial example has a range of applications!

The test: reject H0 if |Z| > 1.96; don't reject H0 if |Z| ≤ 1.96.

If H0 is true (θ = 0), then Z ~ N(0, 1):
  reject H0 (|Z| > 1.96): ✗ error of type I, α = Pr(|Z| > 1.96) = 0.05;
  don't reject H0 (|Z| ≤ 1.96): ✓ correct, prob = 0.95.

If H1 is true (θ ≠ 0), then Z ~ N(θ, 1):
  reject H0 (|Z| > 1.96): ✓ correct, power = Pr(|Z| > 1.96);
  don't reject H0 (|Z| ≤ 1.96): ✗ error of type II, prob = 1 − power.

The power depends on the value of θ. For example:

  θ        power = Pr(|Z| > 1.96)    prob(don't reject H0)
  ±1       0.17                      0.83
  ±2       0.52                      0.48
  ±3       0.85                      0.15
  ±3.61    0.95                      0.05
  ±4       0.98                      0.02

For example, for θ = 3,

    power = Pr(|Z| > 1.96), where Z ~ N(3, 1)
    1 − power = Pr(−1.96 < Z < 1.96) = Pr(−4.96 < Zs < −1.04)
              = Pr(Zs < −1.04) − Pr(Zs < −4.96) = 0.1492 − 0.0000,
    ∴ power = 0.851.

Except for θ around zero, it is usually the case that only one tail is required (as the other is negligible). For example, for θ = −1,

    power = Pr(|Z| > 1.96), where Z ~ N(−1, 1)
    1 − power = Pr(−1.96 < Z < 1.96) = Pr(−0.96 < Zs < 2.96)
              = Pr(Zs < 2.96) − Pr(Zs < −0.96) = 0.9985 − 0.1685,
    ∴ power = 0.170.

Using the above table, we could plot a graph of the power function. The graph has a minimum of 0.05 (the significance level) at θ = 0, and it increases towards 1 on both sides as θ moves away from zero: for θ = 4 or θ = −4 the power is 0.98.
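A short sketch (Python, scipy assumed) reproducing the power values tabulated above for the two-sided 5%-level test based on a single Z ~ N(θ, 1):

```python
from scipy.stats import norm

def power(theta, crit=1.96):
    """Pr(|Z| > crit) when Z ~ N(theta, 1): power of the two-sided z-test."""
    return norm.sf(crit - theta) + norm.cdf(-crit - theta)

for theta in (0, 1, 2, 3, 3.61, 4):
    print(theta, round(power(theta), 2))
# 0 -> 0.05, 1 -> 0.17, 2 -> 0.52, 3 -> 0.85, 3.61 -> 0.95, 4 -> 0.98
```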


For the z-test, the statistic is Z = (X̄ − µ0)/(σ/√n).

If µ = µ0, then Z ~ N(0, 1).
If µ = µ1, then Z ~ N(θ, 1), where θ = (µ1 − µ0)/(σ/√n).

We only get one observation on Z. So the z-test is actually equivalent to the example above. We can use the results of that example to work out the power for any z-test, using

    power = Pr(|Z| > 1.96), where Z ~ N(θ, 1).

Sample size calculations

To devise a test of significance level 0.05 that has power 0.95 when µ = µ1, we need θ = 3.61, i.e.

    (µ1 − µ0)/(σ/√n) = 3.61  ⇒  n = 13 σ²/(µ1 − µ0)²    (since 3.61² ≈ 13).

Example (Serum cholesterol level)
Find the required sample size if we want a test to have significance level 0.05 and power 0.95 when µ = 220. Here µ0 = 211, µ1 = 220 and σ = 46. Therefore:

    n > 13 × 46²/9² ≈ 340.

Thus we need a sample of at least 340 in order to ensure a power of 0.95 when the population mean is 220.

The sample size result can be generalised to any significance level α and specified power 1−β, as indicated in the following diagram, which indicates the derivation of 3.61 ≈ 1.96 + 1.6449.

[Diagram: two standard-Normal density curves, one centred at 0 and one centred at θ = 3.61. The cut-off 1.96 = z_{1−½α} leaves probability 0.025 in each tail of the first curve, and lies 1.6449 = z_{1−β} below the centre of the second (leaving 0.05 in its lower tail), so that 3.61 ≈ 1.96 + 1.6449.]

For a z-test of µ = µ0, with significance level α and power 1−β when µ = µ1, we require

    n > (z_{1−½α} + z_{1−β})² σ²/(µ1 − µ0)²,

where z_q denotes the standard Normal q-quantile.
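A minimal sketch (Python, scipy assumed; the function name sample_size_z is ours) of the general sample-size formula, checked against the serum cholesterol numbers:

```python
from math import ceil
from scipy.stats import norm

def sample_size_z(mu0, mu1, sigma, alpha=0.05, power=0.95):
    """Smallest n for a two-sided z-test of mu = mu0 to reach the given power at mu = mu1."""
    z_alpha = norm.ppf(1 - alpha / 2)      # 1.96 for alpha = 0.05
    z_power = norm.ppf(power)              # 1.6449 for power = 0.95
    n = ((z_alpha + z_power) * sigma / (mu1 - mu0)) ** 2
    return ceil(n)

print(sample_size_z(211, 220, 46))         # ~340, agreeing with the worked example
```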


7.5 Hypothesis testing for Normal populations

In this section, we consider tests for the parameters (µ and σ) of a Normal population. In each case, we define a statistic that has a "standard" distribution (N, t or χ²) when H0 is true. A decision is then obtained by comparing the observed value of this statistic with the standard distribution. In reporting the results of the test, you should give the value of the "standard" statistic, the p-value, and a verbal conclusion/explanation. It is recommended that you also give a confidence interval in reporting your results.

z-test (testing µ=µ0 when σ is known/assumed)

We define:

    Z = (X̄ − µ0)/(σ/√n)

(in which X̄ is observed; µ0, σ and n are given or assumed known.)

If H0 is true, then Z ~ N(0, 1). We evaluate the observed value of Z:

    z = (x̄ − µ0)/(σ/√n)

and compare it to the standard Normal distribution. For significance level 0.05, we reject H0 if |z| > 1.96. The p-value is computed using the tail probability for a standard Normal distribution:

    p = 2 Pr(Z′ > z) if z > 0,    p = 2 Pr(Z′ < z) if z < 0,    where Z′ ~ N(0, 1).

Example
We obtain a random sample of n=40 from a Normal population with known standard deviation σ=4. The sample mean is x̄=11.62. Test the null hypothesis H0: µ=10 (against a two-sided alternative). [n=40, σ=4, x̄=11.62, µ0=10]

    z = (11.62 − 10)/(4/√40) = 2.56;  p = 2 Pr(Z′ > 2.56) = 0.010.

The sample mean is x̄=11.62; the z-test of µ=10 gives z=2.56, p=0.010. Thus there is significant evidence in this sample that µ>10. In reporting this test result, it is recommended that you also give the 95% CI for µ: (10.38, 12.86).

Example
The distribution of diastolic blood pressure for the population of female diabetics between the ages of 30 and 34 has an unknown mean µ, and standard deviation σ = 9.1 mm Hg. Researchers want to determine whether or not the mean of this population is equal to the mean diastolic blood pressure of the general population of females in this age group, which is 74.4 mm Hg. A sample of fifty diabetic women in this age group is selected and their mean diastolic blood pressure is 79.6 mm Hg. [n = 50, σ = 9.1, x̄ = 79.6, µ0 = 74.4]

    z = (79.6 − 74.4)/(9.1/√50) = 4.04;  p = 2 Pr(Z′ > 4.04) = 0.000.

Note: p-values should be reported to three decimal places. p = 0.000 means that p < 0.0005. The mean blood pressure for this sample of female diabetics is 79.6 mm Hg. This is significantly greater than the general population value of 74.4 mm Hg (z=4.04, p=0.000).


Example (Renal disease)
The mean serum-creatinine level measured in 12 patients 24 hours after they received a newly proposed antibiotic was 1.2 mg/dL. The mean and standard deviation of serum-creatinine level in the general population are 1.0 and 0.4 mg/dL respectively. Is there evidence to support the claim that their mean serum-creatinine level is different from that of the general population?

The z-test is available in MINITAB using Stat > Basic Statistics > 1-Sample Z..., entering either (the column containing the data) or (n and x̄), together with the known value for σ and the null hypothesis value µ0. For the above example, the following output is obtained:

  One-Sample Z
  Test of mu = 1 vs not = 1
  The assumed standard deviation = 0.4

   N    Mean      SE Mean   95% CI               Z      P
   12   1.20000   0.11547   (0.97368, 1.42632)   1.73   0.083

Note that we are assuming the standard deviation of serum-creatinine level is the same in the treated individuals as in the general population (as well as Normality etc.). There is no evidence in this sample that the mean serum-creatinine level is different in these patients (z = 1.73, p = 0.083).

The z-test provides a routine which can be followed in the other cases.
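For readers without MINITAB, a rough Python equivalent of the renal-disease z-test above (scipy assumed; z_test_summary is our own helper, not a library routine), reproducing the output from the summary figures:

```python
from math import sqrt
from scipy.stats import norm

def z_test_summary(xbar, mu0, sigma, n, conf=0.95):
    """One-sample z-test from summary statistics: returns (z, two-sided p, CI for mu)."""
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    p = 2 * norm.sf(abs(z))
    zc = norm.ppf(1 - (1 - conf) / 2)
    return z, p, (xbar - zc * se, xbar + zc * se)

z, p, ci = z_test_summary(1.2, 1.0, 0.4, 12)
print(round(z, 2), round(p, 3), tuple(round(c, 5) for c in ci))
# ~1.73, ~0.083, (0.97368, 1.42632), matching the MINITAB output
```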

t-test (testing µ=µ0 when σ is unknown)

We define:

    T = (X̄ − µ0)/(S/√n)

(in which X̄ and S are observed; µ0 and n are given.)

If H0 is true, then T ~ t_{n−1}. We evaluate the observed value of T:

    t = (x̄ − µ0)/(s/√n)

and compare it to the t_{n−1} distribution. For significance level 0.05, we reject H0 if |t| > "2" = c_{0.975}(t_{n−1}). The p-value is computed using the tail probability for a t_{n−1} distribution:

    p = 2 Pr(T′ > t) if t > 0,    p = 2 Pr(T′ < t) if t < 0,    where T′ ~ t_{n−1}.

Example (Cardiology)
A topic of recent clinical interest is the possibility of using drugs to reduce infarct size in patients who have had a myocardial infarction (MI) within the past 24 hours. Suppose we know that in untreated patients the mean infarct size is 25. In 18 patients treated with the drug, the sample mean infarct size is 16.2 with a sample standard deviation of 8.4. Is the drug effective in reducing infarct size? [µ0 = 25; n = 18, x̄ = 16.2, s = 8.4]

    t = (16.2 − 25)/(8.4/√18) = −4.44;  p = 2 Pr(T′ < −4.44) = 0.000.  (T′ ~ t_17)

The sample mean for treated patients, x̄=16.2, is significantly less than the known mean for untreated patients of 25: t = −4.44, p=0.000. In reporting this test result, it is recommended that you also give the 95% CI for µ: (12.0, 20.4).
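When only summary statistics are available, as in the cardiology example, the t-test can be computed directly from the formula. A small sketch (Python, scipy assumed; the helper name is ours):

```python
from math import sqrt
from scipy.stats import t

def t_test_summary(xbar, s, n, mu0):
    """One-sample t-test from summary statistics: returns (t, two-sided p)."""
    tstat = (xbar - mu0) / (s / sqrt(n))
    p = 2 * t.sf(abs(tstat), df=n - 1)
    return tstat, p

tstat, p = t_test_summary(16.2, 8.4, 18, 25)
print(round(tstat, 2), round(p, 4))     # ~ -4.44, p ~ 0.0004 (reported as 0.000)
```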


Example (Calorie content)
Many consumers pay careful attention to stated nutritional contents on packaged foods when making purchases. It is therefore important that the information on packages be accurate. A random sample of n = 12 frozen dinners of a certain type was selected from production during a particular period, and the calorie content of each one was determined. Here are the resulting observations:

    255  244  239  242  265  245  259  248  225  226  251  233

The stated calorie content is 240. Do the data suggest otherwise?

MINITAB can be used to analyse the data using Stat > Basic Statistics > 1-Sample t..., entering either (the column containing the data) or (n, x̄ and s), together with the null hypothesis value µ0. For the above example we obtain:

  One-Sample T
  Test of mu = 240 vs not = 240

   N    Mean     StDev    SE Mean   95% CI              T      P
   12   244.33   12.383   3.575     (236.47, 252.20)    1.21   0.251

There is no significant evidence in this sample that the mean is different from 240 calories (t = 1.21, p = 0.251).
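Since the raw data are listed, this example can also be checked with scipy's built-in one-sample t-test (a sketch, assuming scipy is installed):

```python
from scipy.stats import ttest_1samp

calories = [255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251, 233]
result = ttest_1samp(calories, popmean=240)
print(round(result.statistic, 2), round(result.pvalue, 3))   # ~1.21, ~0.251
```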

χ²-test (testing σ=σ0)

We define:

    U = (n − 1)S²/σ0²

(in which S is observed; σ0 and n are given.)

If H0 is true, then U ~ χ²_{n−1}. We evaluate the observed value of U:

    u = (n − 1)s²/σ0²

and compare it to the χ²_{n−1} distribution. For significance level 0.05, we reject H0 if u is outside the central 95% probability interval for the χ²_{n−1} distribution. The p-value is computed using the tail probability for a χ²_{n−1} distribution:

    p = 2 Pr(U′ > u) if u is large,    p = 2 Pr(U′ < u) if u is small,    where U′ ~ χ²_{n−1}.

Example (Packaging variation)
A packaging line fills nominal 900 g tomato juice jars with an actual mean of 908.5 g. The process should have a standard deviation smaller than 4.25 g per jar (a larger standard deviation leads to too many underweight and overfilled jars). Samples of 61 jars are regularly taken to test the process. One such sample yields a mean of 907.9 g and a standard deviation of 3.74 g. Does this indicate that the true (population) standard deviation is less than 4.25? (i.e. is σ < 4.25?) [σ0 = 4.25; n = 61, s = 3.74]

    u = 60 × 3.74²/4.25² = 46.46,  p = 2 Pr(U′ < 46.46) = 0.200.  (U′ ~ χ²_60)

There is no significant evidence in this sample that the true standard deviation is not equal to 4.25.

Note: the 95% CI for σ is (3.74 √(60/83.30), 3.74 √(60/40.48)) = (3.17, 4.55).
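A rough sketch of the same calculation (Python, scipy assumed; the helper name is ours), returning the statistic, the two-sided p-value (the smaller tail doubled, as described above) and a CI for σ:

```python
from math import sqrt
from scipy.stats import chi2

def chisq_sd_test(s, n, sigma0, conf=0.95):
    """Two-sided test of sigma = sigma0 from a sample SD, plus a CI for sigma."""
    u = (n - 1) * s**2 / sigma0**2
    p = 2 * min(chi2.cdf(u, df=n - 1), chi2.sf(u, df=n - 1))
    lo = s * sqrt((n - 1) / chi2.ppf(1 - (1 - conf) / 2, df=n - 1))
    hi = s * sqrt((n - 1) / chi2.ppf((1 - conf) / 2, df=n - 1))
    return u, p, (lo, hi)

print(chisq_sd_test(3.74, 61, 4.25))   # u ~ 46.46, p ~ 0.20, CI ~ (3.17, 4.55)
```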


Example (Renal disease)
The mean serum-creatinine level measured in 12 patients 24 hours after they received a newly proposed antibiotic was 1.20 mg/dL; the sample standard deviation was 0.52 mg/dL. The mean and standard deviation of serum-creatinine level in the general population are 1.0 and 0.4 mg/dL respectively. Is there evidence that the standard deviation of serum-creatinine level for the treated group is different from that of the general population? [σ0 = 0.4; n = 12, s = 0.52]

    u = 11 × 0.52²/0.4² = 18.59,  p = 2 Pr(U′ > 18.59) = 0.138.  (U′ ~ χ²_11)

There is no significant evidence in this sample that the true standard deviation is not equal to 0.4.

Note: the 95% CI for σ is (0.52 √(11/21.92), 0.52 √(11/3.816)) = (0.37, 0.88).

approximate z-test (testing λ=λ0)

The procedure described above can be applied to cases where the null distribution is approximately known. For example, we can follow the routine to obtain an approximate test for the mean of a Poisson distribution. We define:

    Z = (X − λ0)/√λ0

(in which X is observed; λ0 is given.)

If H0 is true, then Z ≈ N(0, 1), provided λ0 > 10. This can then be used in the same way as a z-test. We evaluate the observed value of Z:

    z = (x − λ0)/√λ0

and compare it to the standard Normal distribution.

It is common to use this result in epidemiological studies when examining disease occurrence. If X denotes the number of cases of a rare disease in a sub-population of n individuals, then X ≈ Bi(n, p), where n is large and p is small. So, to a good approximation, X ≈ Pn(λ).

Example (Occupational health)
Many studies have looked at possible health hazards of workers in the aluminium industry. In one such study, a group of 8418 male workers aged 40–64 (either active or retired) on January 1, 1994, were followed for 10 years for various mortality outcomes. Their mortality rates were then compared with national male mortality rates in 1998. In one of the reported findings, there were 21 observed cases of bladder cancer, against an expected number of events, from general-population cancer mortality rates, of 16.1. Evaluate the statistical significance of this result. [x = 21, λ0 = 16.1]

    z = (21 − 16.1)/√16.1 = 1.22;  so p = 0.222.

This result is not significant: there is no evidence in this result to indicate that the occurrence of bladder cancer is different from the general population.
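A quick check of this calculation (Python, scipy assumed); the z and p shown here come straight from the approximate formula above:

```python
from math import sqrt
from scipy.stats import norm

x, lam0 = 21, 16.1
z = (x - lam0) / sqrt(lam0)            # (observed - expected) / sqrt(expected)
p = 2 * norm.sf(abs(z))                # two-sided p-value
print(round(z, 2), round(p, 3))        # ~1.22, p ~ 0.22: not significant
```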


This approximate z-test can be used in a wide variety of situations: whenever we have a result that says the null distribution is approximately normal.

approximate z-test (testing p=p0)

Suppose we observe a large number of independent trials and obtain X successes. To test H0: p = p0, where p denotes the probability of success, we can use

    Z = (X/n − p0)/√(p0(1−p0)/n) = (X − np0)/√(np0(1−p0))

(in which X is observed; p0 and n are given.)

If H0 is true, then Z ≈ N(0, 1), provided n is large. This too can then be used in the same way as a z-test. We evaluate the observed value of Z:

    z = (x/n − p0)/√(p0(1−p0)/n)

and compare it to the standard Normal distribution.

Example
100 independent trials resulted in 23 successes. Test the hypothesis that the probability of success is 0.3.

    p̂ = x/n = 0.23,  z = (0.23 − 0.3)/√(0.3×0.7/100) = −1.528;  p = 2 Pr(Z′ < −1.528) = 0.127.

There is no significant evidence in this result that p is not equal to 0.3.
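Finally, a sketch of the proportion test (Python, scipy assumed; prop_z_test is our own name, not a library routine):

```python
from math import sqrt
from scipy.stats import norm

def prop_z_test(x, n, p0):
    """Approximate two-sided z-test of H0: p = p0 for x successes in n trials."""
    phat = x / n
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 2 * norm.sf(abs(z))

print(prop_z_test(23, 100, 0.3))    # z ~ -1.53, p ~ 0.13: do not reject p = 0.3
```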
