Introduction to Hypothesis Testing

Author: Jasper Norris

Introduction to Hypothesis Testing: Decision Examples

                    TRUE STATE
DECISION            Innocent            Guilty
Declare innocent    correct decision    ERROR
Declare guilty      ERROR               correct decision

• How can the jury avoid
  – Convicting an innocent person?
  – Freeing a guilty person?

• Is one kind of error worse than another?
  – What does the instruction, “Innocent unless the evidence proves guilt beyond a reasonable doubt,” suggest about how our system, in theory, balances the two?

Introduction to Hypothesis Testing: Decision Examples

                             TRUE STATE
DECISION                     No Prostate Cancer    Prostate Cancer
Decide no Prostate Cancer    correct decision      ERROR
Biopsy                       ERROR                 correct decision

• The evidence comes from a painless blood test
  – A PSA above 4.0 can be caused by infection or by cancer of the prostate
  – Urologists disagree on how much above 4.0 or for how long above 4.0 the PSA should be to call for a biopsy, and on how the age of the patient should influence the decision
• Note that the disagreements are about the decision criterion and the relative costs of the two types of errors


Introduction to Hypothesis Testing
What would you do if you wanted to determine if a two-sided coin is fair?
• You’d probably flip it a bunch of times to see if about 1/2 the time it’s “heads” and 1/2 the time it’s “tails”.
• You might also set a criterion by which it would be considered unfair. For example, you might suggest that out of 12 flips, if there are 9 or more “heads” or 9 or more “tails”, the coin is unfair.
• This scenario is a simple hypothesis test. Using what is known about probabilities and sampling distributions, even more precise tests may be developed.
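To see how strict the "9 or more out of 12" rule is, one can compute how often a perfectly fair coin would be flagged as unfair. The sketch below assumes independent flips with P(heads) = 0.5; the function name `prob_at_least` is illustrative and not part of the slides.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


n_flips = 12

# Chance that a fair coin shows 9 or more heads in 12 flips.
p_heads = prob_at_least(9, n_flips)

# 9+ heads and 9+ tails cannot both happen in 12 flips, and by symmetry
# they are equally likely, so the two probabilities simply add.
p_flagged = 2 * p_heads

print(f"P(9 or more heads)          = {p_heads:.4f}")
print(f"P(9 or more heads or tails) = {p_flagged:.4f}")
```

Under these assumptions a fair coin is called unfair roughly 14.6% of the time, which is one reason a stricter criterion is usually preferred.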

Introduction to Hypothesis Testing
• as researchers, we need to decide at what point we believe the coin is unfair
• a typical guideline is to call anything within the middle 95% of the distribution fair, while the upper and lower 2.5% would be unfair

[Figure: distribution with the middle 95% of the area labeled “fair” and a critical region of area α/2 = 2.5% in each tail labeled “unfair”]

Introduction to Hypothesis Testing

Number of Heads    Probability
12                 0.00024
11                 0.0029
10                 0.0161
9                  0.0537
8                  0.1208
7                  0.1934
6                  0.2256
5                  0.1934
4                  0.1208
3                  0.0537
2                  0.0161
1                  0.0029
0                  0.00024

[Figure: bar chart of probability (p) against # heads, annotated “2 std dev”]

• using the addition rule of probability, notice that the probability of 10, 11, or 12 “heads” out of 12 is < .025 or 2.5% (0.0161 + 0.0029 + 0.00024 ≈ 0.019)
• the same is true for 0, 1, or 2
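The tail sums above can also be found by a small search: starting from the most extreme outcome and adding probabilities until the tail would exceed α/2. This is a minimal sketch for the 12-flip fair coin; the helper names `binomial_pmf` and `upper_critical_count` are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_critical_count(n: int, alpha: float = 0.05, p: float = 0.5):
    """Smallest count c with P(X >= c) <= alpha / 2, or None if there is none."""
    tail = 0.0
    cutoff = None
    for k in range(n, -1, -1):  # walk inward from the extreme outcome
        tail += binomial_pmf(k, n, p)
        if tail > alpha / 2:
            break
        cutoff = k
    return cutoff


n_flips = 12
upper = upper_critical_count(n_flips)
lower = n_flips - upper  # mirror image, valid because p = 0.5 is symmetric
upper_tail = sum(binomial_pmf(k, n_flips) for k in range(upper, n_flips + 1))

print(f"Call the coin unfair for {upper} or more heads, or {lower} or fewer")
print(f"Area in each tail: {upper_tail:.4f}")
```

With α = 0.05 the search stops at 10 heads (and, by symmetry, 2 heads), because including 9 would push the tail area above 2.5%.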

Hypothesis Testing
Definition:
• An inferential procedure that uses sample data to evaluate a hypothesis about a population
• Hypothesis testing involves a standardized set of procedures so a researcher can objectively evaluate a hypothesis
• The process starts with a research question
  – How will the population mean change after a treatment (independent variable) is administered?


Hypothesis Testing: The Steps
1. State the hypotheses: null & alternative
2. Set the criterion
3. Obtain sample data
4. Calculate the test statistic
5. Decide to reject or fail to reject the null hypothesis and interpret your decision

1. State the hypotheses
• the null hypothesis, H0, is always the hypothesis that states that there is no treatment effect, no change, no difference, etc.
• the alternative hypothesis, H1, states that there was a treatment effect, usually in terms of the independent variable, I.V., having an effect on the dependent variable, D.V.
• hypotheses are always stated in terms of populations
• remember, even though samples are used, the goal of inferential statistics is to make statements about the population of interest


1. State the hypotheses (cont.)
Null Hypothesis H0
• for example, suppose a researcher wanted to know what effect smoking marijuana has on reaction time
• knowing that the population mean on this particular reaction time instrument is 1.2 seconds, the hypothesis can be set up

H0: µ=1.2 sec (Control Group)

1. State the hypotheses (cont.) Alternative Hypothesis H1

• when the direction of the effect is not known, the alternative hypothesis will be stated in terms of inequality,

H1: µ≠1.2 sec

• there are instances, based on theory or previous research, when the alternative hypothesis is stated in terms of direction
  – for example, based on previous research, it is known that smoking marijuana increases the amount of time it takes to react

H1: µ>1.2 sec


1. State the hypotheses (cont.)
• notice in the previous example that the null hypothesis, H0, still maintains equality
• this should always be the case
• therefore,

H0: µ=1.2 sec
H1: µ>1.2 sec

2. Set the criterion
• referring back to the example of flipping the coin, setting the criterion, α, is the statistical equivalent of deciding “at what point is the coin unfair”
• as was already mentioned, the middle 95% is usually considered “fair”
• in this example, the remaining 5% would be considered error, therefore the criterion is

α=0.05

[Figure: distribution with the middle area = 95% and area = 2.5% (α/2) in each tail]

2. Set the criterion (cont.)
• The criterion, α, is also known as the Type I error rate
• It is defined as the probability of rejecting a true null hypothesis
  – that is to say, if the null hypothesis is true, there is a predetermined chance (usually 5%) that we will wrongly reject it

• errors will be discussed in detail later on

2. Set the criterion (cont.)
• The criterion delimits what is called the critical region
• The critical region is defined as the extreme scores in a distribution where the probability of obtaining them is < α when the null hypothesis is true

[Figure: Two-Tailed Test, with a critical region of area α/2 in each tail; One-Tailed Test, with a single critical region in the upper tail]

2. Set the criterion (cont.)
• as was previously mentioned, the unit normal table can be used to calculate area proportions above or below a score or scores in a distribution corresponding to a given percentage

Example
– Find the z-scores associated with the upper and lower scores when considering 95% of a normal distribution
• “upper and lower scores” → two-tailed test
• α should be divided by 2 before looking up the z-score
• α/2 = 0.05/2 = 0.025

2. Set the criterion (cont.)

[Figure: normal curve with p=.025 (α/2) shaded in each tail]

• In Appendix D: Table A look for p=0.025 in “the area beyond z”
• The z-score is 1.96. Since it’s a two-tailed test, z = +/-1.96.
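The table lookup can be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the unit normal table; this is an illustration, not the Appendix D table itself.

```python
from statistics import NormalDist

alpha = 0.05
area_beyond_z = alpha / 2  # two-tailed test: split alpha between the tails

# inv_cdf takes the area BELOW z, so convert "area beyond z" first.
z_crit = NormalDist().inv_cdf(1 - area_beyond_z)

print(f"area beyond z = {area_beyond_z:.3f}  ->  z = +/-{z_crit:.2f}")
```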


3. Obtain sample data
• After manipulating the independent variable as per your hypothesis, collect sample data
• Use descriptive statistics to see what your data look like

4. Calculate the test statistic
• one of the challenges you will face is deciding which test statistic to use
• you will learn what each one is used for as the class progresses


5. Decide to reject or fail to reject
• if the test statistic falls in the critical region, the null hypothesis is rejected
• if the test statistic does not fall in the critical region, the null hypothesis is NOT rejected

[Figure: two distributions with critical regions of area α/2 in each tail, each marked with the location of the test statistic]

Notice that no statements are made about the alternative hypothesis

Caveat:
• hypothesis testing does not “prove” anything
• this is particularly true of the alternative hypothesis
• the reason probability statements are not made about the alternative hypothesis is that there still might be other alternative hypotheses
  – comments such as “supports the theory” and “provides evidence to suggest” are common ways of describing research findings


Example: Suppose I am interested in determining whether or not review sessions have any effect on exam performance. I will administer the independent variable, a review session, to a sample of students in an attempt to determine if this has an effect on the dependent variable, exam performance. Based on information gathered in previous semesters, I know that the population mean for a given exam is 24.

Step 1: State the hypotheses
• A researcher always states two opposing hypotheses

NULL HYPOTHESIS:
– States that the treatment has no effect (there is no change, no difference, nothing happened).
– The null hypothesis is always written as H0.
Example:
– H0: µ=24 (Even with the review session, the mean exam score is 24)
– µ represents the hypothesized population mean for students having review sessions


Step 1: State the hypotheses (cont)
ALTERNATIVE HYPOTHESIS:
– Predicts that the independent variable will have an effect on the dependent variable (this is the hypothesis the researcher “roots” for)
– The alternative hypothesis is written as H1 or HA. We’ll use H1.
Example:
– H1: µ≠24
– µ represents the hypothesized population mean for students having review sessions. The true population mean for these students may be higher or lower than 24

Step 1: State the hypotheses (cont)
Hypotheses:
• H0: µ=24
• H1: µ≠24
– The task is to choose between these two hypotheses
– The null hypothesis is the hypothesis that is actually tested (we can only test one distribution at a time)
– The null hypothesis states that the mean for the review population will be 24, the same as the untreated, previous population


Step 2: Setting the criterion
• Our decision is going to be based on a comparison of our sample mean (X̄) and the hypothesized population mean (µ)
  – Small discrepancy → fail to reject null hypothesis
  – Large discrepancy → reject null hypothesis



How far away does our sample data mean need to be from the hypothesized mean in order to tell if the effect is due to our manipulation or just sampling error? The process of answering this question involves establishing an alpha level.

Step 2: Setting the criterion (cont)
ALPHA LEVEL (LEVEL OF SIGNIFICANCE):
• Alpha is symbolized as α
• An area under the curve that we use to define “very unlikely” or “very extreme” sample values
• By convention, α is usually set at .05, .01, or .001
• The alpha level is used to split the distribution into two sections:
  – Sample means that are compatible with the null hypothesis (the center of the distribution)
  – Sample means that are significantly different from the null hypothesis (the very unlikely values that fall in the tails of the distribution)

[Figure: distribution with a central region labeled “Compatible H0” and a tail of area α/2 on each side labeled “Incompatible H0”]

Step 2: Setting the criterion (cont)
• If alpha is set at α=.05, then the extreme 5% of scores in the sampling distribution would represent those “extreme” or “unlikely” sample values
• This “extreme” region of the distribution that we define with α is called the critical region
• If we set α to .05 for our example, this would mean that if our sample mean falls in the critical region, we would believe that the mean of the population of the review group is not 24, the mean of the non-review group. It is something larger or smaller, depending on which tail it falls in.

[Figure: distribution with a 2.5% critical region (α/2) in each tail]



Step 2: Setting the criterion (cont)
Directional vs. Non-directional Hypotheses (One-tailed vs. Two-Tailed)
TWO-TAILED HYPOTHESIS TEST (NON-DIRECTIONAL):
The alternative hypothesis does not specify the direction of change in the mean; all that is predicted is that some change will occur
Example: Do review sessions have any effect on exam performance?
H0: µ=24
H1: µ≠24
• Sample values that are substantially different from (either larger or smaller than) the hypothesized population mean would lead to a rejection of the null hypothesis


Step 2: Setting the criterion (cont)
Directional vs. Non-directional Hypotheses (One-tailed vs. Two-Tailed)
ONE-TAILED HYPOTHESIS TEST (DIRECTIONAL):
The alternative hypothesis specifies either an increase or a decrease in the mean due to treatment; a specific prediction about the direction of change is made
Example: Do review sessions improve exam performance?
H0: µ ≤ 24
H1: µ > 24

• Only sample values substantially larger than 24 would lead to a rejection of the null hypothesis

Step 2: Setting the criterion (cont)
Effects on Alpha:
• Due to convention, alpha is most often set at .05
• For a two-tailed test, alpha must be divided between the two tails (.025 in each tail of the distribution)
• For a one-tailed test, all of the alpha amount is found in one tail (.05)

[Figure: Two-Tailed Test with .025 (α/2) in each tail; One-Tailed Test with .05 in the upper tail]
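Because the same α is spread differently across the tails, the same test statistic can lead to different decisions under the two kinds of test. The sketch below illustrates this with a hypothetical z of 1.80 (not a value from the slides); the function `in_critical_region` is an assumed helper, and the one-tailed case is taken to be an upper-tail test.

```python
from statistics import NormalDist


def in_critical_region(z: float, alpha: float = 0.05, tails: int = 2) -> bool:
    """Return True if z falls in the critical region of a z-test."""
    std_normal = NormalDist()
    if tails == 2:
        # alpha is split: alpha / 2 in each tail
        cutoff = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z) > cutoff
    # one-tailed (upper tail assumed): all of alpha sits in one tail
    cutoff = std_normal.inv_cdf(1 - alpha)
    return z > cutoff


z_observed = 1.80  # hypothetical test statistic, for illustration only

print("two-tailed:", in_critical_region(z_observed, tails=2))
print("one-tailed:", in_critical_region(z_observed, tails=1))
```

Here the two-tailed cutoff is about 1.96 and the one-tailed cutoff about 1.645, so z = 1.80 is rejected only by the one-tailed test.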

Step 3: Obtain sample data
• In order to ensure that the researcher makes an objective decision, the data are collected after the researcher has stated the hypotheses and set the alpha level.
  – Our hypothesis is that the review session will improve test scores. Thus, we should use a one-tailed test, α = 0.05

EXAMPLE A: X̄ = 28, σ_X̄ = 2.29
EXAMPLE B: X̄ = 28, σ_X̄ = 2.67





Step 4: Calculate the test statistic

z = (X̄ − µ) / σ_X̄

EXAMPLE A: z = (28 − 24) / 2.29 = 1.75
EXAMPLE B: z = (28 − 24) / 2.67 = 1.50
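The two calculations can be checked with a few lines of code. This is a minimal sketch of the z formula using the numbers from Examples A and B; the function name `z_statistic` is illustrative.

```python
def z_statistic(sample_mean: float, mu: float, standard_error: float) -> float:
    """z = (sample mean - hypothesized population mean) / standard error."""
    return (sample_mean - mu) / standard_error


mu_null = 24  # population mean under the null hypothesis

z_a = z_statistic(28, mu_null, 2.29)  # Example A
z_b = z_statistic(28, mu_null, 2.67)  # Example B

print(f"Example A: z = {z_a:.2f}")
print(f"Example B: z = {z_b:.2f}")
```

The sample means are identical, so the difference between the two z values comes entirely from the larger standard error in Example B.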


Step 5: Evaluate the null hypothesis
• In the final step, you compare your sample data to the null hypothesis and make a decision
• There are 2 possible decisions:
  1. Reject the null hypothesis: if our sample mean is substantially different from what the null hypothesis predicts (if the sample mean falls in the critical region)
  2. Fail to reject the null hypothesis: if our sample mean is not substantially different from what the null hypothesis predicts (does not fall in the critical region)

Step 5: Evaluate the null hypothesis (cont)
1) Reject the null hypothesis:
– The sample mean provides evidence that the treatment had an effect
– Findings are considered statistically significant when the null hypothesis is rejected

EXAMPLE A
In Appendix D: Table A, look up the p value for z=1.75
• Which column should you look at, B or C?
• Is the p value less or greater than alpha?
• Did the treatment have an effect?
• Was it statistically significant?


Step 5: Evaluate the null hypothesis (cont)
2) Fail to reject the null hypothesis:
– Findings are considered statistically nonsignificant when we fail to reject the null hypothesis

EXAMPLE B
In Appendix D: Table A, look up the p value for z=1.5
• Which column should you look at, B or C?
• Is the p value less or greater than alpha?
• Did the treatment have an effect?
• Was it statistically significant?
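As a cross-check on the table lookups for Examples A and B, the upper-tail area can be computed directly. The sketch assumes the one-tailed test with α = 0.05 chosen in Step 3 and uses the standard library's `NormalDist` in place of Table A; `upper_tail_p` is a hypothetical helper name.

```python
from statistics import NormalDist


def upper_tail_p(z: float) -> float:
    """Area beyond z in the upper tail of the standard normal distribution."""
    return 1 - NormalDist().cdf(z)


alpha = 0.05
decisions = {}
for label, z in (("A", 1.75), ("B", 1.50)):
    p_value = upper_tail_p(z)
    # Reject H0 only when the tail area is smaller than alpha.
    decisions[label] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"Example {label}: z = {z:.2f}, p = {p_value:.4f} -> {decisions[label]}")
```

The area beyond z = 1.75 is just under .05, while the area beyond z = 1.50 is above it, which matches the slide headings for the two examples.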

Type I & Type II error
• the fifth step of hypothesis testing is deciding to reject or fail to reject the null hypothesis
• when this decision is made one of two things is possible: either you are right or you are wrong

                    TRUE STATE
DECISION            H0                           H1
Do not reject H0    correct decision, p = 1-α    Type II error, p = β
Reject H0           Type I error, p = α          correct decision, p = 1-β


Type I & Type II error
• The probability of a Type I error, α (alpha), is the probability of rejecting a true null hypothesis
• The probability of a Type II error, β (beta), is the probability of failing to reject a false null hypothesis
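The meaning of α as a Type I error rate can be illustrated by simulation: if the null hypothesis is true and the test is repeated many times, about α of the tests should reject it. The sketch below is a rough illustration only; the sample size of 16, the population standard deviation of 8, and the seed are made-up values, not figures from the slides.

```python
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(trials=20000, n=16, mu0=24.0, sigma=8.0,
                         alpha=0.05, seed=0):
    """Fraction of one-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail cutoff
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if z > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials


rate = simulate_type_i_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to .05, varying slightly with the seed and the number of trials.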

Type I & Type II error analogy
• consider a court case
  – H0: not guilty
  – H1: guilty

              TRUE STATE
DECISION      not guilty          guilty
not guilty    correct decision    Type II error
guilty        Type I error        correct decision

• A Type I error would occur if a jury convicted an innocent person
• A Type II error would occur if a jury let a guilty person walk
• Our justice system sets the acceptable probability of a Type I error at “beyond a reasonable doubt”, just as researchers set it at .05, .01, etc.


Type I & Type II error
• example of a Type I error: A researcher concludes that a certain drug treatment significantly decreases the possibility of heart disease when, in fact, it doesn’t.
• example of a Type II error: A researcher concludes that a certain drug does not significantly decrease overactive behavior in children when, in fact, it does.

                             TRUE STATE
DECISION                     NO decrease heart disease    decrease heart disease
NO decrease heart disease    correct decision             Type II error
decrease heart disease       Type I error                 correct decision
