Statistical Inference Summary

Robert S. Michael

Overview

“Statistical inference” is another term for a specific kind of decision making (aka, “hypothesis testing”) that is central to experimental research, but the underlying logic has broad applicability. The chain of inference in this type of decision making is very long, and for this reason it is sometimes hard to grasp. We begin by defining a hypothesis and listing the steps in testing a hypothesis.

Hypothesis & Testability

A hypothesis is a tentative explanation for something. For example, “Children’s reading readiness is related to how much their parents value reading.” Is this hypothesis testable? As attractive as it may be, if a hypothesis cannot be tested, it has little utility to empirical investigators. Hypotheses are not testable if the concepts (constructs) to which they refer are not adequately (operationally) defined. To facilitate clarity in communication, operational definitions (procedures that define variables) provide the concrete means of evaluating whether a hypothesis contains concepts that can be observed. Hypotheses are also untestable if they are circular. That is, the event or outcome becomes its own explanation. For example: “Your eight-year-old son is distractible in school because he has attention deficit hyperactivity disorder.” This statement is circular. Attention deficit hyperactivity disorder is partially defined by the inability to pay attention, so making the above statement simply says that he doesn’t pay attention because he doesn’t pay attention.

Steps: hypothesis testing

1. Develop the research statement (hypothesis). That is, state what you hope to discover. Interrogative form is common.

2. Set up the null hypothesis (H0), the assumption of no difference between treatment and comparison groups. You are assuming that the treatment has no effect. Note that this assumption is the exact opposite of what you wish to discover.

3. Construct the sampling distribution based on the assumption that H0 is true. That is, choose a particular test statistic (e.g., z, t, F, χ²) based on particular assumptions about the population, draw a sample, and collect data.

4. Compare the sample statistic to the distribution. That is, what is the likelihood of obtaining the difference (between the means of groups) that we did obtain if the null hypothesis is true?

F:\rsm\y520\y520_spr04\week_10\hypothesis_test_summary021126.fm

5. Reject or fail to reject H0, depending on the probability. If the observed difference between means is large, and hence unlikely to be due to chance variation alone, we reject the null hypothesis and conclude that the treatment (the exogenous variable) did have an effect. The difference is then said to be “statistically significant.” If the observed difference is small and plausibly due to chance, we fail to reject the null hypothesis.

More specifically . . .

Parametric tests (e.g., z, t, F, χ²) require that the data meet certain assumptions before the test is performed. Otherwise, the results are, at best, meaningless and, at worst, misleading. Assumptions for t:

• Random sampling.
• Scores in the sample are independent of one another.
• The sample comes from a normal distribution.
• The measurement scale is at least interval.
• The population standard deviation is unknown and the sample standard deviation is known.
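When these assumptions hold, the t statistic itself is simple to compute. The sketch below uses only the standard library; the score values are hypothetical and serve only to illustrate the arithmetic (note that the sample standard deviation s stands in for the unknown population standard deviation):

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n)),
    where s is the sample standard deviation (sigma is unknown)."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical interval-scale scores from one random sample
scores = [102, 98, 110, 95, 104, 99, 107, 101]
t = one_sample_t(scores, mu0=100)  # sample mean 102, so t = 2 / (s / sqrt(8))
```

The resulting t would then be compared with the t distribution with n − 1 degrees of freedom, as in Step 4 of the table above.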

Hypothesis testing begins with the drawing of a sample and calculating its characteristics (aka, “statistics”). A statistical test (a specific form of a hypothesis test) is an inferential process, based on probability, that is used to draw conclusions about population parameters. One way to clarify the process of hypothesis testing is to imagine that you have, first of all, a population to which no treatment has been applied (aka, the “comparison group”). You know the parameters of this population (for example, the mean and standard deviation). Another population exists that is the same as the first, except that some treatment has been applied (aka, the treatment or experimental group). You do not know the parameters of this population. Samples are drawn from this latter population, and the statistics derived from the sample serve as estimates of the unknown population parameters. This is the situation in which hypothesis testing applies, and it provides an introduction to the more complicated versions of hypothesis testing that you will encounter later.

The logic of hypothesis testing can be stated in three steps:

1. A hypothesis concerning a population is stated.
2. A sample is selected from the population.
3. The sample data are used to determine whether the hypothesis can reasonably be supported or not.

Ultimately, the conclusion drawn is about the population, not just the sample. None of this is necessary if the entire population is small and accessible, but this is almost never the case.
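This two-population setup can be played out in a short simulation. All numbers here are hypothetical: the simulation plays the role of “nature” and knows the treatment effect, while the analyst sees only the sample and uses its mean to estimate the unknown treated-population mean:

```python
import random
from statistics import mean

random.seed(0)  # reproducible draws

# Known, untreated population (hypothetical parameters): mean 100, SD 15
MU, SIGMA = 100, 15
TREATMENT_SHIFT = 8  # unknown to the analyst; only the simulation knows it

# Draw a random sample from the treated population
n = 400
sample = [random.gauss(MU + TREATMENT_SHIFT, SIGMA) for _ in range(n)]

# The sample statistic serves as the estimate of the unknown parameter
estimate = mean(sample)
```

With a sample this size, the estimate lands close to the true treated mean of 108, which is exactly the sense in which sample statistics estimate unknown population parameters.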

Step 1: H0

Step 1 breaks down into a series of formal stages. The first stage is to state the null hypothesis, which is usually the hypothesis of no difference. The null hypothesis states that the treatment has no effect or, stated differently, that there is no difference between treated and untreated populations (e.g., µ1 − µ2 = 0). That is, the independent variable or treatment will have no effect on the dependent variable. The null hypothesis is represented by the symbol H0.


Examples of several forms that the null hypothesis can take are contained in the following table:

H0                    Explanation
A. µ = 100            Population mean is 100.
B. µ = 0              Population mean is 0.
C. σ = 15             Standard deviation in the population is 15.
D. µ1 − µ2 = 0        Means of populations 1 and 2 are equal (no difference in the parameters µ1 and µ2).
E. σ1² − σ2² = 0      Variance in population 1 is equal to variance in population 2.
F. ρXY = 0            Correlation coefficient between X and Y in the population is 0.
G. ρ1 − ρ2 = 0        The difference between ρXY in population 1 and ρXY in population 2 is 0.
H. µ1 = µ2 = µ3       The means in populations 1, 2, and 3 are equal.
I. p = .5             The proportion in the population is .5.

Step 1: HA

The second stage is to state the alternative hypothesis. It proposes the opposite of the null hypothesis: it says that there will be an effect of treatment, that there will be differences between populations, or that the independent variable or treatment does indeed affect the dependent variable. The symbol for the alternative hypothesis is either HA or H1. Most often HA is non-directional; it just says there will be a difference, without saying in which direction. Sometimes a directional hypothesis is used. This will be discussed a little later.

Step 2: Sampling

Step 2 requires that a suitable sample is selected from the population. In order to adequately represent the population, the sample must be random. See sections 9.4 through 9.9 of Hopkins, Hopkins, & Glass (1996) if you have problems with this concept.

Step 3: Statistical Test

In Step 3 the data from the sample are compared with the statement of the null hypothesis. For example, the sample mean (representing the mean of the unknown population) is compared with the known population mean, and the decision is made whether or not to reject the null hypothesis. See the discussion in section 10.3 in HH&G for the reasoning behind speaking only of rejecting, or failing to reject, the null hypothesis. If we reject the null hypothesis, we accept the alternative hypothesis and conclude that there is a treatment effect or a difference between populations, that is, that the independent variable or treatment has affected the dependent variable. These steps are restated in the following table:

1. State the statistical hypothesis H0 to be tested (e.g., H0: µ = 100).

2. Set the level of statistical significance (alpha level). That is, specify the degree of risk of a Type I error: the risk of incorrectly concluding that H0 is false when in fact it is true. This risk, stated as a probability, is denoted by α (alpha) (e.g., α = .05).

3. Assuming H0 to be correct, determine the probability (p) of obtaining a sample mean (X) that differs from the population mean (µ) by an amount as large as or larger than that which was observed (e.g., if µ = 100 and X = 108, calculate the probability of observing a difference between X and µ of 8 or more points).

4. Make a decision regarding H0: whether or not to reject it (e.g., if the probability (p) from Step 3 is less than alpha (Step 2), we reject the null hypothesis and conclude that µ does not equal 100).
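These four steps can be sketched in code. The µ = 100 and sample mean of 108 come from the example above; the population standard deviation (σ = 15) and sample size (n = 25) are assumed here purely for illustration, since the example does not state them:

```python
import math

def z_test_two_tailed(xbar, mu0, sigma, n, alpha=0.05):
    """Steps 2-4: compare a sample mean with a known-sigma population mean.

    Returns (z, p, reject), where p is the two-tailed probability of a
    difference at least as large as the one observed, given H0.
    """
    se = sigma / math.sqrt(n)                          # standard error of the mean
    z = (xbar - mu0) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    p = 2 * (1 - phi)                                  # both tails (non-directional HA)
    return z, p, p < alpha                             # Step 4: reject H0 if p < alpha

# Step 3's example: mu = 100, observed sample mean = 108
z, p, reject = z_test_two_tailed(xbar=108, mu0=100, sigma=15, n=25)
```

Under these assumed values, p falls well below α = .05, so the decision in Step 4 is to reject H0 and conclude that µ does not equal 100.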

Whenever we make a decision about rejecting or failing to reject the null hypothesis, two types of error may occur:

• We reject the null hypothesis when we should not, because in reality the null hypothesis is true. This is known as a Type I, or alpha, error.
• We accept the null hypothesis when we should not, because in reality the null hypothesis is false. This is known as a Type II, or beta, error.

The four outcomes of decision making are illustrated in the following table:

                  The null hypothesis is, in reality:
Our decision      True                       False
Accept H0         Correct                    Error: Type II (beta)
Reject H0         Error: Type I (alpha)      Correct

Most experimenters hope to reject the null hypothesis and thereby claim that their experimental treatment has had an effect. However, because false claims of treatment effects (Type I errors) are scientifically serious, it is necessary to set stringent criteria. We can never be absolutely certain that we have correctly rejected, or failed to reject, the null hypothesis, but we can determine the probability of making an error in this process.

You may recall from a previous lecture and the textbook readings that the probability of obtaining a particular sample mean from a population can be determined using z-scores. Sample means very close to the population mean are highly likely. Sample means distant from the population mean (in the tails of the distribution) are very unlikely, but they do occur. If the null hypothesis is true and our treatment has no effect, we would expect the sample we draw to have a mean close to that of the population. Sample means in the tails are not very likely if the null hypothesis is true; such means indicate that we should reject the null hypothesis (see Figure 10.2, p. 175, HH&G). A boundary, or decision line, therefore has to be drawn between those sample means that are expected given the null hypothesis and those that are so unlikely that they lead to rejection of the null hypothesis. The boundary that separates these sample means is called the level of significance, or alpha level. It is a probability value beyond which obtained sample means are very unlikely to occur if the null hypothesis is true. The value .05 is commonly used as the alpha level. It represents the proportion of the area in the tails of the distribution where sample means are sufficiently unlikely if the null hypothesis is true. The alpha level also tells us the probability of producing a Type I error. An example makes this whole process clearer; a good one is provided on pp. 174-175 of HH&G. Although .05 is the most commonly accepted alpha level in psychological and educational research, more stringent levels, such as .01 and .001, may be used when the consequence of making a Type I error is serious. A general statement of the z-score statistic is provided on p. 175 of HH&G:

    z = (sample mean − hypothesized population mean) / (standard error of the sampling distribution)

A more general form, which you will find applicable to a large range of statistics that you will learn about in the future, is:

    test statistic = (sample statistic − hypothesized population parameter) / (standard error of the distribution of the test statistic)

This may be restated in less statistical terms as:

    test statistic = (obtained difference) / (difference expected by chance)

Do remember that, when testing the null hypothesis, you can reject it when the difference between your sample data and that expected according to the null hypothesis is large enough. However, if a small difference is obtained, you should not accept the null hypothesis. Instead, according to the logic involved in this process, you are only entitled to say that you fail to reject the null hypothesis. When we reject the null hypothesis, we are saying that the difference we obtained (between the sample statistic and the hypothesized population parameter) is sufficiently unlikely to have occurred by chance alone. We are entitled to say that our treatment has had an effect. But there is always a small chance that we are wrong; in this case we have made a Type I error. The probability that we are wrong is equal to the significance level.

When findings are stated in a research report, the null and alternative hypotheses are normally not mentioned. Instead, the term statistical significance is used. If the null hypothesis has been rejected, the findings are said to be “statistically significant”; if the null hypothesis was not rejected, the findings are not statistically significant. It is necessary to state whether or not you obtained statistical significance, to include the value of your sample statistic, and to say whether the probability of obtaining that statistic is greater or smaller than the alpha level you have chosen. These values are often included in brackets for linguistic simplicity. There is an endless variety of ways in which you can say the same thing, but you must take care with the wording of statements about significance and non-significance.


One-Tailed Hypothesis Tests

The null hypothesis always says that there is no treatment effect. The alternative hypothesis says that there is a treatment effect (or, in other words, a difference between the sample data and that expected according to the null hypothesis). Such a statement does not predict the direction of the difference created by the treatment. It is said to be a two-tailed hypothesis because highly unlikely events in either tail of the distribution will lead to rejection of the null hypothesis. Although they are used less often, alternative hypotheses may be one-tailed, or directional. In this case, the researcher is predicting either an increase or a decrease as a result of treatment, but not both, and the null hypothesis is rejected only if the sample data fall in the predicted tail of the distribution. The critical region still represents the same area of the curve (e.g., .05), but the whole area is located in one tail (see section 10.10, p. 181, HH&G). Two-tailed tests are said to be more conservative because the difference between the sample data and that expected according to the null hypothesis (that is, the treatment effect) must be larger in a two-tailed test than in a one-tailed test to achieve the same level of statistical significance, and thus to reject the null hypothesis. Even when a particular direction of treatment effect can be predicted, a two-tailed test is still often used.
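The conservatism of a two-tailed test is easy to see numerically: for the same observed test statistic, the two-tailed p is twice the one-tailed p. The z value below is hypothetical, chosen to land between the two decision boundaries:

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution
z = 1.80             # hypothetical observed test statistic

p_one_tailed = 1 - norm.cdf(z)        # area in the single predicted tail
p_two_tailed = 2 * (1 - norm.cdf(z))  # same area, split between both tails
```

Here p_one_tailed is below .05 while p_two_tailed is above it, so this result would be declared significant by a one-tailed test but not by a two-tailed test at α = .05.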

Statistical Power

The goal of hypothesis testing is usually to correctly reject the null hypothesis, or in other words, to show that the treatment applied has had an effect. The probability of correctly rejecting the null hypothesis is called the power of a statistical test. Power is calculated as 1 − β, where β is the probability of making a Type II error (failing to reject the null hypothesis when it is false). Statistical power is large when the treatment effect is large. Put another way, you are more likely to correctly reject the null hypothesis when the treatment has created a large difference between your sample data and the original population. Other factors that influence power, and that are more directly controllable than the size of the treatment effect, are:

• The alpha level chosen. Smaller α levels produce smaller values for power.
• Whether a one-tailed or two-tailed test is used. Statistical power is greater for one-tailed tests.
• Sample size. Larger samples provide greater power.
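The effect of sample size on power can be computed directly for a one-tailed z test. The sketch below assumes a known σ, and all of the numeric values (µ0 = 100, a true treated mean of 108, σ = 15, n = 25 vs. 49) are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

def power_one_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean is mu1) = 1 - beta,
    for a one-tailed z test with known sigma."""
    se = sigma / sqrt(n)
    z_crit = norm.inv_cdf(1 - alpha)  # decision boundary under H0
    # Probability the test statistic falls beyond the boundary when mu = mu1
    return 1 - norm.cdf(z_crit - (mu1 - mu0) / se)

small = power_one_tailed_z(mu0=100, mu1=108, sigma=15, n=25)
large = power_one_tailed_z(mu0=100, mu1=108, sigma=15, n=49)  # larger sample
```

With the larger sample, the standard error shrinks, the same 8-point effect sits further beyond the decision boundary, and power increases accordingly.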

