
Appendix B

Statistics in Psychological Research

Understanding and interpreting the results of psychological research depends on statistical analyses, which are methods for describing and drawing conclusions from data. The chapter on research in psychology introduced some terms and concepts associated with descriptive statistics (the numbers that psychologists use to describe and present their data) and with inferential statistics (the mathematical procedures used to draw conclusions from data and to make inferences about what they mean). Here, we present more details about these statistical analyses that will help you to evaluate research results.

Describing Data

To illustrate our discussion, consider a hypothetical experiment on the effects of incentives on performance. The experimenter presents a list of mathematics problems to two groups of participants. Each group must solve the problems within a fixed time, but for each correct answer, the low-incentive group is paid ten cents, whereas the high-incentive group gets one dollar. The hypothesis to be tested is the null hypothesis, the assertion that the independent variable manipulated by the experimenter will have no effect on the dependent variable measured by the experimenter. In this case, the null hypothesis is that the size of the incentive (the independent variable) will not affect performance on the mathematics task (the dependent variable). Assume that the experimenter has gathered a representative sample of participants, assigned them randomly to the two groups, and done everything possible to avoid the confounds and other research problems discussed in the chapter on research in psychology. The experiment has been run, and the psychologist now has the data: a list of the number of correct answers given by each participant in each group. Now comes the first task of statistical analysis: describing the data in a way that makes them easy to understand.

The Frequency Histogram

null hypothesis: The assertion that the independent variable manipulated by the experimenter will have no effect on the dependent variable measured by the experimenter.

frequency histogram: A graphic presentation of data that consists of a set of bars, each of which represents how frequently different scores or values occur in a data set.

descriptive statistics: Numbers that summarize a set of research data.


The simplest way to describe the data is to draw up something like Table 1, in which all the numbers are simply listed. After examining the table, you might notice that the high-incentive group seems to have done better than the low-incentive group, but this is not immediately obvious. The difference might be even harder to see if more participants had been involved and if the scores included three-digit numbers. A picture is worth a thousand words, so a better way of presenting the same data is in a picture-like graphic known as a frequency histogram (see Figure 1). Construction of a histogram is simple. First, divide the scale for measuring the dependent variable (in this case, the number of correct answers) into a number of categories, or “bins.” The bins in our example are 1–2, 3–4, 5–6, 7–8, and 9–10. Next, sort the raw data into the appropriate bin. (For example, the score of a participant who had 5 correct answers would go into the 5–6 bin, a score of 8 would go into the 7–8 bin, and so on.) Finally, for each bin, count the number of scores in that bin and draw a bar up to the height of that number on the vertical axis of a graph. The resulting set of bars makes up the frequency histogram.
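To make the construction procedure concrete, here is a minimal Python sketch (a sketch only, not part of the original text) that sorts the two groups' scores from Table 1 into the five bins used in Figure 1 and prints each bin's count, which is the height of the corresponding bar:

```python
# Test scores from Table 1.
low_incentive = [4, 6, 2, 7, 6, 8, 3, 5, 2, 3, 5, 9, 5]
high_incentive = [6, 4, 10, 10, 7, 10, 6, 7, 5, 9, 9, 3, 8]

# The bins used in Figure 1: 1-2, 3-4, 5-6, 7-8, 9-10.
bins = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

def histogram(scores):
    """Count how many scores fall into each bin."""
    return [sum(low <= s <= high for s in scores) for (low, high) in bins]

for label, scores in [("Low incentive", low_incentive),
                      ("High incentive", high_incentive)]:
    print(label)
    for (low, high), count in zip(bins, histogram(scores)):
        print(f"  {low}-{high}: {'*' * count}  ({count})")
```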

TABLE 1: A Simple Data Set

Here are the test scores obtained by thirteen participants performing under low-incentive conditions and thirteen participants performing under high-incentive conditions.

Low incentive:  4 6 2 7 6 8 3 5 2 3 5 9 5
High incentive: 6 4 10 10 7 10 6 7 5 9 9 3 8

Because we are interested in comparing the scores of two groups, there are separate histograms in Figure 1: one for the high-incentive group and one for the low-incentive group. Now the difference between groups that was difficult to see in Table 1 becomes clearly visible: High scores were more common among people in the high-incentive group than among people in the low-incentive group. Histograms and other pictures of data are useful for visualizing and better understanding the “shape” of research results, but in order to analyze those results statistically, we need to use other ways of handling the data that make up these graphic presentations. For example, before we can tell whether two histograms are different statistically or just visually, the data they represent must be summarized using descriptive statistics.

FIGURE 1: Frequency Histograms

[Two frequency histograms, one for the low-incentive group and one for the high-incentive group. The horizontal axis of each shows the test score categories (1–2, 3–4, 5–6, 7–8, 9–10); the vertical axis shows the number of cases in each category.] The height of each bar of a histogram represents the number of scores falling within each range of score values. The pattern formed by these bars gives a visual image of how research results are distributed.

Descriptive Statistics

The four basic categories of descriptive statistics (1) measure the number of observations made; (2) summarize the typical value of a set of data; (3) summarize the spread, or variability, in a set of data; and (4) express the correlation between two sets of data.

N

The easiest statistic to compute, abbreviated as N, simply describes the number of observations that make up the data set. In Table 1, for example, N = 13 for each group, or 26 for the entire data set. Simple as it is, N plays a very important role in more sophisticated statistical analyses.

Measures of Central Tendency

It is apparent in the histograms in Figure 1 that there is a difference in the pattern of scores between the two groups. But how much of a difference? What is the typical value, the central tendency, that represents each group’s performance? As described in the chapter on research in psychology, there are three measures that capture this typical value: the mode, the median, and the mean. Recall that the mode is the value or score that occurs most frequently in the data set. The median is the halfway point in a set of data: Half the scores fall above the median, half fall below it. The mean is the arithmetic average. To find the mean, add up the values of all the scores and divide that total by the number of scores.

Measures of Variability

The variability, or spread, or dispersion of a set of data is often just as important as its central tendency. This variability can be quantified by measures known as the range and the standard deviation.
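As a quick illustration, here is a short Python sketch that computes all three measures of central tendency, plus the range, for the Table 1 scores (the standard deviation is worked out in Table 2 below):

```python
import statistics

low_incentive = [4, 6, 2, 7, 6, 8, 3, 5, 2, 3, 5, 9, 5]
high_incentive = [6, 4, 10, 10, 7, 10, 6, 7, 5, 9, 9, 3, 8]

for label, scores in [("Low incentive", low_incentive),
                      ("High incentive", high_incentive)]:
    print(label)
    print("  mode  :", statistics.mode(scores))           # most frequent score
    print("  median:", statistics.median(scores))         # halfway point
    print("  mean  :", round(statistics.mean(scores), 2)) # 5.0 and 7.23, as in the text
    print("  range :", max(scores) - min(scores))         # 7 for both groups
```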

TABLE 2: Calculating the Standard Deviation

The standard deviation of a set of scores reflects the average degree to which those scores differ from the mean of the set.

Raw Data    Difference from Mean = D    D²
2           2 − 4 = −2                   4
2           2 − 4 = −2                   4
3           3 − 4 = −1                   1
4           4 − 4 =  0                   0
9           9 − 4 =  5                  25

Mean = 20/5 = 4                    ∑D² = 34

$$ \text{Standard deviation} = \sqrt{\frac{\sum D^2}{N}} = \sqrt{\frac{34}{5}} = \sqrt{6.8} = 2.6 $$

Note: ∑ means “the sum of.”

As described in the chapter on research in psychology, the range is simply the difference between the highest and the lowest values in a data set. For the data in Table 1, the range for the low-incentive group is 9 − 2 = 7; for the high-incentive group, the range is 10 − 3 = 7. The standard deviation, or SD, measures the average difference between each score and the mean of the data set. To see how the standard deviation is calculated, consider the data in Table 2. The first step is to compute the mean of the set—in this case, 20/5 = 4. Second, calculate the difference, or deviation (D), of each score from the mean by subtracting the mean from each score, as in column 2 of Table 2. Third, find the average of these deviations. Notice, though, that if you calculated this average by finding the arithmetic mean, you would sum the deviations and find that the negative deviations exactly balance the positive ones, resulting in a mean difference of 0. Obviously there is more than zero variation around the mean in the data set. So, instead of employing the arithmetic mean, you compute the standard deviation by first squaring the deviations (which, as shown in column 3 of Table 2, removes any negative values). You then add up these squared deviations, divide the total by N, and then take the square root of the result. These simple steps are outlined in more detail in Table 2.
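The same steps translate directly into code. Here is a minimal Python sketch of the Table 2 calculation; note that, following the text’s definition, it divides by N, whereas many statistics libraries divide by N − 1 by default when estimating a population’s standard deviation from a sample:

```python
import math

def standard_deviation(scores):
    """Population SD as defined in Table 2: sqrt(sum of squared deviations / N)."""
    mean = sum(scores) / len(scores)                       # step 1: the mean (4)
    squared_deviations = [(s - mean) ** 2 for s in scores] # steps 2-3: D, then D**2
    return math.sqrt(sum(squared_deviations) / len(scores))

print(round(standard_deviation([2, 2, 3, 4, 9]), 1))  # 2.6, as in Table 2
```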

range: A measure of variability that is the difference between the highest and the lowest values in a data set.

standard deviation (SD): A measure of variability that is the average difference between each score and the mean of the data set.

normal distribution: A dispersion of scores such that the mean, median, and mode all have the same value. When a distribution has this property, the standard deviation can be used to describe how any particular score stands in relation to the rest of the distribution.

The Normal Distribution

Now that we have described histograms and reviewed some descriptive statistics, let’s reexamine how these methods of representing research data relate to some of the concepts discussed elsewhere in the book. In most subfields in psychology, when researchers collect many measurements and plot their data in histograms, the resulting pattern often resembles the one shown for the low-incentive group in Figure 1. That is, the majority of scores tend to fall in the middle of the distribution, with fewer and fewer scores occurring as one moves toward the extremes. As more and more data are collected, and as smaller and smaller bins are used (perhaps containing only one value each), histograms tend to smooth out until they resemble the bell-shaped curve known as the normal distribution, or normal curve. When a distribution of scores follows a truly normal curve, its mean, median, and mode all have the same value. Furthermore, if the curve is normal, we can use its standard deviation to describe how any particular score stands in relation to the rest of the distribution. IQ scores provide an example. They are distributed in a normal curve, with a mean, median, and mode of 100 and an SD of 16, as shown in Figure 2.

FIGURE 2: The Normal Distribution

[A bell-shaped curve. The horizontal axis is marked in standard deviations (−2, −1, 0, +1, +2) and in the corresponding IQ scores (68, 84, 100, 116, 132); 68 percent of the scores lie within 1 standard deviation of the mean, and 95 percent lie within 2 standard deviations.] Many kinds of research data approximate the balanced, or symmetrical, shape of the normal curve, in which most scores fall toward the center of the range.

In such a distribution, half of the population will have an IQ above 100, and half will be below 100. The shape of the true normal curve is such that 68 percent of the area under it lies in a range within one standard deviation above and below the mean. In terms of IQ, this means that 68 percent of the population has an IQ somewhere between 84 (100 minus 16) and 116 (100 plus 16). Of the remaining 32 percent of the population, half falls more than 1 SD above the mean, and half falls more than 1 SD below the mean. Thus, 16 percent of the population has an IQ above 116, and 16 percent scores below 84.

The normal curve is also the basis for percentiles. A percentile score indicates the percentage of people or observations that fall below a given score in a normal distribution. In Figure 2, for example, the mean score (which is also the median) lies at a point below which 50 percent of the scores fall. Thus the mean of a normal distribution is at the 50th percentile. What does this say about IQ? If you score 1 SD above the mean, your score is at a point above which only 16 percent of the population falls. This means that 84 percent of the population (100 percent minus 16 percent) must be below that score; so this IQ score is at the 84th percentile. A score at 2 SDs above the mean is at the 97.5th percentile, because only 2.5 percent of the scores are above it in a normal distribution. Scores may also be expressed in terms of their distance in standard deviations from the mean, producing what are called standard scores. A standard score of 1.5, for example, is 1.5 standard deviations from the mean.
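These relationships are easy to verify with a normal distribution model. Here is a short sketch using Python’s statistics.NormalDist (available in Python 3.8 and later) with the IQ parameters given in the text, a mean of 100 and an SD of 16; note that the exact percentile for 2 SDs is 97.7, which the text rounds to 97.5 via the 95-percent figure:

```python
from statistics import NormalDist

# IQ distribution described in the text: mean 100, SD 16.
iq = NormalDist(mu=100, sigma=16)

def standard_score(score):
    """Distance from the mean, in standard deviations (a z score)."""
    return (score - iq.mean) / iq.stdev

def percentile(score):
    """Percentage of the population falling below this score."""
    return 100 * iq.cdf(score)

print(standard_score(116))          # 1.0 (1 SD above the mean)
print(round(percentile(116)))       # 84 (the 84th percentile)
print(round(percentile(132), 1))    # 97.7 (the text rounds this to 97.5)
```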

percentile score: A value that indicates the percentage of people or observations that fall below a given point in a normal distribution.

standard score: A value that indicates the distance, in standard deviations, between a given score and the mean of all the scores in a data set.

Correlation

Histograms and measures of central tendency and variability describe certain characteristics of one dependent variable at a time. However, psychologists are often interested in describing the relationship between two variables. Measures of correlation are frequently used for this purpose. We discussed the interpretation of the correlation coefficient in the chapter on research in psychology; here we describe how to calculate it. Recall that correlations are based on the relationship between two numbers that are associated with each participant or observation. The numbers might represent, say, a person’s height and weight or the IQ scores of a parent and child. Table 3 contains this kind of data for four participants from our incentives study who took the test twice. (As you may recall from the chapter on cognitive abilities, the correlation between their scores would be a measure of test-retest reliability.)

The formula for computing the Pearson product-moment correlation, or r, is as follows:

$$ r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \, \sum (y - M_y)^2}} $$

where:

x = each score on variable 1 (in this case, test 1)
y = each score on variable 2 (in this case, test 2)
Mx = the mean of the scores on variable 1
My = the mean of the scores on variable 2

The main function of the denominator (bottom part) in this formula is to ensure that the coefficient ranges from −1.00 to +1.00, no matter how large or small the values of the variables being correlated. The “action element” of this formula is the numerator (or top part). It is the result of multiplying the amounts by which each of two observations (x and y) differ from the means of their respective distributions (Mx and My). Notice that, if the two variables “go together” (so that, if one score is large, the score it is paired with is also large, and if one is small, the other is also small), then both scores in each pair will tend to be above the mean of their distribution or both of them will tend to be below the mean of their distribution. When this is the case, x − Mx and y − My will both be positive, or they will both be negative. In either case, when you multiply one of them by the other, their product will always be positive, and the correlation coefficient will also be positive. If, on the other hand, the two variables go opposite to one another, such that, when one score in a pair is large, the other is small, one of them is likely to be smaller than the mean of its distribution, so that either x − Mx or y − My will have a negative sign, and the other will have a positive sign. Multiplying these differences together will always result in a product with a negative sign, and r will be negative as well.

Now compute the correlation coefficient for the data presented in Table 3. The first step (step a in the table) is to compute the mean (M) for each variable. Mx turns out to be 3 and My is 4. Next, calculate the numerator by finding the differences between each x and y value and its respective mean and by multiplying them (as in step b of Table 3). Notice that, in this example, the differences in each pair have like signs, so the correlation coefficient will be positive. The next step is to calculate the terms in the denominator; in this case, as shown in steps c and d in Table 3, they have values of 18 and 4.

TABLE 3: Calculating the Correlation Coefficient

Though it appears complex, calculation of the correlation coefficient is quite simple. The resulting r reflects the degree to which two sets of scores tend to be related, or to co-vary.

Participant    Test 1 (x)    Test 2 (y)    (x − Mx)(y − My)  (b)
A              1             3             (1 − 3)(3 − 4) = (−2)(−1) = 2
B              1             3             (1 − 3)(3 − 4) = (−2)(−1) = 2
C              4             5             (4 − 3)(5 − 4) = (1)(1) = 1
D              6             5             (6 − 3)(5 − 4) = (3)(1) = 3

(a) Mx = 3, My = 4                         ∑(x − Mx)(y − My) = 8
(c) ∑(x − Mx)² = 4 + 4 + 1 + 9 = 18
(d) ∑(y − My)² = 1 + 1 + 1 + 1 = 4

(e) $$ r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \, \sum (y - M_y)^2}} = \frac{8}{\sqrt{18 \times 4}} = \frac{8}{\sqrt{72}} = \frac{8}{8.49} = +.94 $$

Finally, place all the terms in the formula and carry out the arithmetic (step e). The result in this case is an r of +.94, a high and positive correlation suggesting that performances on repeated tests are very closely related. A participant doing well the first time is very likely to do well again; a person doing poorly at first will probably do no better the second time.
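The formula also translates directly into a few lines of code. Here is a minimal Python sketch that reproduces the Table 3 computation:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - mx) ** 2 for x in xs)
                            * sum((y - my) ** 2 for y in ys))
    return numerator / denominator

test1 = [1, 1, 4, 6]  # participants A-D, first test
test2 = [3, 3, 5, 5]  # participants A-D, second test
print(round(pearson_r(test1, test2), 2))  # 0.94, matching step (e) in Table 3
```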

Inferential Statistics

The descriptive statistics from the incentives experiment tell the experimenter that the performances of the high- and low-incentive groups differ. But there is some uncertainty. Is the difference large enough to be important? Does it represent a stable effect or a fluke? The researcher would like to have some measure of confidence that the difference between groups is genuine and reflects the effect of incentives on mental tasks in the real world, rather than the effect of random or uncontrolled factors. One way of determining confidence would be to run the experiment again with a new group of participants. Confidence that incentives produced differences in performance would grow stronger if the same or a larger between-group difference occurs again. In reality, psychologists rarely have the opportunity to repeat, or replicate, their experiments in exactly the same way three or four times. But inferential statistics provide a measure of how likely it is that results came about by chance. They put a precise mathematical value on the confidence or probability that rerunning the same experiment would yield similar (or even stronger) results.

Differences Between Means: The t Test

inferential statistics: A set of procedures that provides a measure of how likely it is that research results came about by chance.

One of the most important tools of inferential statistics is the t test. It allows the researcher to ask how likely it is that the difference between two means occurred by chance rather than as a function of the effect of the independent variable. When the t test or other inferential statistic says that the probability of chance effects is small enough (usually less than 5 percent), the results are said to be statistically significant. Conducting a t test of statistical significance requires the use of three descriptive statistics. The first component of the t test is the size of the observed effect, the difference between the means. Recall that the mean is calculated by summing a group’s scores and dividing that total by the number of scores. In the example shown in Table 1, the mean of the high-incentive group is 94/13, or 7.23, and the mean of the low-incentive group is 65/13, or 5. So the difference between the means of the high- and low-incentive groups is 7.23 − 5 = 2.23. Second, we have to know the standard deviation of scores in each group. If the scores in a group are quite variable, the standard deviation will be large, indicating that chance may have played a large role in producing the results. The next replication of the study might generate a very different set of group scores. If the scores in a group are all very similar, however, the standard deviation will be small, which suggests that the same result would probably occur for that group if the study were repeated. In other words, the difference between groups is more likely to be significant when each group’s standard deviation is small. If variability is high enough that the scores of two groups overlap, the mean difference, though large, may not be statistically significant. (In Table 1, for example, some people in the low-incentive group actually did better on the math test than some in the high-incentive group.) Third, we need to take the sample size, N, into account. The larger the number of participants or observations, the more likely it is that an observed difference between means is significant. This is so because, with larger samples, random factors within a group—the unusual performance of a few people who were sleepy or anxious or hostile, for example—are more likely to be canceled out by the majority, who better represent people in general. The same effect of sample size can be seen in coin tossing. If you toss a quarter five times, you might not be too surprised if


heads comes up 80 percent of the time. If you get 80 percent heads after one hundred tosses, however, you might begin to suspect that this is probably not due to chance alone and that some other effect, perhaps some bias in the coin, is significant in producing the results. (For the same reason, even a relatively small correlation coefficient, between diet and grades, say, might be statistically significant if it was based on 50,000 students. As the number of participants increases, it becomes less likely that the correlation reflects the influence of a few oddball cases.) To summarize, as the differences between the means get larger, as N increases, and as standard deviations get smaller, t increases. This increase in t raises the researcher’s confidence in the significance of the difference between means. Let’s now calculate the t statistic and see how it is interpreted. The formula for t is:

$$ t = \frac{M_1 - M_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} $$

where:

M1 = mean of group 1
M2 = mean of group 2
N1 = number of scores or observations for group 1
N2 = number of scores or observations for group 2
S1 = standard deviation of group 1 scores
S2 = standard deviation of group 2 scores

Despite appearances, this formula is quite simple. In the numerator is the difference between the two group means; t will get larger as this difference gets larger. The denominator contains an estimate of the standard deviation of the differences between group means; in other words, it suggests how much the difference between group means would vary if the experiment were repeated many times. Because this estimate is in the denominator, the value of t will get smaller as the standard deviation of group differences gets larger. For the data in Table 1,

$$ t = \frac{M_1 - M_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} = \frac{7.23 - 5}{\sqrt{\dfrac{(12)(5.09) + (12)(4.46)}{24}\left(\dfrac{26}{169}\right)}} = \frac{2.23}{\sqrt{.735}} = 2.60 \text{ with } 24\ df $$
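Here is a minimal Python sketch that reproduces this result from the raw Table 1 scores. Following the text, each group’s variance (S²) is computed with an N divisor; most statistics libraries would use N − 1 here by default:

```python
import math

def t_statistic(group1, group2):
    """Pooled two-sample t, using the N-divisor variances defined in the text."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((s - m1) ** 2 for s in group1) / n1  # about 5.1 (5.09 in the text)
    var2 = sum((s - m2) ** 2 for s in group2) / n2  # about 4.46, as in the text
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (n1 + n2) / (n1 * n2))

high = [6, 4, 10, 10, 7, 10, 6, 7, 5, 9, 9, 3, 8]
low = [4, 6, 2, 7, 6, 8, 3, 5, 2, 3, 5, 9, 5]
print(round(t_statistic(high, low), 2))  # 2.60, with df = 13 + 13 - 2 = 24
```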

degrees of freedom (df): The total sample size or number of scores in a data set, less the number of experimental groups.

To determine what a particular t means, we must use the value of N and a special statistical table called, appropriately enough, the t table. We have reproduced part of the t table in Table 4. First, we have to find the computed values of t in the row corresponding to the degrees of freedom, or df, associated with the experiment. In this case, degrees of freedom are simply N1 + N2 − 2 (or two less than the total sample size or number of scores). Because our experiment had 13 participants per group, df = 13 + 13 − 2 = 24. In the row for 24 df in Table 4, you will find increasing values of t in each column. These columns correspond to decreasing p values, the probabilities that the difference between means occurred by chance. If an obtained t value is equal to or larger than one of the values in the t table (on the correct df line), then the difference between means that generated that t is said to be significant at the .10, .05, or .01 level of probability.

TABLE 4: The t Table

This table allows the researcher to determine whether an obtained t value is statistically significant. If the t value is larger than the one in the appropriate row in the .05 column, the difference between means that generated that t score is usually considered statistically significant.

        p Value
df      .10 (10%)    .05 (5%)    .01 (1%)
4       1.53         2.13        3.75
9       1.38         1.83        2.82
14      1.34         1.76        2.62
19      1.33         1.73        2.54
22      1.32         1.71        2.50
24      1.32         1.71        2.49

Suppose, for example, that an obtained t (with 19 df) was 2.00. Looking along the 19 df row, you find that 2.00 is larger than the value in the .05 column. This allows you to say that the probability that the difference between means occurred by chance was no greater than .05, or 5 in 100. If the t had been less than the value in the .05 column, the probability of a chance result would have been greater than .05. As noted earlier, when an obtained t is not large enough to exceed t table values at the .05 level, at least, it is not usually considered statistically significant. The t value from our experiment was 2.60, with 24 df. Because 2.60 is greater than all the values in the 24 df row, the difference between the high- and low-incentive groups would have occurred by chance less than 1 time in 100. In other words, the difference is statistically significant.
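Statistical software can replace the table lookup with an exact probability. Here is a short sketch assuming SciPy is available; the critical values in Table 4 appear to correspond to one-tailed probabilities, so a one-tailed p value is computed here:

```python
from scipy import stats

t_value, df = 2.60, 24

# One-tailed probability that a t this large would arise by chance alone.
# (Table 4's entries look one-tailed: stats.t.ppf(0.95, 24) is about 1.71,
# matching the .05 column for 24 df.)
p = stats.t.sf(t_value, df)
print(round(p, 3))  # about 0.008 -- below .01, so the difference is significant
```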

Beyond the t Test

Many experiments in psychology are considerably more complex than simple comparisons between two groups. They often involve three or more experimental and control groups. Some experiments also include more than one independent variable. For example, suppose we had been interested not only in the effect of incentive size on performance but also in the effect of problem difficulty. We might then create six groups whose members would perform easy, moderate, or difficult problems and would receive either low or high incentives. In an experiment like this, the results might be due to the size of the incentive, the difficulty of the problems, or the combined effects (known as the interaction) of the two. Analyzing the size and source of these effects is typically accomplished through procedures known as analysis of variance. The details of analysis of variance are beyond the scope of this book. For now, note that the statistical significance of each effect is influenced by the size of the differences between means, by standard deviations, and by sample size in much the same way as we described for the t test. For more detailed information about how analysis of variance and other inferential statistics are used to understand and interpret the results of psychological research, consider taking courses in research methods and statistical or quantitative methods.


SUMMARY

Psychological research generates large quantities of data. Statistics are methods for describing and drawing conclusions from data.

Describing Data

Researchers often test the null hypothesis, which is the assertion that the independent variable will have no effect on the dependent variable.

The Frequency Histogram

Graphic representations such as frequency histograms provide visual descriptions of data, making the data easier to understand.

Descriptive Statistics

Numbers that summarize a set of data are called descriptive statistics. The easiest statistic to compute is N, which gives the number of observations made. A set of scores can be described by two other types of descriptive statistics: a measure of central tendency, which describes the typical value of a set of data, and a measure of variability. Measures of central tendency include the mean, median, and mode; variability is typically measured by the range and by the standard deviation. Sets of data often follow a normal distribution, which means that most scores fall in the middle of the range, with fewer and fewer scores occurring as one moves toward the extremes. In a truly normal distribution the mean, median, and mode are identical. When a set of data shows a normal distribution, a data point can be cited in terms of a percentile score, which indicates the percentage of people or observations falling below a certain score, and in terms of standard scores, which indicate the distance, in standard deviations, between any score and the mean of the distribution. Another type of descriptive statistic, a correlation coefficient, is used to measure the correlation between sets of scores.

Inferential Statistics

Researchers use inferential statistics to quantify the probability that conducting the same experiment again would yield similar results.

Differences Between Means: The t Test

One inferential statistic, the t test, assesses the likelihood that differences between two means occurred by chance or reflect the impact of an independent variable. Performing a t test requires using the difference between the means of two sets of data, the standard deviation of scores in each set, and the number of observations or participants. Interpreting a t test requires that degrees of freedom also be taken into account. When the t test indicates that the experimental results had a low probability of occurring by chance, the results are said to be statistically significant.

Beyond the t Test

When more than two groups must be compared, researchers typically rely on analysis of variance in order to interpret the results of an experiment.