Evaluating the slope of the relationship between a continuous variable and the dependent variable when there is an interaction

1. When one variable in the interaction is dummy-coded and the other is continuous

This is based on the analysis of the bank3.sav dataset, which is available in the Psych/Share directory on the Student Server, and also at http://online.mq.edu.au/pub/PSYSTAT/download.htm. The analysis is as follows:

compute mineduc=minority*edlevel.
manova tsalnow minority mineduc edlevel/
 analysis=tsalnow/
 print=param(estim)/
 design=constant minority edlevel mineduc.

The resulting regression equation is:

tsalnow' = 334.68 - 91.61 * minority - 13.49 * edlevel + 7.96 * minority * edlevel

Graphing the Interaction

The first step is to draw a graph of the interaction. In this case the values of minority are fixed at 0 and 1, but we have to decide what values of edlevel to substitute into the equation in order to get estimated values of tsalnow'. There may be theoretical reasons, or reasons arising from previous research, for choosing particular values. In the absence of these, a common method is to use plus or minus half or one standard deviation from the mean of the variable. As the following output from descriptives shows, the mean of edlevel is around 13.5 and the standard deviation is around 3.

Variable   Mean  Std Dev  Minimum  Maximum    N  Label
EDLEVEL   13.49     2.88        8       21  474  Educational level

To keep things simple, we'll use values of 10 (roughly the mean minus one s.d.) and 16 (roughly the mean plus one s.d.). When the combinations (0,10), (0,16), (1,10) and (1,16) are substituted into the equation, the resulting values of tsalnow' are 199.78 (white employees with 10 years of education), 118.84 (white employees with 16 years of education), 187.77 (non-white employees with 10 years of education) and 154.59 (non-white employees with 16 years of education) respectively. These values are plotted in the following graph.
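Substitutions like these are easy to check by coding the fitted equation directly. A minimal sketch in Python (the function name is mine, and the coefficients are simply the rounded values reported above):

```python
def predict_tsalnow(minority, edlevel):
    # Fitted equation: tsalnow' = 334.68 - 91.61*minority - 13.49*edlevel
    #                             + 7.96*minority*edlevel
    return 334.68 - 91.61 * minority - 13.49 * edlevel + 7.96 * minority * edlevel

# The four plotted combinations of (minority, edlevel):
points = {(m, e): round(predict_tsalnow(m, e), 2)
          for m in (0, 1) for e in (10, 16)}
# {(0, 10): 199.78, (0, 16): 118.84, (1, 10): 187.77, (1, 16): 154.59}
```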


[Figure: plot of predicted TSALNOW (roughly 100 to 210) against EDLEVEL (10 and 16), with separate lines for white (minority = 0) and non-white (minority = 1) employees.]

Testing the Slopes

We know that when an interaction is included in the model, the regression coefficient for each individual variable shows the slope of the relationship between that variable and the dependent variable at the point where the other variable involved in the interaction is zero. Because in this analysis white employees are coded zero on the minority variable, we can tell from the regression equation that for white employees the slope of the relation between predicted current salary and education level is -13.49. That is, for every additional year of education a white employee has, the number of hours needed to earn $1000 drops by 13.49. We can also tell that this slope is significantly different from zero (the t-value for the regression coefficient is 20.5).

We'd now like to know what the slope of the relationship is for non-white employees. We know from the fact that the interaction is significant that the slopes for white and non-white employees are different. However, we don't know what the slope actually is for non-white employees, or whether it differs from zero. The simplest way to find out (see Aiken & West, 1991, p. 131) is to temporarily switch the codes for white and non-white employees and read the equation again. The analysis is:

temporary.
recode minority (0=1)(1=0).
compute mineduc=minority*edlevel.
manova tsalnow minority mineduc edlevel/
 analysis=tsalnow/
 print=param(estim)/
 design=minority edlevel mineduc.

Note that the interaction term is recreated with the recoded version of minority. This gives the regression equation:

tsalnow' = 243.07 + 91.61 * minority - 5.52 * edlevel - 7.96 * minority * edlevel


The equation shows that the slope of the relation between edlevel and tsalnow' when the recoded minority variable is zero is -5.52. This means that while the number of hours worked per $1000 goes down by 13.49 for every extra year of education for white employees, it goes down by only 5.52 for non-white employees. This is still a statistically significant amount (t = 3.9), but is much less than the slope for white employees.

Two further points. One is that the slope of -5.52 for the non-white employees (but not its significance) could have been calculated from the first equation by using the coefficient for the interaction, 7.96. This coefficient shows how the slope of the relation between edlevel and tsalnow' in the model changes for each unit change in minority. So, given that the slope when minority equals zero is -13.49 (the slope for white employees), and the slope changes by 7.96 when the minority variable changes from zero (white) to one (non-white), we know that the slope becomes -13.49 + 7.96 = -5.53 (the same as -5.52 apart from rounding of the coefficients).

A second point is that this method still works with variables which have more than two categories. All we have to do is calculate all k dummy variables, then run an analysis with each combination of k - 1 of them, not forgetting to recalculate the interaction term each time. In each analysis, the slope shown for the continuous variable applies to the category whose dummy variable has been omitted.
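The arithmetic in the first point can be written out directly. A sketch using the rounded coefficients reported above (so the result differs from the re-estimated -5.52 only by rounding):

```python
# Slope for white employees (minority = 0) and the interaction coefficient,
# both taken from the first regression equation:
slope_white = -13.49
interaction = 7.96    # change in the edlevel slope per unit change in minority

# Slope for non-white employees (minority goes from 0 to 1):
slope_nonwhite = slope_white + interaction * 1
# about -5.53, matching the -5.52 from the recoded analysis apart from rounding
```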

Testing Differences Between Groups

When a regression model contains an interaction, the regression coefficient for each individual variable shows the relationship between that variable and the dependent variable at the point where the second variable involved in the interaction is zero. This very useful fact can be used to test whether there is a difference between groups (white and non-white employees in our example) at a particular value of the continuous variable (edlevel in our example). Say we would like to know whether there is a difference between white and non-white employees in terms of tsalnow' when both have had 10 years of education. We can simply subtract this value from edlevel before the analysis, so that the point at which the minority variable is tested, now zero, actually represents 10, the value we're interested in. The analysis would be as follows.[1]

temporary.
compute edlevel=edlevel-10.
compute mineduc=minority*edlevel.
manova tsalnow minority mineduc edlevel/
 analysis=tsalnow/
 print=param(estim)/
 design=constant minority edlevel mineduc.

The resulting regression equation shows a value of -11.99 for minority, i.e., at this level of education non-white employees have a lower (more favourable) value of tsalnow' than white employees, a reversal of the general trend. However, the t-test of the regression coefficient shows that the difference is only marginally significant (p = .055). Aiken & West (1991, p. 132-133) discuss this method further. They also show how, in the kind of interaction shown in the graph, to calculate the point on the x-axis where the regression lines for the two groups cross (p. 23-24 & p. 125-126).

[1] In the data used for this example, zero is defined as missing for edlevel, and the subtraction results in some zeroes. To avoid cases with zeroes being omitted, the command missing values edlevel (-999) can be used. It sets the missing value to another number (one which will never occur) and wipes out the original missing-values assignment.
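The group difference at edlevel = 10 can also be recovered arithmetically from the first equation, without re-running the analysis. A sketch using the rounded coefficients (the MANOVA value of -11.99 differs slightly because it uses the unrounded coefficients):

```python
def predict_tsalnow(minority, edlevel):
    # First fitted equation, rounded coefficients as reported earlier
    return 334.68 - 91.61 * minority - 13.49 * edlevel + 7.96 * minority * edlevel

# Difference between non-white and white employees at edlevel = 10,
# equivalently: -91.61 + 7.96 * 10
diff = predict_tsalnow(1, 10) - predict_tsalnow(0, 10)
# about -12.01, agreeing with the reported -11.99 up to coefficient rounding
```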

2. When both variables in the interaction are continuous

This is based on the analysis starting at line 121 of bank3.out (or bank3_80.out). The analysis is as follows:

compute agework=age*work.
manova tsalnow age work agework/
 analysis=tsalnow/
 print=param(estim)/
 design=constant age work agework.

The resulting regression equation is:

tsalnow' = 113.65 + 1.60 * age - 8.33 * work + .128 * age * work

Simple Slopes A useful way of looking at a regression equation containing an interaction when both variables are continuous is to rearrange it as follows (Aiken & West, 1991, p. 12-14; Jaccard, Turrisi & Wan, 1990, p.25-26):

tsalnow' = 113.65 + 1.60 * age - 8.33 * work + .128 * age * work
         = - 8.33 * work + .128 * age * work + 113.65 + 1.60 * age
         = (- 8.33 + .128 * age) * work + (113.65 + 1.60 * age)

(In this case we are interested in how the relationship between tsalnow' and work differs at various values of age, but we could equally well have rearranged the equation to investigate the relation between tsalnow' and age at different values of work.)

In the rearranged equation above, the first part of the right-hand side, (- 8.33 + .128 * age) * work, shows the slope of the relation between work and tsalnow'. This is called the 'simple slope'. The fact that age is included in this expression shows that the slope will change as age changes. The second part of the right-hand side, (113.65 + 1.60 * age), doesn't affect the slope of this relationship, just the intercept.
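The rearrangement can be expressed as a pair of small functions. A sketch (the function names are mine; the coefficients are the rounded values from the equation above):

```python
def simple_slope_work(age):
    # Slope of the work--tsalnow' relation at a given age: (-8.33 + .128 * age)
    return -8.33 + 0.128 * age

def intercept_at(age):
    # The age-dependent part that only shifts the line up or down:
    # (113.65 + 1.60 * age)
    return 113.65 + 1.60 * age
```

For example, simple_slope_work(25) gives about -5.13 and intercept_at(25) about 153.65, the values that appear in the table of simple slopes below.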

Calculating and Graphing the Values of tsalnow'

The next stage in interpretation is to substitute values of age and work into the equation so that we can see how tsalnow' is affected. But what values of age and work should be used? There may be specific values you wish to test because of theory or previous research. If this is not the case, and you mainly want to see what the interaction looks like, the decision is to some extent arbitrary. It also depends on how we want the graph (there should always be a graph of an interaction) to look. In this case we want a graph which shows how work relates to tsalnow' for a range of age values. This means that we only need to predict two values of tsalnow' for each value of work. (Only two points are needed to completely specify a line; any other points are redundant.) However, we would like a larger range of age values to see their effect on the slope of the relation between work and tsalnow'. (This will be clearer when you look at the graph we end up with.) As mentioned above, a common way of choosing values of the two variables for plotting is to use plus and minus 1 or .5 of a standard deviation from the mean of each variable. The descriptives command produces the following results for age and work:

Variable   Mean  Std Dev  Minimum  Maximum    N  Label
AGE       37.19    11.79    23.00    64.50  474  Age of employee
WORK       7.99     8.72      .00    39.67  474  Work experience

Following a bit of truncation and rounding, the decision is to use two values of work: the mean of 8 plus and minus an approximate standard deviation (8), giving 16 and 0 years of work experience (having a value of zero makes life a bit easier). For age, where we want a range of values, the values are the mean (37) itself, the mean plus and minus approximately one standard deviation (12), giving 49 and 25, and the mean plus and minus approximately half a standard deviation (5), giving 42 and 32. (I've chosen to use only whole numbers.) Now it's simply a matter of substituting all combinations of these values into the equation. Along the way to working out the predicted values of tsalnow', the simple slope can also be calculated from the rearranged equation for later use.

Age   s.d.   Work   s.d.   Simple slope             tsalnow'
                           (- 8.33 + .128 * age)
------------------------------------------------------------
 25   -1.0     0    -1         -5.13                  153.65
 25   -1.0    16     1                                 71.57
 32   -0.5     0    -1         -4.23                  164.85
 32   -0.5    16     1                                 97.11
 37    0.0     0    -1         -3.59                  172.85
 37    0.0    16     1                                115.35
 42    0.5     0    -1         -2.95                  180.85
 42    0.5    16     1                                133.59
 49    1.0     0    -1         -2.06                  192.05
 49    1.0    16     1                                159.12
------------------------------------------------------------
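The whole table can be generated by looping over the chosen values. A sketch (function names are mine; coefficients are the rounded values from the fitted equation):

```python
def predict(age, work):
    # Fitted equation: tsalnow' = 113.65 + 1.60*age - 8.33*work + .128*age*work
    return 113.65 + 1.60 * age - 8.33 * work + 0.128 * age * work

def simple_slope(age):
    # Simple slope of work at a given age, from the rearranged equation
    return -8.33 + 0.128 * age

for age in (25, 32, 37, 42, 49):
    print(age, round(simple_slope(age), 2),
          round(predict(age, 0), 2), round(predict(age, 16), 2))
# e.g. the first and last rows: (25, -5.13, 153.65, 71.57)
#                               (49, -2.06, 192.05, 159.12)
```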

As the graph and the simple slopes in the table show, the effect of work experience (work) on tsalnow' (according to the model) differs for employees of different ages. For the youngest employees represented in the graph (those aged 25), the number of hours per $1000 earned goes down by 5.13 for each additional year of work experience. For employees aged 49, however, the number of hours per $1000 goes down by only about two hours for each additional year of work experience.

[Figure: plot of predicted TSALNOW (roughly 80 to 200) against WORK (0 and 16, i.e. -1.0 and 1.0 s.d.), with separate lines for AGE = 25, 32, 37, 42 and 49 (-1.0, -0.5, 0.0, 0.5 and 1.0 s.d. from the mean).]

Testing whether the simple slopes differ from zero

In most cases, a graph similar to the one above, perhaps with some simple slope estimates, will provide a sufficient description and evaluation of an interaction. In other cases, however, you may have reason to test whether a simple slope differs significantly from zero. Aiken and West (1991, e.g., p. 16-18) and Jaccard, Turrisi & Wan (1990, e.g., p. 32-33) provide appropriate equations for doing these tests. The equations require the variances and some covariances of the coefficients in the regression equation. These can be obtained from the regression procedure as shown beginning on line 130 of bank3.out. Again, however, we may achieve the same thing very simply, by manipulating the values of one or both of the variables in the equation. For example, if we want to test whether the slope of the relationship between work and tsalnow' differs from zero at the mean of age (it looks as if it does from the graph), we could carry out this analysis:

temporary.
compute age=age-37.19.
compute agework=age*work.
manova tsalnow age work agework/
 analysis=tsalnow/
 print=param(estim)/
 design=constant age work agework.

The mean of age has been set to zero, so the regression coefficient for work, and the corresponding test of significance, should tell us what we want to know. The former is -3.58 (approximately the same as the -3.59 in the table) and the t-value is 5.94, p < .00001.
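The same logic can be demonstrated outside SPSS. The sketch below is my own, using simulated data rather than the bank dataset: it fits the uncentred and age-centred models by ordinary least squares and shows that the work coefficient in the centred fit equals the uncentred work coefficient plus the interaction coefficient times the mean of age, i.e., the simple slope of work at the mean age.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.normal(37, 12, n)    # simulated variables, roughly like the bank data
work = rng.normal(8, 8, n)
# Outcome generated from the reported equation, plus noise:
y = 113.65 + 1.60 * age - 8.33 * work + 0.128 * age * work + rng.normal(0, 5, n)

def fit(a, w):
    # OLS fit of y on [1, a, w, a*w]
    X = np.column_stack([np.ones(n), a, w, a * w])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

b = fit(age, work)                     # uncentred model
bc = fit(age - age.mean(), work)       # age centred at its mean

# The centred model's work coefficient is the simple slope of work at the
# mean of age: b_work + b_agework * mean(age). Centring is an exact
# reparameterization, so this identity holds to numerical precision.
print(bc[2], b[2] + b[3] * age.mean())
```

Only the coefficient shifts; the fitted values, residuals and R-squared of the two models are identical, which is why the trick is safe to use.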


3. Centring continuous variables when testing interactions

A point which arises again and again when interactions involving continuous variables are being considered is the value of 'centring' the variables by subtracting the mean of each variable from each of its values. This is done before interaction terms are created by multiplying the variables, as shown in the analysis beginning at line 152 of bank3.out. One advantage of doing this will be obvious: the main effect of each variable in the interaction (age and work in this example) is tested at a value of the other variable which not only really exists, but is a value at which a test of the simple slope of the first variable is sensible. Compare the results of the analysis using the uncentred variables (line 121 on) with those from the analysis based on the centred variables (line 152 on). The regression equations are shown below. The coefficient for age is reasonably sensible in both analyses (1.60 and 2.62 respectively) because a value of zero for work (i.e., no work experience) actually occurs in the dataset.

tsalnow' = 113.65 + 1.60 * age - 8.33 * work + .128 * age * work   (uncentred)

tsalnow' = 144.41 + 2.62 * age - 3.58 * work + .128 * age * work   (centred)

However, the value of -8.33 in the uncentred analysis differs sharply from that obtained in the centred analysis. Is the first coefficient really sensible? No, because it is the slope of the relationship between work and tsalnow' for someone with an age of zero! The silliness of this estimate is reflected in the standard error in the uncentred analysis: 1.45, compared with .60 in the centred analysis.

This brings out another advantage of centring variables. Aiken and West (p. 32-36; p. 181-182) show that if variables are centred, the covariance between a product term (e.g., age * work) and its components is reduced considerably, which helps to avoid possible problems caused by multicollinearity. Jaccard, Turrisi & Wan (p. 30-31) also make this point. In our data, the correlations between the uncentred age and work variables and the age*work product term are .83 and .98 respectively. The corresponding correlations for the centred variables are .50 and .76 respectively. Aiken and West (p. 36) distinguish between problems of multicollinearity caused by essential ill-conditioning, due to correlations between the variables in the population, and those caused by non-essential ill-conditioning, which can be eliminated by centring the variables involved.
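The effect of centring on these correlations is easy to demonstrate with simulated data. This sketch is mine and does not use the bank data; because the simulated variables are symmetric and uncorrelated, the raw correlations differ in size from the .83 and .98 above, but the pattern (sizeable raw correlations, near-zero centred ones) is the same phenomenon:

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.normal(37, 12, 474)
work = rng.normal(8, 8, 474)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Correlation of each raw variable with the raw product term
raw_age = corr(age, age * work)
raw_work = corr(work, age * work)

# After centring, the product term is built from the centred variables
ac, wc = age - age.mean(), work - work.mean()
centred_age = corr(ac, ac * wc)
centred_work = corr(wc, ac * wc)

print(raw_age, raw_work)          # clearly non-zero
print(centred_age, centred_work)  # both close to zero
```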

4. Errors of measurement Jaccard, Turrisi & Wan (p.38-40) discuss the problems caused by measurement error in the variables involved in interaction analyses. Unreliability reduces the power to detect significant interactions and also leads to biased and inconsistent estimates of the regression coefficients. A number of possible solutions have been proposed, none of which is entirely satisfactory, although the authors are "optimistic about what the future holds" (p. 40). In the meantime, we should endeavour to produce variables which are as reliable as possible and to avoid forming interactions from variables which are known to have low reliability.


5. Conclusion When carrying out analyses involving interactions, use measures which are reliable, and centre them, for reasons both of interpretability and avoidance of possible problems caused by multicollinearity. When interpreting interactions, always produce a graph (both for your sake and that of your readers), and take advantage of the very useful fact that when a regression model contains an interaction, the regression coefficient for each individual variable shows the relationship between that variable and the dependent variable at the point where the other variable involved in the interaction is zero.

References

Aiken, Leona S. & Stephen G. West (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, Calif.: Sage Publications. [QA278.2.A34]
Jaccard, James, Robert Turrisi & Choi K. Wan (1990). Interaction effects in multiple regression. Newbury Park, Calif.: Sage Publications. [HA29.Q35]

Both of these books have been placed in Reserve.

Alan Taylor Department of Psychology 20th May 2001
