Notes on the Reflection Problem

Author: Rose Spencer
Notes on the Reflection Problem Ian M. Schmutte March 15, 2010

1 The Reflection Problem

Social interactions models can fail to be fully identified for a variety of reasons, the simplest of which is the reflection problem. Fortunately, while serious, the reflection problem is frequently the easiest of the identification problems to resolve.

To start thinking about the reflection problem, remember that the fundamental identifying assumption of the linear model,

\[ E(y \mid X) = X\beta, \tag{1} \]

is that the matrix $X$ is of full column rank. Otherwise an infinite number of solutions are possible. The textbook violation of this assumption occurs when you accidentally include both a constant term and a full set of dummy variables for some categorical variable of interest.

Identification is generally discussed as the ability to recover a set of structural parameters from a reduced form model. The structural model specifies the underlying data generating process. A reduced form is some specification of a statistical model to be estimated, also with unknown parameters. Borrowing notation from Blume and Durlauf (2005), Manski considers a social interactions model with the following structure:

\[ \omega_i = k + cX_i + dY_g + J m^e_{i,g} + \varepsilon_i. \tag{2} \]

Where,
• $\omega_i$ is the individual's choice regarding some behavior, say how much to study.
• $k$ is a constant, the baseline level of the behavior in the population.
• $X_i$ are individual characteristics that influence the outcome, for instance parental income. Manski calls these exogenous effects.
• $Y_g$ are group characteristics that influence the individual, for instance the average parental income among the group $g$. Manski calls these contextual effects.
• $m^e_{i,g}$ is the individual's forecast of the average group behavior. Manski calls these the endogenous effects.
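The full-rank condition behind equation (1) can be checked numerically. The sketch below assumes NumPy and a made-up three-level categorical variable; the helper name `design_with_dummies` is hypothetical and not part of these notes. It builds a design matrix with a constant and a set of dummies and compares its rank with its number of columns.

```python
import numpy as np

def design_with_dummies(categories, drop_first):
    """Build [constant, dummies] for an integer-coded categorical variable."""
    categories = np.asarray(categories)
    levels = np.unique(categories)
    if drop_first:
        levels = levels[1:]  # omit one level to serve as the baseline
    # One indicator column per retained level.
    dummies = (categories[:, None] == levels[None, :]).astype(float)
    constant = np.ones((len(categories), 1))
    return np.hstack([constant, dummies])

# Toy data (an assumption for illustration only).
cats = np.array([0, 1, 2, 0, 1, 2, 0, 1])

X_trap = design_with_dummies(cats, drop_first=False)  # constant + all 3 dummies
X_ok = design_with_dummies(cats, drop_first=True)     # constant + 2 dummies

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("trap:", X_trap.shape[1], "columns, rank", rank_trap)
print("ok:  ", X_ok.shape[1], "columns, rank", rank_ok)
```

With all three dummies the matrix has four columns but rank three, because the dummies sum to the constant. Dropping one level restores full column rank.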

As a behavioral model, equation (2) says that an individual's choice of $\omega$ depends on the average choice of people around him, on the characteristics of people in his reference group, and on his own idiosyncratic characteristics. The intuition behind the reflection problem is that with cross-sectional data, it is impossible to distinguish the effect on individual behavior of changes in $m^e_{i,g}$ from the effect of changes in $Y_g$, since whenever one moves, so does the other. In other words, if average study habits are high whenever parental education is high, the effect of average study habits cannot be properly distinguished from that of parental education. You need something that causes average study habits to differ across groups but does not directly affect individual behavior.

To see the reflection problem most clearly, consider Manski's original treatment where
• $Y_g = E(X \mid g)$. The contextual factors are the group-level means of the exogenous variables, and
• $m^e_{i,g} = E(\omega \mid g) \equiv m_g$.

The above implements the assumption that subjective expectations are consistent with the structure of the model. Assuming everyone has information about everything in the model, the expected value of the decision in the group is calculated by integrating out the exogenous variables in the conditional expectation, yielding

\[ E(\omega \mid g) = E_X\left[E(\omega \mid X, g)\right] = k + cE(X \mid g) + dY_g + J m^e_{i,g}. \tag{3} \]

After substituting $E(X \mid g) = Y_g$ and $m^e_{i,g} = m_g$ and solving for $m_g$ (which requires $J \neq 1$), this is

\[ m_g = \frac{1}{1-J}\left[k + (c+d)Y_g\right]. \tag{4} \]

Notice that there is no idiosyncratic error in the above equation. We are working with the asymptotic limit of the model, in which case there is no sampling error associated with these forecasts. The preceding equation shows that in the structural model, $m_g$ is a linear combination of the constant and the contextual factors. To push things just a little further, substitute $m_g$ back into the structural equation to get the reduced form

\[
\begin{aligned}
E(\omega \mid X, g) &= k + cX + dY_g + \frac{J}{1-J}\left[k + (c+d)Y_g\right] \\
&= \frac{k}{1-J} + cX + \frac{Jc+d}{1-J}\,Y_g \\
&= \pi_1 + \pi_2 X + \pi_3 Y_g.
\end{aligned}
\]

The parameters $\pi_1, \pi_2, \pi_3$ of the reduced form are identified (as long as $Y_g$ is not perfectly collinear with a constant and the exogenous variables), but recovering the full set of structural parameters $(k, c, d, J)$ is not possible, since the function mapping structural parameters into reduced form parameters is not one-to-one. More plainly, the problem here is that recovering the structural parameters requires solving three equations in four unknowns.
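The failure of the mapping to be one-to-one can be made concrete. The sketch below, assuming NumPy and two hand-picked parameter vectors (the numbers and the function name `reduced_form` are illustrative assumptions), evaluates the reduced-form coefficients implied by the derivation above for two different structures.

```python
import numpy as np

def reduced_form(k, c, d, J):
    """Map structural (k, c, d, J) to reduced-form (pi1, pi2, pi3); needs J != 1."""
    return np.array([k / (1 - J), c, (J * c + d) / (1 - J)])

# Two different structures (illustrative values, not from the notes).
theta_a = dict(k=1.0, c=2.0, d=1.0, J=0.5)  # endogenous and contextual effects
theta_b = dict(k=2.0, c=2.0, d=4.0, J=0.0)  # contextual effect only, no endogenous effect

pi_a = reduced_form(**theta_a)
pi_b = reduced_form(**theta_b)
print(pi_a, pi_b)
```

Both structures imply the same $(\pi_1, \pi_2, \pi_3)$, so no amount of data on $(\omega, X, Y_g)$ can tell a model with an endogenous effect apart from one with only a larger contextual effect.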

When the reduced form is identified, it is still possible to identify the presence of some kind of social interaction, since $\pi_3 \neq 0$ implies $Jc + d \neq 0$, and so either $Jc \neq 0$, $d \neq 0$, or both. That is, $\pi_3 \neq 0$ means there is either an endogenous or a contextual effect on the outcome.
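A small simulation illustrates both points of this section. It is a sketch under stated assumptions: NumPy, normally distributed data, illustrative parameter values, and the population group mean used in place of $E(X \mid g)$ to mimic the asymptotic limit discussed above. It checks that the regressors of the structural equation are collinear, while the reduced form still reveals $\pi_3 \neq 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
k, c, d, J = 1.0, 2.0, 1.0, 0.5          # illustrative structural values
n_groups, group_size = 200, 50

group = np.repeat(np.arange(n_groups), group_size)
group_mean = rng.normal(size=n_groups)                 # plays the role of E(X | g)
X = group_mean[group] + rng.normal(size=group.size)    # individual characteristic
Y = group_mean[group]                                  # contextual variable Y_g
m = (k + (c + d) * Y) / (1 - J)                        # equation (4)
omega = k + c * X + d * Y + J * m + rng.normal(size=group.size)

# Structural regressors [1, X, Y_g, m_g]: m_g is linear in the constant and Y_g.
structural_design = np.column_stack([np.ones_like(X), X, Y, m])
rank_structural = np.linalg.matrix_rank(structural_design)

# Reduced form [1, X, Y_g] has full column rank and can be estimated by OLS.
reduced_design = np.column_stack([np.ones_like(X), X, Y])
pi_hat = np.linalg.lstsq(reduced_design, omega, rcond=None)[0]

print("columns:", structural_design.shape[1], "rank:", rank_structural)
print("pi_hat:", pi_hat)
```

The structural design has four columns but rank three, which is the reflection problem in matrix form. The reduced-form estimates should be close to $(k/(1-J),\, c,\, (Jc+d)/(1-J))$, here $(2, 2, 4)$, so a nonzero coefficient on $Y_g$ signals some social interaction without saying which kind.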

1.1 Breaking the reflection problem

People often take the reflection problem as damning the entire enterprise of identifying social interactions, and especially of identifying endogenous effects. But it really just demonstrates what is usually true of identification: it is difficult in observational data, and requires certain assumptions on the modeling process. The reflection problem emerges because of three modeling assumptions:

1. The conditional expectation $E(\omega \mid X, g)$ is linear.
2. The conditional expectations of all exogenous variables appear as contextual variables, $Y_g = E(X \mid g)$.
3. Reference groups have a very specific structure.

The reflection problem can be broken by weakening any of these assumptions. For a general articulation of the conditions for identification see Brock and Durlauf (2001). In these notes, I focus on weakening assumptions 2 and 3. Weakening assumption 2 is interesting pedagogically, while weakening assumption 3 is generally easier to defend and is the source of a very interesting recent paper tying together the literature on complex network analysis with the literature on identification of social interactions.

Weakening the linearity assumption is also important, and is covered in a number of papers, e.g. Brock and Durlauf (2001); Blume and Durlauf (2005). On the one hand, the fact that the reflection problem relies on the assumption that the conditional mean is linear suggests that the non-identification result is fragile in some sense. On the other hand, from an applied perspective, it is generally suspicious if the only source of identification in the model is a specific functional form. The results for identification with non-linear models are pretty robust, but again, the relevant warning is to be very clear about the assumptions providing identification and the sensitivity of the result to modifications to those model assumptions.

1.1.1 Identification Strategy 1: Not all exogenous variables appear as contextual variables

The structural equations above, including the equation for the average individual behavior $m_g$, look like simultaneous equations. We are looking for something like an exclusion restriction: something that appears in the equation for $m_g$ that does not directly affect individual behavior, $\omega$. The easiest case is if the exogenous variables $X$ are such that $X = [X_1 \; X_2]$ with $Y_g = E(X_1 \mid g)$. In this case, the equation for $m_g$ keeps the same form but picks up a term in the excluded group mean. Defining $W_g = E(X_2 \mid g)$, we have $m_g = \frac{1}{1-J}\left[k + (c_1 + d)Y_g + c_2 W_g\right]$, and in the reduced form we have

\[
\begin{aligned}
E(\omega \mid X, g) &= k + Xc + Y_g d + \frac{J}{1-J}\left[k + (c_1 + d)Y_g + c_2 W_g\right] \\
&= \frac{k}{1-J} + X_1 c_1 + X_2 c_2 + \frac{Jc_1 + d}{1-J}\,Y_g + \frac{Jc_2}{1-J}\,W_g \\
&= \pi_1 + \pi_2 X_1 + \pi_3 X_2 + \pi_4 Y_g + \pi_5 W_g.
\end{aligned}
\]
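The reduced form with an excluded contextual variable can be inverted by hand, which shows where the identifying information comes from. The sketch below assumes NumPy and scalar $X_1$, $X_2$; the function names and numerical values are illustrative assumptions. The key step is that $\pi_5 / \pi_3 = J/(1-J)$, because $X_2$ affects the group mean only through the endogenous channel.

```python
import numpy as np

def reduced_form_excl(k, c1, c2, d, J):
    """Reduced-form coefficients (pi1..pi5) when only X1 has a contextual effect."""
    return np.array([
        k / (1 - J),
        c1,
        c2,
        (J * c1 + d) / (1 - J),
        J * c2 / (1 - J),
    ])

def recover_structural(pi):
    """Invert the reduced form; requires pi3 = c2 != 0."""
    pi1, pi2, pi3, pi4, pi5 = pi
    if pi3 == 0:
        raise ValueError("c2 = 0: the excluded variable does not move behavior")
    r = pi5 / pi3                # equals J / (1 - J)
    J = r / (1 + r)
    k = pi1 * (1 - J)
    c1, c2 = pi2, pi3
    d = pi4 * (1 - J) - J * c1
    return dict(k=k, c1=c1, c2=c2, d=d, J=J)

truth = dict(k=1.0, c1=2.0, c2=1.5, d=1.0, J=0.5)   # illustrative values
pi = reduced_form_excl(**truth)
recovered = recover_structural(pi)
print(recovered)
```

The round trip returns the original parameter values, in contrast to the three-equations-in-four-unknowns situation of the baseline model.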

In the example where $\omega$ is study time, the excluded factor could be the age of the parents among your reference group. Now we have increased the number of parameters in the reduced form of the model. This means that the order condition for identification is satisfied (as many equations as unknowns): five reduced form parameters $(\pi_1, \ldots, \pi_5)$ for five structural parameters $(k, c_1, c_2, d, J)$. Establishing the rank condition for identification is straightforward and only relies on weak assumptions, for instance $c_2 \neq 0$, so that $W_g$ actually moves $m_g$.

1.1.2 Identification Strategy 2: Exogenous assignment or variance restrictions

An easy way to frame this is to start again with a model subject to the reflection problem. Moffitt (2004) provides another simple exposition of the reflection problem, again framed in the language of simultaneous equations. In this case, reference groups are really trivial: agents interact in pairs so that the structural model is completely specified by:

\[
\begin{aligned}
\omega_{1g} &= \omega_{2g} J + x_{1g} c + x_{2g} d + \varepsilon_{1g} \\
\omega_{2g} &= \omega_{1g} J + x_{2g} c + x_{1g} d + \varepsilon_{2g}
\end{aligned}
\]

Notice that I am assuming that there is no constant term. To simplify exposition, I proceed assuming that all data have been de-meaned. Furthermore, the structural error has the form $\varepsilon_{ig} = \mu_g + \eta_{ig}$, where

\[
\begin{aligned}
E(\mu_g \mid x) &= E(\eta_{ig} \mid x) = 0 \\
E(\eta_{ig}\,\eta_{jg'}) &= 0 \\
E(\mu_g\,\eta_{jg'}) &= 0
\end{aligned}
\]

where the second condition holds for all $(i, g) \neq (j, g')$ and the third for all $i$, $g$, $j$, and $g'$. Also define

\[ E(x_{ig}\,x_{-ig}) = \sigma_{xx_{-i}}, \qquad \mathrm{var}(\eta) = \sigma^2_\eta, \qquad \mathrm{var}(\mu) = \sigma^2_\mu. \]
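To fix ideas, the pair model can be simulated directly from its structural form. This is a sketch under assumptions that are not in the notes: NumPy, normally distributed observables and errors, unit-variance $x$ with within-pair correlation `rho_x`, and the hypothetical function name `simulate_pairs`. Each pair's outcomes are obtained by solving the two structural equations jointly.

```python
import numpy as np

def simulate_pairs(n_pairs, c, d, J, var_mu, var_eta, rho_x=0.5, seed=0):
    """Draw (omega, x, eps) for pairs from the two-equation structural model."""
    rng = np.random.default_rng(seed)
    # Mean-zero observables with within-pair correlation rho_x (assumed normal).
    cov_x = np.array([[1.0, rho_x], [rho_x, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov_x, size=n_pairs)
    mu = rng.normal(scale=np.sqrt(var_mu), size=(n_pairs, 1))    # shared by the pair
    eta = rng.normal(scale=np.sqrt(var_eta), size=(n_pairs, 2))  # idiosyncratic
    eps = mu + eta
    # Structural system for each pair: A @ omega = B @ x + eps.
    A = np.array([[1.0, -J], [-J, 1.0]])
    B = np.array([[c, d], [d, c]])
    omega = np.linalg.solve(A, (x @ B.T + eps).T).T
    return omega, x, eps

c, d, J, var_mu, var_eta = 1.0, 0.5, 0.3, 1.0, 1.0   # illustrative values
omega, x, eps = simulate_pairs(50_000, c, d, J, var_mu, var_eta)

# Within-pair correlation of the structural errors.
eps_corr = np.corrcoef(eps[:, 0], eps[:, 1])[0, 1]
print("within-pair error correlation:", eps_corr)
```

Because both members of a pair share $\mu_g$, the within-pair error correlation should be near $\sigma^2_\mu / (\sigma^2_\mu + \sigma^2_\eta)$, which is 0.5 for these values.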

Notice that this specification allows errors to be correlated within pairs, but assumes that assignment to pairs is exogenous conditional on the $x$'s (see footnote 1 below). There is no sorting into groups on the basis of unobservable individual characteristics. It is as if pairs match assortatively on the basis of observable characteristics, and are then randomly assigned to heterogeneous 'locations' that are differentially productive. You might imagine the setting is a golf tournament where players are assigned random partners of, say, the same age, to play a single hole, but those holes are differentiated by their degree of difficulty (par), and par is not observed. Or agents are grocery checkers who work on parallel aisles, and who are scheduled randomly, but where productivity depends on the shift and supervisor, both of which are unknown.

With exogenous assignment and no correlation in unobservables, if there is no group-specific heterogeneity, i.e. $\sigma^2_\mu = 0$, then endogenous interactions are identified by the covariance within pairs. Moffitt argues that, in most settings, we should be reluctant to assume $\sigma^2_\mu = 0$, since the whole point of the exercise is to establish that some part of unobservable variation between groups is due to social interactions, and $\sigma^2_\mu = 0$ sort of assumes that by construction. The following demonstrates the identification problem formally.

Moffitt considers the possibility that certain kinds of policy interventions may generate random assignment of the sort envisioned here, and therefore facilitate identification. Graham (2008) considers how to use variation induced by a policy experiment to identify social interactions when there is still unobservable correlation within groups (i.e. $\sigma^2_\mu > 0$). He provides restrictions on the variance of unobservables conditional on an observable instrument that facilitate identification.

Footnote 1: Another problem is that the structural model should perhaps include not the realization of $y_{-i}$, but the partner's best forecast of $y_{-i}$. If that is the case, then identification is possible depending on what we believe is in the agent's information set. If we assume the information set includes $\mu_i$ but not $\varepsilon_{ig}$ or $\varepsilon_{-ig}$, then the residual covariance in the reduced form identifies $\sigma^2_\mu$.

The reduced form is derived by substitution,

\[ \omega_{ig} = \pi_1 x_{ig} + \pi_2 x_{-ig} + \nu_{ig}, \tag{5} \]

where $i = 1$ or $2$ and $-i$ is defined to be $2$ or $1$, so the above defines two reduced form equations. The equations mapping the reduced form parameters to the structural parameters are

\begin{align}
\pi_1 &= \frac{c + Jd}{1 - J^2} \tag{6} \\
\pi_2 &= \frac{d + Jc}{1 - J^2} \tag{7} \\
\nu_{ig} &= \frac{J\varepsilon_{-ig} + \varepsilon_{ig}}{1 - J^2} \tag{8} \\
\sigma^2_\nu &= \frac{1}{(1 - J^2)^2}\left[(1+J)^2\sigma^2_\mu + (1+J^2)\sigma^2_\eta\right] \tag{9} \\
\sigma_{\nu\nu_{-i}} &= \frac{1}{(1 - J^2)^2}\left[(1+J)^2\sigma^2_\mu + 2J\sigma^2_\eta\right]. \tag{10}
\end{align}

Except for the last two, equations (6)-(10) are identical to equations (6)-(9) in Moffitt (2001). The model described here is for the case Moffitt warns of, where structural errors are correlated within groups (i.e. $\sigma^2_\mu > 0$). It is clear from the equations above that even if the reduced form is identified (which it is), the structural parameters cannot be recovered, since there are five structural parameters here, $(c, d, J, \sigma^2_\mu, \sigma^2_\eta)$, and four reduced form parameters, $(\pi_1, \pi_2, \sigma^2_\nu, \sigma_{\nu\nu_{-i}})$. Notice that

\[ \frac{\sigma_{\nu\nu_{-i}}}{\sigma^2_\nu} = \frac{(1+J)^2\sigma^2_\mu + 2J\sigma^2_\eta}{(1+J)^2\sigma^2_\mu + (1+J^2)\sigma^2_\eta} \tag{11} \]

identifies the social interaction parameter $J$ when $\sigma^2_\mu = 0$, since there is then no group level heterogeneity inducing correlation in the unobservables and the ratio reduces to $2J/(1+J^2)$. Likewise, the model embeds the first identification strategy, since if we exclude $x_{-i}$ by setting $d = 0$ then $J$ is identified by $\pi_2/\pi_1$.

To see that the reduced-form parameters are identified, notice that the variance of $x$, $\sigma^2_x$, and the covariance of $x_i$ and $x_{-i}$, $\sigma_{xx_{-i}}$, are non-parametrically identified in the data under the assumption that assignment is exogenous. The variance components for the rest of the model are easy to derive as

\[
\begin{aligned}
\sigma^2_y &= E(y_{ig}^2) = \left(\pi_1^2 + \pi_2^2\right)\sigma^2_x + 2\pi_1\pi_2\,\sigma_{xx_{-i}} + \sigma^2_\nu \\
\sigma_{yy_{-i}} &= E(y_{ig}\,y_{-ig}) = \left(\pi_1^2 + \pi_2^2\right)\sigma_{xx_{-i}} + 2\pi_1\pi_2\,\sigma^2_x + \sigma_{\nu\nu_{-i}} \\
\sigma_{yx} &= E(y_{ig}\,x_{ig}) = \pi_1\sigma^2_x + \pi_2\sigma_{xx_{-i}} \\
\sigma_{yx_{-i}} &= E(y_{ig}\,x_{-ig}) = \pi_1\sigma_{xx_{-i}} + \pi_2\sigma^2_x.
\end{aligned}
\]

Following Graham's approach, let's assume the distributions of the error components depend on some instrument. In his example, it is class size. So we specify $\sigma^2_\eta(w), \sigma^2_\mu(w), \ldots$ where $w$ is the instrument and $\sigma^2_\eta(w)$ is the variance of $\eta$ conditional on $w$. We assume the behavioral model is still the same, but when $w$ is binary, we have twice as many data moments:

\[
\begin{aligned}
\sigma^2_y(w) &= E(y_{ig}^2 \mid w) = \left(\pi_1^2 + \pi_2^2\right)\sigma^2_x(w) + 2\pi_1\pi_2\,\sigma_{xx_{-i}}(w) + \sigma^2_\nu(w) \\
\sigma_{yy_{-i}}(w) &= E(y_{ig}\,y_{-ig} \mid w) = \left(\pi_1^2 + \pi_2^2\right)\sigma_{xx_{-i}}(w) + 2\pi_1\pi_2\,\sigma^2_x(w) + \sigma_{\nu\nu_{-i}}(w) \\
\sigma_{yx}(w) &= E(y_{ig}\,x_{ig} \mid w) = \pi_1\sigma^2_x(w) + \pi_2\sigma_{xx_{-i}}(w) \\
\sigma_{yx_{-i}}(w) &= E(y_{ig}\,x_{-ig} \mid w) = \pi_1\sigma_{xx_{-i}}(w) + \pi_2\sigma^2_x(w).
\end{aligned}
\]

More importantly, we have increased the number of reduced form parameters from four to six (notice also that the reduced form model is overidentified). We now have $(\pi_1, \pi_2, \sigma^2_\nu(0), \sigma_{\nu\nu_{-i}}(0), \sigma^2_\nu(1), \sigma_{\nu\nu_{-i}}(1))$. At the same time, we increased the number of structural parameters from five to seven, $(c, d, J, \sigma^2_\mu(0), \sigma^2_\eta(0), \sigma^2_\mu(1), \sigma^2_\eta(1))$. What to do?
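Before turning to a remedy, the five-versus-four counting argument for the model without an instrument can be illustrated numerically. The sketch below assumes NumPy and illustrative parameter values; the function names are hypothetical. For any candidate value `J_alt` it solves equations (6), (7), (9), and (10) for the remaining structural parameters, so that two structures with different $J$ imply the same reduced form.

```python
import numpy as np

def reduced_form_pairs(c, d, J, var_mu, var_eta):
    """Equations (6), (7), (9), (10): returns (pi1, pi2, var_nu, cov_nu)."""
    scale = 1.0 - J**2
    pi1 = (c + J * d) / scale
    pi2 = (d + J * c) / scale
    var_nu = ((1 + J)**2 * var_mu + (1 + J**2) * var_eta) / scale**2
    cov_nu = ((1 + J)**2 * var_mu + 2 * J * var_eta) / scale**2
    return np.array([pi1, pi2, var_nu, cov_nu])

def equivalent_structure(reduced, J_alt):
    """Solve (6), (7), (9), (10) for (c, d, var_mu, var_eta) at a chosen J_alt."""
    pi1, pi2, var_nu, cov_nu = reduced
    c = pi1 - J_alt * pi2
    d = pi2 - J_alt * pi1
    var_eta = (1 + J_alt)**2 * (var_nu - cov_nu)
    var_mu = (1 - J_alt)**2 * cov_nu - 2 * J_alt * (var_nu - cov_nu)
    if var_eta < 0 or var_mu < 0:
        raise ValueError("J_alt implies a negative variance")
    return dict(c=c, d=d, J=J_alt, var_mu=var_mu, var_eta=var_eta)

truth = dict(c=1.0, d=0.5, J=0.3, var_mu=1.0, var_eta=1.0)  # illustrative values
reduced_true = reduced_form_pairs(**truth)
alt = equivalent_structure(reduced_true, J_alt=0.1)
reduced_alt = reduced_form_pairs(**alt)
print(alt)
print(reduced_true, reduced_alt)
```

The alternative structure has a different $J$ and valid (non-negative) variances, yet reproduces the same four reduced-form quantities, which is what it means for $J$ not to be identified here.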

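What if we just put some restrictions on these conditional variances? Say, for instance, that $\sigma^2_\mu(0) = \sigma^2_\mu(1)$. Counting this restriction, we are at seven equations in seven unknowns. Now calculate, provided the instrument actually shifts the idiosyncratic variance so that $\sigma^2_\eta(1) \neq \sigma^2_\eta(0)$,

\begin{align}
\frac{\sigma_{\nu\nu_{-i}}(1) - \sigma_{\nu\nu_{-i}}(0)}{\sigma^2_\nu(1) - \sigma^2_\nu(0)} &= \frac{2J\left(\sigma^2_\eta(1) - \sigma^2_\eta(0)\right)}{(1+J^2)\left(\sigma^2_\eta(1) - \sigma^2_\eta(0)\right)} \tag{12} \\
&= \frac{2J}{1+J^2} \tag{13}
\end{align}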
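The ratio in (12)-(13) can be inverted for $J$. The sketch below assumes NumPy, illustrative parameter values, and one extra assumption not stated in the notes: among the two roots of $R J^2 - 2J + R = 0$, it selects the one with $|J| < 1$. The function names are hypothetical.

```python
import numpy as np

def error_moments(J, var_mu, var_eta):
    """Equations (9) and (10): reduced-form error variance and covariance."""
    scale2 = (1.0 - J**2) ** 2
    var_nu = ((1 + J)**2 * var_mu + (1 + J**2) * var_eta) / scale2
    cov_nu = ((1 + J)**2 * var_mu + 2 * J * var_eta) / scale2
    return var_nu, cov_nu

def J_from_ratio(R):
    """Invert R = 2J / (1 + J^2), keeping the root with |J| < 1 (needs |R| < 1)."""
    if R == 0:
        return 0.0
    return (1.0 - np.sqrt(1.0 - R**2)) / R

J_true, var_mu = 0.3, 1.0                                  # common var_mu across w
var0, cov0 = error_moments(J_true, var_mu, var_eta=1.0)    # instrument w = 0
var1, cov1 = error_moments(J_true, var_mu, var_eta=2.0)    # instrument w = 1

R = (cov1 - cov0) / (var1 - var0)
J_hat = J_from_ratio(R)
print("ratio:", R, "recovered J:", J_hat)
```

Differencing across values of the instrument removes the common $\sigma^2_\mu$ term, so the recovered value matches the $J$ used to generate the moments even though $\sigma^2_\mu > 0$.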

In other words, the social interaction parameter, $J$, is identified here from the ratio of the change in within-group covariance to the change in total variance across values of the instrument, which is directly analogous to Graham's identification strategy.

1.1.3 Identification Strategy 3: Exploit Social Network Structure

In looking for things that influence the endogenous variable without affecting your own outcome, another possibility arises if reference groups do not perfectly overlap. In that case, the average characteristics of friends of friends can drive variation in endogenous effects without having a direct effect on choices. The discussion here follows Bramoullé et al. (2009). Their main insight, which has a good deal in common with the literature on spatial autoregression, is to conceive of people arranged in social networks, so that each person has a specific reference group of 'neighbors' that is not, in general, identical to the reference group of anyone else.

As an example, consider a setting in which people are arranged on an infinite line, and only interact with their immediate, left hand neighbor. Indexing individuals by $i$, we have the structural equation

\[ \omega_i = k + \omega_{i-1} J + x_i c + x_{-i} d + \varepsilon_i, \tag{14} \]

where $x_{-i}$ denotes the characteristics of $i$'s reference group, here the left hand neighbor, $x_{i-1}$.

It is clear how this structure supplies identifying variation. The variable $x_{i-2}$ does not have a direct effect on $\omega_i$, but moves $\omega_{i-1}$ around. The value of this approach is high when we have reasonable data on the social network connecting agents. In most cases, we do not have this information, and it is then not clear how to apply these results.
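The line-network logic can be sketched as an instrumental variables exercise. Several things below are assumptions for illustration and not part of the notes: NumPy, a finite line with a burn-in, the contextual term in (14) read as $x_{i-1}$, illustrative parameter values, and serially correlated errors $\varepsilon_i = u_i + \theta u_{i-1}$, added so that $\omega_{i-1}$ is correlated with $\varepsilon_i$ and ordinary least squares is biased. The neighbor's neighbor characteristic $x_{i-2}$ serves as the excluded instrument for $\omega_{i-1}$.

```python
import numpy as np

def simulate_line(n, k, J, c, d, theta, seed=0, burn=200):
    """Simulate equation (14) on a line, with MA(1) errors (an added assumption)."""
    rng = np.random.default_rng(seed)
    total = n + burn
    x = rng.normal(size=total)
    u = rng.normal(size=total)
    eps = u.copy()
    eps[1:] += theta * u[:-1]          # eps_i = u_i + theta * u_{i-1}
    omega = np.zeros(total)
    for i in range(1, total):
        omega[i] = k + J * omega[i - 1] + c * x[i] + d * x[i - 1] + eps[i]
    return omega[burn:], x[burn:]      # drop the burn-in

def estimate(omega, x):
    """Return (OLS, IV) coefficient vectors ordered as [k, J, c, d]."""
    y = omega[2:]
    ones = np.ones(len(y))
    regressors = np.column_stack([ones, omega[1:-1], x[2:], x[1:-1]])
    # Instruments: replace omega_{i-1} with x_{i-2}; the rest instrument themselves.
    instruments = np.column_stack([ones, x[:-2], x[2:], x[1:-1]])
    beta_ols = np.linalg.lstsq(regressors, y, rcond=None)[0]
    beta_iv = np.linalg.solve(instruments.T @ regressors, instruments.T @ y)
    return beta_ols, beta_iv

J_true = 0.4
omega, x = simulate_line(50_000, k=1.0, J=J_true, c=1.0, d=0.5, theta=0.5)
beta_ols, beta_iv = estimate(omega, x)
print("OLS estimate of J:", beta_ols[1])
print("IV estimate of J: ", beta_iv[1])
```

In this design the instrument is relevant because $x_{i-2}$ enters $\omega_{i-1}$ with coefficient $d + Jc$, and it is excluded from the equation for $\omega_i$. With independent errors ($\theta = 0$) least squares would also be consistent in this one-sided example, so the correlated-error assumption is what makes the instrument do real work here.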


Bibliography

Blume, L. E. and Durlauf, S. N. (2005). Identifying social interactions: A review. July 22, 2005.

Bramoullé, Y., Djebbari, H. and Fortin, B. (2009). Identification of peer effects through social networks, Journal of Econometrics 150: 41–55.

Brock, W. A. and Durlauf, S. N. (2001). Interaction-Based Models, Vol. 5, Elsevier Science.

Graham, B. S. (2008). Identifying social interactions through conditional variance restrictions, Econometrica 76: 643–660.

Moffitt, R. A. (2004). Policy Interventions, Low-Level Equilibria, and Social Interactions, MIT Press, pp. 45–82.
