Part IV. Simultaneous Equation Models


Chapter 14 Introductory Examples

Introduction

The most crucial assumption justifying the use of least squares (ordinary or generalised) in linear models concerns the first-order moment of the dependent variable. In fact, in order to estimate $b$ without bias by least squares in the model $y = Xb + u$, it is essential that(40)

$$E(u \mid X) = 0$$

or equivalently

$$E(y \mid X) = Xb.$$

Without this assumption, the model itself is not clearly defined in its current form, and the status of the error $u$ becomes ambiguous in the absence of extra information. This assumption is not satisfied, for instance, in the case of specification errors affecting the explanatory variables, or of measurement errors affecting those same variables.

(40) In practice, one writes more often: $E(u) = 0$, with $u$ and $X$ independent (this implies that the conditional expectation $E(u \mid X = x)$ is zero). It is then crucial that the assumption of independence between $u$ and $X$ be satisfied. Otherwise, there would be no meaning in separating the influence of $X$ on $Y$ from that of an unobservable "remainder" on $Y$, if they were not independent.


In this text, we will be studying another important class of models in which this assumption $E(u \mid X) = 0$ is not valid: the class of simultaneous equation models, in which several dependent variables are determined simultaneously by several equations with random disturbances, and in which at least one equation has a random disturbance term and more than one dependent variable.(41) As we will see later, linear simultaneous equation models may superficially resemble linear models, though they are not of the same type as the latter in the probabilistic sense. This is due to the fact that, for reasons of convenience, economists are used to analysing the various economic phenomena separately, and econometricians are used to simply adding a random disturbance term to the right-hand side of the equations supplied by the economists. We do not intend to enter here the "epistemological" debate on the profound nature of simultaneity, as it would in any case be pointless at this stage. However, we will first discuss in depth two simple classical examples of simultaneous equation models, which form the "paradigm" of this field: a partial equilibrium model of supply and demand for a market good, and a simple Keynesian macroeconomic model. These examples will enable us to see what kinds of specific difficulties arise when we formalise and estimate simultaneous equation models, and will thus lead us to the different topics treated in this text.

(41) Simultaneous equation models should not be confused with simultaneous regression models or stacked regressions, in which several regressions are considered simultaneously but each of them has only one dependent variable. Maybe this is why some people prefer to say "interdependent equation models" when talking of simultaneous equation models.

14.1 An Equilibrium Model of Supply and Demand

14.1.1 The Model

The first example of a simultaneous equation model is a model of partial equilibrium of supply and demand for a single good on a single market, i.e. of equilibrium by price adjustment in a context of perfect competition. To start with, the model can be written as:


$$
\begin{cases}
q_t^d = a + b p_t + u_t & \text{(demand)} \\
q_t^0 = c + d p_t + e z_t + v_t & \text{(supply)} \\
q_t^d = q_t^0 & \text{(equilibrium condition)}
\end{cases}
$$

where

$q_t^d$ : quantity of the good demanded
$q_t^0$ : quantity of the good supplied
$p_t$ : price of the good
$z_t$ : an observable variable influencing the supply(42)
$a, b, c, d, e$ : unknown coefficients to be estimated

The variables explained by the model, in other words the endogenous variables, are $q^d$, $q^0$ and $p$. The explanatory or exogenous variable is $z$; it is really the only one in this model, apart from the unit-valued variable associated with the constant terms. Now, $u$ and $v$ are random disturbance terms on which we make the following assumptions:

$$E(u_t) = E(v_t) = 0, \qquad E(u_t^2) = \sigma_u^2, \qquad E(v_t^2) = \sigma_v^2, \qquad E(u_t v_t) = \sigma_{uv},$$
$$E(u_t u_{t'}) = E(v_t v_{t'}) = 0 \ (t \neq t'), \qquad u \text{ and } v \text{ independent of } z.$$

The above equations, together with these assumptions, constitute the structural form of our model. However, in order to simplify notation, we make the following harmless simplifications:

- the constant terms $a$ and $c$ are omitted (we know that, for OLS, this amounts to assuming that the observations are centered);
- the equality between supply and demand is eliminated by introducing a new variable $q$, the quantity exchanged in the market at the equilibrium price (whose existence is postulated by the third equation); in other words, we write:

$$q_t^d = q_t^0 = q_t$$

(42) For example, for an agricultural product like wheat or beetroot, the subsidies (which influence the cultivated area) or the weather (which influences the yield per hectare).


Thus the "structural form" (S.F.) that we want to estimate can be written as:

S.F.
$$
\begin{cases}
q_t = b p_t + u_t \\
q_t = d p_t + e z_t + v_t
\end{cases}
$$

where the endogenous variables are $q$ and $p$, and the exogenous variable is $z$. The assumptions regarding the error terms remain unchanged. Now, let us express the endogenous variables $q$ and $p$ in terms of the exogenous variable $z$ and the errors $u$ and $v$. We will see that this is a natural way of proceeding if we are interested in the statistical properties of the estimators considered later. Doing so, we obtain the reduced form (R.F.), for $b \neq d$:

R.F.
$$
\begin{aligned}
p_t &= \frac{e}{b-d}\, z_t + \frac{1}{b-d}\,(v_t - u_t) \\
q_t &= \frac{be}{b-d}\, z_t + \frac{1}{b-d}\,(b v_t - d u_t)
\end{aligned}
$$
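To make this data-generating process concrete, here is a minimal simulation sketch (hypothetical parameter values; numpy is assumed to be available) that draws equilibrium pairs $(p_t, q_t)$ from the reduced form above. It makes visible that $p_t$ is built from both disturbances, which is the source of the difficulties discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
b, d, e = -1.0, 0.5, 1.0            # hypothetical structural coefficients (b != d)
z = rng.normal(size=T)               # exogenous supply shifter
u = rng.normal(scale=1.0, size=T)    # demand disturbance
v = rng.normal(scale=0.5, size=T)    # supply disturbance

# Reduced form: equilibrium price and quantity
p = (e * z + v - u) / (b - d)
q = (b * e * z + b * v - d * u) / (b - d)

# Unlike a regressor in a linear model, p is correlated with the demand disturbance u
print(np.corrcoef(p, u)[0, 1])
```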

14.1.2 Immediate Consequences

Can we assume that

$$E(u_t \mid p_t) = 0$$

in the structural form of our supply-demand model? The answer is no: by eliminating $q_t$ from the two equations, we obtain

$$u_t = (d - b) p_t + e z_t + v_t$$

and hence

$$E(u_t \mid p_t) = (d - b) p_t + e\, E(z_t \mid p_t) + E(v_t \mid p_t),$$

i.e., taking account of the exogeneity of $z_t$,

$$E(u_t \mid p_t) = (d - b) p_t + e\, E(z_t) + E(v_t \mid p_t).$$

This equality should hold for all $p_t$, yet the right-hand side has no reason to be zero in general. Thus, in spite of its apparent resemblance to one, it is not right to consider the first equation as a linear model.(43) One can therefore expect that the ordinary least squares estimators do not possess the interesting statistical properties that they have in linear models. For the first equation of our model, the OLS estimator $\hat{b}$ of $b$ is:

$$\hat{b} = \frac{\sum p_t q_t}{\sum p_t^2} = \frac{\text{Numerator}}{\text{Denominator}}$$

(where the index of summation is $t$).

Replacing $p_t$ and $q_t$ by their respective expressions given by the reduced form, in the numerator $N$ and the denominator $D$ of $\hat{b}$, we get

$$(b-d)^2 N = b e^2 \sum z_t^2 - e(d+b)\sum z_t u_t + 2be \sum z_t v_t + b \sum v_t^2 - (b+d)\sum u_t v_t + d \sum u_t^2$$

$$(b-d)^2 D = e^2 \sum z_t^2 - 2e \sum z_t u_t + 2e \sum z_t v_t + \sum v_t^2 - 2\sum u_t v_t + \sum u_t^2$$
Thus, $\hat{b}$ is a ratio of two non-independent quadratic forms in $u_t$ and $v_t$; hence $E(\hat{b})$ cannot be calculated without additional assumptions regarding the joint distribution of $(u, v)$ and, in general, $\hat{b}$ is biased.(44)

(43) One can verify, even more, that $u$ and $p$ cannot be independent according to the model itself. For this, let us calculate the covariance between $u_t$ and $p_t$, which is equal to $E(u_t p_t)$ (since it is assumed that $E(u_t) = 0$). From the reduced form,
$$E(u_t p_t) = \frac{e}{b-d} E(u_t z_t) + \frac{1}{b-d} E(u_t v_t) - \frac{1}{b-d} E(u_t^2) = \frac{\sigma_{uv} - \sigma_u^2}{b-d}$$
(as, by assumption, $u$ and $z$ are independent and $E(u_t) = 0$). This last expression is, in general, different from zero, which implies the non-independence of $u$ and $p$.

(44) Is it possible that $\hat{b}$ is an unbiased estimator even though one cannot prove it? If we are ready to make additional assumptions regarding the distribution of the disturbances (for example, the normality assumption), then we can devise ways of calculating approximately the expectation of $\hat{b}$ for a finite sample. One can then verify that $\hat{b}$ is in fact biased, as we will see when we deal with the small sample properties.

14.1.3 Inconsistency (or asymptotic bias) of least squares estimators

By making some simple asymptotic assumptions, we can study the asymptotic properties of $\hat{b}$ in detail. We assume that, when $T \to \infty$,

$$\frac{1}{T}\sum z_t^2 \xrightarrow{p} M_{zz} > 0, \qquad \frac{1}{T}\sum z_t u_t \xrightarrow{p} 0, \qquad \frac{1}{T}\sum z_t v_t \xrightarrow{p} 0,$$
$$\frac{1}{T}\sum u_t^2 \xrightarrow{p} \sigma_u^2, \qquad \frac{1}{T}\sum v_t^2 \xrightarrow{p} \sigma_v^2, \qquad \frac{1}{T}\sum u_t v_t \xrightarrow{p} \sigma_{uv}.$$

These assumptions concern, in particular, the exogeneity of $z$, and express the (weak) law of large numbers on the convergence of the empirical moments to their corresponding theoretical moments. By virtue of Slutsky's theorem, we can then write:

$$\operatorname{plim} \hat{b} = \frac{\operatorname{plim} N/T}{\operatorname{plim} D/T} = \frac{b e^2 M_{zz} + b\sigma_v^2 - (b+d)\sigma_{uv} + d\sigma_u^2}{e^2 M_{zz} + \sigma_v^2 - 2\sigma_{uv} + \sigma_u^2}$$

or

$$\operatorname{plim} \hat{b} - b = \frac{(d-b)(\sigma_u^2 - \sigma_{uv})}{e^2 M_{zz} + \sigma_v^2 - 2\sigma_{uv} + \sigma_u^2}.$$

We call this last expression the "inconsistency" of $\hat{b}$,(45) since it shows that, in general, the OLS estimator $\hat{b}$ is not consistent, except in the strange case $\sigma_{uv} = \sigma_u^2$, which is difficult to interpret (the case $d = b$ would lead to a degenerate model, as it implies $e = 0$ and $u = v$).(46)

By carefully examining the formula for $\operatorname{plim} \hat{b} - b$, one can say more about the asymptotic behaviour of $\hat{b}$. Indeed, we see that:

- the greater the value of $M_{zz}$ (ceteris paribus), the weaker the inconsistency, i.e. the closer $\operatorname{plim} \hat{b}$ is to $b$. In other words, the stronger the "signal" given by the exogenous variable compared to the "noise" of the disturbances, the closer $\operatorname{plim} \hat{b}$ is to $b$;
- since the denominator of $\operatorname{plim} \hat{b} - b$ is positive, the sign of $\operatorname{plim} \hat{b} - b$ is the sign of $(d-b)(\sigma_u^2 - \sigma_{uv})$.

(45) Sometimes the term "asymptotic bias" is preferred to "inconsistency", but the former may lead to some ambiguity, to the extent that it is not necessarily the limit of the exact bias (which can even be infinite!).

(46) Coming back to the model, one may be surprised that $\hat{b}$ is not consistent even in the case $d = 0$, in which the simultaneity is much less apparent. This situation will be clarified when studying recursive models. But, already now, the reader is invited to verify that the estimator $\hat{b} = \sum q_t^2 / \sum p_t q_t$ is, in this case, a consistent estimator of $b$ if $\sigma_{uv} = 0$.
Further, one can make use of the a priori ideas of the economist concerning the signs of the true coefficients $d$ and $b$: in the case of a "normal" economic good, one expects that $b < 0$ (demand decreases when price increases) and $d > 0$ (supply increases with price). Under these additional assumptions:

$$\text{sign of } (\operatorname{plim} \hat{b} - b) = \text{sign of } (\sigma_u^2 - \sigma_{uv}).$$

If one also adds assumptions concerning the second-order moments of the disturbances, $\sigma_u^2$, $\sigma_v^2$, $\sigma_{uv}$, one can be even more precise:

- if $\sigma_{uv} < 0$, i.e. if the factors that are omitted from the model and represented by the disturbances $u$ and $v$ have opposite effects on supply and demand, then $\operatorname{plim} \hat{b} - b > 0$. Since $b$ is supposed to be negative, we say that, in this case, $\hat{b}$ is "asymptotically biased towards zero";
- if $\sigma_{uv} = 0$, i.e. if the random errors in the supply and demand equations are independent, then
$$\operatorname{plim} \hat{b} - b = (d - b)\,\frac{1}{1 + (e^2 M_{zz} + \sigma_v^2)/\sigma_u^2},$$
i.e. not only $\operatorname{plim} \hat{b} - b > 0$ but also $\operatorname{plim} \hat{b} - b < d - b$, or $b < \operatorname{plim} \hat{b} < d$. Moreover, in this same case:
  - if $\sigma_u^2$ is very small compared to $e^2 M_{zz} + \sigma_v^2$, i.e. if the random elements of demand are relatively very small compared to those of supply (as for certain agricultural products, though our model is quite simple!), then $\operatorname{plim} \hat{b} \simeq b$;
  - if, on the other hand, $e^2 M_{zz} + \sigma_v^2$ is very small compared to $\sigma_u^2$, i.e. if the variations of supply, both those due to the random elements affecting supply and those due to the exogenous variable $z$, are very small compared to the random fluctuations of demand, then $\operatorname{plim} \hat{b} \simeq d$.

In practice, one should clearly state the various assumptions, so that the degree of plausibility of each of them can be carefully examined. This is not a very easy task, especially as regards the second-order moments of the disturbances.
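As a rough numerical check of the inconsistency formula, the following sketch (hypothetical parameter values; numpy assumed) compares the OLS estimate $\hat{b} = \sum p_t q_t / \sum p_t^2$ on a large simulated sample with the theoretical limit $b + (d-b)(\sigma_u^2 - \sigma_{uv})/(e^2 M_{zz} + \sigma_v^2 - 2\sigma_{uv} + \sigma_u^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000
b, d, e = -1.0, 0.5, 1.0                 # hypothetical coefficients: b < 0, d > 0
sig_u, sig_v, sig_uv = 1.0, 0.7, 0.0     # second-order moments of (u, v)
Mzz = 2.0                                # limit of (1/T) * sum(z_t**2)

z = rng.normal(scale=np.sqrt(Mzz), size=T)
u = rng.normal(scale=sig_u, size=T)      # u and v drawn independently, so sig_uv = 0
v = rng.normal(scale=sig_v, size=T)

p = (e * z + v - u) / (b - d)
q = (b * e * z + b * v - d * u) / (b - d)

b_ols = (p @ q) / (p @ p)                # OLS on the demand equation (no constant)
incons = (d - b) * (sig_u**2 - sig_uv) / (e**2 * Mzz + sig_v**2 - 2 * sig_uv + sig_u**2)
print(b_ols, b + incons)                 # the two numbers should be close for large T
```

With these values the OLS estimate settles between $b$ and $d$, as the text predicts for the case $\sigma_{uv} = 0$.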
14.1.4 Graphical Illustration

The previous results can be clearly illustrated by means of a graph. In the $(p, q)$ plane, an equilibrium point $(p_t, q_t)$ is situated in the neighbourhood of the point of intersection of the two lines $q = bp$ and $q = dp + e z_t$. The nature of this "neighbourhood" depends on the assumptions made concerning $u$ and $v$.(47)

[Figure]

The above graph makes it possible to illustrate some of the cases discussed earlier (cum grano salis, the least-squares regression line of $q$ on $p$ is also represented):

[Figure]

We can think of other possibilities of this type. In fact, we will come back to the use of graphs later, when we discuss the identifiability of the parameters of the model.

14.1.5 Estimation of the parameter b of the demand equation

Since the OLS estimator $\hat{b}$ is, in general, not consistent, we will have to find another estimator for $b$. Let us rewrite the reduced form as follows:

$$p_t = \alpha z_t + \varepsilon_t \quad\text{with}\quad \alpha = \frac{e}{b-d}, \qquad \varepsilon_t = \frac{1}{b-d}(v_t - u_t)$$

$$q_t = \gamma z_t + \eta_t \quad\text{with}\quad \gamma = \frac{be}{b-d}, \qquad \eta_t = \frac{1}{b-d}(b v_t - d u_t)$$

Keeping in mind the assumptions concerning the disturbances, it is natural, a priori, to estimate $\alpha$ and $\gamma$ by applying OLS separately to each of the two equations:

$$\alpha^* = \frac{\sum z_t p_t}{\sum z_t^2}, \qquad \gamma^* = \frac{\sum z_t q_t}{\sum z_t^2}.$$

These estimators $\alpha^*$ and $\gamma^*$ are in fact BLUE, since the reduced form is a system of two stacked regressions in which the explanatory variables are the same. Further, they are consistent, given the assumptions on limits made earlier:

$$\operatorname{plim} \alpha^* = \alpha = \frac{e}{b-d}, \qquad \operatorname{plim} \gamma^* = \gamma = \frac{be}{b-d}.$$

By virtue of Slutsky's theorem, it is therefore obvious that the estimator

$$b^* = \frac{\gamma^*}{\alpha^*} = \frac{\sum z_t q_t}{\sum z_t p_t}$$

is a consistent estimator of $b$:

$$\operatorname{plim} b^* = \frac{\operatorname{plim}\gamma^*}{\operatorname{plim}\alpha^*} = \frac{\gamma}{\alpha} = b$$

(we can also verify that $\operatorname{plim} b^* = b$ directly, starting from the formula $b^* = \sum z_t q_t / \sum z_t p_t$ and following the same procedure as we used for calculating $\operatorname{plim} \hat{b}$).

We will see later that this method of estimation, called "indirect least squares", has a particularly simple interpretation in the general framework of the "instrumental variables" method of estimation. It provides a satisfactory way of estimating the parameters of a "first-order just-identified" structural equation. Before explaining this last term more precisely, let us briefly mention the problem posed by the non-existence of the moments of $b^*$. This estimator is a ratio of two linear forms that are not independent of the disturbances. One can therefore easily imagine, particularly in the case of normality, that the expectation of this ratio is infinite. We will come back to this problem later.

(47) Note that, in order to correctly interpret the preceding graphs, it should be recalled that the absence of a constant term in the model means that the variables are measured as deviations from sample means, resulting in both positive and negative quantities and prices.
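A minimal sketch of the indirect least squares estimator $b^* = \sum z_t q_t / \sum z_t p_t$, on data simulated as in the previous sketch (hypothetical values; numpy assumed). Unlike the OLS estimate, it settles near the true $b$ as $T$ grows.

```python
import numpy as np

def simulate_market(T, b, d, e, sig_u, sig_v, seed=2):
    # Draws equilibrium (p, q, z) from the reduced form of the supply-demand model.
    rng = np.random.default_rng(seed)
    z = rng.normal(size=T)
    u = rng.normal(scale=sig_u, size=T)
    v = rng.normal(scale=sig_v, size=T)
    p = (e * z + v - u) / (b - d)
    q = (b * e * z + b * v - d * u) / (b - d)
    return p, q, z

p, q, z = simulate_market(T=200_000, b=-1.0, d=0.5, e=1.0, sig_u=1.0, sig_v=0.7)

b_ols = (p @ q) / (p @ p)   # inconsistent
b_ils = (z @ q) / (z @ p)   # indirect least squares: ratio of the two reduced-form slopes
print(b_ols, b_ils)          # b_ils should be close to the true value -1.0
```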
14.1.6 On the identifiability of the parameters of the model

If we have talked about the estimation of $b$, and not that of $d$ or $e$, up to now, it is not only for reasons of simplicity in the writing of formulas. There are more profound logical reasons which, in the context of our model, in general make the estimation of $d$ and $e$ irrelevant. In this chapter, we are going to present an intuitive introduction to the identification problem in a simultaneous equation model. This problem will be treated in a more general and detailed manner in the next chapter.

Let us use the graphical technique once more, and represent several equilibrium points $(p_t, q_t)$, observed at different dates $t = 1, 2, 3, 4$, in the $(p, q)$ plane:

[Figure]

We find that the points 1, 2, 3, 4 are situated "around" the demand line, and that the greater the number of points, the better the demand line $q = bp$ will be "surrounded". Thus one can imagine that, with enough points, the demand line can be located without any ambiguity: in this case, we will say that it is "identifiable". On the other hand, due to the presence of the exogenous variable $z$, the different equilibrium points generally give no indication of the slope of the supply line in the $(p, q)$ plane. We say that the supply equation is not identifiable.

In order to make the above reasoning more rigorous, one should consider two related points: (i) the maximum information that the observations can contain (however numerous they are), and (ii) the questions raised when estimating the structural form. The identification problem consists in determining whether the information of (i) is sufficient for answering the questions of (ii), i.e. whether those questions are pertinent or not. At best, the observations can give us the joint probability distribution of the endogenous variables $p$ and $q$ conditional upon the exogenous variable $z$. Thus the identification problem consists in examining whether it is possible to go back, without any ambiguity, from the parameters of the joint distribution of $p$ and $q$ given $z$ to the parameters of the structural form. If there are several ways of going from the parameters of the joint distribution of the endogenous variables to certain structural form parameters, this means that these structural parameters are not identifiable, as there are several different structural values compatible with the same joint distribution.(48)

Note that the assumptions made on the second-order moments of the disturbances guarantee the existence of the second-order moments of the joint distribution of the endogenous variables. Now, these are totally described by the first- and second-order parameters of the reduced form. Hence, if we limit ourselves to the moments of the first two orders of the joint distribution of the endogenous variables (or if this joint distribution is fully characterised by its first and second order moments, as is the normal distribution), then the study of the identifiability of the structural parameters boils down to that of the passage from the first- and second-order parameters of the reduced form to those of the structural form. In practice, one always limits oneself to this discussion of identifiability "at the first and second order levels".

In an even more restricted perspective, if we consider only the coefficients of the structural form (which are $b$, $d$, $e$ in our example), then the identifiability problem at the first-order level is reduced to the study of the transformation which gives these coefficients from the first-order parameters of the reduced form (i.e. $\alpha$ and $\gamma$, in our case). Now, we have seen that

$$\alpha = \frac{e}{b-d}, \qquad \gamma = \frac{be}{b-d}.$$

It is obvious that this system of two equations, one linear and the other non-linear, in 3 unknowns $(b, d, e)$ has, in general, an infinite number of solutions. However, by writing the system as(*)

$$\alpha b - \alpha d - e = 0$$
$$(\gamma - e)\, b - \gamma d = 0$$

we note that there is only one solution for $b$, namely

$$b = \frac{\gamma}{\alpha},$$

but there are infinitely many solutions for $d$ and $e$, given by

$$\alpha d + e = \gamma.$$

We therefore conclude that the first-order parameters of the supply equation are not identifiable at the first-order level, whereas the first-order parameters of the demand equation are just-identified at the first-order level. It should be mentioned that even if the phrase "at the first-order level" is often omitted in practice, it is necessary to specify that only the information concerning the first-order moments of the endogenous variables in the reduced form is taken into account. If we want to consider other information, e.g. information relative to the second-order moments of the disturbances, then structural parameters of the first order that were not identifiable at the first-order level may become so. We have already seen an example of this, in which $\sigma_{uv} = 0$, together with the relative magnitudes of $\sigma_u^2$ and $\sigma_v^2 + e^2 M_{zz}$, can make the supply equation identifiable and not the demand one.(49)

Let us finally add that it is also possible to study the identification of the second-order parameters of the structural form ($\sigma_u^2$, $\sigma_v^2$ and $\sigma_{uv}$ in our case) from the second-order parameters of the reduced form. This point will be developed when we discuss the identification problem in general.

(48) We also say in this case that the different structures are equivalent from the point of view of the observations. We come across this type of situation in a linear model when there is (strict) multicollinearity.

(*) If we examine the system carefully, we find that the "Jacobian" of the transformation is
$$\frac{\partial(\alpha, \gamma)}{\partial(b, d, e)} = \frac{1}{(b-d)^2}\begin{pmatrix} -e & e & b-d \\ -ed & eb & b(b-d) \end{pmatrix}$$
of which one minor is identically equal to zero.
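A small numerical illustration of this just-identification argument may help; the parameter values below are hypothetical, chosen only to make the arithmetic visible. From the reduced-form coefficients $\alpha$ and $\gamma$, $b = \gamma/\alpha$ is pinned down, while several pairs $(d, e)$ satisfy $\alpha d + e = \gamma$ and are therefore indistinguishable.

```python
# Hypothetical "true" structure
b_true, d_true, e_true = -1.0, 0.5, 1.0
alpha = e_true / (b_true - d_true)
gamma = b_true * e_true / (b_true - d_true)

print(gamma / alpha)                  # recovers b_true = -1.0 uniquely

# Two different supply parameterisations compatible with the same (alpha, gamma):
for d in (0.5, 2.0):
    e = gamma - alpha * d             # any (d, e) on the line alpha*d + e = gamma works
    print(d, e, e / (b_true - d), b_true * e / (b_true - d))  # same alpha and gamma
```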
(49) Let us just note that, in this case, we speak of "identifiability at the second order" concerning the supply equation, i.e. identifiability resulting from a priori information relating to second-order moments.

14.2 A Simple Keynesian Model

14.2.1 The model

The simple Keynesian model has an important role to play in the teaching of macroeconomic analysis. In economic theory, one studies the model and its implications regarding economic policies without introducing any random elements and assuming that the coefficients are known. Here, on the contrary, we are only going to look at the econometric problems that this model brings up, and see how we can estimate its parameters once we explicitly add a random disturbance to the consumption equation.(50)

(50) More detailed and instructive information can be found in Malinvaud (1978, pp. 124-135) and in the article by Haavelmo reproduced in Hood and Koopmans (1953, pp. 75-91). This is a simple model in the sense that it does not contain any external sector, public expenditure, money, financial operations, etc.

The most elementary version of this model consists of a consumption function and an (ex post) accounting identity between resources and expenditures. This model therefore appears to be simpler than the supply-demand equilibrium model, since it has only one equation with parameters to be estimated. On the other hand, it has an identity equation (i.e. an equation with known coefficients and without any disturbance term; the coefficients are all equal to 1 here), which leads to certain particularities concerning the estimation of the model. This will be discussed in a later chapter. On the whole, we write
$$C_t = a + b R_t + u_t, \qquad t = 1, \ldots, T$$
$$R_t = C_t + I_t$$

where $C_t$ = total consumption in period $t$, $R_t$ = total income, and $I_t$ = "autonomous investment". The variables "explained" by the model (the dependent variables) are $C$ and $R$; the only real explanatory variable (apart from the unit-valued variable $e$ implicitly associated with the constant term) is $I$. This variable $I$ plays the same role as the usual explanatory variables in a linear model. It is assumed that $u$ is a random disturbance term with the following properties:

$$E(u_t) = 0, \qquad E(u_t^2) = \sigma^2, \qquad E(u_t u_{t'}) = 0 \ (t \neq t'), \qquad u \text{ independent of } I.$$

In the terminology of simultaneous equation models, $C$ and $R$ are called endogenous variables, $I$ is called the exogenous variable (as is $e$), and all the above information constitutes the "structural form" of the model. Thus, the structural form consists not only of the equations but also of the list of endogenous and exogenous variables, as well as the assumptions on the disturbances. If we express the endogenous variables $C$ and $R$ in terms of the exogenous ones, $I$ and $e$ (the dummy variable with unit value), and the disturbance $u$, we get the reduced form, for $b \neq 1$:

R.F.
$$
\begin{aligned}
R_t &= \frac{a}{1-b} + \frac{1}{1-b}\, I_t + \frac{1}{1-b}\, u_t \\
C_t &= \frac{a}{1-b} + \frac{b}{1-b}\, I_t + \frac{1}{1-b}\, u_t
\end{aligned}
$$
Just like the structural form, the reduced form consists not only of the equations but also of the set of assumptions on the disturbances.
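As a concrete aside, here is a minimal simulation sketch of this reduced form (hypothetical parameter values; numpy assumed). It makes visible that $R_t$ and $u_t$ are positively correlated, which is what invalidates $E(u_t \mid R_t) = 0$ in the next subsection.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
a, b, sigma = 10.0, 0.8, 1.0            # hypothetical: marginal propensity to consume b < 1
I = rng.uniform(5.0, 15.0, size=T)      # exogenous "autonomous investment"
u = rng.normal(scale=sigma, size=T)

# Reduced form of the simple Keynesian model
R = (a + I + u) / (1 - b)
C = (a + b * I + u) / (1 - b)

print(np.allclose(R, C + I))            # the accounting identity holds by construction
print(np.corrcoef(R, u)[0, 1])          # R is correlated with u: not a regression model
```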
14.2.2 Immediate consequences

What are the immediate consequences of the above specification of the structural form? More specifically, is the crucial assumption

$$E(u_t \mid R_t) = 0$$

valid in the consumption equation, in which case that equation becomes a simple regression model? We are going to show that it is not. For this, let us substitute the identity into the consumption equation, thus eliminating $C_t$, and solve for $u_t$:

$$u_t = (1 - b) R_t - I_t - a$$

so that

$$E(u_t \mid R_t) = (1 - b) R_t - E(I_t \mid R_t) - a.$$

Since $I_t$ is an exogenous variable, i.e. one which is not explained by the model (the term "autonomous" is in fact meant in this sense),(51) we have $E(I_t \mid R_t) = I_t$ and hence

$$E(u_t \mid R_t) = (1 - b) R_t - I_t - a = C_t - b R_t - a = u_t,$$

so that $E(u_t \mid R_t)$ is in general different from zero (for almost all $t$), as the consumption function is not a deterministic one.

The same result can be expressed in a different way. It is not possible to write $E(C_t \mid R_t) = a + b R_t$ when $R_t = C_t + I_t$ and $I_t$ is exogenous. Actually, from the identity we derive that $E(C_t \mid R_t) = R_t - I_t$, which cannot be equal to $a + b R_t$ for all $R_t$, except in the case in which $I_t = \text{constant} = -a$ and $b = 1$. Therefore, the fundamental assumption of a simple linear model is not satisfied, and hence the consumption equation of a simple Keynesian model cannot be classified as a simple linear regression model, even if it appears so.
(51) By assuming investment to be autonomous, we think of the Keynesian model essentially as a short-term one, as investment expenditure is mostly predetermined in the short run. (There is generally a period of several years between the date at which one decides to launch a new model of a car or a plane, or to install a thermal or nuclear power station, and the date at which the corresponding investments in buildings and materials are actually undertaken.)
What, then, is the validity of the ordinary least squares estimators for estimating the coefficients of the consumption function? We expect that they do not, in general, possess interesting properties, since the fundamental assumption of the linear model does not hold. Let us verify this directly in the case of the coefficient $b$ (the marginal propensity to consume). The ordinary least squares estimator of $b$ is:

$$\hat{b} = \frac{\sum (C_t - \bar{C})(R_t - \bar{R})}{\sum (R_t - \bar{R})^2}$$

where the sum is over $t = 1, \ldots, T$.

We can easily calculate $(R_t - \bar{R})$ and $(C_t - \bar{C})$ from the reduced form. By substituting them in the formula for $\hat{b}$, we obtain

$$\hat{b} = \frac{b \sum (I_t - \bar{I})^2 + (1+b) \sum (I_t - \bar{I})(u_t - \bar{u}) + \sum (u_t - \bar{u})^2}{\sum (I_t - \bar{I})^2 + 2 \sum (I_t - \bar{I})(u_t - \bar{u}) + \sum (u_t - \bar{u})^2}.$$

While in the simple regression model the disturbances $u_t$ appear only in the numerator of the corresponding expression, and then only in a linear form,(52) we see that here they appear both in the numerator and in the denominator, and in a non-linear manner. It is therefore not generally possible to calculate $E(\hat{b})$ explicitly and rigorously. Hence we cannot say that $\hat{b}$ is an unbiased estimator.(53)

(52) In the simple regression model $y_t = a + b x_t + u_t$, we indeed have:
$$\hat{b} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} = \frac{b \sum (x_t - \bar{x})^2 + \sum (x_t - \bar{x})(u_t - \bar{u})}{\sum (x_t - \bar{x})^2} = b + \frac{\sum (x_t - \bar{x}) u_t}{\sum (x_t - \bar{x})^2}.$$

(53) Refer to note 5 above.

14.2.3 Inconsistency of ordinary least squares estimators

The non-linearity of the OLS estimator $\hat{b}$ in terms of the disturbances makes it practically infeasible to work with the expectation operator, which is a linear one. Thus, in order to be more explicit, we will use the notion of probability limit and show that $\hat{b}$ is not a consistent estimator. For this, we need to make additional assumptions concerning the asymptotic behaviour of $I_t$ and $u_t$. We will thus assume that, as $T \to \infty$,

$$\frac{1}{T}\sum (I_t - \bar{I})^2 \xrightarrow{p} M_{II} > 0, \qquad \frac{1}{T}\sum (u_t - \bar{u})^2 \xrightarrow{p} \sigma^2 > 0, \qquad \frac{1}{T}\sum (I_t - \bar{I})(u_t - \bar{u}) \xrightarrow{p} 0.$$

These assumptions are quite straightforward: the first one consists in giving an asymptotic meaning to the dispersion of $I_t$; the other two are the result of a classical "law of large numbers", by which the empirical moments converge almost surely (and hence in probability) to the corresponding theoretical moments. Note that the theoretical "covariance" between $I$ and $u$ is zero by definition, as $I$ is an exogenous variable. Given these assumptions, the plim of $\hat{b}$ can be calculated by dividing the numerator and the denominator by $T$ and then using Slutsky's theorem (on the probability limits of continuous functions of random variables); we thus obtain:

$$\operatorname{plim} \hat{b} = \frac{b M_{II} + \sigma^2}{M_{II} + \sigma^2},$$

which shows that $\hat{b}$ is not a consistent estimator of $b$. We can go further and study the "inconsistency" of $\hat{b}$, i.e. the difference between $\operatorname{plim} \hat{b}$ and $b$. We have:

$$\operatorname{plim} \hat{b} - b = (1 - b)\,\frac{1}{1 + M_{II}/\sigma^2}.$$

We can immediately draw the following conclusions:

- Since $1/(1 + M_{II}/\sigma^2) > 0$, $\operatorname{plim} \hat{b} - b$ has the same sign as $1 - b$.
- If $\sigma^2$ is very small compared to $M_{II}$, then $\operatorname{plim} \hat{b} \simeq b$. This is quite natural since, in this case, the disturbances "disturb" only in a negligible manner the influence of the income variable $R$, at least asymptotically speaking. In other words, the "signal" predominates over the "noise".(54)
- On the other hand, if $\sigma^2$ is very big compared to $M_{II}$, then $\operatorname{plim} \hat{b} \simeq 1$: the "noise" caused by the disturbances completely masks the information provided by the variable $R$ in the explanation of consumption, and what we estimate asymptotically is simply the accounting identity $C = R - I$.

We can be even more explicit if we are ready to make further assumptions on the (unknown) numerical value of the parameter $b$. Recalling that it can be interpreted economically as the marginal propensity to consume, it may be justified to assume, a priori, that $b < 1$ and hence $1 - b > 0$. In this case,

$$0 < \operatorname{plim} \hat{b} - b < 1 - b, \quad\text{as}\quad \operatorname{plim} \hat{b} - b = \frac{1 - b}{1 + M_{II}/\sigma^2}.$$

Equivalently,

$$b < 1 \ \Longrightarrow\ b < \operatorname{plim} \hat{b} < 1.$$

We say in this case that $\hat{b}$ "over-estimates" $b$ asymptotically.(55) To sum up, we can learn a lot about the asymptotic behaviour of the ordinary least squares estimator, at almost no cost, in this simple model.(56)

(54) Already in the simple linear model $y = a + bx + u$, the smaller $\sigma^2$ is with respect to $\sum (x_t - \bar{x})^2$, the smaller is the variance of the OLS estimator $\hat{b}$ in finite samples.
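A rough numerical check of this over-estimation result (hypothetical values; numpy assumed): the OLS slope of $C$ on $R$ in a large simulated sample is compared with the theoretical limit $(b M_{II} + \sigma^2)/(M_{II} + \sigma^2)$ and with $b$ itself.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500_000
a, b, sigma = 10.0, 0.8, 2.0
I = rng.uniform(5.0, 15.0, size=T)      # hypothetical exogenous investment
u = rng.normal(scale=sigma, size=T)
R = (a + I + u) / (1 - b)
C = (a + b * I + u) / (1 - b)

# OLS slope of C on R (with a constant), written out explicitly
Cc, Rc = C - C.mean(), R - R.mean()
b_ols = (Cc @ Rc) / (Rc @ Rc)

M_II = I.var()                          # dispersion of I (uniform on [5, 15]: 100/12)
plim_b = (b * M_II + sigma**2) / (M_II + sigma**2)
print(b_ols, plim_b)                    # close to each other, above the true b = 0.8, below 1
```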
14.2.4 Graphical Illustration

As our Keynesian model is sufficiently simple, it is possible to illustrate the above results on the asymptotic properties of ordinary least squares by means of a graph. Such an illustration obviously cannot serve as a proof (if only because it is not possible to represent an infinite number of points in a graph!), but it can help us to understand better what we have already established.

In the $(R, C)$ plane, a typical observation represented by a couple $(R_t, C_t)$ necessarily lies on the line $C_t = R_t - I_t$ for the given $I_t$. The disturbance $u_t$ is therefore the vertical distance between the observation point $(R_t, C_t)$ and the line $C = a + bR$. Moreover, for a given $I_t$, the initial assumptions on the disturbances ($E(u_t) = 0$, etc.) imply that the observed points corresponding to different drawings of $u_t$ will be distributed on either side of the line $C = a + bR$. Further, they will be distributed symmetrically along the line $C_t = R_t - I_t$ if the distribution of $u_t$ is symmetric around zero (which we will assume for the sake of simplicity). We will also assume that the distribution is uniform in an interval whose length is determined by the assumption $E(u_t^2) = \sigma^2$.(57) Finally, the "cloud of points" will be, at least asymptotically, of a particular form, more or less a parallelogram, such that the OLS regression line $\tilde{C} = \tilde{a} + \hat{b} R$ has a greater slope than the line $C = a + bR$. (The former line is obtained by minimising the sum of squares of the vertical distances, and the latter is the long axis of the parallelogram.)

[Figure]

If we assume, in a slightly abusive manner, that the width of the parallelogram measures $M_{II}$ and its height $\sigma^2$, the limiting cases mentioned above can be illustrated as below:

[Figure]

(55) Let us just mention that this result is the opposite of the one that we obtain in the case of an errors-in-variables model. Further, if we recall that $1/(1-b)$ is the Keynesian multiplier of investment, we see that it is systematically over-estimated, since $1/(1 - \operatorname{plim}\hat{b}) > 1/(1-b)$. (This may lead to errors in economic policies, if at all this simple model is actually used for policy-making. However, we should not forget that this Keynesian mechanism is at the heart of most current macroeconometric models...)

(56) We can also easily determine the asymptotic properties of the OLS estimator of $a$, namely $\tilde{a} = \bar{C} - \hat{b}\bar{R}$. We then get
$$\operatorname{plim} \tilde{a} = \frac{a\gamma - \lim \bar{I}}{1+\gamma}, \qquad \text{where } \gamma = \frac{M_{II}}{\sigma^2},$$
so that $\operatorname{plim} \tilde{a} < a$ if $a > 0$ and $\frac{1}{T}\sum I_t \to \lim \bar{I} > 0$.
14.2.5 Identification and estimation of parameters

We have seen that the OLS estimators of the structural parameters $a$ and $b$ are not satisfactory. How else, then, can we estimate these parameters, and $\sigma^2$, in this simple Keynesian model?

Let us first note that the fact that we could study, in detail, the "inconsistency" of $\hat{b}$ gives us hope that we could, a priori, correct this inconsistency in order to estimate the coefficient $b$ in an asymptotically more satisfactory way than by OLS. In other words, we expect that it should be possible to determine the true values of $a$ and $b$ with a sufficiently large number of observations, i.e. it should be possible to "identify" $a$ and $b$ using the whole set of possible observations. The simplest way to justify this hope is to exhibit a consistent estimator. Let us see how we can go about getting one. Let us go back to the reduced form:

$$R_t = \frac{a}{1-b} + \frac{1}{1-b}\, I_t + \frac{1}{1-b}\, u_t$$
$$C_t = \frac{a}{1-b} + \frac{b}{1-b}\, I_t + \frac{1}{1-b}\, u_t$$

On examining it once again, we see that each of the two equations represents a simple linear model, since it is assumed that the disturbances $u_t$ have zero mean and are independent of the explanatory variable $I_t$. It is therefore natural to estimate them by OLS.(58) Denoting $\alpha = a/(1-b)$ and $\beta = 1/(1-b)$, we obtain the OLS estimators $\alpha^*$ and $\beta^*$ for the first equation of the reduced form by regressing $R$ on $I$. These estimators are consistent, i.e.

$$\operatorname{plim} \alpha^* = \alpha = \frac{a}{1-b}, \qquad \operatorname{plim} \beta^* = \beta = \frac{1}{1-b}.$$

In the same way as we can obtain $a$ and $b$ from $\alpha$ and $\beta$, we can derive the estimators $a^*$ and $b^*$ from $\alpha^*$ and $\beta^*$:

$$b^* = 1 - \frac{1}{\beta^*}, \qquad a^* = \frac{\alpha^*}{\beta^*}.$$

We thus obtain, say for $b^*$:

$$b^* = 1 - \frac{\sum (I_t - \bar{I})^2}{\sum (I_t - \bar{I})(R_t - \bar{R})} = \frac{\sum (I_t - \bar{I})(C_t - \bar{C})}{\sum (I_t - \bar{I})(R_t - \bar{R})},$$

from which $a^*$ can be easily obtained using $\alpha^* = \bar{R} - \beta^* \bar{I}$.

It is obvious that the unbiasedness of $\alpha^*$ and $\beta^*$ does not imply the unbiasedness of $a^*$ and $b^*$, since the transformation allowing us to go from one set of parameters to the other is not linear. On the other hand, Slutsky's theorem ensures that the consistency property is carried over from one set to the other:

$$\operatorname{plim} b^* = 1 - \frac{1}{\operatorname{plim} \beta^*} = 1 - \frac{1}{\beta} = 1 - (1-b) = b$$
$$\operatorname{plim} a^* = \frac{\operatorname{plim} \alpha^*}{\operatorname{plim} \beta^*} = \frac{\alpha}{\beta} = a.$$

We have therefore succeeded in obtaining consistent estimators of $a$ and $b$. (A consistent estimator of $\sigma^2$ can easily be derived from these.) These estimators $a^*$ and $b^*$ are called the indirect least squares estimators.

Having come to this stage, it is natural to ask the following question: instead of estimating by OLS the parameters of the first equation of the reduced form, we could just as well have estimated those of the second equation and obtained consistent estimators $\gamma^*$ and $\delta^*$, with $\gamma = a/(1-b)$ and $\delta = b/(1-b)$; by following the same procedure as for $\alpha^*$ and $\beta^*$ above, would we have obtained the same estimators $a^*$ and $b^*$? It is easy to verify that the answer is yes (the reader is invited to check the result!). Is this a coincidence? We will see later that, in general, the estimation of the structural parameters of a simultaneous equation model is meaningful only if these parameters are identifiable, i.e. if there is only one way of going back to the structural parameters starting from the reduced form parameters. This is in fact the case here. There are 2 parameters to be estimated in the structural form ($a$ and $b$) and 4 in the reduced form ($\alpha, \beta, \gamma, \delta$); if the latter could, a priori, take any values, then there would be no way of obtaining the structural form from the reduced form: indeed, a system of 4 non-linear equations in 2 unknowns has, in general, no solution. But if we take into account the expressions of $\alpha, \beta, \gamma, \delta$ in terms of $a$ and $b$, we see that $\gamma = \alpha$ and $\delta = \beta - 1$, so that only 2 of the 4 parameters are linearly independent, say $\alpha$ and $\beta$. Hence the transformation from $(\alpha, \beta, \gamma, \delta)$ to $(a, b)$ is unique, and the structural parameters are "identifiable". It is also interesting to note that the estimation of the reduced form by OLS, without taking into account the a priori constraints $\gamma = \alpha$ and $\delta = \beta - 1$, leads to estimators $\alpha^*, \beta^*, \gamma^*, \delta^*$ that still satisfy these constraints (check!). As explained in footnote 58, the reason for this result (which may be surprising at first sight) is that the second equation of the structural form is an accounting identity. We will not insist any further on this point for the moment.

Finally, we can ask ourselves the following question: is it possible to obtain the moments of $a^*$ and $b^*$ from those of $\alpha^*$ and $\beta^*$, which are easy to calculate since the reduced-form equation in which they appear is estimated by an ordinary regression? The answer to this question is much more delicate than one might think a priori, as the transformation from $(\alpha^*, \beta^*)$ to $(a^*, b^*)$ is not linear. In fact, it is not even certain that the moments of $(a^*, b^*)$ exist. We can even show that the distribution of $(a^*, b^*)$ has no moments, or equivalently that the expectations of $a^*$ and $b^*$ are infinite. This is nothing extraordinary, except that it invalidates certain statistical procedures that are based on the existence of moments of a distribution, especially as regards tests. Thus, we are led to consider the asymptotic distribution of $(a^*, b^*)$, which is equally delicate if we are to be rigorous. These different problems (finite sample distributions, asymptotic distributions, tests) will be studied in a general way in various later chapters.

Conclusion

The two classical examples that we have dealt with are simple enough to allow us to carry out and interpret most of the calculations, but at the same time they illustrate most of the problems that arise in the estimation of simultaneous equation econometric models. They also lead to certain common conclusions. The most important conclusion is that, from the moment that there is more than one endogenous variable in at least one of the equations of the system, it is no longer possible to work within the framework of the classical regression model. In the simultaneous context, the disturbances do not play the same role and do not have the same status as in the linear model, in spite of their apparent similarity. An immediate consequence of this is that the least squares method, with its different variants, which was remarkably well adapted to the linear model by virtue of the Gauss-Markov theorem, loses its interesting statistical properties in the case of simultaneous equation models. Nevertheless, this does not mean that the calculation of OLS estimators is completely useless: an econometrician can often acquire information of some interest on the unknown parameters of the model by using OLS, via additional assumptions that reflect the a priori ideas of an economist. Further, this method is easy to implement. Hence, we will devote a brief chapter to a systematic study of the "simultaneity bias" of the OLS estimators in a simultaneous equation model.

Another important conclusion to be drawn from the previous examples is that, prior to any attempt at estimation, it is logically necessary to ensure that the parameters to be estimated are identifiable, that is, that in the set of all possible values of these parameters there is only one which is compatible with the joint distribution of the endogenous variables conditional upon the exogenous variables. This identification problem can be thought of as one related to the uniqueness of the solution of the transformation from the reduced form parameters to the structural form parameters. Because of the logical (and practical) importance of this problem, it will be treated more systematically, and in a wider perspective, in a later chapter.

At this stage, i.e. supposing that the problem of identification of the structural form is clarified, it was possible, in the examples considered, to derive satisfactory estimation methods (at least from an asymptotic point of view) for the simultaneous equations. The most efficient method should obviously take into account all the a priori information that one has, both at the first-order and at the second-order levels. Thus, a major part of the course will be concerned with the detailed study of different estimation methods (there are in fact many of them) and their statistical properties (at least asymptotically), along with comparisons among them in order to choose the best one in each case, in terms of statistical efficiency, of computational difficulty and perhaps of robustness with respect to the most delicate assumptions of the specification in question.

Before proceeding with a general formulation and a full treatment of simultaneous equation models, we would like to come back briefly to the notion of simultaneity itself. In order to resort to a simultaneous equation model for the analysis of an economic phenomenon, it seems that the phenomenon under study should involve some "instantaneous feedback" among observable quantities, which basically results from the decision-making processes of economic agents. One can imagine that the presence of such a simultaneity of action and reaction will depend on the level of aggregation of economic agents and on the length of the unit time period for which observations are available. It is easily understandable, for instance, that a particular phenomenon may be analysed in a "sequential" manner rather than in a "simultaneous" manner if weekly rather than annual data are available. Thus, formally, there is no clear-cut epistemological division between certain simultaneous equation models and certain vector dynamic (autoregressive) models. One should therefore keep in mind that, in practice, using one or the other type of model to represent reality in a necessarily simplified manner requires a prior interrogation about the existence and importance of possible simultaneities. We will come back later to this important point, more technically.

(57) i.e. of length $2\sqrt{3}\,\sigma$.

(58) At this stage, we can consider the two equations of the reduced form as a system of stacked regressions, with the particularity that the constant terms $a/(1-b)$ and the disturbances $u_t/(1-b)$ are the same in both equations. This particularity, which is due to the fact that the second equation of the simple Keynesian model is an identity, may lead to the following question: are there more efficient estimators than OLS applied separately to the first (or the second) equation? The answer is no. We will come back to this point in a chapter on the treatment of identities in general. In any case, the only property of OLS that we use here is consistency, which is obviously ensured by $E(u_t \mid I_t) = 0$ and $\frac{1}{T}\sum (I_t - \bar{I})^2 \xrightarrow{p} M_{II} > 0$.
Chapter 15 Identification

Introduction

The reader may recall that we already came across the notion of identification while discussing the introductory examples in Chapter 14. In this chapter, we are going to formalize the identification problem in the context of "standard" simultaneous equation models, keeping in mind the limitations set by pedagogical needs.

We can approach the identification problem in two ways, the first one being general and the second one more restricted. In the first approach, we base our reasoning on the distribution of the endogenous variables conditional upon the exogenous ones, in order to determine the conditions that enable a set of structural parameters to be compatible with it in a unique manner. In the more restricted second approach, we confine ourselves to the moments of the first two orders of the above distribution. In the latter case, we speak of identification "at the first two order levels" of the structural parameters.
The use of the general approach, based on the conditional distribution of the endogenous variables given the exogenous variables, is justified especially in cases in which it leads to conclusions different from those obtained using the first two moments. In fact, the economist rarely has a priori information on the probability distribution. He is often led to hypothesize that it is normal, and in this case both approaches lead to the same results, as the normal distribution is entirely determined by its first two moments. For this reason, we will not examine in detail the approach based on the probability distribution. Nevertheless, we will mention a few results that follow from this approach in the last section.

This chapter will therefore be mainly devoted to the study of identification based on the first two moments of the conditional distribution of the endogenous variables. Actually, the major part of the chapter will discuss identification at the first order only, i.e. identification based on the conditional expectation of the endogenous variables. There are several reasons for adopting this restricted approach. The first one is of a pedagogical nature: it is easier to get used to the different results on identification if only the first order is examined. Secondly, except for very specific models, the economist does not have any a priori information on the second-order moments of the distribution of the endogenous variables. Finally, it seemed preferable to look at the problem of identification at both the first and the second order levels when examining the models for which it is really important to study the problem in a precise manner. This is especially so in the case of recursive models, which can be perfectly identified at the first and second order levels without being so at the first order only.

Let us just add that, in this chapter, we assume that the models do not contain any identity equations. We will in fact devote a full chapter to models with identity equations, in which we will also discuss the problem of identification of such models. Before discussing the results on identification, we would like to stress once again the fact that it is a logical problem: examining the identification problem in the context of an econometric model amounts to studying its internal coherence prior to the implementation of estimation procedures. In fact, the conclusions that one draws from the identification study have important consequences for the method of estimation.

We will start with a graphical representation of the different possible cases. The graphical approach is instructive but only partial, and hence we will later examine the general identification criteria. These criteria are very useful in dealing with the properties of estimation methods. Moreover, even though identification is essentially a logical problem, its study, which leads to the concept of identifiability of a set of parameters, makes it possible to justify certain estimation methods and to study their statistical properties.
15.1 Graphical Illustration

In Chapter 14, we already examined the identifiability of parameters in two classical examples, namely the simple Keynesian model and the equilibrium model of supply and demand. We may recall that the second model is written as:

$$
\begin{cases}
q^d = a + bp + u & \text{(demand)} \\
q^0 = c + dp + ez + v & \text{(supply)} \\
q^d = q^0 & \text{(equilibrium)}
\end{cases}
$$

and that, in this model, the first equation is identifiable whereas the second one is not, at least in the absence of any a priori information on the elements of the covariance matrix of the disturbances.

We are now going to examine two other equilibrium models of supply and demand. In the first one, neither equation is identified, whereas in the second one both equations are. The first model is written as follows:

$$
\begin{cases}
q^d = a + bp + u & \text{(demand)} \\
q^0 = c + dp + v & \text{(supply)} \\
q^d = q^0 & \text{(equilibrium)}
\end{cases}
$$

Let us represent, in a graph, the different equilibrium points $(p, q)$ observed at different dates $t = 1, 2, 3, \ldots$:

[Figure]

From the graph, it can be noted that the observed points $1, 2, 3, \ldots, t$ are situated around the point of intersection of the two lines $q = c + dp$ and $q = a + bp$. Thus we see that the equilibrium points do not provide any information on the parameters of the supply and demand equations, and this would remain the case even with an infinity of points. We therefore say that the two equations are not identifiable.

Let us now consider the second example:

$$
\begin{cases}
q^d = a + bp + u & \text{(demand)} \\
q^0 = c + dr + v & \text{(supply)} \\
q^d = q^0 & \text{(equilibrium)}
\end{cases}
$$

with $r$ exogenous, and let us graphically represent an equilibrium point $(r_t, q_t, p_t)$:

[Figure]

In this graph, one may see (with a little bit of imagination!) that the observed point $t$ is near the theoretical equilibrium point (for a given $r_t$), such that in the $(q, r)$ and $(q, p)$ planes the observed points $(q_t, r_t)$ and $(q_t, p_t)$ are situated near the lines $q = c + dr$ and $q = a + bp$ respectively. Incidentally, these points $(q_t, r_t)$ and $(q_t, p_t)$ are the projections of point $t$ onto the respective planes. Moreover, if we have enough observation points $t = 1, 2, \ldots$, the corresponding points will surround the above lines, which will enable them to be identified and eventually estimated.

The graphical examples provide an intuitive approach to the identifiability problem. However, in this approach, on the one hand the treatment of the disturbances is very cursory and, on the other, this type of representation becomes impractical once we have four or more variables in the model. Hence we go on to examine the general identification criteria.
15.2 First-order identification

In this section, we will see that identification can be viewed from two different angles. The first way of looking at it is to examine the passage between the structural form and the reduced form and to arrive at certain criteria involving the reduced form. The second approach brings in the notion of "equivalent structures" and leads to criteria involving the
structural form.

Let us briefly recall some notation. Let

$$A y + B x = u$$

denote a set of linear equations, where $y$ denotes the vector of endogenous variables, $x$ the vector of exogenous variables, and $u$ the disturbance vector. Note that the index $t$ is omitted in the above expression. In fact, throughout this chapter we will try, as far as possible, to retain the same notation, i.e. without $t$, in order to stress the point that the identifiability concept is in no way linked to the observations or to the sample. Let us further denote by $A^0$ and $B^0$ the true values of the matrices $A$ and $B$. Thus, we assume that the true model is

$$A^0 y + B^0 x = u.$$

At this stage, it is relevant to add that the economist does have some a priori information on the coefficients in the matrices $A$ and $B$ of the model $Ay + Bx = u$, which can be represented by $J$ constraints of the form:

$$\psi_j(A, B) = 0 \qquad (j = 1, \ldots, J).$$

In most cases, these restrictions are very simple: linear restrictions among certain coefficients of $A$ and $B$, or simply that some coefficients are equal to zero (exclusion restrictions) and that the diagonal elements of $A$ are equal to 1 (normalisation rule). Of course, if the model specification is correct, the true matrices $A^0$ and $B^0$ must satisfy the a priori constraints. In other words, we necessarily have

$$\psi_j(A^0, B^0) = 0 \qquad (j = 1, \ldots, J).$$

Before considering the identification problem in detail, let us make a few remarks on first-order identification.(59) First of all, the fact that we assume that the a priori constraints concern only the matrices $A$ and $B$, and not the variance-covariance matrix $\Sigma$, amounts to ignoring the existence of the disturbances (i.e. reasoning as though $u = 0$). This in turn implies that we ignore the difference in stochastic nature between the endogenous and the exogenous variables. However, from a deterministic point of view, the distinction between the two types of variables does not disappear: it has to be maintained in order to say that the endogenous variables are explained by the exogenous ones. Our study of the passage from the reduced form to the structural form boils down to the question of whether there exists only one set of structural coefficients that expresses this (linear) relationship between the exogenous and the endogenous variables. Thus our present discussion is also useful for economists in the study of the internal coherence of their models, even if they are not concerned with the estimation of parameters.

Now, some definitions. A model is a set of structures $S = (A, B)$ that satisfy a set of a priori restrictions $\psi_j(A, B) = 0$ $(j = 1, \ldots, J)$. The true model is one particular structure, denoted $S^0 = (A^0, B^0)$. It will be convenient to denote a model by $(S, \psi)$. Note that the above definition of a model corresponds to the structural form. To each model $(S, \psi)$ corresponds a matrix $C$ of the reduced form, defined by $-A^{-1}B = C$ with $\psi_j(A, B) = 0$ $(j = 1, \ldots, J)$. Of course, for the true matrix of the reduced form, we have $C^0 = -A^{0\,-1} B^0$.

(59) Throughout this section we assume that the definitions and results concern "the first order". We will therefore not repeat this qualification in each case.

15.2.1 From the Reduced Form to the Structural Form

The Problem and the Definitions. Let us ask ourselves the following question: suppose we know the true matrix $C^0$ of the reduced form; is it possible to derive the matrices $A^0$ and $B^0$ from $C^0$ in a unique manner? In other words, is there a unique structure $S^0 = (A^0, B^0)$ such that $-A^{0\,-1}B^0 = C^0$ (with $\psi_j(A^0, B^0) = 0$)?

At this stage, it is to be noted that the constraints $\psi_j(A, B) = 0$ play a crucial role in the identification of $A$ and $B$. The system $A C^0 + B = 0$ being a system of $GK$ equations in $G^2 + GK$ unknowns, there would be an infinite number of solutions for $A$ and $B$ if there were no additional constraints on the elements of $A$ and $B$.

Finally, let us add that the above question is rather too restrictive, as it concerns the whole
2) We will say that a structural equation is identi…able (at the …rst order level) if and only if all its parameters (of the …rst order) are identi…able. 3) We will say that a model is identi…able (at the …rst-order level) if all its structural equations are identi…able. Note that if we can directly incorporate the constraints Ãj (A; B) = 0 in A and B, then the above system can be written as

A¤ C 0 + B ¤ = 0 We will now illustrate the above de…nition using two examples. Let us …rst consider the structural form : ½

a 11 y1 + a12 y2 + b11 x1 = u1 a 21 y1 + a22 y2 + b21 x1 = u2

with a11 = a22 = 1 and b11 = 0. We have

¤

A =

µ

1 a12 a21 1



¤

B =

µ

0 b21



Let us write

0

C =

µ

0 C11 0 C21



The system A¤C 0 + B ¤ = 0 becomes ½

c011 + a12c021 = 0 a21c011 + c021 = ¡b21

From the …rst equation, we derive that 60

One can de…ne in the same way the identi…ability of a function linear in parameters

330

Identi…cation

[Chap. 15]

c011 c021 is identi…able according to our above de…nition as it can be determined a12 = ¡

We see that a12

uniquely from the reduced form parameters. In fact, the …rst equation is itself identi…able as all its parameters are so, given that a11 = 0 and b12 = 0. On the other hand, we note that the second equation is not identi…able as the equation a21c011 + c021 = ¡b21 does not enable us to determine a21 and b21 uniquely. These two parameters are not identi…able and hence the second equation and the model are not identi…able. Let us now consider a second instructive example : ½

a11y1 + a12y2 + b11x1 + b12x2 = u1 a21y1 + a22y2 + b21x1 + b22x2 = u2

Suppose that we do not have any a priori information on the elements of the variance-covariance matrix of the disturbances, and that the only a priori information or constraints are the following:
$$a_{11} = a_{22} = 1 \ \text{(normalisation rule)}, \qquad b_{11} = b_{12} = 0.$$
Let us examine the solutions for $A^*$ and $B^*$ of the system $A^*C^0 + B^* = 0$, i.e. of
$$\begin{pmatrix} 1 & a_{12} \\ a_{21} & 1 \end{pmatrix}\begin{pmatrix} c^0_{11} & c^0_{12} \\ c^0_{21} & c^0_{22} \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ b_{21} & b_{22} \end{pmatrix} = 0.$$

We obtain:
$$\begin{cases} c^0_{11} + a_{12}c^0_{21} = 0 \\ c^0_{12} + a_{12}c^0_{22} = 0 \\ a_{21}c^0_{11} + c^0_{21} + b_{21} = 0 \\ a_{21}c^0_{12} + c^0_{22} + b_{22} = 0 \end{cases}$$
We note that we can obtain $a_{12}$ in terms of the elements of $C^0$ in two ways: either
$$a_{12} = -\frac{c^0_{11}}{c^0_{21}}$$


or
$$a_{12} = -\frac{c^0_{12}}{c^0_{22}}.$$
Thus, the parameter $a_{12}$ is identifiable according to our previous definition. Moreover, we note that the true matrix $C^0$ must verify the constraint
$$c^0_{22}c^0_{11} - c^0_{12}c^0_{21} = 0.$$
This constraint implies that the two columns of $C^0$ are collinear and that the rank of $C^0$ is equal to 1. We will come back later to this important aspect of identification, once we have presented the notions of "under-identified", "just-identified" and "over-identified" equations. Let us just mention that the constraints on the reduced form resulting from those on the structural form will often prevent us from directly estimating the parameters of the model through the reduced form: these constraints are in general non-linear, and hence the methods of estimation of multiple regression models under linear constraints cannot be used.
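As a small illustration (with arbitrary values, not taken from the text), one can verify numerically that the a priori restrictions of this second example force the reduced form to satisfy $c^0_{22}c^0_{11} - c^0_{12}c^0_{21} = 0$, i.e. that $C^0$ has rank 1:

```python
# Sketch: the restrictions a11 = a22 = 1, b11 = b12 = 0 make C0 singular.
import numpy as np

a12, a21, b21, b22 = 0.6, -0.5, 1.0, 2.0
A = np.array([[1.0, a12], [a21, 1.0]])          # a11 = a22 = 1
B = np.array([[0.0, 0.0], [b21, b22]])          # b11 = b12 = 0
C0 = -np.linalg.solve(A, B)                     # 2 x 2 reduced form

print(np.isclose(np.linalg.det(C0), 0.0))       # True: columns of C0 are collinear
print(np.linalg.matrix_rank(C0))                # 1
```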

The identifiability concepts defined above are quite general and allow us to deal with situations in which the a priori information may consist of constraints involving parameters of one or several equations. However, in many cases the a priori information concerns the coefficients of each equation separately. In the following paragraphs, we consider the case of linear restrictions on the first-order parameters (i.e. the elements of $A$ and $B$) of a single equation. In the later sections, we will give results for more general situations that can arise in some econometric models.

The Case of Intra-equation Linear Constraints

We are now going to consider linear restrictions on the coefficients of the matrices $A$ and $B$ that relate to a single equation of the model, say the g-th equation. Though these restrictions are of a particular form, this is the most frequent case encountered, at least in small models. The g-th equation can be written as
$$a_g'y + b_g'x = u_g.$$


Let us suppose that $a_g'$ and $b_g'$ are the coefficients of the g-th equation in its general form, and that the a priori information on these coefficients takes the form of linear constraints written as
$$p_g'\Phi_{rg} = \lambda_{rg} \qquad (r = 1, \ldots, R_g)$$
where $p_g' = (a_g', b_g')$ and $\Phi_{rg}$ is a vector of $G + K$ elements. Let us add that we include the normalisation rule $a_{gg} = 1$ among these constraints.^{61}

61 Or any other rule of the form $a_{g g_0} = 1$.

Let us write
$$\Phi_g = (\Phi_{1g}\ \Phi_{2g} \cdots \Phi_{rg} \cdots \Phi_{R_g g}) \quad \text{of dimension } (G + K,\, R_g)$$
$$\lambda_g' = (\lambda_{1g}\ \lambda_{2g} \cdots \lambda_{rg} \cdots \lambda_{R_g g}) \quad \text{of dimension } (1,\, R_g).$$
The constraints can then be written compactly as
$$p_g'\Phi_g = \lambda_g'.$$
We will obviously assume that this system is compatible, i.e. that it has at least one solution. Thus, for the model
$$\begin{cases} a_{11}y_1 + a_{12}y_2 + b_{11}x_1 + b_{12}x_2 = u_1 \\ a_{21}y_1 + a_{22}y_2 + b_{21}x_1 + b_{22}x_2 = u_2 \end{cases}$$
the a priori information is as follows:
$$a_{11} = a_{22} = 1, \qquad b_{11} = b_{12} = 0, \qquad a_{21} + b_{21} = 0.$$

Let us consider the a priori constraints on the coefficients of the first equation. The vector $\Phi_{11}$ and the constant $\lambda_{11}$ corresponding to the normalisation rule are
$$\Phi_{11} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \lambda_{11} = 1.$$

Now, regarding the constraints $b_{11} = 0$ and $b_{12} = 0$, we have
$$\Phi_{21} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \Phi_{31} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
and the two constants
$$\lambda_{21} = 0, \qquad \lambda_{31} = 0.$$
Thus, we have
$$\Phi_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{and} \qquad \lambda_1' = (1\ \ 0\ \ 0).$$

Next, let us look at the constraints relating to the second equation. Once again, for the normalisation rule, we have
$$\Phi_{12} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$$
and the constant $\lambda_{12} = 1$, while for the constraint $a_{21} + b_{21} = 0$ we have
$$\Phi_{22} = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$
and the constant $\lambda_{22} = 0$. Hence, for the second equation,
$$\Phi_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad \text{and} \qquad \lambda_2' = (1\ \ 0).$$

It should be noted that, in general, the number of columns of $\Phi_g$ is equal to the number of restrictions $R_g$ on the g-th equation (including the normalisation rule). Further, we will assume that the $R_g$ constraints on the g-th equation are not redundant, in other words that $\operatorname{rank}(\Phi_g) = R_g$. Let us now introduce the matrix $W^0$ of dimension $(G + K,\, K)$ such that

$$W^0 = \begin{pmatrix} C^0 \\ I_K \end{pmatrix}$$
and let us denote by $P$ the matrix
$$P = (A, B).$$
It follows that $PW^0 = 0$, since $AC^0 + B = 0$. The g-th row of the relation $PW^0 = 0$ can be written as

$$p_g'W^0 = 0.$$
We are now going to derive an identifiability criterion for the g-th equation in the presence of intra-equation constraints. It should be noted that this criterion concerns the


g-th equation as a whole and not any single parameter; hence it does not enable us to establish the identifiability of a single parameter considered separately. Taking account of the a priori restrictions $p_g'\Phi_g = \lambda_g'$, the study of the identifiability of the g-th equation amounts to the study of the uniqueness of the solution for $p_g$ of the system
$$\begin{cases} p_g'W^0 = 0 \\ p_g'\Phi_g = \lambda_g' \end{cases}$$
or, equivalently, of
$$(W^0\ \ \Phi_g)'\,p_g = \begin{pmatrix} 0 \\ \lambda_g \end{pmatrix}.$$
This linear system has a unique solution for $p_g$ if and only if the rank of $(W^0\ \ \Phi_g)'$ equals the number of its columns, namely $G + K$. Since $\operatorname{rank}(W^0\ \ \Phi_g)' = \operatorname{rank}(W^0\ \ \Phi_g)$, we can state the following result:

Theorem 2.1 (Rank Condition on the Reduced Form): The g-th equation is identifiable by means of linear (intra-equation) restrictions if and only if the rank of the matrix $(W^0\ \ \Phi_g)$, of dimension $(G + K,\, K + R_g)$, is equal to $G + K$.

As noted above, the criterion established in this theorem is called the rank condition. We can derive from it an important corollary for practical purposes. As the matrix $(W^0\ \ \Phi_g)$ is of dimension $(G + K,\, K + R_g)$, we have

$$\operatorname{rank}(W^0\ \ \Phi_g) \le \min(G + K,\ K + R_g).$$
For the inequality
$$G + K \le \min(G + K,\ K + R_g)$$
to hold, it is necessary (but not sufficient) that
$$R_g \ge G.$$
This necessary condition is known as the order condition. Thus we can state:

Corollary 2.2 (Order Condition): For the g-th equation to be identifiable by means of linear (intra-equation) constraints, it is necessary that $R_g \ge G$.


In other words, in order that an equation of a model of $G$ equations be identifiable, it is necessary that the number of a priori constraints on its parameters be greater than or equal to the number of equations.^{62} Once again, this condition is not sufficient; it only gives the minimal amount of a priori information on the (first-order) parameters of an equation required for this equation to be identifiable.

62 In the most frequent case, in which the a priori restrictions consist of the normalisation rule on the one hand and exclusion restrictions on the other, we have $R_g = \bar N_g + \bar K_g + 1$, where $\bar N_g$ and $\bar K_g$ denote respectively the numbers of endogenous and exogenous variables excluded from the g-th equation. The order condition then becomes $\bar N_g + \bar K_g \ge G - 1$: the total number of variables excluded from the g-th equation should be greater than or equal to the number of equations minus one. Further, if $N_g$ denotes the number of endogenous variables appearing in the g-th equation apart from $y_g$ itself, we have $G = N_g + \bar N_g + 1$ and the order condition becomes $\bar K_g \ge N_g$: the number of exogenous variables excluded from the g-th equation should be at least equal to the number of endogenous variables appearing in this equation, not counting $y_g$.

We will now illustrate the use of the above criterion with the help of two examples. First, let us consider the model
$$\begin{cases} y_1 + y_2 + b_{12}x_1 = u_1 \\ a_{21}y_1 + y_2 + b_{22}x_1 = u_2 \end{cases}$$
for which $G = 2$, $K = 1$ and $R_1 = 2$. We have

$$A^* = \begin{pmatrix} 1 & 1 \\ a_{21} & 1 \end{pmatrix}, \qquad B^* = \begin{pmatrix} b_{12} \\ b_{22} \end{pmatrix}$$
and
$$C^* = -A^{*-1}B^* = -\frac{1}{1 - a_{21}}\begin{pmatrix} b_{12} - b_{22} \\ -b_{12}a_{21} + b_{22} \end{pmatrix}.$$
Of course,
$$C^0 = \begin{pmatrix} c^0_{11} \\ c^0_{21} \end{pmatrix} = -\frac{1}{1 - a^0_{21}}\begin{pmatrix} b^0_{12} - b^0_{22} \\ -b^0_{12}a^0_{21} + b^0_{22} \end{pmatrix}.$$

It can easily be seen that
$$\Phi_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
Hence,
$$(W^0\ \ \Phi_1) = \begin{pmatrix} c^0_{11} & 1 & 0 \\ c^0_{21} & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
We have
$$\det(W^0\ \ \Phi_1) = 1.$$
Thus, the rank of $(W^0\ \ \Phi_1)$ is $3 = G + K$. Therefore, the first equation is identifiable, as it satisfies the rank condition.
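The rank condition of Theorem 2.1 is easy to check numerically. The following sketch uses arbitrary illustrative structural values (any values with $a_{21} \neq 1$ would do):

```python
# Numerical check of the rank condition for the example above.
import numpy as np

a21, b12, b22 = 0.5, 2.0, -1.0
A = np.array([[1.0, 1.0], [a21, 1.0]])
B = np.array([[b12], [b22]])
C0 = -np.linalg.solve(A, B)                     # 2 x 1 reduced form (K = 1)

W0 = np.vstack([C0, np.eye(1)])                 # (G + K) x K = 3 x 1
Phi1 = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]])                   # constraints a11 = 1, a12 = 1
M = np.hstack([W0, Phi1])                       # (W0, Phi1), dimension 3 x 3

print(np.linalg.matrix_rank(M))                 # 3 = G + K: the rank condition holds
print(np.isclose(np.linalg.det(M), 1.0))        # det(W0, Phi1) = 1, as in the text
```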

In the above example, even though the elements of $C^0$ are functions of those of $A^0$ and $B^0$, they could be considered to be free: no constraint was implied on $C^0$ by the particular values taken by the elements of $A^0$ and $B^0$. The next example shows that this is not always the case. Here, the model is written as
$$\begin{cases} y_1 + a_{12}y_2 + x_1 + b_{12}x_2 = u_1 \\ y_2 + b_{22}x_2 = u_2 \end{cases}$$
We have
$$A^* = \begin{pmatrix} 1 & a_{12} \\ 0 & 1 \end{pmatrix}, \qquad B^* = \begin{pmatrix} 1 & b_{12} \\ 0 & b_{22} \end{pmatrix}$$

and
$$C^* = -A^{*-1}B^* = -\begin{pmatrix} 1 & b_{12} - a_{12}b_{22} \\ 0 & b_{22} \end{pmatrix}.$$
As $C^0$ satisfies the same constraints as $C^*$, we have
$$C^0 = \begin{pmatrix} c^0_{11} & c^0_{12} \\ c^0_{21} & c^0_{22} \end{pmatrix}$$
with $c^0_{11} = -1$, $c^0_{12} = -b^0_{12} + a^0_{12}b^0_{22}$, $c^0_{21} = 0$, $c^0_{22} = -b^0_{22}$. Let us suppose that we do not take account of these constraints. We would then have
$$W^0 = \begin{pmatrix} c^0_{11} & c^0_{12} \\ c^0_{21} & c^0_{22} \\ 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Furthermore, $R_1 = 2$ for the first equation and

$$\Phi_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
The determinant of the matrix
$$(W^0\ \ \Phi_1) = \begin{pmatrix} c^0_{11} & c^0_{12} & 1 & 0 \\ c^0_{21} & c^0_{22} & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
is given by

$$\det(W^0\ \ \Phi_1) = -c^0_{21}.$$
If we did not take into account the constraints on the coefficients of $C^0$, we would conclude that $\det(W^0\ \ \Phi_1)$ is different from zero and hence that the rank of $(W^0\ \ \Phi_1)$ is $4 = G + K$, which would lead us to conclude that the first equation is identifiable. But we have just seen that $c^0_{21} = 0$. Thus $\det(W^0\ \ \Phi_1) = 0$ and the rank condition is not satisfied: the first equation is not identifiable. This example shows that it is necessary to take account of the constraints on $C^0$ implied by the constraints on $A$ and $B$ if we want to apply the earlier theorem. In addition, it clearly brings out an important limitation of the criterion stated before: the search for the explicit constraints on $C^0$ can be difficult in quite a few models. It is for this reason that we are going to derive a criterion which directly involves the structural matrices $A^0$ and $B^0$. But before that, it may be useful to derive two relatively simple rank conditions that apply in the frequent case in which the only a priori constraints, apart from the normalisation rule, are exclusion restrictions. The first one involves the so-called "selection matrices". Given the constraints, examining the identifiability of the g-th equation amounts to examining the passage from the reduced form to the structural form in the following equation:

$$\alpha_g' S_{a_g}' C^0 + \beta_g' S_{b_g}' = s_g' C^0$$
where $S_{a_g}$ and $S_{b_g}$ are the matrices selecting the endogenous and exogenous variables appearing in the g-th equation, and $s_g$ is the g-th column of $I_G$.

This system is linear given C and can be written as :

$$\left(C^{0\prime}S_{a_g}\ \ \ S_{b_g}\right)\begin{pmatrix} \alpha_g \\ \beta_g \end{pmatrix} = C^{0\prime}s_g.$$

Thus we derive the following criterion :

Theorem 2.3 : The g-th equation is identifiable if and only if
$$\operatorname{rank}\left(C^{0\prime}S_{a_g}\ \ \ S_{b_g}\right) = N_g + K_g.$$
Another useful criterion can be derived as follows. Let us rewrite the relation $a_g'C^0 + b_g' = 0$, taking into account the normalisation rule and the exclusion restrictions:

$$\begin{pmatrix} -\alpha_g' & 1 & 0' \end{pmatrix}\begin{pmatrix} C_{gg} & C_{g\bar g} \\ c_{gg}' & c_{g\bar g}' \\ C_{\bar gg} & C_{\bar g\bar g} \end{pmatrix} + \begin{pmatrix} -\beta_g' & 0' \end{pmatrix} = 0$$
(the rows of $C$ being ordered as: the endogenous variables included in the g-th equation other than $y_g$, then $y_g$ itself, then the excluded endogenous variables; and the columns as: the included exogenous variables, then the excluded ones), or
$$\begin{cases} \alpha_g' C_{gg} + \beta_g' = c_{gg}' \\ \alpha_g' C_{g\bar g} = c_{g\bar g}' \end{cases}$$

Assuming that $C^0$ is known, and hence that $C_{gg}$, $c_{gg}$ and $C_{g\bar g}$ are known, we note that, given $\alpha_g$, we can determine $\beta_g$ from the first equation. We are thus led to study the solutions for $\alpha_g$ of the system $\alpha_g' C_{g\bar g} = c_{g\bar g}'$. This gives the following criterion:

Theorem 2.4 : The g-th equation is identifiable if and only if
$$\operatorname{rank}(C_{g\bar g}) = N_g.$$
The above two conditions are still not very practical. However, they will be very useful later in the study of the asymptotic properties (convergence in probability, convergence in distribution) of the estimators.
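For illustration, the condition of Theorem 2.4 is straightforward to evaluate numerically once $C^0$ is known. The sketch below uses the over-identified example above ($a_{11} = a_{22} = 1$, $b_{11} = b_{12} = 0$) with arbitrary structural values:

```python
# Illustration of Theorem 2.4 (rank of the block of C linking the included
# endogenous variables to the excluded exogenous variables).
import numpy as np

a12, a21, b21, b22 = 0.6, -0.4, 1.5, 0.7
A = np.array([[1.0, a12], [a21, 1.0]])
B = np.array([[0.0, 0.0], [b21, b22]])
C0 = -np.linalg.solve(A, B)              # rows: y1, y2 ; columns: x1, x2

# Equation 1 excludes x1 and x2; the included endogenous variable besides y1 is y2.
C_block = C0[1:2, [0, 1]]                # rows = {y2}, columns = {x1, x2}
print(np.linalg.matrix_rank(C_block))    # 1 = N_1: equation 1 is identifiable

# Equation 2 excludes no exogenous variable (K_bar_2 = 0 while N_2 = 1),
# so the corresponding block is empty and the condition cannot be met.
```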


15.2.2 Equivalent Structures

Presentation of the Problem and Definition of Equivalent Structures

We saw that the identification problem can be examined as one of the passage from the matrix $C$ to the matrices $A$ and $B$. It is therefore natural to try to characterise the matrices $A$ and $B$ which lead to the same matrix $C$. This in turn brings us to the notion of equivalent structures, and then to identifiability criteria based on such equivalent structures. We adopt the following definition:^{63}

Definition 2.1 : Let $(S, \psi)$ be a model. We will say that a structure $S^1 = (A^1, B^1)$ is equivalent to $S^0 = (A^0, B^0)$ if and only if
a) it has the same reduced form;
b) the matrices $A^1$ and $B^1$ verify the a priori restrictions $\psi_j(A^1, B^1) = 0$ $(j = 1, \ldots, J)$.

This definition enables us to characterise all the structures equivalent to $S^0$ as follows.

Theorem 2.5 : Let $(S, \psi)$ denote a model. The structure $S^1$ is equivalent to $S^0$ if and only if there exists a non-singular matrix $M$ such that
(i) $A^1 = MA^0$ and $B^1 = MB^0$;
(ii) $\psi_j(MA^0, MB^0) = 0$ $(j = 1, \ldots, J)$.

Proof 2.1 : From part a) of the above definition of equivalent structures, we know that $S^1$ and $S^0$ have the same reduced form, i.e.
$$-(A^1)^{-1}B^1 = -(A^0)^{-1}B^0 \quad \text{or} \quad B^1 = A^1(A^0)^{-1}B^0.$$
Let $M = A^1(A^0)^{-1}$. The above relation can then be written as $B^1 = MB^0$; further, we have $A^1 = MA^0$, which proves part (i) of the theorem. For part (ii), it is sufficient to note that, from part b) of the definition of equivalent structures, we must have $\psi_j(A^1, B^1) = 0$, i.e. $\psi_j(MA^0, MB^0) = 0$, for $j = 1, \ldots, J$.

63 In the definition that follows, we will assume that the matrices $A^0$ and $A^1$ are non-singular.


It is important to remember the essential result that comes out of this theorem: if we want to study the structures that are possibly equivalent to a given structure $S^0$, it is sufficient to consider only linear combinations of the equations of $S^0$. Let us illustrate these definitions using the model
$$\begin{cases} a_{11}y_1 + a_{12}y_2 + b_{11}x = u_1 \\ a_{21}y_1 + a_{22}y_2 + b_{21}x = u_2 \end{cases}$$
Suppose that the a priori constraints are: (i) the normalisation rule $a_{11} = a_{22} = 1$, and (ii) $b_{11} = 0$. We then have
$$A^0 = \begin{pmatrix} 1 & a^0_{12} \\ a^0_{21} & 1 \end{pmatrix}, \qquad B^0 = \begin{pmatrix} 0 \\ b^0_{21} \end{pmatrix}.$$
Let us find the structures $M(S^0)$ that are equivalent to $S^0$. Let us further write $M$ as
$$M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}.$$

We will see that the (non-singular) matrices $M$ form an "admissible transformation" if and only if they have a particular configuration. Indeed, $MA^0$ and $MB^0$ satisfy the a priori constraints if and only if
$$\begin{cases} m_{11} + m_{12}a^0_{21} = 1 \\ m_{21}a^0_{12} + m_{22} = 1 \\ m_{12}b^0_{21} = 0 \end{cases}$$
We conclude from the last equation that $m_{12} = 0$ (assuming $b^0_{21} \neq 0$) and therefore, from the first equation, that $m_{11} = 1$. Thus, the admissible transformations are of the form
$$M = \begin{pmatrix} 1 & 0 \\ m_{21} & m_{22} \end{pmatrix}.$$
Note, furthermore, that the structures $M(S^0)$ equivalent to $S^0$ are such that
$$MA^0 = \begin{pmatrix} 1 & a^0_{12} \\ m_{21} + m_{22}a^0_{21} & 1 \end{pmatrix}.$$
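A small numerical sketch (arbitrary true values, chosen by us) makes the mechanics of an admissible transformation concrete: the transformed structure still satisfies the restrictions, leaves the first equation unchanged, and yields the same reduced form, while the second equation is altered:

```python
# Admissible transformation M applied to an illustrative true structure S0.
import numpy as np

a12, a21, b21 = 0.5, -0.9, 2.0
A0 = np.array([[1.0, a12], [a21, 1.0]])
B0 = np.array([[0.0], [b21]])

m21 = 0.4
m22 = 1.0 - m21 * a12                          # admissibility requires m21*a12 + m22 = 1
M = np.array([[1.0, 0.0], [m21, m22]])
A1, B1 = M @ A0, M @ B0

print(np.allclose(A1[0], A0[0]), B1[0, 0])     # first equation unchanged, b11 still 0
print(np.isclose(A1[1, 1], 1.0))               # normalisation a22 = 1 still holds
C0 = -np.linalg.solve(A0, B0)
C1 = -np.linalg.solve(A1, B1)
print(np.allclose(C0, C1))                     # same reduced form: S1 is equivalent to S0
print(A1[1, 0])                                # a21 has become m21 + m22*a21
```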

In general, imposing the a priori restrictions on $M(S^0)$ reduces the set of transformations equivalent to $S^0$. This reduction is related to the notion of identification that we are now going to define.

Definition 2.2 : Let $(S, \psi)$ be a model. We will say that:


1) a structural parameter is identifiable (at the first-order level) if and only if it retains the same value in all the structures $M(S^0)$ equivalent to $S^0$;^{64}
2) an equation is identifiable if all its parameters are;
3) a model is identifiable if all its equations are.

64 More generally, one can define the identifiability of a linear combination of coefficients.

We will illustrate the above definition using the same example as before. The structures $M(S^0)$ are characterised by the matrices $MA^0$ and $MB^0$ given above. We see that the coefficient $a^0_{12}$ is the same in all the structures equivalent to $S^0$; this coefficient is thus identifiable according to our definition. In addition, given the normalisation rule and the constraint $b_{11} = 0$, all the parameters of the first equation are invariant (therefore identifiable), which implies that the first equation is identifiable. On the other hand, the coefficient $a^0_{21}$ is changed into $m_{21} + m_{22}a^0_{21}$; this coefficient is therefore not invariant under the transformations, and hence not identifiable. The second equation is thus not identifiable, as one of its parameters is not.

Let us give an intuitive interpretation of these results. Explicitly taking into account the constraint $b_{11} = 0$ and the normalisation rule, the model can be written as
$$\begin{cases} y_1 + a_{12}y_2 = u_1 \\ a_{21}y_1 + y_2 + b_{21}x = u_2 \end{cases}$$
First of all, we note that we cannot distinguish between the second equation and any linear combination of the first and the second: this is what the non-identifiability of the second equation means. On the contrary, any linear combination of the first and second equations will differ from the first equation at least by the presence of the exogenous variable $x$, and this is the meaning of the identifiability of the first equation.

The Case of Linear Intra-equation Restrictions

We are now going to derive an identification criterion for a single equation (say the g-th one) when the a priori constraints on its structural parameters are, as before, given by
$$p_g'\Phi_g = \lambda_g'$$
where $p_g'$ is the g-th row of $P = [A\ B]$. Let $P^0 = [A^0\ B^0]$. We will see that the definition given in the previous section enables us to show the following


result:

Theorem 2.6 (Rank Condition for the Structural Form in the Presence of Intra-equation Constraints: a Necessary and Sufficient Condition). The g-th equation is identifiable if and only if the rank of $P^0\Phi_g$ is equal to $G$.

Proof 2.2 : Let $M$ be a non-singular matrix and let us consider a structure $M(S^0)$ equivalent to $S^0$. The matrix $P^0$ becomes $MP^0$, which must satisfy the a priori constraints. If $m_g'$ is the g-th row of $M$, we must therefore have
$$m_g'P^0\Phi_g = \lambda_g'.$$
But we know, by construction, that the structure $S^0$ satisfies the a priori constraints, i.e.
$$p_g^{0\prime}\Phi_g = \lambda_g'.$$
This relation can also be written as $s_g'P^0\Phi_g = \lambda_g'$, where $s_g$ is a $G \times 1$ vector with a 1 in the g-th position and zeros elsewhere. Now, it is equivalent to say that the g-th equation is identifiable if and only if, in all the structures equivalent to $S^0$, we have $m_g' = s_g'$, i.e. that the solution of $x'P^0\Phi_g = \lambda_g'$ for $x$ is unique and equal to $s_g$. This holds if and only if the rank of $P^0\Phi_g$, which is of dimension $(G,\, R_g)$, is equal to $G$. This completes the proof of the theorem.

Let us apply the theorem to the last example considered. Taking into account the a priori constraints, we have, for the first equation,
$$P^0\Phi_1 = \begin{pmatrix} 1 & 0 \\ a^0_{21} & b^0_{21} \end{pmatrix}.$$



This matrix is of rank 2 (provided $b^0_{21} \neq 0$) and therefore the first equation is identifiable. On the other hand, for the second equation, and given that its only a priori constraint is $a^0_{22} = 1$,
$$P^0\Phi_2 = \begin{pmatrix} a^0_{12} \\ 1 \end{pmatrix}.$$
This matrix is of rank 1, and hence the second equation is not identifiable, as the rank condition for the structural form is not satisfied.
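The structural-form rank condition of Theorem 2.6 is also easy to verify numerically. The following sketch uses the same example, with arbitrary true values satisfying the a priori restrictions:

```python
# Checking rank(P0 Phi_g) = G for each equation of the example above.
import numpy as np

a12, a21, b21 = 0.5, -0.9, 2.0
P0 = np.array([[1.0, a12, 0.0],               # first equation: a11 = 1, b11 = 0
               [a21, 1.0, b21]])              # second equation: a22 = 1

Phi1 = np.array([[1.0, 0.0],                  # constraints on equation 1: a11 = 1, b11 = 0
                 [0.0, 0.0],
                 [0.0, 1.0]])
Phi2 = np.array([[0.0],                       # constraint on equation 2: a22 = 1
                 [1.0],
                 [0.0]])

print(np.linalg.matrix_rank(P0 @ Phi1))       # 2 = G : equation 1 is identifiable
print(np.linalg.matrix_rank(P0 @ Phi2))       # 1 < G : equation 2 is not identifiable
```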


From the above theorem, we can derive the same necessary condition for identifiability (at the first order) as the one obtained earlier. The theorem requires that $P^0\Phi_g$ be of rank $G$. Now, the matrix $P^0\Phi_g$ is of dimension $(G,\, R_g)$. Therefore, in order that the rank of $P^0\Phi_g$ equal $G$, it is necessary (but not sufficient) that $R_g \ge G$. In other words, it is necessary that the number of a priori constraints on the g-th structural equation be greater than or equal to the number of equations.

Remark 2.1 : We have stated two necessary and sufficient conditions of identifiability, one concerning the structural form and the other the reduced form. We can verify that these two conditions are equivalent by means of the following theorem:

Theorem 2.7 :
$$\operatorname{rank}\left[\begin{pmatrix} C^0 \\ I_K \end{pmatrix},\ \Phi_g\right] = \operatorname{rank}\left[(A^0, B^0)\,\Phi_g\right] + K$$

Proof 2.3 : Let us partition $\Phi_g$ as
$$\Phi_g = \begin{pmatrix} \Phi_{g1} \\ \Phi_{g2} \end{pmatrix}$$
where $\Phi_{g1}$ is of dimension $(G,\, R_g)$ and $\Phi_{g2}$ of dimension $(K,\, R_g)$. We have
$$(A^0, B^0)\,\Phi_g = A^0\Phi_{g1} + B^0\Phi_{g2}$$
and
$$\left[\begin{pmatrix} C^0 \\ I_K \end{pmatrix},\ \Phi_g\right] = \begin{pmatrix} C^0 & \Phi_{g1} \\ I_K & \Phi_{g2} \end{pmatrix}.$$
By using some elementary properties of the rank operator, we can see that
$$\operatorname{rank}\left[(A^0, B^0)\,\Phi_g\right] + K = \operatorname{rank}\begin{pmatrix} 0 & A^0\Phi_{g1} + B^0\Phi_{g2} \\ I_K & \Phi_{g2} \end{pmatrix}.$$
Further,
$$\operatorname{rank}\begin{pmatrix} C^0 & \Phi_{g1} \\ I_K & \Phi_{g2} \end{pmatrix} = \operatorname{rank}\left[\begin{pmatrix} A^0 & B^0 \\ 0 & I_K \end{pmatrix}\begin{pmatrix} C^0 & \Phi_{g1} \\ I_K & \Phi_{g2} \end{pmatrix}\right]$$
as the matrix $\begin{pmatrix} A^0 & B^0 \\ 0 & I_K \end{pmatrix}$ is non-singular. We thus derive that
$$\operatorname{rank}\begin{pmatrix} C^0 & \Phi_{g1} \\ I_K & \Phi_{g2} \end{pmatrix} = \operatorname{rank}\begin{pmatrix} A^0C^0 + B^0 & A^0\Phi_{g1} + B^0\Phi_{g2} \\ I_K & \Phi_{g2} \end{pmatrix} = \operatorname{rank}\begin{pmatrix} 0 & A^0\Phi_{g1} + B^0\Phi_{g2} \\ I_K & \Phi_{g2} \end{pmatrix}$$
since $A^0C^0 + B^0 = 0$. This completes the proof.

We now have general criteria of first-order identifiability in the presence of intra-equation constraints. One important consequence of the above analysis is that such a priori restrictions necessarily have to be imposed on a model in order to identify its parameters or equations.


15.3 Under-Identification, Just-Identification and Over-Identification

As we have just mentioned, the general criteria of identifiability (at the first order) of an equation have now been established. In this section, we examine a few further points which have an impact on the method of estimation. Earlier, we looked at the identifiability question from the point of view of the passage from the reduced form to the structural form. It seems quite natural to use this approach for the estimation of the structural form parameters. In this context, it becomes relevant to ask the following question: what are the consequences for the reduced form if there are more a priori restrictions than is sufficient to identify an equation? The second example of Section 15.2.1 showed that the identifiability of an equation may lead to certain constraints on the elements of $C$. The object of this section is to examine such situations in a more general manner. Let us consider the following model of two equations:

$$\begin{cases} y_1 + a_{12}y_2 + b_{12}x_2 + b_{13}x_3 = u_1 \\ a_{21}y_1 + y_2 + b_{21}x_1 = u_2 \end{cases}$$

The reader can verify, by applying the criteria developed above, that both equations are identifiable. Now, let us examine the second equation. We note that the two exogenous variables $x_2$ and $x_3$ are absent from this equation. It can easily be seen that the absence of either one of them (i.e. either $x_2$ or $x_3$) is enough for the identifiability of the equation. It is in this sense that we say that the number of a priori restrictions on the second equation (2) is more than what is sufficient (1) to make it identifiable. Now, let us look at the passage from the reduced form to the structural form in this example. The system $A^*C^0 + B^* = 0$ leads to the following.

First equation:
$$\begin{cases} c^0_{11} + a_{12}c^0_{21} = 0 \\ c^0_{12} + a_{12}c^0_{22} + b_{12} = 0 \\ c^0_{13} + a_{12}c^0_{23} + b_{13} = 0 \end{cases}$$


Second equation:
$$\begin{cases} a_{21}c^0_{11} + c^0_{21} + b_{21} = 0 \\ a_{21}c^0_{12} + c^0_{22} = 0 \\ a_{21}c^0_{13} + c^0_{23} = 0 \end{cases}$$
The last two equations imply that we can obtain $a_{21}$ either by the relation $a_{21} = -c^0_{22}/c^0_{12}$ or by the relation $a_{21} = -c^0_{23}/c^0_{13}$. Given that the second equation is identifiable, this is possible only if the elements of $C^0$ satisfy the constraint $c^0_{22}c^0_{13} - c^0_{12}c^0_{23} = 0$. On the other hand, the identifiability of the first equation does not lead to any constraint on the reduced form.

How can we use this result in the estimation of the structural form? Without going into the details of any estimation procedure, let us suppose that we use a procedure that enables us to estimate $C$ consistently under the constraint $c_{22}c_{13} - c_{12}c_{23} = 0$. We can then derive consistent estimators of $A$ and $B$ by using the above relations. This method is a generalisation of the method of indirect least squares that we referred to in the introductory chapter. Let us now see what happens if we estimate the reduced form without taking the constraint $c_{22}c_{13} - c_{12}c_{23} = 0$ into account. In small samples, there is no reason why the estimator of $C^0$ should satisfy the constraint. However, if the model is correctly specified, this (unconstrained) estimator will converge (in probability) to $C^0$, and its elements will asymptotically satisfy the constraint. In other words, we would have two estimators of $a_{21}$, depending on whether we use the estimator of $c_{22}/c_{12}$ or that of $c_{23}/c_{13}$. But, asymptotically, these two estimators converge to
$$a^0_{21} = -\frac{c^0_{22}}{c^0_{12}} = -\frac{c^0_{23}}{c^0_{13}}.$$

Without anticipating the estimation procedures, it seems natural to state the following conjectures based on the above remarks:
i) there may be several estimation procedures applicable to the first equation, and they may all lead to the same estimators even in small samples;
ii) when applied to the second equation, these procedures may yield different estimators of the parameters in small samples, but all of them will be consistent.
In the following chapters, we will indeed verify these conjectures. For the moment, we are going to introduce some new general definitions, keeping the estimation problems in mind.
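These remarks can be illustrated by a small simulation. The sketch below (structural values and sample size are arbitrary choices of ours, not taken from the text) fits the unconstrained reduced form by ordinary least squares and forms the two indirect-least-squares estimates of $a_{21}$ for the over-identified second equation; they differ in a finite sample but both lie close to the true value:

```python
# Simulation sketch: two indirect-least-squares estimates of a21, both consistent.
import numpy as np

rng = np.random.default_rng(0)
a12, a21, b12, b13, b21 = 0.5, -0.8, 1.0, -1.5, 2.0    # assumed true values
A = np.array([[1.0, a12], [a21, 1.0]])
B = np.array([[0.0, b12, b13], [b21, 0.0, 0.0]])
C = -np.linalg.solve(A, B)                             # true reduced form, A C + B = 0

T = 500
X = rng.normal(size=(T, 3))                            # exogenous variables x1, x2, x3
U = rng.normal(size=(T, 2))                            # disturbances, independent of X
Y = X @ C.T + U @ np.linalg.inv(A).T                   # y' = x'C' + u'(A^{-1})'

C_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T         # unconstrained OLS of Y on X
a21_hat_1 = -C_hat[1, 1] / C_hat[0, 1]                 # from a21*c12 + c22 = 0
a21_hat_2 = -C_hat[1, 2] / C_hat[0, 2]                 # from a21*c13 + c23 = 0
print(a21_hat_1, a21_hat_2)   # different in finite samples; both approach a21 as T grows
```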


15.3.1 Notions Related to Reduced Form Estimation: Simple Models and Over-identified Models

The first step in the estimation of the reduced form is to find out whether the a priori constraints on the structural form lead to any constraint on the reduced form. Once again, without anticipating the chapters on estimation methods, it is clear that if the a priori restrictions on the structural form do not lead to any constraint on the reduced form (i.e. if the constrained reduced form is the same as the unconstrained one), then the estimation of the reduced form is particularly simple, as it boils down to a multivariate regression. We will say that a model $(S, \psi)$ is "simple" if the a priori restrictions $\psi$ on the structural form $S = (A, B)$ do not lead to any restriction on the reduced form $C = -A^{-1}B$. If, on the contrary, they do lead to certain restrictions on the reduced form, we will say that the model is "over-identified". As an illustration, let us reconsider the second example of Section 15.2.1. The model is

$$\begin{cases} a_{11}y_1 + a_{12}y_2 + b_{11}x_1 + b_{12}x_2 = u_1 \\ a_{21}y_1 + a_{22}y_2 + b_{21}x_1 + b_{22}x_2 = u_2 \end{cases}$$
with $a_{11} = a_{22} = 1$, $b_{11} = b_{12} = 0$. We saw that the first equation is identifiable but the second one is not. Further, the a priori restrictions on the first equation lead to non-linear constraints on the elements of
$$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.$$

More precisely, we have $c_{22}c_{11} - c_{12}c_{21} = 0$. According to the above definition, we will say that, from the point of view of the reduced form, the model is over-identified. It is important to stress the clause "from the point of view of the reduced form", as this example clearly shows that a model, seen from the angle of the reduced form, can be over-identified without each of its structural equations being identifiable. The same remark holds for simple models. It would be interesting to have a criterion (a necessary and sufficient condition) enabling us to state whether a model is simple or over-identified from the point of view of the reduced form. Unfortunately, even in the case of intra-equation constraints, such a criterion


is difficult to establish (see, for example, Malinvaud (1978), page 721).

15.3.2 Notions Related to Structural Form Estimation: Under-identified, Just-identified and Over-identified Equations (at the first-order level, in the presence of linear intra-equation constraints)

We are now going to assume that an equation is identifiable and ask ourselves the following question: do the a priori restrictions (which make it identifiable) imply any constraints on the reduced form? It is important to note that we ask this question for each equation separately, and not for a sub-system or the whole system of equations. Indeed, from the estimation point of view, we would like to know whether we can obtain the parameters of a particular structural equation from the estimation of the unconstrained reduced form. In this context, we adopt the following definition.

Definition 3.1 : Consider an identifiable structural equation of a model $(S, \psi)$ where the restrictions $\psi$ are of the intra-equation type. We will say that this equation is just-identified if the restrictions concerning the equation do not by themselves imply any restriction on the reduced form. If, on the contrary, they do, we will say that this equation is over-identified.

To illustrate the above definition, let us reconsider the two examples of Section 15.2.1. In both of them, the first equation is identifiable. However, it is just-identified in the first example and over-identified in the second one. Note that we have explicitly assumed in the above definition that the equation under consideration is identifiable. In case an equation is not identifiable, we will equivalently say that it is under-identified.^{65}

65 That there are two different terminologies in this case is due to the fact that the concept of under-identification concerns the equation as a whole; it would make no sense to apply it to a single parameter, for instance.

When the intra-equation restrictions are linear, there exists a simple criterion which enables us to determine whether an equation is just-identified or over-identified. In this case, the appropriate question to ask is: what conditions must $C$ satisfy in order that the following system have at most one solution for $(a_g', b_g')$:
$$a_g'C + b_g' = 0$$


where
$$p_g'\Phi_g = \lambda_g'$$
or, writing
$$W = \begin{pmatrix} C \\ I_K \end{pmatrix},$$
$$\begin{cases} p_g'W = 0 \\ p_g'\Phi_g = \lambda_g' \end{cases}$$
Given $W$ and $\Phi_g$, this is a linear system of $K + R_g$ equations in $G + K$ unknowns. If the number of equations were greater than the number of unknowns (i.e. if $R_g > G$), such a system would in general have no solution. But we know that the g-th equation is identifiable, and hence there is a solution (the true $p_g^0$). In order to reconcile these two facts, we have to assume that there are some restrictions among the columns of the matrix $(W,\ \Phi_g)$. These restrictions cannot concern the columns of $\Phi_g$, as we have assumed that this matrix is of rank $R_g$; therefore they can only concern the columns of $W$, i.e. in fact only the columns of $C$. To sum up, if the system admits a solution when $R_g > G$, it means that there are some restrictions on the elements of $C$: this is precisely what we mean by over-identification. We now summarise the different possibilities in the form of a theorem.

Theorem 3.1 : Consider a model $(S, \psi)$ with linear intra-equation restrictions, and consider the g-th equation, with $R_g$ (first-order) restrictions. Then:
(i) if $R_g < G$, the g-th equation is under-identified (it is not identifiable, the rank condition rank $P^0\Phi_g = G$ being necessarily violated);
(ii) if rank $P^0\Phi_g = G$, the g-th equation is identifiable, and if

$R_g = G$ : the g-th equation is just-identified;
$R_g > G$ : the g-th equation is over-identified.

At this stage, it can be noted that, in a model where all the equations are identifiable, the maximum possible number of linearly independent intra-equation restrictions that can be introduced without implying any restriction on the reduced form is equal to $G^2$. Moreover, there are cases in which $R_g \ge G$ while nevertheless rank $P^0\Phi_g < G$, that is, cases in which the g-th equation is not identifiable. This is, for instance, the case for the first equation of the model
$$\begin{cases} y_1 + a_{12}y_2 + x_1 + b_{12}x_2 = u_1 \\ y_2 + b_{22}x_2 = u_2 \end{cases}$$

which we already studied in Section 15.2.1. Since we frequently come across the case in which the a priori restrictions are exclusion restrictions on the elements of the g-th equation, it is useful to make the above theorem explicit in that case. Let us start by distinguishing between the elements of $a_g$ and $b_g$ that are a priori assumed to be zero and those that are not. Also, let us re-arrange the elements of $C$ (ordering the rows as: the endogenous variables included in the g-th equation other than $y_g$, then $y_g$, then the excluded endogenous variables; and the columns as: the included exogenous variables, then the excluded ones) so that the relation $a_g'C + b_g' = 0$ can be written as
$$\begin{pmatrix} -\alpha_g' & a_{gg} & 0' \end{pmatrix}\begin{pmatrix} C_{gg} & C_{g\bar g} \\ c_{gg}' & c_{g\bar g}' \\ C_{\bar gg} & C_{\bar g\bar g} \end{pmatrix} + \begin{pmatrix} -\beta_g' & 0' \end{pmatrix} = 0.$$
Note that we have not explicitly taken into account the normalisation rule $a_{gg} = 1$. Indeed, in certain estimation methods^{66} or in some particular models (for example the supply-demand model), we may be led to include one additional constraint on the elements of $\alpha_g$ and $a_{gg}$ which is not the usual normalisation rule $a_{gg} = 1$. But, in all cases, we always have $R_g = \bar N_g + \bar K_g + 1$. Therefore, we can apply the above theorem using this particular value of $R_g$.^{67} The following table shows some simple examples of the different situations that can arise in a model of two equations.

[Table: Examples of two-equation models corresponding to different situations in the case of exclusion constraints and the normalisation rule.]

66 Especially in the case of the limited information maximum likelihood method, which will be studied in a later chapter.

67 Since $G = N_g + \bar N_g + 1$, comparing $R_g$ with $G$ is equivalent to comparing $\bar K_g$ with $N_g$. In particular, from the above theorem, it follows that if the g-th equation is identifiable, then: if $\bar K_g = N_g$, the g-th equation is just-identified; if $\bar K_g > N_g$, it is over-identified. Let us add that the quantity $v_g = \bar K_g - N_g$ is usually called the "degree of over-identification".
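The counting rule of footnote 67 is easy to mechanise; the small helper below (the function name and interface are ours, purely for illustration) reports only the order-condition classification, not the rank condition:

```python
# Order-condition classification and degree of over-identification (footnote 67).
def classify_by_order_condition(n_exog_total, endog_included, exog_included):
    """endog_included counts y_g itself among the endogenous variables of the equation."""
    N_g = endog_included - 1                    # included endogenous apart from y_g
    K_bar_g = n_exog_total - exog_included      # excluded exogenous variables
    v_g = K_bar_g - N_g                         # degree of over-identification
    if v_g < 0:
        return "under-identified (order condition fails)", v_g
    if v_g == 0:
        return "possibly just-identified", v_g
    return "possibly over-identified", v_g

# Two-equation model y1 + a12*y2 + b12*x2 + b13*x3 = u1, a21*y1 + y2 + b21*x1 = u2:
print(classify_by_order_condition(3, endog_included=2, exog_included=2))  # eq 1: v_g = 0
print(classify_by_order_condition(3, endog_included=2, exog_included=1))  # eq 2: v_g = 1
```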


Up to now, we have considered relatively simple cases in which the a priori information was given by linear constraints on the first-order parameters of a single equation. Even though these situations are the most frequent ones (especially the case of exclusion restrictions only), it is useful to derive identifiability results for other forms of a priori information.

15.4 Extensions

We start with cases which are relatively straightforward extensions of the ones already studied.

15.4.1 First-order Identification in the Presence of Linear Inter-equation Constraints

This sub-section deals with models in which the a priori information consists of linear relations among the coefficients of the matrices $A$ and $B$ involving different equations. In this case, by applying the relatively simple technique of vectorisation to the model and to the constraints, we can establish results similar to those obtained earlier. Let us rewrite the set of constraints (including the normalisation rules) as
$$\left(P^v\right)'\Phi = \varphi'$$
where, for any matrix $X$, $X^v$ denotes the vector obtained by stacking its rows, i.e. $X^v = \operatorname{vec}(X')$. Having written the constraints in terms of the vector $P^v$, we can apply to the model as a whole the same arguments as those developed earlier. For instance, starting from a structure $S^0$ defined by $P^0 = (A^0, B^0)$, we see that a structure $S^1$ is equivalent to $S^0$ if and only if there exists a non-singular matrix $M$ such that $P^1 = (A^1, B^1) = (MA^0, MB^0)$ and


$$\left[(I_G \otimes P^{0\prime})\,M^v\right]'\Phi = \varphi'$$
using $(MP^0)^v = (I_G \otimes P^{0\prime})\,M^v$. By the same argument as the one which enabled us to derive the rank condition for the structural form (cf. Section 15.2.2), it is easy to establish the following result:

Theorem 4.1 : The model is (first-order) identifiable if and only if the rank of $(I_G \otimes P^0)\,\Phi$ is equal to $G^2$.

We will not examine the models with linear inter-equation constraints in more detail. The reader is referred to Wegge (1965), Rothenberg (1971), Richmond (1974), Kelly (1975) and Monfort (1978), for example, for a more detailed treatment of such models.
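As a minimal numerical check of Theorem 4.1 (assuming the row-stacking convention for $P^v$ used above; the structural values and the way the constraint columns are coded are illustrative choices of ours), one can verify that a fully identified two-equation model gives rank $G^2$:

```python
# Rank check of Theorem 4.1 for y1 + a12*y2 + b12*x2 + b13*x3 = u1,
# a21*y1 + y2 + b21*x1 = u2, with per-equation constraints stacked block-diagonally.
import numpy as np

a12, a21, b12, b13, b21 = 0.5, -0.8, 1.0, -1.5, 2.0
P0 = np.array([[1.0, a12, 0.0, b12, b13],
               [a21, 1.0, b21, 0.0, 0.0]])          # P0 = (A0, B0), G = 2, K = 3

def unit(i, n):
    e = np.zeros((n, 1)); e[i, 0] = 1.0; return e

# Constraints: (a11 = 1, b11 = 0) for equation 1; (a22 = 1, b22 = 0, b23 = 0) for equation 2.
Phi1 = np.hstack([unit(0, 5), unit(2, 5)])
Phi2 = np.hstack([unit(1, 5), unit(3, 5), unit(4, 5)])
Phi = np.block([[Phi1, np.zeros((5, 3))],
                [np.zeros((5, 2)), Phi2]])          # dimension 2*(G+K) x R = 10 x 5

rank = np.linalg.matrix_rank(np.kron(np.eye(2), P0) @ Phi)
print(rank == 2 ** 2)                               # True: the model is identifiable
```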

15.4.2 First-order Identification in the Presence of Non-linear Intra-equation Constraints

In the case of non-linear constraints, we can in general only obtain local identification conditions, unlike in the linear case, where the criteria obtained were global. Further, in general only sufficient conditions can be derived. Let us suppose that the a priori information on the g-th equation consists of $R_g$ non-linear constraints:
$$\Phi_{rg}(p_g) = \lambda_{rg} \qquad (r = 1, \ldots, R_g)$$
or, in vector form, $\Phi_g(p_g) = \lambda_g$. The following result can be shown in this case (see Fisher (1976) and the earlier references contained therein):

Theorem 4.2 : A sufficient condition for the g-th equation to be locally identified is that the rank of $P\,\dfrac{\partial \Phi_g(p_g)}{\partial p_g}$, evaluated at the true structure, be equal to $G$.

The reader may also refer to Rothenberg's article (1971) for more general theoretical results.


15.4.3 Identification at the First Two Order Levels

Sometimes we may have a priori information on the variance-covariance matrix of the errors, which can take the form of certain zero variances (for example, when the model contains identities) or certain zero covariances. It is possible to study the identification problem in general for a model in which the a priori restrictions concern both the first-order and the second-order coefficients. Here, we will not go into the study of such cases; the reader is invited to consult the book by Fisher (1976). Other relevant references are the article by Koopmans, Rubin and Leipnik (1950) (a basic reference), the article by Wald (1950) and that of Mallela and Patil (1976). It may be added that most of the results in these papers are special cases of the general results of Wegge (1965) and Rothenberg (1971). In one of the following chapters, we will come across a particularly important case of identification at both the first and second orders, the so-called "recursive" model. We will then see that an equation can be identified at both the first and second orders without being identifiable at the first order alone.

15.4.4 Study of Identification Based on the Conditional Probability Law of the Endogenous Variables Given the Exogenous Variables

Looking at the identification problem from the angle of the conditional probability law of the endogenous variables given the exogenous variables is more general than the approach based on the moments of the first two orders only. However, this approach is of limited scope in practice, as the economist does not have, in general, a priori information in the form of a conditional probability law. Moreover, if we are prepared to assume normality, which is most frequently the case, this approach leads to the same results as the earlier approach confined to the first two moments, because the normal distribution is entirely determined by its first two moments. All the same, it is interesting to have a brief look at this approach. Let us consider the model $A^0y + B^0x = u$ and assume that the disturbance vector $u$ is distributed according to a continuous probability law with a conditional density with respect


to $x$ denoted $f^0(u \mid x)$. Let us also assume that $u$ is independent of $x$, i.e.
$$f^0(u \mid x) = f^0(u).$$
As we have assumed that $A^0$ is non-singular, the conditional law of $y$ given $x$ has density
$$g^0(y \mid x) = \left|\det A^0\right| f^0(A^0y + B^0x).$$
Now, let us ask the following question: are there other models of the same type, i.e. other matrices $A$ and $B$, for which the conditional law of $y$ given $x$ is equal to $g^0(y \mid x)$? For this, let us consider another model
$$A^1y + B^1x = u^1$$
and let us assume that $u^1$ has density $f^1(\cdot)$. We will say that the two models are indistinguishable if the conditional laws of $y$ given $x$ are the same in both models. More precisely, we will say that the two structures defined by $(A^0, B^0, f^0(u \mid x))$ and $(A^1, B^1, f^1(u^1 \mid x))$ are indiscernible if they imply the same conditional distribution of $y$ given $x$.^{68} Therefore, the two structures are indiscernible if the conditional law of $y$ given $x$ in the model $A^1y + B^1x = u^1$ is $g^0(y \mid x)$ when the conditional law of $u^1$ given $x$ is $f^1(u^1 \mid x)$. But, replacing $y$ by $-A^{0-1}B^0x + A^{0-1}u$, it is equivalent to say that the two structures are indiscernible if and only if $u^1 = A^1(-A^{0-1}B^0x + A^{0-1}u) + B^1x$ has conditional law $f^1(u^1 \mid x)$ when the conditional law of $u$ given $x$ is $f^0(u \mid x)$. Now, consider the joint law of $(u^1, x)$ in the transformation
$$\begin{cases} u^1 = A^1A^{0-1}u + (B^1 - A^1A^{0-1}B^0)x \\ x = x \end{cases}$$
The law of $(u^1, x)$ is
$$h(u^1, x) = \left|\det(A^0A^{1-1})\right| f^0\!\left[A^0A^{1-1}u^1 - A^0\left(A^{1-1}B^1 - A^{0-1}B^0\right)x \mid x\right]$$
and hence, given the independence between $u$ and $x$, we have

68 This definition, as well as the general idea contained in the derivations that follow, is used by Brown (1982) in his study of the identification of models that are non-linear in the variables.


$$f^1(u^1 \mid x) = \left|\det(A^0A^{1-1})\right| f^0\!\left[A^0A^{1-1}u^1 - \left(A^0A^{1-1}B^1 - B^0\right)x\right].$$
One should note here that the disturbance $u^1$ must satisfy the stochastic assumption of independence between the disturbances and the exogenous variables. We see that $f^1(u^1 \mid x)$ does not depend on $x$ if and only if
$$A^0A^{1-1}B^1 - B^0 = 0.$$
Let $M = A^1A^{0-1}$, i.e. $A^1 = MA^0$. The above condition can then be written as
$$B^1 = MB^0.$$
To sum up, we have shown that the two structures are indiscernible if and only if there exists a non-singular matrix $M$ such that
$$A^1 = MA^0, \qquad B^1 = MB^0.$$
Thus, the conditional law of $y$ given $x$ resulting from the model $A^1y + B^1x = u^1$ is identical to the one resulting from the model $A^0y + B^0x = u$ if and only if the former model is a linear transformation of the latter, obtained by premultiplying $A^0y + B^0x = u$ by a non-singular matrix $M$; this means that each equation of the former model is a linear combination of the equations of the latter. We have thus obtained a characterisation of indiscernible structures similar to that of the equivalent structures of Section 15.2.2. We can therefore set out an identifiability definition similar to the one given in Section 15.2.2, replacing the term "equivalent structures" by "indiscernible structures".


There are several other identifiability criteria for "non-standard" models, such as non-linear models, dynamic models, etc. We will mention them as and when we deal with each type of model.

Conclusion

In this chapter, we have examined various identifiability criteria for simultaneous equation models. At this stage, it may be relevant to discuss briefly how useful the identification concepts really are in econometrics. In this context, it is helpful to distinguish between "big models" and "small models". Regarding the former, we have only a few general results, which are moreover difficult to apply. However, the identifiability concepts can help us make sure that estimating an equation of the model actually makes sense, since a non-identifiable equation should not be estimated: the numerical values of the coefficients obtained by estimating such an equation would have no meaning. Usually, in applied econometrics, one assumes that any equation of a "big model" is "strongly over-identified", since a great number of exogenous variables do not appear in the equation. Even if this is true, we should keep in mind the price one may have to pay for such over-identification: by omitting many exogenous variables, we run the risk of mis-specifying the equations of the model. On the other hand, in the case of "small models", applying the identifiability criteria can be very useful, in the sense that by taking account of all the a priori restrictions on the coefficients we may obtain efficient estimators. Of course, there is no point in imposing (over-identifying) constraints on the parameters of a model just to make it identifiable: these restrictions have to be correctly specified, and it is for this reason that it is important to perform hypothesis tests on the over-identifying constraints. We will see, in a later chapter devoted to hypothesis testing, different ways of tackling this problem.


References

[1] Bowden, R. (1973), "The theory of parametric identification", Econometrica, 41, 1069-1074.
[2] Brown, B. (1982).
[3] Fisher, F.M. (1976), The Identification Problem in Econometrics, McGraw-Hill, New York, 2nd edition.
[4] Kelly, J.S. (1975), "Linear cross-equation constraints and the identification problem", Econometrica, 43, 125-140.
[5] Koopmans, T.C., H. Rubin and R.B. Leipnik (1950), "Measuring the equation systems of dynamic economics", in Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, John Wiley and Sons, New York.
[6] Malinvaud, E. (1978), Méthodes statistiques de l'économétrie, Dunod, Paris, 3rd edition.
[7] Mallela, P. and G.H. Patil (1976), "On the identification with covariance restrictions: a note", International Economic Review, 17, 741-750.
[8] Monfort, A. (1978), "First-order identification in linear models", Journal of Econometrics, 7, 333-350.
[9] Richmond, J. (1974), "Identifiability in linear models", Econometrica, 42, 731-736.
[10] Rothenberg, T.J. (1971), "Identification in parametric models", Econometrica, 39, 577-592.


[11] Wald, A. (1950), "Note on the identification of economic relations", in Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, John Wiley and Sons, New York.
[12] Wegge, L.L. (1965), "Identifiability criteria for a system of equations as a whole", Australian Journal of Statistics, 3, 67-77.