Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes

Bernoulli 10(4), 2004, 605–637

CHRISTIAN FRANCQ¹ and JEAN-MICHEL ZAKOÏAN²

¹ Université Lille 3, GREMARS, BP 149, 59653 Villeneuve d'Ascq Cedex, France. E-mail: [email protected]
² Université Lille 3, GREMARS and CREST, 3 Avenue Pierre Larousse, 92245 Malakoff Cedex, France. E-mail: [email protected]

We prove the strong consistency and asymptotic normality of the quasi-maximum likelihood estimator of the parameters of pure generalized autoregressive conditional heteroscedastic (GARCH) processes, and of autoregressive moving-average models with noise sequence driven by a GARCH model. Results are obtained under mild conditions.

Keywords: ARMA; asymptotic normality; consistency; GARCH; heteroskedastic time series; maximum likelihood estimation

1. Introduction

Since the seminal papers by Engle (1982) and Bollerslev (1986), generalized autoregressive conditional heteroscedastic (GARCH) processes have received considerable attention in the literature devoted to the analysis of financial time series. These time series models capture several important features of financial series, such as leptokurticity and volatility clustering; see Mikosch (2001) for a recent paper on GARCH and stochastic volatility models.

The asymptotic properties of the quasi-maximum likelihood estimator (QMLE) were first established by Weiss (1986) for ARCH models, under fourth-order moment conditions on the ARCH process. Unfortunately, these conditions are typically violated when GARCH models are estimated on financial data. The problem of finding weak assumptions for the consistency and asymptotic normality of the QMLE in GARCH models has attracted much attention in the statistical literature. To our knowledge, the most significant contributions on the theoretical properties of the QMLE in GARCH models are those of Lee and Hansen (1994) and Lumsdaine (1996), both for the GARCH(1, 1) case, Straumann and Mikosch (2003) for a general heteroscedastic model including GARCH(1, 1), and Boussama (1998; 2000), Berkes and Horváth (2003a; 2003b) and Berkes et al. (2003) for general GARCH(p, q). The latter reference gives rigorous proofs of the strong consistency and asymptotic normality, under assumptions which we discuss in Section 2.

The first goal of the present paper is to establish, under weaker conditions than those in the existing literature, the convergence and asymptotic normality of the QMLE for the GARCH(p, q) process defined in (2.1) below. We will provide asymptotic results requiring strict stationarity but no moment assumption. An alternative method of proof allows us to weaken some of the technical assumptions used in the above references.

Our second goal is to extend these asymptotic results to ARMA-GARCH processes. In financial applications, it is common practice to fit return series by autoregressive moving-average (ARMA) models with GARCH innovations. It is therefore of interest to analyse the properties of QMLEs of ARMA-GARCH processes. As we will see, the extension leads to non-trivial problems. Recent works on the estimation of ARMA-GARCH processes are Ling and Li (1997; 1998) and Ling and McAleer (2003a). Comments on these papers are provided in Section 3. See also Ling and Li (2003) and Ling and McAleer (2003b) for related work.

The paper is organized as follows. Section 2 presents the assumptions for the GARCH model and states our results on this class. Section 3 is devoted to the ARMA-GARCH class. The proofs are postponed to Section 4.

The following notation will be used throughout. The norm of a matrix $A = (a_{ij})$ is defined by $\|A\| = \sum |a_{ij}|$. The spectral radius of a square matrix $A$ is denoted by $\rho(A)$. The Kronecker product is denoted by $\otimes$. The symbol $\Rightarrow$ denotes convergence in distribution.

2. The pure GARCH(p, q) case

Consider the GARCH(p, q) model

$$\epsilon_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\,\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_{0j}\,h_{t-j}, \qquad \forall t \in \mathbb{Z}, \tag{2.1}$$

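where $(\eta_t)$ is a sequence of independent and identically distributed (i.i.d.) random variables such that $E\eta_t^2 = 1$, $\omega_0 > 0$, $\alpha_{0i} \ge 0$ ($i = 1, \dots, q$) and $\beta_{0j} \ge 0$ ($j = 1, \dots, p$). Bougerol and Picard (1992) showed that a unique non-anticipative strictly stationary solution $(\epsilon_t)$ to model (2.1) exists if and only if the sequence of matrices $A_0 = (A_{0t})$, where

$$A_{0t} = \begin{pmatrix}
\alpha_{01}\eta_t^2 & \cdots & \cdots & \alpha_{0q}\eta_t^2 & \beta_{01}\eta_t^2 & \cdots & \cdots & \beta_{0p}\eta_t^2 \\
1 & 0 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots & \vdots & & & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & \cdots & 0 \\
\alpha_{01} & \cdots & \cdots & \alpha_{0q} & \beta_{01} & \cdots & \cdots & \beta_{0p} \\
0 & \cdots & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & \ddots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & 0 & \cdots & 1 & 0
\end{pmatrix},$$

has a strictly negative top Lyapunov exponent, $\gamma(A_0) < 0$, where

$$\gamma(A_0) := \inf_{t \in \mathbb{N}^*} \frac{1}{t}\, E\bigl(\log \|A_{0t} A_{0,t-1} \cdots A_{01}\|\bigr) = \lim_{t \to \infty} \frac{1}{t} \log \|A_{0t} A_{0,t-1} \cdots A_{01}\| \quad \text{a.s.} \tag{2.2}$$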

The definition of $\gamma(A_0)$ does not depend on the choice of a norm on the space of the $(p+q) \times (p+q)$ matrices. The second equality in (2.2) is a consequence of the subadditive ergodic theorem (Kingman, 1973). Note that the existence of $\gamma(A_0)$ is guaranteed by the inequality $E(\log^+ \|A_{01}\|) \le E\|A_{01}\| < \infty$.

Let $z_t = (\epsilon_t^2, \dots, \epsilon_{t-q+1}^2, h_t, \dots, h_{t-p+1})^T \in \mathbb{R}^{p+q}$ and $b_t = (\omega_0\eta_t^2, 0, \dots, 0, \omega_0, 0, \dots, 0)^T \in \mathbb{R}^{p+q}$, where the entry $\omega_0$ is in position $q+1$. Then (2.1) is equivalently written as a vector stochastic recurrence equation

$$z_t = b_t + A_{0t}\, z_{t-1}, \tag{2.3}$$

and if $\gamma(A_0) < 0$, the unique strictly stationary solution to (2.3) is

$$z_t = b_t + \sum_{k=1}^{\infty} A_{0t} A_{0,t-1} \cdots A_{0,t-k+1}\, b_{t-k}. \tag{2.4}$$

In view of (2.2), and using the Jensen inequality, it is clear that the conditions

$$E\bigl(\log \|A_{0k} A_{0,k-1} \cdots A_{01}\|\bigr) < 0 \quad \text{for some } k > 0,$$

and

$$\rho(EA_{01}) < 1, \quad \text{i.e.} \quad \sum_{i=1}^{q} \alpha_{0i} + \sum_{i=1}^{p} \beta_{0i} < 1, \tag{2.5}$$

imply $\gamma(A_0) < 0$. Note, however, that the sufficient condition (2.5) is much stronger than the strict stationarity condition $\gamma(A_0) < 0$ ((2.5) implies $E\epsilon_t^2 < \infty$). Two well-known consequences of the strict stationarity condition are stated in the following proposition. We refer to Bougerol and Picard (1992) for the proof of the first result, and to Nelson (1990) and Berkes et al. (2003, Lemma 2.3) for the second.

Proposition 1. If $\gamma(A_0) < 0$, then the following equivalent conditions hold:
(a) $\sum_{j=1}^{p} \beta_{0j} < 1$.
(b) The roots of the polynomial $1 - \beta_{01} z - \dots - \beta_{0p} z^p$ are outside the unit disc.
(c) $\rho(B_0) < 1$, where

$$B_0 = \begin{pmatrix}
\beta_{01} & \beta_{02} & \cdots & \beta_{0p} \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & 0
\end{pmatrix}.$$

Moreover, if $\gamma(A_0) < 0$, then there exists $s > 0$ such that

$$E h_t^s < \infty \quad \text{and} \quad E\epsilon_t^{2s} < \infty.$$
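The gap between the strict stationarity condition $\gamma(A_0) < 0$ and the second-order condition (2.5) can be illustrated numerically. The sketch below is not part of the paper: it assumes the GARCH(1, 1) case, where the top Lyapunov exponent is known to reduce to $E\log(\alpha_{01}\eta_t^2 + \beta_{01})$ (Nelson 1990), and it further assumes standard Gaussian $\eta_t$. The function name `lyapunov_garch11` and the parameter values are hypothetical.

```python
import math
import random


def lyapunov_garch11(alpha, beta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[log(alpha * eta^2 + beta)].

    Assumption (not from the paper): eta is standard Gaussian.
    For GARCH(1,1) this expectation equals the top Lyapunov exponent.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, 1.0)
        total += math.log(alpha * eta * eta + beta)
    return total / n_draws


# Hypothetical ARCH(1) example: alpha_01 = 2, beta_01 = 0.
alpha01, beta01 = 2.0, 0.0
gamma_hat = lyapunov_garch11(alpha01, beta01)
second_order_ok = (alpha01 + beta01) < 1.0  # condition (2.5)

print("estimated gamma(A_0):", gamma_hat)
print("condition (2.5) holds:", second_order_ok)
```

With these illustrative values, condition (2.5) fails while the estimated exponent is negative, which is the situation covered by the results of this section but excluded by second-order assumptions.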

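We now turn to the QML estimation of model (2.1). The vector of parameters is

$$\theta = (\theta_1, \dots, \theta_{p+q+1})^T = (\omega, \alpha_1, \dots, \alpha_q, \beta_1, \dots, \beta_p)^T$$

and belongs to a parameter space $\Theta \subset\, ]0, +\infty[\, \times [0, \infty[^{p+q}$. The true parameter value is unknown and is denoted by $\theta_0 = (\omega_0, \alpha_{01}, \dots, \alpha_{0q}, \beta_{01}, \dots, \beta_{0p})^T$.

Let $(\epsilon_1, \dots, \epsilon_n)$ be a realization of length $n$ of the unique non-anticipative strictly stationary solution $(\epsilon_t)$ to model (2.1). Conditionally on initial values $\epsilon_0, \dots, \epsilon_{1-q}, \tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2$, the Gaussian quasi-likelihood is given by

$$L_n(\theta) = L_n(\theta; \epsilon_1, \dots, \epsilon_n) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi\tilde\sigma_t^2}} \exp\left(-\frac{\epsilon_t^2}{2\tilde\sigma_t^2}\right),$$

where the $\tilde\sigma_t^2$ are defined recursively, for $t \ge 1$, by

$$\tilde\sigma_t^2 = \tilde\sigma_t^2(\theta) = \omega + \sum_{i=1}^{q} \alpha_i\, \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\, \tilde\sigma_{t-j}^2.$$

For instance, the initial values can be chosen as

$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \omega \tag{2.6}$$

or

$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \epsilon_1^2. \tag{2.7}$$

A QMLE of $\theta$ is defined as any measurable solution $\hat\theta_n$ of

$$\hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta) = \arg\min_{\theta \in \Theta} \tilde l_n(\theta), \tag{2.8}$$

where

$$\tilde l_n(\theta) = n^{-1} \sum_{t=1}^{n} \tilde\ell_t \quad \text{and} \quad \tilde\ell_t = \tilde\ell_t(\theta) = \frac{\epsilon_t^2}{\tilde\sigma_t^2} + \log \tilde\sigma_t^2.$$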

Remark 1.1. It will be shown that the choice of initial values does not matter for the asymptotic properties of the QMLE. However, it may be important from a practical point of view. Other ways of generating the sequence $\tilde\sigma_t^2$ have been considered in the literature; for instance by taking $\tilde\sigma_t^2 = c_0(\theta) + \sum_{i=1}^{t-1} c_i(\theta)\,\epsilon_{t-i}^2$, where the $c_i(\theta)$ are computed recursively (see Berkes et al. 2003). Note that, to compute $\tilde l_n(\theta)$, their procedure requires a number of operations of order $n^2$. The number of operations required by our procedure is of order $n$.

Let $A_\theta(z) = \sum_{i=1}^{q} \alpha_i z^i$ and $B_\theta(z) = 1 - \sum_{j=1}^{p} \beta_j z^j$. By convention, $A_\theta(z) = 0$ if $q = 0$ and $B_\theta(z) = 1$ if $p = 0$. To show the strong consistency, the following assumptions will be made.

(A1) $\theta_0 \in \Theta$ and $\Theta$ is compact.
(A2) $\gamma(A_0) < 0$ and $\forall \theta \in \Theta$, $\sum_{j=1}^{p} \beta_j < 1$.
(A3) $\eta_t^2$ has a non-degenerate distribution with $E\eta_t^2 = 1$.
(A4) If $p > 0$, $A_{\theta_0}(z)$ and $B_{\theta_0}(z)$ have no common root, $A_{\theta_0}(1) \ne 0$, and $\alpha_{0q} + \beta_{0p} \ne 0$.
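The recursive definition of $\tilde\sigma_t^2$ and the criterion $\tilde l_n(\theta)$ of (2.8) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `qml_criterion` is hypothetical, and the initial values follow (2.6). A single pass over the data is used, so the cost is of order $n$ for fixed $p$ and $q$.

```python
import math


def qml_criterion(eps, omega, alpha, beta):
    """Return (1/n) * sum_t [eps_t^2 / s2_t + log s2_t].

    alpha = [alpha_1, ..., alpha_q], beta = [beta_1, ..., beta_p].
    s2_t follows the recursion for the conditional variance, started
    with the initial values (2.6): all pre-sample squares and
    variances are set to omega.
    """
    q, p = len(alpha), len(beta)
    past_eps2 = [omega] * q  # eps_{t-1}^2, ..., eps_{t-q}^2 (most recent first)
    past_s2 = [omega] * p    # s2_{t-1}, ..., s2_{t-p} (most recent first)
    total = 0.0
    for e in eps:
        s2 = omega
        s2 += sum(a * x for a, x in zip(alpha, past_eps2))
        s2 += sum(b * x for b, x in zip(beta, past_s2))
        total += e * e / s2 + math.log(s2)
        # Shift the windows of past values by one time step.
        if q:
            past_eps2 = [e * e] + past_eps2[:-1]
        if p:
            past_s2 = [s2] + past_s2[:-1]
    return total / len(eps)
```

A QMLE in the sense of (2.8) would then be obtained by minimizing this function over a compact parameter set with any numerical optimizer; that step is deliberately left out of the sketch.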

It will be convenient to approximate the sequence $(\tilde\ell_t(\theta))$ by an ergodic stationary sequence. In the first part of Proposition 1, equivalence evidently holds for any $\theta \in \Theta$. Thus (A2) implies that the roots of $B_\theta(z)$ are outside the unit disc. Therefore, denote by $(\sigma_t^2) = \{\sigma_t^2(\theta)\}$ the strictly stationary, ergodic and non-anticipative solution of

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\, \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\, \sigma_{t-j}^2, \qquad \forall t. \tag{2.9}$$

Note that $\sigma_t^2(\theta_0) = h_t$. Let

$$l_n(\theta) = l_n(\theta; \epsilon_n, \epsilon_{n-1}, \dots) = n^{-1} \sum_{t=1}^{n} \ell_t, \qquad \ell_t = \ell_t(\theta) = \frac{\epsilon_t^2}{\sigma_t^2} + \log \sigma_t^2.$$

We are now in a position to state our first result.

Theorem 2.1. Let $(\hat\theta_n)$ be a sequence of QML estimators satisfying (2.8), with the initial conditions (2.6) or (2.7). Then, under (A1)–(A4), almost surely $\hat\theta_n \to \theta_0$ as $n \to \infty$.

Remark 2.1. Unlike Berkes et al. (2003), our assumptions on the i.i.d. process $\eta_t$ do not impose the existence of $E(\eta_t^{2+\varepsilon})$, for some $\varepsilon > 0$, or any other technical assumption on $\eta_t$, such as those requiring that the cdf around zero is well behaved. In fact, their proof requires $E_{\theta_0} \sup_{\theta \in \Theta} |\ell_t(\theta)| < \infty$ (see their Lemma 5.1). Using an ergodic theorem for stationary processes $(X_t)$ such that $EX_1 \in \mathbb{R} \cup \{+\infty\}$, our proof only requires $E_{\theta_0} |\ell_t(\theta_0)| < \infty$. Moreover, in Berkes et al. (2003), the parameter space is very constrained, ruling out zero coefficients in $\theta$.

Remark 2.2. Straumann and Mikosch (2003) established asymptotic results for a general heteroscedastic time series model. When applied to the GARCH(1, 1) model, their consistency result coincides with ours. A slight difference is that they assume that the distribution of $\eta_t$ is not concentrated at two points, whereas we assume that $\eta_t^2$ is not concentrated at 1.

Remark 2.3. Lee and Hansen (1994) and Lumsdaine (1996) established asymptotic results for the GARCH(1, 1) model. In Lee and Hansen (1994) the $\eta_t$ are required to form a strictly stationary martingale difference sequence. However, their QMLE is local, that is, $L_n(\theta)$ is maximized in a neighbourhood of $\theta_0$. Moreover, the existence of $E(\eta_t^{2+\varepsilon})$ is assumed for some $\varepsilon > 0$.

Remark 2.4. The first part of the identifiability assumption (A4) concerning the common roots of the polynomials was also made by Berkes et al. (2003). It is worth noting that (A4) does not require $\theta_0$ to belong to the interior of $\Theta$. This is essential, in particular, to handle situations of overidentification. For instance, our result shows that an ARCH(q) model ($\beta_{0j} = 0$ for all $j$) is consistently estimated when a GARCH(p, q) is fitted. More generally, one of the two orders $p$ and $q$ can be overidentified, but not both of them. Evidently, it is required that $\alpha_{0i} > 0$ for some $i$ when $p > 0$. Without this assumption, the model solution would be an i.i.d. white noise, which could be represented as any GARCH(1, 0) process of the form $\sigma_t^2 = \sigma^2(1 - \beta) + 0 \times \epsilon_{t-1}^2 + \beta\sigma_{t-1}^2$. Note also that the first part of (A4) is always satisfied when $p = 1$ or $q = 1$. If $q = 1$, the unique root of $A_{\theta_0}(z)$ is 0 and $B_{\theta_0}(0) \ne 0$. If $p = 1$ and $\beta_{01} \ne 0$, the unique root of $B_{\theta_0}(z)$ is $1/\beta_{01} > 0$ (if $\beta_{01} = 0$, the polynomial does not have any zero), and, because the $\alpha_{0i}$ are positive, $A_{\theta_0}(1/\beta_{01}) \ne 0$.

Remark 2.5. Following the suggestion made by a referee, we have not imposed $E\eta_t = 0$. The conditional variance of $\epsilon_t$ given $\{\epsilon_{t-i}, i > 0\}$ is only proportional to $h_t$ in this case. The assumption that $E\eta_t^2 = 1$ is made for identifiability reasons and is not restrictive provided $E\eta_t^2 < \infty$. Berkes and Horváth (2003b) showed that this moment condition is necessary for the asymptotic normality of the Gaussian QMLE, and Berkes and Horváth (2003a) showed that the moment condition can be weakened when criteria different from the Gaussian quasi-likelihood are used.

To show the asymptotic normality, the following additional assumptions are made.

(A5) $\theta_0 \in \mathring\Theta$, where $\mathring\Theta$ denotes the interior of $\Theta$.
(A6) $\kappa := E\eta_t^4 < \infty$.

The second main result of this section is the following.

Theorem 2.2. Under assumptions (A1)–(A6), $\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically distributed as $N(0, (\kappa - 1)J^{-1})$, where

$$J := E_{\theta_0}\left(\frac{\partial^2 \ell_t(\theta_0)}{\partial\theta\,\partial\theta^T}\right) = E_{\theta_0}\left(\frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta^T}\right). \tag{2.10}$$

Remark 2.6. We show in Section 4 the existence and positive definiteness of $J$.

Remark 2.7. Assumption (A5) is clearly necessary to obtain asymptotic normality. For instance, when $\alpha_{01} = 0$, the distribution of $\sqrt{n}(\hat\alpha_1 - \alpha_{01})$ is concentrated on $[0, \infty[$, so the asymptotic distribution cannot be normal. Andrews (1999) studied such boundary problems in the GARCH(1, q) case.

Remark 2.8. As in Theorem 2.1, no technical assumption on the distribution of $\eta_t$ is required, apart from the existence of a fourth-order moment. This assumption is clearly necessary for the existence of the variance of the score vector $\partial\ell_t(\theta_0)/\partial\theta$. Note also that this assumption does not imply the existence of a second-order moment for the observed process $(\epsilon_t)$. This is particularly interesting for financial applications, because the assumption of finite second-order moments is often found to be inappropriate.
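The asymptotic variance $(\kappa - 1)J^{-1}$ of Theorem 2.2 can be approximated by replacing expectations with sample averages. The sketch below is an illustration only, restricted to an ARCH(1) model with $\theta = (\omega, \alpha_1)$, for which $\partial\sigma_t^2/\partial\theta = (1, \epsilon_{t-1}^2)^T$. The function names, the Gaussian noise used in the simulation, and the choice to evaluate the formula at the true parameter rather than at $\hat\theta_n$ are assumptions made for the example.

```python
import math
import random


def simulate_arch1(n, omega, alpha, seed=0, burn_in=200):
    """Simulate an ARCH(1) path with Gaussian eta (an assumption)."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(n + burn_in):
        h = omega + alpha * prev * prev
        prev = math.sqrt(h) * rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps[burn_in:]


def arch1_asymptotic_cov(eps, omega, alpha):
    """Plug-in estimate of (kappa - 1) * J^{-1} for theta = (omega, alpha_1)."""
    n = len(eps) - 1
    j11 = j12 = j22 = kappa = 0.0
    for t in range(1, len(eps)):
        x = eps[t - 1] ** 2
        s2 = omega + alpha * x           # sigma_t^2(theta)
        j11 += 1.0 / s2 ** 2             # entries of (1/sigma^4) * grad * grad^T
        j12 += x / s2 ** 2
        j22 += x * x / s2 ** 2
        kappa += (eps[t] ** 2 / s2) ** 2  # fourth moment of the residual eta_t
    j11, j12, j22, kappa = j11 / n, j12 / n, j22 / n, kappa / n
    det = j11 * j22 - j12 * j12
    c = (kappa - 1.0) / det
    # Closed-form inverse of the symmetric 2x2 matrix J, scaled by (kappa - 1).
    return [[c * j22, -c * j12], [-c * j12, c * j11]]


eps = simulate_arch1(2000, omega=1.0, alpha=0.5)
cov = arch1_asymptotic_cov(eps, omega=1.0, alpha=0.5)
print(cov)
```

Dividing the diagonal entries by $n$ and taking square roots would give approximate standard errors, provided (A5) holds, that is, provided the true parameter is not on the boundary.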


Remark 2.9. In Berkes et al. (2003) it is assumed that $E|\eta_t|^{4+\delta} < \infty$ for some $\delta > 0$, and $t^{-\mu} P(\eta_t^2 \le t) \to 0$ when $t \to 0$, for some $\mu > 0$. These assumptions are used to treat the right-hand terms of the inequality

$$\sum_{i=1}^{\infty} i^3 c_i\, \epsilon_{t-i}^2 \Bigl(1 + \sum_{j=1}^{\infty} c_j\, \epsilon_{t-j}^2\Bigr)^{-1} \le M^3 + \sum_{i > M} i^3 c_i\, \epsilon_{t-i}^2,$$

for some absolutely summable positive sequence $(c_i)$ and any $M \ge 1$ (see their Lemma 5.2). Instead, we use Proposition 1 and the inequality

$$\sum_{i=1}^{\infty} i^3 c_i\, \epsilon_{t-i}^2 \Bigl(1 + \sum_{j=1}^{\infty} c_j\, \epsilon_{t-j}^2\Bigr)^{-1} \le \sum_{i=1}^{\infty} i^3 c_i^s\, \epsilon_{t-i}^{2s} \quad \text{for all } s \in\, ]0, 1[.$$

The idea of exploiting the inequality $x/(1+x) \le x^s$ for all $x > 0$ is due to Boussama (2000).
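For completeness, the elementary inequality $x/(1+x) \le x^s$ invoked in Remark 2.9 can be checked by splitting on the size of $x$. The short derivation below is added as a reading aid and is not taken from the paper.

```latex
% Claim: for all x > 0 and s in ]0,1[,  x/(1+x) <= x^s.
\begin{align*}
0 < x \le 1 &: \quad \frac{x}{1+x} \le x \le x^{s}
   && \text{(since } x^{1-s} \le 1\text{)},\\
x > 1 &: \quad \frac{x}{1+x} < 1 < x^{s}.
\end{align*}
% Applied with x = c_i \epsilon_{t-i}^2, and using
% 1 + \sum_j c_j \epsilon_{t-j}^2 \ge 1 + c_i \epsilon_{t-i}^2, this gives
% \frac{c_i \epsilon_{t-i}^2}{1 + \sum_j c_j \epsilon_{t-j}^2} \le c_i^{s} \epsilon_{t-i}^{2s}.
```

The point of the bound is that the right-hand side only involves a moment of order $s$ of $\epsilon_t^2$, which exists for some $s > 0$ by Proposition 1.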

Remark 2.10. In Boussama (2000) it is assumed that $E\eta_t^6 < \infty$. The parameter space is supposed to be a hypercube of the form $[\underline\omega, \overline\omega] \times \prod_{i=1}^{q} [\underline\alpha_i, \overline\alpha_i] \times \prod_{j=1}^{p} [\underline\beta_j, \overline\beta_j]$ with $\sum_{j=1}^{p} \overline\beta_j < 1$, which seems very restrictive. Moreover, it is not clear whether his results allow pure ARCH models to be treated, because an implicit assumption in his paper is that both $\alpha_{0q}$ and $\beta_{0p}$ are non-zero.

3. Estimation of ARMA-GARCH models

In this section our aim is to extend the previous results to the case where the GARCH process is not directly observed. The process $(\epsilon_t)$, the solution to (2.1), is a martingale difference and can therefore be used as the innovation of an ARMA process. Even for financial series, it seems very restrictive to assume that the observed process is a pure GARCH. Allowing for an ARMA part considerably extends the range of applications, but it also entails serious technical difficulties.

The observations are now denoted $X_1, \dots, X_n$ and are obtained from an ARMA(P, Q)-GARCH(p, q) process $(X_t)$ satisfying

$$\begin{aligned}
X_t - c_0 &= \sum_{i=1}^{P} a_{0i}(X_{t-i} - c_0) + e_t - \sum_{j=1}^{Q} b_{0j}\, e_{t-j},\\
e_t &= \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\, e_{t-i}^2 + \sum_{j=1}^{p} \beta_{0j}\, h_{t-j},
\end{aligned} \tag{3.1}$$

where $(\eta_t)$ and the coefficients $\omega_0$, $\alpha_{0i}$ and $\beta_{0j}$ are defined as in (2.1). The parameter vector is denoted $\varphi = (\vartheta^T, \theta^T)^T = (c, a_1, \dots, a_P, b_1, \dots, b_Q, \theta^T)^T$, the true value is $\varphi_0 = (\vartheta_0^T, \theta_0^T)^T = (c_0, a_{01}, \dots, a_{0P}, b_{01}, \dots, b_{0Q}, \theta_0^T)^T$, and the parameter space is $\Phi \subset \mathbb{R}^{P+Q+1} \times\, ]0, +\infty[\, \times [0, \infty[^{p+q}$.

If $q \ge Q$, the initial values $X_0, \dots, X_{1-(q-Q)-P}$, $\tilde\epsilon_{-q+Q}, \dots, \tilde\epsilon_{1-q}$, $\tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2$ allow us to compute $\tilde\epsilon_t(\vartheta)$, for $t = -q+Q+1, \dots, n$, and $\tilde\sigma_t^2(\varphi)$, for $t = 1, \dots, n$, from

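$$\begin{aligned}
\tilde\epsilon_t = \tilde\epsilon_t(\vartheta) &= X_t - c - \sum_{i=1}^{P} a_i (X_{t-i} - c) + \sum_{j=1}^{Q} b_j\, \tilde\epsilon_{t-j},\\
\tilde\sigma_t^2 = \tilde\sigma_t^2(\varphi) &= \omega + \sum_{i=1}^{q} \alpha_i\, \tilde\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\, \tilde\sigma_{t-j}^2.
\end{aligned}$$

When $q < Q$, the required initial values are $X_0, \dots, X_{1-(q-Q)-P}$, $\tilde\epsilon_0, \dots, \tilde\epsilon_{1-Q}$, $\tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2$. For simplicity, these initial values will be taken to be fixed (neither random nor functions of the parameters). A QMLE of $\varphi$ is any measurable solution of

$$\hat\varphi_n = \arg\min_{\varphi \in \Phi} \tilde l_n(\varphi), \tag{3.2}$$

where $\tilde l_n(\varphi) = n^{-1} \sum_{t=1}^{n} \tilde\ell_t$ and $\tilde\ell_t = \tilde\ell_t(\varphi) = \tilde\epsilon_t^2(\vartheta)/\tilde\sigma_t^2(\varphi) + \log \tilde\sigma_t^2(\varphi)$.

Let $A_\vartheta(z) = 1 - \sum_{i=1}^{P} a_i z^i$ and $B_\vartheta(z) = 1 - \sum_{j=1}^{Q} b_j z^j$. We make standard assumptions on these autoregressive and moving-average polynomials, and we adapt assumption (A1) as follows:

(A7) $\varphi_0 \in \Phi$ and $\Phi$ is compact.
(A8) For all $\varphi \in \Phi$, $A_\vartheta(z) B_\vartheta(z) = 0$ implies $|z| > 1$.
(A9) $A_{\vartheta_0}(z)$ and $B_{\vartheta_0}(z)$ have no common root, $a_{0P} \ne 0$ or $b_{0Q} \ne 0$.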
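The two-stage recursion behind (3.2), first the ARMA residuals $\tilde\epsilon_t(\vartheta)$ and then the conditional variances $\tilde\sigma_t^2(\varphi)$, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `arma_garch_criterion` is hypothetical, and all pre-sample quantities are replaced by fixed constants (`x_init`, `e_init`, `s2_init`), in line with the convention that initial values are neither random nor functions of the parameters.

```python
import math


def arma_garch_criterion(x, c, a, b, omega, alpha, beta,
                         x_init=0.0, e_init=0.0, s2_init=1.0):
    """Return (criterion, residuals, variances) for an ARMA-GARCH fit.

    a = AR coefficients, b = MA coefficients (sign convention of (3.1)),
    alpha, beta = GARCH coefficients. Pre-sample values are fixed constants.
    """
    e, s2 = [], []

    def past(seq, idx, default):
        # Value at 0-based time index idx; pre-sample indices use the constant.
        return seq[idx] if idx >= 0 else default

    total = 0.0
    for t in range(len(x)):
        # ARMA residual: e_t = X_t - c - sum a_i (X_{t-i} - c) + sum b_j e_{t-j}
        et = x[t] - c
        et -= sum(ai * (past(x, t - i, x_init) - c)
                  for i, ai in enumerate(a, start=1))
        et += sum(bj * past(e, t - j, e_init)
                  for j, bj in enumerate(b, start=1))
        # GARCH variance: s2_t = omega + sum alpha_i e_{t-i}^2 + sum beta_j s2_{t-j}
        st = omega
        st += sum(al * past(e, t - i, e_init) ** 2
                  for i, al in enumerate(alpha, start=1))
        st += sum(be * past(s2, t - j, s2_init)
                  for j, be in enumerate(beta, start=1))
        e.append(et)
        s2.append(st)
        total += et * et / st + math.log(st)
    return total / len(x), e, s2
```

Note that the variance at time $t$ depends on $\vartheta$ through the lagged residuals, which is one source of the technical difficulties mentioned at the start of this section.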

Under assumptions (A2) and (A8), $(X_t)$ is assumed to be the unique non-anticipative strictly stationary solution to (3.1). Let $\epsilon_t = \epsilon_t(\vartheta) = A_\vartheta(L) B_\vartheta^{-1}(L)(X_t - c)$, where $L$ denotes the lag operator, and let $\ell_t = \ell_t(\varphi) = \epsilon_t^2/\sigma_t^2 + \log \sigma_t^2$, where $\sigma_t^2 = \sigma_t^2(\varphi)$ is the strictly stationary, ergodic and non-anticipative solution of (2.9). Note that $e_t = \epsilon_t(\vartheta_0)$ and $h_t = \sigma_t^2(\varphi_0)$. The following result extends Theorem 2.1.

Theorem 3.1. Let $(\hat\varphi_n)$ be a sequence of QMLEs satisfying (3.2). Assume that $E\eta_t = 0$. Then, under (A2)–(A4) and (A7)–(A9), $\hat\varphi_n \to \varphi_0$, almost surely, as $n \to \infty$.

Remark 3.1. Ling and Li (1997; 1998) announced theoretical results for the MLE and QMLE of unstable and fractionally integrated ARMA models with GARCH innovations. However, they only obtained results for local estimators, that is, for sequences of solutions to the likelihood equation.

Remark 3.2. Ling and McAleer (2003a) considered QMLEs for vector ARMA-GARCH models. Their consistency result requires the existence of a second-order moment.

Remark 3.3. (A9) is an identifiability assumption. In the literature on ARMA estimation, the assumption that $a_{0P} \ne 0$ and $b_{0Q} \ne 0$ is often made. This excludes interesting situations where, for instance, an AR(1) model is fitted to a white noise.

Remark 3.4. As in the pure GARCH case, the process $\epsilon_t$ (and hence $X_t$) need not have finite variance. In the pure ARMA case, where $\epsilon_t = \eta_t$ has finite variance, our theorem reduces to a classical result on ARMA models based on i.i.d. innovations (see Brockwell and Davis, 1991, p. 384). For ARMA processes with i.i.d. infinite-variance innovations, the asymptotic distribution of the QMLE is not standard; see Mikosch et al. (1995) and Kokoszka and Taqqu (1996).

Remark 3.5. Apart from the condition $E\eta_t = 0$, the assumptions required for the strong consistency are not strengthened when an ARMA part is added. One might wonder whether the normality Theorem 2.2 also extends without cost in terms of assumptions. Unfortunately, the answer is negative, as the following example reveals. Consider the AR(1)-ARCH(1) model

$$X_t = a_{01} X_{t-1} + e_t, \qquad e_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \alpha_{01}\, e_{t-1}^2, \tag{3.3}$$

where $|a_{01}| < 1$, $\omega_0 > 0$, $\alpha_{01} \ge 0$, and $(\eta_t)$ is an i.i.d. sequence such that, for some $a > 1$,

$$P(\eta_t = a) = P(\eta_t = -a) = \frac{1}{2a^2}, \qquad P(\eta_t = 0) = 1 - \frac{1}{a^2}.$$

It can easily be seen that for an ARCH(1) process, the strict stationarity condition is $\alpha_{01} < \exp\{-E(\log \eta_t^2)\}$. For any $\alpha_{01}$, the process $(X_t)$ is therefore strictly stationary, since $\exp\{-E(\log \eta_t^2)\} = +\infty$. However, $X_t$ does not have a second-order moment when $\alpha_{01} \ge 1$. The first component of the (normalized) score vector is

$$\frac{\partial \ell_t(\varphi_0)}{\partial a_1} = \left(1 - \frac{e_t^2}{h_t}\right)\left(\frac{1}{h_t} \frac{\partial \sigma_t^2(\varphi_0)}{\partial a_1}\right) + \frac{2e_t}{h_t} \frac{\partial \epsilon_t(\vartheta_0)}{\partial a_1} = -2\alpha_{01}(1 - \eta_t^2)\, \frac{e_{t-1} X_{t-2}}{h_t} - 2\eta_t\, \frac{X_{t-1}}{\sqrt{h_t}}.$$

We have

$$\begin{aligned}
E\left\{\alpha_{01}(1 - \eta_t^2)\, \frac{e_{t-1} X_{t-2}}{h_t} + \eta_t\, \frac{X_{t-1}}{\sqrt{h_t}}\right\}^2
&\ge E\left[\left\{\alpha_{01}(1 - \eta_t^2)\, \frac{e_{t-1} X_{t-2}}{h_t} + \eta_t\, \frac{X_{t-1}}{\sqrt{h_t}}\right\}^2 \,\Big|\, \eta_{t-1} = 0\right] P(\eta_{t-1} = 0)\\
&= \frac{a_{01}^2}{\omega_0}\left(1 - \frac{1}{a^2}\right) E(X_{t-2}^2),
\end{aligned}$$

because $\eta_{t-1} = 0$ implies $e_{t-1} = 0$ and $X_{t-1} = a_{01} X_{t-2}$, and because $\eta_t$, $\eta_{t-1}$ and $X_{t-2}$ are independent. Therefore, if $EX_t^2 = \infty$ and $a_{01} \ne 0$ the variance of the score vector is not defined. In Theorem 2.2, the asymptotic variance of the estimator of the pure GARCH parameter is proportional to the (finite) variance of the score vector (see Remark 2.8). This example shows that Theorem 2.2 does not extend to the ARMA-GARCH class. This is not very surprising since the asymptotic normality of the estimators of pure ARMA models with i.i.d. innovations (which belong to our general class) is obtained under second-order moment assumptions (see Brockwell and Davis 1991). For ARMA models with infinite-variance noise, the rate of convergence is faster than in the standard case and asymptotic stable laws are obtained (see Davis et al. 1992; Mikosch et al. 1995).
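The properties of the three-point distribution used in Remark 3.5 follow from direct computation. The lines below are a reading aid added here, not part of the original argument; they only use the probabilities stated in the remark.

```latex
\begin{align*}
E\eta_t   &= a \cdot \frac{1}{2a^2} + (-a) \cdot \frac{1}{2a^2} = 0,\\
E\eta_t^2 &= a^2 \cdot \frac{1}{2a^2} + a^2 \cdot \frac{1}{2a^2} = 1,\\
E\eta_t^4 &= a^4 \cdot \frac{1}{2a^2} + a^4 \cdot \frac{1}{2a^2} = a^2 < \infty,\\
E\log\eta_t^2 &= -\infty
  \quad\text{because } P(\eta_t = 0) = 1 - \frac{1}{a^2} > 0 \text{ for } a > 1,
\end{align*}
% hence \exp\{-E(\log \eta_t^2)\} = +\infty and the ARCH(1) strict
% stationarity condition holds for every value of \alpha_{01}.
```

So the noise satisfies $E\eta_t = 0$, $E\eta_t^2 = 1$ and the fourth-moment condition (A6), while the mass at zero makes the process strictly stationary whatever the value of $\alpha_{01}$.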

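We establish asymptotic normality under a fourth-order moment assumption. Chen and An (1998) showed that there exists a non-anticipative and strictly stationary solution of (2.1) with finite fourth-order moment if and only if $\rho\{E(A_{0t} \otimes A_{0t})\} < 1$. We assume that

(A10) $\rho\{E(A_{0t} \otimes A_{0t})\} < 1$, and $\forall \theta \in \Theta$, $\sum_{j=1}^{p} \beta_j < 1$.

This assumption implies that $\kappa = E(\eta_t^4) < \infty$. Also, (A2) becomes redundant. Analogously to the pure GARCH case, we assume that

(A11) $\varphi_0 \in \mathring\Phi$, where $\mathring\Phi$ denotes the interior of $\Phi$.

For identifiability reasons we also make the following assumption, which is slightly stronger than the first part of (A3) when $\eta_t$ has a non-symmetric distribution.

(A12) There exists no set $\Lambda$ of cardinality 2 such that $P(\eta_t \in \Lambda) = 1$.

Theorem 3.2. Assume that $E\eta_t = 0$. Under assumptions (A3)–(A4) and (A8)–(A12), $\sqrt{n}(\hat\varphi_n - \varphi_0)$ is asymptotically distributed as $N(0, \Sigma)$, where $\Sigma = J^{-1} I J^{-1}$, with

$$I = E_{\varphi_0}\left(\frac{\partial \ell_t(\varphi_0)}{\partial\varphi} \frac{\partial \ell_t(\varphi_0)}{\partial\varphi^T}\right), \qquad J = E_{\varphi_0}\left(\frac{\partial^2 \ell_t(\varphi_0)}{\partial\varphi\,\partial\varphi^T}\right).$$

If, in addition, the distribution of $\eta_t$ is symmetric, we have

$$I = \begin{pmatrix} I_1 & 0 \\ 0 & I_2 \end{pmatrix}, \qquad J = \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix},$$

with

$$\begin{aligned}
I_1 &= (\kappa - 1)\, E_{\varphi_0}\left(\frac{1}{\sigma_t^4} \frac{\partial\sigma_t^2}{\partial\vartheta} \frac{\partial\sigma_t^2}{\partial\vartheta^T}(\varphi_0)\right) + 4\, E_{\varphi_0}\left(\frac{1}{\sigma_t^2} \frac{\partial\epsilon_t}{\partial\vartheta} \frac{\partial\epsilon_t}{\partial\vartheta^T}(\varphi_0)\right),\\
I_2 &= (\kappa - 1)\, E_{\varphi_0}\left(\frac{1}{\sigma_t^4} \frac{\partial\sigma_t^2}{\partial\theta} \frac{\partial\sigma_t^2}{\partial\theta^T}(\varphi_0)\right),\\
J_1 &= E_{\varphi_0}\left(\frac{1}{\sigma_t^4} \frac{\partial\sigma_t^2}{\partial\vartheta} \frac{\partial\sigma_t^2}{\partial\vartheta^T}(\varphi_0)\right) + E_{\varphi_0}\left(\frac{2}{\sigma_t^2} \frac{\partial\epsilon_t}{\partial\vartheta} \frac{\partial\epsilon_t}{\partial\vartheta^T}(\vartheta_0)\right),\\
J_2 &= E_{\varphi_0}\left(\frac{1}{\sigma_t^4} \frac{\partial\sigma_t^2}{\partial\theta} \frac{\partial\sigma_t^2}{\partial\theta^T}(\varphi_0)\right).
\end{aligned}$$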

Remark 3.6. When applied to the ARMA-GARCH case, the asymptotic normality result given by Ling and McAleer (2003a) requires the existence of sixth-order moments. Moreover, the stationarity conditions are imposed over the whole parameter space.

Remark 3.7. In the proof of the theorem, we show the existence and positive definiteness of the matrices $I$ and $J$. Notice that when $\eta_t$ has a symmetric distribution, $\Sigma$ is block-diagonal, which is important in the testing of joint assumptions on ARMA and GARCH coefficients. In addition, the bottom right-hand block $J_2^{-1} I_2 J_2^{-1}$ of $\Sigma$ depends on the GARCH coefficients only. In other words, the asymptotic accuracy of the GARCH estimators is not affected by the presence of an ARMA part.

Remark 3.8. It can easily be seen that assumption (A11) constrains only the GARCH coefficients. For any value of $\vartheta_0$, the restriction of $\Phi$ to its first $P + Q + 1$ components can be chosen sufficiently large that its interior contains $\vartheta_0$ and assumption (A8) is not violated. Assumption (A11), however, requires the GARCH coefficients to be strictly positive.

4. Proofs

Let $K$ and $\rho$ be generic constants taking many different values $K > 0$ and $0 < \rho < 1$ throughout the proofs. For instance, we will allow ourselves to write, for $0 < \rho_1 < 1$ and $0 < \rho_2 < 1$, $i_1 \ge 0$, $i_2 \ge 0$,

$$0 < K \sum_{i \ge i_1} \rho_1^i + K \sum_{i \ge i_2} i\,\rho_2^i \le K \rho^{\min(i_1, i_2)}.$$

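4.1. Proof of Theorem 2.1

Rewrite (2.9) in vector form as

$$\underline\sigma_t^2 = \underline c_t + B\,\underline\sigma_{t-1}^2, \tag{4.1}$$

where

$$\underline\sigma_t^2 = \begin{pmatrix} \sigma_t^2 \\ \sigma_{t-1}^2 \\ \vdots \\ \sigma_{t-p+1}^2 \end{pmatrix}, \qquad
\underline c_t = \begin{pmatrix} \omega + \sum_{i=1}^{q} \alpha_i\, \epsilon_{t-i}^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_p \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}. \tag{4.2}$$

We will establish the following intermediate results:

(i) $\lim_{n \to \infty} \sup_{\theta \in \Theta} |l_n(\theta) - \tilde l_n(\theta)| = 0$ a.s.
(ii) ($\exists t \in \mathbb{Z}$ such that $\sigma_t^2(\theta) = \sigma_t^2(\theta_0)$ $P_{\theta_0}$-a.s.) $\Rightarrow \theta = \theta_0$.
(iii) $E_{\theta_0} |\ell_t(\theta_0)| < \infty$, and if $\theta \ne \theta_0$, $E_{\theta_0} \ell_t(\theta) > E_{\theta_0} \ell_t(\theta_0)$.
(iv) Any $\theta \ne \theta_0$ has a neighbourhood $V(\theta)$ such that $\liminf_{n \to \infty} \inf_{\theta^* \in V(\theta)} \tilde l_n(\theta^*) > E_{\theta_0} \ell_1(\theta_0)$ a.s.

To prove (i) first note that, using Proposition 1 and the compactness of $\Theta$,

$$\sup_{\theta \in \Theta} \rho(B) < 1. \tag{4.3}$$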

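Hence, iterating (4.1), we obtain

$$\underline\sigma_t^2 = \underline c_t + B\underline c_{t-1} + B^2 \underline c_{t-2} + \dots + B^{t-1} \underline c_1 + B^t \underline\sigma_0^2 = \sum_{k=0}^{\infty} B^k \underline c_{t-k}. \tag{4.4}$$

Let $\underline{\tilde\sigma}_t^2$ be the vector obtained by replacing $\sigma_{t-i}^2$ by $\tilde\sigma_{t-i}^2$ in $\underline\sigma_t^2$. Let $\underline{\tilde c}_k$ be the vector obtained from $\underline c_k$ by replacing $\epsilon_0^2, \dots, \epsilon_{1-q}^2$ by the initial values (2.6) or (2.7). We have

$$\underline{\tilde\sigma}_t^2 = \underline c_t + B\underline c_{t-1} + \dots + B^{t-q-1} \underline c_{q+1} + B^{t-q}\, \underline{\tilde c}_q + \dots + B^{t-1}\, \underline{\tilde c}_1 + B^t\, \underline{\tilde\sigma}_0^2. \tag{4.5}$$

In view of (4.3)–(4.5), almost surely,

$$\sup_{\theta \in \Theta} \|\underline\sigma_t^2 - \underline{\tilde\sigma}_t^2\| = \sup_{\theta \in \Theta} \left\| \sum_{k=1}^{q} B^{t-k} (\underline c_k - \underline{\tilde c}_k) + B^t (\underline\sigma_0^2 - \underline{\tilde\sigma}_0^2) \right\| \le K\rho^t, \qquad \forall t. \tag{4.6}$$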

Thus, using log x < x  1, almost surely, supjl n (Ł)  ~l n (Ł)j < n1

Ł2¨

n X t¼1


EŁ0 log 2t þ log t2 ¼0  t (Ł0 )  t (Ł)

EŁ0 ‘ t (Ł)  EŁ0 ‘ t (Ł0 ) ¼ EŁ0 log

(4:9)

with equality if and only if  2t (Ł0 )= 2t (Ł) ¼ 1 PŁ0 -a.s. It remains to show (iv). For any Ł 2 ¨ and any positive integer k, let Vk (Ł) be the open ball with centre Ł and radius 1/k. In view of (i), lim inf

inf

n!1 Ł 2V k (Ł)\¨

~l n (Ł ) > lim inf

inf

n!1 Ł 2V k (Ł)\¨

> lim inf n1 n!1

n X

l n (Ł )  lim sup supjl n (Ł)  ~l n (Ł)j n!1

inf

 t¼1 Ł 2V k (Ł)\¨

Ł2¨

‘ t (Ł ):

Now we use the following ergodic Ptheorem: if (X t ) is a stationary and ergodic process such that EX 1 2 R [ fþ1g, then n1 nt¼1 X t converges almost surely to EX 1 when n ! 1 (see Billingsley 1995, pp. 284 and 495). Applying this theorem to finf Ł 2V k (Ł)\¨ ‘ t (Ł )g t and using EŁ0 ‘t (Ł) , 1, we obtain lim inf n1 n!1

n X

inf

 t¼1 Ł 2V k (Ł)\¨

‘ t (Ł ) ¼ EŁ0

inf

Ł 2V k (Ł)\¨

‘1 (Ł ):

By the Beppo-Levi theorem, when k increases to 1, EŁ0 inf Ł 2V k (Ł)\¨ ‘1 (Ł ) increases to EŁ0 ‘1 (Ł). In view of (4.9), (iv) is proved. By a standard compactness argument we complete the proof of Theorem 2.1.

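4.2. Proof of Theorem 2.2

The proof rests classically on a Taylor series expansion of the score vector around $\theta_0$. We have

$$\begin{aligned}
0 &= n^{-1/2} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde\ell_t(\hat\theta_n)\\
&= n^{-1/2} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde\ell_t(\theta_0) + \left(\frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \tilde\ell_t(\theta_{ij}^*)\right) \sqrt{n}(\hat\theta_n - \theta_0),
\end{aligned}$$

where the $\theta_{ij}^*$ are between $\hat\theta_n$ and $\theta_0$. We will show that

$$n^{-1/2} \sum_{t=1}^{n} \frac{\partial}{\partial\theta} \tilde\ell_t(\theta_0) \Rightarrow N(0, (\kappa - 1)J) \tag{4.10}$$

and

$$n^{-1} \sum_{t=1}^{n} \frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \tilde\ell_t(\theta_{ij}^*) \to J(i, j) \quad \text{in probability}. \tag{4.11}$$

The theorem will straightforwardly follow. Again, we will split the proof into several intermediate results:

(i) $E_{\theta_0} \left\| \dfrac{\partial\ell_t(\theta_0)}{\partial\theta} \dfrac{\partial\ell_t(\theta_0)}{\partial\theta^T} \right\| < \infty$, $E_{\theta_0} \left\| \dfrac{\partial^2 \ell_t(\theta_0)}{\partial\theta\,\partial\theta^T} \right\| < \infty$.
(ii) $J$ is non-singular and $\operatorname{var}_{\theta_0}\{\partial\ell_t(\theta_0)/\partial\theta\} = \{\kappa - 1\}J$.
(iii) There exists a neighbourhood $V(\theta_0)$ of $\theta_0$ such that, for all $i, j, k \in \{1, \dots, p+q+1\}$,
$$E_{\theta_0} \sup_{\theta \in V(\theta_0)} \left| \frac{\partial^3 \ell_t(\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \right| < \infty.$$
(iv) $\left\| n^{-1/2} \sum_{t=1}^{n} \{\partial\ell_t(\theta_0)/\partial\theta - \partial\tilde\ell_t(\theta_0)/\partial\theta\} \right\| \to 0$ and $\sup_{\theta \in V(\theta_0)} \left\| n^{-1} \sum_{t=1}^{n} \{\partial^2 \ell_t(\theta)/\partial\theta\,\partial\theta^T - \partial^2 \tilde\ell_t(\theta)/\partial\theta\,\partial\theta^T\} \right\| \to 0$ in probability when $n \to \infty$.
(v) $n^{-1/2} \sum_{t=1}^{n} \partial\ell_t(\theta_0)/\partial\theta \Rightarrow N(0, (\kappa - 1)J)$.
(vi) $n^{-1} \sum_{t=1}^{n} \partial^2 \ell_t(\theta_{ij}^*)/\partial\theta_i\,\partial\theta_j \to J(i, j)$ a.s.

The derivatives of $\ell_t = \epsilon_t^2/\sigma_t^2 + \log \sigma_t^2$ are given by

$$\frac{\partial\ell_t}{\partial\theta} = \left\{1 - \frac{\epsilon_t^2}{\sigma_t^2}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta}\right\}, \tag{4.12}$$

$$\frac{\partial^2 \ell_t}{\partial\theta\,\partial\theta^T} = \left\{1 - \frac{\epsilon_t^2}{\sigma_t^2}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial^2 \sigma_t^2}{\partial\theta\,\partial\theta^T}\right\} + \left\{2\frac{\epsilon_t^2}{\sigma_t^2} - 1\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta^T}\right\}. \tag{4.13}$$

For $\theta = \theta_0$, $\epsilon_t^2/\sigma_t^2 = \eta_t^2$ is independent of the terms involving $\sigma_t^2$ and its derivatives. To prove (i) it will therefore be sufficient to show that

$$E_{\theta_0} \left\| \frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta}(\theta_0) \right\| < \infty, \qquad
E_{\theta_0} \left\| \frac{1}{\sigma_t^2} \frac{\partial^2 \sigma_t^2}{\partial\theta\,\partial\theta^T}(\theta_0) \right\| < \infty, \qquad
E_{\theta_0} \left\| \frac{1}{\sigma_t^4} \frac{\partial\sigma_t^2}{\partial\theta} \frac{\partial\sigma_t^2}{\partial\theta^T}(\theta_0) \right\| < \infty. \tag{4.14}$$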

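By (4.4) we have

$$\frac{\partial\underline\sigma_t^2}{\partial\omega} = \sum_{k=0}^{\infty} B^k \underline 1, \qquad \frac{\partial\underline\sigma_t^2}{\partial\alpha_i} = \sum_{k=0}^{\infty} B^k \underline\epsilon_{t-k-i}^2, \tag{4.15}$$

$$\frac{\partial\underline\sigma_t^2}{\partial\beta_j} = \sum_{k=1}^{\infty} \left\{ \sum_{i=1}^{k} B^{i-1} B^{(j)} B^{k-i} \right\} \underline c_{t-k}, \tag{4.16}$$

where $\underline 1 = (1, 0, \dots, 0)^T$, $\underline\epsilon_t^2 = (\epsilon_t^2, 0, \dots, 0)^T$, and $B^{(j)}$ is a $p \times p$ matrix with $(1, j)$th element 1 and all other elements 0. Notice that, by the positivity of the coefficients in (4.15)–(4.16), the derivatives of $\sigma_t^2$ are non-negative. From (4.15), it is clear that $\partial\sigma_t^2/\partial\omega$ is bounded. Since $\sigma_t^2 \ge \omega > 0$, this is also the case for $\{\partial\sigma_t^2/\partial\omega\}/\sigma_t^2$ which therefore possesses moments of any order. By (4.15) we have

$$\alpha_i \frac{\partial\underline\sigma_t^2}{\partial\alpha_i} = \sum_{k=0}^{\infty} B^k \alpha_i\, \underline\epsilon_{t-k-i}^2 \le \sum_{k=0}^{\infty} B^k \underline c_{t-k} = \underline\sigma_t^2,$$

from which we deduce

$$\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\alpha_i} \le \frac{1}{\alpha_i}. \tag{4.17}$$

Hence $\sigma_t^{-2}(\partial\sigma_t^2/\partial\alpha_i)$ has moments of all orders at $\theta = \theta_0$. In view of (4.16) and $\beta_j B^{(j)} \le B$, we have

$$\beta_j \frac{\partial\underline\sigma_t^2}{\partial\beta_j} \le \sum_{k=1}^{\infty} \left\{ \sum_{i=1}^{k} B^{i-1} B B^{k-i} \right\} \underline c_{t-k} = \sum_{k=1}^{\infty} k B^k \underline c_{t-k}. \tag{4.18}$$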

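Using (4.3), we have $\|B^k\| \le K\rho^k$ for all $k$. Invoking Proposition 1 and the elementary inequality $(a + b)^s \le a^s + b^s$ for all $a, b \ge 0$, we deduce that $\underline c_t(1) = \omega + \sum_{i=1}^{q} \alpha_i\, \epsilon_{t-i}^2$ has a moment of order $s$, for some $s \in\, ]0, 1[$. Using (4.18), the inequalities $\sigma_t^2 \ge \omega + B^k(1, 1)\,\underline c_{t-k}(1)$ and $x/(1 + x) \le x^s$ for all $x \ge 0$, we obtain

$$E_{\theta_0} \frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\beta_j} \le \frac{1}{\beta_j} \sum_{k=1}^{\infty} E_{\theta_0} \frac{k B^k(1, 1)\, \underline c_{t-k}(1)}{\omega + B^k(1, 1)\, \underline c_{t-k}(1)}$$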


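0, the Minkowski inequality gives

$$\left\{ E_{\theta_0} \left( \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\beta_j} \right)^2 \right\}^{1/2} \le \frac{1}{\beta_{0j}} \sum_{k=1}^{\infty} k \left\{ E_{\theta_0} \left( \frac{B^k(1, 1)\, \underline c_{t-k}(1)}{\omega_0} \right)^s \right\}^{1/2} < \infty.$$

Finally, the Cauchy–Schwarz inequality allows us to conclude that the third expectation in (4.14) exists.

We now prove (ii). By (4.12) and (i), we have

$$E_{\theta_0} \left\{ \frac{\partial\ell_t(\theta_0)}{\partial\theta} \right\} = E_{\theta_0}(1 - \eta_t^2)\, E_{\theta_0} \left\{ \frac{1}{\sigma_t^2(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \right\} = 0.$$

Now using (4.13) and (i), $J$ exists and (2.10) holds. We also have

$$\begin{aligned}
\operatorname{var}_{\theta_0} \left\{ \frac{\partial\ell_t(\theta_0)}{\partial\theta} \right\}
&= E_{\theta_0} \left\{ \frac{\partial\ell_t(\theta_0)}{\partial\theta} \frac{\partial\ell_t(\theta_0)}{\partial\theta'} \right\}\\
&= E\{(1 - \eta_t^2)^2\}\, E_{\theta_0} \left\{ \frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'} \right\}\\
&= (\kappa - 1)J.
\end{aligned} \tag{4.23}$$

Now suppose

$$\lambda^T J \lambda = E\left[ \frac{1}{\sigma_t^4(\theta_0)} \left( \lambda^T \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \right)^2 \right] = 0$$

for some vector $\lambda \in \mathbb{R}^{p+q+1}$. Then, almost surely, $\lambda^T \{\partial\sigma_t^2(\theta_0)/\partial\theta\} = 0$. In view of (2.9) and the stationarity of $\{\partial\sigma_t^2(\theta_0)/\partial\theta\}_t$, we have

$$0 = \lambda^T \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}
= \lambda^T \begin{pmatrix} 1 \\ \epsilon_{t-1}^2 \\ \vdots \\ \epsilon_{t-q}^2 \\ \sigma_{t-1}^2(\theta_0) \\ \vdots \\ \sigma_{t-p}^2(\theta_0) \end{pmatrix}
+ \sum_{j=1}^{p} \beta_{0j}\, \lambda^T \frac{\partial\sigma_{t-j}^2(\theta_0)}{\partial\theta}
= \lambda^T \begin{pmatrix} 1 \\ \epsilon_{t-1}^2 \\ \vdots \\ \epsilon_{t-q}^2 \\ \sigma_{t-1}^2(\theta_0) \\ \vdots \\ \sigma_{t-p}^2(\theta_0) \end{pmatrix}.$$

Write $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_{q+p})^T$. It is clear that $\lambda_1 = 0$, otherwise $\epsilon_{t-1}^2$ would be measurable with respect to the $\sigma$-field generated by $\{\eta_u, u < t - 1\}$. For the same reason, it can be shown that $\lambda_2 = \dots = \lambda_{2+i} = 0$ if $\lambda_{q+1} = \dots = \lambda_{q+i} = 0$. Therefore $\lambda \ne 0$ entails a GARCH(p − 1, q − 1) representation. This is impossible in view of (A4) using the arguments given to establish (4.8). Therefore $\lambda^T J \lambda = 0$ implies $\lambda = 0$, which completes the proof of (ii).

To prove (iii) we differentiate (4.13), which gives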

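$$\begin{aligned}
\frac{\partial^3 \ell_t(\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}
&= \left\{1 - \frac{\epsilon_t^2}{\sigma_t^2}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial^3 \sigma_t^2}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\right\}\\
&\quad + \left\{2\frac{\epsilon_t^2}{\sigma_t^2} - 1\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_i}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial^2 \sigma_t^2}{\partial\theta_j\,\partial\theta_k}\right\}\\
&\quad + \left\{2\frac{\epsilon_t^2}{\sigma_t^2} - 1\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_j}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial^2 \sigma_t^2}{\partial\theta_i\,\partial\theta_k}\right\}\\
&\quad + \left\{2\frac{\epsilon_t^2}{\sigma_t^2} - 1\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_k}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial^2 \sigma_t^2}{\partial\theta_i\,\partial\theta_j}\right\}\\
&\quad + \left\{2 - 6\frac{\epsilon_t^2}{\sigma_t^2}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_i}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_j}\right\} \left\{\frac{1}{\sigma_t^2} \frac{\partial\sigma_t^2}{\partial\theta_k}\right\}.
\end{aligned} \tag{4.24}$$

We first prove that $\{1 - \epsilon_t^2/\sigma_t^2\}$ is integrable. This term is difficult to handle because $\epsilon_t^2/\sigma_t^2$ is not uniformly integrable over $\Theta$: in $\theta = (\omega, 0, \dots, 0)^T$, the ratio $\epsilon_t^2/\sigma_t^2$ is not integrable when $E\epsilon_t^2 = \infty$. However, we will show that $\{1 - \epsilon_t^2/\sigma_t^2\}$ is uniformly integrable in a neighbourhood of $\theta_0$. Let $\Theta^*$ be a compact set containing $\theta_0$ and included in $\mathring\Theta$. Denote by $B_0$ the matrix $B$ (defined in (4.2)) evaluated at $\theta = \theta_0$. For all $\delta > 0$, there exists a neighbourhood $V(\theta_0)$ of $\theta_0$, with $V(\theta_0) \subset \Theta^*$, such that $B_0 \le (1 + \delta)B$ for all $\theta \in V(\theta_0)$. From (4.4), we obtain

$$\sigma_t^2 = \omega \sum_{k=0}^{\infty} B^k(1, 1) + \sum_{i=1}^{q} \alpha_i \left\{ \sum_{k=0}^{\infty} B^k(1, 1)\, \epsilon_{t-k-i}^2 \right\}.$$

Since $V(\theta_0) \subset \Theta^*$, we have $\sup_{\theta \in V(\theta_0)} 1/\alpha_i < \infty$. Using also $x/(1 + x) \le x^s$ for all $x \ge 0$ and any $s \in\, ]0, 1[$, we obtain

$$\sup_{\theta \in V(\theta_0)} \frac{\sigma_t^2(\theta_0)}{\sigma_t^2} \le \sup_{\theta \in V(\theta_0)} \left\{ \frac{\omega_0 \sum_{k=0}^{\infty} B_0^k(1, 1)}{\omega} + \sum_{i=1}^{q} \alpha_{0i} \left( \sum_{k=0}^{\infty} \frac{B_0^k(1, 1)\, \epsilon_{t-k-i}^2}{\omega + \alpha_i B^k(1, 1)\, \epsilon_{t-k-i}^2} \right) \right\}$$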