Dynamic stock markets clustering

ANNA CZAPKIEWICZ*, ARTUR MACHNO*

SUMMARY

Dependencies between international stock markets determine the risk of an international investment strategy. Clustering methods provide investors with knowledge of how to diversify portfolios more efficiently. In this article we propose a procedure of dynamical clustering. Our conjecture is that the clustering is time dependent and that the dynamical properties of multivariate time series influence the clustering. We consider two dynamical multivariate models: the multivariate GARCH model with dynamic conditional correlation (DCC) and the copula regime switching model. Either of the two models makes dynamical clustering, and forecasting of future clustering, possible. The data consist of daily returns of 36 market indices from all over the world from October 2002 to April 2012. We have included markets of North and South America, Europe, Asia and Australia. We have estimated the parameters of the 36-dimensional GARCH model, whereas in the case of the copula regime switching model we have estimated the parameters of 630 bivariate models, one for each pair of indices. These procedures have provided us with time-varying estimates of correlations, which have been used to cluster the markets by means of Ward's method. The analysis has indicated three main clusters: the Americas, Europe and Asia. The European cluster consists of two sub-clusters, developed and emerging markets. The clustering has appeared to be time dependent; we present results for three sub-periods (the period before the crisis, the period of the crisis and the post-crisis period). Moreover, minor differences between the clusterings obtained from the two considered models have been noticed.

Key words: dynamical clustering, multivariate GARCH, copula, regime switching

1. INTRODUCTION

Clustering financial time series is important in order to understand the similarities of financial markets.
This analysis, which consists in grouping markets according to a certain measure of similarity of their behavior, has proved very helpful in formulating investment strategies. However, the procedure requires the construction of a suitable similarity measure for the empirical data, one which takes into account the characteristics of financial time series. Mantegna (1999) and Bonanno, Lillo, Mantegna (2001), among others, used the Pearson correlation coefficient to construct the dissimilarity measure of a pair of stocks. However, Musetti (2012) has shown that this approach is not suitable outside the class of multivariate elliptical distributions. Piccolo (1990) introduced a dissimilarity measure for autoregressive processes. This measure was extended to the GARCH process family by Otranto (2004). In this case, proximity is identified if the underlying conditional variance equations are similar in terms of their structure. Caiado, Crato (2007) introduced an approach based on the information contained in the estimated GARCH parameters of the considered financial time series. The approach presented by Bastos, Caiado (2011) was based on the

* AGH University of Science and Technology in Cracow, Department of Applications of Mathematics in Economics.


variance ratio test statistics. The authors divided the sample into three sub-periods and found that the markets are generally connected according to their size and level of development, although the clusters contain markets from different geographical regions.

Numerous models have been proposed to describe multivariate financial data. Empirical research suggests using a dynamic multidimensional model to describe the relationships among financial time series, therefore we conjecture that clustering is dynamical and time dependent. We consider two models, the Markov regime switching model based on copulas and the multivariate GARCH model. The multivariate GARCH model was proposed by Bollerslev (1990), where the conditional correlation was assumed to be constant (the CCC model). Engle (2002) and Tse and Tsui (2002) introduced the DCC model, where the correlation matrix changes at every point in time. Pelletier (2006) proposed the Regime Switching Dynamic Correlation (RSDC) model, where the covariance was decomposed into correlations and standard deviations and both the correlations and the standard deviations were dynamic. Cappiello et al. (2006) described a model with asymmetric dynamics of the dependencies among the considered financial time series.

The time-varying dependencies between time series may also be explained by a regime switching process. This approach has been widely discussed in the literature, for example by Bartram et al. (2007), Okimoto (2008), and Kenourgios, Samitas and Paltalidis (2011). The Markov switching mechanism is the most popular way of imposing dynamics on models based on copulas. The regime switching copula model with a Markov switching mechanism for modelling financial time series was discussed by Patton (2006 and 2009), Jondeau and Rockinger (2006), Rodriguez (2007), Chollete et al. (2009) and others. The approach based on copula functions is advisable in the case of strong asymmetry in the margins.
In this work, we discuss an approach to dynamic clustering of international stock markets. We specify the time-varying parameters that define the dynamic dependence between the financial time series and use these parameters to construct a dissimilarity measure of these time series at any time. This approach allows markets to be clustered at any time point and future clusterings to be forecast. The main goal of the article is to verify empirically how the chosen model affects the dynamical clustering of financial data.

The study covers daily returns of 36 market indices from all over the world. The sample has been chosen to be diversified enough to capture all specific properties in both the geographical and the economic dimension. It includes the markets deemed emerging as well as those already developed. The data consist of indices of North and South America, Europe, Asia and Australia. The daily returns come from the period from October 2002 to April 2012. The cluster analysis is performed for every time point, however, the clustering results are presented for certain sub-periods. The selection of the periods is conditioned by the behavior of the US market. The following sub-periods have been analyzed: the period before the global crisis, the period of the crisis itself and the post-crisis period.

The main body of the paper is organized as follows. In Section 2 the models and estimation procedures are presented. The empirical study is presented in Section 3. Section 4 summarizes the results.


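2.1. Regime Switching Copula Model

We consider the Markov switching copula model in which the binary variable indicating the current regime is assumed to be latent and the marginal distributions do not depend on the regime. The parameters of the model are obtained using the maximum likelihood method. Because the number of markets under study is large, bivariate models are considered instead of a multivariate regime switching copula.

The regime switching process is indexed by an unobservable random variable denoted by $S_t$. The state variable $S_t$ follows a two-state Markov chain with the transition matrix:

$$P = \begin{pmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{pmatrix},$$

where $p_{ij} = P(S_t = j \mid S_{t-1} = i)$, $i, j \in \{1, 2\}$.

The marginal densities do not depend on the regime, therefore the estimation is performed in two steps. First, the parameters of the marginal distributions are estimated. The chosen marginal model is the AR(1)-GARCH(1,1) model with the skewed Student's t distribution as the conditional distribution. Second, the dependence structure is estimated, conditionally on the estimated marginal models. Since the Markov chain is not observable, the filter of Hamilton (for details see Hamilton (1989)) is used for estimation purposes.

For simplicity, let $c_{j,t}$ denote the copula density under the regime $j$, for $j = 1, 2$:

$$c_{j,t} = c_j(u_{1,t}, u_{2,t}; \theta_j), \qquad (1)$$

where $u_{i,t}$ is defined as follows:

$$u_{i,t} = F_i(r_{i,t} \mid \Omega_{t-1}), \quad i = 1, 2. \qquad (2)$$

Here $F_i$ is the conditional distribution function of the $i$-th return series $r_{i,t}$ obtained from the marginal model, $\theta_j$ is the copula parameter in the regime $j$ and $\Omega_{t-1}$ is the information set available at the time $t-1$. Therefore, the log-likelihood function for the dependence structure, denoted by $L_c$, takes the following form:

$$L_c = \sum_{t=1}^{T} \ln \Big( \sum_{j=1}^{2} c_{j,t} \, P(S_t = j \mid \Omega_{t-1}) \Big). \qquad (3)$$

Using the Hamilton filter, for $j = 1, 2$, we have:

$$P(S_t = j \mid \Omega_{t-1}) = \sum_{i=1}^{2} p_{ij} \, P(S_{t-1} = i \mid \Omega_{t-1}), \qquad (4)$$

where

$$P(S_{t-1} = i \mid \Omega_{t-1}) = \frac{c_{i,t-1} \, P(S_{t-1} = i \mid \Omega_{t-2})}{\sum_{k=1}^{2} c_{k,t-1} \, P(S_{t-1} = k \mid \Omega_{t-2})}. \qquad (5)$$

The formula (5) is obtained from the following equality:

$$P(S_{t-1} = i \mid \Omega_{t-1}) = \frac{f(u_{1,t-1}, u_{2,t-1}, S_{t-1} = i \mid \Omega_{t-2})}{f(u_{1,t-1}, u_{2,t-1} \mid \Omega_{t-2})}. \qquad (6)$$

The likelihood function is constructed iteratively for $t = 1, \ldots, T$. At each step we accept the values $P(S_{t-1} = i \mid \Omega_{t-1})$ as an input, next we calculate the $t$-th component of $L_c$ and produce the output $P(S_t = i \mid \Omega_t)$. The procedure described above provides us with the probability distribution of $S_t$ given the information set $\Omega_t$. However, it is useful to know the distribution of $S_t$ given the full-sample information set $\Omega_T$, the so-called smoothed probabilities. The smoothed probabilities may be obtained according to the procedure described in Hamilton (1994).

Let $\rho_j$, for $j = 1, 2$, be the correlation coefficient connected with the copula function in the regime $j$ and let $\rho_t$ be the correlation coefficient at the time $t$. The expected value of the correlation at the time $t$, conditionally on $\Omega_T$, has the following form:

$$E(\rho_t \mid \Omega_T) = \rho_1 \, P(S_t = 1 \mid \Omega_T) + \rho_2 \, P(S_t = 2 \mid \Omega_T). \qquad (7)$$

Having the full-sample information set $\Omega_T$ and the estimates $\hat{\rho}_1$ and $\hat{\rho}_2$, it is possible to estimate the conditional expected value of the correlation at the time $t$ as:

$$\hat{\rho}_t = \hat{\rho}_1 \, \hat{P}(S_t = 1 \mid \Omega_T) + \hat{\rho}_2 \, \hat{P}(S_t = 2 \mid \Omega_T). \qquad (8)$$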
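The filtering and smoothing steps of the regime switching model can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the initial regime probabilities `pi0`, the made-up density values and the regime correlations are all hypothetical, and the backward recursion used for the smoothed probabilities is the standard one often attributed to Kim, which is assumed here to correspond to the procedure referred to in the text. The copula densities under each regime are taken as given inputs.

```python
import numpy as np


def hamilton_filter(c, P, pi0):
    """Two-regime Hamilton filter for a regime switching copula.

    c   : (T, 2) array, c[t, j] = copula density under regime j at time t
    P   : (2, 2) transition matrix, P[i, j] = P(S_t = j | S_{t-1} = i)
    pi0 : (2,) regime probabilities before the first observation (assumption)
    """
    T = c.shape[0]
    predicted = np.empty((T, 2))   # P(S_t = j | information up to t-1)
    filtered = np.empty((T, 2))    # P(S_t = j | information up to t)
    loglik = 0.0
    prev = np.asarray(pi0, dtype=float)
    for t in range(T):
        predicted[t] = prev @ P            # prediction step
        joint = c[t] * predicted[t]        # density times regime probability
        lik_t = joint.sum()                # mixture density at time t
        loglik += np.log(lik_t)            # accumulate the log-likelihood
        filtered[t] = joint / lik_t        # Bayes update
        prev = filtered[t]
    return loglik, predicted, filtered


def smooth(P, predicted, filtered):
    """Backward recursion giving P(S_t = i | full sample)."""
    T = filtered.shape[0]
    smoothed = np.empty_like(filtered)
    smoothed[-1] = filtered[-1]
    for t in range(T - 2, -1, -1):
        ratio = smoothed[t + 1] / predicted[t + 1]
        smoothed[t] = filtered[t] * (P @ ratio)
    return smoothed


def expected_correlation(smoothed, rho):
    """Regime correlations weighted by the smoothed regime probabilities."""
    return smoothed @ np.asarray(rho, dtype=float)


# Toy usage with made-up numbers (hypothetical, for illustration only).
P = np.array([[0.95, 0.05], [0.10, 0.90]])
c = np.array([[1.2, 0.8], [0.5, 1.5], [0.4, 1.6], [1.3, 0.7]])
loglik, pred, filt = hamilton_filter(c, P, pi0=[0.5, 0.5])
sm = smooth(P, pred, filt)
rho_t = expected_correlation(sm, rho=[0.2, 0.8])
```

In an actual estimation the function `hamilton_filter` would be called inside a numerical optimizer, with the densities `c` recomputed from the copula parameters at every iteration; the sketch only shows one pass for fixed parameters.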

2.2. Dynamic Conditional Correlation Model

Consider the $N$-dimensional stochastic vector process $r_t$ satisfying the following Vector Autoregressive formula:

$$r_t = \mu + \Phi r_{t-1} + \varepsilon_t, \qquad (9)$$

where $\mu$ and $\Phi$ denote the constant vector and the autoregression matrix, respectively, and $\varepsilon_t$ denotes the error term. Let $\Omega_{t-1}$ denote the information set generated by the observed series up to the time $t-1$. We assume that the process is conditionally heteroscedastic, represented by:

$$\varepsilon_t = H_t^{1/2} z_t, \qquad (10)$$

where $H_t$ is the dynamic covariance matrix at the time $t$, and $z_t$ is a sequence of $N$-dimensional i.i.d. random vectors such that $E(z_t) = 0$ and $E(z_t z_t') = I_N$. Therefore, $E(\varepsilon_t \mid \Omega_{t-1}) = 0$ and $E(\varepsilon_t \varepsilon_t' \mid \Omega_{t-1}) = H_t$. Furthermore, let us assume that $z_t \sim f$, where $f$ is a continuous density function.

There are various parametric formulations that specify the covariance matrix $H_t$. In this paper, the Dynamic Conditional Correlation (DCC) specification, introduced by Engle (2002) and Tse, Tsui (2002) and modified by Aielli (2008), is considered. Hence, the covariance matrix can be decomposed as follows:

$$H_t = D_t R_t D_t, \qquad (11)$$

where $R_t$ is the conditional correlation matrix of the vector $\varepsilon_t$ and $D_t$ is the diagonal matrix whose $i$-th diagonal entry is the conditional standard deviation $\sqrt{h_{i,t}}$ of $\varepsilon_{i,t}$. The conditional variances can be estimated separately and written in the following vector form based on the GARCH(p,q) model:

$$h_t = \omega + \sum_{i=1}^{q} A_i \, (\varepsilon_{t-i} \odot \varepsilon_{t-i}) + \sum_{j=1}^{p} B_j \, h_{t-j}, \qquad (12)$$

where $h_t = (h_{1,t}, \ldots, h_{N,t})'$, $\omega$ is a vector of constants, $A_i$ and $B_j$ are diagonal matrices and $\odot$ denotes the element-wise product.

In the presented model, apart from the fact that the time-varying correlation matrix $R_t$ must be inverted at every time point (which makes the calculation much slower), it is also important to constrain it to be positive definite. The most popular of the DCC models, due to Engle (2002), achieves this constraint by modelling the proxy process $Q_t$ as:

$$Q_t = (1 - a - b) \bar{Q} + a \, u_{t-1} u_{t-1}' + b \, Q_{t-1}, \qquad (13)$$

where $a \geq 0$, $b \geq 0$, $a + b < 1$, and $\bar{Q}$ denotes the unconditional covariance matrix of the standardized errors $u_t = D_t^{-1} \varepsilon_t$. Then, the correlation matrix $R_t$ is obtained by rescaling $Q_t$ as follows:

$$R_t = \operatorname{diag}(Q_t)^{-1/2} \, Q_t \, \operatorname{diag}(Q_t)^{-1/2}. \qquad (14)$$

In this paper, the density function $f$ is assumed to be the multivariate Student's t distribution. The model parameters are estimated by the maximum likelihood method. The estimation of the realized correlations is conducted by the recursive procedure of Engle, Sheppard (2001).

3. EMPIRICAL RESULTS

3.1. Data

The investigation covers daily returns of 36 market indices from all over the world from October 2002 to April 2012. It includes the markets of North and South America (the USA, Canada, Argentina, Brazil, Mexico, Chile), Europe (Poland, Germany, the Netherlands, Greece, Austria, Hungary, France, the UK, Italy, Finland, Spain, Norway, the Czech Republic, Russia, Switzerland), Asia (India, Hong Kong, Turkey, Indonesia, Malaysia, South Korea, Japan, Philippines, Thailand, China, Singapore, Taiwan) and Australia. In the case of the USA, two different indices, DJIA and NASDAQ, are included. In addition, the BEL20, the benchmark stock market index of Euronext Brussels, is also included in the research. All considered indices are denominated in US dollars. To deal with the missing data in the sample, linear approximation has been used. The daily returns are computed as the difference between the logarithm of the price at the time $t$ and the logarithm of the price at the time $t-1$, multiplied by 100, that is $r_t = 100 \, (\ln p_t - \ln p_{t-1})$, where $p_t$ is the index level. Table 1 contains some descriptive statistics.

Table 1. Descriptive statistics of the data


[Table 1 body not recovered from the source; surviving column headers: Average, Std dev., Skewness.]
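To make the DCC recursion of equations (13) and (14) more concrete, the following Python sketch iterates the proxy process and rescales it into a correlation matrix at every time point. It is only an illustration under stated assumptions: the function name `dcc_correlations`, the parameter values `a` and `b`, the choice to start the recursion at the sample second-moment matrix, and the simulated input are hypothetical and are not taken from the paper, where the parameters are estimated by maximum likelihood under a Student's t density.

```python
import numpy as np


def dcc_correlations(u, a, b):
    """Return the (T, N, N) array of conditional correlation matrices R_t.

    u    : (T, N) array of standardized errors
    a, b : scalar DCC parameters with a >= 0, b >= 0 and a + b < 1
    """
    if a < 0 or b < 0 or a + b >= 1:
        raise ValueError("need a >= 0, b >= 0 and a + b < 1")
    T, N = u.shape
    q_bar = u.T @ u / T            # sample second-moment matrix of u
    q = q_bar.copy()               # assumption: start the recursion at q_bar
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            outer = np.outer(u[t - 1], u[t - 1])
            # proxy process: weighted sum of q_bar, last shock and last q
            q = (1.0 - a - b) * q_bar + a * outer + b * q
        scale = 1.0 / np.sqrt(np.diag(q))
        R[t] = q * np.outer(scale, scale)   # rescale q to unit diagonal
    return R


# Simulated placeholder data (hypothetical; real input would be the
# standardized residuals from the univariate GARCH step).
rng = np.random.default_rng(0)
u = rng.standard_normal((500, 3))
R = dcc_correlations(u, a=0.05, b=0.90)
```

Because each update is a non-negative combination of a positive definite matrix and positive semi-definite terms, every `R[t]` produced this way is a valid correlation matrix, which is the property the text attributes to the proxy-process construction.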






One can see from Table 1 that the average of the returns ranges from -0.026 percent to 0.106 percent. Generally, the median of a considered return series is greater than its average and the skewness is negative, so a skewed distribution of returns should be taken into consideration. Kurtosis takes values from 2.667 to 23.160, so a tendency towards high kurtosis should also be taken into account in the modelling. In order to examine the properties of the time series, especially autocorrelation and heteroscedasticity, the Ljung-Box and Engle tests have been carried out. The test results indicate the existence of first-lag autocorrelation and a GARCH effect for almost all of the considered returns.

3.2. Cluster analysis

The markets are connected to one another, with a few exceptions. It is therefore interesting to indicate which of them have similar dependencies. Furthermore, empirical results suggest that the intensity of the dependencies between financial markets changes over time. This change in the intensity of the dependencies has an impact on the dynamics of market clustering. We use Ward's method for clustering. To perform dynamic clustering, let us define the dissimilarity matrix $D_t = [d_{ij,t}]$, where the element $d_{ij,t}$ is a function of the estimated conditional correlation between variables $i$ and $j$ at the time $t$, denoted by $\hat{\rho}_{ij,t}$. This correlation is calculated for both models, the regime switching copula and the multivariate GARCH. It is impossible to show results for each time point, so the cluster analysis results for particular sub-periods are presented. In this case the elements of the dissimilarity matrix are computed over the interval from $t_1$ to $t_2$, where $t_1$ and $t_2$ are the moments of the beginning and the end of the considered sub-period. It should be noted that the time-varying correlation coefficient has been estimated on the entire sample. The selection of the periods has been conditioned by the behavior of the US market.
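The sub-period clustering step described above can be sketched in Python with SciPy's hierarchical clustering routines. The sketch rests on assumptions that are not confirmed by the text: the correlations are averaged over the sub-period, and the dissimilarity is taken to be $d = \sqrt{2(1 - \rho)}$, a distance commonly used in correlation-based clustering; the formula actually used in the paper may differ. The function name `ward_clusters` and the toy correlation matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def ward_clusters(R, t1, t2, n_clusters):
    """Cluster markets on the sub-period [t1, t2) of a (T, N, N) correlation array."""
    rho_bar = R[t1:t2].mean(axis=0)        # assumption: average correlation over the sub-period
    d = np.sqrt(np.clip(2.0 * (1.0 - rho_bar), 0.0, None))   # assumption: d = sqrt(2 (1 - rho))
    np.fill_diagonal(d, 0.0)
    condensed = squareform(d, checks=False)    # square matrix -> condensed distance vector
    Z = linkage(condensed, method="ward")      # Ward's agglomerative criterion
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, Z


# Toy input (hypothetical): four markets, constant correlations over 10 days.
rho = np.array([[1.0, 0.8, 0.1, 0.2],
                [0.8, 1.0, 0.2, 0.1],
                [0.1, 0.2, 1.0, 0.7],
                [0.2, 0.1, 0.7, 1.0]])
R = np.repeat(rho[None, :, :], 10, axis=0)
labels, Z = ward_clusters(R, t1=0, t2=10, n_clusters=2)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` together with the market names to draw a tree of the kind shown in the figures. Running the function with `t1 = t2 - 1` for successive values of `t2` would give a clustering for every single time point.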
Three sub-periods have been considered: the period before the global crisis (from the beginning of the sample to July 2007), the period of the crisis itself (from July 2007 to December 2008) and, finally, the post-crisis







period (from January 2009 to the end of the sample, that is, April 2012). In the empirical study the time zone differences have been taken into account.

In the period before the global crisis (Figure 1) the following clusters are clearly observed: American markets, European markets and Asian markets. The European cluster consists of two sub-clusters, emerging markets and developed markets. These clusters are clear-cut and the result corresponds with our prior expectations. The only clear difference in the clustering based on the full sample is that Philippines is in the cluster with the American markets under the copula model, whereas it is in the Asian cluster under the mGARCH model.

Figure 1. Dendrograms for the period before the global crisis (from the beginning of the sample to July 2007) based on the regime switching copula model (left) and the DCC model (right)





In the crisis period (Figure 2) there are minor differences in the grouping of the indices. In this period Philippines changes its position in comparison with the previous case, but the positions obtained from the two approaches differ from each other. In one case Philippines is grouped as an Asian market but is rather isolated; in the other case Philippines is close to the Indonesian and Malaysian markets. Other differences are observed in the European cluster. For example, under the copula model Russia is an isolated European market, whereas it is clustered with the emerging markets under the mGARCH model. The Italian and Finnish markets also have different positions under the copula model and the DCC-GARCH model.

In the post-crisis period (Figure 3) further minor differences are observed. Mexico becomes a more isolated market under the multivariate GARCH model, whereas it is close to the Canadian market under the regime switching model. Another minor difference concerns the positions of the South Korean and Taiwanese markets. The sub-period clustering results imply that in each sub-period clear major clusters are present: American markets, European markets and Asian markets. The clustering changes that occur over time take place within these major clusters.







Figure 2. Dendrograms for the period of the crisis (July 2007 to December 2008) based on the regime switching copula model (left) and the DCC model (right)

Figure 3. Dendrograms for the post-crisis period (December 2008 to April 2012) based on the regime switching copula model (left) and the DCC model (right)


4. CONCLUSIONS

Assuming one of the two models, the multivariate DCC model or the regime switching copula model, we are able to perform dynamic clustering. Moreover, forecasting the future clustering is practicable. Presenting the clustering results for every time point would be unintelligible, if not impossible. Therefore, we present results for three periods, namely the period before the global crisis, the period of the crisis and the post-crisis period. Four expected clusters occurred in all considered periods and under both models: American markets, developed European markets, emerging European markets and Asian (with Australia) markets. Changes in clustering within the main clusters occurred frequently in the analyzed sample. However, changes of the main clusters have been rare and occurred mostly between the two European clusters. Similarly, differences in the main clusters between the two models have been occasional. However, these differences could be crucial for international investors. The choice of the multivariate financial data model affects clustering and might lead to a change of investment strategy. Moreover, dynamic clustering provides investors with a more accurate picture of the dependencies. The forecast is important for investors who follow long-term strategies. It could suggest which country is more likely to change its cluster and which is not.

References

[1] Bartram S.M., Taylor S.J., Wang Y.H., The Euro and European financial market dependence. Journal of Banking & Finance, 2007, vol. 31, pp. 1461-1481.
[2] Bastos J.A., Caiado J., Clustering global equity markets with variance ratio tests. Centre for Applied Mathematics and Economics, Technical University of Lisbon, Portugal, 2011.
[3] Bollerslev T., Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 1986, vol. 31, pp. 307-327.
[4] Bollerslev T., Modeling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate Generalized ARCH Model. Review of Economics and Statistics, 1990, vol. 72, pp. 498-505.
[5] Bonanno G., Lillo F., Mantegna R., Level of complexity in financial markets. Physica A, 2001, vol. 299, pp. 16-27.
[6] Breymann W., Dias A., Embrechts P., Dependence Structures for Multivariate High-Frequency Data in Finance. Quantitative Finance, 2003, vol. 3, pp. 1-14.
[7] Cappiello L., Engle R., Sheppard K., Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics, 2006, vol. 4, pp. 537-572.
[8] Caiado J., Crato N., A GARCH-based method for clustering of financial time series: International stock markets evidence. Forthcoming in: Proceedings of the XIIth Applied Stochastic Models and Data Analysis International Conference, 2007.
[9] Chollete L., Heinen A., Valdesogo A., Modeling international financial returns with a multivariate regime switching copula. Journal of Financial Econometrics, 2009, vol. 7(4), pp. 437-480.
[10] Diebold F.X., Gunther T.A., Tay A.S., Evaluating Density Forecasts with Applications to Financial Risk Management. International Economic Review, 1998, vol. 39(4), pp. 863-883.
[11] Engle R.F., Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics, 2002, vol. 20, pp. 339-350.
[12] Genest C., Remillard B., Beaudoin D., Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, 2009, vol. 44, pp. 199-214.
[13] Glosten L., Jagannathan R., Runkle D., On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance, 1993, vol. 48, pp. 1779-1801.
[14] Hamilton J., Time series analysis. Princeton Univ. Press, 1994.
[15] Hansen B., Autoregressive Conditional Density Estimation. International Economic Review, 1994, vol. 35, pp. 705-730.
[16] Jondeau E., Rockinger M., The Copula-GARCH model of conditional dependencies: An international stock market application. Journal of International Money and Finance, 2006, vol. 25, pp. 827-853.
[17] Kenourgios D., Samitas A., Paltalidis N., Financial crises and stock market contagion in a multivariate time-varying asymmetric framework. Journal of International Financial Markets, Institutions & Money, 2011, vol. 21(1), pp. 92-106.
[18] Mantegna R.N., Hierarchical structure in financial markets. The European Physical Journal B, 1999, vol. 11, pp. 193-197.
[19] Mirkin B., Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, Boca Raton, FL, 2005.
[20] Musetti A.T.Y., Clustering methods for financial time series. Spring, 2012.
[21] Nelsen R.B., An Introduction to Copulas. Springer Verlag, New York, 1999.
[22] Okimoto T., New evidence of asymmetric dependence structures in international equity markets. Journal of Financial and Quantitative Analysis, 2008, vol. 43, pp. 787-815.
[23] Otranto E., Classifying the Markets Volatility with ARMA Distance Measures. Quaderni di Statistica, 2004, vol. 6, pp. 1-19.
[24] Patton A.J., Modelling asymmetric exchange rate dependence. International Economic Review, 2006, vol. 47, pp. 527-556.
[25] Pelletier D., Regime-switching for dynamic correlation. Journal of Econometrics, 2006, vol. 131, pp. 445-473.
[26] Patton A.J., Copula-based Models for Financial Time Series. Handbook of financial time series, 2009.
[27] Piccolo D., A distance measure for classifying ARIMA models. Journal of Time Series Analysis, 1990, vol. 11, pp. 153-164.
[28] Rodriguez J.C., Measuring Financial Contagion: A Copula Approach. Journal of Empirical Finance, 2007, vol. 14(3), pp. 401-423.
[29] Tse Y.K., Tsui A.K.C., A Multivariate Generalized Autoregressive Conditional Heteroscedasticity Model with Time-Varying Correlations. Journal of Business and Economic Statistics, 2002, vol. 20, pp. 351-362.