Industry-Level Returns and Fama-French Factors In Bayesian Regression Models

Author: Meryl Murphy
Chongyi Shen, Fan Yang, Qinbin Fan

Abstract

The prediction of index returns has been studied extensively using various models. In this paper, we use a Bayesian approach to compare two classical models for predicting index returns. We use Markov Chain Monte Carlo (MCMC) simulation in WinBUGS to estimate the coefficients of each Bayesian model, and we compare the full and reduced models by their DIC values. Our findings are consistent with the Fama-French three-factor hypothesis.

1. Introduction

1.1 Motivation

CAPM uses a single factor, Market Equity Returns, to compare a portfolio with the market as a whole. More generally, additional factors can be added to the regression model to improve its explanatory power. The best-known approach of this kind is the Fama-French three-factor model, developed by Eugene Fama and Kenneth French. Interestingly, Fama and French still see high returns as a reward for taking on high risk. In particular, if returns increase with book/price, then stocks with a high book/price ratio must be riskier than average, which is exactly the opposite of what a traditional business analyst would say. Fama and French do not commit to a single explanation of why book/price measures risk, although they and others have suggested possible reasons. For example, a high book/price ratio could mean a stock is "distressed", temporarily selling low because future earnings look doubtful. Or it could mean a stock is capital intensive, making it generally more vulnerable to low earnings during slow economic times. Both explanations sound plausible, but they describe completely different situations. It may be that the model's success at explaining past performance is due not to the significance of any of the three factors taken separately, but to their being different enough that together they do an effective job of "spanning the dimensions" of the market. In this paper, we examine whether the Fama-French three-factor model performs better than the traditional single-factor model.

1.2 Data

Industry-level returns and three-factor data are taken from Kenneth R. French's data bank. We use average monthly values for Auto industry returns and the three factors. The original data span July 1926 to the present. We use a subset of these data, from January 2003 to December 2008, which gives 72 observations for each series.

The response variable is:

Y: Auto Industry Returns

The predictors are:

X1: Market Equity Returns

X2: Small Minus Big (SMB): small market capitalization minus the big one. It measures the additional return investors have historically received by investing in stocks of companies with relatively small market capitalization. This additional return is often referred to as the "size premium".

X3: High Minus Low (HML): high book-to-price ratio minus the low one. It is constructed to measure the "value premium" provided to investors for investing in companies with high book-to-market values, that is, the value placed on the company by accountants relative to the value the public markets place on it, commonly expressed as B/M.

The complete dataset is presented in section 7.2.

2. Methods

2.1 Single-Factor and Three-Factor Models

Two theory-based models are compared here.

Traditional Single-Factor Model:

Yi = a + b1*X1i + ei

which uses only Market Equity Returns as the predictor.

Three-Factor Model:

Yi = a + b1*X1i + b2*X2i + b3*X3i + ei

According to Fama and French's theory, this model gives a better fit to market returns data. For simplicity, we refer to the first model as the 'Reduced' model and to the second as the 'Full' model in the rest of this text.
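The nesting of the two models can be made concrete with a short Python sketch. It is only an illustration: the function names and the factor values are hypothetical and do not come from the paper's data. It shows that the Full model's mean collapses to the Reduced model's mean when b2 = b3 = 0.

```python
import numpy as np

def mu_reduced(a, b1, x1):
    """Mean of the single-factor (Reduced) model: a + b1*X1."""
    return a + b1 * np.asarray(x1, dtype=float)

def mu_full(a, b1, b2, b3, x1, x2, x3):
    """Mean of the three-factor (Full) model: a + b1*X1 + b2*X2 + b3*X3."""
    return (a
            + b1 * np.asarray(x1, dtype=float)
            + b2 * np.asarray(x2, dtype=float)
            + b3 * np.asarray(x3, dtype=float))

# Hypothetical factor values (percent per month), for illustration only.
x1 = np.array([1.0, -2.0, 0.5])   # Market Equity Returns
x2 = np.array([0.3, 0.1, -0.4])   # SMB
x3 = np.array([-0.2, 0.6, 0.0])   # HML

# Hypothetical coefficients; with b2 = b3 = 0 the Full mean equals the Reduced mean.
reduced = mu_reduced(0.5, 1.2, x1)
full_nested = mu_full(0.5, 1.2, 0.0, 0.0, x1, x2, x3)
print(np.allclose(reduced, full_nested))
```

Because the Reduced model is a special case of the Full model, comparing them amounts to asking whether the extra two coefficients improve fit enough to justify the added complexity.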

2.2 Likelihood Function, Priors

We use a normal likelihood and independent normal priors with very small precision for each coefficient.

Likelihood function:

y[i] ~ dnorm(mu[i], tau)

Vague priors:

a, bj ~ N(0, 0.000001) for j = 1, 2, 3, where the second argument is the precision

tau ~ Gamma(0.001, 0.001)
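To see how vague these priors are, it helps to translate the precision into a standard deviation. The Python sketch below assumes the WinBUGS conventions, in which the normal distribution is parameterized by precision (1/variance) and the gamma distribution by shape and rate; the helper name precision_to_sd is ours.

```python
import math

def precision_to_sd(tau):
    """Convert a precision (1 / variance) to a standard deviation."""
    if tau <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / math.sqrt(tau)

# Prior on a, b1, b2, b3: normal with mean 0 and precision 0.000001.
coef_prior_sd = precision_to_sd(0.000001)

# Prior on tau, assuming Gamma(shape, rate):
# mean = shape / rate, variance = shape / rate**2.
shape, rate = 0.001, 0.001
tau_prior_mean = shape / rate
tau_prior_var = shape / rate ** 2

print(coef_prior_sd, tau_prior_mean, tau_prior_var)
```

A prior standard deviation of roughly 1000 on each coefficient is very wide relative to monthly returns measured in percent, so the posterior is driven almost entirely by the data.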

2.3 Initial Values

We choose initial values by estimating the coefficients a and the bj's from the first half of the data (January 2003 to December 2005) using frequentist methods. Because the monthly index returns form a time series, we should also account for a possible cyclical component in the model. Using the AUTOREG procedure in SAS, we obtain fitted models that include an AR(1) term.
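The idea of taking starting values from a first-half fit can be sketched in Python. This is not the SAS AUTOREG procedure: it uses ordinary least squares plus a lag-1 residual autocorrelation as a rough stand-in, and the data are synthetic, generated only so that the example runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 72 monthly observations; NOT the paper's data.
n = 72
X = rng.normal(size=(n, 3))          # columns play the role of X1, X2, X3
y = -3.0 + X @ np.array([1.7, -0.7, 0.6]) + rng.normal(scale=4.0, size=n)

# Use only the first half of the sample (36 months) to get starting values.
half = n // 2
design = np.column_stack([np.ones(half), X[:half]])
coef, *_ = np.linalg.lstsq(design, y[:half], rcond=None)
init = dict(zip(["alpha", "beta1", "beta2", "beta3"], coef))

# Lag-1 autocorrelation of the residuals: a rough check on whether an
# AR(1) error term is needed (values near 0 suggest it is not).
resid = y[:half] - design @ coef
resid = resid - resid.mean()
lag1 = float(resid[1:] @ resid[:-1] / (resid @ resid))

print(init, lag1)
```

The dictionary init holds one starting value per coefficient, which is the form an MCMC sampler typically expects for a single chain.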

[Figure: Time Series Plot of Auto Industry Returns. Vertical axis: Auto Industry Returns (-40 to 20); horizontal axis: Index of Variable "Month" (1 to 70).]

Referring to section 7.3, the output shows that the coefficient b1 (Market Equity Returns) in the traditional single-factor model is 1.9105, and that the coefficients b1 (Market Equity Returns), b2 (Small Minus Big) and b3 (High Minus Low) in the three-factor model are 1.778, 0.2727 and 0.3628, respectively. The AR(1) term is insignificant in each model.


2.4 Regression Statistics

We use MCMC simulation and compare the Deviance Information Criterion (DIC) statistics of the two models.

3. Diagnostics

We simulated three chains: one initialized at the autoregression estimates from SAS, and the other two at values spread out around those estimates.
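Running several chains from dispersed starting points makes a between-chain convergence check possible. The Python sketch below computes the classic Gelman-Rubin potential scale reduction factor, which is a simpler relative of the interval-based Brooks-Gelman-Rubin ratio that WinBUGS plots. The draws are synthetic and the function name is ours.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor for one parameter.

    chains: array of shape (m, n) holding m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(1)
# Synthetic draws, not WinBUGS output: three chains from the same distribution...
mixed = rng.normal(loc=1.6, scale=0.16, size=(3, 2000))
# ...and three chains stuck at different locations.
stuck = mixed + np.array([[0.0], [1.0], [2.0]])

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

A value close to 1 indicates that the chains agree with one another; a value well above 1 indicates that they are still exploring different regions and more burn-in is needed.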

3.1 Single-Factor Model

[Figure: BGR plots for alpha, beta1, deviance and sigma, chains 1:3. Horizontal axis: start-iteration (51 to 400).]

From the BGR plots, the red line for beta1 stays close to 1 from about the 300th iteration onward, so a burn-in period of 300 iterations seems reasonable and conservative. Another 2000 iterations were then run after setting up 'DIC'. Below is the DIC output:

Dbar = post.mean of -2logL; Dhat = -2LogL at post.mean of stochastic nodes

         Dbar      Dhat      pD      DIC
y        210.744   207.659   3.086   213.830
total    210.744   207.659   3.086   213.830

pD is an estimate of the number of 'free parameters' in the model, and it is very close to 3. This is because the reduced model has three parameters to estimate (alpha, beta1, tau).

DIC is 213.830 for this reduced model. The final model output is:

node       mean     sd       MC err     2.5%     median   97.5%    start   sample
alpha      -3.308   0.7788   0.008037   -4.833   -3.312   -1.731   301     8100
beta1      1.628    0.1633   0.001694   1.31     1.626    1.959    301     8100
deviance   210.7    2.591    0.0313     207.8    210.1    217.5    301     8100
sigma      4.548    0.5743   0.007481   3.591    4.495    5.821    301     8100

This gives the regression model Yi = -3.308 + 1.628*X1i, where X1i is the Market Equity Returns in a given month and Yi is the Auto Industry Returns in the same month. The model fit graph is shown below:

[Figure: model fit: mu. Vertical axis: -40.0 to 20.0; horizontal axis: -20.0 to 10.0.]

Together with the scatter plot, they both look good:

[Figure: scatter plot. Vertical axis: -40.0 to 20.0; horizontal axis: -20.0 to 10.0.]

3.2 Three-Factor Model

First, we look at the BGR plots to choose the burn-in period:

[Figure: BGR plots for alpha, beta1, beta2, beta3 and sigma, chains 1:3. Horizontal axis: start-iteration (51 to 400).]

Overall, convergence appears to be quick, and only a short burn-in period needs to be discarded. To be conservative, we discard the first 200 iterations. We also examine the DIC statistics.

Dbar = post.mean of -2logL; Dhat = -2LogL at post.mean of stochastic nodes

         Dbar      Dhat      pD      DIC
y        206.738   201.609   5.129   211.868
total    206.738   201.609   5.129   211.868

We notice that pD here is about 5. This is because the full model has two more parameters to estimate than the reduced model, namely beta2 and beta3. The autocorrelation plots all look very good: none of the autocorrelations at lag 2 or higher is significantly different from 0, meaning there is no strong correlation left after we take the time series effect into account in the SAS steps:

[Figure: autocorrelation plots for alpha, beta1, beta2, beta3 and tau, chains 1:3. Vertical axis: -1.0 to 1.0; horizontal axis: lag (0 to 40).]

Below is the output of coefficients:

node    mean      sd        MC error   2.5%      median    97.5%      start   sample
alpha   -3.418    0.744     0.008078   -4.884    -3.414    -1.951     201     8400
beta1   1.717     0.1647    0.001644   1.398     1.715     2.047      201     8400
beta2   -0.7218   0.3487    0.003841   -1.406    -0.7241   -0.01854   201     8400
beta3   0.5681    0.3566    0.003563   -0.1403   0.5687    1.285      201     8400
tau     0.05654   0.01425   1.688E-4   0.03233   0.05528   0.08823    201     8400

The final model is then Yi = -3.418 + 1.717*X1i - 0.7218*X2i + 0.5681*X3i, where X1 is Market Equity Returns, X2 is Small Minus Big (SMB: small market capitalization minus the big one), and X3 is High Minus Low (HML: high book-to-price ratio minus the low one). The model fit graph is shown below:

[Figure: model fit: mu. Vertical axis: -40.0 to 20.0; horizontal axis: -40.0 to 20.0.]

Together with the scatter plot, they both look good:

[Figure: scatter plot. Vertical axis: -40.0 to 20.0; horizontal axis: -40.0 to 20.0.]

However, the 95% credible interval for beta3 is (-0.1403, 1.285), which contains 0, so the coefficient estimate for HML is not significant. Nevertheless, given our limited data and the significant results for the three-factor model in previous research, we keep HML in our fitted three-factor model.
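The same check can be applied to every coefficient in the three-factor output. The Python sketch below uses the 2.5% and 97.5% posterior quantiles reported in the coefficient output above; the helper name excludes_zero is ours.

```python
# 2.5% and 97.5% posterior quantiles as reported in the three-factor output.
intervals = {
    "alpha": (-4.884, -1.951),
    "beta1": (1.398, 2.047),
    "beta2": (-1.406, -0.01854),
    "beta3": (-0.1403, 1.285),
}

def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

summary = {name: excludes_zero(lo, hi) for name, (lo, hi) in intervals.items()}
for name, flag in summary.items():
    print(f"{name}: 95% interval {'excludes' if flag else 'contains'} zero")
```

Only the interval for beta3 contains zero. The upper bound for beta2 (SMB) is close to zero, so its sign is supported by the data but not by a wide margin.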

4. Conclusion

Based on the above results, we find that both models have very good fit graphs and, apart from beta3 (HML) in the three-factor model, statistically significant coefficient estimates. The Three-Factor model (adding X2 and X3) produces a smaller DIC. We therefore recommend the Three-Factor model as the better model for predicting index returns. The comparison between the two models is shown below.

Model           DIC
Single-Factor   213.830
Three-Factor    211.868
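The DIC values in this comparison can be cross-checked against the reported Dbar and Dhat using the standard relations pD = Dbar - Dhat and DIC = Dbar + pD. The Python sketch below does only that arithmetic; the function name dic_from is ours.

```python
# Dbar, Dhat and DIC as reported in the WinBUGS DIC output for each model.
reported = {
    "Single-Factor": {"Dbar": 210.744, "Dhat": 207.659, "DIC": 213.830},
    "Three-Factor": {"Dbar": 206.738, "Dhat": 201.609, "DIC": 211.868},
}

def dic_from(dbar, dhat):
    """Return (pD, DIC) with pD = Dbar - Dhat and DIC = Dbar + pD."""
    p_d = dbar - dhat
    return p_d, dbar + p_d

results = {name: dic_from(v["Dbar"], v["Dhat"]) for name, v in reported.items()}
dic_gap = reported["Single-Factor"]["DIC"] - reported["Three-Factor"]["DIC"]

for name, (p_d, dic) in results.items():
    print(f"{name}: pD = {p_d:.3f}, DIC = {dic:.3f}")
print(f"DIC(single) - DIC(three) = {dic_gap:.3f}")
```

The recomputed values agree with the reported ones to within rounding (about 0.001). The gap between the two DIC values is a little under 2, so the preference for the three-factor model rests on a modest difference.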

The three-factor model is gaining recognition in portfolio management. Morningstar.com classifies stocks and mutual funds based on these factors. Many studies show that the majority of actively managed mutual funds underperform broad indexes based on the three factors if classified properly. This has led to more and more index funds and ETFs being offered based on the three-factor model. Our contribution in this project is to support and verify the application of the three-factor model, in line with previous research.

5. References

Ang, A., Bekaert, G., 2007. Stock Return Predictability: Is it There? Review of Financial Studies 20, 651-707.

Jagannathan, R., Wang, Z., 1996. The Conditional CAPM and the Cross-Section of Expected Returns. Journal of Finance 51, 3-53.

Stambaugh, R.F., 1999. Predictive Regressions. Journal of Financial Economics 54, 375-421.

Fama, E.F., French, K.R., 1993. Common Risk Factors in the Returns on Stocks and Bonds. Journal of Financial Economics 33, 3-56.

Fama, E.F., French, K.R., 1992. The Cross-Section of Expected Stock Returns. Journal of Finance 47, 427-465.

6. Appendix

6.1 WinBUGS Code for All Bayesian Models

Reduced Model

model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i]