Luck versus Skill in the Cross Section of Mutual Fund Alpha Estimates

Author: Jasmine Craig
Tuck School of Business at Dartmouth

Tuck School of Business Working Paper No. 2009-56

Luck versus Skill in the Cross Section of Mutual Fund Alpha Estimates

Eugene F. Fama University of Chicago – Booth School of Business Kenneth R. French Tuck School of Business at Dartmouth; National Bureau of Economic Research (NBER)

March 9, 2009

This paper can be downloaded from the Social Science Research Network Electronic Paper Collection:

http://ssrn.com/abstract=1356021

Electronic copy available at: http://ssrn.com/abstract=1356021

First draft: October 2007
This draft: February 2009
Not for quotation: Comments welcome

Luck versus Skill in the Cross Section of Mutual Fund α Estimates

Eugene F. Fama and Kenneth R. French*

Abstract

The aggregate portfolio of U.S. equity mutual funds is close to the market portfolio, but the high costs of active management show up intact as lower returns to investors. Bootstrap simulations produce no evidence that any managers have enough skill to cover the costs they impose on investors. If we add back costs, there is some evidence of inferior and superior performance (non-zero true α) in the extreme tails of the cross section of mutual fund α estimates. The evidence for performance is, however, weak, especially for successful funds, and we cannot reject the hypothesis that no fund managers have skill that enhances expected returns.

                                                             

* Graduate School of Business, University of Chicago (Fama) and Amos Tuck School of Business, Dartmouth College (French). We are grateful for the comments of John Cochrane and seminar participants at the University of Chicago and UCLA.

   


There is a constraint on the returns to active management that we call equilibrium accounting. In short (details later), suppose that when returns are measured before costs (management fees and other expenses), passive investors get passive returns; that is, they have zero α (abnormal expected return) relative to passive benchmarks. This means active investment must also be a zero sum game – aggregate α is zero before costs. If some active investors have positive α before costs, it is dollar for dollar at the expense of other active investors. After costs, that is, in terms of net returns to investors, active investment must be a negative sum game. (See, for example, Sharpe 1991.)

We examine mutual fund performance from the perspective of equilibrium accounting. For example, at the aggregate level, if the value-weight (VW) portfolio of funds has positive α before costs, we can infer that the VW portfolio of active investments outside mutual funds has a negative α. In other words, active mutual funds win at the expense of active investments outside mutual funds. Our tests, however, do not produce this result. The VW portfolio of funds that invest primarily in U.S. equities is close to the market portfolio, and estimated before costs, its α relative to common benchmarks is close to zero. Since the VW portfolio of funds produces α close to zero in gross (pre-cost) returns, α estimated on net (post-cost) returns to investors is negative by about the amount of costs. And since mutual funds in aggregate produce α close to zero before costs, equilibrium accounting allows us to infer that in aggregate other active managers do the same.

The aggregate results imply that if there are mutual funds with positive true α, they are balanced by funds with negative α. We test for the existence of such funds. The challenge is to distinguish skill from luck. Given the multitude of funds, many have extreme returns by chance. A common approach to this problem is to test for persistence in fund returns, that is, whether past winners continue to produce high returns and losers continue to underperform (for example, Grinblatt and Titman 1992, Carhart 1997). Persistence tests have a downside. They rank funds on short-term past performance, so there may be little evidence of persistence because the allocation of funds to winner and loser portfolios is largely based on noise.

We take a different tack. We use long histories of individual fund returns and bootstrap simulations of return histories to infer the existence of superior and inferior managers. We compare the actual cross-section of fund α estimates to the results from 10,000 bootstrap simulations of the cross-section. The returns of the
funds in a simulation run have all the properties of actual fund returns, except that in the return population from which simulation samples are drawn, true α is set to zero. The simulations thus provide the distribution of the cross-section of α estimates when in truth there is no abnormal performance in fund returns. Comparing the distribution of α estimates from the simulations to the cross-section of α estimates for actual fund returns allows us to draw inferences about the existence of skilled managers.

For fund investors the simulation results are disheartening. The tests say that when α is estimated on net returns to investors, even the extreme right tail of the cross-section of precision-adjusted α estimates, t(α), is dominated by fund managers that lack skill sufficient to produce expected returns that cover the costs funds impose on investors. Thus, if there are managers with sufficient skill to cover costs, they are hidden among the mass of managers with insufficient skill.

Mutual funds look better when returns are measured gross, that is, before the costs included in expense ratios. Comparing the cross-section of t(α) estimates from gross fund returns to the average cross-section from the simulations suggests that there are inferior managers whose stock picks reduce expected returns and there are superior managers that enhance expected returns. If we assume that the cross section of true α has a normal distribution with mean zero and standard deviation x, then x around 0.5% per year seems to capture the right tail of the cross section of α estimates for funds, and x around 1.0% captures the left tail. The right tail estimate of the standard deviation of true α, 0.5% per year, does not imply much skill. It suggests, for example, that only about 2.3% of funds have α greater than 1.0% per year (about 0.08% per month). As a result, even before costs, formal support for the existence of positive true α is weak.
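The tail fraction implied by a normal cross section of true α can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `normal_upper_tail` is ours, not part of the paper. It evaluates the share of funds whose true α exceeds 1.0% per year when true α is normal with mean zero and standard deviation 0.5% per year, that is, two standard deviations above the mean.

```python
from math import erfc, sqrt


def normal_upper_tail(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for X ~ Normal(mean, sd**2), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * erfc(z / sqrt(2.0))


# Cross section of true alpha: mean 0, standard deviation 0.5% per year.
frac_above_1pct = normal_upper_tail(1.0, mean=0.0, sd=0.5)
print(f"{100 * frac_above_1pct:.2f}% of funds have true alpha above 1.0% per year")
```

Under these assumptions the fraction is roughly 2.3%, so even a fund two standard deviations into the right tail earns only about 0.08% per month before costs.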
Large fractions of the simulation runs in which true α is zero for all funds produce right tails for t(α) more extreme than the tails observed for actual gross fund returns. The left tail estimate, x = 1.0% per year, is stronger but still not overwhelming evidence that there are funds with negative true α.

Our bootstrap simulation approach is potentially subject to a power problem. Because there are so many funds, it can be difficult to reject a null hypothesis in tests on the cross section of t(α) estimates. We examine this issue and conclude that the simulations have power to identify performance when it is nontrivial. For example, if the cross section of true α for gross fund returns has mean zero and standard deviation x, the
simulations comfortably reject the hypothesis that x is as large as 1.5% per year. For perspective, the average standard error of individual fund α estimates is 0.28% per month (about 3.4% per year), so α estimates are typically imprecise. The fact that the simulations allow us to infer that the standard deviation of the cross section of true α for actual gross fund returns is less than 1.5% per year then seems like substantial power.

Many readers suggest that our results are consistent with the predictions of Berk and Green (2004). We briefly outline their model in Section II, after presenting the results for mutual fund aggregates (Section I), and before the bootstrap simulations (Section III) and power tests (Section IV). Our results clearly reject the predictions of their model.

Finally, the paper closest to ours is Kosowski et al. (2006). They do bootstrap simulations that seem to produce stronger evidence of manager skill. We discuss their tests in Section V, after presenting our results. Section VI concludes.

I. Average Returns for EW and VW Portfolios of U.S. Equity Mutual Funds

Our mutual fund sample is from the CRSP (Center for Research in Security Prices) database. We include only funds that invest primarily in U.S. common stocks, and we combine, with value weights, different classes of the same fund into a single fund. (See French 2008 for details.) The CRSP data cover January 1962 to September 2006 (henceforth 1962-2006), but our central simulation tests use the period after 1983, when there are fewer data issues (Elton, Gruber, and Blake 2001).

Our benchmarks for evaluating fund performance are the three-factor model of Fama and French (1993), and Carhart’s (1997) four-factor model. To measure performance, these models use two variants of the time-series regression,

$$R_{it} - R_{ft} = a_i + b_i (R_{Mt} - R_{ft}) + s_i SMB_t + v_i VMG_t + m_i MOM_t + e_{it}. \qquad (1)$$

In this regression, Rit is the return on fund i for month t, Rft is the riskfree rate (the one-month U.S. Treasury bill rate), RMt is the market return (the return on a value-weight portfolio of NYSE, Amex, and NASDAQ stocks), SMBt and VMGt are the size and value-growth returns of Fama and French (1993), MOMt is our version of Carhart’s (1997) momentum return, ai is the average return left unexplained by the benchmark model (the estimate of αi), and eit is the regression residual. The full version of (1) is Carhart’s four-factor model, and the regression without MOMt is the Fama-French three-factor model. The construction of SMBt and VMGt (also known as HMLt) follows Fama and French (1993). The momentum return, MOMt, is defined
like VMGt, except that we sort on prior return rather than the book-to-market equity ratio and the momentum sort is refreshed monthly rather than annually. (See Table 1 for details.)

Regression (1) allows a more precise statement of the constraints of equilibrium accounting. The VW aggregate of the U.S. equity portfolios of all investors is the market portfolio. It has a market slope equal to 1.0 in (1), zero slopes on the other explanatory returns, and a zero intercept – before investment costs. This means that if the VW aggregate portfolio of passive investors also has a zero intercept before costs, the VW aggregate portfolio of active investors must have a zero intercept. Thus, positive and negative intercepts among active investors must balance out – before costs.

There is controversy about whether the average SMBt, VMGt, and MOMt returns are rewards for risk or the result of mispricing. For our purposes, there is no need to take a stance on this issue. We can simply interpret SMBt, VMGt, and MOMt as diversified passive benchmark returns that capture patterns in average returns during our sample period, whatever the source of the average returns. Abstracting from the variation in returns associated with RMt – Rft, SMBt, VMGt, and MOMt then allows us to focus better on the effects of stock picking, which should show up in the three-factor and four-factor intercepts.

From an investment perspective, the slopes on the explanatory returns in (1) describe a diversified portfolio of passive benchmarks (including the riskfree security) that replicates the exposures of the fund on the left to common factors in returns. The regression intercept then measures the average return provided by a fund in excess of the return on a comparable passive portfolio.
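As an illustration of how the intercept and slopes in regression (1) might be estimated, here is a minimal sketch using ordinary least squares on synthetic data. It is not the authors' code: the function name `estimate_alpha`, the simulated factor returns, and the use of classical (homoskedastic) OLS standard errors for t(α) are all assumptions made for the example.

```python
import numpy as np


def estimate_alpha(excess_ret, factors):
    """OLS of fund excess returns on factor returns.

    excess_ret: (T,) fund return minus the riskfree rate.
    factors:    (T, K) explanatory returns (market excess return, SMB, VMG, MOM).
    Returns (alpha, t_alpha, slopes) using classical OLS standard errors.
    """
    y = np.asarray(excess_ret, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the coefficients
    return coef[0], coef[0] / np.sqrt(cov[0, 0]), coef[1:]


# Synthetic example: 273 months, four factors, hypothetical slopes and alpha.
rng = np.random.default_rng(0)
T = 273
factors = rng.normal(0.5, 4.0, size=(T, 4))
true_slopes = np.array([1.0, 0.2, -0.1, 0.05])
excess_ret = 0.10 + factors @ true_slopes + rng.normal(0.0, 1.5, size=T)

alpha, t_alpha, slopes = estimate_alpha(excess_ret, factors)
print("alpha (% per month):", round(alpha, 3), " t(alpha):", round(t_alpha, 2))
print("slopes:", np.round(slopes, 3))
```

Dropping the last factor column gives the three-factor variant of the same regression.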
We interpret a positive expected intercept (true α) as good performance, and a negative expected intercept signals bad performance.¹

¹ Formal justification for this definition of good and bad performance is provided by Dybvig and Ross (1985). Given a riskfree security, their Theorem 5 implies that if the intercept in (1) is positive (negative), there is a portfolio of fund i and the portfolio of the explanatory portfolios on the right of (1), with positive (negative) weight on i, that has a higher Sharpe ratio than the portfolio of the explanatory portfolios.

Table 1 shows summary statistics for the explanatory returns in (1) for periods used in later tests of aggregate mutual fund performance. The average value of the value-growth return, VMGt, is large for 1962-2006 (0.48% per month, t = 3.82) and for each subperiod. The average value of the market premium, RMt – Rft, for 1962-2006 (0.45% per month, t = 2.36) is close to the average VMGt return. The size return, SMBt, has the smallest average value for 1962-2006 (0.23% per month, t = 1.65). For every period, the average momentum (MOMt) return is the largest factor premium (for example, 0.82% per month, t = 4.80, for 1962-2006).

Table 2 shows estimates of regression (1) for monthly returns on equal-weight (EW) and value-weight (VW) portfolios of the funds in our sample. The EW portfolio weights funds equally each month. In the VW portfolio funds are weighted by assets under management (AUM). The intercepts in (1) for EW fund returns tell us whether funds on average produce returns different from those implied by their exposures to common factors in returns, whereas VW returns tell us about the fate of aggregate wealth invested in funds.

Part A of Table 2 shows the regression intercepts (α estimates) for the three- and four-factor variants of (1) for returns measured gross and net of fund expenses. Net returns are those reported to investors. Monthly gross returns are net returns plus 1/12th of a fund’s expense ratio for the year. Part B of Table 2 shows the regression slopes for the four-factor model. The market, SMBt, and VMGt slopes for the three-factor model are close to the slopes in the four-factor model. Only the slopes for net returns are shown. The slopes for gross returns are the same to three decimal places.

All the market slopes in Table 2 are close to 1.0, which is not surprising since our sample is limited to funds that invest primarily in U.S. stocks. (Indeed, one of our screens is that funds must have market slopes greater than 0.75.) The slopes on SMBt are around 0.20 for EW fund returns and around 0.06 for VW returns. We infer that smaller funds are more likely to invest in small stocks, but total dollars invested in funds (captured by VW returns) show little tilt toward small stocks. More interesting, the tilt toward small stocks in EW and VW returns, as captured by SMBt slopes, is a bit lower after 1983. Thus, Banz’ (1981) evidence on the higher average returns of small stocks does not produce a shift toward small stocks.
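The construction of the EW and VW portfolio returns, and of gross returns from net returns, can be sketched as follows. This is a toy illustration under stated assumptions: the function name `portfolio_returns` and the array layout are hypothetical, and weighting by AUM at the start of the month is our assumption, since the text says only that funds are weighted by AUM.

```python
import numpy as np


def portfolio_returns(net_ret, aum_begin, annual_expense_ratio):
    """EW and VW portfolio returns, net and gross of expenses.

    All inputs are (months, funds) arrays in percent; NaN in net_ret marks a
    fund that is not in the sample that month. Gross return is the net return
    plus 1/12th of the fund's annual expense ratio.
    """
    gross_ret = net_ret + annual_expense_ratio / 12.0
    alive = ~np.isnan(net_ret)
    w_ew = alive / alive.sum(axis=1, keepdims=True)      # equal weights on live funds
    aum = np.where(alive, aum_begin, 0.0)
    w_vw = aum / aum.sum(axis=1, keepdims=True)          # AUM weights (assumed start-of-month)
    out = {}
    for name, r in (("net", net_ret), ("gross", gross_ret)):
        r0 = np.where(alive, r, 0.0)
        out["EW", name] = (w_ew * r0).sum(axis=1)
        out["VW", name] = (w_vw * r0).sum(axis=1)
    return out


# Two months, three funds; the third fund enters in month 2.
net = np.array([[1.0, 2.0, np.nan], [0.0, -1.0, 3.0]])
aum = np.array([[100.0, 300.0, np.nan], [100.0, 300.0, 600.0]])
expense = np.full_like(net, 1.2)                         # 1.2% per year for every fund

ports = portfolio_returns(net, aum, expense)
for key, series in ports.items():
    print(key, np.round(series, 4))
```

In the first month the EW net return is the simple average of 1.0 and 2.0, while the VW net return tilts toward the larger fund; each gross series is the corresponding net series plus 0.1% per month.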
We show results for 1993-2006 in Table 2 to examine how the mutual fund industry responds to academic evidence on the value/growth and momentum patterns in returns. The slopes for VMGt in fund returns for 1962-1992 are -0.10 (EW and VW returns), which suggests a slight tilt toward growth stocks. The VMGt slopes in fund returns for 1993-2006 rise to 0.05 (EW) and 0.00 (VW), which suggests a slight move away from growth stocks toward a neutral position. Again, however, academic trumpeting of the higher returns of value stocks relative to growth stocks (Fama and French 1992, Lakonishok, Shleifer, and Vishny
1994) does not result in much aggregate movement of funds or dollars invested in funds toward value stocks. The discovery of return momentum (Jegadeesh and Titman 1993) also produces little aggregate movement toward positive momentum stocks. In fact, the momentum slopes for EW and VW fund returns move from slightly positive for 1962-1992 to values closer to zero for 1993-2006.

The intercepts in the estimates of (1) summarize the average performance of funds (EW returns) and the performance of aggregate wealth invested in funds (VW returns) relative to passive benchmarks. In terms of net returns to investors, performance is poor. The three-factor intercepts for EW and VW net returns are negative for all the periods in Table 2, but the estimates for 1962-1983 and 1962-1992 are close to zero. The EW and VW fund returns of 1962-1983 and 1962-1992 have slightly positive momentum exposure, however, and controlling for momentum leads to strongly negative four-factor intercepts for these periods, like those for 1984-2006 and 1993-2006. The annualized four-factor intercepts for different periods range from -0.71% to -1.38%, and they are -1.75 to -3.68 standard errors from zero. These results are in line with previous work on net returns (Jensen 1968, Malkiel 1995, Gruber 1996).

The intercepts in (1) for net returns tell us whether managers have sufficient skill to generate returns that cover the costs their funds impose on investors. Gross returns are better for testing whether managers have any skill. For EW and VW gross fund returns, three-factor intercepts are positive for 1962-1983 and 1962-1992. Again, however, the positive three-factor α estimates before 1993 are due to positive momentum exposure. For VW gross returns, four-factor intercepts for all periods in Table 2 are negative but close to zero. For EW gross returns, four-factor intercepts are randomly positive and negative, and always close to zero.
Skipping the details, we have also estimated the CAPM version of (1), in which RMt – Rft is the only explanatory return. No inferences change. For example, because of data issues, the period of interest in the rest of the paper is 1984-2006. The annualized CAPM intercept for VW gross fund returns for this period is -0.18% (t = -0.52), versus 0.10% (t = 0.32) and -0.03% (t = -0.09) for the three-factor and four-factor models. It is not surprising that the intercepts for the three models are so similar since VW fund returns produce slopes close to zero for the non-market explanatory returns in (1).

We can offer an equilibrium accounting perspective on the results in Table 2. During 1984-2006, when there are fewer biases in the CRSP data, the annualized three- and four-factor α estimates for VW net fund returns are -0.80% and -0.93% (t = -2.62 and -2.98). Thus, for total wealth invested in funds, any benefits from active management are overwhelmed by expenses (including management fees). When we add back expenses, there is no evidence that total wealth invested in funds gets benefits or suffers losses from active management. The annualized α estimates for VW gross fund returns for 1984-2006 are close to zero. VW fund returns also show little exposure to the size, value, and momentum returns during 1984-2006, and we can report that the excess market return alone explains 99% of the variance of the monthly VW excess return for our funds. Together these facts say that during 1984-2006, mutual fund investors in aggregate hold a portfolio that, before expenses, mimics market portfolio returns. The aggregate portfolio of funds is, however, dominated by active funds, and the average return to investors is reduced by the high expenses of these funds. These results echo equilibrium accounting, but for a subset of investment managers where the implications of equilibrium accounting for aggregate investor returns need not hold. Moreover, since aggregate α for dollars invested in mutual funds is close to zero in gross returns, equilibrium accounting allows us to infer that aggregate α for active investments outside mutual funds is also close to zero.

Finally, our gross fund returns are before management fees and other expenses, but they are net of trading costs. Since the benchmark explanatory returns in (1) are before all costs, there is a mismatch in the tests on gross returns. The problem is minor. The VW market portfolio is largely self-balancing. There is more turnover in the other benchmark portfolios in (1), but the slopes on their returns in Table 2 are close to zero, so their trading costs are largely irrelevant, at least in tests on aggregates. Moreover, passive managers often use fees from securities lending to offset costs. And they seem to succeed. Thus, we can report that the four-factor α estimate for 1984-2006 for a VW portfolio of index funds is 0.26% per year (t = 1.00) in gross returns and 0.00% (t = 0.01) in net returns to investors. In other words, at least in aggregate, index funds (a comparison set for active funds) produce average returns that cover all costs, including trading costs.

We could attempt to add trading costs to gross fund returns. Turnover is, however, often missing on CRSP, and even when turnover is available, estimates of trading costs are subject to large error (Carhart 1997).
We prefer to argue that trading costs (including market impact) are inherent in an active fund’s strategy, and active funds should at a minimum provide expected returns that cover trading costs. In this view, measuring gross fund returns net of trading costs is reasonable. We revisit this issue later.

II. Berk and Green (2004)

Readers contend that our results (Table 2 and below) are consistent with Berk and Green (2004). Their model is attractive theory, but our results reject its predictions. In their world, a fund manager is endowed with a permanent α, before costs, but he faces costs that are an increasing convex function of assets under management. Investors use returns to update estimates of α. Funds flow to a manager with a positive expected α before costs until AUM reaches the point where expected α, net of costs, is zero. Outflows of funds drive out managers with negative expected α. In equilibrium, all active managers (and thus funds in aggregate) have positive expected α before costs and zero expected α net of costs. Our evidence that the aggregate portfolio of mutual funds has negative α net of costs contradicts the predictions of Berk and Green (2004). The results below on the net returns of individual funds also reject their model. Finally, the model faces a theoretical problem. It violates equilibrium accounting because in aggregate active investors must have zero α before costs and negative α after costs.

III. Bootstrap Simulations

Table 2 says that on average mutual funds do not produce gross returns above (or below) those of our passive benchmarks. This may just mean that managers with skill that allows them to outperform the benchmarks are balanced by inferior managers who underperform. We turn now to a simulation approach that uses individual fund returns to infer the existence of superior and inferior managers.

A. Setup

We limit the tests to 1984-2006 and to funds that reach the equivalent of five million 2006 dollars in assets under management (AUM). Since the AUM minimum is in 2006 dollars, we include a fund in 1984, for example, if it has more than about $2.5 million in AUM in 1984. Once a fund passes the AUM minimum, it is
included in all subsequent tests, so this requirement does not create selection bias. We also show results for funds after they pass $250 million and $1 billion. Since we estimate benchmark regressions for each fund, we limit the tests to funds that have at least eight months of returns after they pass an AUM bound, so there is a bit of survival bias. To avoid having lots of new funds with short return histories, we only use funds that appear on CRSP at least five years before the end of our sample period.

The 1984 start date merits comment. During 1962-1983 about 15% of the funds on CRSP report only annual returns, and the average annual EW return for these funds is 5.29% lower than for funds that report monthly returns. As a result, the EW average return on all funds is a nontrivial 0.65% per year lower than the EW return of funds that report monthly returns. Thus, during 1962-1983 there is a selection bias in tests like ours that use only funds that report monthly returns. (The problem is minor in the VW aggregate returns in Table 2 because funds that report annual returns tend to be small.) After 1983 almost all funds report monthly returns. (Elton, Gruber, and Blake 2001 discuss CRSP data problems for the period before 1984.)

A different selection bias leads us to drop funds that have not reached $5 million AUM. Fund management companies commonly provide seed money to new funds to develop a return history. Funds are then opened to the public when their return histories turn out to be attractive. The $5 million AUM bound for admission to the tests alleviates this “incubation bias” since AUM is likely to be low during the pre-release period. The limited tests we are able to do on this issue suggest incubation bias has little or no effect on our results. (See appendix. Evans 2007 provides a detailed analysis of incubation bias.)
Our goal is to draw inferences about the cross section of true α for funds, specifically, whether the cross section of α estimates suggests a world where true α is zero for all funds or whether there is non-zero true α, especially in the tails of the cross section of α estimates. We are interested in answering this question for 12 different cross sections of α estimates (gross and net returns, for the three-factor and four-factor benchmarks, and the three AUM samples). Thus, we use regression (1) to estimate each fund’s three-factor or four-factor α for gross or net returns for the part of 1984-2006 after the fund passes each AUM bound.

The evidence on the existence of non-zero true α in a cross section of α estimates comes from bootstrap simulations on returns that have all the properties of actual fund returns, except that true α is set to
zero for every fund. To set α to zero, we simply subtract a fund’s α estimate from its monthly returns. For example, to compute three-factor benchmark-adjusted gross returns for a fund in the $5 million group, we subtract its three-factor α estimated from monthly gross returns for the part of 1984-2006 the fund is in the $5 million group from the fund’s monthly gross returns for that period. We calculate benchmark-adjusted returns for the three-factor and four-factor models, for gross and net returns, and for the three AUM bounds. The result is 12 populations of benchmark-adjusted (zero-α) returns.

A simulation run is a random sample with replacement of the calendar months of 1984-2006. A simulation run has 273 months, like January 1984 to September 2006. For each of the 12 sets of benchmark-adjusted returns, we estimate, fund by fund, the relevant benchmark model on the simulation draw of months of adjusted returns, dropping funds that are in the simulation run for less than eight months. Each run thus produces 12 cross-sections of α estimates using the same random sample of months from 12 populations of adjusted (zero-α) fund returns. We do 10,000 simulation runs to produce 12 distributions of t-statistics, t(α), for a world in which true α is zero. We focus on t(α), rather than raw estimates of α, to control for differences in precision due to differences in residual variance and in the number of months funds are in a simulation run.

A prime advantage of our simulation approach is that it mimics the joint distribution of fund returns. It thus captures all effects of the cross-correlation of fund returns on the distribution of t(α) estimates for funds. Because we jointly sample fund returns and explanatory returns we also capture all effects of, for example, correlated heteroscedasticity of the explanatory returns and disturbances of a benchmark model.
We shall see that capturing the joint distribution of fund returns is critical for valid inferences about the existence of nonzero true α in the cross section of α estimates for actual fund returns. Note that except for funds that are in our tests for the entire 1984-2006 period, a fund is likely to show up in a simulation run for more or less than the number of months of 1984-2006 it is on CRSP. This is not serious. We focus on t(α) estimates and the distribution of t(α) depends on the number of months a fund is in a simulation run only through the degrees of freedom effect.
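The simulation steps described above can be sketched in a few dozen lines. The code below is a simplified illustration on synthetic data, not the authors' implementation. The names `alpha_and_t` and `bootstrap_t_alpha`, the classical OLS standard errors, the small number of runs, and the extra guard requiring more distinct months than regressors are all assumptions of this sketch. The key feature it shares with the text is that each run draws one sample of calendar months with replacement and applies it jointly to every fund and to the explanatory returns.

```python
import numpy as np


def alpha_and_t(y, X):
    """OLS intercept and its t-statistic (first column of X is ones)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0], coef[0] / se


def bootstrap_t_alpha(fund_excess, factors, n_runs=200, min_months=8, seed=0):
    """fund_excess: (T, N), NaN where a fund has no return; factors: (T, K).

    Returns actual t(alpha), shape (N,), and simulated t(alpha), shape
    (n_runs, N), for a world in which true alpha is zero for every fund.
    """
    T, N = fund_excess.shape
    X = np.column_stack([np.ones(T), factors])
    actual_t = np.full(N, np.nan)
    adjusted = np.full((T, N), np.nan)           # benchmark-adjusted (zero-alpha) returns
    for i in range(N):
        ok = ~np.isnan(fund_excess[:, i])
        if ok.sum() >= min_months:
            a, actual_t[i] = alpha_and_t(fund_excess[ok, i], X[ok])
            adjusted[ok, i] = fund_excess[ok, i] - a
    rng = np.random.default_rng(seed)
    sim_t = np.full((n_runs, N), np.nan)
    for r in range(n_runs):
        months = rng.integers(0, T, size=T)      # one draw of months shared by all funds
        Xr, Yr = X[months], adjusted[months]
        for i in range(N):
            ok = ~np.isnan(Yr[:, i])
            # Guard added for this sketch: need enough distinct months to fit the model.
            if ok.sum() >= min_months and np.unique(months[ok]).size > X.shape[1]:
                _, sim_t[r, i] = alpha_and_t(Yr[ok, i], Xr[ok])
    return actual_t, sim_t


# Synthetic data: 120 months, 30 funds, 3 factors; five funds start late.
rng = np.random.default_rng(1)
T, N = 120, 30
factors = rng.normal(0.5, 4.0, size=(T, 3))
betas = rng.normal(0.0, 0.3, size=(3, N))
betas[0] += 1.0                                  # market slopes near 1
fund_excess = factors @ betas + rng.normal(0.0, 2.0, size=(T, N))
fund_excess[:40, :5] = np.nan

actual_t, sim_t = bootstrap_t_alpha(fund_excess, factors)
print("actual t(alpha) percentiles:", np.round(np.nanpercentile(actual_t, [5, 50, 95]), 2))
print("average simulated percentiles:",
      np.round(np.nanpercentile(sim_t, [5, 50, 95], axis=1).mean(axis=1), 2))
```

Because months are sampled jointly, any cross-correlation among fund returns in the input carries over to the simulated cross-sections. Resampling each fund independently would discard it.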

Note also that different assumptions about skill are built into the simulations for gross and net returns. For gross returns, setting true α equal to zero simulates a world where no fund manager has skill that affects expected returns. In contrast, setting true α equal to zero for net returns simulates a world where every manager has sufficient skill to generate expected returns that cover the costs imposed on investors.

Finally, the simulations share a shortcoming of all performance tests. They only allow inferences about the existence of inferior or superior funds. Since a large cross-section of funds produces some extreme α estimates by chance, we cannot identify the specific managers who are skillful rather than lucky.

To develop perspective on the simulations, we first compare, in qualitative terms, the percentiles of the cross-section of t(α) estimates from actual fund returns and the average values from the simulations. We then turn to formal inferences about whether the cross-section of t(α) estimates for actual fund returns points to the existence of skilled managers.

B. First Impressions

When we estimate a benchmark model on the actual returns of each fund in an AUM group, we get a cross-section of t(α) estimates that can be ordered into a cumulative distribution function (CDF) of t(α) estimates for actual fund returns. A simulation run for the same combination of benchmark model and AUM group also produces a cross-section of t(α) estimates and its CDF for a world in which true α is zero. In our initial examination of the simulations we compare (i) the values of t(α) at selected percentiles of the CDF of t(α) estimates from actual fund returns and (ii) the averages across the 10,000 simulation runs of the t(α) estimates at the same percentiles.
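The comparison in (i) and (ii) amounts to taking percentiles of one cross-section and averaging the same percentiles across simulation runs. The sketch below shows one way to do this. The function name `percentile_comparison`, the synthetic inputs, and NumPy's default linear interpolation between order statistics are assumptions of the example rather than details given in the text.

```python
import numpy as np


def percentile_comparison(actual_t, sim_t, pcts=(1, 5, 10, 50, 90, 95, 99)):
    """Percentiles of actual t(alpha) and the across-run average of simulated percentiles.

    actual_t: (N,) t(alpha) estimates from actual fund returns.
    sim_t:    (runs, N) t(alpha) estimates, one cross-section per simulation run.
    """
    actual = np.nanpercentile(actual_t, pcts)
    # Percentiles within each run (shape: len(pcts) x runs), then average over runs.
    sim_avg = np.nanpercentile(sim_t, pcts, axis=1).mean(axis=1)
    return actual, sim_avg


# Synthetic illustration: simulated t-stats are standard normal; the "actual"
# cross-section is shifted left by 0.3, mimicking funds that fail to cover costs.
rng = np.random.default_rng(0)
sim_t = rng.standard_normal((2000, 500))
actual_t = rng.standard_normal(500) - 0.3

pcts = (1, 5, 10, 50, 90, 95, 99)
actual, sim_avg = percentile_comparison(actual_t, sim_t, pcts)
for p, a, s in zip(pcts, actual, sim_avg):
    print(f"{p:>2}th percentile  actual {a:6.2f}   simulation average {s:6.2f}")
```

When the actual percentiles lie below the simulation averages throughout, the actual CDF sits to the left of the average simulated CDF.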
For example, the first percentile of three-factor t(α) estimates for the net returns of funds in the $5 million AUM group is -3.92, versus an average first percentile of -2.93 from the 10,000 three-factor simulation runs for the net returns of funds in this group (Table 3).

For each combination of gross or net returns, AUM group, and benchmark model, Table 3 shows the CDFs of t(α) estimates for actual returns and the average of the 10,000 simulation CDFs. The average simulation CDFs are quite similar for gross and net returns and for the two benchmark models. This is not surprising given that true α is always zero in the simulations. Note, however, that the dispersion of the average simulation CDFs decreases a bit from lower to higher AUM groups. This is at least in part a degrees of
freedom effect. Lots of small funds do not survive. As a result, funds in lower AUM groups have shorter average sample periods. Differences in the cross-correlations of fund returns may also play a role in the lower dispersion of the average simulation CDFs of the larger AUM groups.

Gross returns are more relevant than net returns for judging whether managers have any skill. Net returns are better for judging whether investors get what they pay for, that is, whether managers have sufficient skill to produce expected returns that cover the costs they impose on investors.

Net Returns – The hypothesis that fund managers have sufficient skill to cover costs fares poorly in Table 3. For every benchmark model and AUM group, the CDF of the cross-section of t(α) estimates from actual net fund returns is entirely to the left of the average CDF from the simulations. In other words, the average percentile values of the t(α) estimates from the simulations of net returns (where, by construction, skill is sufficient to cover costs) always beat the corresponding percentile values of the t(α) estimates for actual net fund returns. This evidence does not rule out the existence of some managers with skill sufficient to cover costs, but it is a strong hint that formal tests (discussed later) are unlikely to produce this inference.

Gross Returns – It is possible that the fruits of skill do not show up in net fund returns because they are absorbed by costs (fees and expenses). Gross returns provide better evidence on whether fund managers have any stock picking skill. Adding back costs pushes t(α) for actual fund returns toward higher values, but Table 3 shows that for all AUM groups, the left tail of three-factor t(α) estimates for actual gross returns is still to the left of the average from the simulations.
For example, the simulations say that in the absence of skill, on average the fifth percentile of t(α) for gross returns for the $5 million group is -1.85, but the actual fifth percentile is lower, -2.18. Thus, the left tails of the CDFs of three-factor t(α) suggest that there are inferior fund managers whose stock picks result in negative true α relative to passive benchmarks.

Conversely, the right tail of three-factor t(α) hints at the existence of superior managers who enhance expected returns relative to passive benchmarks. For the $5 million AUM group, the CDF of t(α) estimates for actual gross fund returns moves to the right of the average from the simulations at about the 70th percentile. For example, the 95th percentile of t(α) estimates for funds in the $5 million group averages 1.88 in the simulations, but the actual 95th percentile is slightly higher, 1.98. For the two larger AUM groups the crossovers occur at higher percentiles, around the 80th for the $250 million group and the 95th for the $1 billion group.

The inferior performance in the left tail of t(α) estimates is typically more evident than the superior performance in the right tail. For example, the 5th percentile of three-factor t(α) estimates for the actual gross returns of funds in the $5 million group is 0.33 standard errors below the average from the simulations, but the 95th percentile for actual fund returns is only 0.10 standard errors above the simulation average. Our gross fund returns are, however, net of trading costs, and an adjustment for trading costs (if deemed relevant) would likely result in a more symmetric picture of performance.

The four-factor results for gross returns in Table 3 are similar to the three-factor results, with one nuance. Adding a momentum control tends to shrink slightly the left and right tails of the cross-sections of t(α) estimates for actual fund returns. This suggests that funds with negative three-factor α estimates tend to have slight negative MOMt exposure and funds with positive three-factor α tend to have slight positive exposure. Controlling for momentum pulls the α estimates toward zero. The shrinkage is small but at the 99th percentile it suffices to kill the advantage of the t(α) estimates for actual fund returns over the simulation averages.

Finally, the simulation distributions of t(α) are fat-tailed. The average simulation distribution of t(α) for the $5 million group (our full sample) is like a t distribution with eight degrees of freedom. Since every α estimate uses at least eight observations and most use many more, we can conclude that the simulation distributions of t(α) are more fat-tailed than can be explained by degrees of freedom. This may in part be due to fat-tailed distributions of stock returns (Fama 1965). It also suggests that properties of the joint distribution of fund returns may have important effects on the cross-section of α estimates – a comment of some import in our later discussion of Kosowski et al. (2006).

C. Formal Tests

Comparing the percentiles of t(α) estimates for actual fund returns with the simulation averages gives hints about whether manager skill affects returns. Table 3 also provides likelihoods, in particular, the fractions of the 10,000 simulation runs that produce lower values of t(α) at selected percentiles than actual fund returns.
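As a rough illustration of how a likelihood of this kind can be computed, the sketch below takes one percentile of the actual cross-section and the same percentile from each simulation run, and reports the fraction of runs that fall below the actual value. The function names, the nearest-rank percentile rule, and the five simulated values are assumptions made for the example; only the actual 95th percentile of 1.98 comes from the text.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (integer p between 1 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def fraction_below(sim_values, actual_value):
    """Share of simulation runs whose percentile value is below the actual one."""
    return sum(v < actual_value for v in sim_values) / len(sim_values)

# Hypothetical 95th percentiles of t(alpha) from five simulation runs (assumption).
sim_p95 = [1.60, 1.75, 1.88, 2.05, 2.20]
actual_p95 = 1.98  # actual 95th percentile, gross returns, $5 million group

likelihood = fraction_below(sim_p95, actual_p95)
print(f"{likelihood:.2f}")  # prints 0.60
```

In this toy case three of the five runs fall below the actual value, so 40% of the runs beat the actual estimate. In practice `percentile` would be applied to the full cross-section of t(α) estimates from each of the 10,000 runs.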


These likelihoods allow us to judge more formally whether the tails of the cross-section of t(α) estimates for actual fund returns are extreme relative to what we observe when true α is zero. Specifically, we infer that some managers lack skill sufficient to cover costs if low fractions of the simulation runs produce left tail percentiles of t(α) below those from actual net fund returns, or equivalently, if large fractions of the simulation runs beat the left tail t(α) estimates from actual net fund returns. Similarly, we infer that some managers produce expected returns sufficient to cover costs if large fractions of the simulation runs produce right tail percentile t(α) estimates below those from actual net fund returns; that is, if low fractions of the simulation runs beat the right tail t(α) estimates from actual net fund returns. The logic is the same for gross returns, but the inferences center on whether there are managers that have any skill (good or bad) that affects expected returns.

Net Returns – Table 3 provides strong evidence that most managers lack sufficient skill to produce expected returns that cover costs. For every left tail percentile and all three AUM groups, less than 1.0% of the net return three-factor simulation runs (in which expected returns cover costs) produce three-factor t(α) estimates below the values observed for actual net fund returns. For example, the 10th percentile of the cross-section of t(α) estimates from the net returns of $5 million funds is -2.31, and only 0.11% of the simulation runs for this group have 10th percentile t(α) estimates below -2.31. Clearly, the left tails of the cross-sections of three-factor t(α) estimates for net returns are dominated by managers with insufficient skill to cover costs.

More interesting, the right tails of three-factor t(α) estimates for net returns also offer no evidence of managers with sufficient skill to cover costs. The $1 billion group produces the strongest positive results, but even for these funds at most 24.41% of the simulation runs produce right tail t(α) estimates below those for actual net fund returns. In other words, at every right tail percentile more than 75% of the simulation runs beat the estimate of t(α) from actual net fund returns. For the $5 million and $250 million groups, typically more than 90% of the right tail percentiles of t(α) estimates from the simulations beat those from net fund returns.

Switching from the three-factor to the four-factor model to estimate α confirms that the left tail of t(α) estimates is dominated by funds with net returns that do not cover costs. If anything, the absence of right tail evidence that some managers have enough skill to cover costs is stronger for the four-factor model. The fractions of the simulation runs that beat right tail estimates of four-factor t(α) for net fund returns are typically more than 90% for the $250 million and $1 billion groups and always more than 95% for the $5 million group.

Some perspective is in order. For funds in the $5 million group, the 90th and higher percentiles of t(α) estimates for actual net fund returns are 1.00 or larger. The average standard error of the α estimates is 0.28% per month, so a back-of-the-envelope calculation (1.00 × 0.28% per month × 12 months ≈ 3.36%) suggests that funds in the right tail produce long-term returns, in excess of costs, that beat those of passive benchmarks by more than 3.3% per year. These fund managers would surely be anointed by the media as highly skilled active investors. Yet the simulations say that the eye-popping returns of extreme winners are consistent with a world where no managers have skill sufficient to cover costs, in other words, where winners are just lucky.

In sum, if there are fund managers with enough skill to cover costs, they are buried in the noise of managers with insufficient skill. Like the evidence on aggregate net fund returns in Table 2, the evidence on the cross section of net returns in Table 3 is bad news for Berk and Green (2004), if one takes seriously their prediction that every fund manager has sufficient skill to produce zero expected α in net returns.

Gross Returns – The simulations for gross returns in Table 3 are better for inferences about the existence of managers with any skill that affects expected returns. The $1 billion AUM group presents the strongest formal case for our earlier informal inference that there are inferior managers that produce negative true α relative to passive benchmarks. For left tail percentiles below the 40th, the three-factor t(α) estimates for actual gross returns of the $1 billion funds are above the simulation values in less than 10% of the simulation runs.
For the $5 million and $250 million groups, the actual three-factor t(α) estimates for percentiles up to the 20th exceed the corresponding simulation values in less than 20% of the simulation runs. Below the 20th percentile, it is unusual for less than 5% of simulation runs to produce three-factor t(α) estimates below those for actual gross fund returns. Thus, with a 5% threshold, the evidence in Table 3 does not reject the hypothesis that the extreme negative values of three-factor t(α) for actual gross fund returns are just bad luck. Even with a 10% threshold, we cannot infer that bad management contributes to the left tail of three-factor t(α) estimates from actual gross fund returns for the $5 million AUM group (the full sample) and the $250 million group. Moreover, Table 3 shows that switching from the three-factor to the four-factor model to estimate α almost always increases the fraction of simulation runs in which left tail percentiles of t(α) for actual fund returns beat the estimates from the simulations. Controlling for momentum exposure thus produces higher likelihoods that the left tail of the cross-section of t(α) estimates for actual gross fund returns is due to bad luck, not bad management.

The absence of strong evidence that poorly performing funds have negative true α relative to passive benchmarks is good news for fund managers. The bad news is that the right tails of t(α) estimates for gross fund returns provide little evidence that there are funds with positive true α. The 95th and higher percentiles of the cross-section of three-factor t(α) estimates for actual gross fund returns are above the average values from simulations – but not so far above as to be unlikely in a world where all funds have zero true α. For the full ($5 million) sample, the 95th and higher percentiles of three-factor t(α) estimates from actual fund returns are above the same simulation percentiles in 54.79% to 76.03% of the simulation runs, which means 23.97% to 45.21% of the simulation runs beat the t(α) estimates from fund returns. In the four-factor tests for the $5 million group, 28.98% to 58.69% of the simulations beat the t(α) estimates for the 95th and higher percentiles from actual returns. The results for $250 million and $1 billion funds are similar.

It is probably a mistake to use standard significance levels to reject the hypothesis that there are no skill effects (non-zero true α) in gross fund returns. For example, the arguments of Grossman and Stiglitz (1980) suggest that the natural hypothesis is that there are skill effects in gross returns. Since the tests on gross returns tilt toward the presence of non-zero true α, they also cannot reject this alternative hypothesis. In the end, the best approach is probably to estimate the size of the skill effects suggested by gross returns. We do this next, as part of our examination of the power of the simulations.

IV. Power

The bootstrap simulation approach is potentially subject to a power problem.
With so many funds, it may be difficult to reject a null hypothesis in tests on the cross section of t(α) estimates. Since we have no problem drawing inferences about net returns, the power tests focus on gross returns. To examine power we repeat the simulations but with α injected into fund returns. We then examine (i) how much α is necessary to reproduce the cross section of t(α) estimates for actual gross fund returns, and (ii) levels of α too extreme to be consistent with the cross section of t(α) estimates for actual fund returns.

Given the constraints of equilibrium accounting and the evidence that the median α in gross fund returns for 1984-2006 is close to zero (Table 3), it is reasonable to assume that true α is distributed around zero. It is also reasonable to assume that extreme levels of skill (good or bad) are rare. Concretely, we assume that each fund is endowed with an α drawn from a normal distribution with mean zero and standard deviation x% per year.

The new bootstrap simulations are much like the old. The first step again is to use regression (1) to adjust the gross returns of each fund so its estimated α is zero for the three-factor and four-factor benchmarks and for each of the three AUM groups. Now, however, before drawing the random sample of months to be used in a simulation run, we add a true α drawn from a normal distribution with mean zero and standard deviation x% per year to a fund’s benchmark-adjusted returns – the same α for every combination of benchmark model and AUM group for a given fund, but an independent drawing of α for each fund. As in the simulations of Table 3, we then draw a random sample (with replacement) of 273 months, and for each fund we estimate the three-factor and four-factor versions of regression (1) on the α-controlled gross returns of the fund’s three AUM samples. In this way, the simulation runs use returns that have the properties of actual fund returns, except we know true α has a normal distribution with mean zero and standard deviation x% per year. We do 10,000 simulation runs, and a fund gets a new drawing of α in each run.

To examine power, we vary x, the standard deviation of true α, from 0.0% to 2.0% per year, in steps of 0.5%. The results are in Table 4. The first column of the table shows percentiles of the cross section of t(α) estimates for actual gross fund returns (repeated from Table 3).
The next five columns show the average t(α) estimates at the same percentiles from the 10,000 simulation runs, for each of the five values of x. These are useful for judging how much performance (x) is consistent with the actual cross section of t(α) estimates. The final five columns show, for each value of x, the fraction of simulation runs that produce t(α) estimates at the selected percentiles below the estimates from actual fund returns. These are formal evidence about the amount of performance in actual fund returns that we can rule out as too extreme.
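The steps of an α-injected simulation run can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code behind Table 4: it uses a single hypothetical factor in place of the three-factor and four-factor versions of regression (1), synthetic returns, and funds that all exist for the full sample. The names `alpha_tstat` and `simulation_run` are invented for the example.

```python
import math
import random

def alpha_tstat(y, x):
    """OLS of y on a constant and one factor x; returns (alpha, t(alpha))."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    ssr = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_alpha = math.sqrt(ssr / (n - 2) * (1.0 / n + xbar ** 2 / sxx))
    return alpha, alpha / se_alpha

def simulation_run(fund_returns, factor, x_annual, rng):
    """One run: zero out estimated alpha, inject true alpha, resample months."""
    n_months = len(factor)
    adjusted = []
    for r in fund_returns:
        a_hat, _ = alpha_tstat(r, factor)
        # Independent draw per fund; x_annual is in % per year, returns in % per month.
        true_alpha = rng.gauss(0.0, x_annual / 12.0)
        adjusted.append([ri - a_hat + true_alpha for ri in r])
    # One draw of months (with replacement), shared by all funds and the factor.
    months = [rng.randrange(n_months) for _ in range(n_months)]
    f_star = [factor[m] for m in months]
    return sorted(alpha_tstat([r[m] for m in months], f_star)[1] for r in adjusted)

# Synthetic data (assumption): 50 funds, 120 months, returns in % per month.
rng = random.Random(0)
factor = [rng.gauss(0.6, 4.5) for _ in range(120)]
funds = [[0.9 * f + rng.gauss(0.0, 1.5) for f in factor] for _ in range(50)]
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    t_values = simulation_run(funds, factor, x, rng)
    print(f"x = {x:.1f}%: lowest t = {t_values[0]:.2f}, highest t = {t_values[-1]:.2f}")
```

Repeating `simulation_run` many times for each x and recording selected percentiles of the returned list would give the averages and likelihoods that Table 4 reports.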


A. Likely Levels of Performance

If true α comes from a normal distribution with mean zero and standard deviation x, Table 4 provides two slightly different ways to infer the value of x. We can look for the value of x in the simulations that produces average percentile values of t(α) most like those from actual fund returns. Or we can look for the x that produces simulation t(α) estimates below those for actual returns in about 50% of the simulation runs. Moreover, if α has a normal distribution with mean zero and standard deviation x, we expect that the effects of the level of x become stronger as we look further into the tails of the cross section of t(α). Thus, we are most interested in values of x that match the extreme tails of the t(α) estimates for actual fund returns.

The normality assumption for the distribution of true α is an approximation, and we do not expect that a single value of x (the standard deviation of true α) completely captures the cross section of t(α) estimates for actual fund returns, even if we allow a different estimate of x for each tail. With this caveat, the simulations suggest that the extreme left tail of t(α) estimates for actual gross fund returns is consistent with x roughly equal to 1.0% per year, but 0.5% does a better job on the right tail. These estimates work about as well for all three AUM groups and for the three-factor and four-factor benchmark models. (To save space Table 4 shows results only for the $5 million and $1 billion AUM groups.)

The estimates do not suggest much performance. The right tail x = 0.5% says that about one sixth of funds have a true α greater than 0.5% per year (about 0.04% per month) and about 2.3% have a true α greater than 1.0% per year (0.08% per month). The numbers roughly double for the left tail, but are still unimpressive. For perspective, the average of the OLS standard errors of individual fund α estimates is 0.28% per month.

Our gross fund returns are net of trading costs. An adjustment for trading costs (if deemed appropriate) would lift both the left and the right tails of the t(α) estimates for actual gross fund returns, which would probably move them toward more similar estimates of x.
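The right tail shares quoted in this subsection follow directly from the normal assumption for true α. A quick check, where the helper name `share_above` is an assumption of the example:

```python
from statistics import NormalDist

def share_above(threshold, sd, mean=0.0):
    """Share of funds with true alpha above threshold when alpha ~ N(mean, sd)."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)

x = 0.5  # standard deviation of true alpha, % per year (right tail estimate)
for threshold in (0.5, 1.0):
    share = 100 * share_above(threshold, x)
    print(f"alpha > {threshold:.1f}% per year: {share:.2f}% of funds")
# alpha > 0.5% per year: 15.87% of funds
# alpha > 1.0% per year: 2.28% of funds
```

With x = 0.5%, the thresholds of 0.5% and 1.0% sit one and two standard deviations above zero.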

B. Unlikely Levels of Performance

The simulations with x = 1.0% per year produce extreme left tail percentiles of t(α) estimates below those for actual fund returns in roughly 50% of the 10,000 simulation runs (Table 4). This is the basis of our estimate that the standard deviation of true α for poorly performing funds is about 1.0%. Similar comments apply to the 0.5% estimate of x for the extreme right tail percentiles of t(α).

What levels of x can we reject? Table 4 says that x = 1.5% per year implies too much performance, especially in the right tail of t(α). When the standard deviation of injected α in the simulations is 1.5% per year, the 95th and higher percentiles of t(α) for actual fund returns beat those from the simulations in less than 4.0% of the simulation runs. For the first five percentiles of the left tail, more than 79% of the simulation runs produce t(α) estimates below the same percentile values for actual fund returns. We can thus reject x = 1.5% but with more confidence for the right tail of t(α) than for the left tail. This is not surprising given that the estimates of x for actual fund returns are 1.0% for the left tail and 0.5% for the right tail.

Can we further narrow the range of likely values of x? When we raise x from 0.5% to 1.0% for the right tail t(α) estimates, typically less than 25% of the simulation runs produce t(α) estimates at extreme percentiles below those from actual fund returns. In other words, when x = 1.0% typically more than 75% of the extreme right tail simulation percentiles of t(α) beat those from actual fund returns. The evidence against x = 0.0% for the right tail of t(α) is a bit weaker. When no performance is injected into the simulations (x = 0.0%), it is common that more than 30% of the simulation runs beat the extreme right tail percentiles of t(α) estimates from actual fund returns. It seems likely that the right tail value of x is between 0.0% and 1.0%. Similarly, for the left tail of the t(α) estimates, we can probably conclude that x is greater than 0.5%, but with less confidence than we can reject x = 1.5%.
In sum, we seem able to infer, with reasonable confidence, that for the left tail of t(α) estimates, the standard deviation of true α for gross returns is greater than 0.5% per year and less than 1.5%. That we get rather strong probability statements for such small deviations from the estimated value, 1.0%, suggests that the simulation approach has power. The right tail of the t(α) estimates for gross fund returns is consistent with a distribution for true α that has standard deviation 0.5% per year. This is not much performance, and perhaps as a result, the evidence against the no performance alternative (x = 0.0%) is rather weak. We are able to infer with a bit more confidence that for the right tail, x = 1.0% is too high. Since we can put x into a rather small interval around its point estimate, the simulation approach again seems to have power.

 

V. Kosowski et al. (2006)

The paper closest to ours is Kosowski et al. (2006). They use bootstrap simulations to draw inferences about performance in the cross-section of four-factor t(α) estimates for net fund returns. Their main inference is more positive than ours. They find that the 95th and higher percentiles of four-factor t(α) estimates for net fund returns are above the same simulation percentiles in more than 99% of simulation runs. This seems like strong evidence that among the best performing funds, some have more than sufficient skill to cover costs. In contrast, our simulations uncover no evidence of skill sufficient to cover costs. Two things account for their stronger results: (i) simulation approach and (ii) time period.

We jointly sample fund (and explanatory) returns, whereas Kosowski et al. (2006) do simulations independently for each fund. Their simulations thus take no account of the correlation of the α estimates for different funds that arises because the explanatory returns of a benchmark model do not capture all common variation in fund returns (for example, industry effects). They summarize but do not show simulations that jointly sample the four-factor residuals of funds. But they never jointly sample fund returns and explanatory returns, which means (for example) they miss any effects of correlated movement in the volatilities of four-factor explanatory returns and residuals. In fact, in the results they show, the explanatory returns do not vary across simulation runs; the historical sequence of explanatory returns is used in every run.

Table 5 shows results for their 1975-2002 period using our simulation approach and our full ($5 million AUM) sample. Our simulations do not confirm their inference that many right tail funds have more than sufficient skill to cover costs. More than 99% of their simulation runs produce 95th percentile four-factor t(α) estimates below the 95th percentile from actual net fund returns. In our simulations the number is 53.09%, and the 95th percentile t(α) for actual fund returns, 1.90, is close to the average from the simulations, 1.91. We infer that failure to account for the joint distribution of fund returns, and of fund and explanatory returns, biases their inferences toward positive performance. (Cuthbertson et al. (2008) apply the simulation approach of Kosowski et al. (2006) to UK mutual funds, with similar results and, we guess, similar problems.)

Though less important, there is also a survival bias in Kosowski et al. (2006). They require that funds exist for five years to appear in their tests. Skipping the details, we can report that when we impose this rule on the tests for 1975-2002, the fraction of simulation runs that produce 95th percentile t(α) estimates below the 95th percentile for actual net fund returns rises from 53.09% to 65.65%. We can also report that though our fund inclusion rules differ somewhat from theirs, our sample sizes for 1975-2002, with and without the 60-month survival rule, are similar to those reported in their Table 1.

Time period is also a source of differences in results. For 1984-2006, our simulations produce no hint of funds with sufficient skill to cover costs. In Table 3, the CDFs of four-factor t(α) estimates for the actual net fund returns of 1984-2006 are always to the left of the average CDFs from the net return simulations (in which funds have sufficient skill to cover costs). Even in the extreme right tail of four-factor t(α) for net returns, more than 95% of the simulation runs beat the t(α) estimates for actual fund returns. But applied to the 1975-2002 period of Kosowski et al. (2006), the 90th and higher percentiles of t(α) for actual net fund returns are similar to the average values from the simulations (Table 5), and about half the simulation runs produce t(α) estimates at the 90th and higher percentiles lower than those for net fund returns. This suggests that some of the best performing funds of 1975-2002 have sufficient skill to cover costs.

Likewise, in tests for 1984-2006 (Table 3), the simulations say that the right tails of the t(α) estimates for actual gross fund returns are consistent with a world in which fund managers have little or no skill. In contrast, for 1975-2002, the 90th through 98th percentiles of t(α) for actual gross fund returns exceed the same percentiles from the simulations in more than 93% of simulation runs (Table 5). This suggests that among the top 10% of the funds of 1975-2002, some managers have skill that enhances expected returns.

What do we make of the stronger results for 1975-2002 versus 1984-2006?
One story is that in olden times there were fewer funds, and it is possible to infer that there were managers with skill and some with sufficient skill to cover costs. Over time the skilled managers lost their edge or went on to more lucrative pursuits (for example, hedge funds). Or perhaps, the entry of hordes of mediocre managers posing as skilled (Cremers and Petajisto 2008) eventually makes it impossible to uncover the tracks of true skill. Stronger results for 1975-2002 may also be due to biases in the CRSP data that are more prevalent in earlier years (Elton, Gruber, and Blake 2001). Whatever the explanation, the stronger evidence for performance during 1975-2002 is interesting, but irrelevant for today’s investors.
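The difference between joint and fund-by-fund resampling discussed in this section can be illustrated with a toy experiment. The sketch below is not a replication of either paper. It assumes synthetic returns that share a common component the benchmark does not capture, and it uses the t-statistic of the mean return as a stand-in for t(α). All names and parameter values are assumptions of the example.

```python
import math
import random
import statistics

def t_of_mean(r):
    """t-statistic of the mean of r (stand-in for t(alpha) with no factors)."""
    return statistics.fmean(r) / (statistics.stdev(r) / math.sqrt(len(r)))

def p95_across_runs(returns, n_runs, joint, rng):
    """95th percentile of the cross-section of t-statistics in each run."""
    n_months = len(returns[0])
    n_funds = len(returns)
    idx = (95 * n_funds + 99) // 100 - 1  # nearest-rank position, zero-based
    out = []
    for _ in range(n_runs):
        shared = [rng.randrange(n_months) for _ in range(n_months)]
        t_values = []
        for r in returns:
            # Joint: every fund uses the same months. Independent: own draw per fund.
            months = shared if joint else [rng.randrange(n_months) for _ in range(n_months)]
            t_values.append(t_of_mean([r[m] for m in months]))
        t_values.sort()
        out.append(t_values[idx])
    return out

# Synthetic data (assumption): 40 funds, 60 months, strong common component.
rng = random.Random(7)
common = [rng.gauss(0.0, 2.0) for _ in range(60)]
returns = [[c + rng.gauss(0.0, 1.0) for c in common] for _ in range(40)]

spread_joint = statistics.pstdev(p95_across_runs(returns, 100, True, rng))
spread_indep = statistics.pstdev(p95_across_runs(returns, 100, False, rng))
print(f"spread of 95th percentile, joint sampling: {spread_joint:.2f}")
print(f"spread of 95th percentile, independent sampling: {spread_indep:.2f}")
```

When funds share common variation, resampling each fund separately breaks the cross-correlation, so the simulated tail percentiles vary less from run to run. A given actual percentile then looks more extreme than it does under joint sampling.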

 

VI. Conclusions

For 1984-2006, when the CRSP database is relatively free of biases, mutual funds on average and the average dollar invested in funds underperform three-factor and four-factor benchmarks by about the amount of costs (fees and expenses). Thus, if there are fund managers with skill that enhances expected returns relative to passive benchmarks, they are offset by managers whose stock picks lower expected returns. We attempt to identify the presence of skill via bootstrap simulations. The tests for net returns say that even in the extreme right tails of the cross-sections of three-factor and four-factor t(α) estimates, there is no evidence of fund managers with skill sufficient to cover costs. The simulation results for gross returns produce hints of the existence of managers with skill that enhances expected returns. But they are just hints. Rather large fractions of the simulation runs in which there is no skill (true α is zero in gross returns) produce more extreme right tails than the right tails of the cross-sections of t(α) estimates for actual fund returns.

Finally, the aggregate portfolio of the mutual funds that specialize in U.S. equities is much like the market portfolio of U.S. equities, and the aggregate fund return is close to the market return, before costs. But we have only examined mutual funds. It is possible that other segments of the investment industry, for example, hedge funds, in aggregate produce returns that point to the existence of skill. Equilibrium accounting tells us, however, that if the hedge fund industry produces positive α due to skill, there must be other actively managed segments of the investment industry that pay for hedge fund winnings, dollar for dollar, with negative α. And the balancing of the dollar losses of losers against the gains of winners occurs before costs. After costs, that is, in terms of returns to investors, active management must in aggregate be a negative sum game, by the amount of the costs that active managers impose on investors (French 2008).

Appendix – Incubation Bias

Fund management companies commonly provide seed money to new funds to develop a return history. Funds that have good performance are then opened to the public. The result is an upward bias in fund returns during the “incubation” period. Evans (2007) suggests that incubation bias can be minimized by using returns only after funds receive a ticker symbol from the NASD, which typically means they are available to the public. Systematic data on ticker symbol dates are only available after 1998. We have replicated the tests in Tables 2 and 3 for 1999-2006 using CRSP start dates for new funds (as in Tables 2 and 3) and then using NASD ticker dates (from Evans). Switching to ticker dates has almost no effect on aggregate fund returns (as in Table 2), and trivial effects on the cross-section of t(α) estimates for funds (as in Table 3). We conclude that incubation bias is probably unimportant in our results for other periods.

Appendix – CAPM Bootstrap Simulations

Table A1 replicates the bootstrap simulations in Table 3 for a CAPM benchmark, that is, regression (1) with the excess market return as the only explanatory variable. The CAPM results are much different. The CAPM tests on net returns produce what seems like strong evidence that some fund managers have sufficient skill to cover the costs they impose on investors. Thus, for percentiles above the 90th, the CAPM t(α) estimates for actual net fund returns are always above the averages from the net return simulations (in which all managers have sufficient skill to cover costs), and the t(α) estimates for actual fund returns typically beat those from the simulations in more than 80% of simulation runs. Relative to the three-factor and four-factor tests in Table 3, the CAPM tests on gross returns in Table A1 also produce what seems like stronger evidence that some managers have skill that leads to positive true α, while others have negative true α.

In fact, the CAPM results just illustrate the effects of failure to account for patterns in average returns during our sample period. Actual fund returns contain the effects of size, value/growth, and momentum tilts in fund portfolios that cause well-known problems for the CAPM.
As a result, even passive funds that tilt toward small stocks, or value stocks, or positive momentum stocks are likely to produce positive α estimates in CAPM tests, despite the fact that their managers have no skill in picking individual stocks. The CAPM simulations wash out all but the pattern in average returns that traces to market exposure, so they confirm that actual fund returns can have non-zero true α in the CAPM tests because the CAPM fails to explain systematic patterns in average returns unrelated to market exposure.

Which patterns in average returns left unexplained by the CAPM are most responsible for the different simulation results for the CAPM versus the three-factor and four-factor models? Table 3 says that adding the momentum factor to the three-factor model has minor effects on estimates of t(α). Since the momentum return MOMt in (1) has the highest average premium during our sample period, we infer that long-term exposure to momentum is probably rare among mutual funds. The average size (SMBt) premium in (1) is trivial during our 1984-2006 sample period (0.03% per month, Table 1), so size tilts are not driving the different results for the CAPM. That leaves the value (VMGt) premium as the focus of the story. Funds in the right tail of the CAPM t(α) estimates are more likely to have positive VMGt exposure that makes them look good in CAPM tests, and funds in the left tail are likely to have negative VMGt exposure. In short, the CAPM tests are a lesson about how failure to account for common patterns in returns and average returns can lead to false inferences about stock picking skill.
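The mechanism described in this appendix can be seen in a small constructed example. The sketch below builds a purely passive fund with a value tilt and regresses its return on the market alone. The factor series are hypothetical numbers chosen so that the value return is uncorrelated with the market in sample, and the function name `ols_intercept_slope` is an assumption of the example.

```python
def ols_intercept_slope(y, x):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical monthly returns in percent (assumption): excess market return
# averaging 0.5 and a value-growth return averaging 0.4, uncorrelated in sample.
mkt = [2.0, -1.0, 2.0, -1.0] * 6
vmg = [0.6, 0.6, 0.2, 0.2] * 6

# Passive fund with no stock picking: market exposure 1.0, value exposure 0.5.
fund = [1.0 * m + 0.5 * v for m, v in zip(mkt, vmg)]

alpha_capm, beta = ols_intercept_slope(fund, mkt)
print(f"CAPM alpha = {alpha_capm:.2f}% per month, beta = {beta:.2f}")
# CAPM alpha = 0.20% per month, beta = 1.00
```

The CAPM α of 0.20% per month is just the value exposure (0.5) times the average value premium (0.4%). A regression that includes the value return as an explanatory variable would assign it to the slope on that factor instead.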


References

Banz, Rolf W., 1981, The relationship between return and market value of common stocks, Journal of Financial Economics 9, 3-18.

Berk, Jonathan B., and Richard C. Green, 2004, Mutual fund flows in rational markets, Journal of Political Economy 112, 1269-1295.

Carhart, Mark M., 1997, On persistence in mutual fund performance, Journal of Finance 52, 57-82.

Cremers, Martijn, and Antti Petajisto, 2008, How active is your fund manager? A new measure that predicts performance, Review of Financial Studies, forthcoming.

Cuthbertson, Keith, Dirk Nitzsche, and Niall O’Sullivan, 2008, UK mutual fund performance: Skill or luck? Journal of Empirical Finance, forthcoming.

Dybvig, Philip H., and Stephen A. Ross, 1985, The analytics of performance measurement using a security market line, Journal of Finance 40, 401-416.

Elton, Edwin J., Martin J. Gruber, and Christopher R. Blake, 2001, A first look at the accuracy of the CRSP mutual fund database and a comparison of the CRSP and Morningstar mutual fund databases, Journal of Finance 56, 2415-2430.

Fama, Eugene F., 1965, The behavior of stock market prices, Journal of Business 38, 34-105.

Fama, Eugene F., and Kenneth R. French, 1992, The cross-section of expected stock returns, Journal of Finance 47, 427-465.

Fama, Eugene F., and Kenneth R. French, 1993, Common risk factors in the returns on stocks and bonds, Journal of Financial Economics 33, 3-56.

French, Kenneth R., 2008, The cost of active investing, Journal of Finance 63, forthcoming.

Grinblatt, Mark, and Sheridan Titman, 1992, Performance persistence in mutual funds, Journal of Finance 47, 1977-1984.

Grossman, Sanford J., and Joseph E. Stiglitz, 1980, On the impossibility of informationally efficient markets, American Economic Review 70, 393-408.

Gruber, Martin J., 1996, Another puzzle: The growth of actively managed mutual funds, Journal of Finance 51, 783-810.

Jegadeesh, Narasimhan, and Sheridan Titman, 1993, Returns to buying winners and selling losers: Implications for stock market efficiency, Journal of Finance 48, 65-91.

Jensen, Michael C., 1968, The performance of mutual funds in the period 1945-1964, Journal of Finance 23, 389-416.

Kosowski, Robert, Allan Timmermann, Russ Wermers, and Hal White, 2006, Can mutual fund “stars” really pick stocks? New evidence from a bootstrap analysis, Journal of Finance 61, 2551-2595.

Lakonishok, Josef, Andrei Shleifer, and Robert W. Vishny, 1994, Contrarian investment, extrapolation, and risk, Journal of Finance 49, 1541-1578.

Malkiel, Burton G., 1995, Returns from investing in equity mutual funds: 1971-1991, Journal of Finance 50, 549-572.


Table 1 - Summary statistics for monthly explanatory returns of the three- and four-factor models
RM is the return on a value-weight market portfolio of NYSE, Amex (after 1962), and NASDAQ (after 1972) stocks, and Rf is the one-month Treasury bill rate. The construction of SMBt and VMGt follows Fama and French (1993). At the end of June of each year k, we sort stocks into two size groups. Small includes NYSE, Amex (after 1962), and NASDAQ (after 1972) stocks with June market capitalization below the NYSE median, and Big includes stocks with market cap above the NYSE median. We also sort stocks into three book-to-market equity (B/P) groups: Growth (NYSE, Amex, and NASDAQ stocks in the bottom 30% of NYSE B/P), Neutral (middle 40% of NYSE B/P), and Value (top 30% of NYSE B/P). Book equity B is for the fiscal year ending in calendar year k-1, and the market cap P in B/P is for the end of December of k-1. The intersection of the (independent) size and B/P sorts produces six value-weight portfolios, refreshed at the end of June each year. The size return, SMBt, is the simple average of the month t returns on the three Small stock portfolios minus the average of the returns on the three Big stock portfolios. The value-growth return, VMGt, is the simple average of the returns on the two Value portfolios minus the average of the returns on the two Growth portfolios. The momentum return, MOMt, is defined like VMGt, except that we sort on prior return rather than B/P and the momentum sort is refreshed monthly rather than annually. At the end of each month t-1 we sort NYSE stocks on the average of the eleven months of returns to the end of month t-2. (Dropping the return for month t-1 is common in the momentum literature.) We use the 30th and 70th NYSE percentiles to assign NYSE, Amex, and NASDAQ stocks to Low, Medium, and High momentum groups.
The intersection of the size sort for the most recent June and the independent momentum sort produces six value-weight portfolios, refreshed monthly. The momentum return, MOMt, is the simple average of the month t returns on the two High momentum portfolios minus the average of the returns on the two Low momentum portfolios. The table shows the average monthly return and the standard deviation of monthly returns (both in percent), and the t-statistic for the average monthly return. Time periods are as indicated, except that 2006 is September 2006.

                         Average Return          Standard Deviation                 t-statistic
             RM-Rf    SMB    VMG    MOM  RM-Rf    SMB    VMG    MOM  RM-Rf    SMB    VMG    MOM
1962-2006     0.45   0.23   0.48   0.82   4.42   3.22   2.90   3.98   2.36   1.65   3.82   4.80
1962-1983     0.25   0.44   0.56   0.86   4.48   3.03   2.59   3.57   0.92   2.36   3.48   3.91
1984-2006     0.64   0.03   0.40   0.79   4.36   3.38   3.17   4.35   2.42   0.13   2.10   3.01
1962-1992     0.38   0.24   0.47   0.82   4.53   2.87   2.54   3.41   1.60   1.62   3.59   4.65
1993-2006     0.61   0.20   0.49   0.83   4.15   3.90   3.57   5.04   1.90   0.67   1.76   2.10
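The construction described in the Table 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the six portfolio return series below are made-up numbers, and the function names are hypothetical. The last line recomputes a t-statistic from the rounded 1984-2006 summary statistics, assuming 273 monthly observations (January 1984 to September 2006).

```python
import numpy as np

def spread_return(long_legs, short_legs):
    """Month-by-month simple average of the long legs minus that of the short legs."""
    return np.mean(long_legs, axis=0) - np.mean(short_legs, axis=0)

def t_stat_from_summary(mean, sd, n_months):
    """t-statistic for an average return: mean / (sd / sqrt(n))."""
    return mean / (sd / np.sqrt(n_months))

# Hypothetical monthly returns (percent) on the six size-B/P portfolios.
small_growth = np.array([1.0, -2.0, 0.5])
small_neutral = np.array([1.5, -1.0, 0.7])
small_value = np.array([2.0, -0.6, 1.2])
big_growth = np.array([0.8, -1.5, 0.3])
big_neutral = np.array([1.0, -1.2, 0.6])
big_value = np.array([1.2, -0.9, 0.9])

# SMB: three Small portfolios minus three Big portfolios.
smb = spread_return([small_growth, small_neutral, small_value],
                    [big_growth, big_neutral, big_value])

# VMG: two Value portfolios minus two Growth portfolios.
vmg = spread_return([small_value, big_value], [small_growth, big_growth])

# t-statistic for the 1984-2006 VMG premium from the rounded Table 1 entries.
t_vmg = t_stat_from_summary(0.40, 3.17, 273)
print(smb, vmg, round(t_vmg, 2))
```

The recomputed t-statistic is about 2.08, close to the 2.10 in the table; the small gap is what one would expect from using rounded means and standard deviations. MOMt would use the same spread function with the High and Low momentum portfolios.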


Table 2 - Intercepts and slopes in variants of regression (1) for equal-weight (EW) and value-weight (VW) portfolios of mutual funds
The first two lines of the table show the average number of funds (Funds) and the average across funds of average monthly assets under management (AUM) in millions of dollars. Part A shows the annualized intercepts (12*α) and t-statistics for the intercepts (t(α)) for the three- and four-factor versions of regression (1) estimated on equal-weight (EW) and value-weight (VW) returns on the portfolio of mutual funds in our sample. Part B shows the regression slopes, t-statistics for the slopes (t(b-1) for the market slope), and the regression R2 from the estimates of the four-factor version of (1). Part A shows results for gross (pre-expense ratio) and net fund returns. Part B shows regression slopes for only net returns. When a fund’s expense ratio for a year is missing, we assume it is the same as other funds with similar AUM, with separate estimates for active and passive funds. The time periods are as indicated, except that 2006 is September 2006.

                   1962-2006     1962-1983     1984-2006     1962-1992     1993-2006
                   EW     VW     EW     VW     EW     VW     EW     VW     EW     VW
Funds                    819           246          1374           331          1921
AUM                    410.7         136.7         675.8         179.0         933.2

Part A: Three-factor and four-factor α estimates (monthly values multiplied by 12)
Net Returns
Three factor
  12 * α        -0.52  -0.49  -0.23  -0.08  -0.94  -0.80  -0.08  -0.16  -1.44  -1.16
  t(α)          -1.38  -1.67  -0.37  -0.17  -2.21  -2.62  -0.17  -0.41  -2.56  -3.09
Four factor
  12 * α        -0.85  -0.86  -1.25  -0.87  -0.91  -0.93  -0.86  -0.71  -1.38  -1.38
  t(α)          -2.19  -2.90  -2.08  -1.75  -2.08  -2.98  -1.92  -1.84  -2.39  -3.68
Gross Returns
Three factor
  12 * α         0.61   0.26   0.74   0.53   0.36   0.10   0.97   0.52  -0.10  -0.23
  t(α)           1.61   0.89   1.21   1.04   0.85   0.32   2.14   1.38  -0.18  -0.61
Four factor
  12 * α         0.29  -0.11  -0.29  -0.27   0.39  -0.03   0.18  -0.02  -0.03  -0.45
  t(α)           0.74  -0.35  -0.49  -0.53   0.90  -0.09   0.41  -0.06  -0.06  -1.20

Part B: Four-factor regression slopes and R2 for net returns
b                0.97   0.97   0.94   0.97   0.98   0.97   0.94   0.95   1.00   0.99
t(b-1)          -4.14  -5.31  -5.08  -3.42  -1.81  -5.20  -6.75  -6.03   0.33  -1.53
s                0.20   0.05   0.26   0.08   0.18   0.05   0.26   0.08   0.19   0.05
t(s)            19.71   6.86  15.41   5.38  15.81   6.26  19.44   7.21  13.86   5.77
v               -0.03  -0.05  -0.08  -0.09  -0.00  -0.03  -0.10  -0.10   0.05   0.00
t(v)            -2.85  -5.86  -4.12  -5.65  -0.23  -2.93  -6.66  -7.45   3.21   0.16
m                0.03   0.03   0.08   0.06  -0.00   0.01   0.06   0.05  -0.01   0.02
t(m)             3.50   5.17   5.81   5.38  -0.34   1.92   6.17   5.07  -0.56   2.89
R2               0.98   0.99   0.97   0.98   0.98   0.99   0.98   0.98   0.98   0.99
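The entries in Table 2 are of the kind produced by a time-series regression of a fund portfolio's excess return on the factor returns. The sketch below shows one way such entries could be computed. It is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the standard errors are plain OLS ones (the table description does not say how the t-statistics are computed), and the data are simulated with slopes loosely matching the first column of Part B.

```python
import numpy as np

def regression_summary(excess_ret, factors):
    """OLS of excess returns on a constant and factor columns.

    factors is a T x K array whose first column is RM - Rf.
    Standard errors are plain OLS ones (an assumption).
    """
    y = np.asarray(excess_ret, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "12*alpha": 12.0 * coef[0],         # annualized intercept
        "t(alpha)": coef[0] / se[0],
        "b": coef[1],
        "t(b-1)": (coef[1] - 1.0) / se[1],  # market slope tested against 1
        "slopes": coef[2:],                 # s, v, m
        "t(slopes)": coef[2:] / se[2:],     # tested against 0
        "R2": r2,
    }

# Simulated data (hypothetical): independent normal factors and a fund
# portfolio with slopes 0.97, 0.20, -0.03, 0.03 and a small negative alpha.
rng = np.random.default_rng(1)
T = 273
factors = rng.normal([0.64, 0.03, 0.40, 0.79], [4.36, 3.38, 3.17, 4.35], size=(T, 4))
fund = -0.07 + factors @ np.array([0.97, 0.20, -0.03, 0.03]) + rng.normal(0.0, 0.2, T)

out = regression_summary(fund, factors)
print({k: np.round(v, 2) for k, v in out.items()})
```

The market slope is tested against one rather than zero because a diversified portfolio of equity funds is expected to move nearly one-for-one with the market, so t(b-1) is the informative statistic.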

 

Table 3 - Percentiles of t(α) estimates for actual and simulated fund returns: January 1984 to September 2006
The table shows values of t(α) at selected percentiles (Pct) of the distribution of t(α) estimates for actual (Act) net and gross fund returns. The table also shows the percent of the 10,000 simulation runs that produce lower values of t(α) at the selected percentiles than those observed for actual fund returns (% < Act). Sim is the average value of t(α) at the selected percentiles from the simulations. The period is January 1984 to September 2006 and results are shown for the three- and four-factor models for the $5 million, $250 million, and $1 billion AUM fund groups.

Pct

Sim

5 Million Act %