Modern Approaches in the Evaluation of
Management Skill in the Mutual Fund
Universitat Pompeu Fabra
Portfolio management evaluation has been a central topic of financial research since the late
1960s and continues to be a vibrant area of scientific inquiry for several reasons. From a
theoretical standpoint, the detection of superior portfolio management skill calls into question the
concept of semi-strong form market efficiency (Fama, 1970), which asserts that asset prices fully
reflect all publicly available information. The study of performance evaluation
is furthermore a direct test of the asset pricing models developed over the last 50 years, and
the insights gained from it have had important repercussions on the development of the
theoretical asset pricing literature.
Aside from its impact on financial theory, the effort to distinguish skill from luck in asset
management has important implications for social welfare and for the macroeconomic environment
in general. While American society relies heavily on financial instruments, mostly managed
funds, to smooth personal consumption over time, a growing number of
other countries now base part of their pension systems on funded schemes. Clearly, the
importance of knowing the value and quality of the money management involved in this
process cannot be overstated. From a macroeconomic point of view, mutual and pension
funds have to be seen as devices that channel savings to their productive uses. The prosperity of an
economy will crucially depend on the abilities of the individuals who take these allocational
decisions.
The main goal of this review is to provide the reader with a clear overview of recent
methodological developments and results in the performance evaluation literature. Moreover, an
attempt is made to relate them to developments in other parts of the financial literature and to our
understanding of financial markets in general. The text is structured as follows: Section 2 sets
up an analytical framework that helps to understand the different evaluation
methodologies and highlights several intellectual problems the literature is concerned with.
Section 3 gives the reader a brief introduction to the more “classical” approaches to
performance evaluation, while Section 4, the central part of this text, is concerned with recent
methodological developments and their results. Section 5 concludes and highlights promising
avenues for future research.
2. An analytical framework for portfolio management skill
In this section I first provide a simple analytical framework that is common to performance
evaluation studies and then highlight several critical issues that have caused divergent opinions
as well as divergent results in the literature.
Mutual fund portfolios are linear combinations of assets that managers update period by
period. Expected returns on this bundle of assets will therefore depend on the expected
returns of each of the individual assets in the portfolio as well as on the skill of the manager in
establishing this mix. The expected excess return of the portfolio over the risk-free rate, $r_{p,t}$, will
be a linear combination of premia attributed to economic risk factors, $\lambda_k$:
$$E(r_{p,t}) = \alpha_p + \sum_{k=1}^{K} b_{i,k,t}\,\lambda_{k,t} \qquad (1)$$
The expected return on each portfolio thus depends on its exposure to these factors, $b_{i,k}$. All
linear asset pricing models can be nested within this general specification. For instance, the
classical Capital Asset Pricing Model (CAPM) of Sharpe (1964) and Lintner (1965) considers only
one risk factor, market risk, and hence (1) takes the simple form with K = 1. The parameter
$\alpha_p$, often referred to as risk-adjusted return or simply “alpha”, summarizes the asset selection
skill of the fund manager. If the pricing model (1) is correctly specified, any deviation of $\alpha_p$ from
zero can only be interpreted as selection skill of the manager or, if negative, as the lack thereof.
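For the single-factor (CAPM) case, estimating $\alpha_p$ in (1) amounts to an OLS regression of fund excess returns on market excess returns. The following is a minimal sketch under stated assumptions: the returns are simulated with a hypothetical true alpha of 0.2% per month and beta of 0.9, the function name `jensen_alpha` is illustrative, and standard errors are omitted.

```python
import random


def jensen_alpha(fund_excess, market_excess):
    """OLS estimates (alpha, beta) of r_p = alpha + beta * r_m + e (single factor)."""
    n = len(fund_excess)
    mean_m = sum(market_excess) / n
    mean_p = sum(fund_excess) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(market_excess, fund_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var
    alpha = mean_p - beta * mean_m  # intercept of the fitted line
    return alpha, beta


# Simulated monthly excess returns (hypothetical: true alpha 0.002, true beta 0.9)
rng = random.Random(42)
market = [rng.gauss(0.006, 0.045) for _ in range(240)]
fund = [0.002 + 0.9 * m + rng.gauss(0.0, 0.01) for m in market]

alpha_hat, beta_hat = jensen_alpha(fund, market)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.3f}")
```

With 240 months of data the beta is pinned down fairly precisely, while the alpha estimate remains noisy relative to its size, which is one reason the data issues discussed under (v) matter.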
Disagreement among academics, as well as widely differing results on the degree of management
skill, arises mostly from differing interpretations and estimations of this simple pricing relation. To
structure this text, the following five points summarize the main problems the literature
is trying to solve as well as some obstacles in the process. I will refer to them repeatedly when
discussing how different researchers have dealt with them:
i. The number of risk factors that determine expected returns in financial markets is far
from well defined. As mentioned, the CAPM presupposes only one factor, whereas more
complex specifications include a multitude of risk sources that often correspond to
observed asset pricing anomalies such as the small-firm and the value effect. It is
important to understand that testing for non-zero alphas of managed portfolios is a joint
test of management performance (or, equivalently, of semi-strong form market efficiency)
and of the validity of the asset pricing model that is assumed to describe the world.
ii. Risk exposures ($b_{i,k}$) need not be constant over time. First, each individual asset in the
portfolio can vary in its exposure to economy-wide risk factors over time. More
importantly, though, one might suspect considerable time-series variation in the factor
loadings due to changes in the portfolio composition, as managers try to time their
exposure to risk according to the time-varying size of the premium associated with it.
iii. For identity (1) to hold in practice, managers have to be able to invest in a complete set
of assets. Not only are financial markets incomplete in practice, but there are also clear-cut
restrictions on managers regarding the types (e.g. equity vs. fixed income funds) as well as styles
(e.g. value stocks, small caps) of assets in their choice set. In the context of performance
evaluation this implies that managers should be compared to peers that face the same
restrictions, a task that is not easy to achieve.
iv. In the simplest case, performance studies estimate the parameters of relation (1)
econometrically using historical return data on funds and on mimicking portfolios that are
constructed to synthesize the premia associated with different risk factors. It is conceivable
that statistical inference can be considerably sharpened by enriching this simple
information set in different dimensions. The consistent use of such extraneous
information in econometric analysis is, again, not as straightforward as it might seem.
v. Finally, one major issue in the detection of money management skill is data limitations.
One problem is survivorship bias: funds with histories that are too short to accurately
estimate their risk-adjusted return are excluded from the evaluation. Precisely these
funds, however, are often the worst performing ones, and their exclusion causes an
upward bias in the researcher’s impression of the degree of management skill in the
industry. This problem is succinctly summarized by Brown et al. (1992, pp. 559-560)
when they explain that
“If the probability of survival depends on past performance to date, we might
expect that the set of managers who survive will have a higher ex post return than
those who did not survive. Managers who take on significant risk and lose may
also have a low probability of survival. This observation suggests that past
performance numbers are biased by survivorship.”
A less well documented data constraint, however, works in the opposite direction. Young
funds often outperform their peers substantially. Statistical inference using past
return histories of a minimum length will lead to the exclusion of these funds from the
analysis and therefore bias the overall impression of management skill in the industry
downwards. It is crucial to understand that there exists a subtle interplay between
statistical accuracy of the parameter estimates in (1) on the one hand and time variation
in factor loadings (ii) as well as survivorship bias on the other. A good evaluation
methodology will ideally require only short return histories as an input, while guaranteeing
a minimum degree of statistical accuracy in the parameter estimation.
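A small simulation illustrates the upward bias described in (v). It is a sketch under stated assumptions: every simulated fund has a true alpha of exactly zero, and a fund is (hypothetically) dropped from the sample when its average realized abnormal return over 36 months falls below a fixed cutoff. The surviving funds then show a higher average alpha than the full population, even though no manager has any skill.

```python
import random
import statistics


def survivorship_demo(n_funds=1000, n_periods=36, cutoff=-0.002, seed=0):
    """Mean realized alpha of all funds vs. of the funds that 'survive'.

    All funds have a true alpha of zero. A fund survives (hypothetical rule)
    if its average realized abnormal return is at least `cutoff`.
    """
    rng = random.Random(seed)
    realized = [
        statistics.fmean(rng.gauss(0.0, 0.02) for _ in range(n_periods))
        for _ in range(n_funds)
    ]
    survivors = [a for a in realized if a >= cutoff]
    return statistics.fmean(realized), statistics.fmean(survivors), len(survivors)


all_mean, survivor_mean, n_survivors = survivorship_demo()
print(f"all funds: {all_mean:.5f}, survivors ({n_survivors}): {survivor_mean:.5f}")
```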
3. The “classic” performance evaluation literature
Apart from the early descriptive work of Close (1952), the first major analytical treatment of mutual fund
performance is Jensen (1968), which has since become one of the main references in the
literature. Jensen assumes the validity of the CAPM and hence the existence of a
unique risk factor, the market or systematic risk. If agents are risk averse and share the same
beliefs about the payoffs of securities in the economy, the expected excess return of a portfolio is
a linear function of the excess return on the market portfolio, which Jensen proxies by the S&P 500. Relation
(1) here takes the simplest possible form, with a single factor (K = 1). The parameter $\alpha_p$, often labeled
“Jensen’s alpha” in recognition of this contribution, captures managerial ability. His data comprise a
set of 115 U.S. open-end mutual funds. Applying OLS to estimate (1), he first finds no superior
performance for the industry as a whole: net of expenses, the average fund underperforms the
market by 1.1% on a risk-adjusted basis. Examining the cross-section of alphas, he notes that the
distribution is skewed to the left. While 14 funds in his sample have historically underperformed
the market at the 5% level of significance, only three have significantly positive alphas.
Since in a sample of 115 portfolios one expects about six to outperform at this level of significance by
pure chance, he concludes that there is no evidence of superior performance in the mutual
fund industry relative to a simple buy-and-hold strategy in the market portfolio. McDonald (1974)
later reaches similar conclusions on superior performance.
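Jensen's "pure chance" benchmark is binomial arithmetic. The sketch below assumes a one-sided test at the 5% level and independent test statistics across funds; the independence assumption is a strong simplification, since fund returns share common factor exposures. The function names are illustrative.

```python
from math import comb


def expected_false_positives(n_funds, level):
    """Expected number of zero-alpha funds that look significant by chance."""
    return n_funds * level


def prob_at_least(k, n_funds, level):
    """P(at least k of n independent zero-alpha funds test as significant)."""
    return sum(
        comb(n_funds, j) * level**j * (1 - level) ** (n_funds - j)
        for j in range(k, n_funds + 1)
    )


print(round(expected_false_positives(115, 0.05), 2))  # 5.75
# Probability that luck alone produces at least three "significant" winners
print(prob_at_least(3, 115, 0.05))
```

Under these assumptions, observing three significantly positive alphas is well within what chance alone would produce.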
At the same time, researchers begin to investigate an additional dimension of management skill,
namely the ability to time the market. In a nutshell, these studies assess whether the factor
loadings in (1) are positively correlated with the risk premium of the factor, thereby addressing
issue (ii) discussed in the previous section. Early work by Treynor and Mazuy (1966) finds no
statistical evidence of market timing skill for a sample of 57 funds between 1953 and 1962.
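The timing test of Treynor and Mazuy (1966) is commonly written as the quadratic regression $r_{p,t} = a + b\,r_{m,t} + c\,r_{m,t}^2 + \varepsilon_{p,t}$: a manager who raises market exposure ahead of up markets produces a convex relation between fund and market returns, i.e. $c > 0$. The sketch below fits this regression with plain normal-equation OLS on simulated data with hypothetical coefficients; it reports point estimates only, without standard errors.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def treynor_mazuy(fund_excess, market_excess):
    """Fit r_p = a + b * r_m + c * r_m**2 + e; returns [a, b, c]."""
    X = [[1.0, m, m * m] for m in market_excess]
    return ols(X, fund_excess)


# Simulated fund with hypothetical timing ability (true c = 2.0)
rng = random.Random(7)
market = [rng.gauss(0.006, 0.045) for _ in range(240)]
fund = [0.001 + 0.8 * m + 2.0 * m * m + rng.gauss(0.0, 0.01) for m in market]
a, b, c = treynor_mazuy(fund, market)
print(f"a = {a:.4f}, b = {b:.3f}, timing coefficient c = {c:.2f}")
```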
Ross (1976) develops the Arbitrage Pricing Theory (APT), which rationalizes the existence of more than
one risk factor in the economy. The most widely used empirical implementation of this idea is
provided by Fama and French (1993), who construct mimicking portfolios for risk factors associated
with commonly observed asset pricing anomalies such as the small-firm and value effects.
Lehmann and Modest (1987, hereafter LM) provide an extensive test of management skill, comparing results
obtained under different asset pricing models and different benchmark (factor) specifications of
the underlying portfolio return generating process. Furthermore, they refer to the central insight
of Dybvig and Ross (1985) that static asset pricing models erroneously underestimate managers’
stock picking abilities if the managers possess market timing skills at the same time1. They therefore
address issues (i), (ii) and (iii) in a very detailed manner. LM’s
conclusions from examining data on 130 funds from 1968 to 1982 are threefold. First, they find
substantial differences in the absolute as well as relative risk-adjusted performance estimates
across the CAPM and APT specifications, with the latter generally yielding more pessimistic conclusions
about managerial abilities. Second, such conclusions are also highly sensitive to the choice and
construction of benchmark portfolios. Finally, they show that risk-adjusted performance
estimates are relatively insensitive to the addition of further common risk factors once a certain
minimum number is considered. Their contribution is a methodological one and lies in their
conjecture that the correct specification of benchmarks, rather than the complexity of the pricing
model, is crucial for the performance evaluation of investment management.
The second block of the “classic” literature on money management performance
examines the persistence of performance, in other words the degree of positive serial correlation
in risk-adjusted returns of managed portfolios. Again, (i), the correct specification of an asset pricing
model, and (iii), the choice of appropriate benchmarks, are central issues in this debate. Using
quarterly returns of 165 no-load2 growth-oriented funds, Hendricks et al. (1993) detect positive
serial correlation in excess returns net of fees. Employing different pricing and benchmark
specifications3, they find positive serial correlation of abnormal returns up to 4 quarters and a
reversal thereafter. This result holds for positive and, even more so, for negative risk-adjusted
returns. They offer several explanations for these results: first, skilled analysts get bid away once
they build a track record; second, excessive fund flows after superior performance erode future
performance; third, once reputation is established, managers sit back and the increased remuneration of
good managers eats up any extra value created by their skill.
1 The intuition behind this argument is that uninformed investors, who cannot observe the private information set of
a manager that induces him to vary factor exposures, interpret timing-related return movements as useless additions
to the overall volatility of portfolio returns.
2 “no-load” refers to the absence of sales and redemption charges.
3 They estimate single factor models with different proxies for the market (NYSE, CRSP and equally weighted fund
returns) and an 8-factor model that takes account of known asset pricing anomalies.
Later research shows that the effect is probably due to the second explanation, the inverse
relationship between fund flows and future performance4. Using different samples and bootstrap
simulations, Hendricks et al. convincingly show that survivorship bias (v) is not the driving factor behind
their results. A strategy that systematically buys recent top performers, or what they call “hot
hands” managers, statistically outperforms the market. “Icy hands” managers of the last quarter continue to
underperform, even more so than their counterparts on the upside. Similar conclusions are
drawn by later studies of Goetzmann and Ibbotson (1994) as well as Brown and Goetzmann
(1995). Even stronger results are generated by Elton et al. (1996) who find performance
predictability over long horizons (5-10 years), which they attribute to differences in stock
selection skill as well as information sets of managers.
The growing optimism of this early persistence literature, however, is sharply dampened by what
is considered to be one of the most influential studies in the literature on mutual fund
performance. Carhart (1997) shows with an elaborately constructed survivorship bias-free
database on a large subset of the U.S. mutual fund industry5, that persistence in fund returns is
mostly attributable to momentum in stock returns as well as differences in expenses and
transactions costs. Carhart argues that serial correlation in fund returns is observed, not because
managers successfully follow momentum strategies, but rather because some funds just happen
to hold past winners in their portfolios. Any superior performance from true momentum
strategies is on average eroded by the high transactions costs involved in pursuing them. To
exemplify the errors made by earlier research, he constructs decile rankings based on simple
CAPM alphas and compares them to rankings that are generated from the estimation of multi-
factor models that, besides mimicking portfolios for market-, small-firm and value risk, include a
momentum factor due to Jegadeesh and Titman (1993). This factor is mimicked by a zero-cost
portfolio that goes long last year’s winning stocks and short last year’s losers. Carhart confirms that
rankings based on single factor models indicate statistically significant persistence in the spread of
risk adjusted returns between the top and bottom decile of the sample (approximately 8% in the
year after formation). Once momentum and other risk factors are accounted for, however, this
spread shrinks substantially. He is able to account for most of the remaining spread by
considering differences in expense ratios and transaction costs between funds. In particular, there
is no more evidence of superior performance, as compared to a passive strategy, in the sample.
4 Empirically this has been widely documented; see Sirri and Tufano (1998) or Jain and Wu (2000) for examples.
Theoretically, Berk and Green (2004) examine this relationship in a rational expectations context and conclude that
the lack of long-term performance persistence should not be seen as an indication of homogeneous skill levels in the
industry but as a natural outcome of competitive capital allocation.
5 This dataset later developed into the CRSP Survivorship-Bias Free Database®, which is the standard data source
for researchers to this day.
The only remaining puzzle, he concludes, is the persistent under-performance of the worst funds
in the sample.
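Carhart's ranking exercise reduces to sorting funds on a formation-period alpha and comparing the subsequent returns of the extreme deciles. The sketch below assumes the alphas (from whichever factor model) have already been estimated; `decile_spread` and its inputs are hypothetical names, and expenses and transaction costs are not modeled.

```python
import statistics


def decile_spread(ranking_alpha, next_returns, n_groups=10):
    """Mean next-period return of the top group minus that of the bottom group.

    ranking_alpha: fund -> alpha estimated over the formation period
    next_returns:  fund -> return in the evaluation period that follows
    Funds are sorted by alpha and split into `n_groups` equal-sized groups
    (any remainder funds fall in the middle and are ignored).
    """
    funds = sorted(ranking_alpha, key=ranking_alpha.get)
    size = len(funds) // n_groups
    if size == 0:
        raise ValueError("need at least n_groups funds")
    bottom, top = funds[:size], funds[-size:]
    top_mean = statistics.fmean(next_returns[f] for f in top)
    bottom_mean = statistics.fmean(next_returns[f] for f in bottom)
    return top_mean - bottom_mean
```

Running the same function on rankings from a CAPM alpha and from a four-factor alpha is the kind of comparison the text describes: persistence shows up as a positive spread that shrinks once momentum is controlled for.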
In a more recent contribution, Teo and Woo (2001) revitalize the debate on the issue of
persistence by modifying Carhart’s (1997) analysis in two ways. First, they argue that due to
increased style compliance of fund managers one should evaluate their performance using style-
adjusted returns (iii). These are fund returns in excess of the average return of its peers as defined
by their Morningstar® classification. Second, to abstract from momentum driven short-term
persistence they create a temporal wedge of 2-5 years between the evaluation and the backtesting
periods. They find that neither momentum nor expense ratios or transaction costs can account
for the statistically significant spread of 3-4% annually between the best and worst ranked funds.
This result is furthermore robust to variations in the lag between evaluation and portfolio
formation. They conclude that “managerial ability might not be as dead as it seems”.
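Style-adjusted returns in the sense of Teo and Woo can be sketched as each fund's return minus the mean return of its style category. This assumes one style label per fund and includes the fund itself in its category average (excluding it would be an equally defensible choice); the function name and the category labels in the usage are made up for illustration.

```python
from collections import defaultdict


def style_adjusted_returns(returns, category):
    """Each fund's return minus the mean return of its style category.

    returns:  fund -> return over the period
    category: fund -> style label (e.g. a Morningstar-style category)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fund, r in returns.items():
        totals[category[fund]] += r
        counts[category[fund]] += 1
    return {
        fund: r - totals[category[fund]] / counts[category[fund]]
        for fund, r in returns.items()
    }


adj = style_adjusted_returns(
    {"A": 0.10, "B": 0.06, "C": 0.02, "D": 0.04},
    {"A": "large-growth", "B": "large-growth", "C": "small-value", "D": "small-value"},
)
print(adj)
```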
4. “Modern” approaches
Against this background, the following section of the review will survey more recent attempts to
distinguish luck from skill in the mutual fund industry. The literature is not only modern from a
chronological standpoint but also differs markedly from previous research in its application of
more sophisticated econometric modeling techniques as well as conclusions on the existence of
skill in the mutual fund industry. Two main streams of recent research can be distinguished: The
first one is concerned with accurately capturing the dynamics of portfolio returns with respect to
economy-wide risk factors. The second line of research applies Bayesian techniques to enrich the
information set used for the evaluation of fund managers.
4.1 Time-varying factor exposures
As mentioned in the previous section of the review, the question of managers’ market timing abilities
has long had a central place in the debate on managerial quality. It appears reasonable to
claim that managers who are able to adjust the risk exposure of their portfolio according to the
time-varying premium associated with certain factors possess superior information and market
timing skill. On the other hand, it is empirically corroborated that asset returns and the risk
associated with them are to a certain degree predictable by variables describing the
macroeconomic environment, such as interest rates, credit spreads or dividend yields, all of which are
clearly public information. Time variation in factor exposures due to changes in these variables cannot
therefore be attributed to superior timing skill of managers. Only skillful changes in factor
exposures that result from management-driven changes in the composition of the portfolio
indicate ability. Ferson and Schadt (1996, hereafter FS) point out that tests of managerial ability and
semi-strong form market efficiency need to detect timing and stock selection ability that cannot be
attributed to such public information.
For this reason they advocate the use of conditional performance evaluation. The asset pricing model
underlying this approach is the conditional APT (Stambaugh 1983), which can be expressed as
$$E(r_{p,t} \mid Z_{t-1}) = \alpha_p + \sum_{k=1}^{K} b_{i,k,t}(Z_{t-1})\,\lambda_{k,t}(Z_{t-1}) \qquad (2)$$
It is identical to (1) except that the factor loadings are now functions of the lagged public
information set $Z_{t-1}$. These functions can be approximated linearly (a first-order Taylor expansion) such that
$$b_{i,k,t}(Z_{t-1}) \approx \beta_{0,p} + B_{1,p}'\,\big(Z_{t-1} - E(Z_{t-1})\big) \qquad (3)$$
Translating the model into a regression specification yields (L+1)K+1 regressors, where L is
the number of conditioning variables and K the number of risk factors. To allow for market timing motivated by
managers’ private information, they also include squared factors, à la Treynor and Mazuy (1966), in the
regression model of (2). This specification nests all the models discussed previously and
constitutes a step further on issues (i), (ii) and (iv)6. They use monthly observations on 67
open-end mutual funds from 1968 to 1990. The public information set consists of a vector of 5
variables: the lagged treasury bill yield, a lagged dividend yield, a lagged measure of the slope of
the term structure, a lagged quality spread in the corporate bond market and a dummy variable
accounting for the January effect.
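The (L+1)K+1 count follows from substituting (3) into the factor model: each factor enters once on its own and once interacted with each demeaned information variable, plus the intercept. The sketch below builds one such regressor row; it demeans with sample means as a stand-in for $E(Z_{t-1})$, and the optional `timing` flag appends the squared factors mentioned above. Function and argument names are hypothetical.

```python
def conditional_regressors(factors, lagged_z, z_mean, timing=False):
    """One regressor row for a conditional (Ferson-Schadt style) factor model.

    factors:  K factor excess returns at time t
    lagged_z: L public information variables observed at t-1
    z_mean:   sample means of the L information variables
    Row layout: [1, f_1..f_K, (z_l - mean_l) * f_k for each l and k],
    followed by f_k**2 for each k if timing=True.
    """
    row = [1.0] + list(factors)
    for z, zbar in zip(lagged_z, z_mean):
        row += [(z - zbar) * f for f in factors]
    if timing:
        row += [f * f for f in factors]
    return row


# Single-factor model with 5 conditioning variables: (5 + 1) * 1 + 1 = 7 regressors
row = conditional_regressors([0.02], [0.05, 0.03, 0.01, 0.02, 1.0], [0.04, 0.03, 0.01, 0.02, 0.0833])
print(len(row))  # 7
```

Stacking such rows over time and running OLS of fund excess returns on them gives the conditional alpha as the intercept.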
Estimating unconditional CAPMs yields conclusions similar to Jensen’s almost 30 years earlier: about
two thirds of the alpha estimates are negative, and of the 13 significant ones (at 5%) only five are
positive. Incorporating conditioning information yields a larger proportion of positive alphas (34
out of 67); of the 12 significant ones, exactly half are positive. The same conclusions are drawn
for the multi-factor specifications. Clearly, the introduction of conditioning information in the
estimation makes the performance of the sample look better overall and has a greater impact on
results than enlarging the set of risk factors. The reason for this is not obvious: FS find that the
unconditional beta estimates are typically slightly lower than the average conditional estimates. This,
however, implies that the covariance between the factor exposure and the associated risk
premium is negative, which might be interpreted as perverse timing skill of managers. They
provide some weak evidence, however, that the sign of this covariance is due to perverse
movements in the betas of the underlying assets and to the fact that expectation-driven inflows to funds
temporarily increase their cash positions and therefore lower their factor loadings in up markets.
To sum up, Ferson and Schadt (1996) provide strong evidence that previous judgments on the
level of timing and security selection skill were probably too pessimistic.
6 Of course, parsimony becomes an issue here, particularly in light of small sample sizes and their effect on
estimation accuracy and survivorship bias (v). The purpose of FS’s analysis, however, is to highlight the differences
between conditional and unconditional performance evaluation and not to conduct a universal performance evaluation.
In a closely related article, Christopherson et al. (1998) apply FS’s conditional performance
evaluation method to a sample of 185 U.S. pension fund managers7 to examine the persistence of
their performance. Similar to FS’s results for the mutual fund returns they find strong evidence
for conditional time variation in alphas and factor loadings. Furthermore, pension fund
performance is persistent, and again, in line with the results on mutual funds, this is
predominantly observed for poor managers.
The most recent substantial contributions on the estimation of time-varying coefficients in
portfolio asset pricing models are two papers by Mamaysky, Spiegel and Zhang (2003, 2006; hereafter MSZ)8. The
prime contribution of these papers lies in their use of Kalman filters to estimate the dynamics of
mutual fund alphas and betas. This line of research can be distinguished from Ferson and
Schadt’s in that the time variation in loadings is not related to observables but to an unobservable
signal that drives the time variation in risk premia and hence managers’ trading strategies. Clearly,
without further assumptions on the dynamics of the signal over time this model is not identified
and cannot be estimated. MSZ therefore impose that the signal follows an AR(1) process, which
allows the estimation of the model9. The time variation in alphas and factor loadings can now be
decomposed into parameter variation of the underlying assets and variation due to the managers’
actions on the composition of the portfolio.
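The state-space idea can be illustrated with a scalar Kalman filter for a single time-varying beta whose deviation from a long-run mean follows an AR(1) process. This is a deliberately simplified sketch, not MSZ's specification: alpha and all variance and persistence parameters are treated as known, whereas in practice they would have to be estimated, for example by maximizing the likelihood implied by the filter. The function name and parameter values are hypothetical.

```python
import random


def kalman_beta(returns, factor, alpha, mu, phi, q, r, beta0, p0):
    """Filtered estimates of a time-varying beta that follows an AR(1) process.

    Observation: returns[t] = alpha + beta_t * factor[t] + eps_t,  Var(eps_t) = r
    Transition:  beta_t = mu + phi * (beta_{t-1} - mu) + eta_t,   Var(eta_t) = q
    """
    beta, p = beta0, p0
    filtered = []
    for y, f in zip(returns, factor):
        # Predict: propagate the AR(1) state and its variance one period ahead
        beta_pred = mu + phi * (beta - mu)
        p_pred = phi * phi * p + q
        # Update: correct the prediction using the observed fund return
        innovation = y - alpha - beta_pred * f
        s = f * f * p_pred + r  # variance of the innovation
        gain = p_pred * f / s   # Kalman gain
        beta = beta_pred + gain * innovation
        p = (1.0 - gain * f) * p_pred
        filtered.append(beta)
    return filtered


# Simulated fund whose beta mean-reverts around 1.0 (hypothetical parameters)
rng = random.Random(1)
true_beta, b = [], 1.0
for _ in range(240):
    b = 1.0 + 0.9 * (b - 1.0) + rng.gauss(0.0, 0.05)
    true_beta.append(b)
mkt = [rng.gauss(0.006, 0.045) for _ in range(240)]
fund = [0.001 + bt * m + rng.gauss(0.0, 0.005) for bt, m in zip(true_beta, mkt)]
est = kalman_beta(fund, mkt, alpha=0.001, mu=1.0, phi=0.9, q=0.05**2, r=0.005**2, beta0=1.0, p0=0.1)
mae = sum(abs(e - t) for e, t in zip(est, true_beta)) / len(est)
print(f"mean absolute tracking error of beta: {mae:.3f}")
```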
In contrast to FS (1996), one advantage of this approach is that it is directly comparable to the
unconditional CAPM and Carhart-style factor models in terms of the information set employed
for estimation. MSZ therefore use several testing methodologies to assess whether their
approach is superior to unconditional static methods. The first is a simple bootstrap: they draw a
subsample of funds from the U.S. universe and estimate CAPM, four-factor Carhart and Kalman
specifications. They subsequently use the instantaneous alpha rankings obtained from the
different models to form optimal market-neutral portfolios and compute their out-of-sample
performance. By repeating these steps, bootstrap distributions of the portfolios’ out-of-sample returns are generated,
which allow inferences about the predictive power of the different specifications. The
return distribution obtained from the Kalman filter first-order stochastically dominates the
distribution from the unconditional CAPM specification. First-order stochastic dominance is
rejected when the Kalman filter is compared to the unconditional Carhart specification.
However, the fund-of-funds returns selected by the former are less volatile, and hence their
distribution second-order stochastically dominates the one generated from the unconditional
Carhart specification. MSZ also directly compare their methodology to Ferson and Schadt’s and
conclude that conditioning information does not generally explain the time variation of the
factor loadings, except for a small group comprising 12% of the sample, indicating that only a limited number
of managers trade on the macro variables considered by FS (1996).
7 In contrast to mutual funds, whose shares are bought directly by private investors, pension funds are hired by
firms to manage the future pension liabilities toward their workforce.
8 The former has been cited in the New York Times (May 18, 2003) and the Financial Advisor Magazine (June 2004)
as “having solved a big problem for mutual fund rating systems”.
9 This assumption appears less esoteric in light of Blake et al.’s (1999) observation of mean reversion in
fund weightings across securities of UK pension funds. The same assumption is furthermore used by Brunnermeier
and Nagel (2004), who model the return dynamics of hedge funds using Kalman filters.
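The dominance comparisons above can be checked on two empirical return samples, for instance two bootstrap distributions. The sketch below tests weak dominance pointwise on the pooled support: first-order dominance compares empirical CDFs, and second-order dominance compares their integrals. It is a descriptive check rather than a formal statistical test, and the function names are hypothetical.

```python
def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def integrated_cdf(sample, x):
    """Integral of the empirical CDF from -infinity to x, i.e. mean(max(x - s, 0))."""
    return sum(max(x - s, 0.0) for s in sample) / len(sample)


def first_order_dominates(a, b):
    """True if the empirical distribution of `a` weakly FOSD-dominates that of `b`."""
    grid = sorted(set(a) | set(b))  # ECDFs only change at sample points
    return all(ecdf(a, x) <= ecdf(b, x) for x in grid)


def second_order_dominates(a, b, tol=1e-12):
    """True if `a` weakly SOSD-dominates `b` (integrated CDF of a never above b's)."""
    grid = sorted(set(a) | set(b))  # integrated ECDFs are linear between sample points
    return all(integrated_cdf(a, x) <= integrated_cdf(b, x) + tol for x in grid)


# Same mean, lower volatility: second- but not first-order dominance
calm, volatile = [0.0, 0.0], [-1.0, 1.0]
print(first_order_dominates(calm, volatile), second_order_dominates(calm, volatile))
```

The usage case mirrors the Kalman-versus-Carhart comparison in the text: a less volatile return distribution can fail first-order dominance yet satisfy second-order dominance.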
In their second paper MSZ (2006) build on their earlier insights and examine the forecasting
abilities of dynamic and static model specifications more closely. Apart from minor details, the
estimated models are identical to MSZ (2003). The innovative element of the paper is the
combination of static and dynamic models to forecast superior performance. This rests on the
insight that mutual fund portfolios are heterogeneous, in the sense that some managers rely
heavily on dynamic trading strategies while others specialize in security selection for the long run.
While it is appropriate to model the former dynamically, static specifications might be superior
for the latter. One might be inclined to question this conjecture, given that the dynamic model
presented above nests any static specification. From the previous discussion it should, however,
be clear that in light of limited sample sizes there exists a trade-off between model complexity
and estimation accuracy, and hence simple restricted models might provide more accurate
parameter estimates in certain cases.
For this reason MSZ (2006) periodically subject their estimates to two filters. First, the three models
(CAPM, four-factor, Kalman) are estimated, and parameters outside a certain band around the mean are
disregarded as the results of misspecification and sampling error. Second, to qualify as the
appropriate model describing a specific portfolio return process, a model must correctly predict the sign
of the return in the period following the evaluation. They arrive at several interesting results. First,
the static and dynamic models produce remarkably different performance rankings: in any one
period, only about one third of them overlap. Second, while the forecasting ability of static models
improves with increased complexity in the factor structure, the same is not true for the Kalman
specification. Simple dynamic models might therefore generally be superior to complex ones10.
Finally, MSZ show that a fund-of-funds portfolio constructed by filtering parameter estimates
as explained above produces a risk-adjusted return of 3.5% to 7%, depending on the method of
risk adjustment applied in the out-of-sample period.
10 Again, this result is likely due to the rapidly growing complexity of estimating dynamic models as the
number of parameters increases. It simply reduces the degrees of freedom to an extent that it interferes with the
accuracy of the parameter estimates.
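A hypothetical reading of the two screening steps can be sketched for a single fund. The band width (two cross-sectional standard deviations), the use of each model's predicted next-period return, and all names are assumptions made for illustration only; the review does not state how MSZ set the band.

```python
import statistics


def admissible_models(predictions, realized, cross_section, band=2.0):
    """Models whose forecast for one fund survives both screening steps.

    predictions:   model name -> this fund's predicted next-period return
    realized:      the fund's realized return in the following period
    cross_section: model name -> that model's predictions for all funds
    band:          hypothetical band width in cross-sectional standard deviations
    """
    kept = []
    for model, pred in predictions.items():
        values = cross_section[model]
        mean, sd = statistics.fmean(values), statistics.stdev(values)
        if abs(pred - mean) > band * sd:  # filter 1: treat as estimation error
            continue
        if (pred > 0) != (realized > 0):  # filter 2: must predict the sign correctly
            continue
        kept.append(model)
    return kept


print(admissible_models(
    {"CAPM": 0.01, "4-factor": -0.005, "Kalman": 0.5},
    realized=0.02,
    cross_section={
        "CAPM": [0.01, 0.02, 0.0, -0.01],
        "4-factor": [0.0, -0.005, 0.01, 0.005],
        "Kalman": [0.5] + [0.01] * 9,
    },
))
```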
To sum up, the insights of MSZ (2003, 2006) advance the performance evaluation literature
immensely. They not only demonstrate the importance of dynamic asset pricing specifications in
portfolio return modeling; their work also shows that no single model applies to the
whole cross-section of managed portfolios, and that future research on the topic will have to
establish their respective advantages, when dealing with a heterogeneous group of assets and
4.2 Bayesian approaches
Parallel to developments in empirical asset pricing, the performance evaluation literature also
increasingly uses Bayesian analysis in forming its judgments about management skill. A clear
advantage of this approach over Frequentist methods lies in its ability to incorporate extraneous
information about the distribution of parameters into the analysis in a consistent way, through the
formulation of a prior. Looking at the results produced by the recent literature, it is fair to claim
that the Bayesian approach dominates and will increasingly crowd-out the Frequentist one in the
field of performance evaluation. The following discussion will hopefully convince the reader of
the validity of this statement.
The main output of Bayesian analysis is a posterior distribution of the parameter of interest,
which can loosely be thought of as a weighted combination of the prior belief about the
distribution of the parameter and the data-based estimate of it. To illustrate this point in the
context of mutual fund performance, a Bayesian researcher might have an a priori idea about
the distribution of the alphas of her sample of mutual funds. This idea could be formed on a
completely subjective basis (e.g. her belief in the pricing ability of different models) or with
reference to previous research or her own analysis of the data (e.g. the belief that the distribution
of alphas is centered around the negative of the expense ratio). She would then combine this prior
distribution with the data via Bayes' theorem, taking into account how informative the data are.
The less informative the data, for instance the shorter the return history of a particular fund, the
more relative trust the researcher would put into her prior and hence the more she would shrink
the parameter estimates towards the mean of her prior distribution. In this light, Bayesian
analysis addresses several issues discussed in the introduction. First, it allows performance
analysis with differing degrees of reliance on the pricing ability of certain models and
benchmarks (i, ii). Furthermore, it allows more precise inference from short return histories,
alleviating problems associated with survivorship bias (v), and it provides a method to
consistently incorporate information other than past return histories into the analysis of mutual
fund performance (iv).
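The weighting between prior and data can be made concrete with a minimal sketch, assuming a normal prior on a fund's monthly alpha and an OLS estimate whose sampling variance is approximated by σ_ε²/T (the estimation error in the betas is ignored). The function `posterior_alpha` and all input values, including a prior centred on minus a 1% p.a. expense ratio, are hypothetical illustrations rather than the specification of any particular paper.

```python
import math


def posterior_alpha(alpha_hat, resid_sd, n_months, prior_mean, prior_sd):
    """Normal-normal update of a fund's monthly alpha.

    Assumes alpha_hat ~ N(alpha, resid_sd**2 / n_months), i.e. the sampling
    error in the factor loadings is ignored.
    Returns (posterior mean, posterior sd, weight placed on the prior mean).
    """
    data_precision = n_months / resid_sd ** 2
    prior_precision = 1.0 / prior_sd ** 2
    post_precision = data_precision + prior_precision
    weight_on_prior = prior_precision / post_precision
    post_mean = weight_on_prior * prior_mean + (1.0 - weight_on_prior) * alpha_hat
    return post_mean, math.sqrt(1.0 / post_precision), weight_on_prior


# Same OLS alpha of +0.5% per month; prior centred on minus a 1% p.a. expense ratio.
PRIOR_MEAN = -0.01 / 12
for months in (12, 60, 240):
    mean, sd, w = posterior_alpha(0.005, resid_sd=0.02, n_months=months,
                                  prior_mean=PRIOR_MEAN, prior_sd=0.002)
    print(f"T={months:3d}  weight on prior={w:.2f}  "
          f"posterior alpha={mean:.3%} (sd {sd:.3%})")
```

Holding the OLS estimate fixed, the weight on the prior falls as the return history lengthens, which is the shrinkage behaviour described above.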
Pástor and Stambaugh (2002), hereafter PS, were among the first to apply Bayesian analysis to
the evaluation of portfolio performance. Their paper builds on Stambaugh's (1997) insight that
assets with longer histories can help to make more precise inferences about the moments of
returns on assets with short histories. Applied to mutual funds, the idea can be expressed as
follows. Suppose, as is generally the case, that OLS is applied to the following linear regression
reflecting a multi-factor asset pricing model:
$$ r_{p,t} = \alpha_p + \beta_p' r_{b,t} + \varepsilon_{p,t} \qquad (4) $$
where $r_{b,t}$ is a vector of returns on benchmark portfolios.11 Furthermore, consider the
regression of a non-benchmark asset (hereafter NBA) return $r_{n,t}$ on the same set of benchmarks:
$$ r_{n,t} = \alpha_n + \beta_n' r_{b,t} + \varepsilon_{n,t} \qquad (5) $$
where the disturbances of (4) and (5) are positively correlated. Now suppose one believes that
the set of benchmarks perfectly prices the NBA, in which case $\alpha_n$ would be zero, and imagine
for the moment that OLS estimation of (4) and (5) yields negative alpha estimates. The negative
estimate of $\alpha_n$ is then fully attributable to sampling error, and since the disturbances in (4)
and (5) are positively correlated, the OLS estimate of $\alpha_p$ is likely to contain a negative
sampling error as well. This insight can be used to adjust the estimate of $\alpha_p$ upwards.
Another extreme situation arises when the researcher believes that the benchmarks cannot price
the NBA at all, in which case $\alpha_n$ is unknown. Suppose further that the NBA has a longer return
history than the portfolio, which it overlaps completely. Imagine now that the NBA alpha
estimated over the shorter window that coincides with the portfolio's history exceeds the alpha
estimated from the NBA's entire return history. Given that the latter estimate is more precise,
and again that the errors of (4) and (5) are positively correlated, the researcher has good reason
to adjust her estimate of $\alpha_p$ downwards.
11 Returns are here expressed in excess of the risk-free rate.
This clarifies how the researcher's subjective view of the validity of the asset pricing model
influences the results of the performance analysis. The more trust one puts into the ability of the
benchmarks to price the NBAs, the more deviations of the NBA alphas from zero drive the
adjustment of the estimate of $\alpha_p$. The less the researcher believes in the pricing ability of the
benchmarks, the more the difference between the long- and short-history estimates of $\alpha_n$
drives the adjustment of the portfolio's alpha estimate.
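Both extremes, and the intermediate case, can be combined in one stylized rule: first form the best available estimate of $\alpha_n$ by shrinking its long-history estimate towards zero (the shrinkage is complete under perfect pricing and absent under a diffuse prior), then move the fund's alpha by the residual slope c times the gap between the short-window NBA estimate and this best estimate. This is an approximation written for illustration under normal, known-variance assumptions, not the exact posterior derived by PS; the function `adjusted_alpha` and all numbers are hypothetical.

```python
import math


def adjusted_alpha(alpha_p_short, alpha_n_short, alpha_n_long, c_hat,
                   n_long_months, resid_sd_n, prior_sd_n):
    """Stylized NBA correction of a fund alpha (all inputs in monthly units).

    prior_sd_n == 0        -> benchmarks price the NBA exactly: best alpha_n is 0
    prior_sd_n == math.inf -> no pricing ability: best alpha_n is the long-history estimate
    otherwise              -> normal-normal blend of 0 and the long-history estimate
    """
    if prior_sd_n == 0:
        best_alpha_n = 0.0
    else:
        data_precision = n_long_months / resid_sd_n ** 2
        prior_precision = 1.0 / prior_sd_n ** 2          # 0.0 for an infinite prior sd
        weight_on_zero = prior_precision / (prior_precision + data_precision)
        best_alpha_n = (1.0 - weight_on_zero) * alpha_n_long
    # Remove the part of the fund's estimate attributed to the NBA's sampling error.
    return alpha_p_short - c_hat * (alpha_n_short - best_alpha_n)


scenarios = {"(a) perfect pricing": 0.0,
             "(b) some pricing ability": 0.002,
             "(c) no pricing ability": math.inf}
results = {}
for label, prior_sd in scenarios.items():
    results[label] = adjusted_alpha(alpha_p_short=0.002, alpha_n_short=0.003,
                                    alpha_n_long=0.001, c_hat=0.6, n_long_months=360,
                                    resid_sd_n=0.03, prior_sd_n=prior_sd)
    print(f"{label}: adjusted alpha = {results[label]:.3%} per month")
```

In this example the short-window NBA alpha is above both zero and its long-history estimate, so every scenario pulls the fund's alpha down, most strongly when the benchmarks are trusted to price the NBA.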
PS then continue their analysis by estimating and comparing CAPM and Fama-French three-factor
models for a sample of 2,609 domestic equity funds, with and without the incorporation of five
non-benchmark assets.12 The results naturally differ with the presumed ability of each model to
price the non-benchmark assets. To illustrate this, they provide results for three scenarios: (a)
perfect pricing ability (the prior on the non-benchmark alpha is concentrated at 0), (b) some
pricing ability (a finite prior variance of the alpha distribution of 2% p.a.) and (c) no pricing
ability (infinite prior variance of alpha).
Their results indicate that incorporating information from NBAs significantly changes the
alpha estimates in the sample, even if (c) is assumed. Not surprisingly, the effect of including
NBAs is stronger for the CAPM specifications than for the Fama-French ones, as the additional
factors of the FF specification belong to the set of NBAs when estimating the Bayesian CAPM.
For instance, the difference between the median alpha estimate of small-company growth funds
under the CAPM with and without the incorporation of non-benchmark assets ranges from 7.2%
to 8.3%, depending on the assumed ability of the CAPM to price the NBAs. With increasing
reliance on the validity of the asset pricing model, the differences between the naïve OLS
estimates and the ones including NBAs become larger. Naturally, this difference is smaller for
funds with long histories, as more weight is put on the data.
Second, the inclusion of NBAs significantly improves the precision of the alpha estimates, as
captured by the variance of the posterior distribution, even in case (c). Again, the improvement is
larger for the CAPM than for the FF model. Finally, they show that the probability that the
average CAPM-OLS alpha is negative is 100%. Under the Bayesian approach, this probability
depends on the prior about pricing ability and is highest when pricing is assumed to be perfect.
In a more recent contribution, Busse and Irvine (2004) apply PS's (2002) technique to investigate
persistence in mutual fund returns. Even though they build heavily on this existing idea, the
paper contributes to the literature in several respects. Instead of monthly data, they use daily
observations and find that this significantly improves the accuracy of the estimated parameters.
Moreover, they show that PS's Bayesian technique strongly increases the predictability, and hence
the measured persistence, of management performance. They furthermore apply the Bayesian
approach of PS to a dynamic asset pricing model and show that this combination is superior to
the static Bayesian specification. A final, crucial element of the paper is their analysis of the prior
beliefs of mutual fund investors using data on fund flows. They conclude that the average
investor believes that managers earn substantial abnormal returns.
12 These are portfolios constructed to explain industry-related variation that is not accounted for by the benchmarks.
Jones and Shanken (2002), hereafter JS, approach performance evaluation in a Bayesian
framework from a different perspective. While PS (2002) enrich the information set of the
researcher (or, equivalently, the investor) by incorporating the longer time series of NBAs into the
analysis, JS (2002) argue that sharper inference can be made by considering the risk-adjusted
performance of other fund managers. If an investor knows, for instance, that the risk-adjusted
performance of half of the funds in the universe is 3% p.a., he is likely to take this into account
when forming his prior about the risk-adjusted performance of a particular portfolio he is
interested in. The prior can also be interpreted as investors coming to the market with a general
idea about the degree of management skill in the industry or a general view on market efficiency.
To implement this idea, JS argue that one should think of alphas as random draws from a
distribution with hyper-parameters $\mu_\alpha$ and $\sigma_\alpha$. The value of the former indicates the
presence of management skill, or the degree of market efficiency, while the latter reflects
differences in skill between managers. Of course, this distribution is not observable in practice,
but investors would do well to form ideas about it before deciding to purchase a particular
portfolio. The information from the cross-section of all other portfolios helps them to infer the
values of the hyper-parameters.
Again, as in PS (2002), one can think of two extreme scenarios in this context. On the one hand,
one might believe that the group of managers is homogeneous, $\sigma_\alpha = 0$, so that cross-sectional
dependence is high. The estimated risk-adjusted performance of other funds would then matter a
lot in judging the performance of a particular portfolio. If, on the other hand, the variance of the
hyper-distribution approaches infinity, dependence goes to zero and the judgment of fund
performance is based entirely on the fund's own historical data. A finite $\sigma_\alpha$ leads to a
weighting between the fund's own data and what is learned from the cross-section.
To illustrate the importance of cross-sectional learning for the posterior distribution of alpha, JS
run several Monte Carlo experiments, generating artificial return series that have the same
moments as mutual fund returns in practice. The usefulness of Monte Carlo simulation here lies
in the fact that the researchers know the data-generating process exactly and can therefore
measure the effect of cross-sectional learning on the precision of parameter inference. Under
prior independence ("no learning"), inference is based solely on the fund's own return history.
With learning, the investor forms an aggregate posterior estimate of $\mu_\alpha$ and $\sigma_\alpha$, which in
turn affects the posterior of the individual fund alphas, depending on the precision and amount
of data. If the prior is such that the investor believes that markets are efficient, the alpha
estimates are shrunk towards the prior mean of zero, more so in the no-learning scenario, since
no data from the cross-section are included that would counter this effect. A further important
result is that with learning, the posterior distribution of alphas converges quickly to the true one
as the amount of external data increases. Under no learning, the "bias" induced by shrinkage
toward the prior mean of the hyper-distribution remains constant.
Huij and Verbeek (2005), hereafter HV, is a direct empirical application of JS's (2002) concept of
cross-sectional learning. An important feature of their analysis is its focus on the scenario where
investors learn cross-sectionally but the subjective prior about managerial ability is diffuse, or
uninformative. This refers to the special case where the prior over the hyper-parameters of the
hyper-distribution is diffuse. The analysis is then independent of the investor's subjective prior
and hence entirely data-driven.13 They motivate their analysis in several ways. First, they argue
that their estimation procedure is more precise than OLS, particularly for very short return
series. Being able to obtain precise estimates of alphas and factor loadings from small samples
has several advantages: it reduces misspecification errors arising from time-varying factor
exposures as well as the effects of survivorship bias. Second, they can rely on monthly instead of
daily data, which greatly increases the cross-section of their fund sample, since the returns of
many funds are only reported monthly.
Using Monte Carlo simulation, they show that the gains from cross-sectional learning are
substantial for short time series: over a measurement horizon of 12 months, their alpha estimates
are 45% more accurate than OLS estimates in terms of mean squared error. This difference in
accuracy remains significant even when the measurement horizon is extended to five years. The
gap widens further as the kurtosis of the simulated returns is raised above normal levels,
showing that the shrinkage estimator is particularly useful for estimating alphas from return
distributions that are known to be leptokurtic.
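The gain from cross-sectional learning in short samples can be reproduced in a small Monte Carlo experiment. The sketch below swaps in a simple empirical-Bayes (James-Stein-type) estimator rather than HV's exact procedure: the mean and dispersion of the alpha hyper-distribution are estimated by the method of moments from the cross-section of OLS alphas, with no subjective prior, and each fund's alpha is pulled towards the cross-sectional mean. It assumes normal returns, betas that are known and already netted out, and a common, known residual volatility; the function names and parameter values are hypothetical.

```python
import random
import statistics


def shrink_to_cross_section(alpha_hats, sampling_var):
    """Empirical-Bayes shrinkage of OLS alphas towards their cross-sectional mean.

    Hyper-parameters of alpha_i ~ N(mu, sigma2) are estimated by the method of
    moments (no subjective prior); sampling_var is the common variance of each
    OLS alpha around its true value.
    """
    mu = statistics.mean(alpha_hats)
    sigma2 = max(statistics.pvariance(alpha_hats) - sampling_var, 0.0)
    weight_on_own = sigma2 / (sigma2 + sampling_var)
    return [mu + weight_on_own * (a - mu) for a in alpha_hats]


def mse(estimates, truth):
    """Mean squared error of the estimates against the true alphas."""
    return statistics.mean((e - t) ** 2 for e, t in zip(estimates, truth))


random.seed(7)
N_FUNDS, N_MONTHS, RESID_SD = 500, 12, 0.02
true_alphas = [random.gauss(-0.0008, 0.002) for _ in range(N_FUNDS)]
# Benchmark-adjusted monthly returns = alpha + noise, so the OLS alpha is the sample mean.
ols_alphas = [statistics.mean([a + random.gauss(0.0, RESID_SD) for _ in range(N_MONTHS)])
              for a in true_alphas]
shrunk_alphas = shrink_to_cross_section(ols_alphas, RESID_SD ** 2 / N_MONTHS)

mse_ols, mse_shrunk = mse(ols_alphas, true_alphas), mse(shrunk_alphas, true_alphas)
print(f"MSE OLS: {mse_ols:.2e}  MSE shrinkage: {mse_shrunk:.2e}  "
      f"reduction: {1 - mse_shrunk / mse_ols:.0%}")
```

Because the sampling noise of a 12-month OLS alpha is large relative to the assumed dispersion of true alphas, the weight on each fund's own estimate is small, consistent with the point that the gains are largest for short return histories.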
Their empirical analysis is primarily concerned with the detection of performance persistence in a
sample of 1,754 no-load funds.14 When they use a 36-month evaluation window with a four-factor
model, they find, similar to Carhart (1997), some weak evidence of persistence, which is stronger
for poor performers. This result is independent of whether OLS or Bayesian alpha estimates are
used. If, however, the evaluation period is reduced to 12 months, the picture changes
substantially: the estimated alphas are more dispersed between the top and bottom deciles of the
distribution. Of course, this could be due to misspecification and sampling error, but they show
that, even after accounting for momentum, the top decile of funds ranked by the Bayesian method
earns a significant superior return of 25 basis points over decile ten in the following month.
They attribute the difference in results across estimation windows to the persistent superior
performance of young funds, which are excluded from the analysis when the longer evaluation
window is used.15
13 In this sense it might actually be artificial to speak of Bayesian analysis here, since the approach is frequentist in
nature. The method effectively boils down to weighted least squares.
14 The use of this sub-group of funds is motivated by the aim of gauging the economic significance of their persistence
results, as these funds do not involve sales charges, which would render dynamic fund-of-funds trading strategies
unprofitable if implemented in practice.
Huij and Verbeek's (2005) analysis, building on JS (2002), therefore offers solutions to several
problems in the mutual fund evaluation literature. Most importantly, they convincingly argue that
their Bayesian estimation procedure allows the parameters of an asset pricing model to be
estimated relatively accurately even with very limited sample sizes. This alleviates problems
associated with survivorship bias (v), time-varying fund characteristics (ii) and, more generally,
the lack of daily data for a large part of the mutual fund cross-section.
The last paper in this review, by Cohen, Coval and Pástor (2005), hereafter CCP, is probably the
most innovative contribution to the literature in recent years. Even if their approach is, technically
speaking, not Bayesian, the philosophy behind its central idea is. Similar to JS (2002) and HV
(2005), it rests on the insight that investors improve the quality of their evaluation and
transaction decisions by considering information on a particular fund in relation to information
on the cross-section of other managers. While the two previously discussed papers rely on the
entire sample of existing funds when forming a "better idea" about the true performance of a
particular manager, CCP (2005) focus on the managers who behave similarly in terms of their
portfolio choices. This simple but powerful idea can be clarified with the following example
taken from the paper (p. 2): Two managers have the same track record in terms of past returns.
However, manager A currently holds Intel, while B holds Microsoft. If Intel is currently held by a
group of "good" managers with a strong previous record, while Microsoft is held predominantly
by an inferior group of managers, it is reasonable to infer that A is truly a good manager, while
B's previous success is more likely due to luck than skill. Pursuing this idea, CCP construct a
performance measure that is a weighted average of a manager's own alpha and the alphas of
other managers. The weights are determined by the covariance between his asset holdings and
those of the other managers. Interpreted from a Bayesian viewpoint, this is equivalent to JS
(2002), where the subjective prior about the ability of each manager is formed on the basis of the
correlation of his portfolio decisions with those of other managers, combined with
cross-sectional learning as described above.
15 By using a 36-month window they would have to eliminate on average one third of the existing funds. Depending
on whether the "young funds" effect or survivorship bias dominates, the risk-adjusted performance will be biased down
or up.
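The intuition of the Intel/Microsoft example can be sketched as a two-step average: first score each stock by the position-weighted average alpha of the managers who hold it, then score each manager by the portfolio-weighted average score of his stocks, so that his own alpha enters through his own positions and other managers' alphas enter through overlapping holdings. This is a simplified stand-in for CCP's measure, which normalizes the weights differently; the function names, tickers, weights and alphas are hypothetical.

```python
def stock_scores(holdings, alphas):
    """Position-weighted average alpha of the managers holding each stock."""
    weighted, total = {}, {}
    for manager, portfolio in holdings.items():
        for stock, weight in portfolio.items():
            weighted[stock] = weighted.get(stock, 0.0) + weight * alphas[manager]
            total[stock] = total.get(stock, 0.0) + weight
    return {stock: weighted[stock] / total[stock] for stock in weighted}


def company_scores(holdings, alphas):
    """Portfolio-weighted average score of the stocks each manager holds."""
    per_stock = stock_scores(holdings, alphas)
    return {
        manager: sum(w * per_stock[s] for s, w in portfolio.items()) / sum(portfolio.values())
        for manager, portfolio in holdings.items()
    }


# A and B have identical past alphas; "good" G1, G2 hold Intel, "poor" P1, P2 hold Microsoft.
alphas = {"A": 0.01, "B": 0.01, "G1": 0.03, "G2": 0.03, "P1": -0.02, "P2": -0.02}
holdings = {"A": {"INTC": 1.0}, "B": {"MSFT": 1.0},
            "G1": {"INTC": 0.6, "AAPL": 0.4}, "G2": {"INTC": 0.5, "IBM": 0.5},
            "P1": {"MSFT": 0.7, "IBM": 0.3}, "P2": {"MSFT": 0.8, "AAPL": 0.2}}
scores = company_scores(holdings, alphas)
print({m: round(s, 4) for m, s in scores.items()})
```

Although A and B share the same own alpha, A inherits the strong record of the other Intel holders and B the weak record of the Microsoft holders.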
Besides developing this idea in detail, the paper's main purpose is to compare the predictive
power of the resulting performance measure to that of traditional measures such as CAPM and
FF alphas. Both the precision and the forecasting ability of the former are substantially higher. In
Monte Carlo simulations, the authors show that the rank correlations (against the true ranking of
the alphas assumed in the data-generating process) obtained from their evaluation method are
substantially higher than those resulting from OLS alpha rankings. As expected, the biggest
benefit of their criteria arises for funds with short return histories, particularly if the number of
managers in the cross-section is large. This point illustrates the key advantage of their method in
light of the explosive growth in the number of managed portfolios in recent years. To assess the
forecasting ability of their method, they form decile portfolios quarterly, sorted on their
performance measure, from 1982 to 2000, and find a monotonic relation between the decile rank
and the subsequent quarter's performance. While for the standard alpha measures the
risk-adjusted spread between the top and bottom deciles ranges from 3.7% to 5.3%, the spread
based on their measures increases to between 5.9% and 7.4%. To test the marginal predictive
power of their evaluation method, CCP construct double-sorted rankings (first on traditional
alpha and then on their own measure) and show that their criteria explain persistence in excess
of what is explained by standard alphas. Conversely, all information on persistence contained in
the traditional alpha rankings seems to be captured by their measures.
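The decile test can be sketched as follows: rank funds on a performance score at the start of a period, then compare the average next-period return of the top and bottom groups. The helper `top_minus_bottom` is hypothetical and uses raw next-period returns for brevity, whereas CCP report risk-adjusted spreads and re-sort every quarter.

```python
import statistics


def top_minus_bottom(scores, next_returns, n_groups=10):
    """Mean next-period return of the top-scored group minus that of the bottom group."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    group_size = len(ranked) // n_groups
    if group_size == 0:
        raise ValueError("need at least n_groups funds")
    top, bottom = ranked[:group_size], ranked[-group_size:]
    return (statistics.mean(next_returns[f] for f in top)
            - statistics.mean(next_returns[f] for f in bottom))


# Hypothetical cross-section in which a higher score is followed by a higher return.
scores = {f"fund{i}": i for i in range(100)}
next_returns = {f"fund{i}": 0.001 * i for i in range(100)}
print(f"top-minus-bottom decile spread: {top_minus_bottom(scores, next_returns):.2%}")
```

Repeating the sort each period and averaging the spreads gives the kind of decile-spread statistic compared across measures above.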
CCP (2005) should be seen as an important step forward not only for the Bayesian evaluation
literature, but more generally for thinking about market efficiency. Their analysis uses only a
public information set. In addition to offering a more optimistic view on the existence of true
management skill in the industry, their results thus provide strong evidence against the concept
of semi-strong form market efficiency.
5. Conclusions and possible lines of future research
To conclude, it is useful to summarize the progress the literature on performance evaluation has
made on the five issues illustrated in section 2 of this review.
i. Even though there still does not exist (and surely never will exist) a universal asset pricing
model, recent Bayesian research, in particular Pástor and Stambaugh (2002) and Jones and
Shanken (2002), has made progress on this issue by explicitly accounting for varying degrees
of belief in the validity of the model used. These studies show that performance evaluation
remains feasible even under substantial model uncertainty.
ii. Recent contributions, in particular Ferson and Schadt (1996) as well as Mamaysky et al.
(2003, 2006), have explored return dynamics in great detail and illustrate the benefits of
estimating dynamically unrestricted asset pricing models.
iii. In light of increasing competition and specialization in the mutual fund industry, it is now
clear that the performance of fund managers has to be compared to that of their peers. The
recent contribution of Teo and Woo (2001) points to the usefulness of classifying managers
according to segments, which are often artificially imposed on them by rating organizations
(e.g. Morningstar®) or government regulation.
iv. Recent research has gone a long way in incorporating extraneous information into
performance evaluation measures and has demonstrated the benefits of doing so. Pástor and
Stambaugh (2002), Huij and Verbeek (2005) and Cohen et al. (2005) are just some examples.
v. The influence of data constraints on the inference process in performance evaluation is
now well understood. Solutions have been developed, in particular methods that require
only small sample sizes for the accurate estimation of performance parameters.
Given these developments, the field of portfolio performance evaluation remains an exciting area
for future research. Researchers can look forward to an ever-increasing amount of information on
the performance and behavior of investment companies, owing not only to the enormous growth
of the industry over the last ten years but also to more detailed reporting and collection of data.
Recent research on the behavior of mutual fund managers has shown that there are many
measurable factors, besides those considered so far, that determine the quality and behavior of
mutual fund managers.16 A main task for future research will therefore be the consistent
incorporation of such information into the analysis of performance.
Moreover, more work needs to be done on the creation of meta-rules that match appropriate asset
pricing models to specific portfolios. Given information on the heterogeneity of a manager's
trading activities, captured in variables such as portfolio turnover, transaction data or the asset
pool the manager invests in, one might be able to develop such rules.
16 For instance, Chevalier and Ellison (1999) find some evidence of a relationship between the education and the
performance of managers. Hong et al. (2002) detect word-of-mouth effects in the fund industry.
In the context of portfolio dynamics, it would be interesting to better distinguish individual asset
dynamics from management-induced changes in the asset mix when explaining the variation of
factor loadings over time. Given the availability of holdings data for a large subset of mutual
funds, as well as data on the assets they hold, it should be possible to do so. This would help to
obtain a clearer picture of the existence of true skill in the investment management industry.
References
Baks, Klaas P., Andrew Metrick, and Jessica Wachter. "Should Investors Avoid all Actively
Managed Mutual Funds? A Study in Bayesian Performance Evaluation." Journal of Finance
56.1 (2001): 45-85.
Berk, Jonathan B., and Richard C. Green. Mutual Fund Flows and Performance in Rational
Markets. National Bureau of Economic Research, Inc, 2002.
Blake, David, Bruce N. Lehmann, and Allan Timmermann. "Asset Allocation Dynamics and
Pension Fund Performance." Journal of Business 72.4 (1999): 429-61.
Brown, Stephen J., et al. "Survivorship Bias in Performance Studies." Review of Financial Studies
5.4 (1992): 553-80.
Brunnermeier, Markus K., and Stefan Nagel. "Hedge Funds and the Technology Bubble." Journal
of Finance 59.5 (2004): 2013-40.
Carhart, Mark M., et al. "Mutual Fund Survivorship." Review of Financial Studies 15.5 (2002):
Chevalier, Judith, and Glenn Ellison. "Are Some Mutual Fund Managers Better than Others?
Cross-Sectional Patterns in Behavior and Performance." National Bureau of Economic
Research, Inc, 1996.
Close, J. "Investment Companies: Closed-End Versus Open-End." Harvard Business Review 29
Cohen, Randolph B., Joshua D. Coval, and Lubos Pastor. "Judging Fund Managers by the
Company they Keep." Journal of Finance 60.3 (2005): 1057-96.
Dybvig, Philip H., and Stephen A. Ross. "Differential Information and Performance
Measurement using a Security Market Line." Journal of Finance 40.2 (1985): 383-99.
Elton, Edwin J., Martin J. Gruber, and Christopher R. Blake. "The Persistence of Risk-Adjusted
Mutual Fund Performance." Journal of Business 69.2 (1996): 133-57.
Fama, Eugene F., and Kenneth R. French. "Common Risk Factors in the Returns on Stocks and
Bonds." Journal of Financial Economics 33.1 (1993): 3-56.
Fama, Eugene F. "Efficient Capital Markets: A Review of Theory and Empirical Work." Journal
of Finance 25.2 (1970): 383-417.
Ferson, Wayne E., and Rudi W. Schadt. "Measuring Fund Strategy and Performance in Changing
Economic Conditions." Journal of Finance 51.2 (1996): 425-61.
Goetzmann, W. N., and R. G. Ibbotson. Do Winners Repeat? Patterns in Mutual Fund Behavior.
Columbia - Graduate School of Business, 1990.
Gruber, Martin J. "Another Puzzle: The Growth in Actively Managed Mutual Funds." Journal of
Finance 51.3, Papers and Proceedings of the Fifty-Sixth Annual Meeting of the American
Finance Association, San Francisco, California, January 5-7, 1996 (1996): 783-810.
Hendricks, Darryll, Jayendu Patel, and Richard Zeckhauser. "Hot Hands in Mutual Funds: Short-
Run Persistence of Relative Performance, 1974-1988." Journal of Finance 48.1 (1993): 93-
Hong, Harrison, Jeffrey D. Kubik, and Jeremy C. Stein. Thy Neighbor's Portfolio: Word-of-
Mouth Effects in the Holdings and Trades of Money Managers. National Bureau of
Economic Research, Inc, 2003.
Huij, Joop, and Marno Verbeek. "Cross-Sectional Learning and Short-Run Persistence in Mutual
Fund Performance." SSRN Working Paper (2005).
Jain, Prem C., and Joanna Shuang Wu. "Truth in Mutual Fund Advertising: Evidence on Future
Performance and Fund Flows." Journal of Finance 55.2 (2000): 937-58.
Jegadeesh, Narasimhan, and Sheridan Titman. "Returns to Buying Winners and Selling Losers:
Implications for Stock Market Efficiency." Journal of Finance 48.1 (1993): 65-91.
Jensen, Michael C. "The Performance of Mutual Funds in the Period 1945-1964." Journal of
Finance 23.2, Papers and Proceedings of the Twenty-Sixth Annual Meeting of the American
Finance Association Washington, D.C. December 28-30, 1967 (1968): 389-416.
Jones, Christopher S., and Jay Shanken. Mutual Fund Performance with Learning Across Funds.
National Bureau of Economic Research, Inc, 2002.
Kothari, S. P., and Jerold B. Warner. "Evaluating Mutual Fund Performance." Journal of Finance
56.5 (2001): 1985-2010.
Lehmann, Bruce N., and David M. Modest. "Mutual Fund Performance Evaluation: A
Comparison of Benchmarks and Benchmark Comparisons." Journal of Finance 42.2 (1987):
Lintner, John. "The Valuation of Risk Assets and the Selection of Risky Investments in Stock
Portfolios and Capital Budgets: A Reply." The Review of Economics and Statistics 51.2 (1969):
Mamaysky, Harry, Matthew I. Spiegel, and Hong Zhang. "Estimating the Dynamics of Mutual
Fund Alphas and Betas." SSRN Working Paper (2003).
Mamaysky, Harry, Matthew I. Spiegel, and Hong Zhang. "Improved Forecasting of Mutual Fund
Alphas and Betas." Yale ICF Working Papers 4.23 (2006).
McDonald, John G. "Objectives and Performance of Mutual Funds, 1960-1969." Journal of
Financial and Quantitative Analysis 9.3 (1974): 311-33.
Pastor, Lubos, and Robert F. Stambaugh. "Mutual Fund Performance and Seemingly Unrelated
Assets." Journal of Financial Economics 63.3 (2002): 315-49.
Ross, Stephen. "The Arbitrage Theory of Capital Asset Pricing." Journal of Economic
Theory 13 (1976): 341-60.
Sharpe, William F. "Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of
Risk." Journal of Finance 19.3 (1964): 425-42.
Sirri, Erik R., and Peter Tufano. "Costly Search and Mutual Fund Flows." Journal of Finance 53.5
Stambaugh, R. F. "Arbitrage Pricing with Information." Journal of Financial Economics 12
Stambaugh, Robert F. Analyzing Investments Whose Histories Differ in Length. National Bureau
of Economic Research, Inc, 1997.
Teo, M., and S. Woo. "Persistence in Style-Adjusted Mutual Fund Returns." SSRN Working Paper
Treynor, Jack, and Kay Mazuy. "Can Mutual Funds Outguess the Market?" Harvard Business
Review (July 1966): 131-36.