Factor Zoo

The factor zoo is the explosion of published factors — academic studies have documented hundreds of systematic equity characteristics claimed to predict returns. The darker reality: the vast majority are likely false discoveries, statistical patterns that fit historical data by chance but have no true predictive power out of sample.

For the nature of factor premiums in general, see Factor Premium. This article addresses the explosion of candidate factors and the methodological crisis it has created.

How we arrived at this zoo

In the 1990s, academic finance had a handful of well-known factors: market risk (or beta), size, value, and momentum. These were robust, documented across decades, and had plausible economic stories.

Then two things changed:

Computing power exploded. Researchers could now test thousands of potential factors against decades of stock-return data in minutes.
Publication incentives aligned with novelty. A journal rewards finding a new factor more than confirming an old one. Academic careers are built on discoveries.

The result was a flood of new factors. By the early 2020s, censuses of the literature counted more than 400 distinct factors published in top journals. Each came with a paper, a data set, a conference presentation, and a claim to have found an anomaly with predictive power.

The data-mining problem

This abundance creates a vast multiple-testing problem. If you run enough statistical tests, you will find false patterns by pure chance.

Imagine flipping a fair coin 1,000 times and looking for a run of 10 heads in a row. Any single 10-flip window comes up all heads only once in 1,024 tries, yet there is a better than one-in-three chance of finding such a run somewhere in the sequence. The coin is not biased; you simply examined 991 overlapping 10-flip windows. Similarly, a researcher with stock-return data for 3,000 firms over 50 years has a practically unlimited supply of candidate predictors. Test enough of them against historical returns, and some will appear significant by chance.
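To put a number on the coin example, the sketch below computes the exact probability that a sequence of fair flips contains at least one run of heads of a given length. The function name `prob_run_of_heads` is an illustrative choice, not something from the literature. For 1,000 flips and a run of 10, the result is a little under 40%: far higher than the 1-in-1,024 odds for any single window, though short of a certainty.

```python
def prob_run_of_heads(n_flips: int, run_len: int) -> float:
    """Exact probability that n_flips fair coin flips contain at least
    one run of run_len consecutive heads."""
    # state[j] = probability that the current streak of heads has length j
    # and no streak of run_len heads has occurred so far.
    state = [0.0] * run_len
    state[0] = 1.0
    for _ in range(n_flips):
        nxt = [0.0] * run_len
        nxt[0] = 0.5 * sum(state)            # tails resets the streak to zero
        for j in range(run_len - 1):
            nxt[j + 1] = 0.5 * state[j]      # heads extends the streak by one
        # Heads from a streak of run_len - 1 completes a run; that probability
        # mass leaves the "no run yet" states and is simply not carried forward.
        state = nxt
    return 1.0 - sum(state)


p_any_run = prob_run_of_heads(1000, 10)
p_single_window = 0.5 ** 10

print(f"P(all heads in one fixed 10-flip window): {p_single_window:.5f}")
print(f"P(at least one run of 10 in 1,000 flips): {p_any_run:.3f}")
```

The gap between the two printed numbers is the multiple-testing effect in miniature: the more windows you inspect, the more likely it is that one of them looks remarkable.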

The problem is acute because:

Selection bias: Researchers tend to publish only positive results. A researcher who tests 500 factors, of which 5 appear significant, publishes the 5. The 495 null results never see the light of day. Journals do not publish a paper titled “I looked for 500 factors and found nothing.”
In-sample vs. out-of-sample: A pattern might fit historical data perfectly yet fail entirely on future data. Many published factors work brilliantly from 1980–2000, then fail or reverse from 2001 onward.
P-hacking and HARKing: Researchers can test many variants of a hypothesis and report only the one that “works” (p-hacking), or construct the hypothesis after seeing the results and present it as if it had come first (HARKing, hypothesizing after the results are known). A researcher might define a value factor 10 different ways and publish the one with the best t-statistic, making it seem rock-solid when only 1 definition in 10 actually worked.
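The scale of the problem can be illustrated with a small simulation. The sketch below generates 500 "factors" that are pure noise by construction, then counts how many clear the conventional |t| > 1.96 hurdle. The 30-year monthly sample, the 3% monthly volatility, and the Bonferroni correction shown for comparison are all assumptions made for illustration; they are not taken from any particular study.

```python
import random
import statistics
from math import sqrt

N_FACTORS = 500   # candidate factors tested (the figure used in the text)
N_MONTHS = 360    # assumed sample: 30 years of monthly returns
ALPHA = 0.05      # conventional two-sided significance level


def t_stat(returns):
    """t-statistic for the hypothesis that the mean return is zero."""
    return statistics.fmean(returns) / (statistics.stdev(returns) / sqrt(len(returns)))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Every simulated factor is noise: its true mean return is exactly zero.
t_stats = [
    t_stat([rng.gauss(0.0, 0.03) for _ in range(N_MONTHS)])
    for _ in range(N_FACTORS)
]

normal = statistics.NormalDist()
naive_cutoff = normal.inv_cdf(1 - ALPHA / 2)                      # about 1.96
bonferroni_cutoff = normal.inv_cdf(1 - ALPHA / (2 * N_FACTORS))   # stricter hurdle

naive_hits = sum(abs(t) > naive_cutoff for t in t_stats)
bonferroni_hits = sum(abs(t) > bonferroni_cutoff for t in t_stats)

print(f"expected false positives at the 5% level: {ALPHA * N_FACTORS:.0f}")
print(f"noise factors with |t| > {naive_cutoff:.2f}: {naive_hits}")
print(f"noise factors with |t| > {bonferroni_cutoff:.2f} (Bonferroni): {bonferroni_hits}")
```

With 500 independent tests at the 5% level, about 25 false positives are expected even though no factor has any real effect. Raising the hurdle to account for the number of tests removes nearly all of them, at the price of making genuine but modest effects harder to detect.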

The zoo is not a conspiracy. It is a rational response to misaligned incentives.

How to identify the false ones

Several red flags suggest a published factor is a data-mining artifact rather than a genuine premium:

Short history: A factor tested only from 1990 to 2020 may simply be in-sample overfitting. Genuine factors survive long-term, out-of-sample tests.

Limited geography: A factor that works only in the US but not in Europe or Japan is suspect. A chance pattern in one market's data has no reason to recur in another, whereas a true premium should show up more broadly.

High complexity: A factor that requires three inputs, two transformations, and threshold adjustments is more likely to be fit than one based on a single, simple idea (like price-to-earnings).

Mechanical rules with tweaks: If the factor’s definition includes “except on Mondays” or “unless the prior month was positive,” treat it as a red flag. Overfitting leaves fingerprints.

Tiny magnitude: A factor delivering 30 basis points per year before costs might be worth publishing academically but is economically irrelevant. Many factors in the zoo are so small that trading costs eliminate them.

No economic theory: A factor defined as “stocks whose 247-day momentum is positive and whose earnings surprise is greater than the industry median” lacks intuitive appeal. Without a story, it is just a number.

By contrast, the core factors (value, momentum, quality) are simple, have decade-spanning histories, exist across markets and asset classes, deliver economically meaningful returns, and have plausible stories rooted in either risk or behavior.
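The tiny-magnitude flag above comes down to simple arithmetic. The sketch below nets assumed trading costs against a 30 basis point premium. The 200% annual turnover and the 20 basis point cost per unit of turnover are hypothetical inputs chosen for illustration, and both function names are made up for this example.

```python
def net_premium_bps(gross_bps, annual_turnover, cost_bps_per_unit_turnover):
    """Annual premium after trading costs, in basis points.

    annual_turnover is expressed as a multiple of portfolio value
    (2.0 means 200% of the portfolio is traded each year).
    """
    return gross_bps - annual_turnover * cost_bps_per_unit_turnover


def break_even_cost_bps(gross_bps, annual_turnover):
    """Cost per unit of turnover at which the premium is fully consumed."""
    return gross_bps / annual_turnover


# Hypothetical inputs: 30 bps gross premium, 200% turnover, 20 bps per unit traded.
net = net_premium_bps(30.0, 2.0, 20.0)
break_even = break_even_cost_bps(30.0, 2.0)

print(f"net premium: {net:.1f} bps per year")
print(f"break-even trading cost: {break_even:.1f} bps")
```

Under these assumptions the factor loses 10 basis points a year after costs, and it would only break even if trading cost 15 basis points or less per unit of turnover.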

The replication crisis

In recent years, academics have begun replicating published factors on new data. The findings are sobering.

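A seminal paper by Hou, Xue, and Zhang (2020) attempted to replicate 452 published anomalies in US data. Roughly two-thirds failed to clear a conventional significance threshold once the influence of microcap stocks was limited. Common culprits in failed replications include: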

Factors tested only in sample, without holdout data.
Factors that worked in a specific era (like the 1990s boom) but not in general.
Factors fit to US large-cap stocks, which may not generalize to other universes.

Some factors that were celebrated in the literature turned out to have negative returns out of sample. This suggests the in-sample “premium” was a fluke.
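Why a celebrated in-sample premium can vanish is easy to demonstrate with simulated data. In the sketch below, every factor is pure noise; the best performer over the first half of the sample is selected and then re-measured on the second half. The number of factors, the sample length, the volatility, and the function name `pick_best_then_retest` are all illustrative assumptions.

```python
import random
import statistics


def pick_best_then_retest(rng, n_factors=100, n_months=240):
    """Select the noise factor with the best first-half mean return,
    then report its mean return in both halves."""
    half = n_months // 2
    factors = [
        [rng.gauss(0.0, 0.03) for _ in range(n_months)]  # true mean is zero
        for _ in range(n_factors)
    ]
    best = max(factors, key=lambda r: statistics.fmean(r[:half]))
    return statistics.fmean(best[:half]), statistics.fmean(best[half:])


rng = random.Random(1)  # fixed seed so the run is repeatable
trials = [pick_best_then_retest(rng) for _ in range(20)]

avg_in_sample = statistics.fmean([t[0] for t in trials])
avg_out_of_sample = statistics.fmean([t[1] for t in trials])

print(f"average in-sample monthly return of the 'winner': {avg_in_sample:.4%}")
print(f"average out-of-sample monthly return:             {avg_out_of_sample:.4%}")
```

The selected factor looks profitable in sample purely because it was selected. Out of sample its average return falls back toward zero and is as likely to be negative as positive, which is the pattern replication studies report for many published factors.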

What survives?

Not all factors are equally dubious. A small core have weathered scrutiny:

Value (low price-to-earnings or price-to-book): Decades of history, multiple geographies, plausible stories about risk and mispricing, meaningful magnitude.
Momentum (recent winners tend to keep winning): Also long-tested, global, and large in magnitude, though the mechanism is more disputed.
Quality (profitability, low debt): Increasingly robust across studies, plausible story.
Low volatility (stocks with below-average price swings): Documented but with some geographic variation.
Size (small stocks outperform large ones): Historically robust, though less reliable in recent decades.

These factors do not always work (value underperformed for 15 years; momentum crashed in 2009), but they are credible. The zoo’s other 400+ factors are mostly data mining.

Implications for investors

The factor zoo poses two risks:

For managers: Building strategies on false factors is building on sand. A hedge fund that relies on a factor with only 15 years of history, tested only in the US, and never replicated by outsiders is taking uncompensated risk. The factor may simply be random noise.

For allocators: The proliferation of factors creates a marketing temptation. Firms can cherry-pick from the zoo to find a factor that worked well in the recent past, launch a fund around it, and sell it to investors. When the factor reverts, investors suffer. The abundance of factors is not a feature; it is a sign of a crisis in factor identification.

The path forward

The field is self-correcting. Journals and referees increasingly expect:

Pre-registration: Define the factor before testing it on data.
Out-of-sample tests: Show it works on data the author did not use to design it.
Multiple geographies and periods: Demonstrate global and temporal robustness.
Realistic costs: Account for trading frictions, not just theoretical excess returns.
Robustness checks: Show the factor is not a variant of an existing, better-known factor.

This raises the bar for publication and reduces the incentive to publish noise. The zoo will not vanish — academic publishing is too large and too ambitious — but a clearer distinction is emerging between factors that are genuinely predictive and those that are statistical artifacts waiting to be forgotten.
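One common way to run the last robustness check in the list above is a spanning regression: regress the candidate factor's returns on an existing factor and ask whether any return (the intercept, or alpha) is left unexplained. The sketch below is a minimal single-regressor version on simulated data, in which the "new" factor is deliberately built as a noisy copy of an existing one. All parameter values and the function name `spanning_regression` are assumptions for illustration; real tests typically use several existing factors at once.

```python
import random
import statistics
from math import sqrt


def spanning_regression(candidate, existing):
    """OLS of candidate returns on one existing factor.

    Returns (alpha, beta, t-statistic of alpha).
    """
    n = len(candidate)
    x_bar = statistics.fmean(existing)
    y_bar = statistics.fmean(candidate)
    s_xx = sum((x - x_bar) ** 2 for x in existing)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(existing, candidate))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(existing, candidate))
    resid_var = ssr / (n - 2)
    se_alpha = sqrt(resid_var * (1.0 / n + x_bar ** 2 / s_xx))
    return alpha, beta, alpha / se_alpha


rng = random.Random(2)  # fixed seed so the run is repeatable

# Simulated existing factor with a genuine premium (assumed 0.4% per month).
existing = [rng.gauss(0.004, 0.03) for _ in range(600)]
# The "new" factor is 0.8 times the existing one plus noise: no independent premium.
candidate = [0.8 * x + rng.gauss(0.0, 0.01) for x in existing]

alpha, beta, t_alpha = spanning_regression(candidate, existing)

print(f"candidate mean monthly return: {statistics.fmean(candidate):.4%}")
print(f"beta on existing factor: {beta:.2f}")
print(f"alpha: {alpha:.4%} per month (t = {t_alpha:.2f})")
```

A candidate can show a healthy average return on its own and still have an alpha indistinguishable from zero once the existing factor is controlled for. In that case it is a repackaging of a known factor, not a discovery.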
