Pomegra Learn

Why Backtested Strategies Fail: Overfitting and Curve-Fitting Explained

The most destructive bias in technical analysis is overfitting—the process of making a strategy fit historical data so perfectly that it fails in the future. A strategy that generated 25% annual returns in backtests might return -5% when deployed live. This isn't because the backtest was a lie; it's because the strategy was fitted to the past, not to the future.

Overfitting is the reason that most published trading strategies fail. It explains why newsletter trading alerts that worked for three months suddenly stop working. It explains why a strategy famous for beating the market underperforms in real trading. Overfitting is the single most important bias to understand when evaluating any technical trading system.

Understanding overfitting is essential to protecting yourself from false hope in technical analysis.

Quick definition: Overfitting (also called curve-fitting) occurs when a trading strategy is adjusted to fit historical prices so well that it has essentially memorized the past rather than identified a genuine pattern. The strategy works on the data it was optimized on, but fails on new data it hasn't seen.

Key Takeaways

  • A strategy with enough parameters can be tuned to fit almost any historical data, so backtest results mean little without out-of-sample validation
  • The more parameter combinations you test, the higher the probability that a good-looking result is simply overfitting; test 1,000 strategies and some are all but certain to look excellent by random luck
  • Overfitting inflates historical returns by 50–200% on average; a strategy showing 15% annual gains in backtests typically delivers 5–8% in live trading if it delivers anything
  • Walk-forward validation and out-of-sample testing are the primary defenses against overfitting, though even these imperfect safeguards are rarely used by retail traders
  • Most published trading strategies are overfit; the ones that aren't are typically either mediocre or untested forward

The Problem of Optimization

Most technical traders optimize their strategy parameters on historical data. You select a moving average period by testing all periods (10-day, 20-day, 30-day, etc.) and choosing the one with the highest returns. You optimize the Bollinger Band width, the RSI threshold, the position sizing, entry timing, exit timing—dozens of parameters.

This optimization process is where overfitting begins.

Consider a simple moving average crossover strategy. You have these parameters to optimize:

  • Short-term moving average period (10 to 100 days)
  • Long-term moving average period (50 to 500 days)
  • Entry threshold (how far above the long-term MA to buy)
  • Exit threshold (stop-loss percentage)
  • Position size (1% to 5% of account)

That's roughly 90 × 450 × 10 × 20 × 5 = 40 million possible combinations. If you test 40 million combinations on 20 years of historical data, some will return 30% annually by pure chance.

This is not skill. This is the law of probability. Test enough combinations, and you'll find a "winning" strategy that's actually just curve-fitting.
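The combination count above can be reproduced in a few lines. This is only a sketch: the per-parameter counts are the article's rough figures, and the step sizes behind them (and the dictionary key names) are assumptions made for illustration.

```python
from math import prod

# Number of candidate values per parameter, using the article's rough counts.
# The step sizes implied by these counts are illustrative assumptions.
grid_sizes = {
    "short_ma_period": 90,   # roughly 10 to 100 days in 1-day steps
    "long_ma_period": 450,   # roughly 50 to 500 days in 1-day steps
    "entry_threshold": 10,   # assumed 10 candidate values
    "exit_threshold": 20,    # assumed 20 candidate stop-loss levels
    "position_size": 5,      # 1% to 5% in 1% steps
}

# The search space is the product of the per-parameter counts.
total_combinations = prod(grid_sizes.values())
print(f"{total_combinations:,} combinations")  # 40,500,000 combinations
```

Adding a sixth parameter with only 10 candidate values would push the search space past 400 million, which is why each extra knob matters so much.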

How Overfitting Works: A Concrete Example

Let's trace through a realistic overfitting scenario.

A trader backtests an RSI-based mean-reversion strategy on the S&P 500 from 2000 to 2023. The rule is simple: buy when the RSI (relative strength index) crosses below 30 (oversold), sell when it crosses above 70 (overbought).

Initial testing shows a 20% annual return. Great! The trader publishes the strategy.

But what happened behind the scenes? The trader tested RSI thresholds of 25, 28, 30, 32, 35, 40, 45, and 50 (8 oversold thresholds), holding periods of 3, 5, 10, 15, 20, 30, and 50 days (7 exits), and three start dates (2000, 2005, 2010), looking for the most profitable combination.

That's already 8 × 7 × 3 = 168 different versions of "RSI strategy" tested. Of those 168, one returned 20%. The other 167 returned 5–15%. The trader published the best one: the 30/70 RSI with a 5-day holding period, tested on 2000–2023 data.

Then the trader tested this specific strategy forward on 2024 data (out-of-sample, data the strategy had never seen). It returned 3%. Why? Because the 30/70 thresholds and 5-day exit were fitted to 2000–2023, not to the general future.

This is overfitting. The strategy looked strong on the training data (in-sample) but failed on new data (out-of-sample).
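To make the hidden search concrete, the variants in this example can be enumerated explicitly. The sketch below only lists the combinations named in the text; the variable names are illustrative and no backtest is run.

```python
from itertools import product

# Candidate values taken from the example above.
oversold_thresholds = [25, 28, 30, 32, 35, 40, 45, 50]
holding_periods_days = [3, 5, 10, 15, 20, 30, 50]
start_years = [2000, 2005, 2010]

# Every (threshold, holding period, start year) combination is one "strategy".
variants = list(product(oversold_thresholds, holding_periods_days, start_years))
print(len(variants))  # 168

# The published version is a single point in that grid.
published = (30, 5, 2000)
print(published in variants)  # True
```

A reader who sees only the published variant has no way of knowing that 167 others were tried and discarded.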

The Statistical Inevitability of Overfitting

Overfitting is not an accident; it's a mathematical consequence of testing multiple hypotheses on finite data.

If you test 100 independent trading strategies that have no real edge on 20 years of historical data, about 5 of them will, on average, show statistically significant outperformance at the 5% level (95% confidence) purely by chance. This is the multiple-comparison problem.
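A small simulation can illustrate the multiple-comparison effect. The sketch below is built on stated assumptions rather than real market data: each "strategy" is just 240 months of Gaussian returns with a true mean of zero, the 4% monthly volatility is arbitrary, and the function name `count_lucky_strategies` is made up for this example. Significance is judged with a one-sided test using the normal critical value 1.645.

```python
import random
import statistics

def count_lucky_strategies(n_strategies, n_months, rng, t_critical=1.645):
    """Count zero-edge strategies whose mean return looks significantly positive."""
    lucky = 0
    for _ in range(n_strategies):
        # Monthly returns with a true mean of zero: no edge by construction.
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        mean = statistics.fmean(returns)
        std_err = statistics.stdev(returns) / n_months ** 0.5
        # One-sided test at roughly the 5% significance level.
        if mean / std_err > t_critical:
            lucky += 1
    return lucky

rng = random.Random(42)
lucky = count_lucky_strategies(100, 240, rng)
print(f"{lucky} of 100 zero-edge strategies look significant")
```

The exact count depends on the seed, but across repeated runs it should average about 5 per 100, even though none of the simulated strategies has any edge.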

A 2014 study by Arnott, Beck, Kalesnik, and West, "How Can 'A Random Walk Down Wall Street' Outperform the Market?" examined this problem explicitly. They showed that if you test 200 moving average strategies on historical stock data, some will show 15%+ annual returns. Then they tested these "winning" strategies forward on subsequent years. The out-of-sample returns were near zero or negative.

The conclusion: the winning strategies hadn't identified genuine patterns; they'd identified random noise that happened to align with past returns.

This is inevitable when you have enough degrees of freedom in optimization.

Degrees of Freedom and Overfitting

The number of parameters you optimize directly affects overfitting risk. A strategy with 3 parameters is harder to overfit than one with 30 parameters.

A simple moving average crossover has 2 parameters: short period and long period. You could test maybe 1,000 combinations on 20 years of daily data (about 5,000 trading days). This is borderline dangerous.

A neural network trained on technical indicators might have 100,000 parameters. Fitting it to 5,000 days of data is like letting a student memorize the answer key to a 5,000-question exam, with room to spare, and then grading them on that same exam. Of course they'll score perfectly, but only on material they've memorized, not on genuinely new questions.

A useful rough rule: the ratio of data points to parameters should be at least 50:1 to avoid severe overfitting. With 5,000 trading days, you can safely optimize around 100 parameters. With 10,000 days (40 years), you might get away with 200.
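The rough rule above reduces to a single division. The helper below is a sketch of that arithmetic only; the name `max_parameters` is illustrative, and the 50:1 ratio is the article's rule of thumb rather than a statistical guarantee.

```python
def max_parameters(n_observations, min_ratio=50):
    """Largest parameter count that keeps observations-to-parameters >= min_ratio."""
    return n_observations // min_ratio

print(max_parameters(5_000))   # 100
print(max_parameters(10_000))  # 200
```

Treat the result as an upper bound rather than a target: a strategy that trades only a few hundred times has far fewer effective observations than the number of trading days suggests.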

Most retail traders don't track this ratio and end up over-optimizing.

The In-Sample vs. Out-of-Sample Problem

The gold standard for testing a trading strategy is the out-of-sample test:

  1. Divide your data into training period and test period
  2. Optimize your strategy on training data only
  3. Test the strategy on the test period (which it has never seen)
  4. Compare in-sample returns (training) to out-of-sample returns (test)

If in-sample returns are 15% and out-of-sample returns are 12%, the strategy is probably reasonably fitted (a gap of 3 percentage points is acceptable).

If in-sample returns are 15% and out-of-sample returns are 1%, the strategy is severely overfit.
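The four steps above can be sketched on synthetic data. Everything here is an assumption made for illustration: the returns are random draws with no exploitable pattern, the trading rule is a toy trailing-mean filter, and the 70/30 split and lookback grid are arbitrary choices.

```python
import random

def annualized_return(returns, lookback):
    """Annualized return of a toy rule: hold the asset on day t only if the
    mean return over the previous `lookback` days was positive."""
    equity = 1.0
    for t in range(lookback, len(returns)):
        trailing_mean = sum(returns[t - lookback:t]) / lookback
        if trailing_mean > 0:
            equity *= 1.0 + returns[t]
    n_days = len(returns) - lookback
    return equity ** (252 / n_days) - 1.0

rng = random.Random(7)
# Synthetic daily returns with no exploitable pattern (an assumption).
returns = [rng.gauss(0.0003, 0.01) for _ in range(5000)]

# Step 1: split into a training period and a test period.
train, test = returns[:3500], returns[3500:]

# Step 2: optimize the lookback on training data only.
lookbacks = range(5, 205, 5)
best = max(lookbacks, key=lambda lb: annualized_return(train, lb))

# Steps 3 and 4: evaluate the chosen lookback on unseen data and compare.
in_sample = annualized_return(train, best)
out_of_sample = annualized_return(test, best)
print(f"lookback={best}: in-sample {in_sample:.1%}, out-of-sample {out_of_sample:.1%}")
```

The in-sample figure is the best of 40 candidates by construction, so it is biased upward; the out-of-sample figure is the one that deserves attention. The size of the gap will vary with the seed.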

Most backtests you see skip this step. They show only in-sample returns, which are biased upward because the parameters were chosen to maximize them. This is a major red flag.

A proper backtest divides data into three periods:

  1. Training period: 2000–2010. Optimize strategy here.
  2. Test period: 2010–2015. Evaluate strategy on unseen data.
  3. Forward period: 2015–2023. Track live or paper trading.

Many retail traders do only period 1 and call it a day. Professional quant funds do all three.

Walk-Forward Validation: A Better Approach

Walk-forward validation is a more sophisticated defense against overfitting. Instead of a single train-test split, you roll a train-test window forward through the data, so that each test period becomes the next training period:

  1. Train on 2000–2005, test on 2005–2010
  2. Train on 2005–2010, test on 2010–2015
  3. Train on 2010–2015, test on 2015–2020
  4. Train on 2015–2020, test on 2020–2023

In this approach, the strategy is retrained periodically (every five years), and you track the average returns across all test periods. This better mimics how a trader would use the strategy in real time: optimize periodically on recent data, then trade the strategy unchanged until the next re-optimization.
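The window schedule listed above can be generated programmatically. This is a minimal sketch assuming whole-year boundaries; the function name `walk_forward_windows` is illustrative, and a final test window that runs past the end of the data is simply truncated.

```python
def walk_forward_windows(start, end, train_years, test_years):
    """Return a list of (train_start, train_end, test_start, test_end) years."""
    windows = []
    train_start = start
    while train_start + train_years < end:
        train_end = train_start + train_years
        # Truncate the last test window at the end of the available data.
        test_end = min(train_end + test_years, end)
        windows.append((train_start, train_end, train_end, test_end))
        # Roll forward by one test period.
        train_start += test_years
    return windows

for w in walk_forward_windows(2000, 2023, train_years=5, test_years=5):
    print("train %d-%d, test %d-%d" % w)
```

In a full walk-forward test, each tuple would drive one optimize-then-evaluate cycle, and the results from the test windows would be averaged.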

Walk-forward validation is much more realistic than a single backtest, though it still doesn't catch all overfitting.

Diagram: Overfitting in Parameter Optimization

Real-World Examples of Overfitting

The "Perfect" Bollinger Bands Strategy: A popular trading course teaches a Bollinger Bands mean reversion strategy with precise rules: when price touches the lower band AND RSI is below 30 AND volume spikes AND the VIX is between 15 and 25, buy with a 2% trailing stop. This strategy, when backtested on 2015–2022 data, shows 18% annual returns. But these specific parameters (lower band, RSI < 30, volume threshold, VIX range, 2% trailing stop) were optimized on that exact period. When tested on 2022–2024, it returns -2%. The strategy was a perfect fit to a bull market; it failed in a sideways/bear market.

The "Winning" Cryptocurrency Trading Bot: A trader develops a bot using moving averages and MACD (moving average convergence divergence) on Bitcoin. The bot is backtested on 2017–2021 (a period dominated by two major bull runs) and shows 200% annual returns. In live trading from 2021–2024, it loses 40%. The strategy was fitted to strong uptrends; when the trend reversed, it was whipsawed into losses.

The Candlestick Pattern Study: A student backtests 50 different candlestick patterns on the S&P 500 from 2005–2020. Pattern #14 (a rare combination of doji, hammer, and engulfing candles) shows 12% annual returns. The student publishes this "discovery." But this pattern is rare—it occurs only 20–30 times in 15 years. By pure chance, those 20–30 times happened to precede up days in that particular period. Test it forward on 2020–2024, and it performs no better than random. The pattern was overfitted.

Common Mistakes Leading to Overfitting

One: Testing Only on Bull Markets: A strategy that works in 2009–2021 (a period dominated by one of the longest bull markets on record) might fail in 2022 or 2000–2003. Always test across multiple market regimes.

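Two: Optimizing Too Many Parameters: Each parameter you add multiplies the number of combinations you can search, and with it the overfitting risk. A strategy with 10 parameters, each tried at just 10 values, spans 10 billion combinations; 5,000 days of data cannot distinguish the genuinely good ones from the lucky ones.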

Three: Using Walk-Forward Validation But With Too-Short Test Periods: If you test a strategy on only 3 months of out-of-sample data, you're not truly validating. Test on at least 1–2 years of unseen data.

Four: Ignoring Costs in Backtests: A strategy might appear profitable with 0% costs but unprofitable with realistic commissions and slippage. Always include costs.

Five: Testing Multiple Assets and Reporting Only the Best One: If you test a strategy on 500 stocks and report the best result, you're cherry-picking. Report the median or average result.

Six: Reoptimizing After Seeing Forward Results: If you backtest a strategy, see it fail forward, then tweak parameters to fit the failure, you're overfitting to the failure. This is especially tempting with real money at stake.
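Mistake Five is easy to demonstrate with simulated numbers. The sketch below assumes 500 hypothetical per-stock backtest results drawn from a distribution centered on zero, so the rule has no real edge by construction; the 10% dispersion is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical annual returns for one rule applied to 500 stocks, drawn from
# a zero-mean distribution: the rule has no real edge by construction.
annual_returns = [rng.gauss(0.0, 0.10) for _ in range(500)]

best = max(annual_returns)
median = statistics.median(annual_returns)
print(f"best stock: {best:.1%}, median stock: {median:.1%}")
```

The best single result looks impressive while the median sits close to zero, which is why the median (or mean) across all tested assets is the more honest number to report.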

FAQ

Q: How do I know if my strategy is overfit? A: The primary test is out-of-sample performance. If your strategy returns 15% on training data but only 2% on test data it hasn't seen, it's overfit. Walk-forward validation also reveals overfitting: if test-period returns consistently fall far short of the returns in the matching training periods, that's a sign.

Q: If I use machine learning, am I more likely to overfit? A: Yes. Machine learning models with many parameters (neural networks, random forests) are extremely prone to overfitting. They can fit any dataset perfectly if given enough capacity. Overfitting is more likely with machine learning than with simple rules.

Q: Should I avoid testing many parameters? A: Not entirely. You need to test some parameters. But be aware of the multiple-comparison problem. If you test 100 combinations, expect about 5 to look statistically significant by chance alone. Only believe results that are much better than the median.

Q: Can I overfit if I don't do any optimization? A: Yes, you can still overfit through selection bias. If you choose a strategy because it "looks good" on historical charts, that's a form of overfitting. Any strategy chosen after looking at the data is vulnerable to having been selected for luck.

Q: Does live trading prevent overfitting? A: No, but it helps expose it. Live trading introduces slippage and real market conditions that backtests miss. However, a few months of live trading isn't enough to confirm a strategy is not overfit; regimes change, and luck persists over short periods.

Q: If a strategy is overfit, should I completely abandon it? A: Not necessarily. An overfit strategy might still contain a small real edge buried within the overfitting. But you should drastically lower your expectations. If backtests showed 20%, expect 3–5% in reality, or test more rigorously with walk-forward validation.

Q: Can professionals avoid overfitting? A: They minimize it through rigorous testing methodology, but they don't eliminate it. Quant firms use walk-forward validation, out-of-sample testing, and multiple independent data sources. Even so, many fund strategies underperform forward because some overfitting is nearly inevitable.

Summary

Overfitting, the practice of adjusting a strategy until it fits historical data almost perfectly, is the most common reason technical trading strategies fail in live trading. The more parameters you optimize on historical data, the higher the probability that your results reflect overfitting rather than genuine edge. The only effective defense is rigorous out-of-sample and walk-forward testing, practices rarely used by retail traders. Even "winning" backtests with 15–20% annual returns are likely overfit and will typically deliver only a fraction of that in live trading, if they deliver anything at all. Understanding overfitting is essential to avoiding costly mistakes with technical strategies.
