
Backtesting Historical Accuracy

Before you trust your valuation model with real money, you should ask: "Would this methodology have worked in the past?" Backtesting subjects your model to historical cases where you know the outcome. If your model said a stock was worth $50 in 2022 and it actually traded at $55 and grew to $120 by 2024, your methodology was directionally right but off on magnitude. If it said $50 and the company collapsed, your model missed the warning signs. Backtesting reveals these patterns.

Quick Definition

Backtesting is the process of applying your valuation model to historical companies and scenarios, then comparing your estimated intrinsic values to actual subsequent stock prices and business performance. The goal is to measure: (1) accuracy of your methodology, (2) whether your assumptions tend to overestimate or underestimate reality, (3) which inputs drive the largest forecast errors, and (4) which market conditions expose model weaknesses. A model that passes backtests inspires confidence; one that fails should be refined before deployment.

Key Takeaways

  • Backtest on 10–20 past companies where you have complete data; measure error magnitude and direction
  • Track not just whether you called the direction right (up/down) but accuracy of magnitude (off by 10% vs. off by 50%)
  • Test across different market cycles (bull, bear, transition), industries, and company sizes to expose regime blindness
  • Compare your valuation to actual price; track divergence over time (did the gap close, widen, or switch sign?)
  • Use backtest insights to refine assumption-setting rules, not to overfit past data

Designing a Backtest

Step 1: Choose Your Historical Sample

Select 10–20 companies from your typical investment universe. Include:

  • Different industries (tech, industrial, healthcare, finance)
  • Different sizes (small-cap, mid-cap, large-cap)
  • Different economic regimes (recession 2008–2009, bull market 2012–2020, rate shock 2022)
  • Mix of successes, failures, and borderline cases

Example portfolio for backtest:

  • Microsoft (large-cap, steady compounder) → 2016 valuation
  • Best Buy (mature, steady-state) → 2012 valuation
  • Chipotle (high growth, execution risk) → 2015 valuation
  • Walgreens (declining industry) → 2015 valuation
  • Energy Transfer (cyclical, debt-heavy) → 2020 valuation

Step 2: Gather Historical Input Data

For each company at a chosen date (e.g., Jan 1, 2016), collect:

  • Historical financials (last 5–10 years of actual results)
  • Management guidance and analyst consensus
  • Industry growth rates and peer multiples
  • Risk-free rate, market risk premium, company beta at that date
  • Stock price at valuation date (your starting point)

Step 3: Build the Model

Run your standard DCF model with these historical inputs. Assume you're sitting at Jan 1, 2016, and you don't know what happens next. Project forward 5–10 years using only information available in 2016.

Example: Microsoft on Jan 1, 2016

  • Historical growth (2010–2015): ~13% revenue CAGR
  • 2016 management guidance: 6–8% growth (cloud transition)
  • Historical EBIT margin: 33% (down from 35% due to mix shift to lower-margin Azure)
  • 2016 consensus analyst growth: 7%
  • Beta (2016): 0.95; risk-free rate: 1.7%; MRP: 6%
  • WACC: ~8.2%

Your DCF (with 2016 data): fair value of $47–$55/share.
Actual stock price, Jan 1, 2016: $52.40.
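A minimal sketch of Step 3's mechanics, assuming a simple two-stage per-share DCF. The starting FCF per share ($2.35) is an illustrative figure chosen so the output lands inside the $47–$55 band, not Microsoft's reported number; only the 7% growth, 8.2% WACC, and 2.5% terminal growth come from the example above.

```python
def dcf_per_share(fcf0, growth, wacc, terminal_growth, years=5):
    """Two-stage DCF: explicit growth for `years` years, then a growing perpetuity."""
    pv = 0.0
    fcf = fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth                 # project next year's FCF per share
        pv += fcf / (1 + wacc) ** t       # discount back to the valuation date
    # Gordon-growth terminal value on the year after the explicit horizon
    terminal = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
    pv += terminal / (1 + wacc) ** years
    return pv

# Hypothetical 2016-style inputs: 7% growth, 8.2% WACC, 2.5% terminal growth.
value = dcf_per_share(fcf0=2.35, growth=0.07, wacc=0.082, terminal_growth=0.025)
print(round(value, 2))  # lands inside the $47-55 fair-value band
```

The point is not the specific number but the discipline: every input must be something you could have known on Jan 1, 2016.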

Step 4: Track Actual Outcomes

Let time pass. Record what actually happened:

  • Stock price over next 3, 5, 10 years
  • Actual revenue growth achieved
  • Actual margin trajectory
  • Any major surprises (M&A, disruption, secular decline)

Microsoft actual outcomes:

  • Stock price Jan 1, 2021 (5 years later): $158.08
  • Stock price Jan 1, 2024 (8 years later): $371.20
  • Actual revenue CAGR 2016–2023: 13.7%
  • Actual EBIT margin trend: 33% → 37% (expansion, not compression)

Step 5: Calculate Backtest Metrics

For each historical case, compute:

| Metric | Calculation | Interpretation |
| --- | --- | --- |
| Accuracy (Price) | (Your Fair Value - Actual Price) / Actual Price | % error; negative = your fair value sits below the market price, positive = above |
| Accuracy (5-Yr Return) | (Actual Price Yr 5 - Price Yr 0) / Price Yr 0 | Did the market eventually agree with you? |
| Direction | Sign of the above | Did you call up/down correctly? |
| Magnitude Error | ABS(Actual Growth - Your Growth Forecast) | How far off on key drivers? |
| Time-to-Convergence | Years before the actual price neared your valuation | Did the market ratify your thesis? |
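The Step 5 metrics can be computed mechanically once each case is stored as a simple record; the function and field names here are illustrative, not a prescribed schema.

```python
def backtest_metrics(fair_value, price_0, price_5y, growth_forecast, growth_actual):
    """Per-case backtest metrics. Prices are per share; growth rates are decimal CAGRs."""
    # Negative: your fair value sat below the market price at entry.
    valuation_error = (fair_value - price_0) / price_0
    # Did the market eventually agree with you?
    return_5y = (price_5y - price_0) / price_0
    # Strict point-estimate direction; a fair-value *range* straddling the
    # entry price (as with Microsoft's $47-55 band) softens this call.
    direction_correct = (fair_value > price_0) == (price_5y > price_0)
    # How far off on the key driver?
    magnitude_error = abs(growth_actual - growth_forecast)
    return {
        "valuation_error": valuation_error,
        "return_5y": return_5y,
        "direction_correct": direction_correct,
        "magnitude_error": magnitude_error,
    }

# Microsoft case: $51 fair value, $52.40 entry, $158.08 five years later,
# 7% forecast vs. 13.7% actual revenue CAGR
m = backtest_metrics(51, 52.40, 158.08, growth_forecast=0.07, growth_actual=0.137)
print(m)
```

Run over all 10–20 cases, these records feed directly into the summary table below.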

Microsoft Backtest Example:

| Metric | Your Model | Actual | Error |
| --- | --- | --- | --- |
| DCF fair value (Jan 1, 2016) | $51/sh | -- | -- |
| Stock price, Jan 1, 2016 | -- | $52.40 | Fair value ~3% below entry price |
| Stock price, Jan 1, 2024 | -- | $371.20 | 608% actual return, far beyond your estimate |
| Revenue growth 2016–2023 | 7% CAGR | 13.7% CAGR | Underestimated by 6.7 pp (actual ~2x forecast) |
| 2023 EBIT margin | 32–33% | 37% | Underestimated by 4–5 pp |

Interpretation: Your model called the direction (buy) correctly, but dramatically underestimated both growth and margin expansion. You would have profited, but captured far less upside than the thesis warranted. Question: Why did you underestimate Azure growth? (Answer: Hard to forecast from 2016 consensus, which underestimated cloud growth universally.)

Building a Backtest Summary

Aggregate results across your 10–20 test cases:

| Company | Valuation Date | Your Fair Value | Actual Price (Entry) | Actual Return (3Y) | Actual Return (5Y) | Your Call | Outcome |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Microsoft | 2016-01-01 | $51 | $52.40 | 40% | 202% | BUY (slightly undervalued) | Correct, but missed upside |
| Best Buy | 2012-10-01 | $28 | $26.50 | 45% | 76% | BUY (undervalued) | Correct direction (conservative on entry) |
| Chipotle | 2015-01-01 | $645 | $635 | -15% | 242% | HOLD/BUY | Correct (but volatility surprise) |
| Walgreens | 2015-01-01 | $85 | $80 | -18% | -45% | HOLD/SELL | Correct (secular decline) |

Summary Stats:

  • Accuracy on direction: 18/20 correct (90%)
  • Accuracy on magnitude (% error): Average 25%, Median 18%, Range 5%–60%
  • Under/overestimate: 12 underestimates (too conservative), 8 overestimates (too aggressive)
  • Time to convergence: Average 2.3 years
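Aggregating per-case results into summary stats like these is straightforward. The list of records below is illustrative, not the actual 20-case sample.

```python
from statistics import mean, median

# Illustrative per-case records: (direction_correct, pct_error), where pct_error
# is the signed valuation error (fair value - entry price) / entry price.
cases = [(True, -0.03), (True, 0.18), (False, 0.45), (True, -0.12),
         (True, 0.25), (True, -0.08), (True, 0.30), (False, -0.55)]

directional_hit_rate = sum(ok for ok, _ in cases) / len(cases)
abs_errors = [abs(e) for _, e in cases]
# Negative signed error: fair value below price, i.e. the conservative side
underestimates = sum(e < 0 for _, e in cases)

print(f"Direction: {directional_hit_rate:.0%}")
print(f"Magnitude error: mean {mean(abs_errors):.0%}, median {median(abs_errors):.0%}")
print(f"Underestimates: {underestimates}/{len(cases)}")
```

The same few lines scale unchanged from 8 toy cases to your full 20-company sample.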

Insight: Your model has a 90% directional hit rate but systematically underestimates growth and margin expansion in winners. You're too conservative on compounding businesses.

Diagram: Backtest Feedback Loop

Common Backtest Mistakes

Overfitting to Past Results

If you tweak your model until it perfectly matches every past company, you've optimized for history, not the future. Use backtest insights to refine rules and ranges, not to lock in specific numbers.

Example: "My model called Microsoft growth at 7% but it actually did 13%, so I'll now assume 13% growth for all cloud companies." Wrong. Instead: "My 2016 model underestimated cloud TAM expansion. I'll now research TAM estimates more deeply before setting growth."

Survivorship Bias

Only testing on companies still around in 2024 ignores the failures. A backtest should include bankruptcies, mergers, and dead companies. "How would my model have warned me about Blockbuster? Did I have mechanisms to catch secular decline?"

Cherry-Picking Historical Dates

Testing 10 companies on dates that happen to be valuation peaks or troughs skews results. Use random dates or systematic dates (Jan 1 every year) to avoid bias.
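Generating systematic dates removes the temptation to hand-pick favorable entry points; the year range here is an arbitrary illustration.

```python
from datetime import date

# Systematic valuation dates: Jan 1 of every year, rather than hand-picked highs/lows
valuation_dates = [date(year, 1, 1) for year in range(2008, 2021)]
print(len(valuation_dates), valuation_dates[0], valuation_dates[-1])
```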

Ignoring Regime Shifts

A model may work in bull markets but fail in bear markets. Backtest across at least two different economic regimes. A 2008–2009 test and a 2015–2020 test tell you whether your model is robust.

Not Separating Process from Luck

A correct call on one stock may be luck (you guessed the right outcome for the wrong reasons). Look for patterns: Do you consistently underestimate certain categories (growth, margin expansion)? That's signal. One-off correct calls are noise.

Advanced: Sensitivity Backtest

Beyond comparing valuation to price, backtest your sensitivity analysis:

Question: "If I had run a sensitivity table in 2016 and placed Microsoft in the middle case, would I have been directionally right about uncertainty?"

Your 2016 sensitivity table for Microsoft:

| Case | Growth Assumption | Margin Assumption | Intrinsic Value |
| --- | --- | --- | --- |
| Downside | 4% | 30% | $38 |
| Base | 7% | 33% | $51 |
| Upside | 10% | 35% | $68 |
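A sensitivity table like this can be generated by re-running the valuation over each assumption pair. The two-stage model and the margin-scaling trick below are stand-ins for your own model (the $2.35 starting FCF per share is an illustrative figure), so they reproduce the ordering and rough levels of the table rather than its exact values.

```python
def dcf_per_share(fcf0, growth, wacc, terminal_growth, years=5):
    """Two-stage DCF: explicit growth for `years` years, then a growing perpetuity."""
    pv, fcf = 0.0, fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth
        pv += fcf / (1 + wacc) ** t
    terminal = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
    return pv + terminal / (1 + wacc) ** years

# Sweep the growth/margin pairs from the table; margin scales starting FCF
# relative to the 33% base case.
cases = {"Downside": (0.04, 0.30), "Base": (0.07, 0.33), "Upside": (0.10, 0.35)}
values = {name: dcf_per_share(2.35 * margin / 0.33, growth,
                              wacc=0.082, terminal_growth=0.025)
          for name, (growth, margin) in cases.items()}
print({name: round(v, 2) for name, v in values.items()})
```

Because the sweep is mechanical, widening a case later (per the refined rule below this section) is a one-line change to the `cases` dict.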

Actual Outcome (2024): Value dramatically exceeded upside case.

Insight: Your uncertainty bands were too narrow. You didn't consider the possibility of 14%+ growth + 37% margins (cloud expansion). This points to a bias: Your process underestimates what high-quality businesses can achieve in favorable conditions.

Refined rule: "For companies with strong secular tailwinds (cloud, AI, cost-of-capital dynamics), widen upside case growth by 50% and margin expansion by 100–200 bps."

Backtesting Against Peer Methods

Compare your DCF not just to actual outcomes but to other valuation methods:

2016 Microsoft Backtest:

  • Your DCF: $51/share
  • Peer-based P/E multiple (33x trailing): $52/share
  • Historical average P/E (30x): $50/share
  • Price-to-sales (8x): $54/share
  • All methods clustered near actual price ($52)

This gives you confidence. If your DCF had said $80 while multiples said $50, that would be a warning flag.
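This cross-check can be automated as a simple divergence flag; the 20% threshold below is an arbitrary illustration, not a standard cutoff.

```python
def method_divergence(dcf_value, peer_values, threshold=0.20):
    """Gap between the DCF and the average of peer-method estimates,
    plus a warning flag when the gap exceeds `threshold`."""
    consensus = sum(peer_values) / len(peer_values)
    gap = (dcf_value - consensus) / consensus
    return gap, abs(gap) > threshold

# 2016 Microsoft: DCF $51 vs. multiple-based estimates clustered near $52
gap, warning = method_divergence(51, [52, 50, 54])
# A DCF of $80 against the same multiples would trip the flag
gap2, warning2 = method_divergence(80, [52, 50, 54])
print(f"{gap:+.1%} warning={warning}; {gap2:+.1%} warning={warning2}")
```

A tripped flag doesn't mean the DCF is wrong, only that the divergence deserves an explanation before you act on it.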

FAQ

Q: How many years back should I backtest?
A: At least 10 years, ideally across one full market cycle (bull, bear, recovery). More than 20 is rarely necessary; you see the patterns by then.

Q: Can I backtest on the same stocks I currently hold?
A: Dangerous; you risk justifying past buys rather than learning. Include companies you rejected or didn't own. Test on dead companies and bankruptcies especially: what would your model have told you?

Q: What's a "good" backtest error rate?
A: Directional accuracy of 75%+. Magnitude error within 20% is solid; 30% is acceptable. If you're off by 50% on half your cases, refine the methodology before deploying real capital.

Q: Should I backtest on data I used to train my model?
A: No. Use data you didn't see when building assumptions. If you trained on 2015 data, backtest on 2012 or 2008.

Q: What if I realize my model is wrong?
A: A backtest is a test, not a guarantee. If results are poor, audit your model: Are the assumptions reasonable? Is the three-statement logic correct? Are there systematic biases in how you set growth or discount rates? Fix the root cause, not the output.

Q: How often should I backtest?
A: Annually if you actively value stocks, and whenever you change your methodology significantly. Backtesting is not a one-time exercise; it's part of continuous improvement.

Related Concepts

  • Forecast Accuracy: How close predictions are to outcomes
  • Out-of-Sample Testing: Validating on data the model wasn't trained on
  • Walk-Forward Analysis: Rolling-window backtest where you use progressively newer data
  • Stress Testing: Forcing extreme scenarios; often paired with backtesting
  • Model Validation: Broader framework for checking model integrity

Summary

Backtesting bridges the gap between elegant theory and messy reality. A model that performs brilliantly on historical companies builds confidence; one that fails reveals blind spots. The goal isn't perfect prediction (impossible) but pattern recognition: Where do I consistently err? What categories of companies or regimes does my process miss? What should I adjust?

Use backtests not to over-optimize past results but to refine your assumption-setting process, tighten your uncertainty bands, and build honest conviction in your methodology. A model you've vetted on 15 past cases is far more trustworthy than one you've never tested.

Before you commit significant capital on a valuation, ask yourself: "Would this methodology have worked in the past? Where did it fail? Have I fixed those failures?" Backtest answers those questions.

Next Steps

Your model is tested, documented, and empirically validated. The final frontier: Maintaining Your Models covers how to evolve your valuations as new information arrives, updating efficiently without losing discipline, and knowing when to kill a thesis that no longer fits the facts.