
Backtesting Historical Accuracy

Before you trust your valuation model with real money, you should ask: "Would this methodology have worked in the past?" Backtesting subjects your model to historical cases where you know the outcome. If your model said a stock was worth $50 in 2022 and it actually traded at $55 and grew to $120 by 2024, your methodology was directionally right but off on magnitude. If it said $50 and the company collapsed, your model missed the warning signs. Backtesting reveals these patterns.

Quick Definition

Backtesting is the process of applying your valuation model to historical companies and scenarios, then comparing your estimated intrinsic values to actual subsequent stock prices and business performance. The goal is to measure: (1) accuracy of your methodology, (2) whether your assumptions tend to overestimate or underestimate reality, (3) which inputs drive the largest forecast errors, and (4) which market conditions expose model weaknesses. A model that passes backtests inspires confidence; one that fails should be refined before deployment.

Key Takeaways

  • Backtest on 10–20 past companies where you have complete data; measure error magnitude and direction
  • Track not just whether you called the direction right (up/down) but accuracy of magnitude (off by 10% vs. off by 50%)
  • Test across different market cycles (bull, bear, transition), industries, and company sizes to expose regime blindness
  • Compare your valuation to actual price; track divergence over time (did the gap close, widen, or switch sign?)
  • Use backtest insights to refine assumption-setting rules, not to overfit past data

Designing a Backtest

Step 1: Choose Your Historical Sample

Select 10–20 companies from your typical investment universe. Include:

  • Different industries (tech, industrial, healthcare, finance)
  • Different sizes (small-cap, mid-cap, large-cap)
  • Different economic regimes (recession 2008–2009, bull market 2012–2020, rate shock 2022)
  • Mix of successes, failures, and borderline cases

Example portfolio for backtest:

  • Microsoft (large-cap, steady compounder) → 2016 valuation
  • Best Buy (mature, steady-state) → 2012 valuation
  • Chipotle (high growth, execution risk) → 2015 valuation
  • Walgreens (declining industry) → 2015 valuation
  • Energy Transfer (cyclical, debt-heavy) → 2020 valuation

Step 2: Gather Historical Input Data

For each company at a chosen date (e.g., Jan 1, 2016), collect:

  • Historical financials (last 5–10 years of actual results)
  • Management guidance and analyst consensus
  • Industry growth rates and peer multiples
  • Risk-free rate, market risk premium, company beta at that date
  • Stock price at valuation date (your starting point)

Step 3: Build the Model

Run your standard DCF model with these historical inputs. Assume you're sitting at Jan 1, 2016, and you don't know what happens next. Project forward 5–10 years using only information available in 2016.

Example: Microsoft on Jan 1, 2016

  • Historical growth (2010–2015): ~13% revenue CAGR
  • 2016 management guidance: 6–8% growth (cloud transition)
  • Historical EBIT margin: 33% (down from 35% due to mix shift to lower-margin Azure)
  • 2016 consensus analyst growth: 7%
  • Beta (2016): 0.95; risk-free rate: 1.7%; MRP: 6%
  • WACC: ~8.2%

Your DCF (with 2016 data): fair value of $47–$55/share.
Actual stock price, Jan 1, 2016: $52.40.
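A minimal sketch of Step 3's mechanics, assuming a simple two-stage per-share DCF. The starting FCF per share ($2.35) is an illustrative figure chosen so the output lands inside the $47–$55 band, not Microsoft's reported number; only the 7% growth, 8.2% WACC, and 2.5% terminal growth come from the example above.

```python
def dcf_per_share(fcf0, growth, wacc, terminal_growth, years=5):
    """Two-stage DCF: explicit growth for `years` years, then a growing perpetuity."""
    pv = 0.0
    fcf = fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth                 # project next year's FCF per share
        pv += fcf / (1 + wacc) ** t       # discount back to the valuation date
    # Gordon-growth terminal value on the year after the explicit horizon
    terminal = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
    pv += terminal / (1 + wacc) ** years
    return pv

# Hypothetical 2016-style inputs: 7% growth, 8.2% WACC, 2.5% terminal growth.
value = dcf_per_share(fcf0=2.35, growth=0.07, wacc=0.082, terminal_growth=0.025)
print(round(value, 2))  # lands inside the $47-55 fair-value band
```

The point is not the specific number but the discipline: every input must be something you could have known on Jan 1, 2016.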

Step 4: Track Actual Outcomes

Let time pass. Record what actually happened:

  • Stock price over next 3, 5, 10 years
  • Actual revenue growth achieved
  • Actual margin trajectory
  • Any major surprises (M&A, disruption, secular decline)

Microsoft actual outcomes:

  • Stock price Jan 1, 2021 (5 years later): $158.08
  • Stock price Jan 1, 2024 (8 years later): $371.20
  • Actual revenue CAGR 2016–2023: 13.7%
  • Actual EBIT margin trend: 33% → 37% (expansion, not compression)

Step 5: Calculate Backtest Metrics

For each historical case, compute:

| Metric | Calculation | Interpretation |
| --- | --- | --- |
| Accuracy (Price) | (Your Fair Value - Actual Price) / Actual Price | % error; negative = your fair value sits below the market price, positive = above |
| Accuracy (5-Yr Return) | (Actual Price Yr 5 - Price Yr 0) / Price Yr 0 | Did the market eventually agree with you? |
| Direction | Sign of the above | Did you call up/down correctly? |
| Magnitude Error | ABS(Actual Growth - Your Growth Forecast) | How far off on key drivers? |
| Time-to-Convergence | Years before the actual price neared your valuation | Did the market ratify your thesis? |
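The Step 5 metrics can be computed mechanically once each case is stored as a simple record; the function and field names here are illustrative, not a prescribed schema.

```python
def backtest_metrics(fair_value, price_0, price_5y, growth_forecast, growth_actual):
    """Per-case backtest metrics. Prices are per share; growth rates are decimal CAGRs."""
    # Negative: your fair value sat below the market price at entry.
    valuation_error = (fair_value - price_0) / price_0
    # Did the market eventually agree with you?
    return_5y = (price_5y - price_0) / price_0
    # Strict point-estimate direction; a fair-value *range* straddling the
    # entry price (as with Microsoft's $47-55 band) softens this call.
    direction_correct = (fair_value > price_0) == (price_5y > price_0)
    # How far off on the key driver?
    magnitude_error = abs(growth_actual - growth_forecast)
    return {
        "valuation_error": valuation_error,
        "return_5y": return_5y,
        "direction_correct": direction_correct,
        "magnitude_error": magnitude_error,
    }

# Microsoft case: $51 fair value, $52.40 entry, $158.08 five years later,
# 7% forecast vs. 13.7% actual revenue CAGR
m = backtest_metrics(51, 52.40, 158.08, growth_forecast=0.07, growth_actual=0.137)
print(m)
```

Run over all 10–20 cases, these records feed directly into the summary table below.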

Microsoft Backtest Example:

| Metric | Your Model | Actual | Error |
| --- | --- | --- | --- |
| DCF fair value (Jan 1, 2016) | $51/sh | -- | -- |
| Stock price, Jan 1, 2016 | -- | $52.40 | Fair value ~3% below entry price |
| Stock price, Jan 1, 2024 | -- | $371.20 | 608% actual return, far beyond your estimate |
| Revenue growth 2016–2023 | 7% CAGR | 13.7% CAGR | Underestimated by 6.7 pp (actual ~2x forecast) |
| 2023 EBIT margin | 32–33% | 37% | Underestimated by 4–5 pp |

Interpretation: Your model called the direction (buy) correctly, but dramatically underestimated both growth and margin expansion. You would have profited, but captured far less upside than the thesis warranted. Question: Why did you underestimate Azure growth? (Answer: Hard to forecast from 2016 consensus, which underestimated cloud growth universally.)

Building a Backtest Summary

Aggregate results across your 10–20 test cases:

| Company | Valuation Date | Your Fair Value | Actual Price (Entry) | Actual Return (3Y) | Actual Return (5Y) | Your Call | Outcome |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Microsoft | 2016-01-01 | $51 | $52.40 | 40% | 202% | BUY (slightly undervalued) | Correct, but missed upside |
| Best Buy | 2012-10-01 | $28 | $26.50 | 45% | 76% | BUY (undervalued) | Correct direction (conservative on entry) |
| Chipotle | 2015-01-01 | $645 | $635 | -15% | 242% | HOLD/BUY | Correct (but volatility surprise) |
| Walgreens | 2015-01-01 | $85 | $80 | -18% | -45% | HOLD/SELL | Correct (secular decline) |

Summary Stats:

  • Accuracy on direction: 18/20 correct (90%)
  • Accuracy on magnitude (% error): Average 25%, Median 18%, Range 5%–60%
  • Under/overestimate: 12 underestimates (too conservative), 8 overestimates (too aggressive)
  • Time to convergence: Average 2.3 years
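Aggregating per-case results into summary stats like these is straightforward. The list of records below is illustrative, not the actual 20-case sample.

```python
from statistics import mean, median

# Illustrative per-case records: (direction_correct, pct_error), where pct_error
# is the signed valuation error (fair value - entry price) / entry price.
cases = [(True, -0.03), (True, 0.18), (False, 0.45), (True, -0.12),
         (True, 0.25), (True, -0.08), (True, 0.30), (False, -0.55)]

directional_hit_rate = sum(ok for ok, _ in cases) / len(cases)
abs_errors = [abs(e) for _, e in cases]
# Negative signed error: fair value below price, i.e. the conservative side
underestimates = sum(e < 0 for _, e in cases)

print(f"Direction: {directional_hit_rate:.0%}")
print(f"Magnitude error: mean {mean(abs_errors):.0%}, median {median(abs_errors):.0%}")
print(f"Underestimates: {underestimates}/{len(cases)}")
```

The same few lines scale unchanged from 8 toy cases to your full 20-company sample.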

Insight: Your model has a 90% directional hit rate but systematically underestimates growth and margin expansion in winners. You're too conservative on compounding businesses.

Diagram: Backtest Feedback Loop

Common Backtest Mistakes

Overfitting to Past Results

If you tweak your model until it perfectly matches every past company, you've optimized for history, not the future. Use backtest insights to refine rules and ranges, not to lock in specific numbers.

Example: "My model called Microsoft growth at 7% but it actually did 13%, so I'll now assume 13% growth for all cloud companies." Wrong. Instead: "My 2016 model underestimated cloud TAM expansion. I'll now research TAM estimates more deeply before setting growth."

Survivorship Bias

Only testing on companies still around in 2024 ignores the failures. A backtest should include bankruptcies, mergers, and dead companies. "How would my model have warned me about Blockbuster? Did I have mechanisms to catch secular decline?"

Cherry-Picking Historical Dates

Testing 10 companies on dates that happen to be valuation peaks or troughs skews results. Use random dates or systematic dates (Jan 1 every year) to avoid bias.
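Generating systematic dates removes the temptation to hand-pick favorable entry points; the year range here is an arbitrary illustration.

```python
from datetime import date

# Systematic valuation dates: Jan 1 of every year, rather than hand-picked highs/lows
valuation_dates = [date(year, 1, 1) for year in range(2008, 2021)]
print(len(valuation_dates), valuation_dates[0], valuation_dates[-1])
```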

Ignoring Regime Shifts

A model may work in bull markets but fail in bear markets. Backtest across at least two different economic regimes. A 2008–2009 test and a 2015–2020 test tell you whether your model is robust.

Not Separating Process from Luck

A correct call on one stock may be luck (you guessed the right outcome for the wrong reasons). Look for patterns: Do you consistently underestimate certain categories (growth, margin expansion)? That's signal. One-off correct calls are noise.

Advanced: Sensitivity Backtest

Beyond comparing valuation to price, backtest your sensitivity analysis:

Question: "If I had run a sensitivity table in 2016 and placed Microsoft in the middle case, would I have been directionally right about uncertainty?"

Your 2016 sensitivity table for Microsoft:

| Case | Growth Assumption | Margin Assumption | Intrinsic Value |
| --- | --- | --- | --- |
| Downside | 4% | 30% | $38 |
| Base | 7% | 33% | $51 |
| Upside | 10% | 35% | $68 |
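A sensitivity table like this can be generated by re-running the valuation over each assumption pair. The two-stage model and the margin-scaling trick below are stand-ins for your own model (the $2.35 starting FCF per share is an illustrative figure), so they reproduce the ordering and rough levels of the table rather than its exact values.

```python
def dcf_per_share(fcf0, growth, wacc, terminal_growth, years=5):
    """Two-stage DCF: explicit growth for `years` years, then a growing perpetuity."""
    pv, fcf = 0.0, fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth
        pv += fcf / (1 + wacc) ** t
    terminal = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
    return pv + terminal / (1 + wacc) ** years

# Sweep the growth/margin pairs from the table; margin scales starting FCF
# relative to the 33% base case.
cases = {"Downside": (0.04, 0.30), "Base": (0.07, 0.33), "Upside": (0.10, 0.35)}
values = {name: dcf_per_share(2.35 * margin / 0.33, growth,
                              wacc=0.082, terminal_growth=0.025)
          for name, (growth, margin) in cases.items()}
print({name: round(v, 2) for name, v in values.items()})
```

Because the sweep is mechanical, widening a case later (per the refined rule below this section) is a one-line change to the `cases` dict.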

Actual Outcome (2024): Value dramatically exceeded upside case.

Insight: Your uncertainty bands were too narrow. You didn't consider the possibility of 14%+ growth + 37% margins (cloud expansion). This points to a bias: Your process underestimates what high-quality businesses can achieve in favorable conditions.

Refined rule: "For companies with strong secular tailwinds (cloud, AI, cost-of-capital dynamics), widen upside case growth by 50% and margin expansion by 100–200 bps."

Backtesting Against Peer Methods

Compare your DCF not just to actual outcomes but to other valuation methods:

2016 Microsoft Backtest:

  • Your DCF: $51/share
  • Peer-based P/E multiple (33x trailing): $52/share
  • Historical average P/E (30x): $50/share
  • Price-to-sales (8x): $54/share
  • All methods clustered near actual price ($52)

This gives you confidence. If your DCF had said $80 while multiples said $50, that would be a warning flag.
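This cross-check can be automated as a simple divergence flag; the 20% threshold below is an arbitrary illustration, not a standard cutoff.

```python
def method_divergence(dcf_value, peer_values, threshold=0.20):
    """Gap between the DCF and the average of peer-method estimates,
    plus a warning flag when the gap exceeds `threshold`."""
    consensus = sum(peer_values) / len(peer_values)
    gap = (dcf_value - consensus) / consensus
    return gap, abs(gap) > threshold

# 2016 Microsoft: DCF $51 vs. multiple-based estimates clustered near $52
gap, warning = method_divergence(51, [52, 50, 54])
# A DCF of $80 against the same multiples would trip the flag
gap2, warning2 = method_divergence(80, [52, 50, 54])
print(f"{gap:+.1%} warning={warning}; {gap2:+.1%} warning={warning2}")
```

A tripped flag doesn't mean the DCF is wrong, only that the divergence deserves an explanation before you act on it.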

FAQ

Q: How many years back should I backtest?
A: At least 10 years, ideally across one full market cycle (bull, bear, recovery). More than 20 is rarely necessary; you see the patterns by then.

Q: Can I backtest on the same stocks I currently hold?
A: Dangerous; you risk justifying past buys rather than learning. Include companies you rejected or didn't own. Test on dead companies and bankruptcies especially: what would your model have told you?

Q: What's a "good" backtest error rate?
A: Directional accuracy of 75%+. Magnitude error within 20% is solid; 30% is acceptable. If you're off by 50% on half your cases, refine the methodology before deploying real capital.

Q: Should I backtest on data I used to train my model?
A: No. Use data you didn't see when building assumptions. If you trained on 2015 data, backtest on 2012 or 2008.

Q: What if I realize my model is wrong?
A: A backtest is a test, not a guarantee. If results are poor, audit your model: Are the assumptions reasonable? Is the three-statement logic correct? Are there systematic biases in how you set growth or discount rates? Fix the root cause, not the output.

Q: How often should I backtest?
A: Annually if you actively value stocks, and whenever you change your methodology significantly. Backtesting is not a one-time exercise; it's part of continuous improvement.

Related Concepts

  • Forecast Accuracy: How close predictions are to outcomes
  • Out-of-Sample Testing: Validating on data the model wasn't trained on
  • Walk-Forward Analysis: Rolling-window backtest where you use progressively newer data
  • Stress Testing: Forcing extreme scenarios; often paired with backtesting
  • Model Validation: Broader framework for checking model integrity

Summary

Backtesting bridges the gap between elegant theory and messy reality. A model that performs brilliantly on historical companies builds confidence; one that fails reveals blind spots. The goal isn't perfect prediction (impossible) but pattern recognition: Where do I consistently err? What categories of companies or regimes does my process miss? What should I adjust?

Use backtests not to over-optimize past results but to refine your assumption-setting process, tighten your uncertainty bands, and build honest conviction in your methodology. A model you've vetted on 15 past cases is far more trustworthy than one you've never tested.

Before you commit significant capital on a valuation, ask yourself: "Would this methodology have worked in the past? Where did it fail? Have I fixed those failures?" Backtest answers those questions.

Next Steps

Your model is tested, documented, and empirically validated. The final frontier: Maintaining Your Models covers how to evolve your valuations as new information arrives, updating efficiently without losing discipline, and knowing when to kill a thesis that no longer fits the facts.