How to Measure Your Confidence: Quantifying Certainty in Investing

Pomegra Learn

How Do You Accurately Measure Your Confidence in Investment Decisions?

Measuring confidence sounds straightforward until you attempt it systematically. Most investors feel confident or skeptical intuitively, without translating those feelings into measurable quantities. This gap between intuitive certainty and measurable confidence creates room for overconfidence bias to operate undetected. Measuring your confidence means translating vague feelings into numerical probability estimates, then tracking whether your predictions actually align with outcomes. The discipline of measurement forces you to confront the gap between felt certainty and predictive accuracy.

Quick definition: Measuring confidence means converting your subjective certainty about an investment outcome into a numerical probability estimate (typically 0-100%), then systematically comparing those estimates to actual results over time to assess calibration.

Key takeaways

Confidence measurement requires converting intuitive certainty into probability estimates between 0-100%
Most investors are poorly calibrated, expressing more confidence than their track record justifies
Confidence levels should map to frequency: a 70% confidence estimate should be correct about 70% of the time
Simple confidence scales (high/medium/low) fail to detect overconfidence because they're too coarse
Professional investors, despite more experience, are typically worse calibrated than novices because they face stronger overconfidence incentives
Systematic measurement and feedback loops are the only reliable methods to improve calibration

The Difference Between Confidence and Calibration

Confidence is your subjective certainty about an outcome. Calibration is whether that subjective certainty matches objective frequency. A perfectly calibrated investor expresses 60% confidence in outcomes that actually occur 60% of the time. An overconfident investor expresses 75% confidence in outcomes that actually occur 45% of the time.

Most investors conflate these concepts. They believe that if they carefully analyze a decision, their confidence is justified. They feel confident, therefore they are calibrated. This is a critical error. Careful analysis can increase confidence without increasing accuracy. A value investor who spends six months analyzing a company and concludes it's undervalued by 30% might express 80% confidence. If that company actually declines 40% over the next two years, the investor's careful analysis created overconfidence, not calibration.

The distinction between confidence and calibration separates intuitive investors from disciplined ones. Intuitive investors track whether their confident predictions made money (outcome-focused). Disciplined investors track whether their probability estimates matched actual frequencies (calibration-focused). These produce radically different learning patterns.

Consider two portfolio managers:

Manager A makes 20 high-conviction stock picks, expresses 75% confidence in each, and 16 are correct. They feel validated because most of their picks worked.
Manager B makes 20 stock picks, expresses 60% confidence in each, and 14 are correct. They feel disappointed because six picks failed and their hit rate trailed Manager A's.

Neither reaction says anything about calibration. Manager A's actual hit rate was 80% (16 of 20), five points above the stated 75%. Manager B's was 70% (14 of 20), ten points above the stated 60%. Both managers were slightly underconfident, and Manager A's estimate was the closer of the two, though with only 20 picks each neither gap is large enough to be conclusive. Feeling validated or disappointed by the count of winners tells you nothing about whether the stated probabilities were right. Outcome-focused feedback is terrible for improving calibration.
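A minimal Python sketch of the calibration check for the two managers. The function name `calibration_gap` is illustrative, not an established term. It compares the hit rate to the stated confidence and ignores whether the picks made money.

```python
def calibration_gap(stated_confidence: float, correct: int, total: int) -> float:
    """Hit rate minus stated confidence.

    Positive: outcomes beat the stated confidence (underconfident).
    Negative: outcomes fell short of it (overconfident).
    """
    return correct / total - stated_confidence


gap_a = calibration_gap(0.75, correct=16, total=20)  # Manager A
gap_b = calibration_gap(0.60, correct=14, total=20)  # Manager B

print(f"Manager A gap: {gap_a:+.2f}")  # +0.05
print(f"Manager B gap: {gap_b:+.2f}")  # +0.10
```

A gap near zero is the goal. With only 20 picks per manager, gaps of this size are within what chance alone could produce.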

Creating a Probability Scale

The first practical step in measuring confidence is establishing a shared probability scale. Professional forecasters typically use a 0-100% scale with anchors at specific confidence levels:

0-10%: You believe the outcome is extremely unlikely but acknowledge non-zero probability.
20-30%: You believe the outcome is probably not going to happen, but significant uncertainty remains.
40-50%: Genuine uncertainty; you're essentially unable to predict better than chance.
60-70%: You believe the outcome is more likely than not, but substantial doubt exists.
80-90%: You believe the outcome is highly likely, but acknowledge meaningful exceptions.
95-100%: You believe the outcome is virtually certain, reserved only for near-tautologies.

The critical feature of this scale is the distribution: you should rarely express confidence above 80% for non-trivial predictions. Overconfident investors bunch their estimates at 75-85% across diverse predictions, which immediately signals miscalibration.
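One way to check for bunching is to count how many of your recorded estimates fall in the 75-85% band. The sketch below is illustrative only: the function name `share_in_band` and the sample estimates are made up, and what share counts as "too many" is a judgment call the text does not specify.

```python
def share_in_band(confidences, low=75, high=85):
    """Fraction of confidence estimates (in percent) that fall within [low, high]."""
    inside = [c for c in confidences if low <= c <= high]
    return len(inside) / len(confidences)


# Hypothetical estimates from ten unrelated predictions
estimates = [80, 75, 85, 80, 60, 78, 82, 80, 55, 80]

share = share_in_band(estimates)
print(f"{share:.0%} of estimates sit in the 75-85% band")
```

If most estimates land in one narrow band regardless of the question, the numbers probably reflect habit more than evidence.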

A pharmaceutical company exec asks your confidence that their new drug will receive FDA approval. You research the clinical trial data, the competitive landscape, the regulatory timeline. You conclude: 65% confidence. This probability estimate contains specific information about what you've learned (more positive than 50-50 chance, but meaningful risk remains).

Compare this to a casual investor's typical approach: "I've done my research and I'm confident the stock will go up." This conveys no measurable information. It's equally consistent with 55% confidence or 80% confidence. The lack of specificity permits overconfidence to hide.

The Anchoring Problem in Self-Assessment

When you estimate your own confidence, you face a systematic bias: your confidence estimate tends to anchor on the confidence you're asked to express. If I ask "What's your confidence this stock will outperform?" you naturally land somewhere in the 60-80% range because that feels confident without being absurd. You're not processing the actual evidence; you're anchoring on the social normality of that confidence range.

Research from the University of Pennsylvania found that professional forecasters ask too many questions about "what" (what will happen?) and too few about "what not" (what evidence would prove me wrong?). When forecasters explicitly write down the scenario that would falsify their thesis, their confidence estimates become measurably more accurate.

To overcome anchoring, follow this protocol:

Without thinking about confidence, list the specific reasons why the outcome might not occur.
Estimate the frequency of each reason based on historical data.
Only then estimate your overall confidence as the complement.

Suppose you're considering a real estate development that depends on a zoning variance. Rather than asking "What's my confidence the variance will be approved?" follow this process:

Reason it fails: The planning board rejects the application at the objection of neighborhood groups. Variance denial rate: 35% in this city.
Reason it fails: The applicant's funding falls through after initial board review. Project funding failure rate: 8%.
Reason it fails: The applicant must re-file due to changed circumstances. Re-filing rate: 12%.

Using these base rates, you estimate your confidence at roughly 50%. Subtracting the three rates directly gives 45% (1 - 0.35 - 0.08 - 0.12), which assumes the failure modes never overlap; treating them as independent gives about 53% (0.65 × 0.92 × 0.88). Either way, the result is lower than your intuitive assessment of 65%. The 15-point difference represents the anchoring bias you had to overcome through systematic reasoning.
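The arithmetic for the zoning example can be sketched in Python. Two assumptions are shown side by side, neither of which the text commits to: that the failure modes never overlap (simple subtraction), or that they are independent (multiplying the survival probabilities). The dictionary keys are shorthand labels for the three reasons listed.

```python
from math import prod

# Base rates for each way the project could fail, from the example
failure_rates = {
    "variance denied": 0.35,
    "funding falls through": 0.08,
    "must re-file": 0.12,
}

# Assumption 1: failure modes never overlap, so their rates simply add up
additive = 1 - sum(failure_rates.values())

# Assumption 2: failure modes are independent, so survival chances multiply
independent = prod(1 - rate for rate in failure_rates.values())

print(f"No-overlap estimate:  {additive:.0%}")     # 45%
print(f"Independent estimate: {independent:.0%}")  # 53%
```

The two assumptions bracket the "roughly 50%" figure. Real failure modes are usually correlated, so the result is a starting point to adjust, not a precise answer.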

Tracking Calibration Over Time

The only way to improve your confidence measurement is to track your predictions systematically and compare estimates to outcomes. This requires establishing a forecast log where you record:

The prediction (clear, specific outcome)
The confidence level (0-100%)
The time horizon (when will this be resolved?)
The outcome (what actually occurred)
The calibration error (was your estimate higher or lower than the actual frequency?)
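A forecast log along these lines could be kept as a small Python structure. This is a sketch under assumptions: the class and field names are invented for illustration, the sample entries are hypothetical, and the Brier score (the mean squared difference between stated probability and the 0/1 outcome) is a standard scoring rule swapped in here as one way to put a number on error. The text itself does not name a scoring rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Forecast:
    prediction: str                 # clear, specific outcome
    confidence: float               # probability between 0.0 and 1.0
    resolves_on: date               # time horizon
    outcome: Optional[bool] = None  # None until the prediction resolves


def brier_score(log):
    """Mean squared error of resolved forecasts; 0.0 is perfect, lower is better."""
    resolved = [f for f in log if f.outcome is not None]
    if not resolved:
        raise ValueError("no resolved forecasts to score")
    return sum((f.confidence - float(f.outcome)) ** 2 for f in resolved) / len(resolved)


# Hypothetical entries; the third is still unresolved and is skipped
log = [
    Forecast("ACME beats Q3 earnings consensus", 0.65, date(2025, 11, 1), True),
    Forecast("ACME outperforms its sector over six months", 0.70, date(2026, 3, 1), False),
    Forecast("Zoning variance approved", 0.50, date(2026, 6, 1)),
]

score = brier_score(log)
print(f"Brier score: {score:.3f}")
```

Recording the confidence at the time of the prediction, before the outcome is known, is what keeps the log honest. Always answering 50% scores 0.25, which makes a useful reference point.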

After every 20-30 predictions, analyze your calibration. Create a table comparing your estimated confidence to your actual success rate:

Estimated Confidence    Actual Success Rate    Count
50-55%                 48%                    12
60-65%                 72%                    18
70-75%                 82%                    15
80-85%                 91%                    8
90-95%                 93%                    3

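This table reveals your calibration profile. If your 60-65% predictions are correct 72% of the time, you're underconfident in that range. If your 80-85% predictions are correct 91% of the time, you're underconfident there as well. Overconfidence shows up the other way around: a success rate below the stated confidence, as in the 50-55% row, where 48% falls just short. The pattern across the full distribution shows where you most systematically misjudge certainty. Treat rows with small counts, such as the three predictions in the 90-95% row, as provisional.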
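A table like this can be built from the log with a few lines of Python. The sketch below assumes each record is a (confidence in percent, correct or not) pair and groups records into 10-point bands; the function name, the band width, and the sample records are all illustrative choices.

```python
from collections import defaultdict


def calibration_table(records, bucket_width=10):
    """Group (confidence_percent, was_correct) pairs into confidence bands.

    Returns rows of (band_low, band_high, success_rate, count), sorted by band.
    """
    buckets = defaultdict(list)
    for confidence, correct in records:
        low = int(confidence // bucket_width) * bucket_width
        low = min(low, 100 - bucket_width)  # keep 100% inside the top band
        buckets[low].append(correct)
    return [
        (low, low + bucket_width, sum(outcomes) / len(outcomes), len(outcomes))
        for low, outcomes in sorted(buckets.items())
    ]


# Hypothetical resolved predictions
records = [
    (60, True), (62, True), (65, False),
    (70, True), (75, True), (72, False), (78, True),
    (85, True),
]

rows = calibration_table(records)
for low, high, rate, count in rows:
    print(f"{low}-{high}%   actual {rate:.0%}   n={count}")
```

Compare each band's actual rate with the band itself: a rate above the band means the estimates were too cautious, and a rate below it means they were too confident.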

Most professional investors are overconfident in the 70-85% range. This is the "confidence sweet spot" where their analysis feels most convincing and their career incentives push them hardest to express high conviction. Yet their actual success rate in that range is typically 60-70%—roughly 10-15 percentage points below their stated confidence.

Different Confidence Contexts

Investment confidence varies by context, and different contexts require different measurement approaches. A short-term trade confidence differs from a multi-year thesis confidence. An earnings forecast confidence differs from a sector rotation confidence.

Company-specific predictions (Will this stock outperform the sector in six months?) typically yield better calibration because you're comparing outcomes in a narrow scope. A stock either outperforms the sector or it doesn't. Historical accuracy of such predictions is measurable.

Macro predictions (Will rates rise next year?) typically yield worse calibration because many correlated variables affect outcomes. Your prediction might be correct directionally but wrong in timing, magnitude, or implementation. Did you predict correctly if rates rose 50 basis points instead of the 100 basis points you expected?

Binary outcomes (Will this company beat earnings?) are easier to measure. Your 65% confidence either was justified by a beat, or it wasn't.

Range predictions (Earnings will be between $2.40 and $2.60 per share) require different calibration assessment. Did you set your range too narrow (claiming precision you didn't have, so actual results fall outside it too often)? Too wide (capturing results so reliably that the range says little)?

Each context requires adapted measurement. The discipline is consistent—translate confidence to probability, track outcomes, assess calibration—but the implementation varies.
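For range predictions in particular, one simple check is coverage: how often the actual result landed inside the stated range. The Python sketch below uses hypothetical earnings-per-share ranges and an invented function name, and assumes each range was meant to carry the same confidence level.

```python
def interval_coverage(forecasts):
    """Share of (low, high, actual) forecasts where the actual fell inside the range."""
    hits = sum(low <= actual <= high for low, high, actual in forecasts)
    return hits / len(forecasts)


# Hypothetical EPS range forecasts: (low, high, actual result)
eps_forecasts = [
    (2.40, 2.60, 2.55),  # inside
    (1.10, 1.30, 1.42),  # above the range
    (0.80, 0.95, 0.90),  # inside
    (3.00, 3.40, 2.85),  # below the range
]

coverage = interval_coverage(eps_forecasts)
print(f"Coverage: {coverage:.0%}")  # 50%
```

If these ranges were meant as 80% intervals, coverage of 50% would suggest they were too narrow. Coverage near 100% would suggest they were wider than the stated confidence required.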

The Expertise Paradox in Confidence Measurement

Here's a counterintuitive finding: professional investors are worse calibrated than random forecasters. A 2015 meta-analysis of forecasting accuracy found that expert forecasters with deep domain knowledge showed worse calibration than novices making random guesses. Why? Experts have stronger incentives to express high confidence (career reasons), stronger beliefs in their mental models (expertise trap), and more sophisticated-sounding justifications (confidence in language).

An index-fund investor assessing confidence in stock picks feels comfortable saying "I'm 50% confident because I don't actually know." A legendary value investor assessing confidence says "I'm 75% confident based on my 30-year track record and proprietary analysis." Both might face identical market randomness; the value investor's calibration is likely worse because their professional status and financial success create stronger overconfidence incentives.

This paradox suggests that the most dangerous investors are simultaneously the most expert and the least self-aware about their calibration gaps. Measurement becomes more critical the more expertise you accumulate, precisely because expertise creates stronger overconfidence.

Real-world examples

David Tice and the Deflationary Thesis: Tice, a respected hedge fund manager, expressed high confidence (approximately 80%) in a deflationary scenario throughout the 2000s. He made substantial fund bets on deflation, shorting equities and buying Treasury bonds. His analysis was sophisticated and his reasoning sound. But his prediction proved wrong for nearly a decade, causing severe underperformance. Tice never measured his calibration or acknowledged that his confidence exceeded his accuracy rate. A more moderate confidence level, around 60%, would have been more appropriate for such an uncertain, long-duration prediction.

The Dot-Com Boom Analyst Confidence: Equity analysts rating dot-com stocks expressed average confidence of 82% in "buy" ratings at the peak of the 2000 bubble, according to research from Indiana University. Actual success rate of those recommendations: 34%. Their confidence was catastrophically miscalibrated. If they had measured calibration, they would have discovered that "buy" ratings at that market stage corresponded to a 34% success rate, not 82%.

Warren Buffett's Calibration Success: Buffett has explicitly stated that his major investment decisions are made with 70-80% confidence, not higher. When he discusses the Berkshire acquisition or major portfolio additions, he talks about "high conviction with meaningful downside risk." This language reflects appropriate calibration. His actual track record suggests his confidence levels are reasonably aligned with outcome frequencies—not because he's always right, but because he's systematically measured and adjusted his confidence estimates.

Lehman Brothers Risk Models: Lehman's quantitative risk team expressed confidence in their value-at-risk models that proved catastrophically miscalibrated. Their models suggested 99% confidence that daily losses wouldn't exceed $200 million. In September 2008, the firm experienced losses exceeding $2 billion in a single week—far outside their estimated confidence bounds. The calibration error wasn't in the analysis; it was in the confidence interval assigned to the analysis.

Common mistakes

Mistake 1: Assuming past accuracy validates future confidence. You made 5 stock picks in 2022 with 70% expressed confidence and 4 were correct. This doesn't mean 70% confidence is appropriate going forward. You might have benefited from market regime tailwinds. Your sample size is too small to establish calibration. Only after 20-30 predictions across different market conditions can you assess whether your confidence levels are appropriate.

Mistake 2: Conflating confidence with analysis quality. Careful, detailed analysis should reduce overconfidence, not increase it. If your six-month deep dive into a company makes you MORE confident (moving from 60% to 80%), that's often a red flag. Deep analysis should reveal more edge cases and scenarios where you're wrong, which should reduce confidence or at minimum keep it stable.

Mistake 3: Using confidence ranges instead of point estimates. Saying "I'm confident the stock will be between $40 and $60" is too vague for calibration. The range carries no stated probability: $40 and $60 might be your 25th and 75th percentiles, or your 5th and 95th, and the statement does not say which. Specific point estimates ("$52") force clearer thinking about confidence.

Mistake 4: Measuring confidence only on winning positions. This creates selection bias in your feedback. Your winning positions obviously were confident calls; your losing positions might have been equally confident but experienced unfavorable randomness. You need to track predictions whether they win or lose to measure calibration accurately.

Mistake 5: Adjusting confidence retroactively after outcomes are known. Once you know the result, your "confidence" becomes contaminated with hindsight bias. You'll systematically adjust down your stated confidence on losses ("I was never that confident") and adjust up on wins ("I was even more confident than I realized"). Only pre-outcome confidence measurement avoids this distortion.
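The sample-size point in Mistake 1 can be made concrete with a confidence interval on the hit rate. The Python sketch below uses the Wilson score interval, a standard choice for small samples that the text does not itself name, with z = 1.96 for an approximate 95% interval.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Mistake 1's example: 4 correct out of 5 picks
low_5, high_5 = wilson_interval(4, 5)
print(f"4 of 5:   {low_5:.0%} to {high_5:.0%}")

# The same hit rate over a larger, hypothetical sample of 30 picks
low_30, high_30 = wilson_interval(24, 30)
print(f"24 of 30: {low_30:.0%} to {high_30:.0%}")
```

With five picks the plausible range for the true hit rate is very wide, so the record is consistent with the stated 70% and with much lower or higher skill. The interval narrows as predictions accumulate, which is why a larger log is needed before drawing conclusions.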

FAQ

Is 50% confidence ever the right answer, or does it signal bad analysis?

50% confidence is absolutely correct for some decisions. When genuine uncertainty exists and evidence points equally in both directions, 50% represents accurate calibration, not bad analysis. The error is assuming 50% is rare—it should appear 15-25% of the time in your prediction log. If it never appears, you're probably overconfident about your ability to predict.

How do I measure confidence for long-term predictions that take five years to resolve?

Long-term predictions require interim checkpoints. Rather than waiting five years to assess a "company will dominate its market" prediction, establish quarterly or annual milestones. After each milestone, reassess your confidence. If Year 1 results suggest your prediction is wrong, update your confidence down before the five-year mark. This prevents long-term overconfidence from festering unnoticed.

Should I measure confidence differently for probability-based outcomes versus yes-no outcomes?

Yes, the measurement protocol differs but the principle is identical. For probability outcomes (the Fed will raise rates), you measure whether your confidence matches the actual frequency. For yes-no outcomes (Will Company X achieve profitability?), you measure whether your confidence matches the success rate. The method adapts but the calibration principle remains: does your stated confidence match how often you turn out to be right?

What's a reasonable confidence range for professional investors?

Research suggests well-calibrated professional investors express confidence that ranges from 30% to 85%, with a mean around 55-60%. They rarely exceed 85% because truly confident bets are rarer than confidence language suggests. If your confidence distribution is 70-80% with rare excursions, you're likely overconfident. If it's 40-70% with frequent 30-40% assessments, you're probably closer to well-calibrated.

How does recency bias affect my confidence measurement?

Recency bias causes recent outcomes to disproportionately affect your confidence assessment. If your last three high-confidence predictions were correct, you'll express higher confidence going forward—even if that success was due to market conditions, not analysis quality. Combat this by tracking calibration over rolling 20-30 prediction windows, not just recent outcomes.
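A rolling-window check might look like the following Python sketch. It assumes records are stored in chronological order as (confidence, correct) pairs with confidence between 0 and 1. The function name is illustrative, and the example uses a window of 4 only to keep the data short; the 20-30 window suggested above would be the practical setting.

```python
def rolling_calibration_gap(records, window=20):
    """Mean stated confidence minus hit rate for each full window of predictions.

    Positive values indicate overconfidence within that window.
    """
    gaps = []
    for end in range(window, len(records) + 1):
        chunk = records[end - window:end]
        mean_confidence = sum(conf for conf, _ in chunk) / window
        hit_rate = sum(correct for _, correct in chunk) / window
        gaps.append(mean_confidence - hit_rate)
    return gaps


# Hypothetical chronological log: early wins followed by losses
records = [
    (0.7, True), (0.7, True), (0.7, False),
    (0.7, True), (0.7, False), (0.7, False),
]

gaps = rolling_calibration_gap(records, window=4)
print([round(g, 2) for g in gaps])
```

Looking at the gap per window, not at the last few results, makes it harder for a short streak to move your stated confidence.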

Can I measure confidence collectively across a team, or only individually?

Both are valuable but for different reasons. Individual measurement reveals personal calibration patterns and trains individual discipline. Team measurement reveals whether shared mental models are driving overconfidence. Collective overconfidence (entire team expressing 75% confidence in a macro call that's wrong) is more dangerous than individual overconfidence because it concentrates risk.

Summary

Measuring confidence transforms vague certainty feelings into measurable probability estimates, enabling systematic learning about your calibration. Most investors are poorly calibrated, expressing more confidence than their outcomes justify, particularly in the 70-85% confidence range. The only reliable method to improve calibration is maintaining a forecast log where you record predictions, their confidence estimates, and actual outcomes, then analyzing whether your 70% estimates succeed 70% of the time. Professional investors face stronger overconfidence incentives than novices due to career structures rewarding confidence. By tracking calibration systematically—comparing your probability estimates to outcome frequencies—you can identify personal overconfidence patterns and adjust confidence levels in domains where you persistently overestimate certainty.

