Position Sizing Methods

Where the Kelly Criterion Comes From: The Mathematical Derivation

Pomegra Learn

Why Does the Kelly Criterion Formula Work the Way It Does?

The Kelly Criterion's power lies not in empirical happenstance, but in rigorous mathematics. To understand why the formula f* = (p × b - q) / b is the right way to size positions, we must derive it from first principles. The derivation begins with a simple insight: if you maximize the expected logarithm of wealth over many trials, the optimal bet size falls out of the calculus naturally.

This article walks through the mathematical journey from a gambler's intuition—"I want to get rich as fast as possible"—to the precise formula traders use today. Along the way, you'll encounter logarithmic utility (the lens through which Kelly sees wealth), the expectation operator (how to average outcomes across win/loss scenarios), and the derivative (the calculus tool that finds the optimal bet size). Understanding the derivation doesn't change how you apply Kelly in trading, but it transforms the formula from a mysterious recipe into a logical consequence of mathematics.

Quick definition: The Kelly Criterion is derived by maximizing the expected logarithm of final bankroll over many bets, which yields the formula f* = (p × b - q) / b. The logarithmic utility function embeds the assumption that you prefer steady, compounding growth over volatile, linear gains.

Key takeaways

The Kelly formula emerges from maximizing log-wealth (expected geometric growth) rather than linear wealth
Log-utility reflects the real-world principle that wealth is useful to you in ratios, not absolutes—doubling from $1,000 to $2,000 feels as good as doubling from $100,000 to $200,000
The derivation requires setting up an expectation equation, then using calculus to find the bet fraction that maximizes it
The critical mathematical step is taking the first derivative of log-wealth with respect to bet size and setting it equal to zero
Understanding the derivation clarifies why Kelly scales position size with your edge—better odds demand larger positions

The Setup: Bankroll, Bets, and Wealth Growth

Imagine you start with a bankroll W (your account equity). You place a bet, risking a fraction f of your bankroll:

Amount wagered: f × W
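If you win (probability p): You gain b × (f × W), where b is the net odds, meaning the profit per unit wagered (b = 2 means a winning $1 stake returns $2 of profit plus the stake itself). Your new bankroll is W + b × f × W = W × (1 + b × f).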
If you lose (probability q = 1 - p): You lose f × W. Your new bankroll is W - f × W = W × (1 - f).

After one bet, your wealth is either W × (1 + b × f) or W × (1 - f), with probabilities p and q respectively.

Now imagine you repeat this bet n times independently. Your final wealth is:

Final Wealth = W × (1 + b × f)^(n_wins) × (1 - f)^(n_losses)

Where n_wins is the number of times you win and n_losses is the number of times you lose, and n_wins + n_losses = n total bets.

For large n, the law of large numbers tells us that the win fraction n_wins / n converges to p and the loss fraction n_losses / n converges to q, so n_wins ≈ p × n and n_losses ≈ q × n. So:

Final Wealth ≈ W × (1 + b × f)^(p×n) × (1 - f)^(q×n)

This is the foundation. Now we want to find the value of f that maximizes this expression over many bets.
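To make the wealth expression concrete, here is a minimal Python sketch. The function names `final_wealth` and `replay` and the sample numbers are illustrative assumptions, not part of the original derivation. It shows that with a fixed fraction f the final bankroll depends only on how many wins and losses occur, not on their order.

```python
def final_wealth(w0, f, b, n_wins, n_losses):
    """Bankroll after betting a fixed fraction f at net odds b."""
    return w0 * (1 + b * f) ** n_wins * (1 - f) ** n_losses


def replay(w0, f, b, outcomes):
    """Apply the same bets one at a time; outcomes is a string of 'W'/'L'."""
    w = w0
    for o in outcomes:
        w *= (1 + b * f) if o == "W" else (1 - f)
    return w


# Hypothetical inputs: $1,000 bankroll, 10% fraction, 2:1 net odds.
closed_form = final_wealth(1000, 0.10, 2, n_wins=6, n_losses=4)
wins_first = replay(1000, 0.10, 2, "WWWWWWLLLL")
losses_first = replay(1000, 0.10, 2, "LLLLWWWWWW")
print(round(closed_form, 2), round(wins_first, 2), round(losses_first, 2))
```

All three numbers agree up to floating-point rounding, which is why the derivation can work with win and loss counts alone.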

Enter Logarithms: Why Log-Wealth, Not Wealth?

Taking the logarithm of both sides:

log(Final Wealth) = log(W) + p × n × log(1 + b × f) + q × n × log(1 - f)

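Why logarithms? Two reasons. First, the logarithm is an increasing function, so the f that maximizes log(Final Wealth) also maximizes Final Wealth itself, and the log turns an awkward product of powers into a sum. Second, it separates Kelly from the naive alternative of maximizing expected wealth (the linear, arithmetic average). With a positive edge, expected wealth is maximized by betting 100% of your bankroll every time, yet a single loss then wipes you out, and over many bets ruin becomes nearly certain. Real traders care about long-term compounding, not an average dominated by a few improbable lucky paths.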

Logarithmic utility captures compounding: each doubling of wealth contributes equally to your utility, regardless of starting level. A log-scale treats "$100 → $200" and "$1,000,000 → $2,000,000" identically—both are a doubling. This reflects the intuition that proportional returns matter more than absolute returns in a trader's world.

The log-utility approach is also called geometric mean maximization or expected logarithmic return optimization. It underlies most long-term compounding strategies.

The Expectation: Average Wealth Across Outcomes

Divide both sides of the equation above by n (since we want the per-bet average, not the total):

(1/n) × log(Final Wealth) = (1/n) × log(W) + p × log(1 + b × f) + q × log(1 - f)

As n gets large, (1/n) × log(W) approaches zero. We're left with:

Per-Bet Log-Growth = p × log(1 + b × f) + q × log(1 - f)

This expression is the expected logarithm of the multiplier on your wealth per bet. If this number is positive, you're compounding; if negative, you're shrinking. Kelly seeks the value of f that maximizes this.
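The per-bet log-growth expression can be explored numerically before doing any calculus. The sketch below is a rough illustration: `log_growth` is a hypothetical helper name, and the 60% win rate and 2:1 odds are assumed example values. It scans a grid of bet fractions and picks the one with the highest growth.

```python
import math


def log_growth(f, p, b):
    """Expected log of the per-bet wealth multiplier."""
    q = 1 - p
    return p * math.log(1 + b * f) + q * math.log(1 - f)


# Hypothetical bet: 60% win rate at 2:1 net odds.
p, b = 0.6, 2.0
grid = [i / 1000 for i in range(0, 1000)]  # f from 0.000 to 0.999
best_f = max(grid, key=lambda f: log_growth(f, p, b))
print(best_f, log_growth(best_f, p, b))
print(log_growth(0.9, p, b))  # overbetting: growth turns negative
```

The grid search lands on the same fraction the closed-form formula gives for these inputs, and the last line shows that betting far too much produces negative growth despite the positive edge.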

The Calculus: Finding the Optimum

To maximize, take the derivative with respect to f and set it equal to zero:

d/df [p × log(1 + b × f) + q × log(1 - f)] = 0

Using the chain rule (log here is the natural logarithm, and the derivative of ln(u) with respect to f is u′ / u):

p × (b / (1 + b × f)) + q × (-1 / (1 - f)) = 0

Rearrange:

p × b / (1 + b × f) = q / (1 - f)

Cross-multiply:

p × b × (1 - f) = q × (1 + b × f)

Expand:

p × b - p × b × f = q + q × b × f

Collect f terms:

p × b - q = f × (p × b + q × b)
p × b - q = f × b × (p + q)

Since p + q = 1 (probabilities must sum to 1):

p × b - q = f × b

Solve for f:

f* = (p × b - q) / b

This is the Kelly formula, derived from first principles.
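As a sanity check on the algebra, the following sketch evaluates the first derivative at the closed-form f* and on either side of it. The helper names `kelly_fraction` and `growth_slope` and the input values are assumptions chosen for illustration.

```python
def kelly_fraction(p, b):
    """Closed-form Kelly fraction f* = (p*b - q) / b."""
    q = 1 - p
    return (p * b - q) / b


def growth_slope(f, p, b):
    """First derivative of p*ln(1 + b*f) + q*ln(1 - f) with respect to f."""
    q = 1 - p
    return p * b / (1 + b * f) - q / (1 - f)


p, b = 0.6, 2.0  # hypothetical inputs
f_star = kelly_fraction(p, b)
print(f_star, growth_slope(f_star, p, b))     # slope is ~0 at f*
print(growth_slope(f_star - 0.05, p, b) > 0)  # still rising below f*
print(growth_slope(f_star + 0.05, p, b) < 0)  # falling above f*
```

A slope that is positive below f*, zero at f*, and negative above it is what the derivation predicts.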

Verifying the Formula: Does It Make Intuitive Sense?

Let's check three scenarios to verify the formula behaves sensibly:

Scenario 1: No edge (p = 0.5, b = 1).

f* = (0.5 × 1 - 0.5) / 1 = 0 / 1 = 0

Kelly says don't bet at all. Correct: if you're a coin-flip bettor (50-50 win rate, 1:1 odds), you have zero edge. Your expected profit is zero at any bet size, and any positive fraction gives negative expected log-growth, so repeated betting shrinks your bankroll over time.

Scenario 2: Strong edge (p = 0.6, b = 2).

f* = (0.6 × 2 - 0.4) / 2 = (1.2 - 0.4) / 2 = 0.4

Kelly says bet 40%. Correct: with a 60% win rate and 2:1 odds, you have significant edge. A larger bet size exploits the edge faster.

Scenario 3: Low win rate, positive edge (p = 0.4, b = 2).

f* = (0.4 × 2 - 0.6) / 2 = (0.8 - 0.6) / 2 = 0.1

Kelly says bet 10%. But wait: you lose 60% of the time! Why would Kelly recommend a positive bet? Because the wins are twice the size of the losses. Per unit wagered, you expect 0.4 × 2 = 0.8 from wins against 0.6 from losses, a positive expectancy of 0.2.

This reveals an important boundary: Kelly recommends a bet only when p × b > q, or equivalently, when your expected value per bet is positive. If p × b < q (for example p = 0.3, b = 2), the formula returns a negative f*, which means don't take the bet. Scenario 3 also shows how sensitive a thin edge is to estimation error: if the true win rate were about one-third rather than 0.40, the edge would vanish and a 10% bet would steadily erode the bankroll. Always verify that your edge is real before trusting Kelly.
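The sketch below recomputes the three scenarios and prints each bet's expected value per unit wagered (p × b - q) next to f*. It also adds a fourth case with p × b < q, an assumption introduced here purely for contrast, to show what the formula returns when the expectation is negative. The helper names are hypothetical.

```python
def kelly_fraction(p, b):
    q = 1 - p
    return (p * b - q) / b


def expected_value(p, b):
    """Expected profit per unit wagered: p*b - q."""
    return p * b - (1 - p)


scenarios = {
    "Scenario 1 (p=0.5, b=1)": (0.5, 1.0),
    "Scenario 2 (p=0.6, b=2)": (0.6, 2.0),
    "Scenario 3 (p=0.4, b=2)": (0.4, 2.0),
    "Extra case (p=0.3, b=2)": (0.3, 2.0),  # assumption: added for contrast
}
for name, (p, b) in scenarios.items():
    ev = expected_value(p, b)
    f = kelly_fraction(p, b)
    stake = max(f, 0.0)  # a negative f* means "do not take this bet"
    print(f"{name}: EV={ev:+.2f}, f*={f:+.2f}, stake={stake:.2f}")
```

The sign of f* always matches the sign of the expected value, so clamping negative fractions to zero is one simple way to encode "no edge, no bet".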

Understanding the Numerator and Denominator

The Kelly formula has two parts:

Numerator: p × b - q
Denominator: b

The numerator (p × b - q) is your expected profit per unit wagered, also called your edge. It tells you how much you win in expectation:

If p = 0.6, b = 2, then p × b - q = 1.2 - 0.4 = 0.8. You expect to gain 0.8 units per unit wagered.
If p = 0.55, b = 1, then p × b - q = 0.55 - 0.45 = 0.1. You expect to gain only 0.1 units per unit wagered.

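The denominator (b) scales the edge by the size of the payoff. Dividing through gives the equivalent form f* = p - q / b, which makes the role of b easier to see:

If b = 1 (1:1 odds, equal upside and downside), you bet a fraction equal to your edge (0.1 for the 10% edge above).
If b = 2 (2:1 odds, double upside) at the same win rate, q / b is halved, so the fraction rises. For a fixed edge, however, a larger b means a smaller fraction: an edge of 0.8 at b = 2 gives f* = 0.4, because a given edge earned through larger payoffs implies a lower win rate.
If b = 10 (huge payoff), q / b becomes small and f* approaches p. The fraction can never exceed your win probability, so Kelly never asks for more than 100% of the bankroll.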
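A short numerical sketch of how the payoff ratio affects the result when the win rate is held fixed. The 55% win rate and the list of b values are assumptions for illustration; the formula is algebraically the same as f* = p - q / b.

```python
def kelly_fraction(p, b):
    q = 1 - p
    return (p * b - q) / b  # algebraically equal to p - q / b


p = 0.55  # hypothetical fixed win rate
for b in (1, 2, 10, 100, 1_000_000):
    f = kelly_fraction(p, b)
    print(f"b={b:>9}: f*={f:.4f}")
```

The fraction climbs as b grows but levels off just below p, so even an enormous payoff never justifies betting more than your win probability.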

The Second Derivative Test: Confirming It's a Maximum

Calculus requires one more check: we've found a critical point (where the first derivative is zero), but is it a maximum or a minimum? Take the second derivative:

d²/df² [p × log(1 + b × f) + q × log(1 - f)]
= p × d/df [b / (1 + b × f)] + q × d/df [-1 / (1 - f)]
= -p × b² / (1 + b × f)² - q / (1 - f)²

Both terms are negative for every f between 0 and 1, not just at f*, because p × b² and q are positive and each denominator is a square. The growth function is therefore strictly concave, so the critical point is its unique maximum, confirming Kelly gives the optimal bet size.
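The second-derivative expression can be checked numerically across the whole betting range. This is a minimal sketch with assumed inputs (p = 0.6, b = 2) and a hypothetical helper name.

```python
def second_derivative(f, p, b):
    """d^2/df^2 of p*ln(1 + b*f) + q*ln(1 - f)."""
    q = 1 - p
    return -p * b**2 / (1 + b * f) ** 2 - q / (1 - f) ** 2


p, b = 0.6, 2.0  # hypothetical inputs
values = [second_derivative(i / 100, p, b) for i in range(0, 100)]
print(max(values) < 0)               # negative at every f in [0, 0.99]
print(second_derivative(0.4, p, b))  # curvature at f* = 0.4
```

A curvature that is negative everywhere on the grid is consistent with a single peak in the growth curve.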

A Worked Derivation Example

Let's apply the derivation to a specific trader's system to make the math concrete:

System: Momentum breakout on large-cap stocks.

Backtest data: 100 trades, 58 wins, 42 losses.
Average win: +$300 per contract.
Average loss: -$200 per contract.
Account: $50,000.

Step 1: Extract parameters.

p = 58 / 100 = 0.58
q = 42 / 100 = 0.42
b = 300 / 200 = 1.5

Step 2: Plug into Kelly formula.

f* = (p × b - q) / b
f* = (0.58 × 1.5 - 0.42) / 1.5
f* = (0.87 - 0.42) / 1.5
f* = 0.45 / 1.5
f* = 0.30

Step 3: Interpret.

Full Kelly says risk 30% of the $50,000 account per trade = $15,000 per trade. But professionals would likely use half-Kelly (15%) or quarter-Kelly (7.5%) to reduce drawdown risk. At half-Kelly, the trader risks $7,500 per trade.
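The three steps above can be sketched in a few lines of Python. The function name `kelly_from_backtest` is a hypothetical helper, and treating the average win and loss as fixed payoffs is the same simplification the worked example makes.

```python
def kelly_from_backtest(wins, losses, avg_win, avg_loss):
    """Estimate p, b and f* from summary backtest statistics."""
    p = wins / (wins + losses)
    b = avg_win / abs(avg_loss)
    f_star = (p * b - (1 - p)) / b
    return p, b, f_star


account = 50_000
p, b, f_star = kelly_from_backtest(wins=58, losses=42, avg_win=300, avg_loss=-200)
for label, scale in (("full", 1.0), ("half", 0.5), ("quarter", 0.25)):
    fraction = scale * f_star
    print(f"{label:>7} Kelly: {fraction:.3f} of equity = ${fraction * account:,.0f} at risk")
```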

The Assumption of Independence: A Critical Limitation

The derivation assumes each bet is independent—the outcome of trade 1 doesn't affect the probability of trade 2. In reality, trading systems exhibit autocorrelation: if you just won a trade, you might be in a trending market where the next trade is also likely to win. Conversely, a loss might indicate the market has turned choppy, reducing the next trade's win rate.

When outcomes cluster, the growth-optimal fraction is generally lower than the independent-bet formula suggests. If your wins and losses come in runs (positive autocorrelation), you should bet less than Kelly to account for the higher chance of consecutive losses. The full Kelly formula is derived without that risk, so it understates the drawdowns such a system can produce.

This is one reason professionals use fractional Kelly: it implicitly buffers for dependency that the pure derivation ignores.
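One rough way to see why clustering matters is to compare the chance of a losing streak with and without persistence. The sketch below assumes a simple two-state Markov model in which the probability of a loss right after a loss (`p_loss_after_loss`, a hypothetical parameter set to 0.60 here) is higher than the unconditional loss rate. The model and its numbers are illustrative assumptions, not part of the Kelly derivation.

```python
def streak_probability(q, k, p_loss_after_loss=None):
    """Chance that a given trade starts a run of k consecutive losses.

    With independent trades this is q**k. Under the assumed Markov model,
    the first loss has probability q and each later one p_loss_after_loss.
    """
    if p_loss_after_loss is None:
        p_loss_after_loss = q  # independence
    return q * p_loss_after_loss ** (k - 1)


def streak_drawdown(f, k):
    """Fraction of equity lost after k straight losses at bet fraction f."""
    return 1 - (1 - f) ** k


q, k = 0.42, 5  # loss rate from the worked example; streak length is arbitrary
print(streak_probability(q, k))                          # independent trades
print(streak_probability(q, k, p_loss_after_loss=0.60))  # clustered losses
print(streak_drawdown(0.30, k), streak_drawdown(0.15, k))
```

Under these assumptions a five-loss streak becomes several times more likely when losses cluster, and the drawdown it causes is much deeper at full Kelly than at half Kelly.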

The Log-Utility Assumption: Is It the Right Lens?

The derivation assumes you care about logarithmic growth, but what if you don't? What if you care about minimizing bankruptcy risk or maximizing probability of reaching a specific wealth target?

Different utility functions lead to different optimal bet sizes:

Linear utility (maximize expected absolute wealth) → Bet 100% on the bet with the highest expected value. Volatile, but it maximizes expected dollar gains.
Logarithmic utility (maximize geometric growth) → Kelly formula. Balances growth and survival.
Square-root utility (less risk-averse than log) → Bet more than Kelly. Faster growth but higher bankruptcy risk.
Safety-first utility (minimize bankruptcy probability) → Bet less than Kelly. Conservative, prioritizes survival.

Kelly optimizes for traders who play indefinitely (or at least for a very long horizon) and care about proportional growth. For traders with a finite time horizon or specific wealth target, alternative frameworks might be better.
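The effect of the utility function can be illustrated with a small grid search. This is a sketch under stated assumptions: `best_fraction` is a hypothetical helper, the bet (p = 0.6, b = 2) is an example, and utility is applied to the one-bet wealth multiplier. Safety-first criteria are not modeled here.

```python
import math


def best_fraction(utility, p, b, steps=10_000):
    """Grid-search the bet fraction that maximizes expected utility
    of the per-bet wealth multiplier."""
    q = 1 - p
    grid = [i / steps for i in range(steps)]  # f in [0, 1)
    return max(grid, key=lambda f: p * utility(1 + b * f) + q * utility(1 - f))


p, b = 0.6, 2.0  # hypothetical bet
f_log = best_fraction(math.log, p, b)        # logarithmic utility (Kelly)
f_sqrt = best_fraction(math.sqrt, p, b)      # square-root utility
f_linear = best_fraction(lambda w: w, p, b)  # linear utility
print(f_log, f_sqrt, f_linear)
```

For this bet the square-root optimum sits well above the Kelly fraction, and the linear optimum runs to the top of the grid, matching the ordering in the list above.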

Kelly and Information Theory

A fascinating connection: Kelly's formula emerged from information theory. John Kelly's 1956 paper considered a gambler who receives tips over a noisy communication channel and showed that the maximum long-run growth rate of the gambler's bankroll is tied to how much information the channel carries. In that sense, a slight informational edge (knowing slightly more about the true odds than the market does) converts into wealth at a rate governed by the size of the edge and the payoff odds.

In Kelly's original setting of fair odds, the maximum exponential growth rate of wealth equals the channel's information rate in Shannon's sense. The Kelly formula is the betting rule that achieves this rate.
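For the simplest case, even-money bets (b = 1) on a binary outcome that you call correctly with probability p, the link can be checked directly: the growth rate at the Kelly fraction, measured in bits per bet, equals 1 - H(p), where H is the binary entropy. The sketch below verifies this numerically. The function names and the sample probabilities are assumptions for illustration.

```python
import math


def binary_entropy(p):
    """Shannon entropy H(p) in bits of a two-outcome source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_growth_bits(p):
    """Per-bet log2 growth at the Kelly fraction for even-money bets (b = 1)."""
    f_star = 2 * p - 1  # (p*b - q) / b with b = 1
    return p * math.log2(1 + f_star) + (1 - p) * math.log2(1 - f_star)


for p in (0.55, 0.6, 0.75):  # hypothetical win probabilities
    print(p, max_growth_bits(p), 1 - binary_entropy(p))
```

The two columns agree because 1 + f* = 2p and 1 - f* = 2q, so the growth expression simplifies to 1 + p × log₂(p) + q × log₂(q).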

Common Derivation Mistakes and Clarifications

Mistake 1: Confusing log base. The derivation works with natural logarithms (ln), but Kelly is base-independent. Whether you use ln, log₁₀, or log₂, the resulting f* is the same, because changing the base only multiplies the growth expression by a positive constant, which does not move its maximum. Use whatever base is natural to your calculations.

Mistake 2: Forgetting q = 1 - p. Some derivations write out p and q separately as independent variables. They're not—they sum to 1. Keep this constraint in mind to avoid algebraic errors.

Mistake 3: Assuming fractional Kelly comes from the derivation. Half-Kelly and quarter-Kelly don't emerge from the math; they're post-hoc adjustments for real-world friction and psychology. The derivation gives full Kelly only.

Mistake 4: Applying Kelly to dependent events. The derivation assumes independence. In trending markets, winning and losing trades cluster, violating this assumption. Kelly alone doesn't account for this; you must reduce Kelly fraction manually for correlated systems.
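The base-independence point in Mistake 1 is easy to confirm numerically. In this sketch, `argmax_growth` is a hypothetical helper that grid-searches the growth expression using whichever logarithm it is given; the inputs are the worked example's p and b.

```python
import math


def argmax_growth(log_fn, p, b, steps=1000):
    """Grid-search f maximizing p*log(1 + b*f) + q*log(1 - f) for a given log."""
    q = 1 - p
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda f: p * log_fn(1 + b * f) + q * log_fn(1 - f))


p, b = 0.58, 1.5  # parameters from the worked example
results = [argmax_growth(fn, p, b) for fn in (math.log, math.log2, math.log10)]
print(results)
```

All three bases pick the same grid point, since each version of the objective is a positive multiple of the others.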

FAQ

Does the Kelly derivation change if the odds are asymmetric (e.g., double-up on win, triple-loss on loss)?

The formula accounts for any asymmetric odds as long as you define b correctly as the win payout divided by the loss amount. The derivation doesn't care about the asymmetry; it finds the optimal fraction regardless.

Why use logarithms instead of, say, the median or mode of final wealth?

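Logarithms capture compounding: you reinvest your profits, and losses are subtracted from a shrinking base. The expected log is also easier to work with than the median or mode, because it turns a product of returns into a sum that calculus can handle. It does not conflict with the median: over a long sequence of bets, the fraction that maximizes expected log-wealth also approximately maximizes median final wealth. For a single bet, linear wealth can make sense; for repeated bets, the logarithm is the natural objective.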

Can I derive Kelly for continuous returns instead of discrete win/loss?

Yes. If you model returns as a continuous distribution (e.g., normally distributed returns), the Kelly formula becomes f* = (μ - r) / σ², where μ is expected return, r is the risk-free rate, and σ² is variance. This is the starting point for many asset allocation formulas.
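A minimal sketch of the continuous-return version follows. The growth-rate expression used here, r + f(μ - r) - ½f²σ², is the standard approximation for a continuously rebalanced position in an asset following geometric Brownian motion, which is an assumption of this example. The input values are hypothetical.

```python
def continuous_kelly(mu, r, sigma):
    """Kelly leverage f* = (mu - r) / sigma**2 for the continuous-return case."""
    return (mu - r) / sigma**2


def growth_rate(f, mu, r, sigma):
    """Approximate long-run log growth: r + f*(mu - r) - 0.5 * f**2 * sigma**2."""
    return r + f * (mu - r) - 0.5 * f**2 * sigma**2


# Hypothetical annualized inputs: 8% expected return, 2% risk-free, 20% volatility.
mu, r, sigma = 0.08, 0.02, 0.20
f_star = continuous_kelly(mu, r, sigma)
print(f_star)
print(growth_rate(f_star, mu, r, sigma), growth_rate(f_star / 2, mu, r, sigma))
```

Note that f* here can exceed 1, which corresponds to leverage. In this model, halving the fraction keeps three-quarters of the growth above the risk-free rate.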

Does the derivation assume you use Kelly for every bet forever?

The derivation assumes you apply a constant fraction f to every bet and that bets repeat many times. If you plan to stop after N bets (finite horizon), Kelly is less optimal, and a more conservative fraction might be better.

Why doesn't Kelly minimize risk-of-ruin?

Kelly maximizes expected log-wealth; it does not minimize risk-of-ruin. Minimizing risk-of-ruin is a different optimization (e.g., minimizing the probability of falling below some bankroll floor), and it leads to a smaller position size. Kelly accepts deep drawdowns in exchange for faster growth.

How do slippage and commissions change the derivation?

They mainly change the effective b, and they can also shift p and q if costs turn marginal winners into losers. As a simple illustration, if costs equal 0.5% of each trade's profit or loss, your average winner shrinks from $300 to $298.50 and your average loser grows from -$200 to -$201 (you lose the original amount plus costs), so b falls from 1.5 to about 1.485. Plug these adjusted values into Kelly to get a more realistic fraction.
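The cost adjustment can be sketched as follows. The helper `net_payoff_ratio` is hypothetical, the dollar costs of $1.50 and $1.00 correspond to the example figures above, and the win rate is assumed to stay at the worked example's 0.58.

```python
def kelly_fraction(p, b):
    return (p * b - (1 - p)) / b


def net_payoff_ratio(avg_win, avg_loss, cost_win, cost_loss):
    """Payoff ratio b after subtracting costs from wins and adding them to losses."""
    return (avg_win - cost_win) / (abs(avg_loss) + cost_loss)


p = 0.58  # win rate from the worked example, assumed unchanged by costs
b_gross = 300 / 200
b_net = net_payoff_ratio(300, -200, cost_win=1.5, cost_loss=1.0)
print(b_gross, kelly_fraction(p, b_gross))
print(b_net, kelly_fraction(p, b_net))
```

Costs this small barely move the fraction, but the same calculation shows how quickly f* shrinks when costs are large relative to the average win.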

Summary

The Kelly Criterion emerges naturally from the mathematics of logarithmic utility and wealth compounding. By setting up an expression for final wealth after n repeated bets, taking its logarithm (to capture proportional growth), and optimizing with respect to bet fraction, we arrive at f* = (p × b - q) / b. This derivation shows why Kelly scales position size with your edge—higher win probability or better odds justify larger bets. The formula is elegant and mathematically sound, provided three conditions hold: your edge is real, bets are independent, and you accept that full Kelly creates significant account volatility.

The practical implication is that Kelly gives the theoretical maximum long-term wealth accumulation rate over infinite trials. But traders operate with finite capital and finite careers, where volatility can force psychological capitulation. This is why professionals use fractional Kelly—half or quarter—which preserves most of the mathematical advantage while reducing drawdowns to tolerable levels.
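To put a rough number on how much growth fractional Kelly gives up, the sketch below evaluates the per-bet log-growth at full, half, and quarter Kelly for the worked example's parameters (p = 0.58, b = 1.5). The helper name is hypothetical, and the result holds only under the derivation's assumptions of fixed payoffs and independent bets.

```python
import math


def log_growth(f, p, b):
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


p, b = 0.58, 1.5                # worked-example parameters
f_star = (p * b - (1 - p)) / b  # about 0.30
full = log_growth(f_star, p, b)
for scale in (1.0, 0.5, 0.25):
    g = log_growth(scale * f_star, p, b)
    print(f"{scale:.2f} x Kelly: growth per bet = {g:.4f} ({g / full:.0%} of full Kelly)")
```

For these inputs half Kelly keeps roughly three-quarters of the full-Kelly growth rate while halving the amount at risk on each trade.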

Understanding the derivation doesn't change how you apply Kelly, but it removes the mystery. You're not using an empirical formula; you're using the natural consequence of the mathematics of compounding. With that knowledge, you can confidently apply Kelly (or a fractional variant) with the assurance that the formula is as close to optimal as mathematics permits.

