Kelly criterion
In probability theory, the Kelly criterion is a formula for sizing a sequence of bets by maximizing the long-term expected value of the logarithm of wealth, which is equivalent to maximizing the long-term expected geometric growth rate. John Larry Kelly Jr., a researcher at Bell Labs, described the criterion in 1956.
The practical use of the formula has been demonstrated for gambling, and the same idea was used to explain diversification in investment management. In the 2000s, Kelly-style analysis became a part of mainstream investment theory, and it has been claimed that well-known successful investors including Warren Buffett and Bill Gross use Kelly methods. See also intertemporal portfolio choice. The Kelly criterion is also the standard replacement for statistical power in anytime-valid statistical tests and confidence intervals, based on e-values and e-processes.
Kelly criterion for binary return rates
In a system where the return on an investment or a bet is binary, so that an interested party either wins or loses a fixed percentage of their bet, the expected growth rate coefficient yields a very specific solution for the optimal betting percentage.
Gambling formula
Where losing the bet involves losing the entire wager, the Kelly bet is:
\[ f^* = p - \frac{q}{b} = p - \frac{1 - p}{b} \]
where:
- \(f^*\) is the fraction of the current bankroll to wager.
- \(p\) is the probability of a win.
- \(q = 1 - p\) is the probability of a loss.
- \(b\) is the proportion of the bet gained with a win. E.g., if betting $10 on a 2-to-1 odds bet, then \(b = 2\).
If the gambler has zero edge, then the criterion recommends the gambler bet nothing.
If the edge is negative, the formula gives a negative result, indicating that the gambler should take the other side of the bet.
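The three cases described here (positive, zero and negative edge) can be sketched in a few lines of Python. This is a minimal illustration that assumes the gambling formula in the form f* = p - (1 - p)/b. The function name `kelly_bet` is hypothetical and chosen only for this example.

```python
def kelly_bet(p: float, b: float) -> float:
    """Kelly fraction for a bet that loses the whole wager on a loss.

    p: probability of a win.
    b: proportion of the bet gained with a win (net odds).
    Assumes the form f* = p - (1 - p) / b.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0.0:
        raise ValueError("b must be positive")
    return p - (1.0 - p) / b


if __name__ == "__main__":
    # Positive edge: a positive fraction of the bankroll is staked.
    print(kelly_bet(0.6, 1.0))
    # Zero edge (b = q / p): the recommended stake is zero.
    print(kelly_bet(0.5, 1.0))
    # Negative edge: the result is negative (take the other side).
    print(kelly_bet(0.4, 1.0))
```

A negative return value is not a stake size. It only signals that the other side of the bet has the edge.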
Investment formula
A more general form of the Kelly formula allows for partial losses, which is relevant for investments:
\[ f^* = \frac{p}{l} - \frac{q}{g} \]
where:
- \(f^*\) is the fraction of the assets to apply to the security.
- \(p\) is the probability that the investment increases in value.
- \(q = 1 - p\) is the probability that the investment decreases in value.
- \(g\) is the fraction that is gained in a positive outcome. If the security price rises 10%, then \(g = 0.1\).
- \(l\) is the fraction that is lost in a negative outcome. If the security price falls 10%, then \(l = 0.1\).
The general form can be rewritten as follows:
\[ f^* = \frac{p}{l}\left(1 - \frac{1}{WLP}\,\frac{1}{WLR}\right) \]
where:
- \(WLP = \frac{p}{1 - p}\) is the win-loss probability ratio, which is the ratio of winning to losing bets.
- \(WLR = \frac{g}{l}\) is the win-loss ratio of bet outcomes, which is the winning skew.
The Kelly formula can easily result in a fraction higher than 1, such as when the losing size \(l\) is small. This happens somewhat counterintuitively, because the Kelly fraction formula compensates for a small losing size with a larger bet. In the case of a Kelly fraction higher than 1, it is theoretically advantageous to use leverage to purchase additional securities on margin. However, in most real situations, there is high uncertainty about all parameters entering the Kelly formula.
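The following sketch assumes the investment formula in the form f* = p/l - q/g, with g the fraction gained and l the fraction lost. It compares that form with the rewritten form in terms of the win-loss probability ratio and the win-loss ratio. The function names are hypothetical.

```python
def kelly_investment(p: float, g: float, l: float) -> float:
    """Kelly fraction with partial losses: f* = p / l - (1 - p) / g."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if g <= 0.0 or l <= 0.0:
        raise ValueError("g and l must be positive")
    return p / l - (1.0 - p) / g


def kelly_investment_ratio_form(p: float, g: float, l: float) -> float:
    """Same quantity written with the two win-loss ratios."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if g <= 0.0 or l <= 0.0:
        raise ValueError("g and l must be positive")
    wlp = p / (1.0 - p)  # win-loss probability ratio
    wlr = g / l          # win-loss ratio of bet outcomes
    return (p / l) * (1.0 - 1.0 / (wlp * wlr))


if __name__ == "__main__":
    # A 60% chance of +10% against a 40% chance of -10%:
    # the small losing size pushes the fraction above 1 (leverage).
    print(kelly_investment(0.6, 0.1, 0.1))
    print(kelly_investment_ratio_form(0.6, 0.1, 0.1))
    # With l = 1 (the whole stake is lost) this is the gambling case.
    print(kelly_investment(0.6, 1.0, 1.0))
```

In the first example the result is about 2, meaning twice the available assets, which is only reachable with leverage.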
Betting example – behavioural experiment
In a study, each participant was given $25 and asked to place even-money bets on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so they could place about 300 bets, and the prizes were capped at $250. The behavior of the test subjects was far from optimal.
Using the Kelly criterion and based on the odds in the experiment, the right approach would be to bet 20% of one's bankroll on each toss of the coin, which works out to a 2.034% average gain each round. This is a geometric mean, not the arithmetic rate of 4%. Compounded at this geometric rate, the theoretical wealth after 300 rounds works out to about $10,505 if it were not capped.
In this particular game, because of the cap, a strategy of betting only 12% of the pot on each toss would have even better results.
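The figures quoted for the experiment can be reproduced with a short calculation. The sketch below assumes even-money bets (b = 1), a 60% win probability, a $25 starting bankroll and 300 rounds, and it ignores the $250 cap. The helper names are hypothetical.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Gambling form of the Kelly bet: f* = p - (1 - p) / b."""
    return p - (1.0 - p) / b


def geometric_gain(f: float, p: float, b: float) -> float:
    """Per-round geometric growth rate when staking a fraction f."""
    return (1.0 + f * b) ** p * (1.0 - f) ** (1.0 - p) - 1.0


def arithmetic_gain(f: float, p: float, b: float) -> float:
    """Per-round expected (arithmetic) gain as a fraction of bankroll."""
    return f * (p * b - (1.0 - p))


def compounded_wealth(start: float, rate: float, rounds: int) -> float:
    """Wealth after compounding at a fixed per-round rate."""
    return start * (1.0 + rate) ** rounds


if __name__ == "__main__":
    p, b = 0.6, 1.0
    f = kelly_fraction(p, b)
    geo = geometric_gain(f, p, b)
    print(f"Kelly fraction:  {f:.3f}")
    print(f"geometric gain:  {geo:.5f}")
    print(f"arithmetic gain: {arithmetic_gain(f, p, b):.5f}")
    # Roughly ten thousand dollars; the last digits depend on rounding.
    print(f"wealth after 300 rounds: {compounded_wealth(25.0, geo, 300):.0f}")
```

The compounded figure is sensitive to rounding of the per-round rate, so it should be read as approximate.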
Proof
Heuristic proofs of the Kelly criterion are straightforward. The Kelly criterion maximizes the expected value of the logarithm of wealth. We start with 1 unit of wealth and bet a fraction \(f\) of that wealth on an outcome that occurs with probability \(p\) and offers odds of \(b\). The probability of winning is \(p\), and in that case the resulting wealth is equal to \(1 + fb\). The probability of losing is \(q = 1 - p\) and the odds of a negative outcome is \(a\). In that case the resulting wealth is equal to \(1 - fa\). Here \(b\) and \(a\) play the roles of the fractions gained and lost in the investment formula. Therefore, the geometric growth rate \(r\) is:
\[ r = (1 + fb)^{p} (1 - fa)^{q} \]
We want to find the maximum \(r\) of this curve as a function of \(f\), which involves finding the derivative of the equation. This is more easily accomplished by taking the logarithm of each side first; because the logarithm is monotonic, it does not change the locations of function extrema. The resulting equation is:
\[ E = \log r = p \log(1 + fb) + q \log(1 - fa) \]
with \(E\) denoting logarithmic wealth growth. To find the value of \(f\) for which the growth rate is maximized, denoted as \(f^*\), we differentiate the above expression and set this equal to zero. This gives:
\[ \left.\frac{dE}{df}\right|_{f = f^*} = \frac{pb}{1 + f^* b} - \frac{qa}{1 - f^* a} = 0 \]
Rearranging this equation to solve for the value of \(f^*\) gives the Kelly criterion:
\[ f^* = \frac{p}{a} - \frac{q}{b} \]
To be thorough, we should also consider the behaviour as \(f\) approaches the boundaries \(-1/b\) and \(1/a\), since there can be a maximum there without the derivative being 0. But \(E\) tends to −∞ for both. Finally, we need to show that the critical point found is not a minimum. This can be easily shown by computing the second derivative
\[ \frac{d^2 E}{df^2} = -\frac{p b^2}{(1 + fb)^2} - \frac{q a^2}{(1 - fa)^2} \]
which is strictly negative for all \(f\) in the domain.
Notice that this expression reduces to the simple gambling formula when \(a = 1\), that is, when a loss results in full loss of the wager.
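The result of the derivation can be checked numerically. The sketch below assumes the logarithmic growth E(f) = p log(1 + f b) + q log(1 - f a), with b the fraction gained on a win and a the fraction lost on a loss. It compares a brute-force grid maximum over the open interval (-1/b, 1/a) with the closed form p/a - q/b. All names and the example parameters are illustrative.

```python
import math


def log_growth(f: float, p: float, b: float, a: float) -> float:
    """Expected log growth per bet for a staked fraction f."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f * a)


def kelly_closed_form(p: float, b: float, a: float) -> float:
    """Closed-form maximizer f* = p / a - (1 - p) / b."""
    return p / a - (1.0 - p) / b


def kelly_grid_search(p: float, b: float, a: float, steps: int = 100_000) -> float:
    """Maximize log_growth on a grid strictly inside (-1/b, 1/a)."""
    lo, hi = -1.0 / b, 1.0 / a
    best_f, best_e = lo, -math.inf
    for i in range(1, steps):  # skip both end points, where E diverges
        f = lo + (hi - lo) * i / steps
        e = log_growth(f, p, b, a)
        if e > best_e:
            best_f, best_e = f, e
    return best_f


if __name__ == "__main__":
    p, b, a = 0.6, 1.0, 0.5
    print(kelly_closed_form(p, b, a))
    print(kelly_grid_search(p, b, a))
```

The two printed values should agree to within the grid spacing.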
Kelly criterion for non-binary return rates
If the return rates on an investment or a bet are continuous in nature, the optimal growth rate coefficient must take all possible events into account.
Application to the stock market
In mathematical finance, if security weights maximize the expected geometric growth rate, then a portfolio is growth optimal.
The Kelly criterion shows that for a given volatile security this is satisfied when
\[ f^* = \frac{\mu - r}{\sigma^2} \]
where \(f^*\) is the fraction of available capital invested that maximizes the expected geometric growth rate, \(\mu\) is the expected growth rate coefficient, \(\sigma^2\) is the variance of the growth rate coefficient and \(r\) is the risk-free rate of return. Note that a symmetric probability density function was assumed here.
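A minimal sketch of the continuous formula follows, assuming it takes the form f* = (μ - r)/σ². The growth-rate function g(f) = r + f(μ - r) - f²σ²/2 is an additional assumption (the usual lognormal price model) introduced here only to show how the growth rate falls off away from f*. The names and the example inputs are illustrative.

```python
def kelly_continuous(mu: float, r: float, sigma: float) -> float:
    """Growth-optimal fraction, assuming f* = (mu - r) / sigma**2."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return (mu - r) / sigma ** 2


def growth_rate(f: float, mu: float, r: float, sigma: float) -> float:
    """Assumed growth rate under a lognormal price model."""
    return r + f * (mu - r) - 0.5 * f ** 2 * sigma ** 2


if __name__ == "__main__":
    mu, r, sigma = 0.09, 0.05, 0.16  # illustrative inputs, not estimates
    f_star = kelly_continuous(mu, r, sigma)
    for label, f in [("half Kelly", 0.5 * f_star),
                     ("full Kelly", f_star),
                     ("double Kelly", 2.0 * f_star)]:
        excess = growth_rate(f, mu, r, sigma) - r
        print(f"{label:12s} f = {f:.4f}  excess growth = {excess:.5f}")
```

Under this assumed model, half the Kelly fraction keeps three quarters of the excess growth, and twice the Kelly fraction gives up all of it.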
Computations of growth-optimal portfolios can suffer from severe garbage-in, garbage-out problems. Such computations typically take as given the expected return and covariance structure of assets, but these parameters are at best estimates or models that have significant uncertainty. If portfolio weights are largely a function of estimation errors, then the ex-post performance of a growth-optimal portfolio may differ dramatically from the ex-ante prediction. Parameter uncertainty and estimation errors are a large topic in portfolio theory. One approach to counteract the unknown risk is to invest less than the Kelly criterion recommends.
Rough estimates are still useful. If we take an excess return of 4% and a volatility of 16%, then the yearly Sharpe ratio and Kelly ratio are calculated to be about 25% and 150% (\(0.04/0.16 = 0.25\) and \(0.04/0.16^2 \approx 1.56\)). The daily Sharpe ratio and Kelly ratio are 1.7% and 150%. The Sharpe ratio implies a daily win probability of \(p = 50\% + 1.7\%/4\), where we assumed that the probability bandwidth is \(4\sigma = 4\). Now we can apply the discrete Kelly formula for \(f^*\) above with \(p = 50.425\%\) and \(a = b = 1\%\), and we get another rough estimate for the Kelly fraction, \(f^* = 85\%\). Both of these estimates of the Kelly fraction appear quite reasonable, yet a prudent approach suggests a further multiplication of the Kelly ratio by 50%.
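The arithmetic behind these rough estimates can be sketched as follows. The sketch assumes a Sharpe ratio of excess return over volatility and a Kelly ratio of excess return over variance. The count of 252 trading days per year is an assumption, so the daily Sharpe ratio it prints differs slightly from the rounded 1.7% quoted. The discrete step assumes, purely for illustration, a win probability of 50% + 1.7%/4 and symmetric 1% daily moves. All names are hypothetical.

```python
import math


def sharpe_ratio(excess: float, vol: float) -> float:
    """Excess return divided by volatility."""
    return excess / vol


def kelly_ratio(excess: float, vol: float) -> float:
    """Excess return divided by variance."""
    return excess / vol ** 2


def to_daily(excess_yearly: float, vol_yearly: float, trading_days: int = 252):
    """Scale yearly figures to daily ones (mean ~ t, volatility ~ sqrt(t))."""
    return excess_yearly / trading_days, vol_yearly / math.sqrt(trading_days)


def discrete_kelly(p: float, b: float, a: float) -> float:
    """Discrete Kelly fraction f* = p / a - (1 - p) / b."""
    return p / a - (1.0 - p) / b


if __name__ == "__main__":
    excess, vol = 0.04, 0.16
    print("yearly Sharpe:", sharpe_ratio(excess, vol))
    print("yearly Kelly: ", kelly_ratio(excess, vol))
    d_excess, d_vol = to_daily(excess, vol)
    print("daily Sharpe: ", sharpe_ratio(d_excess, d_vol))
    print("daily Kelly:  ", kelly_ratio(d_excess, d_vol))  # same as yearly
    # Assumed inputs for the discrete estimate (see the lead-in).
    p = 0.5 + 0.017 / 4
    print("discrete Kelly estimate:", discrete_kelly(p, 0.01, 0.01))
```

The Kelly ratio is unchanged by the change of time scale because the mean and the variance both scale linearly with time, whereas the Sharpe ratio scales with the square root of time.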
A detailed paper by Edward O. Thorp and a co-author estimates the Kelly fraction to be 117% for the American stock market S&P 500 index.
Significant downside tail risk in equity markets is another reason to reduce the Kelly fraction from a naive estimate.
Proof
A rigorous and general proof can be found in Kelly's original paper or in some of the other references listed below. Some corrections have been published.
We give the following non-rigorous argument for the case with \(b = 1\) (a 50:50 "even money" bet) to show the general idea and provide some insights.
When \(b = 1\), a Kelly bettor bets \(2p - 1\) times their initial wealth \(W\), as shown above. If they win, they have \(2pW\) after one bet. If they lose, they have \(2(1 - p)W\). Suppose they make \(N\) bets like this, and win \(K\) times out of this series of \(N\) bets. The resulting wealth will be:
\[ 2^N p^K (1 - p)^{N - K} W \]
The ordering of the wins and losses does not affect the resulting wealth. Suppose another bettor bets a different amount, \((2p - 1 + \Delta)W\), for some value of \(\Delta\) (which may be positive or negative). They will have \((2p + \Delta)W\) after a win and \([2(1 - p) - \Delta]W\) after a loss. After the same series of wins and losses as the Kelly bettor, they will have:
\[ (2p + \Delta)^K [2(1 - p) - \Delta]^{N - K} W \]
Take the derivative of this with respect to \(\Delta\) and get:
\[ K (2p + \Delta)^{K - 1} [2(1 - p) - \Delta]^{N - K} W - (N - K)(2p + \Delta)^K [2(1 - p) - \Delta]^{N - K - 1} W \]
The function is maximized when this derivative is equal to zero, which occurs at:
\[ K [2(1 - p) - \Delta] = (N - K)(2p + \Delta) \]
which implies that
\[ \Delta = 2\left(\frac{K}{N} - p\right) \]
but the proportion of winning bets will eventually converge to:
\[ \lim_{N \to \infty} \frac{K}{N} = p \]
according to the weak law of large numbers.
So in the long run, final wealth is maximized by setting \(\Delta\) to zero, which means following the Kelly strategy.
This illustrates that Kelly has both a deterministic and a stochastic component. If one knows \(K\) and \(N\) and wishes to pick a constant fraction of wealth to bet each time, one will end up with the most money if one bets:
\[ \left(2\frac{K}{N} - 1\right) W \]
each time. This is true whether \(N\) is small or large. The "long run" part of Kelly is necessary because \(K\) is not known in advance, just that as \(N\) gets large, \(K\) will approach \(pN\). Someone who bets more than Kelly can do better if \(K > pN\) for a stretch; someone who bets less than Kelly can do better if \(K < pN\) for a stretch, but in the long run, Kelly always wins.
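The deterministic part of this argument can be checked numerically. The sketch below assumes an even-money bet in which a bettor stakes the fraction 2p - 1 + Δ of current wealth, so that wealth is multiplied by 2p + Δ on a win and by 2(1 - p) - Δ on a loss. It searches a grid of Δ values for the one that maximizes final wealth after K wins in N bets and compares it with 2(K/N - p). The names and example numbers are illustrative.

```python
import math


def log_final_wealth(delta: float, p: float, n: int, k: int) -> float:
    """Log of final wealth per unit of starting wealth after k wins in n bets."""
    return (k * math.log(2.0 * p + delta)
            + (n - k) * math.log(2.0 * (1.0 - p) - delta))


def best_delta_closed_form(p: float, n: int, k: int) -> float:
    """Maximizer from setting the derivative to zero: 2 * (k / n - p)."""
    return 2.0 * (k / n - p)


def best_delta_grid(p: float, n: int, k: int, steps: int = 100_000) -> float:
    """Grid search over the open interval where both factors are positive.

    Assumes 0 < k < n, so that the maximum lies strictly inside the interval.
    """
    lo, hi = -2.0 * p, 2.0 * (1.0 - p)
    best_d, best_v = lo, -math.inf
    for i in range(1, steps):
        d = lo + (hi - lo) * i / steps
        v = log_final_wealth(d, p, n, k)
        if v > best_v:
            best_d, best_v = d, v
    return best_d


if __name__ == "__main__":
    p, n, k = 0.6, 100, 65  # a lucky stretch: 65 wins where 60 are expected
    d = best_delta_closed_form(p, n, k)
    print("closed-form delta:", d)
    print("grid-search delta:", best_delta_grid(p, n, k))
    print("best constant fraction in hindsight:", 2.0 * p - 1.0 + d)
```

With more wins than expected, the best constant fraction in hindsight is larger than the Kelly fraction, which matches the remark that over-betting can do better for a stretch.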
The heuristic proof for the general case proceeds as follows.
In a single trial, if one invests the fraction \(f\) of their capital, if the strategy succeeds, the capital at the end of the trial increases by the factor \(1 - f + f(1 + b) = 1 + fb\), and, likewise, if the strategy fails, the capital is decreased by the factor \(1 - fa\). Thus at the end of \(N\) trials (with \(pN\) successes and \(qN\) failures), the starting capital of $1 yields
\[ C_N = (1 + fb)^{pN} (1 - fa)^{qN} \]
Maximizing \(\log(C_N)/N\), and consequently \(C_N\), with respect to \(f\) leads to the desired result
\[ f^* = \frac{p}{a} - \frac{q}{b} \]
Edward O. Thorp provided a more detailed discussion of this formula for the general case. There, it can be seen that the substitution of \(p\) for the ratio of the number of "successes" to the number of trials implies that the number of trials must be very large, since \(p\) is defined as the limit of this ratio as the number of trials goes to infinity. In brief, betting \(f^*\) each time will likely maximize the wealth growth rate only in the case where the number of trials is very large, and \(p\), \(b\) and \(a\) are the same for each trial. In practice, this is a matter of playing the same game over and over, where the probability of winning and the payoff odds are always the same. In the heuristic proof above, \(pN\) successes and \(qN\) failures are highly likely only for very large \(N\).
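The dependence on a large number of identical trials can be illustrated with a small simulation. The sketch below assumes a repeated even-money bet with a 60% win probability and a fixed random seed. It compares the realized per-trial log growth of the Kelly fraction with that of a smaller and a larger fixed fraction on the same sequence of outcomes. The function names, seed and parameters are illustrative choices.

```python
import math
import random


def simulate_outcomes(p: float, n: int, seed: int = 0) -> list:
    """Draw n independent win/loss outcomes with win probability p."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]


def realized_log_growth(f: float, outcomes: list,
                        b: float = 1.0, a: float = 1.0) -> float:
    """Average log growth per trial when staking a fixed fraction f."""
    win_log = math.log(1.0 + f * b)
    loss_log = math.log(1.0 - f * a)
    total = sum(win_log if win else loss_log for win in outcomes)
    return total / len(outcomes)


def expected_log_growth(f: float, p: float,
                        b: float = 1.0, a: float = 1.0) -> float:
    """Theoretical value p*log(1 + f*b) + (1 - p)*log(1 - f*a)."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f * a)


if __name__ == "__main__":
    p = 0.6
    outcomes = simulate_outcomes(p, 200_000, seed=1)
    for f in (0.1, 0.2, 0.4):  # under-betting, Kelly, over-betting
        print(f"f = {f:.1f}  realized = {realized_log_growth(f, outcomes):+.5f}"
              f"  expected = {expected_log_growth(f, p):+.5f}")
```

Over a long sequence the realized growth for each fraction sits close to its theoretical value, and the Kelly fraction comes out ahead. Over a short sequence the ranking can differ.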