Methodology · 20 May 2026

Your Sharpe ratio is lying to you

A single Sharpe number with no confidence interval is a coin flip in disguise. It's the most-quoted statistic in retail backtesting and the most commonly misused. Here's the math, a concrete example, and what to report instead.

Open any retail backtesting tool. Run a strategy. You'll get output that looks like this:

Strategy: SMA(50/200) on SPY
Annualised return: +8.4%
Max drawdown: -18.2%
Sharpe ratio: 0.74

That Sharpe of 0.74 looks like a real number. It's not. It's a point estimate with no associated uncertainty — and as we'll see, the true Sharpe of that same strategy over similar histories could plausibly be anywhere from negative to 1.4. Without a confidence interval, you don't actually know which of those it is.

This post is about why the single Sharpe number people quote everywhere is one of the most misleading numbers in finance, what the math actually says, and what to report instead.

What the Sharpe ratio is, properly

The Sharpe ratio (William Sharpe, 1966) is defined as:

Sharpe = (average return − risk-free rate) / standard deviation of returns

In plain English: how much return are you getting per unit of volatility? Higher is better. Above 1.0 is conventionally called "solid", above 2.0 is "exceptional", below 0.5 is "weak". These thresholds get repeated in every backtesting tutorial and every fund manager's pitch deck.
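As a minimal sketch, the definition above translates into a few lines of Python. The function name `sharpe_ratio` and the `periods_per_year` argument are illustrative choices, not part of any particular tool; the risk-free rate is assumed to be expressed per period, and annualisation uses the common square-root-of-time convention.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Sharpe ratio of a sequence of per-period returns.

    risk_free is the per-period risk-free rate. periods_per_year scales
    the result by sqrt(periods_per_year) to annualise it (an assumption:
    the usual convention, which itself presumes independent returns).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe is undefined")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)

# Hypothetical per-period returns, invented for illustration only.
example_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
print(round(sharpe_ratio(example_returns), 3))
```

Note that the function returns one number and nothing else. Everything that follows is about what that one number leaves out.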

But the Sharpe ratio is a sample statistic, not a population truth. You computed it from the actual trades your backtest produced. Those trades are themselves a sample — a particular sequence of returns drawn from some underlying distribution of "what this strategy does on this market." If you'd tested over a slightly different stretch of history, or with slightly different timing, or on a slightly different asset, you'd have gotten a different sample, and a different Sharpe.

The question that matters is: how much would the Sharpe number vary across plausible alternative histories? That variation is the standard error of the estimate, and it's nearly always large.

A concrete example

Take that SMA(50/200) crossover on SPY over the last decade. The backtest reports a Sharpe of 0.74. Let's see what happens when we resample the trades.

The strategy produced 12 round-trip trades. To estimate the sampling variance of the Sharpe ratio, we bootstrap: randomly pick 12 trades from the original 12, with replacement, compute the resampled Sharpe, repeat 10,000 times. That gives us a distribution of plausible Sharpe values consistent with the strategy's actual behaviour.

Here's what comes out:

Point estimate: 0.74
Mean of resamples: 0.71
95% CI lower bound: -0.18
95% CI upper bound: +1.42
CI width: 1.60

The 95% confidence interval on the Sharpe is [-0.18, +1.42]. The true Sharpe of this strategy — what it would average over many similar histories — could plausibly be anywhere from negative 0.18 (a losing strategy) to positive 1.42 (an excellent strategy). That's a 1.6-wide window, dwarfing the point estimate of 0.74.

In other words: the 0.74 Sharpe is consistent with this strategy being slightly worse than cash, AND with it being one of the best you've ever seen. We genuinely don't know which.
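The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the numbers in this post: the twelve trade returns are invented, the Sharpe is computed per trade with the risk-free rate taken as zero and no annualisation, and the interval is a plain percentile bootstrap. It will not reproduce the 0.74 or the [-0.18, +1.42] interval.

```python
import random
import statistics

def sharpe(returns):
    """Per-trade Sharpe: mean over sample standard deviation (risk-free rate taken as 0)."""
    return statistics.mean(returns) / statistics.stdev(returns)

def bootstrap_sharpe_ci(trade_returns, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the Sharpe ratio.

    Returns (lower bound, mean of resampled Sharpes, upper bound).
    """
    rng = random.Random(seed)
    n = len(trade_returns)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(trade_returns, k=n)  # n trades drawn with replacement
        if len(set(sample)) < 2:
            continue  # every draw identical: zero variance, so redraw
        estimates.append(sharpe(sample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1.0 - tail) * n_resamples) - 1]
    return lower, statistics.mean(estimates), upper

# Hypothetical returns for 12 round-trip trades (made up for illustration).
trades = [0.062, -0.031, 0.118, -0.044, 0.027, 0.091,
          -0.058, 0.143, -0.012, 0.036, -0.027, 0.074]

lower, resample_mean, upper = bootstrap_sharpe_ci(trades)
print(f"point estimate:    {sharpe(trades):.2f}")
print(f"mean of resamples: {resample_mean:.2f}")
print(f"95% CI:            [{lower:.2f}, {upper:.2f}]")
```

Fixing the seed makes the interval reproducible from run to run. Changing it should move the bounds only slightly once the number of resamples is in the thousands.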

Why the CI is so wide

Three forces drive the standard error of a Sharpe estimate:

Sample size. The fewer trades, the wider the CI. With 12 trades, you're estimating mean and variance from a small sample — both are noisy. Double the trades and the CI roughly shrinks by a factor of √2.
Underlying volatility of returns. Sharpe is mean over std-dev. If individual trade returns vary wildly (which they typically do), the estimate of either is unstable.
Tail behaviour. Financial returns aren't normally distributed — they have fat tails. The standard "normal-distribution" assumption that lots of Sharpe-significance tables use underestimates the real uncertainty significantly.

The single most consequential of these is sample size. Most retail backtests run on a handful of trades — a 50/200 SMA on a single stock fires maybe 2-5 times over five years. With sample sizes that small, the Sharpe point estimate is almost meaningless. You'd need 30-50+ trades before the CI starts narrowing enough for the point estimate to carry information.
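To get a feel for the sample-size effect, here is a small simulation sketch. It makes a deliberately friendly assumption: trade returns are independent Gaussian draws, so it leaves out the fat tails mentioned above and real data would look worse. The mean of 3% and standard deviation of 7% per trade are hypothetical values chosen only to make the example run.

```python
import random
import statistics

def sharpe_spread(n_trades, n_histories=2_000, mean=0.03, sd=0.07, seed=0):
    """Standard deviation of the per-trade Sharpe estimate across simulated histories.

    Each history is n_trades independent Gaussian returns (a simplifying
    assumption), and each history yields one Sharpe estimate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_histories):
        trades = [rng.gauss(mean, sd) for _ in range(n_trades)]
        estimates.append(statistics.mean(trades) / statistics.stdev(trades))
    return statistics.stdev(estimates)

for n in (12, 24, 48):
    print(f"{n:>3} trades: spread of Sharpe estimates = {sharpe_spread(n):.3f}")
```

Under these assumptions the spread should fall by roughly a factor of √2 each time the trade count doubles, which is why going from a dozen trades to several dozen matters so much.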

The same problem hits annualised return

Sharpe isn't unique here. Every backtest statistic computed from a sample has sampling variance, including the headline annualised return. The same SMA(50/200) example:

Annualised return (point): +8.4%
95% bootstrap CI: [-2.1%, +18.9%]

The lower bound is below zero. The strategy's "true" annualised return is consistent with losing 2% per year. The 8.4% you saw is just one draw from a wide distribution.

This is the same point the first post in this series made about returns. The CI on the Sharpe is a special case of the same problem: point estimates with no uncertainty quantification are misleading on every metric, not just one.

The textbook tables don't help

If you've Googled "Sharpe ratio confidence interval" you've probably seen formulas like:

SE(Sharpe) ≈ √((1 + Sharpe² / 2) / n)

This is the asymptotic standard error of the Sharpe ratio under the assumption that returns are independent, identically distributed, and normally distributed. For financial returns over short windows that assumption is broken on all three counts: returns are serially dependent (volatility clustering), are not identically distributed (regime changes), and have fat tails (definitely not normal).

The result is that the asymptotic formula underestimates the standard error in real conditions, often by 20-50%. The bootstrap version makes no assumption about the shape of the return distribution: it resamples the returns actually observed. (A plain bootstrap does still treat those returns as independent draws.) It's the right tool.
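For comparison, the textbook formula is easy to evaluate. The sketch below assumes the Sharpe passed in is measured per observation (per trade or per period, not annualised) and that n counts those same observations. The function names and the z = 1.96 default are illustrative choices.

```python
import math

def asymptotic_sharpe_se(sharpe, n):
    """Asymptotic standard error of a Sharpe estimate from n i.i.d. normal returns.

    sharpe must be in the same units as the n observations (per period,
    not annualised).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt((1.0 + 0.5 * sharpe ** 2) / n)

def asymptotic_ci(sharpe, n, z=1.96):
    """Normal-approximation interval; z = 1.96 corresponds to roughly 95% coverage."""
    half_width = z * asymptotic_sharpe_se(sharpe, n)
    return sharpe - half_width, sharpe + half_width

# Hypothetical inputs: a per-trade Sharpe of 0.5 estimated from 12 trades.
se = asymptotic_sharpe_se(0.5, 12)
low, high = asymptotic_ci(0.5, 12)
print(f"SE = {se:.3f}, approximate 95% CI = [{low:.2f}, {high:.2f}]")
```

Even this optimistic formula gives a wide interval at a dozen observations. The point of the section above is that the real interval is likely wider still.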

What to report instead

Three changes turn a misleading Sharpe number into something honest:

1. Always report a confidence interval alongside the point estimate.

Instead of "Sharpe = 0.74", report "Sharpe = 0.74 (95% CI: [-0.18, +1.42])". The interval immediately tells the reader whether the result is meaningful or noise. If the CI crosses zero, the strategy isn't statistically distinguishable from no edge.

2. Apply a multiple-comparison correction if you tested variants.

If you tried twenty different SMA periods and reported the Sharpe of the best one, that 0.74 has been cherry-picked. A Bonferroni or similar correction widens the CI to account for the family of strategies you considered. Covered in detail here.
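One way to apply a Bonferroni correction to a bootstrap interval is to divide the significance level by the number of variants tried and read off more extreme percentiles. The sketch below assumes you already have a list of bootstrap estimates. The function names are illustrative, and the demo estimates are seeded random numbers standing in for real resamples.

```python
import random

def percentile_ci(sorted_estimates, alpha):
    """Two-sided percentile interval at level 1 - alpha from sorted bootstrap estimates."""
    n = len(sorted_estimates)
    lo_idx = int((alpha / 2.0) * n)
    hi_idx = min(n - 1, int((1.0 - alpha / 2.0) * n))
    return sorted_estimates[lo_idx], sorted_estimates[hi_idx]

def bonferroni_ci(bootstrap_estimates, n_variants, alpha=0.05):
    """Percentile interval at the Bonferroni-adjusted level alpha / n_variants."""
    if n_variants < 1:
        raise ValueError("n_variants must be at least 1")
    return percentile_ci(sorted(bootstrap_estimates), alpha / n_variants)

# Stand-in for 10,000 bootstrap Sharpe estimates (synthetic, for illustration).
rng = random.Random(0)
estimates = [rng.gauss(0.7, 0.4) for _ in range(10_000)]

print("uncorrected 95% CI:       ", bonferroni_ci(estimates, n_variants=1))
print("corrected for 20 variants:", bonferroni_ci(estimates, n_variants=20))
```

With 20 variants the adjusted tails are only 0.125% on each side, so this needs thousands of resamples for the extreme percentiles to be estimated from more than a handful of points.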

3. Tie the metric to a real out-of-sample window.

The Sharpe you compute on the data you tuned your strategy on is biased upward. Compute it on a holdout window the strategy never saw during selection. Walk-forward validation is how.
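A rough sketch of how walk-forward windows can be laid out: tune on a training window, evaluate on the window that immediately follows it, then roll forward. The generator name, the fixed window sizes, and rolling by exactly one test window are assumptions made for this example; real setups vary.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_range, test_range) index pairs over n_obs observations.

    Each test window starts right after its training window ends, so the
    strategy is always scored on data it did not see during tuning.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window

# Hypothetical layout: 10 years of data, 4 years to tune, 2 years to test.
for train, test in walk_forward_windows(n_obs=10, train_size=4, test_size=2):
    print(f"train on years {train.start}-{train.stop - 1}, "
          f"test on years {test.start}-{test.stop - 1}")
```

The Sharpe worth reporting is the one computed on the test windows only, and it still needs its own confidence interval.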

How EdgeAudit handles this

EdgeAudit's /recipe and /backtest commands report the Sharpe ratio, but the verdict layer doesn't make decisions on Sharpe alone. Instead the verdict ladder reads the bootstrap confidence intervals on the strategy's return:

PASS — both the standard 95% CI and the Bonferroni-corrected CI exclude zero. Even after adjusting for the parameter variants typically explored, the lower bound stays positive.
PROMISING — the 95% CI excludes zero but the Bonferroni-corrected CI doesn't. Worth investigating, not ready to deploy.
REJECT — the 95% CI crosses zero. Statistically can't rule out luck.
INSUFFICIENT_SAMPLE — fewer than 30 trades. CIs too wide to be useful regardless of point estimates.
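The ladder above can be written as a small decision function. This is a reading of the four rules as listed, not EdgeAudit's actual code: the function name, the argument shapes, checking sample size first, and treating "excludes zero" as "lower bound above zero" are all assumptions.

```python
def verdict(n_trades, ci_95, ci_bonferroni, min_trades=30):
    """Map a trade count and two (lower, upper) return CIs to a verdict label.

    Hypothetical sketch: the order of the checks and the handling of
    all-negative intervals are assumptions, not documented behaviour.
    """
    if n_trades < min_trades:
        return "INSUFFICIENT_SAMPLE"  # too few trades for the CIs to be useful
    if ci_95[0] <= 0.0:
        return "REJECT"               # the 95% CI does not stay above zero
    if ci_bonferroni[0] <= 0.0:
        return "PROMISING"            # survives 95%, fails the corrected CI
    return "PASS"                     # both lower bounds are positive

# The annualised-return CI from the SMA(50/200) example, with its 12 trades.
# The Bonferroni interval passed here is a made-up placeholder.
print(verdict(12, (-0.021, 0.189), (-0.060, 0.230)))
```

Under this reading the SMA(50/200) example lands in INSUFFICIENT_SAMPLE because it has only 12 trades. With 30 or more trades and the same interval it would be a REJECT, since the lower bound is below zero.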

The Sharpe number is still in the output, because it's a useful summary when you're comparing strategies head-to-head (a higher Sharpe means more return per unit of risk, in expectation). But it doesn't drive the verdict, because a single Sharpe with no CI shouldn't drive any decision.

The take-away

If you take one thing away from this post, make it this: never look at a Sharpe ratio without asking what its confidence interval is. A CI of ±0.1 around a 0.74 is a meaningful result. A CI of ±0.8 around the same 0.74 means you have no idea what the strategy actually does.

Tools that report only the point estimate, which is most retail backtesters, are giving you a number that feels precise and meaningful, but mathematically isn't. The next time you see a strategy advertised with a "Sharpe of 1.5", ask: across how many trades, and what's the lower bound of the 95% confidence interval? If they don't have an answer, the number is decoration, not evidence.

Try it. Run /recipe company: Apple strategy: Golden Cross years: 5 in EdgeAudit. The output includes both the point estimate (annualised return, Sharpe) AND the bootstrap-derived confidence interval. The verdict line tells you, in plain English, whether the lower bound is meaningfully above zero.