What Is the Central Limit Theorem? Explained with Examples

One of the most powerful ideas in all of statistics — why the bell curve appears everywhere, how sample size controls precision, and what a plinko machine has to do with probability.

By Petrus Sheya

June 30, 2026 · 7 min read

You roll a six-sided die 40 times. The average of those 40 rolls — what shape does its distribution have?

Not the shape of one die roll, which is flat. Something else. Something suspiciously bell-shaped.

Roll twice, average, repeat. The average of three rolls. The average of thirty. Each time, the distribution of that average approaches the same shape — a symmetric, bell-shaped mound — no matter how many sides the die has.

This is the Central Limit Theorem. It's why the bell curve appears everywhere. And it's one of the most beautiful results in all of mathematics.


A plinko machine

Before the math, let's build intuition with a physical machine.

Picture a plinko board: a triangular grid of pegs, with bins at the bottom. You drop a ball from the top. At each peg, the ball bounces randomly left or right. After 10 pegs, it lands in a bin.

The final position is the sum of 10 independent coin flips — left (−1) or right (+1). Each flip is as simple as a decision can be.

Each ball makes 10 random left/right decisions at the pegs — binary choices that add up to a perfect bell.

[Interactive plinko simulation: bins 0 to 10, 10 decisions per ball, expected peak at bin 5. Press Play to drop balls.]

Let it run until you've dropped 100 or 200 balls. Watch the histogram that builds up at the bottom. It's not flat. It's not triangular. It's a bell curve.

Every ball made the same number of completely random decisions. There's no preference for the middle. And yet the middle is where they pile up — because there are more ways to end up near the center than at either extreme.
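The pile-up in the middle is easy to reproduce without the animation. Here is a minimal Python sketch, assuming each peg is a fair coin flip and using the number of rightward bounces as the bin index (in the ±1 convention above, position = 2 × bin − 10). The names `drop_ball` and `simulate` are illustrative and are not taken from the page's own simulation.

```python
import random
from collections import Counter

def drop_ball(n_pegs, rng):
    """Return the bin index: how many of the n_pegs bounces went right."""
    return sum(rng.random() < 0.5 for _ in range(n_pegs))

def simulate(n_balls=200, n_pegs=10, seed=0):
    """Drop n_balls balls and return the count in each bin, 0 through n_pegs."""
    rng = random.Random(seed)
    counts = Counter(drop_ball(n_pegs, rng) for _ in range(n_balls))
    return [counts.get(b, 0) for b in range(n_pegs + 1)]

if __name__ == "__main__":
    # Text histogram: one '#' per ball that landed in the bin.
    for b, c in enumerate(simulate()):
        print(f"bin {b:2d} | {'#' * c}")
```

Changing `seed` gives a different run, but the middle bins should fill fastest each time.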

10 rights in a row: one way. 9 rights and 1 left: ten ways. 5 and 5: 252 ways.

That's the bell curve. The number of ways to reach each outcome, counted by the binomial coefficients, rises to a peak in the middle and falls away symmetrically. And as the number of decisions grows, that shape converges on the normal distribution.
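The counts quoted above can be checked directly. This short sketch uses Python's `math.comb` to count the left/right sequences that contain exactly k rights out of 10; the variable names are illustrative.

```python
from math import comb

N_PEGS = 10

# ways[k] = number of distinct left/right sequences with exactly k rights
ways = [comb(N_PEGS, k) for k in range(N_PEGS + 1)]
total = 2 ** N_PEGS  # every sequence of 10 fair bounces is equally likely

for k, w in enumerate(ways):
    print(f"{k:2d} rights: {w:3d} ways, probability {w / total:.4f}")
```

The list reads 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1 and sums to 1,024, so roughly a quarter of all balls (252 / 1,024) land in the center bin.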


The dice experiment

The plinko machine used binary decisions — left or right. But the CLT doesn't need that. Let's try with dice.

Roll $n$ dice and take the average. Repeat that experiment 1,000 times. Plot the distribution of averages.

Each point is the average of n random numbers from [0, 1]. Drag n and watch flat become bell.

[Interactive histogram of sample means, x-axis 0.00 to 1.00. Initial readout: Samples 1000, σ / √n 0.289, Shape flat.]

At $n = 1$, you're just looking at one die: a flat, uniform distribution. Every face equally likely.

Drag to $n = 2$. Now two dice, averaged. You can't easily get a 1 or a 6 anymore, because you'd need both dice to show the extreme. The shape becomes triangular.

By $n = 10$, it's a recognizable bell. By $n = 30$, it's almost indistinguishable from the ideal normal curve (the dashed overlay).

Notice the width in the stat box. It narrows as $n$ grows. At $n = 1$, the spread is wide. At $n = 40$, the averages cluster tightly around 0.5 (the simulator draws each "die" as a random number from [0, 1], so the true mean is 0.5). More data means a more precise average.
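The same experiment can be sketched in a few lines of Python. This version assumes, as the caption above does, that each draw is a uniform random number on [0, 1]; the function name `sample_means` and the seed are illustrative choices.

```python
import math
import random
import statistics

SIGMA = math.sqrt(1 / 12)  # standard deviation of one uniform draw on [0, 1], about 0.289

def sample_means(n, repeats=1000, seed=0):
    """Average n uniform draws, and repeat that experiment `repeats` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(repeats)]

if __name__ == "__main__":
    for n in (1, 2, 10, 40):
        means = sample_means(n)
        print(f"n={n:2d}  center={statistics.fmean(means):.3f}  "
              f"spread={statistics.stdev(means):.3f}  "
              f"predicted={SIGMA / math.sqrt(n):.3f}")
```

The observed spread should track the predicted value of 0.289 / √n closely at every n, while the center stays near 0.5.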


The theorem

Here's the formal statement.

Take any random variable $X$ with finite mean $\mu$ and finite standard deviation $\sigma$. Draw $n$ independent samples: $X_1, X_2, \ldots, X_n$. Compute their average:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

As $n$ grows large, the distribution of $\bar{X}$ approaches a normal distribution with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$:

$$\bar{X} \sim \mathcal{N}\!\left(\mu,\; \frac{\sigma}{\sqrt{n}}\right)$$

Two things to unpack here.

The center doesn't change. The average of the averages is always $\mu$, the true mean of the original distribution. The CLT doesn't shift your estimate; it concentrates it.

The spread shrinks as $1 / \sqrt{n}$. Double the sample size, and the standard deviation of $\bar{X}$ shrinks by a factor of $\sqrt{2}$. Quadruple the sample size, and the spread halves. This quantity, $\sigma / \sqrt{n}$, is called the standard error. It measures how much your average wanders around the true mean from one sample to the next.
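Both facts follow from a short calculation, sketched here under the theorem's assumptions that the samples are independent and share the same mean $\mu$ and standard deviation $\sigma$. Independence is what lets the variance of the sum split into a sum of variances.

```latex
\begin{aligned}
\mathbb{E}[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\operatorname{SD}(\bar{X}) &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

Replacing $n$ with $4n$ in the last line gives $\sigma / (2\sqrt{n})$, which is the halving described above.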


It works for any distribution

The theorem says "any random variable with a finite mean and standard deviation." That's nearly everything; the exceptions are heavy-tailed distributions such as the Cauchy, whose variance is not finite. It doesn't have to be symmetric. It doesn't have to be bell-shaped. It doesn't even have to look remotely like a normal distribution.

Pick any source shape. Accumulate sample means of size 30. The bell always appears below, no matter what's on top.

[Interactive simulator: source distribution (top panel) and histogram of sample means for n = 30 (bottom panel). Initial readout: μ = 0.50, Samples 0, Expected σ_mean 0.053.]

Try all three source distributions. The uniform distribution is flat — every value equally likely. The exponential is heavily skewed right — most values are small, but the tail stretches far. The bimodal has two humps with almost nothing in the middle — about as far from a bell as you can get.

Now watch what happens to the sample means.

Hit Auto-play and let a few hundred samples accumulate. All three converge to the same dashed bell curve. The source distribution visible in the top panel looks completely different each time. The histogram of means below looks the same every time.

The shape of the source distribution is irrelevant. As long as its variance is finite and the sample size is large enough, the distribution of the average converges to a bell centered at $\mu$, with spread $\sigma / \sqrt{n}$.
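Here is a rough Python sketch of that comparison across three differently shaped sources. The bimodal source is a stand-in of my own (an equal mix of two narrow normals centered at 0.2 and 0.8) and is not necessarily the one the simulator uses; `SOURCES`, `skewness`, and `summarize` are illustrative names. Skewness is reported as a crude shape check: it is 0 for a symmetric bell and 2 for the raw exponential.

```python
import math
import random
import statistics

N = 30  # sample size per mean, matching the simulator described above

def bimodal(rng):
    # Hypothetical two-hump source: equal mix of normals centered at 0.2 and 0.8.
    center = 0.2 if rng.random() < 0.5 else 0.8
    return rng.gauss(center, 0.05)

# name -> (sampler, true mean, true standard deviation)
SOURCES = {
    "uniform": (lambda rng: rng.random(), 0.5, math.sqrt(1 / 12)),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0, 1.0),
    "bimodal": (bimodal, 0.5, math.sqrt(0.3 ** 2 + 0.05 ** 2)),
}

def skewness(xs):
    """Third standardized moment: 0 for a symmetric shape such as a bell."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def summarize(name, repeats=2000, seed=0):
    """Return (mean of means, SD of means, predicted SE, skewness of means)."""
    sampler, mu, sigma = SOURCES[name]
    rng = random.Random(seed)
    means = [statistics.fmean(sampler(rng) for _ in range(N)) for _ in range(repeats)]
    return (statistics.fmean(means), statistics.stdev(means),
            sigma / math.sqrt(N), skewness(means))

if __name__ == "__main__":
    for name in SOURCES:
        mean, sd, se, skew = summarize(name)
        print(f"{name:12s} mean={mean:.3f}  sd={sd:.3f}  predicted={se:.3f}  skew={skew:+.2f}")
```

For each source, the observed spread of the means should sit close to the predicted σ / √30, and the skewness of the means should be far smaller than that of a strongly skewed source such as the exponential.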

This is the theorem's real surprise. It's not "the normal distribution is special." It's "the normal distribution is inevitable."


Precision grows with sample size

The standard error $\sigma / \sqrt{n}$ is the key practical lever of statistics.

If you want to estimate the true mean $\mu$ from data, your estimate $\bar{X}$ will be close to $\mu$, but not exact. The standard error tells you how close. A 95% confidence interval runs roughly $\bar{X} \pm 2 \cdot \frac{\sigma}{\sqrt{n}}$, meaning that intervals built this way contain the true mean about 95% of the time.
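One way to see what "95%" means is to build many such intervals and count how often they contain the true mean. The sketch below assumes an exponential source with μ = σ = 1 and treats σ as known, which is a simplification; `coverage` is an illustrative name.

```python
import math
import random
import statistics

def coverage(n=30, trials=2000, seed=0):
    """Fraction of intervals xbar ± 2·σ/√n that contain the true mean μ = 1."""
    mu, sigma = 1.0, 1.0  # exponential with rate 1 has mean 1 and standard deviation 1
    rng = random.Random(seed)
    half_width = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        hits += abs(xbar - mu) <= half_width
    return hits / trials

if __name__ == "__main__":
    print(f"coverage at n=30: {coverage():.3f}")
```

The printed fraction should land near 0.95 even though the source is heavily skewed, because it is the distribution of the average, not of the raw data, that the interval relies on.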

Gray: source distribution (exponential, σ=1). Red: where sample means land. As n grows, the red bell collapses.

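[Interactive plot: source distribution (gray) and distribution of X̄ (red), x-axis 0 to 4, μ = 1 marked. Readout at n = 1: SE = σ / √n 1.000, 95% CI width 4.000, Peak height 0.40.]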

The gray curve is the source distribution (exponential, mean = 1, wide spread). The red bell is where sample means land. At $n = 1$, the red bell is as wide as the source itself: one sample doesn't tell you much.

Drag $n$ upward and watch the red bell collapse. At $n = 25$, it's already much narrower than the source. At $n = 100$, the red bell is a sharp spike centered on $\mu = 1$.

The 95% CI width in the stat box updates live. At $n = 1$: width ≈ 4.00. At $n = 100$: width ≈ 0.40. At $n = 400$: width ≈ 0.20. Every factor of 4 in sample size halves the error. This is the square-root law, and it governs every scientific estimate from opinion polls to clinical trials.
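The stat-box numbers are simple arithmetic on the standard error. A small sketch, assuming σ = 1 as in the exponential example and the rough ±2 standard errors rule; `ci_width` is an illustrative helper name.

```python
import math

def ci_width(n, sigma=1.0, z=2.0):
    """Full width of the rough 95% interval: 2 · z · σ / √n."""
    return 2 * z * sigma / math.sqrt(n)

for n in (1, 25, 100, 400):
    se = 1.0 / math.sqrt(n)
    print(f"n = {n:3d}:  SE = {se:.3f},  95% CI width = {ci_width(n):.2f}")
```

This gives widths of 4.00, 0.80, 0.40, and 0.20, matching the values quoted above: going from n = 100 to n = 400 costs four times the data for half the width.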


Why the bell is everywhere

The CLT says: averages over independent samples converge to a normal distribution. But what's a measurement, really?

Height is the average effect of thousands of genes and environmental factors, each nudging you slightly taller or shorter.

A factory part's length is the sum of hundreds of small machining errors, each adding or subtracting a tiny amount.

A test score is the aggregate of knowledge across many independent questions.

A polling average is the mean of thousands of independent voter intentions.

In each case, we're summing or averaging many small, roughly independent contributions. And the CLT says that sum will be approximately normal, automatically, regardless of what the individual contributions look like, as long as no single one dominates.

The bell doesn't appear because the world is normal. It appears because the world is a sum.


The short version

The Central Limit Theorem says: take any distribution with mean $\mu$ and standard deviation $\sigma$. Draw $n$ independent samples and compute their average. As $n$ grows, that average follows a normal distribution centered at $\mu$ with standard deviation $\sigma / \sqrt{n}$.

Three consequences that matter in practice:

  1. Your average is an unbiased estimate of the true mean. No matter what the source distribution looks like.

  2. Your precision improves with $\sqrt{n}$. Four times more data = half the error.

  3. The source distribution's shape doesn't matter: for large $n$, the sampling distribution is always a bell.

This is why statistics works at all. We don't need to know the true distribution of human heights, or measurement errors, or voter preferences. We just take enough samples, compute an average, and the CLT tells us that average is approximately normally distributed. We can build confidence intervals, run hypothesis tests, and calculate p-values, because averages, by the theorem's guarantee, are always bell-shaped in the limit.

The bell curve isn't an assumption about the world. It's what the world produces when you add things up.


All simulations run live in the browser. The Galton board uses physically correct peg geometry with Binomial(10, 0.5) statistics. The dice experiment uses 40,000 pre-generated uniform random numbers, grouped into batches of n. The AnyDistCLT simulator draws fresh samples on each iteration using a Box-Muller transform for the bimodal component.
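For readers curious about the Box-Muller transform mentioned above, here is a minimal Python sketch of the general technique. It is not the page's actual code: the function names, the hump centers (0.2 and 0.8), and the hump width (0.05) are assumptions chosen for illustration.

```python
import math
import random

def box_muller(rng):
    """Turn two uniform draws into one standard normal draw."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def bimodal_sample(rng, centers=(0.2, 0.8), spread=0.05):
    """Hypothetical two-hump source: pick a hump at random, then add normal noise."""
    center = centers[0] if rng.random() < 0.5 else centers[1]
    return center + spread * box_muller(rng)

if __name__ == "__main__":
    rng = random.Random(0)
    print([round(bimodal_sample(rng), 3) for _ in range(5)])
```

In everyday Python code, `random.gauss` does the same job as `box_muller`; the transform is written out here only to show how it works.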