Unbiased estimator for population variance: clearly explained!

Estimator: A statistic used to approximate a population parameter. Sometimes called a point estimator. Estimate: The observed value of the estimator. Unbiased estimator: An estimator whose expected value is equal to the parameter that it is trying to estimate.

Sheldon M. Ross.

Students often wonder why the formula for the sample variance divides by n-1. In this pedagogical post, I show why dividing by n-1 provides an unbiased estimator of the population variance, which is unknown when I study a particular sample. I start with n independent observations drawn from the same population with mean µ and variance σ²:

X_1, X_2, \dots, X_n

I recall two important properties of the expected value:

\begin{aligned} E(\Sigma X_i)&=\Sigma E(X_i) \\ E(cX_i)&=cE(X_i) \end{aligned}

The variance is defined as follows:

V(X)=E(X^2)-[E(X)]^2

Thus, I rearrange the variance formula to obtain the following expression:

\begin{aligned} E(X^2)&=V(X)+[E(X)]^2 \\ E(X^2)&=\sigma^2+\mu^2 \end{aligned}
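As a quick sanity check of this rearrangement, here is a minimal sketch that computes both sides exactly. The fair six-sided die is my own illustrative choice, not part of the proof, and I compute the variance from the equivalent form E[(X-µ)²] so that the check is not circular.

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, len(values))  # each face is equally likely

mu = sum(p * x for x in values)                  # E(X)
second_moment = sum(p * x**2 for x in values)    # E(X^2)
sigma2 = sum(p * (x - mu) ** 2 for x in values)  # V(X), computed as E[(X - mu)^2]

# Both printed values should agree: E(X^2) and sigma^2 + mu^2.
print(second_moment, sigma2 + mu**2)
```

Exact fractions avoid any rounding, so the two printed values can be compared directly.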

For the proof I also need the expectation of the square of the sample mean:

E(\bar X^2)=V(\bar X)+[E(\bar X)]^2

Before moving further, I can find the expression for the expected value of the mean and the variance of the mean:

E(\bar X) = E\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

The expected value operator is linear:

\begin{aligned} E(cX_i)&=cE(X_i) \end{aligned}
\begin{aligned} E(\bar X) &= \Big(\frac{1}{n}\Big)(\mu+\mu+\dots+\mu)\\ E(\bar X) &= \Big(\frac{1}{n}\Big)n\times\mu\\ E(\bar X) &= \mu \end{aligned}

I move to the variance of the mean:

V(\bar X) = V\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

Since the variance is a quadratic operator and the observations are independent (so the variance of their sum is the sum of their variances), I have:

\begin{aligned} V(cX_i)&=c^2V(X_i) \end{aligned}
\begin{aligned} V(\bar X) &= \Big(\frac{1}{n}\Big)^2(\sigma^2+\sigma^2+\dots+\sigma^2)\\ V(\bar X) &= \Big(\frac{1}{n}\Big)^2n\times\sigma^2\\ V(\bar X) &= \frac{\sigma^2}{n} \end{aligned}

Thus, I obtain:

\begin{aligned} E(\bar X^2)&=V(\bar X)+[E(\bar X)]^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}
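The three results for the sample mean can be checked exactly on a small case. The sketch below assumes a hypothetical population (a fair die) and a hypothetical sample size n = 3, and enumerates every equally likely sample of independent draws.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
n = 3                        # hypothetical sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)

# All 6**n equally likely samples of n independent draws.
samples = list(product(values, repeat=n))
prob = Fraction(1, len(samples))

means = [Fraction(sum(s), n) for s in samples]
e_mean = sum(prob * m for m in means)        # E(X-bar)
e_mean_sq = sum(prob * m**2 for m in means)  # E(X-bar^2)
v_mean = e_mean_sq - e_mean**2               # V(X-bar)

print("E(mean)   =", e_mean, "vs mu =", mu)
print("V(mean)   =", v_mean, "vs sigma^2/n =", sigma2 / n)
print("E(mean^2) =", e_mean_sq, "vs sigma^2/n + mu^2 =", sigma2 / n + mu**2)
```

Enumeration only works for tiny cases like this one, but it has the advantage of being exact rather than approximate.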

I need to show that:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I focus on the expectation of the numerator. In the sum, I omit the superscript and the subscript for clarity of exposition:

\begin{aligned} E[\Sigma (X_i-\bar X)^2] &= E[\Sigma (X_{i}^{2}-2X_{i}\bar X+\bar X^2)] \end{aligned}

Because,

(a-b)^2=(a^2-2ab+b^2)

I continue by distributing the sum over the three terms. In the middle sum, the sample mean does not depend on i, so it can be taken outside the sum:

\begin{aligned} E[\Sigma (X_{i}^{2}-2X_{i}\bar X+\bar X^2)] &= E[\Sigma X_{i}^{2}-\Sigma 2X_{i}\bar X+\Sigma\bar X^2]\\ &= E[\Sigma X_{i}^{2}-2\bar X\Sigma X_{i}+n\bar X^2]\\ &= E[\Sigma X_{i}^{2}-2\bar Xn\bar X+n\bar X^2] \end{aligned}

The last step uses the fact that the mean is the sum of the observations divided by the number of observations:

\begin{aligned} \bar X&=\frac{\Sigma X_{i}}{n}\\ \Sigma X_{i}&=n\times\bar X \end{aligned}

I simplify further and, since the expectation of a sum is equal to the sum of the expectations, I have:

\begin{aligned} E[\Sigma X_{i}^{2}-2\bar Xn\bar X+n\bar X^2] &= E[\Sigma X_{i}^{2}-2n\bar X^2+n\bar X^2]\\ &= E[\Sigma X_{i}^{2}-n\bar X^2]\\ &= E(\Sigma X_{i}^{2})-E(n\bar X^2)\\ &= E(\Sigma X_{i}^{2})-nE(\bar X^2) \end{aligned}
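The algebra inside the expectation holds for any fixed set of numbers, so it can be checked on a concrete sample before any expectation is taken. The five observations below are made up for illustration.

```python
from fractions import Fraction

sample = [2, 4, 4, 7, 9]  # hypothetical observations
n = len(sample)
x_bar = Fraction(sum(sample), n)  # sample mean, kept as an exact fraction

lhs = sum((x - x_bar) ** 2 for x in sample)     # sum of squared deviations
rhs = sum(x**2 for x in sample) - n * x_bar**2  # sum of squares minus n * mean^2

print(lhs, rhs)
```

Both sides come out equal, which is the identity used in the derivation with the expectation removed.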

I use the results obtained earlier:

\begin{aligned} E(X^2)&=\sigma^2+\mu^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}
\begin{aligned} E[\Sigma (X_i-\bar X)^2] &= E(\Sigma X_{i}^{2})-nE(\bar X^2)\\ &= \Sigma (\sigma^2+\mu^2)-n\Big(\frac{\sigma^2}{n}+\mu^2\Big)\\ &= n\sigma^2+n\mu^2-\sigma^2-n\mu^2\\ &= n\sigma^2-\sigma^2\\ &= (n-1)\sigma^2 \end{aligned}
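To see the factor n-1 appear in a concrete case, the sketch below averages the sum of squared deviations over every possible sample. As before, the fair die and n = 3 are my own illustrative assumptions, and `sum_sq_dev` is a helper name I introduce here.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
n = 3                        # hypothetical sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)

# All 6**n equally likely samples of n independent draws.
samples = list(product(values, repeat=n))
prob = Fraction(1, len(samples))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

expected_numerator = sum(prob * sum_sq_dev(s) for s in samples)

print("E[sum of squared deviations] =", expected_numerator)
print("(n-1) * sigma^2              =", (n - 1) * sigma2)
```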

I wanted to show this:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I use the previous result to show that dividing by n-1 provides an unbiased estimator:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ E(S^2)&=\frac{1}{n-1}E[\Sigma_{i=1}^n (X_i-\bar X)^2]\\ E(S^2)&=\frac{1}{n-1}(n-1)\sigma^2\\ E(S^2)&=\sigma^2 \end{aligned}

The expected value of the sample variance is equal to the population variance, which is exactly the definition of an unbiased estimator.
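A simulation makes the difference between the two denominators visible. This is only a rough sketch under assumptions of my own: normally distributed data with µ = 10 and σ = 2, samples of size 5, and 10,000 repetitions. It uses Python's `statistics.variance` (denominator n-1) and `statistics.pvariance` (denominator n).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0  # hypothetical population parameters (sigma^2 = 4)
n, trials = 5, 10000   # hypothetical sample size and number of repetitions

s2_values = []      # estimates that divide by n-1
biased_values = []  # estimates that divide by n
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(statistics.variance(sample))       # denominator n-1
    biased_values.append(statistics.pvariance(sample))  # denominator n

mean_s2 = statistics.fmean(s2_values)
mean_biased = statistics.fmean(biased_values)

print(f"average when dividing by n-1: {mean_s2:.3f} (target {sigma**2})")
print(f"average when dividing by n:   {mean_biased:.3f} "
      f"(theory: {(n - 1) / n * sigma**2:.3f})")
```

The average of the n-1 version should land close to σ², while the version that divides by n should settle near (n-1)/n times σ², which follows from the expectation of the numerator derived above. The simulated values fluctuate with the seed and the number of repetitions, so they only approximate these targets.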