Unbiased estimator for population variance: clearly explained!

Estimator: A statistic used to approximate a population parameter. Sometimes called a point estimator.
Estimate: The observed value of the estimator.
Unbiased estimator: An estimator whose expected value is equal to the parameter that it is trying to estimate.

— Sheldon M. Ross

This post is based on two YouTube videos made by the wonderful YouTuber jbstatistics: https://www.youtube.com/watch?v=7mYDHbrLEQo and https://www.youtube.com/watch?v=D1hgiAla3KI&list=WL&index=11&t=0s. They are the most pedagogical videos I have found on this subject.

Sometimes, students wonder why we have to divide by n-1 in the formula for the sample variance. In this pedagogical post, I show why dividing by n-1 provides an unbiased estimator of the population variance, which is unknown when I study a particular sample. I start with n independent observations with mean µ and variance σ²:

X_1, X_2, \dots, X_n

I recall two important properties of the expected value:

E(ΣX_i)=ΣE(X_i) \\ E(cX_i)=cE(X_i)

The variance is defined as follows:

\begin{aligned} V(X)&=E\big[(X-E(X))^2\big]\\ &=E(X^2)-[E(X)]^2 \end{aligned}
Thus, I rearrange the variance formula to obtain the following expression:

\begin{aligned} E(X^2)&=V(X)+[E(X)]^2 \\ E(X^2)&=\sigma^2+\mu^2 \end{aligned}
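As a quick numerical sanity check of this identity, here is a short Monte Carlo sketch (the normal distribution and the values µ = 3 and σ = 2 are my own illustrative choices, not part of the derivation):

```python
import random

# Illustrative check that E(X^2) = sigma^2 + mu^2.
# The distribution and parameters are arbitrary choices for the demo.
random.seed(0)
mu, sigma = 3.0, 2.0
samples = [random.gauss(mu, sigma) for _ in range(200_000)]

# Sample average of X^2 should approach sigma**2 + mu**2 = 13.
mean_of_squares = sum(x * x for x in samples) / len(samples)
print(mean_of_squares)  # ≈ 13
```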

For the proof I also need the expectation of the square of the sample mean:

E(\bar X^2)=V(\bar X)+[E(\bar X)]^2

Before moving further, I find the expressions for the expected value of the mean and for the variance of the mean. Starting with the expected value:

E(\bar X) = E\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

The expected value operator is linear:

\begin{aligned} E(cX_i)&=cE(X_i) \end{aligned}
\begin{aligned} E(\bar X) &= \Big(\frac{1}{n}\Big)\big[E(X_1)+E(X_2)+\dots+E(X_n)\big]\\ E(\bar X) &= \Big(\frac{1}{n}\Big)(\mu+\mu+\dots+\mu)\\ E(\bar X) &= \Big(\frac{1}{n}\Big)n\times\mu\\ E(\bar X) &= \mu \end{aligned}

I move to the variance of the mean:

V(\bar X) = V\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

Since constants come out of the variance squared and, for independent observations, the variance of a sum is the sum of the variances, I have:

\begin{aligned} V(cX_i)&=c^2V(X_i)\\ V(\Sigma X_i)&=\Sigma V(X_i) \end{aligned}
\begin{aligned} V(\bar X) &= \Big(\frac{1}{n}\Big)^2(\sigma^2+\sigma^2+\dots+\sigma^2)\\ V(\bar X) &= \Big(\frac{1}{n}\Big)^2n\times\sigma^2\\ V(\bar X) &= \frac{\sigma^2}{n} \end{aligned}
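This result can also be checked numerically: draw many samples of size n, compute each sample mean, and look at the spread of those means (the distribution and the values σ = 2, n = 10 are again my own illustrative choices):

```python
import random

# Illustrative check that V(X bar) = sigma^2 / n.
random.seed(1)
mu, sigma, n = 0.0, 2.0, 10
reps = 100_000

# One sample mean per replication.
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps
print(var_of_means)  # ≈ sigma**2 / n = 0.4
```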

Thus, I obtain:

\begin{aligned} E(\bar X^2)&=V(\bar X)+[E(\bar X)]^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}

I need to show that:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I focus on the expectation of the numerator; in the sums below I omit the summation limits for clarity of exposition:

\begin{aligned} E[\Sigma (X_i&-\bar X)^2]\\ E[\Sigma (X_{i}^{2}&-2X_{i}\bar X+\bar X^2)]\\ \end{aligned}



I continue by rearranging terms in the middle sum:

\begin{aligned} E[\Sigma (X_{i}^{2}&-2X_{i}\bar X+\bar X^2)]\\ E[\Sigma X_{i}^{2}&-\Sigma 2X_{i}\bar X+\Sigma\bar X^2]\\ E[\Sigma X_{i}^{2}&-2\bar X\Sigma X_{i}+n\bar X^2]\\ E[\Sigma X_{i}^{2}&-2\bar Xn\bar X+n\bar X^2]\\ \end{aligned}

Remember that the mean is the sum of the observations divided by the number of observations:

\begin{aligned} \bar X&=\frac{\Sigma X_{i}}{n}\\ \Sigma X_{i}&=n\times\bar X \end{aligned}

I continue, and since the expectation of a sum is equal to the sum of the expectations, I have:

\begin{aligned} E[\Sigma X_{i}^{2}&-2\bar Xn\bar X+n\bar X^2]\\ E[\Sigma X_{i}^{2}&-2n\bar X^2+n\bar X^2]\\ E[\Sigma X_{i}^{2}&-n\bar X^2]\\ E(\Sigma X_{i}^{2})&-E(n\bar X^2)\\ E(\Sigma X_{i}^{2})&-nE(\bar X^2)\\ \end{aligned}

I use the results obtained earlier:

\begin{aligned} E(X^2)&=\sigma^2+\mu^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}
\begin{gathered} \small E[\Sigma (X_i-\bar X)^2]= \small E(\Sigma X_{i}^{2})-nE(\bar X^2)\\ = \small \Sigma (\sigma^2+\mu^2)-n\Big(\frac{\sigma^2}{n}+\mu^2\Big)\\ = \small n\sigma^2+n\mu^2-\sigma^2-n\mu^2\\ = \small n\sigma^2-\sigma^2\\ = \small (n-1)\sigma^2 \end{gathered}
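Before finishing the proof, a simulation can make this key intermediate result concrete: averaging the sum of squared deviations over many samples should land near (n−1)σ², not nσ² (the distribution and the values σ = 2, n = 8 are illustrative choices of mine):

```python
import random

# Illustrative check that E[sum((X_i - X bar)^2)] = (n - 1) * sigma^2.
random.seed(3)
mu, sigma, n = 0.0, 2.0, 8
reps = 100_000

total = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    total += sum((x - xbar) ** 2 for x in xs)

avg_ss = total / reps
print(avg_ss)  # ≈ (n - 1) * sigma**2 = 28, not n * sigma**2 = 32
```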

Recall what I wanted to show:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I use the previous result to show that dividing by n-1 provides an unbiased estimator:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ E(S^2)&=\frac{1}{n-1}E[\Sigma_{i=1}^n (X_i-\bar X)^2]\\ E(S^2)&=\frac{1}{n-1}(n-1)\sigma^2\\ E(S^2)&=\sigma^2 \end{aligned}

The expected value of the sample variance is equal to the population variance, which is precisely the definition of an unbiased estimator.
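To close, a simulation contrasts the two choices of denominator: dividing by n is biased low, while dividing by n−1 averages out to σ² (the distribution and the values µ = 5, σ = 3, n = 5 are my own illustrative choices):

```python
import random

# Compare dividing the sum of squared deviations by n (biased)
# versus by n - 1 (unbiased) over many repeated samples.
random.seed(2)
mu, sigma, n = 5.0, 3.0, 5
reps = 200_000

biased, unbiased = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n
    unbiased += ss / (n - 1)

biased /= reps
unbiased /= reps
print(biased)    # ≈ (n - 1) / n * sigma**2 = 7.2, too small
print(unbiased)  # ≈ sigma**2 = 9, unbiased
```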