# Unbiased estimator for population variance: clearly explained!

**Estimator:** A statistic used to approximate a population parameter. Sometimes called a point estimator.
**Estimate:** The observed value of the estimator.
**Unbiased estimator:** An estimator whose expected value is equal to the parameter that it is trying to estimate.

Sheldon M. Ross (2010).

Students sometimes wonder why we have to divide by n-1 in the formula for the sample variance. In this pedagogical post, I show why dividing by n-1 yields an unbiased estimator of the population variance, which is unknown when we work with a particular sample. I start with n independent observations with mean µ and variance σ².

X_1, X_2, \dots, X_n

I recall two important properties of the expected value:

E(ΣX_i)=ΣE(X_i) \\
E(cX_i)=cE(X_i)

The variance is defined as follows:

V(X)=E(X^2)-[E(X)]^2

Thus, I rearrange the variance formula to obtain the following expression:

\begin{aligned}
E(X^2)&=V(X)+[E(X)]^2 \\
E(X^2)&=\sigma^2+\mu^2
\end{aligned}
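This identity can be checked numerically with a quick Monte Carlo sketch (the values of µ, σ, and the sample size below are assumptions chosen for illustration):

```python
import numpy as np

# Monte Carlo check of E(X^2) = sigma^2 + mu^2
# mu, sigma, and the number of draws are illustrative assumptions
rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

print(np.mean(x**2))  # close to sigma**2 + mu**2 = 13
```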

For the proof I also need the expectation of the square of the sample mean:

E(\bar X^2)=V(\bar X)+[E(\bar X)]^2

Before moving further, I derive the expected value and the variance of the sample mean:

E(\bar X) = E\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

The expected value operator is linear:

\begin{aligned}
E(cX_i)&=cE(X_i)
\end{aligned}
\begin{aligned}
E(\bar X) &= \Big(\frac{1}{n}\Big)\big(E(X_1)+E(X_2)+\dots+E(X_n)\big)\\
E(\bar X) &= \Big(\frac{1}{n}\Big)(\mu+\mu+\dots+\mu)\\
E(\bar X) &= \Big(\frac{1}{n}\Big)n\times\mu\\
E(\bar X) &= \mu
\end{aligned}



I move to the variance of the mean:

V(\bar X) = V\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

Constants come out of the variance squared, and since the observations are independent, the variance of the sum is the sum of the variances:

\begin{aligned}
V(cX_i)&=c^2V(X_i)
\end{aligned}
\begin{aligned}
V(\bar X) &= \Big(\frac{1}{n}\Big)^2\big(V(X_1)+V(X_2)+\dots+V(X_n)\big)\\
V(\bar X) &= \Big(\frac{1}{n}\Big)^2(\sigma^2+\sigma^2+\dots+\sigma^2)\\
V(\bar X) &= \Big(\frac{1}{n}\Big)^2n\times\sigma^2\\
V(\bar X) &= \frac{\sigma^2}{n}
\end{aligned}
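Both results can be verified by simulating many samples and looking at the distribution of their means (µ, σ, n, and the number of replications below are illustrative assumptions):

```python
import numpy as np

# Monte Carlo check of E(X_bar) = mu and V(X_bar) = sigma^2 / n
# mu, sigma, n, and reps are illustrative assumptions
rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

# each row is one sample of size n; take the mean of every row
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.mean())  # close to mu = 5
print(means.var())   # close to sigma**2 / n = 0.4
```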

Thus, I obtain:

\begin{aligned}
E(\bar X^2)&=V(\bar X)+[E(\bar X)]^2\\
E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2
\end{aligned}

I need to show that:

\begin{aligned}
E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\
&=\sigma^2
\end{aligned}

I focus on the expectation of the numerator; in the sums below I omit the limits (i=1 to n) for clarity of exposition:

\begin{aligned}
E[\Sigma (X_i&-\bar X)^2]\\
E[\Sigma (X_{i}^{2}&-2X_{i}\bar X+\bar X^2)]\\
\end{aligned}

Because,

(a-b)^2=(a^2-2ab+b^2)

I continue by rearranging terms in the middle sum:

\begin{aligned}
E[\Sigma (X_{i}^{2}&-2X_{i}\bar X+\bar X^2)]\\
E[\Sigma X_{i}^{2}&-\Sigma 2X_{i}\bar X+\Sigma\bar X^2]\\
E[\Sigma X_{i}^{2}&-2\bar X\Sigma X_{i}+n\bar X^2]\\
E[\Sigma X_{i}^{2}&-2\bar Xn\bar X+n\bar X^2]\\
\end{aligned}

Remember that the mean is the sum of the observations divided by the number of the observations:

\begin{aligned}
\bar X&=\frac{\Sigma X_{i}}{n}\\
\Sigma X_{i}&=n\times\bar X
\end{aligned}

I continue, and since the expectation of a sum is equal to the sum of the expectations, I have:

\begin{aligned}
E[\Sigma X_{i}^{2}&-2\bar Xn\bar X+n\bar X^2]\\
E[\Sigma X_{i}^{2}&-2n\bar X^2+n\bar X^2]\\
E[\Sigma X_{i}^{2}&-n\bar X^2]\\
E(\Sigma X_{i}^{2})&-E(n\bar X^2)\\
E(\Sigma X_{i}^{2})&-nE(\bar X^2)\\
\end{aligned}

I use the results obtained earlier:

\begin{aligned}
E(X^2)&=\sigma^2+\mu^2\\
E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2
\end{aligned}
\begin{gathered}
\small E[\Sigma (X_i-\bar X)^2]= \small E(\Sigma X_{i}^{2})-nE(\bar X^2)\\
= \small \Sigma (\sigma^2+\mu^2)-n\Big(\frac{\sigma^2}{n}+\mu^2\Big)\\
= \small n\sigma^2+n\mu^2-\sigma^2-n\mu^2\\
= \small n\sigma^2-\sigma^2\\
= \small (n-1)\sigma^2
\end{gathered}
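This intermediate result, that the expected sum of squared deviations is (n-1)σ² rather than nσ², can itself be checked by simulation (the parameter values are illustrative assumptions):

```python
import numpy as np

# Monte Carlo check that E[sum (X_i - X_bar)^2] = (n - 1) * sigma^2
# mu, sigma, n, and reps are illustrative assumptions
rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 3.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
# sum of squared deviations from the sample mean, one value per sample
ss = ((samples - samples.mean(axis=1, keepdims=True))**2).sum(axis=1)

print(ss.mean())  # close to (n - 1) * sigma**2 = 63
```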


Recall that I wanted to show:

\begin{aligned}
E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\
&=\sigma^2
\end{aligned}

I use the previous result to show that dividing by n-1 provides an unbiased estimator:

\begin{aligned}
E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\
E(S^2)&=\frac{1}{n-1}E[\Sigma_{i=1}^n (X_i-\bar X)^2]\\
E(S^2)&=\frac{1}{n-1}(n-1)\sigma^2\\
E(S^2)&=\sigma^2
\end{aligned}

The expected value of the sample variance is equal to the population variance, which is precisely the definition of an unbiased estimator.
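The whole argument can be seen numerically: averaging the sample variance over many samples, dividing by n-1 (numpy's `ddof=1`) recovers σ², while dividing by n (`ddof=0`) systematically underestimates it by the factor (n-1)/n. The parameter values below are illustrative assumptions:

```python
import numpy as np

# Compare the n-1 (unbiased) and n (biased) versions of the sample variance
# mu, sigma, n, and reps are illustrative assumptions
rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.0, 5, 500_000

samples = rng.normal(mu, sigma, size=(reps, n))

print(samples.var(axis=1, ddof=1).mean())  # close to sigma**2 = 1
print(samples.var(axis=1, ddof=0).mean())  # close to (n-1)/n * sigma**2 = 0.8
```

With a small sample size such as n = 5, the bias of the divide-by-n version is large (20% here), which is why the n-1 correction matters most for small samples.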
