Unbiased estimator for population variance: clearly explained!

Estimator: A statistic used to approximate a population parameter. Sometimes called a point estimator.

Estimate: The observed value of the estimator.

Unbiased estimator: An estimator whose expected value is equal to the parameter that it is trying to estimate.

Sheldon M. Ross (2010).

This post is based on two YouTube videos made by the wonderful YouTuber jbstatistics:

https://www.youtube.com/watch?v=7mYDHbrLEQo

https://www.youtube.com/watch?v=D1hgiAla3KI&list=WL&index=11&t=0s

They are the most pedagogical videos I have found on this subject.

Students sometimes wonder why we divide by n-1 in the formula for the sample variance. In this post, I show why dividing by n-1 provides an unbiased estimator of the population variance, which is unknown when I study a particular sample. I start with n independent observations drawn from a population with mean µ and variance σ²:

X_1, X_2, \dots, X_n

I recall two important properties of the expected value:

\begin{aligned} E(\Sigma X_i)&=\Sigma E(X_i) \\ E(cX_i)&=cE(X_i) \end{aligned}

The variance is defined as follows:

V(X)=E(X^2)-[E(X)]^2

Thus, I rearrange the variance formula to obtain the following expression:

\begin{aligned} E(X^2)&=V(X)+[E(X)]^2 \\ E(X^2)&=\sigma^2+\mu^2 \end{aligned}
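To make the rearranged formula concrete, here is a minimal sketch that checks it exactly on a toy population. The fair six-sided die is my own assumption for illustration, and I compute the variance from the equivalent definition E[(X-µ)²] so that the check is not circular.

```python
from fractions import Fraction

# Hypothetical population (an assumption for illustration): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, len(values))  # each face is equally likely

mu = sum(p * x for x in values)                   # E(X)
second_moment = sum(p * x**2 for x in values)     # E(X^2)
sigma2 = sum(p * (x - mu) ** 2 for x in values)   # V(X) computed as E[(X - mu)^2]

# Rearranged variance formula: E(X^2) = sigma^2 + mu^2
identity_holds = second_moment == sigma2 + mu**2
print(mu, sigma2, second_moment, identity_holds)
```

Exact fractions are used instead of floating-point numbers so that the two sides can be compared with a plain equality.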

For the proof I also need the expectation of the square of the sample mean:

E(\bar X^2)=V(\bar X)+[E(\bar X)]^2

Before moving further, I can find the expression for the expected value of the mean and the variance of the mean:

E(\bar X) = E\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

The expected value operator is linear:

\begin{aligned} E(cX_i)&=cE(X_i) \end{aligned}

\begin{aligned} E(\bar X) &= \Big(\frac{1}{n}\Big)(\mu+\mu+\dots+\mu)\\ E(\bar X) &= \Big(\frac{1}{n}\Big)n\times\mu\\ E(\bar X) &= \mu \end{aligned}

I move to the variance of the mean:

V(\bar X) = V\Big(\frac{X_1+X_2+\dots+X_n}{n}\Big)

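Since the variance is a quadratic operator, and since the observations are independent (so the variance of their sum is the sum of their variances), I have: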

\begin{aligned} V(cX_i)&=c^2V(X_i) \end{aligned}

\begin{aligned} V(\bar X) &= \Big(\frac{1}{n}\Big)^2(\sigma^2+\sigma^2+\dots+\sigma^2)\\ V(\bar X) &= \Big(\frac{1}{n}\Big)^2n\times\sigma^2\\ V(\bar X) &= \frac{\sigma^2}{n} \end{aligned}

Thus, I obtain:

\begin{aligned} E(\bar X^2)&=V(\bar X)+[E(\bar X)]^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}
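The three results for the sample mean can be checked by brute force on a small example. The sketch below assumes a hypothetical population of three equally likely values and a sample size of 3 (both are my own choices, not part of the derivation), then enumerates every possible sample of independent draws.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 6]   # hypothetical population, each value equally likely
n = 3                # sample size (assumption for illustration)

mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)

# With independent draws, every n-tuple of values is equally likely.
samples = list(product(values, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

e_mean = sum(means) / len(means)                    # E(X-bar)
e_mean_sq = sum(m**2 for m in means) / len(means)   # E(X-bar^2)
v_mean = e_mean_sq - e_mean**2                      # V(X-bar)

print(e_mean == mu)                        # E(X-bar) = mu
print(v_mean == sigma2 / n)                # V(X-bar) = sigma^2 / n
print(e_mean_sq == sigma2 / n + mu**2)     # E(X-bar^2) = sigma^2 / n + mu^2
```

Enumerating all 27 samples gives exact expectations, so no simulation noise gets in the way of the comparison.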

I need to show that:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I focus on the expectation of the numerator. In the sum, I omit the limits of summation for clarity of exposition:

\begin{aligned} E[\Sigma (X_i-\bar X)^2]&=E[\Sigma (X_{i}^{2}-2X_{i}\bar X+\bar X^2)] \end{aligned}

Because,

(a-b)^2=(a^2-2ab+b^2)

I continue by distributing the sum over the three terms and rearranging the middle one:

\begin{aligned} E[\Sigma (X_{i}^{2}-2X_{i}\bar X+\bar X^2)]&=E[\Sigma X_{i}^{2}-\Sigma 2X_{i}\bar X+\Sigma\bar X^2]\\ &=E[\Sigma X_{i}^{2}-2\bar X\Sigma X_{i}+n\bar X^2]\\ &=E[\Sigma X_{i}^{2}-2\bar Xn\bar X+n\bar X^2] \end{aligned}

Remember that the mean is the sum of the observations divided by the number of the observations:

\begin{aligned} \bar X&=\frac{\Sigma X_{i}}{n}\\ \Sigma X_{i}&=n\times\bar X \end{aligned}

I continue. Since the expectation of a sum is equal to the sum of the expectations, I have:

\begin{aligned} E[\Sigma X_{i}^{2}-2\bar Xn\bar X+n\bar X^2]&=E[\Sigma X_{i}^{2}-2n\bar X^2+n\bar X^2]\\ &=E[\Sigma X_{i}^{2}-n\bar X^2]\\ &=E(\Sigma X_{i}^{2})-E(n\bar X^2)\\ &=E(\Sigma X_{i}^{2})-nE(\bar X^2) \end{aligned}

I use the results obtained earlier:

\begin{aligned} E(X^2)&=\sigma^2+\mu^2\\ E(\bar X^2)&=\frac{\sigma^2}{n}+\mu^2 \end{aligned}

\begin{aligned} E[\Sigma (X_i-\bar X)^2]&=E(\Sigma X_{i}^{2})-nE(\bar X^2)\\ &=\Sigma (\sigma^2+\mu^2)-n\Big(\frac{\sigma^2}{n}+\mu^2\Big)\\ &=n\sigma^2+n\mu^2-\sigma^2-n\mu^2\\ &=n\sigma^2-\sigma^2\\ &=(n-1)\sigma^2 \end{aligned}
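The result for the numerator can also be verified exactly on a small case. This is only a sketch under assumptions of my own: a hypothetical population of three equally likely values, a sample size of 3, and a helper named `sum_of_squares` that I introduce for the illustration.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 6]   # hypothetical population, each value equally likely
n = 3                # sample size (assumption for illustration)

mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)

def sum_of_squares(sample):
    """Return the sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Average the numerator over all equally likely samples of independent draws.
samples = list(product(values, repeat=n))
expected_numerator = sum(sum_of_squares(s) for s in samples) / len(samples)

print(expected_numerator, (n - 1) * sigma2)
```

Both printed values should agree, which is the statement that the expected numerator equals (n-1)σ² for this toy population.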

I wanted to show this:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ &=\sigma^2 \end{aligned}

I use the previous result to show that dividing by n-1 provides an unbiased estimator:

\begin{aligned} E(S^2)&=E\Big[\frac{\Sigma_{i=1}^n (X_i-\bar X)^2}{n-1}\Big]\\ E(S^2)&=\frac{1}{n-1}E[\Sigma_{i=1}^n (X_i-\bar X)^2]\\ E(S^2)&=\frac{1}{n-1}(n-1)\sigma^2\\ E(S^2)&=\sigma^2 \end{aligned}

The expected value of the sample variance is equal to the population variance, which is precisely the definition of an unbiased estimator.
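To see what goes wrong when dividing by n instead of n-1, the following sketch averages both estimators over every possible sample from a toy population. The population and sample size are assumptions of mine. I rely on Python's standard `statistics` module, where `variance` divides by n-1 and `pvariance` divides by n.

```python
import math
import statistics
from itertools import product

values = [1, 2, 6]   # hypothetical population, each value equally likely
n = 3                # sample size (assumption for illustration)

sigma2 = statistics.pvariance(values)  # population variance of the toy population

# Every n-tuple of independent draws is equally likely.
samples = list(product(values, repeat=n))

# Average each estimator over all possible samples to get its expected value.
e_unbiased = sum(statistics.variance(s) for s in samples) / len(samples)   # divides by n-1
e_biased = sum(statistics.pvariance(s) for s in samples) / len(samples)    # divides by n

print(math.isclose(e_unbiased, sigma2))               # matches sigma^2
print(math.isclose(e_biased, (n - 1) / n * sigma2))   # falls short by the factor (n-1)/n
```

In this example the estimator that divides by n underestimates the population variance on average by the factor (n-1)/n, which follows directly from the expected numerator (n-1)σ² obtained above.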


