
Sample mean, sample variance and Bessel’s correction


The sample mean is defined as
\begin{eqnarray}
\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i
\end{eqnarray}
and the sample variance as
\begin{eqnarray}
S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X})^2
\end{eqnarray}

Note that the sample mean is different from the true mean. We know that the mean of a die roll is 3.5. But if we roll the die five times and get the results 5, 6, 3, 4, 2, the sample mean will be 4, quite different from the true mean. Our hope is that if we increase the number of rolls, the sample mean will converge to the true mean, and similarly for the sample variance. This is the topic of the law of large numbers, which we will see later.
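As a quick sanity check, here is a small Python sketch of these two definitions (the seed and the number of simulated rolls are arbitrary choices): it reproduces the sample mean of 4 for the five rolls above, and shows that with many rolls the sample mean lands close to the true mean 3.5.

```python
import random

random.seed(0)

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Bessel's correction: divide by n - 1, not n
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# The five rolls from the text: sample mean is 4, not the true mean 3.5
print(sample_mean([5, 6, 3, 4, 2]))  # 4.0

# With many rolls, the sample mean gets close to the true mean 3.5
many = [random.randint(1, 6) for _ in range(100_000)]
print(sample_mean(many))
```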

Let $X_1,\ldots,X_n$ be iid with $N(\mu,\sigma^2)$. We will prove two results:

  • $\overline{X}$ and $S^2$ are independent. Or, in words, when the random variable is normal, sample mean and sample variance are independent.
  • $\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i-\overline{X})^2}{\sigma^2} \sim \chi^2(n-1)$
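Before the proof, the first claim can be illustrated numerically. This is only a simulation sketch, not a proof; the sample size, seed, and trial count are arbitrary choices. Independent random variables are uncorrelated, so the correlation between the simulated sample means and sample variances should be near zero.

```python
import random

random.seed(4)
n, trials = 8, 20_000

def corr(a, b):
    # Pearson correlation, computed from scratch
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

means, variances = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    means.append(m)
    variances.append(sum((x - m) ** 2 for x in xs) / (n - 1))

# Independence implies zero correlation, so this should be near 0
print(corr(means, variances))
```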

Proof: The first item is quite hard to prove, and we will not prove it here; it is, however, needed for the proof of the second item. Let us consider

\begin{eqnarray}
W = \sum_{i=1}^n \left( \frac{X_i-\mu}{\sigma} \right)^2
\end{eqnarray}
We can rewrite this equation as
\begin{eqnarray}
W &=& \sum_{i=1}^n \left( \frac{(X_i-\overline{X})+(\overline{X}-\mu)}{\sigma} \right)^2\\
&=& \sum_{i=1}^n \left( \frac{X_i-\overline{X}}{\sigma} \right)^2 + \sum_{i=1}^n \left( \frac{\overline{X}-\mu}{\sigma} \right)^2
+2\frac{\overline{X}-\mu}{\sigma}\sum_{i=1}^n(X_i-\overline{X})
\end{eqnarray}
Note that the last term is zero, since $\sum_{i=1}^n(X_i-\overline{X}) = \sum_{i=1}^n X_i - n\overline{X} = 0$. Hence
\begin{eqnarray}
\sum_{i=1}^n \left( \frac{X_i-\mu}{\sigma} \right)^2 &=&
\sum_{i=1}^n \left( \frac{X_i-\overline{X}}{\sigma} \right)^2 + n \left( \frac{\overline{X}-\mu}{\sigma} \right)^2 \\
&=& \sum_{i=1}^n \left( \frac{X_i-\overline{X}}{\sigma} \right)^2 + \left( \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} \right)^2
\end{eqnarray}
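The algebraic identity above holds for any data set, not just normal samples, so it can be checked numerically. A minimal sketch (the values of $\mu$, $\sigma$, $n$, and the seed are arbitrary):

```python
import random

random.seed(1)
mu, sigma, n = 2.0, 3.0, 50
xs = [random.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n

# Left-hand side: squared deviations from the true mean mu
lhs = sum(((x - mu) / sigma) ** 2 for x in xs)
# Right-hand side: deviations from the sample mean plus the mean term
rhs = sum(((x - xbar) / sigma) ** 2 for x in xs) + n * ((xbar - mu) / sigma) ** 2
print(abs(lhs - rhs))  # numerically zero
```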
Note that
\begin{eqnarray}
\left( \frac{X_i-\mu}{\sigma} \right)^2 &\sim& \chi^2(1) \\
\sum_{i=1}^n \left( \frac{X_i-\mu}{\sigma} \right)^2 &\sim& \chi^2(n)\\
\frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} &\sim& N(0,1)\\
\left( \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} \right)^2 &\sim& \chi^2(1)
\end{eqnarray}
Hence
\begin{eqnarray}
\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i-\overline{X})^2}{\sigma^2} \sim \chi^2(n-1)
\end{eqnarray}
Note that we used the fact that $S^2$ and $\overline{X}$ are independent, together with the rule that a sum of independent $\chi^2$ random variables is again $\chi^2$, with the degrees of freedom added.
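The second result can also be checked by simulation. A sketch with arbitrary parameters: a $\chi^2(n-1)$ random variable has mean $n-1$ and variance $2(n-1)$, so with $n=10$ the simulated values of $(n-1)S^2/\sigma^2$ should have mean near 9 and variance near 18.

```python
import random
import statistics

random.seed(2)
mu, sigma, n, trials = 0.0, 2.0, 10, 20_000

ws = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(xs)  # divides by n - 1 (Bessel's correction)
    ws.append((n - 1) * s2 / sigma ** 2)

# chi-square(n-1) has mean n-1 = 9 and variance 2(n-1) = 18
print(statistics.mean(ws))
print(statistics.variance(ws))
```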

Student’s t-distribution

If $Z \sim N(0,1)$ and $U \sim \chi^2(r)$ are independent, then the random variable
\begin{eqnarray}
T = \frac{Z}{\sqrt{\frac{U}{r}}}
\end{eqnarray}
follows a t-distribution with $r$ degrees of freedom. The pdf is
\begin{eqnarray}
f(t)=\frac{\varGamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi r} \varGamma\left( \frac{r}{2} \right)} \frac{1}{(1+t^2/r)^{(r+1)/2}}
\end{eqnarray}
As $r$ increases, the t-distribution converges to the z (standard normal) distribution.
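The definition above translates directly into a simulation sketch (seed, $r$, and trial count are arbitrary choices): build $T$ from a standard normal $Z$ and a $\chi^2(r)$ variable $U$, then check the known moments, mean $0$ and variance $r/(r-2)$ for $r>2$.

```python
import random
import statistics

random.seed(3)
r, trials = 5, 50_000

ts = []
for _ in range(trials):
    z = random.gauss(0.0, 1.0)                              # Z ~ N(0, 1)
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(r))  # U ~ chi-square(r)
    ts.append(z / (u / r) ** 0.5)                           # T = Z / sqrt(U/r)

# For r > 2, a t(r) variable has mean 0 and variance r/(r-2)
print(statistics.mean(ts))      # near 0
print(statistics.variance(ts))  # near 5/3
```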