\[ \newcommand{\define}[1]{\textbf{#1}} \newcommand{\textsigma}{$\sigma$} \newcommand{\C}{\mathbb{C}} \newcommand{\haar}{\mathsf{m}} \renewcommand{\P}{\mathcal{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\intd}{\,\mathsf{d}} \newcommand{\area}{\mathop{\mathsf{Area}}} \newcommand{\cont}{\mathsf{C}} \newcommand{\contc}{\cont_\mathsf{c}} \newcommand{\conto}{\cont_\mathsf{0}} \newcommand{\met}{\mathop{\mathsf{d}}} \renewcommand{\emptyset}{\varnothing} \newcommand{\B}{\mathscr{B}} \newcommand{\borel}{\mathsf{Bor}} \newcommand{\lp}[1][2]{\operatorname{\mathscr{L}^{\mathsf{#1}}}} \newcommand{\Lp}[1][2]{\operatorname{\mathsf{L}^{\mathsf{#1}}}} \]

The Law of Large Numbers

  1. Probability
  2. Proving the strong law

In this section we will prove the strong law of large numbers in a special case. It is a central result in probability theory and – as will become apparent later – our second example of analyzing statistical properties of dynamical systems.


11.1 Probability

We defined the outer measure $\Xi_p$ on $\{0,1\}^\N$ in terms of probability: it came from assigning the sizes \begin{equation}\label{eqn:cyl_size} p^{|\{ 1 \le i \le r : \epsilon(i) = 1 \}|} (1-p)^{| \{ 1 \le i \le r : \epsilon(i) = 0 \} |}\end{equation} to the cylinder sets $[\epsilon(1) \cdots \epsilon(r)]$. Write $\xi_p$ for the measure on $\{0,1\}^\N$ that results from Carathéodory's theorem. We will take for granted that $\xi_p$ is defined on the Borel subsets of $\{0,1\}^\N$ and that Equation 11.1 is the $\xi_p$ measure of the cylinder $[\epsilon(1) \cdots \epsilon(r)]$. In particular $\xi_p(\{0,1\}^\N) = 1$ for all $0 \le p \le 1$.
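As a quick sanity check, the following Python sketch (the name cylinder_measure is ours, purely for illustration) evaluates Equation 11.1 and verifies that the cylinders of a fixed length partition $\{0,1\}^\N$ into sets whose sizes sum to 1.

```python
from itertools import product

def cylinder_measure(word, p):
    """Size assigned to the cylinder [eps(1) ... eps(r)] by Equation 11.1:
    p^(number of ones) * (1 - p)^(number of zeros)."""
    ones = sum(word)
    return p**ones * (1 - p)**(len(word) - ones)

# The 2^r cylinders of length r partition {0,1}^N, so their
# measures must sum to xi_p({0,1}^N) = 1.
r, p = 4, 0.3
total = sum(cylinder_measure(w, p) for w in product((0, 1), repeat=r))
print(total)  # 1.0 up to floating-point rounding
```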

A standard problem in probability is to understand the behaviour of empirical averages such as the frequency \[ \dfrac{| \{ 1 \le n \le N : x(n) = 1 \}|}{N} \] of heads in the first $N$ coin tosses of a trial $x \in \{0,1\}^\N$ as $N \to \infty$. We would like to say with some certainty that, in the long run, we have the “expected” frequency.

What “expected” means is prescribed by the measure we are working with, i.e. by the model of randomness we have accepted. In our current situation, the measure $\xi_p$ is intended to model independent tosses of a coin that shows a head with probability $p$. In that situation one might “expect” to see heads with frequency $p$ in the long run. The strong law of large numbers makes this precise by telling us that \[ \xi_p \left( \left\{ x \in \{0,1\}^\N : \lim_{N \to \infty} \dfrac{| \{ 1 \le n \le N : x(n) = 1 \}|}{N} = p \right\} \right) = 1 \] for all $0 \le p \le 1$. As $\xi_p(\{0,1\}^\N) = 1$ the set of trials $x$ that do not have the correct limiting frequency (or for which the limiting frequency does not exist) is negligible as far as $\xi_p$ is concerned.
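For intuition, here is a minimal simulation sketch (all names ours; Python's random module stands in for genuinely independent $p$-coin tosses) showing the empirical frequency settling near $p$:

```python
import random

def head_frequency(p, N, seed=0):
    """Frequency of heads among N simulated tosses of a p-coin."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(N)) / N

p = 0.3
for N in (10, 100, 10_000, 1_000_000):
    print(N, head_frequency(p, N))
# The printed frequencies approach p = 0.3 as N grows, as the
# strong law predicts for xi_p-almost every x.
```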

We can think of the above dynamically. Define $T : \{0,1\}^\N \to \{0,1\}^\N$ by \[ (T(x))(n) = x(n+1) \] for all $n \in \N$. Then $x(n) = (T^{n-1}(x))(1)$ and \[ \dfrac{|\{ 1 \le n \le N : x(n) = 1 \}|}{N} = \dfrac{x(1) + \cdots + x(N)}{N} = \dfrac{1}{N} \sum_{n=0}^{N-1} 1_{[1]}(T^n(x)) \] is an empirical average of the indicator of the cylinder $[1]$ along the orbit of $x$ under $T$.
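The dynamical reformulation is easy to mirror in code. In the sketch below (names ours) a point of $\{0,1\}^\N$ is truncated to a finite tuple, the shift discards the first coordinate, and the average counts visits of the orbit to the cylinder $[1]$:

```python
def shift(x):
    """The shift map T: (T(x))(n) = x(n+1), on a finite truncation of x."""
    return x[1:]

def visit_average(x, N):
    """(1/N) * sum_{n=0}^{N-1} 1_[1](T^n(x)); note 1_[1](y) is just y[0]."""
    total, y = 0, x
    for _ in range(N):
        total += y[0]
        y = shift(y)
    return total / N

x = (1, 0, 0, 1, 1, 0, 1, 0)
print(visit_average(x, 8))  # equals (x(1) + ... + x(8))/8 = 0.5
```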


11.2 Proving the strong law

We are now ready to start proving the following theorem.

Theorem 11.2.1 (The strong law of large numbers)

For all $0 \le p \le 1$ the set \[ \left\{ x \in \{0,1\}^\N : \lim_{N \to \infty} \dfrac{| \{ 1 \le n \le N : x(n) = 1 \}|}{N} = p \right\} \] has measure 1 with respect to $\xi_p$.

Fix $0 \le p \le 1$. Define a function $Y_n : \{0,1\}^\N \to \R$ by \[ Y_n(x) = x(n) - p \] for each $n \in \N$. If $x \in \{0,1\}^\N$ satisfies \[ \lim_{N \to \infty} \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} = 0 \] then we have \[ \lim_{N \to \infty} \dfrac{x(1) + \cdots + x(N)}{N} = p \] so it suffices to work with the $Y_n$. We could work throughout with the functions $Z_n(x) = x(n)$ but it is slightly more convenient – and often done – to work with the functions $Y_n$ instead. The main advantage of $Y_n$ is that it has zero integral. We can check this directly because \[ Y_n = 1_{T^{-(n-1)}[1]} - p \] is a simple function and therefore has integral \[ \xi_p(T^{-(n-1)}([1])) - p = p-p = 0 \] as claimed, since $T^{-(n-1)}([1]) = \{ x \in \{0,1\}^\N : x(n) = 1 \}$ is a disjoint union of cylinders of length $n$ whose measures sum to $p$.

We want to show that the set \[ B = \left\{ x \in \{0,1\}^\N : \lim_{N \to \infty} \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \ne 0 \right\} \] has zero measure. First note that \[ B = \bigcup_{r \in \N} \bigcap_{M \in \N} \bigcup_{N \ge M} \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \] because the averages fail to converge to zero precisely when, for some $r \in \N$, they are at least $1/r$ in absolute value for infinitely many $N$. By countable subadditivity it therefore suffices to show for every $r \in \N$ that \[ \bigcap_{M \in \N} \bigcup_{N \ge M} \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \] has zero measure with respect to $\xi_p$. The following lemma will give us a way to do this.

Lemma 11.2.2 (Borel-Cantelli lemma)

Let $(X,\B,\nu)$ be a measure space with $\nu(X) = 1$. For any sequence $A_1,A_2,\dots$ in $\B$ the convergence \[ \sum_{n=1}^\infty \nu(A_n) < \infty \] implies \[ \bigcap_{M \in \N} \bigcup_{N \ge M} A_N \] has zero measure.

Proof:

By countable subadditivity we can calculate that \[ \nu \left(\, \bigcup_{N \ge M} A_N \right) \le \sum_{N=M}^\infty \nu(A_N) \to 0 \] as $M \to \infty$, because the tails of a convergent series tend to zero. Since \[ M \mapsto \bigcup_{N \ge M} A_N \] is a decreasing sequence of sets and $\nu(X) = 1$ the result follows from continuity of measures.
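To see the lemma in action, take $X = [0,1]$ with Lebesgue measure and $A_n = [0, 1/n^2]$, so that $\sum \nu(A_n) = \pi^2/6 < \infty$. The sketch below (an illustration, not part of the proof) confirms that sampled points land in only finitely many of the $A_n$:

```python
import random

rng = random.Random(0)
for _ in range(5):
    x = rng.random()  # a Lebesgue-random point of [0, 1]
    # x lies in A_n = [0, 1/n^2] exactly when n <= 1/sqrt(x),
    # so each positive x belongs to only finitely many A_n.
    hits = [n for n in range(1, 10_000) if x <= 1 / n**2]
    print(f"x = {x:.4f} lies in A_n only for n in {hits}")
```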

Returning to the strong law of large numbers, it suffices by the Borel-Cantelli lemma to prove, for each fixed $r \in \N$, that \[ N \mapsto \xi_p \left( \, \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \right) \] is summable. Fix $r \in \N$ and $N \in \N$.

Lemma 11.2.3 (Markov's inequality)

Fix a measure space $(X,\B,\nu)$ with $\nu(X) = 1$ and fix a measurable function $f : X \to [0,\infty)$. Then \[ \nu( \{ x \in X : f(x) \ge s \}) \le \dfrac{1}{s} \int f \intd \nu \] for every $s > 0$.

Proof:

This is nothing other than the pointwise inequality \[ 0 \le s \cdot 1_{\{ x \in X : f(x) \ge s \}} \le f \] together with monotonicity of the integral, which gives $s \cdot \nu(\{ x \in X : f(x) \ge s \}) \le \int f \intd \nu$.
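A quick numerical check of the inequality (a sketch; uniform sampling of $[0,1]$ stands in for Lebesgue measure, and $f(x) = x^2$ is our arbitrary choice):

```python
import random

rng = random.Random(0)
samples = [rng.random() ** 2 for _ in range(100_000)]  # f(x) = x^2 on [0,1]

s = 0.5
tail = sum(v >= s for v in samples) / len(samples)
mean = sum(samples) / len(samples)
print(tail, mean / s)
# tail is near nu({f >= 1/2}) = 1 - 1/sqrt(2) ~ 0.293, while
# mean/s is near (1/3)/(1/2) ~ 0.667, so the bound holds with room to spare.
```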

Applying Markov's inequality to our set gives us \begin{align*}& \xi_p \left(\, \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \right) \\\le {} & r \int \left| \dfrac{Y_1 + \cdots + Y_N}{N} \right| \intd \xi_p \\\le {} & \dfrac{r}{N} \sum_{n=1}^N \int |Y_n| \intd \xi_p \end{align*} for all $N \in \N$. However \[ |Y_n| = (1-p) \cdot 1_{T^{-(n-1)}[1]} + p \cdot 1_{T^{-(n-1)}[0]} \] has integral $(1-p)p + p(1-p) = 2p(1-p)$, so our upper bound above is just \[ \dfrac{r}{N} \sum_{n=1}^N 2 p (1-p) = 2 r p (1-p) \] which does not depend on $N$ and is therefore not summable.

To do better, we note that \[ \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \Leftrightarrow \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right|^4 \ge \dfrac{1}{r^4} \] and we therefore also have \[ \begin{aligned} & \xi_p \left(\, \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \right) \\ \le {} & r^4 \int \left| \dfrac{Y_1 + \cdots + Y_N}{N} \right|^4 \intd \xi_p \\ = {} & \dfrac{r^4}{N^4} \int |Y_1 + \cdots + Y_N|^4 \intd \xi_p \end{aligned} \] from Markov's inequality applied to the fourth power. It remains for us to calculate the integral \[ \int |Y_1 + \cdots + Y_N|^4 \intd \xi_p = \sum_{a,b,c,d = 1}^N \int Y_a Y_b Y_c Y_d \intd \xi_p \] where we have expanded the fourth power; we evaluate the sum by checking the various possibilities for the tuple $(a,b,c,d)$.
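Before carrying out the calculation, a Monte Carlo sketch (names and parameters ours) suggests what it will show: the fourth moment of $(Y_1 + \cdots + Y_N)/N$ decays like $1/N^2$.

```python
import random

def fourth_moment(p, N, trials=20_000, seed=0):
    """Monte Carlo estimate of the integral of |(Y_1+...+Y_N)/N|^4."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum((rng.random() < p) - p for _ in range(N))
        total += (s / N) ** 4
    return total / trials

p = 0.3
for N in (10, 40, 160):
    est = fourth_moment(p, N)
    print(N, est, est * N**2)  # est * N^2 stays roughly constant
```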

First we note that \[ \int Y_a Y_b Y_c Y_d \intd \xi_p = 0 \] whenever some member of the tuple $(a,b,c,d)$ appears only once. This is because $\xi_p$ is such that \[ \int Y_a^\alpha Y_b^\beta Y_c^\gamma Y_d^\delta \intd \xi_p = \int Y_a^\alpha \intd \xi_p \int Y_b^\beta \intd \xi_p \int Y_c^\gamma \intd \xi_p \int Y_d^\delta \intd \xi_p \] whenever $\alpha,\beta,\gamma,\delta$ belong to $\N$ and $a,b,c,d$ are pairwise distinct: grouping repeated indices into powers and applying this identity, the index that appears only once contributes a factor $\int Y_a \intd \xi_p = 0$.

Thus the only tuples $(a,b,c,d)$ that contribute are those in which every index appears at least twice: either all four are equal, or they form two distinct pairs. There are $N$ tuples of the first kind and $3N(N-1)$ of the second (each unordered pair of distinct indices can be arranged in $\binom{4}{2} = 6$ ways), so \[ \sum_{a,b,c,d = 1}^N \int Y_a Y_b Y_c Y_d \intd \xi_p = \sum_{a=1}^N \int Y_a^4 \intd \xi_p + 3 \sum_{\substack{a,b = 1 \\ a \ne b}}^N \int Y_a^2 Y_b^2 \intd \xi_p \] and from $|Y_a| \le 1$ we get \[ \left| \sum_{a,b,c,d = 1}^N \int Y_a Y_b Y_c Y_d \intd \xi_p \right| \le 3N^2 \] for all $N \in \N$. To summarize, Markov's inequality gives \[ \begin{aligned} & \xi_p \left(\, \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \right) \\ \le {} & r^4 \int \left| \dfrac{Y_1 + \cdots + Y_N}{N} \right|^4 \intd \xi_p \\ = {} & \dfrac{r^4}{N^4} \sum_{a,b,c,d = 1}^N \int Y_a Y_b Y_c Y_d \intd \xi_p \\ \le {} & \dfrac{3 r^4}{N^2} \end{aligned} \] which is summable in $N$. We conclude from the Borel-Cantelli lemma that \[ \bigcap_{M \in \N} \bigcup_{N \ge M} \left\{ x \in \{0,1\}^\N : \left| \dfrac{Y_1(x) + \cdots + Y_N(x)}{N} \right| \ge \dfrac{1}{r} \right\} \] has zero measure with respect to $\xi_p$. Since $r \in \N$ was arbitrary, we are done.
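One can double-check the fourth-moment combinatorics by exact enumeration for small $N$: summing the Equation 11.1 weights over all of $\{0,1\}^N$ reproduces $N \int Y_1^4 \intd \xi_p + 3N(N-1) \left( \int Y_1^2 \intd \xi_p \right)^2$. A sketch (names ours):

```python
from itertools import product

def exact_fourth_moment(p, N):
    """Exact integral of (Y_1 + ... + Y_N)^4 against xi_p, computed by
    enumerating all 2^N cylinders of length N with their weights."""
    total = 0.0
    for w in product((0, 1), repeat=N):
        weight = p**sum(w) * (1 - p)**(N - sum(w))  # Equation 11.1
        total += weight * sum(b - p for b in w) ** 4
    return total

p, N = 0.3, 8
m2 = p * (1 - p)                          # integral of Y_1^2
m4 = p * (1 - p)**4 + (1 - p) * p**4      # integral of Y_1^4
print(exact_fourth_moment(p, N))          # the two printed values agree
print(N * m4 + 3 * N * (N - 1) * m2**2)
```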