\[ \newcommand{\C}{\mathbb{C}} \newcommand{\haar}{\mathsf{m}} \DeclareMathOperator{\cont}{\mathsf{C}} \newcommand{\contc}{\cont_\mathsf{c}} \newcommand{\conto}{\cont_\mathsf{0}} \newcommand{\P}{\mathcal{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\g}{>} \newcommand{\l}{<} \newcommand{\intd}{\,\mathsf{d}} \newcommand{\Re}{\mathsf{Re}} \newcommand{\area}{\mathop{\mathsf{Area}}} \newcommand{\met}{\mathop{\mathsf{d}}} \newcommand{\orb}{\mathop{\mathsf{orb}}} \newcommand{\emptyset}{\varnothing} \newcommand{\B}{\mathscr{B}} \DeclareMathOperator{\borel}{\mathsf{Bor}} \DeclareMathOperator{\lpell}{\mathsf{L}} \newcommand{\lp}[1]{\lpell^{\!\mathsf{#1}}} \newcommand{\Lp}[1][p]{\mathsf{L}^{\!\mathsf{#1}}} \renewcommand{\|}{|\!|} \newcommand{\M}{\operatorname{\mathsf{M}}} \]

The ergodic theorem

In this section we will prove the pointwise ergodic theorem for ergodic transformations. A crucial ingredient in its proof is the maximal ergodic theorem, which is concerned with the maximum value \[ \M_f(x) = \sup \left\{ \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) : N \in \N \right\} \] of the ergodic averages of a function $f : X \to \R$.

Theorem (Maximal ergodic theorem)

Let $(X,\B,\mu)$ be a probability space. Fix a measure-preserving map $T : X \to X$ and fix $f : X \to \R$ measurable and integrable. Then \[ \int 1_{\{ \M_f \g \lambda \}} \cdot (f - \lambda) \intd \mu \ge 0 \] for every $\lambda \in \R$.

Proof:

Fix $\lambda \in \R$. We will first prove this in the case that $f$ is bounded, say $|f(x)| \le K$ for all $x \in X$. Fix $R \in \N$ and write \[ \M_{f,R}(x) = \max \left\{ \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) : 1 \le N \le R \right\} \] for all $x \in X$. Put \[ E(R) = \{ x \in X : \M_{f,R}(x) \g \lambda \} \] and let us analyse for each $x \in X$ the sum \[ \sum_{n=0}^{P-1} 1_{E(R)}(T^n x) \cdot (f(T^n x) - \lambda) \] for $P \in \N$, which we imagine to be much larger than $R$. Whenever $T^n(x)$ is outside $E(R)$ the summand is zero. If $T^j(x)$ is not in $E(R)$ and $T^{j+1}(x)$ is in $E(R)$ then we must have \[ f(T^{j+1} x) + \cdots + f(T^{j+r} x) \ge r \lambda \] for some $1 \le r \le R$ by definition of $\M_{f,R}$. But this means that the portion \[ \sum_{n=j+1}^{j+r} 1_{E(R)}(T^n x) \cdot (f(T^n x) - \lambda) \] of the above sum is non-negative. Indeed \[ 1_{E(R)}(x) \cdot (f(x) - \lambda) \ge f(x) - \lambda \] holds for all $x \in X$ because points $x$ outside $E(R)$ satisfy $f(x) \le \M_{f,R}(x) \le \lambda$.
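
To spell out why such a portion is non-negative, one can chain the two facts just stated. The following is only a sketch that uses nothing beyond the displayed inequalities above: \[ \sum_{n=j+1}^{j+r} 1_{E(R)}(T^n x) \cdot (f(T^n x) - \lambda) \ge \sum_{n=j+1}^{j+r} (f(T^n x) - \lambda) = \left( \sum_{n=j+1}^{j+r} f(T^n x) \right) - r \lambda \ge 0 \] where the first inequality applies the pointwise bound at each of the points $T^{j+1} x, \dots, T^{j+r} x$ and the last inequality is the choice of $r$.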

The long sum \[ \sum_{n=0}^{P-1} 1_{E(R)}(T^n x) \cdot (f(T^n x) - \lambda) \] therefore splits into summands that are zero, complete portions of length at most $R$ whose sums are non-negative, and possibly one incomplete portion at the very end, cut off at $n = P-1$ and consisting of fewer than $R$ summands. Since $|f| \le K$, every summand has absolute value at most $K + |\lambda|$, so the incomplete portion contributes at least $-R(K + |\lambda|)$. Thus \[ \sum_{n=0}^{P-1} 1_{E(R)}(T^n x) \cdot (f(T^n x) - \lambda) \ge - R(K + |\lambda|) \] and we can integrate both sides to get \[ -R (K + |\lambda|) \le P \int 1_{E(R)} \cdot (f - \lambda) \intd \mu \] because $\mu$ is $T$-invariant. Dividing by $P$ and letting $P \to \infty$ gives \[ 0 \le \int 1_{E(R)} \cdot (f - \lambda) \intd \mu \] for every $R \in \N$. Since $\M_{f,R}$ increases to $\M_f$ as $R \to \infty$, we may then apply the dominated convergence theorem, with dominating function $K + |\lambda|$, to take the limit as $R \to \infty$, obtaining \[ 0 \le \int 1_{\{\M_f \g \lambda\}} \cdot (f - \lambda) \intd \mu \] as desired.
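
The integration step can be unpacked as follows. Write $g = 1_{E(R)} \cdot (f - \lambda)$, a name introduced only for this remark. Since $T$ preserves $\mu$ we have $\int g \circ T^n \intd \mu = \int g \intd \mu$ for every $n \ge 0$, so \[ \int \sum_{n=0}^{P-1} g(T^n x) \intd \mu(x) = \sum_{n=0}^{P-1} \int g \circ T^n \intd \mu = P \int g \intd \mu \] while the constant lower bound integrates to itself because $\mu(X) = 1$.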

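It remains to deduce the result without the assumption that $f$ is bounded. Given any measurable and integrable function $f : X \to \R$ define for each $K \in \N$ the function \[ \phi_K = f \cdot 1_{Q(K)} \] where $Q(K) = \{ x \in X : |f(x)| \le K \}$. Each $\phi_K$ is bounded by $K$ and satisfies $|\phi_K| \le |f|$, and the sequence $\phi_K(x)$ converges pointwise to $f(x)$. Moreover, for fixed $R \in \N$ and $x \in X$ we have $\phi_K(T^n x) = f(T^n x)$ for all $0 \le n \l R$ once $K$ is large enough, so $\M_{\phi_K,R}(x) = \M_{f,R}(x)$ for all large $K$. The integrands in the inequality for $\phi_K$ at fixed $R$ are dominated by the integrable function $|f| + |\lambda|$, so one can apply the dominated convergence theorem to pass to the limit as $K \to \infty$, and then once more to take the limit as $R \to \infty$ exactly as in the bounded case.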
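
As an illustration of how the theorem is typically applied (this remark is not used in what follows), take $\lambda \g 0$ and rearrange the conclusion: \[ \lambda \cdot \mu( \{ \M_f \g \lambda \} ) = \int 1_{\{ \M_f \g \lambda \}} \cdot \lambda \intd \mu \le \int 1_{\{ \M_f \g \lambda \}} \cdot f \intd \mu \le \int |f| \intd \mu \] so that \[ \mu( \{ \M_f \g \lambda \} ) \le \dfrac{1}{\lambda} \int |f| \intd \mu \] which says that the ergodic averages of an integrable function can only be large on a set of small measure.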

Theorem (Pointwise ergodic theorem)

Let $(X,\B,\mu)$ be a probability space. Fix a measure-preserving map $T : X \to X$ that is ergodic. For every measurable and integrable function $f : X \to \R$ there is a set $\Omega \in \B$ with $\mu(\Omega) = 1$ and \[ \lim_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n(x)) = \int f \intd \mu \] for all $x \in \Omega$.

Proof:

Fix $f : X \to \R$ measurable and integrable. Since we do not know whether the limit of the sequence \[ N \mapsto \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n(x)) \] exists we will work with its limits superior and inferior. Our goal is to prove for every $k \in \N$ that the set \[ B(k) = \left\{ x \in X : \limsup_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \g \int f \intd \mu + \dfrac{1}{k} \right\} \] has zero measure. Indeed, if that is the case then \[ \mu \left(\, X \setminus \bigcup_{k \in \N} B(k) \right) = 1 \] and we have \[ \limsup_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n(x)) \le \int f \intd \mu \] for almost every $x \in X$. Repeating the above argument with $-f$ in place of $f$ then gives \[ \int f \intd \mu \le \liminf_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n(x)) \] for almost every $x \in X$ as well and concludes the proof.
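
The passage from $f$ to $-f$ can be made explicit. Assuming the upper bound has been established for every integrable function, apply it to $-f$ and use the identities \[ \limsup_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} (-f)(T^n x) = - \liminf_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \qquad \textrm{and} \qquad \int (-f) \intd \mu = - \int f \intd \mu \] to get \[ - \liminf_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \le - \int f \intd \mu \] for almost every $x \in X$, which is the stated lower bound after multiplying by $-1$. Combined with the upper bound, the limits inferior and superior both equal $\int f \intd \mu$ almost everywhere.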

Fix $k \in \N$ and suppose that $B(k)$ has positive measure. The set $B(k)$ is $T$-invariant up to a set of measure zero. Indeed \[ \left| \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) - \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n(T x)) \right| \le \dfrac{|f(x)| + |f(T^N x)|}{N} \] so it suffices to prove that \[ \left\{ x \in X : \dfrac{|f(T^N x)|}{N} \to 0 \right\} \] has full measure. Fix $\epsilon > 0$. The set \[ A(N) = \{ x \in X : |f(T^N x)| \ge N \epsilon \} \] has measure equal to that of \[ \{ x \in X : |f(x)| \ge N \epsilon \} \] because $\mu$ is $T$-invariant. But \[ \sum_{N=1}^\infty \epsilon \mu(A(N)) \le \int |f| \intd \mu \] so the set of points $x$ that belong to infinitely many of the sets $A(N)$ has zero measure by the Borel-Cantelli lemma. Applying this with $\epsilon = 1/m$ for each $m \in \N$ shows that almost every $x \in X$ has the property that \[ \dfrac{|f(T^N x)|}{N} \to 0 \] and therefore \[ \mu( B(k) \triangle T^{-1} B(k)) = 0 \] holds. Since $B(k)$ has positive measure, ergodicity then forces $\mu(B(k)) = 1$.
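
The summability estimate fed into the Borel-Cantelli lemma can be checked as follows. This is a sketch that assumes $\N$ starts at $1$. For each $x \in X$ the number of $N \in \N$ with $N \epsilon \le |f(x)|$ is $\lfloor |f(x)| / \epsilon \rfloor$, so \[ \sum_{N=1}^\infty 1_{\{ |f| \ge N \epsilon \}}(x) = \left\lfloor \dfrac{|f(x)|}{\epsilon} \right\rfloor \le \dfrac{|f(x)|}{\epsilon} \] and integrating term by term, which is justified by the monotone convergence theorem, gives \[ \sum_{N=1}^\infty \mu( \{ x \in X : |f(x)| \ge N \epsilon \} ) \le \dfrac{1}{\epsilon} \int |f| \intd \mu \] which is finite because $f$ is integrable. Multiplying by $\epsilon$ and replacing each term by the equal quantity $\mu(A(N))$ recovers the displayed bound.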

We are looking for a contradiction to $\mu(B(k)) = 1$. The crucial quantity to consider is the supremum \[ \M_f(x) = \sup \left\{ \dfrac{1}{N} \sum_{n=0}^{N-1} f(T^n x) : N \in \N \right\} \] of the ergodic averages. Note that \[ B(k) \subset \left\{ x \in X : \M_f(x) \g \int f \intd \mu + \dfrac{1}{k} \right\} \] because the supremum of a sequence cannot be smaller than its limit superior. Thus the set \[ C(k) = \left\{ x \in X : \M_f(x) \g \int f \intd \mu + \dfrac{1}{k} \right\} \] also has $\mu(C(k)) = 1$. The crucial ingredient now is the inequality \[ \int 1_{C(k)} \cdot \left( f - \int f \intd \mu - \dfrac{1}{k} \right) \intd \mu \ge 0 \] which is the maximal ergodic theorem with $\lambda = \int f \intd \mu + \frac{1}{k}$. As $C(k)$ has full measure and $\mu(X) = 1$, we conclude that \[ 0 \le \int \left( f - \int f \intd \mu - \dfrac{1}{k} \right) \intd \mu = - \dfrac{1}{k} \] which is the desired contradiction.