\[ \newcommand{\C}{\mathbb{C}} \newcommand{\haar}{\mathsf{m}} \DeclareMathOperator{\cont}{\mathsf{C}} \newcommand{\contc}{\cont_\mathsf{c}} \newcommand{\conto}{\cont_\mathsf{0}} \newcommand{\P}{\mathcal{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\g}{>} \newcommand{\l}{<} \newcommand{\intd}{\,\mathsf{d}} \newcommand{\Re}{\mathsf{Re}} \newcommand{\area}{\mathop{\mathsf{Area}}} \newcommand{\met}{\mathop{\mathsf{d}}} \newcommand{\orb}{\mathop{\mathsf{orb}}} \newcommand{\emptyset}{\varnothing} \newcommand{\B}{\mathscr{B}} \DeclareMathOperator{\borel}{\mathsf{Bor}} \DeclareMathOperator{\lpell}{\mathsf{L}} \newcommand{\lp}[1]{\lpell^{\!\mathsf{#1}}} \newcommand{\Lp}[1][p]{\mathsf{L}^{\!\mathsf{#1}}} \renewcommand{\|}{|\!|} \]

Full shifts

In the previous two sections we studied the irrational rotation \[ T(x) = x + \alpha \bmod 1 \] on $[0,1)$ where $\alpha$ was a fixed irrational number. We proved that all orbits are dense, and that all orbits are uniformly distributed in the sense that \[ \lim_{N\to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} 1_{[a,b)}(T^n(x)) = b-a \] for all $0 \le a \l b \le 1$ and all $x \in [0,1)$.

The full shift on two symbols

We next study the same concepts, density and uniform distribution of orbits, for a different dynamical system. We will take \[ X = \{0,1\}^\N \] and study the map $T : X \to X$ defined by \[ (T(x))(n) = x(n+1) \] for all $x \in X$ and all $n \in \N$. A point $x \in X$ is an infinite sequence of zeroes and ones. We can represent such sequences as infinite strings \[ \begin{aligned} x & {} = \mathtt{01101101001010101100010101000101001001010}\cdots \\ T(x) & {} = \mathtt{1101101001010101100010101000101001001010}\cdots \\ T^2(x) & {} = \mathtt{101101001010101100010101000101001001010}\cdots \\ T^3(x) & {} = \mathtt{01101001010101100010101000101001001010}\cdots \end{aligned} \] and in that representation the effect of $T$ is to discard the first term.
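To illustrate the definition with two simple cases: each of the two constant sequences is fixed by $T$, and the alternating sequence returns to itself after two steps. \[ \begin{aligned} x & {} = \mathtt{0101010101}\cdots \\ T(x) & {} = \mathtt{1010101010}\cdots \\ T^2(x) & {} = \mathtt{0101010101}\cdots = x \end{aligned} \] So the orbit of this particular $x$ consists of just two points.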

In comparison with irrational rotations, the qualitative behaviour of the orbits of $T$ can vary dramatically. All of the following are possible.

The behaviour of empirical averages is also much more difficult to control. When working with irrational rotations we studied the frequencies with which orbit segments visited $[a,b)$ in the long term by considering the quantity \[ \dfrac{1}{N} \sum_{n=0}^{N-1} 1_{[a,b)}(T^n(x)) = \dfrac{|\{ 0 \le n \le N-1 : a \le T^n(x) \l b \}|}{N} \] in the limit $N \to \infty$. We want to do the same thing for our shift map $T$ on $\{0,1\}^\N$. What do we use instead of intervals?

Definition

By a cylinder set we mean any set of the form \[ \{ x \in \{0,1\}^\N : x(i_1) = \epsilon(i_1),\dots,x(i_r) = \epsilon(i_r) \} \] where $i_1 \l \cdots \l i_r$ are natural numbers and each $\epsilon(i_j)$ is either 0 or 1.

In other words, a cylinder set is a subset of $\{0,1\}^\N$ obtained by prescribing the values that sequences in $\{0,1\}^\N$ take at finitely many indices.

Example

If $r = 1$, $i_1 = 2$ and $\epsilon(2) = 0$ then the corresponding cylinder is the set of all sequences in $X$ that have a zero in the second position.

We will use a special notation for cylinder sets with $i_1 = 1,\dots,i_r = r$ as these are the cylinder sets we will work with most often. Write \[ [\epsilon(1) \epsilon(2) \cdots \epsilon(r)] = \{ x \in \{0,1\}^\N : x(1) = \epsilon(1),\dots,x(r) = \epsilon(r) \} \] for any $\epsilon(1),\dots,\epsilon(r)$ in $\{0,1\}$. So, for example, we have \[ \begin{aligned} {}[0] = {}& \{ x\in \{0,1\}^\N : x(1) = 0 \} \\ [11] = {}& \{ x\in \{0,1\}^\N : x(1) = 1, x(2) = 1 \} \\ [01] = {}& \{ x\in \{0,1\}^\N : x(1) = 0, x(2) = 1 \} \\ [101] = {}& \{ x\in \{0,1\}^\N : x(1) = 1, x(2) = 0,x(3) = 1 \} \end{aligned} \]
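As a quick check on the notation, a cylinder that constrains only later coordinates can be written as a finite disjoint union of cylinders of the special form. For instance, the cylinder from the example above, which prescribes only $x(2) = 0$, splits according to the value of $x(1)$: \[ \{ x \in \{0,1\}^\N : x(2) = 0 \} = [00] \cup [10] \] More generally, a cylinder whose largest prescribed index is $i_r$ is a disjoint union of $2^{i_r - r}$ cylinders of the form $[\epsilon(1) \cdots \epsilon(i_r)]$, one for each way of filling in the unprescribed coordinates.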

The cylinder sets will be our analogues in $\{0,1\}^\N$ of the intervals $[a,b)$ in $[0,1)$. Topologically, cylinder sets are slightly better behaved than intervals. For one thing, every cylinder set is both open and closed with respect to the metric \[ \met(x,y) = \sum_{n=1}^\infty \dfrac{|x(n) - y(n)|}{2^n} \] on $\{0,1\}^\N$. For another, we can cover $\{0,1\}^\N$ by cylinders without overlap. For example \[ \{0,1\}^\N = [0] \cup [10] \cup [110] \cup [111] \] is a cover of $\{0,1\}^\N$ by pairwise disjoint sets that are both open and closed.
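The claim that cylinders are both open and closed can be checked directly from the metric; the following is a sketch for cylinders of the form $[\epsilon(1)\cdots\epsilon(r)]$. If $x(n) \ne y(n)$ for some $n \le r$ then the $n$th term of the sum defining $\met(x,y)$ equals $2^{-n}$, so \[ \met(x,y) \ge \dfrac{1}{2^n} \ge \dfrac{1}{2^r} \] and therefore any $x$ with $\met(x,y) \l 2^{-r}$ agrees with $y$ in its first $r$ coordinates. Thus for each $y \in [\epsilon(1)\cdots\epsilon(r)]$ the open ball of radius $2^{-r}$ about $y$ is contained in the cylinder, which makes the cylinder open. Its complement is the union of the $2^r - 1$ cylinders $[\delta(1)\cdots\delta(r)]$ with $(\delta(1),\dots,\delta(r)) \ne (\epsilon(1),\dots,\epsilon(r))$, so the complement is also open and the cylinder is closed. The displayed cover is pairwise disjoint for a similar reason: every sequence begins with exactly one of $\mathtt{0}$, $\mathtt{10}$, $\mathtt{110}$, $\mathtt{111}$.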

What we want to do is to investigate the extent to which \[ \lim_{N \to \infty} \dfrac{1}{N} \sum_{n=0}^{N-1} 1_C(T^n(x)) \] exists for points $x \in \{0,1\}^\N$ and cylinders $C \subset \{0,1\}^\N$.
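Two hand computations, offered as illustrations rather than as part of the argument, show that the answer depends on the point. Take $C = [1]$, so that $\sum_{n=0}^{N-1} 1_C(T^n(x))$ counts the indices $1 \le n \le N$ with $x(n) = 1$; write $S_N(x)$ for this count (a notation introduced only for this example). For the alternating sequence $x = \mathtt{010101}\cdots$ we have $S_N(x) = \lfloor N/2 \rfloor$, so the limit exists and equals $\tfrac{1}{2}$. Now let $y$ consist of alternating blocks of zeroes and ones whose lengths double: \[ y = \mathtt{0\,1\,00\,11\,0000\,1111\,00000000\,11111111}\cdots \] Stopping at the end of the block of ones of length $2^k$, and at the end of the block of zeroes of length $2^k$, gives respectively \[ \begin{aligned} \dfrac{S_N(y)}{N} & {} = \dfrac{2^{k+1} - 1}{2(2^{k+1} - 1)} = \dfrac{1}{2} && \textsf{ when } N = 2(2^{k+1} - 1) \\ \dfrac{S_N(y)}{N} & {} = \dfrac{2^k - 1}{3 \cdot 2^k - 2} \to \dfrac{1}{3} && \textsf{ when } N = 3 \cdot 2^k - 2 \end{aligned} \] so the averages oscillate and the limit does not exist for $y$.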

A measure on the full shift

When analyzing irrational rotations we deduced that the Lebesgue measure was responsible for the limiting values of the empirical averages we were interested in. To analyze the shift map $T$ on $\{0,1\}^\N$ we similarly need a measure on the Borel subsets of $\{0,1\}^\N$. Fix $0 \le p \le 1$ and let us declare that \[ \mathbb{P}([\epsilon(1) \cdots \epsilon(r)]) = p^{|\{ 1 \le i \le r : \epsilon(i) = 1 \}|} (1-p)^{|\{ 1 \le i \le r : \epsilon(i) = 0\}|} \] so that, for example \[ \begin{aligned} \mathbb{P}([1]) &{} = p \\ \mathbb{P}([0]) &{} = 1-p \\ \mathbb{P}([101]) &{} = p^2(1-p) \\ \end{aligned} \] and then put \[ \Xi(E) = \inf \left\{ \sum_{n=1}^\infty \mathbb{P}(C_n) : E \subset \bigcup_{n=1}^\infty C_n \textsf{ for } C_1,C_2,\dots \textsf{ cylinders} \right\} \] which defines an outer measure on $\{0,1\}^\N$. The Carathéodory construction we used to construct Lebesgue measure can be applied in the same way to construct a measure $\mu_p$ on the Borel subsets of $\{0,1\}^\N$ with the property that $\mu_p(C) = \mathbb{P}(C)$ for every cylinder $C$. We take the existence of such measures for granted without going through the details again.
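Two quick consistency checks make the definition of $\mathbb{P}$ more plausible. They are sketched here under the convention $0^0 = 1$ for the endpoint cases $p \in \{0,1\}$, which is an assumption not stated above. First, splitting a cylinder according to the next coordinate preserves total mass: \[ \mathbb{P}([\epsilon(1)\cdots\epsilon(r)\,0]) + \mathbb{P}([\epsilon(1)\cdots\epsilon(r)\,1]) = \big( (1-p) + p \big)\, \mathbb{P}([\epsilon(1)\cdots\epsilon(r)]) = \mathbb{P}([\epsilon(1)\cdots\epsilon(r)]) \] Second, the disjoint cover $[0] \cup [10] \cup [110] \cup [111]$ of $\{0,1\}^\N$ has total mass \[ (1-p) + p(1-p) + p^2(1-p) + p^3 = (1-p)(1 + p + p^2) + p^3 = (1 - p^3) + p^3 = 1 \] as one would expect if $\mathbb{P}$ is to extend to a probability measure.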

We call the resulting measure $\mu_p$ the $(p,1-p)$ coin measure and we call the $(\tfrac{1}{2},\tfrac{1}{2})$ coin measure the fair coin measure. Our main goal is to use these measures to prove the following theorem.

Theorem

Fix $0 \le p \le 1$. The set \[ \left\{ x \in \{0,1\}^\N : \lim_{N \to \infty} \dfrac{| \{ 1 \le n \le N : x(n) = 1 \}|}{N} = p \right\} \] has full measure with respect to $\mu_p$.

Taking $C = [1]$ and writing \[ \dfrac{| \{ 1 \le n \le N : x(n) = 1 \}|}{N} = \dfrac{1}{N} \sum_{n=1}^N x(n) = \dfrac{1}{N} \sum_{n=0}^{N-1} 1_C(T^n(x)) \] gives us the same perspective, that of averaging functions along orbit segments, as was fruitful when working with irrational rotations. However, we will not be able to proceed as smoothly because we do not have an analogue of the functions $\psi_k$ whose average over the orbit of an irrational rotation we were able to calculate relatively easily. Instead we will take a more probabilistic approach.
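As a sanity check on the statement of the theorem, one can compute the $\mu_p$-measure of the set of sequences whose $n$th coordinate is one. This set is the disjoint union of the cylinders $[\epsilon(1)\cdots\epsilon(n-1)\,1]$ over all choices of $\epsilon(1),\dots,\epsilon(n-1)$, so grouping the choices by their number $k$ of ones gives \[ \mu_p(\{ x \in \{0,1\}^\N : x(n) = 1 \}) = p \sum_{k=0}^{n-1} \binom{n-1}{k} p^k (1-p)^{n-1-k} = p \big( p + (1-p) \big)^{n-1} = p \] and consequently \[ \int \dfrac{1}{N} \sum_{n=1}^N x(n) \intd \mu_p(x) = p \] for every $N$. This only identifies the average value of the empirical frequency with respect to $\mu_p$; the theorem makes the much stronger claim that the frequency itself converges to $p$ for $\mu_p$-almost every $x$.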