Markov chains

Consider the set $Y = \{ x \in \{0,1\}^{\mathbb{N}} : x(n) = 1 \implies x(n+1) = 0 \}$ of sequences in which every 1 is followed by a 0. This is certainly invariant for the shift map: if $y \in Y$ then $T(y)$ belongs to $Y$ as well.

To study the dynamics of $T$ on $Y$ using ergodic theory we want a measure on $Y$ that is $T$ invariant. For each $0 \le p \le 1$ we have the $(1-p, p)$ coin measure $\mu_p$ on $\{0,1\}^{\mathbb{N}}$ and each is $T$ invariant. However, only one - the point measure $\mu_0$ - gives positive measure to $Y$. For example, since the number $w_n$ of cylinders of length $n$ that intersect $Y$ satisfies the recurrence
$$w_1 = 2 \qquad w_2 = 3 \qquad w_{n+2} = w_{n+1} + w_n$$
for all $n \in \mathbb{N}$, we have $\mu_{1/2}(Y) \le w_n 2^{-n} \to 0$ as $n \to \infty$. How can we equip $Y$ with a probability measure $\mu$ that is $T$ invariant?
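One can check the recurrence and the decay of the bound numerically. Here is a minimal Python sketch; the helper name admissible is introduced just for this count.

```python
from itertools import product

def admissible(word):
    """True if the word contains no two consecutive 1s."""
    return all(not (a == 1 and b == 1) for a, b in zip(word, word[1:]))

# w_n counts the length-n cylinders meeting Y; each has mu_{1/2}-measure 2^{-n},
# so mu_{1/2}(Y) <= w_n / 2^n.  The counts w_n are Fibonacci-like: 2, 3, 5, 8, ...
for n in range(1, 16):
    w_n = sum(admissible(word) for word in product((0, 1), repeat=n))
    print(n, w_n, w_n / 2**n)
```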

Walks on directed graphs

The coin measures were inappropriate because they see all cylinders: they assign both $[01]$ and $[11]$ positive measure whereas only one of those cylinders intersects $Y$. We can think of the points of $Y$ as the outcomes of all possible infinite walks on the directed graph with vertices 0 and 1 and directed edges $0 \to 0$, $0 \to 1$ and $1 \to 0$.

To any endless journey on the graph one associates a sequence in $Y$ by recording the labels of the visited vertices. As the vertex labelled 1 cannot be visited twice consecutively, and as there are no other restrictions, we get all sequences in $Y$ this way.

Let us assign probabilities to each traversal. We encode these in a matrix
$$\begin{bmatrix} q(0,0) & q(0,1) \\ q(1,0) & q(1,1) \end{bmatrix}$$
with $0 \le q(i,j) \le 1$ representing the probability that one moves in a single step from vertex $i$ to vertex $j$. We must have
$$q(0,0) + q(0,1) = 1 \qquad q(1,0) + q(1,1) = 1$$
and we must also have $q(1,1) = 0$ as we forbid consecutive ones. To entirely determine the measure we also need the probability of the starting location. Fix $0 \le p(i) \le 1$ with $p(0) + p(1) = 1$ where $p(i)$ is the probability that one begins at vertex $i$. With this information - values for all $p(i)$ and all $q(i,j)$ - we define
$$\nu([\epsilon_1 \cdots \epsilon_r]) = p(\epsilon_1) \prod_{i=1}^{r-1} q(\epsilon_i, \epsilon_{i+1})$$
on all cylinder sets $[\epsilon_1 \cdots \epsilon_r]$. For example $\nu([01]) = p(0) q(0,1)$ is the probability of beginning at 0 multiplied by the probability that the first step is from 0 to 1.
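The cylinder formula is easy to evaluate directly. The following sketch (the helper name nu is mine, and the parameter values are the fair-coin choice discussed below) computes $\nu$ on cylinders and confirms that the length-$n$ cylinders carry total mass 1:

```python
from itertools import product

def nu(word, p, q):
    """Markov measure of the cylinder [word]: p(e_1) * prod_i q(e_i, e_{i+1})."""
    m = p[word[0]]
    for a, b in zip(word, word[1:]):
        m *= q[a][b]
    return m

p = [2/3, 1/3]                      # starting distribution
q = [[1/2, 1/2],                    # fair choice of edge at vertex 0
     [1.0, 0.0]]                    # q(1,1) = 0: consecutive 1s forbidden
print(nu((0, 1), p, q))             # p(0) q(0,1) = 1/3
print(nu((1, 1), p, q))             # 0, as [11] misses Y
print(sum(nu(w, p, q) for w in product((0, 1), repeat=6)))  # total mass 1.0
```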

Note that if $\nu$ is to be an invariant measure then we must have
$$p(i) = \nu([i]) = \nu(T^{-1}[i]) = p(0) q(0,i) + p(1) q(1,i)$$
which is to say
$$\begin{bmatrix} p(0) & p(1) \end{bmatrix} \begin{bmatrix} q(0,0) & q(0,1) \\ q(1,0) & q(1,1) \end{bmatrix} = \begin{bmatrix} p(0) & p(1) \end{bmatrix}$$
holds, so we assume this of our parameters.
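Continuing the sketch above, stationarity of the row vector $[p(0)\ p(1)]$ can be verified numerically:

```python
# check p Q = p: the row vector [2/3, 1/3] is fixed by the transition matrix q
pQ = [sum(p[i] * q[i][j] for i in range(2)) for j in range(2)]
print(pQ)  # [0.666..., 0.333...], equal to p
```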

We will take for granted that the above formula defines a measure $\nu$ on $\{0,1\}^{\mathbb{N}}$. Let us verify that it is $T$ invariant. Recall that it suffices to check $\nu(C)$ and $\nu(T^{-1}C)$ agree for all cylinder sets $C$ because such sets form a $\pi$ system. Write $C = [\epsilon_1 \cdots \epsilon_r]$. We calculate
$$\nu(T^{-1}C) = \nu([0\,\epsilon_1 \cdots \epsilon_r]) + \nu([1\,\epsilon_1 \cdots \epsilon_r]) = \big( p(0) q(0,\epsilon_1) + p(1) q(1,\epsilon_1) \big) \prod_{i=1}^{r-1} q(\epsilon_i, \epsilon_{i+1}) = p(\epsilon_1) \prod_{i=1}^{r-1} q(\epsilon_i, \epsilon_{i+1}) = \nu(C)$$
so $\nu$ is $T$ invariant.
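The same identity can be checked numerically, continuing the sketch above:

```python
from itertools import product

# nu(T^{-1} C) = nu([0 e_1...e_r]) + nu([1 e_1...e_r]) should equal nu(C)
for C in product((0, 1), repeat=5):
    preimage = nu((0,) + C, p, q) + nu((1,) + C, p, q)
    assert abs(preimage - nu(C, p, q)) < 1e-12
print("invariance verified on all cylinders of length 5")
```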

Having assumed stationarity, we are free only in the parameter $q(0,0)$: its value determines $q(0,1) = 1 - q(0,0)$ by normalization, and then $p(0)$ and $p(1)$ via the stationarity equations. What is the best way to choose its value? Absent any other information about the dynamics, or any other quantity that we might be interested in, it is often reasonable to choose the value that maximizes the entropy.

Proposition

For the above measure the quantity
$$-p(0) q(0,0) \log q(0,0) - p(0) q(0,1) \log q(0,1)$$
is the entropy of $T$.

Proof:

As $\xi = ([0], [1])$ is a generator for the Borel $\sigma$ algebra on $X$, the limit
$$\lim_{N \to \infty} \frac{1}{N} H\left( \bigvee_{n=0}^{N-1} T^{-n} \xi \right)$$
is the entropy $H(T)$ we want to calculate. We have
$$\begin{aligned} H\left( \bigvee_{n=0}^{N-1} T^{-n} \xi \right) &= -\sum_{\epsilon \in \{0,1\}^N} \mu([\epsilon_1 \cdots \epsilon_N]) \log \mu([\epsilon_1 \cdots \epsilon_N]) \\ &= -\sum_{\epsilon \in \{0,1\}^N} p(\epsilon_1) q(\epsilon_1, \epsilon_2) \cdots q(\epsilon_{N-1}, \epsilon_N) \log \big( p(\epsilon_1) q(\epsilon_1, \epsilon_2) \cdots q(\epsilon_{N-1}, \epsilon_N) \big) \\ &= -\sum_{i \in \{0,1\}} p(i) \log p(i) - (N-1) \sum_{i,j \in \{0,1\}} p(i) q(i,j) \log q(i,j) \end{aligned}$$
by writing the logarithm of the product as a sum of logarithms and then using repeatedly both
$$p(i) = p(0) q(0,i) + p(1) q(1,i) \qquad q(i,0) + q(i,1) = 1$$
for $i = 0, 1$. Dividing by $N$ and taking the limit as $N \to \infty$ gives the desired result, since $q(1,1) = 0$ and $q(1,0) = 1$ in our special case make the $i = 1$ terms of the last sum vanish.
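Continuing the Python sketch, one can watch $\frac{1}{N} H\big( \bigvee_{n=0}^{N-1} T^{-n} \xi \big)$ converge to the closed form; base-10 logarithms are used to match the numerical values appearing below.

```python
from itertools import product
from math import log10

def H_N(N, p, q):
    """Entropy of the partition into cylinders of length N (base-10 logs)."""
    return -sum(m * log10(m)
                for word in product((0, 1), repeat=N)
                if (m := nu(word, p, q)) > 0)

closed_form = -sum(p[i] * q[i][j] * log10(q[i][j])
                   for i in range(2) for j in range(2) if q[i][j] > 0)
for N in (5, 10, 15):
    print(N, H_N(N, p, q) / N, closed_form)   # ratio approaches closed_form
```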

The Parry measure

We would like to maximize
$$-p(0) q(0,0) \log q(0,0) - p(0) q(0,1) \log q(0,1)$$
for values in $[0,1]$ subject to
$$p(0) + p(1) = 1 \qquad q(0,0) + q(0,1) = 1 \qquad p(0) q(0,0) + p(1) = p(0) \qquad p(0) q(0,1) = p(1)$$
which is not so simple an optimization problem.

If we attempt to be as unbiased as possible in our walk on the graph, choosing which edge to traverse from vertex 0 each time by a fair coin toss, then we can assert $q(0,0) = \frac{1}{2} = q(0,1)$ which then forces $p(0) = \frac{2}{3}$ and $p(1) = \frac{1}{3}$. For these particular choices we get an entropy value of
$$H(T) = -2 \cdot \frac{2}{3} \cdot \frac{1}{2} \log \frac{1}{2} = \frac{2}{3} \log 2 \approx 0.200686664$$
taking logarithms base 10, but it is not clear that this is maximal.
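A quick check of the forced value of $p(0)$ and the resulting entropy:

```python
from math import log10

# stationarity p(0) q(0,1) = p(1) = 1 - p(0) with q(0,1) = 1/2 gives p(0) = 2/3
p0 = 1 / (2 - 1/2)
print(p0)                           # 0.666...
print(-2 * p0 * 0.5 * log10(0.5))   # 0.200686664..., i.e. (2/3) log 2
```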

Theorem (Parry)

Let $B$ be a $k \times k$ matrix with entries from $\{0,1\}$. Suppose there is $r \in \mathbb{N}$ with all entries of $B^r$ positive. Let $\lambda$ be the largest positive eigenvalue of $B$. Fix left and right eigenvectors $u$ and $v$ of $B$ respectively, both with eigenvalue $\lambda$, normalized so that $u(1) v(1) + \cdots + u(k) v(k) = 1$. With
$$p(i) = u(i) v(i) \qquad q(i,j) = \frac{B(i,j) v(j)}{\lambda v(i)}$$
the corresponding Markov measure maximizes the entropy for $T$ on the set $Y = \{ x \in \{1, \ldots, k\}^{\mathbb{N}} : B(x(n), x(n+1)) = 1 \text{ for all } n \in \mathbb{N} \}$ of sequences whose transitions are permitted by $B$.

Proof:

This is Theorem 8.10 in Walters, An Introduction to Ergodic Theory.
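Parry's recipe is mechanical enough to implement. Here is a minimal numpy sketch; the function name parry_measure is mine, and I rely on Perron-Frobenius (valid under the primitivity hypothesis) to justify taking the eigenvalue of largest real part and making the eigenvectors positive.

```python
import numpy as np

def parry_measure(B):
    """Return (lam, p, q) for the entropy-maximizing Markov measure of the
    subshift with 0-1 transition matrix B, assumed primitive."""
    B = np.asarray(B, dtype=float)
    w, R = np.linalg.eig(B)
    k = np.argmax(w.real)                      # Perron eigenvalue lam
    lam = w[k].real
    v = np.abs(R[:, k].real)                   # right eigenvector: B v = lam v
    w2, L = np.linalg.eig(B.T)
    u = np.abs(L[:, np.argmax(w2.real)].real)  # left eigenvector: u B = lam u
    u /= u @ v                                 # normalize sum_i u(i) v(i) = 1
    p = u * v                                  # p(i) = u(i) v(i)
    q = (B * v) / (lam * v[:, None])           # q(i,j) = B(i,j) v(j) / (lam v(i))
    return lam, p, q
```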

The resulting measure is the Parry measure on the Markov chain. For our example we have
$$B = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$$
with eigenvalues $\frac{1 - \sqrt{5}}{2}$ and $\frac{1 + \sqrt{5}}{2}$, and the latter must be $\lambda$. The vectors
$$u = \begin{bmatrix} \frac{1 + \sqrt{5}}{2} & 1 \end{bmatrix} \qquad v = \begin{bmatrix} \frac{1 + \sqrt{5}}{2} \\ 1 \end{bmatrix}$$
are left and right eigenvectors respectively of $B$ with eigenvalue $\lambda$. Since $q(i,j)$ involves only ratios of the entries of $v$, it is unchanged by scaling, and we conclude that
$$q(0,0) = \frac{B(0,0) v(0)}{\lambda v(0)} = \frac{1}{\lambda} \qquad q(0,1) = \frac{B(0,1) v(1)}{\lambda v(0)} = \frac{1}{\lambda^2}$$
are the values that will maximize the entropy. As $p(0) q(0,1) = p(1) = 1 - p(0)$ we conclude that
$$p(0) = \frac{\lambda^2}{1 + \lambda^2} \qquad p(1) = \frac{1}{1 + \lambda^2}$$
giving an entropy value of $H(T) = \log \lambda \approx 0.20898764$, which is indeed larger than the entropy one gets from our naive guess $q(0,0) = \frac{1}{2}$. In fact, the entropy of the Parry measure is always $\log \lambda$.
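Running the sketch above on our example recovers these values:

```python
from math import log10

lam, p, q = parry_measure([[1, 1], [1, 0]])
print(lam)          # 1.6180339... = (1 + sqrt(5)) / 2
print(q)            # [[1/lam, 1/lam^2], [1, 0]]
print(p)            # [lam^2/(1 + lam^2), 1/(1 + lam^2)] = [0.7236..., 0.2763...]
print(log10(lam))   # 0.20898764..., the maximal entropy (base-10 logs)
```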