Avoidance Coupling

I took a read through this fun paper that appeared on the arXiv a few months ago:

Avoidance Coupling
Omer Angel, Alexander E. Holroyd, James Martin, David B. Wilson, Peter Winkler
arXiv:1112.3304v1 [math.PR]

Typically when you think of coupling arguments, you want to create two (or more) copies of a Markov chain that meet up quickly. A coupling of Markov chains with transition matrix P is a sequence of pairs \{(U_t, V_t)\} such that \{U_t\} and \{V_t\} are each Markov chains with transition matrix P:

\mathbb{P}( U_{t+1} = j | U_t = i, V_t = i') = P(i,j)
\mathbb{P}( V_{t+1} = j' | U_t = i, V_t = i') = P(i', j')

Note that the two chains need not be independent! The idea is that we start \{U_t\} and \{V_t\} at different initial positions, and when they “meet” we make them move together: once they meet, the decisions from random mappings are the same for both chains. The coupling time is T_c = \min \{t : U_t = V_t \}. Coupling is used to show fast mixing of Markov chains via theorems like the following, which says that the distance at time t between the distribution of the chain started at i and that of the chain started at i' is upper bounded by the probability that the chains have not yet coupled:

Theorem. Let \{ (U_t,V_t) \} be a coupling with U_0 = i and V_0 = i'. Then

\| P^t(i,\cdot) - P^t(i',\cdot) \|_{\mathrm{TV}} \le \mathbb{P}_{(U,V)}\left( T_c > t \right)
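To see the bound in action, here is a quick simulation (my own sketch, not from any paper discussed here) for the lazy random walk on the 8-cycle, using the classic coupling in which a coin decides which of the two chains takes a step while the other holds. The helper names `lazy_cycle_P` and `coupling_time` are mine; the exact total variation distances, computed from matrix powers, should sit below Monte Carlo estimates of \mathbb{P}(T_c > t).

```python
import numpy as np

def lazy_cycle_P(n):
    """Transition matrix of the lazy random walk on the n-cycle:
    hold with probability 1/2, else step to a uniform neighbor."""
    P = np.zeros((n, n))
    for v in range(n):
        P[v, v] = 0.5
        P[v, (v + 1) % n] = 0.25
        P[v, (v - 1) % n] = 0.25
    return P

def coupling_time(n, i, j, rng):
    """One run of the coupling: each step, a fair coin picks which chain
    takes a +/-1 step while the other holds, so each chain is marginally
    the lazy walk; returns T_c, the first time the chains meet."""
    u, v, t = i, j, 0
    while u != v:
        step = 1 if rng.random() < 0.5 else -1
        if rng.random() < 0.5:
            u = (u + step) % n
        else:
            v = (v + step) % n
        t += 1
    return t

n, i, j = 8, 0, 3
rng = np.random.default_rng(0)
times = np.array([coupling_time(n, i, j, rng) for _ in range(50_000)])
P = lazy_cycle_P(n)
checks = []
for t in (1, 5, 10, 20):
    Pt = np.linalg.matrix_power(P, t)
    tv = 0.5 * np.abs(Pt[i] - Pt[j]).sum()
    tail = (times > t).mean()   # Monte Carlo estimate of P(T_c > t)
    checks.append((t, tv, tail))
```

Here the difference between the two walkers performs a simple random walk, so the coupling time is finite almost surely and the total variation distance decays with it.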

This paper takes an entirely different tack — sometimes you want to start off a bunch of copies of the chain such that they never meet up. This could happen if you are trying to cover more of the space. So you want to arrange a coupling such that the coupling time is huge. There’s a bit of a definitional issue regarding when you declare that two walkers collide (what if they swap places?), but they just say “multiple walkers are assumed to take turns in a fixed cyclic order, and again, a collision is deemed to occur exactly if a walker moves to a vertex currently occupied by another. We call a coupling that forbids collisions an avoidance coupling.”

There are a number of results in the paper, but the simplest one to describe is for two walkers on a complete graph K_n (or the complete graph with loops, K_n^{\ast}).

Theorem 4.1. For any composite n = ab, where a,b > 1, there exist Markovian avoidance couplings for two walkers on K_n and on K_n^{\ast}.

How do they do this? They partition the n vertices into b clusters \{S_i\}, each of size a. Let’s call the walkers Alice and Bob. Suppose Alice and Bob are in the same cluster and it’s Bob’s turn to move. Then he chooses uniformly among the vertices in another cluster. If they are in different clusters, he moves uniformly to a vertex in his own cluster (other than his own). Now when it’s Alice’s turn to move, she is always in a different cluster than Bob. She picks a vertex uniformly in Bob’s cluster (other than Bob’s) with probability \frac{a(b-1)}{ab - 1} and a vertex in her own cluster (other than her own) with probability \frac{a - 1}{ab - 1}.

So let’s look at Bob’s distribution. The chance that he moves to a particular vertex outside his current cluster is the chance that Alice moved into his cluster times the uniform probability of choosing a particular vertex outside his cluster:
\frac{a(b-1)}{ab - 1} \times \frac{1}{a(b-1)} = \frac{1}{ab - 1}
The chance that he moves to a vertex inside his own cluster is likewise
\frac{a - 1}{ab - 1} \times \frac{1}{a-1} = \frac{1}{ab - 1}
So the marginal transitions of Bob are the same as a uniform random walk.

For Alice’s distribution we look at the time reversal of the chain. In this case, Alice’s reverse distribution is Bob’s forward distribution and vice versa, so Alice also looks like a uniform random walk.
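The construction is concrete enough to sanity-check in code. Below is a small simulation (my own sketch; I put vertex v in cluster v // a, and the function name `step` is mine) that runs the two walkers for a while, confirms they never collide, and tallies Alice's empirical transition frequencies, which should be roughly uniform over the other ab - 1 vertices.

```python
import random
from collections import Counter

def step(alice, bob, a, b, rng):
    """One round of the Theorem 4.1 coupling on K_{ab}: Bob moves, then
    Alice.  Vertex v belongs to cluster v // a."""
    n = a * b
    if bob // a == alice // a:
        # same cluster: Bob jumps uniformly to a vertex outside his cluster
        bob = rng.choice([v for v in range(n) if v // a != bob // a])
    else:
        # different clusters: Bob moves uniformly within his own cluster
        bob = rng.choice([v for v in range(n) if v // a == bob // a and v != bob])
    # after Bob's move, Alice is always in a different cluster from Bob
    if rng.random() < a * (b - 1) / (n - 1):
        # move uniformly into Bob's cluster, avoiding Bob
        alice = rng.choice([v for v in range(n) if v // a == bob // a and v != bob])
    else:
        # move uniformly within her own cluster, avoiding her own vertex
        alice = rng.choice([v for v in range(n) if v // a == alice // a and v != alice])
    return alice, bob

a, b = 2, 3
n = a * b
rng = random.Random(1)
alice, bob = 0, 1
collisions = 0
moves = Counter()               # Alice's empirical transition counts
for _ in range(300_000):
    prev = alice
    alice, bob = step(alice, bob, a, b, rng)
    collisions += (alice == bob)
    moves[(prev, alice)] += 1
```

Neither walker ever moves onto the other by construction, so the collision count stays at zero, and the empirical frequencies match the uniform random walk on K_6.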

There are a number of additional results in the paper, such as:

Theorem 7.1. There exists a Markovian avoidance coupling of k walkers on K_n^{\ast} for any k \le n/(8 \log_2 n), and on K_n for any k \le n/(56 \log_2 n).

Theorem 8.1. No avoidance coupling is possible for n - 1 walkers on K_n^{\ast}, for n \ge 4.

In addition there are a ton of open questions at the end which are quite interesting. I didn’t mention it here, but there are also interesting questions of the entropy of the coupling — lower entropy implies easier simulation, in a sense.


Products of random stochastic matrices

As I mentioned, Behrouz Touri gave a really great presentation at ITA on some of his work with Angelia Nedić on products of stochastic matrices, some of which appears in this paper on the arXiv. The setup of the paper is relatively straightforward — we have a sequence of independent random matrices \{W(k)\}, each of which is row-stochastic almost surely, and we want to know when the product \lim_{k \to \infty} W(k) W(k-1) \cdots W(t_0) converges almost surely. The main result is that if the chain is balanced and strongly aperiodic, then the limit is a random stochastic matrix in which rows belonging to the same connected component of the infinite flow graph are equal.

Recall that for a chain W, a lazy version of the chain is \alpha W + (1 - \alpha) I. Laziness helps avoid periodicity by letting the chain stay put with probability (1 - \alpha). Strongly aperiodic means that there is a \gamma \in (0,1] such that \mathbb{E}[ W_{ii}(k) W_{ij}(k) ] \ge \gamma \mathbb{E}[ W_{ij}(k) ]. Basically this is a sort of “expected laziness condition” which says that there is enough self-transition probability to avoid some sort of weak periodicity in the chain.
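A two-state toy example (mine, not from the paper) shows why laziness matters: the deterministic swap chain is periodic and its powers never settle down, while its lazy version converges.

```python
import numpy as np

# a maximally periodic chain: deterministically swap between two states
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
alpha = 0.75
W_lazy = alpha * W + (1 - alpha) * np.eye(2)   # the lazy version

odd_power = np.linalg.matrix_power(W, 101)       # still the swap matrix
lazy_power = np.linalg.matrix_power(W_lazy, 50)  # rows near [0.5, 0.5]
```

With \alpha = 3/4 the lazy chain's second eigenvalue is -1/2, so its powers converge geometrically to the matrix with uniform rows.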

Consider a cut of the chain into a set of states S and a set \bar{S}. A chain is balanced if there is an \alpha > 0 such that for all cuts (S,\bar{S}), we have \mathbb{E}[ W_{S \bar{S}}(k) ] \ge \alpha \mathbb{E}[ W_{\bar{S} S}(k) ]. So this is saying that at each time, the flow out of S is commensurate with the flow into S.

The infinite flow graph has the same vertex set as the original chain, but edge (i,j) exists only if \sum_{k} W_{ij}(k) + W_{ji}(k) = \infty. That is, the total flow across the edge, summed over all time, has to be infinite.
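Here is a quick numerical illustration (my own; the i.i.d. lazy construction below is chosen so that balance and strong aperiodicity hold trivially and the infinite flow graph is complete): the backward product of random stochastic matrices converges to a matrix whose rows are all equal.

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 4, 1000

def random_lazy_stochastic(n, rng):
    """0.5 I + 0.5 R with R a random row-stochastic matrix: the diagonal
    mass makes the i.i.d. chain strongly aperiodic (gamma = 1/2), and
    the entrywise symmetry of R's distribution makes it balanced."""
    R = rng.random((n, n))
    R /= R.sum(axis=1, keepdims=True)
    return 0.5 * np.eye(n) + 0.5 * R

# the backward product W(K) W(K-1) ... W(1)
A = np.eye(n)
for _ in range(K):
    A = random_lazy_stochastic(n, rng) @ A

# every edge carries flow at every step, so the infinite flow graph is
# complete and all rows of the limit should agree
row_spread = np.abs(A - A[0]).max()
```

Each factor is a strict contraction in total variation, so the row spread shrinks geometrically as factors accumulate.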

So to add it all up, the well-behaved independent products of stochastic matrices that converge are those which don’t behave badly over time — for each time k, they don’t shift around the mass too much and they are not too periodic. Basically, they don’t become volatile and cyclical, which sounds perfectly reasonable. So how does he prove it?

The first part is to connect the problem to a dynamic system whose state evolves as x(k+1) = W(k+1) x(k) and ask whether the state converges in the limit. The next part uses a connection to absolute probability processes, which were used by Kolmogorov in 1936. A random vector process \pi(k) is an absolute probability process for \{W(k)\} if it is adapted to the same filtration as the chain, it’s stochastic almost surely, and

\mathbb{E}[ \pi(k+1) W(k+1) | \mathcal{F}_k ] = \pi(k)
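One family where an absolute probability process is easy to write down (my own illustrative example, not from the paper): if each W(k) is doubly stochastic almost surely, say a uniformly random permutation matrix, then the constant uniform vector satisfies the defining identity sample-by-sample, and hence in conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
pi = np.full(n, 1.0 / n)    # candidate: the constant uniform vector

# random permutation matrices are doubly stochastic (almost surely)
Ws = [np.eye(n)[rng.permutation(n)] for _ in range(100)]

# pi W = pi for every sample, so E[pi W(k+1) | F_k] = pi as required
max_dev = max(np.abs(pi @ W - pi).max() for W in Ws)
```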

Kolmogorov showed that for a deterministic sequence of stochastic matrices there is always a deterministic absolute probability process, so for independent chains we can always find a random absolute probability process. Using this, Touri and Nedić define a class of comparison functions which are super-martingales for each x(0) and have a limit. By choosing a particular comparison function they can get a version of the main result. It’s a nicely written paper and worth a skim if you’re interested in these things (as I am).