Broadcast gossip revisited: companion variables and convergence to consensus

Mike Rabbat pointed out his new preprint on ArXiv:

Broadcast Gossip Algorithms for Consensus on Strongly Connected Digraphs
Wu Shaochuan, Michael G. Rabbat

The basic starting point is the unidirectional Broadcast Gossip algorithm: we have n nodes in a graph, and each node i starts with a value x_i(0). At each time, a random node broadcasts its value to its neighbors in the graph, and each neighbor updates its value to a weighted average of its current value and the received value. Eventually this process converges almost surely, and all nodes end up with a common value \hat{x}. However, in general \hat{x} \ne \bar{x}, where \bar{x} = \frac{1}{n} \sum_i x_i(0), although \mathbb{E}[\hat{x}] = \bar{x}. So this process achieves consensus but only computes the average in expectation.
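To make the mechanics concrete, here is a minimal simulation sketch of plain broadcast gossip (my own illustration, not code from the paper); the mixing weight alpha and the ring topology are arbitrary choices:

```python
# Minimal sketch of plain (uncompensated) broadcast gossip.
# The weight `alpha` and the ring graph are illustrative choices.
import numpy as np

def broadcast_gossip(x0, neighbors, alpha=0.5, num_rounds=10000, seed=0):
    """Run unidirectional broadcast gossip and return the final values."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(num_rounds):
        k = rng.integers(n)              # a uniformly random node broadcasts
        for j in neighbors[k]:           # each neighbor averages with x_k
            x[j] = (1 - alpha) * x[j] + alpha * x[k]
        # the broadcaster's own value is unchanged
    return x

# Example: a ring of 5 nodes. The consensus value varies from run to run;
# it equals the initial average 2.0 only in expectation.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(broadcast_gossip([0.0, 1.0, 2.0, 3.0, 4.0], ring))
```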

The reason broadcast gossip doesn't compute the average is pretty clear: since the communication and updates are unidirectional, the average of the nodes' values changes over time. One way around this is to use companion variables \{y_i(t)\} to track the effect of the broadcasts. These have been studied before, but in this paper they set the initial values y_i(0) = 0 and perform updates as follows: if node k broadcasts at time t and node j is a neighbor, then

x_j(t+1) = (1 - a_{j,k}) x_j(t) + a_{j,k} x_k(t) + \epsilon d_j^{(k)} y_j(t)
y_j(t+1) = a_{j,k} (x_j(t) - x_k(t)) + (1 - \epsilon d_j^{(k)}) y_j(t) + b_{j,k} y_k(t)
x_k(t+1) = x_k(t)
y_k(t+1) = 0

So what’s going on here? Each time a node transmits, it resets its companion variable to 0. Each time it receives a broadcast, it accumulates an offset in y_j. The actual value estimate at each node is a weighted average of its current value and the received value, plus a fraction of the local offset.
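Here is a small sketch of one broadcast step with the companion variables, transcribing the updates above into code; the array layout (A[j, k] = a_{j,k}, B[j, k] = b_{j,k}, D[j, k] = d_j^{(k)}) is my own choice, not the paper's.

```python
# One broadcast step with companion variables, following the update equations above.
import numpy as np

def broadcast_step(x, y, k, neighbors, A, B, D, eps):
    """Node k broadcasts; each neighbor j updates (x_j, y_j); node k resets y_k."""
    x_new, y_new = x.copy(), y.copy()
    for j in neighbors[k]:
        a, b, d = A[j, k], B[j, k], D[j, k]
        x_new[j] = (1 - a) * x[j] + a * x[k] + eps * d * y[j]
        y_new[j] = a * (x[j] - x[k]) + (1 - eps * d) * y[j] + b * y[k]
    x_new[k] = x[k]   # the broadcaster keeps its value...
    y_new[k] = 0.0    # ...and resets its companion variable to 0
    return x_new, y_new
```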

The parameters of the algorithm are matrices A = (a_{j,k}) and B = (b_{j,k}) and per-node diagonal matrices D_k with entries d_j^{(k)}. Let

A_k = A e_k e_k^T
B_k = B e_k e_k^T
L_k = \mathrm{diag}(A_k \mathbf{1}) - A_k
S_k = I - e_k e_k^T + B_k

When node k broadcasts, we can write a matrix

W_k = \left[ \begin{array}{cc} I - L_k & \epsilon D_k \\ L_k & S_k - \epsilon D_k \end{array} \right]

such that

\left[ \begin{array}{c} \mathbf{x}(t+1) \\ \mathbf{y}(t+1) \end{array} \right] = W(t) \left[ \begin{array}{c} \mathbf{x}(t) \\ \mathbf{y}(t) \end{array} \right]

where W(t) = W_k if node k broadcasts at time t.
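For completeness, here is a quick sketch of how one might assemble W_k from these definitions in NumPy, treating D_k as the diagonal matrix with entries d_j^{(k)} (which is how it enters the updates above); again, this is just an illustration, not code from the paper.

```python
# Build the 2n-by-2n update matrix W_k for a broadcast by node k.
import numpy as np

def make_Wk(A, B, d_k, k, eps):
    """A, B are n-by-n weight matrices; d_k is the length-n vector (d_j^{(k)})."""
    n = A.shape[0]
    ek = np.zeros((n, 1)); ek[k] = 1.0
    Ak = A @ ek @ ek.T                      # keep only column k of A
    Bk = B @ ek @ ek.T                      # keep only column k of B
    Lk = np.diag(Ak @ np.ones(n)) - Ak      # L_k = diag(A_k 1) - A_k
    Sk = np.eye(n) - ek @ ek.T + Bk         # S_k = I - e_k e_k^T + B_k
    Dk = np.diag(d_k)                       # D_k as a diagonal matrix (my assumption)
    return np.block([[np.eye(n) - Lk, eps * Dk],
                     [Lk,             Sk - eps * Dk]])

# One step of the matrix form: z(t+1) = W_k z(t), where z stacks x on top of y.
# z_next = make_Wk(A, B, d_k, k, eps) @ np.concatenate([x, y])
```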

The main results are:

  1. If we choose \epsilon small enough, then \lim_{t \to \infty} \mathbb{E}[ \mathbf{x}(t) \mid \mathbf{x}(0) ] = \hat{x} \mathbf{1}, and under additional conditions \hat{x} = \bar{x}.
  2. A sufficient condition for the second moment of the error to go to 0.
  3. Rules on how to pick the parameters, especially \epsilon.

There are also some nice simulations. Broadcast gossip is nice because of its simplicity, and adding a little monitoring/control variable y_j(t) seems to buy a lot in terms of controlling bad sample paths for the process.
