# Broadcast gossip revisited: companion variables and convergence to consensus

Mike Rabbat pointed out his new preprint on ArXiv:

Broadcast Gossip Algorithms for Consensus on Strongly Connected Digraphs
Wu Shaochuan, Michael G. Rabbat

The basic starting point is the unidirectional Broadcast Gossip algorithm — we have $n$ nodes in a graph and each node $i$ starts with a value $x_i(0)$. At each time, a random node broadcasts its value to its neighbors in the graph and they each update their value with the weighted average of their current value and the received value. Eventually, this process converges almost surely and all nodes $i$ will have a common value $\hat{x}$. However, in general $\hat{x} \ne \bar{x}$, where $\bar{x} = \frac{1}{n} \sum x_i(0)$, but $\mathbb{E}[\hat{x}] = \bar{x}$. So this process achieves consensus but only computes the average in expectation.
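To make this concrete, here is a minimal simulation sketch of plain broadcast gossip on a directed cycle. The graph, the mixing weight `alpha`, and all variable names are illustrative choices, not taken from the paper:

```python
# Sketch: uncompensated broadcast gossip on a directed cycle of n nodes.
# The mixing weight `alpha` plays the role of a_{j,k}; all names here are
# illustrative, not from the paper.
import random

random.seed(1)
n = 5
# out-neighbors: node i broadcasts to node (i + 1) % n
neighbors = {i: [(i + 1) % n] for i in range(n)}
x = [float(i) for i in range(n)]   # x_i(0) = i, so the true average is 2.0
alpha = 0.5

for t in range(20000):
    k = random.randrange(n)        # a uniformly random node broadcasts
    for j in neighbors[k]:
        # receiver averages its value with the broadcast value
        x[j] = (1 - alpha) * x[j] + alpha * x[k]

spread = max(x) - min(x)           # approaches 0 at consensus
```

On any single run the common value $\hat{x}$ generally lands somewhere other than $\bar{x} = 2.0$; only averaging over the randomness of the broadcast order recovers $\bar{x}$.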

The reason broadcast gossip doesn't compute the average is pretty clear — since the communication and updates are unidirectional, the average of the nodes' values changes over time. One way around this is to use companion variables $\{y_i(t)\}$ to track the effect of the broadcasts. These have been studied before, but in this paper they set the initial values $y_i(0) = 0$ and perform updates as follows: if node $k$ broadcasts at time $t$ and node $j$ is a neighbor, then

$x_j(t+1) = (1 - a_{j,k}) x_j(t) + a_{j,k} x_k(t) + \epsilon d_j^{(k)} y_j(t)$
$y_j(t+1) = a_{j,k} (x_j(t) - x_k(t)) + (1 - \epsilon d_j^{(k)}) y_j(t) + b_{j,k} y_k(t)$
$x_k(t+1) = x_k(t)$
$y_k(t+1) = 0$

So what’s going on here? Each time a node transmits, it resets its companion variable to $0$. Each time it receives a broadcast, it accumulates an offset in $y_j$. The actual value estimate at a node is a weighted average of its current and received values plus a fraction of the local offset.
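A single broadcast event under these update rules can be sketched as follows. The containers for the weights (`a[j][k]`, `b[j][k]`, `d[j]` standing in for $d_j^{(k)}$) and the value of `eps` are free parameters here; the paper's results are about how to choose them:

```python
# Sketch of one broadcast event with companion variables, implementing the
# update equations above.  Weight containers and names are illustrative.

def broadcast_step(x, y, k, neighbors, a, b, d, eps):
    """Node k broadcasts; each out-neighbor j updates (x_j, y_j) in place."""
    xk, yk = x[k], y[k]                 # broadcast values, fixed before updates
    for j in neighbors[k]:
        x_new = (1 - a[j][k]) * x[j] + a[j][k] * xk + eps * d[j] * y[j]
        y_new = (a[j][k] * (x[j] - xk)
                 + (1 - eps * d[j]) * y[j]
                 + b[j][k] * yk)
        x[j], y[j] = x_new, y_new
    y[k] = 0.0                          # broadcaster resets its companion variable
    return x, y
```

Note that each $y_j$ picks up exactly the mass $a_{j,k}(x_j(t) - x_k(t))$ that the $x$-update destroyed, which is what lets the companion variables track the drift in the average.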

The parameters of the algorithm are matrices $A = (a_{j,k})$ and $B = (b_{j,k})$ and per-node matrices $D_k = (d_j^{(k)})$. Let

$A_k = A e_k e_k^T$
$B_k = B e_k e_k^T$
$L_k = \mathrm{diag}(A_k \mathbf{1}) - A_k$
$S_k = I - e_k e_k^T + B_k$

When node $k$ broadcasts, we can write a matrix

$W_k = \left[ \begin{array}{cc} I - L_k & \epsilon D_k \\ L_k & S_k - \epsilon D_k \end{array} \right]$

such that

$\left[ \begin{array}{c} \mathbf{x}(t+1) \\ \mathbf{y}(t+1) \end{array} \right] = W(t) \left[ \begin{array}{c} \mathbf{x}(t) \\ \mathbf{y}(t) \end{array} \right]$

where $W(t) = W_k$ if node $k$ broadcasts at time $t$.
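Assembling $W_k$ from the definitions above is mechanical; here is a sketch using numpy (assumed available), where `A` and `B` are the $n \times n$ weight matrices and `Dk` is the diagonal matrix with entries $d_j^{(k)}$:

```python
# Sketch: build W_k from A, B, D_k following the block structure above.
# numpy is assumed; all function and variable names are illustrative.
import numpy as np

def make_Wk(A, B, Dk, k, eps):
    n = A.shape[0]
    ek = np.zeros((n, 1))
    ek[k] = 1.0
    Ak = A @ ek @ ek.T                              # A_k = A e_k e_k^T
    Bk = B @ ek @ ek.T                              # B_k = B e_k e_k^T
    Lk = np.diag(Ak @ np.ones(n)) - Ak              # L_k = diag(A_k 1) - A_k
    Sk = np.eye(n) - ek @ ek.T + Bk                 # S_k = I - e_k e_k^T + B_k
    top = np.hstack([np.eye(n) - Lk, eps * Dk])
    bot = np.hstack([Lk, Sk - eps * Dk])
    return np.vstack([top, bot])                    # 2n x 2n update matrix
```

One quick sanity check on such a sketch is to verify that `W @ concatenate([x, y])` reproduces the scalar update equations for a neighbor of the broadcaster.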

The main results are:

1. If we choose $\epsilon$ small enough, then $\lim_{t \to \infty} \mathbb{E}[ \mathbf{x}(t) | \mathbf{x}(0) ] = \hat{x} \mathbf{1}$, and $\hat{x} = \bar{x}$ under additional conditions on the weight matrices.
2. A sufficient condition for the second moment of the error to go to 0.
3. Rules on how to pick the parameters, especially $\epsilon$.

There are also some nice simulations. Broadcast gossip is nice because of its simplicity, and adding a little monitoring/control variable $y_j(t)$ seems to buy a lot in terms of controlling bad sample paths for the process.