# The strong converse for the MAC

I have been reading Ahlswede’s paper “An Elementary Proof of the Strong Converse Theorem for the Multiple-access Channel,” J. Combinatorics, Information and System Sciences, vol. 7, no. 3, pp. 216–230. The preprint is here, but I have a scanned copy of the journal version thanks to the intrepid UC Libraries staff — they are a great resource!

I’m going to split up this rather rambling post into a few posts. The main result is the following: for the discrete memoryless MAC, let
$\mathcal{R} = \mathrm{conv}\{ (R_1, R_2) : R_1 \le I(X; Z|Y),\ R_2 \le I(Y; Z|X),\ R_1 + R_2 \le I(XY; Z),\ X, Y\ \mathrm{independent} \}$
Then for any code of blocklength $n$, rates $(R_1, R_2)$, and average error $\lambda$, the rates must satisfy
$(R_1, R_2) \in ( 1 + O( \frac{ \log n }{\sqrt{n}} ) ) \mathcal{R}$

The first step in the proof is to look at non-stationary discrete memoryless channels. Suppose we have a codebook for a point-to-point channel $P(\mathbf{z} | \mathbf{u})$ with codewords $\{\mathbf{u}_i : i = 1, 2, \ldots, M\}$. You can think of these as vector inputs and outputs representing a whole block, and $P$ is just some channel. For a parameter $\theta$ and distribution $r(\mathbf{z})$ on the output we can look at the set of outputs
$B_{\mathbf{u}}(\theta, r) = \{ \mathbf{z} : \frac{P(\mathbf{z} | \mathbf{u} )}{ r(\mathbf{z})} \ge e^{\theta} \}$.
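To make the set concrete, here is a small numerical sketch (a toy example of my own, not from the paper): two uses of a BSC with crossover probability 0.1, a single codeword, and $r$ uniform on the four outputs.

```python
import itertools
import math

# Toy example (my own, for illustration): a BSC(0.1) used twice,
# one codeword u = (0, 0), r uniform on {0,1}^2. We enumerate the
# outputs z with log(P(z|u)/r(z)) >= theta, i.e. z in B_u(theta, r).

p = 0.1  # crossover probability (assumed for illustration)

def channel_prob(z, u):
    """P(z | u) for independent uses of a BSC(p)."""
    return math.prod(p if zi != ui else 1 - p for zi, ui in zip(z, u))

u = (0, 0)
r = 0.25   # uniform distribution on the 4 outputs
theta = 0.5

B = [z for z in itertools.product([0, 1], repeat=2)
     if math.log(channel_prob(z, u) / r) >= theta]

print(B)  # only (0, 0) clears the likelihood-ratio threshold e^theta
```

Only the output equal to the codeword has a large enough likelihood ratio here; flipping even one bit drops $P(\mathbf{z}|\mathbf{u})/r(\mathbf{z})$ below $e^{\theta}$.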
The first thing to show is a packing lemma.

Lemma. Suppose that for a code with $M$ codewords and maximal error $\lambda$ there exist a distribution $r$ and positive numbers $\theta_1, \theta_2, \ldots, \theta_M$ such that
$\max_{i} P\left( B_{\mathbf{u}_i}(\theta_i, r) \mid \mathbf{u}_i \right) < \kappa$.
Then
$M < \frac{1}{1 - \lambda - \kappa} \exp\left( \frac{1}{M} \sum_{i=1}^{M} \theta_i \right)$

This Lemma can be used, with an appropriate choice of $r(\mathbf{z})$, to show a converse for non-stationary DMCs:
$W(\mathbf{z} | \mathbf{u} ) = \prod_{t=1}^{n} W_t(z_t | u_t)$.

The first question is: what $r$ should we choose? One choice is to put the uniform distribution on the codewords $\{\mathbf{u}_i\}$ and then just set
$r(\mathbf{z}) = \frac{1}{M} \sum_{i=1}^{M} W(\mathbf{z} | \mathbf{u}_i)$
Ahlswede calls this the “Fano-distribution,” and remarks that it suffices for a weak converse. A better choice, used by Augustin, is the output distribution induced by the product of the per-letter marginals of the uniform distribution on the codebook. Phew! That is a mouthful. So first take the marginal distribution of each letter of the input:
$p_t(u) = \frac{1}{M} \sum_{i=1}^{M} 1( u_{i,t} = u )$
then take the product of these:
$p^n(\mathbf{u}) = \prod_{t=1}^{n} p_t(u_t)$
And then look at the corresponding output distribution:
$r^n(\mathbf{z}) = \prod_{t=1}^{n} \left( \frac{1}{M} \sum_{i=1}^{M} W_t(z_t | u_{i,t}) \right)$.
Ahlswede calls the input distribution in this case the “Fano*-distribution.” The intuition about this choice is that it factors into independent components (so we can split things into a sum of mutual informations), but the distribution is still related to the codebook rather than just the channel.
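The construction is easy to carry out mechanically. Here is a sketch with a toy codebook of my own (three binary codewords of length two and a single time-invariant channel $W$, assumed just to keep things short; the paper allows a different $W_t$ per letter):

```python
import numpy as np

# Sketch (toy numbers of my own): build the Fano*-output distribution r^n
# as the product of per-letter output marginals induced by the codebook.

codebook = np.array([[0, 1], [1, 1], [0, 0]])  # M = 3 codewords, n = 2
M, n = codebook.shape
K = 2  # input/output alphabet size

# Per-letter empirical input marginals p_t(u) = (1/M) sum_i 1(u_{i,t} = u)
p = np.array([np.bincount(codebook[:, t], minlength=K) / M for t in range(n)])

# A single channel matrix W[u, z] = W(z | u), assumed for illustration
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# Per-letter output marginals r_t(z) = sum_u p_t(u) W(z | u)
r_t = p @ W  # r_t[t, z]

def r_n(z):
    """Product distribution r^n(z) = prod_t r_t(z_t)."""
    return float(np.prod([r_t[t, z[t]] for t in range(n)]))

print(p)          # per-letter marginals p_t
print(r_n((0, 1)))
```

Note that $r^n$ is a product distribution even though the uniform distribution on the codebook itself is not: that factorization is exactly what lets the information quantities split into per-letter sums.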

The next thing is to choose an appropriate $\theta_i$. The way to understand the right choice is to look at the set $B_{\mathbf{u}_i}(\theta_i, r^n)$, which corresponds to an event involving the (random) output $\mathbf{Z}$ of the channel. The condition of the Lemma says that the probability of this event under input $\mathbf{u}_i$ should be less than $\kappa$. Taking logs on both sides, the event $\mathbf{Z} \in B_{\mathbf{u}_i}(\theta_i, r^n)$ occurs exactly when
$\log \frac{P(\mathbf{Z} | \mathbf{u}_i )}{ r^n(\mathbf{Z})} \ge \theta_i$
So we treat $A_i = \log \frac{P(\mathbf{Z} | \mathbf{u}_i )}{ r^n(\mathbf{Z})}$ as a random variable and set
$\theta_i = \mathbb{E}[ A_i ] + ( \frac{2}{1 - \lambda} \mathrm{Var}[A_i] )^{1/2}$.
With this choice we can just apply Chebyshev’s inequality to get that $\kappa = \frac{1 - \lambda}{2}$ in the Lemma.
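This Chebyshev step is easy to check numerically. A sketch (the stand-in distribution for $A_i$ below is my own choice, not anything from the paper; only the mean and variance matter for the bound):

```python
import numpy as np

# Sketch: verify empirically that with
#   theta = E[A] + sqrt(2 Var[A] / (1 - lambda)),
# Chebyshev gives P(A >= theta) <= (1 - lambda)/2, so kappa = (1 - lambda)/2.

rng = np.random.default_rng(0)

lam = 0.1                            # maximal error probability (assumed)
A = rng.normal(5.0, 2.0, 200_000)    # stand-in for the information density A_i

theta = A.mean() + np.sqrt(2 * A.var() / (1 - lam))
kappa_emp = np.mean(A >= theta)

print(kappa_emp, (1 - lam) / 2)
```

Chebyshev only guarantees $\kappa_{\mathrm{emp}} \le (1-\lambda)/2$; for a well-behaved distribution like this one the actual tail probability is far smaller, which is why the lemma's condition is comfortably met.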

Applying the Lemma and indulging in some bounding-ology yields the point-to-point result.

For a code of blocklength $n$ with $M$ messages and maximal error probability $\lambda$ for the nonstationary DMC:
$\log M \le \sum_{t=1}^{n} I(X_t ; Z_t) + \left(\frac{2}{1 - \lambda} \cdot 3 |\mathcal{X}|\, n\right)^{1/2} + \log \frac{2}{1 - \lambda}$,
where the distributions of the random variables are given by the Fano-distribution on the codewords.

Note that the mutual informations are computed under the Fano-distribution on the codewords; the Fano*-distribution was only used for $r^n$ when applying the Lemma. I’ll pick up the next section of the argument later.
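As a sanity check on the main term of the bound, here is a sketch that evaluates $\sum_t I(X_t; Z_t)$ under the Fano-distribution for a toy codebook (the codebook and channel are my own made-up numbers, reusing a single $W$ for every letter):

```python
import numpy as np

# Sketch (toy numbers of my own): compute the sum of per-letter mutual
# informations I(X_t; Z_t), with X_t the t-th letter of a uniformly drawn
# codeword (the Fano-distribution) and Z_t the channel output.

codebook = np.array([[0, 1], [1, 1], [0, 0]])  # M = 3, n = 2 (assumed)
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])                     # W[x, z] = W(z | x) (assumed)
M, n = codebook.shape

def mutual_information(p_x, W):
    """I(X; Z) in nats for input distribution p_x and channel matrix W."""
    joint = p_x[:, None] * W          # joint[x, z] = p(x) W(z | x)
    p_z = joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] *
                        np.log(joint[mask] / np.outer(p_x, p_z)[mask])))

total = 0.0
for t in range(n):
    p_t = np.bincount(codebook[:, t], minlength=2) / M
    total += mutual_information(p_t, W)

print(total)  # main term of the converse bound, in nats
```

For an actual converse statement one would compare $\log M$ against this sum plus the $O(\sqrt{n})$ slack terms; the point of the sketch is just that the main term is a mechanical computation from the codebook's per-letter marginals.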

## 8 thoughts on “The strong converse for the MAC”

1. Hari P. says:

Great topic for a writeup Anand! One thing that always made me uncomfortable about results like the lemma here and others in Csiszar & Korner: what’s the intuition (or even the technical reason) for the $1 - \lambda - \kappa$?

Also, interesting how this result for non-stationary DMCs connects to the information-spectrum work 20 years later. The idea of using this Fano*-distribution was something interesting that I saw first in the information-spectrum stuff.

• Ha! I knew you would find it interesting. More comments later…

• The $1 - \lambda - \kappa$ shows up in the proof where you look at the part of the decoding set $D_i$ for message $i$ that is not part of the $B$ set defined above. Since the probability of successful decoding is $1 - \lambda$, the probability of getting an output that is in $D_i$ but not in $B_{\mathbf{u}_i}(\theta_i, r)$ is at least $1 - \lambda - \kappa$.

We want to get a lower bound on the mass of such outputs under the $r$ distribution.
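Spelling out the rest of that computation (my paraphrase of the proof sketch):

```latex
% Outside B_{u_i}(\theta_i, r) the likelihood ratio is below e^{\theta_i},
% so r(z) > e^{-\theta_i} P(z | u_i) for every such output. Hence
r(D_i \setminus B_{\mathbf{u}_i}) \;\ge\; e^{-\theta_i}\,
  P\big(D_i \setminus B_{\mathbf{u}_i} \,\big|\, \mathbf{u}_i\big)
  \;\ge\; e^{-\theta_i} (1 - \lambda - \kappa).
% The decoding sets are disjoint, so summing over i and applying the
% arithmetic-geometric mean inequality to the e^{-\theta_i}:
1 \;\ge\; (1 - \lambda - \kappa) \sum_{i=1}^{M} e^{-\theta_i}
  \;\ge\; (1 - \lambda - \kappa)\, M
  \exp\Big(-\frac{1}{M}\sum_{i=1}^{M} \theta_i\Big),
% which rearranges to the bound on M in the Lemma.
```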

2. Marcel Fernandez says:

Nice post! There are some things I can’t understand about that paper. But I’ll wait for the next posts, in case they are answered there. 🙂

• Ah but you will have to wait a few days for that — I keep saying I’m going to do it but then I get lazy…
