The strong converse for the MAC, part 2

Last time we saw that a packing lemma, together with a good choice of reference output distribution r (the Fano-* distribution), could give a converse for a nonstationary DMC:

\log M \le \sum_{t=1}^{n} I(X_t ; Z_t) + \left( \frac{2}{1 - \lambda} \cdot 3 |\mathcal{X}| n \right)^{1/2} + \log \frac{2}{1 - \lambda}

where the distribution on (X^n, Z^n) is given by the Fano distribution on the codebook, i.e. the uniform distribution on the codewords followed by the channel. The next step in the argument is to apply this result to the MAC with inputs X and Y.
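To keep the objects straight, here is a small numerical sketch — the channel and codebook are toy examples I made up, not anything from the post — of the Fano distribution and of the dominant term in this bound: put the uniform distribution on the codewords, apply the time-t channel to the t-th letter, and add up the per-letter mutual informations.

```python
import numpy as np

def mutual_information(pxz):
    """I(X;Z) in nats for a joint pmf given as a 2-D array."""
    px = pxz.sum(axis=1, keepdims=True)
    pz = pxz.sum(axis=0, keepdims=True)
    mask = pxz > 0
    return float(np.sum(pxz[mask] * np.log(pxz[mask] / (px @ pz)[mask])))

rng = np.random.default_rng(0)
n, n_x, n_z = 4, 2, 3                                   # toy block length and alphabet sizes
channels = rng.dirichlet(np.ones(n_z), size=(n, n_x))   # W_t(z|x) = channels[t, x, z]
codebook = rng.integers(0, n_x, size=(5, n))            # M = 5 codewords of length n

# Fano distribution: X^n uniform on the codebook, Z^n obtained by applying W_t at time t.
bound = 0.0
for t in range(n):
    pxz = np.zeros((n_x, n_z))
    for codeword in codebook:
        pxz[codeword[t]] += channels[t, codeword[t]] / len(codebook)
    bound += mutual_information(pxz)

print("sum_t I(X_t; Z_t) =", bound)   # dominant term in the converse bound on log M
```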

First, start with a code for the MAC — M codewords \{\mathbf{u}_i\} for user 1, N codewords \{\mathbf{v}_j\} for user 2, and decoding regions \{D_{ij} \subset \mathcal{Z}^n\} for each message pair (i,j). Suppose the average probability of error is at most \bar{\lambda}:

\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} W(D_{ij} | \mathbf{u}_i, \mathbf{v}_j) \ge 1 - \bar{\lambda}

If we fix a codeword \mathbf{v}_j for user 2, then coding for user 1 is like coding over a nonstationary DMC, and vice versa. It turns out that we can’t apply the result directly, though; instead we want to consider only those pairs (i,j) whose error probability is not too large. That is, define the set

\mathcal{A} = \left\{ (i,j) : W(D_{ij} | \mathbf{u}_i, \mathbf{v}_j) \ge \frac{1}{2} (1 - \bar{\lambda}) \right\}.
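Here is a tiny sketch of this pruning step, with placeholder success probabilities (nothing here is from the actual proof): keep the pairs that decode correctly with probability at least half of 1 - \bar{\lambda}, and note that a constant fraction of pairs survives.

```python
import numpy as np

M, N, lambda_bar = 8, 6, 0.1

# success[i, j] stands in for W(D_ij | u_i, v_j), the probability that the
# decoder outputs (i, j) when (u_i, v_j) is sent. Placeholder values: most
# pairs decode well, a few are bad, and the average error is still <= lambda_bar.
success = np.full((M, N), 0.97)
success[0, :3] = 0.3
assert success.mean() >= 1 - lambda_bar

# Keep the pairs whose individual success probability is at least (1 - lambda_bar)/2.
A = [(i, j) for i in range(M) for j in range(N)
     if success[i, j] >= 0.5 * (1 - lambda_bar)]

# A simple counting argument shows |A| >= MN (1 - lambda_bar)/(1 + lambda_bar),
# so the pruned codebook still contains a constant fraction of all pairs.
print(len(A), "of", M * N, "pairs kept")   # 45 of 48
```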

Now, by looking at the uniform distribution on pairs in \mathcal{A}, we can apply the previous result to obtain the following lemma.

Lemma. An (n, M, N, \bar{\lambda}) code for the MAC satisfies, for some constant c(\bar{\lambda}),

\log M \le \sum_{t=1}^{n} I(X_t ; Z_t | Y_t) + c(\bar{\lambda})\sqrt{n} \\
\log N \le \sum_{t=1}^{n} I(Y_t ; Z_t | X_t) + c(\bar{\lambda})\sqrt{n} \\
\log MN \le \sum_{t=1}^{n} I(X_t, Y_t ; Z_t) + c(\bar{\lambda})\sqrt{n}

where the distribution of (X^n, Y^n, Z^n) is given by the uniform distribution on the codeword pairs in \mathcal{A} passed through the DM-MAC W(z | x,y).
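To make the lemma’s distribution concrete, here is a small sketch with toy alphabets and made-up codebooks and channel (none of it from the post): for each t, form the joint pmf of (X_t, Y_t, Z_t) induced by the uniform distribution on pairs in \mathcal{A} and the MAC, and evaluate the three single-letter sums.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a pmf given as an array of any shape."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(2)
n, n_x, n_y, n_z = 4, 2, 2, 3
W = rng.dirichlet(np.ones(n_z), size=(n_x, n_y))   # DM-MAC: W[x, y, :] is W(. | x, y)
U = rng.integers(0, n_x, size=(5, n))              # user-1 codewords u_i
V = rng.integers(0, n_y, size=(4, n))              # user-2 codewords v_j
A = [(i, j) for i in range(5) for j in range(4)]   # pretend every pair survived pruning

sum_x = sum_y = sum_xy = 0.0
for t in range(n):
    # Joint pmf of (X_t, Y_t, Z_t): uniform over pairs in A, then the MAC.
    p = np.zeros((n_x, n_y, n_z))
    for (i, j) in A:
        p[U[i, t], V[j, t]] += W[U[i, t], V[j, t]] / len(A)
    Hxyz = entropy(p)
    Hxy, Hyz, Hxz = entropy(p.sum(2)), entropy(p.sum(0)), entropy(p.sum(1))
    Hx, Hy, Hz = entropy(p.sum((1, 2))), entropy(p.sum((0, 2))), entropy(p.sum((0, 1)))
    sum_x  += Hxy + Hyz - Hy - Hxyz    # I(X_t ; Z_t | Y_t)
    sum_y  += Hxy + Hxz - Hx - Hxyz    # I(Y_t ; Z_t | X_t)
    sum_xy += Hxy + Hz  - Hxyz         # I(X_t, Y_t ; Z_t)

# The right-hand sides of the lemma, up to the c(lambda_bar) * sqrt(n) slack terms.
print(sum_x, sum_y, sum_xy)
```

Note that since \mathcal{A} need not be a product set, the X_t and Y_t induced this way are in general correlated — which is exactly the trouble discussed next.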

So it looks like we’re done, right? Just single-letterize and that’s it! Unfortunately, what we don’t have here is that X^n and Y^n are independent. That’s because the picture looks like this:

[Figure: the set of codeword pairs \mathcal{A} used in the converse is not a product set.]

That is, \mathcal{A} is not a product set, so the uniform distribution on \mathcal{A} doesn’t factor into independent distributions on the codewords for users 1 and 2. So can we find a large product set contained in \mathcal{A} and get a converse that way? It turns out this is hard, and it is why Ahlswede (and before him, Dueck) needed wringing techniques. These results (saved for next time) basically say something like this: “given some random vectors whose mutual information is not too big, we can find some entries and fixed values for those entries such that the vectors are even less dependent when conditioned on those fixed values. Furthermore, the probability that those entries take those fixed values is not too small.” That is, we can “locate” the dependence in the random vectors and squeeze it out by conditioning on fixed values.

The rest of the proof strategy goes like this: use wringing techniques to find a subcode for which (X^n, Y^n) are almost independent. Then, if the same rate bounds hold for the subcode, we can use the continuity of mutual information to approximate the distribution by a product distribution without changing the rates by too much, which shows it is sufficient to consider independent (X^n, Y^n).
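Here is a tiny numerical illustration of that continuity step (the distributions below are made up): a joint pmf within total variation \epsilon of a product distribution has small mutual information, vanishing with \epsilon, so passing from an almost-independent input distribution to a genuinely independent one costs very little.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(3)
px, py = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
product = np.outer(px, py)     # independent (X, Y): mutual information is zero
dependent = np.diag(px)        # X = Y: a maximally dependent joint pmf

# Mix a little dependence into the product distribution and watch I(X;Y) shrink
# along with the total variation distance to the product distribution.
for eps in [0.1, 0.01, 0.001]:
    P = (1 - eps) * product + eps * dependent
    tv = 0.5 * np.abs(P - product).sum()
    print(f"eps={eps}: d_TV(P, product) = {tv:.4f}, I_P(X;Y) = {mutual_information(P):.5f}")
```

The same kind of continuity estimate is what transfers the rate bounds from the wrung-out, almost-independent distribution to an honest product distribution.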

This whole approach is confusing because a priori you know that the inputs have to be independent. But because this is a converse bound, the random variables that appear in the mutual information expressions don’t necessarily have the operational significance we might expect. In the next post on this topic I’ll describe one of the two wringing lemmas, and after that I’ll try to tie up the proof.

8 thoughts on “The strong converse for the MAC, part 2”

  1. I understand (or I think I do) that $X^n$ and $Y^n$ are not independent … but what I don’t understand is why it is enough for them to be “almost” independent rather than totally independent …

    • If they are “almost” independent then you can approximate the distribution with a product distribution. Let (X^n, Y^n) \sim P be the almost-independent distribution and let (\hat{X}^n, \hat{Y}^n) \sim Q be independent such that d_{TV}(P,Q) (the total variation distance) is small (in fact vanishing with n). Then we can approximate \sum I(X_t ; Y_t) with \sum I(\hat{X}_t; \hat{Y}_t). At least that’s the idea.

  2. By the way, I came across this awesome paper by Rudolf Ahlswede through this other nice paper:

    “On the Fingerprinting Capacity Under the Marking Assumption” N. Prasanth Anthapadmanabhan, Alexander Barg, and Ilya Dumer. IEEE Transactions on Information Theory 54(6): 2678-2689 (2008)

    where the proof of Theorem 5.1 relies heavily on Ahlswede’s paper, but in an entirely different setting 🙂

