Last time we saw that a packing lemma, together with a good choice of reference output distribution (the Fano-* distribution), could give a converse for a nonstationary DMC $\{W_t\}$: an $(n, M, \bar{\epsilon})$ code satisfies

$$\log M \le \sum_{t=1}^{n} I(X_t; Y_t) + O(\sqrt{n}),$$

where the distribution on $(X_t, Y_t)$ is given by the Fano distribution on the codebook, i.e. the uniform input distribution on the codewords followed by applying the channel. The next step in the argument is to apply this result to the MAC with inputs $X$ and $Y$.
First, start with an $(n, M_1, M_2, \bar{\epsilon})$ code for the MAC: codewords $\{\mathbf{x}_i : i \in [M_1]\}$ for user 1, codewords $\{\mathbf{y}_j : j \in [M_2]\}$ for user 2, and decoding regions $\{D_{ij}\}$ for each message pair $(i,j)$. Suppose the average error is bounded by $\bar{\epsilon}$:

$$\frac{1}{M_1 M_2} \sum_{i=1}^{M_1} \sum_{j=1}^{M_2} W^n\left( D_{ij}^c \mid \mathbf{x}_i, \mathbf{y}_j \right) \le \bar{\epsilon}.$$
If we think about a fixed codeword $\mathbf{y}_j$ for user 2, then coding for user 1 is like coding over a nonstationary DMC, and vice versa. It turns out that we can't apply the result directly, though; instead we want to consider only those pairs $(i,j)$ for which $(\mathbf{x}_i, \mathbf{y}_j)$ doesn't have too high an error probability. That is, for a constant $c > 1$, define the set

$$B = \left\{ (i,j) : W^n\left( D_{ij}^c \mid \mathbf{x}_i, \mathbf{y}_j \right) \le c \bar{\epsilon} \right\};$$

by Markov's inequality, $|B| \ge (1 - 1/c) M_1 M_2$, so $B$ contains most of the message pairs.
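As a quick sanity check on the selection step, here is a toy numeric sketch. The error probabilities below are synthetic random numbers standing in for $W^n(D_{ij}^c \mid \mathbf{x}_i, \mathbf{y}_j)$, and the threshold constant is my own choice; the point is just that keeping pairs with error at most $c$ times the average retains at least a $(1 - 1/c)$ fraction of pairs, by Markov's inequality.

```python
import random

random.seed(0)
M1, M2 = 40, 50

# synthetic per-pair error probabilities (stand-ins for W^n(D_ij^c | x_i, y_j))
err = [[random.random() * 0.2 for _ in range(M2)] for _ in range(M1)]

# average error of the full code
eps_bar = sum(sum(row) for row in err) / (M1 * M2)

c = 2.0  # keep pairs whose error is at most c * eps_bar
B = {(i, j) for i in range(M1) for j in range(M2) if err[i][j] <= c * eps_bar}

# Markov's inequality guarantees |B| >= (1 - 1/c) * M1 * M2
print(len(B), (1 - 1 / c) * M1 * M2)
```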
Now, by looking at the uniform distribution on pairs in $B$, we can apply the previous result to get the following.
Lemma. An $(n, M_1, M_2, \bar{\epsilon})$ code for the MAC satisfies, for some constant $c' > 0$ depending only on $\bar{\epsilon}$:

$$\begin{aligned} \log M_1 &\le \sum_{t=1}^{n} I(X_t; Z_t \mid Y_t) + c' \sqrt{n}, \\ \log M_2 &\le \sum_{t=1}^{n} I(Y_t; Z_t \mid X_t) + c' \sqrt{n}, \\ \log M_1 M_2 &\le \sum_{t=1}^{n} I(X_t, Y_t; Z_t) + c' \sqrt{n}, \end{aligned}$$

where the distribution of $(X_t, Y_t, Z_t)$ is given by the uniform distribution on the codeword pairs in $B$ passed through the DM-MAC $W$.
So it looks like we're done, right? Just pass to single-letterization and that's it! Unfortunately, what we don't have here is that $X_t$ and $Y_t$ are independent. That's because $B$ is in general an irregular subset of the $[M_1] \times [M_2]$ grid of message pairs: it is not a product set, so the uniform distribution on $B$ doesn't factor into independent distributions on the codewords for users 1 and 2. So can we find a large product set contained in $B$ and get a converse that way? It turns out this is hard. This is why Ahlswede (and before him, Dueck) needed wringing techniques. These results (saved for next time) basically say something like this: "given some random vectors whose mutual information is not too big, we can find some entries, and fixed values for those entries, such that the vectors are even less dependent when conditioned on those fixed values. Furthermore, the probability that those entries take those fixed values is not too small." That is, we can "locate" the dependence in the random vectors and squeeze it out by conditioning on fixed values.
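To see the failure of factorization concretely, here is a toy example with a made-up three-point set: the uniform distribution on a non-product set puts zero mass on some pairs that the product of its marginals does not.

```python
from itertools import product

# B is a non-product subset of {0,1} x {0,1}: it is missing the pair (1, 1)
B = {(0, 0), (0, 1), (1, 0)}
p = {b: 1 / len(B) for b in B}  # uniform distribution on B

# marginals of the uniform distribution on B
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in {a for a, _ in B}}
py = {y: sum(v for (_, b), v in p.items() if b == y) for y in {b for _, b in B}}

# the product of marginals puts mass px[1]*py[1] = 1/9 on (1, 1), which p does not
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}
factorizes = all(abs(joint[xy] - p.get(xy, 0.0)) < 1e-12 for xy in joint)
print(factorizes)  # False: uniform on a non-product set is not a product distribution
```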
The rest of the proof strategy goes like this: use wringing techniques to find a subcode such that $X^n$ and $Y^n$ are almost independent. Then, if we can get the same rate bounds for this subcode, we can use the continuity of mutual information to approximate the joint distribution of $(X_t, Y_t)$ with a product distribution without affecting the rates by too much, which shows it is sufficient to consider independent $(X_t, Y_t)$.
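The continuity step can be illustrated numerically. Below is a minimal sketch (the $2 \times 2$ joint distribution and the perturbation size are made up for illustration): a joint distribution that is close in total variation to a product distribution has small mutual information, so replacing it with the product of its marginals costs little in the rate bounds.

```python
from math import log2

def mutual_information(p):
    """I(X;Y) in bits for a joint pmf given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), v in p.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * log2(v / (px[x] * py[y])) for (x, y), v in p.items() if v > 0)

# a joint pmf that is a small perturbation of the uniform product pmf on {0,1}^2
delta = 0.01
q = {(0, 0): 0.25 + delta, (0, 1): 0.25 - delta,
     (1, 0): 0.25 - delta, (1, 1): 0.25 + delta}
print(mutual_information(q))  # close to 0 (it is exactly 0 at delta = 0)
```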
This whole approach is confusing, because a priori you know that the inputs have to be independent. But because this is a converse bound, the random variables that appear in the mutual information expressions don't necessarily have the operational significance we think they do. In the next post on this topic I'll describe one of the two wringing lemmas, and then I'll try to tie up the proof after that.