The strong converse for the MAC, part 3

This post was written using Luca’s LATEX2WP tool.

The topic of this (late) post is “wringing” techniques, which are a way of converting a statement like “{ P(x^n)} and { Q(x^n)} are close” into a statement about per-letter probabilities. Such a statement looks like “we can find some indices and values at those indices such that if we condition on those values, the marginals of { P} and { Q} are close.” I’m going to skip straight to the simpler of the two “Wringing Lemmas” in this paper and sketch its proof. Then I’ll go back and state the first one (without proof). There are some small typos in the original which I will try to correct here.

Lemma 1 Let { P} and { Q} be probability distributions on { \mathcal{X}^n} such that for a positive constant { c},

\displaystyle  P(x^n) \le (1 + c) Q(x^n) \qquad \forall x^n \in \mathcal{X}^n \ \ \ \ \ (1)

then for any { 0 < \gamma < c} and {0 \le \epsilon < 1} there exist { t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}}, where

\displaystyle  	0 \le k \le c/\gamma

such that for some { \bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}},

\displaystyle  	P(x_t | \bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}) 		\le \max\{ (1 + \gamma) Q(x_t | \bar{x}_{t_1}, 			\bar{x}_{t_2}, \ldots, \bar{x}_{t_k}), \epsilon \}

for all { x_t \in \mathcal{X}} and all { t = 1, 2, \ldots, n}. Furthermore,

\displaystyle  	P(\bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}) \ge \epsilon^{k}.

The proof works pretty mechanically. Fix a {\gamma} and {\epsilon}. Suppose that the conclusion doesn’t hold for {k = 0}. Then there is a {t_1} and {\bar{x}_{t_1}} such that

\displaystyle  (1 + c) Q(\bar{x}_{t_1}) \ge P(\bar{x}_{t_1}) > \max\{ (1 + \gamma) Q(\bar{x}_{t_1}), \epsilon \},

where the first bound is from the assumption and the second because the conclusion doesn’t hold. Then we have {P(\bar{x}_{t_1}) > \epsilon}. Now dividing (1) by {P(\bar{x}_{t_1})} and using the lower bound on {P(\bar{x}_{t_1})} we get for all {x^n \in \mathcal{X}^n},

\displaystyle  	P(x^n | \bar{x}_{t_1}) \le \frac{1 + c}{1 + \gamma} Q(x^n | \bar{x}_{t_1}) 	 	\ \ \ \ \ (2)
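
To spell out the step behind (2), here is the chain of inequalities as I read it, for any {x^n} whose {t_1}-th coordinate equals {\bar{x}_{t_1}} (for every other {x^n} both sides of (2) are zero):

\displaystyle  P(x^n | \bar{x}_{t_1}) = \frac{P(x^n)}{P(\bar{x}_{t_1})} \le \frac{(1 + c) Q(x^n)}{P(\bar{x}_{t_1})} \le \frac{(1 + c) Q(x^n)}{(1 + \gamma) Q(\bar{x}_{t_1})} = \frac{1 + c}{1 + \gamma} Q(x^n | \bar{x}_{t_1}).

The first inequality is the assumed bound (1) and the second uses {P(\bar{x}_{t_1}) \ge (1 + \gamma) Q(\bar{x}_{t_1})}. Note that {Q(\bar{x}_{t_1}) > 0}, because {(1 + c) Q(\bar{x}_{t_1}) \ge P(\bar{x}_{t_1}) > \epsilon \ge 0}.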

Now, either the conclusion is true or we use (2) in place of (1) to find a {t_2} and a {\bar{x}_{t_2}} such that

\displaystyle  	P(x^n | \bar{x}_{t_1},\bar{x}_{t_2}) \le \frac{1 + c}{(1 + \gamma)^2} Q(x^n | \bar{x}_{t_1},\bar{x}_{t_2}) 	 	\ \ \ \ \ (3)

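We can keep going in this way, picking up another factor of {(1 + \gamma)} in the denominator at each step, until the conclusion holds. After {k} steps the bound reads {P(x^n | \bar{x}_{t_1}, \ldots, \bar{x}_{t_k}) \le \frac{1 + c}{(1 + \gamma)^k} Q(x^n | \bar{x}_{t_1}, \ldots, \bar{x}_{t_k})}. Both sides are probability distributions, so summing over {x^n} forces {(1 + \gamma)^k \le 1 + c}. Since {(1 + \gamma)^k \ge 1 + k \gamma}, this yields the condition {k \le c/\gamma}. The lower bound on {P(\bar{x}_{t_1}, \ldots, \bar{x}_{t_k})} follows because each step conditions on a value whose conditional probability exceeds {\epsilon}.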
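
The proof is really a greedy procedure, so here is a minimal Python sketch of it on a toy example. This is my own illustration, not code from the paper: the function names (`event_prob`, `cond_marginal`, `find_violation`, `wring`), the dictionary representation of the distributions, and the example numbers are all assumptions made for the sake of the sketch. Coordinates are 0-indexed.

```python
from itertools import product

def event_prob(dist, fixed):
    """Probability that every coordinate in `fixed` takes its pinned value."""
    return sum(p for xs, p in dist.items()
               if all(xs[i] == v for i, v in fixed.items()))

def cond_marginal(dist, t, fixed):
    """Marginal of coordinate t, conditioned on the pinned coordinates.
    Assumes the conditioning event has positive probability, which the
    hypothesis P <= (1 + c) Q guarantees for Q whenever it holds for P."""
    total = event_prob(dist, fixed)
    mass = {}
    for xs, p in dist.items():
        if all(xs[i] == v for i, v in fixed.items()):
            mass[xs[t]] = mass.get(xs[t], 0.0) + p / total
    return mass

def find_violation(P, Q, n, fixed, gamma, eps):
    """Return (t, value) where the per-letter bound fails, or None."""
    for t in range(n):
        if t in fixed:
            continue  # a pinned coordinate is a point mass under both P and Q
        pm = cond_marginal(P, t, fixed)
        qm = cond_marginal(Q, t, fixed)
        for a, p in pm.items():
            if p > max((1 + gamma) * qm.get(a, 0.0), eps):
                return t, a
    return None

def wring(P, Q, n, gamma, eps):
    """Greedily pin coordinates until every conditional marginal of P is
    at most max((1 + gamma) * Q-marginal, eps)."""
    fixed = {}
    while True:
        hit = find_violation(P, Q, n, fixed, gamma, eps)
        if hit is None:
            return fixed
        fixed[hit[0]] = hit[1]

# Toy example (hypothetical numbers): Q is uniform on {0,1}^3, while under P
# the first coordinate is biased towards 0 and the other two are uniform.
n = 3
Q = {xs: 1 / 8 for xs in product((0, 1), repeat=n)}
P = {xs: (0.9 if xs[0] == 0 else 0.1) / 4 for xs in product((0, 1), repeat=n)}
c, gamma, eps = 0.8, 0.25, 0.1   # max ratio P/Q is 0.225 / 0.125 = 1 + c

fixed = wring(P, Q, n, gamma, eps)
prob = event_prob(P, fixed)
print(fixed, prob)
```

In this example the only offending marginal is that of the first coordinate, so the procedure pins it to 0 and stops with {k = 1}, comfortably within {c/\gamma = 3.2}, and the pinned event has probability about 0.9, which is at least {\epsilon^k}.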

Rather than describe how to apply this to our coding problem, I want to contrast this rather simple statement to the other wringing lemma in the paper, which is a generalization of the lemma developed by Dueck. This one is more “information theoretic” in the sense that it says if the block mutual information is small, then we can find positions to condition on such that the per-letter mutual informations are small too.

Lemma 2 Let {X^n} and {Y^n} be {n}-tuples of random variables from discrete finite alphabets {\mathcal{X}^n} and {\mathcal{Y}^n} and assume that

\displaystyle  	I(X^n; Y^n) \le \sigma.

Then for any { 0 < \delta < \sigma} there exist { t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}}, where

\displaystyle  	0 \le k \le 2 \sigma/\delta

such that for some {\{ (\bar{x}_{t_i}, \bar{y}_{t_i}) : i = 1, 2, \ldots, k\}},

\displaystyle  I(X_t ; Y_t ~|~ (X_{t_i},Y_{t_i}) = (\bar{x}_{t_i}, \bar{y}_{t_i}), i = 1, 2, \ldots, k) \le \delta

for all { t = 1, 2, \ldots, n}. Furthermore,

\displaystyle  \mathbb{P}((X_{t_i},Y_{t_i}) = (\bar{x}_{t_i}, \bar{y}_{t_i}), i = 1, 2, \ldots, k) \ge \left( \frac{\delta}{|\mathcal{X}||\mathcal{Y}| (2 \sigma - \delta)} \right)^{k}.

This lemma is proved in basically the same way as the previous lemma, but it’s a bit messier. Dueck’s original version was similar but gave a bound on the conditional mutual information, rather than this bound where we condition on particular values. This way of taking a small mutual information and getting small per-letter mutual informations can be useful outside of this particular problem, I imagine.

Next time I’ll show how to apply the first lemma to our codebooks in the converse for the MAC.


4 thoughts on “The strong converse for the MAC, part 3”

  1. If you had to give a name to these techniques other than “wringing”, which one would you choose? …

    I thought that for a given code, “wringing” referred to finding a maximal error subcode with the same rates such that it satisfied expressions (4.1) and (4.2) …

    • I think of it as a bit more general than that — you are “wringing” the dependence out of the two sequences/distributions by conditioning on some selected values. It’s a way of taking global correlation and squeezing/wringing it into local correlation bounds.

      Most of the applications seem to be in finding subcodes (which is how I think about it), but there should be other situations in which you want to do things like this. This is why Lemma 4 (= Lemma 1 above) looks more broadly applicable than Lemma 3 (= Lemma 2 above): it doesn’t explicitly deal with mutual information expressions.

