Banff blog

I figured I would blog about this week’s workshop at Banff in a more timely fashion. Due to the scheduling of flights out of Calgary, I will have to miss the last day of talks. The topics of people’s presentations varied rather widely, and many were not about the sort of Good-Turing estimator setup. Sometimes it was a bit hard to see how the problems or approaches were related (not that they had to be directly related), and since the crowd had widely varying backgrounds, presenters had a hard time: the audience had to absorb a new set of notation or a new approach for every talk. The advantage is that there were lots of questions; the disadvantage is that people insisted on “finishing” their presentations. By mid-week my brain was over-full, and a Wednesday afternoon hike up Sulphur Mountain was the perfect solution.

The view from Sulphur Mountain


Continue reading

Banfffffffffffffff

I’ve just arrived in chilly but beautiful Banff for a workshop on Information theory and statistics for large alphabets. I’m looking forward to it, although I will have to miss the last day due to the timing of flights out of Calgary that get me to Chicago before midnight. My itineraries there and back seem especially perverse: ORD-SEA-YYC and YYC-SFO-ORD. However, thanks to the new gig I have a new laptop with a functional battery, so I am doing a bit more busy-work and less New Yorker reading on the plane. I might try to write a bit more about the topics in the workshop. Although the topic seems focused, there is a wide range of approaches and angles to take on the problem of estimating probabilities/prevalences in situations where you may not get to see each outcome even once. Certainly I hope I can get the journal version of a paper from last year’s Allerton squared away.
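As a toy illustration of the kind of problem (my own example, not anything from the workshop program): with a large alphabet and comparatively few samples, the empirical distribution assigns zero probability to everything unseen, while the Good-Turing estimate of the missing mass (the number of symbols seen exactly once, divided by the sample size) tracks it quite well.

import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
alphabet_size = 10000
n = 500
sample = rng.integers(0, alphabet_size, size=n)   # uniform source, far fewer samples than symbols

counts = Counter(sample.tolist())
singletons = sum(1 for c in counts.values() if c == 1)

good_turing_missing_mass = singletons / n
true_missing_mass = 1 - len(counts) / alphabet_size   # mass on symbols never observed (uniform source)

print("Good-Turing estimate of missing mass:", good_turing_missing_mass)
print("True missing mass:                   ", true_missing_mass)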

Allerton 2011

Despite Yury’s attempts to get me to “stop blogging,” here is my much-delayed recap of Allerton. Because of moving to TTI-Chicago right after Allerton and then almost immediately shuttling back to UCSD for the iDASH Privacy Workshop, things have been a bit delayed. I could only attend for two days, but I wanted to highlight a few interesting talks that I saw. More Allerton blogging was done by Michael Mitzenmacher (part 1, part 2) and a bit by Maxim Raginsky about his talk on causal calculus (since he blogged about it I don’t have to, ha!). The conference has gotten too big and the rooms are too small to hold the audience, so it is probably time to move the thing. We have similar issues at ITA, and the 2012 ITA Workshop is moving off campus (you heard it here first, folks!).

But here are some interesting talks I saw:

Continue reading

The strong converse for the MAC, part 4

In the previous post we saw how to take two distributions {P} and {Q} on {\mathcal{X}^n} that are close and show that there must be some fixed values {\{ \bar{x}_{t_i} : i = 1, 2, \ldots, k\}} for individual variables {\{X_{t_i} : i = 1, 2, \ldots, k\}} such that the single letter marginals of {P} and {Q} are close when you condition on {X_{t_i} = \bar{x}_{t_i}}.

Corollary 1 Let {\mathcal{A} \subset \{1,2, \ldots, M\} \times \{1,2,\ldots,N\}} with {|\mathcal{A}| \ge (1 - \lambda^{\ast}) MN}, and let {\{(u_i,v_j, D_{ij}) : (i,j) \in \mathcal{A}\}} be a code for the MAC with maximal error probability {\lambda}. Then for any {0 < \gamma < c = \frac{\lambda^{\ast}}{1 - \lambda^{\ast}}} and {0 \le \epsilon < 1} there exist {\{t_1,\ldots, t_k\}} with {k \le c/\gamma} and fixed values {\{(\bar{x}_{t_i},\bar{y}_{t_i})\}} such that the subcode formed by

\displaystyle  	\bar{\mathcal{A}} = \{ (i,j) \in \mathcal{A} : u_{i,t_l} = \bar{x}_{t_l},\ v_{j,t_l} = \bar{y}_{t_l},\ l = 1, 2, \ldots, k \}

has maximal error {\lambda},

\displaystyle  \begin{array}{rcl}  	|\bar{\mathcal{A}}| &\ge& \epsilon^k |\mathcal{A}| \\ 	\bar{M} &=& | \{ u_i : (i,j) \in \bar{\mathcal{A}} \} | \ge \epsilon^{k} M \\ 	\bar{N} &=& | \{ v_j : (i,j) \in \bar{\mathcal{A}} \} | \ge \epsilon^{k} N 	\end{array}

and for all {x \in \mathcal{X}} and {y \in \mathcal{Y}} and all {t = 1,2, \ldots, n},

\displaystyle  \begin{array}{rcl}  	\mathbb{P}( \bar{X}_t = x, \bar{Y}_t = y ) 		&\ge& 		(1 + \gamma) \mathbb{P}(\bar{X}_t = x) \mathbb{P}(\bar{Y}_t = y) - \gamma \\  & & \qquad - |\mathcal{X}||\mathcal{Y}|\epsilon \\ 	\mathbb{P}( \bar{X}_t = x, \bar{Y}_t = y ) 		&\le& 		\max\left\{ (1 + \gamma) \mathbb{P}(\bar{X}_t = x) \mathbb{P}(\bar{Y}_t = y), \epsilon \right\}, 	\end{array}

where {\bar{X}^n} and {\bar{Y}^n} are distributed according to the Fano distribution of the subcode.

That is, we can condition on some values at {\{t_i\}} and (a) not lose too much rate for each user and (b) the joint distribution of the subcode is almost a product distribution. Remember, the Fano distribution is just the uniform distribution on the codewords of the subcode induced by {\bar{\mathcal{A}}}.
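Written out, the Fano distribution of the subcode is just

\displaystyle  	\mathbb{P}( \bar{X}^n = u_i, \bar{Y}^n = v_j ) = \frac{1}{|\bar{\mathcal{A}}|} \qquad \textrm{for } (i,j) \in \bar{\mathcal{A}},

and zero for every other pair of sequences.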

So how does this result follow from the Wringing Lemma discussed before? Let {P} be the Fano distribution of the original code, and let {Q} be the product distribution induced by the marginal distributions on the code:

\displaystyle  \begin{array}{rcl}  P_{X^n Y^n}(u_i,v_j) &=& \frac{1}{|\mathcal{A}|} \\ 	Q_{X^n Y^n}(u_i,v_j) &=& P_{X^n}(u_i) P_{Y^n}(v_j). 	\end{array}

Then just churn through the implications of the Lemma.
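Just to make the two objects concrete, here is a small numerical sketch (a toy of my own, not anything from Ahlswede’s paper): build the Fano distribution {P} of an index set {\mathcal{A}}, form the product {Q} of its marginals, and look at the largest ratio {P/Q} over {\mathcal{A}}, which is the quantity you would want to be at most {1 + c} before invoking the Lemma.

import numpy as np
from itertools import product

# Toy index set A: all pairs from a 3x3 code with one pair removed.
M, N = 3, 3
A = [(i, j) for i, j in product(range(M), range(N)) if (i, j) != (2, 2)]

# Fano distribution P: uniform over the pairs in A.
P = np.zeros((M, N))
for i, j in A:
    P[i, j] = 1.0 / len(A)

# Marginals of P and the induced product distribution Q.
P_X = P.sum(axis=1)
P_Y = P.sum(axis=0)
Q = np.outer(P_X, P_Y)

# Largest ratio P/Q on the support of A.
print("max P/Q over A:", max(P[i, j] / Q[i, j] for i, j in A))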

So putting the whole argument together: we start with an {(n,M,N,\bar{\lambda})} code for the MAC and extract a subcode indexed by the set of pairs {\mathcal{A}} defined before. This subcode satisfies rate bounds which look almost like what we want: they have sums of mutual informations like {I(X_t ; Z_t | Y_t)}, except that {(X^n,Y^n)} are distributed uniformly on {\mathcal{A}}, so they are not independent. We can then apply the corollary above to get a sub-subcode indexed by {\bar{\mathcal{A}}} whose Fano distribution is almost a product distribution, and for which we have not lost too much rate. Now we have rate bounds with a sum of terms {I(\bar{X}_t ; \bar{Z}_t | \bar{Y}_t)} where the inputs are almost independent, so we can use the continuity of mutual information and entropy to get upper bounds with sums of {I(\hat{X}_t ; \hat{Z}_t | \hat{Y}_t)} where the inputs are finally independent.
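Schematically (and glossing over the exact slack terms, which come from the wringing step and the continuity bounds), the end result has the familiar MAC shape

\displaystyle  \begin{array}{rcl}  	R_1 &\le& \frac{1}{n} \sum_{t=1}^{n} I(\hat{X}_t ; \hat{Z}_t | \hat{Y}_t) + o(1) \\ 	R_2 &\le& \frac{1}{n} \sum_{t=1}^{n} I(\hat{Y}_t ; \hat{Z}_t | \hat{X}_t) + o(1) \\ 	R_1 + R_2 &\le& \frac{1}{n} \sum_{t=1}^{n} I(\hat{X}_t, \hat{Y}_t ; \hat{Z}_t) + o(1) 	\end{array}

with {\hat{X}_t} and {\hat{Y}_t} independent, where the slack terms vanish as {n \rightarrow \infty} for any fixed maximal error {\lambda < 1}; that is what makes this a strong converse.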

There are three steps here — first you find a subcode with more uniform error properties but dependence between the codewords, then you wring out the dependence without giving up too much rate, and finally you substitute in the independent inputs without giving up too much rate. And that’s the value of wringing arguments: in order to get the nice structure from the subcode with {\mathcal{A}} you have to get back the independence of inputs that you lost.

The strong converse for the MAC, part 3

This post was written using Luca’s LaTeX2WP tool.

The topic of this (late) post is “wringing” techniques, which are a way of converting a statement like “{ P(x^n)} and { Q(x^n)} are close” into a statement about per-letter probabilities. Such a statement looks like “we can find some indices and values at those indices such that if we condition on those values, the marginals of { P} and { Q} are close.” I’m going to skip straight to the simpler of the two “Wringing Lemmas” in this paper and sketch its proof. Then I’ll go back and state the first one (without proof). There are some small typos in the original which I will try to correct here.

Lemma 1 Let { P} and { Q} be probability distributions on { \mathcal{X}^n} such that for a positive constant { c},

\displaystyle  	P(x^n) \le (1 + c) Q(x^n) \qquad \forall x^n \in \mathcal{X}^n 	 	\ \ \ \ \ (1)

then for any { 0 < \gamma < c} and {0 \le \epsilon < 1} there exist { t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}}, where

\displaystyle  	0 \le k \le c/\gamma

such that for some { \bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}},

\displaystyle  	P(x_t | \bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}) 		\le \max\{ (1 + \gamma) Q(x_t | \bar{x}_{t_1}, 			\bar{x}_{t_2}, \ldots, \bar{x}_{t_k}), \epsilon \}

for all { x_t \in \mathcal{X}} and all { t = 1, 2, \ldots, n}. Furthermore,

\displaystyle  	P(\bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}) \ge \epsilon^{k}.

The proof works pretty mechanically. Fix a {\gamma} and {\epsilon}. Suppose that the conclusion doesn’t hold for {k = 0}. Then there is a {t_1} and {\bar{x}_{t_1}} such that

\displaystyle  	(1 + c) Q(\bar{x}_{t_1}) \ge P(\bar{x}_{t_1}) > \max((1 + \gamma) Q(\bar{x}_{t_1}),\epsilon),

where the first bound is from the assumption and the second because the conclusion doesn’t hold. Then we have {P(\bar{x}_{t_1}) > \epsilon}. Now dividing (1) by {P(\bar{x}_{t_1})} and using the lower bound on {P(\bar{x}_{t_1})} we get for all {x^n \in \mathcal{X}^n},

\displaystyle  	P(x^n | \bar{x}_{t_1}) \le \frac{1 + c}{1 + \gamma} Q(x^n | \bar{x}_{t_1}) 	 	\ \ \ \ \ (2)
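To spell out the division step: for any {x^n} with {x_{t_1} = \bar{x}_{t_1}},

\displaystyle  	P(x^n | \bar{x}_{t_1}) = \frac{P(x^n)}{P(\bar{x}_{t_1})} \le \frac{(1 + c) Q(x^n)}{(1 + \gamma) Q(\bar{x}_{t_1})} = \frac{1 + c}{1 + \gamma} Q(x^n | \bar{x}_{t_1}),

while for {x^n} inconsistent with {\bar{x}_{t_1}} the left side is zero, so (2) holds for all {x^n}.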

Now, either the conclusion is true or we use (2) in place of (1) to find a {t_2} and a {\bar{x}_{t_2}} such that

\displaystyle  	P(x^n | \bar{x}_{t_1},\bar{x}_{t_2}) \le \frac{1 + c}{(1 + \gamma)^2} Q(x^n | \bar{x}_{t_1},\bar{x}_{t_2}) 	 	\ \ \ \ \ (3)

We can keep going in this way until we get {(1 + \gamma)^k} in the denominator. This yields the condition on {k}.
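One way to see the bound on {k}: the process can only continue while the accumulated factor exceeds {1}, and by Bernoulli’s inequality

\displaystyle  	(1 + \gamma)^k \ge 1 + k \gamma > 1 + c \qquad \textrm{for } k > c/\gamma,

so after more than {c/\gamma} steps we would have {P(x^n | \bar{x}_{t_1}, \ldots, \bar{x}_{t_k}) \le Q(x^n | \bar{x}_{t_1}, \ldots, \bar{x}_{t_k})} and the per-letter conclusion would hold trivially. The bound {P(\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}) \ge \epsilon^{k}} also drops out of the construction, since each conditioning value was chosen with conditional probability greater than {\epsilon}.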

Rather than describe how to apply this to our coding problem, I want to contrast this rather simple statement to the other wringing lemma in the paper, which is a generalization of the lemma developed by Dueck. This one is more “information theoretic” in the sense that it says if the block mutual information is small, then we can find positions to condition on such that the per-letter mutual informations are small too.

Lemma 2 Let {X^n} and {Y^n} be {n}-tuples of random variables taking values in the discrete finite alphabets {\mathcal{X}^n} and {\mathcal{Y}^n}, and assume that

\displaystyle  	I(X^n; Y^n) \le \sigma.

Then for any { 0 < \delta < \sigma} there exist { t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}}, where

\displaystyle  	0 \le k \le 2 \sigma/\delta

such that for some {\{ (\bar{x}_{t_i}, \bar{y}_{t_i}) : i = 1, 2, \ldots, k\}},

\displaystyle  	I(X_t ; Y_t ~|~ (X_{t_i},Y_{t_i}) = (\bar{x}_{t_i}, \bar{y}_{t_i}), i = 1, 2, \ldots, k) \le \delta

for all { t = 1, 2, \ldots, n}. Furthermore,

\displaystyle  	\mathbb{P}((X_{t_i},Y_{t_i}) = (\bar{x}_{t_i}, \bar{y}_{t_i}), i = 1, 2, \ldots, k) \ge \left( \frac{\delta}{|\mathcal{X}||\mathcal{Y}| (2 \sigma - \delta)} \right)^{k}.

This lemma is proved in basically the same way as the previous one, but it’s a bit messier. Dueck’s original version was similar but gave a bound on the conditional mutual information, rather than this bound where we condition on particular values. This way of taking a small block mutual information and getting small per-letter mutual informations can be useful outside of this particular problem, I imagine.
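That is, Dueck’s version concludes a bound of the form

\displaystyle  	I(X_t ; Y_t ~|~ X_{t_1}, Y_{t_1}, \ldots, X_{t_k}, Y_{t_k}) \le \delta \qquad \textrm{for all } t,

conditioning on the random variables at the chosen positions rather than fixing particular values for them (this is just my rewriting of the contrast described above, not a statement copied from either paper).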

Next time I’ll show how to apply the first lemma to our codebooks in the converse for the MAC.

Greetings from the School of IT in Austin

I’m at the 2011 School of Information Theory at UT Austin. I’m mostly here to help out with the activities on Monday, which I will blog about after they happen (it’s a secret!). I missed most of the talks, but I went to the first part of Bob Gray’s lecture this morning on stationary codes and the connections between information theory and ergodic theory. Here’s his set of inclusions of the different species of random processes:

IID \subset B-processes \subset K-processes \subset strongly mixing \subset weakly mixing \subset stationary ergodic \subset stationary \subset block stationary \subset asymptotically stationary \subset asymptotically mean stationary

B-processes are the result of applying stationary codes to IID processes, and K-processes are those which satisfy the Kolmogorov 0-1 law. The last property (asymptotic mean stationarity) implies that sample averages converge. It’s only appropriate to mention it, given the title of this blog.
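As a tiny numerical sketch of the first inclusion (my own toy, not something from the lecture): feeding an IID process through a sliding-block code gives a B-process, which stays stationary and ergodic but is no longer IID.

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100000)   # IID process: fair coin flips
y = x[1:] + x[:-1]                    # sliding-block (stationary) code with window length 2

# y is a B-process: it inherits stationarity and ergodicity from the IID input,
# but consecutive outputs share an input bit, so they are correlated.
print("corr(y_t, y_{t+1}) ~", np.corrcoef(y[:-1], y[1:])[0, 1])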

The School is a nice event with a mixture of classical and “hot topics” in information theory. Even though my own research interests have shifted a bit away from the hard core of IT, there are still fun things to think about and it’s nice to see some new new and new old results.

The strong converse for the MAC

I have been reading Ahlswede’s paper “An Elementary Proof of the Strong Converse Theorem for the Multiple-access Channel,” J. Combinatorics, Information and System Sciences, Vol. 7, No. 3, pp. 216-230. The preprint is here but I have a scanned version of the journal version thanks to the intrepid UC Libraries staff — they are a great resource!

Continue reading