Tag Archives: information theory

I figured I would blog about this week's workshop at Banff in a more timely fashion. Due to the scheduling of flights out of Calgary, I will have to miss the last day of talks. The topics of people's presentations varied rather widely, and many were not about the Good-Turing estimator setup. Sometimes it was a bit hard to see how the problems or approaches were related (not that they had to be directly related), but given that the crowd had widely varying backgrounds, presenters had a hard time: the audience had to absorb a new set of notation and a new approach for every talk. The advantage is that there were lots of questions — the disadvantage is that people insisted on "finishing" their presentations. By mid-week my brain was over-full, and a Wednesday afternoon hike up Sulphur Mountain was the perfect solution.
Banfffffffffffffff
I've just arrived in chilly but beautiful Banff for a workshop on Information theory and statistics for large alphabets. I'm looking forward to it, although I will have to miss the last day due to the timing of flights out of Calgary that get me to Chicago before midnight. My itineraries there and back seem especially perverse: ORD-SEA-YYC and YYC-SFO-ORD. However, thanks to the new gig I have a new laptop with a functional battery, so I am doing a bit more busy-work and less New Yorker reading on the plane. I might try to write a bit more about the topics in the workshop — although the topic seems focused, there is a wide range of approaches and angles to take on the problem of estimating probabilities/prevalences in situations where you may not get to see each outcome even once. Certainly I hope I can get the journal version of a paper from last year's Allerton squared away.
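To make the flavor of the problem concrete, here is a minimal sketch (my own toy code, nothing from the workshop) of the Good-Turing estimate of the missing mass, i.e. the total probability of the symbols you have not seen at all, which is estimated by the fraction of samples that appeared exactly once:

from collections import Counter

def good_turing_missing_mass(samples):
    # Good-Turing estimate of the probability mass on unseen symbols:
    # N1 / n, where N1 is the number of symbols appearing exactly once
    # among the n samples.
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(samples)

print(good_turing_missing_mass(list("abracadabra")))   # 2/11: only 'c' and 'd' are singletons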
Allerton 2011
Despite Yury's attempts to get me to "stop blogging," here is my much-delayed recap of Allerton. Because I moved to TTI-Chicago right after Allerton and then almost immediately shuttled back to UCSD for the iDASH Privacy Workshop, things have been a bit delayed. I could only attend for two days, but there were several talks worth highlighting. More Allerton blogging was done by Michael Mitzenmacher (part 1, part 2) and a bit by Maxim Raginsky about his talk on causal calculus (since he blogged about it I don't have to, ha!). The conference has gotten too big and the rooms are too small to hold the audience, so it probably is time to move the thing. We have similar issues at ITA, and the 2012 ITA Workshop is moving off campus next year (you heard it here first, folks!).
But here are some interesting talks I saw:
Safe travels to ISIT 2011
Bon voyage and safe travels to all those headed to St. Petersburg for ISIT 2011. Perhaps Max will blog about it, since I am not going to attend. Or if any of you see Alex, maybe you can convince him to contribute a post or two here…
The strong converse for the MAC summary
The strong converse for the MAC, part 4
In the previous post we saw how to take two distributions $P$ and $Q$ on $\mathcal{X}^n$ that are close and show that there must be some fixed values $\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}$ for individual variables $X_{t_1}, \ldots, X_{t_k}$ such that the single letter marginals of $P$ and $Q$ are close when you condition on $X_{t_i} = \bar{x}_{t_i}$ for $i = 1, \ldots, k$.
Corollary 1 Let $\sigma > 0$ and suppose the Fano distribution $P_{X^n Y^n}$ of the code satisfies
$$ P_{X^n Y^n}(x^n, y^n) \le (1+\sigma)\, P_{X^n}(x^n)\, P_{Y^n}(y^n) \quad \text{for all } (x^n, y^n), $$
and let $\{ (u_i, v_j, D_{ij}) : i \in [M_1],\ j \in [M_2] \}$ be a code for the MAC with maximal error probability $\lambda$. Then for any $0 < \gamma < \sigma$ and $0 < \varepsilon < 1$ there exist $t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}$ for
$$ k \le \frac{\sigma}{\gamma} $$
and fixed values $\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}$ and $\bar{y}_{t_1}, \ldots, \bar{y}_{t_k}$ such that the subcode formed by the codeword pairs $(u_i, v_j)$ that agree with these values in positions $t_1, \ldots, t_k$ has maximal error $\lambda$, contains at least an $\varepsilon^k$ fraction of the codeword pairs, and for all $t = 1, \ldots, n$ and all $x \in \mathcal{X}$ and all $y \in \mathcal{Y}$,
$$ P_{X_t Y_t}(x, y) \le \max\{ (1+\gamma)\, P_{X_t}(x)\, P_{Y_t}(y),\ \varepsilon \}, $$
where $X_t$ and $Y_t$ are distributed according to the Fano distribution of the subcode.
That is, we can condition on some values in positions $t_1, \ldots, t_k$ and (a) not lose too much rate for each user and (b) the joint distribution of the subcode is almost a product distribution. Remember, the Fano distribution is just the uniform distribution on the codewords of the subcode induced by $(\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}, \bar{y}_{t_1}, \ldots, \bar{y}_{t_k})$.
So how does this result follow from the Wringing Lemma discussed before? Let $P$ be the Fano distribution of the original code, and let $Q$ be the product distribution induced by the marginal distributions on the code:
$$ Q(x^n, y^n) = P_{X^n}(x^n)\, P_{Y^n}(y^n). $$
Then just churn through the implications of the Lemma.
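To spell the churning out a little (this is just my unpacking, with the "letters" of the lemma taken to be the pairs $(x_t, y_t)$): apply the lemma on the alphabet $\mathcal{X} \times \mathcal{Y}$ with the hypothesis $P \le (1+\sigma) Q$. Because $Q$ is a product distribution, conditioning it on $X_{t_i} = \bar{x}_{t_i}, Y_{t_i} = \bar{y}_{t_i}$ for $i = 1, \ldots, k$ keeps it a product, and the per-letter conclusion reads
$$ P_{X_t Y_t \mid \mathrm{cond}}(x, y) \le \max\{ (1+\gamma)\, Q_{X_t \mid \mathrm{cond}}(x)\, Q_{Y_t \mid \mathrm{cond}}(y),\ \varepsilon \} $$
for all $t$, $x$, and $y$, while the "furthermore" clause says the conditioning event has probability at least $\varepsilon^k$ under $P$, which is exactly the statement that the surviving subcode contains at least an $\varepsilon^k$ fraction of the codeword pairs.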
So putting the whole argument together, we start first with an $(n, M_1, M_2, \lambda)$ code for the MAC and then we get a subcode from the pairs $(u_i, v_j)$ defined before. This code satisfies rate bounds which look almost like what we want — they have sums of mutual informations like
$$ \frac{1}{n} \sum_{t=1}^{n} I(X_t; Z_t \mid Y_t), $$
except that $(X^n, Y^n)$ are distributed uniformly on the codeword pairs of the subcode, so they are not independent. We can then apply the corollary above to get that there is a sub-subcode indexed by $(\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}, \bar{y}_{t_1}, \ldots, \bar{y}_{t_k})$ for which the Fano distribution on $(X_t, Y_t)$ is almost independent, and for which we have not lost too much rate. Now we have rate bounds with a sum of terms $I(X_t; Z_t \mid Y_t)$ where the inputs are almost independent, so then we can use the continuity of mutual information and entropy to get upper bounds with sums of $I(\tilde{X}_t; Z_t \mid \tilde{Y}_t)$ where the inputs $\tilde{X}_t$ and $\tilde{Y}_t$ are finally independent.
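For concreteness, the continuity estimate in that last step is of the Fannes/Csiszár-Körner type (I'm quoting the standard textbook bound here, which may differ in constants from the one Ahlswede actually invokes): if $P$ and $Q$ are distributions on a finite set $\mathcal{X}$ with $\|P - Q\|_1 \le \Theta \le 1/2$, then
$$ |H(P) - H(Q)| \le \Theta \log \frac{|\mathcal{X}|}{\Theta}. $$
Since mutual informations are sums and differences of entropies, an almost-product Fano distribution can be swapped for a genuinely product one at a cost that vanishes as the $\ell_1$ distance does.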
There are three steps here — first you find a subcode with more uniform error properties but dependence between the codewords, then you wring out the dependence without giving up too much rate, and finally you substitute in the independent inputs without giving up too much rate. And that's the value of wringing arguments: in order to get the nice structure from the subcode step, you have to win back the independence of the inputs that you lost.
The strong converse for the MAC, part 3
This post was written using Luca's LaTeX2WP tool.
The topic of this (late) post is "wringing" techniques, which are a way of converting a statement like "$P$ and $Q$ are close" into a statement about per-letter probabilities. Such a statement looks like "we can find some indices and values at those indices such that if we condition on those values, the single-letter marginals of $P$ and $Q$ are close." I'm going to skip straight to the simpler of the two "Wringing Lemmas" in this paper and sketch its proof. Then I'll go back and state the first one (without proof). There are some small typos in the original which I will try to correct here.
Lemma 1 Let $P$ and $Q$ be probability distributions on $\mathcal{X}^n$ such that for a positive constant $c$,
$$ P(x^n) \le (1+c)\, Q(x^n) \quad \text{for all } x^n \in \mathcal{X}^n, \qquad (1) $$
then for any $0 < \gamma < c$ and $0 < \varepsilon < 1$ there exist $t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}$, where
$$ k \le \frac{c}{\gamma}, $$
such that for some $\bar{x}_{t_1}, \bar{x}_{t_2}, \ldots, \bar{x}_{t_k}$,
$$ P(x_t \mid \bar{x}_{t_1}, \ldots, \bar{x}_{t_k}) \le \max\{ (1+\gamma)\, Q(x_t \mid \bar{x}_{t_1}, \ldots, \bar{x}_{t_k}),\ \varepsilon \} $$
for all $x_t \in \mathcal{X}$ and all $t = 1, 2, \ldots, n$. Furthermore,
$$ P(\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}) \ge \varepsilon^k. $$
The proof works pretty mechanically. Fix a $\gamma$ and an $\varepsilon$. Suppose that the conclusion doesn't hold for $k = 0$. Then there is a $t_1$ and an $\bar{x}_{t_1}$ such that
$$ (1+c)\, Q(\bar{x}_{t_1}) \ge P(\bar{x}_{t_1}) > \max\{ (1+\gamma)\, Q(\bar{x}_{t_1}),\ \varepsilon \}, $$
where the first bound is from the assumption and the second because the conclusion doesn't hold. Then we have $P(\bar{x}_{t_1}) > \varepsilon$. Now dividing (1) by $P(\bar{x}_{t_1})$ and using the lower bound on $P(\bar{x}_{t_1})$ we get for all $x^n$ with $x_{t_1} = \bar{x}_{t_1}$,
$$ P(x^n \mid \bar{x}_{t_1}) \le \frac{1+c}{1+\gamma}\, Q(x^n \mid \bar{x}_{t_1}). \qquad (2) $$
Now, either the conclusion is true or we use (2) in place of (1) to find a $t_2$ and a $\bar{x}_{t_2}$ such that
$$ P(x^n \mid \bar{x}_{t_1}, \bar{x}_{t_2}) \le \frac{1+c}{(1+\gamma)^2}\, Q(x^n \mid \bar{x}_{t_1}, \bar{x}_{t_2}). $$
We can keep going in this way until we get $(1+\gamma)^k$ in the denominator; since $(1+\gamma)^{c/\gamma} \ge 1 + c$, the process must stop by then, and this yields the condition on $k$. The "furthermore" part holds because each value we conditioned on along the way satisfied $P(\bar{x}_{t_{j+1}} \mid \bar{x}_{t_1}, \ldots, \bar{x}_{t_j}) > \varepsilon$.
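To see the mechanics concretely, here is a little toy implementation of the iteration (my own sketch in Python, not anything from the paper): it hunts for a coordinate $t$ and value $\bar{x}_t$ violating the per-letter condition, conditions both distributions on it, and repeats until every single-letter marginal of $P$ is bounded as in the conclusion.

import itertools

def marginal(dist, t):
    # Single-letter marginal at coordinate t of a distribution on X^n,
    # represented as a dict mapping n-tuples to probabilities.
    m = {}
    for xs, p in dist.items():
        m[xs[t]] = m.get(xs[t], 0.0) + p
    return m

def condition(dist, t, value):
    # Condition a distribution on X^n on the event x_t = value.
    z = sum(p for xs, p in dist.items() if xs[t] == value)
    return {xs: p / z for xs, p in dist.items() if xs[t] == value}

def wring(P, Q, gamma, eps):
    # Repeatedly fix (coordinate, value) pairs until, for every t and x,
    # P(x_t | fixed values) <= max((1 + gamma) * Q(x_t | fixed values), eps).
    # Assumes P <= (1 + c) Q pointwise, which guarantees termination
    # after at most c / gamma rounds. Returns the fixed pairs.
    n = len(next(iter(P)))
    fixed = []
    while True:
        violation = next(
            ((t, x)
             for t in range(n)
             for x, p in marginal(P, t).items()
             if p > max((1 + gamma) * marginal(Q, t).get(x, 0.0), eps)),
            None)
        if violation is None:
            return fixed
        t, x = violation
        fixed.append((t, x))
        P, Q = condition(P, t, x), condition(Q, t, x)

# Toy example: Q uniform on {0,1}^2, P skewed toward a 0 in the first
# coordinate, so that P <= (1 + c) Q holds with c = 0.8. One round of
# wringing suffices here.
P = {(0, 0): 0.45, (0, 1): 0.45, (1, 0): 0.05, (1, 1): 0.05}
Q = {xs: 0.25 for xs in itertools.product((0, 1), repeat=2)}
print(wring(P, Q, gamma=0.2, eps=0.05))   # -> [(0, 0)]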
Rather than describe how to apply this to our coding problem, I want to contrast this rather simple statement to the other wringing lemma in the paper, which is a generalization of the lemma developed by Dueck. This one is more “information theoretic” in the sense that it says if the block mutual information is small, then we can find positions to condition on such that the per-letter mutual informations are small too.
Lemma 2 Let $X^n = (X_1, X_2, \ldots, X_n)$ and $Y^n = (Y_1, Y_2, \ldots, Y_n)$ be $n$-tuples of random variables from discrete finite alphabets $\mathcal{X}$ and $\mathcal{Y}$ and assume that
$$ I(X^n; Y^n) \le \sigma. $$
Then for any $0 < \delta < \sigma$ and $0 < \varepsilon < 1$ there exist $t_1, t_2, \ldots, t_k \in \{1, 2, \ldots, n\}$, where
$$ k \le \frac{\sigma}{\delta}, $$
such that for some $\bar{x}_{t_1}, \ldots, \bar{x}_{t_k}$ and $\bar{y}_{t_1}, \ldots, \bar{y}_{t_k}$,
$$ I(X_t; Y_t \mid X_{t_1} = \bar{x}_{t_1}, Y_{t_1} = \bar{y}_{t_1}, \ldots, X_{t_k} = \bar{x}_{t_k}, Y_{t_k} = \bar{y}_{t_k}) \le \delta $$
for all $t = 1, 2, \ldots, n$. Furthermore,
$$ \Pr\left( X_{t_i} = \bar{x}_{t_i},\ Y_{t_i} = \bar{y}_{t_i} \ \text{for} \ i = 1, \ldots, k \right) \ge \varepsilon^k. $$
This lemma is proved in basically the same way as the previous one, but it's a bit messier. Dueck's original version was similar but gave a bound on the conditional mutual information $I(X_t; Y_t \mid X_{t_1}, Y_{t_1}, \ldots, X_{t_k}, Y_{t_k})$, rather than this bound in which we condition on particular values. This way of taking a small block mutual information and getting small per-letter mutual informations can be useful outside of this particular problem, I imagine.
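A toy example (mine, not from the paper) that shows what this is doing: let $Z$ be a fair bit and set $X_t = Y_t = Z$ for every $t$. Then $I(X^n; Y^n) = 1$ bit, which is tiny compared to $n$, but every per-letter mutual information $I(X_t; Y_t)$ is a full bit. Wringing repairs this with $k = 1$: conditioning on $X_{t_1} = \bar{x}_{t_1}$ and $Y_{t_1} = \bar{y}_{t_1}$ for any single position $t_1$ pins down $Z$ and makes both sequences deterministic, so $I(X_t; Y_t \mid X_{t_1} = \bar{x}_{t_1}, Y_{t_1} = \bar{y}_{t_1}) = 0$ for every $t$, and the conditioning event has probability $1/2$.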
Next time I’ll show how to apply the first lemma to our codebooks in the converse for the MAC.
Greetings from the School of IT in Austin
I'm at the 2011 School of Information Theory at UT Austin — I'm mostly here to help out with the activities on Monday, which I will blog about after they happen (it's a secret!). I missed most of the talks, but I went to the first part of Bob Gray's lecture this morning on stationary codes and the connections between information theory and ergodic theory. Here's his chain of inclusions among the different species of random processes:
IID ⊂ B-processes ⊂ K-processes ⊂ strongly mixing ⊂ weakly mixing ⊂ stationary ergodic ⊂ stationary ⊂ block stationary ⊂ asymptotically stationary ⊂ asymptotically mean stationary
B-processes are the result of applying stationary codes to IID processes, and K-processes are those which satisfy the Kolmogorov 0-1 law. The last one (asymptotic mean stationarity) implies that sample averages converge. It's only appropriate to mention it, given the title of this blog.
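As a tiny illustration of the top of the chain (my own sketch, not from the lecture): a stationary (sliding-block) code applies one fixed function to a sliding window of the input, so coding an IID process this way yields a B-process, and sample averages along the output still converge.

import random

def iid_bits(n, seed=0):
    # A sample path of an IID Bernoulli(1/2) process.
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]

def sliding_block_code(path):
    # A stationary (sliding-block) code: the same fixed function of a
    # length-3 window, applied at every time index. Here: XOR.
    return [path[t] ^ path[t + 1] ^ path[t + 2] for t in range(len(path) - 2)]

x = iid_bits(100000)
y = sliding_block_code(x)   # a sample path of a B-process
print(sum(y) / len(y))      # sample average: close to 1/2, as promised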
The School is a nice event with a mixture of classical and “hot topics” in information theory. Even though my own research interests have shifted a bit away from the hard core of IT, there are still fun things to think about and it’s nice to see some new new and new old results.
The strong converse for the MAC
I have been reading Ahlswede's paper An Elementary Proof of the Strong Converse Theorem for the Multiple-access Channel, J. Combinatorics, Information and System Sciences, vol. 7, no. 3, pp. 216–230, 1982. The preprint is here, but I have a scanned copy of the journal version thanks to the intrepid UC Libraries staff — they are a great resource!
In Memoriam Jack Wolf
By now, many of those in the Information Theory community have probably heard that Jack Wolf passed away last week. I didn’t get much of a chance to interact with Prof. Wolf during my time here at UCSD, but every time I did talk to him he was very warm and welcoming. He will really be missed.
