# Greetings from the School of IT in Austin

I’m at the 2011 School of Information Theory at UT Austin — I’m mostly here to help out with the activities on Monday, which I will blog about after they happen (it’s a secret!). I missed most of the talks, but I went to the first part of Bob Gray’s lecture this morning on stationary codes and the connections between information theory and ergodic theory. Here’s his chain of inclusions among the different species of random processes:

IID $\subset$ B-processes $\subset$ K-processes $\subset$ strongly mixing $\subset$ weakly mixing $\subset$ stationary ergodic $\subset$ stationary $\subset$ block stationary $\subset$ asymptotically stationary $\subset$ asymptotically mean stationary

B-processes are the result of applying stationary codes to IID processes, and K-processes are those which satisfy the Kolmogorov 0-1 law. Membership in the last class implies that sample averages converge, so it’s only appropriate to mention it, given the title of this blog.

The School is a nice event with a mixture of classical and “hot topics” in information theory. Even though my own research interests have shifted a bit away from the hard core of IT, there are still fun things to think about and it’s nice to see some new new and new old results.

# the job market : so many applicants

I just got a rejection from the CS department at Brown, and they sadly neglected to Bcc the recipients, so I now know that they rejected 362 people in one fell swoop. Just glancing through the addresses, I recognized several of the recipients. Based on my limited expertise in privacy-preserving algorithms, I think this pretty much satisfies the Dinur-Nissim definition of blatant non-privacy: if there are $n$ applicants, I have reconstructed $n - o(n)$ of them, since I can’t imagine that they would interview more than $o(n)$. Ok, that was a little more nerdy than I intended. I do think that they deserve a wag of the finger.

Update: I just got a followup saying that the sender “would like to recall the message…” Alas, no.

Update 2: Another followup came in saying to “please ignore and delete” the previous message. Does this mean I still have a chance?!? (Again, alas, no).

I kind of like this version of Take Five from Sachal Music.

Sometimes the Library of Congress does awesome things. This jukebox is up there.

I wouldn’t have believed before that there is money in a banana stand, but I could be wrong.

The clarity in this press nugget leaves a lot to be desired. The statement “the trio has found a way to determine the smallest number of nodes that must be externally controlled to force a given network from any initial state to any desired final state” is so vague! The full article is here. It turns out they are looking at a linear control problem $d\mathbf{x}/dt = A \mathbf{x}(t) + B \mathbf{u}(t)$, where the elements of the state are related via a graph whose structure matches $A$, and you want the input $\mathbf{u}(t)$ to be nonzero only on a subset of the nodes. Thanks to Ann Wehman for the pointer.
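The underlying notion of controllability is the classical one from linear systems theory. As a quick illustration, here is the textbook Kalman rank test on a made-up three-node chain (this is just the standard condition, not the algorithm from the article):

```python
import numpy as np

def is_controllable(A, B):
    """Kalman rank test: dx/dt = A x + B u is controllable
    iff rank [B, AB, ..., A^{n-1} B] = n."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks)) == n

# Toy 3-node chain 1 -> 2 -> 3 (entry A[j, i] = 1 for an edge i -> j).
A = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])

B_head = np.array([[1.], [0.], [0.]])  # drive only the head of the chain
B_tail = np.array([[0.], [0.], [1.]])  # drive only the tail instead
```

Driving only node 1 controls the whole chain, while driving only node 3 does not, which is the flavor of question the article asks at network scale.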

# The strong converse for the MAC

I have been reading Ahlswede’s paper “An Elementary Proof of the Strong Converse Theorem for the Multiple-access Channel,” J. Combinatorics, Information and System Sciences, Vol. 7, No. 3, pp. 216–230. The preprint is here, but I have a scanned copy of the journal version thanks to the intrepid UC Libraries staff — they are a great resource!

# In Memoriam Jack Wolf

By now, many of those in the Information Theory community have probably heard that Jack Wolf passed away last week. I didn’t get much of a chance to interact with Prof. Wolf during my time here at UCSD, but every time I did talk to him he was very warm and welcoming. He will really be missed.

# Responses to reviewers : raw TeX please

I am revising a paper now, and one of the reviewers sent their comments as a PDF of what looked like a Word document or RTF file containing the LaTeX, rather than a rendered version. So it was a little annoying to read, what with all of the \$’s and so on. The beauty of it was that I could just cut and paste from the PDF into my document of responses to the reviewers without having to reformat, and it (almost) rendered without a hitch. There were some lingering issues, though, with quotation marks (I hate smart quotes) and itemized/enumerated lists.

As a recommendation, I think that the raw .tex file of the review should be uploaded instead. That will make it much easier for the authors to revise, no? I plan on doing this in the future. What do you do?

# Sampling one distribution out of another

I am reading this nice paper by Harsha et al on The Communication Complexity of Correlation and some of their results rely on the following cute lemma (slightly reworded).

Let $P$ and $Q$ be two distributions on a finite set $\mathcal{X}$ such that the KL-divergence $D(P \| Q)$ is finite. Then there exists a sampling procedure which, when given a sequence of samples $x_1, x_2, \ldots$ drawn i.i.d. according to $Q$, will output (with probability 1) an index $i^{\ast}$ such that the sample $x_{i^{\ast}}$ is distributed according to the distribution $P$ and the expected encoding length of the index $i^{\ast}$ is at most

$D(P \| Q) + 2 \log( D(P \| Q) + 1 ) + O(1)$

where the expectation is taken over the sample sequence and the internal randomness of the procedure.

So what is the procedure? It’s a rejection sampler that does the following. First set $p_0(x) = 0$ for all $x \in \mathcal{X}$ and set $p_0^{\ast} = 0$ as well. Then for each $i = 1, 2, \ldots$ do the following:

1. $\alpha_i(x) = \min\{ P(x) - p_{i-1}(x), \ (1 - p^{\ast}_{i-1}) Q(x) \}$
2. $p_i(x) = p_{i-1}(x) + \alpha_i(x)$
3. $p_i^{\ast} = \sum_{x} p_{i}(x)$
4. $\beta_i = \alpha_i(x_i)/( (1 - p_{i-1}^{\ast}) Q(x_i) )$
5. Output $i^{\ast} = i$ and halt with probability $\beta_i$; otherwise continue with round $i + 1$.
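Here is a minimal Python sketch of the five steps above, assuming finite alphabets represented as dicts of probabilities; the function name, the `sample_q` callback, and the halting cap are my own:

```python
import random

def greedy_rejection_sample(P, Q, sample_q, rng=random, max_rounds=10000):
    """Return (i_star, x_{i_star}) so that x_{i_star} ~ P.
    P, Q: dicts of probabilities with support(P) contained in support(Q).
    sample_q: zero-argument function returning one draw from Q."""
    p = {x: 0.0 for x in Q}   # p_{i-1}(x)
    p_star = 0.0              # p_{i-1}^*
    for i in range(1, max_rounds + 1):
        # Step 1: greedy amount of mass to allocate in this round.
        alpha = {x: min(P.get(x, 0.0) - p[x], (1.0 - p_star) * Q[x]) for x in Q}
        x_i = sample_q()
        # Step 4: halting probability given the drawn sample x_i.
        denom = (1.0 - p_star) * Q[x_i]
        beta = alpha[x_i] / denom if denom > 0 else 0.0
        # Steps 2-3: update p_i and p_i^* (they depend only on alpha).
        for x in Q:
            p[x] += alpha[x]
        p_star = sum(p.values())
        # Step 5: halt with probability beta_i.
        if rng.random() < beta:
            return i, x_i
    raise RuntimeError("no halt within max_rounds")
```

Note that $\beta_i \le 1$ automatically, since $\alpha_i(x) \le (1 - p_{i-1}^{\ast}) Q(x)$ by construction.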

Ok, so what is going on here? It turns out that $\alpha_i(x)$ is tracking the probability that the procedure halts at time $i$ with $x_i = x$ (this is not entirely clear at first). Thus $p_i(x)$ is the probability that we have halted by time $i$ with sample $x$, and $p_i^{\ast}$ is the probability that we have halted by time $i$, so $(1 - p_{i-1}^{\ast})$ is the probability that the procedure reaches step $i$. Getting back to $\alpha_i(x)$: conditioned on reaching step $i$, we draw $x_i = x$ with probability $Q(x)$ and then halt with probability $\beta_i$, and multiplying these three factors gives exactly $\alpha_i(x)$.

What we want then is that $P(x) = \sum_{i=1}^{\infty} \alpha_i(x)$. So how should we define $\alpha_i(x)$? It’s the minimum of two terms. To halt at time $i$ with $x_i = x$, the procedure must reach step $i$ and draw $x$, so $\alpha_i(x)$ can be at most $(1 - p_{i-1}^{\ast}) Q(x)$ (this corresponds to $\beta_i = 1$). On the other hand, we don’t want $p_{i}(x) = \sum_{j=1}^{i} \alpha_j(x)$ to overshoot the target probability $P(x)$, so $\alpha_i(x)$ can be at most $P(x) - p_{i-1}(x)$. The authors call the procedure greedy in this sense: at each step it pushes $p_i(x)$ as close to $P(x)$ as it can.
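The greedy allocation is easy to check numerically: iterating only steps 1-3 (no samples drawn), the mass $p_i(x)$ climbs to $P(x)$. A small sketch with a made-up pair of distributions:

```python
def greedy_masses(P, Q, rounds):
    """Iterate the deterministic part of the procedure (steps 1-3)
    and return p_i after `rounds` iterations."""
    p = {x: 0.0 for x in Q}
    p_star = 0.0
    for _ in range(rounds):
        alpha = {x: min(P.get(x, 0.0) - p[x], (1.0 - p_star) * Q[x]) for x in Q}
        for x in Q:
            p[x] += alpha[x]
        p_star = sum(p.values())
    return p

P = {'a': 0.2, 'b': 0.8}   # target distribution (toy example)
Q = {'a': 0.5, 'b': 0.5}   # proposal distribution
```

Here the residual mass halves each round once the deficit is concentrated on `'b'`, so the convergence is geometric.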

In order to show that the sampling is correct, i.e. that the output distribution is $P$, we want to show that $p_i(x) \to P(x)$. This follows by induction. To get a bound on the description length of the output index, they have to show that

$\mathbb{E}[ \log i^{\ast} ] \le D(P \| Q) + O(1)$.

This is an interesting little fact on its own. It comes out because

$i \le \frac{1}{1 - p_{i-1}^{\ast}} \cdot \frac{P(x)}{Q(x)} + 1$

and then expanding out $\mathbb{E}[ \log i^{\ast} ] = \sum_{x} \sum_{i} \alpha_i(x) \cdot \log i$.

One example application of this Lemma in the paper is the following simulation problem. Suppose Alice and Bob share an unlimited amount of common randomness. Alice has some $X$ drawn according to $P(x)$ and wants Bob to generate a $Y$ according to the conditional distribution $P(y | x)$, so that ideally $(X,Y)$ have joint distribution $P(x) P(y | x)$. Using the common randomness, they both generate a sequence $y_1, y_2, \ldots$ drawn i.i.d. according to the marginal distribution $P(y)$. Alice then runs the sampling scheme with target distribution $P(y | x)$ and proposal $P(y)$, and sends the index $i^{\ast}$ to Bob, who outputs the corresponding $y_{i^{\ast}}$. How many bits does Alice have to send on average? It’s just

$\mathbb{E}_{P(x)}[ D( P(y|x) \| P(y) ) + 2 \log( D( P(y|x) \| P(y) ) + 1) + O(1) ]$,

which, after some love from Jensen’s inequality, turns out to be upper bounded by

$I(X ; Y) + 2 \log( I(X; Y) + 1) + O(1)$.
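To spell out the Jensen step: the divergence term averages exactly to the mutual information,

$\mathbb{E}_{P(x)}[ D( P(y|x) \| P(y) ) ] = I(X ; Y)$,

and since $t \mapsto \log(t + 1)$ is concave,

$\mathbb{E}_{P(x)}[ 2 \log( D( P(y|x) \| P(y) ) + 1 ) ] \le 2 \log( I(X ; Y) + 1 )$.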

Pretty cute, eh?