Three classic results on the AVC

I wrote a summary of three classical results on arbitrarily varying channels that I think make a nice story. The first is the original 1960 proof by Blackwell, Breiman, and Thomasian. The second is Ahlswede’s elimination technique. The last is the symmetrizability result of Csiszár and Narayan. Together, these three results settle some basic questions regarding arbitrarily varying channels, and I tried to make the exposition self-contained and readable by people who have had a basic course in information theory.

a quote that gives some comfort

From Van H. Vu’s “Concentration of Non-Lipschitz Functions and Applications,” Random Structures and Algorithms 20: 262–316, 2002:

Finding this hypothesis was actually the hardest part of our work, requiring some insight and a numerous number of attempts. As the reader will see after reading Section 5, a proper induction hypothesis not only allows us to derive a powerful result, but also reduces the proof to a rather routine verification of a few conditions.

I have spent a long time reading through difficult proofs with numerous lemmata, wondering why it had to be so complicated. Some things have to be proved by brute force. For others, just phrasing the problem in the right way can make the proof seem trivial. Some might say “well, I could have done that,” but the more accurate response is “I wish I had thought to do it that way.”

adventures in grant administration

I just learned the Berkeley way of doing financial support for TA’s and RA’s here. The payroll paperwork for the appointment is sent down to the Graduate Division, which pays the University up front for tuition and fees. The Graduate Division then charges the department or grant monthly over the course of the semester. That way, if you’re an RA and get fired after one month, the Grad Division will turn around and charge you for the last 2/3 of the tuition and fees.

For those who are TA-ing this semester, the bus pass fee and student life fee are not covered by the teaching appointment, so those have to be paid out of pocket. I don’t recall this being what happened in the past, so in my mind that’s a real step back for TAs. This fee is only waived for first-year grad students, presumably to lull them into the false sense that Berkeley is a place which fully supports its graduate student instructors.

I know that these policies happen for good reasons, namely to prevent waste and trim budgets. I’m sure that similar things happen at most other universities. But the way the bureaucracy evolved seems so bizarre to me at times.

Funkcialaj Ekvacioj

While poking around for some references today I took a detour through the Mathematical Reviews. I had heard that Rota had written some zingers, and one of the first of his that I saw, on the Riemann-Liouville integral in the complex field, contained the following comment:

Many of the results are essentially known, but this seems the best treatment in the literature, which would well justify a previous brief study of Esperanto for the interested reader.

A quick investigation shows that the paper is indeed in Esperanto!

The journal is Funkcialaj Ekvacioj, which presumably means “functional equations” in Esperanto. The entire editorial board is Japanese, and since its inception most of the contributing authors have been Japanese. The journal has been going since 1958, and they have full archives online (vol. 1-40 are free). At the beginning, they had articles in English, French, and Esperanto, with the abstract (if necessary) translated into Esperanto. That stopped as early as the 1970s (more’s the pity), and articles are just written in English, but the journal’s title remains in Esperanto.

This really made my day — maybe I should translate an article into Esperanto… except that I’d have to learn Esperanto first.

paper a day : a counterexample for Gaussian feedback capacity

This is kind of a cop-out, since it’s a two-page “note” rather than a real paper, but since I’m so out of practice at reading and synthesizing papers, I think it’ll be a baby step back into the swing of things. This paper is another information-theory one, but it uses a more general strategy that is easily explained to a general audience.

A Counterexample to Cover’s 2P Conjecture on Gaussian Feedback Capacity
Young-Han Kim
ArXiV : cs.IT/0511019

Imagine you have a communication channel that can be modeled as

Y = X + Z,

where X is the input, Y is the output, and Z is a stationary Gaussian process (not necessarily white noise, but possibly shaped with some spectrum). The approach information theory takes is to limit ourselves to a block of n time steps and then pick a set of 2^{nR} sequences of length n that will encode 2^{nR} messages. These sequences are called codewords and R is the rate of the code, measured in bits/time. If you use n = 10 time steps at rate R, you get a total of 10R bits, which can stand for one of 2^{10R} messages. A coding system for this model picks a set of codewords and a decoder that will take the noisy output Y = X + Z and guess which codeword was sent. Shannon’s Noisy Coding Theorem says that there’s a number C, called the capacity of the channel, such that for any rate R below C, the probability of making a decoding error goes to 0 as the blocklength n goes to infinity.
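
The blocklength/rate arithmetic above can be sketched in a few lines of Python (the specific rate here is a hypothetical choice, just to make the numbers concrete):

```python
def num_messages(n, R):
    """A blocklength-n code at rate R bits/time carries n*R bits,
    i.e. one of 2^(n*R) possible messages (one codeword per message)."""
    return 2 ** (n * R)

# The example from the text: n = 10 time steps; R = 2 bits/time is an
# arbitrary rate chosen just for illustration.
print(num_messages(10, 2))  # 2^20 = 1048576 messages
```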

Of course, you have constrained resources, and in the above Gaussian problem we assume that the power (squared Euclidean norm) of the codewords is constrained to nP. If we fix the noise power as well, the capacity is neatly parameterized in terms of P and we can denote it by C(P). The paper here is about the role of feedback — if the encoder knows exactly what the decoder receives, then picking codewords “in advance” may not be a good idea, since the encoder can adapt what it sends based on “how confused” the decoder is. There’s a longstanding conjecture, due to Cover, that feedback can’t improve the capacity by more than doubling the effective power, so that C_FB(P) is at most C(2P).
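
In the special case of white noise, C(P) has the familiar closed form 0.5 * log2(1 + P/N), so the conjectured bound is easy to sketch numerically (the power and noise levels below are arbitrary choices, and note that the paper’s actual counterexample needs colored, not white, noise):

```python
import math

def gaussian_capacity(P, N=1.0):
    """White-noise Gaussian channel capacity in bits per channel use:
    C(P) = 0.5 * log2(1 + P/N), for power budget P and noise power N."""
    return 0.5 * math.log2(1 + P / N)

P = 4.0  # hypothetical power budget, with noise power N = 1
print(gaussian_capacity(P))      # C(P), the no-feedback capacity
print(gaussian_capacity(2 * P))  # C(2P), the conjectured cap on C_FB(P)
```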

It turns out this is false, and the scheme used to show it is a modification of an old one by Schalkwijk and Kailath (IEEE Transactions on Information Theory, 1966). The intuition is that with perfect feedback, the encoder knows exactly what the decoder does, and therefore can mimic the decoder’s operation perfectly. Imagine Alice is talking to Bob, but Alice knows exactly what Bob thinks she is saying. Using this knowledge, she can tailor the next thing she says to correct Bob’s misunderstanding as best she can. Bob will still not quite get it, but will be closer, so Alice can iteratively zoom in on the precise meaning until she’s satisfied that Bob has gotten close enough.

Another way of thinking about it is helping someone hang a picture. You can see if the picture is level or not, and you issue commands: “two inches higher on the left,” “one inch on the right,” “half an inch on the left,” “a nudge more on the right,” until it’s hung straight. This is precisely the intuition behind the Schalkwijk-Kailath coding scheme. With this feedback you can get more precise than you could without, and similarly in this Gaussian communication example, you can communicate more bits reliably with feedback than without.
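
Here is a toy numerical sketch of that zooming-in idea, loosely in the spirit of Schalkwijk-Kailath (the parameters and the simple geometric gain schedule are hypothetical simplifications, not the actual scheme):

```python
import random

def sk_toy(theta, rounds=15, noise_std=0.1, a=4.0, seed=1):
    """Toy feedback 'zooming-in': the encoder sees the decoder's estimate
    exactly (perfect feedback) and transmits an amplified copy of the
    current error; the decoder divides out the gain and corrects. The
    residual error shrinks geometrically with each round."""
    rng = random.Random(seed)
    estimate = 0.0                         # decoder's running estimate of theta
    for k in range(rounds):
        gain = a ** (k + 1)                # growing amplification (toy schedule)
        x = gain * (theta - estimate)      # encoder amplifies the decoder's error
        y = x + rng.gauss(0.0, noise_std)  # channel adds Gaussian noise
        estimate += y / gain               # correction; new error = -noise/gain
    return estimate

print(abs(sk_toy(0.3) - 0.3))  # tiny residual error after 15 rounds
```

After round k the remaining error is just the last noise sample divided by a^k, so it decays geometrically, which is the sense in which the encoder “zooms in”; the real scheme also keeps the transmitted power within the constraint, which this toy only does approximately.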

The result is not earth-shattering, since there’s another result that says feedback can buy you at most half a bit per unit time over the no-feedback capacity. But it’s a nice application of the coding scheme and a cool demonstration of the usefulness of concrete examples.

default paper size for debian/ubuntu

After months of just dealing with my printing being “messed up,” I discovered the tiny file /etc/papersize, which contained only two characters — “a4.” After replacing those with “letter” today, my life has improved noticeably. Of course, this wasn’t an easy-to-find system configuration — maybe they should consider adding it to the GNOME system config tools.

we’re jammin’

I’ve been meaning to write some notes towards a contextualization of the research I’ve been doing lately, as part of an effort to find a better angle on things and also spur myself to actually tackle some harder/more interesting problems than I’ve looked at so far. At the moment I’m looking at communication scenarios in which there is some unknown interference. The crux of the matter is that the communication strategy must be designed knowing that the interference is there, but not what it is like. There is a disconnect between two different strands of research on this subject (so much so that the papers in one don’t cite the other as relevant). In both the interference is viewed as coming from a jammer who wants to screw over your communication.

In the world of arbitrarily varying channels (Blackwell, Breiman and Thomasian, Ahlswede, Csiszár and Narayan, Hughes and Narayan, etc.), mostly populated by mathematicians, interference is treated as malicious but not listening in. Perhaps a better way to view it is that you want to provide guarantees even if the interference behaves in the worst way possible. This worst-case analysis is often no different from the case when you have a statistical description of the interference — the average and worst-case scenarios are quite similar. However, in some situations the worst case is significantly worse than the average case. Here, if a secret key is shared between the transmitter and receiver, we can recover the average-case behavior, although the reliability of the communication may be worse.

On the other side, we have wire-tap situations in which the jammer not only knows your encoding scheme, but also can try to figure out what you’re sending and actively cancel it out. Most of these analyses initially took the form of game-theoretic set-ups in which one player tries to maximize a function (the capacity) and the other tries to minimize it. The space of strategies for each player takes the form of choosing probability distributions. In this highly structured framework you can prove minimax theorems about the jammer and encoder strategies. Extensions to this view take into account channel fading for wireless, or interference for multiple access (Shafiee and Ulukus).

But never the twain shall meet, as Kipling would say, and it’s hard to sort out the connections between interference, jamming, statistical knowledge, and robustness. Part of what I have to do is sort out what is what and see if I can make any more statements stitching the two together and finding real scenarios in which the unreasonableness of the models can be made more reasonable.