ITA Workshop : general comments

The ITA Workshop was last week at UCSD, and as opposed to last year I decided to go down and attend. I had a good time, but it was a bit weird to be at a conference without presenting anything. It was worth it to get a snapshot of some of the things going on in information theory, and I got a few new ideas for problems that I should work on instead of blogging. But I find the exercise of blogging about the conference useful, and at least a few people have said some positive things about it. This time around I’m going to separate posts out by subject area, loosely. My attention and drive to attend talks decreased exponentially as the week progressed, more due to fatigue than anything else, so these posts may be short (a blessing for my friends who don’t care about information theory!) and more impressionistic at times.

One general gripe I had was that sessions were very de-synced from each other. Most session chairs were unable or unwilling to curtail speakers who went over, to the point where one session I attended finished after the break between sessions. I ended up missing a few talks I wanted to see because of this. I regard it as more of a failing on the part of the speaker — an experienced researcher with many conference talks under their belt should know how to make a coherent 20 minute talk and not plan to run over. Dry runs can only tell you so much about timing, but one should be considerate towards the other speakers in the session and at the conference, no? I know this makes me sound a bit like a school-marm, but it bothers me to leave a talk before the theorem is presented so that I can make it to another talk.

I’ll write separately about the panel on publication issues, which raised some interesting points while dodging others. There was also a presentation by Dr. Sirin Tekinay, who is in charge of the NSF area under which information theory sits. I am woefully ignorant of the grant-writing process right now, so I wasn’t sure how to take her comments, but it looks like a lot of emphasis is going to be on networks and cross-disciplinary work, as is the trend. Not all research can honestly claim applications to networks, though, so that seems a bit unfortunate…

scaling laws as comedy

[Note : imagine B is from India.]

A: Oh my God!
B: What is it?
A: I just proved this great result!
B: Really???
A: Yeah, it’s a lower bound on the achievable rate!
B: So what is it?
A: Well my scheme shows it’s at least log! Log(N)!
B: Ok…
A: Isn’t that cool?
B: Seems a bit… low.
A: Well it’s not polynomial…
B: Hardly. Log(log(N))? You’ve got to be joking.
A: C’mon! Look, if you have a log, log(N) growth you can bootstrap that up to something better.
B: No you need to get rid of a log.
A: I did get rid of a log! It’s an improvement on Singh et al.
B: So it was log log log before?
A: No, log log.
B: So what’s your contribution?
A: Well it’s log log…
B: Exactly! Log log!
A: By log, do you…
B: Log log by log?
A: No, you have an extra two logs in there, it’s…
B: 1 by log? What the heck are you trying to prove!
A: It’s log! Log! Log!
B: I give up. Why don’t you come back when you’ve figured it out. See if you can get it to log(N). [exits]

new paper : deterministic list codes for state-constrained AVCs

It should be up on ArXiV later today…

A.D. Sarwate and M. Gastpar
Deterministic list codes for state-constrained arbitrarily varying channels
Submitted to IEEE Transactions on Information Theory
ArXiV cs.IT/0701146

The capacity for the discrete memoryless arbitrarily varying channel (AVC) with cost constraints on the jammer is studied using deterministic list codes under both the maximal and average probability of error criteria. For a cost function $l(\cdot)$ on the state set and constraint $\Lambda$ on the jammer, the achievable rates are upper bounded by the random coding capacity $C_r(\Lambda)$. For maximal error, the rate $R = C_r(\Lambda) - \epsilon$ is achievable using list codes with list size $O(\epsilon^{-1})$. For average error, an integer $L_{\mathrm{sym}}(\Lambda)$, called the symmetrizability, is defined. It is shown that any rate below $C_r(\Lambda)$ is achievable under average error using list codes of list size $L > L_{\mathrm{sym}}(\Lambda)$. An example is given for a class of discrete additive AVCs.

yet another test

I can only hope that this Gaussian will appear:

\mathbb{P}(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2 \pi}} \exp(- y^2/2) dy

How to do it:

  1. Install the WP-Cache plugin. Note that you have to muck with file permissions and it’s somewhat non-intuitive.
  2. Install the MimeTeX plugin.
  3. Stress out about the fact that for some equations the terminating quote ” in the alt tag for the image has turned into a fancy quote. Come up with a hack involving placing an empty img tag right after every piece of LaTeX. Hope that you eventually find a cleaner way around it (a sketch of one possibility is below).
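
For what it’s worth, here is a minimal sketch in Python of the kind of post-processing I have in mind (purely hypothetical; this is not what the plugin or the blog actually does): normalize the fancy quotes back to straight quotes, but only inside img tags, so the rest of the post keeps its typography.

import re

# Curly -> straight quote map (the "fancy quotes" that break the alt attribute).
CURLY = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

def fix_img_quotes(html):
    """Replace curly quotes with straight quotes, but only inside <img> tags,
    so the rest of the post keeps its typographic quotes."""
    def _fix(match):
        tag = match.group(0)
        for curly, straight in CURLY.items():
            tag = tag.replace(curly, straight)
        return tag
    return re.sub(r"<img\b[^>]*>", _fix, html)

broken = '<img src="mimetex.cgi?x%5E2" alt=\u201cx^2\u201d />'
print(fix_img_quotes(broken))   # the alt attribute gets its straight quotes back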

not again!

Since I only recently started reading ArXiV through my RSS aggregator, I was unaware of the astonishing regularity with which “proofs” of polynomial-time algorithms for NP-complete problems are proposed. The most recent is this one, but one can find a more comprehensive list here. The latter page is a bit too unskeptical of the claims, since it says things like “so and so proved P=NP in this paper.” It’s not a proof if it’s wrong, and pretty much all of these proofs have been shown to be wrong. But it might be an interesting exercise one week for some reading group or topics class to formally prove some of these guys wrong. Of course, for every person claiming a proof that P=NP there is another person ready to knock them down and claim that P!=NP. Maybe it’s just a little self-correcting mechanism in the ArXiV.

Transactions growth

On a similar tip as my comments on ISIT’s size, the IT Society’s Board of Governors formed an ad-hoc committee to talk about the ballooning-out-of-control of the Transactions. They filed a report at the ISIT in Seattle. Some highlights:

  1. “… the exponential curve, which corresponds to an annual growth rate of 7.5%, is by far the best fit to the data… we may well find ourselves in the following situation within less than 10 years. The Transactions will be publishing over 10,000 pages per year…”
  2. “… it appears the growth will be sustainable as long as we charge for the cost of producing and mailing hard copies.”
  3. The average length of a regular paper was 10.3 pages in 1989 and 15.8 pages in 2006.

Something that is distinctly missing from the plots provided in the report is the variance of paper length or a histogram of paper lengths. Although an average length of 15.8 pages doesn’t seem so bad, if the distribution has a heavy tail some other solutions might present themselves.
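
As a quick sanity check on the projection itself, here is a back-of-the-envelope calculation in Python. The only inputs are the two numbers quoted above (7.5% annual growth and 10,000+ pages within ten years); the implied current baseline is inferred from those figures, not taken from the report.

# Sanity check on the report's projection: 7.5% annual growth hitting
# roughly 10,000 pages per year within ten years.
growth = 1.075
target = 10000

# The baseline volume implied by those two numbers (an inferred figure,
# not one quoted in the report excerpt above).
baseline = target / growth ** 10
print("implied current volume: about %d pages/year" % baseline)

# Forward projection from that inferred baseline.
for year in range(0, 11, 2):
    print("year +%2d: %6d pages" % (year, baseline * growth ** year))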

The report describes several possible solutions, including doing nothing, splitting the transactions into two journals, limiting page lengths, going to all-electronic publishing except for libraries, etc. The recommendations they make are threefold:

  1. Go to all-electronic publishing except for libraries. This limits the financial burden.
  2. Make a hierarchical organization of the editorial board with sub-editors in chief for different areas.
  3. Impose a 5-page limit for correspondence items.

I’m not sure how I feel about all-electronic publishing. Libraries want to have hard copies of things for archival purposes, and this seems to be a neat way of passing the buck to them — will more expensive binding mean more cost on their institutional subscription? On the one hand, you save money by printing fewer copies, but on the other the hard copies may cost more. Probably this saves money all around though.

The sub-editing is a way of making the large page numbers tractable. They specifically don’t want to narrow the scope of the journal, which is good, but they also don’t want to cut more papers by making the acceptance rate lower, so the only way to keep the quality high is to add more reviewers and editors. But they simultaneously note that the number of reviewers is not increasing at the same rate as the number of pages.

The 5-page limit is somewhat odd — ISIT papers are already 5 pages, and the difference would seem to be better peer review and a much longer publication delay. While this would help control the page count, they specifically do not recommend adding a page limit for regular papers. What happens in the gap between a 5-page idea and a 15.8-page regular paper?

Taken together, the proposed solutions seem to consciously avoid taking the problem head-on. A more aggressive approach might include things like

  • Imposing page charges. At the moment the policy is:

    Page Charges: If the manuscript is accepted for publication, the author’s company or institution will be requested to cover part of the cost of publication. Page charges for this journal are not obligatory nor is their payment a prerequisite for publication. The author will receive 100 free reprints without covers if the charge is honored. Detailed instructions will accompany the proof.

    The Signal Processing Transactions charges $110/page up to 8 pages and $220/page thereafter. Making page charges mandatory for longer papers or adding a mandatory charge per page for papers over 20 pages may encourage authors to write shorter (and perhaps more readable?) papers. (A quick back-of-the-envelope calculation of what such a schedule would mean for an average-length paper appears after this list.)

  • Encouraging editors and reviewers to more aggressively promote correspondence items. They bring up the point that correspondence items suffer from “inflated introductions and similar excesses, designed by the authors in order to avoid the correspondence classification.” If there are more specific instructions to editors to… edit things down, then both regular papers and correspondence items can be trimmed.
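
To make the page-charge option concrete, here is the arithmetic from the first bullet applied to a paper of the 2006 average length. This is a toy Python snippet using only the Signal Processing rates quoted above; the threshold handling and the rounding to 16 pages are my own assumptions, not IEEE policy.

def sp_page_charge(pages, base_rate=110, overflow_rate=220, threshold=8):
    """Signal Processing-style page charges: $110/page for the first 8 pages
    and $220/page for every page after that (rates quoted above)."""
    return base_rate * min(pages, threshold) + overflow_rate * max(pages - threshold, 0)

# A 16-page paper (the 2006 IT Transactions average of 15.8, rounded up):
print(sp_page_charge(16))   # 8*110 + 8*220 = 2640 dollars
# A 5-page correspondence-length item for comparison:
print(sp_page_charge(5))    # 5*110 = 550 dollars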

In the end, the recommendations of the board are more carrot than stick, which may or may not work. The big message seems to be that exponential growth is good but needs to be managed somehow. However, it may be that more active feedback is needed to control this system, which is going exponentially unstable. It has been noted that there is insufficient communication between the information theory and control communities…

Allerton 2006

I just finished up attending the 44th Annual Allerton Conference on Communication, Control, and Computing. Conveniently, the conference is held in Monticello, IL, which is a short drive away from Urbana. So it’s a free trip home for me, and another chance to flog my research. The conference was fun — I attended a number of good talks, and although the synchronization across sessions was a little uneven, I think I got a lot of ideas out of being there. What follows is a small set of notes on a subset of talks that I attended. I went to more talks than these, and some that I didn’t mention here were also quite interesting.

As always, I may have misunderstood some of the results, so don’t take my word for it necessarily…

  • How to Filter an “Individual Sequence with Feedback”
    T. Weissman (Stanford)

    This was on a problem that is sometimes called “compound sequential decision-making against the well-informed antagonist.” The idea is that you have some data sequence which is corrupted by noise and you would like to denoise it. If the sequence is arbitrary, then you can take an individual sequence approach to the problem. However, suppose now that you observe the noisy sequence causally, and an adversary can select the next data sample based on the noisy outputs of the previous data samples. This is the well-informed adversary, and the problem becomes significantly harder. The interesting result for me is that randomized strategies perform better than deterministic ones, since the adversary can’t track the output. This is of course related in some way to the problems I’m thinking about, but has a totally different flavor.

  • Sharp Thresholds for Sparsity Recovery in the High-Dimensional and Noisy Setting
    M. Wainwright (Berkeley)

    There has been lots of work on compressed sensing and sparse reconstructions in recent years. Wainwright is interested in recovering a sparsity pattern in high-dimensional data: that is, which elements are nonzero? A lot of asymptotic analysis is done to relate how many observations of the noisy data you need in terms of the dimension of the problem and the number of nonzero elements. It turns out that scaling the sparsity linearly with the dimension of the problem is bad (which may not be so surprising to some, but it’s hard for me to get an intuition for these things).

  • Delay Asymptotics with Network Source Coding
    S. Bhadra and S. Shakkottai (UT Austin)

    This was an interesting talk on interpreting coding across packets in networks as effectively moving the buffering from nodes in the network back to the source. In networks with heterogeneous traffic, this might be a useful way of thinking about things. Shakkottai said it was like “waterpouring over unused capacity,” which sounded like an appealing intuition. Whenever I get the proceedings I’d like to look at the details.

  • How to Achieve Linear Capacity Scaling without Mobility
    A. Ozgur (EPFL), O. Leveque (Stanford) and D. Tse (Berkeley)

    This talk was completely packed, as it presented a complete solution for the scaling law problem in dense wireless networks like those of Gupta and Kumar. The achievability scheme was a hierarchical MIMO strategy on an overlay grid for the network, designed to get the maximum spatial reuse. The big picture was clear and the details looked quite tricky. The major ingredient needed is independent uniform phase fading on each point-to-point link in the network. There was a little confusion during the questions about uniform phase and scattering effects that went a bit over my head. The mathematical problem seems settled, although the debate on the engineering question may be open…

  • Coding into a Source: A Direct Inverse Rate-Distortion Theorem
    M. Agarwal (MIT), A. Sahai (Berkeley) and S. Mitter (MIT)

    Suppose you have to use a fixed input distribution for a channel, and the channel is guaranteed, given that input distribution, to not distort the input by too much (using a given distortion measure). How well can you do across such a channel? The capacity is actually the rate distortion function R(D), which neatly gives an operational meaning for rate-distortion in terms of channel coding. It’s a difficult problem to wrap your head around, and nearly impossible to describe in a few sentences. It’s one of those results that is trying to get at what information really is, and to what extent the “bit” formalism of information theory is a fundamental aspect of nature.

  • Data Fusion Trees for Detection: Does Architecture Matter?
    W-P. Tay, J. Tsitsiklis, and M. Win (MIT)

    Suppose we have a huge number of sensors that all observe iid samples that could come from one of two distributions. A fusion center wants to decide which distribution governs the samples, and can get quantized versions of the samples from the sensors. This paper is trying to get at how different fusion architectures (e.g. trees) perform versus a totally centralized scheme. The main result is that if the total number of leaves is asymptotically the same as the number of sensors and the tree has bounded height, then we can do as well as a completely centralized solution.

  • Randomization for robust communication in networks, or “Brother, can you spare a bit?”
    A.D. Sarwate and M. Gastpar (Berkeley)

    I don’t have much to say about this paper, since I wrote it. The basic idea is to justify the AVC model for interference in networks and to come up with some strategies for sensor networks and ad-hoc wireless networks to generate secret random keys to enable randomized coding on links in the network. For sensor networks, two terminals might be able to extract a shared random key from correlated sources. In wireless networks we can sacrifice some nodes to distribute keys to their neighbors in a decentralized scheme. The main point is that key distribution needs to happen only once — a new key for the next transmission can be sent with the current transmission with no asymptotic cost in rate.

  • Probabilistic Analysis of LP Decoding
    A.D.G. Dimakis, C. Daskalakis, R. Karp, and M. Wainwright (UC Berkeley)

    LP decoding is appealing since it is a way of efficiently decoding LDPC codes. The authors here provide a formalism for analyzing the performance of LP decoding for random errors (rather than the adversarial error model). The heuristic is that bits that are corrupted have “poison” that must be redistributed around the Tanner graph of the code using what they call a valid hyperflow. If such a flow exists then LP decoding succeeds. They then prove that for a large class of codes that are expanders, a hyperflow will exist with high probability.

  • Distributed Beamforming Using 1 Bit Feedback: From Concept to Realization
    R. Mudumbai (UC Santa Barbara), B. Wild (UC Berkeley), U. Madhow (UC Santa Barbara), and K. Ramchandran (UC Berkeley)

    The one bit feedback scheme for distributed beamforming is a way of aligning antennas using one bit of feedback about the received SNR. Imagine a large number of nodes communicating with a base station. Beamforming gains can be huge with a large number of antennas, so it would be good if they could adjust their phases to add coherently at the receiver. The authors describe their scheme and show some experimental results. There is also an analysis of the speed of convergence, which is linear in the number of antennas. (A toy simulation of this style of scheme is sketched after this list of talks.)

  • Achieving List Decoding Capacity Using Folded Reed-Solomon Codes
    V. Guruswami and A. Rudra (U. Washington)

    Consider a Reed-Solomon code of blocklength n over an alphabet of size q that communicates N bits. We can take this code and make it into a Folded Reed-Solomon code of blocklength n/m over an alphabet of size q^m that communicates N bits by taking m consecutive symbols and treating them as a single symbol over the larger alphabet. In doing this and some clever list-decoding modifications to the original Guruswami-Sudan algorithm, the authors can get arbitrarily close to the capacity of the erasure channel. The tricky thing is that the alphabet size has to increase as well. The construction and decoding are pretty cool though, and it makes me wish we had done some of this kind of stuff in my algebra classes.
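
Coming back to the one-bit feedback beamforming talk above: the scheme is simple enough to simulate, so here is a toy Python sketch of that style of algorithm as I understood it from the talk (the parameters and the uniform-perturbation details are my own guesses, not the authors’ setup). Every node tries a small random phase perturbation, the base station feeds back one bit saying whether the received signal strength improved, and the nodes keep the perturbation only if it did.

import numpy as np

rng = np.random.default_rng(0)

def one_bit_beamforming(n_nodes=50, n_iters=2000, delta=0.25):
    """Toy one-bit-feedback phase alignment: random perturbations are kept
    only when the single feedback bit says the received amplitude improved."""
    # Unknown channel phases that the nodes are trying to compensate for.
    channel = np.exp(1j * rng.uniform(0, 2 * np.pi, n_nodes))
    phases = np.zeros(n_nodes)
    best = np.abs(np.sum(channel * np.exp(1j * phases)))
    for _ in range(n_iters):
        trial = phases + rng.uniform(-delta, delta, n_nodes)
        amplitude = np.abs(np.sum(channel * np.exp(1j * trial)))
        if amplitude > best:        # the one feedback bit: "better" or not
            phases, best = trial, amplitude
    return best

print("received amplitude %.1f out of a coherent maximum of 50"
      % one_bit_beamforming())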

more IEEE election ridiculousness

The paper ballot that I was given for the IEEE general elections is misprinted! The two candidates for IEEE-USA President Elect are switched with the two candidates for Member-at-Large. The biographies/statements are printed correctly in the book, but the ballot is all wrong. What kind of an operation are these guys running anyway?

As a further note on the Information Theory Society ballot, one thing that was missing was a statement from any of the candidates — I basically had to make a decision on which 6 to vote for based on what I thought of their research, how they have comported themselves in talks, and (in a very few cases) personal interactions. None of these things really has much to do with how well they would do at running an organization, and there are certain policy issues which I think need to be addressed. It kind of makes the whole election thing into a referendum on research quality. I hope that changes in the future.

UPDATE : IEEE sent me a new ballot in the mail — that was pretty fast! Color me impressed…

Information Theory Society Ballots

I recently received my IEEE Information Theory Society Board of Governors ballot. It says there that

On the ballot card, names are listed in randomized order, no preference is intended.

Leaving the grammatical issues aside, what does this mean? This is the information theory crowd, so they should have told us how they were doing the randomizing! I mean, we’re supposed to get excited by that stuff, right? A more freewheeling (and incorrect) take on it is that they sent a different randomized ballot order to every one of the “over 6000” members of the society. Since there are 15 nominees for the board, there are a total of 15! = 1,307,674,368,000 different orderings, which would make the ballots-alphabet highly undersampled. Given that Alon Orlitsky is one of the candidates, perhaps that would be more appropriate…

Three classic results on the AVC

I wrote a summary of three classical results on arbitrarily varying channels that I think make a nice story. The first is the original 1960 proof by Blackwell, Breiman, and Thomasian. The second is Ahlswede’s elimination technique. The last is the symmetrizability result of Csiszár and Narayan. Together, these three results settle some basic questions regarding arbitrarily varying channels, and I tried to make the exposition self-contained and readable by people who have had a basic course in information theory.
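
For readers who haven’t seen it, the symmetrizability condition of Csiszár and Narayan (for ordinary decoding, i.e. list size one, and no state constraints) is, writing from memory, roughly the following; the summary has the precise statement. The channel $W(y|x,s)$ is symmetrizable if there is an auxiliary channel $U(s|x)$ from inputs to states such that

\sum_{s} W(y|x,s)\, U(s|x') = \sum_{s} W(y|x',s)\, U(s|x) \quad \text{for all } x, x', y

When such a $U$ exists, the jammer can make a spoofed codeword look statistically just like the transmitted one, and the deterministic-code capacity under the average error criterion is zero; when no such $U$ exists, that capacity equals the random coding capacity.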