The editor-in-chief has a new stick

Here’s a little bit of news which may have escaped notice by some in the information theory community:

“In view of its concerns about excessive reviewing delays in the IT Transactions, the BoG authorizes the EiC in his sole judgment to delay publication of papers by authors who are derelict in their reviewing duties.”

Reviewers may be considered to be derelict if they habitually decline or fail to respond to review requests, or accept but then drop out; or if they habitually submit perfunctory and/or excessively delayed reviews.

I’ve said it before, but I think transparency is what will make reviews more timely: what is the distribution of review times? What is the relationship between paper length and review time? Plots like these might shock people and also give them some perspective on their own behavior. I bet some people who are “part of the problem” don’t even realize that they are part of the problem.

Postdoc job openings

Some people have told me about postdoctoral positions that are opening up, and I figured I’d repost some of them here as they come along. Of course, there are other places to post announcements, but I find that postdoc opportunities are a bit harder to advertise and hear about. I think a lot of systems EE people applying for academic positions right out of grad school tend to put off applying for postdocs until they hear back about their faculty interviews; I’d say this is a mistake:

  • If your graduation date is a little flexible, pinging someone early on (e.g. in the fall) about possible postdoc opportunities can be a good plan. NSF grant deadlines are in the fall, so they could write a postdoc position into a current proposal.
  • Of course you’re going to apply for faculty positions, and the people you’re talking to about postdoc positions know that. However, if you get to May and haven’t talked to anyone about postdoc options, you may find that those positions have filled up.
  • Don’t think of a postdoc as a “fallback plan” (akin to a “safety school”) — it’s an opportunity and a chance to make a strategic decision. Do you want to switch areas or learn about something new? Do you want to dig deeper into things you’ve already been working on? Do you want a springboard to get a job in a specific country? Do you want to build closer ties to industry? Do you want closer mentorship?

I went to a panel at Allerton once on “whether you should do a postdoc,” starring (among other people) Aaron Wagner and Todd Coleman, I believe. Everyone was very enthusiastic about doing a postdoc; then again, everyone on the panel had had a faculty position lined up and had deferred their start date to do that postdoc. That is the best of all possible worlds, but it’s pretty unusual, so don’t count on it.

This is all dodging the issue of whether or not you should even do a postdoc. That might be a topic for a different post (or a debate for the comments) — I know people have strong feelings on both sides. I tend to think our system is broken or veering into brokenness.

However, more information is more power, so if you have a postdoc announcement (details are helpful) and want me to post it here, please do send it my way. You can also try to post to the IT Society website.

Le charme discret de l’entropie?

TTI Chicago is located on the University of Chicago campus. I am a UChicago affiliate, which means I get to use the libraries here. Unfortunately, the University is almost opposed to the idea of engineering, so things like institutional subscriptions to IEEExplore are out. Compounding this is a kind of old-fashionedness about the university which makes a lot of their collection… dated. However, since information theory is math-like, the university does have a fair number of information theory books and resources, including a number of textbooks and monographs I had not seen before. One such is Stanford Goldman’s Information Theory, whose first chapter is entitled:

[Figure: scan of the chapter title from Goldman’s book]

There is also a cute figure:

[Figure: a cute illustration from Goldman’s book, captioned “Moo!”]

A comment about maximal versus average error

In my previous post I made some vague comments about maximal and average error which I think are not complicated but don’t often arise when we think about plain vanilla information theory problems. These come up in arbitrarily varying channels (AVCs), but they also help motivate some interesting alternative models.

We most often think of error criteria as being about the desiderata for communication: asking for the maximal error over messages to be small is more stringent than asking for the average error. However, if we have a code with small average error, we can expurgate bad codewords to get a code with small maximal error: if the average error is \epsilon, then by Markov’s inequality at most half the codewords can have error probability exceeding 2\epsilon, so throwing out the worst half costs at most one bit of rate and leaves a code whose maximal error is at most 2\epsilon. With that meta-argument in hand, at a first pass we’re done: average error and maximal error are pretty much the same.
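To make the expurgation step concrete, here is a minimal numerical sketch. The error profile below is made up purely for illustration, not from any real code:

```python
# Hypothetical per-message error probabilities for a toy code:
# the average error is small even though one message is terrible.
per_message_error = [0.01] * 9 + [0.5]  # N = 10 messages
avg = sum(per_message_error) / len(per_message_error)  # 0.059

# Expurgate: keep the best half of the codewords. By Markov's
# inequality, no survivor can have error above twice the old average.
survivors = sorted(per_message_error)[: len(per_message_error) // 2]

print(f"average error: {avg}")                            # 0.059
print(f"max error after expurgation: {max(survivors)}")   # 0.01
```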

However, choosing an error criterion can also be a choice about the communication model. If the noise can somehow depend on what is being transmitted, then this distinction can come up. If the error criterion is maximal error and the interference is arbitrary, then for each message and each interference signal, I need to guarantee that probability of error is small. Implicitly, this allows the interference to be chosen as a function of the message. Under average error, the interference cannot be a function of the message.

If we allow for common randomness, the transmitter can mask the dependence of the codeword on the message. Suppose we have a deterministic code with N codewords which has small average error. Then we can create a randomized code by choosing a random permutation \pi of \{1, 2, \ldots, N\} and encoding message m with the \pi(m)-th codeword of the original code. In this way we get the maximal error to be small: for each fixed message, the error probability averaged over the shared permutation is exactly the average error of the original code.
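Here is a quick simulation of that permutation trick on the same made-up error profile as above; the trial count and numbers are arbitrary, and this is only a sketch of the idea:

```python
import random

per_message_error = [0.01] * 9 + [0.5]  # same hypothetical toy code
N = len(per_message_error)

def randomized_error(m, trials=20_000, seed=0):
    """Error of message m when the encoder and decoder share a uniform
    random permutation pi and use codeword pi(m) for message m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pi = rng.sample(range(N), N)  # a fresh shared permutation
        total += per_message_error[pi[m]]
    return total / trials

# Every message now sees roughly the original average error (0.059),
# so the maximal error over messages is small as well.
print([round(randomized_error(m), 3) for m in range(N)])
```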

From this it’s clear that the choice of error criterion is also a modeling choice, especially when there is no common randomness allowed. However, this is all looking at the case where the interference can depend on the message. The trickier situation is when the interference can depend on the transmitted codeword, which is what I discussed in the previous post. In that setting using common randomness to mask the message-codeword dependence doesn’t buy that much. What I was working on recently was how much it buys you when the dependence is weakened.

As a final note, this model of interference depending on the transmitted codeword appears in a stochastic form under the name “correlated jamming” and has been worked on by several researchers, including Muriel Médard, Shabnam Shafiee, and Sennur Ulukus. My goal was to get an AVC and geometric perspective on these kinds of communication models.

ISIT: plenaries and thoughts

Just a few brief notes on the plenaries. Prakash Narayan gave a nice talk on his work on secrecy generation and related problems. It was nice because it tied together a number of different models in one talk, so that someone who had only looked at wiretap problems could see a more unified approach. It was a little technical for my pre-breakfast brain, though. Ueli Maurer gave an overview of his new approach to cryptography. I had seen a version of this before, and it was full of pictures to illustrate the reductions and interfaces he was trying to create; I think if I had more of a background in formal CS-style cryptography I might have understood it a bit better. It feels like an attempt to build a different style of bridge between theory (formal reasoning about security) and practice.

Abbas El Gamal gave a rather personal Shannon Lecture, taking us through a number of stages in his research life, together with some perspectives on his new book with Young-Han Kim on network information theory. He ended by calling for the IT community to really go and tackle new problems and develop new tools and models to do that. One of the things that came across more sharply for me in this ISIT, partly due to the Cover memorial, is that information theory really is a research community. Of course, there are groups and cliques and politics and problems, but each ISIT is a real coming together that reinforces that sense of community. That’s valuable.

ISIT 2012: more talks

Since I am getting increasingly delayed by post-ISIT and pre-SPCOM business, I am going to have to keep the rest of blogging about ISIT a little short. This post will mention some talks, and I’ll keep the other stuff for a (final) post.

Efficient Tracking of Large Classes of Experts
András György, Tamás Linder, Gábor Lugosi
This paper was on expanding the reference class against which one is competing in a “prediction with experts” problem. Instead of doing well against the best expert chosen in hindsight, you compete against the best meta-expert, which can switch between the existing experts. This leads to a transition diagram that is kind of complicated, but they propose a unifying approach which traces along branches; the key is that every transition path can be well approximated, so the space of possibilities one is tracking does not blow up tremendously.
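For context, the classical baseline for competing against switching experts is the fixed-share algorithm of Herbster and Warmuth, which this line of work generalizes to much larger meta-expert classes. Below is a minimal sketch of fixed share with illustrative parameters of my choosing; it is not the algorithm from the paper:

```python
import math

def fixed_share(losses, eta=0.5, alpha=0.05):
    """Fixed-share weights (Herbster-Warmuth): a multiplicative update
    plus a small uniform mixing term, so weight can flow back to
    experts that start doing well after a switch."""
    n = len(losses[0])
    w = [1.0 / n] * n
    history = []
    for loss in losses:
        history.append(w[:])  # weights used at this round
        v = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        z = sum(v)
        w = [(1 - alpha) * vi / z + alpha / n for vi in v]
    return history

# Expert 0 is best for 50 rounds, then expert 1 takes over.
losses = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
hist = fixed_share(losses)
print(hist[49], hist[99])  # mass shifts from expert 0 to expert 1
```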

Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
David Donoho, Adel Javanmard, Andrea Montanari
What a trendy title! Basically, this looks at the compressed sensing problem when the sensing matrix is banded (this is what “spatially coupled” means), and solves it using Bayesian approximate message passing to do progressive decoding and elimination. The optimality is in the sense of matching the Rényi information dimension of the signal class. Alas, I did not take notes for the next talk, which also seemed related: Hybrid Generalized Approximate Message Passing with Applications to Structured Sparsity (Sundeep Rangan, Alyson Fletcher, Vivek Goyal, Philip Schniter).

Quantized Stochastic Belief Propagation: Efficient Message-Passing for Continuous State Spaces
Nima Noorshams, Martin Wainwright
This paper was on BP when the state space is continuous: instead of passing the whole belief distribution, nodes pass along samples from it, and the receiving node forms a kind of interpolated estimate of the density. They show that this process converges on trees. This is related to a problem I’ve been thinking about for decentralized inference, but with a different approach.
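Stripped of all the actual BP machinery, the message-compression idea can be sketched as follows (a simplification of my own, not the authors’ algorithm): send k samples from a belief instead of the belief itself, and let the receiver rebuild an empirical estimate.

```python
import random
from collections import Counter

def send_samples(belief, k, rng):
    """Transmit k samples from a (discretized) belief vector instead
    of the full vector of probabilities."""
    return rng.choices(range(len(belief)), weights=belief, k=k)

def rebuild(samples, size):
    """Receiver's empirical estimate of the sender's belief."""
    counts = Counter(samples)
    return [counts[x] / len(samples) for x in range(size)]

rng = random.Random(1)
belief = [0.5, 0.3, 0.2]
print(rebuild(send_samples(belief, 1000, rng), 3))  # close to belief
```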

Synchrony Amplification
Ueli Maurer, Björn Tackmann
This was a cool talk on a framework for thinking about synchrony in clocks. The model is pretty formal, and it’s something I never really think about, but it seemed like a fun way to approach these problems. Basically, they want to formalize how you can take a given clock (a sequence of ticks) and convert it into another clock. The goal is not to throw out too many ticks (which amounts to a slowdown) while achieving synchrony.

Non-coherent Network Coding: An Arbitrarily Varying Channel Approach
Mahdi Jafari Siavoshani, Shenghao Yang, Raymond Yeung
Of course I have to go to a talk with AVC in the title. This looks at the operator channel model for non-coherent network coding, but they assume the network transfer matrix may be arbitrarily varying (with known rank). In this model they can define all the usual AVC concepts, and they get the same sorts of results that you see for AVCs, like dichotomies between deterministic coding with average error and randomized coding.

Alternating Markov Chains for Distribution Estimation in the Presence of Errors
Farzad Farnoud, Narayana Prasad Santhanam, Olgica Milenkovic
This talk was on the repetition channel and the redundancy of alternating patterns. They show upper and lower bounds. The idea is that a word like abccd goes through a repetition channel to produce, for example, aaabbcccdddd, and you then look at the alternating pattern abcd obtained by merging repeated letters.
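The merging operation is just collapsing runs of repeated letters; a few lines make the example concrete:

```python
from itertools import groupby

def merge_repeats(word):
    """Collapse runs of repeated letters to get the alternating pattern."""
    return "".join(key for key, _ in groupby(word))

print(merge_repeats("aaabbcccdddd"))  # abcd
print(merge_repeats("abccd"))         # abcd -- same pattern as above
```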

On Optimal Two Sample Homogeneity Tests for Finite Alphabets
Jayakrishnan Unnikrishnan
A two-sample test means you have two strings x^n and y^n and you want to know if they are from the same distribution. He looked at the weak convergence of the asymptotically optimal test to get bounds on the false alarm probability.

Hypothesis testing via a comparator
Yury Polyanskiy
This was on a model where two nodes observe X^n and Y^n drawn i.i.d. from either P_{XY} or Q_{XY}, and they separately compress their observations into messages W_1 and W_2. The decision rule is to decide P_{XY} if W_1 = W_2. What’s the best error exponent?

The Supermarket Game
Jiaming Xu, Bruce Hajek
This was on queueing. Customers come in and sample the loads of L queues and then pick one to join. Their strategies may differ, so there is a game between the customers, and this can affect the distribution of queue sizes. As a flavor of the weird stuff that can happen: suppose all customers but one sample only one queue and join it. Then the remaining customer will experience less delay if they sample two and join the shorter one. However, if all but one sample two and join the shorter, then it’s better for her to sample just one. At least, that’s how I understood it. I’m not really a queueing guy.
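The benefit of sampling more queues is the classical “power of two choices” effect. Here is a toy discrete-time simulation of that effect (a simplified model with made-up parameters, not the game-theoretic setup of the paper):

```python
import random

def avg_queue_length(d, L=100, load=0.9, steps=5000, seed=0):
    """Each step, about load*L customers arrive; each samples d queues
    uniformly at random and joins the shortest; each nonempty queue
    then serves one customer."""
    rng = random.Random(seed)
    q = [0] * L
    for _ in range(steps):
        for _ in range(int(load * L)):
            picks = rng.sample(range(L), d)
            q[min(picks, key=lambda i: q[i])] += 1
        q = [max(0, x - 1) for x in q]
    return sum(q) / L

print(avg_queue_length(d=1))  # long queues under random assignment
print(avg_queue_length(d=2))  # markedly shorter queues
```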

Tracking the origin of genies

Lalitha Sankar asked Gerhard Kramer about my earlier question about genies. Gerhard wrote:

I got the name from Jim Massey who had suggested it as part of a title for the thesis of another doctoral student I know.

I have heard this attributed to Gallager, but the word “genie” might even come up in the Wozencraft-Jacobs book from the mid-60s (not sure!). I suspect that it goes back even further.

A little further searching along those directions turned up some more hits. On page 366 of Viterbi and Omura’s 1979 text Principles of Digital Communication and Coding, while discussing the distribution of computation in convolutional codes, they write: “[W]e begin by considering a sequential decoder aided by a benevolent genie who oversees the decoder action on each incorrect subset.”

But indeed, as Gerhard indicates, there is a reference in Wozencraft and Jacobs (1965). Rimoldi and Urbanke, in their paper on rate splitting, write: “[C]onceptually, we follow the lead of Wozencraft and Jacobs [29, p. 419] and postulate a genie who always knows the codeword of user 2…” Following up on that citation, in the context of decoding convolutional codes, Wozencraft and Jacobs write: “… assume initially that a magic genie directs the decoder to the correct starting node for determining each \hat{x}_h…”

In their bibliographic notes, Viterbi and Omura write

As was noted previously, the original sequential decoding algorithm was proposed and analyzed by Wozencraft [1957]. The Fano algorithm [1963], with various minor modifications, has been analyzed by Yudkin [1964], Wozencraft and Jacobs [1965], Gallager [1968], and Jelinek [1968a]. Two versions of stack algorithms and their performance analyses are due to Zigangirov [1966] and Jelinek [1969a]. The precise form of the Pareto distribution on computation emerged from the works of Savage [1966] for the upper bound, and of Jacobs and Berlekamp [1967] for the lower bound.

So it seems that if the argument is due to Wozencraft, then the source of the genie argument in this context is probably the Wozencraft and Jacobs book, but credit for the genie analogy itself is probably lost to time.

Converse genie etymology

In his Shannon Lecture, Abbas mentioned that his work with Costa on deterministic interference channels was the first to use “genie-aided” converse arguments (essentially assuming the decoder has more information), but that they “lacked the imagination” to use that name. The question is, who did come up with the term “genie” in connection with converses?