For those readers of the blog who have not submitted papers to machine learning (or related) conferences, the conference review process is a bit like a mini-version of a journal review. You (as the author) get the reviews back and have to write a response, and then the reviewers discuss the paper and (possibly, though in my experience rarely) revise their reviews. They are, however, generally supposed to take the response into account in the discussion. In some cases people even adjust their scores; when I’ve been a reviewer I often have, especially if the author response addresses my questions.

This morning I had the singular experience of having a paper rejected from ICML 2014 in which all of the reviewers specifically marked that they did not read and consider the response. Based on the initial scores the paper was borderline, so the rejection is not surprising. However, we really did try to address the criticisms in our rebuttal; in particular, some reviewers misunderstood what our claims were. Had they bothered to read our response (and proposed edits), perhaps they would have realized this.

Highly selective (computer science) conferences often tout their reviews as being just as good as a journal’s, but in both outcome and process this is a pretty ludicrous claim. I know this post may sound like sour grapes, but it’s not about the outcome; it’s about the process. Why bother with the facade of inviting authors to rebut if the reviewers are unwilling to read the response?

The 10th IEEE International Conference on Distributed Computing in Sensor Systems has issued a call for papers. Deadlines are 1/31 and 2/7.

After attending GlobalSIP I flew to Reno and drove to South Lake Tahoe for NIPS 2013. NIPS is a large conference that is unfortunately single-track. All papers are presented as posters, and a very small number are selected for longer oral presentations. A slightly larger number are selected for 5-minute “spotlight” advertisements. The poster session runs from 7 to 11 PM on each of the first three days, and each session contains around 90 posters in a giant room. It’s very loud, and some poster presenters lose their voice for a day or two after presenting.

The contrast with GlobalSIP could not be starker. Obviously these are very different venues, but I found that all of the noise and commotion at NIPS made it nigh impossible for me to understand or retain any explanations at the poster session. Instead, I found myself circling titles in my program guide so that I could take a look at the papers later. Perhaps it was harder for me since, as an “outsider,” I have more to learn about the basic models/assumptions in most of the papers, and I need more of an explanation than most.

In a sense a poster is “better” for the viewer because they can see what they want/need. You can get an explanation “at your level” from the poster presenter, and it’s more interactive than sitting through some 20-minute talk where the presenter feels the need to have a TOC slide (cf. ISIT). But the lack of noise isolation and the sheer volume of posters is not ideal for actually digesting new ideas. I wonder if the NIPS model is really sustainable, and if they would ever consider going to parallel sessions. I think that even with posters, some isolation would help tremendously.

I’m in Austin right now for the first GlobalSIP conference. The conference has a decentralized organization, with semi-independent day-long workshops (“symposia”) scheduled in parallel with each other. There are 8 of these, with 6 running in parallel on a given day; each consists of 1 session of “plenary” talks and 2 poster sessions. Each workshop is scheduled as AAB, ABA, or BAA, where A = posters and B = plenaries, so at any given time there are 2 talk sessions and 4 poster sessions running in parallel.

Fortunately, there is a wide range of topics covered in the workshops, from biology to controlled sensing to financial signal processing. The downside is that the actual papers in each workshop would often fit well in other workshops. For example, the distributed optimization posters (in which I am interested) were sprinkled all over the place. This probably has a lot to do with the decentralized organization.

In terms of the “results” at the conference, it seems from my cursory view that many people are presenting “extra” results from other conference papers, or preliminary work for future papers. This actually works well in the poster format: for the former, the poster contains a lot of information about the “main result” as well, and for the latter, the poster is an invitation to think about future work. In general I’m a little ambivalent about posters, but if you’re going to have to do ‘em, a conference like this may be a better way to do it.

Here are my much-belated post-ISIT notes. I didn’t do as good a job of taking notes this year, so my points may be a bit cursory. Also, the offer for guest posts is still open! On a related note the slides from the plenary lectures are now available on Dropbox, and are also linked to from the ISIT website.

From compression to compressed sensing
Shirin Jalali (New York University, USA); Arian Maleki (Rice University, USA)
The title says it, mostly. Both data compression and compressed sensing use special structure in the signal to achieve a reduction in storage, but while all signals can be compressed (in a sense), not all signals can be compressively sensed. Can one get a characterization (with an algorithm) that takes a lossy source code/compression method and uses it to recover a signal via compressed sensing? They propose an algorithm called compressible signal pursuit to do that. The full version of the paper is on arXiv.
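To make the idea concrete, here is a toy sketch (entirely my own construction, not the authors’ algorithm): if a lossy compressor induces a finite set of reconstruction points, one can recover a compressible signal from random linear measurements by searching that set for the point most consistent with the measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "codebook": the reconstruction points of a hypothetical lossy
# compressor. In the paper's setting these would come from an actual
# compression algorithm; here they are just 16 vectors in R^8.
codebook = [rng.standard_normal(8) for _ in range(16)]

n_measurements = 4
A = rng.standard_normal((n_measurements, 8))  # random sensing matrix

x_true = codebook[7]   # unknown signal is "compressible": it lies in the codebook
y = A @ x_true         # compressed measurements (fewer than the ambient dimension)

# Pursuit-style recovery: pick the reconstruction point that best
# explains the measurements.
errors = [np.linalg.norm(y - A @ c) for c in codebook]
x_hat = codebook[int(np.argmin(errors))]

print(np.allclose(x_hat, x_true))
```

Even with only 4 measurements in an 8-dimensional space, recovery succeeds here because the search is restricted to the compressor’s (small) set of reconstruction points rather than all of $\mathbb{R}^8$.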

Dynamic Joint Source-Channel Coding with Feedback
Tara Javidi (UCSD, USA); Andrea Goldsmith (Stanford University, USA)
This is a JSCC problem with a Markov source, which can be used to model a large range of problems, including some sequential search and learning problems (hence the importance of feedback). The main idea is to map the problem into a partially observable Markov decision process (POMDP) and exploit the structure of the resulting dynamic program. They get some structural properties of the solution (e.g. what the sufficient statistics are), but there are a lot of interesting further questions to investigate. I usually have a hard time seeing the difference between finite and infinite horizon formulations, but here the difference was somehow easier for me to understand — in the infinite horizon case, however, the solution is somewhat difficult to compute.

Unsupervised Learning and Universal Communication
Vinith Misra (Stanford University, USA); Tsachy Weissman (Stanford University, USA)
This paper was about universal decoding, sort of. The idea is that the decoder doesn’t know the codebook, but it knows the encoder is using a random block code. It doesn’t even know the rate. The question is really what one can say in this setting. For example, symmetry dictates that the actual message label will be impossible to determine, so the error criterion has to be adjusted accordingly. The decoding strategy that they propose is a partition of the output space (or “clustering”) followed by a labeling. They claim this is a model for clustering through an information theoretic lens, but since the number of clusters is exponential in the dimension of the space, I think that it’s perhaps more of a special case of clustering. A key concept in their development is something they call the minimum partition information, which takes the place of the maximum mutual information (MMI) used in universal decoding (cf. Csiszár and Körner).

Farzin Haddadpour (Sharif University of Technology, Iran); Mahdi Jafari Siavoshani (The Chinese University of Hong Kong, Hong Kong); Mayank Bakshi (The Chinese University of Hong Kong, Hong Kong); Sidharth Jaggi (Chinese University of Hong Kong, Hong Kong)
Of course I had to go to this paper, since it was on AVCs. The main result is that if one considers maximal error but allows the encoder to randomize, then one can achieve the same rates over the Gaussian AVC as one can with average error and no randomization. That is, encoder randomization lets one move from average error to maximal error. An analogous result for discrete channels is in a classic paper by Csiszár and Narayan, and this is the Gaussian analogue. The proof uses a similar quantization/epsilon-net plus union bound argument to the one I used in my first ISIT paper (also on Gaussian AVCs, and finally on arXiv), but it seems that the amount of encoder randomization needed here is more than the amount of common randomness used in my paper.

Coding with Encoding Uncertainty
Jad Hachem (University of California, Los Angeles, USA); I-Hsiang Wang (EPFL, Switzerland); Christina Fragouli (EPFL, Switzerland); Suhas Diggavi (University of California Los Angeles, USA)
This paper was on graph-based codes where the encoder makes errors, but the channel is ideal and the decoder makes no errors. That is, given a generator matrix $G$ for a code, the encoder wiring could be messed up and bits could be flipped or erased when parities are being computed. The resulting error model can’t just be folded into the channel. Furthermore, a small amount of error in the encoder (in just the right place) could be catastrophic. They focus just on edge erasures in this problem and derive a new distance metric between codewords that helps them characterize the maximum number of erasures that an encoder can tolerate. They also look at a random erasure model.

One big difference between reviewing for conferences like NIPS/ICML and ISIT is that there is a “discussion” period between the reviewers and the Area Chair. These discussions are not anonymized, so you know who the other reviewers are and you can also read their reviews. This leads to a little privacy problem — A and B may be reviewing the same paper P, but A may be an author on a paper Q which is also being reviewed by B. Because A will have access to the text of B’s reviews on P and Q, they can (often) unmask B’s authorship of the review on Q simply by looking at the formatting of the reviews (are bullet points dashes or asterisks, do they give numbered points, are there “sections” to the review, etc). This seems to violate the spirit of anonymous review, which is perhaps why some have suggested that reviewing be unblinded (at least after acceptance).

The extent to which all of this matters is of course a product of how fast the machine learning literature has grown and the highly competitive nature of the “top tier conferences.” Because the acceptance rate is so low, the reviewing process can appear “arbitrary” (read: subjective), and so questions of both review quality and author/reviewer anonymity impact possible biases. However, if the aim of double-blind reviewing is to reduce bias, then shouldn’t the discussions also be anonymized?

I’m on the program committee for the Cyber-Security and Privacy symposium, so I figured I would post this here to make more work for myself.

GlobalSIP 2013 – Call for Papers
IEEE Global Conference on Signal and Information Processing
December 3-5, 2013 | Austin, Texas, U.S.A.

GlobalSIP: IEEE Global Conference on Signal and Information Processing is a new flagship IEEE Signal Processing Society conference. The focus of this conference is on signal and information processing and up-and-coming signal processing themes.

GlobalSIP is composed of symposia on hot topics related to signal and information processing, selected on the basis of responses to the call for symposium proposals.

The selected symposia are:

Paper submission will be online only through the GlobalSIP 2013 website. Papers should be in IEEE two-column format. The maximum length varies among the symposia; be sure to check each symposium’s information page for details. Authors of Signal Processing Letters papers will be given the opportunity to present their work at GlobalSIP 2013, subject to space availability and approval by the Technical Program Chairs of GlobalSIP 2013. The authors need to specify in which symposium they wish to present their paper. Please check the conference webpage for details.

Important Dates:
*New* Paper Submission Deadline – June 15, 2013
Review Results Announced – July 30, 2013
Camera-Ready Papers Due – September 7, 2013
*New* SPL request for presentation – September 7, 2013

I’m at the Bellairs Research Institute for a workshop this week and I’ll blog a bit later about some of the interesting talks here. We give the talks on the balcony of one of the buildings, projected on the wall. Fortunately, we are facing west, which means talks have to end at around 2:30 before people start baking to death. After all that superheated research the only thing to do, really, is cool off in the ocean next door…

The beach at Bellairs

Again a caveat — these are the talks in which I took reasonable enough notes to write anything coherent.

Green Communication: From Maxwell’s Demon to “Informational Friction”
Pulkit Grover
Pulkit talked about trying to tie a physical interpretation to the energy used in communication during computation. Physicists might argue that reversible computation costs nothing, but this ignores friction and noise. Pulkit discussed a simple network model to account for “informational friction,” which penalizes the bit-distance product in communicating on a chip. See also Pulkit’s short video on the topic.

Hajar Mahdavi-Doost, Roy Yates
Roy talked about a model in which receivers have to harvest the energy they need for sampling/buffering/decoding the transmissions. These three tasks cost different amounts, and in particular, the rate at which the receiver samples the output dictates the other parameters. The goal is to choose a rate which helps meet the decoder energy requirements. Because the receiver has to harvest the energy it needs, it has to design a policy to switch between the three operations while harvesting the (time-varying) energy available to it.

Multiple Access and Two-way Channels with Energy Harvesting and Bidirectional Energy Cooperation
Kaya Tutuncuoglu, Aylin Yener
Unlike the previous talk, this was about encoders which have to transmit energy to the receivers — there’s a tradeoff between transmitting data and energy, and in the MAC and TWC there is yet another dimension in how the two users can cooperate. For example, they can cooperate in energy transmission but not in data transmission. There were a lot of results in here, but there was also a discussion of policies for the users. In particular, a “procrastination” strategy turns out to work well (rejoice!).

An equivalence between network coding and index coding
Michelle Effros, Salim El Rouayheb, Michael Langberg
The title says it all! For every network coding problem (multiple unicast, multicast, whatever), there exists a corresponding index coding problem (constructed via a reduction) such that a solution to the latter can be easily translated to a solution for the former. This equivalence holds for all network coding problems, not just linear ones.

Crowd-sourcing epidemic detection
Constantine Caramanis, Chris Milling, Shie Mannor, Sanjay Shakkottai
Suppose we have a graph and we can see some nodes are infected. This paper was on trying to distinguish between whether the infected nodes started from a single point infection spread via an SI model, or just from a random pattern of infection. They provide two algorithms for doing this and then address how to deal with false positives using ideas from robust statistics.
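As a crude illustration (my own toy heuristic, not the authors’ algorithms): an SI epidemic spreads along edges from a seed, so the infected set tends to form a connected subgraph, while a random pattern of infection usually does not. A BFS connectivity check on the infected nodes then gives a naive discriminator.

```python
from collections import deque

def infected_is_connected(infected, neighbors):
    """Check whether the infected nodes form a connected subgraph, via BFS
    restricted to infected nodes."""
    infected = set(infected)
    if not infected:
        return True
    start = next(iter(infected))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v in infected and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == infected

# Path graph 0-1-2-...-9, a stand-in for a contact network.
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}

print(infected_is_connected({3, 4, 5, 6}, neighbors))  # SI-like spread from node 4
print(infected_is_connected({0, 4, 9}, neighbors))     # scattered random infection
```

The actual algorithms in the paper are more refined than this (and handle false positives), but the connectivity intuition is the starting point.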

I promised some ITA blogging, so here it is. Maybe Alex will blog a bit too. These notes will by necessity be cursory, but I hope some people will find some of these papers interesting enough to follow up on them.

A Reverse Pinsker Inequality
Daniel Berend, Peter Harremoës , Aryeh Kontorovich
Aryeh gave this talk on what we can say about bounds in the reverse direction of Pinsker’s inequality. Of course, in general you can’t say much, but what they do is give an expansion of the KL divergence in terms of the total variation distance, with coefficients depending on the balance coefficient of the distribution, $\beta = \inf \{ P(A) : P(A) \ge 1/2 \}$.
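For orientation, the forward direction is easy to check numerically. A quick sanity check of Pinsker’s inequality, $\|P - Q\|_{TV} \le \sqrt{D(P\|Q)/2}$, on a pair of Bernoulli distributions (my own example, not from the paper):

```python
from math import log, sqrt

def tv(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl(p, q):
    """KL divergence D(P||Q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]    # Bernoulli(1/2)
Q = [0.25, 0.75]  # Bernoulli(1/4)

# Pinsker's inequality: TV(P, Q) <= sqrt(D(P||Q) / 2).
print(tv(P, Q) <= sqrt(kl(P, Q) / 2))
```

The reverse direction is what cannot hold with a universal constant — KL can be infinite while TV stays bounded — which is why the bound in the paper has to depend on the distribution through $\beta$.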

Unfolding the entropy power inequality
Mokshay gave a talk on the entropy power inequality. Given vector random variables $X_1$ and $X_2$, the entropy power inequality tells us that $h(X_1 + X_2) \ge h(Z_1 + Z_2)$, where $Z_1$ and $Z_2$ are isotropic Gaussian vectors with the same differential entropies as $X_1$ and $X_2$. The question in this paper is this: can we insert a term between the two sides of the inequality? The answer is yes! They define a spherical rearrangement of the densities of $X_1$ and $X_2$ into variables $X_1^{\ast}$ and $X_2^{\ast}$ with spherically symmetric decreasing densities, and show that the differential entropy of their sum lies between the two terms of the regular EPI.
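As I understood the talk, the resulting chain of inequalities can be written as:

```latex
% X_1^*, X_2^* are spherically symmetric decreasing rearrangements of the
% densities of X_1, X_2; Z_1, Z_2 are isotropic Gaussians with h(Z_i) = h(X_i).
h(X_1 + X_2) \;\ge\; h(X_1^{\ast} + X_2^{\ast}) \;\ge\; h(Z_1 + Z_2)
```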

This talk was on tradeoffs in caching. If there are $N$ files, $K$ users, and a size-$M$ cache at each user, how should users cache files so as to best allow a broadcaster to serve them over a shared link? More simply, suppose there are three people who may each want to watch one of three different TV shows, and each can buffer one show’s worth of content. Since a priori you don’t know which show each of them wants to watch, a natural idea is to cache the first third of each show at each user. They show that this is highly suboptimal: because the content provider can XOR parts of the content to different users, the caching strategy should not be the same at each user, and the real benefit comes from the global cache size.
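To see the size of the gain in the three-shows example, here is a quick calculation using the rate expression from the coded caching literature (my own addition — I’m assuming this matches the scheme in the talk): with coded delivery the uncoded broadcast rate $K(1 - M/N)$ shrinks by the global caching gain $1/(1 + KM/N)$.

```python
from fractions import Fraction

def uncoded_rate(N, K, M):
    """Broadcast rate (in files) when every user caches the same M/N
    fraction of each file and no coding is used."""
    return K * (1 - Fraction(M, N))

def coded_rate(N, K, M):
    """Rate with coded (XOR) delivery: the uncoded rate divided by the
    global caching gain 1 + K*M/N."""
    return K * (1 - Fraction(M, N)) / (1 + K * Fraction(M, N))

# Three users, three shows, each user can cache one show's worth of content.
print(uncoded_rate(3, 3, 1))  # 2 files must be broadcast
print(coded_rate(3, 3, 1))    # 1 file suffices with coded delivery
```

Even in this tiny example the load on the shared link is halved, and the gain grows with the number of users because it depends on the aggregate cache size $KM$, not the per-user size $M$.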