dismissing research communities is counterproductive

I recently saw that Andrew Gelman hasn’t really heard of compressed sensing. As someone in the signal processing/machine learning/information theory crowd, it’s a little flabbergasting, but I think it highlights two things that aren’t really appreciated by the systems EE/algorithms crowd: 1) statistics is a pretty big field, and 2) the gulf between much statistical practice and what is being done in SP/ML research is pretty wide.

The other aspect of this is a comment from one of his readers:

Meh. They proved L1 approximates L0 when design matrix is basically full rank. Now all sparsity stuff is sometimes called ‘compressed sensing’. Most of it seems to be linear interpolation, rebranded.

I find such dismissals disheartening — there is a temptation to say that every time another community picks up some models/tools from your community that they are reinventing the wheel. As a short-hand, it can be useful to say “oh yeah, this compressed sensing stuff is like the old sparsity stuff.” However, as a dismissal it’s just being parochial — you have to actually engage with the use of those models/tools. Gelman says it can lead to “better understanding one’s assumptions and goals,” but I think it’s more important to “understand what others’ goals.”

I could characterize rate-distortion theory as just calculating some large deviations rate functions. Dembo and Zeitouni list RD as an application of the LDP, but I don’t think they mean “meh, it’s rebranded LDP.” For compressed sensing, the goal is to do the inference in a computationally and statistically efficient way. One key ingredient is optimization. If you just dismiss all of compressed sensing as “rebranded sparsity” you’re missing the point entirely.

ISIT Blogging, part 1

Here are my much-belated post-ISIT notes. I didn’t do as good a job of taking notes this year, so my points may be a bit cursory. Also, the offer for guest posts is still open! On a related note the slides from the plenary lectures are now available on Dropbox, and are also linked to from the ISIT website.

From compression to compressed sensing
Shirin Jalali (New York University, USA); Arian Maleki (Rice University, USA)
The title says it, mostly. Both data compression and compressed sensing use special structure in the signal to achieve a reduction in storage, but while all signals can be compressed (in a sense), not all signals can be compressively sensed. Can one get a characterization (with an algorithm) that can take a lossy source code/compression method, and use it to recover a signal via compressed sensing? They propose an algorithm called compressible signal pursuit to do that. The full version of the paper is on ArXiV.

Dynamic Joint Source-Channel Coding with Feedback
Tara Javidi (UCSD, USA); Andrea Goldsmith (Stanford University, USA)
This is a JSSC problem with a Markov source, which can be used to model a large range of problems, including some sequential search and learning problems (hence the importance of feedback). The main idea is to map the problem in to a partially-observable Markov decision problem (POMDP) and exploit the structure of the resulting dynamic program. They get some structural properties of the solution (e.g. what are the sufficient statistics), but there are a lot of interesting further questions to investigate. I usually have a hard time seeing the difference between finite and infinite horizon formulations, but here the difference was somehow easier for me to understand — in the infinite horizon case, however, the solution is somewhat difficult to compute.

Unsupervised Learning and Universal Communication
Vinith Misra (Stanford University, USA); Tsachy Weissman (Stanford University, USA)
This paper was about universal decoding, sort of. THe idea is that the decoder doesn’t know the codebook but it knows the encoder is using a random block code. However, it doesn’t know the rate, even. The question is really what can one say in this setting? For example, symmetry dictates that the actual message label will be impossible to determine, so the error criterion has to be adjusted accordingly. The decoding strategy that they propose is a partition of the output space (or “clustering”) followed by a labeling. They claim this is a model for clustering through an information theoretic lens, but since the number of clusters is exponential in the dimension of the space, I think that it’s perhaps more of a special case of clustering. A key concept in their development is something they call the minimum partition information, which takes the place of the maximum mutual information (MMI) used in universal decoding (c.f. Csiszár and Körner).

Farzin Haddadpour (Sharif University of Technology, Iran); Mahdi Jafari Siavoshani (The Chinese University of Hong Kong, Hong Kong); Mayank Bakshi (The Chinese University of Hong Kong, Hong Kong); Sidharth Jaggi (Chinese University of Hong Kong, Hong Kong)
Of course I had to go to this paper, since it was on AVCs. The main result is that if one considers maximal error but allow the encoder only to randomize, then one can achieve the same rates over the Gaussian AVC as one can with average error and no randomization. That is, allowing encoder randomization can move from average error to max error. An analogous result for discrete channels is in a classic paper by Csiszár and Narayan, and this is the Gaussian analogue. The proof uses a similar quantization/epsilon-net plus union bound that I used in my first ISIT paper (also on Gaussian AVCs, and finally on ArXiV), but it seems that the amount of encoder randomization needed here is more than the amount of common randomness used in my paper.

Coding with Encoding Uncertainty
Jad Hachem (University of California, Los Angeles, USA); I-Hsiang Wang (EPFL, Switzerland); Christina Fragouli (EPFL, Switzerland); Suhas Diggavi (University of California Los Angeles, USA)
This paper was on graph-based codes where the encoder makes errors, but the channel is ideal and the decoder makes no errors. That is, given a generator matrix $G$ for a code, the encoder wiring could be messed up and bits could be flipped or erased when parities are being computed. The resulting error model can’t just be folded into the channel. Furthermore, a small amount of error in the encoder (in just the right place) could be catastrophic. They focus just on edge erasures in this problem and derive a new distance metric between codewords that helps them characterize the maximum number of erasures that an encoder can tolerate. They also look at a random erasure model.

/

ISIT 2012 : more talks

Since I am getting increasingly delayed by post-ISIT and pre-SPCOM business, I am going to have to keep the rest of blogging about ISIT a little short. This post will mention some talks, and I’ll keep the other stuff for a (final) post.

Efficient Tracking of Large Classes of Experts
András György, Tamas Linder, Gabor Lugosi
This paper was on expanding the reference class against one is competing in a “prediction with experts” problem. Instead of doing well against the best expert chosen in hindsight, you compete against the best meta-expert which can switch between the existing experts. This leads to a transition diagram that is kind of complicated, but they propose a unifying approach which traces along branches — the key is that every transition path can be well approximated, so the space of possibilities one is tracking will not blow up tremendously.

Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
David Donoho, Adel Javanmard, Andrea Montanari
What a trendy title! Basically this problem looks at the compressed sensing problem when the sensing matrix is banded (this is what spatially coupled means), and solves it using Bayesian approximate message passing to do progressive decoding and elimination. The optimality is in the sense of matching with the Renyi dimension of the signal class for the data. I alas did not take notes for the next talk, which also seemed related: Hybrid Generalized Approximate Message Passing with Applications to Structured Sparsity (Sundeep Rangan, Alyson Fletcher, Vivek Goyal, Philip Schniter)

Quantized Stochastic Belief Propagation: Efficient Message-Passing for Continuous State Spaces
Nima Noorshams, Martin Wainwright
This problem was on BP when the state space is continuous — instead of passing the whole belief distribution, nodes pass along samples from the distribution and the receiving node does a kind of interpolation/estimate of the density. They show that this process converges on trees. This is related to a problem I’ve been thinking about for decentralized inference, but with a different approach.

Synchrony Amplification
Ueli Maurer, Björn Tackmann
This was a cool talk on a framework for thinking about synchrony in clocks — the model is pretty formal, and it’s something I never really think about but it seemed like a fun way to think about these problems. Basically they want to formalize how you can take a given clock (a sequence of ticks) and convert it into another clock. The goal is to not throw out too many ticks (which equals slowdown), while achieving synchrony.

Non-coherent Network Coding: An Arbitrarily Varying Channel Approach
Mahdi Jafari Siavoshani, Shenghao Yang, Raymond Yeung
Of course I have to go to a talk with AVC in the title. This looks at the same operator channel for network coding but then they assume the network matrix may be arbitrarily varying (with known rank). In this model they can define all the usual AVC concepts and they get similar sorts of results that you see for AVCs, like dichotomies between deterministic coding with average error and randomized coding.

Alternating Markov Chains for Distribution Estimation in the Presence of Errors
This talk was on the repetition channel and getting the redundancy of alternating patterns. They show upper and lower bounds. The idea is you start out with a word like abccd and it goes through a repetition channel to get aaabbcccdddd for example, and then you look instead at abcd by merging repeated letters.

On Optimal Two Sample Homogeneity Tests for Finite Alphabets
Jayakrishnan Unnikrishnan
A two-sample test means you have two strings $x^n$ and $y^n$ and you want to know if they are from the same distribution. He looked at the weak convergence of the asymptotically optimal test to get bounds on the false alarm probability.

Hypothesis testing via a comparator
Yury Polyanskiy
This was on a model where two nodes get to observe $X^n$ and $Y^n$ drawn i.i.d. from either $P_{XY}$ or $Q_{XY}$ and they separately compress their observations into messages $W_1$ and $W_1$. The decision rule is to decide $P_{XY}$ if $W_1 = W_2$. What’s the best exponent?

The Supermarket Game
Jiaming Xu, Bruce Hajek
This was on queuing. Customers come in and sample the loads of $L$ queues and then pick one to join. Their strategies may differ, so there is a game between the customers and this can affect the distribution of queue sizes. As a flavor of the weird stuff that can happen, suppose all customers but one only sample one queue and join that queue. Then the remaining customer will experience less delay if they sample two and join the shorter one. However, if all but one sample two and join the shorter one, then it’s better for her to sample just one. At least, that’s how I understood it. I’m not really a queueing guy.

ISIT 2009 : post the first

I’m at ISIT 2009 now at the COEX Center. Korea is pretty good so far (pictures to be posted later). The conference started today with a plenary by Rich Baraniuk, who talked about compressed sensing. I’d seen parts of the talk before, but it’s always nice to hear Rich speak because he’s so passionate about how cool the thing is.

I’ll try to blog about talks that I saw that were interesting as time permits — this time I’ll try to reduce the delay between talks and posts!