ISIT Blogging, part 1

Here are my much-belated post-ISIT notes. I didn’t do as good a job of taking notes this year, so my points may be a bit cursory. Also, the offer for guest posts is still open! On a related note, the slides from the plenary lectures are now available on Dropbox, and are also linked from the ISIT website.

From compression to compressed sensing
Shirin Jalali (New York University, USA); Arian Maleki (Rice University, USA)
The title says it, mostly. Both data compression and compressed sensing use special structure in the signal to achieve a reduction in storage, but while all signals can be compressed (in a sense), not all signals can be compressively sensed. Can one find an algorithm that takes a lossy source code/compression method and uses it to recover a signal via compressed sensing? They propose an algorithm called compressible signal pursuit to do exactly that. The full version of the paper is on ArXiV.

Dynamic Joint Source-Channel Coding with Feedback
Tara Javidi (UCSD, USA); Andrea Goldsmith (Stanford University, USA)
This is a JSCC problem with a Markov source, which can be used to model a large range of problems, including some sequential search and learning problems (hence the importance of feedback). The main idea is to map the problem into a partially observable Markov decision problem (POMDP) and exploit the structure of the resulting dynamic program. They get some structural properties of the solution (e.g. what the sufficient statistics are), but there are a lot of interesting further questions to investigate. I usually have a hard time seeing the difference between finite- and infinite-horizon formulations, but here the difference was somehow easier for me to understand; in the infinite-horizon case, however, the solution is somewhat difficult to compute.

Unsupervised Learning and Universal Communication
Vinith Misra (Stanford University, USA); Tsachy Weissman (Stanford University, USA)
This paper was about universal decoding, sort of. The idea is that the decoder doesn’t know the codebook but knows the encoder is using a random block code; it doesn’t even know the rate. The question is really what one can say in this setting. For example, symmetry dictates that the actual message label will be impossible to determine, so the error criterion has to be adjusted accordingly. The decoding strategy they propose is a partition of the output space (or “clustering”) followed by a labeling. They claim this is a model for clustering through an information-theoretic lens, but since the number of clusters is exponential in the dimension of the space, I think it’s perhaps more of a special case of clustering. A key concept in their development is something they call the minimum partition information, which takes the place of the maximum mutual information (MMI) used in universal decoding (cf. Csiszár and Körner).

On AVCs with Quadratic Constraints
Farzin Haddadpour (Sharif University of Technology, Iran); Mahdi Jafari Siavoshani (The Chinese University of Hong Kong, Hong Kong); Mayank Bakshi (The Chinese University of Hong Kong, Hong Kong); Sidharth Jaggi (Chinese University of Hong Kong, Hong Kong)
Of course I had to go to this paper, since it was on AVCs. The main result is that if one considers maximal error but allows the encoder to randomize, then one can achieve the same rates over the Gaussian AVC as with average error and no randomization. That is, encoder randomization lets one move from average error to maximal error. An analogous result for discrete channels is in a classic paper by Csiszár and Narayan, and this is the Gaussian analogue. The proof uses a quantization/epsilon-net plus union bound similar to the one in my first ISIT paper (also on Gaussian AVCs, and finally on ArXiV), but it seems that the amount of encoder randomization needed here is more than the amount of common randomness used in my paper.

Coding with Encoding Uncertainty
Jad Hachem (University of California, Los Angeles, USA); I-Hsiang Wang (EPFL, Switzerland); Christina Fragouli (EPFL, Switzerland); Suhas Diggavi (University of California Los Angeles, USA)
This paper was on graph-based codes where the encoder makes errors, but the channel is ideal and the decoder makes no errors. That is, given a generator matrix G for a code, the encoder wiring could be messed up and bits could be flipped or erased when parities are being computed. The resulting error model can’t just be folded into the channel. Furthermore, a small amount of error in the encoder (in just the right place) could be catastrophic. They focus just on edge erasures in this problem and derive a new distance metric between codewords that helps them characterize the maximum number of erasures that an encoder can tolerate. They also look at a random erasure model.

Linkage

David McAllester, my department chair at TTI, has started a new blog.

I thought it was pretty well known that people are fairly unique by ZIP code, but Forbes has an article about it now (h/t Raj). Of course, stores can also ping a smartphone’s WiFi to get more accurate location information about your activity within the store — when you check out they can tag the MAC address of your device to all the other information about you. Creeptastic!

Bradley Efron’s perspective on the impact of Bayes’ Theorem from Science (h/t Kevin).

Some discussion on what makes a popular philosophy book. I wonder what, if anything, transfers over to a popular mathematical book?

Some thoughts from Larry Laudan on the mathematization of the presumption of innocence.

De-anonymized reviewers in discussion

One big difference between reviewing for conferences like NIPS/ICML and ISIT is that there is a “discussion” period between the reviewers and the Area Chair. These discussions are not anonymized, so you know who the other reviewers are and you can also read their reviews. This leads to a little privacy problem — A and B may be reviewing the same paper P, but A may be an author on a paper Q which is also being reviewed by B. Because A will have access to the text of B’s reviews on P and Q, they can (often) unmask B’s authorship of the review on Q simply by looking at the formatting of the reviews (are bullet points dashes or asterisks, do they give numbered points, are there “sections” to the review, etc). This seems to violate the spirit of anonymous review, which is perhaps why some have suggested that reviewing be unblinded (at least after acceptance).

The extent to which all of this matters is of course a product of how fast the machine learning literature has grown and the highly competitive nature of the “top tier conferences.” Because the acceptance rate is so low, the reviewing process can appear to be “arbitrary” (read: subjective), and so questions of both review quality and author/reviewer anonymity impact possible biases. However, if the aim of double-blind reviewing is to reduce bias, then shouldn’t the discussions also be anonymized?

a cute paper: finding a most biased coin with fewest flips

I saw this paper on ArXiV a while back and figured it would be a fun read, and it was. Post-ISIT blogging may have to wait for another day or two.

Finding a most biased coin with fewest flips
Karthekeyan Chandrasekaran, Richard Karp
arXiv:1202.3639 [cs.DS]

The setup of the problem is that you have n coins with biases \{p_i : i \in [n]\}. For some given p \in [\epsilon,1-\epsilon] and \epsilon \in (0,1/2), each coin is “heavy” (p_i = p + \epsilon) with probability \alpha and “light” (p_i = p - \epsilon) with probability 1 - \alpha. The goal is to use a sequential flipping strategy to find a heavy coin with probability at least 1 - \delta.

Any such procedure has three components, really. First, you have to keep track of some statistics for each coin i. On the basis of that, you need a rule to pick which coin to flip. Finally, you need a stopping criterion.

The algorithm they propose is a simple likelihood-based scheme. If I have flipped a particular coin i a bunch of times and gotten h_i heads and t_i tails, then the likelihood ratio is
L_i = \left( \frac{p+\epsilon}{p - \epsilon} \right)^{h_i} \left( \frac{ 1 - p - \epsilon }{ 1 -p + \epsilon} \right)^{t_i}
So what the algorithm does is keep track of these likelihoods for the coins it has flipped so far. But which coin to pick? The algorithm is greedy: it chooses a coin i with the largest likelihood L_i so far (breaking ties arbitrarily).

Note that up to now the prior probability \alpha of a coin being heavy has not been used at all, nor has the failure probability \delta. These appear in the stopping criterion. The algorithm keeps flipping coins until there exists at least one i for which
L_i \ge \frac{1 - \alpha}{\alpha} \cdot \frac{ 1 - \delta }{\delta}
It then outputs the coin with the largest likelihood. It’s a pretty quick calculation to see that given (h_i, t_i) heads and tails for a coin i,
\mathbb{P}(\mathrm{coin\ }i\mathrm{\ is\ heavy}) = \frac{\alpha L_i}{ \alpha L_i + (1 - \alpha) },
from which the threshold condition follows.

This is a simple-sounding procedure, but to analyze it they make a connection to something called a “multitoken Markov game” which models the corresponding multi-armed bandit problem. What they show is that for the simpler case given by this problem, the corresponding algorithm is, in fact, optimal in the sense that it makes the minimum expected number of flips:
\frac{16}{\epsilon^2} \left( \frac{1 - \alpha}{\alpha} + \log\left( \frac{(1 -\alpha)(1 - \delta)}{\alpha \delta} \right) \right)

The interesting thing is that the prior distribution on heaviness/lightness plays a pretty crucial role in designing the algorithm. Part of the explore-exploit tradeoff in bandit problems is the issue of hedging against uncertainty in the distribution of payoffs; if instead you have a good handle on how the payoffs of the arms should vary, you get a much more tractable problem.

ISIT highlights request

ISIT 2013 is over, and there were lots of good talks. Just a few quick impressions: the conference felt a bit more fragmented to me, topic-wise. I barely saw any of the coding theorists because their sessions were physically separated from the Shannon theory/estimation sessions that I normally find myself in.

However, perhaps the theme of the conference for me this year was “measure concentration.” Between Marton’s Shannon Lecture, Lugosi’s tutorial, and Max’s talk on data processing, I felt very measured, and yet full of concentration.

I’ll blog about a few of them here, but if any readers want to write a guest post about some interesting talks they saw, please feel free to contact me. I’ll likely post some recaps later in the week after I get back from Asia/Europe.

A poem in lieu of a post

I’ve not been posting because I have been visiting Japan for the last week (and for another week after this). I am teaching one week of a machine learning course at the Toyota Technological Institute (豊田工業大学, Toyota Kōgyō Daigaku), which is the main campus — TTI-Chicago is just a satellite campus. Between jet lag, sightseeing, and lesson prep I haven’t had much of a chance to post about anything.

My friend Celeste posted a link to the poem Telephone Repairman, by Joseph Millar.

Some people who read this blog work on communications. It’s worth taking a pause occasionally to contemplate, as the character in the poem does:

He thinks of the many signals
flying in the air around him
the syllables fluttering,
saying please love me,
from continent to continent
over the curve of the earth.

Open letter to the Urbana Free Library

I learned recently of a terrible tragedy at the Urbana Free Library, which was one of the formative institutions of my childhood. The Executive Director, Debra Lissak, authorized the removal of all books from the Adult Collection which were published after 2003. That was a mere 10 years ago. The collection of short stories by Saadat Hasan Manto, one of the most important chroniclers of partition, would have been summarily tossed. Why? Because the library didn’t own a newer edition of his work. Such decisions make me wonder if Lissak even reads books, or understands the function of a public library. I don’t think one needs a library science degree to understand that a terrible travesty has occurred here. I wrote a letter to Lissak, and if you read this and care about the UFL, I encourage you to do so as well.

I’ve also started a petition via change.org here.

The Library’s “official response” is here, and Lissak has been blaming “communication problems,” effectively faulting her staff when she should take responsibility for these acts. Her actions undermine the viability of the library and the credibility of its leadership. It’s hard to be a Friend of a library whose administration is so deaf to the community it serves.

You can contact Debra Lissak, Executive Director, at 217-367-4058 or at dlissak@tufl.info. Because she is apparently disregarding the opinions of others, you may want to CC the administration of the library: dcassady@tufl.info, reference@tufl.info, lfegley@tufl.info, avoss@tufl.info, kwicks@tufl.info, foundation@tufl.info, administration@tufl.info, mfarrell@tufl.info, cscherer@tufl.info, bscheid@tufl.info, sbennett@tufl.info, aho@tufl.info, ejakobsson@tufl.info, jwilliams@tufl.info, amerritt@tufl.info, mnetter@tufl.info.

My letter to Lissak is below.
Continue reading

ISIT notice from the organizers on the protest

Dear ISIT Participant,

As you may well be aware, there is an ongoing sit-in protest in Taksim Square, Istanbul. The protest is about a road construction and renovation project which calls for cutting down a large number of trees and demolishing a park. The protest has triggered street demonstrations in many cities around Turkey and has not yet subsided. Demonstrations continue to take place at Taksim Square especially in the late afternoon and evening hours. There are no clashes between the protesters and the police on or near Taksim Square. The shops are open and business runs as usual in the area. The Istanbul Conference and Exhibition Center (ICEC), where ISIT 2013 will take place, is about 1 km from Taksim Square and is safely away from the scene of the protests. The conference hotels which are at Talimhane district are 200-500 m away from the Taksim Square and there have been no reported cases of disturbance to the hotels or their guests. We are hoping that by the time of ISIT, the protests will come to an end. You may find the latest travel advisories issued by various governments here.

We will be updating you here as events develop.

Erdal Arıkan – Elza Erkip, ISIT 2013 Co-Chairs
Gerhard Kramer, President IT Society

Yet more skims from ArXiV

I’m still catching up on my backlog of reading everything, but I’ve decided to set some time aside to take a look at a few papers from ArXiV.

  • Lecture Notes on Free Probability by Vladislav Kargin, which is 100 pages of notes from a course at Stanford. Pretty self-explanatory, except for the part where I don’t really know free probability. Maybe reading these will help.
  • Capturing the Drunk Robber on a Graph by Natasha Komarov and Peter Winkler. This is on a simple pursuit-evasion game in which the robber (evader) is moving according to a random walk. On a graph with n vertices:

    the drunk will be caught with probability one, even by a cop who oscillates on an edge, or moves about randomly; indeed, by any cop who isn’t actively trying to lose. The only issue is: how long does it take? The lazy cop will win in expected time at most 4 n^3/27 (plus lower-order terms), since that is the maximum possible expected hitting time for a random walk on an n-vertex graph [2]; the same bound applies to the random cop [4]. It is easy to see that the greedy cop who merely moves toward the drunk at every step can achieve O(n^2); in fact, we will show that the greedy cop cannot in general do better. Our smart cop, however, gets her man in expected time n + o(n).

    How do you make a smarter cop? In this model the cop can tell where the robber is but has to get there by walking along the graph. Strategies which try to constantly “retarget” are wasteful, so they propose a strategy wherein the cop periodically retargets to eventually meet the robber. I feel like there is a prediction/learning algorithm or idea embedded in here as well.

  • Normalized online learning by Stephane Ross, Paul Mineiro, John Langford. Normalization and data pre-processing are the source of many errors and frustrations in machine learning practice. When features are not normalized with respect to each other, procedures like gradient descent can behave poorly. This paper looks at dealing with data normalization in the algorithm itself, making it “unit free” in a sense. It’s the same kind of weight-update rule that we see in online learning, but with a few lines changed. They do an adversarial analysis of the algorithm where the adversary gets to scale the features before the learning algorithm gets the data point. In particular, the adversary gets to choose the covariance of the data.
  • On the Optimality of Treating Interference as Noise, by Chunhua Geng, Navid Naderializadeh, A. Salman Avestimehr, and Syed A. Jafar. Suppose I have a K-user interference channel with gains \alpha_{ij} between transmitter i and receiver j. Then if
    \alpha_{ii} \ge \max_{j \ne i} \alpha_{ij} + \max_{k \ne i} \alpha_{ki}
    then treating interference as noise is optimal in terms of generalized degrees of freedom. I don’t really work on this kind of thing, but it’s so appealing from a sense of symmetry.
  • Online Learning under Delayed Feedback, by Pooria Joulani, András György, and Csaba Szepesvári. This paper is on forecasting algorithms which receive the feedback (e.g. the error) with a delay. Since I’ve been interested in communication with delayed feedback, this seems like a natural learning analogue. They provide ways of modifying existing algorithms to work with delayed feedback — one such method is to run a bunch of predictors in parallel and update them as the feedback is returned. They also propose methods which use partial monitoring and an approach to UCB for bandit problems in the delayed feedback setting.
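The treating-interference-as-noise condition from the Geng–Naderializadeh–Avestimehr–Jafar paper above is easy to check mechanically. Here is a small sketch (the helper function is mine, not theirs), where alpha[i][j] is the channel strength exponent from transmitter i to receiver j:

```python
def tin_optimal(alpha):
    """Check the TIN optimality condition: for each user i, the direct
    gain alpha[i][i] must be at least the strongest interference that
    transmitter i causes plus the strongest interference that receiver
    i suffers (in the generalized degrees-of-freedom sense)."""
    K = len(alpha)
    for i in range(K):
        caused = max((alpha[i][j] for j in range(K) if j != i), default=0.0)
        suffered = max((alpha[k][i] for k in range(K) if k != i), default=0.0)
        if alpha[i][i] < caused + suffered:
            return False
    return True
```

For example, a two-user channel with weak cross links (direct gains 1.0, cross gains 0.2 and 0.3) satisfies the condition, while one with cross gains 0.6 and 0.7 does not.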