Linkage

The fog in San Francisco (h/t Erin Rhode).

A general approach to privacy/utility tradeoffs, with metric spaces! A new preprint/note by Robert Kleinberg and Katrina Ligett.

Max breaks down how to use the divergence to get tight convergence bounds for Markov chains.

The San Diego Asian Film Festival starts on Thursday!

Apparently China = SF Chinatown. Who knew? Maybe the fog confused them.

Allerton 2010 : the only talks I’ll blog about

Hey, lookit! I’m blogging about a conference somewhat near to when the conference happened!

I’m just going to write about a few talks. This is mostly because I ended up not taking as many notes this year, but also because writing up extensive notes on talks is a bit too time-consuming. I found the talks by Paul Cuff, Sriram Vishwanath, Raj Rajagopalan, and others interesting, but I didn’t take notes on them. And of course I enjoyed the talks by my “bosses” at UCSD, Alon Orlitsky and Tara Javidi. That’s me being honest, not me trying to earn brownie points (really!).

So here are five talks which I thought were interesting and for which I took some notes.

Allerton 2010 : the sessions’ (lack of) timing

Warning: small rant below. I’m probably not as ticked off as this makes me sound.

One thing that seemed significantly off this year from previous times I’ve been to Allerton is that around 3/4 of the talks I attended went over the allotted time. Why does this happen?

For one thing, more than half of the sessions at Allerton are invited. This means that some speakers know what they are going to talk about in general, but haven’t necessarily pinned down the whole story. This is amplified by the fact that the camera-ready paper is due on the last day of the conference (the deadline was pushed back to Monday this year). For invited talks, many people have not even started writing the paper until they get on the plane, adding uncertainty as to what they can or should present. Little lemmas are proved hours before the deadline. It’s not unusual to make slides on the plane to the conference, but if the actual results are in flux, what are you going to put on the slides? Why, the entire kitchen sink, of course!

The actual session brings up other issues. Because people are editing their slides until the last minute, they insist on using their own laptop, causing delays as the laptops are switched, the correct display is found, and the presentation remote is set up. This is a gigantic waste of time. Almost all laptops provided by conference organizers are PCs, which can display PDF (generated by LaTeX or Keynote) and PowerPoint. Why must you use your own laptop? So the slide transitions will be oh-so pretty?

Finally, many session chairs don’t give warnings early enough and don’t enforce time constraints. Once a habit of talks running over is established, it becomes unfair to cut off one speaker if you didn’t cut off another. Naturally, speakers feel upset if someone got more time to present than they did.

What we should ask ourselves is this : is the talk for the benefit of the speaker or for the benefit of the audience?

ITW 2010 : finishing up

Blogging conferences seems to have gone by the wayside for me, but here are some quick takes on ITW. It was a more coding-focused conference, so there were fewer talks of active research interest to me, but I did get to catch up and talk shop with a few people, so that was nice.

Tali Kaufman (who seems to have no homepage) gave a plenary on “Local computation in codes” which looked at fast (sublinear-time) algorithms for testing whether a binary vector belongs to a code, and fast ways to correct single-bit errors. In particular, she was looking at these properties in the context of LDPC codes. It’s a nice place where information theory and CS theory look at the same object but with different intents.
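
To make the “local” part concrete (my gloss with a toy stand-in, not anything from the talk): a tester for a code given by sparse parity checks can sample a few checks at random and read only the bits those checks touch, so the work done is independent of the block length. A minimal sketch in Python:

import random

# Parity checks of the (7,4) Hamming code (the columns of H are the
# binary expansions of 1..7); a small stand-in for a sparse LDPC code.
CHECKS = [(0, 2, 4, 6), (1, 2, 5, 6), (3, 4, 5, 6)]

def local_test(word, checks, num_queries=2):
    # Sample a few checks and read only the bits they touch; for sparse
    # checks the running time does not grow with the block length.
    for check in random.sample(checks, num_queries):
        if sum(word[i] for i in check) % 2 != 0:
            return False  # a violated check: reject
    return True  # every sampled check passed: accept

print(local_test([0] * 7, CHECKS))                # codewords always pass
print(local_test([0, 0, 1, 0, 0, 0, 0], CHECKS))  # a flipped bit is often caught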

Ueli Maurer gave a great talk on “A Cryptographic Theory of System Indistinguishability,” which started out kind of slow and then ended up at a breakneck speed. This was a way of thinking about crypto from the perspective of systems being able to simulate each other.

Rudolf Ahlswede talked a bit about strong converses and weak converses for channels with a rather generic “time structure.” Essentially this boils down to a difference between lim inf and lim sup, and he gave a rather short proof showing that capacities exist under very mild conditions and that the additivity of capacity (e.g. for two parallel channels) may hold in some settings but not others.
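
My rough gloss on the lim inf/lim sup distinction, in information-spectrum terms (this is the Verdú-Han formulation, not necessarily how Ahlswede phrased it): with the normalized information density i_n = \frac{1}{n} \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)}, the general capacity is

C = \sup_{X} \operatorname{p-liminf}_{n \to \infty} i_n,

where the p-liminf is a lim inf in probability. Replacing it with a p-limsup gives the optimistic capacity, and the strong converse holds exactly when the two suprema coincide.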

There were lots of other good talks that I enjoyed but I didn’t take particularly good notes this time (I blame the jet lag), so I don’t have much to say here. Tomorrow is the start of Allerton, so I might take better notes for that.

Information Theoretic Anagrams

I was visiting Berkeley at the end of last week and the beginning of this week, and part of my job was to clean out this old box of papers and printouts that I left when I moved down to UCSD. Most of the papers I printed out in grad school didn’t make it down, and a small fraction of those I ended up re-printing at ITA. It’s a sad waste of paper, but I also noticed that I printed out lots of them because someone said to check them out and I did print-first-read-second. Thankfully, as time wore on, I switched to the other direction, but I don’t know if I’ll ever be able to do the all-electronic e-reader thing. I just love marking up papers with a red/green/purple pen. Some of the more heavily marked ones are coming back with me. Hopefully they won’t weigh down the plane too much.

One of the little gems I found among the photocopies of old reimbursement forms, conference schedules, and mouldering reprints of my optical queueing article was Bob McEliece’s handout from his 2004 Shannon Lecture on “Some Information-Theoretic Anagrams”:

  • A Sound Channel
  • Brainy Coed
  • Rome Noodles
  • Cubed Roots
  • UCLA Shenanigans
  • Coordinate Spasm
  • Momentary Mixup
  • Acquiescent Yelp

assorted links and news

More on self-plagiarizing.

This looks like an interesting book on the homeless, especially given all the time I spent in the Bay Area.

Tyler Perry has shortened the title of his film adaptation of Ntozake Shange’s For Colored Girls Who Have Considered Suicide When the Rainbow is Enuf.

Evaluating Fredric Jameson.

Max really digs into directed information.

In other news, after ITW I went to Paris to hang out and work with Michele Wigger on a small story related to the multiaccess channel with user cooperation. While I was there I saw some fun art by Detanico/Lain and caught a show by Fever Ray at L’Olympia. In fact, I’ll be headlining there soon:

ADS Headlines at L'Olympia

Have a good Sunday, everyone!

ITW Dublin : historical take on polar codes

I am at ITW in Dublin, and I will write a short post or two about it. I missed most of the conference until now due to jet lag and a late arrival, but I did make it to Arikan’s plenary lecture this morning on the historical context for polar codes. It was a really nice talk about sequential decoding and how it relates to polar codes. A central issue is the computational cutoff rate R_{comp}, which prevents sequential decoding from reaching capacity.

He described Pinsker’s “concatenated” construction of convolutional encoders around a block code, which is capacity-achieving but inefficient, and Massey’s 1981 construction of codes for the quaternary erasure channel (QEC), which decomposes the QEC into two parallel BECs whose noise is correlated (you just relabel the 4 inputs with 2 bits and treat each bit as going through its own BEC). This is efficient and increases R_{comp}, but is not enough to get to capacity. However, in a sense, Massey’s construction is like doing one step of polar coding, and combining it with Pinsker’s ideas starts to give the flavor of the channel polarization effect.
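
The relabeling is easy to sanity-check in simulation. A little sketch of my understanding of the construction (parameter names mine): a QEC with erasure probability eps, with each input written as a pair of bits, behaves like two BEC(eps) channels whose erasures always coincide:

import random

def qec_as_two_becs(bits1, bits2, eps=0.3):
    # Quaternary erasure channel: each input symbol (b1, b2) is either
    # delivered intact or erased entirely with probability eps. Per bit,
    # each stream sees a BEC(eps), but the erasure events on the two
    # streams are perfectly correlated.
    out1, out2 = [], []
    for b1, b2 in zip(bits1, bits2):
        if random.random() < eps:
            out1.append(None)  # None marks an erasure; note that it
            out2.append(None)  # always hits both streams at once
        else:
            out1.append(b1)
            out2.append(b2)
    return out1, out2

y1, y2 = qec_as_two_becs([0, 1, 1, 0], [1, 1, 0, 0])
# Erasures show up in the same positions of y1 and y2.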

Good stuff!

Lossless compression via the memoizer

Via Andrew Gelman comes a link to deplump, a new compression tool. It runs the data through a predictive model (like most lossless compressors), but:

Deplump compression technology is built on a probabilistic discrete sequence predictor called the sequence memoizer. The sequence memoizer has been demonstrated to be a very good predictor for discrete sequences. The advantage deplump demonstrates in comparison to other general purpose lossless compressors is largely attributable to the better guesses made by the sequence memoizer.

The paper on the sequence memoizer (by Wood et al.) appeared at ICML 2009, with follow-ups at DCC and ICML 2010. It uses as its probabilistic model a version of the Pitman-Yor process, which is a generalization of the “Chinese restaurant”/“stick-breaking” process. Philosophically, the idea seems to be this : since we don’t know the order of the Markov process which best models the data, we let the model order be “infinite” using the Pitman-Yor process and just infer the right parameters, hopefully avoiding overfitting while staying efficient. The key challenge is that since the process can have infinite memory, the encoding seems to get hairy, which is why “memoization” becomes important. It seems that the particular parameterization of the PY process is important for reducing the number of parameters, but I didn’t have time to look at the paper in that much detail. Besides, I’m not as much of a source coding guy!
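
As a toy illustration of the Pitman-Yor flavor (a single-context predictive rule with one “table” per distinct symbol, nothing like the full hierarchical memoizer; d and theta are the usual discount and concentration parameters): counts get discounted by d, and the reclaimed mass backs off to a base distribution, much like interpolated smoothing in language models.

from collections import Counter

def py_predictive(counts, symbol, alphabet_size, d=0.5, theta=1.0):
    # One-level Pitman-Yor predictive probability, simplified to one
    # table per distinct symbol: discount the raw count, then spread
    # the reclaimed mass over a uniform base distribution.
    n = sum(counts.values())
    t = len(counts)                # number of distinct symbols seen so far
    base = 1.0 / alphabet_size     # uniform base distribution
    discounted = max(counts[symbol] - d, 0.0)
    return (discounted + (theta + d * t) * base) / (n + theta)

counts = Counter("abracadabra")
print(py_predictive(counts, "a", 26))  # frequent symbol: high probability
print(py_predictive(counts, "z", 26))  # unseen symbol still gets some mass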

I tried it out on Leo Breiman’s paper Statistical Modeling: The Two Cultures. Measured in bytes:

307458 Breiman01StatModel.pdf         original
271279 Breiman01StatModel.pdf.bz2     bzip2 (Burrows-Wheeler transform)
269646 Breiman01StatModel.pdf.gz      gzip
269943 Breiman01StatModel.pdf.zip     zip
266310 Breiman01StatModel.pdf.dpl     deplump

As promised, it is better than the alternatives (though not by much for this example).
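
The standard-compressor numbers are easy to reproduce with Python’s standard library (deplump itself was a web service, so it isn’t included here):

import bz2, gzip, zlib

# Read any test file; the filename here is just the one from the table.
data = open("Breiman01StatModel.pdf", "rb").read()
print("original:", len(data))
print("bzip2:   ", len(bz2.compress(data)))
print("gzip:    ", len(gzip.compress(data)))
print("deflate: ", len(zlib.compress(data, 9)))  # the algorithm inside zip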

What is interesting is that they don’t seem to cite much from the information theory literature. I’m not sure if this is a case of two communities working on related problems but unaware of the connections, or if the problems are secretly not related, or if information theorists mostly “gave up” on this problem (I doubt this, but like I said, I’m not a source coding guy…).

ISIT 2010 : a few more talks

I think I lack the willpower to write up more notes on talks, and there are other things I’d like to blog about, but here are one or two sentences on some other talks that I found interesting… I also enjoyed the energy session on Friday and the talks by Sundeep and Galen on compressed sensing, but time has gotten the better of me. Next time, folks.

Channel Intrinsic Randomness
Matthieu Bloch
This was on extracting random bits from the output of a noisy channel. These bits should be independent of the input to the channel. Matthieu uses the enigmatic information spectrum method to get his results — thanks to the plenary lecture I was able to understand it a bit better than I might have otherwise.

Assisted Common Information
Vinod Prabhakaran and Manoj Prabhakaran
I was very interested in this talk because I have been thinking of a related problem. Two terminals observe correlated sources X_1^n and Y_1^n respectively. A genie observes both sources and sends messages at rates R_1 and R_2 to the two terminals, who then have to produce variables W_1 and W_2 which have large entropies and are also equal with high probability. This problem is connected to the Gacs-Korner problem and Wyner’s common information problem, and also possibly this recent preprint by Andrej Bogdanov and Elchanan Mossel. They manage to solve it using a novel construction of “monotone regions.”
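
In symbols, my reading of the setup (notation mine): the genie sends M_i = f_i(X_1^n, Y_1^n) with \frac{1}{n} \log |M_i| \le R_i, the terminals form W_1 = g_1(X_1^n, M_1) and W_2 = g_2(Y_1^n, M_2), and the requirements are

P(W_1 \neq W_2) \to 0 \quad \text{while} \quad \frac{1}{n} H(W_1) \text{ is as large as possible.}

Setting R_1 = R_2 = 0 should recover the Gacs-Korner setting, which is presumably where that connection comes from.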

Patterns and Exchangeability
N. P. Santhanam and M. Madiman
This work grew out of the AIM workshop on permanents. De Finetti’s theorem says an exchangeable process can be built up as a mixture of iid processes. Kingman showed that something called an exchangeable partition process is built up from something he called “paintbox processes.” One thing that we discovered at the workshop was that the pattern process of an iid process is the same as a paintbox process (and vice-versa). The paper then goes through many connections between these processes, certain limits of graphs, and connections to universal compression.
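
For the curious, the pattern of a sequence just relabels symbols by order of first appearance, keeping the repetition structure and forgetting the symbol identities:

def pattern(seq):
    # Relabel symbols by order of first appearance; e.g. the pattern of
    # "abracadabra" keeps the repetitions but forgets the letters.
    labels = {}
    return [labels.setdefault(s, len(labels) + 1) for s in seq]

print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]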

Universal Hypothesis Testing in the Learning-Limited Regime
Benjamin G. Kelly, Thitidej Tularak, Aaron B. Wagner, and Pramod Viswanath
This was a really great talk. The problem here is that for each n you get n samples X^n distributed i.i.d. according to p_n or q_n on an alphabet \mathcal{A}_n which can grow with n. Given a new sequence Z^n you have to decide if it was generated according to p_n or q_n. They show a number of results which say that consistency is possible for |\mathcal{A}_n| sublinear in n, impossible for |\mathcal{A}_n| quadratic in n, with intermediate results in between. In particular, for well-behaved distributions with |\mathcal{A}_n| = \Theta(n^{\alpha}) and all probabilities \Theta(n^{-\alpha}), they can get some consistency results, but the generalized likelihood ratio test (GLRT) is inconsistent for \alpha = 1.
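
To fix ideas, here is a strawman classifier (mine, and much cruder than the tests analyzed in the paper): given one training sequence from each source, assign Z^n to whichever training sequence has the closer empirical distribution in L1 distance. The point of the regime above is that when the alphabet grows with n, all of these empirical distributions are badly undersampled, which is what makes naive schemes (and even the GLRT) break down.

from collections import Counter

def emp(seq):
    # Empirical distribution of a sequence as a dict symbol -> frequency.
    n = len(seq)
    return {a: k / n for a, k in Counter(seq).items()}

def l1(d1, d2):
    return sum(abs(d1.get(a, 0.0) - d2.get(a, 0.0)) for a in set(d1) | set(d2))

def classify(train_p, train_q, test):
    # Strawman: pick the hypothesis whose training sequence has the
    # closer empirical distribution to the test sequence.
    tz = emp(test)
    return "p" if l1(tz, emp(train_p)) <= l1(tz, emp(train_q)) else "q"

print(classify("aabab", "bbbab", "aaabb"))  # -> "p"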

Feature Extraction for Universal Hypothesis Testing via Rank-Constrained Optimization
Dayu Huang and Sean Meyn
This talk was of interest to me because I have been looking at hypothesis testing problems in connection with election auditing. In universal testing you know a lot about the distribution under one hypothesis, but much less about the other. The Hoeffding test is a threshold test on the KL divergence between the empirical distribution and the known hypothesis. This test is asymptotically optimal but has high variance when the data is high-dimensional. Thus for smaller sample sizes, a so-called mismatched divergence test may be better. In this paper they look at how to trade off the variance and the error exponent of the test.
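
For concreteness, the Hoeffding test itself is only a few lines (the threshold eta is a free parameter, and the high-dimensional noisiness of the empirical distribution is exactly the variance problem mentioned above):

import math
from collections import Counter

def hoeffding_test(samples, pi, eta):
    # Threshold test on the KL divergence D(empirical || pi), where pi is
    # the known hypothesis. Assumes pi puts positive mass on every
    # observed symbol. Accept pi iff the divergence is below eta.
    n = len(samples)
    emp = {a: k / n for a, k in Counter(samples).items()}
    kl = sum(p * math.log(p / pi[a]) for a, p in emp.items())
    return kl <= eta

pi = {"H": 0.5, "T": 0.5}
print(hoeffding_test("HHTHTTHH", pi, eta=0.1))  # close to fair: accept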