ITW Dublin : historical take on polar codes

I am at ITW in Dublin, and I will write a short post or two about it. I missed most of the conference until now due to jetlag and late arrival, but I did make it to Arikan’s plenary lecture this morning on the historical context for polar codes. It was a really nice talk about successive decoding and how it relates to polar codes. A central issue is the computation cutoff rate R_{comp}, which prevents successive decoding from reaching capacity.

He described Pinsker’s “concatenated” construction of convolutional encoders around a block code, which is capacity-achieving but inefficient, and Massey’s 1981 construction of codes for the quaternary erasure channel which decomposes the QEC into two parallel BECs whose noise is correlated (you just relabel the 4 inputs with 2 bits and treat the two bits as going through parallel BECs). This is efficient, increases R_{comp}, but is not enough to get to capacity. However, in a sense, Massey’s construction is like doing one step in polar codes, and combining this with Pinkser’s ideas starts getting the flavor of the channel polarization effect.

Good stuff!

Completely ridiculous stock photography

I get the IEEE Communications and Signal Processing magazines electronically to save paper (I find they don’t make for great bus reading, so the print version is less appealing). Every month they dutifully send me an email with possibly the most ridiculous stock images. For example, here’s the one for the Signal Processing magazine:

Happy Signal Processing Dude

Oh man, I am so STOKED to get this Signal Processing Magazine! Wooo hoooo!

First off, who is this dude, and what is wrong with his life such that getting this magazine makes him so happy? Clearly he’s not an engineer since he’s wearing a suit. Maybe he works in finance? Of perhaps government, since he’s walking down some pretty “city hall”-looking stairs. Maybe it’s a courthouse, and he’s been cleared of all charges, thanks to the evidence in the signal processing magazine?

Now here’s the Communications one:

Peek inside Communications

Hey there, want to have some sitcom-like hijinks with 4G communication systems?

Again, who is this woman, and why is she creepily hiding behind the magazine, only to pop around the side, holding the pages shut just as I’m opening it? You scared me, lady! What are you trying to do, give me a heart attack? Or are you some sort of not-so-subtle ploy to lure the predominantly male engineering audience to download the magazine?

Honestly, I wish IEEE would not bother paying for the graphic design of the download image and instead use the money for something else, like defraying subscription costs for developing nations. As it stands, these emails make me take the magazine less seriously.

an old-ish thought

I just got another email about a “Frontiers in X” conference, so I thought of this. Have you ever noticed how biology and medicine always have conferences that have the word “frontiers” in the title? I guess that’s because they are very high dimensional phenomena, and as we know, the sphere in high dimensions has all of its mass at the boundary.

Of course, the downside is that the volume of the unit sphere goes to 0 as the dimension goes to infinity…

Lossless compression via the memoizer

Via Andrew Gelman comes a link to deplump, a new compression tool. It runs the data through a predictive model (like most lossless compressors), but:

Deplump compression technology is built on a probabilistic discrete sequence predictor called the sequence memoizer. The sequence memoizer has been demonstrated to be a very good predictor for discrete sequences. The advantage deplump demonstrates in comparison to other general purpose lossless compressors is largely attributable to the better guesses made by the sequence memoizer.

The paper on the sequence memoizer (by Wood et al.) appeared at ICML 2009, with follow-ups at DCC and ICML 2010 It uses as its probabilistic model a version of the Pitman-Yor process, which is a generalization of the “Chinese restaurant”/”stick-breaking” process. Philosophically, the idea seems to be this : since we don’t know the order of the Markov process which best models the data, we will let the model order be “infinite” using the Pitman-Yor process and just infer the right parameters, hopefully avoiding overfitting while being efficient. The key challenge is that since the process can have infinite memory, the encoding seems to get hairy, which is why “memoization” becomes important. It seems that the particular parameterization of the PY process is important to reduce the number of parameters, but I didn’t have time to look at the paper in that much detail. Besides, I’m not as much of a source coding guy!

I tried it out on Leo Breiman’s paper Statistical Modeling: The Two Cultures. Measured in bytes:

307458 Breiman01StatModel.pdf         original
271279 Breiman01StatModel.pdf.bz2     bZip (Burrows-Wheeler transform)
269646 Breiman01StatModel.pdf.gz      gzip
269943 Breiman01StatModel.pdf.zip     zip
266310 Breiman01StatModel.pdf.dpl     deplump

As promised, it is better than the alternatives, (but not by much for this example).

What is interesting is that they don’t seem to cite much from the information theory literature. I’m not sure if this is a case of two communities working on related problems and unaware of the connections or that the problems are secretly not related, or that information theorists mostly “gave up” on this problem (I doubt this, but like I said, I’m not a source coding guy…)

Some not-so-recent reads

The Other Indians: A Political and Cultural History of South Asians in America (Vinay Lal) – a short and lively introduction to the history of South Asians in the US, from the colonial era through 19th century Sikh immigration to the present. I don’t think I learned any new facts, per se, but it’s a lot less polemical book than say, The Karma of Brown Folk. Recommended.

The Mindbody Prescription: Healing the Body, Healing the Pain (John E. Sarno) – this was recommended by more than one friend as a potential way to cure my repetitive stress injury and shoulder/neck tension. Sarno’s thesis is a little complicated to describe, but basically he says “repressed anger causes many chronic conditions” and advocates a kind of cognitive-behavioral therapy for it. I found it pretty unconvincing, although there is some good advice sprinkled here and there in the book. Skippable.

The Trade of Queens (Charles Stross) – final book in Stross’s Merchant Princes series, I basically read this because I had already read the other ones and I wanted to see how it turned out. This series started out pretty interesting (although with many of the same racial blind spots as most SciFi), but I lost interest somewhere around the 3rd book. Recommended if you are OCD about finishing things.

A Fragile Power (Chandra Mukerji) – this is an interesting science studies book on how “big science” works, using two oceanographic institutes (Wood’s Hole and Scripps, thinly disguised as “Big Lab” and “Blue Lab” for IRB reasons, presumably) as case studies. It describes issues like power dynamics within the lab, tensions between engineers who make the instruments and the scientists that use them, how the government uses scientists to further their own ends, and so on. Her main thesis is that the state uses its funding power to maintain a sort of “reserve” of scientific expertise — in the case of the military, they have an interest in people who understand and can map the sea floor, and so they fund scientists to train students in the use of these technologies. Although “big science” is not “academic engineering,” many of the observations she makes in her book do seem relevant. I don’t think anyone has done a study of this sort on academic engineering, where there is a lot more industry sponsorship. A good thesis topic for… someone else. Highly recommended.

The Space Between Us (Thrity Umrigar) – a novel about a Parsee widow and her relationship with her servant, who has been with her for a long time. At times touching, and at other times melodramatic, it exemplifies a kind of middle-class sensibility about poverty and class-relations in India. The personal tragedies don’t transform into a larger critique of the system. I’m sure Amitava Kumar would have something more intelligent to say about it. Somewhat recommended (cautiously) if you’re interested in South Asian lit in English.

Unseen Academicals (Terry Pratchett) – a Discworld novel about soccer. Perfect during the World Cup.

The Beautiful Struggle (Ta-Nehisi Coates) – a touching and beautifully written memoir about growing up in Baltimore, raised by an ex-Black Panther father. Coates struggles against the expectations of his father and himself, becoming Conscious and getting the Knowledge that his father prized so much. Highly recommended.

Passport Pictures (Amitava Kumar) – a book about immigration, unheard voices, the politics of knowledge, where the rubber of theory meets the road of reality in postcolonial and transnational studies. The best part about this book is how it questions itself and how it takes what might be arch turn of phrase and actually tries to answer it. For example, “if the word ‘curry’ doesn’t have a stable referent or a fixed origin, how can its changing use by postcolonials be seen as a sign of resistance?” Recommended for those with an interest in immigration from South Asia and a taste for some postcolonial theory (but with examples! And poetry! And photography!)

Inherent Vice (Thomas Pynchon) – this was definitely not what I expected. A kind of potboiler of a detective novel but with more drugs and cultural references than you could shake a stick at. A tribute to LA in some era that I can’t recognize. Much more approachable than the monster tomes that Pynchon has put out (and I have not read), this book was also a real page turner.

The City and The City (China Mieville) – I, like others, am a bit of a China Mieville fanboy, but I think this is one of his better novels (it derailed a bit at the end for me, but I don’t want to give anything away). It’s set in a very strange city — two cities in the same place, to be precise, and it really brings up a lot of interesting questions about society divisions, (un)seeing, and global politics. The allegory doesn’t seem too far from reality. Recommended!

Feller’s anti-Bayesianism

I didn’t really realize that in Feller’s classic probability book he had the following dismissal of Bayesian statistics:

Unfortunately Bayes’ rule has been somewhat discredited by metaphysical applications of the type described above. In routine practice, this kind of argument can be dangerous. A quality control engineer is concerned with one particular machine and not with an infinite population of machines from which one was chosen at random. He has been advised to use Bayes’ rule on the grounds that it is logically acceptable and corresponds to our way of thinking. Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton’s mechanics. In our case it overlooks the circumstance that the engineer desires success and that he will do better by estimating and minimizing the sources of various types of errors in predicting and guessing. The modern method of statistical tests and estimation is less intuitive but more realistic. It may be not only defended but also applied.” — W. Feller, 1950 (pp. 124-125 of the 1970 edition)

A few weeks ago, a little note on Feller’s anti-Bayesianism was posted to ArXiV. It’s a bit of an emotional read; a mathematical Op-Ed if you will. However, it does present an interesting perspective on historical “received wisdom” in light of more modern approaches to statistics and Bayesian data analysis. As an example, take the methods from Michael Jordan’s talk at ISIT (video and slides on the ITSOC website now!), using which you can do some cross-validation to see that they are indeed producing the correct results on real data.

What I am missing (as an outsider to the internecine battles of statistics) is an even-handed explanation of what all the hullabaloo is about. Such an article probably exists, but I haven’t seen it…