ITW Dublin: historical take on polar codes

I am at ITW in Dublin, and I will write a short post or two about it. I missed most of the conference until now due to jetlag and a late arrival, but I did make it to Arikan’s plenary lecture this morning on the historical context for polar codes. It was a really nice talk about sequential decoding and how it relates to polar codes. A central issue is the computational cutoff rate $R_{comp}$, which prevents sequential decoding from reaching capacity.

He described Pinsker’s “concatenated” construction of convolutional encoders around a block code, which is capacity-achieving but inefficient, and Massey’s 1981 construction of codes for the quaternary erasure channel (QEC), which decomposes the QEC into two parallel BECs with correlated noise (you just relabel the 4 inputs with 2 bits and treat the two bits as going through parallel BECs). This is efficient and increases $R_{comp}$, but is not enough to get to capacity. However, in a sense, Massey’s construction is like doing one step of polar coding, and combining it with Pinsker’s ideas starts to give the flavor of the channel polarization effect.
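Massey’s gain is easy to check numerically. As a sketch (my own illustration, not code from the talk): with uniform inputs, the cutoff rate of a BEC with erasure probability $p$ is $1 - \log_2(1+p)$, that of the QEC is $2 - \log_2(1+3p)$, and the QEC capacity is $2(1-p)$. Since $(1+p)^2 < 1+3p$ for $0 < p < 1$, the split always raises the total cutoff rate, yet still falls short of capacity:

```python
import math

def bec_cutoff(p):
    """Cutoff rate of a binary erasure channel BEC(p), uniform inputs."""
    return 1 - math.log2(1 + p)

def qec_cutoff(p):
    """Cutoff rate of a quaternary erasure channel QEC(p), uniform inputs."""
    return 2 - math.log2(1 + 3 * p)

p = 0.5
print(f"QEC cutoff rate:         {qec_cutoff(p):.3f}")
print(f"cutoff rate after split: {2 * bec_cutoff(p):.3f}")  # Massey's two BECs
print(f"QEC capacity:            {2 * (1 - p):.3f}")
```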

Good stuff!

Completely ridiculous stock photography

I get the IEEE Communications and Signal Processing magazines electronically to save paper (I find they don’t make for great bus reading, so the print version is less appealing). Every month they dutifully send me an email with possibly the most ridiculous stock images. For example, here’s the one for the Signal Processing magazine:

Oh man, I am so STOKED to get this Signal Processing Magazine! Wooo hoooo!

First off, who is this dude, and what is wrong with his life such that getting this magazine makes him so happy? Clearly he’s not an engineer since he’s wearing a suit. Maybe he works in finance? Or perhaps government, since he’s walking down some pretty “city hall”-looking stairs. Maybe it’s a courthouse, and he’s been cleared of all charges, thanks to the evidence in the signal processing magazine?

Now here’s the Communications one:

Hey there, want to have some sitcom-like hijinks with 4G communication systems?

Again, who is this woman, and why is she creepily hiding behind the magazine, only to pop around the side, holding the pages shut just as I’m opening it? You scared me, lady! What are you trying to do, give me a heart attack? Or are you some sort of not-so-subtle ploy to lure the predominantly male engineering audience to download the magazine?

Honestly, I wish IEEE would not bother paying for the graphic design of the download image and instead use the money for something else, like defraying subscription costs for developing nations. As it stands, these emails make me take the magazine less seriously.

an old-ish thought

I just got another email about a “Frontiers in X” conference, so I thought of this. Have you ever noticed how biology and medicine always have conferences with the word “frontiers” in the title? I guess that’s because they deal with very high dimensional phenomena, and as we know, the ball in high dimensions has almost all of its mass near the boundary.

Of course, the downside is that the volume of the unit ball goes to 0 as the dimension goes to infinity…
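Both halves of the joke check out numerically; a quick sketch using the standard volume formula $V_d = \pi^{d/2}/\Gamma(d/2+1)$ and the fact that the fraction of the ball’s volume within $\epsilon$ of the surface is $1-(1-\epsilon)^d$:

```python
import math

def ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1).
    Computed in log space so large d doesn't overflow math.gamma."""
    return math.exp((d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1))

def boundary_mass(d, eps=0.01):
    """Fraction of the unit ball's volume within eps of its surface."""
    return 1 - (1 - eps) ** d

for d in (2, 10, 100, 1000):
    print(f"d={d:4d}  volume={ball_volume(d):.3e}  "
          f"within 1% of boundary={boundary_mass(d):.4f}")
```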

Lossless compression via the memoizer

Via Andrew Gelman comes a link to deplump, a new compression tool. It runs the data through a predictive model (like most lossless compressors), but:

Deplump compression technology is built on a probabilistic discrete sequence predictor called the sequence memoizer. The sequence memoizer has been demonstrated to be a very good predictor for discrete sequences. The advantage deplump demonstrates in comparison to other general purpose lossless compressors is largely attributable to the better guesses made by the sequence memoizer.

The paper on the sequence memoizer (by Wood et al.) appeared at ICML 2009, with follow-ups at DCC and ICML 2010. It uses as its probabilistic model a version of the Pitman-Yor process, which is a generalization of the “Chinese restaurant”/“stick-breaking” process. Philosophically, the idea seems to be this: since we don’t know the order of the Markov process which best models the data, we let the model order be “infinite” using the Pitman-Yor process and just infer the right parameters, hopefully avoiding overfitting while remaining efficient. The key challenge is that since the process can have infinite memory, the encoding seems to get hairy, which is why “memoization” becomes important. It seems that the particular parameterization of the PY process is important for reducing the number of parameters, but I didn’t have time to look at the paper in that much detail. Besides, I’m not as much of a source coding guy!
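For a feel of the prior underneath, here is a toy sampler for the two-parameter Chinese restaurant process, the sequential view of the Pitman-Yor process (my own sketch, not code from the paper). A positive discount makes the number of distinct “tables” grow like a power of $n$ rather than like $\log n$, which is the heavy-tailed behavior that makes the model a good fit for text:

```python
import random

def crp_sample(n, discount, concentration, seed=0):
    """Seat n customers via the two-parameter Chinese restaurant process
    (Pitman-Yor). Returns the list of table occupancy counts."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for i in range(n):  # i customers already seated
        # new table with probability (theta + d * K) / (theta + i)
        p_new = (concentration + discount * len(tables)) / (concentration + i)
        if rng.random() < p_new:
            tables.append(1)  # open a new table
        else:
            # join table k with probability proportional to (count_k - discount)
            r = rng.random() * (i - discount * len(tables))
            for k, c in enumerate(tables):
                r -= c - discount
                if r < 0:
                    tables[k] += 1
                    break
    return tables

# discount 0 recovers the Dirichlet process (tables grow ~ log n);
# discount 0.5 gives power-law growth (~ sqrt(n)).
print(len(crp_sample(10000, 0.0, 1.0)), len(crp_sample(10000, 0.5, 1.0)))
```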

I tried it out on Leo Breiman’s paper Statistical Modeling: The Two Cultures. Measured in bytes:

```307458 Breiman01StatModel.pdf         original
271279 Breiman01StatModel.pdf.bz2     bzip2 (Burrows-Wheeler transform)
269646 Breiman01StatModel.pdf.gz      gzip
269943 Breiman01StatModel.pdf.zip     zip
266310 Breiman01StatModel.pdf.dpl     deplump
```

As promised, it is better than the alternatives (though not by much for this example).
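If you want to run your own head-to-head, the non-deplump compressors are all a few lines of Python’s standard library. A sketch on synthetic text rather than the Breiman PDF:

```python
import bz2
import lzma
import zlib

# Synthetic stand-in for a real file: repetitive English-like text.
data = b"All models are wrong, but some models are useful. " * 2000

results = {
    "original": len(data),
    "zlib (the gzip/zip family)": len(zlib.compress(data, 9)),
    "bz2 (Burrows-Wheeler)": len(bz2.compress(data, 9)),
    "lzma (xz)": len(lzma.compress(data)),
}
for name, size in results.items():
    print(f"{size:8d}  {name}")

# Sanity check: lossless round trip.
assert lzma.decompress(lzma.compress(data)) == data
```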

What is interesting is that they don’t seem to cite much from the information theory literature. I’m not sure whether this is a case of two communities working on related problems while unaware of the connections, whether the problems are secretly not related, or whether information theorists mostly “gave up” on this problem (I doubt this, but like I said, I’m not a source coding guy…).

The Other Indians: A Political and Cultural History of South Asians in America (Vinay Lal) – a short and lively introduction to the history of South Asians in the US, from the colonial era through 19th century Sikh immigration to the present. I don’t think I learned any new facts, per se, but it’s a lot less polemical book than say, The Karma of Brown Folk. Recommended.

The Mindbody Prescription: Healing the Body, Healing the Pain (John E. Sarno) – this was recommended by more than one friend as a potential way to cure my repetitive stress injury and shoulder/neck tension. Sarno’s thesis is a little complicated to describe, but basically he says “repressed anger causes many chronic conditions” and advocates a kind of cognitive-behavioral therapy for it. I found it pretty unconvincing, although there is some good advice sprinkled here and there in the book. Skippable.

The Trade of Queens (Charles Stross) – final book in Stross’s Merchant Princes series, I basically read this because I had already read the other ones and I wanted to see how it turned out. This series started out pretty interesting (although with many of the same racial blind spots as most SciFi), but I lost interest somewhere around the 3rd book. Recommended if you are OCD about finishing things.

A Fragile Power (Chandra Mukerji) – this is an interesting science studies book on how “big science” works, using two oceanographic institutes (Woods Hole and Scripps, thinly disguised as “Big Lab” and “Blue Lab” for IRB reasons, presumably) as case studies. It describes issues like power dynamics within the lab, tensions between the engineers who make the instruments and the scientists who use them, how the government uses scientists to further its own ends, and so on. Her main thesis is that the state uses its funding power to maintain a sort of “reserve” of scientific expertise — in the case of the military, they have an interest in people who understand and can map the sea floor, and so they fund scientists to train students in the use of these technologies. Although “big science” is not “academic engineering,” many of the observations she makes in her book do seem relevant. I don’t think anyone has done a study of this sort on academic engineering, where there is a lot more industry sponsorship. A good thesis topic for… someone else. Highly recommended.

The Space Between Us (Thrity Umrigar) – a novel about a Parsee widow and her relationship with her servant, who has been with her for a long time. At times touching, and at other times melodramatic, it exemplifies a kind of middle-class sensibility about poverty and class-relations in India. The personal tragedies don’t transform into a larger critique of the system. I’m sure Amitava Kumar would have something more intelligent to say about it. Somewhat recommended (cautiously) if you’re interested in South Asian lit in English.

Unseen Academicals (Terry Pratchett) – a Discworld novel about soccer. Perfect during the World Cup.

The Beautiful Struggle (Ta-Nehisi Coates) – a touching and beautifully written memoir about growing up in Baltimore, raised by an ex-Black Panther father. Coates struggles against the expectations of his father and himself, becoming Conscious and getting the Knowledge that his father prized so much. Highly recommended.

Passport Pictures (Amitava Kumar) – a book about immigration, unheard voices, the politics of knowledge, where the rubber of theory meets the road of reality in postcolonial and transnational studies. The best part about this book is how it questions itself and how it takes what might be arch turn of phrase and actually tries to answer it. For example, “if the word ‘curry’ doesn’t have a stable referent or a fixed origin, how can its changing use by postcolonials be seen as a sign of resistance?” Recommended for those with an interest in immigration from South Asia and a taste for some postcolonial theory (but with examples! And poetry! And photography!)

Inherent Vice (Thomas Pynchon) – this was definitely not what I expected. A kind of potboiler of a detective novel but with more drugs and cultural references than you could shake a stick at. A tribute to LA in some era that I can’t recognize. Much more approachable than the monster tomes that Pynchon has put out (and I have not read), this book was also a real page turner.

The City and The City (China Mieville) – I, like others, am a bit of a China Mieville fanboy, but I think this is one of his better novels (it derailed a bit at the end for me, but I don’t want to give anything away). It’s set in a very strange city — two cities in the same place, to be precise, and it really brings up a lot of interesting questions about society divisions, (un)seeing, and global politics. The allegory doesn’t seem too far from reality. Recommended!

Feller’s anti-Bayesianism

I didn’t really realize that in Feller’s classic probability book he had the following dismissal of Bayesian statistics:

Unfortunately Bayes’ rule has been somewhat discredited by metaphysical applications of the type described above. In routine practice, this kind of argument can be dangerous. A quality control engineer is concerned with one particular machine and not with an infinite population of machines from which one was chosen at random. He has been advised to use Bayes’ rule on the grounds that it is logically acceptable and corresponds to our way of thinking. Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton’s mechanics. In our case it overlooks the circumstance that the engineer desires success and that he will do better by estimating and minimizing the sources of various types of errors in predicting and guessing. The modern method of statistical tests and estimation is less intuitive but more realistic. It may be not only defended but also applied. — W. Feller, 1950 (pp. 124-125 of the 1970 edition)

A few weeks ago, a little note on Feller’s anti-Bayesianism was posted to the arXiv. It’s a bit of an emotional read; a mathematical op-ed, if you will. However, it does present an interesting perspective on historical “received wisdom” in light of more modern approaches to statistics and Bayesian data analysis. As an example, take the methods from Michael Jordan’s talk at ISIT (video and slides are on the ITSOC website now!), which you can cross-validate to check that they are indeed producing correct results on real data.

What I am missing (as an outsider to the internecine battles of statistics) is an even-handed explanation of what all the hullabaloo is about. Such an article probably exists, but I haven’t seen it…

Postdocs at UC have unionized

By now it’s officially official, but postdoc employees across the UC system unionized and got a contract with the university. Here are a few of the good things that came out:

• experience-based minimum salary steps according to the NIH/NRSA pay scale – these are minimum pay guidelines, so PIs who feel generous can of course pay more. Why are minimum pay requirements important? Many postdocs are here for 5 years. With durations like that, the position is not “training,” it’s a job. And therefore we should treat it like a job. Prior to this contract, many postdocs were receiving well below the minimum that NIH recommends, even though they were funded by NIH grants. In addition, you cannot be a postdoc for more than 5 years — after that you should be hired as a staff scientist. Some PIs oppose this, because postdocs are “cheaper” than staff scientists.
• health insurance – the UC administration wanted to slash benefits in a way that would ultimately end up cutting compensation.
• workplace safety – suppose the lab you work in is unsafe, but if you report any violations your PI may fire you. Does that seem fair?

There are a lot of other things in there, especially with regard to time off, parental leave, and so on. There is a pernicious attitude in the sciences that if you have kids while a grad student/postdoc/pre-tenure faculty member you are “not serious about your career.” If you have 6 years of grad school and then 5 years of postdoc and then start a tenure-track job, and wait to have kids until after tenure, you might be 38 or 40. Breaking this attitude is hard, but it really starts with establishing basic expectations and treating employees like people.

And that is why this contract is important. I think of it as a restructuring of the playing field — without rules from the University as a whole, PIs are incentivized to pay postdocs as little as possible and work them as hard as possible, trading on the reputation of their lab and the University to make the deal more palatable. This is not to say most PIs do this, but certainly some do. With this contract there is a minimum set of rules by which PIs have to play, rules which are in fact in accordance with recommendations by funding agencies.

In talking about this with faculty from different places, I’ve heard diverse perspectives on why this is a difficult thing for them to accept: fears of being demonized, or of losing the flexibility they feel is essential to running the kind of research program they desire. These are important concerns, and ones which can and should be explored as this new contract is implemented. However, I have heard no good proposals from them for addressing the real issues faced by postdoctoral employees, whereas this contract does just that.

EVT/WOTE ’10 : Panel on India’s Electronic Voting Machine

I’m attending the…

Panel on Indian Electronic Voting Machines (EVMs)
Moderator: Joseph Lorenzo Hall, University of California, Berkeley and Princeton University
Panelists: P.V. Indiresan, Former Director, IIT-Madras; G.V.L Narasimha Rao, Citizens for Verifiability, Transparency, and Accountability in Elections, VeTA; Alok Shukla, Election Commission of India; J. Alex Halderman, University of Michigan

The first speaker was G.V.L. Narasimha Rao, who is also a blogger on the topic of elections. He is a staunch opponent of Electronic Voting Machines (EVMs). He gave a summary of voting in India — until 1996, all voting was with paper ballots and hand counting. In 1998 there were some EVMs introduced in urban areas, and then in 2004 it moved entirely to EVMs. Vote confirmation was given by a beep, and there were several complaints of machine failure. His claim is that exit polling was accurate prior to 2004 and then after the introduction of EVMs, the exit polls diverged widely from the actual results. In these elections I believe the BJP got a drubbing from Congress (Rao probably got suspicious since he appears to be a BJP political analyst).

Next up was Alok Shukla, the Deputy Election Commissioner of India. He gave an overview of the EVMs in use in India. He gave a review of how India decided to move to EVMs (the Parliament ended up approving the use of EVMs). He claimed that a paper trail was not the solution (mostly due to infeasibility/cost/remoteness of polling locations, etc), and said solutions lie in better transparency and administrative oversight. His main answer to claims that the EVMs have been hacked is that the attacks are infeasible and detectable by election officials. Finally, he said essentially “different systems for different people” (or different strokes for different folks?).

The third speaker was J. Alex Halderman, who is one of the people who attacked the Indian EVM. He described how he got hold of an EVM and showed details of its insides. The first problem is that the devices can be duplicated (or fake ones could be substituted). Another issue is that verifying the code in the EVM is not possible (so they can be tampered with at the time of manufacture). Finally, the reported counts are stored in two EEPROMs which can be swapped out. There are two attacks (at least) that they performed. The first is to hack the display so that false counts are shown on the LED; a bluetooth radio lets a mobile user select who should win. The second is to clip on a device that reprograms the EEPROMs. Full details will appear at CCS. Halderman’s last bit of news was that one of their co-authors in India, Hari K. Prasad, has been summoned by the police as a result of a criminal complaint that he stole the EVM, which seems like an attempt by the government of India to silence its critics. He called upon Shukla, who was rather upset by this public accusation, to drop the suit.

The last panelist was P.V. Indiresan, who is on the advisory committee to the government. He discussed some new security features in EVMs, such as signatures to prevent tampering with the cable between the ballot unit (where people push buttons) and the control unit (which counts the ballots). He claimed that most of the attacks proposed so far are farfetched. Many of his later complaints were to the effect that breaking into the EVM is a criminal act (which amounts to security through obscurity). He ended with a plea asking researchers to stop (!) hacking the EVMs because they “are working.”

To sum up: the Indian government says the system works and that there is no actual evidence of tampering (with the exception of Prasad, who allegedly received stolen goods). Halderman says the attacks show that the system as a whole is not secure, and Rao says that the results are suspicious.

Shukla responded to critics that the Election Commission of India is willing to listen to critics and said that the only kind of attack that is of interest is one on a sealed machine. He reiterated the statement that Prasad was in receipt of stolen government property and needs to be questioned.

The Q&A was quite contentious. I might have more to say about it later… but wow.