ITA 2011 : zombie blogging

I came down with the flu at the tail end of ITA, so I proceeded to fail at a bunch of things, like submitting reviews that were due, writing a submission for ISIT, and blogging the ITA Workshop in time. Cosma already blogged about his favorite talks, beating me to the punch.

I basically missed the first day because of technical glitches with our registration system, but once that was all resolved things went a bit more smoothly (at least I hope they did). The poster session seemed well-attended, and we shot videos of all the posters, which will be posted at some point. Ofer did a great job arranging the Graduation Day and poster events.

The thing about these conferences is that you end up wanting to talk to people you haven’t seen in a while, and it’s good to hammer out research ideas during the daytime, so I only made it to the keynote and Xiao-Li Meng’s tutorial on MCMC. I felt like I followed the tutorial at the beginning, but even Prof. Meng’s engaging speaking style lost me when it came to modeling something about stars (?). Videos of the tutorials will be posted soon enough as well, and I’ll probably make a post about those. For those who were at the entertainment program, of course the video for that was top priority. For the small number of blog readers who wish to know what I was making:

  • 2 oz. Maker’s Mark bourbon
  • 1 oz. Carpano Antica formula vermouth
  • 1 dash Angostura bitters
  • 1 dash Regan’s No. 6 orange bitters

Shaken with ice, served up with a cherry. I opted for a bourbon Manhattan with a cherry rather than a rye Manhattan with an orange twist (or without garnish) because it was more convenient, and also more 1960s than craft cocktail.

But on to the talks! I did manage to drag my lazy butt to some of them.

Improved rate-equivocation regions for secure cooperative communication
Ninoslav Marina, Hideki Yagi, H. Vincent Poor
They looked at a model where you have a transmitter and also a “blind” helper who is trying to help communicate over a wiretap channel. They show a better achievable rate-equivocation region by introducing another auxiliary random variable (big surprise!), but this doesn’t affect the best secrecy rate. So if you are willing to tolerate less than full equivocation at the eavesdropper then you’ll get an improvement.

Shannon’s inequality
S. Verdú, Princeton
Sergio talked about an alternative to Fano’s inequality used by Shannon:
P_e \ge \frac{1}{6} \frac{ H(X|Y) }{ \log M + \log \log M - \log H(X|Y) }
It was a nice talk, and the kind of talk I think is great at ITA. It’s not a new result, but ITA is a place where you can give a talk that explains some cool connection or new idea you have.
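Since the bound is easy to evaluate, here is a quick numerical sketch comparing it to the familiar weakened form of Fano’s inequality. I’m assuming base-2 logarithms throughout, which may not match the convention in the talk.

```python
import math

def shannon_bound(h_cond, M):
    # Shannon's lower bound on error probability, as quoted above,
    # with H(X|Y) in bits and all logs base 2 (my assumption)
    return (h_cond / 6) / (math.log2(M) + math.log2(math.log2(M)) - math.log2(h_cond))

def fano_bound(h_cond, M):
    # the usual weakened form of Fano's inequality, for comparison
    return (h_cond - 1) / math.log2(M)
```

For H(X|Y) = 3 bits and M = 16 messages the two give different numbers, and neither dominates in general; note that Shannon’s bound stays positive even when H(X|Y) < 1 bit, where the weakened Fano bound becomes vacuous.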

On the zero-error capacity threshold for deletion channels
Ian A. Kash, Michael Mitzenmacher, Justin Thaler, Jon Ullman
A nice piece of work on connecting zero-error capacity for deletion channels with longest common subsequences. The error model is adversarial. You can make a graph where each vertex is a length-n binary string, and connect two vertices if the two strings have a longest common subsequence of length at least (1 - p)n. If two strings are connected then they can’t be in the same code, since an adversary could delete pn bits from each and produce the common subsequence (note : subsequence, not substring). So you can get a bound on the capacity by bounding the largest independent set in this graph. So then you can use… Turán’s theorem! Hooray! There are more results of course…
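To make the construction concrete, here is a toy version for tiny n (the parameter choices are mine, not the paper’s): build the confusability graph on length-n binary strings and apply Turán’s theorem to lower-bound the size of a zero-error code, i.e., an independent set.

```python
import math
from itertools import product

def lcs(a, b):
    # classic dynamic program for the longest-common-subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def turan_code_bound(n=6, p=0.5):
    # vertices: all length-n binary strings; edge = confusable pair,
    # i.e., LCS at least (1-p)n, so a zero-error code is an independent set
    thresh = math.ceil((1 - p) * n)
    strings = [''.join(s) for s in product('01', repeat=n)]
    N = len(strings)
    edges = sum(1 for i in range(N) for j in range(i + 1, N)
                if lcs(strings[i], strings[j]) >= thresh)
    avg_deg = 2 * edges / N
    # Turan's theorem: independence number >= N / (1 + average degree)
    return N / (1 + avg_deg)
```

For n = 6 and p = 1/2 the graph is very dense, so the Turán bound is weak; the paper of course goes well beyond this naive counting.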

Data-driven decision making in healthcare systems
Mohsen Bayati, Stanford, Mark Braverman, U Toronto, Michael Gillam, Microsoft, Mark Smith, MedStar Health, and Eric Horvitz, Microsoft Research
This was a nice talk on doing feature selection via ideas from sparse signal processing/machine learning. The goal is to find a small set of features to help predict whether a patient is at high or low risk of being readmitted soon after discharge from the hospital; the catch is that the number of candidate features is huge while the number of data points is small. They fit an L1-penalized logistic regression and then derive a threshold based on the cost of an intervention (e.g. house visits for high-risk patients).
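The pipeline can be sketched in a few lines on synthetic data. Everything below (dimensions, penalty, step size, the proximal-gradient solver) is my own illustration, not the study’s actual setup: fit an L1-penalized logistic regression and keep the features with nonzero weights.

```python
import math, random

random.seed(0)
n, d = 150, 25                      # few samples, many candidate features
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# only the first three features actually drive the (synthetic) readmission risk
y = [1 if 1 / (1 + math.exp(-(2*r[0] - 2*r[1] + 2*r[2]))) > random.random() else 0
     for r in X]

def fit_l1_logistic(X, y, lam=0.05, step=0.1, iters=300):
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # gradient of the average logistic loss
        g = [0.0] * d
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(d):
                g[j] += (p - yi) * xi[j] / n
        # proximal step: gradient descent then soft-threshold (the L1 prox)
        u = [wj - step * gj for wj, gj in zip(w, g)]
        w = [math.copysign(max(abs(uj) - step * lam, 0.0), uj) for uj in u]
    return w

w = fit_l1_logistic(X, y)
selected = [j for j in range(d) if abs(w[j]) > 1e-8]
```

The selected set comes out much smaller than d and contains the truly predictive features; the paper then compares the fitted risk score against a threshold derived from the intervention cost.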

Tracking climate models: advances in Climate Informatics
Claire Monteleoni, Columbia CCLS, Gavin A. Schmidt, NASA and Columbia, Shailesh Saroha, and Eva Asplund, Columbia Computer Science
This was an overview of Claire’s work on climate informatics. The basic problem was this : given several models (large-scale simulated systems based on PDEs etc. derived from physics) that predict future temperature, how should you combine them to produce more accurate predictions? She used some tools from her previous work on HMMs to get a system with better prediction accuracy.
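Her actual algorithms come from the HMM-based tracking work, but the flavor of the problem — combining expert (model) forecasts online, shifting weight toward whichever model has been predicting well — can be sketched with a generic exponentially weighted forecaster. This is my illustration, not her algorithm.

```python
import math

def combine_models(predictions, truths, eta=0.5):
    # predictions[t] = list of the k models' forecasts at time t;
    # weights shift multiplicatively toward models with low squared loss
    k = len(predictions[0])
    w = [1.0 / k] * k
    combined = []
    for preds, y in zip(predictions, truths):
        combined.append(sum(wi * pi for wi, pi in zip(w, preds)))
        w = [wi * math.exp(-eta * (pi - y) ** 2) for wi, pi in zip(w, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return combined
```

With two constant “models” predicting 1 and 0 against a truth that is always 1, the combined forecast starts at 0.5 and moves toward 1 as the better model accumulates weight.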

On a question of Blackwell concerning hidden Markov chains
Ramon van Handel
The problem is to estimate the entropy rate of a process that is a function of a Markov chain (and hence not, in general, a Markov chain itself). “Does the conditional distribution of an ergodic hidden Markov chain possess a unique invariant measure?” This was a great talk for the Blackwell session because it started from a question posed by Blackwell and then revisited a few of his other works. Pretty amazing. Oh, and the paper (or one of them).

I think more talks will have to wait for another time (if ever).

Linkage (and a puzzle)

I saw Scott’s talk today on some complexity results related to his and Alex Arkhipov’s work on linear optics. I missed the main seminar but I saw the theory talk, which was on how hard it is to approximate the permanent of a matrix X whose entries (X_{ij}) are drawn iid complex circularly-symmetric Gaussian \mathcal{CN}(0,1). In the course of computing the fourth moment of the permanent, he gave the following cute result as a puzzle. Given a permutation \sigma of length n, let c(\sigma) be the number of cycles in \sigma. Suppose \sigma is drawn uniformly from the set of all permutations. Show that

\mathbb{E}[ 2^{c(\sigma)}] = n + 1.

At least I think that’s the statement.
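The identity is easy to check by brute force for small n (it follows from \mathbb{E}[x^{c(\sigma)}] being a rising factorial, but exhaustive enumeration is more fun):

```python
import math
from itertools import permutations

def cycle_count(perm):
    # number of cycles of a permutation given in one-line notation
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

def avg_power(n):
    # E[2^{c(sigma)}] over a uniformly random permutation of n elements
    total = sum(2 ** cycle_count(p) for p in permutations(range(n)))
    return total / math.factorial(n)

# avg_power(3) → 4.0, and in general avg_power(n) = n + 1
```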

In other news…

  • Ken Ono has announced (with others) an algebraic formula for partition numbers. Very exciting!
  • Cosma thinks that New Yorker article is risible, but after talking to a number of people about it, I realized that the writing is pretty risible (and that I had, at first pass, skimmed to the part which I thought was good to report in the popular (or elitist) press, namely the bias towards positive results). Andrew Gelman points out that he has written about this before, but I think the venue was the more interesting part here. What was risible about the writing is that it starts out in this “ZOMG OUR SCIENCE POWERZ ARE FAAAAAAADINNNNNNGGGGGGG” mode, and then goes on to say slightly more reasonable things. It’s worthy of the worst of Malcolm Gladwell.
  • Tofu is complicated.
  • The 90-second Newbery contest.

Quote of the day

From Paul Ellis’s The Essential Guide to Effect Sizes:

The post hoc analysis of nonsignificant results is sometimes painted as controversial (e.g. Nakagawa and Foster 2004), but it really isn’t. It is just wrong. There are two small technical reasons and one gigantic reason why the post hoc analysis of observed power is an exercise in futility…

… and then some more stuff on p-values and the like. Somehow, reading applied statistics books makes my brain glaze over, but at least you get a giggle now and then.

Linkage

It seems as good a time as any to link to this chestnut from McSweeney’s. Warning : full of highly profane language.

Did you know Krish Eswaran has a blog? Neither did I, until today. He appears to not be updating it, however. C’mon Krish, more posts!

Andrew Gelman pins down one of the things that annoys me about arguments based on personal finance — it’s not true that we do things for money or for fun, unless “fun” is really broadly construed. Plus there’s this zinger at the end: “as a statistician, I’m not impressed with an argument when it doesn’t work on the example it’s been applied to.”

A pretty cool video on hand-pulled noodles.

More chaconne than you can handle.

Via CT, an amazing cartoon in which Donald Duck meets Glenn Beck.

How to write about Pakistan, inspired by the classic How to write about Africa.

Allerton 2010 : the only talks I’ll blog about

Hey, lookit! I’m blogging about a conference somewhat near to when the conference happened!

I’m just going to write about a few talks. This is mostly because I ended up not taking as many notes this year, but also because writing up extensive notes on talks is a bit too time-consuming. I found the talks by Paul Cuff, Sriram Vishwanath, Raj Rajagopalan, and others interesting, but took no notes. And of course I enjoyed the talks by my “bosses” at UCSD, Alon Orlitsky and Tara Javidi. That’s me being honest, not me trying to earn brownie points (really!).

So here are five talks that I thought were interesting and took some notes on.


Feller’s anti-Bayesianism

I hadn’t realized that Feller, in his classic probability book, had the following dismissal of Bayesian statistics:

“Unfortunately Bayes’ rule has been somewhat discredited by metaphysical applications of the type described above. In routine practice, this kind of argument can be dangerous. A quality control engineer is concerned with one particular machine and not with an infinite population of machines from which one was chosen at random. He has been advised to use Bayes’ rule on the grounds that it is logically acceptable and corresponds to our way of thinking. Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton’s mechanics. In our case it overlooks the circumstance that the engineer desires success and that he will do better by estimating and minimizing the sources of various types of errors in predicting and guessing. The modern method of statistical tests and estimation is less intuitive but more realistic. It may be not only defended but also applied.” — W. Feller, 1950 (pp. 124-125 of the 1970 edition)

A few weeks ago, a little note on Feller’s anti-Bayesianism was posted to the arXiv. It’s a bit of an emotional read; a mathematical op-ed, if you will. However, it does present an interesting perspective on historical “received wisdom” in light of more modern approaches to statistics and Bayesian data analysis. As an example, take the methods from Michael Jordan’s talk at ISIT (video and slides are on the ITSOC website now!): you can do some cross-validation to see that they do indeed produce correct results on real data.

What I am missing (as an outsider to the internecine battles of statistics) is an even-handed explanation of what all the hullabaloo is about. Such an article probably exists, but I haven’t seen it…

EVT/WOTE ’10 : the keynote

I am at EVT/WOTE (Electronic Voting Technology Workshop/Workshop on Trustworthy Elections) today and tomorrow, and will try to blog about it a bit. The keynote today was given by Donetta Davidson, who runs the Election Assistance Commission. She gave an overview of the EAC’s activities and priorities. The Q&A has focused a bit on how voting research is underfunded and that CS researchers want the EAC to lobby for more research funding. I guess some things don’t change much.

David Blackwell has passed away

Via Inherent Uncertainty I learned that David Blackwell passed away on July 8th.

Prof. Blackwell’s original paper (with Leo Breiman and Aram Thomasian) on the arbitrarily varying channel was an inspiration to me, and he served on my thesis committee a scant 2 years ago.

I’ll always remember what he told me when I handed him a draft of my thesis. “The best thing about Bayesians is that they’re always right.”

ISIT 2010 : a few more talks

I think I lack the willpower to write up more notes on talks, and there are other things I’d like to blog about, but here are one or two sentences on some other talks that I found interesting… I also enjoyed the energy session on Friday and the talks by Sundeep and Galen on compressed sensing, but time has gotten the better of me. Next time, folks.

Channel Intrinsic Randomness
Matthieu Bloch
This was on extracting random bits from the output of a noisy channel. These bits should be independent of the input to the channel. Matthieu uses the enigmatic information spectrum method to get his results — thanks to the plenary lecture I was able to understand it a bit better than I might have otherwise.

Assisted Common Information
Vinod Prabhakaran and Manoj Prabhakaran
I was very interested in this talk because I have been thinking of a related problem. Two terminals observe correlated sources X_1^n and Y_1^n respectively. A genie observes both sources and sends messages at rates R_1 and R_2 to the two terminals, who then have to produce variables W_1 and W_2 which have large entropies and are also equal with high probability. This problem is connected to the Gács-Körner problem and Wyner’s common information problem, and also possibly to this recent preprint by Andrej Bogdanov and Elchanan Mossel. They manage to solve it using a novel construction of “monotone regions.”

Patterns and Exchangeability
N. P. Santhanam and M. Madiman
This work grew out of the AIM workshop on permanents. De Finetti’s theorem says an exchangeable process can be built up as a mixture of iid processes. Kingman showed that something called an exchangeable partition process is built up from something he called “paintbox processes.” One thing that we discovered at the workshop was that the pattern process of an iid process is the same as a paintbox process (and vice-versa). The paper then goes through many connections between these processes, certain limits of graphs, and connections to universal compression.

Universal Hypothesis Testing in the Learning-Limited Regime
Benjamin G. Kelly, Thitidej Tularak, Aaron B. Wagner, and Pramod Viswanath
This was a really great talk. The setup is that for each n you get n samples X^n drawn i.i.d. according to p_n or q_n on an alphabet \mathcal{A}_n which can grow with n. Given a new sequence Z^n, you have to decide whether it was generated according to p_n or q_n. They show a number of results: consistency is possible when |\mathcal{A}_n| is sublinear in n, impossible when it is quadratic in n, with other intermediate results in between. In particular, for well-behaved distributions with |\mathcal{A}_n| = \Theta(n^{\alpha}) in which all probabilities are \Theta(n^{-\alpha}), they can get some consistency results, but the generalized likelihood ratio test (GLRT) is inconsistent for \alpha = 1.

Feature Extraction for Universal Hypothesis Testing via Rank-Constrained Optimization
Dayu Huang and Sean Meyn
This talk was of interest to me because I have been looking at hypothesis testing problems in connection with election auditing. In universal hypothesis testing you know a lot about the distribution under one hypothesis, but much less about the other. The Hoeffding test is a threshold test on the KL divergence between the empirical distribution and the known hypothesis. This test is asymptotically optimal but has high variance when the data are high-dimensional, so for smaller sample sizes a so-called mismatched divergence test may be better. In this paper they look at how to trade off the variance and the error exponent of the test.
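For concreteness, here is a minimal version of the Hoeffding test on a finite alphabet (my own sketch; the known hypothesis is assumed to have full support so the divergence stays finite):

```python
import math
from collections import Counter

def kl(p, q):
    # D(p || q) in nats, with the convention 0 log 0 = 0
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

def hoeffding_test(samples, q, threshold):
    # reject the known hypothesis q when the empirical distribution
    # of the samples is too far from it in KL divergence
    n = len(samples)
    empirical = {a: c / n for a, c in Counter(samples).items()}
    return kl(empirical, q) > threshold
```

For example, 90 a’s and 10 b’s tested against a uniform q over {a, b} gives a divergence of about 0.37 nats and is rejected at threshold 0.1, while a 50/50 sample is not.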