I figured I would blog about this week’s workshop at Banff in a more timely fashion. Due to the scheduling of flights out of Calgary, I will have to miss the last day of talks. The topics of people’s presentations varied rather widely, and many were not about the Good-Turing estimator sort of setup. Sometimes it was a bit hard to see how the problems or approaches were related (not that they had to be directly). Given that the crowd had widely varying backgrounds, presenters had a hard time because the audience had to absorb a new set of notation or a new approach for every talk. The advantage is that there were lots of questions; the disadvantage is that people insisted on “finishing” their presentations. By mid-week my brain was over-full, and a Wednesday afternoon hike up Sulphur Mountain was the perfect solution.
Tag Archives: Bayesians
Bayesianism in philosophy
In an effort to get myself more philosophically informed with regard to probability and statistics, I’ve been reading about various notions and their discontents, such as symmetry, or Bayesianism, or p-values. I was delighted to find this recent pair of papers (part I, part II) by fellow Berkeley-ite and occasional puzzle-partner Kenny Easwaran (now a prof at USC) on Bayesianism in Philosophy Compass. In the first paper he goes through the basic tenets of Bayesian approaches to probability in terms of subjective belief, and their philosophical justification via rational actions or “Dutch book” arguments and representation theorems. What’s also interesting from a scientific view (somewhat off-topic from the article) is the angle being advanced (some might say “pushed”) by some cognitive scientists that people are actually doing some kind of Bayesian conditionalization in certain tasks (here’s a plug for my buddy Pradeep’s work). The second article talks about the difficulties in developing a consistent and quantitative “confirmation theory” in Bayesianism. In different fields there are different questions about how to do this, and as Kenny points out, the anti-Bayesians in different fields are different: the null position is not necessarily frequentism.
They’re a relatively quick read, and I think they provide some different perspectives for those of us who usually see these concepts in our little fiefdoms.
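For readers who haven't seen a “Dutch book” argument, here is a minimal sketch of the simplest case, not taken from Kenny's papers. It assumes an agent who treats their credence in an event as the fair price of a bet paying one unit if the event happens, and who is willing to buy or sell at that price. The function name `sure_loss` and the numbers are made up for illustration.

```python
def sure_loss(credence_a, credence_not_a, stake=1.0):
    """Guaranteed loss for an agent whose credences in A and not-A
    do not sum to 1, when a bookie trades against them.

    Assumption: the agent prices a bet paying `stake` on an event at
    stake * credence, and will take either side at that price.
    Exactly one of A, not-A occurs, so the pair of bets pays `stake`.
    """
    total_price = stake * (credence_a + credence_not_a)
    # If the prices sum to more than the payout, the bookie sells both
    # bets to the agent; if less, the bookie buys both from the agent.
    # Either way the agent loses the gap, whatever happens.
    return abs(total_price - stake)


# Incoherent credences: P(A) = 0.6 and P(not A) = 0.6.
loss = sure_loss(0.6, 0.6)
```

With coherent credences (summing to one) the gap is zero and this particular book cannot be made, which is the sense in which the argument is supposed to justify the probability axioms.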
Allerton 2010 : the only talks I’ll blog about
Hey, lookit! I’m blogging about a conference somewhat near to when the conference happened!
I’m just going to write about a few talks. This is mostly because I ended up not taking as many notes this year, but also because writing up extensive notes on talks is a bit too time consuming. I found the talks by Paul Cuff, Sriram Vishwanath, Raj Rajagopalan, and others interesting, but took no notes. And of course I enjoyed the talks by my “bosses” at UCSD, Alon Orlitsky and Tara Javidi. That’s me being honest, not me trying to earn brownie points (really!).
So here are 5 talks which I thought were interesting and on which I took some notes.
Lossless compression via the memoizer
Via Andrew Gelman comes a link to deplump, a new compression tool. It runs the data through a predictive model (like most lossless compressors), but:
Deplump compression technology is built on a probabilistic discrete sequence predictor called the sequence memoizer. The sequence memoizer has been demonstrated to be a very good predictor for discrete sequences. The advantage deplump demonstrates in comparison to other general purpose lossless compressors is largely attributable to the better guesses made by the sequence memoizer.
The paper on the sequence memoizer (by Wood et al.) appeared at ICML 2009, with follow-ups at DCC and ICML 2010. It uses as its probabilistic model a version of the Pitman-Yor process, which is a generalization of the “Chinese restaurant”/“stick-breaking” process. Philosophically, the idea seems to be this: since we don’t know the order of the Markov process which best models the data, we will let the model order be “infinite” using the Pitman-Yor process and just infer the right parameters, hopefully avoiding overfitting while being efficient. The key challenge is that since the process can have infinite memory, the encoding seems to get hairy, which is why “memoization” becomes important. It seems that the particular parameterization of the PY process is important to reduce the number of parameters, but I didn’t have time to look at the paper in that much detail. Besides, I’m not as much of a source coding guy!
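To make the “Chinese restaurant” picture concrete, here is a rough sketch of the seating rule for a single two-parameter (Pitman-Yor) restaurant. This is not the sequence memoizer, which stacks such processes hierarchically over contexts, and it is not code from the paper. The function name `pitman_yor_crp` and its arguments are my own. With n customers already seated at K tables, a new customer joins table k with probability proportional to (n_k - d) and opens a new table with probability proportional to (theta + K d), where d is the discount and theta the concentration.

```python
import random


def pitman_yor_crp(num_customers, discount, concentration, seed=0):
    """Simulate Pitman-Yor Chinese restaurant seating.

    Returns the list of table sizes after all customers are seated.
    Requires 0 <= discount < 1 and concentration > -discount.
    """
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        if not table_sizes:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        # Unnormalized weights: (n_k - d) for each existing table,
        # then (theta + K * d) for a new table.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * len(table_sizes))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


# Hypothetical parameter values, chosen only for illustration.
sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
```

Setting the discount to zero recovers the ordinary one-parameter Chinese restaurant process; a positive discount makes new tables more likely as the number of tables grows, which gives the heavier, power-law-like tails in table sizes.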
I tried it out on Leo Breiman’s paper Statistical Modeling: The Two Cultures. Measured in bytes:
307458  Breiman01StatModel.pdf      original
271279  Breiman01StatModel.pdf.bz2  bzip2 (Burrows-Wheeler transform)
269646  Breiman01StatModel.pdf.gz   gzip
269943  Breiman01StatModel.pdf.zip  zip
266310  Breiman01StatModel.pdf.dpl  deplump
As promised, it is better than the alternatives, but not by much for this example: the deplump output is only about 1.2% smaller than the gzip output, the next best.
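For anyone who wants the comparison as percentages, here is a small sketch that just does the arithmetic on the byte counts listed above. The helper name `percent_saved` is my own; nothing here runs deplump or any other compressor.

```python
# Byte counts copied from the listing above.
ORIGINAL_BYTES = 307458
COMPRESSED_BYTES = {
    "bzip2": 271279,
    "gzip": 269646,
    "zip": 269943,
    "deplump": 266310,
}


def percent_saved(compressed, original):
    """Percentage of the original size removed by compression."""
    return 100.0 * (1.0 - compressed / original)


# Savings relative to the original file, per tool.
savings = {
    name: percent_saved(size, ORIGINAL_BYTES)
    for name, size in COMPRESSED_BYTES.items()
}

# Tool with the smallest output.
best_tool = min(COMPRESSED_BYTES, key=COMPRESSED_BYTES.get)

for name, pct in sorted(savings.items(), key=lambda item: -item[1]):
    print(f"{name:8s} saves {pct:5.2f}% of the original size")
```

All four tools land between roughly 11.8% and 13.4% savings on this one file, so the ranking here is a single data point rather than a benchmark.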
What is interesting is that they don’t seem to cite much from the information theory literature. I’m not sure if this is a case of two communities working on related problems and being unaware of the connections, or if the problems are secretly not related, or if information theorists mostly “gave up” on this problem (I doubt this, but like I said, I’m not a source coding guy…)
I didn’t really realize that in Feller’s classic probability book he had the following dismissal of Bayesian statistics:
“Unfortunately Bayes’ rule has been somewhat discredited by metaphysical applications of the type described above. In routine practice, this kind of argument can be dangerous. A quality control engineer is concerned with one particular machine and not with an infinite population of machines from which one was chosen at random. He has been advised to use Bayes’ rule on the grounds that it is logically acceptable and corresponds to our way of thinking. Plato used this type of argument to prove the existence of Atlantis, and philosophers used it to prove the absurdity of Newton’s mechanics. In our case it overlooks the circumstance that the engineer desires success and that he will do better by estimating and minimizing the sources of various types of errors in predicting and guessing. The modern method of statistical tests and estimation is less intuitive but more realistic. It may be not only defended but also applied.” (W. Feller, 1950; pp. 124-125 of the 1970 edition)
A few weeks ago, a little note on Feller’s anti-Bayesianism was posted to arXiv. It’s a bit of an emotional read; a mathematical Op-Ed if you will. However, it does present an interesting perspective on historical “received wisdom” in light of more modern approaches to statistics and Bayesian data analysis. As an example, take the methods from Michael Jordan’s talk at ISIT (video and slides on the ITSOC website now!): you can do some cross-validation to see that they are indeed producing the correct results on real data.
What I am missing (as an outsider to the internecine battles of statistics) is an even-handed explanation of what all the hullabaloo is about. Such an article probably exists, but I haven’t seen it…