Persi Diaconis gave the second annual Billingsley Lecture at UChicago yesterday on the topic of coincidences and what a skeptical statistician/probabilist should say about them. He started out by talking about how Jung was fascinated by paradoxes (apparently there’s one about having fish come up all the time in conversation).

It was mostly a general-audience talk (with some asides about Poisson approximation), and the first part on the birthday problem and variants. Abstracted away, the question is given $n$ balls (people) and $C$ bins/categories (days), how big should $n$ be so that there’s an even chance that two balls land in the same bin? Turns out $n \approx latex 1.2 \sqrt{C}$, as we know, but we can expand this to deal with approximate matches (you need only 7 people to get 2 birthdays in the same week with probability around 1/2). If you want to put a graph on it you can ask social-network coincidence questions and get some scalings as a function of the number of edges and number of categories — here there are $n$ vertices and $C$ colors for the vertices. What these calculations show, of course, is that most coincidences are not so surprising, at least in this probabilistic sense. Some more advanced treatment might be found in Sukhada Fadnavis’s preprint (which also has something about a “shameful conjecture” on chromatic polynomials that was proved in 2000, but I don’t know why it is shameful). The second part of the talk was on problems arising in the study of ESP — namely that experimental controls are not really present, so the notion of a “trial” is hard to pin down, leading (of course) to more false perceptions of coincidences are being surprising. He closed with some remarks about how our perception of coincidence is really about how our minds work, and pointed to some work by Ruma Falk for those who are interested in that angle of things.

I was unaware of this body of Diaconis’s work, and it was nice to have a high-level talk to cap off the day.

Assumptionless consistency of the Lasso
Sourav Chatterjee
The title says it all. Given $p$-dimensional data points $\{ \mathbf{x}_i : i \in [n] \}$ the Lasso tries to fit the model $\mathbb{E}( y_i | \mathbf{x_i}) = \boldsymbol{\beta} \mathbf{x}_i$ by minimizing the $\ell^1$ penalized squared error
$\sum_{i=1}^{n} (y_i - \boldsymbol{\beta} \mathbf{x}_i)^2 + \lambda \| \boldsymbol{\beta} \|_1$.
The paper analyzes the Lasso in the setting where the data are random, so there are $n$ i.i.d. copies of a pair of random variables $(\mathbf{X},Y)$ so the data is $\{(\mathbf{X}_i, Y_i) : i \in [n] \}$. The assumptions are on the random variables $(\mathbf{X},Y)$ : (1) each coordinate $|X_i| \le M$ is bounded, the variable $Y = (\boldsymbol{\beta}^*)^T \mathbf{X} + \varepsilon$, and $\varepsilon \sim \mathcal{N}(0,\sigma^2)$, where $\boldsymbol{\beta}^*$ and $\sigma$ are unknown constants. Basically that’s all that’s needed — given a bound on $\|\boldsymbol{\beta}\|_1$, he derives a bound on the mean-squared prediction error.

On Learnability, Complexity and Stability
Silvia Villa, Lorenzo Rosasco, Tomaso Poggio
This is a handy survey on the three topics in the title. It’s only 10 pages long, so it’s a nice fast read.

Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
Francis Bach
A central challenge in stochastic optimization is understanding when the convergence rate of the excess loss, which is usually $O(1/\sqrt{n})$, can be improved to $O(1/n)$. Most often this involves additional assumptions on the loss functions (which can sometimes get a bit baroque and hard to check). This paper considers constant step-size algorithms but where instead they consider the averaged iterate $\latex \bar{\theta}_n = \sum_{k=0}^{n-1} \theta_k$. I’m trying to slot this in with other things I know about stochastic optimization still, but it’s definitely worth a skim if you’re interested in the topic.

On Differentially Private Filtering for Event Streams
Jerome Le Ny
Jerome Le Ny has been putting differential privacy into signal processing and control contexts for the past year, and this is another paper in that line of work. This is important because we’re still trying to understand how time-series data can be handled in the differential privacy setting. This paper looks at “event streams” which are discrete-valued continuous-time signals (think of count processes), and the problem is to design a differentially private filtering system for such signals.

Gossips and Prejudices: Ergodic Randomized Dynamics in Social Networks
Paolo Frasca, Chiara Ravazzi, Roberto Tempo, Hideaki Ishii
This appears to be a gossip version of Acemoglu et al.’s work on “stubborn” agents in the consensus setting. They show similar qualitative behavior — opinions fluctuate but their average over time converges (the process is ergodic). This version of the paper has more of a tutorial feel to it, so the results are a bit easier to parse.

Learning from transcriptomes can be cheaper for organisms which have never been sequenced.

A fancy Nature article on mobility privacy, in case you weren’t convinced by other studies on mobility privacy.

Bad statistics in neuroscience. Color me unsurprised.

I bet faked results happen a lot in pharmaceutical trials, given the money involved. Perhaps we should jail people for faking data as a disincentive?

The Atheist shoe company did a study to see if the USPS was discriminating against them.

On Lalitha’s recommendation I read Frank Nielsen’s paper “Cramer-Rao Lower Bound and Information Geometry,” which is a survey how C.R. Rao’s work has impacted information geometry. I remember spending some time in grad school trying to learn information geometry (mostly for fun), but since it ended up not being particularly useful in my research, I’m afraid a lot of it has leaked out of my ears. This paper has a short introduction to the Cramer-Rao lower bound and an introduction to information geometry which might be a nice read for some of the readers of this blog. It’s certainly faster than trying to read Amari’s monograph! In particular, it goes over the “highlights” of geodesics and other geometric features on the manifold of probability distributions.

The paper mentions the sub-family of f-divergences known as $\alpha$-divergences, which are given by

$D_{\alpha}(p \| q) = \frac{4}{1 - \alpha^2} \left( 1 - \int p(x)^{(1 - \alpha)/2)} q(x)^{(1 + \alpha)/2} dx \right)$

The KL divergence is $D_{-1}(p \| q)$ — you have to take the limit as $\alpha \to -1$. Within this family of divergences we have the relation $D_{\alpha}(p \| q) = D_{-\alpha}(q \| p)$. Consider a pair of random variables $(X,Y)$ with joint distribution $P_{XY}$ and marginal distributions $P_X$ and $P_Y$. If we take $q = P_X P_Y$ and $p = P_{XY}$ then the mutual information is $D_{-1}( p \| q )$. But we can also take

$D_{-1}( P_{X} P_{Y} \| P_{XY}) = D_1( P_{XY} \| P_{X} P_{Y} )$

Thus it turns out that the “lautum information” defined by Palomar and Verdú is a special case of this: it’s the 1-divergence between the the joint distribution and the product of the marginals. While their paper mentions the lautum information is an f-divergence, it doesn’t discuss this connection to this family of divergences. Nielsen’s paper calls this the “reverse Kullback-Leibler divergence,” but some googling doesn’t seem to indicate that this is a common term, or indeed if it has some use in information geometry. Palomar and Verdú give several operational interpretations of the lautum information.

I tend to flag papers on ArXiV that I want to take a look at in (soon to be defunct, *sniff*) Google Reader. Here are some papers from the last month that I found interesting. I’ll post a few more of these as I work through my backlog…

Local Privacy and Statistical Minimax Rates (John C. Duchi, Michael I. Jordan, Martin J. Wainwright) — this is a paper proving minimax lower bounds for differential privacy. The approach is based on the Fano/Le Cam style of getting minimax bounds by constructing a packing of instances of the problem.

Bernstein – von Mises Theorem for growing parameter dimension (Vladimir Spokoiny) — I’m generally interested in the consistency properties of Bayesian procedures, and this looks at the effect of asymptotically growing the problem size to see how fast the problem can grow while still getting the same consistency from the BvM theorem.

On the problem of reversibility of the entropy power inequality (Sergey G. Bobkov, Mokshay M. Madiman) — More results on the EPI. Reversing it is the same as reversing the Brunn-Minkowski inequality (consider uniform distributions), but there is an interesting impossibility result here (Theorem 1.3): “For any constant $C$, there is a convex probability distribution $\mu$ on the real line with a finite entropy, such that $\min \{ H(X+Y), H(X-Y) \} \ge C H(X)$, where $X$ and $Y$ are independent random variables, distributed according to $\mu$.” The distribution they use is a truncated Pareto distribution but the calculations seem hairy.

A universal, operational theory of unicast multi-user communication with fidelity criteria (Mukul Agarwal, Sanjoy Mitter, Anant Sahai) — This is the culmination of Mukul’s work starting from a very nice paper I cite all the time from Allerton. There are several results and commentary in here — there’s a fair bit of philosophy, so it’s worth a more patient read than I could give it so far (only so many hours in the day, after all!)

The Convergence Rate of Majority Vote under Exchangeability (Miles E. Lopes) — The title says it all, really. The bounds are actually in terms of the mixture distribution of the exchangeable sequence of Bernoulli votes.

A rather pretty video of an L-system made by my friend Steve.

LACMA, which I finally saw with a friend in February, has decided to offer high-resolution downloads of many of the items in its collection. This Ganesha has a pretty impressive belly. Via MeFi.

This may answer David Bowie’s question.

This slideshow makes me want to go to Slurping Turtle again.

Sometimes I wish we could just name p-values something else that is more descriptive. There’s been a fair bit of misunderstanding about them going on lately.

I think this is the end of my ITA blogging! But there were some issues that came up during the conference that may be of interest to some of the readers of this blog (although from anecdotal reports, there are many people who read but never comment, so I’m not sure what to do to encourage more discussions).

Having seen a talk recently by John Ioannidis on how medical research is (often) bunk, this finer corrective by Larry Wasserman was nice to read.

Computer science conferences are often not organized by the ACM, but instead there are different foundations for machine learning and vision and so on that basically exist to organize the annual conference(s). At least, that is what I understand. There are a few which are run by the ACM, and there’s often debate about whether or not the ACM affiliation is worth it, given the overheads and so on. Boaz Barak had a post a little over a week ago making the case for sticking with the ACM. Given the hegemonic control of the IEEE on all things EE (more or less), this debate is new to me. As far as I can tell, ISIT exists to cover some of the cost of publishing the IT Transactions, and so it sort of has to be run by IEEE.

As mentioned before, Tara Javidi has a nice post up on what it means for one random variable to be stochastically less variable than another.

Paul Miniero has a bigger picture view of NIPS — I saw there were lots of papers on “deep learning” but it’s not really my area so I missed many of those posters.

David Eppstein’s top 10 cs.DS papers from 2012.

Here’s a roundup of some interesting posts/pages on technical things.

Over at Larry Wasserman’s blog, Rob Tibshirani suggests 9 Great Statistics papers published after 1970. You know, in case you were looking for some light reading over winter break.

Videos from the DIMACS Differential Privacy Workshop are up.

All of these ads for jobs this year want someone who works on Big Data. But… do you really have big data? Or, as I like to ask, “how big is big, anyway?”

Speaking of big data, this talk by Peter Bartlett looks cool. (h/t Andrew Gelman)

Max Raginsky and Igal Sason have a tutorial on measure concentration. Log Sobolev inequalities are a dish best served cold.

I’ll probably do an ArXiV roundup sometime soon — trying to catch up on a backlog of reading and thinking lately.

This is an amazing video that makes me miss the Bay Area. (via Bobak Nazer)

Also via Bobak, we’re number 8 and 10!

Since it’s holiday season, I figured it’s time to link to some profanity-laden humor about the holidays. For the new, The Hater’s Guide to the Williams-Sonoma Catalog, and the classic It’s Decorative Gourd Season….

A Game of Food Trucks. (via MetaFilter)

Larry Wasserman takes on the Bayesian/Frequentist debate.

My friend Erik, who started the Mystery Brewing Company, has a blog called Top Fermented. He is now starting a podcast, which also has an RSS feed.