estimating probability metrics from samples

I took a look at this interesting paper by Sriperumbudur et al., On the empirical estimation of integral probability metrics (Electronic Journal of Statistics Vol. 6 (2012) pp.1550–1599). The goal of the paper is to estimate a distance or divergence between two distributions P and Q based on samples from each distribution. This sounds pretty vague at first… what kind of distributions? How many samples? This paper looks at integral probability metrics, which have the form

\gamma(P,Q) = \sup_{f \in \mathcal{F}} \left| \int_{S} f dP - \int_{S} f dQ \right|

where S is a measurable space on which P and Q are defined, and \mathcal{F} is a class of real-valued bounded measurable functions on S. This class doesn’t contain Csiszár \phi-divergences (also known as Ali-Silvey distances), but does contain the total variational distance.

Different choices of the function class give rise to different measures of difference used in so-called two-sample tests, such as the Kolmogorov-Smirnov test. The challenge in practically using these tests is that it’s hard to get bounds on how fast an estimator of \gamma(P,Q) converges if we have to estimate it from samples of P and Q. The main result of the paper is to provide estimators with consistency and convergence guarantees. In particular, they estimators are based on either linear programming or (in the case of kernel tests) in closed form.

The second section of the paper connects tests based on IPMs with the risk associated to classification rules for separating P and Q when the separation rule is restricted to come from the function class \mathcal{F} associated to the rule. This is a nice interpretation of these two-sample tests — they are actually doing similar things for restricted classes of classifiers/estimators.

Getting back to KL divergence and non-IPM measures, since total variation gives a lower bound on the KL divergence, they also provide lower bounds on the total variation distance in terms of other IPM metrics. This is important since the total variation distance can’t be estimated itself in a strongly consistent way. This could be useful for algorithms which need to estimate the total variation distance for continuous data. In general, estimating distances between multivariate continuous distributions can become a bit of a mess when you have to use real data — doing a plug-in estimate using, e.g. a kernel density estimator is not always the best way to go, and instead attacking the distance you want to measure directly could yield better results.

Bounds between common information and mutual information

I’m at EPFL right now on my way back from a trip to India. My last work-related stop there was the Tata Institute of Fundamental Research, where I am working with Vinod Prabhakaran on some follow-up work to our ISIT paper this year on correlated sampling. TIFR has a lovely campus in Navy Nagar, right next to the ocean in the south of Mumbai. It’s a short walk across the lawn to the ocean:

The beach at TIFR, looking north to the haze-obscured skyline of Mumbai

The beach at TIFR, looking north to the haze-obscured skyline of Mumbai

The grounds themselves are pretty rigorously landscaped and manicured:

The main building seen across the lawn in front of the ocean.

The main building seen across the lawn in front of the ocean.

Earlier in my trip I visited IIT-Madras, hosted by Andrew Thangaraj and Radha Krishna Ganti and IIT-Bombay, where I was working with Bikash Dey on extensions to some of our AVCs-with-delay work. It’s been a great trip and it’s nice to visit so many different schools. The challenges facing Indian higher education are very different than those in the US, and it’s a good change of perspective from my usual US-centric view of things. Maybe I’ll write more on that later.

But for now, I wanted to get back to some actual technical stuff, so I thought I’d mention something that came up in the course of my discussions with Vinod today. For a joint distribution P(X,Y), Wyner defined the common information in the the following way:

\mathsf{CI}(X;Y) = \min_{X--U--Y} I(XY ; U)

One fun fact about the common information is that it’s larger than the mutual information, so I(X;Y) \le \mathsf{CI}(X;Y). One natural question that popped up as part of a calculation we had to do was whether for doubly-symmetric binary sources we could have a bound like

\mathsf{CI}(X;Y) \le \alpha I(X;Y)

for some “nice” \alpha. In particular, it would have been nice for us if the inequality held with \alpha = 2 but that turns out to not be the case.

Suppose (X,Y) and are a doubly-symmetric binary source, where Y is formed from X \sim \mathsf{Bern}(1/2) by passing it through a binary symmetric channel (BSC) with crossover probability \alpha. Clearly the mutual information is I(X;Y) = 1 - h(\alpha). For the common information, we turn to Wyner’s paper, which says \mathsf{CI}(X;Y) = 1 + h(\alpha) - 2 h( \frac{1}{2}( 1 - \sqrt{1 - 2 \alpha} ) ), which is a bit of a weird expression. Plotting the two for \alpha between 0 and 1/2 we get:

Mutual and Common Information for a DSBS

Mutual and Common Information for a DSBS

If we plot the ratio \mathsf{CI}(X;Y)/I(X;Y) versus \alpha, we get the following:

Ratio of Common to Mutual Information for a DSBS

Ratio of Common to Mutual Information for a DSBS

The ratio blows up! So not only is Common Information larger than mutual information, it can be an arbitrarily large factor larger than the mutual information.

It might seem like this is bad news for us, but it just made the problem we’re working on more interesting. If we get any results on that I might be less engimatic and blog about it.


I am traveling all over India at the moment so I’m not really able to write contentful posts. Here are even more links instead, sigh. Maybe later I’ll talk about log-Sobolev inequalities so I can be cool like Max.

Speaking of Max, he posted this hilarious bad lip reading version of Game of Thrones. Probably NSFW. I don’t even like the series but it’s pretty funny.

For those who are fans of Rejected, Don Hertzfeldt’s new film is available on Vimeo.

Those who were at Berkeley may remember seeing Ed Reed perform at the Cheeseboard. His album (which I helped fund via indiegogo, was named a Downbeat Editors’ Pick. It’s a great album.

In light of the Snowden leaks, some doubt has been cast on NIST’s crypto standards.

I’m super late to this, but I endorse Andrew’s endorsement of Sergio‘s interview with Robert Fano in the IT Newsletter. Here’s just the article, if you want that.


Avleen Bijral, one of the students here at TTIC, sent me a link to a pretty cool browser plugin for GMail called GMail TeX. Basically it lets you send LaTeX formatted emails via gmail. This can be quite a time saver for the reader (but maybe not the writer) if you’re remotely collaborating on some project (which I so often am these days, it seems). It’s supported on a wide variety of browsers and seems worth checking out!

PSA: immigration scams preying on international students

I got an email from the Rutgers ECE department about a new scam that has been preying on international students. Some readers of the blog may wish to be aware of this.

Scammers posing as immigration officials – United States Citizenship and Immigration Services (USCIS) officers, Department of State (DOS) officials, or local policemen are contacting international students, & scholars. They may ask for your personal information (such as SSN, passport number, credit card information), identify false problems with your immigration record, and ask for payment to correct the record.

These scams can be very sophisticated with a USCIS like number appearing on your caller ID, the caller knowing a lot of your personal information and so on. Here are a few pointers to bear in mind if you receive this kind of a fake “immigration” call:

  • Don’t wire/pay any money. USCIS will never call someone to ask for any form of payment over the phone.
  • Just hang up and don’t call back.
  • Call the real USCIS National Customer Service Center at 1-800-375-5283 to report the issue.
  • Never give out your personal information. No matter who a caller claims to be, don’t give him/her your I-94 number, “A” number, visa control number or any other personal information. Hang up and call the real USCIS.

Apparently Rutgers had an incident involving a postdoc, where the caller posed as a police officer “and informed the scholar that he had immigration problems. This female caller had the scholar’s home address, cell phone number and SSN number. She informed the scholar that the problem can be resolved if he could make an immediate payment.”

More information can be found at the USCIS website.