ICML Workshop on Machine Learning with Test-Time Budgets

Venkatesh Saligrama sent out a call for an ICML workshop he is organizing:

I wanted to bring to your attention an ICML workshop on “Machine Learning with Test-Time Budgets” that I am helping organize. The workshop will be held during the ICML week. The workshop will feature presentations both from data-driven as well as model-based perspectives and will feature researchers from machine learning and control/decision theory.

We are accepting papers related to these topics. Please let me know if you have questions about the workshop or wish to submit a paper.


Quicksort as a dance. Via James Fallows.

I have a subscription to Harper’s and try to solve the cryptic crossword each month in the vain hope that I will win a free year’s subscription. The puzzles back to 1976 have been posted online.

Tesla and the lone inventor myth.

My friend (and ex-fellow actor) Stephen Larson‘s project OpenWorm was written up in Wired UK.

Max has an important reminder about stochastic kernels and conditional probabilities.

Generating vector-valued noise for differential privacy

A distribution that appears frequently in differential privacy is the Laplace distribution. While in the scalar case we have seen that Laplace noise may not be the best, it’s still the easiest example to start with. Suppose we have n scalars x_i \in [0,1] and we want to compute the average \frac{1}{n} \sum_{i} x_i in a differentially private way. One way to do this is to release Y = \frac{1}{n} \sum_{i} x_i + Z, where Z has a Laplace distribution:

p(z) = \frac{\lambda}{2} \exp( - \lambda |z| ).

To see that this is differentially private, note that by changing one value of x_i the average can change by at most \pm \frac{1}{n}. Let \bar{x} and \bar{x}' be the average of the original data and the data with one element changed. The output density in these two cases has distribution p( y - \bar{x}) and p(y - \bar{x}'), so for a

\frac{ p(y - \bar{x}) }{ p(y - \bar{x}') } \le \exp( \lambda ( |y - \bar{x}'| - |y - \bar{x}| ) ) \le \exp( \lambda |\bar{x} - \bar{x}|' ) \le \exp( \lambda/n )

So we can see by choosing \lambda = n \epsilon we get an \epsilon-differentially private approximation to the average.

What if we now have n vectors \mathbf{x}_i \in [0,1]^d? Well, one candidate is release a differentially private version of the mean by computing \mathbf{Y} = \frac{1}{n} \sum_{i} \mathbf{X}_i + \mathbf{Z}, where \mathbf{Z} has a distribution that looks Laplace-like but in higher dimensions:

p(\mathbf{z}) \propto \exp( - \lambda \| \mathbf{z} \| )

Now we can do the same calculation with means \bar{\mathbf{x}} and \bar{\mathbf{x}}'

\frac{ p(\mathbf{y} - \bar{\mathbf{x}}) }{ p(\mathbf{y} - \bar{\mathbf{x}}') } \le \exp( \lambda ( \|\mathbf{y} - \bar{\mathbf{x}}'\| - \|\mathbf{y} - \bar{\mathbf{x}}\| ) ) \le \exp( \lambda \|\bar{\mathbf{x}} - \bar{\mathbf{x}}'\| )

Now the Euclidean norm of the average vector can change by a most \sqrt{d}/n (by replacing x_i = \mathbf{0} with x_i' = \mathbf{1}, for example), so the overall bound is \exp(\lambda \sqrt{d}/n), which means choosing \lambda = n \epsilon / \sqrt{d} we get \epsilon-differential privacy.

Sampling from exponentials is easy, but what about sampling from this distribution? Here’s where people can fall into a trap because they are not careful about transformations of random variables. It’s tempting (if you are rusty on your probability) to say that

p(\mathbf{z}) = C(\lambda) \exp( - \lambda \| \mathbf{z} \| )

and then say “well, the norm looks exponentially distributed and the direction is isotropic so we can just sample the norm with an exponential distribution and the uniform direction by taking i.i.d. Gaussians and normalizing them.” But that’s totally wrong because that is implicitly doing a change of variables without properly adjusting the density function. The correct thing to do is to change from Euclidean coordinates to spherical coordinates. We have a map T : (z_1, z_2, \ldots, z_d) \to (r, \phi_1, \phi_2, \ldots, \phi_{d-1}) whose Jacobian is

J(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = r^{d-1} \sin^{d-2}(\phi_1) \ldots, \sin(\phi_{d-2}).

Plugging this in and noting that r = \|\mathbf{z}\| we get

p(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = C'(\lambda,\phi_1,\ldots, \phi_{d-1}) \exp( - \lambda r ).

So now we can see that the distribution factorizes and indeed the radius and direction are independent. The radius is not exponentially distributed, it’s Erlang with parameters (d,\lambda). We can generate this by taking the sum of d exponential variables with parameter \lambda. The direction we can pick uniformly by sampling d i.i.d. Gaussians and normalizing them.

In general sampling distributions for differentially private mechanisms can be complicated — for example in our work on PCA we had to use an MCMC procedure in our experiments to sample from the distribution in our algorithm. This means we could really only approximate our algorithm in the experiments, of course. There are also places to slip up in sampling from simple-looking distributions, and I’d be willing to bet that in some implementations out there people are not sampling from the correct distribution.

(Thanks to Kamalika Chaudhuri for inspiring this post.)

Postdoc position at KTH

(via David Tse)

The School of Electrical Engineering and the ACCESS Linnaeus Center at the KTH Royal Institute of Technology, Stockholm, Sweden, are pleased to announce post-doctoral positions in information and communication theory.

The ability and interest to work across traditional disciplines and to initiate new research collaborations are essential. Candidates should have a PhD (or be near completion) in a relevant field and a strong research and publication record. The duration of the position is 12 months which may be extended by an additional 12 months. The starting date is during fall or winter of 2013.

Candidates interested in a position should send their application material (as a single pdf file) to: openpos-commth@ee.kth.se and 0129@ee.kth.se no later than 20 April 2013. Position reference number E-2013-0129. Write this reference number on your application. The application can include any material that supports the candidate’s qualifications, but as a minimum it should include a CV, contact information of two reference persons, a full list of publications, a brief research statement, and information about academic track record and performance. Do not send any compressed files. Female candidates are explicitly invited to apply.

Mikael Skoglund and Lars K. Rasmussen
KTH Royal Institute of Technology, Stockholm

The KTH School of Electrical Engineering
The ACCESS Center
The KTH EE Communication Theory Lab

Some not-so-recent ArXiV skims

I tend to flag papers on ArXiV that I want to take a look at in (soon to be defunct, *sniff*) Google Reader. Here are some papers from the last month that I found interesting. I’ll post a few more of these as I work through my backlog…

Local Privacy and Statistical Minimax Rates (John C. Duchi, Michael I. Jordan, Martin J. Wainwright) — this is a paper proving minimax lower bounds for differential privacy. The approach is based on the Fano/Le Cam style of getting minimax bounds by constructing a packing of instances of the problem.

Bernstein – von Mises Theorem for growing parameter dimension (Vladimir Spokoiny) — I’m generally interested in the consistency properties of Bayesian procedures, and this looks at the effect of asymptotically growing the problem size to see how fast the problem can grow while still getting the same consistency from the BvM theorem.

On the problem of reversibility of the entropy power inequality (Sergey G. Bobkov, Mokshay M. Madiman) — More results on the EPI. Reversing it is the same as reversing the Brunn-Minkowski inequality (consider uniform distributions), but there is an interesting impossibility result here (Theorem 1.3): “For any constant C, there is a convex probability distribution \mu on the real line with a finite entropy, such that \min \{ H(X+Y), H(X-Y) \} \ge C H(X), where X and Y are independent random variables, distributed according to \mu.” The distribution they use is a truncated Pareto distribution but the calculations seem hairy.

A universal, operational theory of unicast multi-user communication with fidelity criteria (Mukul Agarwal, Sanjoy Mitter, Anant Sahai) — This is the culmination of Mukul’s work starting from a very nice paper I cite all the time from Allerton. There are several results and commentary in here — there’s a fair bit of philosophy, so it’s worth a more patient read than I could give it so far (only so many hours in the day, after all!)

The Convergence Rate of Majority Vote under Exchangeability (Miles E. Lopes) — The title says it all, really. The bounds are actually in terms of the mixture distribution of the exchangeable sequence of Bernoulli votes.


A rather pretty video of an L-system made by my friend Steve.

LACMA, which I finally saw with a friend in February, has decided to offer high-resolution downloads of many of the items in its collection. This Ganesha has a pretty impressive belly. Via MeFi.

This may answer David Bowie’s question.

This slideshow makes me want to go to Slurping Turtle again.

Sometimes I wish we could just name p-values something else that is more descriptive. There’s been a fair bit of misunderstanding about them going on lately.

Mo’ math, mo’ solutions

I was in New York on Sunday afternoon and on the suggestion of Steve Severinghaus we took a trip to the brand-new Museum of Mathematics, which is a short walk from the Flatiron building.

The Museum of Mathematics

The Museum of Mathematics

It’s a great little place to take kids — there are quite a few exhibits which illustrate all sorts of mathematics from recreational math and Martin Gardner-esque pastimes like tessellations to an interactive video-floor which draws minimum distance spanning trees between the people standing on it. It apparently does Voronoi tessellations too but it wasn’t in that mode when I was there. There’s also a video wall which makes your body into a tree fractal, games, and a car-racing game based on the brachistochrone problem. The kids were all over that so I just got to watch.

One of the nice things was that there was a touch-screen explanation of each exhibit from which you could get three different “levels” of explanation depending on how much detail you wanted, and also additional information and references in case you wanted to learn more. That’s good because I think it will let parents learn enough to help explain the exhibit to their kids at a level that the parents feel comfortable. That makes it a museum for everyone and not just a museum for math-y parents who want to indoctrinate their children. On the downside, a lot of the exhibits were broken or under repair or under construction, so we really only got to see about 2/3 of the things.

Apparently it’s also a good place to go on a first date, as evidenced by some surreptitious people-watching. So if you’re in New York and want a romantic or educational time (aren’t they the same thing?), go check it out!