ICML Workshop on Machine Learning with Test-Time Budgets

Venkatesh Saligrama sent out a call for an ICML workshop he is organizing:

I wanted to bring to your attention an ICML workshop on “Machine Learning with Test-Time Budgets” that I am helping organize. The workshop will be held during the ICML week. The workshop will feature presentations both from data-driven as well as model-based perspectives and will feature researchers from machine learning and control/decision theory.

We are accepting papers related to these topics. Please let me know if you have questions about the workshop or wish to submit a paper.

Linkage

Quicksort as a dance. Via James Fallows.

I have a subscription to Harper’s and try to solve the cryptic crossword each month in the vain hope that I will win a free year’s subscription. The puzzles back to 1976 have been posted online.

Tesla and the lone inventor myth.

My friend (and ex-fellow actor) Stephen Larson‘s project OpenWorm was written up in Wired UK.

Max has an important reminder about stochastic kernels and conditional probabilities.

Generating vector-valued noise for differential privacy

A distribution that appears frequently in differential privacy is the Laplace distribution. While in the scalar case we have seen that Laplace noise may not be the best, it’s still the easiest example to start with. Suppose we have n scalars x_i \in [0,1] and we want to compute the average \frac{1}{n} \sum_{i} x_i in a differentially private way. One way to do this is to release Y = \frac{1}{n} \sum_{i} x_i + Z, where Z has a Laplace distribution:

p(z) = \frac{\lambda}{2} \exp( - \lambda |z| ).

To see that this is differentially private, note that by changing one value of x_i the average can change by at most \pm \frac{1}{n}. Let \bar{x} and \bar{x}' be the average of the original data and the data with one element changed. The output density in these two cases has distribution p( y - \bar{x}) and p(y - \bar{x}'), so for a

\frac{ p(y - \bar{x}) }{ p(y - \bar{x}') } \le \exp( \lambda ( |y - \bar{x}'| - |y - \bar{x}| ) ) \le \exp( \lambda |\bar{x} - \bar{x}|' ) \le \exp( \lambda/n )

So we can see by choosing \lambda = n \epsilon we get an \epsilon-differentially private approximation to the average.

What if we now have n vectors \mathbf{x}_i \in [0,1]^d? Well, one candidate is release a differentially private version of the mean by computing \mathbf{Y} = \frac{1}{n} \sum_{i} \mathbf{X}_i + \mathbf{Z}, where \mathbf{Z} has a distribution that looks Laplace-like but in higher dimensions:

p(\mathbf{z}) \propto \exp( - \lambda \| \mathbf{z} \| )

Now we can do the same calculation with means \bar{\mathbf{x}} and \bar{\mathbf{x}}'

\frac{ p(\mathbf{y} - \bar{\mathbf{x}}) }{ p(\mathbf{y} - \bar{\mathbf{x}}') } \le \exp( \lambda ( \|\mathbf{y} - \bar{\mathbf{x}}'\| - \|\mathbf{y} - \bar{\mathbf{x}}\| ) ) \le \exp( \lambda \|\bar{\mathbf{x}} - \bar{\mathbf{x}}'\| )

Now the Euclidean norm of the average vector can change by a most \sqrt{d}/n (by replacing x_i = \mathbf{0} with x_i' = \mathbf{1}, for example), so the overall bound is \exp(\lambda \sqrt{d}/n), which means choosing \lambda = n \epsilon / \sqrt{d} we get \epsilon-differential privacy.

Sampling from exponentials is easy, but what about sampling from this distribution? Here’s where people can fall into a trap because they are not careful about transformations of random variables. It’s tempting (if you are rusty on your probability) to say that

p(\mathbf{z}) = C(\lambda) \exp( - \lambda \| \mathbf{z} \| )

and then say “well, the norm looks exponentially distributed and the direction is isotropic so we can just sample the norm with an exponential distribution and the uniform direction by taking i.i.d. Gaussians and normalizing them.” But that’s totally wrong because that is implicitly doing a change of variables without properly adjusting the density function. The correct thing to do is to change from Euclidean coordinates to spherical coordinates. We have a map T : (z_1, z_2, \ldots, z_d) \to (r, \phi_1, \phi_2, \ldots, \phi_{d-1}) whose Jacobian is

J(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = r^{d-1} \sin^{d-2}(\phi_1) \ldots, \sin(\phi_{d-2}).

Plugging this in and noting that r = \|\mathbf{z}\| we get

p(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = C'(\lambda,\phi_1,\ldots, \phi_{d-1}) \exp( - \lambda r ).

So now we can see that the distribution factorizes and indeed the radius and direction are independent. The radius is not exponentially distributed, it’s Erlang with parameters (d,\lambda). We can generate this by taking the sum of d exponential variables with parameter \lambda. The direction we can pick uniformly by sampling d i.i.d. Gaussians and normalizing them.

In general sampling distributions for differentially private mechanisms can be complicated — for example in our work on PCA we had to use an MCMC procedure in our experiments to sample from the distribution in our algorithm. This means we could really only approximate our algorithm in the experiments, of course. There are also places to slip up in sampling from simple-looking distributions, and I’d be willing to bet that in some implementations out there people are not sampling from the correct distribution.

(Thanks to Kamalika Chaudhuri for inspiring this post.)

Postdoc position at KTH

(via David Tse)

The School of Electrical Engineering and the ACCESS Linnaeus Center at the KTH Royal Institute of Technology, Stockholm, Sweden, are pleased to announce post-doctoral positions in information and communication theory.

The ability and interest to work across traditional disciplines and to initiate new research collaborations are essential. Candidates should have a PhD (or be near completion) in a relevant field and a strong research and publication record. The duration of the position is 12 months which may be extended by an additional 12 months. The starting date is during fall or winter of 2013.

Candidates interested in a position should send their application material (as a single pdf file) to: openpos-commth@ee.kth.se and 0129@ee.kth.se no later than 20 April 2013. Position reference number E-2013-0129. Write this reference number on your application. The application can include any material that supports the candidate’s qualifications, but as a minimum it should include a CV, contact information of two reference persons, a full list of publications, a brief research statement, and information about academic track record and performance. Do not send any compressed files. Female candidates are explicitly invited to apply.

Sincerely
Mikael Skoglund and Lars K. Rasmussen
KTH Royal Institute of Technology, Stockholm

The KTH School of Electrical Engineering
The ACCESS Center
The KTH EE Communication Theory Lab

Some not-so-recent ArXiV skims

I tend to flag papers on ArXiV that I want to take a look at in (soon to be defunct, *sniff*) Google Reader. Here are some papers from the last month that I found interesting. I’ll post a few more of these as I work through my backlog…

Local Privacy and Statistical Minimax Rates (John C. Duchi, Michael I. Jordan, Martin J. Wainwright) — this is a paper proving minimax lower bounds for differential privacy. The approach is based on the Fano/Le Cam style of getting minimax bounds by constructing a packing of instances of the problem.

Bernstein – von Mises Theorem for growing parameter dimension (Vladimir Spokoiny) — I’m generally interested in the consistency properties of Bayesian procedures, and this looks at the effect of asymptotically growing the problem size to see how fast the problem can grow while still getting the same consistency from the BvM theorem.

On the problem of reversibility of the entropy power inequality (Sergey G. Bobkov, Mokshay M. Madiman) — More results on the EPI. Reversing it is the same as reversing the Brunn-Minkowski inequality (consider uniform distributions), but there is an interesting impossibility result here (Theorem 1.3): “For any constant C, there is a convex probability distribution \mu on the real line with a finite entropy, such that \min \{ H(X+Y), H(X-Y) \} \ge C H(X), where X and Y are independent random variables, distributed according to \mu.” The distribution they use is a truncated Pareto distribution but the calculations seem hairy.

A universal, operational theory of unicast multi-user communication with fidelity criteria (Mukul Agarwal, Sanjoy Mitter, Anant Sahai) — This is the culmination of Mukul’s work starting from a very nice paper I cite all the time from Allerton. There are several results and commentary in here — there’s a fair bit of philosophy, so it’s worth a more patient read than I could give it so far (only so many hours in the day, after all!)

The Convergence Rate of Majority Vote under Exchangeability (Miles E. Lopes) — The title says it all, really. The bounds are actually in terms of the mixture distribution of the exchangeable sequence of Bernoulli votes.

Linkage

A rather pretty video of an L-system made by my friend Steve.

LACMA, which I finally saw with a friend in February, has decided to offer high-resolution downloads of many of the items in its collection. This Ganesha has a pretty impressive belly. Via MeFi.

This may answer David Bowie’s question.

This slideshow makes me want to go to Slurping Turtle again.

Sometimes I wish we could just name p-values something else that is more descriptive. There’s been a fair bit of misunderstanding about them going on lately.

Mo’ math, mo’ solutions

I was in New York on Sunday afternoon and on the suggestion of Steve Severinghaus we took a trip to the brand-new Museum of Mathematics, which is a short walk from the Flatiron building.

The Museum of Mathematics

The Museum of Mathematics


It’s a great little place to take kids — there are quite a few exhibits which illustrate all sorts of mathematics from recreational math and Martin Gardner-esque pastimes like tessellations to an interactive video-floor which draws minimum distance spanning trees between the people standing on it. It apparently does Voronoi tessellations too but it wasn’t in that mode when I was there. There’s also a video wall which makes your body into a tree fractal, games, and a car-racing game based on the brachistochrone problem. The kids were all over that so I just got to watch.

One of the nice things was that there was a touch-screen explanation of each exhibit from which you could get three different “levels” of explanation depending on how much detail you wanted, and also additional information and references in case you wanted to learn more. That’s good because I think it will let parents learn enough to help explain the exhibit to their kids at a level that the parents feel comfortable. That makes it a museum for everyone and not just a museum for math-y parents who want to indoctrinate their children. On the downside, a lot of the exhibits were broken or under repair or under construction, so we really only got to see about 2/3 of the things.

Apparently it’s also a good place to go on a first date, as evidenced by some surreptitious people-watching. So if you’re in New York and want a romantic or educational time (aren’t they the same thing?), go check it out!

The history of new foods in India

Konnichiwa, Varshney-san. Your post on the potato inspired me to read the papers you mentioned as well as a reference suggested by a friend here in Chicago:

Sucheta Mazumdar, “The Impact of New World Food Crops on the Diet and Economy of China and India, ca. 1600-1900.” Food in Global History. Ed. Raymond Grew. Westview Press, 1999. 58-78.

The Columbian Exchange refers to the interchange of foodstuffs, technologies, and disease after European contact with the Americas. In exchange for offering pestilence, brutal colonialism, and genocide, Europeans got a variety of staple crops with which they could support their burgeoning populations and which would later sustain the Industrial Revolution:

The exchange introduced a wide range of new calorically rich staple crops to the Old World—namely potatoes, sweet potatoes, maize, and cassava. The primary benefifit of the New World staples was that they could be grown in Old World climates that were unsuitable for the cultivation of Old World staples. (Nunn and Qian)

In addition, the discovery of quinine in the Andes enabled Europeans to invade and colonize tropical regions. In addition to the trans-Atlantic slave trade, this expansion introduced the widespread planting of cash crops such as rubber in Africa. Being an economics paper, there are some sobering quantitative measures to drive home the horrors of colonial exploitation:

The population of the Congo is estimated to have been about 25 million prior the rubber boom, in the 1880s. In 1911, after the peak of the boom, the population was 8.5 million, and in 1923 after the completion of the boom, it was 7.7 million. If one compares the population losses relative to the production of rubber, an astonishing conclusion is reached: an individual was “lost” from the Congo for every ten kilograms of rubber exported (Loadman, 2005, pp. 140–41).

The potato paper covers the effect of potatoes and tries to estimate (numerically) the impact potato cultivation had on population growth and urbanization in Europe. It is somewhat elusive to me what such a quantification “means,” but it’s of a piece with what Ian Hacking describes in The Taming of Chance : the torrent of printed numbers led to the publication of attendant “studies” slicing and dicing the numbers in statistical ways in order to “make sense” of them. The second Nunn and Qian paper covers capsicum, tomatoes, cacao, vanilla, coca, and tobacco, and contains some fun nutritional facts and trivia:

  • Capsicum is high in vitamins A, B and C, magnesium, and iron, and the extra saliva produced by capsacin helps digestion.
  • “Greece consumes the most tomatoes per capita… The tomato has been so thoroughly adopted and integrated into Western diets that today it provides more nutrients and vitamins than any other fruit or vegetable (Sokolov, 1993, p. 108).”
  • “[I]n Roald Amundsen’s trek to the South Pole, his men were allocated 4,560 calories per day, of which over 1,000 came from cacao (West, 1992, pp. 117–18).”

My interest came more from vegetables that almost define Indian cuisine : tomatoes, potatoes, and chilies. Mazumdar’s article focuses on the effect new crops had on China and India. Specific to this context,

There were two major periods of introduction of American plants into Asia. The first wave, in the sixteenth and seventeeth centuries, included sweet potatoes, maize, potatoes, jicamas, capsicums (chile peppers), squashes, and peanuts, cashews, custard apples, guavas, avocadoes, tomatoes, papaya, passion-fruit, pineapples, and sapodillas… In the second wave, American plants, such as cocoa and the sunflower, were brought to India even more recently in the twentieth century.

With them came new words of course — South Asian readers may know of a certain fruit as sapota (in the south) or chiku (in the north), both of which come from a Meso-American word (not sure of the language) chicosapote. The word achar for pickles came from the Carib axi meaning chile pepper.

The paper draws a distinction between how land ownership practices in India and China made a difference in how fast new foods were incorporated into the common diet. In China, a number of reforms allowed “tenancy rights to become inheritable” for peasants, meaning they had an incentive to say in place and try to extract more productivity from the land they had. The new crops, especially the sweet potato, became staples because they provided more calories per acre, and because they were drought- and pest-resistant, required less labor (especially over rice), and could grow in poor soil. Mazumdar writes:

[In the 1920s in south China] sweet potatoes regularly provided a supply of at least three to four months’ worth or food for practically everybody living in the countryside… they were eaten fresh, baked, boiled, or mashed with pickles.. ground into flour and made into noodles, bread, or a gruel… or stirred into a hash.

The sweet potato revolutionized the lives of peasants in China, giving them more calories and freeing time and labor to grow cash crops. Corn and peanuts were also widely cultivated, since corn could also grow in nutrient-poor soils and peanuts are good nitrogen-fixers and could be grown with sugarcane.

India was a different story — there was more arable land and “relatively low population growth between 1600 and 1850.” Due to military conflicts and tensions with zamindars (landlords), villages would often up and leave, transplanting themselves further from conflict or interference. This meant that unlike China, rural farmers were not as tied to specific locations during this period. Colonialism changed all that — people were pinned down and agriculture was commercialized, so in the 19th and 20th centuries American crops started flourishing. The Brits promoted the potato heavily, and increased urbanization brought it and the tomato into the mainstream. Although it’s hard to think of Indian food without tomatoes, potatoes, and chilies, these ingredients were only integrated around 150 years ago!

Linkage

I’m sick today so here are some links.

Click That Hood, a game which asks you to identify neighborhoods. I was lousy at San Diego, but pretty decent at Chicago, even though I’ve lived here for half the time. Go figure.

For those who care about beer, there’s been some news about the blocked merger of Inbev and Modelo. I recommend Erik’s podcast post on the structure of the beer industry (the three-tier system) for those who care about craft beer, and (with reservations) Planet Money’s show on the antitrust regulatory framework that is at work here.

Remember step functions from your signals and systems course? We called them Heaviside step functions after Oliver Heaviside — you can read more about him in this Physics Today article.

Did you know that Pad Thai’s “birth and popularity came out of the nationalist campaign of Field Marshal Plaek Pibulsongkram, one of the revolutionary figures who in 1932 pushed Thailand out of an absolute monarchy?” Neither did I!

I need this album, since I love me some Kurt Weill. I can also live vicariously through NPR’s list of SXSW recommendations.