Sampling one distribution out of another

I am reading this nice paper by Harsha et al on The Communication Complexity of Correlation and some of their results rely on the following cute lemma (slightly reworded).

Let P and Q be two distributions on a finite set \mathcal{X} such that the KL-divergence D(P \| Q)  > 0. Then there exists a sampling procedure which, when given a sequence of samples x_1, x_2, \ldots drawn i.i.d. according to Q, will output (with probability 1) an index i^{\ast} such that the sample x_{i^{\ast}} is distributed according to the distribution P and the expected encoding length of the index i^{\ast} is at most

D(P \| Q) + 2 \log( D(P \| Q) + 1 ) + O(1)

where the expectation is taken over the sample sequence and the internal randomness of the procedure.

So what is the procedure? It’s a rejection sampler that does the following. First set p_0(x) = 0 for all x \in \mathcal{X} and set p_0^{\ast} = 0 as well. Then for each i = 1, 2, \ldots do the following:

  1. \alpha_i(x) = \min\{ P(x) - p_{i-1}(x), \ (1 - p^{\ast}_{i-1}) Q(x) \}
  2. p_i(x) = p_{i-1}(x) + \alpha_i(x)
  3. p_{i^{\ast}} = \sum_{x} p_{i}(x)
  4. \beta_i = \alpha_i(x_i)/( (1 - p_{i-1}^{\ast}) Q(x_i) )
  5. Output i with probability \beta_i.

Ok, so what is going on here? It turns out that \alpha_i(x) is tracking the probability that the procedure halts at i with x_i = x (this is not entirely clear at first). Thus p_i(x) is the probability that we halted before time i and the sample is x, and p_i^{\ast} is the probability that we halt at time i. Thus (1 - p_{i-1}^{\ast}) is the probability that the procedure gets to step i. Getting back to \alpha_i(x), we see that x_i = x with probability Q(x_i) and we halt with probability \beta_i, so indeed, \alpha_i(x) is the probability that we halt at time i with x_i = x.

What we want then is that P(x) = \sum_{i=1}^{\infty} \alpha_i(x). So how should we define \alpha_i(x)? It’s the minimum of two terms : in order to stop at time i such that x_i = x, the factor \alpha_i(x) is at most (1 - p_{i-1}^{\ast}) Q(x). However, we don’t want to let p_{i}(x) = \sum_{j=1}^{i} \alpha_j(x) exceed the target probability P(x), so \alpha_i(x) must be less than P(x) - p_{i-1}(x). The authors say that in this sense the procedure is greedy, since it tries to get p_i(x) as close to P(x) as it can.

In order to show that the sampling is correct, i.e. that the output distribution is P, we want to show what p_i(x) \to P(x). This follows by induction. To get a bound on the description length of the output index, they have to show that

\mathbb{E}[ \log i^{\ast} ] \le D(P \| Q) + O(1).

This is an interesting little fact on its own. It comes out because

i \le \frac{1}{1 - p_{i-1}^{\ast}} \cdot \frac{P(x)}{Q(x)} + 1

and then expanding out \mathbb{E}[ \log i^{\ast} ] = \sum_{x} \sum_{i} \alpha_i(x) \cdot \log i.

One example application of this Lemma in the paper is the following problem of simulation. Suppose Alice and Bob share an unlimited amount of common randomness. Alice has some X drawn according to P(x) and wants Bob to generate a Y according to the distribution P(y | x). So ideally, (X,Y) have joint distribution P(x) P(y | x). They can both generate a sequence of common y_1, y_2, \ldots according to the marginal distribution of Y. Then Alice tries to generate the distribution P(y | x) according to this sampling scheme and sends the index i^{\ast} to Bob, who chooses the corresponding y_{i^{\ast}}. How many bits does Alice have to send on average? It’s just

\mathbb{E}_{P(x)}[ D( P(y|x) \| P(y) ) + 2 \log( D( P(y|x) \| P(y) ) + 1) + O(1) ].

which, after some love from Jensen’s inequality, turns out to be upper bounded by

I(X ; Y) + 2 \log( I(X; Y) + 1) + O(1) ].

Pretty cute, eh?

Concert Bleg : Friede auf Erden with Orchestra Nova

San Diego folks : I’m singing in another concert next weekend! We’re only doing the < 10 minute piece Friede Auf Erden by Schoenberg, but there’s the spoonful of sugar that is Beethoven 5 to help the decayed remnants of late-Romantic harmonics go down smoothly…

Victory Through Peace
Orchestra Nova San Diego
Jung-Ho Pak, conductor

Symphony No. 5
Ludwig van Beethoven
Egmont Overture
Ludwig van Beethoven
Peace on Earth
Arnold Schoenberg
Featuring SACRA/PROFANA, dir. Krishan Oberoi
Ascent to Victory
Nancy Bloomer Deussen

Friday, May 13, 2011 – 7:30 p.m. at St. Paul’s Cathedral, 2728 Sixth Avenue San Diego, CA
Saturday, May 14, 2011 – 7:30 p.m. at the Irwin M. Jacobs Qualcomm Hall, 5775 Morehouse Drive, San Diego, CA
Monday, May 16, 2011 – 7:30 p.m. at Sherwood Auditorium, 700 Prospect Street, La Jolla, CA

Ticket prices depend on venue.

Linkage

I’m heading off to Mexico in less than 12 hours for a week during which I hope to disconnect : no email, web, or phone. I guess I’ll miss the majority of the post-Bin Laden news cycle. In the meantime, here are some more links because I am too lazy to post content.

Speaking of 9/11, this is simply terrible.

An interview with George Saunders, one of my favorite authors.

Blackwell’s proof of Wald’s Identity, as told by Max.

Long pepper looks fascinating and tasty!

Can Voter ID Laws Be Administered in a Race-Neutral Manner? The short answer is no. The longer answer is a 30 page paper.

Frank has blogger about our trip last weekend to The 2nd 8th Annual Grilled Cheese Invitational. My arteries may never be the same again.

There are no more typewriter factories. This makes me especially sad, as I have a 1914 Underwood No. 5 that I love (and lug).

Instant Runoff Voting, STV, AV, and the like

One thing I’ve gotten interested in lately is Instant Runoff Voting (IRV), which is an alternative vote tabulation system to our “first-past-the-post” system here in the US. It’s also known as the Alternative Vote (AV), and in multi-winner elections, the Single Transferrable Vote (STV). I’ll probably blog a bit on-and-off about this topic, but for starters, there’s a lot of activism/partisanship when it comes to promoting different voting systems. Unfortunately, almost all voting systems under consideration fall victim to Arrow’s theorem, which says, basically, that you can’t have a method of aggregating people’s preferences that satisfies a bunch of desirable criteria (under some assumptions on how preferences are given).

IRV or STV is used to elect Representatives in Australia, and the Australian Electoral Commission has a nice video explaining the process. It also mentions the election monitors, which are called scrutineers. That always cracks me up. But I digress. AV has come up more recently in the UK, where people are thinking of using it for Parliamentary elections. The pro-AV side has its videos as well, which seem designed to appear to the beer-lovers out there. However, the polling on its popularity seems to indicate that the switch to AV will not happen. There’s opposition to AV from different sources, and even some small parties don’t think it will make a difference.

I’ve gotten interested in IRV because it’s used in California for some local elections. The recent mayoral election in Oakland was run via IRV, which requires a bit of voter re-education. The outcome of the election was quite interesting, wherein Don Perata, who won the largest share of first-choices, ended up losing because Rebecca Kaplan was eliminated and the second- and third-choices went to Jean Quan. This is exactly the kind of thing proponents of IRV want.

What is less clear is how the mathematics of counting IRV works, and how sensitive the counting process is to errors. A lot of people have written about the former, but there has been less work about the latter, and that’s something I’ve started working on, because auditing the outcome of elections is an important step in ensuring voter confidence in the results.

UPDATE: As Oxeador points out below, Arrow’s Theorem is actually a statement about producing a total order of all the candidates that satisfies a given set of criteria, not about single-winner elections. In particular, if you treat the IRV ordering as the order in which the candidates are eliminated, then IRV would fall under Arrow’s Theorem.

Readings

Felix Gilman, The Half-Made World – A rather stunning and harrowing fantasy/western (don’t think Jonah Hex). I didn’t like it quite as much as Cosma did, but I couldn’t put it down, so that is something.

Jane Margolis, Stuck in the Shallow End : Education, Race, and Computing – really insightful look at the race-based gap in access and enrollment in computer science classes in 3 very different LA high schools. Margolis and her discuss how the actions of teachers, counselors, and administrators create barriers and disincentives that lower black and Latino enrollment in computer sciences when they are available, and that gut computer science classes for everyone in favor of computer skills classes.

John Crowley, Love & Sleep – second book in the Aegypt cycle. I found it more self-indulgent and flatter than the first one, but maybe it’s because the characters are not new to me. The writing is, as always, beautiful, but I was less excited than I was by The Solitudes.

Ian Hacking, The Emergence of Probability – a slim book about early ideas about probability and ending at Bernoulli and Hume’s problem of induction. Hacking traces how “probable” went from meaning “approved of by experts” (as in “probable cause”) to a more aleatoric interpretation, and at the same time how problems such as computing annuities brought forth new foundational questions for philosophers and mathematicians. A key figure in this development was Leibnitz, who worked on developing inductive theories of logic. The last few pages sum it up well — the early development was spurred by changes in how people thought of opinion and on what it should be based. “Probability-and-induction” required a different change in perspective; causation had to be thought of as a problem of opinion rather than of knowledge. I found the book fascinating and pretty easy to read; nice short chapters highlighting one point after the other. Hat tip to Marisa Brandt for the recommendation.

Linkage

Yes yes yes, all my posts are link posts now. I swear, I’ll get back to something more interesting soon, but I always promise that.

People post funny things to ArXiV.

Razib discusses new studies of the genetic origin of Indians.

Tips for food photography. I seem to know several food bloggers now.

A new study about bullying.

The University of Michigan is allowing longer tenure processes. This is in part to address the pressures of getting tenure and starting a family at the same time, but also particularly the culture in the medical school, where “very few faculty in medical schools actually take advantage of such policies [to halt the tenure clock].” The academic Senate Assembly was opposed to the change.

CISE is hiring

I got this in my email:

NSF’s Directorate for Computer and Information Science and Engineering (CISE) seeks candidates for the position of Deputy Assistant Director. The incumbent participates with the Assistant Director in providing leadership and direction to the staff and activities of the directorate and in coordinating activities with the Directorate’s senior managers. The Deputy Assistant Director also serves as a key assistant to the Assistant Director in all phases of the Directorate’s activities and programs.

The full ad is here.

Linkage part deux

Most of these are stolen from MetaFilter.

Welcome back to public blogging, Dan.

All about time zones.

Musical instrument samples. My first UROP at MIT was at the Media Lab, where I helped record instrumentalists as part of a musical instrument identification system. Paris Smaragdis was there at the time, and now he is at UIUC where he has a lot of cool audio demos. There are also some great clips Inside the Music Library at the BBC.

Ridiculous computer interfaces from movies.

Linkage

I’m blogging from Chicago, where it is a balmy 42 degrees but sunny. Whither spring, I ask! Actually, I’m not blogging so much as linking to a bunch of stuff.

For San Diegans, the SD Asian Film Festival Spring Showcase is going on. It looks like I’ll miss a lot of it but I might try to catch something at the end of the week.

Less Pretentious & More Accurate Titles For Literary Masterworks — funny but possibly NSFW.

This home-scanning program seems creepy, regardless of the constitutionality issues.

Unfortunate headlines strike again.

I really like scallion pancakes. I’ll have to try this out when I get back to San Diego.

I agree that this video is awesome. Yo-Yo Ma and Lil Buck. I think that dude is made of rubber. And steel.

Tom Waits was induced into the Rock and Roll Hall of Fame. I just hope I get to see him live some day.

Some things to skim or read from ArXiV when I get the chance:
Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery (Matt Malloy, Robert Nowak)
Differential Privacy: on the trade-off between Utility and Information Leakage (Mário S. Alvim, Miguel E. Andrés, Konstantinos Chatzikokolakis, Pierpaolo Degano, Catuscia Palamidessi)
Capacity of Byzantine Consensus with Capacity-Limited Point-to-Point Links (Guanfeng Liang, Nitin Vaidya)
Settling the feasibility of interference alignment for the MIMO interference channel: the symmetric square case (Guy Bresler, Dustin Cartwright, David Tse)
Decentralized Online Learning Algorithms for Opportunistic Spectrum Access (Yi Gai, Bhaskar Krishnamachari)
Online and Batch Learning Algorithms for Data with Missing Features (Afshin Rostamizadeh, Alekh Agarwal, Peter Bartlett)
Nonuniform Coverage Control on the Line (Naomi Ehrich Leonard, Alex Olshevsky)
Degree Fluctuations and the Convergence Time of Consensus Algorithms (Alex Olshevsky, John Tsitsiklis)