# New paper: Redundancy of Exchangeable Estimators

More like an old paper… this is (finally) a journal version of some older work that we did on analyzing Bayesian nonparametric estimators from an information-theoretic (redundancy) perspective.

Exchangeable random partition processes are the basis for Bayesian approaches to statistical inference in large alphabet settings. On the other hand, the notion of the pattern of a sequence provides an information-theoretic framework for data compression in large alphabet scenarios. Because data compression and parameter estimation are intimately related, we study the redundancy of Bayes estimators coming from Poisson-Dirichlet priors (or “Chinese restaurant processes”) and the Pitman-Yor prior. This provides an understanding of these estimators in the setting of unknown discrete alphabets from the perspective of universal compression. In particular, we identify relations between alphabet sizes and sample sizes where the redundancy is small, thereby characterizing useful regimes for these estimators.

In the large alphabet setting, one thing we might be interested in is sequential prediction: I observe a sequence of butterfly species and want to predict whether the next butterfly I collect will be new or one that I have seen before. One simple way to do this prediction is to put a prior on the set of all distributions on infinite supports and do inference on that model given the data. This corresponds to the so-called Chinese Restaurant Process (CRP) approach to the problem. The information-theoretic view is that sequential prediction is equivalent to compression: the estimator is assigning a probability $q(x^n)$ to the sequence $x^n$ seen so far. An estimator is good if, for any distribution $p$, when $x^n$ is drawn i.i.d. according to $p$ the divergence between $p(x^n)$ and $q(x^n)$ is “small.” The goal of this work is to understand when CRP estimators are good in this sense.
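To make the butterfly example concrete, here is a minimal sketch (my own illustration, not code from the paper) of the one-parameter CRP predictive rule: after $n$ observations with concentration parameter $\alpha$, the next draw is a new species with probability $\alpha/(n+\alpha)$ and a previously seen species $x$ with probability $n_x/(n+\alpha)$. Multiplying the predictive probabilities along the sequence gives the probability $q(x^n)$ that the estimator assigns, which is exactly the compression view described above. The function names are mine.

```python
from collections import Counter

def crp_predictive(seq, alpha=1.0):
    """CRP predictive probabilities given the sequence observed so far.

    Returns (p_new, p_seen): the probability that the next draw is a
    previously unseen species, and a dict mapping each observed species
    to its predictive probability n_x / (n + alpha).
    """
    n = len(seq)
    counts = Counter(seq)
    p_new = alpha / (n + alpha)
    p_seen = {x: c / (n + alpha) for x, c in counts.items()}
    return p_new, p_seen

def crp_sequence_prob(seq, alpha=1.0):
    """The probability q(x^n) the CRP estimator assigns to the sequence:
    the product of the sequential predictive probabilities."""
    prob = 1.0
    for i, x in enumerate(seq):
        p_new, p_seen = crp_predictive(seq[:i], alpha)
        prob *= p_seen.get(x, p_new)  # p_new if x is a new species
    return prob
```

For example, after observing `["monarch", "swallowtail", "monarch"]` with $\alpha = 1$, the next butterfly is new with probability $1/4$ and a monarch with probability $2/4$.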

This sort of falls in with the “frequentist analysis of Bayesian procedures” line of work that some people pursue.

David McAllester, my department chair at TTI, has started a new blog.

I thought it was pretty well known that people are fairly unique by ZIP code, but Forbes has an article about it now (h/t Raj). Of course, stores can also ping a smartphone’s WiFi to get more accurate location information about your activity within the store — when you check out they can tag the MAC address of your device to all the other information about you. Creeptastic!

Bradley Efron’s perspective on the impact of Bayes’ Theorem from Science (h/t Kevin).

Some discussion on what makes a popular philosophy book. I wonder what, if anything, transfers over to a popular mathematical book?

Some thoughts from Larry Laudan on the mathematization of the presumption of innocence.

# ArXiV notes : July 2012 part 2

Differentially Private Filtering and Differentially Private Kalman Filtering
Jerome Le Ny, George J. Pappas
These papers apply the differential privacy model to classical signal processing and look at the effect of adding noise before filtering and summing versus adding noise after filtering and summing. The key is that the operations have to be causal, I think — we usually think about differential privacy as operating offline or online in very restricted settings, but here the signals are coming in as time series.
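To make the before/after distinction concrete, here is a toy sketch of my own (not code from the papers) contrasting input perturbation (noise added before filtering) with output perturbation (noise added after filtering) for a simple causal moving-average filter. The noise-scale calibration, which is the substance of the papers, is deliberately left as a bare parameter here.

```python
import math
import random

def laplace(scale):
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5  # in [-0.5, 0.5); u == -0.5 has negligible probability
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def causal_moving_average(x, k):
    """A simple causal FIR filter: the average of the last k samples."""
    return [sum(x[max(0, i - k + 1):i + 1]) / k for i in range(len(x))]

def input_perturbation(x, k, scale):
    """Add noise to each incoming sample, then filter.

    The filter partially averages out the per-sample noise."""
    noisy = [xi + laplace(scale) for xi in x]
    return causal_moving_average(noisy, k)

def output_perturbation(x, k, scale):
    """Filter first, then add noise to each output sample.

    How `scale` must be calibrated to the filter's sensitivity for a
    given privacy guarantee is exactly what the papers analyze."""
    y = causal_moving_average(x, k)
    return [yi + laplace(scale) for yi in y]
```

Both pipelines are causal, matching the streaming setting of the papers; the question they study is which ordering gives better utility at a fixed privacy level.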

Finite sample posterior concentration in high-dimensional regression
Nate Strawn, Artin Armagan, Rayan Saab, Lawrence Carin, David Dunson
They study ultra high-dimensional linear regression (cue guitars and reverb) in the “large p, small n” setting. The goal is to get a Bernstein-von Mises type theorem — e.g. assuming the data comes from a linear model $\beta_0$, then using Bayesian inference the posterior should concentrate around $\beta_0$. They of course need a sparsity assumption, and the prior must assign reasonable mass around the true parameter and assign low probability to non-sparse signals. The methods use some ideas from compressed sensing (the Dantzig selector) and should be of interest to people working in that area.

Identifying Users From Their Rating Patterns
José Bento, Nadia Fawaz, Andrea Montanari, and Stratis Ioannidis
This is a paper about recommender systems as part of the 2011 CAMRa challenge. They look at the problem of re-identification of users in this data and show that looking at the time stamps of movie ratings is much more useful than looking at the rating values. This suggests to me that people should use a masking system like Anant and Parv’s “Incentivizing Anonymous ‘Peer-to-Peer’ Reviews” (P. Venkitasubramaniam and A. Sahai, Allerton 2008) paper.

Michael Kearns, Mallesh M. Pai, Aaron Roth, Jonathan Ullman
This paper proposes a variant of differential privacy, which they call joint differential privacy, and looks at mechanism design that satisfies privacy and is incentive compatible. At first glance these should be incompatible, since the latter implies “revealing the truth.” The model is one in which each agent has a finite set of actions but its payoff/value is private. This is somewhat out of my area, so I can’t really get the nuances of what’s going on, but a crucial assumption is that there are a large number of agents. Joint differential privacy seems to be a form of $(\epsilon,\delta)$ differential privacy on the utility functions of the users.

I anticipate I will be doing a fair bit more reading in the future, due to the new job and personal circumstances. However, I probably won’t write more detailed notes on the books. This blog should be a rapidly mixing random walk, after all.

Embassytown (China Miéville) : a truly bizarre novel set on an alien world on which humans have an Embassy but can only communicate with the local aliens in a language which defies easy description. Ambassadors come in pairs, as twins — to speak with the Ariekei they must both speak simultaneously (in “cut” and “turn”). The Ariekei’s language does not allow lying, and they have contests in which they try to speak falsehoods. However, events trigger a deadly change (I don’t want to give it away). Philosophically, the book revolves a lot around how language structures thought and perception, and it’s fascinating if you like to think about those things.

Chop Suey: A Cultural History of Chinese Food in the United States (Andrew Coe) : a short but engaging read about how Chinese food came to the US. The book really starts with Americans in China and their observations on Chinese elite banquets. A particular horror was that the meat came already chopped up — no huge roasts to carve. Chapter by chapter, Coe takes us from the railroad era through the ’20s, the mass-marketing of Chinese food and the rise of La Choy, and on to Nixon going to China. The book is full of fun tidbits and made my flights to and from Seattle go by quickly.

The Thousand Autumns of Jacob de Zoet: A Novel (David Mitchell) : I really love David Mitchell’s writing, but this novel was not my favorite of his. It was definitely worth reading — I devoured it — but the subject matter is hard. Jacob de Zoet is a clerk in Dejima, a Dutch East Indies trading post in 19th century Japan. There are many layers to the story, and more than a hint of the grotesque and horrific, but Mitchell has an attention to detail and a mastery with perspective that really makes the place and story come alive.

Air (Geoff Ryman) : a story about technological change, issues of the digital divide, economic development, and ethnic politics, set in a village in fictional Karzistan (looks like Kazakhstan). Air is like having mandatory Internet in your brain, and is set to be deployed globally. During a test run in the village, Chung Mae, a “fashion expert,” ends up deep into Air and realizes that the technology is going to change their lives. She goes about trying (in a desperate, almost mad way) to tell her village and bring them into the future before it overwhelms them. There’s a lot to unpack here, especially in how technology is brought to rural communities in developing nations, how global capital and the “crafts” market impacts local peoples, and the dynamics of village social orders. It’s science fiction, but not really.

The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (Sharon Bertsch McGrayne) : an engaging read about the history of Bayesian ideas in statistics. It reads a bit like an us vs. them, the underdog story of how Bayesian methods have overcome terrible odds (prior beliefs?) to win the day. I’m not sure I can give it as enthusiastic a review as Christian Robert, but I do recommend it as an engaging popular nonfiction read on this slice in the history of modern statistics. In particular, it should be entertaining to a general audience.

Dangerous Frames: How Ideas about Race and Gender Shape Public Opinion (Nicholas J.G. Winter) : the title says most of it, except it’s mostly about how ideas about race and gender shape white public opinion. The basic theoretical structure is that there are schemas that we carry that help us interpret issues, like a race schema or a gender schema. Then there are frames or narratives in which issues are put. If the schema is “active” and an issue is framed in a way that is concordant with the schema, then people’s opinions follow the schema, even if the issue is not “about” race or gender. This is because people reason analogically, so they apply the schema if it matches. To back up the theory, Winter presents some experiments, both of the undergrads-doing-psych-studies type and from survey data, to show that by reframing certain issues, people’s “natural” beliefs can be skewed by the schema that they apply. The schemas he discusses are mostly those of white Americans, so the book feels like a bit of an uncomfortable read because he doesn’t really interrogate the somewhat baldly racist schemas. The statistics, as with all psychological studies, leave something to be desired — I take the effects he notices at a qualitative level (as does he, sometimes).