Truth in surveying

A few weeks ago I attended Scott Kominers’s class on Market Design. They were talking about mechanism design and differential privacy, so I felt it would be fun to sit in on that session. In the class Scott mentioned some interesting work by Nicholas Lambert and Yoav Shoham on Truthful Surveys that appeared at WINE 2008. There’s also some recent work by Aaron Roth and Grant Schoenebeck up on ArXiV.

In Lambert and Shoham’s setup, the opinion distribution of a population is given by some CDF F(x) (with a density) on the unit interval [0,1]. We can think of x as a level of approval (say, of a politician) and F(x) as the proportion of the population whose approval is less than x. A surveyor selects n agents \{x_i\} i.i.d. from F and asks them to report their opinions. They can report anything they like, however, so they will report \{r_i\}. To incentivize them, the surveyor issues a payment \Pi_i( r_1, \ldots, r_n ) to each agent i. How should we structure the payments to incentivize truthful reporting? In particular, can we make a mechanism in which being truthful is a Nash equilibrium (“accurate”) or the only Nash equilibrium (“strongly accurate”)?

Let A_i = |\{j : r_i < r_j \}| and B_i = |\{j : r_i > r_j \}|. They propose partitioning the agents into k groups, with \mathcal{G}_i denoting the group of agent i, and \tilde{F}_i(x) an unbiased estimator of F(x) built from the points \{r_j : \mathcal{G}_j \ne \mathcal{G}_i \}. The payments are:

\Pi_i(\{r_j\}) = \frac{1}{|\mathcal{G}_i| - 1} \left[ A_i - B_i \right] + 2 \tilde{F}_i(r_i) - \frac{2}{|\mathcal{G}_i| - 1} \sum_{j \in \mathcal{G}_i \setminus \{i\} } \tilde{F}_j(r_j)

This mechanism is accurate and also permutation-invariant with respect to the agents (“anonymous”) and the sum of the payments is 0 (“budget-balanced”).
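
To make the payment rule concrete, here is a minimal numpy sketch of how one might compute it. The function and variable names are mine, and it bakes in two assumptions the formula above does not pin down: A_i and B_i count only the other members of agent i’s group, and \tilde{F}_i is the empirical CDF of the reports from outside that group (which is an unbiased estimator of F when those agents report truthfully).

import numpy as np

def survey_payments(reports, groups):
    # reports: reported opinions r_i in [0, 1]
    # groups:  group labels, at least two groups, each of size >= 2
    reports = np.asarray(reports, dtype=float)
    groups = np.asarray(groups)
    payments = np.zeros(len(reports))

    def F_tilde(g, x):
        # empirical CDF of the reports from outside group g
        return np.mean(reports[groups != g] <= x)

    for i in range(len(reports)):
        g = groups[i]
        in_group = np.flatnonzero(groups == g)
        others = in_group[in_group != i]
        m = len(in_group)                               # |G_i|

        A_i = np.sum(reports[i] < reports[others])      # group-mates reporting above r_i
        B_i = np.sum(reports[i] > reports[others])      # group-mates reporting below r_i

        payments[i] = ((A_i - B_i) / (m - 1)
                       + 2.0 * F_tilde(g, reports[i])
                       - (2.0 / (m - 1)) * sum(F_tilde(g, reports[j]) for j in others))
    return payments

In this sketch the payments sum to zero within each group: the A_i - B_i terms cancel pairwise and the two \tilde{F} terms cancel each other, which is consistent with the budget-balance claim above.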

This is an instance of a more general mechanism for truthfully eliciting samples from a collection of known distributions: each agent has a distribution F_i and you want to get their sample from that distribution. Here, the known distributions are essentially replaced by empirical estimates. Why is this only accurate and not strongly accurate? The agents could collude, pick a different common distribution G, and report values drawn from it. Essentially, each group has an incentive to report from the same distribution as the other groups, so globally the optimal thing is for all the groups to report from a common distribution, but that distribution need not be F if there is global collusion.

How do we get around this issue? If there is a set of “trusted” agents \mathcal{T}, then the estimators in the payments can be built from the trusted data and the remaining untrusted agents can be put in a single group, whose optimal strategy is then to follow the trusted agents. That mechanism is strongly accurate. In a sense, the trusted agents cause the population to “gel” under this payment strategy.
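
As a purely hypothetical usage of the sketch above, one way to wire in a trusted set is to give the trusted agents their own group label and lump every untrusted agent into a single group, so that each untrusted agent’s \tilde{F} is the empirical CDF of the trusted reports (the numbers below are made up for illustration):

rng = np.random.default_rng(0)
trusted_reports = np.array([0.21, 0.35, 0.44, 0.62, 0.80])  # reports from the trusted set T
untrusted_reports = rng.uniform(size=20)                    # everyone else

reports = np.concatenate([trusted_reports, untrusted_reports])
groups = np.array([1] * len(trusted_reports) + [0] * len(untrusted_reports))
payments = survey_payments(reports, groups)

(For simplicity this also computes payments for the trusted agents, even though only the untrusted agents are the strategic ones here.)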

It seems that Roth and Schoenebeck are not aware of Lambert and Shoham’s work, or perhaps it is sufficiently unrelated (they certainly don’t cite it). They also look at truth in surveying from a mechanism design perspective. Their model is somewhat more involved (and has Bayesian bits), but may be of interest to readers who like auction design.

Linkage

Congratulations to my fellow Beast Amitha Knight on being a co-winner of the 2012 PEN New England Susan P. Bloom Children’s Book Discovery Award!

Speaking of children’s books, some people who saw The Hunger Games movie are upset that Rue is black. Unsurprising but sad.

And speaking of friends, my friend Amber is slumming it in Antarctica and is writing some fascinating blog posts from down there.

Can Ellen Do More Push-Ups Than Michelle Obama? They both seem to be able to do more push-ups than I can. Time to hit the gym, I think.

I’ve been eating this spicy peanut noodle salad for lunch this week and boy is it delicious.

Typical review loads

Since becoming faculty at TTI, I’ve started to better appreciate the tension created by service commitments, and I can see how many people begin to view reviewing as a chore, a burden they must bear to maintain goodwill in the “community.” Since I work in a few different communities now, I end up reviewing papers from a lot of different areas: information theory and signal processing, of course, but also machine learning, security, and networks. There’s been a distinct uptick in my reviewing queue, which I find somewhat alarming.

Looking back, I did a quick calculation: in the almost 6 months I’ve been here, I’ve either finished or committed to reviewing 9 journal papers and 16 conference papers. These numbers don’t mean too much on their own, because some journal papers are short (e.g. a correspondence) and some conference papers are long (40+ pages including supplementary material). Page counts don’t really help either because of formatting differences. I’m hoping my new iPad (ooh, shiny!) will let me pack in some reviewing time during my commute and stop me from killing so many trees.

However, I have no idea if these numbers are typical. I’ve turned down review requests because I feel like I don’t have enough time as it is. So readers: what’s a typical review load like? Should I just suck it up and accept more reviews?

Note that I’m not asking about what’s “fair” in the sense of “I submit N papers and therefore should review 3N” or something like that. Those games are fine and all, but I really wonder what the distribution of review load is across individuals for a given journal. More on that point later…

Update: To be clear, being on a PC will obviously cause your review load to go up. I am on 2 PCs, but for smaller conferences; having 10+ ISIT reviews would add significantly to one’s total load.

Updated perl script for merging TeX files for ArXiV

Manu Sridharan (blog) left a comment the other day on my old post about my script to merge multiple TeX files (and strip the comments) for posting to ArXiV. He’s created a git repository for it, which seems so much more official and stuff. It’s at:

Thanks a bunch, Manu!

As a side note, Péter Gács has a de-macro script to eliminate all of your private macros if you’re so inclined.