# Randomized response, differential privacy, and the elusive biased coin

In giving talks to broader audiences about differential privacy, I’ve learned quickly (thanks to watching talks by other experts) that discussing randomized response first is an easy way to explain the kind of “plausible deniability” guarantee that differentially private algorithms give to individuals. In randomized response, the setup is that of local privacy: the simplest model is that a population of $n$ individuals with data $x_1, x_2, \ldots, x_n \in \{0,1\}$ representing some sensitive quantity are to be surveyed by an untrusted statistician. Concretely, suppose that the individual bits represent whether the person is a drug user or not. The statistician/surveyor wants to know the fraction $p = \frac{1}{n} \sum x_i$ of users in the population. However, individuals don’t trust the surveyor. What to do?

The surveyor can give the individuals a biased coin that comes up heads with probability $q < 1/2$. The individual flips the coin in private. If it comes up heads, they lie and report $y_i = 1 - x_i$. If it comes up tails, they tell the truth $y_i = x_i$. The surveyor doesn’t see the outcome of the coin, but can compute the average of the $\{y_i\}$. What is the expected value of this average?

$\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} y_i \right] = \frac{1}{n} \sum_{i=1}^{n} (q (1 - x_i) + (1 -q) x_i) = q + (1 - 2q) p$.

So we can invert this to solve for $p$: if we have a reported average $\bar{y} = \frac{1}{n} \sum y_i$ then estimate $p$ by

$\hat{p} = \frac{\bar{y} - q}{ 1 - 2 q }$.

What does this have to do with differential privacy? Each individual got to potentially lie about their drug habits. So if we look at the hypothesis test for a surveyor trying to figure out if someone is a user from their response, we get the likelihood ratio

$\frac{ \mathbb{P}( y_i = 1 | x_i = 1 ) }{ \mathbb{P}( y_i = 1 | x_i = 0 ) } = \frac{1 - q}{q}$

If we set $\epsilon = \log \frac{1 - q}{q}$, we can see that the protocol guarantees differential privacy. This gives a possibly friendlier interpretation of $\epsilon$ in terms of the “lying probability” $q$. We can plot this:

Epsilon versus lying probability

This is a bit pessimistic — it says that to guarantee reasonable “lying probability” we need $\epsilon \ll 1$, but in practice this turns out to be quite difficult. Why so pessimistic? The differential privacy thread model is pretty pessimistic — it’s your plausible deniability given that everyone else in the data set has revealed their data to the surveyor “in the clear.” This is the fundamental tension in thinking about the practical implications of differential privacy — we don’t want to make conditional guarantees (“as long as everyone else is secret too”) but the price of an unconditional guarantee can be high in the worst case.

So how does randomized response work in practice? It seems we would need a biased coin. Maybe one can custom order them from Alibaba? Turns out, the answer is not really. Gelman and Nolan have an article about getting students to try and evaluate the bias of a coin — the physics of flipping would seem to dictate that coins are basically fair. You can load dice, but not coins. I recommend reading through the article — it sounds like a fun activity, even for graduate students. Maybe I’ll try it in my Detection and Estimation course next semester.

Despite the widespread prevalence of “flipping a biased coin” as a construction in probability, randomized algorithms, and information theory, a surprisingly large number of people I have met are completely unaware of the unicorn-like nature of biased coins in the real world. I guess we really are in an ivory tower, eh?