Randomized response, differential privacy, and the elusive biased coin

In giving talks to broader audiences about differential privacy, I’ve learned quickly (thanks to watching talks by other experts) that discussing randomized response first is an easy way to explain the kind of “plausible deniability” guarantee that differentially private algorithms give to individuals. In randomized response, the setup is that of local privacy: the simplest model is that a population of $n$ individuals with data $x_1, x_2, \ldots, x_n \in \{0,1\}$ representing some sensitive quantity are to be surveyed by an untrusted statistician. Concretely, suppose that the individual bits represent whether the person is a drug user or not. The statistician/surveyor wants to know the fraction $p = \frac{1}{n} \sum x_i$ of users in the population. However, individuals don’t trust the surveyor. What to do?

The surveyor can give the individuals a biased coin that comes up heads with probability $q < 1/2$ . The individual flips the coin in private. If it comes up heads, they lie and report $y_i = 1 - x_i$ . If it comes up tails, they tell the truth $y_i = x_i$ . The surveyor doesn’t see the outcome of the coin, but can compute the average of the $\{y_i\}$ . What is the expected value of this average?

$\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} y_i \right] = \frac{1}{n} \sum_{i=1}^{n} (q (1 - x_i) + (1 -q) x_i) = q + (1 - 2q) p$ .

So we can invert this to solve for $p$ : if we have a reported average $\bar{y} = \frac{1}{n} \sum y_i$ then estimate $p$ by

$\hat{p} = \frac{\bar{y} - q}{ 1 - 2 q }$ .

What does this have to do with differential privacy? Each individual got to potentially lie about their drug habits. So if we look at the hypothesis test for a surveyor trying to figure out if someone is a user from their response, we get the likelihood ratio

$\frac{ \mathbb{P}( y_i = 1 | x_i = 1 ) }{ \mathbb{P}( y_i = 1 | x_i = 0 ) } = \frac{1 - q}{q}$

If we set $\epsilon = \log \frac{1 - q}{q}$ , we can see that the protocol guarantees differential privacy. This gives a possibly friendlier interpretation of $\epsilon$ in terms of the “lying probability” $q$ . We can plot this:

Epsilon versus lying probability

This is a bit pessimistic — it says that to guarantee reasonable “lying probability” we need $\epsilon \ll 1$ , but in practice this turns out to be quite difficult. Why so pessimistic? The differential privacy thread model is pretty pessimistic — it’s your plausible deniability given that everyone else in the data set has revealed their data to the surveyor “in the clear.” This is the fundamental tension in thinking about the practical implications of differential privacy — we don’t want to make conditional guarantees (“as long as everyone else is secret too”) but the price of an unconditional guarantee can be high in the worst case.

So how does randomized response work in practice? It seems we would need a biased coin. Maybe one can custom order them from Alibaba? Turns out, the answer is not really. Gelman and Nolan have an article about getting students to try and evaluate the bias of a coin — the physics of flipping would seem to dictate that coins are basically fair. You can load dice, but not coins. I recommend reading through the article — it sounds like a fun activity, even for graduate students. Maybe I’ll try it in my Detection and Estimation course next semester.

Despite the widespread prevalence of “flipping a biased coin” as a construction in probability, randomized algorithms, and information theory, a surprisingly large number of people I have met are completely unaware of the unicorn-like nature of biased coins in the real world. I guess we really are in an ivory tower, eh?

3 thoughts on “Randomized response, differential privacy, and the elusive biased coin”

I am surprised to learn this about coins. I’d assumed that there were novelty manufacturers at least who made biased or bias-able coins.

Anand Sarwate says:

on December 8, 2015 at 11:13 pm

Yeah me too! I had it on my list to order a box in case I got a chance to do a demo or something, but it turns out there’s no such thing. I felt sad and then also embarrassed for not knowing this.

Reply

Pingback: Where Are The Unfair Coins? | nebusresearch

	Zonghong Liu on A story about Canvas
	anonymousskimmer on “The needs of the many,…
	Chanterelle Recipes… on Broiled shrimp with chanterell…
	kvarsh on ICML 2019 encouraged code subm…
	Pulkit Grover on gender inclusivity in communic…

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

An Ergodic Walk

a process whose average over time converges to the true average

Randomized response, differential privacy, and the elusive biased coin

3 thoughts on “Randomized response, differential privacy, and the elusive biased coin”

Leave a comment Cancel reply

Share this:

Related

3 thoughts on “Randomized response, differential privacy, and the elusive biased coin”

Leave a comment Cancel reply