# Randomized response, differential privacy, and the elusive biased coin

In giving talks to broader audiences about differential privacy, I’ve learned quickly (thanks to watching talks by other experts) that discussing randomized response first is an easy way to explain the kind of “plausible deniability” guarantee that differentially private algorithms give to individuals. In randomized response, the setup is that of local privacy: the simplest model is that a population of $n$ individuals with data $x_1, x_2, \ldots, x_n \in \{0,1\}$ representing some sensitive quantity are to be surveyed by an untrusted statistician. Concretely, suppose that the individual bits represent whether the person is a drug user or not. The statistician/surveyor wants to know the fraction $p = \frac{1}{n} \sum x_i$ of users in the population. However, individuals don’t trust the surveyor. What to do?

The surveyor can give the individuals a biased coin that comes up heads with probability $q < 1/2$. The individual flips the coin in private. If it comes up heads, they lie and report $y_i = 1 - x_i$. If it comes up tails, they tell the truth $y_i = x_i$. The surveyor doesn’t see the outcome of the coin, but can compute the average of the $\{y_i\}$. What is the expected value of this average?

$\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} y_i \right] = \frac{1}{n} \sum_{i=1}^{n} (q (1 - x_i) + (1 -q) x_i) = q + (1 - 2q) p$.

So we can invert this to solve for $p$: if we have a reported average $\bar{y} = \frac{1}{n} \sum y_i$ then estimate $p$ by

$\hat{p} = \frac{\bar{y} - q}{ 1 - 2 q }$.

What does this have to do with differential privacy? Each individual got to potentially lie about their drug habits. So if we look at the hypothesis test for a surveyor trying to figure out if someone is a user from their response, we get the likelihood ratio

$\frac{ \mathbb{P}( y_i = 1 | x_i = 1 ) }{ \mathbb{P}( y_i = 1 | x_i = 0 ) } = \frac{1 - q}{q}$

If we set $\epsilon = \log \frac{1 - q}{q}$, we can see that the protocol guarantees differential privacy. This gives a possibly friendlier interpretation of $\epsilon$ in terms of the “lying probability” $q$. We can plot this:

Epsilon versus lying probability

This is a bit pessimistic — it says that to guarantee reasonable “lying probability” we need $\epsilon \ll 1$, but in practice this turns out to be quite difficult. Why so pessimistic? The differential privacy thread model is pretty pessimistic — it’s your plausible deniability given that everyone else in the data set has revealed their data to the surveyor “in the clear.” This is the fundamental tension in thinking about the practical implications of differential privacy — we don’t want to make conditional guarantees (“as long as everyone else is secret too”) but the price of an unconditional guarantee can be high in the worst case.

So how does randomized response work in practice? It seems we would need a biased coin. Maybe one can custom order them from Alibaba? Turns out, the answer is not really. Gelman and Nolan have an article about getting students to try and evaluate the bias of a coin — the physics of flipping would seem to dictate that coins are basically fair. You can load dice, but not coins. I recommend reading through the article — it sounds like a fun activity, even for graduate students. Maybe I’ll try it in my Detection and Estimation course next semester.

Despite the widespread prevalence of “flipping a biased coin” as a construction in probability, randomized algorithms, and information theory, a surprisingly large number of people I have met are completely unaware of the unicorn-like nature of biased coins in the real world. I guess we really are in an ivory tower, eh?

# ITA 2015: quick takes

Better late than never, I suppose. A few weeks ago I escaped the cold of New Jersey to my old haunts of San Diego. Although La Jolla was always a bit fancy for my taste, it’s hard to beat a conference which boasts views like this:

A view from the sessions at ITA 2015

I’ll just recap a few of the talks that I remember from my notes — I didn’t really take notes during the plenaries so I don’t have much to say about them. Mostly this was due to laziness, but finding the time to blog has been challenging in this last year, so I think I have to pick my battles. Here’s a smattering consisting of

$\{ \mathrm{talks\ attended} \} \cap \{ \mathrm{talks\ with\ understandable\ notes} \}$

(Information theory)
Emina Soljanin talked about designing codes that are good for fast access to the data in distributed storage. Initial work focused on how to repair codes under disk failures. She looked at how easy it is to retrieve the information afterwords to guarantee some QoS for the storage system. Adam Kalai talked about designing compression schemes that work for an “audience” of decoders. The decoders have different priors on the set of elements/messages so the idea is to design an encoder that works for this ensemble of decoders. I kind of missed the first part of the talk so I wasn’t quite sure how this relates to classical work in mismatched decoding as done in the information theory world. Gireeja Ranade gave a great talk about defining notions of capacity/rate need to control a system which as multiplicative uncertainty. That is, $x[n+1] = x[n] + B[n] u[n]$ where $B[n]$ has the uncertainty. She gave a couple of different notions of capacity, relating to the ratio $| x[n]/x[0] |$ — either the expected value of the square or the log, appropriately normalized. She used a “deterministic model” to give an explanation of how control in this setting is kind of like controlling the number of significant bits in the state: uncertainty increases this and you need a certain “amount” of control to cancel that growth.

(Learning and statistics)
I learned about active regression approaches from Sivan Sabato that provably work better than passive learning. The idea there is do to use a partition of the X space and then do piecewise constant approximations to a weight function that they use in a rejection sampler. The rejection sampler (which I thought of as sort of doing importance sampling to make sure they cover the space) helps limit the number of labels requested by the algorithm. Somehow I had never met Raj Rao Nadakuditi until now, and I wish I had gotten a chance to talk to him further. He gave a nice talk on robust PCA, and in particular how outliers “break” regular PCA. He proposed a combination of shrinkage and truncation to help make PCA a bit more stable/robust. Laura Balzano talked about “estimating subspace projections from incomplete data.” She proposed an iterative algorithm for doing estimation on the Grassmann manifold that can do subspace tracking. Constantine Caramanis talked about a convex formulation for mixed regression that gives a guaranteed solution, along with minimax sample complexity bounds showing that it is basically optimal. Yingbin Liang talked about testing approaches for understanding if there is an “anomalous structure” in a sequence of data. Basically for a sequence $Y_1, Y_2, \ldots, Y_n$, the null hypothesis is that they are all i.i.d. $\sim p$ and the (composite) alternative is that there an interval of indices which are $\sim q$ instead. She proposed a RKHS-based discrepancy measure and a threshold test on this measure. Pradeep Ravikumar talked about a “simple” estimator that was a “fix” for ordinary least squares with some soft thresholding. He showed consistency for linear regression in several senses, competitive with LASSO in some settings. Pretty neat, all said, although he also claimed that least squares was “something you all know from high school” — I went to a pretty good high school, and I don’t think we did least squares! Sanmi Koyejo talked about a Bayesian devision theory approach to variable selection that involved minimizing some KL-divergence. Unfortunately, the resulting optimization ended up being NP-hard (for reasons I can’t remember) and so they use a greedy algorithm that seems to work pretty well.

(Privacy)
Cynthia Dwork gave a tutorial on differential privacy with an emphasis on the recent work involving false discovery rate. In addition to her plenary there were several talks on differential privacy and other privacy measures. Kunal Talwar talked about their improved analysis of the SuLQ method for differentially private PCA. Unfortunately there were two privacy sessions in parallel so I hopped over to see John Duchi talk about definitions of privacy and how definitions based on testing are equivalent to differential privacy. The testing framework makes it easier to prove minimax bounds, though, so it may be a more useful view at times. Nadia Fawaz talked about privacy for time-series data such as smart meter data. She defined different types of attacks in this setting and showed that they correspond to mutual information or directed mutual information, as well as empirical results on a real data set. Raef Bassily studied a estimation problem in the streaming setting where you want to get a histogram of the most frequent items in the stream. They reduce the problem to one of finding a “unique heavy hitter” and develop a protocol that looks sort of like a code for the MAC: they encode bits into a real vector, had noise, and then add those up over the reals. It’s accepted to STOC 2015 and he said the preprint will be up soon.

# WIFS 2014

This week I took a quick jaunt down to Atlanta to attend part of WIFS 2014 (co-located with GlobalSIP 2014). Kamalika and I were invited to give a talk on differential privacy and machine learning, based on our IEEE Signal Processing Magazine article. I’ve uploaded the slides of the tutorial to my website and we’re planning on making a video (audio over slides) version for SigView as well as on YouTube.

Much like last year, GlobalSIP had a somewhat disjointed, semi-chaotic feel (exacerbated by tiredness, I am sure) — it’s really a collection of semi-interacting workshops in the same space, and I knew people in several of the other workshops. Since I was there for a day and giving a tutorial at WIFS, I decided to stick with WIFS for the day. To give a sense of how confusing it all was, here’s a picture of the guide to deciphering the program book:

Overly-complicated rules for encoding sessions

The keynote for GlobalSIP was given by Vince Poor on information-theoretic privacy via rate distortion (this is the work with Lalitha). Vince did a good job of not over-IT-ing it I think, which was good because the audience was pretty diverse and it’s not clear that many of the people there had even taken a course on information theory. This seems to be the big challenge in multi-disciplinary conferences like GlobalSIP (or large signal processing conferences in general) — everyone is in signal processing, but it’s a big tent and it’s hard to reach everyone.

Min Wu was the keynote speaker for the WIFS workshop on the day I attended. Her talk, on “Exploring Power Network Signatures for Information Forensics” was about how to glean information from power fluctuations in networks, or electronic network frequency (ENF). Different processes or operations have different power demands — by matching these signatures to an observed signal (e.g. a video), one can make inferences about the time/location/integrity of the data. For example, were the audio and visual tracks in a video taken at the same time or merged later? This whole area is quite interesting, and while I was sort of aware of this work I hadn’t really read up on much of it.

Perhaps it was the end of the semester kicking in, but I sort of took terrible notes on most of the talks and poster sessions at the conference, so I can’t really write coherently about the papers I saw. Unfortunately I had to run back to teach the penultimate lecture in my class. I guess now that I have a “real job” this is going to be the way it works from now on. Kind of sad, really.

# An exercise in careful misreading

A recent article was passed along to me:

The article is aimed at the legal community, which has seen in differential privacy a potential technological solution for data privacy issues. The goal of the article is to throw some cold water on some law scholars’ embrace of differential privacy as a solution concept. I’m not a one-method-fixes-all kind of person, but this article is sort of relentlessly negative about differential privacy based solely on a single mechanism: output perturbation. The authors appear to be laboring under the impression that this is really the only way to provide differential privacy, “an assumption that contorts the rest of [their] analysis,” the charge that they level at one proponent of differential privacy.

In the example with which they open the paper, they claim that “even knowing the distribution of noise that is randomly added to each cell, the internist has no hope of interpreting the response. The true values could be almost anything.” While technically true, it’s quite misleading. Indeed, by knowing the distribution, one can create bounds on the accuracy of the answer — this is, contra the authors’ claims, the “tension between utility and privacy” that differential privacy researchers do “toil” with. They manage to explain the statistics fairly reasonably in the middle of the paper but ignore that in the introduction and conclusion in favor of some ascerbic bons mots. Now, perhaps to them, privacy should be an almost-sure guarantee. There is a critique in that: differential privacy can only make probabilistic guarantees, and if your legal standard is stricter than that, then it’s probably not a good way to go. But the misleading rhetoric employed here is meant to stir emotions rather than sway the intellect.

The language in the article is quite florid: “Differential privacy has been rocking the computer science world for over ten years and is fast becoming a crossover hit among privacy scholars and policymakers.” I suppose this sort of prose may be what constitutes scholarly writing in law, but it lacks the measured tones that one might want in more objective criticism. Perhaps they read academic writing in science and engineering in an equally emotional register. They use some strong language to conclude “differential privacy is either not practicable or not novel.” I find such blanket statements both puzzling and vacuous. If you set up a straw-man of what differential privacy is, I suppose you can derive such dichotomies, but is that the best argument one can make?

One thing that comes out of this reading is that most people don’t really appreciate how technology progresses from academic research to practical solutions. Perhaps some legal scholars have overstated the case for differential privacy based on the state of the technology now. But whose to say how things will look a few years down the line? We’ll have better algorithms, different database structures, and different data sharing mechanisms and platforms. Perhaps differential privacy is not ready for prime time, although Google seems to disagree. The authors’ main point (hidden in the in the breathless indignation) is that it’s probably not the best solution for every data sharing problem, a statement with which I can completely agree.

In their effort to discredit differential privacy, the authors ignore both the way in which scientific and academic research works as well as contemporary work that seeks to address the very problems they raise: context-awareness via propose-test-release, methods for setting $\epsilon$ in practical scenarios, and dealing with multiple disclosures via stronger composition rules. They further ignore real technical hurdles in realizing “pure” differential privacy in favor of “illustrations” with the goal of painting proponents of differential privacy as ideologues and hucksters. Of course context and judgement are important in designing query mechanisms and privacy-preserving analysis systems. Furthermore, in many cases microdata have to be released for legal reasons. I think few people believe that differential privacy is a panacea, but it at least provides a real quantifiable approach to thinking about these privacy problems that one can build theories and algorithms around. The key is to figure out how to make those work on real data, and there’s a lot more research to be done on that front.

I’ve been a bit bogged down upon getting back from traveling, but here are a few interesting technical tidbits that came through.

Aaron Roth and Cynthia Dwork’s Foundation and Trends monograph on differential privacy is now available.

Speaking of differential privacy, Shiva Kasiviswanathan and Adam Smith have a paper in the Journal of Privacy and Confidentiality on Bayesian interpretations of differential privacy risk.

Deborah Mayo has a post up on whether p-values are error probabilities.

Raymond Yeung is offering a Coursera course on information theory (via the IT Society).

A CS Theory take on Fano’s inequality from Suresh over at the GeomBlog.

# yet more not-so-recent hits from ArXiV

Some shorter takes on these papers, some of which I should read in more detail later. I figure I’ll use the blog for some quick notes and to see if any readers have any comments/ideas about these:

Differentially Private Convex Optimization with Piecewise Affine Objectives (Shuo Han, Ufuk Topcu, George J. Pappas) — arXiv:1403.6135 [math.OC]. The idea here is to look at minimizing functions of the form
$f(x) = \max_{i = 1,2, \ldots, m} \{ a_i^{\top} x + b_i \}$
subject to $x$ belonging to some convex polytope $\mathcal{P}$. This is a bit different than the kind of convex programs I’ve been looking at (which are more ERM-like). Such programs occur often in resource allocation problems. Here the private information of users are the offsets $b_i$. They propose a number of methods for generating differentially private approximations to this problem. Analyzing the sensitivity of this optimization is tricky, so they use an upper bound based on the diameter of the feasible set $\mathcal{P}$ to find an appropriate noise variance. The exponential mechanism also gives a feasible mechanism, although the exact dependence of the suboptimality gap on $\epsilon$ is unclear. They also propose a noisy subgradient method where, instead of using SGD, they alter the sampling distribution using the exponential mechanism to choose a gradient step. Some preliminary experiments are also given (although none exploring the dependence on $\epsilon$, which would also be very interesting)!

Assisted Common Information with an Application to Secure Two-Party Sampling (Vinod M. Prabhakaran, Manoj M. Prabhakaran) — arXiv:1206.1282 [cs.IT]. This is the final version of the journal version of a few conference papers that Vinod and Manoj have done on an interesting variant of the Gács-Körner problem. The motivation is from secure multiparty computation — the problem also touches on some work Vinod and I started but is sadly languishing due to the utter overwhelmingness of starting a new job. Hopefully I can get back to it this summer.

Analysis of Distributed Stochastic Dual Coordinate Ascent (Tianbao Yang, Shenghuo Zhu, Rong Jin, Yuanqing Lin) — arXiv:1312.1031 [cs.DC]. The title pretty much sums it up. I’m interested in looking a bit more at the analysis method, since I had a similar algorithm bouncing around my head that I would like to analyze. The main idea is also update the primal variables to achieve a speedup/use a larger step size.

Convergence of Stochastic Proximal Gradient Algorithm (Lorenzo Rosasco, Silvia Villa, Bang Công Vũ) — arXiv:1403.5074 [math.OC]. This is a similar setup as my last post, with a convex objective that has a smooth and non-smooth component. They show convergence in expectation and almost surely. The key here is that they show convergence in an infinite-dimentionsal Hilbert space instead of, say, $\mathbb{R}^d$.

/

# Differential privacy and the AUC

One of the things I’m always asked when giving a talk on differential privacy is “how should we interpret $\epsilon$?” There a lot of ways of answering this but one way that seems to make more sense to people who actually think about risk, hypothesis testing, and prediction error is through the “area under the curve” metric, or AUC. This post came out of a discussion from a talk I gave recently at Boston University, and I’d like to thank Clem Karl for the more detailed questioning.

/