# Paper a day(?) : optimal mechanisms for differential privacy

Maybe more like “paper whenever I feel like it.” This is a post on a now not-so-recent ArXiV preprint by Quan Geng and Pramod Viswanath on constructing the best mechanism (output distribution) for guaranteeing differential privacy under a utility constraint.

For those readers not quite familiar with differential privacy, the setup may seem a little enigmatic. The idea is that there are $n$ individuals represented by $n$ points in a space $\mathcal{D}$. The objective is to compute a real-valued function of the data points such that from the output $Y$ it will be difficult to infer any particular value of the input. We do this by making the computation randomized, and so this is a property of the conditional distribution $P( y | D)$ where $D \in \mathcal{D}^n$ is the data set. In particular, for $D_1$ and $D_2$ differing in a single point (denoted $D_1 \tilde D_2$, we want the hypothesis test between the two, given the output, to be difficult:

$\left| \log \frac{ P( y | D_1) }{ P(y | D_2) } \right| \le \epsilon$.

Since the log likelihood ratio is small, the hypothesis test is difficult, and an adversary would have a hard time inferring any individual’s data, even if the other data points are revealed. I’m being a bit loose here and saying that the output has a density but you can undo the logs and so on to get a better statement:

$P( S | D_1) \le \exp(\epsilon) P(S | D_2)$.

for any measurable set $S \subseteq \mathbb{R}$. We call an output distribution satisfying this an $\epsilon$-differentially private mechanism.

The focus of the paper is on questioning the ubiquity of the Laplace distribution in guaranteeing differential privacy. In particular, given a real-valued query function $q : \mathcal{D}^n \to \mathbb{R}$ that operates on $n$ data points from a domain $\mathcal{D}$, it’s not clear that producing $q(D) + X$ where $X$ is Laplace-distributed noise is the best differentially-private approximation to $q$. The contribution of this paper is to show that if there is a utility function that we want the output to maximize then the optimal distribution may be quite far from Laplace — instead it is often staircase-shaped.

More formally, they consider mechanisms of the form $q(D) + X$ and ask what the optimal distribution of $X$ to minimize an expected loss:

$L(P) = \int_{x} \ell(x) dP$,

where $P$ is the distribution of $X$ and $\ell(\cdot)$ is some loss on $\mathcal{D}$. We want to minimize $L(P)$ subject to the differential privacy constraints:

$P(S) \le \exp(\epsilon) P(S + d)$

for all measurable $S \subseteq \mathbb{R}$ and all offsets $d \le \Delta$, where $\Delta$ is the sensitivity of the query function $q$:

$\Delta = \max_{D_1 \tilde D_2} | q(D_1) - q(D_2) |$

So this really boils down to a constrained optimization program reminiscent of those we sometimes find in information theory, like finding the capacity achieving input distribution.

The main point of the paper is that the optimal distribution is not Laplace, but staircase-shaped:

Laplace and staircase densities

And that as $\epsilon \to 0$ the Laplace distribution does become optimal, whereas as $\epsilon \to \infty$ it is not. Since choosing an appropriate $\epsilon$ in practical settings is an open question, the “moderate” $\epsilon$ regime is interesting.

How does one prove this result? Basically you have to show that the optimum distribution satisfies various properties. They also need to assume that the cost function is symmetric (not always the case, but usually true) and that it cannot increase too quickly (which is reasonable in many cases). The main approach is to take any differentially private output density and instead look at piecewise constant densities as approximations. This is noncontroversial — the same trick is used in functional analysis. The real key insight is that the steps are of width $\Delta$. This is done in two parts — first they discretize to subdivisions of the $\Delta$ intervals, and then they show that for $[0,\Delta)$ the optimal density is a step function. This is done in Lemmas 23 and 24 (!).

There are more extensions and examples in the paper, but it’s worth a skim if you have a passing interest in differentially private mechanisms and a deeper read if you want to get some insights into what kind of constraints differential privacy puts on output distributions.