# Generating vector-valued noise for differential privacy

A distribution that appears frequently in differential privacy is the Laplace distribution. While in the scalar case we have seen that Laplace noise may not be the best, it’s still the easiest example to start with. Suppose we have $n$ scalars $x_i \in [0,1]$ and we want to compute the average $\frac{1}{n} \sum_{i} x_i$ in a differentially private way. One way to do this is to release $Y = \frac{1}{n} \sum_{i} x_i + Z$, where $Z$ has a Laplace distribution: $p(z) = \frac{\lambda}{2} \exp( - \lambda |z| )$.

To see that this is differentially private, note that by changing one value of $x_i$ the average can change by at most $\pm \frac{1}{n}$. Let $\bar{x}$ and $\bar{x}'$ be the average of the original data and the data with one element changed. The output density in these two cases has distribution $p( y - \bar{x})$ and $p(y - \bar{x}')$, so for a $\frac{ p(y - \bar{x}) }{ p(y - \bar{x}') } \le \exp( \lambda ( |y - \bar{x}'| - |y - \bar{x}| ) ) \le \exp( \lambda |\bar{x} - \bar{x}|' ) \le \exp( \lambda/n )$

So we can see by choosing $\lambda = n \epsilon$ we get an $\epsilon$-differentially private approximation to the average.

What if we now have n vectors $\mathbf{x}_i \in [0,1]^d$? Well, one candidate is release a differentially private version of the mean by computing $\mathbf{Y} = \frac{1}{n} \sum_{i} \mathbf{X}_i + \mathbf{Z}$, where $\mathbf{Z}$ has a distribution that looks Laplace-like but in higher dimensions: $p(\mathbf{z}) \propto \exp( - \lambda \| \mathbf{z} \| )$

Now we can do the same calculation with means $\bar{\mathbf{x}}$ and $\bar{\mathbf{x}}'$ $\frac{ p(\mathbf{y} - \bar{\mathbf{x}}) }{ p(\mathbf{y} - \bar{\mathbf{x}}') } \le \exp( \lambda ( \|\mathbf{y} - \bar{\mathbf{x}}'\| - \|\mathbf{y} - \bar{\mathbf{x}}\| ) ) \le \exp( \lambda \|\bar{\mathbf{x}} - \bar{\mathbf{x}}'\| )$

Now the Euclidean norm of the average vector can change by a most $\sqrt{d}/n$ (by replacing $x_i = \mathbf{0}$ with $x_i' = \mathbf{1}$, for example), so the overall bound is $\exp(\lambda \sqrt{d}/n)$, which means choosing $\lambda = n \epsilon / \sqrt{d}$ we get $\epsilon$-differential privacy.

Sampling from exponentials is easy, but what about sampling from this distribution? Here’s where people can fall into a trap because they are not careful about transformations of random variables. It’s tempting (if you are rusty on your probability) to say that $p(\mathbf{z}) = C(\lambda) \exp( - \lambda \| \mathbf{z} \| )$

and then say “well, the norm looks exponentially distributed and the direction is isotropic so we can just sample the norm with an exponential distribution and the uniform direction by taking i.i.d. Gaussians and normalizing them.” But that’s totally wrong because that is implicitly doing a change of variables without properly adjusting the density function. The correct thing to do is to change from Euclidean coordinates to spherical coordinates. We have a map $T : (z_1, z_2, \ldots, z_d) \to (r, \phi_1, \phi_2, \ldots, \phi_{d-1})$ whose Jacobian is $J(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = r^{d-1} \sin^{d-2}(\phi_1) \ldots, \sin(\phi_{d-2})$.

Plugging this in and noting that $r = \|\mathbf{z}\|$ we get $p(r, \phi_1, \phi_2, \ldots, \phi_{d-1}) = C'(\lambda,\phi_1,\ldots, \phi_{d-1}) \exp( - \lambda r )$.

So now we can see that the distribution factorizes and indeed the radius and direction are independent. The radius is not exponentially distributed, it’s Erlang with parameters $(d,\lambda)$. We can generate this by taking the sum of $d$ exponential variables with parameter $\lambda$. The direction we can pick uniformly by sampling $d$ i.i.d. Gaussians and normalizing them.

In general sampling distributions for differentially private mechanisms can be complicated — for example in our work on PCA we had to use an MCMC procedure in our experiments to sample from the distribution in our algorithm. This means we could really only approximate our algorithm in the experiments, of course. There are also places to slip up in sampling from simple-looking distributions, and I’d be willing to bet that in some implementations out there people are not sampling from the correct distribution.

(Thanks to Kamalika Chaudhuri for inspiring this post.)

/