Retraction for Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, ICASSP 2016

This is somewhat difficult to write, so I’ll cut to the chase. The proposed differentially private PCA method from the ICASSP 2016 paper by my student Hafiz Imtiaz and myself is incorrect and so should be retracted:

H. Imtiaz, A.D. Sarwate, Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, Proceedings of the 2016 International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2339–2343, March 2016.

As a side-result, the paper of Jiang et al. from AAAI 2016 has as similar incorrect result. I’d like to thank my collaborator Kamalika Chaudhuri, who brought this up to me after a conversation with Kunal Talwar. The fact that I have taken so long to make this public announcement was due to a misguided effort to adapt the algorithm and run new experiments. Unfortunately, this was not to be.

What is the algorithm? Well, the basic idea in all of these papers is to add Wishart noise to produce a noisy estimate of the second-moment matrix of a data set. We start with some data vectors \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d and compute the second moment matrix A = \sum_{i} \mathbf{x}_i \mathbf{x}_i^{\top}. If we want to compute the SVD in a differentially private way, one approach is to add noise E to the second moment matrix to form \hat{A} = A + E such that \hat{A} is a differentially private version of A. Then by postprocessing invariance, the SVD of \hat{A} is also differentially private.

The motivation for our work was twofold: 1) we wanted to see how theoretically “optimal” algorithms performed on real data sets, especially for suggested settings of the parameters, and 2) we wanted to propose a good (\epsilon,0)-differentially private algorithm? What is wrong with \delta > 0? In some application domains, I have found that the notion of a “failure probability” is not compatible with practitioners’t (often lawyer-driven) policy requirements. Concentrated DP and newer concepts still keep the \delta > 0, so they might not fly either. Plus, it’s a slightly more challenging and interesting mathematical problem.

A very good (\epsilon,\delta) choice for E is to choose the entries to be i.i.d. Gaussian on the diagonal and above and copy entries below the diagonal such that E is symmetric. This is the Analyze Gauss (AG) algorithm of Dwork et al. Although \hat{A} is symmetric, it’s no longer positive semidefinite (PSD). However, this gives order-optimal guarantees on utility. Taking that as a cue, we wanted to preserve the PSD property in \hat{A}, so we proposed adding noise with a Wishart distribution, which corresponds to generating an i.i.d. rectangular d \times p Gaussian matrix $\latex Z$ and settting E = Z Z^{\top}. This generates a PSD perturbation fo \hat{A} and has pretty good utility guarantees as well.

Unfortunately, the Wishart perturbation is blatantly non-private because the image under the perturbation is a PSD cone starting from A, and for A and A' coming from neighboring databases, the cones are not the same. As a trivial example, suppose A = \mathbf{x} \mathbf{x}^{\top} and A' = \mathbf{0}. Hafiz has written up a more formal description. Because differential privacy requires the support of the output distribution under different inputs to coincide, it is difficult to create a (data-independent) additive perturbation that preserves the PSD property. The exponential mechanism works, but is not additive.

In the end, I was rather excited about the empirical performance (c.f. goal 1) above) and we were close to the deadline so I did not do a proper job checking the technical results in our rush to submit. I’m making this a teachable moment for myself but wanted to make sure there was a public, findable version of this retraction so that people don’t continue to build on our results. What we learned from 1) is that designing practical PCA methods for reasonable values of (\epsilon,\delta) is still challenging — many data sets do not have the sample sizes needed to make differential privacy practical. The solution may be to design more complex multi-stage pipeline, revisit what we can assume about the data space and neighborhood relations, or come up with better (\epsilon,0) algorithms that pass muster for the applications when nonzero \delta is unacceptable.