Signal boost: DPCOMP.ORG is live

I got the following email from Gerome Miklau:

Dear colleagues:

We are writing to inform you of the launch of DPCOMP.ORG.

DPCOMP.ORG is a public website designed with the following goals in mind: (1) to increase the visibility and transparency of state-of-the-art differentially private algorithms and (2) to present a principled and comprehensive empirical evaluation of these algorithms. The intended audience is both researchers who study privacy algorithms and practitioners who might deploy these algorithms.

Currently DPComp includes algorithms for answering 1- and 2-dimensional range queries. We thoroughly study algorithm accuracy and the factors that influence it and present our findings using interactive visualizations. We follow the evaluation methodology from the paper “Principled Evaluation of Differentially Private Algorithms using DPBench”. In the future we plan to extend it to cover other analysis tasks (e.g., higher dimensional data, private regression).

Our hope is that the research community will contribute to improving DPCOMP.ORG so that practitioners are exposed to emerging research developments. For example: if you have datasets which you believe would distinguish the performance of tested algorithms, new algorithms that could be included, alternative workloads, or even a new error metric, please let us know — we would like to include them.

Please share this email with interested colleagues and students. And we welcome any feedback on the website or findings.

Sincerely,

Michael Hay (Colgate University)
Ashwin Machanavajjhala (Duke University)
Gerome Miklau (UMass Amherst)
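To make the task concrete: the simplest algorithm in the class DPComp benchmarks is the naive Laplace baseline for a one-dimensional range count. Here is a minimal sketch (mine, not DPComp's code; all names and parameters are made up) together with a DPBench-flavored error measurement.

```python
import numpy as np

def dp_range_count(data, lo, hi, epsilon, rng=None):
    """epsilon-DP answer to "how many records lie in [lo, hi)?"."""
    # A single range count has sensitivity 1: adding or removing one
    # record changes it by at most 1, so Laplace(1/epsilon) noise
    # suffices for epsilon-differential privacy.
    rng = np.random.default_rng() if rng is None else rng
    true_count = int(np.sum((data >= lo) & (data < hi)))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# DPBench-style toy evaluation: mean absolute error over repeated trials.
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=10_000)        # hypothetical 1-D dataset
truth = int(np.sum((data >= 20) & (data < 40)))
errs = [abs(dp_range_count(data, 20, 40, epsilon=0.1, rng=rng) - truth)
        for _ in range(1000)]
print(f"mean |error| at epsilon = 0.1: {np.mean(errs):.1f}")  # about 1/epsilon = 10
```

The algorithms the site actually compares spend the privacy budget more cleverly across structured sets of queries; the point of the evaluations is presumably to show how much that buys you over baselines like this one.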


ISIT 2010: databases

I attended one talk on databases and unfortunately missed the second one, so the writeup for that one is based on the paper alone.

IDENTIFICATION AND LOSSY RECONSTRUCTION IN NOISY DATABASES
(Ertem Tuncel; University of California, Riverside and Deniz Gunduz; Centre Tecnologic de Telecomunicacions de Catalunya)
This was the talk I did attend. The goal was to information-theoretize the notion of compressing databases. You get M i.i.d. individuals with characteristics distributed according to P_X and features Y conditioned on the individuals according to P_{Y | X}. The identification problem is this: given a noisy feature vector Z generated from a particular individual X(i), retrieve the index of the individual corresponding to Z. This paper looked at the case where you also want to recover a distorted version of Y(i), so they are trading off identification and distortion. The solution is to code Wyner-Ziv-ly, and the results recover the existing results on the identification problem alone as well as on the Wyner-Ziv problem.
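The identification half is easy to simulate in a toy setting. Here is a sketch with binary features and a binary symmetric channel standing in for the paper's general P_X and P_{Y | X}; all parameter names are mine.

```python
import numpy as np

# Toy identification problem: M enrolled individuals, each with L binary
# features, and a noisy observation Z of one individual's feature vector
# through a BSC(p). This binary setup is my stand-in for the paper's
# general P_X / P_{Y|X}, not the authors' model.
rng = np.random.default_rng(0)
M, L, p = 1000, 64, 0.1

Y = rng.integers(0, 2, size=(M, L))   # enrolled feature vectors Y(i)
i_true = int(rng.integers(M))         # index of the queried individual
Z = Y[i_true] ^ (rng.random(L) < p)   # each bit flipped with probability p

# For a BSC with p < 1/2, maximum-likelihood retrieval is minimum
# Hamming distance to the observation.
dists = np.count_nonzero(Y != Z, axis=1)
i_hat = int(np.argmin(dists))
print(f"retrieved index {i_hat}, true index {i_true}")
```

The reconstruction half then asks, as I understand it, how coarsely you can compress the stored Y's, Wyner-Ziv-style, while still supporting both retrieval and a target distortion on the recovered Y(i).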

A THEORY OF UTILITY AND PRIVACY OF DATA SOURCES
(Lalitha Sankar, Princeton University; S. Raj Rajagopalan, HP Labs; H. Vincent Poor, Princeton University)
This talk I missed, which is all the more sad because I work on this stuff. But I did get to see a version of it at ITA and also at the IPAM workshop on privacy (as a poster). This is again an attempt to information-theoretize the problem of data disclosure: given a database X^n of n individuals, release a noisy version of it. They assume that each individual has K features and that the n rows are generated i.i.d. according to some distribution P. So, as they put it in the paper, “the primary contribution is demonstrating the equivalence between the database privacy problem and a source coding problem with additional privacy constraints.” The notion of utility is captured by a distortion measure: the encoder maps each database to a string of bits W, and a user with side information Z^n can attempt to reconstruct the database using a lossy source-code decoder. The notion of privacy is captured by an equivocation constraint:
\frac{1}{n} H( X^n | W, Z^n) \ge E - \epsilon
They then set about finding the best utility-privacy (= distortion-equivocation) tradeoff. This is a different model from differential privacy, and it rests on a number of information-theoretic assumptions that may be relaxed in future work: assuming that individuals are i.i.d., that the distributions are known, that the side information sequence is known, and so on would all give pause to the differential privacy crowd. Much like the epsilon in differential privacy, the equivocation has a meaning, but its syntactic effect is hard to parse. I guess this is why syntactic definitions of privacy like k-anonymity are popular: it’s easy to see what they mean “on the ground.”
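In symbols (my paraphrase of the model, not lifted from the paper): a distortion-equivocation pair (D, E) is achievable if for every \epsilon > 0 and all sufficiently large n there is an encoder/decoder pair satisfying

\frac{1}{n} \mathbb{E}\left[ d( X^n, \hat{X}^n ) \right] \le D + \epsilon \qquad \text{and} \qquad \frac{1}{n} H( X^n | W, Z^n ) \ge E - \epsilon,

and the tradeoff they characterize is the largest achievable equivocation E at each distortion level D.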