ISIT 2010 : databases

I attended one talk on databases and unfortunately missed the second one, so the writeup I have is based on the paper…

(Ertem Tuncel; University of California, Riverside and Deniz Gunduz; Centre Tecnologic de Telecomunicacions de Catalunya)
This was the talk I did attend. The goal was to information-theoritize the notion of compressing databases. You get M i.i.d. individuals with characteristics distributed according to P_X and features Y conditioned on the individuals according to P_{Y | X}. The identification problem is this: given a noisy feature vector Z generated from a particular individual X(i) retrieve the index of the user corresponding to Z. This paper looked at the case where you want to recover a distorted version of Y(i), so they are trading off identification and distortion. The solution is to code Wyner-Zivly, and the result recover the existing results on the identification problem alone as well as the Wyner-Ziv problem.

Lalitha Sankar; Princeton University, S. Raj Rajagopalan; HP Labs, H. Vincent Poor; Princeton University
This talk I missed, which is all the more sad because I work on this stuff. But I did get to see a version of it at ITA and also at the IPAM workshop on privacy (as a poster). This is again an attempt to information-theoretize the problem of data disclosure: given a database d of n individuals, release a noisy version of this database. In doing so, they assume that each individual has K features and the n rows are generated iid according to some distribution P. So, as they put in the paper, “the primary contribution is demonstrating the equivalence between the database privacy problem and a source coding problem with additional privacy constraints.” The notion of utility is captured by a distortion measure: the encoder maps each database to a string of bits W. A user with side information Z^n can attempt to reconstruct the database using the a lossy source code decoder. The notion of privacy by an equivocation constraint:
\frac{1}{n} H( X^n | W, Z^n) \ge E - \epsilon
They then set about finding the best utility-privacy (= distortion-equivocation) tradeoff. This is a different model than differential privacy, and rests on a number of information-theoretic assumptions that may be relaxed in the future. For example, assuming individuals are i.i.d., that distributions are known, that the side information sequence is known, and so on would give pause to the differential privacy crowd. Much like the epsilon in differential privacy, the equivocation has a meaning but its syntactic effect is hard to parse. I guess this is why syntactic definitions of privacy like k-anonymity are popular — it’s easy to see what they mean “on the ground.”


2 thoughts on “ISIT 2010 : databases

  1. I couldn’t have done a better job describing my paper this well :).


    p.s.: What was your paper at ISIT on?

    • On how to code when there is an adversary who can see your codeword causally (but with delay). We can show nice results for some limited channel models, but the complete picture is still a little opaque.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.