2nd iDASH Workshop on Privacy

On Saturday I attended the 2nd iDASH workshop on privacy. Overall I thought it went quite well, and over the last year the dialogue and mutual understanding between the theory/algorithms, data management/governance, and medical research communities have certainly improved. I developed note fatigue partway through the day, but I wanted to blog a little about some of the themes that came up during the workshop. Instead of writing a monster post that covers everything, I will touch on a few things here. In particular, other talks not mentioned below addressed data governance, cryptographic approaches, special issues in genomics, study design, and policy. I may touch on those in later posts.

Cynthia Dwork and Latanya Sweeney gave the keynotes, as they did last year, and this year their talks dovetailed quite nicely. Cynthia's talk centered on thinking about privacy risk in terms of resource allocation: you have a fixed amount of privacy, and you have to apportion it over multiple queries. Latanya Sweeney's talk came from the other direction: the current legal framework in the US is designed to make information flow, so it is already a privacy-unfriendly policy regime. Together, these raise serious impediments to implementing, in practice, the privacy protections that we develop on the technological side.
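As a rough illustration of the resource-allocation view, the Python sketch below tracks a total privacy budget under basic sequential composition, where the epsilons of the answered queries simply add up. It is my own toy example, not something from the talk: the class and method names and the even split across queries are hypothetical choices. It answers counting queries (sensitivity 1) with the Laplace mechanism and refuses further queries once the budget is spent.

```python
import random


class PrivacyBudget:
    """Hypothetical privacy-budget accountant using basic sequential composition:
    the epsilons of all answered queries add up and may not exceed the total."""

    def __init__(self, total_epsilon: float, seed: int = 0):
        self.total = total_epsilon
        self.spent = 0.0
        self.rng = random.Random(seed)

    def remaining(self) -> float:
        return self.total - self.spent

    def laplace_count(self, true_count: int, epsilon: float) -> float:
        """Answer a counting query (sensitivity 1) with Laplace noise of scale 1/epsilon."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that exact splits like 4 x 0.25 are not refused by rounding.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, 1/rate).
        rate = epsilon
        noise = self.rng.expovariate(rate) - self.rng.expovariate(rate)
        return true_count + noise


# Apportion a total budget of 1.0 evenly over four counting queries.
budget = PrivacyBudget(total_epsilon=1.0, seed=42)
answers = [budget.laplace_count(100, epsilon=0.25) for _ in range(4)]
print(answers)             # noisy counts near 100 (Laplace scale 4)
print(budget.remaining())  # 0.0: nothing left for a fifth query
```

Giving each query a smaller share of the budget means each answer gets noisier (here the Laplace scale is 1/0.25 = 4), which is exactly the trade-off the resource-allocation framing makes explicit. Basic composition is the simplest accounting rule; tighter composition results exist, and this sketch does not use them.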

On the privacy models side, Ashwin Machanavajjhala and Chris Clifton talked about slightly different models of privacy that build on differential privacy but have a less immediately statistical feel, drawing on work from PODS 2012 and KDD 2012. Kamalika Chaudhuri talked about our work on differentially private PCA, and Li Xiong talked about differential privacy for time series using adaptive sampling and prediction.

Guy Rothblum talked about something he called “concentrated differential privacy,” which essentially amounts to analyzing the measure concentration properties of the log-likelihood ratio that appears in the differential privacy definition. For any two neighboring databases D and D' (differing in one individual's record), we want to analyze the behavior of the privacy loss random variable $\log \frac{ p_{M(D)}(O) }{ p_{M(D')}(O) }$, where $O \sim M(D)$ and $p_{M(D)}$ is the density of the mechanism's output, rather than only bounding $\log \frac{ \mathbb{P}( M(D) \in S ) }{ \mathbb{P}( M(D') \in S ) }$ uniformly over all measurable sets $S$. Aaron Roth talked about taking advantage of more detailed metric structure in differentially private learning problems to get better accuracy for the same privacy level.
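To make the log-likelihood ratio concrete as a random variable, here is a small Python sketch for the Laplace mechanism on a scalar query with sensitivity 1. It is my own illustration, not Rothblum's analysis; the function names and the choice f(D) = 0, f(D') = 1 are hypothetical. Pure ε-differential privacy bounds this random variable by ε in the worst case, but its mean (the KL divergence between the two output distributions) is ε + e^{-ε} - 1, which is much smaller for small ε. Concentrated differential privacy is about exploiting this kind of distributional behavior rather than only the worst case.

```python
import math
import random


def laplace_sample(rng: random.Random, scale: float) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privacy_loss_samples(f_d: float, f_d_prime: float, epsilon: float,
                         sensitivity: float = 1.0, n: int = 100_000,
                         seed: int = 0) -> list[float]:
    """Sample the privacy loss log p_{M(D)}(O) - log p_{M(D')}(O) with O ~ M(D),
    for the Laplace mechanism M(D) = f(D) + Laplace(sensitivity / epsilon)."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    losses = []
    for _ in range(n):
        o = f_d + laplace_sample(rng, scale)
        # Ratio of Laplace densities: the normalizing constants cancel.
        losses.append((abs(o - f_d_prime) - abs(o - f_d)) / scale)
    return losses


eps = 1.0
losses = privacy_loss_samples(f_d=0.0, f_d_prime=1.0, epsilon=eps)
# Worst case: |loss| <= eps (up to floating-point rounding).
print(max(abs(x) for x in losses))
# Typical case: the empirical mean is close to the KL divergence eps + exp(-eps) - 1.
print(sum(losses) / len(losses), eps + math.exp(-eps) - 1)
```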

William Thurston on proof and progress

William Thurston passed away a little over a month ago. While I have never had occasion to read any of his work, his article “On Proof and Progress in Mathematics” has been reposted, and I think it's worth a read for anyone who thinks about how mathematical knowledge progresses. For those of us who do theoretical engineering, I think Thurston offers an interesting outside perspective and a refreshing antidote to the style of research we do now. His first point is that we should ask the question:

How do mathematicians advance human understanding of mathematics?

I think we could ask the same question in our own fields and do a breakdown similar to the one in the article: how do we understand information theory, and how is that understanding communicated to others? Lav Varshney had a nice paper (though I can't seem to find it) on the role of block diagrams as a way of communicating our models and results to each other; this is a visual mode of understanding. By contrast, I find that machine learning papers rarely include block diagrams or schematics illustrating the geometric intuition behind a proof. Instead, their visual illustrations are plots of experimental results.

Thurston goes through a number of questions that interrogate the motives, methods, and outcomes of mathematical research, but I think they are relevant to everyone, even non-mathematical researchers. In the end, research is about communication, and understanding the what, how, and why of it is always a valuable exercise.