100 signatures needed by 5/12 to nominate José Moura for IEEE President

As I wrote earlierJosé Moura is trying to get on the ballot for IEEE President. The deadline is May 12 and he needs only 100 more signatures to get onto the ballot. If you are an IEEE member please take the 1 minute to sign the petition to get him on the ballot.

IEEE has an undemocratic nominations process in which the IEEE Board of Directors (BoD) gets to decide who the candidates will be. Because Prof. Moura opposed the BoD proposed amendment to consolidate power into the BoD and reduce regional representation, it is not hard to imagine that the BoD would not want to allow dissenting voices in the Presidential race. There is a petition at the IEEE website to put him on the ballot. It needs around 4,000 signatures and students members are also welcome to sign. You sign in with your IEEE account and then go to “Annual Election Petitions.”

Put José Moura on the ballot!

Retraction for Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, ICASSP 2016

This is somewhat difficult to write, so I’ll cut to the chase. The proposed differentially private PCA method from the ICASSP 2016 paper by my student Hafiz Imtiaz and myself is incorrect and so should be retracted:

H. Imtiaz, A.D. Sarwate, Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, Proceedings of the 2016 International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2339–2343, March 2016.

As a side-result, the paper of Jiang et al. from AAAI 2016 has as similar incorrect result. I’d like to thank my collaborator Kamalika Chaudhuri, who brought this up to me after a conversation with Kunal Talwar. The fact that I have taken so long to make this public announcement was due to a misguided effort to adapt the algorithm and run new experiments. Unfortunately, this was not to be.

What is the algorithm? Well, the basic idea in all of these papers is to add Wishart noise to produce a noisy estimate of the second-moment matrix of a data set. We start with some data vectors \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d and compute the second moment matrix A = \sum_{i} \mathbf{x}_i \mathbf{x}_i^{\top}. If we want to compute the SVD in a differentially private way, one approach is to add noise E to the second moment matrix to form \hat{A} = A + E such that \hat{A} is a differentially private version of A. Then by postprocessing invariance, the SVD of \hat{A} is also differentially private.

The motivation for our work was twofold: 1) we wanted to see how theoretically “optimal” algorithms performed on real data sets, especially for suggested settings of the parameters, and 2) we wanted to propose a good (\epsilon,0)-differentially private algorithm? What is wrong with \delta > 0? In some application domains, I have found that the notion of a “failure probability” is not compatible with practitioners’t (often lawyer-driven) policy requirements. Concentrated DP and newer concepts still keep the \delta > 0, so they might not fly either. Plus, it’s a slightly more challenging and interesting mathematical problem.

A very good (\epsilon,\delta) choice for E is to choose the entries to be i.i.d. Gaussian on the diagonal and above and copy entries below the diagonal such that E is symmetric. This is the Analyze Gauss (AG) algorithm of Dwork et al. Although \hat{A} is symmetric, it’s no longer positive semidefinite (PSD). However, this gives order-optimal guarantees on utility. Taking that as a cue, we wanted to preserve the PSD property in \hat{A}, so we proposed adding noise with a Wishart distribution, which corresponds to generating an i.i.d. rectangular d \times p Gaussian matrix $\latex Z$ and settting E = Z Z^{\top}. This generates a PSD perturbation fo \hat{A} and has pretty good utility guarantees as well.

Unfortunately, the Wishart perturbation is blatantly non-private because the image under the perturbation is a PSD cone starting from A, and for A and A' coming from neighboring databases, the cones are not the same. As a trivial example, suppose A = \mathbf{x} \mathbf{x}^{\top} and A' = \mathbf{0}. Hafiz has written up a more formal description. Because differential privacy requires the support of the output distribution under different inputs to coincide, it is difficult to create a (data-independent) additive perturbation that preserves the PSD property. The exponential mechanism works, but is not additive.

In the end, I was rather excited about the empirical performance (c.f. goal 1) above) and we were close to the deadline so I did not do a proper job checking the technical results in our rush to submit. I’m making this a teachable moment for myself but wanted to make sure there was a public, findable version of this retraction so that people don’t continue to build on our results. What we learned from 1) is that designing practical PCA methods for reasonable values of (\epsilon,\delta) is still challenging — many data sets do not have the sample sizes needed to make differential privacy practical. The solution may be to design more complex multi-stage pipeline, revisit what we can assume about the data space and neighborhood relations, or come up with better (\epsilon,0) algorithms that pass muster for the applications when nonzero \delta is unacceptable.

Readings

Last semester was full of travel but short on time. Despite having more time to read, I end up reading less. Maybe I can just blame the election. This semester is not going that great either, reading-wise.

Measuring the World (Daniel Kehlmann). This was apparently a bestseller in German-speaking countries. It’s about Gauss meeting von Humboldt, but mostly about von Humboldt. Gauss comes off as pretty cranky, but von Humboldt’s adventures have an air of the fantastical about them. I’m not sure why it was so popular in Germany, but I think those who like science and history would enjoy it.

The Fifth Season (N.K. Jemisin). Won the Hugo award, wide-ranging, multiple perspectives. A world in which seismic activity crushes civilizations regularly. Engenders a kind of cruelty and fend-for-oneselfness that seems raw but desperate. Really a great book, can’t wait to read the next.

The Social Construction of What? (Ian Hacking). A book that is not really about the “science wars” but is couched in there nonetheless. Hacking tries to taxonomize and break down what “social construction” means. I like the approach but as a “book” it kind of meanders to the end, with a chapter on child abuse more or less stapled in. He sets up the first half with a bunch of scalpels with which to dissect cases, but then doesn’t use them as clearly. I think I’m biased because I like reading math books where you get a general theorem and then a bunch of nice corollaries that recover known results or shed new light on something etc.

Too Like The Lightning (Ada Palmer) One of the best sci-fi books I’ve read in terms of Big Ideas and Lots of History. The second book in the series came out and I am going to read it soon. There’s a Crooked Timber seminar on it that I’ll read afterwards because spoilers. It’s a very difficult book to describe to others, but it’s in a very futuristic Earth with a very complex governmental system and perhaps extreme stratification into affinity groups. The latter seems “for the purposes of argument” in terms of the philosophizing. The narrator is remarkably unreliable, and the novel is written in a historical style, addressing the Reader, who sometimes themselves interjects. Read it if you like heady speculative fiction and complex worlds.

My Brilliant Friend (Elena Ferrante) This is the first of the Neapolitan Quartet novels and I found it utterly engaging. I think it’s one of those books which many of my friends who are women have read and almost none of my male friends have read, but that’s just some stupid sexism-in-publishing BS. I’m looking forward to the next three books, although I’ve heard this is the best of them. The thing to me about it is how vividly I could picture the events without Ferrante actually going through and describing the details. It somehow captures the feeling of revisiting a memory. Maybe I’m getting older. Time to start reading Proust?

 

Postdoc at ASU/Harvard on ML and Privacy

A joint postdoc position at Arizona State University and Harvard University in the area of machine learning and privacy is available immediately. The successful candidate will be working with the research groups of Prof. Lalitha Sankar and Prof. Flavio du Pin Calmon.

Specific topics of focus are the interplay of machine learning and privacy with focus on both rigorous information-theoretical results as well as practical design aspects.

The appointment will be for a period of 12 months initially, with a possibility for renewal. We are looking for strong applicants with an excellent theoretical background and a proven capacity for high-quality research in form of a strong publication record. Knowledge of privacy literature is desirable.

Interested applicants should submit a current CV, a 1-page research statement and a list of three references. Candidates should contact us via email at lsankar@asu.edu and/or flavio@seas.harvard.edu.

DIMACS Workshop on Distributed Optimization, Information Processing, and Learning

My colleague Waheed Bajwa, Alejandro Ribeiro, and Alekh Agarwal are organizing a Workshop on Distributed Optimization, Information Processing, and Learning from August 21 to August 23, 2017 at Rutgers DIMACS. The purpose of this workshop is to bring together researchers from the fields of machine learning, signal processing, and optimization for cross-pollination of ideas related to the problems of distributed optimization, information processing, and learning. All in all, we are expecting to have 20 to 26 invited talks from leading researchers working in these areas as well as around 20 contributed posters in the workshop.

Registration is open from now until August 14 — hope to see some of you there!

Register soon for the 2017 North American School of Information Theory

The site for the 2017 North American School of Information Theory is now live and registration will begin next week. The IT Schools have been going strong for the last few years and are a great resource for students, especially new students, to get some exposure to information theory research beyond their own work and what they learned in class. Most schools do not have several people working on information theory. For students at such institutions the school provides a great way to meet other new researchers in the field.

Why I will no longer do research sponsored by the Department of Homeland Security

Shortly after the inauguration, the new administration announced a blanket ban on  refugees and any entry into the US from 7 proscribed countries. The announcement was met with outrage, ridicule, and legal opposition. Despite its obvious unconstitutionality and complete lack of guidance on implementation, the Department of Homeland Security (DHS) engaged in a weekend of harassment and intimidation at our border. They were big cheerleaders of the effort but also incompetent and unprofessional. At the same time, a pilot project I had worked on, funded by DHS, was about to be submitted for a “phase 2” extension. In light of those events, I asked to be taken off the project entirely and will no longer seek funding from DHS.

Now the administration has revealed an harsh and chilling new plan to terrorize immigrants and their families through a campaign of demonization and deportation. The biggest cheerleader of this plan is the new DHS secretary, John F. Kelly. This only validates my decision: I cannot conduct research under the aegis of an agency whose planned actions run counter to nearly everything I believe. It would be unethical for me to continue to do so. If that were not enough, DHS has also ceded its technical authority on security/privacy issues. I have little confidence that real security/privacy is a priority.

Supporting graduate students in engineering often requires getting funds from multiple sources; it is difficult to sustain a research group (and in some places, get tenure) on NSF (or NIH, in my field at least) funding alone. The usual suspects are industry, foundations, the various defense research programs (ARO, ONR, AFOSR, DARPA), or other government departments (DOE comes to mind). If you work in security/privacy, DHS has also been a (relatively) new source of research funding. Many of the other government agencies do fund basic research but some are focused more on applied research or transitioning research to practice.

There are many considerations to balance. As a theory and algorithms person, the overhead of making “deliverables” can take time away from basic research. Some programs have a lot of check-ins and progress updates which don’t necessarily work with the academic calendar. Finally, trying to describe your theory question about graph topology and its impact on convergence rates of some estimator as crucial to improving “warfighter capabilities” can sometimes feel awkward. My approach thus far has been to try things out. If the proposal is on research I would want to do anyway and everything is public domain and open, then OK. If the experience is not so good/fun, maybe I will consider alternatives. Basic research has lots of applications; improving a control algorithm could yield better drone targeting or more accurate laser-aided surgery.

It was with this in mind that I proceeded with a DHS-funded project on privacy-sensitive anomaly detection. It was a short project but raised a number of interesting new problems and models that I would like to explore, if possible, in the future. It also funded a student whose studies are now threatened by this very administration. Indeed, the entire academic research enterprise in the US is under threat. As academics we can sign petitions, but it is not enough. Scott Aaronson wrote very eloquently about this issue after the initial ban was announced (see also Terry Tao). My department has seen a dramatic decrease in the number of applicants in general and not just from Iran. We were just informed that we can no longer make TA offers for students who are unlikely to get a visa to come here.

The Department of Homeland Security has demonstrated its blatant disregard for moral norms. Why should we trust its scientific norms? What confidence do we have that funding will not be used in some coercive way? What does it say to our students when we ask them to work for DHS? Yes, the government is big, but at some point the argument that it’s mostly the guy at the top who is bad but the rest of the agency is still committed to good science becomes just too hard to swallow. I decided that I can’t square that circle. Each one of us should think hard about whether we want to.