I experienced a horrible “network dropping causes web forms to clear” experience when filing an NSF report a few years back, so I switched to filling things in via the NSF’s Word template. However, the extraneous formatting in that made the cut-and-paste into the webworm tedious. So this time around I created a Markdown (.md) template with all of the questions you need to answer. This makes it easier to edit and lightly format your report text offline (e.g. on a plane) for much faster cut-and-paste later.
IPAM is hosting a workshop on “Algorithmic Challenges in Protecting Privacy for Biomedical Data” which will be held at IPAM from January 10-12, 2018.
The workshop will be attended by many junior as well as senior researchers with diverse backgrounds. We want to to encourage students or postdoctoral scholars who might be interested, to apply and/or register for this workshop.
I think it will be quite interesting and has the potential to spark a lot of interesting conversations around what we can and cannot do about privacy for medical data in general and genomic data in specific.
Disclaimer: I think that the beach should be open to all. Clearly there are some who unfairly malign some beach-goers. However, we need to have an honest discussion about the differences between different types of beach denizens. Thankfully, because I am a Star-Bellied and not Star-Eyed Sneech, I believe that we we can have an open and frank discussion about why Sneeches who originally did not have stars on their bellies have been disrupting the natural hierarchy of the beach. This bias towards equal-starness in Sneechdom threatens the viability of the beach and makes pariahs of those Sneeches whose came by their stars honestly . It is imperative that we have this conversation now before the beach is destroyed forever.
There are obvious innate biological differences between Star-Bellied Sneeches and Common Sneeches. Star-bellied Sneeches have stars: this makes them more adept at skills that require people who have stars on their bellies . Star-Belliedness is heritable: evolutionary sneechology shows that the reason Star-Bellied Sneeches are the “best kind of Sneetch on the beaches” because of the benefits conferred by their stars. What could be more simple? Of course, there are occasional bad Star-Bellied Sneeches and occasional good Common Sneeches, but the fact is on average, the Common Sneech is simply not as good on the beach as the Star-Bellied Sneech. Since the only difference between them is the star, it is only logical to assume that having a star makes a Sneech better at being on the beach.
Star-Bellied Sneeches are better at beach-related activities such as hiking and playing ball . They are natural denizens of the warm beaches. By contrast, the Sneeches without stars are less adapted, preferring the “cold in the dark of the beaches” and “moping alone on the beaches.” This explains why the two types have different roles: Common Sneeches prefer the cold and moping alone because that is what they are good at. Unfortunately, because of our new culture of suppressing traditional viewpoints, we cannot have an honest discussion about this beach-appropriateness gap; the dire consequences of attempting to make all Sneeches have equivalent stars discriminates against the true Star-Bellied Sneeches. The time has come where a Star-Bellied Sneech can no longer speak of “frankfurter roasts, or picnics or parties or marshmallow toasts,” without requiring that Common Sneeches be invited too (despite the superficial appearance of their stars).
As an employee of Sylvester McMonkey McBean Corp. (SMMC), I feel it is my duty to speak out against unfair discrimination against Star-bellied Sneeches. The company offers special “fix-it-up” training and that puts stars on the bellies of Common Sneeches. What’s more, this star service is only available to Sneeches without stars! This is highly discriminatory: how can will we ever know whose stars are original and whose stars are the result of special training? This violates the natural order of things. Because Common Sneeches have a tendency to mope alone, we run the risk of having Sneeches without innate hiking- and ball-playing-skill gaining prominence on the beech. Again, this is not bad for specific Sneeches, but on average will make SMMC less competitive and the beach sub-optimal.
My recommendation is that we should treat each Sneech individually. Those Common Sneeches who are demonstrably non-mopey and can, in controlled environments, enjoy a marshmallow toast, could be possibly considered for further admission into frankfurter roasts and picnics. However, we cannot simply erase differences between Sneeches by allowing non-Star-Bellied Sneeches to obtain stars. It may be politically incorrect to say this, but we must have a way of distinguishing those Common Sneeches who are deserving of stars from those who are not .
Sylvester McMonkey McBean Corp. is treading down a dangerous path and is alienating a great number of Star-Bellied Sneeches. I have received very positive feedback regarding these opinions, so there is clearly a “silent minority” of Star-Bellied Sneeches who are yearning for an open conversation  about these topics. However, the beach has become an inhospitable plate for the rational-minded Sneech.
In closing, I wish to state here that I do not intend to offend Common Sneeches: therefore I cannot possibly have offended any of them, because I said so. That is logic that any Star-Bellied Sneech can understand: Sneeches who cannot are clearly Common mopes.
 That is, they were born with them.
 For example, fooling starfish while snorkeling is something only Star-bellied Sneeches can do.
 While it’s true that in previous years “you could only play ball if your bellies had stars,” I would like to strenuously emphasize that that is #NotAllStarBellies. Not mentioning this disclaimer is evidence of the ideological commitment to the notion that all Sneeches are similarly capable of playing ball. Common Sneeches should perhaps play in a league of their own.
 Although there may be Star-Bellied Sneeches who fail to meet the bar, this can be addressed by introducing better role models and mentors. Unfortunately, such programs are only available to Common Sneeches.
 Preferably one only involving non-mopey Sneeches.
As I wrote earlier, José Moura is trying to get on the ballot for IEEE President. The deadline is May 12 and he needs only 100 more signatures to get onto the ballot. If you are an IEEE member please take the 1 minute to sign the petition to get him on the ballot.
IEEE has an undemocratic nominations process in which the IEEE Board of Directors (BoD) gets to decide who the candidates will be. Because Prof. Moura opposed the BoD proposed amendment to consolidate power into the BoD and reduce regional representation, it is not hard to imagine that the BoD would not want to allow dissenting voices in the Presidential race. There is a petition at the IEEE website to put him on the ballot. It needs around 4,000 signatures and students members are also welcome to sign. You sign in with your IEEE account and then go to “Annual Election Petitions.”
Put José Moura on the ballot!
This is somewhat difficult to write, so I’ll cut to the chase. The proposed differentially private PCA method from the ICASSP 2016 paper by my student Hafiz Imtiaz and myself is incorrect and so should be retracted:
H. Imtiaz, A.D. Sarwate, Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, Proceedings of the 2016 International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2339–2343, March 2016.
As a side-result, the paper of Jiang et al. from AAAI 2016 has as similar incorrect result. I’d like to thank my collaborator Kamalika Chaudhuri, who brought this up to me after a conversation with Kunal Talwar. The fact that I have taken so long to make this public announcement was due to a misguided effort to adapt the algorithm and run new experiments. Unfortunately, this was not to be.
What is the algorithm? Well, the basic idea in all of these papers is to add Wishart noise to produce a noisy estimate of the second-moment matrix of a data set. We start with some data vectors and compute the second moment matrix . If we want to compute the SVD in a differentially private way, one approach is to add noise to the second moment matrix to form such that is a differentially private version of . Then by postprocessing invariance, the SVD of is also differentially private.
The motivation for our work was twofold: 1) we wanted to see how theoretically “optimal” algorithms performed on real data sets, especially for suggested settings of the parameters, and 2) we wanted to propose a good -differentially private algorithm? What is wrong with ? In some application domains, I have found that the notion of a “failure probability” is not compatible with practitioners’t (often lawyer-driven) policy requirements. Concentrated DP and newer concepts still keep the , so they might not fly either. Plus, it’s a slightly more challenging and interesting mathematical problem.
A very good choice for is to choose the entries to be i.i.d. Gaussian on the diagonal and above and copy entries below the diagonal such that is symmetric. This is the Analyze Gauss (AG) algorithm of Dwork et al. Although is symmetric, it’s no longer positive semidefinite (PSD). However, this gives order-optimal guarantees on utility. Taking that as a cue, we wanted to preserve the PSD property in , so we proposed adding noise with a Wishart distribution, which corresponds to generating an i.i.d. rectangular Gaussian matrix $\latex Z$ and settting . This generates a PSD perturbation fo and has pretty good utility guarantees as well.
Unfortunately, the Wishart perturbation is blatantly non-private because the image under the perturbation is a PSD cone starting from , and for and coming from neighboring databases, the cones are not the same. As a trivial example, suppose and . Hafiz has written up a more formal description. Because differential privacy requires the support of the output distribution under different inputs to coincide, it is difficult to create a (data-independent) additive perturbation that preserves the PSD property. The exponential mechanism works, but is not additive.
In the end, I was rather excited about the empirical performance (c.f. goal 1) above) and we were close to the deadline so I did not do a proper job checking the technical results in our rush to submit. I’m making this a teachable moment for myself but wanted to make sure there was a public, findable version of this retraction so that people don’t continue to build on our results. What we learned from 1) is that designing practical PCA methods for reasonable values of is still challenging — many data sets do not have the sample sizes needed to make differential privacy practical. The solution may be to design more complex multi-stage pipeline, revisit what we can assume about the data space and neighborhood relations, or come up with better algorithms that pass muster for the applications when nonzero is unacceptable.
Last semester was full of travel but short on time. Despite having more time to read, I end up reading less. Maybe I can just blame the election. This semester is not going that great either, reading-wise.
Measuring the World (Daniel Kehlmann). This was apparently a bestseller in German-speaking countries. It’s about Gauss meeting von Humboldt, but mostly about von Humboldt. Gauss comes off as pretty cranky, but von Humboldt’s adventures have an air of the fantastical about them. I’m not sure why it was so popular in Germany, but I think those who like science and history would enjoy it.
The Fifth Season (N.K. Jemisin). Won the Hugo award, wide-ranging, multiple perspectives. A world in which seismic activity crushes civilizations regularly. Engenders a kind of cruelty and fend-for-oneselfness that seems raw but desperate. Really a great book, can’t wait to read the next.
The Social Construction of What? (Ian Hacking). A book that is not really about the “science wars” but is couched in there nonetheless. Hacking tries to taxonomize and break down what “social construction” means. I like the approach but as a “book” it kind of meanders to the end, with a chapter on child abuse more or less stapled in. He sets up the first half with a bunch of scalpels with which to dissect cases, but then doesn’t use them as clearly. I think I’m biased because I like reading math books where you get a general theorem and then a bunch of nice corollaries that recover known results or shed new light on something etc.
Too Like The Lightning (Ada Palmer) One of the best sci-fi books I’ve read in terms of Big Ideas and Lots of History. The second book in the series came out and I am going to read it soon. There’s a Crooked Timber seminar on it that I’ll read afterwards because spoilers. It’s a very difficult book to describe to others, but it’s in a very futuristic Earth with a very complex governmental system and perhaps extreme stratification into affinity groups. The latter seems “for the purposes of argument” in terms of the philosophizing. The narrator is remarkably unreliable, and the novel is written in a historical style, addressing the Reader, who sometimes themselves interjects. Read it if you like heady speculative fiction and complex worlds.
My Brilliant Friend (Elena Ferrante) This is the first of the Neapolitan Quartet novels and I found it utterly engaging. I think it’s one of those books which many of my friends who are women have read and almost none of my male friends have read, but that’s just some stupid sexism-in-publishing BS. I’m looking forward to the next three books, although I’ve heard this is the best of them. The thing to me about it is how vividly I could picture the events without Ferrante actually going through and describing the details. It somehow captures the feeling of revisiting a memory. Maybe I’m getting older. Time to start reading Proust?
A joint postdoc position at Arizona State University and Harvard University in the area of machine learning and privacy is available immediately. The successful candidate will be working with the research groups of Prof. Lalitha Sankar and Prof. Flavio du Pin Calmon.
Specific topics of focus are the interplay of machine learning and privacy with focus on both rigorous information-theoretical results as well as practical design aspects.
The appointment will be for a period of 12 months initially, with a possibility for renewal. We are looking for strong applicants with an excellent theoretical background and a proven capacity for high-quality research in form of a strong publication record. Knowledge of privacy literature is desirable.