CFP: PPML Workshop at NIPS 2018

Privacy Preserving Machine Learning

NIPS 2018 Workshop

Montreal, December 8, 2018

Description

This one day workshop focuses on privacy preserving techniques for training, inference, and disclosure in large scale data analysis, both in the distributed and centralized settings. We have observed increasing interest of the ML community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography community has proposed various secure frameworks for ML. We encourage both theory and application-oriented submissions exploring a range of approaches, including:

  • secure multi-party computation techniques for ML
  • homomorphic encryption techniques for ML
  • hardware-based approaches to privacy preserving ML
  • centralized and decentralized protocols for learning on encrypted data
  • differential privacy: theory, applications, and implementations
  • statistical notions of privacy including relaxations of differential privacy
  • empirical and theoretical comparisons between different notions of privacy
  • trade-offs between privacy and utility

We think it will be very valuable to have a forum to unify different perspectives and start a discussion about the relative merits of each approach. The workshop will also serve as a venue for networking people from different communities interested in this problem, and hopefully foster fruitful long-term collaboration.

Submission Instructions

Submissions in the form of extended abstracts must be at most 4 pages long (not including references) and adhere to the NIPS format. We do accept submissions of work recently published or currently under review. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arxiv or a pdf published on the workshop webpage.

Program Committee

  • Pauline Anthonysamy (Google)
  • Borja de Balle Pigem (Amazon)
  • Keith Bonawitz (Google)
  • Emiliano de Cristofaro (University College London)
  • David Evans (University of Virginia)
  • Irene Giacomelli (Wisconsin University)
  • Nadin Kokciyan (King’s College London)
  • Kim Laine (Microsoft Research)
  • Payman Mohassel (Visa Research)
  • Catuscia Palamidessi (Ecole Polytechnique & INRIA)
  • Mijung Park (Max Planck Institute for Intelligent Systems)
  • Benjamin Rubinstein (University of Melbourne)
  • Anand Sarwate (Rutgers University)
  • Philipp Schoppmann (HU Berlin)
  • Nigel Smart (KU Leuven)
  • Carmela Troncoso (EPFL)
  • Pinar Yolum (Utrecht University)
  • Samee Zahur (University of Virginia)

Organizers

  • Adria Gascon (Alan Turing Institute & Edinburgh)
  • Niki Kilbertus (MPI for Intelligent Systems & Cambridge)
  • Olya Ohrimenko (Microsoft Research)
  • Mariana Raykova (Yale)
  • Adrian Weller (Alan Turing Institute & Cambridge)

Retraction for Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, ICASSP 2016

This is somewhat difficult to write, so I’ll cut to the chase. The proposed differentially private PCA method from the ICASSP 2016 paper by my student Hafiz Imtiaz and myself is incorrect and so should be retracted:

H. Imtiaz, A.D. Sarwate, Symmetric Matrix Perturbation for Differentially-Private Principal Component Analysis, Proceedings of the 2016 International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2339–2343, March 2016.

As a side-result, the paper of Jiang et al. from AAAI 2016 has as similar incorrect result. I’d like to thank my collaborator Kamalika Chaudhuri, who brought this up to me after a conversation with Kunal Talwar. The fact that I have taken so long to make this public announcement was due to a misguided effort to adapt the algorithm and run new experiments. Unfortunately, this was not to be.

What is the algorithm? Well, the basic idea in all of these papers is to add Wishart noise to produce a noisy estimate of the second-moment matrix of a data set. We start with some data vectors \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d and compute the second moment matrix A = \sum_{i} \mathbf{x}_i \mathbf{x}_i^{\top}. If we want to compute the SVD in a differentially private way, one approach is to add noise E to the second moment matrix to form \hat{A} = A + E such that \hat{A} is a differentially private version of A. Then by postprocessing invariance, the SVD of \hat{A} is also differentially private.

The motivation for our work was twofold: 1) we wanted to see how theoretically “optimal” algorithms performed on real data sets, especially for suggested settings of the parameters, and 2) we wanted to propose a good (\epsilon,0)-differentially private algorithm? What is wrong with \delta > 0? In some application domains, I have found that the notion of a “failure probability” is not compatible with practitioners’t (often lawyer-driven) policy requirements. Concentrated DP and newer concepts still keep the \delta > 0, so they might not fly either. Plus, it’s a slightly more challenging and interesting mathematical problem.

A very good (\epsilon,\delta) choice for E is to choose the entries to be i.i.d. Gaussian on the diagonal and above and copy entries below the diagonal such that E is symmetric. This is the Analyze Gauss (AG) algorithm of Dwork et al. Although \hat{A} is symmetric, it’s no longer positive semidefinite (PSD). However, this gives order-optimal guarantees on utility. Taking that as a cue, we wanted to preserve the PSD property in \hat{A}, so we proposed adding noise with a Wishart distribution, which corresponds to generating an i.i.d. rectangular d \times p Gaussian matrix $\latex Z$ and settting E = Z Z^{\top}. This generates a PSD perturbation fo \hat{A} and has pretty good utility guarantees as well.

Unfortunately, the Wishart perturbation is blatantly non-private because the image under the perturbation is a PSD cone starting from A, and for A and A' coming from neighboring databases, the cones are not the same. As a trivial example, suppose A = \mathbf{x} \mathbf{x}^{\top} and A' = \mathbf{0}. Hafiz has written up a more formal description. Because differential privacy requires the support of the output distribution under different inputs to coincide, it is difficult to create a (data-independent) additive perturbation that preserves the PSD property. The exponential mechanism works, but is not additive.

In the end, I was rather excited about the empirical performance (c.f. goal 1) above) and we were close to the deadline so I did not do a proper job checking the technical results in our rush to submit. I’m making this a teachable moment for myself but wanted to make sure there was a public, findable version of this retraction so that people don’t continue to build on our results. What we learned from 1) is that designing practical PCA methods for reasonable values of (\epsilon,\delta) is still challenging — many data sets do not have the sample sizes needed to make differential privacy practical. The solution may be to design more complex multi-stage pipeline, revisit what we can assume about the data space and neighborhood relations, or come up with better (\epsilon,0) algorithms that pass muster for the applications when nonzero \delta is unacceptable.

DIMACS Workshop on Distributed Optimization, Information Processing, and Learning

My colleague Waheed Bajwa, Alejandro Ribeiro, and Alekh Agarwal are organizing a Workshop on Distributed Optimization, Information Processing, and Learning from August 21 to August 23, 2017 at Rutgers DIMACS. The purpose of this workshop is to bring together researchers from the fields of machine learning, signal processing, and optimization for cross-pollination of ideas related to the problems of distributed optimization, information processing, and learning. All in all, we are expecting to have 20 to 26 invited talks from leading researchers working in these areas as well as around 20 contributed posters in the workshop.

Registration is open from now until August 14 — hope to see some of you there!

Postdoctoral Associate at DIMACS

DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science, invites applications for various postdoctoral associate positions for 2017-18. Applicants should be recent Ph.D.’s with interest in DIMACS areas, such as computer science, discrete mathematics, statistics, physics, operations research, and their applications. There are four positions available:

  1. a one-year postdoctoral associateship investigating modeling of anomaly detection in multi-layer networks,
  2. a two-year associateship in collaboration with the Institute for Advanced Study (IAS) in Princeton, NJ emphasizing theoretical computer science and discrete mathematics,
  3. a position associated with the Simons Collaboration on Algorithms and Geometry which also emphasizes theoretical computer science and discrete mathematics and could be hosted at Rutgers/DIMACS,
  4. a two-year associateship in theoretical machine learning in the Department of Computer Science at Rutgers.

See the DIMACS website for application information.

Applications have various deadlines, beginning December 1, 2016. See website for details.
DIMACS Center, Rutgers University, 96 Frelinghuysen Road, Piscataway, NJ 08854-8018;
Tel: 848-445-5928; Email: postdoc at dimacs.rutgers.edu. DIMACS is an EO/AA employer.

Problems with the KDDCup99 Data Set

I’ve used the KDDCup99 data set in a few papers for experiments, primarily because it has a large sample size and preprocessing is not too onerous. However, I recently learned (from Rebecca Wright) that for applications to network security, this data set has been discredited as unrepresentative. The paper by John McHugh from ACM TISSEC details the charges. Essentially there was little validation done with regards to checking how representative the data set is.

Why do I bring this up? Firstly, I suppose I should stop using this data set to make claims about anomaly detection (which may be a problem for AISec coming up at the end of the month). However, it’s not clear, from a machine learning perspective, whether the claims one can make about a particular application will generalize within an application domain, given the lack of standardization of data sets even within a particular application. I could do a bunch of experiments on mixtures of Gaussians which might tell me that the convergence rate is what the theory said it should be, but validating on a variety of “non-synthetic” data sets can at least show how performance varies with data sets properties (regardless of the accuracy with respect to the application). So should I stop using the data set entirely?

Secondly, if we want to develop new models and algorithms for machine learning on security applications, we need data sets, and preferably public data sets. This is a real challenge for anyone trying to develop theoretical frameworks that don’t sound too bogus: practice could drive theory, but there is a kind of security through obscurity model in the data gathering/sharing world which makes it hard to understand what the problems are.

Signal boost: DPCOMP.ORG is live

I got the following email from Gerome Miklau:

Dear colleagues:

We are writing to inform you of the launch of DPCOMP.ORG.

DPCOMP.ORG is a public website designed with the following goals in mind: (1) to increase the visibility and transparency of state-of-the-art differentially private algorithms and (2) to present a principled and comprehensive empirical evaluation of these algorithms. The intended audience is both researchers who study privacy algorithms and practitioners who might deploy these algorithms.

Currently DPComp includes algorithms for answering 1- and 2-dimensional range queries. We thoroughly study algorithm accuracy and the factors that influence it and present our findings using interactive visualizations. We follow the evaluation methodology from the paper “Principled Evaluation of Differentially Private Algorithms using DPBench”. In the future we plan to extend it to cover other analysis tasks (e.g., higher dimensional data, private regression).

Our hope is that the research community will contribute to improving DPCOMP.ORG so that practitioners are exposed to emerging research developments. For example: if you have datasets which you believe would distinguish the performance of tested algorithms, new algorithms that could be included, alternative workloads, or even a new error metric, please let us know — we would like to include them.

Please share this email with interested colleagues and students. And we welcome any feedback on the website or findings.

Sincerely,

Michael Hay (Colgate University)
Ashwin Machanavajjhala (Duke University)
Gerome Miklau (UMass Amherst)

Multiple Postdoc Openings at USC

Prof. Urbashi Mitra is looking for multiple postdocs. Given that this is the time of year when the future looks murkiest, these are great opportunities!

I am seeking multiple post-doctoral researchers are sought with expertise in one or more areas: Communication Theory, (Statistical) Signal Processing, Controls, Information Theory, and Machine Learning. In particular, the following expertises are of interest: structured inference (sparse approximation, low rank matrix completion, tensor signal processing, graph signal processing); multi-terminal information theory, or information theory at the boundaries of control or signal processing; distributed control, consensus methods and partially observable Markov Decision Process modeling and algorithms; modern optimization methods; or biological communications, signal processing or information theory.

The successful applicants will be expected to perform innovative translational research, mentor PhD students, give oral presentations, write journal papers, and participate in grant writing and project management. There will be significant opportunities for research leadership and interaction with funding agencies.

Ideally, the successful applicants will start in Summer 2016.

Please have your interested graduate students apply using the following portal:

https://jobs.usc.edu/postings/63539

In addition to a cv and research statement, the applicants are requested to have three letters of reference uploaded to the system as well.

UCSD Data Science Postdocs

A bit of a delayed posting due to pre-spring break crunch time, but my inimitable collaborator and ex-colleague Kamalika Chaudhuri passed along the following announcement.

I write with the exciting news that UCSD has up to four postdoctoral fellowship openings in data science and machine learning.

The fellowships will prepare outstanding researchers for academic careers. The fellows will be affiliated with the CSE or ECE Departments, will enjoy broad freedom to work with any of or faculty, they will be allocated a research budget, and will teach one class per year.

If you know anyone who might be interested, please encourage them to apply!

The program is co-sponsored by UCSD’s CSE and ECE departments, the Interdisciplinary Qualcomm Institute, and the Information Theory and Applications Center.

More information is available at the UCSD Data Science site. Review begins March 21, so get your applications in!

LabTV Profiles Are Up!

And now, a little pre-ITA self-promotion. As I wrote earlier, LabTV interviewed me and a subset of the students in the lab last semester (it was opt-in). This opportunity came out of my small part in the a large-scale collaboration organized by Mind Research Network (PI: Vince Calhoun) on trying to implement distributed and differentially private algorithms in a system to enable collaborative neuroscience research. Our lab profiles are now up! They interviewed me, graduate students Hafiz Imtiaz, Sijie Xiong, and Liyang Xie, and an undergraduate student, Kevin Sun. In watching I found that I learned a few new things about my students…

Call for Papers: T-SIPN Special Issue on Inference and Learning Over Networks

IEEE Signal Processing Society
IEEE Transactions on Signal and Information Processing over Networks
Special Issue on Inference and Learning Over Networks

Networks are everywhere. They surround us at different levels and scales, whether we are dealing with communications networks, power grids, biological colonies, social networks, sensor networks, or distributed Big Data depositories. Therefore, it is not hard to appreciate the ongoing and steady progression of network science, a prolific research field spreading across many theoretical as well as applicative domains. Regardless of the particular context, the very essence of a network resides in the interaction among its individual constituents, and Nature itself offers beautiful paradigms thereof. Many biological networks and animal groups owe their sophistication to fairly structured patterns of cooperation, which are vital to their successful operation. While each individual agent is not capable of sophisticated behavior on its own, the combined interplay among simpler units and the distributed processing of dispersed pieces of information, enable the agents to solve complex tasks and enhance dramatically their performance. Self-organization, cooperation and adaptation emerge as the essential, combined attributes of a network tasked with distributed information processing, optimization, and inference. Such a network is conveniently described as an ensemble of spatially dispersed (possibly moving) agents, linked together through a (possibly time – varying) connection topology. The agents are allowed to interact locally and to perform in-network processing, in order to accomplish the assigned inferential task. Correspondingly, several problems such as, e.g., network intrusion, community detection, and disease outbreak inference, can be conveniently described by signals on graphs, where the graph typically accounts for the topology of the underlying space and we obtain multivariate observations associated with nodes/edges of the graph. The goal in these problems is to identify/infer/learn patterns of interest, including anomalies, outliers, and existence of latent communities. Unveiling the fundamental principles that govern distributed inference and learning over networks has been the common scope across a variety of disciplines, such as signal processing, machine learning, optimization, control, statistics, physics, economics, biology, computer, and social sciences. In the realm of signal processing, many new challenges have emerged, which stimulate research efforts toward delivering the theories and algorithms necessary to (a) designing networks with sophisticated inferential and learning abilities; (b) promoting truly distributed implementations, endowed with real-time adaptation abilities, needed to face the dynamical scenarios wherein real-world networks operate; and (c) discovering and disclosing significant relationships possibly hidden in the data collected from across networked systems and entities. This call for papers therefore encourages submissions from a broad range of experts that study such fundamental questions, including but not limited to:

  • Adaptation and learning over networks.
  • Consensus strategies; diffusion strategies.
  • Distributed detection, estimation and filtering over networks.
  • Distributed dictionary learning.
  • Distributed game-theoretic learning.
  • Distributed machine learning; online learning.
  • Distributed optimization; stochastic approximation.
  • Distributed proximal techniques, sub-gradient techniques.
  • Learning over graphs; network tomography.
  • Multi-agent coordination and processing over networks.
  • Signal processing for biological, economic, and social networks.
  • Signal processing over graphs.

Prospective authors should visit http://www.signalprocessingsociety.org/publications/periodicals/tsipn/ for information on paper submission. Manuscripts should be submitted via Manuscript Central at http://mc.manuscriptcentral.com/tsipn-ieee.

Important Dates:

  • Manuscript submission: February 1, 2016
  • First review completed: April 1, 2016
  • Revised manuscript due: May 15, 2016
  • Second review completed: July 15, 2016
  • Final manuscript due: September 15, 2016
  • Publication: December 1, 2016

Guest Editors: