DIMACS Workshop on Distributed Optimization, Information Processing, and Learning

My colleague Waheed Bajwa, Alejandro Ribeiro, and Alekh Agarwal are organizing a Workshop on Distributed Optimization, Information Processing, and Learning from August 21 to August 23, 2017 at Rutgers DIMACS. The purpose of this workshop is to bring together researchers from the fields of machine learning, signal processing, and optimization for cross-pollination of ideas related to the problems of distributed optimization, information processing, and learning. All in all, we are expecting to have 20 to 26 invited talks from leading researchers working in these areas as well as around 20 contributed posters in the workshop.

Registration is open from now until August 14 — hope to see some of you there!

Postdoctoral Associate at DIMACS

DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science, invites applications for various postdoctoral associate positions for 2017-18. Applicants should be recent Ph.D.’s with interest in DIMACS areas, such as computer science, discrete mathematics, statistics, physics, operations research, and their applications. There are four positions available:

  1. a one-year postdoctoral associateship investigating modeling of anomaly detection in multi-layer networks,
  2. a two-year associateship in collaboration with the Institute for Advanced Study (IAS) in Princeton, NJ emphasizing theoretical computer science and discrete mathematics,
  3. a position associated with the Simons Collaboration on Algorithms and Geometry which also emphasizes theoretical computer science and discrete mathematics and could be hosted at Rutgers/DIMACS,
  4. a two-year associateship in theoretical machine learning in the Department of Computer Science at Rutgers.

See the DIMACS website for application information.

Applications have various deadlines, beginning December 1, 2016. See website for details.
DIMACS Center, Rutgers University, 96 Frelinghuysen Road, Piscataway, NJ 08854-8018;
Tel: 848-445-5928; Email: postdoc at dimacs.rutgers.edu. DIMACS is an EO/AA employer.

Problems with the KDDCup99 Data Set

I’ve used the KDDCup99 data set in a few papers for experiments, primarily because it has a large sample size and preprocessing is not too onerous. However, I recently learned (from Rebecca Wright) that for applications to network security, this data set has been discredited as unrepresentative. The paper by John McHugh from ACM TISSEC details the charges. Essentially, little was done to validate how representative the data set is.

Why do I bring this up? Firstly, I suppose I should stop using this data set to make claims about anomaly detection (which may be a problem for AISec coming up at the end of the month). However, it’s not clear, from a machine learning perspective, whether the claims one can make about a particular application will generalize within an application domain, given the lack of standardization of data sets even within a particular application. I could do a bunch of experiments on mixtures of Gaussians which might tell me that the convergence rate is what the theory said it should be, but validating on a variety of “non-synthetic” data sets can at least show how performance varies with data set properties (regardless of the accuracy with respect to the application). So should I stop using the data set entirely?

Secondly, if we want to develop new models and algorithms for machine learning on security applications, we need data sets, and preferably public data sets. This is a real challenge for anyone trying to develop theoretical frameworks that don’t sound too bogus: practice could drive theory, but there is a kind of security through obscurity model in the data gathering/sharing world which makes it hard to understand what the problems are.

Signal boost: DPCOMP.ORG is live

I got the following email from Gerome Miklau:

Dear colleagues:

We are writing to inform you of the launch of DPCOMP.ORG.

DPCOMP.ORG is a public website designed with the following goals in mind: (1) to increase the visibility and transparency of state-of-the-art differentially private algorithms and (2) to present a principled and comprehensive empirical evaluation of these algorithms. The intended audience is both researchers who study privacy algorithms and practitioners who might deploy these algorithms.

Currently DPComp includes algorithms for answering 1- and 2-dimensional range queries. We thoroughly study algorithm accuracy and the factors that influence it and present our findings using interactive visualizations. We follow the evaluation methodology from the paper “Principled Evaluation of Differentially Private Algorithms using DPBench”. In the future we plan to extend it to cover other analysis tasks (e.g., higher dimensional data, private regression).

Our hope is that the research community will contribute to improving DPCOMP.ORG so that practitioners are exposed to emerging research developments. For example: if you have datasets which you believe would distinguish the performance of tested algorithms, new algorithms that could be included, alternative workloads, or even a new error metric, please let us know — we would like to include them.

Please share this email with interested colleagues and students. And we welcome any feedback on the website or findings.

Sincerely,

Michael Hay (Colgate University)
Ashwin Machanavajjhala (Duke University)
Gerome Miklau (UMass Amherst)

Multiple Postdoc Openings at USC

Prof. Urbashi Mitra is looking for multiple postdocs. Given that this is the time of year when the future looks murkiest, these are great opportunities!

I am seeking multiple post-doctoral researchers with expertise in one or more of the following areas: Communication Theory, (Statistical) Signal Processing, Controls, Information Theory, and Machine Learning. In particular, the following areas of expertise are of interest: structured inference (sparse approximation, low rank matrix completion, tensor signal processing, graph signal processing); multi-terminal information theory, or information theory at the boundaries of control or signal processing; distributed control, consensus methods and partially observable Markov Decision Process modeling and algorithms; modern optimization methods; or biological communications, signal processing or information theory.

The successful applicants will be expected to perform innovative translational research, mentor PhD students, give oral presentations, write journal papers, and participate in grant writing and project management. There will be significant opportunities for research leadership and interaction with funding agencies.

Ideally, the successful applicants will start in Summer 2016.

Please have your interested graduate students apply using the following portal:

https://jobs.usc.edu/postings/63539

In addition to a CV and research statement, applicants are requested to have three letters of reference uploaded to the system.

UCSD Data Science Postdocs

A bit of a delayed posting due to pre-spring break crunch time, but my inimitable collaborator and ex-colleague Kamalika Chaudhuri passed along the following announcement.

I write with the exciting news that UCSD has up to four postdoctoral fellowship openings in data science and machine learning.

The fellowships will prepare outstanding researchers for academic careers. The fellows will be affiliated with the CSE or ECE Departments, will enjoy broad freedom to work with any of our faculty, will be allocated a research budget, and will teach one class per year.

If you know anyone who might be interested, please encourage them to apply!

The program is co-sponsored by UCSD’s CSE and ECE departments, the Interdisciplinary Qualcomm Institute, and the Information Theory and Applications Center.

More information is available at the UCSD Data Science site. Review begins March 21, so get your applications in!

LabTV Profiles Are Up!

And now, a little pre-ITA self-promotion. As I wrote earlier, LabTV interviewed me and a subset of the students in the lab last semester (it was opt-in). This opportunity came out of my small part in a large-scale collaboration organized by Mind Research Network (PI: Vince Calhoun) on trying to implement distributed and differentially private algorithms in a system to enable collaborative neuroscience research. Our lab profiles are now up! They interviewed me, graduate students Hafiz Imtiaz, Sijie Xiong, and Liyang Xie, and an undergraduate student, Kevin Sun. In watching I found that I learned a few new things about my students…