Postdoc / Visiting Scholar positions at Harvard’s Center for Research on Computation and Society

The Harvard Center for Research on Computation and Society (CRCS) solicits applications for its Postdoctoral Fellows and Visiting Scholars Programs for the 2013-2014 academic year. Postdoctoral Fellows are given an annual salary of approximately $60,000 for one year (with the possibility of renewal) to engage in a program of original research, and are provided with additional funds for travel and research support. Visiting Scholars often come with their own support, but CRCS can occasionally offer supplemental funding.

We seek researchers who wish to interact with both computer scientists and colleagues from other disciplines, and have a demonstrated interest in connecting their research agenda with societal issues.  We are particularly interested in candidates with interests in Economics and Computer Science, Health Care Informatics, Privacy & Security, and/or Technology & Accessibility, and those who may be interested in engaging in one of our ongoing/upcoming projects:

  • Intelligent, Adaptive Systems for Health Care Informatics
  • Language-Based Security
  • Personalized Accessibility
  • Privacy and Security in Targeted Advertising
  • Privacy Tools for Sharing Research Data
  • Trustworthy Crowdsourcing

Harvard University is an Affirmative Action/Equal Opportunity Employer. We are particularly interested in attracting women and underrepresented groups to participate in CRCS.  For further information about the Center and its activities, see http://crcs.seas.harvard.edu/.

Application Procedure

A cover letter, CV, research statement, copies of up to three research papers, and up to three letters of reference should be sent to:

Postdoctoral Fellows and Visiting Scholars Programs
Center for Research on Computation and Society
crcs-apply@seas.harvard.edu

Letter writers for postdoctoral fellow applicants should send their letters directly; Visiting Scholar applicants may provide a list of references rather than having letters sent. The application deadline for full consideration is December 16, 2012.

The Data Map : a map of information flows

One of the things Latanya Sweeney mentioned during her talk at the iDASH workshop is a new project called theDataMap, which is trying to visualize how personal information about individuals flows through institutions. One of the starting points is an older map which shows how a hypothetical hospital patient Alice’s personal information is accessed and used by a number of entities of whom she is likely unaware, including pharmaceutical companies, her employer, and medical researchers.

This is analogous to a map Lee Tien sent me, also from a report a few years ago, charting private medical information flows in California.

It’s worth looking at and thinking a bit about how we balance privacy and utility/profit at the moment, and whether generally erring on the side of sharing is the best way to go.

2nd iDASH Workshop on Privacy

On Saturday I attended the 2nd iDASH workshop on privacy — I thought overall it went quite well, and it’s certainly true that over the last year the dialogue and understanding have improved between the theory/algorithms, data management/governance, and medical research communities. I developed note fatigue partway through the day, but I wanted to blog a little bit about some of the themes which came up during the workshop. Instead of making a monster post which covers everything, I will touch on a few things here; there were also talks, not covered below, about issues in data governance, cryptographic approaches, special issues in genomics, study design, and policy. I may touch on those in later posts.

Cynthia Dwork and Latanya Sweeney gave the keynotes, as they did last year, and they dovetailed quite nicely this year. Cynthia’s talk centered on how to think of privacy risk in terms of resource allocation — you have a certain amount of privacy and you have to apportion it over multiple queries. Latanya Sweeney’s talk came from the other direction: the current legal framework in the US is designed to make information flow, and so it is already a privacy-unfriendly policy regime. Together, the two talks point to some serious impediments to practically implementing the privacy protections we develop on the technological side.
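To make the resource-allocation metaphor concrete, the basic composition property of differential privacy (a standard fact, not something specific to the talks) says that privacy losses add up: if mechanisms M_1, \ldots, M_k are respectively \epsilon_1, \ldots, \epsilon_k-differentially private, then releasing all of their outputs is \epsilon_{\text{total}}-differentially private with

  \epsilon_{\text{total}} = \sum_{i=1}^{k} \epsilon_i,

so a fixed total budget really does have to be apportioned across the queries one plans to make.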

On the privacy models side, Ashwin Machanavajjhala and Chris Clifton talked about slightly different models of privacy that are based on differential privacy but have a less immediately statistical feel, drawing on their work from PODS 2012 and KDD 2012. Kamalika Chaudhuri talked about our work on differentially private PCA, and Li Xiong talked about differential privacy on time series using adaptive sampling and prediction.

Guy Rothblum talked about something he called “concentrated differential privacy,” which essentially amounts to analyzing the measure concentration properties of the log-likelihood ratio that appears in the differential privacy definition: for two neighboring databases D and D', we want to analyze the behavior of the privacy loss random variable \log \frac{ \mathbb{P}( M(D) = o ) }{ \mathbb{P}( M(D') = o ) }, where the output o is drawn from the distribution of M(D). Aaron Roth talked about taking advantage of more detailed metric structure in differentially private learning problems to get better accuracy for the same privacy level.
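For reference (standard facts, my gloss rather than Rothblum’s formulation): pure \epsilon-differential privacy requires that for all measurable sets S,

  \mathbb{P}( M(D) \in S ) \le e^{\epsilon} \, \mathbb{P}( M(D') \in S ),

which for discrete outputs is the same as demanding that the privacy loss variable above be at most \epsilon with probability one. The concentration view relaxes this: the privacy loss should be small in expectation with Gaussian-like tails, which is weaker than pure \epsilon-differential privacy but behaves better under composition.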

DIMACS Workshop on Differential Privacy

Via Kamalika, I heard about the DIMACS Workshop on differential privacy at the end of October:

DIMACS Workshop on Differential Privacy across Computer Science
October 24-26, 2012
(immediately after FOCS 2012)

Call for Abstracts — Short Presentations

The upcoming DIMACS workshop on differential privacy will feature invited talks by experts from a range of areas in computer science as well as short talks (5 to 10 minutes) by participants.

Participants interested in giving a short presentation should send an email to asmith+dimacs@psu.edu containing a proposed talk title, abstract, and the speaker’s name and affiliation. We will try to accommodate as many speakers as possible, but

a) requests received before October 1 will get full consideration
b) priority will be given to junior researchers, so students and postdocs should indicate their status in the email.

More information about the workshop:

The last few years have seen an explosion of results concerning differential privacy across many distinct but overlapping communities in computer science: Theoretical Computer Science, Databases, Programming Languages, Machine Learning, Data Mining, Security, and Cryptography. Each of these different areas has different priorities and techniques, and despite very similar interests, motivations, and choice of problems, it has become difficult to keep track of this large literature across so many different venues. The purpose of this workshop is to bring researchers in differential privacy across all of these communities together under one roof to discuss recent results and synchronize our understanding of the field. The first day of the workshop will include tutorials, representing a broad cross-section of research across fields. The remaining days will be devoted to talks on the exciting recent results in differential privacy across communities, discussion and formation of interesting open problems, and directions for potential inter-community collaborations.

The workshop is being organized by Aaron Roth (blog) and Adam Smith (blog).

ArXiV notes : July 2012 part 2

Differentially Private Filtering and Differentially Private Kalman Filtering
Jerome Le Ny, George J. Pappas
These papers apply the differential privacy model to classical signal processing and look at the effect of adding noise before filtering and summing versus adding noise after filtering and summing. The key is that the operations have to be causal, I think — we usually think about differential privacy as operating offline or online in very restricted settings, but here the signals are coming in as time series.
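As a toy illustration of why the order of operations matters (this is my own sketch, not the papers’ mechanisms, and the Laplace scale below is a placeholder rather than a calibrated sensitivity):

    # Toy sketch: add Laplace noise to a time series before vs. after a
    # causal moving-average filter. A real DP guarantee would calibrate
    # the noise scale to the sensitivity of the input or of the output.
    import numpy as np
    from scipy.signal import lfilter

    rng = np.random.default_rng(0)
    T = 200
    x = np.cumsum(rng.normal(size=T))    # the private input time series
    scale = 1.0                          # placeholder Laplace noise scale
    taps = np.ones(5) / 5.0              # causal length-5 moving average

    # (1) input perturbation: privatize the signal, then filter
    y_in = lfilter(taps, [1.0], x + rng.laplace(scale=scale, size=T))
    # (2) output perturbation: filter, then privatize the output
    y_out = lfilter(taps, [1.0], x) + rng.laplace(scale=scale, size=T)

    y_clean = lfilter(taps, [1.0], x)
    print("MSE, input perturbation :", np.mean((y_in - y_clean) ** 2))
    print("MSE, output perturbation:", np.mean((y_out - y_clean) ** 2))

With equal noise scales the filter averages down the input noise, but of course a fair comparison would calibrate each scale to achieve the same privacy level, which is the kind of tradeoff the papers analyze.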

Finite sample posterior concentration in high-dimensional regression
Nate Strawn, Artin Armagan, Rayan Saab, Lawrence Carin, David Dunson
They study ultra high-dimensional linear regression (cue guitars and reverb) in the “large p, small n” setting. The goal is to get a Bernstein-von Mises type theorem — e.g. assuming the data is generated from a linear model with true parameter \beta_0, then under Bayesian inference the posterior should concentrate around \beta_0. They of course need a sparsity assumption, and the prior must assign reasonable mass around the true parameter and assign low probability to non-sparse signals. The methods use some ideas from compressed sensing (the Dantzig selector) and should be of interest to people working in that area.
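Schematically (my paraphrase, not their exact theorem statement), a finite-sample posterior concentration result of this flavor says that for some rate \epsilon_n \to 0,

  \Pi\left( \beta : \| \beta - \beta_0 \|_2 > \epsilon_n \,\middle|\, X, y \right) \to 0 \quad \text{in probability},

so the posterior piles its mass up around the true \beta_0 even when p \gg n, provided \beta_0 is sparse and the prior behaves as described above.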

Identifying Users From Their Rating Patterns
José Bento, Nadia Fawaz, Andrea Montanari, and Stratis Ioannidis
This is a paper about recommender systems as part of the 2011 CAMRa challenge. They look at the problem of re-identification of users in this data and show that looking at the time stamps of movie ratings is much more useful than looking at the rating values. This suggests to me that people should use a masking system like Anant and Parv’s “Incentivizing Anonymous ‘Peer-to-Peer’ Reviews” (P. Venkitasubramaniam and A. Sahai, Allerton 2008) paper.
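Here is a toy illustration of the phenomenon (my own sketch, not the paper’s algorithm): ratings take a few values shared by many users, while timestamp patterns are nearly unique, so matching an anonymized row against a database works much better on timestamps.

    # Toy sketch: coarse ratings are shared by many users, so an impostor
    # can agree with you on many entries; high-resolution timestamps are
    # effectively a fingerprint, so the best impostor agrees on almost none.
    import numpy as np

    rng = np.random.default_rng(1)
    n_users, n_movies = 500, 50

    ratings = rng.integers(1, 6, size=(n_users, n_movies))     # 1-5 stars
    times = rng.integers(0, 10**8, size=(n_users, n_movies))   # seconds

    target = 123  # the "anonymized" user we try to re-identify

    def best_impostor_agreement(table, row):
        # number of exactly agreeing entries for the closest other user
        agreements = (table == table[row]).sum(axis=1)
        agreements[row] = -1  # exclude the user themselves
        return agreements.max()

    print("ratings   :", best_impostor_agreement(ratings, target), "/", n_movies)
    print("timestamps:", best_impostor_agreement(times, target), "/", n_movies)

In real data the effect is subtler since ratings and timestamps are correlated across users, but the basic asymmetry is the same.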

<a href="http://arxiv.org/abs/1207.4084v1"Mechanism Design in Large Games: Incentives and Privacy
Michael Kearns, Mallesh M. Pai, Aaron Roth, Jonathan Ullman
This paper proposes a variant of differential privacy which they call joint differential privacy and looks at mechanisms that are both differentially private and incentive compatible. At first glance, these goals should be incompatible, since the latter implies “revealing the truth.” The model is one in which each agent has a finite set of actions but its own payoff/value is private. This is somewhat out of my area, so I can’t really get the nuances of what’s going on here, but a crucial assumption is that there are a large number of agents. Joint differential privacy seems to be a form of (\epsilon,\delta) differential privacy on the outputs given to the other users.
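My attempt at the definition (so take it with a grain of salt): a mechanism M mapping the n agents’ private inputs to n outputs is (\epsilon,\delta)-jointly differentially private if for every agent i, every pair of databases D, D' differing only in agent i’s data, and every measurable set S of outputs to the other agents,

  \mathbb{P}\left( M(D)_{-i} \in S \right) \le e^{\epsilon} \, \mathbb{P}\left( M(D')_{-i} \in S \right) + \delta.

Agent i’s own output may depend arbitrarily on its own input, which is what leaves room for incentive compatibility.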

ArXiV notes : July 2012 part 1

I usually tag things from ArXiV that I want to skim later, so I figured I would post the ones I found interesting in the last month or so. Some of them may deserve their own post later. Jet lag is not letting me take the nap I need right now so I figured I’d post it before conking out.

Convergence Rates for Differentially Private Statistical Estimation
K. Chaudhuri and D. Hsu
ArXiV 1206.6395
This paper is about estimation under privacy constraints. In differential privacy there are a fair number of assumptions about the data being well behaved — boundedness is a typical one, for example. One way around this is to move to a relaxed notion of privacy known as (\alpha,\delta)-differential privacy, but what this paper shows is that for the stricter \alpha-differential privacy definition, statistical estimation is governed by the gross error sensitivity (GES) of the estimator.
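As a reminder from robust statistics (standard definitions, not specific to this paper), the gross error sensitivity of an estimator T at a distribution F is the worst-case magnitude of its influence function:

  \mathrm{GES}(T, F) = \sup_{x} \left| \mathrm{IF}(x; T, F) \right|, \qquad \mathrm{IF}(x; T, F) = \lim_{t \to 0} \frac{ T\left( (1-t) F + t \delta_x \right) - T(F) }{ t },

where \delta_x is a point mass at x. An estimator with small GES barely changes when a single data point is swapped out, which is exactly the kind of stability that differential privacy asks for.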

The Greedy Miser: Learning under Test-time Budgets
Z. Xu, K. Weinberger, and O. Chapelle
ArXiV 1206.6451
This paper is about incorporating the cost of extracting features in classification or regression. For a data point \mathbf{x}_i \in \mathbb{R}^d, the feature \alpha costs c_{\alpha} to compute and there is a total budget for the computation. The main approach is to minimize a linear combination of the loss and the cost iteratively, where each iteration consists of building a regression tree.
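Schematically (my paraphrase of the setup), the objective trades classification accuracy off against feature-extraction cost:

  \min_{\beta} \; \ell(\beta) + \lambda \, c(\beta),

where \ell is the training loss, c(\beta) charges c_{\alpha} for each feature \alpha the model actually extracts, and \lambda encodes the budget. Each iteration greedily adds the regression tree that best decreases this combined objective, in the spirit of gradient boosting.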

Sequential detection of multiple change points in networks: a graphical model approach
A.A. Amini and X. Nguyen
ArXiV 1207.1687
This looks at the change point detection problem in the network setting, where each of d nodes in a network has a change point \lambda_j. Data is observed on vertices and edges, and the nodes have to pass messages on the graph. There’s a prior distribution on the change point times, and after observing the data the nodes can use message passing to do some belief propagation, estimate the posterior distribution, and define a stopping time. This turns out to be asymptotically optimal, in the sense that the expected delay in detecting the change point achieves the asymptotic lower bound.
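To give the flavor (my paraphrase in the style of Bayesian quickest change detection, not necessarily their exact rule), node j can declare a change the first time its posterior belief that the change has already occurred crosses a threshold:

  \tau_j = \inf\left\{ n : \mathbb{P}\left( \lambda_j \le n \mid \text{data up to time } n \right) \ge 1 - \alpha \right\},

where the posterior is approximated by the message-passing scheme, and optimality means the expected delay \mathbb{E}[\tau_j - \lambda_j] matches the asymptotic lower bound as the false alarm probability \alpha \to 0.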

The Price of Privacy in Untrusted Recommendation Engines
S. Banerjee, N. Hegde, and L. Massoulié
ArXiV 1207.3269
This studies the effect on recommendation engines of individuals masking their data due to privacy concerns. The technical issues are that the dimension is high and each user only has a small amount of data (e.g. the movies that they rented or have seen). The question is one of sample complexity — how many users do we need to cluster successfully? If users are not so sparse, they can achieve the Fano lower bound (order-wise) using a sketch and spectral techniques, but the dependence is like N \log N, where N is the number of items (e.g. movies). It’s worse if the users are sparse — the scaling is with N^2, which is also achievable in some settings. The analysis approach using Fano is analogous to the “information leakage” connections with differential privacy.
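For reference, the Fano inequality underlying such lower bounds (the standard form, my addition): if V is uniform over M possible item clusterings and \hat{V} is any estimate computed from the observed ratings Y, then

  \mathbb{P}( \hat{V} \ne V ) \ge 1 - \frac{ I(V; Y) + \log 2 }{ \log M },

so when users mask their data the mutual information I(V; Y) shrinks and the error probability is forced up; this is the sense in which the analysis parallels information-leakage arguments in differential privacy.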

Surrogate Losses in Passive and Active Learning
S. Hanneke and L. Yang
ArXiV 1207.3772
This paper looks at how to use surrogate losses (which are computationally tractable) in active learning problems. “We take as our starting point that we have already committed to use a given surrogate loss, and we restrict our attention to just those scenarios in which this heuristic actually does work. We are then interested in how best to make use of the surrogate loss toward the goal of producing a classifier with relatively small error rate.” The central insight in their algorithm is that since the “real loss” is (say) the 0-1 loss, for the purposes of active learning one only really needs to get the sign right and not the magnitude. So the process will minimize the real loss but not necessarily the surrogate.
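For concreteness (standard definitions, not taken from the paper): with labels y \in \{-1,+1\} and a real-valued score f(x), the 0-1 loss and a convex surrogate such as the logistic loss are

  \ell_{01}(f(x), y) = \mathbf{1}\left[ \mathrm{sign}(f(x)) \ne y \right], \qquad \ell_{\mathrm{log}}(f(x), y) = \log\left( 1 + e^{-y f(x)} \right),

and the point above is that only the sign of f(x) matters for the 0-1 objective, so driving the surrogate all the way to its minimum is unnecessary.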

CFP : IEEE Signal Processing Magazine Special Issue on Signal Processing for Cyber-security and Privacy

CALL FOR PAPERS
IEEE Signal Processing Society

IEEE SIGNAL PROCESSING MAGAZINE
Special Issue on Signal Processing for Cyber-security and Privacy

Aims and Scope:
Information technology and electronic communications have been rapidly applied to many spheres of human activity, including commerce, medicine, and social networking. This has led to the creation of massive electronic repositories for distributed information storage and processing, which enable access by a large number of authorized users. The need for timely access to electronic data makes it imperative to guarantee the security and privacy of this data. Traditionally, electronic data security has been ensured via cryptographic techniques, but distributed data systems require security and privacy mechanisms at all levels of the system. Thus, providing precise guarantees on the security and privacy of electronic information requires leveraging a range of information processing techniques beyond traditional cryptography to ensure secure distributed storage and access mechanisms.

The problems of information exchange, interaction, and access lend themselves to fundamental information processing abstractions and theoretical analysis. The tools of rate-distortion theory, distributed compression algorithms, distributed storage codes, machine learning for feature identification and suppression, and compressive sensing and sampling theory are fundamental and can be applied to precisely formulate and quantify the tradeoff between utility and privacy in a variety of domains. Thus, while rate-distortion theory and information-theoretic security can provide fundamental bounds on the privacy and security leakage of distributed data systems, the information and signal processing techniques of compressive sensing, machine learning, and graphical models are the key ingredients necessary to achieve these performance limits in a variety of applications involving streaming data (smart grid, intelligent data collection), distributed data storage (cloud), and interactive data applications across a number of platforms.

This special issue seeks to provide a venue for ongoing research in information and signal processing for security and privacy applications across a wide variety of domains, ranging from communication media (e.g. wireless networks at the edge and optical backbones at the core of the Internet) to computer systems (e.g. traditional computer architectures, distributed systems, and cloud computing).

Topics of Interest include (but are not limited to):

  • Signal processing for information-theoretic security
  • Data mining and analysis for anomaly and intrusion detection
  • Forensic analysis: device identification, recovery of lost/corrupted information
  • Information processing in the encrypted domain
  • Security in distributed storage systems
  • Codes for security in distributed storage and cloud computing
  • Location privacy and obfuscation of mobile device positioning
  • Physical layer security methods: confidentiality and authentication
  • Secure identity management
  • Formalized models for adversaries and threats
  • Techniques to achieve covert or stealthy communication
  • Stochastic models for large data repositories and for streaming data in cyber-physical systems

Submission Process:
Articles submitted to this special issue must have significant relevance to signal processing and its application to security and privacy. All submissions will be peer reviewed according to the IEEE and Signal Processing Society guidelines for both publications. Submitted articles should not have been published or be under review elsewhere. Manuscripts should be submitted online using the Manuscript Central interface. Submissions to this special issue of the IEEE SIGNAL PROCESSING MAGAZINE should have significant tutorial value. Prospective authors should consult the Magazine site for guidelines and information on paper submission.

Important Dates: Expected publication date for this special issue is September 2013.

Guest Editors:
Lalitha Sankar, Lead GE, Arizona State University, USA, lalithasankar@asu.edu
Vincent H. Poor, Princeton University, USA, poor@princeton.edu
Mérouane Debbah, Supelec, Gif-sur-Yvette, France, merouane.debbah@supelec.fr
Kannan Ramchandran, University of California Berkeley, USA, kannanr@eecs.berkeley.edu
Wade Trappe, Rutgers University, USA, trappe@winlab.rutgers.edu

Linkage

More attacks on anonymity in DNA databases.

A letter from Kurt Vonnegut to the head of a school board in North Dakota who burned copies of Slaughterhouse Five. (via MeFi)

An interview with Olympic bronze medalist John Carlos on giving a black power salute at the 1968 Olympics.

Tomorrow is National Grilled Cheese Day.

I really enjoyed this exhibit of Tagore’s paintings at The Art Institute of Chicago. Although my favorite drawing is not online, this bird was pretty cool.

ICITS Workshop Deadline Extension

(Via Adam Smith) The deadline for submitting workshop papers to the 6th International Conference on Information Theoretic Security (ICITS) has been extended from today to Monday, April 23 (at 3pm Eastern) due to holidays. It’s in Montreal in August, so if you have some recent results that you would like to present in a workshop setting, please send them in. “Papers which broaden the applicability of information-theoretic techniques (say, to areas such as data privacy or anonymity) would also be very welcome.”

ICITS will have two tracks this year : a more CS-style conference with published proceedings, and a workshop (think ITA) track without proceedings where you can present older stuff.