Via John Langford, I learned that the sequel to the Netflix Prize has been cancelled due to privacy concerns. The paper by Narayanan and Shmatikov (also at the Oakland Security and Privacy Conference, 2008) showed that by combining the public information available via IMDb with the Netflix data, certain individuals could be re-identified. Netflix was sued over the privacy problems; it has settled the suits and decided not to release the new dataset (which was to include demographic information).
Demographic information is known to be pretty valuable in re-identification. The most famous example is Latanya Sweeney’s re-identification of the Governor of Massachusetts by linking (free) hospital discharge records and ($20) voter registration records. In the healthcare field, these kinds of disclosures violate HIPAA, but this Netflix case raises an interesting question with regard to privacy promises. When a company assures you that your private information will only be used internally for quality control purposes, what are they actually promising? If they compute summary statistics and give those to third parties, is that privacy preserving?
The answer is no. However, people seem quite loath to worry about these kinds of disclosures unless there is a public (and dangerous) privacy breach. This is why Narayanan and Shmatikov’s paper is important: the way to get the public (and hence policymakers) to take privacy seriously is to demonstrate that existing methods are insufficient.
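To make the linkage idea behind Sweeney's example a bit more concrete, here is a toy sketch of how joining two datasets on shared demographic fields can re-identify people. Everything in it is an assumption for illustration: the records are made up, the choice of ZIP code, birth date, and sex as the join fields is mine, and the names `QUASI_IDENTIFIERS`, `key`, and `link` are hypothetical. It is not the method from either paper, just the general shape of a linkage attack.

```python
# Toy linkage attack. All records below are fabricated for illustration.

# Hypothetical choice of fields shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "De-identified" hospital records: names removed, demographics kept.
hospital_records = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02138", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition C"},
]

# Public voter roll: names listed alongside the same demographics.
voter_records = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_date": "1971-03-22", "sex": "F"},
]


def key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, identified):
    """Pair each anonymous record with a name when exactly one identified record shares its key."""
    index = {}
    for rec in identified:
        index.setdefault(key(rec), []).append(rec)

    matches = []
    for rec in anonymous:
        candidates = index.get(key(rec), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0]["name"], rec["diagnosis"]))
    return matches


reidentified = link(hospital_records, voter_records)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy data, two of the three "anonymous" hospital records pick out a single voter, which is the whole problem: neither dataset contains both a name and a diagnosis, but the join does.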