Netflix Prize II is cancelled

Via John Langford, I learned that the sequel to the Netflix Prize has been cancelled due to privacy concerns. The paper by Narayanan and Shmatikov (also at the Oakland Security and Privacy Conference, 2008) showed that by combining the public information available via IMDB and the Netflix data, certain individuals could be re-identified. Netflix was sued over the privacy problems, and they’ve settled the suits and decided not to release the new dataset (which was to have demographic information).

Demographic information is known to be pretty valuable in re-identification. The most famous example is Latanya Sweeney’s re-identification of the Governor of Massachusetts by linking (free) hospital discharge records and ($20) voter registration records. In the healthcare field, these kinds of disclosures violate HIPAA, but this Netflix case raises an interesting question with regard to privacy promises. When a company assures you that your private information will only be used internally for quality control purposes, what are they actually promising? If they issue summary statistics and give those to third parties, is that privacy preserving?

The answer is no. However, people seem quite loath to worry about these kinds of disclosures unless there is a public (and dangerous) privacy breach. This is why Narayanan and Shmatikov’s paper is important: the way to get the public (and hence policymakers) to take privacy seriously is to demonstrate that existing methods are insufficient.
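To make the linkage idea concrete, here is a minimal sketch of the kind of join behind the Sweeney example mentioned above: match "de-identified" records against a public list on shared demographic fields (ZIP code, date of birth, and sex), and keep only the matches that are unique. Every record below is made up for illustration, and the function name `link_records` and the field names are my own assumptions, not anything taken from the actual studies.

```python
from collections import defaultdict

# Made-up "de-identified" hospital records: no names, but demographics remain.
hospital = [
    {"zip": "02138", "dob": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "dob": "1970-03-02", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02140", "dob": "1988-11-20", "sex": "F", "diagnosis": "condition C"},
]

# Made-up public voter list: names together with the same demographic fields.
voters = [
    {"name": "Voter One", "zip": "02138", "dob": "1950-01-15", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "dob": "1970-03-02", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "dob": "1970-03-02", "sex": "F"},
    {"name": "Voter Four", "zip": "02141", "dob": "1960-01-01", "sex": "M"},
]


def link_records(hospital, voters, keys=("zip", "dob", "sex")):
    """Return (name, diagnosis) pairs where the demographic key is unique."""
    # Index the public list by the shared demographic fields.
    index = defaultdict(list)
    for v in voters:
        index[tuple(v[k] for k in keys)].append(v)

    matches = []
    for h in hospital:
        candidates = index.get(tuple(h[k] for k in keys), [])
        # Exactly one candidate means the "anonymous" record is re-identified.
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], h["diagnosis"]))
    return matches


print(link_records(hospital, voters))
```

In this toy data only the first hospital record is re-identified: the second matches two voters, so it stays ambiguous, and the third matches nobody. The more demographic fields a release includes, the more records fall into the first case.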

What’s the point of an X department?

Over at Crooked Timber there’s a discussion on eliminating some majors to save money, particularly if they don’t have many graduates.

The issue made it to Leiter because several of the Philosophy departments in those institutions fall into the low-major category. But is producing Philosophy majors the point of having a Philosophy department? In Our Underachieving Colleges (CT review still on its way: DD to blame if I never get round to it) Derek Bok claims that the standard assumption within most departments in research universities is that the undergraduate curriculum is for attracting and then teaching majors, and, further, that our attention to the majors should be shaped by the aim of preparing them well for graduate school. This means that the curriculum is designed for a tiny minority of the students who take classes, and even many of them, probably, would be better off doing something other than going to graduate school (that’s me, not Bok, saying the last bit).

Philosophy departments should take heed of Samidh’s observation that philosophers are good entrepreneurs and point out that they may produce the next big alumni donor!

I wonder about the degree to which Bok’s claim is true in mathematics, science, and engineering. I think it’s probably true that the average biology major or electrical engineer is being prepared for work at a company. Even senior electives are useful in this sense, especially if they are project-oriented. However, it’s probably the case that if you major in math and do not plan to go to graduate school, then your senior seminar in commutative algebra is pretty much useless for the work you’ll do later. But is the average math major at a public university being prepared for (some) graduate program? Is math in this sense closer to the humanities programs mentioned above?

In electrical engineering, the goal is to go work in a company (or for the government) designing/building stuff, and the specialized classes are geared toward that. On average, I think undergraduate programs in engineering in the US don’t emphasize going on to graduate study. An exception is the profit-turning one-year masters programs that have become popular in recent years. But designing a program primarily to prepare people for graduate school, or primarily to prepare them for the workforce, misses the point of college.

The story you hear is that a classical liberal arts education in the US is supposed to teach you to think critically and be an active and thoughtful member of society. So what does that mean for engineers? In a sense, design choices are a form of critical analysis within the context of engineering, but I think that kind of perspective can be construed more broadly. We’re so keen on formulating notions of optimality or engineering tradeoffs that we don’t also consider the societal aspects of the things that we design. It would be nice to get upper-division engineering classes that talk about where technology is headed, where society is headed, and how those interact on a more technical level. This kind of thinking is good preparation for work and for research. I think there are some classes like that out there, but they’re more of an anomaly than the norm, and they’re not really required. But it would be valuable for the students, regardless of where they go.

broadband tidbits

Despite all of the talk about the Comcast/NBC merger, they only filed a supplemental economic report last week, says FCC Chairman Julius Genachowski. The public comment period will begin soon.

The National Broadband Plan will be unveiled next week.

I thought all the talk about broadband accessibility was about rural areas, but the FCC is talking about people with disabilities as well.

My hometown, together with the U of I, won a 22.5 million dollar grant to connect “40 K-12 schools, 17 social service agencies, 14 healthcare facilities, nine youth centers, four public library systems, and two higher education institutions” and bring high speed internet to low-income neighborhoods.

Papers : you know, to organize ’em

I ponied up the money and bought Papers recently — it’s not perfect but it does let me store all of those pesky PDFs I have lying around in a convenient single location.

The program acts like “iTunes for your papers.” It has its own internal storage system (which is also customizable) and lets you create collections (the analogue of playlists). The best feature is the interface to various repositories such as PubMed, arXiv, JSTOR, ACM, and Web of Science. It technically lets you search IEEE Xplore as well, but IEEE just upgraded their system (color me unimpressed), which broke the current version of Papers’ search interface. I’m sure it will get fixed soon enough.

What I wish it let you do is tag papers so that you can click on a tag to see all papers tagged with that topic; while this functionality is there, it’s not obvious how to use it. I’d also like it if the BibTeX was associated as metadata with the paper file, so that I could integrate it better with BibDesk. I had contemplated getting DEVONthink to organize all of my files, but I felt like that was overkill.

Does anyone else out there have a killer system for organizing papers? I know it’s just a crazy dream that I’ll actually get a chance to read most of the papers I have sitting on my hard drive, but I’ll be more likely to read ’em if I can find ’em.

some privacy humor

Ever since I started working on privacy problems (better living through statistics!), I have been struck by the generally fatalistic view most people have about privacy. “The credit card companies know everything about me already,” “Google could easily steal my identity,” and so on. When sentiments like this become so widespread, they are fodder for humorists. (via Celeste LeCompte).

The “parity check” on credit card numbers

Via Lifehacker, I came across this short description of how credit card numbers are coded, and how the last digit is a parity check. It’s a cute example of “real world” error-detection that pretty much anyone could understand. Cute extra-credit problem: how many valid credit card numbers are there out there? (This reminds me of a USAMTS problem from ages past).
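For the curious, here is a minimal sketch of the check, assuming the scheme in question is the standard Luhn (mod 10) check used on payment card numbers. The function names are my own, and the number in the usage line is a widely used test number, not a real account.

```python
def luhn_checksum(digits: str) -> int:
    """Return the Luhn sum of a digit string, modulo 10."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10


def is_valid(number: str) -> bool:
    """True if the number (spaces and dashes ignored) passes the check."""
    digits = number.replace(" ", "").replace("-", "")
    return digits.isdigit() and luhn_checksum(digits) == 0


def check_digit(prefix: str) -> int:
    """Return the single final digit that makes prefix + digit valid."""
    # Put a 0 in the check position, then see how far the sum is from 0 mod 10.
    return (10 - luhn_checksum(prefix + "0")) % 10


print(is_valid("4111 1111 1111 1111"))
```

Since `check_digit` returns exactly one digit for any prefix, one in ten digit strings of a given length passes the check. That is a useful starting point for the extra-credit problem, though real card numbers are further constrained by issuer prefixes and allowed lengths.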