Here’s an interview with David Goodstein, who’s just written a book titled On Fact and Fraud: Cautionary Tales from the Front Lines of Science, which looks like an interesting read.
Author Archives: Anand Sarwate
Missed Aches
I saw an absolutely hilarious short animation called Missed Aches (by Joanna Priestley) at Spike and Mike’s Festival of Animation last week. If it comes your way you should definitely see it. Here’s a clip (slightly NSFW) from the first part of the film.
Privacy and Google Web History
Posted on arXiv last night: Private Information Disclosure from Web Searches. (The case of Google Web History), by Claude Castelluccia, Emiliano De Cristofaro, and Daniele Perito.
Our report was sent to Google on February 23rd, 2010. Google is investigating the problem and has decided to temporarily suspend search suggestions from Search History. Furthermore, Google Web History page is now offered over HTTPS only. Updated information about this project is available at: this http URL
The link above has some more details of their back and forth with Google on the matter, and at least it looks like Google’s on the losing end of it.
Search histories have a lot of information in them, since searches are correlated with local events, such as disease spread (related and interesting is Twitter’s tracking of earthquakes). Since user sessions can be compromised by someone hijacking the cookies that maintain the session, Google requires HTTPS for many services, like Gmail, but not for the “automatic suggestion” for searches. The authors implemented an attack called The Historiographer:
The Historiographer uses the fact that users signed in any Google service receive personalized suggestions for their search queries based on previously-searched keywords. Since Google Web Search transmits authentication cookies in clear, the Historiographer monitoring the network can capture such a cookie and exploit the search suggestions to reconstruct a user’s search history.
This attack is not looking at a short time-window of browsing history, but essentially the entire search history as stored by Google. They did real experiments, and found:
Results show that almost one third of monitored users were signed in their Google accounts and, among them, a half had Web History enabled, thus being vulnerable to our attack. Finally, we show that data from several other Google services can be collected with a simple session hijacking attack.
So how does it work? The program hijacks the SID cookie from the user by eavesdropping, and then issues prefixes to the suggestion services; that is, it simulates a user typing in the first few letters of a search query. Prefixes have to be at least 2-3 letters to trigger the suggestion, and the top 3 completions are given. Of course 26^3 is a lot of prefixes to try, so the system has to sample effectively. The system just queries the top 10% of most frequent 3-letter prefixes (based on the statistics of English), which amounts to 121 queries to the system. If a particular 2-letter prefix (e.g. “pr”) is a prefix for many 3-letter prefixes (e.g. “pre”, “pra”, “pro”) which result in 3 completions, they will proceed greedily to look at longer prefixes in that direction. Note that this is the same principle behind Dasher (or arithmetic coding, really).
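To make the prefix search concrete, here is a minimal sketch of the idea in Python. It is not the authors' code: `make_suggester` is a hypothetical offline stand-in for the suggestion service (no cookies or network calls are involved), the search history is invented, and the expansion rule is a simplification in which any prefix that returns a full set of 3 completions gets extended by one more character.

```python
import string

MAX_SUGGESTIONS = 3  # the post says the top 3 completions are returned


def make_suggester(history):
    """Hypothetical offline stand-in for the suggestion service."""
    ordered = sorted(history)

    def suggest(prefix):
        # Return at most MAX_SUGGESTIONS past queries starting with the prefix.
        matches = [q for q in ordered if q.startswith(prefix)]
        return matches[:MAX_SUGGESTIONS]

    return suggest


def reconstruct(suggest, seed_prefixes, max_len=6):
    """Collect completions, extending a prefix only when its answer is full."""
    found = set()
    queries = 0
    stack = list(seed_prefixes)
    while stack:
        prefix = stack.pop()
        completions = suggest(prefix)
        queries += 1
        found.update(completions)
        # A full answer hints that more entries may be hidden behind this prefix,
        # so try every one-character extension of it.
        if len(completions) == MAX_SUGGESTIONS and len(prefix) < max_len:
            stack.extend(prefix + c for c in string.ascii_lowercase + " ")
    return found, queries


# Invented search history, for illustration only.
history = {
    "privacy law",
    "private key",
    "probability",
    "protein folding",
    "tor exit node",
}
suggest = make_suggester(history)
recovered, n_queries = reconstruct(suggest, ["pr", "to"])
print(sorted(recovered))
print(n_queries)
```

The point of the sketch is only that a saturated answer tells you where to dig deeper, so the number of queries stays far below the 26^3 = 17,576 needed to try every 3-letter prefix.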
Based on this, the system can reconstruct the search history for the hijacked user. By using Google’s personalized results service, they can also get more information about the user’s preferences. A little more worrying is this observation:
In fact, a malicious entity could set up a Tor exit node to hijack cookies and reconstruct search histories. The security design underlying the Tor network guarantees that the malicious Tor exit node, although potentially able to access unencrypted traffic, is not able to learn the origin of such traffic. However, it may take the malicious node just one Google SID cookie to reconstruct a user’s search history, the searched locations, the default location, etc., thus significantly increasing the probability of identifying a user.
It’s an interesting paper, and worth a read if you are interested in these issues.
self (the remix)
Last weekend I had a chance to see Mo’olelo Performing Arts Company’s production (they also have a blog) of Robert Farid Karimi’s self (the remix) featuring Karimi and DJ D Double:
Storyteller/performance artist, def poetry jam performer, national poetry slam champion robert farid karimi — supported by an amazing soundscape spun live by Chicago DJ and Violator All-Star DJ D Double — mixes together stories, movement, and music to tell the tale of a first generation child of Iranian and Guatemalan immigrants learning how to survive the cultural imperialism of the United States on his quest to find wholeness in the fractured atmosphere of the 70s and 80s.
It’s a coming-of-age story that seems to have a new relevance given the current tensions between the US and Iran and the heated rhetoric around immigration. I usually enjoy solo performance, and although this is technically a dual performance, the “style” is similar to other narrative solo performances (cf. Josh Kornbluth). What was particularly effective is the way in which DJ D Double weaves the soundtrack and effects into the narrative. It’s rapid-changing and pulls samples, beats, and songs from every direction, providing a structure to support Karimi’s performance while commenting on it and, in effect, becoming its own character. In terms of “solo performance,” it’s some of the best use of sound I’ve seen.
The show only has a few more performances, starting tonight and going through this weekend. If you’re in San Diego and reading this (probably 5 people total), then go check it out!
giving no credit where it is not due
Luca pointed to a paper by Chierichetti, Lattanzi, and Panconesi, which has an amusing comment in the last section (I don’t want to spoil it).
The paper itself is interesting, of course. Conductance often appears in bounds on mixing times for Markov chains, but the rumor spreading problem is a bit different than the consensus problems that I have studied in the past. A nice quote from the introduction:
Our long term goal is to characterize a set of necessary and/or sufficient conditions for rumour spreading to be fast in a given network. In this work, we provide a very general sufficient condition — high conductance. Our main motivation comes from the study of social networks. Loosely stated, we are looking after a theorem of the form “Rumour spreading is fast in social networks”. Our result is a good step in this direction because there are reasons to believe that social networks have high conductance.
Not-so-recent reads
Mixing It Up: Taking on the Media Bullies and Other Reflections (Ishmael Reed) — This is a collection of Reed’s more recent writings, with big pieces on Don Imus, Kobe Bryant, and John McWhorter. He also has a set of interviews with Sonny Rollins and a nice essay on Charles Chesnutt. Reed’s writing is acerbic as ever, but I found the essays a mixed bag. For me, some of the nicest pieces were the shorter ones, but they were all thought-provoking.
Karnak Café (Naguib Mahfouz) — This is a short novella set after the war in which Egypt lost Sinai. The narrator frequents this cafe where three younger university students also hang out. The three disappear one day during a string of arrests and return months later. The narrator begins to learn of their experiences and how their lives were destroyed by the government’s manipulation. It’s a tightly written and compelling story that seems all-too-relevant these days. I highly recommend it.
Jhegaala (Steven Brust) — This is the latest in the Vlad Taltos series, and is mainly about untangling the hidden relationships in a small town of Easterners. If you like the series already (or are addicted) you will read it anyway, and it definitely will not make sense without reading the rest of the series…
Yellow: Race in America Beyond Black and White (Frank H. Wu) — One of the earlier books on Asian Americans and politics that was targeted towards a large readership. Although it feels a little dated now (if that is possible), it still makes some solid points. However, the end of the book was a bit disappointing, with its big love for Deep Springs.
Let’s Get Free: A Hip-Hop Theory of Justice (Paul Butler) — After a riveting first chapter, ex-prosecutor Butler takes us on a tour of how the modern criminal justice system is stacked and requires active resistance from the public. He’s an expert on jury nullification, which I didn’t know about before. However, the book kind of derails in the last few chapters, where the discussion of new technologies rambles rather than making a tight point. It was a quick and interesting read, though.
New Roots in America’s Sacred Ground (Khyati Joshi) — This was a study of mostly middle-class 2nd generation desis and their religious practices. The strong parts of the book came from the interviews, but I wasn’t sure if I agreed with all of its conclusions. Also the focus on professionals versus working-class people makes me feel like the picture was incomplete.
Slumberland (Paul Beatty) — A deeply weird tale of DJ Darky (Schallplattenunterhalter Dunkelmann), who moves to Berlin to find an old jazzman and complete the most perfect beat. I couldn’t put this book down, but I think it appealed to me because of the combination of Germany and jazz.
Faceless Killers (Henning Mankell) — An early Kurt Wallander mystery. People say his mysteries are more violent than the norm, but I found it engrossing albeit depressing.
Maus I and Maus II (Art Spiegelman) — This is a re-read — I had read them apart and now I read them back to back. A must-read.
Writing for Social Scientists (Howard Becker) — I’m not a social scientist, but this book has a lot of useful advice on how to write and edit, which I think would have been useful while writing my thesis but is also good for thinking about research projects in general.
Asterios Polyp (David Mazzucchelli) — It’s a pretty amazingly constructed graphic novel about Ideas about Art, incredibly controlled to the point of sometimes feeling trite, but the way in which style and substance are married on the page makes it a real delight to read.
Black Hole (Charles Burns) — Deeply disturbing and somewhat traumatic and somewhat hopeful. I’ve been wanting to read this since an excerpt was published in McSweeney’s comics issue.
Interracial Intimacies (Randall Kennedy) — The first (and largest) part of the book is a history of black-white race relations in America from the perspective of interracial relationships. Kennedy chooses historical examples carefully to advance the thesis that deep and meaningful romantic relationships existed even during slavery. He then spends the last two chapters of the book railing against any and all race-matching in adoption, including a rather stunningly misguided argument against the Indian Child Welfare Act (ICWA). In all of these arguments, Kennedy blithely dismisses some studies as wrong because they contradict his opinion (invalid), others for having low sample counts (valid), but in all cases argues his own point via anecdote. While detailed research and example cases help bolster his points about history, they fail stunningly to make a rational case about policy, and his un-nuanced view further highlights the poverty of his own evidence. In a sense, Kennedy advances a moral argument (“race matching is bad”) by saying contrary evidence is not representative but never making the case that his own evidence is representative. Maybe lawyers shouldn’t make arguments about ethics. In any case, this book left a bad taste in my mouth, not because I think race-matching is always good, but because the argument was so bad.
A history of the bacon-wrapped hotdog
A fascinating history of the bacon-wrapped hotdog (via Kirk K), a staple of late-night street food which I first encountered in the Mission in SF. They’re not for the faint of heart or high of cholesterol, so I can enjoy them by looking at the pictures.
Netflix Prize II is cancelled
Via John Langford, I learned that the sequel to the Netflix Prize has been cancelled due to privacy concerns. The paper by Narayanan and Shmatikov (also at the Oakland Security and Privacy Conference, 2008) showed that by combining the public information available via IMDB and the Netflix data, certain individuals could be re-identified. Netflix was sued over the privacy problems, and they’ve settled the suits and decided not to release the new dataset (which was to have demographic information).
Demographic information is known to be pretty valuable in re-identification. The most famous example is Latanya Sweeney’s re-identification of the Governor of Massachusetts by linking (free) hospital discharge records and ($20) voter registration records. In the healthcare field, these kinds of disclosures violate HIPAA, but this Netflix case raises an interesting question with regard to privacy promises. When a company assures you that your private information will only be used internally for quality control purposes, what are they actually promising? If they issue summary statistics and give those to third parties, is that privacy preserving?
The answer is no. However, people seem quite loath to worry about these kinds of disclosures unless there is a public (and dangerous) privacy breach. This is why Narayanan and Shmatikov’s paper is important: the way to get the public (and hence policymakers) to take privacy seriously is to demonstrate that existing methods are insufficient.
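The linkage idea behind Sweeney’s example is simple enough to sketch in a few lines of Python. Everything here is an assumption for illustration: the records are invented, the field names are hypothetical, and the choice of ZIP code, birth date, and sex as the shared fields is just one plausible set of quasi-identifiers.

```python
# Fields assumed to appear in both datasets (illustrative choice).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Invented "anonymized" records: no names, but quasi-identifiers remain.
hospital = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "00002", "birth_date": "1960-02-02", "sex": "F", "diagnosis": "condition B"},
    {"zip": "00003", "birth_date": "1970-03-03", "sex": "F", "diagnosis": "condition C"},
]

# Invented public records that carry names alongside the same fields.
voters = [
    {"name": "Voter One", "zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "00002", "birth_date": "1960-02-02", "sex": "F"},
    {"name": "Voter Three", "zip": "00002", "birth_date": "1960-02-02", "sex": "F"},
    {"name": "Voter Four", "zip": "00009", "birth_date": "1980-04-04", "sex": "M"},
]


def key(record):
    """Project a record onto the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, identified):
    """Return (name, diagnosis) pairs where the quasi-identifiers match uniquely."""
    index = {}
    for record in identified:
        index.setdefault(key(record), []).append(record)
    matches = []
    for record in anonymous:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


reidentified = link(hospital, voters)
print(reidentified)
```

In this toy data only one person is uniquely pinned down; the second hospital record matches two voters and the third matches none. The more demographic fields a release contains, the more often the match is unique.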
What’s the point of an X department?
Over at Crooked Timber there’s a discussion on eliminating some majors to save money, particularly if they don’t have many graduates.
The issue made it to Leiter because several of the Philosophy departments in those institutions fall into the low-major category. But is producing Philosophy majors the point of having a Philosophy department? In Our Underachieving Colleges (CT review still on its way: DD to blame if I never get round to it) Derek Bok claims that the standard assumptions within most departments in research universities is that the undergraduate curriculum is for attracting and then teaching majors, and, further, that our attention to the majors should be shaped by the aim of preparing them well for graduate school. This means that the curriculum is designed for a tiny minority of the students who take classes, and even many of them, probably, would be better off doing something other than going to graduate school (that’s me, not Bok, saying the last bit).
Philosophy departments should take heed of Samidh’s observation that philosophers are good entrepreneurs and point out that they may produce the next big alumni donor!
I wonder about the degree to which Bok’s claim is true in mathematics, science, and engineering. I think it’s probably true that the average biology major or electrical engineer is being prepared for work at a company. Even senior electives are useful in this sense, especially if they are project-oriented. However, it’s probably the case that if you major in math and do not plan to go to graduate school, then your senior seminar in commutative algebra is pretty much useless for the work you’ll do later. But is the average math major at a public university being prepared for (some) graduate program? Is math in this sense closer to the humanities programs mentioned above?
In electrical engineering, the point of the major is to go work in a company (or for the government) designing/building stuff, and those specialized classes are geared for that. On average, I think undergraduate programs in engineering in the US don’t emphasize going on to graduate study. An exception is the profit-turning one-year masters programs that have become popular in recent years. But designing a program primarily to prepare people for graduate school, or primarily to prepare them for the workforce, misses the point of college.
The story you hear is that a classical liberal arts education in the US is supposed to teach you to think critically and be an active and thoughtful member of society. So what does that mean for engineers? In a sense, design choices are a form of critical analysis within the context of engineering, but I think that kind of perspective can be construed more broadly. We’re so keen on formulating notions of optimality or engineering tradeoffs that we don’t also consider the societal aspects of the things that we design. It would be nice to get upper-division engineering classes that talk about where technology is headed, where society is headed, and how those interact on a more technical level. This kind of thinking is good preparation for work and for research. I think there are some classes like that out there, but they’re more of an anomaly than the norm, and they’re not really required. But it would be valuable for the students, regardless of where they go.
broadband tidbits
Despite all of the talk about the Comcast/NBC merger, they only filed a supplemental economic report last week, says FCC Chairman Julius Genachowski. The public comment period will begin soon.
The National Broadband Plan will be unveiled next week.
I thought all the talk about broadband accessibility was about rural areas, but the FCC is talking about people with disabilities as well.
My hometown, together with the U of I, won a 22.5 million dollar grant to connect “40 K-12 schools, 17 social service agencies, 14 healthcare facilities, nine youth centers, four public library systems, and two higher education institutions” and bring high speed internet to low-income neighborhoods.