Domingos on what you should know about machine learning

Dhruv Batra forwarded this Communications of the ACM article by Pedro Domingos, entitled “A Few Useful Things to Know about Machine Learning” (a free version is also available). The main point from the abstract is:

However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.

The article focuses on the classification problem to illustrate these “key lessons.” It’s well worth reading, especially for people who don’t work on machine learning, because it explains a number of important issues:

  1. It illustrates the gap between what the theory/research works on and the nitty-gritty of applying these algorithms to real data.
  2. It gives people who want to implement an ML method important fundamental questions to ask before starting: how do I represent my data? How do I evaluate performance? How do I do things efficiently? These have to get squared away first.
  3. Domain knowledge and feature engineering are the keys to success.

Since I’m guessing there are two machine learners who read this blog, go read it (unless you are one of my friends who doesn’t care about all of these technical posts).

Sita tries to send a message to Rama using a digital certificate

Via Erin (via Bruce Schneier’s blog), I found out about S. Parthasarathy’s proposal to replace Alice and Bob with Sita and Rama. I have been known to use Alice and Bob on occasion (unlike some people, I find the anthropomorphizing to be good on balance), but perhaps I should develop some cultural pride and make the switch to “a smarter alternative to these characters.” According to Parthasarathy, there is greater literary relevance to the scenario where Sita wants to send a message to Rama. The dramatis personae in this version are:

  • Sita: kidnapped maiden who wishes to send a message
  • Rama: brave prince who is to receive the message
  • Hanuman: the honest broker who relays the message
  • Ravana: the rogue-in-the-middle who acts as the adversary. To avoid confusing first letters, let’s rename him Badmash.

There are a number of other appealing allusions in this scenario.

I think it’s a fun exercise — can one come up with other settings? Perhaps based on Gilgamesh, or Star Wars. I’m sure at least one reader of this blog could come up with a Battlestar Galactica scenario. Adama to Baltar?

Also, I couldn’t help but point to this chestnut, the real story of Alice and Bob (h/t to my father).

DIMACS Workshop on Differential Privacy

Via Kamalika, I heard about the DIMACS Workshop on Differential Privacy at the end of October:

DIMACS Workshop on Differential Privacy across Computer Science
October 24-26, 2012
(immediately after FOCS 2012)

Call for Abstracts — Short Presentations

The upcoming DIMACS workshop on differential privacy will feature invited talks by experts from a range of areas in computer science as well as short talks (5 to 10 minutes) by participants.

Participants interested in giving a short presentation should send an email to asmith+dimacs@psu.edu containing a proposed talk title, abstract, and the speaker’s name and affiliation. We will try to accommodate as many speakers as possible, but

a) requests received before October 1 will get full consideration
b) priority will be given to junior researchers, so students and postdocs should indicate their status in the email.

More information about the workshop:

The last few years have seen an explosion of results concerning differential privacy across many distinct but overlapping communities in computer science: Theoretical Computer Science, Databases, Programming Languages, Machine Learning, Data Mining, Security, and Cryptography. Each of these different areas has different priorities and techniques, and despite very similar interests, motivations, and choice of problems, it has become difficult to keep track of this large literature across so many different venues. The purpose of this workshop is to bring researchers in differential privacy across all of these communities together under one roof to discuss recent results and synchronize our understanding of the field. The first day of the workshop will include tutorials, representing a broad cross-section of research across fields. The remaining days will be devoted to talks on the exciting recent results in differential privacy across communities, discussion and formation of interesting open problems, and directions for potential inter-community collaborations.

The workshop is being organized by Aaron Roth (blog) and Adam Smith (blog).

Linkage

I’m being lazy about more ISIT blogging because my brain is full. So here are some links as a distraction.

Via John, George Boolos’s talk entitled Gödel’s Second Incompleteness Theorem Explained in Words of One Syllable.

D’Angelo is back!

This short video about a subway stair in New York is great, especially the music.

Crooked Timber is on a tear about workplace coercion and its proponents.

Luca’s thoughts on the Turing Centennial are touching.

Linkage

Via Brandy, Kenji breaks down perfect hard-boiled eggs. See also sauceome.

Bret Victor talks about Inventing on Principle. The first half is a series of demos of some pretty amazing applications of his major driving principle, which is that creators should be able to see what they are creating in real time. He sometimes waxes a little TED-like, but overall it is quite inspiring.

My high school history teacher, Chris Butler, has turned his award-winning lecture notes and flowcharts into an iPad app which is available on the App Store.

Queen, live at Wembley. (via MeFi)

Some pretty cool visualizations of sorting. (via logistic aggression)

Linkage

It’s been a busy week, deadline-wise, but I did see a few cool things on the interwebs which seemed worth sharing:

Tarantulas molting, courtesy of my high school biology teacher and ExploraVision coach extraordinaire, Mr. Stone (his blog is cool too).

Keeping with the nature theme, find the cuttlefish! The octopus video is cool too. Since my commute is a bit longer now, I listen to Science Friday podcasts as well as Story Collider, a pretty cool Moth-meets-science storytelling podcast.

Sometimes papers use pretty strong words in their titles (see for more context). On that note, some letters from John Nash (see also) were recently declassified by the NSA wherein he seems to predict fundamentals of cryptography and computational complexity. In more Rivest news, he coded up the cryptosystem.

In sadder news (also not so recent now), de Bruijn passed away. I’ve started a bioinformatics project recently (maybe more like “started”), and de Bruijn graphs are a pretty useful tool for making sense of data from next-generation sequencing technologies. Here are some animations describing how Illumina and 454 sequencing work.
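For readers who haven’t seen one, here is a minimal sketch of the basic idea behind a de Bruijn graph in assembly, under simplifying assumptions: nodes are (k−1)-mers, each distinct k-mer in the reads becomes an edge from its prefix to its suffix, and an unbranched path through the graph spells out a longer sequence. The function names and toy reads are hypothetical; real assemblers also deal with sequencing errors, reverse complements, coverage, and branching, none of which this handles.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Hypothetical helper: nodes are (k-1)-mers; each distinct k-mer
    found in the reads adds one edge from its prefix to its suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only so the output is deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def spell_unbranched_path(graph, start):
    """Hypothetical helper: follow edges from `start` while each node has
    exactly one successor, appending the last character of each new node.
    Stops at a branch, a dead end, or a node already visited."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        seen.add(node)
        seq += node[-1]
    return seq

# Two toy overlapping "reads" from the sequence ACGTCAG.
reads = ["ACGTC", "GTCAG"]
graph = de_bruijn_graph(reads, k=3)
print(spell_unbranched_path(graph, "AC"))  # ACGTCAG
```

The overlap between the reads never has to be found explicitly: it shows up as shared k-mers (here `GTC`), which collapse into a single edge.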

Maybe when it gets warmer I will put together a worm bin — I miss the curbside composting of Berkeley.

I get a lot of positive comments about this shirt, but Topatoco are discontinuing it. Speaking of potatoes, Lav has a nice post with some links to papers on the importance and history of potatoes.

Call for Papers: ICITS 2012

I am on the PC for this conference, so I figured I would advertise the CFP here for those readers who would be interested.

6th International Conference on Information-Theoretic Security
Montreal, Quebec, Canada
August 15–17, 2012

This is the sixth in a series of conferences that aims to bring together the leading researchers in the areas of information theory, quantum information theory, and cryptography. ICITS covers all aspects of information-theoretic security, from relevant mathematical tools to theoretical modeling to implementation. Papers on all technical aspects of these topics are solicited for submission.

Note that this year there will be two distinct tracks for submission.

Important Dates:

  • Conference Track Submission: Monday, March 12, 2012
  • Conference Track Notification: Friday, May 4, 2012
  • Proceedings version: Tuesday, May 29, 2012
  • Workshop Track Submissions: Monday, April 9, 2012
  • Workshop Track Notification: Monday, May 28, 2012

Note: ICITS (Aug. 15-17, Montreal) is the week before CRYPTO 2012 (Aug. 20–23, Santa Barbara).

Two Tracks: Conference and Workshop

The goal of ICITS is to bring together researchers on all aspects of information-theoretic security. To this end, ICITS 2012 will consist of two types of contributed presentations. The conference track will act as a traditional conference (original papers with published proceedings). The workshop track will operate more like an informal workshop, with papers that have appeared elsewhere or that consist of work in progress.

  1. Conference Track (with proceedings): Submissions to this track must be original papers that have not previously appeared in published form. Accepted papers will be presented at the conference and will also be published in the conference proceedings (which will appear in Springer’s Lecture Notes in Computer Science series). We note that simultaneous submission to journals is acceptable, but simultaneous submission to other conferences with published proceedings is not.
  2. Workshop Track (no proceedings): To encourage presentation of work from a variety of fields (especially those where conference publication is unusual or makes journal publication difficult), the committee also solicits “workshop track” papers. Accepted papers will be presented orally at the conference but will not appear in the proceedings. Submissions to this track that have previously appeared (or are currently submitted elsewhere) are acceptable, as long as they first appeared after January 1, 2011. Papers that describe work in progress are also welcome. We note that the same standards of quality will apply to conference and workshop papers.

Conference Organization:

Program Chair: Adam Smith (Pennsylvania State University)
Program Committee:

  • Anne Broadbent (University of Waterloo)
  • Thomas Holenstein (ETH Zurich)
  • Yuval Ishai (Technion)
  • Sidharth Jaggi (CU Hong Kong)
  • Bhavana Kanukurthi (UCLA)
  • Ashish Khisti (University of Toronto)
  • Yingbin Liang (Syracuse University)
  • Prakash Narayan (University of Maryland)
  • Louis Salvail (Universite de Montreal)
  • Anand Sarwate (TTI Chicago)
  • Christian Schaffner (University of Amsterdam)
  • Adam Smith (Pennsylvania State University)
  • Stephanie Wehner (National University of Singapore)
  • Daniel Wichs (IBM Research)
  • Juerg Wullschleger (Universite de Montreal)
  • Aylin Yener (Pennsylvania State University)

General Chair: Juerg Wullschleger (Universite de Montreal)
Local Co-Chairs: Claude Crepeau (McGill University) and Alain Tapp (Universite de Montreal)

Detailed instructions for authors can be found in the full CFP, available on the website.

Linkage

Via Jay P., a pretty amazing dance video.

Via 530nm330Hz, a very interesting tidbit on the history of the one-time pad. A free tech report version is available too. The one-time pad XORs the bits of a message with an i.i.d. uniformly random bitstring of the same length, and is credited to Gilbert Vernam and Joseph Mauborgne. However, as Steven Bellovin’s paper shows,

In 1882, a California banker named Frank Miller published Telegraphic Code to Insure Privacy and Secrecy in the Transmission of Telegrams. In it, he describes the first one-time pad system, as a superencipherment mechanism for his telegraph code. If used properly, it would have had the same property of absolute security.

Although in theory Miller can claim priority, reality is more complex. As will be explained below, it is quite unlikely that either he or anyone else ever used his system for real messages; in fact, it is unclear if anyone other than he and his friends and family ever knew of its existence. That said, there are some possible links to Mauborgne. It thus remains unclear who should be credited with effectively inventing the one-time pad.

Another fun tidbit: apparently the mother’s maiden name was used for security purposes as far back as 1882!
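To make the XOR description of the one-time pad concrete, here is a minimal sketch in Python. It works on bytes rather than individual bits (XORing bytes is just XORing eight bits at a time), and it assumes Python’s `secrets` module as a stand-in for a truly random source; an actual one-time pad needs genuinely random key material that is as long as the message and never reused. The function names are hypothetical, chosen for illustration.

```python
import secrets

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """Hypothetical helper: XOR each message byte with the matching pad byte."""
    if len(pad) != len(message):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(m ^ k for m, k in zip(message, pad))

# Decryption is the same XOR, because (m ^ k) ^ k == m.
otp_decrypt = otp_encrypt

message = b"SITA TO RAMA"
pad = secrets.token_bytes(len(message))  # fresh random pad, used once
ciphertext = otp_encrypt(message, pad)
assert otp_decrypt(ciphertext, pad) == message

# Reusing the pad breaks security: XORing two ciphertexts cancels the pad
# and leaves the XOR of the two plaintexts.
other = b"BADMASH LIES"  # same length (12 bytes) as `message`
c2 = otp_encrypt(other, pad)
leak = bytes(a ^ b for a, b in zip(ciphertext, c2))
assert leak == bytes(a ^ b for a, b in zip(message, other))
```

The last few lines show why the “one-time” part matters: an eavesdropper who sees two messages sent under the same pad learns their XOR without ever touching the key.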

I really like shiso leaves and their cousins. I had a shiso plant, but it did not survive the California sun (or my black thumb). One of my favorite meals at ISIT 2009 was with Bobak Nazer, when we found an out-of-the-way BBQ joint where they brought us a long box filled with seven varieties of leaves, including perilla leaves. It makes me hungry just writing about it.

Kudos to Adrienne for the amazing photo.

There’s Only One Sun, a short sci-fi film by Wong Kar-Wai.