ICML 2014: thoughts on the format

This is my first time at ICML, and every paper here has a talk and a poster. It’s a lot of work to prepare, but one nice benefit is that because my poster had to be done before I left, the talk was also pretty much done at the same time, modulo minor tweaks. Having to be ready early means fewer last-minute preparations and lower stress at the conference overall. Another plus is that some talks are probably better as posters and some posters are probably better as talks, so the two modes of presentation give some diversity to the delivery. Some people also prefer talks to posters or vice versa, so that’s good for them as well. Finally, the conference has 6 parallel tracks, so knowing that there’s a poster takes some of the stress out of deciding which session to attend: you can always catch the poster if you missed the talk.

The major minus is time. Sessions run from 8:30 to 6 and then posters run from 7 to 11 PM, which is overwhelming! You can easily spend the entire conference at talks and then at posters, resulting in a brain overload. This also leaves less time for chatting and catching up with colleagues over dinner, starting up new research ideas or continuing ongoing projects in person, and the informal communication that happens at conferences. People do make time for that, but the format is less conducive to it, or so it appeared to me. I ended up taking a bit of time off during the sessions to take a walk around the Olympic park and have a chat, and I saw others leaving to do some sightseeing, so perhaps I am adhering to the schedule too much.

It’s interesting how different the modes of conference/social research communication are across research disciplines. I’ve yet to go to ICASSP or ICC, and while I have been to a medical informatics conference once, I haven’t gone to a Big Science conference or the joint meetings for mathematics or statistics. I imagine the whole purpose and format of those is completely different, and it makes me wonder if the particular formats of machine learning conferences are intentional: since there is rarely an extended/journal version of the paper, the conference is the only opportunity for attendees to really buttonhole the author and ask questions about details that are missing from the paper. Perhaps maximizing author exposure is a means to an end.

ICML 2014: Some talks and posters

I was a somewhat inconsistent note-taker here. Because a lot of the talks I attended were sufficiently out-of-area for me that I didn’t get the context for the work, I often found myself jotting a few “look at this later” pointers to myself rather than actual ideas from the talk.

First, the plenaries: Eric Horvitz, Michael Kearns, and Michael Jordan. Horvitz talked about how we’ve made a lot of progress in machine learning but there’s more work to be done in bringing humans back into the loop. Examples include developing semantics for what features mean, how to visualize the results, adding humans into the loop (e.g. active learning or interactive settings), crowdsourcing, and building tools that are sensitive to human cognitive limitations, like detecting and informing people of “surprising events,” which involves knowing what surprising means. He also announced a new data set, COCO for “common objects in context” (not Cocoa Puffs), which has around 300k-400k images and lots of annotations. The goal was to build a library of objects that a 4-year-old can recognize. Can a computer?

I honestly was a little too zonked/jetlagged to understand Michael Kearns’ talk, which was on challenges in algorithmic trading. He was focused on problems that brokers face, rather than the folks who are holding the risk. Michael Jordan gave a variant of the talk he has given at the last few plenaries/big talks of his that I’ve seen: computation, statistics, and big data. The three examples he talked about were local differential privacy, bounds for distributed estimation, and the bag of little bootstraps.

As far as the research talks go, here are a few from the first day:

  • Robust Principal Component Analysis with Complex Noise (Qian Zhao; Deyu Meng; Zongben Xu; Wangmeng Zuo; Lei Zhang): This paper interpreted the Robust PCA problem (given Y = L + E where L is low-rank and E is sparse, recover L) in terms of MAP inference. The solution generally looks like a nuclear-norm plus L_1 regularization, which they claim implies a kind of Laplace-like model for the noise. They build a generative model and then change the distributions around to get different noise models.
  • Discriminative Features via Generalized Eigenvectors (Nikos Karampatziakis; Paul Mineiro): This was on how to learn features that are discriminative in a multiclass setting while still being somewhat efficient. The main idea was to look at correlations in the existing features via the tensor x \otimes x \otimes y where x are the features and y are the labels, and to then find generalized eigenvalues and eigenvectors by looking for vectors v that maximize (for a given pair (i,j)) the ratio \frac{ \mathbb{E}[ (v^{\top} x)^2 | y = i] }{ \mathbb{E}[ (v^{\top} x)^2 | y = j] }. This nonlinearity is important for reasons which I wasn’t entirely sure about.
  • Randomized Nonlinear Component Analysis (David Lopez-Paz; Suvrit Sra; Alex Smola; Zoubin Ghahramani; Bernhard Schoelkopf): I really enjoyed this talk. Basically, the idea is that kernel versions of PCA and CCA have annoyingly large running times. So what they do here is linearize the kernel using sampling and then do some linear component analysis on the resulting features. The key tool is to use Matrix Bernstein inequalities to bound the kernel approximations.
  • Memory and Computation Efficient PCA via Very Sparse Random Projections (Farhad Pourkamali Anaraki; Shannon Hughes): This talk was on efficient approximations to PCA for large data sets, but not in a streaming setting. The idea was, as I recall, that you have big data sets and different sites. Each site takes a very sparse random projection of its data (e.g. via a random signed Bernoulli matrix) and then these get aggregated via an estimator. They show that the estimator is unbiased and the variance depends on the kurtosis of the distribution of elements in the projection matrix. One thing that was interesting to me is that the covariance estimate has a bias term towards the canonical basis, which is one of those facts that makes sense after you hear it.
  • Concept Drift Detection Through Resampling (Maayan Harel; Shie Mannor; Ran El-Yaniv; Koby Crammer): This talk was sort of about change-detection, but not really. The idea is that a learning algorithm sees examples sequentially and wants to tell if there is a significant change in the expected risk of the distribution. The method they propose is a sequential permutation test — the challenge is that a gradual change in risk might be hard to detect, and the number of possible hypotheses to consider grows rather rapidly. I got some more clarification from Harel’s explanation at the poster, but I think this is one where reading the paper will make it clearer.
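
To make the ratio in the generalized eigenvectors item a bit more concrete, here is a minimal NumPy sketch of maximizing \mathbb{E}[ (v^{\top} x)^2 | y = i] / \mathbb{E}[ (v^{\top} x)^2 | y = j] for a single class pair. This is not the authors' algorithm, just the textbook reduction of a generalized eigenproblem to a standard one via a Cholesky factor. The function name, the small ridge term, and the synthetic data are all my own assumptions for illustration.

```python
import numpy as np

def top_generalized_eigvec(X, y, i, j, ridge=1e-6):
    """Return (ratio, v) maximizing E[(v^T x)^2 | y=i] / E[(v^T x)^2 | y=j].

    Hypothetical helper for illustration only; `ridge` is an assumed
    regularizer that keeps the class-j matrix positive definite.
    """
    Xi, Xj = X[y == i], X[y == j]
    d = X.shape[1]
    # Class-conditional second-moment matrices E[x x^T | y = k].
    Ci = Xi.T @ Xi / len(Xi)
    Cj = Xj.T @ Xj / len(Xj) + ridge * np.eye(d)
    # Reduce C_i v = lambda C_j v to a standard symmetric eigenproblem
    # using the Cholesky factorization C_j = L L^T and u = L^T v.
    L = np.linalg.cholesky(Cj)
    Linv = np.linalg.inv(L)
    M = Linv @ Ci @ Linv.T
    evals, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = Linv.T @ evecs[:, -1]         # map the top eigenvector back to v
    return evals[-1], v / np.linalg.norm(v)

# Synthetic check: class 0 has inflated variance along the first coordinate,
# class 1 is isotropic, so the best direction should be close to that axis.
rng = np.random.default_rng(0)
n, d = 2000, 5
X0 = rng.normal(size=(n, d))
X0[:, 0] *= 5.0
X1 = rng.normal(size=(n, d))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

ratio, v = top_generalized_eigvec(X, y, 0, 1)
print("largest ratio:", ratio)
print("direction:", np.round(v, 3))
```

The squared projection (v^{\top} x)^2 is what makes the resulting feature nonlinear in x, even though v itself comes from a linear-algebra computation.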

Noted without notes, but I enjoyed the posters (sometimes I read them since the presenter was not around):

  • An Asynchronous Parallel Stochastic Coordinate Descent Algorithm (Ji Liu; Steve Wright; Christopher Re; Victor Bittorf; Srikrishna Sridhar)
  • Clustering in the Presence of Background Noise (Shai Ben-David; Nika Haghtalab)
  • Demystifying Information-Theoretic Clustering (Greg Ver Steeg; Aram Galstyan; Fei Sha; Simon DeDeo)
  • Consistency of Causal Inference under the Additive Noise Model (Samory Kpotufe; Eleni Sgouritsa; Dominik Janzing; Bernhard Schoelkopf)
  • Concentration in unbounded metric spaces and algorithmic stability (Aryeh Kontorovich)
  • Hard-Margin Active Linear Regression (Zohar Karnin; Elad Hazan)
  • Heavy-tailed regression with a generalized median-of-means (Daniel Hsu; Sivan Sabato)

Greetings from ICML 2014

The famous “bird nest”

Greetings from ICML 2014! I will attempt to blog the conference in between attending sessions, giving my talk and poster, and stressing out about writing my CAREER award. Despite what Google Maps might tell you, my hotel is not across the street from the stadium pictured above — this led to a rather frustrating 30 minutes of walking around asking for directions. I do, however, have a lovely view from my room of the Bank of Communications (交通银行), which seems appropriate, somehow.

I can’t access Facebook or Twitter from China without some crazy paid VPN solution it seems (if you have any tips, feel free to email me), so I don’t know if this post will even make it to those services. It’s probably for the best — social media is too much of a distraction, right?

Line-item cost of one student-year on a grant?

I am in the process of writing some proposals and am encountering the fun task of generating budgets for those proposals. Rutgers, like many cash-strapped schools, imposes a hefty “overhead” charge on federal grants (the so-called indirect costs) amounting to something over 50% of the value of the grant. Since I’m primarily a theory guy, the largest line item on any grant I write is generally a graduate student. With stipend, tuition, fees, and benefits, a calendar-year appointment for a graduate student costs around $90k, factoring in indirect costs. Given that an NSF Small award caps out at $500k, it’s quite difficult to support more than one student on a small grant. This in turn limits the scope of research one can propose: it’s all fine and well to say there are 15 journal papers’ worth of results stemming from your great ideas, but 3-4 student years is probably not enough to make that happen.
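
As a back-of-the-envelope check on these numbers, the sketch below divides an award by the per-student cost. The $500k and $90k figures are the ones quoted above; the share of the budget set aside for everything else (`other_share`) is a made-up parameter for illustration, not a figure from any real budget.

```python
def student_years(award, cost_per_student_year, other_share=0.0):
    """Student-years an award can fund after reserving other_share for non-student costs."""
    if not 0.0 <= other_share < 1.0:
        raise ValueError("other_share must be in [0, 1)")
    return award * (1.0 - other_share) / cost_per_student_year

AWARD = 500_000    # NSF Small cap quoted above
STUDENT = 90_000   # approximate loaded cost of one student-year quoted above

upper_bound = student_years(AWARD, STUDENT)
# 0.3 is a hypothetical share for everything that is not a student.
with_other = student_years(AWARD, STUDENT, other_share=0.3)

print(f"{upper_bound:.1f} student-years if every dollar goes to students")  # 5.6
print(f"{with_other:.1f} student-years with 30% set aside")                 # 3.9
```

Even with nothing else in the budget, the cap works out to fewer than six student-years in total, and reserving roughly a third for other items lands in the 3-4 student-year range mentioned above.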

I know some schools offer a tuition break for RAs/GSRs, but I am not sure how prevalent this practice is. So I put it to the readers of the blog: what is the line-item cost to support a graduate student for one year (without travel etc.) at your institution?

NIPS 2014 Review Quality Control Procedure

I got this email yesterday:

Dear Author of a NIPS 2014 Submission,

You are in for a treat! This year we will carry out an experiment that will give us insight to the fairness and consistency of the NIPS reviewing process. 10% of the papers, selected at random, will be duplicated and handled by independent Area Chairs. In cases where the Area Chairs arrive at different recommendations for accept/reject, the papers will be reassessed and a final recommendation will be determined.

I welcome this investigation — as an author and reviewer, I have found the NIPS review process to be highly variable in terms of the thoroughness of reviews, discussion, and the consistency of scores. I hope that the results of this experiment are made more publicly available — what is the variance of the scores? How do score distributions vary by area chair (a proxy for area)? There are a lot of ways to slice the data, and I would encourage the organizing committee to take the opportunity to engage with the “NIPS community” to investigate the correlation between the numerical measures provided by the review process and the outcomes.

Readings

My first semester on a “real job” was sufficiently busy to prevent me from reading as much as I would have wanted. I also blame the driving commute.

Hidden History of New Jersey [Joseph G. Bilby, James M. Madden, and Harry Ziegler]. This was a gift from Erin, who apparently has not written up her analysis of the Twins’ crushing of the Yankees yet. This is a collection of short essays about New Jersey and some of the quirkier characters who provide local color. There’s a heavy focus on military history, which was less interesting to me, but later chapters delve into the immigration history and politics in Jersey City and elsewhere that provide a useful context and analogy for our current situation. There’s a wealth of references, although as the authors point out, no book on the history of the Klan in New Jersey. Apparently they had some sort of summer resort there.

Husband of a Fanatic [Amitava Kumar]. Kumar looks at Hindu-Muslim and India-Pakistan relations after the Babri Masjid riots through the lens of his own marriage to a Pakistani Muslim woman. A fascinating and harrowing book which did not leave me particularly optimistic about the new Modi government.

The Pun Also Rises [John Pollack]. A present from my brother, this book is a delightful (or if you are No Fun, painful) tour through the history and variety of puns and joking wordplay. Pollack waxes poetic a bit, but this is a fun read.

Carpe Jugulum [Terry Pratchett]. A Discworld novel with the witches and vampires. Brain candy.

Your Republic Is Calling You [Young-Ha Kim]. This is a novel about a North Korean “sleeper agent” in Seoul who thinks he’s been forgotten but after years of no instructions is given a day to rendezvous with a pickup that will take him back to the North. He’s grown comfortable in his new life though, and things are difficult. The back of the book compares Kim’s writing to Murakami’s (hard to tell because it’s all in translation). I initially felt that poisoned my reading of the book, but in the end I think that they are similar in tone/affect. I rather enjoyed this book, and it made me want to investigate more contemporary Korean literature.