How do you attend conferences?

As I’ve gotten farther along in this whole research career, I’ve found it more and more difficult to figure out the optimal way to balance the different things one does at a conference:

  • Going to talks. This is ostensibly the point of the conference. It’s impossible to read all of the papers that are out there and a talk is a fast way to get the gist of a bunch of papers or learn about a new problem in less time than it takes to really read and digest the paper. We’re social creatures so it’s more natural to get information this way.
  • Meeting collaborators to talk about research problems. I have lots of collaborators who are outside TTI and a conference is a good chance to catch up with them face-to-face, actually sit down and hammer out some details of a problem, or work on a new problem with a (potential) new collaborator. Time sitting over a notepad is time not spent in talks, though.
  • Professional networking. I’m on the job market now, and it’s important to at least chat casually with people about your research, what you think is exciting, your future plans, and the like. This is sometimes the “real” point of conferences.
  • Social networking. Sometimes conferences are the only times I get to see my friends from grad school, and in a sense your professional peers are the only people who “get” your crazy obsession with esoteric problem P and still want to get a beer with you.

So the question for the readership: how do you decide the right balance for yourself? Do you go in with a plan to see at least N talks or a certain set S of talks, or are you open to just huddling in the corner with a notepad?

I wrote this post in an attempt to procrastinate about ITA blogging, which I will get to in a bit. I went to far fewer talks than I expected to this year, but I’ll write about ’em later.

PSA: ISIT submission formatting

If you, like me, tend to cart around old ISIT papers and just gut them to put in the new content for this year’s paper, don’t do it this time. Instead, download the template, because the page size has changed from letter to A4.

Also, as a postscript to Sergio’s note that eqnarray is OVER, apparently Stefan recommends we use IEEEeqnarray instead of align.
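
For anyone who hasn’t made the switch yet, here is a minimal sketch of what it looks like; IEEEeqnarray takes an alignment spec like {rCl} (right-aligned left-hand side, centered relation, left-aligned right-hand side). The equation is just filler, and the class options are my guess at the new setup, so check them against the actual template rather than trusting me:

    \documentclass[conference,a4paper]{IEEEtran}  % a4paper per the new template (verify!)
    \begin{document}
    \begin{IEEEeqnarray}{rCl}
      I(X;Y) & = & H(X) - H(X \mid Y) \\
             & = & H(Y) - H(Y \mid X).
    \end{IEEEeqnarray}
    \end{document}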

Scholarly communication in conferences

A few weeks (!) ago I was talking with an anthropologist friend of mine about how different fields have different modes of communicating research “findings” in the conference setting. In some fields people just read their paper out loud, in others they give slide presentations, in yet others they present posters, and I imagine some people give blackboard talks. Of course, conferences have many purposes — schmoozing, job hunting, academic political wrangling, and so on. What is unclear to me is why particular academic communities have settled on particular norms for presenting their work.

One axis along which to understand this might be the degree to which the presentation is an advertisement for the written paper. In many humanities conferences, people simply read their paper out loud. You’d think that theater researchers would be able to give a more… dramatic reading of their work, but you’d be wrong much of the time. It’s very hard to sit and listen to, let alone follow, a jargon-heavy analysis of something you have probably never read about (e.g. turn-of-the-century commercial theater in Prague), so in this setting the “advertisement for the paper” function of the talk seems minimal.

On the other hand, a poster session maximizes the “advertisement for the paper” aspect. People stand there for 5 minutes while you explain the ideas in the paper, and if it seems sufficiently interesting then they will go and read the actual paper. One difference from the humanities model is that here there is a paper in the proceedings, whereas at humanities conferences that is not necessarily the case.

Slide presentations are somewhere in the middle — I often go to a talk at a conference and think “well, now I don’t need to read the paper.” These are the trickiest because the audience is captive but you cannot give them the full story. It’s more of a method for luring already-interested people into wanting to read the paper rather than the browsing model of a poster session.

However, even this “advertisement” categorization raises the question of why we have poster sessions, slide presentations, and paper readings at all. Are these the best ways to present research in those fields? Should we have more posters at ISIT and fewer talks (more like NIPS)? Should NIPS have more parallel sessions to reflect the spread of interests in the “community”? Should anthropology conferences have each panelist give an 8-minute slide presentation followed by real discussion?

I missed ITW in Lausanne this year, but I heard that they mixed up the format to great success. More posters and fewer talks meant more interaction and more discussion. I think more experimenting could be good — maybe some talks should be given as chalk talks with no slides!

NIPS 2012: day two

I took it a bit easy today at the conference and managed to spend some time talking to collaborators about work, so perhaps I wasn’t 100% all-in on the talks and posters. In general I find that for many posters it’s hard to understand what the motivating problem is — it’s not clear from the poster, and it’s not always clear from the explanation. Here are a few papers I thought were interesting:

W. Koolen, D. Adamskiy, M. Warmuth
Putting Bayes to sleep
Some signals look sort of jump-Markov — the distribution of the data changes over time, so that there are segments which have distribution A, then it switches to B, then perhaps back to A, and so on. A prediction procedure which “mixes past posteriors” works well in this setting, but it was not clear why. This paper gives that predictor a Bayesian interpretation as mixing in a “sleeping experts” setting.
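
Mostly for my own benefit, here is a rough sketch of what a “mix past posteriors” style predictor looks like as I understand the general recipe: an exponential-weights (Bayesian) update on each round, followed by mixing a little of the average of past posteriors back in so that experts which did well long ago can “wake up.” The learning rate and mixing weight below are illustrative placeholders, and this is my paraphrase rather than the algorithm from the paper.

    import numpy as np

    def mix_past_posteriors(expert_losses, eta=1.0, alpha=0.01):
        """Toy sketch of a 'mixing past posteriors' predictor (a paraphrase, not the paper's method).

        expert_losses: (T, K) array of per-round losses for K experts.
        Returns the (T, K) array of weights used at each round.
        """
        T, K = expert_losses.shape
        w = np.full(K, 1.0 / K)          # uniform prior over the experts
        past = []                        # history of posteriors
        used = np.zeros((T, K))
        for t in range(T):
            used[t] = w
            v = w * np.exp(-eta * expert_losses[t])   # Bayes / exponential-weights update
            v /= v.sum()
            past.append(v)
            # mix in the average of past posteriors so "sleeping" experts can come back
            w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
        return used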

J. Duchi, M. Jordan, M. Wainwright, A. Wibisono
Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
This paper looked at stochastic gradient descent when function evaluations are cheap but gradient evaluations are expensive. The idea is to compute an unbiased approximation to the gradient by evaluating the function at \theta_t and at \theta_t + \mathrm{noise} and then forming a finite-difference approximation to the gradient. Some of the attendees claimed this is similar to an approach proposed by Nesterov, but the distinction was unclear to me.
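
To make the idea concrete, here is a minimal sketch of the kind of update I have in mind: estimate the gradient from two function values along a random direction and take a step with that estimate. The step size, smoothing parameter, and uniform random direction are placeholders, not the schedules actually analyzed in the paper.

    import numpy as np

    def zero_order_sgd_step(f, theta, step_size=0.1, delta=1e-2):
        """One derivative-free SGD step built from two function evaluations (a sketch)."""
        d = theta.shape[0]
        u = np.random.randn(d)
        u /= np.linalg.norm(u)           # random direction on the unit sphere
        # finite-difference estimate of the gradient along u, rescaled by d so that it is
        # (roughly) unbiased for the gradient of a smoothed version of f
        g_hat = d * (f(theta + delta * u) - f(theta)) / delta * u
        return theta - step_size * g_hat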

J. Lloyd, D. Roy, P. Orbanz, Z. Ghahramani
Random function priors for exchangeable graphs and arrays
This paper looked at Bayesian modeling for structures like undirected graphs, which may represent interactions such as protein-protein interactions. Infinite random graphs whose distributions are invariant under permutations of the vertex set can be associated with a structure called a graphon. Here they put a prior on graphons, namely a Gaussian process prior, and then do inference on real graphs to, for example, estimate the kernel function of the process.
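
To make the generative picture concrete, here is a toy sketch of sampling a finite graph from a graphon obtained by pushing a Gaussian process through a sigmoid. The kernel argument is a hypothetical helper that should return a covariance matrix for the pairs it is given, and the sigmoid link is my illustrative choice, not necessarily the one used in the paper.

    import numpy as np

    def sample_graph_from_gp_graphon(n, kernel, rng=None):
        """Toy sketch: draw latent uniform labels, a random symmetric function (a GP pushed
        through a sigmoid) playing the role of the graphon, and then independent edges."""
        if rng is None:
            rng = np.random.default_rng()
        u = rng.uniform(size=n)                                    # latent vertex labels
        X = np.stack(np.meshgrid(u, u), axis=-1).reshape(-1, 2)    # all pairs (u_i, u_j)
        W = rng.multivariate_normal(np.zeros(n * n), kernel(X)).reshape(n, n)
        W = (W + W.T) / 2                                          # make the graphon symmetric
        P = 1.0 / (1.0 + np.exp(-W))                               # edge probabilities in (0, 1)
        A = np.triu((rng.uniform(size=(n, n)) < P).astype(int), 1)
        return A + A.T                                             # undirected adjacency matrix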

N. Le Roux, M. Schmidt, F. Bach
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
This was a paper marked for oral presentation — the idea is that in gradient descent it is expensive to evaluate gradients if your objective function looks like \sum_{i=1}^{n} f(\theta, x_i), where the x_i are your data points and n is huge, because at each step you have to evaluate n gradients. On the other hand, stochastic gradient descent can be slow because at each iteration it picks a single i and takes a gradient step on f(\theta_t, x_i) alone. Here what they do at step t is pick a random point j and evaluate its gradient, but then take a gradient step using all n points: for i \ne j they reuse the gradient from the last time i was picked. If T_i(t) denotes the last time i was picked at or before time t (so T_j(t) = t), the step direction is \sum_{i = 1}^{n} \nabla f(\theta_{T_i(t)}, x_i). This works surprisingly well.
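
The bookkeeping is simple enough that a short sketch may be clearer than prose; this is just an illustration of the idea described above (store the most recent gradient for each point, refresh one of them per iteration, and step with the average), with a constant step size and uniform sampling as simplifications on my part.

    import numpy as np

    def stochastic_average_gradient(grad_f, x, theta0, step_size, n_iters):
        """Sketch of the 'refresh one gradient, step with all n' idea described above."""
        n, d = len(x), theta0.shape[0]
        theta = theta0.copy()
        stored = np.zeros((n, d))      # last gradient seen for each data point
        g_sum = np.zeros(d)            # running sum of the stored gradients
        for t in range(n_iters):
            j = np.random.randint(n)                # pick one point at random
            g_new = grad_f(theta, x[j])             # refresh only its gradient
            g_sum += g_new - stored[j]
            stored[j] = g_new
            theta = theta - step_size * g_sum / n   # step using all n (mostly stale) gradients
        return theta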

Stephane Mallat
Classification with Deep Invariant Scattering Networks
This was an invited talk — Mallat was trying to explain why deep networks seem to learn so well (it all seems a bit like black magic), but his explanation felt a bit heuristic to me in the end. The first main point was that wavelets are good at capturing geometric structure like translation and rotation, and appear to have favorable properties with respect to “distortions” in the signal. The notion of distortion is a little vague, but the idea is that if two signals (say images) are similar but one is slightly distorted, they should map to representations which are close to each other. The mathematics behind his analysis framework was group-theoretic — he wants to estimate the group of actions which manipulate images. In a sense, this is a control-theory view of the problem (or at least it seemed that way to me). The second point that I understood was that sparsity of representation has a big role to play in building efficient, layered representations. I think I’d have to see the talk again to understand it better; in the end I wasn’t sure I understood why deep networks are good, but I did learn some more interesting things about wavelet representations, which is cool.

NIPS 2012: day one

I am attending NIPS this year for the first time, so I figured it would be good to blog about some of it here. I totally dropped the ball on Allerton, so maybe I’ll make up for it by writing more about the actual talks here. Fortunately or unfortunately, most of the conference is about things I have almost no experience with, so I am facing a bit of an explore/exploit tradeoff in my selection process.

Every day of the conference has a poster session from 7 to midnight — there are 90+ posters in a single room, and people drift in and out, hanging out with friends and looking at posters. My poster (a paper with Kamalika Chaudhuri and Kaushik Sinha on differentially private approximations to PCA) was last night, so I was on the presenting end of things. I gave up at 10:30 because I was getting hoarse and tired, but even then there were a fair number of people milling about. Since I was (mostly) at my poster, I missed out on the other posters.

During the day the conference is a single-track affair with invited and highlighted talks. There are two kinds of highlighted talks — some papers are marked for oral presentation, and some are marked as “spotlights,” which means the authors get to make a 5-minute elevator pitch for their poster in front of the whole conference. Those start today, and I’m looking forward to it.

In the meantime, here is a picture from the hike I took yesterday with Erin:

Mountain Range on a hike near Lake Lily.

DIMACS Workshop on Information-Theoretic Network Security

At DIMACS, I got a notice about a workshop here that is coming up in November, with a registration deadline of November 5: the DIMACS Workshop on Information-Theoretic Network Security, organized by Yingbin Liang and Prakash Narayan. Should be worth checking out — they have a nice slate of talks.

If you do come, though, don’t stay at the Holiday Inn — go for The Heldrich or a Hyatt or something that is within walking distance of restaurants. I think I almost got run over going to Walgreens yesterday in this land of strip malls…

ICML reviewing absurdity

I’m a reviewer for ICML 2013, which has a novel submission format this year. Papers for the first cycle were due October 1. They received more submissions than they expected (by a significant factor), but I was only assigned papers to review today, more than two weeks later. We have been given two weeks to submit reviews — given my stack, that’s two weeks to review ~60 pages of material.

I may be going out on a limb here, but I think that the review quality is not going to be that high this time. Perhaps this is a Mechanical Turk approach to the problem — get a bunch of cheap noisy labels and then hope that you can get a good label by majority vote?

Update: We’ve been given another week, hooray.

Allerton 2012: David Tse’s plenary on sequencing

I’ll follow up with some blogging about the talks at Allerton a bit later — my notes are somewhat scattershot, so I might give a more cursory view than usual. Overall, I thought the conference was more fun than in previous years: the best Allerton to date. For now though, I’ll blog about the plenary.

David Tse gave the “director’s cut” of his recent work (with Motahari, Bresler, Bresler, and Ramchandran) on an information-theoretic model for next-generation sequencing (NGS). In NGS, many copies of a single genome are chopped up into short chunks (say 100 base pairs) called reads. The information theory model is a very simplified abstraction of this process — a read is generated by choosing a location in the genome uniformly at random and producing the 100 bases following that position. In NGS the reads overlap, and each nucleotide of the original genome may appear in many reads. The number of reads in which a base appears is called the coverage.

So there are three parameters: G, the length of the genome; L, the length of a read; and N, the number of reads. The question is how these should depend on each other. David presented a theoretical analysis of the reconstruction question under rather restrictive assumptions (bases are i.i.d., reads are noiseless) and showed that there is a threshold on the number of reads for successful reconstruction with high probability. That is, there is a number C such that N = G/C reads suffice to reconstruct the sequence with high probability, where C depends on L through the ratio L/\log G.
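
Just to get a feel for the numbers, here is some back-of-the-envelope arithmetic with made-up but plausible values (a roughly human-sized genome and 100-base reads); the constant C is a placeholder, since the actual threshold comes out of the analysis rather than anything I remember from the talk.

    import math

    G = 3e9                      # genome length (illustrative, roughly human-sized)
    L = 100                      # read length in bases
    L_bar = L / math.log(G)      # normalized read length L / log G (about 4.6 here)
    C = 10                       # hypothetical value of the threshold constant
    N = G / C                    # number of reads at the threshold N = G / C
    print(f"L / log G = {L_bar:.1f}, N = {N:.2e} reads")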

This model is very simple and is clearly not an accurate model of genome sequencing. David began his talk by drawing a grand analogy to the DMC — his proposition is that this approach will be the information theory of genome sequencing. I have to say that this sounds appealing, but it looks at a rather specific problem that arises in NGS, namely assembly. Still, it is a first step towards building one abstract theory for sequencing, and while the model may be simple, the results are non-trivial. David also presented some evidence that real DNA sequences have features (long repeats) which cause problems for greedy assemblers but can be handled by more complex assemblers based on de Bruijn graphs; these can also handle noise in the form of i.i.d. erasures. What this analysis seems to do is point to features of the data that are problematic for assembly from an information-theoretic standpoint — it is an analysis of the technological process of NGS rather than something that says much about the biology.

I’ve been working for the last year (more off than on, alas) on how to understand certain types of NGS data from a statistical viewpoint. I’ll probably write more about that later when I get some actual understanding. But a central lesson I’ve taken from this is that the situation is quite a bit different than it was when Shannon made a theory of communication that abstracted existing communication systems. We don’t have nearly as good an understanding of NGS data from an engineering standpoint, and the questions we want to answer from this data are also unclear. Assembly is one thing, but if nothing else, this theoretical analysis shows that the kind of data we have is often insufficient for “real” assembly. This is consistent with practice: many assemblers produce large chunks of DNA, called contigs, rather than the organism’s full genome. There are many interesting statistical questions to explore in this data — what can we answer from the data without assembling whole genomes?

Allerton 2012: Karl J. Åström’s Jubilee Lecture

It’s the fall again, and this year is the 50th anniversary of the Allerton Conference. Tonight was a special Golden Jubilee lecture by Karl Johan Åström of Lund University. He gave an engaging view of the pre-history, history, present, and future of control systems. Control is a “hidden technology,” he said — it’s everywhere and is what makes all the technology we use work, but it remains largely unknown and unnoticed except during catastrophic failures. He exhorted the young’uns to do a better job of letting people know how important control systems are in everyday life.

The main message of Åström’s talk was that control theory and control practice need to get back together so that we can develop new control theories for emerging areas, including biology and physics. He called this the “holistic” view and pointed out that it really emerged out of the war effort during WWII, when control systems had to be developed for all sorts of military tasks. This got the mathematicians in the same room as the “real” engineers and led to a lot of new theory. I had always known the war was a big driver, but I hadn’t thought of how control really was the glue that tied things together.