pybliographer

Today I discovered pybliographer, a decent (if not perfect) BibTeX management tool. Surprisingly, Ubuntu had a package for it already (as do Fedora and Mandrake, I think), so it was a breeze (-y badger?) to install. I think I might just start maintaining a huge single BibTeX file and then pull out paper-appropriate subsets as I need them. I’m hoping that they add folders or something to later versions.

When I get my schmancy new Mac laptop I’ll use BibDesk, which looks even better.

research talks as theater

I’ve gone to a number of job talks lately, since the department is interviewing people, which brought up something I’ve been mulling over since I started grad school. I’ve gone to a huge number of engineering talks aimed at “a general audience.” There are levels of generality, from “control theorists not studying distributed control of hybrid dynamical systems” to “people who look at systems engineering” to “electrical engineers” to “engineers” to “technical people” to “layperson.” A shockingly large number of people I’ve seen talk fail to grasp the fineness of these gradations.

One problem is the “intelligence of the group” issue. In order to pitch the talk to a wider audience, the speaker will dumb down a portion of their research or will abstract away some details. In the former case, they obscure their own contribution by making the problem seem easy. If they then go and introduce some complicated algorithm to solve the problem, the audience may wonder why they went to all that effort. In the latter case, they often make their problem seem very similar to another problem that is very simple. When pressed on the point they may fumble, because it’s easier to abstract away than to put the details back in afterwards. Both simplification and abstraction are important for making the material accessible, but I’ve seen many talks founder by underestimating the audience’s ability to follow the argument and find the inconsistencies.

Oftentimes speakers misfocus their attention. I’ve seen this happen in several ways. Sometimes they wrote the talk for a one-hour seminar to a more general audience and then tried to give it to a narrower group in less time. Other times they have just one set of slides for all versions of the talk and get bogged down in the beginning. These problems can be fixed by just making a fresh set of slides for every talk. Slightly less frequently, they feel the problem needs five motivating examples in order to get people interested, and they spend all their time explaining the examples. This also happens when work is interdisciplinary. For example, do not give a talk to machine learning people by emphasizing all the points that are more of interest to cancer biologists.

The last and most egregious problem, I think, is that speakers do not have an objective for their talk. Maybe it’s the actor in me, but giving a talk is like doing a monologue, and you can’t just get up on stage and read the text of the monologue without pointing every line and without giving the whole thing an overarching objective. The objective of a job talk should be self-evident (although not to everyone, it seems). Conference talks need objectives too. Most importantly, if you have a poster you’d better have an objective, or people will leave while you’re talking, a truly disheartening experience, as I well know.

I’m not saying that giving a talk is easy — it is a piece of theater, and like all pieces of theater it can be amazing, terrible, or “not quite work.” But thinking about all these talks really reminds me that these aren’t things you can just “phone in,” especially if they are about your research. And some people just don’t think about that enough before getting up there.

am I a peer?

I’ve actually been invited to review a paper, which is a first for me. I guess all that stuff on peer review I’ve been reading during my procrastinating moments will actually come into use now. They’ve asked me to turn in my review in two months, which is good for me, since I have too many looming deadlines. I guess this is why publishing takes so long — reviewers take two months, then come editorial checking and a decision, revisions, another round of reviewing, more revisions, and final submission. It’s no wonder that in computer science they put all their energy into conferences — it keeps the pace of research higher.

something to not do while working on your thesis

Do not open two copies of the same chapter of your thesis, make two hours’ worth of revisions in one, and then save the old version over the changes before closing the file. Furthermore, do not fail to notice this problem until 6 hours after said changes were made. It will make you think that you have actually gone crazy and that the beautiful new draft with eloquent rephrasings was entirely in your head.

NYU Strike

From PhD Comics and MetaFilter I read about the NYU grad student strike that has apparently started getting ugly.

There are several divisive strategies that the University (read: the Man, the Bosses, what-have-you) can use to break the TA union’s efforts to get a contract negotiated. The first is to convince the undergrads to blame the TAs. In a normal strike, many customers will not cross the picket line because they can go to another business. This doesn’t fly in the university setting — students (or their parents) are paying big bucks, and the university can punish undergrads by telling them that their coursework won’t count for this semester if the strike continues. The undergrads will turn around and blame the TAs, lending support to the university’s position.

A second problem is institutional. Graduate students in the sciences and engineering are most often supported by research assistantships for the bulk of their time. TA-ing is considered an obligation for graduation or a means of support when grant money is thin or you are working on your dissertation. Graduate students in those fields are sometimes ambivalent about the union, because “it doesn’t really affect them.” The GSI (read: TA) union at Berkeley is pretty strong, despite the off-putting sloganeering and obligatory “in solidarity” at the end of every email. I’m very pro-labor, but they use rhetoric that was in vogue back when the Wobblies were news. I know a lot of people who did not join the union simply because they didn’t see the point. The workload for TA-ing varies widely across departments — richer ones hire separate graders, for example. By encouraging this heterogeneity, the university reduces students’ inclination to authorize or participate in a strike.

Finally, the university will invariably give a misleading characterization of the benefits offered to graduate students. Since their whole position is that TAs are students first and employees last, they lump in tuition, fees, and all other benefits as the total compensation given to graduate students. The truth of the situation is that without paying TAs they wouldn’t get the tuition money anyway, and they’d have to hire adjunct faculty, who are on average more expensive. But by just trotting out the figures they can make it seem like grad students are handsomely compensated for their time. Again, these figures are for the consumption of undergrads and parents — my tuition is much less since I’m in-state at a public school, and wouldn’t look quite as impressive.

Anyway, more power to the union, and I hope the university comes around.

a paper a day : 1

A new feature! Just to keep myself motivated on research and to dissuade people from reading the blog, I am trying to “read” one research paper a day (-ish) to get the new ideas running around my head. And you guessed it, I’m going to blog the interesting (to me) ideas here.

Denoising and Filtering Under the Probability of Excess Loss Criterion (PDF)
Stephanie Pereira and Tsachy Weissman
Proc. 43rd Allerton Conf. Communication, Control, and Computing (2005)

This paper looks at the discrete denoising problem, which is related to filtering, estimation, and lossy source coding. Very briefly, the idea is that you have an i.i.d. sequence of pairs of discrete random variables taking values in finite alphabets:

(X_1, Z_1), (X_2, Z_2), …, (X_n, Z_n)

where X is the “clean” source and Z is the “noisy” observation, so that the joint distribution is p(x,z) = p(x) p(z | x), where p(z | x) is some discrete memoryless channel. A denoiser is a set of mappings

g_i : Z^n → X̂   for i = 1, 2, …, n

so that g_i(z^n) is the “estimate” of X_i. One can impose many different constraints on these functions g_i. For example, they may be forced to operate only causally on the Z sequence, or may use only a certain subset of the Z‘s or only the symbol Z_i. This last case is called a symbol-by-symbol denoiser. The goal is to minimize the time-average of some loss function ρ:

L_n = (1/n) ∑_{i=1}^{n} ρ(X_i, g_i(Z^n))

This minimization is usually done on the expectation E[L_n], but this paper chooses to look at the probability of exceeding a certain value, P(L_n > D).

The major insight I got from this paper was that you can treat the terms of the loss function,

h_k = ρ(X_k, g_k(Z^n)) for k = 1, 2, …, n,

as outputs of a source with time-varying (arbitrarily varying) statistics. Conditioned on Z^n, each h_k is independent with a distribution in a finite set of possible distributions. Then to bound the probability P(L_n > D), they prove a large deviations result on L_n, which is the time-average of the arbitrarily varying source.
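To get a feel for why such a bound should hold, here is a toy numerical sketch (my own illustration, not code from the paper, and with the simplifying assumption that the per-symbol losses are i.i.d. Bernoulli(p) rather than arbitrarily varying). In that special case the Chernoff bound gives P(L_n > D) ≤ exp(−n D(D‖p)) for D > p, so the excess-loss probability decays exponentially in the blocklength:

```python
import math
import random

def kl_bernoulli(a, b):
    """KL divergence D(Bern(a) || Bern(b)) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def chernoff_bound(n, p, D):
    """Chernoff upper bound on P(L_n > D) for i.i.d. Bernoulli(p)
    per-symbol losses, assuming D > p."""
    return math.exp(-n * kl_bernoulli(D, p))

def empirical_excess(n, p, D, trials=5000, seed=0):
    """Monte Carlo estimate of P(L_n > D)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        loss = sum(rng.random() < p for _ in range(n)) / n
        count += (loss > D)
    return count / trials

# Excess-loss probability vs. blocklength: empirical estimate and bound
p, D = 0.1, 0.2
for n in (25, 100, 300):
    print(n, empirical_excess(n, p, D), chernoff_bound(n, p, D))
```

The arbitrarily varying case the paper actually treats is harder precisely because the h_k need not be identically distributed, so the rate function is presumably more complicated than a single KL divergence.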

Some of the other results in this paper are:

  • For a Hamming loss function the optimal denoiser is symbol-by-symbol.
  • Among symbol-by-symbol denoisers, time-invariant ones are optimal.
  • A large deviations principle (LDP) for block denoisers and some analysis of the rate.

Most of the meat of the proofs is in a preprint which seems to still be in flux.
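The first two bullets are easy to poke at numerically. Here is a minimal sketch (my own toy example, not the paper’s code): for a binary source observed through a binary symmetric channel, the per-symbol MAP rule is a symbol-by-symbol, time-invariant denoiser, and its time-averaged Hamming loss concentrates around the channel’s crossover probability.

```python
import random

def map_symbol_rule(prior, channel):
    """Per-symbol MAP denoiser: for each noisy symbol z, pick the x
    maximizing prior[x] * channel[x][z]."""
    zs = {z for x in channel for z in channel[x]}
    return {z: max(prior, key=lambda x: prior[x] * channel[x][z]) for z in zs}

def average_loss(n, prior, channel, seed=1):
    """Draw X^n i.i.d. from the prior, corrupt it with the memoryless
    channel, denoise symbol by symbol, and return the Hamming loss L_n."""
    rng = random.Random(seed)
    rule = map_symbol_rule(prior, channel)
    errors = 0
    for _ in range(n):
        x = rng.choices(list(prior), weights=list(prior.values()))[0]
        z = rng.choices(list(channel[x]), weights=list(channel[x].values()))[0]
        errors += (rule[z] != x)
    return errors / n

# Uniform binary source through a BSC with crossover probability 0.1
prior = {0: 0.5, 1: 0.5}
channel = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
print(map_symbol_rule(prior, channel), average_loss(20000, prior, channel))
```

For a uniform prior and crossover below 1/2 the MAP rule is just “say what you see,” so L_n hovers near 0.1 here; skew the prior enough and the rule collapses to a constant.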

rewarding peer review

I’m not hot-shot enough to be asked to review papers yet, but I’ve looked over a few for others who wanted a second take on things, and it seems that the backlog of reviews, especially for conferences, is enormous. Here’s a set of recent (and not so recent) comments on the peer review process:

Larry Wasserman talks (from experience) about the problems of hostile reviewers and nasty politics.

Cosma Shalizi says there are many, many more reasons to reject a paper than Wasserman does, but that peer review should be reader-centric in focus.

David Feldman thinks that journals should give out free socks or something to reviewers so that there is at least some token appreciation of all the work they put into it.

Martin Grossglauser and Jennifer Rexford have another good take on the system.

Fundamentally it seems there are two problems to solve — reviewers have no incentive to review papers quickly, and the objective of the reviewing process is rarely articulated clearly. Socks and pools both seem like good steps in that direction. It seems to be one of those situations where trying small fixes now would be much better than trying to institute some huge shift in editorial processes across many journals all at once.

elsevier all over again

As if I didn’t need another reason to hate academic publishing giant Elsevier (other than their horrid price-gouging, that is), apparently they are also complicit in arms dealing. My first reaction was “WTF?” but it’s true.

Imagine you are an academic who works their ass off on some research and submits it gratis for publication in an Elsevier journal. Elsevier turns around and puts an absurd markup on the journal, bundles it with a bunch of other journals that nobody really wants to read, and offers it to your school’s library. The library is faced with having a lousy periodical collection or a lousy book collection and ends up cutting back acquisitions. Meanwhile, Elsevier takes the profits and uses them to run an arms fair so that repressive regimes can buy cluster bombs to kill babies. How might that make you feel?