notes on a review

I received the following TPC review recently (a rejection):

The current version of the paper is incomplete, as important proofs (the novel results…) are left to supplementary material. This can be resolved, however would require a major structural change.

I think I would have preferred the TPC to simply say “we had too many papers, and yours wasn’t in the top X%,” rather than append this completely nonsensical reason for rejection. We put the proofs in the supplementary material because of space constraints. We could just as easily have omitted other things and put the proofs in the main body by doing some minor cutting and pasting. It may be a “major structural change,” but it’s also a trivial one. Perhaps they thought the paper was poorly written, but they did not say that.

Of course I’m disappointed that the paper wasn’t accepted, especially given that all the reviewers recommended acceptance. It’s clear that the real reason the TPC rejected us was that the scores were not high enough and they had to reject a lot of papers. It sucks to be on the bad side of a subjective decision, but it happens to everyone. Making up a pseudo-objective reason is about as useful as a little white lie. As it is, this description is about as principled as “your paper has too many authors,” or “your bibliography is too long” or “we cannot accept any more papers starting with the letter D.” There’s always the next deadline, anyway.

McGill’s policy on “harmful consequences”

McGill University is contemplating ending “a requirement that any professor receiving research support from the military indicate whether the research could have ‘direct harmful consequences.'” The proponents of striking the measure say that all research should be scrutinized for harmful consequences, whereas the opponents say that it opens the gates for the US defense industry to shift the Canadian (Canadien?) research agenda.

I’m surprised they even had such a provision in the first place, given the existing injunctions against secret/classified research.

This reminds me of a discussion at dinner last night, where my friend told us about a book by UCSD professor Chandra Mukerji called A Fragile Power: Scientists and the State, which talks a bit about how science is dependent on the state (and military funding) and how the state views scientists as a kind of “reserve force” of experts whose knowledge may become crucial later.

ITA Workshop Aftermath

The 2010 ITA Workshop is over, and now that I have almost caught up on sleep, I can give a short report. I think this year was the largest workshop so far, with over 300 speakers.

One of the additions this year was a poster session to give those students who didn’t get a chance to speak at the “Graduation Day” talks on Wednesday an opportunity to present their research. The posters went up on Monday and many stayed up most of the week. I am usually dubious of poster sessions for more theoretical topics; in my experience it is hard to convey anything useful in a poster, and the sessions tend to be poorly attended. However, this poster session seemed to be a rousing success, and I hope they continue to do it in future years.

The theme this year was “networks,” broadly construed, and although I didn’t end up going to a lot of the “pure” networking sessions, the variety of topics was nice to see, from learning in networks to network models. Jon Kleinberg gave a great plenary lecture on cascading phenomena. I particularly enjoyed Mor Harchol-Balter’s tutorial on job scheduling in server farms. I learned to re-adjust my intuition for what good scheduling policies might be. The tutorials should be online sometime and I’ll post links when that happens.

The “Senseless Decompression” session organized by Rudi Urbanke and Ubli Mitra should also be online sometime. Apparently 20+ schools contributed short videos on the theme of \frac{1}{2} \log(1 + \mathrm{SNR}). Maybe we can make a YouTube channel for them.

Perhaps later I’ll touch on specific talks that I went to, but this time around I didn’t take too many notes, probably because I was a little exhausted from the organizing. I may post a more technical thing on the talk I gave about my recent work on privacy-preserving machine learning, but that will have to wait a bit in the queue. Mor’s tutorial suggests I should use Shortest Remaining Processing Time (SRPT) to make sure things don’t wait too long, and I have some low-hanging tasks that I can dispatch first.

Tips for writing

As a postdoc at a school with a gigantic biosciences program and surrounded by other biomedical research institutes (Scripps, Burnham, etc.), a lot of the professional development workshops offered here are not specifically helpful to me. For example, I went to a workshop on writing grants, but it was almost entirely focused on NIH grants; the speaker said he had never applied to the NSF for a grant. Still, I did pick up general tips and strategies about the process of writing a grant. In the same vein, I read an article in The Scientist (registration required) about improving scientific writing which offered ideas applicable to technical writing in general. One that stuck out for me was:

Write daily for 15 to 30 minutes
During your daily writing sessions, don’t think about your final manuscript. Just write journal entries, says Tara Gray, director of the teaching academy that provides training and support to New Mexico State University professors. “People think there’s two phases of a research project—doing the research and writing it up,” she says. Rather than setting aside large chunks of time for each activity, combine them to improve your writing and your research. The first time Gray encouraged a group of faculty members at New Mexico State to adhere to this schedule for three months, they wrote about twice as much as their normal output.

I think I’ll try doing this. I often complain that I live an “interrupt-driven” lifestyle, but sometimes flailing on some very involved epsilonics at the last minute to get something to work results in errors, tension, and woe.

In case you are in Austin…

I’m giving a talk on Friday, so come on down! This has been your daily self-promotion effort, thank you for reading.

Consensus in context : leveraging the network to accelerate distributed consensus

October 30, 2009 ENS 637
11:00 am

Gossip algorithms are a class of decentralized solutions to the problem of achieving consensus in a network of agents. They have attracted recent research interest because they are simple and robust — attractive qualities for wireless ad-hoc and sensor networks. Unfortunately, the standard gossip protocol converges very slowly for many popular network models. I will discuss three ways to leverage properties of the network to achieve faster convergence : routing, broadcast, and mobility.

Joint work with Alex G. Dimakis, Tuncer Can Aysal, Mehmet Ercan Yildiz, Martin Wainwright, and Anna Scaglione.
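For readers who haven’t seen gossip algorithms before, below is a minimal Python sketch (mine, not material from the talk) of the standard pairwise randomized gossip baseline the abstract refers to: at each step a random edge wakes up and its two endpoints replace their values with their average. The function names and the ring example are made up for illustration.

import random

def gossip_average(values, edges, num_rounds=10000):
    """values: dict node -> initial measurement; edges: list of (u, v) pairs."""
    x = dict(values)
    for _ in range(num_rounds):
        u, v = random.choice(edges)   # a random edge "wakes up"
        avg = (x[u] + x[v]) / 2.0     # its two endpoints average their values
        x[u] = x[v] = avg
    return x

if __name__ == "__main__":
    # ring of 10 nodes with random initial measurements (a hypothetical example)
    n = 10
    values = {i: random.random() for i in range(n)}
    edges = [(i, (i + 1) % n) for i in range(n)]
    true_mean = sum(values.values()) / n
    result = gossip_average(values, edges)
    print("true mean:", true_mean)
    print("max deviation after gossip:", max(abs(v - true_mean) for v in result.values()))

On a ring like this the averaging is slow, which is exactly the kind of behavior the talk is about speeding up by exploiting routing, broadcast, or mobility.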

Criticism of open access is backwards, as usual

Inside Higher Ed has a piece today on the presidents of liberal arts colleges writing to support the Federal Research Access Act of 2009. The law would make federal agencies that sponsor research come up with methods for archiving and publishing research that they fund so it would be “made immediately available” to the public. It would (essentially) apply only to journal papers, which raises a question about computer science, which lives the fast-and-dangerous conference life.

The article ends with a reaction from “Martin Frank, executive director of the American Physiological Society and coordinator of the Washington D.C. Principles for Free Access to Science,” who claimed that since there are many foreign journal subscribers, the argument that taxpayers should have access to the research is not very strong. Frank is concerned with non-profit publishers (such as professional societies like the IEEE), but in his eagerness to protect his own turf he completely ignores the fact that mega-publishers like Elsevier and the Nature Publishing Group are based in other countries. Elsevier is headquartered in Amsterdam and NPG is run by Macmillan, which “is itself owned by German-based, family run company Verlagsgruppe Georg von Holtzbrinck GmbH.”

If Mr. Frank wants to make a nativist argument against an open access mandate, then perhaps he should support a ban on wasting American taxpayer dollars to fund foreign publishing houses. The whole “taxpayer” argument in the end is marketing for both sides — although in principle any citizen should have access to government-funded research, the real volume comes from universities and industry. Federal money is used many times over for the same piece of research — once to fund it and then once for every (public) university library which has to buy a subscription to the journal where the result was published. University libraries will not stop subscribing to the IEEE journals just because NSF- and DARPA-funded research will be made available in (probably separate) repositories run by the NSF and DARPA. If a non-profit is publishing its journals at cost, then those journals should still be affordable. The for-profit publishers are the ones who will have to realize that the “value added” by the Nature brand is not worth the markup they charge.

Samidh Chakrabarti on Transacting Philosophy

I recently re-read my old roommate Samidh Chakrabarti’s master’s thesis : Transacting Philosophy : A History of Peer Review in Scientific Journals (Oxford, 2004). It’s a fascinating history of scientific publishing from the Royal Society up to the present, and shows that “peer review has never been inseparable from the scientific method.” His analysis is summed up in the following cartoon, which shows three distinct phases of peer review:
[cartoon: Samidh’s model of the three phases of peer review]
When there are few journals but a large supply of papers, peer review is necessary to select the papers to be published. However, when printing became cheap in the 19th century, everybody and their uncle had a journal and sometimes had to solicit papers to fill their pages. After WWII the trend reversed again, so now peer review is “in.” In this longish post I’m going to summarize/highlight a few things I learned.

The first scientific journal was started by the Royal Society, called Philosophical Transactions: giving some Account of the Present Undertakings, Studies and Labours of the Ingenious in many considerable Parts of the World, but it is usually shortened to Phil. Trans. Henry Oldenburg, the secretary of the Society, came up with the idea of using referees. Samidh’s claim is that Oldenburg was motivated by intellectual property claims. Time stamps for submitted documents would let philosophers establish when they made a discovery — Oldenburg essentially made Phil. Trans. the arbiter of priority. However, peer review was necessary to provide quality guarantees, since the Royal Society was putting their name on it. He furthermore singled out articles which were not reviewed by adding the following disclaimer:

sit penes authorem fides [let the author take responsibility for it]: We only set it downe, as it was related to us, without putting any great weight upon it.

Phil. Trans. was quite popular but not profitable. The Society ended up taking over the full responsibility (including fiscal) of the journal, and decided that peer review would not be about endorsing the papers or guaranteeing correctness:

And the grounds of their choice are, and will continue to be, the importance or singularity of the subjects, or the advantageous manner of treating them; without pretending to answer for the certainty of the facts, or propriety of the reasonings, contained in the several papers so published, which must still rest on the credit or judgment of their respective authors.

In the 19th century all this changed. Peer review began to smack of anti-democracy (compare this to the intelligent design crowd now), and doctors of medicine were upset ever since Edward Jenner’s development of the vaccine for smallpox in 1796 was rejected by the Royal Society for having too small a sample size. Peer review made it tough for younger scientists to be heard, and politics played no small role in papers getting rejected. Those journals which still practiced peer review sometimes paid a hefty price. Samidh writes of Einstein:

In 1937 (a time when he was already a celebrity), he submitted an article to Physical Review, one of the most prestigious physics journals. The referees sent Einstein a letter requesting a few revisions before they would publish his article. Einstein was so enraged by the reviews that he fired off a letter to the editor of Physical Review in which he strongly criticized the editor for having shown his paper to other researchers… he retaliated by never publishing in Physical Review again, save a note of protest.

The 19th century also saw the rise of cheap printing and the industrial revolution which created a larger middle class that was literate and interested in science. A lot hadn’t been discovered yet, and an amateur scientist could still make interesting discoveries with their home microscope. There was a dramatic increase in magazines, journals, gazettes, and other publications, each with their own editor, and each with a burning need to fill their pages.

The content of these new scientific journals became a reflection of the moods and ideas of their editors. Even the modern behemoths, Science and Nature, used virtually no peer review. James McKeen Cattell, the editor of Science from 1895 to 1944, got most of his content from personal solicitations. The editor of Nature would just ask people around the office or his friends at the club. Indeed, the Watson-Crick paper on the structure of DNA was not reviewed because the editor said “its correctness is self-evident.”

As the 20th century dawned, science became more specialized and discoveries became more rapid, so that editors could not themselves curate the contents of their journals. As the curve shows, the number of papers written started to exceed the demand of the journals. In order to maintain their competitive edge and get the “best” papers, peer review became necessary again.

Another important factor was the rise of Nazi Germany and the corresponding decline of German science as Jewish and other scientists fled. Elsevier hired these exiles to start a number of new journals with translations into English, and became a serious player in the scientific publishing business. And it was a business — Elsevier could publish more “risky” research because it had other revenue streams, and so it could publish a larger volume of research than other publishers. This was good and bad for science as a whole — journals were published more regularly, but the content was mixed. After the war, investment in science and technology research increased; since the commercial publishers were more established, they had an edge.

How could the quality of a journal be measured?

Eugene Garfield came up with a method of providing exactly this kind of information starting in 1955, though it wasn’t his original intent. Garfield was intrigued by the problem of how to trace the lineage of scientific ideas. He wanted to know how the ideas presented in an article percolated down through other papers and led to the development of new ideas. Garfield drew his inspiration from law indexes. These volumes listed a host of court decisions. Under each decision, they listed all subsequent decisions that used it as a precedent. Garfield realized that he could do the same thing with scientific papers using bibliographical citations. He conceived of creating an index that not only listed published scientific articles, but also listed all subsequent articles that cited each article in question. Garfield founded the Institute for Scientific Information (ISI) to make his vision a reality. By 1963, ISI had published the first incarnation of Garfield’s index, which it called the Science Citation Index.

And hence the impact factor was born — a ratio of citations to citable articles. This proved to be helpful to librarians as well as tenure and promotion committees, who could just look at the aggregate impact of a professor’s research. Everything became about the impact factor, and the way to improve the impact factor of a journal was to improve the quality (or at least perceived quality) of its peer review. And fortunately for the publishers, most of this reviewing was (and is) done for free — “unpaid editorial review is the only thing keeping the journal industry solvent.” However, as Samidh puts it succinctly in his thesis:

All of this sets aside the issue of whether the referee system in fact provides the best possible quality control. But this merely underscores the fact that in the historical record, the question of peer review’s efficacy has always been largely disconnected from its institutionalization. To summarize the record, peer review became institutionalized largely because it helped commercial publishers inexpensively sustain high impact factors and maintain exalted positions in the hierarchy of journals. Without this hierarchy, profits would vanish. And without this hierarchy, the entire system of academic promotion in universities would be called into question. Hence, every scientist’s livelihood depends on peer review and it has become fundamental to the professional organization of science. As science is an institution chiefly concerned with illuminating the truth, it’s small wonder, then, that editorial peer review has become confused with truth validation.

It all seems like a vicious cycle — is there any way out? Samidh claims that we’re moving to a “publish, then filter” approach where things are posted to the arXiv and then reviewed. He’s optimistic about “a system where truth is debated, not assumed, and where publication is for the love of knowledge, not prestige.” I’m a little more dubious, to be honest. But it’s a fascinating history, and some historical perspective may yield clues about how to design a system with the right incentives for the future of scientific publishing.

Visit to University of Washington

After ISIT I went to visit the Electrical Engineering Department at the University of Washington. I was invited up there by Maya Gupta, who told me about a little company she started called Artifact Puzzles.

On the research end of things, I learned a lot about the learning problems her group is working on and their applications to color reproduction. I also got a chance to chat with Maryam Fazel about rank minimization problems, Marina Meilă about machine learning and distance models for rankings (e.g. the Fligner-Verducci model), and David Thorsley about self-assembling systems and consensus problems. All in all I learned a lot!

On the social side I got to hang out with friends in Seattle and at UW and hiked for an afternoon at Mt. Rainier. Photos are on Picasa!

ISIT 2009 : talks part four

The Gelfand-Pinsker Channel: Strong Converse and Upper Bound for the Reliability Function
Himanshu Tyagi, Prakash Narayan
Strong Converse for Gel’fand-Pinsker Channel
Pierre Moulin

Both of these papers proved the strong converse for the Gel’fand-Pinsker channel, i.e. the discrete memoryless channel with an i.i.d. state sequence drawn from P_S, where the realized state sequence is known ahead of time at the encoder. The first paper proved a technical lemma about the image size of “good codeword sets” which are jointly typical conditioned on a large subset of the typical set of S^n sequences. That is, given a code and a set of almost \exp(n H(P_S)) typical sequences in S^n for which the average probability of error is small, they get a bound on the rate of the code. They then derive bounds on error exponents for the channel. The second paper has a significantly more involved argument, but one which can be extended to multiaccess channels with states known to the encoder.
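For context (this is textbook material, not a contribution of either paper), the Gel’fand-Pinsker capacity to which both strong converses refer is

C_{\mathrm{GP}} = \max_{P_{U|S},\, x = f(u,s)} \big[ I(U;Y) - I(U;S) \big],

where U is an auxiliary random variable and the maximization is over conditional distributions P_{U|S} and deterministic encoding functions f. A strong converse asserts that at any fixed rate above C_{\mathrm{GP}} the average probability of error tends to 1, rather than merely staying bounded away from 0.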

Combinatorial Data Reduction Algorithm and Its Applications to Biometric Verification
Vladimir B. Balakirsky, Anahit R. Ghazaryan, A. J. Han Vinck

The goal of this paper was to compute short fingerprints f(\mathbf{x}) from long binary strings \mathbf{x} so that a verifier can look at a new long vector \mathbf{y} and tell whether or not \mathbf{y} is close to \mathbf{x} based on f(\mathbf{x}). This is a little different from hashing, where we could first compute f(\mathbf{y}). They develop a scheme which stores the index of a reference vector \mathbf{c} that is “close” to \mathbf{x} and the distance d(\mathbf{x},\mathbf{c}). This can be done with low complexity. They calculated false accept and reject rates for this scheme. Since the goal is not reconstruction or approximation, but rather a kind of classification, they can derive a reference vector set which has very low rate.
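As an aside, here is a rough Python sketch of the kind of scheme I took away from the talk, under my own assumptions: the fingerprint stores the index of the nearest reference vector and the Hamming distance to it, and the verifier accepts if the triangle inequality does not rule out the new vector being within distance t of the original. The decision rule and all names are mine, not necessarily the authors’.

import random

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def fingerprint(x, refs):
    """Store only the index of the closest reference vector and the distance to it."""
    j = min(range(len(refs)), key=lambda i: hamming(x, refs[i]))
    return j, hamming(x, refs[j])

def verify(y, fp, refs, t):
    """Accept iff the triangle inequality does not rule out d(x, y) <= t."""
    j, r = fp
    return abs(hamming(y, refs[j]) - r) <= t

if __name__ == "__main__":
    n, num_refs, t = 64, 8, 5
    refs = [[random.randint(0, 1) for _ in range(n)] for _ in range(num_refs)]
    x = [random.randint(0, 1) for _ in range(n)]
    fp = fingerprint(x, refs)
    y = list(x)
    for i in random.sample(range(n), 3):  # flip 3 bits; y is still "close" to x
        y[i] ^= 1
    print(verify(y, fp, refs, t))   # True: this particular rule never falsely rejects
    z = [random.randint(0, 1) for _ in range(n)]
    print(verify(z, fp, refs, t))   # usually False, but false accepts are possible

With this rule the errors are one-sided (only false accepts), so the actual scheme in the paper presumably uses a different test to trade off the two error rates.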

Two-Level Fingerprinting Codes
N. Prasanth Anthapadmanabhan, Alexander Barg

This looks at a variant of the fingerprinting problem, in which a content creator makes several fingerprinted versions of an object (e.g. a piece of software) and then a group of pirates can take their versions and try to create a new object with a valid fingerprint. The marking assumption means that the pirates can only alter the positions where their copies differ. The goal is to build a code such that a verifier looking at an object produced by t pirates can identify at least one of the pirates. In the two-level problem, the objects are coarsely classified into groups (e.g. by geographic region), and the verifier wants to be able to identify the group of at least one of the pirates when there are more than t pirates. They provide some conditions for traceability as well as constructions. This framework can also be extended to multiple levels.
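To make the marking assumption concrete, here is a small toy illustration for the binary case (my own, not a construction from the paper): positions where all the pirates’ copies agree are undetectable and must be left as-is, while positions where the copies differ can be set arbitrarily. The names and the 6-bit example are hypothetical.

import random

def undetectable_positions(copies):
    """Positions where all the pirates' copies agree (cannot be altered)."""
    n = len(copies[0])
    return [i for i in range(n) if len({c[i] for c in copies}) == 1]

def random_forgery(copies):
    """One forgery consistent with the (binary) marking assumption."""
    n = len(copies[0])
    keep = set(undetectable_positions(copies))
    return [copies[0][i] if i in keep else random.randint(0, 1) for i in range(n)]

if __name__ == "__main__":
    # two pirates' fingerprinted copies (hypothetical 6-bit fingerprints)
    copies = [[0, 1, 0, 1, 1, 0],
              [0, 1, 1, 1, 0, 0]]
    detectable = [i for i in range(6) if i not in undetectable_positions(copies)]
    print("detectable positions:", detectable)                    # [2, 4]
    print("number of feasible forgeries:", 2 ** len(detectable))  # 4
    print("one forgery:", random_forgery(copies))

A traceability code has to guarantee that every forgery in this feasible set still points back to at least one pirate (or, in the two-level setting with many pirates, to at least one pirate's group).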