PaperCept, EDAS, and so on: why can’t we have nice things?

Why oh why can’t we have nice web-based software for academic things?

For conferences I’ve used PaperCept, EDAS (of course), Microsoft’s CMT, and EasyChair. I haven’t used HotCRP, but knowing Eddie it’s probably significantly better than the others.

I can’t think of a single time I’ve used PaperCept and had it work the way I expect. My first encounter was for Allerton, where it apparently would not allow quotation marks in the title of papers (an undocumented restriction!). But then again, why has nobody heard of sanitizing inputs? The IEEE Transactions on Automatic Control also uses PaperCept, and the paper review has a character restriction on it (something like 5000 or so). Given that a thorough review could easily pass twice that length, I’m shocked at this arbitrary restriction.

On the topic of journal software, the Information Theory Society semi-recently transitioned from Pareja to Manuscript Central. I have heard that Pareja, a home-grown solution, was lovable in its own way, but was also a bit of a terror to use as an Associate Editor. Manuscript Central’s editorial interface, however, is like looking at the dashboard of a modern aircraft: perhaps efficient to the expert, but the interaction designers I know would blanch (or worse) to see it.

This semi-rant is due to an email I got about IEEE Collabratec (yeah, brah!):

IEEE is excited to announce the pilot rollout of a new suite of online tools where technology professionals can network, collaborate, and create – all in one central hub. We would like to invite you to be a pilot user for this new tool titled IEEE Collabratec™ (Formerly known as PPCT – Professional Productivity and Collaboration Tool). Please use the tool and tell us what you think, before we officially launch to authors, researchers, IEEE members and technology professionals like yourself around the globe.

What exactly is IEEE Collabratec?
IEEE Collabratec will offer technology professionals robust networking, collaborating, and authoring tools, while IEEE members will also receive access to exclusive features. IEEE Collabratec participants will be able to:

* Connect with technology professionals by location, technical interests, or career pursuits;
* Access research and collaborative authoring tools; and
* Establish a professional identity to showcase key accomplishments.

Parsing the miasma of buzzwords, my intuition is that this is supposed to be some sort of combination of LinkedIn, ResearchGate, and… Google Drive? Why does the IEEE think it has the expertise to pull off integration at this scale? Don’t get me wrong, there are tons of smart people in the IEEE, but this probably should be done by professionals, and not by non-profit professional societies. How much money is this going to cost? The whole thing reminds me of Illinois politics: a lucrative contract given to a wealthy campaign contributor after the election, with enough marketing veneer to avoid raising a stink. Except this is the IEEE, not Richard [JM] Daley (or Rahm Emanuel for that matter).

As far as I can tell, the software that we have to interact with regularly as academics has never been subjected to scrutiny by any user-interface designer. From online graduate school/faculty application forms (don’t get me started on the letter of rec interface) to conference review systems, journal editing systems, and so on, we are given a terrible dilemma: pay exorbitant amounts of money to some third party, or use “home grown” solutions developed by our colleagues. For the former, there is precious little competition and the vendors have no financial incentive to improve the interface. For the latter, we are at the whims of the home code-gardener. Do they care about user experience? Is that their expertise? Do they have time to make it both functional and a pleasure to use? Sadly, the answer is usually no, with perhaps a few exceptions.

I shake my fist at the screen.

Feature Engineering for Review Times

The most popular topic of conversation among information theory aficionados is probably the long review times for the IEEE Transactions on Information Theory. Everyone has a story of a very delayed review, either for their own paper or for a friend’s. The Information Theory Society Board of Governors and Editor-in-Chief have presented charts of “sub-to-pub” times and other statistics and are working hard on ways to improve the speed of reviews without impairing their quality. These efforts are all laudable. But it occurs to me that there is room for social engineering on the input side of things as well. That is, if we treat the process as a black box, with inputs (papers) and outputs (decisions), what would a machine-learning approach to predicting decision time do?

Perhaps the most important (and in some cases overlooked) aspect of learning a predictor from real data is figuring out which features to measure for each input. Off the top of my head, things which may be predictive include:

  • length
  • number of citations
  • number of equations
  • number of theorems/lemmas/etc.
  • number of previous IT papers by the authors
  • h-index of authors
  • membership status of the authors (student members to Fellows)
  • associate editor handling the paper — although for obvious reasons we may not want to include this

I am sure I am missing a bunch of relevant measurable quantities here, but you get the picture.

I would bet that paper length is a strong predictor of review time, not because it takes a longer time to read a longer paper, but because the activation energy of actually picking up the paper to review it is a nonlinear function of the length.

Doing a regression analysis might yield some interesting suggestions on how to pick coauthors and paper length to minimize the review time. This could also help make the system go faster, no? Should we request these sorts of statistics from the EiC?
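To make the regression idea concrete, here is a minimal sketch in plain Python (no external libraries). Everything in it is an assumption for illustration: the choice of just two features (page count and number of theorems), the helper names `solve` and `fit_ols`, and above all the data, which are synthetic and generated from a made-up rule rather than taken from any real review records.

```python
# Toy sketch: ordinary least squares on hypothetical paper features.
# All numbers below are invented for illustration, NOT real review data.

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error,
    via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical submissions: (pages, number of theorems).
papers = [(10, 2), (25, 5), (40, 3), (15, 8), (60, 12), (33, 1)]

# Synthetic review times in days, from a made-up noise-free rule,
# so we can check that the fit recovers the rule's coefficients.
days = [30 + 4 * p + 2 * t for p, t in papers]

coef = fit_ols(papers, days)
print("intercept, days/page, days/theorem:", [round(c, 3) for c in coef])
```

Since the toy data are an exact linear function of the features, the fit simply recovers the rule that generated them. With real data one would expect noise, and testing the hunch that review time is nonlinear in length would mean adding a feature such as pages squared.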

Linkage

Some old links I meant to post a while back but still may be of interest to some…

I prefer my okra less slimy, but to each their own.

Via Erin, A tour of the old homes of the Mission.

Also via Erin, Women and Crosswords and Autofill.

A statistician rails against computer science’s intellectual practices.

Nobel Laureate Randy Schekman is boycotting Nature, Science, and Cell. Retraction Watch is skeptical.

Harvard Business Review’s underhanded game

For our first-year seminar, we wanted to get the students to read some of the hyperbolic articles on data science. A classic example is the Harvard Business Review’s Data Scientist: The Sexiest Job of the 21st Century. However, when we downloaded the PDF version through the library proxy, we were informed:

Harvard Business Review and Harvard Business Publishing Newsletter content on EBSCOhost is licensed for the private individual use of authorized EBSCOhost users. It is not intended for use as assigned course material in academic institutions nor as corporate learning or training materials in businesses. Academic licensees may not use this content in electronic reserves, electronic course packs, persistent linking from syllabi or by any other means of incorporating the content into course resources

Harvard Business Publishing will be pleased to grant permission to make this content available through such means. For rates and permission, contact permissions@harvardbusiness.org.

So it seems that for a single article we’d have to pay extra, and since “any other means of incorporating the content” is also a violation, we couldn’t tell the students that they can go to the library website and look up an article in a publication whose name sounds like “Schmarbard Fizzness Enqueue” on sexy data science.

My first thought on seeing this restriction is that it would definitely not pass the fair use test, but then the fine folks at the American Library Association say that it’s a little murky:

Is There a Fair Use Issue? Despite any stated restrictions, fair use should apply to the print journal subscriptions. With the database however, libraries have signed a license that stipulates conditions of use, so legally are bound by the license terms. What hasn’t really been fully tested is whether federal law (i.e. copyright law) preempts a license like this. While librarians may like to think it does, there is very little case law. Also, it is possible that if Harvard could prove that course packs and article permission fees are a major revenue source for them, it would be harder to declare fair use as an issue and fail the market effect factor. In other cases as in Georgia State, the publishers could not prove their permissions business was that significant which worked against them. Remember that if Harvard could prove that schools were abusing the restrictions on use, they could sue.

Part of the ALA’s advice is to use “alternate articles to the HBR 500 supplied by other vendors that do not have these restrictions.” Luckily for us, there is no absence of hype on data science, so we could avoid it.

Given Harvard’s well-publicized open access policy and general commitment to sharing scholarly materials, the educational restriction on using materials strikes me as rank hypocrisy. Of course, maybe HBR is not really a venue for scholarly articles. Regardless, I would urge anyone considering including HBR material in their class to think twice before playing their game. Or to indulge in some civil disobedience, but this might end up hurting the libraries and not HBR, so it’s hard to figure out what to do.

Linkage

A taste test for fish sauces.

My friend Ranjit is working on this Crash Course in Psychology. Since I’ve never taken psychology, I am learning a lot!

Apparently the solution for lax editorial standards is to scrub away the evidence. (via Kevin Chen).

Some thoughts on high performance computing vs. Map Reduce. I think about this a fair bit, since some of my colleagues work on HPC, which feels like a different beast than a lot of the problems I’ve been thinking about.

A nice behind-the-scenes on Co-Op Sauce, a staple at Chicagoland farmers’ markets.

Linkage

A map of racial segregation in the US.

Vi Hart explains serial music (h/t Jim CaJacob).

More adventures in trolling scam journals with bogus papers (h/t my father).

Brighten does some number crunching on his research notebook.

Jerry takes “disruptive innovation” to task.

Vladimir Horowitz plays a concert at the Carter White House. Also Jim Lehrer looks very young. The program (as cribbed from YouTube):

  • The Star-Spangled Banner
  • Chopin: Sonata in B-flat minor, opus 35, n°2
  • Chopin: Waltz in A minor, opus 34, n°2
  • Chopin: Waltz in C-sharp minor, opus 64, n°2
  • Chopin: Polonaise in A-flat major, opus 53, Héroïque
  • Schumann: Träumerei, Kinderszenen n°7
  • Rachmaninoff: Polka de W.R.
  • Horowitz: Variations on a theme from Bizet’s Carmen

The Simons Institute is going strong at Berkeley now. Moritz Hardt has some opinions about what CS theory should say about “big data,” and how it might require some adjustments to ways of thinking. Suresh responds in part by pointing out some of the successes of the past.

John Holbo is reading Appiah and makes me want to read Appiah. My book queue is already a bit long though…

An important thing to realize about performance art that makes a splash is that it can often be exploitative.

Mimosa shows us what she sees.

Eisen’s comments on the future of scholarly publishing

Michael Eisen gave a talk at the Commonwealth Club in San Francisco recently. Eisen is the founder of the Public Library of Science (PLoS), which publishes a large number of open-access journals in the biosciences, including the amazingly named PLoS Neglected Tropical Diseases. His remarks begin with the background on the “stranglehold existing journals have on academic publishing.” But he also has this throwaway remark:

One last bit of introduction. I am a scientist, and so, for the rest of this talk, I am going to focus on the scientific literature. But everything I will say holds equally true for other areas of scholarship.

This is simply not true: one cannot generalize from one domain of scholarship to all areas of scholarship. In fact, it is in the differences between the dysfunctions of academic communication across areas that we can understand what to do about them. It’s not just that this is a lazy generalization, but rather that, as Eisen paints it, in science the journals are parasitic entities more or less separate from the researchers. As such, there is no reason for people to publish with academic publishers except for some kind of Stockholm syndrome.

In electrical engineering and computer science the situation is a bit different. IEEE and ACM are not just publishing conglomerates, but are supposed to be the professional societies for their respective fields. People gain professional brownie points for winning IEEE or ACM awards, they can “level up” by becoming Senior Members, and so on. Because disciplinary boundaries are a little more fluid, there are several different Transactions in which a given researcher may publish. At least on paper, IEEE and ACM are not-for-profit corporations. This is not to say that engineering researchers are not suffering from a Stockholm syndrome effect with these professional societies. It’s just that the nature of the beast is different, and when we talk about how IEEE Xplore or the ACM Digital Library is overpriced, that critique should be coupled with a critique of IEEE’s policy requiring conferences to have a certain profit level. These things are related.

The second issue I had is with Eisen’s proposed solution:

There should be no journal hierarchy, only broad journals like PLOS ONE. When papers are submitted to these journals, they should be immediately made available for free online – clearly marked to indicate that they have not yet been reviewed, but there to be used by people in the field capable of deciding on their own if the work is sound and important.

So… this already exists for large portions of mathematics, the mathematical sciences, and engineering in the form of arXiv. The added suggestion is a layer of peer review on top, so maybe arXiv plus a StackExchange thing. Perhaps this notion is a radical shift for the life sciences, where Science and Nature are so dominant, but what I have learned from looking at the arXiv RSS feed is that the first drafts of papers that get put up there are usually not the clearest exposition of the work, and without some kind of community sanction (in the form of rejection), there is little incentive for authors to actually go back and make a cleaner version of their proof. If someone has a good idea or result but a confusing presentation, they are not going to get downvoted. If someone is famous, they are unlikely to get downvoted.

In the end, what PLoS ONE and the arXiv-only model for publishing do is reify and retrench the existing tit-for-tat “clubbiness” that exists in smaller academic communities. In a lot of CS conferences reviewing is double-blind as a way to address this very issue. When someone says “all academic publishing has the same problems,” this misses the point, because the problem is not always with publishing but with communication. We need to understand how the way we communicate the products of scholarly knowledge is broken. In some fields, I bet you could argue that papers are inefficient and bad ways of communicating results. In this sense, academic publishing and its rapacious nature are just symptoms of a larger problem.