A taste test for fish sauces.

My friend Ranjit is working on this Crash Course in Psychology. Since I’ve never taken psychology, I am learning a lot!

Apparently the solution for lax editorial standards is to scrub away the evidence. (via Kevin Chen).

Some thoughts on high-performance computing vs. MapReduce. I think about this a fair bit, since some of my colleagues work on HPC, which feels like a different beast than a lot of the problems I’ve been thinking about.

A nice behind-the-scenes on Co-Op Sauce, a staple at Chicagoland farmers’ markets.

A map of racial segregation in the US.

Vi Hart explains serial music (h/t Jim CaJacob).

More adventures in trolling scam journals with bogus papers (h/t my father).

Brighten does some number crunching on his research notebook.

Jerry takes “disruptive innovation” to task.

Vladimir Horowitz plays a concert at the Carter White House. Also, Jim Lehrer looks very young. The program (as cribbed from YouTube):

  • The Star-Spangled Banner
  • Chopin: Sonata in B-flat minor, opus 35, n°2
  • Chopin: Waltz in A minor, opus 34, n°2
  • Chopin: Waltz in C-sharp minor, opus 64, n°2
  • Chopin: Polonaise in A-flat major, opus 53, Héroïque
  • Schumann: Träumerei, Kinderszene n°7
  • Rachmaninoff: Polka de W.R.
  • Horowitz: Variations on a theme from Bizet’s Carmen

The Simons Institute is going strong at Berkeley now. Moritz Hardt has some opinions about what CS theory should say about “big data,” and how it might require some adjustments to ways of thinking. Suresh responds in part by pointing out some of the successes of the past.

John Holbo is reading Appiah and makes me want to read Appiah. My book queue is already a bit long though…

An important thing to realize about performance art that makes a splash is that it can often be exploitative.

Mimosa shows us what she sees.

Michael Eisen gave a talk at the Commonwealth Club in San Francisco recently. Eisen is a co-founder of the Public Library of Science (PLoS), which publishes a large number of open-access journals in the biosciences, including the amazingly named PLoS Neglected Tropical Diseases. His remarks begin with background on the “stranglehold existing journals have on academic publishing.” But he also has this throwaway remark:

One last bit of introduction. I am a scientist, and so, for the rest of this talk, I am going to focus on the scientific literature. But everything I will say holds equally true for other areas of scholarship.

This is simply not true — one cannot generalize from one domain of scholarship to all areas of scholarship. In fact, it is in the differences between the dysfunctions of academic communication across areas that we can understand what to do about them. It’s not just that this is a lazy generalization; as Eisen paints it, in science the journals are more or less separate from the researchers, essentially parasitic entities. On that view, there is no reason for people to publish with academic publishers except for some kind of Stockholm syndrome.

In electrical engineering and computer science the situation is a bit different. IEEE and ACM are not just publishing conglomerates, but are supposed to be the professional societies for their respective fields. People gain professional brownie points for winning IEEE or ACM awards, they can “level up” by becoming Senior Members, and so on. Because disciplinary boundaries are a little more fluid, there are several different Transactions in which a given researcher may publish. At least on paper, IEEE and ACM are not-for-profit corporations. This is not to say that engineering researchers are not suffering from a Stockholm syndrome effect with these professional societies. It’s just that the nature of the beast is different, and when we talk about how IEEExplore or the ACM Digital Library is overpriced, that critique should be coupled with a critique of IEEE’s policy of requiring conferences to turn a certain profit. These things are related.

The second issue I had is with Eisen’s proposed solution:

There should be no journal hierarchy, only broad journals like PLOS ONE. When papers are submitted to these journals, they should be immediately made available for free online – clearly marked to indicate that they have not yet been reviewed, but there to be used by people in the field capable of deciding on their own if the work is sound and important.

So… this already exists for large portions of mathematics and the mathematical sciences and engineering in the form of ArXiV. The added suggestion is a layer of peer review on top, so maybe ArXiV plus a StackExchange thing. Perhaps this notion is a radical shift for the life sciences, where Science and Nature are so dominant, but what I have learned from looking at the ArXiV RSS feed is that the first drafts of papers that get put up there are usually not the clearest exposition of the work, and without some kind of community sanction (in the form of rejection), there is little incentive for authors to actually go back and make a cleaner version of their proof. If someone has a good idea or result but a confusing presentation, they are not going to get downvoted. If someone is famous, they are unlikely to get downvoted.

In the end, what PLoS ONE and the ArXiV-only model for publishing do is reify and entrench the existing tit-for-tat “clubbiness” that exists in smaller academic communities. In a lot of CS conferences reviewing is double-blind as a way to address this very issue. When someone says “all academic publishing has the same problems,” this misses the point, because the problem is not always with publishing but with communication. We need to understand how the way we communicate the products of scholarly knowledge is broken. In some fields, I bet you could argue that papers are an inefficient and bad way of communicating results. In this sense, academic publishing and its rapacious nature are just symptoms of a larger problem.

I’ve been trying to prepare a camera-ready article for the Signal Processing Magazine, and the instructions from IEEE include the following snippet:

*VERY IMPORTANT: All source files ( .tex, .doc, .eps, .ps, .bib, .db, .tif, .jpeg, …) may be uploaded as a single .rar archived file. Please do not attempt to upload files with extensions .shs, .exe, .com, .vbs, .zip as they are restricted file types.

While I have encountered .rar files before, I was not very familiar with the file format or its history. I didn’t know it was a proprietary format — that seems like a weird choice for IEEE to make (although no weirder than PDF, perhaps).

What’s confusing to me is that ArXiV manages to handle .zip files just fine. Is .tgz so passé now? My experience with RAR is that it is good for compressing (and splitting) large files into easier-to-manage segments. All of that efficiency seems wasted for a single paper with associated figures and bibliography files and whatnot.

I was trying to find the actual compression algorithm, but like most modern compression software, the innards are a fair bit more complex than the base algorithmic ideas. The Wikipedia article suggests it does a blend of Lempel-Ziv (a variant of LZ77) and prediction by partial matching, but I imagine there’s a fair bit of tweaking. What I couldn’t figure out is whether there is a new algorithmic idea in there (like in the Burrows-Wheeler Transform (BWT)), or whether it’s more a blend of these previous techniques.

Anyway, this silliness means I have to find some extra software to help me compress. SimplyRAR for MacOS seems to work pretty well.
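For the command-line inclined, the rar tool itself can do the same job, assuming you have it installed. A minimal sketch with placeholder file names (paper.tex, paper.bib, and a figures/ directory standing in for whatever your sources actually are):

    rar a -r paper-sources.rar paper.tex paper.bib figures/

The a command adds files to (or creates) paper-sources.rar, and the -r switch recurses into subdirectories.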

I signed a petition to the White House a while ago about increasing public access to government-funded research — if a petition gets 100,000 signatures then the White House will draft a response. Some of the petitions are silly, but generate amusing responses, cf. This Isn’t the Petition Response You’re Looking For on government construction of a Death Star. The old threshold was 60K, which the petition I signed passed. On Friday I got the official response from John Holdren, the Director of the White House Office of Science and Technology Policy. The salient bit is this one:

Aaron Swartz, who most recently made headlines for expropriating a large amount of information that was on JSTOR and making it available to the public, committed suicide. Cory Doctorow has a remembrance of Aaron and also a reminder of how we should remember how terrible depression can be. In making sense of what happened it’s tempting to say the threat of prosecution was the “cause,” but we shouldn’t lose sight of the person and the real struggles he was going through.

This is an amazing video that makes me miss the Bay Area. (via Bobak Nazer)

Also via Bobak, we’re number 8 and 10!

Since it’s holiday season, I figured it’s time to link to some profanity-laden humor about the holidays. For something new, The Hater’s Guide to the Williams-Sonoma Catalog, and for a classic, It’s Decorative Gourd Season….

A Game of Food Trucks. (via MetaFilter)

Larry Wasserman takes on the Bayesian/Frequentist debate.

LCD Soundsystem + Miles Davis YouTube mashup.

My friend Erik, who started the Mystery Brewing Company, has a blog called Top Fermented. He is now starting a podcast, which also has an RSS feed.

Here’s a little bit of news which may have escaped notice by some in the information theory community:

“In view of its concerns about excessive reviewing delays in the IT Transactions, the BoG authorizes the EiC in his sole judgment to delay publication of papers by authors who are derelict in their reviewing duties.”

Reviewers may be considered to be derelict if they habitually decline or fail to respond to review requests, or accept but then drop out; or if they habitually submit perfunctory and/or excessively delayed reviews.

I’ve said it before, but I think that transparency is the thing that will make reviews more timely — what is the distribution of review times? What is the relationship between paper length and review length? Plots like these may shock people and also give them some perspective on their own behavior. I bet some people who are “part of the problem” don’t even realize that they are part of the problem.

So IEEE wants PDFs that appear on IEEExplore to have two properties:

  • all fonts are embedded
  • the compatibility level is 1.4

Seems simple, right? Except that their instructions for PDF eXpress are for those who use Adobe Distiller, which I don’t have. You’d think there would be a simple workaround, but no…

This post suggests using ps2pdf command line options, which works if all of your figures are in EPS, but not if you have PDF or JPG figures. Daniel Lemire suggests converting the PDF to PS and then back to PDF.
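For reference, the ps2pdf route is something along these lines (a sketch with placeholder file names; ps2pdf14 pins the compatibility level at 1.4 and passes the -d switches through to Ghostscript):

    ps2pdf14 -dEmbedAllFonts=true -dPDFSETTINGS=/prepress paper.ps paper.pdf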

That didn’t really work for me — I alternately got errors saying they wanted Adobe version 5 or higher (corresponding to compatibility level 1.4) or that fonts were not embedded. I blame Mac OS. On the 10th attempt at uploading, I finally got it to work. Here’s what I did:

  1. Generate the PDF however you like (command line or TeXShop)
  2. Open the PDF in Preview, duplicate, and save a copy. This will embed the fonts but make the PDF version 1.3 or something. Say the file is called copy.pdf.
  3. In a terminal, run pdf2ps copy.pdf to generate copy.ps. This will create a PS file with the fonts embedded.
  4. Run ps2pdf14 -dEmbedAllFonts=true copy.ps to generate a new version of copy.pdf that is both version 1.4 and has the fonts embedded.

This is dumb. I wasted about an hour on this idiocy and still don't understand why it's such a pain. It seems that on a Mac, dvips does not embed fonts properly by default, and pdflatex also cuts corners. Furthermore, it doesn't seem like one can pass command line options (and make them default in TeXShop) to automate this process.

I am sure there are better ways of doing this, but for the time being, this at least works.
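One candidate for a better way, offered as a sketch since I have not pushed it through PDF eXpress myself, is to skip the PostScript round trip and run the finished PDF straight through Ghostscript, which can embed fonts and set the compatibility level in one pass:

    gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -dEmbedAllFonts=true \
       -sOutputFile=paper-embedded.pdf paper.pdf

Here paper.pdf and paper-embedded.pdf are placeholder names, and /prepress is Ghostscript's print-oriented preset, which among other things embeds all fonts.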

I was walking back from a seminar today and talking to Yury Makarychev, and he mentioned that more than 10 years ago he and his brother Konstantin had written a paper for the IT Transactions giving a new proof of the Gács-Körner result that common information can be much smaller than mutual information. They submitted it, got reviews back, submitted a revised version, and then it was lost in the aether of Pareja. Now, a decade later, it is finally available to read and will appear in a future issue.
