I got this email yesterday:
Dear Author of a NIPS 2014 Submission,
You are in for a treat! This year we will carry out an experiment that will give us insight into the fairness and consistency of the NIPS reviewing process. 10% of the papers, selected at random, will be duplicated and handled by independent Area Chairs. In cases where the Area Chairs arrive at different recommendations for accept/reject, the papers will be reassessed and a final recommendation will be determined.
I welcome this investigation. As both an author and a reviewer, I have found the NIPS review process to be highly variable in the thoroughness of reviews and discussion and in the consistency of scores. I hope the results of this experiment are made publicly available: what is the variance of the scores? How do score distributions vary by area chair (a proxy for area)? There are many ways to slice the data, and I would encourage the organizing committee to take this opportunity to engage the “NIPS community” in investigating the correlation between the numerical measures produced by the review process and the final outcomes.
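To make those questions concrete, here is a minimal sketch of the kind of analysis I have in mind. It assumes a hypothetical table of review scores with paper_id, area_chair, score, and decision columns; the file name and column names are my own invention, not anything the organizers have released.

```python
# Sketch (not the organizers' actual analysis): given a hypothetical CSV of
# review scores, compute a few of the summary statistics discussed above.
import pandas as pd

reviews = pd.read_csv("nips2014_reviews.csv")  # hypothetical file and columns

# Overall variance of review scores.
print("score variance:", reviews["score"].var())

# Per-paper mean score and within-paper spread, a rough consistency measure.
per_paper = reviews.groupby("paper_id")["score"].agg(["mean", "std"])
print(per_paper.describe())

# Score distribution by area chair (a proxy for area).
print(reviews.groupby("area_chair")["score"].describe())

# Correlation between a paper's mean score and its accept/reject outcome
# (decision assumed to be coded 1 for accept, 0 for reject).
decisions = reviews.groupby("paper_id")["decision"].first()
print("score/outcome correlation:", per_paper["mean"].corr(decisions))
```

Even summaries this simple, computed over the duplicated 10% of submissions, would say a lot about how much of a paper's fate comes down to the luck of the draw.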