Wednesday, May 15, 2013

IUI 2013 Reviewing Statistics & Results

Over the past year, I served as co-program chair for Intelligent User Interfaces 2013 with Pedro Szekely. We modified the review process somewhat from previous versions of the conference, and feedback from those attending IUI this year was that the program was strong and interesting.

Over the course of at least two blog posts, I want to write a bit about the process that we used and the results it produced. In this second post, I describe the results of the reviewing process and show various graphs and statistics.

In total, 192 papers were submitted to IUI 2013. We accepted 43 of these papers, which is a 22% acceptance rate and within the 20-25% range that we were hoping to achieve.  Submissions came from countries around the world, as can be seen in the following graph.
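As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the acceptance rate from the submission and acceptance counts reported above:

# Quick arithmetic check on the headline numbers above.
submitted = 192
accepted = 43

acceptance_rate = accepted / submitted
print(f"Acceptance rate: {acceptance_rate:.1%}")   # about 22.4%
print(0.20 <= acceptance_rate <= 0.25)             # True: within the 20-25% target range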


One question that we had was about the reviewer pool. We intended for the Senior Program Committee (SPC) member handling each paper to find the most qualified reviewers for it rather than, for example, choosing from the small set of people they knew well even if those people were not necessarily qualified. From the following graph, we can see that most reviewers contributed just one review, which suggests that SPC members did their job in finding uniquely qualified reviewers for many papers.


One question that some people ask about this approach concerns consistency. Specifically, how can decisions be made consistently when most reviewers review only one or two papers? The answer lies in the work of the SPC members, who aggregate all of the reviews for each paper when writing their meta-review and who have visibility across multiple papers to help calibrate. The SPC members' knowledge of their reviewers and those reviewers' expertise also comes into play when writing meta-reviews and calibrating across different reviewers. In addition, the majority of decisions were made during the two SPC meetings, where many submissions were discussed, allowing for further calibration.

We also looked at the distribution of reviewer ratings and of reviewers' self-reported expertise on the papers they reviewed. These distributions can be seen in the following graphs.


The breakdown of review scores makes some sense given the final acceptance rate. The majority of papers are rejected, and thus it is not surprising to see low scores dominate the overall distribution.

We believe the expertise distribution suggests that this process at least partially achieved our goal of finding better reviewers for each submission. We're happy that only 20% of reviewers indicated an expertise of 2 or below, though we'd certainly like to see this number become even smaller in the future. The large number of 3 ratings is encouraging, especially because in our experience many well-qualified reviewers hesitate to give themselves the top ranking of 4, perhaps because they are more aware of what they don't know.
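For readers curious how such distributions can be tallied, here is a minimal sketch of the kind of counting involved. The file name reviews.csv and the column names score and expertise are hypothetical placeholders, not the actual export format of our submission system.

import csv
from collections import Counter

# Tally the distributions of review scores and self-reported expertise.
# "reviews.csv" and the "score"/"expertise" columns are hypothetical
# placeholders; a real export from the submission system will differ.
score_counts = Counter()
expertise_counts = Counter()
with open("reviews.csv", newline="") as f:
    for row in csv.DictReader(f):
        score_counts[row["score"]] += 1
        expertise_counts[row["expertise"]] += 1

print("Review scores:", dict(sorted(score_counts.items())))
print("Self-reported expertise:", dict(sorted(expertise_counts.items())))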

Finally, a controversial decision in this year's process was to eliminate the short paper category (previously a 4 page maximum archival category) and to include explicit language in the call for papers and in the reviewer instructions asking reviewers to rate a paper's contribution in proportion to its length. This is the same practice that has been in place at SIGGRAPH for some time and has recently been adopted by the UIST and CSCW communities. An important question is: what was the impact on shorter papers? Did they have a harder time being accepted under this new policy?


This spreadsheet and graph show the submission results broken down by page length. From the data, we can clearly see that longer papers had a better chance of acceptance and that no papers of 4 pages or fewer were accepted by the conference. However, very few papers of 4 pages or fewer were submitted, so it is hard to draw a clear conclusion from this sample. It is well known that shorter papers have greater difficulty getting accepted even when there is an explicit short paper category; informally, we've heard that acceptance rates are often in the 10-15% range. At that rate we might have expected about one accepted paper from the 4-pages-and-under category, so the fact that we didn't have one this year might be ascribed to random chance.
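To make the random-chance argument a bit more concrete, here is a small sketch of the expected-value calculation. The exact number of submissions of 4 pages or fewer is not given above, so the count used here (7) is purely a hypothetical illustration, combined with the informal 10-15% short paper acceptance rates mentioned above.

# Rough illustration of the random-chance argument above.
# NOTE: n_short is a hypothetical count of 4-page-or-fewer submissions;
# the post only says "very few" such papers were submitted.
n_short = 7
for p in (0.10, 0.15):  # informal short-paper acceptance rates cited above
    expected_accepts = n_short * p
    prob_zero = (1 - p) ** n_short  # chance that none are accepted
    print(f"p={p:.0%}: expect {expected_accepts:.1f} accepts, "
          f"P(no accepts) = {prob_zero:.0%}")

Under these assumptions, the expected number of accepts is around one, and an outcome of zero accepted short papers is entirely plausible, which is consistent with the random-chance explanation above.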

Papers in the 5-7 page range fared somewhat better, with an overall acceptance rate of 7.5%. While this is somewhat lower than what we've heard for conferences with an explicit short paper category, we are happy that some short papers were accepted to the conference.

We do wonder whether our initial policy of disallowing conditional accepts and shepherding had an impact on short paper acceptance rates. Shorter papers are more difficult to write, and we know of at least one case where an interesting short paper was rejected because it had substantial writing flaws that the reviewers were not confident the authors could address without shepherding during the camera-ready process. Had we had a clearer policy allowing conditional accepts and shepherding at the discretion of the SPC member, acceptance rates for shorter papers might have increased a small amount.

Going forward, I suspect that the short paper category will return in future years. The best argument I've heard so far is that members of the AI community may not submit some of their work if the category does not exist, since it often does at AI conferences, and catering to both the AI and HCI communities will be important if IUI is to grow and thrive.
