The NIPS Experiment
In contrast to previous years, the most talked-about thing from NIPS this year was not a new machine learning approach but a reviewing experiment called the NIPS Experiment.
In a nutshell, about 10% of submissions were reviewed independently by two sets of reviewers (including two different Area Chairs). The goal of the NIPS Experiment was to assess to what extent reviewers agreed on accept/reject decisions. The outcome of the experiment has been a challenge to interpret properly.
The provocative and thought-provoking blog post by Eric Price has garnered the most attention from the broader scientific community. Basically, one reasonable way of interpreting the NIPS Experiment results is that, of the papers accepted for publication at NIPS 2014, roughly half would have been rejected if a different set of reviewers had reviewed them again. This, of course, highlights the degree of subjectivity and randomness (likely exacerbated by sub-optimal reviewing) inherent in reviewing for such a broad field as machine learning.
The most common way to analyze this is from the viewpoint of fairness: if we had a budget for K papers, did the top K submissions get published? From that standpoint, the answer seems to be a resounding no, no matter how you slice it. One can argue about the degree of unfairness, but that is a much murkier subject.
Alternative Viewpoint via Regret Minimization
However, as echoed in a blog post by Bert Huang, NIPS was AWESOME this year. The poster sessions had lots of great papers, and the oral presentations were good.
So I'd like to offer a different viewpoint about NIPS, one based on regret minimization. Let's assume that the accepted papers that were more likely to be rejected in a second review are "borderline" papers (this seems like a reasonable assumption, though perhaps there are arguments against it). Then, had we swapped out a bunch of borderline papers for other borderline papers that got rejected, would the quality of the conference have been that much better?
In other words, given a budget of K papers to accept, what is the collective quality of the K papers actually accepted versus the quality of the "optimal" set of K papers we should have accepted? It's conceivable that the regret in terms of quality could be quite low even though the two sets of papers overlap substantially less than we would like.
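To make the fairness-versus-regret distinction concrete, here is a toy simulation. It is a sketch under loudly stated assumptions: each paper has a single latent quality score drawn from N(0, 1), and each committee sees that score plus independent Gaussian noise (noise_sd is a hypothetical knob, not something estimated from the NIPS data) and accepts its top k. It reports how much the two committees' accept sets overlap, how much of the true top k committee A accepted (the fairness view), and the average quality lost per accepted paper relative to the optimal set (the regret view). The name simulate_review is illustrative only.

```python
import random

def simulate_review(n_submissions, k, noise_sd, seed=0):
    """Toy two-committee model (an assumption, not the actual NIPS protocol).

    Each paper has a latent quality ~ N(0, 1). Each committee observes
    quality + N(0, noise_sd^2) independently and accepts its top k.
    Returns (overlap, top_k_hit, per_paper_regret).
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_submissions)]

    def top_k(scores):
        # Indices of the k highest-scoring papers.
        ranked = sorted(range(n_submissions), key=lambda i: scores[i], reverse=True)
        return set(ranked[:k])

    def committee():
        # A committee ranks papers by a noisy view of their quality.
        return top_k([q + rng.gauss(0.0, noise_sd) for q in quality])

    accepted_a, accepted_b = committee(), committee()
    optimal = top_k(quality)  # the "should have accepted" set

    overlap = len(accepted_a & accepted_b) / k   # agreement between committees
    top_k_hit = len(accepted_a & optimal) / k    # fairness: share of true top k accepted
    lost = sum(quality[i] for i in optimal) - sum(quality[i] for i in accepted_a)
    return overlap, top_k_hit, lost / k          # regret: quality lost per accepted paper
```

Sweeping noise_sd (for example, simulate_review(1000, 250, noise_sd=1.0)) lets you check, within this toy model only, whether low overlap necessarily implies large regret; it says nothing about the actual NIPS 2014 review scores.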
One might even argue, as alluded to here, that long-term regret minimization (i.e., reviewing for NIPS over many years) requires some amount of randomness and/or disagreement between reviewers. Otherwise, there could be a more serious risk of group-think or intellectual inbreeding that can cause the field to stagnate.
Not sure to what extent this viewpoint is appropriate. For instance, NIPS is also a venue by which junior researchers become established in the field. Having a significant amount of randomness in the reviewing process can definitely be detrimental to the morale and career prospects of junior researchers.
On to the Actual Papers
There were many great papers at NIPS this year. Here are a few that caught my eye:
Sequence to Sequence Learning with Neural Networks
by Ilya Sutskever, Oriol Vinyals & Quoc Le.
Ilya gave, hands down, the best talk at NIPS this year. Ever since deep learning started becoming popular, it has carried with it the idea that only Geoff Hinton & company could make deep networks work well. Ilya spent most of his talk describing how this is no longer the case. He also showed how to use Long Short-Term Memory (LSTM), a recurrent neural network architecture, to do sequence-to-sequence prediction with deep neural networks.
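For readers unfamiliar with the architecture, here is a minimal NumPy sketch of a single LSTM step and an encoder that folds it over an input sequence. In the sequence-to-sequence setup, the encoder's final (h, c) state initializes a decoder LSTM that generates the output sequence; the decoder, the embeddings, and the multi-layer stacking used in the paper are omitted. The stacked gate layout [i, f, o, g] and the names lstm_step and encode are assumptions for illustration, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gates stacked as [i, f, o, g]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)           # new hidden state
    return h, c

def encode(xs, W, U, b, hidden_size):
    """Run the LSTM over a sequence of input vectors and return the final (h, c)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c
```

The additive cell-state update (c = f * c_prev + i * g) is what lets information and gradients persist across long sequences, which is why LSTMs suit mapping whole sentences to fixed-size states.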
Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
by Sergey Levine & Pieter Abbeel.
This paper combined reinforcement learning and neural networks to do policy search. What's shocking about this approach is how few training examples they needed to train a neural network. Granted, the network wasn't very deep, but the small amount of training data is still quite surprising.
Learning to Optimize via Information-Directed Sampling
by Dan Russo & Benjamin Van Roy.
Dan Russo has been doing some great work recently on analyzing bandit/MDP algorithms and proposing new algorithms. This paper proposes the first (mostly) fundamentally new bandit algorithm design philosophy that I've seen in a while. It's not clear yet how to make this algorithm practical in a wide range of complex domains, but it's definitely exciting to think about.
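The core idea of information-directed sampling is to pick the action that minimizes the information ratio: squared expected regret divided by expected information gain about the optimal action. The sketch below is a simplification under stated assumptions: a Bernoulli bandit with independent Beta posteriors, a variance-based surrogate for the information gain, and a deterministic choice of the single best-ratio arm (the paper's IDS may randomize between two actions). Posterior quantities are estimated by Monte Carlo; ids_action and n_samples are hypothetical names.

```python
import numpy as np

def ids_action(alphas, betas, n_samples=10_000, seed=0):
    """Variance-based IDS (deterministic simplification) for a Beta-Bernoulli bandit."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    K = len(alphas)
    theta = rng.beta(alphas, betas, size=(n_samples, K))  # posterior samples of arm means
    best = np.argmax(theta, axis=1)                        # optimal arm in each sample
    mean = theta.mean(axis=0)                              # E[theta_a]
    rho_star = theta[np.arange(n_samples), best].mean()    # E[max_a theta_a]
    regret = rho_star - mean                               # expected one-step regret per arm
    gain = np.zeros(K)                                     # variance-based information gain
    for a_star in range(K):
        mask = best == a_star
        p = mask.mean()                                    # P(A* = a_star)
        if p > 0:
            cond_mean = theta[mask].mean(axis=0)           # E[theta | A* = a_star]
            gain += p * (cond_mean - mean) ** 2
    ratio = regret ** 2 / np.maximum(gain, 1e-12)          # information ratio
    return int(np.argmin(ratio))
```

Unlike Thompson sampling, this rule can deliberately pull an arm that is unlikely to be optimal if doing so is cheap and highly informative, which is the design philosophy the paper emphasizes.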
Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets
by Adarsh Prasad, Stefanie Jegelka & Dhruv Batra.
This paper addresses submodular maximization when the ground set is exponentially large. It exploits specific structure in the ground set (e.g., structure that lets the problem be solved via cooperative cuts) to arrive at an efficient solution. It would be interesting to try to learn the diversity/submodular objective function rather than hand-craft a relatively simple one (from a modeling perspective).
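For context, the classic greedy algorithm for monotone submodular maximization under a cardinality constraint (which carries a (1 - 1/e) approximation guarantee) repeatedly adds the item with the largest marginal gain. The sketch below enumerates the ground set to find that item, which is exactly the step that becomes infeasible when the ground set is exponentially large; the paper's contribution is making it tractable by exploiting structure. The coverage objective and the item names are hypothetical stand-ins for a diversity function.

```python
def greedy_max(ground_set, f, k):
    """Greedily pick up to k items maximizing a monotone submodular set function f."""
    selected = []
    for _ in range(k):
        candidates = [x for x in ground_set if x not in selected]
        if not candidates:
            break
        # Marginal gain of each candidate; enumerating this is the expensive step.
        gains = {x: f(selected + [x]) - f(selected) for x in candidates}
        selected.append(max(candidates, key=gains.get))
    return selected

def coverage(items):
    """Hypothetical diversity objective: number of distinct concepts covered."""
    def f(S):
        covered = set()
        for x in S:
            covered |= items[x]
        return len(covered)
    return f
```

For example, with items = {"A": {1, 2, 3, 4}, "B": {4, 5}, "C": {5, 6, 7}, "D": {1}} and k = 2, greedy picks A first (gain 4) and then C (gain 3), skipping B because its only new concept is 5.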
From MAP to Marginals: Variational Inference in Bayesian Submodular Models
by Josip Djolonga & Andreas Krause.
Probabilistic submodular models (log-submodular and log-supermodular distributions) are a new family of probabilistic models that generalizes models such as associative Markov random fields. This paper shows how to perform variational marginal inference in these models, which might be wildly intractable when viewed through the lens of conventional graphical models (e.g., very large factors that obey a submodular structure). Very cool stuff.
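For a submodular function F over subsets S of a ground set, these models define P(S) ∝ exp(F(S)) (log-submodular) or P(S) ∝ exp(-F(S)) (log-supermodular). The brute-force sketch below computes exact marginals P(i ∈ S) by enumerating all 2^n subsets, which shows why approximate (e.g., variational) inference is needed once n grows beyond a few dozen. It illustrates the model only, not the paper's variational method; brute_force_marginals and log_potential are hypothetical names, and the caller passes F or -F as log_potential depending on the variant.

```python
import itertools
import math

def brute_force_marginals(n, log_potential):
    """Exact P(i in S) for P(S) ∝ exp(log_potential(S)) over all subsets of {0, ..., n-1}."""
    subsets = [frozenset(c) for r in range(n + 1)
               for c in itertools.combinations(range(n), r)]  # all 2^n subsets
    weights = [math.exp(log_potential(S)) for S in subsets]
    Z = sum(weights)                                           # partition function
    return [sum(w for S, w in zip(subsets, weights) if i in S) / Z
            for i in range(n)]
```

With a concave function of cardinality, such as log_potential = lambda S: math.log(1 + len(S)) (which is submodular), each subset is weighted by 1 + |S|; for n = 2 that gives weights 1, 2, 2, 3 and a marginal of 5/8 for each element.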
Non-convex Robust PCA
by Praneeth Netrapalli, Niranjan U N, Sujay Sanghavi, Animashree Anandkumar & Prateek Jain.
This paper gives a very efficient and provably optimal approach for robust PCA, where a matrix is assumed to be the sum of a low-rank matrix and a sparse matrix (i.e., low-rank except for a few corrupted entries). This optimization problem is non-convex, and convex relaxations can often give sub-optimal results. They also have a cool demo.
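The non-convex idea can be sketched as alternating projections: hard-threshold the residual M - L to estimate the sparse part S, then project M - S onto rank-r matrices with a truncated SVD, shrinking the threshold over iterations. This is a simplified variant in the spirit of the paper's approach, not the paper's algorithm (which also grows the rank in stages and sets thresholds from singular values); the fixed rank, the halving schedule, and the tau_final parameter are assumptions of this sketch.

```python
import numpy as np

def truncated_svd(A, rank):
    """Best rank-r approximation of A (projection onto rank-r matrices)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def hard_threshold(A, tau):
    """Keep entries with magnitude above tau (projection onto sparse matrices)."""
    return np.where(np.abs(A) > tau, A, 0.0)

def robust_pca_altproj(M, rank, tau_final, n_iters=100):
    """Split M into low-rank L plus sparse S by alternating projections."""
    L = truncated_svd(M, rank)                 # start from plain truncated SVD
    tau = 0.5 * np.max(np.abs(M - L))          # initial (large) threshold
    S = np.zeros_like(M)
    for _ in range(n_iters):
        S = hard_threshold(M - L, max(tau, tau_final))  # sparse step
        L = truncated_svd(M - S, rank)                  # low-rank step
        tau *= 0.5                                      # shrink threshold
    return L, S
```

Here tau_final should sit below the magnitude of the corruptions you expect but above the entry-wise error you are willing to tolerate in L; the sketch leaves that modeling choice to the caller.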
How transferable are features in deep neural networks?
by Jason Yosinski, Jeff Clune, Yoshua Bengio & Hod Lipson.
Along with a scientific study on the transferability of neural network features, Jason Yosinski also developed a cool demo that can visualize the various hidden layers of a deep neural network.
Conditional Random Field Autoencoders for Unsupervised Structured Prediction
by Waleed Ammar, Chris Dyer & Noah A. Smith.
This paper gives a surprisingly efficient approach for learning unsupervised auto-encoders that avoids making overly restrictive independence assumptions. The approach is based on CRFs. I wonder if one can do this with a more expressive model class such as structured decision trees.
by Chris J. Maddison, Daniel Tarlow & Tom Minka.
I admit that I don't really understand what's going on in this paper. But it seems like it's doing something quite new so there are perhaps many interesting connections to be made here. This paper also won one of the Outstanding Paper Awards at NIPS this year.