I've recently returned from this year's NIPS conference, which is the largest annually held machine learning conference. The program was excellent, but the venue left something to be desired.
NIPS is currently undergoing a transition period. During the previous decade, NIPS was always held in Vancouver (main conference) and Whistler (workshops), which were excellent locations. However, because of increased costs (in part due to the 2010 Winter Olympics), the two most recent NIPS conferences were held in Granada (last year) and South Lake Tahoe (this year).
This year's conference was held jointly at the Harveys and Harrah's hotels. The two hotels are located across the street from each other and are joined by an underground floor filled with restaurants, casinos, arcades and the like. I suspect the rather confusing path from one venue to the other was designed quite deliberately by the hotels to promote gambling. Unfortunately for them, relatively few of the 1000+ NIPS attendees participated in the smoke-filled debauchery (which, I heard, peeved the hotels quite a bit).
Nonetheless, the winding maze was successful at limiting my exposure to the outdoors. I actually spent my first 36 hours completely indoors, after which I started feeling claustrophobic. In fact, one fellow attendee spent 5 days (that's 120 hours) straight indoors before finally venturing outside.
The conference program was fantastic. Here are some of the papers that I found particularly interesting:
Discriminative Learning of Sum-Product Networks -- sum-product networks are a new deep learning architecture in which inference is tractable. Deep architectures are among the most expressive machine learning models in existence, but are notoriously difficult to train. This paper shows how to discriminatively train sum-product networks (this is a bit of a misnomer -- max-product networks, really), which leads to significantly improved prediction accuracy.
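To make "tractable inference" concrete, here is a minimal sketch of evaluating a sum-product network. It assumes a hand-built toy network (a two-component mixture over two binary variables, with made-up weights) and does no training at all. A single bottom-up pass gives a joint probability, and setting both indicators of an unobserved variable to 1 marginalizes it out in that same pass. Swapping sums for maxes gives the max-product variant mentioned above. The names `Leaf`, `Product`, `Sum`, `evaluate` and `binary_var` are my own, not from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    """Indicator that is 1 when var == value (or when var is unobserved)."""
    var: str
    value: int


@dataclass
class Product:
    children: List["Node"]


@dataclass
class Sum:
    children: List[Tuple[float, "Node"]]  # (weight, child) pairs


Node = Union[Leaf, Product, Sum]


def evaluate(node: Node, evidence: Dict[str, int], use_max: bool = False) -> float:
    """One bottom-up pass. Variables absent from `evidence` are marginalized
    by letting both of their indicators evaluate to 1."""
    if isinstance(node, Leaf):
        if node.var not in evidence:
            return 1.0
        return 1.0 if evidence[node.var] == node.value else 0.0
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, evidence, use_max)
        return result
    weighted = [w * evaluate(child, evidence, use_max) for w, child in node.children]
    # use_max=True replaces every sum with a max: the max-product variant.
    return max(weighted) if use_max else sum(weighted)


def binary_var(var: str, p_one: float, p_zero: float) -> Sum:
    """Weighted sum over the two indicators of a binary variable."""
    return Sum([(p_one, Leaf(var, 1)), (p_zero, Leaf(var, 0))])


# Hypothetical toy SPN: mixture of two products of independent binary variables.
root = Sum([
    (0.6, Product([binary_var("X1", 0.8, 0.2), binary_var("X2", 0.7, 0.3)])),
    (0.4, Product([binary_var("X1", 0.1, 0.9), binary_var("X2", 0.4, 0.6)])),
])

print(evaluate(root, {"X1": 1, "X2": 1}))  # joint P(X1=1, X2=1), about 0.352
print(evaluate(root, {"X1": 1}))           # marginal P(X1=1), about 0.52
```

Each node is visited once per query, so the cost is linear in the size of the network.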
Imitation Learning by Coaching -- imitation learning is a learning approach where a human expert teaches a computer program how to behave within some environment. Often, the expert's actions are too difficult for a computer program to learn initially, so the program might be better served by first learning to perform easier actions. This paper proposes such an approach (which the authors call Coaching) and demonstrates improved theoretical guarantees and empirical performance.
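The core trade-off, as I understand it, can be sketched in a few lines. Instead of always demonstrating the expert's lowest-loss action, the coach demonstrates an action that balances the learner's own preference against the expert's loss, weighted by a parameter `lam`. This is a hedged illustration of the idea only. The function name `hope_action`, the dictionaries of scores and losses, and the numbers are all hypothetical, and the paper's exact objective and its schedule for `lam` may differ.

```python
from typing import Dict


def hope_action(learner_scores: Dict[str, float],
                expert_loss: Dict[str, float],
                lam: float) -> str:
    """Hypothetical coach: pick the action maximizing
    lam * (learner's own score) - (loss according to the expert).
    With lam = 0 this reduces to copying the expert's lowest-loss action."""
    return max(learner_scores, key=lambda a: lam * learner_scores[a] - expert_loss[a])


# Made-up learner scores and expert losses for three candidate actions.
scores = {"expert_move": -5.0, "easy_move": 2.0, "bad_move": 3.0}
losses = {"expert_move": 0.0, "easy_move": 1.0, "bad_move": 10.0}

print(hope_action(scores, losses, lam=0.0))  # expert_move
print(hope_action(scores, losses, lam=1.0))  # easy_move
```

With `lam=10.0`, the same toy numbers make the coach endorse `bad_move`, so the weight clearly has to be chosen with care.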
Near-Optimal MAP Inference for Determinantal Point Processes -- determinantal point processes (DPPs) have recently gained visibility in the machine learning community. A DPP is a probabilistic model over sets that encourages predictions to be diverse. This paper shows how to perform MAP inference with DPPs, which was previously difficult to do well. One thing I'm not sold on is why one would want to do MAP inference with a DPP. DPPs spend considerable model capacity learning a probability distribution, which MAP inference throws away. If one only wants MAP predictions, why not just use a discriminatively trained model instead?
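For intuition on why diversity falls out of a DPP: with a positive semidefinite kernel matrix L over N items, the probability of a subset Y is proportional to det(L_Y), and near-duplicate items make that determinant small. MAP inference (finding the subset with the largest determinant) is NP-hard in general. The sketch below uses the simple greedy heuristic, which is *not* the paper's algorithm. The kernel values are made up, and the helper names are my own.

```python
from typing import List


def det(m: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(L: List[List[float]], items: List[int]) -> List[List[float]]:
    """Rows and columns of L restricted to `items` (this is L_Y)."""
    return [[L[i][j] for j in items] for i in items]


def greedy_dpp_map(L: List[List[float]], k: int) -> List[int]:
    """Greedy baseline: repeatedly add the item that maximizes det(L_Y)."""
    selected: List[int] = []
    remaining = set(range(len(L)))
    for _ in range(k):
        best, best_val = None, 0.0
        for i in sorted(remaining):
            val = det(submatrix(L, selected + [i]))
            if val > best_val:
                best, best_val = i, val
        if best is None:  # no item keeps the determinant positive
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up kernel: items 0 and 1 are near-duplicates, item 2 is different.
L = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
print(greedy_dpp_map(L, 2))  # [0, 2]: the near-duplicate item 1 is skipped
```

The pair {0, 1} has determinant 1 - 0.81 = 0.19, while {0, 2} has 1 - 0.01 = 0.99, which is why the diverse pair wins.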
Practical Bayesian Optimization of Machine Learning Algorithms -- let's face it, parameter tuning is a pain in the ass. The naive thing to do (which I have been guilty of on several occasions) is a grid search over parameters, which scales exponentially with the number of tuning parameters. This paper shows how to frame parameter tuning as a goal-oriented active learning problem (my terminology). The main difference from classical active learning is that the final performance of the model is evaluated on the best action the model can take (i.e., predicting the best parameter setting), rather than on predicting the performance of all actions (i.e., predicting the response variable for all inputs). This style of approach could potentially be useful for more structured active learning problems as well.
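Grid search with 5 values for each of 4 parameters already means 5^4 = 625 training runs. Bayesian optimization instead fits a model of "parameter setting -> validation error" and uses it to choose the next setting. The sketch below is a bare-bones version of the general recipe: a zero-mean Gaussian process surrogate plus expected improvement, for a single parameter scaled to [0, 1]. It is not the paper's implementation. The kernel, its length scale, the jitter, the candidate grid and the toy objective are all my own assumptions, chosen to keep the example standard-library only.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length_scale: float = 0.2) -> float:
    """Squared-exponential kernel (length scale chosen arbitrarily)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_gp(xs: List[float], ys: List[float], noise: float = 1e-4):
    """Precompute K^{-1} y and K^{-1} for a zero-mean GP (noise = jitter)."""
    n = len(xs)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    K_inv = [solve(K, [1.0 if r == j else 0.0 for r in range(n)]) for j in range(n)]
    return alpha, K_inv


def gp_posterior(xs, alpha, K_inv, x: float) -> Tuple[float, float]:
    """Posterior mean and standard deviation at x."""
    k = [rbf(x, a) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    n = len(xs)
    var = rbf(x, x) - sum(k[i] * K_inv[i][j] * k[j] for i in range(n) for j in range(n))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over the best observed value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective: Callable[[float], float], n_init: int = 3,
              n_iter: int = 15, seed: int = 0) -> Tuple[float, float]:
    """Minimize `objective` over [0, 1]; returns (best_x, best_y)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        alpha, K_inv = fit_gp(xs, ys)
        best = min(ys)
        ei = [expected_improvement(*gp_posterior(xs, alpha, K_inv, c), best)
              for c in candidates]
        x_next = candidates[ei.index(max(ei))]  # the single most promising setting
        xs.append(x_next)
        ys.append(objective(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]


# Toy stand-in for "validation error as a function of one scaled parameter".
best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2)
print(best_x, best_y)
```

Expected improvement scores each candidate by how much it is likely to beat the best result so far. That matches the "only the best action matters" framing above, rather than trying to model every setting equally well.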
A Spectral Algorithm for Latent Dirichlet Allocation -- spectral algorithms have become increasingly popular for learning various latent variable models in machine learning (started by this paper). Unlike the common alternative, Expectation Maximization, which can get stuck in local optima, spectral algorithms are provably consistent: given sufficient training data, they recover the true model parameters. This paper shows how to do spectral learning for Latent Dirichlet Allocation, which is pretty cool. This particular spectral learning approach is largely a theoretical result, so it will be interesting to see how practical it is (or whether it could be made sufficiently practical).
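Spectral methods work from low-order moments of the data instead of iterating over latent assignments, and this is why they come with consistency guarantees: empirical moments converge to their population values as data grows. The sketch below shows only that first step for LDA. It estimates the average word distribution M1 and a word co-occurrence matrix M2 corrected by alpha0/(alpha0 + 1), where alpha0 is the sum of the Dirichlet parameters and is assumed known. In the population limit, this corrected matrix has rank equal to the number of topics, spanned by the topic-word distributions. That correction term is my reading of the moment-based LDA literature. The subsequent SVD/whitening and decomposition steps that actually recover the topics are omitted, and the function name and toy corpus are hypothetical.

```python
from typing import List, Tuple


def lda_moments(docs: List[List[int]], vocab_size: int,
                alpha0: float) -> Tuple[List[float], List[List[float]]]:
    """Empirical first moment M1 and alpha0-corrected second moment M2.

    docs: documents as lists of word ids; each needs at least two words.
    alpha0: sum of the Dirichlet concentration parameters (assumed known).
    """
    V = vocab_size
    m1 = [0.0] * V
    pairs = [[0.0] * V for _ in range(V)]
    for doc in docs:
        n = len(doc)
        counts = [0] * V
        for w in doc:
            counts[w] += 1
        for i in range(V):
            m1[i] += counts[i] / n
            for j in range(V):
                # Ordered pairs of *distinct* positions holding words (i, j):
                # c_i * c_j, minus the c_i pairs that reuse a position when i == j.
                same_position = counts[i] if i == j else 0
                pairs[i][j] += (counts[i] * counts[j] - same_position) / (n * (n - 1))
    num_docs = len(docs)
    m1 = [x / num_docs for x in m1]
    shrink = alpha0 / (alpha0 + 1.0)
    m2 = [[pairs[i][j] / num_docs - shrink * m1[i] * m1[j] for j in range(V)]
          for i in range(V)]
    return m1, m2


# Tiny made-up corpus over a 3-word vocabulary.
m1, m2 = lda_moments([[0, 0, 1], [1, 2]], vocab_size=3, alpha0=1.0)
print(m1)
```

Averaging over all pairs of distinct positions, rather than sampling two words per document, keeps the estimate unbiased while using every co-occurrence.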
And as usual, the NIPS workshops were fantastic. I particularly enjoyed the Bayesian Optimization & Decision Making and Personalizing Education With Machine Learning workshops. Andrew Ng gave an inspiring presentation on Coursera and the potential of online education. Online education (in its current form) certainly has its flaws, but it seems likely to become an integral part of the education process moving forward.