I recently attended NIPS 2013 at South Lake Tahoe, which was an absolutely fantastic conference (the limited dining choices and general casino debauchery notwithstanding). Highlights included listening to an awesome talk by Daphne Koller on online education (and Coursera in particular), and meeting Mark Zuckerberg and learning about the new Facebook AI research lab.
I also attended the Ben Taskar Memorial Session, which, while very saddening, was also inspiring to me and, I hope, to many others who attended. Ben Taskar was a truly exceptional researcher and human being, and it was very touching to listen to his friends and collaborators talk about and celebrate his life and work. Ben leaves behind a wife and a daughter who requires around-the-clock care. I encourage everyone who can to donate to his memorial fund.
*EDIT* -- you can read more of my thoughts in this Quora answer.
The NIPS workshops have consistently been my favorite machine learning event over the past several years. This year, I was privileged to be an invited speaker at the Discrete and Combinatorial Problems in Machine Learning workshop. The guest of honor at the workshop was Kazuo Murota, who gave a great overview on his work on discrete convexity (slides here). As was usual, I jumped around between several workshops, including Bayesian Optimization, Machine Learning for Sustainability, and Machine Learning for Clinical Data and Healthcare.
Daniel Russo presented a very interesting result at the Bayesian Optimization workshop showing a very simple way of analyzing Thompson sampling based on existing analyses for UCB-style algorithms. He mainly showed results in a Bayesian regret setting, and I wonder if one could extend his result to account for the full spectrum between Bayesian expected regret and worst-case regret.
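For readers who haven't seen it, here is a minimal sketch of Thompson sampling for Bernoulli bandits with Beta(1, 1) priors. This is just the textbook algorithm, not anything specific to Russo's analysis, and all names here are my own:

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Minimal Thompson sampling for Bernoulli bandits with Beta(1,1) priors."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior Beta alpha (successes + 1)
    beta = [1] * k   # posterior Beta beta (failures + 1)
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean from each arm's posterior and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Conjugate posterior update for the played arm.
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return total_reward
```

The randomness of the posterior samples is what drives exploration, which is exactly why its regret analysis historically looked so different from the deterministic confidence bounds of UCB.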
There were many great papers at NIPS this year (more than I could get around to reading, it seems). Here are a few that I found particularly interesting:
Sequential Transfer in Multi-armed Bandit with Finite Set of Models, by Mohammad Azar, Alessandro Lazaric, and Emma Brunskill. This paper shows how to use knowledge learned from previous online-learning problems to learn faster for new online learning problems (i.e., transfer learning). They leverage a recent result on learning tensor decompositions in order to arrive at a provably data-efficient approach. Such problems are particularly relevant in many online systems which need to continuously personalize to new users and domains.
High-Dimensional Gaussian Process Bandits, by Josip Djolonga, Andreas Krause, and Volkan Cevher. Like the above-mentioned paper, this paper addresses how to transfer learning across different actions. Rather than focusing on sequential transfer learning like the above, this paper instead treats all the actions (e.g., items) and contexts (e.g., users) jointly as points in a very high-dimensional space. This paper leverages a different result, on low-rank matrix recovery. I had thought about this type of approach a while back, and am really excited that the authors got this method to work. One thing that bugs me is that the first stage requires a very restrictive random sampling technique, which makes the result less useful in practice.
Eluder Dimension and the Sample Complexity of Optimistic Exploration, by Daniel Russo, and Benjamin Van Roy. This paper proposes a new measure of complexity for analyzing online learning problems. This seems to be a refreshingly unique perspective on the problem, although the relationship with better-understood measures such as information gain is currently unclear. But it does have the potential to help us analyze more complex prediction settings.
Multi-Task Bayesian Optimization, by Kevin Swersky, Jasper Snoek, and Ryan Adams. This paper is very cool. Hyper-parameter tuning is a big problem in machine learning. This method proposes a way to use a smaller dataset to extrapolate likely outcomes when running a learning algorithm (with a certain hyper-parameter setting) on the larger actual dataset. The experiments show that it can offer SIGNIFICANT speedups compared to standard approaches.
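The core idea -- using cheap runs on a subsampled dataset to decide which hyper-parameter settings deserve a full-scale run -- can be caricatured without any of the paper's GP machinery. This sketch is my own drastic simplification, not the authors' method, and the `loss_fn(setting, frac)` interface is hypothetical:

```python
def screen_then_train(candidates, loss_fn, small_frac=0.1, keep=3):
    """Screen hyper-parameter candidates cheaply, then run only the
    most promising ones at full scale.

    `loss_fn(setting, frac)` is assumed to return validation loss after
    training on a `frac` fraction of the data (hypothetical interface).
    """
    # Cheap screening pass: rank all candidates on the small subsample.
    ranked = sorted(candidates, key=lambda s: loss_fn(s, small_frac))
    # Expensive pass: evaluate only the survivors on the full dataset.
    survivors = ranked[:keep]
    return min(survivors, key=lambda s: loss_fn(s, 1.0))
```

The paper goes much further -- it models the correlation between the small-data and full-data tasks with a multi-task GP, so information flows in both directions -- but the payoff is the same: most of the search budget is spent at the cheap fidelity.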
BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables, by Cho-Jui Hsieh, Matyas Sustik, Inderjit Dhillon, Pradeep Ravikumar, and Russell Poldrack. Estimating high-dimensional inverse covariance matrices is an increasingly important problem in machine learning. This method absolutely blows the competition out of the water.
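A tiny numerical illustration of why sparsity in the inverse (rather than the covariance itself) is the right target: for a Gaussian Markov chain, each variable is conditionally independent of the rest given its neighbors, so the precision matrix is tridiagonal even though the covariance is dense. This is a textbook fact, not something from the BIG & QUIC paper:

```python
import numpy as np

# Covariance of a stationary AR(1)-style chain: Sigma[i, j] = rho**|i - j|.
rho, d = 0.7, 6
idx = np.arange(d)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# The covariance is dense, but its inverse is (numerically) tridiagonal.
precision = np.linalg.inv(sigma)
off_band = precision[np.abs(idx[:, None] - idx[None, :]) > 1]
print(np.max(np.abs(off_band)))  # essentially zero, up to floating-point error
```

The zero pattern of the precision matrix is exactly the missing-edge structure of the underlying Gaussian graphical model, which is why sparse estimators of it scale to problems (like the paper's fMRI application) where a dense million-by-million inverse is hopeless.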
Learning Adaptive Value of Information for Structured Prediction, by David Weiss, and Ben Taskar. This paper is also very cool. It is essentially a meta-learning algorithm that trains an adaptive policy deciding which features to compute for the base classifier. For certain applications, computing features can be very expensive, and this paper shows how to achieve almost identical performance while using only a small fraction of the features. Furthermore, certain features can be harmful for certain classification instances, and the authors show how this method can learn to avoid computing such features depending on the problem instance. There's a lot about this setup that I don't quite understand from a theoretical perspective, but it seems very powerful.
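One way to picture the setup is a two-stage cascade: compute cheap features first, and pay for expensive ones only when the cheap prediction is uncertain. This is a drastic simplification with a fixed hand-set threshold (all names hypothetical), whereas Weiss and Taskar learn the policy itself:

```python
def adaptive_predict(x, cheap_model, expensive_model, expensive_features,
                     margin_threshold=0.8):
    """Binary-classification cascade: fall back to expensive features
    only when the cheap model's confidence (|score|) is low."""
    score = cheap_model(x)
    if abs(score) >= margin_threshold:
        # Confident already -- skip the expensive feature computation.
        return 1 if score > 0 else -1
    z = expensive_features(x)      # pay the feature-computation cost
    score = expensive_model(x, z)  # re-score with the richer representation
    return 1 if score > 0 else -1
```

The paper's policy is far more interesting than this: it operates over structured outputs, can add features to individual parts of the structure, and -- since it is learned per instance -- can decide that a feature would actively hurt on this input and skip it.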
Latent Structured Active Learning, by Wenjie Luo, Alex Schwing, and Raquel Urtasun. Active learning for structured prediction models is an important problem: most existing active learning results address only the unstructured setting (e.g., binary labels), and the cost issue is even more acute in structured models because acquiring structured labels is typically much more expensive than acquiring binary ones. The authors propose a method for actively eliciting labels on components of the full structured prediction problem (thus leaving much of the structure latent). One thing that's worth exploring here is a mixed-initiative approach rather than a purely system-driven approach.