There have been plenty of posts regarding NIPS already (see: Sebastien Bubeck, Neil Lawrence, John Langford, Paul Mineiro, and Hal Daume), with plenty of great pointers to interesting NIPS papers that I'll hopefully get around to reading soon. On my end, I didn't get a chance to see too many papers, in part because I was helping present a poster during one poster session and a demo during another. But I did very much enjoy many of the talks, especially during the workshops.
OpenAI

Perhaps the biggest sensation at NIPS was the announcement of OpenAI, which is a non-profit artificial intelligence research company with $1B in endowment donated by people such as Sam Altman, Elon Musk, Peter Thiel, and others. The core ideal of OpenAI is to promote open research in Artificial Intelligence. For the most part, not much is known about how OpenAI will operate (and from what I've gathered, the people at OpenAI haven't fully decided on a strategy yet either). One thing that I do know on good authority is that OpenAI will NOT be patenting their research.
Nonetheless, there have already been many reactions to OpenAI, from the usual "robots will steal our jobs" trope, to nuanced concerns voiced by machine learning expert Neil Lawrence, who observes that open access to data is just as important as open access to research and systems. I do very much agree with Neil's point, and I think that one of the best things OpenAI can do for the research community is to generate interesting new datasets and testbeds. There have also been concerns voiced that the founding team is overwhelmingly deep learning people. I don't think this is much of an issue at the moment, because representation learning has been the biggest practical leap forward, and giving broader access to learned representations is a great thing.
The announcement has even caught the attention of rationalists such as Scott Alexander, who voiced concerns about whether AI research should be open at all, for risk of losing control of the technology and potentially leading to catastrophic results. Scott's concern is a meta-concern about the current mentality of AI research being an arms race and institutions such as OpenAI not focusing on "controlling" access to AI that could become dangerous. These meta-concerns are predicated on the assumptions that hard takeoff of AGI is a legitimate existential threat to humanity (which I agree with), and that existing institutions such as OpenAI could directly lead to that happening (which I strongly disagree with). I realize that OpenAI ponders human-level intelligence in their opening blog post, but that's just a mission statement of sorts. For instance, Google, while awesome, has (thus far) fallen quite short of their mission to "organize the world's information and make it universally accessible and useful". Likewise, I don't expect OpenAI to succeed in their mission statement anytime soon.
Most machine learning experts probably do take an overly myopic view of machine learning progress, which is partly due to the aforementioned research arms race but also just due to how research works (i.e., it is REALLY hard to make tangible progress on something that you can't even begin to rigorously and precisely reason about). However, from what I've read, rationalist non-experts conversely tend to phrase things in such imprecise terms that it's hard to have a substantive discussion between the two communities. I imagine the "truth", such as it is, is somewhere in the middle. Perhaps one should gather both camps together for a common discussion.
What is definitely going to happen in the near term is that access to AI technologies will be an increasingly important competitive advantage. And it's great that institutions such as OpenAI will help promote open access to those technologies.
I am optimistic that the crew at OpenAI will explore alternative mechanisms for supporting research, in contrast to NSF-style funding and to how places like the Allen Institute engage in research. I think it'll be exciting to see what comes out of that process. Hopefully, OpenAI will also engage with places like the Future of Humanity Institute, and maybe even create forums that bring together people like Stuart Russell, Eric Horvitz, Scott Alexander and Eliezer Yudkowsky.
Cynthia Dwork on Universal Adaptive Data Analysis

Cynthia Dwork gave a great talk on using differential privacy to guard against overfitting when re-using a validation set multiple times. See this Science paper for more details. The basic idea is that, when you use your validation set to evaluate the performance of a model, do so in a differentially private way so that you don't overfit to the idiosyncrasies of the validation set. See, for instance, this paper describing an application to Kaggle-style competitions. This result demonstrates a great instance of (unexpected?) convergence between different areas of study: privacy-preserving computation and machine learning.
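The flavor of the mechanism is easy to sketch. Below is a minimal illustration of my own, not the authors' implementation: the holdout answer is only revealed, with Laplace noise, when it disagrees with the training estimate by more than a noisy threshold. Function name and parameter values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_validation_query(train_vals, holdout_vals, threshold=0.04, sigma=0.01):
    """Answer a validation query in a differentially private way (sketch).

    train_vals / holdout_vals: per-example scores (e.g., 0/1 correctness)
    of the current model on the training and holdout sets. The holdout is
    only consulted when it disagrees with the training estimate by more
    than a noisy threshold, so repeated adaptive queries leak little
    about the holdout's idiosyncrasies.
    """
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))
    if abs(train_mean - holdout_mean) < threshold + rng.laplace(0.0, sigma):
        # Train and holdout agree: answer from the training set and
        # reveal nothing about the holdout.
        return train_mean
    # Otherwise answer with a noised holdout estimate.
    return holdout_mean + rng.laplace(0.0, sigma)
```

The key design point is that an analyst who keeps querying models that don't actually overfit never touches the holdout at all, which is what lets the holdout be reused many times.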
Jerry Zhu on Machine Teaching

Jerry Zhu has been doing very interesting work on Machine Teaching, which he talked about at the NIPS workshop on adaptive machine learning. Roughly speaking, machine teaching is the computational and statistical problem of how to select training examples to teach a learner as quickly as possible. One can think of machine teaching as the converse of active learning, where instead of the learner actively querying for training examples, a teacher actively provides them.
Machine teaching has a wide range of applications, but the one that I'm most interested in is when the learner is a human. As models become necessarily more complex in the quest for predictive accuracy, it is important that we devise methods to keep these models somehow interpretable to humans. One way is to use a machine teaching approach to quickly show the human what concepts the trained model has learned. For instance, this approach would have applications in debugging complicated machine learning models.
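To make the contrast with active learning concrete, here is a toy sketch of my own (not from Jerry's papers) for 1-D threshold classifiers: an active learner needs roughly log2(n) binary-search queries over a pool of n points, while a teacher who already knows the true threshold can pin the learner down with just two well-chosen examples.

```python
def teach_threshold(theta, pool):
    """Teacher for the 1-D threshold classifier y = 1[x >= theta]:
    the two pool points straddling theta suffice to pin the learner down."""
    left = max(x for x in pool if x < theta)
    right = min(x for x in pool if x >= theta)
    return [(left, 0), (right, 1)]

def learn_threshold(examples):
    """Learner: midpoint of the version space consistent with the labels."""
    lo = max(x for x, y in examples if y == 0)
    hi = min(x for x, y in examples if y == 1)
    return (lo + hi) / 2.0
```

With a pool of 101 grid points, the teacher spends 2 examples where binary-search active learning would spend about 7 queries; the general machine teaching problem is exactly this kind of optimization over teaching sets, for richer learners and hypothesis classes.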
Rich Caruana on Interpretable Machine Learning for Health Care

On the flip side, Rich Caruana talked about training models that are inherently interpretable by domain experts, such as medical professionals. Of course, these models are only applicable in restricted domains, such as when there is a "sufficient" set of hand-crafted features such that a generalized additive model can accurately capture the phenomenon of interest. The approach was applied to two settings: predicting the risk of pneumonia and 30-day re-admission.
One interesting consequence of this study was that these interpretable models could be used to tease out biases in the data collection process. For instance, the model predicted that patients with asthma are at lower risk of dying from pneumonia. Consulting with medical experts revealed that, historically, patients with asthma are more closely monitored for signs of pneumonia and so the disease is detected much earlier than for the general populace. Nonetheless, it's clear that one wouldn't want a predictive model to predict a lower risk of pneumonia for patients with asthma -- that was simply a consequence of how the historical data was collected. See this paper for details.
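The model class involved (a generalized additive model) is simple enough to sketch in a few lines. Below is a minimal backfitting loop of my own, not the actual code from the paper: each shape function is a step function over quantile bins, and the per-feature plots of f_j are precisely what lets a domain expert spot artifacts like the asthma effect above.

```python
import numpy as np

def fit_gam(X, y, n_bins=10, n_iters=20):
    """Fit a tiny generalized additive model by backfitting (sketch).

    Each feature j gets a step-function shape f_j over quantile bins, and
    the prediction is y_hat = mean(y) + sum_j f_j(x_j). Each f_j can be
    plotted on its own, which is what makes the model readable.
    """
    n, d = X.shape
    intercept = y.mean()
    # Assign each value to a quantile bin, per feature.
    bins = []
    for j in range(d):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))[1:-1]
        bins.append(np.searchsorted(edges, X[:, j]))
    f = np.zeros((d, n_bins))
    for _ in range(n_iters):
        for j in range(d):
            # Residual with feature j's contribution held out.
            resid = y - intercept - sum(f[k][bins[k]] for k in range(d) if k != j)
            for b in range(n_bins):
                mask = bins[j] == b
                if mask.any():
                    f[j, b] = resid[mask].mean()
            f[j] -= f[j].mean()  # keep each shape centered, for identifiability
    return intercept, f, bins
```

The models in the paper are considerably more refined (including pairwise interaction terms), but the interpretability mechanism is the same: the learned per-feature curves are inspected directly.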
Zoubin Ghahramani on Probabilistic Models

Zoubin Ghahramani gave a keynote talk on probabilistic models. During this deep learning craze, it's important to keep in mind that properly quantifying uncertainty is often a critical component as well. We are rarely given perfect information, and so we can rarely make perfect predictions. In order to make informed decisions, our models should produce calibrated probabilities so that we can properly weigh different tradeoffs. Recall that one of the critical aspects of the Jeopardy! winning IBM Watson machine was being able to properly calibrate its own confidence in the right answer (or question). Another point that Zoubin touched on was rational allocation of computational resources under uncertainty. See also this great essay on the interplay between machine learning and statistics by Max Welling.
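Calibration is easy to check empirically. Here is a minimal sketch of my own (not from Zoubin's talk): bin predictions by confidence, compare each bin's average predicted probability to its empirical positive rate, and summarize the gap with the expected calibration error.

```python
import numpy as np

def calibration_table(probs, labels, n_bins=10):
    """Bin predictions by confidence; report each bin's mean predicted
    probability, its empirical positive rate, and its count."""
    idx = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((probs[mask].mean(), labels[mask].mean(), int(mask.sum())))
    return rows

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between confidence and accuracy:
    near zero for a well-calibrated model."""
    rows = calibration_table(probs, labels, n_bins)
    n = sum(c for _, _, c in rows)
    return sum(c / n * abs(conf - acc) for conf, acc, c in rows)
```

A model that says "70%" should be right about 70% of the time in that bin; when it isn't, its probabilities can't be trusted for weighing tradeoffs, no matter how good its accuracy is.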
Interesting Papers

As I mentioned earlier, I didn't get a chance to check out too many posters, but here are a few that I did see which I found quite interesting.
Generalization in Adaptive Data Analysis and Holdout Reuse
by Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth
This paper generalizes previous work on adaptive data analysis by: 1) allowing queries to the validation set to be adaptive to the results of previous queries, and 2) providing a more general definition of adaptive data analysis.
Logarithmic Time Online Multiclass Prediction
by Anna Choromanska, John Langford
This paper studies how to quickly construct multiclass classifiers whose running time is logarithmic in the number of classes. This approach is especially useful for settings where the number of classes is enormous, which is also known as Extreme Multiclass Classification.
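The logarithmic-time idea can be sketched with a balanced tree over the labels. This is a toy sketch of the data structure only: the node scorers below are random placeholders (the paper's contribution is learning both the tree structure and the node predictors online), and all names are mine.

```python
import numpy as np

class LabelTree:
    """Balanced binary tree over the classes; each internal node holds a
    linear scorer whose sign routes an example left or right, so a
    prediction costs O(log K) dot products instead of O(K)."""

    def __init__(self, classes, dim, rng):
        self.classes = list(classes)
        self.w = self.left = self.right = None
        if len(self.classes) > 1:
            self.w = rng.standard_normal(dim)  # placeholder scorer, not learned
            mid = len(self.classes) // 2
            self.left = LabelTree(self.classes[:mid], dim, rng)
            self.right = LabelTree(self.classes[mid:], dim, rng)

    def predict(self, x):
        node, hops = self, 0
        while node.w is not None:
            node = node.left if x @ node.w < 0 else node.right
            hops += 1
        return node.classes[0], hops
```

With K = 1024 classes, any prediction takes exactly 10 routing decisions; the hard part, which the paper tackles, is making those decisions accurate.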
Spatial Transformer Networks
by Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
This paper studies how to incorporate more invariances into convolutional neural networks beyond just shift invariance. The most obvious cases are invariance to rotation and skew. See also this post.
Optimization as Estimation with Gaussian Processes in Bandit Settings
by Zi Wang, Bolei Zhou, Stefanie Jegelka
A preliminary version of this paper was presented at the Women in Machine Learning Workshop at NIPS, and will be formally published at AISTATS 2016. This is a really wonderful paper that unifies, to some extent, two of the most popular views in Bayesian optimization: UCB-style bandit algorithms and probability of improvement (PI) algorithms. One obvious future direction is to also unify with expected improvement (EI) algorithms as well.
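To see the two views side by side, here is a minimal 1-D Bayesian optimization sketch of my own (not the paper's algorithm): a small GP posterior plus both a UCB-style and a PI-style acquisition rule. Kernel choice and parameter values are illustrative.

```python
import math
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel on 1-D inputs (unit signal variance)."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, grid, jitter=1e-6):
    """GP posterior mean and stddev on a grid, given observations (X, y)."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, grid)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(1.0 - np.einsum('ij,ij->j', Ks, sol), 1e-12, None)
    return mu, np.sqrt(var)

def ucb_choice(mu, sigma, beta=2.0):
    """Bandit view: optimism in the face of uncertainty."""
    return int(np.argmax(mu + beta * sigma))

def pi_choice(mu, sigma, best, xi=0.01):
    """Probability-of-improvement view: chance of beating the incumbent."""
    z = (mu - best - xi) / sigma
    phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    return int(np.argmax(phi))
```

Both rules trade off the posterior mean against the posterior uncertainty, just with different functional forms, which is what makes a unified analysis plausible.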
Fast Convergence of Regularized Learning in Games
by Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire
This paper won a best paper award at NIPS, and analyzed the setting of learning in a repeated game. Previous results showed a regret convergence rate of O(T^{-1/2}), and this paper demonstrates an asymptotic improvement to O(T^{-3/4}) for individual regret and O(T^{-1}) for the sum of utilities.
Data Generation as Sequential Decision Making
by Philip Bachman, Doina Precup
This paper takes the view of sampling from sequential generative models as sequential decision making. For instance, can we view sequential sampling as a Markov decision process? In particular, this paper focuses on the problem of data imputation, or filling in missing values. This style of research has been piquing my interest recently, since it can offer the potential to dramatically speed up computation when sampling or prediction can be very computationally intensive.
Sampling from Probabilistic Submodular Models
by Alkis Gotovos, S. Hamed Hassani, Andreas Krause
Andreas's group has been working on a general class of probabilistic models called log-submodular and log-supermodular models. These models generalize models such as determinantal point processes. This paper studies how to do inference on these models via MCMC sampling, and establishes conditions for fast mixing.
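The sampler itself is simple to sketch. Below is a minimal Gibbs sampler of my own for p(S) ∝ exp(F(S)), using a facility-location function as the submodular F purely for illustration (the paper's analysis covers a much broader class):

```python
import numpy as np

def facility_location(S, W):
    """Submodular set function F(S) = sum_i max_{j in S} W[i, j]."""
    if not S:
        return 0.0
    return float(W[:, sorted(S)].max(axis=1).sum())

def gibbs_sample(W, n_sweeps, rng):
    """Gibbs sampler for the log-submodular model p(S) ∝ exp(F(S)):
    sweep over the ground set, resampling each element's membership
    from its conditional distribution given the rest of the set."""
    n = W.shape[1]
    S = set()
    for _ in range(n_sweeps):
        for e in range(n):
            gain = facility_location(S | {e}, W) - facility_location(S - {e}, W)
            p_in = 1.0 / (1.0 + np.exp(-gain))  # P(e in S | rest of S)
            S = (S | {e}) if rng.random() < p_in else (S - {e})
    return S
```

Each conditional only needs the marginal gain of one element, which is cheap; the paper's contribution is characterizing when such chains mix fast.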
The Self-Normalized Estimator for Counterfactual Learning
by Adith Swaminathan, Thorsten Joachims
This paper addresses a significant limitation of previous work on counterfactual risk minimization, which is overfitting to hypotheses that match or avoid the logged (bandit) training data, which the authors call propensity overfitting. The authors propose a new risk estimator which deals with this issue.
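The estimator is simple to state; here is a sketch (names mine). Vanilla inverse-propensity-score estimation divides by n and can fly far outside the observed reward range, which is what rewards hypotheses that merely chase or dodge the logged actions; the self-normalized version divides by the total importance weight and always stays inside the reward range.

```python
import numpy as np

def ips_estimates(rewards, target_probs, logging_probs):
    """Estimate a target policy's value from logged bandit data.

    rewards: observed rewards of the logged actions.
    target_probs / logging_probs: probability each logged action had
    under the target policy and under the logging policy.
    """
    w = target_probs / logging_probs  # importance weights
    vanilla = float(np.mean(w * rewards))
    self_normalized = float(np.sum(w * rewards) / np.sum(w))
    return vanilla, self_normalized
```

For example, two logged rewards of 1.0 with importance weights of 9 give a vanilla estimate of 9.0, which is nonsense for rewards bounded by 1, while the self-normalized estimate is exactly 1.0.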