Saturday, December 19, 2009

Comparing Machine Learning and Statistics

There's a tendency amongst machine learning folks to view statistics as simply more of the same, except without the computational flavor. After all, the methods employed are largely the same (at least they are nowadays, with the rising popularity of statistical machine learning), with things like logistic regression, decision trees, Gaussian processes, density estimation, bagging (bootstrap aggregation) and many other approaches each taking turns as the flavor of the month.

When competing for computer-science-style research jobs, there's likewise a tendency to view statistics grad students as handicapped by the difference in research style and the much lower publication frequency. The basic argument is the following: computer science Ph.D. students take on many smaller publishable projects (or cut their larger research agenda into smaller publishable units), which ideally distinguishes the best students from their peers; whereas statistics Ph.D. students focus more on core mathematics and detailed analysis of scientific data, which often leads to much slower publication rates and thus less information for distinguishing between higher and lower quality students.

But it now seems to me that such considerations only really matter when evaluating candidates for core research positions that push for truly automated approaches to problem solving. There are many other jobs out there that are not purely computing focused, and for such positions this handicap might not exist, or might even be flipped in the other direction.

People working in machine learning are no doubt aware that many scientists are suspicious of machine learning models due to the difficulty of interpreting said models. These scientists would generally rather work with models that are much simpler but perhaps lose a few percentage points on the performance measure used to train the machine learning models (e.g., 0/1 loss, which may or may not be that informative or crucial). Machine learning people understand this as regularization (i.e., preferring simpler models using some measure of model complexity), but in this case the regularization criterion is human interpretability.

One important reason for this is that such models are meant to be used as one tool or indicator amongst a collection of indicators for measuring quality or performance along a variety of axes. Because of the difficulty of appropriately combining these different desiderata into a single performance measure, domain experts (e.g., scientists, financial traders, marketing and campaign managers, user interface designers, etc.) would much rather have a suite of simpler models that they intuitively "know how to use".
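As a rough sketch of that trade-off (my own illustration, not something from the original argument), one could compare a depth-limited decision tree, which a domain expert can read off as a handful of rules, against a larger ensemble that typically scores a bit higher but is much harder to interpret. The snippet below assumes scikit-learn and uses a stock dataset purely as a placeholder:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # A deliberately shallow tree: a few human-readable decision rules.
    simple = DecisionTreeClassifier(max_depth=3, random_state=0)

    # A 200-tree forest: usually a bit more accurate, far less interpretable.
    flexible = RandomForestClassifier(n_estimators=200, random_state=0)

    print("depth-3 tree :", cross_val_score(simple, X, y, cv=5).mean())
    print("random forest:", cross_val_score(flexible, X, y, cv=5).mean())

If the forest only buys a point or two of accuracy, many practitioners will happily take the tree.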

Indeed, I had such an experience this past summer, when I interned with the Search Quality Evaluation Research team at Google. This team consists of a mix of developers and statisticians and is dedicated to designing and implementing new metrics (i.e., ways of measuring the quality of Google search). Given my lack of experience with this style of analysis, I found that my fellow interns (who were statistics Ph.D. students) made much faster initial progress on their projects than I did. From a holistic learning perspective, this experience was very beneficial for me, although the project ended up being a bit more stressful than I'd anticipated.

In summary:

-- Computer scientists seek to build systems and solve problems in a principled (and thus automated) fashion. From the (current) machine learning perspective, when we have the right performance objective and sufficient quantities of data, we can strive to build efficient and accurate models for this relatively well-formulated problem setting.

-- Statisticians seek to understand the world. Their models must be interpretable and explain the primary factors contributing to some observed phenomenon. It is often very difficult to arrive at a single performance measure, in which case it is usually preferable to present human practitioners with informative and intuitive models.

-- It is unclear to me that having a plethora of conference publications is actually all that useful in distinguishing between higher and lower quality students for most jobs. Due to how the economics play out, most jobs available to Ph.D. students are more applied than not, and I'm not sure the types of papers that tend to get published at machine learning conferences are adequately informative from that perspective.

-- It also seems to me that bridging this gap between statistics and machine learning will be crucial for developing general learning algorithms and intelligent systems. Machine learning currently works quite well when the problem is well specified. The challenge is to extend such learning methods to more heterogeneous and dynamic environments.
