Ever try to do some information gathering on a topic (e.g., buying a car, finding a good restaurant, researching some historical event), only to be frustrated by the inane way commercial search engines just spit out ranking after ranking of useless results? One reason this happens is that existing systems, while they can make static predictions such as rankings very efficiently (witness Google Instant), ignore contextual information as users interact with the system.
Suppose I'm looking for that perfect restaurant for a first date. I might go on Yelp and do a few searches with some typical keywords (e.g., "romantic, $$$, good wine"). Sure, Yelp does a reasonable job there, if the end goal is just to efficiently provide a decent ranking of restaurants for the query.
But I care about so much more than that. I want to understand the ambiance (I want something romantic but also lively), parking issues (I hate not being able to find parking), service quality (I drink a lot of water, and hate having my glass go unrefilled), and types of food (I prefer having healthier options). Essentially, I want to interact with someone who knows the wisdom of the crowds (e.g., the reviews) and have that person help me quickly find what I'm looking for.
How would a restaurant/dating expert answer these questions? Such an expert would probably have a few top-notch (and diverse) recommendations in mind, and help this aspiring young romantic zero in on the right selection. If there turn out to be some previously unrevealed constraints (maybe I want it to be close to the lakefront, so we can go for a stroll afterwards), then the expert can adapt accordingly.
Basically, if we want to achieve the same type of efficiency with an information system, we'd need algorithms that can predict a dynamic plan for the entire user session.
Given the rapid increases in computational resources and the efficiency of machine learning algorithms, it's worth asking what would be possible if one could run sophisticated learning and inference algorithms in (near) real-time. Could one design algorithms and systems that can efficiently predict good dynamic plans?
One of the major strengths of more sophisticated learning algorithms is that they can train more flexible models. Sure, computers can play chess in real time, but chess is actually a very rigid environment with limited ambiguity.
The most powerful learning algorithms in our toolkit are supervised learning algorithms -- those that train models whose predictions most closely match some target labels. Historically, these target labels have been acquired through manual annotation (e.g., a bunch of experts in some domain label a document as good or bad). This is obviously expensive and limits the scope and applicability of such approaches. Commercial search companies employ raters through interfaces similar to Amazon Mechanical Turk to collect millions of these types of annotations, but that is not a viable solution for most applications. Furthermore, it would be wildly intractable to collect sufficient training data to cover all contexts and use cases.
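To make the supervised setup above concrete, here is a minimal sketch in which the manual annotations are simulated: the feature vectors, the hidden "annotator" preference, and the pointwise logistic-regression ranker are all hypothetical stand-ins, not how any commercial system actually works.

```python
import numpy as np

# Hypothetical documents as feature vectors, with good/bad labels that a
# simulated annotator would assign (a hidden linear preference).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # fake document features
true_w = np.array([2.0, -1.0, 0.5])      # hidden annotator preference
y = (X @ true_w > 0).astype(float)       # simulated good/bad annotations

# Train a pointwise logistic-regression "ranker" by gradient descent on
# log loss, i.e., make predictions match the target labels.
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))       # predicted P(label = good)
    w -= 0.1 * (X.T @ (p - y)) / len(y)  # gradient step on log loss

accuracy = np.mean((X @ w > 0) == (y == 1))
```

The catch, as noted above, is that every label here had to come from somewhere; covering all contexts this way is hopeless, which motivates collecting feedback inside the session instead.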
But suppose a system could collect feedback dynamically throughout the session. For example, suppose I could flag a few results as "good" or "bad" and tell Google or Yelp to (instantly) re-train the ranker and give me a new ranking using that feedback. If the results look better but not perfect, I can unflag some of the earlier results or maybe flag some of the new results, rinse and repeat. And, of course, many HCI researchers are interested in designing new ways of exploring information (e.g., Microsoft Pivot), and that can lead to different ways for users to provide feedback to the system.
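The flag-and-re-rank loop is essentially classic relevance feedback. A minimal sketch using the Rocchio update (the documents, query vector, and weights below are toy assumptions, not a real search index):

```python
import numpy as np

def rocchio_update(query_vec, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    # Rocchio relevance feedback: pull the query toward the centroid of
    # flagged-good results and push it away from flagged-bad ones.
    q = alpha * query_vec
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q

def cosine_scores(docs, query_vec):
    # Cosine similarity of each document to the (updated) query.
    return docs @ query_vec / (
        np.linalg.norm(docs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )

# Toy example: four documents and a query in a 3-term vector space.
docs = np.array([[1.0, 0, 0], [0, 1.0, 0], [0.9, 0.1, 0], [0, 0, 1.0]])
query = np.array([0.5, 0.5, 0.0])

# The user flags doc 0 as good and doc 1 as bad; re-rank instantly.
new_query = rocchio_update(query, docs[[0]], docs[[1]])
new_scores = cosine_scores(docs, new_query)
```

Each round of flags just re-runs the update, which is cheap enough to do per keystroke; richer models could replace the centroid update without changing the interaction loop.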
The key thing to note in the above scenario is that the system receives feedback that is directly tied to the specific usage context (e.g., one search session). As such, the system doesn't have to learn to optimize for every possible context a priori, so long as it can react quickly and intelligently to the real-time feedback the user is providing.
One important modeling or theoretical question here is: given that I expect the user to flag some results as good or bad, to what extent should the system choose its initial recommendations to maximize information gain? If we think the user is willing to go through, say, three rounds of feedback, then maybe we should only care about providing the most relevant results after the third round, and spend the first two rounds showing results whose feedback would cut the version space as much as possible. On the other hand, suppose that for some queries we're very confident the user is only interested in one of a few results. In that case, maybe we should just show those results. This is not that different from the restaurant/dating expert deciding between asking more clarifying questions about your preferences or making a few recommendations based on her understanding of what you want. But how does one quantify this trade-off between exploration and exploitation in a way that is both programmable and an appropriate characterization of user utility?
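One way to make that trade-off programmable is to score each candidate result by a weighted sum of expected information gain (how much the user's flag would shrink our uncertainty about their intent) and expected relevance. The toy model below is purely illustrative: the four "intents", the relevance table, and the single trade-off weight are all invented for the sketch.

```python
import math

# Hypothetical setup: four equally likely user intents, and three candidate
# results whose good/bad flag is fully determined by the intent.
relevance = {                 # relevance[doc][intent] = would the user flag it good?
    "X": [1, 1, 0, 0],        # splits the intent space in half (informative)
    "Y": [1, 0, 0, 0],        # good under only one intent
    "Z": [1, 1, 1, 1],        # good under every intent (safe but uninformative)
}
posterior = [0.25, 0.25, 0.25, 0.25]   # uniform belief over the four intents

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

def expected_info_gain(doc):
    # Expected entropy reduction over intents after observing the user's flag.
    h = entropy(posterior)
    gain = 0.0
    for flag in (0, 1):
        mass = sum(p for p, r in zip(posterior, relevance[doc]) if r == flag)
        if mass == 0:
            continue
        cond = [p / mass for p, r in zip(posterior, relevance[doc]) if r == flag]
        gain += mass * (h - entropy(cond))
    return gain

def expected_relevance(doc):
    return sum(p * r for p, r in zip(posterior, relevance[doc]))

def score(doc, lam):
    # lam = 1: pure exploration (cut the version space);
    # lam = 0: pure exploitation (show what we think the user wants).
    return lam * expected_info_gain(doc) + (1 - lam) * expected_relevance(doc)
```

With lam = 1 the system shows "X" (one bit of information per flag); with lam = 0 it shows "Z". The hard open question in the paragraph above is precisely how to set lam, and whether a fixed weight is even the right characterization of user utility.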
In any case, it looks like real-time and interactive machine learning is on the horizon. And when it arrives, I expect it to be here to stay.