Sunday, February 20, 2011

Dynamic Ranked Retrieval

One of the biggest trends in web page design is the use of rich or dynamic content. In this world of Flash, AJAX, jQuery and Firefox plugins, it is quite easy to create rich interactive layouts that can efficiently and dynamically display relevant content in response to user input.

Information retrieval services have been taking advantage of these technologies by dynamically changing the search results retrieved in response to the user modifying the query or clicking on a search result. For instance, Surf Canyon is a start-up search company that allows users to expand or refine search results. Below is an example interface inspired by their design.

In the first screen capture, the interface initially displays a normal ranking of results in response to the query "SVM".

In the second screen capture, the interface displays an indented ranking in response to the user clicking on the expand button.

These indented rankings can potentially go several levels deep.

One can also imagine an alternative layout where the results are displayed in a multi-column layout, similar to the Mac Finder. The key point here is that dynamic rankings allow the user to efficiently retrieve the most relevant information (i.e. dynamic post-query personalization).

In a recent paper presented at WSDM 2011, my collaborators and I developed the first formal model (to my knowledge) for characterizing such rankings. It is a bit crude in some respects, but it is still an exciting first step. The key intuition is to treat the user experience as traversing a ranking tree.
When examining each result, the user decides whether to expand or skip that result. An expand corresponds to taking a right branch of the corresponding node in the ranking tree, and a skip corresponds to taking the left branch. In essence, the ranking tree captures all possible sequences of clicks and skips the user can make when interacting with the dynamic ranking. It is, of course, intractable to explicitly enumerate the entire tree, but we can still prove things about it. A conventional static ranking corresponds to the special case where the user always takes the left (skip) branch. The utility a user derives from a ranking tree can then be defined over the path (or distribution of paths) the user traverses in the ranking tree.
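To make the traversal concrete, here is a minimal sketch of a ranking tree in Python. The class and function names are illustrative (not from the paper): each node holds a result, a left branch taken on a skip, and a right branch taken on an expand, so a static ranking is just the chain of left branches.

```python
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class RankingNode:
    """One node in the ranking tree (names are illustrative, not from the paper)."""
    result: str
    skip: Optional["RankingNode"] = None    # left branch: user skips this result
    expand: Optional["RankingNode"] = None  # right branch: user expands this result

def traverse(node: Optional[RankingNode], actions: List[str]) -> List[str]:
    """Follow a sequence of 'skip'/'expand' actions and return the results examined."""
    path = []
    for action in actions:
        if node is None:
            break
        path.append(node.result)
        node = node.expand if action == "expand" else node.skip
    return path

# A conventional static ranking is the special case where the user
# always takes the left (skip) branch:
static = RankingNode("d1", skip=RankingNode("d2", skip=RankingNode("d3")))
print(traverse(static, ["skip", "skip", "skip"]))  # ['d1', 'd2', 'd3']
```

Each distinct sequence of actions traces out one root-to-node path, which is how the tree captures all possible interaction sequences without ever being enumerated explicitly.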

Given a modularity assumption on utility (i.e. that utility can be decomposed additively over the results), we proved that a simple greedy algorithm always performs at least as well as the optimal static ranking (i.e. dynamic personalization never hurts). Our experiments furthermore demonstrate that dynamic rankings can offer substantial gains in performance in practice. We have also made our experimental setup available for others to test with. Many thanks go to Christie Brandt, who did the lion's share of the work for this project.
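As a rough illustration of the greedy idea (a toy sketch, not the algorithm from the paper): under a modular utility, each document's contribution is additive, so at each slot the greedy choice is simply the remaining document with the highest expected utility over the distribution of user intents. The intent distribution, utility function, and document names below are all made up for the example.

```python
def greedy_ranking(docs, intents, utility, k):
    """Greedily build a k-slot ranking: at each slot, pick the remaining
    document with the highest probability-weighted utility over intents.
    Modularity (additive utility) is what makes this myopic choice sound."""
    ranking = []
    remaining = set(docs)
    for _ in range(k):
        best = max(remaining,
                   key=lambda d: sum(p * utility(d, t) for t, p in intents))
        ranking.append(best)
        remaining.remove(best)
    return ranking

# Hypothetical intents for the query "SVM" with their probabilities,
# and a hypothetical per-intent utility table:
intents = [("ml", 0.7), ("video", 0.3)]
util = {("svm-tutorial", "ml"): 1.0,
        ("svm-light", "ml"): 0.8,
        ("svm-video", "video"): 1.0}
utility = lambda d, t: util.get((d, t), 0.0)

print(greedy_ranking(["svm-tutorial", "svm-video", "svm-light"],
                     intents, utility, 3))
# ['svm-tutorial', 'svm-light', 'svm-video']
```

A dynamic ranking goes further than this static baseline: once the user expands a result, the expanded sublist can be re-scored against the narrower intent that the click revealed, which is where the gains over any static ranking come from.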

There are still many interesting directions to explore. For instance, this model only deals with extrinsic ambiguity (different users have different but narrow intents when issuing the same query), but not intrinsic ambiguity (users are simultaneously interested in many different aspects of the query). The model also doesn't capture cases where the user backtracks or scans multiple results before expanding one of them. One direction that I'm particularly interested in is modeling even richer interfaces, such as those that allow users to provide redundancy feedback. Needless to say, this is an exciting research area that I expect will continue to grow in the foreseeable future.
