I've been spending some time recently thinking about the problem of running online experiments for eliciting user preferences on search engines. The setup is pretty straightforward:
1. Our search engine has a pool of K retrieval functions (a retrieval function returns a ranking of results given an input query).
2. We want to figure out which of these is the best one. The way we elicit preferences from users is by showing users an interleaving of two rankings (computed from two different retrieval functions). Then we can interpret users' clicks as a preference vote for one of the two retrieval functions.
3. However, we do pay an exploration penalty because we might be subjecting users to suboptimal results as we're performing our online interleaving experiments. Since the goal is to ultimately provide the best possible service to our users overall the lifetime of the search engine, there's an exploration versus exploitation trade-off at play here.
The mathematical framework I've been working on to model this setting is called the Dueling Bandits Problem. The linked to paper, we presented an algorithm that is optimal in the worst case under relatively mild assumptions.
Now, there are two ways to make interesting refinements to a problem statement (i.e., to consider a special case of the original problem). Not surprisingly, they are to make the problem easier or harder. Problems become easier when the refinement introduces additional structure that removes difficult to solve cases -- this is in my experience the more common setting. Problems become more difficult when the refinement restricts the strategy space with which you can design an algorithmic solution. This is especially pertinent when the previously known algorithm that solves the original problem is no longer feasible with the restriction of the strategy space.
For example, we might know a similarity metric on our set of candidate retrieval functions. For example, all the retrieval functions that pay special attention to navigational queries could be "closer" to each other than retrieval functions that do not. This might allow us to remove various worst case scenarios and prove stronger performance guarantees. Such a refinement would make the problem easier.
In the original statement of the Dueling Bandits Problem, our decision algorithm is allowed to spend as much time exploring as necessary. This is actually an undesirable property for many search engines, due to management and infrastructure issues. For example, we may only want to allocate a fixed amount of exploration budget, after which we must decide on which retrieval function to commit to for our production system. Such a refinement would make the problem harder, since it restricts the strategy space of our decision algorithm.
In this case, it seems that making restrictions (that make the problem harder) motivated by real-world concerns is the more interesting direction to pursue, since otherwise these results are much more likely to be confined in pure theory-land.