Monday, May 02, 2011

SIGIR 2011 Tutorial: Practical Online Retrieval Evaluation

Filip Radlinski and I will be teaching a tutorial titled "Practical Online Retrieval Evaluation" later this year at SIGIR 2011. The tutorial is currently scheduled for the afternoon session on July 24th, 2011. A list of all SIGIR tutorials can be found here.

Tutorial Description:

Online evaluation is amongst the few evaluation techniques available to the information retrieval community that are guaranteed to reflect how users actually respond to improvements developed by the community. Broadly speaking, online evaluation refers to any evaluation of retrieval quality conducted while observing user behavior in a natural context. However, it is rarely employed outside of large commercial search engines, due primarily to a perception that it is impractical at small scales. The goal of this tutorial is to familiarize information retrieval researchers with state-of-the-art techniques for evaluating information retrieval systems based on natural user clicking behavior, as well as to show how such methods can be practically deployed. In particular, our focus will be on demonstrating how the Interleaving approach and other click-based techniques contrast with traditional offline evaluation, and how these online methods can be effectively used in academic-scale research. In addition to lecture notes, we will also provide sample software and code walk-throughs to showcase the ease with which Interleaving and other click-based methods can be employed by students, academics, and other researchers.
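To give a flavor of the kind of code walk-through we have in mind, here is a minimal Python sketch of the team-draft variant of interleaving. The function names, the fixed result-list length, and the simple click-crediting helper are illustrative assumptions for this post, not the tutorial's actual sample code:

    import random

    def team_draft_interleave(ranking_a, ranking_b, length=10):
        # Rankers A and B take turns "drafting" their highest-ranked document
        # that is not already in the combined list. The team with fewer picks
        # goes next; ties are broken by a coin flip.
        a, b = list(ranking_a), list(ranking_b)
        interleaved, teams = [], []
        count_a = count_b = 0
        while len(interleaved) < length and (a or b):
            a_turn = count_a < count_b or (count_a == count_b and random.random() < 0.5)
            if a_turn and a:
                source, team = a, 'A'
            elif b:
                source, team = b, 'B'
            else:
                source, team = a, 'A'
            doc = source.pop(0)
            if doc in interleaved:
                continue  # skip documents already contributed by the other team
            interleaved.append(doc)
            teams.append(team)
            if team == 'A':
                count_a += 1
            else:
                count_b += 1
        return interleaved, teams

    def credit_clicks(teams, clicked_positions):
        # Attribute each click to the team that contributed the clicked result.
        wins_a = sum(1 for p in clicked_positions if teams[p] == 'A')
        wins_b = sum(1 for p in clicked_positions if teams[p] == 'B')
        return wins_a, wins_b

In a deployment, the team labels would be logged alongside each impression so clicks can be credited per query; aggregating those credits over many queries indicates which ranker users prefer.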

Topics include:
  • Overview of online evaluation
  • Collecting usage data: How to be their search engine (with code walk-through)
  • The Interleaving approach for click-based evaluation
  • Practical issues in deploying Interleaving experiments (with code walk-through)
  • Analyzing and interpreting Interleaving results
  • Quantitative comparison of Interleaving with other evaluation methods (both online and offline)
  • Tricky issues, extensions, and limitations
  • From evaluation to optimization: Deriving reliable training data from user feedback