There are three possible answers to the last question posed above. The first probability could either be greater than, less than, or equal to the second probability. To answer this question, one needs to consider the distribution of issued queries in a given query session, which is known to follow a heavy-tailed distribution

One of the most prevalent phenomena in our society is abundance of heavy-tailed distributions. For those unfamiliar with the concept, a heavy-tailed distribution is simply one whose tail is not exponentially bounded. Heavy-tailed distributions arise basically everywhere. For example, the distribution of phone call activities is almost always heavy-tailed for any call network (e.g., the the number of people who made X calls per month, F(X), is heavy-tailed in a dataset I'm examining). The number of citations any research paper receives also follows a heavy-tailed distribution.

Coming back to search engine users, let P

_{2}denote the probability of a user issuing a second query conditioned on already having issued one query, and let P

_{3}denote the probability of a user issuing a third query conditioned on already having issued two queries. Then, assuming that query session length follows a heavy-tailed distribution, we can conclude that P

_{3}> P

_{2}. In fact, any distribution which results in P

_{3}being equal to or less than P

_{2}must have an exponentially bounded tail.

What this means is that, the more queries a user issues, the more confident we can be that the user will continue issuing more queries in this particular search session. This certainly has implications on how we can think about automatically incorporating session-based memory into information services such as search engines. And in fact, if we can predict when the user is about to engage in a long information gathering session, then the search engine can potentially preemptively show the user a more diverse set of results (i.e., exploration) so that the user's feedback (e.g., which results she clicks on) can help the search engine produce more relevant results in down-stream queries.

This topic came up during a discussion with Paul Bennett.

## No comments:

Post a Comment