The situation can be summarized as follows.
My response: of course Bing uses that kind of data. How could they not use such a valuable resource? I actually find it surprising that Google didn't expect Microsoft to be doing this already.
Numerous studies in the literature have provided overwhelming evidence that click data is one of the most useful signals for improving search quality. If I worked at Bing, I would be pushing to make use of such data.
Many other search companies use click data on third-party search results as well, for example Surf Canyon. Surf Canyon has an installable add-on that can dynamically re-rank search results on Google, Bing, Yahoo!, and Craigslist. This re-ranker is, of course, trained in part using click data harvested via their toolbar from users issuing queries on other search engines. Surf Canyon also has a native search engine, which I expect is also optimized using click data on Google's search results gathered from their own toolbars. That is basically the exact same thing as what Bing is doing. Now, none of these other search companies comes close to the size of Bing, so maybe Google just didn't care or notice until Bing started doing this in a more obvious way.
I personally think that I should own my search logs. If I am allowed to share my usage data with any company of my choosing, then I think that's a win for everyone (except perhaps for the company currently holding a monopoly over the usage data, of course). As mentioned elsewhere, this would lower the barrier for competition and innovation.
In reality, clicks on search results make up only a small part of the equation. Suppose a Google Chrome user is sending usage data to Google. Google sees a log in which the user:
1) issued a query on Google
2) clicked on a search result
3) immediately issued the same query on Bing
4) clicked on a search result
5) browsed around on the landing website for 15 minutes
Would Google ignore this entry simply because it contains a Bing query? I think the existence of the Bing query is almost beside the point in this case, because harvesting just clicks on search results does not tell the whole story. Usage data also includes the actions users take after leaving the search results page, which can be just as valuable (see example study). Leveraging usage data of all varieties is the future, and it benefits everyone.
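To make the example concrete, here is a minimal sketch of how a log like the one above might be turned into a relevance signal. Everything in it is hypothetical: the `Event` record, the `satisfied_landing_pages` function, the 5-minute dwell threshold, and the example URLs are assumptions made for illustration, not a description of how Google, Bing, or anyone else actually processes usage data. The point it illustrates is that the signal comes from the long visit after the click, and the code never looks at which engine served the query.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One entry in a (hypothetical) browser usage log."""
    kind: str                      # "query", "click", or "dwell"
    engine: Optional[str] = None   # search engine, for "query" events
    text: Optional[str] = None     # query text, for "query" events
    url: Optional[str] = None      # clicked result, for "click" events
    seconds: float = 0.0           # time on the landing site, for "dwell" events


def satisfied_landing_pages(
    events: List[Event], min_dwell_seconds: float = 300.0
) -> List[Tuple[str, str]]:
    """Return (query, url) pairs where a click was followed by a long dwell.

    The engine that served the query is deliberately ignored.
    """
    pairs = []
    current_query = None
    last_click = None
    for event in events:
        if event.kind == "query":
            current_query = event.text
            last_click = None  # a new query starts a new result page
        elif event.kind == "click":
            last_click = event.url
        elif event.kind == "dwell":
            if (
                current_query is not None
                and last_click is not None
                and event.seconds >= min_dwell_seconds
            ):
                pairs.append((current_query, last_click))
    return pairs


# The five steps from the log above, with made-up values.
session = [
    Event("query", engine="google", text="example query"),
    Event("click", url="https://a.example/"),
    Event("query", engine="bing", text="example query"),
    Event("click", url="https://b.example/"),
    Event("dwell", seconds=15 * 60),
]

print(satisfied_landing_pages(session))
```

In this sketch the first click produces no signal because the user immediately re-issued the query, while the second click is paired with the query because of the 15-minute visit that followed. Swapping the two `engine` values would not change the result.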
3 comments:
A potential problem with learning from a competitor's results is that it can lead to "circular" learning. Let's say that Google learns from clicks on Bing results and Bing learns from clicks on Google results. This seems potentially dangerous to me, as it means that Google and Bing don't really learn anything new; they just brush up on what they already know.
That is definitely a concern, but I think we're still quite far from that point. Furthermore, if we start to integrate other forms of usage data (such as general browsing behavior), then we can incorporate signals far beyond just what results show up on Google's and Bing's results pages.
I should also add that the World Wide Web is not a static object. New content is being added all the time, which also defuses some concerns regarding "circular" learning.