Friday, January 26, 2007

The cost of search computations

I just picked up Why Choose This Book? by Read Montague as reading material on my flight back from a meeting. Coincidentally, I had been talking with several friends about the costs and benefits of the computations needed to add natural-language processing to search. Here's Montague on the cost of computation and the peculiar development of modern computers:

It is widely thought that the effort of the group at Bletchley Park saved many lives during the war. But I think it also attached to modern computing an odd legacy. The model for the code-breaker was speed and accuracy at all costs, which means loads of wasted heat. The machines did not have to decide how much energy each computation should get; instead, they simply oversupplied energy to all the computations and wasted most of it. And although we understand the urgent needs of the time, this style allowed them all to overlook a critical fact --- the amount of energy a computation “should” get is a measure of its value to the overall goal. Goals and energy allocation under those goals; these two features are generally missing from the modern model of computing today. Just as for the code-breakers, speed and accuracy are the primary constraints on modern computers.

This picture is changing radically for the search engine. The computation for each query has quantifiable costs and benefits. The costs include R&D, amortization of computing infrastructure, energy, rent, and maintenance. The direct benefit is advertising revenue from the query response; indirect benefits, such as market share gained from greater search quality, are harder to measure, but they can still be estimated. A search engine's success is ultimately determined by how efficiently it converts those costs into advertising revenue.
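To make the bookkeeping concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is invented for illustration; real per-query figures are proprietary and vary enormously by query and market.

```python
# Hypothetical per-query economics. All constants are made up
# for illustration; none come from any actual search engine.

COST_PER_QUERY = 0.003      # amortized infrastructure, energy, R&D ($)
AD_IMPRESSION_RATE = 0.5    # fraction of queries that show ads
CLICK_THROUGH_RATE = 0.02   # clicks per ad impression
REVENUE_PER_CLICK = 0.40    # average price paid by advertisers per click ($)

revenue_per_query = AD_IMPRESSION_RATE * CLICK_THROUGH_RATE * REVENUE_PER_CLICK
profit_per_query = revenue_per_query - COST_PER_QUERY

print(f"revenue/query: ${revenue_per_query:.4f}")  # $0.0040
print(f"profit/query:  ${profit_per_query:.4f}")   # $0.0010
```

Margins per query are tiny; the business works only because the query volume is enormous, which is exactly why small changes in per-query cost matter so much.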

Researchers often complain that search engines are not using (their) latest and greatest ideas in machine learning and natural-language processing. How could they be so resistant to ideas that must improve search quality?

Before I endorse their complaints, I'd want to see a good cost/benefit analysis. More elaborate algorithms cost more to develop and run. New page analysis, indexing, or retrieval methods spread their costs over the entire construction and operation of search engine facilities. It is not inconceivable that one of the fancier query processing schemes being bandied about would double storage and computation requirements. How much more advertising revenue would it have to generate just to break even with the current efficiency? And how likely is that?
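Carrying over the invented numbers from the sketch above, the break-even arithmetic for such a scheme is sobering:

```python
# Hypothetical break-even arithmetic for a scheme that doubles
# per-query cost. Figures continue the invented example above.

old_cost, old_revenue = 0.003, 0.004
cost_multiplier = 2.0               # fancier query processing
new_cost = old_cost * cost_multiplier

# To keep profit per query unchanged, revenue must rise by the added cost:
breakeven_revenue = old_revenue + (new_cost - old_cost)
print(f"revenue lift to hold profit: {breakeven_revenue / old_revenue - 1:.0%}")
# -> 75%: the scheme must lift revenue per query by 75% just to stand still.

# To keep the revenue/cost efficiency unchanged, revenue must double too:
print(f"revenue lift to hold efficiency: {cost_multiplier - 1:.0%}")  # -> 100%
```

On these made-up numbers, a doubling of cost demands a very large lift in ad revenue before the fancier scheme is even a wash.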

Unlike the Bletchley Park model of computation discussed by Montague, the search engine has no absolute accuracy or timeliness requirements. Higher accuracy or a fresher index is valuable only insofar as it leads to better ad click-through. In addition, user response to search results depends on factors beyond accuracy, such as speed and clarity. These tradeoffs can be quantified and tested experimentally.
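Such an experiment might look like the following sketch: a standard pooled two-proportion z-test on click-through rates, with invented counts standing in for a real A/B test.

```python
import math

# Hypothetical A/B test: does a costlier ranker earn its keep in
# ad click-through? All counts are invented for illustration.

control_clicks, control_queries = 19_800, 1_000_000      # current ranker
treatment_clicks, treatment_queries = 20_400, 1_000_000  # fancier ranker

p1 = control_clicks / control_queries
p2 = treatment_clicks / treatment_queries

# Pooled two-proportion z-test for the difference in click-through rates.
pooled = (control_clicks + treatment_clicks) / (control_queries + treatment_queries)
se = math.sqrt(pooled * (1 - pooled) * (1 / control_queries + 1 / treatment_queries))
z = (p2 - p1) / se

print(f"CTR lift: {p2 - p1:+.4%}, z = {z:.2f}")  # z ~ 3: statistically real
```

Note that a statistically significant lift is only the first hurdle; the measured lift still has to clear the break-even bar computed earlier before the extra computation pays for itself.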

Search engines are very interesting from this perspective because they are persistent computations whose survival depends on their ability both to pay their way and to promise future profits, so that a high stock price reduces their cost of capital for growth. They are truly embodied computations, whose meaning is ultimately assigned by their survival value.

