Google's co-founder Sergey Brin gave last night's second opening talk. Clearly Google was one of the web's preeminent services long before SOAP APIs were slapped onto it. Google climbed a mountain that few others have attempted. At Stanford, from 1995 to 1998, Google was a hodgepodge of boxen using a disk cage made of Duplos. (Actually, an off-brand called MegaBlocks. Advice to entrepreneurs: "I encourage you to find innovative ways to save money, but buy good stuff, get Lego-brand Duplos.") By 1998 Google was fielding 10,000 queries/day on 25 boxes. Today it handles 150 million queries a day on a multi-datacenter cluster that runs to tens of thousands of computers.
Brin's recipe for scaling is: commodity systems, cleverness, and lots of brute force. The result is not just sub-second response on 150 million queries a day, but -- thanks to the new APIs -- a highly available set of programmatic services. Brin showed, for example, how a Visual Basic application can incorporate Google's spell-checking by tapping into the SOAP API.
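To make that concrete, here's a minimal sketch of the kind of request that API took. The method name (doSpellingSuggestion) and namespace (urn:GoogleSearch) are as I recall them from the published GoogleSearch.wsdl, and the license key is a placeholder -- this just builds and inspects the envelope rather than actually sending it.

```python
# Sketch: construct a SOAP 1.1 envelope for Google's spelling-suggestion
# call. Method and namespace names assumed from GoogleSearch.wsdl;
# "YOUR-KEY-HERE" stands in for a real developer license key.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"

def spelling_request(key: str, phrase: str) -> str:
    """Return a SOAP envelope for a doSpellingSuggestion call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE_NS}}}doSpellingSuggestion")
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")

print(spelling_request("YOUR-KEY-HERE", "seperate"))
```

Any SOAP-aware client -- Visual Basic included, which was Brin's point -- could post an envelope like this and get back a corrected phrase.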
Even though Google has 100 Ph.D.s working the problem, Brin thinks that truly intelligent search remains an elusive goal. In the Q and A, when asked about RDF and the Semantic Web, he offered a "possibly unpopular" view. "Look, putting angle brackets around things is not a technology, by itself. I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand." When I asked about an experiment with topic-sensitive PageRank, Brin pointed out that the kinds of ambiguous queries this approach can resolve -- for example, the difference between "blues" in the musical sense and "blues" in the mental-health sense -- are in practice rare, and easily resolved by the user with follow-on querying.
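For readers who haven't met the idea: topic-sensitive PageRank biases the random surfer's "teleport" jumps toward a topic-specific set of pages instead of all pages uniformly, so the same link graph yields different rankings per topic. Here's a toy illustration -- the four-page graph and topic sets are invented, and this is of course nothing like Google's actual implementation.

```python
# Toy topic-sensitive PageRank: power iteration with a personalized
# teleport distribution. Graph and topic sets are invented examples.

def pagerank(links, teleport, damping=0.85, iters=100):
    """Rank pages; on teleport steps the surfer jumps per `teleport`."""
    n = len(teleport)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) * teleport[i] for i in range(n)]
        for src, outs in links.items():
            share = damping * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = new
    return rank

# Pages: 0 = "B.B. King", 1 = "Delta blues", 2 = "depression", 3 = "therapy"
links = {0: [1], 1: [0], 2: [3], 3: [2]}

music_rank  = pagerank(links, teleport=[0.5, 0.5, 0.0, 0.0])  # "blues" as music
health_rank = pagerank(links, teleport=[0.0, 0.0, 0.5, 0.5])  # "blues" as mood

print(music_rank, health_rank)
```

With the music-biased teleport vector, the music pages dominate the ranking; flip the vector and the mental-health pages win -- exactly the kind of disambiguation Brin was saying rarely matters in practice.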
What's next? Brin wants to tackle big problems in health, materials, transportation. The company is working on a system to simulate molecular dynamics. For a simple protein, this proceeds at the rate of one nanosecond of simulation per CPU per day -- three CPU years to crank out a microsecond of simulation. As these new computational peaks come into view, it's a given that scaling them won't be a solo effort. Leaders like Google know they'll need to leave a trail of services for others to follow.
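A quick back-of-the-envelope check of those rates, at one simulated nanosecond per CPU per day:

```python
# Sanity-check the simulation arithmetic: 1 ns of simulated time per
# CPU per day, converted to CPU-years for longer simulated spans.
DAYS_PER_YEAR = 365.25

cpu_days_per_us = 1_000            # 1 microsecond = 1,000 ns -> 1,000 CPU-days
cpu_days_per_ms = 1_000_000        # 1 millisecond = 1,000,000 ns

cpu_years_per_us = cpu_days_per_us / DAYS_PER_YEAR   # about 2.7 CPU-years
cpu_years_per_ms = cpu_days_per_ms / DAYS_PER_YEAR   # about 2,700 CPU-years

print(f"1 us of simulation: {cpu_years_per_us:.1f} CPU-years")
print(f"1 ms of simulation: {cpu_years_per_ms:.0f} CPU-years")
```

So "three CPU years" matches a microsecond of simulated time; a full millisecond would cost nearly three thousand CPU-years -- which is why Brin frames this as a brute-force, many-machine problem.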
Former URL: http://weblog.infoworld.com/udell/2002/09/19.html#a414