Near-optimal Monitoring of Online Data Sources

Google TechTalks
July 27, 2006

Ryan Peterson

ABSTRACT
Crawling the Web for interesting and relevant changes has become increasingly difficult due to the abundance of frequently changing information. Common techniques for this problem rely on heuristics, which provide no performance guarantees and tend to be tailored to specific scenarios or benchmarks.

In this talk, I will present a principled approach based on mathematical optimization for monitoring high-volume online data sources. We have built and deployed a distributed system called Corona that enables clients to subscribe to Web pages and notifies clients of updates asynchronously via instant messages. Corona assigns…