Web Graph & Link Analysis http://net.pku.edu.cn/~wbia Peng Bo (彭波) pb@net.pku.edu.cn School of Electronics Engineering and Computer Science, Peking University 9/27/2010
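Review of the previous lecture
◼ Challenges facing a crawler
  ◼ Scalable, fast, polite, robust, continuous
◼ Basic techniques for high efficiency
  ◼ Cache
  ◼ Prefetch
  ◼ Concurrency
    ◼ Multi-process / multi-threaded
    ◼ Asynchronous I/O
◼ Interesting techniques
  ◼ Bloom filter
  ◼ Consistent Hashing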
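As a reminder of how a Bloom filter can help a crawler check whether a URL has been seen, here is a minimal sketch. The class name `BloomFilter`, its parameters `num_bits` and `num_hashes`, and the use of SHA-256 with double hashing are assumptions made for illustration, not the design presented in the previous lecture.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter for a crawler's "URL already seen?" test."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        # num_bits and num_hashes are hypothetical defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing: h1 + i * h2).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, hence nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        # Set every bit selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("http://net.pku.edu.cn/~wbia")
if "http://net.pku.edu.cn/~wbia" in seen:
    print("skip: URL was probably crawled already")
```

The filter never reports a stored URL as unseen, so a crawler using it may occasionally skip a new page because of a false positive, but it will not knowingly fetch the same URL twice. That trade is what lets the whole set of visited URLs fit in a fixed-size bit array.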
[Figure: Internet backbone map. Legend: AT&T, BBN/GTE, CERFnet, Digex, PSI, Sprint, UUnet, Unknown. Labeled points: MAE-West, MAE.]
Web Graph
[Figure: TouchGraph visualization of a web graph linking company sites such as Kodak, Canon, Apple, Toshiba, Dell, Adobe Systems Incorporated, Logitech, AMD, ASUS, and Veritas Software. (c) 2003 TouchGraph LLC]
http://www.touchgraph.com/TGGoogleBrowser.html
Giant Global Graph
[Figure: one person's presence across web sites, including MySpace, LinkedIn, Pinkbike, Twitter, Meetup, Google Groups, Last.fm, Amazon, eBay, and Pandora, with links labeled by role (e.g., site owner, partner; contributor, organizer; maintains a profile).]