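Size of the Web
◼ Estimate a lower bound on the size of the indexed Web by analyzing the overlap between search engines (Steve Lawrence and C. Lee Giles, 1998*)
◼ Select 6 popular search engines and assume that the sets of pages they index are independent
◼ Sampling: probe these search engines with 575 queries and analyze the overlap between their results
◼ Use the overlap to estimate how much of the indexable Web each search engine covers
◼ Use the known page count of one search engine to estimate the size of the whole Web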
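Overlap analysis
◼ n_a, n_b: number of pages indexed by engines a and b; n_0: number of pages indexed by both; N: size of the indexable Web
◼ P(a) = n_a / N = n_0 / n_b (under independence, the fraction of the Web covered by a equals the fraction of b's pages that a also indexes)
◼ N = n_a * n_b / n_0
◼ At the time of the test, HotBot reported having indexed 110 million pages
◼ This gives a lower bound of 320 million pages on the size of the indexable Web (1998)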
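The overlap estimate can be illustrated with a minimal sketch. The function name `estimate_total_size` and the toy page sets are assumptions made for illustration only; they are not data from the study. The two toy engines are built to be independent of each other, which is exactly the assumption the method relies on.

```python
def estimate_total_size(n_a: int, n_b: int, n_0: int) -> float:
    """Estimate N = n_a * n_b / n_0 from two index sizes and their overlap.

    Assumes the two engines index pages independently of each other.
    """
    if n_0 <= 0:
        raise ValueError("the engines must share at least one page")
    return n_a * n_b / n_0


# Toy universe of 1000 page ids (made-up data, not from the study).
engine_a = {p for p in range(1000) if p % 2 == 0}   # indexes 500 pages
engine_b = {p for p in range(1000) if p % 5 == 0}   # indexes 200 pages
overlap = engine_a & engine_b                       # 100 pages in both

estimate = estimate_total_size(len(engine_a), len(engine_b), len(overlap))
print(estimate)  # 1000.0, the true size of the toy universe

# The two figures quoted on the slide (110 million indexed, 320 million
# lower bound) imply a coverage P(a) of roughly one third for HotBot.
implied_coverage = 110e6 / 320e6
print(round(implied_coverage, 2))  # 0.34
```

When real engines are positively correlated (they tend to index the same popular pages), n_0 is larger than independence would predict, so the estimate is too small. That is why the result is read as a lower bound.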
How connected is the Web?
Shape of the Web
◼ A large-scale study (AltaVista crawls) reveals interesting properties of the Web (Andrei Broder, 1999)
◼ The study covered 200 million nodes and 1.5 billion links
◼ Some parts are unreachable from others; some are connected only by long paths
◼ The study found a bow-tie structure
Bow-tie Components
◼ Strongly Connected Component (SCC): the core
◼ Upstream (IN): pages that can reach the core, but the core can't reach IN
◼ Downstream (OUT): pages reachable from the core, but OUT can't reach the core
◼ Disconnected components
◼ Tendrils & Tubes
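The component definitions above can be sketched on a small directed graph. This is an illustrative sketch, not the procedure used in the study: the helper names (`reachable`, `reverse`, `bow_tie`) and the toy graph are hypothetical, and the sketch assumes a seed node already known to lie in the core. Tendrils, tubes, and disconnected pages are lumped together as `OTHER`, since separating them needs further reachability checks.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start by following edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every edge flipped; every node gets a key."""
    rev = {}
    for src, dsts in graph.items():
        rev.setdefault(src, [])
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph, core_seed):
    """Split nodes into SCC, IN, OUT and OTHER relative to core_seed.

    core_seed is assumed to be a node inside the core.
    """
    rev = reverse(graph)
    nodes = set(rev)                         # all sources and destinations
    forward = reachable(graph, core_seed)    # nodes the core can reach
    backward = reachable(rev, core_seed)     # nodes that can reach the core
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": nodes - forward - backward,
    }


# Toy graph (made-up): b and c form the core, a is upstream, d is downstream,
# t hangs off a as a tendril, and e -> f is disconnected from the rest.
toy_graph = {"a": ["b", "t"], "b": ["c"], "c": ["b", "d"], "e": ["f"]}
parts = bow_tie(toy_graph, "b")
print(parts)
```

Two breadth-first searches from a single core node are enough here because a node belongs to the same strongly connected component as the seed exactly when it is reachable from the seed in both the original and the reversed graph.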