Search Engines
Architecture

Indexing Process
[Figure: indexing process architecture. Components shown: text acquisition, document data store, text transformation, index creation, index. Example inputs: e-mail, web pages, news articles, memos, letters.]

Indexing Process
▪ Text acquisition
  ▪ identifies and stores documents for indexing
▪ Text transformation
  ▪ transforms documents into index terms
▪ Index creation
  ▪ takes index terms and creates data structures (indexes) to support fast searching
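
The three indexing stages can be sketched end to end in a few lines. This is a minimal illustration, not how any particular search engine is built: the function names (`acquire`, `transform`, `create_index`), the sample documents, and the choice of lowercased word tokens as index terms are all assumptions made for the example.

```python
import re
from collections import defaultdict


def acquire(raw_docs):
    """Text acquisition: store each document under a numeric id (assumed scheme)."""
    return {doc_id: text for doc_id, text in enumerate(raw_docs)}


def transform(text):
    """Text transformation: turn a document into index terms.

    Here an index term is simply a lowercased run of letters or digits.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def create_index(store):
    """Index creation: build an inverted index, term -> sorted list of doc ids."""
    postings = defaultdict(set)
    for doc_id, text in store.items():
        for term in transform(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents.
store = acquire([
    "Web pages and news articles",
    "Memos, letters, and e-mail",
    "News about web search",
])
index = create_index(store)
print(index["news"])  # [0, 2]
print(index["and"])   # [0, 1]
```

The inverted index is what makes lookup fast: finding the documents that contain a term is a single dictionary access instead of a scan over every stored document.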

Query Process
[Figure: query process architecture. Components shown: user interaction, ranking, evaluation, document data store, index, log data.]

Query Process
▪ User interaction
  ▪ supports creation and refinement of query, display of results
▪ Ranking
  ▪ uses query and indexes to generate ranked list of documents
▪ Evaluation
  ▪ monitors and measures effectiveness and efficiency (primarily offline)
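
The ranking step, together with the log data that offline evaluation would later draw on, can be sketched as follows. The scoring rule (sum of term frequencies), the sample documents, and the names `rank` and `query_log` are assumptions for illustration only; real ranking functions are considerably more elaborate.

```python
from collections import Counter, defaultdict

# Hypothetical document data store.
docs = {
    0: "web crawlers follow links to find web pages",
    1: "ranking uses the query and indexes",
    2: "a query log records each query",
}

# Index: term -> {doc_id: term frequency in that document}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

# Log data: every query is recorded so it can be analysed offline.
query_log = []


def rank(query):
    """Return (doc_id, score) pairs, best first.

    The score is the summed frequency of the query terms in the document;
    ties are broken by the lower doc id.
    """
    query_log.append(query)
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


results = rank("query log")
print(results)  # [(2, 3), (1, 1)]
```

Document 2 ranks first because it contains "query" twice and "log" once, while document 1 contains "query" only once.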

Details: Text Acquisition
▪ Crawler
  ▪ Identifies and acquires documents for search engine
  ▪ Many types: web, enterprise, desktop
  ▪ Web crawlers follow links to find documents
    ▪ Must efficiently find huge numbers of web pages (coverage) and keep them up-to-date (freshness)
    ▪ Single site crawlers for site search
    ▪ Topical or focused crawlers for vertical search
  ▪ Document crawlers for enterprise and desktop search
    ▪ Follow links and scan directories
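
The link-following behaviour of a web crawler can be sketched as a breadth-first traversal. To keep the example runnable without network access, it crawls a hypothetical in-memory site (`PAGES`) instead of fetching real URLs; the `crawl` function and its `fetch` parameter are likewise assumptions, and the sketch leaves out politeness delays, robots.txt handling, and freshness checks.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory site standing in for real web pages.
PAGES = {
    "/index": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "/b": '<a href="/index">home</a>',
    "/c": "no links here",
    "/orphan": '<a href="/index">home</a>',
}


def crawl(seed, fetch=PAGES.get):
    """Follow links breadth-first from seed; return {url: html} for pages found."""
    seen = {seed}
    frontier = deque([seed])
    store = {}
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page does not exist, skip it
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                frontier.append(link)
    return store


crawled = crawl("/index")
print(list(crawled))  # ['/index', '/a', '/b', '/c']
```

The page `/orphan` is never acquired because no crawled page links to it, which is the coverage problem in miniature: a link-following crawler only finds what is reachable from its seeds.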