Techniques/Boosting
⚫ Used to increase ranking
⚫ Hypertext boosting
⚫ Term
– Relevance (one/many queries)
– Target: TF-IDF variants
⚫ Link
– Importance
– Target: inlink/outlink count
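To make the term-side target concrete, here is a minimal sketch of one TF-IDF variant (length-normalized term frequency times log(N/df)). The toy corpus and the function name `tf_idf` are illustrative assumptions, not taken from the slides; real search engines use their own variants.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """One simple TF-IDF variant (an assumption for illustration):
    tf = count / document length, idf = log(N / document frequency)."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Toy corpus: the first document repeats "cheap" to inflate its score.
corpus = [
    "cheap cameras cheap lens cheap deals".split(),
    "our customers agree we sell cameras".split(),
    "weather report for the weekend".split(),
]
stuffed = tf_idf("cheap", corpus[0], corpus)
honest = tf_idf("cheap", corpus[1], corpus)
print(f"stuffed={stuffed:.3f} honest={honest:.3f}")
```

Repeating a term raises its term frequency, which is exactly the quantity term boosting tries to manipulate.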
Techniques/Boosting/Term
⚫ Term: body, title, meta tag, anchor, URL
⚫ Body
– Simplest, most popular, as old as search engines
⚫ Title
– Search engines give a higher weight to terms that appear in the title
⚫ Meta tag
– Heavily spammed, so search engines give meta tags low priority or ignore them completely
⚫ Anchor
– Anchor text offers a summary of the pointed-to document, so it gets a higher weight
⚫ URL
– The URL of a page is broken into a set of terms, used to determine the relevance of the page
⚫ Example (meta tag, title, body):
    <html>
    <head>
    <meta name="keywords" content="buy,cheap,cameras,Lens,accessories,nikon,canon">
    <title>free,free,free, cheap</title>
    </head>
    <body>
    Our customers agree that we are the best online retailer of cameras!
    …
    </body>
    </html>
⚫ Example (anchor, URL):
    <html>
    …A great <a href="buy-canon-rebel-20d-lens-case.camerasx.com">
    free,great deals,cheap,inexpensive,cheap,free</a> store.
    </html>
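The per-field weighting described above can be sketched as follows. The weights in `FIELD_WEIGHTS` and the helper names are hypothetical and chosen only to mirror the slide (title and anchor boosted, meta keywords nearly ignored); the page text is taken from the slide's example.

```python
import re
from collections import Counter

# Hypothetical weights, for illustration only: title and anchor terms count
# more than body terms; meta keywords are almost ignored (heavily spammed).
FIELD_WEIGHTS = {"body": 1.0, "title": 3.0, "anchor": 2.0, "url": 1.5, "meta": 0.1}

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit,
    so a URL such as 'buy-canon-lens.example.com' becomes a set of terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def weighted_term_counts(fields):
    """Sum, per term, the weight of every field occurrence of that term."""
    scores = Counter()
    for field, text in fields.items():
        weight = FIELD_WEIGHTS.get(field, 0.0)
        for token in tokenize(text):
            scores[token] += weight
    return scores

page = {
    "meta": "buy,cheap,cameras,Lens,accessories,nikon,canon",
    "title": "free,free,free, cheap",
    "body": "Our customers agree that we are the best online retailer of cameras!",
    "url": "buy-canon-rebel-20d-lens-case.camerasx.com",
}
scores = weighted_term_counts(page)
print(scores.most_common(3))
```

Under these assumed weights, three repetitions of "free" in the title outweigh every honest body term, which is why the title is an attractive spam target.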
Techniques/Boosting/Link
⚫ Link: inlink, outlink
⚫ Outlink
– Dir. clone: outgoing links to well-known pages
⚫ Inlink
– Honey pot: pages that provide useful resources, BUT have links to spam pages
– Directory: directories allow webmasters to post links to their sites, maybe spam links
– Comment: post messages (containing links) to blogs, forums, wikis
– Exchange: a group of spammers set up a link exchange structure, so their sites point to each other
– Exp. domain: buy expired domains, taking advantage of the false relevance/importance conveyed by the pool of old links
– Farm: spammers can control a large number of sites and create arbitrary link structures
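A small sketch of the link-side target: counting distinct inlinks from an edge list, then showing how a link exchange inflates that count. The site names and the helper `add_link_exchange` are made up for illustration.

```python
from collections import defaultdict

def link_counts(edges):
    """Return (inlinks, outlinks): for each site, the set of distinct
    sites linking to it and the set of distinct sites it links to."""
    inlinks, outlinks = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            outlinks[src].add(dst)
            inlinks[dst].add(src)
    return inlinks, outlinks

def add_link_exchange(edges, members):
    """Hypothetical helper: every member site links to every other member."""
    extra = [(a, b) for a in members for b in members if a != b]
    return list(edges) + extra

web = [("news.example", "shop.example"), ("blog.example", "news.example")]
spam_sites = ["spam1.example", "spam2.example", "spam3.example"]

before, _ = link_counts(web)
after, _ = link_counts(add_link_exchange(web, spam_sites))
print(len(before["spam1.example"]), "->", len(after["spam1.example"]))
```

With three colluding sites, each gains two inlinks without any outside page endorsing it; a farm scales the same idea to many sites under one owner.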
Techniques/Hiding
⚫ Hiding techniques: content hiding, cloaking, redirection
⚫ Content hiding
– Text and link hiding, by color, script, or graphics
– Examples:
    <body background="red">
    <font color="red">hidden text</font>
    …
    </body>

    <div style="visibility:hidden">You can't see me!</div>

    <a href="target.html"><img src="tinyimg.gif"></a>
⚫ Cloaking
– Different web pages to users and web crawlers
⚫ Redirection
– Meta tag:
    <meta http-equiv="refresh" content="0;url=plush.com">
– Script:
    <script type="text/javascript"><!--
    location.replace("target.html")
    //--></script>
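As a rough illustration of how the patterns above might be flagged, the sketch below scans a page with Python's standard `html.parser`. The class `HidingSignals`, the function `find_hiding_signals`, and the specific checks are assumptions made for this example; they are simple heuristics, not a detection method described in the slides, and they do not cover cloaking or tiny-image links.

```python
from html.parser import HTMLParser

class HidingSignals(HTMLParser):
    """Collects simple hints of content hiding and redirection."""

    def __init__(self):
        super().__init__()
        self.background = None   # page background value, if declared
        self.signals = []
        self._script = None      # buffer for text inside <script>

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "body":
            self.background = a.get("background") or a.get("bgcolor")
        elif (tag == "font" and self.background
              and a.get("color", "").lower() == self.background.lower()):
            self.signals.append("text color matches background")
        elif tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            self.signals.append("meta refresh redirect")
        elif tag == "script":
            self._script = []
        style = a.get("style", "").replace(" ", "").lower()
        if "visibility:hidden" in style:
            self.signals.append("hidden by style")

    def handle_data(self, data):
        if self._script is not None:
            self._script.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script is not None:
            if "location.replace" in "".join(self._script):
                self.signals.append("script redirect")
            self._script = None

def find_hiding_signals(html):
    parser = HidingSignals()
    parser.feed(html)
    parser.close()
    return parser.signals

page = '''<html><head><meta http-equiv="refresh" content="0;url=plush.com"></head>
<body background="red"><font color="red">hidden text</font>
<div style="visibility:hidden">You can't see me!</div>
<script type="text/javascript">location.replace("target.html")</script>
</body></html>'''
signals = find_hiding_signals(page)
print(signals)
```

Checks like these only inspect static markup; cloaking in particular cannot be seen from a single fetch, since it depends on serving different pages to users and crawlers.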
Outline
⚫ Motivation
⚫ Introduction to Web Spam
⚫ Web Spam Taxonomy
⚫ Web Spam Detection
⚫ Conclusions