Case 1:08-cv-03139-RRM-RER Document 51-6 Filed 03/17/10 Page 11 of 47
Google and generate revenue on either a per-click or a per-thousand-ads-displayed basis. Since we are interested in click fraud, we will limit our considerations to clicks and to the PPC payment method.

AdSense was launched in March 2003 and constituted the second major milestone in Google’s PPC advertising model, generating significant additional revenues for the company. There are two ways for publishers to participate in the AdSense program:

• AdSense for Search (AFS): publishers allow Google to place its ads on their websites when the user does keyword-based searches on their sites. In other words, as a result of a search, relevant ads are displayed as links sponsored by Google, and these links are produced using the same methods as on Google.com. Examples of such publishers include AOL and EarthLink. Moreover, the search results pages containing the ads are customizable to fit the publisher’s site theme and may have a different “flavor” than the ads on Google.com.

• AdSense for Content (AFC): the system that automatically delivers targeted ads to the publisher’s web pages that the user is visiting. These ads are based on the content of the visited pages, geographical location and some other factors. These ads are usually preceded by the statement “Ads by Google.” Google has developed methods for matching the ads to the content of the pages that also take into account the CPC values when selecting the best ads to place on the page. The whole idea is to display ads that are relevant to the users and to what the users are looking for on the site so that they would click on the displayed ads. This is also combined with financial considerations (the CPC factor) to maximize the expected revenues for Google from displaying the ad.

In both the AFS and the AFC cases, the publishers and Google are being paid by the advertisers on the PPC basis. Google does not disclose how it shares the clicking revenues with the publishers.
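The AFC selection idea described above, in which relevance is weighed against the CPC to maximize expected revenue, can be illustrated with a minimal sketch. This is not Google's actual method, which the report does not describe in detail. The `Ad` fields, the `click_probability` estimate, and the ranking rule (expected revenue = estimated click probability × CPC) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str                 # hypothetical ad identifier
    click_probability: float  # assumed relevance estimate in [0, 1]
    cpc: float                # cost per click paid by the advertiser, in dollars


def expected_revenue(ad: Ad) -> float:
    """Expected revenue from one display: chance of a click times its price."""
    return ad.click_probability * ad.cpc


def select_ads(candidates: list[Ad], slots: int) -> list[Ad]:
    """Return the `slots` candidates with the highest expected revenue."""
    return sorted(candidates, key=expected_revenue, reverse=True)[:slots]


candidates = [
    Ad("relevant_cheap", click_probability=0.05, cpc=0.40),      # about 0.020 per display
    Ad("relevant_pricey", click_probability=0.04, cpc=1.00),     # about 0.040 per display
    Ad("irrelevant_pricey", click_probability=0.001, cpc=5.00),  # about 0.005 per display
]
chosen = select_ads(candidates, slots=2)
print([ad.name for ad in chosen])  # ['relevant_pricey', 'relevant_cheap']
```

Under these assumed numbers, the ad with the highest CPC is not selected because it is rarely clicked, which is one way to read the report's point that relevance and the CPC factor are considered together.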
Although the revenue split is not disclosed, publishers can see detailed online reports that help them track their earnings. These reports contain several statistics of clicking activities on the ads displayed on the publisher’s website. These statistics help the publisher get an idea of how well his or her website is performing in the AdSense program and how much the publisher is expected to earn over time.

As we can see from this description, there is a direct incentive for the publishers to attract traffic to their websites and encourage the visitors to click on Google’s ads on the site to maximize their own AdSense income. They can do this in three ways:

• Build valuable content on the site that attracts the most highly paid ads.
• Use a wide range of traffic-generating techniques, including online advertising.
• Encourage clicks on ads using legitimate means (Google has a list of prohibited activities for the publishers, such as explicit requests to click on Google’s ads, that can lead to termination of their accounts).
Unfortunately, overzealous and unethical users can “stretch” or directly abuse this system in an effort to maximize their revenues from the AdSense program. This leads to the invalid clicks problem discussed in the next section.

It is interesting to note that unethical users have different motivations for abusing AdWords and AdSense. Unethical users on AdWords are advertisers or their partners whose motivation is to hurt other advertisers. In contrast, the main motivation of unethical AdSense publishers is to enrich themselves through certain prohibited means. Therefore, the motivations of these two groups of unethical users are significantly different.

Although both motivations are important and should be addressed in the most serious manner, the greedy motivations of unethical AdSense publishers constitute a more serious problem for Google than the desire of unethical advertisers or their partners to hurt their competitors. This results in a significantly greater percentage of invalid clicks being generated by unethical AdSense publishers than by unethical AdWords advertisers (however, it is not clear if this statement is still true in terms of absolute numbers of invalid clicks generated by these two sources because of different volumes of clicks for the two programs).

7.3 The Google Network

Initially, Google’s sponsored links were displayed only on Google.com. However, over the years, Google built and expanded its partner network to include various websites in the so-called Google Network. With this network of partners, Google ads can be placed not only on Google.com but also on the partners’ websites, using either the search-based or the content-based methods described in Section 7.2. Google provides tools for advertisers to express preferences on which types of sites in the Network they prefer their ads to appear on.
Based on how these ads are placed, the Google Network can be categorized into the following types of websites:

• Google.com: the flagship and the original site in the Network against which all other Network sites are compared.
• AdSense for Content (AFC) sites: web publishers’ sites where content-based ads are served as described in Section 7.2. These publishers are divided into
  o Direct Publishers: the most important and trusted publishers, such as the New York Times, with whom Google has special relationships. Because of the brand names and reputations of these publishers, very little invalid clicking activity occurs on these websites. Even when invalid clicking activities occur, they usually arise because of some technical problems and “miscommunications” between Google’s and the publisher’s software systems. These problems are usually quickly detected and resolved, and the resulting invalid clicks are credited back to advertisers.
  o Online Publishers: smaller “self-service” publishers, such as various bloggers who joined the AdSense program. Most of the invalid clicking activities are associated with these publishers.
• AdSense for Search (AFS) sites: search sites displaying Google’s ads based on the searches done by the site visitors, as described in Section 7.2. These sites are also divided into
  o Direct: the most important and trusted search sites, such as AOL and EarthLink, with whom Google also has special relationships.
  o Online: other search sites. Most of the search sites are Direct sites with whom Google has special relationships.

This network of partner sites is constantly evolving as new partners are added and old ones either leave or are terminated by Google. All the partner sites in the network are periodically reviewed and monitored to detect possible problems and to assure advertisers that their ads are placed only on sites that have passed certain quality control standards.

Among the five types of sites in the Google Network, the one category that is intrinsically prone to invalid clicking activities is the AFC Online category. Examples of these publishers include various bloggers and “homegrown” webmasters with unknown or unclear reputations in the field.

7.4 What Google Knows about Clicking Activities

In order to manage the AdSense and AdWords programs, properly charge advertisers under the PPC revenue model, share revenues with publishers and detect invalid clicks, Google collects various types of information about querying and clicking activities, including certain types of “post-clicking” data about conversion actions on the advertiser’s website where the visitor is taken following the click. All this data accumulated by Google is extracted from various sources and contains comprehensive information about visitors’ activities on the Google Network.
As stated before, the conversion data, that is, the “post-clicking” data about conversion actions on the advertiser’s website, constitutes an important piece of this collected data. In particular, if the advertiser formally agrees to provide this information, Google collects data on whether or not the user visited certain designated pages on the advertised website that the advertiser marked as “conversion” pages, such as the checkout page and certain form-filling pages. This conversion data is limited to what the advertiser decided to provide to Google and is not as rich as the clickstream data collected by advertisers themselves on their websites. Also, many advertisers decide to opt out of providing this conversion data. In this case, Google does not have any conversion information and therefore does not know what happened after a visitor clicked on the ad. Nevertheless, this post-clicking conversion data is important for Google even in its limited form because it conveys some intentions of the visitors on the advertised website and provides good insights into whether or not the visitor is seriously considering purchasing the advertised product or service.
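The three situations described above (a conversion page was visited, none was visited, or the advertiser opted out) can be made concrete with a small sketch. It assumes a hypothetical `Advertiser` record and a hypothetical `conversion_signal` function; neither corresponds to any actual Google data structure or API, and the page paths are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertiser:
    name: str
    shares_conversions: bool                   # True if the advertiser agreed to provide data
    conversion_pages: frozenset = frozenset()  # pages the advertiser marked as "conversion"


def conversion_signal(advertiser: Advertiser, pages_visited: list) -> Optional[bool]:
    """Return what the ad network learns about one click in this sketch.

    True or False if the advertiser shares conversion data, and None
    ("unknown") if the advertiser opted out. Only this single flag is
    returned; the full list of visited pages stays with the advertiser.
    """
    if not advertiser.shares_conversions:
        return None
    return any(page in advertiser.conversion_pages for page in pages_visited)


shop = Advertiser("shop", True, frozenset({"/checkout", "/signup"}))
private = Advertiser("private", False)

print(conversion_signal(shop, ["/home", "/checkout"]))     # True
print(conversion_signal(shop, ["/home"]))                  # False
print(conversion_signal(private, ["/home", "/checkout"]))  # None
```

The `None` case is the one the report highlights: for an opted-out advertiser the network cannot distinguish a visitor who bought something from one who left immediately.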
This “raw” clicking data described above is subsequently cleaned, preprocessed and stored by Google in various internal logs for different types of subsequent analysis.

One inherent weakness of Google’s (or any other search engine’s) data collection effort that is important for detecting invalid clicks is the inability to get full access to all the clicking activities of the visitors on the advertised website. In other words, the conversion data that Google collects provides only a partial picture of all the post-clicking activities of the visitor on the advertised website. This data is important for detecting invalid clicks since better invalid click detection methods can be developed using this data. Unfortunately, Google (like other search engines) does not have full access to this data unless the advertised website decides to provide its clickstream data to Google, which many websites are reluctant to do. This is not Google’s fault, however; it is an inherent limitation of the types of data available to Google.

This lack of full conversion data is, however, compensated for by various types of querying and clicking data that Google can collect, whereas advertisers and third-party vendors cannot. Therefore, there exists a tradeoff between the types of data relevant for detecting invalid clicks that are available to Google, advertisers and third-party vendors. None of these three groups has the most comprehensive set of data pertinent to detecting invalid clicks, and each of them needs to settle for the invalid click detection methods that are possible with the data that it has.

7.5 The Advertisers’ Dilemma or What Knowledge Google Shares with Advertisers about Clicks

When advertisers are billed by Google, they receive reports describing the clicking and billing activities. These reports can be customized by the advertisers, who can select various clicking statistics that they want to see in these reports.
These reports were much simpler initially, but Google has enhanced its reporting functionality over the last few years, and customers can now see a wide range of clicking statistics in these reports.

One problem with these reports, however, is that these statistics are aggregated by Google over some time period. The smallest unit of analysis is one day. For example, the number of invalid clicks on an ad detected by Google (or any other related statistic) can only be reported on a daily basis (although there are certain alternative methods of obtaining aggregation granularity that is smaller than a day). In other words, advertisers cannot know if a particular click on a particular ad was marked as valid or invalid by Google, and Google refuses to provide this information to advertisers.

This is a source of contention and dispute between Google and the advertisers, and one can understand both parties in this dispute. On one hand, the advertiser has the right to know why a particular click was marked as valid by Google (when the advertiser thinks that it is invalid) because the advertiser pays for this click. On the other hand, if Google
discloses this information, it opens itself to click fraud on a massive scale because, by doing so, it provides certain hints about how its invalid click detection methods work. This means that unethical users will immediately take advantage of this information to conduct more sophisticated fraudulent activities undetectable by Google’s methods. This dilemma between the advertisers’ right to know and Google’s inability to provide the appropriate information to advertisers because of security concerns is part of the Fundamental Problem of the PPC advertising model to be discussed in the next section.

More recently, Google tried to bridge this gap between Google and the advertisers by explaining to advertisers a little more about Google’s invalid click detection efforts. However, these activities, although indicative of Google’s desire to work more closely with the advertisers, are too small to be of any major consequence. Therefore, the gap described above and the Fundamental Problem of the PPC model still remain largely open.

8. Invalid Clicks and Google’s Definition

8.1. Conceptual Definitions of Invalid Clicks

There are numerous definitions of fraudulent and invalid clicks. One such definition, taken from Wikipedia (http://en.wikipedia.org/wiki/Invalid_click), is

“Click fraud occurs in pay per click online advertising when a person, automated script or computer program imitates a legitimate user of a web browser clicking on an ad, for the purpose of generating an improper charge per click.”

Google does not like the concept of “fraudulent” clicks and uses the term “invalid” (or “spam”) click instead.
Google provides the following definition of invalid clicks (https://www.google.com/support/adsense/bin/answer.py?answer=32740&topic=8526):

“Clicks … generated through prohibited means, and intended to artificially increase click … counts on a publisher [or advertiser – AST] account”

Google has also used other definitions of invalid clicks in the past, such as

Click spam [invalid click – AST] is any kind of click received from a Cost-Per-Click (CPC) advertising engine that is generated artificially though human or technological means with the sole purpose of creating a debiting click, resulting in zero possibility for a conversion to occur

All these related definitions emphasize the following points:

• Invalid clicks can be generated either by humans or technological means, including various types of deceptive software programs, such as scripts or bots.