the sale, it should not be adopted ubiquitously. When consumers already have a planned purchase in mind (as in our setting), active personalization may actually cause the conversion rate to drop.

2. Literature Review

Our research is related to the fields of search engine ranking and online position effects. Over the past few years, two opposing views have been held on the position effect in product search. On the one hand, consumers have limited cognitive capacity. Eye-tracking studies have long shown that people tend to scan search results in order (e.g., Aula and Rodden 2009). Hence, the same link will have a higher click-through rate (CTR) if it is positioned towards the top of the page rather than the bottom (e.g., Srikant et al. 2010). Studies have also found empirical evidence of a significant rank order effect in the context of search engine-based keyword advertising (e.g., Ghose and Yang 2009, Rutz and Bucklin 2007). However, very little empirical work actually examines the effect of rank order on product demand when consumers search for and purchase commercial products. A few existing studies, such as Baye et al. (2009), examine the ranking effect on click-through rate as a substitute for actual demand (conversions). Other studies tend to focus on only a single search dimension, for example, examining the competition among retailers ranked on price search engines (e.g., Ellison and Ellison 2009).

On the other hand, in contrast to the above theoretical work, consumers have been found to be “variety-seeking” in their economic choice-making process (e.g., McAlister 1982, Givon 1984). In particular, recent studies have shown that when consumers search and shop for commercial products online, they tend to base their choice on the variety reflected in the set of product search results as a whole (Agrawal et al. 2009, Panigrahi and Gollapudi 2011). This differs from traditional web search (i.e., search that returns web pages), where people often examine the results in top-down order. As a consequence, the rank order of product search results (i.e., results that normally contain commercial products) may not have as significant an effect as in the web page search context (Bhattacharya et al. 2011); instead, only the diversity of products in the search result set matters. Therefore, one of our major goals in this research is to examine whether there exists a significant
ranking effect in product search. By combining archival data analysis with a set of randomized experiments, our research can thus provide critical insights into the impact of search engine ranking and design on users’ search and purchase behavior from a causal perspective.

Existing research also holds two different opinions on the effects of personalization: supportive and skeptical (Arora et al. 2008). From the supportive perspective, Malthouse and Elsner (2006) show in a field test that personalizing the copy used in a book offer increases response rates significantly. Rossi et al. (1996) quantified the benefits of adopting one-to-one pricing using household purchase history data and found empirically that individual personalization yields a 7.6% improvement over mass optimization. Ansari and Mela (2003) found that targeting the content can potentially increase the expected number of click-throughs by 62%. Arora and Henderson (2007) showed that targeting at the individual level can enhance the efficiency of embedded premium.

From the skeptical perspective, Zhang and Wedel (2009) investigate the profit potential of various promotion programs customized at different levels in online and offline stores. They found that the incremental benefits of one-to-one promotions over segment- and market-level customized promotions were small in general, especially in offline stores. Furthermore, one major concern in one-to-one marketing is invasion of privacy (Chellappa and Sin 2005, Arora et al. 2008). Chellappa and Sin (2005) developed a parsimonious model to predict consumers’ usage of online personalization as a result of the tradeoff between their value for personalization and their concern for privacy. They found that a consumer’s intent to use personalization services is positively influenced by her trust in the vendor. A recent experimental study by Aral and Walker (2011) looked at application adoptions among 1.4 million friends of over 9,000 users on Facebook.com, and found that active-personalized invitations are less effective in generating peer influence and social contagion than passive-broadcast notifications.

In summary, existing studies indicate that although personalization can lead to customer satisfaction and profits, it may not work universally. Moreover, the appropriate level of personalization design is sensitive to the context and to consumer behavior. Therefore, another goal of our research is to examine consumer online search
and purchase behavior under different levels of personalization mechanisms on product search engines.

3. Data

Our dataset consists of detailed information on a total of 969,033 online sessions from Travelocity.com, including consumer searches, clicks, and conversions that occurred within these sessions over the three months from November 2008 to January 2009. In addition, we supplement our search and transaction data with hotel service-, location-, and customer review-based information collected using various machine learning techniques such as image classification and text mining tools. This gives us a final dataset with a total of 29,222 weekly observations for 2,117 hotels in the US. More specifically, our dataset combines four major sources.

3.1. Consumer Search, Click and Conversion Data from Travelocity.com

We have complete information on consumer searching and shopping behavior. A typical online session involves the initialization of the session, the search query, the results (in a particular rank order) returned from that search query, the sorting method, the click(s) on hotel(s) if any, the login and actual transaction(s) if any conversion occurs, and the termination of the session. We count a “display” for a hotel if that hotel is visible to a consumer on the web page in an online search session. Meanwhile, a “click” is counted if the hotel is selected by a consumer, and a “conversion” is counted only if a consumer has completed the payment in that online session. Since our major goal is to examine the effect of the rank order displayed on a page, we focus only on the sessions with at least one display.² A display often leads to a click, but it may not lead to an actual purchase. Each hotel that counts for a display is associated with a page number and a screen position, which capture the corresponding page order and (within-page) rank order of that hotel in the search results.

² In some cases, users may initiate a session and look for general travel information, for example about the area of the city, rather than search for hotels; in such sessions no hotels are displayed on any web page. We excluded such sessions from our analysis.

Notice that when Travelocity displays the hotel search results on a web page, it
only shows 25 hotels per page.³ This restricts the rank order of each hotel to the range from 1 to 25. Meanwhile, to facilitate consumer search, Travelocity provides a default sorting criterion called “Travelocity Pick.” It also provides multiple alternative sorting criteria: Price, Hotel Class, Hotel Name, and Customer Review Rating. To capture consumers’ particular sorting preferences that may potentially influence the position effect, we include a control variable in our study to indicate how frequently a hotel appears in a result list under a “special sort.”

³ Recently Travelocity has upgraded the webpage design to show 10 hotels per page. However, during our examination time period, this number was 25.

In addition, we also have supplemental data collected from three other sources, which we discuss only briefly below.

3.2. Hotel Characteristics

Location Characteristics: We used geo-mapping search tools (Bing Maps API) and social geotags (from geonames.org) to identify the external amenities (e.g., shops, bars) and public transportation in the area around the hotel. We also used image classification together with Mechanical Turk to examine whether there is a nearby beach, a nearby lake, or a downtown area, and whether the hotel is close to a highway. We extracted these characteristics within radii of 0.25, 0.5, 1, and 2 miles.

Service Characteristics: This category contains hotel class, number of internal amenities, and number of rooms. Hotel class is an internationally accepted standard ranging from 1 to 5 stars, representing low to high hotel grades. Number of internal amenities is the aggregation of hotel internal amenities, such as bed quality, hotel staff, food quality, bathroom amenities, and parking facility. We extracted this information from the Tripadvisor website using fully automated parsing. Since hotel amenities are not directly listed on the Tripadvisor website, we retrieved them by following the link provided on the hotel web page, which randomly directs the user to one of its cooperating partner websites (e.g., Travelocity, Orbitz).

Review Characteristics: We collected customer reviews from Travelocity.com. The online reviews and reviewers’ information were collected on a daily basis up to January 31, 2009 (the last
date of transactions in our database). In addition to the total number of reviews and the numeric reviewer rating, we extracted indicators that measure the stylistic characteristics of the reviews for robustness checks. We examined two text-style features: subjectivity and readability of reviews (Ghose and Ipeirotis 2011). Also, since prior research suggested that disclosure of identity information is associated with changes in subsequent online product sales (Forman et al. 2008), we measured the percentage of reviewers for each hotel who reveal their name or location information on their profile pages.

Table 1. Definitions and Summary Statistics of Variables

Variable     Definition                                        Mean Std. Dev.    Min   Max

Search, Click and Conversion Data
PRICE        Transaction price per room per night            120.45     73.25  25.77   978
DISPLAY      Number of displays                              213.65    382.28      1  4849
CLICK        Number of clicks                                  2.99      3.55      0    56
CONVERSION   Number of conversions                             1.26      0.66      0     9
PAGE         Page number of the hotel                         20.86     13.44      1   192
RANK         Screen position of the hotel within a page       12.09      4.32      1    25

Hotel Location-Related Characteristics
BEACH        Beachfront within 0.6 miles                       0.18      0.38      0     1
LAKE         Lake or river within 0.6 miles                    0.22      0.42      0     1
TRANS        Public transportation within 0.6 miles            0.30      0.46      0     1
HIGHWAY      Highway exits within 0.6 miles                    0.74      0.44      0     1
DOWNTOWN     Downtown area within 0.6 miles                    0.67      0.47      0     1
EXTAMENITY   Number of external amenities within 1 mile        4.57      7.92      0    27
             (i.e., restaurants, shopping malls, or bars)
CRIME        City annual crime rate                          193.19    126.70      3  1310

Hotel Service-Related Characteristics
CLASS        Hotel class                                       3.36      1.37      1     5
AMENITYCNT   Total number of hotel amenities                  11.54      7.56      2    23
ROOMS        Total number of hotel rooms                     212.30    250.70     12  2900

Hotel Review-Related Characteristics
REVIEWCNT    Total number of reviews                          21.06     29.28      1   202
RATING       Overall reviewer rating                           3.84      0.85      1     5

Control Variables
SPECIALSORT  Number of times using a sorting method          204.64    377.26      0  4810
H            Total number of hotels in a city                 24.03     56.48      1   922
BRAND        Dummies for 9 hotel brands: Accor,                  --        --      0     1
             Best Western, Cendant, Choice, Hilton, Hyatt,
             Intercontinental, Marriott, and Starwood

Number of Observations (Weekly-Level): 29,222
Time Period: 11/1/2008-1/31/2009
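The counting rules of Section 3.1 (a display, a click, and a conversion per hotel) and the weekly observation level reported in Table 1 can be illustrated with a small sketch. This is not the authors' data pipeline. The event schema (`hotel_id`, `day`, `kind`), the function names, the use of ISO calendar weeks, and the definition of the conversion rate as conversions per click are all assumptions made for illustration. The helper `absolute_position` is likewise a hypothetical convenience that combines PAGE and RANK under the 25-hotels-per-page layout; the paper itself reports the two variables separately.

```python
from collections import defaultdict
from datetime import date

HOTELS_PER_PAGE = 25  # page size stated in Section 3.1


def absolute_position(page, rank, per_page=HOTELS_PER_PAGE):
    """Hypothetical helper: merge page number and within-page rank
    into a single 1-based position in the full result list."""
    if page < 1 or not 1 <= rank <= per_page:
        raise ValueError("page must be >= 1 and rank within 1..per_page")
    return (page - 1) * per_page + rank


def aggregate_weekly(events):
    """Count displays, clicks, and conversions per (hotel, ISO week).

    `events` is an iterable of dicts with the assumed keys
    hotel_id, day (datetime.date), and kind, where kind is one of
    "display", "click", or "conversion".
    """
    counts = defaultdict(lambda: {"display": 0, "click": 0, "conversion": 0})
    for event in events:
        iso = event["day"].isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        counts[(event["hotel_id"], week)][event["kind"]] += 1
    return dict(counts)


def rates(row):
    """Return (CTR, conversion rate) for one weekly row.

    CTR is clicks per display; the conversion rate is assumed here
    to be conversions per click. Undefined ratios are None.
    """
    ctr = row["click"] / row["display"] if row["display"] else None
    conv = row["conversion"] / row["click"] if row["click"] else None
    return ctr, conv


if __name__ == "__main__":
    sample = [
        {"hotel_id": "h1", "day": date(2008, 11, 3), "kind": "display"},
        {"hotel_id": "h1", "day": date(2008, 11, 4), "kind": "click"},
        {"hotel_id": "h1", "day": date(2008, 11, 4), "kind": "conversion"},
    ]
    for key, row in aggregate_weekly(sample).items():
        print(key, row, rates(row))
```

Keeping the three counts in one row per hotel and week mirrors the DISPLAY, CLICK, and CONVERSION variables of Table 1, so ratios such as the CTR can be derived afterwards without revisiting the session-level events.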