WWW 2011 – Session: E-commerce March 28–April 1, 2011, Hyderabad, India
Towards a Theory Model for Product Search∗

Beibei Li (bli@stern.nyu.edu)
Anindya Ghose (aghose@stern.nyu.edu)
Panagiotis G. Ipeirotis (panos@stern.nyu.edu)
Department of Information, Operations, and Management Sciences
Leonard N. Stern School of Business, New York University
New York, New York 10012, USA

ABSTRACT
With the growing pervasiveness of the Internet, online search for products and services is constantly increasing. Most product search engines are based on adaptations of theoretical models devised for information retrieval. However, the decision mechanism that underlies the process of buying a product is different from the process of locating relevant documents or objects. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest surplus after the purchase. In a sense, the top-ranked products are the “best value for money” for a specific user. Our approach builds on research on “demand estimation” from economics and presents a solid theoretical foundation on which further research can build. We build algorithms that take into account consumer demographics, heterogeneity of consumer preferences, and the varying price of the products. We show how to achieve this without knowing the demographics or purchasing histories of individual consumers, using aggregate demand data instead. We evaluate our work by applying the techniques to hotel search. Our extensive user studies, using more than 15,000 user-provided ranking comparisons, demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong state-of-the-art baselines.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Algorithms, Economics, Experimentation, Measurement

Keywords
Consumer Surplus, Economics, Product Search, Ranking, Text Mining, User-Generated Content, Utility Theory

∗This work was supported by NSF grants IIS-0643847 and IIS-0643846. Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2011, March 28–April 1, 2011, Hyderabad, India. ACM 978-1-4503-0632-4/11/03.

1. INTRODUCTION
Online search for products is increasing in popularity, as more and more users search for and purchase products on the Internet. Most search engines for products today are based on models of relevance from “classic” information retrieval theory [22] or use variants of faceted search [27] to facilitate browsing. However, the decision mechanism that underlies the process of buying a product is different from the process of finding a relevant document or object. Customers do not simply seek something relevant to their search; they also try to identify the “best” deal that satisfies their specific criteria. Of course, it is difficult to quantify the notion of the “best” product without understanding what users are optimizing.

Today’s product search engines provide only rudimentary ranking facilities for search results, typically using a single ranking criterion such as name, price, best selling (volume of sales), or, more recently, customer review ratings. This approach has several shortcomings. First, it ignores the multidimensional preferences of consumers. Second, it fails to leverage the information generated by online communities beyond simple numerical ratings. Third, it hardly accounts for the heterogeneity of consumers.
These drawbacks call for a product recommendation strategy that better models consumers’ underlying behavior, capturing their multidimensional preferences and heterogeneous tastes. Recommender systems [1] could fix some of these problems but, to the best of our knowledge, existing techniques still have limitations. First, most recommendation mechanisms require consumers to log into the system. However, in reality many consumers browse only anonymously. Due to the lack of any meaningful, personalized recommendations, consumers do not feel compelled to log in before purchasing. For example, on Travelocity, fewer than 2% of users log in. But even when they log in, before or after a purchase, consumers are reluctant to give their individual demographic information for a variety of reasons (e.g., time constraints or privacy issues). Therefore, most context information is missing at the individual consumer level. Second, for goods with a low purchase frequency for an individual consumer, such as hotels, cars, real estate, or even electronics, there are few repeated purchases we could leverage towards building a predictive model (e.g., models based on collaborative filtering). Third, and potentially more importantly, as privacy issues become increasingly important, marketers may not be able to observe the individual-level purchase history of each consumer (or consumer segment). In contrast, aggregate purchase statistics (e.g., market share) are easier to obtain. As a consequence, many algorithms that rely on knowing individual-level behavior cannot derive consumer preferences from such aggregate data.

Alternative techniques try to identify the “Pareto optimal” set of results [3]. Unfortunately, the feasibility of this approach diminishes as the number of product characteristics increases. With more than five or six characteristics, the probability of a point being classified as “Pareto optimal” increases dramatically.
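The growth of the Pareto-optimal set can be illustrated with a small simulation. This is a sketch under our own assumption, not the paper's, that products have independent, uniformly distributed characteristics and that higher is better on every dimension. Real product characteristics are correlated, so the exact fractions are only illustrative.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: a is at least as good as b on every characteristic
    and strictly better on at least one (higher is better everywhere)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fraction(n_products: int, n_characteristics: int, seed: int = 0) -> float:
    """Fraction of randomly generated products that no other product dominates."""
    rng = random.Random(seed)
    points: List[List[float]] = [
        [rng.random() for _ in range(n_characteristics)] for _ in range(n_products)
    ]
    # A point never dominates itself (no strict improvement), so no self-check is needed.
    optimal = [p for p in points if not any(dominates(q, p) for q in points)]
    return len(optimal) / n_products


for d in (2, 4, 6, 8, 10):
    print(f"{d:2d} characteristics: {pareto_fraction(100, d):.0%} Pareto optimal")
```

With 100 products, the printed fraction should be small for two characteristics and should grow quickly with the number of characteristics. Once most products are Pareto optimal, a Pareto filter alone no longer separates good products from mediocre ones.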
As a consequence, the set of Pareto optimal results soon includes every product. So, how can we generate the “best” ranking of products when
consumers use multiple criteria? For this purpose, we use two fundamental concepts from economics: utility and surplus. Utility is defined as a measure of the relative satisfaction from, or desirability of, consumption of various goods and services [17]. Each product provides consumers with an overall utility, which can be represented as the aggregation of weighted utilities of individual product characteristics. At the same time, the act of purchasing deprives the customer of the utility of the money spent on the product. Under the assumption of consumer rationality, the decision-making process behind purchasing can be viewed as a process of utility maximization that takes into consideration both product quality and price.

Based on utility theory, we propose a new ranking system that uses demand-estimation approaches from economics to generate the weights that consumers implicitly assign to each individual product characteristic. An important characteristic of this approach is that it does not require purchasing information for individual customers but rather relies on aggregate demand data. Based on the estimated weights, we then derive the surplus for each product, which represents how much extra utility one can obtain by purchasing it. Finally, we rank all the products according to their surplus. We extend our ranking strategy to a personalized level, based on the distribution of consumers’ demographics.

We instantiate our research by building a search engine for hotels, based on a unique data set containing transactions from Nov. 2008 to Jan. 2009 for US hotels from a major travel web site. Our extensive user studies, using more than 15,000 user evaluations, demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong baselines.
The major contributions of our research are the following:
• We aim to make recommendations based on a better understanding of the underlying “causality” of consumers’ purchase decisions. We present a user model that captures the decision-making process of consumers, leading to a better understanding of consumer preferences. This is in contrast to building a “black-box” style predictive model using machine learning algorithms. The causal model relaxes the assumption of a “consistent environment” across training and testing data sets: it allows the modeling environment to change and predicts what should happen even when things change.
• We infer personal preferences from aggregate data, in a privacy-preserving manner. Our algorithm learns consumer preferences from the largely anonymous, publicly observed distributions of consumer demographics and from the observed aggregate-level purchases (i.e., anonymous purchases and market shares in NYC and LA), not from the identified behavior or demographics of each individual.
• We propose a ranking method using the notion of surplus, which is not only theory-driven but also generates systematically better results than current approaches.
• We present an extensive experimental study: using six hotel markets, 15,000 user evaluations, and blind tests, we demonstrate that the generated rankings are significantly better than existing approaches.

The rest of the paper is organized as follows. Section 2 gives the background. Section 3 explains how we estimate the model parameters, specifically how we compute the weights associated with product characteristics. Section 4 discusses how we build our basic rankings, and how we can personalize the presented results. Section 5 provides the setting for the experimental evaluation, and Section 6 discusses the results. Finally, Section 7 discusses related work and Section 8 concludes.

2. THEORY MODEL
In this section, we provide the background economic theory that explains the basic concepts behind our model. We start by formalizing our problem and introducing the “economic view” of consumers’ rational choices. For better understanding, we introduce the following theoretical bases: utility theory, characteristics-based theory, and surplus.

2.1 Problem Description
In general, our main goal is to identify the best products for a consumer. The following example illustrates this.

Example 1. Alice is looking for a hotel in New York City. She prefers a place of good quality, preferably costing no more than $300 per night. She conducts a faceted search (e.g., with respect to price and ratings). Unfortunately, with an explicit price constraint, she may miss a “great deal” with much higher value but a slightly higher price. For instance, the 5-star Mandarin hotel happens to run a promotion that week with a discounted price of $333 per night. With the most luxurious environment and room services, the Mandarin would otherwise normally cost around $900 per night. So, although the price is $33 above her budget, Alice would certainly be willing to “grab the deal” if this hotel appeared in the search results.

However, how can Alice know that such a deal exists? In other words, how can we improve the search so that it helps Alice identify the “best value” products? To examine this problem, we introduce the concept of surplus from economics. It is a measure of the benefits consumers derive from the exchange of goods [17]. If we can derive the surplus from each product, then by ranking the products according to their surplus, we can easily find the product that provides the highest benefit to a consumer. Now the question is: how can we derive the surplus so that we can quantify the gain from buying a product? To do so, we introduce another concept: utility.
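Example 1 can be made concrete with a toy sketch that contrasts a hard price facet with surplus-based ranking. The hotels other than the Mandarin and all valuations are hypothetical. We assume Alice attaches a dollar value to a night in each hotel, using the Mandarin's usual rate of about $900 as a stand-in for its value to her, so surplus here is simply value minus price.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hotel:
    name: str
    price: float  # nightly price in dollars
    value: float  # hypothetical dollar value Alice places on one night there


def faceted_filter(hotels: List[Hotel], max_price: float) -> List[Hotel]:
    """Hard price facet: drop every hotel above the budget."""
    return [h for h in hotels if h.price <= max_price]


def rank_by_surplus(hotels: List[Hotel]) -> List[Hotel]:
    """Rank by surplus expressed in dollars (value minus price), highest first."""
    return sorted(hotels, key=lambda h: h.value - h.price, reverse=True)


hotels = [
    Hotel("Hotel A", price=280.0, value=320.0),   # hypothetical hotel
    Hotel("Hotel B", price=300.0, value=330.0),   # hypothetical hotel
    Hotel("Mandarin", price=333.0, value=900.0),  # value assumed ~ usual rate
]

print([h.name for h in faceted_filter(hotels, 300.0)])
print([h.name for h in rank_by_surplus(hotels)])
```

The $300 facet silently removes the Mandarin, while ranking by surplus puts it first because its value far exceeds the $33 over budget.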
2.2 Choice Decisions and Utility Maximization
Surplus can be derived from utility and rational choice theories. A fundamental notion in utility theory is that each consumer is endowed with an associated utility function U, which is “a measure of the satisfaction from consumption of various goods and services” [17]. The rationality assumption states that each person tries to maximize his or her own utility.¹ In the context of purchasing decisions, we assume that the consumer has access to a set of products, each with a price. Informally, buying a product involves exchanging money for a product. Therefore, to analyze purchasing behavior we need two components for the utility function:
• Utility of Product: The utility that the consumer will gain by buying the product, and
• Utility of Money: The utility that the consumer will lose by paying the price for that product.
In general, a consumer buys the product that maximizes utility, and does so only if the utility gained by purchasing the product is higher than the corresponding utility of money lost.

¹While in reality consumers are not always rational, rationality is a convenient modeling framework that we adopt in this paper. As we demonstrate in the experimental evaluation, even imperfect theories generate good experimental results.
²To allow for the possibility of not buying anything, we also add a dummy product X0 with price p0 = 0, which corresponds to the choice of not buying anything.

More formally, assume that the consumer has a choice across n products, and each product Xj has a price pj.² Before the purchase, the consumer has some disposable income I that
generates a money utility Um(I). The decision to purchase Xj generates a product utility Up(Xj) and, simultaneously, paying the price pj decreases the money utility to Um(I − pj). Assuming that the consumer strives to optimize its own utility, the purchased product Xj is the one that gives the highest increase in utility. This approach naturally generates a ranking order for the products: the products that generate the highest increase in utility should be ranked on top. Thus, to compute the increase in utility, we need the gained utility of product Up(Xj) and the lost utility of money Um(I) − Um(I − pj).

2.2.1 Utility of Product
Modeling the utility of a product can be traced back to Lancaster’s characteristics theory [15] and Rosen’s hedonic price model [24]. The hedonic price model assumes that differentiated products are described by vectors of objectively measured characteristics. In addition, the utility that a consumer has for a product can be decomposed into a set of utilities for each product characteristic. According to this model, a product X with K features can be represented by a K-dimensional vector X = (x1, ..., xK), where xk represents the amount or quality of the k-th characteristic of the product. The overall utility of product X is then modeled by the function Up(x1, ..., xK).

One of the critical issues in this model is how to estimate the aggregated utility from the individual product characteristics. Based on the hedonic price model, we assume that each product characteristic is associated with a weight representing consumers’ desirability towards that characteristic. Under this assumption, we further refine the definition of overall utility to be the aggregation of weighted utilities from the observed individual characteristics and an unobserved characteristic ξ:

$$U_p(X) = U_p(x_1, \ldots, x_K) = \sum_{k=1}^{K} \beta_k \cdot x_k + \xi, \qquad (1)$$

where βk represents the corresponding weight that the consumer assigns to the k-th characteristic xk.
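A minimal sketch of Eq. (1), together with the utility-maximizing choice rule described at the start of this section. The characteristic weights, the brand effect ξ, and the linear money utility in the usage lines are hypothetical numbers chosen for illustration. The dummy product X0 is assumed to have zero product utility, so not buying yields a gain of zero.

```python
from typing import Callable, Sequence, Tuple


def product_utility(x: Sequence[float], beta: Sequence[float], xi: float = 0.0) -> float:
    """Eq. (1): sum over k of beta_k * x_k, plus the unobserved characteristic xi."""
    if len(x) != len(beta):
        raise ValueError("x and beta must both have length K")
    return sum(b * xk for b, xk in zip(beta, x)) + xi


def choose(products: Sequence[Tuple[float, float]], income: float,
           money_utility: Callable[[float], float]) -> int:
    """Return the 1-based index j of the product with the largest gain
    Up(Xj) - [Um(I) - Um(I - pj)], or 0 for the dummy product X0
    (not buying: price 0 and, by assumption, product utility 0)."""
    best_j, best_gain = 0, 0.0
    for j, (u_p, price) in enumerate(products, start=1):
        gain = u_p - (money_utility(income) - money_utility(income - price))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j


def linear_money(m: float) -> float:
    """Hypothetical money utility: 0.02 utility units per dollar."""
    return 0.02 * m


# Hypothetical weights for (star rating, average review score).
beta = [0.5, 1.2]
hotel_a = product_utility([3, 4.0], beta)
hotel_b = product_utility([5, 4.5], beta, xi=0.8)  # xi models, e.g., a brand effect

print(choose([(hotel_a, 200.0), (hotel_b, 333.0)], income=5_000.0,
             money_utility=linear_money))
```

With these numbers the cheaper hotel yields the larger gain, even though the other hotel has the higher product utility. Raising every price far enough makes the rule return 0, the "do not buy" option.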
Notice that with \(\xi\) we capture the influence of all product characteristics that are not explicitly accounted for in our model. So, a product that consumers perceive as high-quality due to a characteristic not explicitly captured in our measurements (e.g., brand name) will end up having a high value of \(\xi\).

2.2.2 Utility of Money

Given the utility of a product, to analyze consumers' motivation to trade money for the product, it is also necessary to understand the utility of money. This concept is defined as consumers' happiness from owning monetary capital. Based on Alfred Marshall's well-established principles [17], utility of money has two basic properties: it is increasing and concave.

• Increasing: An increase in the amount of money will cause an increase in the utility of money. In other words, the more money someone has, the better.
• Concave: The increase in utility, or marginal utility of money, diminishes as the amount of money increases.

Based on these properties, an example of the utility function for money is shown in Figure 1. Note that with the concave form of the utility function, the slope is decreasing; hence, the marginal utility of money is diminishing. So, $100 is more important for someone with $1,000 than for someone with $100,000. This also implies that consumers are risk-averse under normal circumstances: with equal probabilities of winning or losing, losing N dollars causes a drop in utility larger than the boost from winning N dollars.

Figure 1: A concave, bounded, increasing function for “utility of money,” approximated with a linear function for small changes.

The concave assumption can be relaxed when the changes in money are small. For most transactions, we often assume that the marginal utility of money is approximately constant. More formally, we assume that a consumer with income \(I\) receives a money utility \(U_m(I)\). Paying the price \(p\) decreases the money utility to \(U_m(I - p)\).
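The properties above can be checked numerically with any concrete concave function. This sketch assumes a hypothetical logarithmic utility \(U_m(I) = \ln I\), which is increasing and concave (the text does not commit to a functional form), and checks three claims: a fixed $100 matters less as wealth grows, consumers are risk-averse, and the utility drop is nearly linear in the price for small payments.

```python
import math


def money_utility(income: float) -> float:
    """Hypothetical concave, increasing utility of money: U_m(I) = ln(I)."""
    return math.log(income)


# 1) Diminishing marginal utility: the same $100 adds more utility at $1,000.
gain_at_1k = money_utility(1_000 + 100) - money_utility(1_000)
gain_at_100k = money_utility(100_000 + 100) - money_utility(100_000)

# 2) Risk aversion: losing N dollars hurts more than winning N dollars helps.
income, n = 10_000, 1_000
loss = money_utility(income) - money_utility(income - n)
win = money_utility(income + n) - money_utility(income)

# 3) Small payments: U_m(I) - U_m(I - p) is close to the linear term U_m'(I) * p.
p = 50
exact_drop = money_utility(income) - money_utility(income - p)
linear_drop = p / income  # the derivative of ln(I) is 1 / I

print(gain_at_1k > gain_at_100k, loss > win)
print(f"exact {exact_drop:.6f} vs linear {linear_drop:.6f}")
```

For the assumed log utility, the constant marginal utility in the quasi-linear approximation that follows would be \(\alpha(I) = 1/I\); other concave functions give other values.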
Assuming that \(p\) is relatively small compared to the disposable income \(I\), the marginal utility of money remains mostly constant in the interval \([I - p, I]\) [17]. Under this assumption, the utility of money that the consumer loses by paying the price \(p\) for product \(X\) can be represented in a quasi-linear form as follows:

\[ U_m(I) - U_m(I - p) = \alpha(I) \cdot I - \alpha(I) \cdot (I - p) = \alpha(I) \cdot p, \tag{2} \]

where \(\alpha(I)\) denotes the marginal utility of money for someone with disposable income \(I\).

2.2.3 Challenges

Given the utility of the product and the utility of money, we can now derive the utility surplus as the increase in utility, or excess utility, after the purchase. More formally, utility surplus is defined as follows.

Definition 1. The utility surplus (US) for a consumer with disposable income \(I\), when buying a product \(X\) priced at \(p\), is the gain in the utility of the product \(U_p\) minus the loss in the utility of money \(U_m\):

\[ US = U_p(X) - [U_m(I) - U_m(I - p)] + \varepsilon = \underbrace{\sum_{k} \beta_k \cdot x_k + \xi}_{\text{utility of product}} \; \underbrace{-\, \alpha \cdot p}_{\text{utility of money}} \; + \underbrace{\varepsilon}_{\text{stochastic error}} \tag{3} \]

Note that \(\xi\) is a product-specific disturbance scalar summarizing unobserved characteristics of product \(X\), and \(\varepsilon\) is a stochastic error term that is assumed to be i.i.d. across products and consumers in the selection process and is usually assumed to follow a Type I extreme-value distribution. ✷

In this theory model, the key challenge is to estimate the weights that consumers assign to money and to the product dimensions. We discuss this next.

3. ESTIMATION OF MODEL PARAMETERS

In the previous section, we introduced the background of utility theory, characteristics-based theory, and surplus. Recall that our main goal is to identify the best product (the one with the highest surplus) for a consumer. This is complicated by the fact that the utility, and therefore the surplus, of consumers is private
and not directly observable. As a result, there is no “true observed utility” that we can compare with our “model predicted utility.” Instead, we need to observe the behavior of consumers and estimate the values of the latent parameters that best explain that behavior. Furthermore, since we cannot assume that we can observe in detail the behavior of individual consumers, nor can we explicitly ask each consumer for their personal “tastes” (e.g., choice of a product, “weight” assigned to a product feature), we need to extract utilities and derive individual preferences by using aggregate data.

The basic idea is the following: If we know the utilities of different products for a consumer, we can estimate the demand for different products, as consumers will behave according to their utility-encoded preferences. Conversely, if we observe the demand for various products, we can infer the preferences of the consumer population for different product aspects. Observing product demand is easier than it sounds: for example, we can observe salesrank on Amazon.com and transform salesrank to demand [8], we can directly observe the transactions at marketplaces such as eBay and Amazon [11], or we can simply obtain anonymous transactions directly from a merchant.

In this section, we discuss how we estimate the parameters using aggregate demand data. First, in Section 3.1 we discuss the simpler case where consumers are homogeneous and have similar preferences. Then, in Section 3.2 we analyze the more realistic case where consumers have different preferences, which depend on their demographics and purchase context.

3.1 Homogeneous Consumers: Logit Model

3.1.1 Model Specification

The basic Logit model, introduced by McFadden [18, 19], assumes that consumers have “homogeneous preferences” towards product characteristics. In other words, the weights \(\beta\) and \(\alpha\) are common across all consumers.
Thus, following Definition 1, the utility surplus for consumer \(i\) and product \(j\) is written as:

\[ US_{ij} = V_j(\alpha, \beta) + \varepsilon_{ij}, \tag{4} \]

where \(V_j(\alpha, \beta) = \sum_k \beta_k \cdot x_{kj} + \xi_j - \alpha \cdot p_j\). Notice that we separate preferences towards product \(j\), captured by \(V_j(\alpha, \beta)\), from non-deterministic aspects of individual consumer behavior, captured by the error term \(\varepsilon_{ij}\).

According to the assumption of consumer rationality for utility maximization, the consumer chooses the product that maximizes utility surplus. Note that the choice is stochastic, given the error term \(\varepsilon_{ij}\). Therefore, in our scenario, the probability that consumer \(i\) chooses product \(j\) is:

\[ P(\text{choice}_{ij}) = P(US_{ij} > US_{il}) \quad (\forall l \text{ in the same market}, \; l \neq j). \tag{5} \]

Solving this equation, we have [18, 19]:

\[ P(\text{choice}_{ij}) = \frac{\exp\left(V_j(\alpha, \beta)\right)}{1 + \sum_l \exp\left(V_l(\alpha, \beta)\right)}. \tag{6} \]

In the homogeneous case, all consumers have the same \(\alpha\) and \(\beta\), and this probability is proportional to the market share³ of product \(j\) (the consumer-specific error term \(\varepsilon_{ij}\) has disappeared). Notice that the problem of estimating preferences can now be expressed as a logistic regression problem. It is worth mentioning that this solution is not an ad hoc choice, but a direct derivation from a theory-driven user behavior model. Daniel McFadden received the Nobel Prize in Economics in 2000 for establishing the connection between logistic regression and models of discrete user choice.

³ Market share is defined as the percentage of total sales volume in a market captured by a brand, product, or firm.

3.1.2 Estimation Methodology

Given Equation 6, we can estimate consumer preferences (expressed by the parameters \(\alpha\) and \(\beta\)) by observing the market shares of the different products. One challenge is that we need to know the “demand” for the “buy nothing” option in order to properly estimate the value \(P(\text{choice}_j)\) in Equation 6.
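A minimal sketch of Equation 6, with the mean utility of the buy-nothing option normalized to zero (the "1 +" in the denominator). The utilities are hypothetical. It also shows why the buy-nothing demand matters: \(\ln s_j\) equals \(V_j\) minus a term shared by all products, so observed inside shares alone reveal \(V_j\) only up to that common constant, while the outside share \(s_0\) pins it down.

```python
import math
from typing import List, Tuple


def logit_shares(v: List[float]) -> Tuple[List[float], float]:
    """Eq. (6) with the buy-nothing option's mean utility normalized to 0."""
    denom = 1.0 + sum(math.exp(vj) for vj in v)
    inside = [math.exp(vj) / denom for vj in v]
    outside = 1.0 / denom  # share of the buy-nothing option
    return inside, outside


# Hypothetical mean utilities V_j(alpha, beta) for three products.
v = [1.2, 0.4, -0.5]
shares, s0 = logit_shares(v)

# ln(s_j) = V_j - ln(denom); the second term is common to all products,
# and the outside share identifies it: ln(s_j) - ln(s0) = V_j.
recovered = [math.log(s) - math.log(s0) for s in shares]
print([round(r, 6) for r in recovered])  # [1.2, 0.4, -0.5]
```

Because the shared term is constant within a market, it can be absorbed into a regression intercept, which is why a linear regression on log demand suffices in the homogeneous case.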
Specifically, we set \(P(\text{choice}_j) = d^{obs}_j / d_{total}\), where \(d^{obs}_j\) is the observed demand for product \(j\) and \(d_{total}\) is the “total demand,” which includes the demand for the buy-nothing option.⁴ Taking logs in Equation 6 and solving the system [5] yields:

\[ \ln(d^{obs}_j) = -\alpha \cdot p_j + \sum_k \beta_k \cdot x_{kj} + \xi_j. \tag{7} \]

This model can easily be solved for the parameters \(\beta\) and \(\alpha\) using any linear regression method, such as ordinary least squares (OLS).

Example 2. Assume that we have a hotel market in New York with two hotels: Hotel M (Mandarin Oriental, 5-star) and Hotel D (Doubletree, 3-star). From day 1 to day 3, we observe that the price for Mandarin Oriental is $500, $480, and $530 per night, with a corresponding demand of 400, 470, and 320 bookings, respectively. Meanwhile, the price for Doubletree is $250, $270, and $225 per night, and its corresponding demand is 600, 530, and 680 bookings. Using our model, we can write down the regression equations:

\[ \ln(\text{bookings}) = -\alpha \cdot \text{price} + \beta \cdot \text{stars} + f_{\text{hotel}} + \epsilon \tag{8} \]

Here, we divide the unobservable \(\xi\) into a fixed effect \(f\) that is common for the same hotel (effectively a dummy binary variable) and an i.i.d. random error term \(\epsilon\). Using OLS, we get \(\alpha = 0.0067\) and \(\beta = 0.64\), which express the sensitivity of the consumers to price and their preference for “stars,” respectively.

Of course, the assumption of homogeneous consumer preferences is only an idealization. In reality, consumers are different and their tastes vary. Next, we examine the case where consumers have heterogeneous tastes.

3.2 Heterogeneous Consumers: BLP Model

In reality, consumers' preferences are heterogeneous. In principle, we could observe a customer for a long period of time and then use the Logit scheme described above to extract the preferences of each customer. Unfortunately, we can rarely observe individual behavior over long periods of time, so it is difficult to estimate the individual preferences for each consumer.
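Returning briefly to Example 2 of Section 3.1.2, an ordinary least-squares fit of Equation 8 with NumPy yields values that round to the reported estimates. One assumption is needed: with only two hotels, a separate dummy per hotel would be perfectly collinear with `stars`, so this sketch uses a single common intercept in place of \(f_{\text{hotel}}\).

```python
import numpy as np

# Example 2 data: days 1-3 for Mandarin Oriental (5-star), then Doubletree (3-star).
price = np.array([500, 480, 530, 250, 270, 225], dtype=float)
stars = np.array([5, 5, 5, 3, 3, 3], dtype=float)
bookings = np.array([400, 470, 320, 600, 530, 680], dtype=float)

# Regressors for Eq. (8): intercept, price, stars. A per-hotel dummy would be
# collinear with `stars` here, so a single intercept stands in for f_hotel.
X = np.column_stack([np.ones_like(price), price, stars])
y = np.log(bookings)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, price_coef, beta = coef
alpha = -price_coef  # Eq. (8) enters price as -alpha * price

print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")  # alpha = 0.0067, beta = 0.64
```

The price coefficient comes out negative, so \(\alpha\) is its negation, matching the \(-\alpha \cdot \text{price}\) term in Equation 8.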
To allow preferences to vary, though, we can assume that preferences are a function of consumer demographics and purchase context. For example, everything else being equal, honeymooners may appreciate a hotel in a romantic, remote setting, while business travelers may prefer a location with easy access to public transportation. We can therefore characterize each customer by a set of demographic characteristics (e.g., age, gender, travel purpose) and make the preference coefficients \(\beta\) a function of these demographics. In this case, the overall preference distribution of the whole population is a mixture of the preference distributions of the various consumer types in the population. The main challenge in this setting is that we only observe overall demand, not the demand from each separate consumer group. So, the question becomes: How can we find the preferences of various consumer types by simply observing the aggregate product demand?

⁴ Since \(d_{total}\) appears as a constant across all equations, the absolute values of \(d_{total}\) and of the “buy nothing” demand \(d_0\) are not relevant to the parameter estimation and can be ignored.

3.2.1 Model Specification

We address this issue by monitoring demand for similar products in different markets, for which we know the distribution of consumers. Since the same product will have the same demand from a given demographic group, any differences in demand across markets can be attributed to the different demographics. The following simplified example illustrates the intuition behind this approach.

Example 3. Consider two cities, A and B, and two types of consumers: business travelers and family travelers. City A is a business destination, with 80% of the travelers being business travelers and 20% families. City B is mainly a family destination, with 10% business travelers and 90% family travelers. In city A, we have two hotels: Hilton (A1) and Doubletree (A2). In city B, we again have two hotels: Hilton (B1) and Doubletree (B2). The Hilton hotels (A1, B1) have a conference center but no pool, and the Doubletree hotels (A2, B2) have a pool but no conference center. To keep the example simple, we assume that consumer preferences do not change when consumers travel to different cities and that prices are the same. By observing demand, we see that in city A (business destination) the demand is 820 bookings per day for Hilton and 120 bookings for Doubletree. In city B (family destination), the demand is 540 bookings per day for Hilton and 460 bookings for Doubletree. Since the hotels are identical in the two cities, the changes in demand must be the result of different traveler demographics, hinting that a conference center is desirable for business travelers.

For this paper, to extract consumer preferences, we use the Random-Coefficient Model [6], introduced by Berry, Levinsohn, and Pakes, and commonly referred to as the BLP model.
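The intuition of Example 3 can be checked with a deliberately simplified calculation that is not the BLP estimator: treat each city's Hilton share of bookings as a linear mixture of a business-traveler share and a family-traveler share, weighted by the city's traveler mix, and solve the resulting 2×2 linear system. This ignores the buy-nothing option and prices, and the variable names are illustrative only.

```python
import numpy as np

# Rows: city A, city B. Columns: share of business and family travelers.
traveler_mix = np.array([[0.8, 0.2],
                         [0.1, 0.9]])

# Hilton's share of the bookings observed in each city.
hilton_share = np.array([820 / (820 + 120), 540 / (540 + 460)])

# Simplifying assumption: city share = mix-weighted average of type-specific shares.
business_share, family_share = np.linalg.solve(traveler_mix, hilton_share)
print(f"business: {business_share:.3f}, family: {family_share:.3f}")  # business: 0.967, family: 0.493
```

Under this simplification, roughly 97% of business travelers and 49% of family travelers choose the Hilton, consistent with the conclusion that the conference center matters mainly to business travelers.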
This model extends the basic Logit model by making the coefficients \(\beta\) and \(\alpha\) in Equation 6 demographic-specific. Let \(T_i\) be a vector representing consumer type, which can specify a particular purchase context, age group, and so on. In the simplest case, we can have a binary variable for each consumer group. With the preferences now being demographic-specific, we write the utility surplus for consumer \(i\) of type \(T_i\), when buying product \(j\) with features \(x_{1j}, \ldots, x_{Kj}\) at price \(p_j\), as:

\[ US_{ij} = \sum_k \beta_k(T_i) \cdot x_{kj} - \alpha(I_i) \cdot p_j + \xi_j + \varepsilon_{ij}. \tag{9} \]

For the Logit model, in Equation 4, we used \(V_j(\alpha, \beta)\) to separate the population preferences from the idiosyncratic behavior of the consumer. We now do the same for the BLP model, separating the mean population preferences from the demographic-specific preferences. So, we write \(\beta_k(T_i) = \bar{\beta}_k + \beta_k^{T} T_i\), where \(\bar{\beta}_k\) is the mean of the preference distribution and \(\beta_k^{T}\) is a vector capturing the variation in preferences across consumer types. Similarly, we model \(\alpha\) as a function of income \(I_i\): \(\alpha(I_i) = \bar{\alpha} + \alpha_I I_i\). Notice that we assume \(\alpha_I\) and \(\beta^{T}\) to be independent. We rewrite \(US_{ij}\) as:

\[ US_{ij} = \sum_k \left(\bar{\beta}_k + \beta_k^{T} T_i\right) \cdot x_{kj} + \xi_j - \left(\bar{\alpha} + \alpha_I I_i\right) \cdot p_j + \varepsilon_{ij}. \tag{10} \]

We use \(\delta_j = -\bar{\alpha} \cdot p_j + \sum_k \bar{\beta}_k \cdot x_{kj} + \xi_j\) to represent the mean utility of product \(j\). Then, as in the Logit model, we derive the choice probability for \(j\) by integrating over the population demographic and income distributions \(P(T)\) and \(P(I)\):

\[ P(\text{choice}_j) = \int\!\!\int \frac{\exp\left(\delta_j - \alpha_I I \, p_j + \sum_k \beta_k^{T} T \, x_{kj}\right)}{1 + \sum_l \exp\left(\delta_l - \alpha_I I \, p_l + \sum_k \beta_k^{T} T \, x_{kl}\right)} \, dP(T) \, dP(I) \tag{11} \]

We explain next how we compute this integral and how we extract the parameters that capture the population preferences.

3.2.2 Estimation Methodology

With the model in hand, we now discuss how we identify the unknown parameters \(\delta_j\), \(\alpha_I\), and \(\beta^{T}\). We apply methods similar to those used in [6, 7] and [25].
In general, we estimate the parameters by searching the parameter space iteratively, using the following steps:

1. Initialize the parameters \(\delta_j^{(0)}\) and \(\theta^{(0)} = (\alpha_I^{(0)}, \beta^{T(0)})\) using a random choice of values.
2. Estimate the market shares \(s_j\) given \(\theta\) and \(\delta\).
3. Estimate the mean utilities \(\delta_j\) that best match the observed market shares.
4. Find the parameters \(\bar{\alpha}\) and \(\bar{\beta}_k\) that minimize the unexplained remaining error in \(\delta_j\), and evaluate the generalized method of moments (GMM) objective function.
5. Use the Nelder-Mead simplex algorithm to update the parameter values for \(\theta = (\alpha_I, \beta^{T})\) and go to Step 2, until the GMM objective function is minimized.

We describe the steps in more detail below.

Calculating market share \(s_j\): To form the market equations (i.e., model-predicted market share = observed market share), we need two things: the right-hand side \(s^{obs}_j\), which can be observed from our transaction data, and the left-hand side \(s_j\), derived from Equation 11. Unfortunately, the integral in Equation 11 has no analytic solution. To approximate it, we proceed as follows: Given the demographic distribution, we “generate” a consumer randomly, with a known demographic and income and, therefore, known preferences. Then, using the standard Logit model (Equation 6), we generate the choice of product for this consumer. For example, assume that we have the following joint demographic distribution of travel purpose and age group:

              Age ≤ 45   Age > 45
  Business      15%        15%
  Family        30%        40%

In this case, we have a 40% probability of generating a “sample consumer” with family travel purpose and age above 45. By repeating the process and obtaining \(N_T\) samples of demographics \(T_m\) and \(N_I\) samples of income \(I_n\), we can compute an unbiased estimator of the Equation 11 integral:⁵

\[ s_j(\delta_j \mid \theta) \approx \frac{1}{N_I} \frac{1}{N_T} \sum_{n=1}^{N_I} \sum_{m=1}^{N_T} \frac{\exp\left(\delta_j - \alpha_I I_n \, p_j + \sum_k \beta_k^{T} T_m \, x_{kj}\right)}{1 + \sum_l \exp\left(\delta_l - \alpha_I I_n \, p_l + \sum_k \beta_k^{T} T_m \, x_{kl}\right)} \tag{12} \]

Estimate mean utility \(\delta_j\): Since we know how to compute market shares from the parameters, we can now find the value of \(\delta_j\) that best “fits” the observed market shares. (Notice that, conditional on \(\theta = (\alpha_I, \beta^{T})\), the market share \(s_j\) can be viewed as a function of the mean utility \(\delta_j\).) We apply the contraction mapping method recommended by Berry [6], which computes \(\delta\) iteratively:

\[ \delta_j^{(t+1)} = \delta_j^{(t)} + \left( \ln(s^{obs}_j) - \ln\left(s_j(\delta_j^{(t)} \mid \theta)\right) \right) \tag{13} \]

The procedure is guaranteed to converge [6] and finds the \(\delta_j\) that satisfies \(s_j(\delta_j \mid \theta) = s^{obs}_j\).

⁵ We use \(N_T = N_I = 100\) in our study.
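The simulated share estimator (Equation 12) and Berry's contraction mapping (Equation 13) can be sketched together in NumPy. This covers only Steps 2 and 3 for a fixed \(\theta\); the GMM objective and the Nelder-Mead outer loop are omitted. The market (three hotels with two binary features), the values of \(\alpha_I\) and \(\beta^{T}\), the lognormal income draws, and the zero starting value for \(\delta\) (rather than a random one) are hypothetical choices; consumer types are drawn from the joint travel-purpose/age table above, with \(N_T = N_I = 100\) as in footnote 5.

```python
import numpy as np


def simulated_shares(delta, prices, x, alpha_inc, beta_type, incomes, types):
    """Simulation estimator of Eq. (12).

    delta: (J,) mean utilities; prices: (J,); x: (J, K) characteristics;
    alpha_inc: scalar alpha_I; beta_type: (K, D), row k is the vector beta_k^T;
    incomes: (N_I,) sampled incomes; types: (N_T, D) sampled type vectors.
    """
    # -alpha_I * I_n * p_j for every sampled income, shape (N_I, 1, J)
    income_dev = -alpha_inc * incomes[:, None, None] * prices[None, None, :]
    # sum_k (beta_k^T . T_m) * x_kj for every sampled type, shape (1, N_T, J)
    type_dev = (types @ beta_type.T @ x.T)[None, :, :]
    expu = np.exp(delta[None, None, :] + income_dev + type_dev)
    # Logit probabilities; the "1 +" is the buy-nothing option with utility 0.
    probs = expu / (1.0 + expu.sum(axis=2, keepdims=True))
    # Average over all N_I x N_T simulated consumers.
    return probs.mean(axis=(0, 1))


def contraction(s_obs, prices, x, alpha_inc, beta_type, incomes, types,
                tol=1e-12, max_iter=10_000):
    """Berry's fixed-point iteration (Eq. 13) for a fixed theta."""
    delta = np.zeros_like(s_obs)
    for _ in range(max_iter):
        s = simulated_shares(delta, prices, x, alpha_inc, beta_type, incomes, types)
        new_delta = delta + np.log(s_obs) - np.log(s)
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("contraction mapping did not converge")


rng = np.random.default_rng(0)
# Three hypothetical hotels; features = (conference center, pool).
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
prices = np.array([1.0, 0.8, 1.5])            # hypothetical, in hundreds of dollars
beta_type = np.array([[1.0, 0.0],             # conference center, by (business, age > 45)
                      [-0.5, 0.2]])           # pool, by (business, age > 45)
alpha_inc = 0.3                               # hypothetical alpha_I

# N_T = N_I = 100 draws; types follow the joint travel-purpose/age table.
cells = np.array([[1, 0], [1, 1], [0, 0], [0, 1]], dtype=float)  # [business, age > 45]
types = cells[rng.choice(4, size=100, p=[0.15, 0.15, 0.30, 0.40])]
incomes = rng.lognormal(mean=0.0, sigma=0.5, size=100)

# Generate "observed" shares from known mean utilities, then recover them.
true_delta = np.array([0.5, -0.2, 1.0])
s_obs = simulated_shares(true_delta, prices, x, alpha_inc, beta_type, incomes, types)
delta_hat = contraction(s_obs, prices, x, alpha_inc, beta_type, incomes, types)
print(np.round(delta_hat, 6))
```

Because the observed shares here are generated from known mean utilities with the same simulation draws, the contraction should recover those utilities; with real data, `s_obs` would come from transaction counts, and the recovered \(\delta_j\) would feed the regression in Step 4.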