consuming operation. In summary, the above issues can be attributed to the task and model mismatch SQL is a general-purpose query language, which makes it a less intuitive and a less useful tool for users in the vertical" application domain of recommender systems, where SQL may not have some specialized capabilities important for recommender systems. Also, SQL is based on the relational data model, and multidimensional recommendations on the multidimensional model 4] would need to be mapped into the relational model to support SQL queries, which leads to various translation problems. To avoid these issues, it is advantageous to develop a specialized recommendation language based on the characteristics of the application domain that also supports the multidimensional recommendation model, as we do in the To provide flexible and user-driven recommendations and to address the previously specified issues with using SQL as a recommendation language, we designed a new recommendation query language REQUEST, which allows its users to express in a flexible manner a broad range of recommendations that are tailored to their own individual needs and, therefore, more accurately reflect their interests. For example, the earlier recommendation for Tom can be expressed in REQUEST as RECOMMEND Movie, Time TO User, Companion USING MoVie Recommendet RESTRICT USer Name=“Tom” AND Time. Timeofweek= weekend” AND Companion.Type=ˇ Girlfriend ASed oN Personal Rating SHOW TOP 3 where Movie Recommender is a 5-dimensional cube of ratings having dimensions User Movie. Time Companion, and Theater; also, Personal Rating represents the ratings measure for the cube The above REqueSt query is based on the olaP paradigm [9], which is a natural choice for querying multidimensional recommender systems, since the data model of REQUEST matches the multidimensional data model of the ratings cube. Besides reQuest, we also present a multidimensional REQUEST is an acronym for REcommendation QUEry STatements. The initial query language, called RQL, was introduced in an earlier workshop paper [5], where only the preliminary ideas of how to define the query language were presented. In this paper, we systematically redesigned the language by formally introducing its syntax, semantics, and the corresponding recommendation algebra. This allowed us to significantly extend capabilities of the language over its preliminary version [5]. To reflect these major changes, we renamed the language from rQl to REQUEST
5 consuming operation. In summary, the above issues can be attributed to the task and model mismatch. SQL is a general-purpose query language, which makes it a less intuitive and a less useful tool for users in the “vertical” application domain of recommender systems, where SQL may not have some specialized capabilities important for recommender systems. Also, SQL is based on the relational data model, and multidimensional recommendations on the multidimensional model [4] would need to be mapped into the relational model to support SQL queries, which leads to various translation problems. To avoid these issues, it is advantageous to develop a specialized recommendation language based on the characteristics of the application domain that also supports the multidimensional recommendation model, as we do in the paper. To provide flexible and user-driven recommendations and to address the previously specified issues with using SQL as a recommendation language, we designed a new recommendation query language REQUEST, 1 which allows its users to express in a flexible manner a broad range of recommendations that are tailored to their own individual needs and, therefore, more accurately reflect their interests. For example, the earlier recommendation for Tom can be expressed in REQUEST as RECOMMEND Movie, Time TO User, Companion USING MovieRecommender RESTRICT User.Name = “Tom” AND Time.TimeOfWeek=“weekend” AND Companion.Type = “Girlfriend” BASED ON PersonalRating SHOW TOP 3 where MovieRecommender is a 5-dimensional cube of ratings having dimensions User, Movie, Time, Companion, and Theater; also, PersonalRating represents the ratings measure for the cube. The above REQUEST query is based on the OLAP paradigm [9], which is a natural choice for querying multidimensional recommender systems, since the data model of REQUEST matches the multidimensional data model of the ratings cube. Besides REQUEST, we also present a multidimensional 1 REQUEST is an acronym for REcommendation QUEry STatements. The initial version of our recommendation query language, called RQL, was introduced in an earlier workshop paper [5], where only the preliminary ideas of how to define the query language were presented. In this paper, we systematically redesigned the language by formally introducing its syntax, semantics, and the corresponding recommendation algebra. This allowed us to significantly extend capabilities of the language over its preliminary version [5]. To reflect these major changes, we renamed the language from RQL to REQUEST
ecommendation algebra that is used for defining certain" core"parts of REQUEST queries. We also describe how these core reQueST queries can be processed by mapping them into this algebra This paper makes the following contributions. It proposes language REQUEST for expressing flexible user-driven recommendations and presents its syntax and semantics. It also presents recommendation algebra RA, which enhances the systematic definition of REQUEST. We also show how the core rEQUeSt queries can be mapped into RA, thus providing a way to process these queries and compare the expressive power of REQUESt and ra 2. Background: Multidimensional Recommender Systems A multidimensional ratings cube is defined as a tuple(D, M, H, E, L)as follows Dimensions D). D=id,, d,2,. dni is a set of n dimensions, where each d, is a dimension name. For example, in addition to the standard User and Movie dimensions of the traditional movie recommender systems, such as Movielens [19], we consider other contextual dimensions [4, 5], such as Time, Theater and Companion, i.e., D=(User, Movie, Time, Theater, Companion) Attribute Hierarchies(H). Each dimension d; is represented by a set of attributes AF(ai, ., ait) where each ai is an attribute name; e.g. Atime=(Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year). The domain of attribute x of dimension d is denoted as dom(dx), e.g., dom(Time. DayOfWeek)= Mon, Tue, Wed, Thu, Fri, Sat, Sun) and dom(Time. TimeOfWeek)= weekday, weekend j The multidimensional recommendation model allows for OLAP-based aggregation hierarchies [4, 5 that help aggregate ratings according to the methods described in [4]. In particular, attributes A, of dimension d; form a directed acyclic graph(i.e, a hierarchy )H;=(Ai, Ei) with set of nodes A; (i.e, each ode corresponds to an attribute)and set of edges Ei. There exists a directed edge in Hi from attribute E Ai to attribute y e Ai, iff every value of x uniquely determines the value of y, i.e., if attribute y is functionally dependent on attribute x. Such an edge will be denoted (x, y)or x]y. We will assume that has a single root node, Root(Hi, which we will call the key dimension attribute, consistent with the standard database terminology. Let H=( H1,..., Hn) Given hierarchy H, and attribute di- x EAi, we define SubGraph(H, d; x)to be a subgraph of H; rooted
6 recommendation algebra that is used for defining certain “core” parts of REQUEST queries. We also describe how these core REQUEST queries can be processed by mapping them into this algebra. This paper makes the following contributions. It proposes language REQUEST for expressing flexible user-driven recommendations and presents its syntax and semantics. It also presents recommendation algebra RA, which enhances the systematic definition of REQUEST. We also show how the core REQUEST queries can be mapped into RA, thus providing a way to process these queries, and compare the expressive power of REQUEST and RA. 2. Background: Multidimensional Recommender Systems A multidimensional ratings cube is defined as a tuple (D, M, H, E, L) as follows. Dimensions (D). D = {d1, d2, …, dn} is a set of n dimensions, where each di is a dimension name. For example, in addition to the standard User and Movie dimensions of the traditional movie recommender systems, such as MovieLens [19], we consider other contextual dimensions [4, 5], such as Time, Theater and Companion., i.e., D = {User, Movie, Time, Theater, Companion}. Attribute Hierarchies (H). Each dimension di is represented by a set of attributes Ai={ai1, …, ait} where each aij is an attribute name; e.g. Atime={Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year}. The domain of attribute x of dimension d is denoted as dom(d.x), e.g., dom(Time.DayOfWeek) = { Mon, Tue, Wed, Thu, Fri, Sat, Sun } and dom(Time.TimeOfWeek) = { weekday, weekend }. The multidimensional recommendation model allows for OLAP-based aggregation hierarchies [4, 5] that help aggregate ratings according to the methods described in [4]. In particular, attributes Ai of dimension di form a directed acyclic graph (i.e., a hierarchy) Hi = (Ai, Ei) with set of nodes Ai (i.e., each node corresponds to an attribute) and set of edges Ei. There exists a directed edge in Hi from attribute x ∈ Ai to attribute y ∈ Ai, iff every value of x uniquely determines the value of y, i.e., if attribute y is functionally dependent on attribute x. Such an edge will be denoted (x, y) or xÆy. We will assume that Hi has a single root node, Root(Hi), which we will call the key dimension attribute, consistent with the standard database terminology. Let H = { H1, …, Hn }. Given hierarchy Hi and attribute di.x ∈Ai, we define SubGraph(Hi, di.x) to be a subgraph of Hi rooted
at dix, i.e., it defines the graph containing all the nodes and edges reachable from di-x Elements(E). Each dimension di in a cube is represented by a set of elements Er. For instance dimension Movie in our example is represented by all the movies available for the users to rate. For simplicity and without loss of generality, we use the domain of the key dimension attribute to represent the set of elements of d, i.e., E;: dom(root(Hi). An example of the elements' set for the User dimension would be a set of all user IDs available in the data. Let E=( El,..., Eni Measures(M). M=(mI, m2, ., mk) represents a set of measures, where each m, is a different type of a rating from domain dom(mi). The measures can either be numeric or Boolean. A numeric measure usually represents a discrete finite ordered value, e.g, a movie rating on the scale of (1,., N.A Boolean measure can be used to represent a"status flag denoting the state of a rating or its specific characteristic, e.g., indicating whether a given movie has been seen by a given user Example 1. Consider the application for recommending movies to users that has the following dimensions, each dimension defined by the attributes specified in parentheses Movie: the set of all the movies that can be recommended it is defined as Movie(MovieID Title, Length, Release Y ear, Director, Genre) User: the people to whom movies are recommended; it is defined as User(UserlD, Name, Address, Age, Gender, Profession) Theater: the movie theaters showing the movies; it is defined as Theater(TheaterlD, Name Address, Capacity, City, Time: the time when the movie can be or has been seen; it is defined as Time(Date, DayOfWeek, TimeofWeek, Month, Quarter, Year) ompanion: represents a person or a group of persons with whom one can see the movie. It defined as Companion( companion Type), where attribute companionType has values and“ others We also use three rating measures in this example: PublicRating, a numeric measure specifying how much the general public liked the movie; Personal Rating, a numeric measure specifying how much a particular person liked or is predicted to like the movie in the settings specified by the Time, Theater, and Companion dimensions; and Consumed, a Boolean measure specifying whether or not a given user has actually seen a given movie in a given context. The Personal Rating assigned to a movie by a person depends on where and how the movie has been
7 at di.x, i.e., it defines the graph containing all the nodes and edges reachable from di.x. Elements (E). Each dimension di in a cube is represented by a set of elements Ei. For instance, dimension Movie in our example is represented by all the movies available for the users to rate. For simplicity and without loss of generality, we use the domain of the key dimension attribute to represent the set of elements of di, i.e., Ei := dom(Root(Hi)). An example of the elements’ set for the User dimension would be a set of all user IDs available in the data. Let E = { E1, …, En }. Measures (M). M = {m1, m2, …, mk} represents a set of measures, where each mi is a different type of a rating from domain dom(mi). The measures can either be numeric or Boolean. A numeric measure usually represents a discrete finite ordered value, e.g., a movie rating on the scale of {1, …, N}. A Boolean measure can be used to represent a “status flag” denoting the state of a rating or its specific characteristic, e.g., indicating whether a given movie has been seen by a given user. Example 1. Consider the application for recommending movies to users that has the following dimensions, each dimension defined by the attributes specified in parentheses: • Movie: the set of all the movies that can be recommended; it is defined as Movie(MovieID, Title, Length, ReleaseYear, Director, Genre). • User: the people to whom movies are recommended; it is defined as User(UserID, Name, Address, Age, Gender, Profession). • Theater: the movie theaters showing the movies; it is defined as Theater(TheaterID, Name, Address, Capacity, City, State, Country). • Time: the time when the movie can be or has been seen; it is defined as Time(Date, DayOfWeek, TimeOfWeek, Month, Quarter, Year). • Companion: represents a person or a group of persons with whom one can see the movie. It is defined as Companion(companionType), where attribute companionType has values “alone”, “friends”, “girlfriend/boyfriend”, “family”, “co-workers”, and “others”. We also use three rating measures in this example: PublicRating, a numeric measure specifying how much the general public liked the movie; PersonalRating, a numeric measure specifying how much a particular person liked or is predicted to like the movie in the settings specified by the Time, Theater, and Companion dimensions; and Consumed, a Boolean measure specifying whether or not a given user has actually seen a given movie in a given context. The PersonalRating assigned to a movie by a person depends on where and how the movie has been
seen, with whom and at what time. Finally, we consider the following aggregation hierarchies Movie: MovielD Genre; User: UserID- Age, UserID Gender, UserID+Profession Theater: TheaterID> City State Country; Time: Date, DayOfWeek TimeOfWeek, Date> Month, Quarter, Year Cube Cells(L). Each cube is a partially defined rating function R from an n-dimensional space of E,X E. to a k- dimensional space of measures,ie.,R:E×…×En→>dom(m1)×….xdom(mk) Alternatively a cube can be perceived as a set of cells L, each cell /E L consisting of the tuple(address, content), i.e., 7 =( address, conten), where address=(a1,…,an),a∈E, and content=(B1,….,B),B∈dom(m) Since the mapping R is partial, content can also have value NULL for some cells. We also use the notation L[address]= content to refer to a specific cell, and l[address]. m; to refer to a specific measure within a cell. Furthermore, the ratings R(al,., an)of the recommendation space S=Eix E2x...x En are either explicitly provided by the users or are implicitly inferred by the system as described below. For example, R(Aviator, Jane, theaters, 2/19/2005, boyfriend)=(6,8, True)means that Jane gave rating 6 (i.e, Personal Rating=6)to"Aviator"that she actually saw (i.e, Consumed= True) with her boyfriend on February 19, 2005 in movie theater 5, but the general public gave the movie the rating of 8(i.e ublicRating =8) Given these preliminaries, the recommendation problem is defined as follows. First, the system need to estimate the unknown ratings and make the rating function R total [4]. Second, to make a recommendation, one needs to select certain non-overlapping""dimensions di,..., dik(k < n)and certain"for whom"dimensions dl,., diI(<n)and, accordingly, recommend for each tuple (a,l, c∈E1x, nEil tuple(a1,…,a;)∈Enx,, XEik maximizing the rating R(a1,…,an) across all the tuples (al,, an)coinciding with(al,., ainE Enix..xEjr on corresponding dimensions dil,.,djl Since the rating cube is only partially filled, it is important to estimate the unspecified ratings for ecommendation purposes. This multidimensional rating estimation problem is addressed in [4], where the reduction-based method of estimating unknown ratings in terms of the known ratings is presented. To
8 seen, with whom and at what time. Finally, we consider the following aggregation hierarchies: Movie: MovieID Æ Genre; User: UserID Æ Age, UserID Æ Gender, UserID Æ Profession; Theater: TheaterID Æ City Æ State Æ Country; Time: Date Æ DayOfWeek Æ TimeOfWeek, Date Æ MonthÆ QuarterÆ Year. Cube Cells (L). Each cube is a partially defined rating function R from an n-dimensional space of E1 × … × En to a k-dimensional space of measures, i.e., R: E1 × …× En → dom(m1) × … × dom(mk). Alternatively, a cube can be perceived as a set of cells L, each cell l ∈ L consisting of the tuple (address, content), i.e., l = (address, content), where address = (α1, …, αn), αi ∈ Ei, and content = (β1, …, βk), βi ∈ dom(mi). Since the mapping R is partial, content can also have value NULL for some cells. We also use the notation L[address] = content to refer to a specific cell, and L[address].mj to refer to a specific measure within a cell. Furthermore, the ratings R(α1, …, αn) of the recommendation space S = E1× E2× …× En are either explicitly provided by the users or are implicitly inferred by the system as described below. For example, R(Aviator, Jane, theater5, 2/19/2005, boyfriend) = (6, 8, True) means that Jane gave rating 6 (i.e., PersonalRating = 6) to “Aviator” that she actually saw (i.e., Consumed = True) with her boyfriend on February 19, 2005 in movie theater 5, but the general public gave the movie the rating of 8 (i.e., PublicRating = 8). Given these preliminaries, the recommendation problem is defined as follows. First, the system needs to estimate the unknown ratings and make the rating function R total [4]. Second, to make a recommendation, one needs to select certain non-overlapping “what” dimensions di1, …, dik (k < n) and certain “for whom” dimensions dj1, …, djl (l < n) and, accordingly, recommend for each tuple (αj1, …, αjl)∈ Ej1×…×Ejl tuple (αi1, …, αik)∈ Ei1×…×Eik maximizing the rating R(α1,…, αn) across all the tuples (α1,…, αn) coinciding with (αj1, …, αjl)∈ Ej1×…×Ejl on corresponding dimensions dj1,…,djl. Since the rating cube is only partially filled, it is important to estimate the unspecified ratings for recommendation purposes. This multidimensional rating estimation problem is addressed in [4], where the reduction-based method of estimating unknown ratings in terms of the known ratings is presented. To
understand how it works assume that we want to recommend a movie to jane Doe who wants to see it with her boyfriend on Saturday in a movie theater. If the Time dimension is partitioned into weekend and weekday components and since Saturday falls on a weekend, the reduction-based approach uses only the ratings for the movies seen on weekends by customers with their boyfriends/girlfriends in the movi theaters in order to provide recommendations for Jane Doe. It was shown that this approach outperforms the standard collaborative filtering in multidimensional settings under certain conditions [4]. Alternative multidimensional rating estimation methods include heuristic- and model-based approaches [2] In this paper we focus on the querying capabilities of the REQUEST language and, therefore, we assume that the multidimensional ratings cube is fully pre-computed before users start issuin recommendation queries. In other words, we assume that all the unknown ratings have been estimated using any of the aforementioned rating estimation techniques. How to perform rating estimation"on demand" based on the query that was issued on a partially filled ratings cube constitutes an interesting future research problem, as we mention in Section The work described in [4] focuses on presenting the multidimensional recommendation model and does not specify how to express a wide variety of recommendations that are possible in multidimensional settings. In the next section we address this limitation by presenting the query language REQUESt for expressing such recommendations 3. Recommendation Query Language REQUEST In this section, we describe the language by providing various examples of rEQUESt queries in Section 3.1, then present its syntax in Section 3.2 and semantics in Section 3. 3 3.1. Introducing REQUEST via Examples All the examples presented in this section are based on the 5-dimensional MovieRecommender schema from Example 1. The first example presents the most basic and traditional recommendation request Query 1: Recommend the best movies to users RECOMMEND Movie To User USING MOVIe Recommender based oN PersonalRating
9 understand how it works, assume that we want to recommend a movie to Jane Doe who wants to see it with her boyfriend on Saturday in a movie theater. If the Time dimension is partitioned into weekend and weekday components and since Saturday falls on a weekend, the reduction-based approach uses only the ratings for the movies seen on weekends by customers with their boyfriends/girlfriends in the movie theaters in order to provide recommendations for Jane Doe. It was shown that this approach outperforms the standard collaborative filtering in multidimensional settings under certain conditions [4]. Alternative multidimensional rating estimation methods include heuristic- and model-based approaches [2]. In this paper we focus on the querying capabilities of the REQUEST language and, therefore, we assume that the multidimensional ratings cube is fully pre-computed before users start issuing recommendation queries. In other words, we assume that all the unknown ratings have been estimated using any of the aforementioned rating estimation techniques. How to perform rating estimation “on demand” based on the query that was issued on a partially filled ratings cube constitutes an interesting future research problem, as we mention in Section 6. The work described in [4] focuses on presenting the multidimensional recommendation model and does not specify how to express a wide variety of recommendations that are possible in multidimensional settings. In the next section we address this limitation by presenting the query language REQUEST for expressing such recommendations. 3. Recommendation Query Language REQUEST In this section, we describe the language by providing various examples of REQUEST queries in Section 3.1, then present its syntax in Section 3.2 and semantics in Section 3.3. 3.1. Introducing REQUEST via Examples All the examples presented in this section are based on the 5-dimensional MovieRecommender schema from Example 1. The first example presents the most basic and traditional recommendation request. Query 1: Recommend the best movies to users: RECOMMEND Movie TO User USING MovieRecommender BASED ON PersonalRating