12 CHAPTER 2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSoffs among expressiveness and ease of realizing scalable implementations.2.2.1Topic-based ModelIn the topic-based model notifications are grouped in topics (or subjects) i.e.,a subscriber declares its interest for a particular topic and will receiveall noti-ficationsrelated tothat topic.Inotherwords,thefilter o.f of asubscriptiong is simply the specification of atopic.Each topic corresponds to a logicalevent channel ideally connecting each possible publisher to all interested sub-scribers. That is, there exists a static association between a channel and allits subscribers,then when a notificationispublished,the systemdoes nothave to calculate all thereceivers.Topic-based model has been the solutionadopted in all early pub/sub incarnations. Examples of systems that fall un-der this category are TIB/RV [77], iBus [3], SCRIBE [22], Bayeux [117] andthe CORBA Notification Service [49].Topics are equivalent to the notion of groups used for instance in the con-text of group communication [86] (e.g., for replication). This equivalence isnot very surprising, since the first systems to offer a form of publish/subscribeinteraction were actually extensions of group communication toolkits [25, 12]and the subscription scheme was thus inherently based on groups [10]. Subse-quently, subscribing to a topic can be viewed as becoming member of a groupand publishing a notification for a topic translates accordingly to broadcast-ing that notification among the members of the corresponding group.Thanksto the topic-group equivalence the topic-based solution mechanism can driveto very efficient implementations, exploiting directly on one hand the largeamount of research work in the multicast area, and on the other the networklevel multicast implementations for diffusing notifications.The main drawback of the topic-based model is the very limited expres-siveness it offers to subscribers. Let us consider, for example an applicationmanaging stock quotes.Though notifications may be structured to containseveral attributes(for example,the name of the quote,its current value,itsvariation), onlyoneattribute maybechosenas being selectivefor thedeliveryof notifications (i.e.the topic).In the example, it may be the quote name.As a consequence,a subscriber interested in a subset of notifications relatedtoaspecificquote(forexampleonlythosesignallingariseof thequoteabovea certain value) will receive also all the other notifications that belong to thesametopic.To address problems related to low expressiveness of topics, several solu-tionsareexploited inpub/subimplementations.For example,thetopic-basedmodel is often extended toprovidehierarchical organization of topics,insteadofasimpleflatstructureofthetopicspace(suchasin[49)i.AtopicBcan1Sometimes, the word subject is used to refer to hierarchical topics instead of being simply
12 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS offs among expressiveness and ease of realizing scalable implementations. 2.2.1 Topic-based Model In the topic-based model notifications are grouped in topics (or subjects) i.e., a subscriber declares its interest for a particular topic and will receive all noti- fications related to that topic. In other words, the filter σ.f of a subscription σ is simply the specification of a topic. Each topic corresponds to a logical event channel ideally connecting each possible publisher to all interested subscribers. That is, there exists a static association between a channel and all its subscribers, then when a notification is published, the system does not have to calculate all the receivers. Topic-based model has been the solution adopted in all early pub/sub incarnations. Examples of systems that fall under this category are TIB/RV [77], iBus [3], SCRIBE [22], Bayeux [117] and the CORBA Notification Service [49]. Topics are equivalent to the notion of groups used for instance in the context of group communication [86] (e.g., for replication). This equivalence is not very surprising, since the first systems to offer a form of publish/subscribe interaction were actually extensions of group communication toolkits [25, 12] and the subscription scheme was thus inherently based on groups [10]. Subsequently, subscribing to a topic can be viewed as becoming member of a group and publishing a notification for a topic translates accordingly to broadcasting that notification among the members of the corresponding group. Thanks to the topic-group equivalence the topic-based solution mechanism can drive to very efficient implementations, exploiting directly on one hand the large amount of research work in the multicast area, and on the other the network level multicast implementations for diffusing notifications. The main drawback of the topic-based model is the very limited expressiveness it offers to subscribers. Let us consider, for example an application managing stock quotes. Though notifications may be structured to contain several attributes (for example, the name of the quote, its current value, its variation), only one attribute may be chosen as being selective for the delivery of notifications (i.e. the topic). In the example, it may be the quote name. As a consequence, a subscriber interested in a subset of notifications related to a specific quote (for example only those signalling a rise of the quote above a certain value) will receive also all the other notifications that belong to the same topic. To address problems related to low expressiveness of topics, several solutions are exploited in pub/sub implementations. For example, the topic-based model is often extended to provide hierarchical organization of topics, instead of a simple flat structure of the topic space (such as in [49])1 . A topic B can 1Sometimes, the word subject is used to refer to hierarchical topics instead of being simply
132.2.SUBSCRIPTIONMODELSbe then defined as a sub-topic of an existing topic A. Notifications matchingB will be received by all clients subscribed to both A and B. Implementa-tions also often include convenience operators, such as wildcard characters,for subscribing to more than one topic with a single subscription.Thoughthesetechniquesgivetheapplication developer effective solutionsto overcomeexpressiveness limitations of the topic-based scheme, they does not still alterthe very nature of the topic concept, i.e. a simple organization of subscribersinto group with thegreat advantage of a simpleand efficient implementation,that however may not be sufficient with those applications where the inter-est of subscribers presents a high variability and cannot be clustered withsimplicity.2.2.2Content-based ModelIn the content-based variant, subscribers express their interest by specifyingconditions over the content of notifications they want to receive.In otherwords, a filter in a subscription is a query composed by a conjunction of con-straints over the values of attributes of the notification?. Possible constraintsdepend on the attribute type and on the subscription language. Most sub-scription languages comprise equality and comparison operators as well asregular expressions[20,100,44].Generally constraintscan bejoined insidefiltersthroughAND/ORexpressions3A completespecificationof content-based subscriptionmodelscanbefoundin[75].Examplesof systemsthatfallunderthecontent-based categoryre Gryphon [53], SIENA[102],JEDI [29]yaLeSubscribe[87], Ready [52], Hermes[83], Elvin [99]As an example of the content-based model, let us consider again notifi-cations representing stock quotes.Differently from thetopic-based scheme.a subscription can involve all the attributes of the notification, on which asubscriber can express a constraint with type-specific operators:StockName =‘IBM'and change<-3StockName =‘M*" and change >= 1In content-based publish/subscribe, notifications are not classified accord-ing tosomepre-defined external criterion (i.e.,topic name),but rather accord-ing to properties of the notifications themselves, that assume different valuesin each different notification.As a consequence the set of receivers for eachnotification cannot be determined a priori but has to be computed at publica-tion time.Then, the higher expressive power of content-based pub/sub comesa synonymous for topic.2disjunctive constraints can be treated as separate subscriptions3The complexity of the subscription language obviously influences the complexity ofmatching operation. Then it is not common to have subscription languages allowing queriesmore complex than ones in conjunctive forms. One example can be found in [16]
2.2. SUBSCRIPTION MODELS 13 be then defined as a sub-topic of an existing topic A. Notifications matching B will be received by all clients subscribed to both A and B. Implementations also often include convenience operators, such as wildcard characters, for subscribing to more than one topic with a single subscription. Though these techniques give the application developer effective solutions to overcome expressiveness limitations of the topic-based scheme, they does not still alter the very nature of the topic concept, i.e. a simple organization of subscribers into group with the great advantage of a simple and efficient implementation, that however may not be sufficient with those applications where the interest of subscribers presents a high variability and cannot be clustered with simplicity. 2.2.2 Content-based Model In the content-based variant, subscribers express their interest by specifying conditions over the content of notifications they want to receive. In other words, a filter in a subscription is a query composed by a conjunction of constraints over the values of attributes of the notification2 . Possible constraints depend on the attribute type and on the subscription language. Most subscription languages comprise equality and comparison operators as well as regular expressions [20, 100, 44]. Generally constraints can be joined inside filters through AND/OR expressions3 A complete specification of contentbased subscription models can be found in [75]. Examples of systems that fall under the content-based category are Gryphon [53], SIENA [102], JEDI [29], LeSubscribe [87], Ready [52], Hermes [83], Elvin [99]. As an example of the content-based model, let us consider again notifi- cations representing stock quotes. Differently from the topic-based scheme, a subscription can involve all the attributes of the notification, on which a subscriber can express a constraint with type-specific operators: StockName = ‘IBM’ and change < -3 StockName = ‘M*’ and change >= 1 In content-based publish/subscribe, notifications are not classified according to some pre-defined external criterion (i.e., topic name), but rather according to properties of the notifications themselves, that assume different values in each different notification. As a consequence the set of receivers for each notification cannot be determined a priori but has to be computed at publication time. Then, the higher expressive power of content-based pub/sub comes a synonymous for topic. 2disjunctive constraints can be treated as separate subscriptions 3The complexity of the subscription language obviously influences the complexity of matching operation. Then it is not common to have subscription languages allowing queries more complex than ones in conjunctive forms. One example can be found in [16]
14CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSat the price of the higher resource consumption needed to calculate for eachpublished notification the set of interested subscribers [19, 39].It is straightforward to see that a topic-based scheme can be possibly em-ulated through a content-based one, simply considering filters comprising asingle equality constraint. The opposite is not true: in particular, the channelabstraction in the topic-based scheme cannot represent fexible features of thecontent-based scheme such as comparison operators or complexconjunctivesubscriptions [68].At most, some systems (such as the COM+ NotificationService [85] and the CORBA Notification Service [51] provide an additionalcontent-based subscription language that allows to filter out notifications re-ceived from a channel.We referto this model asthe filtered topic model,making a clear distinction with the content-based model.Stressing the difference between the filtered topic approach and a purecontent-based oneallows us to clarify oncemorethe implications of the sub-scription model over the system implementation. The Notification Serviceexploits subscriptions to derive the set of clients to which the notificationmust be sent to. In a pure content-based system,a notificationthat does notmatchanysubscriptionisnotsenttoclientsavingnetworkresources.Ontheotherhand.in afiltered tocipientscanonlybedeter-mined acconotificationistheedtodetermineifitsenttoasubsoribucllenshould betopicbutdoesctuallydenotmatchthecoontent-hedhtto the client,generatinguselessnetworktraffic.Thenthehighofcontent-basedmodelcan aid to savenetworkresources,sendinga notification to all and onlytheactual subscribers.However,as we will see in Section 2.4,a content-basedsystem requires more information to be sent on the network for determiningthe recipient set.Inoverall,wecansaythat content-basedpub/subisthemostgeneralsubscription model.This is the reason why this model has gained a lot ofattention from the research community and currently still represents the mainsource of open problems in the field, especially related to efficient and scalablematching and diffusion [6].2.2.3Type-based ModelThe third alternative subscription model proposed in the literature is the type-based model [37].The type-based variant enhances common pub/sub withconcepts derived from object-oriented programming:notificationsaredeclaredas objects belonging to a specific type (obuents), which can thus encapsulateattributes as well as methods. With respect to simple, unstructured models,Types represent a more robust data model for application developer providing
14 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS at the price of the higher resource consumption needed to calculate for each published notification the set of interested subscribers [19, 39]. It is straightforward to see that a topic-based scheme can be possibly emulated through a content-based one, simply considering filters comprising a single equality constraint. The opposite is not true: in particular, the channel abstraction in the topic-based scheme cannot represent flexible features of the content-based scheme such as comparison operators or complex conjunctive subscriptions [68]. At most, some systems (such as the COM+ Notification Service [85] and the CORBA Notification Service [51]) provide an additional content-based subscription language that allows to filter out notifications received from a channel. We refer to this model as the filtered topic model, making a clear distinction with the content-based model. Stressing the difference between the filtered topic approach and a pure content-based one allows us to clarify once more the implications of the subscription model over the system implementation. The Notification Service exploits subscriptions to derive the set of clients to which the notification must be sent to. In a pure content-based system, a notification that does not match any subscription is not sent to any client, saving network resources. On the other hand, in a filtered topic-based system, recipients can only be determined according to the topic they subscribed to. Only once a notification is sent to a subscribing client, the content-based filter is used to determine if it should be actually delivered to it. If the notification matches a topic but does not match the content-based filter, it is not delivered to the client, generating useless network traffic. Then the higher expressiveness of content-based model can aid to save network resources, sending a notification to all and only the actual subscribers. However, as we will see in Section 2.4, a content-based system requires more information to be sent on the network for determining the recipient set. In overall, we can say that content-based pub/sub is the most general subscription model. This is the reason why this model has gained a lot of attention from the research community and currently still represents the main source of open problems in the field, especially related to efficient and scalable matching and diffusion [6]. 2.2.3 Type-based Model The third alternative subscription model proposed in the literature is the typebased model [37]. The type-based variant enhances common pub/sub with concepts derived from object-oriented programming: notifications are declared as objects belonging to a specific type (obvents), which can thus encapsulate attributes as well as methods. With respect to simple, unstructured models, Types represent a more robust data model for application developer providing
15ARCHITECTURALMODELS2.3.NotficationNotificationServiceServiceNotificatMiddlewareOverlayServi(IIOP.DCOM..NotificatinrTransportTransportTransportServiceNetworkNetworkNetworkNetwork(b)(a)(c)Figure 2.2:Pub/subArchitectural Modelstype-safety to be checked by the Notification Service, rather than inside theapplication[41]In a type-based subscription the declaration of a desired type is the maindiscriminating attribute. That is, with respect to the aforementioned models,type-based pub/sub poses itself somehow in the middle, by giving a coarse-grained structure on notifications (like in topic-based) on which fine-grainedconstraints can be expressed over attributes (like in content-based) or overmethods (as a consequence of the object-oriented approach). Type-basedpub/sub in this sense resembles the filtered topic model.Type-based pub/sub was firstly proposed in [42] and fully developed in [37],where it is described an extension to Java for the management of distributedobvents and type-based subscriptions.2.3Architectural ModelsIn this Section wefocusonanotherbasicaspect ofa pub/sub system,thatisthe architectureof a distributed Notification Service, with respect to howthe various parts it is composed of are structured and what is the mechanismthey used to communicate. There are basically three solutions for realizing inpractice the architecture of a Notification Service (Figure 2.2): Relying on multicasting facilities provided by the underlying networklevels -Figure2.2 (a). Implementing a broker-level notification routing protocol -Figure 2.2(b). Exploiting a peer-to-peer overlay network infrastructurefor application-level multicasting - Figure 2.2 (c)
2.3. ARCHITECTURAL MODELS 15 Network Notification Service Network Notification Service Transport Network Notification Service Transport Middleware ( IIOP , DCOM .) Network Notification Service Overlay Transport (a) (b) ( c ) Figure 2.2: Pub/sub Architectural Models type-safety to be checked by the Notification Service, rather than inside the application [41]. In a type-based subscription the declaration of a desired type is the main discriminating attribute. That is, with respect to the aforementioned models, type-based pub/sub poses itself somehow in the middle, by giving a coarsegrained structure on notifications (like in topic-based) on which fine-grained constraints can be expressed over attributes (like in content-based) or over methods (as a consequence of the object-oriented approach). Type-based pub/sub in this sense resembles the filtered topic model. Type-based pub/sub was firstly proposed in [42] and fully developed in [37], where it is described an extension to Java for the management of distributed obvents and type-based subscriptions. 2.3 Architectural Models In this Section we focus on another basic aspect of a pub/sub system, that is the architecture of a distributed Notification Service, with respect to how the various parts it is composed of are structured and what is the mechanism they used to communicate. There are basically three solutions for realizing in practice the architecture of a Notification Service (Figure 2.2): ❒ Relying on multicasting facilities provided by the underlying network levels - Figure 2.2 (a). ❒ Implementing a broker-level notification routing protocol - Figure 2.2 (b). ❒ Exploiting a peer-to-peer overlay network infrastructure for applicationlevel multicasting - Figure 2.2 (c)
16CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSIn the following we discuss these three solutions, pointing out the draw-backs due to each of them.2.3.11NetworkMulticastingAs pointed out in previous Section, the use of a network-level multicast pro-tocol [32] as a communication layer for the Notification Service has been thenatural choice in early systems as they were substantiallytopic-based.Themain advantage of a network-level approach is that it is the easier way to re-alize many-to-many diffusion experiencing low latencies and high throughput,thanks to the small delays introduced by implementing the protocols exclu-sively involving routers and switches.We recall again that multicasting can be directly used in topic-based sys-tems, as each topic corresponds exactly to one multicast group. Using multi-castingfor content-based systems is not as straightforward because subscriberscannotbedirectlymappedtomulticastgroups78]:Thisproblemhasinspiredsome research work [91, 92, 54], aiming at finding the best possible configura-tion for multicastgroups from a given set of subscribers,where "best"meansstructuring clusters of subscribers with “similar" subscriptions,so that a no.tificationsenttoamulticastgroupwill bedelivered tomost of itsmembersThe main drawback of multicast when applied to wide-area scenarios isin its lack ofa widespread deployment[34, 101].Hence, network-level multi-casting cannot in general be considered as a feasible solution for applicationsdeployed over a WAN (for example TIB/RV or the CORBA Notification Ser-vice uses multicast only for diffusing notifications inside a local area network).Another undesirable property of network-level multicasting is the lack ofreliability guarantees.Several algorithms [111, 56,45] have been proposedoffering reliable delivery through a retransmission mechanism over the unreli-able multicast transport. However, this approach has proven not to reach thelevels of scalability required in a large-scale pub/sub system, because of thehigh overhead ofmessage retransmission,making it suitable onlyfor systemswith up to a few hundred participants [13].For this reason, multicast algorithms being designed specifically for beingapplied to pub/sub systems often belong to the class of gossip algorithms [43].Gossip (or epidemic or probabilistic) algorithms achieve reliability in a prob-abilisticsense[13],byguaranteeingthatall participants inthesystemreceiveany message only with a certain, quantifiable probability. Gossip algorithmsreach high level of scalability at the price of a small loss in reliability. Specificgossip algorithms for pub/sub systems have been proposed in [36, 38, 28]
16 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS In the following we discuss these three solutions, pointing out the drawbacks due to each of them. 2.3.1 Network Multicasting As pointed out in previous Section, the use of a network-level multicast protocol [32] as a communication layer for the Notification Service has been the natural choice in early systems as they were substantially topic-based. The main advantage of a network-level approach is that it is the easier way to realize many-to-many diffusion experiencing low latencies and high throughput, thanks to the small delays introduced by implementing the protocols exclusively involving routers and switches. We recall again that multicasting can be directly used in topic-based systems, as each topic corresponds exactly to one multicast group. Using multicasting for content-based systems is not as straightforward because subscribers cannot be directly mapped to multicast groups [78]: This problem has inspired some research work [91, 92, 54], aiming at finding the best possible configuration for multicast groups from a given set of subscribers, where “best” means structuring clusters of subscribers with “similar” subscriptions, so that a notification sent to a multicast group will be delivered to most of its members. The main drawback of multicast when applied to wide-area scenarios is in its lack of a widespread deployment [34, 101]. Hence, network-level multicasting cannot in general be considered as a feasible solution for applications deployed over a WAN (for example TIB/RV or the CORBA Notification Service uses multicast only for diffusing notifications inside a local area network). Another undesirable property of network-level multicasting is the lack of reliability guarantees. Several algorithms [111, 56, 45] have been proposed offering reliable delivery through a retransmission mechanism over the unreliable multicast transport. However, this approach has proven not to reach the levels of scalability required in a large-scale pub/sub system, because of the high overhead of message retransmission, making it suitable only for systems with up to a few hundred participants [13]. For this reason, multicast algorithms being designed specifically for being applied to pub/sub systems often belong to the class of gossip algorithms [43]. Gossip (or epidemic or probabilistic) algorithms achieve reliability in a probabilistic sense [13], by guaranteeing that all participants in the system receive any message only with a certain, quantifiable probability. Gossip algorithms reach high level of scalability at the price of a small loss in reliability. Specific gossip algorithms for pub/sub systems have been proposed in [36, 38, 28]