22CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSdistribution of the Notification Service: in particular, by avoiding subscrip-tions to be replicated throughout the whole Notification Service, allows thesystem to tackle scalability issues such as the high traffic generated by thediffusion of each subscription change, the high memory consumption at eachbroker to storesubscriptionsand also thehigh computational powerrequiredtomatchsubscriptions6Then, a criterion should be defined in order to choose for each subscriptiona corresponding target broker to which it is assigned to. Clearly, there isa strict relationship betweenthisproblem and the other onespresented inprevious Section: the assignment policy must be also considered (i) whenrouting subscriptions to the corresponding target broker and (ii) for identifying(and reaching)all possibletargetbrokers whenanotification ispublished.Given the set of all possible subscriptions SC, we can abstract the sub-scription assignmentproblembyconsideringthefollowingfunction:assign:SC-24(2.1) The meaning of assign(o) = [bi,.,bk) is the following: a broker b; Eassign(o) is the target broker for the subscriber that issued subscription o.Then, it hosts and matches all notifications related to o. Given a notificatione, the task of deriving the target broker set for e, namely event resolving, isabstracted by thefollowingfunction:resolve:2-24(2.2)Obviously, assignment and resolving are strictly related since a commoncriteria (Assignment policy)must be used to implement the assign and resolvefunctions.It is not straightforward to realize an assignment policy that can easily beimplemented both for assignment and for resolving. Subscription assignmentstrategies used in actual pub/sub systems can be summarized in the followingtwo dual policies: Access-Driven Assignment (ADA): a subscription is stored in the ac-cess points for the subscriber o.s. In this case access points and targetbrokers coincides for s. In other words, the value of assign depends onlyon o.s, not considering a.f.Filter-Driven Assignment (FDA):subscription filters are partitioned intoa set of clusters. Each subscription is assigned to a different broker de-pending on the cluster it is assigned to. Cluster are determined according6Let us recall that we assume scenarios where a large number of subscriptions must behandled by the system
22 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS distribution of the Notification Service: in particular, by avoiding subscriptions to be replicated throughout the whole Notification Service, allows the system to tackle scalability issues such as the high traffic generated by the diffusion of each subscription change, the high memory consumption at each broker to store subscriptions and also the high computational power required to match subscriptions6 . Then, a criterion should be defined in order to choose for each subscription a corresponding target broker to which it is assigned to. Clearly, there is a strict relationship between this problem and the other ones presented in previous Section: the assignment policy must be also considered (i) when routing subscriptions to the corresponding target broker and (ii) for identifying (and reaching) all possible target brokers when a notification is published. Given the set of all possible subscriptions SC, we can abstract the subscription assignment problem by considering the following function: assign : SC → 2 ∆ (2.1) The meaning of assign(σ) = {b1, ., bk} is the following: a broker bi ∈ assign(σ) is the target broker for the subscriber that issued subscription σ. Then, it hosts σ and matches all notifications related to σ. Given a notification e, the task of deriving the target broker set for e, namely event resolving, is abstracted by the following function: resolve : Ω → 2 ∆ (2.2) Obviously, assignment and resolving are strictly related since a common criteria (Assignment policy) must be used to implement the assign and resolve functions. It is not straightforward to realize an assignment policy that can easily be implemented both for assignment and for resolving. Subscription assignment strategies used in actual pub/sub systems can be summarized in the following two dual policies: ❒ Access-Driven Assignment (ADA): a subscription σ is stored in the access points for the subscriber σ.s. In this case access points and target brokers coincides for s. In other words, the value of assign depends only on σ.s, not considering σ.f. ❒ Filter-Driven Assignment (FDA): subscription filters are partitioned into a set of clusters. Each subscription is assigned to a different broker depending on the cluster it is assigned to. Cluster are determined according 6Let us recall that we assume scenarios where a large number of subscriptions must be handled by the system
2.4.BEHINDTHE SCENES OFADISTRIBUTEDNOTIFICATION23SERVICE4B1B2AB1Bo:B3Figure 2.4: Subscription Assignment Policiesto subscriptions' filters. That is, assign is the exact dual of the previouscase, considering only o.f independently from o.s.Figure 2.4 shows the two different approaches applied to an example sub-scription X, defined over a 2-dimensional space. On theleft, it is shown a ADAapproach: subscription is hosted at broker Bo that has been contacted by thesubscriber si. On the right, a FDA approach is shown, for the same brokerconfiguration.The plot on the right is a graphical depiction of a simple wayto implement the assign function: a 2-dimensional space is divided in zones,each one assigned to a broker. Figure shows the graphical representation ofthe space on a 2-Dplot.Inthis repesentation,notificationsarepointsand?subscriptions are rectangles.X falls in the zone assigned to broker Bi,thenX is hosted in BiADA is theapproach commonly used in the majority of distributed pub/subsystems (for example SIENA and Gryphon). No subscription routing is neededbecause subscriptions areheld bybrokersdirectlyconnected tothesubscribersHowever, for a correct dispatching of notifications, some routing informationhasalsotobediffusedthroughthenetwork,aswewill explainbelow.The FDA approach has been formalized in [110], (under the name sub-scription partitioning)and recently, many systems appeared following such ascheme (Scribe [22], Bayeux [117], Hermes [83]). The FDA approach is moti-vated by the fact that a controlled subscription distribution can allow to betterload balance subscription storage and management: all subscription matchingthe same notifications will be hosted by the same broker, avoiding a redun-dantmatchingtobeperformed inseveral differentbrokers.Alsonotificationrouting is simplified,consisting in the creation of single-rooted diffusiontreesstarting from target brokers and spanning all subscribers.On the other hand, the FDA approach requires all brokers to be awareoftheassignmentpolicyused.Thisproblemhasnoobvioussolution inthe
2.4. BEHIND THE SCENES OF A DISTRIBUTED NOTIFICATION SERVICE 23 s B 1 0 B 1 B 2 X B 3 X s B 1 0 B 1 B 2 X B 3 X B 1 B 0 B 2 B 3 X Figure 2.4: Subscription Assignment Policies to subscriptions’ filters. That is, assign is the exact dual of the previous case, considering only σ.f independently from σ.s. Figure 2.4 shows the two different approaches applied to an example subscription X, defined over a 2-dimensional space. On the left, it is shown a ADA approach: subscription is hosted at broker B0 that has been contacted by the subscriber s1. On the right, a FDA approach is shown, for the same broker configuration. The plot on the right is a graphical depiction of a simple way to implement the assign function: a 2-dimensional space is divided in zones, each one assigned to a broker. Figure shows the graphical representation of the space on a 2-D plot. In this representation, notifications are points and subscriptions are rectangles. X falls in the zone assigned to broker B1, then X is hosted in B1. ADA is the approach commonly used in the majority of distributed pub/sub systems (for example SIENA and Gryphon). No subscription routing is needed because subscriptions are held by brokers directly connected to the subscribers. However, for a correct dispatching of notifications, some routing information has also to be diffused through the network, as we will explain below. The FDA approach has been formalized in [110], (under the name subscription partitioning) and recently, many systems appeared following such a scheme (Scribe [22], Bayeux [117], Hermes [83]). The FDA approach is motivated by the fact that a controlled subscription distribution can allow to better load balance subscription storage and management: all subscription matching the same notifications will be hosted by the same broker, avoiding a redundant matching to be performed in several different brokers. Also notification routing is simplified, consisting in the creation of single-rooted diffusion trees starting from target brokers and spanning all subscribers. On the other hand, the FDA approach requires all brokers to be aware of the assignment policy used. This problem has no obvious solution in the
24CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSgeneral case where new brokers may join the system, because the assignmentpolicy can be difficult to adapt to a dynamic scenario: for example considera policy where the notification schema is partitioned and each partition is as-signed to a broker. When a new broker joins, a new partition must be createdand all brokers have to be kept aware of the update in the policy. Findinga scalable way to perform this update is not straightforward. [11o] proposestwodifferent solutions to theassignment problem, relying on simplifyingas-sumptions: one considers a subscription language comprising only equalityconstraints, while another allows a general content-based language but as-sumingthat thenumber of brokers doesnot changethroughout thesystemlifetime. In Section 2.5 we present assignment policies implemented in systemssuch as SCRIBE and Hermes.Let us finally point out that the problem of finding a assignment policy inFDA is equivalent to the problem of determining multicast groups in content-based multicasting, addressed in [91]. Both problems imply a clustering of thesubscription set, where in FDA each cluster of subscriptions is assigned to abroker, while in content-based multicasting it is assigned to a multicast group.We stress the fact that none of the aforementioned papers solve the problemin presence of a number of brokers that can change over time.2.4.4Event and Notification RoutingWe define another function that will be used for stating the event and notifi-cation routing problems:avail:xB×T-booleanThe meaning of avail relates to the fact that in a distributed system notallprocessesmayhavethesameviewofpublishedinformationitems,i.e.eis not availableto all processes at the sametime[5].avail returnsTRUE ata timet when a broker has received an event at t and is able to process it(match, forward or deliver to subscriber).Eventroutingmeans, consideringan event eissued by apublisher at atimeT, reaching eventually all the brokers that can host subscriptions matching e.Formally, this is expressed as:Vg : e C o, VbE assign(o) - t >T : avail(e,b,t) = TRUEOn the other hand, notification routing means eventually reaching all thebrokers that are access points for subscribers that have been identified asinterested in thenotification. Formally, this is expressed as:Ve, Vo : e E o, Vb E AP(o.s) -→ 3t > T : avail(e,b,t) = TRUE
24 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS general case where new brokers may join the system, because the assignment policy can be difficult to adapt to a dynamic scenario: for example consider a policy where the notification schema is partitioned and each partition is assigned to a broker. When a new broker joins, a new partition must be created and all brokers have to be kept aware of the update in the policy. Finding a scalable way to perform this update is not straightforward. [110] proposes two different solutions to the assignment problem, relying on simplifying assumptions: one considers a subscription language comprising only equality constraints, while another allows a general content-based language but assuming that the number of brokers does not change throughout the system lifetime. In Section 2.5 we present assignment policies implemented in systems such as SCRIBE and Hermes. Let us finally point out that the problem of finding a assignment policy in FDA is equivalent to the problem of determining multicast groups in contentbased multicasting, addressed in [91]. Both problems imply a clustering of the subscription set, where in FDA each cluster of subscriptions is assigned to a broker, while in content-based multicasting it is assigned to a multicast group. We stress the fact that none of the aforementioned papers solve the problem in presence of a number of brokers that can change over time. 2.4.4 Event and Notification Routing We define another function that will be used for stating the event and notifi- cation routing problems: avail : Ω × B × T → boolean The meaning of avail relates to the fact that in a distributed system not all processes may have the same view of published information items, i.e. e is not available to all processes at the same time [5]. avail returns TRUE at a time t when a broker has received an event at t and is able to process it (match, forward or deliver to subscriber). Event routing means, considering an event e issued by a publisher at a time τ , reaching eventually all the brokers that can host subscriptions matching e. Formally, this is expressed as: ∀σ : e @ σ, ∀b ∈ assign(σ) → ∃t > τ : avail(e, b, t) = T RUE On the other hand, notification routing means eventually reaching all the brokers that are access points for subscribers that have been identified as interested in the notification. Formally, this is expressed as: ∀e, ∀σ : e @ σ, ∀b ∈ AP(σ.s) → ∃t > τ : avail(e, b, t) = T RUE
2.4.BEHINDTHESCENES OFADISTRIBUTEDNOTIFICATION25SERVICERouting Strategies for Access-Driven Assignment.Obviously underan ADA approach, event routing and notification routing coincide as the targetbroker for a subscription is the access point for its subscriber o.s.Thus,when an event ispublished,thedifficulttaskisgettingtoknowall thebrokersthat may be subscribers for it, i.e. implementing the resolve function. Thetrivial solution is to broadcast each notification to all the brokers (flooding).This obviouslyleadstoan undesirablewaste of networkresources,becauseall notifications are always sent to all brokers even if they do not match anyof their subscribers.But avoiding that all notifications are blindly flooded tonot-interested subscribers, requires more information on the publisher side.Thatis,copiesof allthesubscriptionshavetobediffused towards all possiblepublishers, and in the general case when all brokers may host a publisher forany subscription, this means flooding all subscriptions.Aiding event routing through subscription redundancy creates a trade-offbetween notification flooding and subscription fooding. The more brokersare aware of all subscriptions, the earlier notifications that do not match anysubscribers canbefilteredetthatsubscriptionsaregenerallysuphefacposed tochangeataloweventpublicationmaysuggestthattherate thancost of flooding 1.bothsimulationsstud-aid.Howemgtbeyortnbeies(76,20)and)that subscription flooding canracticaexample, the completeflooding ofrarelybeconsideredafeasiblesolusubscriptionsquenching")ofanreterred"0olderversionofElvin[99].Authcsion((100)hadremovedthefeature,as itproved toostly.Asophisticated solutionforlimiting subscription diffusion is the one included in SIENA, and successivelyrefined inRebeca, that wewill present in Section 2.5.Routing Strategies forFilter-DrivenAssignment.UnderaFilter-DrivenAssignment policyevent routing andnotification routing are twodistinctphases: in the first one the target broker is reached by the event, in the sec-ond one the notification is delivered to all matching subscribers.Differently from the ADA approach, in FDA brokers that have to bereached by a notification in both phases can be determined in advance: thetarget broker with the resolve function and the matching subscribers' accesspoints from the matching operation itself7. Then, event and notification rout-ingreducestorouting to somespecific brokers (Directrouting).Thiscan beeasilyimplemented eitherat application-level, orthroughwell-known networkrouting algorithms (such as link state or distancevector),orby exploiting thedirect routing capabilities of an overlay network infrastructure.7Let us recall that clients are excluded from our model.They can be completely replacedby theiraccess points
2.4. BEHIND THE SCENES OF A DISTRIBUTED NOTIFICATION SERVICE 25 Routing Strategies for Access-Driven Assignment. Obviously under an ADA approach, event routing and notification routing coincide as the target broker for a subscription σ is the access point for its subscriber σ.s. Thus, when an event is published, the difficult task is getting to know all the brokers that may be subscribers for it, i.e. implementing the resolve function. The trivial solution is to broadcast each notification to all the brokers (flooding). This obviously leads to an undesirable waste of network resources, because all notifications are always sent to all brokers even if they do not match any of their subscribers. But avoiding that all notifications are blindly flooded to not-interested subscribers, requires more information on the publisher side. That is, copies of all the subscriptions have to be diffused towards all possible publishers, and in the general case when all brokers may host a publisher for any subscription, this means flooding all subscriptions. Aiding event routing through subscription redundancy creates a trade-off between notification flooding and subscription flooding. The more brokers are aware of all subscriptions, the earlier notifications that do not match any subscribers can be filtered out. The fact that subscriptions are generally supposed to change at a lower rate than event publication may suggest that the cost of flooding might be worth being paid. However, both simulations studies ([76, 20]) and practical experiences report that subscription flooding can rarely be considered a feasible solution. For example, the complete flooding of subscriptions was a characterizing feature (referred to as “quenching”) of an older version of Elvin [99]. Authors in a successive version ([100]) had removed the feature, as it proved to be very costly. A more sophisticated solution for limiting subscription diffusion is the one included in SIENA, and successively refined in Rebeca, that we will present in Section 2.5. Routing Strategies for Filter-Driven Assignment. Under a Filter-Driven Assignment policy, event routing and notification routing are two distinct phases: in the first one the target broker is reached by the event, in the second one the notification is delivered to all matching subscribers. Differently from the ADA approach, in FDA brokers that have to be reached by a notification in both phases can be determined in advance: the target broker with the resolve function and the matching subscribers’ access points from the matching operation itself7 . Then, event and notification routing reduces to routing to some specific brokers (Direct routing). This can be easily implemented either at application-level, or through well-known network routing algorithms (such as link state or distance vector), or by exploiting the direct routing capabilities of an overlay network infrastructure. 7Let us recall that clients are excluded from our model. They can be completely replaced by their access points
26CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSAnother possible approach for notification routing, proposed in [110], relieson an external centralized notification routing service that receives notifica-tions from brokers and dispatch them to proper subscribers. In this situation,brokersmaintain subscriptions andperformmatching andthenotification ser-vicemaintainsall thereferencestosubscribers.2.4.5ClassificationFrameworkAll the general solutions related to the various aspect of a Notification Servicepresented in previous Section can be positioned in the classification frameworkthat we define in thefollowing.The result of the classification is summarizedin Table 2.1. The table reports on columns the possible assignment criteria (allbrokers, ADA, FDA) and on rows the phases related to notification diffusion.Each cell in the table represents, in each assignment scenario, if a phase isrequired, how it can be solved and when matching is performed. It is importantto point out that herewegivea general overviewthat doesnotconsiderthespecific optimizations carried out in actual systems. These are presented inthe following Section, specifying how they are related to the results in thetable.First, we analyze the case when no particular subscription assignment pol-icy is considered, that means that all subscriptions are propagated to all thebrokers in the Notification Service (i.e. the target broker set coincides with thewhole broker set).In other words, subscription routing is performed througha broadcast of all subscriptions. Matching can be performed by the brokerthat receives the publication (Match-first approach), then no event routing isrequired but only a notification routing, that reduces to a direct multicast toall recipients.The drawback is the high subscription redundancy that impliesahighmemory consumption and high subscription traffic.For what concerns Access-Driven Assignment,in its basic realization sub-scriptions are not replicated but they are retained by access points.Thismakes necessary for event routing a broadcast of all events through the en-tire network.Matching is then performed at eachaccesspoint (Diffusion-firstapproach).In Filter-Driven Assignment, we assume that each broker can calculateboth the assign and resolve functions upon publication and subscription, inorder to identify thetarget broker, that can hold a subscription or match anevent (Match-at-Target approach). Once the target broker is identified, it canbe reached through direct routing from any broker:A notification routingphase is also required after the matching, but this can also be performed in adirect way.The drawback is the additional notification routing phase required.Moreover, implementing a consistent assignment criteria may be no trivial job
26 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS Another possible approach for notification routing, proposed in [110], relies on an external centralized notification routing service that receives notifications from brokers and dispatch them to proper subscribers. In this situation, brokers maintain subscriptions and perform matching and the notification service maintains all the references to subscribers. 2.4.5 Classification Framework All the general solutions related to the various aspect of a Notification Service presented in previous Section can be positioned in the classification framework that we define in the following. The result of the classification is summarized in Table 2.1. The table reports on columns the possible assignment criteria (all brokers, ADA, FDA) and on rows the phases related to notification diffusion. Each cell in the table represents, in each assignment scenario, if a phase is required, how it can be solved and when matching is performed. It is important to point out that here we give a general overview that does not consider the specific optimizations carried out in actual systems. These are presented in the following Section, specifying how they are related to the results in the table. First, we analyze the case when no particular subscription assignment policy is considered, that means that all subscriptions are propagated to all the brokers in the Notification Service (i.e. the target broker set coincides with the whole broker set). In other words, subscription routing is performed through a broadcast of all subscriptions. Matching can be performed by the broker that receives the publication (Match-first approach), then no event routing is required but only a notification routing, that reduces to a direct multicast to all recipients. The drawback is the high subscription redundancy that implies a high memory consumption and high subscription traffic. For what concerns Access-Driven Assignment, in its basic realization subscriptions are not replicated but they are retained by access points. This makes necessary for event routing a broadcast of all events through the entire network. Matching is then performed at each access point (Diffusion-first approach). In Filter-Driven Assignment, we assume that each broker can calculate both the assign and resolve functions upon publication and subscription, in order to identify the target broker, that can hold a subscription or match an event (Match-at-Target approach). Once the target broker is identified, it can be reached through direct routing from any broker. A notification routing phase is also required after the matching, but this can also be performed in a direct way. The drawback is the additional notification routing phase required. Moreover, implementing a consistent assignment criteria may be no trivial job