172.3.ARCHITECTURAL MODELSApplication-level Networks2.3.2The most common approach for the design of a distributed Notification Ser-vice is building an application-level network of brokers.Brokers communicatethrough links consisting of connections over an underlying transport protocol.The underlying broker-to-broker protocol can be then any transport-level orapplication-level protocol:themost common choice is TCP/IPbut alsoHTTPor middleware protocols such as IIOP or DCOM can be used.The application-level network is a pure abstraction as links are not re-quired to represent permanent, long-lived connections.The main implicationof structuring a Notification Service as an application-level network is that abroker "knows" only a limited set of brokers (i.e., its neighbors in the net-work) that are the only processes with which it can actually communicate.This allows the system to achieve high degrees of scalability because even ifthe system size grows, the number of neighbors for each broker remains fixedensuring that it will manage a bounded number of concurrent open connectionsand data structures.The choice of structuring a Notification Service as an application-levelbroker network is the most common one in actual pub/sub implementations,used by system such as TIB/RV [77], Gryphon [53], SIENA [102], or JEDI[29].Apart from the application-level routing protocols, that we will analyzein Section 2.4, the main aspect to be clarified in an architecture based onan application-level network of brokers is the topologyformed by the brokersthemselves. There are basically two solutions, hierarchical or peer-to-peer.Ina hierarchical topologytree structures, where sub-broke111scribersaccesspointstthebpublishersaccesspointsarerootsnthistopology,thankstothe(orviceversa).Manyontributions9.110/relyO+simplifications itdiffused only inonedirec-tion.However,thehierarchicalarchitecturelacksgeneralitybecauseitreliesonafixed,givenstructurethathastobespecificallymadeupbyapplicationdeveloper and can be hardly modified (e.g. when adding or removing bro-kers). Moreover in [20], a simulation study shows its inherent inefficiency, dueto the fact that brokers belonging to upper levels of the hierarchy experiencea higher load than ones at lower levels. In a generic peer-to-peer topology,a broker can be connected with any other broker, with no restrictions. [20]shows the more effective load-balance obtained with respect to a hierarchi-cal topology,thoughpeer-to-peeralgorithmsmaybemoredifficulttorealize.Then, implementations typically use an acyclic topology to aid the routingprocess.The main problem of an application-level broker network is the creation ofthe topology itself. None of the aforementioned system is able to self-organize
2.3. ARCHITECTURAL MODELS 17 2.3.2 Application-level Networks The most common approach for the design of a distributed Notification Service is building an application-level network of brokers. Brokers communicate through links consisting of connections over an underlying transport protocol. The underlying broker-to-broker protocol can be then any transport-level or application-level protocol: the most common choice is TCP/IP but also HTTP or middleware protocols such as IIOP or DCOM can be used. The application-level network is a pure abstraction as links are not required to represent permanent, long-lived connections. The main implication of structuring a Notification Service as an application-level network is that a broker “knows” only a limited set of brokers (i.e., its neighbors in the network) that are the only processes with which it can actually communicate. This allows the system to achieve high degrees of scalability because even if the system size grows, the number of neighbors for each broker remains fixed ensuring that it will manage a bounded number of concurrent open connections and data structures. The choice of structuring a Notification Service as an application-level broker network is the most common one in actual pub/sub implementations, used by system such as TIB/RV [77], Gryphon [53], SIENA [102], or JEDI [29]. Apart from the application-level routing protocols, that we will analyze in Section 2.4, the main aspect to be clarified in an architecture based on an application-level network of brokers is the topology formed by the brokers themselves. There are basically two solutions, hierarchical or peer-to-peer. In a hierarchical topology, brokers are organized in tree structures, where subscribers’ access points lie at the bottom and publishers’ access points are roots (or viceversa). Many contributions [9, 110] rely on this topology, thanks to the simplifications it can allow since notifications are diffused only in one direction. However, the hierarchical architecture lacks generality because it relies on a fixed, given structure that has to be specifically made up by application developer and can be hardly modified (e.g. when adding or removing brokers). Moreover in [20], a simulation study shows its inherent inefficiency, due to the fact that brokers belonging to upper levels of the hierarchy experience a higher load than ones at lower levels. In a generic peer-to-peer topology, a broker can be connected with any other broker, with no restrictions. [20] shows the more effective load-balance obtained with respect to a hierarchical topology, though peer-to-peer algorithms may be more difficult to realize. Then, implementations typically use an acyclic topology to aid the routing process. The main problem of an application-level broker network is the creation of the topology itself. None of the aforementioned system is able to self-organize
18CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSthe broker's network and it is up to the user to decide and set up the connec-tions among brokers. This may generate a non-optimized behavior becausetheapplication-leveltopologymaynotreflecttheunderlyingphysical network:then a single link between two brokers can actuallymap to a"long"networkpath, in terms of latency and/or throughput and this may seriously harm theoverall performance of the whole svstem. Furthermore,an application-levelcommunication protocol is inherently slower than one implemented at therouter level and, even with thehigh power of current devices and networks, itisimpossibleto achievethesameperformancelevels.2.3.3Peer-to-peerOverlayNetworkInfrastructuresA peer-to-peer overlay network infrastructure realizes an application-level net-work for information diffusion.It is composed by a set of nodes, each havinga unique identifier, and provide the possibility of sending/retrieving informa-tion to/from one or more specific node(s), just by specifying their identifier.In other words, they realize a general-purpose unicast or multicast communi-cationfacility among thenodes.Themain advantageof using such infrastruc-tures in a large-scale setting is their self-organization capability,allowing themto structurethenetwork whenanodeleaves (forexampleafter afault)orjoins(forexampletofaceahighernumberOverlaynetworkinfrastruc-ofuisersge-scaleinformationtures are theeftectivetoasalizinga:larpropagationandhavebeconwdelypuarchareainrecentyears.Asaconsequencofthatebeendeveloped:weciteamongtheothersPastry[95],Chord[105],Tapestry[116](unicastdiffusion)orCAN[89],13[104]andAstrolabe[109](multicastdiffusion)Structuring a pub/sub system over an overlay network infrastructure meansleveraging the self-organization capabilities of the infrastructure, by buildinga pub/sub interface over it.The pub/sub behavior is realized through thecommunication primitives provided by the underlying overlay. With respectto directly building an application-level network of broker, this solution allowsto more easily manage dynamic aspects of the systems such as faults and agrowing number of broker, inheriting such features from the overlay networkinfrastructure. Examples of systems using this solution are Bayeux [117]and Scribe [94], for what concerns topic-based systems, and Hermes [84] andRebeca[108],for what concernscontent-based systems.Finally,wecite Select-Cast[15l,amulticast systembuilton topof Astrolabeprovidinga SQL-likeARigorously,also an application-level network of brokers does realize an overlay networkAnywayin theliterature.theprecisenature of an overlaynetworkinfrastructure consistsin just the basic unicast/multicast behavior, rather than the more specific many-to-many,anonymous semantics of apub/subsystem.Thisexplainsthedistinction wemadebetweenthe two solutions
18 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS the broker’s network and it is up to the user to decide and set up the connections among brokers. This may generate a non-optimized behavior because the application-level topology may not reflect the underlying physical network: then a single link between two brokers can actually map to a “long” network path, in terms of latency and/or throughput and this may seriously harm the overall performance of the whole system. Furthermore, an application-level communication protocol is inherently slower than one implemented at the router level and, even with the high power of current devices and networks, it is impossible to achieve the same performance levels. 2.3.3 Peer-to-peer Overlay Network Infrastructures A peer-to-peer overlay network infrastructure realizes an application-level network for information diffusion. It is composed by a set of nodes, each having a unique identifier, and provide the possibility of sending/retrieving information to/from one or more specific node(s), just by specifying their identifier. In other words, they realize a general-purpose unicast or multicast communication facility among the nodes. The main advantage of using such infrastructures in a large-scale setting is their self-organization capability, allowing them to structure the network when a node leaves (for example after a fault) or joins (for example to face a higher number of users). Overlay network infrastructures are then an effective way for easily realizing a large-scale information propagation and have become a widely popular research area in recent years. As a consequence of that, many systems have been developed: we cite among the others Pastry [95], Chord [105], Tapestry [116] (unicast diffusion) or CAN [89], I3 [104] and Astrolabe [109] (multicast diffusion). Structuring a pub/sub system over an overlay network infrastructure means leveraging the self-organization capabilities of the infrastructure, by building a pub/sub interface over it. The pub/sub behavior is realized through the communication primitives provided by the underlying overlay. With respect to directly building an application-level network of broker, this solution allows to more easily manage dynamic aspects of the systems such as faults and a growing number of broker, inheriting such features from the overlay network infrastructure4 . Examples of systems using this solution are Bayeux [117] and Scribe [94], for what concerns topic-based systems, and Hermes [84] and Rebeca [108], for what concerns content-based systems. Finally, we cite SelectCast [15], a multicast system built on top of Astrolabe providing a SQL-like 4Rigorously, also an application-level network of brokers does realize an overlay network. Anyway in the literature, the precise nature of an overlay network infrastructure consists in just the basic unicast/multicast behavior, rather than the more specific many-to-many, anonymous semantics of a pub/sub system. This explains the distinction we made between the two solutions
2.4.BEHINDTHE SCENES OFADISTRIBUTEDNOTIFICATION19SERVICEsyntax for expressing subscriptions2.4Behind the Scenes of a Distributed NotificationServiceIn the previous sections we presented “external" aspects, that is how a pub/subsystem can expose its feature to users (Subscription Models)and how it isstructured (Architectural Models). In this section we focus on an“"internal"view of a pub/sub system, describing what are the mechanisms and algorithmsthat a pub/sub system must implement, given a subscription model and anarchitectural model, in order to realize its functionality.2.4.1OverviewAs pointed out above, we consider distributed Notification Services where eachsubscriberorpublisher cancontact anynotificationbrokerinordertoparticipate to the system. The brokers to which a subscriber s actually connectsare called access points for s.The set of access points of a subscriber s isdenoted asAP(s).Theaccesspoint represents onlythebrokerthroughwhichs issues subscriptions and receives notifications.In general, the access pointmay not necessarily host the subscription of its subscribers: for load-balancingpurposes or for simplifying notification propagation, a subscription may bestored in one or more brokers different from the access point of the issuingsubscriber, o.s. In this case, these brokers will be referred to as the targetbrokersforA subscription configuration is a set sc = (o1,...Om) that contains all thesubscriptionspresent in thewholesystem ataparticular time.Theset of allpossible subscription configurations is denoted as SC.Finally,we denote theset of all possible notifications as 2.The functionality of a distributed Notification Service, in its most mostgeneral form, can be thought of as decomposed in the following sub-problems5(Figure 2.3):Event Matching : the task of computing the set of interested subscribers.Subscription Assignment : identification of a policy used to determinehow to distribute subscriptions among brokers.Subscription Routing : the process of dispatching the subscription fromthe access point to the target broker, i.e.the implementation of theassignment policy.5In this part of the Chapter we consider the terms events and notifications as not syn-onyms
2.4. BEHIND THE SCENES OF A DISTRIBUTED NOTIFICATION SERVICE 19 syntax for expressing subscriptions. 2.4 Behind the Scenes of a Distributed Notification Service In the previous sections we presented “external” aspects, that is how a pub/sub system can expose its feature to users (Subscription Models) and how it is structured (Architectural Models). In this section we focus on an “internal” view of a pub/sub system, describing what are the mechanisms and algorithms that a pub/sub system must implement, given a subscription model and an architectural model, in order to realize its functionality. 2.4.1 Overview As pointed out above, we consider distributed Notification Services where each subscriber or publisher can contact any notification broker in order to participate to the system. The brokers to which a subscriber s actually connects are called access points for s. The set of access points of a subscriber s is denoted as AP(s). The access point represents only the broker through which s issues subscriptions and receives notifications. In general, the access point may not necessarily host the subscription of its subscribers: for load-balancing purposes or for simplifying notification propagation, a subscription σ may be stored in one or more brokers different from the access point of the issuing subscriber, σ.s. In this case, these brokers will be referred to as the target brokers for σ. A subscription configuration is a set sc = (σ1, . . . σm) that contains all the subscriptions present in the whole system at a particular time. The set of all possible subscription configurations is denoted as SC. Finally, we denote the set of all possible notifications as Ω. The functionality of a distributed Notification Service, in its most most general form, can be thought of as decomposed in the following sub-problems5 (Figure 2.3): Event Matching : the task of computing the set of interested subscribers. Subscription Assignment : identification of a policy used to determine how to distribute subscriptions among brokers. Subscription Routing : the process of dispatching the subscription from the access point to the target broker, i.e. the implementation of the assignment policy. 5 In this part of the Chapter we consider the terms events and notifications as not synonyms
20CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSBFigure2.3:Publish/SubscribeMechanismsEvent Routing : the process of (i) identifying the target brokers for the notification (resolving), (ii)forwarding thenotification through the brokersnetwork in order to reach all possible target brokersNotification Routing :the process ofdispatching anotification to allmatch-ing subscribers, i.e. delivering a matching notification from target brokers to the corresponding subscribers.Figure 2.3 (a) shows an example of subscription assignment and routing:we supposea partitioning policy that assigns subscription X to broker B.When X is issued by a subscriber si, it has to be routed from si's accesspoint Bo to target broker Bi. Figure 2.3 (b) shows what happens in the sameexample situation, when a notification e is published by a client pi at brokerB2. The assignment policy applied to e, resolves the notification to targetbroker Bo. Then, e has to be routed to Bo where it can be matched (eventrouting). If e matches X, the corresponding notification has to be routed tointerested subscriber si (notification routing). Events and subscriptions arerepresented in the Figures as two opposite flows, unified by the choice of thetarget broker given by the partitioning policy. In the following, we providedetails on each of the identified mechanisms by discussing their relationshipsandtrade-offs.We point out that the whole process takes place as a coordination amongbrokers. Client processes are only the last link of the chain and act in thesystem only through their access points. Then in the following we do notconsider client processes but restrict all the analysis to the network of brokers.2.4.2EventMatchingEvent matching is an extension of the matching operation defined in Section2.1.1, that is calculating if a notification satisfies a filter. In this case, thenotification has to bematched against all thefilters in sc,returning all the
20 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS s B 1 0 B 1 B 2 X B 3 X B 0 B 1 B 2 X s 1 p 1 B 3 e e Figure 2.3: Publish/Subscribe Mechanisms Event Routing : the process of (i) identifying the target brokers for the notification (resolving), (ii) forwarding the notification through the brokers network in order to reach all possible target brokers Notification Routing : the process of dispatching a notification to all matching subscribers, i.e. delivering a matching notification from target brokers to the corresponding subscribers. Figure 2.3 (a) shows an example of subscription assignment and routing: we suppose a partitioning policy that assigns subscription X to broker B1. When X is issued by a subscriber s1, it has to be routed from s1’s access point B0 to target broker B1. Figure 2.3 (b) shows what happens in the same example situation, when a notification e is published by a client p1 at broker B2. The assignment policy applied to e, resolves the notification to target broker B0. Then, e has to be routed to B0 where it can be matched (event routing). If e matches X, the corresponding notification has to be routed to interested subscriber s1 (notification routing). Events and subscriptions are represented in the Figures as two opposite flows, unified by the choice of the target broker given by the partitioning policy. In the following, we provide details on each of the identified mechanisms by discussing their relationships and trade-offs. We point out that the whole process takes place as a coordination among brokers. Client processes are only the last link of the chain and act in the system only through their access points. Then in the following we do not consider client processes but restrict all the analysis to the network of brokers. 2.4.2 Event Matching Event matching is an extension of the matching operation defined in Section 2.1.1, that is calculating if a notification satisfies a filter. In this case, the notification has to be matched against all the filters in sc, returning all the
2.4.BEHINDTHESCENESOFADISTRIBUTEDNOTIFICATION21SERVICEcorresponding subscribers. This can be formally represented by defining thefollowing function:元:2×SC-22The realization of is one central and challenging point: as we are deal-ing with large-scale systems, we expect on one side the overall number ofsubscriptions in the system to be very high, and on the other a high rate ofnotifications. Then, the matching operation has to be performed often and onmassivedatasizes.Whileobviouslythis posesno problemsin atopic-basedsystem, where matching reduces to a simple table lookup, it is a fundamentalissue for the overall performance of a content-based system. The trivial so-lution of testing sequentially each subscription against the notification to bematched may result in a very poor performance in this settings.Techniques for efficiently performing the matching operation are then oneimportantresearchissuerelated in the pub/subfield.Sincethey aremorerelated to other research fields rather than distributed computing (e.g. activedatabases), we only give a brief survey on the solutions actually exploited incontent-based implementations.These can be grouped in two main categories [93], namely predicate inder-ing algorithms and testing network algorithms.Predicate indexing algorithmsare structured in two phases: the first phase is used to decompose filters ofsubscriptions intoelementary constraints and determinewhich constraints aresatisfied bythenotification;inthesecond phasetheresults of thefirstphasematchthenotificationareusedtodeteehlteraintMatching algorithms fallinateindexingfamilyare[80,44].Testing network algorithmsbasedon apre-processing of theset of subscriptions that builds e(atreein[1] and [48] orabi-+narydecisiondiagramin[16])composedbynodesrepresentingtheconstraintsin each filter. The structure is traversed in a second phase of the algorithm,by matching the notification against each constraint. A notification matchesa filter whenthe data structure is completely traversed by it.2.4.3SubscriptionAssignmentandRoutingIn the above description of the Notification Service, we stated that one of itsfunctions isto stores all the subscriptions issued by subscribers.When consid-ering a distributed implementation, this task is actually accomplished by oneor moreprocesses in the brokers set.For the scalability sake, it is a desirableproperty not to maintain a copy of the whole subscription set on every singlebroker, but rather sharing the load of storing and managing (i.e. matching)subscriptionsamongthe set of brokers.This allowsto effectively exploitthe
2.4. BEHIND THE SCENES OF A DISTRIBUTED NOTIFICATION SERVICE 21 corresponding subscribers. This can be formally represented by defining the following function: π : Ω × SC → 2 Σ The realization of π is one central and challenging point: as we are dealing with large-scale systems, we expect on one side the overall number of subscriptions in the system to be very high, and on the other a high rate of notifications. Then, the matching operation has to be performed often and on massive data sizes. While obviously this poses no problems in a topic-based system, where matching reduces to a simple table lookup, it is a fundamental issue for the overall performance of a content-based system. The trivial solution of testing sequentially each subscription against the notification to be matched may result in a very poor performance in this settings. Techniques for efficiently performing the matching operation are then one important research issue related in the pub/sub field. Since they are more related to other research fields rather than distributed computing (e.g. active databases), we only give a brief survey on the solutions actually exploited in content-based implementations. These can be grouped in two main categories [93], namely predicate indexing algorithms and testing network algorithms. Predicate indexing algorithms are structured in two phases: the first phase is used to decompose filters of subscriptions into elementary constraints and determine which constraints are satisfied by the notification; in the second phase the results of the first phase are used to determine the filters in which all constraints match the notification. Matching algorithms falling into the predicate indexing family are [80, 44]. Testing network algorithms ([1, 48, 16]) are based on a pre-processing of the set of subscriptions that builds a data structure (a tree in [1] and [48] or a binary decision diagram in [16]) composed by nodes representing the constraints in each filter. The structure is traversed in a second phase of the algorithm, by matching the notification against each constraint. A notification matches a filter when the data structure is completely traversed by it. 2.4.3 Subscription Assignment and Routing In the above description of the Notification Service, we stated that one of its functions is to stores all the subscriptions issued by subscribers. When considering a distributed implementation, this task is actually accomplished by one or more processes in the brokers set. For the scalability sake, it is a desirable property not to maintain a copy of the whole subscription set on every single broker, but rather sharing the load of storing and managing (i.e. matching) subscriptions among the set of brokers. This allows to effectively exploit the