Chapter 2UnderstandingPublish/Subscribe SystemsThe general objective of a publish/subscribe (pub/sub) system is to let information propagate from publishers to interested subscribers, in an anonymous,decoupled fashion. Such a common general behavior is implemented with dif-ferent favors in actual systems known in the pub/sub literature.In particular,thethree aspects that have to be specified are: How subscribers' interest is expressed in relation to information. Inother words, which is the query language used by subscribers for issuingsubscriptions to the Notification Service. How the Notification Service is implemented. The Notification Servicecan be realized as a single, centralized entity or as a distributed set ofprocesses. How information is propagated from publishers to subscribers. That is.how the Notification Service exploits the underlying network levels inorder to correctly dispatch information to interested subscribers.These issues are tightly coupled with one another and their combinationstrongly influences the mechanisms underlying the pub/sub system, For ex-ample, a simple subscription language can favor the implementation of thediffusion mechanism but as a drawback provides a low expressive power forusers to express their interest. The pub/sub paradigm has inspired a greatamount of research in recent years trying to achieve trade-offs between theseconflicting issues. However, this research area currently lacks a unifying viewin which each research contribution can be exactly positioned.In this Chapter we describe all the problems hidden behind a distributedpub/sub system and present a framework to classify thedifferent ways for them7
Chapter 2 Understanding Publish/Subscribe Systems The general objective of a publish/subscribe (pub/sub) system is to let information propagate from publishers to interested subscribers, in an anonymous, decoupled fashion. Such a common general behavior is implemented with different flavors in actual systems known in the pub/sub literature. In particular, the three aspects that have to be specified are: ❒ How subscribers’ interest is expressed in relation to information. In other words, which is the query language used by subscribers for issuing subscriptions to the Notification Service. ❒ How the Notification Service is implemented. The Notification Service can be realized as a single, centralized entity or as a distributed set of processes. ❒ How information is propagated from publishers to subscribers. That is, how the Notification Service exploits the underlying network levels in order to correctly dispatch information to interested subscribers. These issues are tightly coupled with one another and their combination strongly influences the mechanisms underlying the pub/sub system. For example, a simple subscription language can favor the implementation of the diffusion mechanism but as a drawback provides a low expressive power for users to express their interest. The pub/sub paradigm has inspired a great amount of research in recent years trying to achieve trade-offs between these conflicting issues. However, this research area currently lacks a unifying view in which each research contribution can be exactly positioned. In this Chapter we describe all the problems hidden behind a distributed pub/sub system and present a framework to classify the different ways for them 7
CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMS8subscribe(a)bscribe(a(S)P2OVBCSnFigure 2.1:High-level view of a pub/sub system.to be addressed that have been proposed in the pub/sub literature. In the firstpart of the chapter we donotrefer to any specific system or implementation,but consider problems at their most general level. This allows us to isolatethe overall issues from what are specific implementations techniques. In thefinal part of the chapterwe surveythemost representativepub/sub systems,positioning them with respect to the general reference framework.At thebest of our knowledge, this is the first attempt to realize a general survey ofthe area that captures all the current state-of-the-art systems and researchcontributions.2.1BasicPublish/SubscribeSpecificationIn this section we propose a general high-level framework of a pub/sub system,by first describing the participants to the system and their roles and thendefining the various aspects of their interaction. A formal specification of thesemantics of a pub/sub system is given in Chapter 3.Elements of a Publish/Subscribe System2.1.11A generic pub/sub communication system (PsS) can be represented by a triple< II, B, > of sets of processes (Figure 2.1). Sets in the triple are defineddepending on the role of the processes in the system: II = pi,..., Pn is a setof n processes, called the publishers, which are producers of information.=si,..,sm is a set of m processes called the subscribers, which are consumersof information. and II may have a non-zero intersection, that is a sameprocess may act both as a publisher and as a subscriber. = Bi,..., B。 is aset of o processes, called brokersWe assume publishers and subscribers to be decoupled: a process in IIcannot communicate directly with a process in and vice versa (unless it isthesameprocess).Decoupling is a desirablecharacteristicfor a communi-
8 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS B1 B2 B3 Bn . p1 p2 pn . s1 s2 sn publish (e ) notify (e ) subscribe (σ) unsubscribe (σ) . NOTIFICATION SERVICE Figure 2.1: High-level view of a pub/sub system. to be addressed that have been proposed in the pub/sub literature. In the first part of the chapter we do not refer to any specific system or implementation, but consider problems at their most general level. This allows us to isolate the overall issues from what are specific implementations techniques. In the final part of the chapter we survey the most representative pub/sub systems, positioning them with respect to the general reference framework. At the best of our knowledge, this is the first attempt to realize a general survey of the area that captures all the current state-of-the-art systems and research contributions. 2.1 Basic Publish/Subscribe Specification In this section we propose a general high-level framework of a pub/sub system, by first describing the participants to the system and their roles and then defining the various aspects of their interaction. A formal specification of the semantics of a pub/sub system is given in Chapter 3. 2.1.1 Elements of a Publish/Subscribe System A generic pub/sub communication system (PSS) can be represented by a triple < Π, B, Σ > of sets of processes (Figure 2.1). Sets in the triple are defined depending on the role of the processes in the system: Π = p1, . . . , pn is a set of n processes, called the publishers, which are producers of information. Σ = s1, . . . , sm is a set of m processes called the subscribers, which are consumers of information. Σ and Π may have a non-zero intersection, that is a same process may act both as a publisher and as a subscriber. ∆ = B1, . . . , Bo is a set of o processes, called brokers. We assume publishers and subscribers to be decoupled: a process in Π cannot communicate directly with a process in Σ and vice versa (unless it is the same process). Decoupling is a desirable characteristic for a communi-
92.1.BASICPUBLISH/SUBSCRIBESPECIFICATIONcation system because applications can be made more independent from thecommunication issues, avoiding to deal with aspects such as addressing or syn-chronization [40]. We discuss this point in Section 2.1.2. Processes in II andcan exclusivelycommunicatewithanyotherprocessin.Then,thesetofbrokers represents a logically centralized entity that allows the communica-tion between publishers and subscribers, at the same time maintaining themdecoupled.Deltainitswholeconstitutewhat in literatureis often referredtoas Notification Service (or Event Service).Publishers and subscribers act asclients forthe Notification Service.In the particular case of [/= 1, we have a centralized implementationfor the Notification Service. Centralized implementations are obviously thesimplest implementation solution for a Notification Service. However, scala-bility is limited by the processing power of the machine that hosts the serviceand itsnetworking resources.Inthegeneral case of/>1theNotificationService is implemented as a network of distributed brokers. In a distributedimplementation, publishers and subscribers can contact indifferently any of thebrokers, that share the load of managing subscriptions and publications. Therealization of this solution is more challenging,requiring complex protocols forthe coordination of the various brokers and the diffusion of the information(discussed inSection 2.4).From nowon,weexclusivelyfocuson distributedimplementations.Client InteractionThe interaction between client processes and the Noti-fication Servicetakes place through a set of operations that can be executedby clientprocesses on the Notification Serviceand viceversa (Figure 2.1).Apublisher submits a piece of information e to other processes by executing thepublish(e) operation on the Notification Service.The Notification Servicedispatches a piece of information e submitted by other processes to a sub-scriber by executing the notify(e) on it. A subscription is respectivelyinstalled and removed on the Notification Service by subscriber processes byexecuting the subscribe(α) and unsubscribe() operations.Notifications and SubscriptionsIn the following we specify the natureoftheinformationexchanged betweenpublishers and subscribers.Suchin-formation is produced in form of notifications. In the pub/sub literature alsothe terms event and publication are often used,but is important to pointout that the three terms are not interchangeable. We clarify the exact wayin which these terms are generally used: a publisher produces a event (ora publication), while the Notification Service issues the corresponding notifi-cation on subscribers. For simplicity, in the following we only use the term"notification",exceptwhereit is importanttopoint outthedifference
2.1. BASIC PUBLISH/SUBSCRIBE SPECIFICATION 9 cation system because applications can be made more independent from the communication issues, avoiding to deal with aspects such as addressing or synchronization [40]. We discuss this point in Section 2.1.2. Processes in Π and Σ can exclusively communicate with any other process in ∆. Then, the set of brokers ∆ represents a logically centralized entity that allows the communication between publishers and subscribers, at the same time maintaining them decoupled. Delta in its whole constitute what in literature is often referred to as Notification Service (or Event Service). Publishers and subscribers act as clients for the Notification Service. In the particular case of |∆| = 1, we have a centralized implementation for the Notification Service. Centralized implementations are obviously the simplest implementation solution for a Notification Service. However, scalability is limited by the processing power of the machine that hosts the service and its networking resources. In the general case of |∆| > 1 the Notification Service is implemented as a network of distributed brokers. In a distributed implementation, publishers and subscribers can contact indifferently any of the brokers, that share the load of managing subscriptions and publications. The realization of this solution is more challenging, requiring complex protocols for the coordination of the various brokers and the diffusion of the information (discussed in Section 2.4). From now on, we exclusively focus on distributed implementations. Client Interaction The interaction between client processes and the Noti- fication Service takes place through a set of operations that can be executed by client processes on the Notification Service and viceversa (Figure 2.1). A publisher submits a piece of information e to other processes by executing the publish(e) operation on the Notification Service. The Notification Service dispatches a piece of information e submitted by other processes to a subscriber by executing the notify(e) on it. A subscription σ is respectively installed and removed on the Notification Service by subscriber processes by executing the subscribe(σ) and unsubscribe(σ) operations. Notifications and Subscriptions In the following we specify the nature of the information exchanged between publishers and subscribers. Such information is produced in form of notifications. In the pub/sub literature also the terms event and publication are often used, but is important to point out that the three terms are not interchangeable. We clarify the exact way in which these terms are generally used: a publisher produces a event (or a publication), while the Notification Service issues the corresponding notifi- cation on subscribers. For simplicity, in the following we only use the term “notification”, except where it is important to point out the difference
10CHAPTER2.UNDERSTANDINGPUBLISH/SUBSCRIBESYSTEMSThe common data model used in pub/sub systems defines a notification asa set of attribute-value pairs. Each attribute has a name, a simple characterstring, and a type.The type is generally one of the common primitive datatypes defined in programming languages or query languages (e.g. integer, real,string,etc.).On the subscribers'side, interest in specific notifications is expressed throughsubscriptions.Asubscriptionisapair=(f,s),wheresEZisthesubscriberwhich is interested in notifications declared through the filter f. We say anotification e matches a subscription if it satisfies its filter f.The taskof verifyingwhenevera notification ematchesa filterf is called matching(e f).The precise characterization of the possibleformat of f is the subjectofSection 2.2.2.1.2Positioning the Publish/Subscribe ParadigmThe simple model of pub/sub presented above highlights the characterizingaspects of this paradigm, that is the decoupling among participants and themany-to-many interaction. Such features are desirable properties for buildingscalable distributed applications, but are not offered by any of the other com-mon distributed communication paradigms.In the following we give a briefpresentation of the most popular paradigms for realizing distributed interac-tions, pointing out the differences with the pub/sub paradigm.A detailedcomparison among pub/sub and other paradigms can befound in [40]RemoteProcedureCalls.RemoteProcedureCalls[14representthefirstbasicform of abstractionofadistributed computation,extendingthecommonsub-program invocation to a distributedlevel,by wrapping transparently theaspects related to remotecommunication.RPC isprobably also themostpopular distributed paradigm, thanks to the different form it has been pre-sented. RPC-based mechanisms are in fact included in the C and Java [72]languages, and are the foundation of the most popular middleware technolo-gies, such as CORBA [50], DCOM [85] and J2EE [74]. Finally, the SOAPprotocol [27], foundation of the Web Services technology [79], is the latestincarnation of RPC.RPC realizes basically a one-to-one interaction,with astrongcoupling amongtheparticipants:thecallermustownareferencetoeach entity it wants to communicatewith and a many-to-manyinteraction isdifficult to realize in an efficient way. Moreover, the interaction is generallycompletely synchronous: the called entity act as a server, and must remainavailablefor being invoked for itsentirelifetime,whilethe calling entityactasa client, that generally must remain blocked until it receives a reply from theserver (though asynchronous invocations are common, where the client canleavethecommunicationwithoutwaitingfortheresult)
10 CHAPTER 2. UNDERSTANDING PUBLISH/SUBSCRIBE SYSTEMS The common data model used in pub/sub systems defines a notification as a set of attribute-value pairs. Each attribute has a name, a simple character string, and a type. The type is generally one of the common primitive data types defined in programming languages or query languages (e.g. integer, real, string, etc.). On the subscribers’ side, interest in specific notifications is expressed through subscriptions. A subscription is a pair σ = (f, s), where s ∈ Σ is the subscriber which is interested in notifications declared through the filter f. We say a notification e matches a subscription σ if it satisfies its filter f. The task of verifying whenever a notification e matches a filter f is called matching (e @ f). The precise characterization of the possible format of f is the subject of Section 2.2. 2.1.2 Positioning the Publish/Subscribe Paradigm The simple model of pub/sub presented above highlights the characterizing aspects of this paradigm, that is the decoupling among participants and the many-to-many interaction. Such features are desirable properties for building scalable distributed applications, but are not offered by any of the other common distributed communication paradigms. In the following we give a brief presentation of the most popular paradigms for realizing distributed interactions, pointing out the differences with the pub/sub paradigm. A detailed comparison among pub/sub and other paradigms can be found in [40]. Remote Procedure Calls. Remote Procedure Calls [14] represent the first basic form of abstraction of a distributed computation, extending the common sub-program invocation to a distributed level, by wrapping transparently the aspects related to remote communication. RPC is probably also the most popular distributed paradigm, thanks to the different form it has been presented. RPC-based mechanisms are in fact included in the C and Java [72] languages, and are the foundation of the most popular middleware technologies, such as CORBA [50], DCOM [85] and J2EE [74]. Finally, the SOAP protocol [27], foundation of the Web Services technology [79], is the latest incarnation of RPC. RPC realizes basically a one-to-one interaction, with a strong coupling among the participants: the caller must own a reference to each entity it wants to communicate with and a many-to-many interaction is difficult to realize in an efficient way. Moreover, the interaction is generally completely synchronous: the called entity act as a server, and must remain available for being invoked for its entire lifetime, while the calling entity act as a client, that generally must remain blocked until it receives a reply from the server (though asynchronous invocations are common, where the client can leave the communication without waiting for the result)
112.2.SUBSCRIPTIONMODELSShared Spaces. Shared spaces has been the first paradigm to consider anindirect communication. This is realized by a distributed shared memory, com-mon to all participants, that interact each other by writing and reading datafrom/to the shared space. Actually, this realizes a many-to-many anonymousinteraction,wheremanyproducers can indirectly send messages thatwill bereceived by many consumers.The difference with pub/sub is that consumersare not asynchronously notified but retrieve messages in a push-style fashionwith an explicit,synchronousrequest.Amongthe most popular shared spaceimplementations we cite Linda [47], JavaSpaces [46] and TSpaces [64].Message Queues.Maybe the most common alternative to pub/sub for re-alizing interactions with multiple recipients is the message queue paradigm.Message Queues are an abstraction that is particularly used in the industry,with many popular existing implementations, such as IBM WebSphereMQ[58], Microsoft Message Queue [85] and part of the JMS specification [74].The reason is that the message queue paradigm can easily provide transac-tional or reliability guarantees, thanks to the fact that messages are persis-tently stored within the queues. Moreover, it is often used as the basis forasynchronous invocation to software components (such as COM+ QueuedComponents,Message-DrivenEnterprise Java Beansor Web Services).Allthe communications in this paradigm are filtered by the queue, that coversa role similar to the Notification Service in pub/sub.The differenceis thateach participant may have its own queue and a one-to-many interaction couldrequire addressing several queues. Another feature that makes the level ofdecoupling obtained through message queue lower than the one provided bypub/sub isthefactthat consumersmust explicitlypullthemessagesfromthequeue. However, push-style callback is often present, also with a one-to-manydelivery, making the message queue paradigm similar to a persistent form ofpub/sub.2.2Subscription ModelsDifferent ways for specifying the notifications of interest have led to identify-ing different variants of the pub/sub paradigm. Several subscription modelsappear in the literature, characterized by different expressive powers. Highlyexpressive models offer to subscribers the possibility to precisely match theirinterest, i.e. receiving only the notifications they are interested in. However,as we will point out, the expressive power of the subscription language is notsimply related to the flexibility of interaction with clients, but has a strong in-fluence ontherealization of the wholeNotification Service.Inthis Section wepresent the most popular pub/sub subscription models highlighting the trade-
2.2. SUBSCRIPTION MODELS 11 Shared Spaces. Shared spaces has been the first paradigm to consider an indirect communication. This is realized by a distributed shared memory, common to all participants, that interact each other by writing and reading data from/to the shared space. Actually, this realizes a many-to-many anonymous interaction, where many producers can indirectly send messages that will be received by many consumers. The difference with pub/sub is that consumers are not asynchronously notified but retrieve messages in a push-style fashion with an explicit, synchronous request. Among the most popular shared space implementations we cite Linda [47], JavaSpaces [46] and TSpaces [64]. Message Queues. Maybe the most common alternative to pub/sub for realizing interactions with multiple recipients is the message queue paradigm. Message Queues are an abstraction that is particularly used in the industry, with many popular existing implementations, such as IBM WebSphereMQ [58], Microsoft Message Queue [85] and part of the JMS specification [74]. The reason is that the message queue paradigm can easily provide transactional or reliability guarantees, thanks to the fact that messages are persistently stored within the queues. Moreover, it is often used as the basis for asynchronous invocation to software components (such as COM+ Queued Components, Message-Driven Enterprise Java Beans or Web Services). All the communications in this paradigm are filtered by the queue, that covers a role similar to the Notification Service in pub/sub. The difference is that each participant may have its own queue and a one-to-many interaction could require addressing several queues. Another feature that makes the level of decoupling obtained through message queue lower than the one provided by pub/sub is the fact that consumers must explicitly pull the messages from the queue. However, push-style callback is often present, also with a one-to-many delivery, making the message queue paradigm similar to a persistent form of pub/sub. 2.2 Subscription Models Different ways for specifying the notifications of interest have led to identifying different variants of the pub/sub paradigm. Several subscription models appear in the literature, characterized by different expressive powers. Highly expressive models offer to subscribers the possibility to precisely match their interest, i.e. receiving only the notifications they are interested in. However, as we will point out, the expressive power of the subscription language is not simply related to the flexibility of interaction with clients, but has a strong in- fluence on the realization of the whole Notification Service. In this Section we present the most popular pub/sub subscription models highlighting the trade-