UNIVERSITA DEGLI STUDIDIROMA “LA SAPIENZA"DOTTORATO DI RICERCA IN INGEGNERIA INFORMATICAXVI CICLO - 2003-VPublish/SubscribeCommunicationSystems: from Models to ApplicationsAntoninoVirgillito
Universita degli Studi di Roma “La Sapienza” ` Dottorato di Ricerca in Ingegneria Informatica XVI Ciclo – 2003– V Publish/Subscribe Communication Systems: from Models to Applications Antonino Virgillito
Contents11 Introduction21.1The Publish/Subscribe Paradigm31.1.1ResearchChallengesforPublish/Subscribe41.2Contributionsof theThesis61.3Structure of the Thesis .721UnderstandingPublish/SubscribeSystems82.11BasicPublish/SubscribeSpecification82.1.1Elements of a Publish/Subscribe System2.1.210PositioningthePublish/SubscribeParadigm112.2Subscription Models2.2.112Topic-based Model132.2.2Content-based Model2.2.314 Type-based Model152.3Architectural Models2162.3.1NetworkMulticasting172.3.2Application-level Networks182.3.3Peer-to-peer Overlay Network Infrastructures192.4Behind the Scenes of a Distributed Notification Service192.4.1Overview2.4.220Event Matching.212.4.3Subscription Assignment and Routing2.4.424Event and Notification Routing262.4.5Classification Framework272.5Surveying Publish/Subscribe Systems272.5.1TIB/RV2.5.2Scribe28282.5.3Gryphon292.5.4SIENA302.5.5Hermes312.6Concluding Remarksi
Contents 1 Introduction 1 1.1 The Publish/Subscribe Paradigm . . . . . . . . . . . . . . . . . 2 1.1.1 Research Challenges for Publish/Subscribe . . . . . . . 3 1.2 Contributions of the Thesis . . . . . . . . . . . . . . . . . . . . 4 1.3 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 6 2 Understanding Publish/Subscribe Systems 7 2.1 Basic Publish/Subscribe Specification . . . . . . . . . . . . . . 8 2.1.1 Elements of a Publish/Subscribe System . . . . . . . . . 8 2.1.2 Positioning the Publish/Subscribe Paradigm . . . . . . 10 2.2 Subscription Models . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 Topic-based Model . . . . . . . . . . . . . . . . . . . . . 12 2.2.2 Content-based Model . . . . . . . . . . . . . . . . . . . 13 2.2.3 Type-based Model . . . . . . . . . . . . . . . . . . . . . 14 2.3 Architectural Models . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.1 Network Multicasting . . . . . . . . . . . . . . . . . . . 16 2.3.2 Application-level Networks . . . . . . . . . . . . . . . . 17 2.3.3 Peer-to-peer Overlay Network Infrastructures . . . . . . 18 2.4 Behind the Scenes of a Distributed Notification Service . . . . . 19 2.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.4.2 Event Matching . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.3 Subscription Assignment and Routing . . . . . . . . . . 21 2.4.4 Event and Notification Routing . . . . . . . . . . . . . . 24 2.4.5 Classification Framework . . . . . . . . . . . . . . . . . 26 2.5 Surveying Publish/Subscribe Systems . . . . . . . . . . . . . . 27 2.5.1 TIB/RV . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.5.2 Scribe . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.5.3 Gryphon . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.5.4 SIENA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.5.5 Hermes . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 31 i
333ModellingPublish/SubscribeSystems343.1AFramework forPublish/Subscribe343.1.1Process-NS Interaction .3.1.235Computational Model店3.1.336NS Implementation Parameters393.1.4Liveness Property413.1.5Persistent Notifications3.1.643On the liveness specification in dynamic systems453.2Analytical Model453.2.1Measuring Notification Loss3.2.248Analytical results.523.2.3Discussion.543.3Simulation Study543.3.1Simulation Details553.3.2Simulation Results3.457Related Work.583.5 Concluding Remarks4 Self-Organizing Content-Based Publish/Subscribe61624.1 Background.624.1.1Publish/Subscribe Model634.1.2Content-based Routing Protocol644.1.3Scalability of Content-Based Routing.654.2SOCBR:ASelf-OrganizingCBRAlgorithm664.2.1The Cost Metric:TCPhops664.2.2Measuring Subscription Similarity:Associativity684.2.3AlgorithmOverview684.3AlgorithmSpecification694.3.1Basic Notions4.3.270Triggering704.3.3Tear-Up Link Discovery4.3.471Tear-Down Link Selection734.3.5Reconfiguration.4.475Addressing Network Proximity754.4.1Network Awareness in Pub/Sub Systems774.4.2pbSOCBR:Network-Aware Self-Organization4.577Simulation Study.79Implementation Details4.5.1804.5.2Simulation Scenarios814.5.3Experimental Results884.6Related Work904.7Concluding Remarks
3 Modelling Publish/Subscribe Systems 33 3.1 A Framework for Publish/Subscribe . . . . . . . . . . . . . . . 34 3.1.1 Process-NS Interaction . . . . . . . . . . . . . . . . . . . 34 3.1.2 Computational Model . . . . . . . . . . . . . . . . . . . 35 3.1.3 NS Implementation Parameters . . . . . . . . . . . . . . 36 3.1.4 Liveness Property . . . . . . . . . . . . . . . . . . . . . 39 3.1.5 Persistent Notifications . . . . . . . . . . . . . . . . . . 41 3.1.6 On the liveness specification in dynamic systems . . . . 43 3.2 Analytical Model . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.2.1 Measuring Notification Loss . . . . . . . . . . . . . . . . 45 3.2.2 Analytical results . . . . . . . . . . . . . . . . . . . . . . 48 3.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.3.1 Simulation Details . . . . . . . . . . . . . . . . . . . . . 54 3.3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . 55 3.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 58 4 Self-Organizing Content-Based Publish/Subscribe 61 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.1.1 Publish/Subscribe Model . . . . . . . . . . . . . . . . . 62 4.1.2 Content-based Routing Protocol . . . . . . . . . . . . . 63 4.1.3 Scalability of Content-Based Routing . . . . . . . . . . . 64 4.2 SOCBR: A Self-Organizing CBR Algorithm . . . . . . . . . . . 65 4.2.1 The Cost Metric: TCP hops . . . . . . . . . . . . . . . 66 4.2.2 Measuring Subscription Similarity: Associativity . . . . 66 4.2.3 Algorithm Overview . . . . . . . . . . . . . . . . . . . . 68 4.3 Algorithm Specification . . . . . . . . . . . . . . . . . . . . . . 68 4.3.1 Basic Notions . . . . . . . . . . . . . . . . . . . . . . . . 69 4.3.2 Triggering . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.3 Tear-Up Link Discovery . . . . . . . . . . . . . . . . . . 70 4.3.4 Tear-Down Link Selection . . . . . . . . . . . . . . . . . 71 4.3.5 Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . 73 4.4 Addressing Network Proximity . . . . . . . . . . . . . . . . . . 75 4.4.1 Network Awareness in Pub/Sub Systems . . . . . . . . 75 4.4.2 pbSOCBR: Network-Aware Self-Organization . . . . . . 77 4.5 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.5.1 Implementation Details . . . . . . . . . . . . . . . . . . 79 4.5.2 Simulation Scenarios . . . . . . . . . . . . . . . . . . . . 80 4.5.3 Experimental Results . . . . . . . . . . . . . . . . . . . 81 4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 90
91Publish/Subscribe at Work:The DaQuinCIS Project5925.1Background:theDaQuinCISproject925.1.1DataQuality in Cooperative Information Systems945.2ManagingDataQualityinCIS's:TheDaQuinCIS Architecture955.2.1The D?Q model965.2.2ArchitectureDescription5.397The Quality Notification Service-Specification985.3.1.1025.4QNSDesign..1025.4.1OverviewandMotivations.5.4.2103Merge Subscriptions.5.4.3105Diffusion Trees5.4.4106Quality Notification Service Internal Architecture5.5108Implementation and Simulation5.6Related Work1095.7110Concluding Remarks1136Conclusions1136.1ContributionsandFutureWork6.2116FutureDirections
5 Publish/Subscribe at Work: The DaQuinCIS Project 91 5.1 Background: the DaQuinCIS project . . . . . . . . . . . . . . . 92 5.1.1 Data Quality in Cooperative Information Systems . . . 92 5.2 Managing Data Quality in CIS’s: The DaQuinCIS Architecture 94 5.2.1 The D2Q model . . . . . . . . . . . . . . . . . . . . . . 95 5.2.2 Architecture Description . . . . . . . . . . . . . . . . . . 96 5.3 The Quality Notification Service . . . . . . . . . . . . . . . . . 97 5.3.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . 98 5.4 QNS Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4.1 Overview and Motivations . . . . . . . . . . . . . . . . . 102 5.4.2 Merge Subscriptions . . . . . . . . . . . . . . . . . . . . 103 5.4.3 Diffusion Trees . . . . . . . . . . . . . . . . . . . . . . . 105 5.4.4 Quality Notification Service Internal Architecture . . . . 106 5.5 Implementation and Simulation . . . . . . . . . . . . . . . . . . 108 5.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 5.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . 110 6 Conclusions 113 6.1 Contributions and Future Work . . . . . . . . . . . . . . . . . . 113 6.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Chapter 1IntroductionThe world-wide connectivity achieved with the explosion of the Internet is nowa well established reality.Possibilities are currently ever growing, thanks to thewidespreaddiffusionof high-bandwidthlinksand of powerful mobiledevicessuch as wireless laptops, palm computers or new generation mobile phones.The result is that millions of users are now potentially able to communicate,with massive loads of information possibly being exchanged from one side toanother of networks spanning a world-wide range.One of the biggest challenges in next-generation distributed computing isrepresented by large-scale diffusion of information.Example of applicationsare stock and news tickers, traffic information, instant messaging and elec-tronic auctions.The general model behind these applications is based ongathering information from a set of data sources, and delivering it to all theusers, depending on their interest. Such applications are expected to handle ahuge number of concurrent users, with frequent information publication anddynamic changes in users'interest.The design of this type of distributed applications in such a highly de-manding context still hides many issues to cope with and the large spectrumof possibilities offered by the technological advances cannot be by themselvesthe answer. Powerful tools are still required that can effectively exploits avail-able computational resources, carefully avoiding to overuse them more thanwhat strictly necessary. On the other hand, such tools have to provide to bothapplication developers and users a flexibility allowing them a quick usage andan easy deployment in a broad range of situations.The classical abstractions on whichdistributed applications have been builtuntil now cannot keep this pace anymore. For example, the common RPCparadigm that is the basis for the most popular middleware tools, have provedto be inadequate for large-scale interactions requiring a frequent diffusion ofinformation among many participants. The reason is that RPC promotes a1
Chapter 1 Introduction The world-wide connectivity achieved with the explosion of the Internet is now a well established reality. Possibilities are currently ever growing, thanks to the widespread diffusion of high-bandwidth links and of powerful mobile devices such as wireless laptops, palm computers or new generation mobile phones. The result is that millions of users are now potentially able to communicate, with massive loads of information possibly being exchanged from one side to another of networks spanning a world-wide range. One of the biggest challenges in next-generation distributed computing is represented by large-scale diffusion of information. Example of applications are stock and news tickers, traffic information, instant messaging and electronic auctions. The general model behind these applications is based on gathering information from a set of data sources, and delivering it to all the users, depending on their interest. Such applications are expected to handle a huge number of concurrent users, with frequent information publication and dynamic changes in users’ interest. The design of this type of distributed applications in such a highly demanding context still hides many issues to cope with and the large spectrum of possibilities offered by the technological advances cannot be by themselves the answer. Powerful tools are still required that can effectively exploits available computational resources, carefully avoiding to overuse them more than what strictly necessary. On the other hand, such tools have to provide to both application developers and users a flexibility allowing them a quick usage and an easy deployment in a broad range of situations. The classical abstractions on which distributed applications have been built until now cannot keep this pace anymore. For example, the common RPC paradigm that is the basis for the most popular middleware tools, have proved to be inadequate for large-scale interactions requiring a frequent diffusion of information among many participants. The reason is that RPC promotes a 1