Democratizing content publication with Coral

Michael J. Freedman, Eric Freudenthal, David Mazières
New York University
http://www.scs.cs.nyu.edu/coral/

Abstract

CoralCDN is a peer-to-peer content distribution network that allows a user to run a web site that offers high performance and meets huge demand, all for the price of a cheap broadband Internet connection. Volunteer sites that run CoralCDN automatically replicate content as a side effect of users accessing it. Publishing through CoralCDN is as simple as making a small change to the hostname in an object’s URL; a peer-to-peer DNS layer transparently redirects browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the origin web server. One of the system’s key goals is to avoid creating hot spots that might dissuade volunteers and hurt performance. It achieves this through Coral, a latency-optimized hierarchical indexing infrastructure based on a novel abstraction called a distributed sloppy hash table, or DSHT.

1 Introduction

The availability of content on the Internet is to a large degree a function of the cost shouldered by the publisher. A well-funded web site can reach huge numbers of people through some combination of load-balanced servers, fast network connections, and commercial content distribution networks (CDNs). Publishers who cannot afford such amenities are limited in the size of audience and type of content they can serve. Moreover, their sites risk sudden overload following publicity, a phenomenon nicknamed the “Slashdot” effect, after a popular web site that periodically links to under-provisioned servers, driving unsustainable levels of traffic to them. Thus, even struggling content providers are often forced to expend significant resources on content distribution.
Fortunately, at least with static content, there is an easy way for popular data to reach many more people than publishers can afford to serve themselves—volunteers can mirror the data on their own servers and networks. Indeed, the Internet has a long history of organizations with good network connectivity mirroring data they consider to be of value. More recently, peer-to-peer file sharing has demonstrated the willingness of even individual broadband users to dedicate upstream bandwidth to redistribute content the users themselves enjoy. Additionally, organizations that mirror popular content reduce their downstream bandwidth utilization and improve the latency for local users accessing the mirror.

This paper describes CoralCDN, a decentralized, self-organizing, peer-to-peer web-content distribution network. CoralCDN leverages the aggregate bandwidth of volunteers running the software to absorb and dissipate most of the traffic for web sites using the system. In so doing, CoralCDN replicates content in proportion to the content’s popularity, regardless of the publisher’s resources—in effect democratizing content publication.

To use CoralCDN, a content publisher—or someone posting a link to a high-traffic portal—simply appends “.nyud.net:8090” to the hostname in a URL. Through DNS redirection, oblivious clients with unmodified web browsers are transparently redirected to nearby Coral web caches. These caches cooperate to transfer data from nearby peers whenever possible, minimizing both the load on the origin web server and the end-to-end latency experienced by browsers.

CoralCDN is built on top of a novel key/value indexing infrastructure called Coral. Two properties make Coral ideal for CDNs. First, Coral allows nodes to locate nearby cached copies of web objects without querying more distant nodes. Second, Coral prevents hot spots in the infrastructure, even under degenerate loads.
For instance, if every node repeatedly stores the same key, the rate of requests to the most heavily-loaded machine is still only logarithmic in the total number of nodes.

Coral exploits overlay routing techniques recently popularized by a number of peer-to-peer distributed hash tables (DHTs). However, Coral differs from DHTs in several ways. First, Coral’s locality and hot-spot prevention properties are not possible for DHTs. Second, Coral’s architecture is based on clusters of well-connected machines. Clusters are exposed in the interface to higher-level software, and in fact form a crucial part of the DNS redirection mechanism. Finally, to achieve its goals, Coral provides weaker consistency than traditional DHTs. For that reason, we call its indexing abstraction a distributed sloppy hash table, or DSHT.

CoralCDN makes a number of contributions. It enables people to publish content that they previously could not or would not because of distribution costs. It is the first completely decentralized and self-organizing web-content distribution network. Coral, the indexing infrastructure, provides a new abstraction potentially of use to any application that needs to locate nearby instances of resources on the network. Coral also introduces an epidemic clustering algorithm that exploits distributed network measurements. Furthermore, Coral is the first peer-to-peer key/value index that can scale to many stores of the same key without hot-spot congestion, thanks to a new rate-limiting technique. Finally, CoralCDN contains the first peer-to-peer DNS redirection infrastructure, allowing the system to inter-operate with unmodified web browsers.

Measurements of CoralCDN demonstrate that it allows under-provisioned web sites to achieve dramatically higher capacity, and its clustering provides quantitatively better performance than locality-unaware systems.

The remainder of this paper is structured as follows. Section 2 provides a high-level description of CoralCDN, and Section 3 describes its DNS system and web caching components. In Section 4, we describe the Coral indexing infrastructure, its underlying DSHT layers, and the clustering algorithms. Section 5 includes an implementation overview and Section 6 presents experimental results. Section 7 describes related work, Section 8 discusses future work, and Section 9 concludes.

2 The Coral Content Distribution Network

The Coral Content Distribution Network (CoralCDN) is composed of three main parts: (1) a network of cooperative HTTP proxies that handle users’ requests,1 (2) a network of DNS nameservers for nyucd.net that map clients to nearby Coral HTTP proxies, and (3) the underlying Coral indexing infrastructure and clustering machinery on which the first two applications are built.

2.1 Usage Models

To enable immediate and incremental deployment, CoralCDN is transparent to clients and requires no software or plug-in installation. CoralCDN can be used in a variety of ways, including:

• Publishers.
A web site publisher for x.com can change selected URLs in their web pages to “Coralized” URLs, such as http://www.x.com.nyud.net:8090/y.jpg.

• Third-parties. An interested third-party—e.g., a poster to a web portal or a Usenet group—can Coralize a URL before publishing it, causing all embedded relative links to use CoralCDN as well.

• Users. Coral-aware users can manually construct Coralized URLs when surfing slow or overloaded web sites. All relative links and HTTP redirects are automatically Coralized.

Footnote 1: While Coral’s HTTP proxy definitely provides proxy functionality, it is not an HTTP proxy in the strict RFC2616 sense; it serves requests that are syntactically formatted for an ordinary HTTP server.

Figure 1: Using CoralCDN, the steps involved in resolving a Coralized URL and returning the corresponding file, per Section 2.2. Rounded boxes represent CoralCDN nodes running Coral, DNS, and HTTP servers. Solid arrows correspond to Coral RPCs, dashed arrows to DNS traffic, dotted-dashed arrows to network probes, and dotted arrows to HTTP traffic.

2.2 System Overview

Figure 1 shows the steps that occur when a client accesses a Coralized URL, such as http://www.x.com.nyud.net:8090/, using a standard web browser. The two main stages—DNS redirection and HTTP request handling—both use the Coral indexing infrastructure.

1. A client sends a DNS request for www.x.com.nyud.net to its local resolver.

2. The client’s resolver attempts to resolve the hostname using some Coral DNS server(s), possibly starting at one of the few registered under the .net domain.

3. Upon receiving a query, a Coral DNS server probes the client to determine its round-trip-time and last few network hops.

4.
Based on the probe results, the DNS server checks Coral to see if there are any known nameservers and/or HTTP proxies near the client’s resolver.

5. The DNS server replies, returning any servers found through Coral in the previous step; if none were found, it returns a random set of nameservers and proxies. In either case, if the DNS server is close to the client, it only returns nodes that are close to itself (see Section 3.1).

6. The client’s resolver returns the address of a Coral HTTP proxy for www.x.com.nyud.net.
7. The client sends the HTTP request http://www.x.com.nyud.net:8090/ to the specified proxy. If the proxy is caching the file locally, it returns the file and stops. Otherwise, this process continues.

8. The proxy looks up the web object’s URL in Coral.

9. If Coral returns the address of a node caching the object, the proxy fetches the object from this node. Otherwise, the proxy downloads the object from the origin server, www.x.com (not shown).

10. The proxy stores the web object and returns it to the client browser.

11. The proxy stores a reference to itself in Coral, recording the fact that it is now caching the URL.

2.3 The Coral Indexing Abstraction

This section introduces the Coral indexing infrastructure as used by CoralCDN. Coral provides a distributed sloppy hash table (DSHT) abstraction. DSHTs are designed for applications storing soft-state key/value pairs, where multiple values may be stored under the same key. CoralCDN uses this mechanism to map a variety of types of key onto addresses of CoralCDN nodes. In particular, it uses DSHTs to find Coral nameservers topologically close to clients’ networks, to find HTTP proxies caching particular web objects, and to locate nearby Coral nodes for the purposes of minimizing internal request latency.

Instead of one global overlay as in [5, 14, 27], each Coral node belongs to several distinct DSHTs called clusters. Each cluster is characterized by a maximum desired network round-trip-time (RTT) we call the diameter. The system is parameterized by a fixed hierarchy of diameters known as levels. Every node is a member of one DSHT at each level. A group of nodes can form a level-i cluster if a high-enough fraction of their pair-wise RTTs are below the level-i diameter threshold. Although Coral’s implementation allows for an arbitrarily-deep DSHT hierarchy, this paper describes a three-level hierarchy with thresholds of ∞, 60 msec, and 20 msec for level-0, -1, and -2 clusters respectively.
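To make the hierarchy concrete, the mapping from a measured RTT to the highest usable cluster level can be sketched as follows (an illustration of ours, not Coral's actual code; the thresholds are the ∞, 60 msec, and 20 msec values above, and the function name is hypothetical):

```python
# Level-i diameter thresholds for the paper's three-level hierarchy:
# level 0 is unbounded, level 1 is 60 msec, level 2 is 20 msec.
THRESHOLDS_MS = [float("inf"), 60.0, 20.0]

def highest_cluster_level(rtt_ms: float) -> int:
    """Return the highest (fastest, lowest-diameter) cluster level
    whose threshold the measured round-trip-time still falls under."""
    level = 0
    for i, diameter in enumerate(THRESHOLDS_MS):
        if rtt_ms < diameter:
            level = i
    return level
```

Under this sketch, a 10 msec peer qualifies for a level-2 cluster, a 45 msec peer for level-1, and anything slower only for the unbounded level-0 cluster.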
Coral queries nodes in higher-level, fast clusters before those in lower-level, slower clusters. This both reduces the latency of lookups and increases the chances of returning values stored by nearby nodes.

Coral provides the following interface to higher-level applications:

• put(key, val, ttl, [levels]): Inserts a mapping from the key to some arbitrary value, specifying the time-to-live of the reference. The caller may optionally specify a subset of the cluster hierarchy to restrict the operation to certain levels.

• get(key, [levels]): Retrieves some subset of the values stored under a key. Again, one can optionally specify a subset of the cluster hierarchy.

• nodes(level, count, [target], [services]): Returns count neighbors belonging to the node’s cluster as specified by level. target, if supplied, specifies the IP address of a machine to which the returned nodes would ideally be near. Coral can probe target and exploit network topology hints stored in the DSHT to satisfy the request. If services is specified, Coral will only return nodes running the particular service, e.g., an HTTP proxy or DNS server.

• levels(): Returns the number of levels in Coral’s hierarchy and their corresponding RTT thresholds.

The next section describes the design of CoralCDN’s DNS redirector and HTTP proxy—especially with regard to their use of Coral’s DSHT abstraction and clustering hierarchy—before returning to Coral in Section 4.

3 Application-Layer Components

The Coral DNS server directs browsers fetching Coralized URLs to Coral HTTP proxies, attempting to find ones near the requesting client. These HTTP proxies exploit each others’ caches in such a way as to minimize both transfer latency and the load on origin web servers.

3.1 The Coral DNS server

The Coral DNS server, dnssrv, returns IP addresses of Coral HTTP proxies when browsers look up the hostnames in Coralized URLs. To improve locality, it attempts to return proxies near requesting clients.
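For concreteness, the semantics of the four-operation DSHT interface from Section 2.3 can be mimicked by a toy, single-process stand-in. This is our own illustrative sketch, not Coral's distributed implementation: all names are hypothetical, and it ignores TTL expiry, cluster routing, and neighbor discovery.

```python
from collections import defaultdict

class ToyDSHT:
    """Single-process sketch of Coral's DSHT interface (Section 2.3).

    A DSHT stores soft-state key/value pairs; many values may
    accumulate under one key, and get() may return any subset.
    """

    def __init__(self, rtt_thresholds=(float("inf"), 60.0, 20.0)):
        self._thresholds = list(rtt_thresholds)
        self._store = defaultdict(list)  # key -> list of (value, ttl)

    def put(self, key, val, ttl, levels=None):
        # A real node would route the insert through the requested
        # cluster levels; here we simply append, since keys are
        # multi-valued.
        self._store[key].append((val, ttl))

    def get(self, key, levels=None):
        # Coral may stop early once nearby values are found; this toy
        # version just returns everything currently stored.
        return [val for (val, _ttl) in self._store[key]]

    def nodes(self, level, count, target=None, services=None):
        # Requires real cluster membership; a single-process toy has
        # no neighbors, so this sketch returns an empty list.
        return []

    def levels(self):
        # The hierarchy levels and their RTT thresholds.
        return list(self._thresholds)
```

CoralCDN's proxies, for example, index cached pages under the SHA-1 hash of an object's URL (Section 3.2), so several proxies caching the same page all put their own addresses under the same key, and a get for that key can return any of them.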
In particular, whenever a DNS resolver (client) contacts a nearby dnssrv instance, dnssrv both returns proxies within an appropriate cluster, and ensures that future DNS requests from that client will not need to leave the cluster. Using the nodes function, dnssrv also exploits Coral’s on-the- fly network measurement capabilities and stored topology hints to increase the chances of clients discovering nearby DNS servers. More specifically, every instance of dnssrv is an authoritative nameserver for the domain nyucd.net. Assuming a 3-level hierarchy, as Coral is generally configured, dnssrv maps any domain name ending http.L2. L1.L0.nyucd.net to one or more Coral HTTP proxies. (For an (n + 1)-level hierarchy, the domain name is extended out to Ln in the obvious way.) Because such names are somewhat unwieldy, we established a DNS DNAME alias [4], nyud.net, with target http. L2.L1.L0.nyucd.net. Any domain name ending nyud.net is therefore equivalent to the same name with suffix http.L2.L1.L0.nyucd.net, allowing Coralized URLs to have the more concise form http:// www.x.com.nyud.net:8090/. dnssrv assumes that web browsers are generally close to their resolvers on the network, so that the source ad- 3
dress of a DNS query reflects the browser's network lo- old, it returns nameservers for L1. LO.nyucd. net. For cation. This assumption holds to varying degrees, but is clients closer than the level-2 threshold, it returns name- good enough that Akamai [12], Digital Island [6], and servers for domain L2 L1 L0 nyucd. net. Because Mirror Image [21] have all successfully deployed com- DNS resolvers query the servers for the most specific mercial CDNS based on DNS redirection. The locality known domain, this scheme allows closer dnssrv instances problem therefore is reduced to returning proxies that are to override the results of more distant ones near the source of a DNS request. In order to achieve lo- Unfortunately, although resolvers can tolerate a frac- cality, dnssrv measures its round-trip-time to the resolver tion of unavailable DNS servers, browsers do not han- and categorizes it by level For a 3-level hierarchy the re- dle bad Http servers gracefully. (this is one reason for solver will correspond to a level 2, level 1, or level 0 client, returning CoralProxy addresses with short TTL fields. pending on how its RTT compares to Corals cluster- As an added precaution, dnssrv only returns CoralProxy level thresholds addresses which it has recently verified first-hand. This When asked for the address of a hostname ending sometimes means synchronously checking a proxy's sta httpL2.ll.l0.nyUcd.netdnssr'sreplycontus(viaaUdpRpc)priorreplyingtoadnsqueryWe tains two sections of interest: A set of addresses for the note further that people who wish to contribute only up- name--answers to the query-and a set of nameservers stream bandwidth can flag their proxy as"non-recursive for that name's domain-known as the authority section in which case dnssrv will only return that proxy to clients of a DNS reply. 
dnssrv returns addresses of CoralProx- on local networks ies in the cluster whose level corresponds to the client level categorization In other words if the Rtt between 3.2 The Coral Http proxy the dns client and dnssry is below the level-i threshold (for the best i), dnssrv will only return addresses of Coral The Coral Http proxy Coralproxy, satisfies Http re- nodes in its level-i cluster dnssry obtains a list of such quests for Coralized URLS. It seeks to provide reasonable nodes with the nodes function. Note that dns srv always re- request latency and high system throughput, even while turns CoralProxy addresses with short time-to-live fields serving data from origin servers behind comparatively slow network links such as home broadband connections To achieve better locality, dnssrv also specifies the This design space requires particular care in minimiz- clients IP address as a target argument to nodes. This ing load on origin servers compared to traditional cdNs, causes Coral to probe the addresses of the last five net- for two reasons. First, many of Coral's origin servers work hops to the client and use the results to look for are likely to have slower network connections than typ- clustering hints in the DSHTs. To avoid significantly de- ical customers of commercial CDNS. Second,commer- laying clients, Coral maps these network hops using a fast, cial CDNs often collocate a number of machines at each built-in traceroute-like mechanism that combines concur- deployment site and then select proxies based in part on rent probes and aggressive time-outs to minimize latency. the URL requested-eftectively distributing URLs across The entire mapping process generally requires around 2 proxies. Coral, in contrast, selects proxies only based on RTTs and 350 bytes of bandwidth. A Coral node caches client locality. Thus, in CoralCDN, it is much easier for results to avoid repeatedly probing the same client. 
every single proxy to end up fetching a particular URL The closer dnssryv is to a client the better its selection of To aggressively minimize load on origin servers, CoralProxy addresses will likely be for the client. dnssrv CoralProxy must fetch web pages from other proxies therefore exploits the authority section of DNS replies to whenever possible. Each proxy keeps a local cache from lock a DNS client into a good cluster whenever it happens which it can immediately fulfill requests. When a client upon a nearby dnssrv. As with the answer section, dnssrv requests a non-resident URL, CoralProxy first attempts selects the nameservers it returns from the appropriate to locate a cached copy of the referenced resource using cluster level and uses the target argument to exploit mea- Coral (a get ), with the resource indexed by a SHA-I hash surement and network hints. Unlike addresses in the an- of its URL [22]. If CoralProxy discovers that one or more section, however, it gives nameservers in the author- other proxies have the data, it attempts to fetch the data section a long TTL(one hour). a nearby dnssrv must from the proxy to which it first connects. If Coral provides therefore override any inferior nameservers a dNS client no referrals or if no referrals return the data, CoralProxy may be caching from previous queries. dnssrv does so by must fetch the resource directly from the origin manipulating the domain for which returned nameservers While CoralProxy is fetching a web object-either servers. To clients more distant than the level-1 timing from the origin or from another CoralProxy-it inserts a reshold, dnssrv claims to return nameservers for domain reference to itself in its DSHTs with a time-to-live of 20 LO. nyucd. net. For clients closer than that thresh- seconds. (It will renew this short-lived reference until it completes the download. )Thus, if a flash crowd suddenly
address of a DNS query reflects the browser's network location. This assumption holds to varying degrees, but is good enough that Akamai [12], Digital Island [6], and Mirror Image [21] have all successfully deployed commercial CDNs based on DNS redirection. The locality problem therefore is reduced to returning proxies that are near the source of a DNS request. In order to achieve locality, dnssrv measures its round-trip-time to the resolver and categorizes it by level. For a 3-level hierarchy, the resolver will correspond to a level 2, level 1, or level 0 client, depending on how its RTT compares to Coral's cluster-level thresholds. When asked for the address of a hostname ending in http.L2.L1.L0.nyucd.net, dnssrv's reply contains two sections of interest: a set of addresses for the name—answers to the query—and a set of nameservers for that name's domain—known as the authority section of a DNS reply. dnssrv returns addresses of CoralProxies in the cluster whose level corresponds to the client's level categorization. In other words, if the RTT between the DNS client and dnssrv is below the level-i threshold (for the best i), dnssrv will only return addresses of Coral nodes in its level-i cluster. dnssrv obtains a list of such nodes with the nodes function. Note that dnssrv always returns CoralProxy addresses with short time-to-live fields (30 seconds for levels 0 and 1, 60 for level 2). To achieve better locality, dnssrv also specifies the client's IP address as a target argument to nodes. This causes Coral to probe the addresses of the last five network hops to the client and use the results to look for clustering hints in the DSHTs. To avoid significantly delaying clients, Coral maps these network hops using a fast, built-in traceroute-like mechanism that combines concurrent probes and aggressive time-outs to minimize latency. The entire mapping process generally requires around 2 RTTs and 350 bytes of bandwidth.
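To make the level categorization concrete, here is a minimal sketch. The RTT thresholds below are invented for illustration (this passage does not give Coral's actual values); only the reply TTLs (30 seconds for levels 0 and 1, 60 for level 2) come from the text.

```python
# Hypothetical cluster-level RTT thresholds, in milliseconds; Coral's
# real deployment values are not specified in this passage.
LEVEL_RTT_THRESHOLDS_MS = {1: 60, 2: 20}

def categorize_client(rtt_ms):
    # Return the most local cluster level whose RTT threshold the
    # resolver satisfies; level 0 is the planet-wide cluster.
    level = 0
    for lvl, threshold in sorted(LEVEL_RTT_THRESHOLDS_MS.items()):
        if rtt_ms < threshold:
            level = lvl
    return level

def answer_ttl_seconds(level):
    # From the text: proxy addresses carry short TTLs, 30 seconds for
    # levels 0 and 1, 60 seconds for level 2.
    return 60 if level == 2 else 30
```

A distant resolver thus falls into level 0 and receives addresses from the global cluster, while a nearby one receives addresses drawn only from a tight regional cluster.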
A Coral node caches results to avoid repeatedly probing the same client. The closer dnssrv is to a client, the better its selection of CoralProxy addresses will likely be for the client. dnssrv therefore exploits the authority section of DNS replies to lock a DNS client into a good cluster whenever it happens upon a nearby dnssrv. As with the answer section, dnssrv selects the nameservers it returns from the appropriate cluster level and uses the target argument to exploit measurement and network hints. Unlike addresses in the answer section, however, it gives nameservers in the authority section a long TTL (one hour). A nearby dnssrv must therefore override any inferior nameservers a DNS client may be caching from previous queries. dnssrv does so by manipulating the domain for which returned nameservers are servers. To clients more distant than the level-1 timing threshold, dnssrv claims to return nameservers for domain L0.nyucd.net. For clients closer than that threshold, it returns nameservers for L1.L0.nyucd.net. For clients closer than the level-2 threshold, it returns nameservers for domain L2.L1.L0.nyucd.net. Because DNS resolvers query the servers for the most specific known domain, this scheme allows closer dnssrv instances to override the results of more distant ones. Unfortunately, although resolvers can tolerate a fraction of unavailable DNS servers, browsers do not handle bad HTTP servers gracefully. (This is one reason for returning CoralProxy addresses with short TTL fields.) As an added precaution, dnssrv only returns CoralProxy addresses that it has recently verified first-hand. This sometimes means synchronously checking a proxy's status (via a UDP RPC) prior to replying to a DNS query. We note further that people who wish to contribute only upstream bandwidth can flag their proxy as "non-recursive," in which case dnssrv will only return that proxy to clients on local networks.
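A small sketch of the override scheme just described: the domain names and one-hour TTL come from the text, while the function and constant names are ours, purely illustrative.

```python
# Domains for which dnssrv claims the returned nameservers are
# authoritative, keyed by the client's cluster level.
NS_DOMAIN_BY_LEVEL = {
    0: "L0.nyucd.net",        # more distant than the level-1 threshold
    1: "L1.L0.nyucd.net",     # within level 1, outside level 2
    2: "L2.L1.L0.nyucd.net",  # within the level-2 threshold
}

NS_TTL_SECONDS = 3600  # authority-section nameservers get a long TTL

def authority_domain(client_level):
    # Resolvers query the servers for the most specific domain they
    # know, so the more specific domain returned by a nearby dnssrv
    # overrides nameservers cached from more distant ones.
    return NS_DOMAIN_BY_LEVEL[client_level]
```

The asymmetry is deliberate: proxy addresses expire in under a minute so a bad proxy is quickly forgotten, while nameserver records persist for an hour so a good cluster assignment sticks.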
3.2 The Coral HTTP proxy

The Coral HTTP proxy, CoralProxy, satisfies HTTP requests for Coralized URLs. It seeks to provide reasonable request latency and high system throughput, even while serving data from origin servers behind comparatively slow network links such as home broadband connections. This design space requires particular care in minimizing load on origin servers compared to traditional CDNs, for two reasons. First, many of Coral's origin servers are likely to have slower network connections than typical customers of commercial CDNs. Second, commercial CDNs often collocate a number of machines at each deployment site and then select proxies based in part on the URL requested—effectively distributing URLs across proxies. Coral, in contrast, selects proxies only based on client locality. Thus, in CoralCDN, it is much easier for every single proxy to end up fetching a particular URL.

To aggressively minimize load on origin servers, a CoralProxy must fetch web pages from other proxies whenever possible. Each proxy keeps a local cache from which it can immediately fulfill requests. When a client requests a non-resident URL, CoralProxy first attempts to locate a cached copy of the referenced resource using Coral (a get), with the resource indexed by a SHA-1 hash of its URL [22]. If CoralProxy discovers that one or more other proxies have the data, it attempts to fetch the data from the proxy to which it first connects. If Coral provides no referrals or if no referrals return the data, CoralProxy must fetch the resource directly from the origin. While CoralProxy is fetching a web object—either from the origin or from another CoralProxy—it inserts a reference to itself in its DSHTs with a time-to-live of 20 seconds. (It will renew this short-lived reference until it completes the download.) Thus, if a flash crowd suddenly
fetches a web page, all CoralProxies, other than the first simultaneous requests, will naturally form a kind of multicast tree for retrieving the web page. Once any CoralProxy obtains the full file, it inserts a much longer-lived reference to itself (e.g., 1 hour). Because the insertion algorithm accounts for TTL, these longer-lived references will overwrite shorter-lived ones, and they can be stored on well-selected nodes even under high insertion load, as later described in Section 4.2. CoralProxies periodically renew referrals to resources in their caches. A proxy should not evict a web object from its cache while a reference to it may persist in the DSHT. Ideally, proxies would adaptively set TTLs based on cache capacity, though this is not yet implemented.

4 Coral: A Hierarchical Indexing System

This section describes the Coral indexing infrastructure, which CoralCDN leverages to achieve scalability, self-organization, and efficient data retrieval. We describe how Coral implements the put and get operations that form the basis of its distributed sloppy hash table (DSHT) abstraction: the underlying key-based routing layer (4.1), the DSHT algorithms that balance load (4.2), and the changes that enable latency and data-placement optimizations within a hierarchical set of DSHTs (4.3). Finally, we describe the clustering mechanisms that manage this hierarchical structure (4.4).

4.1 Coral's Key-Based Routing Layer

Coral's keys are opaque 160-bit ID values; nodes are assigned IDs in the same 160-bit identifier space. A node's ID is the SHA-1 hash of its IP address. Coral defines a distance metric on IDs. Henceforth, we describe a node as being close to a key if the distance between the key and the node's ID is small. A Coral put operation stores a key/value pair at a node close to the key. A get operation searches for stored key/value pairs at nodes successively closer to the key.
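The ID scheme and closeness relation can be sketched in a few lines. The `node_id` and `distance` definitions follow the text (SHA-1 of the IP address; the XOR metric defined in this section); the `closest_node` helper is ours, for illustration only, not Coral's API.

```python
import hashlib

def node_id(ip_addr):
    # A node's ID is the SHA-1 hash of its IP address: an opaque
    # 160-bit value in the same identifier space as Coral's keys.
    return int.from_bytes(hashlib.sha1(ip_addr.encode()).digest(), "big")

def distance(a, b):
    # Coral's Kademlia-style metric: bitwise XOR of the two IDs,
    # interpreted as an unsigned integer.
    return a ^ b

def closest_node(node_ids, key):
    # A put stores the key/value pair at (and a get searches toward)
    # the node minimizing this distance to the key.
    return min(node_ids, key=lambda n: distance(n, key))
```

Under XOR, agreeing on a high-order bit outweighs any disagreement in the lower-order bits, which is why IDs sharing a longer prefix of most significant bits are numerically closer.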
To support these operations, a node requires some mechanism to discover other nodes close to any arbitrary key. Every DSHT contains a routing table. For any key k, a node R's routing table allows it to find a node closer to k, unless R is already the closest node. These routing tables are based on Kademlia [17], which defines the distance between two values in the ID-space to be their bitwise exclusive or (XOR), interpreted as an unsigned integer. Using the XOR metric, IDs with longer matching prefixes (of most significant bits) are numerically closer. The size of a node's routing table in a DSHT is logarithmic in the total number of nodes comprising the DSHT. If a node R is not the closest node to some key k, then R's routing table almost always contains either the closest node to k, or some node whose distance to k is at least one bit shorter than R's. This permits R to visit a sequence of nodes with monotonically decreasing distances [d1, d2, . . .] to k, such that the encoding of di+1 as a binary number has one fewer bit than di. As a result, the expected number of iterations for R to discover the closest node to k is logarithmic in the number of nodes.

Figure 2: Example of routing operations in a system containing eight nodes with IDs {4, 5, 7, 0, 2, 3, 13, 14}. In this illustration, node R with id = 14 is looking up the node closest to key k = 4, and we have sorted the nodes by their distance to k. The top boxed row illustrates XOR distances for the nodes {0, 2, 3, 13, 14} that are initially known by R. R first contacts a known peer whose distance to k is closest to half of R's distance (10/2 = 5); in this illustration, this peer is node zero, whose distance to k is 0 ⊕ 4 = 4. Data in RPC requests and responses are shown in parentheses and braces, respectively: R asks node zero for its peers that are half-way closer to k, i.e., those at distance 4/2 = 2. R inserts these new references into its routing table (middle row). R now repeats this process, contacting node five, whose distance 1 is closest to 4/2. Finally, R contacts node four, whose distance is 0, and completes its search (bottom row).

Figure 2 illustrates the Coral routing algorithm, which successively visits nodes whose distances to the key are approximately halved each iteration. Traditional key-based routing layers attempt to route directly to the node closest to the key whenever possible [25, 26, 31, 35], resorting to several intermediate hops only when faced with incomplete routing information. By caching additional routing state—beyond the necessary log(n) references—these systems in practice manage to achieve routing in a constant number of hops. We observe that frequent references to the same key can generate high levels of traffic in nodes close to the key. This congestion, called tree saturation, was first identified in shared-memory interconnection networks [24].
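The distance-halving walk of Figure 2 can be mimicked with a toy lookup. This is a local simulation under stated assumptions, not Coral's implementation: the RPC is faked with global knowledge of the node set from the figure, and all function names are ours.

```python
ALL_NODES = {4, 5, 7, 0, 2, 3, 13, 14}  # node IDs from Figure 2

def rpc_peers_closer(node, key, target_dist):
    # Stand-in for the RPC in Figure 2: `node` returns the peers it
    # knows whose XOR distance to `key` is at most `target_dist`.
    # (Simulated here with global knowledge of ALL_NODES.)
    return {n for n in ALL_NODES if (n ^ key) <= target_dist}

def lookup(known, key):
    # Each round, ask the best-known node for peers roughly half-way
    # closer to the key; stop when no new references are learned.
    known = set(known)
    while True:
        best = min(known, key=lambda n: n ^ key)
        closer = rpc_peers_closer(best, key, (best ^ key) // 2)
        if closer <= known:  # nothing new: best is our answer
            return best
        known |= closer
```

Starting from R's initial view {0, 2, 3, 13, 14} with key 4, the loop first contacts node 0 (distance 4), learns of nodes at distance at most 2, then finishes at node 4 (distance 0), matching the RPC sequence in the figure.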