Automated Worm Fingerprinting Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Department of Computer Science and Engineering University of California, San Diego Abstract Network worms are a clear and growing threat to the security of today’s Internet-connected hosts and networks. The combination of the Internet’s unrestricted connectivity and widespread software homogeneity allows network pathogens to exploit tremendous parallelism in their propagation. In fact, modern worms can spread so quickly, and so widely, that no human-mediated reaction can hope to contain an outbreak. In this paper, we propose an automated approach for quickly detecting previously unknown worms and viruses based on two key behavioral characteristics – a common exploit sequence together with a range of unique sources generating infections and destinations being targeted. More importantly, our approach – called “content sifting” – automatically generates precise signatures that can then be used to filter or moderate the spread of the worm elsewhere in the network. Using a combination of existing and novel algorithms, we have developed a scalable content sifting implementation with low memory and CPU requirements. Over months of active use at UCSD, our Earlybird prototype system has automatically detected and generated signatures for all pathogens known to be active on our network as well as for several new worms and viruses which were unknown at the time our system identified them. Our initial experience suggests that, for a wide range of network pathogens, it may be practical to construct fully automated defenses – even against so-called “zero-day” epidemics. 1 Introduction In the last three years, large-scale Internet worm outbreaks have profoundly demonstrated the threat posed by self-propagating programs. The combination of widespread software homogeneity and the Internet’s unrestricted communication model creates an ideal climate for infectious pathogens. Worse, each new epidemic has demonstrated increased speed, virulence or sophistication over its predecessors. While the Code Red worm took over fourteen hours to infect its vulnerable population in 2001, the Slammer worm, released some 18 months later, did the same in under 10 minutes [22, 21]. The Code Red worm is thought to have infected roughly 360,000 hosts, while, by some estimates, the Nimda worm compromised over two million [8]. While early worms typically spread by a single mechanism and did little else, modern variants such as SoBig.F and MyDoom can spread through multiple vectors and have added backdoors, mail-relays and denial-of-service attacks to their payloads. Unfortunately, our current ability to defend against these outbreaks is extremely poor and has not advanced significantly since the Code Red episode in mid-2001. In fact, the basic approach of detection, characterization, and containment has not changed significantly over the last five years. Typically, new worms are detected in an ad hoc fashion by a combination of intrusion detection systems and administrator legwork. Then, after isolating an instance of the worm, skilled security professionals manually characterize a worm signature and finally, this signature is used to contain subsequent infections via updates to anti-virus software and network filtering products. While this approach is qualitatively sound, it is quantitatively insufficient. Manual signature extraction is an expensive, slow procedure that can take hours or even days to complete.
It requires isolating a new worm, decompiling it, looking for invariant code sequences and testing the signature for uniqueness. However, recent simulations by Moore et al. suggest that an effective worm containment can require a reaction time of well under sixty seconds [23]. More concretely, consider that in the time it took to read this section, the Slammer worm had contacted well over a billion distinct Internet hosts. This paper investigates the challenges in addressing this problem and describes a prototype system, called Earlybird, that can automatically detect and contain new worms on the network using precise signatures. Our approach, which we call content sifting, is based on two observations: first, that some portion of the content in existing worms is invariant – typically the code exploiting a latent host vulnerability – and second, that the spreading dynamics of a worm is atypical of Internet applications. Simply stated, it is rare to observe the same string recurring within packets sent from many sources to many destinations. By sifting through network traffic for content strings that are both frequently repeated and widely dispersed, we can automatically identify new worms and their precise signatures. In our prototype system, we have developed approximate versions of this algorithm that are amenable to high-speed implementation. In live experiments on a portion of the UCSD campus network, we have deployed Earlybird and demonstrated that it can automatically extract the signature for all known active worms (e.g. CodeRed,
Slammer). Moreover, during our experiments Earlybird detected and extracted a signature for the Blaster, MyDoom and Kibuv.B worms – significantly before they had been publicly disclosed and hours or days before any public detection signatures were distributed. Finally, in our testing over a period of eight months, we have experienced relatively few false positives, and exceptions are typically due to structural properties of a few popular protocols (SPAM via SMTP and NetBIOS) that recur consistently and can be procedurally “white-listed”. The remainder of this paper is structured as follows. In Section 2 we survey the field of worm research that we have built upon and describe how it motivates our work. Section 3 describes how we define worm behavior. Section 4 outlines a naive approach to detecting such behaviors, followed by a concrete description of practical content sifting algorithms in Section 5. Section 6 describes the implementation of the Earlybird prototype and an analysis of our live experiments using it. We describe limitations and extensions in Section 7. Finally, in Section 8 we summarize our findings and conclude. 2 Background and Related Work Worms are simply small programs. They spread by exploiting a latent software vulnerability in some popular network service – such as email, Web or terminal access – seizing control of program execution and then sending a copy of themselves to other susceptible hosts. While the potential threat posed by network worms has a long past – originating with fictional accounts in Gerrold’s “When Harlie was One” and Brunner’s “Shockwave Rider” – it is only recently that this threat has enjoyed significant research attention. Fred Cohen first laid the theoretical foundations for understanding computer viruses in 1984 [4, 5], and the Internet worm of 1988 demonstrated that self-replication via a network could dramatically amplify the virulence of such pathogens [33, 39]. However, the analysis and understanding of network worms did not advance substantially until the CodeRed outbreak of 2001. In this section, we attempt to summarize the contemporary research literature – especially in its relation to our own work. The first research papers in the “modern worm era” focused on characterizations and analyses of particular worm outbreaks. For example, Moore et al. published one of the first empirical analyses of the CodeRed worm’s growth, based on unsolicited scans passively observed on an unused network [22]. Further, the authors estimated the operational “repair” rate by actively probing a subsample of the 360,000 infected sites over time. They found that, despite unprecedented media coverage, the repair rate during the initial outbreak averaged under 2 percent per day. This reinforces our belief that fully automated intervention is necessary to effectively manage worm outbreaks. Staniford et al.’s landmark paper anticipated the development of far faster worms and extrapolated their growth analytically [42] – foreshadowing the release of the Slammer worm in 2003. Moore et al. subsequently analyzed the Slammer outbreak and estimated that almost all of the Internet address space was scanned by the worm in under 10 minutes – limited only by bandwidth constraints at the infected sites [21]. This experience also motivates the need for fast and automated reaction times. Finally, based on these results, Moore et al.
analyzed the engineering requirements for reactive defenses – exploring the tradeoffs between reaction time, deployment and the granularity of containment mechanisms (signature based vs. IP address based) [23]. Two of their key findings motivate our work. First, they demonstrated that signature-based methods can be an order of magnitude more effective than simply quarantining infected hosts piecemeal. The rough intuition for this is simple: if a worm can compromise a new host with an average latency of x seconds, then an address-based quarantine must react more quickly than x seconds to prevent the worm from spreading. By contrast, a signature-based system can, in principle, halt all subsequent spreading once a signature is identified. The second important result was their derivation, via simulation, of “benchmarks” for how quickly such signatures must be generated to offer effective containment. Slow-spreading worms, such as CodeRed, can be effectively contained if signatures are generated within 60 minutes, while containing high-speed worms, such as Slammer, may require signature generation in well under 5 minutes – perhaps as little as 60 seconds. Our principal contribution is demonstrating practical mechanisms for achieving this requirement. In the remainder of this section we examine existing techniques for detecting worm outbreaks, characterizing worms and proposed countermeasures for mitigating worm spread. 2.1 Worm Detection Three current classes of methods are used for detecting new worms: scan detection, honeypots, and behavioral techniques at end hosts. We consider each of these in turn. Worms spread by selecting susceptible target hosts, infecting them over the network, and then repeating this process in a distributed recursive fashion. Many existing worms, excepting email viruses, will select targets using a random process. For instance, CodeRed selected target IP addresses uniformly from the entire address space. As a result, a worm will be highly unusual in the number, frequency and distribution of addresses that it scans. This can be leveraged to detect worms in several ways. To monitor random scanning worms from a global perspective,
one approach is to use network telescopes – passive network monitors that observe large ranges of unused, yet routable, address space [25, 22, 26]. Under the assumption that worms will select target victims at random, a new worm will scan a given network telescope with a probability directly proportional to the worm’s scan rate and the network telescope’s “size”; that is, the number of IP addresses monitored. Consequently, large network telescopes will be able to detect fast spreading worms of this type fairly quickly. At the enterprise level, Staniford provides a comprehensive analysis of the factors impacting the ability of a network monitor to successfully detect and quarantine infected hosts in an online fashion [41]. However, there are two key limitations to the scan detection approach. First, it is not well suited to worms which spread in a non-random fashion, such as e-mail viruses or worms spread via instant messenger or peer-to-peer applications. Such worms generate a target list from address books or buddy lists at the victim and therefore spread topologically – according to the implicit relationship graph between individuals. Consequently, they do not exhibit anomalous scanning patterns and will not be detected as a consequence. The second drawback is that scan detection can only provide the IP address of infected sites, not a signature identifying their behavior. Consequently, defenses based on scan detection must be an order of magnitude faster than those based on signature extraction [23]. A different approach to worm detection is demonstrated by honeypots. First introduced to the community via Cliff Stoll’s book, “The Cuckoo’s Egg”, and Bill Cheswick’s paper “An Evening with Berferd”, honeypots are simply monitored idle hosts with untreated vulnerabilities. Any outside interaction with the host is, by definition, unsolicited and any malicious actions can be observed directly. Consequently, any unsolicited outbound traffic generated by a honeypot represents undeniable evidence of an intrusion and possibly a worm infection. Moreover, since the honeypot host is directly controlled, malicious code can be differentiated from the default configuration. In this manner, the “body” of a worm can be isolated and then analyzed to extract a signature. This approach is commonly used for acquiring worm instances for manual analysis [18]. There are two principal drawbacks to honeypots: they require a significant amount of slow manual analysis and they depend on the honeypot being quickly infected by a new worm. Finally, a technique that has found increasing traction in the commercial world (e.g. via recently acquired startups, Okena and Entracept) is host-based behavioral detection. Such systems dynamically analyze the patterns of system calls for anomalous activity [31, 28, 3] indicating code injection or propagation. For example, attempts to send a packet from the same buffer containing a received packet are often indicative of suspicious activity. While behavioral techniques are able to leverage large amounts of detailed context about application and system behavior, they can be expensive to manage and deploy ubiquitously. Moreover, end-host systems can, by definition, only detect an attack against a single host and not infer the presence of a large-scale outbreak. Clearly, from a management, cost and reuse standpoint, it is ideal to detect and block new attacks in the network.
That said, end-host approaches offer a level of sensitivity that is difficult to match in the network and can be a useful complement – particularly for detecting potential slow or stealthy worms that do not leave a significant imprint on the network. 2.2 Characterization Characterization is the process of analyzing and identifying a new worm or exploit, so that targeted defenses may be deployed. One approach is to create a priori vulnerability signatures that match known exploitable vulnerabilities in deployed software [44, 45]. For example, a vulnerability signature for the Slammer worm might match all UDP traffic on port 1434 that is longer than 100 bytes. By searching for such traffic, either in the network or on the host, a new worm exploiting the same vulnerability will be revealed. This is very similar to traditional intrusion detection systems (IDS), such as Snort [1] and Bro [29], which compare traffic content to databases of strings used in known attacks. This general approach has the advantage that it can be deployed before the outbreak of a new worm and therefore can offer an added measure of defense. However, this sort of proactive characterization can only be applied to vulnerabilities that are already well-known and well-characterized manually. Further, the tradeoff between vulnerability signature specificity, complexity and false positives remains an open question. Wang et al.’s Shield is by far the best-known vulnerability blocking system, and it focuses on an end-host implementation precisely to better manage some of these tradeoffs [44]. We do not consider this approach further in this paper, but we believe it can be a valuable complement to the automated signature extraction alternative we explore. The earliest automation for signature extraction is due to Kephart and Arnold [15]. Their system, used commercially by IBM, allows viruses to infect known “decoy” programs in a controlled environment, extracts the infected (i.e., modified) regions of the decoys and then uses a variety of heuristics to identify invariant code strings across infection instances. Among this set of candidates, an “optimal” signature is determined by estimating the false positive probability against a measured corpus of
n-grams found in normal computer programs. This approach is extremely powerful, but assumes the presence of a known instance of a virus and a controlled environment to monitor. The former limitation is partially addressed by the Honeycomb system of Kreibich and Crowcroft [17]. Honeycomb is a host-based intrusion detection system that automatically generates signatures by looking for longest common subsequences among sets of strings found in message exchanges. This basic procedure is similar to our own, but there are also important structural and algorithmic differences between our two approaches, the most important of which is scale. Honeycomb is designed for a host-based context with orders of magnitude less processing required. To put this in context, our Earlybird system currently processes more traffic in one second than the prototype Honeycomb observed in 24 hours. However, one clear advantage offered by the host context is its natural imperviousness to network evasion techniques [30]. We discuss this issue further in Section 7. Finally, over the last two years of Earlybird’s development [34, 35, 37], the clearest parallels can be drawn to Kim and Karp’s contemporaneously-developed Autograph system [16]. Like Earlybird, Autograph also uses network-level data to infer worm signatures and both systems employ Rabin fingerprints to index counters of content substrings and use white-lists to set aside well-known false positives. However, there are several important differences as well. First, Autograph relies on a prefiltering step that identifies flows with suspicious scanning activity (particularly the number of unsuccessful TCP connection attempts) before calculating content prevalence. By contrast, Earlybird measures the prevalence of all content entering the network and only then considers the addressing activity. This difference means that Autograph cannot detect large classes of worms that Earlybird can – including almost all e-mail borne worms, such as MyDoom, UDP-based worms such as Slammer, spoofed source worms, or worms carried via IM or P2P clients. Second, Autograph has extensive support for distributed deployments – involving active cooperation between multiple sensors. By contrast, Earlybird has focused almost entirely on the algorithmics required to support a robust and scalable wire-speed implementation in a single sensor and only supports distribution through a centralized aggregator. Third, Earlybird is an on-line system that has been in near-production use for eight months and handles over 200 megabits of live traffic, while, as described, Autograph is an off-line system that has only been evaluated using traces. Finally, there are many differences in the details of the algorithms used (e.g. Autograph breaks content into non-overlapping variable-length chunks while Earlybird manages overlapping fixed-length content strings over each byte offset) although it is not currently clear what the impact of these differences is. 2.3 Containment Containment refers to the mechanism used to slow or stop the spread of an active worm. There are three containment mechanisms in use today: host quarantine, string-matching and connection throttling. Host quarantine is simply the act of preventing an infected host from communicating with other hosts – typically implemented via IP-level access control lists on routers or firewalls.
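To make the substring indexing mentioned in the Autograph comparison above concrete before turning to the remaining containment mechanisms, the following is a minimal Python sketch (ours, not code from Earlybird or Autograph) that counts overlapping fixed-length substrings of a payload with a Rabin-Karp style polynomial rolling hash, used here as a simplified stand-in for true Rabin fingerprints; the window length, base, modulus and the CodeRed-like toy payload are illustrative choices, not parameters taken from the paper.

# Minimal sketch (not code from Earlybird or Autograph): count overlapping
# fixed-length substrings of a payload with a Rabin-Karp style rolling hash,
# a simplified stand-in for true Rabin fingerprints.
from collections import defaultdict

WINDOW = 40            # substring length in bytes; illustrative, not a value from the paper
BASE = 257             # polynomial base for the rolling hash
MOD = (1 << 61) - 1    # large prime modulus to keep collisions rare

def rolling_fingerprints(payload: bytes):
    """Yield (offset, fingerprint) for every overlapping WINDOW-byte substring."""
    n = len(payload)
    if n < WINDOW:
        return
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that slides out of the window
    h = 0
    for b in payload[:WINDOW]:         # hash of the first window
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(1, n - WINDOW + 1):
        out_b, in_b = payload[i - 1], payload[i + WINDOW - 1]
        h = ((h - out_b * top) * BASE + in_b) % MOD   # slide the window by one byte
        yield i, h

prevalence = defaultdict(int)          # fingerprint -> number of occurrences seen

def count_substrings(payload: bytes):
    for _, fp in rolling_fingerprints(payload):
        prevalence[fp] += 1

count_substrings(b"GET /default.ida?" + b"N" * 200)   # toy CodeRed-like payload

Because the hash is updated incrementally, every byte offset is fingerprinted at constant cost per byte, which is what makes tracking all overlapping substrings plausible at line rate.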
String-matching containment – typified by signature-based network intrusion prevention systems (NIPS) – matches network traffic against particular strings, or signatures, of known worms and can then drop associated packets. To enable high-bandwidth deployments, several hardware vendors are now producing high-speed string matching and regular expression checking chips for worm and virus filtering. Lockwood et al. describe an FPGA-based research prototype programmed for this application [19]. Finally, a different strategy, proposed by Twycross and Williamson [43], is to proactively limit the rate of all outgoing connections made by a machine and thereby slow – but not stop – the spread of any worm. Their approach was proposed in a host context, but there is no reason such connection throttling cannot be applied at the network level as well. In this paper, we assume the availability of string-matching containment (perhaps in concert with throttling) and our Earlybird prototype generates signatures for a Snort in-line intrusion detection system – blocking all packets containing discovered worm signatures. 3 Defining Worm Behavior Network worms, due to their distinct purpose, tend to behave quite differently from the popular client-server and peer-to-peer applications deployed on today’s networks. In this section we explore these key behaviors in more detail and how they can be exploited to detect and characterize network worms. 3.1 Content invariance In all existing worms of which we are aware, some or all of the worm program is invariant across every copy. Typically, the entire worm program is identical across every host it infects. However, some worms make use of limited polymorphism – by encrypting each worm instance independently and/or randomizing filler text. In these cases, much of the worm body is variable, but key portions are still invariant (e.g., the decryption routine). For the purposes of this paper, we assume that a worm has some amount of invariant content or has relatively few variants. We discuss violations of this assumption in Section 7.
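The content-invariance assumption of Section 3.1 can be illustrated with a small example. The sketch below is ours, not the paper's: given several captured instances of the same (possibly polymorphic) worm, only substrings drawn from the invariant portion survive the intersection; the substring length k and the toy payloads are hypothetical.

# A toy illustration (ours, not from the paper) of the content-invariance
# assumption: substrings present in every captured instance of a worm.
def invariant_substrings(instances, k=16):
    """Return the k-byte substrings common to every captured instance."""
    def kgrams(data):
        return {data[i:i + k] for i in range(len(data) - k + 1)}
    common = kgrams(instances[0])
    for inst in instances[1:]:
        common &= kgrams(inst)         # keep only substrings seen in every copy
    return common

# Hypothetical polymorphic instances that share only a decoder stub and part
# of the delivery header; filler and "encrypted" bodies differ per instance.
inst1 = b"filler-AAAAAA" + b"\x90\x90DECODER-STUB\x90\x90" + b"ciphertext-one"
inst2 = b"junk-BBBBBBBB" + b"\x90\x90DECODER-STUB\x90\x90" + b"ciphertext-two"
print(sorted(invariant_substrings([inst1, inst2], k=8)))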
3.2 Content prevalence
Since worms are designed foremost to spread, the invariant portion of a worm’s content will appear frequently on the network as it spreads or attempts to spread. Conversely, content which is not prevalent will not represent the invariant portion of a worm and therefore is not a useful candidate for constructing signatures.

3.3 Address dispersion
For the same reasons, the number of distinct hosts infected by a worm will grow over time. Consequently, packets containing a live worm will tend to reflect a variety of different source and destination addresses. Moreover, during a major outbreak, the number of such addresses can grow extremely quickly. Finally, it is reasonable to expect that the distribution of these addresses will be far more uniform than typical network traffic, which can have significant clustering [9].1 In this paper we only take advantage of the first of these three observations, but we believe there is potential value in considering all of them.

1 As described earlier, the presence of “dark” IP addresses can also provide qualitatively strong evidence of worm-like behavior.

4 Finding worm signatures
From these assumptions, we can conclude that network worms must generate significant traffic to spread and that this traffic will contain common substrings and will be directed between a variety of different sources and destinations. While it is not yet clear that this characterization is exclusively caused by worms, for now we will assume that identifying this traffic pattern is sufficient for detecting worms. We examine the issue of false positives later in the paper. In principle, detecting this traffic pattern is relatively straightforward. Figure 1 shows an idealized algorithm that achieves this goal. For each network packet, the content is extracted and all substrings processed. Each substring is indexed into a prevalence table that increments a count field for a given substring each time it is found. In effect, this table implements a histogram of all observed substrings. To maintain a count of unique source and destination addresses, each table entry also maintains two lists, containing IP addresses, that are searched and potentially updated each time a substring count is incremented. Sorting this table on the substring count and the size of the address lists will produce the set of likely worm traffic. Better still, the table entries meeting these worm behavior criteria are exactly those containing the invariant substrings of the worm. It is these substrings that can be used as signatures to filter the worm out of legitimate network traffic. We call this approach content sifting because it effectively implements a high-pass filter on the contents of network traffic. Network content which is not prevalent or not widely dispersed is sifted out, leaving only the worm-like content.

ProcessTraffic(payload, srcIP, dstIP)
    prevalence[payload]++
    Insert(srcIP, dispersion[payload].sources)
    Insert(dstIP, dispersion[payload].dests)
    if (prevalence[payload] > PrevalenceTh
        and size(dispersion[payload].sources) > SrcDispTh
        and size(dispersion[payload].dests) > DstDispTh)
        if (payload in knownSignatures)
            return
        endif
        Insert(payload, knownSignatures)
        NewSignatureAlarm(payload)
    endif

Figure 1: The idealized content sifting algorithm detects all packet contents that are seen often enough and are coming from enough sources and going to enough destinations. The value of the detection thresholds and the time window over which each table is used are both parameters of the algorithm.
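For reference, the pseudocode of Figure 1 can be rendered as a small, runnable Python sketch. This is an illustration rather than the Earlybird implementation: it indexes whole payloads (as Figure 1 does) rather than all substrings, keeps exact per-payload counts and address sets in memory, and uses placeholder thresholds, since the paper treats the thresholds and time window as parameters.

from collections import defaultdict

# Placeholder thresholds; the paper leaves the thresholds and the time window
# as parameters of the algorithm.
PREVALENCE_TH = 3
SRC_DISP_TH = 2
DST_DISP_TH = 2

prevalence = defaultdict(int)                      # content -> number of times seen
dispersion = defaultdict(lambda: (set(), set()))   # content -> (source IPs, destination IPs)
known_signatures = set()

def new_signature_alarm(payload):
    print("possible worm signature:", payload[:32])

def process_traffic(payload, src_ip, dst_ip):
    prevalence[payload] += 1
    sources, dests = dispersion[payload]
    sources.add(src_ip)
    dests.add(dst_ip)
    if (prevalence[payload] > PREVALENCE_TH
            and len(sources) > SRC_DISP_TH
            and len(dests) > DST_DISP_TH):
        if payload in known_signatures:
            return
        known_signatures.add(payload)
        new_signature_alarm(payload)

# Toy example: the same payload repeated from many sources to many destinations
# trips all three thresholds and raises the alarm exactly once.
for i in range(5):
    process_traffic(b"\x90\x90exploit-bytes", "10.0.0.%d" % i, "192.168.1.%d" % i)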
Figure 2: Multi-stage Filters. A piece of content is hashed using hash function h1 into a Stage 1 table, h2 into a Stage 2 table, etc., and each table entry contains a counter that is incremented. If all the hashed counters are above the prevalence threshold, then the content string is saved for address dispersion measurements. In previous work we have shown that the probability of an approximation error decreases exponentially with the number of stages and consequently is extremely small in practice [10].

However, while content sifting can correctly identify worm signatures, the basic algorithm we have described is far too inefficient to be practical. In the next section we describe algorithms for approximating correctness in exchange for efficiency and practicality.

5 Practical Content Sifting
For automated signature extraction to scale to high-speed links, the algorithms must have small processing requirements (ideally well-suited to parallelization) and small memory requirements. Finally, to allow arbitrary deployment strategies, the algorithm should not depend on having a symmetric vantage point in the network. To satisfy these requirements, we now describe scalable and accurate algorithms for estimating content prevalence and address dispersion, and techniques for managing CPU overload through smooth tradeoffs between detection time and overhead. For simplicity, in this section we describe our algorithms at packet (and not flow) granularity.

5.1 Estimating content prevalence
Identifying common content involves finding the packet payloads that appear at least x times among the N packets sent during a given interval. However, a table indexed