POMP:Protocol Oblivious SDN Programming with Automatic Multi-Table Pipelining Chunhui He Xinyu Feng School of Computer Science and Technology State Key Laboratory for Novel Software Technology University of Science and Technology of China Nanjing University hchunhuiemail.ustc.edu.cn xyfengenju.edu.cn Abstract-SDN programming has been challenging because 2)to design flow table layout and forwarding pipelines for programmers have to not only implement the control logic,but new generation SDN to achieve compact flow tables also handle low-level details such as the generation of flow tables and efficient forwarding.As we explain in Sec.III, and the communication between the controller and switches.New generation of SDN with protocol oblivious forwarding and multi- different forwarding pipelines may generate flow tables table pipelining introduces even more low-level details to consider with significantly different sizes We propose POMP,the first SDN programming environment 3)to implement parsers for packets with new header fields supporting both protocol oblivious forwarding and automatic to fit in the underlying protocol oblivious forwarding multi-table pipelining.POMP applies the static taint analysis mechanism.For instance,one needs to generate the technique to automatically infer compact and efficient multi-table pipelines from a data-plane agnostic network policy written by (offset,length)tuples required by POF [2]to the programmer.The runtime system tracks the execution of the locate packet fields. network policy,and automatically generates table entries.POMP There have been many languages proposed to simplify also introduces a novel notion of dependent labels in the taint analysis,which,combined with the runtime information of the SDN programming,but none of them address all the prob- network policy,can further reduce the number of table entries. lems above.Languages such as NetKAT [6],NetCore [7] Like P4,POMP supports protocol-oblivious programming by and Maple [5]try to address the first problem,but they providing a network protocol specification language.Parsers of are designed for Openflow and do not support multi-table packets can be automatically generated based on the protocol pipelining and protocol-oblivious forwarding.P4 [1]provides specification.POMP supports two main emerging SDN platforms, POF and P4,therefore network policies written in POMP are a header specification language for protocol independence. portable over any switches supporting POF or P4. Programmers can write a specification of the header format, from which P4 generate parsers automatically.But P4 is I.INTRODUCTION more of a low-level switch configuration language,and is not Software-Defined Networking (SDN)is a network archi- suitable for high-level controller programming. tecture that decouples control and forwarding.Distributed We propose POMP,a high-level programming environ- switches are managed by a logically-centralized controller, ment to simplify SDN programming.Following the idea of which can be implemented by software through SDN pro- Maple [5],POMP provides a set of APIs and a runtime system. gramming. Programmers can use the APIs to describe the network policy OpenFlow is the first SDN standard,but it only supports a by writing an algorithmic sequential program with a high-level predefined fixed set of networking protocols,and allows only a centralized view of the network environment.This high-level fixed single flow table.New generation of SDN platforms [1], network policy is data-plane agnostic.It is the runtime system [2],[3],[4]offer two new flexible features at the data-plane, that tracks the execution of the network policy and generates i.e.multi-table pipelining and protocol-oblivious forwarding. fow table entries automatically.POMP also makes significant The former allows multiple flow tables on switches to form a extensions to the above ideas in Maple to support automatic forwarding pipeline,which can be customized by users.The multi-table pipelining and protocol oblivious programming. latter supports customized packets for new protocols. As far as we know,POMP is the first high-level programming SDN programming has been challenging because the pro- environment that solves all the aforementioned problems.Our grammer has to not only implement the network policy (i.e. work on POMP makes the following new contributions: the control logic),but also handle low-level data-plane details. Specifically,one faces the following challenges: We apply the static taint analysis technique [8]to anal- yse the network policy's dependence over the fields of 1)to manually translate high-level network policies to flow table entries.It is error-prone,and makes network packets.Based on the fine-grained dependence relation we automatically generate compact and efficient multi- policies hard to write and read [5]. table pipelines.The analysis and the pipeline generation Corresponding author:Xinyu Feng.This work is supported in part by grants from National Natural Science Foundation of China(NSFC)under Grant Nos. IThe name POMP highlights the two key features of our work,i.e.protocol- 61379039and61632005. oblivious programming and multi-table pipelining
POMP: Protocol Oblivious SDN Programming with Automatic Multi-Table Pipelining Chunhui He School of Computer Science and Technology University of Science and Technology of China hchunhui@mail.ustc.edu.cn Xinyu Feng State Key Laboratory for Novel Software Technology Nanjing University xyfeng@nju.edu.cn Abstract—SDN programming has been challenging because programmers have to not only implement the control logic, but also handle low-level details such as the generation of flow tables and the communication between the controller and switches. New generation of SDN with protocol oblivious forwarding and multitable pipelining introduces even more low-level details to consider. We propose POMP, the first SDN programming environment supporting both protocol oblivious forwarding and automatic multi-table pipelining. POMP applies the static taint analysis technique to automatically infer compact and efficient multi-table pipelines from a data-plane agnostic network policy written by the programmer. The runtime system tracks the execution of the network policy, and automatically generates table entries. POMP also introduces a novel notion of dependent labels in the taint analysis, which, combined with the runtime information of the network policy, can further reduce the number of table entries. Like P4, POMP supports protocol-oblivious programming by providing a network protocol specification language. Parsers of packets can be automatically generated based on the protocol specification. POMP supports two main emerging SDN platforms, POF and P4, therefore network policies written in POMP are portable over any switches supporting POF or P4. I. INTRODUCTION Software-Defined Networking (SDN) is a network architecture that decouples control and forwarding. Distributed switches are managed by a logically-centralized controller, which can be implemented by software through SDN programming. OpenFlow is the first SDN standard, but it only supports a predefined fixed set of networking protocols, and allows only a fixed single flow table. New generation of SDN platforms [1], [2], [3], [4] offer two new flexible features at the data-plane, i.e. multi-table pipelining and protocol-oblivious forwarding. The former allows multiple flow tables on switches to form a forwarding pipeline, which can be customized by users. The latter supports customized packets for new protocols. SDN programming has been challenging because the programmer has to not only implement the network policy (i.e. the control logic), but also handle low-level data-plane details. Specifically, one faces the following challenges: 1) to manually translate high-level network policies to flow table entries. It is error-prone, and makes network policies hard to write and read [5]. Corresponding author: Xinyu Feng. This work is supported in part by grants from National Natural Science Foundation of China (NSFC) under Grant Nos. 61379039 and 61632005. 2) to design flow table layout and forwarding pipelines for new generation SDN to achieve compact flow tables and efficient forwarding. As we explain in Sec. III, different forwarding pipelines may generate flow tables with significantly different sizes. 3) to implement parsers for packets with new header fields to fit in the underlying protocol oblivious forwarding mechanism. For instance, one needs to generate the (offset, length) tuples required by POF [2] to locate packet fields. There have been many languages proposed to simplify SDN programming, but none of them address all the problems above. Languages such as NetKAT [6], NetCore [7] and Maple [5] try to address the first problem, but they are designed for Openflow and do not support multi-table pipelining and protocol-oblivious forwarding. P4 [1] provides a header specification language for protocol independence. Programmers can write a specification of the header format, from which P4 generate parsers automatically. But P4 is more of a low-level switch configuration language, and is not suitable for high-level controller programming. We propose POMP, a high-level programming environment to simplify SDN programming. Following the idea of Maple [5], POMP provides a set of APIs and a runtime system. Programmers can use the APIs to describe the network policy by writing an algorithmic sequential program with a high-level centralized view of the network environment. This high-level network policy is data-plane agnostic. It is the runtime system that tracks the execution of the network policy and generates flow table entries automatically. POMP also makes significant extensions to the above ideas in Maple to support automatic multi-table pipelining and protocol oblivious programming.1 As far as we know, POMP is the first high-level programming environment that solves all the aforementioned problems. Our work on POMP makes the following new contributions: • We apply the static taint analysis technique [8] to analyse the network policy’s dependence over the fields of packets. Based on the fine-grained dependence relation we automatically generate compact and efficient multitable pipelines. The analysis and the pipeline generation 1The name POMP highlights the two key features of our work, i.e. protocoloblivious programming and multi-table pipelining
From programmers'point of view,f(pkt,env)is in- header spec. network policy f(pkt,env) Taint Analysis voked upon the arrival of every packet pkt in each data- Parser Gen API Calls plane switch.f also takes a centralized view of the network environment env.It calculates the routing path and returns Parser Runtime System Pipeline the port number to forward the packet.The return value can packet-In flow entries also be 0 or negative,which means to drop the packet or P4 Compiler Switch Abstraction Layer to broadcast it,respectively.f may also call POMP APIs to modify the global environment or to modify fields of packets. P4 protocol. POF protocol What really happens is that the controller (the runtime P4 Switch POF Switch system)invokes f(pkt,env)when it receives a PacketIn Fig.1.Structure of POMP message,and passes the packet to f.The runtime then monitors the execution of f and generates an execution trace, which records the sequence of API calls made by f to read is done statically before the deployment of the controller, or update packet header fields or the environment data.Based therefore it does not introduce any runtime overhead. on the dependence between actions and values of packet fields We extend the taint analysis with a novel notion of reflected in the trace,the runtime automatically generates flow dependent labels,which further refines the dependence table entries and install them on switches. relation and allows it to be conditional upon the if So far the ideas all come from Maple,but Maple does not statement branches taken at runtime.Combining the support multi-table pipelining and protocol-oblivious forward- dependent labels with the runtime execution traces of the ing.As we demonstrate in Sec.III,the dependence derived network policy,the runtime system can further reduce the by Maple from the execution trace is way too imprecise and number of flow table entries.Our experiments show that, leads to flow tables with O(n2)forwarding rules for learning for learning switches,POMP generates flow tables with switches with n hosts,even though O(n)rules are sufficient. up to 47x fewer table entries and being 137x faster than POMP introduces a static taint analysis phase to analyse the the traditional single flow table (as generated in Maple). code of f,from which it infers the fine-grained dependence Inspired by P4,POMP incorporates a packet header spec- between actions and packet fields.Based on the fine-grained ification language,with which programmers can specify dependence,it automatically generates a multi-table pipeline. the format of packet headers.Based on the specification The runtime takes the multi-table pipeline into account and we can automatically generate parsers for packets.This generates table entries accordingly. makes POMP a protocol oblivious programming environ- To support protocol-oblivious programming,we follow the ment.Our runtime system can support two main emerging ideas in P4 and ask programmers to provide a header format SDN platforms:POF and P4.It makes network policies specification written in a specification language.Then the written in our language portable over any SDN chipsets parser generator automatically generates parsers for packets. and software switches that support POF or P4. POMP supports both POF and P4.For POF,the runtime In the remaining part of the paper,we give a system queries the parser to map the string field names to the overview and introduce a running example in Sec.II.We (offset,length)tuples.For P4,the generated parser present the taint analysis and the pipeline generation in and forwarding pipeline are fed into the P4 compiler to configure the switch.We also introduce a switch abstraction Sec.III,and propose the dependent labels and the runtime generation of table entries in Sec.IV.We then introduce the layer,which accepts P4 or POF messages separately and then packet parsing for protocol oblivious programming in Sec.V. converts them into a uniform format for the runtime system We show evaluation results in Sec.VI.Finally we discuss B.Learning Switches:a running example related work in Sec.VII and conclude in Sec.VIII. We use learning switches to demonstrate how the system II.OVERVIEW works.It is also used as a running example to illustrate the key ideas in the following sections.As shown in Fig.1,our system In this section we first give an overview of the system takes as inputs a protocol specification and a network policy structure of POMP.Then we introduce learning switches as f,which are shown in Figs.2 and 3 respectively.(Ignore the a running example used in the following sections. comments in Fig.3,which shows the result of taint analysis and is explained in Sec.II). A.System structure The protocol specification describes the eth and ipv4 Figure 1 shows the system structure.Following Maple,protocols.Each is defined by a header block.For each POMP allows programmers to implement the network policy header block,there are the fields section that describes as an algorithmic sequential program with a centralized view the fields of the protocol,and the next section that describes of the network environment.The network policy is imple-the inner protocols.For example,the eth protocol has three mented as a function f(pkt,env).It is a C program with fields:48 bits dst,48 bits src and 16 bits type.Depending API calls of the POMP library. on the values of type,the protocol follows the eth is ipv4
Fig. 1. Structure of POMP is done statically before the deployment of the controller, therefore it does not introduce any runtime overhead. • We extend the taint analysis with a novel notion of dependent labels, which further refines the dependence relation and allows it to be conditional upon the if statement branches taken at runtime. Combining the dependent labels with the runtime execution traces of the network policy, the runtime system can further reduce the number of flow table entries. Our experiments show that, for learning switches, POMP generates flow tables with up to 47x fewer table entries and being 137x faster than the traditional single flow table (as generated in Maple). • Inspired by P4, POMP incorporates a packet header specification language, with which programmers can specify the format of packet headers. Based on the specification we can automatically generate parsers for packets. This makes POMP a protocol oblivious programming environment. Our runtime system can support two main emerging SDN platforms: POF and P4. It makes network policies written in our language portable over any SDN chipsets and software switches that support POF or P4. In the remaining part of the paper, we give a system overview and introduce a running example in Sec. II. We present the taint analysis and the pipeline generation in Sec. III, and propose the dependent labels and the runtime generation of table entries in Sec. IV. We then introduce the packet parsing for protocol oblivious programming in Sec. V. We show evaluation results in Sec. VI. Finally we discuss related work in Sec. VII and conclude in Sec. VIII. II. OVERVIEW In this section we first give an overview of the system structure of POMP. Then we introduce learning switches as a running example used in the following sections. A. System structure Figure 1 shows the system structure. Following Maple, POMP allows programmers to implement the network policy as an algorithmic sequential program with a centralized view of the network environment. The network policy is implemented as a function f(pkt, env). It is a C program with API calls of the POMP library. From programmers’ point of view, f(pkt, env) is invoked upon the arrival of every packet pkt in each dataplane switch. f also takes a centralized view of the network environment env. It calculates the routing path and returns the port number to forward the packet. The return value can also be 0 or negative, which means to drop the packet or to broadcast it, respectively. f may also call POMP APIs to modify the global environment or to modify fields of packets. What really happens is that the controller (the runtime system) invokes f(pkt, env) when it receives a PacketIn message, and passes the packet to f. The runtime then monitors the execution of f and generates an execution trace, which records the sequence of API calls made by f to read or update packet header fields or the environment data. Based on the dependence between actions and values of packet fields reflected in the trace, the runtime automatically generates flow table entries and install them on switches. So far the ideas all come from Maple, but Maple does not support multi-table pipelining and protocol-oblivious forwarding. As we demonstrate in Sec. III, the dependence derived by Maple from the execution trace is way too imprecise and leads to flow tables with O(n 2 ) forwarding rules for learning switches with n hosts, even though O(n) rules are sufficient. POMP introduces a static taint analysis phase to analyse the code of f, from which it infers the fine-grained dependence between actions and packet fields. Based on the fine-grained dependence, it automatically generates a multi-table pipeline. The runtime takes the multi-table pipeline into account and generates table entries accordingly. To support protocol-oblivious programming, we follow the ideas in P4 and ask programmers to provide a header format specification written in a specification language. Then the parser generator automatically generates parsers for packets. POMP supports both POF and P4. For POF, the runtime queries the parser to map the string field names to the (offset, length) tuples. For P4, the generated parser and forwarding pipeline are fed into the P4 compiler to configure the switch. We also introduce a switch abstraction layer, which accepts P4 or POF messages separately and then converts them into a uniform format for the runtime system. B. Learning Switches: a running example We use learning switches to demonstrate how the system works. It is also used as a running example to illustrate the key ideas in the following sections. As shown in Fig. 1, our system takes as inputs a protocol specification and a network policy f, which are shown in Figs. 2 and 3 respectively. (Ignore the comments in Fig. 3, which shows the result of taint analysis and is explained in Sec. III). The protocol specification describes the eth and ipv4 protocols. Each is defined by a header block. For each header block, there are the fields section that describes the fields of the protocol, and the next section that describes the inner protocols. For example, the eth protocol has three fields: 48 bits dst, 48 bits src and 16 bits type. Depending on the values of type, the protocol follows the eth is ipv4
header eth fields header ipv4 "inport")combination,which leads to a packet-in message dst 48; fields sending to the controller,so that the controller can update src 48; type 16; tt1:8; the environment mac2port (line 6 in Fig.3). But still we do not need to exhaustively enumerate all the next select (type)( combination of“src”and“dst”(leading to O(n2)forwarding case 0x0800:ipv4; rules).Learning the topology needs to match "inport"and case 0x86dd:ipv6; "“src”but not“dst”,and the forwarding needs to match“dst" (and“ttl”for IPv44 packets)but not“src”.Each of the two start eth; functionalities requires O(n)rules respectively.Therefore we Fig.2.Header specification could greatly reduce the number of forwarding rules (i.e. flow table entries)if we can have multiple flow tables,each corresponds to one independent functionality only or ipv6.The start clause at the end indicates the name of the outmost protocol. To achieve this,we not only need the support of multi-table The network policy f is written in C with POMP primitives. forwarding pipelines on switches (which is available in the As shown in Fig.3,the f for learning switches first reads emerging SDN platforms),but also need to have more fine- the ingress port,the source address,the destination address grained dependence relation between the functionality and the packet header fields.Since Maple only tracks the execution and the type of the packet.It learns the source host,and remembers the ingress port in mac2port (line 6).Then it trace of f,it lets an action (e.g.the return of the value r looks up mac2port to find the port for destination (line 7) at line 23)depend on all earlier actions that read packets or and set the return value to the port number (line 9).If the environments (e.g.the read of "src"at line 4).This is overly result is 0,it means the port is unknown,so we set the return conservative and leads to imprecise dependence relation. value to negative (line 11)to broadcast the packet. As one of the major contributions of the work,we use For IPv4 packets,we also decrease thet(line 17)if it static taint analysis [to infer more fine-grained dependence is greater than 0.Otherwise we drop the packet by setting the relation.The analysis takes the function f as input and tracks return value to 0. the information flow from packets and environments to the Data-plane agnostic and protocol-oblivious network policy. actions that perform the functionalities (e.g.mod env (at Note that f does not describe forwarding pipelines and for- line 6),without actually running f.The result can be used to warding rules,which are automatically generated.Also it uses design multi-table forwarding pipelines. strings as field names to access packet header fields (e.g., read packet (pkt,"eth.dst")).Parsing is done au- A.Taint analysis tomatically to map field names to their offsets in packets. Taint analysis computes the information flow from sources III.PIPELINE GENERATION to sinks.In our settings,the sources are the ingress port, Following the idea of Maple by recording the execution packets'header fields,and environments.The sinks are the traces of f,we can derive the actions and their dependence operations that output information,including the return of over packet header fields,and generate a single flow table. routing decisions,and the update of packets and environments. Fig.4 shows the flow table for learning switches. Since all accesses of packets and environments must be made The match fields of the flow table come from through POMP APIs,they can be easily recognized in the read inport (and read packet (in the network code.In Table I,we list the APIs that obtain information policy f.There are two possible actions of the flow table.from the sources.We also assign labels to identify each The first is to modify the ttl field.and then to forward the source.Note that for test_equal(pkt,fld,v)(which packet.The second is to drop the packet. tests the equality between fld and v),we label the source However,such a flow table may cause unnecessarily large with test(fld)instead of fld.The latter means we need number of table entries.To see the problem,suppose there the exact value of the field fld,while the former means are n hosts in the network.When a host h;sends a packet we only care about certain property of fld.They indicate to hj,the controller has to install a flow entry that matches different number of entries in the flow table.Distinguish- “inport'”(represented as“inp”in the table),“src”and“dst”.ing them allows us to generate more compact forwarding This results in O(n)entries of the flow table. pipelines,which we explain below.The sinks are the API calls One may have noticed that,since the switch forwards of mod packet (pkt,fld,v)and mod_env(env, packets only based on the destination address,there is no var,key,v),and also the command return r. need to enumerate all combinations of"src"and "dst".An The process of the taint analysis propagates labels from apparent optimization might be omitting the match fields"src" sources to sinks.Our algorithm is mostly standard [8]except and"inport"by filling in*"in the table.But this is incorrect the extension with dependent labels.Here we only introduce because we do need to match the exact values of"src"and the rough idea of taint analysis based on the learning switches "inport"-we rely on a mismatch to detect a new ("src",example.Dependent labels are introduced later in Sec.IV
header eth { fields { dst : 48; src : 48; type : 16; } next select (type) { case 0x0800: ipv4; case 0x86dd: ipv6; ... } } header ipv4 { fields { ... ttl : 8; ... } ... } start eth; Fig. 2. Header specification or ipv6. The start clause at the end indicates the name of the outmost protocol. The network policy f is written in C with POMP primitives. As shown in Fig. 3, the f for learning switches first reads the ingress port, the source address, the destination address and the type of the packet. It learns the source host, and remembers the ingress port in mac2port (line 6). Then it looks up mac2port to find the port for destination (line 7) and set the return value to the port number (line 9). If the result is 0, it means the port is unknown, so we set the return value to negative (line 11) to broadcast the packet. For IPv4 packets, we also decrease the ttl (line 17) if it is greater than 0. Otherwise we drop the packet by setting the return value to 0. Data-plane agnostic and protocol-oblivious network policy. Note that f does not describe forwarding pipelines and forwarding rules, which are automatically generated. Also it uses strings as field names to access packet header fields (e.g., read_packet(pkt, "eth.dst")). Parsing is done automatically to map field names to their offsets in packets. III. PIPELINE GENERATION Following the idea of Maple by recording the execution traces of f, we can derive the actions and their dependence over packet header fields, and generate a single flow table. Fig. 4 shows the flow table for learning switches. The match fields of the flow table come from read_inport() and read_packet() in the network policy f. There are two possible actions of the flow table. The first is to modify the ttl field, and then to forward the packet. The second is to drop the packet. However, such a flow table may cause unnecessarily large number of table entries. To see the problem, suppose there are n hosts in the network. When a host hi sends a packet to hj , the controller has to install a flow entry that matches “inport” (represented as “in p” in the table), “src” and “dst”. This results in O(n 2 ) entries of the flow table. One may have noticed that, since the switch forwards packets only based on the destination address, there is no need to enumerate all combinations of “src” and “dst”. An apparent optimization might be omitting the match fields “src” and “inport” by filling in “*” in the table. But this is incorrect because we do need to match the exact values of “src” and “inport” — we rely on a mismatch to detect a new (“src”, “inport”) combination, which leads to a packet-in message sending to the controller, so that the controller can update the environment mac2port (line 6 in Fig. 3). But still we do not need to exhaustively enumerate all the combination of “src” and “dst” (leading to O(n 2 ) forwarding rules). Learning the topology needs to match “inport” and “src” but not “dst”, and the forwarding needs to match “dst” (and “ttl” for IPv4 packets) but not “src”. Each of the two functionalities requires O(n) rules respectively. Therefore we could greatly reduce the number of forwarding rules (i.e. flow table entries) if we can have multiple flow tables, each corresponds to one independent functionality only. To achieve this, we not only need the support of multi-table forwarding pipelines on switches (which is available in the emerging SDN platforms), but also need to have more finegrained dependence relation between the functionality and the packet header fields. Since Maple only tracks the execution trace of f, it lets an action (e.g. the return of the value r at line 23) depend on all earlier actions that read packets or environments (e.g. the read of “src” at line 4). This is overly conservative and leads to imprecise dependence relation. As one of the major contributions of the work, we use static taint analysis [8] to infer more fine-grained dependence relation. The analysis takes the function f as input and tracks the information flow from packets and environments to the actions that perform the functionalities (e.g. mod_env() at line 6), without actually running f. The result can be used to design multi-table forwarding pipelines. A. Taint analysis Taint analysis computes the information flow from sources to sinks. In our settings, the sources are the ingress port, packets’ header fields, and environments. The sinks are the operations that output information, including the return of routing decisions, and the update of packets and environments. Since all accesses of packets and environments must be made through POMP APIs, they can be easily recognized in the code. In Table I, we list the APIs that obtain information from the sources. We also assign labels to identify each source. Note that for test_equal(pkt, fld, v) (which tests the equality between fld and v), we label the source with test(fld) instead of fld. The latter means we need the exact value of the field fld, while the former means we only care about certain property of fld. They indicate different number of entries in the flow table. Distinguishing them allows us to generate more compact forwarding pipelines, which we explain below. The sinks are the API calls of mod_packet(pkt, fld, v) and mod_env(env, var, key, v), and also the command return r. The process of the taint analysis propagates labels from sources to sinks. Our algorithm is mostly standard [8] except the extension with dependent labels. Here we only introduce the rough idea of taint analysis based on the learning switches example. Dependent labels are introduced later in Sec. IV
if(pkt,env)【 2 inport read_inport (pkt); //inport<-{inport】 dst read_packet(pkt,"eth.dst"); //dst<-(dst】 src read packet (pkt,"eth.src"); /src <-(src) mod env(env,"mac2port",src,inport); /mod enve6 <-(src,inport) port read_env(env,"mac2port",dst); port <-(dst,env(mac2port)) if(port!=0){ /branche8 <-(dst,env(mac2port)) 9 上=port; /r <-(dst,env(mac2port)) else r =-inport; //r <-{dst,env(mac2port),inport) //r <-(dst,env(mac2port),inport) 14 if(test_equal(pkt,"eth.type",0x800))( /branche14 <-(test (type)) ttl read_packet (pkt,"eth.ipv4.ttl"); /ttl <-(test(type),ttl) if (!test_equal(pkt,"eth.ipv4.ttl",1)){//branche16 <-(test(ttl)) 17 mod_packet (pkt,"eth.ipv4.ttl",ttl -1);//mod_packete17 <-(test(type),ttl) 】else{ r=0: //r <(test (type),test(ttl)) 21 22 //r<-{dst, env(mac2port),inport,test (type),test(ttl)} return ri /return@23 <-(dst,env(mac2port),inport,test(type),test(ttl)) 24 Fig.3.Taint analysis for learning switches in POMP priority in_p src dst type ttl action merge the results.As an example,the label set for r at line 13 2 h2 hi 0x800 64 MOD FLD (tt1,63); is the union of the sets at lines 9 and 11 respectively.When OUT[1] 0x800 MOD FLD(ttl,63); analysing each branch,we also need to consider the label set OUT[2] of the boolean branch condition of the if statements.because h3 0x800 DROP of the implicit information flow caused by control dependence. Therefore the label set of variable r is the union of the label Fig.4.Flow table layout of learning switches sets of both inport and port. TABLE I B.Xgraphs and Pipeline Generation SOURCES AND SINKS Although the taint analysis generates dependence for each Source Label sink (at lines 6,17 and 23),we also need to maintain the read inport (pkt) inport control flow to decide the order of these actions in the read_packet(pkt,fld) fld forwarding pipeline.We let the taint analysis generate Xgraph, test_equal(pkt,fld,v) test(fld) read_env(env,var,key) env(var) an intermediate representation of both the control flow and (a)Sources and Labels the dependence (i.e.label sets).The Xgraph for our example Sink is shown in Fig.5.There are two types of nodes.The mod packet (pkt,fld,v) mod_env(env,var,key,v) square node represents an action corresponding to a sink. return r It records the name and the line number of the action (e.g. (b)Sinks mod_enve6 in the first node),and the corresponding label set (e.g.{src,inport in the first node).Each diamond node represents a branch.It records the label set of the branch Comments in Fig.3 demonstrates how we trace the infor- expression and the line number of the branch.The edges in mation flow.Every variable is assigned to a set of labels, the Xgraph represent the control flow. recording the information flowing into the variable Given the Xgraph,we can do a"node to node"translation For assignment statements,the label set of the variable on to generate the multi-stage forwarding pipeline.Fig.6 shows the left hand side is the union of the label sets of the variables the pipeline generated from Fig.5.For each square node in on the right and the set of sources accessed.For example, the Xgraph,we generate a flow table for the functionality.The the variable dst is assigned the label set fdst at line 7 match fields of the flow table are the packet header fields and (packet field prefix omitted in the label set to avoid clutter).the ingress port in the dependence set of the corresponding and port at line 4 is assigned {env(mac2port),dst}.Xgraph node.The action is translated from the corresponding which is obtained by fenv (mac2port)}Ulabel(dst). controller action in the Xgraph node,but not necessarily The label set of a sink is the union of the label set of its the same.For example,we translate the controller action arguments.The modenv (at line 6 is a sink.Its label set mod_env (in Fig.5 into a [GOTO]action,which does is the union of the label sets of src and inport. nothing and jumps to the next flow table on the pipeline.As For if statements,we analyse both branches and then we explained before,the only effect of this table is to generate
1 f(pkt, env) { 2 inport = read_inport(pkt); // inport <- {inport} 3 dst = read_packet(pkt, "eth.dst"); // dst <- {dst} 4 src = read_packet(pkt, "eth.src"); // src <- {src} 5 6 mod_env(env, "mac2port", src, inport); // mod_env@6 <- {src, inport} 7 port = read_env(env, "mac2port", dst); // port <- {dst, env(mac2port)} 8 if (port != 0) { // branch@8 <- {dst, env(mac2port)} 9 r = port; // r <- {dst, env(mac2port)} 10 } else { 11 r = -inport; // r <- {dst, env(mac2port), inport} 12 } 13 // r <- {dst, env(mac2port), inport} 14 if(test_equal(pkt, "eth.type", 0x800)) { // branch@14 <- {test(type)} 15 ttl = read_packet(pkt, "eth.ipv4.ttl"); // ttl <- {test(type), ttl} 16 if (!test_equal(pkt, "eth.ipv4.ttl", 1)) { // branch@16 <- {test(ttl)} 17 mod_packet(pkt, "eth.ipv4.ttl", ttl - 1); // mod_packet@17 <- {test(type), ttl} 18 } else { 19 r = 0; // r <- {test(type), test(ttl)} 20 } 21 } 22 // r <- {dst, env(mac2port), inport, test(type), test(ttl)} 23 return r; // return@23 <- {dst, env(mac2port), inport, test(type), test(ttl)} 24 } Fig. 3. Taint analysis for learning switches in POMP priority in p src dst type ttl action 1 2 h2 h1 0x800 64 MOD_FLD(ttl,63); OUT[1] 1 1 h1 h2 0x800 64 MOD_FLD(ttl,63); OUT[2] 1 1 h1 h3 0x800 1 DROP ... ... ... ... ... ... Fig. 4. Flow table layout of learning switches TABLE I SOURCES AND SINKS Source Label read_inport(pkt) inport read_packet(pkt, fld) fld test_equal(pkt, fld, v) test(fld) read_env(env, var, key) env(var) (a) Sources and Labels Sink mod_packet(pkt, fld, v) mod_env(env, var, key, v) return r (b) Sinks Comments in Fig. 3 demonstrates how we trace the information flow. Every variable is assigned to a set of labels, recording the information flowing into the variable. For assignment statements, the label set of the variable on the left hand side is the union of the label sets of the variables on the right and the set of sources accessed. For example, the variable dst is assigned the label set {dst} at line 7 (packet field prefix omitted in the label set to avoid clutter), and port at line 4 is assigned {env(mac2port), dst}, which is obtained by {env(mac2port)} ∪ label(dst). The label set of a sink is the union of the label set of its arguments. The mod_env() at line 6 is a sink. Its label set is the union of the label sets of src and inport. For if statements, we analyse both branches and then merge the results. As an example, the label set for r at line 13 is the union of the sets at lines 9 and 11 respectively. When analysing each branch, we also need to consider the label set of the boolean branch condition of the if statements, because of the implicit information flow caused by control dependence. Therefore the label set of variable r is the union of the label sets of both inport and port. B. Xgraphs and Pipeline Generation Although the taint analysis generates dependence for each sink (at lines 6, 17 and 23), we also need to maintain the control flow to decide the order of these actions in the forwarding pipeline. We let the taint analysis generate Xgraph, an intermediate representation of both the control flow and the dependence (i.e. label sets). The Xgraph for our example is shown in Fig. 5. There are two types of nodes. The square node represents an action corresponding to a sink. It records the name and the line number of the action (e.g. mod_env@6 in the first node), and the corresponding label set (e.g. {src, inport} in the first node). Each diamond node represents a branch. It records the label set of the branch expression and the line number of the branch. The edges in the Xgraph represent the control flow. Given the Xgraph, we can do a “node to node” translation to generate the multi-stage forwarding pipeline. Fig. 6 shows the pipeline generated from Fig. 5. For each square node in the Xgraph, we generate a flow table for the functionality. The match fields of the flow table are the packet header fields and the ingress port in the dependence set of the corresponding Xgraph node. The action is translated from the corresponding controller action in the Xgraph node, but not necessarily the same. For example, we translate the controller action mod_env() in Fig. 5 into a [GOTO] action, which does nothing and jumps to the next flow table on the pipeline. As we explained before, the only effect of this table is to generate
/learning topology * C.Optimizations depends:(src,inport) action:mod env6 The generated pipeline can be further optimized.From Fig.6 we can see the match fields in different flow tables have branch(test(type))@14 overlaps,which may cause these fields to be matched multiple times.If we can merge some of these tables into one,we can branch(test(ttl))@16 reduce the redundancy.Another advantage for the merge is to reduce the length of the pipeline,so that the execution time is /*decreasing TTL * reduced accordingly.However,we have to be careful with the depends:(test(type),ttl) merge to avoid unnecessary combination of different match action:mod_packet 17 fields.as shown in Fig.4. /forwarding packets * We do the optimization by first trying to merge the Xgraph depends:(dst,env(mac2port),inport,test(type),test(ttl)> nodes.We consider two situations.In the first situation,we action:return23 merge adjacent square nodes n and n2 if the dependence set Fig.5.The Xgraph for the example of one is a subset of the other,i.e.n.depends n2.depends or n2.depends En.depends,where L EL def Vh ELn.32 L2.E tid:0 fields:(src,inport) h l def h =la V(dh =test (f1d)Al=f1d). actions:{[GOTO]) As we explain before,since the label test (fld)refers to tid:1 certain properties of the value of the field only,while fld fields:(type) relies on the exact value,the former contains less information actions:([GOTO]) than the latter and thus we let test(fld)fld.The new node after the merge simply contains the union of the tid:2 fields:(ttl) dependence set and the union of the actions in n and n2. actions:{[GOTO]) In the second situation,we merge the diamond node b and the square nodes n and n2 in its branches,if each branch tid:3 has at most one square node(there could be an empty branch fields:(type,ttl) with no nodes,as shown in Fig.5),and the dependence sets of actions:{[MOD_FIELD(ttl);GOTO]) these two branches overlap.More formally,we do the merge if n.depends En2.depends or n2.depends En.depends, bd:4 fields:{dst,inport,type,ttl) (if there is an empty branch,we can view it as a dummy node actions:{[OUTPUT可,[DROP} with empty dependence set).Then we merge the three nodes (b,ni and n2)into n,where n.depends n1.depends U Fig.6.Multi-stage pipeline n2.depends Ub.depends.The action set of n is also a union of those of n and n2.For example,in Fig.5 the action of decreasing TTL at line 17 can share a flow table with the mismatches (and the corresponding PacketIn messages)for sibling empty branch,so that we can merge the square node fresh("src","inport")pairs and to let the controller to receive with its parent diamond node. the packet and execute the mod env ()action.The controller We repeat the above processes until there are no more action mod packet (is translated to the flow table ac- nodes that can be merged.For the Xgraph in Fig.5,we tion [MOD_FLD;GOTO],which modifies the corresponding eventually merge the two diamond nodes and the square node packet field and then jumps to the next table.The return for decreasing TTL.The new Xgraph has three square nodes action in the Xgraph is translated into a corresponding OUT only.The merge does not increase the number of table entries, or DROP flow table action. because it does not introduce new combination of fields. We also generate a flow table for each diamond node.The Then we translate the merged Xgraph into a forwarding match fields are the fields that the branch depends on.The pipeline.For the Xgraph in Fig.5,after merging we get an action of the table is [GOTO].The flow table can jump to optimized pipeline of flow tables shown in Fig.7.In addition different flow tables based on the values of the match fields. to the table layout,we also show some table entries to help Because we generate a separate flow table for each inde- understanding how this pipeline works(the generation of table pendent functionality,and our taint-analysis generates more entries is explained in the next section).Table 0 depends fine-grained dependence relation than Maple,we can avoid only on "inport"and "src",and the action is to jump to enumerating values of unnecessary combination of match Table 1 directly.Table I depends on the exact values of"ttl", fields.For learning switches,we can avoid the O(n2)table decrements it (if the value is greater than 1)and jumps to Table entries in Fig.4.The total number of possible entries of the 2,which then either drops the packet or send it out depending pipeline in Fig.6 is O(n). on whether“ttl”is1 or not
/* learning topology */ depends: {src, inport} action: mod_env@6 branch(test(type))@14 branch(test(ttl))@16 /* forwarding packets */ depends: {dst, env(mac2port), inport, test(type), test(ttl)} action: return@23 /* decreasing TTL */ depends: {test(type), ttl} action: mod_packet@17 Fig. 5. The Xgraph for the example tid: 0 fields: {src, inport} actions: {[GOTO]} tid: 1 fields: {type} actions: {[GOTO]} tid: 2 fields: {ttl} actions: {[GOTO]} tid: 4 fields: {dst, inport, type, ttl} actions: {[OUTPUT], [DROP]} tid: 3 fields: {type, ttl} actions: {[MOD_FIELD(ttl); GOTO]} Fig. 6. Multi-stage pipeline mismatches (and the corresponding PacketIn messages) for fresh (“src”, “inport”) pairs and to let the controller to receive the packet and execute the mod_env() action. The controller action mod_packet() is translated to the flow table action [MOD_FLD; GOTO], which modifies the corresponding packet field and then jumps to the next table. The return action in the Xgraph is translated into a corresponding OUT or DROP flow table action. We also generate a flow table for each diamond node. The match fields are the fields that the branch depends on. The action of the table is [GOTO]. The flow table can jump to different flow tables based on the values of the match fields. Because we generate a separate flow table for each independent functionality, and our taint-analysis generates more fine-grained dependence relation than Maple, we can avoid enumerating values of unnecessary combination of match fields. For learning switches, we can avoid the O(n 2 ) table entries in Fig. 4. The total number of possible entries of the pipeline in Fig. 6 is O(n). C. Optimizations The generated pipeline can be further optimized. From Fig. 6 we can see the match fields in different flow tables have overlaps, which may cause these fields to be matched multiple times. If we can merge some of these tables into one, we can reduce the redundancy. Another advantage for the merge is to reduce the length of the pipeline, so that the execution time is reduced accordingly. However, we have to be careful with the merge to avoid unnecessary combination of different match fields, as shown in Fig. 4. We do the optimization by first trying to merge the Xgraph nodes. We consider two situations. In the first situation, we merge adjacent square nodes n1 and n2 if the dependence set of one is a subset of the other, i.e. n1.depends v n2.depends or n2.depends v n1.depends, where L1 v L2 def = ∀l1 ∈ L1. ∃l2 ∈ L2. l1 v l2 l1 v l2 def = l1 = l2 ∨ (∃fld.l1 = test(fld) ∧ l2 = fld). As we explain before, since the label test(fld) refers to certain properties of the value of the field only, while fld relies on the exact value, the former contains less information than the latter and thus we let test(fld) v fld. The new node after the merge simply contains the union of the dependence set and the union of the actions in n1 and n2. In the second situation, we merge the diamond node b and the square nodes n1 and n2 in its branches, if each branch has at most one square node (there could be an empty branch with no nodes, as shown in Fig. 5), and the dependence sets of these two branches overlap. More formally, we do the merge if n1.depends v n2.depends or n2.depends v n1.depends, (if there is an empty branch, we can view it as a dummy node with empty dependence set). Then we merge the three nodes (b, n1 and n2) into n, where n.depends = n1.depends ∪ n2.depends ∪ b.depends. The action set of n is also a union of those of n1 and n2. For example, in Fig. 5 the action of decreasing TTL at line 17 can share a flow table with the sibling empty branch, so that we can merge the square node with its parent diamond node. We repeat the above processes until there are no more nodes that can be merged. For the Xgraph in Fig. 5, we eventually merge the two diamond nodes and the square node for decreasing TTL. The new Xgraph has three square nodes only. The merge does not increase the number of table entries, because it does not introduce new combination of fields. Then we translate the merged Xgraph into a forwarding pipeline. For the Xgraph in Fig. 5, after merging we get an optimized pipeline of flow tables shown in Fig. 7. In addition to the table layout, we also show some table entries to help understanding how this pipeline works (the generation of table entries is explained in the next section). Table 0 depends only on “inport” and “src”, and the action is to jump to Table 1 directly. Table 1 depends on the exact values of “ttl”, decrements it (if the value is greater than 1) and jumps to Table 2, which then either drops the packet or send it out depending on whether “ttl” is 1 or not