1 Determining Source-Destination Connectivity in Uncertain Networks:Modeling and Solutions Luoyi Fu,Xinzhe Fu,Zhiying Xu2,Qianyang Pengl,Xinbing Wang2,and Songwu Lu3 Dept.of Computer Science,Shanghai Jiao Tong University,China. Email:{yiluofu,fxz0114,az950512,xwang8 @sjtu.edu.cn. 2Dept.of Electrical Engineering,Shanghai Jiao Tong University,China.Email:{xuzhiying}@sjtu.edu.cn 3Dept.of Computer Science,University of California,Los Angeles,USA.Email:{slu}@cs.ucla.edu Abstract-Determination of source-destination connectivity in the networks investigated are modeled as deterministic graphs networks has long been a fundamental problem,where most 4],[5]with the source-destination connectivity problems existing works are based on deterministic graphs that overlook transformed to the corresponding graph reachability problems. the inherent uncertainty in network links.To overcome such limitation,this paper models the network as an uncertain graph However,as indeterminacy plagues in our life,deterministic where each edge e exists independently with some probability graph often fails to serve as a suitable model for networks p(e).The problem examined is that of determining whether nowadays.Usually,we do not have certain knowledge of a given pair of nodes,a source s and a destination t,are existence of network links.For instance,in social networks. connected by a path or separated by a cut.Assuming that during due to the variability of social ties [6],the relations between each determining process we are associated with an underlying graph,the existence of each edge can be unraveled through edge network nodes may not be known in advance:in communi- testing at a cost of c(e).Our goal is to find an optimal strategy cation systems,established connections between nodes may incurring the minimum expected testing cost with the expectation frequently fail because of the unreliability of data links [7]. taken over all possible underlying graphs that form a product [8].It has also been pointed out that more than 90%of distribution. network links are observed to be unreliable [10].Consequent- Formulating it into a combinatorial optimization problem, we first characterize the computational complexity of optimally ly,we may not obtain deterministic network configuration determining source-destination connectivity in uncertain graphs. from the predesigned topology;sometimes we even have to Specifically,through proving the NP-hardness of two closely intentionally blur the links for privacy reasons [11].All those related problems,we show that,contrary to its counterpart in factors impose a need to incorporate uncertainty into the deterministic graphs,this problem cannot be solved in polynomial network,which can essentially be modeled as an uncertain time unless P=NP.Driven by the necessity of designing an exact algorithm,we then apply the Markov Decision Process graph [11],where,instead of appearing deterministically,each framework to give a dynamic programming algorithm that edge is associated with some prior existence probability.The derives the optimal strategies.As the exact algorithm may have existence probabilities not only are symbols of uncertainty,but prohibitive time complexity in practical situations,we further also bear important attributes of network links.Take social propose two more efficient approximation schemes compromising the optimality.The first one is a simple greedy approach with network again for example.These probabilities may represent linear approximation ratio.Interestingly,we show that naive as the confidence of link prediction [12],or the strength of the it is,it enjoys significantly better performance guarantee than influence that a node has on the other [2].In communication some other seemingly more sophisticated algorithms.Second,by networks such as data center networks,these probabilities harnessing the submodularity of the problem,we further design a reflect the failure frequency of communication links [7]. more elaborate algorithm with better approximation ratio.The When the graph is uncertain,traditional methods such as effectiveness of the proposed algorithms are justified through extensive simulations on three real network datasets,from which depth-first-traversal,breadth-first-traversal and graph labeling we demonstrate that the proposed algorithms yield strategies with are no longer suitable for determining the source-destination smaller expected cost than conventional heuristics. connectivity of networks due to the lack of deterministic information on edges'existence.To hedge the uncertainty,we I.INTRODUCTION need to test the edges to determine whether they truly exist or not.However,such edge testing involves far more complicated Source and destination connectivity of networks has sig- procedures than simply identifying uncertain links and thus nificant applications in real life.It concerns crucial issues may turn out to be more costly.For example,in citation such as reliability,routing,information diffusion [1],[2],etc. networks,we can establish probabilistic relationships between Hence,in the past few decades,a lot of research has been papers just by reference information.In contrast,to unravel the dedicated to this problem [3],4,5]and there have been genuine relation between papers,we have to apply advanced many efficient algorithms proposed under various types of data mining approaches which involves considerably more networks.A common feature shared by all those works is that intensive computation.Consequently,it is extremely desirable The early version of this paper is to appear in the Proceedings of IEEE to test the most cost-effective edges,i.e.,to design a testing INFOCOM 2017 [41]. strategy that determines the source-destination connectivity of
1 Determining Source-Destination Connectivity in Uncertain Networks: Modeling and Solutions Luoyi Fu1 , Xinzhe Fu1 , Zhiying Xu2 , Qianyang Peng1 , Xinbing Wang1,2, and Songwu Lu3 1Dept. of Computer Science, Shanghai Jiao Tong University, China. Email:{yiluofu,fxz0114,az950512,xwang8}@sjtu.edu.cn. 2Dept. of Electrical Engineering, Shanghai Jiao Tong University, China. Email:{xuzhiying}@sjtu.edu.cn 3Dept. of Computer Science, University of California, Los Angeles, USA. Email:{slu}@cs.ucla.edu Abstract—Determination of source-destination connectivity in networks has long been a fundamental problem, where most existing works are based on deterministic graphs that overlook the inherent uncertainty in network links. To overcome such limitation, this paper models the network as an uncertain graph where each edge e exists independently with some probability p(e). The problem examined is that of determining whether a given pair of nodes, a source s and a destination t, are connected by a path or separated by a cut. Assuming that during each determining process we are associated with an underlying graph, the existence of each edge can be unraveled through edge testing at a cost of c(e). Our goal is to find an optimal strategy incurring the minimum expected testing cost with the expectation taken over all possible underlying graphs that form a product distribution. Formulating it into a combinatorial optimization problem, we first characterize the computational complexity of optimally determining source-destination connectivity in uncertain graphs. Specifically, through proving the NP-hardness of two closely related problems, we show that, contrary to its counterpart in deterministic graphs, this problem cannot be solved in polynomial time unless P=NP. Driven by the necessity of designing an exact algorithm, we then apply the Markov Decision Process framework to give a dynamic programming algorithm that derives the optimal strategies. As the exact algorithm may have prohibitive time complexity in practical situations, we further propose two more efficient approximation schemes compromising the optimality. The first one is a simple greedy approach with linear approximation ratio. Interestingly, we show that naive as it is, it enjoys significantly better performance guarantee than some other seemingly more sophisticated algorithms. Second, by harnessing the submodularity of the problem, we further design a more elaborate algorithm with better approximation ratio. The effectiveness of the proposed algorithms are justified through extensive simulations on three real network datasets, from which we demonstrate that the proposed algorithms yield strategies with smaller expected cost than conventional heuristics. I. INTRODUCTION Source and destination connectivity of networks has significant applications in real life. It concerns crucial issues such as reliability, routing, information diffusion [1], [2], etc. Hence, in the past few decades, a lot of research has been dedicated to this problem [3], [4], [5] and there have been many efficient algorithms proposed under various types of networks. A common feature shared by all those works is that The early version of this paper is to appear in the Proceedings of IEEE INFOCOM 2017 [41]. the networks investigated are modeled as deterministic graphs [4], [5] with the source-destination connectivity problems transformed to the corresponding graph reachability problems. However, as indeterminacy plagues in our life, deterministic graph often fails to serve as a suitable model for networks nowadays. Usually, we do not have certain knowledge of existence of network links. For instance, in social networks, due to the variability of social ties [6], the relations between network nodes may not be known in advance; in communication systems, established connections between nodes may frequently fail because of the unreliability of data links [7], [8]. It has also been pointed out that more than 90% of network links are observed to be unreliable [10]. Consequently, we may not obtain deterministic network configuration from the predesigned topology; sometimes we even have to intentionally blur the links for privacy reasons [11]. All those factors impose a need to incorporate uncertainty into the network, which can essentially be modeled as an uncertain graph [11], where, instead of appearing deterministically, each edge is associated with some prior existence probability. The existence probabilities not only are symbols of uncertainty, but also bear important attributes of network links. Take social network again for example. These probabilities may represent the confidence of link prediction [12], or the strength of the influence that a node has on the other [2]. In communication networks such as data center networks, these probabilities reflect the failure frequency of communication links [7]. When the graph is uncertain, traditional methods such as depth-first-traversal, breadth-first-traversal and graph labeling are no longer suitable for determining the source-destination connectivity of networks due to the lack of deterministic information on edges’ existence. To hedge the uncertainty, we need to test the edges to determine whether they truly exist or not. However, such edge testing involves far more complicated procedures than simply identifying uncertain links and thus may turn out to be more costly. For example, in citation networks, we can establish probabilistic relationships between papers just by reference information. In contrast, to unravel the genuine relation between papers, we have to apply advanced data mining approaches which involves considerably more intensive computation. Consequently, it is extremely desirable to test the most cost-effective edges, i.e., to design a testing strategy that determines the source-destination connectivity of
3 uncertain networks incurring minimum cost.Furthermore,to heuristics as they achieve better tradeoff between the fully utilize the results of previous tests,the strategy should be complexity of the algorithm and the optimality of the adaptive,which means that we may determine the next edge to solutions. test based on the edge existence information we have already The rest of the paper is organized as follows.we review acquired through previous tests.And we defer a more detailed related studies in SectionⅡ.In SectionⅡ,.we formally explanation of how the problem of interest can be applied to introduce the definitions and notations related to our problem. other realistic scenarios to the end of Section III-B. In Section IV,we investigate the computational complexity In this paper,we are thus motivated to present a first of the problem.We present our exact dynamic programming look into the problem of determining source-destination con- algorithm based on Markov Decision Process framework in nectivity in uncertain networks.Given a network modeled Section V.In Section VI,we present the two approximation as an uncertain graph with each edge associated with an algorithms and we evaluate our algorithms on real life data existence probability and a testing cost,together with two in Section VII.We give concluding remarks as well as future network nodes s,t designated as source and destination,we directions in Section VIII. aim to derive efficient strategy specifying which edges to test so that we can verify whether s and t are connected II.RELATED WORK by a path or separated by a cut with the minimum cost incurred.Note that the source and destination connectivity 1)Uncertain Networks:Uncertain network has been under is also referred to as s-t connectivity.Comparing with s-t intensive study for long.However,instead of verifying the connectivity in deterministic graphs that can be easily solved existence of some structures in uncertain networks,most by graph traversal methods in polynomial time,by proving the efforts have been devoted to calculating the existence prob- NP-hardness of the problem,we find that the s-t connectivity ability of those structures.One of the fundamental problems in uncertain graphs turns out to be far more complicated in that regard is the network reliability problem,which asks and highly non-trivial.Driven by the necessity of pursuing the probability that uncertain networks are connected [1]. exact algorithms that can capture the features of the optimal Following this.Jin et al.consider the distance-constrained solutions,we proceed by converting our problem into an reachability,i.e..the probability that two nodes are connected equivalent Markov Decision Process (MDP)to give a dynamic by a path shorter than a predefined threshold in an uncertain programming algorithm that yields optimal strategies but has network [13].The work in [14]focuses on discovering sub- exponential running time.Considering that the prohibitive graphs with high reliability measure.In recent years,other time complexity of such exact algorithm renders it unsuitable types of study on uncertain networks(graphs)include reliable for practical applications,we therefore design approximation topology design [15],extracting representative subgraphs for schemes to compromise the optimality of computed strategy the acceleration of various querying processes [16],perfor- for the efficiency of the algorithms.In doing so,we first put mance analysis of unreliable wireless networks [8]as well as forward a simple greedy approach that computes near optimal connectivity and resilience of secure sensor networks [9]. solutions with linear approximation guarantee,which can be The modeling of uncertain networks in the present work is further improved by a second algorithm we propose through similar to random graph which was first introduced by Erdos the exploration of submodularity in our problem. and Renyi in [17].Despite of this similarity,the problems Our key contributions are summarized as follows: investigated are quite different.Specifically,previous works Theory:We formally define the problem of determining on random graphs [17][18]is dedicated to the analysis of s-t connectivity in uncertain networks.We prove compu- model property in an asymptotic sense where the number tational complexity-theoretic results of the problem show-of nodes goes to infinity.In contrast,our focus here lies in ing that it cannot be solved in polynomial time unless the combinatorial optimization problem formulated from the P=NP.The results provide useful insights to the inherent determination of source-destination connectivity in uncertain hardness and combinatoric nature of our problem. networks,with the model serving as a mean to characterize Algorithm:We derive an exact dynamic programming the uncertainty in networks and a parsimonious media for algorithm by converting our problem into an equivalent fi- extracting the essence of the problem. nite horizon Markov Decision Process.To further counter 2)Sequential Testing:The nature of our problem is analo- the problem,we design two approximation schemes.The gous to a class of sequential testing problems which involves first one is a simple greedy approach and we show that diagnosing a system by determining the states of its compo- naive as it is,it can provide non-trivial performance nents through a series of tests.The dependency of the whole guarantee.More surprisingly,its performance is far better system on its components'states can be given by explicit than some other more complicated algorithms.Then,we function or via an oracle.Existing results include optimal further improve the approximation ratio of the greedy diagnosing strategies on series and parallel systems,double algorithm by utilizing the submodularity of the problem regular systems,etc.See [19]for a comprehensive review.A in the second algorithm. special class of sequential testing problems called Stochastic Application:We demonstrate the effectiveness of our Boolean Function Evaluation (SBFE)have close connection algorithms on practical applications through extensive to our problem.In SBFE,each component has two states simulations with real network datasets.It is shown that and thus can be considered as a Boolean variable and the our proposed algorithms are superior to the conventional system is given by a Boolean function.The works in [20]and
2 uncertain networks incurring minimum cost. Furthermore, to fully utilize the results of previous tests, the strategy should be adaptive, which means that we may determine the next edge to test based on the edge existence information we have already acquired through previous tests. And we defer a more detailed explanation of how the problem of interest can be applied to other realistic scenarios to the end of Section III-B. In this paper, we are thus motivated to present a first look into the problem of determining source-destination connectivity in uncertain networks. Given a network modeled as an uncertain graph with each edge associated with an existence probability and a testing cost, together with two network nodes s, t designated as source and destination, we aim to derive efficient strategy specifying which edges to test so that we can verify whether s and t are connected by a path or separated by a cut with the minimum cost incurred. Note that the source and destination connectivity is also referred to as s-t connectivity. Comparing with s-t connectivity in deterministic graphs that can be easily solved by graph traversal methods in polynomial time, by proving the NP-hardness of the problem, we find that the s-t connectivity in uncertain graphs turns out to be far more complicated and highly non-trivial. Driven by the necessity of pursuing exact algorithms that can capture the features of the optimal solutions, we proceed by converting our problem into an equivalent Markov Decision Process (MDP) to give a dynamic programming algorithm that yields optimal strategies but has exponential running time. Considering that the prohibitive time complexity of such exact algorithm renders it unsuitable for practical applications, we therefore design approximation schemes to compromise the optimality of computed strategy for the efficiency of the algorithms. In doing so, we first put forward a simple greedy approach that computes near optimal solutions with linear approximation guarantee, which can be further improved by a second algorithm we propose through the exploration of submodularity in our problem. Our key contributions are summarized as follows: • Theory: We formally define the problem of determining s-t connectivity in uncertain networks. We prove computational complexity-theoretic results of the problem showing that it cannot be solved in polynomial time unless P=NP. The results provide useful insights to the inherent hardness and combinatoric nature of our problem. • Algorithm: We derive an exact dynamic programming algorithm by converting our problem into an equivalent fi- nite horizon Markov Decision Process. To further counter the problem, we design two approximation schemes. The first one is a simple greedy approach and we show that naive as it is, it can provide non-trivial performance guarantee. More surprisingly, its performance is far better than some other more complicated algorithms. Then, we further improve the approximation ratio of the greedy algorithm by utilizing the submodularity of the problem in the second algorithm. • Application: We demonstrate the effectiveness of our algorithms on practical applications through extensive simulations with real network datasets. It is shown that our proposed algorithms are superior to the conventional heuristics as they achieve better tradeoff between the complexity of the algorithm and the optimality of the solutions. The rest of the paper is organized as follows. we review related studies in Section II. In Section III, we formally introduce the definitions and notations related to our problem. In Section IV, we investigate the computational complexity of the problem. We present our exact dynamic programming algorithm based on Markov Decision Process framework in Section V. In Section VI, we present the two approximation algorithms and we evaluate our algorithms on real life data in Section VII. We give concluding remarks as well as future directions in Section VIII. II. RELATED WORK 1) Uncertain Networks: Uncertain network has been under intensive study for long. However, instead of verifying the existence of some structures in uncertain networks, most efforts have been devoted to calculating the existence probability of those structures. One of the fundamental problems in that regard is the network reliability problem, which asks the probability that uncertain networks are connected [1]. Following this, Jin et al. consider the distance-constrained reachability, i.e., the probability that two nodes are connected by a path shorter than a predefined threshold in an uncertain network [13]. The work in [14] focuses on discovering subgraphs with high reliability measure. In recent years, other types of study on uncertain networks (graphs) include reliable topology design [15], extracting representative subgraphs for the acceleration of various querying processes [16], performance analysis of unreliable wireless networks [8] as well as connectivity and resilience of secure sensor networks [9]. The modeling of uncertain networks in the present work is similar to random graph which was first introduced by Erdos¨ and Renyi in [17]. Despite of this similarity, the problems investigated are quite different. Specifically, previous works on random graphs [17] [18] is dedicated to the analysis of model property in an asymptotic sense where the number of nodes goes to infinity. In contrast, our focus here lies in the combinatorial optimization problem formulated from the determination of source-destination connectivity in uncertain networks, with the model serving as a mean to characterize the uncertainty in networks and a parsimonious media for extracting the essence of the problem. 2) Sequential Testing: The nature of our problem is analogous to a class of sequential testing problems which involves diagnosing a system by determining the states of its components through a series of tests. The dependency of the whole system on its components’ states can be given by explicit function or via an oracle. Existing results include optimal diagnosing strategies on series and parallel systems, double regular systems, etc. See [19] for a comprehensive review. A special class of sequential testing problems called Stochastic Boolean Function Evaluation (SBFE) have close connection to our problem. In SBFE, each component has two states and thus can be considered as a Boolean variable and the system is given by a Boolean function. The works in [20] and
0 ⊙ © 02 s 0,', 06 PrG=0.16 PrG)=0.16 Pr(G=0.04 0,l,}9 0,,} )2 K 个 ,, ⊙ ",, plea)=0.5 clez)=5 tcerlain graph a possible underlying graph ,*,0乌 PrG)=0.24 PHG)=0.04 ,0,}% 它 0 f,101乌 ,0,19 a known edge .---a potential edge {,0,049 a known non-edge PrG)=0.06 PHG)=0.24 PrG)=0.06 Fig.2.The table in the left demonstrates an adaptive testing strategy with the action of terminating states omitted.The left entries denote the temporary Fig.1.An uncertain graph with three edges and its eight possible underlying states and the right entries represent the corresponding testing edges.The right graphs.The existence probability of each edge is labeled beside it.For part illustrates the transition of temporary states when the strategy is executed clearance.we do not show the direction of each edge. on the underlying graph in the figure.Following the strategy on the left,when the underlying graph is as shown on the right,the evolution of temporary state [21]propose approximate algorithms for evaluating DNF,CNF is{◆,*,*},{0,*,*},{0,l,*},{0,l,l}.Note that some non-terminating states are and CDNF formulas.Deshpande et al.[22]propose a general not reachable during the testing process starting from the initial state,but we still show them in the figure in accordance to the definition of adaptive testing method called the Q-value approach to approximately solve strategy,which is a mapping defined on the set of all temporary states.For SBFE problems based on the adaptive submodular framework clearance,we do not show the direction of each edge. proposed in [23]. We note that,there are no previous works that study the element“O”,“I”amd“*”.And we define S={0,l,*}Eto same problem as ours except the two from Kowshik [24]be the set of temporary states associated with g. and Fu et al.[25][26],respectively,but in more restrictive settings.Particularly,Kowshik derive the optimal solution for Each temporary state s E S represents a set of outcomes s-t connectivity problem in parallel-series and series-parallel during the testing process,where"0"means the corresponding uncertain graphs [24].Fu et al.[25][26]propose an efficient edge has been tested and found not existing,"1"means algorithm and prove its optimality in an ER graph,i.e,a the corresponding edge has been tested and found existing complete graph where each edge has the same probability of and "*"means the corresponding edge has not been tested. existence and the cost of testing each edge is uniform.Our Additionally,we denote the condition of edge e in state s as work is the first attempt to consider whether optimality exists se.As our goal is to determine the s-t connectivity of the in determination of s-t connectivity in a general uncertain underlying graph for g,for a temporary state s,we define it graph to be a terminating state if either the edge set fe se =1 forms a superset of an s-t path in g or edge set fe se=0 III.MODELS AND PROBLEM FORMULATION forms a superset of an s-t cut'in g.We successfully determine A.Uncertain Graph Model the s-t connectivity by reaching a terminating state We denote an uncertain directed graph by g=(V,E,p,c),Definition 2.(Adaptive Testing Strategy)An adaptive testing where V is the set of vertices,E is the set of edges,p:E strategy is a deterministic mappingπ:S→EU{⊥}.Initially (0.1 is a function that assigns each edge e its corresponding starting from the all-*state,an adaptive testing strategy existence probability,and c:ER+represents the testing specifies which edge to test (or terminate as denoted by L) cost of each edge based on the previous testing outcomes. Following the state of art [13].we assume the existence probability of each edge to be independent.And we interpret In the present work,we restrict our consideration to reason- G as a distribution on the set {G=(V,EG),EC CE}of El able strategies where all the terminating states are mapped toL possible underlying deterministic graphs,where denotes the and no state is mapped to any edge that has already been tested cardinality of a set.The probability of a deterministic graph in that state.Also note that some states may never be reached G(V,EG)being the underlying graph is: but we still include them in the strategy for consistency. During each determining process,we are associated with Pr(G=Πp(e)Π(1-p(e). an underlying graph.The outcome of tests are dictated by the eEE\EG underlying graph and after each test the current temporary state We also use G EG to represent that G is a possible underlying will evolve into a new state.Therefore,an adaptive testing graph for g.We define g to be s-t connected if there exists an strategy may test different sets of edges before termination s-t path in the underlying graph of g.Figure 1 demonstrates when executed on different underlying graphs of C.For a an example of a three-edge uncertain graph with its possible specific underlying graph G.we denote E(G)as the set of underlying graphs. edges strategy m tests on it.Note that as G is deterministic, E(G)is also deterministic.It follows that the expected cost B.Problem Formulation Definition 1.(Temporary State)A temporary state s of an All the cuts in this paper are graph s-t cut,ie.,the minimal cut sets that uncertain graph g(V,E,p,c)is an E-dimension vector with partition s and t into different subsets
3 1e 2 e 3e 1 ( ) 0.2 p e = 2 p(e ) = 0.5 3 ( ) 0.6 p e = Pr(G) = 0.16 Pr(G) = 0.16 Pr(G) = 0.04 Pr(G) = 0.04 Pr(G) = 0.06 Pr(G) = 0.24 Pr(G) = 0.06 Pr(G) = 0.24 s s s s s s s s s t t t t t t t t t 1 2 3 4 5 6 7 8 Fig. 1. An uncertain graph with three edges and its eight possible underlying graphs. The existence probability of each edge is labeled beside it. For clearance, we do not show the direction of each edge. [21] propose approximate algorithms for evaluating DNF, CNF and CDNF formulas. Deshpande et al. [22] propose a general method called the Q-value approach to approximately solve SBFE problems based on the adaptive submodular framework proposed in [23]. We note that, there are no previous works that study the same problem as ours except the two from Kowshik [24] and Fu et al. [25] [26], respectively, but in more restrictive settings. Particularly, Kowshik derive the optimal solution for s-t connectivity problem in parallel-series and series-parallel uncertain graphs [24]. Fu et al. [25] [26] propose an efficient algorithm and prove its optimality in an ER graph, i.e, a complete graph where each edge has the same probability of existence and the cost of testing each edge is uniform. Our work is the first attempt to consider whether optimality exists in determination of s-t connectivity in a general uncertain graph. III. MODELS AND PROBLEM FORMULATION A. Uncertain Graph Model We denote an uncertain directed graph by G = (V, E, p, c), where V is the set of vertices, E is the set of edges, p : E 7→ (0, 1] is a function that assigns each edge e its corresponding existence probability, and c : E 7→ R + represents the testing cost of each edge. Following the state of art [13], we assume the existence probability of each edge to be independent. And we interpret G as a distribution on the set {G = (V, EG), EG ⊆ E} of 2 |E| possible underlying deterministic graphs, where |·| denotes the cardinality of a set. The probability of a deterministic graph G(V, EG) being the underlying graph is: P r(G) = Y e∈EG p(e) Y e∈E\EG (1 − p(e)). We also use G ∈ G to represent that G is a possible underlying graph for G. We define G to be s-t connected if there exists an s-t path in the underlying graph of G. Figure 1 demonstrates an example of a three-edge uncertain graph with its possible underlying graphs. B. Problem Formulation Definition 1. (Temporary State) A temporary state s of an uncertain graph G(V, E, p, c) is an |E|-dimension vector with 1e 2 e 3e 1 ( ) 0.2 p e = 2 p(e ) = 0.5 3 ( ) 0.6 p e = s t s t 1 ( ) 3 c e = 2 c(e ) = 5 3 ( ) 2 c e = 1 2 3 2 3 1 1 1 1 1 1 {*,*,*} {0,*,*} {0,1,*} {0,*,1} {*,1,*} {*,*,1} {*,*,0} {*,0,*} {*,1,0} {*,0,1} {*,0,0} e e e e e e e e e e e 1e 2e 3e uncertain graph a possible underlying graph a known edge a known non - edge a potential edge Fig. 2. The table in the left demonstrates an adaptive testing strategy with the action of terminating states omitted. The left entries denote the temporary states and the right entries represent the corresponding testing edges. The right part illustrates the transition of temporary states when the strategy is executed on the underlying graph in the figure. Following the strategy on the left, when the underlying graph is as shown on the right, the evolution of temporary state is {*,*,*},{0,*,*},{0,1,*},{0,1,1}. Note that some non-terminating states are not reachable during the testing process starting from the initial state, but we still show them in the figure in accordance to the definition of adaptive testing strategy, which is a mapping defined on the set of all temporary states. For clearance, we do not show the direction of each edge. element “0”, “1” and “*”. And we define S = {0, 1, ∗}|E| to be the set of temporary states associated with G. Each temporary state s ∈ S represents a set of outcomes during the testing process, where “0” means the corresponding edge has been tested and found not existing, “1” means the corresponding edge has been tested and found existing and “*” means the corresponding edge has not been tested. Additionally, we denote the condition of edge e in state s as se. As our goal is to determine the s-t connectivity of the underlying graph for G, for a temporary state s, we define it to be a terminating state if either the edge set {e | se = 1} forms a superset of an s-t path in G or edge set {e | se = 0} forms a superset of an s-t cut1 in G. We successfully determine the s-t connectivity by reaching a terminating state. Definition 2. (Adaptive Testing Strategy) An adaptive testing strategy is a deterministic mapping π : S 7→ E∪{⊥}. Initially starting from the all-∗ state, an adaptive testing strategy specifies which edge to test (or terminate as denoted by ⊥) based on the previous testing outcomes. In the present work, we restrict our consideration to reasonable strategies where all the terminating states are mapped to ⊥ and no state is mapped to any edge that has already been tested in that state. Also note that some states may never be reached but we still include them in the strategy for consistency. During each determining process, we are associated with an underlying graph. The outcome of tests are dictated by the underlying graph and after each test the current temporary state will evolve into a new state. Therefore, an adaptive testing strategy may test different sets of edges before termination when executed on different underlying graphs of G. For a specific underlying graph G, we denote Eπ(G) as the set of edges strategy π tests on it. Note that as G is deterministic, Eπ(G) is also deterministic. It follows that the expected cost 1All the cuts in this paper are graph s-t cut, i.e., the minimal cut sets that partition s and t into different subsets
of a is given by: whole network structure,it is reasonable to render the link Cost)=∑[Pr(G)∑c(el existence between nodes as a preassigned probability.And the algorithm for the problem thus helps us more efficiently GEg e∈Er(G) discover the relationship between nodes,which manifests its where (c)c(e)equals to the cost incurred by when significant use in link prediction [311.Moreover,the structure the underlying graph is G,and the expected cost is the weight- of the social networks can also be revealed [32],[33]by ed sum of the costs incurred on all the possible underlying applying the algorithm to different node pairs.Last but not graphs.An example of an adaptive testing strategy on an least,more potential applications of the illustrated scenario uncertain graph is illustrated in Figure 2. also include sensor networks [28],P2P networks [29]and Data Based on all the conditions above,now we give a formal center networks [7]. definition of our problem stated as follows. IV.COMPUTATIONAL COMPLEXITY Definition 3.(The Connectivity Determination Problem)Giv- en an uncertain directed graph2g(V,E,p,c)with two nodes In this section,we investigate the computational complexity s,t EV designated as source and destination,respectively, of the Connectivity Determination problem.By demonstrating the goal is to find an adaptive testing strategy n that incurs the hardness of two closely related problems,we show both the minimum expected cost. computing the testing strategy with the minimum expected cost holistically and sequentially are NP-hard.More specifically, Remark:Apart from deriving the strategy's action in all we first convert our problem into its corresponding decision temporary states at once,an algorithm for the Connectivity version that asks for the existence of an adaptive testing Determination problem can instead compute the strategy se- strategy with expected cost less than some value l for a given quentially,only deciding the next edge to test based on the uncertain graph.Then.we consider the problem of deciding current state.In algorithmic point of view,we consider the which edge to test first in the optimal strategy.The inherent time complexity of an algorithm for Connectivity Determina-tension of the Connectivity Determination problem is therefore tion problem as the maximum time it takes to compute all the disclosed through demonstrating the NP-hardness of these two relevant actions of a determining process.Therefore,finding problems,as stated in Theorems 1 and 2,respectively. the optimal strategy in a sequential fashion,on the surface, may appear to simplify the problem compared to computing it Theorem 1.The decision version of Connectivity Determina- holistically.However,we show in next section that the problem tion Problem is NP-hard. is NP-hard regardless of in which way we compute the optimal Proof:Inspired by [27],we prove the theorem by re- strategy.Table I summarizes the notations that will be used duction from the s-t reliability problem [1]:Given a directed throughout the paper. graph G and two nodes s and t.The s-t reliability is to Applicability of the model:We illustrate how to project our compute the probability of s being connected to t assuming model into real situations using several examples.Let us firstly the edges in G exist independently with probability As s-t take communication networks for example,where the edge reliability problem is #P-hard [1]3,its decision version that existence probability corresponds to the reliability of network quests whether the probability of s being connected to t is links.The testing cost represents the probing costs of the links. larger than some predefined value ro is NP-hard. The connectivity determination problem asks for a minimum The reduction works as follows.For a graph G(V,E),we cost probing strategy determining the source-destination con- transform it to an uncertain graph G(V,E',p,c)by adding an nectivity of the networks,allowing for the prevalent existence edge M between s.t and set the rest of g is just the same of edge uncertainty.Other than communication networks,such as G.Define n as the number of edges in G.We set the cost uncertain graph can also be projected into the structure of a of M as c(M)=n2+1 and the cost of testing other edges priced information,with each edge representing a piece of as 1.Then we assign the existence probability of all edges in information (data)and its testing cost corresponds to the price g as Finally,we designate s,t in g as the source and the of the data.Under such circumstance,the optimal strategy al- destination in the constructed instance. lows us to successfully query the information paying minimum Let r be the s-t reliability in G and I be the expected cost price [30].Another application is a social network graph where incurred by the optimal testing strategy on g.We define a an edge can be treated as a first cousin relationship,and the generic G'as a subgraph resulted from an underlying graph object of interest is to determine whether two individuals from of G with edge M removed.We will show that if we know l, a large population are distant cousins.Obviously,determining then we can efficiently compute r. whether an edge exists between two individuals is usually First,from the definitions,we have r=for some very expensive,involving costly genetic testing that outweighs integer k,and I must obey the following two constraints: any computational cost.An edge could of course represent I>(1-r)c(M)and I<rn +(1-r)c(M).Here,the a variety of other relationships that are expensive to check,first inequality follows from the fact that we have to test M for instance due to confidentiality or physical restrictions.In whenever we find out that s and t is not connected in G those scenarios where users have no prior knowledge of the The second inequality holds since the expected cost of the 2Without loss of generality.we assume the graph with vertex set V and 3#P is a complexity class for counting problems.#P-hard is at least as hard edge set E is s-t connected,i.e.,g is s-t connected if all its edges exist. as NP-hard [1]
4 of π is given by: Cost(π) = X G∈G [P r(G) X e∈Eπ(G) c(e)], where P e∈Eπ(G) c(e) equals to the cost incurred by π when the underlying graph is G, and the expected cost is the weighted sum of the costs incurred on all the possible underlying graphs. An example of an adaptive testing strategy on an uncertain graph is illustrated in Figure 2. Based on all the conditions above, now we give a formal definition of our problem stated as follows. Definition 3. (The Connectivity Determination Problem) Given an uncertain directed graph2 G(V, E, p, c) with two nodes s, t ∈ V designated as source and destination, respectively, the goal is to find an adaptive testing strategy π that incurs the minimum expected cost. Remark: Apart from deriving the strategy’s action in all temporary states at once, an algorithm for the Connectivity Determination problem can instead compute the strategy sequentially, only deciding the next edge to test based on the current state. In algorithmic point of view, we consider the time complexity of an algorithm for Connectivity Determination problem as the maximum time it takes to compute all the relevant actions of a determining process. Therefore, finding the optimal strategy in a sequential fashion, on the surface, may appear to simplify the problem compared to computing it holistically. However, we show in next section that the problem is NP-hard regardless of in which way we compute the optimal strategy. Table I summarizes the notations that will be used throughout the paper. Applicability of the model: We illustrate how to project our model into real situations using several examples. Let us firstly take communication networks for example, where the edge existence probability corresponds to the reliability of network links. The testing cost represents the probing costs of the links. The connectivity determination problem asks for a minimum cost probing strategy determining the source-destination connectivity of the networks, allowing for the prevalent existence of edge uncertainty. Other than communication networks, such uncertain graph can also be projected into the structure of a priced information, with each edge representing a piece of information (data) and its testing cost corresponds to the price of the data. Under such circumstance, the optimal strategy allows us to successfully query the information paying minimum price [30]. Another application is a social network graph where an edge can be treated as a first cousin relationship, and the object of interest is to determine whether two individuals from a large population are distant cousins. Obviously, determining whether an edge exists between two individuals is usually very expensive, involving costly genetic testing that outweighs any computational cost. An edge could of course represent a variety of other relationships that are expensive to check, for instance due to confidentiality or physical restrictions. In those scenarios where users have no prior knowledge of the 2Without loss of generality, we assume the graph with vertex set V and edge set E is s-t connected, i.e., G is s-t connected if all its edges exist. whole network structure, it is reasonable to render the link existence between nodes as a preassigned probability. And the algorithm for the problem thus helps us more efficiently discover the relationship between nodes, which manifests its significant use in link prediction [31]. Moreover, the structure of the social networks can also be revealed [32], [33] by applying the algorithm to different node pairs. Last but not least, more potential applications of the illustrated scenario also include sensor networks [28], P2P networks [29] and Data center networks [7]. IV. COMPUTATIONAL COMPLEXITY In this section, we investigate the computational complexity of the Connectivity Determination problem. By demonstrating the hardness of two closely related problems, we show both computing the testing strategy with the minimum expected cost holistically and sequentially are NP-hard. More specifically, we first convert our problem into its corresponding decision version that asks for the existence of an adaptive testing strategy with expected cost less than some value l for a given uncertain graph. Then, we consider the problem of deciding which edge to test first in the optimal strategy. The inherent tension of the Connectivity Determination problem is therefore disclosed through demonstrating the NP-hardness of these two problems, as stated in Theorems 1 and 2, respectively. Theorem 1. The decision version of Connectivity Determination Problem is NP-hard. Proof: Inspired by [27], we prove the theorem by reduction from the s-t reliability problem [1]: Given a directed graph G and two nodes s and t. The s-t reliability is to compute the probability of s being connected to t assuming the edges in G exist independently with probability 1 2 . As s-t reliability problem is #P-hard [1]3 , its decision version that quests whether the probability of s being connected to t is larger than some predefined value r0 is NP-hard. The reduction works as follows. For a graph G(V, E), we transform it to an uncertain graph G(V, E0 , p, c) by adding an edge M between s, t and set the rest of G is just the same as G. Define n as the number of edges in G. We set the cost of M as c(M) = n2 n+1 and the cost of testing other edges as 1. Then we assign the existence probability of all edges in G as 1 2 . Finally, we designate s, t in G as the source and the destination in the constructed instance. Let r be the s-t reliability in G and l be the expected cost incurred by the optimal testing strategy on G. We define a generic G0 as a subgraph resulted from an underlying graph of G with edge M removed. We will show that if we know l, then we can efficiently compute r. First, from the definitions, we have r = k 2n for some integer k, and l must obey the following two constraints: l ≥ (1 − r)c(M) and l ≤ rn + (1 − r)c(M) . Here, the first inequality follows from the fact that we have to test M whenever we find out that s and t is not connected in G0 . The second inequality holds since the expected cost of the 3#P is a complexity class for counting problems. #P-hard is at least as hard as NP-hard [1]
TABLE I NOTATIONS AND DEFINITIONS Theorem 1 establishes the NP-hardness of the decision version of our problem,which implies the NP-hardness of computing Notation Definition the optimal strategy in a holistic fashion.Theorem 2 shows uncertain directed graph vertex set of the uncertain graph that even computing the optimal testing strategy sequentially edge set of the uncertain graph cannot be completed in polynomial time unless P=NP. e probability function indicating the existence probability of edges in the uncertain graph cost function indicating the testing cost of edges in the uncertain graph G underlying deterministic graph S={8,e3} s,t source and destination set of temporary states 32={8,6} s,s,a,b temporary states S={62,6,6} Se the element corresponding to edge e in state s M 开 adaptive testing strategy ExG)】 set of edges tests before termination Fig.3.The uncertain graph (right)constructed for the set cover instance when the underlying graph is G (left).We omit the probability and cost of edges in the uncertain graph and Cost() the expected cost of strategy m refer them to the Appendix A. utility function in the Markov Decision Process utility function in the Q-value approach the collection of s-t paths in g V.MDP-BASED EXACT ALGORITHM C the collection of s-t cuts in g The NP-hardness analysis in the previous section implies Pe the subfamily of s-t paths in g that edge e lies on Ce the subfamily of s-t cuts in g that edge e lies on that solving the problem exactly may lead to a prohibitively the goal value in the Q-value approach large cost.However,it is still essential to design an exact algorithm to capture the features of the optimal solutions and gain insights of our Connectivity Determination problem. optimal strategy is certainly no greater than that of a simple The main idea of seeking for an exact algorithm is through strategy that first test all the edges in E and test M if no converting our problem into an equivalent Markov Decision s-t path is found.Combining the two inequalities,we have Process (MDP).Adopting the notations in [34],in the sequel, 2mc≤k≤2"c0.Consequently,k=L2e40g」 c(M) c(M)-n c(M)-n- we will first show how the elements in our problem can be Therefore,if we have a polynomial time algorithm that solves naturally mapped to the components in a finite horizon MDP. the decision version of Connectivity Determination problem, then we can efficiently solve the decision version of s-t reli- A.Mapping the Problem into MDP ability problem.Since the latter is NP-hard,we conclude that As a mathematical framework for planning or navigating the decision version of Connectivity Determination problem is ◆ uncertain systems,MDP models the way of a decision maker's also NP-hard. choosing actions so that the system can perform optimally Theorem 2.Deciding the optimal first edge to test (the edge with regard to some predefined criterion.The key components tested by the optimal strategy in the intial state)is NP-hard. of an MDP include decision epochs,state space,action sets, transition probabilities,rewards,decision policy and optimality Proof:We only present a proof sketch here and refer the criterion.Regarding this,now we show the mapping between details to Appendix A.The proof is done by reduction from these components and the elements in our problem one by set cover problem.Given a universe of elements,a family of one.The correspondence is also summarized in Table II. subsets of the universe and a predefined integer k,a cover is a subfamily of sets whose union equals to the universe. TABLE II The set cover problem asks whether there exists a cover of THE CORRELATION BETWEEN CONNECTIVITY DETERMINATION PROBLEM AND MARKOV DECISION PROCESS cardinality less than k.For a set cover instance,we construct a corresponding uncertain graph as follows.We first create Markov Decision Process Connectivity Determination Problem a set vertex for each subset in the family and an element state space set of temporary states S state vertex for each element in the universe.Next,we add three a temporary state s action set testing or termination,EU special vertices:source s,destination t and a special set vertex transition probability function P probability function p of edges sM.Then,we add edges from s to each set vertex,from reward function r cost function c of edges each element vertex to t and from each set vertex to the decision policy adaptive testing strategy element vertices it contains in the original instance.Specially. Decision Epochs:In an MDP,decisions are made at we add edges from sM to all the element vertices.By carefully points of time called decision epochs.In our problem the assigning the cost and probability of each edge,we prove that decision epochs are the times we need to decide which the optimal first edge to test is the edge M from s to sM if and edge to test next or terminate.Since we at most need only if there does not exist a cover of size smaller than k in to test E edges whereE is the number of possible the original set cover instance.Figure 3 presents the uncertain edges in the uncertain graph,our corresponding MDP is graph constructed for a set cover instance. ◆ of finite horizon. Remark:The two theorems characterize the complexity State Space:The state space of an MDP represents of the Connectivity Determination problem from two aspects. the possible states that a system can be in.It naturally
5 TABLE I NOTATIONS AND DEFINITIONS Notation Definition G uncertain directed graph V vertex set of the uncertain graph E edge set of the uncertain graph p probability function indicating the existence probability of edges in the uncertain graph c cost function indicating the testing cost of edges in the uncertain graph G underlying deterministic graph s, t source and destination S set of temporary states s, s 0 , a, b temporary states se the element corresponding to edge e in state s π adaptive testing strategy Eπ(G) set of edges π tests before termination when the underlying graph is G Cost(π) the expected cost of strategy π u utility function in the Markov Decision Process g utility function in the Q-value approach P the collection of s-t paths in G C the collection of s-t cuts in G Pe the subfamily of s-t paths in G that edge e lies on Ce the subfamily of s-t cuts in G that edge e lies on Q the goal value in the Q-value approach optimal strategy is certainly no greater than that of a simple strategy that first test all the edges in E and test M if no s-t path is found. Combining the two inequalities, we have 2 n c(M)−l c(M) ≤ k ≤ 2 n c(M)−l c(M)−n . Consequently, k = b2 n c(M)−l c(M)−n c. Therefore, if we have a polynomial time algorithm that solves the decision version of Connectivity Determination problem, then we can efficiently solve the decision version of s-t reliability problem. Since the latter is NP-hard, we conclude that the decision version of Connectivity Determination problem is also NP-hard. Theorem 2. Deciding the optimal first edge to test (the edge tested by the optimal strategy in the intial state) is NP-hard. Proof: We only present a proof sketch here and refer the details to Appendix A. The proof is done by reduction from set cover problem. Given a universe of elements, a family of subsets of the universe and a predefined integer k, a cover is a subfamily of sets whose union equals to the universe. The set cover problem asks whether there exists a cover of cardinality less than k. For a set cover instance, we construct a corresponding uncertain graph as follows. We first create a set vertex for each subset in the family and an element vertex for each element in the universe. Next, we add three special vertices: source s, destination t and a special set vertex sM. Then, we add edges from s to each set vertex, from each element vertex to t and from each set vertex to the element vertices it contains in the original instance. Specially, we add edges from sM to all the element vertices. By carefully assigning the cost and probability of each edge, we prove that the optimal first edge to test is the edge M from s to sM if and only if there does not exist a cover of size smaller than k in the original set cover instance. Figure 3 presents the uncertain graph constructed for a set cover instance. Remark: The two theorems characterize the complexity of the Connectivity Determination problem from two aspects. Theorem 1 establishes the NP-hardness of the decision version of our problem, which implies the NP-hardness of computing the optimal strategy in a holistic fashion. Theorem 2 shows that even computing the optimal testing strategy sequentially cannot be completed in polynomial time unless P=NP. M s t 1s 2 s 3 s SM ε 1 2 ε 3 ε 4 ε 1 1 3 2 1 4 3 2 3 4 { , } { , } { , , } s s s ε ε ε ε ε ε ε = = = Fig. 3. The uncertain graph (right) constructed for the set cover instance (left). We omit the probability and cost of edges in the uncertain graph and refer them to the Appendix A. V. MDP-BASED EXACT ALGORITHM The NP-hardness analysis in the previous section implies that solving the problem exactly may lead to a prohibitively large cost. However, it is still essential to design an exact algorithm to capture the features of the optimal solutions and gain insights of our Connectivity Determination problem. The main idea of seeking for an exact algorithm is through converting our problem into an equivalent Markov Decision Process (MDP). Adopting the notations in [34], in the sequel, we will first show how the elements in our problem can be naturally mapped to the components in a finite horizon MDP. A. Mapping the Problem into MDP As a mathematical framework for planning or navigating uncertain systems, MDP models the way of a decision maker’s choosing actions so that the system can perform optimally with regard to some predefined criterion. The key components of an MDP include decision epochs, state space, action sets, transition probabilities, rewards, decision policy and optimality criterion. Regarding this, now we show the mapping between these components and the elements in our problem one by one. The correspondence is also summarized in Table II. TABLE II THE CORRELATION BETWEEN CONNECTIVITY DETERMINATION PROBLEM AND MARKOV DECISION PROCESS Markov Decision Process Connectivity Determination Problem state space set of temporary states S state a temporary state s action set testing or termination, E ∪ {⊥} transition probability function P probability function p of edges reward function r cost function c of edges decision policy adaptive testing strategy π • Decision Epochs: In an MDP, decisions are made at points of time called decision epochs. In our problem the decision epochs are the times we need to decide which edge to test next or terminate. Since we at most need to test |E| edges where |E| is the number of possible edges in the uncertain graph, our corresponding MDP is of finite horizon. • State Space: The state space of an MDP represents the possible states that a system can be in. It naturally