corresponds to the set of temporary states S in our problem. We may also partition the state space S into |E| + 1 disjoint subsets based on the number of edges that have been tested in the states, as S = S_0 ∪ S_1 ∪ ... ∪ S_{|E|}. In decision epoch i, the system can only be in a state in S_i.

• Action Sets: For each state s ∈ S, there is a set of actions that can be performed under it. We define the associated action set A_s of state s as the set of edges that have not been tested in s. Additionally, for terminating states, the action set also contains the terminating action ⊥. As a result, the whole action set is A = ∪_{s∈S} A_s = E ∪ {⊥}.

• Transition Probabilities and Rewards: The transition probability function and the reward function characterize the result of choosing an action at a state. Generally speaking, at each state, choosing an action gains some reward, and the system evolves probabilistically into another state at the next decision epoch. Projected onto our problem, the transition probability of action e (testing edge e) is given by the existence probability of edge e. Denote by s · e the temporary state evolved from s by setting s_e to 1, and by s\e the temporary state evolved from s by setting s_e to 0. Formally, the transition probability function is given by
\[
P(s' \mid s, e) =
\begin{cases}
p(e) & \text{if } s' = s \cdot e,\\
1 - p(e) & \text{if } s' = s \backslash e,\\
0 & \text{otherwise,}
\end{cases}
\qquad
P(s' \mid s, \perp) =
\begin{cases}
1 & \text{if } s' = s,\\
0 & \text{otherwise.}
\end{cases}
\]
It then follows that the reward function is r(s, e) = −c(e) and r(s, ⊥) = 0. Note that the reward function is non-positive, corresponding to the negated testing cost, and that both the transition probability function and the reward function are independent of the decision epoch and of previous states, which demonstrates the Markov property of our problem.

• Decision Policy: A decision policy is a mapping from the state space to the action set. Therefore, in our problem, it is equivalent to an adaptive testing strategy.

• Optimality Criterion: In our case, the optimality criterion is the expected total reward criterion, i.e., the decision policy with the maximum expected total reward of the constructed MDP corresponds to the optimal adaptive testing strategy.

B. Exact Dynamic Programming Algorithm

From the equivalence between our problem and an MDP, it follows that our problem also satisfies the "Principle of Optimality" of MDPs [34], i.e., starting at any state, the optimal adaptive testing strategy incurs the minimum expected cost among all strategies. This enables us to define the optimal utility function u over states, assigning to each temporary state the expected reward (negative cost) of the optimal strategy starting at that state. Similarly, we define a utility function u_π associated with a strategy π as the reward gained by π starting from each state. By the Bellman equation [34], we have the following lemma.

Lemma 1. For any state s ∈ S, the optimal utility function satisfies
\[
u(s) = \max_{a \in A_s}\Big\{ r(s, a) + \sum_{s' \in S} P(s' \mid s, a)\, u(s') \Big\}.
\]
In particular, if s is a non-terminating state, then
\[
u(s) = \max_{e \in A_s}\big\{ -c(e) + p(e)\, u(s \cdot e) + (1 - p(e))\, u(s \backslash e) \big\},
\]
and for any terminating state, its utility is 0.

Based on Lemma 1, we design an algorithm that computes the optimal testing strategy π following the standard dynamic programming paradigm, as shown in Algorithm 1.

Algorithm 1 The MDP-based Exact Algorithm
Input: Uncertain graph G(V, E, p, c), source s, destination t
Output: The optimal testing strategy π
1: Initialize: u_π(s) := 0 for all s ∈ S_{|E|}
2: for i = |E| downto 0 do
3:   for all s ∈ S_i do
4:     if s is a terminating state then
5:       u_π(s) := 0, π(s) := ⊥.
6:     else
7:       e* := arg max_{e ∈ A_s} {−c(e) + p(e) u_π(s · e) + (1 − p(e)) u_π(s\e)},
8:       u_π(s) := −c(e*) + p(e*) u_π(s · e*) + (1 − p(e*)) u_π(s\e*),
9:       π(s) := e*.
10: return π
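To make the backward induction concrete, the following is a minimal Python sketch of Algorithm 1. The tuple-based state encoding and the helper names (connected, is_terminating, exact_strategy) are our own illustrative choices rather than the paper's implementation; the sketch enumerates all 3^|E| temporary states explicitly and is therefore only meant for very small graphs.

from itertools import product

def connected(n, edges, s, t):
    # Plain DFS reachability on a deterministic graph with nodes 0..n-1.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def is_terminating(state, E, n, s, t):
    # Connectivity is decided iff s and t are connected using only the edges
    # known to exist, or disconnected even when all untested edges are included.
    sure = [E[i] for i, x in enumerate(state) if x == 1]
    maybe = [E[i] for i, x in enumerate(state) if x != 0]
    return connected(n, sure, s, t) or not connected(n, maybe, s, t)

def exact_strategy(n, E, p, c, s, t):
    # Backward induction over all 3^|E| temporary states (Algorithm 1).
    # A state is a tuple over E with entries 1 (exists), 0 (absent), '*' (untested).
    u, pi = {}, {}
    states = sorted(product((1, 0, '*'), repeat=len(E)),
                    key=lambda st: st.count('*'))   # S_|E| first, ..., S_0 last
    for state in states:
        if is_terminating(state, E, n, s, t):
            u[state], pi[state] = 0.0, None         # terminating action ⊥
            continue
        best_val, best_e = None, None
        for i, x in enumerate(state):
            if x != '*':
                continue                            # edge already tested
            hi = state[:i] + (1,) + state[i + 1:]   # s · e
            lo = state[:i] + (0,) + state[i + 1:]   # s \ e
            val = -c[i] + p[i] * u[hi] + (1 - p[i]) * u[lo]
            if best_val is None or val > best_val:
                best_val, best_e = val, i
        u[state], pi[state] = best_val, best_e
    return u, pi

# Tiny example: two parallel edges between s = 0 and t = 1.
u, pi = exact_strategy(2, [(0, 1), (0, 1)], p=[0.9, 0.5], c=[1.0, 1.0], s=0, t=1)
print(pi[('*', '*')], -u[('*', '*')])   # edge to test first, minimum expected cost

In the two-parallel-edge example, both edges cost 1, so the strategy first tests the edge with the higher existence probability, and the minimum expected cost works out to 1 + 0.1 · 1 = 1.1.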
We prove the correctness of the dynamic programming algorithm in the following theorem.

Theorem 3. For an uncertain graph G, Algorithm 1 yields an optimal adaptive testing strategy and has a complexity of O((|V| + |E|) · 3^{|E|}), where |V| denotes the number of nodes and |E| denotes the number of edges in G.

Proof: Denote an optimal testing strategy by π* and the strategy given by Algorithm 1 by π. By backward induction, we prove that the utility function u_π of π is no less than the optimal utility function u_{π*} = u on every state, which implies that π is an optimal strategy.

First, for all s ∈ S_{|E|}, obviously u_π(s) = u_{π*}(s) = 0. Suppose that u_π(s) ≥ u_{π*}(s) for all states s ∈ S_i with i ≥ k; we then prove that u_π(s) ≥ u_{π*}(s) for all states s ∈ S_{k−1}. Indeed, by the selection criterion of the algorithm, for a non-terminating state s ∈ S_{k−1} we have
\begin{align}
u_\pi(s) &= \max_{e \in A_s}\big\{ -c(e) + p(e)\, u_\pi(s \cdot e) + (1 - p(e))\, u_\pi(s \backslash e) \big\} \nonumber\\
&\ge -c(\pi^*(s)) + p(\pi^*(s))\, u_\pi(s \cdot \pi^*(s)) + (1 - p(\pi^*(s)))\, u_\pi(s \backslash \pi^*(s)) \nonumber\\
&\ge -c(\pi^*(s)) + p(\pi^*(s))\, u_{\pi^*}(s \cdot \pi^*(s)) + (1 - p(\pi^*(s)))\, u_{\pi^*}(s \backslash \pi^*(s)) \tag{1}\\
&= u_{\pi^*}(s), \nonumber
\end{align}
where Inequality (1) follows from the induction hypothesis. If s is a terminating state, then again u_π(s) = u_{π*}(s) = 0. Hence, following π is optimal from every state s; in particular, starting from the initial all-∗ state, π achieves the maximum expected reward, or equivalently, incurs the minimum expected cost.
The time complexity of the algorithm can be justified as follows. There are in total 3^{|E|} temporary states associated with the uncertain graph. Determining whether a state s is a terminating state can be realized by querying the s-t connectivity of two deterministic graphs G_s^1(V, E_1) and G_s^2(V, E_2), where E_1 = {e | s_e = 1} and E_2 = {e | s_e = 1 or s_e = ∗}: s is terminating iff s and t are connected in G_s^1 or disconnected in G_s^2. This is implementable in O(|V| + |E|) time by depth-first traversal, and selecting the optimal action for each state requires O(|E|) time. Hence, the algorithm terminates and finds the optimal solution in O((|V| + |E|) · 3^{|E|}) time. ■

Remark: Note that the exact algorithm computes the whole testing strategy in one run, meaning that we can obtain the corresponding actions for all temporary states by invoking the algorithm once. Therefore, on average, the exact algorithm takes O(|E|) time to determine the optimal action for each temporary state. However, since the time complexity we consider here is the total time for an algorithm to determine all the actions of a testing process, and we have to execute the whole algorithm at the beginning of each testing process, the exact algorithm still has exponential time complexity. On the other hand, the approximation algorithms presented in the following section exploit the sequential way of computing the testing strategy: based on each temporary state, they use polynomial time to determine the corresponding action, and thus can be categorized as polynomial-time approximation algorithms.

VI. APPROXIMATION ALGORITHMS

As stated previously, it is unrealistic to pursue an efficient exact algorithm on general uncertain graphs due to the inherent hardness of the Connectivity Determination problem. Therefore, in this section, we propose approximation schemes that have both polynomial time complexity and good approximation guarantees. Specifically, we focus on analyzing the approximation ratios of the two proposed algorithms. The readers may refer to Appendix IV for more details on their time complexity.

A. A Simple Greedy Approach

An intuitive greedy algorithm is to always test the untested edge with the minimum cost (breaking ties arbitrarily). Surprisingly, we show that this greedy algorithm has a non-trivial approximation ratio of O(|E|).

Theorem 4. Given an instance of our problem with uncertain graph G(V, E, p, c) and two nodes s, t as source and destination, let π be the strategy that tests the edges in G in increasing order of their costs. Then, Cost(π) ≤ |E| · Cost(π*), where π* is the optimal strategy.

Proof: Suppose that we knew the underlying graph G of G in advance. Denote by Cert(G) the minimum cost of a certifier of G's s-t connectivity. If s and t are connected in G, then a certifier consists of an s-t path in G whose edges all exist in G; if s and t are disconnected in G, then a certifier consists of an s-t cut in G none of whose edges exists in G. Since π tests the edges from cheap to expensive, the testing process stops no later than the moment every edge of a minimum-cost certifier has been tested; hence every edge tested by π costs no more than the most expensive edge of that certifier, and in particular no more than Cert(G). As π tests at most |E| edges, we must have
\[
\sum_{e \in E_\pi(G)} c(e) \le |E| \cdot \mathrm{Cert}(G)
\]
for any G. Moreover, since even the optimal strategy has no prior knowledge of the underlying graph G, the edges it tests on G must contain a certifier, so clearly
\[
\sum_{e \in E_{\pi^*}(G)} c(e) \ge \mathrm{Cert}(G).
\]
Therefore,
\[
\mathrm{Cost}(\pi) = \sum_{G \in \mathcal{G}} \Big[ \Pr(G) \sum_{e \in E_\pi(G)} c(e) \Big] \le |E| \cdot \sum_{G \in \mathcal{G}} \big[ \Pr(G)\, \mathrm{Cert}(G) \big] \le |E| \cdot \mathrm{Cost}(\pi^*). \qquad \blacksquare
\]
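As an illustration of the strategy analysed in Theorem 4, here is a small Python sketch (our own helper names, not the paper's code) that simulates the cost-ordered greedy on a single realization of the uncertain graph: edges are tested from cheapest to most expensive, and the process stops as soon as the already-tested edges certify the s-t connectivity.

def reachable(n, edges, s, t):
    # DFS reachability on a deterministic graph with nodes 0..n-1.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return t in seen

def greedy_cost(n, E, c, s, t, present):
    # Cost paid by the cost-ordered greedy on one realization, where
    # present[i] is True iff edge E[i] exists in the underlying graph.
    order = sorted(range(len(E)), key=lambda i: c[i])        # cheapest first
    known_present, known_absent, total = set(), set(), 0.0
    for i in order:
        sure = [E[j] for j in known_present]
        maybe = [E[j] for j in range(len(E)) if j not in known_absent]
        # The process ends once s-t connectivity is already determined.
        if reachable(n, sure, s, t) or not reachable(n, maybe, s, t):
            break
        total += c[i]                                        # pay to test edge i
        (known_present if present[i] else known_absent).add(i)
    return total

# Example: s = 0, t = 2, a two-hop path (0,1),(1,2) and a direct edge (0,2).
E = [(0, 1), (1, 2), (0, 2)]
print(greedy_cost(3, E, c=[1.0, 2.0, 5.0], s=0, t=2, present=[True, True, False]))

Averaging greedy_cost over all realizations G weighted by Pr(G) gives Cost(π) for this strategy; on the realization above the expensive edge (0, 2) is not tested because the two cheap edges already certify connectivity.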
Strictly speaking, the greedy algorithm yields non-adaptive strategies, i.e., the strategies it derives do not make decisions based on previous results but test the edges following a predefined order. We can modify it into an adaptive version, which omits the edges that do not affect the s-t connectivity, e.g., the edges that do not lie on any of the s-t paths in the current state. Although the adaptive version of the greedy algorithm has better performance, Theorem 4 holds for both the adaptive and the non-adaptive version.

There are three further notes regarding the properties of the greedy algorithm: (i) although the proof of Theorem 4 is completed by comparing the performance of the greedy algorithm to the minimum certifier cost, the resulting bound is actually tight; (ii) the approximation ratio of an alternative greedy algorithm based on the existence probabilities of edges is far worse than O(|E|); and (iii) the greedy algorithm only considers the testing costs of edges, yet it has a significantly better performance guarantee than some seemingly more sophisticated algorithms that take the existence probabilities of edges into account. The soundness of the above three arguments is supported in Appendix C.

B. Adaptive Submodular Algorithm

To further improve the approximation ratio, we adopt the Q-value approach proposed by [22] for the Stochastic Boolean Function Evaluation (SBFE) problem. Based on it, we exploit the adaptive submodularity [23] of our problem and propose the Adaptive Submodular Algorithm, whose approximation ratio is logarithmic in the number of edges for most uncertain graphs.

1) Preliminaries: The Q-value approach was proposed for the SBFE problem, which is closely related to our problem. In the SBFE problem, we are given a Boolean function f : {0,1}^n → {0,1} that must be evaluated on an unknown input x. Each bit x_i of x can only be determined by paying a cost c_i. The prior probability of each bit being 1 is specified by a probability vector p = (p_1, p_2, ..., p_n), and x is drawn from the corresponding product distribution. The goal is to derive an evaluation strategy minimizing the expected cost. The relation between our problem and SBFE can be established by viewing each edge as a Boolean variable and the function as implicitly given by the s-t connectivity of the uncertain graph. However, as constructing such a function explicitly from a graph may take exponential time, we cannot directly use the algorithms proposed for SBFE problems.

We then introduce some useful definitions for the Q-value approach, adapted to our problem. For two temporary states a, b ∈ S, a is an extension of b, written a ∼ b, if a_i = b_i for all i with b_i ≠ ∗. A function g : S → ℕ is said to be monotone if g(s') − g(s) ≥ 0 for all s ∈ S and all s' ∼ s. g is submodular if g(s · e) − g(s) ≥ g(s' · e) − g(s') and g(s\e) − g(s) ≥ g(s'\e) − g(s') whenever s' ∼ s and s_e = s'_e = ∗. For a state s ∈ S and an edge e with s_e = ∗, the expected marginal gain of the edge at the current state with respect to g is given by p(e)g(s · e) + (1 − p(e))g(s\e) − g(s).
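To make the last definition concrete, here is a small Python sketch (our notation and helper names) of the expected marginal gain, together with a hypothetical gain-per-cost selection rule. The actual utility function g used by our Adaptive Submodular Algorithm is introduced later and is not shown here; the placeholder g below is purely illustrative.

def expected_marginal_gain(g, state, i, p):
    # p(e) g(s·e) + (1 - p(e)) g(s\e) - g(s) for an untested edge i.
    assert state[i] == '*'
    hi = state[:i] + (1,) + state[i + 1:]    # s · e
    lo = state[:i] + (0,) + state[i + 1:]    # s \ e
    return p[i] * g(hi) + (1 - p[i]) * g(lo) - g(state)

def next_edge(g, state, p, c):
    # Hypothetical greedy rule: pick the untested edge with the largest
    # expected marginal gain per unit cost. (Illustrative only; the paper's
    # Adaptive Submodular Algorithm fixes its own g and selection rule.)
    untested = [i for i, x in enumerate(state) if x == '*']
    return max(untested,
               key=lambda i: expected_marginal_gain(g, state, i, p) / c[i])

# Toy usage with a placeholder (monotone, submodular) utility:
# the number of edges tested so far.
g = lambda s: sum(1 for x in s if x != '*')
print(next_edge(g, ('*', 1, '*'), p=[0.3, 0.9, 0.6], c=[1.0, 1.0, 2.0]))

Selecting the item with the maximum expected gain per unit cost is the usual adaptive greedy rule of [23]; the precise rule and utility function used by our algorithm are specified in the sequel.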