6 B.Temporary Tree in General Distribution For a node R,we use NR to denote the regions where nodes might be selected by R as a its neighbor.We use a rectangular Based on the conclusions of tree length in uniform distribu- region as approximate neighbor region N.and N C NR. tion,we further study the case that nodes are non-uniformly The approximate neighbor region is the region marked with distributed.We partition the unit square into k small squares, where m =k1+7 and 0<Y<1.We construct trees among parallel lines in Figure 5.1.We use the method adopted in the proof of Lemma 3 to estimate the length of T. nodes in each square,and then connect nodes in different cells so that all nodes in the network are connected.For each square, It can be observed that the approximate neighbor regions the source is outside the square and we still apply the TST are the same in both cases that we take S as the source and algorithm for the tree construction.Lemma 4 can estimate the that we take S'as the source.There are some details that intra-square edge length,and we study the inter-square edge need to be clarified.Firstly,when we only consider the nodes length in Lemma 5.With both inter-and intra-square edge in approximate neighbor region,the estimated tree length is estimation,we derive upper bound for temporary tree length larger than actual length,because we ignore the nodes that in general distribution. are closer to node R.Secondly,if neighbors exist in the approximate region,the estimations of edge length are the same for both T and T.Thirdly,if no neighbor is found in approximate region for a node,we assume that it doesn't connect to others in T but it connects to the source in T in our calculation.From the analysis above,we can conclude that length of T is upper bounded by the length of T. Also recall that in our proof of Lemma 3,and 5.622vm is the upper bound for temporary tree length wherever S'is.In summary,we can directly use estimated tree length in Lemma 3 as the upper bound of the tree length of T.This completes our proof. Lemma 5:(Inter-square edges)Let m nodes be indepen- So- dently distributed in a unit square with density function f(x). The unit square [0,1]x [0,1]can be partitioned into k square S, cells with edge length of1 where m=k+and0<<1. The length of inter-square edges connecting k cells in the unit Fig.5.1:Approximate neighbor region when the source is square is o(vm). located outside the square Proof:We know that the expected number of nodes in each square cell is greater than e=kYe1.To compute the Lemma 4:(Intra-square edges)Let m nodes be indepen- minimal distance between two nodes in adjacent squares,we dently distributed in a unit square with density function f(x). partition the cell with edge length ofinto smaller grids The source S is located outside the square.Let each node with edge length of where a>. connect to the closest neighbor that has shorter Euclidean We claim that if a-<,the minimal length between two distance to S than the node itself.If no such receiver exists. adjacent cells is in an order of This comes from the it doesn't connect to other nodes.A tree can be constructed observation that we can connect adjacent cells by connecting among m nodes,and the expected length of such a tree is upper bounded by cvm,where c=5.622. nodes in adjacent grids whose edge length is as is shown in Figure 5.2.In this figure,the yellow and the black squares Proof:For those nodes that not closest to S,they can always are two adjacent cells with edge length of The blue grids find another node to connect to.For the node that is closest contained in them are the smaller squares with edge length of to S,it will be connected to by other nodes.We can prove Green lines are used to show that nodes in the adjacent that the topology formed by m nodes is exactly a tree with grids are connected. Lemma 1.We denote this tree as T. As we can see from Figure 5.2,for two adjacent cells with There are two differences of this lemma from Lemma 3. One is that the source is located outside the square,and the edge length of12 pairs of nodes in adjacent grids might exist.Denote P as the probability that a node exists other is that a node won't connect to others when it can't find another one that has shorter Euclidean distance to the source. in a grid with edge length of 1/k.Since the area of each Now we find a point S'that is closest to S in the boundary square is very small,we can regard nodes in the same square uniformly distributed.We have of square region,as is shown in Figure 5.1.With S as the source,a temporary tree as mentioned in TST algorithm can be established spanning all nodes in the network.We denote A=1-1-学 the temporary tree as T.In the following we demonstrate that Denote P2 as the probability that nodes exist in both of the the tree length of T can be used to estimate the upper bound adjacent grids of length of T. P2=1-P
6 B. Temporary Tree in General Distribution Based on the conclusions of tree length in uniform distribution, we further study the case that nodes are non-uniformly distributed. We partition the unit square into k small squares, where m = k 1+γ and 0 < γ < 1. We construct trees among nodes in each square, and then connect nodes in different cells so that all nodes in the network are connected. For each square, the source is outside the square and we still apply the TST algorithm for the tree construction. Lemma 4 can estimate the intra-square edge length, and we study the inter-square edge length in Lemma 5. With both inter- and intra- square edge estimation, we derive upper bound for temporary tree length in general distribution. S S’ R x y Fig. 5.1: Approximate neighbor region when the source is located outside the square Lemma 4: (Intra-square edges) Let m nodes be independently distributed in a unit square with density function f(x). The source S is located outside the square. Let each node connect to the closest neighbor that has shorter Euclidean distance to S than the node itself. If no such receiver exists, it doesn’t connect to other nodes. A tree can be constructed among m nodes, and the expected length of such a tree is upper bounded by c √ m, where c = 5.622. Proof: For those nodes that not closest to S, they can always find another node to connect to. For the node that is closest to S, it will be connected to by other nodes. We can prove that the topology formed by m nodes is exactly a tree with Lemma 1. We denote this tree as T. There are two differences of this lemma from Lemma 3. One is that the source is located outside the square, and the other is that a node won’t connect to others when it can’t find another one that has shorter Euclidean distance to the source. Now we find a point S 0 that is closest to S in the boundary of square region, as is shown in Figure 5.1. With S 0 as the source, a temporary tree as mentioned in TST algorithm can be established spanning all nodes in the network. We denote the temporary tree as T 0 . In the following we demonstrate that the tree length of T 0 can be used to estimate the upper bound of length of T. For a node R, we use NR to denote the regions where nodes might be selected by R as a its neighbor. We use a rectangular region as approximate neighbor region N0 R, and N0 R ⊆ NR. The approximate neighbor region is the region marked with parallel lines in Figure 5.1. We use the method adopted in the proof of Lemma 3 to estimate the length of T. It can be observed that the approximate neighbor regions are the same in both cases that we take S as the source and that we take S 0 as the source. There are some details that need to be clarified. Firstly, when we only consider the nodes in approximate neighbor region, the estimated tree length is larger than actual length, because we ignore the nodes that are closer to node R. Secondly, if neighbors exist in the approximate region, the estimations of edge length are the same for both T and T 0 . Thirdly, if no neighbor is found in approximate region for a node, we assume that it doesn’t connect to others in T but it connects to the source in T 0 in our calculation. From the analysis above, we can conclude that length of T is upper bounded by the length of T 0 . Also recall that in our proof of Lemma 3, and 5.622√ m is the upper bound for temporary tree length wherever S 0 is. In summary, we can directly use estimated tree length in Lemma 3 as the upper bound of the tree length of T. This completes our proof. Lemma 5: (Inter-square edges) Let m nodes be independently distributed in a unit square with density function f(x). The unit square [0, 1] × [0, 1] can be partitioned into k square cells with edge length of √ 1 k , where m = k 1+γ and 0 < γ < 1. The length of inter-square edges connecting k cells in the unit square is o( √ m). Proof: We know that the expected number of nodes in each square cell is greater than m k 1 = k γ 1. To compute the minimal distance between two nodes in adjacent squares, we partition the cell with edge length of √ 1 k into smaller grids with edge length of 1 kα , where α > 1 2 . We claim that if α−γ < 1 2 , the minimal length between two adjacent cells is in an order of o ⑨ √ 1 k ❾ . This comes from the observation that we can connect adjacent cells by connecting nodes in adjacent grids whose edge length is 1 kα , as is shown in Figure 5.2. In this figure, the yellow and the black squares are two adjacent cells with edge length of √ 1 k . The blue grids contained in them are the smaller squares with edge length of 1 kα . Green lines are used to show that nodes in the adjacent grids are connected. As we can see from Figure 5.2, for two adjacent cells with edge length of √ 1 k , k α−1/2 pairs of nodes in adjacent grids might exist. Denote P1 as the probability that a node exists in a grid with edge length of 1/kα. Since the area of each square is very small, we can regard nodes in the same square uniformly distributed. We have P1 = 1 − (1 − 1 k 2α−1 ) m1 k . Denote P2 as the probability that nodes exist in both of the adjacent grids. P2 = 1 − P 2 1
7 There are k-1/2 pairs of nodes in adjacent grids,and we C.Path With Minimal Hops denote P as the probability that at least one pair exist.We Receivers are connected by the minimal-hop path.In this have P=1-P5-a part,we study the relationship between the path length and Euclidean distance between two nodes. Hence we have Lemma 7:Let n nodes be independently and identically distributed over [0,1]x [0,1]with distribution function f(x). Suppose that the Euclidean distance between two nodes u and P-1-1-1-1-)学-w (5-1)vis z.The following properties hold: (a)The expectation of fewest relays that are needed to In order to let k squares connected by inter-square edges,it connect u and v converges to as n approaches oo; should hold that pk1.Therefore,we need the following (b)The length expectation of the path connecting uv and in- condition volving the fewest relays converges to Euclidean distance 1-k72(1-(1-a)学尺 1 (5-2) T. Proof:See Appendix A. The expression that r1()>r2()means that r2()/r1() 0 as k-oo.Condition (5-2)is equivalent to condition(5-3). D.Multicast Tree logk 1 k-1/2+y+ae1 ←2a-i (5-3) We divide the [0,1]x [0,1]network region into k squares.In each square,we construct a tree and connect nodes with intra- Condition(5-3)can be satisfied when a<+.With square edges.Adjacent squares are connected by the inter- <<+we can evaluate P.By (5-1)it can be square edges.All nodes are connected by intra-and inter- verified that square edges,and they can be used to estimate the tree length. P~1-exp(-2e1k/2-a+n-0g2k/2-a), (5-4) Theorem I:Let n nodes be independently and identically distributed in a unit square and their distribution satisfies the It is easy to show that pk1 with the expression (5-4), density function f(x).We construct a multicast tree spanning which means such pairs of nodes exist for all adjacent cells m receivers as well as the source with TST algorithm.When with high probability.Since(1-P)is exponentially decaying m and n are both very large,the expected length of the tree is to zero,the expectation of total path length needed to connect upper bounded by cmf(x)dx,where c=5.622. cells is Proof:Denote ei.j as the edge connecting Receivers i and j in k1-aPk+k1/2k(1-P)~k1-a=o(k1/2) (5-5)the temporary tree Tv,li.j as the length of the minimal-hop path between the two receivers.Since redundant edges will Due to the fact that k =o(m),the expected path length for be eliminated,E(LM)>E(i).And the path length inter-square connection is in the order of o(vm). et,∈Tv converges to Euclidean distance as network size goes to oo according to Lemma 7.So we have: 1/k E(LM)≤cvm Vf(x)dx (5-6) Jx∈0.1]2 Remark:We derive an upper bound for the Toward Source Tree,but it is not a tight bound.In Section VIlI,we will show that TST algorithm has even better empirical performance than our theoretical bound. Lemma 8:Suppose Xi,1<i<oo,are independent random variables with distribution u having compact support in Rd, d 2.If the monotone function satisfies ()~x as x0 for some 0<a<d,then with probability 1 1/2 1/k/2 imn(do)/4M()=c(a,d) f(x)(d-a)/ddx Fig.5.2:Inter-square edges between nodes in adjacent square cells (5-7) Here f denotes the density of the absolutely continuous part of u and c(o,d)denotes a strictly positive constant which Lemma 6:Let m nodes be independently distributed with density function f(x).The expectation for the total length of depends only on the power a and the dimension d [26]. temporary tree E[Lv]is smaller than cvm.We have E[Lv]< Given a graph with some nodes and edges,building a minimal length tree spanning a subset of nodes with relays cvm feto.f(x)dx.where c5.622. appropriately added is formulated as Steiner Tree Problem.If Proof:See Appendix A. no relay nodes are allowed,then the tree with minimal length
7 There are k α−1/2 pairs of nodes in adjacent grids, and we denote P as the probability that at least one pair exist. We have P = 1 − P k α−1/2 2 . Hence we have P = 1 − (1 − (1 − (1 − 1 k 2α−1 ) m1 k ) 2 ) k α−1/2 (5-1) In order to let k squares connected by inter-square edges, it should hold that P k → 1. Therefore, we need the following condition 1 − k − 1 kα−1/2 (1 − (1 − 1 k 2α−1 ) m1 k ) 2 . (5-2) The expression that r1(k) r2(k) means that r2(k)/r1(k) → 0 as k → ∞. Condition (5-2) is equivalent to condition (5-3). log k k−1/2+γ+α1 1 k 2α−1 , (5-3) Condition (5-3) can be satisfied when α < γ + 1 2 . With 1 2 < α < γ + 1 2 , we can evaluate P. By (5-1) it can be verified that P ∼ 1 − exp(−21k 1/2−α+γ − 1 log 2k 1/2−α ). (5-4) It is easy to show that P k → 1 with the expression (5-4), which means such pairs of nodes exist for all adjacent cells with high probability. Since (1−P) is exponentially decaying to zero, the expectation of total path length needed to connect k cells is k 1−αP k + k 1/2 k(1 − P) ∼ k 1−α = o(k 1/2 ) (5-5) Due to the fact that k = o(m), the expected path length for inter-square connection is in the order of o( √ m). 1/kα 1/k1/2 1/k1/2 Fig. 5.2: Inter-square edges between nodes in adjacent square cells Lemma 6: Let m nodes be independently distributed with density function f(x). The expectation for the total length of temporary tree E[LV ] is smaller than c √ m. We have E[LV ] ≤ c √ m ❘ x∈[0,1]2 ➮ f(x)dx, where c ≈ 5.622. Proof: See Appendix A. C. Path With Minimal Hops Receivers are connected by the minimal-hop path. In this part, we study the relationship between the path length and Euclidean distance between two nodes. Lemma 7: Let n nodes be independently and identically distributed over [0, 1] × [0, 1] with distribution function f(x). Suppose that the Euclidean distance between two nodes u and v is x. The following properties hold: (a) The expectation of fewest relays that are needed to connect u and v converges to x r as n approaches ∞; (b) The length expectation of the path connecting uv and involving the fewest relays converges to Euclidean distance x. Proof: See Appendix A. D. Multicast Tree We divide the [0, 1]×[0, 1] network region into k squares. In each square, we construct a tree and connect nodes with intrasquare edges. Adjacent squares are connected by the intersquare edges. All nodes are connected by intra- and intersquare edges, and they can be used to estimate the tree length. Theorem 1: Let n nodes be independently and identically distributed in a unit square and their distribution satisfies the density function f(x). We construct a multicast tree spanning m receivers as well as the source with TST algorithm. When m and n are both very large, the expected length of the tree is upper bounded by c √ m ❘ x∈[0,1]2 ➮ f(x)dx, where c = 5.622. Proof: Denote ei,j as the edge connecting Receivers i and j in the temporary tree TV , li,j as the length of the minimal-hop path between the two receivers. Since redundant edges will be eliminated, E(LM) ≤ P ei,j∈TV E(li,j ). And the path length converges to Euclidean distance as network size goes to ∞ according to Lemma 7. So we have: E(LM) ≤ c √ m ❩ x∈[0,1]2 ➮ f(x)dx. (5-6) Remark: We derive an upper bound for the Toward Source Tree, but it is not a tight bound. In Section VIII, we will show that TST algorithm has even better empirical performance than our theoretical bound. Lemma 8: Suppose Xi , 1 ≤ i < ∞, are independent random variables with distribution µ having compact support in R d , d ≥ 2. If the monotone function ψ satisfies ψ(x) ∼ x α as x → 0 for some 0 < α < d, then with probability 1 limn→∞ n −(d−α)/dM(X1X2, ..., Xn) = c(α, d) ❩ Rd f(x) (d−α)/ddx (5-7) Here f denotes the density of the absolutely continuous part of µ and c(α, d) denotes a strictly positive constant which depends only on the power α and the dimension d [26]. Given a graph with some nodes and edges, building a minimal length tree spanning a subset of nodes with relays appropriately added is formulated as Steiner Tree Problem. If no relay nodes are allowed, then the tree with minimal length