Optimized Storage Placement over Large Scale Sensor Networks Lei Xie,Sanglu Lu,Yingchun Cao,and Daoxu Chen State Key Laboratory of Novel Software Technology Nanjing University.Nanjing,China Ixie@nju.edu.cn,sanglu@nju.edu.cn,yccao@nju.edu.cn,cdx@nju.edu.cn ABSTRACT they can process these queries,filter locally stored raw sen- Data storage has become an important issue for energy effi- sor data,and send out query replies to users[2].In this way cient data management over sensor networks.In this paper, a large amount of raw sensor data can be avoided for trans- we investigate into the optimized storage placement prob- mission such that the overall energy cost for data forwarding lem over large scale sensor networks,aiming to achieve min- is greatly reduced.Based on the above understanding,var- imized energy cost.In order to efficiently deal with the ious data-centric storage schemes [3]have been proposed to large scale deployment area with irregular shape,we pro- sufficiently leverage the sensors'local storage capability to pose to utilize the "hop"as the computation unit instead of achieve energy efficiency. the "node",such that the computation complexity can be greatly reduced.We propose methodologies to solve the op- However,on the other side,it has also been demonstrated timization problem respectively in situations for unlimited not energy-efficient to locally store the collected raw sensor number of storages and limited number of storages.The ul- data in all sensor nodes.This will introduce a large amount timate goal of this paper is to give a fundamental guidance of query diffusion cost,since the queries should be broad- of optimized storage placement for large scale sensor net- casted from the sink to each of the storage nodes.We note works.Simulation results confirm the performance of our that by appropriately deploying storage nodes over sensor methodologies. networks,the heavy load of query diffusion and raw sensor data forwarding can be alleviated,through making appro- Keywords priate tradeoffs between the above“pure push”and“pure pull”schemes[3.Therefore,a“push and pull'”based stor- Data Storage,Optimization,Storage Placement,Sensor Net- age scheme is essential to extract information from sensor work,Large Scale network,while achieving the overall energy efficiency 4 In this scheme,some intermediate "storage nodes"are de- 1 INTRODUCTION ployed over the sensor network,other ordinary nodes called Sensor networks in pervasive computing applications,such "forwarding nodes"just send their raw data upward to these as environment monitoring,health caring and target track- storages nodes along the routing tree,and queries are dif- ing,generate a large amount of data.Generally these data fused to these storage nodes to fetch those filtered sensor are collected from sensor nodes over the sensor network and data as query replies.Based on the above scenario,a chal- the end users retrieve them through diffusing specific queries lenging problem appears,that is how to place the storage from the sink into the network [1.Conventionally both raw nodes over the sensor network such that the overall energy sensor data and queries are continuously generated over the efficiency can be achieved. sensor network.Due to the limited battery power in these sensor nodes,it will greatly increase the overall energy cost To deal with this problem,Sheng et al.have proposed op- by simply forwarding all raw sensor data to the sink,more- timized algorithms based on dynamic programming to solve over,this will make sensor nodes around the sink heavily the storage placement problem in the tree based model 5 used and quickly exhausted in energy.We note that cur- However,in some industrial or research applications where rently some specially designed sensor nodes are equipped sensor networks are widely and densely deployed in a large with larger storage capability than normal sensors,thus they scale approach,it is laborious and time-consuming to com- can store a certain amount of raw sensor data in their local pute the optimized storage placement one by one based on storages.Hence when queries are diffused into these nodes the above algorithm.As a matter of fact,suppose the sensor nodes are uniformly distributed,it is not essential to com- pute the exact optimized storage placement due to the large scale deployment.Therefore,novel approaches should be proposed to simplify the computation of storage placement over the large scale sensor network,which is in irregular shape for general cases.In this paper,we investigate into this problem and propose optimized storage placement in the situations respectively with unlimited number of stor- ages and limited number of storages.Due to the large scale
Optimized Storage Placement over Large Scale Sensor Networks Lei Xie, Sanglu Lu, Yingchun Cao, and Daoxu Chen State Key Laboratory of Novel Software Technology Nanjing University, Nanjing, China lxie@nju.edu.cn, sanglu@nju.edu.cn, yccao@nju.edu.cn, cdx@nju.edu.cn ABSTRACT Data storage has become an important issue for energy effi- cient data management over sensor networks. In this paper, we investigate into the optimized storage placement problem over large scale sensor networks, aiming to achieve minimized energy cost. In order to efficiently deal with the large scale deployment area with irregular shape, we propose to utilize the “hop” as the computation unit instead of the “node”, such that the computation complexity can be greatly reduced. We propose methodologies to solve the optimization problem respectively in situations for unlimited number of storages and limited number of storages. The ultimate goal of this paper is to give a fundamental guidance of optimized storage placement for large scale sensor networks. Simulation results confirm the performance of our methodologies. Keywords Data Storage, Optimization, Storage Placement, Sensor Network, Large Scale 1. INTRODUCTION Sensor networks in pervasive computing applications, such as environment monitoring, health caring and target tracking, generate a large amount of data. Generally these data are collected from sensor nodes over the sensor network and the end users retrieve them through diffusing specific queries from the sink into the network [1]. Conventionally both raw sensor data and queries are continuously generated over the sensor network. Due to the limited battery power in these sensor nodes, it will greatly increase the overall energy cost by simply forwarding all raw sensor data to the sink, moreover, this will make sensor nodes around the sink heavily used and quickly exhausted in energy. We note that currently some specially designed sensor nodes are equipped with larger storage capability than normal sensors, thus they can store a certain amount of raw sensor data in their local storages. Hence when queries are diffused into these nodes, they can process these queries, filter locally stored raw sensor data, and send out query replies to users[2]. In this way, a large amount of raw sensor data can be avoided for transmission such that the overall energy cost for data forwarding is greatly reduced. Based on the above understanding, various data-centric storage schemes [3] have been proposed to sufficiently leverage the sensors’ local storage capability to achieve energy efficiency. However, on the other side, it has also been demonstrated not energy-efficient to locally store the collected raw sensor data in all sensor nodes. This will introduce a large amount of query diffusion cost, since the queries should be broadcasted from the sink to each of the storage nodes. We note that by appropriately deploying storage nodes over sensor networks, the heavy load of query diffusion and raw sensor data forwarding can be alleviated, through making appropriate tradeoffs between the above “pure push” and “pure pull” schemes[3]. Therefore, a “push and pull” based storage scheme is essential to extract information from sensor network, while achieving the overall energy efficiency [4]. In this scheme, some intermediate “storage nodes” are deployed over the sensor network, other ordinary nodes called “forwarding nodes” just send their raw data upward to these storages nodes along the routing tree, and queries are diffused to these storage nodes to fetch those filtered sensor data as query replies. Based on the above scenario, a challenging problem appears, that is how to place the storage nodes over the sensor network such that the overall energy efficiency can be achieved. To deal with this problem, Sheng et al. have proposed optimized algorithms based on dynamic programming to solve the storage placement problem in the tree based model [5]. However, in some industrial or research applications where sensor networks are widely and densely deployed in a large scale approach, it is laborious and time-consuming to compute the optimized storage placement one by one based on the above algorithm. As a matter of fact, suppose the sensor nodes are uniformly distributed, it is not essential to compute the exact optimized storage placement due to the large scale deployment. Therefore, novel approaches should be proposed to simplify the computation of storage placement over the large scale sensor network, which is in irregular shape for general cases. In this paper, we investigate into this problem and propose optimized storage placement in the situations respectively with unlimited number of storages and limited number of storages. Due to the large scale
deployment,we divide the deployment area into separate over large scale sensor networks,due to the huge computa- partitions and leverage the "hop"as the computation unit tion complexity for large scale deployment instead of the "node",which greatly reduces the computing complexity.By utilizing the characteristics of large scale 3. PROBLEM FORMULATION sensor network,our solution greatly simplifies the computa- Without loss of generality,we consider a data collecting sce- tion complexities. nario in a large-scale sensor network as follows:all sensor nodes are densely and uniformly distributed over a deploy- We aim to minimize the overall energy cost in the data accu- ment area,while the sink is set at some specified position mulation and query by appropriately placing storage nodes within the region in advance.The deployment area is re- over the sensor network.With unlimited number of stor- quired to satisfy the following convex property:for any de- ages,we analyze the optimized storage structure over the ployed sensor node.each point on the straight line segment deployment area,and further provide the optimized parame- that joins the sensor node with the sink also lies within the ters for storage deployment according to the irregular shapes deployment area.The exact shape of the deployment area is of the deployment area.With limited number of storages not restricted,either regular or irregular shape is included. we reduce the problem into an unbounded knapsack prob- We classify these sensor nodes into forwarding nodes and lem.Furthermore,we respectively propose dynamic pro- storage nodes,where storage nodes store sensor data and gramming based algorithm and greedy approximation algo- forwarding nodes simply forward their data to those storage rithm to effectively solve this problem. nodes.For ease of presentation,we can also regard the sink node as a storage node.Raw sensor data collected from The rest of this paper is organized as follows,Section II forwarding nodes are continuously sent upward along the reviews the related work.Section III defines problems for routing tree to the first parent storage node they meet,and storage node placement in the aforementioned scenario.Sec- queries are diffused from the sink to all storage nodes to tion IV and Section V respectively present optimized storage fetch the most recent sensed data,query replies will be sent placement methodologies for unlimited number of storages back from storage nodes to the sink along the routing tree and limited number of storages.Section VI conducts per- structure.Fig.1 depicts the data access model with storage formance analysis.Section VII presents simulation results. nodes.Due to the dense and uniform distribution of sensor Section VIII concludes the paper. nodes,for ease of analysis,we assume to deal with a uniform topology structure for the large-scale sensor network,hence 2.RELATED WORK the network is supposed to have a near-symmetric routing There have been a lot of research works for data-centric tree topology over the deployment area,which implies nodes storage schemes over sensor network,which mainly include at the same hop from the sink have similar subtree topolo- structured storage scheme and unstructured storage scheme gies. Structured storage scheme mainly utilizes a mapping func- tion such as geographic hash function to map event infor- mation to specified geographic points,and leverages routing algorithms like GPSR 6 to route messages to corresponding storage nodes,research works including GHT 6,DIMEN- SIONS [7]and DIM [8]belong to this scheme. Unstruc- tured storage scheme mainly utilizes the "push-pull"scheme to disseminate and gather information,research works in- cluding Double Ruling [9.Comb-Needles 10 belong to this scheme.Fang et.al.[11 propose a hybrid storage scheme to implement the event storage and query.Trigoni et.al. Forwarding Nodes Reply 12 presents a hybrid push and pull scheme for tree based Raw Dat Storage Nedes● structure where queries are injected from the sink.Ahn et.al.[13]conduct fundamental analysis over the scalability Figure 1:Data access model with storage nodes, of these two kinds of data-centric storage schemes. the black nodes denote the storage nodes while the white nodes denote the forwarding nodes. Recent research works focus on the storage placement prob- lem to enhance energy efficiency and load balance.Zhang We define the following parameters for the above scenario: et.al.14 propose a ring based index structure to help re- for data generation,we define rd as data generate rate and sd duce energy consumption of storage schemes.KDDCS [15] as the raw data size;for query diffusion,we use re to denote presents a load-balanced storage scheme for sensor networks query rate and sa to denote query size;for query reply,we based on K-D tree.Sheng et.al.present optimized algo- use a to denote reply reduction rate,for input t,which is rithms based on dynamic programming to solve the storage the size of raw data generated by a set of nodes,function placement problem over fixed tree model 516.moreover f(r)=ox for a E (0,1 returns the size of processed data they propose an approximation algorithm to solve the stor- as query replies. age placement problem over randomized dynamic tree model 17.Their dynamic programming methodology must save a For the large scale deployed sensor network,the objective is large amount of intermediate state parameters,and compute to place storage sensor nodes (and hence forwarding sensor from bottom to top along the tree based routing structure nodes)over the deployment area with irregular shape such While achieving a fair degree of accuracy,their optimized that the overall energy cost is minimized.Then the prob- solutions.however,is not suitable for the storage planning lems to solve are as follows:(1)Given unlimited storage
deployment, we divide the deployment area into separate partitions and leverage the “hop” as the computation unit instead of the “node”, which greatly reduces the computing complexity. By utilizing the characteristics of large scale sensor network, our solution greatly simplifies the computation complexities. We aim to minimize the overall energy cost in the data accumulation and query by appropriately placing storage nodes over the sensor network. With unlimited number of storages, we analyze the optimized storage structure over the deployment area, and further provide the optimized parameters for storage deployment according to the irregular shapes of the deployment area. With limited number of storages, we reduce the problem into an unbounded knapsack problem. Furthermore, we respectively propose dynamic programming based algorithm and greedy approximation algorithm to effectively solve this problem. The rest of this paper is organized as follows, Section II reviews the related work. Section III defines problems for storage node placement in the aforementioned scenario. Section IV and Section V respectively present optimized storage placement methodologies for unlimited number of storages and limited number of storages. Section VI conducts performance analysis. Section VII presents simulation results. Section VIII concludes the paper. 2. RELATED WORK There have been a lot of research works for data-centric storage schemes over sensor network, which mainly include structured storage scheme and unstructured storage scheme. Structured storage scheme mainly utilizes a mapping function such as geographic hash function to map event information to specified geographic points, and leverages routing algorithms like GPSR [6] to route messages to corresponding storage nodes, research works including GHT [6], DIMENSIONS [7] and DIM [8] belong to this scheme. Unstructured storage scheme mainly utilizes the “push-pull” scheme to disseminate and gather information, research works including Double Ruling [9], Comb-Needles [10] belong to this scheme. Fang et.al. [11] propose a hybrid storage scheme to implement the event storage and query. Trigoni et.al. [12] presents a hybrid push and pull scheme for tree based structure where queries are injected from the sink. Ahn et.al. [13] conduct fundamental analysis over the scalability of these two kinds of data-centric storage schemes. Recent research works focus on the storage placement problem to enhance energy efficiency and load balance. Zhang et. al. [14] propose a ring based index structure to help reduce energy consumption of storage schemes. KDDCS [15] presents a load-balanced storage scheme for sensor networks based on K-D tree. Sheng et. al. present optimized algorithms based on dynamic programming to solve the storage placement problem over fixed tree model [5][16], moreover, they propose an approximation algorithm to solve the storage placement problem over randomized dynamic tree model [17]. Their dynamic programming methodology must save a large amount of intermediate state parameters, and compute from bottom to top along the tree based routing structure. While achieving a fair degree of accuracy, their optimized solutions, however, is not suitable for the storage planning over large scale sensor networks, due to the huge computation complexity for large scale deployment. 3. PROBLEM FORMULATION Without loss of generality, we consider a data collecting scenario in a large-scale sensor network as follows: all sensor nodes are densely and uniformly distributed over a deployment area, while the sink is set at some specified position within the region in advance. The deployment area is required to satisfy the following convex property: for any deployed sensor node, each point on the straight line segment that joins the sensor node with the sink also lies within the deployment area. The exact shape of the deployment area is not restricted, either regular or irregular shape is included. We classify these sensor nodes into forwarding nodes and storage nodes, where storage nodes store sensor data and forwarding nodes simply forward their data to those storage nodes. For ease of presentation, we can also regard the sink node as a storage node. Raw sensor data collected from forwarding nodes are continuously sent upward along the routing tree to the first parent storage node they meet, and queries are diffused from the sink to all storage nodes to fetch the most recent sensed data, query replies will be sent back from storage nodes to the sink along the routing tree structure. Fig.1 depicts the data access model with storage nodes. Due to the dense and uniform distribution of sensor nodes, for ease of analysis, we assume to deal with a uniform topology structure for the large-scale sensor network, hence the network is supposed to have a near-symmetric routing tree topology over the deployment area, which implies nodes at the same hop from the sink have similar subtree topologies. Figure 1: Data access model with storage nodes, the black nodes denote the storage nodes while the white nodes denote the forwarding nodes. We define the following parameters for the above scenario: for data generation, we define rd as data generate rate and sd as the raw data size; for query diffusion, we use rq to denote query rate and sq to denote query size; for query reply, we use α to denote reply reduction rate, for input x, which is the size of raw data generated by a set of nodes, function f(x) = αx for α ∈ (0, 1] returns the size of processed data as query replies. For the large scale deployed sensor network, the objective is to place storage sensor nodes (and hence forwarding sensor nodes) over the deployment area with irregular shape such that the overall energy cost is minimized. Then the problems to solve are as follows: (1) Given unlimited storage
nodes,how to deploy storage nodes so as to attain the min PROOF.According to conclusion from [5],all storage nodes imum overall energy cost?(2)Given limited storage nodes have storage ancestors over the tree in the optimal solution (number of storage nodes=K),how to deploy storage nodes thus we can deduce the optimized storage placement struc- so as to attain the minimum overall energy cost?Most im- ture is a storage area surrounding the sink node.As the portant of all,the corresponding solutions are required to be network topology in the unit zone is uniform and symmet- scalable to the large scale deployment in computation com- ric,we can prove the storage area is a fan-shaped area by plexity,such that the optimized storage placement can be contradiction.As Fig.3 shows,suppose the storage area in figured out in an efficient approach. a unit zone is in an irregular shape instead of a regular fan shape,as the gray area depicts.Inside the storage area,we 4. UNLIMITED NUMBER OF STORAGES can find a fan-shaped region R with the maximum radius When the storage capability over each sensor node is large then we randomly choose a storage node outside R,we de- enough to hold some raw sensor data for further usage,then note it as node A.Due to the symmetry of the topology any sensor node has the opportunity to become a storage any node with the same hop of node A,e.g.,node B,must node,thus we can suppose to have unlimited number of have the same opportunity to be storage nodes in the op- timal solution.However,node B is set to be a forwarding storages,every sensor node over the routing tree structure can switch to the role of storage node if necessary node in the former hypothesis.Therefore we get a contradic- tion against the former hypothesis,hence the theorem gets proved.☐ 4.1 Optimized Storage Placement Structure Before analyzing on the optimized storage placement,we give the definition of unit zone.A unit zone is a fan-shaped Sink region rooted from the sink within the deployment area,it has a complete tree based topology with the minimum gran- ularity,which we called unit topology.Fig.2 illustrates an example of the unit zones within the gray deployment area. Due to the irregular shape of the deployment area,the max- imum hop varies among the unit zones,but the degree 0 is equivalent for all unit zones.Therefore,the overall deploy- ment area can be divided into a certain number of unit zones and the whole topology is composed of these corresponding unit topologies.The optimal storage placement inside each Forwarding Area unit zone is independent with each other among the unit zones. Figure 3:Proof of Theorem 1,we prove it by con- tradiction. Unit Zone ●、 5 4.2 Calculating Optimized Parameters According to the conclusion from [5]:If ara rd,then the optimal tree with the minimum energy cost contains only forwarding nodes (except for the root).If ara <rd,there erits a storage area for the optimal tree.,we know that if ara 2 rd,the optimized solution includes no storage nodes. Therefore in the remaining of this paper we only consider the are<rd situation.According to Theorem 1,it is proved that the optimized storage structure inside each unit zone is a fan-shaped area around the sink,there must exist an opti- Figure 2:Unit zone mized hop-count kopt for the fan-shaped area,where sensors within the kopt hops are storage nodes,and sensors outside We rely on the following conclusion to obtain the optimized the kopt hops are forwarding nodes.Hence kopt actually de- storage structure,which is first proposed by 5:In the op- notes the boundary of the storage area.Next we will calcu- timal tree,if si is a forwarding node,all of its descendants late the optimized parameter kop for the optimized storage are forwarding nodes as well.If si is a storage node,all area.Without loss of generality,in the rest of this paper its ancestors are storage nodes as well.According to this we utilize the equal sign "="to denote the expected value conclusion,we have Theorem 1. of specified variables. Assume the degree of a unit zone is 0(0<6<),the THEOREM 1.With unlimited number of storages,the op- maximum hop of a specified unit topology is n,the average timized placement structure for storage nodes in the unit hop distance is d,and the network node density is p,thus the zone is a fan-shaped area,where sensors inside the fan- number of sensor nodes for the ith hop is N =0(2i-1)d2p. shaped area are storage nodes,and others are forwarding Now we analyze the energy consumption model.To transmit nodes. s data units,the energy costs of the sender and receiver are
nodes, how to deploy storage nodes so as to attain the minimum overall energy cost? (2) Given limited storage nodes (number of storage nodes= K), how to deploy storage nodes so as to attain the minimum overall energy cost? Most important of all, the corresponding solutions are required to be scalable to the large scale deployment in computation complexity, such that the optimized storage placement can be figured out in an efficient approach. 4. UNLIMITED NUMBER OF STORAGES When the storage capability over each sensor node is large enough to hold some raw sensor data for further usage, then any sensor node has the opportunity to become a storage node, thus we can suppose to have unlimited number of storages, every sensor node over the routing tree structure can switch to the role of storage node if necessary. 4.1 Optimized Storage Placement Structure Before analyzing on the optimized storage placement, we give the definition of unit zone. A unit zone is a fan-shaped region rooted from the sink within the deployment area, it has a complete tree based topology with the minimum granularity, which we called unit topology. Fig.2 illustrates an example of the unit zones within the gray deployment area. Due to the irregular shape of the deployment area, the maximum hop varies among the unit zones, but the degree θ is equivalent for all unit zones. Therefore, the overall deployment area can be divided into a certain number of unit zones and the whole topology is composed of these corresponding unit topologies. The optimal storage placement inside each unit zone is independent with each other among the unit zones. Unit Zone Figure 2: Unit zone We rely on the following conclusion to obtain the optimized storage structure, which is first proposed by [5]: In the optimal tree, if si is a forwarding node, all of its descendants are forwarding nodes as well. If si is a storage node, all its ancestors are storage nodes as well. According to this conclusion, we have Theorem 1. Theorem 1. With unlimited number of storages, the optimized placement structure for storage nodes in the unit zone is a fan-shaped area, where sensors inside the fanshaped area are storage nodes, and others are forwarding nodes. Proof. According to conclusion from [5], all storage nodes have storage ancestors over the tree in the optimal solution, thus we can deduce the optimized storage placement structure is a storage area surrounding the sink node. As the network topology in the unit zone is uniform and symmetric, we can prove the storage area is a fan-shaped area by contradiction. As Fig.3 shows, suppose the storage area in a unit zone is in an irregular shape instead of a regular fan shape, as the gray area depicts. Inside the storage area, we can find a fan-shaped region R with the maximum radius, then we randomly choose a storage node outside R, we denote it as node A. Due to the symmetry of the topology, any node with the same hop of node A, e.g., node B, must have the same opportunity to be storage nodes in the optimal solution. However, node B is set to be a forwarding node in the former hypothesis. Therefore we get a contradiction against the former hypothesis, hence the theorem gets proved. Figure 3: Proof of Theorem 1, we prove it by contradiction. 4.2 Calculating Optimized Parameters According to the conclusion from [5]: If αrq ≥ rd, then the optimal tree with the minimum energy cost contains only forwarding nodes (except for the root). If αrq < rd, there exits a storage area for the optimal tree., we know that if αrq ≥ rd, the optimized solution includes no storage nodes. Therefore in the remaining of this paper we only consider the αrq < rd situation. According to Theorem 1, it is proved that the optimized storage structure inside each unit zone is a fan-shaped area around the sink, there must exist an optimized hop-count kopt for the fan-shaped area, where sensors within the kopt hops are storage nodes, and sensors outside the kopt hops are forwarding nodes. Hence kopt actually denotes the boundary of the storage area. Next we will calculate the optimized parameter kopt for the optimized storage area. Without loss of generality, in the rest of this paper we utilize the equal sign “=” to denote the expected value of specified variables. Assume the degree of a unit zone is θ(0 < θ < π), the maximum hop of a specified unit topology is n, the average hop distance is d, and the network node density is ρ, thus the number of sensor nodes for the ith hop is Ni = θ(2i−1)d 2 ρ. Now we analyze the energy consumption model. To transmit s data units, the energy costs of the sender and receiver are
etrs and ere.s,where etr and ere are the energy costs Therefore the overall energy consumption can be depicted for transmitting and receiving a unit data respectively,and as: etr is also relevant to the distance between the sender and receiver.For simplicity of presentation,the receiving energy Etotal(k)= C[(E,+Eg)·N+E,·Nk+ (Ea·N) cost is assigned to the transmitter without changing the total energy cost.Here we use Si to denote an arbitrary sensor (5) node in the ith hop,when sensor Si in the ith hop sends s And we let data units to Si in the jth hop,the energy cost of S;is 0, Rr=rqasd and the energy consumed by Si is: Rd rasd (etr +ere)s if Si is parent of Si Ro =Tqsa l(etr+ere·c),s if Sj is one of S.'s children· (1) To calculate the optimal hop count kopt for minimized over- In the above equation ci is the number of Si's children.For all energy consumption,we obtain the following discussion,we normalize the energy costs by OEtotal(k) =0→ (etr +ere)for ease of presentation.Thus,to transfer s data 8k units from Si to its parent,the transmitting energy will be (Ra-Rr).k2+(Rr-Ra+2Rg).k+C=0, s and to broadcast s data units to its children,sensor S will consume bis energy,where bi=Based on where tr十er the assumption of near-symmetric topology we can denote ci as the average number of children for sensor nodes in the C=1-n(R-R)+。rRg 6 ith hop,where c.And we set Ti as the Thus kopt has two possible solutions average number of sensor nodes in the subtree for any node in the ith hop (including the ith hop node itself),thus we =BB-2)+d=B-B-2)-4 have: 2(Ra-Rr) 2(Ra-R,】 =1+=1+2- where 2i-1 6=V(R,-Ra+2Rg)2-4C(Ra-R,) As we deduce from ar<ra there exists Rd>Rr,therefore As we know the energy cost Ei for a sensor node in the ith we get: hop consists of three types of energy costs: Ra 6 .1 Ei=Eq+Er+Ed. =互--R-2Ra-R<2 In the above equation,Eg denotes energy cost for query As we have kopt >1 we obsolete k2 and finally obtain the diffusion,Er denotes energy cost for query reply,and Ea optimal hop count: denotes energy cost for raw data forwarding.Hence based on the above energy model,we calculate the energy con- kopt (Ra-R-2Ra)+8 (6) 2(Ra-R) sumption for each ith hop,here we set the sink at the Oth hop. According to Eq.(6),it is known that the optimal hop We use k to denote the last hop of the storage node area. count for the storage area inside the unit zone,i.e.kopt, depends on the maximum hop count of the corresponding For storage nodes within the kth hop (1<i<k )we have: unit topology,i.e.n.Furthermore,it can be inferred that Er rqa Tisd the larger value of n is,the larger value of kopt can be ob- Ed=0 (2) tained.Based on the above analysis,if there are unlimited Ea birasa number of storages,we can compute the optimized storage placement as follows:For a specified deployment area with Here we set the reply data size as aTisd for the snapshot irregular shape,it can be approximately divided into a cer- query only fetches the most recent sensing data,and we set tain number of fan-shaped partitions according to the irreg- E=0 because although the storage nodes generate raw ular shape.Each fan-shaped partition is rooted from the data locally,they don't need to forward these data until sink and must contain no less than one unit zones.More- they're sent out as query replies.For storage nodes right in over,the fan-shaped partition is relatively regular in shape the kth hop,(i=k),we have: as compared to the overall deployment area,thus we can av- erage thethe hop counts of leaf nodes inside the fan-shaped Er Tga Tk sd partition and approximate the maximum hop count as n. Ed=0 (3) Therefore,we can approximately compute the optimal hop E=0 count Kopt for the storage area respectively for each fan- Here we set Eg =0 because the last hop of storage nodes shaped partition.Then,by putting the pieces together,the need not to diffuse queries to their forwarding node children. whole storage area can be obtained.Fig.4 illustrates the ba- For forwarding nodes outside the kth hop (k+1si<n): sic idea of the method.Theoretically,when the deployment area is divided into more refined partitions,the solution is Er=0 closer to the ideal optimal storage structure.In extreme Ed Tirasd. (4) case,when each of the fan-shaped partitions is exactly a E=0 unit zone,the optimal storage structure can be obtained
etr · s and ere · s, where etr and ere are the energy costs for transmitting and receiving a unit data respectively, and etr is also relevant to the distance between the sender and receiver. For simplicity of presentation, the receiving energy cost is assigned to the transmitter without changing the total energy cost. Here we use Si to denote an arbitrary sensor node in the ith hop, when sensor Si in the ith hop sends s data units to Sj in the jth hop, the energy cost of Sj is 0, and the energy consumed by Si is: (etr + ere) · s if Sj is parent of Si (etr + ere · ci) · s if Sj is one of Si’s children . (1) In the above equation ci is the number of Si’s children. For the following discussion, we normalize the energy costs by (etr + ere) for ease of presentation. Thus, to transfer s data units from Si to its parent, the transmitting energy will be s and to broadcast s data units to its children, sensor Si will consume bi · s energy, where bi = etr+ere·ci etr+ere . Based on the assumption of near-symmetric topology we can denote ci as the average number of children for sensor nodes in the ith hop, where ci = Ni+1 Ni = 2i+1 2i−1 . And we set |Ti| as the average number of sensor nodes in the subtree for any node in the ith hop (including the ith hop node itself), thus we have: |Ti| = 1 + Pn i+1 Nj Ni = 1 + n 2 − i 2 2i − 1 . As we know the energy cost Ei for a sensor node in the ith hop consists of three types of energy costs: Ei = Eq + Er + Ed. In the above equation, Eq denotes energy cost for query diffusion, Er denotes energy cost for query reply, and Ed denotes energy cost for raw data forwarding. Hence based on the above energy model, we calculate the energy consumption for each ith hop, here we set the sink at the 0th hop. We use k to denote the last hop of the storage node area. For storage nodes within the kth hop (1 < i < k ), we have: Er = rqα|Ti|sd Ed = 0 Eq = birqsq . (2) Here we set the reply data size as α|Ti|sd for the snapshot query only fetches the most recent sensing data, and we set Ed = 0 because although the storage nodes generate raw data locally, they don’t need to forward these data until they’re sent out as query replies. For storage nodes right in the kth hop, (i = k), we have: Er = rqα|Tk|sd Ed = 0 Eq = 0 . (3) Here we set Eq = 0 because the last hop of storage nodes need not to diffuse queries to their forwarding node children. For forwarding nodes outside the kth hop (k + 1 ≤ i ≤ n): Er = 0 Ed = |Ti|rdsd Eq = 0 . (4) Therefore the overall energy consumption can be depicted as: Etotal(k) = kX−1 i=1 [(Er + Eq) · Ni] + Er · Nk + Xn i=k+1 (Ed · Ni). (5) And we let Rr = rqαsd Rd = rdsd Rq = rqsq . To calculate the optimal hop count kopt for minimized overall energy consumption, we obtain ∂Etotal(k) ∂k = 0 ⇒ (Rd − Rr) · k 2 + (Rr − Rd + 2Rq) · k + C = 0, where C = 1 − n 2 6 (Rd − Rr) + −2etr ere + etr Rq. Thus kopt has two possible solutions k1 = (Rd − Rr − 2Rq) + δ 2(Rd − Rr) , k2 = (Rd − Rr − 2Rq) − δ 2(Rd − Rr) , where δ = p (Rr − Rd + 2Rq) 2 − 4C(Rd − Rr). As we deduce from αrq < rd there exists Rd > Rr, therefore we get: k2 = 1 2 − Rq Rd − Rr − δ 2(Rd − Rr) < 1 2 . As we have kopt ≥ 1 we obsolete k2 and finally obtain the optimal hop count: kopt = (Rd − Rr − 2Rq) + δ 2(Rd − Rr) . (6) According to Eq. (6), it is known that the optimal hop count for the storage area inside the unit zone, i.e. kopt, depends on the maximum hop count of the corresponding unit topology, i.e. n. Furthermore, it can be inferred that the larger value of n is, the larger value of kopt can be obtained. Based on the above analysis, if there are unlimited number of storages, we can compute the optimized storage placement as follows: For a specified deployment area with irregular shape, it can be approximately divided into a certain number of fan-shaped partitions according to the irregular shape. Each fan-shaped partition is rooted from the sink and must contain no less than one unit zones. Moreover, the fan-shaped partition is relatively regular in shape as compared to the overall deployment area, thus we can average the the hop counts of leaf nodes inside the fan-shaped partition and approximate the maximum hop count as nb. Therefore, we can approximately compute the optimal hop count kopt for the storage area respectively for each fanshaped partition. Then, by putting the pieces together, the whole storage area can be obtained. Fig.4 illustrates the basic idea of the method. Theoretically, when the deployment area is divided into more refined partitions, the solution is closer to the ideal optimal storage structure. In extreme case, when each of the fan-shaped partitions is exactly a unit zone, the optimal storage structure can be obtained
7000 6000 5000 4000 3000 2000 1000 10 15 The value of maximum hops Figure 5:Compute coverage area as the maximum hop count increases (d=3m) Figure 4:Compute optimized storage structure with unlimited storage case.We denote these nodes as S.Then as we increase the number of storages from K to K*,the nodes in S will obvi- 5.LIMITED NUMBER OF STORAGES ously switch from storage nodes to forwarding nodes,being In some situations we observe that not all sensor nodes have replaced by other storage nodes within the koptth hop inside each unit zone.During the process,we denote those nodes enough local storages because more storage will come along which switch from forwarding nodes to storage nodes as R. with increased hardware cost,we are only allowed to deploy Based on the above analysis.due to the optimization effect a limited number of specially designed storage nodes for en- ergy efficiency.When dealing with the storage placement of the new storage structure,we obtain the energy reduction AE(R)<AE(S).Due to the increment of K,we know that problem with limited number of storages,it means that the the size of R is larger than the size of S.According to the number of storages,K,is smaller than the optimal num- above inequality,with K limited number of storages,we can ber of storages,K*,for unlimited storages,otherwise it can propose a better optimized solution as compared to the cur- be reduced to the unlimited storage problem.As described rent solution:switch part (or:all)of the storage nodes in S above we only consider the arg<rd situation. to forwarding nodes,and switch equal number of nodes in the R to be new storage nodes,we can rebuild part of the 5.1 Optimized Storage Placement Structure new storage structure and thus achieve better performance. We first give the definition of energy reduction.The energy Hence we get the contradiction with the former assumption, reduction of a node Si means the changed energy cost for S the theorem gets proved. to switch from a forwarding node to a storage node,while the roles of other nodes remain unchanged,we denote it as AE(S).Therefore,when a node Si is selected as a storage As indicated in Theorem 2,it can be inferred that in the node in the optimal solution,it will always satisfy AE(Si)< optimal storage placement with limited number of storages. 0,and its absolute value AE(Si)denotes the amount of the storage node set is a subset of the storage nodes in the energy cost reduction.For ease of representation,in the optimal storage placement with unlimited number of stor- following part we also use the energy reduction AE(S)to ages.Therefore,the solution space for the optimized storage express the changed energy cost for a bunch of nodes S to placement is greatly reduced.Moreover,as the deployment switch from forwarding nodes to storage nodes.Similar to area is divided into a certain number of fan-shaped parti- the strategy proposed in unlimited number of storages,as tions,each fan-shaped partition is fairly regular in shape, Fig.4 illustrates,we divide the deployment area in irregular then nodes in the same hop should have the same opportu- shape into a certain number of fan-shaped partitions.Then nity to be storage nodes or forwarding nodes.Hence we can we compute the optimized storage placement inside each utilize the hop as the computation unit instead of the node partition.The optimized storage placement relies on the then the compute complexity is further reduced.As a matter following conclusion: of fact,even for a large scale sensor network,the maximum hop is very limited.Fig.5 illustrates the coverage area of THEOREM 2.In the optimal storage placement with lim- a conventional sensor network as the maximum hop count ited number of storages,the storage nodes are all within the increases.Here we set the average hop distance d 3m. optimal storage area with unlimited number of storages. We note that as the maximum hop count reaches 15,the coverage area is already near 6500 m2,which is a rather large scale deployment for conventional sensor network ap- PROOF.We prove this theorem by contradiction.Sup- plications.Thus the maximum hop count is smaller than pose in the optimal storage placement with unlimited num- 15 for conventional cases.Therefore,with limited number ber of storages,the essential number of storages is K.As- of storages,it is practical to enumerate all possible storage sume in the optimal solution with K limited number of stor- placement solutions for any unit zone and compare their ages,there exist some storage nodes outside the boundary of performance.The size of solution space is s=2opt,as the optimized storage area calculated in unlimited storage conventionally kopt 15,s is smaller than 215 =32768
Figure 4: Compute optimized storage structure with unlimited storage 5. LIMITED NUMBER OF STORAGES In some situations we observe that not all sensor nodes have enough local storages because more storage will come along with increased hardware cost, we are only allowed to deploy a limited number of specially designed storage nodes for energy efficiency. When dealing with the storage placement problem with limited number of storages, it means that the number of storages, K, is smaller than the optimal number of storages, K∗ , for unlimited storages, otherwise it can be reduced to the unlimited storage problem. As described above we only consider the αrq < rd situation. 5.1 Optimized Storage Placement Structure We first give the definition of energy reduction. The energy reduction of a node Si means the changed energy cost for Si to switch from a forwarding node to a storage node, while the roles of other nodes remain unchanged, we denote it as ∆E(Si). Therefore, when a node Si is selected as a storage node in the optimal solution, it will always satisfy ∆E(Si) < 0, and its absolute value |∆E(Si)| denotes the amount of energy cost reduction. For ease of representation, in the following part we also use the energy reduction ∆E(S) to express the changed energy cost for a bunch of nodes S to switch from forwarding nodes to storage nodes. Similar to the strategy proposed in unlimited number of storages, as Fig.4 illustrates, we divide the deployment area in irregular shape into a certain number of fan-shaped partitions. Then we compute the optimized storage placement inside each partition. The optimized storage placement relies on the following conclusion: Theorem 2. In the optimal storage placement with limited number of storages, the storage nodes are all within the optimal storage area with unlimited number of storages. Proof. We prove this theorem by contradiction. Suppose in the optimal storage placement with unlimited number of storages, the essential number of storages is K∗ . Assume in the optimal solution with K limited number of storages, there exist some storage nodes outside the boundary of the optimized storage area calculated in unlimited storage 0 5 10 15 0 1000 2000 3000 4000 5000 6000 7000 The value of maximum hops The coverage area (m 2 ) Figure 5: Compute coverage area as the maximum hop count increases (d=3m) case. We denote these nodes as S. Then as we increase the number of storages from K to K∗ , the nodes in S will obviously switch from storage nodes to forwarding nodes, being replaced by other storage nodes within the koptth hop inside each unit zone. During the process, we denote those nodes which switch from forwarding nodes to storage nodes as R. Based on the above analysis, due to the optimization effect of the new storage structure, we obtain the energy reduction ∆E(R) < ∆E(S). Due to the increment of K, we know that the size of R is larger than the size of S. According to the above inequality, with K limited number of storages, we can propose a better optimized solution as compared to the current solution: switch part (or: all) of the storage nodes in S to forwarding nodes, and switch equal number of nodes in the R to be new storage nodes, we can rebuild part of the new storage structure and thus achieve better performance. Hence we get the contradiction with the former assumption, the theorem gets proved. As indicated in Theorem 2, it can be inferred that in the optimal storage placement with limited number of storages, the storage node set is a subset of the storage nodes in the optimal storage placement with unlimited number of storages. Therefore, the solution space for the optimized storage placement is greatly reduced. Moreover, as the deployment area is divided into a certain number of fan-shaped partitions, each fan-shaped partition is fairly regular in shape, then nodes in the same hop should have the same opportunity to be storage nodes or forwarding nodes. Hence we can utilize the hop as the computation unit instead of the node, then the compute complexity is further reduced. As a matter of fact, even for a large scale sensor network, the maximum hop is very limited. Fig.5 illustrates the coverage area of a conventional sensor network as the maximum hop count increases. Here we set the average hop distance d = 3m. We note that as the maximum hop count reaches 15, the coverage area is already near 6500 m2 , which is a rather large scale deployment for conventional sensor network applications. Thus the maximum hop count is smaller than 15 for conventional cases. Therefore, with limited number of storages, it is practical to enumerate all possible storage placement solutions for any unit zone and compare their performance. The size of solution space is s = 2kopt , as conventionally kopt ≤ 15, s is smaller than 215 = 32768