1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2015.2425403, IEEE Transactions on Parallel and Distributed Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. -, NO. -, MONTH YEAR

Burstiness-Aware Resource Reservation for Server Consolidation in Computing Clouds

Sheng Zhang, Member, IEEE, Zhuzhong Qian, Member, IEEE, Zhaoyi Luo, Jie Wu, Fellow, IEEE, and Sanglu Lu, Member, IEEE

Abstract—In computing clouds, burstiness of a virtual machine (VM) workload widely exists in real applications, where spikes usually occur aperiodically with low frequency and short duration. This could be effectively handled through dynamically scaling up/down in a virtualization-based computing cloud; however, to minimize energy consumption, VMs are often highly consolidated with the minimum number of physical machines (PMs) used. In this case, to meet the dynamic runtime resource demands of VMs in a PM, some VMs have to be migrated to other PMs, which may cause potential performance degradation. In this paper, we investigate the burstiness-aware server consolidation problem from the perspective of resource reservation, i.e., reserving a certain amount of extra resources on each PM to avoid live migrations, and propose a novel server consolidation algorithm, QUEUE. We first model the resource requirement pattern of each VM as a two-state Markov chain to capture burstiness; then we design a resource reservation strategy for each PM based on the stationary distribution of the Markov chain. Finally, we present QUEUE, a complete server consolidation algorithm with a reasonable time complexity.
We also show how to cope with heterogeneous spikes and provide remarks on several extensions. Simulation and testbed results show that QUEUE improves the consolidation ratio by up to 45% with large spike sizes and around 30% with normal spike sizes compared with the strategy that provisions for peak workload, and achieves a better balance between performance and energy consumption in comparison with other commonly-used consolidation algorithms.

Index Terms—Bursty workload, Markov chain, resource reservation, server consolidation, stationary distribution

• S. Zhang, Z.Z. Qian, and S.L. Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: {sheng, qzz, sanglu}@nju.edu.cn.
• Z.Y. Luo is with the David Cheriton School of Computer Science, University of Waterloo, Waterloo, N2L 3G1, Canada. E-mail: zhaoyi.luo@uwaterloo.ca.
• J. Wu is with the Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA. E-mail: jiewu@temple.edu.

1 INTRODUCTION

Cloud computing has been gaining more and more traction in the past few years, and it is changing the way we access and retrieve information [1]. The recent emergence of the virtual desktop [2] has further elevated the importance of computing clouds. As a crucial technique in modern computing clouds, virtualization enables one physical machine (PM) to host many performance-isolated virtual machines (VMs). This greatly benefits a computing cloud, where VMs running various applications are aggregated together to improve resource utilization. It has been shown in previous work [3] that the cost of energy consumption, e.g., power supply and cooling, occupies a significant fraction of the total operating costs in a cloud. Therefore, making optimal use of the underlying resources to reduce energy consumption is becoming an important issue [4], [5]. To cut back energy consumption in clouds, server consolidation has been proposed to tightly pack VMs and reduce the number of running PMs; however, VM performance may be seriously affected if VMs are not appropriately placed, especially in a highly consolidated cloud.

We observed that the variability and burstiness of VM workloads widely exist in modern computing clouds, as evidenced in prior studies [4], [6], [7], [8], [9]. Take a typical web server, for example: burstiness may be caused by a flash crowd with bursty incoming requests. VMs should be provisioned with resources commensurate with their workload requirements [10], which becomes more complex when considering workload variation. As shown in Fig. 1, two kinds of resource provisioning strategies are commonly used to deal with workload burstiness: provisioning for peak workload and provisioning for normal workload. Provisioning for peak workload is favourable for guaranteeing VM performance, but it undermines the advantage of elasticity from virtualization and may lead to low resource utilization [1], [8], [9].

In contrast, provisioning for normal workload makes use of elasticity in cloud computing. In this case, to meet the dynamic resource requirements of VMs, local resizing and live migration are the two pervasively-used methods. Local resizing adaptively adjusts VM configuration according to the real-time resource requirement with negligible time and computing overheads [11]. On the other hand, live migration moves some VM(s) to a relatively idle PM when local resizing is not able to allocate enough resources. However, in a highly consolidated computing cloud, where resource contention is generally prominent among VMs, live migration may cause significant service downtime; furthermore, it also incurs noticeable CPU usage on the host PM [12], which probably degrades the co-located VMs' performance.
Fig. 1. An example of workload with bursty spikes. (The figure plots workload over time, marking the normal workload level Rb, the peak workload level Rp, and the spike size Re.)

In this paper, we propose to reserve some extra resources on each PM to accommodate bursty workload [13]. In doing so, when a resource spike occurs, VMs can be quickly reconfigured to the new level of resource requirement through local resizing with minimal overheads, instead of being migrated to some other PMs. Hence, the number of live migrations could be reduced considerably, and the overall performance of a computing cloud could be improved.

Specifically, we investigate the problem of minimizing the amount of extra resources reserved on each PM during server consolidation while the overall performance is probabilistically guaranteed. By "probabilistically guaranteed", we mean that the fraction of time within which the aggregated workloads of a PM exceed its physical capacity is not larger than a threshold. Imposing such a threshold rather than conducting live migration upon a PM's capacity overflow is a way to tolerate minor fluctuations of resource usage (as in the case of CPU usage) and to break the tradeoff between utilization and performance.
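The threshold described above can be stated operationally: over a trace of time slots, a PM violates its capacity in the slots where the aggregated VM workload exceeds the PM's capacity, and the violation ratio is the fraction of such slots. A minimal sketch (function and variable names are ours, for illustration only):

```python
def capacity_overflow_ratio(aggregated_workload, capacity):
    """Fraction of time slots in which a PM's aggregated VM
    workload exceeds its physical capacity."""
    violations = sum(1 for w in aggregated_workload if w > capacity)
    return violations / len(aggregated_workload)

# A PM with capacity 10 whose hosted VMs' summed demand
# occasionally spikes above capacity:
trace = [8, 9, 12, 8, 11, 9, 8, 8, 13, 9]
ratio = capacity_overflow_ratio(trace, capacity=10)
print(ratio)  # 3 violations out of 10 slots -> 0.3
```

The probabilistic guarantee then simply requires this ratio to stay at or below the chosen threshold.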
Then, our problem can be formulated as an optimization, wherein the goal is to minimize the amount of resource reserved on each PM, and the constraint is that the capacity violation ratio of every PM is not larger than a predetermined threshold.

We use a two-state Markov chain to capture the burstiness of workload [7], and we also show how to learn the chain parameters. Inspired by the serving windows in queueing theory [14], we abstract the resources reserved on each PM for workload spikes as blocks. Denoting by θ(t) the number of busy blocks at time t on a PM, we show that the sequence θ(0), θ(1), θ(2), ... has the Markov property, namely that the next state depends only on the current state and not on the past sequence of states. We then develop a novel server consolidation algorithm, QUEUE, based on the stationary distribution of this Markov chain. We also show how to further improve the effectiveness of QUEUE with more careful treatment of heterogeneous workload spikes. Simulation and testbed results show that QUEUE improves the consolidation ratio by up to 45% with large spike sizes and around 30% with normal spike sizes compared with the strategy that provisions for peak workload, and achieves a better balance between performance and energy consumption in comparison with other commonly-used consolidation algorithms. The contributions of our paper are three-fold.

(1) To the best of our knowledge, we are the first to quantify the amount of reserved resources with consideration of workload burstiness. We propose to use the two-state Markov chain model to capture workload burstiness, and we present a formal problem description and its NP-completeness.

(2) We develop a novel algorithm, QUEUE, for burstiness-aware resource reservation, based on the stationary distribution of a Markov chain. We also show how to cope with heterogeneous spikes to further improve the performance of QUEUE.
(3) Extensive simulations and testbed experiments are conducted to validate the effectiveness and advantages of QUEUE.

We now continue by presenting related work in Section 2 and our model in Section 3. The problem formulation is provided in Section 4. We show the details of QUEUE in Section 5. Heterogeneous spikes are handled in Section 6. Evaluation results are presented in Section 7. Before concluding the paper in Section 9, we discuss known issues and extensions of QUEUE in Section 8.

2 RELATED WORK

Most prior studies [3], [15], [16] on server consolidation focused on minimizing the number of active PMs from the perspective of bin packing. A heterogeneity-aware resource management system for dynamic capacity provisioning in clouds was developed in [17]. Stable resource allocation in geographically-distributed clouds was considered in [18]. Network-aware virtual machine placement was considered in [19]. Scalable virtual network models were designed in [8], [20] to allow cloud tenants to explicitly specify computing and networking requirements to achieve predictable performance.

In a computing cloud, burstiness of workload widely exists in real applications, and it becomes an inevitable characteristic in server consolidation [1], [4], [6], [7], [21]. Some recent works [22], [23] used stochastic bin-packing (SBP) techniques to deal with variable workloads, where workload is modeled as a random variable. Other research [10], [24], [25] studied the SBP problem assuming VM workload follows a normal distribution. Several other studies [26], [27] focused on workload prediction while the application runs. Different from these, in our model a lower limit of provisioning is set at the normal workload level, which effectively prevents VM interference caused by unpredictable behaviors of co-located VMs.

A Markov chain was used to inject burstiness into a traditional benchmark in [7].
Several works [5], [28], [29] studied the modeling and dynamic provisioning of bursty workloads in cloud computing. A previous study [30] proposed to reserve a constant level of hardware resource on each PM to tolerate workload fluctuation, but how much resource should be reserved was not specified. To the best of our knowledge, we are the first to quantify the amount of reserved resources with consideration of various, yet distinct, patterns of workload burstiness.
Fig. 2. Two-state Markov chain. The "ON" state represents peak workload (Rp) while the "OFF" state represents normal workload (Rb); pon and poff are the state switch probabilities. (The figure shows the two states, with transition probabilities pon from OFF to ON and poff from ON to OFF, and self-loop probabilities 1 − pon and 1 − poff.)

3 MODELING VIRTUAL MACHINE WORKLOAD

3.1 Two-state Markov Chain

It has been well recognized in previous studies [4], [6], [7] that VM workload is time-varying with bursty spikes, as shown in Fig. 1. Several works [9], [10], [22], [23], [24], [25] modeled the workload of a VM as a random variable, which follows the Bernoulli distribution in [9] or a normal distribution in [10], [24], [25]. Different from these works, we model the workload of a VM as a two-state Markov chain, which takes the additional dimension of time into consideration and thus describes the characteristics of spikes more precisely.

Fig. 2 shows an example. We denote the resource requirements of peak workload, normal workload, and workload spike by Rp, Rb, and Re, respectively, where Re = Rp − Rb, as demonstrated in Fig. 1. The "ON" state represents peak workload while the "OFF" state represents normal workload. We use pon and poff to denote the state switch probabilities. More specifically, if a VM is in the ON state, then the probability of it switching to OFF at the next time step is poff, and that of remaining ON is 1 − poff.
Similarly, if a VM is in the OFF state, then the probability of it switching to ON at the next time step is pon, and that of remaining OFF is 1 − pon. We emphasize that this model is able to describe the characteristics of spikes precisely: intuitively, Re denotes the size of a spike, and pon denotes the frequency of spike occurrence. Thus, each VM can be described by a four-tuple

V_i = (p_on^i, p_off^i, R_b^i, R_e^i), ∀ 1 ≤ i ≤ n, (1)

where n is the number of VMs.

3.2 Learning Model Parameters

This subsection provides a simple strategy for cloud tenants to generate model parameters for their VM workload. It consists of two phases.

First, a cloud tenant must obtain workload traces and guarantee that they will be consistent with the realistic deployment in computing clouds. This can be achieved by tentatively deploying VMs in a cloud for a relatively short period; the cloud system collects the resource usage traces and feeds them back to the tenants.

Second, given a VM workload trace, a cloud tenant generates a four-tuple. We use Fig. 3 as an illustration, where the solid black curve represents the workload over time.

Fig. 3. Given the predetermined Rb and Re, we conservatively round the solid black curve up to the dashed red curve, based on which we can calculate pon and poff. (The figure plots a workload profile and its ON-OFF model over time slots 1 through 15, marking Rb, Rp, and Re.)

Given the predetermined Rb and Re, we conservatively round the solid black curve up to the dashed red curve. Denote by W(t) the workload at time t in the dashed red curve, e.g., W(4) = Rp and W(7) = Rb. Denote by SFF the number of switches from state "OFF" to state "OFF" in two consecutive time slots during the time period of interest; denote by SFN the number of switches from state "OFF" to state "ON" in two consecutive time slots during the time period of interest, e.g., SFF = 5 and SFN = 2 in Fig. 3.
Similarly, we can define S_NN and S_NF, which are equal to 5 and 2, respectively, in the figure. It is then easy to see that

    pon = S_FN / (S_FN + S_FF),  and  poff = S_NF / (S_NF + S_NN).

3.3 Potential Benefits

The two-state Markov chain model allows cloud tenants to flexibly control the tradeoff between VM performance and deployment cost by adjusting Rb and Re. When a tenant wants to maximize VM performance, the tenant should choose a large Rb and a small Re. As we will show later in this paper, multiple workload spikes may share some common physical resources. Thus, when the aggregated amount of workload spikes that occur simultaneously is larger than the amount of the shared common resources, capacity overflow happens and VM performance is probably affected. When a tenant wants to minimize deployment cost, the tenant should choose a small Rb and a large Re. By "deployment cost", we mean the fee paid by a cloud tenant to a cloud provider. Since physical resources are opportunistically shared among multiple workload spikes, the charge for workload spikes should be smaller than that for normal workload [9]. Therefore, decreasing Rb helps tenants reduce the deployment cost. Our model is also a tradeoff between modeling complexity and precision. We could model time-varying workload by a Markov chain with three or more states, which would capture the workload burstiness more precisely; however, the complexity of learning model parameters and allocating physical resources increases as well, which may complicate the interactions between cloud providers and tenants.
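The counting procedure of Section 3.2 can be sketched in code. This is a minimal illustration, not the paper's implementation; the function and variable names are ours, and a trace sampled at fixed time slots is assumed.

```python
def learn_parameters(trace, r_b, r_e):
    """Estimate (p_on, p_off) from a workload trace.

    Each sample is first conservatively rounded up: any workload above
    the normal level R_b is treated as the peak level R_p = R_b + R_e
    ("ON"); the rest is "OFF".  Counting switches over consecutive
    slots then gives
        p_on  = S_FN / (S_FN + S_FF),
        p_off = S_NF / (S_NF + S_NN).
    """
    states = ['ON' if w > r_b else 'OFF' for w in trace]
    s_ff = s_fn = s_nn = s_nf = 0
    for prev, cur in zip(states, states[1:]):
        if prev == 'OFF':
            if cur == 'OFF':
                s_ff += 1
            else:
                s_fn += 1
        else:
            if cur == 'ON':
                s_nn += 1
            else:
                s_nf += 1
    p_on = s_fn / (s_fn + s_ff) if (s_fn + s_ff) else 0.0
    p_off = s_nf / (s_nf + s_nn) if (s_nf + s_nn) else 0.0
    return p_on, p_off
```

Applied to a trace with the switch counts of Fig. 3 (S_FF = 5, S_FN = 2, S_NN = 5, S_NF = 2), this returns pon = poff = 2/7.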
Symbol     Meaning
n          the number of VMs
Vi         the i-th VM
R_b^i      the normal workload size of Vi
R_e^i      the spiky workload size of Vi
R_p^i      the peak workload size of Vi, R_p^i = R_b^i + R_e^i
p_on^i     the probability of Vi switching from OFF to ON
p_off^i    the probability of Vi switching from ON to OFF
m          the number of PMs
Hj         the j-th PM
Cj         the physical capacity of Hj
xij        the variable indicating whether VM i is placed on PM j
Φj         the capacity overflow ratio of PM Hj
ρ          the threshold of the capacity overflow ratio

Fig. 4. Main notations for quick reference.

4 PROBLEM FORMULATION

We consider a computing cloud with a 1-dimensional resource; for scenarios with multi-dimensional resources, we provide a few remarks in Section 8. There are m physical machines in the computing cloud, and each PM is described by its physical capacity:

    Hj = (Cj), ∀1 ≤ j ≤ m.    (2)

We use a binary matrix X = [xij]_{n×m} to represent the results of placing n VMs on m PMs: xij = 1 if Vi is placed on Hj, and 0 otherwise. We assume that the workloads of VMs are mutually independent. Let Wi(t) be the resource requirement of Vi at time t. According to the Markov chain model, we have

    Wi(t) = R_b^i, if Vi is in the "OFF" state at time t,
            R_p^i, if Vi is in the "ON" state at time t.

Then, the aggregated resource requirement of the VMs on PM Hj is Σ_{i=1}^{n} xij Wi(t).
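This bookkeeping is straightforward to make concrete. The sketch below is ours, not from the paper; function and variable names are illustrative.

```python
def pm_demands(X, states, R_b, R_p):
    """Aggregated resource requirement of each PM under placement X.

    X[i][j] = 1 iff VM i is placed on PM j; states[i] is True when
    VM i is ON (peak workload R_p[i]) and False when it is OFF
    (normal workload R_b[i]).  Returns one demand value per PM.
    """
    n, m = len(X), len(X[0])
    # W_i(t) under the two-state model
    W = [R_p[i] if states[i] else R_b[i] for i in range(n)]
    return [sum(X[i][j] * W[i] for i in range(n)) for j in range(m)]
```

Comparing each returned value against the corresponding capacity Cj yields the overflow indicator used in the COR definition below.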
Let CO_j^t indicate whether a capacity overflow happens on PM Hj at time t, i.e.,

    CO_j^t = 1, if Σ_{i=1}^{n} xij Wi(t) > Cj,
             0, otherwise.

Intuitively, the results of VM placement should guarantee that the capacity constraint is satisfied on each PM at the beginning of the time period of interest, i.e., CO_j^0 = 0, ∀1 ≤ j ≤ m.

We can now define our metric for probabilistic performance guarantee, the capacity overflow ratio (COR), which is the fraction of time that the aggregated workload of a PM exceeds its physical capacity. Denoting the capacity overflow ratio of PM Hj by Φj, we have

    Φj = ( Σ_{1≤t≤T} CO_j^t ) / T,

where T is the length of the time period of interest. It is easy to see that a smaller Φj implies a better performance of Hj. The performance of each PM should be probabilistically guaranteed, so we have

    Φj ≤ ρ, ∀1 ≤ j ≤ m.    (3)

Here, ρ is a predetermined value serving as the threshold of the COR. The main notations are summarized in Fig. 4 for quick reference. Our problem can be stated as follows.

Problem 1: (Burstiness-Aware Server Consolidation, BASC) Given a set of n VMs and a set of m PMs, find a VM-to-PM mapping X that minimizes the number of PMs used while ensuring that (1) the initial placement satisfies the capacity constraint, and (2) the capacity overflow ratio of each PM is not larger than the threshold ρ. It can be formally formulated as follows:

    min  |{j | Σ_{i=1}^{n} xij > 0, 1 ≤ j ≤ m}|
    s.t. CO_j^0 = 0, ∀1 ≤ j ≤ m,
         Φj ≤ ρ, ∀1 ≤ j ≤ m.    (4)

Here, |S| denotes the cardinality of set S. The following theorem shows that the BASC problem is NP-complete.

Theorem 1: The BASC problem is NP-complete.

Proof: We prove this theorem by reduction from the Bin Packing (BP) problem [31], which is NP-complete. The decision version of the BP problem is as follows: given n items with sizes s1, s2, ..., sn ∈ (0, 1], can we pack them in no more than k unit-sized bins?
Given an instance of the decision version of the BP problem, we can construct an instance of the decision version of our problem as follows: let R_b^i = R_p^i = si, ∀1 ≤ i ≤ n; let m = k; let Cj = 1, ∀1 ≤ j ≤ m; and let ρ = 0, i.e., capacity overflow is not allowed. It is not hard to see that the construction can be finished in polynomial time; thus, we reduce solving the NP-complete BP problem to solving a special case of our problem, implying that our problem is NP-hard. It is easy to verify that the BASC problem is also in NP; the theorem follows immediately.

5 BURSTINESS-AWARE RESOURCE RESERVATION

In this section, we first present the main idea of our solution to the BASC problem; we then design a resource reservation strategy for a single PM, based on which we develop QUEUE, a complete server consolidation algorithm. In the end, we provide a concrete example to help readers better understand our algorithm.

5.1 Overview of QUEUE

We propose reserving a certain amount of physical resources on each PM to accommodate workload spikes. The main idea is to abstract the reserved spaces as blocks (i.e., serving windows in queueing theory). We give an informal illustration of the evolution process of our queueing system in Fig. 5.
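Before the formal treatment, the reservation intuition can be sanity-checked by simulation: with k co-located VMs switching independently between OFF and ON, and K reserved blocks, the COR is simply the fraction of time slots in which more than K VMs are ON simultaneously. The Monte Carlo sketch below is ours, not the paper's algorithm; all names and parameter values are illustrative.

```python
import random

def estimate_cor(k, K, p_on, p_off, T=100_000, seed=1):
    """Estimate the capacity overflow ratio for k VMs and K blocks.

    Each VM is an independent two-state chain: OFF -> ON with
    probability p_on, ON -> OFF with probability p_off.  Overflow
    occurs in a slot whenever more than K VMs are ON at once.
    """
    rng = random.Random(seed)
    on = [False] * k          # all VMs start in the OFF state
    overflow = 0
    for _ in range(T):
        on = [(not s and rng.random() < p_on) or
              (s and rng.random() >= p_off) for s in on]
        overflow += sum(on) > K
    return overflow / T
```

For example, with p_on = 0.05 and p_off = 0.5, the stationary probability that a VM is ON is p_on / (p_on + p_off) ≈ 0.09, so for 10 VMs a handful of blocks already keeps the estimated COR small; with K = k the COR is exactly zero, matching Fig. 5(b).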
Fig. 5. An illustration of the evolution process. (a) The original provisioning strategy for peak workload. (b) Gathering all Re's together to form a queueing system. (c) Reducing the number of blocks while still satisfying Equ. (3).

Initially, all VMs are provisioned with Rb + Re, and each VM has its own block (denoted as Re in Fig. 5). A VM uses only its Rb part during periods of normal workload; when a workload spike occurs, the extra Re part is put into use. We note that the collected Re's altogether form a queueing system: when a workload spike occurs in a VM, the VM enters the queueing system and occupies one of the idle blocks; when the spike disappears, the corresponding VM leaves the queueing system and releases the block. It is worth noting that there is no waiting space in the queueing system; thus, the PM capacity constraint would be violated if a workload spike occurred while all the blocks were occupied, which never happens when the number of blocks equals the number of co-located VMs (as shown in Fig. 5(b)). However, a certain number of blocks are idle for the majority of the time in Fig. 5(b), so we can reduce the number of blocks while incurring only very few capacity violations (as shown in Fig. 5(c)). Therefore, our goal becomes reserving the minimal number of blocks on each PM while the performance constraint in Equ. (3) is still satisfied.

5.2 Resource Reservation Strategy for a Single PM

In this subsection, we focus on resource reservation for a single PM. For the sake of convenience, we set the size of each block to the size of the maximum spike of all co-located VMs on a PM. In Section 6, we will present how to cope with heterogeneous workload spikes in an effort to further improve the performance of QUEUE. We also assume that all VMs have the same state-switch probabilities, i.e., p_on^i = pon and p_off^i = poff, for all 1 ≤ i ≤ n. In Section 8, we will show how to cluster VMs when they have different state-switch probabilities.

Suppose there are k VMs on the PM of interest and initially each VM Vi occupies R_b^i resources. We initialize the number of blocks reserved on this PM as k, and our objective is to reduce the number of blocks to K (K < k) while the capacity overflow ratio Φ does not exceed the threshold ρ. Let θ(t) be the number of busy blocks at time t, implying that there are θ(t) VMs in the ON state and (k − θ(t)) VMs in the OFF state. Let O(t) and I(t) denote the number of VMs that switch state from ON to OFF (i.e., VMs that leave the queueing system) and from OFF to ON (i.e., VMs that enter the queueing system) at time t, respectively. We use C(x, y) to denote x! / (y! (x − y)!), i.e., x choose y. Since the workloads of VMs are mutually independent, we have

    Pr{O(t) = x} = C(θ(t), x) · poff^x · (1 − poff)^(θ(t)−x),

and

    Pr{I(t) = x} = C(k − θ(t), x) · pon^x · (1 − pon)^(k−θ(t)−x),

which suggests that both O(t) and I(t) follow binomial distributions:

    O(t) ~ B(θ(t), poff),
    I(t) ~ B(k − θ(t), pon).    (5)

Without loss of generality, we assume that the switch between two consecutive states of all VMs happens at the end of each time interval.
Then we have the recursive relation of θ(t):

    θ(t + 1) = θ(t) − O(t) + I(t).    (6)

Combining Equs. (5) and (6), we see that the next state θ(t + 1) depends only on the current state θ(t) and not on the past sequence of states θ(t − 1), θ(t − 2), ..., θ(0). Therefore, the stochastic process θ(0), θ(1), ... with discrete time ({0, 1, 2, ...}) and discrete space ({0, 1, 2, ..., k}) is a Markov chain. The stochastic process is said to be in state i (0 ≤ i ≤ k) if the number of busy blocks is i. Fig. 6 shows the transition graph of the chain. Let pij be the transition probability from state i to state j; that is, if θ(t) = i, then the probability that θ(t + 1) = j is pij. For the sake of convenience, we let C(x, y) be 0 when y > x or y < 0. Then, pij can be derived as follows:

    pij = Pr{θ(t + 1) = j | θ(t) = i}
        = Σ_{r=0}^{i} Pr{O(t) = r, I(t) = j − i + r | θ(t) = i}
        = Σ_{r=0}^{i} Pr{O(t) = r | θ(t) = i} × Pr{I(t) = j − i + r | θ(t) = i}
        = Σ_{r=0}^{i} C(i, r) · poff^r · (1 − poff)^(i−r) × C(k − i, j − i + r) · pon^(j−i+r) · (1 − pon)^(k−j−r).    (7)

In the above formula, the first and second equalities are due to the definition of pij and the recursive relation of θ(t), respectively. Observing that O(t) and I(t) are independent of each other, we get the third equality. The last equality is obtained by replacing O(t) and I(t) with their distributions.

The stochastic matrix P = [pij]_{(k+1)×(k+1)} is called the transition matrix of the Markov chain. We see that it does not depend on time. Let π(t) =