Availability “Weak Point” Analysis over an SOA Deployment Framework

Lei Xie¹, Jing Luo², Jie Qiu², John A Pershing³, Ying Li², Ying Chen²
¹ Department of Computer Science, Nanjing University, xielei@dislab.nju.edu.cn
² IBM China Research Lab, {jingluo, qiujie, lying, yingch}@cn.ibm.com
³ IBM T. J. Watson Research Center, Hawthorne, NY 10532, pershng@us.ibm.com

Abstract: Availability is one of the important factors to be considered for business-driven IT service management. This paper addresses the issue of analyzing what we call availability weak-points in an SOA deployment framework, leveraging workflow definitions to specify the high availability requirement at the business process level. In our weak-point analysis framework, we present an effective analysis methodology to calculate the optimal high availability solution with minimum cost, while meeting the business level availability requirements. We evaluate the weak-point analysis methodology, and show that our methodology can identify a near-optimal solution for availability enhancement over the SOA deployment framework.

I. INTRODUCTION

Service-Oriented Architecture (SOA) has opened up new opportunities for organizations seeking more flexibility and responsiveness to business demands across large-scale deployed IT infrastructures. Availability of computing resources is an important consideration for IT service management. Note, though, that the actual availability requirements are dictated by the various business processes and services that are supported by the IT infrastructure; the availability requirement of an individual resource is simply to support the overall availability of the business processes and services. Business services today are not only doing more work but also have more users, often spread out across the globe, who require near 24/7 availability.
The basic principle of high availability management for IT infrastructure is to eliminate single points of failure by providing redundancy, which can be implemented to varying degrees with a wide range of associated cost and performance considerations. Common high availability techniques include clustering [1], hot failover mechanisms [2] [3], recursive restartability [4], redundant arrays of independent disks (RAIDs), and other approaches. At the business process level for enterprise applications, the availability metric is actually an end-to-end availability; thus, the business process can be depicted as a workflow. For general applications the workflow crosses the typical three-tiered IT infrastructure: web tier, middleware tier, and database tier. Therefore, according to the workflow specification, the availability of the chains of resources over the IT infrastructure forms the end-to-end availability.

Footnote 1: This work was done while the first author was working as an intern at IBM China Research Lab.

Note that, even if all single points of failure have been made redundant, some of these (redundant) resources still may not exhibit the necessary availability level to satisfy the requirements of the business processes. We refer to this situation as an availability weak-point, and it may be necessary to introduce even more redundancy in order to meet the availability requirements.

The key to delivering successful, robust solutions is determining the right level of high availability for the IT infrastructure [5]: not enough could result in costly outages, and too much could be an expensive waste. So it makes sense to perform availability analysis over the distributed IT infrastructure in conjunction with business level requirements, and then plan for high availability solutions.

Therefore, detecting and analyzing the availability weak points in the SOA deployment topology is the premise for applying high availability (HA) solutions [1] [2] [3] over the IT infrastructure.
In this paper we propose a workflow-based weak-point analysis methodology over the SOA deployment framework. The novelty of our approach is a framework that analyzes the weak-points and gives indications for optimal HA solutions over the deployment topology. Using our framework, it can be determined which components of the topology need to be HA enhanced, and to what level they should be enhanced to satisfy the business-level HA requirements, while keeping the overall cost close to the minimum.

The rest of the paper is organized as follows. In Section II we describe the basic structure of our availability weak-point analysis framework. We introduce the workflow-based methodology for calculating a near-optimal solution in Section III. Section IV presents the experimental evaluation, which demonstrates the efficiency of our analysis framework. In Section V we discuss related work. Section VI concludes the paper.

II. THE WEAK-POINT ANALYSIS FRAMEWORK

The overall weak-point analysis framework is shown in Fig. 1. The framework includes the following three major
modules:

[Fig. 1. Framework for Workflow based High Availability Analysis]

[Fig. 2. Workflow Mapping over the SOA Deployment Topology]

The Workflow Specification Module extracts relevant information from the business process workflows. It takes business level workflows as input, which specify the process flow at the business level using BPEL [6] structure files, and it also takes as input the availability requirement for each business workflow. The availability requirement can be expressed as MTBF/(MTBF + MTTR), where MTBF represents mean time between failures and MTTR represents mean time to repair, so the availability requirement lies in the range from 0 to 1 (in reality, it probably lies in the range from 0.9 to 1). Then, this module maps the workflow from the service (business) level to the IT infrastructure level according to the SOA deployment topology. Finally, it produces the extracted workflow mapping matrix, which specifies the necessary information about the relevant resources for each workflow at the IT infrastructure level.

The Weak-Point Analysis Module performs weak-point analysis based on the mapping matrix created by the Specification Module, plus MTBF, MTTR, and cost metrics for the various IT resources. It calculates a near-optimal solution with minimum overall cost while meeting the business level availability requirements; this solution indicates which resources need to be HA enhanced, and suggests the size of the clusters for those resources. First, the current availability capability of each workflow is calculated according to the component failure behavior parameters obtained from historical experience (such as MTBF and MTTR). Then, the module checks whether the availability requirement for each business workflow has been satisfied; for those unsatisfied workflows, the resources where the relevant services are deployed should have their availability enhanced through the use of clustering or a “hot standby” configuration.
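As a small illustration of the availability expression MTBF/(MTBF + MTTR) and of the requirement check described above, the following sketch computes the availability of a single resource and compares it with a requirement. The function name, the MTBF and MTTR figures, and the requirement value are all hypothetical and chosen only for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as MTBF / (MTBF + MTTR); the result lies in [0, 1]."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical resource: fails about every 990 hours and takes 10 hours to repair.
server_availability = availability(990.0, 10.0)

# Illustrative requirement; a real value would come from the workflow specification.
requirement = 0.995
meets_requirement = server_availability >= requirement
```

With these made-up numbers the resource falls short of the requirement, which is the situation in which the analysis module would look for an HA enhancement.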
The weak-point analysis module calculates the optimal HA solution over the topology, subject to a utility function, producing the HA enhancement parameters for each relevant resource. This module uses a Lagrangian constrained optimization algorithm to achieve a near-optimal solution for HA enhancement; we describe this algorithm in detail in Section III.

The HA Pattern Mapping Module applies relevant HA patterns to the identified weak-point IT resources according to the optimal solution produced by the weak-point analysis module. These patterns may be generic (e.g., clustering, hot standby) or product-specific (e.g., DB2 High Availability and Disaster Recovery). This module finally produces an HA-enhanced deployment topology which satisfies the business level availability requirement at the minimum overall cost.

In our weak-point analysis framework, we specify the usage of the IT resources at the business process level, leveraging BPEL [6] to specify the business process workflows that define the processes and services we are interested in. Over the SOA deployment topology, we can further map the workflows to the application and IT infrastructure levels by inspecting the hosting and dependency relationships that are defined in the IT infrastructure. As Fig. 2 shows, the hosting relationships are specified over the SOA deployment topology; through the hosting relationship, Workflow 1 and Workflow 2 are mapped to the IT infrastructure level. Through this workflow mapping, we can extract the relevant resource list at the IT infrastructure level for each workflow. Based on the resource lists and HA requirements for each workflow over the SOA deployment topology, we can calculate the optimal HA enhancement solution for the current deployment topology.

III. WORKFLOW BASED WEAK-POINT ANALYSIS METHODOLOGY

The key contribution of our weak-point analysis methodology is the use of business-level workflow specifications to specify availability requirements and to map the flow of transactions through the IT infrastructure. In this section we describe our analysis methodology, which is responsible for recommending HA solutions such that business-level availability requirements are met while keeping the overall cost close to the minimum.

A. Workflow Specification

After analysis by the workflow specification module described above, we have extracted the list of IT resources which are involved in each workflow. Now we give the following definitions. We assume there exist $n$ workflows over the SOA deployment topology, denoted by $W_1, W_2, W_3, \ldots, W_n$, and these workflows are specified with availability requirements
$P_1, P_2, P_3, \ldots, P_n$, where $0 < P_i < 1$. We also assume that there are $m$ IT resources in the infrastructure, denoted by $C_1, C_2, \ldots, C_m$. Each resource consists of a “stack” of hardware and software components; for instance, an x86 server, a Linux OS, and a WebSphere Application Server.

$$
\begin{array}{c|ccccc}
 & C_1 & C_2 & C_3 & \cdots & C_m \\ \hline
W_1(P_1) & R_{1,1} & R_{1,2} & R_{1,3} & \cdots & R_{1,m} \\
W_2(P_2) & R_{2,1} & R_{2,2} & R_{2,3} & \cdots & R_{2,m} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
W_n(P_n) & R_{n,1} & R_{n,2} & R_{n,3} & \cdots & R_{n,m}
\end{array}
$$

TABLE I. THE WORKFLOW-RESOURCE RELATIONSHIP MATRIX

[Fig. 3. Example BPEL Workflow]

We construct a matrix to capture the workflow-resource relationship. Table I shows the matrix; the relationship between workflow $W_i$ and resource $C_j$ is $R_{i,j}$, where $R_{i,j}$ is an integer value giving the number of references to resource $C_j$ from workflow $W_i$, and is set to 0 when resource $C_j$ is not included in the resource list of $W_i$. For example, Fig. 3 shows a workflow with two services, which are mapped to three IT infrastructure resources, $C_1$, $C_2$ and $C_3$, plus one resource $C_4$ which is not included in the workflow. Note that, at the application level, Component 1 depends on Component 2 to implement Service 1, and Component 2 in turn depends on Component 3 to implement Service 2. We denote the availability capability of resource $C_i$ as $P(C_i)$. The availabilities of the two services are therefore $P(C_1) \cdot P(C_2) \cdot P(C_3)$ and $P(C_2) \cdot P(C_3)$; thus, the availability of the workflow is $P(C_1) \cdot P(C_2)^2 \cdot P(C_3)^2$, and the matrix row for Workflow 1 is set to $[1, 2, 2, 0]$. For standalone services which have no dependency relationships, we can simply set $R_{i,j}$ to 1 for all the referenced resources, and 0 for unreferenced resources.

B. Optimal Solution Calculation

Given the workflow-resource relationship matrix, we can calculate the current availability capability of each workflow according to its resource list.
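The Fig. 3 example can be sketched in a few lines: a workflow's availability is the product of each resource availability raised to its reference count. The resource availabilities below are made-up values, and `workflow_availability` is an illustrative name rather than part of the framework.

```python
from math import prod


def workflow_availability(resource_availability, reference_counts):
    """Multiply P(C_j) ** R_ij over all resources for one workflow row."""
    if len(resource_availability) != len(reference_counts):
        raise ValueError("one reference count is needed per resource")
    return prod(p ** r for p, r in zip(resource_availability, reference_counts))


# Hypothetical availabilities for C1..C4 (not taken from the paper).
p_c = [0.99, 0.98, 0.97, 0.95]

# Matrix row for Workflow 1 in the Fig. 3 example: C4 is not referenced.
r_w1 = [1, 2, 2, 0]

p_w1 = workflow_availability(p_c, r_w1)  # 0.99 * 0.98**2 * 0.97**2
```

Because the reference count of $C_4$ is 0, its availability has no effect on the result, which matches the role of a 0 entry in the matrix.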
Assume that the availabilities of the $m$ resources are $P(C_1), P(C_2), P(C_3), \ldots, P(C_m)$; these availabilities can be derived from historical measurements or, perhaps, from data obtained from the manufacturer. For this scenario, we assume that the relevant resources are all standalone at the initial step of the calculation, so we can calculate the current availability of each workflow as formula 1 shows:

$$
P(W_i) = \prod_{j=1}^{m} P(C_j)^{R_{i,j}} \tag{1}
$$

$P(W_i)$ is the current availability capability of workflow $W_i$. We compare it with the workflow's availability requirement $P_i$: if $P(W_i) \ge P_i$, the requirement is met; otherwise, the availability requirement is unsatisfied, and some resources in the resource list of workflow $W_i$ need to have their availability enhanced through the deployment of an HA pattern to meet the availability requirement. This is an optimization problem: which resources should be enhanced to meet the availability requirement, while keeping the HA enhancement cost as low as possible?

A conventional method of addressing an optimization problem is to enumerate all possible solutions and compare their cost; however, this approach is computationally expensive for all but the simplest problems, and becomes intractable when the number of resources is large. Reference [7] proposes an approach that searches for the optimal solution through multi-tier system design, based on exhaustive iteration. In contrast, our weak-point analysis methodology calculates a near-optimal solution for HA enhancement using the method of Lagrange multipliers [8], which is a computationally efficient approach.

Assume the number of workflows whose availability requirements have not yet been met is $n$. For workflow $W_i$ we define the enhancement parameter $PW_i$ as the factor by which that workflow's current availability needs to be enhanced to meet the availability requirement $P_i$:

$$
PW_i = \frac{P_i}{P(W_i)} \tag{2}
$$

By definition, $PW_i \ge 1$.
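A minimal sketch of this step, under the assumption that the current workflow availabilities have already been computed with formula 1: it keeps only the workflows whose requirement is unmet and computes $PW_i$ for each. The function name and all numeric values are hypothetical.

```python
def enhancement_parameters(requirements, current):
    """Return {workflow index: PW_i} for workflows whose requirement is unmet.

    requirements[i] is P_i and current[i] is P(W_i); indices are 1-based
    in the returned dictionary to match the W_1..W_n notation.
    """
    params = {}
    for i, (p_req, p_cur) in enumerate(zip(requirements, current), start=1):
        if p_cur <= 0:
            raise ValueError("current availability must be positive")
        if p_cur < p_req:
            params[i] = p_req / p_cur  # equation 2
    return params


# Illustrative values: W1 misses its requirement, W2 already satisfies it.
requirements = [0.95, 0.90]
current = [0.8946, 0.93]
pw = enhancement_parameters(requirements, current)
```

Here only the first workflow appears in the result, and its parameter is greater than 1 because its current availability is below the requirement.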
We also define the enhancement parameter for each resource as $PC_1, PC_2, \ldots, PC_m$; thus, we form the following constraints:

$$
\begin{aligned}
PW_1 &\le PC_1^{R_{1,1}} \cdot PC_2^{R_{1,2}} \cdots PC_m^{R_{1,m}} \\
PW_2 &\le PC_1^{R_{2,1}} \cdot PC_2^{R_{2,2}} \cdots PC_m^{R_{2,m}} \\
&\;\;\vdots \\
PW_i &\le PC_1^{R_{i,1}} \cdot PC_2^{R_{i,2}} \cdots PC_m^{R_{i,m}} \\
&\;\;\vdots \\
PW_n &\le PC_1^{R_{n,1}} \cdot PC_2^{R_{n,2}} \cdots PC_m^{R_{n,m}}
\end{aligned}
\tag{3}
$$

In other words, the overall availability enhancement for the IT resources within the workflow should be no less than the availability enhancement requirement for the workflow. We take the logarithm of inequalities (3) to simplify the calculation, yielding:
$$
\begin{aligned}
\ln(PW_1) &\le R_{1,1} \cdot \ln(PC_1) + \cdots + R_{1,m} \cdot \ln(PC_m) \\
\ln(PW_2) &\le R_{2,1} \cdot \ln(PC_1) + \cdots + R_{2,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_i) &\le R_{i,1} \cdot \ln(PC_1) + \cdots + R_{i,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_n) &\le R_{n,1} \cdot \ln(PC_1) + \cdots + R_{n,m} \cdot \ln(PC_m)
\end{aligned}
\tag{4}
$$

Let $X_1, X_2, \ldots, X_m$ denote $\ln(PC_1), \ln(PC_2), \ldots, \ln(PC_m)$. Then $0 \le X_i \le \ln\left(\frac{1}{P(C_i)}\right)$, because $1 \le PC_i \le \frac{1}{P(C_i)}$. For the failover HA pattern, where only one primary server and one standby server exist in the cluster, we can adjust the upper bound to $\ln\left(\frac{1-(1-P(C_i))^2}{P(C_i)}\right)$, and we can adjust the lower bound from 0 to $\ln\left(\frac{1-(1-P(C_i))^{n_i}}{P(C_i)}\right)$ if we want the initial cluster size to be $n_i$ instead of 1. Let $B_1, B_2, \ldots, B_n$ denote $\ln(PW_1), \ln(PW_2), \ldots, \ln(PW_n)$. Then the following constraints should be satisfied:

$$
\begin{aligned}
B_1 &\le R_{1,1} \cdot X_1 + \cdots + R_{1,m} \cdot X_m \\
B_2 &\le R_{2,1} \cdot X_1 + \cdots + R_{2,m} \cdot X_m \\
&\;\;\vdots \\
B_i &\le R_{i,1} \cdot X_1 + \cdots + R_{i,m} \cdot X_m \\
&\;\;\vdots \\
B_n &\le R_{n,1} \cdot X_1 + \cdots + R_{n,m} \cdot X_m \\
0 &\le X_1 \le \ln\left(\tfrac{1}{P(C_1)}\right) \\
0 &\le X_2 \le \ln\left(\tfrac{1}{P(C_2)}\right) \\
&\;\;\vdots \\
0 &\le X_m \le \ln\left(\tfrac{1}{P(C_m)}\right)
\end{aligned}
\tag{5}
$$

The above constraints form a continuous region of solutions in the multi-dimensional space $S(X_1, X_2, X_3, \ldots, X_m)$. We use a utility function $f$ to express the overall cost of HA enhancement, and we will prove that the closed lower boundaries of the solution space include the optimal solution with the minimum enhancement cost. We can therefore find the optimal solution for the utility function subject to the constrained solution space of the closed lower boundaries.

Theorem 1: The closed lower boundaries of the solution region in the multi-dimensional space $S(X_1, X_2, X_3, \ldots, X_m)$ include the optimal solution $P_{opt}$.

Proof: Assume there exists an optimal solution point $P(X_1, X_2, \ldots, X_m)$ in the constraint space beyond the closed lower boundaries. We will show that there exists a solution point which is better than point $P$, thereby proving that the optimal solution $P_{opt}$ lies on the closed lower boundaries of the constraint space.
Here we define the overall utility function for HA enhancement as $f$. Taking $P_1$ to be the assumed optimal point $P$, we define $\Rightarrow_{X_i}$ as the mapping from point $P_1(X_1, \dots, X_i, \dots, X_m)$ to point $P_i(X_1, \dots, X'_i, \dots, X_m)$ on closed lower boundary $B_i$ along the decreasing direction in the $X_i$ dimension:

\[
P_1(X_1, \dots, X_i, \dots, X_m) \Rightarrow_{X_i} P_i(X_1, \dots, X'_i, \dots, X_m).
\]

$\because\ 0 < X'_i < X_i$ and the utility function $f$ always has positive correlation with the enhancement parameter $X_i$,

$\therefore\ f(P_i(X_1, \dots, X'_i, \dots, X_m)) < f(P_1(X_1, \dots, X_i, \dots, X_m))$.

Therefore solution $P_i(X_1, \dots, X'_i, \dots, X_m)$ has lower cost than $P_1(X_1, \dots, X_i, \dots, X_m)$. Thus the former assumption that $P(X_1, X_2, \dots, X_m)$ is an optimal solution point is untenable, which proves that the optimal solution must lie on some closed lower boundary of the constraint space.

The closed lower boundaries of the constraint space can therefore be expressed with an equation $g(X_1, X_2, \dots, X_m) = 0$, where $g(X_1, X_2, \dots, X_m)$ can be a piecewise function that describes the different closed boundaries.

The optimal HA enhancement solution is eventually determined by the overall utility function. The utility function for a given resource $C_i$ is associated with two parameters: $n_i$, the original HA cluster size of resource $C_i$ (for standalone resources, $n_i$ is set to 1), and $X_i$, the enhancement parameter for resource $C_i$. The utility function for resource $C_i$ can therefore be expressed as $f_i(n_i, X_i)$, and the overall cost is as follows:

\[
f(X_1, X_2, \dots, X_m) = f_1(n_1, X_1) + f_2(n_2, X_2) + \dots + f_m(n_m, X_m) = \sum_{i=1}^{m} f_i(n_i, X_i) \tag{6}
\]

As an example, the utility function $f_i(n_i, X_i)$ can be defined as:

\[
f_i(n_i, X_i) = E_i\,(n'_i - n_i) \tag{7}
\]

In the above equation, $n'_i$ denotes the cluster size of resource $C_i$ after HA enhancement, and $E_i$ denotes the cost of availability enhancement per unit; it can include the initial fixed cost for purchasing hardware and software, and the annual maintenance cost.
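The example cost model of Equations 6 and 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Resource` class, its field names, and the numeric values are hypothetical and do not come from the paper; the cluster sizes after enhancement are supplied directly rather than derived from $X_i$.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Resource:
    """Hypothetical container for one resource C_i (names are illustrative)."""
    name: str
    unit_cost: float  # E_i: cost per added cluster member
    n_before: int     # n_i: original cluster size (1 for a standalone resource)
    n_after: int      # n'_i: cluster size after HA enhancement


def resource_cost(r: Resource) -> float:
    """Equation 7: f_i = E_i * (n'_i - n_i)."""
    return r.unit_cost * (r.n_after - r.n_before)


def total_cost(resources: Iterable[Resource]) -> float:
    """Equation 6: overall cost is the sum of the per-resource costs."""
    return sum(resource_cost(r) for r in resources)


# Made-up example values, chosen only to show the arithmetic.
resources = [
    Resource("C1", unit_cost=1000.0, n_before=1, n_after=3),
    Resource("C2", unit_cost=500.0, n_before=1, n_after=2),
]
overall = total_cost(resources)  # 1000*(3-1) + 500*(2-1)
```

A resource whose cluster size is unchanged contributes zero cost, so only the enhanced resources affect the total.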
The utility function is determined by the business service providers who want to provide appropriate IT resources to support their business services at appropriate cost; thus, it may vary according to their demands. Now, we can calculate $n'_i$ according to $X_i$ and we can get the example utility function as equation 8:

\[
\begin{aligned}
P'(C_i) &= 1 - (1 - P(C_i))^{n'_i}\\
P'(C_i) &= P(C_i)\cdot P_{C_i}\\
X_i &= \ln(P_{C_i})
\end{aligned}
\;\Rightarrow\;
n'_i = \left\lceil \frac{\ln\left(1 - P(C_i)\cdot e^{X_i}\right)}{\ln\left(1 - P(C_i)\right)} \right\rceil \tag{8}
\]

In the above formula $P'(C_i)$ denotes the enhanced availability for resource $C_i$, and $P(C_i)$ denotes the availability of one single resource.

Therefore the optimal solution can be calculated with the utility function subject to the constraint depicted by equation $g(X_1, X_2, \dots, X_m) = 0$. Following the Lagrange multiplier method [8], we construct the auxiliary function $F(X_1, X_2, \dots, X_m, \lambda)$ to calculate the optimal solution, defining it as equation 9 shows, where $f(X_1, X_2, \dots, X_m)$ denotes the utility function, and $g(X_1, X_2, \dots, X_m)$ denotes the function for the constraint space:
\[
F(X_1, X_2, \dots, X_m, \lambda) = f(X_1, X_2, \dots, X_m) + \lambda\cdot g(X_1, X_2, \dots, X_m) \tag{9}
\]

By setting the following partial derivatives to zero, according to the Lagrange multiplier method, we can finally get the optimal solution $(X_1, X_2, \dots, X_m)$. ($\frac{\partial}{\partial X}F$ denotes the partial derivative of $F$ with respect to the variable $X$.)

\[
\begin{aligned}
\frac{\partial}{\partial X_1} F(X_1, X_2, \dots, X_m, \lambda) &= 0\\
\frac{\partial}{\partial X_2} F(X_1, X_2, \dots, X_m, \lambda) &= 0\\
&\;\;\vdots\\
\frac{\partial}{\partial \lambda} F(X_1, X_2, \dots, X_m, \lambda) &= 0
\end{aligned}
\tag{10}
\]

From the optimal solution for resource HA enhancement $(X_1, X_2, \dots, X_m)$, we can get the enhanced availabilities $(P'(C_1), P'(C_2), \dots, P'(C_m))$, and the exact HA solutions can be found (e.g., whether a cluster should be constructed and what the size of the cluster is). Assume that $n$ members are needed to support the HA cluster; the availability of the cluster is then as follows:

\[
P'(C_i) = 1 - (1 - P(C_i))^{n} \tag{11}
\]

According to the above formula, the size $n$ of the cluster can be calculated as follows:

\[
n = \left\lceil \frac{\ln\left(1 - P'(C_i)\right)}{\ln\left(1 - P(C_i)\right)} \right\rceil \tag{12}
\]

Leveraging the domain information for the component, the HA cluster pattern can be generated and deployed into the topology.

C. Weight-based Optimization Approach to Reduce Calculation Complexity

Because the number of candidate resources for availability enhancement over the IT infrastructure can be large, the computational complexity of calculating the optimal solution by solving Equation 10 increases accordingly. Therefore, we propose a method to effectively reduce the number of candidate resources, in order to simplify the calculation.

The principle of our weight-based optimization approach is to select a subset of the IT resources, based on weight, for use in the optimal solution calculation. We note that, for those resources which are involved in more workflows with more critical availability requirements, enhancing the availability of these resources will yield better overall HA enhancement for the workflows, in a cost-efficient manner.
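As a brief aside on Equations 11 and 12 above, the conversion from a target availability to a cluster size can be sketched as follows. This is a minimal illustration, assuming (as Equation 11 implies) that cluster members are identical and fail independently; the function names and the sample availabilities are hypothetical and not taken from the paper.

```python
import math


def cluster_availability(p_single: float, n: int) -> float:
    """Equation 11: availability of a cluster of n identical, independent members."""
    return 1.0 - (1.0 - p_single) ** n


def cluster_size(p_single: float, p_target: float) -> int:
    """Equation 12: smallest n whose cluster availability reaches p_target."""
    if not (0.0 < p_single < 1.0 and 0.0 < p_target < 1.0):
        raise ValueError("availabilities must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - p_target) / math.log(1.0 - p_single))
    # A target at or below the single-member availability still needs one member.
    return max(n, 1)


# Made-up example: a member with availability 0.95 and a target of 0.999.
n_members = cluster_size(0.95, 0.999)
achieved = cluster_availability(0.95, n_members)
```

With these sample values two members fall short of the target while three exceed it, so the ceiling in Equation 12 rounds the fractional result up to the next whole member.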
Therefore, we propose a weight-based method to select relevant resources as follows: for resource $C_j$, we define the weight of $C_j$ as:

\[
W(C_j) = \sum_{i=1}^{n} \left(R_{i,j}\cdot P_i\right) \tag{13}
\]

In the above formula, $R_{i,j}$ denotes the integer value defined in the workflow-resource mapping matrix, and $P_i$ denotes the availability requirement of workflow $W_i$. In this way, the priority list of resources can be determined according to the weight. Those resources which support more workflows and more availability-critical workflows will have higher weights. According to the priority list, the top $k$ resources can be selected to calculate the HA solution; the calculated solution will be a near-optimal solution for only the $k$ candidate resources which are taken into consideration, but the calculation complexity can be greatly reduced according to the selected number $k$.

D. Computational Complexity Analysis

In this section, we analyze the computational complexity of the conventional exhaustive iteration method and of our optimal solution calculation method. Assume that there exist $n$ candidate resources which need to be HA enhanced, and that we set the upper bound for the cluster size of any resource to $k$ (which is necessary for the iteration method but not for our optimal solution calculation method). Then, for the iteration method, the computational complexity to arrive at the optimal solution is $\underbrace{k\cdot k\cdot\ldots\cdot k}_{n}$; that is, $O(k^{n})$. For our optimal solution calculation method, since the solution is calculated by solving Equation 10, the computational complexity is bounded only by the number of variables in the equations, which gives polynomial complexity: that is, $O(n^{m})$, where $m$ is a constant. Our method therefore scales with the number of candidate resources, and has much lower computational complexity than the iteration method when the number of candidate resources is large.

E. Alternative Resource Selection

In some HA requirement analysis cases, users may not be able to confirm the exact components of a resource. For example, for a DB2 HA solution, the user may not be sure whether a hot-standby solution on an x86 platform will satisfy the HA requirement or whether a mainframe solution is better. In such cases, the user may specify several candidate resource types for the resource in question. Based on the above analysis, we further propose an algorithm with alternative resource selection, as Algorithm 1 shows. Here we abstract our availability weak-point analysis methodology into a function WeakPointAnalysis(ResourceList, UtilityFunction, Topology, WorkflowList). As shown in Algorithm 1, we first generate all possible resource lists and the relevant utility functions according to the various candidate resource types specified by the user; then we use the function WeakPointAnalysis to calculate a solution for each of those resource lists. Finally, we select the best solution among those candidate solutions.

F. Example

In this section, we present a detailed example to illustrate our optimal solution calculation. Fig. 4 shows an example topology for HA enhancement. The original topology contains two candidate resources, the standalone resources $C_1$ and $C_2$, which need to be availability enhanced, and resource $C_3$ has been supported by a mainframe which needs no HA