vRealize Operations Manager Best Practices /6
Platform Best Practices
The Platform is the technical portion of the product. The best practices here are intended to give the platform a stable running environment for daily operational use. Before deploying vRealize Operations Manager, the first step is to size the environment. This section covers sizing and recommendations for after the product is deployed. Additional best practices are included for administration tasks such as backup and restore, and disaster recovery. These best practices help ensure that the platform, vRealize Operations Manager, is properly sized, running, and able to handle the monitored load efficiently.
Sizing
Storage
Approach
• Size the deployment with twelve to eighteen months of infrastructure growth
When an environment outgrows the original deployment size, performance degradation and usability problems may appear. Planning for twelve to eighteen months of infrastructure growth allows the system to keep functioning without an immediate need to resize or scale out the deployment. For example, if you anticipate 10% annual growth, increase the initial size by 15% (10% per year over 1.5 years) to obtain an eighteen-month sizing recommendation.
• Review the sizing guidelines regularly as the environment grows (resizing)
To keep the environment running with optimal parameters, review the sizing guidelines and resize the deployment if necessary. Even when growth is expected, reviewing the sizing guidelines regularly proactively prevents the performance and usability problems typically associated with undersized environments.
General Guidelines
• Validate the sizing guidelines with your actual environment
The sizing guidelines provide general estimates that require confirmation against the actual environment.
For example, the data entered into the sizing calculator may yield additional objects not captured in the actual environment, or vice versa.
• Calculate only the components that will be monitored
Some components may not need to be monitored; exclude those components from the sizing calculations.
• Size the Cluster
Analytics nodes come in multiple sizes: extra small, small, medium, large, and extra large. Use the fewest nodes possible. For example, if the recommendation is either 10 large nodes or 4 extra-large nodes, choose the 4 extra-large nodes to minimize communication across nodes.
• Size the Remote Collectors
There are two default sizes for remote collectors: standard and large. Choose the remote collector size based on the data being collected. If necessary, use multiple remote collectors so that remote collection is properly sized for the environment.
• Adjust the time series data retention to keep data only for as long as it is truly needed
The default data retention setting is six months. If only three months are needed, lower the default value. Understand what you gain from long data retention periods; longer retention does not necessarily help. Configure the retention period to suit your deployment's requirements.
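The growth and node-count guidelines above come down to simple arithmetic. The following Python sketch assumes linear growth (which reproduces the example of 10% per year becoming 15% over eighteen months) and takes node-count options as if copied from the sizing calculator. The function names and inputs are hypothetical and are not part of any VMware tool.

```python
def growth_adjusted_objects(current_objects: int, annual_growth_pct: int, months: int) -> int:
    """Project the object count after `months` of linear growth.

    Assumes linear growth, as in the 10% per year / 15% over eighteen
    months example. Uses integer math and rounds up so the projection
    never undersizes the deployment.
    """
    # Growth factor = 1 + (pct / 100) * (months / 12) = (1200 + pct * months) / 1200
    numerator = current_objects * (1200 + annual_growth_pct * months)
    return -(-numerator // 1200)  # ceiling division on integers


def fewest_nodes(options: dict[str, int]) -> tuple[str, int]:
    """Return the (node size, node count) option that uses the fewest nodes.

    `options` maps a node size to the number of nodes recommended for it
    (hypothetical input, e.g. transcribed from the sizing calculator).
    """
    size = min(options, key=lambda s: options[s])
    return size, options[size]


if __name__ == "__main__":
    print(growth_adjusted_objects(10_000, 10, 18))        # 11500
    print(fewest_nodes({"large": 10, "extra-large": 4}))  # ('extra-large', 4)
```

Feed the growth-adjusted object count, not today's count, into the sizing calculator, then pick the option with the fewest nodes.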
• Consider additional storage and IO requirements for longer data retention
When longer data retention periods are required, plan for additional storage and increased IO. For example, retail businesses may need to keep more than one year of data to account for seasonal peaks.
• Leverage the additional time series retention to keep longer historical data while minimizing the time series data retention period
The default setting for additional time series retention is thirty-six months. Adjust this value to the period you need, and lower the time series data retention period to reduce the amount of data retained.
• Only install Management Packs that are available on the VMware Solution Exchange
There are several management packs available for vRealize Operations Manager; however, only management packs certified and supported by VMware are available on the VMware Solution Exchange.
• Before adding Management Packs, verify the additional metrics they will provide
A metric's name may look correct but may not measure what you want. Confirm that the metrics from a management pack are ones you actually need and that they are used properly; otherwise, disable unnecessary metrics.
Architecture
High Availability (HA)
• Understand what HA provides (or does not provide) before enabling (or disabling) it
Enabling HA may require double the resources, because data is stored redundantly on two nodes rather than on only one node when HA is disabled. Since the data is stored on two nodes, HA reduces the total object capacity by 50%. For example, a deployment of 6 extra-large nodes supports the following maximum number of objects:
vRealize Operations Manager version | HA Disabled | HA Enabled
6.6 | 180,000 | 90,000
6.7 | 240,000 | 120,000
7.0 | 240,000 | 120,000
• HA allows the cluster to remain functional after losing only one data node
Weigh the cost of the extra resources against the benefits that HA provides.
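The 50% capacity reduction can be turned into a quick feasibility check. The values below are copied from the 6-node extra-large table in this section; the function names are hypothetical, and the check compares object counts only, not metrics or any other sizing limit.

```python
# Maximum objects supported by 6 extra-large nodes with HA disabled,
# taken from the table in this section.
MAX_OBJECTS_6_XL_NO_HA = {
    "6.6": 180_000,
    "6.7": 240_000,
    "7.0": 240_000,
}


def max_objects(version: str, ha_enabled: bool) -> int:
    """Object capacity of a 6 x extra-large cluster for a given version."""
    capacity = MAX_OBJECTS_6_XL_NO_HA[version]
    # With HA, every object's data is stored on two nodes, so usable
    # capacity is half of the non-HA figure.
    return capacity // 2 if ha_enabled else capacity


def fits(expected_objects: int, version: str, ha_enabled: bool) -> bool:
    """True if the expected object count fits within the cluster's capacity."""
    return expected_objects <= max_objects(version, ha_enabled)


if __name__ == "__main__":
    for version in MAX_OBJECTS_6_XL_NO_HA:
        print(version, max_objects(version, False), max_objects(version, True))
    # 100,000 objects fit on 6.7 with HA, but not on 6.6 with HA.
    print(fits(100_000, "6.7", True), fits(100_000, "6.6", True))  # True False
```

Running the check with both HA settings for the same object count makes the resource cost of HA explicit before deciding whether to enable it.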
• Enable HA only after all nodes in the cluster have been added and are online
Add all data nodes to the cluster before enabling HA. On new deployments, add data nodes until the cluster fits the appropriate sizing, then enable HA. When adding data nodes to an existing cluster, add as many data nodes as necessary, then enable HA. The goal is to minimize the number of times HA is enabled; the process of enabling HA can be very disruptive, so perform it only when necessary.
• Deploy analytics cluster nodes on separate hosts for redundancy and isolation
If possible, establish a 1:1 mapping of nodes to hosts. This protects the cluster: if one host goes down, then
only one node is lost and the cluster remains functional. If a 1:1 mapping of nodes to hosts is not possible, make sure the master node and master replica node are on different hosts. This safeguards the cluster if one of these hosts goes down.
• Use anti-affinity rules to keep nodes on separate hosts in the vSphere cluster
To keep nodes on different hosts, use anti-affinity rules to prevent nodes from being grouped on the same host. The idea is to prevent multiple nodes from going down together because they run on one host.
• Name nodes independent of role
Roles may change, so naming a node after its role can be confusing. For example, a node named ‘Master’ may no longer be the actual master node after the replica node is promoted. Role-independent names avoid this confusion.
• HA is not a Disaster Recovery (DR) strategy
HA for vRealize Operations Manager is not a disaster recovery mechanism, so a separate DR solution must be used. See https://www.vmware.com/support/pubs/vmware-vrealize-suite-pubs.html. HA allows the cluster to continue running if the master node, the replica node, or one data node fails. The cluster does not recover if multiple nodes fail at the same time.
• Hosts need to be on the same storage
For performance and consistency, the hosts must use the same storage.
Remote Collectors
• Consider using Remote Collectors for local collection from larger vCenters (>7K objects)
Remote collectors help reduce bandwidth across data centers and reduce the load on the vRealize Operations Manager analytics cluster.
• Create collector groups when using multiple Remote Collectors
When using multiple remote collectors for one vCenter, create a collector group to provide high availability and redundancy.
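The node-placement guidelines from the High Availability section (a 1:1 node-to-host mapping where possible, and at minimum the master and master replica on different hosts) can be checked against an inventory export. In this sketch the node-to-host mapping, node names, and host names are hypothetical inputs; the check only reports problems and does not create the vSphere anti-affinity rules themselves.

```python
from collections import Counter


def placement_problems(placement: dict[str, str], master: str, replica: str) -> list[str]:
    """Report placement issues for analytics nodes.

    `placement` maps each analytics node name to the host it runs on
    (hypothetical inventory data). Node names are role-independent, so the
    current master and replica are passed in separately.
    """
    problems = []
    # Minimum requirement: master and master replica on different hosts.
    if placement[master] == placement[replica]:
        problems.append(f"master {master} and replica {replica} share host {placement[master]}")
    # Preferred: a 1:1 mapping of nodes to hosts.
    for host, count in sorted(Counter(placement.values()).items()):
        if count > 1:
            problems.append(f"host {host} runs {count} nodes (no 1:1 mapping)")
    return problems


if __name__ == "__main__":
    placement = {"node-a": "esx01", "node-b": "esx02", "node-c": "esx02"}
    for problem in placement_problems(placement, master="node-b", replica="node-c"):
        print(problem)
```

Passing the master and replica explicitly, rather than inferring them from node names, follows the advice to name nodes independent of role.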
• Deploy or update Remote Collectors to the same version as the Analytics nodes
Do not run mixed versions of Remote Collectors and Analytics nodes. A cluster running mixed versions is not only unsupported, it may also exhibit problems.
• Use Remote Collectors when using End Point Operations Manager (EPOps) agents
Use remote collectors to isolate collection from End Point Operations Manager agents and reduce the load on the vRealize Operations Manager analytics cluster.
• Size Remote Collectors based on the number of collected objects and metrics
Size remote collectors using the default standard and large sizes to accommodate the number of objects and metrics they will collect.
• Include Remote Collectors in the backup strategy
Include all remote collectors when taking a backup so that the entire cluster can be restored to a healthy state.
Load Balancers
• Use load balancers to provide a single UI entry point for users