Load consolidation strategy

This specification relates to the blueprint: https://blueprints.launchpad.net/watcher/+spec/basic-cloud-consolidation-integration

Problem description

Watcher is a framework which supports more energy-efficient OpenStack operations. It does this by providing access to system state information and to a set of actions which can be performed on an OpenStack installation. It is specifically designed to support different approaches to realizing energy-efficient operations; consequently, interested parties are encouraged to provide their own energy efficiency approaches and integrate them with Watcher. This specification focuses on the integration with Watcher of the rudimentary load consolidation mechanism developed at the ICCLab cloud computing research lab of the Zürcher Hochschule für Angewandte Wissenschaften (ZHAW).

The original code implementing this algorithm (not in the context of Watcher) was published here: https://github.com/icclab/cloud-consolidation

Use Cases

The use case is one in which the Administrator wants to consolidate the load on the resources to reduce the number of underutilized servers. The Administrator invokes Watcher with the Goal “VM_WORKLOAD_CONSOLIDATION”. Watcher then executes the Strategy “VM_WORKLOAD_CONSOLIDATION_STRATEGY” and presents the resulting set of Actions to the Administrator. The Administrator approves the recommended action plan (typically VM live migrations) and instructs Watcher to perform the actions.

Project Priority

Not relevant, because Watcher is not yet part of the big tent.

Proposed change

The proposed change is to add a new Goal and a new Strategy to Watcher. The new Goal is “VM_WORKLOAD_CONSOLIDATION” and the new Strategy is “VM_WORKLOAD_CONSOLIDATION_STRATEGY”. The new Strategy is designed to be a lightweight, fast consolidation mechanism which can be tuned based on experience. Its purpose is to shift the aggregate operating point of the Cluster so as to increase the number of servers with moderate to high load and minimize the number of servers with low load. It can be used in conjunction with a server management mechanism to reduce overall energy consumption.

The new Strategy will use a modified first-fit algorithm to increase server CPU and memory utilization, which ultimately frees some hosts so that they can be powered down to save energy. It comprises two phases: one that identifies servers with high load and reduces their load, and one that identifies servers with spare, underutilized capacity. Each phase operates as a first-fit algorithm, with its input ordered by utilization in a different way.
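The first-fit building block shared by both phases can be illustrated with a small generic helper. This is a sketch, not Watcher code: `first_fit` and the free-capacity figures are hypothetical, and hosts are assumed to have equal capacity so that free CPU stands in for load. It shows how the ordering of the candidates alone decides which host is chosen.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_fit(candidates: Iterable[T], fits: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate, in the given order, that satisfies fits()."""
    for candidate in candidates:
        if fits(candidate):
            return candidate
    return None


# Hypothetical free CPU (in vCPUs) per host and the demand of one VM.
free_cpu = {"h1": 2.0, "h2": 6.0, "h3": 4.0}
need = 3.0

# Same search, two orderings: least loaded hosts first vs. most loaded first.
least_loaded_first = sorted(free_cpu, key=free_cpu.get, reverse=True)
most_loaded_first = sorted(free_cpu, key=free_cpu.get)

print(first_fit(least_loaded_first, lambda h: free_cpu[h] >= need))  # h2
print(first_fit(most_loaded_first, lambda h: free_cpu[h] >= need))   # h3
```

Trying the least loaded hosts first spreads load away from a hot host, while trying the most loaded hosts first packs workloads tightly; the two phases rely on these two orderings respectively.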

This Strategy will consider each compute host's CPU utilization and memory constraints, expressed as upper utilization thresholds. These thresholds can be set relative to resource capacity and hence provide simple resource overbooking management if needed. The Strategy will not deal with any other limitations, such as the actual VM memory change rate or network constraints, and relies on a robust live migration mechanism.
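A minimal sketch of such a constraint check, assuming thresholds expressed as fractions of host capacity. The names `Host`, `Usage` and `within_thresholds`, and the default threshold values, are hypothetical illustrations rather than Watcher parameters; a threshold above 1.0 permits overbooking of that resource.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_capacity: float  # e.g. vCPUs offered by the host
    mem_capacity: float  # e.g. MiB of RAM


@dataclass
class Usage:
    cpu: float
    mem: float


def within_thresholds(host: Host, usage: Usage,
                      cpu_threshold: float = 0.8,
                      mem_threshold: float = 1.0) -> bool:
    """True if usage stays at or below the host's relative thresholds.

    Illustrative defaults only; a threshold above 1.0 allows overbooking.
    """
    return (usage.cpu <= cpu_threshold * host.cpu_capacity
            and usage.mem <= mem_threshold * host.mem_capacity)


host = Host("compute-1", cpu_capacity=16, mem_capacity=65536)
print(within_thresholds(host, Usage(cpu=12, mem=32768)))  # True: 12 <= 0.8 * 16
print(within_thresholds(host, Usage(cpu=8, mem=80000), mem_threshold=1.5))  # True: memory overbooked
```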

To predict host resource utilization, the following estimation model is used: a host's resource utilization equals the sum of the resource utilizations of the workloads (VMs) it hosts. Considering hosts H1 and H2 with a workload W running on H1, moving W from H1 to H2 results in the predicted resource utilizations H1' = H1 - W and H2' = H2 + W. The metrics relating to the VMs are taken from telemetry, while those pertaining to the hosts are available via Nova metrics.
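The additive model above can be sketched in a few lines of Python. The data layout (a VM-to-host placement map and per-VM `(cpu, mem)` tuples) and the function names are assumptions for illustration; in Watcher the values would come from telemetry and Nova rather than from literals.

```python
from collections import defaultdict
from typing import Dict, Tuple

Usage = Tuple[float, float]  # (cpu, mem)


def host_utilization(placement: Dict[str, str],
                     vm_usage: Dict[str, Usage]) -> Dict[str, Usage]:
    """Host utilization = sum of the utilizations of the VMs it hosts.

    Hosts without VMs do not appear in the result.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for vm, host in placement.items():
        cpu, mem = vm_usage[vm]
        totals[host][0] += cpu
        totals[host][1] += mem
    return {h: (c, m) for h, (c, m) in totals.items()}


def predict_migration(util: Dict[str, Usage], vm_usage: Dict[str, Usage],
                      vm: str, src: str, dst: str) -> Dict[str, Usage]:
    """Predicted utilization after moving vm: src' = src - W, dst' = dst + W."""
    new = dict(util)
    cpu, mem = vm_usage[vm]
    s_cpu, s_mem = new[src]
    d_cpu, d_mem = new.get(dst, (0.0, 0.0))
    new[src] = (s_cpu - cpu, s_mem - mem)
    new[dst] = (d_cpu + cpu, d_mem + mem)
    return new


vm_usage = {"vm1": (2.0, 2048.0), "vm2": (1.0, 1024.0), "vm3": (4.0, 4096.0)}
placement = {"vm1": "h1", "vm2": "h1", "vm3": "h2"}
util = host_utilization(placement, vm_usage)
print(predict_migration(util, vm_usage, "vm2", "h1", "h2"))
# {'h1': (2.0, 2048.0), 'h2': (5.0, 5120.0)}
```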

The strategy will work in two phases.

In the first (offload) phase, hosts are sorted by CPU utilization in decreasing order. Each host whose CPU utilization exceeds the defined threshold offloads its workloads (VMs) to the first suitable, less loaded host that can accommodate them without violating any of the constraints described above. In this phase, the workloads on a host are handled in increasing order of CPU utilization. Offloading is repeated for each overloaded host until its CPU utilization is predicted to be under the threshold. Doing so for all overloaded hosts results in a system without overloaded hosts.

In the second (consolidation) phase, the hosts are traversed in reverse order, i.e. sorted by CPU utilization in increasing order, starting with the least loaded host. For each such host, the strategy looks for the smallest possible space in which to accommodate its remaining workloads, starting with the largest workload and trying the most loaded hosts first; in this phase, the workloads are handled in decreasing order of CPU utilization. This is repeated until no workload (VM) is left on the host, in which case the host can be deactivated. The same procedure then continues with the next host, until the source and the destination host become the same.
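The two phases can be sketched in Python as follows. This is a minimal, hypothetical model, not the Watcher implementation: hosts are plain objects with CPU and memory capacities, usage follows the additive model described earlier, the thresholds `cpu_thr=0.8` and `mem_thr=1.0` are illustrative defaults, and names such as `consolidate`, `offload_phase` and `consolidation_phase` are invented for this sketch. One assumption goes beyond the text: when a host cannot be fully emptied in the second phase, the VMs that did fit are still moved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Plan = List[Tuple[str, str, str]]  # (vm, source host, destination host)


@dataclass
class Host:
    name: str
    cpu_cap: float
    mem_cap: float
    vms: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # vm -> (cpu, mem)

    def cpu(self) -> float:
        # Utilization model: host usage is the sum of its VMs' usage.
        return sum(c for c, _ in self.vms.values())

    def mem(self) -> float:
        return sum(m for _, m in self.vms.values())

    def load(self) -> float:
        return self.cpu() / self.cpu_cap


def fits(dst: Host, cpu: float, mem: float, cpu_thr: float, mem_thr: float) -> bool:
    # True if dst stays within both relative thresholds after receiving the VM.
    return (dst.cpu() + cpu <= cpu_thr * dst.cpu_cap
            and dst.mem() + mem <= mem_thr * dst.mem_cap)


def move(vm: str, src: Host, dst: Host, plan: Plan) -> None:
    # Predicted effect of a live migration: src loses W, dst gains W.
    dst.vms[vm] = src.vms.pop(vm)
    plan.append((vm, src.name, dst.name))


def offload_phase(hosts: List[Host], cpu_thr: float, mem_thr: float, plan: Plan) -> None:
    ordered = sorted(hosts, key=Host.load, reverse=True)  # most loaded first
    for src in ordered:
        for vm in sorted(src.vms, key=lambda v: src.vms[v][0]):  # smallest VM first
            if src.cpu() <= cpu_thr * src.cpu_cap:
                break  # src is (no longer) overloaded
            cpu, mem = src.vms[vm]
            for dst in reversed(ordered):  # least loaded candidate first
                if dst is not src and fits(dst, cpu, mem, cpu_thr, mem_thr):
                    move(vm, src, dst, plan)
                    break


def consolidation_phase(hosts: List[Host], cpu_thr: float, mem_thr: float,
                        plan: Plan) -> List[str]:
    ordered = sorted(hosts, key=Host.load)  # least loaded first
    freed: List[str] = []
    for i, src in enumerate(ordered):
        # Only hosts after src, most loaded first: the scan stops where
        # source and destination would become the same host.
        candidates = ordered[i + 1:][::-1]
        for vm in sorted(src.vms, key=lambda v: src.vms[v][0], reverse=True):  # largest first
            cpu, mem = src.vms[vm]
            for dst in candidates:
                if fits(dst, cpu, mem, cpu_thr, mem_thr):
                    move(vm, src, dst, plan)
                    break
        if not src.vms:
            freed.append(src.name)  # empty host: candidate for powering down
    return freed


def consolidate(hosts: List[Host], cpu_thr: float = 0.8,
                mem_thr: float = 1.0) -> Tuple[Plan, List[str]]:
    # Illustrative thresholds; the specification does not fix these values.
    plan: Plan = []
    offload_phase(hosts, cpu_thr, mem_thr, plan)
    freed = consolidation_phase(hosts, cpu_thr, mem_thr, plan)
    return plan, freed
```

For example, with three 10-vCPU hosts where h1 runs VMs using 5 and 4 vCPUs, h2 one VM using 2 and h3 one VM using 1.5, the offload phase moves the 4-vCPU VM from h1 to h3, and the consolidation phase then moves h2's VM to h3, leaving h2 empty. A VM may appear in the plan more than once if it is moved in both phases; merging such chains is left out of this sketch.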

Provided the cluster has enough spare capacity to absorb the offloaded workloads, the two phases together produce a solution whose execution leads to a consolidated system with no overloaded hosts.

This change will not affect any existing Strategies and will not affect Watcher performance.

Concretely, the new Strategy will be implemented as a class called VMWorkloadConsolidationStrategy, inheriting from BaseStrategy. The implementation will be closely based on the BasicConsolidation example in the current Watcher codebase.

Alternatives

The alternatives to this approach are to use different Goals and associated Strategies defined in Watcher.

Data model impact

None expected.

Having reviewed both the data models describing the information available to the different Strategies and the data models for the Actions, we believe that no modifications are necessary to implement this Strategy.

REST API impact

There is no impact on the REST API.

Security impact

As the Strategy only computes a new VM placement and does not perform the placement itself, no security impact is envisaged.

Notifications impact

No specific notifications associated with executing this Strategy are envisaged. (Notifications could arise from the resulting actions, but these are presumably handled in other parts of Watcher.)

Other end user impact

This capability will not have any specific impact on the API. It will have a small impact on how Watcher is used via python-watcherclient, as a new option will be available for the goal parameter of an Audit Template.

Performance Impact

No specific performance impact is envisaged. The Strategy has been designed to process hundreds of servers within a few seconds.

Other deployer impact

No specific deployer impact is envisaged.

Developer impact

This will not impact other developers working on OpenStack.

Implementation

Assignee(s)

Primary assignee:

Seán Murphy <murp>

Other contributors:

Bruno Grazioli <bwg-bruno>
Vojtech Cima <cima-vojtech>

Work Items

This task can be considered atomic: it only requires the development and testing of a single class.

Dependencies

No dependencies.

Testing

Several unit tests will be provided to test various scenarios using fake models (a mock model collector and a mock metrics collector), including edge cases such as the consolidation of an empty cluster, of randomly generated clusters, or of an overloaded cluster.
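The fake cluster models for such scenarios could be produced by a small seeded generator, paired with a check that no host exceeds its CPU threshold after consolidation. This is a sketch with hypothetical names (`random_cluster`, `overloaded_hosts`) and illustrative capacities; it stands in for, and does not reproduce, Watcher's mock model and metrics collectors.

```python
import random
from typing import Dict, List, Tuple

# (host -> (cpu_cap, mem_cap), vm -> host, vm -> (cpu, mem))
Cluster = Tuple[Dict[str, Tuple[float, float]], Dict[str, str],
                Dict[str, Tuple[float, float]]]


def random_cluster(n_hosts: int, n_vms: int, seed: int = 0,
                   cpu_cap: float = 16.0, mem_cap: float = 65536.0) -> Cluster:
    """Build a reproducible fake cluster; (0, 0) gives the empty cluster."""
    if n_hosts == 0 and n_vms > 0:
        raise ValueError("VMs need at least one host")
    rng = random.Random(seed)  # seeded so a failing scenario can be replayed
    capacities = {f"host{i}": (cpu_cap, mem_cap) for i in range(n_hosts)}
    hosts = list(capacities)
    placement: Dict[str, str] = {}
    usage: Dict[str, Tuple[float, float]] = {}
    for j in range(n_vms):
        vm = f"vm{j}"
        placement[vm] = rng.choice(hosts)
        usage[vm] = (rng.uniform(0.5, 4.0), rng.choice([1024.0, 2048.0, 4096.0]))
    return capacities, placement, usage


def overloaded_hosts(cluster: Cluster, cpu_thr: float = 0.8) -> List[str]:
    """Hosts whose summed VM CPU usage exceeds cpu_thr * CPU capacity."""
    capacities, placement, usage = cluster
    cpu = {host: 0.0 for host in capacities}
    for vm, host in placement.items():
        cpu[host] += usage[vm][0]
    return sorted(h for h, (cap, _) in capacities.items() if cpu[h] > cpu_thr * cap)


# Empty cluster: nothing to consolidate and nothing overloaded.
print(overloaded_hosts(random_cluster(0, 0)))  # []
```

A test for the overloaded-cluster scenario would run the strategy on such a cluster and then assert that `overloaded_hosts` returns an empty list.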

Testing approaches similar to those used for the basic consolidation strategy will be applied, comprising unit tests and integration tests in which a specific input is given and the output is compared against the expected output.

Documentation Impact

It will be necessary to add new content relating to this new Goal and Strategy to the documentation.

References

No references.

History

No history.