Hyper-Converge Containers

date:2017-09-01 22:00
tags:containers, hyperconverged, performance

Reduce container counts across the infra structure hosts.

To lower our deployment times and resource consumption across the board. This spec looks to remove single purpose containers that have little to no benefit on the architecture at scale.

This change groups services resulting in fewer containers. This does not mix service categories so there’s no worry of cross polluting a different service with unknown packages or unknown workloads. We’re only look to minimize the container types we have and simplify operations. By converging containers we’re removing no less than 10 steps in the container deployment process and the service setup. Operationally we’re reducing the load on operations teams managing clouds at any scale.

Problem description

When we started this project we started with the best of intentions to create a pseudo micro-service model for our system layout and container orchestration. While this works today, it does create a lot of unnecessary containers in terms of resource utilization.

Proposed change

Converge groups of containers found within the env.d directory into a single container where at all possible. Most the changes we need to get this work done have already been committed. In some instances we will need to “revert a change” to get the core functionality of this spec into master but there will be little to no development required to get the initial convergence work completed.

Once the convergence work is complete we intend to develop a set of playbooks which will allow the deployer to run an “opt-in” set of tasks which will cleanup containers and services wherever necessary. Services behind a load balanacer will need to be updated. Updates to the load balancer will be covered by the “opt-in” playbooks provided the environment is using our supported software LB (HAProxy). The “opt-in” playbooks will need to be codified, tested, and documented. Should it be decided that the hyperconverged work is to be cherry-picked to a stable branch, the new playbooks will need to first exist and be tested within our periodic gates. We should expect no playbook impact in-terms of the general deployer workflow.

Alternatives

We could leave everything as-is which carries the resource requirements we currently have along with an understanding that the resources required will grow given the fact OpenStack services, both existing and net new, are ever expanding.

Playbook/Role impact

At least one new playbook will be added allowing a deployer to cleanup old container types from the run-time and inventory should they decide to. The cleanup playbook(s) will be “opt-in” and will not be part of our normal automated deployment process.

Upgrade impact

There is no upgrade impact with this change as any existing deployment would already have the all required associations within inventory. Services would continue to function normally after this change. Greenfield deployments on the other hand would have fewer containers to manage which reduces the resource requirements while also ensuring we retain the host, network, and process separation we have today.

We will create a set of playbooks to cleanup some of the redundant containers that would exist post upgrade however the execution of this playbook would be opt-in.

Security impact

Security is not a concern within this spec however reducing the container count would reduce the potential attack surface we already have.

Performance impact

Hyperconverging containers will reduce resource consumption on physical host. Reducing the resources required to run an OpenStack cloud will improve the performance of the playbooks and the system as a whole.

End user impact

N/A

Deployer impact

Deployers will have fewer containers to manage and be concerned with as they run clouds for long periods of time.

  • Within an upgrade scenario a deployer will have the option to “opt-in” to a hyperconverged setup. This change will have no service impact on running deployments by default.

Developer impact

N/A

Dependencies

  • If we’re to test the “opt-in” cleanup playbooks we’ll need a periodic upgrade gate job. The playbooks would be executed by the upgrade gate job and post results to the ML/channel so that the OSA development team is notified of the failure.

Implementation

Assignee(s)

Primary assignee:
Kevin Carter (IRC: cloudnull) Major Hayden (IRC: mhayden)

Work items

  • Converge the containers into fewer groups
  • Create the “opt-in” container reduction playbooks
  • Document the new playbooks

Testing

  • The core functionality of this patch will be tested on every commit.
  • If the upgrade test dependencies are met we can create a code path within the periodic gates and test the “opt-in” cleanup playbooks.

Documentation impact

Documentation will be created for the “opt-in” container cleanup playbooks created.

References

N/A