Cinder Volume Active/Active support - General description

https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support

Right now the cinder-volume service can only run in an Active/Passive HA fashion. This spec proposes a possible path to support Active/Active configurations for the cinder-volume service.

This spec only provides a general description of the problem, enumerating the different issues we have to resolve without going into too much detail. It is more of a bird’s-eye view of the problem.

Each specific issue will have its own spec that gives a detailed description of the problem and a proposed solution.

Problem description

Right now the cinder-volume service only supports Active/Passive High Availability configurations, and a number of things need to change for it to support Active/Active configurations.

API Races

With the current code, API nodes are open to races that affect resources in the database, and these races will be exacerbated in Active/Active configurations.

Manager Local Locks

We have multiple local locks in the manager code of the volume nodes to prevent multiple green threads from accessing the same resource on specific operations.

This locking is local to the nodes and doesn’t extend to other nodes, so we need to solve mutual exclusion among volume nodes of the same cluster.

Job distribution

Cinder has no concept of clusters; it only has the concept of hosts, where each host implements a specific backend/service. A mechanism is needed to group hosts from the same cluster under the same conceptual unit while retaining the individual identities of the nodes belonging to the cluster, so that they can be told apart when cleaning up after crashed nodes.

Cleanup

Right now only one node can work on a specific backend, and therefore on the resources that it contains, so cleanup is done by the node itself on startup. If the node does not come up and the resources are left in a stale state, it is not a big deal.

Things are different in an Active/Active deployment: multiple nodes share the same storage back-end, and a node can only do cleanup for the resources it was working on when it died/failed.

It is also important to do proper cleanup even when a specific node does not come back to life, since other nodes from the same cluster can still manage those resources.

Data Corruption Prevention

Since multiple nodes will be accessing the same storage back-end we have to be extra careful not to access resources that are accessed by other nodes.

The most relevant case is when we lose the connection to the DB and can no longer send Service Heartbeats, since the Scheduler’s cleanup process (explained in Cleanup proposed changes) will come into play and we could have 2 different nodes accessing the same resource: one because it is still working on it, and the other because it is trying to do the cleanup.

Drivers’ Locks

Some drivers require mutual exclusion for certain operations or when accessing the same resources.

This mutual exclusion is currently done using local locks, in the same way the manager does it, and these locks need to keep working when multiple nodes are accessing the same storage back-end.

Use Cases

Operators that have hard requirements (SLAs or other reasons) to keep their cloud operational at all times, or that have higher throughput requirements, will want the option of configuring their deployments as Active/Active.

Proposed change

API Races

Races on the API nodes will be removed using compare-and-swap updates to the DB.
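As a rough illustration of the idea, a compare-and-swap update makes the state change conditional on the value the API node last read, so only one of two racing requests succeeds. The sketch below uses an in-memory SQLite table; the `volumes` table layout, the status values, and the `conditional_update` helper are assumptions made for this example, not Cinder’s actual schema or code.

```python
import sqlite3


def conditional_update(conn, volume_id, expected_status, new_status):
    """Set status to new_status only if it is still expected_status.

    Returns True if this caller won the race, False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    cur = conn.execute(
        "UPDATE volumes SET status = ? WHERE id = ? AND status = ?",
        (new_status, volume_id, expected_status),
    )
    conn.commit()
    # rowcount is 1 only when the WHERE clause still matched the expected value.
    return cur.rowcount == 1


# Hypothetical, simplified table standing in for the real DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")
conn.commit()

# Two API requests race to delete the same volume; only the first one wins.
first = conditional_update(conn, "vol-1", "available", "deleting")
second = conditional_update(conn, "vol-1", "available", "deleting")
```

The check and the update happen in a single statement, so there is no window between reading the status and writing the new one.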

Job distribution

Job distribution will add the concept of cluster to Cinder and send jobs through a topic message queue that uses the cluster instead of the host, which is what we use now.
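The effect of addressing jobs to a cluster topic can be sketched with an in-memory stand-in for the message broker. Everything here is hypothetical and only meant to show the idea: the `topic_for`, `cast`, and `consume` helpers, the `service.target` topic format, and the cluster and node names are assumptions, not the actual messaging code.

```python
import queue
from collections import defaultdict

# Hypothetical in-memory stand-in for the message broker: one queue per topic.
topics = defaultdict(queue.Queue)


def topic_for(service, cluster=None, host=None):
    """Build a topic name, preferring the cluster when one is configured."""
    target = cluster if cluster is not None else host
    return f"{service}.{target}"


def cast(topic, job):
    """Publish a job on a topic without waiting for a result."""
    topics[topic].put(job)


def consume(node, topic):
    """Take the next job from a topic on behalf of a node, if there is one."""
    try:
        return node, topics[topic].get_nowait()
    except queue.Empty:
        return node, None


# Both nodes belong to the same cluster, so they listen on the same topic.
cluster_topic = topic_for("cinder-volume", cluster="cluster1@lvm")
cast(cluster_topic, "create vol-1")
cast(cluster_topic, "create vol-2")

# Each job is delivered to exactly one of the nodes in the cluster.
handled = [consume("node-a", cluster_topic), consume("node-b", cluster_topic)]
```

Because the sender addresses the cluster rather than a host, any live node in the cluster can pick up the job, while each node keeps its own identity for cleanup purposes.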

Cleanup

Cleanup will keep track of resources that have ongoing operations and will have cleanup mechanisms on the Scheduler as well as on the Volume nodes.

Cleanup on the nodes will happen on initialization, as it does now, but we’ll also have an automatic cleanup job on the scheduler for the cases where a node with the same host name is not brought up.

The automatic cleanup mechanism will be disabled by default, and it will be possible to trigger it manually.
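A minimal sketch of how a scheduler-side job might pick the resources to clean up, assuming each tracked operation records the host working on it and that a host is considered dead once its last heartbeat is older than a timeout. The `WorkerEntry` record, the `entries_to_clean` function, and the 60 second timeout are hypothetical; the real design is left to the Cleanup spec.

```python
from dataclasses import dataclass


@dataclass
class WorkerEntry:
    """Hypothetical record of a resource with an operation in progress."""
    resource_id: str
    host: str


def entries_to_clean(entries, last_heartbeat, now, timeout=60, enabled=False):
    """Return the resource ids whose owning host looks dead.

    The mechanism is disabled by default (enabled=False); a manual trigger
    would call it with enabled=True.
    """
    if not enabled:
        return []
    # A host with no recorded heartbeat at all is treated as dead.
    return [e.resource_id for e in entries
            if now - last_heartbeat.get(e.host, float("-inf")) > timeout]


entries = [WorkerEntry("vol-1", "node-a"), WorkerEntry("vol-2", "node-b")]
last_heartbeat = {"node-a": 100.0, "node-b": 10.0}  # seconds, arbitrary clock

automatic = entries_to_clean(entries, last_heartbeat, now=120.0)
manual = entries_to_clean(entries, last_heartbeat, now=120.0, enabled=True)
```

In this example only `node-b` has missed its heartbeats, so only the resource it was working on is selected once cleanup is triggered.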

Data Corruption Prevention

When a node loses its connection to the DB, it will stop listening to new jobs from the Message Broker and halt all ongoing operations, so that it is no longer accessing resources on the Storage Backend.
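One way to picture this behaviour is a node that fences itself as soon as a heartbeat fails. The sketch below is only an assumption about how that could look: the `VolumeNode` class, its `report_state` and `run_operation` methods, and the use of `ConnectionError` to signal a lost DB connection are all hypothetical.

```python
import threading


class VolumeNode:
    """Hypothetical node that stops working when its heartbeat fails."""

    def __init__(self, send_heartbeat):
        # Callable that raises ConnectionError when the DB is unreachable.
        self._send_heartbeat = send_heartbeat
        self.accepting_jobs = True
        # Ongoing operations poll this event between steps.
        self.halt = threading.Event()

    def report_state(self):
        """Send a heartbeat; on failure stop taking jobs and halt operations."""
        try:
            self._send_heartbeat()
        except ConnectionError:
            self.accepting_jobs = False
            self.halt.set()

    def run_operation(self, steps):
        """Run steps in order, stopping as soon as a halt has been requested."""
        done = []
        for step in steps:
            if self.halt.is_set():
                break
            done.append(step)
        return done


def failing_heartbeat():
    raise ConnectionError("DB unreachable")


node = VolumeNode(failing_heartbeat)
before = node.run_operation(["step-1", "step-2"])  # runs normally
node.report_state()                                 # heartbeat fails here
after = node.run_operation(["step-3"])              # nothing is executed
```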

Manager Local Locks

The default solution will be to use a DLM with TooZ as the abstraction layer.

An alternative solution, initially left as a nice-to-have, will be available for systems that don’t want to install a DLM solution and are using drivers that don’t require distributed locking for Active/Active configurations. This solution replaces the local file locks in c-vol’s manager with a DB locking mechanism using the workers DB table (introduced by the Cleanup changes).
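The DB-based alternative can be sketched as a row that acts as the lock: whoever manages to insert the row for a resource holds it, and everyone else is rejected by a uniqueness constraint. This is only an illustration on in-memory SQLite; the column layout given to `workers` here and the `acquire` and `release` helpers are assumptions, not the schema or API that the Cleanup spec defines.

```python
import sqlite3


def acquire(conn, resource_id, node):
    """Try to take the lock on a resource; True if this node now holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO workers (resource_id, node) VALUES (?, ?)",
                (resource_id, node),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already holds a row for this resource.
        return False


def release(conn, resource_id, node):
    """Drop the lock, but only if this node is the one holding it."""
    with conn:
        conn.execute(
            "DELETE FROM workers WHERE resource_id = ? AND node = ?",
            (resource_id, node),
        )


# Hypothetical, simplified stand-in for the workers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workers (resource_id TEXT PRIMARY KEY, node TEXT)")

got_a = acquire(conn, "vol-1", "node-a")        # node-a takes the lock
got_b = acquire(conn, "vol-1", "node-b")        # node-b is rejected
release(conn, "vol-1", "node-a")
got_b_retry = acquire(conn, "vol-1", "node-b")  # now node-b can take it
```

Recording the owning node in the row also means a stale lock can be attributed to a specific node when that node needs cleaning up.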

Drivers’ Locks

We will be using a DLM solution with TooZ as the abstraction layer.

Alternatives

There are quite a number of alternatives for each of the issues we need to fix, and they are discussed in the respective specs. The exception is the Drivers’ Locks alternative, which creates a generic locking mechanism by extending the one implemented to remove the Manager Local Locks.

Data model impact

Discussed in the respective specs.

REST API impact

Discussed in the respective specs.

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

Discussed in the respective specs.

Other deployer impact

Discussed in the respective specs.

Developer impact

None

Implementation

Assignee(s)

Discussed in the respective specs.

Work Items

  • API Races

  • Job distribution

  • Cleanup

  • Data Corruption Prevention

  • Manager Local Locks

  • Drivers’ Locks

Dependencies

None

Testing

Discussed in the respective specs.

Documentation Impact

Discussed in the respective specs.

References

None