This proposal is to develop a general solution for preventing race conditions that works across services and in deployments running multiple copies of the same service (commonly known as Active/Active HA deployments).
The focus is on keeping all state in the database and protecting changes to database state with briefly held locks. In addition, concurrent operations that are mutually exclusive should fail as early as possible with a helpful error code, to simplify the retry logic of upper layers.
Certain operations in Manila should not be allowed to proceed in parallel, because the result of one operation would prevent the other operation from completing successfully.
For example, taking a snapshot of a share cannot happen while that share is being deleted. Either the snapshot must occur first, which prevents the delete, or the delete must occur first, which prevents the snapshot. Unfortunately, not enough state is stored in the database to prevent these operations from racing with each other, so in practice two API calls can both proceed through the API service to the share manager, where eventually an error occurs and one or both operations fail mysteriously.
There are multiple scenarios like the above where undefined behavior results. This specification does not attempt to enumerate all of them because the goal is to describe a mechanism for fixing these kinds of issues rather than explicitly fixing all such issues. Generally speaking, race conditions should be treated as bugs, but up until now Manila has lacked the tools to fix these bugs reliably.
More transitional states will be added so that operations that conflict with other operations can be detected explicitly by examining the share state. This spec only proposes one specific new state to address the races involving snapshots, but more generally provides a framework for resolving similar races as they are discovered.
Transitions between states will always be done while holding a distributed lock – a lock implemented by a distributed lock manager (DLM). Use of distributed locks ensures that all services see the same locking state even if services run on different nodes, and even across transient failures such as node failures and network partitions. The lock will be held only for the duration of the database test-and-set operation to minimize lock contention.
No locks will be held during calls from the share manager to the share driver. Mutual exclusion between driver calls will be achieved with state checks.
No locks will be held during RPC calls or casts.
The approach used by Cinder, which relies on elaborate SQL compare-and-swap queries, was considered but rejected for the following reasons:
New states will be added:
New states will be visible through any API that exposes state. New error conditions will also become possible, because races will be detected earlier and reported directly.
The behavioral changes related to locking will not be microversioned, as it won't be possible or desirable to emulate the old behavior once the changes are implemented. However, where new states are added, those changes will be microversioned so that clients which depend on the new states can detect that the server supports them.
Distributed locking is expected to slow down state changes moderately. Adding more state transitions will also slow down the operations that require them.
A suitable Tooz backend must be deployed and configured. Since Manila will depend on Tooz for correctness, Tooz backends that fail to meet the API contract won't be suitable.
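For illustration, a deployment might point the service at a DLM backend via a Tooz backend URL. The section and option names below follow the oslo-style ``[coordination]`` pattern used by similar OpenStack projects; they are an assumption for illustration, not confirmed Manila configuration, and the endpoint address is a placeholder.

```ini
[coordination]
# Tooz backend URL (option name assumed; verify against the deployed
# release). etcd3 or ZooKeeper are typical choices for Active/Active
# deployments; file:// backends only work on a single node.
backend_url = etcd3+http://192.0.2.10:2379
```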
This will be significant. Developers will need to follow the new model for all features that involve state changes. Care will also be needed with locks to avoid deadlocks. Holding locks for a very limited time helps, but if two locks are ever held at the same time, they must be deadlock-proofed by establishing a lock ordering.
Existing tests will help ensure no regressions, but detecting race conditions requires Rally tests or similarly high-concurrency tests.
Admin guide - need to document tooz requirements.
Developer reference - need to document state machines and the locking protocol.