Scalable Backup Service¶
https://blueprints.launchpad.net/cinder/+spec/scalable-backup-service
Because cinder backup workloads run as a dedicated service, they have the potential to run independently of the cinder volume services. Ideally, backup services would be fired up on demand in the cloud as backup workloads are generated and elastically de-commissioned as workloads go away, and backup tasks would be farmed out to the least busy of the available backup services.
Pursuit of this ideal is blocked today by a tight coupling of cinder backup service to cinder volume service.
This spec is dedicated to breaking this tight coupling of the cinder backup and volume services.
That is the first step in working towards a truly elastic, horizontally scalable backup service. When this coupling is loosened, multiple backup services can be fired up concurrently and run without any requirement that they be colocated with the volume services or, for that matter, with one another.
We expect that there will be followup work to address subsequent steps such as scheduler support for backup tasks and for dynamic generation of backup service processes using VMs or containers rather than dedicated physical nodes, but those subjects would be addressed by followup specs and are not in scope here.
One thing at a time.
Problem description¶
When the backup service backs up or restores a volume, it uses a backup driver specific to the chosen and configured backup repository to interact with, e.g., a swift or Ceph object store; an NFS, glusterfs, or generic POSIX filesystem; or Tivoli Storage Manager. This backup driver plugs in to the backup manager and provides functionality to put data in the backup repository or to retrieve it from the repository. A particular backup service process has only one backup driver and therefore only one backup repository. It may, however, perform backup and restore operations for volumes from any of the enabled backends handled by the volume service.
The backup service requires backend-specific functionality to attach to any particular volume in order to read it for backup or to write to it for restore.
Today, this backend-specific information is provided as follows. When the backup service process starts up, the backup manager loads the volume manager module for each enabled backend configured in cinder.conf. When an OpenStack tenant or administrator makes a request to create a backup from a cinder volume or to restore a backup to a cinder volume, the backup api looks at the host field for the volume in question and directs the rpc for the backup or restore operation to the backup manager on that node. There, the backup manager selects the volume manager for the relevant volume backend and invokes its driver’s backup_volume or restore_volume method, passing that method the backup object and the backup driver (swift, ceph, NFS, etc.) for the configured backup repository. Then the volume driver attaches to the volume in question, opens it, and invokes the backup driver’s backup or restore method using the file handle from the volume driver’s attach and open operation.
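A minimal sketch of this call chain, using simplified and hypothetical signatures rather than the actual cinder code, may make the coupling easier to see:

    # Sketch only: class and method names mirror the flow described above,
    # but the signatures are simplified and the helpers are hypothetical.

    class BackupManager(object):
        def __init__(self, volume_managers, backup_driver):
            # Today a volume manager (and its driver) is loaded for every
            # enabled backend inside the backup service process itself.
            self.volume_managers = volume_managers  # {backend_name: manager}
            self.backup_driver = backup_driver      # swift/ceph/NFS/... driver

        def create_backup(self, context, backup, volume):
            # The backup manager does not drive the backup itself; it hands
            # its backup driver to the volume driver for the volume's backend.
            manager = self.volume_managers[volume.backend]
            manager.driver.backup_volume(context, backup, self.backup_driver)


    class LocalVolumeDriver(object):
        def local_path(self, volume_id):
            # Stand-in for the driver-specific attach/open logic; must run on
            # the node where the backend's volume service runs.
            return '/dev/mapper/%s' % volume_id

        def backup_volume(self, context, backup, backup_service):
            # The volume driver opens the volume locally and streams it into
            # the backup repository via the backup driver it was handed.
            with open(self.local_path(backup.volume_id), 'rb') as volume_file:
                backup_service.backup(backup, volume_file)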
Thus today the backup manager does not invoke its backup driver methods directly; it finds an appropriate volume driver via the appropriate volume manager, and the volume driver invokes the appropriate backup driver. The volume driver by definition runs on the cinder node that runs the volume service for the relevant backend, so since the backup manager, volume driver, and backup driver code all run in the context of a single operating system process on a single node, the backup service is necessarily tied to the node that also runs the volume service.
When multiple pools and backends run on the same node, concurrent backups end up running into the resource constraints of a single node. Since backup tasks do not flow through a scheduler, they typically also run within the resource constraints of a single operating system process.
Use Cases¶
An OpenStack tenant or administrator needs to run backups on a schedule or restores on demand, and getting the work done in a timely manner requires many backups and restores to run concurrently.
Resource requirements to handle peak backup and restore loads exceed the capabilities of a standard physical node.
Note that the need for concurrent backup and restore operations, especially to the same backend, may historically have been artificially suppressed by the lack of support for live backups. Now that backups of in-use volumes are supported ([3] [5]), we anticipate that this need will increase significantly.
Proposed change¶
Leverage remote attach capability to move the functionality provided by the volume driver into the backup manager. That way the backup manager can invoke the backup driver directly rather than having to load the volume manager at startup and run the relevant volume manager’s driver.
Nowadays attaching a volume is done using a brick library that builds an appropriate connector for the attachment given information about the backend (iSCSI, FibreChannel, NFS, whatever) obtained by running a volume driver initialize_connection method, either directly or via an rpc to the appropriate volume service. In the case of a remote attachment, initialize_connection is invoked by rpc. This is the only part of the workflow that actually requires the volume service, and since it can be an rpc invocation, the direct load of the volume manager and volume driver can be eliminated from the backup service itself, so the backup service can run on a different node than the node for the backend’s volume service.
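As a rough sketch of the proposed direction (the os-brick calls are real library entry points, but the rpc object and helper names here are illustrative, not the exact cinder interfaces):

    from os_brick.initiator import connector as brick_connector


    def attach_volume_remotely(context, volume, volume_rpcapi, root_helper, my_ip):
        """Attach a volume from the backup service node without loading a
        volume manager locally.  Sketch only: detach and error handling omitted.
        """
        # Describe this host (initiator name, ip, etc.) so the backend can
        # export the volume to us.
        connector_properties = brick_connector.get_connector_properties(
            root_helper, my_ip, multipath=False, enforce_multipath=False)

        # The only step that truly needs the volume service: an rpc that runs
        # the volume driver's initialize_connection on the backend's node.
        conn_info = volume_rpcapi.initialize_connection(
            context, volume, connector_properties)

        # Build the right kind of connector (iSCSI, FC, NFS, ...) locally and
        # attach, yielding a device path the backup driver can read or write.
        conn = brick_connector.InitiatorConnector.factory(
            conn_info['driver_volume_type'], root_helper, use_multipath=False)
        device_info = conn.connect_volume(conn_info['data'])
        return conn, device_info['path']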
Now that backups of in-use volumes are supported, volume drivers can supply an attach_snapshot method, which is then used as an optimization instead of attaching a temporary volume copy of the source volume. In the initial implementation [6], an attach_snapshot method was added that only allows for local attaches, and the reference lvm driver explicitly uses local_path when getting volumes for backup operations [5]. As part of the work implementing this blueprint spec, we will need to revisit snapshot attachment to allow for remote attaches. Drivers like the lvm driver that cannot support remote attach for snapshots will need to fall back to using temporary volumes instead [5].
* Create a temporary volume from the original volume.
* Back up the temporary volume.
* Clean up the temporary volume.
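A rough sketch of that fallback path, with hypothetical helper and rpc names (not the actual cinder interfaces), might look like:

    def backup_via_temporary_volume(context, backup, volume, backup_driver,
                                    volume_rpcapi):
        """Fallback for drivers that cannot attach snapshots remotely.
        Sketch only: every name here is illustrative."""
        # 1. Create a temporary volume from the original volume (via rpc to
        #    the volume service that owns the backend).
        temp_volume = volume_rpcapi.create_temp_volume(context, volume)
        try:
            # 2. Back up the temporary volume: attach it remotely from the
            #    backup node (hypothetical helper; see the attach sketch
            #    earlier) and stream it through the backup driver.
            device_path = attach_remotely(context, temp_volume)
            with open(device_path, 'rb') as volume_file:
                backup_driver.backup(backup, volume_file)
        finally:
            # 3. Clean up the temporary volume.
            volume_rpcapi.delete_volume(context, temp_volume)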
Alternatives¶
Keep the existing scheme. The backup service continues to work but cannot scale out.
The node running the backup service has to be scaled up to handle the projected peak backup workload and is likely to be either underutilized or underpowered for most actual backup workloads.
Instead of loading the volume manager directly from the backup manager, expose backup_volume and restore_volume methods in the volume manager, have these invoke the corresponding volume driver’s methods by the same name, and have the backup manager use rpc to the volume service to trigger the volume manager’s backup_volume and restore_volume methods. This approach would reduce memory requirements for the backup service process since it would no longer load the volume managers for all enabled backends, and it would loosen the coupling between the backup manager and the volume service. However, the bulk of the work - data transfer, encryption, compression - is done by the backup driver, which would remain tightly coupled to the volume service.
What this specification does not solve¶
A backup task scheduler.
The backup api process can get a list of active backup services from the database and choose an rpc destination, e.g. the first service, a random choice, or round-robin among the choices. This is where a call to a scheduler could go in the future, but a scheduler is not itself in scope for this spec.
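For illustration only, such a naive choice in the backup api could look like the following sketch (helper and attribute names are hypothetical; a real scheduler could later replace this):

    import random


    def choose_backup_host(active_backup_services):
        """Pick the destination backup service for a backup/restore rpc.

        'active_backup_services' stands in for whatever the backup api reads
        from the services table; this is where a scheduler call could go."""
        if not active_backup_services:
            raise RuntimeError('no backup service is currently available')
        # Simplest possible policies: the first entry, or a random choice.
        return random.choice(active_backup_services).host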
Elastic backup service placement.
Backup services will be started more or less manually by an administrator or configured to start on boot of a node. One can imagine a dynamic mechanism for starting service VMs or containers triggered by the backup api as workloads arrive. The decoupling of backup and volume services addressed by this spec is a pre-condition for such an elastic backup service placement capability but it is only a small step towards enabling such a capability.
Service init_host cleanup¶
At startup, the current backup service code attempts to discover and clean up orphaned, incomplete backup and restore operations (e.g., operations that were in progress when the backup process itself was terminated). The backup service assumes that it is the only backup process, so that if it finds backups in creating or restoring state at startup it can safely reset their state and detach the volumes that were being backed up or restored.
This assumption is not safe if multiple backup processes can run concurrently, and on separate nodes. At startup, a backup service needs to distinguish between in-flight operations that are owned by another backup-service instance and orphaned operations.
Eventually, it will make sense for a backup service process to clean up stuff left behind either by earlier incarnations of itself or by other abnormally terminated backup processes. A solution to this general problem, however, requires a reliable capability to auto-fence oneself on connection loss, as is being developed as part of the solution for Active-Active HA for the cinder volume service [7].
Here we will align with the community decision at the Mitaka design summit to defer the auto-fencing capability and start on Active-Active HA for the cinder volume service without automatic cleanup, by restricting backup service initialization cleanup to leftovers from the same backup service.
The host field for a backup object will be set to the host for the backup service to which the backup operation is cast. The status update of the backup and the host update will be handled in a transaction. Cleanup at initialization can then be restricted to leftover objects that chain through their corresponding backup object to a host field matching oneself. A compare-and-swap DB operation will be used to prevent race conditions.
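A sketch of such a compare-and-swap update, assuming a SQLAlchemy-style backups table (column names and status values here are illustrative):

    from sqlalchemy import update


    def cast_backup_to_host(engine, backups_table, backup_id, chosen_host):
        """Atomically set the backup's host and status in one transaction,
        but only if the row is still in the expected prior state."""
        stmt = (update(backups_table)
                .where(backups_table.c.id == backup_id)
                .where(backups_table.c.status == 'queued')  # expected state
                .values(host=chosen_host, status='creating'))
        with engine.begin() as conn:      # single transaction
            result = conn.execute(stmt)
        # rowcount == 0 means another actor changed the row first, so the
        # cast (or cleanup) should not proceed.
        return result.rowcount == 1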
For an example of how the host field will be set, consider a volume with a backend handled by the volume service on node A, where backup service processes are running on node B and node C. When a backup is created using the service on node B, the host field for the backup object will be set to B. When restoring from that backup using the backup service on node C, the host field for the backup object will be set to C.
Cleanup of associated volumes, temporary volumes, and temporary snapshots will be done via rpc to the appropriate volume service host.
Note that the backup object contains a volume_id field for the volume it backs up, as well as temp_volume_id and temp_snapshot_id fields for live backups, but it does not currently keep the id of volumes to which it is restoring backups. We will need to add this field in order to determine orphaned restore-operation volumes.
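For instance, the backup versioned object might gain a field along these lines (the name restore_volume_id is illustrative; the final name would be settled during implementation):

    from oslo_versionedobjects import fields


    class Backup(object):   # stand-in for the cinder backup object
        fields = {
            'volume_id': fields.UUIDField(),
            'temp_volume_id': fields.StringField(nullable=True),
            'temp_snapshot_id': fields.StringField(nullable=True),
            # New: the volume currently being restored to, so that a backup
            # service can identify orphaned restore-target volumes.
            'restore_volume_id': fields.StringField(nullable=True),
        }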
Special Volume Driver Backup/Restore Considerations¶
Since the functionality of the current volume driver backup_volume and restore_backup methods will in this proposal move into the backup manager, these methods will no longer be needed and can be removed from the codebase. That said, some volume drivers override these with methods that apparently have a bit more “special sauce” than just preparing their volume for presentation as a block device.
We will need to analyze the codebase to root out any of these and determine how to accommodate any special needs.
An example is the vmware volume driver [4], where a “backing” and temporary vmdk file are created for the cinder volume and the temporary vmdk file is used as the backup source. We will have to determine whether all this can be done in the volume driver’s initialize_connection method during attach, or whether we will require an additional rpc hook to a prepare_backup_volume method or some such for volume drivers of this sort.
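If an extra hook does prove necessary, it could, purely as a hypothetical sketch (this is not an existing cinder interface), take a shape like:

    class VolumeManager(object):
        """Stand-in for the real volume manager; sketch only."""

        def __init__(self, driver):
            self.driver = driver

        def prepare_backup_volume(self, context, backup, volume):
            # Hypothetical rpc target: lets drivers with special backup-source
            # needs (e.g. vmware's temporary vmdk backing) do their preparation
            # on the volume service node before the backup service attaches
            # the volume remotely.
            driver_hook = getattr(self.driver, 'prepare_backup_volume', None)
            if driver_hook is None:
                return None   # most drivers need no special preparation
            return driver_hook(context, backup, volume)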
Data model impact¶
None.
REST API impact¶
None.
Security impact¶
TBD.
We need to understand exactly where it is necessary to elevate privileges when running backup and restore operations and ensure that there is no unnecessary elevation above the normal privileges of the admin or tenant requesting these operations.
Notifications impact¶
We should be able to do exactly the same backup service notifications as those done now.
Other end user impact¶
No change in function or client interaction.
Performance Impact¶
The backup process will be lighter weight since the volume manager and volume drivers are no longer loaded.
The proposed change enables running multiple backup processes as required.
Other deployer impact¶
Backup service can now run on multiple nodes and no longer has to run on the same node as the volume service handling a volume’s backend.
Developer impact¶
Enables potentially valuable future features such as a backup scheduler or elastic backup service placement.
Implementation¶
Assignee(s)¶
Primary assignee:
Tom Barron (tbarron, tpb@dyncloud.net)
Other contributors:
* LisaLi (lixiaoy11, xiaoyan.li@intel.com)
* Huang Zhiteng (winston-d, winston.d@gmail.com)
Work Items¶
Write the code. A POC is available now [1].
Determine and address any impact on existing volume drivers that have their own backup or restore methods.
Run/test the new code with multiple backup processes running on multiple nodes, other than the node or nodes where the volume services run for enabled backends.
Dependencies¶
None
Testing¶
Unit tests will be extended to cover new backup code for functionality formerly provided by volume driver.
Unit tests for the backup manager and volume drivers will be modified to reflect the removal of code from the backup service that loads the volume manager and runs volume drivers, and of code from the volume drivers that runs backup and restore operations.
Existing tempest tests should provide sufficient coverage to ensure that current functionality does not regress. Potentially new multi-node tempest tests could be added to verify distributed interactions. We should take advantage of opportunities to extend current tempest coverage for backup and add functional tests for backup when this is feasible.
Documentation Impact¶
Update with new deployment options.