Volume Replication

https://blueprints.launchpad.net/cinder/+spec/volume-replication

Volume replication is a key storage feature and a requirement for features such as high-availability and disaster recovery of applications running on top of OpenStack clouds. This blueprint is an attempt to add initial support for volume replication in Cinder, and is considered a first take which will include support for: * Replicate volumes (primary to secondary approach) * Promote a secondary to primary (and stop replication) * Re-enable replication * Test that replication is running properly

It is important to note that this is a first pass at volume replication. The process of implementing replication for drivers has uncovered a number of challenges that will be addressed in a future revision of replication that will address the ability to have different replication types and the ability to replicate across multiple backends.

While this blueprint focuses on volume replication, a related blueprint focuses on consistency groups, and replication will be extended to support it.

Use Cases

Problem description

The main use of volume replication is resiliency in presence of failures. Examples of possible failures are:

  • Storage system failure
  • Rack(s) level failure
  • Datacenter level failure

Here we specifically exclude failures like media failures, disk failures, etc. Such failures are typically addressed by local resiliency schemes.

Replication can be implemented in the following ways:

  • Host-based - Requires Nova integration
  • Storage-based
    • Typical block based approach - replication is specified between two existing volumes (or groups of volumes) on the controllers.
    • Typically file system based approach - a file (in Cinder context, the file representing a block device) placed in a directory (or group or fileset, etc) that is automatically copied to a specified remote location.

Assumptions:

  • Replication should be transparent to the end-user, failover, failback and test will be executed by the cloud admin. However, to test that the application is working, the end-user may be involved, as they will be required to verify that his application is working with the volume replica.
  • The storage admin will provide the setup and configuration to enable the actual replication between the storage systems. This could be performed at the storage back-end or storage driver level depending on the storage back-end. Specifically, storage drivers are expected to report with whom they can replicate and report this to the scheduler.
  • The cloud admin will enable the replication feature through the use of volume types.
  • The end-user will not be directly exposed to the replication feature. Selecting a volume-type will determine if the volume will be replicated, based on the actual extra-spec definition of the volume type (defined by the cloud admin).
  • Quota management: quota are consumed as 2x as two volumes are created and the consumed space is doubled. We can re-examine this mechanism after we get comments from deployers.

Proposed change

Introduction:

The proposed design provides just a framework in Cinder for backend volume drivers to implement volume replication using the facilities in the storage backend. As such, this spec provides guidance as to how volume replication should be implemented but the actual implementation will vary depending upon the backend in question.

The key to enabling replication starts with adding an extra spec to the volume type to indicate that replication is desired. That extra spec is then used in the volume driver to enable the set-up and control of replication on the storage backend in each of the different functions documented below.

Since Cinder is just providing the framework for backend volume drivers to implement replication, details of replication implementation are left to the backend to implement. The backend driver developer will need to decide for their storage backend the best way to enable replication. For instance one storage provider may feel that implementing synchronous replication is the best choice while another storage provider may choose asynchronous. A provider could also choose to make it a configurable option. Implementing volume replication in Cinder in this manner allows the greatest flexibility to the backend developer to implement replication.

It is also important to note that the developer documentation must provide examples of how this is implemented in the Storwize driver. This is an important item to note as it is not currently possible to demonstrate volume replication in Cinder’s reference implementation, LVM. Therefore developer documentation will have to serve as the reference.

Add extra-specs in the volume type to indicate replication:

  • capabilities:replication <is> True - if True, the volume is to be replicated, if supported, by the backend driver. If the option is not specified or False, then replication is not enabled. This option is required to enable replication.

Create volume with replication enabled:

  • Backend drivers that wish to enable replication will need to update their create_volume() function to check for the ‘capabilities:replication <is> True’ extra spec. It is up to the backend driver developers to implement replication in a manner that is compatible with their storage backend.

    When a replicated volume is created it is expected that the volume dictionary will be populated as follows:

    ** volume[‘replication_status’] = ‘copying’ ** volume[‘replication_extended_status’] = <driver specific value> ** volume[‘driver_data’] = <driver specific value>

    The replica volume is hidden from the end user as the end user will never need to directly interact with the replica volume. Any interaction with the replica happens through the primary volume.

    Further details around the dictionary fields above may be seen in the data “Data Model Impact” section below.

Create Volume from Snapshot:

If the volume type extra specs include ‘capabilities:replication <is> True’ for the new volume, the driver needs to create a volume replica at volume creation time and set up replication between the newly created volume and its associated replica. The volume dictionary should be populated in the same manner as create volume.

Create Cloned Volume:

If the volume type extra specs include ‘capabilities:replication <is> True’ for the new volume, the driver needs to create a volume replica at clone creation time and set up replication between the newly created volume and its associated replica. The volume dictionary should be populated in the same manner as create volume.

Create Replica Test Volume:

Create a clone of the replica (secondary) volume. This clone can then be used for testing replication to ensure that fail-over can be executed when necessary. It is important to note that this doesn’t actually execute the the promote path as the intention is not to promote the replica but it gives a method to ensure that the replica contains data and would be useful if it had to be promoted.

The administrator is able to access this functionality using the –source-replica option when creating a volume.

Delete volume:

For volumes with replication enabled the replica needs to be deleted along with the primary copy. So, if a volume type has ‘capabilities:replication <is> True’ set, the driver will need to do the additional deletion.

Get Volume Stats:

If the storage backend driver supports replication the following state should be reported: * replication = True (None or False disables replication)

Re-type volume:

Changing volume-type is the mechanism an admin can use to make an existing volume replicated, or to disable replication for a volume. Change the volume-type of a volume to a volume-type that includes ‘capabilities:replication: <is> True’ (and didn’t have it before) should result in adding a secondary copy to a volume. Change the volume-type of a volume to a volume-type that no longer includes ‘capabilities:replication: <is> True’ should result in removing the secondary copy while preserving the primary copy.

Returns either:
A boolean indicating whether the retype occurred, or A tuple (retyped, model_update) where retyped is a boolean indicating if the retype occurred, and the model_update includes changes for the volume db.

The steps to implement this would look as follows: * Do a diff[‘extra_specs’] and see if ‘replication’ is included. * If replication was enabled for the original volume_type but is not

not enabled for the new volume_type, then replication should be disabled.
  • The replica should be deleted.
  • The volume dictionary should be updated as follows:

** volume[‘replication_status’] = ‘disabled’ ** volume[‘replication_extended_status’] = None ** volume[‘driver_data’] = None * If replication was not enabled for the original volume_type but is

enabled for the new volume_type, then replication should be enabled.
  • A volume replica should be created and the replication should be set up between the volume and the newly created replica.
  • The volume dictionary should be updated as follows:

** volume[‘replication_status’] = ‘copying’ ** volume[‘replication_extended_status’] = <driver specific value> ** volume[‘driver_data’] = <driver specific value>

Get Replication Status:

This will be used to update the status of replication between the primary and secondary volume.

This function is called by the “_update_replication_relationship_status” function in ‘manager.py’ and is the mechanism to update the status replication between the primary and secondary copies.

The actual state of the replication, as the storage backed is aware of, should be returned and the Cinder database should be updated to reflect the status reported from the storage backend.

It is expected that the following model update for the volume will happen:

  • volume[‘replication_status’] = <error | copying | active | active-stopped |
    inactive>

** ‘error’ if an error occurred with replication. ** ‘copying’ replication copying data to secondary (inconsistent) ** ‘active’ replication copying data to secondary (consistent) ** ‘active-stopped’ replication data copy on hold (consistent) ** ‘inactive’ if replication data copy is stopped (inconsistent) * volume[‘replication_extended_status’] = <driver specific value> * volume[‘driver_data’] = <driver specific value>

Note for get replication status, that the replication_extended_status and driver_data may not need to be updated.

Promote replica:

Promotion of a replica means that the secondary volume will take over for the primary volume. This can be thought of as a ‘fail over’ operation. Once promotion has happened replication between the two volumes, at the storage level, should be stopped, the replica should be available to be attached and the replication status should be changed to ‘inactive’ if the change is successful, otherwise it should be ‘error’.

A model update for the volume is returned.

As with the functions above, the volume driver is expected to update the volume dictionary as follows: * volume[‘replication_status’] = <error | inactive> ** ‘error’ if an error occurred with replication. ** ‘inactive’ if replication data copy on hold (inconsistent) * volume[‘replication_extended_status’] = <driver specific value> * volume[‘driver_data’] = <driver specific value>

Re-enable replication:

Re-enabling replication would be used to fix the replication between the primary and secondary volumes. Replication would need to be re-enabled as part of the fail-back process to make the promoted volume and the old primary volume consistent again.

The volume driver returns a model update to reflect the actions taken.

The backend driver is expected to update the following volume dictionary entries: * volume[‘replication_status’] = <error | copying | active | active-stopped |

inactive>

** ‘error’ if an error occurred with replication. ** ‘copying’ replication copying data to secondary (inconsistent) ** ‘active’ replication copying data to secondary (consistent) ** ‘active-stopped’ replication data copy on hold (consistent) ** ‘inactive’ if replication data copy is stopped (inconsistent) * volume[‘replication_extended_status’] = <driver specific value> * volume[‘driver_data’] = <driver specific value>

Notes:

The replication_extended_status should be used to store information that the backend driver will need to track replication status. For instance, the Storwize driver, will use the replication_extended_status to track the primary copy status and synchronization status for the primary volume and the copy status, synchronization status and synchronization progress for the replica (secondary) volume.

The driver_data field may be, optionally, used to contain any additional data that the backend driver may require. Some backend drivers may not need to use the driver_data field.

Driver API:

  • promote: Promotes a replica that is in active or active-stopped state to
    be the primary.
  • reenable: Reenables replication on a volume that is in inactive,
    active-stopped or error status.

Alternatives

Replication can be performed outside of Cinder, and OpenStack can be unaware of it. However, this requires vendor specific scripts, and is not visible to the admin user, as only the storage system admin will see the replica and the state of the replication. Also all recovery actions (failover, failback) will require both the the storage and cloud admins to work together. While replication in Cinder reduces the role of the storage admin to only the setup phase, and the cloud admin is responsible for failover and failback with (typically) no need for intervention from the cloud admin.

Data model impact

  • The volumes table will be updated:
** Add replication_status column (string) for indicating the status of
replication for a give volume. Possible values are:
*** ‘copying’ - Data is being copied between volumes, the secondary is
inconsistent.

* ‘disabled’ - Volume replication is disabled. * ‘error’ - Replication is in error state. *** ‘active’ - Data is being copied to the secondary and the secondary is

consistent.
*** ‘active-stopped’ - Data is not being copied to the secondary (on hold),
the secondary volume is consistent.
*** ‘inactive’ - Data is not being copied to the secondary, the secondary
copy is inconsistent.
** Add replication_extended_status column to contain details with regards
to replication status of the primary and secondary volumes.
** Add replication_driver_data column to contain additional details that
may be needed by a vendor’s driver to implement replication on a backend.

State diagram for replication (status)

<start>
                                         any error
                                         condition    +-------+
Create volume   +-----+                +------------> | error |
                      |                               +---+---+
                      |                                   | Storage admin to
                      |                                   | fix, and status
                      |                                   | check will update
                +-----+-----+                             |
+-------------> |  copying  |           any state <-------+
|               +-----+-----+
|                    |
|             status |
|             check  |       status check
|               +----+-----+ +------> +----------------+
|               | active   |          | active-stopped |
|               +----+-----+ <------+ +----------------+
|                    |       status check
|                    |
|                    | promote to primary
|                    |
| re-enable     +----+-----+
+------------+  | inactive |
                +----------+

<end>

REST API impact

Create volume API will have “source-replica” added:

{

“volume”: {

“source-replica”: “Volume uuid of primary to clone”,

}

}

  • Promote volume to be the primary volume
    • Promote the secondary copy to be primary. the primary will become secondary and Replication should become inactive.
    • Method type: POST
    • Normal Response Code: 202
    • Expected error http response code(s)
      • 500: Replication is not enabled for volume
      • 500: Replication status for volume must be active or active-stopped, but current status is: <status>
      • 500: Volume status for volume must be available, but current status is: <status>
    • V2/<tenant id>/volumes/os-promote-replica/<volume uuid>
    • This API has no body
  • Re-enable replication between the primary and secondary volume.
    • Re-enable the replication between the primary and secondary volume. Typically follows a promote operation on the replication.
    • Method type: POST
    • Normal Response Code: 202
    • Expected error http response code(s)
      • 500: Replication is not enabled
      • 500: Replication status for volume must be inactive, active-stopped, or error, but current status is: <status>
    • /v2/<tenant id>/volumes/os-reenable-replica/<volume uuid>
    • This API has no body

Security impact

  • Does this change touch sensitive data such as tokens, keys, or user data? No.
  • Does this change alter the API in a way that may impact security, such as a new way to access sensitive information or a new way to login? No.
  • Does this change involve cryptography or hashing? No.
  • Does this change require the use of sudo or any elevated privileges? No.
  • Does this change involve using or parsing user-provided data? This could be directly at the API level or indirectly such as changes to a cache layer. No.
  • Can this change enable a resource exhaustion attack, such as allowing a single API interaction to consume significant server resources? Some examples of this include launching subprocesses for each connection, or entity expansion attacks in XML. Yes, enabling replication consume cloud and storage resources.

Notifications impact

Will add notification for promoting and re-enabling replication for volumes.

Other end user impact

  • End-user to use volume types to enable replication.
  • Cloud admin to use the replication-promote, replication-reenable and create –source-replica commands in the python-cinderclient to execute failover, failback and test.

Performance Impact

  • Extra db calls identifying if replication exists are added to retype, snapshot operations, etc will add a small latency to these functions.

Other deployer impact

  • Added options for volume types (see above)
  • Add new driver capabilities, needs to be supported by the volume drivers, which may imply changes to the driver configuration options.
  • This change will require explicit enablement (to be used by users) from the cloud administrator.

Developer impact

  • Change to the driver API is noted above. Third party backends that wish to enable replication will need to add replication support to their driver.
  • The API will expand to include consistency groups following the merge of consistency group support to Cinder.

Implementation

Assignee(s)

Primary assignee:
ronenkat
Other contributors:
Jay Bryant - E-Mail: jsbryant@us.ibm.com IRC: jungleboyj

Work Items

  • Cinder public (admin) APIs for replication
  • DB schema updates for replication
  • Cinder driver API additions for replication
  • Cinder manager update for replication
  • Testing

Dependencies

Testing

  • Testing in gate is not supported due to the following considerations:
    • LVM has no replication support, to be addressed using DRBD in a separate contribution.
    • requires setting up at least two nodes using DRBD
  • Should be discussed/addressed as support for LVM is added.
  • 3rd party driver CI will be expected to test replication.

Documentation Impact

  • Public (admin) API changes.
  • Details how replication is used by leveraging volume types.
  • Driver docs explaining how replication is setup for each driver.
  • Provide examples of volume replication implementation for the Storwize backend.

References

Etherpad on improvements needed in documentation:
https://etherpad.openstack.org/p/cinder-replication-redoc