Deletion of volumes with associated snapshots

https://blueprints.launchpad.net/cinder/+spec/del-vols-with-snaps

Allow deletion of volumes with existing snapshots. The proposal is to integrate this with the existing volume delete path, using an additional parameter to request deletion of snapshots as well when deleting a volume.

Problem description

When deleting a volume, the delete operation may fail due to snapshots existing for that volume. The caller is forced to examine snapshot information and make many calls to remove snapshots, even if they are not interested in the snapshots at all, and just want the volume gone.

Since snapshots are “children” of volumes in our model of the world, it is reasonable to allow a volume and its snapshots to be removed in one operation both for usability and performance reasons.

Use Cases

  • More friendly and expected behavior for end-users.

Currently, if a volume has snapshots, the basic user experience is:
  1. Try to delete volume

  2. Get back error message about it having snapshots

  3. Go delete X snapshots manually, become slightly frustrated

  4. Delete the volume

  • Simpler for other software integrating with Cinder.

I received a request for this functionality because a project integrating with Cinder would like to be able to just delete volumes without having to implement this logic themselves. I think that is a reasonable point of view (just as it is reasonable that an end user should not have to handle it).

  • Faster and more efficient for some volume drivers.

Requiring back-and-forth between cinder-volume and the backend to delete a volume and its snapshots one at a time incurs two performance costs:

  1. Extra time spent issuing and checking the status of each individual snapshot delete request.

  2. Time spent merging snapshot data into another snapshot or volume that is itself about to be deleted.

This means that we currently force a “delete it all” operation to take more I/O and time than it really needs to, with the degree depending on the particular backend.

Proposed change

A volume delete operation should handle this when the caller explicitly requests it via a new parameter on the delete call (see the REST API impact section below), rather than by default.

Phase 1:

This is the generic/“non-optimized” case which will work with any volume driver.

When a volume delete request is received (a sketch of this flow follows the list):
  1. Look for snapshots belonging to the volume, set them all to “deleting” status.

  2. Set the volume to “deleting” status. (Done after #1 so as not to diverge more than needed from our current model of state transitions.)

  3. Issue a snapshot delete for each snapshot. (This loop happens in the volume manager.)

  4. If any snapshot delete operation fails, fail the overall operation and return the volume to ‘available’. Any snapshots that were successfully deleted remain deleted; any that failed to delete are marked ‘error_deleting’. Whether to continue deleting the remaining snapshots after an error, or to stop, likely depends on the type of error; some implementation experience will be needed to settle these details.

  5. Volume manager now moves all snapshots in ‘deleting’ state to deleted. (volume_destroy/snapshot_destroy)
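The following is a rough, framework-independent sketch of this generic flow. The ‘driver’ and ‘db’ helper objects and their method names are illustrative stand-ins, not the actual cinder.volume.manager code; only the status values come from the steps above:

    # Illustrative sketch of the Phase 1 (generic) flow; not actual
    # Cinder code.  'driver' and 'db' are stand-in helper objects.
    def cascade_delete_volume(driver, db, volume, snapshots):
        # Steps 1 and 2: mark snapshots, then the volume, as 'deleting'.
        for snap in snapshots:
            db.snapshot_update(snap['id'], {'status': 'deleting'})
        db.volume_update(volume['id'], {'status': 'deleting'})

        # Steps 3 and 4: delete each snapshot via the normal driver call.
        # On failure, mark that snapshot 'error_deleting' and return the
        # volume to 'available'; snapshots already deleted stay deleted.
        for snap in snapshots:
            try:
                driver.delete_snapshot(snap)
                db.snapshot_destroy(snap['id'])
            except Exception:
                db.snapshot_update(snap['id'],
                                   {'status': 'error_deleting'})
                db.volume_update(volume['id'], {'status': 'available'})
                raise

        # Step 5: all snapshots are gone; delete the volume itself.
        driver.delete_volume(volume)
        db.volume_destroy(volume['id'])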

Phase 2:

This case is for volume drivers that wish to handle mass volume/snapshot deletion in an optimized fashion.

When a volume delete request is received:

Starting in the volume manager (a sketch of this flow follows the list)…

  1. Check for a driver capability of ‘volume_with_snapshots_delete’. (Name TBD.) This will be a new abc driver feature.

  2. If the driver supports this, call driver.delete_volume_and_snapshots(). This will be passed the volume, and a list of all relevant snapshots.

  3. If the driver raises no exception, everything is assumed to have been successfully deleted. The driver may, however, return information indicating that the volume itself is intact but snapshot operations failed.

  4. Volume manager now moves all snapshots and the volume from ‘deleting’ to deleted. (volume_destroy/snapshot_destroy)

  5. If an exception occurred, set the volume and all snapshots to ‘error_deleting’. We don’t have enough information to do anything else safely.

  6. The driver returns a list of dicts indicating the new statuses of the volume and each snapshot. This allows handling cases where deletion of some things succeeded but the process did not complete.
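A sketch of how the manager-side dispatch for this optimized path might look. The capability key and driver method name are taken from this spec (both still tentative); the surrounding helpers and the exact shape of the returned status list are assumptions:

    # Illustrative sketch of the Phase 2 dispatch; not actual Cinder code.
    # 'cascade_delete_volume' is the Phase 1 sketch above; the
    # returned-status format is an assumption.
    def optimized_cascade_delete(driver, db, volume, snapshots):
        capabilities = driver.get_volume_stats() or {}
        if not capabilities.get('volume_with_snapshots_delete'):
            # Driver lacks the optimized interface; fall back to the
            # generic Phase 1 loop.
            return cascade_delete_volume(driver, db, volume, snapshots)

        try:
            results = driver.delete_volume_and_snapshots(volume, snapshots)
        except Exception:
            # We cannot tell what was actually removed, so mark the volume
            # and every snapshot 'error_deleting' rather than guess.
            db.volume_update(volume['id'], {'status': 'error_deleting'})
            for snap in snapshots:
                db.snapshot_update(snap['id'],
                                   {'status': 'error_deleting'})
            raise

        # No exception: apply any per-object statuses the driver reported
        # and destroy everything reported (or assumed) deleted.
        reported = dict((r['id'], r['status']) for r in results or [])
        for snap in snapshots:
            status = reported.get(snap['id'], 'deleted')
            if status == 'deleted':
                db.snapshot_destroy(snap['id'])
            else:
                db.snapshot_update(snap['id'], {'status': status})
        if reported.get(volume['id'], 'deleted') == 'deleted':
            db.volume_destroy(volume['id'])
        else:
            db.volume_update(volume['id'],
                             {'status': reported[volume['id']]})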

Alternatives

  • Implement as the default behavior in volume delete. - Deemed not a suitable change at this time.

  • Introduce this as a separate volume_action instead of in the standard volume delete path. - Does not help usability without client modifications.

Data model impact

No direct impact.

In implementation, we need to ensure we don’t end up with strange things like a volume in a “deleting” status that has snapshots in “available” status. Thus, failures to delete a single snapshot in this model may cascade to marking the volume and all other associated snapshots as errored. (Only relevant for phase 2 above. This doesn’t happen if we leave the snapshot and volume delete operations separate internally.)

REST API impact

Add a boolean parameter “delete_snapshots” to the delete volume call, which defaults to false.

A volume delete for a volume with snapshots, which previously returned 400, will now succeed when delete_snapshots is set to true.
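For illustration only, such a request might look like the sketch below. The spec does not fix how the flag is transported (query string vs. request body), so passing it as a query parameter here is an assumption, and the endpoint, token, and IDs are placeholders:

    # Hypothetical request sketch.  Passing "delete_snapshots" as a query
    # parameter is an assumption; endpoint, token, and IDs are placeholders.
    import requests

    CINDER = "http://cinder.example.com:8776/v2/<tenant_id>"  # placeholder
    headers = {"X-Auth-Token": "<auth-token>"}                # placeholder

    resp = requests.delete(
        CINDER + "/volumes/<volume-uuid>",
        params={"delete_snapshots": "true"},  # omitted -> defaults to false
        headers=headers,
    )
    # Success is the usual 202 Accepted; with snapshots present and the
    # flag unset (or on older releases), this is where the 400 came from.
    print(resp.status_code)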

Security impact

None.

Notifications impact

None.

All snapshot/volume delete notifications will still be fired.

Other end user impact

New --delete-snapshots parameter for volume-delete in cinderclient.
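On the Python bindings side, usage could eventually look like the sketch below. The delete_snapshots keyword is this spec's proposal, not an existing cinderclient argument, and the credentials are placeholders:

    # Hypothetical python-cinderclient usage once the new flag exists.
    from cinderclient import client

    USERNAME = "<username>"      # placeholder
    PASSWORD = "<password>"      # placeholder
    PROJECT_ID = "<project-id>"  # placeholder
    AUTH_URL = "http://keystone.example.com:5000/v2.0"  # placeholder

    cinder = client.Client('2', USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)

    # delete_snapshots is the keyword proposed by this spec, not a
    # current argument of volumes.delete().
    cinder.volumes.delete('<volume-uuid>', delete_snapshots=True)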

Performance Impact

  • Someone deleting a volume and all snapshots should be able to achieve this more quickly, and with fewer REST calls.

  • Some storage backends will experience less load due to not having to merge snapshots being deleted.

Other deployer impact

None.

Developer impact

  • New, optional, driver interface:
    def delete_volume_and_snapshots(volume, snapshots):

    This should take whatever driver-specific steps are needed to delete the snapshots and associated volume data (a sketch follows this list).

    The assumption can be made that any failed snapshot delete results in a failed volume, so this does not have to account for partial failures.

  • Note: None of this has to happen at a level above the volume manager since the volume manager handles all related status updates.
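A sketch of what the optional abc interface might look like, following the style of Cinder's existing abc-based driver interfaces; the method name comes from this spec, while the class name and docstring wording are assumptions:

    # Illustrative sketch of the optional driver interface; not actual
    # Cinder code.
    import abc

    import six


    @six.add_metaclass(abc.ABCMeta)
    class VolumeWithSnapshotsDeletionVD(object):
        """Interface for drivers that can delete a volume together with
        all of its snapshots in a single backend operation."""

        @abc.abstractmethod
        def delete_volume_and_snapshots(self, volume, snapshots):
            """Delete 'volume' and every snapshot in 'snapshots'.

            Raising an exception tells the manager to mark the volume and
            its snapshots 'error_deleting'; returning per-object status
            information allows reporting partial results instead.
            """
            return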

Implementation

Assignee(s)

Primary assignee:

eharney (spec, some implementation)

Other contributors:

Other associates (implementation)

Work Items

Investigation:

  • Understand interaction w/ public/shared snapshots.

Implementation:

Rough order should be:

  • Add parsing for new parameter to volume delete API

  • Implement volume manager logic to delete everything

  • Create an abc class for the new driver interface

  • Implement volume manager logic to talk to the new driver interface

  • Implement an optimized case for the LVM driver

Dependencies

None

Testing

Tempest tests will be added to cover this.

Documentation Impact

Need to document the new behavior of the volume delete call, as well as related client examples, etc.

References