Clean up dangling volume block device mappings

https://blueprints.launchpad.net/nova/+spec/cleanup-dangling-volume-attachments

Find out whether there are any dangling/unattached volumes in the Nova and Cinder databases and remove them if they exist.

Problem description

After some volume-related operations, a volume can end up detached from an instance without Nova being notified. Nova then believes the volume is still attached, because the volume attachment ID is still listed in Nova's BDM table.

This can break functionality that requires volume details from the block_device_mapping table, such as live migration and resizing of an instance.

The reverse can also happen: an attachment for the instance exists on the Cinder side but not in the Nova DB.

Use Cases

  • As an operator, I want all dangling volume attachments safely removed from my instance, because leaving these attachments in the BDM table may put the instance into an error state on startup.

  • As an operator, I want all dangling volume attachments safely removed from my instance, so that volume-related operations are not affected.

  • As an admin, I want all dangling attachments in Cinder that claim to belong to the instance safely removed from the Cinder DB.

Proposed change

Notes

To spawn a new instance, Nova retrieves a copy of the base OS image from Glance. This image serves as the instance's storage, which means any file we create persists in it. Nova creates a BDM for it in the block_device_mapping database with source_type set to image and destination_type set to local.

Similarly, when we ask Nova to attach a volume to an instance, Nova creates a BDM for it in the block_device_mapping database and sets both source_type and destination_type to volume.
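The distinction between the two kinds of BDM described above can be sketched as follows. This is only an illustration: the BDM dataclass and the is_volume_bdm() helper are hypothetical names, not Nova's actual objects, and the check follows the source_type/destination_type values stated in this spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BDM:
    """Hypothetical stand-in for a row of the block_device_mapping table."""
    source_type: str
    destination_type: str
    volume_id: Optional[str] = None


def is_volume_bdm(bdm: BDM) -> bool:
    """Return True for an attached-volume BDM as described in this spec."""
    return bdm.source_type == "volume" and bdm.destination_type == "volume"


# The instance's image-backed storage and one attached volume.
root_disk = BDM(source_type="image", destination_type="local")
data_volume = BDM(source_type="volume", destination_type="volume",
                  volume_id="vol-1")
```

Only BDMs for which is_volume_bdm() returns True would be candidates for the cleanup; the image/local BDM is never touched.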

Changes

While restarting the instance, verify, on the basis of source_type and destination_type, whether the attached BDM is a volume. If it is a volume, verify whether this volume exists in Cinder. If it exists, check whether its status is “in-use” or “available”. If it is “in-use”, the volume attachment is correct, and both Nova and Cinder are aware of it. If it is “available”, the volume is not properly attached to the instance, so remove or soft delete the BDM from the block_device_mapping database.
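The status check above reduces to a small decision function. The sketch below is hypothetical (bdm_action() is not an existing Nova function). It treats a volume that is missing from Cinder as dangling, as the Code Changes section describes, and it assumes that any status other than “in-use” or “available” leaves the BDM untouched, which this spec does not state explicitly.

```python
from typing import Optional


def bdm_action(volume_status: Optional[str]) -> str:
    """Decide what to do with a volume BDM.

    volume_status is the status Cinder reports for the volume, or None
    when the volume does not exist in Cinder.
    """
    if volume_status is None:
        # Volume is not listed in Cinder: the BDM is dangling.
        return "delete"
    if volume_status == "available":
        # Cinder does not consider the volume attached.
        return "delete"
    if volume_status == "in-use":
        # Nova and Cinder agree on the attachment.
        return "keep"
    # Assumption: transitional or unknown states are left alone.
    return "keep"
```

Keeping the BDM for unlisted statuses is the conservative choice here, since deleting a mapping for a volume that is mid-operation could make matters worse.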

Also log the update, so operators are aware of the reason for this modification in the database.

Code Changes

To delete the BDMs from the database, we must first shut down the instance, so that the instance domain gets redefined at the virt level. We need to make sure the BDMs are updated before the new XML is generated.

Hence, this functionality should be added to the instance reboot process. While rebooting, update the block_device_mapping DB on the Nova side and the volume_attachment DB on the Cinder side via a Cinder API call. Once the instance has shut off properly and is starting again, the virt-level driver module (such as libvirt) will generate a new domain XML with the updated BDMs.

A function _delete_dangling_bdms() should be added to ComputeManager and called from ComputeManager.reboot_instance. It should verify that the target BDM's source and destination types are volume rather than image and local. Then, if the target volume is not listed in Cinder, or its status in Cinder is ‘available’ rather than ‘in-use’, it should delete the BDM from the block_device_mapping table.

Once a dangling volume is found, log a message stating that stale volume attachments are being removed.
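A minimal, self-contained sketch of this cleanup logic is shown below. It is not the actual Nova implementation: BDM, FakeCinder, VolumeNotFound and delete_dangling_bdms() are hypothetical stand-ins that model the database rows and the Cinder lookup in memory, so that the filtering and logging steps can be read in isolation.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional

LOG = logging.getLogger(__name__)


@dataclass
class BDM:
    """Hypothetical stand-in for a block_device_mapping row."""
    volume_id: Optional[str]
    source_type: str
    destination_type: str


class VolumeNotFound(Exception):
    """Raised by the fake client when Cinder does not know the volume."""


class FakeCinder:
    """In-memory stand-in for the Cinder volume lookup."""

    def __init__(self, statuses: Dict[str, str]):
        self._statuses = statuses

    def get_status(self, volume_id: str) -> str:
        try:
            return self._statuses[volume_id]
        except KeyError:
            raise VolumeNotFound(volume_id)


def delete_dangling_bdms(instance_uuid: str, bdms: List[BDM],
                         cinder: FakeCinder) -> List[BDM]:
    """Return the BDMs that remain after dropping dangling volume BDMs."""
    kept = []
    for bdm in bdms:
        is_volume = (bdm.source_type == "volume"
                     and bdm.destination_type == "volume")
        if not is_volume:
            # Image/local BDMs are never considered for cleanup.
            kept.append(bdm)
            continue
        try:
            status = cinder.get_status(bdm.volume_id)
        except VolumeNotFound:
            status = None
        if status is None or status == "available":
            LOG.warning("Removing stale volume attachment: instance=%s "
                        "volume=%s cinder_status=%s",
                        instance_uuid, bdm.volume_id, status)
            continue
        kept.append(bdm)
    return kept


bdms = [
    BDM(None, "image", "local"),        # instance storage
    BDM("vol-a", "volume", "volume"),   # in-use in Cinder
    BDM("vol-b", "volume", "volume"),   # available in Cinder
    BDM("vol-c", "volume", "volume"),   # unknown to Cinder
]
cinder = FakeCinder({"vol-a": "in-use", "vol-b": "available"})
kept = delete_dangling_bdms("inst-1", bdms, cinder)
```

In this example only the image-backed BDM and the one for vol-a survive. A real implementation would delete or soft delete the rows and call the Cinder API instead of returning a filtered list.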

Alternatives

  • A cleanup command for the nova-manage utility, which takes an instance and removes all dangling volumes from it.

    $ nova-manage volume_attachment cleanup <server-id>
    
  • A cron job that checks each instance in the Nova BDM and Cinder volume_attachment tables and, if an instance has dangling volumes, removes the volume entries from the tables. This job does not require an instance UUID.
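The comparison that a periodic job of this kind would perform can be sketched as a set difference per instance. The function and the input shape below are assumptions made for illustration: each argument maps an instance UUID to the set of attachment IDs recorded on that side.

```python
from typing import Dict, Set


def find_dangling(nova: Dict[str, Set[str]],
                  cinder: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Report attachment IDs that only one of the two services knows about."""
    report = {}
    for instance_uuid in sorted(set(nova) | set(cinder)):
        nova_ids = nova.get(instance_uuid, set())
        cinder_ids = cinder.get(instance_uuid, set())
        nova_only = nova_ids - cinder_ids      # stale BDM rows in Nova
        cinder_only = cinder_ids - nova_ids    # stale attachments in Cinder
        if nova_only or cinder_only:
            report[instance_uuid] = {"nova_only": nova_only,
                                     "cinder_only": cinder_only}
    return report


nova_side = {"inst-1": {"att-1", "att-2"}, "inst-2": {"att-3"}}
cinder_side = {"inst-1": {"att-1"}, "inst-2": {"att-3", "att-4"}}
report = find_dangling(nova_side, cinder_side)
```

Instances whose attachment sets match on both sides do not appear in the report, so the job only acts on instances that actually have dangling entries.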

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

The server might take more time to reboot, as there will be GET and DELETE API calls to the Cinder service.

The extra time depends primarily on the number of attachments to delete.

Other deployer impact

None

Developer impact

None

Upgrade impact

None

Implementation

Assignee(s)

Primary assignee:

auniyal

Feature Liaison

Feature liaison:

None

Work Items

  • Create the cleanup functionality and add it to the instance restart process.

  • Add unit and functional tests for cleanup.

Dependencies

None

Testing

Unit and functional tests will be added.

Documentation Impact

  • A release note for cleaning up dangling volumes during server restart will be added.

  • Update the admin manage volumes doc.

References

None

History

Revisions

Release Name     Description

2023.2 Bobcat    Introduced