Volume-backed server rebuild
Currently, the compute API will fail if a user tries to rebuild a volume-backed server with a new image. This spec proposes adding support for rebuilding a volume-backed server with a new image.
Currently, Nova rebuild with a new image only supports instances that were booted from images. A volume-backed instance cannot be rebuilt when a new image is supplied; attempting to do so raises an HTTPBadRequest exception.
As a user, I would like to rebuild my volume-backed server with a new image.
First, the existing rebuild API will be changed to allow rebuilding a volume-backed server. The API flow would then be:
A new microversion will be required to opt into the new functionality. A request made with an older microversion will be rejected with a 400 response. Note that the old behaviour still allows rebuilding a volume-backed server with the same image, in which case the data on the volume is not wiped out. To prevent users from accidentally destroying all their data, we require them to use the new microversion.
Nova will check whether the Cinder API microversion is new enough to support re-imaging the boot volume. If not, a CinderAPIVersionNotAvailable exception will be raised.
In the case of multiattach volumes, n-api will reject the request, since rebuilding multiattach volumes requires complex attachment handling and the effort would outweigh the benefit.
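The API-side checks above can be sketched as one validation function that either returns (the request proceeds and gets a 202) or raises. This is a minimal sketch, not Nova's code: the exception classes are local stand-ins, RootVolume and validate_volume_backed_rebuild are hypothetical names, and the two version tuples are assumed values that should be checked against the Nova and Cinder API references.

```python
from dataclasses import dataclass

# Assumed minimum versions; verify against the Nova and Cinder API references.
NOVA_MIN_MICROVERSION = (2, 93)        # compute API opt-in (assumption)
CINDER_REIMAGE_MICROVERSION = (3, 68)  # Cinder re-image support (assumption)


class HTTPBadRequest(Exception):
    """Local stand-in for the 400 response returned by the API layer."""


class CinderAPIVersionNotAvailable(Exception):
    """Local stand-in for the exception named in this spec."""


@dataclass
class RootVolume:
    multiattach: bool


def validate_volume_backed_rebuild(request_version, cinder_max_version,
                                   root_volume):
    """Validate a rebuild of a volume-backed server with a NEW image.

    Returns None if the rebuild may proceed; raises otherwise.
    """
    if request_version < NOVA_MIN_MICROVERSION:
        # Old microversion: refuse rather than silently wipe the volume.
        raise HTTPBadRequest(
            "a newer microversion is required to re-image the boot volume")
    if cinder_max_version < CINDER_REIMAGE_MICROVERSION:
        raise CinderAPIVersionNotAvailable(
            "the Cinder API is too old to re-image volumes")
    if root_volume.multiattach:
        # Multiattach attachment handling is out of scope for this feature.
        raise HTTPBadRequest("rebuilding multiattach volumes is not supported")
```

Checking the microversion first means an old client gets the same 400 it receives today, regardless of the Cinder deployment.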
Then the nova-compute manager will perform the following steps:
Create an empty (no connector) volume attachment for the volume and server. This ensures the volume remains ‘reserved’ through the next step.
Delete the existing volume attachment (the old one).
Save the new attachment UUID to the BDM.
The above two steps are needed to keep the volume in the ‘reserved’ state, a management state that Cinder requires in order to perform the re-image operation on it.
Call the new Cinder API to re-image the volume.
Wait for Cinder to complete the re-image using a new ‘volume-reimaged’ external event, like the one used for volume-extend. See perform_resize_volume_online for details.
After the re-image operation completes successfully, Cinder will notify Nova via the external events API.
Call Cinder to update the empty volume attachment with the host connector info; Cinder will return the connection info to Nova.
After Nova completes the connection with os-brick, complete the attachment, marking the volume ‘in-use’.
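The compute-side sequence can be illustrated with an in-memory fake. Everything here is hypothetical: FakeVolumeAPI, its method names, and rebuild_boot_volume only mimic the shape of the attachment workflow described above, and the ‘volume-reimaged’ event is delivered synchronously through a callback instead of through the external events API.

```python
import threading


class FakeVolumeAPI:
    """Hypothetical in-memory stand-in for Nova's Cinder client wrapper."""

    def __init__(self):
        self.calls = []
        self._next_id = 0

    def attachment_create(self, volume_id, instance_uuid, connector=None):
        self._next_id += 1
        attachment_id = f"att-{self._next_id}"
        self.calls.append(("create", attachment_id, connector))
        return attachment_id

    def attachment_delete(self, attachment_id):
        self.calls.append(("delete", attachment_id))

    def reimage(self, volume_id, image_id, on_event):
        self.calls.append(("reimage", volume_id, image_id))
        # Real Cinder works asynchronously and then calls Nova's external
        # events API; this fake fires the event immediately.
        on_event("volume-reimaged")

    def attachment_update(self, attachment_id, connector):
        self.calls.append(("update", attachment_id))
        return {"driver_volume_type": "fake"}  # fake connection info

    def attachment_complete(self, attachment_id):
        self.calls.append(("complete", attachment_id))


def rebuild_boot_volume(volume_api, bdm, instance_uuid, image_id,
                        connector, timeout=300.0):
    # Empty (connector-less) attachment keeps the volume reserved.
    new_attachment = volume_api.attachment_create(
        bdm["volume_id"], instance_uuid, connector=None)
    # Drop the old attachment and record the new one on the BDM.
    volume_api.attachment_delete(bdm["attachment_id"])
    bdm["attachment_id"] = new_attachment

    # Ask Cinder to re-image, then block until 'volume-reimaged' arrives.
    done = threading.Event()
    events = []

    def on_event(name):
        events.append(name)
        done.set()

    volume_api.reimage(bdm["volume_id"], image_id, on_event)
    if not done.wait(timeout) or events[-1] != "volume-reimaged":
        raise TimeoutError("Cinder did not report volume-reimaged")

    # Attach with the real host connector, then mark the attachment complete.
    connection_info = volume_api.attachment_update(new_attachment, connector)
    # (Nova would connect the volume on the host with os-brick here.)
    volume_api.attachment_complete(new_attachment)
    return connection_info
```

Creating the new attachment before deleting the old one means the volume never has zero attachments, which is what keeps it reserved while Cinder re-images it.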
In this process, there are some conditions that we could hit:
If the re-image fails and the volume goes to ‘error’ status, the instance status should be set to ‘error’. Since users can rebuild instances in error status, they have a way to retry the rebuild once the cause of the Cinder-side failure is resolved. Note that nova-compute will not attempt to update the volume attachment with the host connector again when the volume is in error status.
If the Cinder API call itself returns an HTTP error (status >= 400), nothing has changed on the root volume. In that case the instance action should be marked ‘failed’ and the instance status should revert to what it was before the rebuild (similar to how _error_out_instance_on_exception is used).
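The two failure cases can be sketched as a context manager in the spirit of _error_out_instance_on_exception. This is an illustration only: CinderAPIError, VolumeReimageFailed, reimage_error_handling, and the instance dict are hypothetical, and the status strings are assumed to mirror the server statuses shown by the compute API.

```python
import contextlib


class CinderAPIError(Exception):
    """Hypothetical: Cinder rejected the re-image call (HTTP >= 400)."""


class VolumeReimageFailed(Exception):
    """Hypothetical: the re-image ran but the volume ended up in 'error'."""


@contextlib.contextmanager
def reimage_error_handling(instance):
    previous = instance["status"]
    instance["status"] = "REBUILD"
    try:
        yield
    except CinderAPIError:
        # Root volume untouched: restore the prior status, fail the action.
        instance["status"] = previous
        instance["action_result"] = "failed"
        raise
    except VolumeReimageFailed:
        # Volume is in 'error': put the instance in ERROR so the user can
        # retry the rebuild once the Cinder-side problem is fixed.
        instance["status"] = "ERROR"
        raise
    # On success the caller finishes the rebuild and sets the final status.
```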
The main alternative is that nova would perform the rebuild like an initial boot from volume where nova-compute would create a new volume from the new image and then replace the root volume on the instance during rebuild.
There are issues with this, however, like what to do about the old volume:
Regarding ‘delete_on_termination’ flag in the BDM, delete_on_termination=True means: delete the volume when we kill the instance. Rebuild means: re-initialize this instance in place. The rebuild flow would have to determine what to do if the old root volume BDM was marked with delete_on_termination=True. If delete_on_termination is True, delete the old root volume, otherwise, preserve it.
We could pass a new flag to the rebuild API telling nova what to do about the old volume (delete it or not). However, if the flag says to delete the old volume and the old volume has snapshots, Nova will not delete those snapshots just to be able to delete the volume during a rebuild.
But this approach has several issues, such as quota and the open questions about what nova should do with the old volume; see References for more detailed information.
Data model impact
REST API impact
Change the rebuild request response code from 400 to 202 if the conditions described in the Proposed change section are met. The API microversion and compute RPC version will also be incremented to indicate the new support.
Other end user impact
The python-novaclient and python-openstackclient will be updated to support the new microversion.
Two additional parameters, --reimage-boot-volume and --no-reimage-boot-volume, will be added on the OpenStackClient side as a check (along with the microversion check) to determine whether the user really understands that the volume will be re-imaged.
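A client-side confirmation check could look roughly like the following. It is a plain argparse sketch, not OpenStackClient code (which is built on cliff); the program name, build_parser, and check_rebuild are hypothetical, and only the two flag names come from this spec.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="server-rebuild-sketch")
    parser.add_argument("server")
    parser.add_argument("--image")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--reimage-boot-volume", dest="reimage_boot_volume",
                       action="store_true")
    group.add_argument("--no-reimage-boot-volume", dest="reimage_boot_volume",
                       action="store_false")
    # Neither flag given -> None, so "not specified" is distinguishable.
    parser.set_defaults(reimage_boot_volume=None)
    return parser


def check_rebuild(argv, volume_backed):
    """Parse arguments and refuse to re-image a boot volume without consent."""
    args = build_parser().parse_args(argv)
    if volume_backed and args.image and args.reimage_boot_volume is not True:
        raise ValueError(
            "rebuilding a volume-backed server with a new image destroys "
            "the boot volume's data; pass --reimage-boot-volume to confirm")
    return args
```

Using a tri-state default (True, False, None) lets the client treat an omitted flag as "no consent" while still accepting an explicit --no-reimage-boot-volume.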
The operation will take longer because of the external dependency involved and the work that needs to happen in Cinder.
Other deployer impact
If the Cinder volume re-image API operation fails and the volume goes to ‘error’ status, an admin will likely need to investigate and resolve the issue in Cinder and then reset the volume status to
The API microversion and compute service version will also be incremented to indicate the new support; therefore, users will not be able to use the feature until the nova-compute service hosting a volume-backed instance is upgraded.
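The upgrade gating described above amounts to comparing the service version reported by the compute host that owns the instance with the first version that supports re-imaging. In this sketch REIMAGE_COMPUTE_VERSION, ComputeTooOld, and check_compute_supports_reimage are hypothetical, and the version number is a placeholder because the spec does not state it.

```python
# Placeholder: the spec bumps the compute service version but does not
# give the number, so this value is purely illustrative.
REIMAGE_COMPUTE_VERSION = 100


class ComputeTooOld(Exception):
    """Hypothetical error: the instance's host predates boot-volume re-imaging."""


def check_compute_supports_reimage(service_versions, instance_host,
                                   required=REIMAGE_COMPUTE_VERSION):
    """Raise unless the nova-compute on instance_host is new enough.

    service_versions maps host name -> reported service version; a host
    that has not reported a version is treated as too old.
    """
    version = service_versions.get(instance_host, 0)
    if version < required:
        raise ComputeTooOld(
            f"nova-compute on {instance_host} is at version {version}; "
            f"re-imaging the boot volume needs {required} or later")
```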
- Primary assignee:
Rajat Dhasmana <firstname.lastname@example.org> (whoami-rajat)
Add a new parameter --confirm-reimage on the client side.
Change the existing rebuild API to allow volume backed instance rebuild with a new image.
Create an empty attachment for the root volume so the volume remains in-use during rebuild (we do this today already).
Delete the old volume attachment.
Call the cinder API to re-image the volume.
Update and complete the volume attachment once re-imaged.
Adopt the new compute version.
Adopt the new microversion in python-novaclient.
Adopt the new microversion in python-openstackclient.
Depends on the Cinder blueprint for re-imaging a volume; see References for more details.
The following tests will be added:
Nova unit tests for negative scenarios
Nova functional tests for “happy path” testing
Tempest integration tests to make sure the nova/cinder integration works properly
We will replace the note in the API reference with a note about the required minimum microversion for rebuilding a volume-backed server with a new image.
The following document will be updated:
We also need to mention in the documentation that when the volume is re-imaged, all current content on the volume will be destroyed. This is important as cinder volumes are considered to be persistent, which is not the case with this operation.
Stein PTG etherpad: https://etherpad.openstack.org/p/nova-ptg-stein
This is the discussion about rebuilding the volume-backed server:
This is the discussion about what we should do about the root volume during a rebuild:
The cinder blueprint for re-imaging a volume: