Provide a way to pause VM during live migration

Blueprint: https://blueprints.launchpad.net/nova/+spec/pause-vm-during-live-migration

When live-migrating a VM, an operator may want to increase the chance that the migration succeeds, even at the cost of longer VM downtime. This spec proposes a new nova API for pausing a VM during live migration.

Problem description

The most common use case for live migration is host maintenance, for example an OpenStack upgrade to a newer version or a hardware upgrade. Hypervisors offer features such as CPU throttling and memory compression to make it possible to live migrate every VM to other hosts. However, a VM might run a workload that keeps dirtying memory and prevents the live migration from finishing. In such a case the operator might want to pause the VM during live migration to stop memory writes on the VM.

Another use case is imminent host failure, where the duration of the live migration may be crucial to keeping VMs running, regardless of the VM downtime incurred during the transition to the destination host.

Currently, to pause a VM during live migration, the operator needs to pause the VM directly through libvirt or the hypervisor. This pause is transparent to Nova, because it is the same thing that happens in the ‘pause-and-copy’ step of live migration.

Use Cases

As an operator of an OpenStack cloud, I would like the ability to pause a VM during live migration. This operation prevents the VM from dirtying memory and therefore forces the live migration to complete.

Proposed change

Add a new API method for pausing a VM during live migration. It will make an asynchronous RPC call to the compute node to pause the VM through libvirt. It will also introduce a new instance action, ‘live-migration-paused-vm’. The Migration and MigrationList objects will be used to establish which migrations exist, with additional optional data provided by the compute driver.

This will need an increment to the rpcapi version too.
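The compute-side step can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `live_migration_force_complete` and the `FakeDomain` class are hypothetical and are not defined by this spec. The only real call assumed is `suspend()` on a libvirt domain object (`libvirt.virDomain.suspend()` in the libvirt Python bindings), which pauses the guest; a stand-in class is used so that the sketch runs without libvirt installed.

```python
class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain so the sketch runs without libvirt."""

    def __init__(self):
        self.paused = False

    def suspend(self):
        # Mirrors libvirt.virDomain.suspend(), which pauses the guest.
        self.paused = True


def live_migration_force_complete(domain):
    """Hypothetical driver hook: pause the guest so it stops dirtying memory.

    `domain` is expected to behave like a libvirt.virDomain for the VM that is
    currently being live-migrated.
    """
    domain.suspend()


if __name__ == "__main__":
    domain = FakeDomain()
    live_migration_force_complete(domain)
    print(domain.paused)
```

In a real deployment the compute manager would receive the asynchronous RPC call and pass the actual libvirt domain to such a hook; error handling and the instance action record are omitted here.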

Alternatives

One alternative is to not do this and let the operator pause the VM manually through the hypervisor.

Another alternative is to reuse the existing pause operation in nova. However, this might confuse operators. Libvirt preserves the VM state that was in effect when the live migration started, and when the live migration completes, libvirt reverts the VM to that preserved state. Example workflow:

  • VM is active

  • Operator starts live migration

  • Libvirt preserves active state of a VM

  • Operator pauses VM during transition (e.g., nova pause VM)

  • Live migration finishes

  • Libvirt reverts the VM state to the preserved one, in this case active.

Because of this behavior, reusing the existing pause operation is not recommended. It would also be confusing for operators if a single operation were used for two different purposes.
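The workflow above can be illustrated with a toy model. This is not libvirt code: the class `ToyMigratingGuest` and its methods are hypothetical and only mimic the state preservation described in this section, to show why a pause issued during the migration does not survive its completion.

```python
class ToyMigratingGuest:
    """Toy model (hypothetical) of the state preservation described above."""

    def __init__(self, state="active"):
        self.state = state
        self._preserved = None

    def start_live_migration(self):
        # The state in effect when the migration starts is preserved.
        self._preserved = self.state

    def pause(self):
        # Operator pauses the VM during the transition.
        self.state = "paused"

    def finish_live_migration(self):
        # On completion the VM state is reverted to the preserved one.
        self.state = self._preserved
        self._preserved = None


if __name__ == "__main__":
    guest = ToyMigratingGuest()
    guest.start_live_migration()
    guest.pause()
    guest.finish_live_migration()
    print(guest.state)  # prints "active"
```

An operator who used the regular pause operation expecting the VM to stay paused would instead find it active on the destination host.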

Also, in the future there might be multiple methods of forcing a live migration to complete. This API can be extended to accept hints to do things other than pausing the VM during live migration.

This will also be suitable for the Tasks API.

Data model impact

None. The Migration objects used are already created and tracked by nova.

REST API impact

To be added in a new microversion.

  • Force live migration to complete by pausing VM

    POST /servers/{server_id}/migrations/{migration_id}/action

Body:

{
  "force_complete": null
}

Normal http response code: `202 Accepted`
No response body is needed
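A client-side request for this call might be built as in the sketch below. It uses only the Python standard library and does not send anything. The function name, the example endpoint, and the IDs are hypothetical. The microversion is passed in by the caller because this spec does not fix its number, and the `X-Auth-Token` and `X-OpenStack-Nova-API-Version` headers are assumed to be how the deployment handles authentication and microversion selection.

```python
import json
import urllib.request


def build_force_complete_request(endpoint, server_id, migration_id,
                                 token, microversion):
    """Build (but do not send) the force_complete action request.

    All argument values are supplied by the caller; nothing here is
    mandated by the spec beyond the path shape and the request body.
    """
    url = "{}/servers/{}/migrations/{}/action".format(
        endpoint.rstrip("/"), server_id, migration_id)
    # JSON null is represented by None in Python.
    body = json.dumps({"force_complete": None}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,
        "X-OpenStack-Nova-API-Version": microversion,
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = build_force_complete_request(
        "http://controller:8774/v2.1", "SERVER_ID", "MIGRATION_ID",
        token="TOKEN", microversion="2.N")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send the request; on success the API is expected to answer with `202 Accepted` and no body.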

Expected error http response code: `400 Bad Request`
- the instance state is invalid for forcing live migration to complete,
i.e., the task state is not 'migrating', or the migration is not in a
'running' state with type 'live-migration'. Also returned while a live
migration cancel action is in progress.

Expected error http response code: `403 Forbidden`
- Policy violation if the caller is not granted access to
'os_compute_api:servers:migrations:force_complete' in policy.json

Expected error http response code: `404 Not Found`
- the instance does not exist
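The precondition behind the `400 Bad Request` case can be sketched as a simple predicate. The function and parameter names are hypothetical; only the compared values ('migrating', 'running', 'live-migration') come from this spec.

```python
def can_force_complete(task_state, migration_status, migration_type,
                       cancel_in_progress=False):
    """Return True if forcing the live migration to complete is allowed.

    Hypothetical helper: a False result corresponds to the
    400 Bad Request case described above.
    """
    if cancel_in_progress:
        # A live migration cancel action is already under way.
        return False
    return (task_state == "migrating"
            and migration_status == "running"
            and migration_type == "live-migration")


if __name__ == "__main__":
    print(can_force_complete("migrating", "running", "live-migration"))
    print(can_force_complete(None, "running", "live-migration"))
```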

Because this is an asynchronous call, some errors will not be exposed through the API, for instance when the hypervisor does not support pausing a VM during live migration. Such errors will be logged by the compute service.

Security impact

None

Notifications impact

There will be new notifications to indicate the start and outcome of pausing a VM during an ongoing live migration.

Other end user impact

python-novaclient will be extended with a new operation to force an ongoing live migration to complete by pausing the VM during the transition to the destination host.

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee: Pawel Koniszewski (irc: pkoniszewski)

Work Items

  • Pausing VM during live migration through libvirt

  • python-novaclient ‘nova live-migration-force-complete’

Dependencies

None

Testing

  • Unit and Functional tests in Nova

  • Tempest tests, if it is possible to slow down a live migration or to start a never-ending live migration

Documentation Impact

The new API needs to be documented.

References

None