Expose auto converge and post copy¶
Currently auto converge and post copy can only be enabled/disabled via configuration, which is somewhat inflexible. If an application sensitive to reduced performance (some scientific computing applications may be more sensitive to memory access latency) is on a host with these options enabled, live migration may cause the application to raise an error. Therefore, the user wants to control whether to enable/disable auto converge or post copy during live migration.
Some applications do not want increased risk of being rebooted due to a network failure or memory page access failure during post-copy live-migration.
Some applications are performance sensitive (such as some scientific computing applications); such applications do not want performance throttled back by the auto-converge feature during live-migration.
Some applications would like to avoid both reboot risk and performance throttling. If the network between two compute nodes is interrupted during post-copy live-migration, the live-migration will fail and the user will need to reset the instance to make it available again. Such applications therefore do not want to use either feature during live-migration.
For the above problems, the operator wants to control whether a single instance enables auto converge or post copy during live migration. But currently the minimum unit that can be controlled is the compute node.
Support for auto converge and post copy requires QEMU version >= 2.5.0. Since
the Rocky release, the minimum required version of QEMU is 2.5.0.
Therefore, all compute nodes using the libvirt driver should support these
features. The corresponding flags from libvirt are:
    ...
    VIR_MIGRATE_AUTO_CONVERGE = 8192
    VIR_MIGRATE_POSTCOPY = 32768
    ...
live_migration_permit_auto_converge and live_migration_permit_post_copy can
only affect the whole hypervisor, by modifying the configuration, but traits
can affect a single instance.
In order to request the feature (scheduling an instance to nodes that provide the feature) we propose defining two new traits. The traits are reported by the libvirt driver, regardless of the conf:
Introduce two new flavor extra specs:
And introduce two new image properties:
Use these properties, instead of asking the operator to set
forbidden on the traits. Before calling placement, when
compute:live_migration_post_copy=true, we add required traits
for the corresponding feature to the placement request. When
compute:live_migration_post_copy=false, we just add nothing to
the placement request. Thus we can still schedule an instance on a host with
the features, but we disable these two features for that instance. We use these
keys in the scheduler to optionally add required traits to ensure that the
instance can land on a host that is capable of the requested behavior. The
libvirt driver will then interpret the values to decide whether to use the
features during live migration. For example, if the flavor says “false”:
We don’t add the trait to the scheduling request, so the instance can land anywhere.
The driver will not use the feature for live-migrate, regardless of what the compute’s config says.
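The translation from metadata to required traits described above can be sketched as follows. This is a minimal illustration, not nova's scheduler code: the function name `required_traits` and the mapping table are hypothetical, and only the post-copy key and trait that appear in this text are included. An auto-converge entry would presumably be added to the table in the same way.

```python
# Hypothetical sketch; not nova's actual scheduler code.
# Key and trait names below are the ones mentioned in this spec.
POST_COPY_SPEC = "compute:live_migration_post_copy"
POST_COPY_TRAIT = "COMPUTE_MIGRATE_POST_COPY"

# Maps a flavor extra spec key to the trait it may require.
SPEC_TO_TRAIT = {POST_COPY_SPEC: POST_COPY_TRAIT}


def required_traits(extra_specs):
    """Return the traits to add as 'required' to the placement request.

    A value of "true" adds the trait. A value of "false", or a missing
    key, adds nothing, so the instance can still land on any host.
    """
    required = set()
    for key, trait in SPEC_TO_TRAIT.items():
        value = extra_specs.get(key)
        if value is not None and value.strip().lower() == "true":
            required.add(trait)
    return required
```

Note that "false" and "not set" are indistinguishable to the scheduler in this sketch. The difference only matters later, when the driver decides whether to use the feature.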
By default, when the operator creates an instance without any related metadata,
the scheduler will not care whether the host supports auto-converge or
post-copy. If the configuration options live_migration_permit_auto_converge or
live_migration_permit_post_copy are True, the libvirt driver will prefer to
use auto-converge or post-copy. These options can be used when the operator
wants all instances on a given compute node to use auto-converge/post-copy. For
example:
If an instance that has not requested related metadata is scheduled to a host that has enabled
live_migration_permit_auto_converge or live_migration_permit_post_copy, then libvirt will try to use auto-converge or post-copy during live migration.
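The driver-side decision can be illustrated with a short sketch: a per-instance value, when present, wins over the compute node's configuration, and a missing value falls back to the configuration. The function names and parameters here are hypothetical and do not reflect the libvirt driver's real code. Only the two flag values are taken from this document, and how the two features interact when both end up enabled is deliberately left out.

```python
# Flag values as listed earlier in this spec.
VIR_MIGRATE_AUTO_CONVERGE = 8192
VIR_MIGRATE_POSTCOPY = 32768


def resolve(requested, conf_permit):
    """Per-instance metadata wins; None means 'no metadata, use config'."""
    return conf_permit if requested is None else requested


def migration_flags(base_flags, want_auto_converge, want_post_copy,
                    conf_auto_converge, conf_post_copy):
    """Return base_flags with the two feature bits set or cleared.

    want_* are True, False, or None (hypothetical per-instance values);
    conf_* are the booleans from the compute node's configuration.
    """
    flags = base_flags
    choices = (
        (want_auto_converge, conf_auto_converge, VIR_MIGRATE_AUTO_CONVERGE),
        (want_post_copy, conf_post_copy, VIR_MIGRATE_POSTCOPY),
    )
    for requested, permitted, bit in choices:
        if resolve(requested, permitted):
            flags |= bit      # enable the feature for this migration
        else:
            flags &= ~bit     # make sure the feature is off
    return flags
```

For instance, an instance with no metadata on a host that permits post-copy gets the post-copy bit, while an instance that explicitly says "false" gets neither bit even if the host permits both.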
If the operator creates an instance with
these metadata will override the configurations:
compute_live_migration_post_copy are both true, or the flavor extra specs
are in conflict with the image properties, the ‘create’ API call will raise an
When auto-converge is in use during live migration and the operator calls the force complete API, the migration will not be switched to post-copy, because post-copy was not requested in the flavor extra specs or image properties.
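The create-time check mentioned earlier (rejecting a request when both features are requested as true, or when the flavor and the image disagree) might look roughly like the sketch below. Several things here are assumptions: the auto-converge key names simply mirror the post-copy ones, the exception class is a placeholder because this text does not name the real error, and `validate` is a hypothetical helper rather than nova's API code.

```python
class ConflictingMigrationMetadata(ValueError):
    """Placeholder error type; the real exception is not named here."""


# (flavor extra spec key, image property key) for each feature.
FEATURE_KEYS = {
    "post_copy": ("compute:live_migration_post_copy",
                  "compute_live_migration_post_copy"),
    # Assumed names, chosen by analogy with the post-copy keys.
    "auto_converge": ("compute:live_migration_auto_converge",
                      "compute_live_migration_auto_converge"),
}


def _as_bool(value):
    """Map "true"/"false" strings to booleans; keep None as 'not set'."""
    return None if value is None else str(value).strip().lower() == "true"


def validate(extra_specs, image_props):
    """Return the effective per-feature request, or raise on conflict."""
    effective = {}
    for feature, (spec_key, prop_key) in FEATURE_KEYS.items():
        from_flavor = _as_bool(extra_specs.get(spec_key))
        from_image = _as_bool(image_props.get(prop_key))
        both_set = from_flavor is not None and from_image is not None
        if both_set and from_flavor != from_image:
            raise ConflictingMigrationMetadata(
                "flavor and image disagree on %s" % feature)
        effective[feature] = (
            from_flavor if from_flavor is not None else from_image)
    if effective["auto_converge"] and effective["post_copy"]:
        raise ConflictingMigrationMetadata(
            "auto-converge and post-copy are both requested")
    return effective
```

Doing this in the API layer means a conflicting request fails immediately, before any scheduling work happens.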
According to this spec, if post-copy is enabled during live migration, the
abort API call will be rejected by the libvirt driver. Now we can reject the
request in the API by checking
Another method is to use traits in flavor extra_specs/image properties. This
could work well when the operators need auto-converge/post-copy. But it can’t
be used to disable auto-converge/post-copy.
Since the Rocky release, all libvirt hypervisor hosts support
auto-converge/post-copy, which means every libvirt hypervisor host would have
If operators do not want to use auto-converge or post-copy, they would use
traits:COMPUTE_MIGRATE_POST_COPY=forbidden, which means: do not schedule
my VM to hosts that support auto-converge/post-copy. As noted above, this
means that all libvirt compute nodes will be excluded. The result is that
VM creation fails because no compute node can be selected.
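Why the forbidden-trait alternative fails can be shown with a toy host filter. This is only a model of required/forbidden trait matching, not placement's implementation. The host names and the `filter_hosts` helper are made up for the example, and every host is assumed to be a libvirt compute node that reports the trait.

```python
POST_COPY_TRAIT = "COMPUTE_MIGRATE_POST_COPY"


def filter_hosts(hosts, required=frozenset(), forbidden=frozenset()):
    """Keep hosts that have every required trait and no forbidden one."""
    return [
        name for name, traits in hosts.items()
        if required <= traits and not (forbidden & traits)
    ]


# Hypothetical deployment: every libvirt host reports the trait.
hosts = {
    "compute-1": {POST_COPY_TRAIT},
    "compute-2": {POST_COPY_TRAIT},
}

# Requiring the trait works: both hosts remain candidates.
candidates_required = filter_hosts(hosts, required={POST_COPY_TRAIT})

# Forbidding the trait removes every host, so scheduling would fail.
candidates_forbidden = filter_hosts(hosts, forbidden={POST_COPY_TRAIT})
```

Because the trait expresses host capability rather than per-instance preference, forbidding it cannot express "schedule me anywhere, but do not use the feature".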
Data model impact¶
Add the two image properties to the ImageMeta object:
The ImageMeta is stored in the instance_system_metadata table, so no schema modification is needed.
REST API impact¶
Other end user impact¶
Other deployer impact¶
- Primary assignee:
Support for new placement traits.
Libvirt driver changes to report traits to placement. The traits will be reported by the libvirt driver as part of
update_provider_tree. They will not be added to the generic compute capabilities dict inherited by all the virt drivers because these traits are libvirt-specific.
Scheduler changes to translate metadata to traits.
_live_migration_flags before live migration starts in the libvirt driver.
Add functional tests and unit tests.
Unit tests and functional tests will be included to test the new functionality.
The live migration documentation should be updated to describe this new feature.