vTPM live migration¶
https://blueprints.launchpad.net/nova/+spec/vtpm-live-migration
When Nova first added vTPM support, all non-spawn operations were rejected at the API level. Extra work was necessary to manage the vTPM state when moving an instance. This work was eventually completed for resize and cold migration, and those operations were unblocked. The blocks on live migration, evacuation, shelving and rescue are still in place.
A TPM device is required for certain features of Windows Server 2022 and 2025, notably BitLocker Drive Encryption. It’s also required to run Windows 11 at all. The inability to live migrate instances with vTPM is a major roadblock for anyone operating Windows guests in an OpenStack cloud.
Libvirt support for vTPM live migration now exists (more details in Problem description), but Nova changes are necessary before being able to remove the API block. This spec describes those changes.
Problem description¶
There are three aspects to vTPM live migration: vTPM state storage (shared vs non-shared), Libvirt support, and secret management. There is also an adjacent problem that, while not related to live migration, can be resolved by the changes necessary to support live migration: vTPM instances cannot be started back up by Nova after a compute host reboot.
vTPM state storage¶
vTPM state storage is separate from instance state storage. Libvirt supports both local storage and shared storage, such as NFS, for each.
Libvirt can be told where to store the vTPM state via the source XML element, but Nova
does not support setting it.
Nova deployments use the Libvirt default vTPM state path. On both Ubuntu and
Red Hat operating systems, this path is /var/lib/libvirt/swtpm/<instance
UUID>. This path is distinct from the instance state path.
Testing will generally focus on local storage and could be expanded to shared storage like NFS in the future. Currently the Nova CI gate does not have any jobs that are configured with NFS.
Libvirt support¶
We were unable to find Libvirt artifacts explicitly demonstrating vTPM live migration support for non-shared vTPM state storage. However, vTPM live migration with shared vTPM storage is supported as of version 8.10, and this comment suggests that vTPM live migration with non-shared storage has been supported since version 7.1.0.
Therefore, this spec requires Libvirt 7.1.0. Our current minimum Libvirt version is 8.0.0 as of 2025.1 (Epoxy), so we will not need to do any minimum version checks while implementing this feature.
Secret management¶
When creating an instance with vTPM, Nova asks a key manager - normally Barbican - to generate a secret. Crucially, this is done with the user’s token, and the created secret is owned by the user, with no one else - not even admin or the Nova service user - being able to read it. Nova then defines the secret in Libvirt, and in the instance XML references the secret by its UUID. This tells Libvirt to encrypt the instance’s vTPM state using the contents of that secret as the symmetric key. Nova undefines the secret once the Libvirt domain spawns successfully.
For vTPM live migration to work, a Libvirt secret with the same UUID and contents needs to be defined on the destination host so that the destination Libvirt can decrypt the vTPM state. Currently, Nova has no way of doing this. Live migration is an admin operation, and neither admin nor the Nova service user has access to the Barbican secret (unless the admin happens to be the owner of the instance, but that’s an edge case). The Libvirt secret cannot be read back on the source host either, because it’s defined as private and is undefined once the domain spawns.
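To make the two pieces of XML concrete, the following is a minimal sketch, not Nova's actual code, of a Libvirt secret definition with `vtpm` usage and of the TPM device element that references it by UUID. The helper names, the usage name format and the `tpm-tis` model are assumptions chosen for illustration; the `private` flag is the attribute that decides whether the value can be read back from Libvirt.

```python
import xml.etree.ElementTree as ET


def build_vtpm_secret_xml(secret_uuid, instance_uuid, ephemeral=False, private=True):
    """Return a Libvirt <secret> definition with vTPM usage (hypothetical helper)."""
    secret = ET.Element(
        "secret",
        ephemeral="yes" if ephemeral else "no",
        # A private secret's value cannot be read back from Libvirt.
        private="yes" if private else "no",
    )
    ET.SubElement(secret, "uuid").text = secret_uuid
    usage = ET.SubElement(secret, "usage", type="vtpm")
    # The usage name format below is an assumption made for this sketch.
    ET.SubElement(usage, "name").text = f"vtpm_secret_{instance_uuid}"
    return ET.tostring(secret, encoding="unicode")


def build_tpm_device_xml(secret_uuid):
    """Return a <tpm> device element whose state is encrypted with the secret."""
    tpm = ET.Element("tpm", model="tpm-tis")
    backend = ET.SubElement(tpm, "backend", type="emulator", version="2.0")
    # The domain XML refers to the secret only by its UUID.
    ET.SubElement(backend, "encryption", secret=secret_uuid)
    return ET.tostring(tpm, encoding="unicode")


if __name__ == "__main__":
    example_uuid = "11111111-2222-3333-4444-555555555555"
    print(build_vtpm_secret_xml(example_uuid, "example-instance"))
    print(build_tpm_device_xml(example_uuid))
```

Because the domain only carries the UUID, the destination host can decrypt the vTPM state only if a secret with that same UUID and value has been defined there beforehand.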
Compute host reboot¶
For the exact same reasons (lack of Barbican secret access and inability to read the Libvirt secret back from Libvirt), Nova cannot start vTPM instances back up after a compute host reboot.
Use Cases¶
As a cloud operator, I want to be able to live migrate instances with vTPM devices, in particular Windows instances.
As a cloud user, I want to keep the contents of my instance’s vTPM private. The cloud system should only be able to decrypt it when I request it via my user token and the system should only keep the decryption secret around for a limited time. I as a user am willing to accept that such privacy requirements limit some of the admin initiated lifecycle operations on my instance.
As a cloud operator, I want vTPM instances on a compute host to start back up again after a host reboot.
Proposed change¶
Because the security of the vTPM secret (either in Barbican or in Libvirt) affects what operations can be performed on an instance, users should be able to specify what level of security they require, and operators need to specify what level of security they’re willing to support. There also needs to be a default level applied to an instance if nothing is explicitly specified.
Three possible security levels are proposed. They are presented in the table below.
| Value | Mechanism | Security implications | Instance mobility |
|---|---|---|---|
| user | Only the instance owner has access to the Barbican secret. This is existing behavior and will be the default behavior. | This is the most secure option, as even the Nova service user and root on the compute host cannot read the secret. | The instance is immovable and cannot be restarted by Nova in the event of a compute host crash or reboot. |
| host | The Libvirt secret is persistent and retrievable. | This is “medium” security. API-level admins and the Nova service user do not have access to the secret, but it can be accessed by users with sufficient privileges on the compute host. | The instance can be live migrated because Nova can read the secret back from Libvirt on the source host and send it to the destination over RPC. Security over the wire is left as the operator’s responsibility, but TLS or similar is assumed. The instance can also be restarted by Nova in the event of a compute host crash or reboot for the exact same reason. |
| deployment | The Nova service user owns the Barbican secret. | This is the least secure but most flexible option. | The instance can be live migrated because Nova can download the secret from Barbican and define it in Libvirt on the destination host. The instance can also be restarted by Nova in the event of a compute host crash or reboot for the exact same reason. |
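The three levels can also be read as a small lookup structure. The sketch below is only an illustration of the table: the class and field names are hypothetical, and the only facts encoded are where Nova can obtain the secret from and which operations that enables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TpmSecretSecurity(str, Enum):
    USER = "user"
    HOST = "host"
    DEPLOYMENT = "deployment"


@dataclass(frozen=True)
class PolicyProperties:
    # Where Nova itself can obtain the secret from, if anywhere.
    secret_source_for_nova: Optional[str]
    live_migratable: bool
    restartable_after_host_reboot: bool


# Hypothetical mapping that mirrors the rows of the table.
POLICIES = {
    TpmSecretSecurity.USER: PolicyProperties(None, False, False),
    TpmSecretSecurity.HOST: PolicyProperties("libvirt", True, True),
    TpmSecretSecurity.DEPLOYMENT: PolicyProperties("barbican", True, True),
}


if __name__ == "__main__":
    for level, props in POLICIES.items():
        print(level.value, props)
```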
Users choose the level they require for an instance by selecting a flavor that
sets the new hw:tpm_secret_security flavor extra spec. If
hw:tpm_secret_security is not set in the flavor extra specs, an instance with
vTPM defaults to the user TPM secret security policy, which is the same as
legacy behavior.
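As a rough sketch of the defaulting rule, the function below resolves the effective policy from a flavor's extra specs. The function name is hypothetical and this is not the proposed Nova implementation; it assumes the caller already knows that the instance has a vTPM.

```python
VALID_POLICIES = ("user", "host", "deployment")
DEFAULT_POLICY = "user"  # same as legacy behavior


def get_tpm_secret_security(extra_specs):
    """Return the TPM secret security policy for a vTPM instance's flavor."""
    value = extra_specs.get("hw:tpm_secret_security", DEFAULT_POLICY)
    if value not in VALID_POLICIES:
        raise ValueError(f"invalid hw:tpm_secret_security value: {value!r}")
    return value


if __name__ == "__main__":
    print(get_tpm_secret_security({"hw:tpm_version": "2.0"}))
    print(get_tpm_secret_security({"hw:tpm_secret_security": "host"}))
```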
A new image property is intentionally not provided because server rebuild is blocked in the API. If a user were to create a server with a given TPM secret security policy via an image property, that policy would become locked-in and unable to be changed. The user would not be able to change the image property because they would not be able to rebuild, and they would not be able to resize to a different TPM secret security policy because the image property and flavor extra spec would conflict and fail with HTTP 409.
Operators are able to specify what level they support by using the new
[libvirt]supported_tpm_secret_security config option. This is a
per compute host list option that can take the value of one or more of the
security levels from the previous table. Its default value is all three levels.
These values are exposed as driver capability traits. The
hw:tpm_secret_security flavor extra spec is translated to a required trait
to match the driver capabilities.
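The interaction between the config option, the capability traits and the flavor extra spec could look roughly like the sketch below. The trait names are placeholders invented for this example, since the text above does not name the actual traits, and treating an unset extra spec as a requirement for the user level is likewise an assumption.

```python
LEVELS = ("user", "host", "deployment")
# Placeholder prefix; the real trait names are not given in this spec text.
TRAIT_PREFIX = "COMPUTE_SECURITY_TPM_SECRET_SECURITY_"


def parse_supported_levels(raw):
    """Parse a comma-separated list option value into validated levels."""
    levels = [item.strip() for item in raw.split(",") if item.strip()]
    unknown = set(levels) - set(LEVELS)
    if unknown:
        raise ValueError(f"unsupported levels: {sorted(unknown)}")
    return levels


def traits_for_host(supported_levels):
    """Traits a compute host would expose for its supported levels."""
    return {TRAIT_PREFIX + level.upper() for level in supported_levels}


def required_trait(extra_specs):
    """Trait required by a flavor; an unset extra spec is assumed to mean user."""
    level = extra_specs.get("hw:tpm_secret_security", "user")
    return TRAIT_PREFIX + level.upper()


def candidate_hosts(host_options, extra_specs):
    """Return the sorted names of hosts whose traits satisfy the flavor."""
    needed = required_trait(extra_specs)
    return sorted(
        name
        for name, raw in host_options.items()
        if needed in traits_for_host(parse_supported_levels(raw))
    )


if __name__ == "__main__":
    hosts = {"cmp1": "user", "cmp2": "user,host,deployment"}
    print(candidate_hosts(hosts, {"hw:tpm_secret_security": "host"}))
```

With these example hosts, a flavor requesting the host level can only land on `cmp2`, while a flavor without the extra spec can land on either host.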
The behavior of an instance during live migration is defined by its persisted
embedded flavor hw:tpm_secret_security extra spec. Instances with user
cannot be live migrated. For instances with host, the source compute host
reads the secret from Libvirt and sends it over RPC to the destination. For
instances with deployment, the destination host downloads the secret from
Barbican and defines it in Libvirt. Because the instance’s
hw:tpm_secret_security value translates to a required trait, it’s
guaranteed that the destination host chosen for live migration supports
whatever behavior the instance requires.
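The per-policy behavior during live migration can be sketched as two steps, one on each host. Everything here is illustrative: the `MigrateData` class stands in for `LibvirtLiveMigrateData` with the two proposed fields, and the callables passed in are hypothetical stand-ins for the Libvirt and key manager interactions.

```python
from dataclasses import dataclass
from typing import Optional


class LiveMigrationNotAllowed(Exception):
    pass


@dataclass
class MigrateData:
    # Only populated for the host policy, where the secret travels over RPC.
    vtpm_secret_uuid: Optional[str] = None
    vtpm_secret_value: Optional[bytes] = None


def prepare_on_source(policy, secret_uuid, read_libvirt_secret):
    """Source host: build the data that is sent to the destination."""
    if policy not in ("host", "deployment"):
        raise LiveMigrationNotAllowed(f"policy {policy!r} cannot be live migrated")
    data = MigrateData()
    if policy == "host":
        data.vtpm_secret_uuid = secret_uuid
        data.vtpm_secret_value = read_libvirt_secret(secret_uuid)
    return data


def define_on_destination(policy, data, download_from_key_manager, define_libvirt_secret):
    """Destination host: define the secret in Libvirt before the guest arrives."""
    if policy == "host":
        secret_uuid, value = data.vtpm_secret_uuid, data.vtpm_secret_value
    elif policy == "deployment":
        # Hypothetical callable returning (uuid, value) fetched from Barbican.
        secret_uuid, value = download_from_key_manager()
    else:
        raise LiveMigrationNotAllowed(f"policy {policy!r} cannot be live migrated")
    define_libvirt_secret(secret_uuid, value)
    return secret_uuid


if __name__ == "__main__":
    defined = {}
    data = prepare_on_source("host", "uuid-1", lambda _uuid: b"key")
    define_on_destination("host", data, None, defined.__setitem__)
    print(defined)
```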
Alternatives¶
This is the only version of this spec that covers the essentials: users of new instances can choose the security level that they require, and operators can choose which security levels they are willing to support given the limitations imposed by higher security levels.
We could also provide an image property for selection of the TPM secret security policy, but it would be problematic because of the current inability to rebuild instances with vTPM (it is blocked in the API). Without the ability to rebuild a vTPM instance, any user who chose their policy via image property would be locked in to that policy, unable to change it. They would not be able to change the image property value because they cannot rebuild, and they would also not be able to change the policy via flavor extra spec because that would fail due to conflicting values between the image property and the flavor extra spec.
If we would like to support image property in the future, we could possibly do it if we could add the ability to rebuild vTPM instances at the same time. It is not yet known if there are any technical limitations that prevent the possibility of implementing rebuild, but we could certainly investigate.
Data model impact¶
None.
REST API impact¶
No new microversion. The flavor extra spec validation code is updated to allow
hw:tpm_secret_security.
Security impact¶
The main security consequences of this spec are the implications of the
host and deployment values of hw:tpm_secret_security.
In the host case, anyone with sufficient access to the compute host can
read vTPM secrets. While this is not great, it’s also something the user opts
in to, and the compute hosts are assumed to be secured by the cloud operator.
In the deployment case, a compromise of the Nova service user leads to an
exposure of all vTPM secrets. Once again, this is something the user opts in
to, and the Nova service user is assumed to be secure.
Notifications impact¶
None.
Other end user impact¶
None.
Performance Impact¶
None.
Other deployer impact¶
None.
Developer impact¶
None.
Upgrade impact¶
A compute service version bump is necessary.
Live migration of instances with vTPM will be blocked until the minimum service version of the deployment is the upgraded version. The cloud must be fully upgraded.
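A minimal sketch of that API-level check follows. The version number and function name are made up for illustration; the real value is whatever the service version is bumped to.

```python
# Hypothetical value standing in for the bumped compute service version.
MIN_VERSION_VTPM_LIVE_MIGRATION = 100

LIVE_MIGRATABLE_POLICIES = ("host", "deployment")


def can_live_migrate_vtpm(policy, min_service_version):
    """Decide whether the API should allow live migration of a vTPM instance.

    policy is the instance's hw:tpm_secret_security value, or None for a
    pre-existing instance that has no policy set.
    """
    if min_service_version < MIN_VERSION_VTPM_LIVE_MIGRATION:
        # Some compute services are not upgraded yet, so keep the block.
        return False
    return policy in LIVE_MIGRATABLE_POLICIES


if __name__ == "__main__":
    print(can_live_migrate_vtpm("host", MIN_VERSION_VTPM_LIVE_MIGRATION))
    print(can_live_migrate_vtpm(None, MIN_VERSION_VTPM_LIVE_MIGRATION))
```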
Deployers must create flavor(s) with the hw:tpm_secret_security extra spec
set to host or deployment in order to enable creation of instances with
the respective TPM secret security policies.
Any instances without this set are pre-existing instances and, for simplicity,
they will not be automatically converted to a TPM secret security policy. If a
user would like to opt in to live migration, they can resize their pre-existing
instance to a flavor that has the hw:tpm_secret_security extra spec set to host
or deployment.
Automatic migration of pre-existing instances into TPM secret security policies could be discussed and considered as future work.
Implementation¶
Assignee(s)¶
- Primary assignee:
notartom, melwitt
Feature Liaison¶
- Feature liaison:
melwitt, dansmith
Work Items¶
- Introduce the hw:tpm_secret_security flavor extra spec, and [libvirt]supported_tpm_secret_security config option
- Add vtpm_secret_uuid and vtpm_secret_value fields to the LibvirtLiveMigrateData object to carry the data over RPC from the source host to the destination host in the case of the host TPM secret security policy
- Modify the pre live migration and rollback code to handle secret definition and cleanup
- Modify the resize code to handle TPM secret security policy conversions including absence of TPM secret security policy for pre-existing instances
- Bump the service version
- Modify the existing API block to only allow live migration of host or deployment instances once the minimum service version has reached the bumped version
- Add a whitebox/integration test
- Add regular Tempest tests if possible
- Update the documentation
Dependencies¶
Libvirt version 7.1.0. This can be enforced dynamically in code.
Testing¶
Nova’s functional tests are extended to test the Nova logic using the Libvirt fixture. This is particularly useful for cases that cannot be easily tested in a real environment, like rollback.
The existing whitebox-tempest-plugin vTPM tests are extended to test live migration in a real environment with an actual Libvirt.
Documentation Impact¶
Nova’s vTPM documentation is updated
to remove the live migration limitation and explain the usage of the
supported_tpm_secret_security configuration option, as well as the
implications of all possible values. The expectation that vTPM state storage is
not shared and that shared vTPM state storage live migration is untested is
made explicit.
References¶
Empty.
History¶
| Release Name | Description |
|---|---|
| 2026.1 Gazpacho | Re-proposed |
| 2025.2 Flamingo | Re-proposed |
| 2025.1 Epoxy | Introduced |