Add maxphysaddr support for Libvirt

https://blueprints.launchpad.net/nova/+spec/libvirt-maxphysaddr-support

This blueprint proposes new flavor extra_specs to control the physical address bits of vCPUs in Libvirt guests.

Problem description

When booting a guest with 1TB+ RAM, the default number of physical address bits is too small and the boot fails [1]. A knob is therefore needed to specify an appropriate number of physical address bits.
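As a rough illustration of the arithmetic, the sketch below computes the smallest address width that can cover a given amount of RAM. The function name min_address_bits is hypothetical, and the result is only a lower bound: the guest physical address space also contains ranges reserved for devices, so the width actually needed can be larger.

```python
def min_address_bits(ram_bytes):
    """Return the smallest number of address bits able to index ram_bytes.

    Lower bound only: device address ranges in the guest are not counted.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    # The highest byte address is ram_bytes - 1; its bit length is the width.
    return (ram_bytes - 1).bit_length()


one_tib = 2 ** 40
print(min_address_bits(one_tib))      # RAM of exactly 1 TiB
print(min_address_bits(one_tib + 1))  # anything beyond 1 TiB needs one more bit
```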

Use Cases

Booting a guest with large RAM.

Proposed change

In Libvirt v8.7.0+ and QEMU v2.7.0+, the physical address bits can be specified with the following XML elements [2] [3]. The former sets an explicit number of physical address bits; the latter adopts the physical address bits of the host CPU.

  • <maxphysaddr mode='emulate' bits='42'/>

  • <maxphysaddr mode='passthrough'/>
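A minimal sketch of how these two elements could be generated, using only the Python standard library. The helper name build_maxphysaddr is hypothetical and is not the nova guest config class; it only illustrates that bits accompanies mode='emulate' and is absent for mode='passthrough'.

```python
import xml.etree.ElementTree as ET


def build_maxphysaddr(mode, bits=None):
    """Return a <maxphysaddr> element as a string (hypothetical helper)."""
    if mode not in ("emulate", "passthrough"):
        raise ValueError("mode must be 'emulate' or 'passthrough'")
    elem = ET.Element("maxphysaddr", {"mode": mode})
    if mode == "emulate":
        # An explicit bit count is only carried in emulate mode.
        if bits is None:
            raise ValueError("bits is required when mode is 'emulate'")
        elem.set("bits", str(bits))
    return ET.tostring(elem, encoding="unicode")


print(build_maxphysaddr("emulate", 42))
print(build_maxphysaddr("passthrough"))
```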

Flavor extra_specs

This spec proposes the following two flavor extra_specs. If they are omitted, the behavior is the same as before.

  • hw:maxphysaddr_mode can be either emulate or passthrough.

  • hw:maxphysaddr_bits takes a positive integer value. It is only meaningful when hw:maxphysaddr_mode=emulate, in which case it must be specified.
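The rules for the two extra_specs can be sketched as a small validation function. This is an illustration under stated assumptions, not the validator in nova/api/validation/extra_specs/hw.py: the function name is hypothetical, and rejecting hw:maxphysaddr_bits when the mode is not emulate is an assumed interpretation of "only meaningful".

```python
def validate_maxphysaddr(extra_specs):
    """Return (mode, bits), or None when neither extra spec is set.

    Hypothetical helper; raises ValueError on an invalid combination.
    """
    mode = extra_specs.get("hw:maxphysaddr_mode")
    raw_bits = extra_specs.get("hw:maxphysaddr_bits")

    if mode is None:
        if raw_bits is not None:
            raise ValueError("hw:maxphysaddr_bits requires hw:maxphysaddr_mode=emulate")
        return None  # both omitted: behavior is unchanged
    if mode not in ("emulate", "passthrough"):
        raise ValueError("hw:maxphysaddr_mode must be emulate or passthrough")
    if mode == "passthrough":
        # Assumption: bits with passthrough is rejected rather than ignored.
        if raw_bits is not None:
            raise ValueError("hw:maxphysaddr_bits is only valid with emulate")
        return (mode, None)

    # mode == "emulate": bits is mandatory and must be a positive integer.
    if raw_bits is None:
        raise ValueError("hw:maxphysaddr_bits is required with emulate")
    bits = int(raw_bits)  # raises ValueError for non-numeric input
    if bits <= 0:
        raise ValueError("hw:maxphysaddr_bits must be a positive integer")
    return (mode, bits)


print(validate_maxphysaddr({"hw:maxphysaddr_mode": "emulate",
                            "hw:maxphysaddr_bits": "42"}))
```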

Nova scheduler changes

The Nova scheduler also needs to be modified to take these two properties into account.

There can be a mix of supported and unsupported hosts depending on Libvirt and QEMU versions. So add new traits COMPUTE_ADDRESS_SPACE_PASSTHROUGH and COMPUTE_ADDRESS_SPACE_EMULATED to check that the scheduled host supports this feature. trait:COMPUTE_ADDRESS_SPACE_PASSTHROUGH=required is automatically added if hw:maxphysaddr_mode=passthrough is specified in the flavor extra_specs. Likewise, trait:COMPUTE_ADDRESS_SPACE_EMULATED=required is automatically added if hw:maxphysaddr_mode=emulate is specified.
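The automatic trait mapping described above could look roughly like the following. The function and constant names are hypothetical, and the sketch operates on a plain dict rather than on nova's request spec objects.

```python
# Mode-to-trait mapping taken from the text above.
MODE_TO_TRAIT = {
    "passthrough": "COMPUTE_ADDRESS_SPACE_PASSTHROUGH",
    "emulate": "COMPUTE_ADDRESS_SPACE_EMULATED",
}


def add_required_trait(extra_specs):
    """Return a copy of extra_specs with the matching trait marked required."""
    specs = dict(extra_specs)  # do not mutate the caller's dict
    mode = specs.get("hw:maxphysaddr_mode")
    if mode in MODE_TO_TRAIT:
        specs["trait:" + MODE_TO_TRAIT[mode]] = "required"
    return specs


print(add_required_trait({"hw:maxphysaddr_mode": "passthrough"}))
```

When hw:maxphysaddr_mode is absent, the extra_specs are returned unchanged, which matches the "same as before" behavior for existing flavors.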

Passthrough and emulate modes have different properties. So let’s consider the two separately.

The case of hw:maxphysaddr_mode=passthrough. In this case, cpu_mode=host-passthrough is a requirement, which is already taken into account in nova scheduling, so no additional modifications are required in this proposal. There is no guarantee that the instance can be migrated by nova, so the admin needs to make sure that the targets of cold and live migration have similar hardware and software. The same restriction already applies to cpu_mode=host-passthrough.

The case of hw:maxphysaddr_mode=emulate. In nova scheduling, it is necessary to check that the hypervisor supports at least hw:maxphysaddr_bits. The maximum number of bits supported by the hypervisor can be obtained from the libvirt capabilities [4]. Therefore, ComputeCapabilitiesFilter can be used to compare the number of bits during scheduling. For example, this can be accomplished by automatically adding capabilities:cpu_info:maxphysaddr:bits>=42. Cold migration and live migration can also be handled with this filter and the COMPUTE_ADDRESS_SPACE_EMULATED trait. So the overall flavor extra_specs look like the following:

openstack flavor set <flavor> \
  --property hw:maxphysaddr_mode=emulate \
  --property hw:maxphysaddr_bits=42
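The comparison that the scheduler would perform for emulate mode can be sketched as follows. This is not the ComputeCapabilitiesFilter code; the function name is hypothetical, and the nested cpu_info layout is assumed from the capabilities:cpu_info:maxphysaddr:bits key used above.

```python
def host_supports_bits(cpu_info, requested_bits):
    """Return True if the host reports at least requested_bits address bits.

    cpu_info is assumed to look like {"maxphysaddr": {"bits": 46}}.
    """
    host_bits = cpu_info.get("maxphysaddr", {}).get("bits")
    if host_bits is None:
        # A host that does not report the field cannot be verified.
        return False
    return int(host_bits) >= requested_bits


hosts = {
    "host-a": {"maxphysaddr": {"bits": 46}},
    "host-b": {"maxphysaddr": {"bits": 40}},
    "host-c": {},  # older host without the new field
}
eligible = [name for name, info in hosts.items() if host_supports_bits(info, 42)]
print(eligible)
```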

Note

Since ComputeCapabilitiesFilter only supports flavor extra_specs and not image properties [5], this proposal is out of scope for image properties.

Alternatives

Before the maxphysaddr option was introduced into Libvirt, the physical address bits could be specified as a workaround via a QEMU command-line parameter. However, nova does not allow this.

Also, some Linux distributions may have machine types with host-phys-bits=true [6]. For example, pc-i440fx-bionic-hpb and pc-q35-bionic-hpb. However, this alternative has the following two issues and cannot be adopted for general-purpose use cases.

  • Ubuntu package maintainers apply a patch to QEMU to provide these machine types [7]. This means they are not included in vanilla QEMU and are not available in other distributions.

  • This is only the case for hw:maxphysaddr_mode=passthrough and does not include hw:maxphysaddr_mode=emulate. Since hw:maxphysaddr_mode=passthrough requires cpu_mode=host-passthrough to be used [8], this alternative cannot be used with cpu_mode=custom or cpu_mode=host-model. So, this alternative is not sufficient for a cloud with many different CPU models.

As for scheduling, placement does not currently support numeric traits, so the maximum number of bits supported by the hypervisor cannot be checked by this mechanism.

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

Operators should specify appropriate flavor extra_specs as needed.

Developer impact

None

Upgrade impact

As described earlier, the new traits COMPUTE_ADDRESS_SPACE_PASSTHROUGH and COMPUTE_ADDRESS_SPACE_EMULATED signal whether the upgraded compute nodes support this feature.

Implementation

Assignee(s)

Primary assignee:

nmiki

Other contributors:

None

Feature Liaison

Feature liaison:

Liaison Needed

Work Items

  • Add new guest configs

  • Add new fields in nova/api/validation/extra_specs/hw.py

  • Add new fields in LibvirtConfigCPU in nova/virt/libvirt/config.py

  • Add new traits to check Libvirt and QEMU versions

  • Add new field maxphysaddr to cpu_info in nova/virt/libvirt/driver.py

  • Add docs and release notes for new flavor extra_specs

Dependencies

Libvirt v8.7.0+. QEMU v2.7.0+.

Testing

Add the following unit tests:

  • check that proposed flavor extra_specs are properly validated

  • check that intended XML elements are output

  • check that traits are properly added and used

  • check that new field in ComputeCapabilitiesFilter is properly added and used

Documentation Impact

For operators, the documentation should describe what the proposed flavor extra_specs mean and how they should be set.

References

History

Revisions

Release Name       Description

2023.1 Antelope    Introduced