Add maxphysaddr support for Libvirt

https://blueprints.launchpad.net/nova/+spec/libvirt-maxphysaddr-support

This blueprint proposes new flavor extra_specs and image properties to control the physical address bits of vCPUs in Libvirt guests.

Problem description

When booting a guest with 1TB+ RAM, the default number of physical address bits is too small and the boot fails [1]. A knob is therefore needed to specify an appropriate number of physical address bits.

Use Cases

Booting a guest with a large amount of RAM.

Proposed change

In Libvirt v8.7.0+ and QEMU v2.7.0+, physical address bits can be specified with the following XML elements [2] [3]. The former sets an explicit number of physical address bits; the latter adopts the physical address bits of the host CPU.

  • <maxphysaddr mode='emulate' bits='42'/>

  • <maxphysaddr mode='passthrough'/>
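As an illustration of the two forms above, the following is a minimal sketch that builds the element with Python's standard library. The helper name build_maxphysaddr is hypothetical and is not part of nova or Libvirt; it only mirrors the rule that bits accompanies mode='emulate'.

```python
import xml.etree.ElementTree as ET


def build_maxphysaddr(mode, bits=None):
    """Return a <maxphysaddr> element for the given mode (hypothetical helper)."""
    if mode not in ("emulate", "passthrough"):
        raise ValueError(f"unsupported mode: {mode!r}")
    elem = ET.Element("maxphysaddr", {"mode": mode})
    if mode == "emulate":
        # An explicit bit count is required when emulating.
        if bits is None:
            raise ValueError("bits is required when mode is 'emulate'")
        elem.set("bits", str(bits))
    # In this sketch, bits is simply ignored for passthrough.
    return elem


emulate_xml = ET.tostring(build_maxphysaddr("emulate", 42), encoding="unicode")
passthrough_xml = ET.tostring(build_maxphysaddr("passthrough"), encoding="unicode")
print(emulate_xml)
print(passthrough_xml)
```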

Flavor extra_specs and image properties

This spec proposes the following two settings, available as both flavor extra_specs and image properties. If they are omitted, the behavior is unchanged.

  • hw:maxphysaddr_mode can be either emulate or passthrough.

  • hw:maxphysaddr_bits takes a positive integer value. It is only meaningful when hw:maxphysaddr_mode=emulate, in which case it must be specified.
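The rules for the two keys can be sketched as a small validation function. This is only an illustration under stated assumptions: validate_maxphysaddr is a hypothetical name, not the nova validator, and silently ignoring hw:maxphysaddr_bits outside emulate mode is one possible reading of "only meaningful".

```python
VALID_MODES = ("emulate", "passthrough")


def validate_maxphysaddr(extra_specs):
    """Return (mode, bits) from flavor extra_specs, or raise ValueError.

    Hypothetical helper; bits is returned as None unless mode is 'emulate'.
    """
    mode = extra_specs.get("hw:maxphysaddr_mode")
    if mode is None:
        # Nothing requested: behavior stays the same as before.
        return None, None
    if mode not in VALID_MODES:
        raise ValueError(f"hw:maxphysaddr_mode must be one of {VALID_MODES}")
    if mode == "passthrough":
        # Assumption: bits is ignored here rather than rejected.
        return mode, None
    raw_bits = extra_specs.get("hw:maxphysaddr_bits")
    if raw_bits is None:
        raise ValueError("hw:maxphysaddr_bits is required for emulate mode")
    try:
        bits = int(raw_bits)
    except (TypeError, ValueError):
        raise ValueError("hw:maxphysaddr_bits must be an integer") from None
    if bits <= 0:
        raise ValueError("hw:maxphysaddr_bits must be positive")
    return mode, bits
```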

So the overall flavor extra_specs look like the following:

openstack flavor set <flavor> \
  --property hw:maxphysaddr_mode=emulate \
  --property hw:maxphysaddr_bits=42

The equivalent image properties look like the following:

openstack image set <image> \
  --property hw_maxphysaddr_mode=emulate \
  --property hw_maxphysaddr_bits=42

Nova scheduler changes

Nova scheduler also needs to be modified to take these two properties into account.

hw:maxphysaddr_mode

There can be a mix of supported and unsupported hosts depending on Libvirt and QEMU versions. So this spec adds new traits COMPUTE_ADDRESS_SPACE_PASSTHROUGH and COMPUTE_ADDRESS_SPACE_EMULATED to check that the scheduled host supports this feature. trait:COMPUTE_ADDRESS_SPACE_PASSTHROUGH=required is automatically added if hw:maxphysaddr_mode=passthrough is specified in flavor extra_specs or image properties. Likewise, trait:COMPUTE_ADDRESS_SPACE_EMULATED=required is automatically added if hw:maxphysaddr_mode=emulate is specified. This can be implemented inside the from_request_spec method of the ResourceRequest class.
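The mapping from mode to required trait can be sketched as follows. This is not the actual ResourceRequest.from_request_spec code; required_traits is a hypothetical standalone function, and letting the flavor value take precedence over the image value is an assumption made only for this sketch.

```python
MODE_TO_TRAIT = {
    "passthrough": "COMPUTE_ADDRESS_SPACE_PASSTHROUGH",
    "emulate": "COMPUTE_ADDRESS_SPACE_EMULATED",
}


def required_traits(flavor_extra_specs, image_properties):
    """Return the trait requirement implied by the requested mode, if any."""
    # Assumption: the flavor value wins when both are set.
    mode = flavor_extra_specs.get("hw:maxphysaddr_mode")
    if mode is None:
        mode = image_properties.get("hw_maxphysaddr_mode")
    if mode is None:
        return {}
    if mode not in MODE_TO_TRAIT:
        raise ValueError(f"unsupported maxphysaddr mode: {mode!r}")
    return {f"trait:{MODE_TO_TRAIT[mode]}": "required"}
```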

Passthrough and emulate modes have different properties. So let’s consider the two separately.

The case of hw:maxphysaddr_mode=passthrough. In this case, cpu_mode=host-passthrough is a requirement, which is already taken into account in nova scheduling, and no additional modifications are required in this proposal. There is no guarantee that the instance can be migrated by nova, so the admin needs to make sure that targets of cold and live migration have similar hardware and software. The same restriction already applies to cpu_mode=host-passthrough.

The case of hw:maxphysaddr_mode=emulate. In nova scheduling, it is necessary to check that the hypervisor supports at least hw:maxphysaddr_bits. Numerical comparison is implemented differently for flavor extra_specs and image properties, so it is divided into two cases.

hw:maxphysaddr_bits

The maximum number of bits supported by the hypervisor can be obtained by using libvirt capabilities [4].
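A minimal sketch of reading that value from a capabilities document follows. The sample XML is an assumed shape written for illustration, not captured Libvirt output; the element path host/cpu/maxphysaddr and the helper name host_maxphysaddr_bits are assumptions and should be checked against [4].

```python
import xml.etree.ElementTree as ET

# Assumed shape of the relevant part of the capabilities XML (illustrative).
SAMPLE_CAPS = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <maxphysaddr mode='emulate' bits='46'/>
    </cpu>
  </host>
</capabilities>
"""


def host_maxphysaddr_bits(caps_xml):
    """Return the host's physical address bits, or None if not reported."""
    root = ET.fromstring(caps_xml)
    elem = root.find("./host/cpu/maxphysaddr")
    if elem is None or elem.get("bits") is None:
        return None
    return int(elem.get("bits"))


print(host_maxphysaddr_bits(SAMPLE_CAPS))
```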

If hw:maxphysaddr_bits is set in flavor extra_specs, ComputeCapabilitiesFilter can be used to compare the number of bits in scheduling. For example, this can be accomplished by adding capabilities:cpu_info:maxphysaddr:bits>=42 automatically.

If hw_maxphysaddr_bits is set in image properties, the numeric comparison is performed by ImagePropertiesFilter.
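In both cases the check reduces to the same numeric comparison. The following sketch shows only that comparison, not the filter classes themselves; host_supports_bits is a hypothetical name, and rejecting hosts that do not report a value is an assumption.

```python
def host_supports_bits(host_bits, requested_bits):
    """Return True if the host can provide at least requested_bits."""
    if requested_bits is None:
        # Nothing requested: every host passes.
        return True
    if host_bits is None:
        # Assumption: a host that reports no value is filtered out.
        return False
    return host_bits >= requested_bits


# Example: keep only hosts able to satisfy a request for 42 bits.
hosts = {"node-a": 39, "node-b": 46, "node-c": None}
eligible = sorted(name for name, bits in hosts.items()
                  if host_supports_bits(bits, 42))
print(eligible)
```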

Cold migration and live migration are also covered by these filters and the COMPUTE_ADDRESS_SPACE_EMULATED trait.

Alternatives

Before the maxphysaddr option was introduced into Libvirt, the workaround was to specify it with a QEMU command-line parameter. But this alternative is not allowed in nova.

Also, some Linux distributions may have machine types with host-phys-bits=true [5]. For example, pc-i440fx-bionic-hpb and pc-q35-bionic-hpb. However, this alternative has the following two issues and cannot be adopted for general-purpose use cases.

  • Ubuntu package maintainers are applying a patch to QEMU [6]. This means it is not included in vanilla QEMU and is not available in other distributions.

  • This is only the case for hw:maxphysaddr_mode=passthrough and does not include hw:maxphysaddr_mode=emulate. Since hw:maxphysaddr_mode=passthrough requires cpu_mode=host-passthrough to be used [7], this alternative cannot be used with cpu_mode=custom or cpu_mode=host-model. So, this alternative is not sufficient for a cloud with many different CPU models.

As for scheduling, placement does not currently support numeric traits, so the maximum number of bits supported by the hypervisor cannot be checked by this mechanism. Numeric comparisons can also be performed with JsonFilter. However, JsonFilter appears to be vulnerable to changes in HostState and its child attributes, which is mentioned as a warning [10]. So this spec employs ComputeCapabilitiesFilter and ImagePropertiesFilter.

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

Operators should specify appropriate flavor extra_specs or image properties as needed.

Developer impact

None

Upgrade impact

As described earlier, the new traits COMPUTE_ADDRESS_SPACE_PASSTHROUGH and COMPUTE_ADDRESS_SPACE_EMULATED signal if the upgraded compute nodes support this feature.

Implementation

Assignee(s)

Primary assignee:

nmiki

Other contributors:

None

Feature Liaison

Feature liaison:

Liaison Needed

Work Items

This spec is being implemented across multiple development cycles. The merged and missing items are listed below.

Merged Items

  • Add new traits to check Libvirt and QEMU versions [8] [9]

Missing Items

  • Add new guest configs

  • Add new fields in nova/api/validation/extra_specs/hw.py

  • Add new fields in nova/objects/image_meta.py

  • Add new fields in LibvirtConfigCPU in nova/virt/libvirt/config.py

  • Add new field maxphysaddr to cpu_info in nova/virt/libvirt/driver.py

  • Add docs and release notes for new flavor extra_specs

  • Support for hw:maxphysaddr_bits numeric comparison in ComputeCapabilitiesFilter

  • Support for hw_maxphysaddr_bits numeric comparison in ImagePropertiesFilter

Dependencies

Libvirt v8.7.0+. QEMU v2.7.0+.

Testing

Add the following unit tests:

  • check that proposed flavor extra_specs are properly validated

  • check that proposed image properties are properly validated

  • check that intended XML elements are output

  • check that traits are properly added and used

  • check that new field in ComputeCapabilitiesFilter is properly added and used

  • check that new field in ImagePropertiesFilter is properly added and used

Documentation Impact

For operators, the documentation describes what proposed flavor extra_specs and image properties mean and how they should be set.

References

History

Revisions

Release Name       Description
2023.1 Antelope    Introduced
2023.2 Bobcat      Reproposed
2024.1 Caracal     Reproposed