VM Scoped SR-IOV NUMA Affinity Policies¶
In the Queens release [1], support was added to allow PCI NUMA affinity policies to be specified via PCI aliases. This work builds on a feature from the Juno release [2] that introduced strict NUMA affinity for PCI devices. However, the Queens feature did not address the NUMA affinity of neutron SR-IOV interfaces, which were also covered by the original Juno enhancement. This spec seeks to provide a per-VM mechanism to set a VM-wide NUMA affinity policy for all PCI passthrough devices, including but not limited to neutron SR-IOV interfaces (vnic_type=direct, direct-physical, macvtap, virtio-forwarder).
In some environments the server form factor is restricted, preventing PCI devices from being physically installed across all NUMA nodes on a server, e.g. high-density blade or multi-server systems, or non-standard form factor equipment. In such an environment, the default legacy policy, which is applied to all neutron SR-IOV interfaces, prevents VMs from using SR-IOV on a non-local NUMA node if the VM has a NUMA topology (that is, it uses CPU pinning, vPMEM or hugepages, or requests a NUMA topology explicitly).
To use a remote SR-IOV device via neutron ports in such an environment, the operator is forced to either configure the guest to have multiple NUMA nodes or disable NUMA reporting on the host server. Both options pessimize the performance of the guest and the host in different ways. While a VM with multiple virtual NUMA nodes can outperform a VM with the same resources and a single NUMA node in a memory-bound workload, that is only true if the workload is NUMA-aware. A two-node NUMA topology, if enforced on a workload that is not NUMA-aware, can increase cross-NUMA traffic and lower throughput. Similarly, while disabling NUMA reporting at the hardware level is beneficial in some HPC workloads due to the increased memory bandwidth, it comes at the cost of increased memory latency, making it unsuitable for realtime workloads such as VoIP.
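The restriction described above can be made concrete with a simplified model of how an affinity policy narrows the set of usable devices. This is a sketch under stated assumptions, not nova's scheduler code: the `PciDevice` type and the `candidate_devices` function are hypothetical, and the policy semantics (`required` is strict, `legacy` is strict unless the device reports no NUMA node, `preferred` is best effort) follow the commonly documented behaviour of the alias-based feature.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical, simplified view of a host PCI device."""
    address: str
    numa_node: Optional[int]  # None when the host reports no NUMA info


def candidate_devices(
    devices: Iterable[PciDevice],
    guest_numa_nodes: Set[int],
    policy: str,
) -> List[PciDevice]:
    """Return the devices a guest may use under the given policy."""
    devices = list(devices)
    # A guest without a NUMA topology is not constrained at all.
    if not guest_numa_nodes:
        return devices

    local = [d for d in devices if d.numa_node in guest_numa_nodes]
    if policy == "required":
        # Strict: only devices on the guest's NUMA nodes.
        return local
    if policy == "legacy":
        # Strict, except that devices with no NUMA info are tolerated.
        unknown = [d for d in devices if d.numa_node is None]
        return local + unknown
    if policy == "preferred":
        # Best effort: local devices first, then everything else.
        remote = [d for d in devices if d.numa_node not in guest_numa_nodes]
        return local + remote
    raise ValueError(f"unknown PCI NUMA affinity policy: {policy!r}")
```

On a host whose only SR-IOV device sits on NUMA node 1, a guest pinned to node 0 gets no candidates under `legacy` or `required`, while `preferred` still returns the remote device.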
As an operator deploying OpenStack on high-density or restricted form factor hardware, I wish to specify a per-VM NUMA affinity policy for SR-IOV devices via standard flavor extra specs.
As a tenant or VNF vendor, I want to be able to customize the affinity of my VMs via image properties so I can express the NUMA affinity requirements of my workloads.
This spec proposes extending the PCI NUMA affinity policies introduced
by [1] to all PCI and SR-IOV devices, including neutron ports, by adding a
new hw:pci_numa_affinity_policy flavor extra spec and a new
hw_pci_numa_affinity_policy image metadata property.
The new properties will accept one of three values: required, preferred or legacy, as defined in [1].
If a PCI device is requested using a flavor alias, the NUMA affinity policy specified in the flavor or image will take precedence over any policy set in the host PCI alias. If no PCI NUMA affinity policy is specified in the flavor or image, alias-based PCI pass-through will fall back to the policy set in the alias. If no policy is set in the flavor or image and no policy is set in the alias, the legacy policy will continue to be used.
For neutron SR-IOV interfaces, if no policy is set in the flavor or image, the legacy policy will be used.
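The precedence rules above can be sketched as a small resolution function. This is an illustrative sketch only, not nova's implementation: the name `resolve_pci_numa_policy`, its parameters and the `VALID_POLICIES` set are hypothetical, and it assumes the flavor and image values have already been reduced to a single instance-level value before the call.

```python
from typing import Optional

# Policy names assumed for illustration; "legacy" is the fallback default.
VALID_POLICIES = {"required", "preferred", "legacy"}
DEFAULT_POLICY = "legacy"


def resolve_pci_numa_policy(
    instance_policy: Optional[str],
    alias_policy: Optional[str] = None,
) -> str:
    """Pick the effective PCI NUMA affinity policy for one device request.

    instance_policy: value from the flavor or image, or None if unset.
    alias_policy: value from the PCI alias, or None if unset. Neutron
        SR-IOV ports have no alias, so callers leave this as None.
    """
    for policy in (instance_policy, alias_policy):
        if policy is not None and policy not in VALID_POLICIES:
            raise ValueError(f"unknown PCI NUMA affinity policy: {policy!r}")

    # 1. A flavor or image policy wins over the alias.
    if instance_policy is not None:
        return instance_policy
    # 2. Otherwise fall back to the alias policy, if there is one.
    if alias_policy is not None:
        return alias_policy
    # 3. With nothing set anywhere, keep the existing legacy behaviour.
    return DEFAULT_POLICY
```

For a neutron SR-IOV port the call reduces to `resolve_pci_numa_policy(instance_policy)`, so the result is either the flavor or image value or `legacy`.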
The Queens spec [1] originally contained both of the proposed flavor and image properties, but they were removed during implementation because the original neutron port use case that motivated the feature was not captured in the spec. As a result, while the Queens feature addressed NUMA affinity for flavor-based PCI pass-through, no mechanism is available to specify the policy for neutron SR-IOV interfaces.
We could change the default policy to
preferred if no policy is specified.
This would optimize for cases where people do not care about NUMA affinity
at the expense of requiring those who do to specify a policy.
As this would be a change in behavior on upgrade it is not proposed that we
take this approach.
We could enable per-interface NUMA affinity policies. This is not mutually exclusive with this proposal and will be proposed separately as an additional feature. The flavor- and image-based approach covers 80% of the use cases enabled by per-interface NUMA affinity policies without requiring neutron API changes.
Data model impact¶
The image metadata object and related notification objects will be updated to contain the new PCI NUMA affinity field. As the PCI request spec object already has a NUMA affinity policy field for alias-based pass-through, no other data model changes are required.
REST API impact¶
There will be no direct changes to any existing API. However, a new flavor extra spec will be introduced.
The image metadata properties payload will be extended with the new property field. No other impact is expected.
Other end user impact¶
To utilize this feature, operators and tenants will need to modify their
images and flavors to add the new image metadata property or flavor extra spec.
As the scheduler was already asserting legacy PCI affinity, passing a different policy to assert instead should not affect the overall scheduling time. Depending on the policy selected, the performance of the guest may improve or be reduced in line with the guarantees expressed by that policy.
Other deployer impact¶
As was previously required to enable NUMA affinity to be enforced for SR-IOV/PCI devices, the PCI pass-through and NUMA topology filters must be enabled.
- Primary assignee:
- Feature liaison:
As this feature relates to SR-IOV, it cannot be tested in the upstream gate via Tempest. Unit tests will be provided to assert that the policy is correctly conveyed to the existing PCI assignment code, and the existing functional tests can be extended as required.
As this feature simply provides another way to specify the PCI affinity policy, the code change is minimal and can leverage much of the existing test coverage.
A release note and updates to the existing user flavor docs will be provided, and the glance metadefs should be updated to reflect the new image property.