Virt driver guest vCPU thread pinning policies

https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-thread-pinning

This feature aims to implement the remaining functionality of the virt-driver-cpu-pinning spec, namely support for CPU thread policies.

Problem description

Some applications must exhibit real-time or near real-time behavior. This is generally possible by making use of processor affinity and binding vCPUs to pCPUs, functionality that already exists in Nova. However, it is also necessary to consider thread affinity in the context of simultaneous multithreading (SMT) enabled systems, such as those with Intel(R) Hyper-Threading Technology. In these systems, competition for shared resources can result in unpredictable behavior.

Use Cases

Depending on the workload being executed, the end user or cloud admin may wish to have control over how the guest uses hardware threads. To maximize cache efficiency, the guest may wish to be pinned to thread siblings. Conversely, the guest may wish to avoid thread siblings. This level of control is of particular importance to Network Function Virtualization (NFV) deployments, which care about maximizing the cache efficiency of vCPUs.

Project Priority

None

Proposed change

The flavor extra specs will be enhanced to support one new parameter:

  • hw:cpu_thread_policy=prefer|isolate|require

This policy is an extension to the already implemented CPU policy parameter:

  • hw:cpu_policy=shared|dedicated
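
For illustration, a flavor could request dedicated CPUs together with the isolate thread policy as follows. This is a minimal sketch assuming an authenticated python-novaclient handle named ‘nova’; the flavor name and sizing are arbitrary.

  # Minimal sketch: create a flavor and attach the CPU policy extra specs.
  # Assumes `nova` is an authenticated python-novaclient (v2) handle.
  flavor = nova.flavors.create(name="nfv.pinned", ram=4096, vcpus=4, disk=20)

  # The thread policy is only honored when hw:cpu_policy is 'dedicated'
  # (see below).
  flavor.set_keys({
      "hw:cpu_policy": "dedicated",
      "hw:cpu_thread_policy": "isolate",
  })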

The threads policy will control how the scheduler / virt driver places guests with respect to CPU threads. It will only apply if the CPU policy is ‘dedicated’, i.e. guest vCPUs are being pinned to host pCPUs. A sketch illustrating how a scheduler might apply each policy follows the list below.

  • prefer: The host may or may not have an SMT architecture. This retains the legacy behavior, whereby siblings are preferred when available. This is the default if no policy is specified.

  • isolate: The host must not have an SMT architecture, or must emulate a non-SMT architecture. If the host does not have an SMT architecture, each vCPU will simply be placed on a different core as expected. If the host does have an SMT architecture (i.e. one or more cores have “thread siblings”), then each vCPU will be placed on a different physical core and no vCPUs from other guests will be placed on the same core. As such, one thread sibling is guaranteed to always be unused.

  • require: The host must have an SMT architecture. Each vCPU will be allocated onto thread siblings. If the host does not have an SMT architecture, it will not be used. If the host has an SMT architecture but not enough cores with free thread siblings are available, then scheduling will fail.
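
The following sketch illustrates how a scheduler might apply each policy when fitting guest vCPUs onto a host. It is purely illustrative and is not the Nova implementation; the ‘host_cores’ structure (one list of sibling pCPU ids per physical core) and the helper name are hypothetical.

  # Illustrative sketch only; not the Nova scheduler code.
  # `host_cores` is a hypothetical host description: one list of sibling
  # pCPU ids per physical core, e.g. [[0, 4], [1, 5]] for two SMT cores.
  def fit_vcpus(host_cores, num_vcpus, policy):
      """Return a list of pCPU ids to pin to, or None if the host cannot fit."""
      has_smt = any(len(siblings) > 1 for siblings in host_cores)
      all_pcpus = [pcpu for siblings in host_cores for pcpu in siblings]

      if policy == "require":
          # Host must have SMT; vCPUs are packed onto thread siblings.
          if not has_smt or len(all_pcpus) < num_vcpus:
              return None
          return all_pcpus[:num_vcpus]

      if policy == "isolate":
          # One vCPU per physical core; the remaining siblings stay unused.
          if len(host_cores) < num_vcpus:
              return None
          return [siblings[0] for siblings in host_cores[:num_vcpus]]

      # 'prefer' (default): siblings are used when available, but SMT is
      # not required.
      return all_pcpus[:num_vcpus] if len(all_pcpus) >= num_vcpus else None

  # A two-core host with hyper-threading:
  host = [[0, 4], [1, 5]]
  print(fit_vcpus(host, 2, "isolate"))  # [0, 1]: siblings 4 and 5 left unused
  print(fit_vcpus(host, 2, "require"))  # [0, 4]: packed onto one core's siblings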

The image metadata properties will also allow specification of the threads policy:

  • hw_cpu_thread_policy=prefer|isolate|require

This will only be honored if the flavor specifies the ‘prefer’ policy, either explicitly or implicitly as the default option. This ensures that the cloud administrator can have absolute control over the threads policy if desired.
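
A sketch of the resulting precedence rule is shown below. The helper name is hypothetical and this is not the actual Nova validation code; it simply encodes the rule that the image property is honored only when the flavor policy is ‘prefer’.

  # Hypothetical helper encoding the flavor/image precedence rule above.
  def effective_thread_policy(flavor_policy, image_policy):
      # 'prefer' is the default when the flavor sets no explicit policy.
      flavor_policy = flavor_policy or "prefer"
      if flavor_policy != "prefer":
          # An administrator-defined policy always wins.
          return flavor_policy
      return image_policy or flavor_policy

  assert effective_thread_policy(None, "isolate") == "isolate"
  assert effective_thread_policy("require", "isolate") == "require"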

Alternatives

None.

Data model impact

None.

The necessary changes were already completed in the original spec.

REST API impact

No impact.

The existing APIs already support arbitrary data in the flavor extra specs.

Security impact

No impact.

Notifications impact

No impact.

The notifications system is not used by this change.

Other end user impact

No impact.

Support for flavor extra specs is already available in the Python clients.

Performance Impact

The scheduler will incur a small additional overhead if a threads policy is set on the image or flavor. This overhead will be negligible compared to that introduced by the enhancements to support NUMA policy and huge pages. It is anticipated that dedicated CPU guests will typically be used in conjunction with huge pages.

Other deployer impact

The cloud administrator will gain the ability to define flavors with an explicit threading policy. Although not required by this design, it is expected that the administrator will commonly use the same host aggregates to group hosts for both CPU pinning and huge page usage, since these concepts are complementary and likely to be used together. This will minimize the administrative burden of configuring host aggregates.

Developer impact

It is expected that most hypervisors will have the ability to support the required thread policies. The flavor parameter is simple enough that any Nova driver would be able to support it.

Implementation

Assignee(s)

Primary assignee:

sfinucan

Work Items

  • Enhance the scheduler to take account of threads policy when choosing which host to place the guest on.

  • Enhance the scheduler to take account of threads policy when mapping vCPUs to pCPUs.

Dependencies

None.

Testing

It is not practical to test this feature using the gate and tempest at this time, since effective testing will require that the guests running the test be provided with multiple NUMA nodes, each in turn with multiple CPUs.

These features will be validated using a third-party CI (Intel Compute CI).

Documentation Impact

None.

The required documentation updates were made in the previous change.

References

Current “big picture” research and design for the topic of CPU and memory resource utilization and placement. vCPU topology is a subset of this work:

Current CPU pinning validation tests for Intel Compute CI:

Existing CPU Pinning spec:

History

Revisions

  • Liberty: Introduced

  • Mitaka: Revised to rework the policies, removing two, adding one and clarifying the remainder