Virt driver guest vCPU topology configuration¶
This feature aims to give users and administrators the ability to control the vCPU topology exposed to guests. This enables them to avoid hitting limitations on vCPU topologies that OS vendors place on their products.
When a guest is given multiple vCPUs, these are typically exposed in the hardware model as discrete sockets. Some operating system vendors will place artificial limits on the topologies that their product will support. So for example, a Windows guest may support 8 vCPUs only if it is exposed as 2 sockets with 4 cores each. If the vCPUs were exposed as 8 sockets with 1 core each, some of the vCPUs will be inaccessible to the guest. It is thus desirable to be able to control the mixture of cores and sockets exposed to the guest. The cloud administrator needs to be able to define topologies for flavors, to override the hypervisor defaults, such that commonly used OS’ will not encounter their socket count limits. The end user also needs to be able to express preferences for topologies to use with their images.
While the choice of sockets vs cores does not have a significant impact on performance, if a guest is given threads or is running on host OS CPUs which are thread siblings, this can have a notable performance impact. It only makes sense to expose a value of threads > 1 to a guest if all the guest vCPUs are strictly pinned to host pCPUs and some of the host pCPUs are thread siblings. While this blueprint will describe how to set the threads count, it will only make sense to set this to a value > 1 once the CPU pinning feature is integrated in Nova.
If the flavor admin wishes to define flavors which avoid scheduling on hosts which have pCPUs with threads > 1, they can use scheduler aggregates to set up host groups.
The proposal is to add support for configuration of aspects of vCPU topology at multiple levels.
At the flavor level there will be the ability to define default parameters for the vCPU topology using flavor extra specs:
hw:cpu_sockets=NN - preferred number of sockets to expose to the guest
hw:cpu_cores=NN - preferred number of cores to expose to the guest
hw:cpu_threads=NN - preferred number of threads to expose to the guest
hw:cpu_max_sockets=NN - maximum number of sockets to expose to the guest
hw:cpu_max_cores=NN - maximum number of cores to expose to the guest
hw:cpu_max_threads=NN - maximum number of threads to expose to the guest
It is not expected that administrators will set all these parameters against every flavor. The simplest expected use case will be for the cloud admin to set “hw:cpu_max_sockets=2” to prevent the flavor exceeding 2 sockets. The virtualization driver will calculate the exact number of cores/sockets/threads based on the flavor vCPU count and this maximum sockets constraint.
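The calculation described above can be sketched as follows. This is an illustrative example, not Nova's actual implementation: the function names are hypothetical, and the solver simply enumerates every factorization of the vCPU count that fits within the maximum limits, then applies the sockets-first preference described later in this proposal.

```python
# Hypothetical sketch of topology calculation from a flavor vCPU count and
# the hw:cpu_max_* extra specs. Names are illustrative, not Nova's real API.

def possible_topologies(vcpus, max_sockets=65536, max_cores=65536,
                        max_threads=65536):
    """Yield every (sockets, cores, threads) tuple whose product == vcpus."""
    for sockets in range(1, min(vcpus, max_sockets) + 1):
        if vcpus % sockets:
            continue
        for cores in range(1, min(vcpus // sockets, max_cores) + 1):
            if (vcpus // sockets) % cores:
                continue
            threads = vcpus // (sockets * cores)
            if threads <= max_threads:
                yield (sockets, cores, threads)

def pick_topology(vcpus, **limits):
    """Prefer the solution with the greatest socket count."""
    solutions = sorted(possible_topologies(vcpus, **limits), reverse=True)
    if not solutions:
        raise ValueError("no valid topology for %d vCPUs" % vcpus)
    return solutions[0]

# An 8 vCPU flavor with "hw:cpu_max_sockets=2" yields 2 sockets x 4 cores.
print(pick_topology(8, max_sockets=2))  # (2, 4, 1)
```

With no limits set, the sockets-first preference simply reproduces the traditional one-core-per-socket behaviour, e.g. `pick_topology(4)` gives `(4, 1, 1)`.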
For larger vCPU counts there may be many possible configurations, so the “hw:cpu_sockets”, “hw:cpu_cores”, “hw:cpu_threads” parameters enable the cloud administrator to express their preferred choice from the large set.
The “hw:cpu_max_cores” parameter allows the cloud administrator to place an upper limit on the number of cores used, which can be useful to ensure a socket count greater than 1 and thus enable a VM to be spread across NUMA nodes.
The “hw:cpu_max_sockets”, “hw:cpu_max_cores” & “hw:cpu_max_threads” settings allow the cloud admin to set mandatory upper limits on the permitted configurations; the user may refine the topology within these limits using properties against the image.
At the image level the exact same set of parameters will be permitted, the only difference being that image property names replace the colon with an underscore (e.g. “hw_cpu_sockets” rather than “hw:cpu_sockets”):
hw_cpu_sockets=NN - preferred number of sockets to expose to the guest
hw_cpu_cores=NN - preferred number of cores to expose to the guest
hw_cpu_threads=NN - preferred number of threads to expose to the guest
hw_cpu_max_sockets=NN - maximum number of sockets to expose to the guest
hw_cpu_max_cores=NN - maximum number of cores to expose to the guest
hw_cpu_max_threads=NN - maximum number of threads to expose to the guest
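As a sketch of how these would be set in practice, using the standard python-openstackclient commands for flavor extra specs and image properties (the flavor and image names here are purely illustrative):

```shell
# Cloud admin: cap the flavor at 2 sockets so the driver prefers cores
openstack flavor set --property hw:cpu_max_sockets=2 m1.xlarge

# End user: express a preferred 2-socket, 4-core layout on their image
openstack image set --property hw_cpu_sockets=2 \
    --property hw_cpu_cores=4 my-windows-image
```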
If the user sets “hw_cpu_max_sockets”, “hw_cpu_max_cores”, or “hw_cpu_max_threads”, these must be strictly lower than the values already set against the flavor. The purpose of this is to allow the user to further restrict the range of possible topologies that the compute host will consider using for the instance.
The “hw_cpu_sockets”, “hw_cpu_cores” & “hw_cpu_threads” values against the image may not exceed the “hw_cpu_max_sockets”, “hw_cpu_max_cores” & “hw_cpu_max_threads” values set against the flavor or image. If the upper bounds are exceeded, this will be considered a configuration error and the instance will go into an error state and not boot.
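The merging rule above can be sketched as a small validation helper. This is illustrative only (not Nova's real code): image-supplied limits may only tighten, never relax, the flavor's, and violations are treated as a configuration error.

```python
# Hypothetical sketch of merging a flavor hw:cpu_max_* value with the
# corresponding image hw_cpu_max_* value, per the rule described above.

def merge_limits(flavor_max, image_max):
    """Return the effective maximum; reject image values above the flavor's."""
    if image_max is None:
        return flavor_max
    if flavor_max is not None and image_max > flavor_max:
        # In Nova this condition would put the instance into an error state.
        raise ValueError("image limit %d exceeds flavor limit %d"
                         % (image_max, flavor_max))
    return image_max

print(merge_limits(8, 4))  # 4 - the image tightens the flavor's limit
```

The same shape of check applies to the preferred values: “hw_cpu_sockets”, “hw_cpu_cores” and “hw_cpu_threads” would each be compared against the effective maximum before any topology is computed.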
If there are multiple possible topology solutions implied by the set of parameters defined against the flavor or image, then the hypervisor will prefer the solution that uses a greater number of sockets. This preference will likely be further refined when integrating support for NUMA placement in a later blueprint.
If the user wants their settings to be used unchanged by the compute host they should set “hw_cpu_sockets” == “hw_cpu_max_sockets”, “hw_cpu_cores” == “hw_cpu_max_cores”, and “hw_cpu_threads” == “hw_cpu_max_threads” on the image. This will force use of the exact specified topology.
Note that there is no requirement in this design or implementation for the compute host topology to match what is being exposed to the guest, i.e. a flavor given sockets=2,cores=2 can still be used to launch instances on a host with sockets=16,cores=1. If the admin wishes to optionally control this, they will be able to do so by setting up host aggregates.
The intent is to implement this for the libvirt driver, targeting QEMU / KVM hypervisors. Conceptually it is applicable to all other full machine virtualization hypervisors such as Xen and VMWare.
The virtualization driver could hard code a different default topology, so that all guests always use cores instead of sockets.
While this would address the immediate needs of current Windows OSes, it is not likely to be sufficiently flexible for the longer term, as it forces all OSes into using cores, even those without similar licensing restrictions. Over-use of cores will limit the ability to do an effective job at NUMA placement, so it is desirable to use cores as little as possible.
The settings could be defined exclusively against the images, making no use of flavor extra specs. This is undesirable because, for best NUMA utilization, the cloud administrator needs to be able to constrain which topologies the user is allowed to use. The administrator also needs the ability to define default behaviour so that guests get a suitable topology without requiring every single image uploaded to glance to be tagged with the same repeated set of properties.
A more fine grained configuration option would be to allow the specification of the core and thread count for each separate socket. This would allow for asymmetrical topologies, e.g. one socket with four cores alongside a second socket with two cores.
It is noted, however, that at time of writing, no virtualization technology provides any way to configure such asymmetrical topologies. Thus Nova is better served by ignoring this purely theoretical possibility and keeping its syntax simpler to match real-world capabilities that already exist.
Data model impact¶
The new properties will use the existing flavor extra specs and image property storage models.
REST API impact¶
The new properties will use the existing flavor extra specs and image property API facilities.
The choice of sockets vs cores can have an impact on host resource utilization when NUMA is involved, since over use of cores will prevent a guest being split across multiple NUMA nodes. This feature addresses this by allowing the flavor administrator to define hard caps, and ensuring the flavor will always take priority over the image settings.
There is no need for this feature to integrate with notifications.
Other end user impact¶
The user will gain the ability to control aspects of the vCPU topology used by their guest. They will achieve this by setting image properties in glance.
The cores vs sockets vs threads decision does not involve any scheduler interaction, since this design is not attempting to match host topology to guest topology. A later blueprint on CPU pinning will make it possible to do such host-to-guest topology matching, and its performance impact will be considered there.
Other deployer impact¶
The flavor extra specs will gain new parameters which a cloud administrator can choose to use. If none are set, the default behaviour is unchanged from previous releases.
The initial implementation will be done for libvirt with QEMU/KVM. It should be possible to add support for using the cores/sockets/threads parameters in the XenAPI and VMWare drivers.
Provide helper methods against the compute driver base class for calculating valid CPU topology solutions for the given hw_cpu_* parameters.
Add Libvirt driver support for choosing a CPU topology solution based on the given hw_cpu_* parameters.
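Once a topology is chosen, the libvirt driver's job is to emit the corresponding `<topology>` element in the guest's domain XML. The following is a minimal sketch of that output; real Nova builds it through its libvirt config objects rather than raw ElementTree, and the function name here is hypothetical.

```python
# Illustrative sketch of the libvirt <cpu><topology/></cpu> XML fragment the
# driver would generate for a chosen (sockets, cores, threads) solution.
from xml.etree import ElementTree as ET

def cpu_topology_xml(sockets, cores, threads):
    cpu = ET.Element("cpu")
    ET.SubElement(cpu, "topology", {
        "sockets": str(sockets),
        "cores": str(cores),
        "threads": str(threads),
    })
    return ET.tostring(cpu, encoding="unicode")

print(cpu_topology_xml(2, 4, 1))
# e.g. <cpu><topology sockets="2" cores="4" threads="1" /></cpu>
```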
No external dependencies
No tempest changes.
The mechanisms for the cloud administrator and end user to set parameters against the flavor and/or image are already well tested. The new functionality focuses on interpreting the parameters and setting corresponding libvirt XML parameters. This is something that is effectively covered by the unit testing framework.
The new flavor extra specs and image properties will need to be documented. Guidance should be given to cloud administrators on how to make most effective use of the new features. Guidance should be given to the end user on how to use the new features to address their use cases.
Current “big picture” research and design for the topic of CPU and memory resource utilization and placement; vCPU topology is a subset of this work.