Network Bandwidth resource provider¶
This spec proposes adding new resource classes representing network bandwidth and modeling network backends as resource providers in Placement. As well as adding scheduling support for the new resources in Nova.
Currently there is no method in the Nova scheduler to place a server based on the network bandwidth available in a host. The Placement service doesn’t track the different network back-ends present in a host and their available bandwidth.
A user wants to spawn a server with a port associated with a specific physical network. The user also wants a defined guaranteed minimum bandwidth for this port. The Nova scheduler must select a host which satisfies this request.
This spec proposes Neutron to model the bandwidth resource of the physical NICs on a compute host and their resources providers in the Placement service, express the bandwidth request in the Neutron port, and modify Nova to consider the requested bandwidth resource during the scheduling of the server based on the available bandwidth resources on each compute host.
This also means that this spec proposes to use Placement and the nova-scheduler to select which bandwidth providing RP and therefore which physical device will provide the bandwidth for a given Neutron port. Today selecting the physical device happens during Neutron port binding but after this spec is implemented this selection will happen when an allocation candidate is selected for the server in the nova-scheduler. Therefore Neutron needs to provide enough information in the Networking RP model in Placement and in the resource_request field of the port so that Nova can query Placement and receive allocation candidates that are not conflicting with Neutron port binding logic. The Networking RP model and the schema of the new resource_request port attribute is described in QoS minimum bandwidth allocation in Placement API Neutron spec.
Please note that today Neutron port binding could fail if the nova-scheduler selects a compute host where Neutron cannot bind the port. We are not aiming to remove this limitation by this spec but also we don’t want to increase the frequency of such port binding failures as it would ruin the usability of the system.
Separation of responsibilities¶
Nova creates the root RP of the compute node RP tree as today
Neutron creates the networking RP tree of a compute node under the compute node root RP and reports bandwidth inventories
Neutron provides the resource_request of a port in the Neutron API
Nova takes the ports’ resource_request and includes it in the GET /allocation_candidate request. Nova does not need to understand or manipulate the actual resource request. But Nova needs to assign unique granular resource request group suffix for each port’s resource request.
Nova selects one allocation candidate and claims the resources in Placement.
Nova passes the RP UUID used to fulfill the port resource request to Neutron during port binding
Due to the size and complexity of this feature the scope of the current spec is limited. To keep backward compatibility while the feature is not fully implemented both new Neutron API extensions will be optional and turned off by default. Nova will check for the extension that introduces the port’s resource_request field and fall back to the current resource handling behavior if the extension is not loaded.
Out of scope from Nova perspective:
Supporting separate proximity policy for the granular resource request groups created from the Neutron port’s resource_request. Nova will use the policy defined in the flavor extra_spec for the whole request as today such policy is global for an allocation_candidate request.
Handling Neutron mechanism driver preference order in a weigher in the nova-scheduler
Interface attach with a port or network having a QoS minimum bandwidth policy rule as interface_attach does not call scheduler today. Nova will reject interface_attach request if the port (passed in or created in network that is passed in) resource request in non empty.
Server create with network having QoS minimum bandwidth policy rule as a port in this network is created by the nova-compute after the scheduling decision. This spec proposes to fail such boot in the compute-manager.
QoS policy rule create or update on bound port
QoS aware trunk subport create under a bound parent port
Baremetal port having a QoS bandwidth policy rule is out of scope as Neutron does not own the networking devices on a baremetal compute node.
This spec needs to consider multiple flows and scenarios detailed in the following sections.
Neutron agent first start¶
The Neutron agent running on a given compute host uses the existing
neutron.conf variable to find the compute RP related to its host in Placement.
See Finding the compute RP for details and reasoning.
The Neutron agent creates the networking RPs under the compute RP with proper traits then reports resource inventories based on the discovered and / or configured resource inventory of the compute host. See QoS minimum bandwidth allocation in Placement API for details.
Neutron agent restart¶
During restart the Neutron agent ensures that the proper RP tree exists in Placement with correct inventories and traits by creating / updating the RP tree if necessary. The Neutron agent only modifies the inventory and traits of the RPs that were created by the agent. Also Neutron only modifies the pieces that actually got added or deleted. Unmodified pieces should be left in place (no delete and re-create).
Server create with pre-created Neutron ports having QoS policy rule¶
The end user creates a Neutron port with the Neutron API and attaches a QoS policy minimum bandwidth rule to it, either directly or indirectly by attaching the rule to the network the port is created in. Then the end user creates a server in Nova and passes in the port UUID in the server create request.
Nova fetches the port data from Neutron. This already happens in create_pci_requests_for_sriov_ports in the current code base. The port contains the requested resources and required traits. See Resource request in the port.
The create_pci_requests_for_sriov_ports() call needs to be refactored to a more generic call that not just generates PCI requests but also collects the requested resources from the Neutron ports.
The nova-api stores the requested resources and required traits in a new field of the RequestSpec object called requested_resources. The new requested_resources field should not be persisted in the api database as it is computed data based on the resource requests from different sources in this case from the Neutron ports and the data in the port might change outside of Nova.
The nova-scheduler uses this information from the RequestSpec to send an allocation candidate request to Placement that contains the port related resource requests besides the compute related resource requests. The requested resources and required traits from each port will be considered to be restricted to a single RP with a separate, numbered request group as defined in the granular-resource-request spec. This is necessary as mixing requested resource and required traits from different ports (i.e. one OVS and one SRIOV) towards placement will cause empty allocation candidate response as no RP will have both OVS and SRIOV traits at the same time.
Alternatively we could extend and use the requested_networks (NetworkRequestList ovo) parameter of the build_instance code path to store and communicate the resource needs coming from the Neutron ports. Then the select_destinations() scheduler rpc call needs to be extended with a new parameter holding the NetworkRequestList.
The nova.scheduler.utils.resources_from_request_spec() call needs to be modified to use the newly introduced requested_resources field from the RequestSpec object to generate the proper allocation candidate request.
Later on the resource request in the Neutron port API can be evolved to support the same level of granularity that the Nova flavor resource override functionality supports.
Then Placement returns allocation candidates. After additional filtering and weighing in the nova-scheduler, the scheduler claims the resources in the selected candidate in a single transaction in Placement. The consumer_id of the created allocations is the instance_uuid. See The consumer of the port related resources.
When multiple ports, having QoS policy rules towards the same physical network, are attached to the server (e.g. two VFs on the same PF) then the resulting allocation is the sum of the resource amounts of each individual port request.
Delete a server with ports having QoS policy rule¶
During normal delete, local delete and shelve_offload Nova today deletes the resource allocation in placement where the consumer_id is the instance_uuid. As this allocation will include the port related resources those are also cleaned up.
Detach_interface with a port having QoS policy rule¶
After the detach succeeds in Neutron and in the hypervisor, the nova-compute needs to delete the allocation related to the detached port in Placement. The rest of the server’s allocation will not be changed.
Server move operations (cold migrate, evacuate, resize, live migrate)¶
During the move operation Nova makes allocation on the destination host with consumer_id == instance_uuid while the allocation on the source host is changed to have consumer_id == migration_uuid. These allocation sets will contain the port related allocations as well. When the move operation succeeds Nova deletes the allocation towards the source host. If the move operation rolled back Nova cleans up the allocations towards the destination host.
As the port related resource request is not persisted in the RequestSpec object Nova needs to re-calculate that from the ports’ data before calling the scheduler.
Move operations with force host flag (evacuate, live-migrate) do not call the scheduler. So to make sure that every case is handled we have to go through every direct or indirect call of reportclient.claim_resources() function and ensure that the port related resources are handled properly. Today we blindly copy the allocation from source host to destination host by using the destination host as the RP. This will be lot more complex as there will be more than one RP to be replaced and Nova will have a hard time to figure out what Network RP from the source host maps to what Network RP on the destination host. A possible solution is to send the move requests through the scheduler regardless of the force flag but skipping the scheduler filters.
Server move operations with ports having resource request are not supported in Stein.
Shelve_offload and unshelve¶
During shelve_offload Nova deletes the resource allocations including the port related resources as those also have the same consumer_id, the instance uuid. During unshelve a new scheduling is done in the same way as described in the server create case.
Unshelve after Shelve offload operations with ports having resource request are not supported in Stein.
Finding the compute RP¶
Neutron already depends on the
host conf variable to be set to the same id
that Nova uses in the Neutron port binding. Nova uses the hostname in the port
binding. If the
host is not defined in the Neutron config then it defaults
to the hostname as well. This way Neutron and Nova are in sync today. The same
mechanism (i.e. the hostname) can be used in Neutron agent to find the compute
RP created by Nova for the same compute host.
Having non fully qualified hostnames in a deployment can cause ambiguity. For example different cells might contain hosts with the same hostname. This hostname ambiguity in a multicell deployment is already a problem without the currently proposed feature as Nova uses the hostname as the compute RP name in Placement and the name field has a unique constraint in the Placement db model. So in an ambiguous situation the Nova compute services having non unique hostnames have already failed to create RPs in Placement.
The ambiguity can be fixed by enforcing that hostnames are FQDNs. However as this problem is not special for the currently proposed feature this fix is out of scope of this spec. The override-compute-node-uuid blueprint describes a possible solution.
Separating non QoS aware and QoS aware ports¶
If QoS aware and non QoS aware ports are mixed on the same physical port then the minimum bandwidth rule cannot be fulfilled. The separation can be achieved at least on two levels:
Separating compute hosts via host aggregates. The deployer can create two host aggregates in Nova, one for QoS aware server and another for non QoS aware servers. This separation can be done without changing either Nova or Neutron. This is the proposed solution for the first version of this feature.
Separating physical ports via traits. The Neutron agent can put traits, like CUSTOM_GUARANTEED_BW_ONLY and CUSTOM_BEST_EFFORT_BW_ONLY to the network RPs to indicate which physical port belongs to which group. Neutron can offer this configurability via neutron.conf. Then Neutron can add CUSTOM_GUARANTEED_BW_ONLY trait in resource request of the port that is QoS aware and add CUSTOM_BEST_EFFORT_BW_ONLY trait otherwise. This solution would allow better granularity as a server can request guaranteed bandwidth on its data port and can accept best effort connectivity on its control port. This solution needs additional work in Neutron but no additional work in Nova. Also this would mean that ports without QoS policy rules would also have at least a trait request (CUSTOM_BEST_EFFORT_BW_ONLY) and it would cause scheduling problems with a port created by the nova-compute. Therefore this option can only be supported after nova port create is moved to the conductor.
If we use *_ONLY traits then we can never combine them, though that would be desirable. For example it makes perfect sense to guarantee 5 gigabits of a 10 gigabit card to somebody and let the rest to be used on a best effort basis. To allow this we only need to turn the logic around and use traits CUSTOM_GUARANTEED_BW and CUSTOM_BEST_EFFORT_BW. If the admin still wants to keep guaranteed and best effort traffic fully separated then s/he never puts both traits on the same RP. But one can mix them if one wants to. Even the possible starvation of best effort traffic (next to guaranteed traffic) could be easily addressed by reserving some of the bandwidth inventory.
Alternatives are discussed in their respective sub chapters in this spec.
Data model impact¶
Two new standard Resource Classes will be defined to represent the bandwidth in each direction, named as NET_BW_IGR_KILOBIT_PER_SEC and NET_BW_EGR_KILOBIT_PER_SEC. The kbps unit is selected as the Neutron API already use this unit in the QoS minimum bandwidth rule API and we would like to keep the units in sync.
A new requested_resources field is added to the RequestSpec versioned object with ListOfObjectField(‘RequestGroup’) type to store the resource and trait requests coming from the Neutron ports. This field will not be persisted in the database.
A new field
requester_id is added to the InstancePCIRequest versioned
object to connect the PCI request created from a Neutron port to the resource
request created from the same Neutron port. Nova will populate this field with
port_id of the Neutron port and the
requester_id field of the
RequestGroup versioned object will be used to match the PCI request with the
The QoS minimum bandwidth allocation in Placement API Neutron spec will
propose the modeling of the Networking RP subtree in Placement. Nova will
not depend on the exact structure of such model as Neutron will provide the
port’s resource request in an opaque way and Nova will only need to blindly
include that resource request to the
GET allocation_candidates request.
Resource request in the port¶
Neutron needs to express the port’s resource needs in the port API in a similar way the resource request can be done via flavor extra_spec. For now we assume that a single port requests resources from a single RP. Therefore Nova will map each port’s resource request to a single numbered resource request group as defined in granular-resource-request spec. That spec requires that the name of the numbered resource groups has a form of resources<integer>. Nova will map a port’s resource_request to the first unused numbered group in the allocation_candidate request. Neutron does not know which ports are used together in a server create request, and which numbered groups have already been used by the flavor extra_spec therefore Neutron cannot assign unique integer ids to the resource groups in these ports.
From implementation perspective it means Nova will create one RequestGroup instance for each Neutron port based on the port’s resource_request and insert it to the end of the list in RequestSpec.requested_resources.
In case when the Neutron multi-provider extension is used and a logical network maps to more than one physnet then the port’s resource request will require that the selected network RP has one of the physnet traits the network maps to. This any-traits type of request is not supported by Placement today but can be implemented similarly to member_of query param used for aggregate selection in Placement. This will be proposed in a separate spec any-traits-in-allocation_candidates-query.
Mapping between physical resource consumption and claimed resources¶
Neutron must ensure that the resources allocated in Placement for a port are the same as the resources consumed by that port from the physical infrastructure. To be able to do that Neutron needs to know the mapping between a port’s resource request and a specific RP (or RPs) in the allocation record of the server that are fulfilling the request.
Nova will calculate which port is fulfilled by which RP and the RP UUID will be provided to Neutron during the port binding.
REST API impact¶
Neutron REST API impact is discussed in the separate QoS minimum bandwidth allocation in Placement API Neutron spec.
The Placement REST API needs to be extended to support querying allocation candidates with an RP that has at least one of the traits from a list of requested traits. This feature will be described in the separate any-traits-in-allocation_candidates-query spec.
A new microversion will be added to the Nova REST API to indicate that server create supports ports with resource request. Server operations (e.g. create, interface_attach, move) involving ports having resource request will be rejected with older microversion. However server delete and port detach will be supported with old microversion for these server too.
Server move operations are not supported in Stein even with the new microversion.
Other end user impact¶
Placement API will be used from Neutron to create RPs and the compute RP tree will grow in size.
Nova will send more complex allocation candidate request to Placement as it will include the port related resource request as well.
Nova will calculate the mapping between each port’s resource request and the RP in the overall allocation that fulfills such request.
As Placement do not seem to be a bottleneck today we do not foresee performance degradation due to the above changes.
Other deployer impact¶
This feature impacts multiple modules and creates new dependencies between Nova, Neutron and Placement.
Also the deployer should be aware that after this feature the server create and move operations could fail due to bandwidth limits managed by Neutron.
Servers could exist today with SRIOV ports having QoS minimum bandwidth policy rule and for them the resource allocation is not enforced in Placement during scheduling. Upgrading to an OpenStack version that implements this feature will make it possible to change the rule in Neutron to be placement aware (i.e. request resources) then (live) migrate the servers and during the selection of the target of the migration the minimum bandwidth rule will be enforced by the scheduler. Tools can also be provided to search for existing instances and try to do the minimum bandwidth allocation in place. This way the number of necessary migrations can be limited.
The end user will see behavior change of the Nova API after such upgrade:
Booting a server with a network that has QoS minimum bandwidth policy rule requesting bandwidth resources will fail. The current Neutron feature proposal introduces the possibility of a QoS policy rule to request resources but in the first iteration Nova will only support such rule on a pre-created port.
Attaching a port or a network having QoS minimum bandwidth policy rule requesting bandwidth resources to a running server will fail. The current Neutron feature proposal introduces the possibility of a QoS policy rule to request resources but in the first iteration Nova will not support such rule for interface_attach.
The new QoS rule API extension and the new port API extension in Neutron will be marked experimental until the above two limitations are resolved.
balazs-gibizer (Balazs Gibizer)
xuhj (Alex Xu)
minsel (Miguel Lavalle)
bence-romsics (Bence Romsics)
lajos-katona (Lajos Katona)
This spec does not list work items for the Neutron impact.
Make RequestGroup an ovo and add the new requested_resources field to the RequestSpec. Then refactor the resources_from_request_spec() to use the new field.
Implement any-traits-in-allocation_candidates-query and mixing-required-traits-with-any-traits support in Placement. This work can be done in parallel with the below work items as any-traits type of query only needed for a small subset of the use cases.
Read the resource_request from the Neutron port in the nova-api and store the requests in the RequestSpec object.
Include the port related resources in the allocation candidate request in nova-scheduler and nova-conductor and claim port related resources based on a selected candidate.
Send the server’s whole allocation to the Neutron during port binding
Ensure that server move operations with force flag handles port resource correctly by sending such operations through the scheduler.
Delete the port related allocations from Placement after successful interface detach operation
Reject an interface_attach request that contains a port or a network having a QoS policy rule attached that requests resources.
Check in nova-compute that a port created by the nova-compute during server boot has a non empty resource_request in the Neutron API and fail the boot if it has
any-traits-in-allocation_candidates-query and mixing-required-traits-with-any-traits to support multi-provider networks. While these placement enhancements are not in place this feature will only support networks with a single network segment having a physnet defined.
nested-resource-providers to allow modelling the networking RPs
granular-resource-request to allow requesting each port related resource from a single RP
QoS minimum bandwidth allocation in Placement API for the Neutron impacts
Tempest tests as well as functional tests will be added to ensure that server create operation, server move operations, shelve_offload and unshelve and interface detach work with QoS aware ports and the resource allocation is correct.
User documentation about how to use the QoS aware ports.
Reworked after several discussions