Granular Resource Request Syntax

https://blueprints.launchpad.net/nova/+spec/granular-resource-requests

As Generic and Nested Resource Providers begin to crystallize and be exercised, it becomes necessary to be able to express:

  • Requirement 1: Requesting an allocation of a particular resource class with a particular set of traits, and requesting a different allocation of the same resource class with a different set of traits.

  • Requirement 2: Ensuring that requests of certain resources are allocated from the same resource provider (affinity).

  • Requirement 3: Ensuring that requests of certain resources are allocated from different resource providers (anti-affinity).

  • Requirement 4: The ability to spread allocations of effectively-identical resources across multiple resource providers in situations of high saturation (“any fit”).

This specification attempts to address these requirements by way of a numbered syntax on resource and trait keys in flavor extra_specs and the GET /allocation_candidates Placement API.

Note

This document uses “RP” as an abbreviation for “Resource Provider” throughout.

Problem description

Up to this point with generic and nested resource providers and traits, it is only possible to request a single blob of resources with a single blob of traits. More specifically:

  • Each resource can only be expressed as an integer count of a single resource class. There is no way to express a second resource_class:count with the same resource class.

  • All specified traits apply to all requested resources. There is no way to apply certain traits to certain resources.

  • All resources of a given resource class are allocated from the same RP.

The Use Cases below exemplify scenarios that cannot be expressed within these restrictions.

Use Cases

Consider the following hardware representation (“wiring diagram”):

+-----------------------------------+
|                CN1                |
+-+--------------+-+--------------+-+
  |     NIC1     | |     NIC2     |
  +-+---+--+---+-+ +-+---+--+---+-+
    |PF1|  |PF2|     |PF3|  |PF4|
    +-+-+  +-+-+     +-+-+  +-+-+
       \      \__   __/      /
        \        \ /        /
        |         X         |
        |    ____/ \____    |
        |   /           \   |
      +-+--+-+         +-+--+-+
      | NET1 |         | NET2 |
      +------+         +------+

Assume this is modeled in Placement as:

RP1 (represents PF1):
{
    SRIOV_NET_VF=16,
    NET_EGRESS_BYTES_SEC=1250000000,  # 10Gbps
    traits: [CUSTOM_NET1, HW_NIC_ACCEL_SSL]
}
RP2 (represents PF2):
{
    SRIOV_NET_VF=16,
    NET_EGRESS_BYTES_SEC=1250000000,  # 10Gbps
    traits: [CUSTOM_NET2, HW_NIC_ACCEL_SSL]
}
RP3 (represents PF3):
{
    SRIOV_NET_VF=16,
    NET_EGRESS_BYTES_SEC=125000000,  # 1Gbps
    traits: [CUSTOM_NET1]
}
RP4 (represents PF4):
{
    SRIOV_NET_VF=16,
    NET_EGRESS_BYTES_SEC=125000000,  # 1Gbps
    traits: [CUSTOM_NET2]
}

Use Case 1

As an Operator, I need to be able to express a boot request for an instance with one SR-IOV VF on physical network NET1 and a second SR-IOV VF on physical network NET2.

I expect the scheduler to receive the following allocation candidates:

  • [RP1(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]

  • [RP1(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]

  • [RP3(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]

  • [RP3(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]

This demonstrates the ability to get different allocations of the same resource class from different providers in a single request (Requirement 1).

Use Case 2

Request: one VF with egress bandwidth of 10000 bytes/sec. (No, it doesn’t make sense that I don’t care which physnet I’m on – mentally replace NET with SWITCH if that bothers you.)

Expect:

  • [RP1(SRIOV_NET_VF:1), RP1(NET_EGRESS_BYTES_SEC:10000)]

  • [RP2(SRIOV_NET_VF:1), RP2(NET_EGRESS_BYTES_SEC:10000)]

  • [RP3(SRIOV_NET_VF:1), RP3(NET_EGRESS_BYTES_SEC:10000)]

  • [RP4(SRIOV_NET_VF:1), RP4(NET_EGRESS_BYTES_SEC:10000)]

This demonstrates the ability to ensure that allocations of different resource classes can be made to come from the same resource provider (Requirement 2).

Use Case 3

Request:

  • One VF on NET1 with bandwidth 10000 bytes/sec

  • One VF on NET2 with bandwidth 20000 bytes/sec on a NIC with SSL acceleration (This one should always land on RP2.)

Expect:

  • [RP1(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000), RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]

  • [RP3(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000), RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]

This demonstrates both Requirement 1 and Requirement 2.

Use Case 4

In a high-availability scenario, request two VFs on NET1 from different PFs.

Expect:

  • [RP1(SRIOV_NET_VF:1), RP3(SRIOV_NET_VF:1)]

But not either of:

  • [RP1(SRIOV_NET_VF:2)]

  • [RP3(SRIOV_NET_VF:2)]

This demonstrates Requirement 3.

Use Case 5

As an Operator, I need to be able to express a request for more than one VF and have the request succeed even if my PFs are nearly saturated. For this use case, assume that each PF resource provider has only two VFs unallocated. I need to be able to express a request for four VFs on NET1.

Expect: [RP1(SRIOV_NET_VF:2), RP3(SRIOV_NET_VF:2)]

This demonstrates Requirement 4.

Proposed change

Numbered Request Groups

With the existing syntax (once Dependencies land), a resource request can be logically expressed as:

resources = { resource_classA: rcA_count,
              resource_classB: rcB_count,
              ... },
required = [ TRAIT_C, TRAIT_D, ... ]

Semantically, each resulting allocation candidate will consist of resource_classN: rcN_count resources spread arbitrarily across resource providers within the same tree (i.e. all resource providers in a single allocation candidate will have the same root_provider_uuid). Each resource provider in each resulting allocation candidate will possess all of the listed required traits.

Note

When shared resource providers are fully implemented, the above will read, “…spread arbitrarily across resource providers within the same tree or aggregate”.

Also, it is unsupported for resource classes or traits to be repeated.

The proposed change is to augment the above to include numbered resource groupings as follows:

Logical Representation

resources = { resource_classA: rcA_count,
              resource_classB: rcB_count,
              ... },
required = [ TRAIT_C, TRAIT_D, ... ],

resources1 = { resource_class1A: rc1A_count,
               resource_class1B: rc1B_count,
               ... },
required1 = [ TRAIT_1C, TRAIT_1D, ... ],

resources2 = { resource_class2A: rc2A_count,
               resource_class2B: rc2B_count,
               ... },
required2 = [ TRAIT_2C, TRAIT_2D, ... ],

...,

resourcesX = { resource_classXA: rcXA_count,
               resource_classXB: rcXB_count,
               ... },
requiredX = [ TRAIT_XC, TRAIT_XD, ... ],

group_policy = "none"|"isolate"

Semantics

The term “results” is used below to refer to the contents of one item in the allocation_requests list within the GET /allocation_candidates response.

  • The semantic for the (single) un-numbered grouping is unchanged. That is, it may still return results from different RPs in the same tree (or, when “shared” is fully implemented, the same aggregate).

  • However, a numbered group will always return results from the same RP. This is to satisfy Requirement 2.

  • With group_policy=none, separate groups (numbered or un-numbered) may return results from different RPs or the same RP (assuming isolation is not otherwise forced e.g. via traits or inventory/usage constraints).

  • With group_policy=isolate, numbered request groups are guaranteed to be satisfied by separate RPs. This applies only to numbered request groups. That is, resources within the un-numbered group can still be provided by any RPs in the tree (or aggregate), and there is no restriction between the RPs that satisfy the un-numbered group and those that satisfy the numbered groups.

  • The group_policy option is required when more than one numbered group is specified; omitting it will result in a 400 error.

  • It is still not supported to repeat a resource class within a given (numbered or un-numbered) resources grouping, but there is no restriction on repeating a resource class from one grouping to the next. The same applies to traits. This is to satisfy Requirement 1.

  • A given requiredN list applies only to its matching resourcesN list. This goes for the un-numbered required/resources as well.

  • The numeric suffixes are arbitrary. Other than binding resourcesN to requiredN, they have no implied meaning. In particular, they are not required to be sequential; and there is no semantic significance to their order.

  • For both numbered and un-numbered resources, a single resource_class:count will never be split across multiple RPs. While such a split could be seen to be sane for e.g. VFs, it is clearly not valid for e.g. DISK_GB. If you want to be able to split, use separate numbered groups. This satisfies Requirement 4.

  • Specifying a resources (numbered or un-numbered) without a corresponding required returns results unfiltered by traits.

  • It is an error to specify a required (numbered or un-numbered) without a corresponding resources.
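The numbered-group rules above can be illustrated with a brute-force sketch. This is not how Placement builds its queries; it only models the four PF providers from the Use Cases in memory, assumes their inventory is entirely unallocated, and ignores the un-numbered group. The names PROVIDERS and candidates are hypothetical, and the trait names follow the wiring-diagram model (CUSTOM_NET1, CUSTOM_NET2).

```python
from itertools import product

# Hypothetical in-memory model of RP1..RP4 from the Use Cases section.
PROVIDERS = {
    "RP1": {"inv": {"SRIOV_NET_VF": 16, "NET_EGRESS_BYTES_SEC": 1250000000},
            "traits": {"CUSTOM_NET1", "HW_NIC_ACCEL_SSL"}},
    "RP2": {"inv": {"SRIOV_NET_VF": 16, "NET_EGRESS_BYTES_SEC": 1250000000},
            "traits": {"CUSTOM_NET2", "HW_NIC_ACCEL_SSL"}},
    "RP3": {"inv": {"SRIOV_NET_VF": 16, "NET_EGRESS_BYTES_SEC": 125000000},
            "traits": {"CUSTOM_NET1"}},
    "RP4": {"inv": {"SRIOV_NET_VF": 16, "NET_EGRESS_BYTES_SEC": 125000000},
            "traits": {"CUSTOM_NET2"}},
}


def candidates(groups, group_policy):
    """Enumerate allocations that give each numbered group exactly one RP.

    groups maps a numeric suffix to (resources dict, required trait set).
    Each result maps (rp_name, resource_class) to the amount requested.
    """
    suffixes = sorted(groups)
    seen, results = set(), []
    for combo in product(sorted(PROVIDERS), repeat=len(suffixes)):
        # isolate: every numbered group must land on a different RP.
        if group_policy == "isolate" and len(set(combo)) != len(combo):
            continue
        # requiredN applies only to the RP chosen for resourcesN.
        if not all(groups[s][1] <= PROVIDERS[rp]["traits"]
                   for s, rp in zip(suffixes, combo)):
            continue
        # Sum what each RP is asked for and compare against its inventory.
        demand = {}
        for s, rp in zip(suffixes, combo):
            for rc, count in groups[s][0].items():
                demand[(rp, rc)] = demand.get((rp, rc), 0) + count
        if any(PROVIDERS[rp]["inv"].get(rc, 0) < n
               for (rp, rc), n in demand.items()):
            continue
        # Swapping two identical groups yields the same allocation; keep one.
        key = tuple(sorted(demand.items()))
        if key not in seen:
            seen.add(key)
            results.append(demand)
    return results


# Use Case 4: two otherwise identical VF requests on NET1.
vf_net1 = ({"SRIOV_NET_VF": 1}, {"CUSTOM_NET1"})
print(candidates({1: vf_net1, 2: vf_net1}, "isolate"))
print(candidates({1: vf_net1, 2: vf_net1}, "none"))
```

Under these assumptions, the isolate call yields only the RP1 plus RP3 allocation, while the none call additionally admits both VFs coming from RP1 or both from RP3.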

Syntax In Flavors

In reference to the Logical Representation, the existing (once Dependencies have landed) implementation is to specify resources and required traits in the flavor extra_specs as follows:

  • Each member of resources is specified as a separate extra_specs entry of the form:

resources:resource_classA=rcA_count

  • Each member of required is specified as a separate extra_specs entry of the form:

trait:TRAIT_B=required

For example:

resources:VCPU=2
resources:MEMORY_MB=2048
trait:HW_CPU_X86_AVX=required
trait:CUSTOM_MAGIC=required

Proposed: Allow the same syntax for numbered resource and trait groupings via the number being appended to the resources and trait keyword:

resourcesN:resource_classC=rcC_count
traitN:TRAIT_D=required

A given numbered resources or trait key may be repeated to specify multiple resources/traits in the same grouping, just as with the un-numbered syntax.

Specify inter-group affinity policy via the group_policy key, which may have the following values:

  • isolate: Different numbered request groups will be satisfied by different providers.

  • none: Different numbered request groups may be satisfied by different providers or common providers.

For example:

resources:VCPU=2
resources:MEMORY_MB=2048
trait:HW_CPU_X86_AVX=required
trait:CUSTOM_MAGIC=required
resources1:SRIOV_NET_VF=1
resources1:NET_EGRESS_BYTES_SEC=10000
trait1:CUSTOM_PHYSNET_NET1=required
resources2:SRIOV_NET_VF=1
resources2:NET_EGRESS_BYTES_SEC=20000
trait2:CUSTOM_PHYSNET_NET2=required
trait2:HW_NIC_ACCEL_SSL=required
group_policy=isolate
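As a rough illustration of how such extra_specs could be grouped by suffix, here is a minimal sketch. It is not nova's implementation: the function name parse_extra_specs, the regular expression, and the returned structure are all assumptions made for this example, and extra_specs is assumed to be a plain dict of strings.

```python
import re

# Matches "resources:X", "resources1:X", "trait:X", "trait2:X", and so on.
_KEY_RE = re.compile(r"^(resources|trait)([0-9]*):(.+)$")


def parse_extra_specs(extra_specs):
    """Group resource and trait keys by their numeric suffix.

    Returns (groups, group_policy). groups maps a suffix ("" for the
    un-numbered group) to {"resources": {class: int}, "required": set()}.
    """
    groups = {}
    for key, value in extra_specs.items():
        match = _KEY_RE.match(key)
        if not match:
            continue  # unrelated extra spec, e.g. group_policy itself
        kind, suffix, name = match.groups()
        group = groups.setdefault(suffix, {"resources": {}, "required": set()})
        if kind == "resources":
            group["resources"][name] = int(value)
        elif value == "required":
            group["required"].add(name)
        else:
            raise ValueError(f"unsupported value {value!r} for {key}")

    # group_policy is mandatory once more than one numbered group exists.
    numbered = [suffix for suffix in groups if suffix]
    policy = extra_specs.get("group_policy")
    if len(numbered) > 1 and policy not in ("none", "isolate"):
        raise ValueError("group_policy must be 'none' or 'isolate'")
    return groups, policy


specs = {
    "resources:VCPU": "2",
    "resources1:SRIOV_NET_VF": "1",
    "trait1:CUSTOM_PHYSNET_NET1": "required",
    "resources2:SRIOV_NET_VF": "1",
    "trait2:CUSTOM_PHYSNET_NET2": "required",
    "group_policy": "isolate",
}
groups, policy = parse_extra_specs(specs)
print(sorted(groups), policy)
```

Keeping the suffix as a string (rather than converting it to an integer) reflects the rule that suffixes only bind resourcesN to traitN and carry no ordering.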

Syntax In the Placement API

In reference to the Logical Representation, the existing (once Dependencies have landed) Placement API implementation is via the GET /allocation_candidates querystring as follows:

  • The resources are grouped together under a single key called resources whose value is a comma-separated list of resource_classN:rcN_count.

  • The traits are grouped together under a single key called required whose value is a comma-separated list of TRAIT_Y.

For example:

GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
    &required=HW_CPU_X86_AVX,CUSTOM_MAGIC

Proposed: Allow the same syntax for numbered resource and trait groupings via the number being appended to the resources and required keywords, and require a group_policy to be specified when more than one numbered grouping is given. In the following example, groups 1 and 2 represent Use Case 3:

GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
    &required=HW_CPU_X86_AVX,CUSTOM_MAGIC
    &resources1=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:10000
    &required1=CUSTOM_PHYSNET_NET1
    &resources2=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:20000
    &required2=CUSTOM_PHYSNET_NET2,HW_NIC_ACCEL_SSL
    &group_policy=none

The following example demonstrates the use of group_policy=isolate and represents Use Case 4 by ensuring that the two VFs come from different providers, even though they are otherwise identical:

GET /allocation_candidates
    ?resources1=SRIOV_NET_VF:1
    &required1=CUSTOM_PHYSNET_NET1
    &resources2=SRIOV_NET_VF:1
    &required2=CUSTOM_PHYSNET_NET1
    &group_policy=isolate

There is no change to the response payload syntax.
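For readers who want to see how a client might assemble such a querystring, the following is a minimal sketch using only the Python standard library. The function name build_query and the shape of the groups argument (suffix mapped to a dict with "resources" and "required") are assumptions for illustration, not part of any existing nova or Placement code.

```python
from urllib.parse import urlencode


def build_query(groups, group_policy=None):
    """Build a GET /allocation_candidates URL path from request groups.

    groups maps a suffix ("" for the un-numbered group, "1", "2", ...)
    to {"resources": {class: count}, "required": iterable of traits}.
    """
    params = []
    for suffix in sorted(groups):
        group = groups[suffix]
        resources = ",".join(
            f"{rc}:{count}" for rc, count in sorted(group["resources"].items()))
        params.append((f"resources{suffix}", resources))
        if group.get("required"):
            params.append(
                (f"required{suffix}", ",".join(sorted(group["required"]))))
    if group_policy is not None:
        params.append(("group_policy", group_policy))
    # Leave ":" and "," unescaped so the result reads like the examples above.
    return "/allocation_candidates?" + urlencode(params, safe=":,")


# Use Case 4: two identical VF groups that must land on different providers.
vf_group = {"resources": {"SRIOV_NET_VF": 1},
            "required": ["CUSTOM_PHYSNET_NET1"]}
print(build_query({"1": vf_group, "2": vf_group}, "isolate"))
```

The sorting of suffixes and resource classes here is only for deterministic output; the order has no semantic significance.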

Alternatives

  • Requirement 2 could also be expressed via aggregates by associating each RP with a unique aggregate, once shared resource providers are fully implemented.

  • We could allow the “number” suffixes to be any arbitrary string. However, using integers is easy to understand and validate, and obviates worries about escaping/encoding special characters, etc.

  • There has been discussion over time about the need for a JSON payload-based API to enable richer expression to request allocation candidates. While this is still a possibility for the future, it was considered unnecessary in this case, as the current requirements can be met via the proposed (relatively simple) enhancements to the querystring syntax of the existing GET /allocation_candidates API.

  • Much discussion has occurred around whether and how to satisfy both anti-affinity (Requirement 3) and “any fit” (Requirement 4). See the separate_providers proposal, the can_split proposal, and the mailing list thread for details.

Data model impact

None.

REST API impact

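See Syntax In the Placement API. To summarize, the GET /allocation_candidates Placement API is modified to accept arbitrary query parameter keys of the format resourcesN and requiredN, where N can be any integer, along with a group_policy parameter whose value is either none or isolate. The format of the values to the resourcesN and requiredN query parameters is identical to that of resources and required, respectively. The group_policy parameter is required when more than one numbered group is specified.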

Otherwise, there is no REST API impact.
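The key-level rules could be checked along the following lines. This is a sketch under assumptions: validate_query is a hypothetical helper, params is assumed to be a dict of already-decoded query parameters, and the caller is assumed to turn any returned error into a 400 response (this document states the 400 only for a missing group_policy).

```python
import re

# Accepts "resources", "required", and their numbered forms.
_GROUP_KEY_RE = re.compile(r"^(resources|required)([0-9]*)$")


def validate_query(params):
    """Return a list of error strings; an empty list means no problems found."""
    seen = {"resources": set(), "required": set()}
    for key in params:
        match = _GROUP_KEY_RE.match(key)
        if match:
            seen[match.group(1)].add(match.group(2))

    errors = []
    # A requiredN without its resourcesN is an error.
    for suffix in sorted(seen["required"] - seen["resources"]):
        errors.append(f"required{suffix} has no matching resources{suffix}")

    # group_policy is mandatory with more than one numbered group.
    numbered = {suffix for suffix in seen["resources"] if suffix}
    policy = params.get("group_policy")
    if len(numbered) > 1 and policy is None:
        errors.append("group_policy is required with more than one numbered group")
    if policy is not None and policy not in ("none", "isolate"):
        errors.append(f"invalid group_policy {policy!r}")
    return errors


print(validate_query({"resources1": "SRIOV_NET_VF:1",
                      "resources2": "SRIOV_NET_VF:1"}))
```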

Security impact

None

Notifications impact

None

Other end user impact

Operators will need to understand the Syntax In Flavors and the Semantics of the changes in order to create flavors exploiting the new functionality. See Documentation Impact.

There is no impact on the nova or openstack CLIs. The existing CLI syntax is adequate for expressing the newly-supported extra_specs keys.

Performance Impact

Use of the new syntax results in the GET /allocation_candidates Placement API effectively doing multiple lookups per request. This has the potential to impact performance in the database by a factor of N+1, where N is the number of numbered resource groupings specified in a given request. Clever SQL expression may reduce or eliminate this impact.

There should be no impact outside of the database, as this feature should not result in a significant increase in the number of records returned by the GET /allocation_candidates API (if anything, the increased specificity will decrease the number of results).

Other deployer impact

None

Developer impact

Developers of modules supplying Resource Provider representations (e.g. virt drivers) will need to be aware of this feature in order to model their RPs appropriately.

Upgrade impact

None

Implementation

Assignee(s)

  • efried

Work Items

Implementation work was begun in Queens. Several patches were merged; the remaining patches have been started but are waiting on dependencies.

https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/granular-resource-requests

Scheduler

  • Negotiate microversion capabilities with the Placement API.

  • Recognize and parse the new Syntax In Flavors.

  • If the new flavor extra_specs syntax is recognized and the Placement API is not capable of the appropriate microversion, raise an error.

  • Construct the GET /allocation_candidates querystring according to the flavor extra_specs.

  • Send the GET /allocation_candidates request to Placement, specifying the appropriate microversion if the new syntax is in play.

Placement

  • Publish a new microversion.

  • Recognize and parse the new GET /allocation_candidates querystring key formats if invoked at the new microversion.

  • Construct the appropriate database query or queries.

  • Everything else is unchanged.

Dependencies

This work builds on reapproval and completion of the Nested Resource Providers effort.

Testing

Functional tests, including gabbits, will be added to exercise the new syntax. New fixtures may be required to express some of the more complicated configurations, particularly involving nested resource providers. Test cases will be designed to prove various combinations and permutations of the items listed in Semantics. For example, one test case might issue a GET /allocation_candidates request using both numbered and un-numbered groupings against a placement service containing multiple nested resource provider trees with three or more levels and involving trait propagation. Migration scenarios will also be tested.

Documentation Impact

  • The Placement API reference will be updated to describe the new syntax to the GET /allocation_candidates API.

  • The Placement Devref will be updated to describe the new microversion.

  • Admin documentation (presumably the same as introduced/enhanced via the Traits in Flavors effort) will be updated to describe the new Syntax In Flavors.

References

History

Revisions

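+--------------+----------------------------------------------+
| Release Name | Description                                  |
+--------------+----------------------------------------------+
| Queens       | Introduced, approved, implementation started |
+--------------+----------------------------------------------+
| Rocky        | Reproposed                                   |
+--------------+----------------------------------------------+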