Return Selection Objects¶
https://blueprints.launchpad.net/nova/+spec/return-selection-objects
In Queens, we will be changing what we return from select_destinations() in order to both provide additional ‘alternate’ hosts for each requested instance and also the allocation_request for building on each host. Returning this as an unstructured chunk of data will be fragile and potentially confusing. It would be far better to create an object to hold this data and make it accessible in a simpler and documented way.
Problem description¶
Before Queens, the scheduler’s select_destinations() method returned a list, containing a dictionary representing the selected host for each requested instance. In Queens, we need to return much more information to the caller of select_destinations(). We could attempt to return a list of HostDicts, which represents the selected host as in the past, along with zero or more ‘alternate’ hosts that are in the same cell and also meet the requested resources. Additionally, each of these will also be accompanied by a dictionary for the allocation_request required to claim that host. The end result will be a list, with one item per requested instance. Each item in that list will be a list of 2-tuples of (HostState, allocation_request). The HostState is a simple dict, but the allocation_request is itself a complex nested dict.
The result of these changes would mean that the data returned would be a complex nested combination of dictionaries, lists, and tuples. This data structure would be both difficult to understand how to use correctly, and confusing to developers looking at the code for the first time (or even after a period of being away from it). It is also unversioned, meaning it is impossible to track and respond to future changes in a reliable manner.
Use Cases¶
As an experienced Nova developer, I would like to be able to write code that uses the information returned from select_destinations() without having to decipher a complex data structure.
As a newcomer to the Nova codebase, I would like to be able to read code that is clear so that I can work with it quicker and with more confidence that my changes won’t break something.
Proposed change¶
We propose to create a new Selection object that would contain the data that represents a single destination: both the host information as well as the corresponding allocation_request needed for claiming. The host information, which is currently in a dictionary containing hostname, nodename, and limits keys, will now be stored as simple object fields along with the allocation_request. Additionally, the compute_node_uuid field will be added, as it would be useful to have this available in some of our allocation cleanup tasks.
There is no need for a corresponding SelectionList object, as there is no need for DB creation or retrieval. The select_destinations() method will return simple Python lists of Selection objects. The Scheduler will return one list of Selection objects for each requested instance, representing the selected host as well as any alternates.
Alternatives¶
We could cache the allocation_request data in placement, and simply return a key along with the resource providers. When a claim needs to be made, the key would be POSTed instead of the full allocation_request data, and Placement would use the cached data to carry out the claim. This has the advantage that nothing on the Nova side of things ever uses the data in the allocation_request; to Nova, it’s an opaque blob. The downside, of course, is that Placement would have to handle the cache.
We could return the full allocation_request data to the scheduler, and then handle the caching and key management on the Nova side. When a claim/unclaim is needed, the allocation_request would be retrieved from this cache and POSTed to placement. This alternative doesn’t require any changes to placement, but requires that both the API-level cell and all local cells have access to some form of ‘global ram’ cache that is accessible across cells.
We could just return an unstructured bunch of Python data, and add a ton of comments everywhere it is used in the hope that anyone looking at the code would understand what each bit represents, and that every future change to the data required would be able to be handled without versioning.
Data model impact¶
There will be no changes to any database schemas, but this will introduce a new versioned object. This object will contain the following fields, along with their types:
* compute_node_uuid: fields.UUIDField
* service_host: fields.StringField
* nodename: fields.StringField
* cell_uuid: fields.UUIDField
* numa_limits: fields.ObjectField("NUMATopologyLimits")
* allocation_request: fields.StringField
There isn’t a good field type for the allocation_request value, as it is a complex nested structure, so instead we’ll store it as its JSON respresentation in a StringField. The structure of an allocation_request, as described in this spec[2], looks like:
"allocations": [
{
"resource_provider": {
"uuid": $COMPUTE_NODE1_UUID
},
"resources": {
"VCPU": $AMOUNT_REQUESTED_VCPU,
"MEMORY_MB": $AMOUNT_REQUESTED_MEMORY_MB
}
},
{
"resource_provider": {
"uuid": $SHARED_STORAGE_UUID
},
"resources": {
"DISK_GB": $AMOUNT_REQUESTED_DISK_GB
}
},
]
REST API impact¶
None
Security impact¶
None
Notifications impact¶
None
Other end user impact¶
None
Performance Impact¶
None
Other deployer impact¶
None
Developer impact¶
It will make life a little easier for anyone working with the Nova codebase by not making them decipher complex data structures, but other than that, none.
Implementation¶
Assignee(s)¶
- Primary assignee:
ed-leafe
- Other contributors:
None
Work Items¶
Create the Selection object.
Modify the scheduler’s select_destinations() method to populate these objects with the selected host info and return them.
Dependencies¶
None
Testing¶
This is one part of the overall sweeping changes being made in Queens, and all of it will have to be tested. The Selection object will need some basic tests, but the bulk of the testing will be in the conductor to verify that it is working with Selection objects for host selection, resource claiming, and retries on build failures.
Documentation Impact¶
The developer reference docs will need to be updated to document this new object. The docs for the scheduler workflow will also need to be updated to reflect these changes.
References¶
The initial problem was documented in a blog post[0], and was then discussed at the Nova Scheduler subteam meeting[1], where this approach was agreed upon.
[0] https://blog.leafe.com/handling-unstructured-data/ [1] http://eavesdrop.openstack.org/meetings/nova_scheduler/2017/nova_scheduler.2017-08-28-14.00.log.html#l-140 [2] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/placement-allocation-requests.html
History¶
Release Name |
Description |
---|---|
Queens |
Introduced |