Isolate Scheduler Database for Aggregates
We want to split nova-scheduler out into gantt. This blueprint is the second stage of that effort, after the scheduler-lib split, although the two blueprints are independent of each other.
In this blueprint, we isolate all database accesses made by the scheduler and refactor the code (manager, filters, weighers) so that the scheduler only accesses scheduler-related tables or resources internally.
Note: this spec only targets changes to the aggregate-related filters.
When making decisions involving information about an aggregate, the scheduler accesses the Nova DB’s aggregates table either directly or indirectly via nova.objects.AggregateList. For the scheduler split to be clean, any access by the Nova scheduler to tables that will stay in the Nova DB (i.e. the aggregates table) must be refactored so that the scheduler exposes an API method allowing nova-conductor or other services to update the scheduler’s view of aggregate information.
The filters impacted by this proposal are:
AggregateCoreFilter (calls n.objects.aggregate.AggregateList.get_by_host)
AggregateRamFilter (calls n.objects.aggregate.AggregateList.get_by_host)
AggregateTypeAffinityFilter (calls n.objects.aggregate.AggregateList.get_by_host)
Use cases: N/A, this is a refactoring effort.
This blueprint is part of the ‘scheduler’ refactoring effort identified as a priority for Kilo.
The strategy consists of updating the scheduler each time an aggregate changes (a host is added or removed, or its metadata is modified).
Because the current scheduler design scales with the number of requests (for each request, a new HostState object is generated using the get_all_host_states method of the HostManager module), we can hardly ask the scheduler to update a DB each time a compute node joins an aggregate. Doing so would introduce a new paradigm in which the scheduler scales with the number of compute nodes added to aggregates, and it could create race conditions.
Instead, we propose to create an in-memory view of all the aggregates in the
scheduler, populated at scheduler startup by calling the Nova Aggregates API,
and to let the filters access these objects instead of indirectly querying
the Nova aggregates DB table themselves.
Updates to aggregates made through
nova.compute.api.AggregateAPI will also call the Scheduler RPC API to ask
the scheduler to update the relevant view.
The main concern is the duplication of aggregate information and the potential race conditions that can result. In our opinion, duplicating the information in scheduler memory is a small price to pay for ensuring that the scheduler can one day live on its own.
An alternative, if duplication is considered unacceptable, would be for the scheduler to fully own the aggregates table. All calls in nova.compute.api.AggregateAPI would then be treated as “external” calls, and once the scheduler is split out, aggregates would no longer reside in Nova.
Another mid-term approach would be to envisage a second scheduler service (such as nova-scheduler-updater; we are still very bad at naming…) that would accept RPC API calls and write to the scheduler DB separately from the nova-scheduler service, which would then be treated as a “nova-api”-like frontend. This approach would make sense if the scheduler warm-up period needed to populate the relevant HostState information turned out to be problematic, in which case we might prefer to persist all these objects in the scheduler DB.
Finally, we are firmly against calling the Aggregates API from the scheduler each time a filter needs information, because that does not scale: it means a remote call for every host evaluated by every such filter on every request.
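To make the scaling argument concrete, here is a back-of-the-envelope count. It assumes H hosts are evaluated per request, F of the aggregate filters listed above are enabled, each enabled filter performs one AggregateList.get_by_host lookup per host, and the scheduler handles R requests per second. The numbers are illustrative, not measurements.

```latex
\[
N_{\text{lookups}} = H \times F \times R,
\qquad
H = 1000,\; F = 3,\; R = 10\ \text{s}^{-1}
\;\Rightarrow\;
N_{\text{lookups}} = 30\,000\ \text{s}^{-1}
\]
```

With the in-memory view proposed here, the same lookups become memory reads, and remote traffic shrinks to one fanout cast per aggregate change, independent of H, F and R.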
Data model impact
None; we only create an in-memory object, which is not persisted.
REST API impact
None. The atomicity of the operation (adding or modifying an aggregate) is unchanged; we do not want to add two notifications for the same operation.
Other end user impact
Filters will read an in-memory object instead of querying the DB, so we expect faster access times and better scalability.
Other deployer impact
Filters should no longer call any code outside the scheduler. This will be done by modifying the scheduler component to proxy conductor calls through a singleton that refuses anything but scheduler-related objects. See footnote  for an example. As said above, we will still provide a fallback mode for the Kilo release in order to remain compatible with the N-1 release.
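The singleton described above could look roughly like the following minimal Python sketch. Everything in it is hypothetical: the class name SchedulerObjectProxy, its call() method, the allowlist contents and the fake_conductor backend are illustrative stand-ins rather than existing nova code, and the Kilo fallback mode is not modeled.

```python
import threading


class NonSchedulerObjectError(Exception):
    """Raised when scheduler code asks for a non-scheduler object."""


class SchedulerObjectProxy:
    """Hypothetical process-wide singleton through which scheduler code
    reaches conductor; only allowlisted object types are forwarded."""

    # Hypothetical allowlist; the actual set is not defined by this spec.
    ALLOWED = frozenset({"ComputeNode", "ComputeNodeList"})

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, backend=None):
        # The first call creates the instance; later calls return it unchanged.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.backend = backend
                cls._instance = instance
        return cls._instance

    def call(self, objname, method, *args, **kwargs):
        # Refuse anything that is not a scheduler-related object.
        if objname not in self.ALLOWED:
            raise NonSchedulerObjectError(
                "%s is not a scheduler-related object" % objname)
        return self.backend(objname, method, *args, **kwargs)


def fake_conductor(objname, method, *args, **kwargs):
    # Stand-in for the real conductor round trip.
    return "%s.%s(%s)" % (objname, method, ", ".join(map(str, args)))


proxy = SchedulerObjectProxy(fake_conductor)
assert proxy is SchedulerObjectProxy()  # same instance every time
print(proxy.call("ComputeNodeList", "get_all"))  # ComputeNodeList.get_all()
try:
    proxy.call("AggregateList", "get_by_host", "host1")
except NonSchedulerObjectError as exc:
    print(exc)  # AggregateList is not a scheduler-related object
```

Putting the check in one shared object keeps the policy in a single place, so tightening it once the fallback period ends is a one-line change to the allowlist.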
Here, we propose to store the collection returned by
nova.objects.AggregateList.get_all() as an attribute of
nova.scheduler.host_manager.HostManager during its initialization.
To access the list of aggregates that a host belongs to, we plan
to add a list of references to the corresponding Aggregate objects as an
extra attribute of
nova.scheduler.host_manager.HostState during that initialization.
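A minimal sketch of this startup population, using plain dataclasses instead of nova objects so that it runs on its own. The Aggregate and HostState classes are simplified stand-ins, the get_all_aggregates callable is a hypothetical substitute for nova.objects.AggregateList.get_all(), and attaching aggregates while building host states is a simplification of where HostManager would actually do it.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""
    id: int
    name: str
    hosts: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class HostState:
    """Trimmed-down HostState: only the fields this sketch needs."""
    host: str
    aggregates: list = field(default_factory=list)


class HostManager:
    def __init__(self, get_all_aggregates):
        # get_all_aggregates stands in for nova.objects.AggregateList.get_all();
        # it is called once, when the scheduler starts.
        self.aggregates = {agg.id: agg for agg in get_all_aggregates()}

    def _aggregates_for_host(self, host):
        # References to the cached Aggregate objects, not copies.
        return [agg for agg in self.aggregates.values() if host in agg.hosts]

    def get_all_host_states(self, hosts):
        # Simplified: one HostState per host name, with its aggregates attached.
        return [HostState(host, self._aggregates_for_host(host)) for host in hosts]


aggs = [Aggregate(1, "ssd", ["host1", "host2"], {"ssd": "true"}),
        Aggregate(2, "gpu", ["host2"], {"gpu": "true"})]
manager = HostManager(lambda: aggs)
for state in manager.get_all_host_states(["host1", "host2", "host3"]):
    print(state.host, [agg.name for agg in state.aggregates])
# host1 ['ssd']
# host2 ['ssd', 'gpu']
# host3 []
```

Because each HostState holds references to the same objects cached in HostManager.aggregates, an in-place update to a cached aggregate is visible from every host that belongs to it.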
The second phase would provide updates to that cache by adding a new update_aggregate() method to the Scheduler RPC API, which nova.scheduler.client would also expose.
The update_aggregate() method would take a single argument, a
nova.objects.Aggregate object, and would update the
HostManager.aggregates attribute accordingly so that the HostState.aggregates
references would implicitly be updated.
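One way to get this implicit update is to mutate the cached object in place rather than replace it, so that lists already holding references to it see the change. The sketch below assumes that design; the classes are simplified stand-ins, not nova's actual HostManager, and aggregates_for_host() is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""
    id: int
    name: str
    hosts: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class HostManager:
    """Trimmed-down HostManager holding the in-memory aggregate view."""

    def __init__(self, aggregates):
        self.aggregates = {agg.id: agg for agg in aggregates}

    def aggregates_for_host(self, host):
        # Returns references to the cached objects, as HostState.aggregates would.
        return [agg for agg in self.aggregates.values() if host in agg.hosts]

    def update_aggregate(self, aggregate):
        cached = self.aggregates.get(aggregate.id)
        if cached is None:
            # Aggregate not seen before: start caching it.
            self.aggregates[aggregate.id] = aggregate
            return
        # Update the cached object in place so every list already holding a
        # reference to it sees the new hosts and metadata.
        cached.name = aggregate.name
        cached.hosts = list(aggregate.hosts)
        cached.metadata = dict(aggregate.metadata)


manager = HostManager([Aggregate(1, "agg1", ["host1"], {"cpu_allocation_ratio": "2.0"})])
host1_aggs = manager.aggregates_for_host("host1")  # held reference
manager.update_aggregate(
    Aggregate(1, "agg1", ["host1", "host2"], {"cpu_allocation_ratio": "4.0"}))
print(host1_aggs[0].metadata["cpu_allocation_ratio"])  # 4.0
print([agg.name for agg in manager.aggregates_for_host("host2")])  # ['agg1']
```

In-place mutation keeps metadata fresh for existing references, but a host newly added to an aggregate only sees it once its per-host list is recomputed (here, on the next aggregates_for_host() call). Deleting an aggregate would need a separate path, which this sketch does not cover.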
Every time an aggregate is updated, we would hook each method of the existing nova.compute.api.AggregateAPI class to make an additional call to nova.scheduler.client, which would fan the call out over RPC to all nova-scheduler services.
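The hook can be sketched as a decorator applied to each mutating AggregateAPI method. This is a hypothetical, in-process model: the method names follow nova.compute.api.AggregateAPI but their signatures are simplified, and FanoutSchedulerClient loops over local objects where the real client would presumably issue an oslo.messaging cast on an RPC client prepared with fanout=True.

```python
import copy
import functools
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""
    id: int
    name: str
    hosts: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class RecordingScheduler:
    """Stands in for one nova-scheduler service's in-memory view."""

    def __init__(self):
        self.aggregates = {}

    def update_aggregate(self, aggregate):
        self.aggregates[aggregate.id] = aggregate


class FanoutSchedulerClient:
    """In-process stand-in for nova.scheduler.client: delivers each call to
    every scheduler, mimicking an RPC fanout cast."""

    def __init__(self, schedulers):
        self.schedulers = schedulers

    def update_aggregate(self, aggregate):
        for scheduler in self.schedulers:
            # A real RPC cast serializes the object, so each service gets its
            # own copy; deepcopy mimics that here.
            scheduler.update_aggregate(copy.deepcopy(aggregate))


def notify_scheduler(method):
    """Forward the aggregate returned by an API method to the scheduler."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        aggregate = method(self, *args, **kwargs)
        self.scheduler_client.update_aggregate(aggregate)
        return aggregate
    return wrapper


class AggregateAPI:
    """Simplified, hypothetical version of nova.compute.api.AggregateAPI."""

    def __init__(self, scheduler_client):
        self.scheduler_client = scheduler_client
        self._db = {}  # stands in for the Nova aggregates table

    @notify_scheduler
    def create_aggregate(self, agg_id, name):
        self._db[agg_id] = Aggregate(agg_id, name)
        return self._db[agg_id]

    @notify_scheduler
    def add_host_to_aggregate(self, agg_id, host):
        self._db[agg_id].hosts.append(host)
        return self._db[agg_id]

    @notify_scheduler
    def update_aggregate_metadata(self, agg_id, metadata):
        self._db[agg_id].metadata.update(metadata)
        return self._db[agg_id]


schedulers = [RecordingScheduler(), RecordingScheduler()]
api = AggregateAPI(FanoutSchedulerClient(schedulers))
api.create_aggregate(1, "ssd")
api.add_host_to_aggregate(1, "host1")
api.update_aggregate_metadata(1, {"ssd": "true"})
view = schedulers[1].aggregates[1]
print(view.hosts, view.metadata)  # ['host1'] {'ssd': 'true'}
```

A decorator keeps the notification next to each API method without changing what the method returns, so the atomicity of the operation itself is untouched.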
Once all of that is done, filters would just have to look into HostState.aggregates to access all the information (including metadata) about the aggregates the host belongs to.
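As an illustration, a filter modeled loosely on AggregateTypeAffinityFilter could read aggregate metadata from HostState.aggregates without touching the DB. The aggregate_values() helper and the simplified host_passes(host_state, flavor_name) signature are assumptions for this sketch; nova's real filters receive filter_properties rather than a flavor name.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for nova.objects.Aggregate."""
    id: int
    name: str
    hosts: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class HostState:
    """Trimmed-down HostState carrying the in-memory aggregate references."""
    host: str
    aggregates: list = field(default_factory=list)


def aggregate_values(host_state, key):
    """Collect the values of metadata `key` across the host's aggregates,
    using memory only (no DB or conductor call)."""
    return {agg.metadata[key] for agg in host_state.aggregates if key in agg.metadata}


class AggregateTypeAffinityFilter:
    """Simplified filter: if any of the host's aggregates sets
    'instance_type', the flavor name must appear in one of those
    comma-separated values; hosts without that metadata always pass."""

    def host_passes(self, host_state, flavor_name):
        values = aggregate_values(host_state, "instance_type")
        if not values:
            return True
        return any(flavor_name in [v.strip() for v in value.split(",")]
                   for value in values)


agg = Aggregate(1, "small-only", ["host1"], {"instance_type": "m1.tiny, m1.small"})
f = AggregateTypeAffinityFilter()
print(f.host_passes(HostState("host1", [agg]), "m1.small"))  # True
print(f.host_passes(HostState("host1", [agg]), "m1.large"))  # False
print(f.host_passes(HostState("host2", []), "m1.large"))  # True
```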
- Primary assignee:
- Other contributors:
Instantiate HostManager.aggregates and HostState.aggregates when the scheduler starts
Add an update_aggregate() method to the Scheduler RPC API and bump the RPC API version
Create a nova.scheduler.client method for update_aggregate()
Modify the nova.compute.api.AggregateAPI methods to call the scheduler client method
Modify the filters so they look up aggregate information in HostState.aggregates
Modify the scheduler entry point to block conductor accesses to aggregates (once development of the Lxxx release opens)
Covered by existing tempest tests and CIs.