Support disabling a cell
It would be useful to have a mechanism that completely stops scheduling to a particular cell or group of cells by supporting the concept of disabled cells. Since there is currently no way to disable a cell, this spec proposes a simple solution to add this feature to nova.
Currently there are a number of ways to pre-select the cells into which we want VMs to be scheduled, such as host aggregates or scheduler filters. However, these mechanisms work by whitelisting and selecting suitable hosts, which only indirectly pre-selects a cell. So although there are ways to remove undesired hosts from scheduling, large deployments may not always want to work at the host level. If they just want to stop scheduling to a set of cells, they currently have to exclude every host in those cells from being considered by the scheduler, since there is no way to simply blacklist that set of cells.
The problem this spec addresses is that there is no elegant way to block scheduling to a group of cells.
As an operator, I wish to disable a group of cells (like during failures or interventions when new instances should not be spawned) so as to stop scheduling to them without having to deal with the individual compute nodes (micromanagement).
This spec proposes a change to the nova_api.cell_mappings table schema and a new field on the CellMapping object, through which the host_manager of the scheduler will become aware of which cells are disabled. The host_manager will thereby skip querying the compute nodes and services that belong to disabled cells while getting the host states of the hosts returned by placement to the scheduler. The detailed implementation procedure is explained below:
Add a new column disabled to the nova_api.cell_mappings table, which can be set to either True or False for each record. Setting it to True means the cell is disabled; by default the value will be False.
Add a new field disabled to the CellMapping object, which will represent the value of the newly added column in the cell_mappings table.
Add a query method to the CellMappingList object that returns only the enabled cells.
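The new object field and query method can be sketched as follows. This is a minimal illustration under stated assumptions: CellMapping and CellMappingList here are plain Python stand-ins, not nova's versioned objects, and the method name get_enabled is hypothetical. A real query method would also take a request context and filter in the database rather than in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellMapping:
    """Simplified stand-in for nova's CellMapping object (only a few fields shown)."""
    uuid: str
    name: str
    # New field mirroring the proposed cell_mappings.disabled column; defaults to enabled.
    disabled: bool = False


class CellMappingList:
    """Simplified stand-in for nova's CellMappingList object."""

    def __init__(self, objects: List[CellMapping]):
        self.objects = objects

    @classmethod
    def get_enabled(cls, all_cells: List[CellMapping]) -> "CellMappingList":
        """Hypothetical query method returning only cells whose disabled flag is False.

        A real implementation would filter in the database query itself
        (roughly WHERE disabled = false) instead of filtering in Python.
        """
        return cls([cell for cell in all_cells if not cell.disabled])


if __name__ == "__main__":
    cells = [CellMapping("uuid-1", "cell1"), CellMapping("uuid-2", "cell2", disabled=True)]
    print([c.name for c in CellMappingList.get_enabled(cells).objects])  # ['cell1']
```

Defaulting the field to False keeps every existing cell enabled, which matches the column default described above.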
Presently the scheduler calls the host_manager's get_host_states_by_uuids function, and the host_manager queries for the compute_nodes and services in the cells by calling _get_computes_for_cells from get_host_states_by_uuids. While loading the cells in get_host_states_by_uuids, the disabled cells will be filtered out by using the new query added to CellMappingList, and only the enabled cells will be passed to _get_computes_for_cells. Hence only the states of hosts in enabled cells will be passed back to the filter scheduler, so no scheduling happens to the disabled cells.
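The filtering step can be sketched with a toy host manager. This is a hedged illustration only: HostManagerSketch, Cell, and the computes_by_cell inventory are hypothetical, and the method signatures are simplified (the real nova methods take a context and more arguments and return HostState objects, not lists of node UUIDs).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Cell:
    uuid: str
    disabled: bool = False


@dataclass
class HostManagerSketch:
    """Hypothetical, heavily simplified host manager used only to show the filtering step."""
    cells: List[Cell]
    # Hypothetical inventory: cell uuid -> compute node uuids living in that cell.
    computes_by_cell: Dict[str, List[str]] = field(default_factory=dict)

    def _load_enabled_cells(self) -> List[Cell]:
        # Stand-in for the new CellMappingList query that returns only enabled cells.
        return [cell for cell in self.cells if not cell.disabled]

    def _get_computes_for_cells(
        self, cells: List[Cell], compute_uuids: Iterable[str]
    ) -> Dict[str, List[str]]:
        # Only the cells passed in are searched, so disabled cells are never queried.
        wanted = set(compute_uuids)
        return {
            cell.uuid: [u for u in self.computes_by_cell.get(cell.uuid, []) if u in wanted]
            for cell in cells
        }

    def get_host_states_by_uuids(self, compute_uuids: Iterable[str]) -> Dict[str, List[str]]:
        cells = self._load_enabled_cells()
        return self._get_computes_for_cells(cells, compute_uuids)
```

For example, with an enabled cell "a" holding node "n1" and a disabled cell "b" holding node "n2", get_host_states_by_uuids(["n1", "n2"]) returns only {"a": ["n1"]}, even though placement returned both nodes.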
Since the list of cells is currently cached globally (for better performance), this cache will be refreshed after every enabling or disabling of a cell so that the change is reflected. The refresh will be done using a SIGHUP handler created in the scheduler, and this signal will be sent to the scheduler when the disabled column is changed.
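A cache refreshed on SIGHUP could look roughly like the sketch below. The CellCache class and its loader callable are hypothetical; the loader stands in for the enabled-cells query. nova services run on oslo.service, which has its own SIGHUP handling, so a real implementation may hook into that instead of calling signal.signal directly.

```python
import signal
from typing import Callable, List


class CellCache:
    """Hypothetical global cache of enabled cells that is reloaded on SIGHUP."""

    def __init__(self, loader: Callable[[], List[str]]):
        # loader stands in for the query that reads enabled cells from the API DB.
        self._loader = loader
        self.cells = loader()

    def refresh(self) -> None:
        # Re-read the enabled cells so enable/disable changes take effect.
        self.cells = self._loader()

    def handle_sighup(self, signum, frame) -> None:
        # Signal handlers receive (signum, frame); only a reload is needed here.
        self.refresh()

    def install(self) -> None:
        # signal.signal must be called from the main thread; SIGHUP is POSIX-only.
        signal.signal(signal.SIGHUP, self.handle_sighup)
```

After install() is called in the scheduler process, an operator could trigger a reload with kill -HUP on the scheduler's PID once a cell has been enabled or disabled.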
Since operators already have the nova-manage utility, the existing nova-manage command for updating fields in the cell_mappings table can be reused as follows, allowing the operator to enable or disable a cell.
Add new flags to the nova-manage cell_v2 update_cell command:
nova-manage cell_v2 update_cell --cell_uuid <cell_uuid> [--disable]
which will disable an enabled cell, that is, set the disabled field of this cell's cell_mapping record in the API DB to 1.
nova-manage cell_v2 update_cell --cell_uuid <cell_uuid> [--enable]
which will enable a disabled cell, that is, set the disabled field of this cell's cell_mapping record back to 0.
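How the two flags map onto the disabled column can be sketched with argparse. This is illustrative only: nova-manage registers its options with its own decorators, and build_update_cell_parser and disabled_value are hypothetical helpers. The sketch makes --disable and --enable mutually exclusive, which is an assumption the spec does not state explicitly.

```python
import argparse
from typing import Optional


def build_update_cell_parser() -> argparse.ArgumentParser:
    # Illustrative parser mirroring the proposed flags, not nova-manage's real code.
    parser = argparse.ArgumentParser(prog="nova-manage cell_v2 update_cell")
    parser.add_argument("--cell_uuid", required=True, help="UUID of the cell to update")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--disable", action="store_true", help="set cell_mappings.disabled to 1")
    group.add_argument("--enable", action="store_true", help="set cell_mappings.disabled to 0")
    return parser


def disabled_value(args: argparse.Namespace) -> Optional[bool]:
    """Return the new value for the disabled column, or None to leave it unchanged."""
    if args.disable:
        return True
    if args.enable:
        return False
    return None


if __name__ == "__main__":
    parsed = build_update_cell_parser().parse_args(["--cell_uuid", "abc", "--disable"])
    print(disabled_value(parsed))  # True
```

Returning None when neither flag is given lets update_cell keep handling its other fields without touching the disabled column.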
When a new cell is created it will be enabled by default; however, a disabled option will be added to the nova-manage cell_v2 create_cell command so that users can create pre-disabled cells, which can be enabled later whenever needed.
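Creating a pre-disabled cell could be sketched as below, assuming a --disabled store_true flag and a simplified cell_mappings table in an in-memory SQLite database. The helper names, the --name flag as written here, and the table layout are illustrative assumptions, not nova's actual create_cell implementation.

```python
import argparse
import sqlite3


def build_create_cell_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; nova-manage registers its options differently.
    parser = argparse.ArgumentParser(prog="nova-manage cell_v2 create_cell")
    parser.add_argument("--name", help="name of the new cell")
    # Cells are created enabled unless this flag is passed.
    parser.add_argument("--disabled", action="store_true", help="create the cell disabled")
    return parser


def create_cell(conn: sqlite3.Connection, uuid: str, name: str, disabled: bool = False) -> None:
    # Store the boolean as 0/1, matching a column default of 0 (enabled).
    conn.execute(
        "INSERT INTO cell_mappings (uuid, name, disabled) VALUES (?, ?, ?)",
        (uuid, name, int(disabled)),
    )
```

With this shape, omitting the flag yields disabled=False, so the default behaviour of create_cell is unchanged.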
The disabled column will also be added to the columns displayed by the nova-manage cell_v2 list_cells command, since it will be useful for operators.
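A minimal sketch of the extra column in the list_cells output follows. The format_cells helper is hypothetical and uses plain string formatting; the real command prints more columns (such as transport URL and database connection), which are omitted here.

```python
from typing import List, Tuple


def format_cells(rows: List[Tuple[str, str, bool]]) -> str:
    """Render (name, uuid, disabled) rows as a text table that includes a Disabled column."""
    headers = ("Name", "UUID", "Disabled")
    table = [headers] + [(name, uuid, str(disabled)) for name, uuid, disabled in rows]
    # Pad each column to its widest entry so the columns line up.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in table]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_cells([("cell1", "uuid-1", False), ("cell2", "uuid-2", True)]))
```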
The scope of this spec is limited to the filter scheduler, since that is the maintained scheduler. Also note that this spec only focuses on stopping new scheduling to disabled cells and does not block any user operations, such as resizing, on existing VMs in disabled cells. For example, even if RequestSpec.requested_destination.cell is set to a disabled cell, the operation will not be blocked.
This could also be implemented as a post-placement filter in the scheduler, enabled through a boolean config option, that filters out disabled cells. However, since that approach would still need a new field in cell_mappings, it is better integrated to implement this through a simple change to the query in the host_manager.
Another alternative would be to loop through all the compute services in a cell and enable or disable them, but this may not be ideal for cells with a large number of computes.
Data model impact
A nova_api DB schema change will be required to add the disabled column of type Boolean to the nova_api.cell_mappings table. An api_migration will be required. This column will be set to False by default.
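The effect of the schema change can be illustrated with SQLite from the standard library. This is only a stand-in: the real api_migration would be written against nova's own migration framework for the API database, and the simplified cell_mappings table below has a hypothetical subset of columns.

```python
import sqlite3


def add_disabled_column(conn: sqlite3.Connection) -> None:
    """Add the boolean disabled column; existing and new rows default to False (0)."""
    # SQLite has no native BOOLEAN storage class; it stores the value as 0/1.
    conn.execute(
        "ALTER TABLE cell_mappings ADD COLUMN disabled BOOLEAN NOT NULL DEFAULT 0"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for nova_api.cell_mappings (hypothetical subset of columns).
    conn.execute("CREATE TABLE cell_mappings (id INTEGER PRIMARY KEY, uuid TEXT, name TEXT)")
    conn.execute("INSERT INTO cell_mappings (uuid, name) VALUES ('c1', 'cell1')")
    add_disabled_column(conn)
    print(conn.execute("SELECT name, disabled FROM cell_mappings").fetchall())  # [('cell1', 0)]
```

The NOT NULL DEFAULT 0 clause is what leaves every pre-existing cell enabled after the migration, so scheduling behaves exactly as before until an operator disables a cell.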
The CellMapping object will need to gain a new field called disabled.
REST API impact
Other end user impact
Users will gain two new options, disable and enable, for the existing nova-manage cell_v2 update_cell command, plus a new option disabled for the existing nova-manage cell_v2 create_cell command. The documentation will be updated accordingly.
There will not be any major impact on performance. Instead of the scheduler querying for all the cells to get the host states it will query for only enabled cells.
Other deployer impact
There will not be any impact on deployer operations, since by default all cells will be enabled and scheduling will work normally. Supporting cell disabling will only make operations more agile, since the deployer can block scheduling to a group of cells rather than micromanaging services, that is, individually tending to each service in those cells by filtering them out or disabling each compute service in the cell.
Since there will be a change in the API DB schema, the API DB sync command will have to be run to update the cell_mappings table.
- Primary assignee:
- Other contributors:
- Add a new column disabled to the nova_api.cell_mappings table.
- Add a new field disabled to the CellMapping object.
- Add a query method to CellMappingList to obtain the cell mapping records of all enabled cells.
- Change how the host_manager queries for host states so that it only queries the enabled cells, and add a SIGHUP handler.
- Add the new flags to the nova-manage cell_v2 update_cell command.
- Add the new flag to the nova-manage cell_v2 create_cell command.
- Modify the nova-manage cell_v2 list_cells command to print the new column.
- Add unit and functional tests verifying the disabling mechanism.
The nova-manage user documentation in the nova-manage.rst file will be updated to describe the new flags for the nova-manage cell_v2 update_cell and nova-manage cell_v2 create_cell commands.