Nova Server Count API Extension

https://blueprints.launchpad.net/nova/+spec/server-count-api

This blueprint proposes a new REST API extension that returns the number of servers that match the specified search criteria.

Problem description

No current API retrieves summary count data for servers that match a variety of search filters; for example, the total number of servers in a given state.

Retrieving all servers and then manually determining the count data does not scale because pagination queries must be implemented (see Alternatives section for a detailed explanation).

The use cases that are driving this API extension are derived from a user’s experience in a GUI.

Use Case 1: A UI dashboard that shows a cloud administrator the servers in various states. A new API extension is needed to retrieve the server count data associated with various filters (i.e., servers in active state, servers in building state, servers in error state, etc.) for the entire cloud.

Assume that you have 5k instances in your cloud. The admin wants to see a summary of instances in each state; this API extension will help them quickly determine whether there is an issue that needs attention, for example, many instances in ‘error’. It is likely that once the admin sees this count they will drill down into the data. However, without this new API extension, the admin cannot know whether an unacceptable number of systems are in a given state without drilling down into each set.

From a deployer’s perspective, creating this dashboard with the existing APIs is very painful since pagination is required (assume more than the default of 1k items). Also, the processing time to get this data using the existing APIs (even the non-detailed one) is slow (and the result is possibly inaccurate; see Use Case 3) compared to the processing time to get and return a single number.

Use Case 2: Showing filtered data in a table in the UI. Assume that the UI supports tables that show filtered data (i.e., a table showing only instances in ‘error’ state) and uses pagination to get the data. Many users do not like “infinite scrolling”, where they have no idea how many items are really in the list (more just show up as you scroll down or navigate to the next page). Using this new count API, the UI table can indicate how many total items are in the list (i.e., showing 1-20 of 1000).

Assume that you have 500 instances in error state and that you can open a UI table showing their details. When creating the table, assume that the UI uses a page size of 100 and that there is no dashboard showing the ‘error’ count. In this case, the admin logs into the UI and wants to know how many servers are in error state. To find out, the admin navigates to the ‘servers in error state’ table; the UI only retrieves the first 100 items, so it is impossible to know whether there are 101 total items or 500 total items. As an admin, I would like to know the total number of items in the table.

Use Case 3: Inherent timing window when adding a new item with limit/marker processing. Assume that you are using pagination to iterate over the data to get a count. When you are getting page n, it is possible that page n-1 has a new item x that was just added. Due to the sorting of the data, limit/marker will not detect that this new item was added.

While this timing window is small, it does exist, so a count obtained this way is not guaranteed to be accurate.

One could argue that the count API does not fully handle this UI use case either. However, the count will always be accurate from the DB at the time that the count() function was processed; the same claim cannot be made about getting the count using limit/marker, since multiple DB calls are invoked to calculate the number.
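The timing window in Use Case 3 can be illustrated with a small in-memory simulation. This is only a sketch: the helper names (`list_page`, `count_by_paging`, `add_early_item`) and the choice of sorting by name are assumptions made for the example and are not Nova code.

```python
def list_page(servers, limit, marker=None):
    """Return up to `limit` names that sort after `marker` (limit/marker paging)."""
    ordered = sorted(servers)
    if marker is not None:
        ordered = [name for name in ordered if name > marker]
    return ordered[:limit]


def count_by_paging(servers, limit, on_page=None):
    """Count items by walking every page.

    `on_page` is a hypothetical hook that simulates a concurrent write
    happening between two page requests.
    """
    total, marker = 0, None
    while True:
        page = list_page(servers, limit, marker)
        if not page:
            return total
        total += len(page)
        marker = page[-1]
        if on_page is not None:
            on_page(servers)


def add_early_item(current):
    # A new server that sorts before the current marker appears mid-iteration.
    current.add("vm-a")


servers = {"vm-b", "vm-c", "vm-d", "vm-e"}
paged_count = count_by_paging(servers, limit=2, on_page=add_early_item)
single_query_count = len(servers)  # stands in for one count() call afterwards
```

In this run the paged walk never sees "vm-a" because it sorts before the marker, so the paged total is one lower than a single count taken afterwards.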

Proposed change

The new count API extension must accept the same filter values as the existing /servers and /servers/detail APIs and re-use the existing filter processing (once the common parts are refactored into utility methods that can be utilized by both paths). Once the filters are processed to create the query object, the number of matching servers will be retrieved from the database and returned.

The count API extension will be both per tenant and global (admin-only), similar to the existing /servers APIs. An admin can supply the ‘all_tenants’ parameter to signify that server count data should be retrieved globally.

This new flow requires new functions to retrieve the count value in the compute API layer, in the instance layer, and in the database layers; all functions return an integer value. The naming conventions for the functions will follow the existing functions used for retrieving server instances, for example:

  • Compute API: get_count function

  • Instance layer (InstanceList class): get_count_by_filters function

  • DB layer: instance_count_by_filters function

  • Sqlalchemy layer: instance_count_by_filters function

In the sqlalchemy DB layer, the filter processing (for processing exact name filters, regex filters, and tag filters) needs to be moved into a common function so that both the new count API extension and the existing get servers APIs can utilize it. Once the query object is created, then the count() function is invoked to retrieve the total number of matching servers for the given query.
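The idea of one shared filter routine feeding both the list path and the count path can be sketched as follows. This is a simplified stand-in that uses the standard-library sqlite3 module rather than Nova's SQLAlchemy code; the table layout, the `build_filter_clause` helper, and the exact-match-only filters are assumptions made for illustration, and only `instance_count_by_filters` takes its name from this proposal.

```python
import sqlite3


def build_filter_clause(filters):
    """Shared helper: turn exact-match filters into a WHERE clause and params."""
    allowed = ("status", "tenant_id")  # hypothetical set of filterable columns
    clauses, params = [], []
    for column in allowed:
        if column in filters:
            clauses.append(f"{column} = ?")
            params.append(filters[column])
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


def instance_get_all_by_filters(conn, filters):
    """List path: returns the matching rows."""
    where, params = build_filter_clause(filters)
    sql = "SELECT name FROM instances" + where + " ORDER BY name"
    return conn.execute(sql, params).fetchall()


def instance_count_by_filters(conn, filters):
    """Count path: same filters, but a single COUNT query returning an int."""
    where, params = build_filter_clause(filters)
    (count,) = conn.execute("SELECT COUNT(*) FROM instances" + where, params).fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (name TEXT, status TEXT, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?)",
    [
        ("vm-1", "ACTIVE", "tenant-a"),
        ("vm-2", "ERROR", "tenant-a"),
        ("vm-3", "ERROR", "tenant-b"),
        ("vm-4", "BUILD", "tenant-b"),
    ],
)
error_count = instance_count_by_filters(conn, {"status": "ERROR"})
```

Because both functions call the same helper, any filter added there is picked up by the list and count paths together, which is the consistency property this section asks for.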

For the v2 API extension, the existing filtering pre-processing done in nova.api.openstack.compute.servers.Controller._get_servers needs to be moved into a static utility method so that the new count API extension can utilize it; this is critical so that the filtering support for the count API matches the filtering support for the /servers API.

For the v3 API, a new count function (similar to ‘index’ and ‘detail’) needs to be added to nova.api.openstack.compute.plugins.v3.servers directly. Common filter processing needs to be broken out into utility functions (same idea as the v2 API). For v3, the ‘count’ GET API can be registered with the Servers extensions.V3APIExtensionBase directly.

Alternatives

Other APIs exist that return count data (quotas and limits), but they do not accept filter values.

A user could accomplish the same result (apart from the timing window noted in Use Case 3) by using the existing non-detailed /servers API with a filter and then counting the results. However, the primary use case for this blueprint is getting summary count data at scale. For example, if the total cloud has 5k VMs, then doing paginated queries to iterate over the non-detailed ‘/servers’ API with a filter and limit/marker is really inefficient: the API returns more data than the user cares about (and does a lot of processing to get it). Assume that there are 2,500 instances in an active state; if the non-detailed query (with the default limit of 1k) is used, then the application would have to make 3 separate REST API calls to get all of the VMs and, at the DB layer, the marker processing would be used to find the correct page of data to return. Since the user only cares about a summary count, the most efficient mechanism to retrieve that data is a single DB query using the count() function.

Note that the default maximum page size is set on the server (default of 1k); therefore, a user MUST HANDLE pagination since the number of items being queried may be greater than the default.
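The call-count arithmetic above can be checked with a short sketch. The function name `rest_calls_needed` is hypothetical, and the model is deliberately simple: it assumes every page except the last is full and that a short page signals the end of the list.

```python
import math


def rest_calls_needed(matching, page_limit):
    """Paged REST calls needed to walk `matching` items, `page_limit` per page.

    At least one call is always made, even when nothing matches.
    """
    return max(1, math.ceil(matching / page_limit))


paged_calls = rest_calls_needed(2500, 1000)  # the example from this section
count_calls = 1  # the proposed count API answers with a single request
```

When the number of matching items is an exact multiple of the page limit, a client that detects the end of the list by receiving a short or empty page may need one additional request beyond this figure.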

There are other options for how the v2 and v3 APIs can be registered. For v2, the new count API could be registered by modifying the API routing in nova.api.openstack.compute.__init__.APIRouter directly (to create the /servers/count API just like /servers/detail). Since v3 is still experimental, this blueprint proposes that the count API be built into nova.api.openstack.compute.plugins.v3.servers directly.

I cannot think of alternative implementations. The new API needs to utilize the same filter processing as the current /servers APIs in order to ensure consistency and prevent dual maintenance.

Data model impact

None

REST API impact

The response for the existing /servers and /servers/detail REST APIs will not be affected.

  • New v2 API extension:

    • Name: ServerCounts

    • Alias: os-server-counts

  • NEW v2 URL: v2/{tenant_id}/servers/count

  • NEW v3 URL: v3/servers/count

  • Description: Get number of servers

  • Method type: GET

  • Normal Response Codes (same as the ‘v2/{tenant_id}/servers/detail’ API):

    • 200

    • 203

  • Error Response Codes (same as the ‘v2/{tenant_id}/servers/detail’ API):

    • computeFault (400, 500, …)

    • serviceUnavailable (503)

    • badRequest (400)

    • unauthorized (401)

    • forbidden (403)

    • badMethod (405)

  • Parameters (same as the ‘v2/{tenant_id}/servers’ API except the ‘limit’ and ‘marker’ parameters):

Parameter                 Style  Type                 Description
------------------------  -----  -------------------  -----------
all_tenants (optional)    query  xsd:boolean          Display server count information from all tenants (Admin only).
changes-since (optional)  query  xsd:dateTime         A time/date stamp for when the server last changed status.
image (optional)          query  xsd:anyURI           Name of the image in URL format.
flavor (optional)         query  xsd:anyURI           Name of the flavor in URL format.
name (optional)           query  xsd:string           Name of the server as a string.
status (optional)         query  csapi:Server Status  Value of the status of the server so that you can filter on “ACTIVE” for example.

  • JSON schema definition for the body data: N/A

  • JSON schema definition for the response data: {"count": <int>}
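As a rough client-side illustration of the request and response shapes listed above, the following sketch builds a v2 count URL and parses the response body. The endpoint host, the tenant ID, and the helper names `build_count_url` and `parse_count` are assumptions made for the example; no HTTP request is sent.

```python
import json
from urllib.parse import urlencode


def build_count_url(endpoint, tenant_id, **filters):
    """Build the proposed v2 count URL with optional query filters."""
    url = f"{endpoint}/v2/{tenant_id}/servers/count"
    query = urlencode(sorted(filters.items()))  # sorted for a stable order
    return f"{url}?{query}" if query else url


def parse_count(body):
    """Extract the integer from a {"count": <int>} response body."""
    count = json.loads(body)["count"]
    if isinstance(count, bool) or not isinstance(count, int):
        raise ValueError("count must be an integer")
    return count


# Hypothetical endpoint and tenant, used only to show the URL shape.
url = build_count_url(
    "http://nova.example.com:8774", "demo-tenant", status="ERROR", all_tenants=1
)
errors = parse_count('{"count": 42}')
```

Here `limit` and `marker` are simply never passed, matching the parameter list above, which excludes them for the count API.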

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None – This new API is not introducing any new DB joins that would affect performance.

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Steven Kaufer

Other contributors:

<launchpad-id or None>

Work Items

  • Move filter processing code into utility functions at the API layer and at the DB sqlalchemy layer.

  • Create new API functions in the various layers to get the count data.

  • v2 API extension and v3 API updates to expose the new count API function.

Dependencies

Related (but independent) change being proposed in cinder: https://blueprints.launchpad.net/cinder/+spec/volume-count-api

Testing

Both unit and Tempest tests need to be created to ensure that the count data is accurate for various filters.

Testing should be done against multiple backend database types.

Documentation Impact

Document the new v2 API extension and v3 API updates (see “REST API impact” section for details).

References

None