Simple tenant usage pagination

https://blueprints.launchpad.net/nova/+spec/paginate-simple-tenant-usage

The blueprint aims to add optional limit and marker parameters to the GET /os-simple-tenant-usage endpoints.

GET /os-simple-tenant-usage?limit={limit}&marker={instance_uuid}
GET /os-simple-tenant-usage/{tenant_id}?limit={limit}&marker={instance_uuid}

Problem description

The simple tenant usage API can return extremely large amounts of data and provides no way to paginate the results. Because the API does not use the pagination code, it doesn’t even respect the “max results” sanity limit. Because it can query a ton of data, it also causes the api workers to inflate their memory footprint to the size of the DB result set, which is large. Since horizon queries this by default, most users are affected unless their ops team is extremely diligent about purging deleted instances (which are returned by the API by design).

Use Cases

Horizon uses these endpoints to display server usage.

Proposed change

Add an API microversion that allows for pagination of the simple tenant usage results using Nova’a existing approach to pagination (optional limit and marker query parameters).

Pagination would be made available for both the “all tenants” (index) and “specific tenant” (show) cases.

List Tenant Usage For All Tenants
/os-simple-tenant-usage?limit={limit}&marker={instance_uuid}

Show Usage Details For Tenant
/os-simple-tenant-usage/{tenant_id}?limit={limit}&marker={instance_uuid}

Currently, the simple tenant usage endpoints include aggregate data (like total_hours) which is the sum of the hours for each instance in a specific time window, grouped by tenant.

Note

For clarity, I’ve removed all other usage response fields from the examples.

GET /os-simple-tenant-usage?detailed=1

{
    "tenant_usages": [
        {
            "server_usages": [
                {
                    "instance_id": "instance-uuid-1",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                },
                {
                    "instance_id": "instance-uuid-2",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                },
                {
                    "instance_id": "instance-uuid-3",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                }
            ],
            "tenant_id": "tenant-uuid-1",
            "total_hours": 3
        },
        {
            "server_usages": [
                {
                    "instance_id": "instance-uuid-4",
                    "tenant_id": "tenant-uuid-2",
                    "hours": 1
                }
            ],
            "tenant_id": "tenant-uuid-2",
            "total_hours": 1
        }
    ]
}

Once paging is introduced, API consumers would need to stitch together the aggregate results if they still want totals for all instances in a specific time window, grouped by tenant.

For example, that same data would be returned as follows if the limit query parameter was set to 2. Note that the totals on the first page of results only reflect 2 of the 3 instances for tenant-uuid-1, and that the tenant-uuid-1 totals on the second page of results only reflect the remaining instance for tenant-uuid-1. API consumers would need to manually add these totals back up if they want the totals to reflect all 3 instances for tenant-uuid-1.

/os-simple-tenant-usage?detailed=1&limit=2

{
    "tenant_usages": [
        {
            "server_usages": [
                {
                    "instance_id": "instance-uuid-1",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                },
                {
                    "instance_id": "instance-uuid-2",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                }
            ],
            "tenant_id": "tenant-uuid-1",
            "total_hours": 2
        },
    ],
    "tenant_usages_links": [
        {
            "href": "/os-simple-tenant-usage?detailed=1&limit=2&marker=instance-uuid-2",
            "rel": "next"
        }
    ]
}
/os-simple-tenant-usage?detailed=1&limit=2&marker=instance-uuid-2

{
    "tenant_usages": [
        {
            "server_usages": [
                {
                    "instance_id": "instance-uuid-3",
                    "tenant_id": "tenant-uuid-1",
                    "hours": 1
                }
            ],
            "tenant_id": "tenant-uuid-1",
            "total_hours": 1
        },
        {
            "server_usages": [
                {
                    "instance_id": "instance-uuid-4",
                    "tenant_id": "tenant-uuid-2",
                    "hours": 1
                }
            ],
            "tenant_id": "tenant-uuid-2",
            "total_hours": 1
        },
    ]
}

Paging is done on the inner server_usages list. The marker is the last instance UUID in the server_usages list from the previous page.

The simple tenant usage endpoints will also include the conventional “next” links: tenant_usages_links in the case of index and tenant_usage_links in the show case.

/os-simple-tenant-usage?detailed=1&limit={limit}

{
    "tenant_usages": [
        {
            "server_usages": [
               ...
            ],
            "tenant_id": "{tenant_id}",
        }
    ],
    "tenant_usages_links": [
        {
            "href": "/os-simple-tenant-usage?detailed=1&limit={limit}&marker={marker}",
            "rel": "next"
        }
    ]
}
/os-simple-tenant-usage/{tenant_id}?detailed=1&limit={limit}

{
    "tenant_usage": {
        "server_usages": [
           ...
        ]
    },
    "tenant_usage_links": [
        {
            "href": "os-simple-tenant-usage/{tenant_id}?limit={limit}&marker={marker}",
            "rel": "next"
        }
    ]
}

Note

For clarity, I omitted the additional query parameters (like start & end) from the next links, but they need to be preserved. An actual next link would look more like this.

"tenant_usages_links": [
    {
        "href": "http://openstack.example.com/v2.1/6f70656e737461636b20342065766572/os-simple-tenant-usage?detailed=1&end=2016-10-12+18%3A22%3A04.868106&limit=1&marker=1f1deceb-17b5-4c04-84c7-e0d4499c8fe0&start=2016-10-12+18%3A22%3A04.868106",
        "rel": "next"
    }
]

Alternatives

None

Data model impact

Sorting will need to be added to the query that returns the instances in the server_usages list. The sort order will need to be deterministic across cell databases, and we may need to modify/add a new database index as a result.

REST API impact

Add an API microversion that allows for pagination of the simple tenant usage results using optional limit and marker query parameters. If limit isn’t provided, it will default to CONF.osapi_max_limit which is currently 1000.

GET /os-simple-tenant-usage?limit={limit}&marker={instance_uuid}
GET /os-simple-tenant-usage/{tenant_id}?limit={limit}&marker={instance_uuid}

Older versions of the os-simple-tenant-usage endpoints will not accept these new paging query parameters, but they will start to silently limit by CONF.osapi_max_limit to encourage the adoption of this new microversion, and circumvent the existing possibility DoS-like usage requests on systems with thousands of instances.

Security impact

None

Notifications impact

None

Other end user impact

Also change the python-novaclient to accept limit and marker options for simple tenant usage.

Performance Impact

Horizon consumes these API endpoints which are currently slow with a large memory profile when there are a lot of instances.

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

diana_clarke

Other contributors:

None

Work Items

  • Create a new API microversion for simple tenant usage pagination.

  • Update python-novaclient to be able to take advantage of these changes.

  • Communicate these changes to the Horizon team.

Dependencies

None

Testing

Needs functional and unit tests.

Documentation Impact

Update the “Usage reports” section of the compute api-ref to mention the new microversion and optional limit and marker query parameters.

References

Bug that describes the problem:

[1] https://bugs.launchpad.net/nova/+bug/1421471

Proof of concept (nova & python-novaclient):

[2] https://review.openstack.org/#/c/386093/

[3] https://review.openstack.org/#/c/394653/

History

Revisions

Release Name

Description

Ocata

Introduced