Quotas, Usage Plans, and Capacity Management

Problem description

A canonical property of an IaaS system like OpenStack is “capacity on demand”. Users expect to be able to allocate new resources via UI or API whenever needed, and to release them when the need ends. By supporting a large number of users, pooling resources, and maintaining some excess capacity, the cloud service provider (CSP) presents the illusion of infinite capacity.

In practice, of course, resources are finite, and the CSP must manage capacity to minimize the risk of resource exhaustion. This is generally done by imposing a cap, or quota, on the resources that a particular project may consume, and by managing the relationship between the available physical resources and the aggregate quotas of all projects. When a project requires more resources than its assigned quota allows, the user must typically submit a request, which usually requires human approval. The CSP may reject the request, or delay it until sufficient capacity is available. When the request is approved, the project's quota is modified to reflect the new limit.

Other CSPs have introduced a number of mechanisms to provide them with flexibility in managing capacity. These include group quotas (shared by related projects), reserved instances, ephemeral instances (which may be reclaimed for reallocation), and market-based allocation models. At the present time, OpenStack does not support any of these.

One common factor in all these processes is that they do not reflect temporal variations in resource usage. Yet in many cases users know how their usage will vary over time, and this information would be useful to the CSP, which must decide how to handle each request. It might also make it possible to automate some of the processing. The following user stories capture these possibilities.

User Stories

  • As an OpenStack user, I want to specify my resource usage request (RUR) in a way that will enable automated processing by the CSP, so that my RUR will be handled more quickly and accurately.
  • As a CSP, I want to be able to automate the processing of RURs so that I can meet my user SLAs and gain more timely and accurate data input to my capacity management and planning systems.
  • As a user, I want to be able to describe the temporal characteristics of my RUR, so that the CSP can plan capacity more accurately and reduce the chances of a resource request failure. My CSP may also offer me better pricing for more accurate usage prediction. Some examples of time-based RURs:
  1. I plan to use up to 60 vCPUs and 240GB of RAM from 6/1/2016 to 8/14/2016.
  2. I plan to use 200GB of object storage starting on 8/14/2016, increasing by 100GB every calendar month thereafter.
  3. I want guaranteed access to 30 vCPUs and 200GB of RAM for my project. In addition, during October-December, I want to be able to increase my usage to 150 vCPUs and 1TB of RAM.
  • As a user, I want to be able to submit an updated version of a rolling RUR for my project every month, so that my CSP has accurate information and can give me the best price and SLA.
  • As a user, I want to be able to take advantage of pricing and other offers from my CSP in order to meet the business objectives for my project. For example:
  1. I want 60 vCPUs for a minimum of one hour. After that time, the CSP may shut down all my instances if the resources are needed elsewhere. (I assume that the price is lower on such instances.)
  2. I want up to 100 vCPUs for the next 24 hours. Tell me how many I can have.
  • As a CSP, I want to be able to automate the construction and interpretation of a time-based resource usage plan so that I can schedule the most cost-effective actions to maintain my SLA. Some examples of actions:
  1. Schedule the provisioning of additional infrastructure.
  2. Repurpose existing allocated infrastructure.
  3. Assign a new project to one of a number of regions based on usage projections.
  4. Add “burst capacity” from a federation partner or reseller.
  5. Modify or defer another project.
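
To make the time-based RUR examples above more concrete, the following is a minimal sketch of how such a request might be represented in machine-readable form. It is illustrative only: the names UsageWindow, planned_demand, and monthly_growth are hypothetical and are not part of any OpenStack API. The sketch also assumes that overlapping windows for the same resource resolve to the larger amount, that "increasing every calendar month" means a step on each monthly anniversary of the start date, that "October-December" refers to 2016, and that the guaranteed baseline starts on 2016-01-01.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UsageWindow:
    """One time-bounded line of a hypothetical resource usage request (RUR)."""
    resource: str                 # e.g. "vcpus", "object_storage_gb"
    amount: int                   # planned maximum at the start of the window
    start: date
    end: Optional[date] = None    # None means open-ended
    monthly_growth: int = 0       # added on each monthly anniversary of start

    def amount_on(self, day: date) -> int:
        """Planned amount on `day`, or 0 if the window is not active."""
        if day < self.start or (self.end is not None and day > self.end):
            return 0
        months = (day.year - self.start.year) * 12 + (day.month - self.start.month)
        if day.day < self.start.day:
            months -= 1  # this month's anniversary has not been reached yet
        return self.amount + self.monthly_growth * months


def planned_demand(windows, resource, day):
    """Largest planned amount of `resource` on `day` across all windows."""
    return max(
        (w.amount_on(day) for w in windows if w.resource == resource),
        default=0,
    )


# Examples 2 and 3 from the user stories (year and baseline start are assumed).
plan = [
    UsageWindow("object_storage_gb", 200, date(2016, 8, 14), monthly_growth=100),
    UsageWindow("vcpus", 30, date(2016, 1, 1)),                        # guaranteed
    UsageWindow("vcpus", 150, date(2016, 10, 1), date(2016, 12, 31)),  # seasonal
]

print(planned_demand(plan, "vcpus", date(2016, 11, 15)))
print(planned_demand(plan, "object_storage_gb", date(2016, 10, 20)))
```

A structure of this kind would let a CSP evaluate a plan for any date without human interpretation, which is the property the automation user stories depend on.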

Usage Scenarios Examples

TBD

Opportunity/Justification

None.

Requirements

  • The implementation of these capabilities will depend in part on the existence of a more flexible and holistic quota scheme, so that the capacity management system can adjust quotas programmatically.
  • It will also require a rich monitoring, notification, and visualization system, so that both user and CSP have accurate and timely data about the behavior of the system.
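
As a rough illustration of what adjusting quotas programmatically could involve, the sketch below sums the planned demand of several projects for each day and reports the days on which the total exceeds physical capacity. Everything here is an assumption made for illustration: the function name shortfall_days, the project names, the capacity figure of 200 vCPUs, and the plan for "project-c" are hypothetical. The windows for "project-a" and "project-b" follow the time-based examples in the user stories, with 2016 assumed as the year.

```python
from datetime import date, timedelta


def daterange(start, end):
    """Yield every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def shortfall_days(capacity, plans, start, end):
    """Return {day: deficit} for days where total planned demand > capacity.

    `plans` maps a project name to a list of (start, end, amount) windows;
    overlapping windows within a project resolve to the larger amount.
    """
    result = {}
    for day in daterange(start, end):
        total = sum(
            max((amt for s, e, amt in windows if s <= day <= e), default=0)
            for windows in plans.values()
        )
        if total > capacity:
            result[day] = total - capacity
    return result


# Planned vCPU usage per project (hypothetical data).
vcpu_plans = {
    "project-a": [(date(2016, 6, 1), date(2016, 8, 14), 60)],
    "project-b": [
        (date(2016, 1, 1), date(2016, 12, 31), 30),    # guaranteed baseline
        (date(2016, 10, 1), date(2016, 12, 31), 150),  # seasonal increase
    ],
    "project-c": [(date(2016, 11, 1), date(2016, 11, 30), 100)],
}

gaps = shortfall_days(200, vcpu_plans, date(2016, 10, 1), date(2016, 12, 31))
for day, deficit in sorted(gaps.items())[:3]:
    print(day.isoformat(), deficit)
```

A report like this could feed the CSP actions listed in the user stories, such as scheduling additional infrastructure or deferring a project, before any request actually fails.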

Gaps

None currently known.

Affected By

None.

External References

None.

Glossary