Gnocchi – Ceilometer API v3

https://blueprints.launchpad.net/ceilometer/+spec/gnocchi

Problem description

From the beginning of the Ceilometer project, a large part of the goal was to store the time series data that were collected. In the early stages of the project, it wasn’t really clear how these time series were going to be handled, manipulated and queried, so the data model used by Ceilometer was very flexible. That ended up being really powerful and handy, but the resulting performance has been terrible, to the point where storing a large amount of metrics over several weeks is really hard to achieve without the data storage backend collapsing.

Having such a flexible data model and query system is great, but in the end users are doing the same requests over and over, and the use cases that need to be addressed are a subset of that data model. On the other hand, some queries and use cases are not solved by the current data model, either because they are not easy to express or because they are just too slow to run.

During the Icehouse Design Summit in Hong Kong, developers and users showed interest in having Ceilometer do metric data aggregation, in order to keep data over a longer period. No work was done on that during the Icehouse cycle, probably due to the lack of manpower around the idea, even though the idea and motivation were validated by the core team back then.

Considering the amount of data and metrics Ceilometer generates and has to store, a new strategy and a rethinking of the problem were needed; Gnocchi is an attempt at that.

Proposed change

Ceilometer is nowadays trying to achieve two different things:

  • Store metrics, that is, a list of (timestamp, value) pairs for a given entity, this entity being anything from the temperature in your datacenter to the CPU usage of a VM.
  • Store events, that is, a list of things that happen in your OpenStack installation: an API request has been received, a VM has been started, an image has been uploaded, a server fell off the roof, whatever.

Both are very useful for the use cases Ceilometer tries to address. Metrics are useful for monitoring, performance analysis, and alarming, whereas events are useful for audit, billing, debugging, etc.

However, while the event collection of Ceilometer is pretty solid (but still needs to be worked on), the metrics part suffers from terrible design and performance issues.

Having the so-called free-form metadata associated with each metric generated by Ceilometer is the most problematic design we have. It stores a lot of redundant information that is hard to query in an efficient manner. On the other hand, systems like RRD have existed for a while, storing a large amount of (aggregated) metrics without much problem; the metadata associated with these metrics is a separate issue.

So that leaves us with two different problems to solve: storing metrics, and storing information (the so-called metadata) about resources.

Alternatives

None

Data model impact

This will bring a whole new data model, where metrics and resources are split, with snapshots of resource metadata no longer wedded to individual sample datapoints.
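
As a rough illustration of that split, the sketch below keeps measures in an entity that carries no metadata, and keeps metadata in a resource that only points at entities by UUID. The class names and field layout are assumptions made for illustration, loosely mirroring the JSON bodies in the REST API section; they are not the actual Gnocchi schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A time series: only (timestamp, value) pairs, no metadata."""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    measures: list = field(default_factory=list)


@dataclass
class Resource:
    """Metadata about a resource, stored once and linked to entities by UUID."""
    id: str
    user_id: str
    project_id: str
    started_at: str
    entities: dict = field(default_factory=dict)  # metric name -> entity UUID


# Measures go into the entity and carry no resource metadata.
cpu = Entity()
cpu.measures.append(("2013-01-01 12:12:23", 42.0))
cpu.measures.append(("2013-01-01 12:12:24", 43.1))

# The resource holds the metadata and references the entity by its UUID.
vm = Resource(id=str(uuid.uuid4()), user_id="foobaz", project_id="foobar",
              started_at="2013-01-01 12:23:12",
              entities={"cpu.util": cpu.id})
```

Adding a measure touches only the entity, so resource metadata is not duplicated per datapoint.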

It will be possible to migrate data from the current Ceilometer database to this new system by writing and running a migration script. A future blueprint should cover this topic.

REST API impact

The change to the data model is substantial, so a new API is also needed. It should be version 3 of the Ceilometer API or, if the project is kept separate, version 1 of its own API.

The proposed API is:

  • POST /v1/entity:

    -> {"archives": [{"lifespan": 3600, "points": 1000},
                     {"lifespan": "1 year", "interval": 60},
                     {"points": 1000, "interval": 60}]}
    <- 201 Created
       Location: /v1/entity/<uuid>
    
    Create an entity storing:
      - 1000 points over an hour
      - a point every minute over a year
      - 1000 points with a point every minute.
    The uuid of the entity is returned.
  • POST /v1/entity/<uuid>/measures:

    -> [{"timestamp": "2013-01-01 12:12:23", "value": 42.0},
        {"timestamp": "2013-01-01 12:12:24", "value": 43.1}]
    <- 204 No Content
    
    Store measures for an entity.
  • GET /v1/entity/<uuid>/measures:

    <- [{"timestamp": "2013-01-01 12:12:23", "value": 42.0},
        {"timestamp": "2013-01-01 12:12:24", "value": 43.1}]
    
    Return a list of measures for this entity.
    The time span can be specified with a query string.
  • DELETE /v1/entity/<uuid>:

    <- 204 No Content
    
    Delete an entity.
  • POST /v1/resource/<resource type>:

    -> { "id": <uuid>,
         "started_at": "2013-01-01 12:23:12",
         "project_id": "foobar",
         "entities": { "cpu.util": <entity uuid> },
         "user_id": "foobaz"}
    <- { "id": <uuid>,
         "started_at": "2013-01-01 12:23:12",
         "project_id": "foobar",
         "entities": { "cpu.util": <entity uuid> },
         "type": <resource type>,
         "user_id": "foobaz"}
    
    Create a resource. The UUID has to be provided by the caller (and is
    expected to match the native UUID of the underlying resource) and
    various attributes can also be provided.
    
    Entities can be specified with their UUID, or with creation parameters:
    
    -> { "id": <uuid>,
         "started_at": "2013-01-01 12:23:12",
         "project_id": "foobar",
         "entities": { "cpu.util": {"archives": [{"lifespan": 3600, "points": 1000}]} },
         "user_id": "foobaz"}
    <- { "id": <uuid>,
         "started_at": "2013-01-01 12:23:12",
         "project_id": "foobar",
         "entities": { "cpu.util": <entity uuid> },
         "user_id": "foobaz"}
  • GET /v1/resource/<resource type>:

    <- [{ "id": <uuid>,
          "started_at": "2013-01-01 12:23:12",
          "project_id": "foobar",
          "type": "generic",
          "entities": { "cpu.util": <entity uuid> },
          "user_id": "foobaz"}]
    
    Return list of resources.
  • GET /v1/resource/<resource type>/<uuid>:

    <- { "id": <uuid>,
         "started_at": "2013-01-01 12:23:12",
         "project_id": "foobar",
         "type": "generic",
         "entities": { "cpu.util": <entity uuid> },
         "user_id": "foobaz"}
    
    Return details about a resource.
  • DELETE /v1/resource/<resource type>/<uuid>:

    <- 204 No Content
    
    Delete a resource.
  • PATCH /v1/resource/<resource type>/<uuid>:

    -> {"started_at": "2013-01-01 12:23:13"}
    <- { "id": <uuid>,
         "started_at": "2013-01-01 12:23:13",
         "type": "generic",
         "entities": { "cpu.util": <entity uuid> },
         "project_id": "foobar",
         "user_id": "foobaz"}
    
    Change the value of a mutable attribute. The list of mutable attributes
    depends on the resource type, but all resource types can change:
    * ended_at
    * entities
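
Each archive definition in the POST /v1/entity example gives two of the three values lifespan, points and interval. A minimal sketch of how the third could be derived is shown below. It assumes the relation lifespan = points × interval, with lifespan and interval expressed in seconds; that relation and the function name `resolve_archive` are assumptions, not something the proposal spells out. Human-readable values such as "1 year" are not parsed here and would have to be converted to seconds first.

```python
def resolve_archive(lifespan=None, points=None, interval=None):
    """Fill in whichever of the three archive parameters was left out.

    Assumes lifespan = points * interval (lifespan and interval in seconds).
    """
    given = [v is not None for v in (lifespan, points, interval)]
    if sum(given) != 2:
        raise ValueError("exactly two of lifespan, points, interval are required")
    if lifespan is None:
        lifespan = points * interval
    elif points is None:
        points = int(lifespan // interval)
    else:
        interval = lifespan / points
    return {"lifespan": lifespan, "points": points, "interval": interval}


# The three archives from the example, with "1 year" taken as 365 days.
hourly = resolve_archive(lifespan=3600, points=1000)
yearly = resolve_archive(lifespan=365 * 24 * 3600, interval=60)
fixed = resolve_archive(points=1000, interval=60)
```

Under these assumptions the first archive stores one point every 3.6 seconds, the second holds 525600 points, and the third covers 60000 seconds.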

All resources inherit from the generic resource type and can therefore be partially manipulated by using this resource type. In addition, resource types with more attributes, such as instance, are provided to create more complete resources.

All resource types are built into Gnocchi in order to be more performant. If a resource type needs to be indexed but is not known to Ceilometer, one can rely on the generic resource type and manage the attributes of the resource in another system, at the user’s discretion.

The resource types known by Gnocchi will be the resource types provided by OpenStack, e.g. instance, port, network, volume, etc.
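
To make the endpoints proposed in this section more concrete, here is a small sketch that builds (without sending) the corresponding HTTP requests using only the Python standard library. The base URL, the helper name `build_request`, the placeholder token and UUID, and the `start`/`stop` query parameter names are all hypothetical; the proposal only says that a time span can be given in a query string. The `X-Auth-Token` header is the usual way to pass a Keystone token.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://gnocchi.example.com"  # hypothetical endpoint


def build_request(method, path, token, body=None, query=None):
    """Build (but do not send) an authenticated JSON request."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=method)


token = "TOKEN"  # placeholder for a Keystone token
entity_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID

# POST /v1/entity with one archive definition.
create = build_request("POST", "/v1/entity", token,
                       body={"archives": [{"lifespan": 3600, "points": 1000}]})

# POST /v1/entity/<uuid>/measures with a single measure.
push = build_request("POST", f"/v1/entity/{entity_id}/measures", token,
                     body=[{"timestamp": "2013-01-01 12:12:23", "value": 42.0}])

# GET /v1/entity/<uuid>/measures; "start" and "stop" are assumed names.
fetch = build_request("GET", f"/v1/entity/{entity_id}/measures", token,
                      query={"start": "2013-01-01 12:00:00",
                             "stop": "2013-01-01 13:00:00"})
```

Passing any of these objects to `urllib.request.urlopen` would send the request to whatever service listens at the assumed base URL.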

Security impact

Usual Keystone token-based authN and RBAC-based authZ.

No security mechanism is proposed to access entities. As entity UUIDs are dynamically and randomly allocated, one has to know the UUID of an entity to access it. The UUID can therefore be considered a secret.

Access to resources can be filtered based on the user_id and project_id fields, which are mandatory stored attributes. That’s the same mechanism currently used in Ceilometer API v2.
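
A minimal sketch of such filtering is shown below, with resources represented as plain dictionaries shaped like the JSON bodies above. The function name `visible_resources` and the exact policy (administrators see everything, other callers are scoped to their project and optionally to one user) are assumptions for illustration; the actual rules would come from the RBAC policy.

```python
def visible_resources(resources, project_id, user_id=None, is_admin=False):
    """Return the resources the caller may see."""
    if is_admin:
        return list(resources)
    return [r for r in resources
            if r["project_id"] == project_id
            and (user_id is None or r["user_id"] == user_id)]


resources = [
    {"id": "r1", "project_id": "foobar", "user_id": "foobaz"},
    {"id": "r2", "project_id": "foobar", "user_id": "someone-else"},
    {"id": "r3", "project_id": "other", "user_id": "foobaz"},
]

same_project = visible_resources(resources, project_id="foobar")
same_user = visible_resources(resources, project_id="foobar", user_id="foobaz")
everything = visible_resources(resources, project_id="foobar", is_admin=True)
```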

Pipeline impact

The publishing mechanism will need to be adapted to the new model, as resources need to be created before they can be metered. Another blueprint should cover this topic.

Other end user impact

The ceilometerclient will also need to be extended to support both the old and new APIs.

Performance/Scalability Impacts

The scalability and performance of this new system should be drastically better than those of the old one.

Having real benchmarks of this system would also be interesting.

Other deployer impact

None

Developer impact

It’s likely that API v2 of Ceilometer should be frozen and that no further improvements should be made to it at this stage.

Implementation

Assignee(s)

Primary assignee:
  • jdanjou
Other contributors:
  • sileht
  • dbelova
Ongoing maintainer:
  • jdanjou

Work Items

  • Build the Gnocchi service and API
  • Adjust Ceilometer data retrieval and publishing as needed to adapt to the new data storage and API
  • Make things work together

Future lifecycle

This will be a core component of Ceilometer, so everyone is going to take care of it, me included.

Dependencies

  • Canonical implementation of the storage driver requires Pandas and Swift.
  • Alternative statistical/storage driver(s) with different dependencies may also be provided in time.

Testing

Unit tests are provided.

Tempest tests should be added to cover the new API. A variant of the v2 tests running with the v3 mechanism enabled is a possibility.

Documentation Impact

We must document the new API. There is currently no mechanism to auto-generate the API documentation, though it should be doable and interesting to do so.