As Ceilometer has expanded to capture more notifications from the OpenStack message ecosystem, the number of samples it generates has grown even faster, since many samples are derived from a single notification or polling request.
The “Existence of xyz” meters in our samples database represent a significant portion of the data we store. These meters, however, offer no useful measurement value; their true value is to capture the state of a resource at a given time, a value that is also available in the meters generated alongside the existence meters. Additionally, the volume=1 value is very confusing to consumers of the data: for a while, Horizon used that value as the total number of values, and users would often wonder why there was always 1 record of an instance, network, port, etc.
As we move to more time-series-focused storage for samples, the “volume=1” meters we collect affect not just storage size but also the overhead of rolling up and computing statistics on something as trivial and meaningless as the constant 1. Additionally, rolling up samples will diminish the value of said meters as valid auditable datapoints.
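To illustrate why rolling up these meters is wasted work, here is a minimal sketch in plain Python. It is not Ceilometer or Gnocchi code: the `rollup` helper and all sample values are made up for illustration. Aggregating an existence meter only tells you how many samples arrived, whereas aggregating a real measurement yields statistics that mean something.

```python
from statistics import mean


def rollup(volumes):
    """Aggregate a list of sample volumes into common summary statistics."""
    return {
        "count": len(volumes),
        "min": min(volumes),
        "max": max(volumes),
        "avg": mean(volumes),
        "sum": sum(volumes),
    }


# Hypothetical existence samples: every volume is the constant 1.
existence = [1] * 12
# Hypothetical real measurement, e.g. CPU utilisation percentages.
cpu_util = [12.5, 40.0, 33.5, 90.0]

existence_stats = rollup(existence)
cpu_stats = rollup(cpu_util)

print(existence_stats)
print(cpu_stats)
```

For the existence meter, min, max, and avg are all 1 and the sum merely equals the count, so the rollup carries no information beyond the number of samples. The individual timestamps and resource states, which are what an auditor cares about, are gone after aggregation.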
At a high level, Samples are the children of Events: a Sample is a derived subset of an Event. Because of that, the Samples we create should capture an explicit datapoint of interest from an Event and not just be a shadow of an Event.
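The relationship between the two can be sketched as follows. This is a simplified model, not Ceilometer's actual classes: `Event`, `Sample`, `MEASURED_TRAITS`, and `derive_samples` are hypothetical names, and the trait-to-meter mapping is only an example. The point is that only traits carrying a real measurement become samples, while state-like traits stay on the event.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Event:
    """Simplified event: a type, a timestamp, and a bag of traits."""
    event_type: str
    generated: str
    traits: Dict[str, Any]


@dataclass
class Sample:
    """Simplified sample: one measured datapoint for one resource."""
    name: str
    unit: str
    volume: float
    resource_id: str
    timestamp: str


# Hypothetical mapping: trait name -> (meter name, unit).
MEASURED_TRAITS: Dict[str, Tuple[str, str]] = {
    "memory_mb": ("memory", "MB"),
    "vcpus": ("vcpus", "vcpu"),
}


def derive_samples(event: Event) -> List[Sample]:
    """Emit a sample only for traits that carry a real measurement."""
    samples = []
    for trait, (meter, unit) in MEASURED_TRAITS.items():
        if trait in event.traits:
            samples.append(Sample(
                name=meter,
                unit=unit,
                volume=float(event.traits[trait]),
                resource_id=str(event.traits["instance_id"]),
                timestamp=event.generated,
            ))
    return samples


# Example event with made-up values.
event = Event(
    event_type="compute.instance.exists",
    generated="2015-06-01T12:00:00",
    traits={"instance_id": "inst-0001", "state": "active",
            "memory_mb": 2048, "vcpus": 2},
)
samples = derive_samples(event)
for s in samples:
    print(s)
```

Note that no "existence" sample with volume=1 is produced: the fact that the instance existed in the `active` state is already recorded by the event itself.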
The “Existence of xyz” and “volume=1” meters can be better represented as what they really are: Events. The core event functionality was implemented in previous cycles and should be expanded to better cover these non-measurement meters.
This will offer better querying of data in Events, less storage of data in general, and less confusion as to what the volume attribute in a sample represents.
The proposed solution is to:
1. Mark 'volume=1' meters as available as Events in the documentation and add them to the event_definitions file. The vast majority of them are created from notifications, so no work is needed beyond adding them to event_definitions. **COMPLETED IN KILO**
2. Change the default pipeline to not enable these meters. NOTE: this does not affect existing solutions, nor will it break anything. An option was added in Kilo to enable/disable meters. **COMPLETED IN KILO**
3. Some meters, such as network-related meters (LBaaS, VPNaaS, SDN, etc.), come from pollsters. We will need to convert these pollsters to republish information as events. An example would be the health check polls made for network meters: instead of creating a sample, the poll would create an event.
4. Henceforth, Samples will be measurements which can be aggregated and rolled up in meaningful ways (within gnocchi). Events will be the raw base that Samples come from, and they should be stored accordingly. Both are time-series-driven data.
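As a rough sketch of what step 3 could look like, the snippet below turns the result of a health check poll into an event-shaped dictionary instead of a volume=1 sample. It is plain Python and not the pollster API: the function name `health_check_to_event`, the event type string, and the trait names are all assumptions chosen for illustration, loosely following the message_id / event_type / generated / traits layout that events use.

```python
import uuid
from datetime import datetime, timezone


def health_check_to_event(resource_id, status, polled_at=None):
    """Build an event-like dict from a polled health check result.

    Hypothetical helper: the event type and trait names are illustrative.
    """
    polled_at = polled_at or datetime.now(timezone.utc)
    return {
        "message_id": str(uuid.uuid4()),
        # Assumed event type name; a real deployment would define its own.
        "event_type": "network.lb.health_monitor.poll",
        "generated": polled_at.isoformat(),
        "traits": {
            "resource_id": resource_id,
            # The polled state is kept as a queryable trait rather than
            # being flattened into a meaningless volume=1.
            "status": status,
        },
    }


# Example with made-up identifiers and a fixed timestamp.
event = health_check_to_event(
    "pool-member-42",
    "ACTIVE",
    datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc),
)
print(event)
```

Because the status travels as a trait, consumers can query for, say, every poll where a member was not `ACTIVE`, which a constant-volume sample cannot express.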
A rough list of meters to be dropped can be found in the corresponding bug.
None. We could leave step 3 as optional as they are essentially healthcheck or existence events. We can also support a tool to migrate ‘non-metric’ meters to events but this is arguably not important.
None. Events already exist. This may have a side dependency on adding alarming on Events to maintain feature parity.
None. We may want to expand Events, but nothing is directly impacted here.
Audit data. This is an issue that already exists and will just be migrated from Samples to Events. It should be possible to add a tag to the event_definitions file to mark data as audit. It is technically possible to redirect audit data to a separate db using the event pipeline.
Users STILL have the option to keep these meters, but we should advocate that they use Events instead. As of Liberty, these meters will be disabled by default, and they will be removed completely as of M*. This has already been documented in the docs.
Less sample data. Less equivalent data in Events (because of trait filtering). Events are a bit more scalable in their design (though maybe not with the SQL backend).
We should turn on the store_events option by default. This will probably require deployers to use the event_connection option, as it probably isn’t advisable to store sample and event data together, though they could.
Learn events. Make events better.