Adding some Prometheus query functions to the Prometheus collector

https://storyboard.openstack.org/#!/story/2006427

Problem Description

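The Prometheus collector currently cannot correctly handle metrics of the Counter and Gauge types, because it offers no way to compute, for instance, the difference between two collect cycles.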

A Counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

A Gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Proposed Change

Using the delta function would make it possible to compute the difference between two collect cycles, and therefore to charge metrics of the Counter and Gauge types.
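The difference between the current behaviour and delta can be illustrated on a counter. The sketch below uses made-up samples and a simplified delta (last value minus first value). Prometheus' real delta() also extrapolates to the boundaries of the range, so its result may differ slightly.

```python
# Hypothetical samples of a Counter over one collect period,
# as (timestamp_seconds, value) pairs.
samples = [(0, 100.0), (900, 130.0), (1800, 175.0), (2700, 190.0), (3600, 240.0)]


def max_over_time(points):
    """Largest sample value in the window: what the collector implicitly
    applies today when aggregation_method is max."""
    return max(v for _, v in points)


def naive_delta(points):
    """Last value minus first value in the window.

    Simplification: Prometheus' delta() additionally extrapolates to the
    window boundaries."""
    return points[-1][1] - points[0][1]


print(max_over_time(samples))  # 240.0: everything counted since the counter started
print(naive_delta(samples))    # 140.0: growth during this period only
```

Only the second value reflects what was consumed during the period being rated.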

Adding two fields (range_function and query_function) to the extra_args section of the metrics.yml file would solve this problem.

Prometheus functions can be grouped into two categories, each matching one of the new fields: functions that take a range vector as argument and functions that take an instant vector. Both categories return an instant vector.

The new fields would only accept a limited set of functions, because the query result must keep the format the collector expects (an instant vector).

The following Prometheus query functions will be allowed:

Functions taking a Range Vector:

  • changes()

  • delta()

  • deriv()

  • idelta()

  • irange()

  • irate()

  • rate()

Functions taking an Instant Vector:

  • abs()

  • ceil()

  • exp()

  • floor()

  • ln()

  • log2()

  • log10()

  • round()

  • sqrt()
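The allow-lists above could be enforced when the metrics configuration is loaded. Below is a minimal sketch in plain Python. The helper name validate_extra_args is hypothetical, and the actual implementation may rely on the collector's existing configuration schema instead. The function names are copied verbatim from the lists in this spec.

```python
# Allowed values, copied from the lists in this spec.
RANGE_FUNCTIONS = frozenset(
    ["changes", "delta", "deriv", "idelta", "irange", "irate", "rate"])
QUERY_FUNCTIONS = frozenset(
    ["abs", "ceil", "exp", "floor", "ln", "log2", "log10", "round", "sqrt"])


def validate_extra_args(extra_args):
    """Raise ValueError if range_function or query_function is not allowed.

    Both fields are optional, so a missing key is accepted as-is."""
    checks = (("range_function", RANGE_FUNCTIONS),
              ("query_function", QUERY_FUNCTIONS))
    for field, allowed in checks:
        value = extra_args.get(field)
        if value is not None and value not in allowed:
            raise ValueError(
                f"{field}={value!r} is not allowed; "
                f"expected one of {sorted(allowed)}")
    return extra_args
```

For example, {"aggregation_method": "max", "range_function": "delta"} passes, while {"query_function": "sum"} raises ValueError, since sum is an aggregation operator rather than an allowed instant-vector function.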

Functions accepting a range vector will be set through the new range_function field. This field replaces the {aggregation_method}_over_time function that the collector currently applies implicitly to obtain an instant vector. The field will be optional and will default to {aggregation_method}_over_time.

Functions accepting an instant vector will be set through the new query_function field. This makes it possible to apply an extra transformation to the data before it is rated.

Both fields can be combined.
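The composition rules above can be sketched as a small query builder. The function name build_query and its parameters are hypothetical. The period argument stands for the length of the range selector, which is 3600 seconds in the examples of this spec, and the empty {} label selector mirrors those examples.

```python
def build_query(metric, aggregation_method, groupby, period,
                range_function=None, query_function=None):
    """Compose a PromQL query from a metric's extra_args (sketch)."""
    # range_function is optional and defaults to {aggregation_method}_over_time.
    range_fn = range_function or f"{aggregation_method}_over_time"
    # Apply the range function to the range vector; "{{}}" renders as "{}".
    expr = f"{range_fn}({metric}{{}}[{period}s])"
    # Optional extra transformation applied to the resulting instant vector.
    if query_function:
        expr = f"{query_function}({expr})"
    return f"{aggregation_method}({expr}) by ({','.join(groupby)})"


print(build_query("gateway_function_invocation_total", "max",
                  ["function_name", "code"], 3600,
                  range_function="delta", query_function="abs"))
```

With range_function="delta" and query_function="abs", this prints max(abs(delta(gateway_function_invocation_total{}[3600s]))) by (function_name,code). Dropping either argument yields the other two queries shown in the examples.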

Example

The following configuration

metrics:
  gateway_function_invocation_total:
    unit: total
    groupby:
      - function_name
      - code
    extra_args:
      aggregation_method: max
      range_function: delta

would result in the following query:

max(delta(gateway_function_invocation_total{}[3600s])) by (function_name,code)

Another example:

metrics:
  gateway_function_invocation_total:
    unit: total
    groupby:
      - function_name
      - code
    extra_args:
      aggregation_method: max
      range_function: delta
      query_function: abs

would result in:

max(abs(delta(gateway_function_invocation_total{}[3600s]))) by (function_name,code)

And this configuration:

metrics:
  gateway_function_invocation_total:
    unit: total
    groupby:
      - function_name
      - code
    extra_args:
      aggregation_method: max
      query_function: abs

would result in:

max(abs(max_over_time(gateway_function_invocation_total{}[3600s]))) by (function_name,code)

Alternatives

The PyScript module could be used in some cases, but it is overly complex for such simple operations. It would also require saving the latest state of the metric (in the case of the delta function).
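To illustrate the extra state mentioned above, here is a minimal, hypothetical sketch of what computing a delta outside Prometheus involves. It does not use the PyScript API; the class name and the series key format are assumptions made for illustration.

```python
class LastValueStore:
    """Hypothetical per-series state needed to compute a delta between
    two collect cycles outside of Prometheus."""

    def __init__(self):
        # Series key (e.g. a tuple of groupby label values) -> value seen
        # at the previous collect cycle.
        self._last = {}

    def delta(self, series_key, value):
        """Return value minus the previous value for this series, or None
        on the first cycle, when there is nothing to compare against."""
        previous = self._last.get(series_key)
        self._last[series_key] = value
        return None if previous is None else value - previous


store = LastValueStore()
store.delta(("my_function", "200"), 100.0)  # first cycle: None
store.delta(("my_function", "200"), 140.0)  # second cycle: 40.0
```

This in-memory version is lost on restart and ignores counter resets; a real implementation would presumably have to handle both, which the range_function field avoids by letting Prometheus compute the difference.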

Data model impact

None

REST API impact

None

Security impact

None

Notifications Impact

None

Other end user impact

End users will be able to perform more operations on metrics retrieved by the Prometheus collector, especially on Gauge and Counter metrics.

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

<aimbot31>

Work Items

  • Add support for the query_function field to the Prometheus collector

  • Add support for the range_function field to the Prometheus collector

Dependencies

None

Testing

The proposed changes will be covered by unit tests.

Documentation Impact

An entry detailing the configuration of the new fields will be added to Admin/Configuration/Collector.