Dynamic pipeline configuration using File Reloading

https://blueprints.launchpad.net/ceilometer/+spec/reload-file-based-pipeline-configuration

Currently there is no way to enable or disable meters without restarting ceilometer. There are cases where operators do not want to run all the meters continuously.

By enhancing Ceilometer to monitor pipeline file configuration, the application could be made to dynamically activate/deactivate meters, collection targets or similar functions, have distinctly different configurations for multiple nodes in different environments (i.e. dev/test/prod or HA scenarios) and could ultimately be updated with new collection targets “on-the-fly”.

Problem description

Currently, Ceilometer relies on pipeline configuration file for determining run-time parameters used in polling and notification handling.

There is no way to enable or disable meters without restarting ceilometer. There are cases where operators do not want to run all the meters continuously. In these cases, there should be a way to disable or enable them dynamically (without restarting ceilometer).

Meters in ceilometer are enabled by adding them in “setup.cfg:entry_points” configuration file and restarting ceilometer agent.

When ceilometer agent restarts, during initialization it reads the entry_points and creates and loads all pollster objects corresponding to meters present in setup.cfg.

There are disadvantages with this implementation:

Ceilometer might be running several hundreds of meters continuously and restarting it might impact other meters as it involves the deletion and re-creation of the hundreds of pollster objects, when the need is to poll a few meters or avoid polling a few meters.

The subsequent sections will cover an approach to dynamically reload and update agent pipeline configuration. (pipeline.yaml)

Proposed change

Summary:

File polling by agent - Each agent polls its local pipeline.yaml file for changes and re-configures pollsters/listeners on detecting a change.

By default, this will be turned off. Its an optional feature and can be turned on by specifying reload_pipeline_config = True in ceilometer.conf under the default section.

The proposed high-level flow:

  1. Each agent daemon polls local pipeline configuration file at configurable interval.

  2. The agents will reload and validate the configuration.

  3. If validation succeeds, the new configuration is used.

Next steps for central-agent, compute-agent and ipmi-agent:

  1. For polling agent, determine which pollsters are needed based on new configuration and need to run at what frequency (polling interval)

  2. For polling-agent, all pollsters in flight are allowed to gracefully complete.

  3. In HA mode, re-initialize the partition coordinator with new groups

  4. Start the new set of pollsters.

Next steps for notification agent:

  1. Existing listeners are stopped to allow for graceful termination.

  2. Configure new listeners with the changed pipeline configuration.

Next steps for HA notification agent:

  1. Pipeline listeners(pipeline sink-based internal queue listener) are gracefully terminated.

  2. Pipeline listeners are re-configured with new pipeline.

Pros:

Preferred and Easiest approach

Cons:

It doesn’t centralize the pipeline definition and runs the risk of agents diverging on their pipeline definitions

This means we’re allowing any kind of error levels due to the fact file might be changed for one agent, and not changed for another one in the same coordination group.

Alternatives

  1. File polling by separate daemon - Each agent uses a separate daemon that polls the pipeline.yaml for changes. On detecting a change, it signals (HUP) the agents to re-load the pipeline configuration.

    Pros: Allows for future extensibility while keeping the agent to pipeline file contract unchanged. Cons: Same as Previous approach HA for the new daemon needed Using SIGHUP a potential loophole? (since anyone can send a SIGHUP and have the agent reload its configuration) More work for less gain

  2. HUP-based pipeline reload by agent - The agent uses signal handler to handle the HUP signal and reloads pipeline from the file in response to the signal. The assumption on receiving a signal is that the pipeline file has changed, so the pipeline file can be changed manually or using config management tools - Chef/Puppet.

    Pros: No API. Cons: File synchronization within coordination group is deployer responsibility Same as Approach 1 Using SIGHUP a potential loophole? (since anyone can send a SIGHUP and have the agent reload its configuration)

  3. Use automated deployment tools - Puppet, Chef, Ansible to change pipeline definitions. While this automates changing pipeline definitions across multiple agents, it doesn’t bring the value-add of on-the-fly updates to the agent, without incurring a restart of the daemons.

    Pros: No API. Cons: Is looking like the only pure admins approach. Will be more tricky for everyone not familiar with these tools (devs, DevOps, etc.) Not in ceilometer scope.

  4. Use a combination of 2 and 3 - Use configuration management tool to automate the change in pipeline configuration across multiple remote agent nodes. Do not restart any processes though. Agents on each node poll the pipeline file and on detecting a change in the next polling, will update the pollsters and listeners. This approach has a dependency with ops on the config mgmt tools.

    Pros: No API. Cons: File synchronization within coordination group is on deployer responsibility

Data model impact

REST API impact

Security impact

Pipeline impact

The use of the pipeline configuration will change from static to dynamic. This will affect the supported meters and their datapoint collection.

The impact to the system is expected to be minimal since changes to pipeline are expected to be low in frequency.

Other end user impact

Performance/Scalability Impacts

There could be a small window where message build-up occurs in oslo bus/ internal queues when the notification listeners are restarted.

Other deployer impact

Depending on the approach chosen, the deployer will have to synchronize pipeline file updates across multiple agent instances and then signal the agents to reload the configuration.

Developer impact

Implementation

Assignee(s)

Primary assignee:

rjaiswal

Other contributors:

TBD

Ongoing maintainer:

rjaiswal

Work Items

  • Implement timer to poll pipeline configuration file

  • Implement graceful termination of pollsters and listeners

  • Implement reconfiguring and reloading of pollsters in polling-agent

  • Implement reconfiguring and reloading of listeners in notification-agent

  • Unit Tests

  • Doc updates

Future lifecycle

We’ll want to continue to iterate on this for future cycles to gain additional functionality:

Use Tooz with publish-subscribe functionality to enable synchronization of pipeline configuration across multiple agent instances in a single coordination group.

Migration of event traits configuration

Dependencies

Relates to: https://blueprints.launchpad.net/ceilometer/+spec/dedicated-event-db https://review.openstack.org/#/c/119077/ (central and compute agents merge)

Testing

Add unit tests to exercise reloading of pipeline in agents

Documentation Impact

Update the relevant documentation about this new feature of reloading pipeline configuration

References

https://etherpad.openstack.org/p/liberty-ceilometer-pipeline-config

https://etherpad.openstack.org/p/configuration_via_data_store

https://review.openstack.org/#/c/171826/

https://wiki.openstack.org/wiki/Ceilometer/blueprints/Configuration-via-data-store

http://docs.openstack.org/developer/oslo.config/configopts.html#oslo_config.cfg.ConfigOpts.reload_config_files