Avoid loading a pollster when its required environment is not ready, and
remove resources that cause continuous exceptions at polling time.
Currently, pollsters are used in the following way:
- They are defined via setup.cfg.
- Each pollster is loaded during initialization.
- Each pollster’s get_samples is called periodically to collect metrics.
This fixed process is not flexible enough to handle situations such as:
- A pollster depends on a specific environment, e.g. compute node pollsters
need the right hypervisor inspector, and ipmi pollsters need ipmitool
installed. Loading such a pollster unconditionally produces extra exceptions
or dummy samples, with a minor performance drop. A check at deployment time
cannot resolve all of these issues, because the environment changes
dynamically. The ideal solution is to check at loading time and not load the
pollster if its required environment is missing.
- A pollster gets loaded and runs well, then some polled resource becomes
unavailable, so the pollster fails to produce samples and probably throws
exceptions, with a performance drop. We need to detect such changes and then
remove this resource from polling. We also need to log a warning so that the
operator can track and handle it.
Provide infrastructure to get a pollster-specific response and then take
different actions.
For case 1, provide a function in the pollster constructor that raises a
specific exception if the required hardware is missing. When loading
extensions, add an on_load_failure_callback that checks for and suppresses
this exception, so that the extension is not loaded. A configuration list will
probably be needed to indicate which pollsters can be skipped.
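A minimal sketch of the case 1 flow, under stated assumptions. The names
ExtensionLoadError, IpmiPollster, CpuPollster, SKIPPABLE and load_pollsters are
hypothetical and not taken from the code base. load_pollsters is only a
stand-in for the real extension loader; the callback's argument order
(manager, entry point, exception) is modeled on stevedore's
on_load_failure_callback, with the entry point simplified to its name.

```python
import logging
import shutil

LOG = logging.getLogger(__name__)


class ExtensionLoadError(Exception):
    """Hypothetical exception: the required environment is missing."""


class IpmiPollster:
    """Hypothetical pollster that needs the ipmitool binary."""

    def __init__(self, which=shutil.which):
        # Environment check in the constructor, as proposed for case 1.
        if which("ipmitool") is None:
            raise ExtensionLoadError("ipmitool is not installed")


class CpuPollster:
    """Hypothetical pollster with no special requirements."""


# Hypothetical configuration list of pollsters that may be skipped.
SKIPPABLE = {"ipmi"}


def on_load_failure(manager, entrypoint_name, exc):
    """Suppress the expected load failure; re-raise anything else."""
    if isinstance(exc, ExtensionLoadError) and entrypoint_name in SKIPPABLE:
        LOG.warning("Skip loading pollster %s: %s", entrypoint_name, exc)
        return
    raise exc


def load_pollsters(factories, on_load_failure_callback):
    """Stand-in loader: build each pollster, report failures to the callback."""
    loaded = {}
    for name, factory in factories.items():
        try:
            loaded[name] = factory()
        except Exception as exc:
            on_load_failure_callback(None, name, exc)
    return loaded
```

With this shape, a pollster whose environment check fails is simply absent
from the loaded set, while a failure in a pollster that is not listed as
skippable still surfaces to the caller.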
For case 2, pollsters need to throw two different exceptions at run time:
- transient failure: treated like existing exceptions
- permanent failure: the affected resources are no longer polled by this
pollster
To achieve this, PollingTask catches the permanent failure and then blocks
polling for those resources.
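The case 2 handling could look roughly like the sketch below. It is an
illustration only: PollsterPermanentError, its fail_res_list attribute, the
blocked set, poll_once and DemoPollster are hypothetical names, and this
PollingTask is a simplified stand-in for the real class.

```python
import logging

LOG = logging.getLogger(__name__)


class PollsterPermanentError(Exception):
    """Hypothetical exception: the listed resources can no longer be polled."""

    def __init__(self, resources):
        super().__init__(resources)
        self.fail_res_list = list(resources)


class PollingTask:
    """Simplified stand-in that remembers permanently failed resources."""

    def __init__(self, pollster, resources):
        self.pollster = pollster
        self.resources = list(resources)
        self.blocked = set()

    def poll_once(self):
        # Only offer resources that have not failed permanently.
        candidates = [r for r in self.resources if r not in self.blocked]
        try:
            return list(self.pollster.get_samples(candidates))
        except PollsterPermanentError as err:
            # Permanent failure: warn the operator and stop polling these.
            LOG.warning("Stop polling resources %s", err.fail_res_list)
            self.blocked.update(err.fail_res_list)
            return []
        except Exception:
            # Transient failure: log it and retry on the next cycle.
            LOG.exception("Transient failure, will retry next cycle")
            return []


class DemoPollster:
    """Hypothetical pollster used only to exercise the sketch."""

    def __init__(self, dead):
        self.dead = set(dead)

    def get_samples(self, resources):
        failed = [r for r in resources if r in self.dead]
        if failed:
            raise PollsterPermanentError(failed)
        return [(r, 1) for r in resources]
```

In this sketch the cycle that hits a permanent failure yields no samples, and
later cycles poll only the remaining resources. Whether samples for healthy
resources should still be emitted in the failing cycle is a design choice the
sketch leaves open.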
Other end user impact
This BP improves performance by avoiding loading unnecessary pollsters and
removing unavailable resources, which eliminates continuous exceptions.
Other deployer impact
- Primary assignee:
- Other contributors:
- Ongoing maintainer:
- For pollsters at loading time
  - Add an environment check function in the pollster constructor
  - Add a new on_load_failure_callback to avoid loading a failed extension
- For pollsters at running time
  - Modify the required pollsters to throw a permanent failure exception
  - Modify PollingTask to skip failed resources
Once this feature is enabled, testing and bug fixing are needed in the next 2
releases to avoid