Clean up clusters that are in non-final state for a long time

https://blueprints.launchpad.net/sahara/+spec/periodic-cleanup

This spec introduces a periodic task that cleans up old clusters stuck in a non-final state.

Problem description

Currently, a sahara cluster can become stuck for various reasons (e.g. if the sahara service was restarted during provisioning, or neutron failed to assign a floating IP). Such clusters can hold resources for a long time. This can happen in different tenants, and it is hard to check for such conditions manually.

Related bug: https://bugs.launchpad.net/sahara/+bug/1185909

Proposed change

Add a “cleanup_time_for_nonfinal_clusters” parameter to the “periodic” section of the configuration.

Based on this setting, a periodic task will search for clusters that are in a non-final state and have not been updated for the given time, and clean them up.

The term “non-final” covers all cluster states except “Active” and “Error”.

The “cleanup_time_for_nonfinal_clusters” parameter is specified in hours. A non-positive value indicates that the cleanup option is disabled.

The default value will be 0 to preserve backward compatibility (users do not expect all their non-final clusters to be deleted after an upgrade).
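
As a rough illustration of the option semantics described above, the following sketch reads the value from INI-style text using only the Python standard library. Sahara itself handles configuration through oslo.config; the helper name `read_cleanup_hours` and the sample value of 12 are assumptions made for this example and are not part of the proposed implementation.

```python
import configparser

# Sample configuration text; the section and option names come from this spec,
# the value 12 is an arbitrary example.
SAMPLE_CONF = """
[periodic]
cleanup_time_for_nonfinal_clusters = 12
"""


def read_cleanup_hours(conf_text):
    """Return the cleanup threshold in hours, or None if cleanup is disabled.

    Hypothetical helper, for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to 0 (disabled) when the section or option is absent.
    hours = parser.getint(
        "periodic", "cleanup_time_for_nonfinal_clusters", fallback=0)
    # Non-positive values mean the cleanup is turned off.
    return hours if hours > 0 else None
```

With this helper, an unset option and an explicit 0 behave the same way, which matches the backward-compatible default: nothing is deleted unless the deployer opts in.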

The ‘updated_at’ column of the ‘clusters’ table will be used to determine the time of the last change. This is not 100% accurate, but it is good enough: the field is updated each time the cluster status changes.
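
The selection rule can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the actual periodic task: the `Cluster` dataclass stands in for a database row, and `find_stale_clusters` is a hypothetical name. Only the “Active”/“Error” rule, the hour-based threshold, and the use of ‘updated_at’ come from this spec.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Per this spec, every status other than these two counts as non-final.
FINAL_STATUSES = {"Active", "Error"}


@dataclass
class Cluster:
    """Minimal stand-in for a row of the 'clusters' table (illustrative)."""
    name: str
    status: str
    updated_at: datetime


def find_stale_clusters(clusters, cleanup_hours, now):
    """Return non-final clusters not updated within the last cleanup_hours."""
    if cleanup_hours <= 0:
        # A non-positive threshold disables the cleanup entirely.
        return []
    cutoff = now - timedelta(hours=cleanup_hours)
    return [c for c in clusters
            if c.status not in FINAL_STATUSES and c.updated_at < cutoff]
```

Passing `now` explicitly keeps the check deterministic and easy to exercise by hand; a real periodic task would presumably use the current UTC time and then delete each returned cluster.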

Alternatives

Add this functionality to an external service (e.g. Blazar).

Data model impact

None.

REST API impact

None.

Other end user impact

None.

Deployer impact

None.

Developer impact

None.

Sahara-image-elements impact

None.

Sahara-dashboard / Horizon impact

None.

Implementation

Assignee(s)

Primary assignee:

alazarev (Andrew Lazarev)

Other contributors:

None

Work Items

  • Implement feature

  • Document feature

Dependencies

None.

Testing

Manually.

Documentation Impact

The new configuration option needs to be documented.

References