TripleO needs a way to perform validations at various stages of the deployment.
TripleO deployments, and more generally all OpenStack deployments, are complex, error prone, and highly dependent on the environment. An appropriate set of tools can help engineers to identify potential problems as early as possible and fix them before going further with the deployment.
People have already developed such tools; however, they appear more like a random collection of scripts than a well-integrated solution within TripleO. We need to expose the validation checks from a library so they can be consumed from the GUI or CLI without distinction and integrate seamlessly into the TripleO deployment workflow.
We propose to extend the TripleO Overcloud Deployment Mistral workflow to include Actions for validation checks.
These actions will need at least to:
Running validations will be implemented in a workflow to ensure the nodes meet certain expectations. For example, a baremetal validation may require the node to boot on a ramdisk first.
A Mistral workflow execution can be started with the mistral execution-create command and stopped with the mistral execution-update command by setting the workflow status to either SUCCESS or ERROR.
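As a rough illustration of the two commands mentioned above, the following Python sketch only builds the command lines and does not run them. The helper names (start_command, stop_command), the execution ID placeholder, and the use of the --state option are assumptions made for this example; check `mistral help execution-update` on your installation before relying on them.

```python
import shlex

# States that, per the text above, stop a running workflow execution.
STOP_STATES = ("SUCCESS", "ERROR")


def start_command(workflow_name):
    """Build the argv for starting a workflow execution (hypothetical helper)."""
    return ["mistral", "execution-create", workflow_name]


def stop_command(execution_id, state):
    """Build the argv for stopping an execution (hypothetical helper).

    Assumes the client accepts --state; verify against your client version.
    """
    if state not in STOP_STATES:
        raise ValueError("state must be one of %s" % (STOP_STATES,))
    return ["mistral", "execution-update", "--state", state, execution_id]


if __name__ == "__main__":
    # The workflow name comes from the example in this spec;
    # the execution ID is a placeholder.
    name = "tripleo.validation.hardware.undercloudRootPartitionDiskSizeCheck"
    print(shlex.join(start_command(name)))
    print(shlex.join(stop_command("<execution-id>", "ERROR")))
```

Building the argument list separately from running it keeps the sketch free of side effects; the lists could be passed to subprocess.run on a host where the mistral client is installed.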
Every run of the workflow (workflow execution) is stored in Mistral’s DB and can be retrieved for later use. The workflow execution object contains all information about the workflow and its execution, including all output data and statuses for all the tasks composing the workflow.
By introducing a consistent naming convention for validation workflows, we can use workflow names to identify the stage at which each validation should run and to trigger all validations of a given stage (e.g. tripleo.validation.hardware.undercloudRootPartitionDiskSizeCheck).
Using the naming conventions, the user is also able to register a new validation workflow and add it to the existing ones.
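The naming convention can be illustrated with a minimal Python sketch. It assumes names have the form tripleo.validation.<stage>.<check>, which is inferred from the single example above rather than defined by this spec; the helper functions and every workflow name other than the hardware one are hypothetical.

```python
from collections import defaultdict

# Assumed common prefix for all validation workflows.
PREFIX = "tripleo.validation."


def parse_validation_name(name):
    """Return (stage, check) for a validation workflow name, else None."""
    if not name.startswith(PREFIX):
        return None
    parts = name[len(PREFIX):].split(".")
    # Expect exactly two non-empty components: stage and check name.
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def group_by_stage(names):
    """Map each stage to the validation workflow names registered for it."""
    stages = defaultdict(list)
    for name in names:
        parsed = parse_validation_name(name)
        if parsed is not None:
            stages[parsed[0]].append(name)
    return dict(stages)


workflow_names = [
    "tripleo.validation.hardware.undercloudRootPartitionDiskSizeCheck",
    "tripleo.validation.network.exampleCheck",  # hypothetical user-added check
    "tripleo.deployment.deploy",  # hypothetical non-validation workflow
]
by_stage = group_by_stage(workflow_names)
```

With this layout, triggering all validations of a stage amounts to looking up by_stage["hardware"], and a newly registered workflow is picked up as soon as its name follows the same pattern.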
One alternative is to ship a collection of scripts within TripleO to be run by engineers at different stages of the deployment. This solution is not optimal because it requires a lot of manual work and does not integrate with the UI.
Another alternative is to build our own API, but it would require significantly more effort to create and maintain. This topic has been discussed at length on the mailing list.
The whole point behind the validations framework is to permit running scripts on the nodes, thus providing access from the control node to the deployed nodes at different stages of the deployment. Special care needs to be taken to grant access to the target nodes using secure methods and ensure only trusted scripts can be executed from the library.
We expect reduced deployment time thanks to early issue detection.
Developers will need to keep the TripleO CI updated with changes, and will be responsible for fixing the CI as needed.
The work items required are:
All patches that implement these changes must pass CI and add additional tests as needed.
We are dependent upon the tripleo-mistral-deployment-library work.
The TripleO CI should be updated to test the updated tripleo-common library.
Mistral Actions and Workflows are largely self-documenting and can be easily introspected by running ‘mistral workflow-list’ or ‘mistral action-list’ on the command line. The updated library, however, will have to be well documented and meet OpenStack standards. Documentation will be needed in both the tripleo-common and tripleo-docs repositories.