Workflow Error Analysis¶
Launchpad blueprint:
https://blueprints.launchpad.net/mistral/+spec/mistral-error-analysis
This specification outlines the need for an error analysis command or method within Mistral.
Problem description¶
Currently there is no central mechanism or single command for determining the root cause of a failed Mistral workflow. The proposed functionality would give developers and operators a way to debug errors, whether they stem from syntax errors in the workbook or from actual bugs, by reporting the necessary information from the execution to the client.
Use Cases¶
The main use of this feature is investigation after a workflow run, including but not limited to OpenStack post-deployment analysis and general workflow run investigation.
Proposed change¶
Provide a command line interface and public API which the operator can use to trigger the analysis of errors.
The tables below are draft examples and are subject to change once reviews are complete.
‘mistral report-generate <workflow id>’
Field | Value |
---|---|
Workflow_name | my_workflow |
Workflow_ID | xxxxx-xxxx-xxx-xxxxxxx |
Workflow_State | [Error or Success] |
**Workflow_State_info | ***<task_name: cause> |
Task_name | my_task |
Task_ID | xxxxx-xxxx-xxx-xxxxxxx |
Task_State | [Error or Success] |
Task_State_info | <cause> |
‘mistral report-generate --include-trace <workflow id>’
Field | Value |
---|---|
Workflow_name | my_workflow |
Workflow_ID | xxxxx-xxxx-xxx-xxxxxxx |
Workflow_State | [Error or Success] |
**Workflow_State_info | ***<task_name: cause> |
Task_name | my_task |
Task_ID | xxxxx-xxxx-xxx-xxxxxxx |
Task_State | [Error or Success] |
Task_State_info | <cause> |
** State info would report <None> when no error is generated.
*** Task name and cause; the cause would be evaluated from an enum value.
**** The workflow traceback would report more verbose error output. This output could be controlled with a CLI switch, --include-trace. Without the flag, the operator would receive only the enum value with a brief description.
- example:
E101 – task <task name> contains syntax error
E120 – task <task name> missing input
E201 – action failed to complete
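The enum-based cause described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `ErrorCause`, the helper `render_cause`, and the `include_trace` parameter are hypothetical and not part of Mistral; only the three codes and their descriptions come from the examples in this spec.

```python
from enum import Enum


class ErrorCause(Enum):
    """Hypothetical error codes, copied from the examples in this spec."""

    E101 = "task {task} contains syntax error"
    E120 = "task {task} missing input"
    E201 = "action failed to complete"

    def describe(self, task="<task name>"):
        # Brief description, as shown when --include-trace is not given.
        return f"{self.name} - {self.value.format(task=task)}"


def render_cause(cause, task, trace=None, include_trace=False):
    """Return the text reported for a failed task.

    The traceback is appended only when it exists and the caller asked
    for it, mirroring the proposed --include-trace switch.
    """
    brief = cause.describe(task)
    if include_trace and trace:
        return f"{brief}\n{trace}"
    return brief


if __name__ == "__main__":
    print(render_cause(ErrorCause.E120, "my_task"))
```

Keeping the brief description on the enum member means the short and verbose reports share one source of wording, and the trace is purely additive.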
Alternatives¶
The current method of determining an error involves walking through the workflow execution manually to find what is in an error state:
- run ‘mistral task-list <workflow execution id>’ and see which tasks are in ERROR
- for each failed task execution, run ‘mistral action-execution-list’ and see which actions are in ERROR
- for each failed action, run ‘mistral action-execution-get-output <id>’ to see the description of the error
- for each failed task execution of type Workflow, find the sub-workflow execution ID and go back to the first bullet.
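The manual procedure above is essentially a recursive walk over executions. A minimal sketch of that walk is shown below, assuming a hypothetical nested dict in place of real CLI or API responses; the function name `find_root_causes` and the dict keys (`tasks`, `state`, `actions`, `output`, `sub_workflow`) are illustrative and do not reflect Mistral's actual data model. The counter only approximates how many separate commands an operator would have to issue.

```python
def find_root_causes(execution, calls=None):
    """Collect (task name, error output) pairs from a failed execution.

    `execution` is a hypothetical nested dict, not a real Mistral object.
    Returns the collected causes and the number of simulated commands.
    """
    if calls is None:
        calls = {"count": 0}
    causes = []
    calls["count"] += 1  # stands in for one `mistral task-list` call
    for task in execution["tasks"]:
        if task["state"] != "ERROR":
            continue
        if task.get("sub_workflow") is not None:
            # Task of type workflow: repeat the procedure on the sub-workflow.
            sub_causes, _ = find_root_causes(task["sub_workflow"], calls)
            causes.extend(sub_causes)
            continue
        calls["count"] += 1  # one `mistral action-execution-list` call
        for action in task["actions"]:
            if action["state"] == "ERROR":
                calls["count"] += 1  # one `action-execution-get-output` call
                causes.append((task["name"], action["output"]))
    return causes, calls["count"]


# Hypothetical example: one failed task that wraps a sub-workflow.
example_execution = {
    "tasks": [
        {"name": "ok_task", "state": "SUCCESS", "actions": []},
        {
            "name": "nested",
            "state": "ERROR",
            "sub_workflow": {
                "tasks": [
                    {
                        "name": "leaf",
                        "state": "ERROR",
                        "actions": [
                            {"state": "ERROR", "output": "connection refused"},
                        ],
                    },
                ],
            },
        },
    ],
}

if __name__ == "__main__":
    print(find_root_causes(example_execution))
```

Even this small example needs four simulated commands to reach one root cause, which is the overhead a single report command is meant to remove.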
Data model impact¶
None.
REST API impact¶
This is still in discussion.
One option is a separate REST API endpoint that builds reports on the current status of an execution and/or error analysis.
End user impact¶
The end user would have a newly documented method/function to call to start the error analysis.
Performance Impact¶
If this is implemented on the server side, the performance impact should be greatly reduced, because far fewer REST calls would be needed.
Deployer impact¶
This would provide additional information to help the operator correct errors in the deployment, or enough information to attach to a bug report so that developers can correct the offending source.
Implementation¶
Assignee(s)¶
- Primary assignee:
toure
- Other contributors:
rakhmerov
Work Items¶
Create new Mistral engine error analysis functionality.
Update python-mistralclient to include new API changes.
Update documentation to explain usage.
Create CI scripts/jobs to mimic error in workflows.
Dependencies¶
None.
Testing¶
Functional tests that imitate workflow failures and make sure that we get the right report.
References¶
None.