The TripleO GUI currently has no way to persist logging information.
The TripleO GUI is a web application without its own dedicated backend. As such, any and all client-side errors are lost when the End User reloads the page or navigates away from the application. When things go wrong, the End User is unable to retrieve client-side logs because this information is not persisted.
I propose that we use Zaqar as a persistence backend for client-side logging. At present, the web application is already communicating with Zaqar using websockets. We can use this connection to publish new messages to a dedicated logging queue.
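As a rough illustration of what publishing a log entry to the dedicated queue could look like, the sketch below builds the JSON frame that would be sent over the existing websocket. The frame layout (`action`, `headers`, `body`) is modelled on Zaqar's websocket request format and should be verified against the deployed Zaqar version. The `build_log_frame` helper, the `level`/`message` payload fields, and the example project ID are hypothetical.

```python
import json
import uuid

# Queue name taken from the proposal; the TTL mirrors the one-hour message TTL.
LOGGING_QUEUE = "tripleo-ui-logging"
MESSAGE_TTL_SECONDS = 3600


def build_log_frame(level, text, client_id, project_id, queue_name=LOGGING_QUEUE):
    """Return a JSON string asking Zaqar to post one log message.

    The payload shape inside "messages" (level + message) is an assumption
    made for this sketch, not something the proposal defines.
    """
    request = {
        "action": "message_post",
        "headers": {"Client-ID": client_id, "X-Project-ID": project_id},
        "body": {
            "queue_name": queue_name,
            "messages": [
                {
                    "ttl": MESSAGE_TTL_SECONDS,
                    "body": {"level": level, "message": text},
                }
            ],
        },
    }
    return json.dumps(request)


# Hypothetical usage: the resulting string would be sent over the open websocket.
frame = build_log_frame(
    "error", "Failed to load plans", str(uuid.uuid4()), "example-project-id"
)
```

In the GUI itself the equivalent frame would be assembled in client-side code; Python is used here only to show the message structure.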
Zaqar messages have a TTL of one hour, so once every thirty minutes Mistral will query Zaqar using a cron trigger and retrieve all messages from the tripleo-ui-logging queue. Mistral will then look for a file called tripleo-ui-log in Swift. If this file exists, Mistral will check its size. If the size exceeds a predetermined limit (e.g. 10MB), Mistral will rename it to tripleo-ui-log-<timestamp>, and create a new file in its place. The file will then receive the messages from Zaqar, one per line. Once we reach, let’s say, a hundred archives (about 1GB), we can start dropping old data in order to prevent unnecessary data accumulation.
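The rotation logic described above can be sketched as follows. This is a minimal illustration, not the proposed Mistral workflow: a plain dictionary stands in for the Swift container, the timestamp is assumed to be integer epoch seconds, and the archives dropped first are assumed to be the oldest ones. The function name `append_with_rotation` and its parameters are hypothetical.

```python
import time

LOG_NAME = "tripleo-ui-log"
MAX_LOG_BYTES = 10 * 1024 * 1024  # the 10MB example limit from the text
MAX_ARCHIVES = 100                # roughly 1GB of archives


def append_with_rotation(store, messages, now=None,
                         max_bytes=MAX_LOG_BYTES, max_archives=MAX_ARCHIVES):
    """Append messages to the active log, rotating and pruning as needed.

    `store` is a dict mapping object names to bytes; it is only a stand-in
    for a Swift container in this sketch.
    """
    timestamp = int(time.time() if now is None else now)
    current = store.get(LOG_NAME, b"")

    # Rotate: archive the oversized file and start a fresh one.
    if len(current) > max_bytes:
        store[f"{LOG_NAME}-{timestamp}"] = current
        current = b""

    # Write the drained queue messages, one per line.
    new_lines = "".join(message + "\n" for message in messages)
    store[LOG_NAME] = current + new_lines.encode("utf-8")

    # Prune: keep only the newest `max_archives` archives.
    archives = sorted(
        (name for name in store if name.startswith(LOG_NAME + "-")),
        key=lambda name: int(name.rsplit("-", 1)[1]),
    )
    for name in archives[:max(0, len(archives) - max_archives)]:
        del store[name]
    return store


# Small demonstration with tiny limits so rotation is easy to see.
demo = {LOG_NAME: b"x" * 20}
append_with_rotation(demo, ["first", "second"], now=1000, max_bytes=10, max_archives=1)
```

A real implementation would replace the dictionary operations with Swift object reads, copies, and deletes, and would need to handle two rotations landing on the same timestamp.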
To view the logging data, we can ask Swift for the 10 most recent files with a prefix of tripleo-ui-log. These files can be presented in the GUI for download. Should the user require, we can present a “View more” link that will display the rest of the collected files.
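One possible way to pick the most recent log files from a container listing is sketched below. It assumes the archive suffix is an integer epoch timestamp and treats the unsuffixed tripleo-ui-log file as the newest; both are assumptions, as is the `latest_log_files` helper name. The input is simply a list of object names, however they were obtained from Swift.

```python
import re

# Matches the active log and archives named tripleo-ui-log-<integer timestamp>.
LOG_PATTERN = re.compile(r"^tripleo-ui-log(?:-(\d+))?$")


def latest_log_files(object_names, limit=10):
    """Return up to `limit` log object names, newest first."""
    dated = []
    for name in object_names:
        match = LOG_PATTERN.match(name)
        if match is None:
            continue  # ignore unrelated objects in the container
        # The active file has no timestamp, so sort it ahead of every archive.
        stamp = int(match.group(1)) if match.group(1) else float("inf")
        dated.append((stamp, name))
    dated.sort(reverse=True)
    return [name for _, name in dated[:limit]]


listing = ["tripleo-ui-log-100", "tripleo-ui-log", "other", "tripleo-ui-log-300"]
newest = latest_log_files(listing, limit=2)
```

The “View more” link would then correspond to calling the same helper with a larger limit.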
None at this time
There is a chance of logging sensitive data. I propose that we apply some common scrubbing mechanism to the messages before they are stored in Swift.
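A scrubbing step could be as simple as the sketch below, which masks the values of a few sensitive-looking keys before a message is written to Swift. The key list, the `***` mask, and the `scrub` function are illustrative assumptions; the proposal does not specify which mechanism would be used, and a pattern-based filter like this one will not catch every form of sensitive data.

```python
import re

# Hypothetical list of key names whose values should never reach the log.
SENSITIVE_KEYS = ("password", "token", "secret")

# Matches key=value and "key": "value" style pairs, case-insensitively.
_SENSITIVE = re.compile(
    r'("?(?:%s)"?\s*[:=]\s*)("[^"]*"|\S+)' % "|".join(SENSITIVE_KEYS),
    re.IGNORECASE,
)


def scrub(message):
    """Replace the value of any sensitive key in `message` with a mask."""
    return _SENSITIVE.sub(lambda match: match.group(1) + "***", message)


scrubbed = scrub("login failed password=hunter2 user=admin")
```

Running the filter on the undercloud side, just before the write to Swift, would keep the policy in one place regardless of what the client sends.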
Sending additional messages over an existing websocket connection should have a negligible performance impact on the web application. Likewise, running a cron task in Mistral every thirty minutes shouldn’t impose a significant burden on the undercloud machine.
Developers should also benefit from having a centralized logging system in place as a means of improving productivity when debugging.
We can write unit tests for the code that handles sending messages over the websocket connection. We might be able to write an integration smoke test that will ensure that a message is received by the undercloud. We can also add some testing code to tripleo-common to cover the logic that drains the queue, and publishes the log data to Swift.
We need to document the default name of the Zaqar queue, the maximum size of each log file, and how many log files can be stored at most. On the End User side, we should document the fact that a GUI-oriented log is available, and the way to get it.