Workflow Simplification

https://blueprints.launchpad.net/tripleo/+spec/workflow-simplification

The TripleO workflow is still too complex for many (most?) users to follow successfully. There are some fairly simple steps we can take to improve that situation.

Problem Description

The current TripleO workflow grew somewhat haphazardly out of a collection of bash scripts that originally made up instack-undercloud. These scripts started out life as primarily a proof of concept exercise to demonstrate that the idea was viable, and while the steps still work fine when followed correctly, it seems “when followed correctly” is too difficult today, at least based on the feedback I’m hearing from users.

Proposed Change

Overview

There seem to be a number of low-hanging fruit candidates for cleanup. In the order in which they appear in the docs, these would be:

  1. Node registration Why is this two steps? Is there ever a case where we would want to register a node but not configure it to be able to boot? If there is, is it a significant enough use case to justify the added step every time a user registers nodes?

    I propose that we configure boot on newly registered nodes automatically. Note that this will probably require us to also update the boot configuration when updating images, but again this is a good workflow improvement. Users are likely to forget to reconfigure their nodes’ boot images after updating them in Glance.

    Note

    This would not remove the openstack baremetal configure boot command for independently updating the boot configuration of Ironic nodes. In essence, it would just always call the configure boot command immediately after registering nodes so it wouldn’t be a mandatory step.

    This also means that the deploy ramdisk would have to be built and loaded into Glance before registering nodes, but our documented process already satisfies that requirement, and we could provide a –no-configure-boot param to import for cases where someone wanted to register nodes without configuring them.

  2. Flavor creation Nowhere in our documentation do we recommend or provide guidance on customizing the flavors that will be used for deployment. While it is possible to deploy solely based on flavor hardware values (ram, disk, cpu), in practice it is often simpler to just assign profiles to Ironic nodes and have scheduling done solely on that basis. This is also the method we document at this time.

    I propose that we simply create all of the recommended flavors at undercloud install time and assign them the appropriate localboot and profile properties at that time. These flavors would be created with the minimum supported cpu, ram, and disk values so they would work for any valid hardware configuration. This would also reduce the possibility of typos in the flavor creation commands causing avoidable deployment failures.

    These default flavors can always be customized if a user desires, so there is no loss of functionality from making this change.

  3. Node profile assignment This is not currently part of the standard workflow, but in practice it is something we need to be doing for most real-world deployments with heterogeneous hardware for controllers, computes, cephs, etc. Right now the documentation requires running an ironic node-update command specifying all of the necessary capabilities (in the manual case anyway, this section does not apply to the AHC workflow).

    os-cloud-config does have support for specifying the node profile in the imported JSON file, but to my knowledge we don’t mention that anywhere in the documentation. This would be the lowest of low-hanging fruit since it’s simply a question of documenting something we already have.

    We could even give the generic baremetal flavor a profile and have our default instackenv.json template include that[1], with a note that it can be overridden to a more specific profile if desired. If users want to change a profile assignment after registration, the node update command for ironic will still be available.

    1. For backwards compatibility, we might want to instead create a new flavor named something like ‘default’ and use that, leaving the old baremetal flavor as an unprofiled thing for users with existing unprofiled nodes.

Alternatives

tripleo.sh

tripleo.sh addresses the problem to some extent for developers, but it is not a viable option for real world deployments (nor should it be IMHO). However, it may be valuable to look at tripleo.sh for guidance on a simpler flow that can be more easily followed, as that is largely the purpose of the script. A similar flow codified into the client/API would be a good result of these proposed changes.

Node Registration

One option Dmitry has suggested is to make the node registration operation idempotent, so that it can be re-run any number of times and will simply update the details of any already registered nodes. He also suggested moving the bulk import functionality out of os-cloud-config and (hopefully) into Ironic itself.

I’m totally in favor of both these options, but I suspect that they will represent a significantly larger amount of work than the other items in this spec, so I think I’d like that to be addressed as an independent spec since this one is already quite large.

Security Impact

Minimal, if any. This is simply combining existing deployment steps. If we were to add a new API for node profile assignment that would have some slight security impact as it would increase our attack surface, but I feel even that would be negligible.

Other End User Impact

Simpler deployments. This is all about the end user.

Performance Impact

Some individual steps may take longer, but only because they will be performing actions that were previously in separate steps. In aggregate the process should take about the same time.

Other Deployer Impact

If all of these suggested improvements are implemented, it will make the standard deployment process somewhat less flexible. However, in the Proposed Change section I attempted to address any such new limitations, and I feel they are limited to the edgiest of edge cases that in most cases can still be implemented through some extra manual steps (which likely would have been necessary anyway - they are edge cases after all).

Developer Impact

There will be some changes in the basic workflow, but as noted above the same basic steps will be getting run. Developers will see some impact from the proposed changes, but as they will still likely be using tripleo.sh for an already simplified workflow it should be minimal.

Implementation

Assignee(s)

bnemec

Work Items

  • Configure boot on newly registered nodes automatically.
  • Reconfigure boot on nodes after deploy images are updated.
  • Remove explicit step for configure boot from the docs, but leave the actual function itself in the client so it can still be used when needed.
  • Create flavors at undercloud install time and move documentation on creating them manually to the advanced section of the docs.
  • Add a ‘default’ flavor to the undercloud.
  • Update the sample instackenv.json to include setting a profile (by default, the ‘default’ profile associated with the flavor from the previous step).

Dependencies

Nothing that I’m aware of.

Testing

As these changes are implemented, we would need to update tripleo.sh to match the new flow, which will result in the changes being covered in CI.

Documentation Impact

This should reduce the number of steps in the basic deployment flow in the documentation. It is intended to simplify the documentation.

References

Proposed change to create flavors at undercloud install time: https://review.openstack.org/250059 https://review.openstack.org/251555