Deploy Templates

https://storyboard.openstack.org/#!/story/1722275

This builds on the deployment steps spec, such that it is possible to request a particular set of deployment steps by requesting a set of traits.

When Ironic is used with Nova, this allows the end user to pick a Flavor, that requires a specific set of traits, that Ironic turns into a request for a specific set of deployment steps in Ironic.

Problem description

One node can be configured in many ways. Different user workloads require the node to be configured in different ways. As such, operators would like a single pool of hardware to be reconfigured to match each users requirements, rather than attempting to guess how many of each configurations are required in advance.

In this spec, we are focusing on reconfiguring regular hardware to match the needs of each users workload. We are not considering composable hardware.

When creating a Nova instance with Ironic as the backend, there is currently no way to influence the deployment steps used by Ironic to provision the node.

Example Use Case

Consider a bare metal node that has three disks. Different workloads may require different RAID configurations. Operators want to test their hardware and determine the specific set of RAID configurations that work best, and offer that choice to users of Nova, so they can pick the configuration that best matches their workload.

Proposed change

Context: Deployment Steps Framework

This spec depends on the deployment steps framework spec. The deployment steps framework provides the concept of a deployment step that may be executed when a node is deployed.

It is assumed that there is a default set of steps that an operator configures to happen by default on every deploy. Each step task has a priority defined, to determine the order that that step runs relative to other enabled deploy steps. Configuration options and hard-coded priorities define if a task runs by default or not. The priority says when a task runs if the task should run during deployment.

Deploy Templates

This spec introduces the concept of a Deploy Template. It is a mapping from a valid trait name for the node to one or more Deployment Steps and all the arguments that should be given to those deployment steps. There is a new API added to Ironic for Operators to specify these Deploy Templates.

To allow a Deploy Template for a given Ironic node, you add the trait name of the deploy template to the traits list on each Ironic node that needs the deploy template enabled. The validation of the node must fail if any of the enabled Deploy Templates are not supported on that node.

It is worth noting that all traits set on the node are synchronised to Nova’s placement API. In turn, should a user request a flavor that requires a trait (that may or may not map to a deploy template) only nodes that have the trait set will be offered as candidates by Nova’s Placement API.

To request the specified Deploy Template when you provision a particular Ironic node, the corresponding trait is added to a list in node.instance_info under the key of traits. This is what Nova’s ironic virt driver already does for any required trait in the flavor’s extra specs for that instance. Again, Ironic already validates that the traits in node.instance_info are a subset of the traits that the Operator has set on the Ironic node, via the new node-traits API.

During the provisioning process Ironic checks the list of traits set in node.instance_info and checks if they match any Deploy Templates. The list of matching Deploy Templates are then used to extend the list of deployment steps for this particular provision operation. As already defined in the deployment steps framework, this is then combined with the list of deployment steps from what is configured to happen by default for all builds.

The order in which the steps are executed will be defined by the priority of each step. If the code for a deploy step defines a deploy priority of 0, and that is not changed by a configuration option, that deploy step does not get executed by default. If a deploy template specifies a priority (this is required if the code has a default priority of 0), this overrides both the code default and any configuration override.

It is acceptable to request the same deploy step more than once. This could be done to execute a deploy step multiple times with different arguments, for example to partition multiple disks.

Any deploy step in a requested deploy template will override the default arguments and priority for that deploy step. 0 is the only priority override that can be set for any of the core deploy steps, i.e. you can only disable the core step, you can’t change the order of its execution.

Trait names used in Deploy Templates should be unique - no two deploy templates should specify the same trait name.

In summary, you can use traits to specify additional deploy steps, by the mapping between traits and deploy steps specified in the new Deploy Templates API.

Current Limitations

When mapping this back to Nova integration, currently there would need to be a flavor for each of these combinations of traits and resource_class. Longer term Nova is expected to offer the option of a user specifying an override trait on a boot request, based on what the flavor says is possible. This spec has no impact on the Nova ironic virt driver beyond what is already implemented to support node-traits.

Example

Lets now look at how we could request a particular deploy template via Nova.

FlavorVMXMirror has these extra specs:

  • resource:CUSTOM_COMPUTE_A = 1

  • trait:CUSTOM_CLASS_A = required

  • trait:CUSTOM_BM_CONFIG_BIOS_VMX_ON = required

  • trait:CUSTOM_BM_CONFIG_RAID_DISK_MIRROR = required

FlavorNoVMXStripe has these extra specs:

  • resource:CUSTOM_COMPUTE_A = 1

  • trait:CUSTOM_CLASS_A = required

  • trait:CUSTOM_BM_CONFIG_BIOS_VMX_OFF = required

  • trait:CUSTOM_BM_CONFIG_RAID_DISK_STRIPE = required

It’s possible the operator has set all of the Ironic nodes with COMPUTE_A as the resource class to have all of these traits assigned:

  • CUSTOM_BM_CONFIG_BIOS_VMX_ON

  • CUSTOM_BM_CONFIG_BIOS_VMX_OFF

  • CUSTOM_OTHER_TRAIT_I_AM_USUALLY_IGNORED

  • CUSTOM_BM_CONFIG_RAID_DISK_MIRROR

  • CUSTOM_BM_CONFIG_RAID_DISK_STRIPE

The Operator has also defined the following deploy templates:

{
  "deploy-templates": [
    {
      "name": "CUSTOM_BM_CONFIG_RAID_DISK_MIRROR",
      "steps": [
        {
          "interface": "raid",
          "step": "create_configuration",
          "args": {
            "logical_disks": [
              {
                "size_gb": "MAX",
                "raid_level": "1",
                "is_root_volume": true
              }
            ],
            "delete_configuration": true
          },
          "priority": 10
        }
      ]
    },
    {
      "name": "CUSTOM_BM_CONFIG_RAID_DISK_STRIPE",
      "steps": [
        {
          "interface": "raid",
          "step": "create_configuration",
          "args": {
            "logical_disks": [
              {
                "size_gb": "MAX",
                "raid_level": "0",
                "is_root_volume": true
              }
            ],
            "delete_configuration": true
          },
          "priority": 10
        }
      ]
    },
    {
      "name": "CUSTOM_BM_CONFIG_BIOS_VMX_ON",
      "steps": [...]
    },
    {
      "name": "CUSTOM_BM_CONFIG_BIOS_VMX_OFF",
      "steps": [...]
    }
  ]
}

When a Nova instance is created with FlavorVMXMirror, the required traits for that flavor are set on node.instance_info['traits'] such that Ironic adds the deploy steps defined in CUSTOM_BM_CONFIG_BIOS_VMX_ON and CUSTOM_BM_CONFIG_RAID_DISK_MIRROR, and the node is appropriately configured for workloads that want that specific flavor.

Alternatives

Alternative approach

This design solves two problems:

  1. I want to request some custom configuration to be applied to my bare metal server during provisioning.

  2. Ensure that my instance is scheduled to a bare metal node that supports the requested configuration.

As with capabilities, the proposed design uses a single field (traits) to encode configuration and scheduling information. An alternative approach could separate these two concerns.

Deploy templates could be requested by a name (not necessarily a trait) or UUID as a nova flavor extra_spec, and pushed to a deploy_templates field in the ironic node’s instance_info field by the nova virt driver. Ironic would then apply the requested deploy templates during provisioning.

If some influence in the scheduling process is required, this could be provided by traits, but this would be a separate concern.

Adapting the earlier example:

FlavorVMXMirror has these extra specs:

  • resource:CUSTOM_COMPUTE_A = 1

  • trait:CUSTOM_BM_CONFIG_BIOS_VMX_ON = required

  • trait:CUSTOM_BM_CONFIG_RAID_DISK_MIRROR = required

  • deploy_template:BIOS_VMX_ON=<?>

  • deploy_template:BIOS_RAID_DISK_MIRROR=<?>

Only ironic nodes supporting the CUSTOM_BM_CONFIG_BIOS_VMX_ON and CUSTOM_BM_CONFIG_RAID_DISK_MIRROR traits would be scheduled to, and the nova virt driver would set instance_info.deploy_templates to BIOS_VMX_ON,BIOS_RAID_DISK_MIRROR.

There are some benefits to this alternative approach:

  • It would automatically support cases beyond the simple one trait mapping to one deploy template case we have here. For example, to support deploy template X, features Y and Z must be supported by the node (without combinatorial trait explosions).

  • In isolation, the configuration mechanism is conceptually simpler - the flavor specifies a deploy template directly.

  • It would work in standalone ironic installs without introducing concepts from placement.

  • We don’t overload the concept of traits for carrying configuration information.

There are also some drawbacks:

  • Additional complexity for users and operators that now need to apply both traits and deploy templates to flavors.

  • Less familiar for users of capabilities.

  • Having flavors that specify resources, traits and deploy templates in extra_specs could leave operators and users scratching their heads.

Extensions

This spec attempts to specify the minimum viable feature that builds on top of the deployment steps framework specification. As such, there are many possible extensions to this concept that are not being included:

  • While you can use standard traits as names of the deploy templates, it is likely that many operators will be forced into using custom traits for most of their deploy templates. We could better support the users of standard traits if we added a list of traits associated with each deploy template, in addition to the trait based name. This list of traits will act as an alias for the name of the deploy template, but this alias may also be used by many other deploy templates. The node validate will fail if for any individual node one of traits set maps to multiple deploy templates. To disambiguate which deploy template is requested, you can look at what deploy template names are in the chosen node’s trait list. For each deploy template you look at any other traits that can be used to trigger that template, eventually building up a trait to deploy template mapping for each trait set on the node (some traits will not map to any deploy template). That can be used to detect if any of the traits on the node map to multiple deploy templates, causing the node validate to fail.

  • For some operators, they will end up creating a crazy number of flavors to cover all the possible combinations of hardware they want to offer. It is hoped Nova will eventually allow operators to have flavors that list possible traits, and a default set of traits, such that end users can request the specific set of traits they require in addition to the chosen flavor.

  • While ironic inspector can be used to ensure each node is given an appropriate set of traits, it feels error prone to add so many traits to each Ironic node. It is hoped when a concept of node groups is added, traits could be applied to a group of nodes instead of only applying traits to individual nodes (possibly in a similar way to host aggregates in Nova). One suggestion was to use the Resource Class as a possible grouping, but that is only a very small part of the more general issue of groups nodes to physical networks, routed network segments, power distribution groups, all mapping to different ironic conductors, etc.

  • There were discussions about automatically detecting which Deploy Templates each of the nodes supported. However most operators will want to control what is available to only the things they wish to support.

Data model impact

Two new database tables will be added for deploy templates:

CREATE TABLE deploy_templates (
    id INT(11) NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) CHARACTER SET utf8 NOT NULL,
    uuid varchar(36) DEFAULT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY `uniq_deploy_templaes0uuid` (`uuid`),
    UNIQUE KEY `uniq_deploy_templaes0name` (`name`),
)

CREATE TABLE deploy_template_steps (
    deploy_template_id INT(11) NOT NULL,
    interface VARCHAR(255) NOT NULL,
    step VARCHAR(255) NOT NULL,
    args TEXT NOT NULL,
    priority INT NOT NULL,
    KEY `deploy_template_id` (`deploy_template_id`),
    KEY `deploy_template_steps_interface_idx` (`interface`),
    KEY `deploy_template_steps_step_idx` (`step`),
    CONSTRAINT `deploy_template_steps_ibfk_1` FOREIGN KEY (`deploy_template_id`) REFERENCES `deploy_templates` (`id`),
)

The deploy_template_steps.args column is a JSON-encoded object of step arguments, JsonEncodedDict.

New ironic.objects.deploy_template.DeployTemplate and ironic.objects.deploy_template_step.DeployTemplateStep objects will be added to the object model. The deploy template object will provide support for looking up a list of deploy templates that match any of a list of trait names.

State Machine Impact

No impact beyond that already specified in the deploy steps specification.

REST API impact

A new REST API endpoint will be added for deploy templates, hidden behind a new API microversion. The endpoint will support standard CRUD operations.

In the following API, a UUID or trait name is accepted for a deploy template’s identity.

List all

List all deploy templates:

GET /v1/deploy-templates

Request: empty

Response:

{
  "deploy-templates": [
    {
      "name": "CUSTOM_BM_CONFIG_RAID_DISK_MIRROR",
      "steps": [
        {
          "interface": "raid",
          "step": "create_configuration",
          "args": {
            "logical_disks": [
              {
                "size_gb": "MAX",
                "raid_level": "1",
                "is_root_volume": true
              }
            ],
            "delete_configuration": true
          },
          "priority": 10
        }
      ],
      "uuid": "8221f906-208b-44a5-b575-f8e8a59c4a84"
    },
    {
      ...
    }
  ]
}

Response codes: 200, 400

Policy: admin or observer.

Show one

Show a single deploy template:

GET /v1/deploy-templates/<deploy template ident>

Request: empty

Response:

{
  "name": "CUSTOM_BM_CONFIG_RAID_DISK_MIRROR",
  "steps": [
    {
      "interface": "raid",
      "step": "create_configuration",
      "args": {
        "logical_disks": [
          {
            "size_gb": "MAX",
            "raid_level": "1",
            "is_root_volume": true
          }
        ],
        "delete_configuration": true
      },
      "priority": 10
    }
  ],
  "uuid": "8221f906-208b-44a5-b575-f8e8a59c4a84"
}

Response codes: 200, 400, 404

Policy: admin or observer.

Create

Create a deploy template:

POST /v1/deploy-templates

Request:

{
  "name": "CUSTOM_BM_CONFIG_RAID_DISK_MIRROR",
  "steps": [
    {
      "interface": "raid",
      "step": "create_configuration",
      "args": {
        "logical_disks": [
          {
            "size_gb": "MAX",
            "raid_level": "1",
            "is_root_volume": true
          }
        ],
        "delete_configuration": true
      },
      "priority": 10
    }
  ],
}

Response: as for show one.

Response codes: 201, 400, 409

Policy: admin.

Update

Update a deploy template:

PATCH /v1/deploy-templates/{deploy template ident}

Request:

[
  {
    "op": "replace",
    "path": "/name"
    "value": "CUSTOM_BM_CONFIG_RAID_DISK_MIRROR"
  },
  {
    "op": "replace",
    "path": "/steps"
    "value": [
      {
        "interface": "raid",
        "step": "create_configuration",
        "args": {
          "logical_disks": [
            {
              "size_gb": "MAX",
              "raid_level": "1",
              "is_root_volume": true
            }
          ],
          "delete_configuration": true
        },
        "priority": 10
      }
    ]
  }
]

Response: as for show one.

Response codes: 200, 400, 404, 409

Policy: admin.

The name and steps fields can be updated. The uuid field cannot.

Delete

Delete a deploy template:

DELETE /v1/deploy-templates/{deploy template ident}

Request: empty

Response: empty

Response codes: 204, 400, 404

Policy: admin.

Client (CLI) impact

“ironic” CLI

None

“openstack baremetal” CLI

In each of the following commands, a UUID or trait name is accepted for the deploy template’s identity.

For the --steps argument, either a path to a file containing the JSON data or - is required. If - is passed, the JSON data will be read from standard input.

List deploy templates:

openstack baremetal deploy template list

Show a single deploy template:

openstack baremetal deploy template show <deploy template ident>

Create a deploy template:

openstack baremetal deploy template create --name <trait> --steps <deploy steps>

Update a deploy template:

openstack baremetal deploy template set <deploy template ident> [--name <trait] [--steps <deploy steps>]

Delete a deploy template:

openstack baremetal deploy template delete <deploy template ident>

In these commands, <deploy steps> are in JSON format and support the same input methods as clean steps - string, file or standard input.

RPC API impact

None

Driver API impact

None

Nova driver impact

Existing traits integration is enough, only now the selected traits on boot become more important.

Ramdisk impact

None

Security impact

Allowing the deployment process to be customised via deploy templates could open up security holes. These risks are mitigated, as seen through the following observations:

  • Only admins can define the set of allowed traits for each node.

  • Only admins can define the set of requested traits for each Nova flavor, and allow access to that flavor for other users.

  • Only admins can create or update deploy templates via the API.

  • Deploy steps referenced in deploy templates are defined in driver code.

Other end user impact

Users will need to be able to discover what each Nova flavor does in terms of deployment customisation. Beyond checking requested traits and cross-referencing with the ironic deploy templates API, this is deemed to be out of scope. Operators should provide sufficient documentation about the properties of each flavor. The ability to look up a deploy template by trait name should help here.

Scalability impact

Increased activity during deployment could have a negative impact on the scalability of ironic.

Performance Impact

Increased activity during deployment could have a negative impact on the performance of ironic, including increasing the time required to provision a node.

Other deployer impact

Deployers will need to ensure that Nova flavors have required traits set appropriately.

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Mark Goddard (mgoddard)

Other contributors:

  • Dmitry Tantsur (dtantsur)

  • Ruby Loo (rloo)

Work Items

  • Add DB tables and objects for deploy templates

  • Write code to map traits to deploy templates

  • Extend node validation to check all deploy templates are valid

  • Add API to add deploy templates

  • Extend CLI to support above API

  • Write tests

Dependencies

Testing

Unit tests will be added to ironic. Tempest API tests will exercise the deploy templates CRUD API.

Upgrades and Backwards Compatibility

The deploy steps API endpoint will be hidden behind a new API version.

During normal operation when the ironic conductor is not pinned, deploy templates will be used to add deploy steps during node provisioning, even if the caller of the node state API uses a microversion that does not support deploy templates.

During an upgrade when the ironic conductor is pinned, deploy templates will not be used to add deploy steps during node provisioning.

Documentation Impact

  • Admin guide on how to configure Nova flavors and deploy templates

  • Update API ref

  • Update CLI docs

References