Remove merge.py from TripleO Heat Templates

https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-remove-mergepy

merge.py is where we’ve historically accumulated the technical debt for our Heat templates [0] with the intention of migrating away from it when Heat meets our templating needs.

Its main functionality includes combining smaller template snippets into a single template describing the full TripleO deployment, merging certain resources together to reduce duplication while keeping the snippets themselves functional as standalone templates, and supporting manual scaling of Heat resources.

This spec describes the changes necessary to move towards templates that do not depend on merge.py. We will use native Heat features where we can and document the rest, possibly driving new additions to the Heat template format.

It is largely based on the April 2014 discussion in openstack-dev [1].

Problem Description

Because merge.py is mostly undocumented, our templates are difficult for newcomers to understand or modify (even newcomers already familiar with Heat).

merge.py has always been considered a short-term measure, and Heat can now provide most of what we need in our templates.

Proposed Change

We will start by making small, correctness-preserving changes to our templates and merge.py that move us onto more Heat-native features. Where we cannot make a change for some reason, we will file a bug against Heat and work with the Heat team to unblock the process.

Once we reach a point where we have to make large changes to the structure of our templates, we will split them off into new files and enable them in our CI as parallel implementations.

Once we are confident that the new templates fulfill the same requirements as the original ones, we will deprecate the old ones, deprecate merge.py and switch to the new ones as the default.

The list of action items necessary for the full transition is below.

1. Remove the custom resource types

TripleO Heat templates and merge.py carry two custom types that (after the move to software config [8], [9]) are no longer used for anything:

  • OpenStack::ImageBuilder::Elements
  • OpenStack::Role

We will drop them from the templates and deprecate them in the merge tool.
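Before dropping them, a quick check that no template still references either type can be sketched in Python. This is a hypothetical helper (not part of merge.py), assuming the template has already been parsed into a dict; it accepts both the CFN (Resources/Type) and HOT (resources/type) spellings:

```python
from typing import Any

# The two custom types this spec removes.
DEPRECATED_TYPES = {"OpenStack::ImageBuilder::Elements", "OpenStack::Role"}


def find_deprecated_resources(template: dict[str, Any]) -> list[str]:
    """Return the names of resources that still use a deprecated custom type."""
    # CFN templates use "Resources"/"Type"; HOT templates use "resources"/"type".
    resources = template.get("Resources") or template.get("resources") or {}
    found = []
    for name, definition in resources.items():
        resource_type = definition.get("Type") or definition.get("type")
        if resource_type in DEPRECATED_TYPES:
            found.append(name)
    return sorted(found)
```

An empty list for every template means the types can be dropped without changing the merged output.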

2. Remove combining whitelisted resource types

If we have two AWS::AutoScaling::LaunchConfiguration resources with the same name, merge.py will combine their Properties and Metadata. Our templates are no longer using this after the software-config update.
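merge.py combines these resources across the snippets it is given. A hypothetical Python check that the templates really no longer rely on this behaviour, assuming each snippet is parsed into a dict, is to list the LaunchConfiguration names defined in more than one snippet:

```python
from collections import Counter
from typing import Any

LAUNCH_CONFIG = "AWS::AutoScaling::LaunchConfiguration"


def shared_launch_configs(snippets: list[dict[str, Any]]) -> list[str]:
    """Names of LaunchConfiguration resources defined in more than one snippet.

    These are the resources merge.py would combine; an empty result means
    nothing would be combined.
    """
    counts = Counter()
    for snippet in snippets:
        for name, definition in snippet.get("Resources", {}).items():
            if definition.get("Type") == LAUNCH_CONFIG:
                counts[name] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```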

3. Port TripleO Heat templates to HOT

With most of the non-Heat syntax out of the way, porting our CFN/YAML templates to pure HOT format [2] should be straightforward.

We will have to update merge.py as well. We should be able to support both the old format and HOT.

We should be able to tell the two apart by looking for the heat_template_version top-level section, which is mandatory in HOT.
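A minimal Python sketch of that check, assuming merge.py already holds the parsed template as a dict (the function and constant names are hypothetical). It also rejects a template that declares heat_template_version but still carries CFN sections, since that would be a mix of CFN and HOT bits:

```python
from typing import Any

# Top-level sections that only CFN-style templates use.
CFN_SECTIONS = frozenset(
    {"AWSTemplateFormatVersion", "Parameters", "Resources", "Outputs", "Mappings"}
)


def template_format(template: dict[str, Any]) -> str:
    """Return "hot" or "cfn" for a parsed template dict."""
    if "heat_template_version" not in template:
        return "cfn"
    # heat_template_version marks HOT; CFN sections next to it mean a
    # half-ported template that mixes the two syntaxes.
    mixed = CFN_SECTIONS.intersection(template)
    if mixed:
        raise ValueError(f"HOT template still contains CFN sections: {sorted(mixed)}")
    return "hot"
```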

Most of the changes to merge.py should be around spelling (Parameters -> parameters, Resources -> resources) and different names for intrinsic functions, etc. (Fn::GetAtt -> get_attr).
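The renames listed above can be sketched as a small recursive translation over the parsed template. This is an illustrative subset under stated assumptions, not a complete CFN-to-HOT port: it renames the top-level sections, the per-resource and per-output keys, and Fn::GetAtt, but leaves parameter definitions and functions such as Ref (whose HOT equivalent depends on whether it targets a parameter or a resource) untouched. All names are hypothetical:

```python
from typing import Any

# Renames named in this spec, plus the per-entry keys they imply.
SECTION_RENAMES = {"Parameters": "parameters", "Resources": "resources", "Outputs": "outputs"}
ENTRY_KEY_RENAMES = {
    "resources": {"Type": "type", "Properties": "properties", "Metadata": "metadata"},
    "outputs": {"Value": "value", "Description": "description"},
}
FUNCTION_RENAMES = {"Fn::GetAtt": "get_attr"}


def rename_functions(node: Any) -> Any:
    """Rename intrinsic functions, which may appear at any depth."""
    if isinstance(node, dict):
        # Intrinsic functions are single-key dicts, e.g. {"Fn::GetAtt": [res, attr]}.
        if len(node) == 1:
            key, value = next(iter(node.items()))
            if key in FUNCTION_RENAMES:
                return {FUNCTION_RENAMES[key]: rename_functions(value)}
        return {k: rename_functions(v) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_functions(item) for item in node]
    return node


def cfn_to_hot(template: dict[str, Any], version: str = "2013-05-23") -> dict[str, Any]:
    """Translate top-level sections, per-entry keys and intrinsic functions."""
    hot: dict[str, Any] = {"heat_template_version": version}
    for section, body in template.items():
        if section == "AWSTemplateFormatVersion":
            continue  # superseded by heat_template_version
        new_section = SECTION_RENAMES.get(section, section)
        key_renames = ENTRY_KEY_RENAMES.get(new_section)
        if key_renames:
            body = {
                name: {key_renames.get(k, k): v for k, v in entry.items()}
                for name, entry in body.items()
            }
        hot[new_section] = rename_functions(body)
    return hot
```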

This task will require syntactic changes to all of our templates and, unfortunately, it isn’t something different people can update bit by bit. We should be able to update the undercloud and overcloud portions separately, but we can’t, for example, update just part of the overcloud: we are still putting templates together with merge.py at this point, and we would end up with a template that has both CFN and HOT bits.

4. Move to Provider resources

Heat allows passing in multiple templates when deploying a stack. These templates can be mapped to custom resource types. Each template would represent a role (compute server, controller, block storage, etc.), and its parameters and outputs would map to the custom resource’s properties and attributes.

These roles will be referenced from a master template (overcloud.yaml, undercloud.yaml) and eventually wrapped in a scaling resource (OS::Heat::ResourceGroup [5]) or whatever scaling mechanism we adopt.

Note

Provider resources represent fully functional standalone templates. Any provider resource template can be passed to Heat and turned into a stack or treated as a custom resource in a larger deployment.

Here’s a hypothetical outline of compute.yaml:

heat_template_version: 2013-05-23

parameters:
  flavor:
    type: string
  image:
    type: string
  amqp_host:
    type: string
  nova_compute_driver:
    type: string

resources:
  compute_instance:
    type: OS::Nova::Server
    properties:
      flavor: {get_param: flavor}
      image: {get_param: image}

  compute_deployment:
    type: OS::Heat::StructuredDeployment
    properties:
      server: {get_resource: compute_instance}
      config: {get_resource: compute_config}
      input_values:
        amqp_host: {get_param: amqp_host}
        nova_compute_driver: {get_param: nova_compute_driver}

  compute_config:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        amqp:
          host: {get_input: amqp_host}
        nova:
          compute_driver: {get_input: nova_compute_driver}
        ...

We will use a similar structure for all the other roles (controller.yaml, block-storage.yaml, swift-storage.yaml, etc.). That is, each role will contain the OS::Nova::Server, the associated deployments and any other resources required (random string generators, security groups, ports, floating IPs, etc.).

We can map the roles to custom types using Heat environments [4].

role_map.yaml:

resource_registry:
  OS::TripleO::Compute: compute.yaml
  OS::TripleO::Controller: controller.yaml
  OS::TripleO::BlockStorage: block-storage.yaml
  OS::TripleO::SwiftStorage: swift-storage.yaml
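Since each property set on a custom-type resource has to match a parameter of the template the type maps to, a pre-flight check can catch mismatches before the templates reach Heat. The following is a hypothetical Python helper (not part of Heat or merge.py); the registry dict mirrors role_map.yaml and the provider templates are assumed to be already parsed into dicts:

```python
from typing import Any


def check_role_properties(
    resource: dict[str, Any],
    registry: dict[str, str],
    provider_templates: dict[str, dict[str, Any]],
) -> tuple[list[str], list[str]]:
    """Compare a custom-type resource with the provider template behind it.

    Returns (unknown, missing): properties the template does not declare as
    parameters, and parameters without a default that were not supplied.
    """
    # resource_registry: custom type -> provider template file name.
    template = provider_templates[registry[resource["type"]]]
    declared = template.get("parameters", {})
    supplied = resource.get("properties", {})
    unknown = sorted(set(supplied) - set(declared))
    missing = sorted(
        name for name, spec in declared.items()
        if "default" not in spec and name not in supplied
    )
    return unknown, missing
```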

Lastly, we’ll have a master template that puts it all together.

overcloud.yaml:

heat_template_version: 2013-05-23

parameters:
  compute_flavor:
    type: string
  compute_image:
    type: string
  compute_amqp_host:
    type: string
  compute_driver:
    type: string
  ...

resources:
  compute0:
    # defined in compute.yaml, type mapping in role_map.yaml
    type: OS::TripleO::Compute
    properties:
      flavor: {get_param: compute_flavor}
      image: {get_param: compute_image}
      amqp_host: {get_param: compute_amqp_host}
      nova_compute_driver: {get_param: compute_driver}

  controller0:
    # defined in controller.yaml, type mapping in role_map.yaml
    type: OS::TripleO::Controller
    properties:
      flavor: {get_param: controller_flavor}
      image: {get_param: controller_image}
      ...

outputs:
  keystone_url:
    description: URL for the Overcloud Keystone service
    # `keystone_url` is an output defined in the `controller.yaml` template.
    # We're referencing it here to expose it to the Heat user.
    value: {get_attr: [controller0, keystone_url]}

and similarly for undercloud.yaml.

Note

The individual roles (compute.yaml, controller.yaml) are structured so that they can be launched as standalone stacks (e.g. to test the compute instance, one can run heat stack-create -f compute.yaml -P ...). Indeed, Heat treats provider resources as nested stacks internally.

5. Remove FileInclude from merge.py

The goal of FileInclude was to keep individual Roles (to borrow a loaded term from TripleO UI) viable as templates that can be launched standalone. The canonical example is nova-compute-instance.yaml [3].

With the migration to provider resources, FileInclude is not necessary.

6. Move the templates to Heat-native scaling

Scaling of resources is currently handled by merge.py. The --scale command line argument takes a resource name and duplicates it as needed (it’s a bit more complicated than that, but that’s beside the point).

Heat has a native scaling resource, OS::Heat::ResourceGroup [5], that does essentially the same thing:

scaled_compute:
  type: OS::Heat::ResourceGroup
  properties:
    count: 42
    resource_def:
      type: OS::TripleO::Compute
      properties:
        flavor: baremetal
        image: compute-image-rhel7
        ...

This will create 42 compute hosts.
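Unlike merge.py’s --scale, which rewrites the template itself, a ResourceGroup leaves the template alone and has Heat manage a nested stack of count copies of resource_def. A toy Python model of that fan-out (an illustration under assumptions, not Heat’s implementation; the function name is hypothetical):

```python
import copy
from typing import Any


def expand_resource_group(resource_def: dict[str, Any], count: int) -> dict[str, Any]:
    """Toy model of a ResourceGroup: one copy of resource_def per member.

    Members are keyed "0" .. str(count - 1), the indices that
    resource.<index> attribute lookups refer to.
    """
    return {
        "heat_template_version": "2013-05-23",
        # deepcopy so that members never share mutable property dicts
        "resources": {str(i): copy.deepcopy(resource_def) for i in range(count)},
    }
```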

7. Replace Merge::Map with scaling groups’ inner attributes

We are using the custom Merge::Map helper function for getting values out of scaled-out servers.

The ResourceGroup resource supports selecting an attribute of an inner resource as well as getting the same attribute from all resources and returning them as a list.

Example of getting an IP address of the controller node:

{get_attr: [controller_group, resource.0.networks, ctlplane, 0]}

(controller_group is the ResourceGroup of our controller nodes, ctlplane is the name of our control plane network)

Example of getting the list of names of all of the controller nodes:

{get_attr: [controller_group, name]}
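The two lookup forms can be modelled in a few lines of Python. In this hypothetical sketch, which assumes each group member’s attributes are available as a plain dict, a resource.<index>.<attribute> head selects one member, a bare attribute name collects that attribute from every member, and any remaining path items index into the result:

```python
from typing import Any


def group_get_attr(members: list[dict[str, Any]], path: list[Any]) -> Any:
    """Emulate get_attr on a ResourceGroup; path excludes the group name."""
    head, *rest = path
    if head.startswith("resource."):
        # "resource.<index>.<attribute>" selects a single member's attribute.
        _, index, attr = head.split(".", 2)
        value = members[int(index)][attr]
    else:
        # A bare attribute name returns a list with one value per member.
        value = [member[head] for member in members]
    for key in rest:
        value = value[key]
    return value
```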

The more complex uses of Merge::Map involve formatting the returned data in some way, for example building a list of {ip: ..., name: ...} dictionaries for haproxy or generating the /etc/hosts file.

Since our ResourceGroups will not be using Nova servers directly, but rather the custom role types using provider resources and environments, we can put this data formatting into the role’s outputs section and then use the same mechanism as above.

Example of building out the haproxy node entries:

# overcloud.yaml:
resources:
  controller_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: controller_scale}
      resource_def:
        type: OS::TripleO::Controller
        properties:
          ...

  controllerConfig:
    type: OS::Heat::StructuredConfig
    properties:
      ...
      config:
        haproxy:
          nodes: {get_attr: [controller_group, haproxy_node_entry]}



# controller.yaml:
resources:
  ...
  controller:
    type: OS::Nova::Server
    properties:
      ...

outputs:
  haproxy_node_entry:
    description: "A {ip: ..., name: ...} dictionary for configuring the haproxy node"
    value:
      ip: {get_attr: [controller, networks, ctlplane, 0]}
      name: {get_attr: [controller, name]}
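With this in place, controller_group’s haproxy_node_entry attribute is a list of {ip, name} dictionaries, one per controller. To show what a consumer receives, here is a Python sketch that renders that list as /etc/hosts lines; the helper is hypothetical, and where this rendering actually happens (for example in a template applied on the node) is outside the scope of the sketch:

```python
from typing import Any


def hosts_file(node_entries: list[dict[str, Any]]) -> str:
    """Render {ip, name} dictionaries as /etc/hosts lines, one per node."""
    return "".join(f"{entry['ip']} {entry['name']}\n" for entry in node_entries)
```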

Alternatives

This proposal is very t-h-t and Heat specific. One alternative is to do nothing and keep using and evolving merge.py. That was never the intent, and most members of the core team do not consider this a viable long-term option.

Security Impact

This proposal does not affect the overall functionality of TripleO in any way. It just changes the way TripleO Heat templates are stored and written.

If anything, this will move us towards more standard and thus more easily auditable templates.

Other End User Impact

There should be no impact for the users of vanilla TripleO.

More advanced users may want to customise the existing Heat templates or write their own. That will be made easier when we rely on standard Heat features only.

Performance Impact

This moves some of the template-assembling burden from merge.py to Heat. It will likely also end up producing more resources and nested stacks in the background.

As far as we’re aware, no one has tested these features at the scale we are inevitably going to hit.

Before we land changes that can affect this (provider resources and scaling), we need scale tests in Tempest running against TripleO to make sure Heat can cope.

These tests can be modeled after the large_ops scenario: a Heat template that creates and destroys a stack of 50 Nova server resources with associated software configs.

We should have two tests to assess the before and after performance:

  1. A single HOT template with 50 copies of the same server resource and software config/deployment.
  2. A template with a single server and its software config/deployment, an environment file with a custom type mapping, and a top-level template that wraps the new type in a ResourceGroup with a count of 50.
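Both test layouts can be generated from the same server and software-config definitions. A Python sketch that builds them as dicts, ready to be dumped to YAML; the custom type name OS::TripleO::ScaleTestNode and the file name node.yaml are hypothetical placeholders, and the server and config definitions are supplied by the caller:

```python
import copy
from typing import Any

HOT_VERSION = "2013-05-23"


def flat_template(server: dict[str, Any], config: dict[str, Any], count: int = 50) -> dict[str, Any]:
    """Test 1: one HOT template with `count` copies of server, config and deployment."""
    resources: dict[str, Any] = {}
    for i in range(count):
        resources[f"server{i}"] = copy.deepcopy(server)
        resources[f"config{i}"] = copy.deepcopy(config)
        # Each deployment ties one copy of the config to one server.
        resources[f"deployment{i}"] = {
            "type": "OS::Heat::StructuredDeployment",
            "properties": {
                "server": {"get_resource": f"server{i}"},
                "config": {"get_resource": f"config{i}"},
            },
        }
    return {"heat_template_version": HOT_VERSION, "resources": resources}


def grouped_templates(
    count: int = 50,
    custom_type: str = "OS::TripleO::ScaleTestNode",  # hypothetical type name
    node_file: str = "node.yaml",  # hypothetical single-server template
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Test 2: an environment mapping the custom type, plus a wrapping ResourceGroup."""
    environment = {"resource_registry": {custom_type: node_file}}
    wrapper = {
        "heat_template_version": HOT_VERSION,
        "resources": {
            "nodes": {
                "type": "OS::Heat::ResourceGroup",
                "properties": {"count": count, "resource_def": {"type": custom_type}},
            }
        },
    }
    return environment, wrapper
```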

Other Deployer Impact

Deployers can keep using merge.py and the existing Heat templates as before; existing scripts should not break.

With the new templates, Heat will be called directly and will need the resource registry (in a Heat environment file). This will mean a change in the deployment process.

Developer Impact

This should not affect non-Heat and non-TripleO OpenStack developers.

There will likely be a slight learning curve for the TripleO developers who want to write and understand our Heat templates. Chances are, we will also encounter bugs or unforeseen complications while swapping merge.py for Heat features.

The impact on Heat developers would involve processing the bugs and feature requests we uncover. This will hopefully not be an avalanche.

Implementation

Assignee(s)

Primary assignee:
Tomas Sedovic <lp: tsedovic> <irc: shadower>

Work Items

  1. Remove the custom resource types
  2. Remove combining whitelisted resource types
  3. Port TripleO Heat templates to HOT
  4. Move to Provider resources
  5. Remove FileInclude from merge.py
  6. Move the templates to Heat-native scaling
  7. Replace Merge::Map with scaling groups’ inner attributes

Dependencies

  • The Juno release of Heat
  • Being able to kill specific nodes in Heat (for scaling down or because they’re misbehaving); relevant Heat blueprint: autoscaling-parameters

Testing

All of these changes will be made to the tripleo-heat-templates repository and should be testable by our CI just as any other t-h-t change.

In addition, we will need to add Tempest scenarios for scale to ensure Heat can handle the load.

Documentation Impact

We will need to update the devtest, Deploying TripleO and Using TripleO documentation and create a guide for writing TripleO templates.