Refactoring OSA inventory

date:2018-04-12 22:00
tags:osa, inventory

The inventory as it stands today has been growing in complexity and has only grown organically since its first implementation in icehouse. Given that Ansible has changed a lot and has added capabilities which were not available in those early versions, it is time to take a step back and look at how it can be re-worked to reduce technical debt and make it easier to maintain.

Problem description

The current OpenStack-Ansible inventory provides the following features:

  • Assignment of hosts into groups
  • Generating the group structure
  • Assigning host variables
  • Generating container inventory_hostnames
  • Assigning and tracking container IPs based on cidr_networks, reserved IPs, and already allocated IPs.

All these features are included into a single dynamic inventory script, because at the time of its creation, only one inventory was allowed at a time in an ansible cli call.

The dynamic inventory shipped by OSA is core of the functionality of OpenStack-Ansible, yet it is not well understood, neither by the core maintainers nor by new contributors.

As a result, the inventory has grown organically, both in code and in memory usage (changes in the way we deploy things, adding new groups, adding edge cases), and has not seen much maintenance to reduce its scope or the technical debt.

At this point, due to a lack of tests and the complexity of the code, it is difficult to work on without causing hidden breakages which are often only found months later. Adding tests is unrealisticly hard for this legacy code.

The problems can therefore be summarized in a few points:

  • The inventory needs to be cleaned up of unnecessary groups and assignments, but it is difficult to clean up effectively without causing hidden breakages.
  • We have to carry code in openstack-ansible that is not actively maintained
  • We have to execute code that’s not actively audited, while it would be technically possible to avoid the execution of code with very few limitations for the end-user.
  • Introducing tests to verify regressions was attempted during the Newton, Ocata and Pike development cycles - but that has done nothing more than increase the code complexity and has done nothing to improve the reliability.

Proposed change

Now that we are using Ansible 2.4, we can:

  • Stack inventories together, and therefore we can split inventories into smaller inventories if necessary
  • Import, and convert inventories to a more readable format.

What I am proposing is to use static files for inventory. It is easier for people to edit the inventory, and review it. It’s easier to manipulate, and doesn’t require our code to run or edit it.

Host vars, group vars, and inventory structure would be static files, and slimmed down to the minimum.

Here are two example of slimming down (hosts vars, and inventory):

  • For me, the features to track proper IP assignment is the scope of a CMDB/IPAM. We shouldn’t reinvent the wheel there. Instead this should be spun out of the inventory. People should either:

    • use the old inventory to keep the same features, but we add a warning that the code is deprecated
    • provide their own IP addresses in a static file
    • provide their own dynamic inventory script or use a lookup to fetch data from their IPAM.

    With the generation of IPs outside the scope of the inventory, we could simplify the dynamic inventory further.

  • For me, the groups like haproxy, haproxy_all, haproxy_hosts or haproxy_containers are all confusing. Some are used interchangeably, which led to bugs. The proliferation of groups is only due to our inventory. These can all be consolidated into a single group, by changing the playbooks and roles. This is not only restricted to haproxy, and this pattern of group reduction should be extended to all our inventory.

So, at first we need to keep the same configuration style (conf.d/env.d/openstack_user_config). The generated json would then go through a script to generate and clean the static files.

That script would be part of the deploy and upgrade process.

Later, we could re-think the conf.d/env.d/openstack_user_config, or keep it the same but completely change the underlying code. That wouldn’t be a problems, because it could be done on the side, as a different inventory system. We would have, on the way, documented the input and outputs of the inventory, which could then be used for building test cases.

Alternatives

Do nothing

Playbook/Role impact

Removing references to old inventory data like old groups. Use lookups or ansible_facts better to reduce the amount of hostvars.

Upgrade impact

Because our inventories are already in a bad state, we already have hosts in the wrong groups.

Upgrade would need to run the tool to migrate the groups to the new groups presented in the playbooks.

Security impact

By ultimately shipping less code, we would marginally improve our security.

Performance impact

  • Moving from dynamic to static file with the same format doesn’t change performance
  • Moving from static json to static yaml may or may not improve performance in your deployment by reducing memory usage. It fully depends on the inventory. Large inventories are more likely to lose performance by switching to yaml for the same input.
  • Cleaning up the inventory have a positive performance impact.

End user impact

The end users will not notice any change.

Deployer impact

The deployer will have a different user configuration to deal with (static files)

Hopefully it shouldn’t be too hard to understand for an existing openstack-ansible user, or a experienced ansible user.

Developer impact

No change for the development of roles or playbooks.

At the same time we are removing technical debt, we are adding new technical debt by adding these new tools.

With the hope this tools would be easier to understand, read, review, and having more tests, it would overall reduce risks for the project.

Dependencies

None

Implementation

Assignee(s)

Primary assignee:
evrardjp
Other contributors:
None for now.

Work items

Use static files is not without downsides: We are losing some key features if we “just use” a static inventory which is created by the user, like the dynamic hostname generation, the dynamic IP allocations.

So I propose the following path:

  1. We list the groups required for a successful ansible deploy, and document those in the reference guide.

    Positive improvements:

    • For deployers that don’t want to use our inventory, we would now have an “explicit” contract of what they should do to run openstack-ansible with their own inventory groups

    Drawbacks:

    • All changes in groups now needs proper documentation
    • That’s not enough to come with your own inventory
  2. Keep the conf.d/env.d, and dynamic inventory script for now. We use it for generating a json that stays static during the lifecycle of the cloud, or until re-generated manually. The env.d/conf.d/openstack_user_config.yml are used as input for this “one-off” run of the dynamic inventory.

    To make sure deployers don’t misunderstand the “static” json file or confuse it with the current openstack_inventory.json, we should move the current files to a “cache” folder, and generated the “static” inventory into a inventory folder.

    Positive improvements:

    • No hidden failures, the generation of the inventory becomes a part of the deploy. We can add health checks easily.
    • Our code run only once, during the generation. Therefore we are not vulnerable to issues appearing when running multiple ansible simulatenously, or other side effects.
    • We keep the container name generation, provider networks, and IP assignments for free.

    Drawbacks:

    • Edition of static file will not be in sync with conf.d/env.d, but that was already the case with a manual change to openstack_inventory.json
    • The inventory_manage script becomes useless
  3. We provide default child mapping: we create the x_all groups in an easy to read .ini file in the openstack-ansible repo.

    Positive improvements:

    • All our users with their own inventory won’t have to create EXACTLY the same code to do child group mapping. Sharing is caring.
    • We would cary a lot of empty groups, and maybe people don’t need them.
    • The mapping could then be used to partially replace the documentation of step 1, and will fully replace the step 1 documentation when the groups will be cleaned in the playbooks and roles.
  4. We export the host vars into a static files inside the userspace inventory folder.

    Positive improvements:

    • Having static yaml files will make it easy to see repetitions, and things that can move to group vars

    Drawbacks: * More static files to maintain by the deployer.

    If we change a host var, we could change the inventory and it was applied everywhere. It would not be the case anymore.

  5. We write a tool manipulating the inventory json. By default, that tool would:

    • discard all the groups that aren’t listed in the reference guide
    • discard all the _all groups from the inventory, as they would not be required in the json anymore (handled at a previous step)
    • discard all the host variables (handled at a previous step)
    • discard groups that can be generated from facts/host variables, like all_containers (using group_by would provide the same result).

    Positive improvements:

    • The inventory would be lighter, and therefore require less memory to run. It would also run faster and require less computing power.

    Drawbacks:

    • All the changes in groups now require a modification of said tool, so a good design is necessary to make it easy to change.
  6. We document a list of the expected and required host/groups variables.

  7. We remove all the unnecessary group and host variables that were part of the inventory but aren’t important anymore by using/providing a tool manipulating variable files (yaml), or by providing release notes.

  8. We document how to export the cleaned up inventory into a new YAML file.

  9. The generation of conf.d, env.d, and openstack_user_config becomes totally optional at this point: We know what is required in a build, and ask deployers to provide their own group/host mapping.

    At this point it’s optional because:

    1. Assignment of hosts into groups can be done by the user with a simple .ini/.yaml file + documentation
    2. Standard group structure is provided by default
    3. We have documented the list of host variables, so they can be provided by the user
    4. Generating container with their inventory_hostnames can be done by the user. It’s just a series of host variables: ansible_host, container_name, container_tech, physical_host. It can even be done with a add_hosts and a loop based on a new variable like container_names (property of the host).
    5. Assigning and tracking container IPs based on cidr_networks, reserved IPs, and already allocated IPs are also host variables. Deployers are responsible to provide an IP for their containers. Example, the lxc_container_create role creates IP, network, and interfaces configuration based on lxc_container_networks_combined, which a variable taking information from the inventory, by combining default lxc_container_networks with the “container_networks” variable, which is part of the inventory. Note: this part can be later replaced by a lookup. By using a lookup, we would simplify the inventory, by completely removing its container networks of the host vars.
  10. We provide a script that runs all these actions for the user, but also allow step by step editions and manipulations.

  11. We provide a new tool to generate a new kind of inventory based on what we learned from users, which won’t necessary use the openstack_user_config, conf.d, or env.d. But we have all the time we need to do it better, because the expected inventory is not the same as the one we did the past.

  12. We spin the old inventory out.

Testing

All the work items would be separately tested in the integrated gates.

Documentation impact

Large. The inventory would need a refactor to explain the expectations for people coming with their inventory, and for people that will use our generation tool. At the last step, if another tool is provided, it would also require documenting.

Each step would require modifying the reference, and maybe the operations guide.

References

None