Granular deployment for fuel roles

Problem description

Our deployment process is very complicated. Our manifests use a large number of Puppet modules, and the dependencies between these modules are complex and abundant. This leads to the following consequences:

- It becomes very difficult to add new features. Even changes that might look minor at first glance can easily and unpredictably break other functionality. Guessing how a change will affect the dependencies and ordering of the deployment process is hard and error-prone.
- Debugging is also affected. Localizing bugs can be very troublesome because the manifests are complex and hard to understand, and debugging tools are almost non-existent.
- Reproducing bugs and testing takes a lot of time because we have no easy and reliable way to repeat only part of the deployment. The only thing we can do is start the process again and wait for several hours to get any results. Snapshots are not very helpful because a deployment cannot be reliably stopped and its state saved; such actions most likely break the deployment or at least change its outcome.
- New team members or outside developers who want to add new functionality to our project are out of luck. They have to spend many days just to gain a basic understanding of how our deployment works, and will most likely make a lot of hard-to-debug mistakes.
- Using our product is also not as easy as we would like it to be for customers and other people in the community. People usually cannot easily understand how the deployment works and have to simply follow every step in the documentation, which leaves them unable to act reasonably if something goes wrong.

Proposed change

If we want to address any of these issues we should find a way to make our architecture less complex and more manageable. It is well known that the best way to understand any large, monolithic structure is to take it apart, learn how each of the pieces works, and then learn how they interact with each other.

So we should try to split the whole deployment process into many small parts, each doing only one task or several closely related tasks. Each of these parts would be easy for a single developer to understand. Testing and debugging could also be done separately, so localizing and fixing bugs would be much easier than it is now.

Thinking about the deployment process as a list of atomic tasks will make our reference architectures and server roles much more dynamic. By changing which tasks are performed and in what order, you can create as many custom sets of roles as you need without modifying the tasks themselves.

Each task can have some internal dependencies, but most likely there will not be too many of them, which makes manual analysis of the dependency graph feasible within a single task. A task can also have requirements: the system must be in a specific state before the task can be started.

The introduction of Granular Deployment will be a rather extensive change to almost all components of the Fuel project and a serious architectural modification.

Graph-based Task API

Several types of tasks will be introduced in addition to the basic deployment types such as puppet, shell, rsync and upload_file. These types are groups and stages, and they serve to build a flexible graph of tasks.

Types of tasks:

- type: group - grouping of tasks based on nailgun role entities
- type: stage - skeleton of the deployment graph in Fuel; right now there
  are 3 stages:

  - pre_deployment - other preparatory actions, including plugin tasks
  - deployment - the main deployment stage
  - post_deployment - actions that can be executed only after the whole
    deployment is done

Deployment tasks:

- type: puppet - executes puppet manifests with puppet apply
- type: shell - executes any shell command, e.g. python script.py or
  /script.sh
- type: upload_file - used for configuration upload to target nodes and
  repo creation
- type: rsync - synchronizes files (e.g. puppet modules and manifests)
  to target nodes

Ideally all dependencies between tasks should be described with the requires and required_for attributes. This will allow us to build a graph of tasks in nailgun and then serialize it into a format acceptable to the orchestrator (workbooks for mistral, or astute-specific roles with priorities).

type: GROUP

id: controller
type: group
role: [<roles>]
requires: [<groups>]
required_for: [<stages>, <roles>]
tasks: [<tasks>]
parameters:
    strategy:
        type: parallel
        amount: 8
  • each chunk of nodes with this role (8 in this example) will be
    executed in parallel

::

strategy:
    type: one_by_one

  • all nodes with this role will be executed sequentially

::

strategy:
    type: parallel

  • all nodes with this role will be executed in parallel

::

tasks: [<tasks>]

  • this section is required for ease of understanding which tasks
    belong where

type: STAGE

id: deploy
type: stage
requires: [<stages>]

Right now we use a hardcoded set of stages, but it is entirely possible to make this flexible and define stages via the API.

type: DEPLOYMENT TASK TYPES

id: deploy_legacy
type: puppet
role: [primary-controller, controller,
       cinder, compute, ceph-osd]
requires: [<tasks>]
required_for: [<stage>]
parameters:
    puppet_manifest: /etc/puppet/manifests/site.pp
    puppet_modules: /etc/puppet/modules
    timeout: 360

id: network
type: shell
groups: [primary-controller, controller]
requires: [deploy_legacy]
required_for: [deploy]
parameters:
    cmd: python /opt/setup_network.py
    timeout: 600

Conditional tasks

A major part of the tasks will require conditional expressions. There are several ways to solve this:

1. Implement a python framework for plugging in tasks. Each task will have
   a clear interface for defining its condition, and if the condition passes
   the task will be serialized. This is the most scalable and solid
   solution, but developing such a framework will require a lot of effort,
   and we won't be able to land it in 6.1.

2. Define conditions in the custom expression parser that is also used in
   the UI. There are a couple of downsides to this approach:

   - Not all conditions can be expressed. For example: if the zabbix role
     is present in the cluster, deploy zabbix-agent for each role.
   - It is a new expression language, which we need to support ourselves.
   - It depends on context data, which is quite easy to change.

3. Define certain groups for tasks, and each mutually exclusive task will
   be able to specify its group.

   - This won't work with conditions that are not mutually exclusive.

4. Use a strict API for conditions that can be used in expression parsing.
   Pros:

   - it is not a new language
   - it has a very strict API, so at least we can try to guarantee its
     stability
   - complex abstract logic can be hidden in simple python methods

Statements will be expressed in the form of:

api.cluster_status == 'operational'
api.role_in_deployment('ceph-osd')
api.role_in_cluster('zabbix-server')
api.cluster_status == 'new' and api.nodes_count() > 10

class ExpressionApi(object):

    def __init__(self, cluster, nodes):
        self.cluster = cluster
        self.nodes = nodes

    def role_in_deployment(self, role):
        for node in self.nodes:
            if role in node.roles:
                return True
        return False

    def role_in_cluster(self, role):
        for node in self.cluster.nodes:
            if role in node.roles:
                return True
        return False

    def nodes_count(self):
        return len(self.nodes)

    @property
    def cluster_status(self):
        return self.cluster.status

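# Illustration only: stub objects so that the snippet below can run outside
# nailgun. These namedtuples are NOT the real nailgun Cluster/Node models.
import collections

Node = collections.namedtuple('Node', ['roles'])
Cluster = collections.namedtuple('Cluster', ['status', 'nodes'])

nodes = [Node(roles=['controller']), Node(roles=['compute'])]
cluster = Cluster(status='operational', nodes=nodes)
api = ExpressionApi(cluster, nodes)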
import jinja2

env = jinja2.Environment()
expr = env.compile_expression("api.cluster_status == 'operational' "
                              "and api.nodes_count() < 9")
print expr(api=api)  # api is an ExpressionApi instance

In 6.1 we will stick to the existing expression language that is used for cluster settings validation.

The available operators are described in [2].

Usage of graph in nailgun

Based on the provided tasks and the dependencies between them, we will build a graph object with the help of the networkx library [1]. The format of the serialized information will depend on the orchestrator that we use in any particular release.

Let me provide an example:

Consider that we have the following stage and group definitions:

- id: deploy
  type: stage
- id: primary-controller
  type: group
  role: [primary-controller]
  required_for: [deploy]
  parameters:
    strategy:
      type: one_by_one
- id: controller
  type: group
  role: [controller]
  requires: [primary-controller]
  required_for: [deploy]
  parameters:
    strategy:
      type: parallel
      amount: 2
- id: cinder
  type: group
  role: [cinder]
  requires: [controller]
  required_for: [deploy]
  parameters:
    strategy:
      type: parallel
- id: compute
  type: group
  role: [compute]
  requires: [controller]
  required_for: [deploy]
  parameters:
    strategy:
        type: parallel
- id: network
  type: group
  role: [network]
  requires: [controller]
  required_for: [compute, deploy]
  parameters:
    strategy:
        type: parallel

And there are tasks defined for each role:

- id: setup_services
  type: puppet
  requires: [setup_network]
  groups: [controller, primary-controller, compute, network, cinder]
  required_for: [deploy]
  parameters:
    puppet_manifest: /etc/puppet/manifests/controller.pp
    puppet_modules: /etc/puppet/modules
    timeout: 360
- id: setup_network
  type: shell
  groups: [controller, primary-controller, compute, network, cinder]
  required_for: [deploy]
  parameters:
    cmd: run_setup_network.sh
    timeout: 120

For each role we can define different subsets of tasks, but for simplicity let's make these tasks applicable to every role.

Based on this configuration, nailgun will send the orchestrator a config in the format that the orchestrator expects.

For example, suppose we have the following nodes for deployment:

::

primary-controller: [node-1]
controller: [node-4, node-2, node-3, node-5]
cinder: [node-6]
network: [node-7]
compute: [node-8]

These nodes will be deployed in the following order:

1. Deploy primary-controller: node-1
2. Deploy controller: node-4, node-2 (the parallel amount is 2, so nodes are processed in chunks of two)
3. Deploy controller: node-3, node-5
4. Deploy the network role (node-7) and cinder (node-6) - both depend on controller
5. Deploy compute: node-8 - compute depends on both network and controller

During deployment for each node 2 tasks will be executed sequentially:

1. Run the setup_network shell script
2. Run the setup_services puppet task
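As a rough illustration (this is a sketch, not the actual nailgun code), the order above can be derived by building the group dependency graph with networkx [1] and topologically sorting it:

::

import networkx as nx

# group -> groups it requires, taken from the example definitions above;
# compute also depends on network because network is required_for compute
groups = {
    'primary-controller': [],
    'controller': ['primary-controller'],
    'cinder': ['controller'],
    'network': ['controller'],
    'compute': ['controller', 'network'],
}

graph = nx.DiGraph()
for group, requires in groups.items():
    graph.add_node(group)
    for dependency in requires:
        graph.add_edge(dependency, group)

# one valid execution order, for example:
# ['primary-controller', 'controller', 'network', 'cinder', 'compute']
print(list(nx.topological_sort(graph)))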

Pre/post_deployment task examples

- id: update_hosts
  type: puppet
  role: '*'
  stage: post_deployment
  requires: [upload_nodes_info]
  parameters:
    puppet_manifest: /etc/puppet/modules/update_hosts_file.pp
    puppet_modules: /etc/puppet/modules
    timeout: 3600
    cwd: /

- id: rsync_puppet
  type: rsync
  role: '*'
  stage: pre_deployment
  parameters:
    src: /etc/puppet/{VERSION}
    dst: /etc/puppet/modules
    timeout: 3600

Alternatives

Execute deployment based not on roles, but on tasks. To consider this as an alternative we need to modularize at least each deployment role as a separate manifest. So in the current deployment model, there would be the following set of manifests:

  • controller.pp
  • mongo.pp
  • ceph_osd.pp
  • cinder.pp
  • zabbix.pp
  • compute.pp

After this is done it is quite easy to transform this into a simple set of tasks:

- id: primary-controller
  type: puppet
  required_for: [deploy]
  role: [primary-controller]
  strategy:
      type: one_by_one
  parameters:
    puppet_manifest: /etc/puppet/controller.pp
- id: controller
  type: puppet
  requires: [primary-controller]
  required_for: [deploy]
  strategy:
      type: parallel
      amount: 2
  parameters:
    puppet_manifest: /etc/puppet/controller.pp
- id: compute
  type: puppet
  requires: [controller]
  strategy:
    type: parallel
  parameters:
    puppet_manifest: /etc/puppet/compute.pp
- id: cinder
  type: puppet
  requires: [controller]
  strategy:
    type: parallel
  parameters:
    puppet_manifest: /etc/puppet/cinder.pp
- id: ceph-osd
  type: puppet
  requires: [controller]
  strategy:
    type: parallel
  parameters:
    puppet_manifest: /etc/puppet/ceph.pp

As you can see, there is no separation between tasks and roles. For example, given the following mapping of roles to nodes:

primary-controller: [node-1]
controller: [node-4, node-2, node-3, node-5]
cinder: [node-6]
ceph-osd: [node-7]
compute: [node-8]

1. Deploy /etc/puppet/controller.pp on uids [1]
2. Deploy /etc/puppet/controller.pp on uids [2, 3] in parallel
3. Deploy /etc/puppet/controller.pp on uids [4, 5] in parallel
4. Deploy /etc/puppet/compute.pp on uids [8], /etc/puppet/cinder.pp on uids [6] and /etc/puppet/ceph.pp on uids [7] in parallel

The current model will allow us to create multiple cross-referencing tasks, for example:

- id: put_compute_into_maintenance_mode
  type: puppet
  role: [primary-controller]
- id: migrate_vms_from_compute
  type: puppet
  role: [primary-controller]
  requires: [put_compute_into_maintenance_mode]
- id: reinstall_ovs
  type: puppet
  role: [compute]
  requires: [put_compute_into_maintenance_mode, migrate_vms_from_compute]
- id: make_compute_available
  role: [primary-controller]
  requires: [reinstall_ovs]

This is not the full format, but in general it will do the following:

  1. Put the compute node into maintenance mode
  2. Migrate all virtual machines from this node
  3. Reinstall OVS (or perform any other risky/disruptive action)
  4. Put this node back into available mode

In the nailgun RPC receiver we will need to track the status of each node's deployment ourselves, by validating the progress of the performed tasks. The task executor (astute) will report which task has completed after each puppet execution.

If a role was not present at the time the deployment_graph was written, it will specify all tasks it wants to execute in the metadata for this role.

Data model impact

Astute facts: Nailgun will generate an additional section in the astute facts. This section will contain the list of tasks with their priorities for a specific role. The astute fact will be extended with tasks in exactly the same format as they are stored in the database, so if we are generating a fact for the compute role, astute will have a section like:

tasks:
    -
      priority: 100
      type: puppet
      uids: [1]  # this is done for compatibility reasons
      parameters:
        puppet_manifest: /etc/network.pp
        puppet_modules: /etc/puppet
        timeout: 360
        cwd: /
    -
      priority: 100
      type: puppet
      uids: [2]
      parameters:
        puppet_manifest: /etc/controller.pp
        puppet_modules: /etc/puppet
        timeout: 360
        cwd: /

Each astute.yaml will contain the part of the deployment graph to be executed for that particular role.

REST API impact

Several API requests will be added:

GET/PUT clusters/<cluster_id>/deployment_tasks
  Reads or updates the deployment configuration for a specific cluster. This
  will be useful if someone wants to execute a deployment in a custom order.

GET/PUT releases/<release_id>/deployment_tasks
  Reads or updates the deployment configuration for a release.

GET will support filters by start_task and end_task parameters:

GET releases/<release_id>/deployment_tasks/?end_task=netconfig&start_task=hiera

This will return all the tasks in the graph that are needed to start from start_task and finish at end_task.
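For illustration, a hypothetical client-side sketch of these calls (the nailgun base URL, port and auth header below are assumptions, not defined by this spec):

::

import requests

NAILGUN = 'http://<fuel-master>:8000/api'  # assumed base URL
HEADERS = {'X-Auth-Token': '<token>'}      # assumed auth header

# read the deployment tasks defined for release 2
tasks = requests.get(NAILGUN + '/releases/2/deployment_tasks',
                     headers=HEADERS).json()

# read only the tasks needed to go from task "hiera" to task "netconfig"
subset = requests.get(NAILGUN + '/releases/2/deployment_tasks',
                      params={'start_task': 'hiera', 'end_task': 'netconfig'},
                      headers=HEADERS).json()

# upload a modified task list for a specific cluster
requests.put(NAILGUN + '/clusters/2/deployment_tasks',
             json=tasks, headers=HEADERS)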

CLI API impact

Several commands will be added to operate on tasks and to manipulate the deployment API.

Downloading and uploading deployment tasks via the nailgun API will be available both for clusters and releases; by default the dir parameter is the current directory.

fuel rel --rel 2 --deployment-tasks --download --dir ./
fuel rel --rel 2 --deployment-tasks --upload --dir ./

fuel env --env 2 --deployment-tasks --download --dir ./
fuel env --env 2 --deployment-tasks --upload --dir ./

Sync deployment tasks for releases:

fuel rel --sync-deployment-tasks --dir /etc/puppet

All tasks.yaml files found recursively in the directory "dir" will be merged and sent for the correct release version. There are two approaches that can be taken to match releases to tasks:

1. Match them by path
2. Match them by a config file placed at the root level of the tasks directory structure
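A minimal sketch of the merge step (not the actual fuel client code; the helper name and release matching are illustrative only):

::

import os
import yaml

def collect_tasks(root):
    """Recursively gather and merge every tasks.yaml found under root."""
    tasks = []
    for base, _dirs, files in os.walk(root):
        if 'tasks.yaml' in files:
            with open(os.path.join(base, 'tasks.yaml')) as f:
                tasks.extend(yaml.safe_load(f) or [])
    return tasks

# merged task list that would then be uploaded for the matching release
tasks = collect_tasks('/etc/puppet')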

fuel rel --sync-deployment-tasks will be performed during master node bootstrap.

The next set of commands concerns the deployment API; in general we will have the ability to construct a custom graph for specific nodes.

::
fuel node --node 2 --env 2 --tasks netconfig hiera

Only these tasks will be executed on the specified nodes.

::
fuel node --node 2,3 --env 2 --skip netconfig

The tasks specified after --skip (here netconfig) will be dropped from the deployment.

::
fuel node --node 2,3,4 --env 2 --end pre_deployment

The tasks required for pre_deployment to complete will be executed; for this API we traverse the graph up to pre_deployment and execute those tasks.

::
fuel node --node 2,3,4 --env 2 --start netconfig --end galera

Execution will start at netconfig and end at the task that is used for galera installation.
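Conceptually, the --start/--end options (including --end pre_deployment) can be implemented by trimming the task graph before execution. A minimal sketch with networkx [1] (the function name is illustrative, not the actual nailgun code):

::

import networkx as nx

def tasks_subgraph(graph, start_task=None, end_task=None):
    nodes = set(graph.nodes())
    # keep only end_task plus everything it transitively requires
    if end_task is not None:
        nodes &= nx.ancestors(graph, end_task) | {end_task}
    # drop everything that must already be finished before start_task
    if start_task is not None:
        nodes -= nx.ancestors(graph, start_task)
    return graph.subgraph(nodes)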

Upgrade impact

After the 6.1 release, the task API implemented as part of this feature will be considered the stable task API, and we are going to support tasks described in this form.

Versioning will be done based on the MOS version, so all tasks in any given version should conform to a certain API version.

Deployment configuration will be stored in:

- Cluster.deployment_tasks
- Release.deployment_tasks

Initially the graph configuration will be filled at the bootstrap_master_node stage, by an API call to releases/<release_id>/deployment_tasks.

If there are any incompatibilities between the new deployment code and previously stored data, it will be possible to resolve them by migration or by modifications made from the upgrade script (via API calls).

Security impact

Notifications impact

Other end user impact

Performance Impact

This won't significantly affect deployment time. In some cases the puppet run may even be shorter.

Other deployer impact

We will need to load the tasks from fuel-library into nailgun for each release, at the admin node bootstrap stage.

Developer impact

Implementation

Assignee(s)

Feature lead:
  - Dmitry Shulyak (dshulyak@mirantis.com)

Devs:
  - Vladimir Sharshov (vsharhov@mirantis.com)
  - Sebastian Kalinowski (skalinowski@mirantis.com)
  - Kamil Sambor (ksambor@mirantis.com)

Library:
  - Dmitry Ilyin (dilyin@mirantis.com)
  - Alex Didenko (adidenko@mirantis.com)

QA:
  - Tatyana Leontovich (tleontovich@mirantis.com)
  - Denis Dmitriev (ddmitriev@mirantis.com)
  - Anastasia Palkin (apalkina@mirantis.com)

Work Items

  1. Graph based API for nailgun (config-defined tasks and roles)
  2. Add hooks support for deployment stage in astute
  3. Remove pre/post tasks from astute; move orchestration to nailgun and functionality to the library (reusing the plugins mechanism)
  4. Modularizing puppet

Dependencies

python networkx library [1]

Testing

Every new piece of code will be covered by unit tests. This is internal functionality, therefore it will be covered by system tests without any modifications. Additional tests will verify that we don't have a regression in deployment time. There will also be tests that create a new task, add it to the deployment graph, and then verify that the node is in the expected state. Acceptance criteria for each task granule will be added in another spec, either library modularization or modular tests.

Documentation Impact

Requires update to developer and plugin documentation.

References

  1. https://networkx.github.io/ - Python utilities for working with graphs
  2. http://docs.mirantis.com/fuel-dev/develop/nailgun/customization/settings.html#expression-syntax