.. This work is licensed under a Creative Commons Attribution 3.0 Unported
   License.
   http://creativecommons.org/licenses/by/3.0/legalcode

==================================
Granular deployment for Fuel roles
==================================

Problem description
===================

Our deployment process is very complicated. Our manifests use a large number
of Puppet modules, and the dependencies between these modules are complex and
abundant. This leads to the following consequences:

- It becomes very difficult to add new features. Even changes that look minor
  at first glance can easily and unpredictably break other functionality.
  Guessing how a change will affect the dependencies and the ordering of the
  deployment process is hard and error-prone.

- Debugging suffers as well. Localizing bugs can be very troublesome because
  the manifests are complex and hard to understand, and debugging tools are
  almost non-existent.

- Reproducing bugs and testing take a lot of time because we have no easy and
  reliable way to repeat only a part of the deployment. The only thing we can
  do is start the process again and wait several hours for any result.
  Snapshots are not very helpful because the deployment cannot be reliably
  stopped and its state saved; such actions most likely break the deployment
  or at least change its outcome.

- New members of our team, or outside developers who want to add new
  functionality to the project, are out of luck. They have to spend many days
  just to gain a basic understanding of how our deployment works, and they
  will most likely make a lot of hard-to-debug mistakes.

- Using our product is also not as easy for customers and other people in the
  community as we would like it to be. People usually cannot easily understand
  how the deployment works and have to simply follow every step in the
  documentation, which leaves them unable to act reasonably if something goes
  wrong.

Proposed change
===============

If we want to address any of these issues, we should find a way to make our
architecture less complex and more manageable. The best way to understand any
large and monolithic structure is to take it apart, learn how each piece
works, and then learn how the pieces interact with each other. So we should
split the whole deployment process into many small parts, each of which
performs only one task or a few closely related tasks. Each of these parts
will be easy for a single developer to understand, and testing and debugging
can be done separately, so localizing and fixing bugs will be much easier than
it is now.

Thinking about the deployment process as a list of atomic tasks makes our
reference architectures and node roles much more dynamic. By changing which
tasks are performed and in what order, you can create as many custom sets of
roles as you need without modifying the tasks themselves. Each task can have
some internal dependencies, but most likely there will not be too many of
them, which makes manual analysis of the dependency graph feasible within a
single task. A task can also have requirements: the system must be in a
specific state before the task can be started.

The introduction of granular deployment will be a rather extensive change to
almost all components of the Fuel project and a serious architectural
modification.

Graph-based Task API
--------------------

Several types of tasks will be introduced in addition to the basic deployment
types (puppet, shell, rsync, upload_file). These additional types are groups
and stages, and they serve the purpose of building a flexible graph of tasks.

Types of tasks:

- type: group - grouping of tasks based on nailgun role entities
- type: stage - skeleton of the deployment graph in Fuel; right now there are
  three stages: pre_deployment, deployment, post_deployment

Deployment tasks:

- type: puppet - executes puppet manifests with puppet apply
- type: shell - executes any shell command, e.g. python script.py, /script.sh
- type: upload_file - used for uploading configuration to target nodes and
  for repository creation
- type: rsync

Stages:

* pre_deployment - other actions, including plugin tasks
* deployment - executes the main deployment stage
* post_deployment - actions that can be executed only after the whole
  deployment is done

Ideally, all dependencies between tasks should be described with the requires
and required_for attributes. This will allow us to build a graph of tasks in
nailgun and then serialize it into a format acceptable to the orchestrator
(workbooks for Mistral, or Astute-specific roles with priorities).

type: GROUP
-----------

::

  id: controller
  type: group
  role: []
  requires: []
  required_for: []
  tasks: []
  parameters:
    strategy:
      type: parallel
      amount: 8

Each chunk of nodes with this role (8 in this example) will be executed in
parallel.

::

  strategy:
    type: one_by_one

All nodes with this role will be executed sequentially.

::

  strategy:
    type: parallel

All nodes with this role will be executed in parallel.

::

  tasks: []

This section is required for ease of understanding which tasks belong where.

type: STAGE
-----------

::

  id: deploy
  type: stage
  requires: []

Right now we are using a hardcoded set of stages, but it is completely
possible to make this flexible and define the stages via the API.

type: DEPLOYMENT TASK TYPES
---------------------------

::

  id: deploy_legacy
  type: puppet
  role: [primary-controller, controller, cinder, compute, ceph-osd]
  requires: []
  required_for: []
  parameters:
    puppet_manifest: /etc/puppet/manifests/site.pp
    puppet_modules: /etc/puppet/modules
    timeout: 360

  id: network
  type: shell
  groups: [primary-controller, controller]
  requires: [deploy_legacy]
  required_for: [deploy]
  parameters:
    cmd: python /opt/setup_network.py
    timeout: 600

Conditional tasks
-----------------

A major part of the tasks will require conditional expressions. There are
several ways to solve this:

1. Implement a Python framework for plugging in a task. Each task will have a
   clear interface for defining its condition, and if the condition passes,
   the task will be serialized. This is the most scalable and solid solution,
   but developing such a framework will require a lot of effort, and we will
   not be able to land it in 6.1.

2. Define conditions in the custom expression parser that is also used in the
   UI. There are a couple of downsides to this approach:

   - Not all conditions can be expressed. For example: if the zabbix role is
     present in the cluster, deploy zabbix-agent for each role.
   - It is a new expression language which we need to support ourselves.
   - It depends on context data, which is quite easy to change.

3. Define certain groups for tasks, and let each mutually exclusive task
   specify its group.

   - This will not work with conditions that are not mutually exclusive.

4. Use a strict API for conditions that can be used in expression parsing.

   Pros:

   - It is not a new language.
   - It has a very strict API, so at least we can try to guarantee its
     stability.
   - Complex abstract logic can be hidden in simple Python methods.

   Statements will be expressed in the form of::

     api.cluster_status == 'operational'
     api.role_in_deployment('ceph-osd')
     api.role_in_cluster('zabbix-server')
     api.cluster_status == 'new' and api.nodes_count() > 10

::

  import jinja2


  class ExpressionApi(object):

      def __init__(self, cluster, nodes):
          self.cluster = cluster
          self.nodes = nodes

      def role_in_deployment(self, role):
          for node in self.nodes:
              if role in node.roles:
                  return True
          return False

      def role_in_cluster(self, role):
          for node in self.cluster.nodes:
              if role in node.roles:
                  return True
          return False

      def nodes_count(self):
          return len(self.nodes)

      @property
      def cluster_status(self):
          return self.cluster.status


  env = jinja2.Environment()
  expr = env.compile_expression(
      "api.cluster_status == 'operational' and api.nodes_count() < 9")
  # api is an ExpressionApi instance built for the cluster and its nodes
  print(expr(api=api))

In 6.1 we will most likely stick to the existing expression language that is
used for cluster settings validation. The available operators are described
in [2].

Usage of the graph in nailgun
-----------------------------

Based on the provided tasks and the dependencies between them, we will build
a graph object with the help of the networkx library [1] (a minimal
illustration is sketched after the example configuration below). The format
of the serialized information will depend on the orchestrator that we use in
a particular release.

Let me provide an example. Consider that we have several types of roles::

  - id: deploy
    type: stage

  - id: primary-controller
    type: group
    role: [primary-controller]
    required_for: [deploy]
    parameters:
      strategy:
        type: one_by_one

  - id: controller
    type: group
    role: [controller]
    requires: [primary-controller]
    required_for: [deploy]
    parameters:
      strategy:
        type: parallel
        amount: 2

  - id: cinder
    type: group
    role: [cinder]
    requires: [controller]
    required_for: [deploy]
    parameters:
      strategy:
        type: parallel

  - id: compute
    type: group
    role: [compute]
    requires: [controller]
    required_for: [deploy]
    parameters:
      strategy:
        type: parallel

  - id: network
    type: group
    role: [network]
    requires: [controller]
    required_for: [compute, deploy]
    parameters:
      strategy:
        type: parallel

And the following tasks are defined for each role::

  - id: setup_services
    type: puppet
    requires: [setup_network]
    groups: [controller, primary-controller, compute, network, cinder]
    required_for: [deploy]
    parameters:
      puppet_manifest: /etc/puppet/manifests/controller.pp
      puppet_modules: /etc/puppet/modules
      timeout: 360

  - id: setup_network
    type: shell
    groups: [controller, primary-controller, compute, network, cinder]
    required_for: [deploy]
    parameters:
      cmd: run_setup_network.sh
      timeout: 120

For each role we can define a different subset of tasks, but for simplicity
let us make these tasks applicable to every role. Based on this
configuration, nailgun will send the orchestrator a config in the format that
the orchestrator expects.
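
To illustrate the idea (this is only a sketch, not the actual nailgun code),
a dependency graph for a configuration like the one above could be built and
ordered with networkx [1] roughly as follows; the hard-coded task
dictionaries simply mirror a few of the example definitions::

  import networkx as nx

  tasks = [
      {'id': 'deploy', 'type': 'stage'},
      {'id': 'primary-controller', 'type': 'group',
       'required_for': ['deploy']},
      {'id': 'controller', 'type': 'group',
       'requires': ['primary-controller'], 'required_for': ['deploy']},
      {'id': 'setup_network', 'type': 'shell',
       'groups': ['primary-controller', 'controller'],
       'required_for': ['deploy']},
      {'id': 'setup_services', 'type': 'puppet',
       'requires': ['setup_network'],
       'groups': ['primary-controller', 'controller'],
       'required_for': ['deploy']},
  ]

  graph = nx.DiGraph()
  for task in tasks:
      graph.add_node(task['id'], **task)
      # "requires": the listed tasks must run before this one.
      for dep in task.get('requires', []):
          graph.add_edge(dep, task['id'])
      # "required_for": this task must run before the listed ones.
      for dep in task.get('required_for', []):
          graph.add_edge(task['id'], dep)
      # Tasks attached to groups run as part of those groups
      # (a simplification sufficient for this illustration).
      for group in task.get('groups', []):
          graph.add_edge(group, task['id'])

  # One valid execution order that respects all dependencies.
  print(list(nx.topological_sort(graph)))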

For example, suppose we have the following nodes for deployment::

  primary-controller: [node-1]
  controller: [node-4, node-2, node-3, node-5]
  cinder: [node-6]
  network: [node-7]
  compute: [node-8]

These nodes will be deployed in the following order:

1. Deploy primary-controller: node-1
2. Deploy controller: node-4, node-2 (the parallel amount is 2)
3. Deploy controller: node-3, node-5
4. Deploy the network role (node-7) and cinder (node-6) - they depend on
   controller
5. Deploy compute (node-8) - compute depends on both network and controller

During deployment, two tasks will be executed sequentially on each node:

1. Run the shell script setup_network
2. Run puppet with setup_services
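
The batching implied by a parallel strategy with a limited amount can be
illustrated with a minimal sketch; the chunks helper and the hard-coded node
names are hypothetical and do not represent the actual Astute
implementation::

  def chunks(nodes, amount):
      # Split a group's nodes into batches: every batch is deployed in
      # parallel, while the batches themselves run one after another.
      for i in range(0, len(nodes), amount):
          yield nodes[i:i + amount]

  controller_nodes = ['node-4', 'node-2', 'node-3', 'node-5']
  print(list(chunks(controller_nodes, 2)))
  # [['node-4', 'node-2'], ['node-3', 'node-5']]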

Pre/post_deployment task examples
---------------------------------

::

  - id: update_hosts
    type: puppet
    role: '*'
    stage: post_deployment
    requires: [upload_nodes_info]
    parameters:
      puppet_manifest: /etc/puppet/modules/update_hosts_file.pp
      puppet_modules: /etc/puppet/modules
      timeout: 3600
      cwd: /

  - id: rsync_puppet
    type: rsync
    role: '*'
    stage: pre_deployment
    parameters:
      src: /etc/puppet/{VERSION}
      dst: /etc/puppet/modules
      timeout: 3600

Alternatives
------------

Execute the deployment based not on roles, but on tasks. To consider this as
an alternative, we need to modularize at least each deployment role as a
separate manifest. In the current deployment model this gives the following
set of manifests:

- controller.pp
- mongo.pp
- ceph_osd.pp
- cinder.pp
- zabbix.pp
- compute.pp

After this is done, it is quite easy to transform this into a simple set of
tasks::

  - id: primary-controller
    type: puppet
    required_for: [deploy]
    role: [primary-controller]
    strategy:
      type: one_by_one
    parameters:
      puppet_manifest: /etc/puppet/controller.pp

  - id: controller
    type: puppet
    requires: [primary-controller]
    required_for: [deploy]
    strategy:
      type: parallel
      amount: 2
    parameters:
      puppet_manifest: /etc/puppet/controller.pp

  - id: compute
    type: puppet
    requires: [controller]
    strategy:
      type: parallel
    parameters:
      puppet_manifest: /etc/puppet/compute.pp

  - id: cinder
    type: puppet
    requires: [controller]
    strategy:
      type: parallel
    parameters:
      puppet_manifest: /etc/puppet/cinder.pp

  - id: ceph-osd
    type: puppet
    requires: [controller]
    strategy:
      type: parallel
    parameters:
      puppet_manifest: /etc/puppet/ceph.pp

As you can see, there is no separation between tasks and roles. For example,
take the following assignment of roles to nodes::

  primary-controller: [node-1]
  controller: [node-4, node-2, node-3, node-5]
  cinder: [node-6]
  ceph-osd: [node-7]
  compute: [node-8]

The deployment will then be executed as follows:

1. Deploy /etc/puppet/controller.pp on uids [1]
2. Deploy /etc/puppet/controller.pp on uids [2, 3] in parallel
3. Deploy /etc/puppet/controller.pp on uids [4, 5] in parallel
4. Deploy /etc/puppet/compute.pp on uids [8], /etc/puppet/cinder.pp on
   uids [6] and /etc/puppet/ceph.pp on uids [7] in parallel

The current model will allow us to create multiple cross-referencing tasks,
for example::

  - id: put_compute_into_maintenance_mode
    type: puppet
    role: [primary-controller]

  - id: migrate_vms_from_compute
    type: puppet
    role: [primary-controller]
    requires: [put_compute_into_maintenance_mode]

  - id: reinstall_ovs
    type: puppet
    role: [compute]
    requires: [put_compute_into_maintenance_mode, migrate_vms_from_compute]

  - id: make_compute_available
    role: [primary-controller]
    requires: [reinstall_ovs]

This is not the full format, but in general it will do the following:

1. Put the compute node into maintenance mode
2. Migrate all virtual machines from this compute
3. Reinstall OVS (or perform any other risky/disruptive action)
4. Put this compute back into the available state

In the nailgun RPC receiver we will need to track the status of each node's
deployment ourselves by validating the progress of the performed tasks. The
task executor (Astute) will report which task has completed after each puppet
run. If a role was not present at the time the deployment graph was written,
it will specify all tasks it wants to execute in the metadata for this role.

Data model impact
-----------------

Astute facts: nailgun will generate an additional section for the astute
facts. This section will contain the list of tasks with their priorities for
a specific role. The astute facts will be extended with tasks in exactly the
same format in which they are stored in the database, so if we are generating
facts for the compute role, astute will have a section like::

  tasks:
    - priority: 100
      type: puppet
      uids: [1]  # this is done for compatibility reasons
      parameters:
        puppet_manifest: /etc/network.pp
        puppet_modules: /etc/puppet
        timeout: 360
        cwd: /
    - priority: 100
      type: puppet
      uids: [2]
      parameters:
        puppet_manifest: /etc/controller.pp
        puppet_modules: /etc/puppet
        timeout: 360
        cwd: /

Each astute.yaml will contain the part of the deployment graph that is
executed for that particular role.

REST API impact
---------------

Several API requests will be added:

GET/PUT clusters/<cluster_id>/deployment_tasks

  Reads or updates the deployment configuration for a particular cluster. It
  will be useful if someone wants to execute the deployment in a unique
  order.

GET/PUT releases/<release_id>/deployment_tasks

  Reads or updates the deployment configuration for a release.

GET will support filtering by the start_task and end_task parameters:

GET releases/<release_id>/deployment_tasks/?end_task=netconfig&start_task=hiera

  Will return all tasks that should be executed starting from start_task and
  finishing at end_task.
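
For illustration only, the proposed endpoints could be consumed from a small
script roughly as follows; the base URL, port and release id are assumptions
and the snippet is not part of the actual implementation::

  import requests

  base = 'http://10.20.0.2:8000/api/v1'

  # Download the deployment tasks of a release.
  tasks = requests.get(base + '/releases/2/deployment_tasks').json()

  # Edit the task list locally, then upload it back.
  requests.put(base + '/releases/2/deployment_tasks', json=tasks)

  # Fetch only the part of the graph between two tasks.
  subset = requests.get(
      base + '/releases/2/deployment_tasks',
      params={'start_task': 'hiera', 'end_task': 'netconfig'}).json()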

CLI API impact
--------------

Several commands will be added to operate on tasks and to manipulate the
deployment API.

Downloading and uploading deployment tasks from the nailgun API will be
available both for clusters and for releases; by default the dir parameter is
the current directory::

  fuel rel --rel 2 --deployment-tasks --download --dir ./
  fuel rel --rel 2 --deployment-tasks --upload --dir ./
  fuel env --env 2 --deployment-tasks --download --dir ./
  fuel env --env 2 --deployment-tasks --upload --dir ./

Sync deployment tasks for releases::

  fuel rel --sync-deployment-tasks --dir /etc/puppet

All tasks.yaml files found recursively in the directory "dir" will be merged
and sent to the matching release version. There are two approaches that can
be taken to match releases to tasks:

1. Match them by path.
2. Match them by a config file placed at the root level of the tasks
   directory structure.

::

  fuel rel --sync-deployment-tasks

will be performed during master node bootstrap.

The next set of commands concerns the deployment API; in general we will have
the ability to construct a custom graph for particular nodes.

::

  fuel node --node 2 --env 2 --tasks netconfig hiera

Only the specified tasks will be executed on the specified nodes.

::

  fuel node --node 2,3 --env 2 --skip netconfig

The tasks specified after --skip (netconfig here) will be dropped from the
deployment.

::

  fuel node --node 2,3,4 --env 2 --end pre_deployment

The tasks required for pre_deployment to be ready will be executed; for this
API we will traverse the graph up to pre_deployment and execute those tasks.

::

  fuel node --node 2,3,4 --env 2 --start netconfig --end galera

Start at netconfig and end execution at the task that is used for Galera
installation.

Upgrade impact
--------------

After the 6.1 release, the task API that will be delivered as part of this
feature will be considered the stable task API, and we are going to support
tasks described in that format. Versioning will be based on the MOS version,
so all tasks in any given version should conform to the corresponding API
version.

Deployment configuration will be stored in Cluster.deployment_tasks and
Release.deployment_tasks.

Initially, the graph configuration will be filled during the
bootstrap_master_node stage by an API call to
releases/<release_id>/deployment_tasks.

If there are any incompatibilities between the new deployment code and
previously stored data, it will be possible to solve them by a migration or
by modifications from the upgrade script (via API calls).

Security impact
---------------

Notifications impact
--------------------

Other end user impact
---------------------

Performance Impact
------------------

This will not significantly affect deployment time. In some cases the puppet
run may even be shorter.

Other deployer impact
---------------------

We will need to load the tasks from fuel-library for each release into
nailgun at the admin node bootstrap stage.

Developer impact
----------------

Implementation
==============

Assignee(s)
-----------

Feature lead:

- Dmitry Shulyak dshulyak@mirantis.com

Devs:

- Vladimir Sharshov vsharhov@mirantis.com
- Sebastian Kalinowski skalinowski@mirantis.com
- Kamil Sambor ksambor@mirantis.com

Library:

- Dmitry Ilyin dilyin@mirantis.com
- Alex Didenko adidenko@mirantis.com

QA:

- Tatyana Leontovich tleontovich@mirantis.com
- Denis Dmitriev ddmitriev@mirantis.com
- Anastasia Palkin apalkina@mirantis.com

Work Items
----------

1. Graph-based API for nailgun (config-defined tasks and roles)
2. Add hooks support for the deployment stage in Astute
3. Remove pre/post tasks from Astute: move orchestration to nailgun and the
   functionality to the library (reuse the plugins mechanism)
4. Modularize puppet

Dependencies
============

Python networkx library [1]

Testing
=======

Every new piece of code will be covered by unit tests. This is internal
functionality, therefore it will be covered by system tests without any
modifications.

Additional tests will verify that we do not have a regression in deployment
time.

Tests will create a new task, add it into the deployment graph, and then
verify that the node is in the expected state.

Acceptance criteria for each task granule will be added in another spec,
either library modularization or modular tests.

Documentation Impact
====================

Requires an update to the developer and plugin documentation.

References
==========

1. https://networkx.github.io/ - Python utilities for working with graphs
2. http://docs.mirantis.com/fuel-dev/develop/nailgun/customization/settings.html#expression-syntax