Copyright (c) 2015 Hewlett-Packard Development Company, L.P.

This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode

Zuul v3

Storyboard: https://storyboard.openstack.org/#!/story/2000305

As part of an effort to streamline Zuul and Nodepool into an easier-to-use system that scales better and is more flexible, some significant changes are proposed to both. The overall goals are:

  • Make Zuul scale to thousands of projects.
  • Make Zuul more multi-tenant friendly.
  • Make it easier to express complex scenarios in layout.
  • Make Nodepool more useful for non-virtual nodes.
  • Make Nodepool more efficient for multi-node tests.
  • Remove need for long-running slaves.
  • Make it easier to use Zuul for continuous deployment.
  • Support private installations using external test resources.
  • Keep Zuul simple.

Problem Description

Nodepool

Currently Nodepool is designed to supply single-use nodes to jobs. We have extended it to support supplying multiple nodes to a single job for multi-node tests; however, that implementation is very inefficient and will not scale with heavy use. The current system uses a special multi-node label to indicate that a job requires the number of nodes provided by that label. This means that pairs (or triplets, or larger sets) of servers need to be created together, which may cause delays while specific servers are created, and servers may sit idle because they are destined only for use in multi-node jobs and cannot be used for jobs which require fewer (or more) nodes.

Nodepool also currently has no ability to supply inventory of nodes which are not created and destroyed. It would be nice to allow nodepool to mediate access to real hardware, for instance.

Zuul

Zuul is currently fundamentally a single-tenant application. Some folks want to use it in a multi-tenant environment. Even within OpenStack, we have use for multitenancy. OpenStack might be one tenant, and each stackforge project might be another. Even without the OpenStack/stackforge divide, we may still want the kind of separation multi-tenancy can provide. Multi-tenancy should allow for multiple tenants to have the same job and project names, but also to share configuration if desired. Tenants should be able to define their own pipelines, and optionally control some or all of their own configuration.

OpenStack’s Zuul configuration currently uses Jenkins and Jenkins Job Builder (JJB) to define jobs. We use very few features of Jenkins and Zuul was designed to facilitate our move away from Jenkins. The JJB job definitions are complex, and some of the complexity comes from specific Jenkins behaviors that we currently need to support. Additionally, there is no support for orchestrating actions across multiple test nodes.

Proposed Change

Nodepool

Nodepool should be made to support explicit node requests and releases. That is to say, it should act more like its name – a node pool. It should support the existing model of single-use nodes as well as long-term nodes that need mediated access.

Nodepool should use ZooKeeper to fulfill node requests from Zuul. A request should be made using the ZooKeeper priority queue construct at the path:

/nodepool/requests/500-123
  node_types: [list of node types]
  requestor: descriptive string of requestor (eg zuul)
  created_time: <unix timestamp>
  state: requested | pending | fulfilled | failed
  state_time: <unix timestamp>
  nodes: [list of node ids]
  declined_by: [list of launchers declining this request]

The name of the request node, “500-123”, is composed of the priority (“500”) followed by the sequence number (“123”). After creating the request node, Zuul should read the request node back and set a watch on it. If the read associated with the watch set indicates that the request has already been fulfilled, it should proceed to use the nodes; otherwise, it should wait to be notified by the watch. Note that special care will need to be taken to re-set watches if the connection to ZooKeeper is reset. The pattern of reading to test whether the request is fulfilled, and setting a watch if it is not, can be repeated as many times as necessary until the request is fulfilled.

This model is much more efficient for multi-node tests, where we will no longer have to have special multinode labels. Instead the multinode configuration can be much more ad-hoc and vary per job. Requests for nodes are in a FIFO queue and will be satisfied in the order received according to node availability. This should make demand and allocation calculations much simpler.

A node type is simply a string such as ‘trusty’, that corresponds to an entry in the nodepool config file.

The component of Nodepool which will process these requests is known as a “launcher”. A Nodepool system may consist of multiple launchers (for instance, one launcher for each cloud provider). Each launcher will continuously scan the request queue (sorted by request id) and attempt to process each request in sorted order. A single launcher may be engaged in satisfying multiple requests simultaneously.
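
As a rough illustration of the "sorted by request id" scan, the sketch below splits request names of the form shown above into a priority and a sequence number and orders them. The function names are hypothetical, and the sketch assumes that a lower priority number is served first and that ties are broken by sequence number; the spec does not state either rule explicitly.

```python
def parse_request_name(name):
    """Split a request node name such as '500-123' into (priority, sequence)."""
    priority, sequence = name.split("-", 1)
    return int(priority), int(sequence)


def sorted_requests(names):
    """Return request names in the order a launcher would scan them.

    Assumption: lower priority numbers come first, then lower sequence numbers.
    """
    return sorted(names, key=parse_request_name)


queue = ["500-124", "100-130", "500-123"]
print(sorted_requests(queue))
```

Comparing parsed integers rather than raw strings keeps the ordering correct even if the sequence numbers are not zero-padded.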

When satisfying a request, Nodepool will first obtain a lock on the request using the ZooKeeper lock construct at the path:

/nodepool/requests-lock/500-123

It will then attempt to satisfy the request from available nodes, and failing that, cause new nodes to be created. When multiple nodes are requested together, nodepool will return nodes within the same AZ of the same provider.

A simple algorithm which does not require that any launcher know about any other launchers is:

  1. Obtain next request
  2. If image not available, decline
  3. If request > quota, decline
  4. If request < quota and request > available nodes (due to current usage), begin satisfying the request and do not process further requests until satisfied
  5. If request < quota and request < available nodes, satisfy the request and continue processing further requests
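
The decision steps above can be sketched as a single function. This is only an illustration: the function name and return labels are hypothetical, and because the steps leave the equality cases unspecified, the sketch assumes a request that exactly matches the quota or the available node count can still be satisfied.

```python
def decide(requested, quota, available, image_available=True):
    """Return what a launcher does with one request.

    requested: number of nodes in the request
    quota:     maximum nodes this launcher may ever run
    available: nodes this launcher can supply right now
    """
    if not image_available:
        return "decline"
    if requested > quota:
        return "decline"
    if requested > available:
        # Within quota but blocked by current usage: start working on it
        # and stop processing further requests until it is satisfied.
        return "satisfy-and-block"
    return "satisfy-and-continue"


for args in [(2, 10, 5), (8, 10, 5), (12, 10, 5)]:
    print(args, decide(*args))
```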

Since Nodepool consists of multiple launchers, each of which is only aware of its own configuration, there is no single component of the system that can determine if a request is permanently unsatisfiable. In order to avoid requests remaining in the queue indefinitely, each launcher will register itself at the path:

/nodepool/launchers/<hostname>-<pid>-<tid>

When a launcher is unable to satisfy a request, it will modify the request node (while still holding the lock) and add its identifier to the field declined_by. It should then check the contents of this field and compare it to the current contents of /nodepool/launchers. If all of the currently on-line launchers are represented in declined_by, the request should be marked failed in the state field. The update of the request node will notify Zuul via the previously set watch; Zuul will check the state and, if the request is neither failed nor fulfilled, will simply re-set the watch. The launcher will then release the lock and, if the request is not yet failed, other launchers will be able to attempt to process the request. When processing the request queue, the launcher should avoid obtaining the lock on any request it has already declined (though it should always perform a check for whether the request should be marked as failed in case the last launcher went off-line shortly after it declined the request).
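
A minimal sketch of the decline bookkeeping, using plain dictionaries in place of ZooKeeper nodes. The function name is hypothetical; the field names declined_by and state come from the request record above, and the launcher identifiers are made-up examples of the <hostname>-<pid>-<tid> form.

```python
def record_decline(request, launcher_id, online_launchers):
    """Add launcher_id to declined_by and fail the request if nobody is left.

    request:          dict holding the request fields
    online_launchers: identifiers currently registered under /nodepool/launchers
    """
    declined = set(request.get("declined_by", []))
    declined.add(launcher_id)
    request["declined_by"] = sorted(declined)
    # The request is permanently unsatisfiable only when every launcher
    # that is currently on-line has declined it.
    if set(online_launchers) <= declined:
        request["state"] = "failed"
    return request


request = {"state": "requested", "declined_by": []}
record_decline(request, "nl01-100-1", ["nl01-100-1", "nl02-200-1"])
print(request["state"])
record_decline(request, "nl02-200-1", ["nl01-100-1", "nl02-200-1"])
print(request["state"])
```

Because the comparison is against the launchers that are on-line at the time of the check, re-running it later covers the case where the last remaining launcher disappears after others have declined.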

Requests should not be marked as failed for transient errors (if a node destined for a request fails to boot, another node should take its place). Only in the case where it is impossible for Nodepool to satisfy a request should it be marked as failed. In that case, Zuul may report job failure as a result.

If at any point Nodepool detects that the ephemeral request node has been deleted, it should return any allocated nodes to the pool.

Each node should have a record in ZooKeeper at the path:

/nodepool/nodes/456
  type: ubuntu-trusty
  provider: rax
  region: ord
  az: None
  public_ipv4: <IPv4 address>
  private_ipv4: <IPv4 address>
  public_ipv6: <IPv6 address>
  allocated_to: <request id>
  state: building | testing | ready | in-use | used | hold | deleting
  created_time: <unix timestamp>
  updated_time: <unix timestamp>
  image_id: /nodepool/image/ubuntu-trusty/builds/123/provider/rax/images/456
  launcher: <hostname>-<pid>-<tid>

The node should start in the building state and if being created in response to demand, set allocated_to to the id of the node request. While building, Nodepool should hold a lock on the node at:

/nodepool/nodes/456/lock

Once complete, the metadata should be updated, the state set to ready, and the lock released. Once all of the nodes in a request are ready, Nodepool should update the state of the request to fulfilled and release the lock. Zuul, which will have been notified of the change by the watch it set, should then obtain the lock on each node in the request and update its state to ‘in-use’. It should then delete the request node.

When Zuul is finished with the nodes, it should set their states to used and release their locks.

Nodepool will then decide whether the nodes should be returned to the pool, rebuilt, or deleted according to the type of node and current demand.

If any Nodepool or Zuul component fails at any point in this process, it should be possible to determine this and either recover or at least avoid leaking nodes. Nodepool should periodically examine all of the nodes and look for the following conditions:

  • A node allocated to a request that does not exist where the node is in the ready state for more than a short period of time (e.g., 300 seconds). This is a node that was either part of a fulfilled request and given to a requestor but the requestor has done nothing with it yet, or the request was canceled immediately after being fulfilled.
  • A node in the building or testing states without a lock. This means the Nodepool launcher handling that node died; it should be deleted.
  • A node in the in-use state without a lock. This means the Zuul launcher using the node died.
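
The three conditions above could be checked by a periodic pass along the following lines. This is a sketch over plain dictionaries, not Nodepool code: the function name and return labels are hypothetical, and it assumes updated_time records when the node entered its current state.

```python
READY_GRACE_SECONDS = 300  # the example "short period of time" from the text


def classify_node(node, live_request_ids, is_locked, now):
    """Return a label for a node that needs attention, or None if it looks fine."""
    state = node["state"]
    if state in ("building", "testing") and not is_locked:
        return "launcher-died"   # the node should be deleted
    if state == "in-use" and not is_locked:
        return "user-died"       # the Zuul launcher using the node died
    allocated_to = node.get("allocated_to")
    if (state == "ready"
            and allocated_to is not None
            and allocated_to not in live_request_ids
            and now - node["updated_time"] > READY_GRACE_SECONDS):
        return "orphaned"        # its request no longer exists
    return None


node = {"state": "ready", "allocated_to": "500-123", "updated_time": 1000}
print(classify_node(node, live_request_ids=set(), is_locked=False, now=1400))
```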

This should allow the main work of Nodepool to be performed by multiple independent launchers, each of which is capable of processing the request queue and modifying the pool state as represented in ZooKeeper.

The initial implementation will assume only one launcher is running for each provider in order to avoid complexities involving quota spanning across launchers, rate limits, and how to prevent request starvation in the case of multiple launchers for the same provider where one is handling a very large request. However, future work may enable this with more coordination between launchers in ZooKeeper.

Nodepool should also allow the specification of static inventory of non-dynamic nodes. These may be nodes that are running on real hardware, for instance.

Zuul

Tenants

Zuul’s main configuration should define tenants, and tenants should specify config files to include. These include files should define pipelines, jobs, and projects, all of which are namespaced to the tenant (so different tenants may have different jobs with the same names):

### main.yaml
- tenant:
    name: openstack
    include:
      - global_config.yaml
      - openstack.yaml

Files may be included by more than one tenant, so common items can be placed in a common file and referenced globally. This means that for OpenStack, for example, we can define pipelines and our base job definitions (with logging info, etc.) once, and include them in all of our tenants:

### main.yaml (continued)
- tenant:
    name: openstack-infra
    include:
      - global_config.yaml
      - infra.yaml

A tenant may optionally specify repos from which it may derive its configuration. In this manner, a repo may keep its Zuul configuration within its own repo. This would only happen if the main configuration file specified that it is permitted:

### main.yaml (continued)
- tenant:
    name: random-stackforge-project
    include:
      - global_config.yaml
    source:
      my-gerrit:
        repos:
        - stackforge/random  # Specific project config is in-repo

Nodesets

A significant focus of Zuul v3 is a close interaction with Nodepool to both make running multi-node jobs simpler, as well as facilitate running jobs on static resources. To that end, the node configuration for a job is introduced as a first-class resource. This allows both simple and complex node configurations to be independently defined and then referenced by name in jobs:

### global_config.yaml
- nodeset:
    name: precise
    nodes:
      - name: controller
        image: ubuntu-precise
- nodeset:
    name: trusty
    nodes:
      - name: controller
        image: ubuntu-trusty
- nodeset:
    name: multinode
    nodes:
      - name: controller
        image: ubuntu-xenial
      - name: compute
        image: ubuntu-xenial

Jobs may either specify their own node configuration in-line, or refer to a previously defined nodeset by name.

Jobs

Jobs defined in-repo may not have access to the full feature set (including some authorization features). They also may not override existing jobs.

Job definitions continue to have the features in the current Zuul layout, but they also take on some of the responsibilities currently handled by the Jenkins (or other worker) definition:

### global_config.yaml
# Every tenant in the system has access to these jobs (because their
# tenant definition includes it).
- job:
    name: base
    timeout: 30m
    nodes: precise
    auth:
      inherit: true  # Child jobs may inherit these credentials
      swift:         # Swift usage may only be defined in config repo
        - container: logs
    workspace: /opt/workspace  # Where to place git repositories
    post-run:
      - archive-logs

Jobs have inheritance, and the above definition provides a base level of functionality for all jobs. It sets a default timeout, requests a single node (of type precise), and requests swift credentials to upload logs. For security, job credentials are not available to be inherited unless the ‘inherit’ flag is set to true. For example, a job to publish a release may need credentials to upload to a distribution site – users should not be able to subclass that job and use its credentials for another purpose.

Further jobs may extend and override the remaining parameters:

### global_config.yaml (continued)
# The python 2.7 unit test job
- job:
    name: python27
    parent: base
    nodes: trusty
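
One way to read the inheritance rules is sketched below in Python. It is an interpretation, not the proposed implementation: the resolve_job name is hypothetical, the job dictionaries mirror the YAML examples, and the sketch assumes that a parent's auth section is simply dropped from the child unless the parent sets inherit to true.

```python
def resolve_job(name, jobs):
    """Flatten a job and its ancestors into one dict; the child wins on conflicts."""
    job = jobs[name]
    parent_name = job.get("parent")
    if parent_name is None:
        return dict(job)
    resolved = resolve_job(parent_name, jobs)
    # Credentials only flow to the child when the parent opted in.
    parent_auth = resolved.pop("auth", None)
    if parent_auth and parent_auth.get("inherit"):
        resolved["auth"] = parent_auth
    resolved.update(job)
    return resolved


jobs = {
    "base": {
        "name": "base",
        "timeout": "30m",
        "nodes": "precise",
        "auth": {"inherit": True, "swift": [{"container": "logs"}]},
    },
    "python27": {"name": "python27", "parent": "base", "nodes": "trusty"},
}
print(resolve_job("python27", jobs))
```

Here python27 keeps the timeout and swift credentials from base but replaces the node configuration.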

Our use of job names specific to projects is a holdover from when we wanted long-lived slaves on Jenkins to efficiently re-use workspaces. This hasn’t been necessary for a while, though we have used this to our advantage when collecting stats and reports. However, job configuration can be simplified greatly if we simply have a job that runs the python 2.7 unit tests which can be used for any project. To the degree that we want to know how often this job failed on nova, we can add that information back in when reporting statistics. Jobs may have multiple aspects to accommodate differences among branches, etc.:

### global_config.yaml (continued)
# Version that is run for changes on stable/diablo
- job:
    name: python27
    parent: base
    branches: stable/diablo
    nodes:
      - name: controller
        image: ubuntu-lucid

# Version that is run for changes on stable/juno
- job:
    name: python27
    parent: base
    branches: stable/juno  # Could be combined into previous with regex
    nodes: precise         # if concept of "best match" is defined

Jobs may specify that they use other repos in the same tenant, and the launcher will ensure all of the named repos are in place at the start of the job:

### global_config.yaml (continued)
- job:
    name: devstack
    parent: base
    repos:
      - openstack/nova
      - openstack/keystone
      - openstack/glance

Jobs may specify that they require more than one node:

### global_config.yaml (continued)
- job:
    name: devstack-multinode
    parent: devstack
    nodes: multinode

Jobs may specify auth info:

### global_config.yaml (continued)
- job:
    name: pypi-upload
    parent: base
    auth:
      secrets:
        - pypi-credentials
        # This looks up the secrets bundle named 'pypi-credentials'
        # and adds it into variables for the job

Jobs may indicate that they may only be used by certain projects:

### shade.yaml (continued)
- job:
    name: shade-api-test
    parent: base
    allowed-projects:
      - openstack-infra/shade
    auth:
      secrets:
        - shade-cloud-credentials

Note that this job may not be inherited from because of the auth information.

Secrets

The auth attribute of a job provides a way to add authentication or authorization requirements to a job. Examples above include swift and secrets, though other systems may be added.

A secret is a collection of key/value pairs and is defined as a top-level configuration object:

### global_config.yaml (continued)
- secret:
    name: pypi-credentials
    data:
      username: !encrypted/pkcs1 o+7OscBFYWJh26rlLWpBIg==
      password: !encrypted/pkcs1 o+7OscBF8GHW26rlLWpBIg==

PKCS1 with RSAES-OAEP (implemented by the Python cryptography library) will be used so that the data are effectively padded. Since the encryption scheme is specified by a YAML tag (encrypted/pkcs1 in this case), this can be extended later.

Zuul will maintain a private/public keypair for each repository (config or project) specified in its configuration. It will look for the keypair in /var/lib/zuul/keys/<source name>/<repo name>.pem. If a keypair is needed but not available, Zuul will generate one. Zuul will serve the public keys using its web server so that users can download them for use in creating the encrypted secrets. It should be easy for an end user to encrypt a secret, whether that is with an existing tool such as OpenSSL or a new Zuul CLI.

There is a keypair for each repository so that users cannot copy a ciphertext from a given repo into a different repo that they control in order to coerce Zuul into decrypting it for them (since the private keys are different, decryption will fail).

It would still be possible for a user to copy a previously (or even currently) used secret in that same repo. Depending on how expansive and diverse the content of that repo is, that may be undesirable. However, this system allows for management of secrets to be pushed into repos where they are used and can be reviewed by people most knowledgeable about their use. By facilitating management of secrets by repo specialists rather than forcing secrets for unrelated projects to be centrally managed, this risk should be minimized.

Further, a secret may only be used by a job that is defined in the same repo as that secret. This prevents users from defining a job which requests unrelated secrets and exposes them.

In many cases, jobs which use secrets will be safe to use by any repository in the system (for example, a PyPI upload job can be applied to any repo because it does not execute untrusted code from that repo). However, in some cases, jobs that use secrets will be too dangerous to allow other repositories to use them (especially when those repositories may be able to influence the job and cause it to expose secrets). We should add a flag to jobs which indicates that they may only be used by certain projects (typically only the repo in which they are defined).

Pipelines may be configured to either allow or disallow the use of secrets with a new boolean attribute, ‘allow-secrets’. This is intended to avoid the exposure of secrets by a job which was subject to dynamic reconfiguration in a check pipeline. We would disable the use of secrets in our check pipelines so that no jobs with secrets could be configured to run in it. However, jobs which use secrets for pre-merge testing (for example, to perform live API testing on a public cloud) could still be run in the gate pipeline (which would only happen after human review verified they were safe), or an access restricted on-demand pipeline.
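
The three restrictions described in this section (same-repo secrets, allowed-projects, and the pipeline's allow-secrets attribute) can be combined into one check, sketched below. The function name and the flattened dictionary layout are hypothetical (the YAML above nests secrets under auth), and the sketch assumes a pipeline that omits allow-secrets does not permit them.

```python
def check_secret_use(job, secrets, pipeline, project):
    """Return None if the job may run for this project, else a reason string."""
    allowed = job.get("allowed-projects")
    if allowed is not None and project not in allowed:
        return "project is not in allowed-projects"
    requested = job.get("secrets", [])
    if not requested:
        return None
    if not pipeline.get("allow-secrets", False):
        return "pipeline does not allow secrets"
    for name in requested:
        # A secret may only be used by a job defined in the same repo.
        if secrets[name]["repo"] != job["repo"]:
            return "secret %s is defined in a different repo" % name
    return None


secrets = {"shade-cloud-credentials": {"repo": "openstack-infra/shade"}}
job = {
    "name": "shade-api-test",
    "repo": "openstack-infra/shade",
    "secrets": ["shade-cloud-credentials"],
    "allowed-projects": ["openstack-infra/shade"],
}
print(check_secret_use(job, secrets, {"allow-secrets": False}, "openstack-infra/shade"))
```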

Projects

Pipeline definitions are similar to the current syntax, except that they support specifying additional information for jobs in the context of a given project and pipeline. For instance, rather than specifying that a job is globally non-voting, you may specify that it is non-voting for a given project in a given pipeline:

### openstack.yaml
- project:
    name: openstack/nova
    gate:
      queue: integrated  # Shared queues are manually built
      jobs:
        - python27  # Runs version of job appropriate to branch
        - pep8:
            nodes: trusty  # override the node type for this project
        - devstack
        - devstack-deprecated-feature:
            branches: stable/juno  # Only run on stable/juno changes
            voting: false  # Non-voting
    post:
      jobs:
        - tarball:
            jobs:
              - pypi-upload

Project templates are still supported, and can modify job parameters in the same way described above.

Before Zuul executes a job, it finalizes the job content and parameters by incorporating input from the multiple job definitions which may apply. The job that will ultimately be run is a job which inherits from all of the matching job definitions in the order in which they were encountered in the configuration. This allows for increasingly specific job definitions. For example, a python unit test job may be defined globally. A variant of that job (with the same name) may be specified with an alternate node definition for “stable” branches. Further, a project-local job specification may indicate that job should only run when files in the “tests/” directory are modified. The result is that the job will only run when files in “tests/” are modified, and, if the change is on a stable branch, the alternate node definition will be used.
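
The layering described above can be sketched as a fold over the matching definitions in configuration order. The finalize_job name is hypothetical, and the sketch assumes the simplest possible branch matcher (exact string equality); the spec leaves the matching rules, such as regexes or "best match", open.

```python
def finalize_job(name, definitions, branch):
    """Merge every definition of `name` that applies to `branch`, in order."""
    final = {}
    for definition in definitions:
        if definition["name"] != name:
            continue
        branches = definition.get("branches")
        if branches is not None and branches != branch:
            continue  # this variant targets a different branch
        # Later definitions override earlier ones, attribute by attribute.
        final.update({k: v for k, v in definition.items() if k != "branches"})
    return final


definitions = [
    {"name": "python27", "parent": "base", "nodes": "trusty"},
    {"name": "python27", "parent": "base", "branches": "stable/juno", "nodes": "precise"},
    {"name": "python27", "files": "tests/"},  # project-local refinement
]
print(finalize_job("python27", definitions, "stable/juno"))
print(finalize_job("python27", definitions, "master"))
```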

Currently unique job names are used to build shared change queues. Since job names will no longer be unique, shared queues must be manually constructed by assigning them a name. Projects with the same queue name for the same pipeline will have a shared queue.
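
Grouping projects into shared queues by name could look like the sketch below. The build_shared_queues name is hypothetical, and the sketch assumes that a project which names no queue gets a queue of its own, keyed by the project name.

```python
from collections import defaultdict


def build_shared_queues(projects, pipeline):
    """Map queue name -> list of project names for one pipeline."""
    queues = defaultdict(list)
    for project in projects:
        config = project.get(pipeline)
        if config is None:
            continue  # the project does not participate in this pipeline
        queue_name = config.get("queue", project["name"])
        queues[queue_name].append(project["name"])
    return dict(queues)


projects = [
    {"name": "openstack/nova", "gate": {"queue": "integrated"}},
    {"name": "openstack/keystone", "gate": {"queue": "integrated"}},
    {"name": "stackforge/random", "gate": {}},
]
print(build_shared_queues(projects, "gate"))
```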

A subset of functionality is available to projects that are permitted to use in-repo configuration:

### stackforge/random/.zuul.yaml
- job:
    name: random-job
    parent: base      # From global config; gets us logs
    nodes: precise

- project:
    name: stackforge/random
    gate:
      jobs:
        - python27    # From global config
        - random-job  # From local config

Ansible

The actual execution of jobs will continue to be distributed to workers over Gearman. Therefore, the actual implementation of how jobs are executed will remain pluggable; however, the zuul-gearman protocol will need to change. Because the system needs to perform coordinated tasks on one or more remote systems, the initial implementation of the workers will use Ansible, which is particularly suited to that job.

The executable content of jobs should be defined as ansible playbooks. Playbooks can be fairly simple and might consist of little more than “run this shell script” for those who are not otherwise interested in ansible:

### stackforge/random/playbooks/random-job.yaml
---
hosts: controller
tasks:
  - shell: run_some_tests.sh

Global jobs may define ansible roles for common functions:

### openstack-infra/zuul-playbooks/python27.yaml
---
hosts: controller
roles:
  - tox:
      env: py27

Because ansible has well-articulated multi-node orchestration features, this permits very expressive job definitions for multi-node tests. A playbook can specify different roles to apply to the different nodes that the job requested:

### openstack-infra/zuul-playbooks/devstack-multinode.yaml
---
hosts: controller
roles:
  - devstack
---
hosts: compute
roles:
  - devstack-compute

Additionally, if a project is already defining ansible roles for its deployment, then those roles may be easily applied in testing, making CI even closer to CD.

The pre- and post-run entries in the job definition might also apply to ansible playbooks and can be used to simplify job setup and cleanup:

### openstack-infra/zuul-playbooks/archive-logs.yaml
---
hosts: all
roles:
  - archive-logs: "/opt/workspace/logs"

All of the content of Ansible playbooks is held in the git repositories that Zuul operates on, and this is true for some of the Ansible roles as well, though some playbooks will require roles that are defined outside of this system. Because the content of roles must be already present on the host executing a playbook, Zuul will need to be able to prepare these roles prior to executing a job. To facilitate this, job definitions may also specify role dependencies:

### global_config.yaml (continued)
- job:
    name: ansible-nova
    parent: base
    roles:
      - zuul: openstack-infra/infra-roles
      - galaxy: openstack.nova
        name: nova

This would instruct Zuul to prepare the execution context with roles collected from the Zuul-managed “infra-roles” repository, as well as the “openstack.nova” role from Ansible Galaxy. An optional “name” attribute will cause the role to be placed in a directory with that name so that the role may be referenced by it. When constructing a job using inheritance, roles for the child job will extend the list of roles from the parent job (this is intended to make it simple to ensure that all jobs have a basic set of roles available).

If a job references a role in a Zuul-managed repo, the usual dependency processing will apply (so that jobs can run with un-merged changes in other repositories).

A Zuul repository might be a bare single-role repository (e.g., ansible-role-puppet), or it might be a repository which contains multiple roles (e.g., infra-roles, or even project-config). Zuul should detect these cases and handle them accordingly.

  • If a repository appears to be a bare role (has tasks/, vars/, etc. directories at the root of the repo), the directory containing the repo checkout (which should otherwise be empty) should be added to the roles_path Ansible configuration value.
  • If a repository has a roles/ directory at the root, the roles/ directory within the repo should be added to roles_path.
  • Otherwise, the root of the repository should be added to the roles path (under the assumption that individual directories in the repository are roles).
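
The three detection rules above can be sketched as a small function. The roles_path_for name is hypothetical, and the set of directory names used to recognize a bare role is an assumption that fills in the "etc." with Ansible's conventional role subdirectories.

```python
import os

# Conventional top-level directories of an Ansible role (assumed marker set).
BARE_ROLE_DIRS = ("tasks", "vars", "handlers", "defaults", "meta")


def roles_path_for(repo_dir):
    """Return the directory to add to Ansible's roles path for a repo checkout."""
    def has_dir(name):
        return os.path.isdir(os.path.join(repo_dir, name))

    if any(has_dir(name) for name in BARE_ROLE_DIRS):
        # Bare role: expose the directory that contains the checkout.
        return os.path.dirname(os.path.abspath(repo_dir))
    if has_dir("roles"):
        return os.path.join(repo_dir, "roles")
    # Otherwise assume each top-level directory in the repo is a role.
    return repo_dir
```

The bare-role check runs first so that a single-role repository is handled as one role rather than as a collection of roles.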

In the future, Zuul may support reading Ansible requirements.yaml files to determine roles needed for jobs.

Execution

A new Zuul component would be created to execute jobs. Rather than running a worker process on each node (which requires installing software on the test node, and establishing and maintaining network connectivity back to Zuul, and the ability to coordinate actions across nodes for multi-node tests), this new component will accept jobs from Zuul and, for each one, write an ansible inventory file with the node and variable information, and then execute the ansible playbook for that job. This means that the new Zuul component will maintain ssh connections to all hosts currently running a job. This could become a bottleneck, but ansible and ssh have been known to scale to a large number of simultaneous hosts, and this component may be scaled horizontally. It should be simple enough that it could even be automatically scaled if needed. In turn, however, this does make node configuration simpler (test nodes need only have an ssh public key installed) and makes tests behave more like deployment.
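
To make the inventory step concrete, the sketch below renders an INI-style Ansible inventory from the nodes assigned to a job, so that a playbook's "hosts: controller" line resolves to the right machine. The render_inventory name and the node dictionary layout are hypothetical, and the addresses are documentation placeholders.

```python
def render_inventory(nodes):
    """Render one inventory line per node: '<name> ansible_host=<address>'."""
    return "".join(
        "{} ansible_host={}\n".format(node["name"], node["address"])
        for node in nodes
    )


nodes = [
    {"name": "controller", "address": "203.0.113.10"},
    {"name": "compute", "address": "203.0.113.11"},
]
print(render_inventory(nodes), end="")
```

The node names come from the job's nodeset, which is what lets a multi-node playbook address each machine by role rather than by IP address.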

To support the use case where the Zuul control plane should not be accessible by the workers (for instance, because the control plane is on a private network while the workers are in a public cloud), the direction of transfer of changes under test to the workers will be reversed.

Instead of workers fetching from zuul-mergers, the new zuul-launcher will take on the task of calculating merges as well as running ansible.

Continuous Deployment

Special consideration is needed in order to use Zuul to drive continuous deployment of development or production systems. Rather than specifying that Zuul should obtain a node from nodepool in order to run a job, it may be configured to simply execute an ansible task on a specified host:

- job:
    name: run-puppet-apply
    parent: base
    host: review.openstack.org
    fingerprint: 4a:28:cb:03:6a:d6:79:0b:cc:dc:60:ae:6a:62:cf:5b

Because any configuration of the host and credential information is potentially accessible to anyone able to read the Zuul configuration (which is everyone for OpenStack’s configuration) and therefore could be copied to their own section of Zuul’s configuration, users must add one of two public keys to the server in order for the job to function. Zuul will generate an SSH keypair for every tenant as well as every project. If a user trusts anyone able to make configuration changes to their tenant, then they may use Zuul’s public key for their tenant. If they are only able to trust their own project configuration in Zuul, they may add Zuul’s public key for that specific project. Zuul will make all public keys available at known HTTP addresses so that users may retrieve them. When executing such a job, Zuul will try the project and tenant SSH keys in order.

Tenant Isolation

In order to prevent users of one Zuul tenant from accessing the git repositories of other tenants, Zuul will no longer consider the git repositories it manages to be public. This could be solved by passing credentials to the workers for them to use when fetching changes; however, an additional consideration is the desire to have workers fully network isolated from the Zuul control plane.

Instead of workers fetching from zuul-mergers, the new zuul-launcher will take on the task of calculating merges as well as running ansible. The launcher will then be responsible for placing prepared versions of requested repositories onto the worker.

Status reporting will also be tenant isolated; however, without HTTP-level access controls, additional measures may be needed to prevent tenants from accessing the status of other tenants. Eventually, Zuul may support an authenticated REST API that will solve this problem natively.

Alternatives

Continuing with the status quo is an alternative, as well as continuing the process of switching to Turbo Hipster to replace Jenkins. However, this addresses only some of the goals stated at the top.

Implementation

Assignee(s)

Primary assignee:
  • corvus
Also:
  • jhesketh
  • mordred

Gerrit Branch

Nodepool and Zuul will both be branched for development related to this spec. The “master” branches will continue to receive patches related to maintaining the current versions, and the “feature/zuulv3” branches will receive patches related to this spec. The .gitreview files will be updated to submit to the correct branches by default.

Work Items

  • Modify nodepool to support new allocation and distribution (mordred)
  • Modify zuul to support new syntax and isolation (corvus)
  • Create zuul launcher (jhesketh)
  • Prepare basic infra ansible roles
  • Translate OpenStack JJB config to ansible

Repositories

We may create new repositories for ansible roles, or they may live in project-config.

Servers

We may create more combined zuul-launcher/mergers.

DNS Entries

No changes other than needed for additional servers.

Documentation

This will require changes to Nodepool and Zuul’s documentation, as well as infra-manual.

Security

No substantial changes to security around the Zuul server; use of Zuul private keys for access to remote hosts by Zuul has security implications but will not be immediately used by OpenStack Infrastructure.

Testing

Existing Nodepool and Zuul tests will need to be adapted. Configuration will be different; however, much functionality should be the same, so many functional tests should have direct equivalents.

Dependencies

None.