Support for partition preservation during rollback

https://blueprints.launchpad.net/fuel/+spec/rollback-partition-preservation

Support for partition preservation.

Problem description

First:

Currently Fuel performs full re-provisioning while performing rollback which implies data deletion. For instance, it doesn’t preserve virtual machine image files, compute node log files, database files. That said it loses valuable data during rollback. So operator has to manually backup valuable data before rollback.

Proposed change

The idea is to preserve certain partitions on the nodes during rollback while fully reformat others. For instance, keep partition /var/lib/nova/instances intact but create filesystem on / (root) from the scratch.

Proposed features

  • Allow keeping certain partitions with valuable data on the nodes during provision.
  • Allow configuring a set of partitions to preserve using standard fuel CLI commands to download and modify disk.yaml.

Implementation model

General overview

On data preservation following use cases were identified: 1) keep Ceph data 2) keep Swift data 3) keep Nova instances cache 4) keep DB/logs/other custom partition types

Architectural design

First off, here’s how partitioning in Fuel 6.x works right now. There are two components - VolumeManager (on nailgun’s side) and Fuel agent (a process inside bootstrap OS). Fuel agent is quite straightforward - it just executes orders from VolumeManager. Fuel agent knows nothing about partitioning layout, it doesn’t contain any business logic. So it is VolumeManager which contains that business logic.

The idea of Rollback Paritition Preservation is to keep data (vm images, logs, db files, etc) during reprovisioning. So in terms of partitioning instead of fully deleting partitions and creating a new partition table Fuel will preserve data on some specific partitions (not all) while it still will reformat others (like root partition).

Preserved partition must be located as separate mountpoints Mountpont should be added to the openstack.yaml in case when specific mountpoint doesn’t exist

Workflow will be as following:

  1. Add ‘keep’ or similar boolean flag to node disks settings in Nailgun API (i.e. disks.yaml):

    fuel node --node 1 --disk --download

    cat node_1/disks.yaml
    
    - extra:
      - disk/by-id/scsi-SATA_QEMU_HARDDISK_QM00001
      - disk/by-id/ata-QEMU_HARDDISK_QM00001
      id: disk/by-path/pci-0000:00:01.1-scsi-0:0:0:0
      name: sda
      size: 101836
      volumes:
      - name: os
        size: 101836
        keep: true
    

    then upload modified disk.yaml

    fuel node --node 1 --disk --upload

  2. Mcollective agent (erase_node.rb module in Mcollective) should be modified to disable erasing partitions with ‘keep: True’ when node is deleted from environment.

  3. Fuel agent should be modified to be able to preserve partition

  4. Fuel agent should save partition table if there is a partition with ‘keep’ flag set into true. On the Nailgun side all changes and operations of partition table should be not be accessible.

  5. Not preserved partition should be erased using new File System.

Delivery details

In context of this blueprint VolumeManager will be modified in order to handle new partition preservation parameter. Also, Fuel agent will be potentially changed, but those changes will be minor. In addition to VolumeManager and Fuel agent there will be small changes in nailgun in order to pass keep flag (disk.yam) to VolumeManager.

Alternatives

The alternative approach would be copying valuable data back and forth before and after the rollback. But that would drastically increase time needed for rollback.

REST API impact

As this blueprint is all about adding new configurable behaviour to node reinstallation feature - it introduces parameter to disk.yaml to REST API which performs default value as False.

Data model impact

None

Upgrade impact

This feature allows to speed up upgrade process of Compute and Storage nodes due to no need to move huge amount of data. This helps to minimize impact of upgrade procedure on end user workloads running in the cloud.

Security impact

Partition preservation must be disabled when node is being decommissioned from environment. Otherwise, user data remaining on preserved partitions can pose a security risk.

Notifications impact

None

Other end user impact

None

Performance Impact

This blueprint itself is about boosting speed of rollback and migration operations

Plugin impact

None

Other deployer impact

This feature can affect services which use files from preserved partition. In this case puppet manifests should be modified and conform this feature All changes for this services should be described in corresponding specs.

Developer impact

None

Implementation

Assignee(s)

Primary Assignee:
 Ivan Ponomarev
QA:Veronika Krayneva
Documentation:Peter Zhurba, Dmitry Klenov
Reviewer:Vladimir Kuklin, Vladimir Kozhukalov

Work Items

  1. Pass preserve partitions parameter from disk.yaml to Nailgun (VolumeManager)
  2. Adapt VolumeManager to take partition preservation flag and generate appropriate partition layout for Fuel agent
  3. Adapt fuel-agent/manager taking into account preserved partitions

Testing

Reinstall single compute on HW with partition preservation:

  1. Enable partition preservation in disks settings (disks.yaml) of the compute
  2. Do reinstallation of the compute
  3. Run OSTF tests set
  4. Run Network check
  5. Check data on partitions
  6. Check availability preserved VM’s

Reinstall single controller on HW with partition preservation

  1. Enable partition preservation in disks settings (disks.yaml) of the controller
  2. Do reinstallation of the controller
  3. Run OSTF tests set
  4. Run Network check
  5. Check data on partitions
  6. Check services data that have been preserved Services should normally works using preserved data

Documentation Impact

Documentation should be improved with information about Partition Preservation options.