Quiescing filesystems with QEMU guest agent during image snapshotting¶
When the QEMU Guest Agent is installed in a KVM instance, Nova can ask the instance, via libvirt, to freeze its filesystems during snapshotting so that the snapshot is consistent.
Currently, filesystems must be quiesced (fsfreeze) manually before snapshotting an image of an active instance in order to create a consistent backup. This should be automated when the QEMU Guest Agent is enabled.
(Quiescing on Cinder’s create-snapshot API is covered by another proposal [1].)
With this feature, users can create a snapshot image with a consistent filesystem state while the instance is running (fsck will not run when the snapshot image is booted).
This is useful when:
taking a quick backup before installing or upgrading software.
automatically taking backup images every night.
When the QEMU Guest Agent is enabled in an instance, the nova-compute libvirt driver will request the agent to freeze the filesystems (and applications, if fsfreeze-hook is installed) before taking a snapshot of the image.
For boot-from-volume instances, Nova will call Cinder’s snapshot-create API for every attached volume after quiescing the instance. To avoid double quiescing, Nova should tell Cinder not to quiesce the instance on snapshot. For this purpose, a ‘quiesce=True|False’ parameter will be added to Cinder’s snapshot-create API.
After taking snapshots, the driver will request the agent to thaw the filesystems.
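The freeze, snapshot, thaw sequence described above can be sketched as a context manager. This is a minimal illustration, not the driver's actual code: `quiesced`, `snapshot_with_quiesce` and `FakeDomain` are hypothetical names, and the domain object is only assumed to offer `fsFreeze()` and `fsThaw()` methods mirroring the libvirt API named in the prerequisites.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(domain):
    """Freeze guest filesystems for the duration of the block, then thaw.

    `domain` is assumed to expose fsFreeze()/fsThaw(); any object with
    those two methods works for this sketch.
    """
    domain.fsFreeze()
    try:
        yield domain
    finally:
        # Always thaw, even if the snapshot step raised, so the guest
        # is never left with blocked writes.
        domain.fsThaw()


def snapshot_with_quiesce(domain, take_snapshot):
    """Run the hypothetical take_snapshot() callable while frozen."""
    with quiesced(domain):
        return take_snapshot()


class FakeDomain:
    """Stand-in for a libvirt domain that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def fsFreeze(self):
        self.calls.append("freeze")

    def fsThaw(self):
        self.calls.append("thaw")
```

Placing the thaw in a `finally` clause keeps the window in which guest writes are blocked as short as the snapshot itself, regardless of how the snapshot step ends.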
The prerequisites of this feature are:
the hypervisor is ‘qemu’ or ‘kvm’
libvirt >= 1.2.5 (which has fsFreeze/fsThaw API) is installed in the hypervisor
the ‘hw_qemu_guest_agent=yes’ and ‘hw_require_fsfreeze=yes’ properties are set in the image metadata, and the QEMU Guest Agent is installed and enabled in the instance
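As a rough illustration, the host-side prerequisites listed above could be checked as follows. The function name `can_quiesce` and its argument shapes (a version tuple and a plain dict of image properties) are assumptions made for this sketch; whether the agent is actually installed and running inside the guest cannot be determined from metadata and is not checked here.

```python
MIN_LIBVIRT_FSFREEZE_VERSION = (1, 2, 5)


def can_quiesce(virt_type, libvirt_version, image_properties):
    """Return True if automatic quiescing should be attempted.

    virt_type: hypervisor type string, e.g. "kvm" or "qemu".
    libvirt_version: (major, minor, release) tuple (assumed format).
    image_properties: dict of image metadata properties.
    """
    if virt_type not in ("qemu", "kvm"):
        return False
    if tuple(libvirt_version) < MIN_LIBVIRT_FSFREEZE_VERSION:
        return False
    # Both image properties must be explicitly set to "yes".
    return (
        image_properties.get("hw_qemu_guest_agent") == "yes"
        and image_properties.get("hw_require_fsfreeze") == "yes"
    )
```

For example, `can_quiesce("kvm", (1, 2, 5), {"hw_qemu_guest_agent": "yes", "hw_require_fsfreeze": "yes"})` returns `True`, while the same call with libvirt version `(1, 2, 4)` returns `False`.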
If quiescing fails even though these conditions are satisfied (e.g. the agent is not responding), snapshotting may fail with an exception so that an inconsistent snapshot is not produced.
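One possible reading of this failure behaviour is sketched below: when quiescing is required and the freeze request fails, the snapshot is aborted before any image is taken. `QuiesceFailed` and `snapshot_or_fail` are hypothetical names, and the broad `except Exception` stands in for whatever error the real libvirt binding would raise.

```python
class QuiesceFailed(Exception):
    """Raised when a required filesystem freeze could not be performed."""


def snapshot_or_fail(domain, take_snapshot, require_quiesce):
    """Take a snapshot, refusing to proceed if a required freeze fails.

    domain: object assumed to expose fsFreeze()/fsThaw().
    take_snapshot: hypothetical callable that performs the snapshot.
    require_quiesce: True when the prerequisites for quiescing are met.
    """
    frozen = False
    if require_quiesce:
        try:
            domain.fsFreeze()
        except Exception as exc:  # assumption: any error means "not frozen"
            # Abort rather than produce an inconsistent snapshot.
            raise QuiesceFailed("guest agent did not freeze filesystems") from exc
        frozen = True
    try:
        return take_snapshot()
    finally:
        if frozen:
            domain.fsThaw()
```

When `require_quiesce` is false, the sketch simply takes the snapshot without contacting the agent.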
An alternative is to rewrite Nova’s snapshotting to use libvirt’s domain.createSnapshot API with the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag. However, this would change the current naming scheme of disk images, and it cannot be leveraged to implement live snapshots of Cinder volumes.
Data model impact¶
REST API impact¶
Other end user impact¶
While taking snapshots, disk writes from the instance are blocked.
Other deployer impact¶
- Primary assignee:
Implement the automatic quiesce during snapshotting when it is available.
Add a quiesced snapshotting scenario test with libvirt >= 1.2.5. (The Fedora experimental queue would be a good place to start testing.)
Live snapshotting of an instance booted from an image with qemu-guest-agent should be added to Tempest. Note that this requires an environment with libvirt >= 1.2.5, so it would be a Fedora experimental queue job with the virt-preview repository enabled.
How to use this feature needs to be documented in the operations guide (which currently recommends using the fsfreeze tool manually).