Handle sparse images¶
https://blueprints.launchpad.net/glance-store/+spec/handle-sparse-image
Some drivers, such as rbd and filesystem, support sparse images: null byte sequences are not actually written, only the data itself at a given offset. The resulting “holes” are automatically interpreted by the storage backend as null bytes and do not consume any storage.
Problem description¶
Glance deals with instance images, which are largely composed of null byte sequences that pad the image out to the full disk size of the instance. For example, the 8GB base CentOS 7 cloud image contains 1GB of data for 7GB of holes, so handling sparse images would significantly reduce storage usage and upload time.
The current implementation of the rbd and filesystem drivers relies on the utils.chunkreadable function, which splits the file to import into blocks of CHUNK_SIZE. These blocks are then written directly to the backend, whatever their content, and the offset is incremented by the size of the chunk.
Here is an example for a ceph backend with a standard CentOS 7 cloud image using Glance:
$ rbd du 9b86961e-6bf3-4d0d-99dc-7c762fe6881d
NAME PROVISIONED USED
9b86961e-6bf3-4d0d-99dc-7c762fe6881d@snap 8 GiB 8 GiB
9b86961e-6bf3-4d0d-99dc-7c762fe6881d 8 GiB 0 B
<TOTAL> 8 GiB 8 Gi
$ rbd export 9b86961e-6bf3-4d0d-99dc-7c762fe6881d /tmp/Centos7full.raw
$ md5sum /tmp/Centos7full.raw
aae49f6f57aecb9774f399149a0b7f35 /tmp/Centos7full.raw
And here is the result when the same image is uploaded with qemu-img convert or rbd import:
$ rbd du 437e8de0-b897-4846-96aa-aff70cd8794c
NAME PROVISIONED USED
437e8de0-b897-4846-96aa-aff70cd8794c@snap 8 GiB 1008 MiB
437e8de0-b897-4846-96aa-aff70cd8794c 8 GiB 0 B
<TOTAL> 8 GiB 1008 MiB
$ rbd export 437e8de0-b897-4846-96aa-aff70cd8794c /tmp/Centos7sparse.raw
$ md5sum /tmp/Centos7sparse.raw
aae49f6f57aecb9774f399149a0b7f35 /tmp/Centos7sparse.raw
We can see here that the checksum of the downloaded file stays the same whether the stored image is sparse or not, so sparseness should have no impact on file integrity. In both cases, the glance image-download command will produce a non-sparse file, because the download process just reads the file from the backend chunk after chunk, so null byte sequences are read whether the image is sparse or not.
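The checksum observation can be reproduced locally with a small sketch. It assumes nothing about Glance itself: the file names, sizes and the md5_of helper are illustrative only. One file is written with explicit null bytes, the other is extended with truncate so that the filesystem may store a hole, and both are hashed chunk by chunk.

```python
import hashlib
import os
import tempfile

SIZE = 8 * 1024 * 1024   # logical size of each test file (illustrative)
DATA = b"glance" * 1024  # the only non-null bytes in the file


def md5_of(path, chunk_size=64 * 1024):
    """Hash a file chunk by chunk, the way a download would read it."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_full(path):
    """Write the data followed by explicit null bytes."""
    with open(path, "wb") as f:
        f.write(DATA)
        f.write(b"\0" * (SIZE - len(DATA)))


def write_sparse(path):
    """Write only the data, then extend the file so the tail can be a hole."""
    with open(path, "wb") as f:
        f.write(DATA)
        f.truncate(SIZE)


with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full.raw")
    sparse = os.path.join(tmp, "sparse.raw")
    write_full(full)
    write_sparse(sparse)
    full_md5, sparse_md5 = md5_of(full), md5_of(sparse)
    sizes = (os.path.getsize(full), os.path.getsize(sparse))

print(full_md5 == sparse_md5, sizes)
```

Whether the second file really occupies less space depends on the filesystem, but the logical content, and therefore the checksum, is identical in both cases.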
Proposed change¶
There are two successive optimizations we can make to achieve the same result as other import tools like qemu-img:
- Do not write null byte sequences inside chunks (write optimization)
- Rely on filesystem instructions to skip holes (read optimization)
A new configuration option, enable_thin_provisioning, will be added to the rbd and filesystem backends so that operators can switch it on or off. Enabling it will enable both the read and write optimizations.
Do not write null byte sequences inside chunks¶
This first optimization works in all cases, whether or not the image file is sparse; it is the behaviour implemented in qemu-img. It consists of checking whether the chunk that was read is composed only of null bytes. If so, the offset is simply increased without writing any data to the store.
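The write optimization can be sketched as follows. This is a minimal illustration, not the driver code: FakeSparseBackend and sparse_write are hypothetical names, the backend is an in-memory stand-in for a store that accepts writes at an offset, and the chunk size is tiny so the result is easy to follow.

```python
import io

CHUNK_SIZE = 4  # tiny on purpose; real chunk sizes are in the MB range


class FakeSparseBackend:
    """Hypothetical stand-in for a store that accepts writes at an offset."""

    def __init__(self):
        self.extents = {}  # offset -> bytes actually written
        self.size = 0

    def write(self, data, offset):
        self.extents[offset] = data

    def resize(self, size):
        self.size = size

    def read_all(self):
        """Rebuild the full content; unwritten ranges read back as nulls."""
        buf = bytearray(self.size)
        for offset, data in self.extents.items():
            buf[offset:offset + len(data)] = data
        return bytes(buf)


def sparse_write(source, backend, chunk_size=CHUNK_SIZE):
    """Copy source to backend, skipping chunks made only of null bytes."""
    offset = 0
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        if chunk.count(b"\0") != len(chunk):  # at least one non-null byte
            backend.write(chunk, offset)
            written += len(chunk)
        offset += len(chunk)  # the offset advances either way
    backend.resize(offset)
    return offset, written


image = b"boot" + b"\0" * 8 + b"data" + b"\0" * 4
backend = FakeSparseBackend()
total, written = sparse_write(io.BytesIO(image), backend)
print(total, written, sorted(backend.extents))  # 20 8 [0, 12]
```

Only the two chunks that contain data reach the backend, yet reading the backend back yields the original 20 bytes, because unwritten ranges are returned as null bytes.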
Rely on filesystem instructions to skip holes¶
This second optimization relies on the SEEK_HOLE and SEEK_DATA options of the lseek syscall, available since kernel 3.8 and Python 3.3. It consists of skipping holes directly, without even reading the null byte sequences, which can be very long in a large image such as an appliance (hundreds of GB). As it relies on a Linux kernel syscall, nodes running an older Linux kernel or Windows will just skip the optimization and work as before.
This second optimization can only work when the image file is actually seen as sparse by the filesystem, so the image needs to be converted “in-place” to a raw file on the staging store by the convert plugin of the import workflow. Otherwise, for example when a raw file is sent directly through the Glance REST API, the filesystem of the staging store won’t be aware of the holes.
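A possible shape for the read optimization, using os.lseek with os.SEEK_DATA and os.SEEK_HOLE from the Python standard library, is sketched below. The data_segments and read_sparse helpers are hypothetical names, and the fallback (treat the whole file as data when the options are unavailable or rejected) is an assumption about how "work like before" could be implemented.

```python
import errno
import os
import tempfile


def data_segments(fd, size):
    """Yield (offset, length) for each data extent of an open file.

    Falls back to one extent covering the whole file when the platform
    or filesystem does not support SEEK_DATA/SEEK_HOLE.
    """
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        if size:
            yield 0, size
        return
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a hole remains until EOF
                return
            if offset == 0:  # not supported here: treat everything as data
                yield 0, size
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end - start
        offset = end


def read_sparse(path):
    """Return (size, [(offset, data), ...]), reading only the data extents."""
    size = os.path.getsize(path)
    extents = []
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        for start, length in data_segments(fd, size):
            os.lseek(fd, start, os.SEEK_SET)
            data = b""
            while len(data) < length:  # os.read may return short reads
                piece = os.read(fd, length - len(data))
                if not piece:
                    break
                data += piece
            extents.append((start, data))
    finally:
        os.close(fd)
    return size, extents


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.raw")
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(4 * 1024 * 1024)  # leaves a hole where the filesystem allows it
        f.write(b"tail")
    size, extents = read_sparse(path)

rebuilt = bytearray(size)
for start, data in extents:
    rebuilt[start:start + len(data)] = data
print(size, sum(len(data) for _, data in extents))
```

On a filesystem that tracks holes, the number of bytes read is far smaller than the file size; elsewhere the whole file is read. In both cases the rebuilt content matches the original.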
Alternatives¶
None
Data model impact¶
None
REST API impact¶
None
Security impact¶
None
Notifications impact¶
None
Other end user impact¶
None
Performance Impact¶
Write optimization¶
These tests were run against 2 rbd backends, with images sent through the web-download image-import workflow with raw conversion enabled.
For an 8GB CentOS qcow2:
| Chunk size | 8MB | 32MB | 64MB |
|---|---|---|---|
| Time without sparse upload | 3min31 | 3min26 | 3min28 |
| Time with sparse upload | 1min59 | 1min58 | 2min04 |
| Difference | -44% | -43% | -40% |
| Storage used without sparse upload | 8 GiB/8 GiB | 8 GiB/8 GiB | 8 GiB/8 GiB |
| Storage used with sparse upload | 1.0 GiB/8 GiB | 1.0 GiB/8 GiB | 1.0 GiB/8 GiB |
| Difference | -88% | -88% | -88% |
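The percentages in the 8GB results can be recomputed from the raw figures. The sketch below is only a convenience for checking the arithmetic; to_seconds and change_percent are hypothetical helper names, and the duration format is the one used in these results.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '3min31' or '4h' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)min)?(\d+)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def change_percent(before, after):
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (time without sparse upload, time with sparse upload) per chunk size
timings = {
    "8MB": ("3min31", "1min59"),
    "32MB": ("3min26", "1min58"),
    "64MB": ("3min28", "2min04"),
}
changes = {
    size: change_percent(to_seconds(before), to_seconds(after))
    for size, (before, after) in timings.items()
}
print(changes)  # {'8MB': -44, '32MB': -43, '64MB': -40}
```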
For a 200GB CentOS qcow2:
| Chunk size | 8MB |
|---|---|
| Time without sparse upload | 4h |
| Time with sparse upload | 41min11 |
| Difference | -83% |
| Storage used without sparse upload | 200 GiB/200 GiB |
| Storage used with sparse upload | 5.8 GiB/200 GiB |
| Difference | -97% |
Read optimization¶
The following tests were done by reading the data of a CentOS 7 image file:
| | CentOS 8GB Qcow2 | CentOS 8GB RAW | CentOS 100GB Qcow2 | CentOS 100GB RAW |
|---|---|---|---|---|
| Read all file (including holes) | 0m3.964s | 0m16.746s | 0m4.666s | 3m4.003s |
| Read only data (skip holes) | 0m2.662s | 0m4.686s | 0m3.916s | 0m4.425s |
| Difference | -32.8% | -72.0% | -16.1% | -97.6% |
The gain for Qcow2 images tends to be negligible: Qcow2 images do not have holes, so reading them should be very fast in all cases. The point here is to show that there is no negative impact for Qcow2 images and a huge positive one for raw images, so this behaviour can be applied in all cases.
Other deployer impact¶
The new enable_thin_provisioning configuration option for the rbd and filesystem stores must be enabled by the operator. Without this option, behaviour stays the same as before.
As this configuration option is per store, it is possible in a multi-store environment to choose on which store it will be enabled.
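As an illustration of the per-store setting, the sketch below parses a hypothetical configuration excerpt with Python's configparser and reports which stores have the option enabled. The store names ceph_fast and local_fs, the layout of the excerpt, and the thin_provisioning_by_store helper are all assumptions made for the example; only the option name enable_thin_provisioning comes from this spec, and the drivers themselves would read it through their own configuration machinery rather than this code.

```python
import configparser

# Hypothetical multi-store excerpt: store names and layout are assumptions.
SAMPLE_CONF = """
[DEFAULT]
enabled_backends = ceph_fast:rbd, local_fs:file

[ceph_fast]
enable_thin_provisioning = True

[local_fs]
enable_thin_provisioning = False
"""


def thin_provisioning_by_store(conf_text):
    """Map each configured store name to its thin provisioning setting."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for entry in parser["DEFAULT"]["enabled_backends"].split(","):
        name, _, _store_type = entry.strip().partition(":")
        # A store without the option keeps the previous behaviour.
        result[name] = parser.getboolean(
            name, "enable_thin_provisioning", fallback=False)
    return result


settings = thin_provisioning_by_store(SAMPLE_CONF)
print(settings)  # {'ceph_fast': True, 'local_fs': False}
```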
Developer impact¶
None, as these optimizations are handled inside the drivers themselves and should not change their interfaces.
Implementation¶
Assignee(s)¶
- Primary assignee:
alistarle
- Other contributors:
yebinama
Work Items¶
Update the drivers that can handle sparse images: filesystem and rbd.
Dependencies¶
None
Testing¶
Testing that there is no functional regression for the modified drivers.
Testing that it does not have a negative impact on systems where the SEEK_DATA/SEEK_HOLE options are not available.
Documentation Impact¶
Document the new configuration option enable_thin_provisioning for the rbd and filesystem drivers.
References¶
Original ceph.io article that describes these optimizations: https://ceph.io/planet/importing-an-existing-ceph-rbd-image-into-glance/
Initial abandoned patch in glance_store: https://review.opendev.org/#/c/430641/
Python implementation of SEEK_HOLE/SEEK_DATA syscall: https://bugs.python.org/issue10142