Secure Hash Algorithm Support
https://blueprints.launchpad.net/glance/+spec/multihash
This spec provides a future-proof mechanism for providing a secure SHA512 hash for each image.
Problem description
The hash of an image can be obtained from the checksum field in the image metadata. Currently, this checksum field holds an MD5 hexdigest value. While MD5 can still be used to verify data integrity against unintentional corruption, it should not be used to verify data integrity against tampering. A more collision-resistant hashing algorithm should be used for broader data integrity coverage.
Proposed change
This spec proposes adding two new fields to the image metadata. The proposed keys for this change are os_hash_algo and os_hash_value.
The proposed value is a hash formatted as a sha512 hexdigest, obtained using the current Python hashlib library. os_hash_algo will be sha512 and os_hash_value will be the hexdigest. At this point in time, the config will default to sha512, but this change will allow us to alter the allowed hashing algorithms at any time. The os_hash_algo field will also allow users to pass the algorithm name dynamically to their Python code.
We will leave the MD5 calculations and checksum in place for backwards compatibility.
Adding a new image will set all of the checksum, os_hash_algo and os_hash_value fields.
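As an illustration of how the three fields could be populated in a single pass over the image data, here is a minimal sketch. The helper names compute_image_hashes and read_chunks are hypothetical and are not taken from the Glance code base; the sketch only assumes that the algorithm name stored in os_hash_algo is one that hashlib.new() accepts.

```python
import hashlib
import io


def read_chunks(fileobj, chunk_size=65536):
    """Yield fixed-size chunks from a binary file object until EOF."""
    return iter(lambda: fileobj.read(chunk_size), b"")


def compute_image_hashes(chunks, hash_algo="sha512"):
    """Hypothetical helper: compute the legacy MD5 checksum and the new
    secure hash over the same stream of byte chunks."""
    md5 = hashlib.md5()
    secure = hashlib.new(hash_algo)
    for chunk in chunks:
        md5.update(chunk)
        secure.update(chunk)
    return {
        "checksum": md5.hexdigest(),          # kept for backwards compatibility
        "os_hash_algo": hash_algo,            # name usable with hashlib.new()
        "os_hash_value": secure.hexdigest(),  # hexdigest of the chosen algorithm
    }


fields = compute_image_hashes(read_chunks(io.BytesIO(b"example image data")))
print(fields["os_hash_algo"], len(fields["os_hash_value"]))
```

Feeding both hash objects from the same loop means the data is read only once, so the extra cost is the sha512 computation itself rather than a second pass over the image.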
Requesting the metadata of an image that does not have os_hash_algo/os_hash_value set will return null for the os_hash_algo/os_hash_value fields in the metadata response.
Alternatives
* Update the value of the checksum field with the hexdigest of a different hashing algorithm. This will likely break client-side checksum verification code.
* Add only the new os_hash_value field. While it is a simpler change, it is a less future-proof approach because it ties the sha512 algorithm to the os_hash_value property in the same way that the md5 algorithm is tied to the current checksum property. When a sha512 collision is produced, there will have to be a new spec to add yet another checksum field.
  The approach described in this spec of using an additional os_hash_algo field will allow changing the hash algorithm without adding a new field or breaking the API contract. All we'll have to do is update the algorithm used and put its name in the os_hash_algo property, and then any image consumer will know how to interpret what's in the os_hash_value property.
  It is worth noting that the Glance implementation of image signature validation gives us a precedent for having a value in one property (img_signature) and the name of the algorithm used in another property (img_signature_hash_method) [1]. Thus the proposal in this spec is completely consistent with a related Glance workflow.
* Use the hexdigest of a different algorithm. Tests using hashlib's sha256 and sha512 algorithms consistently yielded faster times for sha512 (see: Performance impact). The implementation of the sha512 algorithm within hashlib demonstrates reasonable performance and should be considered collision-resistant for many years.
* Implement a single multihash field in the image metadata in which we calculate the SHA512 value. The single field will contain the coded algorithm name and hash [0]. The advantage is that only one field is added to the data model and schema. However, having only one field would make it difficult to use the hash quickly, as the end user would be required to decode which algorithm is being used before verifying the hash.
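To make the trade-off in the multihash alternative concrete, the sketch below contrasts the decoding step a consumer of a single multihash field would need with the direct lookup that the two-field approach allows. It is not part of the proposal. The function code 0x13 for sha2-512 and the code/length/digest layout are assumptions taken from the public multihash table, and the helper names are hypothetical.

```python
import hashlib

# Assumed multihash function codes (from the public multihash table).
MULTIHASH_CODES = {0x12: "sha256", 0x13: "sha512"}


def encode_multihash_sha512(data):
    """Hypothetical encoder: <function code><digest length><digest bytes>.
    Both prefix values are below 0x80, so each fits in one varint byte."""
    digest = hashlib.sha512(data).digest()
    return bytes([0x13, len(digest)]) + digest


def decode_multihash(value):
    """Hypothetical decoder: recover (hashlib algorithm name, hexdigest)."""
    code, length, digest = value[0], value[1], value[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the multihash prefix")
    return MULTIHASH_CODES[code], digest.hex()


data = b"example image data"

# Single-field approach: the consumer must decode before it can verify.
algo, hexdigest = decode_multihash(encode_multihash_sha512(data))

# Two-field approach from this spec: the values are usable as-is.
metadata = {"os_hash_algo": "sha512",
            "os_hash_value": hashlib.sha512(data).hexdigest()}
matches = hashlib.new(metadata["os_hash_algo"], data).hexdigest() == metadata["os_hash_value"]
print(algo, matches)
```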
Data model impact
* Triggers: None
* Expand: Two new columns (os_hash_algo & os_hash_value) with type string/varchar (defaulting to null) will be added to the images table. We will create an index similar to the one on checksum as well.
* Migrate: None
* Contract: None
* Conflicts: None
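The expand step can be pictured with the following sketch, which uses an in-memory SQLite database rather than Glance's actual migration tooling. The column lengths, the index names, and the choice to index os_hash_value are assumptions made for illustration only; the spec fixes none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the existing images table and its checksum index.
conn.execute("CREATE TABLE images (id VARCHAR(36) PRIMARY KEY, checksum VARCHAR(32))")
conn.execute("CREATE INDEX checksum_image_idx ON images (checksum)")

# An image that was uploaded before the new columns existed.
conn.execute("INSERT INTO images (id, checksum) VALUES (?, ?)",
             ("img-1", "d41d8cd98f00b204e9800998ecf8427e"))

# Expand: add the two nullable columns (lengths are assumed, not specified).
conn.execute("ALTER TABLE images ADD COLUMN os_hash_algo VARCHAR(64) DEFAULT NULL")
conn.execute("ALTER TABLE images ADD COLUMN os_hash_value VARCHAR(128) DEFAULT NULL")

# Index analogous to the one on checksum (index name is hypothetical).
conn.execute("CREATE INDEX os_hash_value_image_idx ON images (os_hash_value)")

# Pre-existing rows report NULL for both new columns.
row = conn.execute(
    "SELECT os_hash_algo, os_hash_value FROM images WHERE id = ?", ("img-1",)
).fetchone()
print(row)
```

Because both columns default to null, no data migration is needed for existing rows, which matches the "Migrate: None" entry above.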
REST API impact
Two new fields (os_hash_algo & os_hash_value) will appear in responses to requests for the image metadata.
Security impact
A new, more collision-resistant hash will be returned in addition to the current checksum.
Notification impact
None
Other end user impact
None
Performance impact
New image uploads (and the first download of previously uploaded images) will take an additional amount of time to calculate the sha512 checksum:
5G binary blob of random data:
* md5: ~9s
* sha256: ~22s
* sha512: ~14s
* 1Gbps line speed upload: 42s
1.5G Ubuntu 16.04 cloud image:
* md5: ~2.9s
* sha256: ~7.2s
* sha512: ~4.6s
* 1Gbps line speed upload: 12s
555M Debian 8 cloud image:
* md5: ~1.0s
* sha256: ~2.5s
* sha512: ~1.6s
* 1Gbps line speed upload: 4.5s
Note: SHA512 has been selected and should have minimal impact on overall upload time relative to the entire process.
Test Code:
#!/usr/bin/env python3
import hashlib
import time


def runtime(f):
    # Decorator that prints the wall-clock time taken by each call.
    def wrapper(*args, **kwargs):
        start = time.time()
        f(*args, **kwargs)
        print("Time elapsed: %s" % (time.time() - start))
    return wrapper


@runtime
def checksum(filename, algorithm):
    algorithms = {"md5": hashlib.md5,
                  "256": hashlib.sha256,
                  "512": hashlib.sha512,
                  }
    with open(filename, "rb") as f:
        m = algorithms[algorithm]()
        # The file is opened in binary mode, so the EOF sentinel must be b''.
        for chunk in iter(lambda: f.read(65536), b''):
            m.update(chunk)
    print("%s: %s" % (algorithm, m.hexdigest()))


# Each algorithm is run twice, in varying order.
checksum("fake.img", "512")
checksum("fake.img", "256")
checksum("fake.img", "md5")
checksum("fake.img", "256")
checksum("fake.img", "md5")
checksum("fake.img", "512")
Developer impact
Any future checksum verification code should use the os_hash_algo & os_hash_value fields, and fall back to the checksum field if they are not populated.
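The verification order described above could look roughly like the following sketch. The function verify_image and the dictionary-shaped metadata are hypothetical and are not existing Glance or glanceclient APIs; the sketch assumes the whole image is available as bytes, whereas real code would more likely hash it in chunks.

```python
import hashlib
import hmac


def verify_image(data, metadata):
    """Hypothetical check: prefer os_hash_algo/os_hash_value and fall back
    to the legacy MD5 checksum when the new fields are null or missing."""
    algo = metadata.get("os_hash_algo")
    expected = metadata.get("os_hash_value")
    if not algo or not expected:
        # Image predates the new fields: fall back to the MD5 checksum.
        algo, expected = "md5", metadata.get("checksum")
    if not expected:
        raise ValueError("image metadata carries no hash to verify against")
    actual = hashlib.new(algo, data).hexdigest()
    return hmac.compare_digest(actual, expected)


image = b"example image data"
new_style = {"checksum": hashlib.md5(image).hexdigest(),
             "os_hash_algo": "sha512",
             "os_hash_value": hashlib.sha512(image).hexdigest()}
old_style = {"checksum": hashlib.md5(image).hexdigest(),
             "os_hash_algo": None,
             "os_hash_value": None}

print(verify_image(image, new_style), verify_image(image, old_style))
```

Because the algorithm name is read from the metadata rather than hard-coded, this kind of consumer keeps working if the configured algorithm changes later, provided hashlib supports the new name.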
Implementation
Assignee(s)
Primary assignee: Scott McClymont
Other contributors:
Work Items
* Add tests
* Update the db to add os_hash_algo & os_hash_value columns to the images table (including expand, migrate, contract, and monolith code)
* Update the sections of code that calculate the checksum to also calculate os_hash_algo & os_hash_value (includes calculation on upload)
* Discuss updating os_hash_algo & os_hash_value on download when the value is null
* Update internal checksum verification code to use the os_hash_value field and fall back to the checksum field when not present
* Update glance client
* Update docs
* Add the os_hash_algo value to the discovery API if the API is ready
Dependencies
None
Testing
Update the tests to verify proper population of image properties