Semver Utility for DB storage

https://blueprints.launchpad.net/glance/+spec/semver-support

Working with versions of various objects is a common problem, which already has a number of widely adopted solutions. One of these solutions is Semantic Versioning (SemVer), a system of rules and requirements for assigning version numbers to software components and other objects. The SemVer specification is freely available at semver.org <http://semver.org>.

It is proposed to add support for the semantic versioning concept to Glance, following version 2.0.0 of the specification [1], so that Glance objects (starting with Artifacts, but probably including Images in the future) may be properly versioned.

Problem description

Semantic versioning provides the ability to compare two or more objects based on the version with which they are associated. According to the specification [1], objects are compared first by their major versions, then by their minor versions, then by their patch versions. There is also a concept of “pre-release” versions (alphas, betas, release candidates (RC), etc.), which should always be considered “lower” than the “released” version with the same numeric components.

For example, 1.2.2 < 1.2.3-beta < 1.2.3
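The ordering above can be reproduced in memory with a short sketch. It is only an illustration of the precedence rules, not the code of the semantic_version library: the function name `precedence_key` is hypothetical, and the regular expression is a simplified approximation of the semver grammar.

```python
import re

# Simplified semver pattern (an approximation, not the full grammar).
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"        # major.minor.patch
    r"(?:-([0-9A-Za-z.-]+))?"      # optional pre-release label
    r"(?:\+([0-9A-Za-z.-]+))?$"    # optional build metadata (ignored here)
)


def precedence_key(version):
    """Return a tuple that sorts in semver precedence order (hypothetical helper)."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError("not a semantic version: %r" % version)
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    label = match.group(4)
    if label is None:
        # A final release outranks any pre-release with the same numbers.
        return (major, minor, patch, 1, ())
    parts = []
    for ident in label.split("."):
        if ident.isdigit():
            parts.append((0, int(ident), ""))  # numeric: compared as integers
        else:
            parts.append((1, 0, ident))        # alphanumeric: compared as text
    return (major, minor, patch, 0, tuple(parts))


versions = ["1.2.3", "1.2.3-beta", "1.2.2"]
ordered = sorted(versions, key=precedence_key)
```

Build metadata is parsed but deliberately left out of the key, since it does not take part in precedence.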

If we want to store versioned objects in the catalog (this may apply to images, artifacts and other entities), then we need to be able to perform this kind of semantic comparison across large numbers of entities.

So, the comparison should be made not only in memory, but in the database as well, and there is no generic datatype in modern RDBMSs to store this kind of versioning information. A method for storing easily sortable version identifiers should therefore be introduced.

Proposed change

First of all, Glance has to adopt a utility to parse strings which contain version information, verify their compliance with the specification and properly process the version objects in memory. There are a number of mature libraries which have this functionality, and there is no need to re-implement them. After some research it has been suggested to use the “semantic_version” library, which is available on PyPI [2]. This library is not present in the OpenStack global requirements, so a patchset [3] has been submitted to add it there.

To be able to sort these version objects in the database, they have to be converted into some generic comparable data type. Due to the nature of version information (fixed numeric components for the major-minor-patch part, and arbitrary sequences of alphanumeric strings for the pre-release and metadata labels), it is suggested to store them separately as three database fields: one for the numeric part, another for the pre-release label and the last one for build metadata.

The three numeric components (major, minor and patch) may be converted into a single unsigned 64-bit integer: the first 16 bits of this number will be allocated to the major revision, the next 16 bits to the minor revision, and the next 16 bits to the patch revision. The remaining 16 bits will be used to store the release type flag (to make sure that the final release has higher precedence than pre-releases) and may be reserved for future improvements and for storing additional information which is irrelevant for semantic versioning but may have some other meaning (see the Alternatives section below for more details).
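The bit layout described above can be sketched as follows. The spec fixes the field widths but not the flag values, so `FLAG_PRERELEASE` and `FLAG_FINAL` are assumptions (any pair works as long as the final release gets the larger value), and the function names are hypothetical.

```python
MAX_COMPONENT = 0xFFFF  # each field is 16 bits wide

# Assumed flag values; the spec only requires final > pre-release.
FLAG_PRERELEASE = 0
FLAG_FINAL = 1


def pack_version(major, minor, patch, is_final):
    """Pack major/minor/patch and the release flag into one 64-bit integer."""
    for value in (major, minor, patch):
        if not 0 <= value <= MAX_COMPONENT:
            raise ValueError("component out of 16-bit range: %d" % value)
    flag = FLAG_FINAL if is_final else FLAG_PRERELEASE
    # Major occupies the most significant 16 bits, the flag the least.
    return (major << 48) | (minor << 32) | (patch << 16) | flag


def unpack_version(packed):
    """Inverse of pack_version: return (major, minor, patch, is_final)."""
    major = (packed >> 48) & MAX_COMPONENT
    minor = (packed >> 32) & MAX_COMPONENT
    patch = (packed >> 16) & MAX_COMPONENT
    is_final = (packed & MAX_COMPONENT) == FLAG_FINAL
    return major, minor, patch, is_final
```

With this layout, plain integer comparison already yields 1.2.2 < 1.2.3 (pre-release) < 1.2.3 (final); only pre-releases sharing the same numeric part need the label to break the tie.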

The pre-release label should be stored independently from the numeric part as a regular string, because, according to the semver spec, labels are compared alphanumerically only if the numeric parts of the versions are identical.

So, these two values - long number and a string - may be combined into a single composite index in the database, which will provide efficient capabilities to sort and filter objects with the versions assigned.
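As a rough illustration of such a composite index, the sketch below uses an in-memory SQLite database from the Python standard library as a stand-in for the real backend. The table, column and index names are hypothetical, and the flag values (1 for a final release, 0 for a pre-release) are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one integer column for the packed numeric part,
# one text column for the pre-release label.
conn.execute(
    "CREATE TABLE artifacts ("
    " name TEXT, version_num INTEGER, version_label TEXT)"
)
conn.execute(
    "CREATE INDEX ix_artifacts_version"
    " ON artifacts (version_num, version_label)"
)


def pack(major, minor, patch, is_final):
    # 16 bits per component; the lowest 16 bits carry the release flag.
    return (major << 48) | (minor << 32) | (patch << 16) | (1 if is_final else 0)


rows = [
    ("1.2.3", pack(1, 2, 3, True), ""),
    ("1.2.3-beta", pack(1, 2, 3, False), "beta"),
    ("1.2.2", pack(1, 2, 2, True), ""),
    ("1.2.3-alpha", pack(1, 2, 3, False), "alpha"),
]
conn.executemany("INSERT INTO artifacts VALUES (?, ?, ?)", rows)

# Sorting on both columns gives the precedence order.
ordered = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM artifacts ORDER BY version_num, version_label"
    )
]
```

Note that SQLite integers are signed 64-bit values, so this particular stand-in cannot hold a major version of 32768 or above; a backend with a true unsigned 64-bit type does not have that restriction.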

However, there is one important difference between the semver requirement and a simple comparison of alphanumeric strings: semver requires that the labels are compared “per component” (where a “component” is a dot-separated part of the label), and components which consist only of digits are to be treated as integers rather than ASCII strings. For example, version “1.0.0-alpha.4.foo” should have lower precedence than “1.0.0-alpha.10.bar”, because their numeric parts are equal and the labels have an identical first component (“alpha”) but differ in the second (“4” vs “10”), and 4 is less than 10. But if the labels are compared as string database fields, the precedence will be wrong, as “alpha.4.foo” is lexically greater than “alpha.10.bar” (due to “4” being greater than “1”).
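The mismatch can be seen directly in a few lines of Python. The helper `component_key` is a hypothetical name for a per-component comparison key; it is a sketch of the rule, not library code.

```python
def component_key(label):
    """Per-component key: digit-only parts compare as integers."""
    return tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in label.split(".")
    )


a, b = "alpha.4.foo", "alpha.10.bar"

# Plain string comparison stops at "4" vs "1" and gets the order wrong.
plain_string_says_a_is_lower = a < b

# Per-component comparison treats 4 and 10 as integers.
semver_says_a_is_lower = component_key(a) < component_key(b)
```

Here the plain string comparison reports that “alpha.4.foo” is not lower, while the per-component comparison reports that it is.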

To solve this problem it is suggested to add one constraint to this semver implementation: limit the maximum length of numeric components in the pre-release label to a reasonably low value (say, 6 characters) and add leading zeros to these components when saving them to the database.

In this case the “alpha.4.foo” label from the example above will become “alpha.000004.foo”, and “alpha.10.bar” will become “alpha.000010.bar”. ASCII-based comparison of these strings gives results which are consistent with the requirements of semver. Later, when these values are read from the database, the leading zeros may be removed so that the labels look as they did originally.
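A minimal sketch of the padding and its reversal, assuming the 6-character limit mentioned above; `PAD_WIDTH`, `pad_label` and `unpad_label` are hypothetical names.

```python
PAD_WIDTH = 6  # assumed limit on the length of numeric components


def pad_label(label):
    """Left-pad digit-only components with zeros before storing."""
    padded = []
    for part in label.split("."):
        if part.isdigit():
            if len(part) > PAD_WIDTH:
                raise ValueError("numeric component too long: %r" % part)
            part = part.zfill(PAD_WIDTH)
        padded.append(part)
    return ".".join(padded)


def unpad_label(stored):
    """Strip the padding again when reading from the database."""
    restored = []
    for part in stored.split("."):
        if part.isdigit():
            part = str(int(part))  # "000004" -> "4", "000000" -> "0"
        restored.append(part)
    return ".".join(restored)
```

Because semver 2.0.0 forbids leading zeros in numeric pre-release identifiers, stripping the zeros on read restores the original label exactly.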

This applies only to the pre-release label. Build metadata (the part which is separated by the ‘+’ character) does not take part in precedence resolution, so it neither has to be part of the database index nor has to be pre-processed in any way.

It is suggested to create a custom composite field for SQLAlchemy which will encapsulate the logic described above (converting a semantic version into 3 database-friendly values and back) and will be usable for building version-aware model classes.
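The conversion that such a composite field would wrap might look like the sketch below. It is plain Python with no SQLAlchemy dependency; the names `DBVersion`, `to_db` and `from_db`, the flag values and the simplified regular expression are all assumptions made for illustration.

```python
import re
from collections import namedtuple

# Hypothetical container for the three database-friendly values.
DBVersion = namedtuple("DBVersion", ["numeric", "label", "meta"])

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)
_PAD = 6  # assumed width for numeric pre-release components


def to_db(version):
    """Split a version string into (numeric, padded label, metadata)."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError("not a semantic version: %r" % version)
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    if max(major, minor, patch) > 0xFFFF:
        raise ValueError("component does not fit into 16 bits")
    label, meta = match.group(4), match.group(5)
    flag = 1 if label is None else 0  # assumed: final release ranks higher
    numeric = (major << 48) | (minor << 32) | (patch << 16) | flag
    padded = ".".join(
        p.zfill(_PAD) if p.isdigit() else p for p in (label or "").split(".")
    )
    return DBVersion(numeric, padded, meta or "")


def from_db(row):
    """Rebuild the version string from the three stored values."""
    major = (row.numeric >> 48) & 0xFFFF
    minor = (row.numeric >> 32) & 0xFFFF
    patch = (row.numeric >> 16) & 0xFFFF
    text = "%d.%d.%d" % (major, minor, patch)
    if row.label:
        text += "-" + ".".join(
            str(int(p)) if p.isdigit() else p for p in row.label.split(".")
        )
    if row.meta:
        text += "+" + row.meta
    return text


versions = ["1.0.0", "1.0.0-alpha.10.bar", "1.0.0-alpha.4.foo", "0.9.9"]
# Sort by the two indexed values only; metadata is left out.
ordered = sorted(versions, key=lambda v: to_db(v)[:2])
```

Sorting on the first two values alone mirrors what the composite database index would do, with the metadata stored but never compared.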

Alternatives

Semantic Versioning is not the only specification which defines a format for version strings. There is another standard, PEP440, which describes a scheme for identifying versions of Python software distributions [4]. It shares some common features with Semantic Versioning but has a different and somewhat more complicated notation.

Besides a slightly different syntax (it concatenates pre-release segments directly to the right of the release number, while semver separates them with a dash), it puts extra constraints on what the pre-release label may contain. In semver, the pre-release label may contain arbitrary alphanumeric characters, while in PEP440 it may only be ‘a’, ‘b’ or ‘rc’ followed by a number. This could theoretically allow the pre-release component to be stored as part of the same 64-bit database field which is used to store the release number (e.g. the release type flag takes 2 bits, and the remaining 14 bits are left for the number of the pre-release build). However, this significantly decreases the flexibility of the pre-release version field.
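For comparison, the single-field layout hinted at above could be sketched like this. It is not part of the proposal: the numeric codes for ‘a’, ‘b’, ‘rc’ and the final release are assumed, the function name is hypothetical, and the sketch assumes exactly three release components, which PEP440 itself does not require.

```python
# Assumed 2-bit codes; only their relative order matters.
RELEASE_TYPES = {"a": 0, "b": 1, "rc": 2, "final": 3}


def pack_pep440_like(major, minor, patch, kind="final", number=0):
    """Pack a three-component release plus pre-release info into 64 bits."""
    for value in (major, minor, patch):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("component out of 16-bit range: %d" % value)
    if not 0 <= number < (1 << 14):
        raise ValueError("pre-release number out of 14-bit range: %d" % number)
    # Low 16 bits: 2 bits of release type followed by 14 bits of number.
    tail = (RELEASE_TYPES[kind] << 14) | number
    return (major << 48) | (minor << 32) | (patch << 16) | tail
```

A single integer comparison then orders 1.2.3a1 < 1.2.3b2 < 1.2.3rc1 < 1.2.3, but there is no room left for the free-form labels that semver allows.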

PEP440 also adds further entities: development builds (an additional special segment which goes after the pre-release segments), epochs (which precede the release number), local versions (which are similar to the build metadata of semantic versioning, but have a different purpose and take part in precedence resolution, following roughly the same rules as an arbitrary pre-release label in semantic versioning), etc. Also, unlike semantic versioning, PEP440 does not limit the number of numeric components in the release number: it may be anything from a simple “1” to “1.2.3.4.5.6.7.8.9.10” and beyond. This, of course, gives more flexibility and power, but may not be easily mapped to efficient database storage.

More importantly, PEP440 is a standard which is native to the Python world but is not well known outside it, while the purpose of Glance Artifacts is to be as generic as possible in terms of the nature of its objects. This means that the users of the artifacts are not restricted to Python developers: they may not be developers at all. So, following a simpler and more generic standard seems preferable.

There is one more standard which stands between semver and PEP440. It is called “Linux Compatible Semantic Versioning 3.0.0”; it is a fork of regular semver (its 2.0 version) and is developed within the OpenStack community [5]. It tries to blend regular semver with the versioning of Linux distribution packages and borrows some concepts from PEP440 to do so.

This notation is easier to map to a database type; however, it is still local to a relatively small community of developers (OpenStack developers in this case), so a more generic and widely adopted standard such as semver seems preferable.

However, we are not limited to a single versioning notation. In the future we may add support for extra schemas, including some subset of PEP440 or Linux Compatible Semantic Versioning. This may be implemented as part of the further Artifact Repository roadmap or other activities. This particular spec leaves that out of scope and focuses only on the semver implementation.

Data model impact

None: this spec does not cover any actual database changes. It only describes the utility which will make it possible to operate on semver objects and to convert them to data usable for DB storage, and back.

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

The proposed change does not affect existing code in any sense.

Other deployer impact

This spec assumes that [3] is merged, i.e. the semantic_version library is added to the global requirements.

Developer impact

The usage of the lib should be documented for developers, so they may efficiently use it in their code.

Implementation

Assignee(s)

Primary assignee:

ativelkov

Reviewers

Core reviewer(s):

jokke

Other reviewer(s):

ivasilevskaya mfedosin travis-tripp icordasc

Work Items

The initial implementation of the feature may be done in a single changeset. However, it seems preferable to add this support to the semantic_version library [2] and remove it from the Glance codebase afterwards. If the maintainer of the library does not accept this functionality (or if we decide to add support for more versioning notations later), then this code may be transferred to some common OpenStack library, such as Oslo.

After this feature is implemented, we should continue the work to add support for other versioning schemas, such as PEP440, Linux Compatible Semantic Versioning and others. These should be added as independent features covered by separate specs.

Dependencies

None

Testing

A unit test should be added demonstrating usage of the data structure, comparison, string parsing and conversion to the DB type (long).

Documentation Impact

The developers’ guide has to be updated to explain how to properly use the library in code.

References

[1] http://semver.org

[2] https://pypi.org/project/semantic_version/

[3] https://review.openstack.org/#/c/151466/

[4] https://www.python.org/dev/peps/pep-0440/