New Location APIs¶
https://blueprints.launchpad.net/glance/+spec/new-location-apis
Problem description¶
Currently we have two security vulnerabilities with the
show_multiple_locations config option, OSSN-0065 [1] and OSSN-0090 [2].
If we enable show_multiple_locations and the policies for add/update
(set_image_location), get (get_image_location) and remove
(delete_image_location) locations are set for non-admins, then non-admin users
can modify location data to corrupt an image that they own. Note that the
policies for add, get and remove locations are set for non-admins by default;
otherwise a non-admin user cannot associate data with an image record,
retrieve image data, or delete image data.
When show_multiple_locations is False, users cannot modify image
locations via the image-update API call, even if they have the
{get,set,delete}_image_location permissions. However, there are some
popular use cases where other services can bypass Glance and store or access
image data directly in the backend by writing or reading image locations,
using the image owner’s credentials, and this is why operators want to set
show_multiple_locations to True. What operators want to do, however, is to
enable optimized image data access; exposing image locations to non-admin
users is a side-effect, not the goal. We currently recommend that operators
who want to use optimized data access use a specialized Glance instance for
services, and only expose glance-api to end users with show_multiple_locations
set to False. This is inconvenient for certain users.
Proposed change¶
The work will be done in 3 phases, as follows:
1. Introduce 2 new API calls that allow operations on image locations, which are described in detail in the REST API impact section. These calls will replace the image-update mechanism for consumers like cinder and nova.
2. Modify the consumer (cinder/nova) code to use the new location APIs. Also modify HTTP store to use the new location APIs.
3. Remove the show_multiple_locations config option when it is no longer required by other services (cinder/nova) to perform operations on locations. This will mostly be done 1 or 2 cycles after the consumers have adopted the new location APIs, to handle the upgrade cases.
The config option show_multiple_locations has been deprecated since Newton
but we will keep the config option until the consumers of glance locations
(nova, cinder, http store etc) start using the new location APIs. Since this
is a major effort spanning multiple services (nova, cinder, glance),
we will implement the work items in different cycles to provide enough
time for developers (to implement this) and operators (to move away from the
config option).
We will introduce 2 new policies, one for each new API operation (add and get), as follows:
- The add policy can default to the project member or service role (when it is implemented).
- The get policy will default to the service role for authorization.
Along with the new add policy, we will add a check in the location add API
code that inspects the status of the image and only adds the location if the
image is in QUEUED state; adding a location when the image is in any other
state will be disallowed. This is done in order to prevent malicious users
from modifying the image location again and again, since the location added
the first time is the correct one as far as Glance is concerned.
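The status check described above can be sketched as follows. This is a minimal illustration, not Glance code: the image is modelled as a plain dict, and the names `Conflict` and `add_location` are hypothetical stand-ins for the HTTP 409 response and the API handler.

```python
class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 (Conflict) response."""


def add_location(image, url):
    """Attach a location to an image only while it is still queued.

    `image` is a plain dict used for illustration; a real image object
    would come from Glance's domain layer.
    """
    if image["status"] != "queued":
        # Any state other than queued is rejected, so a location that
        # has already been registered cannot be substituted later.
        raise Conflict("image is in %s state, expected queued" % image["status"])
    if any(loc["url"] == url for loc in image["locations"]):
        raise Conflict("location already exists")
    image["locations"].append({"url": url, "metadata": {}})
    return image
```

Rejecting every non-queued state is what makes the first registered location final from the caller's point of view.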
We will also introduce a new configuration parameter do_secure_hash on
the glance side which will tell the API whether to do the hash calculation
or not. This will be useful in cases when nova, cinder etc. add a location
in glance, since glance does not calculate the hash and checksum automatically
in these cases. The value of do_secure_hash will be True by default.
After nova or cinder sends a request to add a location (for the VM snapshot
or upload volume case respectively) and do_secure_hash is True, glance
will start a background process that will calculate the hash of the image.
Unless we have validation_data (in the request body) to be verified, the
image will be set to active state after registering the location, even if
the hash calculation is still ongoing in the background. This is done so that
the image can be used to create instances and bootable volumes immediately
after we’ve registered the location, without waiting for the hash calculation,
since it is a long running task. After the hash calculation completes, the
image properties will be updated with the checksum, os_hash_algo and
os_hash_value values.
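The flow above (activate first, hash in the background) can be sketched as below. This is an illustration under stated assumptions, not the proposed implementation: the image is a plain dict, the data is read from a file-like object, a thread stands in for the background process, and the function names `compute_hashes` and `activate_then_hash` are hypothetical. The legacy checksum field is computed as MD5 here.

```python
import hashlib
import io
import threading


def compute_hashes(fileobj, os_hash_algo="sha512", chunk_size=64 * 1024):
    """Stream the image data once and compute both digests."""
    checksum = hashlib.new("md5")       # legacy 'checksum' field
    os_hash = hashlib.new(os_hash_algo)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        checksum.update(chunk)
        os_hash.update(chunk)
    return {
        "checksum": checksum.hexdigest(),
        "os_hash_algo": os_hash_algo,
        "os_hash_value": os_hash.hexdigest(),
    }


def activate_then_hash(image, fileobj, os_hash_algo="sha512"):
    """Mark the image active at once, then hash it in a worker thread."""
    image["status"] = "active"
    # Advertise the algorithm early so consumers can tell a hash is coming.
    image["os_hash_algo"] = os_hash_algo

    def worker():
        image.update(compute_hashes(fileobj, os_hash_algo))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


image = {"status": "queued"}
worker_thread = activate_then_hash(image, io.BytesIO(b"image-bytes"))
worker_thread.join()  # a real service would not block on this
```

Reading the data in fixed-size chunks keeps memory use flat regardless of image size.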
The following are the image state transitions for the different combinations of
do_secure_hash and validation_data:
- do_secure_hash is True and validation_data is not None: Image transition: (queued, importing, active)
  In this case the consumer provides the hash values for validation and the hash is calculated by glance. An example of this case will be providing validation_data for HTTP store. Here the image hash will be calculated and verified before setting the image to active state.
- do_secure_hash is True and validation_data is None: Image transition: (queued, active)
  In this case validation data will not be provided by the consumer but the hash is calculated by glance. Examples of this case will be when nova snapshots an instance or cinder uploads a volume to image. Here the image hash calculation will be done, and the image updated, after setting the image to active state. This is a tricky case because the consumer will have no idea whether the active image will ever have a hash value, or whether it should wait for the hash to be populated in the image. To handle this, we will set the os_hash_algo value in the image properties so the consumer will know that hash calculation is ongoing for this image and that the hash will be populated. The cases are as follows:
  - active image and no os_hash_algo: This image will not have a hash value populated.
  - active image and has os_hash_algo: Poll for active image status and os_hash_algo until you get os_hash_value. Polling for active image status is optional, since the image becomes active when validation_data is not provided and hash calculation is ongoing in the background, i.e. this case. The os_hash_algo value will be popped if hash calculation fails.
- do_secure_hash is False and validation_data is not None: Image transition: (queued, active)
  In this case validation data will be provided by the consumer and the hash is not calculated by glance. An example of this case will be providing validation_data for HTTP store. Here the image hash will be neither calculated nor verified; the values provided by the user are set on the image directly.
- do_secure_hash is False and validation_data is None: Image transition: (queued, active)
  In this case validation data will not be provided by the consumer and the hash is not calculated by glance. This can happen for all cases. Here the hash value won’t be set in the image.
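The four combinations above, and the consumer-side decision for an active image, can be summarized in a short sketch. The function names and the returned labels are hypothetical and exist only for illustration; the image is modelled as a plain dict.

```python
def expected_transition(do_secure_hash, validation_data):
    """Return the image state sequence for an add-location request."""
    if do_secure_hash and validation_data is not None:
        # The hash is calculated and verified before activation.
        return ("queued", "importing", "active")
    # In every other combination the image goes straight to active.
    return ("queued", "active")


def hash_expectation(image):
    """Tell a consumer what to expect from the image's hash fields."""
    if image.get("status") != "active":
        return "wait_for_active"
    if image.get("os_hash_value"):
        return "hash_ready"
    if image.get("os_hash_algo"):
        # The algorithm is set but the value is not: calculation is ongoing.
        return "hash_pending"
    # Neither field is set: no hash will be populated for this image.
    return "no_hash"
```

A consumer would call `hash_expectation` on each poll and stop once it sees `hash_ready` or `no_hash`; the latter also covers the case where os_hash_algo was popped after a failed calculation.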
If the hash calculation fails, a retry mechanism will reinitiate the task.
We will add a new configuration option http_retries with a default value
of 3, i.e. the hash calculation will be executed at most 3 times by default
(a second and a third attempt are made only if the previous attempts fail).
If the hash calculation still fails after all the retries, we will not update
the hash and checksum values, and the image will stay in active state.
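A minimal sketch of this retry behaviour follows, assuming http_retries counts total attempts (the reading suggested by "maximum 3 times"). The function name `hash_with_retries` and the `calculate` callable are hypothetical; the image is a plain dict.

```python
def hash_with_retries(image, calculate, http_retries=3):
    """Run `calculate` up to `http_retries` times and record the result.

    `calculate` is any callable returning a dict of hash fields; it is
    expected to raise on failure. Returns True when the hash was stored.
    """
    for _attempt in range(http_retries):
        try:
            image.update(calculate())
            return True
        except Exception:
            # Try again until the attempt budget is exhausted.
            continue
    # All attempts failed: drop the marker so consumers stop polling.
    # The image status is deliberately left untouched (it stays active).
    image.pop("os_hash_algo", None)
    return False
```

Removing os_hash_algo on final failure matches the polling contract: an active image without os_hash_algo will not receive a hash.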
End-user access to image locations via the Image API is no longer necessary. Since Train, Glance has multiple stores support, and we have added API calls that allow users to manipulate data locality with respect to store. Further, a store is an opaque identifier, whereas an image location exposes backend details that users don’t need to know.
Here are the current use cases for the direct manipulation of image locations along with an explanation of how they can be handled by the new Location API.
1. When using a copy-on-write (COW) backend shared by Nova and Glance, Nova can create an image record in Glance, snapshot a server image directly in the backend, and set the location on the image record. This use case is covered by the new add-location call, and having its default policy be project member (image owner) or service.
2. A user wants to have a single image record, but have image data stored in multiple locations for locality (i.e., to have image data as close as possible to where it’s consumed). This use case is handled by the glance multiple stores feature plus image import, which since API v2.8 allows a ‘stores’ parameter specifying where the image data should be stored. This applies to both newly created images and existing images (via the copy-image import method). In this workflow, Glance itself manipulates the image locations; there is no need for the user to interact with locations directly.
3. An operator wants to introduce a new storage backend and decommission the current backend while keeping the same image catalog. Similar to #2, this can be handled by using the copy-image import method and the delete-image-from-store API call introduced in v2.10. Note that there are some exceptions to this:
   a) HTTP store is read-only, so we can’t use copy-image in this case.
   b) For RBD store, launching a VM or creating a bootable volume from an image creates a dependency chain, hence we can’t delete the source image until all of its children are flattened.
   c) For cinder store, if the cinder backend uses COW cloning, it is similar to the RBD case mentioned in b); otherwise the image delete will succeed.
The following APIs are not being implemented:
- Update: For service to service interaction, there is no value in updating
  the metadata of a location. This would be beneficial if we plan to remove the
  existing location code from the image-update call and support the use case of
  operators/end-users doing location operations.
- Delete: We already have the Delete Image From Store API for this purpose.
  We don’t require the Delete Image From Store API call for the current
  use case, but if we plan to extend the location APIs in future, we can do this
  by updating the policies enforced by the Delete Image From Store operation
  from the default role:admin to role:admin or role:service.
Alternatives¶
- We can remove the show_multiple_locations config option and filter the images with the admin_or_service role. This will require the consumers to provide admin credentials during add or get of an image to get the location. This was the original proposal, but due to the disagreement here [3], we changed the design to the current proposal.
- Another alternative is to add this functionality in the import workflow. We can add a new import method direct-location which will allow end users to specify the location and metadata parameters and create a new image based on the given parameters. We can also update an existing image with location and metadata values, but this will require the image to be in queued state.
  For this, we will need to add a new import method direct-location and also add --metadata and --location parameters to the following commands:
  glance image-create-via-import --import-method direct-location --location <location> --metadata <key1=value1, key2=value2 ...>
  glance image-import --import-method direct-location --location <location> --metadata <key1=value1, key2=value2 ...>
Data model impact¶
None
REST API impact¶
We are going to add 2 new location APIs:
Add Location
This will add a new location to an existing image. The request body will contain the location URL and validation_data [4] (optional). validation_data is included in the request body for cases where the consumer wants to validate the image hash or wants to add the hash values to the image directly. The cases of validation_data with do_secure_hash are described in the Proposed change section. An example where validation_data will be provided is the HTTP store case, where the user will provide the hash value for the HTTP image.
Unlike the old location API, we will not support adding a location at a particular index. If we want the benefit of indexes, we can use the old location APIs or set the location strategy to store_type [5]. A new location strategy store_identifier is proposed [6] and should be useful to download an image from a specific store in case multiple stores are configured.
POST /v2/images/{image_id}/locations
JSON request body
    {
        "url": "cinder://lvmdriver-1/1a304872-b0ca-4992-b2c2-6874c6d5d5f9",
        "validation_data": {
            "os_hash_algo": "sha512",
            "os_hash_value": "6b813aa46bb90b4da216a4d19376593fa3f4fc7e617f03a92b7fe11e9a3981cbe8f0959dbebe36225e5f53dc4492341a4863cac4ed1ee0909f3fc78ef9c3e869"
        }
    }
JSON response body
Success - 200
    {
        "url": "cinder://lvmdriver-1/1a304872-b0ca-4992-b2c2-6874c6d5d5f9",
        "metadata": "{'store': 'lvmdriver-1'}",
        "validation_data": {
            "os_hash_algo": "sha512",
            "os_hash_value": "6b813aa46bb90b4da216a4d19376593fa3f4fc7e617f03a92b7fe11e9a3981cbe8f0959dbebe36225e5f53dc4492341a4863cac4ed1ee0909f3fc78ef9c3e869"
        }
    }
Error - 409 (Location already exists or image is not in QUEUED state), 403 (Forbidden for users that are not the owner), 400 (BadRequest if hash validation fails)
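As a rough illustration of how a consumer might assemble the Add Location request body, the sketch below builds the JSON payload with or without validation_data. The helper name `build_add_location_body` is hypothetical, and the rule that both hash fields must be given together is an assumption made for this example rather than something the spec states.

```python
import json


def build_add_location_body(url, os_hash_algo=None, os_hash_value=None):
    """Serialize the body for POST /v2/images/{image_id}/locations."""
    # Assumption for this sketch: the two hash fields travel together.
    if (os_hash_algo is None) != (os_hash_value is None):
        raise ValueError("os_hash_algo and os_hash_value must be given together")
    body = {"url": url}
    if os_hash_algo is not None:
        body["validation_data"] = {
            "os_hash_algo": os_hash_algo,
            "os_hash_value": os_hash_value,
        }
    return json.dumps(body)


# Snapshot/upload-volume style request: no validation data supplied.
print(build_add_location_body(
    "cinder://lvmdriver-1/1a304872-b0ca-4992-b2c2-6874c6d5d5f9"))
```

Using `json.dumps` rather than hand-written strings avoids malformed payloads such as trailing commas.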
Get Location(s)
This will show all the locations associated with an existing image. Returns an empty list if an image contains no locations.
GET /v2/images/{image_id}/locations
JSON response body
    [
        {
            "url": "cinder://lvmdriver-1/0f031ed1-5872-43d5-a638-4b0d07c10ab5",
            "metadata": "{'store': 'lvmdriver-1'}"
        },
        {
            "url": "cinder://cephdriver-1/11b4fa9f-a44b-46c9-950c-0026c467252c",
            "metadata": "{'store': 'cephdriver-1'}"
        }
    ]
Error - 404 (Image ID does not exist), 403 (Forbidden for normal users)
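A consumer reading the Get Location(s) response might extract the store of each location as sketched below. This assumes the metadata field is serialized exactly as in the example above (a string holding a Python-style dict); if the final API returns a JSON object instead, the `isinstance` branch is simply skipped. The function name `stores_from_locations` is hypothetical.

```python
import ast
import json


def stores_from_locations(response_text):
    """Return (url, store) pairs from a get-locations response body."""
    pairs = []
    for location in json.loads(response_text):
        metadata = location.get("metadata") or {}
        if isinstance(metadata, str):
            # The example response shows a quoted dict; literal_eval
            # parses it safely without executing arbitrary code.
            metadata = ast.literal_eval(metadata)
        pairs.append((location["url"], metadata.get("store")))
    return pairs
```

An image with no locations yields an empty list, matching the documented behaviour of the call.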
The transition of image state during the image create operation will be as follows. Image upload (PUT), image stage (PUT) and location add (POST) will transition the image from queued to the next state, which could be any of the following:
- saving
- uploading
- importing
- active
Below are the valid transitions for an image from queued state.
'queued': ('saving', 'uploading', 'importing', 'active', 'deleted')
Security impact¶
No worse than it is now, and possibly better.
The get-locations policy is restricted to the ‘service’ role, so users will not be able to see image locations. Thus with ‘show_multiple_locations’ and ‘show_direct_url’ set to False, the new get-locations API will not expose location information to users.
The add-location policy is restricted by default to image-owner. This will allow end users to add a location to an image to address current uses of this functionality that we aren’t aware of. Even allowing this, the data-substitution attack is blocked because the API call will only be allowed for an image in ‘queued’ status. The add-location API cannot be used to add a location to an image in other states and then delete the original location, so the OSSN-0065 attack is not possible under this scenario. Further, the add-locations call (unlike the current method of updating locations via PATCH), does not require the locations to be visible to succeed. Thus operators will be able to configure Glance with ‘show_multiple_locations’ and ‘show_direct_url’ set to False, even when other services are sharing a COW backend with Glance and the operator wants an optimized workflow.
Notifications impact¶
None
Other end user impact¶
Since the new APIs are mainly for service to service interaction (except the HTTP store case), we will only expose the location add API via CLI. However, we will need to add methods for all APIs in openstacksdk (that will call the new location APIs) that will be used by other consumer services like cinder and nova. End users can still use the existing commands (that internally call the image-update API) to perform operations on locations:
- glance location-add: Add a location (and related metadata) to an image.
- glance location-delete: Remove locations (and related metadata) from an image.
- glance location-update: Update metadata of an image’s location.
We will also add a new command to glanceclient and OSC that will allow end
users to add the location url and validation-data for the HTTP store case.
glance add-location-properties --url <location> --validation-data <os_hash_algo=value1, os_hash_value=value2>
openstack image add location properties --url <location> --validation-data <os_hash_algo=value1, os_hash_value=value2>
Performance Impact¶
In the old location API, the consumers (nova, cinder) registered
the location in glance and the checksum, hash etc. values weren’t
calculated. After the consumers adopt the new location API,
and the do_secure_hash config parameter is True (default),
glance will read the image and calculate the hash in the background.
The hash calculation will be a long running task, so it will consume
resources; however, this won’t affect the operation requested by
nova or cinder, as the image will transition to active state even
while the hash calculation is ongoing.
This performance cost buys more secure images, and the impact needs to be
conveyed to the operators/end users with documentation and release notes.
Since do_secure_hash will be a configurable parameter on the glance side,
we will add suitable help text to convey the performance and security impact
of enabling/disabling this option.
Other deployer impact¶
None
Developer impact¶
Consumers like Cinder, Nova and HTTP store need to modify code to call the new client functions to access the API. Some of the key things to consider while implementing consumer side changes are:
- We will use the SDK to make the API calls. The changes to call the new location APIs will be in the SDK and also in OSC/glanceclient for location ADD in the HTTP store case.
- Keep backward compatibility with old behavior. Glance should support the legacy behavior as well as the new way to add/get locations. This is useful in upgrade cases where one compute node is running 2023.1 (Antelope) code and the other compute node has been upgraded to the 2024.1 (CC) release.
- Testing should be done to see if the existing functionality supported with the legacy location APIs works as expected with the new APIs.
Implementation¶
Assignee(s)¶
- Primary assignee:
pdeore
- Other contributors:
whoami-rajat
Work Items¶
- Add 2 new Location APIs for add and get operations.
- Modify consumers like cinder, nova and http store to use the new location APIs.
- Add a new configuration parameter do_secure_hash in glance and document its impact.
- Add a new configuration parameter http_retries in glance and document its usage.
- Add SDK support to call the new APIs.
- Add a releasenote mentioning that we will remove the config option show_multiple_locations when the consumers (nova/cinder/http store) shift to using the new location APIs.
- Tempest tests for the new add-location and get-location APIs.
Dependencies¶
None
Testing¶
Unit Tests
Functional Tests
Integration Tests
Tempest Tests
Documentation Impact¶
Need to document new location APIs.
References¶
Deprecate show_multiple_locations option | https://review.opendev.org/c/openstack/glance/+/313936
Update deprecated show_multiple_locations helptext | https://review.opendev.org/c/openstack/glance/+/426283
Update show_multiple_locations deprecation note | https://review.opendev.org/c/openstack/glance/+/625702
Original security bug | https://bugs.launchpad.net/ossn/+bug/1549483
New security bug | https://bugs.launchpad.net/ossn/+bug/1990157