ETags are “opaque validator[s] for differentiating between multiple representations of the same resource”. They are used in a variety of ways in HTTP to determine the outcome of conditional requests, as described in RFC 7232. Understanding the full breadth of ETags requires a thorough understanding of HTTP and the nuances of resources and their representations. This document does not attempt to address all applications of ETags at once; instead it addresses specific use cases that have arisen in response to other guidelines. It will evolve over time.
HTTP is fundamentally a system for sending representations of resources back and forth across a network connection. A common interaction is to GET /some/resource, modify the representation, and then PUT /some/resource to update the resource on the server. This is an extremely useful and superficially simple pattern that drives many APIs in OpenStack and beyond.
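That apparent simplicity is misleading: if there are two or more clients performing operations on /some/resource in the same time frame they can experience the lost update problem. Suppose client A and client B both GET /some/resource and change their local representations. Client B then PUTs its representation to /some/resource, and shortly afterwards client A PUTs its own, which was based on the state from before client B’s update.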
Client B’s changes have been lost. Neither client is made aware of this. This is a problem.
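The lost update problem can be reproduced with a toy in-memory store. This is only a sketch: the store dictionary, the get and put helpers and the color and size attributes are hypothetical stand-ins for a real service, chosen to make the overwrite visible.

```python
# Toy in-memory "server" with a single resource and an unconditional PUT.
# Everything here is illustrative; no real HTTP is involved.
store = {"/some/resource": {"color": "red", "size": 1}}


def get(path):
    # Hand back a copy so each client edits its own representation.
    return dict(store[path])


def put(path, representation):
    # No precondition: whatever arrives replaces the stored state.
    store[path] = dict(representation)


# Both clients read the same state.
client_a = get("/some/resource")
client_b = get("/some/resource")

# Client B changes the color and writes first.
client_b["color"] = "blue"
put("/some/resource", client_b)

# Client A changes the size and writes second, based on stale data.
client_a["size"] = 2
put("/some/resource", client_a)

final = get("/some/resource")
print(final)  # {'color': 'red', 'size': 2}: B's change to "blue" is gone
```

Neither put call reports anything unusual, which is why the problem is easy to miss.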
HTTP/1.1 and beyond have a solution for this problem called ETags. These provide a validator for different representations of a resource that makes it straightforward to determine if the representation provided by a request or a response is the same as one already in hand. This is very useful when validating cached GET requests (the ETag answers the question “is what I have in my cache the same as what the server would give me?”) but is also useful for avoiding the lost update problem.
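If the scenario described above is modified to use ETags it would work like this: client A and client B both GET /some/resource, and each response carries an ETag header identifying the current representation. Client B sends its PUT with an If-Match header containing that ETag; the ETag still matches, so the server applies the update and the resource now has a new ETag. Client A then sends its PUT with an If-Match header containing the original ETag; it no longer matches, so the server rejects the request with 412 Precondition Failed and leaves the resource unchanged.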
Client B’s changes have not been lost and client A has not inadvertently changed something that is not in the form they expected. Client A is made aware of this by the response code. At this stage, client A can choose to GET the resource again and compare their local representation with that just retrieved and choose a course of action.
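The ETag-protected exchange can be sketched with a toy in-memory store. The helper names, the choice of a SHA-256 hash over a JSON body, and 204 as the success status are assumptions made for illustration; 412 Precondition Failed is the status RFC 7232 defines for a failed If-Match check.

```python
import hashlib
import json

# Toy in-memory "server"; all names are hypothetical.
store = {"/some/resource": {"color": "red", "size": 1}}


def etag_for(representation):
    # Strong ETag: a quoted hash of the serialized representation.
    body = json.dumps(representation, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def get(path):
    representation = dict(store[path])
    return representation, etag_for(representation)


def put(path, representation, if_match):
    # Apply the update only if the client's ETag is still current.
    if if_match != etag_for(store[path]):
        return 412  # Precondition Failed
    store[path] = dict(representation)
    return 204  # No Content (assumed success status)


client_a, etag_a = get("/some/resource")
client_b, etag_b = get("/some/resource")

client_b["color"] = "blue"
status_b = put("/some/resource", client_b, etag_b)  # accepted

client_a["size"] = 2
status_a = put("/some/resource", client_a, etag_a)  # rejected, B's update survives

# Client A recovers: fetch again, reapply its change, retry.
fresh, fresh_etag = get("/some/resource")
fresh["size"] = 2
status_retry = put("/some/resource", fresh, fresh_etag)
```

After the retry the stored resource holds both changes, because client A reapplied its edit on top of client B's update instead of overwriting it.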
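If a service accepts PUT requests and needs to avoid lost updates it can do so by sending an ETag header in its responses to GET requests, requiring PUT requests to carry an If-Match header containing that ETag, and processing a PUT only when an ETag in its If-Match header matches the ETag of the resource’s current representation.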
An ETag value is a double-quoted string: "the etag".
The If-Match header may contain multiple ETags (separated by commas). If it does, at least one must match for the request to proceed.
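A minimal sketch of evaluating an If-Match header value follows. The function name if_match_allows is hypothetical. The sketch relies on two details from RFC 7232: If-Match uses the strong comparison function, so entity-tags carrying the weak prefix W/ never match, and the special value * matches any current representation. A regular expression is used instead of splitting on commas because a comma may legally appear inside the quoted ETag value.

```python
import re

# Matches one entity-tag: an optional weak prefix and a double-quoted value.
_ENTITY_TAG = re.compile(r'(W/)?"([^"]*)"')


def if_match_allows(if_match_header, current_etag):
    """Return True if the request may proceed.

    current_etag is the quoted ETag of the current representation,
    or None if the resource has no current representation.
    """
    if current_etag is None:
        return False
    if if_match_header.strip() == "*":
        return True
    for match in _ENTITY_TAG.finditer(if_match_header):
        weak_prefix, opaque = match.group(1), match.group(2)
        if weak_prefix:
            # Strong comparison: weak entity-tags never match.
            continue
        if '"' + opaque + '"' == current_etag:
            return True
    return False


print(if_match_allows('"abc", "the etag"', '"the etag"'))  # True
print(if_match_allows('"abc", "def"', '"the etag"'))       # False
```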
Which section of a codebase takes responsibility for managing the ETag and If-Match headers depends greatly on the architecture of the service. In general the handler or controller for each resource should be the locus of responsibility. There may be decorators or libraries that can be shared, but such things are beyond the scope of this document. Early implementors are encouraged to write code that is transparent and easy to inspect, which makes later extraction easier.
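As one illustration of keeping the logic in the handler and easy to inspect, here is a framework-free sketch. The ResourceHandler class, its method signatures and the (status, headers, body) return shape are hypothetical. Answering a PUT that lacks If-Match with 428 Precondition Required (defined in RFC 6585) is an assumed policy. To stay short, the sketch accepts only a single ETag in If-Match.

```python
import hashlib


def compute_etag(body):
    # Quoted strong ETag derived from the stored bytes.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


class ResourceHandler:
    """Hypothetical handler owning one resource and its ETag checks."""

    def __init__(self, body):
        self.body = body

    def get(self):
        return 200, {"ETag": compute_etag(self.body)}, self.body

    def put(self, headers, body):
        if_match = headers.get("If-Match")
        if if_match is None:
            # Assumed policy: refuse unconditional updates.
            return 428, {}, b""
        if if_match != compute_etag(self.body):
            # The client's representation is out of date.
            return 412, {}, b""
        self.body = body
        return 204, {"ETag": compute_etag(self.body)}, b""


handler = ResourceHandler(b"version 1")
_, get_headers, _ = handler.get()

missing = handler.put({}, b"version 2")
stale = handler.put({"If-Match": '"not-current"'}, b"version 2")
ok = handler.put({"If-Match": get_headers["ETag"]}, b"version 2")
```

Because every check sits in one method, the same three branches could later be lifted into a shared decorator without changing behaviour.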
ETags can be either strong or weak. Strong ETags signify byte-for-byte equivalence between representations of the same resource. Weak ETags indicate only semantic equivalence. See RFC 7232 for discussion of how weak ETags may be used; they are not addressed in this document as their import is primarily related to cache handling.
Each of the steps listed above requires functionality to generate ETags for representations. Whenever the representation is different the ETag should be different. RFC 7232 has advice on how to generate good ETags. In practice they should be:
Ideally they should be fast to calculate or, if not fast, easy to store (when the representation is written). A hash of a last-updated timestamp and the content-type can work, but only if the resource is updated less often than the clock ticks; otherwise two different representations can end up with the same ETag.
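Two ways of generating an ETag are sketched below: hashing the representation's bytes, and hashing a last-updated timestamp together with the content-type. The function names, the use of SHA-256 and the ISO 8601 timestamp strings are assumptions for illustration, not a recommendation from RFC 7232.

```python
import hashlib


def etag_from_body(body):
    # Always changes when the bytes change, but costs a pass over the body.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def etag_from_metadata(updated_at, content_type):
    # Cheap to compute and easy to store alongside the resource.
    raw = (updated_at + "|" + content_type).encode("utf-8")
    return '"' + hashlib.sha256(raw).hexdigest() + '"'


json_tag = etag_from_metadata("2016-01-01T12:00:00Z", "application/json")
xml_tag = etag_from_metadata("2016-01-01T12:00:00Z", "application/xml")
print(json_tag != xml_tag)  # True: same timestamp, different content-type

# The caveat: two writes inside one clock tick share a timestamp, so the
# metadata-based ETag cannot tell the two representations apart.
first_write = etag_from_metadata("2016-01-01T12:00:01Z", "application/json")
second_write = etag_from_metadata("2016-01-01T12:00:01Z", "application/json")
print(first_write == second_write)  # True
```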
Many details of how ETags can be useful are left out of this document. It is worth reading RFC 7232 in its entirety to understand their purpose, how they work, edge cases and how they interact with other modes of conditional request handling.
For simple resources that represent a single unified entity the above handling works well. For more complex resources the situation becomes more complicated. Some scenarios worth considering:
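When there is a resource which represents a collection of resources (e.g. GET /resources versus GET /resources/some-id) the strict process for updating one of the resources in that collection when using ETags would be: GET the collection resource, GET the individual resource to obtain its ETag, then PUT the updated representation to the individual resource with that ETag in an If-Match header.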
This may be considered cumbersome. One way to optimize this is to include an attribute whose value is the ETag in the individual representations of the singular resources in the collection resource. Then the second GET above can be skipped as the ETag is already available.
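A sketch of that optimization follows. The attribute name etag, the resources key and the helper names are hypothetical. The ETag is computed over each singular representation before the extra attribute is added, so it equals the value a GET of the individual resource would return in this toy model.

```python
import hashlib
import json


def etag_for(representation):
    # Quoted hash of the serialized singular representation.
    body = json.dumps(representation, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def collection_representation(resources):
    members = []
    for representation in resources:
        member = dict(representation)
        # Embed the singular resource's ETag next to its attributes.
        member["etag"] = etag_for(representation)
        members.append(member)
    return {"resources": members}


resources = [
    {"id": "a1", "name": "first"},
    {"id": "b2", "name": "second"},
]
listing = collection_representation(resources)

# A client can go straight from the listing to a conditional PUT on
# /resources/b2, using this value in its If-Match header.
if_match = listing["resources"][1]["etag"]
```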
When a resource has sub resources (e.g. an /image/id resource contains a metadata attribute whose content is also available at /image/id/metadata) it can be desirable to retrieve the image resource and then PUT to the metadata resource. Strictly speaking this would require a GET of the metadata resource to determine the ETag.
If this is a problem, an optimization to work around this is to allow the ETag of the image resource to be an acceptable ETag of the metadata resource when provided in an If-Match header. If this is done, then it is important that the reverse not be true: The ETag sent with the metadata resource should not be valid in an If-Match header sent to the image resource.
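The one-directional rule can be sketched as two checks. The image contents and the function names are hypothetical, and the ETag is again assumed to be a hash of a JSON serialization.

```python
import hashlib
import json


def etag_for(representation):
    body = json.dumps(representation, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest() + '"'


image = {"id": "some-id", "name": "example", "metadata": {"arch": "x86_64"}}


def metadata_put_allowed(if_match, image):
    # /image/id/metadata accepts its own ETag or the parent image's ETag.
    return if_match in (etag_for(image["metadata"]), etag_for(image))


def image_put_allowed(if_match, image):
    # /image/id accepts only its own ETag, never the metadata's.
    return if_match == etag_for(image)


image_etag = etag_for(image)
metadata_etag = etag_for(image["metadata"])

print(metadata_put_allowed(image_etag, image))  # True
print(image_put_allowed(metadata_etag, image))  # False
```

The asymmetry matters because the image ETag changes whenever the metadata changes, while the metadata ETag says nothing about the rest of the image.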
In both of the above scenarios the semantics of ETags are being violated. An ETag is not a magic key to unlock a resource and make it writable. It is a value used to determine if two representations of the same resource are in fact the same. In the situations above ETags are being compared across different resources. Services should only do so if they must, either because the performance benefit is huge (in which case consider fixing the performance of the API) or because the user experience improvement is significant. The latter is far more important and legitimate than the former.