ETags
=====

ETags_ are "opaque validator[s] for differentiating between
multiple representations of the same resource". They are used in a
variety of ways in HTTP to determine the outcome of conditional
requests as described in :rfc:`7232`. Understanding the full breadth
of ETags requires a thorough understanding of HTTP and the
nuances of resources and their representations. This document does
not attempt to address all applications of ETags at once; instead, it
addresses specific use cases that have arisen in response to other
guidelines. It will evolve over time.

ETags and the lost update problem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Problem
-----------

HTTP is fundamentally a system for sending representations of
resources back and forth across a network connection. A common
interaction is to ``GET /some/resource``, modify the representation,
and then ``PUT /some/resource`` to update the resource on the server.
This is an extremely useful and superficially simple pattern that drives
many APIs in OpenStack and beyond.

That apparent simplicity is misleading: if two or more
clients perform operations on ``/some/resource`` in the same time
frame, they can experience the `lost update problem`_:

* Client A and client B both ``GET /some/resource`` and make changes to
  their local representation.
* Client B does a ``PUT /some/resource`` at time 1.
* Client A does a ``PUT /some/resource`` at time 2.

Client B's changes have been lost, silently overwritten by client A's
``PUT``. Neither client is made aware of this.
This is a problem.
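
The race is easy to reproduce with a toy, in-memory stand-in for the
server. The ``Server`` class below is hypothetical: it ignores
networking entirely and models only an unconditional GET and PUT of a
single resource.

.. code-block:: python

    import copy


    class Server:
        """Hypothetical in-memory resource store with unconditional PUT."""

        def __init__(self, representation):
            self._state = representation

        def get(self):
            # Each client receives its own copy of the representation.
            return copy.deepcopy(self._state)

        def put(self, representation):
            # No precondition is checked: the last writer always wins.
            self._state = copy.deepcopy(representation)
            return 200


    server = Server({"name": "vm1", "size": "small", "tags": []})

    # Client A and client B both GET the resource.
    local_a = server.get()
    local_b = server.get()

    # Each client changes a different field in its local copy.
    local_b["tags"].append("production")
    local_a["size"] = "large"

    server.put(local_b)  # time 1
    server.put(local_a)  # time 2: overwrites client B's tag

    print(server.get())
    # {'name': 'vm1', 'size': 'large', 'tags': []}

Both ``put`` calls report success; nothing in the exchange tells either
client that a change was discarded.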

A Solution
----------

HTTP/1.1 and later versions of HTTP have a solution for this problem
called ETags_. An ETag is a validator for a particular representation
of a resource, which makes it straightforward to determine whether the
representation provided by a request or a response is the same as
one already in hand. This is very useful when validating cached GET
requests (the ETag answers the question "is what I have in my cache
the same as what the server would give me?") but is also useful for
avoiding the lost update problem.

If the scenario described above is modified to use ETags, it works
like this:

* Client A and client B both ``GET /some/resource``. Each response
  includes a header named ``ETag`` whose value is the same for both
  clients (let's make the ETag ``"red57"``). Details on ETag
  generation can be found below.
* They both make changes to their local representation.
* Client B does a ``PUT /some/resource`` and includes a header
  named If-Match_ with a value of ``"red57"``. The request succeeds
  because the ETag sent in the request is the same as the ETag the
  server generates for the current state of the resource.
* Client A does a ``PUT /some/resource`` and includes the If-Match_
  header with value ``"red57"``. This request fails (with a 412_
  response code) because ``"red57"`` no longer matches the ETag the
  server generates: the resource's current state has been updated by
  the request from client B.


Client B's changes have not been lost, and client A has not
inadvertently overwritten a resource whose state differs from what it
expected. Client A is made aware of the conflict by the 412 response
code. At this stage, client A can choose to GET the resource again,
compare its local representation with the one just retrieved, and
decide on a course of action.
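
A minimal sketch of the same exchange with ETags, assuming a
hypothetical in-memory ``Server`` whose ``put`` checks a single
If-Match value. The ETag here is a SHA-256 hash of a key-sorted JSON
serialization rather than a readable value like ``"red57"``, and
parsing of multi-valued If-Match headers is deliberately left out.

.. code-block:: python

    import copy
    import hashlib
    import json


    def make_etag(representation):
        # Strong ETag: a hash of a canonical (key-sorted) JSON serialization.
        canonical = json.dumps(representation, sort_keys=True, separators=(",", ":"))
        return '"%s"' % hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    class Server:
        """Hypothetical in-memory store whose PUT honours a single If-Match value."""

        def __init__(self, representation):
            self._state = representation

        def get(self):
            # Return the body together with the ETag response header value.
            return copy.deepcopy(self._state), make_etag(self._state)

        def put(self, representation, if_match):
            if if_match != make_etag(self._state):
                return 412  # Precondition Failed: the stored state has changed.
            self._state = copy.deepcopy(representation)
            return 200


    server = Server({"name": "vm1", "size": "small", "tags": []})

    local_a, etag_a = server.get()
    local_b, etag_b = server.get()  # etag_a == etag_b: same state

    local_b["tags"].append("production")
    local_a["size"] = "large"

    print(server.put(local_b, if_match=etag_b))  # 200
    print(server.put(local_a, if_match=etag_a))  # 412

    # Client A fetches the new state, reapplies its change, and retries.
    local_a, etag_a = server.get()
    local_a["size"] = "large"
    print(server.put(local_a, if_match=etag_a))  # 200
    print(server.get()[0])
    # {'name': 'vm1', 'size': 'large', 'tags': ['production']}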

Details
-------

If a service accepts PUT requests and needs to avoid lost updates, it
can do so by:

* Sending responses to GET requests with an ETag **header** (see
  below for some discussion on ETag-like attributes in
  representations).
* Requiring clients to send an If-Match header containing a valid ETag
  with each PUT request.
* Processing the If-Match header on the server side to compare the
  ETag provided in the request with the generated ETag of the
  currently stored representation. If there is a match, carry on
  with the requested action; if not, respond with a 412 status code.
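
The three steps might be wired into a resource handler roughly as
follows. This is a framework-agnostic sketch: ``ResourceHandler`` and
``compute_etag`` are hypothetical names, headers are plain
dictionaries, and responding with 428 (Precondition Required, defined
in :rfc:`6585`) when If-Match is missing is an assumption of the
sketch rather than something this document prescribes.

.. code-block:: python

    import hashlib
    import json


    def compute_etag(representation):
        # Strong ETag: a hash of a canonical (key-sorted) JSON serialization.
        canonical = json.dumps(representation, sort_keys=True, separators=(",", ":"))
        return '"%s"' % hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    class ResourceHandler:
        """Hypothetical framework-agnostic handler for one resource.

        Methods return (status, response_headers, body). Request headers are
        a plain dict with canonical casing; a real framework would match
        header names case-insensitively.
        """

        def __init__(self, representation):
            self.representation = representation

        def get(self, headers):
            # Step 1: every GET response carries an ETag header.
            return 200, {"ETag": compute_etag(self.representation)}, self.representation

        def put(self, headers, body):
            if_match = headers.get("If-Match")
            if if_match is None:
                # Step 2: If-Match is required. 428 (RFC 6585) is this
                # sketch's choice of response for a missing header.
                return 428, {}, None
            # Step 3: compare with the ETag of the stored representation.
            # Only a single ETag is handled here (no lists, no "*").
            if if_match.strip() != compute_etag(self.representation):
                return 412, {}, None
            self.representation = body
            return 200, {"ETag": compute_etag(body)}, body


    handler = ResourceHandler({"name": "vm1"})
    _, get_headers, _ = handler.get({})

    print(handler.put({}, {"name": "vm2"})[0])  # 428
    print(handler.put({"If-Match": '"stale"'}, {"name": "vm2"})[0])  # 412
    print(handler.put({"If-Match": get_headers["ETag"]}, {"name": "vm2"})[0])  # 200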

.. note:: An ETag value is a double-quoted string: ``"the etag"``.

.. note:: The If-Match header may contain multiple ETags (separated
          by commas). If it does, at least one must match for the
          request to proceed.
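
One way to check a multi-valued If-Match header is sketched below;
``parse_if_match`` and ``if_match_passes`` are hypothetical helpers.
Splitting on commas alone is not enough, because a comma is a legal
character inside a quoted ETag, so the sketch matches whole
entity-tags with a regular expression instead. It handles only lists
of ETags, not the ``*`` form that If-Match also allows.

.. code-block:: python

    import re

    # An entity-tag: an optional W/ (weak) prefix and a double-quoted
    # string that contains no double quotes.
    ENTITY_TAG = re.compile(r'(?:W/)?"[^"]*"')


    def parse_if_match(header_value):
        """Return the individual entity-tags listed in an If-Match value."""
        return ENTITY_TAG.findall(header_value)


    def if_match_passes(header_value, current_etag):
        """True if at least one listed ETag strongly matches current_etag."""
        return any(
            not tag.startswith("W/")
            and not current_etag.startswith("W/")
            and tag == current_etag
            for tag in parse_if_match(header_value)
        )


    print(parse_if_match('"red57", "blue12"'))  # ['"red57"', '"blue12"']
    print(if_match_passes('"red57", "blue12"', '"blue12"'))  # True
    print(if_match_passes('W/"blue12"', '"blue12"'))  # False

Weak ETags never pass, because :rfc:`7232` requires the strong
comparison function for If-Match.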

.. note:: Which part of a codebase takes responsibility for
          managing the ETag and If-Match headers depends greatly on
          the architecture of the service. In general, the handler or
          controller for each resource should be the locus of
          responsibility. There may be decorators or libraries
          that can be shared, but such things are beyond the scope of
          this document. Early implementers are encouraged to write code
          that is transparent and easy to inspect, allowing easier
          future extraction.

.. note:: ETags_ can be either strong or weak. See :rfc:`7232` for
          discussion on how weak ETags may be used. They are not
          addressed in this document as their importance is primarily
          related to cache handling. Strong ETags signify
          byte-for-byte equivalence between representations of the
          same resource. Weak ETags indicate only semantic equivalence.
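
The distinction matters for If-Match, which :rfc:`7232` requires to use
the strong comparison function. The two comparison functions described
in :rfc:`7232#section-2.3.2` can be sketched as below (the function
names are illustrative); the loop runs through the example table given
in that section.

.. code-block:: python

    def split_etag(etag):
        """Return (is_weak, opaque_tag) for an entity-tag such as W/"1" or "1"."""
        if etag.startswith("W/"):
            return True, etag[2:]
        return False, etag


    def strong_match(a, b):
        # Both must be strong and have identical opaque tags.
        weak_a, tag_a = split_etag(a)
        weak_b, tag_b = split_etag(b)
        return not weak_a and not weak_b and tag_a == tag_b


    def weak_match(a, b):
        # Only the opaque tags are compared; the W/ prefix is ignored.
        return split_etag(a)[1] == split_etag(b)[1]


    # The example pairs from RFC 7232, section 2.3.2.
    for a, b in [('W/"1"', 'W/"1"'), ('W/"1"', 'W/"2"'), ('W/"1"', '"1"'), ('"1"', '"1"')]:
        print(a, b, strong_match(a, b), weak_match(a, b))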

The steps listed above depend on functionality to generate ETags
for representations. Whenever the representation is different, the ETag
should be different. :rfc:`7232#section-2.3.1` has advice on how to
generate good ETags. In practice they should be:

* Different for different forms of the same resource. For example, the
  XML and JSON representations of the same version of a resource
  should have different ETags.
* Different from version to version.
* Not based on something that will change when the system restarts.
  For example, they should not be based on inodes, integer database
  keys, or other non-universal identifiers.
* Not based on hashes of strings that do not have reliable
  ordering. For example, it can be tempting to make MD5 or SHA hashes
  of the JSON string that represents a resource. If the key ordering in
  that JSON is not guaranteed, the same representation can produce
  different hashes, and the ETag is not useful.
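
A sketch of content-based generation that follows these points, using
Python's standard ``json`` and ``hashlib`` modules. ``make_etag`` is a
hypothetical helper; serializing with ``sort_keys=True`` is one simple
way to get a stable ordering, and mixing the media type into the hash
keeps different forms of the same data apart.

.. code-block:: python

    import hashlib
    import json


    def make_etag(resource, media_type):
        """Strong ETag over a resource's data and the media type it is served as."""
        # sort_keys plus fixed separators give one serialization per value,
        # so dict key order cannot change the ETag (list order still does).
        canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256()
        digest.update(media_type.encode("ascii"))  # different forms, different ETags
        digest.update(b"\x00")  # keep the two inputs from running together
        digest.update(canonical.encode("utf-8"))
        return '"%s"' % digest.hexdigest()


    v1 = {"name": "vm1", "size": "small"}
    v1_reordered = {"size": "small", "name": "vm1"}
    v2 = {"name": "vm1", "size": "large"}

    # A naive serialization depends on key order; the canonical one does not.
    print(json.dumps(v1) == json.dumps(v1_reordered))  # False
    print(make_etag(v1, "application/json") == make_etag(v1_reordered, "application/json"))  # True
    print(make_etag(v1, "application/json") == make_etag(v2, "application/json"))  # False
    print(make_etag(v1, "application/json") == make_etag(v1, "application/xml"))  # False

Because the hash depends only on the data and the media type, it is
stable across restarts and does not rely on local identifiers.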

Ideally they should be fast to calculate or, failing that, easy
to store when the representation is written. A hash of a
last-updated timestamp and the content type can work, but only if
the resource is never updated more than once within the resolution of
the timestamp.
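
A sketch of the timestamp-based approach, assuming the service already
stores a timezone-aware last-updated time for each resource
(``timestamp_etag`` is a hypothetical helper):

.. code-block:: python

    import hashlib
    from datetime import datetime, timedelta, timezone


    def timestamp_etag(updated_at, media_type):
        """Strong ETag from a last-updated timestamp and a media type."""
        # Normalize to UTC so the same instant always yields the same string;
        # updated_at must be timezone-aware.
        stamp = updated_at.astimezone(timezone.utc).isoformat()
        key = "%s|%s" % (stamp, media_type)
        return '"%s"' % hashlib.sha256(key.encode("utf-8")).hexdigest()


    t1 = datetime(2016, 5, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
    t2 = t1 + timedelta(microseconds=1)

    print(timestamp_etag(t1, "application/json") == timestamp_etag(t2, "application/json"))  # False
    print(timestamp_etag(t1, "application/json") == timestamp_etag(t1, "application/xml"))  # False

If two writes can be stored with the same timestamp value, they produce
the same ETag, and the lost update problem returns.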

.. note:: Many details of how ETags can be useful are left out of this
          document. It is worth reading :rfc:`7232` in its entirety to
          understand their purpose, how they work, edge cases and
          how they interact with other modes of conditional request
          handling.

Special Cases
-------------

For simple resources that represent a single unified entity, the
above handling works well. For more complex resources, the situation
is less straightforward. Some scenarios worth considering:

* When there is a resource which represents a collection of
  resources (e.g. ``GET /resources`` versus ``GET
  /resources/some-id``) the strict process for updating one of the
  resources in that collection when using ETags would be:

  * ``GET /resources`` to get the list of resources.
  * Do some client-side processing to choose a single resource's id.
  * ``GET /resources/that-id`` to get the resource and its ``ETag``
    header.
  * Modify the local representation.
  * ``PUT /resources/that-id`` with an ``If-Match`` header
    containing the ETag.

  This may be considered cumbersome. One way to optimize this is to
  include, in each item of the collection resource's representation,
  an attribute whose value is that item's ETag. Then the second GET
  above can be skipped, as the ETag is already available.

* When a resource has sub-resources (e.g. an ``/image/id`` resource
  contains a metadata attribute whose content is also available at
  ``/image/id/metadata``), it can be desirable to retrieve the image
  resource and then PUT to the metadata resource. Strictly speaking,
  this would require a GET of the metadata resource to determine its
  ETag.

  If this is a problem, one workaround is to allow the ETag of the
  image resource to be an acceptable ETag for the metadata resource
  when provided in an ``If-Match`` header. If this is done, it is
  important that the reverse not be true: the ETag sent with the
  metadata resource should not be valid in an ``If-Match`` header
  sent to the image resource.
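
The collection optimization might look like the sketch below. The
``etag`` attribute name and the helpers are hypothetical. The embedded
value is computed from each item before the attribute is added, so it
is exactly the ETag header that a ``GET /resources/<id>`` of that item
would return; if the two were ever computed differently, every
conditional PUT made with an embedded ETag would fail with 412.

.. code-block:: python

    import hashlib
    import json


    def make_etag(resource):
        # Strong ETag over the singular resource's canonical JSON.
        canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
        return '"%s"' % hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    def collection_representation(resources):
        """Body for GET /resources with each item's ETag embedded."""
        items = []
        for resource in resources:
            item = dict(resource)
            # Hash the item as served on its own, before adding "etag".
            item["etag"] = make_etag(resource)
            items.append(item)
        return {"resources": items}


    store = {
        "a1": {"id": "a1", "name": "vm1"},
        "b2": {"id": "b2", "name": "vm2"},
    }

    listing = collection_representation(store.values())
    chosen = listing["resources"][1]
    # The client can go straight to PUT /resources/b2 with this If-Match value.
    if_match = chosen["etag"]
    print(if_match == make_etag(store["b2"]))  # True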

.. note:: In both of the above scenarios, the semantics of ETags are
          being violated. An ETag is not a magic key to unlock a
          resource and make it writable. It is a value used to
          determine whether two representations of the same resource
          are in fact the same. In the situations above, ETags are
          being compared across different resources. Services should
          do so only if they must: either because the performance
          benefit is huge (in which case, consider fixing the
          performance of the API instead) or because the user
          experience improvement is significant. The latter is a far
          more important and legitimate reason than the former.
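
Where a service does decide the trade-off is worth it, the asymmetric
rule for sub-resources can be kept small and explicit. In this sketch
the in-memory ``image`` record and both precondition helpers are
hypothetical:

.. code-block:: python

    import hashlib
    import json


    def make_etag(representation):
        canonical = json.dumps(representation, sort_keys=True, separators=(",", ":"))
        return '"%s"' % hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    # Hypothetical stored image; its metadata is also exposed at /image/id/metadata.
    image = {"id": "img1", "name": "cirros", "metadata": {"os": "linux"}}


    def metadata_precondition_ok(if_match):
        # Accept the metadata's own ETag or, as the optimization, the image's ETag.
        return if_match in (make_etag(image["metadata"]), make_etag(image))


    def image_precondition_ok(if_match):
        # The reverse must not hold: only the image's own ETag is accepted.
        return if_match == make_etag(image)


    image_etag = make_etag(image)
    metadata_etag = make_etag(image["metadata"])
    print(metadata_precondition_ok(image_etag))  # True
    print(metadata_precondition_ok(metadata_etag))  # True
    print(image_precondition_ok(metadata_etag))  # False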

.. _lost update problem: https://www.w3.org/1999/04/Editing/
.. _ETags: https://tools.ietf.org/html/rfc7232#section-2.3
.. _412: https://tools.ietf.org/html/rfc7232#section-4.2
.. _If-Match: https://tools.ietf.org/html/rfc7232#section-3.1