::

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

====================
Swift Symbolic Links
====================

1. Problem description
======================

With the advent of storage policies and erasure codes, moving an
object between containers is becoming increasingly useful. However, we
don't want to break existing references to the object when we do so.

For example, a common object lifecycle has the object starting life
"hot" (i.e. frequently requested) and gradually "cooling" over time
(becoming less frequently requested). The user will want an object to
start out replicated for high requests-per-second while hot, but
eventually transition to EC for lower storage cost once cold.

A completely different use case is when an application is sharding
objects across multiple containers, but finds that it needs to use
even more containers; for example, going from 256 containers up to
4096 as write rate goes up. The application could migrate to the new
schema by creating 4096-sharded references for all 256-sharded
objects, thus avoiding a lot of data movement.

Yet a third use case is a user who has large amounts of
infrequently-accessed data that is stored replicated (because it was
uploaded prior to Swift's erasure-code support) and would like to
store it erasure-coded instead. The user will probably ask for Swift
to allow storage-policy changes at the container level, but as that is
fraught with peril, we can offer them this instead.


2. Proposed change
==================

Swift will gain the notion of a symbolic link ("symlink") object. This
object will reference another object. GET, HEAD, and OPTIONS
requests for a symlink object will operate on the referenced object.
DELETE and PUT requests for a symlink object will operate on the
symlink object, not the referenced object, and will delete or
overwrite it, respectively.

GET, HEAD, and OPTIONS requests can operate on a symlink object
instead of the referenced object by adding a query parameter
``?symlink=true`` to the request.
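
The dispatch rules above can be summarized in a short sketch. The function
``request_target`` and its return values are hypothetical and exist only for
illustration; POST is left out because its handling is discussed separately.

```python
def request_target(method, query=None):
    """Return which object a request sent to a symlink path operates on.

    Hypothetical helper that models only the rules stated in this section.
    """
    query = query or {}
    method = method.upper()
    if method in ("PUT", "DELETE"):
        # Overwrites and deletes always act on the symlink object itself.
        return "symlink"
    if method in ("GET", "HEAD", "OPTIONS"):
        # Reads follow the link unless ?symlink=true is supplied.
        if query.get("symlink") == "true":
            return "symlink"
        return "referenced"
    raise ValueError("method not covered by this sketch: %s" % method)
```

For example, ``request_target("GET")`` yields ``"referenced"``, while
``request_target("GET", {"symlink": "true"})`` yields ``"symlink"``.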

Ideally, a POST to a symlink would apply to the referenced object, but Swift's
eventually-consistent nature makes this impossible. An earlier suggestion was
to apply POSTs to the symlink itself and, on GET or HEAD, compare the headers
of the symlink and the referenced object and return the newest. This would
work, but the behaviour becomes odd if an application ever issues a GET or HEAD
to the referenced object directly, since that object would not carry any of the
headers posted to the symlink.

Given all of this, the best remaining choice is to fail a POST to a symlink and
let the application handle it, namely by POSTing to the referenced object
directly. Achieving this behaviour requires several changes:

1) To avoid a HEAD on every POST, the object server will be made aware of
   symlinks so that it can detect them and fail appropriately.
2) Simply failing a POST in the object server when the object is a symlink
   will not work. Consider the following scenarios:

Scenario A::

  - Add a symlink
  T0 - PUT /accnt/cont/obj?symlink=true
  - Overwrite symlink with a regular object
  T1 - PUT /accnt/cont/obj
  - Assume at this point some of the primary nodes were down so handoff nodes
    were used.
  T2 - POST /accnt/cont/obj
  - Depending on the object server hit it may see obj as either a symlink or a
    regular object, though we know in time it will indeed be a real object.

Scenario B::

  - Add a regular object
  T0 - PUT /accnt/cont/obj
  - Overwrite regular object with a symlink
  T1 - PUT /accnt/cont/obj?symlink=true
  - Assume at this point some of the primary nodes were down so handoff nodes
    were used.
  T2 - POST /accnt/cont/obj
  - Depending on the object server hit it may see obj as either a symlink or a
    regular object, though we know in time it will indeed be a symlink.

Given the scenarios above, at T2 (i.e. during the POST) some object servers may
see a symlink while others see a regular object, so the POST to a symlink
cannot simply be rejected. Instead, the object server will always apply the
POST, whether the object is a symlink or a regular object, and will still
return an error to the client if it believes it has seen a symlink. In
scenario A, this means the POST at T2 may fail, but the update will still be
applied to the regular object, which is the correct behaviour. In scenario B,
the POST at T2 may fail, but the update will still be applied to the symlink.
While not ideal, this is not incorrect behaviour per se: the error returned to
the application should prompt it to apply the POST to the referenced object
instead, which, given the point raised earlier, is the desired outcome.
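
The proposed object-server behaviour can be sketched as follows. This is a
simplified model, not Swift code: ``Replica`` and ``handle_post`` are
hypothetical names, metadata handling is reduced to a dict update, and the
error status is a placeholder because the proposal only says "an error".

```python
from dataclasses import dataclass, field

POST_ACCEPTED = 202        # Swift's usual status for a successful object POST
SYMLINK_POST_ERROR = 400   # placeholder; the proposal does not fix a status


@dataclass
class Replica:
    """One object server's (possibly stale) view of /accnt/cont/obj."""
    is_symlink: bool
    metadata: dict = field(default_factory=dict)


def handle_post(replica, new_metadata):
    """Always apply the update; report an error only if a symlink is seen."""
    replica.metadata.update(new_metadata)
    return SYMLINK_POST_ERROR if replica.is_symlink else POST_ACCEPTED


# Scenario B at T2: one server still holds the regular object from T0,
# while another already holds the symlink written at T1.
stale = Replica(is_symlink=False)
fresh = Replica(is_symlink=True)
statuses = [handle_post(r, {"X-Object-Meta-Color": "blue"})
            for r in (stale, fresh)]
```

Both replicas end up with the new metadata, but only the server that sees the
symlink reports an error, which is the signal for the application to POST to
the referenced object instead.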

The aim is for Swift symlinks to operate analogously to Unix symbolic
links (except where it does not make sense to do so).


2.1 Alternatives
----------------

One could use a single-segment SLO manifest to achieve a similar
effect. However, the ETag of a SLO manifest is the MD5 of the ETags of
its segments, so using a single-segment SLO manifest changes the ETag
of the object. Also, object metadata (X-Object-Meta-\*) would have to
be copied to the SLO manifest since metadata from SLO segments does
not appear in the response. Further, SLO manifests contain the ETag of
the referenced segments, and if a segment changes, the manifest
becomes invalid. This is not a desirable property for symlinks.
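
To illustrate the ETag point, here is a minimal sketch, assuming the manifest
ETag is the MD5 of the concatenated segment ETags as described above (details
such as header quoting are ignored). ``slo_etag`` is a hypothetical helper,
not a Swift function.

```python
import hashlib


def slo_etag(segment_etags):
    """MD5 over the concatenation of the segments' ETag strings."""
    joined = "".join(segment_etags)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


body = b"example object body"
# A plain object's ETag is the MD5 of its content.
object_etag = hashlib.md5(body).hexdigest()
# A single-segment manifest hashes the ETag string, not the content.
manifest_etag = slo_etag([object_etag])
```

Here ``manifest_etag`` differs from ``object_etag``, so a client that
remembered the original ETag would see the object as changed.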

A DLO manifest does not validate ETags, but it still fails to preserve
the referenced object's ETag and metadata, so it is also unsuitable.
Further, since DLOs are based on object name prefixes, the upload of a
new object (e.g. ``thesis.doc``, then later ``thesis.doc.old``) could
cause corrupted downloads.

Also, DLOs and SLOs cannot use each other as segments, while Swift
symlinks can reference DLOs and SLOs *and* act as segments in DLOs and
SLOs.

3. Client-facing API
====================

Clients create a Swift symlink by performing a zero-length PUT request
with the query parameter ``?symlink=true`` and the header
``X-Object-Symlink-Target-Object: <object>``.

For a cross-container symlink, also include the header
``X-Object-Symlink-Target-Container: <container>``. If omitted, it defaults to
the container of the symlink object.

For a cross-account symlink, also include the header
``X-Object-Symlink-Target-Account: <account>``. If omitted, it defaults to
the account of the symlink object.

Symlinks must be zero-byte objects. Attempting to PUT a symlink
with a nonempty request body will result in a 400-series error.
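
As a rough illustration of the request shape described above, the following
sketch assembles a symlink PUT. ``build_symlink_put`` is a hypothetical
client-side helper; only the query parameter and header names come from this
proposal.

```python
def build_symlink_put(path, target_object, target_container=None,
                      target_account=None, body=b""):
    """Return (method, url, headers, body) for creating a symlink."""
    if body:
        # The proposal requires symlinks to be zero-byte objects.
        raise ValueError("symlinks must be zero-byte objects")
    headers = {
        "Content-Length": "0",
        "X-Object-Symlink-Target-Object": target_object,
    }
    # The optional headers are sent only for cross-container or
    # cross-account links; otherwise the symlink's own values apply.
    if target_container is not None:
        headers["X-Object-Symlink-Target-Container"] = target_container
    if target_account is not None:
        headers["X-Object-Symlink-Target-Account"] = target_account
    return "PUT", path + "?symlink=true", headers, b""
```

The client-side check only mirrors the server rule; the server would still
reject a non-empty body with a 400-series error.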

The referenced object need not exist at symlink-creation time. This
mimics the behavior of Unix symbolic links. It also keeps the door open
for bulk uploads of tarballs containing symbolic links, where validation
would have to be skipped anyway: ``tar`` appends files to the archive as
it finds them and does not push symbolic links to the back of the
archive, so a symlink in a tarball may well precede its referent.


3.1 Example: Move an object to EC storage
-----------------------------------------

Assume the object is ``/v1/MY_acct/con/obj``.

1. Obtain an EC-storage-policy container either by finding a
   pre-existing one or by making a container PUT request with the
   right X-Storage-Policy header.

2. Make a COPY request to copy the object into the EC-policy
   container, e.g.::

    COPY /v1/MY_acct/con/obj
    Destination: ec-con/obj

3. Overwrite the replicated object with a symlink object::

    PUT /v1/MY_acct/con/obj?symlink=true
    X-Object-Symlink-Target-Container: ec-con
    X-Object-Symlink-Target-Object: obj
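
The three steps could be scripted roughly as follows. This is a sketch under
stated assumptions: ``move_to_ec`` is a hypothetical helper, ``conn`` is any
object with the ``request``/``getresponse`` interface of
``http.client.HTTPConnection``, and the token and policy name are supplied by
the caller.

```python
def move_to_ec(conn, token, account, container, obj, ec_container, ec_policy):
    """Copy an object into an EC container, then replace it with a symlink."""

    def send(method, path, headers):
        headers = {**headers, "X-Auth-Token": token}
        conn.request(method, path, body=b"", headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the response before the next request
        if resp.status >= 300:
            raise RuntimeError("%s %s failed: %d" % (method, path, resp.status))

    base = "/v1/%s" % account
    # Step 1: make sure a container with the EC policy exists.
    send("PUT", "%s/%s" % (base, ec_container),
         {"X-Storage-Policy": ec_policy})
    # Step 2: copy the object into the EC-policy container.
    send("COPY", "%s/%s/%s" % (base, container, obj),
         {"Destination": "%s/%s" % (ec_container, obj)})
    # Step 3: overwrite the replicated object with a zero-byte symlink.
    send("PUT", "%s/%s/%s?symlink=true" % (base, container, obj),
         {"Content-Length": "0",
          "X-Object-Symlink-Target-Container": ec_container,
          "X-Object-Symlink-Target-Object": obj})
```

A production script would likely verify the copy (for example by comparing
ETags) before overwriting the original object.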

4. Interactions With Existing Features
======================================

4.1 COPY requests
-----------------

If you copy a symlink without ``?symlink=true``, you get a copy of the
referenced object. If you copy a symlink with ``?symlink=true``, you
get a copy of the symlink; it will refer to the same object,
container, and account.

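However, a symlink created without ``X-Object-Symlink-Target-Container``
resolves its target relative to its own container, and one created
without ``X-Object-Symlink-Target-Account`` resolves it relative to its
own account. Copying such a symlink between containers or between
accounts, respectively, therefore yields a new symlink that refers to a
different object.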
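
The effect of the defaulting rules on a copied symlink can be seen in a small
sketch. ``resolve_target`` is a hypothetical helper, and it assumes that
omitted target headers are resolved against the symlink's own location at
request time rather than stored at creation time.

```python
def resolve_target(symlink_location, headers):
    """Return the (account, container, object) a symlink refers to.

    symlink_location is the (account, container, object) of the symlink.
    """
    account, container, _name = symlink_location
    return (
        headers.get("X-Object-Symlink-Target-Account", account),
        headers.get("X-Object-Symlink-Target-Container", container),
        headers["X-Object-Symlink-Target-Object"],
    )


# A symlink that names only the target object.
link_headers = {"X-Object-Symlink-Target-Object": "obj"}
original = resolve_target(("AUTH_a", "con1", "link"), link_headers)
# The same symlink after being copied into another container.
copied = resolve_target(("AUTH_a", "con2", "link"), link_headers)
```

Here ``original`` points into ``con1`` while ``copied`` points into ``con2``,
even though the symlink's headers are identical.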

4.2 Versioned Containers
------------------------

These will definitely interact. We should probably figure out how.


4.3 Object Expiration
---------------------

There's nothing special here. If you create the symlink with
``X-Delete-At``, the symlink will get deleted at the appropriate time.

Per section 2, a plain POST to a symlink is applied to the symlink itself
and returns an error to the client, so a POST that sets ``X-Delete-At``
does not reach the referenced object. To expire the referenced object,
POST ``X-Delete-At`` to it directly.


4.4 Large Objects
-----------------

Since we'll almost certainly end up implementing symlinks as
middleware, we'll order the pipeline like this::

  [pipeline:main]
  pipeline = catch_errors ... slo dlo symlink ... proxy-server

This way, you can create a symlink whose target is a large object
*and* a large object can reference symlinks as segments.

This also works if we decide to implement symlinks in the proxy
server, though that would only happen if a compelling reason were
found.


4.5 User Authorization
----------------------

Authorization will be checked for both the symlink and the referenced
object. If the user is authorized to see the symlink but not the
referenced object, they'll get a 403, same as if they'd tried to
access the referenced object directly.
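
A minimal sketch of the double check follows. The helper name and the
``is_authorized`` callback are hypothetical, and returning 403 when the
symlink itself is not readable is an assumption not spelled out above.

```python
FORBIDDEN = 403
OK = 200


def authorize_read_through_symlink(is_authorized, symlink_path, target_path):
    """Check authorization on the symlink first, then on its referent."""
    for path in (symlink_path, target_path):
        if not is_authorized(path):
            return FORBIDDEN
    return OK


# Example: the user may read the symlink's container but not the target's.
readable = {"/v1/AUTH_a/public/link"}
status = authorize_read_through_symlink(
    lambda path: path in readable,
    "/v1/AUTH_a/public/link",
    "/v1/AUTH_a/private/obj",
)
```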


4.6 Quotas
----------

Nothing special needed here. A symlink counts as 1 object toward an
object-count quota. Since symlinks are zero bytes, they do not count
toward a storage quota, and we do not need to write any code to make
that happen.


4.7 list_endpoints / Hadoop / ZeroVM
------------------------------------

If the application talks directly to the object server and fetches a
symlink, it's up to the application to deal with it. Applications that
bypass the proxy should either avoid use of symlinks or should know
how to handle them.

The same is true for SLO, DLO, versioning, erasure codes, and other
services that the Swift proxy server provides, so we are not without
precedent here.


4.8 Container Sync
------------------

Symlinks are synced like every other object. If the referenced object
in cluster A has a different container name than in cluster B, then
the symlink will point to the wrong place in one of the clusters.

Intra-container symlinks (those with only
``X-Object-Symlink-Target-Object``) will work correctly on both
clusters. Also, if containers are named identically on both clusters,
inter-container symlinks (those with
``X-Object-Symlink-Target-Object`` and
``X-Object-Symlink-Target-Container``) will work correctly too.


4.9 Bulk Uploads
----------------

Currently, bulk uploads ignore all non-file members in the uploaded
tarball. This could be expanded to also process symbolic-link members
(i.e. those for which ``tarinfo.issym() == True``) and create symlink
objects from them. This is not necessary for the initial
implementation of Swift symlinks, but it would be nice to have.
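
As a sketch of what such processing might look like, the following uses
Python's standard ``tarfile`` module to find symbolic-link members. The
helper names are hypothetical, and mapping ``linkname`` straight to a target
object name is a simplification: a real implementation would have to resolve
relative paths against the link's directory.

```python
import io
import tarfile


def symlink_members(fileobj):
    """Return (name, linkname) for every symbolic-link member of a tarball."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return [(m.name, m.linkname) for m in tar if m.issym()]


def symlink_headers(linkname):
    """Headers for the symlink PUT (simplified: linkname used verbatim)."""
    return {"X-Object-Symlink-Target-Object": linkname}


def build_demo_tar():
    """Build an in-memory tarball whose symlink precedes its referent."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo(name="latest.txt")
        link.type = tarfile.SYMTYPE
        link.linkname = "report-v2.txt"
        tar.addfile(link)
        data = b"hello"
        regular = tarfile.TarInfo(name="report-v2.txt")
        regular.size = len(data)
        tar.addfile(regular, io.BytesIO(data))
    buf.seek(0)
    return buf
```

Calling ``symlink_members(build_demo_tar())`` finds the ``latest.txt`` link
even though its referent appears later in the archive, which is why target
validation at creation time would get in the way.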

4.10 Swiftclient
----------------

python-swiftclient could download Swift symlinks as Unix symlinks if a
flag is given, or it could upload Unix symlinks as Swift symlinks in
some cases. This is not necessary for the initial implementation of
Swift symlinks, and is mainly mentioned here to show that
python-swiftclient was not forgotten.