Allow placeholders in datasource URLs

https://blueprints.launchpad.net/sahara/+spec/edp-datasource-placeholders

This spec is to allow using placeholders in EDP data source URL.

Problem description

Common use case: user wants to run EDP job two times. Now the only way to do that with the same data sources is to erase result of the first run before running job the second time. Allowing to have random part in URL will allow to use output with random suffix.

Proposed change

Introduce special strings that could be used in EDP data source URL and will be replaced with appropriate value.

The proposed syntax for placeholder is %FUNC(ARGS)%.

As a first step I suggest to implement two functions only:

  • %RANDSTR(len)% - will be replaced with random string of lowercase letters of length len.

  • %JOB_EXEC_ID% - will be replaced with the job execution ID.

Placeholders will not be allowed in protocol prefix. So, there will be no validation impact.

List of functions could be extended later (e.g. to have %JOB_ID%, etc.).

URLs after placeholders replacing will be stored in job_execution.info field during job_execution creation. This will allow to use them later to find objects created by a particular job run.

Example of create request for data source with placeholder:

{
    "name": "demo-pig-output",
    "description": "A data source for Pig output, stored in Swift",
    "type": "swift",
    "url": "swift://edp-examples.sahara/pig-job/data/output.%JOB_EXEC_ID%",
    "credentials": {
        "user": "demo",
        "password": "password"
    }
}

Alternatives

Do not allow placeholders.

Data model impact

job_execution.info field (json dict) will also store constructed URLs.

REST API impact

None

Other end user impact

None

Deployer impact

None

Developer impact

None

Sahara-image-elements impact

None

Sahara-dashboard / Horizon impact

Horizon need to be updated to display actual URLs for job execution. Input Data Source and Output Data Source sections of job execution details page will be extended to include information about URLs used.

REST will not be changed since new information is stored in the existing ‘info’ field.

Implementation

Assignee(s)

Primary assignee:

alazarev (Andrew Lazarev)

Other contributors:

None

Work Items

  • Implement feature

  • Document feature

Dependencies

None.

Testing

Manually.

Documentation Impact

Need to be documented.

References

None