[EDP] Add a new job-types endpoint
https://blueprints.launchpad.net/sahara/+spec/edp-job-types-endpoint
Add a new job-types endpoint that can report all supported job types for a Sahara instance and which plugins support them.
Problem description
There are currently two problems around job types in Sahara that impact user experience:
The current /jobs/config-hints/<job_type> endpoint is not adequate for providing configuration hints because it does not take into account plugin type or the framework version. This endpoint was meant to give users a list of likely configuration values for a job that they may want to modify (at one point the UI incorporated the hints as well). Although some config is common across multiple plugins, hints need to be specific to the plugin and the framework version in order to be useful. The endpoint does not take a cluster or plugin argument and so hints must be general.
A user currently has no indicator of the job types that are actually available from a Sahara instance (the UI lists them all). The set of valid job types is based on the plugins loaded for the current instance. Furthermore, not all job types will be available to run on all clusters launched by the user because they are plugin dependent.
These problems should be solved without breaking backward compatibility in the REST API.
Proposed change
Add a new endpoint that will indicate for the running Sahara instance which job types are supported by which versions of which plugins. Optionally, plugin-and-version-specific config hints will be included for each supported job type.
Because config hints can be very long, they will not be included in a response by default. A query string parameter will be used to indicate that they should be included (hints=true in the examples below).
The endpoint will support the following optional query strings for filtering. Each may be used more than once to query over a list of values, for example type=Pig&type=Java:
- type: A job type to consider. Default is all job types.
- plugin: A plugin to consider. Default is all plugins.
- version: A plugin version to consider. Default is all versions.
The REST API method is specified in detail below under REST API impact.
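As a rough client-side illustration of the repeated-filter convention, the sketch below builds a job-types URL with the standard library's urllib.parse.urlencode. The helper name build_job_types_url and its keyword arguments are hypothetical; only the type, plugin, version, and hints parameter names come from this spec.

```python
from urllib.parse import urlencode


def build_job_types_url(base_url, types=(), plugins=(), versions=(), hints=False):
    """Build a job-types URL; this helper and its arguments are illustrative only."""
    # A list of (key, value) pairs lets the same key appear more than once.
    params = [("type", t) for t in types]
    params += [("plugin", p) for p in plugins]
    params += [("version", v) for v in versions]
    if hints:
        params.append(("hints", "true"))
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


url = build_job_types_url(
    "http://sahara/v1.1/775181/job-types", types=["Pig", "Java"]
)
print(url)
```

Passing pairs rather than a dictionary is what allows a filter such as type to be repeated; a plain dict would keep only one value per key.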
Two new optional methods are needed in the Plugin SPI. The information ultimately comes from the EDP engine(s) used by a plugin, but allocating an EDP engine object just to answer this query is undesirable, and the existing get_edp_engine() requires a cluster object, so it will not suffice:
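@abc.abstractmethod
def get_edp_job_types(self, versions=[]):
    # Report the EDP job types supported for the given plugin versions.
    return []

@abc.abstractmethod
def get_edp_config_hints(self, job_type, version):
    # Report config hints for one job type on one plugin version.
    return {}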
These specific methods are mentioned here because they represent a change to the public Plugin SPI.
Alternatives
Fix the existing /jobs/config-hints endpoint to take a cluster id or a plugin-version pair and return appropriate config hints. However, this would break backward compatibility.
Even with that fix, an additional endpoint would still be needed to retrieve the supported job types for the Sahara instance, separate from config hints.
However, it makes more sense to deprecate the current config-hints interface and add the new endpoint which serves both purposes.
Data model impact
None
REST API impact
Backward compatibility will be maintained since this is a new endpoint.
GET /v1.1/{tenant_id}/job-types
Normal Response Code: 200 (OK)
Errors: none
Indicate which job types are supported by which versions of which plugins in the current instance.
- Example
request
GET http://sahara/v1.1/775181/job-types
response
HTTP/1.1 200 OK
Content-Type: application/json

{
    "job_types": [
        {
            "name": "Hive",
            "plugins": [
                {
                    "description": "The Apache Vanilla plugin.",
                    "name": "vanilla",
                    "title": "Vanilla Apache Hadoop",
                    "versions": {
                        "1.2.1": {}
                    }
                },
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {},
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "Java",
            "plugins": [
                {
                    "description": "The Apache Vanilla plugin.",
                    "name": "vanilla",
                    "title": "Vanilla Apache Hadoop",
                    "versions": {
                        "1.2.1": {}
                    }
                },
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {},
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "MapReduce",
            "plugins": [
                {
                    "description": "The Apache Vanilla plugin.",
                    "name": "vanilla",
                    "title": "Vanilla Apache Hadoop",
                    "versions": {
                        "1.2.1": {}
                    }
                },
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {},
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "MapReduce.Streaming",
            "plugins": [
                {
                    "description": "The Apache Vanilla plugin.",
                    "name": "vanilla",
                    "title": "Vanilla Apache Hadoop",
                    "versions": {
                        "1.2.1": {}
                    }
                },
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {},
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "Pig",
            "plugins": [
                {
                    "description": "The Apache Vanilla plugin.",
                    "name": "vanilla",
                    "title": "Vanilla Apache Hadoop",
                    "versions": {
                        "1.2.1": {}
                    }
                },
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {},
                        "2.0.6": {}
                    }
                }
            ]
        }
    ]
}
The job-types endpoint returns a list under the job_types key. Each item in the list is a dictionary describing a job type that is supported by the running Sahara instance. Job types that no loaded plugin supports are omitted; notice, for example, that the Spark job type is missing.
Each job type dictionary contains the name of the job type and a list of plugins that support it.
For each plugin, we include the basic identifying information and then a versions dictionary. Each entry in the versions dictionary has the name of the version as the key and the corresponding config hints as the value. Since this example did not request config hints, the dictionaries are empty.
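To make the nesting concrete, here is a minimal sketch of how a client might walk the response shape described above. The helper name plugins_for is hypothetical, and the sample data is a trimmed copy of the example response with a single job type.

```python
import json

# Trimmed copy of the example response: one job type, identifying fields omitted.
RESPONSE = json.loads("""
{"job_types": [
    {"name": "Hive", "plugins": [
        {"name": "vanilla", "versions": {"1.2.1": {}}},
        {"name": "hdp", "versions": {"1.3.2": {}, "2.0.6": {}}}
    ]}
]}
""")


def plugins_for(response, job_type):
    """Map plugin name to its sorted version list for one job type (illustrative)."""
    for entry in response["job_types"]:
        if entry["name"] == job_type:
            # The keys of each "versions" dictionary are the version names.
            return {p["name"]: sorted(p["versions"]) for p in entry["plugins"]}
    return {}  # the job type is not reported by the instance


print(plugins_for(RESPONSE, "Hive"))
print(plugins_for(RESPONSE, "Spark"))
```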
Here is an example of a request that uses the plugin and version filters:
- Example
request
GET http://sahara/v1.1/775181/job-types?plugin=hdp&version=2.0.6
response
HTTP/1.1 200 OK
Content-Type: application/json

{
    "job_types": [
        {
            "name": "Hive",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "Java",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "MapReduce",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "MapReduce.Streaming",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "2.0.6": {}
                    }
                }
            ]
        },
        {
            "name": "Pig",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "2.0.6": {}
                    }
                }
            ]
        }
    ]
}
Here is another example that enables config hints and also filters by plugin, version, and job type.
- Example
request
GET http://sahara/v1.1/775181/job-types?hints=true&plugin=hdp&version=1.3.2&type=Hive
response
HTTP/1.1 200 OK
Content-Type: application/json

{
    "job_types": [
        {
            "name": "Hive",
            "plugins": [
                {
                    "description": "The Hortonworks Sahara plugin.",
                    "name": "hdp",
                    "title": "Hortonworks Data Platform",
                    "versions": {
                        "1.3.2": {
                            "job_config": {
                                "args": {},
                                "configs": [
                                    {
                                        "description": "Reduce tasks.",
                                        "name": "mapred.reduce.tasks",
                                        "value": "-1"
                                    }
                                ],
                                "params": {}
                            }
                        }
                    }
                }
            ]
        }
    ]
}
This is an abbreviated example that shows imaginary config hints.
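One possible way the endpoint could assemble such responses from the two proposed SPI methods is sketched below. This is not Sahara's implementation: StubPlugin, job_types_response, the plugin_names argument, and the assumption that get_edp_job_types returns a flat list of job type names are all hypothetical, chosen only to show how the filters and the hints flag might combine.

```python
class StubPlugin:
    """Stand-in for a loaded plugin; every name here is an assumption."""

    def __init__(self, name, title, description, support):
        self.name = name
        self.title = title
        self.description = description
        self._support = support  # {version: {job_type: config_hints}}

    def get_versions(self):
        return sorted(self._support)

    def get_edp_job_types(self, versions=()):
        # Assumed semantics: an empty filter means "all versions".
        wanted = versions or self.get_versions()
        return sorted({jt for v in wanted for jt in self._support.get(v, {})})

    def get_edp_config_hints(self, job_type, version):
        return self._support.get(version, {}).get(job_type, {})


def job_types_response(plugins, types=(), plugin_names=(), versions=(), hints=False):
    """Build the job_types body; an empty filter matches everything."""
    by_type = {}
    for plugin in plugins:
        if plugin_names and plugin.name not in plugin_names:
            continue
        for version in plugin.get_versions():
            if versions and version not in versions:
                continue
            for job_type in plugin.get_edp_job_types([version]):
                if types and job_type not in types:
                    continue
                entry = by_type.setdefault(job_type, {}).setdefault(
                    plugin.name,
                    {
                        "name": plugin.name,
                        "title": plugin.title,
                        "description": plugin.description,
                        "versions": {},
                    },
                )
                # Hints are only fetched when requested; otherwise stay empty.
                entry["versions"][version] = (
                    plugin.get_edp_config_hints(job_type, version) if hints else {}
                )
    return {
        "job_types": [
            {"name": name, "plugins": list(by_type[name].values())}
            for name in sorted(by_type)
        ]
    }


hdp = StubPlugin(
    "hdp",
    "Hortonworks Data Platform",
    "The Hortonworks Sahara plugin.",
    {
        "1.3.2": {"Hive": {"job_config": {"args": {}, "configs": [], "params": {}}}},
        "2.0.6": {"Hive": {}, "Pig": {}},
    },
)
filtered = job_types_response([hdp], types=["Hive"], versions=["1.3.2"], hints=True)
unfiltered = job_types_response([hdp])
```

Here filtered contains only the Hive entry for hdp 1.3.2 with its hints attached, while unfiltered lists both Hive and Pig with empty hint dictionaries.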
Other end user impact
The python-saharaclient should be extended to support this as well:
$ sahara job-types-list [--type] [--plugin [--plugin-version]]
Output should look like this (not sure where else to specify this):
+---------------------+-----------------------------------+
| name                | plugin(versions)                  |
+---------------------+-----------------------------------+
| Hive                | vanilla(1.2.1), hdp(1.3.2, 2.0.6) |
| Java                | vanilla(1.2.1), hdp(1.3.2, 2.0.6) |
| MapReduce           | vanilla(1.2.1), hdp(1.3.2, 2.0.6) |
| MapReduce.Streaming | vanilla(1.2.1), hdp(1.3.2, 2.0.6) |
| Pig                 | vanilla(1.2.1), hdp(1.3.2, 2.0.6) |
+---------------------+-----------------------------------+
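A table like this could be derived from the REST response roughly as follows. This is only a sketch: render_table is a hypothetical helper, not the python-saharaclient implementation, and the sample data is a one-row subset of the example response.

```python
def render_table(response):
    """Render job types as an ASCII table (illustrative formatting only)."""
    rows = [("name", "plugin(versions)")]
    for job_type in response["job_types"]:
        cell = ", ".join(
            "{}({})".format(p["name"], ", ".join(sorted(p["versions"])))
            for p in job_type["plugins"]
        )
        rows.append((job_type["name"], cell))

    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in rows) for i in range(2)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    body = [line(row) for row in rows[1:]]
    return "\n".join([border, line(rows[0]), border] + body + [border])


sample = {
    "job_types": [
        {
            "name": "Hive",
            "plugins": [
                {"name": "vanilla", "versions": {"1.2.1": {}}},
                {"name": "hdp", "versions": {"1.3.2": {}, "2.0.6": {}}},
            ],
        }
    ]
}
table = render_table(sample)
print(table)
```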
Because config hints can return a large amount of information, and description fields in particular can contain long text, how to support config hints through the python-saharaclient is TBD.
As noted above, the Plugin SPI will be extended with optional methods. Existing plugins that support EDP will be modified as part of this change.
Deployer impact
None
Developer impact
None
Sahara-image-elements impact
None
Sahara-dashboard / Horizon impact
The UI will be able to take advantage of this information and filter the job types available to the user on the forms. It will also be able to make use of config hints.
Implementation
Assignee(s)
- Primary assignee: tmckay
- Other contributors: none
Work Items
- Add basic endpoint support with optional methods in the plugin SPI
- Implement the methods for each plugin that supports EDP. This can be done as a series of separate small CRs
- Add support to python-saharaclient
- Update documentation
Dependencies
None
Testing
Unit tests
Tempest tests for API
Documentation Impact
The new endpoint should be added to the REST API documentation.