Running Spark Jobs on Cloudera Clusters 5.3.0¶
https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0
This specification proposes to add ability to run Spark Jobs on clusters with CDH (Cloudera Distribution Including Apache Hadoop).
Problem description¶
Sahara is able to run CDH clusters with running Spark services. However there was no possibility to run Spark jobs on clusters of this type.
Proposed change¶
The work involves adding a class for running Spark job via Cloudera plugin. Existing Spark engine was changed so that it lets to run Spark jobs with Spark and Cloudera plugins.
Alternatives¶
Do nothing.
Data model impact¶
None.
REST API impact¶
None.
Other end user impact¶
Required processes: - Master: SPARK_YARN_HISTORY_SERVER - Workers: YARN_NODEMANAGER
Deployer impact¶
None.
Developer impact¶
None.
Sahara-image-elements impact¶
None.
Sahara-dashboard / Horizon impact¶
None.
Implementation¶
Assignee(s)¶
- Primary assignee:
Alexander Aleksiyants
- Other contributors:
Oleg Borisenko
Work Items¶
Dependencies¶
None.
Testing¶
Unit tests to cover CDH engine for working with Spark jobs.
Unit tests for EDP Spark is now used for Spark Engine and EDP engine.
Documentation Impact¶
None.
References¶
None.