Run Spark jobs on vanilla Hadoop 2.x

https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-vanilla-hadoop

This specification proposes to add the ability to run Spark jobs on a cluster running the vanilla version of Hadoop 2.x (YARN).

Problem description

Support for running Spark jobs already exists for stand-alone Spark clusters and for CDH, but not for the vanilla version of Hadoop.

Proposed change

Add a new edp_engine class in the vanilla 2.x plugin that extends SparkJobEngine. Leverage the design and code from the blueprint: https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0
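A minimal sketch of the new class is shown below, assuming the SparkJobEngine base class in sahara.service.edp.spark.engine (the module used by the existing stand-alone Spark engine); the version value and the edp_supported helper are illustrative, not the final design::

    from sahara.service.edp.spark import engine as spark_engine


    class EdpSparkEngine(spark_engine.SparkJobEngine):
        """Spark EDP engine for the vanilla 2.x (YARN) plugin."""

        # Illustrative: lowest vanilla Hadoop version with Spark support.
        edp_base_version = "2.6.0"

        @staticmethod
        def edp_supported(version):
            # Only cluster versions at or above the base version can run
            # Spark jobs (naive string comparison, for illustration only).
            return version >= EdpSparkEngine.edp_base_version

The vanilla plugin's get_edp_engine() would then return an instance of this class when the submitted job is a Spark job type.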

Configure Spark to run on YARN by setting Spark's configuration file (spark-env.sh) to point at Hadoop's configuration directory, and deploy that file to the cluster nodes upon cluster creation.
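For illustration, the helper below sketches how the plugin could render that file; the function name and the default path are hypothetical, but exporting HADOOP_CONF_DIR is the standard way to let Spark discover the YARN ResourceManager and HDFS settings::

    def generate_spark_env(hadoop_conf_dir):
        """Render the spark-env.sh deployed to each node.

        hadoop_conf_dir is the directory on the cluster nodes that
        holds core-site.xml, yarn-site.xml and friends.
        """
        lines = [
            "#!/usr/bin/env bash",
            # Spark reads HADOOP_CONF_DIR to locate the YARN and HDFS
            # configuration when jobs run with --master yarn.
            "export HADOOP_CONF_DIR=%s" % hadoop_conf_dir,
        ]
        return "\n".join(lines) + "\n"

With this file in place under Spark's conf directory, jobs can be submitted with spark-submit --master yarn-cluster (or yarn-client) instead of a stand-alone master URL.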

Extend sahara-image-elements to support creating a vanilla image that also contains the Spark binaries (vanilla+spark).

Alternatives

Without these changes, the only way to run Spark alongside Hadoop MapReduce is to deploy a CDH cluster.

Data model impact

None

REST API impact

None

Other end user impact

None

Deployer impact

None

Developer impact

None

Sahara-image-elements impact

Requires changes to sahara-image-elements to support building a vanilla 2.x image with the Spark binaries. The new image type can be vanilla+spark, and the Spark version can initially be fixed at 1.3.1.

Sahara-dashboard / Horizon impact

None

Implementation

Assignee(s)

Primary assignee:

None

Work Items

* New edp_engine class for the vanilla 2.x plugin
* sahara-image-elements extension for the vanilla+spark image
* Unit tests

Dependencies

This work leverages the blueprint: https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0

Testing

Unit tests covering the vanilla EDP engine running Spark jobs.
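As a sketch, a test for the configuration helper proposed above could look like the following; the module path and helper name are hypothetical::

    import unittest

    # Hypothetical location of the spark-env.sh helper from the
    # proposed change.
    from sahara.plugins.vanilla import spark_helper


    class TestSparkEnvGeneration(unittest.TestCase):
        def test_hadoop_conf_dir_is_exported(self):
            env = spark_helper.generate_spark_env("/opt/hadoop/etc/hadoop")
            self.assertIn(
                "export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop", env)

Tests for the engine class itself would mock the cluster object and verify that Spark job types are routed to the new engine.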

Documentation Impact

None

References

None