Add the ability to schedule EDP jobs in Sahara
https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs
This spec proposes allowing scheduled EDP jobs to run in Sahara.
Problem description
Currently Sahara supports only one-time EDP jobs, which run as soon as they are launched. Many use cases, however, need scheduled EDP jobs. Adding a scheduling capability to the Sahara EDP engine interface lets each engine (Oozie, Spark, Storm, etc.) provide its own implementation.
Proposed change
Define a run_scheduled_job() interface in the Sahara base EDP engine, then implement it in the Oozie engine. (The Spark and Storm engines will be drafted in a later spec.)
Define two job execution types that indicate how a job is executed. The API does not need to change; instead, new parameters are added to job_configs. Sahara runs a job according to its job execution type, so in the API request the user should pass job_execution_type in job_configs.
Two job execution types:
(1) basic: runs simple one-time EDP jobs (the current Sahara implementation)
(2) scheduled: runs scheduled EDP jobs
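The interface and the type-based dispatch described above could be sketched roughly as follows. This is a minimal illustration, not Sahara's actual code: the JobEngine class, the launch() helper, and the representation of a job execution as a plain dict are assumptions made for the example. Only run_scheduled_job(), job_configs, job_execution_type, and the values "basic" and "scheduled" come from this spec.

```python
import abc


class JobEngine(abc.ABC):
    """Hypothetical stand-in for the Sahara base EDP engine."""

    @abc.abstractmethod
    def run_job(self, job_execution):
        """Run a one-time ("basic") job immediately."""

    @abc.abstractmethod
    def run_scheduled_job(self, job_execution):
        """Submit a job that starts at its scheduled time."""


def launch(engine, job_execution):
    """Pick the engine method that matches the job execution type."""
    # job_execution is modeled as a plain dict here (assumption).
    job_configs = job_execution.get("job_configs", {})
    # Requests without a type fall back to "basic", so existing
    # one-time job requests keep working unchanged.
    exec_type = job_configs.get("job_execution_type", "basic")
    if exec_type == "scheduled":
        return engine.run_scheduled_job(job_execution)
    if exec_type == "basic":
        return engine.run_job(job_execution)
    raise ValueError("unknown job_execution_type: %r" % exec_type)
```

Defaulting to "basic" when the key is absent is one way to keep existing clients compatible; an engine that does not support scheduling yet could raise an error from run_scheduled_job().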
Example of a scheduled EDP job request:
POST /v1.1/{tenant_id}/jobs/<job_id>/execute
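A request body for this call might be assembled as in the sketch below. The helper name build_execute_body() and the "start" key are hypothetical, as is placing job_execution_type at the top level of job_configs; the spec only states that the execution type and start time travel inside job_configs. The timestamp is an arbitrary illustrative value.

```python
import json


def build_execute_body(cluster_id, start_time, input_id=None, output_id=None):
    """Build the JSON body for a scheduled job execute request (sketch)."""
    job_configs = {
        "job_execution_type": "scheduled",
        # "start" is an assumed key name for the scheduled start time.
        "start": start_time,
    }
    body = {"cluster_id": cluster_id, "job_configs": job_configs}
    # Data sources are optional for some job types, so add them only if given.
    if input_id is not None:
        body["input_id"] = input_id
    if output_id is not None:
        body["output_id"] = output_id
    return body


if __name__ == "__main__":
    # Placeholder cluster ID and an illustrative UTC start time.
    body = build_execute_body("<cluster_id>", "2015-05-15T01:00Z")
    print(json.dumps(body, indent=2))
```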
The Oozie engine implementation of scheduled EDP jobs requires the following changes:
Before running the job, Sahara creates a coordinator.xml file that describes the job and uploads it to the job's EDP lib folder in HDFS. Sahara then calls the Oozie client to submit the job using this file, and the job runs at the scheduled time. Until then, the job status is shown as “PREP” on the Horizon page. The user can delete the job while it is in the preparing state as well as while it is running.
Example of coordinator.xml
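As a rough illustration, a minimal coordinator.xml could be generated as shown below. The element and attribute names follow the Oozie coordinator specification referenced at the end of this spec; the function name, the schema version 0.2, the UTC timezone, and the sample values in the usage section are assumptions chosen for the example, not what Sahara will necessarily emit.

```python
import xml.etree.ElementTree as ET

# Coordinator schema namespace (version chosen for illustration).
COORD_NS = "uri:oozie:coordinator:0.2"


def build_coordinator_xml(name, start, end, frequency, workflow_path):
    """Return a minimal Oozie coordinator definition as a string."""
    root = ET.Element("coordinator-app", {
        "name": name,
        "frequency": frequency,  # e.g. an Oozie EL expression
        "start": start,          # UTC, e.g. 2015-05-15T01:00Z
        "end": end,
        "timezone": "UTC",
        "xmlns": COORD_NS,
    })
    # The coordinator action points at the workflow to run.
    action = ET.SubElement(root, "action")
    workflow = ET.SubElement(action, "workflow")
    ET.SubElement(workflow, "app-path").text = workflow_path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are placeholders.
    print(build_coordinator_xml(
        name="scheduled-edp-job",
        start="2015-05-15T01:00Z",
        end="2015-05-16T01:00Z",
        frequency="${coord:days(1)}",
        workflow_path="hdfs:///user/hadoop/job/workflow.xml",
    ))
```

The resulting string is what would be uploaded to HDFS alongside the workflow definition before the job is submitted through the Oozie client.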
The Spark and Storm engines have no scheduled-job implementation yet; they will be added later.
Alternatives
(1) Run the EDP job manually by logging in to the VM and running the oozie command.
(2) Create cron jobs.
Data model impact
None
REST API impact
No change is required. The current API, POST /v1.1/{tenant_id}/jobs/<job_id>/execute, can be used as is; job_execution_type and the start time are passed to Sahara in job_configs.
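On the server side, the start time carried in job_configs could be read along the lines of the sketch below. The "start" key, the timestamp format, the parse_start() name, and the rejection of past start times are all assumptions for illustration. Falling back to the current time when no start time is given mirrors the "default value is now" behavior proposed for the dashboard.

```python
from datetime import datetime, timezone

# Assumed wire format for the start time (UTC, minute resolution).
START_FORMAT = "%Y-%m-%dT%H:%MZ"


def parse_start(job_configs, now=None):
    """Return the scheduled start time as an aware UTC datetime."""
    now = now or datetime.now(timezone.utc)
    raw = job_configs.get("start")  # hypothetical key name
    if raw is None:
        # No start time supplied: start now.
        return now
    # strptime raises ValueError if the string does not match the format.
    start = datetime.strptime(raw, START_FORMAT).replace(tzinfo=timezone.utc)
    if start < now:
        raise ValueError("start time %s is in the past" % raw)
    return start
```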
Other end user impact
None
Deployer impact
None
Developer impact
None
Sahara-image-elements impact
None
Sahara-dashboard / Horizon impact
On the job launch page, add a text box where the user can enter the job start time. The default value is the current time, which keeps the behavior compatible with the current implementation.
Implementation
Assignee(s)
- Primary assignee:
luhuichun(lu huichun)
- Other contributors:
None
Work Items
Define the scheduled job type
Create coordinator.xml before running the job in the EDP engine
Upload coordinator.xml to the job’s HDFS folder
Add run_scheduled_job to the Oozie engine
Modify the Sahara API reference docs
Add a task to update the WADL at api-site
Dependencies
None.
Testing
Add unit tests for the EDP engine
Add a scenario integration test
Documentation Impact
The scheduled job feature needs to be documented.
References
Oozie scheduled and recurring job implementation: https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html