Use first_run to Start a Cluster in One Step
https://blueprints.launchpad.net/sahara/+spec/first-run-api-usage
This specification proposes using the cm_api method first_run to start a cluster in the Sahara CDH plugin, instead of the current batch of separate methods.
Problem description
Currently, the CDH plugin's start_cluster method uses many helper methods defined in cloudera_utils.py to configure and prepare services before they are started. Each helper calls cm_api methods and checks the returned status. For example, cu.format_namenode calls the cm_api method ApiService.format_hdfs.
However, Cloudera does not recommend this approach. A much simpler way is to call the single method first_run, which does most of this work. In fact, Cloudera Manager itself uses first_run as the final step of deploying a cluster.
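The single call that replaces the batch of per-service methods can be sketched directly against cm_api. This is a minimal sketch. It assumes `api` is a cm_api `ApiResource` connected to Cloudera Manager, and that the server's API version supports `ApiCluster.first_run()`. The function name `first_run_cluster` is hypothetical. The resource is passed in as a parameter so that the function can be exercised with a mock instead of a live server.

```python
def first_run_cluster(api, cluster_name):
    """Prepare and start every service in a cluster with one CM command.

    `api` is assumed to be a cm_api ApiResource; it is passed in rather
    than created here so a mock can stand in for a live CM server.
    Hypothetical helper, not part of Sahara or cm_api.
    """
    cluster = api.get_cluster(cluster_name)
    # One command covers the per-service preparation steps that
    # start_cluster currently issues one by one.
    cmd = cluster.first_run()
    # wait() blocks until CM finishes and returns the refreshed command.
    result = cmd.wait()
    if not result.success:
        raise RuntimeError("first_run failed: %s" % result.resultMessage)
    return result
```

The caller checks a single command status instead of one status per preparation step.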
Changing the current start_cluster code to use first_run has the following benefits:
The preparation work is left to Cloudera Manager itself instead of being done manually.
Adding support for more services becomes simpler.
Errors caused by future Cloudera Manager (CM) changes become less likely.
Proposed change
The implementation will change start_cluster to call first_run. Any work that first_run already performs will be removed from the body of start_cluster.
In more detail:
In deploy.py, the start_cluster method may look like the following:
def start_cluster(cluster):
    # cu is the plugin's cloudera_utils module
    cm_cluster = cu.get_cloudera_cluster(cluster)
    # some pre-start code
    cu.first_run(cluster)
    # some post-start code
The current methods used to configure CDH components, such as _configure_spark and _install_extjs, can then be removed, along with most of the existing start_cluster body.
In cloudera_utils.py, first_run could be defined as follows (for example only):
@cloudera_cmd
def first_run(cluster):
    cm_cluster = get_cloudera_cluster(cluster)
    # cloudera_cmd waits on each yielded command and raises if it fails
    yield cm_cluster.first_run()
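The `@cloudera_cmd` decorator is what turns the `yield` in first_run into a blocking call. What follows is a simplified, hypothetical re-implementation of that pattern, not Sahara's actual code. It assumes that each yielded object has a `wait()` method that returns a result with `success` and `resultMessage` attributes, as cm_api's `ApiCommand` does. `CommandFailed` is an illustrative exception name. In this sketch, `first_run` takes the cm_api cluster object directly rather than a Sahara cluster.

```python
import functools


class CommandFailed(Exception):
    """Hypothetical error type; Sahara raises its own provisioning errors."""


def cloudera_cmd(f):
    """Run a generator of CM commands, waiting on each before the next."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        for cmd in f(*args, **kwargs):
            # Block until Cloudera Manager reports this command as finished.
            result = cmd.wait()
            if not result.success:
                raise CommandFailed(result.resultMessage)
    return wrapper


@cloudera_cmd
def first_run(cm_cluster):
    # A single yielded command replaces the per-service preparation steps.
    yield cm_cluster.first_run()
```

The wrapper consumes the generator, so a caller simply invokes `first_run(cm_cluster)`. Any failed command then surfaces as an exception rather than as a status code that the caller must check.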
Methods for configuring CDH components, such as create_yarn_job_history_dir, create_oozie_db, install_oozie_sharelib, create_hive_metastore_db and create_hive_dirs, can then be removed.
Alternatives
The current approach works at this stage, but it makes adding support for more services to the CDH plugin more complex. Also, when CM is upgraded in the future, the correctness of the current code cannot be guaranteed. Finally, using first_run to start services is the approach that Cloudera recommends.
Data model impact
None
REST API impact
None
Other end user impact
None
Deployer impact
None
Developer impact
It will be easier for developers to add support for more CDH services.
Sahara-image-elements impact
None
Sahara-dashboard / Horizon impact
None
Implementation
Assignee(s)
- Primary assignee:
ken chen
- Other contributors:
ken chen
Work Items
The work items will be:
Change deploy.py and cloudera_utils.py as described above.
Test and evaluate the change.
Dependencies
None
Testing
Run an integration test that creates a cluster.
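In addition to the integration test, a unit-level check could confirm that start_cluster delegates its start-up work to first_run. This is a minimal sketch using `unittest.mock`, and `start_cluster` here is a hypothetical stand-in. Here, the cloudera_utils module is passed in as `cu` to keep the example self-contained. Against the real deploy.py, one would instead patch the module, for example with `mock.patch`.

```python
import unittest
from unittest import mock


def start_cluster(cluster, cu):
    """Hypothetical stand-in for deploy.start_cluster.

    The cloudera_utils module is injected as `cu` so a mock can replace it.
    """
    cu.first_run(cluster)


class StartClusterTest(unittest.TestCase):
    def test_delegates_to_first_run(self):
        cu = mock.Mock()
        start_cluster("fake-cluster", cu)
        # Exactly one first_run call should do the start-up work...
        cu.first_run.assert_called_once_with("fake-cluster")
        # ...and the removed per-service helpers should not be called.
        cu.format_namenode.assert_not_called()
        cu.create_oozie_db.assert_not_called()
```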
Documentation Impact
None
References
None