Use first_run to Start a Cluster in One Step

https://blueprints.launchpad.net/sahara/+spec/first-run-api-usage

This specification proposes using the cm_api method first_run to start a cluster in the Sahara CDH plugin, replacing the batch of separate methods used today.

Problem description

Currently, the start_cluster method of the CDH plugin uses many methods defined in cloudera_utils.py to configure and prepare services before they are started. Each of these methods wraps cm_api calls and checks the returned status. For example, cu.format_namenode calls the cm_api method ApiService.format_hdfs.

However, Cloudera does not recommend this approach. It is much easier to let a single method, first_run, do most of that work. In fact, Cloudera Manager itself uses first_run for the final step of deploying a cluster.
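The contrast between the two approaches can be sketched with a stand-in object. This is only an illustration under stated assumptions: FakeCMCluster and the two start_* functions are hypothetical, and they record call names instead of contacting Cloudera Manager. The step names are borrowed from the methods mentioned in this specification.

```python
class FakeCMCluster(object):
    """Hypothetical stand-in that records which operations were requested."""

    def __init__(self):
        self.calls = []

    def run(self, name):
        # A real client would issue a Cloudera Manager command here.
        self.calls.append(name)

    def first_run(self):
        # With the real API, Cloudera Manager handles this single command.
        self.calls.append('first_run')


def start_step_by_step(cm_cluster):
    # Current style: the plugin drives every preparation step itself.
    for step in ('format_hdfs', 'create_yarn_job_history_dir',
                 'create_oozie_db', 'install_oozie_sharelib',
                 'create_hive_metastore_db', 'create_hive_dirs'):
        cm_cluster.run(step)


def start_with_first_run(cm_cluster):
    # Proposed style: one call, and Cloudera Manager decides what to prepare.
    cm_cluster.first_run()


old_style = FakeCMCluster()
start_step_by_step(old_style)
new_style = FakeCMCluster()
start_with_first_run(new_style)
print(len(old_style.calls), len(new_style.calls))  # prints: 6 1
```

Every new service adds entries to the step list in the first style, while the second style stays a single call.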

Changing start_cluster to use first_run has the following benefits:

  • Cloudera Manager does the work itself, instead of the plugin doing it manually.

  • Adding support for more services becomes simpler.

  • Errors caused by future CM changes are less likely.

Proposed change

The implementation will change start_cluster to call first_run, and remove from the method body the work that first_run already does.

In detail, the change looks as follows.

In deploy.py, the start_cluster method may look like:

def start_cluster(cluster):
    cm_cluster = cu.get_cloudera_cluster(cluster)
    # ... preparation steps that first_run does not cover ...
    cu.first_run(cluster)
    # ... remaining steps after the services have started ...

The methods currently used to configure CDH components, such as _configure_spark and _install_extjs, and most of the start_cluster body can be removed.

In cloudera_utils.py, first_run can be defined as follows (an example only):

@cloudera_cmd
def first_run(cluster):
    cm_cluster = get_cloudera_cluster(cluster)
    yield cm_cluster.first_run()

The methods that configure CDH components, such as create_yarn_job_history_dir, create_oozie_db, install_oozie_sharelib, create_hive_metastore_db, and create_hive_dirs, can be removed.
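The generator-plus-decorator pattern behind the first_run example can be sketched as follows. This is a minimal, self-contained illustration and not the plugin's actual code: FakeCommand and FakeCluster are hypothetical stand-ins, the decorator body is an assumption about how a cloudera_cmd-style wrapper could behave, and the success and resultMessage attributes are assumed to mirror what a cm_api command object exposes.

```python
import functools


class FakeCommand(object):
    """Hypothetical stand-in for a command object returned by cm_api."""

    def __init__(self, success, message=''):
        self.success = success
        self.resultMessage = message

    def wait(self):
        # A real command would block until Cloudera Manager finishes it.
        return self


def cloudera_cmd(f):
    """Run every command yielded by f and fail on the first error."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        for cmd in f(*args, **kwargs):
            result = cmd.wait()
            if not result.success:
                raise RuntimeError(
                    'Cloudera command failed: %s' % result.resultMessage)
    return wrapper


class FakeCluster(object):
    """Hypothetical cluster object whose first_run returns a FakeCommand."""

    def __init__(self, success):
        self._success = success

    def first_run(self):
        return FakeCommand(self._success, 'simulated result')


@cloudera_cmd
def first_run(cm_cluster):
    # Yield the command so that the decorator waits for it and checks it.
    yield cm_cluster.first_run()


first_run(FakeCluster(success=True))  # completes without raising
```

Because the wrapped function only yields commands, waiting and error handling live in one place, and first_run itself stays a one-line body.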

Alternatives

The current approach works for now, but it makes adding support for more services to the CDH plugin more complex. Also, when CM is upgraded in the future, the correctness of the current code cannot be guaranteed. Finally, Cloudera recommends first_run as the way to start services.

Data model impact

None

REST API impact

None

Other end user impact

None

Deployer impact

None

Developer impact

It will be easier for developers to add support for more CDH services.

Sahara-image-elements impact

None

Sahara-dashboard / Horizon impact

None

Implementation

Assignee(s)

Primary assignee:

ken chen

Other contributors:

ken chen

Work Items

The work items will be:

  • Change deploy.py and cloudera_utils.py as described above.

  • Test and evaluate the change.

Dependencies

None

Testing

Run an integration test that creates a cluster.

Documentation Impact

None

References

None