[EDP] Improve Java type compatibility

https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility

Currently, EDP MapReduce (Java type) examples must be modified before they can be used from a Java action in an Oozie workflow.

This blueprint aims to let users migrate from other Hadoop clusters to Sahara without modifying their applications.

Problem description

Users currently need to modify their MapReduce programs as follows:

  • Add a conf.addResource call so that the program reads configuration values from the <configuration> tag specified in the Oozie workflow:

    // This will add properties from the <configuration> tag specified
    // in the Oozie workflow.  For java actions, Oozie writes the
    // configuration values to a file pointed to by oozie.action.conf.xml
    conf.addResource(new Path("file:///",
                              System.getProperty("oozie.action.conf.xml")));
    
  • Remove calls to System.exit, which Oozie’s Java action does not allow in the main class. For example, the hadoop-examples.jar bundled with Apache Hadoop calls System.exit.
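To make concrete what the added conf.addResource call picks up, the following JDK-only sketch lists the properties in the file that Oozie names in the oozie.action.conf.xml system property. It assumes the standard Hadoop <configuration>/<property>/<name>/<value> layout; ActionConfReader is a hypothetical name, and in a real job Hadoop's Configuration class does this parsing itself.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads Hadoop-style <configuration> XML without Hadoop on the classpath. */
public class ActionConfReader {

    /** Returns name -> value for every <property> element in the given file, in file order. */
    static Map<String, String> readProperties(File xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            result.put(text(p, "name"), text(p, "value"));
        }
        return result;
    }

    /** Text of the first child element named tag, or null if it is absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // For java actions, Oozie sets this system property to the path of the action configuration.
        String path = System.getProperty("oozie.action.conf.xml");
        if (path == null) {
            System.err.println("oozie.action.conf.xml is not set (not running as an Oozie java action?)");
            return;
        }
        readProperties(new File(path)).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

An unmodified application never reads this file, which is why the values from the workflow's <configuration> tag are lost unless something adds them to the job's configuration.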

Users new to Sahara will typically start by launching the bundled examples and/or applications they already run on other Hadoop clusters (e.g. Amazon EMR). Sahara should support these users without requiring them to modify their code.

Proposed change

We will provide a new job type, called Java EDP Action, which overrides the Main class specified by main_class. The overriding class adds the configuration properties from the workflow and then calls the original main method. It also catches the exception caused by a call to System.exit.
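A minimal sketch of such an overriding class follows; it is not Sahara's actual code. It assumes the original main_class is passed as the first argument and the remaining arguments are forwarded unchanged, and it catches System.exit with a SecurityManager whose checkExit throws, a common technique in Java 7/8-era code. EdpMainWrapper, ExitTrappedException and SampleJob are hypothetical names, and the property-loading step is left as a comment to keep the sketch free of Hadoop dependencies. Note that SecurityManager is deprecated for removal (JEP 411) and recent JDKs refuse to install one, so the sketch falls back to running without the trap there.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.security.Permission;
import java.util.Arrays;

/**
 * Hypothetical sketch of the overriding main class (not Sahara's actual code).
 * Assumed convention: args[0] is the user's original main_class and the
 * remaining arguments are forwarded to it unchanged.
 */
@SuppressWarnings("removal") // SecurityManager is deprecated for removal (JEP 411)
public class EdpMainWrapper {

    /** Thrown instead of letting System.exit terminate the launcher JVM. */
    static final class ExitTrappedException extends SecurityException {
        final int status;

        ExitTrappedException(int status) {
            super("System.exit(" + status + ") intercepted");
            this.status = status;
        }
    }

    /** Permits everything except System.exit, which it turns into an exception. */
    static final class NoExitSecurityManager extends SecurityManager {
        @Override public void checkPermission(Permission perm) { /* allow */ }
        @Override public void checkPermission(Permission perm, Object context) { /* allow */ }
        @Override public void checkExit(int status) { throw new ExitTrappedException(status); }
    }

    /** Installs the exit trap; returns false if this JVM does not allow it. */
    static boolean installTrap() {
        try {
            System.setSecurityManager(new NoExitSecurityManager());
            return true;
        } catch (UnsupportedOperationException e) {
            // JDK 18+ refuses unless started with -Djava.security.manager=allow,
            // and JDK 24+ always refuses; System.exit cannot be caught then.
            return false;
        }
    }

    /** Calls mainClassName.main(args); returns the exit status it requested (0 if none). */
    static int runWrapped(String mainClassName, String[] args) throws Exception {
        Method main = Class.forName(mainClassName).getMethod("main", String[].class);
        SecurityManager previous = System.getSecurityManager();
        boolean trapping = installTrap();
        try {
            main.invoke(null, (Object) args);
            return 0;
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof ExitTrappedException) {
                return ((ExitTrappedException) cause).status;
            }
            if (cause instanceof Exception) throw (Exception) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw e;
        } finally {
            if (trapping) {
                System.setSecurityManager(previous); // restore whatever was installed before
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: EdpMainWrapper <main_class> [args...]");
        }
        // Omitted in this sketch: adding the properties from the file named by the
        // oozie.action.conf.xml system property to the job's Hadoop configuration.
        int status = runWrapped(args[0], Arrays.copyOfRange(args, 1, args.length));
        if (status != 0) {
            // Report failure by throwing, since the wrapper itself must not call System.exit.
            throw new RuntimeException("main_class requested exit status " + status);
        }
    }
}

/** Hypothetical stand-in for an unmodified user job that may end with System.exit. */
class SampleJob {
    static String[] lastArgs; // recorded so that argument forwarding can be checked

    public static void main(String[] args) {
        lastArgs = args;
        if (args.length > 0 && args[0].equals("fail")) {
            System.exit(2);
        }
    }
}
```

On a JVM where the trap can be installed, runWrapped(SampleJob.class.getName(), new String[] {"fail"}) returns 2 instead of terminating the JVM. Converting a non-zero status into an exception lets the launcher see the failure without the wrapper calling System.exit itself.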

Alternatives

According to the Oozie documentation (3.2.7.1), Oozie 4.0 and later provide a way to override an action’s Main class. The proposed implementation is simpler than using that Oozie feature, and it will not depend on the Oozie library.

Data model impact

None

REST API impact

None

Other end user impact

Users will no longer need to modify their applications to use EDP.

Deployer impact

None

Developer impact

None

Sahara-image-elements impact

None

Sahara-dashboard / Horizon impact

sahara-dashboard / Horizon needs to add this new job type.

Implementation

Assignee(s)

Primary assignee: Kazuki Oikawa (k.oikw)

Other contributors: Yuji Yamada (yamada-yuji)

Work Items

  • Add a new job type (Java.EDP)

    • Java.EDP will be a subtype of Java

    • Implement uploading the jar file containing the overriding class to HDFS

    • Implement creation of the workflow.xml

  • Implement the overriding class
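The workflow.xml work item can be sketched as follows: the overriding class becomes the Oozie java action's main-class, and the user's main_class is passed through as the first <arg>. Everything here is an assumption for illustration: JavaEdpActionXml and the wrapper class name are hypothetical, only the <java> element is built (the surrounding <workflow-app> and <action> elements are omitted), and the uploaded jar is assumed to sit in the workflow's lib/ directory, whose jars Oozie adds to the action classpath.

```java
import java.util.List;

/** Hypothetical sketch: builds the <java> element of the action for a Java.EDP job. */
public class JavaEdpActionXml {

    // Hypothetical fully qualified name of the overriding class in the uploaded jar.
    static final String WRAPPER_CLASS = "org.example.sahara.edp.EdpMainWrapper";

    /** Minimal XML text escaping; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * The wrapper is Oozie's main-class; the user's main_class is the first arg,
     * followed by the user's own arguments (assumed convention).
     */
    static String javaAction(String jobTracker, String nameNode,
                             String userMainClass, List<String> userArgs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<java>\n");
        xml.append("  <job-tracker>").append(escape(jobTracker)).append("</job-tracker>\n");
        xml.append("  <name-node>").append(escape(nameNode)).append("</name-node>\n");
        xml.append("  <main-class>").append(escape(WRAPPER_CLASS)).append("</main-class>\n");
        xml.append("  <arg>").append(escape(userMainClass)).append("</arg>\n");
        for (String arg : userArgs) {
            xml.append("  <arg>").append(escape(arg)).append("</arg>\n");
        }
        xml.append("</java>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(javaAction("resourcemanager:8032", "hdfs://namenode:8020",
                "org.apache.hadoop.examples.WordCount", List.of("/user/in", "/user/out")));
    }
}
```

Passing main_class as an argument rather than baking it into the jar keeps a single, generic wrapper jar that can be uploaded unchanged for every Java.EDP job.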

Dependencies

None

Testing

We will add an integration test that checks whether the WordCount example bundled with Apache Hadoop runs successfully without modification.

Documentation Impact

If the EDP examples use this feature, the documentation will need to be updated.

References