Add HBase on Vanilla cluster

https://blueprints.launchpad.net/sahara/+spec/hbase-on-vanila

Apache HBase provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System(HDFS). This document serves as a description to add the support of HBase and ZooKeeper services on Vanilla cluster.

Problem description

Sahara vanilla plugin allows user to quickly provision a cluster with many core services, but it doesn’t support HBase and ZooKeeper.

Proposed change

To go against the Vanilla cluster distributed architecture, we only support fully-distributed HBase deployment. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase Daemon. These include HBase Master instance, multiple ZooKeeper nodes and multiple RegionServer nodes.

A distributed HBase installation depends on a running ZooKeeper cluster. HBase default manages a ZooKeeper “cluster” for you, but you can also manage the ZooKeeper ensemble independent of HBase. The variable “HBASE_MANAGES_ZK” in “conf/hbase-env.sh”, which default to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase.

We should expose this variable in “cluster_configs” to let user determine the creator of ZooKeeper service.

In production, it is recommended that run a ZooKeeper ensemble of 3, 5 or 7 machines; the more members an ensemble has, the more tolerant the ensemble is of host failures. Also, run an odd number of machines. An even number of peers is supported, but it is normally not used because an even sized ensemble requires, proportionally, more peers to form a quorum than an odd sized ensemble requires.

  • If we set “HBASE_MANAGES_ZK” to false, Sahara will validate the number of ZooKeeper services in node groups to keep ZK instances in odd number.

  • If we set “HBASE_MANAGES_ZK” to true, Sahara will automatically determine the instances to start ZooKeeper. The cluster contains ZK nodes more than 1 nodes, less than 5 nodes. If we want to have more ZK nodes, setting HBASE_MANAGES_ZK to false would be a good choice.

If we want to scale the cluster up or down, ZooKeeper and HBase services will be restarted. And after scaling up or down, the rest of ZooKeeper nodes should also be kept in odd number. If there is only one ZooKeeper node, the status of ZooKeeper service will be “standalone”.

One thing should be specified is the default value used in configuration:

ZooKeeper Configuration in “/opt/zookeeper/conf/zoo.cfg”:

dataDir=/var/data/zookeeper
clientPort=2181
server.1=zk-0:2888:3888
server.2=zk-1:2888:3888

HBase Configuration in “/opt/hbase/conf/hbase-site.xml”:

hbase.tmp.dir=/var/data/hbase
hbase.rootdir=hdfs://master:9000/hbase
hbase.cluster.distributed=true
hbase.master.port=16000
hbase.master.info.port=16010
hbase.regionserver.port=16020

Security Group will open ports (2181, 2888, 3888, 16000, 16010, 16020) after this change if configuration is not changed.

Alternatives

Data model impact

None

REST API impact

None

Other end user impact

None

Deployer impact

None

Developer impact

None

Sahara-image-elements impact

  • Build new Vanilla image includes ZK and HBase packages

Sahara-dashboard / Horizon impact

  • An option should be added to the Node Group create and update forms.

Implementation

Assignee(s)

Primary assignee:

Shu Yingya

Work Items

  • Build new image by sahara-image-elements

  • Add ZooKeeper to Vanilla in sahara

  • Add HBase to Vanilla in sahara

  • Update Sahara-dashboard to choose ZK creator in sahara-dashboard

Dependencies

None

Testing

  • Unit test coverage in sahara

Documentation Impact

  • Vanilla plugin description should be updated

References

None