Provide ability to configure most important configs automatically¶
https://blueprints.launchpad.net/sahara/+spec/recommend-configuration
Now users manually should configure most important hadoop configurations. It would be friendly to provide advices about cluster configurations for users.
Problem description¶
Now users manually should configure most important hadoop configs, but it’s required to have advanced knowledge in Hadoop. Most configs are complicated and not all users know them. We can provide advices about cluster configuration and automatically configure few basic configs, that will improve user experience. Created workaround can extended in future with new confiuguration and advices.
Proposed change¶
It’s proposed to add calculator, which would automatically configure most important configurations in dependency cluster specification: available disk space, ram, cpu, and so on. Such calculator already implemented in Ambari (see [1] and [2]), and we can use it as well. We should have ability to switch off autoconfiguration and if user also manually configured some hadoop config, autoconfiguration also will not be applied.
The following list of configs will be configured, using formulas from [1] and [2]:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.app.mapreduce.am.resource.mb
yarn.app.mapreduce.am.command-opts
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.java.opts
mapreduce.task.io.sort.mb
Also as a simple example we can autoconfigure before cluster validation
dfs.replication
if amout of datanodes
less than default value.
Also it’s required to add new plugin SPI method recommend_configs
which
will autoconfigure cluster configs.
Alternatives¶
None
Data model impact¶
It’s required to add new column use_autoconfig
to cluster,
cluster_template, node_group, node_group_template, templates_relations
objects in DB. By default use_autoconfig
will be True
. If
use_autoconfig
is False
, then we will not use autoconfiguration
during cluster creation. If none of the configs from the list above are
configured manually and use_autoconfig
is True
, then we will
autoconfigure configs from list above. Same behaviour will be used for
node_groups configs autoconfiguration.
REST API impact¶
Need to support of switch off autoconfiguration.
Other end user impact¶
Need to support of switch off autoconfiguration via python-saharaclient.
Deployer impact¶
None
Developer impact¶
None
Sahara-image-elements impact¶
None
Sahara-dashboard / Horizon impact¶
Need to add new checkbox which will allow to swith off autoconfiguration from
Horizon during cluster creation/scaling. If plugin doesn’t support autoconfig
this checkbox will not be displayed. We can use _info
field at [3] for
field.
Implementation¶
Assignee(s)¶
- Primary assignee:
vgridnev
- Other contributors:
sreshetniak
Work Items¶
Proposed change will consists with following steps:
Implement new plugin SPI method which will provide configuration advices;
Add support of this method in following plugins: CDH, Vanilla 2.6.0, Spark (
dfs.replication
only);Provide ability to switch on autoconfiguration via UI;
Provide ability to switch on autoconfiguration via saharaclient;
Update WADL docs about new feilds objects.
Dependencies¶
Depends on Openstack requirements
Testing¶
Unit tests will be implemented for this feature. Sahara CI also can start use autoconfiguration as well.
Documentation Impact¶
Need to document feature and all rules, which will be used for autoconfiguration.
References¶
[1] https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py [2] https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.1/services/stack_advisor.py [3] https://github.com/openstack/sahara/blob/master/sahara/service/api.py#L188