Improve anti-affinity behavior for cluster creation
https://blueprints.launchpad.net/sahara/+spec/improving-anti-affinity
Enable sahara to distribute node creation in a more equitable manner with respect to compute hardware affinity.
Problem description
Sahara's current anti-affinity implementation allows at most as many nodes in an anti-affinity group as there are hypervisors (https://bugs.launchpad.net/sahara/+bug/1426398).
If the number of nodes in the anti-affinity group is greater than the number of hypervisors, sahara throws an error.
Proposed change
The user will be able to define a ratio, i.e. the number of nodes per hypervisor, when requesting anti-affinity for a process.
The ratio would be a field offered at cluster creation when the user selects anti-affinity.
Based on this ratio and the number of nodes, sahara will create more than one server group.
The number of server groups would be equal to the ratio, that is, the number of nodes per hypervisor.
In the heat templates, the server groups would be created while serializing the resources, provided anti-affinity is enabled for the cluster.
Instances would be allocated to those server groups while serializing each instance: the "group" property of "scheduler_hints" would be set to a different server group for each instance, in round-robin fashion.
For allocation of server groups, the following changes would be required:
Create a parameter named SERVER_GROUP_NAMES of type list in the OS::Heat::ResourceGroup resource.
Store the server group name for each instance of the node group in this parameter, so that the length of the list equals the number of instances in the node group.
The instance with index i then belongs to the server group whose name is stored at SERVER_GROUP_NAMES[i].
The scheduler hints then read this parameter.
So in the node group template, the scheduler hints will look like this:
    "scheduler_hints": {
        "group": {
            "get_param": ["SERVER_GROUP_NAMES", {"get_param": "instance_index"}]
        }
    }
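The list stored in SERVER_GROUP_NAMES could be built along the following lines. This is a minimal sketch, not sahara's implementation: the function name `build_server_group_names` and the `<cluster>-aa-group-<n>` naming scheme are assumptions made for illustration only.

```python
def build_server_group_names(cluster_name, instance_count, ratio):
    """Return one server group name per instance, assigned round-robin.

    All names here are hypothetical; the spec only requires that the
    list has one entry per instance in the node group.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    # One server group per unit of the ratio (hypothetical naming scheme).
    group_names = [f"{cluster_name}-aa-group-{g}" for g in range(ratio)]
    # Instance i is assigned to group (i mod ratio).
    return [group_names[i % ratio] for i in range(instance_count)]


names = build_server_group_names("demo", 5, 2)
print(names)
```

With a list built this way, the instance at index i resolves its scheduler hint to the i-th entry, which matches the lookup in the template fragment above.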
Example:
A = Number of hypervisors = 5
B = Total number of nodes in the a-a group = 10
C = Number of nodes per hypervisor = nodes:hypervisor = 2
Number of server groups = C = 2
Nodes would be distributed across the created server groups in round-robin fashion.
Which server group a particular node lands in does not matter, because all the nodes are anti-affine.
If the ratio given by the user in the above example is 1, the user will still get an error, which will be thrown by nova.
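The arithmetic of this example can be checked with a short sketch. It assumes that a server group with an anti-affinity policy can hold at most one instance per hypervisor, which is why a ratio of 1 cannot place 10 nodes on 5 hypervisors. The function names are illustrative and not part of sahara.

```python
from collections import Counter


def group_sizes(total_nodes, ratio):
    """Number of nodes in each server group after round-robin assignment."""
    counts = Counter(i % ratio for i in range(total_nodes))
    return [counts[g] for g in range(ratio)]


def fits(total_nodes, ratio, hypervisors):
    """True if no server group needs more hosts than there are hypervisors.

    Assumption: each anti-affinity server group allows at most one
    instance per hypervisor.
    """
    return max(group_sizes(total_nodes, ratio)) <= hypervisors


print(group_sizes(10, 2))   # [5, 5]
print(fits(10, 2, 5))       # True: each group of 5 fits on 5 hypervisors
print(fits(10, 1, 5))       # False: one group of 10 exceeds 5 hypervisors
```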
Old clusters will not be allowed to scale with a new ratio.
When a user requests to scale a cluster after the ratio has changed, or requests a new ratio on an existing cluster, an error would be thrown saying “This cluster was created with X ratio, but now the ratio is Y. You will need to recreate”.
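The scaling check might look roughly like this. It is a sketch under stated assumptions: `RatioChangedError` and `check_scaling_ratio` are hypothetical names, and treating a missing requested ratio as "keep the stored one" is an assumption the spec does not state.

```python
class RatioChangedError(Exception):
    """Hypothetical error raised when a scale request changes the ratio."""


def check_scaling_ratio(stored_ratio, requested_ratio=None):
    """Reject a scale request whose ratio differs from the cluster's ratio.

    Assumption: a request that carries no ratio keeps the stored one.
    """
    if requested_ratio is not None and requested_ratio != stored_ratio:
        raise RatioChangedError(
            f"This cluster was created with {stored_ratio} ratio, "
            f"but now the ratio is {requested_ratio}. "
            "You will need to recreate"
        )


check_scaling_ratio(2, 2)  # same ratio: accepted
try:
    check_scaling_ratio(2, 3)
except RatioChangedError as exc:
    print(exc)
```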
Alternatives
None
Data model impact
The ratio would be a field in the cluster object.
REST API impact
None
Other end user impact
If the user defines the ratio correctly, instances would be provisioned without error even when the anti-affinity group contains more nodes than there are hypervisors.
Deployer impact
None
Developer impact
None
Sahara-image-elements impact
None
Sahara-dashboard / Horizon impact
Yes. A field has to be added to the sahara dashboard for collecting the ratio. The field will be displayed only when anti-affinity is selected.
Implementation
Assignee(s)
- Primary assignee:
akanksha-aha
Work Items
Add a ratio field in sahara-dashboard
Add the same field in sahara wherever required (Data Access Layer)
Add a new API which creates more server groups when required
Write unit tests and run them
Write documentation
Dependencies
None
Testing
Unit tests will need to be written.
Documentation Impact
Documentation of the improved anti-affinity behavior needs to be added.
References
None