Sahara allows multiple types of data source and job binary. However, there’s no clean abstraction around them, and the code to deal with them is often very difficult to read and modify. This change proposes to create clean abstractions that each data source type and job binary type can implement differently depending on its own needs.
Currently, the data source and job binary code is spread over different folders and files in Sahara, making it hard to change and to extend. Right now, a developer who wants to create a new data source needs to look all over the code and modify things in a lot of places (and it’s almost impossible to know all of them without deep experience with the code). Once this change is complete, developers will be able to create code in a single directory and write their data source by implementing an abstract class. This will allow users to enable data sources that they write themselves (and hopefully contribute upstream) much more easily, and it will also allow operators to disable data sources that their own stack does not support.
This change proposes to implement the data source and job binary abstractions as plugins, so that each type can be loaded dynamically through a well-defined interface. The existing types of data sources and job binaries will be refactored accordingly.
The interfaces that will be implemented are described below:
Data Source Interface
Job Binary Interface
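As a minimal sketch of what such abstractions could look like, the Python below defines one abstract base class per plugin family and one concrete data source type. All class and method names here (`DataSourceType`, `JobBinaryType`, `validate`, `prepare_cluster`, `get_runtime_url`, `get_raw_data`, `ExampleHdfsDataSource`) are assumptions made for illustration, not the final interfaces.

```python
import abc


class DataSourceType(abc.ABC):
    """Hypothetical base class that every data source type would implement."""

    @abc.abstractmethod
    def validate(self, data):
        """Raise ValueError if the data source description is invalid."""

    def prepare_cluster(self, data_source, cluster):
        """Optional hook run before a job starts; the default does nothing."""

    def get_runtime_url(self, url, cluster):
        """Return the URL a job should use at run time (unchanged by default)."""
        return url


class JobBinaryType(abc.ABC):
    """Hypothetical base class that every job binary type would implement."""

    @abc.abstractmethod
    def validate(self, data):
        """Raise ValueError if the job binary description is invalid."""

    @abc.abstractmethod
    def get_raw_data(self, job_binary):
        """Return the binary content as bytes."""


class ExampleHdfsDataSource(DataSourceType):
    """Illustrative implementation: only the type-specific logic lives here."""

    def validate(self, data):
        url = data.get("url", "")
        if not url.startswith("hdfs://"):
            raise ValueError("URL must use the hdfs:// scheme")
```

With this shape, a new type only overrides the methods whose behavior differs from the defaults, and everything specific to that type stays in one class.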
These interfaces will be organized in the following folder structure:
Some details of the interfaces (parameters, method names, parameter names) may still change before the implementation is complete, but the main structure and idea should stay the same.
A plugin manager will also be needed to deal directly with the different types of data sources and job binaries and to provide methods that let operators enable or disable data sources and job binaries dynamically. This plugin manager is not detailed here because it will be similar to the existing plugin manager for cluster plugins.
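The manager's role can be illustrated with a small registry, sketched below under stated assumptions: `TypeManager` and its methods are hypothetical names, the plugins are stand-in objects, and the real manager would follow the existing cluster plugin manager rather than this code. One manager instance per family keeps the enabled sets for data sources and job binaries independent.

```python
class TypeManager:
    """Hypothetical registry for one plugin family (data sources or job binaries)."""

    def __init__(self, enabled_names):
        self._registered = {}
        self._enabled = set(enabled_names)

    def register(self, name, plugin):
        """Make a plugin known to the manager; it is usable only if enabled."""
        self._registered[name] = plugin

    def enable(self, name):
        if name not in self._registered:
            raise LookupError("unknown type: %s" % name)
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def get(self, name):
        """Return the plugin for an enabled type, or raise LookupError."""
        if name not in self._registered or name not in self._enabled:
            raise LookupError("type not available: %s" % name)
        return self._registered[name]

    def enabled_types(self):
        return sorted(n for n in self._registered if n in self._enabled)


# Two independent managers: manila is enabled for job binaries only.
data_sources = TypeManager(enabled_names=["swift", "hdfs"])
job_binaries = TypeManager(enabled_names=["swift", "manila"])
for type_name in ("swift", "hdfs", "manila"):
    data_sources.register(type_name, object())  # stand-in plugin objects
    job_binaries.register(type_name, object())
```

Calling `data_sources.get("manila")` raises `LookupError` here, while `job_binaries.get("manila")` succeeds, and `data_sources.enable("manila")` changes that at run time.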
A clear alternative is to leave things as they are, but Sahara would remain more difficult to extend and to understand.
Instead of the two interfaces defined in the Proposed Change section, another alternative would be a single abstraction for both data sources and job binaries, since the two interfaces have a lot in common. This would remove the edp/service/utilities folder and leave the code more unified and compact. However, job binary and data source code would then be treated as a single plugin, which could hinder the pluggability this change aims for (for example, a provider would not be able to disable manila for data sources while enabling it for job binaries). For this reason it was not considered the best approach. Job binaries and data sources are kept apart instead, at the cost of needing the utilities folder to avoid code duplication.
Some new methods to manage the supported types of data sources and job binaries will probably be needed (similar to the methods already offered by plugins).
After this change is implemented, developers will be able to add and enable new data sources and job binaries easily, just by implementing the abstraction.
Primary assignee: mariannelinharesm
Other contributors: egafford
This change will only require changes to existing unit tests.
A devref doc describing the new abstractions will need to be added.