Refactor the Datastore Manager¶
There is a large amount of boilerplate and copy-and-paste code in each datastore manager. As more managers are added, the time required to maintain all this code will continue to grow. To alleviate this, and the bugs it promotes, a base manager class needs to be created where all common code can reside.
Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/datastore-manager-refactor
Problem Description¶
When bugs are discovered and fixed in the manager code of one datastore, these changes are seldom transferred to all the other datastores. This causes a ‘drift’ in the stability of each experimental datastore, as currently there are no third-party CIs. In addition, when ‘new, improved’ ways to solve a problem are implemented, they are done differently across each manager, if at all. It is also difficult to transfer features (and coding knowledge) from one datastore to another, and the problem is exacerbated as code is copied and pasted when a new datastore manager is implemented.
A current example of this problem is the issue where instances flip from BUILD to ACTIVE and back again [1], or from BUILD->SHUTDOWN->ACTIVE [2]. Fixing this means changing how prepare works, and currently that change would have to be made in each datastore manager. Implementing it in a uniform way would be almost impossible with the current architecture of the manager code base.
Proposed Change¶
A new ‘Manager’ class will be created that will be the base class for all datastore managers. In order to keep the scope reasonable, only the barest minimum of functionality will be pulled back into the base class in the initial implementation - this will include methods that are currently ‘boiler-plate’ code (such as the responses to the rpc_ping and get_filesystem_stats calls, and the periodic task, update_status).
A mechanism for encapsulating functionality will also be needed in order for the base manager to be able to execute datastore-specific instructions. This will be accomplished through the use of properties that can be overridden by each descendant. Some required ones (such as the ‘status’ object) will be declared abstract and must be present; others (such as the new configuration manager, and potentially a future dictionary of strategies) will be optional.
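As an illustration, here is a minimal sketch (the class, attribute and datastore names are hypothetical) of how a descendant manager might satisfy these properties, assuming the base class lives in the new manager module described below:

    from trove.guestagent.datastore import manager


    class MyDatastoreApp(object):
        """Placeholder for the datastore-specific service object; a real
        implementation would expose a BaseDbStatus descendant and,
        optionally, a configuration manager.
        """
        status = None
        configuration_manager = None


    class MyDatastoreManager(manager.Manager):

        def __init__(self):
            super(MyDatastoreManager, self).__init__('mydatastore')
            self._app = MyDatastoreApp()

        @property
        def status(self):
            # Required (declared abstract in the base class).
            return self._app.status

        @property
        def configuration_manager(self):
            # Optional; the base class default simply returns None.
            return self._app.configuration_manager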
The new Manager class will reside in the datastore module:
trove/guestagent/datastore/manager.py
This is alongside the existing service.py file (which contains the existing BaseDbStatus class). The directory structure of the MySQL derived classes will also be tidied a bit and will end up looking like the following:
trove
+-- guestagent
    +-- datastore
        +-- __init__.py
        +-- manager.py                 <-- new 'base' manager
        +-- service.py
        +-- experimental
        |   +-- __init__.py
        |   +-- cassandra
        |   |   +-- __init__.py
        |   |   +-- manager.py
        |   |   +-- service.py
        |   |   +-- system.py
        |   <other experimental datastore modules>
        |   +-- redis
        |   |   +-- __init__.py
        |   |   +-- manager.py
        |   |   +-- service.py
        |   |   +-- system.py
        |   +-- vertica
        |       +-- __init__.py
        |       +-- manager.py
        |       +-- service.py
        |       +-- system.py
        +-- mysql_common               <-- new module
        |   +-- __init__.py
        |   +-- manager.py             <-- renamed from mysql/manager_base.py
        |   +-- service.py             <-- renamed from mysql/service_base.py
        +-- mysql
        |   +-- __init__.py
        |   +-- manager.py
        |   +-- service.py
        +-- technical-preview
            +-- __init__.py
Within the context of this refactor (as a proof-of-concept), the issue with having instances flip in and out of the BUILD state will be addressed properly. The prepare method will be moved into the base class, which will implement the code required to ensure that no notifications are sent until it is guaranteed that prepare has finished successfully. The existing prepare methods will be renamed ‘do_prepare’ and will be called from within the base prepare method.
Whether prepare has finished will be determined by having the manager write a marker file at the start and at the successful conclusion of the prepare operation. Any errors (exceptions) will be trapped and logged, and the instance will be set to the FAILED state.
A sample of what this will look like in the base manager is as follows, although additional properties may be defined as they are deemed necessary (and others can be added in future cleanup work):
# Imports are shown for context only; the exact module paths will follow
# the existing guestagent code base.
import abc

from oslo_log import log as logging
from oslo_service import periodic_task

from trove.common import cfg
from trove.common.i18n import _
from trove.guestagent import dbaas

CONF = cfg.CONF
LOG = logging.getLogger(__name__)


class Manager(periodic_task.PeriodicTasks):
    """This is the base class for all datastore managers. Over time, common
    functionality should be pulled back here from the existing managers.
    """

    def __init__(self, manager_name):
        super(Manager, self).__init__(CONF)

        # Manager properties
        self.__manager_name = manager_name
        self.__manager = None
        self.__prepare_error = False

    @property
    def manager_name(self):
        """This returns the passed-in name of the manager."""
        return self.__manager_name

    @property
    def manager(self):
        """This returns the name of the manager."""
        if not self.__manager:
            self.__manager = CONF.datastore_manager or self.__manager_name
        return self.__manager

    @property
    def prepare_error(self):
        return self.__prepare_error

    @prepare_error.setter
    def prepare_error(self, prepare_error):
        self.__prepare_error = prepare_error

    @property
    def configuration_manager(self):
        """If the datastore supports the new-style configuration manager,
        it should override this to return it.
        """
        return None

    @abc.abstractproperty
    def status(self):
        """This should return an instance of a status class that has been
        inherited from datastore.service.BaseDbStatus. Each datastore
        must implement this property.
        """
        return None

    ################
    # Status related
    ################
    @periodic_task.periodic_task
    def update_status(self, context):
        """Update the status of the trove instance. It is decorated with
        periodic_task so it is called automatically.
        """
        LOG.debug("Update status called.")
        self.status.update()

    def rpc_ping(self, context):
        LOG.debug("Responding to RPC ping.")
        return True

    #################
    # Prepare related
    #################
    def prepare(self, context, packages, databases, memory_mb, users,
                device_path=None, mount_point=None, backup_info=None,
                config_contents=None, root_password=None, overrides=None,
                cluster_config=None, snapshot=None):
        """Set up datastore on a Guest Instance."""
        LOG.info(_("Starting datastore prepare for '%s'.") % self.manager)
        self.status.begin_install()
        post_processing = True if cluster_config else False
        try:
            self.do_prepare(context, packages, databases, memory_mb,
                            users, device_path, mount_point, backup_info,
                            config_contents, root_password, overrides,
                            cluster_config, snapshot)
        except Exception as ex:
            self.prepare_error = True
            LOG.exception(_("An error occurred preparing datastore: %s") %
                          ex.message)
            raise
        finally:
            LOG.info(_("Ending datastore prepare for '%s'.") % self.manager)
            self.status.end_install(error_occurred=self.prepare_error,
                                    post_processing=post_processing)

        # At this point critical 'prepare' work is done and the instance
        # is now in the correct 'ACTIVE', 'INSTANCE_READY' or 'ERROR' state.
        # Of course if an error has occurred, none of the code that follows
        # will run.
        LOG.info(_('Completed setup of datastore successfully.'))

        # We only create databases and users automatically for non-cluster
        # instances.
        if not cluster_config:
            try:
                if databases:
                    LOG.debug('Calling add databases.')
                    self.create_database(context, databases)
                if users:
                    LOG.debug('Calling add users.')
                    self.create_user(context, users)
            except Exception as ex:
                LOG.exception(_("An error occurred creating databases/users: "
                                "%s") % ex.message)
                raise

        try:
            LOG.debug('Calling post_prepare.')
            self.post_prepare(context, packages, databases, memory_mb,
                              users, device_path, mount_point, backup_info,
                              config_contents, root_password, overrides,
                              cluster_config, snapshot)
        except Exception as ex:
            LOG.exception(_("An error occurred in post prepare: %s") %
                          ex.message)
            raise

    @abc.abstractmethod
    def do_prepare(self, context, packages, databases, memory_mb, users,
                   device_path, mount_point, backup_info, config_contents,
                   root_password, overrides, cluster_config, snapshot):
        """This is called from prepare when the Trove instance first comes
        online. 'Prepare' is the first rpc message passed from the
        task manager. do_prepare handles all the base configuration of
        the instance and is where the actual work is done. Once this method
        completes, the datastore is considered either 'ready' for use (or
        for final connections to other datastores) or in an 'error' state,
        and the status is changed accordingly. Each datastore must
        implement this method.
        """
        pass

    def post_prepare(self, context, packages, databases, memory_mb, users,
                     device_path, mount_point, backup_info, config_contents,
                     root_password, overrides, cluster_config, snapshot):
        """This is called after prepare has completed successfully.
        Processing done here should be limited to things that will not
        affect the actual 'running' status of the datastore (for example,
        creating databases and users, although these are now handled
        automatically). Any exceptions are caught, logged and rethrown,
        however no status changes are made and the end-user will not be
        informed of the error.
        """
        LOG.debug('No post_prepare work has been defined.')
        pass

    #####################
    # File System related
    #####################
    def get_filesystem_stats(self, context, fs_path):
        """Gets the filesystem stats for the path given."""
        # TODO(peterstac) - note that fs_path is not used in this method.
        mount_point = CONF.get(self.manager).mount_point
        LOG.debug("Getting file system stats for '%s'" % mount_point)
        return dbaas.get_filesystem_volume_stats(mount_point)

    #################
    # Cluster related
    #################
    def cluster_complete(self, context):
        LOG.debug("Cluster creation complete, starting status checks.")
        self.status.end_install()
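Continuing the hypothetical sketch from earlier (the '_app' object and the methods called on it are placeholders, not a real datastore API), a descendant's do_prepare now contains only the datastore-specific setup; the status transitions, marker files and error handling all live in the base prepare above:

    class MyDatastoreManager(Manager):

        def do_prepare(self, context, packages, databases, memory_mb, users,
                       device_path, mount_point, backup_info, config_contents,
                       root_password, overrides, cluster_config, snapshot):
            # Datastore-specific work only; begin_install()/end_install() and
            # the trap-log-FAILED handling are performed by the base prepare().
            self._app.install_if_needed(packages)
            self._app.configure(config_contents)
            self._app.start_db()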
The base service class will be enhanced to contain the necessary methods to set a flag denoting whether prepare has finished or not. This will look something like the following (only new code is shown):
class BaseDbStatus(object):

    GUESTAGENT_DIR = '~'
    PREPARE_START_FILENAME = '.guestagent.prepare.start'
    PREPARE_END_FILENAME = '.guestagent.prepare.end'

    def __init__(self):
        self._prepare_completed = None

    @property
    def prepare_completed(self):
        if self._prepare_completed is None:
            # Force the file check
            self.prepare_completed = None
        return self._prepare_completed

    @prepare_completed.setter
    def prepare_completed(self, value):
        # Set the value based on the existence of the file; 'value' is
        # ignored.
        # This is required as the value of prepare_completed is cached,
        # so this must be referenced any time the existence of the
        # file changes.
        self._prepare_completed = os.path.isfile(
            guestagent_utils.build_file_path(
                self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME))

    def begin_install(self):
        """Called right before DB is prepared."""
        prepare_start_file = guestagent_utils.build_file_path(
            self.GUESTAGENT_DIR, self.PREPARE_START_FILENAME)
        operating_system.write_file(prepare_start_file, '')
        self.prepare_completed = False

        self.set_status(instance.ServiceStatuses.BUILDING, True)

    def end_install(self, error_occurred=False, post_processing=False):
        """Called after prepare completes."""
        # Set the "we're done" flag if there's no error and
        # no post_processing is necessary.
        if not (error_occurred or post_processing):
            prepare_end_file = guestagent_utils.build_file_path(
                self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME)
            operating_system.write_file(prepare_end_file, '')
            self.prepare_completed = True

        final_status = None
        if error_occurred:
            final_status = instance.ServiceStatuses.FAILED
        elif post_processing:
            final_status = instance.ServiceStatuses.INSTANCE_READY

        if final_status:
            LOG.info(_("Set final status to %s.") % final_status)
            self.set_status(final_status, force=True)
        else:
            self._end_install_or_restart(True)

    def end_restart(self):
        self.restart_mode = False
        LOG.info(_("Ending restart."))
        self._end_install_or_restart(False)

    def _end_install_or_restart(self, force):
        """Called after DB is installed or restarted.
        Updates the database with the actual DB server status.
        """
        real_status = self._get_actual_db_status()
        LOG.info(_("Current database status is '%s'.") % real_status)
        self.set_status(real_status, force=force)

    @property
    def is_installed(self):
        """This is for compatibility - it may be removed during further
        cleanup.
        """
        return self.prepare_completed

    def set_status(self, status, force=False):
        """Use conductor to update the DB app status."""
        if force or self.is_installed:
            LOG.debug("Casting set_status message to conductor "
                      "(status is '%s')." % status.description)
            context = trove_context.TroveContext()

            heartbeat = {'service_status': status.description}
            conductor_api.API(context).heartbeat(
                CONF.guest_id, heartbeat, sent=timeutils.float_utcnow())
            LOG.debug("Successfully cast set_status.")
            self.status = status
        else:
            LOG.debug("Prepare has not completed yet, skipping heartbeat.")
Configuration¶
No configuration changes are anticipated.
Database¶
None
Public API¶
None
Public API Security¶
None
Python API¶
None
CLI (python-troveclient)¶
None
Internal API¶
The ServiceStatuses.BUILD_PENDING state will be renamed to ServiceStatuses.INSTANCE_READY to better reflect the instance’s actual state. The value displayed will remain ‘BUILD’, so there will be no outward difference and backwards compatibility will be maintained.
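Purely as an illustration of the compatibility claim (the stand-in class and the numeric code below are placeholders, not the real trove.common.instance code), the rename changes only the symbolic name while the API status string stays ‘BUILD’:

    import collections

    # Simplified stand-in for trove.common.instance.ServiceStatus, used only
    # to illustrate the rename; the real class and its code values differ.
    ServiceStatus = collections.namedtuple(
        'ServiceStatus', ['code', 'description', 'api_status'])

    # Before (illustrative values):
    BUILD_PENDING = ServiceStatus(0x19, 'build pending', 'BUILD')

    # After: a new symbolic name, but the same 'BUILD' API status string,
    # so nothing changes from the end user's point of view.
    INSTANCE_READY = ServiceStatus(0x19, 'instance ready', 'BUILD')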
Guest Agent¶
This change should not affect any behavior on the Guest Agent. The current tests should be adequate to ensure that the change is fully compatible with the rest of the code base.
Alternatives¶
One suggestion as to how to fix the prepare issue was to use Nova metadata, which is available on the guest instance. If it is decided that this would be useful, it could be added in addition to the proposed method as a means of notification only.
Dashboard Impact (UX)¶
TBD (section added after approval)
Implementation¶
Assignee(s)¶
- Primary assignee:
<peterstac>
Milestones¶
- Target Milestone for completion:
Mitaka-1
Work Items¶
All changes will be done in the context of a single task.
Upgrade Implications¶
None are anticipated.
Dependencies¶
None
Testing¶
The unit tests will be modified as necessary; however, minimal changes will be made in this regard as well. In order to keep the changes as small as possible, refactoring the tests will also be done in stages, with only the bare minimum done to start and the remainder left to a future date. The future work would include testing the base class thoroughly and then removing the corresponding tests from the derived classes.
The int-tests should continue to run as always, and will be used to determine that no fundamental changes have been made to the implementation, with the exception of the bug fixes (which should just lead to greater stability in the test infrastructure).
Documentation Impact¶
Since all the changes are implementation related, no documentation changes are foreseen.
References¶
Appendix¶
None