Vitrage-graph fast failover¶

https://blueprints.launchpad.net/vitrage/+spec/vitrage-fast-failover

vitrage-graph high availability should meet these requirements:

Single active instance of vitrage-graph (managed by pacemaker).
Initialize quickly upon failover without requesting updates.
In case of a long downtime, vitrage-graph startup will request collector updates

Problem description¶

Vitrage-graph is active standby. Currently on a failover, vitrage-graph needs to pull all the data again from the collector data-sources. This takes a considerable amount of time, in which data is inconsistent. As we wish to continue working with an in-memory graph (due to performance), vitrage-graph service will remain active-standby. Therefore, downtime must be minimized in failover events.

Proposed change¶

after every get_all, vitrage-graph stores a full entity graph snapshot in the db, so the majority of events do not need to be replayed.

Vitrage-graph sends each processed event to vitrage-persistor so these are stored in the order of handling.

Upon init vitrage-graph queries the db table graph_snapshots, fetching the latest entry, it will be used if it is not older than snapshot_interval.

Init with a snapshot - on failover

Unpickle stored snapshot to get the graph.
Run the processor on all the events (from events table) that occurred after the snapshot.
Enable the evaluators.
Process all the events that are waiting in the message bus.

Init without a snapshot - a fresh start (This is the current behaviour).

Start with a new empty graph.
RPC to Collector to run get_all for all drivers, then process the events.
Process all the events that are waiting in the message bus.
Enable evaluator and iterate all graph.

Alternatives¶

Using a persistent graph database can improve vitrage-graph high availability as fail-over will be quick due to running active-active. This may be a preferred solution in terms of high availability, but overall, when comparing performance compared to in-memory networkx, the degradation is not reasonable

Data model impact¶

May require minor changes, TBD.

REST API impact¶

None

Versioning impact¶

None

Other end user impact¶

None

Deployer impact¶

This will be enabled by default. Deployer may disable in by adding the following to vitrage.conf [persistancy] enable_persistancy=false

Developer impact¶

None

Horizon impact¶

None

Implementation¶

Assignee(s)¶

Primary assignee:: idan-hefetz
Other contributors:: None

Work Items¶

None

Dependencies¶

None

Testing¶

Additional tempest will be added for fail-over, as persistence is already covered by existing tempest. Unit tests will not be affective here as changes are mostly in the init process and scheduler. This feature mostly reuses existing (tested) functionality.

Documentation Impact¶

None

References¶

None

Vitrage-graph fast failover

Vitrage-graph fast failover¶

Problem description¶

Proposed change¶

Alternatives¶

Data model impact¶

REST API impact¶

Versioning impact¶

Other end user impact¶

Deployer impact¶

Developer impact¶

Horizon impact¶

Implementation¶

Assignee(s)¶

Work Items¶

Dependencies¶

Testing¶

Documentation Impact¶

References¶

Vitrage Specs 0.0.1.dev173

Page Contents