Z3 as an alternative Datalog engine

https://blueprints.launchpad.net/congress/+spec/alternative-engine-z3

Z3 is an open source Datalog engine developed by Microsoft. We propose an extension of Congress that lets the user define theories (ie. set of rules producing new tables from existing ones that can be later used by other theories) that will be treated by the Z3 engine instead of the existing one.

Problem description

A cloud operator may want to check if connectivity exists between two machines or if they are effectively isolated. This means understanding how traffic is routed (ie. what are the path the traffic may take) and how security groups and firewall rules are applied along those path.

Here are the main characteristics needed by the Datalog engine for solving such problems:

  • Recursion: routing packets means building paths (link paths, networks) and paths are naturally described recursively.

  • Efficient representation of IP addresses and IP address ranges as commonly found in switching devices and firewalls.

  • Efficient handling of large volumes of data.

Proposed change

We propose to integrate Z3, Microsoft open-source automatic prover, as an alternate Datalog engine for Congress.

Z3 theories will be expressed in the same syntax as standard theories. But the Datalog language supported by these theories will be tailored to what Z3 supports in Datalog mode:

  • different and reduced set of built-ins as explained in the Policy section,

  • no modals.

Architecture

Overview

A new kind of theory will be introduced alongside the classical NonRecursiveTheory. Z3 theories will consume data from other theories and use the Z3 engine to apply rules.

There will be one single Z3 solver instance per analyzer that will be abstracted in Python as a Z3 context object. Z3 context will host all tables of Z3 theories in Z3 format.

Z3 theories should be callable from other theories in particular from regular non-recursive theories.

Z3 theories can use data from other theories. The external tables used by Z3 theories will be completely imported in the Z3 context.

Compilation of Z3 theories will be delayed until queries are done on those theories. We will regularly update the representation of external tables in the Z3 engine to keep them synchronized. Z3 does not support undoing relations or rules, so a new context must be created each time a change occurs.

Type-checking

Z3 requires typed data. Type-checking will be completely implicit and we will not add type-constraints in the language.

The type-checker will preserve the types from external tables. It will raise an error if a variable should have two different types according to two different tables.

The type-checker will try to give the most precise type to constants that appear in Datalog programs to make them compatible with external tables. For example the value Ingress is usually typed as a string (Str type) but in the context of a Neutron security group, it will be given the type Direction

To be able to have two representations for a value, we will introduce explicit converter built-ins: builtin:cast(x, 'Direction', y, 'Str') will convert a variable x of type Direction in a variable y of the standard string type. It will be implemented as a table where each row is a pair of the representations in the two different types of a same object. Those tables will be filled when constants in programs are compiled and external tables are imported.

The same trick will be used to have a bit field representation of Ip addresses.

Finally the type-checker should support some kind of generic polymorphism for built-ins. For example equalities and comparison should work as long as the compared objects are of the same type, addition of two items belonging to a given subtype of integers should give back an item in the same subtype. We do not plan to support user-defined polymorphic tables. For example the following rule is ill defined:

add3(x,y,z,t) :- builtin:plus(x,y,u), builtin:plus(u,z,t)

It may be desirable to have type-constraints in the language to constrain such variables.

Internal representations

Internally Z3 will use bit-vectors. Integers will be translated in 32 bits bit-vectors. Strings will be bijectively associated with an integer and represented by its associated bit-vector.

When possible we will use finer types (enumerations) that require smaller bit-vector.

Alternatives

We could try to rephrase the theories to fit the current engine. For example we could unroll recursive calls up to a reasonable depth (packets have a TTL). Unfortunately this solution does not scale well. Unrolling recursive calls change in fact the overall complexity of the program. Experiments with octant have shown that some problems that require several minutes of computation with the current engine can be solved in tenth of seconds with octant.

We could try to incorporate Z3 algorithmic in Congress engine: this would be time consuming and hazardous. Z3 uses very efficient C++ data structures to represent tables and more that ten years have been devoted to its development.

We could use another external engine: there may be cases where another Datalog engine could be a better candidate. The new Congress architecture with explicit data-type has been designed to let several engines share a same pool of data. The strong points of Z3 are its overall efficiency and its data-structures specially crafted for network related problems.

There may be cases where we may wish to have several Z3 contexts with different settings. Contexts define the variant of the Z3 engine in use and different theories may benefit from different settings.

Policy

Here are the main characteristics of Z3 policies:

  • Adds support for recursive rules.

  • Support of built-in predicates for equalities / comparison

  • Support for basic bit arithmetic (to be used with IP address).

  • New built-ins for converting opaque IpAddress to bit fields and IpNetwork to a pair of bit-fields (mask and address).

  • Built-ins to convert a value that exists in different types from one representation to the other.

The last two kinds of built-ins will in fact be implemented by tables computed when data-sources and programs are synchronized.

Here is an example of policy that verifies if two networks are connected by a chain of routers and if VMs are attached to two networks that belong to two sets of networks that should be kept isolated:

connect1(X) :- neutronv2:networks(id=X, name="N1")
connect2(X) :- neutronv2:networks(id=X, name="N2")

linked(X,Y) :-
    neutronv2:ports(network_id=X, device_id=Z), neutronv2:routers(id=Z),
    neutronv2:ports(network_id=Y, device_id=Z)

path(X,Y) :- linked(X,Y)
path(X,Y) :- path(X,Z), linked(Z,Y)

interco_error(X,Y) :- connect1(X), connect2(Y), path(X,Y)

network1(X) :- connect1(Y), path(Y, X)
network2(X) :- connect2(Y), path(Y, X)

double_attach(X) :-
    nova:server(id=X),
    neutronv2:ports(network_id=X, device_id=Y), network1(Y),
    neutronv2:ports(network_id=X, device_id=Z), network2(Z)

Policy actions

None. Z3 theories will not support actions. Support may be added later.

Data sources

Independent from the data-source used.

Data model impact

None

REST API impact

We may want to move to more atomic definitions of theories. A theory is a complete program and should be considered as such. Adding and removing rules on the fly makes compilation and type-checking unnecessarily complex.

Security impact

None

Notifications impact

None

Other end user impact

For python-congressclient, when a new theory is created, there is a new kind alternative: z3.

Performance impact

Well behaved Z3 theories will be much faster than equivalent non-recursive theories. But programs that require too much computing power may be harder to abort on the Z3 engine.

Other deployer impact

Z3 is not a pypi package. There is an unofficial z3solver package but it has some limitations (not an up-to-date version and a version emitting a lot of spurious messages). For some Linux distributions, there is no pre-compiled version of Z3 available. Z3 must be compiled from source on those systems.

Due to all these constraints, we have chosen NOT to add z3 as a requirement. z3theory will be available IF z3 has been deployed on the server hosting congress.

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

pcregut

Other contributors:

Work items

For Rocky we do not plan to add built-ins. The basic Datalog language is sufficient for interesting examples.

During Stein iteration, built-ins and the treatment of polymorphism in the type-checker will be added.

Dependencies

This feature depends on Microsoft theorem prover Z3.

Testing

Unit tests will be done with a mock of z3 library.

For integration tests, new settings will be added to devstack so that Z3 can be either installed from source (rather slow) or directly deployed from a pre-compiled release (only available for Ubuntu 14.04, 16.04 and Debian 8.10). As the python package is deployed globally, tempest may skip Z3 tests if it cannot import the python package.

Documentation impact

Documentation should at least mention:

  • How to enable Z3,

  • Z3 theories and the kind of builtins supported,

  • Explain the benefit of recusive theories.