There is currently support for enabling SSL on the public endpoints of the OpenStack services. However, certain use cases require SSL everywhere. This spec proposes an approach to enable it.
Even though both the overcloud and the undercloud can already be deployed with TLS/SSL on their public endpoints, some deployments require encrypted communication on all interfaces.
The current approach for deploying SSL in TripleO is to inject the needed keys and certificates through Heat environment files, which requires creating them beforehand. While this works for the public-facing services, the number of keys and certificates grows as we attempt to secure the communication between different services and at different levels of the infrastructure. Having the deployer generate and manage all of these certificates would be quite cumbersome.
Moreover, TripleO is not meant to handle the PKI of the cloud. Since we will at some point need to let the deployer renew, revoke and keep track of the certificates and keys deployed in the cloud, we need a system with those capabilities.
Instead of brewing an OpenStack-specific solution ourselves, I propose using existing systems that will make this a lot easier.
The proposal is to start using certmonger on the overcloud nodes to interact with a CA that manages the certificates in use. With this tool, we can request the certificates needed for interfaces such as the internal OpenStack endpoints, the database cluster and the cloud's message broker. Certmonger will then track those certificates automatically and, for certificates that identify the node, it could even request a renewal automatically when needed.
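To illustrate the tracking idea, here is a minimal Python sketch of the decision a tracker makes for each certificate: renew once the expiry date falls inside a renewal window. The 28-day window and the `TrackedCertificate` type are hypothetical; certmonger's own thresholds are configurable and this is not its implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Hypothetical renewal window, chosen only for illustration; it is not
# certmonger's default.
RENEWAL_WINDOW = timedelta(days=28)


@dataclass
class TrackedCertificate:
    cert_path: str       # where the certificate lives on the node (example value)
    not_after: datetime  # expiry time (notAfter) of the current certificate


def needs_renewal(not_after: datetime, now: Optional[datetime] = None,
                  window: timedelta = RENEWAL_WINDOW) -> bool:
    """True once the certificate expires within `window` of `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= window


def due_for_renewal(tracked: Iterable[TrackedCertificate],
                    now: Optional[datetime] = None,
                    window: timedelta = RENEWAL_WINDOW) -> List[str]:
    """Paths of the tracked certificates that should be renewed now."""
    return [c.cert_path for c in tracked
            if needs_renewal(c.not_after, now, window)]
```

The point of tracking is that this check runs continuously on the node, so nobody has to remember which certificate expires when.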
Certmonger is already available in several distributions (both Red Hat and Debian based) and can interact with several CAs, so if the operator already has a working one, they could use it. Furthermore, certmonger provides a mechanism for registering new CAs and for executing customizable scripts to communicate with them; these scripts can be written in any language. For the open source community, however, a solution such as FreeIPA or Dogtag could act as the CA and handle the certificates and keys for us. Note that it is also possible to write a certmonger plugin to communicate with Barbican or another CA, if that is the route we would like to take.
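As an illustration of the helper-script mechanism, the following is a minimal Python sketch of an external CA helper. It assumes certmonger's helper interface as we understand it: the requested operation arrives in the `CERTMONGER_OPERATION` environment variable, the PEM CSR in `CERTMONGER_CSR`, and the result is reported through the exit status, with the issued certificate on standard output. The exit-status values should be checked against certmonger's documentation, and `sign` is a hypothetical placeholder for whatever talks to the actual CA.

```python
import os
import sys
from typing import Callable, Mapping, Tuple

# Exit statuses as we understand certmonger's external-helper interface;
# verify against certmonger's helper documentation before relying on them.
STATUS_ISSUED = 0
STATUS_REJECTED = 2
STATUS_UNREACHABLE = 3
STATUS_OPERATION_NOT_SUPPORTED = 6

# Hypothetical: sends a PEM CSR to the CA (FreeIPA, Dogtag, Barbican, ...)
# and returns the issued PEM certificate.
Signer = Callable[[str], str]


def handle(env: Mapping[str, str], sign: Signer) -> Tuple[int, str]:
    """Return (exit status, stdout text) for one helper invocation."""
    operation = env.get("CERTMONGER_OPERATION", "SUBMIT")
    if operation != "SUBMIT":
        # This sketch only implements submission; other operations are declined.
        return STATUS_OPERATION_NOT_SUPPORTED, ""
    csr = env.get("CERTMONGER_CSR")
    if not csr:
        return STATUS_REJECTED, "no CSR provided\n"
    try:
        cert = sign(csr)
    except ConnectionError as exc:
        # Transient failure: report the CA as unreachable so it can be retried.
        return STATUS_UNREACHABLE, f"CA unreachable: {exc}\n"
    return STATUS_ISSUED, cert


def run_helper(sign: Signer) -> int:
    """Process the current environment, print the result, return the status."""
    status, output = handle(os.environ, sign)
    sys.stdout.write(output)
    return status
```

A real helper script would end with `sys.exit(run_helper(sign))`, passing a `sign` function that talks to the chosen CA, and would then be registered with certmonger (for example with `getcert add-ca`). Keeping `handle` free of I/O makes the CA-specific part easy to swap and to test.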
In the FreeIPA case, this will require a full FreeIPA system running either on another node in the cluster or in a container on the undercloud.
When the services are terminated by HAProxy and the overcloud is deployed in HA mode, the controller nodes will need to share the certificate that HAProxy presents to clients. In this case, the workflow will be as follows:
While creating each node beforehand could sound cumbersome, this can be automated to improve usability. The proposed approach is to have a nova micro-service that automatically registers the overcloud nodes as they are created. This hook will not only register the node in the system, but will also inject an OTP (one-time password) that the node can use to fetch the required credentials and obtain its certificate and key. The OTP is only used for enrollment; once enrollment has taken place, certmonger can be used to fetch certificates from FreeIPA.
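The single-use nature of the enrollment OTP can be sketched with a small in-memory registry. Everything here is hypothetical: `EnrollmentRegistry` stands in for the CA/identity system, `register_host` for the nova hook, and `enroll` for the node's first contact; FreeIPA's real enrollment flow is not modelled.

```python
import secrets
from typing import Dict, Set


class EnrollmentRegistry:
    """Hypothetical stand-in for the CA/identity side of host enrollment."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # hostname -> outstanding OTP
        self.enrolled: Set[str] = set()

    def register_host(self, hostname: str) -> str:
        """Called by the (hypothetical) nova hook when a node is created.

        Returns the OTP that the hook injects into the new node.
        """
        otp = secrets.token_urlsafe(16)
        self._pending[hostname] = otp
        return otp

    def enroll(self, hostname: str, otp: str) -> bool:
        """The node's first contact: succeeds at most once per OTP."""
        expected = self._pending.get(hostname)
        if expected is None or not secrets.compare_digest(expected, otp):
            return False
        # Consume the OTP; from now on the node relies on its own credentials
        # (e.g. its keytab) and on certmonger for certificates.
        del self._pending[hostname]
        self.enrolled.add(hostname)
        return True
```

Because the OTP is deleted on first use, a leaked copy is worthless once the node has enrolled.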
Even without this micro-service, however, we could pass the OTP via the TripleO Heat Templates as part of the overcloud deployment. This way the controllers can fetch their keytab and subsequently request their certificates even without auto-enrollment.
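On the node side, enrollment with FreeIPA is done with `ipa-client-install`, whose `--password` option accepts the one-time password. The sketch below only builds the command line; the server, domain, realm and hostname values are placeholders, and the exact flags should be checked against the `ipa-client-install` man page for the FreeIPA version in use.

```python
from typing import List


def ipa_enroll_command(otp: str, server: str, domain: str, realm: str,
                       hostname: str) -> List[str]:
    """Build (without running) the FreeIPA client enrollment command for a node."""
    return [
        "ipa-client-install",
        "--server", server,
        "--domain", domain,
        "--realm", realm,
        "--hostname", hostname,
        "--password", otp,  # the one-time password passed via the templates
        "--unattended",     # no interactive prompts during deployment
    ]
```

On the node, the list could be executed with `subprocess.run(cmd, check=True)`. The OTP is visible in the process list while the command runs, one more reason for it to be valid for a single enrollment only.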
Barbican could also be used instead of FreeIPA's Vault, with the upside of being an already accepted OpenStack service. However, Barbican also needs a backend, which might be Dogtag in our case, since having an HSM for the CI will probably not be an option.
For services such as the message broker, where each host requires its own certificate, the process is much simpler: the nodes will already be registered in FreeIPA and able to fetch their credentials, so we can let certmonger request, and subsequently track, the appropriate certificates.
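Once a node is enrolled, requesting and tracking its per-host broker certificate comes down to one `getcert request` call. This Python sketch only builds that command; the `rabbitmq/<fqdn>` service principal, the file paths and the `IPA` CA nickname are assumptions for illustration.

```python
from typing import List


def broker_cert_request(fqdn: str, realm: str,
                        cert_path: str = "/etc/pki/tls/certs/rabbitmq.crt",
                        key_path: str = "/etc/pki/tls/private/rabbitmq.key",
                        ca: str = "IPA") -> List[str]:
    """Build a `getcert request` command for this host's broker certificate.

    The principal, paths and CA nickname are illustrative assumptions.
    """
    principal = f"rabbitmq/{fqdn}@{realm}"  # hypothetical service principal
    return [
        "getcert", "request",
        "-c", ca,            # CA that certmonger should submit to
        "-f", cert_path,     # where the certificate is stored (and tracked)
        "-k", key_path,      # where the private key is generated and kept
        "-K", principal,     # Kerberos principal the certificate is for
        "-N", f"CN={fqdn}",  # requested subject name
        "-D", fqdn,          # requested DNS subjectAltName
    ]
```

After the request is made, `getcert list` shows it along with its tracking status.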
Once the certificates and keys are present on the nodes, the subsequent steps of the overcloud deployment can take place, configuring the services to use those certificates and enabling TLS wherever the deployer specifies.
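For HAProxy in particular, the `crt` setting traditionally expects a single PEM file containing both the certificate and its private key, whereas certmonger may store them in separate files. One possible approach, sketched here under that assumption, is a post-save command (registered with `getcert request -C ...`) that rebuilds the bundle whenever the certificate is renewed. The function name and paths are hypothetical.

```python
import os
from pathlib import Path


def build_haproxy_pem(cert_path: str, key_path: str, out_path: str) -> None:
    """Write the certificate followed by the private key into one PEM file."""
    cert = Path(cert_path).read_text()
    key = Path(key_path).read_text()
    if not cert.endswith("\n"):
        cert += "\n"  # keep the PEM blocks on separate lines
    # The bundle contains the private key, so restrict it to the owner.
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as bundle:
        os.chmod(out_path, 0o600)  # also tighten a file that already existed
        bundle.write(cert + key)
```

Rebuilding the bundle from the tracked files, rather than editing it by hand, keeps HAProxy in step with certmonger's renewals.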
The alternative is to take the same approach as we did for the public endpoints, which is to simply inject the certificates and keys into the nodes. The downside is that the certificates and keys would be pasted into Heat environment files. This is problematic for services such as RabbitMQ, where we give a list of nodes for communication: enabling SSL there requires a certificate for each node that serves as a message broker. In this case, two approaches could be taken:
This approach improves the security of the overcloud, as it not only makes it easier to enable TLS everywhere (if desired) but also helps us keep track of and manage our PKI. It also enables other security measures, such as mutual authentication. In the case of FreeIPA, we could give the nodes client certificates so that they can authenticate to the services (as is possible with tools such as HAProxy or Galera/MySQL). However, this can be done as follow-up work.
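Mutual authentication is left as follow-up work, but its shape is simple: the server requires a client certificate signed by the CA it trusts, and the client presents one. A minimal sketch with Python's standard `ssl` module follows; the file arguments are placeholders for, say, the FreeIPA CA certificate and the node's certificate and key.

```python
import ssl
from typing import Optional


def mutual_tls_server_context(ca_file: Optional[str] = None,
                              cert_file: Optional[str] = None,
                              key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present our certificate and require one from every client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs client certs
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def mutual_tls_client_context(ca_file: Optional[str] = None,
                              cert_file: Optional[str] = None,
                              key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server and present the node's own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Since FreeIPA would issue both the server and the client certificates, a single CA file is enough for both sides to verify each other.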
To do this, the user will need to pass extra parameters to the overcloud deployment, such as the CA information. In the case of FreeIPA, we will need to pass the host and port, the Kerberos realm, the Kerberos principal of the undercloud and the location of the undercloud's keytab (its credentials).
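These parameters could be handed to the deployment through a Heat environment file. The sketch below builds one under the `parameter_defaults` section; the parameter names (`FreeIPAHost`, `FreeIPAPort`, and so on) are hypothetical, since the templates that consume them do not exist yet.

```python
import json
from typing import Any, Dict


def freeipa_environment(host: str, port: int, realm: str,
                        undercloud_principal: str,
                        keytab_path: str) -> Dict[str, Any]:
    """Build a Heat environment carrying the FreeIPA connection settings.

    All parameter names are hypothetical placeholders.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {
        "parameter_defaults": {
            "FreeIPAHost": host,
            "FreeIPAPort": port,
            "FreeIPARealm": realm,                  # Kerberos realm
            "UndercloudPrincipal": undercloud_principal,
            "UndercloudKeytab": keytab_path,        # location of the credentials
        }
    }


def render_environment(env: Dict[str, Any]) -> str:
    """Serialize as JSON, which YAML parsers generally accept as well."""
    return json.dumps(env, indent=2, sort_keys=True)
```

The rendered text could be saved to a file and passed to the deployment with `-e`, like the existing environment files.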
This will be reflected in the documentation.
Having SSL everywhere will degrade the overall performance of the overcloud, as every call incurs some overhead. This is a known trade-off, which is why SSL everywhere is optional: it should only be enabled by deployers who really need it.
Using an external CA or FreeIPA should not impact overcloud performance, since the operations it performs (issuing, revoking or renewing certificates) are infrequent.
A deployer who wants to enable SSL everywhere will need a working CA; if they do not have one, they could install FreeIPA on a node.
We will need to create a new gate in CI to test this.
Documentation on how to use an external CA, and on how to install and use FreeIPA with TripleO, needs to be written.