1 Secure oslo.messaging.rpc message¶
Trove uses oslo_messaging.rpc to perform RPC calls, with oslo_messaging as the underlying transport. Messages sent on oslo.messaging are currently treated as genuine by the recipient. Adding a layer of validation would ensure that RPC calls are in fact genuine. We propose that RPC calls be encrypted with unique keys.
Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/secure-oslo-messaging-messages
1.1 Problem Description¶
Messages sent on oslo.messaging are currently treated as genuine by the recipient. Because the names of the topics used are predictable, a person with sufficient knowledge of Trove could, for example, compromise a guest instance (or otherwise obtain credentials to connect to RabbitMQ or whichever transport underlies oslo.messaging) and then send messages to the task manager while impersonating the API service. Safeguards are already in place to contain the scope of such an attack, such as requiring that the message contain a valid keystone token with the appropriate access, but this is still a point of vulnerability.
Currently, when a client wishes to make an asynchronous RPC (cast()), the method name and parameters are marshalled and sent down to oslo_messaging.rpc. It is the responsibility of oslo_messaging.rpc to transmit the information to the remote side, and then find and invoke the method specified. After the cast() is invoked on the client side, the next thing that is seen by the consumer of oslo_messaging.rpc is an invocation of the desired method on the server side.
The same thing happens for a synchronous RPC (call()) with the additional step of the client blocking, the server completing the operation and sending a response to the client, and the client receiving that and unblocking.
1.2 Proposed Change¶
After experimenting with several alternative approaches, we propose to implement custom serializers (and deserializers) that can be provided to oslo_messaging.rpc.
All messages sent, and all return values from an RPC call(), will be serialized through these custom methods, which will encrypt the content. Due to a bug ("Failure to use serializer in exception", listed under References), if an RPC function throws an exception, the exception is not encrypted.
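The serializer idea can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the class mirrors the serialize_entity/deserialize_entity method names of the oslo_messaging Serializer interface but is deliberately not subclassed from it so that it runs on its own. The class name, the JSON and base64 encoding, and the injected encrypt/decrypt callables are all assumptions, and the placeholder transform is not encryption.

```python
import base64
import json
from typing import Any, Callable


class EncryptingSerializer:
    """Hypothetical serializer that encrypts every entity it handles.

    Method names follow the oslo_messaging Serializer interface; the
    class is standalone so the sketch runs without the library.
    """

    def __init__(self, encrypt: Callable[[bytes], bytes],
                 decrypt: Callable[[bytes], bytes]) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt

    def serialize_entity(self, ctxt: Any, entity: Any) -> str:
        # JSON-encode the argument or return value, encrypt the bytes,
        # then base64-encode so the result is safe to put on the wire.
        plaintext = json.dumps(entity).encode("utf-8")
        return base64.b64encode(self._encrypt(plaintext)).decode("ascii")

    def deserialize_entity(self, ctxt: Any, entity: str) -> Any:
        # Reverse the steps of serialize_entity.
        ciphertext = base64.b64decode(entity.encode("ascii"))
        return json.loads(self._decrypt(ciphertext).decode("utf-8"))


# Placeholder transform so the sketch runs without a crypto library.
# It is NOT encryption; a real deployment would plug in an AES-based
# cipher keyed with the control plane key or the guest key.
def placeholder_transform(data: bytes) -> bytes:
    return data[::-1]


serializer = EncryptingSerializer(placeholder_transform, placeholder_transform)
original = {"instance_id": "abc", "flavor": 2}
wire = serializer.serialize_entity(None, original)
restored = serializer.deserialize_entity(None, wire)
```

Injecting the cipher as a pair of callables keeps the sketch independent of any particular crypto library; only the two callables would need to change to use a real cipher.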
1.2.1 How does TroveRPCDispatcher verify legitimacy of a message¶
The proposed implementation relies on cryptography and unique keys for the control plane and the guests. We propose to use symmetric keys for the purpose of encryption.
Trove has the following entities who are party to RPC invocations:
- Trove API Service (client)
- Trove Taskmanager Service (client and server)
- Trove Conductor Service (server)
- Trove Guestagent (client and server)
When an RPC call() or cast() is made, the client invokes the serializer which will encrypt all arguments. When received on the server side, oslo_messaging.rpc will invoke the deserializer which will decrypt the arguments.
It is assumed that the control plane is secure and the control plane symmetric key is secure. If it is compromised, then all bets are off.
In communication with the guest agent, each guest has a unique symmetric key that is generated by the control plane and passed to the guest at launch.
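One way to read the key arrangement above: traffic that stays within the control plane uses the control plane key, and any traffic with a guest uses that guest's own key. The following is a minimal sketch under that reading; the function name, its parameters, and the in-memory dictionary of guest keys are hypothetical and are not part of the proposal.

```python
from typing import Dict, Optional


def select_key(guest_id: Optional[str],
               control_plane_key: str,
               guest_keys: Dict[str, str]) -> str:
    """Pick the symmetric key for one RPC exchange (hypothetical helper).

    guest_id is None for control-plane-only traffic, for example
    API service to taskmanager.
    """
    if guest_id is None:
        return control_plane_key
    # Raises KeyError for an unknown guest rather than falling back
    # to the control plane key.
    return guest_keys[guest_id]


guest_keys = {"guest-1": "key-for-guest-1", "guest-2": "key-for-guest-2"}
api_to_taskmanager = select_key(None, "control-plane-key", guest_keys)
taskmanager_to_guest = select_key("guest-1", "control-plane-key", guest_keys)
```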
1.2.2 Securing the response¶
As described earlier, a response to a call() method will be secured in the same way as the request. The exception is the bug noted above: an exception thrown by an RPC function is not (currently) serialized and will therefore be returned unencrypted. When (if) that bug is fixed in oslo_messaging.rpc, this exposure will be minimized.
1.2.3 Control plane key¶
The control plane key is constructed at system initialization time. The key is stored on the control plane (in the configuration file).
If the control plane consists of multiple machines, then the control plane services on all machines must have access to the control plane key.
1.2.4 Getting keys to the guest instance¶
On instance launch, the guest key is created and passed to the guest as an injected file. We assume that the mechanism for file injection is secure in that it cannot be intercepted and compromised by a bad actor.
A unique key is created for each instance.
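Key creation at launch could look roughly like the sketch below. The file path, the key length, and both helper names are assumptions made for illustration; the spec does not state where the injected file lives or how the key is encoded.

```python
import secrets
from typing import Dict

# Hypothetical location of the injected key file on the guest.
GUEST_KEY_PATH = "/etc/trove/conf.d/guest_rpc.key"


def generate_guest_key(num_bytes: int = 32) -> str:
    """Return a fresh random key as URL-safe text."""
    return secrets.token_urlsafe(num_bytes)


def build_injected_files(guest_key: str) -> Dict[str, str]:
    """Map file path to contents, ready to hand to the launch request."""
    return {GUEST_KEY_PATH: guest_key}


# Each launch draws its own key, so two instances never share one.
key_a = generate_guest_key()
key_b = generate_guest_key()
files_for_a = build_injected_files(key_a)
```

The secrets module is used rather than random because it draws from the operating system's cryptographically secure source.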
1.2.5 Why is this secure?¶
We make two assumptions above:
- The control plane is secure and the control plane key is not compromised.
- The transmission of the guest key to the guest is secure and is not compromised.
These are meaningful and reasonable assumptions to make given the architecture of an OpenStack system.
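Should a guest be compromised, the bad actor can connect to the underlying transport (say, RabbitMQ), but messages belonging to the control plane or to other guests are encrypted with keys the bad actor does not hold, so all they will see there are encrypted messages that they cannot decrypt.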
1.2.6 Configuration¶
The control plane key is stored on the control plane in a secure way, and configuration options tell each service where to find it.
Each guest instance will have a key that is stored securely on the instance, and a configuration setting will tell the guestagent where to find it.
cfg.StrOpt('tm_rpc_encr_key',
           default='bzH6y0SGmjuoY0FNSTptrhgieGXNDX6PIhvz',
           help='OpenSSL aes_cbc key for taskmanager RPC encryption.'),
cfg.StrOpt('inst_rpc_key_encr_key',
           default='emYjgHFqfXNB1NGehAFIUeoyw4V4XwWHEaKP',
           help='OpenSSL aes_cbc key to encrypt instance keys in DB.'),
cfg.StrOpt('instance_rpc_encr_key',
           help='OpenSSL aes_cbc key for instance RPC encryption.'),
1.2.7 Database¶
The guest key for each guest instance will be stored in the database. A table instance_keys is proposed for this.
Field         | Type         | Null | Key | Default | Extra
--------------|--------------|------|-----|---------|------
id            | varchar(64)  | NO   | PRI | NULL    |
instance_id   | varchar(64)  | NO   | UNI | NULL    |
encrypted_key | varchar(255) | NO   |     | NULL    |
created       | datetime     | NO   |     | NULL    |
updated       | datetime     | NO   |     | NULL    |
deleted       | tinyint(1)   | NO   |     | NULL    |
deleted_at    | datetime     | YES  |     | NULL    |
The guest instance keys are encrypted and stored in the encrypted_key column. A foreign key constraint links instance_id with instances.id. A unique constraint on instance_id is placed on this table.
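The table and its two constraints can be tried out with the sketch below. It uses an in-memory SQLite database purely so that it runs anywhere, which is an assumption of convenience and not the database Trove would use; the helper names and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE instances (id VARCHAR(64) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE instance_keys (
        id            VARCHAR(64)  NOT NULL PRIMARY KEY,
        instance_id   VARCHAR(64)  NOT NULL UNIQUE REFERENCES instances(id),
        encrypted_key VARCHAR(255) NOT NULL,
        created       DATETIME     NOT NULL,
        updated       DATETIME     NOT NULL,
        deleted       TINYINT(1)   NOT NULL,
        deleted_at    DATETIME
    )
""")


def add_key(key_id: str, instance_id: str, encrypted_key: str) -> None:
    """Insert one row; the key is assumed to be encrypted already."""
    conn.execute(
        "INSERT INTO instance_keys VALUES "
        "(?, ?, ?, datetime('now'), datetime('now'), 0, NULL)",
        (key_id, instance_id, encrypted_key))


def is_rejected(key_id: str, instance_id: str) -> bool:
    """Return True if the database refuses the row."""
    try:
        add_key(key_id, instance_id, "ciphertext-placeholder")
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO instances VALUES ('inst-1')")
add_key("key-1", "inst-1", "ciphertext-placeholder")

# A second key for the same instance violates the unique constraint.
duplicate_rejected = is_rejected("key-2", "inst-1")
# A key for a nonexistent instance violates the foreign key.
orphan_rejected = is_rejected("key-3", "inst-missing")
row_count = conn.execute("SELECT COUNT(*) FROM instance_keys").fetchone()[0]
```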
1.2.8 Public API¶
No changes to the public API.
1.2.9 Public API Security¶
No changes.
1.2.10 Python API¶
No changes.
1.2.11 CLI (python-troveclient)¶
No changes.
1.2.12 Internal API¶
The internal API, from the perspective of developers and invocations, will remain unaffected by this change, as the implementation seeks to work below the Trove code entirely. On the wire, however, the messages exchanged will be radically different, and code must be in place to ensure that encrypting and non-encrypting clients and servers know how to interoperate.
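The spec does not say how encrypting and non-encrypting peers would recognize each other. One possible approach, shown purely as an illustration, is to wrap encrypted payloads in a marked envelope and pass unmarked messages through unchanged. The envelope marker, the function names, and the placeholder transform (which is not encryption) are all hypothetical.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical envelope marker; the spec does not define a wire format.
ENVELOPE_FLAG = "_trove_encrypted"


def wrap(entity: Any, encrypt: Optional[Callable[[str], str]]) -> Any:
    """Prepare an entity for sending; pass it through if no cipher is set."""
    if encrypt is None:
        return entity  # non-encrypting (older) sender
    return {ENVELOPE_FLAG: True, "payload": encrypt(json.dumps(entity))}


def unwrap(message: Any, decrypt: Optional[Callable[[str], str]]) -> Any:
    """Undo wrap(); accept plain messages from non-encrypting senders."""
    is_envelope = (isinstance(message, dict)
                   and message.get(ENVELOPE_FLAG) is True)
    if not is_envelope:
        return message
    if decrypt is None:
        raise ValueError("encrypted message received but no key configured")
    return json.loads(decrypt(message["payload"]))


# Placeholder transform so the sketch runs; it is NOT encryption.
def reverse(text: str) -> str:
    return text[::-1]


args = {"instance_id": "abc"}
from_new_sender = unwrap(wrap(args, reverse), reverse)
from_old_sender = unwrap(wrap(args, None), reverse)
```

Accepting unmarked messages would only make sense during an upgrade window; once every peer encrypts, a receiver that still accepts plaintext would undo the protection this spec is after.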
1.2.13 Guest Agent¶
The guestagent will receive its key as a part of the configdrive/boot process and can use it to decrypt all messages.
1.2.14 Alternatives¶
Several alternatives were considered, prototyped, and abandoned. A short summary of each is provided below.
(a) We proposed to the oslo_messaging.rpc team that they implement a lightweight message signing and encryption mechanism in their code by providing callbacks that would allow the consumer (Trove) to perform the signing and encryption. The oslo_messaging team did not want to go this route, as they felt that the message included other private data structures which we (the consumer) could modify and so cause unexpected behavior.
(b) We proposed that oslo_messaging.rpc allow consumers to provide a custom dispatcher for messages on the receiver side. With this implementation, a signature or message encryption could be applied on the client side, then intercepted and reversed on the server side, requiring only minimal changes on the server side. Again, the oslo_messaging.rpc team felt that the dispatcher was a private data structure and that we should not be encapsulating it.
(c) We prototyped and experimented with a change where each RPC endpoint would be decorated, and the decorator would construct the proper parameters and invoke the RPC method. The client side change would be identical to (b), but the server side change would require adding the decorator to every RPC method. In addition, the call context would not be encrypted in this approach, and it was abandoned.
(d) We were advised that we should NOT be using oslo_messaging.rpc the way we are using it, as it was only intended for use on the control plane, and that we should instead make the guest an RPC server. Unfortunately, that is not what we need. In Trove, the guest agent is an extension of the control plane and is not well suited to a REST-based communication strategy. What we need is an RPC mechanism, and it is sad that oslo_messaging.rpc cannot seem to provide a secure one.
1.4 Implementation¶
1.4.1 Assignee(s)¶
- Primary assignee:
amrith
- Dashboard assignee:
none
1.4.2 Milestones¶
Ocata-1
1.4.3 Work Items¶
- Implement code on control plane and guest
- Implement changes to devstack plugin to create control plane key
- Implement unit tests
- Implement upgrade handling
- Update documentation
1.5 Upgrade Implications¶
Minimal upgrade implications are anticipated; code is proposed that handles this transition.
The control plane key will be generated and persisted on all control plane nodes.
When guests are upgraded, a key will be sent to them as part of the nova migrate process.
The APIs will be revved by one major version to account for this.
1.6 Dependencies¶
There is an assumed dependency on the RPC API versioning which has now merged.
1.7 Testing¶
Oh yeah, we’ll need some of this.
1.8 Documentation Impact¶
And some of this; details to follow.
1.9 References¶
Failure to use serializer in exception: https://bugs.launchpad.net/oslo.messaging/+bug/1648254
1.10 Appendix¶
Any additional technical information and data.