Modify TripleO Ironic Inspector to PXE Boot Via DHCP Relay

https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-ironic-inspector

This blueprint is part of the series tripleo-routed-networks-deployment [0].

This spec describes adding features to the Undercloud to support Ironic Inspector performing PXE boot services for multiple routed subnets (with DHCP relay on the routers forwarding the requests). The changes required to support this will be in the format of undercloud.conf and in the Puppet script that writes the dnsmasq.conf configuration for Ironic Inspector.

TripleO uses Ironic Inspector to perform baremetal inspection of overcloud nodes prior to deployment. Today, the dnsmasq.conf that is used by Ironic Inspector is generated by Puppet scripts that run when the Undercloud is configured. A single subnet and IP allocation range is entered in undercloud.conf in the parameter inspection_iprange. This spec would implement support for multiple subnets in one provisioning network.

Background Context

For a detailed description of the desired topology and the problems being addressed, please refer to the parent blueprint tripleo-routed-networks-deployment [0].

Problem Descriptions

Ironic Inspector DHCP doesn’t yet support DHCP relay. This makes it difficult to do introspection when the hosts are not on the same L2 domain as the controllers. The dnsmasq process will actually function across a DHCP relay, but the configuration must be edited by hand.

Possible Solutions, Ideas, or Approaches:

  1. Add support for DHCP scopes and support for DHCP relays.
  2. Use remote DHCP/PXE boot but provide L3 routes back to the introspection server.
  3. Use Neutron DHCP agent to PXE boot nodes for introspection (the Neutron dhcp-agent already supports multiple subnets, and can be modified to support DHCP relay). Note that there has been discussion about moving to Neutron for Ironic Introspection on this bug [3]. This is currently infeasible due to Neutron not being able to issue IPs for unknown MACs. The related patch has been abandoned [5].

Solution Implementation

The Ironic Inspector DHCP server uses dnsmasq, but only configures one subnet. We need to modify the Ironic Inspector DHCP configuration so that we can configure DHCP for multiple Neutron subnets and allocation pools. Then we should be able to use DHCP relay to send DHCP requests to the Ironic Inspector DHCP server. In the long term, we can likely leverage the Routed Networks work being done in Neutron to represent the subnets and allocation pools that would be used for the DHCP range sets below. This spec only covers the minimum needed for TripleO, so the work can be achieved simply by modifying the Undercloud Puppet scripts. The following has been tested and shown to result in successful introspection across two subnets, one local and one across a router configured with DHCP relay:

Current dnsmasq.conf representing one network (172.20.0.0/24), which is
configured in the "inspection_iprange" in undercloud.conf:
  port=0
  interface=br-ctlplane
  bind-interfaces
  dhcp-range=172.21.0.100,172.21.0.120,29
  dhcp-sequential-ip
  dhcp-match=ipxe,175
  # Client is running iPXE; move to next stage of chainloading
  dhcp-boot=tag:ipxe,http://172.20.0.1:8088/inspector.ipxe
  dhcp-boot=undionly.kpxe,localhost.localdomain,172.20.0.1

Multiple-subnet dnsmasq.conf representing multiple subnets:
  port=0
  interface=br-ctlplane
  bind-interfaces
  # Ranges and options
  dhcp-range=172.21.0.100,172.21.0.120,29
  dhcp-range=set:leaf1,172.20.0.100,172.20.0.120,255.255.255.0,29
  dhcp-option=tag:leaf1,option:router,172.20.0.254
  dhcp-range=set:leaf2,172.19.0.100,172.19.0.120,255.255.255.0,29
  dhcp-option=tag:leaf2,option:router,172.19.0.254

  dhcp-sequential-ip
  dhcp-match=ipxe,175
  # Client is running iPXE; move to next stage of chainloading
  dhcp-boot=tag:ipxe,http://172.20.0.1:8088/inspector.ipxe
  dhcp-boot=undionly.kpxe,localhost.localdomain,172.20.0.1

In the above configuration, a router is supplied for all subnets, including the subnet to which the Undercloud is attached. Note that the router option is not required for nodes on the same subnet as the inspector host, but it is harmless if it is generated automatically.

This file is created by the Puppet file located in [1]. That is where the changes will have to be made.

As discussed above, using a remote DHCP/PXE server is a possibility only if we have support in the top-of-rack switches, or if there is a system or VM listening on the remote subnet to relay DHCP requests. This configuration of dnsmasq will allow it to send DHCP offers to the DHCP relay, which forwards the offer on to the requesting host. After the offer is accepted, the host can communicate directly with the Undercloud, since it has already received the proper gateway address for packets to be forwarded. It will send a DHCP request directly based on the offer, and the DHCP ACK will be sent directly from the Undercloud to the client. Downloading of the PXE images is then done via TFTP and HTTP, not through the DHCP relay.

An additional problem is that Ironic Inspector blacklists nodes that have already been introspected using iptables rules blocking traffic from particular MAC addresses. Since packets relayed via DHCP relay will come from the MAC address of the router (not the original NIC that sent the packet), we will need to blacklist MACs based on the contents of the relayed DHCP packet. If possible, this blacklisting would be done using dnsmasq, which would provide the ability to decode the DHCP Discover packets and act on the contents. In order to do blacklisting directly with dnsmasq instead of using iptables, we need to be able to influence the dnsmasq configuration file.

Proposed Change

The proposed changes are discussed below.

Overview

The Puppet modules will need to be refactored to output a multi-subnet dnsmasq.conf from a list of subnets in undercloud.conf.
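
As a rough illustration of the rendering step, the sketch below turns a list of subnets into the dhcp-range and dhcp-option lines used in the multi-subnet dnsmasq.conf. It is written in Python for readability, whereas the actual change would live in the Puppet modules. The InspectionSubnet class, its field names, and render_dhcp_ranges are hypothetical and do not reflect the final undercloud.conf format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InspectionSubnet:
    """Hypothetical description of one inspection subnet (not the real schema)."""
    tag: str
    range_start: str
    range_end: str
    netmask: str
    gateway: Optional[str] = None


def render_dhcp_ranges(subnets: List[InspectionSubnet], lease_time: str = "29") -> str:
    """Render one tagged dhcp-range per subnet, plus a router option if a gateway is set."""
    lines = []
    for subnet in subnets:
        lines.append(
            f"dhcp-range=set:{subnet.tag},{subnet.range_start},"
            f"{subnet.range_end},{subnet.netmask},{lease_time}"
        )
        if subnet.gateway:
            lines.append(
                f"dhcp-option=tag:{subnet.tag},option:router,{subnet.gateway}"
            )
    return "\n".join(lines) + "\n"


# Values taken from the multi-subnet example in this spec.
subnets = [
    InspectionSubnet("leaf1", "172.20.0.100", "172.20.0.120",
                     "255.255.255.0", "172.20.0.254"),
    InspectionSubnet("leaf2", "172.19.0.100", "172.19.0.120",
                     "255.255.255.0", "172.19.0.254"),
]
print(render_dhcp_ranges(subnets), end="")
```

The netmask is rendered explicitly because dnsmasq cannot derive it from a local interface for subnets that are only reachable through a relay.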

The blacklisting functionality will need to be updated. Filtering by MAC address won’t work for DHCP requests that are relayed by a router. In that case, the source MAC address will be the router interface that sent the relayed request. There are methods to blacklist MAC addresses within dnsmasq, such as this configuration:

dhcp-mac=blacklist,<target MAC address>
dhcp-ignore=blacklist

Or this configuration:

# Never offer DHCP service to a machine whose Ethernet
# address is 11:22:33:44:55:66
dhcp-host=11:22:33:44:55:66,ignore

The configuration could be placed into the main dnsmasq.conf file, or into a file in /etc/dnsmasq.d/. Either way, dnsmasq will have to be restarted in order to re-read the configuration files. This is due to a security feature in dnsmasq to prevent foreign configuration being loaded as root. Since DHCP has a built-in retry mechanism, the brief time it takes to restart dnsmasq should not impact introspection, as long as we don’t restart dnsmasq too many times in any 60-second period.
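
A minimal sketch of how the contents of such a blacklist file could be generated is shown below, using the dhcp-host=<MAC>,ignore form from the second example. The function name render_blacklist and the normalization rules are assumptions made for illustration. Writing the file under /etc/dnsmasq.d/ and restarting dnsmasq are deliberately left out.

```python
import re
from typing import Iterable

# Matches a colon-separated, lower-case Ethernet address.
_MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def render_blacklist(macs: Iterable[str]) -> str:
    """Return dnsmasq 'dhcp-host=<mac>,ignore' lines for the given MACs.

    Hypothetical helper: MACs are lower-cased, '-' separators are
    converted to ':', duplicates are dropped, and output is sorted so
    that the generated file is stable between runs.
    """
    entries = set()
    for mac in macs:
        normalized = mac.strip().lower().replace("-", ":")
        if not _MAC_RE.match(normalized):
            raise ValueError(f"not a MAC address: {mac!r}")
        entries.add(normalized)
    return "".join(f"dhcp-host={mac},ignore\n" for mac in sorted(entries))


print(render_blacklist(["11:22:33:44:55:66", "AA-BB-CC-00-11-22"]), end="")
```

Keeping the output deterministic makes it easy to skip the restart when the rendered file has not changed, which helps limit how often dnsmasq is restarted.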

It does not appear that the dnsmasq DBus interface can be used to set the “dhcp-ignore” option for individual MAC addresses [4] [6].

Alternatives

One alternative approach is to use DHCP servers to assign IP addresses to all hosts on all interfaces. This would simplify configuration within the Heat templates and environment files. Unfortunately, this was the original approach of TripleO, and it was deemed insufficient by end-users, who wanted stable IP addresses and did not want an external dependency on DHCP.

Another approach which was considered was simply trunking all networks back to the Undercloud, so that dnsmasq could respond to DHCP requests directly, rather than requiring a DHCP relay. Unfortunately, this has already been identified as being unacceptable by some large operators, who have network architectures that make heavy use of L2 segregation via routers. This also won’t work well in situations where there is geographical separation between the VLANs, such as in split-site deployments.

Another approach is to use the DHCP server functionality in the network switch infrastructure in order to PXE boot systems, then assign static IP addresses after the PXE boot is done via DHCP. This approach would require configuration at the switch level that influenced where systems PXE boot, potentially opening up a security hole that is not under the control of OpenStack. This approach also doesn’t lend itself to automation that accounts for things like changes to the PXE image that is being served to hosts.

It is not necessary to use hardware routers to forward DHCP packets. There are DHCP relay and DHCP proxy packages available for Linux. It is possible to place a system or a VM on both the Provisioning network and the remote network in order to forward DHCP requests. This might be one method for implementing CI testing. Another method might trunk all remote provisioning networks back to the Undercloud, with DHCP relay running on the Undercloud forwarding to the local br-ctlplane.

Security Impact

One of the major differences between spine-and-leaf and standard isolated networking is that the various subnets are connected by routers, rather than being completely isolated. This means that without proper ACLs on the routers, private networks may be opened up to outside traffic.

This should be addressed in the documentation, and it should be stressed that ACLs should be in place to prevent unwanted network traffic. For instance, the Internal API network is sensitive in that the database and message queue services run on that network. It is supposed to be isolated from outside connections. This can be achieved fairly easily if supernets are used: if all Internal API subnets are part of the 172.19.0.0/16 supernet, a pair of ACL rules will allow only traffic between Internal API IPs (this is a simplified example that could be applied on all Internal API router VLAN interfaces or as a global ACL):

allow traffic from 172.19.0.0/16 to 172.19.0.0/16
deny traffic from * to 172.19.0.0/16
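
The effect of these two rules can be checked with a short Python sketch that evaluates them in order using the standard ipaddress module. The function name acl_permits is hypothetical. The final fall-through, where traffic not destined for the supernet is treated as permitted, is an assumption. Real router ACLs usually end in an implicit deny and would carry further rules.

```python
import ipaddress

# Supernet covering all Internal API subnets, as in the example rules.
INTERNAL_API = ipaddress.ip_network("172.19.0.0/16")


def acl_permits(src: str, dst: str) -> bool:
    """Evaluate the two example ACL rules in order (first match wins)."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    # Rule 1: allow traffic from 172.19.0.0/16 to 172.19.0.0/16
    if src_ip in INTERNAL_API and dst_ip in INTERNAL_API:
        return True
    # Rule 2: deny traffic from * to 172.19.0.0/16
    if dst_ip in INTERNAL_API:
        return False
    # Assumption: anything else is outside the scope of these two rules.
    return True


print(acl_permits("172.19.1.10", "172.19.2.20"))  # leaf-to-leaf Internal API
print(acl_permits("10.0.0.5", "172.19.2.20"))     # outside host to Internal API
```

Because every leaf subnet falls inside the same supernet, the rule pair does not need to change as new leaf subnets are added.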

In the case of Ironic Inspector, the TFTP server is a potential point of vulnerability. TFTP is inherently unauthenticated and does not include an access control model. The network(s) where Ironic Inspector is operating should be secured from remote access.

Other End User Impact

Deploying with spine-and-leaf will require additional parameters to provide the routing information and multiple subnets required. This will have to be documented. Furthermore, the validation scripts may need to be updated to ensure that the configuration is validated, and that there is proper connectivity between overcloud hosts.

Performance Impact

Much of the traffic that is carried over layer 2 today will traverse layer 3 routing boundaries in this design. That adds some latency and overhead, although in practice the difference may not be noticeable. One important consideration is that the routers must not be too overcommitted on their uplinks, and the routers must be monitored to ensure that they are not acting as a bottleneck, especially if complex access control lists are used.

The DHCP process is not likely to be affected; however, delivery of system images via TFTP may suffer a performance degradation. Since TFTP does not deal well with packet loss, deployers will have to take care not to oversaturate the links between routing switches.

Other Deployer Impact

A spine-and-leaf deployment will be more difficult to troubleshoot than a deployment that simply uses a set of VLANs. The deployer may need to have more network expertise, or a dedicated network engineer may be needed to troubleshoot in some cases.

Developer Impact

Spine-and-leaf is not easily tested in virt environments. This should be possible, but due to the complexity of setting up libvirt bridges and routes, we may want to provide a simulation of spine-and-leaf for use in virtual environments. This may involve building multiple libvirt bridges and routing between them on the Undercloud, or it may involve using a DHCP relay on the virt-host as well as routing on the virt-host to simulate a full routing switch. A plan for development and testing will need to be formed, since not every developer can be expected to have a routed environment to work in. It may take some time to develop a routed virtual environment, so initial work will be done on bare metal.

Implementation

Assignee(s)

Primary assignee:
Dan Sneddon <dsneddon@redhat.com>

Final assignees to be determined.

Approver(s)

Primary approver:
Emilien Macchi <emacchi@redhat.com>

Work Items

  1. Modify Ironic Inspector dnsmasq.conf generation to allow export of multiple DHCP ranges. The patch enabling this has merged [7].
  2. Modify the Ironic Inspector blacklisting mechanism so that it supports DHCP relay, since the DHCP requests forwarded by the router will have the source MAC address of the router, not the node being deployed.
  3. Modify the documentation in tripleo-docs to cover the spine-and-leaf case.
  4. Add an upstream CI job to test booting across subnets (although hardware availability may make this a long-term goal).

[*] Note that depending on the timeline for Neutron/Ironic integration, it might make sense to implement support for multiple subnets via changes to the Puppet modules which process undercloud.conf first, then follow up with a patch to integrate Neutron networks into Ironic Inspector later on.

Implementation Details

Workflow for introspection and deployment:

  1. The network administrator configures all provisioning VLANs with the IP address of the Undercloud server on the ctlplane network as the DHCP relay target, or “helper-address”.
  2. Operator configures IP address ranges and default gateways in undercloud.conf. Each subnet will require its own IP address range.
  3. Operator imports baremetal instackenv.json.
  4. When introspection or deployment is run, the DHCP server receives the DHCP request from the baremetal host via DHCP relay.
  5. If the node has not been introspected, reply with an IP address from the introspection pool and the inspector PXE boot image.
  6. Introspection is performed. LLDP collection [2] is performed to gather information about attached network ports.
  7. The node is blacklisted in dnsmasq.conf (or in /etc/dnsmasq.d), and dnsmasq is restarted.
  8. On the next boot, if the MAC address is blacklisted and a port exists in Neutron, then Neutron replies with the IP address from the Neutron port and the overcloud-full deployment image.
  9. The Heat templates are processed which generate os-net-config templates, and os-net-config is run to assign static IPs from the correct subnets, as well as routes to other subnets via the router gateway addresses.
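
The decision made in steps 5 and 8 of the workflow above can be summarized in a small Python sketch. The function and its arguments are hypothetical and only model the logic described in this spec. The "none" outcome for a blacklisted MAC without a Neutron port is an inference, since neither DHCP server would have a reason to answer.

```python
from typing import Set


def dhcp_responder(mac: str, blacklisted: Set[str],
                   neutron_port_macs: Set[str]) -> str:
    """Return which service would answer a DHCP request from `mac`."""
    if mac not in blacklisted:
        # Step 5: node not yet introspected, inspector dnsmasq answers.
        return "inspector"
    if mac in neutron_port_macs:
        # Step 8: blacklisted in dnsmasq and a Neutron port exists.
        return "neutron"
    # Assumption: blacklisted without a Neutron port gets no answer.
    return "none"


blacklisted = {"11:22:33:44:55:66"}
ports = {"11:22:33:44:55:66"}
print(dhcp_responder("aa:bb:cc:00:11:22", blacklisted, ports))
print(dhcp_responder("11:22:33:44:55:66", blacklisted, ports))
```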

When using spine-and-leaf, the DHCP server will need to provide an introspection IP address on the appropriate subnet, depending on the information contained in the DHCP relay packet that is forwarded by the segment router. dnsmasq will automatically match the gateway address (GIADDR) of the router that forwarded the request to the subnet where the DHCP request was received, and will respond with an IP and gateway appropriate for that subnet.
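
The GIADDR matching described above can be modeled in a few lines of Python. This is a simplified illustration of the selection logic, not dnsmasq's actual implementation. The RANGES table and select_tag function are hypothetical, with the subnets taken from the leaf1 and leaf2 ranges in the example configuration.

```python
import ipaddress
from typing import Optional

# Tag -> subnet, derived from the start address and netmask of each
# dhcp-range in the example multi-subnet configuration.
RANGES = {
    "leaf1": ipaddress.ip_network("172.20.0.0/24"),
    "leaf2": ipaddress.ip_network("172.19.0.0/24"),
}


def select_tag(giaddr: str) -> Optional[str]:
    """Return the tag of the range whose subnet contains the relay address."""
    relay = ipaddress.ip_address(giaddr)
    for tag, network in RANGES.items():
        if relay in network:
            return tag
    # No configured range covers this relay, so no offer can be made.
    return None


print(select_tag("172.19.0.254"))  # relay on the leaf2 segment
print(select_tag("10.0.0.1"))      # relay on an unconfigured segment
```

This is why each relayed subnet needs its own range with an explicit netmask: the relay address is the only information available for choosing the pool.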

The above workflow for the DHCP server should allow for provisioning IPs on multiple subnets.

Dependencies

There will be a dependency on routing switches that perform DHCP relay service for production spine-and-leaf deployments. Since we will not have routing switches in our virtual testing environment, a DHCP proxy may be set up as described in the testing section below.

Testing

In order to properly test this framework, we will need to establish at least one CI test that deploys spine-and-leaf. As discussed in this spec, it isn’t necessary to have a full routed bare metal environment in order to test this functionality, although there is some work required to get it working in virtual environments such as OVB.

For virtual testing, it is sufficient to trunk all VLANs back to the Undercloud, then run DHCP proxy on the Undercloud to receive all the requests and forward them to br-ctlplane, where dnsmasq listens. This will provide a substitute for routers running DHCP relay.

Documentation Impact

The TripleO docs will need to be updated to include detailed instructions for deploying in a spine-and-leaf environment, including the environment setup. Covering specific vendor implementations of switch configurations is outside this scope, but an overview of the required configuration options should be included, such as enabling DHCP relay (or “helper-address” as it is also known) and setting the Undercloud as a server to receive DHCP requests.

The updates to TripleO docs will also have to include a detailed discussion of choices to be made about IP addressing before a deployment. If supernets are to be used for network isolation, then a good plan for IP addressing will be required to ensure scalability in the future.