lxc, veth, troubleshooting
This spec aims to make troubleshooting openstack-ansible issues a more efficient process by using container names to build names for veth interfaces.
Link to blueprint:
All veth interfaces on the host are named using randomly generated names, such as vethK070G4. This can make troubleshooting container networking issues more challenging since it’s difficult to trace a veth name to a particular network interface within the container.
Names of veth interfaces should be unique and easily correlated to their containers. However, names of network interfaces have restrictions which must be handled carefully:
16 characters maximum, including the terminating null byte (the kernel's IFNAMSIZ), which leaves 15 usable characters
Certain characters, like dashes (-), aren’t allowed
The random characters on the end of the container hostname could be used along with the interface name to form a veth name. As an example, a container called aio1_utility_container-a9ef9551 could have two named veth interfaces:
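One way to derive such names is sketched below. It assumes the random suffix is everything after the last dash in the container name and that the suffix and interface name are joined with an underscore. The separator and the helper name `veth_name` are illustrative assumptions, not something this spec fixes.

```python
IFNAMSIZ = 16  # Linux kernel constant; includes the terminating null byte


def veth_name(container_name: str, interface: str, sep: str = "_") -> str:
    """Build a host-side veth name from a container name and an interface name."""
    # Take the random suffix after the last dash, e.g. "a9ef9551".
    _, dash, suffix = container_name.rpartition("-")
    if not dash or not suffix:
        raise ValueError(f"no random suffix found in {container_name!r}")
    name = f"{suffix}{sep}{interface}"
    # Enforce the restrictions listed above: no dashes, bounded length.
    if "-" in name:
        raise ValueError(f"{name!r} contains a dash")
    if len(name) > IFNAMSIZ - 1:
        raise ValueError(f"{name!r} is longer than {IFNAMSIZ - 1} characters")
    return name


if __name__ == "__main__":
    for iface in ("eth0", "eth1"):
        print(veth_name("aio1_utility_container-a9ef9551", iface))
```

Raising an error instead of silently truncating keeps an over-long name from colliding with another container's veth.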
The alternative is to leave veth names as randomly generated by LXC.
The veth names will only be adjusted on the host within the LXC configuration files themselves. Containers won’t be affected. The playbooks don’t use the veth names on the host for any actions.
If veths are not cleaned up properly when a container stops (this is sometimes called ‘dangling veths’), the named interface still exists on the host, so there’s a chance that the container won’t start until the dangling veth is manually removed with ip link del <veth>.
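A check for leftover veths before starting a container could look like the following sketch. It assumes the caller already knows which veth names the stopped container is expected to use and compares them against the output of `ip -o link show type veth`. The function names are hypothetical, and the sketch only reports leftovers; it does not delete anything.

```python
import subprocess


def parse_link_names(ip_output: str) -> set:
    """Extract interface names from `ip -o link show` output."""
    names = set()
    for line in ip_output.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        # The second field looks like " a9ef9551_eth0@if4"; drop the peer part.
        names.add(parts[1].strip().split("@", 1)[0])
    return names


def leftover_veths(expected_names, ip_output: str) -> list:
    """Return the expected veth names that still exist on the host."""
    return sorted(set(expected_names) & parse_link_names(ip_output))


def host_veth_listing() -> str:
    """Run iproute2 to list veth interfaces, one per line."""
    result = subprocess.run(
        ["ip", "-o", "link", "show", "type", "veth"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For a stopped container, calling `leftover_veths(names, host_veth_listing())` returns the names that would need to be removed with ip link del before the container can start.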
Upgrades should be unaffected. This change only adjusts the LXC container configuration files and doesn’t change the running configuration of any of the containers.
If a container is running and its LXC configuration file is adjusted to use named veths, it will only utilize those adjustments when it is restarted. If an upgrade happens to restart only a subset of the containers on the host, then only those containers will use named veths after they restart.
This change shouldn’t affect security.
This change shouldn’t affect performance.
End user impact
This change shouldn’t affect end users.
Users who deploy OpenStack should be able to troubleshoot network issues more efficiently.
For example, if a user were having trouble reaching the nova API container, they could quickly see which veths were associated with the container. This would allow users to diagnose network problems with various tools, like ethtool and tcpdump, without digging into interface indexes or writing scripts.
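The lookup described above can be sketched as a simple filter over the host's interface names. The sketch assumes veths are named with the container's random suffix followed by an underscore and the interface name. That naming scheme, the helper names, and the container name in the usage note are illustrative assumptions.

```python
import os


def container_veths(container_name: str, host_interfaces) -> list:
    """Return host interfaces whose names start with the container's suffix."""
    # Everything after the last dash is treated as the random suffix.
    suffix = container_name.rpartition("-")[2]
    prefix = suffix + "_"
    return sorted(name for name in host_interfaces if name.startswith(prefix))


def list_host_interfaces(sysfs: str = "/sys/class/net") -> list:
    """List interface names known to the kernel via sysfs."""
    return os.listdir(sysfs)
```

With a hypothetical container called aio1_nova_api_container-1a2b3c4d, `container_veths("aio1_nova_api_container-1a2b3c4d", list_host_interfaces())` would return the names to pass to ethtool or tcpdump.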
If a deployer wants to begin using named veth pairs immediately, then all containers must be restarted. This is because the LXC configuration files are adjusted on disk but running containers aren’t adjusted.
Much like the deployer impact above, this change could help developers diagnose issues within different containers more efficiently.
This spec has no known dependencies.
Update Ansible playbooks to specify lxc.network.veth.pair in the main LXC configuration files as well as the interface .ini files.
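The effect of that change on a configuration fragment can be sketched as follows. The sketch assumes the fragment describes a single network interface and sets or replaces its lxc.network.veth.pair line. The helper name `ensure_veth_pair` is hypothetical, and the actual playbooks may manage these files differently.

```python
def ensure_veth_pair(config_text: str, pair_name: str) -> str:
    """Set lxc.network.veth.pair in a single-interface LXC config fragment."""
    key = "lxc.network.veth.pair"
    new_line = f"{key} = {pair_name}"
    out = []
    replaced = False
    for line in config_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            # Replace the first occurrence and drop any duplicates.
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative fragment; the values are assumptions, not taken from a deployment.
    fragment = (
        "lxc.network.type = veth\n"
        "lxc.network.name = eth0\n"
        "lxc.network.link = br-mgmt\n"
    )
    print(ensure_veth_pair(fragment, "a9ef9551_eth0"), end="")
```

Running the function a second time with the same name leaves the fragment unchanged, which matches the idempotent behaviour expected of a playbook task.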
Do greenfield deployment and verify named veths
Do an upgrade between releases and verify named veths
Verify that both tests have no impact on running containers
Documentation would be beneficial, especially around how this helps with troubleshooting issues.