Named veths

date: 2015-08-31 22:00
tags: lxc, veth, troubleshooting

This spec aims to make troubleshooting openstack-ansible issues a more efficient process by using container names to build names for veth interfaces.

Link to blueprint:

Problem description

LXC gives every veth interface on the host a randomly generated name, such as vethK070G4. This makes troubleshooting container networking harder, because it is difficult to map a veth name on the host to a particular network interface inside a container.

Proposed change

Names of veth interfaces should be unique and easily correlated to their containers. However, names of network interfaces have restrictions which must be handled carefully:

  • 15 characters maximum (the kernel’s 16-byte limit includes the terminating null byte)
  • Certain characters, such as slashes (/), colons (:), and whitespace, aren’t allowed

The random characters on the end of the container hostname could be used along with the interface name to form a veth name. As an example, a container called aio1_utility_container-a9ef9551 could have two named veth interfaces:

  • a9ef9551_eth0
  • a9ef9551_eth1
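The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not the playbook implementation: the function name veth_name is hypothetical, and it assumes the random suffix is everything after the last dash in the container hostname. It enforces the kernel's limit of 15 visible characters (IFNAMSIZ is 16 bytes including the terminating null) and, as a conservative assumption, replaces anything other than letters, digits, and underscores.

```python
import re

IFNAMSIZ = 16                  # Linux kernel constant; includes the trailing null byte
MAX_IFNAME_LEN = IFNAMSIZ - 1  # usable characters in an interface name


def veth_name(container_name: str, interface: str) -> str:
    """Build a host-side veth name such as 'a9ef9551_eth0' (hypothetical helper)."""
    # Assumption: the random suffix is the text after the last dash.
    suffix = container_name.rsplit("-", 1)[-1]
    name = f"{suffix}_{interface}"
    # Conservative assumption: keep only letters, digits, and underscores.
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if len(name) > MAX_IFNAME_LEN:
        raise ValueError(f"{name!r} is longer than {MAX_IFNAME_LEN} characters")
    return name


if __name__ == "__main__":
    for iface in ("eth0", "eth1"):
        print(veth_name("aio1_utility_container-a9ef9551", iface))
```

With an 8-character suffix and an underscore, 6 characters remain for the interface name, which is enough for names like eth0 or eth10.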

Alternatives

Leave veth names as randomly generated by LXC.

Playbook/Role impact

The veth names will only be adjusted on the host within the LXC configuration files themselves. Containers won’t be affected. The playbooks don’t use the veth names on the host for any actions.

If veths are not cleaned up properly when a container stops (sometimes called ‘dangling veths’), the container may fail to start because its fixed veth name is already in use on the host. In that case, the dangling veth must be removed manually with ip link del <veth>.
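One way to spot dangling named veths is to compare the interfaces present on the host with the containers that are currently running. The sketch below is hypothetical and works on plain lists, so gathering the interface names and running container names is left to the operator. It assumes named veths follow the <suffix>_eth<N> pattern from the example in this spec.

```python
from typing import Iterable, List


def dangling_veths(host_interfaces: Iterable[str],
                   running_containers: Iterable[str]) -> List[str]:
    """Return named veths whose container is not running (hypothetical helper)."""
    # Assumption: the suffix is the text after the last dash in the container name.
    running_suffixes = {name.rsplit("-", 1)[-1] for name in running_containers}
    dangling = []
    for ifname in host_interfaces:
        prefix, sep, rest = ifname.partition("_")
        # Skip anything that does not look like <suffix>_eth<N>.
        if not sep or not rest.startswith("eth"):
            continue
        if prefix not in running_suffixes:
            dangling.append(ifname)
    return dangling


if __name__ == "__main__":
    interfaces = ["lo", "br-mgmt", "a9ef9551_eth0", "0c3f11d2_eth0", "vethK070G4"]
    running = ["aio1_utility_container-a9ef9551"]
    print(dangling_veths(interfaces, running))
```

Each name returned would be a candidate for ip link del <veth>, after confirming that the container really is stopped.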

Upgrade impact

Upgrades should be unaffected. This change only adjusts the LXC container configuration files and doesn’t change the running configuration of any of the containers.

If a container is running and its LXC configuration file is adjusted to use named veths, it will only utilize those adjustments when it is restarted. If an upgrade happens to restart only a subset of the containers on the host, then only those containers will use named veths after they restart.

Security impact

This change shouldn’t affect security.

Performance impact

This change shouldn’t affect performance.

End user impact

This change shouldn’t affect end users.

Deployer impact

Users who deploy OpenStack should be able to troubleshoot network issues more efficiently.

For example, if a user were having trouble reaching the nova API container, they could quickly see which veths were associated with that container. They could then diagnose network problems with tools such as ethtool and tcpdump, without digging into interface indexes or writing scripts.
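As a rough illustration of that workflow, the sketch below picks out the veths that belong to one container and builds a tcpdump command line for each. The container name aio1_nova_api_container-1b2c3d4e and both helper names are hypothetical, and the <suffix>_<interface> naming is the scheme proposed in this spec.

```python
from typing import Iterable, List


def veths_for_container(container_name: str,
                        host_interfaces: Iterable[str]) -> List[str]:
    """List host veths named after the container's suffix (hypothetical helper)."""
    # Assumption: the suffix is the text after the last dash in the container name.
    prefix = container_name.rsplit("-", 1)[-1] + "_"
    return sorted(i for i in host_interfaces if i.startswith(prefix))


def tcpdump_command(veth: str) -> List[str]:
    """Build an argument list that captures on one veth without name resolution."""
    return ["tcpdump", "-n", "-i", veth]


if __name__ == "__main__":
    interfaces = ["lo", "1b2c3d4e_eth1", "1b2c3d4e_eth0", "a9ef9551_eth0"]
    for veth in veths_for_container("aio1_nova_api_container-1b2c3d4e", interfaces):
        print(" ".join(tcpdump_command(veth)))
```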

If a deployer wants to begin using named veth pairs immediately, then all containers must be restarted. This is because the LXC configuration files are adjusted on disk but running containers aren’t adjusted.

Developer impact

Much like the deployer impact above, this change could help developers diagnose issues within different containers more efficiently.

Dependencies

This spec has no known dependencies.

Implementation

Assignee(s)

Primary assignee:
mhayden (https://launchpad.net/~rackerhacker)

Work items

  • Update the Ansible playbooks to set lxc.network.veth.pair in the main LXC configuration files as well as in the interface .ini files
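The effect of this work item on a configuration file can be sketched as an idempotent edit: replace the lxc.network.veth.pair line if it exists, otherwise append it. This is an illustration in Python rather than the actual Ansible task. The helper set_veth_pair is hypothetical, and it assumes the file describes a single network interface.

```python
import re

PAIR_KEY = "lxc.network.veth.pair"


def set_veth_pair(config_text: str, pair_name: str) -> str:
    """Set the veth pair name in LXC config text (hypothetical helper).

    Assumes the text describes one interface; every existing
    lxc.network.veth.pair line would be rewritten otherwise.
    """
    new_line = f"{PAIR_KEY} = {pair_name}"
    pattern = re.compile(rf"^{re.escape(PAIR_KEY)}\s*=.*$", re.MULTILINE)
    if pattern.search(config_text):
        # A callable replacement avoids interpreting backslashes in new_line.
        return pattern.sub(lambda _match: new_line, config_text)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"


if __name__ == "__main__":
    original = "lxc.network.type = veth\nlxc.network.name = eth1\n"
    print(set_veth_pair(original, "a9ef9551_eth1"), end="")
```

Running the function a second time on its own output leaves the text unchanged, which mirrors the spec's expectation that upgrades only touch files on disk and never the running containers.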

Testing

  • Do greenfield deployment and verify named veths
  • Do an upgrade between releases and verify named veths
  • Verify that both tests have no impact on running containers

Documentation impact

Documentation would be beneficial, especially on how named veths help with troubleshooting.

References

N/A