Network Namespaces – How to Find the Network Namespace of a veth Peer ifindex

network-namespaces, networking, veth

Task

I need to find, unambiguously and without "holistic" guessing, the peer network interface of a veth end that sits in another network namespace.

Theory vs. Reality

Although a lot of documentation, and also answers here on SO, assumes that the ifindex indices of network interfaces are globally unique per host across network namespaces, this doesn't hold in many cases: ifindex/iflink are ambiguous. Even the loopback already shows the contrary, having an ifindex of 1 in every network namespace. Also, depending on the container environment, ifindex numbers get reused in different namespaces. That makes tracing veth wiring a nightmare, especially with lots of containers and a host bridge with veth peers all ending in @if3 or so…
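
For example, a freshly created scratch namespace (the name is made up here) shows the same loopback index as the host:

$ sudo ip netns add demo
$ sudo ip netns exec demo cat /sys/class/net/lo/ifindex
1
$ cat /sys/class/net/lo/ifindex
1
$ sudo ip netns del demo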

Example: link-netnsid is 0

Spin up a Docker container instance, just to get a new veth pair connecting from the host network namespace to the new container network namespace…

$ sudo docker run -it debian /bin/bash

Now, in the host network namespace list the network interfaces (I've left out those interfaces that are of no interest to this question):

$ ip link show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
...
4: docker0:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:34:23:81:f0 brd ff:ff:ff:ff:ff:ff
...
16: vethfc8d91e@if15:  mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether da:4c:f7:50:09:e2 brd ff:ff:ff:ff:ff:ff link-netnsid 0

As you can see, the iflink is unambiguous, but the link-netnsid is 0, despite the peer end sitting in a different network namespace.

For reference, check the netnsid in the unnamed network namespace of the container:

$ sudo lsns -t net
        NS TYPE NPROCS   PID USER  COMMAND
...
...
4026532469 net       1 29616 root  /bin/bash

$ sudo nsenter -t 29616 -n ip link show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
15: eth0@if16:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

So, for both veth ends ip link show (and RTNETLINK, for what it's worth) tells us they're in the same network namespace with netnsid 0. That is either wrong, or correct under the assumption that link-netnsids are local as opposed to global. I could not find any documentation that makes it explicit what scope link-netnsids are supposed to have.

/sys/class/net/... NOT to the Rescue?

I've looked into /sys/class/net/<if>/… but can only find the ifindex and iflink elements; these are well documented. "ip link show" also only seems to show the peer ifindex in the form of the (in)famous "@if#" notation. Or did I miss some additional network namespace element?
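
For instance, for the host-side veth from the listing above, these only yield the two indices (values matching the ip link output above), with nothing about the peer's namespace:

$ cat /sys/class/net/vethfc8d91e/ifindex
16
$ cat /sys/class/net/vethfc8d91e/iflink
15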

Bottom Line/Question

Are there any syscalls that allow retrieving the missing network namespace information for the peer end of a veth pair?

Best Answer

Here's the method I followed to understand this problem. The available tools appear usable (with some convolution) for the namespace part, and (UPDATED) /sys/ can easily provide the peer's index. So it's quite long, bear with me. It's in two parts (which are not in the logical order, but doing the namespace first helps explain the index naming), using common tools, not any custom program:

  • Network namespace
  • Interface index

Network namespace

This information is available through the property link-netnsid in the output of ip link and can be matched with the id in the output of ip netns. It's possible to "associate" a container's network namespace with ip netns, thus using ip netns as a specialized tool. Of course, writing a dedicated program for this would be better (some information about the syscalls involved is given at the end of each part).

About the nsid's description, here's what man ip netns says (emphasis mine):

ip netns set NAME NETNSID - assign an id to a peer network namespace

This command assigns a id to a peer network namespace. This id is valid only in the current network namespace. This id will be used by the kernel in some netlink messages. If no id is assigned when the kernel needs it, it will be automatically assigned by the kernel. Once it is assigned, it's not possible to change it.

While creating a namespace with ip netns won't immediately create a netnsid, one will be created (in the current namespace, probably the "host") whenever a veth half is moved to another namespace. So it's always set for a typical container.
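
A quick way to observe this (a sketch; the namespace name is made up and ip netns list-id needs a reasonably recent iproute2):

# ip netns add demo
# ip netns list-id
# ip link add type veth peer netns demo
# ip netns list-id

The first list-id should show nothing for demo (assuming no nsid was assigned yet); after the veth peer is moved into it, an nsid for demo appears in the current namespace.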

Here's an example using an LXC container:

# lxc-start -n stretch-amd64

A new veth link veth9RPX4M appeared (this can be tracked with ip monitor link). Here is the detailed information:

# ip -o link show veth9RPX4M
44: veth9RPX4M@if43: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master lxcbr0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether fe:25:13:8a:00:f8 brd ff:ff:ff:ff:ff:ff link-netnsid 4

This link has the property link-netnsid 4, telling us the other side is in the network namespace with nsid 4. How can we verify it's the LXC container? The easiest way to get this information is to make ip netns believe it created the container's network namespace, by doing the operations hinted at in the manpage.

# mkdir -p /var/run/netns
# touch /var/run/netns/stretch-amd64
# mount -o bind /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net /var/run/netns/stretch-amd64

UPDATE3: I didn't understand at first that finding back the global name (the namespace's inode) was a problem. Here it is:

# ls -l /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net
lrwxrwxrwx. 1 root root 0 May  5 20:40 /proc/17855/ns/net -> net:[4026532831]

# stat -c %i /var/run/netns/stretch-amd64 
4026532831

Now the information is retrieved with:

# ip netns | grep stretch-amd64
stretch-amd64 (id: 4)

It confirms the veth's peer is in the network namespace with the same nsid = 4 = link-netnsid.

The container/ip netns "association" can be removed (without removing the namespace as long as the container is running):

# ip netns del stretch-amd64

Note: nsid numbering is per network namespace; it usually starts at 0 for the first container, and the lowest available value is recycled for new namespaces.
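
The same association works for the Docker container from the question (a sketch; $CONTAINER is a placeholder for the container's name or ID):

# mkdir -p /var/run/netns
# touch /var/run/netns/$CONTAINER
# mount -o bind /proc/$(docker inspect -f '{{.State.Pid}}' $CONTAINER)/ns/net /var/run/netns/$CONTAINER
# ip netns | grep $CONTAINER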

About using syscalls, here is information gleaned from strace (a quick way to observe these exchanges is sketched below):

  • for the link part: it requires an AF_NETLINK socket (opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)), asking (sendmsg()) for the link's information with a message of type RTM_GETLINK and retrieving (recvmsg()) the reply with message type RTM_NEWLINK.

  • for the netns nsid part: same method, the query message is type RTM_GETNSID with reply type RTM_NEWNSID.

I think the slightly higher-level library to handle this is libnl. Anyway, it's a topic for SO.
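
These exchanges can be watched directly (recent strace versions decode the netlink message types; the interface name is reused from the example above):

# strace -e trace=sendmsg,recvmsg ip link show dev veth9RPX4M
# strace -e trace=sendmsg,recvmsg ip netns list-id

The first should show RTM_GETLINK/RTM_NEWLINK messages, the second RTM_GETNSID/RTM_NEWNSID.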

Interface index

Now it will be easier to follow why the indexes appear to behave randomly. Let's do an experiment:

First enter a new net namespace to have a clean (index) slate:

# ip netns add test
# ip netns exec test bash
# ip netns id
test
# ip -o link 
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

As OP noted, lo begins with index 1.

Let's add 5 net namespaces, create veth pairs, and put one veth end in each of them:

# for i in {0..4}; do ip netns add test$i; ip link add type veth peer netns test$i ; done
# ip -o link|sed 's/^/    /'
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4

When @if2 is displayed for each of them, it becomes quite clear that it's the peer's interface index in the peer's namespace, and that indexes are not global, but per namespace. When an actual interface name is displayed, it's a relation to an interface in the same namespace (be it the veth's peer, a bridge, a bond ...). So why doesn't veth0 have a peer displayed? I believe it's an ip link bug that shows up when the index is the same as its own. Just moving the peer link twice "solves" it here, because it forces an index change. I'm also sure ip link sometimes gets confused in other ways and, instead of displaying @ifXX, displays an interface in the current namespace with the same index.

# ip -n test0 link set veth0 name veth0b netns test
# ip link set veth0b netns test0
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4

UPDATE: reading the information in OP's question again, the peer's index (but not its nsid) is easily and unambiguously available with cat /sys/class/net/<interface>/iflink.

UPDATE2:

All those iflink 2 entries may appear ambiguous, but what is unique is the combination of nsid and iflink, not iflink alone. For the above example that is (a small loop reproducing this table is sketched below):

interface    nsid:iflink
veth0        0:7
veth1        1:2
veth2        2:2
veth3        3:2
veth4        4:2

In this namespace (namely namespace test) there will never be two identical nsid:iflink pairs.
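
The table can be reproduced with a one-liner (a sketch, run inside namespace test, assuming the veth naming from the listing above):

# for i in /sys/class/net/veth*; do d=${i##*/}; echo "$d $(ip -o link show dev $d | grep -o 'link-netnsid [0-9]*' | cut -d' ' -f2):$(cat $i/iflink)"; done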

If one were to look at the opposite information from each peer namespace:

namespace    interface    nsid:iflink
test0        veth0        0:2
test1        veth0        0:3
test2        veth0        0:4
test3        veth0        0:5
test4        veth0        0:6

But bear in mind that each of those 0: is a separate 0 that happens to map to the same peer namespace (namely: namespace test, not even the host). They can't be compared directly because they're tied to their own namespace. So the whole comparable and unique information should be:

test0:0:2
test1:0:3
test2:0:4
test3:0:5
test4:0:6

Once it's confirmed that "test0:0" == "test1:0" etc. (true in this example: they all map to the net namespace called test by ip netns), then they can really be compared.
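
One way to check this (a sketch; each test namespace should list an nsid 0, and a recent iproute2 will also print the matching /var/run/netns name next to it, expected to be test here):

# for ns in test0 test1 test2 test3 test4; do ip netns exec $ns ip netns list-id; done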

About syscalls, still looking at strace results, the information is retrieved as above with RTM_GETLINK. Now all the information should be available:

  • local: interface index with SIOCGIFINDEX / if_nametoindex
  • peer: both nsid and interface index with RTM_GETLINK.

All this should probably be used with libnl.
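
Putting the two parts together, a rough end-to-end recipe for the host side (a sketch; it assumes the peer namespace was already associated with ip netns as shown earlier, and uses the host-side veth name from the question as an example):

# IF=vethfc8d91e
# PEERIDX=$(cat /sys/class/net/$IF/iflink)
# NSID=$(ip -o link show dev $IF | grep -o 'link-netnsid [0-9]*' | cut -d' ' -f2)
# PEERNS=$(ip netns | grep "(id: $NSID)$" | cut -d' ' -f1)
# ip netns exec $PEERNS ip -o link | grep "^$PEERIDX:"

The last command should print the peer interface (eth0 in the question's Docker example) inside its own namespace.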
