Linux – Switching Network Namespace Doesn’t Change /sys/class/net


The Linux man page network_namespaces(7) says:

Network namespaces provide isolation of the system resources associated with networking: […], the /sys/class/net directory, […].

However, simply switching into a different network namespace doesn't seem to change the contents of /sys/class/net (see below for how to reproduce). Am I just mistaken here in thinking that the setns() into the network namespace is already sufficient? Is it always necessary to remount /sys in order to get the correct /sys/class/net matching the currently joined network namespace? Or am I missing something else here?

Example to Reproduce

Take an *ubuntu system, find the PID of the rtkit-daemon, enter the daemon's network namespace, show its network interfaces, and then check /sys/class/net:

$ PID=$(sudo lsns -t net -n -o PID,COMMAND | awk '/rtkit-daemon/ {print $1; exit}')
$ sudo nsenter -t $PID -n
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# ls /sys/class/net
docker0  enp3s0  lo  lxcbr0  ...

Notice that while ip link show correctly shows only lo, /sys/class/net shows all network interfaces visible in the "root" network namespace (and "root" mount namespace).

In the case of rtkit-daemon, also entering its mount namespace makes no difference: after sudo nsenter -t $PID -n -m, ls /sys/class/net still shows network interfaces that are not present in the network namespace.

"Fix"

Many kudos to @Danila Kiver for explaining what is really going on behind the scenes in the Linux kernel. Mounting a fresh sysfs instance while the correct network namespace is joined shows the correct entries under class/net:

$ PID=$(sudo lsns -t net -n -o PID,COMMAND | awk '/rtkit-daemon/ {print $1; exit}')
$ sudo nsenter -t $PID -n
# MNT=$(mktemp -d)
# mount -t sysfs none "$MNT"
# ls "$MNT/class/net/"
lo
# umount "$MNT"
# rmdir "$MNT"
# exit

So the freshly mounted sysfs instance yields the correct results in its class/net directory, while the original /sys mount stays unchanged.
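If you would rather have /sys/class/net itself show the right interfaces for a single command, the same idea can be wrapped up roughly as below. This is only a sketch, not a definitive tool: run_in_netns_with_sysfs and the DRY_RUN switch are names I made up for illustration, it needs root, and it assumes a util-linux recent enough that unshare --mount makes the new mount namespace private by default.

```shell
#!/bin/sh
# Hypothetical helper: run a command inside the network namespace of PID,
# with a fresh sysfs mounted on /sys. Needs root.
# Set DRY_RUN=1 (an illustrative switch) to only print the command line.
run_in_netns_with_sysfs() {
    pid=$1
    shift
    # 1. nsenter joins the target network namespace via its reference file.
    # 2. unshare creates a new mount namespace, so the extra /sys mount
    #    stays invisible to the rest of the system.
    # 3. sysfs is mounted after joining, so it is tagged with that namespace.
    set -- nsenter --net="/proc/$pid/ns/net" \
        unshare --mount \
        sh -c 'mount -t sysfs sysfs /sys && exec "$@"' sh "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Usage would be something like run_in_netns_with_sysfs "$PID" ls /sys/class/net from a root shell. Be aware that the new sysfs mount covers whatever was mounted below /sys before (for example /sys/fs/cgroup), but only inside that private mount namespace.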

Best Answer

Let's look into man 5 sysfs:

/sys/class/net
    Each  of the entries in this directory is a symbolic link representing
    one of the real or virtual networking devices that are visible in 
    the network namespace of the process that is accessing the directory.

So, according to this manpage, the output of ls /sys/class/net must depend on the network namespace of the ls process. But the actual behavior does not match this description. The kernel documentation explains how it really works.

Each sysfs mount has a namespace tag associated with it. This tag is set when sysfs gets mounted and depends on the network namespace of the calling process. Each sysfs entry (e.g. an entry in /sys/class/net) may also have a namespace tag associated with it.

When you iterate over a sysfs directory, the kernel obtains the namespace tag of the sysfs mount and then iterates over the entries, filtering out those that have a different namespace tag.

So it turns out that the results of iterating over /sys/class/net depend on the network namespace of the process that mounted /sys, not on the network namespace of the current process. To see the correct results, you must therefore always mount sysfs from within the network namespace you are interested in (any process belonging to that namespace can do the mount).
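To check whether the sysfs mount you are looking at belongs to your current network namespace, you can compare it against /proc/net/dev, which does follow the network namespace of the reading process. The helpers below are a rough sketch; the function names are mine, and the optional path arguments exist only so the functions can be tried against sample files.

```shell
#!/bin/sh
# List interface names from a /proc/net/dev-style file
# (default: /proc/net/dev, which reflects the reader's network namespace).
proc_ifaces() {
    # Drop the two header lines; the name is the text before the colon.
    sed -e '1,2d' -e 's/^ *//' -e 's/:.*$//' "${1:-/proc/net/dev}" | sort
}

# List interface names from a sysfs class/net directory
# (default: /sys/class/net). Only symlinks are devices; regular files
# such as bonding_masters are skipped.
sysfs_ifaces() {
    dir=${1:-/sys/class/net}
    for entry in "$dir"/*; do
        if [ -L "$entry" ]; then
            basename "$entry"
        fi
    done | sort
}

# Succeed if both views list the same interfaces, i.e. the sysfs mount
# matches the network namespace of this shell.
sysfs_matches_netns() {
    [ "$(proc_ifaces "${1:-/proc/net/dev}")" = "$(sysfs_ifaces "${2:-/sys/class/net}")" ]
}
```

After nsenter -t $PID -n as in the question, sysfs_matches_netns || echo "stale sysfs mount" should report the mismatch, since /proc/net/dev only lists lo while /sys/class/net still lists the interfaces of the root namespace.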

How namespaces work in Linux

Every process has reference files for its namespaces in /proc/<pid>/ns/. Additionally, ip netns creates persistent reference files in /run/netns/. These files are used with the setns system call to move the calling thread into the namespace that such a file points to.

From the shell, you can enter another namespace using the nsenter program, passing the namespace files (paths) as arguments.
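The reference files are also handy for checking whether two processes share a network namespace: each one is a symbolic link whose target names the namespace type and inode number, and processes in the same namespace have the same target. A minimal sketch follows; netns_id, same_netns and the PROC_ROOT override are names invented for this example.

```shell
#!/bin/sh
# Print the network namespace identifier of a process,
# e.g. "net:[4026531992]".
# PROC_ROOT is a hypothetical override, only there to make this testable.
netns_id() {
    readlink "${PROC_ROOT:-/proc}/$1/ns/net"
}

# Succeed if two processes are in the same network namespace.
# Fails if either reference file cannot be read.
same_netns() {
    a=$(netns_id "$1") && b=$(netns_id "$2") && [ "$a" = "$b" ]
}
```

For example, same_netns $$ "$PID" && echo "same namespace". Reading the link of a process owned by another user normally requires root.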

A good overview of Linux namespaces is given in the Namespaces in operation article series on LWN.net.

Setting up namespaces

When you set up multiple namespaces (mount, pid, user, etc.), set up the network namespace as early as possible, before altering the mount and pid namespaces. Once the mount or pid namespace is no longer shared with the outside, you have no way to refer to a network namespace out there, because you cannot see the files that reference it.

If you need more flexibility than the command line utilities provide, you need to use the system calls to manage namespaces directly from your program. For documentation, see the relevant man pages: man 2 setns, man 2 unshare and man 7 namespaces.
