Linux – Switching Network Namespace Doesn’t Change /sys/class/net

Tags: linux, namespace, network-namespaces

The Linux man page network_namespaces(7) says:

Network namespaces provide isolation of the system resources associated with networking: […], the /sys/class/net directory, […].

However, simply switching into a different network namespace doesn't seem to change the contents of /sys/class/net (see below for how to reproduce). Am I mistaken in thinking that a setns() into the network namespace is sufficient on its own? Is it always necessary to remount /sys to get a /sys/class/net that matches the currently joined network namespace? Or am I missing something else?

Example to Reproduce

Take an *ubuntu system, find the PID of the rtkit-daemon, enter the daemon's network namespace, show its network interfaces, and then check /sys/class/net:

$ PID=$(sudo lsns -t net -n -o PID,COMMAND | awk '/rtkit-daemon/ { print $1; exit }')
$ sudo nsenter -t $PID -n
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# ls /sys/class/net
docker0  enp3s0  lo  lxcbr0  ...

Note that while ip link show correctly shows only lo, /sys/class/net still lists all network interfaces visible in the "root" network namespace (and "root" mount namespace).

For rtkit-daemon, additionally entering its mount namespace makes no difference: after sudo nsenter -t $PID -n -m, ls /sys/class/net still shows network interfaces that are not present in the network namespace.
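To rule out that nsenter simply failed to switch namespaces, the namespace identity can be compared via the /proc/PID/ns/net symbolic links. A small sketch follows; the function names and the PROC_ROOT override are hypothetical (they exist only so the helper can be tried against a fake directory tree), and reading the link of another user's process generally requires root.

```shell
# Print the network namespace identifier of a process, e.g. "net:[4026531992]".
# PROC_ROOT is a hypothetical override for experimenting; it defaults to /proc.
netns_id() {
    readlink "${PROC_ROOT:-/proc}/$1/ns/net"
}

# Succeed (status 0) if both PIDs are in the same network namespace,
# fail with status 1 if they differ, and with status 2 if a link is unreadable.
same_netns() {
    a=$(netns_id "$1") || return 2
    b=$(netns_id "$2") || return 2
    [ "$a" = "$b" ]
}
```

Run as root inside the nsenter shell, same_netns "$$" 1 should return a non-zero status, assuming PID 1 lives in the root network namespace.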

"Fix"

Many thanks to @Danila Kiver for explaining what is really going on behind the scenes in the Linux kernel. Mounting a fresh sysfs instance while the correct network namespace is joined shows the correct entries under its class/net directory:

$ PID=$(sudo lsns -t net -n -o PID,COMMAND | awk '/rtkit-daemon/ { print $1; exit }')
$ sudo nsenter -t $PID -n
# MNT=`mktemp -d`
# mount -t sysfs none $MNT
# ls $MNT/class/net/
lo
# umount $MNT
# rmdir $MNT
# exit

So the freshly mounted sysfs now yields the correct results in its class/net directory.
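The temporary mount point can be avoided by giving the command its own mount namespace, so that a fresh sysfs can be mounted over /sys without affecting the rest of the system. The following is a minimal sketch rather than a recipe tested on every distribution: the function name netns_sys_ls is made up for illustration, and it assumes a util-linux unshare that makes mounts private in the new mount namespace by default.

```shell
# Hypothetical helper: list /sys/class/net as seen from the network
# namespace of the process with the given PID.
netns_sys_ls() {
    # nsenter -n  : join the target's network namespace
    # unshare -m  : create a private mount namespace for the command
    # mount sysfs : the new sysfs instance is tagged with the joined
    #               network namespace and hides the old /sys only
    #               inside this private mount namespace
    sudo nsenter -t "$1" -n unshare -m \
        sh -c 'mount -t sysfs none /sys && ls /sys/class/net'
}
```

For example, netns_sys_ls "$PID" should then list only the interfaces of that process's network namespace (just lo in the rtkit-daemon case above), and the mount disappears when the command exits.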

Best Answer

Let's look into man 5 sysfs:

/sys/class/net
    Each of the entries in this directory is a symbolic link representing
    one of the real or virtual networking devices that are visible in
    the network namespace of the process that is accessing the directory.

So, according to this man page, the output of ls /sys/class/net should depend on the network namespace of the ls process. However, the actual behavior does not match this description. There is good kernel documentation on how it really works.

Each sysfs mount has a namespace tag associated with it. This tag is set when sysfs gets mounted and depends on the network namespace of the calling process. Each sysfs entry (e.g. an entry in /sys/class/net) may also have a namespace tag associated with it.

When you iterate over a sysfs directory, the kernel obtains the namespace tag of the sysfs mount and then iterates over the entries, filtering out those that have a different namespace tag.
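The filtering described above can be pictured with a toy model. This is only an illustration of the idea, not how the kernel implements it: list_visible, the name:tag notation, and the tags rootns and otherns are all made up for this sketch, and treating untagged entries as always visible is an assumption.

```shell
# Toy model: the first argument is the tag of the "mount", the remaining
# arguments are directory entries written as "name:tag" (or just "name"
# for an entry without a tag).
list_visible() {
    mount_tag=$1
    shift
    for entry in "$@"; do
        case $entry in
            *:*)
                name=${entry%%:*}   # text before the first colon
                tag=${entry#*:}     # text after the first colon
                # Show the entry only if its tag equals the mount's tag.
                [ "$tag" = "$mount_tag" ] && printf '%s\n' "$name"
                ;;
            *)
                # Assumption: untagged entries are never filtered out.
                printf '%s\n' "$entry"
                ;;
        esac
    done
    return 0
}
```

With these made-up values, list_visible otherns lo:rootns enp3s0:rootns lo:otherns prints only lo, while passing rootns as the first argument prints lo and enp3s0. The namespace of whoever calls the function plays no role, which mirrors the observation in the question.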

So the results of iterating over /sys/class/net depend on the network namespace of the process that mounted /sys, not on the network namespace of the current process. You must therefore always mount sysfs from within the current network namespace (from any process belonging to that namespace) to see the correct results.
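A quick way to detect a stale sysfs mount is to compare /sys/class/net with /proc/net/dev, which does follow the network namespace of the reading process. The helper names below are hypothetical, and the sketch assumes that every interface appears as a symbolic link in /sys/class/net and as a "name:" line after the two header lines of /proc/net/dev.

```shell
# Interface names from a /proc/net/dev-style file (skips the two header lines).
ifaces_from_proc() {
    awk -F: 'NR > 2 { gsub(/[ \t]/, "", $1); print $1 }' "${1:-/proc/net/dev}" | sort
}

# Interface names from a /sys/class/net-style directory (symbolic links only,
# so plain files such as bonding_masters are ignored).
ifaces_from_sysfs() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        [ -L "$path" ] && basename "$path"
    done | sort
}

# Succeed only if both views list exactly the same interface names.
# Arguments (both optional): proc file, sysfs directory.
sysfs_matches_netns() {
    [ "$(ifaces_from_proc "$1")" = "$(ifaces_from_sysfs "$2")" ]
}
```

Running sysfs_matches_netns without arguments inside the nsenter shell from the question should return a non-zero status as long as /sys still comes from the root network namespace, and zero once sysfs has been mounted from within the joined namespace.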