The Linux man page network_namespaces(7) says:
Network namespaces provide isolation of the system resources associated with networking: […], the /sys/class/net directory, […].
However, simply switching into a different network namespace doesn't seem to change the contents of /sys/class/net (see below for how to reproduce). Am I mistaken in thinking that a setns() into the network namespace is already sufficient? Is it always necessary to remount /sys in order to get a /sys/class/net that matches the currently joined network namespace? Or am I missing something else here?
Example to Reproduce
Take an *ubuntu system, find the PID of the rtkit-daemon, enter the daemon's network namespace, show its network interfaces, and then check /sys/class/net:
$ PID=`sudo lsns -t net -n -o PID,COMMAND | grep rtkit-daemon | cut -d ' ' -f 2`
$ sudo nsenter -t $PID -n
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# ls /sys/class/net
docker0 enp3s0 lo lxcbr0 ...
Please notice that while ip link show correctly shows only lo, /sys/class/net shows all network interfaces visible in the "root" network namespace (and "root" mount namespace).
In the case of rtkit-daemon, also entering its mount namespace doesn't make a difference: after sudo nsenter -t $PID -n -m, ls /sys/class/net still shows network interfaces that are not present in the network namespace.
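To make the mismatch easier to spot than by comparing the two listings by eye, a small helper can diff them. This is only a sketch: stale_sysfs_ifaces is a hypothetical name, and it assumes the one-line format of ip -o link show (index, name, flags separated by ": ", with an optional @peer suffix on the name).

```shell
#!/bin/sh
# Hypothetical helper: print every name found in a sysfs net directory
# that does NOT appear in `ip -o link show` output supplied on stdin.
# Assumption: each stdin line looks like "1: lo: <LOOPBACK,...> ...".
stale_sysfs_ifaces() {
    sysfs_net_dir=${1:-/sys/class/net}
    # Field 2 is the interface name; strip a trailing "@peer" if present.
    ip_names=$(awk -F': ' '{ sub(/@.*/, "", $2); print $2 }' | sort -u)
    ls "$sysfs_net_dir" | sort -u | while read -r name; do
        # Print names that sysfs lists but ip does not know about.
        printf '%s\n' "$ip_names" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

Inside the joined namespace it could be used as ip -o link show | stale_sysfs_ifaces /sys/class/net. Any output means the sysfs view does not match the namespace (non-interface entries in that directory, if there are any, would be reported too).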
"Fix"
Many kudos to @Danila Kiver for explaining what is really going on behind the scenes in the Linux kernel. Mounting sysfs again while the correct network namespace is joined shows the correct entries under class/net of the new mount:
$ PID=`sudo lsns -t net -n -o PID,COMMAND | grep rtkit-daemon | cut -d ' ' -f 2`
$ sudo nsenter -t $PID -n
# MNT=`mktemp -d`
# mount -t sysfs none $MNT
# ls $MNT/class/net/
lo
# umount $MNT
# rmdir $MNT
# exit
So a sysfs instance mounted from inside the network namespace yields the correct class/net contents.
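A variant of the same idea avoids the temporary directory by mounting sysfs directly over /sys inside a private mount namespace, so the host's /sys is left alone. This is a sketch under assumptions: netns_sysfs_ls is a hypothetical helper name, it has to run as root, and it relies on unshare -m making mount propagation private (the default in current util-linux releases).

```shell
#!/bin/sh
# Sketch: list /sys/class/net as seen from the network namespace of a PID.
# Must be run as root. netns_sysfs_ls is a hypothetical helper name.
netns_sysfs_ls() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: netns_sysfs_ls PID" >&2; return 2 ;;
    esac
    # nsenter -n joins the target's network namespace; unshare -m then
    # creates a new mount namespace for the child shell. The sysfs mounted
    # over /sys there is tagged with the joined network namespace and goes
    # away together with the child's mount namespace.
    nsenter -t "$1" -n unshare -m sh -c \
        'mount -t sysfs none /sys && ls /sys/class/net'
}
```

Called as root with the PID found above (netns_sysfs_ls "$PID"), this should list only the interfaces of that namespace, and no umount or rmdir cleanup is needed afterwards.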
Best Answer
Let's look into man 5 sysfs:
So, according to this manpage, the output of ls /sys/class/net must depend on the network namespace of the ls process. But the actual behavior does not seem to be as described in this manpage. There is nice kernel documentation about how it works.
Each sysfs mount has a namespace tag associated with it. This tag is set when sysfs gets mounted and depends on the network namespace of the calling process. Each sysfs entry (e.g. an entry in /sys/class/net) may also have a namespace tag associated with it.
When you iterate over a sysfs directory, the kernel obtains the namespace tag of the sysfs mount and then iterates over the entries, filtering out those that have a different namespace tag.
So, it turns out that the results of iterating over /sys/class/net depend on the network namespace of the process that mounted /sys rather than on the network namespace of the current process. Thus, you must always mount /sys in the current network namespace (from any process belonging to this namespace) to see the correct results.