Why does `systemd-nspawn -n` network namespace not show in `ip netns list`


tl;dr Linux has namespaces, in particular network namespaces. The namespace supposedly created by the -n flag of systemd-nspawn does not show up in ip netns list (neither on the host nor inside the supposedly created namespace). Is either systemd-nspawn or ip netns not actually dealing with Linux namespaces (which I understood to be this: https://lwn.net/Articles/531114/#series_index)?

longer story:
I use the following command to run a "light-weight container" of Arch Linux from within my Arch Linux:

systemd-nspawn -nbUD /mntpointArchLinuxSysFs

The data at /mntpointArchLinuxSysFs has been bootstrapped and "runs/boots" well. man systemd-nspawn tells me that the -n flag means:

-n, --network-veth

Create a virtual Ethernet link ("veth") between host and container. The host side of the Ethernet link will be available as a network interface named after the container's name (as specified with --machine=), prefixed with "ve-". The container side of the Ethernet link will be named "host0". The --network-veth option implies --private-network.

In turn, the implied --private-network is explained thus

--private-network

Disconnect networking of the container from the host. This makes all network interfaces unavailable in the container, with the exception of the loopback device and those specified with --network-interface= and configured with --network-veth. If this option is specified, the CAP_NET_ADMIN capability will be added to the set of capabilities the container retains. The latter may be disabled by using --drop-capability=. If this option is not specified (or implied by one of the options listed below), the container will have full access to the host network.

This seems to be achieved via Linux namespaces, in particular Linux network namespaces: the started processes (i.e. the init of the container at /mntpointArchLinuxSysFs/bin/init and all of its child processes) are in a different network namespace, i.e. they have a private network, and the veth (virtual Ethernet pair) is their only remaining connection to the host namespace/system.

Using lsns shows that indeed systemd-nspawn created a namespace

root@host$> lsns | grep net
4026531992 net       183     1 root     /sbin/init
4026532332 net         1   824 rtkit    /usr/lib/rtkit-daemon
4026532406 net         7  4697 vu-mnt-0 /usr/lib/systemd/systemd

However, ip netns list refuses to "play along":

root@host$> ip netns list
root@host$>

Then, if I create a dummy namespace via ip netns for the sake of understanding, like this:

root@host$> ip netns add dummy_netns
root@host$> ip netns list
dummy_netns
root@host$>

That network namespace is displayed, but ironically it is missing from the lsns output.

In conclusion, it is unclear to me how the term "network namespace" is used by systemd-nspawn and by ip netns, since my tests seem to suggest they might not really be the same thing. Maybe the term is ambiguous?

update

However, this part of the systemd-nspawn man page suggests, imho, that iproute and systemd-nspawn do indeed refer to the same thing when they talk about network namespaces.

--network-namespace-path=
Takes the path to a file representing a kernel network namespace
that the container shall run in. The specified path should refer to
a (possibly bind-mounted) network namespace file, as exposed by the
kernel below /proc/$PID/ns/net. This makes the container enter the
given network namespace. One of the typical use cases is to give a
network namespace under /run/netns created by ip-netns(8), for
example, --network-namespace-path=/run/netns/foo. Note that this
option cannot be used together with other network-related options,
such as --private-network or --network-interface=.

Still, the last part, stating that it cannot be used together with the --private-network option, again seems to suggest some sort of distinction. What is going on here?

Best Answer

Both systemd-nspawn and ip-netns use namespaces, specifically network namespaces. The difference, as explained in the ip-netns manual, is that ip-netns deals with named network namespaces.

By convention a named network namespace is an object at /var/run/netns/NAME that can be opened. The file descriptor resulting from opening /var/run/netns/NAME refers to the specified network namespace. Holding that file descriptor open keeps the network namespace alive.

Anonymous network namespaces

The namespaces(7) manual explains that in general, a namespace is an abstraction associated with the lifetime of the processes in it:

Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2) ... Opening one of the files in this directory (or a file that is bind mounted to one of these files) returns a file handle for the corresponding namespace of the process specified by pid. As long as this file descriptor remains open, the namespace will remain alive, even if all processes in the namespace terminate.
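As a rough illustration of the passage above, the sketch below reads the /proc/[pid]/ns/net link to see which network namespace a process is in and to compare two processes. The function names netns_of and same_netns are made up for this example, and the script assumes a Linux system with /proc mounted. Inspecting processes that belong to other users generally requires root.

```shell
#!/bin/sh
# Print the network namespace identifier of a process, e.g. "net:[...]".
# $1 is a PID, or "self" for the process doing the lookup.
# (netns_of is a hypothetical helper name, not part of any tool.)
netns_of() {
    readlink "/proc/$1/ns/net"
}

# Succeed (exit 0) if both processes are in the same network namespace,
# fail with 1 if they differ, and with 2 if either lookup fails.
same_netns() {
    a=$(netns_of "$1") || return 2
    b=$(netns_of "$2") || return 2
    [ "$a" = "$b" ]
}

# Compare the lookup via "self" with the lookup via this shell's own PID.
if same_netns self "$$"; then
    echo "same network namespace"
else
    echo "different network namespaces"
fi
```

Run as root with PID 1 and the PID of a container's init process as arguments, same_netns would be expected to fail for a container started with --private-network, since that container has its own namespace.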

On my system, the most recently launched systemd process (pgrep -f -n systemd\$) is the init process of a container started using the default systemd-nspawn@.service template unit, which enables --network-veth and thus --private-network (it also adds --private-users). This command shows that the container's anonymous network namespace is different to the root network namespace, and owned by the container's root user:

# ls -l /proc/1/ns/net /proc/$(pgrep -f -n systemd\$)/ns/net
lrwxrwxrwx 0 root           /proc/1/ns/net -> net:[4026532008]
lrwxrwxrwx 0 vu-container-0 /proc/700/ns/net -> net:[4026532656]

This anonymous network namespace disappears when the container is terminated. However, if I want to make it a named network namespace that can be managed with ip-netns during the life of the container, I can bind mount it under /run/netns:

# mount --bind /proc/$(pgrep -f -n systemd\$)/ns/net /run/netns/container
# ip netns list
container (id: 1)
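To see why the bind mount is enough to make the namespace "named", it can help to compare device and inode numbers: an entry under /run/netns and a /proc/[pid]/ns/net link refer to the same namespace when both match. The sketch below does roughly what ip netns identify PID reports, under some assumptions: netns_names_of is a hypothetical helper, the optional second argument (a directory to search instead of /run/netns) exists only for this example, and stat is assumed to support -L and -c as GNU coreutils does.

```shell
#!/bin/sh
# Print the names of entries in a netns directory (default /run/netns)
# that refer to the same network namespace as process $1.
# (netns_names_of is a hypothetical helper name for illustration.)
netns_names_of() {
    pid=$1
    dir=${2:-/run/netns}
    # -L dereferences the /proc link, so we get the device and inode
    # of the namespace object itself rather than of the symlink.
    want=$(stat -L -c '%d:%i' "/proc/$pid/ns/net") || return 1
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$f" ] || continue
        if [ "$(stat -L -c '%d:%i' "$f" 2>/dev/null)" = "$want" ]; then
            basename "$f"
        fi
    done
}

# List the names (if any) that exist for the current network namespace.
netns_names_of self
```

On a host where no namespace has been named, this prints nothing, which matches the empty ip netns list output in the question. After the bind mount above, calling it as root with the container's init PID should print the name given to the mount point.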

Creating named network namespaces with systemd

You've also pointed out systemd-nspawn's --network-namespace-path option, which is equivalent to the NetworkNamespacePath= setting documented in systemd.unit(5). It can only assign containers and units to a network namespace that already exists. Because a process can only be in one namespace, --network-namespace-path is incompatible with options like --private-network which create an anonymous network namespace and isolate the container in it.

It seems that systemd will get a Namespace= setting in some future release of systemd after v246 (v245 was released in March 2020). This will allow units to create their own named network namespaces, rather than being assigned to an existing namespace with NetworkNamespacePath= or creating a new anonymous namespace with PrivateNetwork=. When this feature is merged, it would make sense for Namespace=%i to be added to the systemd-nspawn@.service template, so that containers' network namespaces are named by default.
