Here's the method I followed to understand this problem. The available tools appear usable (with some convolution) for the namespace part, and (UPDATED) /sys/ makes it easy to get the peer's index. It's quite long, so bear with me. It's in two parts (not in the logical order, but doing namespaces first helps explain the index naming), using common tools, not any custom program:
- Network namespace
- Interface index
Network namespace
This information is available with the property link-netnsid in the output of ip link, and can be matched with the id in the output of ip netns. It's possible to "associate" a container's network namespace with ip netns, thus using ip netns as a specialized tool. Of course writing a dedicated program for this would be better (some information about syscalls is given at the end of each part).
About the nsid's description, here's what man ip netns says (emphasis mine):
ip netns set NAME NETNSID - assign an id to a peer network namespace
This command assigns a id to a peer network namespace. This id is valid only in the current network namespace. This id will be used by
the kernel in some netlink messages. If no id is assigned when the
kernel needs it, it will be automatically assigned by the kernel. Once
it is assigned, it's not possible to change it.
While creating a namespace with ip netns won't immediately create a netnsid, one will be created (in the current namespace, probably the "host") whenever a veth half is moved to another namespace. So it's always set for a typical container.
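This can be verified directly (a quick sketch; nsdemo and the veth names are arbitrary):
# ip netns add nsdemo
# ip link add vethX type veth peer name vethY netns nsdemo
# ip -o link show vethX | grep -o 'link-netnsid [0-9]*'
The link-netnsid property shows up on vethX as soon as its peer lives in nsdemo.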
Here's an example using an LXC container:
# lxc-start -n stretch-amd64
A new veth link veth9RPX4M appeared (this can be tracked with ip monitor link). Here is the detailed information:
# ip -o link show veth9RPX4M
44: veth9RPX4M@if43: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master lxcbr0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether fe:25:13:8a:00:f8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
This link has the property link-netnsid 4, telling us the other side is in the network namespace with nsid 4. How to verify it's the LXC container? The easiest way to get this information is to make ip netns believe it created the container's network namespace, by performing the operations hinted at in the manpage.
# mkdir -p /var/run/netns
# touch /var/run/netns/stretch-amd64
# mount -o bind /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net /var/run/netns/stretch-amd64
UPDATE3: I didn't realize that finding back the namespace's global name (the net:[inode] identifier) was part of the problem. Here it is:
# ls -l /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net
lrwxrwxrwx. 1 root root 0 mai 5 20:40 /proc/17855/ns/net -> net:[4026532831]
# stat -c %i /var/run/netns/stretch-amd64
4026532831
Now the information is retrieved with:
# ip netns | grep stretch-amd64
stretch-amd64 (id: 4)
It confirms the veth's peer is in the network namespace with the same nsid = 4 = link-netnsid.
The container / ip netns "association" can be removed (without removing the namespace, as long as the container is running):
# ip netns del stretch-amd64
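The same steps, parametrized by container name, can be wrapped in a tiny helper (a sketch only; it assumes an LXC container and that lxc-info is available):
#!/bin/sh
name="$1"                                        # container name, e.g. stretch-amd64
pid=$(lxc-info -H -p -n "$name")                 # PID of the container's init
mkdir -p /var/run/netns
touch "/var/run/netns/$name"
mount -o bind "/proc/$pid/ns/net" "/var/run/netns/$name"
ip netns | grep "$name"                          # shows the nsid, e.g. "stretch-amd64 (id: 4)"
ip netns del "$name"                             # drop the association again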
Note: nsid numbering is per network namespace; it usually starts at 0 for the first container, and the lowest available value is recycled for new namespaces.
About using syscalls, here is information guessed from strace:
- for the link part: it requires an AF_NETLINK socket (opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)), asking (sendmsg()) for the link's information with a message of type RTM_GETLINK and retrieving (recvmsg()) the reply with message type RTM_NEWLINK.
- for the netns nsid part: same method, the query message is of type RTM_GETNSID, with reply type RTM_NEWNSID.
I think the slightly higher-level library to handle this is libnl. Anyway, it's a topic for SO.
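A rough way to watch these message types go by is to trace ip itself (a sketch; how much of the netlink traffic gets decoded depends on the strace version):
# strace -e trace=sendmsg,recvmsg ip -o link show veth9RPX4M 2>&1 | grep -o 'RTM_[A-Z]*' | sort -u
# strace -e trace=sendmsg,recvmsg ip netns list 2>&1 | grep -o 'RTM_[A-Z]*' | sort -u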
Interface index
Now it will be easier to follow why the indexes appear to behave randomly. Let's do an experiment:
First enter a new net namespace to have a clean (index) slate:
# ip netns add test
# ip netns exec test bash
# ip netns id
test
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
As OP noted, lo begins with index 1.
Let's add 5 net namespaces, create veth pairs, then put one veth end in each of them:
# for i in {0..4}; do ip netns add test$i; ip link add type veth peer netns test$i ; done
# ip -o link|sed 's/^/ /'
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
When @if2 is displayed for each of them, it becomes quite clear that it is the interface index in the peer's namespace, and that indexes are not global but per namespace. When an actual interface name is displayed, it refers to an interface in the same namespace (be it a veth peer, bridge, bond, ...). So why doesn't veth0 have a peer displayed? I believe it's an ip link display quirk when the peer's index is the same as the interface's own index. Moving the peer link twice "solves" it here, because it forces an index change. I'm also sure ip link sometimes gets confused in other ways and, instead of displaying @ifXX, displays an interface in the current namespace with the same index.
# ip -n test0 link set veth0 name veth0b netns test
# ip link set veth0b netns test0
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
UPDATE: rereading the information in OP's question, the peer's index (but not its nsid) is easily and unambiguously available with cat /sys/class/net/<interface>/iflink.
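For example, still inside namespace test from the experiment above:
# cat /sys/class/net/veth0/iflink
7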
UPDATE2:
All those iflink values of 2 may appear ambiguous, but what is unique is the combination of nsid and iflink, not iflink alone. For the above example that is:
interface nsid:iflink
veth0 0:7
veth1 1:2
veth2 2:2
veth3 3:2
veth4 4:2
In this namespace (namely namespace test) there will never be two identical nsid:iflink pairs.
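The table above can be rebuilt with a small loop combining the link-netnsid shown by ip link with the iflink value from /sys (a sketch, run inside namespace test):
# for i in veth0 veth1 veth2 veth3 veth4; do echo "$i $(ip -o link show $i | sed -n 's/.*link-netnsid \([0-9]*\).*/\1/p'):$(cat /sys/class/net/$i/iflink)"; done
It prints exactly the nsid:iflink pairs listed above.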
If one were to look at the opposite information from each peer namespace:
namespace interface nsid:iflink
test0 veth0 0:2
test1 veth0 0:3
test2 veth0 0:4
test3 veth0 0:5
test4 veth0 0:6
But bear in mind that each of those 0: values is a separate 0 that happens to map to the same peer namespace (namely namespace test, not even the host). They can't be directly compared because they're tied to their own namespace. So the complete comparable and unique information should be:
test0:0:2
test1:0:3
test2:0:4
test3:0:5
test4:0:6
Once it's confirmed that "test0:0" == "test1:0" etc. (true in this example: they all map to the net namespace called test by ip netns), then they can really be compared.
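One way to confirm it (a sketch; it relies on ip netns list showing ids as seen from the namespace it runs in):
# for ns in test0 test1 test2 test3 test4; do printf '%s sees: ' "$ns"; ip netns exec "$ns" ip netns list | grep '(id: 0)'; done
Each line should name the same namespace, test, as the owner of nsid 0.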
About syscalls, still looking at strace results, the information is retrieved as above with RTM_GETLINK. Now all the information should be available:
- local: interface index with SIOCGIFINDEX / if_nametoindex()
- peer: both nsid and interface index with RTM_GETLINK.
All this should probably be done with libnl.
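In shell terms, without writing a netlink program (INTERFACE is a placeholder):
# cat /sys/class/net/INTERFACE/ifindex    # local interface index
# cat /sys/class/net/INTERFACE/iflink     # peer's interface index
# ip -o link show INTERFACE               # includes link-netnsid, the peer's nsid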
Both systemd-nspawn and ip-netns use namespaces, specifically network namespaces. The difference, as explained in the ip-netns manual, is that ip-netns deals with named network namespaces.
By convention a named network namespace is an object at /var/run/netns/NAME that can be opened. The file descriptor resulting from opening /var/run/netns/NAME refers to the specified network namespace. Holding that file descriptor open keeps the network namespace alive.
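For example, ip netns add creates exactly such an object (a quick check; the name demo is arbitrary):
# ip netns add demo
# ls -l /var/run/netns/demo    # the file whose open fd refers to the namespace
# ip netns del demo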
Anonymous network namespaces
The namespaces(7) manual explains that in general, a namespace is an abstraction associated with the lifetime of the processes in it:
Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2) ... Opening one of the files in this directory (or a file that is bind mounted to one of these files) returns a file handle for the corresponding namespace of the process specified by pid. As long as this file descriptor remains open, the namespace will remain alive, even if all processes in the namespace terminate.
On my system, the most recently launched systemd process (pgrep -f -n systemd\$) is the init process of a container started using the default systemd-nspawn@.service template unit, which enables --network-veth and thus --private-network (it also adds --private-users). This command shows that the container's anonymous network namespace is different to the root network namespace, and owned by the container's root user:
# ls -l /proc/1/ns/net /proc/$(pgrep -f -n systemd\$)/ns/net
lrwxrwxrwx 0 root /proc/1/ns/net -> net:[4026532008]
lrwxrwxrwx 0 vu-container-0 /proc/700/ns/net -> net:[4026532656]
This anonymous network namespace disappears when the container is terminated. However, if I want to make it a named network namespace that can be managed with ip-netns during the life of the container, I can bind mount it under /run/netns:
# mount --bind /proc/$(pgrep -f -n systemd\$)/ns/net /run/netns/container
# ip netns list
container (id: 1)
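If relying on pgrep seems fragile, the container's leader PID can also be looked up via machinectl (a sketch; <machine> stands for whatever name systemd-nspawn registered for the container):
# pid=$(machinectl show --property=Leader --value <machine>)
# touch /run/netns/container
# mount --bind /proc/$pid/ns/net /run/netns/container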
Creating named network namespaces with systemd
You've also pointed out systemd-nspawn's --network-namespace-path option, which is equivalent to the NetworkNamespacePath= setting documented in systemd.exec(5). It can only assign containers and units to a network namespace that already exists. Because a process can only be in one namespace, --network-namespace-path is incompatible with options like --private-network, which create an anonymous network namespace and isolate the container in it.
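For example, a container can be started directly in the named namespace created above (a sketch; the image directory is a placeholder):
# systemd-nspawn -D /var/lib/machines/mycontainer --network-namespace-path=/run/netns/container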
It seems that systemd will get a Namespace= setting in some future release after v246 (v245 was released in March 2020). This will allow units to create their own named network namespaces, rather than being assigned to an existing namespace with NetworkNamespacePath= or creating a new anonymous namespace with PrivateNetwork=. When this feature is merged, it would make sense for Namespace=%i to be added to the systemd-nspawn@.service template, so that containers' network namespaces are named by default.
Best Answer
Binding applications to a specific IP address is a notoriously difficult problem: not all applications are as accommodating as ssh, which lets you specify the IP address to bind to with the -b option. Firefox and Chrome, for instance, are notoriously impervious to this.
Luckily, there is a solution: this guy has written a bind.so shared object that allows one to specify the binding address on the command line, as follows:
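A typical invocation looks roughly like this (a sketch; the exact environment variable name depends on the bind.so variant you use):
$ BIND_ADDR=192.168.1.100 LD_PRELOAD=./bind.so firefox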
By preloading the bind shared object, you bypass the system version, which chooses the interface to bind to differently.
This is a heck of a lot easier and lighter on system resources than running multiple network namespaces simultaneously.
The web page above gives both instructions on how to compile the module and a link to pre-compiled 32- and 64-bit versions.
(Just for reference: I know you are not interested, but the code can be easily modified to force binding to a specific port).
EDIT:
I completely forgot that games would most likely use UDP, while the trick above only works for TCP connections. I am leaving my answer in place, in the hope of helping someone with TCP problems of this sort, but as an answer to Timmos this is completely useless.
To make up for my mistake, I am passing you a (very simple!) script I wrote which sets up one of (possibly many) network namespaces.
It assumes your main interface is called eth0 (if yours is called differently, change the single reference to it accordingly), and uses macvlan interfaces, which means you can use the script only with an ethernet connection. Also, it does not need to use bridges.
You start/stop a separate network namespace by running the script (I call the script nns, but you can call it whatever you like).
You can have as many different network namespaces as your local DHCP server allows, since each macvlan interface gets an IP address from your LAN DHCP server. If a network namespace with the same name already exists, you will have to pick a different name.
All network namespaces can talk to each other, courtesy of the mode bridge option in their creation command. The script opens an xterm terminal in the new network namespace (I like xterm, if you do not you can change that at the top of the script) so that, from within the xterm you can start your applications.
I left the debugging option, set -x, in the script, which may help you iron out any initial problems. When done, just remove that line.
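For reference, here is a rough sketch of what such a start script can look like, based only on the description above (it is not the original script; it assumes eth0 as the main interface, dhclient as the DHCP client, and xterm installed; the stop path would essentially be ip netns del plus removing the macvlan interface):
#!/bin/bash
set -x                                   # debugging aid, remove when everything works
ns="$1"                                  # name of the network namespace to create
ip netns add "$ns"
ip link add link eth0 name mv-"$ns" type macvlan mode bridge
ip link set mv-"$ns" netns "$ns"
ip netns exec "$ns" ip link set lo up
ip netns exec "$ns" ip link set mv-"$ns" up
ip netns exec "$ns" dhclient mv-"$ns"    # get an address from the LAN DHCP server
ip netns exec "$ns" xterm &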
Cheers.