Here's the method I followed to understand this problem. The available tools appear usable (with some convolution) for the namespace part, and (UPDATED) /sys/ makes it easy to get the peer's index. It's quite long, so bear with me. It's in two parts (not in the logical order, but doing namespaces first helps explain the index naming), using common tools, not any custom program:
- Network namespace
- Interface index
Network namespace
This information is available with the property link-netnsid in the output of ip link, and can be matched with the id in the output of ip netns. It's possible to "associate" a container's network namespace with ip netns, thus using ip netns as a specialized tool. Of course writing a dedicated program for this would be better (some information about syscalls is given at the end of each part).
About the nsid's description, here's what man ip netns says (emphasis mine):
ip netns set NAME NETNSID - assign an id to a peer network namespace
This command assigns a id to a peer network namespace. This id is valid only in the current network namespace. This id will be used by
the kernel in some netlink messages. If no id is assigned when the
kernel needs it, it will be automatically assigned by the kernel. Once
it is assigned, it's not possible to change it.
While creating a namespace with ip netns won't immediately create a netnsid, one will be created (in the current namespace, probably the "host") whenever a veth half is moved to another namespace. So it's always set for a typical container.
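This can be verified directly (a quick sketch; nsdemo and the veth names are arbitrary):
# ip netns add nsdemo
# ip link add vethX type veth peer name vethY netns nsdemo
# ip -o link show vethX | grep -o 'link-netnsid [0-9]*'
The link-netnsid property shows up on vethX as soon as its peer lives in nsdemo.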
Here's an example using an LXC container:
# lxc-start -n stretch-amd64
A new veth link veth9RPX4M appeared (this can be tracked with ip monitor link). Here is the detailed information:
# ip -o link show veth9RPX4M
44: veth9RPX4M@if43: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master lxcbr0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether fe:25:13:8a:00:f8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
This link has the property link-netnsid 4, telling us the other side is in the network namespace with nsid 4. How to verify it's the LXC container? The easiest way to get this information is to make ip netns believe it created the container's network namespace, by performing the operations hinted at in the manpage.
# mkdir -p /var/run/netns
# touch /var/run/netns/stretch-amd64
# mount -o bind /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net /var/run/netns/stretch-amd64
UPDATE3: I didn't realize that finding back the namespace's global name (the net:[inode] identifier) was part of the problem. Here it is:
# ls -l /proc/$(lxc-info -H -p -n stretch-amd64)/ns/net
lrwxrwxrwx. 1 root root 0 mai 5 20:40 /proc/17855/ns/net -> net:[4026532831]
# stat -c %i /var/run/netns/stretch-amd64
4026532831
Now the information is retrieved with:
# ip netns | grep stretch-amd64
stretch-amd64 (id: 4)
It confirms the veth's peer is in the network namespace with the same nsid = 4 = link-netnsid.
The container / ip netns "association" can be removed (without removing the namespace, as long as the container is running):
# ip netns del stretch-amd64
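The same steps, parametrized by container name, can be wrapped in a tiny helper (a sketch only; it assumes an LXC container and that lxc-info is available):
#!/bin/sh
name="$1"                                        # container name, e.g. stretch-amd64
pid=$(lxc-info -H -p -n "$name")                 # PID of the container's init
mkdir -p /var/run/netns
touch "/var/run/netns/$name"
mount -o bind "/proc/$pid/ns/net" "/var/run/netns/$name"
ip netns | grep "$name"                          # shows the nsid, e.g. "stretch-amd64 (id: 4)"
ip netns del "$name"                             # drop the association again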
Note: nsid numbering is per network namespace; it usually starts at 0 for the first container, and the lowest available value is recycled for new namespaces.
About using syscalls, here is information guessed from strace:
- for the link part: it requires an AF_NETLINK socket (opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)), asking (sendmsg()) for the link's information with a message of type RTM_GETLINK and retrieving (recvmsg()) the reply with message type RTM_NEWLINK.
- for the netns nsid part: same method, the query message is of type RTM_GETNSID, with reply type RTM_NEWNSID.
I think the slightly higher-level library to handle this is libnl. Anyway, it's a topic for SO.
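A rough way to watch these message types go by is to trace ip itself (a sketch; how much of the netlink traffic gets decoded depends on the strace version):
# strace -e trace=sendmsg,recvmsg ip -o link show veth9RPX4M 2>&1 | grep -o 'RTM_[A-Z]*' | sort -u
# strace -e trace=sendmsg,recvmsg ip netns list 2>&1 | grep -o 'RTM_[A-Z]*' | sort -u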
Interface index
Now it will be easier to follow why the indexes appear to behave randomly. Let's do an experiment:
First enter a new net namespace to have a clean (index) slate:
# ip netns add test
# ip netns exec test bash
# ip netns id
test
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
As OP noted, lo begins with index 1.
Let's add 5 net namespaces, create veth pairs, then put one veth end in each of them:
# for i in {0..4}; do ip netns add test$i; ip link add type veth peer netns test$i ; done
# ip -o link|sed 's/^/ /'
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
When @if2 is displayed for each of them, it becomes quite clear that it is the interface index in the peer's namespace, and that indexes are not global but per namespace. When an actual interface name is displayed, it refers to an interface in the same namespace (be it a veth peer, bridge, bond, ...). So why doesn't veth0 have a peer displayed? I believe it's an ip link display quirk when the peer's index is the same as the interface's own index. Moving the peer link twice "solves" it here, because it forces an index change. I'm also sure ip link sometimes gets confused in other ways and, instead of displaying @ifXX, displays an interface in the current namespace with the same index.
# ip -n test0 link set veth0 name veth0b netns test
# ip link set veth0b netns test0
# ip -o link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:83:4f:60:5a:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
3: veth1@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 22:a7:75:8e:3c:95 brd ff:ff:ff:ff:ff:ff link-netnsid 1
4: veth2@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether 72:94:6e:e4:2c:fc brd ff:ff:ff:ff:ff:ff link-netnsid 2
5: veth3@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether ee:b5:96:63:62:de brd ff:ff:ff:ff:ff:ff link-netnsid 3
6: veth4@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\ link/ether e2:7d:e2:9a:3f:6d brd ff:ff:ff:ff:ff:ff link-netnsid 4
UPDATE: rereading the information in OP's question, the peer's index (but not its nsid) is easily and unambiguously available with cat /sys/class/net/<interface>/iflink.
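For example, still inside namespace test from the experiment above:
# cat /sys/class/net/veth0/iflink
7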
UPDATE2:
All those iflink values of 2 may appear ambiguous, but what is unique is the combination of nsid and iflink, not iflink alone. For the above example that is:
interface nsid:iflink
veth0 0:7
veth1 1:2
veth2 2:2
veth3 3:2
veth4 4:2
In this namespace (namely namespace test) there will never be two identical nsid:iflink pairs.
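The table above can be rebuilt with a small loop combining the link-netnsid shown by ip link with the iflink value from /sys (a sketch, run inside namespace test):
# for i in veth0 veth1 veth2 veth3 veth4; do echo "$i $(ip -o link show $i | sed -n 's/.*link-netnsid \([0-9]*\).*/\1/p'):$(cat /sys/class/net/$i/iflink)"; done
It prints exactly the nsid:iflink pairs listed above.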
If one were to look at the opposite information from each peer namespace:
namespace interface nsid:iflink
test0 veth0 0:2
test1 veth0 0:3
test2 veth0 0:4
test3 veth0 0:5
test4 veth0 0:6
But bear in mind that each of those 0: values is a separate 0 that happens to map to the same peer namespace (namely namespace test, not even the host). They can't be directly compared because they're tied to their own namespace. So the complete comparable and unique information should be:
test0:0:2
test1:0:3
test2:0:4
test3:0:5
test4:0:6
Once it's confirmed that "test0:0" == "test1:0" etc. (true in this example: they all map to the net namespace called test by ip netns), then they can really be compared.
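One way to confirm it (a sketch; it relies on ip netns list showing ids as seen from the namespace it runs in):
# for ns in test0 test1 test2 test3 test4; do printf '%s sees: ' "$ns"; ip netns exec "$ns" ip netns list | grep '(id: 0)'; done
Each line should name the same namespace, test, as the owner of nsid 0.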
About syscalls, still looking at strace results, the information is retrieved as above with RTM_GETLINK. Now all the information should be available:
- local: interface index with SIOCGIFINDEX / if_nametoindex()
- peer: both nsid and interface index with RTM_GETLINK.
All this should probably be done with libnl.
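In shell terms, without writing a netlink program (INTERFACE is a placeholder):
# cat /sys/class/net/INTERFACE/ifindex    # local interface index
# cat /sys/class/net/INTERFACE/iflink     # peer's interface index
# ip -o link show INTERFACE               # includes link-netnsid, the peer's nsid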
Both systemd-nspawn and ip-netns use namespaces, specifically network namespaces. The difference, as explained in the ip-netns manual, is that ip-netns deals with named network namespaces.
By convention a named network namespace is an object at /var/run/netns/NAME that can be opened. The file descriptor resulting from opening /var/run/netns/NAME refers to the specified network namespace. Holding that file descriptor open keeps the network namespace alive.
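For example, ip netns add creates exactly such an object (a quick check; the name demo is arbitrary):
# ip netns add demo
# ls -l /var/run/netns/demo    # the file whose open fd refers to the namespace
# ip netns del demo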
Anonymous network namespaces
The namespaces(7) manual explains that in general, a namespace is an abstraction associated with the lifetime of the processes in it:
Each process has a /proc/[pid]/ns/ subdirectory containing one entry for each namespace that supports being manipulated by setns(2) ... Opening one of the files in this directory (or a file that is bind mounted to one of these files) returns a file handle for the corresponding namespace of the process specified by pid. As long as this file descriptor remains open, the namespace will remain alive, even if all processes in the namespace terminate.
On my system, the most recently launched systemd process (pgrep -f -n systemd\$) is the init process of a container started using the default systemd-nspawn@.service template unit, which enables --network-veth and thus --private-network (it also adds --private-users). This command shows that the container's anonymous network namespace is different to the root network namespace, and owned by the container's root user:
# ls -l /proc/1/ns/net /proc/$(pgrep -f -n systemd\$)/ns/net
lrwxrwxrwx 0 root /proc/1/ns/net -> net:[4026532008]
lrwxrwxrwx 0 vu-container-0 /proc/700/ns/net -> net:[4026532656]
This anonymous network namespace disappears when the container is terminated. However, if I want to make it a named network namespace that can be managed with ip-netns during the life of the container, I can bind mount it under /run/netns:
# mount --bind /proc/$(pgrep -f -n systemd\$)/ns/net /run/netns/container
# ip netns list
container (id: 1)
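If relying on pgrep seems fragile, the container's leader PID can also be looked up via machinectl (a sketch; <machine> stands for whatever name systemd-nspawn registered for the container):
# pid=$(machinectl show --property=Leader --value <machine>)
# touch /run/netns/container
# mount --bind /proc/$pid/ns/net /run/netns/container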
Creating named network namespaces with systemd
You've also pointed out systemd-nspawn's --network-namespace-path option, which is equivalent to the NetworkNamespacePath= setting documented in systemd.exec(5). It can only assign containers and units to a network namespace that already exists. Because a process can only be in one namespace, --network-namespace-path is incompatible with options like --private-network, which create an anonymous network namespace and isolate the container in it.
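For example, a container can be started directly in the named namespace created above (a sketch; the image directory is a placeholder):
# systemd-nspawn -D /var/lib/machines/mycontainer --network-namespace-path=/run/netns/container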
It seems that systemd will get a Namespace= setting in some future release after v246 (v245 was released in March 2020). This will allow units to create their own named network namespaces, rather than being assigned to an existing namespace with NetworkNamespacePath= or creating a new anonymous namespace with PrivateNetwork=. When this feature is merged, it would make sense for Namespace=%i to be added to the systemd-nspawn@.service template, so that containers' network namespaces are named by default.
Best Answer
Binding applications to a specific IP address is a notoriously difficult problem: not all applications are as accommodating as ssh, which lets you specify the IP address to bind to with the -b option. Firefox and Chrome, for instance, are notoriously impervious to this.
Luckily, there is a solution: this guy has written a bind.so shared object that allows one to specify the binding address on the command line, as follows:
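A typical invocation looks roughly like this (a sketch; the exact environment variable name depends on the bind.so variant you use):
$ BIND_ADDR=192.168.1.100 LD_PRELOAD=./bind.so firefox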
By preloading the bind shared object, you bypass the system version, which chooses the interface to bind to differently.
This is a heck of a lot easier and lighter on system resources than running multiple network namespaces simultaneously.
The web page above gives both instructions on how to compile the module and a link to pre-compiled 32- and 64-bit versions.
(Just for reference: I know you are not interested, but the code can be easily modified to force binding to a specific port).
EDIT:
I completely forgot that games would most likely use UDP, while the trick above only works for TCP connections. I am leaving my answer in place, in the hope of helping someone with TCP problems of this sort, but as an answer to Timmos this is completely useless.
To make up for my mistake, I am passing you a (very simple!) script I wrote which sets up one of (possibly many) network namespaces.
It assumes your main interface is called eth0 (if yours is called differently, change the single reference to it accordingly), and uses macvlan interfaces, which means you can use the script only with an ethernet connection. Also, it does not need to use bridges.
You start/stop a separate network namespace by running the script (I call the script nns, but you can call it whatever you like).
You can have as many different network namespaces as your local DHCP server allows, since each macvlan interface gets an IP address from your LAN DHCP server. If a network namespace with the same name already exists, you will have to pick a different name.
All network namespaces can talk to each other, courtesy of the mode bridge option in their creation command. The script opens an xterm terminal in the new network namespace (I like xterm, if you do not you can change that at the top of the script) so that, from within the xterm you can start your applications.
I left the debugging option, set -x, in the script, which may help you iron out any initial problems. When done, just remove that line.
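For reference, here is a rough sketch of what such a start script can look like, based only on the description above (it is not the original script; it assumes eth0 as the main interface, dhclient as the DHCP client, and xterm installed; the stop path would essentially be ip netns del plus removing the macvlan interface):
#!/bin/bash
set -x                                   # debugging aid, remove when everything works
ns="$1"                                  # name of the network namespace to create
ip netns add "$ns"
ip link add link eth0 name mv-"$ns" type macvlan mode bridge
ip link set mv-"$ns" netns "$ns"
ip netns exec "$ns" ip link set lo up
ip netns exec "$ns" ip link set mv-"$ns" up
ip netns exec "$ns" dhclient mv-"$ns"    # get an address from the LAN DHCP server
ip netns exec "$ns" xterm &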
Cheers.