I can't seem to match processes running in cgroup v2 hierarchies with the cgroup module of iptables. I am running Linux 4.13.0 with all required modules:
$ grep CGROUP <kernel_config>
CONFIG_CGROUPS=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
**CONFIG_NETFILTER_XT_MATCH_CGROUP=m**
CONFIG_NET_CLS_CGROUP=m
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
$ lsmod | grep cgroup
xt_cgroup 16384 2
x_tables 36864 7 xt_LOG,xt_cgroup,iptable_mangle,ip_tables,iptable_filter,xt_mark,ipt_MASQUERADE
It's a Debian-based distro with systemd-235, which mounts the following cgroups:
$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
If I work with cgroup v1 and net_cls, all is fine:
$ cd /sys/fs/cgroup/net_cls,net_prio/
$ mkdir test
$ echo 1 > test/net_cls.classid
$ iptables -A OUTPUT -m cgroup --cgroup 1 -j LOG
$ ping -i 2 google.com &>/dev/null &
$ pgrep ping > test/tasks
I can see the packets in the log. Doing the same with cgroup v2 successfully adds the iptables rules but does not match:
$ cd /sys/fs/cgroup/unified/
$ mkdir test
$ iptables -A OUTPUT -m cgroup --path test -j LOG
$ ping -i 2 google.com &>/dev/null &
$ pgrep ping > test/cgroup.procs
The process is running inside this cgroup:
$ cat /proc/<pid>/cgroup
0::/test
and iptables did not complain about an invalid cgroup path, but nothing shows up in the log.
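For reference, the path given to --path is the cgroup v2 path of the process with its leading slash removed. Below is a small sketch for reading it out of text in the /proc/<pid>/cgroup format; cgroup2_rel_path is a hypothetical helper name, not part of iptables or systemd.

```shell
# Hypothetical helper: read text in the /proc/<pid>/cgroup format on stdin
# and print the cgroup v2 path without its leading slash, which is the form
# passed to --path in the rule above.
cgroup2_rel_path() {
    # The cgroup v2 entry has hierarchy ID 0 and an empty controller list,
    # so its line starts with "0::". All other lines are ignored.
    sed -n 's|^0::/||p'
}

# Usage sketch (replace <pid> with a real PID):
#   cgroup2_rel_path < /proc/<pid>/cgroup
```

For the process shown above this would print test.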
Background
I need to run a tor relay outside my VPN, which is used for all packets leaving my LAN. I followed the approach outlined in this answer and it works great (with cgroup v1). The problem is that I didn't find a straightforward way to create a custom cgroup at boot (cgmanager fails to start due to apparent lack of cgroup v2 support) and to assign the tor process to it (how to do it inside a systemd service?). But systemd does create a separate cgroup inside the unified cgroup v2 hierarchy for every service, so the tor process lives in system.slice/system-tor.slice. As shown by the simple example above, iptables can't seem to match this traffic.
Best Answer
Part of the answer to your question is in my answer that you linked:
Well, iptables matches sometimes in this case, like in your cgroup v1 log rule.
Still, iptables seems to always match for the moved process children, as they are immediately created with the right cgroup. So a solution is to start a new shell, move the shell in the cgroup, and run the desired command in this new shell:
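A minimal sketch of that idea, assuming the cgroup directory already exists and the caller is allowed to write to its cgroup.procs file. The function name run_in_cgroup2 is hypothetical, and this is not the script referenced in this answer.

```shell
# Hypothetical helper, not the cgexec replacement script mentioned here.
# $1 = existing cgroup v2 directory, e.g. /sys/fs/cgroup/unified/test
# remaining arguments = command to run inside that cgroup
run_in_cgroup2() {
    cgdir=$1
    shift
    # Start a new shell, move that shell into the cgroup by writing its own
    # PID to cgroup.procs, then launch the command from that shell so the
    # command starts out inside the cgroup.
    sh -c 'echo "$$" > "$1/cgroup.procs" || exit 1; shift; "$@"' sh "$cgdir" "$@"
}

# Usage sketch, mirroring the example from the question:
#   run_in_cgroup2 /sys/fs/cgroup/unified/test ping -i 2 google.com
```

The command is launched only after the shell has been moved, so it never runs outside the cgroup, which is the property the quoted text relies on.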
That's indeed what this cgexec replacement script for cgroup v2 does. You may need to edit the script to replace the CGBASE variable value with /sys/fs/cgroup/unified (get the correct path for your environment with mount -t cgroup2).
EDIT: Updated novpn.sh to support cgroups v2 with the -2 flag.
But is it supposed to work?
I'm a bit surprised that this answer for cgroup v2 actually works given this issue - more in the Notes of this page.
Which means the net_cls controller is bound to cgroup v1 (otherwise its hierarchy ID would be 0), but iptables still works with the cgroup v2 parameter. As I understand it, the net_cls network controller is just a cgroup v1 concept that was replaced by the cgroup v2 cgroup namespace. So it seems we can use both iptables cgroup v1 and iptables cgroup v2 rules at the same time if the OS supports both cgroup v1 and v2.
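The hierarchy ID mentioned here is the second column of /proc/cgroups. A small sketch for looking it up follows; controller_hierarchy is a hypothetical helper name, and any value it prints depends on your system.

```shell
# Hypothetical helper: read text in the /proc/cgroups format on stdin
# (columns: subsys_name, hierarchy, num_cgroups, enabled) and print the
# hierarchy ID of the controller named in $1.
# A value of 0 means the controller is not attached to a cgroup v1 hierarchy.
controller_hierarchy() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage sketch:
#   controller_hierarchy net_cls < /proc/cgroups
```

A non-zero result for net_cls corresponds to the situation described above, where the controller is still bound to cgroup v1.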
Background notes on running services in a network control group:
Except for Fedora 31, which switched to cgroup v2 by default, most distributions still use cgroup v1 by default at this time.
cgmanager is indeed not needed and I recently removed it from the requirements in the answer you linked. cgmanager is deprecated and was dropped in bionic, in favor of systemd's own cgroup management implementation. Unfortunately, systemd maintainers have dropped the NetClass option for cgroup v1, because they focus on cgroup v2.
So with cgroup v1, it becomes tricky to run services in a network control group because you need to do all these steps BEFORE the desired service main process (e.g. tor relay, apache executable, whatever) gets executed, without any help from systemd, which is the service launcher:
This might be possible with the systemd unit service initialization script. Otherwise, cgconfig could be used, see this question/answer for Ubuntu, but I'd stay away from cgrulesengd as it may interfere with systemd.