Linux Capabilities with User Namespaces

Tags: capabilities, linux, linux-kernel, namespace

I'm confused about how file capabilities work with regard to user namespaces. As I understand it, if a file has a capability set, then any process that executes the file can acquire that capability.

On my ping binary, the CAP_NET_RAW capability is set, and the setuid bit is not.

# CAP_NET_RAW is set
→ getcap `which ping`                                               
/bin/ping = cap_net_raw+ep

# There is no setuid
→ ls -l `which ping`                                                 
-rwxr-xr-x 1 root root 64424 Mar  9  2017 /bin/ping

# ping works...
→ ping -c 1 google.com                                               
PING google.com (172.217.6.46) 56(84) bytes of data.                                                  
64 bytes from sfo03s08-in-f14.1e100.net (172.217.6.46): icmp_seq=1 ttl=54 time=11.9 ms

--- google.com ping statistics ---                 
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 11.973/11.973/11.973/0.000 ms  

So why can't I ping from inside my user namespace, even though the shell there has CAP_NET_RAW and the file capability is still set?

→ ping google.com      
ping: socket: Operation not permitted 

→ capsh --print        
Current: = ...cap_net_raw...+ep                                                         
Bounding set =...cap_net_raw...                                                       
Securebits: 00/0x0/1'b0                            
secure-noroot: no (unlocked)                      
secure-no-suid-fixup: no (unlocked)               
secure-keep-caps: no (unlocked)                   
uid=0(root)                                        
gid=0(root)                                        

→ getcap `which ping`  
/bin/ping = cap_net_raw+ep      

Best Answer

The child process created by clone(2) with the CLONE_NEWUSER flag starts out with a complete set of capabilities in the new user namespace. Likewise, a process that creates a new user namespace using unshare(2) or joins an existing user namespace using setns(2) gains a full set of capabilities in that namespace. On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of unshare(2) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).

...

When a non-user-namespace is created, it is owned by the user namespace in which the creating process was a member at the time of the creation of the namespace. Actions on the non-user-namespace require capabilities in the corresponding user namespace.

-- http://man7.org/linux/man-pages/man7/user_namespaces.7.html
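To check the first point of the quote for yourself, you can decode the effective capability mask that the kernel reports in /proc/<pid>/status. This is only a minimal sketch, assuming a mounted Linux /proc; the helper name has_cap_net_raw is made up for illustration, and bit 13 is the value of CAP_NET_RAW in linux/capability.h:

```shell
# Hypothetical helper: succeeds if bit 13 (CAP_NET_RAW) is set in a hex
# capability mask such as the CapEff value from /proc/<pid>/status.
has_cap_net_raw() {
    mask=$((0x$1))
    [ $(( (mask >> 13) & 1 )) -eq 1 ]
}

# Read the effective set of the current shell ($$) and report the result.
if [ -r "/proc/$$/status" ]; then
    cap_eff=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")
    if has_cap_net_raw "$cap_eff"; then
        echo "CAP_NET_RAW is in the effective set (relative to the current user namespace)"
    else
        echo "CAP_NET_RAW is not in the effective set"
    fi
fi
```

Inside a shell started with unshare -r this should report the capability as present, matching the capsh output in the question. The mask only describes capabilities relative to the process's own user namespace, which is why ping can still fail.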

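You can't get raw network access to interfaces you don't own! After `unshare -r` you are still in the original network namespace, which is owned by the initial user namespace, and you have no capabilities there. With `-n` you also get a new network namespace that is owned by your new user namespace, so your CAP_NET_RAW applies to it.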

$ unshare -r
# ping -c1 127.0.0.1
ping: socket: Operation not permitted

Compare:

$ unshare -rn
# ping -c1 127.0.0.1
connect: Network is unreachable
# ip link set dev lo up    # apparently the `lo` interface is pre-created.
# ping -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.048 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms
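The difference between the two runs comes down to which user namespace owns the network namespace the shell is in. One rough way to observe this is to compare the namespace identifiers the kernel exposes under /proc. This sketch assumes /proc is mounted and the kernel has user namespaces enabled; the ns_ids helper is hypothetical:

```shell
# Hypothetical helper: print the identifiers of the user and network
# namespaces that the calling shell belongs to.
ns_ids() {
    printf 'user=%s net=%s\n' \
        "$(readlink /proc/self/ns/user)" \
        "$(readlink /proc/self/ns/net)"
}

ns_ids    # identifiers of the shell you started from
```

If you run the same two readlink calls in a shell started with `unshare -r`, the user identifier should change while the net identifier stays the same. With `unshare -rn` both should change, and that is the case in which ping is allowed to open its socket.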