Linux – why is CAP_SYS_ADMIN needed for CLONE_NEWPID

capabilities linux namespace process security

man 2 unshare tells us

Use of CLONE_NEWPID requires the CAP_SYS_ADMIN capability

and the suggested further reading, man 7 pid_namespaces, does not really discuss the presumed risk that makes it necessary to restrict PID namespaces to root/CAP_SYS_ADMIN only.

What would the risk of CLONE_NEWPID be if run by a non-root user?

In a clone without CLONE_NEWPID the pid_namespace would be unchanged and hence much broader and potentially more dangerous than it would be in the case of creating a new empty pid_namespace.

Sadly, without some concept of user PID namespaces for a non-root user, keeping track of descendant processes reliably in Linux becomes difficult. pid_namespaces would be very handy functionality, and thus it is incomprehensible to me why only CAP_SYS_ADMIN is thought fit to run CLONE_NEWPID. Did I miss a major point that makes CLONE_NEWPID such risky business?

Best Answer

I think it's a precaution. Unprivileged users are not allowed to apply confinements to set-user-ID programs such as sudo (or to programs with file capabilities set), in case the unexpected environment confuses those programs into performing actions they were never meant to allow.

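In some cases this is enforced the other way round, by preventing any later elevation through set-uid etc. This is the approach taken when filtering system calls with seccomp: an unprivileged process must set no_new_privs before it can install a filter.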

For namespaces, however, the intention was very much to allow namespacing user IDs. Namespaces were merged into mainline Linux in an incremental process, starting with the simplest and culminating in user namespaces. I suspect there was little interest in adding a special case that would enforce no-new-privs when a process without full privilege enters a PID namespace.

The interaction of these namespaces becomes quite intricate, so it's nice not to proliferate too many different cases, if those cases are not in very high demand.
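
To make the role of user namespaces concrete, here is a minimal sketch that probes unshare(2) from a throwaway child process. It assumes a Linux system with a libc that exports unshare; the helper names try_unshare and describe are made up for this illustration. On kernels that permit unprivileged user namespaces, requesting CLONE_NEWUSER together with CLONE_NEWPID usually succeeds for an ordinary user, while CLONE_NEWPID alone is refused with EPERM. Many distributions and container sandboxes restrict user namespaces, so the second call may well fail on your machine too.

```python
import ctypes
import errno
import os

# Flag values from <sched.h> on Linux.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000


def try_unshare(flags):
    """Attempt unshare(flags) in a forked child.

    Returns 0 on success, the errno value on failure, or -1 if the
    child did not exit normally. The calling process is never moved
    into a new namespace.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        code = 255  # fallback if something unexpected happens in the child
        try:
            rc = libc.unshare(flags)
            code = 0 if rc == 0 else ctypes.get_errno()
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def describe(result):
    """Turn a try_unshare() result into a short label."""
    return "ok" if result == 0 else errno.errorcode.get(result, str(result))


if __name__ == "__main__":
    print("CLONE_NEWPID alone:          ",
          describe(try_unshare(CLONE_NEWPID)))
    print("CLONE_NEWUSER | CLONE_NEWPID:",
          describe(try_unshare(CLONE_NEWUSER | CLONE_NEWPID)))
```

Running the probe in a child keeps the experiment from affecting the calling process, since a successful unshare(CLONE_NEWPID) changes the namespace in which later children are created.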

Related Solutions

Linux – Suggestions needed to debug why ps -ef gets stuck

Had that just yesterday. The problem was, one process was in "uninterruptible sleep" state, shown as status D in top. ls /proc/ does not return and is not abortable. ps -ef does not return and is not abortable.

If rebooting does not help, you probably have a bad sector on your DVD or hard disk, and the process is trying to read from it during startup. So technically rebooting helps, but the error re-occurs automatically.

Check with top whether the process is indeed in status D, then go on from there. Boot the computer without starting this process (rescue system). Then run the program under strace and see which files it accesses. I bet one file has bad sectors.
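
If top itself is awkward to use, the same check can be scripted. The sketch below assumes that /proc can still be listed and that reading /proc/<pid>/stat does not block on your system (it usually does not, unlike the cmdline file that ps reads, but treat that as an assumption). The function names are made up for this illustration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up here on every run, rather than briefly during heavy I/O, is the likely culprit.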

Why are not all permitted capabilities of a linux process effective all the time

The difference between effective/permitted capabilities is similar to the difference between real/effective UIDs in setuid programs. The idea isn't to stop a rogue app from escalating privs (you wouldn't grant them privs in the first place, same as you wouldn't setuid them) but to allow a program to run with minimal privileges and only escalate where necessary. This helps minimize the impact of bugs.

A very contrived example: I want to have a program that will let me send a SIGHUP to processes owned by the user or to allow a God user to send SIGHUP to init.

This program has the CAP_KILL capability set on the file.

The pseudo code might look something like:

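drop_effective CAP_KILL
repeat forever:
  get_process_id_from_user
  if process_id == 1 and user_is_God:
    set_effective CAP_KILL      # raise the privilege only for this one call
    kill(1, SIGHUP)
    drop_effective CAP_KILL
  else:
    kill(process_id, SIGHUP)    # sent without CAP_KILL in the effective set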

The obvious bug here is that I never check whether the user is allowed to send the signal in the first place. But because I've dropped CAP_KILL from the effective set, the kernel still applies its normal permission check, so I won't be allowing the user to signal processes other than their own.

Very contrived, for sure! But the idea is to run as far as possible with "least privileges" and only enable privileges when necessary.

Now this won't necessarily protect against buffer overflow attacks, because the injected code could enable permitted privileges. Capability-aware code should therefore also drop permitted privileges once they are no longer needed; e.g. a webserver might drop CAP_NET_BIND_SERVICE after it has bound to port 80. You can't enable something that is not in your permitted set!
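
To see the two sets for a real process, you can decode the CapPrm and CapEff lines of /proc/<pid>/status, which hold hexadecimal bitmasks. The following is a small sketch with made-up helper names; the bit numbers for CAP_KILL and CAP_NET_BIND_SERVICE are the ones defined in <linux/capability.h>.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_KILL = 5
CAP_NET_BIND_SERVICE = 10

CAP_SET_KEYS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text):
    """Extract the capability bitmasks from /proc/<pid>/status text."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_SET_KEYS:
            sets[key] = int(value.strip(), 16)
    return sets


def has_cap(mask, cap):
    """True if capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


def read_own_cap_sets(path="/proc/self/status"):
    """Read the capability sets of the current process (Linux only)."""
    with open(path) as fh:
        return parse_cap_sets(fh.read())


if __name__ == "__main__":
    caps = read_own_cap_sets()
    for name in ("CapPrm", "CapEff"):
        mask = caps.get(name, 0)
        print(name,
              "CAP_KILL" if has_cap(mask, CAP_KILL) else "-",
              "CAP_NET_BIND_SERVICE"
              if has_cap(mask, CAP_NET_BIND_SERVICE) else "-")
```

A program that keeps a capability permitted but not effective shows the bit in CapPrm only; once it drops the capability from the permitted set as well, the bit disappears from both masks.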
