Non-Root User Virtualization – How to Achieve the Effect of Chroot in Userspace in Linux

Tags: chroot, jails, not-root-user, virtualization

The goal is to install and run programs in a displaced (relocated) distro (whose / must not coincide with the global /) inside a host Linux system. The programs themselves are not adapted to run under a different /.

fakechroot is not a complete solution because it relies on library substitution instead of acting at the level of system calls (so it does not work for statically linked binaries).

Best Answer

The solution must probably be based either on ptrace or namespaces (unshare).

ptrace-based solutions are probably less efficient than namespaces/unshare-based ones (but the latter technology is cutting-edge and probably not a well-explored path).

ptrace-based

UMView

As for ptrace-based solutions, thanks to the comments at https://stackoverflow.com/a/1019720/94687, I've discovered UMView:

The linked docs describe how to have a "copy-on-write view" of the host fs -- that's not exactly like performing a chroot. Exact instructions on how to achieve /-substitution in umview would be nice to have in an answer to my question (please write one if you figure out how to do this!).

umview must be open-source, because it is included in Ubuntu and Debian -- http://packages.ubuntu.com/lucid/umview.

"Confining programs"

Another implementation is described in http://www.cs.vu.nl/~rutger/publications/jailer.pdf, http://www.cs.vu.nl/~guido/mansion/publications/ps/secrypt07.pdf.

They have a change-root-ing policy rule, CHRDIR, whose effect is similar to chroot. (Section "The jailing policy")

However, they might not have published their source code (partially based on a modified strace http://www.liacs.nl/~wichert/strace/ -- Section "Implementation")...

geordi

Geordi (http://www.eelis.net/geordi/, https://github.com/Eelis/geordi) could probably be modified to make the wanted rewriting of file arguments to system calls in the jailed programs.

proot

PRoot (http://proot.me/) is a ready-to-use ptrace-based tool for this. From its documentation:

chroot equivalent

To execute a command inside a given Linux distribution, just give proot the path to the guest rootfs followed by the desired command. The example below executes the program cat to print the content of a file:
proot -r /mnt/slackware-8.0/ cat /etc/motd

Welcome to Slackware Linux 8.0

The default command is /bin/sh when none is specified. Thus the shortest way to confine an interactive shell and all its sub-programs is:
proot -r /mnt/slackware-8.0/

$ cat /etc/motd
Welcome to Slackware Linux 8.0
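
A guest rootfs unpacked on disk usually has empty /proc and /dev directories, so programs that expect them may fail under a plain `proot -r`. The following is only a sketch of a wrapper that adds host bind mounts through PRoot's `-b` option. The function name `proot_run` and the `DRY_RUN` variable are made up for illustration, and the choice of /proc and /dev as binds is an assumption, not something the quoted documentation prescribes.

```shell
#!/bin/sh
# Sketch only: proot_run and DRY_RUN are hypothetical names.

proot_run() {
    [ "$#" -ge 1 ] || { echo "usage: proot_run ROOTFS [COMMAND...]" >&2; return 2; }
    rootfs=$1
    shift

    # -r selects the guest rootfs; each -b makes a host path visible inside it.
    # With no COMMAND, proot falls back to /bin/sh, as the quoted docs say.
    set -- proot -r "$rootfs" -b /proc -b /dev "$@"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show the command line instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Demo in a subshell so DRY_RUN does not leak: prints the command only.
(DRY_RUN=1; proot_run /mnt/slackware-8.0/ cat /etc/motd)
```

Without `DRY_RUN=1` the same call runs `cat /etc/motd` inside the guest, provided proot is installed and the rootfs exists at that path.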

unshare-based

user_namespaces support in the Linux kernel has matured since the question was asked. Now you can perform a chroot as a normal user with the help of unshare, as described in "Simulate chroot with unshare":

unshare --user --map-root-user --mount-proc --pid --fork
chroot ......
su - user1
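
The unshare and chroot steps above can also be combined into a single invocation, since unshare accepts a command to run inside the new namespaces. This is a minimal sketch under assumptions: `userns_chroot`, `DRY_RUN`, and the path /srv/rootfs are hypothetical, and the `su - user1` step is left out. Note that `--mount-proc` remounts /proc before the chroot happens, so the rootfs may still need its own /proc.

```shell
#!/bin/sh
# Sketch only: userns_chroot and DRY_RUN are hypothetical names.

userns_chroot() {
    [ "$#" -ge 1 ] || { echo "usage: userns_chroot ROOTFS [COMMAND...]" >&2; return 2; }
    rootfs=$1
    shift
    # Default to a shell when no command is given.
    [ "$#" -gt 0 ] || set -- /bin/sh

    # Same unshare flags as in the answer; chroot then runs as the
    # namespace-local root that --map-root-user provides.
    set -- unshare --user --map-root-user --mount-proc --pid --fork \
        chroot "$rootfs" "$@"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show the command line instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Demo in a subshell so DRY_RUN does not leak: prints the command only.
(DRY_RUN=1; userns_chroot /srv/rootfs)
```

Running it for real requires a kernel that lets unprivileged users create user namespaces, and a rootfs whose files the calling user is able to read.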

virtual machines/OS

(the answer mentioning virtual machines/OS)

kernel extension (like SELinux)

(mentioned in comments here),

chroot

Chroot-based helpers (which however must be setUID root, because chroot requires root; or perhaps chroot could work in an isolated namespace--see below):

[to tell a little more about them!]

Known chroot-based isolation tools:

hasher with its hsh-run and hsh-shell commands. (Hasher was designed for building software in a safe and repeatable manner.)
schroot mentioned in another answer
...

ptrace

Another trustworthy isolation solution (besides a seccomp-based one) would be the complete syscall-interception through ptrace, as explained in the manpage for fakeroot-ng:

Unlike previous implementations, fakeroot-ng uses a technology that leaves the traced process no choice regarding whether it will use fakeroot-ng's "services" or not. Compiling a program statically, directly calling the kernel and manipulating ones own address space are all techniques that can be trivially used to bypass LD_PRELOAD based control over a process, and do not apply to fakeroot-ng. It is, theoretically, possible to mold fakeroot-ng in such a way as to have total control over the traced process.

While it is theoretically possible, it has not been done. Fakeroot-ng does assume certain "nicely behaved" assumptions about the process being traced, and a process that break those assumptions may be able to, if not totally escape then at least circumvent some of the "fake" environment imposed on it by fakeroot-ng. As such, you are strongly warned against using fakeroot-ng as a security tool. Bug reports that claim that a process can deliberatly (as opposed to inadvertly) escape fakeroot-ng's control will either be closed as "not a bug" or marked as low priority.

It is possible that this policy be rethought in the future. For the time being, however, you have been warned.

Still, as you can read, fakeroot-ng itself is not designed for this purpose.

(BTW, I wonder why they have chosen to use the seccomp-based approach for Chromium rather than a ptrace-based...)

Of the tools not mentioned above, I have noted Geordi for myself, because I liked that the controlling program is written in Haskell.

Known ptrace-based isolation tools:

Geordi
proot
fakeroot-ng
... (see also How to achieve the effect of chroot in userspace in Linux (without being root)?)

seccomp

One known way to achieve isolation is the seccomp sandboxing approach used in Google Chromium. But this approach requires you to write a helper that processes some (the allowed ones) of the "intercepted" file accesses and other syscalls, and, of course, to make the effort to "intercept" the syscalls and redirect them to the helper. That might even mean replacing the intercepted syscalls in the code of the controlled process, so it doesn't sound simple; if you are interested, you'd better read the details rather than just my answer.

More related info (from Wikipedia):

http://en.wikipedia.org/wiki/Seccomp
http://code.google.com/p/seccompsandbox/wiki/overview
LWN article: Google's Chromium sandbox, Jake Edge, August 2009
seccomp-nurse, a sandboxing framework based on seccomp.

(The last item seems to be interesting if one is looking for a general seccomp-based solution outside of Chromium. There is also a blog post worth reading from the author of "seccomp-nurse": SECCOMP as a Sandboxing solution ?.)

The illustration of this approach from the "seccomp-nurse" project:

[Image: illustration from the "seccomp-nurse" project]

A "flexible" seccomp possible in the future of Linux?

In 2009 there were also suggestions to patch the Linux kernel to give the seccomp mode more flexibility, so that "many of the acrobatics that we currently need could be avoided". ("Acrobatics" refers to the complications of writing a helper that has to execute many possibly innocent syscalls on behalf of the jailed process and of substituting the possibly innocent syscalls in the jailed process.) An LWN article wrote on this point:

One suggestion that came out was to add a new "mode" to seccomp. The API was designed with the idea that different applications might have different security requirements; it includes a "mode" value which specifies the restrictions that should be put in place. Only the original mode has ever been implemented, but others can certainly be added. Creating a new mode which allowed the initiating process to specify which system calls would be allowed would make the facility more useful for situations like the Chrome sandbox.

Adam Langley (also of Google) has posted a patch which does just that. The new "mode 2" implementation accepts a bitmask describing which system calls are accessible. If one of those is prctl(), then the sandboxed code can further restrict its own system calls (but it cannot restore access to system calls which have been denied). All told, it looks like a reasonable solution which could make life easier for sandbox developers.

That said, this code may never be merged because the discussion has since moved on to other possibilities.

This "flexible seccomp" would bring the possibilities of Linux closer to providing the desired feature in the OS, without the need to write helpers that complicated.

(A blog posting with basically the same content as this answer: http://geofft.mit.edu/blog/sipb/33.)

namespaces (`unshare`)

Isolating through namespaces (unshare-based solutions) -- not mentioned here -- e.g., unsharing mount-points (combined with FUSE?) could perhaps be part of a working solution if you want to confine the filesystem accesses of your untrusted processes.

More on namespaces, now that their implementation has been completed (this isolation technique is also known under the name "Linux Containers", or "LXC", isn't it?..):

"One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization (as well as other purposes)".

It's even possible to create a new user namespace, so that "a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace".
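
Whether an unprivileged process may create a user namespace at all depends on the running kernel and its configuration. As a rough sketch, the per-user limit exposed in /proc/sys/user/max_user_namespaces (present on reasonably recent kernels) can be checked before trying. The helper name `userns_allowed` is hypothetical, and a positive limit is only a necessary condition: some distributions add their own switches on top of it.

```shell
#!/bin/sh
# Sketch only: userns_allowed is a hypothetical helper.
# It succeeds when the limit file exists and holds a number greater than 0.

userns_allowed() {
    limit_file=${1:-/proc/sys/user/max_user_namespaces}
    limit=$(cat "$limit_file" 2>/dev/null) || return 1
    # Non-numeric content makes the test fail, which also counts as "no".
    [ "$limit" -gt 0 ] 2>/dev/null
}

if userns_allowed; then
    echo "kernel limit permits user namespaces"
else
    echo "user namespaces unavailable or limited to 0"
fi
```

If the check passes, `unshare --user --map-root-user id -u` is a quick way to see the effect described in the quote: inside the namespace the user ID is reported as 0.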

For real working commands to do this, see the answers at:

and special user-space programming/compiling

But well, of course, the desired "jail" guarantees are implementable by programming in user-space (without additional support for this feature from the OS; maybe that's why this feature hasn't been included in the first place in the design of OSes); with more or less complications.

The mentioned ptrace- or seccomp-based sandboxing can be seen as some variants of implementing the guarantees by writing a sandbox-helper that would control your other processes, which would be treated as "black boxes", arbitrary Unix programs.

Another approach could be to use programming techniques that can care about the effects that must be disallowed. (It must be you who writes the programs then; they are not black boxes anymore.) To mention one, using a pure programming language (which would force you to program without side-effects) like Haskell will simply make all the effects of the program explicit, so the programmer can easily make sure there will be no disallowed effects.

I guess, there are sandboxing facilities available for those programming in some other language, e.g., Java.

Cf. "Sandboxed Haskell" project proposal.
NaCl--not mentioned here--belongs to this group, doesn't it?

Some pages accumulating info on this topic were also pointed at in the answers there:

Shell – Permanently switching to zsh in a script, without being root and without being asked for the password

If you don't have the permission to change your login shell, you can tell bash (I assume bash is your login shell) to replace itself with zsh. In your ~/.bash_login, add this line:

exec /bin/zsh --login
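
A bare exec in ~/.bash_login can lock you out if zsh is missing, and it also fires for non-interactive login shells. A more defensive variant might look like the sketch below; the helper name `should_switch` is hypothetical, and /bin/zsh is simply the path used above.

```shell
# ~/.bash_login sketch: should_switch is a hypothetical helper.

should_switch() {
    # $1: shell option flags (pass "$-"); $2: path of the replacement shell.
    case $1 in
        *i*) [ -x "$2" ] ;;   # interactive: switch only if the shell is executable
        *)   return 1 ;;      # non-interactive: stay in the current shell
    esac
}

zsh_path=/bin/zsh   # assumption: adjust if zsh is installed elsewhere
if should_switch "$-" "$zsh_path"; then
    exec "$zsh_path" --login
fi
```

If zsh is not found or the shell is not interactive, nothing happens and bash continues as the login shell.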