More similar Qs with more answers worth attention:
NOTE: Some of the answers there point to specific solutions not yet mentioned here.
Actually, there are quite a few jailing tools with different implementations, but many of them are either not secure by design (like fakeroot, LD_PRELOAD-based), or not complete (like fakeroot-ng, ptrace-based), or would require root (chroot, or plash, mentioned in the fakechroot warning label).
These are just examples; I thought of listing them all side-by-side, with indication of these 2 features ("can be trusted?", "requires root to set up?"), perhaps at Operating-system-level virtualization Implementations.
In general, the answers there cover the full described range of possibilities and even more:
virtual machines/OS
kernel extension (like SELinux; mentioned in comments here)
chroot
Chroot-based helpers (which however must be setUID root, because chroot requires root; or perhaps chroot could work in an isolated namespace--see below):
[to tell a little more about them!]
Known chroot-based isolation tools:
- hasher with its hsh-run and hsh-shell commands. (Hasher was designed for building software in a safe and repeatable manner.)
- schroot mentioned in another answer
- ...
ptrace
Another trustworthy isolation solution (besides a seccomp-based one) would be complete syscall interception through ptrace, as explained in the manpage for fakeroot-ng:
Unlike previous implementations, fakeroot-ng uses a technology that leaves the traced process no choice regarding whether it will use fakeroot-ng's "services" or not. Compiling a program statically, directly calling the kernel and manipulating ones own address space are all techniques that can be trivially used to bypass LD_PRELOAD based control over a process, and do not apply to fakeroot-ng. It is, theoretically, possible to mold fakeroot-ng in such a way as to have total control over the traced process.

While it is theoretically possible, it has not been done. Fakeroot-ng does assume certain "nicely behaved" assumptions about the process being traced, and a process that break those assumptions may be able to, if not totally escape then at least circumvent some of the "fake" environment imposed on it by fakeroot-ng. As such, you are strongly warned against using fakeroot-ng as a security tool. Bug reports that claim that a process can deliberatly (as opposed to inadvertly) escape fakeroot-ng's control will either be closed as "not a bug" or marked as low priority.

It is possible that this policy be rethought in the future. For the time being, however, you have been warned.
Still, as you can read, fakeroot-ng itself is not designed for this purpose.
(BTW, I wonder why they have chosen to use the seccomp-based approach for Chromium rather than a ptrace-based one...)
Of the tools not mentioned above, I have noted Geordi for myself, because I liked that the controlling program is written in Haskell.
Known ptrace-based isolation tools:
seccomp
One known way to achieve isolation is the seccomp sandboxing approach used in Google Chromium. But this approach requires you to write a helper that processes some of the "intercepted" file accesses and other syscalls (the allowed ones), and also, of course, to make the effort to "intercept" the syscalls and redirect them to the helper. (That might even mean replacing the intercepted syscalls in the code of the controlled process, so it doesn't sound quite simple; if you are interested, you'd better read the details rather than just my answer.)
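To make the helper's role a bit more concrete, here is a minimal Python sketch of the policy decision such a helper might make when an open request is forwarded to it. The allow-list `ALLOWED_READ_PREFIXES` and both function names are hypothetical, made up for illustration. The sketch does no syscall interception at all (the hard part described above) and is not how Chromium's sandbox is actually implemented.

```python
import os
import posixpath

# Hypothetical policy: directories the jailed process may read from.
ALLOWED_READ_PREFIXES = ("/usr/share/", "/tmp/sandbox/")


def is_read_only(flags: int) -> bool:
    """True if the open(2) flags ask for reading only, without creating or truncating."""
    if (flags & os.O_ACCMODE) != os.O_RDONLY:
        return False
    return not (flags & (os.O_CREAT | os.O_TRUNC))


def helper_allows_open(path: str, flags: int) -> bool:
    """Decide whether the trusted helper should perform this open for the jailed process."""
    # Collapse "." and ".." lexically so "/tmp/sandbox/../../etc" cannot slip through.
    # Symlinks are NOT resolved here; a real helper would have to deal with them.
    normalized = posixpath.normpath(path)
    if not is_read_only(flags):
        return False
    # Appending "/" makes "/usr/share" match while "/usr/sharefoo" does not.
    return any((normalized + "/").startswith(prefix)
               for prefix in ALLOWED_READ_PREFIXES)
```

With this policy, `helper_allows_open("/usr/share/doc/README", os.O_RDONLY)` is accepted, while a write request or a path that climbs out of the allowed directories with `..` is refused.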
More related info (from Wikipedia):
(The last item seems to be interesting if one is looking for a general seccomp-based solution outside of Chromium. There is also a blog post worth reading from the author of "seccomp-nurse": SECCOMP as a Sandboxing solution ?.)
The illustration of this approach from the "seccomp-nurse" project:
A "flexible" seccomp possible in the future of Linux?
In 2009 there were also suggestions to patch the Linux kernel to make the seccomp mode more flexible, so that "many of the acrobatics that we currently need could be avoided". ("Acrobatics" refers to the complications of writing a helper that has to execute many possibly innocent syscalls on behalf of the jailed process, and of substituting the possibly innocent syscalls in the jailed process.) An LWN article wrote on this point:
One suggestion that came out was to add a new "mode" to seccomp. The API was designed with the idea that different applications might have different security requirements; it includes a "mode" value which specifies the restrictions that should be put in place. Only the original mode has ever been implemented, but others can certainly be added. Creating a new mode which allowed the initiating process to specify which system calls would be allowed would make the facility more useful for situations like the Chrome sandbox.

Adam Langley (also of Google) has posted a patch which does just that. The new "mode 2" implementation accepts a bitmask describing which system calls are accessible. If one of those is prctl(), then the sandboxed code can further restrict its own system calls (but it cannot restore access to system calls which have been denied). All told, it looks like a reasonable solution which could make life easier for sandbox developers.

That said, this code may never be merged because the discussion has since moved on to other possibilities.
This "flexible seccomp" would bring the possibilities of Linux closer to providing the desired feature in the OS, without the need to write helpers that complicated.
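The bitmask idea from the quoted article can be modelled in a few lines of Python. This is only a toy model of the described semantics (an allow-list that can be narrowed but never widened), not the kernel interface or the actual patch. The syscall numbers are the x86-64 ones and are used purely for illustration, and the function names are my own.

```python
# Illustrative x86-64 syscall numbers; other architectures use different ones.
SYS_READ, SYS_WRITE, SYS_OPEN, SYS_EXIT, SYS_PRCTL = 0, 1, 2, 60, 157


def make_mask(allowed_syscalls):
    """Build a bitmask with one bit set per allowed syscall number."""
    mask = 0
    for nr in allowed_syscalls:
        mask |= 1 << nr
    return mask


def is_allowed(mask, nr):
    """Check whether syscall number nr is permitted by the mask."""
    return bool((mask >> nr) & 1)


def restrict(current_mask, requested_mask):
    """Model a further self-restriction: only bits that are already set survive,
    so a previously denied syscall can never be re-enabled."""
    return current_mask & requested_mask


# The initiating process allows read, write, exit and prctl.
mask = make_mask([SYS_READ, SYS_WRITE, SYS_EXIT, SYS_PRCTL])
# The sandboxed code later asks for read, open and exit only.
tighter = restrict(mask, make_mask([SYS_READ, SYS_OPEN, SYS_EXIT]))
```

In this model `tighter` still permits read and exit, has lost write, and does not gain open, even though the sandboxed code asked for it.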
(A blog posting with basically the same content as this answer: http://geofft.mit.edu/blog/sipb/33.)
namespaces (unshare)
Isolating through namespaces (unshare-based solutions) -- not mentioned here -- e.g., unsharing mount-points (combined with FUSE?) could perhaps be a part of a working solution for you wanting to confine filesystem accesses of your untrusted processes.
More on namespaces, now that their implementation has been completed (this isolation technique is also known under the name "Linux Containers", or "LXC", isn't it?..):
"One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization (as well as other purposes)".
It's even possible to create a new user namespace, so that "a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace".
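As a rough sketch of what happens underneath, a process can enter a new user namespace and map its own unprivileged UID/GID to 0 by writing to files under /proc. The Python below assumes Python 3.12 or later (for `os.unshare`) and a kernel that permits unprivileged user namespaces. The function names are my own, and nothing privileged runs until `become_root_in_new_userns()` is actually called.

```python
import os


def id_map_line(inside_id: int, outside_id: int, count: int = 1) -> str:
    """Format one line for /proc/<pid>/uid_map or gid_map:
    '<first ID inside the namespace> <first ID outside> <range length>'."""
    return f"{inside_id} {outside_id} {count}\n"


def become_root_in_new_userns() -> int:
    """Enter a new user namespace in which the caller appears as UID/GID 0.

    Assumes Python 3.12+ and unprivileged user namespaces being enabled;
    raises OSError otherwise. Returns the UID seen inside the namespace.
    """
    outer_uid, outer_gid = os.getuid(), os.getgid()
    os.unshare(os.CLONE_NEWUSER)
    # An unprivileged process must deny setgroups(2) before it may write gid_map.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(id_map_line(0, outer_uid))
    with open("/proc/self/gid_map", "w") as f:
        f.write(id_map_line(0, outer_gid))
    return os.getuid()
```

For a user with UID 1000, the line written to uid_map is "0 1000 1": UID 0 inside the namespace corresponds to UID 1000 outside, for a range of one ID.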
For real working commands to do this, see the answers at:
and special user-space programming/compiling
But well, of course, the desired "jail" guarantees are implementable by programming in user-space (without additional support for this feature from the OS; maybe that's why this feature hasn't been included in the first place in the design of OSes); with more or less complications.
The mentioned ptrace- or seccomp-based sandboxing can be seen as some variants of implementing the guarantees by writing a sandbox-helper that would control your other processes, which would be treated as "black boxes", arbitrary Unix programs.
Another approach could be to use programming techniques that take care of the effects that must be disallowed. (It must be you who writes the programs then; they are not black boxes anymore.) To mention one, using a pure programming language like Haskell (which forces you to program without side-effects) will simply make all the effects of the program explicit, so the programmer can easily make sure there will be no disallowed effects.
I guess, there are sandboxing facilities available for those programming in some other language, e.g., Java.
Some pages accumulating info on this topic were also pointed at in the answers there:
Best Answer
The solution must probably be based either on ptrace or namespaces (unshare).
ptrace-based solutions are probably less efficient than namespaces/unshare-based ones (but the latter technology is cutting-edge and is probably not a well-explored path).
ptrace-based
UMView
As for ptrace-based solutions, thanks to the comments at https://stackoverflow.com/a/1019720/94687, I've discovered UMView:
The linked docs describe how to have a "copy-on-write view" of the host fs -- that's not exactly like performing a chroot. Exact instructions on how to achieve /-substitution in umview would be nice to have in an answer to my question (please write one if you figure out how to do this!).
umview must be open-source, because it is included in Ubuntu and Debian -- http://packages.ubuntu.com/lucid/umview.
"Confining programs"
Another implementation is described in http://www.cs.vu.nl/~rutger/publications/jailer.pdf, http://www.cs.vu.nl/~guido/mansion/publications/ps/secrypt07.pdf.
They have a change-root-ing policy rule, CHRDIR, whose effect is similar to chroot. (Section "The jailing policy")
However, they might not have published their source code (partially based on a modified strace http://www.liacs.nl/~wichert/strace/ -- Section "Implementation")...
geordi
Geordi (http://www.eelis.net/geordi/, https://github.com/Eelis/geordi) could probably be modified to make the wanted rewriting of file arguments to system calls in the jailed programs.
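The core of such a rewriting is a path translation: every path the jailed program passes to a syscall gets re-rooted under the new root directory before the kernel sees it. Below is a minimal Python sketch of just that translation step. The function name and the /srv/jail directory are hypothetical, the handling is purely lexical (symlinks are not resolved, which a real tracer would have to handle), and it is not taken from Geordi or any of the other tools mentioned here.

```python
import posixpath


def translate_path(guest_path: str, new_root: str, guest_cwd: str = "/") -> str:
    """Map a path as seen by the jailed program to a host path under new_root."""
    # Relative paths are interpreted against the jailed program's own cwd.
    if not guest_path.startswith("/"):
        guest_path = posixpath.join(guest_cwd, guest_path)
    # normpath collapses "." and ".."; on an absolute path, ".." at the root
    # stays at the root, just like inside a real chroot.
    normalized = posixpath.normpath(guest_path)
    relative = normalized.lstrip("/")
    return posixpath.join(new_root, relative) if relative else new_root
```

For example, with the new root at /srv/jail, both "/etc/passwd" and "/../../etc/passwd" translate to "/srv/jail/etc/passwd", so climbing above the fake root with ".." does not reach the host's files.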
proot
PRoot is a ready to use ptrace-based tool for this. http://proot.me/:
unshare-based
user_namespaces support in the Linux kernel has matured since the question was asked. Now you can play with performing a chroot as a normal user with the help of unshare, like in Simulate chroot with unshare: