Sandboxing means running a program in a restricted environment (for example, with no permission to open new files and with no or limited network access) in order to protect the system from malicious or erroneous software.
Fedora Linux has the `policycoreutils` package, which contains bin/sandbox, a sandbox based on SELinux.
This sandbox is not perfect, however, so in this post I will describe some proposed updates and implementation considerations for improving it.
The first thing to say is that it is implemented as two executables: a Python script which calls (if there are no errors) a binary program written in C. In principle, this two-level structure should be eliminated (for example, for performance reasons) and replaced by a single monolithic C program. However, this is not urgent.
Consider two scenarios of ending execution of the sandboxed program:
- It terminates normally.
- The user or software does not want to wait more than 30 seconds and kills it with SIGKILL. (Somebody may argue that it should first be sent SIGTERM and given time to exit gracefully before SIGKILL. But this probably doesn’t matter for a sandboxed program, as it has not opened any files anyway and thus there is no need to close them.)
It terminates normally
In this case we (or rather the application which started a sandbox to do some calculations in it) need to know when it terminates.
Note that the sandboxed program may fork/exec children and then exit itself. It may also call setsid().
At first it may seem that we can just waitpid() for the sandbox process, but the process may create children and exit itself. That would give a false sense that the program has really finished, while polluting the process space with child processes which may never exit. So I propose that the sandbox process fork before loading the actual sandboxed program. The forked process would first move itself into a cgroup and then execute (now without forking) the actual sandboxed program. The original process would wait until the cgroup becomes empty.
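The fork-then-join-then-exec order could be sketched roughly as follows. This is only an illustration in Python, not the code of the existing sandbox: the helper names `join_cgroup` and `spawn_in_cgroup` are made up for this post, and I assume a cgroup v2 style directory in which writing a PID to `cgroup.procs` moves that process into the cgroup.

```python
import os
import sys


def join_cgroup(cgroup_dir, pid):
    """Move process `pid` into the cgroup by writing it to cgroup.procs.

    Assumption: cgroup_dir is a cgroup v2 directory we may write to.
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))


def spawn_in_cgroup(cgroup_dir, argv):
    """Fork once; the child joins the cgroup, then execs argv without forking.

    Returns the child's PID to the parent. Everything the child later forks
    is born inside the same cgroup.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never return into the caller's code, whatever happens.
        try:
            join_cgroup(cgroup_dir, os.getpid())
            os.execvp(argv[0], argv)
        except BaseException:
            pass
        os._exit(127)
    return pid
```

The PID returned here is only good for reaping the first process. Whether the sandboxed work is really over still has to be decided by looking at the cgroup, not at this PID.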
Waiting until a cgroup becomes empty is probably possible with the cgroups Notification API (section 2.4 of the kernel cgroups documentation). (Please comment whether it can be done this way.) If it cannot be implemented this way, a Linux kernel patch should be devised.
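One possible way to do the waiting, which may or may not be the mechanism meant above, exists on cgroup v2: each non-root cgroup has a `cgroup.events` file with a `populated` key, and the kernel documentation says a change of that value generates a file modified event. The sketch below assumes cgroup v2 and assumes that this event can be observed with `poll()` and `POLLPRI`; the function names are mine.

```python
import os
import select
import time


def is_populated(events_text):
    """Parse cgroup.events content; True if the 'populated' key is 1."""
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "populated":
            return value.strip() == "1"
    raise ValueError("no 'populated' key found")


def wait_until_empty(cgroup_dir, timeout_s=None):
    """Block until the cgroup has no live processes; return False on timeout."""
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    fd = os.open(os.path.join(cgroup_dir, "cgroup.events"), os.O_RDONLY)
    try:
        poller = select.poll()
        # Assumption: the kernel reports a change of this file as POLLPRI.
        poller.register(fd, select.POLLPRI)
        while True:
            # Always re-read from offset 0 to get the current state.
            if not is_populated(os.pread(fd, 4096, 0).decode()):
                return True
            if deadline is None:
                remaining_ms = None
            else:
                remaining_ms = max(0, int((deadline - time.monotonic()) * 1000))
            if not poller.poll(remaining_ms):
                return False  # timed out while still populated
    finally:
        os.close(fd)
```

Checking the file before every `poll()` avoids missing the case where the cgroup is already empty when we start waiting.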
The user or software does not want to wait more than 30 seconds
First we need to freeze this cgroup (so that an attacker cannot create new processes in the cgroup faster than we kill them).
Then we should recursively enumerate all processes in this cgroup and all its subgroups and kill every process with SIGKILL.
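The freeze, enumerate and kill steps could look something like this. Again it is a sketch under assumptions: cgroup v2 with its `cgroup.freeze` and `cgroup.procs` files, and hypothetical function names. The `kill` parameter exists only so that the logic can be tried without killing real processes.

```python
import os
import signal


def freeze(cgroup_dir):
    """Ask the kernel to freeze the cgroup and its descendants (cgroup v2).

    Note: freezing may take some time; a real implementation would
    probably wait for 'frozen 1' in cgroup.events before going on.
    """
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1")


def list_pids_recursive(cgroup_dir):
    """Collect the PIDs listed in cgroup.procs here and in every sub-cgroup."""
    pids = []
    for dirpath, _dirnames, filenames in os.walk(cgroup_dir):
        if "cgroup.procs" in filenames:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                pids.extend(int(line) for line in f if line.strip())
    return pids


def kill_cgroup(cgroup_dir, kill=os.kill):
    """Freeze the cgroup, then send SIGKILL to every process found in it."""
    freeze(cgroup_dir)
    pids = list_pids_recursive(cgroup_dir)
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited on its own in the meantime
    return pids
```

Because the cgroup is frozen first, the list of PIDs should not grow between the enumeration and the kill loop.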
After that we wait for the cgroup to become empty, which, as said above, seems possible with the “Notification API” of cgroups in the kernel.
We also need to remove all files created by the sandboxed process (for example, if it creates files in /tmp). It can be done using the ptrace syscall on Linux.
Other things (which can be done with SELinux) to consider:
Even if networking in general is enabled, we should forbid the sandboxed program from using local networks such as 192.168.0.0/24 and others (and certainly also 127.0.0.0/8).
However, we can make an exception: querying UDP port 53 (that is, DNS) should be allowed for local networks such as 192.168.0.0/24 and even for 127.0.0.0/8.
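To make the intended rule concrete, here is a small Python function that only expresses the decision, not how SELinux would enforce it. The list of "local" networks is my assumption (loopback plus the usual private IPv4 ranges); the post itself only names 192.168.0.0/24 "and others" and 127.0.0.0/8. The name `is_allowed` is hypothetical.

```python
import ipaddress

# Assumption: which networks count as "local" for this policy.
LOCAL_NETS = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_allowed(dest, port, proto):
    """Decide whether the sandboxed program may contact dest:port over proto.

    Non-local destinations are allowed (networking is assumed enabled);
    local destinations are allowed only for DNS, that is UDP port 53.
    """
    addr = ipaddress.ip_address(dest)
    is_local = any(addr in net for net in LOCAL_NETS)
    if not is_local:
        return True
    return proto == "udp" and port == 53
```

For example, `is_allowed("192.168.0.5", 80, "tcp")` is refused, while `is_allowed("192.168.0.5", 53, "udp")` and `is_allowed("127.0.0.1", 53, "udp")` are permitted.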