Sometimes you need to unmount a filesystem or detach a loop device, but it is busy because of open file descriptors, perhaps held by an smb server process.

To force the unmount, you could kill the offending process (or try `kill -SIGTERM`), but that would close the smb connection (even though some of the files it has open do not need to be closed). A hacky way to force a process to close a given file descriptor is described here, using `gdb` to call `close(fd)`.
This seems dangerous, however. What if the closed descriptor is recycled? The process might use the old stored descriptor not realizing it now refers to a totally different file.
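The recycling concern is real: POSIX requires `open(2)` to return the lowest-numbered free descriptor, so a closed number is handed right back out. A minimal sketch (the helper name is mine, not from the question):

```c
#include <fcntl.h>
#include <unistd.h>

/* Demonstrates descriptor recycling: open(2) returns the lowest free
 * descriptor number, so the number freed by close() is reused at once.
 * Hypothetical demo helper, not part of any real tool. */
int recycled_same_number(void)
{
    int a = open("/dev/null", O_WRONLY);
    if (a < 0)
        return -1;
    close(a);                               /* number `a` is now free */
    int b = open("/dev/zero", O_RDONLY);    /* gets the same number back */
    int same = (a == b);
    close(b);
    return same;
}
```

A process still holding the old number would now silently read from a completely different file instead of the one it thinks it has open.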
I have an idea, but don't know what kind of flaws it has: using `gdb`, open `/dev/null` with `O_WRONLY` (edit: a comment suggested `O_PATH` as a better alternative), then `dup2` over the offending file descriptor, closing it and reusing its number for `/dev/null`. This way any reads or writes to that file descriptor will fail.
Like this:
sudo gdb -p 234532
(gdb) set $dummy_fd = open("/dev/null", 0x200000) // O_PATH
(gdb) p dup2($dummy_fd, offending_fd)
(gdb) p close($dummy_fd)
(gdb) detach
(gdb) quit
What could go wrong?
Best Answer
Fiddling with a process with `gdb` is almost never safe, though it may be necessary if there is some emergency, the process needs to stay open, and all the risks and code involved are understood.

Most often I would simply terminate the process, though some cases may be different and could depend on the environment, who owns the relevant systems and the process involved, what the process is doing, and whether there is documentation saying "okay to kill it" or "no, contact so-and-so first". These details may need to be worked out in a post-mortem meeting once the dust settles. If there is a planned migration, it would be good to check in advance whether any processes have problematic file descriptors open so those can be dealt with in a non-emergency setting (cron jobs or other scheduled tasks that run only in the wee hours, when migrations may be done, are easily missed if you check only during daytime hours).
Write-only versus Read versus Read-Write
Your idea to reopen the file descriptor `O_WRONLY` is problematic, as not all file descriptors are write-only. John Viega and Matt Messier take a more nuanced approach in the "Secure Programming Cookbook for C and C++" book and handle standard input differently than standard output and standard error (p. 25, "Managing File Descriptors Safely").

In the `gdb` case the descriptor (or also the `FILE *` handle) would need to be checked to see whether it is read-only, read-write, or write-only, and an appropriate replacement opened on `/dev/null`. Otherwise, a once read-only handle that is now write-only will cause needless errors should the process attempt to read from it.

What Could Go Wrong?
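A hedged sketch of that mode-matching check, using `fcntl(2)`'s `F_GETFL` to give the replacement the same access mode as the original (again, the helper name is mine):

```c
#include <fcntl.h>
#include <unistd.h>

/* Like the dup2-over-/dev/null trick, but first query the descriptor's
 * access mode so a read-only fd is replaced by a readable /dev/null,
 * a write-only fd by a writable one, and so on. Hypothetical helper. */
int neutralize_fd_matching(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    int accmode = flags & O_ACCMODE;    /* O_RDONLY, O_WRONLY, or O_RDWR */
    int dummy = open("/dev/null", accmode);
    if (dummy < 0)
        return -1;

    if (dup2(dummy, fd) < 0) {
        close(dummy);
        return -1;
    }
    close(dummy);
    return 0;
}
```

A read from the replaced descriptor then returns end-of-file (0 bytes) rather than an error, which is far less likely to upset the process than a sudden `EBADF` from an `O_PATH` descriptor.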
How exactly a process behaves when its file descriptors (and likely also `FILE *` handles) are fiddled with behind the scenes will depend on the process, and will vary from "no big deal", should that descriptor never be used again, to "nightmare mode", where there is now a corrupt file somewhere due to unflushed data, a missing file-was-properly-closed indicator, or some other unanticipated problem.

For `FILE *` handles the addition of an `fflush(3)` call before closing the handle may help, or may cause double buffering or some other issue; this is one of the several hazards of making random calls in `gdb` without knowing exactly what the source code does and expects. Software may also have additional layers of complexity built on top of the `fd` descriptors or the `FILE *` handles that may likewise need to be dealt with. Monkey patching the code could easily enough turn into a monkey wrench.

Summary
Sending a process a standard terminate signal should give it a chance to properly close out its resources, the same as when a system shuts down normally. Fiddling with a process with `gdb` will likely not close things out properly, and could make the situation very much worse.