One of our systems has a growing log file (which we will be addressing), but currently the application owner deletes the file with rm and then waits for the next maintenance window to reboot. I find myself with weeks until the next maintenance window and a disk at 100% utilization.
Following guidance from this post, I located the file and truncated it. The issue now is that the program/process does not appear to be writing logs anywhere. What is the best way to get this process to stop using the old file and start using the 'new file'?
# find /proc/*/fd -ls | grep '(deleted)'|grep path
112567191 0 l-wx------ 1 user1 group1 64 Feb 20 14:10 /proc/27312/fd/2 -> /path/file.log\ (deleted)
# > "/proc/27312/fd/2"
# find /proc/*/fd -ls | grep '(deleted)'|grep path
112567191 0 l-wx------ 1 user1 group1 64 Feb 20 14:10 /proc/27312/fd/2 -> /path/file.log\ (deleted)
# stat /path/file.log
File: ‘/path/file.log’
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 811h/2065d Inode: 2890717 Links: 1
Access: (0644/-rw-r--r--) Uid: (54322/loc_psoft) Gid: (54321/oinstall)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-02-20 12:44:42.738686325 -0500
Modify: 2019-02-08 11:38:19.741494973 -0500
Change: 2019-02-08 11:38:19.741494973 -0500
Birth: -
# stat /proc/27312/fd/2
File: ‘/proc/27312/fd/2’ -> ‘/path/file.log (deleted)’
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: 3h/3d Inode: 112567191 Links: 1
Access: (0300/l-wx------) Uid: (54322/loc_psoft) Gid: (54321/oinstall)
Context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Access: 2019-02-20 14:10:45.155518866 -0500
Modify: 2019-02-20 14:10:45.154518886 -0500
Change: 2019-02-20 14:10:45.154518886 -0500
Birth: -
At this time I don't have a disk space issue; my only problem is that logs are not being written.
UPDATE 1:
The PID can be found using lsof +L1 | grep $path, and it also appears in the "held" file path, /proc/$PID/fd/N.
I haven't been able to sell an interruption to the deciders yet, either as an init 6 or a kill -1 $PID. I'm going to try to recreate the issue elsewhere and try a few of the suggestions from here and others I've dug up.
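For anyone wanting to recreate this on a scratch box, here is a minimal sketch, assuming Linux with /proc and GNU stat. The temporary directory and the use of descriptor 3 are made up for illustration; the current shell stands in for the long-running program.

```shell
#!/usr/bin/env bash
# Sketch: reproduce a deleted-but-still-open log file (Linux /proc, GNU stat assumed).
# The temp directory and descriptor 3 are hypothetical stand-ins.
workdir=$(mktemp -d)
log="$workdir/file.log"

exec 3>>"$log"                 # this shell plays the long-running program
echo "first entry" >&3
rm "$log"                      # what the application owner does

# The descriptor now points at an unlinked inode; the link text ends in "(deleted)".
# /proc/self is used so the lookup works from whichever process inherits fd 3.
target=$(readlink /proc/self/fd/3)
echo "fd 3 -> $target"

echo "second entry" >&3        # still succeeds; the data lands in the deleted inode
size_before=$(stat -L -c %s /proc/self/fd/3)

: > /proc/self/fd/3            # truncate through /proc, as in the question
size_after=$(stat -L -c %s /proc/self/fd/3)
echo "held size: $size_before -> $size_after bytes"

exec 3>&-                      # closing the last descriptor finally frees the inode
rm -rf "$workdir"
```

Truncating frees the space, but the descriptor still refers to the unlinked inode, so nothing shows up at the original path afterwards.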
Best Answer
The program in question will have to be altered or, simply, restarted.
What appears to be happening is that the program opens a file handle for writing to the log and keeps that selfsame file handle open for the duration. If the file is removed, as you describe, it is "held" in abeyance and indeed is still written to until the file handle is closed.
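To illustrate the point, here is a rough sketch (Linux /proc and GNU stat assumed; the paths and descriptor 4 are hypothetical) showing that a held descriptor and a recreated file at the same path are two different inodes, which would explain an empty file at the path while the process keeps writing elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: a held descriptor and a recreated path refer to different inodes.
# Paths and descriptor 4 are hypothetical.
workdir=$(mktemp -d)
log="$workdir/file.log"

exec 4>>"$log"                          # the program opens its log once
rm "$log"                               # the file is unlinked while still open
touch "$log"                            # someone recreates the path

held_inode=$(stat -L -c %i /proc/self/fd/4)   # inode behind the open descriptor
path_inode=$(stat -c %i "$log")               # inode now sitting at the path

echo "written after rm" >&4             # goes to the old, unlinked inode
path_size=$(stat -c %s "$log")          # the new file at the path stays empty

echo "held inode: $held_inode, path inode: $path_inode, path size: $path_size"

exec 4>&-
rm -rf "$workdir"
```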
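If you can alter the program so that, instead of opening the log file once and holding the handle open, it opens the file, writes the entry, and closes the handle again for each write, that will solve the issue.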
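As a rough illustration of the two patterns (not the original pseudocode, which is not reproduced here), the following sketch uses shell functions with made-up names, log_held and log_reopen, and a hypothetical descriptor 5.

```shell
#!/usr/bin/env bash
# Sketch of the two logging patterns; function names and paths are hypothetical.
workdir=$(mktemp -d)
log="$workdir/file.log"

# Pattern 1: open once and keep the descriptor for the life of the program.
exec 5>>"$log"
log_held() { printf '%s\n' "$1" >&5; }

# Pattern 2: open, append, and close on every write.
log_reopen() { printf '%s\n' "$1" >>"$log"; }

log_held "before rm"
rm "$log"

log_held "after rm, held descriptor"      # written to the unlinked inode only
[ -e "$log" ] && held_recreated=yes || held_recreated=no

log_reopen "after rm, reopened"           # creates a fresh file at the path
reopen_contents=$(cat "$log")

echo "path exists after held write: $held_recreated"
echo "path contents after reopen write: $reopen_contents"

exec 5>&-
rm -rf "$workdir"
```

Reopening on every write costs a few extra system calls per entry, so it is a trade-off rather than a free fix.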
If you cannot, then there is still no need to reboot the entire system: a file which has been rm'ed will be well and truly gone once all processes holding it open have exited (or, more specifically, once their file handles have been closed).
Most well-written daemons will incidentally cycle their file handles if sent SIGHUP (read your program's documentation!). But simply stopping (or terminating) and restarting the program will also release any open file handles.
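What "cycling file handles on SIGHUP" can look like is sketched below with a toy shell "daemon". This is only an assumption about how such a program might behave, not how your application works; open_log, descriptor 6, and the paths are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: a toy "daemon" that reopens its log on SIGHUP.
# open_log, descriptor 6, and the paths are hypothetical.
workdir=$(mktemp -d)
log="$workdir/file.log"

open_log() { exec 6>>"$log"; }     # (re)open the log on descriptor 6
trap open_log HUP                  # on SIGHUP, drop the old handle and reopen

open_log
echo "before rotation" >&6
rm "$log"                          # log removed out from under the process

# Normally sent from another shell as: kill -HUP <pid>
# BASHPID is the current bash process; fall back to $$ in other shells.
kill -HUP "${BASHPID:-$$}"
echo "after rotation" >&6          # lands in the newly created file

after_hup=$(cat "$log")
echo "new log contains: $after_hup"

exec 6>&-
rm -rf "$workdir"
```

Whether a given program installs a handler like this is entirely up to its author. For a process with no SIGHUP handler, the default action of SIGHUP is to terminate it, so check the documentation before sending the signal.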