Stop a program from writing to a deleted file

Tags: filesystems, logs, open files, process

One of our systems has a growing log file (which we will be addressing), but currently the application owner deletes the file with rm and then waits for the next maintenance window to reboot. I find myself with weeks until the next maintenance window and a disk at 100% utilization.

Following guidance from this post, I located the file and truncated it. The issue now is that the program/process does not appear to be writing logs anywhere. What is the best way to get this process to stop using the old file and start using the 'new file'?

# find /proc/*/fd -ls | grep  '(deleted)'|grep path
112567191    0 l-wx------   1 user1 group1       64 Feb 20 14:10 /proc/27312/fd/2 -> /path/file.log\ (deleted)

# > "/proc/27312/fd/2"

# find /proc/*/fd -ls | grep  '(deleted)'|grep path
112567191    0 l-wx------   1 user1 group1        64 Feb 20 14:10 /proc/27312/fd/2 -> /path/file.log\ (deleted)

# stat /path/file.log
  File: ‘/path/file.log’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 811h/2065d      Inode: 2890717     Links: 1
Access: (0644/-rw-r--r--)  Uid: (54322/loc_psoft)   Gid: (54321/oinstall)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-02-20 12:44:42.738686325 -0500
Modify: 2019-02-08 11:38:19.741494973 -0500
Change: 2019-02-08 11:38:19.741494973 -0500
 Birth: -

# stat /proc/27312/fd/2
  File: ‘/proc/27312/fd/2’ -> ‘/path/file.log (deleted)’
  Size: 64              Blocks: 0          IO Block: 1024   symbolic link
Device: 3h/3d   Inode: 112567191   Links: 1
Access: (0300/l-wx------)  Uid: (54322/loc_psoft)   Gid: (54321/oinstall)
Context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Access: 2019-02-20 14:10:45.155518866 -0500
Modify: 2019-02-20 14:10:45.154518886 -0500
Change: 2019-02-20 14:10:45.154518886 -0500
 Birth: -

At this point I don't have a disk space issue; I only have the issue of logs not being written.
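One caveat about truncating a file that a process still holds open, as with the > "/proc/27312/fd/2" step above: whether the writer starts again at byte 0 depends on whether it opened the file with O_APPEND. Without O_APPEND its file offset is untouched by the truncation, so its next write lands at the old offset and leaves a sparse file with a large apparent size. A minimal Python sketch, assuming a temporary file as a hypothetical stand-in for the held log:

```python
import os
import tempfile

# Hypothetical stand-in for the held log file.
path = os.path.join(tempfile.mkdtemp(), "file.log")


def size_after_truncate(append):
    """Write 1000 bytes, truncate the file to zero behind the writer's back,
    then write 5 more bytes and return the file's apparent size."""
    mode = "a" if append else "w"  # "a" opens with O_APPEND, "w" does not
    with open(path, mode) as writer:
        writer.write("x" * 1000)
        writer.flush()
        os.truncate(path, 0)       # truncates to zero, like `> /proc/PID/fd/N`
        writer.write("hello")
        writer.flush()
    return os.path.getsize(path)


print("with O_APPEND:   ", size_after_truncate(append=True))   # 5
print("without O_APPEND:", size_after_truncate(append=False))  # 1005
```

In the second case most of those 1005 bytes are a hole, so ls -l reports a large size while du reports little actual usage.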

UPDATE 1:
The PID can be found using lsof +L1 | grep $path, and it is also visible in the "held" file path, /proc/$PID/fd/N.
I haven't been able to sell an interruption to the deciders yet, either as an init 6 or a kill -1 $PID. I'm going to try to recreate the issue elsewhere and give a few of the suggestions here, and some I've dug up, a try.
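For reference, lsof +L1 lists open files with a link count below 1, i.e. files that have been unlinked but are still held open. The same check can be sketched in Python by walking /proc directly (Linux-only; the function name deleted_open_files is made up for this example). Like lsof, it only sees the descriptors of processes you are allowed to inspect, so it needs root to cover every process.

```python
import os
import stat


def deleted_open_files():
    """Yield (pid, fd, target) for each open regular file whose link count is 0.

    A rough, Linux-only equivalent of `lsof +L1` built on /proc.
    """
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip /proc/meminfo, /proc/self, etc.
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we may not inspect it
            continue
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                st = os.stat(link)          # follows the link to the open file itself
                target = os.readlink(link)  # e.g. "/path/file.log (deleted)"
            except OSError:  # fd closed in the meantime, or permission denied
                continue
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 0:
                yield int(entry), int(fd), target


if __name__ == "__main__":
    for pid, fd, target in deleted_open_files():
        print(f"pid={pid} fd={fd} -> {target}")
```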

Best Answer

The program in question will have to be altered or, simply, restarted.

What appears to be happening is that the program opens a file handle for writing to the log and keeps that same handle open for its entire run. When the file is removed, as you describe, it is "held" in abeyance: the directory entry is gone, but the file itself still exists and is still written to until the file handle is closed.
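The "held in abeyance" behaviour is easy to reproduce. A minimal, Linux-only Python sketch, using a temporary file as a hypothetical stand-in for /path/file.log: after the rm, the program's writes keep succeeding, the old file's link count is 0, an empty file recreated at the same path receives nothing, and the entries remain readable by opening the /proc/<pid>/fd/<n> link. If the same holds for the process in the question, its output should still be readable through /proc/27312/fd/2 until that descriptor is closed.

```python
import os
import tempfile

# Hypothetical stand-in for /path/file.log (Linux-only: uses /proc/self/fd).
log_path = os.path.join(tempfile.mkdtemp(), "file.log")

log = open(log_path, "a")    # the program opens its log once at startup
os.unlink(log_path)          # the operator runs `rm`
open(log_path, "w").close()  # an empty file reappears at the same path

log.write("still logging\n")  # succeeds: the data goes to the unlinked inode
log.flush()

proc_link = f"/proc/self/fd/{log.fileno()}"
target = os.readlink(proc_link)  # the old path followed by " (deleted)"
held = os.fstat(log.fileno())
with open(proc_link) as reader:  # opening the /proc link reaches the deleted file
    recovered = reader.read()

print(target)
print(held.st_nlink, held.st_size)  # no names left, yet the data is still stored
print(repr(recovered))              # the "missing" log entry
print(os.path.getsize(log_path))    # the new file at the path stays empty
log.close()  # last handle closed: only now can the filesystem free the space
```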

If you can alter the program to change it from (pseudocode):

LogFileHandle = OpenFileHandle( Logfile, 'wa' )
UpdateLog( log_entry ) {
    LogFileHandle.Write( log_entry )
}
do_literally_everything_forever()
LogFileHandle.Close()

to (pseudocode):

UpdateLog( log_entry ) {
    LogFileHandle = OpenFileHandle( Logfile, 'wa' )
    LogFileHandle.Write( log_entry )
    LogFileHandle.Close()
}
do_literally_everything_forever()

That will solve the issue.
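Both pseudocode patterns can be tried out side by side. A minimal Python sketch (the class names and the temporary path are hypothetical, and this only illustrates the two patterns, not the program in question): after the log is removed, the held handle keeps writing into the deleted file, while the reopening version simply recreates the log at its path.

```python
import os
import tempfile

# Hypothetical stand-in for /path/file.log.
LOGFILE = os.path.join(tempfile.mkdtemp(), "file.log")


class HeldOpenLogger:
    """First pattern: open the log once and keep the handle for the whole run."""

    def __init__(self, path):
        self.handle = open(path, "a")

    def update_log(self, log_entry):
        self.handle.write(log_entry)
        self.handle.flush()


class ReopeningLogger:
    """Second pattern: open, write and close the log on every entry."""

    def __init__(self, path):
        self.path = path

    def update_log(self, log_entry):
        with open(self.path, "a") as handle:  # looks the path up afresh each time
            handle.write(log_entry)


held = HeldOpenLogger(LOGFILE)
reopening = ReopeningLogger(LOGFILE)

os.unlink(LOGFILE)  # the application owner's `rm`

held.update_log("held handle\n")           # written to the deleted inode
reopening.update_log("reopened handle\n")  # recreates the file at LOGFILE

with open(LOGFILE) as f:
    print(f.read(), end="")  # only "reopened handle" appears
held.handle.close()
```

Reopening costs an open() and close() per entry, which is usually negligible for a log. Python's standard library has a variant of the idea in logging.handlers.WatchedFileHandler, which stats the path before each write and reopens only when the file is missing or its device/inode has changed.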

If you cannot, there is still no need to reboot the entire system: a file that has been rm'd is well and truly gone once every process holding a file handle to it has exited (or, more precisely, once those file handles have been closed).

Most well-written daemons will close and reopen their log files when sent SIGHUP (read your program's documentation!). Otherwise, simply stopping (or killing) and restarting the program will also release any open file handles.
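For a program you control, SIGHUP support can be as small as the following Python sketch (the log path, the update_log helper and the reopen_requested flag are all hypothetical). The handler only records the request, since file I/O inside a signal handler is best avoided; the next log write then closes the old handle and reopens the file by name.

```python
import os
import signal
import tempfile

# Hypothetical log path; a real daemon would take this from its configuration.
LOG_PATH = os.path.join(tempfile.mkdtemp(), "file.log")

log_file = open(LOG_PATH, "a")
reopen_requested = False


def on_sighup(signum, frame):
    # Only record the request; the reopen happens in normal code, not in the handler.
    global reopen_requested
    reopen_requested = True


def update_log(entry):
    global log_file, reopen_requested
    if reopen_requested:
        reopen_requested = False
        log_file.close()                # drop the handle to the old (maybe deleted) file
        log_file = open(LOG_PATH, "a")  # and reopen the log by name
    log_file.write(entry)
    log_file.flush()


signal.signal(signal.SIGHUP, on_sighup)

update_log("before rm\n")
os.unlink(LOG_PATH)                  # `rm` the log while it is still held open
update_log("lost to the deleted inode\n")
os.kill(os.getpid(), signal.SIGHUP)  # what `kill -HUP <pid>` does from a shell
update_log("after SIGHUP\n")

with open(LOG_PATH) as f:
    print(f.read(), end="")          # only "after SIGHUP" reaches the new file
log_file.close()
```

One caution for the case in the question: the held descriptor there is fd 2, the process's standard error, which suggests the log is a redirection set up by whatever launches the program. A program in that position may not know the file's name at all, in which case a restart is the likelier fix.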