Machine `A`'s file system is mounted on machine `B` via `sshfs`. A process running on `B`, which was initiated from `A` by `ssh`, `touch`es a file on the mounted file system of `A` to communicate a signal; the signal (file) is then removed by `A`. This works reliably the first time the file is created/destroyed (`touch`/`rm`).
However, if a second process (again, running on `B`, spawned from `A`) tries to `touch` exactly the same file, the following error is sporadically thrown:
`touch: cannot touch '/path/to/file': No such file or directory`.
The path is valid, as judged by the fact that attempts to `touch` it manually after the error is thrown are uniformly successful. As mentioned, the error is sporadic (complicating attempts to debug), but it only occurs when the file is touched after already having undergone a cycle of creation/deletion.
The actions that intermittently produce an error (`touch`, `rm`, `touch`) are separated in time, so concurrent access is unlikely to be the culprit (i.e., the second `touch` does not happen until the file produced by the first `touch` is removed). Thinking the cause might stem from file system buffering, `sync` is called from `A` after removing the file, to no avail. Calling `sync` from `B` immediately before touching the file also does not help, though I do not know whether `B`'s `sync` call affects the file system of `A`: the version of `sync` on `B` lacks the `-f` option for explicit file system specification. I also tried to call `sync` on `A` from the process running on `B` via `ssh user@A sync` before `touch`ing, but the process seems to exit without error just after the sync-over-ssh call, since the remaining lines, including the `touch` statement, are not executed. Perhaps this is because it is not possible to `ssh` from the server back to the client in a process initiated by `ssh` from the client to the server.
How may the cause of this file system-related error be determined?
Best Answer
You can investigate what might be happening by running `sshfs` with the option `-o debug`. It prints a lot of information on the basic filesystem operations done by a `touch test` command. In an example operation on a file that exists, the relevant part is that a `getattr` call was done, and it ended in success.
When you do a successful `touch` on a non-existent file, the operations we see (removing the details) show that the `getattr` test for the file fails, which is normal as it does not exist, so we go on to create the file.
If the file is now removed on the server, and we do the `touch` again on the client, we see a different sequence: now the `getattr` says the file still exists, so `touch` goes on to `open()` the file, but this results in your error message, because the file does not really exist at all.
So it all seems to be a problem of the client cache of what files exist being too slow to catch up with changes at the remote. The simplest answer is to mount your remote with a shorter timeout for the `getattr` call, i.e. the `stat()` system call. This should work for you.
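As a rough sketch of what such a mount might look like, the helper below only prints an `sshfs` command line with a short stat-cache timeout, so it can be reviewed before being run. The option name `cache_stat_timeout` is an assumption based on sshfs 2.x; newer releases name their cache options differently, so check `man sshfs` on your system first. The function name, the remote spec `user@A:/path/to` and the mount point `/mnt/A` are placeholders.

```shell
#!/bin/sh
# sshfs_short_cache_cmd REMOTE MOUNTPOINT [SECONDS]
# Prints (does not run) an sshfs command with a short stat-cache timeout.
# Assumption: sshfs 2.x option naming (cache_stat_timeout).
sshfs_short_cache_cmd() {
    remote=$1
    mountpoint=$2
    seconds=${3:-1}   # illustrative default of 1 second
    echo "sshfs -o cache_stat_timeout=$seconds $remote $mountpoint"
}

# Example with placeholder values:
sshfs_short_cache_cmd 'user@A:/path/to' /mnt/A
# prints: sshfs -o cache_stat_timeout=1 user@A:/path/to /mnt/A
```

A shorter timeout trades some performance for freshness, since the client has to ask the server about file attributes more often.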