Program stalls under a normal user but runs under root

fork process system-calls

I am running an R job both as a normal user, john, and as root. Interestingly, the program stalls under john but runs quickly under root. Using strace, I found that when john runs R, the process stalls waiting for its child process. My guess is that Linux does not let the child process continue, so the parent (the main program) stays blocked indefinitely. Is there a limit on the number of fork/clone calls that a normal Linux user can make? Any idea why this happens?

Anyway, in this post I've described the starting point of the problem.

Further information

Last lines of strace for john user (where program stalls):

lseek(255, -82, SEEK_CUR)               = 1746
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa12fd4f9d0) = 13302
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43d060, [], SA_RESTORER, 0x311b432900}, {0x452250, [], SA_RESTORER, 0x311b432900}, 8) = 0
wait4(-1,  <unfinished ...>

Last lines of strace for root (where program runs completely):

lseek(255, -82, SEEK_CUR)               = 1746
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f81d8e239d0) = 13244
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43d060, [], SA_RESTORER, 0x311b432900}, {0x452250, [], SA_RESTORER, 0x311b432900}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 13244
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff54a591dc, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {0x452250, [], SA_RESTORER, 0x311b432900}, {0x43d060, [], SA_RESTORER, 0x311b432900}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "\n### Local Variables: ***\n### mo"..., 1828) = 82
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "", 1828)                     = 0
exit_group(1) 

Best Answer

Use strace -f R to follow R and all its child processes as well. This should show the exact point where the child program hangs.
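As a rough sketch of how to read such a trace: when strace is run with -f and -o, each line in the output file starts with the PID that made the call, so the last line recorded for each PID shows where that process stopped. The helper name last_call_per_pid and the path /tmp/r.trace are made up for illustration, and the R arguments are whatever you normally use.

```shell
#!/bin/sh
# Capture a trace first (output path is a placeholder):
#   strace -f -tt -o /tmp/r.trace R <your usual arguments>
# -f follows children, -tt adds timestamps, -o writes the trace to a file.

# Hypothetical helper: print the last traced line of every PID in a trace file.
# A PID whose last line ends in "<unfinished ...>" was still blocked in that call.
last_call_per_pid() {
    awk '{ last[$1] = $0 } END { for (pid in last) print last[pid] }' "$1"
}
```

Running last_call_per_pid /tmp/r.trace after the hang should list one line per process; the child's line is the call worth investigating.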

Some additional points to check:

As root (su - root) and as the user john, compare the outputs of:

ulimit -a  # shows all the limits set for the current user; john may be hitting one of them
set ; env  # john and root may not have the same PATH, LD_LIBRARY_PATH, or some other variable
grep "$(whoami)" /etc/passwd /etc/group  # check whether john needs to be in some group
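
Comparing these outputs by eye is error-prone, so one option is to save them to a file per user and diff the two files. The following is only a sketch: the function name snapshot and the /tmp paths are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: dump the settings that commonly differ between users
# into a single file so that two users' files can be compared with diff.
snapshot() {
    out="$1"
    {
        echo "== id ==";     id
        echo "== ulimit =="; ulimit -a
        echo "== env ==";    env | sort
    } > "$out"
}

# Run once as each user, then compare the results:
#   snapshot "/tmp/snapshot.$(id -un).txt"    # as john, then again as root
#   diff /tmp/snapshot.john.txt /tmp/snapshot.root.txt
```

Any line that appears in only one of the two files (a lower limit, a missing group, a different PATH entry) is a candidate cause.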