Bash – What happens if I start too many background jobs

background-process bash expect jobs telnet

I need to do some work on 700 network devices using an expect script. I can get it done sequentially, but so far the runtime is around 24 hours. This is mostly due to the time it takes to establish a connection and the delay in the output from these devices (old ones). I'm able to establish two connections and have them run in parallel just fine, but how far can I push that?

I don't imagine I could do all 700 of them at once; surely there's some limit to the number of telnet connections my VM can manage.

If I did try to start 700 of them in some sort of loop like this:

for node in `ls ~/sagLogs/`; do  
    foo &  
done

With

  • CPU: 12 CPUs × Intel(R) Xeon(R) CPU E5649 @ 2.53GHz

  • Memory: 47.94 GB

My question is:

  1. Could all 700 instances possibly run concurrently?
  2. How far could I get until my server reaches its limit?
  3. When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?

I'm running in a corporate production environment unfortunately, so I can't exactly just try and see what happens.

Best Answer

Could all 700 instances possibly run concurrently?

That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system that you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and its various children are remarkably good at managing huge levels of concurrency; that's part of why they're so popular for large-scale HPC usage.
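As a quick sanity check of the "threads of execution" point, a couple of standard commands show what the box actually has to schedule onto; this is purely illustrative and nothing in the answer depends on it:

# Number of hardware threads the kernel can schedule at once
nproc
# The same information with a bit more detail
lscpu | grep -E '^(CPU\(s\)|Model name)'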

How far could I get until my server reaches its limit?

This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:

  • The entire run-time memory requirements of one job, times 700 (a rough way to measure the per-job figure is sketched just after this list).
  • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
  • Any other memory requirements on the system.
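To put a rough number on the first item, you could measure one representative run and multiply by 700. A minimal sketch, assuming GNU time is available as /usr/bin/time and that foo here stands in for one full expect run against a single device:

# Peak resident memory of one representative job, in kilobytes
/usr/bin/time -v foo 2>&1 | grep 'Maximum resident set size'
# Multiply that figure by 700, add headroom for bash and everything else
# on the box, and compare against the 47.94 GB available.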

Assuming you meet that (and with only 50 GB of RAM, that's not a given), you still have to deal with other issues:

  • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
  • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
  • Many other things I probably haven't thought of.

When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?

It depends on which limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory), or the system itself may crash (it's not unusual to configure systems to intentionally crash when they run out of memory). If it's CPU time, things will just keep going without issue; it'll simply be hard to do much else on the system. If it's the network, though, you might crash other systems or services.
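One concrete limit worth checking before trying anything is the per-user process cap: if you hit it, the extra forks simply fail with "Resource temporarily unavailable" rather than taking the box down. A minimal sketch of the checks, using standard tools:

# Maximum number of processes your user may run (forks beyond this fail)
ulimit -u
# Current free memory and swap
free -h
# Whether the kernel is configured to panic instead of invoking the OOM
# killer when memory runs out (0 = kill a process, 1 or 2 = panic)
cat /proc/sys/vm/panic_on_oom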


What you really need here is not to run all the jobs at the same time. Instead, split them into batches, run all the jobs within a batch at the same time, let them finish, then start the next batch.

GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive; as mentioned, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (such as idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
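For illustration, here is a minimal sketch of the batching idea. It assumes foo acts on one node name passed as its first argument (the question's loop doesn't show how foo learns which device to work on, so that part is an assumption), and the limit of 20 simultaneous jobs is an arbitrary starting point to tune against what your network tolerates:

# With GNU Parallel: at most 20 jobs at a time, a new one starting as each
# one finishes
ls ~/sagLogs/ | parallel -j 20 foo {}

# Roughly the same idea in plain bash: launch a batch of 20, wait for the
# whole batch to finish, then start the next one
batch=20
count=0
for node in ~/sagLogs/*; do
    foo "$(basename "$node")" &
    (( ++count % batch == 0 )) && wait
done
wait    # catch the final, possibly partial batch

The plain-bash version stalls at the end of each batch waiting for the slowest device, which is exactly the kind of scheduling GNU Parallel and Ansible handle for you automatically.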