Limit maximum number of concurrent scp processes running on a host


I am facing a problem where I have a fleet of servers that hold a lot of data. Each host runs many instances of a specific process p1, each of which makes several scp connections to other hosts in parallel to fetch the data it has to process. This puts a lot of load on those hosts, and they frequently go down.

I am looking for ways through which I can limit the number of concurrent scp processes that can be run on a single host.

Most of the links pointed me to the MaxStartups and MaxSessions settings in /etc/ssh/sshd_config, but those are about limiting the number of ssh sessions that can be initiated at any given point, which is not quite what I need.

Is there a specific config file for scp that can be used here? Or is there a system-level way to limit the number of instances of a specific process/command that can run concurrently?

Best Answer

scp itself has no such feature. With GNU parallel you can use the sem command (from semaphore) to arbitrarily limit concurrent processes:

sem --id scp -j 50 scp ...

For all processes started with the same --id, this applies a limit of 50 concurrent instances. An attempt to start a 51st process will wait (indefinitely) until one of the other processes exits. Add --fg to keep the process in the foreground (the default is to run it in the background, but this doesn't behave quite the same as a shell background job).
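A minimal sketch of that pattern, wrapping each transfer in sem (the chunk names and remote:/data and /local paths are illustrative; the commands are shown dry-run with a leading echo):

```shell
# At most 5 scp processes run at once across every caller using --id scp.
# Drop the leading 'echo' on the sem line to actually execute.
for f in chunk1 chunk2 chunk3; do
  echo sem --id scp -j 5 --fg scp "remote:/data/$f" /local/
done
```

With --fg each loop iteration blocks while its transfer runs; without it, sem returns immediately and the transfers proceed in the background, still capped at 5.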

Note that the state is stored in ${HOME}/.parallel/, so this won't work quite as hoped if you have multiple users running scp; you may need a lower limit for each user. (It should also be possible to override the HOME environment variable when invoking sem, make sure umask permits group write, and modify the permissions so the users share state. I have not tested this heavily though, YMMV.)
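The untested shared-state idea might look like the following (per the caveat above, this is a sketch only; /var/lib/scp-sem is an illustrative path, not a standard location, and the commands are shown dry-run with echo):

```shell
# Point every user's sem at one common, group-writable state directory.
echo umask 002
echo env HOME=/var/lib/scp-sem sem --id scp -j 50 --fg scp remote:/data/file /local/
```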

parallel requires only perl and a few standard modules.

You might also consider using scp -l N, where N is a bandwidth limit in Kbit/s, selecting a specific cipher (a cheaper one, depending on your security requirements), or disabling compression (especially if the data is already compressed) to further reduce CPU impact.
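A sketch combining those knobs (the values and filenames are illustrative, shown dry-run with a leading echo):

```shell
#   -l 8192             cap bandwidth at 8192 Kbit/s (~1 MB/s)
#   -c aes128-ctr       pick a cheaper cipher; `ssh -Q cipher` lists what's available
#   -o Compression=no   skip compression for already-compressed data
echo scp -l 8192 -c aes128-ctr -o Compression=no bigfile.tar.gz remote:/data/
```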

For scp, ssh is effectively a pipe, and an scp instance runs on each end (the receiving end runs with the undocumented -t option). Regarding MaxSessions, this won't help: "sessions" are multiplexed over a single SSH connection. Despite copious misinformation to the contrary, MaxSessions limits only the multiplexing of sessions per TCP connection, nothing else.

The PAM module pam_limits supports limiting concurrent logins: if OpenSSH is built with PAM and UsePAM yes is present in sshd_config, you can set limits by username, group membership (and more). You can then set a hard maxlogins in /etc/security/limits.conf. However, this counts all logins per user, not just ssh logins, and not just scp, so you might run into trouble unless you have a dedicated scp user id. Once enabled, it will also apply to interactive ssh sessions.

One way around this is to copy or symlink the sshd binary, calling it sshd-scp; you can then use a separate PAM configuration file, i.e. /etc/pam.d/sshd-scp (OpenSSH calls pam_start() with the "service name" set to the name of the binary it was invoked as). You'll need to run this on a separate port (or IP), and using a separate sshd_config is probably a good idea too. If you implement this, scp will fail (exit code 254) when the limit is reached, so you'll have to deal with that in your transfer process.
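The steps above might be sketched like this (a rough outline, not a tested recipe: the paths, the scpuser account, and the limit of 10 are all illustrative, root is required, and each copied file needs editing afterwards):

```shell
ln -s /usr/sbin/sshd /usr/sbin/sshd-scp            # same binary, new PAM service name
cp /etc/pam.d/sshd /etc/pam.d/sshd-scp             # ensure pam_limits.so is in the session stack
cp /etc/ssh/sshd_config /etc/ssh/sshd_config-scp   # set a different Port (or ListenAddress) here

# In /etc/security/limits.conf, cap a dedicated transfer account:
#   scpuser  hard  maxlogins  10

# Then start the second daemon:
# /usr/sbin/sshd-scp -f /etc/ssh/sshd_config-scp
```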

(Other options include ionice and cpulimit, but these may cause scp sessions to time out or hang for long periods, causing more problems.)

The old-school way of doing something similar is atd and batch, but that doesn't offer tuning of concurrency: it queues jobs and starts them when the load average is below a specific threshold. A newer variation on that is Task Spooler, which supports queueing and running jobs in a more configurable sequential/parallel way, with runtime reconfiguration supported (e.g. changing queued jobs and concurrency settings), though it offers no load- or CPU-related control itself.
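A sketch of the Task Spooler approach (the binary is ts, or tsp on Debian/Ubuntu; the slot count and paths are illustrative, shown dry-run with a leading echo):

```shell
echo ts -S 4                                # allow at most 4 queued jobs to run at once
for f in f1 f2 f3; do
  echo ts scp "remote:/data/$f" /local/     # each job queues and runs when a slot frees up
done
```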