My question is simple: is there a way to display curl's individual exit status for each URL when curl is doing multiple requests?
Let's imagine that I need to check the sites `a.com`, `b.com` and `c.com` and see:
- their HTTP return code
- curl's exit code, if the HTTP return code is `000`.
NOTE: `a.com`, `b.com`, `c.com` are used as examples in this code/question. In the real script I have a list of valid URLs, more than 400 of them with non-overlapping patterns, and they return a variety of HTTP codes: 200/4xx/5xx as well as 000.
000 is the case when curl could not make a connection, but it provides exit codes to explain what prevented it from establishing one. In my case there are a number of exit codes as well: 6, 7, 35, 60.
I tried to run the following code:
```
unset a
unset rep
# collect the HTTP response code of each request into the array "a"
a=($(curl -s --location -o /dev/null -w "%{response_code}\n" {https://a.com,https://b.com,https://c.com}))
# $? holds only the exit code of the single curl invocation
rep+=("$?")
printf '%s\n' "${a[@]}"
echo
printf '%s\n' "${rep[@]}"
```
While the above code returns the HTTP return code for each individual request, the exit code is displayed only for the last request:
```
000
000
000
60
```
I do need the ability to log the individual exit code when I supply multiple URLs to curl.
Is there a workaround/solution for this problem?
Some additional information: currently I put all my URLs in an array and loop through it, checking each URL separately (see the sketch below). However, going through 400 URLs takes 1-2 hours and I need to somehow speed up the process.
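Roughly like this (a sketch; `urls` is the array holding my URLs):
```
for url in "${urls[@]}"; do
  # HTTP code from curl's write-out, exit code from $?
  code=$(curl -s --location -o /dev/null -w "%{response_code}" "$url")
  rc=$?
  echo "$url $code $rc"
done
```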
I did try to use `-Z` (`--parallel`) with curl. While it did speed up the process by about 40-50%, it didn't help, because in addition to showing only the above-mentioned last exit status, the exit status in this case is always displayed as 0, which is not correct.
P.S. I am open to using any other command-line tool if it can solve the above problem: parallel checking of tens or hundreds of URLs, logging their HTTP codes and, if the connection can't be established, logging additional information like curl's exit codes do.
Thanks.
Best Answer
Analysis
The exit code is named "exit code" because it is returned when a command exits. If you run just one `curl`, then it will exit exactly once. `curl`, when given one or more URLs, might provide a way to retrieve a code equivalent to the exit code of a separate `curl` handling just the current URL; it would be something similar to the `%{response_code}` you used. Unfortunately it seems there is no such functionality (yet; maybe it should be added). To get N exit codes you need N `curl` processes. You need to run something like this N times:
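A minimal sketch of one such run, assuming `$url` holds the URL to check:
```
# one URL, one curl process, hence one meaningful exit code
curl -s --location -o /dev/null -w "%{response_code}\n" "$url"
echo "$?"
```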
I understand your N is about 400; you tried this in a loop and it took hours. Well, spawning 400 `curl`s (even with 400 `echo`s, if `echo` wasn't a builtin; and even with 400 (sub)shells, if needed) is not that time consuming. The culprit is the fact that you run all these synchronously (didn't you?).
Simple loop and its problems
It's possible to loop and run the snippet asynchronously:
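For example (a sketch, assuming the URLs sit in a file `urls`, one per line):
```
while read -r url; do
  (
    # each subshell prints the HTTP code, then its curl's exit code
    curl -s --location -o /dev/null -w "%{response_code}\n" "$url"
    echo "$?"
  ) &     # run in the background, don't wait
done < urls
wait      # wait for all background jobs to finish
```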
There are several problems with this simple approach though:
1. All the `curl`s run simultaneously; there is no limit on how many run at once, there is no queue. This can be very bad in terms of performance and available resources.
2. Outputs (from multiple `curl`s) may get interleaved, possibly mid-line.
3. `curl` or `echo` from another subshell may cut in between a `curl` and its corresponding `echo`.
4. Even if whole lines stay intact, the output lines may appear in an order different from the order of the URLs.
parallel
The right tool is `parallel`. The basic variant of the tool (from `moreutils`, at least in Debian) solves (1). It probably solves (2) in some circumstances. This is irrelevant anyway, because this variant does not solve (3) or (4). GNU `parallel` solves all these problems.
It solves (1) by design.
It solves (2) and (3) with its `--group` option: output from each job is grouped together and printed only when the job finishes. This is the default, so usually you don't have to use it explicitly.
It solves (4) with its `--keep-order` option: the sequence of the output is kept the same as the sequence of the input.
In Debian, GNU `parallel` is in a package named `parallel`. The rest of this answer uses GNU `parallel`.
Basic solution
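A sketch (its parts are explained just below):
```
parallel -j 40 --keep-order '
  # {} is replaced by the current URL
  curl -s --location -o /dev/null -w "%{response_code}\n" {}
  echo "$?"
' < urls
```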
where `urls` is a file with URLs and `-j 40` means we allow up to 40 parallel jobs (adjust it to your needs and abilities). In this case it's safe to embed `{}` in the shell code; it's an exception to the otherwise firm rule: never embed `{}` in the shell code!
The output will be like:
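For example (hypothetical codes; two lines per URL):
```
000
60
000
6
200
0
```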
Note the single-quoted string is the shell code. Within it you can implement some logic, so exit code `0` is never printed. If I were you, I would print it anyway, in the same line, in the leading position:
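A sketch of that variant, with a helper variable `out` holding the HTTP code so the exit code can come first:
```
parallel -j 40 --keep-order '
  out="$(curl -s --location -o /dev/null -w "%{response_code}" {})"
  echo "$? $out"
' < urls
```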
Now even if some `curl` is manually killed before it prints, you will get something in the first column. This is useful for parsing (we'll return to it). Example:
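For example (hypothetical codes):
```
0 200
6 000
143
```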
where `143` means `curl` was terminated (see Default exit code when process is terminated).
With arrays
If your URLs are in an array named `urls`, avoid this syntax:
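That is, something like this sketch, where the whole array is passed as command-line arguments via `:::`:
```
parallel -j 40 --keep-order '
  out="$(curl -s --location -o /dev/null -w "%{response_code}" {})"
  echo "$? $out"
' ::: "${urls[@]}"
```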
`parallel` is an external command. If the array is large enough then you will hit `argument list too long`. Use this instead:
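A sketch of the pipe variant:
```
# printf is a builtin, so the array never becomes arguments of an external command
printf '%s\n' "${urls[@]}" | parallel -j 40 --keep-order '
  out="$(curl -s --location -o /dev/null -w "%{response_code}" {})"
  echo "$? $out"
'
```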
It will work because in Bash `printf` is a builtin and therefore everything before `|` is handled internally by Bash.
To get from the `urls` array to the `a` and `rep` arrays, proceed like this:
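A sketch of the whole pipeline, reading each `<exit code> <HTTP code>` line back into the two arrays:
```
a=() rep=()
while read -r repx ax; do
  rep+=("$repx")   # first column: curl's exit code
  a+=("$ax")       # second column: HTTP response code
done < <(
  printf '%s\n' "${urls[@]}" |
  parallel -j 40 --keep-order '
    out="$(curl -s --location -o /dev/null -w "%{response_code}" {})"
    echo "$? $out"
  '
)
```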
Notes
If we generated exit codes in the second column (which is easier: you don't need a helper variable like `out`) and adjusted our `read` accordingly, so it's `read -r ax repx`, then a line `<empty ax><space>143` would save `143` into `ax`, because `read` ignores leading spaces (it's complicated). By reversing the order we avoid a bug in our code. A line like `143<space><empty ax>` is properly handled by `read -r repx ax`.
You will hopefully be able to check 400 URLs in a few minutes. The duration depends on how many jobs you allow in parallel (`parallel -j …`), but also on:
- how much data the `curl`s download;
- `--connect-timeout` and `--max-time` (consider using them).
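For example (the timeout values here are arbitrary):
```
# give up connecting after 5 seconds, abort the whole transfer after 20
curl -s --location --connect-timeout 5 --max-time 20 \
     -o /dev/null -w "%{response_code}\n" "$url"
```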