How to display curl’s individual Exit Status from multiple requests

Tags: bash, curl, url

My question is simple – is there a way to display curl's individual Exit Status for each URL when curl is doing multiple requests?

Let's imagine that I need to check sites a.com, b.com, c.com and see their:

  • HTTP return code
  • if HTTP return code is 000, I need to display curl's exit code.

NOTE – a.com, b.com, c.com are used as an example in this code/question. In the real script, I do have a list of valid URLs – more than 400 of them with non-overlapping patterns – and they return a variety of HTTP codes – 200/4xx/5xx as well as 000.

The 000 is the case when curl could not make a connection at all, but it provides an Exit Code to explain what prevented it from establishing the connection. In my case there are a number of such exit codes as well – 6, 7, 35, 60.
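
For a single URL it's easy to get both values; roughly like this (a sketch, using the placeholder a.com from above and the same curl options as in my code below):

code=$(curl -s --location -o /dev/null -w "%{response_code}" https://a.com)
ec=$?
echo "HTTP: $code, curl exit code: $ec"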

I tried to run the following code

unset a
unset rep
a=($(curl -s --location -o /dev/null -w "%{response_code}\n" {https://a.com,https://b.com,https://c.com}))
rep+=("$?")    # exit status of the whole curl invocation, not of each individual URL
printf '%s\n' "${a[@]}"
echo
printf '%s\n' "${rep[@]}"

While the above code returns the HTTP return code for each individual request, the Exit Code is displayed only from the last request.

000
000
000

60

I do need the ability to log individual Exit Code when I supply multiple URLs to curl.
Is there a workaround/solution for this problem?

Some additional information: currently I put all my URLs in an array and loop through it, checking each URL separately. However, going through 400 URLs takes 1-2 hours, and I need to somehow speed up the process.
I did try curl's -Z (--parallel) option. While it sped up the process by about 40-50%, it didn't help: in addition to showing only the last Exit Status as mentioned above, the Exit Status in this case is always displayed as 0, which is not correct.
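
For reference, the sequential check I use now looks roughly like this (a simplified sketch; urls stands for my array of 400+ URLs):

for url in "${urls[@]}"; do
   code=$(curl -s --location -o /dev/null -w "%{response_code}" "$url")
   ec=$?
   printf '%s %s %s\n' "$url" "$code" "$ec"   # one request at a time, hence the long runtime
done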

P.S. I am open to using any other command-line tool if it can resolve the above problem – parallel checking of tens/hundreds of URLs, logging their HTTP codes and, if the connection can't be established, additional information like curl's Exit Codes.

Thanks.

Best Answer

Analysis

The exit code is named "exit code" because it is returned when a command exits. If you run just one curl then it will exit exactly once.

curl, when given one or more URLs, might provide a way to retrieve a code equivalent to the exit code of a separate curl handling just the current URL; it would be something similar to the %{response_code} you used. Unfortunately it seems there is no such functionality (yet; maybe it will be added). To get N exit codes you need N curl processes. You need to run something like this N times:

curl … ; echo "$?"
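
For illustration, with the options from the question filled in, each of those N runs would look something like this (url holding one of your URLs):

curl -s --location -o /dev/null -w "%{response_code}\n" "$url"; echo "$?"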

I understand your N is about 400; you tried this in a loop and it took hours. Well, spawning 400 curls (even with 400 echos, if echo weren't a builtin; and even with 400 (sub)shells, if needed) is not that time-consuming. The culprit is the fact that you run all of these synchronously (didn't you?).


Simple loop and its problems

It's possible to loop and run the snippet asynchronously:

for url in … ; do
   ( curl … ; echo "$?" ) &
done
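
For concreteness, a filled-in version of that loop could look like this (a sketch; urls is assumed to be an array holding your URLs):

for url in "${urls[@]}"; do
   (
      curl -s --location -o /dev/null -w "%{response_code}\n" "$url"
      echo "$?"
   ) &
done
wait   # wait for all background subshells to finish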

There are several problems with this simple approach though:

  1. You cannot easily limit the number of curls that run simultaneously, there is no queue. This can be very bad in terms of performance and available resources.
  2. Concurrent output from two or more commands (e.g. from two or more curls) may get interleaved, possibly mid-line.
  3. Even if output from each command separately looks fine, curl or echo from another subshell may cut in between curl and its corresponding echo.
  4. There is no guarantee a subshell invoked earlier starts (or ends) printing before a subshell invoked later.

parallel

The right tool is parallel. The basic variant of the tool (from moreutils, at least in Debian) solves (1). It probably solves (2) in some circumstances; this is irrelevant anyway, because this variant does not solve (3) or (4).

GNU parallel solves all these problems.

  • It solves (1) by design.

  • It solves (2) and (3) with its --group option:

    --group
    Group output. Output from each job is grouped together and is only printed when the command is finished. Stdout (standard output) first followed by stderr (standard error). […]

    (source)

    which is the default, so usually you don't have to use it explicitly.

  • It solves (4) with its --keep-order option:

    --keep-order
    -k
    Keep sequence of output same as the order of input. Normally the output of a job will be printed as soon as the job completes. […] -k only affects the order in which the output is printed - not the order in which jobs are run.

    (source)

In Debian GNU parallel is in a package named parallel. The rest of this answer uses GNU parallel.
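
If it's not installed yet, installing that package should be enough; on Debian or a derivative, something like this (assuming apt) will do:

sudo apt install parallel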


Basic solution

<urls parallel -j 40 -k 'curl -s --location -o /dev/null -w "%{response_code}\n" {}; echo "$?"'

where urls is a file with URLs, and -j 40 means we allow up to 40 parallel jobs (adjust it to your needs and resources). In this case it's safe to embed {} in the shell code; it's the exception explicitly mentioned in this answer: Never embed {} in the shell code!
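
If you don't have such a file yet, one way to build it for a quick test (here with the example hosts from the question) is:

printf '%s\n' https://a.com https://b.com https://c.com > urls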

The output will be like

404
0
200
0
000
7
…

Note that the single-quoted string is the shell code. Within it you can implement some logic, e.g. so that exit code 0 is never printed. If I were you, I would print it anyway, on the same line, in the leading position:

<urls parallel -j 40 -k '
   out="$(
      curl -s --location -o /dev/null -w "%{response_code}" {}
   )"
   printf "%s %s\n" "$?" "$out"'

Now even if some curl is manually killed before it prints, you will get something in the first column. This is useful for parsing (we'll return to it). Example:

0 404
0 200
7 000
…
143 
…

where 143 means curl was terminated (see Default exit code when process is terminated).


With arrays

If your URLs are in an array named urls, avoid this syntax:

parallel … ::: "${urls[@]}"    # don't

parallel is an external command. If the array is large enough, you will hit the "argument list too long" error. Use this instead:

printf '%s\n' "${urls[@]}" | parallel …

It will work because in Bash printf is a builtin and therefore everything before | is handled internally by Bash.

To get from the urls array to the a and rep arrays, proceed like this:

unset a
unset rep
while read -r repx ax; do
   rep+=("$repx")
   a+=("$ax")
done < <(printf '%s\n' "${urls[@]}" \
         | parallel -j 40 -k '
              out="$(
                 curl -s --location -o /dev/null -w "%{response_code}" {}
              )"
         printf "%s %s\n" "$?" "$out"')
printf '%s\n' "${a[@]}"
echo
printf '%s\n' "${rep[@]}"

Notes

  • If we generated exit codes in the second column (which is easier: you don't need a helper variable like out) and adjusted our read accordingly, so it's read -r ax repx, then a line <empty ax><space>143 would save 143 into ax, because read ignores leading spaces (it's complicated). By reversing the order we avoid that bug in our code: a line like 143<space><empty ax> is handled properly by read -r repx ax (see the small demo after these notes).

  • You will hopefully be able to check 400 URLs in a few minutes. The duration depends on how many jobs you allow in parallel (parallel -j …), but also on:

    • how fast the servers respond;
    • how much data and how fast curls download;
    • options like --connect-timeout and --max-time (consider using them).
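
To see the read behaviour from the first note in isolation, here is a small standalone demo (the values mimic the 143<space><empty response> case):

line='143 '                                   # exit code first, then an empty response column
read -r repx ax <<< "$line"
printf 'repx=[%s] ax=[%s]\n' "$repx" "$ax"    # repx=[143] ax=[]

line=' 143'                                   # reversed columns: empty response, then exit code
read -r ax repx <<< "$line"
printf 'ax=[%s] repx=[%s]\n' "$ax" "$repx"    # ax=[143] repx=[] – 143 lands in the wrong variable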