Bash – How do you stop ‘wget’ after it gets a 404?

bash, shell-script, wget

If you use brace expansion with wget, you can fetch sequentially numbered images with ease:

$ wget 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'

It fetches the first 10 files, 90.jpg to 99.jpg, just fine, but 100.jpg and onward return a 404: File not found error (I only have 100 images stored on the server). These non-existent files become more of a problem with a larger range such as {00..200}: the roughly 100 non-existent files increase the script's execution time and might even become a slight burden (or at least an annoyance) on the server.

Is there any way for wget to stop after it has received its first 404 error (or even better, two in a row, in case a file in the range is missing for another reason)? The answer does not need to use brace expansion; loops are fine too.

Best Answer

If you're happy with a loop:

for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
    wget "$url" || break
done

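That will run wget for each URL in your expansion until one fails, and then break out of the loop. Note that wget exits with a non-zero status on any failure, not only a 404, so a network error will also end the loop.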

If you want two failures in a row it gets a bit more complicated:

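failed=    # start empty so a value inherited from the environment can't trigger an early break
for url in 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'
do
    if wget "$url"
    then
        failed=        # success: reset the marker
    elif [ "$failed" ]
    then
        break          # second failure in a row: stop
    else
        failed=yes     # first failure: remember it and try the next URL
    fi
done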

You can shrink that a little with && and || instead of if, but it gets pretty ugly.
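
For comparison, here is a rough sketch of what that shortened form might look like. The names fetch and fetch_until_two_failures are made up for illustration and are not part of wget; wrapping wget in a small function just makes it easy to swap the downloader out.

```shell
# fetch is a hypothetical wrapper around wget, used only for this sketch.
fetch() { wget "$1"; }

# Try each URL in turn and stop after two consecutive failures,
# using && and || in place of the if/elif/else chain.
fetch_until_two_failures() {
    local failed= url
    for url in "$@"; do
        # success: clear the marker
        # failure: break if the previous URL also failed, otherwise set the marker
        fetch "$url" && failed= || { [ "$failed" ] && break; failed=yes; }
    done
}
```

It would be called as fetch_until_two_failures 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg'. It behaves like the if version, but the one-liner is harder to read, which is probably a good reason to keep the longer form.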

I don't believe wget has anything built in to do that.
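
If you want the loop to react only to missing files rather than to every kind of failure, one option is to look at wget's exit status: the wget manual documents status 8 as "Server issued an error response". That covers any HTTP error response (404, but also 403 or 500, for example), so treat the following as a sketch under that assumption. The function names fetch and fetch_until_missing, and the limit argument, are invented for this example.

```shell
# fetch is a hypothetical wrapper around wget; -q only silences its output.
fetch() { wget -q "$1"; }

# Usage: fetch_until_missing LIMIT URL...
# Stops quietly after LIMIT consecutive server error responses
# (wget exit status 8) and returns at once, with wget's status,
# on any other kind of failure such as a network error.
fetch_until_missing() {
    local limit=$1 misses=0 status url
    shift
    for url in "$@"; do
        status=0
        fetch "$url" || status=$?
        case $status in
            0) misses=0 ;;                            # success resets the count
            8) misses=$((misses + 1))                 # server said "error"
               [ "$misses" -ge "$limit" ] && break ;;
            *) return "$status" ;;                    # anything else: give up
        esac
    done
    return 0
}
```

Calling fetch_until_missing 2 'http://www.iqandreas.com/sample-images/100-100-color/'{90..110}'.jpg' would then tolerate a single gap in the numbering and stop at the second missing file in a row.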
