UPDATED:
I've found using this Bash script fixes the problem of having GIF files with a .jpg extension.
I'm attempting to download images from a forum whose URL uses the following format:
http://www.someforum.com/attachment.php&id=XXX
I wrote a Bash script that uses wget to retrieve these images:
for i in {1..10}
do
wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg"
done
It works and downloads the images. However, if there isn't an image, it still downloads the resulting HTML and stuffs it in XX.jpg.
Curl does the same:
for i in {1..10}
do
curl --cookie cookies.txt "http://www.someforum.com/attachment.php&id=${i}" -o "image${i}.jpg"
done
Is there any way to reject results that are not image/*? Right now I am assuming that the images are JPEG; it would be nice to detect the MIME type and use the appropriate filename.
Finally, wget reports a 500 response code when an image isn't found; if I can keep only the 200 responses, this may yield a solution.
Bash, Ruby, Python answers are acceptable.
Best Answer
wget returns a non-zero exit code on error; it specifically sets exit status == 8 if the remote issued a 4xx or 5xx status. So, you can modify your bash loop to unlink the file if wget doesn't exit with success:
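A minimal sketch of that loop body, assuming the URL and cookies.txt from the question. The helper name fetch_attachment is hypothetical, and the wget options are trimmed to the ones this cleanup needs:

```shell
#!/usr/bin/env bash
# Sketch: fetch one attachment with wget and remove the output file on failure.
# fetch_attachment is a hypothetical helper; URL and cookies.txt come from the question.
fetch_attachment() {
    local id="$1"
    local out="image${id}.jpg"
    # wget exits non-zero on error (8 for a 4xx/5xx response). Because -O names
    # the output file explicitly, a failed request can leave a stray file behind.
    if ! wget --quiet --load-cookies=cookies.txt -O "$out" \
        "http://www.someforum.com/attachment.php&id=${id}"; then
        rm -f -- "$out"
        return 1
    fi
}

# Usage: try ids 1..10 and report the ones that fail.
# for i in {1..10}; do fetch_attachment "$i" || echo "id $i: no image" >&2; done
```

Checking the exit status rather than parsing wget's output keeps the loop simple, and it also catches network errors, not only 500 responses.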
Similarly, curl has a --fail option, with which it won't save the file and returns exit status 22 when the HTTP status is >= 400.
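The curl variant can be sketched the same way. This version also reads the response's Content-Type through curl's --write-out '%{content_type}' variable to pick a file extension, which goes beyond what the answer states. It assumes the forum sends an accurate Content-Type header. The helper name fetch_image, the temporary file name, and the list of accepted types are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: fetch one attachment with curl, reject non-images, and name the
# file after its reported MIME type. fetch_image is a hypothetical helper.
fetch_image() {
    local id="$1"
    local tmp="attachment${id}.tmp"
    local ctype ext
    # --fail: exit 22 on HTTP >= 400 instead of saving the error page.
    # --write-out prints the response Content-Type to stdout after the transfer.
    if ! ctype=$(curl --silent --fail --cookie cookies.txt \
            --output "$tmp" --write-out '%{content_type}' \
            "http://www.someforum.com/attachment.php&id=${id}"); then
        rm -f -- "$tmp"
        return 1
    fi
    # Map the MIME type to an extension; anything that is not a known image is rejected.
    case "$ctype" in
        image/jpeg*) ext=jpg ;;
        image/png*)  ext=png ;;
        image/gif*)  ext=gif ;;
        *) rm -f -- "$tmp"; return 1 ;;
    esac
    mv -- "$tmp" "image${id}.${ext}"
}

# Usage: try ids 1..10 and report the ones that are rejected.
# for i in {1..10}; do fetch_image "$i" || echo "id $i: skipped" >&2; done
```

If the server's Content-Type turns out to be unreliable, inspecting the downloaded bytes instead (for example with the file utility) would be an alternative, at the cost of an extra step per file.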