OK.. all photos in the two albums have been retrieved. As for how: see the comments I made, together with michail's remarks, and the walkthrough below.
There are two albums..
http://www.fotka.pl/profil/AlekSanDraa2601/albumy/
one has 100 photos, the other 63 photos.
Here's the one with 100 of them
http://www.megaupload.com/?d=30RWXKN9
Here's the album with 63 of them
http://www.megaupload.com/?d=CC27NM41
Taking the page source of the first album from here
http://www.fotka.pl/profil/AlekSanDraa2601/albumy/1,Ja/74892555
and extracting the image URLs: all the thumbnails end in _72_p.jpg.
We don't want the thumbnails, we want the larger versions. Those require two changes in each URL: amin.fotka must become a.fotka, and _72_p must become _500_s.
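The URL transformation can be sketched with sed; the sample URL below is made up to show the pattern, not an actual photo URL:

```shell
# Hypothetical thumbnail URL following the site's naming pattern
echo "http://amin.fotka.pl/zdjecia/123/456_72_p.jpg" \
  | sed -e "s/_72_p/_500_s/" -e "s/amin\.fotka/a.fotka/"
# prints: http://a.fotka.pl/zdjecia/123/456_500_s.jpg
```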
The same applies to the second album. For example, for the second album with 63 photos
http://www.fotka.pl/profil/AlekSanDraa2601/albumy/2,Fotki_z_2011-09-25/75893982,,1319485161
here is blist3.txt, a list of all the JPGs in _72_p form:
http://pastebin.com/raw.php?i=Y2nXfAXT
You can get that with a line like this..
C:\>type source.txt | grep -oE "http://.*?\.jpg" >urls
then edit the output to remove any miscellaneous parts: HTML attributes, obvious things that shouldn't be there.
Or use this line instead, which is better and should grab them all with nothing extraneous to remove (grep's ERE has no lazy .*?, so the first pattern can over-match on a line; [^ ]* stops at the first space):
C:\>type source.txt | grep -oE "http://[^ ]*\.jpg" >urls
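To see what the second pattern pulls out, here is a quick check against a made-up line of page markup (the img tag is hypothetical, shown in Unix shell form):

```shell
# Hypothetical line of album page source
echo '<img src="http://amin.fotka.pl/z/1_72_p.jpg" alt="x">' \
  | grep -oE "http://[^ ]*\.jpg"
# prints: http://amin.fotka.pl/z/1_72_p.jpg
```

Note the match stops at .jpg and does not drag in the closing quote or the attributes after it.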
That gives more URLs than you want: for the second album the command finds 97, but you only want the ones with _72_p in the URL.
So filter with | grep "72_p"
and you get a list of just the photos you want.
C:\>type list.txt | wc -l
63
See, there are 63 lines in that file, the right number: that is all of them in that album, all 63.
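On a synthetic list, the filter-and-count step looks like this (the URLs are invented, mixing relevant and irrelevant entries):

```shell
# Synthetic mix of thumbnail (_72_p) and unrelated JPG URLs
printf '%s\n' \
  "http://amin.fotka.pl/z/1_72_p.jpg" \
  "http://example.com/banner.jpg" \
  "http://amin.fotka.pl/z/2_72_p.jpg" \
  | grep -c "72_p"
# prints: 2
```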
wget -i list.txt -w 3
So that's all of them, all 163 (100 + 63) of them, from the two albums.
This is the line one would use to process a list of the JPGs.
listps2.txt is a file with all the JPG URLs, both relevant and irrelevant ones. The relevant ones are in _72_p form: extract them with grep, transform them with sed, put them in "thatfile", and then run wget -i thatfile -w 3, as I did.
C:\>type listps2.txt | grep "72_p" | sed "s/_72_p/_500_s/" | sed "s/amin\.fotka/a.fotka/" >thatfile
C:\>wget -i thatfile
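A dry run of the same pipeline on a synthetic listps2.txt (Unix shell form, invented URLs) shows what ends up in thatfile before pointing wget at it:

```shell
# Build a small fake listps2.txt with relevant and irrelevant URLs
printf '%s\n' \
  "http://amin.fotka.pl/z/1_72_p.jpg" \
  "http://example.com/banner.jpg" \
  "http://amin.fotka.pl/z/2_72_p.jpg" > listps2.txt
# Filter, transform, and save the wget-ready list
grep "72_p" listps2.txt | sed "s/_72_p/_500_s/" | sed "s/amin\.fotka/a.fotka/" > thatfile
cat thatfile
# prints:
# http://a.fotka.pl/z/1_500_s.jpg
# http://a.fotka.pl/z/2_500_s.jpg
```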
I think what you are looking for is the --cut-dirs option. Used in conjunction with the -nH (no host directories) option, it lets you specify exactly how many leading directory levels appear in your local output.
As an example, I have a .pkg download that I want to write to my local directory, and I don't want the whole parent tree included, just the subdirectories. In this case, the .pkg name is 5 levels down, so cutting 5 directories makes it the parent directory locally:
wget -np -nH --cut-dirs 5 -r http://www.myhost.org/pub/downloads/My_Drivers/OS_10_5_x/Letter_Format/driver_C123_105.pkg
What you will see, then, is the name driver_C123_105.pkg in your current directory:
% ls -lt | head
drwxr-xr-x 12 rob rob 408 Feb 22 12:54 driver_C123_105.pkg
-rw-------@ 1 rob rob 0 Feb 16 15:59 1kPSXcUj.pdf.part
-rw-------@ 1 rob rob 842 Feb 3 14:47 WcUuL69s.jnlp.part
[...etc...]
% find driver_C123_105.pkg
driver_C123_105.pkg
driver_C123_105.pkg/Contents
driver_C123_105.pkg/Contents/Archive.bom
driver_C123_105.pkg/Contents/Archive.pax.gz
driver_C123_105.pkg/Contents/index.html
driver_C123_105.pkg/Contents/index.html?C=D;O=A
driver_C123_105.pkg/Contents/index.html?C=D;O=D
driver_C123_105.pkg/Contents/index.html?C=M;O=A
driver_C123_105.pkg/Contents/index.html?C=M;O=D
driver_C123_105.pkg/Contents/index.html?C=N;O=A
driver_C123_105.pkg/Contents/index.html?C=N;O=D
driver_C123_105.pkg/Contents/index.html?C=S;O=A
driver_C123_105.pkg/Contents/index.html?C=S;O=D
driver_C123_105.pkg/Contents/Info.plist
driver_C123_105.pkg/Contents/PkgInfo
driver_C123_105.pkg/Contents/Resources
driver_C123_105.pkg/Contents/Resources/background.jpg
[.....etc....]
You can direct this output to go elsewhere with the -P option.
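The directory-stripping arithmetic can be checked without wget; this just splits the remote path from the example above the way -nH --cut-dirs 5 would:

```shell
# Remote path as wget would mirror it (hostname already dropped by -nH)
path="pub/downloads/My_Drivers/OS_10_5_x/Letter_Format/driver_C123_105.pkg"
# --cut-dirs 5 removes the first five directory components
echo "$path" | cut -d/ -f6-
# prints: driver_C123_105.pkg
```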
I want to assume you've not tried this:
or, to retrieve the content without downloading the "index.html" files:
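The actual command lines did not survive the copy here; a sketch of what they would typically look like, using a placeholder URL and standard wget flags (not necessarily the original answer's exact invocation):

```shell
# Recursive fetch of one directory, without climbing to parent dirs:
wget -r -np -nH --cut-dirs=3 http://example.com/a/b/c/
# Same, but reject the auto-generated "index.html" listing files:
wget -r -np -nH --cut-dirs=3 -R "index.html*" http://example.com/a/b/c/
```

These are network commands with no fixed output to assert on; the flags (-r, -np, -nH, --cut-dirs, -R) are all standard GNU Wget options.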
Reference: Using wget to recursively fetch a directory with arbitrary files in it