I have read the Wget manual, but unfortunately it does not seem to address my issue, so I would be most grateful if someone could offer me a bit of assistance.
We have a website, (say) website.com, which links directly to (say) website.com/1/, website.com/2/, … etc.
Now each page website.com/r/, where r is an integer, links to a number of pdf documents. Rather than them being located at website.com/r/doc-i.pdf – which would be convenient – they are all located at website.com/files/doc-i.pdf.
Thus, when I run the command wget -r -l 2 -A pdf website.com, I of course end up with one big folder named "files", with all the pdf documents inside it.
I would much prefer, however, that they be organised into different folders named 1, 2, …, n, each corresponding to the page from which the files were downloaded. Since I will be downloading around 10,000 pdf files in total, I would rather not have to do this manually.
So how do I tell Wget to organise the files not by the website's directory structure, but by the route it took to reach each file?
I hope my explanation is clear, and that this is not too difficult to achieve.
Best Answer
(Untested) The following needs some tuning and is just the general idea: download each numbered page into its own directory, then run

    mv "$b/website.com/files" "FINAL/$b"

for each page number $b to reduce the directory levels, so that the PDFs reached via page $b end up under FINAL/$b.
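A minimal sketch of that idea, assuming the pages really are website.com/1/ through website.com/N/ and that each page links to PDFs under website.com/files/; N, the FINAL output directory, and the exact wget options are placeholders to adjust:

```bash
#!/bin/bash
# Sketch only: crawl each numbered page into its own working directory,
# then move the downloaded files/ directory into FINAL/<page-number>.
# N and FINAL are assumed placeholders; adjust them to the real site.

N=10                 # assumed number of numbered pages
mkdir -p FINAL

for b in $(seq 1 "$N"); do
    # -r -l 1 : follow links one level deep from the page itself
    # -A pdf  : keep only the PDF documents
    # -P "$b" : store this page's download tree under ./<page-number>/
    wget -r -l 1 -A pdf -P "$b" "http://website.com/$b/"

    # wget recreates the site layout, so the PDFs land in
    # <page-number>/website.com/files/; move that whole directory
    # to FINAL/<page-number> to reduce the directory levels.
    if [ -d "$b/website.com/files" ]; then
        mv "$b/website.com/files" "FINAL/$b"
    fi

    rm -rf "$b"      # drop the now-empty wget working directory
done
```

Keeping -l at 1 means each crawl only follows links directly on page $b, so if the numbered pages link to one another, one page's crawl should not pull in another page's PDFs.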