wget has an option, -np, which disables fetching files from any parent directory. I need something similar but a bit more flexible. Consider:
www.foo.com/bar1/bar2/bar3/index.html
I would like to get everything, but nothing "higher" (in the tree hierarchy) than bar2. So bar2 should also be fetched, but not bar1.
Is there a way to make wget more selective?
Background: I'm trying to mirror a website with a similar logical structure: starting point, then up, then down. If there is another tool better suited to such a layout than wget, please let me know as well.
Update
Or, instead of specifying how far up to go, maybe something like "no parents, unless they match this or that URL".
Update 2
There is some structure on the server, right? You can visualize it as a tree. So normally, with "--no-parent", you start from some point A and go only down.
My wish is the ability to go up, expressed either by saying that it is allowed to go up X nodes or (which is 100% equivalent) that it is allowed to go up to node B (where the distance B-A = X).
In all cases, the rules for going down stay as defined by the user (for example, go down only Y levels).
How to store it? That is not really the question: wget by default recreates the server structure, so there is nothing to be afraid of here and nothing that needs fixing. So, in two words: as usual.
Update 3
The directory structure is below. Let's assume that each directory contains only one file: R.html in R, and so on. This is simplified, of course, because you can have more than one page.
        R
       / \
      B   G
     / \
    C   F
   / \
  A   D
 /
E
A (A.html) is my starting point and X = 2 (so B is the topmost node I would like to fetch). In this particular example this means fetching all pages except R.html and G.html. A.html is called the "starting point" because I have to start from it, not from B.
Update 4
The naming from Update 3 is used.
wget OPTIONS www.foo.com/B/C/A/A.html
The question is: what are the options to get all pages from directory B and below (knowing that you have to start from A.html)?
Best Answer
I haven't tried it, but using -I and -X could give you what you want. My first tries would be along the lines of
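As a rough, untested sketch of that idea for the tree in Update 3: the script below derives the topmost allowed directory from the starting page and the number of levels X, then builds a wget command that restricts recursion to it with -I. The variable names (start_host, start_path, levels_up, top_dir, cmd) are made up for illustration, the host and paths are the ones from the question, and the command is only printed, not run, so it can be checked before trying it against a real server.

```shell
#!/bin/sh
# Hypothetical values taken from the question's Update 3 and Update 4.
start_host="www.foo.com"
start_path="/B/C/A/A.html"
levels_up=2                      # X: how many nodes above A may be fetched

# Directory that holds the starting page: /B/C/A
top_dir=$(dirname "$start_path")

# Climb X levels: /B/C/A -> /B/C -> /B
i=0
while [ "$i" -lt "$levels_up" ]; do
    top_dir=$(dirname "$top_dir")
    i=$((i + 1))
done

# -r  recursive retrieval
# -I  comma-separated list of directories that may be followed
# Assumption: without -np, limiting the include list to $top_dir is enough
# to keep wget out of R and G while still letting it climb from A up to B.
cmd="wget -r -I $top_dir $start_host$start_path"

# Print the command instead of running it.
echo "$cmd"
# prints: wget -r -I /B www.foo.com/B/C/A/A.html
```

If some subtree below B should be skipped as well, -X takes a similar comma-separated list of directories to exclude; whether the two combine the way you need here is something to verify on the actual site.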
Explanation of options: