How to stop wget from fetching files from parent directories above a given depth

mirror, tree, wget

wget has the -np option, which prevents it from fetching files from any parent directory. I need something similar but a bit more flexible. Consider:

www.foo.com/bar1/bar2/bar3/index.html

I would like to get everything, but nothing "higher" (in the tree hierarchy) than bar2 (!). So bar2 should also be fetched, but not bar1.

Is there a way to make wget more selective?

Background: I'm trying to mirror a website with a similar logical structure: a starting point, then up, then down. If there is a tool other than wget that is better suited to such a layout, please let me know as well.

Update

Or, instead of specifying how far up wget may go, maybe something like "no parents, unless they match this or that URL".

Update 2

There is some structure on the server, right? You can visualize it as a tree. Normally, with "--no-parent", you start from some point A and go only down.

My wish is the ability to go up, expressed either by saying that it is allowed to go up X nodes or (which is 100% equivalent) that it is allowed to go up to node B (where the distance B-A=X).
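To make the equivalence concrete, here is a minimal sketch that derives node B from the starting URL and X. The helper name `ancestor_dir` is hypothetical (it is not a wget feature), and the sketch assumes the URL path mirrors the directory tree and ends in a file name.

```shell
#!/bin/sh
# ancestor_dir URL LEVELS
# Hypothetical helper: print the directory that lies LEVELS levels above
# the directory containing the file named by URL.
ancestor_dir() {
    url=$1
    levels=$2
    # Strip an optional scheme, then the host, keeping only the path.
    path=${url#*://}
    path=/${path#*/}
    # Drop the file name to get the starting directory.
    dir=${path%/*}
    # Remove one trailing path component per level.
    i=0
    while [ "$i" -lt "$levels" ] && [ -n "$dir" ]; do
        dir=${dir%/*}
        i=$((i + 1))
    done
    # An empty result means the server root was reached.
    printf '%s\n' "${dir:-/}"
}

# Going up 1 level from bar3 gives the bar2 directory.
ancestor_dir www.foo.com/bar1/bar2/bar3/index.html 1
```

The printed path is the "B" of the second formulation, so a tool only needs to support one of the two forms.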

In all cases, the rules for going down stay as defined by the user (for example, go down only Y levels).

How to store it? That is not really a question: by default wget recreates the server's directory structure, and there is nothing to worry about or fix here. So, in two words: as usual.

Update 3

The directory structure is shown below. Let's assume that each directory contains only one file: R contains R.html, and so on. This is a simplification, of course, because a directory can hold more than one page.

        R 
       / \
      B   G
     / \
    C   F
   / \
  A   D
 /
E 

A (A.html) is my starting point and X = 2 (so B is the topmost node I would like to fetch). In this particular example, this means fetching all pages except R.html and G.html. A.html is called the "starting point" because I have to start from it, not from B.
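The expected result can be spelled out as a small scope check. This is only an illustration of the rule, not something wget runs: the helper `in_scope` is hypothetical, and the paths assume that R is the server root and that each directory name appears as a path component.

```shell
#!/bin/sh
# in_scope PAGE TOP: succeed when PAGE lies inside directory TOP.
in_scope() {
    case $1 in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

top=/B   # topmost directory that may be fetched (A plus X = 2 levels up)
for page in /R.html /G/G.html /B/B.html /B/F/F.html /B/C/C.html \
            /B/C/D/D.html /B/C/A/A.html /B/C/A/E/E.html; do
    if in_scope "$page" "$top"; then
        echo "fetch $page"
    else
        echo "skip  $page"
    fi
done
```

Only R.html and G.html are reported as skipped; everything at or below B is fetched, including siblings such as F and D that are not ancestors of A.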

Update 4

Naming is used from Update 3.

wget OPTIONS www.foo.com/B/C/A/A.html

The question is: which options fetch all pages from directory B and below (given that you have to start from A.html)?

Best Answer

I haven't tried it, but using -I and -X could give you what you want. My first attempt would be along the lines of:

wget -m -I bar1/bar2 -X "*" http://www.foo.com/bar1/bar2/bar3/index.html

Explanation of options:

-m
--mirror
       Turn on options suitable for mirroring.  This option turns on recursion and time-stamping, sets
       infinite recursion depth and keeps FTP directory listings.  It is currently equivalent to -r -N -l
       inf --no-remove-listing.
-I list
--include-directories=list
       Specify a comma-separated list of directories you wish to follow when downloading.  Elements of
       list may contain wildcards.
-X list
--exclude-directories=list
       Specify a comma-separated list of directories you wish to exclude from download.  Elements of list
       may contain wildcards.
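Applied to the naming from Update 3 and Update 4, the same idea might look like the sketch below. It is equally untested, and it makes two assumptions: that the include list is written with a leading slash (as in the examples in the wget manual), and that -I on its own is enough to keep wget inside B, so -X is left out. The script only prints the command (a dry run), so it can be inspected before anything is downloaded.

```shell
#!/bin/sh
# Dry run: build and print the wget command instead of executing it.
start_url=http://www.foo.com/B/C/A/A.html   # starting point from Update 4
top_dir=/B                                  # topmost directory to fetch

# -m             mirror (recursion, time-stamping, infinite depth)
# -I "$top_dir"  follow only directories listed here
cmd="wget -m -I $top_dir $start_url"
echo "$cmd"
```

If the printed command looks right, paste it into the shell to run it, and check on a small site first that R.html and G.html are really skipped.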