I'm not sure which version of wget or OS you have, or whether any proxies exist between you and SourceForge, but wget downloaded the file when I removed the "/download" and left the URL ending at the file extension. I don't want to flood the post or pastebin my entire session, but I got the 302 then 200 status codes before the transfer began. What happens when you try wget with the direct URL?
Resolving downloads.sourceforge.net... 216.34.181.59
Connecting to downloads.sourceforge.net|216.34.181.59|:80... connected.
HTTP request sent, awaiting response... 302 Found
[snipped for brevity]
HTTP request sent, awaiting response... 200 OK
Length: 13432789 (13M) [application/x-gzip]
Saving to: `download'
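For illustration, the shape of URL that worked (the project and file names here are placeholders; the point is to end the URL at the archive itself rather than at /download):

# PROJECT and FILE are stand-ins for the real SourceForge project and archive
wget 'http://downloads.sourceforge.net/project/PROJECT/FILE.tar.gz'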
-A "Sample" kind of acts like a Sample* would in bash
Not by my reading of man wget
:
-A acclist --accept acclist
-R rejlist --reject rejlist
    Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix.
So your usage (no wildcards) is treated as a suffix, which is equivalent to the bash glob *Sample.
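For instance (the host and path are hypothetical), these two invocations filter identically, because a bare entry is matched as a trailing suffix while the starred form is matched as a glob pattern:

# example.com/data/ is a stand-in for the real site
wget -r -np -A "Sample" http://example.com/data/
wget -r -np -A "*Sample" http://example.com/data/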
Wget works by scanning links, which is probably why it is trying to download an index.html (you haven't said what the content of that is, if any, just that it took a long time) -- it has to have somewhere to start. To explain further: a URL is not a file path. You cannot scan a web server as if it were a directory hierarchy, saying, "give me all the files in directory foobar". If foobar corresponds to a real directory (it certainly doesn't have to, because it's part of a URL, not a file path), a web server may be configured to provide an autogenerated index.html listing the files, providing the illusion that you can browse the filesystem. But that's not part of the HTTP protocol, it's just a convention used by default with servers like Apache. So what wget does is scan, e.g., index.html for <a href= and <img src=, etc., then it follows those links and does the same thing, recursively. That's what wget's "recursive" behaviour refers to -- it recursively scans links because (to reiterate) it does not have access to any filesystem on the server, and the server does not have to provide it with ANY information about one.
If you have an actual .html web page that you can load and click through to all the things you want, start with that address, and use just -r -np -k -p.
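For example, a minimal sketch (the address is a placeholder):

# -r follows links recursively, -np refuses to ascend above the starting
# directory, -k converts links in saved pages so they work locally, and
# -p also fetches page requisites such as images and stylesheets
wget -r -np -k -p http://example.com/docs/index.html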
Best Answer
I think your ? gets interpreted by the shell (Correction by vinc17: more likely, it's the & which gets interpreted). Just try with single quotes around your URL:
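Something along these lines (the host and path are assumed from the GEO accession in the query string; substitute the exact URL from the question):

# host/path assumed; use the exact URL from the question
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'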
Note that the file you are requesting is a .tar file, but the above command will save it as index.html?acc=GSE48191&format=file. To have it correctly named, you can either rename it to .tar:
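For example (the quotes again protect the ? and & from the shell):

mv 'index.html?acc=GSE48191&format=file' GSE48191.tar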
Or you can give the name as an option to wget:
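Again assuming the same URL as above:

# -O names the output file directly
wget -O GSE48191.tar 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'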
The above command will save the downloaded file as GSE48191.tar directly.