Stop wget reusing existing connection

networking wget

I am trying to download a specific webpage with wget, using this command in a Bash script:

wget --no-cookies --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -O "$2/content.html" "$1"

The result is that I get the site's bot page, I think because wget is reusing the existing connection. The command worked before I spam-tested it; now my server gets a bot-test redirect from the site, which I can't use.

--2017-12-12 19:16:42--  https://www.kayak.co.uk/h/bots/human-redirect.vtl?url=%2Fflights%2FDUB-LAX%2F2018-06-04%2F2018-06-25%2F2adults%3Fsort%3Dbestflight_a
Reusing existing connection to [www.kayak.co.uk]:443.
HTTP request sent, awaiting response... 200 OK

My question is: is there any way to stop wget from reusing the existing connection, so that it reconnects to the site for each download?

Best Answer

I know this is an old issue, but perhaps this will help others who come across it as I have.

To disable the "keep-alive" feature, use the --no-http-keep-alive argument.

From the man page:

Turn off the "keep-alive" feature for HTTP downloads. Normally, Wget asks the server to keep the connection open so that, when you download more than one document from the same server, they get transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.

This argument is typically needed when each request has to be new and clean. Although not strictly related, the --no-cache and --no-cookies arguments might also be relevant wherever --no-http-keep-alive is used.

So the OP's command would probably be:

wget --no-http-keep-alive --no-cache --no-cookies --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -O "$2/content.html" "$1"
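
If the command is called repeatedly from a script, it may be easier to wrap it in a function. This is only a minimal sketch: the function name fetch_page and the WGET environment variable (an override for the wget binary, handy for a dry run with echo) are my own assumptions and not part of wget itself. The quotes around the arguments keep a URL containing ? or & from being mangled by the shell.

```shell
# fetch_page URL OUTDIR
# Hypothetical helper: saves URL as OUTDIR/content.html.
# WGET is an assumed override for the wget binary; it defaults to plain wget.
fetch_page() {
  # $1 = URL to download, $2 = directory that receives content.html
  "${WGET:-wget}" \
    --no-http-keep-alive \
    --no-cache \
    --no-cookies \
    --header="Accept: text/html" \
    --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" \
    -O "$2/content.html" \
    "$1"
}
```

Inside the OP's script it could then be called as fetch_page "$1" "$2". Running it as WGET=echo fetch_page "$1" "$2" prints the arguments instead of contacting the site.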