Ubuntu – How to Fake Host and Multithread Download of One File?


I would like to download the 227 GB data file linked here, but the download currently takes me about 20-24 hours. The torrent protocol and/or a parallel download would be awesome for the task, but the host limits the number of connections to one (1).

  • Can you fake the host and multithread the task with wget here? … I think not easily …
    Pseudocode where I think disabling directories and host directories makes sense; turning robots off; accepting only the .bin file; identifying my browser as Mozilla; downloading to the directory /tmp/; and setting the number of threads to 150

    wget -n 150 -nd -nH -e robots=off -A".bin" -U mozilla \
        -P /tmp/ http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
  • Exclude prozilla, since it is not available via apt-get


  • Host limits the number of connections to one (1).
  • I set 150 connections in axel because my maximum download speed is 30 Mbit/s, so I should get 15 MB/s (= 0.1 MB/s × 150), yet the estimated time at the start is still 21 h.
  • VPN attempt: only if the host starts to blacklist the user.
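For what it's worth, 150 connections cannot push the aggregate throughput past the link itself. A rough shell calculation (a sketch; it assumes the plan really is 30 Mbit/s downstream, as the answer below also suspects):

```shell
# Back-of-envelope: the link, not the connection count, caps throughput.
SIZE_BYTES=243615796224                        # file size the server reports
LINK_MBIT=30                                   # assumed downlink: 30 Mbit/s
BYTES_PER_SEC=$(( LINK_MBIT * 1000000 / 8 ))   # 3750000 B/s, about 3.6 MiB/s
HOURS=$(( SIZE_BYTES / BYTES_PER_SEC / 3600 ))
echo "best case on a saturated link: ${HOURS} h"   # 18 h, close to the observed 20-24 h
```

Under that assumption, a single saturated connection already needs roughly 18 hours, so the observed 20-24 h is not far from the physical limit.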

Exclude axel from the task

Axel fails even with 1-16 connections when the progress bar (-a) is also requested. (Note that in axel, -n sets the number of connections while -s caps the maximum speed in bytes per second, so -s 16 would throttle the transfer anyway.)

# http://www.cyberciti.biz/tips/download-accelerator-for-linux-command-line-tools.html
axel -a -n 1 -s 16 http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
Initializing download: http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
File size: 243615796224 bytes
Opening output file tiny_images.bin
Error opening local file
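"Error opening local file" points at the local filesystem rather than the host. A pre-flight sketch before blaming the server: check that the target directory is writable and has room for the full file (the 227 GB figure is the approximate file size from the log above):

```shell
# Pre-flight check: is the download directory writable, and is there space?
DEST=.            # axel writes into the current directory by default
NEED_GB=227       # approximate size of tiny_images.bin
WRITABLE=no
test -w "$DEST" && WRITABLE=yes
FREE_GB=$(df -BG --output=avail "$DEST" | tail -n 1 | tr -dc '0-9')
echo "writable=$WRITABLE free=${FREE_GB}G needed=${NEED_GB}G"
```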

Exclude aria2c from the task

  • The download fails even with 2-16 connections (the log below shows the cause is actually a local permission error, not the host); -c allows continuation of an interrupted download, and -x 10 / -s 10 allow up to 10 connections per server

    # http://askubuntu.com/a/507890/25388
    aria2c -c -x10 -s10 http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
    08/17 21:27:25 [ERROR] CUID#6 - Download aborted. URI=http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
    Exception: [AbstractCommand.cc:398] errorCode=16 URI=http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
      -> [RequestGroup.cc:714] errorCode=16 Download aborted.
      -> [AbstractDiskWriter.cc:222] errNum=13 errorCode=16 Failed to open the file /media/masi/SamiSwapVirtual/tiny_images.bin, cause: Permission denied
    08/17 21:27:25 [NOTICE] Download GID#1e5701ee3b4d44f4 not complete: /media/masi/SamiSwapVirtual/tiny_images.bin
    Download Results:
    gid   |stat|avg speed  |path/URI
    1e5701|ERR |       0B/s|/media/masi/SamiSwapVirtual/tiny_images.bin
    Status Legend:
    (ERR):error occurred.
    aria2 will resume download if the transfer is restarted.
    If there are any errors, then see the log file. See '-l' option in help/man page for details.
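The errNum=13 above is EACCES: aria2 could not write to /media/masi/SamiSwapVirtual, so the host is not at fault in this run. A hedged sketch that only starts the download once the directory is actually writable (the chown hint assumes an ext4 drive mounted by root, which is a guess):

```shell
# errNum=13 is EACCES: aria2 could not write to the destination directory.
DEST=/media/masi/SamiSwapVirtual
URL=http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin
if [ -w "$DEST" ]; then
    # -d sets the download directory, -o the output file name
    aria2c -c -x10 -s10 -d "$DEST" -o tiny_images.bin "$URL"
else
    echo "not writable: $DEST  (try: sudo chown \$USER \"$DEST\")"
fi
```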

Ubuntu: 16.04 64 bit
LTE router: TP-link MR220 with the latest firmware
LTE connection: 30/20 Mbit/s for download/upload
Download HDD: 2 TB ext4 Transcend

Best Answer

Unfortunately, what you are asking is pretty much impossible. You can't force a server to allow multiple connections.

If they limit the number of connections based on IP then you would need to send a different IP for each connection. At that point, you would need a program that can merge multiple parts of the same file from different computers with parts created on-the-fly. This should be technically possible but certainly not practical because you'd have to find such a program and, to my knowledge, such a program does not exist.

There are only two choices I can think of:

  1. Download with the rate allowed.
  2. Transload the file from that server to another server and then download from that server.
    • This is the fastest option, but it preferably uses an existing server in a datacenter.
    • This is not a practical option for most people.
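Option 2 could look like the sketch below. The server name is hypothetical and the remote commands are left commented out so nothing runs by accident; rsync's --partial keeps an interrupted pull resumable:

```shell
# Transload sketch: VPS is a hypothetical rented server with a fast pipe.
VPS=user@vps.example.com
URL=http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin

# 1) Fetch on the server at datacenter speed:
#      ssh "$VPS" "wget -c '$URL'"
# 2) Pull the file home, resumable:
#      rsync --partial --progress "$VPS:tiny_images.bin" /media/masi/SamiSwapVirtual/
echo "transload via $VPS"
```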

With that said, I tested the download myself and it provides the absolute maximum speed that my internet plan can handle with only one connection.

I have a 60 Mbit connection, so I can download files at between 5.5 and 7.5 MB/s (megabytes per second). This download from NYU offered 7.2 MB/s, which is totally reasonable and more than my average top speed of 6.5 MB/s. This means that downloading at my speed would take about 10 hours.

I suspect that you have a 30 Mbit connection, resulting in twice as much time, so my estimate is that the bottleneck is on your end, not at NYU.
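The "about 10 hours" figure checks out against the file size axel reported earlier:

```shell
SIZE_BYTES=243615796224   # file size axel reported for tiny_images.bin
SPEED_BPS=7200000         # 7.2 MB/s observed from NYU
HOURS=$(( SIZE_BYTES / SPEED_BPS / 3600 ))
echo "${HOURS} h"         # 9 h, i.e. roughly the quoted 10 hours
```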
