Is it possible to get a corrupted download with HTTP?

download http

For a long time I assumed that it is not really possible to download a corrupted file via HTTP, as long as the file is not corrupted on the server and the implementation of the HTTP protocol is correct, which is most likely the case for modern mainstream software.

So I always chuckled when I saw a download site offer an MD5 hash of a file it provides for download. I had never seen a case where I downloaded a file and the size was correct but the content was not.

Well, today I had my first case of this. I downloaded an ISO of Ubuntu and tried to install it, and the installation failed. After a lot of research (I just could not believe that the reason could be a corrupted download) I checked the MD5 and, what do you know, it was wrong (the size was correct). So I re-downloaded it and got yet another wrong MD5. Only on my third download was the MD5 correct.

So my question is: is it possible in principle to get a corrupted download over HTTP, assuming that the implementation is correct, the transfer finished successfully, and the file is correct on the server? If this is possible, how can it happen?

Best Answer

Yes, it's possible, especially on poor-quality Internet connections. These are usually wireless, but some wired connections (such as the one I have) also have high error rates at high speeds.

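The HTTP protocol does not have any provisions for ensuring data integrity. On the transport layer, TCP does detect errors with a checksum, but that checksum is only a simple 16-bit sum, so it is not very reliable: some corruption patterns (for example, two 16-bit words swapped, or errors that cancel each other out) leave the checksum unchanged and pass undetected.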
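To see why a 16-bit sum is weak, here is a minimal sketch of the one's-complement checksum that TCP uses (in the style of RFC 1071). It is simplified: real TCP also sums a pseudo-header and the TCP header, and the payload bytes below are made up for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Hypothetical payloads, chosen only to demonstrate the weakness.
original = b"\x12\x34\xab\xcd\x00\x01"
swapped = b"\xab\xcd\x12\x34\x00\x01"  # first two 16-bit words exchanged
offset = b"\x12\x35\xab\xcc\x00\x01"   # one word +1, another word -1

print(internet_checksum(original) == internet_checksum(swapped))
print(internet_checksum(original) == internet_checksum(offset))
```

Both comparisons come out equal even though the data differs, because addition does not care about the order of the words, and a +1 in one word is cancelled by a -1 in another.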


There is another reason for providing hashes or digital signatures. Often, the actual files are distributed over many mirror servers, which cannot be guaranteed to be 100% secure. If there's no hash or signature to verify, someone with access to a mirror (legitimate or not) could replace the files and remain undetected, without having to break into the completely different server where the website is hosted.
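Checking a download against a published hash can be scripted. This is only a sketch: the function names are my own, and the expected value has to be copied from the project's main site, not from the mirror. MD5 is fine for catching accidental corruption, but it does not protect against deliberate tampering, so prefer SHA-256 when the site publishes it.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, expected_hex: str,
                      algorithm: str = "sha256") -> bool:
    """Compare the file's digest with the hex string published by the site."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches_published("ubuntu.iso", "<hash from the website>", "md5")` would return False for the two bad downloads in the question (the file name here is a placeholder).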


You can get automatic verification of files if you download Ubuntu over BitTorrent instead of HTTP. (Each piece is verified at download time, so you never have to re-download the entire thing.)
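The idea behind per-piece verification can be sketched as follows. This is an illustration of the principle, not BitTorrent's actual code: the piece size is tiny for demonstration, the data is invented, and the function names are hypothetical. (Classic BitTorrent does store a SHA-1 hash for each piece in the .torrent file.)

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny for the demo, real pieces are far larger


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Split data into fixed-size pieces and hash each one separately."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def corrupted_pieces(received: bytes, expected: list,
                     piece_size: int = PIECE_SIZE) -> list:
    """Return the indexes of pieces whose hash does not match."""
    actual = piece_hashes(received, piece_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


good = b"ubuntu-iso-contents!"  # stand-in for the real file
expected = piece_hashes(good)   # what the .torrent file would carry

damaged = bytearray(good)
damaged[9] ^= 0x01              # flip one bit, as a bad link might
print(corrupted_pieces(bytes(damaged), expected))
```

Only the pieces listed need to be fetched again, which is why a single flipped bit does not cost you the whole download.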
