MD5 and SHA1 checksum uses for downloading

I've noticed that many open source tools (Eclipse, etc.) offer links to MD5 and SHA1 checksums alongside their downloads, and I didn't know what these were or what their purpose was.

I know these are hashing algorithms, and I do understand hashing, so my only guess is that these are used for hashing some component of the download targets, and to compare them with "official" hash strings stored server-side. Perhaps that way it can be determined whether or not the targets have been modified from their correct version (for security and other purposes).

Am I close or completely wrong, and if wrong, what are they?!?!

Thanks!

Best Answer

You're almost completely right. The only correction is that they are hashes of the whole file.

Files can sometimes be corrupted during download, whatever method is used to transfer them. Hashes are there to make sure that the file is intact. This is especially useful to users with bad Internet connections. Back when I was using a fax modem, I'd often have problems with corrupt downloads.

Some download managers (like GetRight, if I remember correctly) can even automatically calculate the hash of the file and compare it to a known value.

Another interesting point is security. A potential problem with open source tools is how much you can trust the distributor. Programs such as Eclipse are often the main tool used by software companies, so it is extremely important that they travel from the developer to the user intact. Since the programs are open source, it is possible, for example, to build an infected version that looks normal but leaks source code to some remote server, infects programs built with the tool with a virus (I think this actually happened to some version of Delphi), or something similar. For that reason, it is important to have an official, correct hash that can be used to check whether the distributed file is what it claims to be.

Some thoughts about distribution channels. Free software can often be found on a large number of sites, and the most popular sites, like SourceForge, have a large number of mirrors. Let's say there's a server in Barland which mirrors a large software distribution site. FooSoft uses a program distributed by that site, and they are in the Republic of Baz, which is right next to Barland. If someone wanted to infiltrate FooSoft, he could modify just the copy on the Barland mirror and hope that geolocation software would then route FooSoft to the modified version. Since the versions on other mirrors are fine, the chances that the malware would be detected are lower. You could also make the malware check the computer's IP address and activate only if it's in a certain range, lowering the chances of discovery further, and so on.
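The mirror scenario suggests one simple cross-check: download the same release from two or more mirrors and confirm that all copies hash to the same value. Below is a minimal sketch, assuming GNU coreutils `sha256sum`; the function name `all_copies_match` is hypothetical. It does not replace checking against the official hash, since it only detects mirrors that disagree with each other.

```shell
# all_copies_match FILE1 FILE2 [FILE3 ...]
# Hypothetical helper: exits 0 only if every given file has the same SHA-256,
# 1 if at least two copies differ, and 2 on bad arguments or unreadable files.
all_copies_match() {
    [ "$#" -ge 2 ] || return 2
    for f in "$@"; do
        [ -r "$f" ] || return 2
    done
    # Emit one hash per file, then count how many distinct values there are.
    distinct=$(for f in "$@"; do
                   sha256sum < "$f" | cut -d ' ' -f 1
               done | sort -u | wc -l)
    [ "$distinct" -eq 1 ]
}

# Example call (file names are placeholders):
#   all_copies_match tool-from-mirror-a.zip tool-from-mirror-b.zip || echo "copies differ"
```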

Related Solutions

A tool for getting a complete directory/file listing with detailed information including hash(es)

WinHasher:

WinHasher is a free, Open Source cryptographic hash or digest generator written in C# using Microsoft's .NET 2.0 Framework. It can be used to verify file download integrity, compare two or more files for modifications, and to some degree generate strong, unique passwords.

CommandLine Hash Generator:

cmdhashgen is a Command Line Utility that can be used to generate various hashes for a given String or File.

Supported Hashes are CRC32, MD5, SHA-1, SHA-256, SHA-384 and SHA-512.

WinHasher has command-line utilities including "Hash", which can be tied into a batch file or script. It looks like the more stable of the two packages.

Usage: hash [-md5|-sha1|-sha256|-sha384|-sha512|-ripemd160|-whirlpool|
       -tiger] [-base64|-hexcaps|-bubbab] filename1 [filename2 ...]

WinHasher is a command-line cryptographic hash generator for files.  It
runs in one of two modes:  single file hashing and multi-file comparison.

In single file mode, WinHasher computes the cryptographic hash of the
given file and prints it to the screen.  With no command-line switches,
it computes the SHA-1 hash and displays it in hexadecimal format.  Various
switches allow you to change to other hashing algorithms, such as MD5,
the SHA family, RIPEMD-160, Whirlpool, and Tiger.  The "-base64" switch
causes WinHasher to output hashes in MIME Base64 (RFC 2045) format rather
than hexadecimal, "-hexcaps" outputs hexadecimal with all capital letters,
and "-bubbab" uses Bubble Babble encoding.

Way to search for files by hash value

Linux example:

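# SHA-1 to search for; it is used as a case-insensitive extended regex
hash='74e7432df4a66f246b5214d60b190b67e2f6ce52'
# For each regular file: hash it, keep the hash field, strip the leading
# backslash that sha1sum adds for escaped filenames, then match against $hash
find . -type f -exec sh -c '
   sha1sum "$2" | cut -f 1 -d " " | sed "s|^\\\\||" | grep -Eqi "$1"
' find-sh "$hash" {} \; -print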

This code is more complex than you would think it should be because:

it is intended to correctly handle filenames with spaces, newlines, backslashes, quotation marks, special characters etc. (change -print to -print0 to parse them further);
it is intended to accept hash(es) as a regex (compatible with grep -E, i.e. egrep), e.g. '(^00)|(00$)' will match if the file hash starts or ends with 00.

You can use other *sum tools with a compatible interface (e.g. md5sum).
