Md5sum command binary and text mode

Tags: binary, hashsum, text

The GNU md5sum command has two modes: binary mode and text mode. I guess the only difference is in how newline characters are handled. Am I right?

On GNU/Linux, the two modes always produce the same result, so is the only use of the -b and -t options to select the marker (* or a space) printed before the file name?

In what circumstances can the modes produce different results? On Windows/MacOS systems? (Are versions available for these platforms?)

Best Answer

On GNU/Linux, the two modes always produce same result

Yes, explicitly. From man md5sum:

Note: There is no difference between binary and text mode option on [sic] GNU system.

This is from the md5sum implementation that ships with GNU coreutils 8.21; I notice an older version (8.12) does not have this notice but I presume the same would be true anyway.
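To make the "only the marker differs" point concrete, here is a minimal Python sketch of how such checksum lines are laid out. It is not the coreutils code; the helper name md5sum_line and the file name example.txt are made up for illustration, and the layout (digest, a space, a one-character marker, then the file name) is assumed from the GNU documentation.

```python
import hashlib

def md5sum_line(data: bytes, name: str, binary: bool) -> str:
    """Format one checksum line in the style md5sum prints (illustrative helper)."""
    digest = hashlib.md5(data).hexdigest()
    # '*' marks binary mode, ' ' marks text mode; the digest itself is unaffected here.
    marker = "*" if binary else " "
    return f"{digest} {marker}{name}"

data = b"hello\n"  # stand-in for the file contents
text_line = md5sum_line(data, "example.txt", binary=False)
binary_line = md5sum_line(data, "example.txt", binary=True)

print(text_line)
print(binary_line)
```

Both lines carry the same 32-character digest; only the character just before the file name changes.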

Although AFAICT md5sum is not officially standardized (e.g., by POSIX), it is available on various platforms in various implementations and there is obviously some effort to make these compliant with one another for ease of use across systems.

In relation to this, the ISO/ANSI C Standard includes high level stream functions for accessing files. As part of the standard, these are available on any operating system which implements ISO C via a shared library or a compiler. Since pretty much all operating systems have this available (and are themselves most often written in C), it is a sort of universal language used to implement potentially very portable software.

Considering what it does, it would be totally feasible to write an md5sum that would compile and work on any operating system. I am not claiming this is true of the GNU coreutils version, but one of the high level file stream functions mentioned earlier is fopen(), which ISO C requires to accept a b character in its mode string, indicating that the file is being opened "as a binary file". What that may mean or require of the system isn't stipulated by the standard; it's just required to exist so it can be used on systems where there may be some (any) reason for it.

There is no such reason on Linux/POSIX/*nix-style operating systems, so the switch does nothing. From the POSIX spec (a superset of ISO C) for fopen():

The character 'b' shall have no effect, but is allowed for ISO C standard conformance.

So, a completely portable md5sum implementation might use the ISO C high level file stream functions, since ISO C offers no other way to access files. (Most platforms, including POSIX-compliant ones, have their own lower level methods as well, but using these would not be portable because they are not in ISO C.) Such an implementation should also implement the -b and -t flags to add or not add the b option to fopen() when it reads the file. On systems where that is meaningless, it won't make any difference.
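As a rough illustration of when the two modes could diverge, the Python sketch below simulates a hypothetical platform whose text mode collapses CRLF line endings into LF while reading, which is how Windows C runtimes are commonly described. The functions read_as_text_mode and digest are invented for this example, and the translation rule is an assumption, not something taken from any md5sum source.

```python
import hashlib

def read_as_text_mode(raw: bytes) -> bytes:
    """Simulate a text-mode read that turns CRLF into LF (assumed platform behaviour)."""
    return raw.replace(b"\r\n", b"\n")

def digest(raw: bytes, text_mode: bool) -> str:
    """Hash the bytes the program would see after the (simulated) read."""
    data = read_as_text_mode(raw) if text_mode else raw
    return hashlib.md5(data).hexdigest()

crlf_file = b"line one\r\nline two\r\n"  # file with CRLF line endings
lf_file = b"line one\nline two\n"        # same text with LF line endings

# With CRLF content the two modes see different bytes, so the digests differ.
print(digest(crlf_file, text_mode=False) == digest(crlf_file, text_mode=True))  # False
# With LF-only content there is nothing to translate, so the digests match.
print(digest(lf_file, text_mode=False) == digest(lf_file, text_mode=True))      # True
```

On a POSIX system no such translation happens, so both calls would always see the same bytes, which matches the man page note quoted earlier.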

Again, I'm not saying GNU's md5sum is written in such a completely portable way or derived from one that is, but obviously it is trying to be compatible, in how it operates, with one that is. Note that having a flag which does nothing is not the same as not having the flag: in the former case, using it is specified to be okay but to do nothing, whereas in the latter case using it could be an error or lead to undefined behaviour.