Shell – Equivalent of Java’s String.getBytes() in Unix Shell (Cygwin)

binary java openssl shell

Let's say I convert my string into a byte array:

byte[] bytes = sUserID.getBytes("UTF-8");  // Convert the user ID string to a byte array

Now I need to write a shell script that has exactly the same functionality as my Java code. At some stage I must hash my byte array (using MessageDigest.getInstance("SHA-256") in Java and openssl dgst -sha256 -binary in the shell), but because the digests in the Java code are generated from byte arrays, they won't match the results I get in the shell (there I simply hash strings at the moment, so the input formats don't match).

Because my input for openssl in the shell should match the Java input, I want to know whether there is a way to "simulate" the getBytes() method in the shell. I don't have much experience with shell scripting, so I don't know what the best approach would be. Any ideas? Cheers!

Best Answer

openssl's stdin is a byte stream.

The content of $user is a sequence of non-zero bytes (which may or may not form valid characters in UTF-8 or another character set/encoding).

printf %s "$user"'s stdout is a byte stream.

printf %s "$user" | openssl dgst -sha256 -binary

This connects printf's stdout to openssl's stdin. openssl's stdout is another byte stream.
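To see which bytes actually reach openssl, it can help to dump them as hex. The following is a minimal sketch, assuming a sample value of abc for $user and the standard od and tr utilities; the variable names are only illustrative. It also shows why printf %s is used rather than echo: echo appends a newline byte, which Java's getBytes() would not produce, so the digests would differ.

```shell
user='abc'   # sample value, chosen only for illustration

# Hex dump of what each command actually writes to the pipe.
printf_hex=$(printf %s "$user" | od -An -tx1 | tr -d ' \n')
echo_hex=$(echo "$user" | od -An -tx1 | tr -d ' \n')

echo "printf %s: $printf_hex"   # 616263
echo "echo:      $echo_hex"     # 6162630a, note the extra newline byte

# SHA-256 of the three bytes, shown as hex so it can be compared
# with a hex-encoded digest produced on the Java side.
digest_hex=$(printf %s "$user" | openssl dgst -sha256 -binary | od -An -tx1 | tr -d ' \n')
echo "sha256:    $digest_hex"
```

If the Java side prints its digest as lowercase hex, the two strings should be identical for the same input bytes.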

Now, if you're reading $user from a user at a terminal, the user will enter it by pressing keys on their keyboard. The terminal will send the corresponding characters (as written on the key labels) encoded in its configured character set. Usually, that character set is the one of the current locale. You can find out what that is with locale charmap.

For instance, with a locale like fr_FR.iso885915@euro, and an xterm started in that locale, locale charmap will return ISO-8859-15. If the user enters stéphane as the username, that é will likely be encoded as the 0xe9 byte because that's how it's defined in the ISO-8859-15 character set.
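A script could check the locale's character set before deciding whether a conversion is needed. This is only a sketch: is_utf8_charmap is a hypothetical helper introduced here, and the list of spellings it accepts is an assumption, since the exact name reported by locale charmap can vary between systems.

```shell
# is_utf8_charmap NAME: hypothetical helper, succeeds when NAME denotes UTF-8.
is_utf8_charmap() {
  case $1 in
    UTF-8 | utf-8 | UTF8 | utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the current locale which character set it uses.
charmap=$(locale charmap 2>/dev/null)

if is_utf8_charmap "$charmap"; then
  echo "input is already UTF-8, no conversion needed"
else
  echo "input is in '${charmap:-unknown}', convert it with iconv before hashing"
fi
```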

If you want that é to be encoded as UTF-8 before passing to openssl, that's where you'd use iconv to convert that 0xe9 byte to the corresponding encoding in UTF-8 (two bytes: 0xc3 0xa9):

IFS= read -r user # read username from stdin as a sequence of bytes
                  # assumed to be encoded from characters as per the
                  # locale's encoding
printf %s "$user" |
  iconv -t utf-8 | # convert from locale encoding to UTF-8
  openssl dgst -sha256 -binary
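The effect of the conversion can be checked without a terminal by feeding iconv the ISO-8859-15 bytes directly. The sketch below assumes the source encoding is passed explicitly with -f (the pipeline above relies on the locale's encoding instead) and uses od only to display the bytes; the variable names are illustrative.

```shell
# "stéphane" as ISO-8859-15 bytes: \351 is octal for the single byte 0xe9.
latin9_hex=$(printf 'st\351phane' | od -An -tx1 | tr -d ' \n')

# The same name after conversion: é becomes the two bytes 0xc3 0xa9.
utf8_hex=$(printf 'st\351phane' |
  iconv -f ISO-8859-15 -t UTF-8 |
  od -An -tx1 | tr -d ' \n')

echo "ISO-8859-15: $latin9_hex"   # 7374e97068616e65
echo "UTF-8:       $utf8_hex"     # 7374c3a97068616e65
```

The UTF-8 byte sequence is what sUserID.getBytes("UTF-8") would return in Java for the same name, so hashing it should give a matching digest.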