Bash Null Bytes – How to Use Null Bytes in Bash


I've read that, since file paths in Bash can contain any character except the null byte (zero-valued byte, $'\0'), it's best to use the null byte as a separator. For example, if the output of find will be sent to another program, it's recommended to use the -print0 option (for versions of find that have it).

But although something like this works fine (printing file-paths separated by newlines — don't worry, this is just a demonstration, I'm not actually doing it in real scripts):

find -print0 \
  | while IFS= read -r -d $'\0' ; do echo "$REPLY" ; done

something like this does not work:

for file in * ; do echo -n "$file"$'\0' ; done \
  | while IFS= read -r -d $'\0' ; do echo "$REPLY" ; done

When I try just the for-loop part, I find that it just prints all the filenames together, without the null byte in between.

Why is this? What's going on?

Best Answer

Bash uses C-style strings internally, which are terminated by null bytes. This means that a Bash string (such as the value of a variable, or an argument to a command) can never actually contain a null byte. For example, this mini-script:

foobar=$'foo\0bar'    # foobar='foo' + null byte + 'bar'
echo "${#foobar}"     # print length of $foobar

prints 3, because $foobar is really just 'foo': the bar comes after the end of the string.

Similarly, echo $'foo\0bar' just prints foo, because echo doesn't know about the \0bar part.

As you can see, the \0 sequence is actually very misleading in a $'...'-style string; it looks like a null byte inside the string, but it doesn't end up working that way. In your first example, your read command has -d $'\0'. This works, but only because -d '' also works! (Older versions of the Bash manual did not document this explicitly, though recent ones state that an empty delim makes read terminate a line when it reads a NUL character. It presumably works for the same reason as above: '' is the empty string, so its terminating null byte comes immediately. -d delim is documented as using "The first character of delim", and that evidently even works if the "first character" is past the end of the string!)
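A quick sketch to check this, assuming Bash (the function and variable names are made up for illustration): $'\0' should expand to a string of length zero, and both spellings of the delimiter should split the same NUL-separated input into the same number of records.

```bash
#!/usr/bin/env bash
# $'\0' is cut off at the null byte, so the variable ends up empty.
nul=$'\0'
echo "length: ${#nul}"

# Hypothetical helper: count the records read from stdin,
# using the delimiter argument exactly as given.
count_records() {
  local n=0 rec
  while IFS= read -r -d "$1" rec; do
    n=$((n + 1))
  done
  echo "$n"
}

with_empty=$(printf 'a\0b\0c\0' | count_records '')
with_escape=$(printf 'a\0b\0c\0' | count_records $'\0')
echo "records with -d '': $with_empty"
echo "records with -d \$'\\0': $with_escape"
```

Both counts should come out the same, since read receives an empty delimiter argument either way.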

But as you know from your find example, it is possible for a command to print out a null byte, and for that byte to be piped to another command that reads it as input. No part of that relies on storing a null byte in a string inside Bash. The only problem with your second example is that we can't use $'\0' in an argument to a command; echo "$file"$'\0' could happily print the null byte at the end, if only it knew that you wanted it to.
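One way to see the difference is to count the bytes that actually reach the pipe. This is only a sketch, assuming Bash and a POSIX wc and tr; the variable names are illustrative. In the echo case the argument has already been truncated by the shell, while in the printf case the escape sits in the format string and the null byte is produced by printf itself.

```bash
#!/usr/bin/env bash
# echo receives the already-truncated string 'foo'.
via_echo=$(echo -n $'foo\0bar' | wc -c | tr -d '[:space:]')

# printf interprets \0 in its own format string and emits a real null byte.
via_printf=$(printf 'foo\0bar' | wc -c | tr -d '[:space:]')

echo "echo:   $via_echo bytes"
echo "printf: $via_printf bytes"
```

The tr call only strips the padding that some wc implementations add around the count.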

So instead of using echo, you can use printf, which supports the same sorts of escape sequences as $'...'-style strings. That way, you can print a null byte without having to have a null byte inside a string. That would look like this:

for file in * ; do printf '%s\0' "$file" ; done \
  | while IFS= read -r -d '' ; do echo "$REPLY" ; done

or simply this:

printf '%s\0' * \
  | while IFS= read -r -d '' ; do echo "$REPLY" ; done

(Note: echo actually also has an -e flag that would let it process \0 and print a null byte; but then it would also try to process any special sequences in your filename. So the printf approach is more robust.)
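If you need the names afterwards rather than just printing them, a possible variation is to collect them into an array. The sketch below assumes Bash (for arrays and process substitution) and uses a scratch directory with made-up file names, one of which contains a newline, to show that NUL separation keeps each name intact.

```bash
#!/usr/bin/env bash
# Scratch directory with awkward names (hypothetical test data).
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with space" "$tmpdir/"$'with\nnewline'

# Read NUL-separated names into an array. Process substitution keeps the
# loop in the current shell, so the array survives after the loop ends.
names=()
while IFS= read -r -d '' name; do
  names+=("$name")
done < <(cd "$tmpdir" && printf '%s\0' *)

echo "read ${#names[@]} names"
rm -rf "$tmpdir"
```

With a plain pipe instead of < <(...), the while loop would run in a subshell and the array would be empty afterwards.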

Incidentally, there are some shells that do allow null bytes inside strings. Your example works fine in Zsh, for example (assuming default settings). However, regardless of your shell, Unix-like operating systems don't provide a way to include null bytes inside arguments to programs (since program arguments are passed as C-style strings), so there will always be some limitations. (Your example can work in Zsh only because echo is a shell builtin, so Zsh can invoke it without relying on the OS support for invoking other programs. If you used command echo instead of echo, so that it bypassed the builtin and used the standalone echo program on the $PATH, you'd see the same behavior in Zsh as in Bash.)

Related Solutions

Bash – In bash, how to convert 8 bytes to an unsigned int (64bit LE)

Bash is the wrong tool altogether. Shells are good at gluing bits and pieces together; text processing and arithmetic are provided on the side, and data processing isn't in their purview at all.

I'd go for Python over Perl, because Python has bignums right off the bat. Use struct.unpack to unpack the data.

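#!/usr/bin/env python3
import struct, sys
fmt = "<" + "Q" * 8192            # 8192 little-endian unsigned 64-bit integers
stdin = sys.stdin.buffer          # binary stream; stdin must be a seekable file
header_bytes = stdin.read(65536)
header_ints = list(struct.unpack(fmt, header_bytes))
stdin.seek(-65536, 2)             # jump to 65536 bytes before the end of the file
footer_bytes = stdin.read(65536)
footer_ints = list(struct.unpack(fmt, footer_bytes))
# your calculations here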

Here's my answer to the original question. The revised question doesn't have much to do with the original, which was about converting one 8-byte sequence into the 64-bit integer it represents in little-endian order.

I don't think bash has any built-in feature for this. The following snippet sets a to a string that is the hexadecimal representation of the number that corresponds to the bytes in the specified string in big-endian order.

a=0x$(printf "%s" "$string" |
      od -t x1 -An |
      tr -dc '[:alnum:]')

For little-endian order, reverse the order of the bytes in the original string. In bash, and for a string of known length, you can do

a=0x$(printf "%s" "${string:7:1}${string:6:1}${string:5:1}${string:4:1}${string:3:1}${string:2:1}${string:1:1}${string:0:1}" |
      od -t x1 -An |
      tr -dc '[:alnum:]')
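Note that a Bash variable cannot hold null bytes, so the $string-based snippets above only work when none of the eight bytes is zero. A sketch that avoids this by reading the bytes from a pipe and reversing the hex pairs afterwards might look like the following; the sample bytes and variable names are made up for illustration, and a real script would read from a file or another command instead of printf.

```bash
#!/usr/bin/env bash
# Eight sample bytes in little-endian order: 0x2a 0x01 followed by six zeros.
hex=$(printf '\x2a\x01\x00\x00\x00\x00\x00\x00' |
      od -An -v -t x1 |
      tr -d ' \n')

# Reverse the order of the two-digit hex pairs to get the big-endian spelling.
le=""
for ((i = 0; i < ${#hex}; i += 2)); do
  le="${hex:i:2}$le"
done

a=0x$le
echo "$a"
echo "$((a))"
```

Here only the hexadecimal text passes through shell variables, never the raw bytes.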

You can also get your platform's preferred endianness if your od supports 8-byte types.

a=0x$(printf "%s" "$string" |
      od -t x8 -An |
      tr -dc '[:alnum:]')

Whether you can do arithmetic on $a will depend on whether your bash supports 8-byte arithmetic. Even if it does, it'll treat it as a signed value.
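To illustrate the signedness caveat, here is a small sketch, assuming a Bash built with 64-bit arithmetic (the value is an arbitrary example). Arithmetic expansion shows the all-ones value as negative, while the printf builtin's %u conversion prints the same bits as an unsigned number.

```bash
#!/usr/bin/env bash
a=0xffffffffffffffff

# Arithmetic expansion treats the value as a signed 64-bit integer.
signed=$((a))

# The printf builtin can format the same value as unsigned.
unsigned=$(printf '%u' "$a")

echo "signed:   $signed"
echo "unsigned: $unsigned"
```

So printf '%u' is one option for displaying such a value, but any further arithmetic in the shell would still be signed.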

Alternatively, use Perl:

a=0x$(perl -e 'printf "%x\n", unpack "Q<", $ARGV[0]' "$string")

If your perl is compiled without 64-bit integer support, you'll need to break the bytes up.

a=0x$(perl -e 'printf "%x%08x\n", reverse unpack "L<L<", $ARGV[0]' "$string")

(Replace < by > for big-endian or remove it to get the platform endianness.)

Bash – How to Name a File in the Deepest Level of a Directory Tree

That's an odd request!

I'd use find + awk to grab a file in the deepest directory:

bash-3.2$ deepest=$(find / -type f | awk -F'/' 'NF > depth {
>     depth = NF;
>     deepest = $0;
> }
>
> END {
>     print deepest;
> }')

Using ${deepest} in your mv command is left as an exercise but the following five lines may help you further:

bash-3.2$ echo "${deepest}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina/Buenos_Aires.rb

bash-3.2$ echo "${deepest%.*}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina/Buenos_Aires

bash-3.2$ echo "${deepest%/*}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina

bash-3.2$ echo "${deepest##*/}"
Buenos_Aires.rb

bash-3.2$ echo "${deepest##*.}"
rb

Following update to question:

find -type d [...] "This would only find the directory. [...] How could this be solved in the most simple way?".

By supplying -type f to find to find all files (f), not all directories (d).
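The find + awk pipeline above splits its input on newlines, so a file name containing a newline would confuse it. A NUL-separated variant in plain Bash could be sketched as follows, assuming a find that supports -print0; the scratch tree and variable names are invented for the demonstration.

```bash
#!/usr/bin/env bash
# Build a small scratch tree (hypothetical test data).
root=$(mktemp -d)
mkdir -p "$root/a/b/c"
touch "$root/top.txt" "$root/a/mid.txt" "$root/a/b/c/deep file.txt"

deepest="" depth=0
while IFS= read -r -d '' path; do
  # Count the slashes in the path by stripping one component at a time.
  rest=$path n=0
  while [[ $rest == */* ]]; do
    rest=${rest#*/}
    n=$((n + 1))
  done
  if (( n > depth )); then
    depth=$n
    deepest=$path
  fi
done < <(find "$root" -type f -print0)

echo "${deepest##*/}"
rm -rf "$root"
```

As in the awk version, the first file found at the greatest depth wins when several files are equally deep.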
