Is it legal to print null bytes using awk's printf function according to POSIX? The POSIX specification of awk does not seem to explicitly mention it either way. Real-world implementations differ in how they behave:
+$ gawk 'BEGIN { x = sprintf("\000"); print(length(x)); }'
1
+$ busybox awk 'BEGIN { x = sprintf("\000"); print(length(x)); }'
0
+$
and
+$ gawk 'BEGIN { printf("\000"); }' | xxd
00000000: 00 .
+$ busybox awk 'BEGIN { printf("\000"); }' | xxd
+$
Is this specified somewhere in the standard? If yes, is the required behaviour the same for variables (x = sprintf("\000")) and for printf (printf("\000"))?
Best Answer
There are at least 4 relevant pieces of text in the POSIX.2018 specification of awk.
Emphasis (bold text) is mine in all the quoted text below:
That means that if the input contains NUL characters (which would make it non-text as per the POSIX definition of text), then the behaviour is unspecified.
So \000 results in undefined behaviour.
About regexp matching:
About printf/sprintf:
So, that's another way to get a NUL character that leads to undefined behaviour.
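Since the behaviour is undefined, the only way to know what a given awk does is to try it. The sketch below is a small probe along the lines of the commands in the question. awk_nul_probe is a hypothetical helper name, and the third check assumes the printf passage refers to the %c conversion with a numeric argument of zero. The numbers it prints are whatever your implementation happens to do, not something POSIX guarantees.

```shell
#!/bin/sh
# Probe how a given awk treats NUL. POSIX leaves these cases undefined,
# so the results only describe the implementation being tested.
# awk_nul_probe is a hypothetical helper; pass the awk binary to test.
awk_nul_probe() {
    awk_bin=${1:-awk}

    # Length of a string built from the \000 escape
    # (in the question, gawk reports 1 and busybox awk reports 0).
    len=$("$awk_bin" 'BEGIN { x = sprintf("\000"); print length(x) }')

    # Bytes actually written for the \000 escape. Count them with wc -c,
    # because a shell command substitution cannot hold NUL bytes.
    esc=$("$awk_bin" 'BEGIN { printf("\000") }' | wc -c | tr -d ' ')

    # Bytes written by %c with a numeric zero argument (assumed to be
    # the "other way" of producing a NUL).
    chr=$("$awk_bin" 'BEGIN { printf("%c", 0) }' | wc -c | tr -d ' ')

    printf 'sprintf length: %s\nprintf \\000 bytes: %s\nprintf %%c bytes: %s\n' \
        "$len" "$esc" "$chr"
}

awk_nul_probe awk
```

Call it with another binary name, for example awk_nul_probe gawk or awk_nul_probe mawk, to compare implementations side by side.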
So, to sum up, in awk, POSIX tells us you can't use the NUL character portably, whether it's for input, output or to store in its variables.
gawk (since at least 2.10 in 1989, which is the earliest version I could find where NUL support is documented) and @ThomasDickey's mawk (since version 20140914) are two implementations that can deal with NUL.
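If what you actually need is NUL-delimited output, one way to stay portable is to keep NUL out of awk altogether and let tr produce it afterwards. This is a minimal sketch, not something the specification passages above prescribe. It assumes no record contains a newline of its own, and the toupper() call is only a stand-in for whatever processing you do. nul_delimit is a hypothetical helper name.

```shell
#!/bin/sh
# Let awk write ordinary newline-terminated records, then turn each
# newline into a NUL byte with tr, so awk itself never handles NUL.
# Assumption: no record contains an embedded newline.
nul_delimit() {
    awk '{ print toupper($0) }' | tr '\n' '\000'
}

# Show the resulting bytes; od makes the NUL terminators visible.
printf 'one\ntwo words\n' | nul_delimit | od -An -c
```

The same idea works in the other direction: run tr '\000' '\n' in front of awk so that it only ever reads text.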