bash can't hold binary data in its variables. It's already bad enough to process text with shell loops; it would be even worse for binary data. The shell is a tool to run other tools.
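A quick way to see the limitation, as a minimal sketch (the function names are made up for illustration): command substitution in bash and other sh-like shells, zsh excepted, drops NUL bytes, so binary data that goes through a variable comes back shorter.

```shell
# Hypothetical helper: round-trip 5 bytes (a NUL b NUL c) through a
# shell variable and count what survives.
count_bytes_via_variable() {
  # Recent bash versions warn about the ignored NUL bytes on stderr,
  # hence the redirection around the assignment.
  { data=$(printf 'a\0b\0c'); } 2>/dev/null
  printf '%s' "$data" | wc -c
}

# Same bytes, but never stored in a variable.
count_bytes_via_pipe() {
  printf 'a\0b\0c' | wc -c
}
```

In bash, the first function should report 3 bytes and the second 5: the two NUL bytes are lost as soon as the data is stored in a variable.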
Also note that the read built-in command reads characters, not bytes.
Also, dd issues a single read system call per block, so a dd bs=77 count=1 won't necessarily read 77 bytes, especially if stdin is a pipe (the GNU implementation of dd has iflag=fullblock for that).
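The short read is easy to provoke with a writer that delivers its data in two bursts. This is only a sketch: slow_writer and the two wrapper functions are hypothetical names, the one-second pause is an arbitrary choice, and the second function assumes a dd that supports iflag=fullblock (GNU dd does).

```shell
# Hypothetical helper: writes 6 bytes to stdout in two bursts.
slow_writer() {
  printf 'abc'
  sleep 1
  printf 'def'
}

# One read() call: dd gets whatever is in the pipe at that moment.
short_read_size() {
  slow_writer 2>/dev/null | dd bs=6 count=1 2>/dev/null | wc -c
}

# With iflag=fullblock, dd keeps reading until the block is full.
full_read_size() {
  slow_writer | dd bs=6 count=1 iflag=fullblock 2>/dev/null | wc -c
}
```

On a typical Linux system short_read_size should report 3, since only the first burst is in the pipe when dd reads, and full_read_size should report 6.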
Here, you want to use a data-processing programming language like perl:
perl -ne 'BEGIN{$/=\77}
print "Do something with the 77 byte long <$_> record\n"'
With GNU awk:
LC_ALL=C awk -vRS='.{,77}' '{print "the record is in <" RT ">"}'
If you want to use a shell, your best option is probably zsh, which is the only one that can store binary data in its variables:
while LC_ALL=C IFS= read -ru0 -k77 record; do
print -r -- "you may only call builtins with $record
anyway since you can't pass NUL bytes in arguments
to an external command"
done
If all you want to do is pass each chunk as stdin to a new invocation of some command, then you can use GNU split and its --filter option:
split -b 77 --filter='some command'
--filter starts a new shell to evaluate some command for each chunk. Unless your sh already performs that optimisation by itself, you can do:
split -b 77 --filter='exec some command'
to save a fork.
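As a rough illustration of what the filter sees, here is a sketch that uses wc -c as the filter command so that each chunk's size gets printed. chunk_sizes is a hypothetical helper name, and the example assumes GNU split plus a head that supports -c.

```shell
# Hypothetical helper: chunk_sizes TOTAL CHUNK
# Feeds TOTAL zero bytes to split and prints the size of each chunk
# that the filter command receives on its stdin.
chunk_sizes() {
  head -c "$1" /dev/zero |
    split -b "$2" --filter='exec wc -c' |
    tr -d ' '    # strip any padding wc may add
}
```

For instance, chunk_sizes 200 77 should print 77, 77 and 46 on separate lines: two full chunks and a shorter final one.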
Using dd, you could parse its stderr output to find out the end of input. You'd need the GNU-specific iflag=fullblock as well:
while
  {
    report=$({
      LC_ALL=C dd bs=77 iflag=fullblock count=1 2>&3 |
        some command >&4 3>&- 4>&-
    } 3>&1)
  } 4>&1
  [ "${report%%+*}" -eq 1 ]
do
  : nothing
done
If the input size is a multiple of 77 though, some command will be run an extra time with an empty input.
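The loop condition relies on the first line of dd's report, which in the C locale looks like "1+0 records in" (full blocks, then partial blocks). The sketch below isolates that parsing step; full_blocks_read is a hypothetical helper name, and the sample reports are trimmed to the two "records" lines.

```shell
# Hypothetical helper: extract the number of full input blocks from a
# dd report such as "1+0 records in".
full_blocks_read() {
  report=$1
  # ${report%%+*} removes everything from the first "+" onwards.
  printf '%s\n' "${report%%+*}"
}
```

A full 77-byte block gives 1 and the loop continues; a short last chunk ("0+1 records in") or an empty read ("0+0 records in") gives 0 and the loop stops. When the input is an exact multiple of 77, the last real chunk still reports 1, which is why one more iteration runs on empty input.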
First, the use of yet another cat doesn't really make much difference, and you shouldn't bother about it.
Second, the commands that make up a pipeline are executed in separate processes anyway, no matter if they're external commands or built-ins:
$ a=0
$ a=1 | a=2 | a=3
$ echo $a
0
As to your exact problem, it's not possible to simply connect 'stdin' to 'stdout'; even if a shell had some nop builtin which would collapse when used in a pipeline (e.g. | nop | -> |), the shell has no way to know in advance, at the time it sets up the pipeline, that your "switchboard" will switch to nop instead of awk or sort.
You can also achieve the same effect as your "switchboards" by building the pipeline yourself, and then calling eval to run it. Example:
$ cat test.sh
type=`file -zi "$1"`
case $type in
*application/gzip*) mycat='zcat "$1"';;
*) mycat='cat "$1"';;
esac
case $type in
*charset=utf-16le*) mycat="$mycat | iconv -f utf16le";;
esac
# highlight comments in blue
esc=`printf '\033'`;
mycat="$mycat | sed 's/^#.*/$esc[34m&$esc[m/'"
echo >&2 "$mycat" # show the built pipeline
eval "$mycat" # ... and run it
$ iconv -t utf16 test.sh > test16.sh; gzip test16.sh
$ sh test.sh test16.sh.gz
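The same idea in a smaller, self-contained form, as a sketch only: build_pipeline, run_pipeline and the stage names upper and sorted are all made up for illustration. Only the stages that are asked for end up in the pipeline string, so no "do nothing" process is started for the others.

```shell
# Hypothetical sketch: assemble a pipeline string from stage names.
build_pipeline() {
  pipeline='cat'
  for stage in "$@"; do
    case $stage in
      upper)  pipeline="$pipeline | tr a-z A-Z" ;;
      sorted) pipeline="$pipeline | sort" ;;
    esac
  done
  printf '%s\n' "$pipeline"
}

# Run the assembled pipeline on stdin.
run_pipeline() {
  eval "$(build_pipeline "$@")"
}
```

With no arguments the pipeline is just cat; with upper sorted it becomes cat | tr a-z A-Z | sort.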
That's a bit off-topic, but on Linux there is a faster way to copy stdin to stdout (if either of them is a pipe): the splice(2) syscall, which doesn't involve moving the data to and from userland:
$ cat splice_cat.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <err.h>
int main(int ac, char **av){
	ssize_t r;
	size_t block = ac > 1 ? strtoul(av[1], 0, 0) : 0x20000;
	for(;;)
		if((r = splice(0, NULL, 1, NULL, block, 0)) < 1){
			if(r < 0) err(1, "splice");
			return 0;
		}
}
$ cc -Wall splice_cat.c -o splice_cat
$ dd if=/dev/zero bs=1M count=100 status=none | (time cat >/dev/null)
real 0m0.153s
user 0m0.012s
sys 0m0.056s
$ dd if=/dev/zero bs=1M count=100 status=none | (time ./splice_cat >/dev/null)
real 0m0.100s
user 0m0.004s
sys 0m0.020s
However (afaik), that's not used by either the shell or cat, dd, etc.
Best Answer
Your way adds a line break to everything it writes, in place of whatever separator ($IFS) read is using to split up the input. Instead of breaking the input up into lines, just take the whole thing and pass it along. You can reduce the entire bit of code above to a single cat redirected to the file. You don't need the truncate bit: the redirection truncates the file, and the whole STDIN stream is then written out to it.
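Going by the mention of cat and > $file in the edit that follows, the reduced version presumably looks something like this sketch (save_stdin is a hypothetical name, and the file name is passed as an argument here purely for illustration):

```shell
# Hypothetical helper: replace the contents of file $1 with stdin.
save_stdin() {
  # ">" truncates the file first; cat then copies the stream unchanged,
  # with no splitting on $IFS and no added line breaks.
  cat > "$1"
}
```

Spaces, tabs and a missing final newline all survive, because nothing is split or re-joined along the way.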
Edit: If you are using zsh you can just use > $file in place of the cat. You are redirecting to a file and truncating it, but if there is anything waiting on STDIN it will get read at that point. I think you can do something like this with bash, but you would have to set some special mode.