A problem with split --filter is that the output can be mixed up, so you get half a line from process 1 followed by half a line from process 2.
GNU Parallel guarantees there will be no mixup.
So assume you want to do:
A | B | C
But B is terribly slow, so you want to parallelize it. Then you can do:
A | parallel --pipe B | C
By default, GNU Parallel splits the input on \n into blocks of about 1 MB. This can be adjusted with --recend and --block.
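As a rough sketch of the A | parallel --pipe B | C pattern, here is a version where wc -l stands in for the slow filter B. The function name count_per_block and the 100k block size are only illustrative choices, and the snippet assumes GNU Parallel (not the moreutils parallel) is the one on your PATH:

```shell
# Hypothetical stand-in for "B": count the lines in each block.
# --pipe splits stdin into blocks that end on a record boundary (\n by default);
# --block sets the approximate block size; each block is fed to its own wc -l.
count_per_block() {
  parallel --pipe --block 100k wc -l
}
```

Running seq 1 100000 | count_per_block should print one line count per block. Because blocks are only cut at record ends, the counts add up to the number of input lines.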
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
The ^D character (also known as \04 or 0x4, END OF TRANSMISSION in Unicode) is the default value for the eof special control character parameter of the terminal or pseudo-terminal driver in the kernel (more precisely of the tty line discipline attached to the serial or pseudo-tty device). That's the c_cc[VEOF] of the termios structure passed to the TCSETS/TCGETS ioctl one issues to the terminal device to affect the driver behaviour.
The typical command that sends those ioctls is the stty command.
To retrieve all the parameters:
$ stty -a
speed 38400 baud; rows 58; columns 191; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O;
min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke
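If you only care about the eof setting, you can pick it out of that listing. This is just a sketch: show_eof_char is a made-up helper name, and it relies on stty -a separating the settings with semicolons, as in the output above.

```shell
# Print only the eof entry from "stty -a".
# stty queries the terminal on stdin, so do nothing useful if stdin is not a tty.
show_eof_char() {
  if [ -t 0 ]; then
    # One setting per line, then keep the one named "eof".
    stty -a | tr ';' '\n' | grep -w 'eof'
  else
    echo "stdin is not a terminal"
  fi
}

show_eof_char
```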
That eof parameter is only relevant when the terminal device is in icanon mode.
In that mode, the terminal driver (not the terminal emulator) implements a very simple line editor, where you can type Backspace to erase a character, Ctrl-U to erase the whole line... When an application reads from the terminal device, it sees nothing until you press Return at which point the read() returns the full line including the last LF character (by default, the terminal driver also translates the CR sent by your terminal upon Return to LF).
Now, if you want to send what you typed so far without pressing Enter, that's where you can enter the eof character. Upon receiving that character from the terminal emulator, the terminal driver submits the current content of the line, so that the application doing the read on it will receive it as is (and it won't include a trailing LF character).
Now, if the current line was empty, and provided the application has fully read the previously entered lines, the read will return 0 characters.
That signifies end of file to the application (when you read from a file, you read until there's nothing more to be read). That's why it's called the eof character, because sending it causes the application to see that no more input is available.
Now, modern shells, at their prompt, do not set the terminal in icanon mode because they implement their own line editor, which is much more advanced than the one built into the terminal driver. However, in their own line editor, to avoid confusing the users, they give the ^D character (or whatever the terminal's eof setting is with some) the same meaning (to signify eof).
read reads a record (a line by default, but ksh93/bash/zsh allow other delimiters with -d, even NUL with zsh/bash) and returns success as long as a full record has been read. read returns non-zero when it finds EOF while the record delimiter has still not been encountered.
That allows you to use read as the condition of a while loop (over lines, or with zsh/bash over NUL-delimited records using -d ''), and have that loop exit after the last record has been read.
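A minimal sketch of such loops, with made-up sample input and the hypothetical helper names demo_lines and demo_nul. Since read -d is not POSIX, the NUL-delimited variant is run through bash -c here so that it works whatever shell you paste it into:

```shell
# Newline-delimited records: plain POSIX sh.
demo_lines() {
  printf 'one\ntwo\n' |
  while IFS= read -r line; do          # read fails at EOF, ending the loop
    printf 'line: %s\n' "$line"
  done
}

# NUL-delimited records: read -d '' is a bash/zsh extension,
# so the loop itself is handed to bash explicitly.
demo_nul() {
  printf 'a b\0c\nd\0' |
  bash -c 'while IFS= read -r -d "" rec; do printf "record: <%s>\n" "$rec"; done'
}

demo_lines
demo_nul
```

Note that the second record in demo_nul contains a newline. That is the point of using NUL as the delimiter: it is the one byte that cannot appear in a file name or in a shell variable.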
You can still check if there was more data after the last full record with [ -n "$nul_delimited_record" ].
In your case, read's input doesn't contain any record as it doesn't contain any NUL character. In bash, it's not possible to embed a NUL inside a here document. So read fails because it hasn't managed to read a full record. It still stores what it has read until EOF (after IFS processing) in the json variable.
In any case, using read without setting $IFS rarely makes sense. For more details, see Understanding "IFS= read -r line".
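To see that behaviour in isolation, here is a small sketch that feeds read -d '' some input without any NUL in it. The sample JSON and the helper name read_without_delimiter are only for illustration, and the read itself is run via bash -c because -d is not available in every sh:

```shell
# Input without a NUL byte: read -d '' never sees its delimiter,
# so it hits EOF and returns non-zero, but the variable is still filled.
read_without_delimiter() {
  printf '{"a": 1}\n' |
  bash -c 'IFS= read -r -d "" json; status=$?
           printf "status=%s json=%s" "$status" "$json"'
}

read_without_delimiter
```

So a non-zero exit status from read does not by itself mean the variable is empty, which is why the [ -n "$var" ] check after the loop is useful for picking up an unterminated last record.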