I would suggest doing all the housekeeping inside awk; this works here with GNU awk:
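BEGIN { file = "1" }
# Pipe every line to a gzip process writing the current output file.
{ print | ("gzip -9 > " file ".gz") }
# After every 10000 lines, close the pipe and move on to the next file.
# The string passed to close() must match the command string exactly.
NR % 10000 == 0 {
    close("gzip -9 > " file ".gz")
    file = file + 1
}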
This will save the first 10000 lines to 1.gz, the next 10000 to 2.gz, etc. Use sprintf if you want more flexibility in filename generation.
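As a rough sketch of the sprintf idea, the variant below builds zero-padded names such as part-0001.gz. The part- prefix, the input.txt sample file, the size variable and the chunk size of 3 are all illustrative assumptions, not part of the program above.

```shell
# Illustrative sample input: 7 lines, split into chunks of 3.
seq 1 7 > input.txt

awk -v size=3 '
# Hypothetical helper: build the gzip command for chunk number n.
function outcmd(n) { return sprintf("gzip -9 > part-%04d.gz", n) }
BEGIN { n = 1; cmd = outcmd(n) }
{ print | cmd }
# Close the current pipe after every "size" lines and start the next chunk.
NR % size == 0 { close(cmd); n++; cmd = outcmd(n) }
' input.txt
```

Keeping the command string in a variable means the same string is passed to both print and close(), so the two cannot drift apart.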
Updated with a test
Test data used are primes up to 300k, found here.
wc -lc primes; md5sum primes
Output:
25997 196958 primes
547d527ec50c2799fa6ce96dba3c26c0 primes
Now, if the awk program above is saved as split.awk and run like this (with GNU awk):
awk -f split.awk primes
Three files (1.gz, 2.gz and 3.gz) are produced. Testing these files:
for f in {1..3}; do gzip -dc "$f.gz"; done > foo
Test:
diff primes foo
Output should be nothing if the files are the same.
And the same tests as above:
gzip -dc [1-3].gz | tee >(wc -lc) >(md5sum) > /dev/null
Output:
25997 196958
547d527ec50c2799fa6ce96dba3c26c0 -
This shows that the contents are the same and that the files are split as expected.
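To see the split sizes directly, a check along these lines can help. It is only a sketch: sample.txt is a stand-in file of 25997 numbered lines generated with seq (not the primes file), and the awk program is inlined so the snippet runs on its own.

```shell
# Stand-in input with the same line count as the primes file.
seq 1 25997 > sample.txt

# Same splitting logic as above, inlined for a self-contained run.
awk '
BEGIN { file = 1 }
{ print | ("gzip -9 > " file ".gz") }
NR % 10000 == 0 { close("gzip -9 > " file ".gz"); file++ }
' sample.txt

# Report how many lines each chunk holds.
for f in 1 2 3; do
    printf '%s.gz: %s\n' "$f" "$(gzip -dc "$f.gz" | wc -l | tr -d ' ')"
done
```

With 25997 input lines this should report 10000, 10000 and 5997 lines for the three files.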