Shell – Split file and know how many files were generated

files, shell, split

I'm using the following lines to split a file into smaller parts:

split --line-bytes=100M -d "$input" "$output/FILENAME"
echo "$input was split into ??? 100MB files." >> demo.log

After that, I need to write in a log file how many smaller files were generated from this split. Is there any way to do that?

Best Answer

The easiest way is to save the names of the resulting pieces in an array, e.g.

splitarr=("$output"/FILENAME*)

and get the array length (number of elements) with ${#splitarr[@]}. This assumes the only filenames matching that pattern are those produced by the split command.
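
Putting the pieces together, a minimal sketch of the array approach might look like the following. The temporary directory, the generated sample input, and the 1K chunk size are assumptions chosen only so the example runs quickly; substitute your own $input, $output, and 100M.

```bash
#!/usr/bin/env bash
# Sketch only: the paths, sample data and chunk size below are hypothetical.
workdir=$(mktemp -d)
input=$workdir/input.txt
output=$workdir/parts
mkdir -p "$output"

# 1000 short lines stand in for the real input file.
seq 1 1000 > "$input"

# GNU split: at most 1K of whole lines per piece, numeric suffixes.
split --line-bytes=1K -d "$input" "$output/FILENAME"

# Collect the pieces; nullglob leaves the array empty if nothing matched.
shopt -s nullglob
splitarr=("$output"/FILENAME*)
count=${#splitarr[@]}

echo "$input was split into $count files." >> "$workdir/demo.log"
cat "$workdir/demo.log"
```

Setting nullglob matters for the edge case where split produced nothing: without it, the unmatched pattern would stay in the array as a literal string and the count would be 1 instead of 0.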

You appear to be using GNU split, so here are some other ways to do it. You could add the --verbose option (see the man page for details), count the lines that split prints to stdout, and save that count in a variable:

ct=$(split --verbose --line-bytes=100M -d "$input" "$output/FILENAME" | wc -l)

You could get the same result with the lesser-known --filter option:

ct=$(split --filter='printf %s\\n;cat >"$FILE"' --line-bytes=100M -d "$input" "$output/FILENAME" | wc -l)
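
As a quick check that both GNU split options report the same number, here is a minimal sketch that runs them on the same small input. The temporary paths, the generated input, and the 1K chunk size are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical temporary paths and a tiny chunk size.
workdir=$(mktemp -d)
input=$workdir/input.txt
mkdir -p "$workdir/verbose" "$workdir/filter"
seq 1 1000 > "$input"

# --verbose prints one "creating file ..." line per piece on stdout.
ct_verbose=$(split --verbose --line-bytes=1K -d "$input" "$workdir/verbose/FILENAME" | wc -l)

# --filter runs the command once per piece with $FILE set to the piece name;
# each run prints one empty line, then writes its stdin to that file.
ct_filter=$(split --filter='printf "\n"; cat >"$FILE"' --line-bytes=1K -d "$input" "$workdir/filter/FILENAME" | wc -l)

echo "verbose: $ct_verbose, filter: $ct_filter"
```

Unlike the array approach, neither count depends on a glob, so files that already match the prefix in the target directory do not inflate the result.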

Alternatively, if you know that only your split command will create files in that directory in the next N seconds, you could use inotifywatch to gather statistics for, e.g., the close_write event:

inotifywatch . -t 20 -e close_write

This watches the current directory for close_write events for the next 20 seconds and outputs something like:

Establishing watches...
Finished establishing watches, now collecting statistics.
total  close_write  filename
11     11           ./

So it's only a matter of extracting that number from the table (e.g. pipe it to awk 'END{print $2}'; keep in mind that the first two lines are printed on stderr).
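
To illustrate just the extraction step, here is a minimal sketch that feeds the statistics table from the sample output into the awk command. The table variable is an assumption standing in for inotifywatch's real stdout, so the snippet runs without inotify-tools installed.

```bash
# Statistics table copied from the sample inotifywatch output
# (the two "Establishing watches" lines go to stderr, so they are not included).
table='total  close_write  filename
11     11           ./'

# The last line holds the counts; field 2 is the close_write column.
count=$(printf '%s\n' "$table" | awk 'END{print $2}')
echo "close_write events: $count"
```

This only works as written when a single directory is being watched, since the table then has exactly one data row.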

Related Solutions

Split Large File into Chunks Without Splitting Entry

Here's a solution that could work:

seq 1 $(((lines=$(wc -l </tmp/file))/16+1)) $lines |
sed 'N;s|\(.*\)\(\n\)\(.*\)|\1d;\1,\3w /tmp/uptoline\3\2\3|;P;$d;D' |
sed -ne :nl -ne '/\n$/!{N;bnl}' -nf - /tmp/file

It works by allowing the first sed to write the second sed's script. The second sed first gathers all input lines until it encounters a blank line. It then writes all output lines to a file. The first sed writes out a script for the second one instructing it on where to write its output. In my test case that script looked like this:

1d;1,377w /tmp/uptoline377
377d;377,753w /tmp/uptoline753
753d;753,1129w /tmp/uptoline1129
1129d;1129,1505w /tmp/uptoline1505
1505d;1505,1881w /tmp/uptoline1881
1881d;1881,2257w /tmp/uptoline2257
2257d;2257,2633w /tmp/uptoline2633
2633d;2633,3009w /tmp/uptoline3009
3009d;3009,3385w /tmp/uptoline3385
3385d;3385,3761w /tmp/uptoline3761
3761d;3761,4137w /tmp/uptoline4137
4137d;4137,4513w /tmp/uptoline4513
4513d;4513,4889w /tmp/uptoline4889
4889d;4889,5265w /tmp/uptoline5265
5265d;5265,5641w /tmp/uptoline5641

I tested it like this:

printf '%s\nand\nmore\nlines\nhere\n\n' $(seq 1000) >/tmp/file

This gave me a file of 6000 lines, which looked like this:

<iteration#>
and
more
lines
here
#blank

...repeated 1000 times.

After running the script above:

set -- /tmp/uptoline*
echo $# total splitfiles
for splitfile do
    echo $splitfile
    wc -l <$splitfile
    tail -n6 $splitfile
done

OUTPUT

15 total splitfiles
/tmp/uptoline1129
378
188
and
more
lines
here

/tmp/uptoline1505
372
250
and
more
lines
here

/tmp/uptoline1881
378
313
and
more
lines
here

/tmp/uptoline2257
378
376
and
more
lines
here

/tmp/uptoline2633
372
438
and
more
lines
here

/tmp/uptoline3009
378
501
and
more
lines
here

/tmp/uptoline3385
378
564
and
more
lines
here

/tmp/uptoline3761
372
626
and
more
lines
here

/tmp/uptoline377
372
62
and
more
lines
here

/tmp/uptoline4137
378
689
and
more
lines
here

/tmp/uptoline4513
378
752
and
more
lines
here

/tmp/uptoline4889
372
814
and
more
lines
here

/tmp/uptoline5265
378
877
and
more
lines
here

/tmp/uptoline5641
378
940
and
more
lines
here

/tmp/uptoline753
378
125
and
more
lines
here
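
For comparison, here is a minimal sketch of a different technique (a single awk pass, not the sed pipeline above) that also starts a new piece only at a blank line. The chunk prefix, the 400-line threshold, and the temporary directory are hypothetical values chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: one awk pass instead of the two-sed pipeline; paths are hypothetical.
workdir=$(mktemp -d)
file=$workdir/file

# Same test data as above: 1000 six-line entries, each ending in a blank line.
printf '%s\nand\nmore\nlines\nhere\n\n' $(seq 1000) > "$file"

awk -v max=400 -v prefix="$workdir/chunk" '
    BEGIN { out = sprintf("%s%02d", prefix, part) }
    { print > out; n++ }
    # Switch files only on a blank line, once at least max lines are written.
    /^$/ && n >= max {
        close(out)
        n = 0
        out = sprintf("%s%02d", prefix, ++part)
    }
' "$file"

set -- "$workdir"/chunk*
echo "$# total splitfiles"
```

Because the output file is switched only after a blank line has been written, every piece ends on an entry boundary. Pieces can exceed the threshold by up to one entry.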

How to split file and save parts to multiple locations

I think you can get away with using split's --filter=COMMAND.

... | split -b <SIZE> -d - part --filter=./split-filter

where ./split-filter is something like

#!/bin/bash

set -e

n="${FILE#part}"
case $((10#$n%3)) in
    0)
        dd bs=64K >"path1/$FILE"
        ;;
    1)
        dd bs=64K >"path2/$FILE"
        ;;
    2)
        dd bs=64K >"path3/$FILE"
        ;;
esac
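
To see how the filter picks a destination, here is a minimal sketch that runs only the suffix arithmetic on a few hypothetical piece names, without reading or writing any data. The 10# prefix matters because bash would otherwise treat a suffix such as 08 as an invalid octal number.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical piece names; nothing is read or written.
mapping=$(
    for FILE in part00 part07 part08 part09 part10; do
        n=${FILE#part}            # strip the "part" prefix, leaving e.g. "08"
        idx=$(( 10#$n % 3 ))      # 10# forces base 10 for zero-padded suffixes
        echo "$FILE -> path$(( idx + 1 ))"
    done
)
printf '%s\n' "$mapping"
```

The pieces are spread round-robin: part00 and part09 map to path1, part07 and part10 to path2, and part08 to path3.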
