Splitting file based on size, but make sure that it ends with newline

files, newlines, split, text processing

I can use the split command to split a large file into multiple smaller files, using the following command:

split -b 1G "$temp_path" "$final_filepath"

The only caveat is that the last line of an output file is often split across two files. Is there any way to avoid that, using split or any other command?

Best Answer

Yes, don't use the -b parameter. From the split(1) man page:

-b, --bytes=SIZE put SIZE bytes per output file

-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file

-l, --lines=NUMBER put NUMBER lines per output file

By using -b you are telling split to delineate files at a specific size in bytes (or KB or MB). If that boundary falls in the middle of a line, too bad.

split also supports splitting by number of lines (-l) and by a maximum output file size made up of whole lines (-C).

Instead, try this:

split -C 1G "$temp_path" "$final_filepath"
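
To see the difference on a small scale, here is a minimal sketch that runs both flags on a throwaway file. It assumes GNU split is the split on your PATH; the file names sample.txt, bytes_ and lines_ are made up for the demo.

```shell
#!/bin/sh
# Demo only: compare -b and -C on a tiny sample file (assumes GNU split).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 6-byte lines ("line1\n" and so on), 18 bytes in total.
printf 'line1\nline2\nline3\n' > sample.txt

# -b 8 cuts every 8 bytes, regardless of where lines end.
split -b 8 sample.txt bytes_

# -C 8 writes at most 8 bytes of whole lines per output file.
split -C 8 sample.txt lines_

# Command substitution strips a trailing newline, so an empty result
# means the last byte of the (non-empty) file is a newline.
for f in bytes_* lines_*; do
    if [ -z "$(tail -c 1 "$f")" ]; then
        echo "$f: ends with newline"
    else
        echo "$f: ends mid-line"
    fi
done
```

With this input, the first two bytes_ files end mid-line, while every lines_ file holds exactly one complete line. Note that with GNU split a single line longer than SIZE still has to be broken across files, so pick a SIZE larger than your longest line.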

The -C flag is not available on all versions of split (notably OS X / Darwin). In that case you can use gsplit, which is available in the GNU coreutils package from Homebrew and MacPorts.
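
If the same script has to run on both Linux and macOS, one possible approach is a small wrapper that prefers gsplit when it is installed. This is only a sketch: the function name split_whole_lines is made up here, and it assumes that a plain split found on PATH is the GNU one whenever gsplit is absent.

```shell
#!/bin/sh
# Hypothetical helper: split INPUT into chunks of at most SIZE bytes of
# whole lines, writing the chunks to PREFIXaa, PREFIXab, ...
split_whole_lines() {
    size=$1
    input=$2
    prefix=$3
    if command -v gsplit >/dev/null 2>&1; then
        # GNU split installed as gsplit (e.g. via Homebrew or MacPorts).
        gsplit -C "$size" "$input" "$prefix"
    else
        # Assumption: this split is GNU split and understands -C.
        split -C "$size" "$input" "$prefix"
    fi
}

# Example call with the variables from the question:
#   split_whole_lines 1G "$temp_path" "$final_filepath"
```

On a system where neither binary supports -C, the fallback branch simply fails with split's own error message, which is usually enough to tell you that GNU coreutils is missing.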
