Bash Script – Output Lines to New Numbered Files Until Original Is Empty

bash sed

I have a chromosome file that looks like this:

JH739887 1 30495534
JH739888 1 29527584
JH739889 1 22321128
JH739890 1 19792264
JH739891 1 19033121
JH739892 1 17022292
[...]

A test file could be generated like this:

cd ~/Desktop/
printf "JH%06d \t 1 \t 100 \n" {1..27239} > test_lotsoflines.txt

It has 27239 lines, but I'd like to have 10 files with ~2724 lines each instead (so that a parallel command can work on them).

I'm able to output lines 1 to 2724 of the original file to a new file:

sed -n -e '1,2724p' ${REFGENO}/geoFor1.chrom.start.stop.sizes > ~/Desktop/output.txt
wc -l ~/Desktop/output.txt
 2724 ~/Desktop/output.txt

But now I want to continue in steps of 2724 lines (2725 to 5448, and so on) until I reach the end of the file (27239 lines), writing each chunk to a new file output##.txt.

output01.txt 2724 lines 
output02.txt 2724 lines 
[...]
output10.txt 2723 lines

I was thinking of using printf "output%02d.txt\n" to generate the output##.txt names.

But how do I increment both the file number and the line range so that I end up with 10 files? The last file would have 2723 lines, since the number of lines in the original file is not a multiple of 10.

One approach would be to update the file name using the solution from "How can I increment a number at the end of a string in bash?":

updateVersion()
{
  [[ $1 =~ ([^0-9]*)([0-9]+) ]] || { echo 'invalid input'; exit; }     
  echo "${BASH_REMATCH[1]}$(( ${BASH_REMATCH[2]} + 1 ))"
}

But I would need to separate the file name and the extension…

I'm on a Mac: macOS Mojave 10.14.6.

Best Answer

This kind of thing is exactly what the GNU Coreutils split utility is designed for.

For example, to split file into 10 chunks without splitting lines, using the prefix output, the suffix .txt, and incrementing numeric suffixes:

split -d -n l/10 --additional-suffix='.txt' file output
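
If GNU split isn't available (the split that ships with macOS may not accept -d, -n l/10 or --additional-suffix; with Homebrew's coreutils package, GNU split is typically installed as gsplit), roughly the same result can be sketched with wc and awk. The function name split_into_chunks and its three arguments are made up for illustration and are not part of any tool. It works out the chunk size by ceiling division and starts a new numbered file every that many lines:

```shell
#!/usr/bin/env bash

# split_into_chunks FILE CHUNKS PREFIX   (hypothetical helper, for illustration only)
# Writes PREFIX01.txt, PREFIX02.txt, ... with at most ceil(lines / CHUNKS) lines each.
split_into_chunks() {
  local file=$1 chunks=$2 prefix=$3
  local total per

  total=$(wc -l < "$file")                      # number of lines in the input
  per=$(( (total + chunks - 1) / chunks ))      # ceiling division: lines per chunk

  awk -v per="$per" -v prefix="$prefix" '
    # On the first line of every chunk, close the previous file and open the next one
    (NR - 1) % per == 0 {
      if (out) close(out)
      out = sprintf("%s%02d.txt", prefix, ++i)
    }
    { print > out }
  ' "$file"
}
```

For the 27239-line test file, split_into_chunks test_lotsoflines.txt 10 output should write output01.txt through output10.txt, with 2724 lines in each of the first nine and 2723 in the last. With GNU split itself, using --numeric-suffixes=1 in place of -d starts the numbering at 01 instead of 00. Note also that -n l/10 balances the chunks by size in bytes rather than by line count, so the line counts can differ between files when line lengths vary.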