Ubuntu – split a file based on a pre-defined set of rows

Tags: command-line, split, text-processing

I want to split a text file according to a pre-defined set of rows.
For example, I have a file:

a
b
c
d
e
f

And then I have the following sets of rows (these could be stored in whatever way is most convenient: in one file, in multiple files, …):

1,2
3,6
5,4

I want to split my file so that I get three files back, like this:

file1

a
b

file2

c
f

file3

e
d
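To try the scripts in the answer, the sample data above can be recreated with a couple of commands. The file names infile and splits are the ones the answer assumes; storing one comma-separated set per line is just one convenient choice:

```shell
# Recreate the example input file from the question, one letter per line.
printf '%s\n' a b c d e f > infile
# One comma-separated set of row numbers per line.
printf '%s\n' 1,2 3,6 5,4 > splits
```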

Best Answer

Here is a bash script, assuming your input file is named infile and the row pairs are stored one per line in a file named splits:

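i=1
# Each line of splits holds two row numbers, e.g. "5,4".
for range in $(< splits); do
  # Print the first row, then append the second, so the order in splits is kept.
  sed -n "$(echo "$range" | cut -f1 -d, )p" infile > "file$i"
  sed -n "$(echo "$range" | cut -f2 -d, )p" infile >> "file$i"
  ((i++))
done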

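This uses sed to print the two rows named on each line of splits and saves each result as a new file (the files are named file1, file2, file3, etc.). Two invocations of sed are used to preserve the specified order of the rows: a single sed pass prints matching lines in the order they occur in infile, so the pair 5,4 would produce d then e rather than e then d.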
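The ordering point can be checked directly. This minimal sketch uses the example file from the question and compares one sed pass with two:

```shell
printf '%s\n' a b c d e f > infile
# One sed pass prints matching lines in file order: d, then e.
sed -n '5p; 4p' infile
# Two passes print row 5 first, then row 4: e, then d.
{ sed -n '5p' infile; sed -n '4p' infile; }
```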

Note that this simple script does no format or error checking, and existing files with the same names (e.g. file1) will be overwritten.

A simpler alternative (courtesy of @muru) uses while read and lets bash split each pair at the comma instead of calling cut:

i=1
# IFS=',' makes read split each line of splits at the comma; -r keeps backslashes literal.
while IFS=',' read -r n1 n2
do
    sed -n "$n1 p; $n2 p" infile > "file$i"
    ((i++))
done < splits

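Unlike the first script, this version prints the lines in the order they occur in infile, not the order given in splits, so the pair 5,4 produces d then e. If the order matters (e.g. rows 5,4 != 4,5), the sed command needs to be broken up into two separate invocations, as in the first script.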
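Combining the two answers, a sketch that keeps the while read loop but runs one sed call per row preserves the order given in splits. It also assumes, beyond the question's examples, that a line may list any number of rows (e.g. 2,6,1); the function name split_ordered is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: while-read loop with one sed call per row, so each output file
# keeps the order listed in splits. split_ordered is a hypothetical name.
# Usage: split_ordered INFILE SPLITS   (writes file1, file2, ... in the current dir)
split_ordered() {
  local input=$1 splits=$2 i=1 n
  local -a rows
  # IFS=',' plus read -a puts every comma-separated row number into the array.
  while IFS=',' read -ra rows; do
    for n in "${rows[@]}"; do
      sed -n "${n}p" "$input"   # a separate pass per row preserves the order
    done > "file$i"
    i=$((i + 1))
  done < "$splits"
}
```

Each row costs one pass over infile, which is fine for small files but can get slow with many rows on a large input.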
