Ubuntu – split a file based on pre-defined set of rows

command line, split, text processing

I want to split a text file according to a pre-defined set of rows.
For example, I have a file:

a
b
c
d
e
f

And then I have the following sets of rows (these could be stored in whatever way is most convenient: in one file, in multiple files, …).

1,2
3,6
5,4

I want to split my file so that I get 3 files back like:

file1

a
b

file2

c
f

file3

e
d

Best Answer

Here is a bash script, assuming your input file is named infile and the row sets are stored one per line in a file named splits:

i=1
for range in $(< splits); do
  sed -n "$(echo "$range" | cut -f1 -d, )p" infile > "file$i"
  sed -n "$(echo "$range" | cut -f2 -d, )p" infile >> "file$i"
  ((i++))
done

This simply uses sed to print the two rows named on each line of splits, and saves each result as a new file (the files created are named file1, file2, file3, etc.). Each line is split at the comma with cut, and two invocations of sed are used so that the rows are written in the order specified rather than in file order.

Note that there is no format or error checking done by this simple script, and existing files named e.g. file1 will be overwritten.
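If you do want some checking, the following is a minimal sketch of one way to add it, not part of the original answer. It assumes each line of splits must look like N,M with positive integers, skips anything else with a warning, and stops instead of overwriting an existing output file. The scratch directory, the sample data and the extra malformed line "oops" are made up for illustration.

```shell
# Scratch directory with the sample data from the question, plus one
# deliberately malformed line ("oops") to exercise the check.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' a b c d e f > infile
printf '%s\n' 1,2 3,6 oops 5,4 > splits

i=1
skipped=0
while IFS= read -r range; do
    # Accept only "N,M" where N and M are positive integers.
    if ! printf '%s\n' "$range" | grep -Eq '^[1-9][0-9]*,[1-9][0-9]*$'; then
        echo "skipping malformed line: $range" >&2
        skipped=$((skipped + 1))
        continue
    fi
    out="file$i"
    # Stop rather than silently overwrite an existing file.
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        exit 1
    fi
    sed -n "${range%,*}p" infile > "$out"    # first row
    sed -n "${range#*,}p" infile >> "$out"   # second row
    i=$((i + 1))
done < splits
```

The check only validates the shape of each line; a row number larger than the length of infile still passes and simply contributes nothing to the output file.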

A simplified alternative (courtesy of @muru) using while read and letting bash split the ranges instead of cut:

i=1
while IFS=',' read n1 n2 
do
    sed -n "$n1 p; $n2 p" infile > "file$i"
    ((i++))
done < splits

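A single sed pass prints the selected lines in the order they appear in infile, so 5,4 gives the same output as 4,5. If the order of the lines in the output files is important (e.g. rows 5,4 != 4,5), then the sed bit will need to be broken up into two separate invocations, similar to the first script.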
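The order-preserving variant of the while read loop could look like the sketch below. It is an illustration rather than the answerer's own code; the scratch directory and the sample files are created only so the snippet can be run as is.

```shell
# Scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' a b c d e f > infile
printf '%s\n' 1,2 3,6 5,4 > splits

i=1
while IFS=',' read -r n1 n2; do
    # Two sed calls, run in the order given, so "5,4" writes row 5
    # first and row 4 second.
    {
        sed -n "${n1}p" infile
        sed -n "${n2}p" infile
    } > "file$i"
    i=$((i + 1))
done < splits
```

Grouping the two sed calls in braces lets a single redirection cover both, so there is no need for a separate >> append.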

Related Solutions

Ubuntu – How to replace a string on the 5th line of multiple text files

Here are a few approaches. I am using brace expansion (file{1..4}.txt), which expands to file1.txt file2.txt file3.txt file4.txt.

Perl
```
perl -i -pe 's/.*/ Good Morning / if $.==5' file{1..4}.txt
```
Explanation:
- -i: causes perl to edit the files in place, changing the original file.
  
  If -i is followed with a file extension suffix, then a backup is created for every file that is modified. Ex: -i.bak creates a file1.txt.bak if file1.txt is modified during the execution.
- -p: means read the input file line by line, apply the script and print it.
- -e: allows you to pass a script from the command line.
- s/.*/ Good Morning /: That will replace the text in the current line (.*) with Good Morning.
- $. is a special Perl variable that holds the current line number of the input file. So, s/foo/bar/ if $.==5, means replace foo with bar only on the 5th line.

sed
```
sed -i '5s/.*/ Good Morning /' file{1..4}.txt
```
Explanation:
- -i: As with perl, edit the file in place.

By default, sed prints each line of the input file. 5s/pattern/replacement/ means substitute pattern with replacement on the 5th line only.
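
GNU sed's -i also accepts a backup suffix, much like the perl option described earlier. A minimal sketch, assuming GNU sed and a hypothetical six-line file1.txt created in a scratch directory:

```shell
# Scratch directory with a hypothetical six-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 4 5 6 > file1.txt

# Replace line 5 in place, keeping the original as file1.txt.bak.
sed -i.bak '5s/.*/ Good Morning /' file1.txt

sed -n '5p' file1.txt      # the edited line
sed -n '5p' file1.txt.bak  # the untouched original line
```

Keeping the .bak copy makes it easy to diff the result against the original before deleting the backup.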

Awk
```
for f in file{1..4}.txt; do 
    awk 'NR==5{$0=" Good Morning "}1;' "$f" > foobar && mv foobar "$f"; 
done
```
Explanation:

awk has no equivalent to the -i option¹ which means that we need to create a temporary file (foobar) which is then renamed to overwrite the original. The bash loop for f in file{1..4}.txt; do ... ; done simply goes through each of file{1..4}.txt, saving the current file name as $f. In awk, NR is the current line number and $0 is the content of the current line. So, the script will replace the line ($0) with " Good Morning " only on the 5th line. 1; is awk for "print the line".

¹ Newer versions do, as devnull showed in his answer.

coreutils
```
for f in file{1..4}.txt; do 
    (head -4 "$f"; echo " Good Morning "; tail -n +6 "$f") > foobar && 
    mv foobar "$f"; 
done 
```
Explanation:

The loop is explained in the previous section.
- head -4: print the first 4 lines
- echo " Good Morning ": print " Good Morning "
- tail -n +6: print everything from the 6th line to the end of the file

The parentheses ( ) around those three commands run them in a subshell, so the output of all three (the first 4 lines, then " Good Morning ", then the rest of the lines) can be redirected to a single file.
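
The same head/echo/tail idea can be generalized to any line number. The sketch below is only an illustration: replace_line is a hypothetical helper name, it uses mktemp instead of the fixed foobar file, and it assumes GNU coreutils (where head -n 0 prints nothing, which matters when replacing line 1).

```shell
# Hypothetical helper: replace line N of FILE with TEXT.
# Usage: replace_line N TEXT FILE
replace_line() {
    rl_n=$1 rl_text=$2 rl_file=$3
    rl_tmp=$(mktemp) || return 1
    {
        head -n "$((rl_n - 1))" "$rl_file"     # lines before N
        printf '%s\n' "$rl_text"               # the replacement line
        tail -n "+$((rl_n + 1))" "$rl_file"    # lines after N
    } > "$rl_tmp" &&
        cat "$rl_tmp" > "$rl_file"             # overwrite contents, keep permissions
    rl_status=$?
    rm -f "$rl_tmp"
    return "$rl_status"
}

# Example run on a hypothetical six-line file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 4 5 6 > file1.txt
replace_line 5 ' Good Morning ' file1.txt
```

Writing the result back with cat rather than mv keeps the original file's permissions, at the cost of not being an atomic replacement.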

Ubuntu – Comparing two text files

Try this command:

 grep -v -f file2.csv file1.csv > file3.csv

According to grep manual:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)

As Steeldriver said in his comment, it is better to also add -x and -F:

  -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)
  -x, --line-regexp
          Select  only  those  matches  that exactly match the whole line.
          (-x is specified by POSIX.)

So a better command is:

 grep -xvFf file2.csv file1.csv > file3.csv

This command uses each line of file2.csv as a pattern and prints the lines of file1.csv that do not match any of them (-v).
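
To see why -x and -F matter, here is a small sketch with made-up data (the file contents and the loose.csv name are assumptions for illustration). Without those options, the pattern a.c,1 is treated as a regular expression and may match anywhere in a line, so it also removes abc,1 and a.c,10.

```shell
# Scratch directory with hypothetical sample data:
# file2.csv lists the lines to drop from file1.csv.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a.c,1' 'abc,1' 'a.c,10' 'xyz,2' > file1.csv
printf '%s\n' 'a.c,1' > file2.csv

# Regex, substring matching: "a.c,1" also matches "abc,1" and "a.c,10",
# so only "xyz,2" survives.
grep -v -f file2.csv file1.csv > loose.csv

# Fixed-string, whole-line matching: only the exact line "a.c,1" is dropped.
grep -xvFf file2.csv file1.csv > file3.csv
```

With this data, loose.csv keeps a single line while file3.csv keeps the three lines that are not exactly a.c,1.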
