Text Processing – Split a File by Column and Rename Generated Files

awk, text-processing

I have a .txt file that looks like this:

NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003

I need to write a script to split this file according to the CODE column, so in this case I'd get this:

file 1:
NAME | CODE
name1 | 001
name2 | 001

file 2:
NAME | CODE
name3 | 002

file 3:
NAME | CODE
name4 | 003
name5 | 003
name6 | 003

According to some research, using awk would work:

$ awk -F, '{print > $2".txt"}' inputfile

The thing is, I also need to include the header as the first line of each file, and I need the file names to be different. Instead of 001.txt, for example, I need the file name to be something like FILE_$FILENAME_IDK.txt.

Best Answer

You could try like this:

awk 'NR==1{h=$0; next}
!seen[$3]++{f="FILE_"FILENAME"_"$3".txt";print h > f} 
{print >> f}' infile

The above saves the header in a variable h (NR==1{h=$0; next}). Then, if $3 has not been seen before (!seen[$3]++, i.e. it's the first time it encounters the current value of $3), it sets the file name (f=...) and writes the header to that file (print h > f). Finally it appends the entire line to the same file (print >> f). Since f only changes when a new value of $3 first appears, this relies on lines with the same code being adjacent, as in your sample. It uses the default FS (field separator), blanks, so the | counts as $2 and the code is $3. If you want to use | as FS (or even a regex with GNU awk), see cas' comment below.
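If your real data is not grouped by code, or you would rather split on | directly, something along these lines might work. It is only a sketch: the file name infile and the sample rows are made up for illustration (one row is deliberately out of order), and it recomputes the output name on every line instead of only when a new code appears.

```shell
# Hypothetical sample input; the second 001 row is out of order on purpose.
cat > infile <<'EOF'
NAME | CODE
name1 | 001
name3 | 002
name2 | 001
EOF

awk -F'|' '
NR == 1 { h = $0; next }                  # remember the header line
{
    code = $2
    gsub(/^[ \t]+|[ \t]+$/, "", code)     # trim blanks around the code
    f = "FILE_" FILENAME "_" code ".txt"  # output name for this line
    if (!(code in seen)) {                # first line seen for this code
        seen[code] = 1
        print h > f                       # start the file with the header
    }
    print >> f                            # append the data line
    close(f)                              # keep the number of open files low
}' infile
```

With the sample above this should leave FILE_infile_001.txt holding the header plus name1 and name2, and FILE_infile_002.txt holding the header plus name3. Closing the file after every line keeps awk from hitting the open-file limit when there are many codes, at the cost of some speed on large inputs.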
