Text Processing – Split a File by Column and Rename Generated Files

awk, text processing

I have a .txt file that looks like this:

NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003

I need to write a script to split this file according to the CODE column, so in this case I'd get this:

file 1:
NAME | CODE
name1 | 001
name2 | 001

file 2:
NAME | CODE
name3 | 002

file 3:
NAME | CODE
name4 | 003
name5 | 003
name6 | 003

According to some research, using awk would work:

$ awk -F, '{print > $2".txt"}' inputfile

The thing is, I also need the header as the first line of every output file, and I need the file names to be different. Instead of 001.txt, for example, I need the file name to be something like FILE_$FILENAME_IDK.txt.

Best Answer

You could try like this:

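awk 'NR==1{h=$0; next}
{f="FILE_"FILENAME"_"$3".txt"}
!seen[$3]++{print h > f}
{print > f}' infile

This saves the header in a variable h (NR==1{h=$0; next}). For every other line it builds the output filename from $3 (f=...), so each line lands in the right file even when the input is not grouped by code. If $3 has not been seen before (!seen[$3]++, i.e. it's the first time the current value of $3 appears), it writes the header to that file (print h > f). It then writes the entire line to the same file (print > f); in awk, > truncates a file only when it is first opened, so later prints to the same name are added after the header. It uses the default FS (field separator), blank, so the lone | counts as $2 and the code is $3. If you want to use | as FS (or a regex) see cas' comment below.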
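As a rough sketch of the | separator variant mentioned above: the snippet below assumes the sample data from the question, a hypothetical input name infile, and a field separator regex of my own choosing (a pipe with optional spaces around it), so the code becomes $2 instead of $3. It builds the file name on every line, so ungrouped input is handled as well. It runs in a scratch directory created with mktemp.

```shell
# Work in a scratch directory so the demo leaves nothing behind elsewhere.
cd "$(mktemp -d)" || exit 1

# Sample input from the question (hypothetical file name: infile).
cat > infile <<'EOF'
NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003
EOF

# FS is a regex: a pipe with optional surrounding spaces, so CODE is $2.
awk -F ' *[|] *' '
NR == 1 { h = $0; next }                 # remember the header line
{ f = "FILE_" FILENAME "_" $2 ".txt" }   # output name for this line
!seen[$2]++ { print h > f }              # first time: start file with header
{ print > f }                            # then write the data line
' infile

ls FILE_*.txt
```

With this sample, ls should list FILE_infile_001.txt, FILE_infile_002.txt and FILE_infile_003.txt, each starting with the NAME | CODE header. If the input has a very large number of distinct codes, some awk implementations may hit an open-file limit, since every output file stays open until awk exits.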


Files – Split File by Number of Lines Including Header in Each One

With GNU split you could save the header in a variable, then split starting from the 2nd line, using the --filter option to write the header first and then the 99 lines of each piece, and also specify the output directory (e.g. path to/output dir/):

header=$(head -n 1 infile.txt)
export header
tail -n +2 infile.txt | split -l 99 -d --additional-suffix=.txt \
--filter='{ printf %s\\n "$header"; cat; } >path\ to/output\ dir/$FILE' - file_

This creates pieces of at most 100 lines each (the header plus up to 99 data lines), named:

path to/output dir/file_00.txt
path to/output dir/file_01.txt
path to/output dir/file_02.txt
..............................
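Where GNU split is not available, the same idea can be approximated with plain awk. This is only a sketch of a swapped-in technique, not the split --filter approach itself: the sample data (a header plus 250 numbered lines), the variable n and the file_NN.txt naming are assumptions for illustration, and the pieces are written to the current (scratch) directory instead of a separate output directory.

```shell
# Work in a scratch directory so the demo leaves nothing behind elsewhere.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: one header line followed by 250 data lines.
{ echo "NAME | CODE"; seq 1 250; } > infile.txt

# n is the number of data lines per piece (header not counted).
awk -v n=99 '
NR == 1 { h = $0; next }                       # keep the header
(NR - 2) % n == 0 {                            # first data line of a piece
    if (f != "") close(f)                      # done with the previous piece
    f = sprintf("file_%02d.txt", (NR - 2) / n) # file_00.txt, file_01.txt, ...
    print h > f                                # each piece starts with the header
}
{ print > f }                                  # write the data line
' infile.txt

wc -l file_*.txt
```

Closing each piece before starting the next keeps only one output file open at a time, so the number of pieces is not limited by the open-file limit.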
