I have a pipe-delimited file a.txt which includes a header row. The first column holds a filename. I would like to split a.txt into several different files, the name of which is determined by the first column. I would also like to have the header row of a.txt repeated at the top of each file.
So I have a.txt:
filename|count|age
1.txt|1|15
1.txt|2|14
2.txt|3|1
41.txt|44|1
2.txt|1|3
and I want to create 1.txt:
filename|count|age
1.txt|1|15
1.txt|2|14
and 2.txt:
filename|count|age
2.txt|3|1
2.txt|1|3
and 41.txt:
filename|count|age
41.txt|44|1
I have a basic split working:
awk -F\| '{print>$1}' a.txt
but I am struggling to work out how to include the header. Could anybody help? Thanks!
Best Answer
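A minimal sketch of the approach explained below. It first recreates the sample a.txt from the question, then uses the variable hdr and the array seen described in the explanation. Treating the first line of each input file as its header (FNR==1) is an assumption that matches the addendum's mention of FNR in the first rule.

```shell
# Recreate the sample input from the question (header on line 1).
printf '%s\n' 'filename|count|age' '1.txt|1|15' '1.txt|2|14' \
  '2.txt|3|1' '41.txt|44|1' '2.txt|1|3' > a.txt

# FNR==1: remember the header line in hdr and skip further processing of it.
# !seen[$1]++: true only while the counter for this filename is still zero,
#   so the header is written once, before the first data line of each file.
# print > $1: append every data line to the file named in column 1.
awk -F'|' 'FNR==1 { hdr = $0; next }
           !seen[$1]++ { print hdr > $1 }
           { print > $1 }' a.txt
```

Further input files that carry their own header can be appended after a.txt on the same command line; because seen persists across input files, each output file still receives the header only once.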
The solution would be to store the header in a separate variable and print it on the first occurrence of a new $1 value (= file name):

First, store the header line (the first line of a.txt) in a variable hdr, but otherwise leave that particular line unprocessed.

Then, for every other line, check whether the $1 value (= the desired output filename) was already encountered by looking it up in an array seen, which holds an occurrence count of the various $1 values. If the counter is still zero for the current $1 value, output the header to the file indicated by $1, then increase the counter to suppress header output for all later occurrences. The rest you already figured out yourself.

Addendum:
If you have more than one input file and all of them have a header line, you can simply pass them all as arguments to the awk call. If, however, only the first file has a header line, you would need to change FNR to NR in the first rule.

Caveat
As noted by Ed Morton, this simple approach only works if the number of different output files is small (roughly 10 at most). GNU awk will keep working but becomes slower, because it automatically closes and reopens files in the background as needed; other awk implementations may simply fail with "too many open files".
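One way around the open-file limit, sketched here under the assumption that the data lines may be regrouped by filename, is to sort them on the first column so that each output file's lines form one contiguous block, then close() each file as soon as its block ends. At most one output file is open at any time. The sort -s (stable) flag keeps the original line order within each file; it is supported by GNU and BSD sort but is not required by POSIX.

```shell
# Recreate the sample input from the question (header on line 1).
printf '%s\n' 'filename|count|age' '1.txt|1|15' '1.txt|2|14' \
  '2.txt|3|1' '41.txt|44|1' '2.txt|1|3' > a.txt

# Emit the header first, then the data lines stably sorted on column 1.
{ head -n 1 a.txt; tail -n +2 a.txt | sort -t'|' -s -k1,1; } |
awk -F'|' 'NR==1 { hdr = $0; next }       # single stream, so NR==1 is the header
           $1 != prev {                    # a new filename block starts here
             if (prev != "") close(prev)   # release the previous output file
             print hdr > $1                # each file is opened (truncated) only once
             prev = $1
           }
           { print > $1 }'
```

Since every filename now occurs in exactly one block, opening with > never truncates a file that already received data.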