Splitting file based on values in specific column

awk, sed, text processing

I have a file that I would like to break up into multiple files, each containing unique values in the first column. For example, here is a file:

fileA.txt

1    Cat
1    Dog
1    Frog
2    Boy
2    Girl
3    Tree
3    Leaf
3    Branch
3    Trunk

I would like my output to look something like this:

file1.txt

1    Cat
2    Boy
3    Tree

file2.txt

1    Dog
2    Girl
3    Leaf

file3.txt

1    Frog
3    Branch

file4.txt

3    Trunk

If a value does not exist, I want it to be skipped. I have searched for situations similar to mine, but I've come up short. Does anyone have an idea of how to do this?

Edit: My awk version is: awk version 20070501

Best Answer

$ gawk '{print > "file" ++a[$1] ".txt"}' input

# And on OSX awk, and also gawk:

$ awk '{print > ("file" ++a[$1] ".txt")}' input


$ head file*txt
==> file1.txt <==
1    Cat
2    Boy
3    Tree

==> file2.txt <==
1    Dog
2    Girl
3    Leaf

==> file3.txt <==
1    Frog
3    Branch

==> file4.txt <==
3    Trunk

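edit: An explanation. This prints the current line into (>) fileX.txt. The array element a[$1] counts how many times the value in the first field has been seen so far. It is increased by 1 before it is evaluated (pre-increment), and the result is used as X in the file name. The first line with a given value therefore goes to file1.txt, the second to file2.txt, and so on.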
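To make the counter visible, here is a small sketch that prints the target file name next to each line instead of writing the files. The variable n and the output file trace.txt are only illustrative names and are not part of the answer above. The input is the sample from the question.

```shell
# Recreate the sample input from the question.
printf '%s\n' '1    Cat' '1    Dog' '1    Frog' '2    Boy' '2    Girl' \
    '3    Tree' '3    Leaf' '3    Branch' '3    Trunk' > fileA.txt

# n is the number of times the first field has been seen so far,
# i.e. the X in fileX.txt. Nothing is split here; the mapping is only listed.
awk '{ n = ++a[$1]; print "file" n ".txt <- " $0 }' fileA.txt > trace.txt

cat trace.txt
```

The three lines with value 1 map to file1.txt, file2.txt and file3.txt in turn, and the counter starts again at 1 for the first line with value 2.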

edit 2: I am not able to test this with OSX awk, but if you are even half serious about using awk, you would do well to install gawk or mawk. You could, however, give this a shot:

$ awk '{a[$1]++; f = "file" a[$1] ".txt"; print > f}' input

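This does the same thing, but the work is split into separate steps: increment the counter, build the file name, then print. Because the file name is already a plain variable when it reaches the redirection, OSX awk cannot misread the order in which the parts are evaluated.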
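One caveat, stated as an assumption rather than something tested on OSX awk: the commands above keep every output file open until awk exits, and some awk implementations limit how many files can be open at once. If a value repeats very many times, a variant along the following lines closes each file after writing to it. It has to append (>>), since reopening a file with > would truncate it, so old output files are removed first.

```shell
# Recreate the sample input from the question.
printf '%s\n' '1    Cat' '1    Dog' '1    Frog' '2    Boy' '2    Girl' \
    '3    Tree' '3    Leaf' '3    Branch' '3    Trunk' > fileA.txt

# Remove output from earlier runs; appending would otherwise add to it.
rm -f file[0-9]*.txt

awk '{
    f = "file" (++a[$1]) ".txt"   # same naming scheme as in the answer
    print >> f                    # append, because f is reopened for every line
    close(f)                      # release the file so few files are open at once
}' fileA.txt
```

For the sample input this produces the same four files as the one-liner. It is slower on large inputs because a file is opened and closed for every line.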
