Splitting file based on values in specific column

awk sed text-processing

I have a file that I would like to break up into multiple files, each containing unique values in the first column. For example, here is a file:


1    Cat
1    Dog
1    Frog
2    Boy
2    Girl
3    Tree
3    Leaf
3    Branch
3    Trunk

I would like my output to look something like this:


1    Cat
2    Boy
3    Tree


1    Dog
2    Girl
3    Leaf


1    Frog
3    Branch


3    Trunk

If a value does not exist, I want it to be skipped. I have searched for situations similar to mine, but I've come up short. Does anyone have an idea of how to do this?

Edit: My awk version is: awk version 20070501

Best Answer

$ gawk '{print > "file" ++a[$1] ".txt"}' input

# And on OSX awk, and also gawk:

$ awk '{print > ("file" ++a[$1] ".txt")}' input

$ head file*txt
==> file1.txt <==
1    Cat
2    Boy
3    Tree

==> file2.txt <==
1    Dog
2    Girl
3    Leaf

==> file3.txt <==
1    Frog
3    Branch

==> file4.txt <==
3    Trunk

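edit: An explanation. This prints the current line into (>) fileX.txt. a[$1] is a counter kept per value of the first field: each time a value is seen, its counter is incremented by 1 before it is used (pre-increment). The resulting number becomes part of the file name, so the first line with a given value goes to file1.txt, the second to file2.txt, and so on.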
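To watch the counter at work without creating any output files, here is a minimal sketch. It assumes the sample data from the question is saved as input (the same name used in the commands above), and the variable n is only introduced for illustration. Instead of redirecting each line, it prints the line next to the file name it would have been written to.

```shell
# Recreate the sample data from the question (columns separated by whitespace).
printf '%s\n' '1 Cat' '1 Dog' '1 Frog' '2 Boy' '2 Girl' \
              '3 Tree' '3 Leaf' '3 Branch' '3 Trunk' > input

# n is the pre-incremented counter for this line's first field;
# show the target file name rather than writing to it.
awk '{ n = ++a[$1]; print $0 " -> file" n ".txt" }' input
```

Every value of the first column starts again at file1.txt, which is why no output file ever contains the same value twice.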

edit 2: I have no way of testing with OSX awk, but if you are even half serious about using awk, you would do well to install gawk or mawk. You could, however, give this a shot:

$ awk '{a[$1]++; f = "file" a[$1] ".txt"; print > f}' input

This does the same thing, but with each action split into a separate step, so that OSX awk does not have to work out the order in which to evaluate the parts of the expression.
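One caveat with the step-by-step form: every output file stays open until awk exits, and the number of output files equals the highest repeat count of any value. Some awk implementations limit how many files may be open at once. The following is a sketch of a variant that closes each file right after writing to it. It is an assumption about what a larger input might need, not something the sample data requires. Because a closed file is reopened for the next line, it has to append (>>) rather than truncate (>), so leftover output from an earlier run is removed first.

```shell
# Recreate the sample data from the question.
printf '%s\n' '1 Cat' '1 Dog' '1 Frog' '2 Boy' '2 Girl' \
              '3 Tree' '3 Leaf' '3 Branch' '3 Trunk' > input

# '>>' appends, so stale file*.txt from a previous run must be deleted first.
rm -f file*.txt

awk '{
    f = "file" (++a[$1]) ".txt"   # same naming scheme as above
    print >> f                    # append, since f is reopened for each line
    close(f)                      # release the file handle immediately
}' input
```

Opening and closing a file for every line is slower, so this is only worth doing when the simpler command fails with a "too many open files" error.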

