I have a large CSV file (Test.csv) that looks like this:
1,2,3,A,5
1,2,3,B,5
1,2,3,E,5
1,2,3,D,5
1,2,3,Z,5
1,2,3,B,5
I want to split these lines into different files according to the content of the 4th column. That is, all lines with the same 4th-column content should go together into a new csv or txt file named after that content. For example:
Output:
File A
1,2,3,A,5
1,2,3,A,5
1,2,3,A,5
File B
1,2,3,B,5
1,2,3,B,5
Since the input file is large, I do not know how many distinct values the 4th column contains. Column 4 contains only words, and the other columns contain words and/or numbers.
As I have no experience, I researched similar questions and even tried the following code:
awk 'NR==FNR{a[$4]=NR; next} $NF in a {print > "outfile" a[$NF]}' Test.csv
but it did not work. Can anyone help me, please? Thanks in advance.
Best Answer
This will work efficiently using POSIX sort and any awk in any shell on every UNIX box:
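A minimal sketch of a sort-then-awk approach along these lines, not necessarily the exact command this answer used. It assumes the fields are separated by plain commas with no quoted commas inside them, and that every column 4 value is safe to use as a file name. The function name `split_by_col4` and the `.csv` suffix on the output files are illustrative choices.

```shell
# Hypothetical helper: split a CSV into one file per distinct column 4 value.
split_by_col4() {
    # Sort on column 4 only, so rows that share a key become adjacent.
    # LC_ALL=C makes sort compare bytes, which matches how awk compares strings.
    LC_ALL=C sort -t, -k4,4 "$1" |
    awk -F, '
        NR == 1 || $4 != prev {
            # The key changed: close the previous output file so that
            # only one file is open at any time, then switch to the new one.
            if (out != "") close(out)
            out = $4 ".csv"
            prev = $4
        }
        { print > out }
    '
}
```

Calling `split_by_col4 Test.csv` writes `A.csv`, `B.csv`, and so on into the current directory. Because the input is sorted first, each output file is opened exactly once, so the number of distinct values in column 4 does not run into the limit on simultaneously open files. Without the sort, awk would have to keep every output file open or reopen files in append mode.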
Some things to note: