Linux – How to convert a 3 column csv file into a table (or matrix)

I have a CSV input file like this, with a nucleotide sequence in field 1, text in field 2, and an integer in the last field (field 3 in this sample):

ATGC,CD3,56
ATGC,CD4,67
ATGC,IgD,126
ATGC,IgM,127
AGTC,CD3,67
AGTC,CD4,78
AGTC,IgD,102
AGTC,IgM,89
TCGA,CD3,334
TCGA,CD4,123
TCGA,IgD,456
TCGA,IgM,80
CGTA,CD3,54
CGTA,CD4,32
CGTA,IgD,82
CGTA,IgM,117

When I open this CSV file in Numbers on a Mac, it is displayed as 3 columns. However, I want to convert it to a table (or matrix) format, also as a CSV file, with the nucleotide sequences from the first column as the header row, so that the result looks like this:

     ATGC  AGTC  TCGA  CGTA
CD3  56    67    334   54
CD4  67    78    123   32
IgD  126   102   456   82
IgM  127   89    80    117

Below is a section from my real input CSV file (sample input.txt):

AGAATAGTCTGATTCT,-,,38
AGAATAGTCTGATTCT,AnnexinV,,51
AGAATAGTCTGATTCT,CD127,,39
AGAATAGTCTGATTCT,CD138,,3
AGAATAGTCTGATTCT,CD14,,2
AGAATAGTCTGATTCT,CD16,,4
AGAATAGTCTGATTCT,CD19,,10
AGAATAGTCTGATTCT,CD20,,6
AGAATAGTCTGATTCT,CD24,,21
AGAATAGTCTGATTCT,CD25,,4
AGAATAGTCTGATTCT,CD27,,87
AGAATAGTCTGATTCT,CD3,,235
AGAATAGTCTGATTCT,CD34,,5
AGAATAGTCTGATTCT,CD38,,18
AGAATAGTCTGATTCT,CD4,,412
AGAATAGTCTGATTCT,CD43,,99
AGAATAGTCTGATTCT,CD5,,430
AGAATAGTCTGATTCT,CD56,,3
AGAATAGTCTGATTCT,CD8,,7
AGAATAGTCTGATTCT,IgD,,4
AGAATAGTCTGATTCT,IgM,,2
TGTGGTAGTTCGTCTC,-,,9
TGTGGTAGTTCGTCTC,AnnexinV,,42
TGTGGTAGTTCGTCTC,CD127,,6
TGTGGTAGTTCGTCTC,CD138,,4
TGTGGTAGTTCGTCTC,CD16,,40
TGTGGTAGTTCGTCTC,CD19,,7
TGTGGTAGTTCGTCTC,CD20,,2
TGTGGTAGTTCGTCTC,CD24,,24
TGTGGTAGTTCGTCTC,CD25,,2

How can I do this using Linux text formatting commands?

{
    ks[$1 $2] = $3;  # save the third column using the first and second as index
    k1[$1]++;        # save the first column
    k2[$2]++;        # save the second column
}
END {                            # After processing input
    for (j in k1) {              # loop over the first column
        printf "\t%s", j;        # and print column headers
    };
    print "";                    # newline
    for (i in k2) {              # loop over the second
        printf "%s", i;          # print it as row header
        for (j in k1) {          # loop over first again
            printf "\t%s", ks[j i];  # and print values
        }
        print "";                # newline
    }
}

~ awk -F, -f foo.awk foo
	AGTC	ATGC	CGTA	TCGA
CD4	78	67	32	123
IgD	102	126	82	456
IgM	89	127	117	80
CD3	67	56	54	334
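
Note that the rows and columns in the output above are not in input order: awk does not specify the traversal order of `for (x in array)`. The script also reads the value from field 3, while the real input shown in the question has an empty third field and the count in field 4. The following is only a sketch of one way to handle both points, not the method of the original answer. It assumes the value is always the last field (`$NF`), records headers in first-seen order, and writes comma-separated output. The function name `pivot_csv` is made up for illustration.

```shell
#!/bin/sh
# Pivot "sequence,marker,...,count" CSV lines into a CSV matrix.
# pivot_csv is a hypothetical helper name, not part of the original answer.
pivot_csv() {
    awk -F, -v OFS=, '
    NF >= 3 {
        col = $1; row = $2
        # remember each column and row header in first-seen order
        if (!(col in seen_col)) { seen_col[col] = 1; cols[++nc] = col }
        if (!(row in seen_row)) { seen_row[row] = 1; rows[++nr] = row }
        # assumption: the count is the last field (field 3 or field 4)
        val[col, row] = $NF
    }
    END {
        # header line: empty corner cell, then one sequence per column
        line = ""
        for (j = 1; j <= nc; j++) line = line OFS cols[j]
        print line
        # one line per marker; combinations missing from the input stay empty
        for (i = 1; i <= nr; i++) {
            line = rows[i]
            for (j = 1; j <= nc; j++) {
                key = cols[j] SUBSEP rows[i]
                line = line OFS ((key in val) ? val[key] : "")
            }
            print line
        }
    }' "$@"
}

# Example with part of the small sample from the question
printf '%s\n' ATGC,CD3,56 ATGC,CD4,67 AGTC,CD3,67 AGTC,CD4,78 | pivot_csv
```

With a file, this could be called as `pivot_csv input.txt > matrix.csv` (both file names are placeholders). A marker that is missing for some sequence, such as CD14 for the second sequence in the real sample, produces an empty cell rather than shifting the columns.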
