Organizing three dimensional data with awk/sed/perl

awkbioinformaticsperlsedtext processing

I have this file (a sparse matrix):

PC.354 OTU1 6
PC.354 OTU2 1
PC.356 OTU0 4
PC.356 OTU2 7
PC.356 OTU3 3

I want an output like this (dense matrix -classic .biom table):

OTU_ID PC.354  PC.355  PC.356
OTU0   0   0   4
OTU1   6   0   0
OTU2   1   0   7
OTU3   0   0   3

How can I do this with awk/perl/sed?
I found a similar question about R package (xtabs/tidyr), but I'm not used to it.

Best Answer

In perl:

#!/usr/bin/perl

my (%hotu, %hpc)=();
while(<>){
  my($pc,$otu,$v)=split;
  $hpc{$pc}=1;
  ($hotu{$otu} or $hotu{$otu}={})->{$pc}+=$v;
}
#headers
my @apc = sort keys %hpc;
print join ("\t", 'OTU_ID', @apc) . "\n";
#values
foreach my $otu (sort keys %hotu) {
  print join ("\t", $otu, map {$_=0 unless defined; $_} @{$hotu{$otu}}{@apc}) . "\n";
}
Related Question