AWK – Merge 2 Rows Based on Same Column Values

awk, perl

I have a file like below.

47196436 47723284 name1 1.77273

42672249 52856963 name2 1.06061
52856963 430695 name2 1.16667

55094959 380983 name3 1.55613

17926380 55584836 name4 1.02461
3213456 34211 name4 1.11
54321 34211 name4 1.23

The first 2 columns correspond to the primary keys in my table. I am trying to merge the rows so that all keys belonging to the same name end up on the same row.

The output I am trying to get is:

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4

I was able to get part of the way there with the command below.

awk '{ x[$3]=x[$3] " " $2; } 
END { 
   for (k in x) print k,x[k] >"OUTPUT1";  
}' ccc.txt

However, it is not giving me the output correctly. I need some assistance in further modifying the above command.

Best Answer

A perl solution:

$ perl -ane '$h{$F[2]} .= " ".$F[0]." ".$F[1];
    END {
        for $k (sort keys %h) {
            print $_," " for grep {!$seen{$_}++} split(" ",$h{$k});
            print "$k\n";
        }
    }' file

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4
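
Since the question asks about awk, the same idea can be sketched there as well. The attempt in the question only appends column 2 and prints the name first, which is why its output is incomplete. The following is a minimal sketch and not part of the original answer: the function name merge_keys and the array names are made up for illustration. It prints names in first-seen order instead of sorting them, and it drops duplicate keys per name.

```shell
# merge_keys is a hypothetical helper name, not from the original answer.
merge_keys() {
    awk '
    NF >= 3 {                                  # skip blank or short lines
        name = $3
        if (!(name in row)) order[++n] = name  # remember names in first-seen order
        for (i = 1; i <= 2; i++)               # columns 1 and 2 hold the keys
            if (!seen[name, $i]++)             # add each key only once per name
                row[name] = row[name] $i " "
    }
    END {
        for (j = 1; j <= n; j++)
            print row[order[j]] order[j]       # keys first, then the name
    }' "$@"
}

# Sample input from the question, fed on standard input
printf '%s\n' \
    '47196436 47723284 name1 1.77273' \
    '' \
    '42672249 52856963 name2 1.06061' \
    '52856963 430695 name2 1.16667' \
    '' \
    '55094959 380983 name3 1.55613' \
    '' \
    '17926380 55584836 name4 1.02461' \
    '3213456 34211 name4 1.11' \
    '54321 34211 name4 1.23' \
    | merge_keys
```

With a file argument instead of standard input, the call would be merge_keys ccc.txt.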

Related Solutions

Bash – Merge fields in a file

An awk approach:

awk '
  $1!="exon" {                       # If the first field is not equal to "exon"
    if(previous)print previous       # If there is a previous line then print it
    print                            # Print the current line
    previous=start=end=exon_seq=""   # Set all variable to an empty string
    next                             # Move on to the next line in the input file
  }
  {
    if(exon_seq) {                   # if we are in a sequence of lines with "exon" in field 1
      if(start<=$2 && end>=$3)       # if the start value (field 2) of the previous line
                                     # is less than or equal to field 2 of the current line
                                     # and the end value of the previous line is greater than
                                     # or equal to field 3 of the current line
        next                         # then do nothing and read the next line
      else                           # if there is no overlap,
        print previous               # then print the previous line
    }
    else {                           # if we are not already in a sequence of
                                     # "exon" lines, then this is the first one
      exon_seq=1                     # so exon_seq should become 1
    }
    previous=$0; start=$2; end=$3    # `start` becomes field 2, `end` becomes field 3 and
                                     # `previous` becomes the current record (line)
  }
  END{                               # After all lines are processed
    if(previous) print previous      # If there still is a previous line, then print it
  }
' file
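
To see the effect of this logic, here is a small sketch that runs a condensed copy of the same program on made-up input. The function name merge_exons and the sample rows are hypothetical, and the column layout (feature type, start, end) is an assumption inferred from the comments above.

```shell
# merge_exons is a hypothetical wrapper around a condensed copy of the program.
merge_exons() {
    awk '
    $1 != "exon" {                        # non-exon line: flush and print it
        if (previous) print previous
        print
        previous = start = end = exon_seq = ""
        next
    }
    {
        if (exon_seq) {
            if (start <= $2 && end >= $3) next   # contained in the kept exon: drop it
            print previous                       # not contained: emit the kept exon
        }
        exon_seq = 1
        previous = $0; start = $2; end = $3
    }
    END { if (previous) print previous }' "$@"
}

# Made-up sample rows (assumed layout: type start end)
printf '%s\n' \
    'gene 100 900' \
    'exon 100 300' \
    'exon 150 250' \
    'exon 400 500' \
    'gene 1000 2000' \
    'exon 1000 1200' \
    | merge_exons
```

On this input the line "exon 150 250" is dropped because it lies inside "exon 100 300"; the other five lines are printed unchanged.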

UNIX paste columns and insert zeros for all missing values

If column order is important, i.e. numbers from the same file should stay in the same column, you need to add padding while reading the different files. Here is one way that works with GNU awk (it relies on the gawk-specific ARGIND variable and ENDFILE pattern):

merge.awk

# Set k to be a shorthand for the key
{ k = $1 SUBSEP $2 }

# First element with this key, add zeros to align it with other rows
!(k in h) {
  for(i=1; i<=ARGIND-1; i++)
    h[k] = h[k] OFS 0 
}

# Remember the data element
{ h[k] = h[k] OFS $3 }

# Before moving to the next file, ensure that all rows are aligned
ENDFILE {
  for(k in h) {
    if(split(h[k], a) < ARGIND)
      h[k] = h[k] OFS 0
  }
}

# Print out the collected data
END {
  for(k in h) {
    split(k, a, SUBSEP)
    print a[1], a[2], h[k]
  }
}
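
ARGIND and ENDFILE are GNU awk extensions, so the script above will not run under other awk implementations. As a rough sketch of a portable alternative (not the original author's method), the values can be stored per key and per file number and the zeros filled in only when printing. The function name merge_cols and the array names are made up for illustration, and the file counter assumes that no input file is empty.

```shell
# merge_cols is a hypothetical helper name; it avoids ARGIND and ENDFILE.
merge_cols() {
    awk '
    FNR == 1 { nfile++ }                 # first line of a file: a new column starts
    {
        k = $1 SUBSEP $2                 # same two-field key as in merge.awk
        keys[k] = 1                      # remember every key seen
        val[k, nfile] = $3               # value of this key in the current file
    }
    END {
        for (k in keys) {
            split(k, a, SUBSEP)
            line = a[1] OFS a[2]
            for (i = 1; i <= nfile; i++) # one column per file, 0 where missing
                line = line OFS (((k, i) in val) ? val[k, i] : 0)
            print line
        }
    }' "$@"
}

# Small demonstration with the first two test files, written to a temp directory
dir=$(mktemp -d)
printf '%s\n' 'xyz desc1 21' 'uvw desc2 22' 'pqr desc3 23' > "$dir/f1"
printf '%s\n' 'xyz desc1 56' 'uvw desc2 57' > "$dir/f2"
merge_cols "$dir/f1" "$dir/f2" | sort
rm -r "$dir"
```

The row order of a for (k in keys) loop is unspecified, which is why the demonstration pipes the result through sort.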

Here are some test files: f1, f2, f3 and f4:

$ tail -n+1 f[1-4]
==> f1 <==
xyz desc1 21
uvw desc2 22
pqr desc3 23

==> f2 <==
xyz desc1 56
uvw desc2 57

==> f3 <==
xyz desc1 87
uvw desc2 88

==> f4 <==
xyz desc1 11
uvw desc2 12
pqr desc3 13
stw desc1 14
arg desc2 15

Test 1

awk -f merge.awk f[1-4] | column -t

Output:

pqr  desc3  23  0   0   13
uvw  desc2  22  57  88  12
stw  desc1  0   0   0   14
arg  desc2  0   0   0   15
xyz  desc1  21  56  87  11

Test 2

awk -f merge.awk f2 f3 f4 f1 | column -t