Concatenating columns horizontally. Printing only rows that intersect on 1st column

awkzsh

Say I have two different commands that output contents by columns, e.g.:

$ command_1

7049857 abc fdg hsi
5409858 xxx fyy hsi
540958  abc zzz hsi
54230956  rbc sss hsi

$ command_2

7049857 0 fdg free
5409858 0 fyy free
540958  2 zzz free

I would like to silently grab columns with indices x,y,z from the output of command_1 and the columns with indices a,b,c from the output command_2 and print a new output: x,y,z,a,b,c.

Example:

Output the column column 1 and 2 from commmand_2: and the last column from command_1:

$ new_command

7049857 0  hsi
5409858 0  hsi
540958  2  hsi

What I have so far:

I am relatively new to awk, but I know I can grab the corresponding columns from each of these commands with awk:

command_1 | awk '{print $x " " $y " " $z}'
command_2 | awk '{print $a " " $b " " $c}'

The above concatenates the columns vertically but I need to concatenate them horizontally.

Printing only those rows that intersect on the first column:

Say that it is possible that command_2 has more or less rows than command_1. However, the first column of both commands holds items belonging to the same category (a multiple-digit ID, as in the example above)

With this, assuming that the first column of both commands are sorted equally and that new or missing entries can only happen at the end, how could we make sure that we only print those rows for which we have entries in command_1 and command_2 (i.e. the intersection of both commands)? (the example above shows this)

Best Answer

Try something like this:

join <(command1) <(command2) | cut -d ' ' -f 1,5,4

The join command takes files, not commands, but with a shell such as bash you can use the shown construct to turn the command output into a virtual file.

Instead of cut you can also use awk, if that's your thing.

If you find yourself doing this a lot, consider using a relational database engine.

Related Question