Side-by-side comparison of more than two files containing numerical values

Tags: awk, diff, text processing

I have three files, each containing a sorted sequence of numbers, one per line:

file1

1
2
3

file2

1
3
4

file3

1
5

I want to "align" these three files side by side, like the following:

file1  file2  file3
1      1      1
2      
3      3
       4
              5

I've tried sdiff, but it only works with two files.

Best Answer

You could process each file and print a line containing a placeholder character, e.g. X, for every number missing from the sequence 1..max (where max is the last number in that file), paste the results, then replace that character with a space:

paste \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file1) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file2) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file3) \
| tr X ' '

If a certain value is missing from all files, you'll get empty lines in your output (actually they're not empty; they contain only blanks).
To remove them, replace tr X ' ' with sed '/[[:digit:]]/!d;s/X/ /g'. Also, if you need a header, you can always run something like this first:

 printf '\t%s' file1 file2 file3 | cut -c2-
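Putting the pieces above together, a minimal sketch might look like the following. The function names pad_missing and align_numbers are hypothetical (they are not part of the answer), and temporary files stand in for the <(...) process substitution so that the sketch also runs in shells without that feature. It assumes every file holds positive integers in ascending order.

```shell
# pad_missing (hypothetical helper): wraps the awk program from the answer.
# Prints the numbers of one file, with an "X" line for every integer that is
# skipped between 1 and the last value in that file.
pad_missing() {
  awk 'BEGIN { n = 1 }
       { while (n < $1) { print "X"; n++ }   # fill the gap before this value
         n = $1 + 1                          # next value we expect to see
         print }' "$1"
}

# align_numbers (hypothetical wrapper): header line first, then aligned rows.
align_numbers() {
  tmpdir=$(mktemp -d) || return 1
  printf '\t%s' "$@" | cut -c2-              # tab-separated header of file names
  i=0
  cols=""
  for f in "$@"; do
    i=$((i + 1))
    pad_missing "$f" > "$tmpdir/$i"          # one padded column per input file
    cols="$cols $i"
  done
  # Paste the columns, drop rows that contain no digit, blank out the X marks.
  ( cd "$tmpdir" && paste $cols ) | sed '/[[:digit:]]/!d;s/X/ /g'
  rm -rf "$tmpdir"
}

# Usage: align_numbers file1 file2 file3
```

The columns are separated by tabs, as with plain paste, so the output lines up in a terminal as long as the numbers are shorter than a tab stop.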

Related Solutions

Shell – Compare two .csv files

File1

123123,,
222333,,

File2

111222,Jones,Sally
111333,Johnson,Roger
123123,Doe,John
444555,Richardson,George
222333,Smith,Jane
223456,Alexander,Philip

You could try using the join command to list the File2 records whose first field does not appear in File1, like so:

# join -t, -v 2 <(sort file1) <(sort file2)
111222,Jones,Sally
111333,Johnson,Roger
223456,Alexander,Philip
444555,Richardson,George

More information about the command can be found in its manual page (man join):

join [OPTION]... FILE1 FILE2

-t CHAR
    use CHAR as input and output field separator 
-v FILENUM
    like -a FILENUM, but suppress joined output lines
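
Because join needs sorted input, its output comes back in sorted order rather than in File2's original order. If that order matters, one possible alternative is the common two-file awk idiom sketched below; this is awk rather than join, and the function name not_in_first is hypothetical. It assumes the first file is not empty, since the NR == FNR test relies on that.

```shell
# not_in_first (hypothetical helper): print the lines of the second CSV file
# whose first field never occurs as a first field in the first CSV file.
not_in_first() {
  awk -F, '
    NR == FNR { seen[$1]; next }   # first file: remember every ID
    !($1 in seen)                  # second file: print lines with an unseen ID
  ' "$1" "$2"
}

# Usage: not_in_first File1 File2
```

With the sample data above, this would list the same four records as the join command, but in the order in which they appear in File2.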

Shell – How to merge two files in the same row

The join utility is intended for exactly this kind of problem: it joins two files based on one of their fields, by default the first one. Both files must be sorted on that field first, so

join <(sort file2) <(sort file1) | column -t

produces

Alice  Wednesday  616.556.4458
Bob    Tuesday    313.123.4567
Carol  Monday     248.344.5576
Dave   Thursday   734.838.9800
Mary   Saturday   313.449.1390
Ted    Sunday     248.496.2204

This is sorted by name rather than by weekday; you'd need some post-processing to sort by weekday if necessary.
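
One way such post-processing could look is sketched below. It assumes that the weekday is the second space-separated field of join's output and that the week starts on Monday; the function name sort_by_weekday is hypothetical. Each line is prefixed with the weekday's position in the week, sorted numerically on that prefix, and the prefix is then cut off again.

```shell
# sort_by_weekday (hypothetical helper): reorder "name weekday phone" lines
# read from standard input by weekday, Monday first.
sort_by_weekday() {
  awk 'BEGIN {
         n = split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday", day, " ")
         for (i = 1; i <= n; i++) rank[day[i]] = i   # Monday -> 1 ... Sunday -> 7
       }
       {
         r = ($2 in rank) ? rank[$2] : n + 1         # unknown weekdays go last
         print r, $0
       }' |
  sort -k1,1n |      # numeric sort on the rank prefix
  cut -d" " -f2-     # drop the rank prefix again
}

# Usage: join <(sort file2) <(sort file1) | sort_by_weekday | column -t
```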
