Linux – How to combine the information of each pair of rows in one row

linux, text-processing

I have data like this (the real data has over 50,000 digits and 8000 rows):

input:

1 11122
1 21121
2 22221
2 11122
3 21121
3 11122

I want to put the value of each second row beside the value of the first row with the same name. Also, there should be two spaces as the delimiter within each pair of values, and one tab as the delimiter between different pairs of values. The output should look like:

output:

1   1  2    1  1    1  1    2  2    2  1
2   2  1    2  1    2  1    2  2    1  2
3   2  1    1  1    1  1    2  2    1  2

Any suggestions?

Best Answer

I'd use perl, and run it as a one-liner like this:

perl -wne 'sub parseline { ($id,$v) = split; return split //,$v };
    @a = parseline();
    print "$id\t";
    $_ = <>;
    @b = parseline();
    for ($i=0; $i<@a; $i++) {
      print "$a[$i]  $b[$i]\t"
    };
    print "\n"' < input > output

Explanation:

  • perl -wne runs the rest of the command for each line of input
  • sub parseline { .... } parses a line: it sets the first number on the line as $id, and returns the rest as an array of characters.
  • @a = parseline() stores the first line's characters in the array @a
  • next, we print $id, followed by a TAB (\t)
  • $_ = <>; @b = parseline(); reads the next (even) line and puts its data in the array @b
  • for ($i=0; $i<@a; $i++) { print "$a[$i]  $b[$i]\t" } for each element of the array @a, prints that element, two spaces, the corresponding element from array @b, and then a tab
  • print "\n" prints a newline at the end
  • due to the -n switch passed to perl at the start, the whole process restarts with line 3, then 5, then 7, etc.
  • < input > output indicates which file we read our input from, and which file we write our output to.

Note: the code prints an extra tab at the end of each line. Removing it is left as an exercise for the reader, to discourage crowdsourced homework assignments and keep the code a little simpler. The code also assumes that the lines to pair always come in twos, one right after the other (as in the example).

As it processes the input file line by line, it scales linearly to many thousands of lines...
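As an aside, the same two-lines-at-a-time pairing can be sketched in POSIX awk (again an alternative, not the approach from the answer above); it also happens to avoid the trailing tab:

```shell
# Odd lines: remember the id and the digit string; even lines: emit the
# id, then each pair "a  b" separated by tabs.
awk 'NR % 2 { id = $1; v1 = $2; next }
     { line = id
       for (i = 1; i <= length(v1); i++)
         line = line "\t" substr(v1, i, 1) "  " substr($2, i, 1)
       print line }' input > output
```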
