I have data like this (the real data has over 50,000 digits and 8000 rows):
input:
1 11122
1 21121
2 22221
2 11122
3 21121
3 11122
I want to put the value of each second row beside the value of the first row with the same name. Also, there should be two spaces as the delimiter between the two values of each pair, and one tab as the delimiter between different pairs of values. The output should look like:
output:
1	1  2	1  1	1  1	2  2	2  1
2	2  1	2  1	2  1	2  2	1  2
3	2  1	1  1	1  1	2  2	1  2
Any suggestions?
Best Answer
I'd use perl, and run it as a one-liner like this:
Explanation:
- `perl -wne` runs the rest of the command for each line of input
- `sub parseline { .... }` parses the current line, sets the first number in the line as `$id`, and returns the rest as an array of characters
- `@a=parseline()` stores the characters of the first line in array `@a`
- `$id` is then printed, followed by TAB (`\t`)
- `$_=<>; @b=parseline();` reads the next (even) line and puts its data in array `@b`
- `for ($i=0; $i<@a; $i++) { print "$a[$i]  $b[$i]\t" }` prints, for each element of array `@a`, that element, two spaces, the corresponding element from array `@b`, and then a tab
- `print "\n"` prints a newline at the end
- because of the `-n` parameter to `perl` at the start, the whole process restarts with line 3, then 5, then 7, etc.
- `< input > output` indicates from which file we read our input, and to which file we write the output

Note: the code will print an extra tab at the end of each line. Removing it is left as an exercise for the reader, to prevent crowdsourced homework assignments and to keep the code a little simpler. Also, the code assumes that the lines to pair always come in twos, one right after the other (as in the example).
As it processes the input file line by line, it scales linearly to many thousands of lines.