Ubuntu – Text manipulation with sed

command line, sed, text processing

Currently, I have multiple text files with contents looking like this (with many lines):

565 0 10 12 23 18 17 25
564 1 7 12 13 16 18 40 29 15

I wish to change each line to have the following format:

0 565:10:1 565:12:1 565:23:1 565:18:1 565:17:1 565:25:1
1 564:7:1 564:12:1 564:13:1 564:16:1 564:18:1 564:40:1 564:29:1 564:15:1

Is there any way of doing the above using sed? Or do I need to resort to Python?

Best Answer

You could do it with sed, yes, but other tools are simpler. For example:

$ awk '{
        printf "%s ", $2; 
        for(i=3;i<=NF;i++){
            printf "%s:%s:1 ",$1,$(i) 
        }
        print ""
       }' file 
0 565:10:1 565:12:1 565:23:1 565:18:1 565:17:1 565:25:1 
1 564:7:1 564:12:1 564:13:1 564:16:1 564:18:1 564:40:1 564:29:1 564:15:1 

Explanation

awk splits each line of input on whitespace (by default) and stores the fields as $1, $2, and so on up to $NF. So:

  • printf "%s ", $2; will print the 2nd field and a trailing space.
  • for(i=3;i<=NF;i++){ printf "%s:%s:1 ",$1,$(i) } : will iterate over fields 3 to the last field (NF is the number of fields) and for each of them it will print the 1st field, a :, then the current field and a :1.
  • print "" : this just prints a final newline.

Or Perl:

$ perl -ane 'print "$F[1] "; print "$F[0]:$_:1 " for @F[2..$#F]; print "\n"' file 
0 565:10:1 565:12:1 565:23:1 565:18:1 565:17:1 565:25:1 
1 564:7:1 564:12:1 564:13:1 564:16:1 564:18:1 564:40:1 564:29:1 564:15:1 

Explanation

The -a switch makes perl behave like awk and split each input line on whitespace (-n loops over the input line by line, and -e supplies the script). The fields are stored in the array @F, so the 1st field is $F[0], the 2nd is $F[1], and so on. So:

  • print "$F[1] " : print the 2nd field.
  • print "$F[0]:$_:1 " for @F[2..$#F]; : iterate over fields 3 to the last field ($#F is the index of the last element of the array @F, so @F[2..$#F] is an array slice running from the 3rd element to the end of the array) and, for each of them, print the 1st field, a :, then the current field and a :1.
  • print "\n" : this just prints a final newline.
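
Since the question mentions Python as a fallback, here is a minimal sketch of the same logic for comparison. It is not part of the original answer. The function name convert_line and the choice to pass lines with fewer than two fields through unchanged are assumptions made for illustration.

```python
def convert_line(line: str) -> str:
    """Turn 'ID LABEL v1 v2 ...' into 'LABEL ID:v1:1 ID:v2:1 ...'."""
    fields = line.split()  # split on whitespace, like awk's default
    if len(fields) < 2:
        # Assumption: blank or too-short lines are passed through unchanged.
        return line.rstrip("\n")
    first, label, *values = fields
    # Joining with single spaces avoids a trailing space at the end.
    return " ".join([label] + [f"{first}:{v}:1" for v in values])


# The two sample lines from the question.
sample = [
    "565 0 10 12 23 18 17 25",
    "564 1 7 12 13 16 18 40 29 15",
]
for line in sample:
    print(convert_line(line))
```

Unlike the awk and Perl one-liners above, this version does not leave a trailing space at the end of each line, because the pieces are joined rather than printed one at a time.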