Repeat a line, splitting one field

awk, perl, text processing

I have a tab-delimited file in which every line looks like this:

K00001;K00004;K00008    0   0   34  0   0   0   0   0   0   0   0   0   0   0   0   0   36  0   0   52  0   0   0   6   0

I would like to have one row per code, each followed by the same sequence of numbers, like this:

K00001 0    0   34  0   0   0   0   0   0   0   0   0   0   0   0   0   36  0   0   52  0   0   0   6   0    
K00004 0    0   34  0   0   0   0   0   0   0   0   0   0   0   0   0   36  0   0   52  0   0   0   6   0    
K00008 0    0   34  0   0   0   0   0   0   0   0   0   0   0   0   0   36  0   0   52  0   0   0   6   0

Best Answer

You can use awk to split the first column:

~$ awk '{n=split($1,a,";"); $1=""; for (i=1;i<=n;i++){print a[i],$0}}' myfile
K00001  0 0 34 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 52 0 0 0 6 0
K00004  0 0 34 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 52 0 0 0 6 0
K00008  0 0 34 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 52 0 0 0 6 0

The first column is split on ; (split($1,a,";")) and then emptied, so that the rest of the line ($0) can be printed after each item in the array. split returns the number of items, and the loop counts from 1 to n because for (i in a) does not guarantee any particular order.

As suggested in the comments, the edited question shows that tabs are used as the separator. To use tab as the output field separator, set OFS="\t", for instance in the BEGIN block of awk. Besides, $1="" leaves an empty first field behind, which is why the output above has two spaces after the code. So instead of printing a[i] followed by $0, it is better to set $1 to a[i] and then print $0:

~$ awk 'BEGIN{OFS="\t"}{n=split($1,a,";"); for (i=1;i<=n;i++){$1=a[i];print}}' myfile
K00001  0       0       34      0       0       0       0       0       0       0       0       0       0       0       0       0       36      0       0       52      0       0       0       6       0
K00004  0       0       34      0       0       0       0       0       0       0       0       0       0       0       0       0       36      0       0       52      0       0       0       6       0
K00008  0       0       34      0       0       0       0       0       0       0       0       0       0       0       0       0       36      0       0       52      0       0       0       6       0
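
For a check that can be reproduced without the original file, here is a minimal sketch of the same approach wrapped in a shell function. The function name split_codes and the shortened sample row are made up for illustration. It also sets FS to tab, on the assumption that the input is strictly tab-delimited, so that a field containing spaces would not be split further.

```shell
# Hypothetical helper: reads tab-delimited rows on stdin and prints
# one row per ";"-separated code found in the first field.
split_codes() {
    awk 'BEGIN { FS = OFS = "\t" }      # assumption: input is strictly tab-delimited
         {
             n = split($1, codes, ";")  # n is the number of codes in field 1
             for (i = 1; i <= n; i++) { # indexed loop keeps the codes in input order
                 $1 = codes[i]          # assigning $1 rebuilds $0 using OFS
                 print
             }
         }'
}

# Sample row modelled on the question, shortened to three numeric columns.
printf 'K00001;K00004;K00008\t0\t34\t6\n' | split_codes
```

The sample call prints three tab-delimited rows, one each for K00001, K00004 and K00008, in that order. A row whose first field contains no ; is printed once, unchanged.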

Text Processing – How to Replace Missing Value Blank Space with Zero

Explanation

Setting FS and OFS to tab ensures that the output is correctly delimited. The for loop visits every field and sets it to zero if it is empty. The 1 at the end is shorthand for { print $0 }.
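
The command that this explanation describes is not reproduced here. The following is only a sketch consistent with the description, assuming that "empty" means a zero-length field between two tabs. The function name fill_empty_with_zero and the sample input are hypothetical.

```shell
# Hypothetical helper: replaces every empty tab-delimited field with 0.
fill_empty_with_zero() {
    awk 'BEGIN { FS = OFS = "\t" }      # tab for both input and output
         {
             for (i = 1; i <= NF; i++)  # visit every field of the record
                 if ($i == "") $i = 0   # assumption: empty means zero-length
         }
         1'                             # a bare 1 is shorthand for { print $0 }
}

# Two sample rows with empty fields in different positions.
printf 'a\t\t3\n\tb\t\n' | fill_empty_with_zero
```

Because FS is a single tab, consecutive tabs produce empty fields instead of being collapsed, which is what lets the loop find them.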

Combine columns from several files into one

awk 'FNR==1 {print $2}' file*

This prints the second column ($2) of the first line (FNR==1) for every file whose filename starts with file.
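
To see the per-file behaviour of FNR, here is a small self-contained sketch. The scratch directory, file names and file contents are made up for illustration.

```shell
# Create two small sample files in a scratch directory (names are made up).
dir=$(mktemp -d)
printf 'id alpha\nx y\n' > "$dir/file1"
printf 'id beta\nx y\n'  > "$dir/file2"

# FNR restarts at 1 for each input file, so this acts on the first line
# of every file; NR, by contrast, keeps counting across files.
first_fields=$(awk 'FNR == 1 { print $2 }' "$dir"/file*)
printf '%s\n' "$first_fields"

# Clean up the scratch directory.
rm -r "$dir"
```

This prints alpha and beta, one per line: the second field of the first line of each file, in the order the shell expands the glob.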

An alternative is to print the second field of the first line and then immediately skip to the next file (nextfile is a mawk and GNU awk-specific keyword):

awk '{print $2; nextfile}' file*
