I have a file with 250k rows and 10 columns like:
img1 aa bb cc ...
img2 aa yy dd ...
img3 uu bb ee ...
img4 NA bb tt ...
I want a script that will convert this file to:
img1 1 1 1 ...
img2 1 2 2 ...
img3 2 1 3 ...
img4 0 1 4 ...
A unique value from each column after first one should be replaced with a unique identifier starting from 0, where 0 is reserved for the string "NA".
Also, for each column, I want to generate a file that contains the mapping. For example, the file for 2nd column should be:
NA 0
aa 1
uu 2
Can anyone suggest an elegant solution for this? Any help would be greatly appreciated.
Best Answer
Here's a very simple approach. Works fine for me, using gawk 3.1.7.