Lum – Replace a column and preserve spacing

awk, columns, text-processing

This is a follow-up to "unix: replace one entire column in one file with a single value from another file".

I am trying to replace one column of a file (file1) with one specific value from another file (file2).

file1 is structured like this:

HETATM    8  P   FAD B 600      98.424  46.244  76.016  1.00 18.65
HETATM    9  O1P FAD B 600      98.634  44.801  75.700  1.00 17.69 O  
HETATM   10  O2P FAD B 600      98.010  46.640  77.387  1.00 15.59 O  
HETATM   11 H5B1 FAD B 600      96.970  48.950  72.795  1.00 -1.00 H  

and I absolutely need to preserve that structure.

file2 is structured like this:

1 27, -81.883, 4.0
5 48, -67.737, 20.0
1 55, -72.923, 4.0
4 27, -62.64, 16.0

I noticed that awk is "misbehaving" and loses the format of my pdb file, meaning that instead of:

HETATM    1  PA  FAD B 600      95.987  47.188  74.293  1.00 -73.248

I get

HETATM 1 PA FAD B 600 95.887 47.194 74.387 1.00 -73.248 

I have tried:

file1="./Min1_1.traj_COP1A_.27.pdb"
file2="./COP1A_report1"
value="$(awk -F, 'NR==1{print $2;exit}' $file2)"
#option 1: replaces the column I want but messes up the format
awk -F ' ' '{$11 = v} 1' v="$value" $file1 >TEST1
#option 2: keeps the format but adds the value at the end only
awk -F ' ', '{$2 = v} 1' v="$value" $file1 >TEST2
awk -F, '{$11 = v} 1' v="$value" $file1 >TEST3

I guess it is because a pdb file does not have the same delimiters for all columns and awk is not dealing with that in the manner I want it to.

Any ideas how to "tame" awk for this problem or what other command to use?

Best Answer

Use a regex ([^[:blank:]]+, i.e. a run of non-blank characters) and replace the 11th match. With GNU awk (gensub() is a gawk extension):

awk '{print gensub (/[^[:blank:]]+/, v, 11)}' v="$value" infile

Same with sed:

sed "s/[^[:blank:]]\{1,\}/${value}/11" infile
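For awk implementations that lack gensub(), the same "replace the Nth non-blank run" idea can be sketched with match() and substr(). This is only an illustration: the function name replace_nth and the variables line, value, collapsed and result are hypothetical, and the sample line is copied from the question. The first command is included for contrast; it shows that assigning to a field makes awk rebuild the record with single-space separators (OFS), which is what loses the layout.

```shell
# Sample line from the question and a hypothetical replacement value.
line='HETATM    8  P   FAD B 600      98.424  46.244  76.016  1.00 18.65'
value='-81.883'

# For contrast: assigning to $11 makes awk rebuild $0 using OFS (one space),
# so the original column spacing is lost.
collapsed=$(printf '%s\n' "$line" | awk -v v="$value" '{$11 = v} 1')

# Replace the n-th run of non-blank characters, leaving every blank untouched.
# Lines with fewer than n runs are printed unchanged.
replace_nth() {
    awk -v n="$1" -v v="$2" '{
        rest = $0; out = ""
        for (i = 1; i <= n; i++) {
            # Locate the next non-blank run in the part not yet consumed.
            if (!match(rest, /[^ \t]+/)) break
            if (i == n)
                out = out substr(rest, 1, RSTART - 1) v            # blanks + new value
            else
                out = out substr(rest, 1, RSTART + RLENGTH - 1)    # blanks + old run
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

result=$(printf '%s\n' "$line" | replace_nth 11 "$value")
printf '%s\n%s\n' "$collapsed" "$result"
```

Because only the matched run is swapped out, everything to its left keeps its original position; text to its right shifts if the new value has a different length than the old one.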

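Another way, if your file has fixed-width fields and you know the position of each field, is to cut each line at known character offsets and splice the new value in. The command below overwrites the 4 characters in columns 57 to 60 of each line; adjust the offsets to the columns your target field actually occupies (in the sample above, assuming only spaces, columns 57 to 60 hold 1.00 and the 11th field sits in columns 62 to 66):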

awk '{print substr($0,1,56) v substr($0,61)}' v="$value" infile

or

sed -E "s/^(.{56}).{4}(.*)$/\1${value}\2/" infile
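If the columns to the right of the replaced field must also stay put, the new value has to be padded or rounded to the width of the slot. The sketch below assumes the layout of the sample lines in the question, where the last numeric field occupies columns 61 to 66, and assumes that rounding to two decimals (printf's %6.2f) is acceptable. Both are assumptions for illustration; the variable names line, value and padded are hypothetical.

```shell
# Sample line from the question; the value keeps the leading blank it has
# when extracted from file2 with -F, (hence the quoting everywhere).
line='HETATM    9  O1P FAD B 600      98.634  44.801  75.700  1.00 17.69 O  '
value=' -81.883'

padded=$(printf '%s\n' "$line" | awk -v v="$value" '{
    # Keep columns 1-60, write the value right-aligned in exactly 6 columns
    # (assumed slot: columns 61-66), then append the rest of the line.
    # v + 0 forces numeric conversion, which ignores the leading blank.
    printf "%s%6.2f%s\n", substr($0, 1, 60), v + 0, substr($0, 67)
}')

printf '%s\n' "$padded"
```

The line length is unchanged, so the trailing element column stays where it was. Note that a 6-character value fills the slot completely and therefore touches the preceding field, which is fine for readers that parse by column position but not for tools that split on whitespace.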