Lum – Replace a column and preserve spacing

awk, columns, text-processing

This is a follow-up to "Unix: replace one entire column in one file with a single value from another file".

I am trying to replace one column of a file (file1) with one specific value from another file (file2).

file1 is structured like this:

HETATM    8  P   FAD B 600      98.424  46.244  76.016  1.00 18.65
HETATM    9  O1P FAD B 600      98.634  44.801  75.700  1.00 17.69 O  
HETATM   10  O2P FAD B 600      98.010  46.640  77.387  1.00 15.59 O  
HETATM   11 H5B1 FAD B 600      96.970  48.950  72.795  1.00 -1.00 H

and I absolutely need to preserve that structure.

file2 is structured like this:

1 27, -81.883, 4.0
5 48, -67.737, 20.0
1 55, -72.923, 4.0
4 27, -62.64, 16.0

I noticed that awk is "misbehaving" and loses the format of my PDB file, meaning that instead of:

HETATM    1  PA  FAD B 600      95.987  47.188  74.293  1.00 -73.248

I get

HETATM 1 PA FAD B 600 95.887 47.194 74.387 1.00 -73.248

I have tried:

file1="./Min1_1.traj_COP1A_.27.pdb"
file2="./COP1A_report1"
value="$(awk -F, 'NR==1{print $2;exit}' "$file2")"
#option 1: replaces the column I want but messes up the format
awk -F ' ' '{$11 = v} 1' v="$value" "$file1" >TEST1
#option 2: keeps the format but only appends the value at the end of each line
#(neither separator occurs in file1, so each whole line is a single field)
awk -F ' ', '{$2 = v} 1' v="$value" "$file1" >TEST2
awk -F, '{$11 = v} 1' v="$value" "$file1" >TEST3

I guess it is because a pdb file does not have the same delimiters for all columns and awk is not dealing with that in the manner I want it to.
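For reference, the collapsed spacing has a more specific cause: as soon as an awk program assigns to a field, awk rebuilds $0 by joining all fields with OFS, which is a single space by default, and the original runs of blanks are not kept anywhere. A minimal demonstration on a shortened line from file1:

```shell
#!/bin/sh
# Shortened version of a line from file1, with runs of several spaces
line='HETATM    8  P   FAD'

# Printing $0 without touching any field keeps the original spacing
printf '%s\n' "$line" | awk '{ print }'

# Assigning to any field (even $1 = $1) rebuilds $0 joined by OFS (" ")
printf '%s\n' "$line" | awk '{ $1 = $1; print }'

# A different OFS is still one fixed separator, not the original spacing
printf '%s\n' "$line" | awk 'BEGIN { OFS = "|" } { $2 = "X"; print }'
```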

Any ideas how to "tame" awk for this problem or what other command to use?

Best Answer

Use a regex ([^[:blank:]]+, i.e. a run of non-blank characters) and replace the 11th match. gensub() is a GNU awk extension, so this needs gawk:

gawk '{print gensub(/[^[:blank:]]+/, v, 11)}' v="$value" infile

Same with sed:

sed "s/[^[:blank:]]\{1,\}/${value}/11" infile
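gensub() is not available in every awk (mawk and BSD awk lack it). Below is a sketch of the same regex-based idea using only POSIX awk (match(), substr(), RSTART, RLENGTH); the shell function name replace_nth_field is made up for illustration. It walks the line one run of non-blank characters at a time and swaps only the n-th run, so every other character, including the original spacing, is copied unchanged.

```shell
#!/bin/sh
# replace_nth_field N VALUE [FILE...]   (hypothetical helper)
# Replaces the N-th run of non-blank characters on each line with VALUE and
# leaves everything else untouched. [^ \t] is used instead of [^[:blank:]]
# because some older awks do not support POSIX character classes.
replace_nth_field() {
    n=$1 v=$2
    shift 2
    awk -v n="$n" -v v="$v" '
    {
        out = ""; rest = $0; i = 0
        while (match(rest, /[^ \t]+/)) {      # next run of non-blanks
            i++
            if (i == n) {
                # keep text before the run, insert v, keep text after it
                out = out substr(rest, 1, RSTART - 1) v substr(rest, RSTART + RLENGTH)
                rest = ""
                break
            }
            # copy everything up to and including this run verbatim
            out = out substr(rest, 1, RSTART + RLENGTH - 1)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest    # lines with fewer than n runs come out unchanged
    }' "$@"
}
```

Usage would be replace_nth_field 11 "$value" infile > outfile. As with the gensub and sed versions, a line with fewer than 11 fields is printed as it was.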

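Another way, if your file has fixed-length fields and you know the position of each field: in your sample file (standard PDB layout, spaces only), the 11th field is right-justified in columns 61 to 66, so keep the first 60 characters, put the value in, and keep everything from column 67 on. The columns stay aligned only if $value is exactly 6 characters wide:

awk '{print substr($0,1,60) v substr($0,67)}' v="$value" infile

sed -E "s/^(.{60}).{6}(.*)$/\1${value}\2/" infile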
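A slightly more defensive sketch of the fixed-width approach, assuming standard PDB records, where the temperature factor (the 11th field in the sample lines) occupies columns 61 to 66 and is conventionally written as %6.2f. It reformats the value to exactly six characters with awk's sprintf, so a value extracted with a leading blank (as the -F, command in the question produces) still fits, and it only touches ATOM and HETATM records. The function name set_bfactor is hypothetical.

```shell
#!/bin/sh
# set_bfactor VALUE [FILE...]   (hypothetical helper)
# Overwrites columns 61-66 of ATOM/HETATM lines with VALUE formatted as
# %6.2f, so every other column keeps its position. Assumes the standard
# fixed-column PDB layout; other records are printed unchanged.
set_bfactor() {
    v=$1
    shift
    awk -v v="$v" '
    substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM" {
        # pad short lines so the new value always lands in columns 61-66
        head = sprintf("%-60s", substr($0, 1, 60))
        # awk skips leading blanks when converting v to a number
        print head sprintf("%6.2f", v) substr($0, 67)
        next
    }
    { print }   # TER, END, REMARK, ... pass through
    ' "$@"
}
```

For example, set_bfactor "$value" infile > outfile. Values outside -99.99 to 999.99 need more than six characters and would shift the rest of the line. A value that uses all six characters (such as -81.88 or 100.00) ends up directly after the occupancy (1.00-81.88); that is still valid fixed-column PDB, but whitespace-based tools will no longer split those two fields.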

Related Solutions

Unix: replace one entire column in one file with a single value from another file

First extract the field you want from File 2:

value="$(awk -F, 'NR==1{print $3;exit}' file2)"

Then plug it into the replacement code for File 1:

awk '{$11 = v} 1' v="$value" file1
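The two steps can also be done in a single awk run by reading file2 first and using the NR==FNR idiom (true only while the first file is being read). A sketch with the hypothetical function name one_pass: it splits line 1 of file2 on commas itself, so file1 keeps the default whitespace splitting, and it trims blanks around the extracted value. Like {$11 = v} 1, it rebuilds every line of file1 with single spaces.

```shell
#!/bin/sh
# one_pass FILE2 FILE1   (hypothetical helper)
# Takes the 3rd comma-separated value from line 1 of FILE2 and writes it
# into column 11 of every line of FILE1 (original spacing is NOT kept).
# If FILE2 is empty, NR == FNR stays true for FILE1 and nothing is printed.
one_pass() {
    awk '
    NR == FNR {                              # only while reading FILE2
        if (FNR == 1) {
            split($0, f, ",")                # split on commas; FS stays default
            v = f[3]
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding blanks
        }
        next
    }
    { $11 = v; print }                       # FILE1: whitespace-separated fields
    ' "$1" "$2"
}
```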

Awk – Match Values Between Two Files and Create a New File

Here's one way:

$ awk -F"[, ]" 'NR==FNR{a[$1]=$1","$2; next} ($2 in a){print a[$2]","$1}' file1 file2 
1000,Brian,3044
400,Nick,4466
1010,Jason,1206

The -F"[, ]" sets the field separator to either a space or a comma. NR is the number of records (lines) read so far across all input files, while FNR is the line number within the current file. The two are equal only while the first file is being read. Therefore, NR==FNR{a[$1]=$1","$2; next} runs only on the lines of the first file: it saves the 1st and 2nd fields (with a comma in between) as values in the array a, keyed by the 1st field, and next skips to the following line. Then, when the second file is being read, if its 2nd field is a key in a, we print the value stored for it (the 1st and 2nd fields of the first file) followed by a comma and the 1st field of the second file.
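To see the difference between the two counters, here is a small sketch with two throwaway files (the function name nr_fnr_demo is made up for illustration). It prints NR, FNR and whether the NR == FNR test holds for each line.

```shell
#!/bin/sh
# nr_fnr_demo (hypothetical): show NR and FNR for two small temporary files
nr_fnr_demo() {
    tmp=$(mktemp -d)
    printf 'a\nb\n' > "$tmp/one"       # 2 lines
    printf 'c\nd\ne\n' > "$tmp/two"    # 3 lines
    awk '{ print "NR=" NR, "FNR=" FNR, (NR == FNR ? "first file" : "later file") }' \
        "$tmp/one" "$tmp/two"
    rm -r "$tmp"
}

nr_fnr_demo
# NR=1 FNR=1 first file
# NR=2 FNR=2 first file
# NR=3 FNR=1 later file
# NR=4 FNR=2 later file
# NR=5 FNR=3 later file
```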

That said, there's actually an app for this! This sort of thing is what join was made for. Sadly, since your two files are unsorted and have different delimiters, we need some tricks. If your shell supports process substitution (<(), as in bash, ksh and zsh), you can do:

$ join -t, -1 1 -2 2 <(sort -t, -k1,1 file1) <(sed 's/ /,/g' file2 | sort -t, -k2,2)
1000,Brian,3044
1010,Jason,1206
400,Nick,4466

The join -t, -1 1 -2 2 means use , as the delimiter and join on the 1st field of file1 and the 2nd field of file2. The sed just replaces spaces with commas so we have the same delimiter in both files. The sort does what it says on the bottle: it sorts its input, which join needs because it expects both inputs to be sorted on their join fields.
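If your shell lacks <(), the same join can be run on sorted temporary files. The sketch below uses hypothetical input files shaped to reproduce the output shown above (file1 holds id,name; file2 holds value id), sorts each file on exactly its join field (-k1,1 and -k2,2), and runs sort and join in the same C locale, since join expects input sorted in the collation order it uses itself. The function name join_demo is made up for illustration.

```shell
#!/bin/sh
# join_demo (hypothetical): join two small files without process substitution
join_demo() {
    tmp=$(mktemp -d)
    # Hypothetical inputs: file1 is "id,name", file2 is "value id"
    printf '1000,Brian\n400,Nick\n1010,Jason\n' > "$tmp/file1"
    printf '3044 1000\n4466 400\n1206 1010\n' > "$tmp/file2"

    # Sort each file on exactly its join field, in the C locale
    LC_ALL=C sort -t, -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
    sed 's/ /,/g' "$tmp/file2" | LC_ALL=C sort -t, -k2,2 > "$tmp/file2.sorted"

    # Join field 1 of file1 with field 2 of file2, same locale as sort
    LC_ALL=C join -t, -1 1 -2 2 "$tmp/file1.sorted" "$tmp/file2.sorted"
    rm -r "$tmp"
}

join_demo
```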
