Calculate sum of squares using shell script in perl/awk

awk perl

I have 2 files as below.

file1

0.34
0.27
0.32

file2

0.15
0.21
0.15

I would like to calculate the mean of the squared differences between corresponding lines of the two files. For example:

[(0.34 - 0.15)^2 + (0.27 - 0.21)^2 + (0.32 - 0.15)^2 ] / 3

Here 3 is the total number of lines in each file. Both files always have the same number of lines.
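As a sanity check, the formula can be evaluated directly for the sample values. The snippet below is only an illustration with the six numbers hard-coded; it is not a general solution, and the variable name `expected` is made up for this sketch.

```shell
# Evaluate the example formula for the sample values (hard-coded for illustration).
expected=$(awk 'BEGIN { print ((0.34 - 0.15)^2 + (0.27 - 0.21)^2 + (0.32 - 0.15)^2) / 3 }')
echo "$expected"
```

With awk's default output format this prints 0.0228667, which is the value any general solution should reproduce for these two files.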

I have come up with the bash script below, which works fine, but I would like to know whether there is an easier way.

#!/bin/bash
sum=0.0
# Read file1 on fd 3 and file2 on fd 4, one line from each per iteration.
while true; do
  read -r lineA <&3
  read -r lineB <&4
  # Stop at the first empty line or at the end of either file.
  if [ -z "$lineA" ] || [ -z "$lineB" ]; then
    break
  fi
  diff=$(bc <<< "scale=5; $lineA - $lineB")
  square=$(bc <<< "scale=5; $diff*$diff")
  sum=$(bc <<< "scale=5; $sum+$square")
done 3<file1 4<file2
filelen=$(wc -l < file1)
final=$(bc <<< "scale=5; $sum/$filelen")
echo "$final"

Is there a simpler way in awk or perl?

EDIT

My real input files have 2 million rows and contain numbers in scientific notation, such as:

3.59564e-185

Both my script and the suggested answers failed on numbers in scientific notation. However, my script from the question worked once I rewrote those numbers in 10^ notation.

I converted my input files as follows.

sed -e 's/[eE]+*/\*10\^/' file1 > file1_converted
sed -e 's/[eE]+*/\*10\^/' file2 > file2_converted

With the converted files, the two suggested answers failed with the message Nan. My script seemed to work, but it takes a long time to run on 2 million rows.

Is there any efficient way to make it work?

Best Answer

One way to do it is with paste, since your files have the same number of lines:

paste file1 file2 | awk '{s += ($1-$2)^2}; END{print (s+0)/NR}'
0.0228667
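For values as small as 3.59564e-185, a likely obstacle is floating-point underflow: awk does its arithmetic in double precision, and the square of a difference of that size (around 1e-369) is below the smallest representable double (roughly 4.9e-324), so it becomes 0. A minimal sketch of one workaround follows, assuming all differences are of similarly tiny magnitude: scale each difference up before squaring and correct the exponent when printing. The scaling exponent `k=150`, the temporary directory, and the two-line sample inputs are assumptions made for illustration. The inputs are in the original e notation, not the sed-converted form, because awk does not evaluate `*10^` inside input data.

```shell
# Hypothetical two-line inputs in the original e-notation (not the sed-converted form).
tmp=$(mktemp -d)
printf '3.59564e-185\n1.0e-185\n' > "$tmp/file1"
printf '1.59564e-185\n4.0e-185\n' > "$tmp/file2"

# k is an assumed scaling exponent: differences are multiplied by 10^k before
# squaring, and 2*k is subtracted from the printed exponent afterwards.
result=$(paste "$tmp/file1" "$tmp/file2" | awk -v k=150 '
  { d = ($1 - $2) * 10^k; s += d * d }
  END {
    if (NR == 0 || s == 0) { print 0; exit }
    m = s / NR                        # scaled mean of squared differences
    e = int(log(m) / log(10))         # approximate base-10 exponent of m
    if (10^e > m) e--                 # int() truncates toward zero; make it a floor
    printf "%.5fe%d\n", m / 10^e, e - 2 * k
  }')
echo "$result"
rm -rf "$tmp"
```

For these hypothetical inputs the differences are 2e-185 and -3e-185, so the sketch prints 6.50000e-370. The result is printed as text because the true value cannot be held in a double. If the data mixes tiny and ordinary magnitudes, the scaling is unnecessary, since the tiny terms do not affect the sum at default precision. It is still a single pass through one awk process, so it should remain fast on 2 million rows.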