I have 2 files as below.
file1
0.34
0.27
0.32
file2
0.15
0.21
0.15
Now I would like to calculate the mean of the squared differences between the corresponding lines of the two files. For example:
[(0.34 - 0.15)^2 + (0.27 - 0.21)^2 + (0.32 - 0.15)^2] / 3
where 3 is the total number of lines in the file. Both files will always have the same number of lines.
I have come up with the bash script below, which works fine, but I want to know if there is an easier way.
#!/bin/bash
sum=0.0
while true; do
    # Read one line from each file via dedicated file descriptors
    read -r lineA <&3
    read -r lineB <&4
    if [ -z "$lineA" ] || [ -z "$lineB" ]; then
        break
    fi
    # bc does the floating-point arithmetic, 5 digits after the point
    diff=$(bc <<< "scale=5; $lineA - $lineB")
    square=$(bc <<< "scale=5; $diff * $diff")
    sum=$(bc <<< "scale=5; $sum + $square")
done 3<file1 4<file2
filelen=$(wc -l < file1)
final=$(bc <<< "scale=5; $sum / $filelen")
echo "$final"
Is there a simpler way, e.g. in awk or perl?
EDIT
My input files actually have 2 million rows, and they contain numbers in scientific notation, like:
3.59564e-185
Both my script and the suggested answers failed on scientific notation. However, I could make the script in my question work by rewriting the scientific notation in 10^ form.
I converted my input files as below.
sed -e 's/[eE]+*/\*10\^/' file1 > file1_converted
sed -e 's/[eE]+*/\*10\^/' file2 > file2_converted
With the converted files, the two suggested answers failed with a NaN error. My script seemed to work, but with 2 million rows it takes a very long time to execute.
Is there any efficient way to make it work?
Best Answer
One way to do it is with paste, since your files have the same number of lines.
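A minimal sketch of that approach: paste pairs the files line by line, and a single awk process accumulates the squared differences. awk parses scientific notation natively, so no sed conversion should be needed:

```shell
# Pair the files column-wise, then average the squared differences in awk
paste file1 file2 |
    awk '{ s += ($1 - $2)^2 } END { printf "%.5f\n", s / NR }'
```

On the sample data above this prints 0.02287. Because it forks no per-line bc processes, it should get through 2 million rows in seconds.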