Is there a way to compare two files and give some kind of numeric indication of their similarity?
For example, if I have two files that differ by just one character (say, a character was deleted or changed), the program ought to say something like "file X differs by 1 character."
Or if two lines are different, say "file X differs by two lines."
The best output would be something like "File X is 95% similar to file Y"
Best Answer
One approach could be to compute the Levenshtein distance.
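To make the idea concrete, here is a minimal Python sketch of the Levenshtein distance (not the code from the original answer). The function names `levenshtein` and `similarity` are made up for illustration, and the percentage formula (distance divided by the length of the longer input) is only one possible normalization, assumed here to get the "95% similar" style of output asked for.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of lines)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty inputs are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

Passing two strings (for example the full contents of each file) gives a character-based distance; passing two lists of lines (for example from `str.splitlines()`) gives a line-based one. The running time grows with the product of the two lengths, so this is probably only practical for fairly small files.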
Here using the Text::LevenshteinXS Perl module:
Then:
Here's a line-based implementation of the Levenshtein distance in awk (computes the distance in terms of number of inserted/deleted/modified lines instead of characters):
You may also be interested in diffstat's output: