Ubuntu – Compare 2 numerals and copy only similar part sed/grep/awk


Suppose I have an array called a with two entries, a[1] and a[2]. Each element contains a numeric value. Both values start with the same digits but end differently. I want to copy the common leading part and ignore the rest.

For example:

$ echo ${a[1]}

$ echo ${a[2]}

I need a command that compares these elements and copies only the common part, up to the first non-matching field.
i.e., in this example


similar part is .

Another example

$ echo ${a[1]}

$ echo ${a[2]}

OUTPUT for this example

similar part is .

Best Answer

From Stack Overflow:

In sed, assuming the strings don't contain any newline characters:

string1="test toast"
string2="test test"
printf "%s\n%s\n" "$string1" "$string2" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/'

This assumes that the strings themselves don't contain newlines.
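To see what that command yields, here is a minimal sketch that wraps it in a function. The name common_prefix is hypothetical and not part of the quoted answer; the sketch assumes GNU sed and inputs without newlines.

```shell
# common_prefix is a hypothetical helper name, not part of the original answer.
common_prefix() {
    # N appends the second line to the pattern space. The greedy \(.*\) then
    # backtracks until the text after the newline starts with the same capture.
    printf '%s\n%s\n' "$1" "$2" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/'
}

common_prefix "test toast" "test test"   # prints: test t
```

The result is a character-level common prefix, so it can stop in the middle of a word or field.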

Applied to the array from the question, you can do:

printf "%s\n" "${a[1]}" "${a[2]}" | sed -r 'N;s/^(.*)(\..*)?\n\1.*$/\1/'

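The optional group (\..*)? requires the rest of the first line, if there is any, to start with a ., which should keep a trailing . out of the captured common section.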
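As a rough illustration, the same command can be tried on made-up dotted values. The sample values and the function name common_fields are assumptions for this sketch, since the question's actual values are not shown; GNU sed is assumed for -r.

```shell
# common_fields is a hypothetical helper name; the values below are invented samples.
common_fields() {
    # (\..*)? lets the capture end only at the end of the first line
    # or immediately before a "." on the first line.
    printf '%s\n' "$1" "$2" | sed -r 'N;s/^(.*)(\..*)?\n\1.*$/\1/'
}

a[1]="192.168.10.25"
a[2]="192.168.10.77"
echo "similar part is $(common_fields "${a[1]}" "${a[2]}")"   # similar part is 192.168.10
```

The field boundary is only checked on the first value: with 1.2.3.4 and 1.2.33.5 this sketch returns 1.2.3 rather than 1.2, so it fits best when the differing field comes right after a dot in both values.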

The solution involves two parts:

  • Getting sed to work across two lines. This is done using N, and can be avoided if some character is guaranteed not to appear in the input. For example, because spaces are not present in the elements as given, we can instead use:

    printf "%s " "${a[1]}" "${a[2]}" | sed -r 's/^(.*)(\..*)? \1.*$/\1/'

    Essentially, the character or string separating the two elements in the output should be used after %s in the printf formatting string, and before \1 in the regular expression.

  • Finding a repeating string using regex. The trick for this is well-known, and is always a variation of:

    (.*)\1

    .* matches any sequence of characters, and () groups them for later reference by \1. Thus (.*)\1 is any sequence of characters followed by itself.
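Both parts can be sketched in a few lines. The function name common_fields_sp and the sample values are assumptions for illustration, and GNU sed and grep are assumed for backreferences in extended regular expressions.

```shell
# common_fields_sp is a hypothetical helper name; the values are invented samples.
common_fields_sp() {
    # A space follows each %s, so a literal space precedes \1 in the regex.
    # Assumes neither value contains a space.
    printf '%s ' "$1" "$2" | sed -r 's/^(.*)(\..*)? \1.*$/\1/'
}

# printf emits no trailing newline here, so echo adds one for display.
common_fields_sp "10.20.30.40" "10.20.99.1"; echo   # prints: 10.20

# The backreference on its own: only a line made of some text
# followed by itself matches ^(.+)\1$.
repeated=$(printf '%s\n' "abcabc" "abcabd" | grep -E '^(.+)\1$')
echo "$repeated"   # prints: abcabc
```

Here .+ is used instead of .* in the grep pattern only so that empty lines do not count as a match.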
