Bash – Intersection of Two Arrays

array, bash, scripting

I have two arrays like this:

A=(vol-175a3b54 vol-382c477b vol-8c027acf vol-93d6fed0 vol-71600106 vol-79f7970e vol-e3d6a894 vol-d9d6a8ae vol-8dbbc2fa vol-98c2bbef vol-ae7ed9e3 vol-5540e618 vol-9e3bbed3 vol-993bbed4 vol-a83bbee5 vol-ff52deb2)
B=(vol-175a3b54 vol-e38d0c94 vol-2a19386a vol-b846c5cf vol-98c2bbef vol-7320102b vol-8f6226cc vol-27991850 vol-71600106 vol-615e1222)

The arrays are not sorted and may contain duplicate elements.

  1. I would like to compute the intersection of these two arrays and store the result in another array. How would I do that?

  2. Also, how would I get the list of elements that appear in B but not in A?

Best Answer

comm(1) compares two sorted lists line by line and can report their intersection or their difference. The lists need to be sorted, but that's easy to achieve.

To get your arrays into a sorted list suitable for comm:

$ printf '%s\n' "${A[@]}" | LC_ALL=C sort

That will turn array A into a sorted list. Do the same for B.
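As a quick check of what comm will receive, here is a minimal sketch using a small made-up array (the values are hypothetical, not taken from the question). Because the question mentions possible duplicates, it also shows sort -u, which collapses repeated lines. Plain sort keeps them, and comm then pairs repeated lines one by one, so a duplicate can show up more than once in the result.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; vol-aaa appears twice on purpose.
A=(vol-ccc vol-aaa vol-bbb vol-aaa)

# One element per line, sorted in byte order; duplicates are kept.
sorted=$(printf '%s\n' "${A[@]}" | LC_ALL=C sort)

# Same, but -u collapses repeated lines so each element appears once.
sorted_unique=$(printf '%s\n' "${A[@]}" | LC_ALL=C sort -u)

printf '%s\n' "$sorted_unique"
```

Whether to use -u depends on whether the result should be a set or should preserve multiplicity.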

To use comm to return the intersection:

$ comm -1 -2 file1 file2

-1 -2 says to suppress entries unique to file1 (A) and entries unique to file2 (B), leaving only the entries common to both: the intersection of the two.

To have it return what is in file2 (B) but not file1 (A):

$ comm -1 -3 file1 file2

-1 -3 says to suppress entries unique to file1 and entries common to both, leaving only those unique to file2.
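The two flag combinations can be tried on a pair of small files. This is only an illustrative sketch: the file contents are made up, and the temporary files stand in for file1 and file2.

```bash
#!/usr/bin/env bash
# Hypothetical, already-sorted inputs written to temporary files.
file1=$(mktemp)
file2=$(mktemp)
trap 'rm -f "$file1" "$file2"' EXIT

printf '%s\n' a b c d > "$file1"
printf '%s\n' b d e   > "$file2"

# Lines present in both files (intersection).
common=$(comm -1 -2 "$file1" "$file2")

# Lines present only in file2 (file2 minus file1).
only_in_2=$(comm -1 -3 "$file1" "$file2")

printf 'in both:\n%s\n' "$common"
printf 'only in file2:\n%s\n' "$only_in_2"
```

With these inputs, the intersection is b and d, and the only line unique to file2 is e.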

To feed two pipelines into comm, use the "Process Substitution" feature of bash:

$ comm -1 -2 <(pipeline1) <(pipeline2)

To capture this in an array:

$ C=($(command))
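Note that C=($(command)) relies on an unquoted expansion, so the output is split on whitespace and each resulting word is subject to glob expansion. That is harmless for identifiers such as vol-175a3b54. A stricter alternative, sketched here on the assumption that bash 4.0 or newer is available, is mapfile -t, which stores each output line as exactly one element. The printf command below is only a stand-in for the real pipeline.

```bash
#!/usr/bin/env bash
# Requires bash >= 4.0 for mapfile (also available as readarray).
# printf stands in for any command that prints one item per line.
mapfile -t C < <(printf '%s\n' 'vol-1' 'two words' '*')

# Each line became exactly one element: no word splitting, no globbing.
printf 'count=%d\n' "${#C[@]}"
printf '[%s]\n' "${C[@]}"
```

On older shells without mapfile (for example bash 3.2), the C=($(command)) form remains the simpler choice when the elements contain no whitespace or glob characters.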

Putting it all together (-12 and -13 are shorthand for -1 -2 and -1 -3):

# 1. Intersection
$ C=($(comm -12 <(printf '%s\n' "${A[@]}" | LC_ALL=C sort) <(printf '%s\n' "${B[@]}" | LC_ALL=C sort)))

# 2. B - A
$ D=($(comm -13 <(printf '%s\n' "${A[@]}" | LC_ALL=C sort) <(printf '%s\n' "${B[@]}" | LC_ALL=C sort)))
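To see the whole approach on data small enough to verify by eye, here is a sketch with made-up arrays that contain duplicates. The helper sorted_unique is a hypothetical name introduced for illustration, and the use of sort -u and mapfile (bash 4.0 or newer) is a variation on the commands above rather than part of them. comm is also run under LC_ALL=C because GNU comm checks its input against the current locale's collation, which may differ from the byte order sort was asked to use.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; x appears twice in both arrays.
A=(x y x w)
B=(z x x y)

# Hypothetical helper: print the arguments one per line,
# sorted in byte order with duplicates removed.
sorted_unique() {
  printf '%s\n' "$@" | LC_ALL=C sort -u
}

# 1. Intersection
mapfile -t C < <(LC_ALL=C comm -12 <(sorted_unique "${A[@]}") <(sorted_unique "${B[@]}"))

# 2. B - A
mapfile -t D < <(LC_ALL=C comm -13 <(sorted_unique "${A[@]}") <(sorted_unique "${B[@]}"))

echo "intersection: ${C[*]}"   # x y
echo "B - A:        ${D[*]}"   # z
```

Without -u, the repeated x would appear twice in the intersection, since comm matches duplicate lines pairwise.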