Optimizing a compare query

Tags: minus, optimization, oracle, performance, query, query-performance

Suppose I have two tables, A and B, and I know that size(A) = size(B). I want to confirm that the data in both tables is the same in three given columns, say X, Y and Z (the tables have no keys).

For that, I would do:

SELECT COUNT(*) FROM
(
    SELECT X, Y, Z FROM A
    MINUS
    SELECT X, Y, Z FROM B
);
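To make the count query concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite spells MINUS as EXCEPT; the column types and table contents are made-up sample data.

```python
import sqlite3

def count_missing_tuples(conn: sqlite3.Connection) -> int:
    """Count distinct (X, Y, Z) tuples present in A but absent from B.

    SQLite uses EXCEPT where Oracle uses MINUS; both return distinct rows.
    """
    sql = """
        SELECT COUNT(*) FROM (
            SELECT X, Y, Z FROM A
            EXCEPT
            SELECT X, Y, Z FROM B
        )
    """
    return conn.execute(sql).fetchone()[0]

# Hypothetical sample data: B differs from A in exactly one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (X INTEGER, Y INTEGER, Z TEXT)")
conn.execute("CREATE TABLE B (X INTEGER, Y INTEGER, Z TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [(1, 1, "a"), (2, 2, "b"), (3, 3, "c")])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)", [(1, 1, "a"), (2, 2, "b"), (3, 3, "x")])

print(count_missing_tuples(conn))  # 1
```

To produce the full count, the engine has to process every row of both tables, which is the cost the question is trying to avoid.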

Now, I don't really need the COUNT(*) value: as soon as there is one mismatch, i.e. a tuple of values that exists in A but not in B, I know that the tables are not identical. Is there a way to express this in SQL? That is, as soon as MINUS encounters one mismatched value, can the query return a value indicating that?

Thanks!

Best Answer

Your requirement, and the logic behind it, make sense in theory. However, how quickly this can be achieved depends on the data volume in tables A and B and on whether any useful indexes are available.

The worst case is when both A and B hold a large volume of data and no useful indexes exist. In that case (and provided the table statistics are close to the actual data), Oracle will not be able to "find the first unmatched record" any quicker than it can compute a count (assuming you rewrite your query with the LEFT JOIN approach used below).

The best case is when table A (or both tables) holds a small volume of data and/or both tables have an index on the combination of columns X, Y and Z. In that case, the following query may perform better than doing a COUNT. It returns 1 when every (X, Y, Z) row in A has a match in B and 0 otherwise, and because the check is a NOT EXISTS, Oracle can stop at the first unmatched row:

SELECT COUNT(*) FROM DUAL
WHERE NOT EXISTS
(
    SELECT NULL
    FROM A LEFT JOIN B
    ON  A.X=B.X AND A.Y=B.Y AND A.Z=B.Z
    WHERE B.X IS NULL
) ;
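The same anti-join check can be tried out in SQLite through Python's sqlite3 module. SQLite has no DUAL table, so this sketch selects the NOT EXISTS result directly; the composite index names and the sample rows are hypothetical. One behavioural difference from MINUS is worth knowing: the equality join never matches NULLs, so a row with a NULL in X, Y or Z counts as a mismatch here, whereas MINUS treats two NULLs as equal.

```python
import sqlite3

# Anti-join check: 1 if every (X, Y, Z) row in A has a match in B, else 0.
# NOT EXISTS lets the engine stop as soon as the subquery yields one row.
TABLES_MATCH_SQL = """
    SELECT NOT EXISTS (
        SELECT NULL
        FROM A LEFT JOIN B
          ON A.X = B.X AND A.Y = B.Y AND A.Z = B.Z
        WHERE B.X IS NULL
    )
"""

def make_db(a_rows, b_rows) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for name, rows in (("A", a_rows), ("B", b_rows)):
        conn.execute(f"CREATE TABLE {name} (X INTEGER, Y INTEGER, Z TEXT)")
        # Hypothetical composite index on the compared columns.
        conn.execute(f"CREATE INDEX idx_{name}_xyz ON {name} (X, Y, Z)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    return conn

def tables_match(conn: sqlite3.Connection) -> bool:
    return conn.execute(TABLES_MATCH_SQL).fetchone()[0] == 1

rows = [(1, 1, "a"), (2, 2, "b"), (3, 3, "c")]
same = make_db(rows, list(reversed(rows)))
different = make_db(rows, [(1, 1, "a"), (2, 2, "b"), (3, 3, "x")])
print(tables_match(same), tables_match(different))  # True False
```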

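Related Solutions

MySQL – Efficient way to write a query over multiple tables that depends on a column value of a table

first solution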

Then, rewrite your query to:

SELECT * FROM A
WHERE A4Key IN('023', '009', '011', '013', '015', '017', '019', '021')
  AND A3 NOT IN(SELECT Z FROM B)
-- -------
UNION ALL
-- -------
SELECT * FROM A
WHERE A4Key IN('006', '024', '028', '031')
  AND A3 NOT IN(SELECT Z FROM C)
-- -------
UNION ALL
-- -------
SELECT * FROM A
WHERE A4Key IN('004', '025')
  AND A3 NOT IN(SELECT Z FROM D)

Now, examine the query plans (for example with EXPLAIN) to decide which indexes work best.
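Since the original schema is not shown, here is a sketch of the UNION ALL rewrite running in SQLite via Python on invented rows. Only the columns the query touches (A4Key and A3 in A, Z in B, C and D) are created, and their types are assumptions. The second half illustrates a NOT IN pitfall: once a Z column contains a NULL, `A3 NOT IN (SELECT Z ...)` can never be true, so that branch silently returns nothing.

```python
import sqlite3

# UNION ALL rewrite from the first solution; each branch checks its own lookup table.
QUERY = """
    SELECT * FROM A
    WHERE A4Key IN ('023', '009', '011', '013', '015', '017', '019', '021')
      AND A3 NOT IN (SELECT Z FROM B)
    UNION ALL
    SELECT * FROM A
    WHERE A4Key IN ('006', '024', '028', '031')
      AND A3 NOT IN (SELECT Z FROM C)
    UNION ALL
    SELECT * FROM A
    WHERE A4Key IN ('004', '025')
      AND A3 NOT IN (SELECT Z FROM D)
"""

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (A4Key TEXT, A3 INTEGER)")
    for lookup in ("B", "C", "D"):
        conn.execute(f"CREATE TABLE {lookup} (Z INTEGER)")
    # Invented sample rows; '999' matches none of the IN lists.
    conn.executemany("INSERT INTO A VALUES (?, ?)",
                     [("023", 1), ("023", 5), ("006", 2), ("006", 7), ("004", 3), ("999", 9)])
    conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO C VALUES (?)", [(2,)])
    conn.executemany("INSERT INTO D VALUES (?)", [(3,)])
    return conn

conn = build_db()
print(sorted(conn.execute(QUERY).fetchall()))  # [('006', 7), ('023', 5)]

# NOT IN pitfall: a single NULL in C empties the second branch.
conn.execute("INSERT INTO C VALUES (NULL)")
print(sorted(conn.execute(QUERY).fetchall()))  # [('023', 5)]
```

If Z can be NULL, filtering it out inside each subquery (`WHERE Z IS NOT NULL`) or switching to NOT EXISTS avoids this.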

second solution

Create a table such as:

CREATE TABLE A4Exceptions(
  A4Key VARCHAR(3),
  Z     whichever_type_of_Z  -- placeholder: use the same data type as Z in B, C and D
);

Populate it initially with:

INSERT INTO A4Exceptions SELECT '023', Z FROM B;
INSERT INTO A4Exceptions SELECT '009', Z FROM B;
...
INSERT INTO A4Exceptions SELECT '006', Z FROM C;
...

Then keep it up to date whenever the data in B, C and D changes (if it does).

Once that is done, and after creating suitable indexes, you can run the following with much better performance:

SELECT *
FROM A
WHERE NOT EXISTS(
  SELECT *
  FROM A4Exceptions exc
  WHERE exc.A4Key = A.A4Key
    AND exc.Z     = A.A3
);
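A sketch of the second solution on similar invented data, again using SQLite through Python. The mapping from A4Key values to lookup tables is copied from the IN lists of the first solution; the index name, column types and sample rows are hypothetical. Because the final query uses NOT EXISTS, a NULL in Z does not wipe out results the way it does with NOT IN.

```python
import sqlite3

# Which lookup table supplies the excluded Z values for each A4Key
# (copied from the IN lists of the first solution).
KEY_SOURCES = {
    "B": ["023", "009", "011", "013", "015", "017", "019", "021"],
    "C": ["006", "024", "028", "031"],
    "D": ["004", "025"],
}

QUERY = """
    SELECT *
    FROM A
    WHERE NOT EXISTS (
        SELECT *
        FROM A4Exceptions exc
        WHERE exc.A4Key = A.A4Key
          AND exc.Z     = A.A3
    )
"""

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (A4Key TEXT, A3 INTEGER)")
    for lookup in KEY_SOURCES:
        conn.execute(f"CREATE TABLE {lookup} (Z INTEGER)")
    # Invented sample rows; C deliberately contains a NULL.
    conn.executemany("INSERT INTO A VALUES (?, ?)",
                     [("023", 1), ("023", 5), ("006", 2), ("006", 7), ("004", 3)])
    conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO C VALUES (?)", [(2,), (None,)])
    conn.executemany("INSERT INTO D VALUES (?)", [(3,)])

    conn.execute("CREATE TABLE A4Exceptions (A4Key TEXT, Z INTEGER)")
    for lookup, keys in KEY_SOURCES.items():
        for key in keys:
            # Equivalent to: INSERT INTO A4Exceptions SELECT '<key>', Z FROM <lookup>
            conn.execute(f"INSERT INTO A4Exceptions SELECT ?, Z FROM {lookup}", (key,))
    # Hypothetical composite index supporting the NOT EXISTS lookup.
    conn.execute("CREATE INDEX idx_exc_key_z ON A4Exceptions (A4Key, Z)")
    return conn

conn = build_db()
print(sorted(conn.execute(QUERY).fetchall()))  # [('006', 7), ('023', 5)]
```

One difference from the UNION ALL version is worth checking against your data: this query also returns rows of A whose A4Key appears in none of the lists, which the first solution skips.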

MySQL – Limit Distinct Count in GROUP BY for Better Performance

You can test this variation. In theory, it can use an index on (id, value) to find the minimum and maximum value of each group, so it does not have to count the distinct values at all:

SELECT id
FROM t
GROUP BY id
HAVING MIN(value) < MAX(value) ;
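Assuming the goal is the ids that have more than one distinct value (what `HAVING COUNT(DISTINCT value) > 1` would return), the rewrite gives the same result: `MIN(value) < MAX(value)` holds exactly when a group contains at least two different non-NULL values, and both forms ignore NULLs. The sketch below checks this in SQLite via Python on invented rows; the index name is hypothetical, and SQLite's behaviour says nothing definitive about MySQL's query plans.

```python
import sqlite3

MIN_MAX_SQL = "SELECT id FROM t GROUP BY id HAVING MIN(value) < MAX(value) ORDER BY id"
COUNT_DISTINCT_SQL = "SELECT id FROM t GROUP BY id HAVING COUNT(DISTINCT value) > 1 ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.execute("CREATE INDEX idx_t_id_value ON t (id, value)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, 10), (1, 10),           # one distinct value
    (2, 10), (2, 20), (2, 10),  # two distinct values
    (3, None), (3, 5),          # NULL is ignored by both forms
    (4, None), (4, None),       # only NULLs
])

min_max_ids = [row[0] for row in conn.execute(MIN_MAX_SQL)]
count_ids = [row[0] for row in conn.execute(COUNT_DISTINCT_SQL)]
print(min_max_ids, count_ids)  # [2] [2]
```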
