Sql-server – Most efficient way of finding rows in a table that don’t exist in another — SQL Server

performancequery-performancesql serversql-server-2008

What is the best method of finding rows in table A that don't exist in table B based on unique key of the tables, when both tables are huge (A over 300 million rows, B over 500 million rows) and the unique key is a VARCHAR(60) field? (I know, a bad idea to have this column as unique key, but I can't change it).

Specifically:

  1. Would using a CHAR(60) versus VARCHAR(60) going to help?
  2. How about hashing the key to binary(20)? Do you expect a significant
    improvement in join?

Best Answer

SELECT tbl_A.id
FROM tbl_A
LEFT JOIN tbl_B ON tbl_A.id = tbl_B.id
WHERE tbl_B.id IS NULL

This is most likely how I would do something like that. Just a simple join where the column in A does not equal the columns in B. The DBMS should take care of optimizations.

As for varchar and char: char is more efficient when using data that is all the same length.

Here is an SQLFiddle example http://sqlfiddle.com/#!9/bbbbc3/1