I need to insert and delete records in table 1 based on the data in multiple columns in table 2.
Example: Table1 and Table2 have identical columns: address, city, state, and zip.
Table1 is the master table with millions and millions of records.
Table2 (thousands of records) is a table based on data that changed in some way – although it may not be a change in data in this table.
Matching records means that the address, city, state, and zip are identical in value in both tables.
So I want to do 2 operations:
- Insert records into Table1 that only exist in Table2
- Delete records from Table1 where the record is not in Table2 but the city
is in Table2
Based on the size of Table1, I need a very efficient way to execute this.
I know some SQL – but not to the level this would require if done with SQL.
This is a DB2 database – version 10.5.
Best Answer
Operation 1
Insert records into
Table1
that only exist inTable2
:Operation 2
Delete records from
Table1
where the record is not inTable2
but thecity
is inTable2
.(assuming that the columns ar enot nullable. If they allow nulls, it would be better to rewrite the
NOT IN
with aNOT EXISTS
subquery):As for performance and efficiency, I have almost zero experience with DB2, so it would be better if you waited for some other answer. But an index on all the columns
(address, city, state, zip)
- on both tables - would probably be the best you can do for the queries, plus a single index on(city)
. Or perhaps combine the two suggestions in one index with all the 4 columns but choosecity
for the 1st part of the index:(city, address, state, zip)
.You haven't provided the tables' structure but the 4 columns seem appropriate for a
UNIQUE
constraint so you may already have such an index.