I have a table A which has 3800+ rows, another table B with 300+ rows.
when I execute a update like this:
update A set some_colum = '2' where another_column not in (select id from B)
it takes 33 seconds to finish. Is that too bad performance?
how to improve?
I add a non-unique index on A.another_column, but it does't help. B.id is primary key.
here is the explain :
Best Answer
Looks like my short answer was not to everyone's taste, so let me show you the long answer
First of all, let's create our lab.
@WestFarmer mentioned a couple of tables
Also a mentoned that he create an index over table A (another_column) and that B(id) was the primary key.
Now let's populate both tables, 3800+ for A and 300+ for B. I fix it to 4000 and 400.
Also I am going to add this extra record, just to show you why I suggest NOT EXIST instead NOT IN
As you will see below, ANOTHER_COLUMN include values from 101 to 500 for table A
And id store values from 1 to 400
Now the logic said that NOT IN should be the same as NOT EXIST, but if we count the record using not in you will see that it does not include NULL values:
However if we use not exists it include the null record I inserted.
Now the question does not mention nothing about NULL values but performance. So, let's remove the null record
Now both options looks similar, right? No matter if we use NOT IN or NOT EXIST.
I will show you why is better option NOT EXISTS from the performance perspective
I have activated autotrace option in order to see the execution plan.
First, let's run NOT IN option
Now, let's run NOT EXIST option
Looks pretty similar if we check the Elapsed time, however if you check the execution plan you will see one little difference:
NOT IN use HASH JOIN meanwhile NOT EXISTS use NESTED LOOPS which in this case is a better way to run this query, in fact the cost is different 4 vs 3. probably in this case does not look to much, but imagine millions of records. Big difference!
NOTE: The index over A (another_column) is not usefull no matter what option you use.