Why update a table take so long on oracle

oracleperformanceupdate

I have a table A which has 3800+ rows, another table B with 300+ rows.

when I execute a update like this:

update A set some_colum = '2' where another_column not in (select id from B)

it takes 33 seconds to finish. Is that too bad performance?

how to improve?

I add a non-unique index on A.another_column, but it does't help. B.id is primary key.

here is the explain :
enter image description here

Best Answer

Looks like my short answer was not to everyone's taste, so let me show you the long answer

First of all, let's create our lab.

@WestFarmer mentioned a couple of tables

SQL>create table A (
  2  some_column    number,
  3  another_column number
  4  );

Table created.

SQL>create table B (
  2  id number
  3  );

Table created.

Also a mentoned that he create an index over table A (another_column) and that B(id) was the primary key.

SQL>create index idx1 on A (another_column);

Index created.

SQL>alter table B add constraint b_pk primary key (id);

Table altered.

Now let's populate both tables, 3800+ for A and 300+ for B. I fix it to 4000 and 400.

SQL>declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..400 loop
  6      for j in 1..10 loop
  7        insert into A values (j, i+100);
  8      end loop;
  9      insert into B values (i);
  10   end loop;
  11 end;
  12 /

PL/SQL procedure successfully completed.

Also I am going to add this extra record, just to show you why I suggest NOT EXIST instead NOT IN

SQL>insert into A values (1,NULL);

1 row created.

SQL>commit;

Commit complete.

As you will see below, ANOTHER_COLUMN include values from 101 to 500 for table A

SQL>select max(another_column), min(another_column) from A

MAX(ANOTHER_COLUMN) MIN(ANOTHER_COLUMN)
------------------- -------------------
                500                 101

And id store values from 1 to 400

SQL>select max(id), min(id) from B;

   MAX(ID)    MIN(ID)
---------- ----------
       400          1

Now the logic said that NOT IN should be the same as NOT EXIST, but if we count the record using not in you will see that it does not include NULL values:

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000

However if we use not exists it include the null record I inserted.

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1001

Now the question does not mention nothing about NULL values but performance. So, let's remove the null record

SQL>delete A where another_column is null;

1 row deleted.

SQL>commit;

Commit complete.

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1000

Now both options looks similar, right? No matter if we use NOT IN or NOT EXIST.

I will show you why is better option NOT EXISTS from the performance perspective

I have activated autotrace option in order to see the execution plan.

First, let's run NOT IN option

SQL>update A set some_column = '2' where another_column not in (select id from B);

1000 rows updated.

Elapsed: 00:00:00.39

Execution Plan
----------------------------------------------------------
Plan hash value: 3794573283

------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  4000 |   152K|     4 |
|   1 |  UPDATE                   | A    |       |       |       |
|*  2 |   HASH JOIN RIGHT ANTI SNA|      |  4000 |   152K|     4 |
|   3 |    INDEX FULL SCAN        | B_PK |   400 |  5200 |     1 |
|   4 |    TABLE ACCESS FULL      | A    |  4000 |   101K|     2 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ANOTHER_COLUMN"="ID")

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          7  recursive calls
       1022  db block gets
         44  consistent gets
          0  physical reads
     248832  redo size
        397  bytes sent via SQL*Net to client
        584  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Now, let's run NOT EXIST option

SQL>update A set some_column = '2' where not exists (select null from B where another_column = id);

1000 rows updated.

Elapsed: 00:00:00.33

Execution Plan
----------------------------------------------------------
Plan hash value: 2312933728

------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | UPDATE STATEMENT    |      |  4000 |   152K|     3 |
|   1 |  UPDATE             | A    |       |       |       |
|   2 |   NESTED LOOPS ANTI |      |  4000 |   152K|     3 |
|   3 |    TABLE ACCESS FULL| A    |  4000 |   101K|     2 |
|*  4 |    INDEX UNIQUE SCAN| B_PK |     1 |    13 |     1 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ANOTHER_COLUMN"="ID")

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          7  recursive calls
         15  db block gets
         47  consistent gets
          0  physical reads
      94068  redo size
        394  bytes sent via SQL*Net to client
        601  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Looks pretty similar if we check the Elapsed time, however if you check the execution plan you will see one little difference:

NOT IN use HASH JOIN meanwhile NOT EXISTS use NESTED LOOPS which in this case is a better way to run this query, in fact the cost is different 4 vs 3. probably in this case does not look to much, but imagine millions of records. Big difference!

NOTE: The index over A (another_column) is not usefull no matter what option you use.