Why update a table take so long on oracle

oracleperformanceupdate

I have a table A which has 3800+ rows, another table B with 300+ rows.

when I execute a update like this:

update A set some_colum = '2' where another_column not in (select id from B)

it takes 33 seconds to finish. Is that too bad performance?

how to improve?

I add a non-unique index on A.another_column, but it does't help. B.id is primary key.

here is the explain :
enter image description here

Best Answer

Looks like my short answer was not to everyone's taste, so let me show you the long answer

First of all, let's create our lab.

@WestFarmer mentioned a couple of tables

SQL>create table A (
  2  some_column    number,
  3  another_column number
  4  );

Table created.

SQL>create table B (
  2  id number
  3  );

Table created.

Also a mentoned that he create an index over table A (another_column) and that B(id) was the primary key.

SQL>create index idx1 on A (another_column);

Index created.

SQL>alter table B add constraint b_pk primary key (id);

Table altered.

Now let's populate both tables, 3800+ for A and 300+ for B. I fix it to 4000 and 400.

SQL>declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..400 loop
  6      for j in 1..10 loop
  7        insert into A values (j, i+100);
  8      end loop;
  9      insert into B values (i);
  10   end loop;
  11 end;
  12 /

PL/SQL procedure successfully completed.

Also I am going to add this extra record, just to show you why I suggest NOT EXIST instead NOT IN

SQL>insert into A values (1,NULL);

1 row created.

SQL>commit;

Commit complete.

As you will see below, ANOTHER_COLUMN include values from 101 to 500 for table A

SQL>select max(another_column), min(another_column) from A

MAX(ANOTHER_COLUMN) MIN(ANOTHER_COLUMN)
------------------- -------------------
                500                 101

And id store values from 1 to 400

SQL>select max(id), min(id) from B;

   MAX(ID)    MIN(ID)
---------- ----------
       400          1

Now the logic said that NOT IN should be the same as NOT EXIST, but if we count the record using not in you will see that it does not include NULL values:

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000

However if we use not exists it include the null record I inserted.

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1001

Now the question does not mention nothing about NULL values but performance. So, let's remove the null record

SQL>delete A where another_column is null;

1 row deleted.

SQL>commit;

Commit complete.

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1000

Now both options looks similar, right? No matter if we use NOT IN or NOT EXIST.

I will show you why is better option NOT EXISTS from the performance perspective

I have activated autotrace option in order to see the execution plan.

First, let's run NOT IN option

SQL>update A set some_column = '2' where another_column not in (select id from B);

1000 rows updated.

Elapsed: 00:00:00.39

Execution Plan
----------------------------------------------------------
Plan hash value: 3794573283

------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  4000 |   152K|     4 |
|   1 |  UPDATE                   | A    |       |       |       |
|*  2 |   HASH JOIN RIGHT ANTI SNA|      |  4000 |   152K|     4 |
|   3 |    INDEX FULL SCAN        | B_PK |   400 |  5200 |     1 |
|   4 |    TABLE ACCESS FULL      | A    |  4000 |   101K|     2 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ANOTHER_COLUMN"="ID")

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          7  recursive calls
       1022  db block gets
         44  consistent gets
          0  physical reads
     248832  redo size
        397  bytes sent via SQL*Net to client
        584  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Now, let's run NOT EXIST option

SQL>update A set some_column = '2' where not exists (select null from B where another_column = id);

1000 rows updated.

Elapsed: 00:00:00.33

Execution Plan
----------------------------------------------------------
Plan hash value: 2312933728

------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | UPDATE STATEMENT    |      |  4000 |   152K|     3 |
|   1 |  UPDATE             | A    |       |       |       |
|   2 |   NESTED LOOPS ANTI |      |  4000 |   152K|     3 |
|   3 |    TABLE ACCESS FULL| A    |  4000 |   101K|     2 |
|*  4 |    INDEX UNIQUE SCAN| B_PK |     1 |    13 |     1 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ANOTHER_COLUMN"="ID")

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          7  recursive calls
         15  db block gets
         47  consistent gets
          0  physical reads
      94068  redo size
        394  bytes sent via SQL*Net to client
        601  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Looks pretty similar if we check the Elapsed time, however if you check the execution plan you will see one little difference:

NOT IN use HASH JOIN meanwhile NOT EXISTS use NESTED LOOPS which in this case is a better way to run this query, in fact the cost is different 4 vs 3. probably in this case does not look to much, but imagine millions of records. Big difference!

NOTE: The index over A (another_column) is not usefull no matter what option you use.

PERSPECTIVE #1

When you update by Primary Key only in InnoDB, there is a rare but possible occasion when the clustered index (aka gen_clust_index) can get locked.

I once answered three posts from one individual on this subject

Please read through these carefully. The poster of these question found his own workaround based on the seeing InnoDB Clustered Index Locking behavior. Sadly, he did not post what the workaround was.

In addition, when you see the queries running slow, log into mysql and run SHOW ENGINE INNODB STATUS\G and starting looking for locks in the Clustered Index.

PERSPECTIVE #2

I see you commented out innodb_log_file_size. You have it at 5MB, the default. Since innodb_buffer_pool_size is set to 1G, innodb_log_file_size needs to be at 256M. Click here to go about setting innodb_log_file_size to 256M.

PERSPECTIVE #3

I see you are not using innodb_file_per_table. You may want to use it in order to have table updates done specifically for the one table with a million rows. Click here to see how to Clean Up InnoDB infrastructure to use innodb_file_per_table.

Sql-server – Improve Performance -Update query with OR in WHERE

The best option if you really want to squeeze out the most is to parallelise the query.

Instead of a single UPDATE, run multiple copies. Each of them operate on a subset of the data in the temporary table. Keep adding more copies and more subsets until you run out of cores of CPU (whatever happens first).

If you don't have the ability to parallelise. Run two UPDATES, one that works on the first part of the OR, another on the second. If possible, add indexes on t_h_bg_pc_it to support both queries. Like this:

Declare @date datetime = '09-Dec-2013' 
update  #tmp_sp_abc  
set     test = rtrim(xyz_test)  
from    #tmp_sp_abc t1
join    t_h_bg_pc_it t2 
  on    t2.id_i = t1.i
where   t1.r_type = 1  
and     t2.[date] = @date

update  #tmp_sp_abc  
set     test = rtrim(xyz_test)  
from    #tmp_sp_abc t1
join    t_h_bg_pc_it t2 
  on    tt2.id_s1 = t1.s
where   t1.r_type = 1  
and     t2.[date] = @date

You can now index t_h_bg_pc_it on both s, date, xyz_test and id_id, date, xyz_test (two indexes). This should speed things up. It may also be useful to index #tmp_sp_abc on r_type. If possible, do both the parallelism and the indexing/rewrite.

Finally: Fire whoever it is that comes up with naming conventions for your tables. Life is too short for this type of cryptic naming, SQL Server does auto-completion.

Best Answer

Related Solutions

Mysql – Innodb UPDATE performance

PERSPECTIVE #1

PERSPECTIVE #2

PERSPECTIVE #3

Sql-server – Improve Performance -Update query with OR in WHERE

Related Question