Updating 3 million rows in 100million+ table

oracleperformance

I have a really big table containing over 100kk rows and I would like to update about 3kk rows.

Following count returns 3kk and takes about 300s to execute

select count(*) from A where x=.. and y=..

But the update statement is much slower…

update A set z = .. where x=.. and y=..

V$SESSION_LONGOPS says it might take ~10-15h to finish.

What are the steps I can take to improve the performance of the above update? CTAS probably doesn't make much sense in this case (3kk rows in 100kk rows table).

Best Answer

Let's say your statement is:

update A set z = 'HELLO WORLD' where x = 1 and y = 2;

1) This does extra work, because it updates column z to HELLO WORLD, even if it was HELLO WORLD already. Rewrite it as:

update A set z = 'HELLO WORLD' where x = 1 and y = 2 and (z != 'HELLO WORLD' or z is null);

2) If you have 2 columns indexed seperately, it is possible, but quite rare, that the database uses both indexes for the lookup. You can try creating 1 index having both columns, but I highly doubt this would be of any help, you would need a big, wide table with exceptionally good clustering for x and y columns to benefit from this for 3/100 million rows.

3) DML does not run in parallel by default. You can enable parallel execution as:

alter session force parallel dml parallel 8;

Note that this will not force parallel execution even if the syntax suggests that. It's like adding parallel 8 hint to your statements in your session.

4) If column z is indexed, than the UPDATE statement need to maintain the index(es) as well. Sometimes it is better to make the index(es) unusable, run the DML, then rebuild the index(es).

5) If the content of column z does not fit its original place (HI becomes HELLO WORLD), and your table blocks are full, you may experience row migration, that makes performance worse not only during the UPDATE, but for later queries as well.

You would be surprised how much better CTAS can be.

6) If column z is part of an enabled foreign key constraint, you can not simply update it to any value. The database needs to check the parent values to make sure you are updating to a valid value. Again, disable, run the DML, enable validate, may improve performance.

But this is just guesswork without the facts. An SQL Monitoring output, or execution plan with runtime statistics could help. If that is not enough, monitoring the wait events + parameters in v$session, or a raw SQL trace could help even more.