SQL Index – Why Does Update Statement Also Update Indexes?

indexupdate

I am currently learning about SQL queries and how indexes work with them, and I ran into a situation that I don't understand. Let's say I have this query for that updates a tables' values with another table and they both have a dozen columns with a hundred thousand rows each:

UPDATE TA
SET TA.Column = TB.Column,
    TA.ColumnTwo = TB.ColumnTwo,
    TA.ColumnThree = 3
...
FROM TableA TA
INNER JOIN TableB TB ON TA.ColumnFour = TB.ColumnFour

Table TA has 9 indexes on it while Table TB only has 2. This update query would take longer than 5 minutes to update only a few hundred rows that matches the condition TA.ColumnFour = TB.ColumnFour. I assumed that it was because of the extra indexes since the query execution plan shows 9 Index Update (1 for each index). I removed the duplicate/unecessary indexes and I was able to lower the number of indexes to 4 but the query would still run slow.

My question is why does the update statement also updates indexes and how would I be able to prevent the query from creating the multiple Index Update?

Best Answer

For consistency. There are times the access path will use only the data from the indexes or will start from the index and jump to the table. In both cases the informatoin on these two entities should be compatible.

A index is just a separate ordered set of data from a table. When you remove an entry from the table, the entry on the index pointing to this row should also be removed. The same should happen when you update/insert. That's the burden of indexes: faster select and slower DML (insert, update, delete).

When you do, for example, a

select count(*) from table t1 where c1 = :value;

Your result should be the same whether you have indexes or not. And the beauty of Database Management Systems is that they do that automatically for you. :)

You cannot prevent this update. The only time this will not happen is when you do not change the data referenced on the index. If you have a table with four columns (a, b, c, d), and two of them are indexed (a, b), updating only the other two (c, d) will not trigger this extra update.