SQL Server – Is Soft Delete to Improve Performance All Wrong?

database-designdeleteperformancesql serversql-server-2008-r2

A few years back I read soft delete would improve performance of deletion, because instead of actually removing the item and impact the clustered (and all other) indices, it would just change a value.

I recently tried this and I got very disappointing performances.

So I searched through the web for a good implementation of soft deletion and the first things I ended up reading was articles/blogs saying "soft deletion is bad".

Apparently it cause most harm than good when the table gets huge…

I don't think what I did makes table way more huge tables because a nightly (each day) maintenance removes (for real) all rows marked for deletion.

So, are soft deletion really bad for performance? Do real deletion cause a whole clustered table to be fully reordered (which would be a lot of disk operations!!!)?

Best Answer

In some cases, soft delete is not meant to be permanent. You can just defer the delete to some background job (say, right before you reorganize / rebuild indexes) so that the originating transaction doesn't have to wait for the deletes to occur (especially if you have cascading deletes, triggers, etc).

In other cases, a soft delete is not a performance requirement at all, but rather a way to avoid permanently deleting data, so there is always a trace of it. In that case, you want the soft deletes to be permanent. (But if you "delete" a lot of data at some point the table will be mostly "deleted" data. If that is the pattern and keeping the data permanently is a requirement then you might think about using an archive table to hold the deleted rows, which you can also perform as a deferred background task, still only requiring the originating transaction to update a flag.)

In both cases, you can improve runtime performance by using filtered indexes. Can't guide you on which columns you need to include, but you can say:

CREATE INDEX x ON dbo.table(col) INCLUDE(othercol)
WHERE deleted = 0;

It may sound counterintuitive, but you may want the inverse index as well, to make your delete operations faster, at least in terms of identifying the rows to delete. In that case you probably just want the primary key there:

CREATE INDEX y ON dbo.table(key) WHERE deleted = 1;

Note that there are many limitations: