SQL Server – DELETE vs TRUNCATE

database-internalssql server

I am trying to get a greater understanding on the differences between the DELETE and TRUNCATE commands. My understanding of the internals goes something along the lines of:

DELETE -> the database engine finds and removes the row from the relevant data pages and all index pages where the row is entered. Thus, the more indexes the longer the delete takes.

TRUNCATE -> simply removes all the table's data pages en mass making this a more efficient option for deleting the contents of a table.

Assuming the above is correct (please correct me if not):

How do different recovery modes affect each statement? If there is any effect at all
When deleting, are all indexes scanned or only those where the row is? I would assume all indexes are scanned (and not seeked?)
How are the commands replicated? Is the SQL command sent and processed on each subscriber? Or is MSSQL a bit more intelligent than that?

Best Answer

DELETE -> the database engine finds and removes the row from the relevant data pages and all index pages where the row is entered. Thus, the more indexes the longer the delete takes.

Yes, though there are two options here. Rows may be deleted from nonclustered indexes row-by-row by the same operator that performs the base table deletes. This is known as a narrow (or per-row) update plan:

Per row deletion

Or, the nonclustered index deletions may be performed by separate operators, one per nonclustered index. In this case (known as a wide, or per-index update plan) the complete set of actions is stored in a worktable (eager spool) before being replayed once per index, often explicitly sorted by the particular nonclustered index's keys to encourage a sequential access pattern.

Per index deletion

TRUNCATE -> simply removes all the table's data pages en masse making this a more efficient option for deleting the contents of a table.

Yes. TRUNCATE TABLE is more efficient for a number of reasons:

Fewer locks may be needed. Truncation typically requires only a single schema modification lock at the table level (and exclusive locks on each extent deallocated). Deletion might acquire locks at a lower (row or page) granularity as well as exclusive locks on any pages deallocated.
Only truncation guarantees that all pages are deallocated from a heap table. Deletion may leave empty pages in a heap even if an exclusive table lock hint is specified (for example if a row-versioning isolation level is enabled for the database).
Truncation is always minimally logged (regardless of the recovery model in use). Only page deallocation operations are recorded in the transaction log.
Truncation can use deferred drop if the object is 128 extents or larger in size. Deferred drop means the actual deallocation work is performed asynchronously by a background server thread.

How do different recovery modes affect each statement? Is there is any effect at all?

Deletion is always fully logged (every row deleted is recorded in the transaction log). There are some small differences in the content of log records if the recovery model is other than FULL, but this is still technically full logging.

When deleting, are all indexes scanned or only those where the row is? I would assume all indexes are scanned (and not seeked?)

Deleting a row in an index (using either the narrow or wide update plans shown previously) is always an access by key (a seek). Scanning the whole index for each row deleted would be horribly inefficient. Let's look again at the per-index update plan shown earlier:

Wide plan 2

Execution plans are demand-driven pipelines: parent operators (to the left) drive child operators to do work by requesting a row at a time from them. The Sort operators are blocking (they must consume their whole input before producing the first sorted row), but they are still being driven by their parent (the Index Delete) requesting that first row. The Index Delete pulls a row at a time from the completed Sort, updating the target nonclustered index for each row.

In a wide update plan, you will often see columns being added to the row stream by the base table update operator. In this case, the Clustered Index Delete adds nonclustered index key columns to the stream. This data is required by the storage engine to locate the row to remove from the nonclustered index:

Output list detail

How are the commands replicated? Is the SQL command sent and processed on each subscriber? Or is SQL Server a bit more intelligent than that?

Truncation is not allowed on a table that is published using transactional or merge replication. How deletions are replicated depends on the type of replication and how it is configured. For example, snapshot replication just replicates a point-in-time view of the table using bulk methods - incremental changes are not tracked or applied. Transactional replication works by reading log records and generating appropriate transactions to apply the changes at subscribers. Merge replication tracks changes using triggers and metadata tables.

Related Solutions

Sql-server – Should I lock the table during delete

Changing the locking to table locks will just make the deletes run even slower as the delete won't be able to run until the lock can be taken on the table which means that all other threads need to be finished or blocked. If you have foreign keys with delete cascade enabled that will probably take a lot of the time.

You might want to change it to a SQL Agent job so instead of running your app which connects and disconnects, you just run a loop deleting data until you are done.

SELECT NULL --Makes the WHILE loop work.
WHILE @@ROWCOUNT <> 0
BEGIN
    DELETE TOP (1000) FROM YourTable
    WHERE Column < getdate()-31
END

If this doesn't work you could look into table partitioning which would allow you to switch the data to another table very quickly, then truncate the data from the new table. This however does require Enterprise Edition.

Sql-server – Confirmation on query optimization and performance understanding (seek and scan)

To answer your questions:

Since you say 'all the records meet the criteria' I would advise doing tests on a larger set of rows where some rows are outside the criteria.
There are some circumstances where using an expression against an indexed field will result in the index being seeked, such as WHERE [MyField] LIKE 'SomeString%'. However, using an expression directly in the field will result in the query not being sargable, and will induce an index scan. For instance, WHERE LEFT([MyField], 1) = 'T'.
Not having a clustered index on a table (meaning the table is a heap) does not affect the ability for a particular SQL query to use a seek operation on an index. Not having a clustered index on a table has its own set of issues that is way beyond the scope of this answer. Check this blog post for more info on heaps vs clustered indexes.

Best Answer

Related Solutions

Sql-server – Should I lock the table during delete

Sql-server – Confirmation on query optimization and performance understanding (seek and scan)

Related Question