Sql-server – Reclaiming space after delete statement in SQL Server

clustered-indexheapsql servert-sql

I have some tables used in reporting and some of them grow huge with the daily ETL so I have implemented som jobs that delete rows more that x days old. E.g. everyday 10% new data gets added and I delete 10% of the oldest rows in another job.

Do I need to do something for efficiency? Some tables are heaps with nonclustered indexes and some also have clustered indexes. E.g. to I need to rebuild the tables in case of heaps or rebuild index in case of clustered index on the table? If so how often? Most tables are only used once per day and when all joins and calculations are done they get extracted for visualization in a non-live manner.

I know a little about index fragmentation and heap fragmentation and that one can query fragmentation in percent.

How much do I need to worry about the above scenario?

Best Answer

You can identify and resolve heap / index fragmentation following this guide: How to identify and resolve SQL Server Index Fragmentation

You likely only need to reorganize, and not worry so much about rebuilding as often. There's recommendations for when to use one over the other, but rebuilding is a heavier operation that likely doesn't result in much more benefit than reorganizing.

While 20 million rows is a nice little chunk of data, it isn't anything enormous by any means. I would still potentially recommend always using clustered indexes and you'll see the best benefits during the JOINs and calculations on those tables.

Using clustered indexes in a smart way will also reorganize your data for you in a sense. What I mean by that, is if every day you're loading 20 million records into an empty table, and then add a clustered index on that table after the data has been fully loaded, it'll efficiently organize the data sorted on that clustered index. It can basically take the place of manually rebuilding / reorganizing operations that are pretty heavy operations by themselves. Adding the clustered index to the table after the data has been fully loaded is usually the most efficient way for it to reorganize the data.