Let me try to summarise the answers given by the commenters above.
First of all, to achieve higher deletion speeds you can do three things:
- Delete a higher number of rows at a time
- Use partitioning
- Run multiple deletes
Ad 1) Deleting a large number of rows in a single statement hits the transaction log hard and causes locking, so you likely want each DELETE statement to be "small". I find that around 10K rows per batch is a good number (and it helps to have table lock escalation turned off with ALTER TABLE Foo SET (LOCK_ESCALATION = DISABLE)).
Ad 2) If your table is not already partitioned, this is not an option.
Ad 3) DELETE in SQL Server is single threaded, so you need to run more than one at a time to get max speed. To run multiple deletes concurrently, you need a way to partition them so each parallel delete runs on its own set of keys and doesn't block with the others. Typically, you can use the primary key of the table to partition the delete.
For example, if you have an IDENTITY column called key on the table, you can first run:
SELECT MAX(key) - MIN(key), MIN(key) FROM Foo WHERE <rows that must be deleted>
Keep the Max - Min and the Min somewhere you can quickly read from (for example a table in tempdb), and load them into the variables @IntervalSize and @MinKey respectively.
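As a sketch, this could look like the following (the scratch table name is illustrative, and <rows that must be deleted> is the same placeholder predicate as above):

```sql
-- Compute the key interval once and stash it where every session can read it.
SELECT MAX([key]) - MIN([key]) AS IntervalSize,
       MIN([key])              AS MinKey
INTO tempdb.dbo.DeleteInterval
FROM Foo
WHERE <rows that must be deleted>;

-- Then, at the start of each parallel session:
DECLARE @IntervalSize BIGINT, @MinKey BIGINT;
SELECT @IntervalSize = IntervalSize, @MinKey = MinKey
FROM tempdb.dbo.DeleteInterval;
```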
Let us say you decide that you can allocate 4 cores for running deletion. You now run 4 queries, either from 4 x SSMS new query or four command prompt via SQLCMD, each doing this:
DECLARE @NumDone INT = 1;
WHILE @NumDone > 0
BEGIN
    -- DELETE TOP avoids the deprecated SET ROWCOUNT for DML
    DELETE TOP (10000) FROM Foo
    WHERE [Key] >= @MinKey + @IntervalSize / 4 * @n
      AND [Key] <  @MinKey + @IntervalSize / 4 * (@n + 1)  -- for @n = 3, use <= so the maximum key is included
      AND <rows that must be deleted>;
    SET @NumDone = @@ROWCOUNT;
END
Pick @n = 0, 1, 2 or 3 for each of the four concurrently running DELETEs. Because each session works on its own half-open key range, the parallel deletes do not block each other.
You've got lots of questions in here:
Q: (The lack of foreign keys) confuses me a lot! It is a good practice (not mandatory) to have Fk's in the DWH for a variety of reasons (data integrity, relations visible for semantic layer, ....)
A: Correct, it's normally a good practice to have foreign keys in a data warehouse. However, clustered columnstore indexes don't support that yet.
Q: So MS advocates Clustered Column store indexes for DWH scenarios, However it can not handle FK relationships?!
A: Microsoft gives you tools. It's up to you how you use those tools.
If your biggest challenge is a lack of data integrity in your data warehouse, then the tool you want is conventional tables with foreign keys.
If your biggest challenge is query performance, and you're willing to check your own data integrity as part of the loading process, then the tool you want is clustered columnstore indexes.
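For instance, a load-time orphan check can stand in for the missing foreign key. This is only a sketch; the table and column names (FactSales, DimCustomer, and so on) are illustrative:

```sql
-- Flag staged fact rows whose DimCustomerKey has no match in the dimension
-- table, before loading them into the clustered columnstore fact table.
SELECT f.DimCustomerKey, COUNT(*) AS OrphanRows
FROM staging.FactSales AS f
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimCustomer AS d
                  WHERE d.CustomerKey = f.DimCustomerKey)
GROUP BY f.DimCustomerKey;
-- If this returns any rows, fail the load (or divert those rows to a
-- quarantine table) instead of inserting them.
```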
Q: So SQL 2014 then adds no real new value for DWH??
A: Thankfully, clustered columnstore wasn't the only new feature in SQL Server 2014. For example, check out the new cardinality estimator.
Q: Why am I so angry and bitter about the way my favorite feature was implemented?
A: You caught me - you didn't really ask that question - but I'll answer it anyway. Welcome to the world of third party software where not everything is built according to your exact specifications. If you feel passionately about a change you'd like to see in a Microsoft product, check out Connect.Microsoft.com. It's their feedback process where you can submit a change, other people can vote it up, and then the product team reads it and tells you why they won't implement it. Sometimes. Most of the time they just mark it as "won't fix, works on my machine" but hey, sometimes you do get some answers.
Best Answer
For question 2.
I think I will break the large table up into many smaller tables and then use a view to union them back together. This will give me something like table partitioning without the table partitioning feature, as outlined in this article:
https://www.simple-talk.com/sql/sql-tools/sql-server-partitioning-without-enterprise-edition/
The view is created with schemabinding and unions all the tables together. I can then insert data into and select data from this view as if it were the primary fact table. I would only need to ensure all my SELECT queries against the view include the column I chose to partition the tables on, to get the full benefit.
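A minimal sketch of such a partitioned view (the table names, the partitioning column SaleYear, and the CHECK constraints are all illustrative; the CHECK constraints are what let the optimizer skip member tables when a query filters on the partitioning column):

```sql
-- Illustrative member tables, partitioned by year
CREATE TABLE dbo.FactSales2013 (
    SaleYear INT   NOT NULL CHECK (SaleYear = 2013),
    Amount   MONEY NOT NULL
);
CREATE TABLE dbo.FactSales2014 (
    SaleYear INT   NOT NULL CHECK (SaleYear = 2014),
    Amount   MONEY NOT NULL
);
GO
CREATE VIEW dbo.FactSales
WITH SCHEMABINDING
AS
SELECT SaleYear, Amount FROM dbo.FactSales2013
UNION ALL
SELECT SaleYear, Amount FROM dbo.FactSales2014;
GO
-- A query that filters on SaleYear only touches the matching table:
-- SELECT SUM(Amount) FROM dbo.FactSales WHERE SaleYear = 2014;
```

Note that inserting through the view (rather than directly into the member tables) has extra requirements for an updatable partitioned view, such as the partitioning column being part of each member table's primary key.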