Go ahead and drop the clustering key while you're importing data. When you've finished your INSERTs, create the clustering key first, then the PK if it's non-clustered, then any remaining indices. I'm running such scripts at this very moment, and they take about half as long as inserting into a fully indexed table.
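Here is a minimal sketch of the load-then-index pattern. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption for illustration only. SQLite rowid tables have no separate clustered index, so only the "load first, build indexes afterwards" part carries over. The table `orders`, its columns, the index names and `load_then_index` are all hypothetical.

```python
import sqlite3


def load_then_index(conn, rows):
    """Hypothetical helper: bulk-load rows into an unindexed table, then index it."""
    # Create the table with no PRIMARY KEY and no secondary indexes yet,
    # so the inserts below do no index maintenance.
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    # Load everything in a single transaction (committed when the block exits).
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Build the indexes once, after the data is in place: the key index first,
    # then the remaining indexes. (SQLite has no clustered index on a rowid
    # table, so this only mirrors the SQL Server ordering.)
    conn.execute("CREATE UNIQUE INDEX ux_orders_order_id ON orders (order_id)")
    conn.execute("CREATE INDEX ix_orders_customer_id ON orders (customer_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Rows are generated already sorted by order_id, i.e. in key order.
rows = [(i, i % 10, i * 1.5) for i in range(1, 1001)]
load_then_index(conn, rows)
```

On SQL Server the same shape applies, with CREATE CLUSTERED INDEX as the first post-load step.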
There's no problem in going without a clustering key while loading data. If possible, I would recommend importing your data in the same order in which it will be clustered; this reduces the need to shuffle the data around when the clustering key is created. If you're clustering on an arbitrary IDENTITY column, I suggest you reconsider; there may be a better candidate for clustering (or you may not even need a clustering key).
There's no problem in going without a PK while loading data. It's most important in maintaining referential integrity and in giving your indices a narrow target to hit; neither applies when you're bulk-loading data, assuming that you trust your data to not contain duplicates.
Duplicates are not always the devil either, and it may be faster to remove them after bulk load than to build processes at the front end that de-dupe in other ways.
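A sketch of de-duping after the bulk load, again using sqlite3 as a stand-in for SQL Server. The rows land in a key-less staging table (the hypothetical `staging`), so duplicates load without errors. A single set-based DELETE then keeps only the first-loaded copy of each duplicate, identified by SQLite's `rowid`. In SQL Server the same idea is usually written with ROW_NUMBER() over the key columns in a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging heap: no key, so duplicate rows load without errors.
conn.execute("CREATE TABLE staging (sku TEXT, price REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("A1", 9.99), ("B2", 4.50), ("A1", 9.99), ("C3", 1.25), ("B2", 4.50)],
)
# Remove duplicates in one set-based statement: keep the first-loaded copy
# (lowest rowid) of each (sku, price) combination and delete the rest.
conn.execute(
    """
    DELETE FROM staging
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM staging GROUP BY sku, price
    )
    """
)
conn.commit()
```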
My preference is to pull in data in all its ugly rawness first, often into a heap table, then do the cleanup in SQL before copying it to a new table. Maybe that's just because I'm a SQL guy and everything looks like a nail to me, but SQL is optimized for set-based operations. On the other hand, if the volume of data is huge, you may need to do conversions and cleanup on an RBAR (row-by-agonizing-row) basis as you import it via SSIS or whatnot.
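Here's roughly what "raw first, clean up with SQL, then copy" looks like, sketched with sqlite3 as a stand-in for SQL Server. The names `raw_import` and `items` and the cleanup rules are hypothetical. Every column lands as text in a heap. A single INSERT ... SELECT then trims, casts and filters while copying into the typed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1. Land the data raw: every column as TEXT, no constraints (a heap).
conn.execute("CREATE TABLE raw_import (id TEXT, name TEXT, qty TEXT)")
conn.executemany(
    "INSERT INTO raw_import VALUES (?, ?, ?)",
    [(" 1", "  widget ", "10"), ("2", "gadget", " 5 "),
     ("3", "gizmo", "n/a"), ("4", "   ", "7")],
)
# 2. Create the clean target table with real types.
conn.execute(
    "CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL, qty INTEGER)"
)
# 3. Clean up and copy in one set-based INSERT ... SELECT:
#    trim whitespace, cast to integers, drop rows with a blank name, and
#    apply a crude numeric check (qty must start with a digit, else NULL).
conn.execute(
    """
    INSERT INTO items (id, name, qty)
    SELECT CAST(TRIM(id) AS INTEGER),
           TRIM(name),
           CASE WHEN TRIM(qty) GLOB '[0-9]*'
                THEN CAST(TRIM(qty) AS INTEGER) END
    FROM raw_import
    WHERE TRIM(name) <> ''
    """
)
conn.commit()
```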
I would also recommend explicitly defining your PK (if any) with an ALTER TABLE statement after your CREATE TABLE. This reminds you to make an explicit choice (including whether to cluster on the PK), puts it next to your other index declarations, and lets you give it a non-random name.
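To make the post-load order recommended above and the explicit PK naming concrete, here's a small Python helper that only builds the T-SQL as strings. It does not connect to SQL Server. The `CX_`/`PK_` naming convention, the function name and the example table are all hypothetical. When the PK is also the clustering key, it becomes a single PRIMARY KEY CLUSTERED constraint. Otherwise the clustered index comes first and the PK is added as NONCLUSTERED.

```python
def post_load_script(table, cluster_cols, pk_cols, other_indexes):
    """Return post-load T-SQL in the order: clustering key, PK, other indexes.

    Hypothetical helper: it only builds strings and never talks to a server.
    `other_indexes` maps an index name to its list of columns.
    """
    short = table.split(".")[-1]  # "dbo.Orders" -> "Orders", used in names
    statements = []
    pk_is_clustering_key = list(pk_cols) == list(cluster_cols)
    if not pk_is_clustering_key:
        # Clustered index first: creating it after nonclustered indexes
        # would force SQL Server to rebuild those indexes.
        statements.append(
            f"CREATE CLUSTERED INDEX CX_{short} ON {table} ({', '.join(cluster_cols)});"
        )
    kind = "CLUSTERED" if pk_is_clustering_key else "NONCLUSTERED"
    # Explicitly named PK instead of a system-generated name.
    statements.append(
        f"ALTER TABLE {table} ADD CONSTRAINT PK_{short} "
        f"PRIMARY KEY {kind} ({', '.join(pk_cols)});"
    )
    # Remaining indexes last.
    for name, cols in other_indexes.items():
        statements.append(
            f"CREATE NONCLUSTERED INDEX {name} ON {table} ({', '.join(cols)});"
        )
    return statements


for stmt in post_load_script(
    "dbo.Orders", ["OrderDate"], ["OrderID"], {"IX_Orders_CustomerID": ["CustomerID"]}
):
    print(stmt)
```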
Okay, after heeding advice from the comments, I removed TOP 10000 * from the SELECT statement. After doing so, forcing the join order actually sped up the query as intended: it took 48 seconds (vs. 60 seconds).
Here's the original execution plan.
And here's the forced join execution plan.
Have a look at the query execution plan; that should tell you whether it is using an index.
I don't see why a sort join couldn't use an index: keys can be read from the index and then sorted if required, which saves scanning the table.
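One way to check this yourself is to compare the plan for a sorted query before and after adding an index. The sketch below uses SQLite via Python's sqlite3 rather than SQL Server, and an ORDER BY rather than a merge join, so treat it only as an illustration of the same idea. Without the index, the plan should show a separate sort step (a temp B-tree). With the index, the rows are read from it already in key order. Table `t`, index `idx_t_b` and the `plan` helper are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, (i * 7) % 100) for i in range(1000)]
)

query = "SELECT b FROM t ORDER BY b"
before = plan(conn, query)  # expected: table scan plus a temp B-tree sort
conn.execute("CREATE INDEX idx_t_b ON t (b)")
after = plan(conn, query)   # expected: index scan, rows already in b order
print(before)
print(after)
```

In SQL Server, the graphical or XML execution plan gives the same answer: look for an index scan or seek feeding the join instead of a table scan followed by a Sort operator.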