So my question is, is there any difference between using the suggested
index above and what I think is a better alternative...
The missing-index suggestions made by the optimizer are opportunistic and relevant only to the particular query concerned. The optimizer goes through an index analysis phase, where it may note that a covering index it would have liked to use does not exist. These suggestions are not intended to be a replacement for a full workload-representative Database Engine Tuning Advisor (DTA) session, much less proper index design by a skilled database practitioner based on wide knowledge of the data and critical queries.
The suggestions should always be reviewed, as you have done, to ensure an optimal set of indexes for all queries is created - not one covering index per query as could be the case if the suggestions were followed literally.
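For review purposes, the accumulated suggestions can be retrieved from the missing-index DMVs. A minimal sketch (the ordering heuristic is only a rough triage aid, not a standard):

    -- Review the accumulated missing-index suggestions before acting on any of them
    SELECT
        mid.statement            AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact
    FROM sys.dm_db_missing_index_details AS mid
    JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
    ORDER BY migs.user_seeks * migs.avg_user_impact DESC;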
There are naturally implications to widening the keys of an index compared with using INCLUDE columns, some of which have been noted by others. I personally prefer to INCLUDE the clustering keys explicitly where they are useful. Clustered indexes can be changed, and it is rare that the person performing this change checks whether any queries were relying on the implicit behaviour.
Changing columns from INCLUDE to keys may also affect update query plans (overall shape and Halloween Protection requirements), and there are logging implications when the keys of an index change, too.
I would probably choose to modify the suggestion as you have done, but I would be careful to validate update (= insert/update/delete/merge) query plans for the affected table.
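To illustrate the explicit-INCLUDE preference, a minimal sketch using a hypothetical Orders table (the two nonclustered indexes below are alternatives for the same need):

    -- Hypothetical table, clustered on OrderID
    CREATE TABLE dbo.Orders
    (
        OrderID    int IDENTITY NOT NULL,
        CustomerID int NOT NULL,
        OrderDate  date NOT NULL,
        TotalDue   money NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
    );

    -- Implicit: relies on the clustering key (OrderID) being silently added
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID);

    -- Explicit: states the dependency, so coverage survives a later
    -- change of the clustered index
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Incl_OrderID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderID);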
Holy cow, you've got a lot of questions in here. Let's break this down.
Q: Will SQL "move" the existing rows to maintain the clustering, or will it let the table become "fragmented"?
Think of a database as a collection of pages - literal pieces of paper laid out on your desk. Think about the dictionary for now. If you wanted to add more words to the dictionary, you could add them in place if the pages had empty space.
When you first start out with an empty dictionary, this is relatively easy. But think about a mature dictionary with thousands of paper pages in it, all full.
When you want to add more words to that mature dictionary, odds are there isn't going to be any space left on the page. SQL Server will do a "page split" - it will take a brand new page somewhere else (at the end of the dictionary) and move some of the words over onto that new page. The good news is that immediately after that action, there's now a half-empty page at the end of your dictionary and another half-empty page in the middle, both with space to add words.
If you happen to be adding them in that order, that is. (This is why the way you load data becomes increasingly important.)
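Want to see how torn-up your dictionary has become? Ask sys.dm_db_index_physical_stats. Quick sketch - dbo.Orders is just a stand-in for your table:

    -- Measure the fragmentation left behind by page splits
    SELECT
        i.name AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id;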
Could this cause a big performance hit if the import is done one row at a time?
Forget the index for a second - adding data one row at a time is just plain inefficient regardless of the indexing structure. SQL Server is a set-based system - whenever you can work in sets, you probably should.
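Quick sketch of the difference, using made-up Orders/StagingOrders tables:

    -- Row by row: one statement, one implicit transaction, one log flush per row
    -- INSERT dbo.Orders (CustomerID, OrderDate, TotalDue) VALUES (1, '20240101', 10.00);

    -- Set-based: one statement loads the whole batch
    INSERT dbo.Orders (CustomerID, OrderDate, TotalDue)
    SELECT CustomerID, OrderDate, TotalDue
    FROM dbo.StagingOrders;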
What happens when I query the data?
You didn't ask this, but I'm asking it for you, hahaha.
Think back about the aftermath of our inserts. Now we've got a dictionary that's mostly ordered, but when you get to a few points of the dictionary, you'll have to jump to the back to read from a few other pages. If these pages are all cached in your memory (RAM, buffer pool, etc) then the overhead just isn't going to be that large. Most memory access is random anyway - it's not like SQL Server stores your dictionary in memory in order.
On the other hand, if you need to fetch the data from conventional magnetic hard drives (spinning rust), then you can end up getting a bit of a performance benefit if that data is stored in order. The real design goal here, though, is to get the data from RAM instead of getting it from drives. The difference between defragmented data on disk versus fragmented data on disk is nowhere near as significant as the difference between getting it from disk versus getting it from RAM.
Should I rather just not bother with the ordering of the rows and just add an identity column as the primary key and an index on the Date column to help with my queries?
Bingo: this is the difference between logical database design and physical database design. Programmers have to worry a lot about logical database design initially, but as long as your database is under, say, 100GB in size, you can fix physical design in post, so to speak. Put an identity field on there for starters, cluster on it, and then after being live for a few months, revisit the index design to maximize performance.
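Here's roughly what that starting point looks like - table and column names made up, obviously:

    -- Cluster on an ever-increasing identity so inserts append to the end
    CREATE TABLE dbo.Readings
    (
        ReadingID   int IDENTITY(1, 1) NOT NULL,
        ReadingDate date NOT NULL,
        Value       decimal(18, 4) NOT NULL,
        CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (ReadingID)
    );

    -- Separate nonclustered index to support the date-range queries
    CREATE NONCLUSTERED INDEX IX_Readings_ReadingDate
        ON dbo.Readings (ReadingDate);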
Now, having said that, once you're experienced with this type of decision-making, you'll be better equipped to guesstimate indexes right from the start. Even so, I usually don't put much thought into index design initially. Users never seem to query the data the way I would have expected.
Best Answer
If the intervals within a particular account do not overlap with each other, you can replace your query with a TOP (1) query that finds the last start date before your desired end date. Do the opposite to get the earliest end date after your desired start date. Together these form a small range you can check.
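A sketch of both probes, assuming a hypothetical dbo.Intervals table and the desired range in @RangeStart/@RangeEnd (indexes on (AccountID, StartDate) and (AccountID, EndDate) make each probe a single seek):

    DECLARE @AccountID  int  = 42,
            @RangeStart date = '20240101',
            @RangeEnd   date = '20240331';

    -- Last interval starting on or before the desired end date
    SELECT TOP (1) StartDate, EndDate
    FROM dbo.Intervals
    WHERE AccountID = @AccountID
      AND StartDate <= @RangeEnd
    ORDER BY StartDate DESC;

    -- Earliest interval ending on or after the desired start date
    SELECT TOP (1) StartDate, EndDate
    FROM dbo.Intervals
    WHERE AccountID = @AccountID
      AND EndDate >= @RangeStart
    ORDER BY EndDate ASC;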
If they do overlap within a particular account, then you have one of the classically hard problems, and I'd recommend you read the material that Dejan Sarka has written at http://solidqblogs.azurewebsites.net/en/businessanalytics/interval-queries-in-sql-server-part-1/ (and its follow-ups).
The problem is that an index on either startdate or enddate alone is less than ideal, because neither one by itself tells you whether an interval reaches into your desired range. A different strategy is needed instead, such as indexing values within the range (lots of storage, and you have to consider the granularity), or a creative solution such as interval trees or spatial analytics.
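For reference, this is the overlap predicate that defeats single-column indexes - each condition alone leaves one side of the range unbounded (again using the hypothetical dbo.Intervals table):

    DECLARE @RangeStart date = '20240101',
            @RangeEnd   date = '20240331';

    -- Classic overlap test: neither condition alone bounds the scan
    SELECT AccountID, StartDate, EndDate
    FROM dbo.Intervals
    WHERE StartDate <= @RangeEnd
      AND EndDate   >= @RangeStart;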