Your ix_hugetable
looks quite useless because:
- it is the clustered index (PK)
- the INCLUDE makes no difference because a clustered index INCLUDEs all non-key columns (non-key values at lowest leaf = INCLUDEd = what a clustered index is)
In addition:
- added or fk should be first
- ID is first = not much use
Try changing the clustered key to (added, fk, id)
and drop ix_hugetable
. You've already tried (fk, added, id)
. If nothing else, you'll save a lot of disk space and index maintenance
Another option might be to try the FORCE ORDER hint with table order boh ways and no JOIN/INDEX hints. I try not to use JOIN/INDEX hints personally because you remove options for the optimiser. Many years ago I was told (seminar with a SQL Guru) that FORCE ORDER hint can help when you have huge table JOIN small table: YMMV 7 years later...
Oh, and let us know where the DBA lives so we can arrange for some percussion adjustment
Edit, after 02 Jun update
The 4th column is not part of the non-clustered index so it uses the clustered index.
Try changing the NC index to INCLUDE the value column so it doesn't have to access the value column for the clustered index
create nonclustered index ix_hugetable on dbo.hugetable (
fk asc, added asc
) include(value)
Note: If value is not nullable then it is the same as COUNT(*)
semantically. But for SUM it need the actual value, not existence.
As an example, if you change COUNT(value)
to COUNT(DISTINCT value)
without changing the index it should break the query again because it has to process value as a value, not as existence.
The query needs 3 columns: added, fk, value. The first 2 are filtered/joined so are key columns. value is just used so can be included. Classic use of a covering index.
First, a query or (update) statement with a condition like WHERE posts < comments
that compares 2 columns cannot effectively use your indexes so it will probably have to do a full table scan. It might be better if you had a composite index, (posts, comments)
or the other way around, but it would still need to do a full index scan.
If the rows to be updated are very few, say 100, it's not very efficient to scan 1 million rows to update only 100.
So, the easiest thing would probably be to add such a composite index and test. If the resulting efficiency from the full index scan is acceptable, you can keep using this query.
Another thing you could do - since you are using MariaDB - is to add a VIRTUAL
(computed) column and index it:
ALTER TABLE posts
ADD COLUMN comment_posts_diff INT AS (columns - posts) PERSISTENT,
ADD INDEX more_comments_ix (comment_posts_diff) ;
Then you only need to change the condition of your queries to use:
WHERE comments - posts > 0
or:
WHERE comment_posts_diff > 0
and the index will be effectively used.
Note that there are 2 variants of computed columns, the VIRTUAL
and the PERSISTENT
ones. As it's obvious from the names, the persistent columns
actually take space on disks and the values are automatically modified when any of the columns in their definition is modified. Their advantage is that they can be indexed.
If the 'more comments'
though is just a constant string that applies to rows with comments > posts
and there is another string ('less comment'
?) that applies to the rest of the rows or a similar need, you don't really need the column notes
and any of the above. Your queries could just compute the respective value (more comments, less comment, no comments) at run time.
Or you could have a computed column to do the work for you (vitual this time, not persistent):
ALTER TABLE posts
DROP COLUMN notes,
ADD COLUMN notes VARCHAR(20) AS
CASE WHEN columns > posts THEN 'more comments'
WHEN columns = 0 THEN 'no comments'
WHEN columns = posts THEN 'comments = posts'
ELSE 'less comments'
END
VIRTUAL ;
Best Answer
You should retype the various Name columns from nvarchar(max) to nvarchar([right size]). It's unlikely that names will be up to 2GB in length, and making them max sized prevents them being used as a key in an index. A good general rule of thumb is to avoid large object data types wherever possible.
You'll probably need to make that change to the EF code, but in T-SQL:
I have also changed the column definition to
NOT NULL
there.Anyway, given indexes:
...you should end up with a plan that avoids lookups and sorts, something like:
With the small number of rows in Status and Colors, it is probably not worth indexing their Name columns right now, but that could change over time. In any case, if the name columns should be unique, you should constrain them to be so using a unique constraint or unique nonclustered index, for example:
Second approach
With the same indexes, if you are able to change the SQL, you could also separate out the Color and Status lookups:
This makes for a simpler set of operations that also happen to make life easier for the optimizer:
Note also that Unicode string literals should be prefixed with N for correct data typing.
Oh, and you should also patch your instance. It is currently SQL Server 2008 Service Pack 1 - Service Pack 4 has been available for some time now.