I am using MS SQL, and I have to run several queries on the same table on different criteria. At first I ran each query on the original table although they all share some filtering (i.e Date, status). This took a lot of time (around 2 minutes).
There are duplicates in data rows, and all indexes are NON-CLUSTERED. I am only interested in 4 columns for my criteria and the result should output the count only, for all queries.
columns needed: TABLE
, FIELD
, AFTER
, DATE
, and there is an index on each of DATE
and TABLE
.
After creating a temp table with only the fields I need, it went down to a 1:40 minutes, which is still very bad.
CREATE TABLE #TEMP
(
TABLE VARCHAR(30) NULL,
FIELD VARCHAR(30) NULL,
AFTER VARCHAR(1000) NULL,
DATE DATETIME,
SORT_ID INT IDENTITY(1,1)
)
CREATE CLUSTERED INDEX IX_ADT ON #TEMP(SORT_ID)
INSERT INTO #TEMP (TABLE, FIELD, AFTER, DATE)
SELECT TABLE, FIELD, AFTER, DATE
FROM mytbl WITH (NOLOCK)
WHERE TABLE = 'OTB' AND
FIELD = 'STATUS'
Runnig this -> (216598 row(s) affected)
Since not all queries rely on date range, I didn't include it in the query. The problem is that it's taking well above 1 minute to insert only. The above insert took 1:19 mins
I want to run something like this for several queries:
SELECT COUNT(*) AS COUNT
FROM #TEMP
WHERE AFTER = 'R' AND
DATE >= '2014-01-01' AND
DATE <= '2015-01-01'
It's a problem with the insert more than that of the selection, but the temp has way less rows than the original table which could be better than going through the table several times.
How can I optimize this?
EDIT
I have removed the sort ID, I thought the problem was with mainly the select and not insert. It was a guess.
I can't create a unique on any index since there is no unique field or rows.
I am using SQL Server 2012.
Table Info: It's a heap and has the following space usage:
name rows reserved data index_size unused
mytbl 24869658 9204568 KB 3017952 KB 5816232 KB 370384 KB
Best Answer
The question is mainly about how to optimize the select statement:
Removing the redundant projections and adding the presumed
dbo
schema:Without an index like
([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE])
SQL Server has two main options:[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
(usingIDX6
), then perform a heap (RID) lookup per row to retrieve the[AFTER]
and[DATE]
columns.Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the
[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with anINDEX(0)
orFORCESCAN
hint to scan the heap.Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.
The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.