Sql-server – Optimizing queries for 25+ million rows

performanceperformance-tuningsql serversql-server-2012

I am using MS SQL, and I have to run several queries on the same table on different criteria. At first I ran each query on the original table although they all share some filtering (i.e Date, status). This took a lot of time (around 2 minutes).

There are duplicates in data rows, and all indexes are NON-CLUSTERED. I am only interested in 4 columns for my criteria and the result should output the count only, for all queries.

columns needed: TABLE, FIELD, AFTER, DATE, and there is an index on each of DATE and TABLE.

After creating a temp table with only the fields I need, it went down to a 1:40 minutes, which is still very bad.

CREATE TABLE #TEMP
(
    TABLE VARCHAR(30) NULL,
    FIELD VARCHAR(30) NULL,
    AFTER VARCHAR(1000) NULL,
    DATE DATETIME,
    SORT_ID INT IDENTITY(1,1)
)
CREATE CLUSTERED INDEX IX_ADT ON #TEMP(SORT_ID)

INSERT INTO #TEMP (TABLE, FIELD, AFTER, DATE)
    SELECT TABLE, FIELD, AFTER, DATE 
    FROM mytbl WITH (NOLOCK)
    WHERE TABLE = 'OTB' AND
    FIELD = 'STATUS'

Runnig this -> (216598 row(s) affected)

Since not all queries rely on date range, I didn't include it in the query. The problem is that it's taking well above 1 minute to insert only. The above insert took 1:19 mins

I want to run something like this for several queries:

SELECT COUNT(*) AS COUNT
FROM #TEMP
WHERE AFTER = 'R' AND
DATE >= '2014-01-01' AND
DATE <= '2015-01-01'

It's a problem with the insert more than that of the selection, but the temp has way less rows than the original table which could be better than going through the table several times.

How can I optimize this?

EDIT

I have removed the sort ID, I thought the problem was with mainly the select and not insert. It was a guess.

I can't create a unique on any index since there is no unique field or rows.

I am using SQL Server 2012.

Table Info: It's a heap and has the following space usage:

name    rows        reserved    data        index_size  unused
mytbl   24869658    9204568 KB  3017952 KB  5816232 KB  370384 KB

Best Answer

The question is mainly about how to optimize the select statement:

SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'

Removing the redundant projections and adding the presumed dbo schema:

SELECT [AFTER], [DATE] 
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';

Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:

Scan the heap entirely (3GB+); or
Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.

Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.

Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.

The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).

If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:

CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';

Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.

Best Answer

Related Solutions

MySQL optimization – year column grouping – using temporary table, filesort

SQL Server Tempdb Space – Why Creating Clustered Index Uses All Tempdb Space

Related Question