Sql-server – Unable to Query Very Large Table

partitioningperformancequery-performancesql serversql server 2014

I have a table defined as:

CREATE TABLE [dbo].[DataTable](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Data] [varbinary](max) NULL,
    [Preview] [varbinary](max) NULL,
 CONSTRAINT [PKDataTable] PRIMARY KEY CLUSTERED ([ID] ASC))

The table contains 189M records and is partitioned according to this function/scheme:

CREATE PARTITION FUNCTION [DataPartition](bigint) AS RANGE RIGHT 
      FOR VALUES (0, 200000, 400000, 600000, 800000, 1000000, ...,
                  199000000, 199200000, 199400000, 199600000, 199800000, 200000000)

CREATE PARTITION SCHEME [DataPartition] AS PARTITION [DataPartition] TO 
     ([PRIMARY], [Data0000000K], [Data0000200K], [Data0000400K], [Data0000600K], 
      [Data0000800K], [Data0001000K], ..., [Data0199000K], [Data0199200K], [Data0199400K], 
      [Data0199600K], [Data0199800K], [Data0200000K])

So there are over 1,000 partitions on the table, which seems to be a magic number per the Microsoft Documentation. I only plan to query the table on the field that I use for partitioning, so by that logic, more partitions would be better in order to better take advantage of partition elimination (as well as making the individual partitions more manageable).

I can run the following set of queries without any problem:

select * from DataTable (nolock) where ID = 188727110
select * from DataTable (nolock) where ID = 188727112
select * from DataTable (nolock) where ID = 188727113
select * from DataTable (nolock) where ID = 188727114

Each query requires 3 physical/logical reads, so I would assume that partition elimination is taking place. However, attempting to run the next query results in no (immediate) response.

select * from DataTable (nolock) where ID in (188727110, 188727112, 188727113, 188727114)

I can't even get an estimated execution plan for the last query (I gave up after waiting for five minutes with no response). On a test database, which is only one percent the size of the full database, the latter query works fine and produces this execution plan. Also, running a TOP N query from either end of the full size table works fine, but adding a where clause to it causes the same apparent degradation.

The server this is running on has 48GB of memory, of which 32GB is allocated to the SQL Server instance. This is twice the 16GB minimum recommended in the MS doc. I'm not sure what else to consider checking. What could be causing this query to behave so poorly when the individual queries are working without delay?

[update] Here is initial tracking of the query through sp_whoisactive:

+-----------------+--------------------------------------------+--------+--------------------+----------------+---------------------+---------+--------+----------------+-------------+-----------+------------------------------------------------+-----------------+
| dd hh:mm:ss.mss |                 wait_info                  |  CPU   | tempdb_allocations | tempdb_current | blocking_session_id |  reads  | writes | physical_reads | used_memory |  status   |                  program_name                  | collection_time |
+-----------------+--------------------------------------------+--------+--------------------+----------------+---------------------+---------+--------+----------------+-------------+-----------+------------------------------------------------+-----------------+
| 00 00:00:01.173 | NULL                                       | 993    |                  0 |              0 | NULL                | 22,815  |      0 | 0              | 4,614       | runnable  | Microsoft SQL Server Management Studio - Query | 38:25.8         |
| 00 00:00:31.453 | (13ms)PAGEIOLATCH_SH:DATABASENAME:20(PFS)  | 3,695  |                  0 |              0 | NULL                | 48,035  |      0 | 1,458          | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 38:56.0         |
| 00 00:01:01.640 | (24ms)PAGEIOLATCH_SH:DATABASENAME:29(PFS)  | 6,151  |                  0 |              0 | NULL                | 71,857  |      0 | 2,678          | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 39:26.2         |
| 00 00:01:31.906 | (13ms)PAGEIOLATCH_SH:DATABASENAME:38(PFS)  | 8,414  |                  0 |              0 | NULL                | 94,325  |      0 | 4,461          | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 39:56.5         |
| 00 00:02:02.126 | (31ms)PAGEIOLATCH_SH:DATABASENAME:47(PFS)  | 10,980 |                  0 |              0 | NULL                | 119,649 |      0 | 5,823          | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 40:26.7         |
| 00 00:02:32.393 | (14ms)PAGEIOLATCH_SH:DATABASENAME:56(PFS)  | 13,570 |                  0 |              0 | NULL                | 143,519 |      0 | 7,157          | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 40:56.9         |
| 00 00:03:02.580 | (5ms)PAGEIOLATCH_SH:DATABASENAME:64(PFS)   | 16,172 |                  0 |              0 | NULL                | 164,772 |      0 | 10,170         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 41:27.1         |
| 00 00:03:32.843 | (15ms)PAGEIOLATCH_SH:DATABASENAME:74(PFS)  | 18,893 |                  0 |              0 | NULL                | 189,765 |      0 | 12,315         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 41:57.4         |
| 00 00:04:03.046 | (15ms)PAGEIOLATCH_SH:DATABASENAME:83(PFS)  | 21,571 |                  0 |              0 | NULL                | 215,194 |      0 | 13,728         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 42:27.6         |
| 00 00:04:33.216 | (32ms)PAGEIOLATCH_SH:DATABASENAME:92(PFS)  | 23,996 |                  0 |              0 | NULL                | 239,058 |      0 | 15,056         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 42:57.8         |
| 00 00:05:03.500 | (22ms)PAGEIOLATCH_SH:DATABASENAME:101(PFS) | 26,580 |                  0 |              0 | NULL                | 261,942 |      0 | 16,558         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 43:28.0         |
| 00 00:05:33.703 | (17ms)PAGEIOLATCH_SH:DATABASENAME:110(PFS) | 29,148 |                  0 |              0 | NULL                | 286,738 |      0 | 17,886         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 43:58.2         |
| 00 00:06:03.873 | (16ms)PAGEIOLATCH_SH:DATABASENAME:119(PFS) | 31,818 |                  0 |              0 | NULL                | 310,696 |      0 | 19,307         | 4,614       | suspended | Microsoft SQL Server Management Studio - Query | 44:28.4         |
+-----------------+--------------------------------------------+--------+--------------------+----------------+---------------------+---------+--------+----------------+-------------+-----------+------------------------------------------------+-----------------+

The stats in the sp_whoisactive logs increased linearly over that time. The number in the wait_info section increased linearly until reaching the number of partitions in the database, at which point the query completed. This took almost two hours.

[major update] At the time of the issue that I've described above, a long-running insert, inserting a single row at a time into DataTable, was processing. This has since completed, and I am now able to complete the problematic query without any delay. I attempted to replicate the problem by performing the insert process against the test system, but it did not occur there. In addition, I was able to complete the query while running a smaller insert process that behaves in the same manner as the longer-running one.

Best Answer

The best advice that I can give is to examine what's going on in that session when you're waiting on a query plan. Is the session using CPU? Is there a long wait event? Is there blocking? If you aren't sure how to answer those questions try downloading sp_whoisactive and save off the results every 30 seconds. You can edit that information into the question is you still aren't sure as to what's going on.

I'm aware of a few reasons as to why getting an estimated query plan can take a while. I'll go through them and see which ones might apply to your case.

1. Blocking

Creating a query plan might require a lock that's incompatible with currently held locks by other sessions. The default behavior of SQL Server is to wait forever for the lock that you need. Your query could be blocked if some other process is holding a schema modification lock on the object, but this seems unlikely because the other queries compile just fine.

2. Query compile can't get enough CPU

The query compilation process takes CPU resources to complete. If a server is under CPU pressure your compile could be delayed for that reason. This seems unlikely given that the other queries compiled just fine.

3. Query compile is waiting on some kind of memory operation

It's possible to hit memory-related waits during query compilation. I believe this only happens when the server is under some kind of pressure related to memory or active query compiles. This seems unlikely given that the other queries compiled just fine.

4. Stats update

If you allow automatic stats updates or automatic stats creation and stats updates aren't set for async, it's possible to be blocked for many minutes while waiting on those updates. I don't see why your one query would trigger a stats update but the other one wouldn't, so this seems unlikely.

5. SSMS takes a long time to render the plan

Some query plans are very large or have complex graphical query plans. Depending on the version of SSMS and the specs of the client it's possible to spend minutes waiting on a query plan to render. I suppose that this is a possibility, but the fact that the query doesn't seem to complete even when you don't ask for a plan makes it unlikely.

6. There's something about query compilation which takes a long time

Some query plans take a long time to compile. This could be because they are very complex. Sometimes it's caused by performance issues with the query optimizer which may later be improved by a product update. I will saw that I've seen long query compile times when dealing with tables with lots of partitions. It's true that Microsoft increased the partition limit past 1000 but there are some performance issues associated with going past that limit.

I missed a few things about your table definition the first time I read the question. It appears that you have 2000 different filegroups for your partition scheme. SQL Server supports that, but I suspect that it can cause issues. Can you explain what you're trying to accomplish with so many filegroups? There may be a much simpler configuration that gives you what you need.

It should also be said that partitioning a rowstore table on the same column as a clustered key isn't expected to help query performance. Without partitioning you would get clustered index seeks with your test queries. Will it really be faster to use partition elimination?