Sql-server – 2 similar queries in large sql server table – 1 takes too long

sql server

I have a huge table in my database that contains distances between cities. This enables my application to find nearby cities around the world when a starting city is selected. It contains 4 columns – ID, StartCityID, EndCityID and Distance and contains about 120 million rows. I've got indexes set up on the startcityID, endcityID, another one for both, and another one each for startcity + distance, and endcity + distance. (This is my first real dealings with indexes so not 100% sure if I'm doing it correctly).

Anyway – I do the following 2 queries:

Select distinct StartCityID
From Distances where EndCityID = 23485

and

Select distinct EndCityID 
From Distances where StartCityID = 20045

They both return the same number of cityIDs, but the top one takes 35 seconds to do, and the bottom one returns results immediately. When I look at the indexes, they seem to be set up to serve startCity and endCity in the same way. Anyone know why they might be acting differently? I'm at a loss…

NB – this may offer more insight, but the one that takes 35 seconds – if I press execute again straight away with the same ID, it returns results immediately as well that time. Unfortunately that isn't good enough for my website but it may be useful information.

Thanks

Best Answer

The explain plan (or execution plan) should tell you exactly where the difference lies, and which indexes are being used.

See here for more info on how to get an explain plan.

Related Solutions

SQL Query Optimization – Seeking on Multiple Indexes

The problem is that it might (and knowing spatial indexes, probably will) assume that the spatial filter will be a lot more selective than the time filter.

But if you have a few million records within 200km, then it could be significantly worse.

You're asking it to find records within 200km, which returns data ordered by some spatial order. Finding the records in there that are close in time means checking each one.

Or else you're finding records by time, and you're getting results in time order. Then, filtering this list to the 200km radius is a matter of checking each one.

If you filter the data in two ranges like this, it becomes hard to apply the second filter using an index. You may be better off telling it not to use the spatial index if the time filter is the tighter one.

If both are large individually, and it's only together that they are tight, then you have a more complex issue, one that people have tried to solve for a long time, and which could be nicely resolved by indexes that cover 3D (and beyond) space. Except that SQL Server doesn't have them.

Sorry.

Edit: more info...

This is a similar problem to finding time ranges that cover a particular point in time. When you search for the records that start before that point, you then have an unordered mess of end times - and vice-versa. If you look for people in the phone book whose surnames start with F, you can't hope to find the people whose first names start with R very easily. And an index on first name doesn't help either for the same reason. Finding things in that next index is hard when your first index is not an equality.

Now, if you could change your date filter into an equality filter (or series of equality filters), then you could stand a chance, except that a spatial index is a special kind of index and can't be used as the second level in a composite index.

So you're left with an awkward situation, I'm afraid. :(

Edit: Try:

select top 100000 h.Time, m.Time, h.GeoPoint.STDistance(m.GeoPoint)/1000.0
from L2V5.dbo.header h join L2.dbo.MLS_Header m
on h.mjd between m.mjd-.125 and m.mjd+.125
where h.GeoPoint.STDistance(m.GeoPoint)/1000.0 < 200
option( table hint ( h, index(ix_MJD) ) );

Notice that I'm deliberately breaking the sargability by dividing by 1000 before comparing to 200. I want this work to be done in the Key Lookup.

Mind you, you could avoid the need for the lookups (and the hints) by INCLUDEing GeoPoint and Time in both ix_MJD indexes. That'll certainly take some of the heat out of the query plan.

Sql-server – SQL Server transferring results slowly in table result set, but instantly with XML

Those XML results you keep mentioning. I do not think you are seeing what you think you are seeing. ;-)

Is FOR XML magic?

When you have the XML result in the grid, click on the left margin that contains the row number of the result set (there should be just 1 row). That should highlight the XML result. Now hit Control-C to copy the XML and then in the query editor above, put the cursor on the beginning of an empty line and hit Control-V to paste the XML. That should leave the cursor on the far right, at the end of the XML. Look down at the status bar that has the Line #, Column #, and Character #. What does it shown for Column # and Character #? Mine shows 2033, yet the XML got clipped and there should be more.

Now click on the XML itself in the result set. That takes you to another tab with the full XML result, unclipped. That is more than likely well over 2033 characters of text.

The "instantaneous" result you are seeing with the FOR XML is merely the fact that SSMS is only displaying the first 2033 characters of however many there are, where it is 5000 or 5,000,000 characters more in the actual result value. And in fact, if your query takes long enough, you can pay close attention to the query timer in the lower, right-hand corner and if you re-run the query and are returning 10+ rows, that counter can still be going after the result grid finishes displaying its truncated XML snippet.

Do the number of rows and columns greatly affect the time to display?

I put together some code (shown below), which returns 2300 columns, to test this behavior. It starts out with the FOR XML being uncommented. After the initial run, it returns immediately each time, no matter how you adjust the @Rows input param. Set it to 10, then 20, then 40. The XML always comes back immediately. But also watch the query timer as it will keep going, especially as you increase the value of @Rows.

Then comment-out the SET statement that adds the FOR XML so you get the 2300 individual fields in the grid. It might take 30 takes to run the first time. Running again it took me only 2 - 3 seconds each time, even as I increased the @Rows from 4 to 10, 20, 40, 50, even 80. Hence, the time it takes to display the results for 2000+ columns is not a function of how many rows are being returned, at least not at this small scale.

The Code

DECLARE @Select NVARCHAR(MAX) = N'SELECT TOP (@Rows)
         OBJECT_NAME(sac1.[object_id]) AS [ObjectName], *';
DECLARE @From NVARCHAR(MAX) = N'FROM   [sys].[all_columns] sac1';
DECLARE @JoinTemplate NVARCHAR(MAX) = N'
INNER JOIN [sys].[all_columns] sac{{counter}}
        ON sac{{counter}}.[object_id] = sac1.[object_id]
       AND sac{{counter}}.[column_id] = sac1.[column_id]';

DECLARE @CRLF NCHAR(2) = NCHAR(0x0D) + NCHAR(0x0A);
DECLARE @Counter INT = 2;

WHILE (@Counter < 93)
BEGIN
  SET @From += REPLACE(@JoinTemplate, N'{{counter}}', CONVERT(NVARCHAR(5), @Counter));

  SET @Counter += 1;
END;


SET @Select += @CRLF + @From;
--SET @Select += @CRLF + N'ORDER BY OBJECT_NAME(sac1.[object_id])';
SET @Select += @CRLF + N' FOR XML RAW, ELEMENTS';

EXEC sp_executesql
  @Select,
  N'@Rows INT',
  @Rows = 4;

Uncomment the SET line that has the ORDER BY to see that it takes the full 30 - 40 seconds every time you run the query (due to the sort).

Best Answer

Related Solutions

SQL Query Optimization – Seeking on Multiple Indexes

Sql-server – SQL Server transferring results slowly in table result set, but instantly with XML

Related Question