SQL Server – Optimizing SQL Server for an app that repeatedly hits the database with the exact same query

Tags: sql-server, sql-server-2014, sql-server-express

I am dealing with performance issues with a Windows app that uses SQL Server Express (2014) on the back-end.

I have managed to get this running a lot better primarily by reviewing the indexing SQL Server side but there is one particular report that is still running quite slowly.

Looking into what it is doing, the app appears to be looping and issuing thousands of very simple SELECT * queries against one table, filtered on the primary key, so each query retrieves only one record. And many of these queries are literally identical: the app isn't even varying the primary key to get different rows, it is asking for exactly the same record back from the database apparently each time it needs it, up to a hundred times in only a few seconds.

This is an example report that takes about 10-15 seconds to run when the server is quiet. I've added the number of times each query runs as a comment:

SELECT * FROM "Patient" WHERE "_Recno" = 35051  -- (runs 106 times)
SELECT * FROM "Client" WHERE "_Recno" = 15607   -- (99 times)
SELECT * FROM "SpeciesEntry" WHERE "_Recno" = 180   -- (97)
SELECT * FROM "Table" WHERE "_Recno" = 9    -- (97)
SELECT * FROM "DefaultEntry" WHERE "_Recno" = 2615  -- (96)
SELECT * FROM "Table" WHERE "_Recno" = 34   -- (96)
SELECT * FROM "DefaultEntry" WHERE "_Recno" = 2562  -- (84)
SELECT * FROM "Table" WHERE "_Recno" = 33   -- (84)
SELECT * FROM "Treatment" WHERE "_Recno" = 1682 -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 1819 -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 927  -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 934  -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 935  -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 940  -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 942  -- (33)
SELECT * FROM "Treatment" WHERE "_Recno" = 944  -- (33)
SELECT * FROM "OptionWP" WHERE "_Recno" = 103   -- (3)
SELECT * FROM "OptionWP" WHERE "_Recno" = 54    -- (1)
SELECT * FROM "PatientEstimate" WHERE "_Recno" = 8928   -- (1)
SELECT * FROM "Phrase" WHERE "_Recno" = 9718    -- (1)
SELECT * FROM "Table" WHERE "_Recno" = 4    -- (1)
SELECT * FROM "BreedEntry" WHERE "_Recno" = 3283    -- (1)

The comment after each query is the number of times that exact query is executed; e.g. the query SELECT * FROM "Patient" WHERE "_Recno" = 35051 is executed 106 times, with that _Recno. In this instance there are actually 1,031 queries executed to build this report (the number varies) – the 22 or so above are the distinct queries.

Now each query above runs very fast, on the order of a few tens of microseconds in each case. In fact, if you add up all 1,031 queries used to build this report, the total time for all of them is only 59,193 microseconds, or about 59 milliseconds.

So the issue and the delay appear to be overhead: although there is only about 59 ms of actual database time, the report takes about 10-15 seconds to run for the client because it is going back and forth with over 1,000 queries.

Note that in most cases the client app and SQL Server are on the same machine, with multiple instances of the client accessed through RDP. In a few cases the client is on a different machine on the LAN, and I imagine performance is worse there. But you can take it that in most cases there should not be a network issue, as both the client app and SQL Server are on the same physical box.

The ten seconds is even sort of acceptable; the issue is that at busier times it can increase to as much as a minute or more.

Any thoughts on how to go about optimising this? If I had access to the app's source, I'd obviously replace all of this with a single query (or a few) using joins, but that isn't an option; the app is a black box. All I can do is optimize from the SQL Server side.

Talking to the client, while performance is bad whether they are using it through RDP or a remote client app installation, performance is a lot worse with the remote client app, and this is more of an issue for them. So any suggestions regarding things I can look at to improve performance there, with regard to the network or whatever, would be appreciated. One thing to note is that this SQL 2014 box is now virtualized; previously they were using, I think, 2008 or 2012, not virtualized, and they say this report was quicker then. They have other reasons for wanting it virtualized; moving it off virtualization is not an option.

It connects using Windows authentication and (I am pretty sure) TCP/IP, and I don't think I'd be able to change this. As far as I can make out, it is not dropping and re-establishing connections; it appears to be using connection pooling at least.

I use Hibernate in my day job and I have come up against this sort of scenario before, with the ORM generating thousands of queries, and my usual solution there is to look at the fetching strategy (lazy vs eager loading) in the code or indeed in the case of reporting usually rewrite the whole thing in SQL. In this case though the software is what it is, a Windows executable, and there is nothing I can do about that, I can only address the SQL side.

My understanding is that the vendor is no longer supporting this particular version and has gone back to a version that uses flat files rather than SQL at all. That wouldn't work for the client – they have this DB integrated with various other things. It's niche software, and like much software of that sort it's technically terrible on the back end but has functionality that users in the niche need. Anyway, I can't change the software, only what goes on inside SQL Server. It was unusable, and I have got it usable, so I have made progress working within those constraints.

There is no blocking or locking whatsoever; I have checked that. That was actually the main performance problem I fixed, but nothing has blocked long enough to get logged for the last month or so. Memory isn't really a lever, as SQL Server Express is limited to about 1 GB anyway; from looking at it, though, I don't think it has a memory issue. Disk, if anything, seems to be the biggest hardware choke point.

Best Answer

The queries being identical is not the problem, and in some ways helps since the Execution Plan is cached and the data pages needed for the query should still be cached. The issue would then tend to be the per-connection overhead of authentication and initializing the session.

The first thing to look into is whether connection pooling is being used. You can test for this with SQL Server Profiler:

1. Select the "RPC:Completed" event in the "Stored Procedures" category.
2. Make sure that at least "TextData", "ClientProcessID", and "SPID" are checked for that event (you can select other columns if you like).
3. Go to "Column Filters", select "TextData", and add the following "Like" condition: exec sp[_]reset[_]connection.
4. Run the trace.

If you see instances of exec sp_reset_connection coming through, then connection pooling is being used. But that doesn't necessarily mean that it is this app that is using it. So look at one of the "SPID"s in the Profiler trace and try to match it to the output of the following query:

SELECT sssn.login_time,
       DATEDIFF(MILLISECOND, conn.connect_time, sssn.login_time)
                  AS [MillisecondsBetweenConnectionAndSessionStart],
       conn.*,
       sssn.[program_name],
       sssn.host_process_id,
       sssn.client_interface_name,
       sssn.login_name,
       qry.[text]
FROM sys.dm_exec_connections conn
INNER JOIN sys.dm_exec_sessions sssn
        ON sssn.session_id = conn.session_id
OUTER APPLY sys.dm_exec_sql_text(conn.most_recent_sql_handle) qry
WHERE conn.session_id <> conn.most_recent_session_id
OR    DATEDIFF(MILLISECOND, conn.connect_time, sssn.login_time) > 50
ORDER BY conn.connect_time;

The field on the far right contains the most recent query executed. That should confirm (by what it is doing) that the session belongs to the app in question.
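As an aside, since Profiler is deprecated in recent versions, an Extended Events session can capture the same sp_reset_connection signal. A rough sketch (the session name "PoolingCheck" is arbitrary, and I'd suggest verifying against a known pooled connection first, since I'm not certain sp_reset_connection surfaces through rpc_completed identically in every version):

-- Sketch: an Extended Events session roughly equivalent to the
-- Profiler trace described above (session name is arbitrary)
CREATE EVENT SESSION [PoolingCheck] ON SERVER
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.client_app_name, sqlserver.client_pid, sqlserver.session_id)
    WHERE ([object_name] = N'sp_reset_connection')
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [PoolingCheck] ON SERVER STATE = START;
-- View results via the session's "Watch Live Data" option in SSMS,
-- or query the ring_buffer target; stop with STATE = STOP when done.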

If you cannot find any evidence of connection pooling being used in general, or at least being used by this app, then the Connection String used by the app needs to be updated to enable connection pooling.
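For reference, pooling is controlled by keywords in the ADO.NET connection string. A minimal sketch, assuming Windows authentication as described in the question (the server and database names here are placeholders):

Server=MYSERVER\SQLEXPRESS;Database=AppDb;Integrated Security=SSPI;
Pooling=true;Min Pool Size=1;Max Pool Size=100;

Note that Pooling=true is the default for SqlClient, so an app that isn't pooling has usually either set Pooling=false explicitly or isn't reusing identical connection strings (pools are keyed on the exact connection string).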

...in most cases the client app and SQL Server are on the same machine, and multiple instances of the client are accessed through RDP. In a few cases the client is on a different machine on the LAN...

Based on the above info from the question, it appears that there are multiple clients connecting, correct? If so, then connection pooling, while probably still a good idea and helpful, will be less effective, since each client maintains its own pool of connections. Meaning, 5 instances (or however many there are) will still create 5 separate connection pools; each pool will reduce the connection-startup overhead for its respective app instance, but cannot reduce the overhead beyond that, down to a single shared connection. Also keep in mind that even with connection pooling, if an app does not properly close its connections before attempting to open new ones, you will still have multiple connections / sessions coming from a given instance of the app.

In this case, or in the case of a single app that is not using connection pooling and whose Connection String cannot be updated to use it, then, if it is possible to update the app at all, it would be quite helpful to implement a caching layer such as Redis or memcached (and I believe both AWS and Azure offer cloud-based caching services). These repetitive hits can often be served from the cache and bypass the RDBMS altogether (for a specified amount of time, of course), which is a large part of why these things exist.


Now that I have caught up with the recent comments on the question, it appears that most likely neither the Connection String nor any part of the app can be modified. In that case there is not much one can do, outside of checking whether the connections made from the app running on the same server as SQL Server use Shared Memory or TCP/IP. If the same-server connections are using TCP/IP, check whether the Shared Memory protocol is enabled for SQL Server, and if not, enable it. This is not a guaranteed improvement, since it is possible that the connection string forces the protocol to TCP/IP (e.g. using the syntax server=tcp:{something}), but it is still worth trying, as that syntax is likely not being used.
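You can see which protocol each connection is actually using via the net_transport column of sys.dm_exec_connections. A quick sketch (the column alias is just for readability):

-- Which protocol is each user connection using?
-- net_transport is "Shared memory", "TCP", "Named pipe", etc.
SELECT sssn.[program_name],
       sssn.[host_name],
       conn.net_transport,
       COUNT(*) AS [NumConnections]
FROM sys.dm_exec_connections conn
INNER JOIN sys.dm_exec_sessions sssn
        ON sssn.session_id = conn.session_id
WHERE sssn.is_user_process = 1
GROUP BY sssn.[program_name], sssn.[host_name], conn.net_transport;

If the rows for the app in question show "TCP" even though host_name is the SQL Server machine itself, that is the case where enabling Shared Memory could help.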



UPDATE

from comment on this answer:

[the query above] gives me nothing, but looking at sys.dm_exec_connections.connect_time for the app in question is several hours to days back. ... in every case ... the difference between connection and session start is always under 50ms (between 3 and 16)

This could very well indicate the problem, or at least a large part of it. If a connection was made hours or days ago and the session started immediately thereafter, then either the app is a desktop app that makes a single connection / session and executes all queries against it (similar to how SSMS query tabs work), or the app code is incorrectly not closing the connection object each time, in which case you could be seeing a lot of concurrent connections, many of them effectively abandoned but still taking up memory (and on SQL Server Express there might be a connection limit anyway).

Try the following query, adapted from the original query above:

SELECT conn.connect_time,
       sssn.login_time,
       CASE WHEN conn.last_read > conn.last_write THEN conn.last_read
            ELSE conn.last_write END AS [LastActivity],
       GETDATE() AS [Now],
       DATEDIFF(MILLISECOND, conn.connect_time, sssn.login_time)
                  AS [MillisecondsBetweenConnectionAndSessionStart],
       DATEDIFF(MILLISECOND, sssn.login_time, CASE WHEN conn.last_read > conn.last_write
                                               THEN conn.last_read ELSE conn.last_write END)
                  AS [MillisecondsBetweenSessionStartAndLastActivity],
       DATEDIFF(MILLISECOND, CASE WHEN conn.last_read > conn.last_write
                         THEN conn.last_read ELSE conn.last_write END, GETDATE())
                  AS [MillisecondsBetweenLastActivityAndNow],
       sssn.[program_name],
       sssn.host_process_id,
       sssn.client_interface_name,
       sssn.login_name,
       qry.[text],
       conn.*
FROM sys.dm_exec_connections conn
INNER JOIN sys.dm_exec_sessions sssn
        ON sssn.session_id = conn.session_id
OUTER APPLY sys.dm_exec_sql_text(conn.most_recent_sql_handle) qry
WHERE  sssn.is_user_process = 1
ORDER BY [MillisecondsBetweenLastActivityAndNow] DESC;

If connection pooling is being used then you will see rows with high values for the MillisecondsBetweenConnectionAndSessionStart field but much lower values for the MillisecondsBetweenSessionStartAndLastActivity field. The reason is that a connection is established and re-used. Each time the connection is re-used, the login_time resets to the most recent SqlConnection.Open event, and then immediately executes the query.

If you have a desktop app with a stable connection (like SSMS), you will see the opposite of the connection-pooling behavior: rows with low values for the MillisecondsBetweenConnectionAndSessionStart field but much higher values for the MillisecondsBetweenSessionStartAndLastActivity field. The reason is that the connection, once established, is never closed and simply has queries continue to be executed against it.

If you have an app that isn't closing its connections but is still opening new ones (due to a mistake, or a misunderstanding of how connection pooling works -- whether or not pooling is enabled), then you will not only have a lot of rows, but they will have low values for the MillisecondsBetweenConnectionAndSessionStart field and much higher values for the MillisecondsBetweenLastActivityAndNow field. This happens when a connection is made, used once, and then SqlConnection.Open() is called again before calling SqlConnection.Close() or SqlConnection.Dispose() (Dispose() calls Close(), and is called automatically if the SqlConnection object was created in a using() block).