Sql-server – Exploring async DB IO – what is the cause for “The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement”

sql serversql-server-2012

Please, observe the following code:

static async Task<Result> IOAsync(string duration)
{
    using (var conn = new SqlConnection("server=localhost;database=abc;uid=xyz;pwd=123;Max Pool Size=1000"))
    using (var cmd = new SqlCommand("WAITFOR DELAY @Duration;SELECT GETDATE(),NEWID()", conn))
    {
        cmd.CommandTimeout = 0;
        cmd.Parameters.Add(new SqlParameter("Duration", duration));

        await conn.OpenAsync();
        using (var reader = await cmd.ExecuteReaderAsync())
        {
            await reader.ReadAsync();
            return new Result(reader.GetGuid(1), reader.GetDateTime(0));
        }
    }
}

All it does is fire a simple SQL query to a local Sql Server 2012 instance using async I/O API. Notice the Max Pool Size is set to 1000.

This function is then called by the following code:

var sw = new Stopwatch();
sw.Start();
var res = Task.WhenAll(Enumerable.Repeat(duration, count).Select(IOAsync).ToArray()).GetAwaiter().GetResult();
sw.Stop();
var idCount = res.Select(o => o.Id).Distinct().Count();
if (idCount != count)
{
    // Hmm, should never happen.
    Console.Error.WriteLine("ERROR: Count of distinct IDs is less than {0} - {1} !", count, idCount);
}
else
{
    Console.WriteLine("Count of distinct timestamps = " + res.Select(o => o.Timestamp).Distinct().Count());
    Console.WriteLine("Elapsed time = " + sw.Elapsed);
}

And I am calling this code with various combinations of duration and count to explore the behavior of the async database IO.

Please, find below example outputs:

PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 100 -d 16 -s Async
Running a single cold run (to be discarded).
100 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 28
Elapsed time = 00:00:16.1041368
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 200 -d 16 -s Async
Running a single cold run (to be discarded).
200 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 56
Elapsed time = 00:00:16.1950685
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 300 -d 16 -s Async
Running a single cold run (to be discarded).
300 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 84
Elapsed time = 00:00:16.2898671
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 400 -d 16 -s Async
Running a single cold run (to be discarded).
400 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 113
Elapsed time = 00:00:16.3857622
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 500 -d 16 -s Async
Running a single cold run (to be discarded).
500 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 144
Elapsed time = 00:00:16.4950963
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 600 -d 16 -s Async
Running a single cold run (to be discarded).
600 IO tasks using the Async strategy, 00:00:16 each.
Count of distinct timestamps = 178
Elapsed time = 00:00:16.6130098
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 700 -d 16 -s Async
Running a single cold run (to be discarded).
700 IO tasks using the Async strategy, 00:00:16 each.
System.Data.SqlClient.SqlException (0x80131904): Connection Timeout Expired.  The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement.  This could be because the pre-login handshake
 failed or the server was unable to respond back in time.  The duration spent while attempting to connect to this server was - [Pre-Login] initialization=0; handshake=15001;  ---> System.ComponentModel.Win32Exception (0x
80004005): The wait operation timed out
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, Boolean& marsCapable)
   at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecu
rity, Boolean withFailover)
   at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCr
edential credential, TimeoutTimer timeout)
   at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redi
rectedUserInstance)
   at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecu
rePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData)
   at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnection
Options userOptions)
   at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbCo
nnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AsyncIODemo.Program.<IOAsync>d__1c.MoveNext() in c:\Users\mkharitonov\Documents\Visual Studio 2013\Projects\AsyncIODemo\AsyncIODemo\Program.cs:line 270
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AsyncIODemo.Program.Run(Int32 count, String duration, IOStrategy strategy, Nullable`1 cpu, Nullable`1 maxThreadCount) in c:\Users\mkharitonov\Documents\Visual Studio 2013\Projects\AsyncIODemo\AsyncIODemo\Program.cs
:line 181
   at AsyncIODemo.Program.Main(String[] args) in c:\Users\mkharitonov\Documents\Visual Studio 2013\Projects\AsyncIODemo\AsyncIODemo\Program.cs:line 86
ClientConnectionId:3bcf9df5-fe95-4b8c-a6f2-f916db12d0e1
Error Number:-2,State:0,Class:11
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release> .\AsyncIODemo.exe -c 700 -d 15 -s Async
Running a single cold run (to be discarded).
700 IO tasks using the Async strategy, 00:00:15 each.
Count of distinct timestamps = 199
Elapsed time = 00:00:30.0608164
PS C:\work\AsyncIODemo\AsyncIODemo\bin\Release>

At first I am starting 100 async IOs, each taking 16 seconds. And it works great, so I increase the count to 200, 300, …, 700 – boom! A failure, which I have never encountered before.

Now, I know the formal cause – the default Connection Timeout is exactly 15 seconds. Indeed, increasing the connection timeout (in the connection string) or firing 700 IOs for 15 seconds, instead of 16 – works.

But something bad happens – the overall time jumps by a factor of two. It is as if the database server refuses to accept that many concurrent IOs (the exact figure is between 600 and 700), but the client side does not know about it and attempts to open all of them.

Anyway, through error and trial (using binary search) I have found that limit to be 648.

My question is – where does this number come from? How to change it? The max user connections is 32768, but that is not the case in reality. Is this because this is a Development license of an otherwise Enterprise edition of Sql Server 2012?

Best Answer

Mark,

Basically what is happening is that there are no more worker threads to pick up the login request or they aren't picked up in a reasonable amount of time. This comes from many logins happening around the same time with other worker threads already fulfilling requests.

The can be seen in the following DMVs:

sys.dm_exec_connections
sys.dm_exec_sessions
sys.dm_os_tasks
sys.dm_os_workers
sys.dm_os_schedulers

Since your server has 12 cores, by default the number of worker threads will be:

Max Worker Threads

MWT = 512 + (CPUs - 4) * 16
MWT = 512 + (12 - 4) * 16
MWT = 512 + 8 * 16
MWT = 512  + 128
MWT = 640

This is where you're hitting the login failures which to me is saying that at 640 runs of the test app and the server automatically setting up for 640 worker threads that this is the culprit.

Your options are:

  1. Change the max worker threads
  2. Stagger the logins
  3. Re-use the sessions you've already created