Sql-server – Identifying why query is sleeping with open transactions

errorssp-whoisactivesql serversql-server-2012transaction

We have a 3rd-party application which is sporadically leaving connections in a sleeping state with an open transaction. This is causing many other queries to be blocked. I've been logging sp_WhoIsActive to a table, and below is the lead blocker. I believe the Java application is using Hibernate ORM.

The query is against a very small table with just a few thousand rows. If I run this statement manually, it finishes immediately. The query in my sp_WhoIsActive table shows NULL in the wait_info table, and 10 million reads, which sounds suspicious. I force statistics updates every day on the entire database as we've seen hundreds of millions of reads on similar small queries in the past. That seems to have helped, but this still doesn't sound right.

How can I troubleshoot this to see what's happening? The record does not show the execution plan.

(@P0 decimal(38,0),@P1 decimal(38,0),@P2 nvarchar(4000))DELETE FROM 
  SCHEDULE WHERE (PROCESS_ID= @P0 AND PARENT_ID= @P1 AND SCHEDULE!= @P2)

Best Answer

Been there 10+ years ago, here's what we did

We had the same issue with Kronos/Payroll. It get's pretty serious when people cannot get their paycheck. Like yourself, I narrowed things down to open/sleeping transactions and it was Java related as well. What's more when we contacted the vendor, they refused to take any responsibility for it. So we internally had to find a solution. I reviewed the vendor API and it definitely looked like the issue was on the vendor side.

In my case, which MAY not be your case, I discovered by working with the payroll staff that these open sleeping transactions had actually completed their work. So we implemented the following job that still looks for these processes and kills them to this day.

Basically, a T-SQL job shown in Listing A calls stored procedure in Listing B to kill the open/sleeping transactions, and the data about these issues is stored in Listing C. My apologies for any legacy T-SQL faux pas you may find--this code is crusty. Please try it in your dev/test environment first to see if it works for you. Just talking about killing these is just as creepy today as it was then--I don't like it.

Oh yes, the kicker for us all is that killing the open/sleeping transactions did NOT roll them back--but allowed them to complete (as well as stop all the blocking chains). My guess is that they were not programmed correctly to be distributed but rather they programmed the app and SQL Server to be on the same server.

Listing A: T-SQL job killing open/sleeping transactions (first four lines should be commented out)

This process kills transactions that have the following attributes
1.) They're blocking other transactions
2.) They're sleeping (i.e. doing no work)
3.) They're in a transaction that has not closed (continues to remain open)



while (1=1)
begin
declare @sleepingblocks int
exec sp_kill2 @sleepingblocks=@sleepingblocks output
--print @sleepingblocks
while (@sleepingblocks > 0)
begin
      waitfor delay '00:00:15'
      exec sp_kill2 @sleepingblocks=@sleepingblocks output
--    print @sleepingblocks
end
end

Listing B: the sp_kill2 stored procedure looks for open/sleeping transactions and kills them

USE [master]
GO
/****** Object:  StoredProcedure [dbo].[sp_kill2]    Script Date: 10/13/2017 4:07:20 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

    ALTER proc [dbo].[sp_kill2]
    @sleepingblocks int output
    as
    --declare @sleepingblocks int
    --select count(*) from kronos..totaleventts where totalizationstatus <> 0
    select @sleepingblocks=0
    set nocount on
    create table #one
    (
          RowID int primary key identity(1,1),
          spid int,
          sqlhandle binary(20)
    )

    create table #two
    (
          RowID int primary key identity(1,1),
          spid int,
          sqlhandle binary(20)
    )

    create table #three
    (
          RowID int primary key identity(1,1),
          killcmd varchar(4000),
        spid varchar(4000)

    )


    insert into #one(spid,sqlhandle)
    select distinct T.spid,T.sql_handle from master..sysprocesses T --where spid=169
    join (select distinct blocked,spid,sql_handle from master..sysprocesses where blocked <> 0) D 
    on T.spid = D.blocked
    where T.status like '%sleeping%' and open_tran > 0
    select * from #one

    waitfor delay '00:00:15'

    insert into #two(spid,sqlhandle)
    select distinct T.spid,T.sql_handle from master..sysprocesses T --where spid=169
    join (select distinct blocked,spid,sql_handle from master..sysprocesses where blocked <> 0) D 
    on T.spid = D.blocked
    where T.status like '%sleeping%' and open_tran > 0
    select * from #two

    insert into #three(killcmd,spid)
    select distinct 'kill ' +  convert(varchar(255),a.spid),convert(varchar(255),a.spid) from #one a join #two b on a.spid = b.spid and a.sqlhandle = b.sqlhandle

    declare @currow int, @maxrow int, @cmdstring varchar(8000)
    select @currow=1
    select @maxrow=0
    select @maxrow = count(*) from #three

    while (@currow <=@maxrow)
    begin



    select @cmdstring = 'select distinct spid
    ,kpid
    ,blocked
    ,waittype
    ,waittime
    ,lastwaittype
    ,waitresource
    ,sp.dbid
    ,uid
    ,cpu
    ,physical_io
    ,memusage
    ,login_time
    ,last_batch 
    ,ecid
    ,open_tran
    ,status  
    ,sid
    ,hostname
    ,program_name
    ,hostprocess
    ,cmd 
    ,nt_domain 
    ,nt_username
    ,net_address 
    ,net_library
    ,loginame
    ,context_info
    ,sql_handle
    ,stmt_start
    ,stmt_end
    ,request_id 
    ,text from master..sysprocesses sp cross apply sys.dm_exec_sql_text(sql_handle) as st where sp.spid ='+spid  from #three where RowID = @currow

    --print @cmdstring
    --select @cmdstring = 'dbcc inputbuffer('+spid+')' from #three where RowID = @currow


    insert into dbadmin..KilledProcesses3(
    spid
    ,kpid
    ,blocked
    ,waittype
    ,waittime
    ,lastwaittype
    ,waitresource
    ,dbid
    ,uid
    ,cpu
    ,physical_io
    ,memusage
    ,login_time
    ,last_batch 
    ,ecid
    ,open_tran
    ,status  
    ,sid
    ,hostname
    ,program_name
    ,hostprocess
    ,cmd 
    ,nt_domain 
    ,nt_username
    ,net_address 

    ,net_library
    ,loginame
    ,context_info
    ,sql_handle
    ,stmt_start
    ,stmt_end
    ,request_id 
    ,text
    )
    exec (@cmdstring)
    --print @cmdstring

    --select @cmdstring = 'dbcc inputbuffer('+spid+')' from #three where RowID = @currow
    select @cmdstring = killcmd from #three where RowID = @currow
    --print @cmdstring
    exec (@cmdstring)
    select @currow = @currow +1
    select @sleepingblocks =1
    end

    drop table #one
    drop table #two
    drop table #three

    /*
    select 
    spid
    ,kpid
    ,blocked
    ,waittype
    ,waittime
    ,lastwaittype
    ,waitresource
    ,sp.dbid
    ,uid
    ,cpu
    ,physical_io
    ,memusage
    ,login_time
    ,last_batch 
    ,ecid
    ,open_tran
    ,status  
    ,sid
    ,hostname
    ,program_name
    ,hostprocess
    ,cmd 
    ,nt_domain 
    ,nt_username
    ,net_address 
    ,net_library
    ,loginame
    ,context_info
    ,sql_handle
    ,stmt_start
    ,stmt_end
    ,request_id 
    ,text
    from master..sysprocesses sp cross apply sys.dm_exec_sql_text(sql_handle) as st where sp.spid =169
    select * from dbadmin..KilledProcesses3

    truncate table dbadmin..killedprocesses3


    */

Listing C: Processes3 table contains data about the sleeping/open transactions

USE [dbadmin]
GO

/****** Object:  Table [dbo].[killedprocesses3]    Script Date: 10/13/2017 4:08:56 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING OFF
GO

    CREATE TABLE [dbo].[killedprocesses3](
        [spid] [smallint] NOT NULL,
        [kpid] [smallint] NOT NULL,
        [blocked] [smallint] NOT NULL,
        [waittype] [binary](2) NOT NULL,
        [waittime] [bigint] NOT NULL,
        [lastwaittype] [nchar](32) NOT NULL,
        [waitresource] [nchar](256) NOT NULL,
        [dbid] [smallint] NOT NULL,
        [uid] [smallint] NULL,
        [cpu] [int] NOT NULL,
        [physical_io] [bigint] NOT NULL,
        [memusage] [int] NOT NULL,
        [login_time] [datetime] NOT NULL,
        [last_batch] [datetime] NOT NULL,
        [ecid] [smallint] NOT NULL,
        [open_tran] [smallint] NOT NULL,
        [status] [nchar](30) NOT NULL,
        [sid] [binary](86) NOT NULL,
        [hostname] [nchar](128) NOT NULL,
        [program_name] [nchar](128) NOT NULL,
        [hostprocess] [nchar](10) NOT NULL,
        [cmd] [nchar](16) NOT NULL,
        [nt_domain] [nchar](128) NOT NULL,
        [nt_username] [nchar](128) NOT NULL,
        [net_address] [nchar](12) NOT NULL,
        [net_library] [nchar](12) NOT NULL,
        [loginame] [nchar](128) NOT NULL,
        [context_info] [binary](128) NOT NULL,
        [sql_handle] [binary](20) NOT NULL,
        [stmt_start] [int] NOT NULL,
        [stmt_end] [int] NOT NULL,
        [request_id] [int] NOT NULL,
        [text] [nvarchar](max) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF
    GO