I've been trying to track down the source of my tempdb
log filling up every few weeks where the tempdb
log space used will continue to grow until it reaches 99% and stays there for hours while the log file grows. After adding in some additional logging and email alerts, I think I identified the source of the problem, at least for this incident, not sure if the cause is the same in past incidents.
dbcc opentran
in the tempdb
reported this:
Oldest active transaction:
SPID (server process ID): 47s
UID (user ID) : -1
Name : sort_init
LSN : (41:3017:42)
Start time : Dec 19 2020 10:03:20:410AM
SID : 0x01
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
The spid
is 47s
which I discovered is a system spid. I ran sp_WhoIsActive
with @show_system_spids = 1
which brought back data on the process. After running sp_WhoIsActive
over the course of an hour, tempdb_allocations
ended up at 4,221,088
, tempdb_current
was 4,128,656
, open_tran_count
was 1
(throughout the entire incident) and sql_text
was always null
. After running out of ideas, I ended up restarting SQL Server. Several times prior to restarting SQL server, I issued a checkpoint
in tempdb
but that didn't help, likely since the transaction from the system process was still active.
The Start time
from the dbcc opentran
output is 10:03am and it was 1:00pm when I restarted SQL Server. I ran dbcc opentran
a few times over the course of an hour and the same output was returned each time.
In this particular case, the server was rebooted and it restarted at 10:02:54 — 26 seconds before this long running system process started at 10:03:20. Past incidents don't appear to have happened immediately after a system reboot. I don't have many details on past incidents, but at least the last time this occurred 9 days ago, in the sp_WhoIsActive
output I collected (without @show_system_spids
set to 1
) there were no long running processes captured which likely means it was a similar system process running / filling up the tempdb
log. This 99% full tempdb
log incident 9 days ago was ongoing for 7 hours.
Nothing stands out in the SQL Server log and Windows logs. I'm running Microsoft SQL Server 2017 (RTM-CU15-GDR) (KB4505225) – 14.0.3192.2 (X64).
Is there a way to determine what a system SPID is doing? Or any tips on how I can better troubleshoot this problem when it happens next time? The only clue from this incident is the Name
value of sort_init
in the dbcc opentran
output which could be helpful but seems generic at the same time.
Best Answer
There are some newer DMVs designed specifically for troubleshooting what is happening in TEMPDB. Take a look at sys.dm_db_session_space_usage and sys.dm_db_task_space_usage. See this link for a discussion of those DMSVs