Sql-server – SQL Server 2005 tempdb Time out occurred while waiting for buffer latch type 4 (pagenum)

sql serversql-server-2005tempdb

Three separate processes have failed in the last two days with very similar errors:

Time-out occurred while waiting for buffer latch type 4 for page (1:1189832), 
database ID 2.

Time-out occurred while waiting for buffer latch type 4 for page (1:1189827), 
database ID 2.

Time-out occurred while waiting for buffer latch type 4 for page (1:1189827), 
database ID 2.

Two of them are exactly the same, the other is accessing a slightly different pagenum. Database #2 is tempdb. I tried to run DBCC PAGE(tempdb, 1, 1189827, 1) but the output is just gibberish to me. I'm fairly lost.

Does anyone know what the root cause of these timeouts might be? (I've seen dozens of suggestions on other forums, among which was memory corruption)

Any help would be greatly appreciated. If you would like to read the DBCC PAGE output, ask and I will post it.

Best Answer

By the time that you whip out dbcc page and try to look at the page, it's possibly/probably being used for something else anyway. It is tempdb, after all.

I've been at this since SQL 6.5 and I have only ever seen these sorts of page timeouts caused by slow or misconfigured storage. Every incident I can remember offhand, it's been a SAN. If you have a SAN, you should definitely talk to your SAN guy/gal and find out if they have been seeing unusual load on the system or if they have changed anything lately. I've seen problems with flaky fiber switches and cables, for example, or maybe they moved your data onto slower disks lately.

Beyond that, Have a hard look at your drives, SAN or otherwise. Look at the "Avg. Disk Sec/Read" and "Avg. Disk Sec/Write" numbers for the drive that the tempdb MDF file is stored on while stuff is happening, preferably while you are getting errors. It's helpful to look at the bytes/second numbers as well, to get a feel for what the drives are being asked to do (a drive with 1000 ms latency numbers that is being asked to push only 1 MB/sec is one thing, a drive with 50 ms latency when pushing at 10 MB/s is another.)

If the drive is overwhelmed, you may need to reduce the I/O going to the drive. Tactics would include moving tempdb files to another device, creating additional tempdb data files on other devices, tuning queries (any unnecessary GROUP BY or DISTINCT?), moving any data files that are also on that storage device to somewhere else, etc.