We have a very strange problem with a new SQL Server 2019 on 2 new physical machines:
Infrastructure:
Starting a new installation of SQL Server 2019 Enterprise (15.0.2000.5 / X64 on Windows Server 2019 Standard 10.0 / Build 17763) on 2 new physical machines (for AlwaysOn replicas).
The new machines are a LENOVO :
- ThinkSystem SR630 – [7X02CTO1WW]
- 1 CPU : 1 Xeon Gold 6208U – 2.90 Ghz (16 cores x 2 – hyperthreading)
- 256 Gb RAM (32 Gb x 8)
The problem appears systematically on the 2 new machines…
Test:
The test that generate the malfunctions is as follows:
- creation of a database
- enlarging the database files and switching to SIMPLE recovery mode
- creation of a table
- filling the table with 1 million rows
- execution of the test request (calculation of aggregated time intervals – collapsing)
It is this last query (executed almost 10 times) which very often leads to an error with the following message:
Msg 601, Level 12, State 1, Line … Could not continue scan with NOLOCK due to data movement.
Of course, at no point did we implement the NOLOCK
hint or the READ UNCOMMITTED
isolation level. But… information is recorded in the SQL Server event log:
- time-out for buffer latch in pages of tempdb
- sometimes a dump is generated
Example of « buffer latch » messages :
A time-out occurred while waiting for buffer latch — type 4, bp 00000292CE3D60C0, page 9:18634, stat 0x10b, database id: 2, allocation
unit Id: 422212527063040/140737550352384, task 0x00000292AB073468 : 9,
waittime 300 seconds, flags 0x100000001a, owning task
0x00000292AB06B848. Not continuing to wait. A time-out occurred while
waiting for buffer latch — type 4, bp 00000292CE398340, page 6:10372,
stat 0x10b, database id: 2, allocation unit Id:
422212527063040/140737550352384, task 0x00000292AB07B468 : 8, waittime
300 seconds, flags 0x1a, owning task 0x00000292AB073468. Not
continuing to wait. A time-out occurred while waiting for buffer latch
— type 4, bp 00000292CE3DC480, page 9:18655, stat 0x10b, database id:
2, allocation unit Id: 422212527063040/140737550352384, task
0x00000292AB703C28 : 12, waittime 300 seconds, flags 0x1a, owning task
0x00000292AB07B468. Not continuing to wait.
Our investigations:
- Installing CU6 and after CU7 does not solve the problem
- Disk failure has been sweeped, because we create the database on the 3 different hard disk drives. And the problem remains, even when we move the tempdb.
- The « buffer latch » information likely indicates a memory fault. But no hardware information appears to be about the RAM in the servers (we performed a complete memory test).
- While soft NUMA is enabled, we performed complementary tests by disabling SOFT NUMA, but it does not solve the problem.
Some elements that seems to decrease the occurrence of the phenomenon:
- disabling SOFT NUMA (with hyperthreading enabled): the first 3 test are successes and last 7 are failure
- disabling hyperthreading (with SOFT NUMA disabled): on 10 tests only 1 is a failure
- disabling hyperthreading (with SOFT NUMA enabled): on 90 tests only 1 is a failure
Some elements that seems to solve the problem:
- setting MAXDOP 1
- recomputing statistics (UPDATE STATISTICS T_TIME_INTERVAL_TIV WITH FULLSCAN;)
We remove the RAM and replace it by another, with a different reference. The problem persists…
Perhaps this has something to do with the new « Concurrent PFS updates » of SQL Server 2019…
is there any trace flag that disables this new behaviour ?
========== SCRIPT ==========
-- 1) creating database
CREATE DATABASE DB_BENCH
GO
DECLARE @SQL NVARCHAR(max) = N'';
SELECT @SQL = @SQL + N'ALTER DATABASE DB_BENCH MODIFY FILE (NAME = ''' + name + N''', SIZE = 10 GB, FILEGROWTH = 64 MB);'
FROM DB_BENCH.sys.database_files;
SET @SQL = @SQL + N'ALTER DATABASE DB_BENCH SET RECOVERY SIMPLE;'
EXEC (@SQL);
GO
USE DB_BENCH
GO
-- 2) creating table and view
CREATE TABLE T_TIME_INTERVAL_TIV
(TIV_ID INT NOT NULL IDENTITY PRIMARY KEY,
TIV_GROUP INT,
TIV_DEBUT DATETIME2(0),
TIV_FIN DATETIME2(0))
GO
CREATE VIEW V
AS
SELECT TIV_GROUP AS id, TIV_DEBUT AS intime, TIV_FIN AS outtime
FROM T_TIME_INTERVAL_TIV
GO
-- 3) inserting datas
TRUNCATE TABLE T_TIME_INTERVAL_TIV;
GO
BULK INSERT T_TIME_INTERVAL_TIV
FROM "C:\DATA_SQL\intervals.txt"
WITH (KEEPIDENTITY , FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n');
GO
-- 4) testing
SET NOCOUNT ON;
GO
SET STATISTICS TIME ON;
GO
WITH T1 As
(SELECT id, intime
FROM V
UNION ALL
SELECT id, outtime FROM V),
T2 As
(SELECT ROW_NUMBER() OVER(PARTITION BY id ORDER BY intime) NN, id, intime
FROM T1 T1_1),
T3 As
(SELECT T2_1.NN - ROW_NUMBER() OVER(PARTITION BY T2_1.id ORDER BY T2_1.intime,T2_2.intime) NN1,
T2_1.id, T2_1.intime intime, T2_2.intime outtime
FROM T2 T2_1
INNER JOIN T2 T2_2
ON T2_1.id=T2_2.id
And T2_1.NN=T2_2.NN-1
WHERE EXISTS (SELECT *
FROM V S
WHERE S.id=T2_1.id
AND (S.intime < T2_2.intime AND S.outtime>T2_1.intime))
OR T2_1.intime = T2_2.intime)
SELECT id, MIN(intime) intime, MAX(outtime) outtime
FROM T3
GROUP BY id, NN1
ORDER BY id, intime, outtime;
========== END OF SCRIPT ==========
File to execute the script:
- data
- dump 0005 SQL Server
2.1 log
2.2 mdmp
2.3 txt
2.4 SQLDUMPER_ERRORLOG.log - Query plans
3.1 raw data
3.2 after a UPDATE STATS in FULLSCAN mode
3.3 Comparison of the two plans
On 2020-09-14UTC16:00, complementary tests:
Using 2017 version of the database on the SQL Server 2019 as this:
USE master
GO
ALTER DATABASE DB_BENCH SET COMPATIBILITY_LEVEL = 140;
GO
Results without any error over 180 execution of the same query
Using 2019 version and enabling TF 3925 or 3972 or the combination of 3925 and 3972, results with systematically error on all execution (with MAXDOP 0, hyperthrading on and NUMA soft on).
Best Answer
SEE THE SAMPLE REPORT AT THE VERY BOTTOM
I offer here a detailed and complete proposed answer. The poster will need to respond as accepted or provide feedback to improve the solution. My opinion is that the issue reported was not a fault with Microsoft SQL Server 2019, not about TRACE FLAGS, and not about TEMPDB. The issue was the code and the conceptualization of the requirements. The requirements are fairly straightforward, as follows.
The general concept here is similar to a FILO Stack (First In, Last Out). It simulates EXACTLY one thousand (1000) number-classified "Actors" going onto a crowded stage with a writable journal or logging card, where the Actors enter the stage and leave the stage many times over a long series of multiple Broadway Shows, managed by Liza Minelli in New York City, USA, long before COVID.
The exit and enter cycles of each of the 1000 Actors can go on for days and years because it is a very successful show, and their exits and entries are tracked to the second of the day. The data has a granularity of 86400 seconds per day, with no fractions. I have not analyzed if more than one Actor can exit and enter simultaneously, but there will be evidence in the data.
Each Actor enters at Stage Right, and then subsequently exits at Stage Left. Each Actor is "labelled" with a numbered card. This is called a "Classifier" by some business analyst working for Liza Minelli. The analyst assigns this badge number to each Actor, and sticks the stinking badge on their shoulder.
The Actor's badge number does NOT uniquely identify each row in the Poster source data (SQL Pro) because the Actor with the Badge Number may enter the Stage and exits the Stage multiple times over the years, just like a Symbolic Prophet.
Thus, their Classifier appears many times on each row with different enter and exit datetimes. Each datetime they enter at stage right, the Actor gets a new datetime stamped on a logging card that they carry in the "Datetime IN" column. Next, when they subsequently exit at stage left, the Actor gets another stamp in another column named "Datetime OUT".
Over the years, the Actor can enter and exit the stage multiple times, but the annual report for the Security Department under Liza Minelli only needs to see the first "entering Stage Right", known as "Datetime IN" and the last "exiting Stage Left", known as "Datetime OUT". They need this for each individual Actor who is classified with the numbered stinking badge on their shoulder. Note, the badges do stink. In fact, even if the Actors say they don't need no stinking badges, they still stink. Giggle now to stay happy.
Also, the first will come last and the last will come first, since Liza Minelli has Faith (see Matthew 19:30). ANYWAY, here are the steps to complete this requirement / exigence:
All the code is listed here, including all DDL (data definition) and DML (data manipulation / T-SQL), and even dozens of the result rows. Square brackets are shown below, not literally in the object names created. I just use them as a matter of convention. The code blocks below will be referenced by the following numbered list.
The report should have the following features:
This is actually a classic STACK solution. First In, Last Out (push / pop). The poster appears to require 1 row for each classifier, where the row shows the classifier number (a non-key integer), with their entries and exits. See Shakespeare for more on Actors entering and exiting the Stage of Life.
Next:
Next, the OLEDB OPENROWSET BULK PROVIDER FORMAT FILE
Next, Seeding SPROC and Error SPROC
Next, the seeding and loader. You'll need the Data from SQL Pro's post
Report: first few dozen rows: