Sql-server – Why does querying for records that violate a foreign key return incorrect results

foreign keysql-server-2012stored-proceduresuniqueidentifier

While doing some pre-checking on a table before adding a new Foreign Key, we were querying to see how many current rows would violate the new key. This is a fairly active database with near constant inserts on the tables in question.

The FK would be from MessagePatientIdentifier.MessageID to Message.MessageID.

The query we used was as simple as could be:

select * from MessagePatientIdentifier as mpi
where
    MessageID not in (select MessageID from Message)

The issue that we are seeing, is that rows are being returned from MessagePatientIdentifier

MessagePatientID                     | MessageID 
553bde76-47d4-4ec3-96d1-b5d2e98931e1 | 7d45464d-8cc4-4a2e-8828-020722165b39

in this case and when you then select * from Message where MessageID = 7d45464d-8cc4-4a2e-8828-020722165b39, the record does in fact exist.

We then moved on and tried the same query, but on a table that DOES have a FK in place, referencing Message table in the same way. The same results…the query is reporting that Child table records exist without parent the corresponding parent table (Message) record.

MessageRecipID                       | MessageID 
26d6d632-87b3-407e-aeb0-04552981e5f8 | 750f0fb4-3e6c-485d-996e-f061f8caa360

Then, once again if you select * from Message where MessageID = 750f0fb4-3e6c-485d-996e-f061f8caa360, this returns the record.

This data comes in via stored procedures from a Mirth Server as well as through a BizTalk WCF-SQL send port. The proc inserts the message record, gets the new uniqueidentifier as an output variable and then uses that to call auxiliary stored procs to insert into both the MessagePatient.MessageID and MessageRecip.MessageID.

Is this expected behaviour and I'm just not educated on the inner workings of SQL? Technically I believe BizTalk runs everything in a transaction, so it shouldn't get out of sync, even if it did where would it get the MessageID value if not from the Message insert?

What am I missing here?

Best Answer

This is a fairly active database 

If you can restore a static backup somewhere, and do analysis on that unchanging copy, I think you'll find that the weird behavior you're seeing goes away.

Read Committed

The default isolation level in SQL Server is READ COMMITTED. Under this isolation level, you read the data that is currently committed to the database. The only guarantee that it provides is that when SQL Server reads a given page, it will never return an uncommitted result (known as a dirty read).

In your case you're scanning two tables and comparing them. While SQL Server is doing that scan, there is data movement.

  • You read the first few pages of data and have the first bit of data.
  • Some of the rows you haven't read yet are updated. Perhaps the MessageSubject or some other column is updated to a longer value, and this results in a page split. Or perhaps the key column is updated and the row is physically moved "up" the index into the portion you already read.
  • In both cases, it's possible for a row (or many rows) to move from a place you haven't yet read to a place you already read. You'll never see the row because it moved around. This causes the row to be "missing" from your scan.
  • It's also possible a row moves in the other direction: from a place you've already read to a spot you haven't gotten to yet. In this case, you'll see that row twice and your results will have a mysterious double result.

Your examples use integers, but in your question you mention uniqueidentifier. Since uniqueidentifiers are random, the randomness means rows are constantly being inserting into random places on the table. This increases the chances that a page fills up and has to be split to accommodate the new page, and increases the chances you see these phenomena.

Read the section titled "Locking Read Committed Behaviours" in the above linked article for a thorough explanation.

The fix?

If you want to avoid these off phenomena, don't use the default isolation level in SQL Server. My favorite & default isolation level is READ COMMITTED SNAPSHOT, which avoids these problems. (Read about all the isolation levels here to decide which one is right for you.)

Or if it's just for a one-time analysis, you could use a database snapshot to create a static image to query, or you could stop writing to the database, or restore a copy elsewhere. Stopping writes will stop data movement, and you won't run into the problem.