Sql-server – SQL Server Service Broker – CHECKDB found consistency errors, how could this happen

service-brokersql serversql-server-2005sql-server-2012

I encountered a weird SQL Server Service Broker consistency problem, namely I have a queue that holds messages (status=1, Ready to receive) which are lying there for a very long time.

In the meantime there was an upgrade from SQL 2005 to SQL 2012 and now when I want to do something with these messages (e.g. END CONVERSATION, even with CLEANUP) I get an error:

The conversation handle "……." is not found.

So I tried to look for this handle in sys.conversation_endpoints, but to my surprise it's not there. This shouldn't have happened, as far as I know if there are messages in the queue with associated conversation_group_id and conversation_handle, there should be an endpoint for each one.

I ended up running DBCC CHECKDB and got messages such as:

Msg 8997, Level 16, State 1, Line 1 Service Broker Msg 9708, State 1:
The messages in the queue with ID some_number are referencing the
invalid conversation group 'some_guid'.

Msg 8997, Level 16, State 1, Line 1 Service Broker Msg 9705, State 1:
The messages in the queue with ID some_number are referencing the
invalid conversation handle 'some_guid'.


My question is: how could this happen? Are there any potential DEV/DBA errors or gotchas e.g. while performing SQL upgrade, that might lead to such corruption?

Best Answer

Corruptions can occur due to media corruption or memory bit-flips. Ideally, one has a backup and restores from a backup.

If you cannot restore from a uncorrupted DB, when you have a situation where you have inconsistencies between internal tables you can try to run REPAIR_ALLOW_DATA_LOSS. If repair does not fix the issues you can force a new broker in the database (ALTER DATABASE ... SET NEW_BROKER) which will wipe clean all conversations and messages. You will loose all messages pending processing. You need to exercise care if you have remote conversations (endpoints in different server or even different DB) as the partner endpoint will be left stranded and will have to be cleaned up. NEW_BROKER changes the service_broker_guid in sys.databases and this may impact routes.

If you need more detailed help you'll have to open a case with product support.

Are there any potential DEV/DBA errors or gotchas e.g. while performing SQL upgrade, that might lead to such corruption?

I'm not aware of any, I suspect more likely to be flaky media/memory/cpu (they all can cause corruptions).