SQL Server – Why sys.fn_hadr_backup_is_preferred_replica is Slow and Alternatives

availability-groupsbackupsql server

I am using sys.fn_hadr_backup_is_preferred_replica to query the databases on each of my Availability Group replicas to figure out if the database should be backed up on that replica – then, if the function returns true, checking msdb.dbo.backupset to see if the database has been backed up recently.

The problem I'm experiencing is that the function takes a long time to run (about 3 seconds per database) – can anyone recommend an alternative that is faster?

In response to the comment below, I am currently on SQL 2012 SP1 (11.0.3000.0) – however, the queries I am creating will need to target multiple different installations of SQL Server 2012 on multiple customer sites, so, even if the performance of this system function has been improved in subsequent updates, I may not be able to force people to update, unfortunately.

Best Answer

The problem I'm experiencing is that the function takes a long time to run (about 3 seconds per database)

It's possible that this is among a growing list of dynamic management objects that don't support predicate push-down, so you will probably notice higher delays as the number of AGs and/or AG databases increases on any given replica.

And it's possible that's exactly what was fixed, as @Kin pointed out, in Connect #783011; sadly the Connect item doesn't indicate where exactly you can find the fix.

Based on some quick searches (and the date of the comment), I think you will want at least 2012 SP1 CU8, which (via KB #2918791) fixed a regression introduced in 2012 SP1 CU7 (via KB #2887115).

I don't see any evidence in later builds that suggest this function has been modified after 2012 SP1 CU8, and the modification is possibly related to the regression that was introduced.

But I am a huge proponent of using the most recent service pack branch:

  • SP2 is the only currently supported branch; RTM and SP1 have been retired, so if you're still on those, it's time to move;
  • the latest CU (currently that is SP2 CU #8);
  • if your applications use connection pooling, you may also be interested in this on-demand hotfix.

can anyone recommend an alternative that is faster?

If you look at the source for sys.fn_hadr_backup_is_preferred_replica, you can see that there is some convoluted logic and waterfall variable population that you could probably short circuit if you are not using mirroring or log shipping, by writing your own, cleaned-up version. But then you get to calls to forbidden system functions like RetrieveDbReplicaState(), which you can't call yourself, and you realize that there isn't a way to rewrite this function and replicate its full functionality

So, your best bet is to apply whatever fix Microsoft has supplied in order to optimize the function. If you still experience slowness after moving to the latest build, you can file another Connect item, or you can raise a case with CSS directly (just be aware that if you have 10,000 databases in your AG, they may just tell you to stop doing that).