Sql-server – Login in server suddenly lost mapping to their Users in databases

availability-groupsloginsSecuritysql serversql server 2014

Mine is Always ON availability environment. I have 3 SQL 2014 servers(Primary and 2 replicas). Since one year all the logins working FINE in the servers. Some how suddenly, I am getting errors from the Application server stating that the PIDS were not able to access the Database. The PIDS are successfully logging in the server but not able to access database. I checked the mappings of the login in the server. I seen no database was checked. I dont know how it lost its mapping to user in the database.

When I checked for the user in the database, I can see that user exist in the database and has all the required privileges assigned to that user too..

NOte: I tried to restart the server that which fix the problem for 2 hours and again we ran back to the same issue. I droped all those users(who lost mapping from Login) and recreated them with all the SIDs synced in all three servers. This fixed my problem for now. but I want to know the root cause of the issue.

Question 1: How my user lost their mappings with login suddenly that which are working absolutely fine since 1 YEAR.

Question 2: How a simple server restart fix the issue for a couple of hours and again why the problem came back.

Best Answer

Question 1: How my user lost their mappings with login suddenly that which are working absolutely fine since 1 YEAR.

There is nothing internal to AGs that would do this. This had to have been done by someone or something. The Logins and Users are mapped via their SID, thus if the mapping no longer was valid (different SIDs), the only way those can become different is either it was initially setup incorrectly or someone/something changed it. You could attempt to look at the default trace and see if it has any data, but if you weren't auditing for it chances are you won't find it retroactively as SQL Server doesn't save that type of data.

Question 2: How a simple server restart fix the issue for a couple of hours and again why the problem came back.

The "simple restart" just masked the problem. If you restarted a primary replica, then the primary would have changed - probably to the one with a working mapping. If it was a secondary when restarted, then the readable connection (if using read only routing) would have been disconnected and a reconnect would have went to your other secondary where the mapping was probably fine.

The restart didn't "fix" anything, you got lucky and "masked" it. This is completely dependent on the exact state of each replica at the time you completed this.

When I checked for the user in the database, I can see that user exist in the database and has all the required privileges assigned to that user too..

Yes, and it should. If it didn't then someone deleted the user and re-created it.

Users exist inside the database and thus their information is sent to each replica. Logins, however, are server level objects stored in master and thus are not part of the AG traffic as it is outside the bounds of the database. That's how this happens (mismatched SIDs).

Also, this is only an issue with a SQL Login - as Windows logins obtain their SID from active directory and shouldn't change (note that it can but takes admin intervention and commands).