As SQL 2012 is in it’s RC0 phase, I was trying to configure AlwaysON on my 2-node WSFC with node and disk majority.Next I installed Default instance of SQL Server in parallel ( to save some time ) on both nodes of the WSFC.Next I enabled my instances for AlwaysON from SQL Configuration Manager for both the instances.Next I created Availability Group to identify the databases,secondary replicas along with the mode of secondary(synchronous or asynchronous commit),endpoint port etc.
However, in my first encounter with AlwaysON, I was hit with the following error while joining the secondary replica to the Availability Group
TITLE: Microsoft SQL Server Management Studio
Failed to join the instance ‘DSI-CLUNODE2’ to the availability group ‘AVG’. (Microsoft.SqlServer.Management.SDK.TaskForms)
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
Failed to join local availability replica to availability group ‘AVG’. The operation encountered SQL Server error 41106 and has been rolled back. Check the SQL Server error log for more details. When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)
When I checked the Errorlog, I see my errorlog filled with following error messages
Error: 18456, Severity: 14, State: 5.
Login failed for user ‘DOM2K8R2CLUDSI-CLUNODE2$’. Reason: Could not find a login matching the name provided.
The error message tells me login failure error messages for the machine account (DOM2K8R2CLUDSI-CLUNODE2$) of Node 2 on the SQL Instance on Node 1.
So I consulted my senior colleague and I got following reply from Suresh Kandoth ,Escalation Engineer with Microsoft CSS Team
If we are using virtual accounts (Service SIDs in my case) for service startup of db engine, then if that service goes outside the machine, authentication on the remote system will be performed using machine account. So on the target server [SRV1] we need to add the windows login [domain_nameSRV2$] and give appropriate permissions.
Now, I understand that machine account of Node 2 is required to be a login to Node 1 and the same login should have connect permissions to database mirroring endpoint created by AlwaysON Group as documented ( Thanks to my colleague Jwalanth Chavan in Microsoft for providing the link ) in the following
For two server instances to connect to each other’s database mirroring endpoint point, the login account of each instance requires access to the other instance. Also, each login account requires connect permission to the database mirroring endpoint of the other instance.
The impact of this requirement depends on whether the server instances run as the same domain user account:
· If the server instances run as the same domain user account, the correct user logins exist automatically in both master databases. This simplifies the security configuration the database and is recommended.
· If the server instances run as different user accounts, user logins on the server instance that hosts the principal server or primary replica must be manually reproduced on the server instance that hosts the mirror server or on every server instance that hosts a secondary replica. For more information, see Create a Login for a Different Account and Grant Connect Permission, later in this topic.
So to resolve the issue, I created a login for machine account (DOM2K8R2CLUDSI-CLUNODE2$) of the secondary replica on the primary instance and granted it connect permission to the database mirroring (hadr_endpoint) endpoint and vice versa.
ON Primary Replica
CREATE LOGIN [DOM2K8R2CLUDSI-CLUNODE2$] FROM WINDOWS
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOM2K8R2CLUDSI-CLUNODE2$]
ON Secondary Replica
CREATE LOGIN [DOM2K8R2CLUDSI-CLUNODE1$] FROM WINDOWS
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOM2K8R2CLUDSI-CLUNODE1$]
The above error was resolved and my secondary replica now looks clean without any errors as shown below
If the service startup account of the instances participating in AlwaysON are domain accounts, the AlwaysON GUI creates the logins for the startup account and grants it connect permissions to hadr_endpoint on the opposite instances.
However in case when the service startup account of the instances participating AlwaysON is Service SIDs, the AlwaysON GUI doesn’t create login for the machine accounts on the opposite instances and hence we see the above error. As confirmed by the product team, it is by design behaviour and recommendation is to manually
1. Add the logins for machine accounts (in other instances) and
2. Give the connect permission to hadr_endpoint.
However, I am yet to confirm whether running SQL Server instance on Service SIDs is supported for AlwaysON scenario.
Special thanks to Jwalanth Chavan,Suresh Kandoth and Balmukund Lakhani to help me identify the cause of the issue.
The same issue is also discussed in the thread
Hope this helps !!!
Premier Field Engineer