Failed to join the instance to the availability group while configuring AlwaysON

As SQL 2012 is in it’s RC0 phase, I was trying to configure AlwaysON on my 2-node WSFC with node and disk majority.Next I installed Default instance of SQL Server in parallel ( to save some time Smile ) on both nodes of the WSFC.Next I enabled my instances for AlwaysON from SQL Configuration Manager for both the instances.Next I created Availability Group to identify the databases,secondary replicas along with the mode of secondary(synchronous or asynchronous commit),endpoint port etc.

However, in my first encounter with AlwaysON, I was hit with the following error while joining the secondary replica to the Availability Group

TITLE: Microsoft SQL Server Management Studio
————————————————————-

Failed to join the instance ‘DSI-CLUNODE2’ to the availability group ‘AVG’. (Microsoft.SqlServer.Management.SDK.TaskForms)

——————————
ADDITIONAL INFORMATION:

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

——————————

Failed to join local availability replica to availability group ‘AVG’.  The operation encountered SQL Server error 41106 and has been rolled back.  Check the SQL Server error log for more details.  When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)

When I checked the Errorlog, I see my errorlog filled with following error messages

Error: 18456, Severity: 14, State: 5.
Login failed for user ‘DOM2K8R2CLUDSI-CLUNODE2$’. Reason: Could not find a login matching the name provided
.

The error message tells me login failure error messages for the machine account (DOM2K8R2CLUDSI-CLUNODE2$) of Node 2 on the SQL Instance on Node 1.

So I consulted my senior colleague and I got following reply from Suresh Kandoth ,Escalation Engineer with Microsoft CSS Team

If we are using virtual accounts (Service SIDs in my case) for service startup of db engine, then if that service goes outside the machine, authentication on the remote system will be performed using machine account. So on the target server [SRV1] we need to add the windows login [domain_nameSRV2$] and give appropriate permissions.

Now, I understand that machine account of Node 2 is required to be a login to Node 1 and the same login should have connect permissions to database mirroring endpoint created by AlwaysON Group as documented ( Thanks to my colleague Jwalanth Chavan in Microsoft for providing the link ) in the following

MSDN link:  http://msdn.microsoft.com/en-us/library/ms366346(v=SQL.110).aspx

For two server instances to connect to each other’s database mirroring endpoint point, the login account of each instance requires access to the other instance. Also, each login account requires connect permission to the database mirroring endpoint of the other instance.

The impact of this requirement depends on whether the server instances run as the same domain user account:

· If the server instances run as the same domain user account, the correct user logins exist automatically in both master databases. This simplifies the security configuration the database and is recommended.

· If the server instances run as different user accounts, user logins on the server instance that hosts the principal server or primary replica must be manually reproduced on the server instance that hosts the mirror server or on every server instance that hosts a secondary replica. For more information, see Create a Login for a Different Account and Grant Connect Permission, later in this topic.

So to resolve the issue, I created a login for machine account (DOM2K8R2CLUDSI-CLUNODE2$) of the secondary replica on the primary instance and granted it connect permission to the database mirroring (hadr_endpoint) endpoint and vice versa.

ON Primary Replica

use [master]
GO
CREATE LOGIN [DOM2K8R2CLUDSI-CLUNODE2$] FROM WINDOWS
GO
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOM2K8R2CLUDSI-CLUNODE2$]
GO

ON Secondary Replica

use [master]
GO
CREATE LOGIN [DOM2K8R2CLUDSI-CLUNODE1$] FROM WINDOWS
GO
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [DOM2K8R2CLUDSI-CLUNODE1$]
GO

The above error was resolved and my secondary replica now looks clean without any errors as shown below

image

To conclude,

If the service startup account of the instances participating in AlwaysON are domain accounts, the AlwaysON GUI  creates the logins for the startup account and grants it connect permissions to hadr_endpoint on the opposite instances.

However in case when  the service startup account of the instances participating  AlwaysON is Service SIDs, the AlwaysON GUI doesn’t create login for the machine accounts on the opposite instances and hence we see the above error. As confirmed by the product team, it is by design behaviour and  recommendation is to manually

1. Add the logins for machine accounts (in other instances) and
2. Give the connect permission to hadr_endpoint.

However, I am yet to confirm whether running SQL Server instance on Service SIDs is supported for AlwaysON scenario.

Special thanks to Jwalanth Chavan,Suresh Kandoth and Balmukund Lakhani to help me identify the cause of the issue.

The same issue is also discussed in the thread

http://social.msdn.microsoft.com/Forums/en-US/sqlhadr/thread/616b9f77-fbd4-49fa-9887-b84ca72544a5

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

11 comments

  1. Hi,

    I am thank full to you to write this problem in detail and its solution is very easy to understand. I am wondering a day around internet to finding my solution and finally it helps.

    Regards,
    Shamas Saeed.
    DBA EnPointe (PAK)

  2. Just one thing, I had to remove databases from the Availability group, delete from secondary and then rejoin.

    After this, it worked like charm.

    Thanks again.
    Neeti

  3. Hi,

    Thank you for this clear explanation. I however still have problems with the joining db to the secondary server. Server seems to be down.

    I’ve got Windows 2012 R2 and SQL 2012 (11.0.3000) and the services run under a domain account (special SQL service account).
    I can make a succesful first AG and all works fine. But when I make a second AG I always get the error that joining isn’t possible. Nomather what I’ve tried.

    I’ve pulled out my hair and don’t know where to start getting it all working again.

    Thanx in advance.

    Best Wishes,

    Jeroen

  4. Thank you for this page, this resolved this issue for me as well. What a superb challenge Microsoft offers us by failing to document their software & it’s foibles.

  5. Hi ! im sorry I don’t speak english very well… can you please show me how to do these two steps :

    1. Add the logins for machine accounts (in other instances) and
    2. Give the connect permission to hadr_endpoint.

    Thanks.

  6. Hi,

    Thanks you so much for this resolution , So many sites have same resolution but you explained very well.

    Appreciate your help.

    Thanks !!!

  7. This is still applicable to SQL 2016 Availability Groups, as of today. Excellent article for a situatin I had not encountered before

Leave a Reply to Shamas saeed Cancel reply

Your email address will not be published. Required fields are marked *