SQL SERVER 2005 RESOURCE FAILS TO COME ONLINE ON CLUSTER AFTER CHANGING THE SAN DRIVE TO ADD MORE DISK SPACE

We have now seen a number of cases in which a clustered instance of SQL Server 2005 fails to come online after changes are made to a shared disk on the cluster.

As mentioned in my previous posts, when SQL Server fails to come online, troubleshooting should start with the SQL Server error log. We checked the error log and observed the following errors:

2009-01-11 21:10:06.96 Server      Error: 26054, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server      Could not find any IP address that this SQL Server instance depends upon.  Make sure that the cluster service is running, that the dependency relationship between SQL Server and Network Name resources is correct, and that the IP addresses on which this SQL Server instance depends are available.  Error code: 0x103.
2009-01-11 21:10:06.96 Server      Error: 17182, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server      TDSSNIClient initialization failed with error 0x103, status code 0xa.
2009-01-11 21:10:06.96 Server      Error: 17182, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server      TDSSNIClient initialization failed with error 0x103, status code 0x1.
2009-01-11 21:10:06.96 Server      Error: 17826, Severity: 18, State: 3.
2009-01-11 21:10:06.96 Server      Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
2009-01-11 21:10:06.96 Server      Error: 17120, Severity: 16, State: 1.
2009-01-11 21:10:06.96 Server      SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.
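
When the error log is long, it can help to pull out just the error numbers and their messages. The following is a minimal Python sketch, not part of the original troubleshooting, that assumes the log lines have the same layout as the excerpt above (date, time, source, then text). The function name find_errors is our own, and reading and decoding the actual ERRORLOG file is left to the caller.

```python
import re

# Matches header lines of the form seen in the log excerpt above, e.g.
# "2009-01-11 21:10:06.96 Server      Error: 26054, Severity: 16, State: 1."
ERROR_HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+\S+\s+"
    r"Error: (\d+), Severity: (\d+), State: (\d+)\."
)


def find_errors(log_lines):
    """Return a list of (error, severity, state, message) tuples in log order."""
    lines = [line.rstrip("\r\n") for line in log_lines]
    found = []
    for index, line in enumerate(lines):
        match = ERROR_HEADER.match(line)
        if match is None:
            continue
        message = ""
        if index + 1 < len(lines):
            # The message text follows on the next line, after the
            # date, time, and source columns.
            parts = lines[index + 1].split(None, 3)
            if len(parts) == 4:
                message = parts[3].strip()
        error, severity, state = (int(group) for group in match.groups())
        found.append((error, severity, state, message))
    return found


# Sample lines copied from the excerpt above (first message shortened).
sample = [
    "2009-01-11 21:10:06.96 Server      Error: 26054, Severity: 16, State: 1.",
    "2009-01-11 21:10:06.96 Server      Could not find any IP address that this SQL Server instance depends upon.",
    "2009-01-11 21:10:06.96 Server      Error: 17182, Severity: 16, State: 1.",
    "2009-01-11 21:10:06.96 Server      TDSSNIClient initialization failed with error 0x103, status code 0xa.",
]

for error, severity, state, message in find_errors(sample):
    print(error, severity, state, message)
```

The first error in the list is usually the one to investigate; in our case that was 26054.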

Error 26054 above says that SQL Server could not find any IP address that the instance depends on, and it points to the dependency between the SQL Server resource and the SQL Network Name resource as something to check.

We checked the SQL Server resource in Cluster Administrator and found that the dependency between the SQL Server resource and the SQL Network Name did exist, yet we still received the error.

Cause
=====

We checked the registry location HKLM\Cluster\Resources\<SQL Server GUID>\Parameters

The values InstanceName and VirtualServerName were missing from the keys below:

HKLM\Cluster\Resources\<SQL Server GUID>\Parameters
HKLM\Cluster\Resources\<SQL Agent GUID>\Parameters

Cluster Administrator uses the values under these registry keys to connect to and start the SQL Server service. Because the values were missing, it could not connect to and start the SQL Server resource, hence the error above.
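
To check a node for the same condition, something like the following Python sketch could be used. It is only an illustration under assumptions: the helper names (parameters_key_path, missing_values, read_value_names) are ours, the path layout is the one shown above, and read_value_names relies on Python's standard winreg module, so it only works on Windows and most likely needs administrative rights on the cluster node.

```python
REQUIRED_VALUES = ("InstanceName", "VirtualServerName")


def parameters_key_path(resource_guid):
    """Build the Parameters subkey path (relative to HKLM) for a resource."""
    return "\\".join(["Cluster", "Resources", resource_guid, "Parameters"])


def missing_values(existing_names, required=REQUIRED_VALUES):
    """Return the required value names that are absent from existing_names.

    Registry value names are case-insensitive, so compare in lower case.
    """
    present = {name.lower() for name in existing_names}
    return [name for name in required if name.lower() not in present]


def read_value_names(resource_guid):
    """List the value names under the resource's Parameters key (Windows only)."""
    import winreg  # imported here so the helpers above work on any platform

    path = parameters_key_path(resource_guid)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        return [winreg.EnumValue(key, i)[0] for i in range(value_count)]


# Example with made-up data: only InstanceName is present.
print(missing_values(["InstanceName"]))
```

On a cluster node, missing_values(read_value_names(guid)) would be called once with the GUID of the SQL Server resource and once with the GUID of the SQL Agent resource. An empty list means both values are present.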

Resolution
==========

IMPORTANT: This resolution contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore it if a problem occurs. For information about how to back up, restore, and edit the registry, see the following article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows Registry

We added the following registry values in the Parameters registry key:

For a named instance of SQL Server:

• InstanceName
Value Name: InstanceName
Value Type: REG_SZ
Value Data: <instance name of the named instance>

• VirtualServerName
Value Name: VirtualServerName
Value Type: REG_SZ
Value Data: <Virtual server name of the instance>
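
For reference, the two values listed above could also be added from a command prompt with the built-in reg add command. The Python sketch below only builds and prints the command lines and does not run them. The function name build_reg_add_commands, the sample names INST1 and SQLVS1, and the GUID placeholder are hypothetical, and we did not use this script ourselves. The /f switch is deliberately left out so that reg prompts before overwriting a value that already exists.

```python
def build_reg_add_commands(resource_guid, instance_name, virtual_server_name):
    """Return one 'reg add' argument list per value to create (nothing is run)."""
    key = "HKLM\\Cluster\\Resources\\" + resource_guid + "\\Parameters"
    values = [
        ("InstanceName", instance_name),
        ("VirtualServerName", virtual_server_name),
    ]
    return [
        ["reg", "add", key, "/v", name, "/t", "REG_SZ", "/d", data]
        for name, data in values
    ]


def for_display(command):
    """Join an argument list into one line, quoting arguments that contain spaces."""
    return " ".join('"%s"' % arg if " " in arg else arg for arg in command)


# Hypothetical instance and virtual server names; replace with the real ones.
for command in build_reg_add_commands("<SQL Server GUID>", "INST1", "SQLVS1"):
    print(for_display(command))
```

Review the printed commands before running them, and back up the registry first as described in the note above.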

Once we added the registry values, we were able to bring SQL Server online on Node1.

We then failed the instance over to Node2, checked the same registry keys, and found the values missing there as well, so we added them and rebooted both nodes of the cluster.

After the reboot, the SQL Server resources came online without any errors.

Note:

This post should not be treated as Microsoft's recommendation or resolution for the problem. It is only a workaround that worked in our environment, and we wanted to share it.

3 comments

  1. Good Post. I changed HDD’s in virtual world in ESX and the 2 registry key entries kept me from rebuilding the entire damn thing. Work around worked for me.

  2. Thanks Pariks! Just resolved a case within minutes using this blog. The error message is so misleading I’d have taken hours to figure out that those are the registry keys which are missing.
    Keep up the good work!
