It has been really long time since I have updated this blog as my Job involves a lot of travel however I try my best to update it frequently. In this blog I am going to discuss a issue which one of Customers encountered recently.
Recently One of Customers had upgraded to SQL 2008 with Transactional Replication and after upgrading they found that distribution agent fails intermittently with following error
2010-07-22 11:36:51.609 Agent message code 21100. The distribution agent failed to create temporary files in ‘C:Program FilesMicrosoft SQL Server100COM’ directory. System returned errorcode 5.
2010-07-22 11:36:51.656 ErrorId = 2988, SourceTypeId = 2
ErrorCode = ‘21100’ ErrorText = ‘The distribution agent failed to create temporary files in ‘C:Program FilesMicrosoft SQL Server100COM’ directory. System returned errorcode 5.’
Initially it appeared that we were hitting one of the known issue in SQL 2008 which is documented in the following KB article. http://support.microsoft.com/kb/956032/
However we tried to apply the hotfix mentioned in the KB article but that did not help.
Error Code 5 in the above error message clearly indicates Access is denied on COM Folder so we tried giving Full permission to Everyone but still we were receiving the above errors intermittently.
We also observed that the issue happens for just one Publication with one table in it , this table has a XML data column. There are other similar publications in your environment but those are not heavily used. All the failure happens for update on XML column of that table.
From SQL 2008 its seems that we are doing streaming in the background whenever we update a BLOG Column , in our case XML column. I am still not sure if this streaming for XML is strictly OLEDB streaming or a variant for the same but definitely there is some streaming going on because of which replication is creating temp files with extension name of “.LOB” (e.g. “C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_18_0.LOB”)
If you do not change the XML column this streaming is not happening and thus no .LOB file being generated , thus its sure that any update to the XML column is generating this stream by default. We do not see this behavior in SQL 2005. The issue goes away in the next retry of the distribution agent. Thus its very intermittent issue. So we decided to capture procmon while the distribution Agent was running by filtering just on the folder path ‘C:Program FilesMicrosoft SQL Server100COM
This is what we see from the procmon.
distrib.exe 6068 CreateFile C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_0_0.LOB DELETE PENDING Access: Generic Read/Write, Delete, Disposition: OverwriteIf, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Delete On Close, Open No Recall, Attributes: HT, ShareMode: , AllocationSize: 0
McShield.exe 1588 CloseFile C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_0_0.LOB SUCCESS 7/29/2010 17:52
Looking at the chain of events above , we can see that Distrib.exe got a DELETE PENDINGnotification , this could be because of the fact that McShield.exe was still holding the file, immediately followed by this we see Distrib.exe restarting with the different process ID 5296 and these events match with the time frame when we last saw the error message in Replication Monitor.
Based on the above chain of events , we found McShield.exe is not releasing the file handle soon and thus forcing the Distrib.Exe to fail/restart with a Access Denied Error message.
So to resolve the above error message we need to exclude the folder C:Program FilesMicrosoft SQL Server100COM from scanning in the Anti-virus.
In the KB Article http://support.microsoft.com/kb/309422 we have already documented the folders to be excluded for running antivirus on running on servers which run sql server. However the above folder is not mentioned so I thought I would document this in my Blog to help other to avoid failure of replication.
Credits: The issue was drilled down by one of our Escalation Engineer and my colleaque Rishi Maini while from the Customer end Ragvendra T helped us to resolve it.
Premier Field Engineer|Microsoft Services