It has been really long time since I have updated this blog as my Job involves a lot of travel however I try my best to update it frequently. In this blog I am going to discuss a issue which one of Customers encountered recently.
Recently One of Customers had upgraded to SQL 2008 with Transactional Replication and after upgrading they found that distribution agent fails intermittently with following error
2010-07-22 11:36:51.609 Agent message code 21100. The distribution agent failed to create temporary files in ‘C:Program FilesMicrosoft SQL Server100COM’ directory. System returned errorcode 5.
2010-07-22 11:36:51.656 ErrorId = 2988, SourceTypeId = 2
ErrorCode = ‘21100’ ErrorText = ‘The distribution agent failed to create temporary files in ‘C:Program FilesMicrosoft SQL Server100COM’ directory. System returned errorcode 5.’
Initially it appeared that we were hitting one of the known issue in SQL 2008 which is documented in the following KB article. http://support.microsoft.com/kb/956032/
However we tried to apply the hotfix mentioned in the KB article but that did not help.
Error Code 5 in the above error message clearly indicates Access is denied on COM Folder so we tried giving Full permission to Everyone but still we were receiving the above errors intermittently.
We also observed that the issue happens for just one Publication with one table in it , this table has a XML data column. There are other similar publications in your environment but those are not heavily used. All the failure happens for update on XML column of that table.
From SQL 2008 its seems that we are doing streaming in the background whenever we update a BLOG Column , in our case XML column. I am still not sure if this streaming for XML is strictly OLEDB streaming or a variant for the same but definitely there is some streaming going on because of which replication is creating temp files with extension name of “.LOB” (e.g. “C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_18_0.LOB”)
If you do not change the XML column this streaming is not happening and thus no .LOB file being generated , thus its sure that any update to the XML column is generating this stream by default. We do not see this behavior in SQL 2005. The issue goes away in the next retry of the distribution agent. Thus its very intermittent issue. So we decided to capture procmon while the distribution Agent was running by filtering just on the folder path ‘C:Program FilesMicrosoft SQL Server100COM
This is what we see from the procmon.
distrib.exe 6068 CreateFile C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_0_0.LOB DELETE PENDING Access: Generic Read/Write, Delete, Disposition: OverwriteIf, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Delete On Close, Open No Recall, Attributes: HT, ShareMode: , AllocationSize: 0
McShield.exe 1588 CloseFile C:Program FilesMicrosoft SQL Server100COM1a8b7ed71f0bda4183193049a69c0113_0_0.LOB SUCCESS 7/29/2010 17:52
Looking at the chain of events above , we can see that Distrib.exe got a DELETE PENDINGnotification , this could be because of the fact that McShield.exe was still holding the file, immediately followed by this we see Distrib.exe restarting with the different process ID 5296 and these events match with the time frame when we last saw the error message in Replication Monitor.
Based on the above chain of events , we found McShield.exe is not releasing the file handle soon and thus forcing the Distrib.Exe to fail/restart with a Access Denied Error message.
So to resolve the above error message we need to exclude the folder C:Program FilesMicrosoft SQL Server100COM from scanning in the Anti-virus.
In the KB Article http://support.microsoft.com/kb/309422 we have already documented the folders to be excluded for running antivirus on running on servers which run sql server. However the above folder is not mentioned so I thought I would document this in my Blog to help other to avoid failure of replication.
Credits: The issue was drilled down by one of our Escalation Engineer and my colleaque Rishi Maini while from the Customer end Ragvendra T helped us to resolve it.
Premier Field Engineer|Microsoft Services
“So to resolve the above error message we need to exclude the folder C:Program FilesMicrosoft SQL Server100COM from scanning in the Anti-virus.”
Symantec in my customer’s case.
This was about to turn into a real hair-puller until I found your information.
Thank you! we tried the MS hotfix but did not work for us. The AV was the cause after doing this changes we don’t have problem any more 🙂
I am having this issue but our disritbution instance is running on a windows 2008 cluster. I’m concerned that using the C drive on one of the nodes may have an issue in case of a fail over.
I would like to change the location to a file share or one of the clustered luns . Any idea how to change the default location from C:Program FilesMicrosoft SQL Server100COM?
For Cluster, you need not worry for these files which are created in COM Folder since they are just Temp files and will be created once the instance is failed over to another folder in the local path for COM folder.
For Cluster, what you need to worry about is the location of the snapshot folder which should be in the fileshare.
I tend not to drop a leave a response, however I looked at a few of
the comments on this page Transactional Replication in SQL 2008 Fails “The distribution agent failed to create temporary files in ?C:Program FilesMicrosoft SQL Server100COM? directory. System returned errorcode 5.” | SQL Server Faq. I do have some questions for you if it’s allright. Is it simply me or do a few of the remarks look as if they are left by brain dead individuals? 😛 And, if you are posting on other online social sites, I’d like to follow anything new you have to post. Would you post a list of the complete urls of your shared pages like your twitter feed, Facebook page or linkedin profile?
This site was… how do you say it? Relevant!! Finally I have found something that
helped me. Thanks a lot!
I have a slight concern.
I used to argue that AV was not needed on production sql servers as they were usually buried at the back of a dev stack with no access to the world. i.e. you put stuff there and serve it out through a protected web site. These days however this is no longer the case as the data in the DB often comes from the web and systems are regularly set up with internet access and web browsers (for admins) to allow patching / ease-of-use debugging and problem solving (downloading patches etc.).
So now AV on production SQL Servers is a very valid idea.
The trouble is though, we have just suggested excluding a folder from AV that contains regularly accessed and updated files and sub-folders.
To do this is close to simply removing AV from the server and going old school. If we want to exclude a folder from the AV then that folder needs isolation and we need to make sure that the benefit of the exclusion outweighs the risks.
One solution (that I don’t know is possible in this case) would be to make SQL sever write its temp blob files to a newly created dedicated location rather than this shared one. Do we know if this is possible?
If it isn’t then we have to consider any other alternatives (like not replicating this data) which may not be an option either. If no alternative exists then we need to flag this to MS and have them re-factor in a future release and accept a hopefully short-term insecure fix / kludge of excluding the shared-use folder.