Seemingly random server connection problem

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Sun May 10, 2009 3:57 pm

Again, I am using the CCNet Vault plugin for checking on source changes in Vault, not command-line Vault. It is our build process (often triggered by CCNet) that uses command-line Vault. When 5:00am rolls around, the Vault plugin usually does not detect any source code changes so no actual builds are triggered. Therefore, this freeze-up from earlier today resulted from just the CCNet Vault plugin.

To stagger the start of Vault polling in the morning, the CCNet trigger is configured to filter out the late evening and morning hours as follows:

<triggers>
    <filterTrigger startTime="22:30" endTime="05:00">
        <trigger type="intervalTrigger" buildCondition="IfModificationExists" seconds="60"/>
    </filterTrigger>
</triggers>
I then adjust endTime for each individual project. This does seem to do what I want, stopping the CCNet builds at 10:30pm and restarting them around 5:00am. However, if the CCNet service is restarted, the "Checking Modifications" activity triggers for all projects at the same time (a "feature" of CCNet).
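
A minimal sketch of how the per-project endTime values could be generated rather than edited by hand. The two-minute step, the project names, and both function names are assumptions for illustration; only the trigger attributes are taken from the block above.

```python
from datetime import datetime, timedelta


def staggered_end_times(projects, first_end="05:00", step_minutes=2):
    """Map each project to an "HH:MM" endTime, offset by step_minutes per project.

    step_minutes is a hypothetical value; pick whatever spacing the server tolerates.
    """
    base = datetime.strptime(first_end, "%H:%M")
    return {
        name: (base + timedelta(minutes=i * step_minutes)).strftime("%H:%M")
        for i, name in enumerate(projects)
    }


def filter_trigger_xml(end_time, start_time="22:30", seconds=60):
    """Render the <triggers> block shown in the post with the given endTime."""
    return (
        "<triggers>\n"
        f'    <filterTrigger startTime="{start_time}" endTime="{end_time}">\n'
        f'        <trigger type="intervalTrigger" buildCondition="IfModificationExists" seconds="{seconds}"/>\n'
        "    </filterTrigger>\n"
        "</triggers>"
    )


if __name__ == "__main__":
    # Project names are placeholders, not the real project list.
    for name, end in staggered_end_times(["ProjectA", "ProjectB", "ProjectC"]).items():
        print(f"<!-- {name} -->")
        print(filter_trigger_xml(end))
```

This only spreads out the morning resume. It does nothing for the restart case described above, where every project polls at once.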

Looking at the Vault server log, it seems as though the first 11 projects this morning were able to log in successfully (though not staggered) and most of the remaining projects all timed out with exceptions. Here's the log from those that were able to log in:

----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:39 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:40 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:41 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:41 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:41 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/10/2009 5:02:42 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
What's odd to me here is that I'm not seeing any staggering. It's also odd (and disturbing) that the other logins failed (timed out).
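
One way to make the lack of staggering visible is to count logins per second in the server log. The sketch below assumes the log line layout shown above (a `----` prefix, a US-style timestamp, and a trailing `Login`); the regular expression and function name are illustrative, not part of Vault.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines like:
# ----5/10/2009 5:02:39 AM     cruisecontrol--host(ip)--SSL Disabled<TAB>Login
LOGIN_RE = re.compile(
    r"^----(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+.*\bLogin\s*$"
)


def logins_per_second(lines):
    """Count login entries per timestamp (one-second resolution)."""
    counts = Counter()
    for line in lines:
        match = LOGIN_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            counts[stamp] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_logins.py < server.log
    for stamp, count in sorted(logins_per_second(sys.stdin).items()):
        print(stamp.strftime("%H:%M:%S"), count)
```

Several logins sharing the same second, as in the excerpt above, would confirm that the projects are hitting the server together rather than one after another.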

Were you able to figure anything out from the error logs I sent in my previous post?

ian_sg
Posts: 787
Joined: Wed May 04, 2005 10:55 am
Location: SourceGear

Re: Seemingly random server connection problem

Post by ian_sg » Mon May 11, 2009 7:50 am

Okay, sorry about the misunderstanding about which integration you're using. The pollRetry settings only work with the command-line version (which I never stated), so that's the root of my confusion.

The problem is that when Vault logs in, it also refreshes the client-side tree cache. This was originally designed for clients like the standalone Vault GUI client, and obviously doesn't scale well to environments like CC.NET with a couple of dozen projects.

These are your options, as I see them.
  1. Try the command-line integration, so the pollRetry* settings can do their thing. I was initially skeptical that this would work reliably, but I was reminded that we load-tested with up to 40 projects when we added the pollRetry* parameters. The size of the repository tree and the frequency and number of changes are the other important variables here (due to the aforementioned refresh), so I can't be sure our testing will guarantee your success, but I'm more optimistic now that this could work for you.
  2. Are the 26 projects completely independent? Are there cases where you will always build one if another is being built? If so, you can use a project trigger. As I understand it, this will build a project when another completes, so it no longer has to poll for changes. This could reduce the number of projects that need to hit the server simultaneously in the event of a restart.
  3. Customize the CC.NET code to do staggering on a restart, particularly if this is a common occurrence. This isn't as hard as it might sound, especially if you're already doing .NET development.
  4. Split the 26 projects across at least 2 machines.
Ian Olsen
SourceGear

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Mon May 11, 2009 3:30 pm

No problem -- I should have realized that you were talking about command-line Vault. The documentation for these blocks looks so similar that I didn't even notice. Funny thing is that the Vault plugin does not complain about the existence of the superfluous options, so I'm just leaving them (as it makes it easier to switch back to command-line Vault).

In your post, you stated that "when Vault logs in, it also refreshes the client-side tree cache". Are you saying the client cache is updated simply on login? I ask because I am using Vault in CCNet with both autoGetSource="false" and applyLabel="false". Is it still updating the local cache even in this case? If so, it seems there is an opportunity to improve the Vault plugin. Although command-line Vault cannot make any assumptions about how it is being used, it seems to me that the Vault plugin for CCNet can and should be written with the assumption that the same user will be logging in and checking for modifications all of the time.

In any event, I switched my projects back to using command-line Vault . . . and the problem got much worse. Within about 10 minutes I started seeing projects in Exception status and SqlException: Timeout expired errors in the Vault server log. After switching back to the Vault plugin, things improved, though the initial "Checking Modifications" period went on for quite some time.

Looking deeper into the server logs, I found a repeated deadlock exception that popped up when all projects were making their "Checking Modifications" call (using the Vault plugin):

----5/11/2009 1:54:32 PM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	System.Data.SqlClient.SqlException: Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.HasMoreRows()
   at System.Data.SqlClient.SqlDataReader.ReadInternal(Boolean setTimeout)
   at System.Data.SqlClient.SqlDataReader.Read()
   at VaultServiceSQL.VaultRepUtil.BuildTreeDelta(SqlDataReader dr, Int64 nBaseRevID, Int64 nTargetRevID, Hashtable htSharedItems)
   at VaultServiceSQL.VaultSqlSCC.GetRepositoryTreeDelta(VaultSqlConn conn, Int32 nRepID, Hashtable htSharedItems, Int64 nBaseRevID, Int64 nTargetRevID, VaultRepositoryDelta& rep)
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.HasMoreRows()
   at System.Data.SqlClient.SqlDataReader.ReadInternal(Boolean setTimeout)
   at System.Data.SqlClient.SqlDataReader.Read()
   at VaultServiceSQL.VaultRepUtil.BuildTreeDelta(SqlDataReader dr, Int64 nBaseRevID, Int64 nTargetRevID, Hashtable htSharedItems)
   at VaultServiceSQL.VaultSqlSCC.GetRepositoryTreeDelta(VaultSqlConn conn, Int32 nRepID, Hashtable htSharedItems, Int64 nBaseRevID, Int64 nTargetRevID, VaultRepositoryDelta& rep)
----5/11/2009 1:54:32 PM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Rolling Back a transaction   at VaultServiceSQL.VaultSqlConn.RollbackTransaction()
   at VaultServiceSQL.VaultSqlConn.CloseConn()
   at VaultServiceAPILib.VaultServiceAPI.GetRepositoryTreeDelta(VaultTreeManager tm, Boolean bAdminMode, Int32 nUserID, Int32 nRepID, Int64 nBaseTxID, VaultDateTime dtLastChange, Boolean bUseDBDeltaOnCacheMiss, VaultDateTime& dtLatestChange, Int64& nTargetTxID, VaultRepositoryDelta& rd, VaultIntDnld& dlOut)
   at VaultService.VaultService.GetRepositoryStructure(Int32 nRepID, Int64 nBaseRevision, Int64 nTargetRevision, VaultDateTime dtLastClientSecurityCheck, VaultDateTime& dtLatestServerSecurityCheck, Int64& nReturnTargetRevision, VaultRepositoryDelta& rd, Boolean bUseDBDeltaOnCacheMiss)
   at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Web.Services.Protocols.LogicalMethodInfo.Invoke(Object target, Object[] values)
   at System.Web.Services.Protocols.WebServiceHandler.Invoke()
   at System.Web.Services.Protocols.WebServiceHandler.CoreProcessRequest()
   at System.Web.Services.Protocols.SyncSessionlessHandler.ProcessRequest(HttpContext context)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
   at System.Web.HttpApplication.ApplicationStepManager.ResumeSteps(Exception error)
   at System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(HttpContext context, AsyncCallback cb, Object extraData)
   at System.Web.HttpRuntime.ProcessRequestInternal(HttpWorkerRequest wr)
   at System.Web.HttpRuntime.ProcessRequestNoDemand(HttpWorkerRequest wr)
   at System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr ecb, Int32 iWRType)
I know that in a previous post you stated that there are "some known issues with the same user logging in concurrently". Does the same problem hold true for checking for changes? Based upon the stack dump I'm looking at, I'd say it does.

I think that if I can simply stagger the "Checking Modifications" for my projects at startup and at all other times, most of my issues related to CCNet will be resolved. I'll have to look into that further to see what can be done. I am a .NET developer, though I have not seen any documentation or configuration option for doing this, and I'd hate to alter the CCNet code, as that would impede future upgrades. My gut tells me that if this were really "easy", I would have been able to Google an answer. Instead, I've only seen others looking for a solution to the same problem, so it's probably not a simple fix. As a side note, even updating the ccnet.config file causes all projects to reload and start their "Checking Modifications" phase again.

I'll do what I can from my end to work around the problems with Vault here. Whatever you can do from your end so that the plugin better supports CCNet would be greatly appreciated.

Thanks,

--Dave Novak

ian_sg
Posts: 787
Joined: Wed May 04, 2005 10:55 am
Location: SourceGear

Re: Seemingly random server connection problem

Post by ian_sg » Tue May 12, 2009 8:19 am

Thanks, Dave. I spoke with a developer more familiar with the plugin this morning, and he's got some ideas that might help.

If you'd be willing to send email to support@sourcegear.com, and reference this thread, he'll pick up with you from there.
Ian Olsen
SourceGear

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Tue May 12, 2009 8:32 am

I will indeed follow up with the developer regarding the CCNet plugin.

However, before I do so, I wanted to mention that I was able to mitigate the CCNet concurrency issue by splitting my projects into logical Queues. The way I have it configured now, there will never be more than 10 concurrent Vault logins or "Checking Modifications" calls from the Vault cruise control user. I have already seen some improvement as a result. This is apparently the way CCNet intends for you to "stagger" the start of projects.
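
To illustrate the idea, here is a rough sketch of partitioning projects into a fixed number of queues. It assumes, as the post implies, that projects sharing a queue are integrated one at a time, so the number of queues caps the number of concurrent Vault calls. The round-robin assignment, the queue names, and the function name are illustrative assumptions; the actual setup described above groups projects into "logical" queues by hand.

```python
from collections import defaultdict


def assign_queues(projects, max_concurrent=10):
    """Spread projects round-robin over at most max_concurrent named queues.

    Queue names ("Q1", "Q2", ...) are placeholders for illustration.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    queues = defaultdict(list)
    for index, name in enumerate(projects):
        queues[f"Q{index % max_concurrent + 1}"].append(name)
    return dict(queues)


if __name__ == "__main__":
    # 26 hypothetical project names, matching the project count mentioned in the thread.
    projects = [f"Project{n:02d}" for n in range(1, 27)]
    for queue, members in assign_queues(projects).items():
        print(queue, ", ".join(members))
```

With 26 projects and 10 queues, no queue holds more than three projects, and at most 10 projects would be checking for modifications at any moment under the stated assumption.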

Having said that, I want to go back to my original post for this thread, which stated that our build process was occasionally failing when trying to get files from Vault. Please note that our build process is really separate from CCNet and accesses Vault using a different Vault account than the one used by the CCNet Vault plugin. I will include the error again here for convenience:

"C:\Program Files\SourceGear\Vault Client\vault.exe" -host vault.servername.com -user Buildmachine -password ***** -repository Art get "$/Art 9.0.0/" -makewritable -setfiletime modification -merge overwrite -destpath E:\Temp\builds\wd.rad8CD22\  -verbose
<vault>
  <error>
    <exception>System.Exception: The connection to the server failed: server cannot be contacted or uses a protocol that is not supported by this client. Server was unable to process request. ---> Object reference not set to an instance of an object. ---> System.Web.Services.Protocols.SoapException: Server was unable to process request. ---> Object reference not set to an instance of an object.
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at VaultClientNetLib.ClientService.VaultService.Login(String strHostname, Boolean bUseFullFiles, String username, String strEncryptedPassword, String strRMKey, String& strAuthToken)
   at VaultClientNetLib.VaultConnection.Login(String strURLBase, String strUserLogin, String strPassword)
   at VaultClientOperationsLib.ClientInstance.Login(String urlbase, String username, String password)
   at VaultClientIntegrationLib.ServerOperations.Login(AccessLevelType altCommand, Boolean bAllowAuto, Boolean bSaveSession)
   --- End of inner exception stack trace ---
   at VaultClientIntegrationLib.ServerOperations.Login(AccessLevelType altCommand, Boolean bAllowAuto, Boolean bSaveSession)
   at VaultClientIntegrationLib.ServerOperations.Login()
   at VaultCmdLineClient.VaultCmdLineClient.ProcessCommand(Args curArg)
   at VaultCmdLineClient.VaultCmdLineClient.Main(String[] args)</exception>
  </error>
  <result>
    <success>False</success>
  </result>
</vault>
Process completed with exit code -1
Though it's true we often have concurrent builds (though typically no more than 2 or 3 at the same time), I'm surprised that command-line Vault could fail like this. Is there anything we can do to make this better?
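
As a possible stopgap on the build side, the failing get could be wrapped in a retry. The sketch below is only that: the function names, attempt count, and delay are assumptions, and it treats `<success>True</success>` as the success marker by analogy with the `<success>False</success>` shown above. It uses a regular expression rather than an XML parser because the exception text in the output contains raw `&` characters (for example `String& strAuthToken`), which a strict parser would reject.

```python
import re
import time

SUCCESS_RE = re.compile(r"<success>\s*(True|False)\s*</success>", re.IGNORECASE)


def vault_succeeded(output):
    """Return True only if the captured output reports <success>True</success>."""
    match = SUCCESS_RE.search(output)
    return bool(match) and match.group(1).lower() == "true"


def run_with_retry(run_command, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call run_command() until its output reports success or attempts run out.

    run_command is any callable returning the command's captured output as text.
    The wait grows linearly (delay_seconds * attempt number) between tries.
    Returns (succeeded, last_output).
    """
    output = ""
    for attempt in range(1, attempts + 1):
        output = run_command()
        if vault_succeeded(output):
            return True, output
        if attempt < attempts:
            sleep(delay_seconds * attempt)
    return False, output
```

In a real build script, `run_command` could wrap `subprocess.run(...)` around the existing vault.exe command line and return its standard output. A retry hides the symptom rather than fixing the server-side error, so it is worth logging every failed attempt.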

Again, I will follow up with the developer regarding the CCNet plugin. But I’d like to keep going with my original issue here as I believe this to be a general issue with Vault (unrelated to CCNet).

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear

Re: Seemingly random server connection problem

Post by Beth » Fri May 29, 2009 3:29 pm

This issue is currently being handled offline.

HS: 215987
Beth Kieler
SourceGear Technical Support

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Mon Jun 01, 2009 9:11 am

Beth --

My understanding was that the offline discussion was for the CCNet plugin. However, I would like to keep the problem I reported originally (and just reiterated in my previous post) alive here. Our build process (which uses command-line Vault) still occasionally fails with "The connection to the server failed: server cannot be contacted or uses a protocol that is not supported by this client. Server was unable to process request" (as previously reported).

Is anyone even looking at that problem?

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear

Re: Seemingly random server connection problem

Post by Beth » Mon Jun 01, 2009 9:41 am

Looking back through the ticket, I thought you said you were using the command line with Cruise Control. Are you using it separately from Cruise Control? Are you running gets with the command-line client at the same time as your Cruise Control builds?
Beth Kieler
SourceGear Technical Support

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Mon Jun 01, 2009 9:55 am

You should re-read the entire post. I think that I was clear that our build process (which is separate from CCNet) uses command-line Vault, whereas our integration with CCNet uses the Vault plugin. Please note as well that we use CCNet only to trigger our build process. CCNet and the Vault plugin merely check for changes in the source code repository (without actually doing a Get) and then call our "normal" build process if changes were found.

Keep in mind as well that we've been seeing this issue for some time now (long before we began our integration with CCNet).

jeremy_sg
Posts: 1821
Joined: Thu Dec 18, 2003 11:39 am
Location: Sourcegear

Re: Seemingly random server connection problem

Post by jeremy_sg » Mon Jun 01, 2009 10:10 am

Dave,

Since the problem we're seeing is actually the server returning an error when lots of login connections come in at once, it would affect both the command-line client and the CC.NET plugin. The debug plugin versions that I sent you (did you get the new one I mailed you about last week?) work by serializing the login commands so that they only occur one at a time.
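
The general technique of serializing logins can be sketched with a single process-wide lock. This is not the plugin's actual code, which is not shown in this thread; `serialized_login`, `FakeServer`, and `run_logins` are hypothetical names, and the fake server exists only to measure how many logins overlap.

```python
import threading
import time

# One lock shared by every caller in this process.
_login_lock = threading.Lock()


def serialized_login(login, *args, **kwargs):
    """Call login(...) while holding the shared lock, so logins never overlap."""
    with _login_lock:
        return login(*args, **kwargs)


class FakeServer:
    """Stand-in server that records the peak number of overlapping logins."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.peak = 0

    def login(self, user):
        with self._guard:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # pretend the login takes a moment
        with self._guard:
            self.active -= 1
        return f"token-for-{user}"


def run_logins(server, count):
    """Start count threads that all log in at once; return the observed peak overlap."""
    threads = [
        threading.Thread(target=serialized_login, args=(server.login, "cruisecontrol"))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return server.peak
```

A lock like this only covers callers inside one process, which fits a plugin loaded by a single CCNet service. Separate vault.exe processes started by concurrent builds would need a cross-process mechanism instead, such as a named mutex or a lock file.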

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Mon Jun 01, 2009 10:30 am

I just completed my testing of the new plug-in and sent the result to you.

jeremy_sg
Posts: 1821
Joined: Thu Dec 18, 2003 11:39 am
Location: Sourcegear

Re: Seemingly random server connection problem

Post by jeremy_sg » Mon Jun 01, 2009 10:32 am

I'll respond offline.