Seemingly random server connection problem

If you are having a problem using Vault, post a message here.

Moderator: SourceGear

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Seemingly random server connection problem

Post by davenovak » Wed May 06, 2009 2:25 pm

Hello --

We are using the latest version of Vault (4.1.4) and our build process makes use of the Vault command-line client. Once or twice a day, however, we run into the following error when the build process attempts to pull files from Vault:

Code: Select all

"C:\Program Files\SourceGear\Vault Client\vault.exe" -host vault.servername.com -user Buildmachine -password ***** -repository Art get "$/Art 9.0.0/" -makewritable -setfiletime modification -merge overwrite -destpath E:\Temp\builds\wd.rad8CD22\  -verbose
<vault>
  <error>
    <exception>System.Exception: The connection to the server failed: server cannot be contacted or uses a protocol that is not supported by this client. Server was unable to process request. ---> Object reference not set to an instance of an object. ---> System.Web.Services.Protocols.SoapException: Server was unable to process request. ---> Object reference not set to an instance of an object.
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at VaultClientNetLib.ClientService.VaultService.Login(String strHostname, Boolean bUseFullFiles, String username, String strEncryptedPassword, String strRMKey, String& strAuthToken)
   at VaultClientNetLib.VaultConnection.Login(String strURLBase, String strUserLogin, String strPassword)
   at VaultClientOperationsLib.ClientInstance.Login(String urlbase, String username, String password)
   at VaultClientIntegrationLib.ServerOperations.Login(AccessLevelType altCommand, Boolean bAllowAuto, Boolean bSaveSession)
   --- End of inner exception stack trace ---
   at VaultClientIntegrationLib.ServerOperations.Login(AccessLevelType altCommand, Boolean bAllowAuto, Boolean bSaveSession)
   at VaultClientIntegrationLib.ServerOperations.Login()
   at VaultCmdLineClient.VaultCmdLineClient.ProcessCommand(Args curArg)
   at VaultCmdLineClient.VaultCmdLineClient.Main(String[] args)</exception>
  </error>
  <result>
    <success>False</success>
  </result>
</vault>
Process completed with exit code -1
Yet, we can kick off the build again and all is perfectly fine. What would cause such an error? And, more importantly, what can be done to fix it? There's nothing worse than getting "broken build" notifications only to find out that the "break" was the Vault command-line failed.

Thanks,

--Dave Novak

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Seemingly random server connection problem

Post by lbauer » Thu May 07, 2009 7:41 am

Are there any errors in the Vault server log that correspond to the time the build process fails?

Does the failure seem to be about the same time(s) each day?
Linda Bauer
SourceGear
Technical Support Manager

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Thu May 07, 2009 8:06 am

No errors found in the Vault server log (which makes sense as this is a client connection issue). The error itself can and does happen at virtually any time. However, it seems to happen more frequently if the build server is under load and/or currently making Vault client calls in concurrent build processes.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Seemingly random server connection problem

Post by lbauer » Thu May 07, 2009 2:46 pm

You could try the suggestions in this forum post about adjusting the client time-out:

http://support.sourcegear.com/viewtopic ... out#p48193
Linda Bauer
SourceGear
Technical Support Manager

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Thu May 07, 2009 4:46 pm

Which suggestion are you referring to? The one about adjusting the HKEY_CURRENT_USER\Software\SourceGear\Vault\Client\Settings\ServiceRequestTimeoutSeconds registry key or the disabling the client Use Expect: 100-Continue Headers option?

I can't see how the registry key can help since, as the previous poster mentioned, "7200 seconds is 2 hours. Is that really the login timeout? That's not going to make a difference to our issue as we know it definitely took less than 1 minute." And the Use Expect: 100-Continue Headers option is for low latency networks, and we are indeed all connected to the local area network running at 1Gbps. I'll try that option if you really think it's going to make a difference, but I don't see how it will.

Any other suggestions? This problem happens quite frequently.

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear
Contact:

Re: Seemingly random server connection problem

Post by Beth » Fri May 08, 2009 8:35 am

I'd like to add here to what Linda has already suggested. Can you look in your Vault server log and find the corresponding error(s) and post them?

Is it possible for you to try connecting from another client just to see if there's a difference, or try a client right on the server machine?
Beth Kieler
SourceGear Technical Support

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Fri May 08, 2009 8:41 am

Again, as mentioned previously, the server log does not contain this error. I've also been able to reproduce this same issue while being on the Vault server itself.

What could be causing this problem?

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear
Contact:

Re: Seemingly random server connection problem

Post by Beth » Fri May 08, 2009 9:01 am

About when did this start happening, or was this always something that was happening as long as you've had the product?

Is there antivirus or a software firewall on the server?

As for what it could be, I'm not sure yet. I'm not negating at all what Linda has suggested either.
Beth Kieler
SourceGear Technical Support

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Fri May 08, 2009 9:45 am

We've been seeing this off and on at least since we upgraded to Vault 4.x (over 6 months ago). However, we've been seeing it more frequently ever since we started using CCNet for Continuous Integration builds (using either the vault command-line and vault CCNet plugin). The issue definitely seems to be related to load on the server and happens most frequently if multiple process are all trying to log into Vault at the same time. I seriously doubt it is a firewall or anti-virus issue as I can reproduce the problem directly on the Vault server.

Is it possible there is a concurrency issue in the Vault client? Please note that the Vault "user" for our build process is all the same user. Is there any issue (such as a race condition) with having this same user log into multiple, concurrent instances of the Vault client (either the command-line Vault or CCNet plugin) from the same machine all at the same time?

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear
Contact:

Re: Seemingly random server connection problem

Post by Beth » Fri May 08, 2009 1:03 pm

Have you tried the Vault cc.net plugin? It doesn't need to re-login on every poll for changes. The documentation is in the download: http://www.sourcegear.com/downloads.html
Beth Kieler
SourceGear Technical Support

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Fri May 08, 2009 1:18 pm

Yes -- when I mentioned "Vault CCNet plugin" in my previous post, that's what I was referring to. Though it does not require re-login, all CCNet projects will log in at the exact same time if restarting CCNet service. Also, some of the failures we've seen are with our build service, which uses command-line Vault to pull files from Vault.

ian_sg
Posts: 787
Joined: Wed May 04, 2005 10:55 am
Location: SourceGear
Contact:

Re: Seemingly random server connection problem

Post by ian_sg » Fri May 08, 2009 2:32 pm

Hi Dave. Sorry you're having trouble with this. (I can almost hear the voice of HAL 9000 as I type this.) :)

It's likely that the problem is actually with the server. There are some known issues with the same user logging in concurrently, so it's almost always CC.NET that exposes the problem. It's mysterious that you're not seeing any errors in your server log, as I'd expect, but nonetheless I still suspect this is the problem.

Both methods of CC.NET integration have ways of dealing with the issue. The stock, command-line based version has parameters for retrying the login before reporting a failure: pollRetryAttempts and pollRetryWait (details here).

The plugin tries to stay logged in.

It sounds lke you've tried the plugin and are still have issues. Withe the command line version, have you tried tweaking the parameters I mentioned? I'd start by increasing the pollRetryWait from the 5 second default to 10 seconds.

If that doesn't work, we should probably get a closer look at your server log. If you could turn on the server's verbose logging and send us a log that includes a period of trouble, we can dig deeper.
Ian Olsen
SourceGear

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Sat May 09, 2009 1:40 pm

No, I missed those settings when configuring the Vault plugin for CCNet. I've since updated my configuration and am using 5 for pollRetryAttempts and 10 for pollRetryWait. Those settings look promising.

I looked again in the Vault server log, and recently found the following error (which repeated several times throughout the log):

Code: Select all

----5/9/2009 3:22:33 PM      cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Login 
----5/9/2009 3:22:33 PM      cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Object reference not set to an instance of an object.
   at VaultServiceSQL.VaultSqlSystem.GetLoginInfo(VaultSqlConn conn, String strLogin, String strHostname, String strSessionID, VaultUser vu, String& strDbHashPwd, Int32& nAttempts)
   at VaultServiceAPILib.VaultServiceAPI.AuthenticateAndLogin(VaultSqlConn conn, Boolean allowwebonly, String strLogin, String strDecryptedPassword, String strHostname, String strSessionID, VaultUser vu, String strDomain, Guid& gSessionGUID, Int32& nValidUserAttempts)
   at VaultServiceAPILib.VaultServiceAPI.Login(VaultSqlConn conn, Boolean allowwebonly, String strLogin, String strDecryptedPassword, String strHostname, String strSessionID, String strDomain, String& strToken, VaultUser& vu, Int32& nValidUserAttempts)
   at VaultServiceAPILib.VaultServiceAPISystem.DoLogin(Boolean allowwebonlyusers, Boolean bAdminLogin, HttpSessionState hss, String strHostname, String strLogin, String strPassword, String strRMKey, String& strAuthToken, String strDomain)
   at VaultServiceAPILib.VaultServiceAPISystem.DoLogin(Boolean allowwebonlyusers, Boolean bAdminLogin, HttpSessionState hss, String strHostname, String strLogin, String strPassword, String strRMKey, String& strAuthToken)
   at VaultService.VaultService.Login(String strHostname, Boolean bUseFullFiles, String username, String strEncryptedPassword, String strRMKey, String& strAuthToken) 
I'm not sure what that could mean, but it might shed some light on the problem.

Also, the original error I reported in my first post to this thread was actually for command-line Vault in our build process, which actually uses a different Vault user account (BuildMachine in this case). Is there any such retry tweaking available for command-line Vault that might help with this issue (as there is for the Vault CCNet plugin)?

Thanks for your help,

--Dave Novak

davenovak
Posts: 222
Joined: Mon Jan 15, 2007 2:15 pm
Location: Atlanta, GA

Re: Seemingly random server connection problem

Post by davenovak » Sun May 10, 2009 9:27 am

Every Sunday morning at 4:45am, our Vault server and Build machine reboot. But this shouldn't be a problem since we have CCNet turned off during the night until 5:00am. But once the CCNet projects started up this morning, about half of our 26 projects ended up in Exception status with the following client error:

Code: Select all

System.Exception: Wait timeout for mutex after 30000 milliseconds.
   at VaultLib.SystemMutex.Take(UInt32 ms)
   at VaultClientOperationsLib.CacheMember_LastStructureGetTime.set_LastStructureGetTime(VaultDateTime value)
   at VaultClientOperationsLib.ClientInstance.Refresh(Int64 knownServerRevision, Boolean isRetry, VaultRepositoryDelta& delta, Int64& returnedRevision, ChangeSetItemColl committedItems)
   at VaultClientOperationsLib.ClientInstance.Refresh()
   at ThoughtWorks.CruiseControl.Core.Sourcecontrol.FortressClient.Refresh()
   at ThoughtWorks.CruiseControl.Core.Sourcecontrol.FortressVaultSourceControl.GetModifications(IIntegrationResult from, IIntegrationResult to)
   at ThoughtWorks.CruiseControl.Core.Sourcecontrol.FilteredSourceControl.GetModifications(IIntegrationResult from, IIntegrationResult to)
   at ThoughtWorks.CruiseControl.Core.Sourcecontrol.QuietPeriod.GetModifications(ISourceControl sourceControl, IIntegrationResult lastBuild, IIntegrationResult thisBuild)
   at ThoughtWorks.CruiseControl.Core.IntegrationRunner.GetModifications(IIntegrationResult from, IIntegrationResult to)
   at ThoughtWorks.CruiseControl.Core.IntegrationRunner.Integrate(IntegrationRequest request)
And yes, I have changed the plugin options as described in my previous post.

On the server side, I'm seeing the following errors (which are repeated multiple times):

Code: Select all

----5/10/2009 5:08:32 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalTransaction.Commit()
   at System.Data.SqlClient.SqlTransaction.Commit()
   at VaultServiceSQL.VaultSqlConn.CommitTransaction()    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalTransaction.Commit()
   at System.Data.SqlClient.SqlTransaction.Commit()
   at VaultServiceSQL.VaultSqlConn.CommitTransaction()
----5/10/2009 5:08:32 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	Rolling Back a transaction   at VaultServiceSQL.VaultSqlConn.RollbackTransaction()
   at VaultServiceSQL.VaultSqlConn.CloseConn()
   at VaultServiceAPILib.VaultServiceAPI.GetRepositoryTreeDelta(VaultTreeManager tm, Boolean bAdminMode, Int32 nUserID, Int32 nRepID, Int64 nBaseTxID, VaultDateTime dtLastChange, Boolean bUseDBDeltaOnCacheMiss, VaultDateTime& dtLatestChange, Int64& nTargetTxID, VaultRepositoryDelta& rd, VaultIntDnld& dlOut)
   at VaultService.VaultService.GetRepositoryStructure(Int32 nRepID, Int64 nBaseRevision, Int64 nTargetRevision, VaultDateTime dtLastClientSecurityCheck, VaultDateTime& dtLatestServerSecurityCheck, Int64& nReturnTargetRevision, VaultRepositoryDelta& rd, Boolean bUseDBDeltaOnCacheMiss)
   at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Web.Services.Protocols.LogicalMethodInfo.Invoke(Object target, Object[] values)
   at System.Web.Services.Protocols.WebServiceHandler.Invoke()
   at System.Web.Services.Protocols.WebServiceHandler.CoreProcessRequest()
   at System.Web.Services.Protocols.SyncSessionlessHandler.ProcessRequest(HttpContext context)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
   at System.Web.HttpApplication.ApplicationStepManager.ResumeSteps(Exception error)
   at System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(HttpContext context, AsyncCallback cb, Object extraData)
   at System.Web.HttpRuntime.ProcessRequestInternal(HttpWorkerRequest wr)
   at System.Web.HttpRuntime.ProcessRequestNoDemand(HttpWorkerRequest wr)
   at System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr ecb, Int32 iWRType)
 
----5/10/2009 5:08:32 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
   at System.Data.SqlClient.SqlTransaction.ZombieCheck()
   at System.Data.SqlClient.SqlTransaction.Rollback()
   at VaultServiceSQL.VaultSqlConn.RollbackTransaction()    at System.Data.SqlClient.SqlTransaction.ZombieCheck()
   at System.Data.SqlClient.SqlTransaction.Rollback()
   at VaultServiceSQL.VaultSqlConn.RollbackTransaction()
----5/10/2009 5:08:32 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	System.NullReferenceException: Object reference not set to an instance of an object.
   at VaultServiceSQL.VaultSqlSCC.GetUserRepositoryLastChange(VaultSqlConn conn, Int32 nRepID, VaultDateTime& dtLastChg)    at VaultServiceSQL.VaultSqlSCC.GetUserRepositoryLastChange(VaultSqlConn conn, Int32 nRepID, VaultDateTime& dtLastChg)
----5/10/2009 5:08:32 AM     cruisecontrol--rtgbuild1.ddcinternal.com(10.150.4.230)--SSL Disabled	GetRepositoryStructure returned: FailPermissionDenied 
Please note that we stagger the start of our 26 projects by 15 seconds, with the hope of avoiding hammering the Vault server.

Any ideas as to what's going on here? Any suggestions for a solution?

ian_sg
Posts: 787
Joined: Wed May 04, 2005 10:55 am
Location: SourceGear
Contact:

Re: Seemingly random server connection problem

Post by ian_sg » Sun May 10, 2009 3:33 pm

Typically when this problem comes up, the culprit is lots of CC.NET projects logging in simultaneously to poll for changes. I'd be surprised if one login every 15 seconds caused any trouble. It would not surprise me if 20+ simultaneous logins of the same user had issues.

Because half the projects had exception statuses right after the projects restarted around 5:00, I suspect the staggering is not working the way you want it to. What mechanism do you use to stagger the start? Do the logs indicate that the 26 projects are logging in serially, separated by 15 seconds?

Unfortunately, with all of this running on one machine, I think it's going to be difficult to get 26 projects working well via the command line integration with CC.NET. The plugin might be a better alternative.

With the plugin, you still want to stagger the initial startup, to avoid 26 concurrent logins from the same user. But from that point forward, those projects will log in only when the connection is severed or the Vault server is restarted. I don't think you've mentioned the specific problem(s) you encountered with the plugin. Would you be willing to try that again? Note that the configuration blocks are virtually identical, so it should be relatively straightforward to switch, even for 26 projects.
Ian Olsen
SourceGear

Post Reply