Problems after upgrade to 5.0.3


plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Problems after upgrade to 5.0.3

Post by plexus » Wed Jul 14, 2010 12:17 pm

We have about 70 developers using Vault. About 2 months ago we upgraded from 4.1.4 to 5.0.3. We are using SQL 2005. Over the next month, all of the developers migrated to VS2010. Our 2 main repositories have the following attributes: revision 38470, 13118 files, 1949 folders; and revision 158783, 24675 files, 1021 folders.


Since the Vault upgrade, we have had fairly frequent problems connecting to the server. The fix is for the user to reset the Vault client's repository cache. Prior to the upgrade, I would get a call maybe once every couple of weeks and I would have the user clear the cache and they would be able to connect again. Now it is at least every couple of days, and some people have to do it every time after they reboot.

When they have the problem, the error they get depends on whether they are in VS2010 or the Vault client. If they are in VS2010, they get a message saying "The associated source control database could not be accessed." and are given 4 choices, including working offline, removing source control bindings, or changing the bindings. If they are using the Vault client, they get this error:

"The Vault server could not be contacted to perform the operation. Your network connection to the server may have been interrupted. Please verify your network settings using the Options dialog under the Tools menu in the Vault GUI Client.The underlying connection was closed: The connection was closed unexpectedly."

If they then reset the client repository cache, they are able to connect again.

When I try connecting with the Vault client and have the problem, the following happens:

The server accepts my credentials and I see the login in the sgvault.log on the server:

----7/14/2010 11:52:19 AM [name and address removed]--SSL Disabled Login

I then get the list of repositories and choose one of the large ones.

A little over a minute later I get the "The Vault server could not be contacted to perform the operation. Your network connection to the server may have been interrupted. Please verify your network settings using the Options dialog under the Tools menu in the Vault GUI Client.The underlying connection was closed: The connection was closed unexpectedly." error. There are no further messages in the server log, but if I look in the client log I see this:

Client log
7/14/2010 11:53:25 AM <generic>: [GUIClientWorkerThread:6484] [System.Exception: The Vault server could not be contacted to perform the operation. Your network connection to the server may have been interrupted. Please verify your network settings using the Options dialog under the Tools menu in the Vault GUI Client.The underlying connection was closed: The connection was closed unexpectedly.
at VaultClientNetLib.VaultConnection.GetRepositoryStructure(Int32 nRepID, Int64 nSrcRevision, Int64 nDestRevision, Boolean bRequestDBDeltaOnCacheMiss, Int64& nReturnDestRevision, VaultDateTime dtLastCheck, VaultDateTime& dtLatestCheck, VaultRepositoryDelta& rd)
at VaultClientOperationsLib.ClientInstance.Refresh(Int64 knownServerRevision, Boolean isRetry, VaultRepositoryDelta& delta, Int64& returnedRevision, ChangeSetItemColl committedItems)
at VaultClientOperationsLib.ClientInstance.SetActiveRepositoryID(Int32 id, String username, String uniqueRepositoryID, Boolean doRefresh, Boolean updateKnownChangesAll)
at VaultClientPresentationLib.GUIClientInstance.ChooseRepository(Boolean forceDialogShow, String inProfile)
at VaultClientPresentationLib.GUIClientThread.ProcessCommand(GUIClientThreadCommand command, GUIClientThreadCommandResult& outputResult)]The Vault server could not be contacted to perform the operation. Your network connection to the server may have been interrupted. Please verify your network settings using the Options dialog under the Tools menu in the Vault GUI Client.The underlying connection was closed: The connection was closed unexpectedly.
at VaultClientNetLib.VaultConnection.GetRepositoryStructure(Int32 nRepID, Int64 nSrcRevision, Int64 nDestRevision, Boolean bRequestDBDeltaOnCacheMiss, Int64& nReturnDestRevision, VaultDateTime dtLastCheck, VaultDateTime& dtLatestCheck, VaultRepositoryDelta& rd)
at VaultClientOperationsLib.ClientInstance.Refresh(Int64 knownServerRevision, Boolean isRetry, VaultRepositoryDelta& delta, Int64& returnedRevision, ChangeSetItemColl committedItems)
at VaultClientOperationsLib.ClientInstance.SetActiveRepositoryID(Int32 id, String username, String uniqueRepositoryID, Boolean doRefresh, Boolean updateKnownChangesAll)
at VaultClientPresentationLib.GUIClientInstance.ChooseRepository(Boolean forceDialogShow, String inProfile)
at VaultClientPresentationLib.GUIClientThread.ProcessCommand(GUIClientThreadCommand command, GUIClientThreadCommandResult& outputResult)

Stack Trace:
at VaultClientPresentationLib.GUIClientInstance.ShowCommandException(Exception e, IWin32Window dialogOwner)
at VaultClientPresentationLib.GUIClientThread.ProcessCommand(GUIClientThreadCommand command, GUIClientThreadCommandResult& outputResult)
at VaultClientPresentationLib.GUIClientThread.Start()
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Again, if I then reset the client cache I can connect again.

Looking at a packet capture of the conversation between my PC and the server I see this:
I see the packets corresponding to the login and display of the repository list. When I choose a repository, the client POSTs to vaultservices.asmx with a SOAPAction of "GetRepositoryStructure" and an "Expect: 100-continue" header. The server responds with a "100 Continue" response. The client then sends the XML for the repository I chose. The server responds with an ACK packet. 61 seconds later, the server still has not started sending repository information to the client, so the client issues a FIN packet, which the server immediately ACKs. 58 seconds after this, the server begins sending the repository info to the client, even though the connection is closed. The client responds with RST packets.

Looking at this it appears the server is not responding in a timely fashion, so the client gives up. But if that were the problem, why does resetting the client repository cache fix the problem?

Further, I am seeing errors like these pretty frequently:

----7/14/2010 8:48:56 AM [Client Name and Address Removed]--SSL Disabled System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
at VaultServiceSQL.VaultSqlSCC.GetRepositoryTreeDelta(VaultSqlConn conn, Int32 nRepID, Hashtable htSharedItems, Int64 nBaseRevID, Int64 nTargetRevID, VaultRepositoryDelta& rep) at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
at VaultServiceSQL.VaultSqlSCC.GetRepositoryTreeDelta(VaultSqlConn conn, Int32 nRepID, Hashtable htSharedItems, Int64 nBaseRevID, Int64 nTargetRevID, VaultRepositoryDelta& rep)

They seem to come in groups, and are not necessarily related to the time the clients are having the problem above.

I am willing to provide server and client logs, as well as the packet capture if necessary.

Any help would be appreciated.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Problems after upgrade to 5.0.3

Post by lbauer » Wed Jul 14, 2010 1:43 pm

When a user clears their cache, at the next login, the client gets the whole tree from the repository, rather than calculating the delta. It can be faster if there's a problem calculating the delta.

It does look like you're getting some type of delays in communication. Is SQL Server on a different machine than the Vault Server? Has anything changed on your network?

Have you done database maintenance recently to optimize database response?

http://support.sourcegear.com/viewtopic.php?t=2924

I also recommend upgrading clients and server to Vault 5.0.4, as we fixed several bugs related to Visual Studio integration.
Linda Bauer
SourceGear
Technical Support Manager

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Wed Jul 14, 2010 2:28 pm

Linda,

The SQL server is on the same machine as the Vault application. We did DB index rebuilds based on the link you provided last month. We did not do the checkdb however.

About 1.5 months before the upgrade the developers moved to a new building that had a slower connection to us. We did start to get more issues when that happened, but the issues really started to be more frequent after the upgrade. A couple of weeks ago we finally got our dedicated 100MB connection to the new building and people are still having this issue. Plus, I am in the same building as the Vault server and have the issue sometimes with my client, and there have been no network changes in this building.

So it seems that other than the upgrade, none of those changes appear to be the culprit.

Does the 5.0.4 upgrade have any other fixes than the integration with VS? Like I said, my Vault client has the issue without even using VS2010, and I know some of the developers have seen it when trying to connect with just the client.

One other thing I am thinking of: We have a few copies of some of these large repositories in the DB that were used for testing various things. Do you think deleting those repositories might help with the speed? It seems like reducing the number of records might help.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Problems after upgrade to 5.0.3

Post by lbauer » Fri Jul 16, 2010 6:59 am

How big is the sgvault database and the database log file?

What are the specs of the Vault Server machine -- OS, how much RAM, CPU, etc.? Is there anything else besides Vault and SQL Server on that machine?

If you run operations from a Vault Client on the Vault Server machine, and use "localhost" for the Vault Server name, is performance still affected? This may rule out any network issues.

You could certainly delete the unused test repositories. (Back up your databases first!) That would make the database smaller. It may or may not help with performance, because it's the size and structure of a specific repository that affects performance while working in that repository.

We would like to see a copy of the Vault Server log. Send the log to support at sourcegear.com, Attn: Linda. Please include a link to this forum post.
Linda Bauer
SourceGear
Technical Support Manager

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Mon Jul 26, 2010 2:28 pm

Linda,

Sorry for the delay. The sgvault mdf is 10GB and the log is 4GB. Here are stats on the server:

Server: HP Proliant DL360G5
OS: Windows 2008 x64 (not R2)
RAM: 6GB
CPU: Intel XEON Quad Core 2.83GHz
Drives: 10k RPM SAS in RAID 1
NIC: Gigabit NIC

This server does also run CruiseControl .Net for building and deploying .Net projects.

I am having an issue today with a particular developer. He is responsible for our largest folder, which contains almost 3000 files. Doing a checkin routinely takes 2-3 minutes. Looking at packet captures, I can see the request made, then the server doesn't send a response for 2-3 minutes.

I will email the server log to you.

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Mon Jul 26, 2010 2:30 pm

Also, I found this thread:

http://kb.sourcegear.com/FortressHelp/v ... 51&start=0

and although it is for a much older version, I am wondering if some of it might still apply to our situation?

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Mon Jul 26, 2010 2:50 pm

Also, we are seeing some disk queueing related to TempDB. There seems to be more activity in TempDB than in the sgvault DB. We are looking into moving this DB to a SAN volume with more spindles to handle the load, but if you have any ideas about why TempDB would be getting hit so hard, and ways to reduce it, we might be able to avoid moving it.

Thanks.

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Tue Jul 27, 2010 11:55 am

I don't know if this is the cause of all of the problems, but I am fairly convinced that the example I gave of my experience in the first post is due to getting a repository diff. I have been watching the disk queue and % Disk Time for periods of high activity. When that happens I can see in Reliability and Performance Monitor that TempDB is what is causing the high disk activity on that drive. Also, when I look in packet captures for this time, I see the issue start when someone did a GetRepositoryStructure request:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Header>
<VaultAuth xmlns="http://www.sourcegear.com/schemas/vault">
<Token>[removed]</Token>
</VaultAuth>
</soap:Header>
<soap:Body>
<GetRepositoryStructure xmlns="http://www.sourcegear.com/schemas/vault">
<nRepID>6</nRepID>
<nBaseRevision>343593</nBaseRevision>
<nTargetRevision>-1</nTargetRevision>
<dtLastClientSecurityCheck>
<Ticks>583644240000000000</Ticks>
</dtLastClientSecurityCheck>
<dtLatestServerSecurityCheck>
<Ticks>583644240000000000</Ticks>
</dtLatestServerSecurityCheck>
<nReturnTargetRevision>-1</nReturnTargetRevision>
<bUseDBDeltaOnCacheMiss>true</bUseDBDeltaOnCacheMiss>
</GetRepositoryStructure>
</soap:Body>
</soap:Envelope>

This request takes about a minute, then the server sends the response, then the client sends a GetCheckOutListChanges request:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Header>
<VaultAuth xmlns="http://www.sourcegear.com/schemas/vault">
<Token>[Removed]</Token>
</VaultAuth>
</soap:Header>
<soap:Body>
<GetCheckOutListChanges xmlns="http://www.sourcegear.com/schemas/vault">
<nRepID>6</nRepID>
<nCheckoutListRevision>395841</nCheckoutListRevision>
<dtLastLockDate>
<Ticks>583644240000000000</Ticks>
</dtLastLockDate>
</GetCheckOutListChanges>
</soap:Body>
</soap:Envelope>


This takes about 40 seconds. All during this time TempDB is the highest activity on the drive that is having the disk queueing and activity problems.
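For anyone comparing several captured requests, a minimal Python sketch like the one below can pull out the fields that matter for the delta question (repository ID, base revision, and whether the client asked for a database delta on cache miss). It is only an illustration: the element names come from the GetRepositoryStructure capture above, the payload is trimmed to the fields the sketch reads, and treating `Ticks` as a .NET tick count (100-nanosecond intervals since 0001-01-01) is an assumption about how Vault serializes its dates.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
VAULT_NS = "http://www.sourcegear.com/schemas/vault"

# Trimmed copy of the captured request (token and unused fields omitted).
REQUEST = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetRepositoryStructure xmlns="http://www.sourcegear.com/schemas/vault">
      <nRepID>6</nRepID>
      <nBaseRevision>343593</nBaseRevision>
      <nTargetRevision>-1</nTargetRevision>
      <dtLastClientSecurityCheck>
        <Ticks>583644240000000000</Ticks>
      </dtLastClientSecurityCheck>
      <bUseDBDeltaOnCacheMiss>true</bUseDBDeltaOnCacheMiss>
    </GetRepositoryStructure>
  </soap:Body>
</soap:Envelope>
"""


def ticks_to_datetime(ticks):
    # Assumption: Ticks follows the .NET convention of 100 ns units
    # counted from 0001-01-01 00:00:00.
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)


def summarize_request(xml_text):
    """Return the delta-related fields of a captured SOAP request."""
    root = ET.fromstring(xml_text.strip())
    call = root.find(f"{{{SOAP_NS}}}Body")[0]  # the operation element
    ns = {"v": VAULT_NS}
    return {
        "operation": call.tag.split("}")[1],
        "repository_id": int(call.findtext("v:nRepID", namespaces=ns)),
        "base_revision": int(call.findtext("v:nBaseRevision", namespaces=ns)),
        "delta_on_cache_miss":
            call.findtext("v:bUseDBDeltaOnCacheMiss", namespaces=ns) == "true",
        "last_security_check": ticks_to_datetime(
            int(call.findtext("v:dtLastClientSecurityCheck/v:Ticks", namespaces=ns))
        ),
    }


if __name__ == "__main__":
    for key, value in summarize_request(REQUEST).items():
        print(f"{key}: {value}")
```

Under that tick assumption, the `Ticks` value in the capture decodes to a date in 1850, which suggests it is a placeholder rather than a real check time, though that is a guess.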

Again, this sounds very similar to what the symptoms were in the thread link I posted. Should we try the same things: increase the tree size, or is there a way to disable the tree diff altogether? Like in the thread link, usually if you clear the client cache (and hence the local version of the repository), it downloads the whole tree again and is relatively quick.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Problems after upgrade to 5.0.3

Post by lbauer » Tue Jul 27, 2010 12:34 pm

I wouldn't adjust the tree size just yet. We haven't had users change this setting for some time.

If it's taking too long to get the repository delta, you can have your users change a setting in their Vault client.

This setting is in the Vault Client under Tools->Options-> Network Settings->Request database delta on repository cache miss. If this is checked, uncheck it.

Let me know if it makes a difference.
Linda Bauer
SourceGear
Technical Support Manager

plexus
Posts: 27
Joined: Thu Dec 13, 2007 12:34 pm

Re: Problems after upgrade to 5.0.3

Post by plexus » Tue Jul 27, 2010 2:34 pm

Linda,

Changing that client setting seems to have helped that one developer with 3000 items in a single folder. He said prior to the change, probably 90% of his checkins would take minutes each. After changing it, he did about 10 in a row and all were quick. Another developer reports that things just seem quicker in general after changing it. We are going to have a few more change the setting and see what they report.

I do have a question though: If all 70 developers make this change, will there be a performance impact on the server worse than what the deltas cause? I know the deltas seem to be a performance impact now, but I am just wondering what is going to be affected if we have everyone do this?

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Re: Problems after upgrade to 5.0.3

Post by lbauer » Tue Jul 27, 2010 5:08 pm

Changing this client setting will not affect the database, but with large repositories, it will affect network I/O and throughput.

I consulted with one of our developers, and we determined that you might want to try changing the TreeManagerSize, which you asked about earlier.

If there have been many revisions of the folder tree since a user last logged in to Vault, it can take a long time for Vault to diff (spgetrepositorydelta) the user's cached tree against the actual tree and then download the tree changes.

Changing the TreeManagerSize will cause more trees to be stored in memory and make logins faster. This will also cause the Vault server to use more memory.

If possible, determine how many commits may occur in any given repository for a two or three day period. Then use this number for TreeManagerSize in vault.config. If you don't know, try a TreeManagerSize value around 250, and restart Vault (iisreset.exe).
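One rough way to arrive at that number is to take the commit timestamps for a repository and find the busiest three-day stretch. The Python sketch below assumes you already have those timestamps as a list; how to export them from Vault is not covered here, and the sample data is made up for illustration.

```python
from datetime import datetime, timedelta


def max_commits_in_window(commit_times, window=timedelta(days=3)):
    """Largest number of commits falling inside any span shorter than `window`."""
    times = sorted(commit_times)
    best = 0
    start = 0  # index of the oldest commit still inside the window
    for end, current in enumerate(times):
        # Drop commits that are a full window (or more) older than `current`.
        while current - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    # Hypothetical commit times, in hours after an arbitrary starting point.
    base = datetime(2010, 7, 1)
    sample = [base + timedelta(hours=h) for h in (0, 12, 24, 70, 72, 240)]
    print(max_commits_in_window(sample))
```

The result for your busiest repository would be a starting point for TreeManagerSize, to be adjusted while watching server memory.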

The TreeManagerSize is set in Vault Server's vault.config file in the Vault Service directory.
By default the XML element TreeManagerSize is <TreeManagerSize>-1</TreeManagerSize>. Change it to 250.
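Editing the file by hand is perfectly fine. If the change needs to be scripted, a minimal Python sketch could look like the following. The surrounding layout of vault.config shown here is hypothetical; only the TreeManagerSize element and its default of -1 come from this post. Back up the real file first, and note that re-serializing with ElementTree drops XML comments and the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for vault.config; the real file has more settings.
SAMPLE_CONFIG = """
<configuration>
  <TreeManagerSize>-1</TreeManagerSize>
</configuration>
"""


def set_tree_manager_size(config_xml, size):
    """Return the config text with the single TreeManagerSize element set to `size`."""
    root = ET.fromstring(config_xml.strip())
    nodes = list(root.iter("TreeManagerSize"))
    if len(nodes) != 1:
        raise ValueError(
            f"expected exactly one TreeManagerSize element, found {len(nodes)}"
        )
    nodes[0].text = str(size)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_tree_manager_size(SAMPLE_CONFIG, 250))
```

The Vault server still needs a restart (iisreset.exe) after the change for the new value to take effect.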

Keep an eye on the memory of the Vault Server. If there are too many trees cached, you may run into an out of memory condition or create thrashing for virtual memory. In that case, decrease the tree manager size a bit.

After the change, everyone will hit spgetrepositorydelta the first time, and then later refreshes should be handled by the Vault Server's cache.
Linda Bauer
SourceGear
Technical Support Manager
