Vault process becomes CPU bound.

Moderator: SourceGear

Guest

Post by Guest » Fri Nov 05, 2004 12:39 pm

We’re running Vault (2.0.6.2219) on a W2K3 server with FogBUGZ, SQL 2K and SharePoint Services. Apparently (before I became involved) the Vault service was running OK. Now, as webmaster, I’m getting infrequent complaints of Vault locking up. What I see is the process running at 100% CPU while attempts to connect with the client or VS.NET fail. Restarting IIS kills the process consuming 100% CPU, but only until the developer who complained tries to resume what he was doing. Rebooting the server restores service for a longer time. There doesn’t appear to be any time pattern.

There are a series of event log entries like

“A process serving application pool 'VaultAppPool' failed to respond to a ping. The process id was '652'.” 
immediately following an entry like

“A worker process with process id of '3816' serving application pool 'VaultAppPool' has requested a recycle because the worker process reached its allowed processing time limit.”  
These messages occurred hours before some of the complaints but not always and not before the detailed examination of log files noted below.

The vault log only reports

“A configuration error occurred reading [DelayThreshold] from Vault.Config.  ConfigReader reported the following error: The section could not be found within vault.config”. 
This message precedes system startup messages.

The only other error messages in the vault log are like:

An error occurred during the deletion of rnjzi355ryonksagbol5uifg.  Please check that the session was removed from the database. Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

An error occurred during the deletion of session, rnjzi355ryonksagbol5uifg.  Please check that the session was removed from sgvault.dbo.tblsession in the database. Error: FailDBDelete
These messages don’t seem to have any time correlation with the complaints.

On the other hand, these associations could be purely coincidental. A detailed examination of log files for one event shows only a block of time between the onset of symptom and system reboot with no entries in the sgvault.log file. There were no errors before the event and only the ConfigReader message before the System Start message after the computer was rebooted.
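The "block of time with no entries" check described above can be automated. The following is a minimal sketch, not SourceGear tooling: it assumes each log line begins with a timestamp like `11/01/2004 09:02:41 AM` (an assumed layout that must be adjusted to the real sgvault.log), and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, for illustration only; adjust to the real log.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_timestamps(lines):
    """Yield a datetime for every line that starts with a parsable timestamp."""
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            yield datetime.strptime(" ".join(parts[:3]), TIMESTAMP_FORMAT)
        except ValueError:
            continue  # not a timestamped line (e.g. a continuation line)


def find_gaps(timestamps, threshold):
    """Return (start, end) pairs where consecutive entries are more than threshold apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


# Made-up sample lines, NOT taken from the real log.
sample = [
    "11/01/2004 09:00:05 AM Login",
    "11/01/2004 09:02:41 AM GetLatest",
    "11/01/2004 01:15:10 PM System Started",
    "   continuation text without a timestamp",
]

gaps = find_gaps(parse_timestamps(sample), timedelta(minutes=30))
for start, end in gaps:
    print(f"silent from {start} to {end} ({end - start})")
```

Running this over the whole log would list every silent window, which could then be compared against the reboot times in the system log.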

Scheduled maintenance for next Monday includes disabling the recycling of worker processes as suggested by
http://support.sourcegear.com/viewtopic.php?t=1014

I’m not convinced that this will be a solution because of the log file analysis findings. My question to SourceGear is: what the hell could be causing the web service process, associated with its own application pool, to consume 100% CPU for hours at a time?

CONFIG:
Windows 2K3 Enterprise Edition, P3 600 MHz, 512 MB, patched to date
MS SQL 2000 sp3
FogBUGZ Version 3.1.9 (DB 328)
Vault (2.0.6.2219)
SharePoint Services sp1 is installed on a separate virtual server

JeromeThomas
Posts: 5
Joined: Thu Nov 04, 2004 3:22 pm
Location: USA

Post by JeromeThomas » Fri Nov 05, 2004 12:41 pm

Sorry, thought I was logged in when I made that post.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear
Contact:

Post by jclausius » Fri Nov 05, 2004 1:19 pm

Off the top of my head, I don't know what would be causing the problem.

1) Is IIS 6 recycling quite a bit, and does your situation involve having 100s or 1000s of checked out files?

2) What is taking up 100% - w3wp.exe or a different process?

3) What specific action is the user taking which causes the server to go haywire? A checkout? Branch? History lookup?
Jeff Clausius
SourceGear

JeromeThomas
Posts: 5
Joined: Thu Nov 04, 2004 3:22 pm
Location: USA

Post by JeromeThomas » Fri Nov 05, 2004 2:56 pm

jclausius wrote:Off the top of my head, I don't know what would be causing the problem.

1) Is IIS 6 recycling quite a bit, and does your situation involve having 100s or 1000s of checked out files?

2) What is taking up 100% - w3wp.exe or a different process?

3) What specific action is the user taking which causes the server to go haywire? A checkout? Branch? History lookup?

IIS has recycled 23 times since 10/1, but w3wp.exe was found at 100% CPU only 9 times. The first time it was brought to my attention I thought the two were related: the event log was full of ping timeouts all the way back to the recycle. Ping timeouts always follow a recycle, but I've also responded to Vault complaints where these were not immediately in the event log. The timing on this is all over the board. Recycles occurred anywhere from 30 minutes to 12 hours before the server was rebooted to clear up Vault, and it recycled anywhere from one to three times before a reboot.

Matching the reboots to the Vault log reveals only an absence of Vault log entries while the repositories are inaccessible. There are no error messages immediately preceding the reboots. All the reboot entries in the system log have messages like "rebooting because vault is spazing out again". Yeah, that's informative.

Here's an interesting tidbit. There are no complaints in the log prior to installing .NET 1.1 sp1.

It looks as if one of the developers was trying to fix this before I was asked to look into it. SharePoint (installed on a separate virtual server) was shut down and all its app pools stopped. Vault was assigned its own app pool, Vault logs were turned on...

One of the lead developers has this to say...
We do not use the check-out/edit/check-in model. We use the edit/merge/check-in model. That being the case, we rarely have any files that are officially "checked-out". The total number of files in the repository is not extremely large - probably no more than a couple of thousand files if that. A normal "check-in" operation is usually only a handful of files - perhaps a dozen or so (sometimes more, frequently less).

The symptom is that the Vault client cannot connect to the server. I'm guessing it's because the Vault web service never responds. From the client perspective, it's like the server does not even exist.

I have not been able to determine any specific action that triggers the problem. We don't do branching, so it's not that. We occasionally do Labels, but not frequently. The most common actions are:

- check-in
- show differences
- get latest version (recursive, locked on main folder)
- show history
- we have a continuous integration server (Draco.NET) that uses the Vault command-line client to check for updates and then automatically get new files to perform an automated build

I have not seen any pattern as to one specific action that triggers the problem. I think that the Vault server application has a verbose logging mode - perhaps we could turn on verbose mode and then examine the logs the next time it happens to see if that sheds any light on it.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear
Contact:

Post by jclausius » Fri Nov 05, 2004 3:51 pm

Are there any other web apps installed on this server, particularly MS Front Page? If it was OK before .Net 1.1 SP1, has anything else changed on the machine since then?
Jeff Clausius
SourceGear

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear
Contact:

Post by jclausius » Fri Nov 05, 2004 4:17 pm

I forgot to ask, which process is running at 100% utilization?
Jeff Clausius
SourceGear

JeromeThomas
Posts: 5
Joined: Thu Nov 04, 2004 3:22 pm
Location: USA

Post by JeromeThomas » Fri Nov 05, 2004 4:49 pm

MS Front Page is not installed on this (default web) virtual server. The other web apps look to be the Crystal Reports viewer from VS.NET 2K3, Infragistics UltraWeb controls, and MS SQL Reporting Services, all of which were installed prior to SG and FB (May). My developers report that SG worked fine before October. The only changes in the notes are the addition of SharePoint on a separate virtual server in June and .NET 1.1 sp1 on 9/27. The first complaint about SG is 10/4.

I just checked the install log and the last thing installed was .NET 1.1 sp1 on 9/27, ditto for the windowsupdate log.

Last week I installed Process Explorer from sysinternals.com to try to figure out what's going on.

The process that pegs the CPU is w3wp.exe. It's one of two running on the server. Its owner is NETWORK SERVICE. The other instance of w3wp.exe is owned by IWAM_WIN2K3. VaultAppPool's identity is NETWORK SERVICE, DefaultAppPool's is IWAM_WIN2K3.
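Matching a worker process to its pool by owner works here because the two pools run under different identities. When pools share an identity, the PID-to-pool mapping can be read from the `iisapp.vbs` script that ships with Windows Server 2003 (`cscript %SystemRoot%\system32\iisapp.vbs`). Below is a minimal sketch that parses such output. The line shape in the regular expression is an assumption to verify against the real output, and the PIDs in the sample are invented.

```python
import re

# Assumed line shape of iisapp.vbs output; verify against the real output.
LINE_RE = re.compile(
    r"W3WP\.exe\s+PID:\s*(\d+)\s+AppPoolId:\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def pools_by_pid(text):
    """Map each worker-process PID to the application pool it serves."""
    return {int(pid): pool for pid, pool in LINE_RE.findall(text)}


# Invented sample output, for illustration only.
sample_output = (
    "W3WP.exe PID: 1504   AppPoolId: DefaultAppPool\n"
    "W3WP.exe PID: 2988   AppPoolId: VaultAppPool\n"
)

mapping = pools_by_pid(sample_output)
for pid, pool in sorted(mapping.items()):
    print(f"{pid}: {pool}")
```

The PID shown as pegged in Process Explorer can then be looked up in the resulting dictionary.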

JeromeThomas
Posts: 5
Joined: Thu Nov 04, 2004 3:22 pm
Location: USA

Post by JeromeThomas » Fri Nov 05, 2004 4:57 pm

Oh yeah, there's an installation of VS.NET 2K3, but that was installed in September 2003.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear
Contact:

Post by jclausius » Fri Nov 05, 2004 10:40 pm

Nothing seems extraordinary in your description.

I did uncover some reports that MS FrontPage was causing 100% utilization of IIS's ASP.Net process, but you've eliminated that as the problem.

Searching http://groups.google.com for "Sharepoint" "100" "cpu" does generate a fair number of hits. I wonder...

Would it be too much to ask that Sharepoint be completely removed for a while? It would be one less variable in the equation. Note, I don't know what "leftover" IIS configuration settings might still exist after Sharepoint is uninstalled (like Sharepoint's web.config in %SYSTEMDRIVE%\Inetpub\wwwroot).

Otherwise, could you temporarily move the Vault server to a different machine (but keeping links to the same database)? Preferably a machine which has never had Sharepoint installed.

The only other thing that comes to mind is placing the Vault server into Debug Logging mode from within the Vault Admin Tool. This allows the server to record all web method hits. If there is anything going on there, at least there will be a log of the web method. Perhaps the data there can give an indication what is happening.
Jeff Clausius
SourceGear

JeromeThomas
Posts: 5
Joined: Thu Nov 04, 2004 3:22 pm
Location: USA

Post by JeromeThomas » Wed Nov 17, 2004 4:28 pm

Just posting a followup to this thread. It's been over a week since altering the IIS configuration of the VaultService and its virtual server (default). Normally, we would have experienced at least 4 instances of users being unable to connect and repeated event log entries of ping timeouts. To date, we've not experienced any inaccessibility issues and there were only two ping timeouts during peak load periods.

This was accomplished by implementing the changes described at http://support.sourcegear.com/viewtopic.php?t=1014
Additionally, VaultService was assigned its own AppPool and that pool's process was limited to 50% CPU. Near as I can tell, this has resolved the issue, so I'm closing the case.
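For anyone wanting to script similar settings rather than use IIS Manager, here is a hedged sketch that builds `adsutil.vbs` commands for an IIS 6 application pool. It is not taken from the linked article and may not match exactly what was changed on this server. It assumes the default script path `C:\Inetpub\AdminScripts\adsutil.vbs` and relies on two IIS 6 metabase properties: `CPULimit`, expressed in 1/1000ths of a percent, and `PeriodicRestartTime`, where 0 turns off time-based recycling.

```python
def cpu_limit_value(percent):
    """Convert a percentage to the metabase CPULimit unit (1/1000ths of a percent)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return int(round(percent * 1000))


def adsutil_commands(pool, cpu_percent,
                     adsutil=r"C:\Inetpub\AdminScripts\adsutil.vbs"):
    """Build (but do not run) adsutil.vbs commands for one application pool.

    The script path is an assumed default; adjust it for the actual server.
    """
    base = f'cscript "{adsutil}" set W3SVC/AppPools/{pool}'
    return [
        f"{base}/CPULimit {cpu_limit_value(cpu_percent)}",
        f"{base}/PeriodicRestartTime 0",  # 0 disables time-based recycling
    ]


commands = adsutil_commands("VaultAppPool", 50)
for command in commands:
    print(command)
```

The sketch only prints the commands so they can be reviewed before being run on the server. What IIS does when the limit is exceeded depends on the pool's `CPUAction` setting, which this thread does not specify.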

Thank you all for your input
Jerome
