SOS requires restart

If you are having a problem using SourceOffSite, post a message here.

Moderator: SourceGear

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

SOS requires restart

Post by tamadeb » Tue Feb 21, 2006 8:39 pm

SOS Version: 4.1.2
VSS: 6.0d (Build 31222)
OS: Windows 2003 w/sp1
Hardware: 1 x 3.4 GHz Xeon w/o HT; 1 GB of RAM; 3 x 72 GB RAID 5

So I've been searching for help with this thing. Just when it looks like a thread is going somewhere, it ends without any conclusion.

What have I done so far to troubleshoot this .... headache?

First I rebuilt this server from scratch.
Then I loaded VSS 6.0d (Build 31222) and installed SOS 4.1.2. Made sure that the second NIC is disabled. Made sure HT is disabled.

Then my users would do their thing, and the next thing I know, they have to restart the SOS service because users can't log in. The login just hangs and nothing happens. Once they restart the service, about half a day later the cycle starts all over again. There is nothing in the event logs on the server. I enabled verbose messaging on the SOS server and I can't see anything there either.

I see hardly any CPU utilization or memory use. The SOS service lingers at about 64 MB. It doesn't want any more or any less.

I tried to troubleshoot this thing. At first I was convinced that it was the IT guy who built this server incorrectly, because he had version 3.5.3 on a Windows 2003 server. You know, that MSJVM issue. But now I don't think that is the case.

Or do we need a dedicated SOS server for each of the developers just to have something stable? I need some help with this issue. Having these developers halfway around the world is not helping with my sleep.

Any ideas?

Thanks

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Wed Feb 22, 2006 10:27 am

This sounds like a crash in the VSS automation component, since restarting the SOS Server gets things running again. The version of the VSS Client you're using on the SOS Server machine isn't one with a known issue, but we haven't extensively tested it with SOS. Is there another ssapi.dll on that machine? Perhaps the wrong one is registered:

http://support.sourcegear.com/viewtopic.php?t=255
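
A minimal sketch for the "is there another ssapi.dll" check: it walks one or more directory trees and lists every file named ssapi.dll, so extra copies are easy to spot. The function name and the search roots are hypothetical. It only finds copies on disk; it does not tell you which copy is registered.

```python
import os


def find_ssapi_copies(roots, target="ssapi.dll"):
    """Return sorted full paths of every file named `target` under `roots`.

    The file name comparison is case-insensitive. Roots that do not exist
    or cannot be read are skipped (os.walk ignores them by default).
    """
    wanted = target.lower()
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower() == wanted:
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, `find_ssapi_copies([r"C:\Program Files"])` (a hypothetical search root) would return the paths of all copies under that folder. More than one result suggests it is worth checking which copy is the registered one.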

You could try the Microsoft Hotfix version instead:

http://download.sourcegear.com/files/vss_60c_hotfix.zip

If these suggestions don't work, set SOS logging to Verbose and send me a copy of the log.txt file the next time SOS hangs.
Linda Bauer
SourceGear
Technical Support Manager

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

I've done what you had asked......

Post by tamadeb » Thu Feb 23, 2006 8:34 am

Here's my log file...

Thanks
Attachments
log.txt
(635.18 KiB)

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Mon Feb 27, 2006 11:43 am

Would you check in the SOS Server Manager, under General Settings->Idle Connections, whether you have a value set for timing out connections? If you uncheck this, do users still get disconnected?
Linda Bauer
SourceGear
Technical Support Manager

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

Post by tamadeb » Mon Feb 27, 2006 11:49 am

The setting is unchecked. Should I try checking it?

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Mon Feb 27, 2006 1:15 pm

No, it might cause additional disconnects. We're looking into the error messages in the log file.

Could some device like a firewall, etc. be closing connections after a certain amount of time or data has passed?
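
One rough way to test for this is to open a plain TCP connection to the server, leave it silent for a while, and then see whether it has been visibly closed. The sketch below is only that: the function name is hypothetical, the host and port are whatever your SOS Server listens on, and it only detects a close or reset that actually reaches the client. A device that silently drops idle connections would not be caught until data is next sent.

```python
import select
import socket
import time


def survives_idle(host, port, idle_seconds, connect_timeout=10.0):
    """Connect, stay silent for `idle_seconds`, then report whether the
    connection still looks open from the client side.

    Returns False if an orderly close or a reset has been received,
    True otherwise.
    """
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        time.sleep(idle_seconds)
        # Poll without blocking: is there anything to read on the socket?
        readable, _, _ = select.select([sock], [], [], 0)
        if not readable:
            return True  # nothing pending, so no close or reset was seen
        try:
            # Peek so any real data stays in the receive buffer.
            data = sock.recv(1, socket.MSG_PEEK)
        except OSError:
            return False  # connection was reset
        return data != b""  # an empty read means the peer closed
```

Running it with increasing idle times (for example 5, 15, and 60 minutes) from a remote site would show whether connections stop surviving past some threshold.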
Linda Bauer
SourceGear
Technical Support Manager

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

Post by tamadeb » Mon Mar 06, 2006 7:47 am

I checked our firewall and do not see anything that would restrict the time or the amount of traffic between sites.

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

Post by tamadeb » Fri Mar 17, 2006 3:14 am

This issue is still open...

Is there anything else that I can provide so that I can get some help with this?

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Fri Mar 17, 2006 9:30 am

There are many errors in the log that point to network problems outside of SOS:

2/21/2006 8:10:35 AM - 1: ParseStream created a null message!

2/21/2006 8:17:45 AM - 2: Error processing client request: at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 size, SocketFlags socketFlags)
at ClassicService.ProtocolMessage.ParseStream(Socket socket, Crypto crypto)
at ClassicService.Client.GetMessage()

2/21/2006 1:33:00 PM - The requested name is valid, but no data of the requested type was found

We have seen this error when there is a concurrency crash:

2/21/2006 8:37:12 PM - 22: Exception: A blocking operation was interrupted by a call to WSACancelBlockingCall

So it's possible there are network errors as well as issues with the VSS Automation Component.
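
To see how often each kind of error occurs, the log can be tallied with a short script. The sketch below assumes every entry starts with a timestamp in the format shown in the excerpts above, followed by " - " and an optional numeric prefix, which is treated here as an opaque ID. The function names are hypothetical, and other entries in a real log.txt may not fit this pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed entry shape, based only on the excerpts quoted above:
#   2/21/2006 8:10:35 AM - 1: ParseStream created a null message!
ENTRY = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (?:(\d+): )?(.*)$"
)


def parse_entry(line):
    """Return (timestamp, id_or_None, message), or None if the line is not
    the start of an entry (for example a stack-trace line starting "at")."""
    m = ENTRY.match(line.strip())
    if not m:
        return None
    stamp = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
    ident = int(m.group(2)) if m.group(2) else None
    return stamp, ident, m.group(3)


def tally_messages(lines):
    """Count how many entries carry each distinct message text."""
    counts = Counter()
    for line in lines:
        parsed = parse_entry(line)
        if parsed is not None:
            counts[parsed[2]] += 1
    return counts
```

Calling `tally_messages(open("log.txt"))` and printing `most_common(10)` on the result gives a quick picture of which errors dominate. The timestamps from `parse_entry` can be compared against the times users reported hangs.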

If the server hangs are being caused by a crash of the VSS automation component, I would suggest changing the version used by the SOS Server. We have not done extensive testing of the version you are using.

However, we do know that the Microsoft Hotfix version is one of the more stable versions to use with SOS. I had suggested trying this earlier -- are you using this now and still having issues?

http://download.sourcegear.com/files/vss_60c_hotfix.zip

If this doesn't help, the next things to try are:

Run Analyze on your VSS database to make sure database inconsistencies are not causing the VSS automation component and the SOS Server to hang:

http://support.sourcegear.com/viewtopic.php?t=50

Use -f and then -c until no more errors are reported.

Since you have 65 users, you also may want to consider splitting the load among two or more SOS Servers. You can install the SOS Server on one or more machines and have each SOS Server serve a certain number of users.
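
As a rough illustration of splitting the load, the sketch below assigns users to servers round-robin so each server gets a near-equal share. The function name, user names, and server names are all hypothetical; any grouping that suits your sites (for example by office or time zone) would work just as well.

```python
def assign_users(users, servers):
    """Assign each user to one server, round-robin over the sorted user list.

    Returns a dict mapping each server name to its list of users.
    """
    if not servers:
        raise ValueError("at least one server is required")
    assignment = {server: [] for server in servers}
    for index, user in enumerate(sorted(users)):
        assignment[servers[index % len(servers)]].append(user)
    return assignment
```

With 65 users and two servers this gives groups of 33 and 32. Each user would then point their SOS client at the server they were assigned to.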
Linda Bauer
SourceGear
Technical Support Manager

tamadeb
Posts: 6
Joined: Tue Feb 21, 2006 7:59 pm

found the fix

Post by tamadeb » Mon Apr 10, 2006 11:12 am

For anyone who has the database on a Novell server: you need to add the "sharable" attribute to the rights.dat file.

No more restarts since.
