server hangs

If you are having a problem using SourceOffSite, post a message here.

Moderator: SourceGear

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

server hangs

Post by blehenbauer » Tue Jul 27, 2004 8:04 am

We currently license 3.5.3 Server and are on the last legs of a 4.0 Server trial license.

In 3.5.3, we frequently experienced issues when multiple developers did simultaneous gets of a Visual Studio solution from a Visual SourceSafe database. The SOS Visual Studio IDE would hang and the database would become inaccessible using the SOS client. If the machines were left, they would stay in this state for hours. To get things moving again, we'd usually have to kill Visual Studio and restart the SOS server.

We had hoped to see this change when we used 4.0 Server, but as soon as more developers started using it, we began experiencing the same issue with the SOS clients hanging when everyone accessed the database simultaneously.

My guess is that this might be some kind of file locking issue? Are there any steps we can take to find the source?

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Thu Jul 29, 2004 7:31 am

You may be experiencing the concurrency bug in the VSS automation component.

See if either of these KB articles provides a solution:

http://support.sourcegear.com/viewtopic.php?t=10

http://support.sourcegear.com/viewtopic.php?t=1348
Linda Bauer
SourceGear
Technical Support Manager

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

no luck yet

Post by blehenbauer » Thu Jul 29, 2004 9:56 am

We tried this this morning. It was set to use apartment threading, but we have already had the hang. Here is some more information on our setup:

1. We have a SourceSafe database installed on a Windows 2003 server. This is running VSS 6.0d and has always been set to use "Both" for the threading model.

2. Some users in our office connect to this VSS database via a mapped drive, using the version of VSS installed on their personal computers, not using SOS.

3. Our developers always use SOS when referencing this VSS database from Visual Studio. Because our version of SOS (3.5.3) does not support Windows 2003 Server, SOS is on a W2K server that references the VSS database via a mapped drive.

I checked the registry on both the 2003 server that hosts the VSS database and the 2000 server that hosts the SOS server, and added the registry change to the 2000 server, but we had a hang again this morning.

As I said before, we just finished a trial version of 4.0 Server (which was on the 2003 server with the database, and this server always had the registry change), and we were having hangs on that server as well.

However, I'm wondering about the context that the automation component runs in. If I reference VSS from a mapped drive, does it use *my* installed version of the automation component and registry settings, or does it use the server's automation component? If it uses mine, would I need to update this registry setting on the computer of every user that opens the database with VSS on a mapped drive?

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

one more piece of information

Post by blehenbauer » Thu Jul 29, 2004 10:02 am

When we have a SOS hang, VSS never hangs. We can continue to access the VSS database using VSS from any machine (using mapped drives), but the SOS server will not work until the service is restarted.

In addition, when we had a hang on the 3.5.3 SOS server, it would cause the 4.0 SOS server on a different machine to hang, but the 4.0 server would recover when the 3.5.3 service was restarted. Because these are on different machines, it really sounds like the 3.5.3 service may have been locking a necessary file. Could this be some kind of file locking problem?

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Mon Aug 02, 2004 3:59 pm

VSS clients are totally independent of SOS, so they would be able to connect to the VSS database even if SOS is having problems, as long as VSS is fine.

The SOS Server connects to VSS through the SourceSafe automation component. This can crash or hang, bringing down the SOS Server as well. The automation component is provided by the VSS client on the SOS Server machine, so you don't need to do any registry changes for VSS on client machines.

What's strange here is that your SOS Servers seem to be interdependent. If these are on different machines, the only thing in common would be that they connect to the same srcsafe.ini file of the VSS database.

After the next hang, send me copies of the log.txt files from both SOS 4.0 and SOS 3.5.3 servers. They're in the SOS Server directories.
Linda Bauer
SourceGear
Technical Support Manager

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

Post by blehenbauer » Wed Aug 04, 2004 10:15 am

This morning, we had 4 developers all doing a get latest version during the same half hour.

They got frustrated and restarted the SOS server after their clients became unresponsive and did not recover in 15 minutes.

Looking at the logs, there is nothing to indicate anything other than heavy use. (I'll send you a copy of our log file as a PM.) The get latest version operation was still in progress at the time they restarted the server.

At this point I am wondering if we are just expecting it to be more responsive than it is capable of being. 4 developers is not what I would consider to be heavy use, but maybe we are being impatient. Are our expectations for responsiveness out of line?

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

Post by blehenbauer » Tue Aug 10, 2004 10:43 am

We've had a couple more restarts since last posting. I am pm'ing you our log file. Any help you can provide is appreciated.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Wed Aug 11, 2004 10:58 am

Your SOS 3.5.3 log file shows a large number of disconnects and errors like "socket read failed" "connection reset by peer".

This often indicates a network problem, generally a firewall or other device closing the connection. It could also be due to the SOS Server shutting down.

I noted only 4 restarts of the SOS 3.5.3 server between July 15 and August 20 . . . which is strange, because today's date is Aug 11. What date is the SOS Server machine set to?

Are you running SOS 4.0 and 3.5.3 on the same machine? If so, there could be a conflict because SOS 4.0 uses port 8080 and the 3.5.3 log indicates it's using port 8080 as well.
Linda Bauer
SourceGear
Technical Support Manager
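
To rule out a port conflict or a device dropping connections, one quick check is whether TCP port 8080 on the SOS Server machine accepts connections at all. The following is a minimal Python sketch, not a SourceGear tool; the function name `port_is_open` and the timeout value are illustrative assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running `port_is_open("127.0.0.1", 8080)` on the server machine while the SOS service is stopped would show whether something else is already listening on that port. Running it against the server's address from a client machine during a hang would show whether new connections are still being accepted. A successful connect only shows that something is listening; it does not prove that the listener is SOS.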

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

Post by blehenbauer » Wed Aug 11, 2004 11:11 am

lbauer wrote: I noted only 4 restarts of the SOS 3.5.3 server between July 15 and August 20 . . . which is strange, because today's date is Aug 11. What date is the SOS Server machine set to?
I just logged in to double-check, and the server date is currently Aug 11. The last modified date on the log file says Aug 11. But you are right, the last date in the log is Aug 20.
lbauer wrote: Are you running SOS 4.0 and 3.5.3 on the same machine? If so, there could be a conflict because SOS 4.0 uses port 8080 and the 3.5.3 log indicates it's using port 8080 as well.
No, they are not running on the same machine. We did have SOS 4.0 Server installed on a different machine, and we opted not to license it at the end of our trial when it did not appear to solve our problems.
lbauer wrote: Your SOS 3.5.3 log file shows a large number of disconnects and errors like "socket read failed" "connection reset by peer".

This often indicates a network problem, generally a firewall or other device closing the connection. It could also be due to the SOS Server shutting down.
Do you have any steps you can recommend to help me troubleshoot the network disconnects? Could this cause a server hang like the one we are seeing?

blehenbauer
Posts: 20
Joined: Mon Mar 15, 2004 10:46 am

Post by blehenbauer » Wed Aug 11, 2004 11:35 am

I apologize, in my haste I did not look at the year on the log file and attached the wrong log. Sorry for having you look at the wrong file. I'm sending the correct one.

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Thu Aug 12, 2004 10:44 am

This still sounds like a concurrency problem.

Since you're dealing with different machines and more than one installation of SOS, let's troubleshoot this systematically.

On the machine hosting the SOS 3.5.3 server, find the VSS client Win32 directory. (Might be under
C:\Program Files\Microsoft Visual Studio\. .) Locate the ssapi.dll file. Right click on the file, select Properties->Version. Send me a screenshot showing the version.

Next, on the SOS Server machine, find this key:
HKEY_CLASSES_ROOT\CLSID\{783CD4E4-9D54-11CF-B8EE-00608CC9A71F}\InprocServer32

Send me a screenshot of the values in that key.
Linda Bauer
SourceGear
Technical Support Manager
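
For anyone who prefers text output over a screenshot, the values under that key can also be dumped with a short script. This is only a sketch: it uses Python's standard `winreg` module (Windows only) and assumes Python is available on the SOS Server machine. `read_inproc_server32` and `describe` are hypothetical helper names. `ThreadingModel` is the standard COM value name under `InprocServer32`, and the key's default value normally holds the path of the registered DLL.

```python
CLSID_KEY = r"CLSID\{783CD4E4-9D54-11CF-B8EE-00608CC9A71F}\InprocServer32"


def read_inproc_server32(subkey: str = CLSID_KEY) -> dict:
    """Read every value under the given HKEY_CLASSES_ROOT subkey (Windows only)."""
    import winreg  # standard library, but only present on Windows

    values = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, subkey) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, mtime)
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            # The key's default value is reported with an empty name.
            values[name or "(Default)"] = data
    return values


def describe(values: dict) -> str:
    """Format the two values that matter for the threading question."""
    dll = values.get("(Default)", "<missing>")
    model = values.get("ThreadingModel", "<not set>")
    return f"DLL: {dll}\nThreadingModel: {model}"
```

On the SOS Server machine, `print(describe(read_inproc_server32()))` would print the registered DLL path and the threading model currently in effect. If the key does not exist, `winreg.OpenKey` raises `FileNotFoundError`, which is itself useful information.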

joey
Posts: 2
Joined: Fri Jul 23, 2004 10:28 am

Post by joey » Mon Aug 23, 2004 3:45 pm

Linda,

The hotfix for the "concurrency bug in the VSS automation" with respect to VSS 6.0c & earlier should be applied to the server that is hosting SOS.
Should the clients replace their components in the win32 directory as well?
Thanks.

JoeyLee
CSC

lbauer
Posts: 9736
Joined: Tue Dec 16, 2003 1:25 pm
Location: SourceGear

Post by lbauer » Thu Aug 26, 2004 7:31 am

No, that's not necessary. The SOS Server uses the automation component provided by the VSS client on the SOS Server machine.

So only the VSS Client on the server machine is important to SOS operations. SOS does not interact with VSS clients on other machines.
Linda Bauer
SourceGear
Technical Support Manager

Guest

Post by Guest » Fri Sep 10, 2004 12:35 pm

Okay, we're using the SOS 4.0 server now, and the issue persists.

It looks a lot like the issue described in this topic: http://support.sourcegear.com/viewtopic.php?t=1610

In this last case, someone was doing a big get, the connection to SourceSafe was dropped, and then no one was able to log in.

Here is the end of the log file:

9/10/2004 1:14:48 PM - 2: GetFileList project: $/Development/Angel Dev/Setup/AngelPortal/AngelRoot/Admin/Db/SQL/Queries/Angel55Updates
9/10/2004 1:14:48 PM - 1: Received message number 102.
9/10/2004 1:14:48 PM - 1: Enter GetFileList()
9/10/2004 1:14:48 PM - 1: Ignore remote dates = False
9/10/2004 1:14:48 PM - 1: GetFileList project: $/Development/ANGEL 6.1/ANGEL
9/10/2004 1:14:48 PM - Enter PinStatus()
9/10/2004 1:14:48 PM - Enter PinStatus()
9/10/2004 1:20:21 PM - Connection accepted from 192.168.139.10:18318 on local address 192.168.139.19:8080, session id is 3.
9/10/2004 1:20:21 PM - 3: Enter Authorized()
9/10/2004 1:20:26 PM - 3: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:20:26 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:20:26 PM - 3: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:20:26 PM - 3: Preparing to send the list of databases...
9/10/2004 1:20:26 PM - 3: Sending the challenge to the client.
9/10/2004 1:20:26 PM - 3: Sending the challenge message body to the client.
9/10/2004 1:20:26 PM - 3: Waiting for the client's response...
9/10/2004 1:20:26 PM - 3: Reviewing the client's response...
9/10/2004 1:20:26 PM - 3: Process the client's non-crypto login request.
9/10/2004 1:20:26 PM - 3: Enter Login()
9/10/2004 1:20:26 PM - 3: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:20:26 PM - 3: Client is speaking protocol version 2.0

9/10/2004 1:21:19 PM - Connection accepted from 192.168.139.10:18834 on local address 192.168.139.19:8080, session id is 4.
9/10/2004 1:21:19 PM - 4: Enter Authorized()
9/10/2004 1:21:24 PM - 4: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:21:24 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:21:24 PM - 4: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:21:24 PM - 4: Preparing to send the list of databases...
9/10/2004 1:21:24 PM - 4: Sending the challenge to the client.
9/10/2004 1:21:24 PM - 4: Sending the challenge message body to the client.
9/10/2004 1:21:24 PM - 4: Waiting for the client's response...
9/10/2004 1:21:26 PM - 4: Reviewing the client's response...
9/10/2004 1:21:26 PM - 4: Process the client's non-crypto login request.
9/10/2004 1:21:26 PM - 4: Enter Login()
9/10/2004 1:21:26 PM - 4: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:21:26 PM - 4: Client is speaking protocol version 2.0

9/10/2004 1:23:15 PM - Connection accepted from 192.168.139.10:19823 on local address 192.168.139.19:8080, session id is 5.
9/10/2004 1:23:15 PM - 5: Enter Authorized()
9/10/2004 1:23:19 PM - 5: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:23:19 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:23:19 PM - 5: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:23:19 PM - 5: Preparing to send the list of databases...
9/10/2004 1:23:19 PM - 5: Sending the challenge to the client.
9/10/2004 1:23:19 PM - 5: Sending the challenge message body to the client.
9/10/2004 1:23:19 PM - 5: Waiting for the client's response...
9/10/2004 1:23:20 PM - 5: Reviewing the client's response...
9/10/2004 1:23:20 PM - 5: Process the client's non-crypto login request.
9/10/2004 1:23:20 PM - 5: Enter Login()
9/10/2004 1:23:20 PM - 5: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:23:20 PM - 5: Client is speaking protocol version 2.0

9/10/2004 1:24:01 PM - Connection accepted from 192.168.139.10:20394 on local address 192.168.139.19:8080, session id is 6.
9/10/2004 1:24:01 PM - 6: Enter Authorized()
9/10/2004 1:24:06 PM - 6: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:24:06 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:24:06 PM - 6: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:24:06 PM - 6: Preparing to send the list of databases...
9/10/2004 1:24:06 PM - 6: Sending the challenge to the client.
9/10/2004 1:24:06 PM - 6: Sending the challenge message body to the client.
9/10/2004 1:24:06 PM - 6: Waiting for the client's response...
9/10/2004 1:24:07 PM - 6: Reviewing the client's response...
9/10/2004 1:24:07 PM - 6: Process the client's non-crypto login request.
9/10/2004 1:24:07 PM - 6: Enter Login()
9/10/2004 1:24:07 PM - 6: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:24:07 PM - 6: Client is speaking protocol version 2.0
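
A log like the one above can be summarized with a short script to show where the server went quiet and where each session stopped. This is a rough Python sketch that assumes the line layout seen in this excerpt (timestamp, " - ", an optional "N: " session prefix, then the message); the function names and the two-minute threshold are illustrative choices, not anything defined by SOS.

```python
import re
from datetime import datetime, timedelta

# Timestamp, then " - ", then an optional "<session id>: " prefix, then the message.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (?:(\d+): )?(.*)$"
)


def parse_line(line):
    """Return (timestamp, session id or None, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
    session = int(match.group(2)) if match.group(2) else None
    return stamp, session, match.group(3)


def last_message_per_session(lines):
    """Map each session id to the last (timestamp, message) logged for it."""
    last = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] is not None:
            last[parsed[1]] = (parsed[0], parsed[2])
    return last


def silent_gaps(lines, threshold=timedelta(minutes=2)):
    """Return (start, end) pairs where consecutive entries are more than threshold apart."""
    gaps = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # blank or unrecognized line
        if previous is not None and parsed[0] - previous > threshold:
            gaps.append((previous, parsed[0]))
        previous = parsed[0]
    return gaps


# A few lines copied from the excerpt above.
SAMPLE = """\
9/10/2004 1:14:48 PM - 1: GetFileList project: $/Development/ANGEL 6.1/ANGEL
9/10/2004 1:14:48 PM - Enter PinStatus()
9/10/2004 1:20:21 PM - Connection accepted from 192.168.139.10:18318 on local address 192.168.139.19:8080, session id is 3.
9/10/2004 1:20:26 PM - 3: Enter Login()
9/10/2004 1:20:26 PM - 3: Client is speaking protocol version 2.0
""".splitlines()
```

Applied to the excerpt, `silent_gaps` would flag the stretch between 1:14:48 PM and 1:20:21 PM, and `last_message_per_session` would show sessions 3 through 6 each ending at "Client is speaking protocol version 2.0", which is consistent with logins that start but never complete.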
