server hangs
Moderator: SourceGear
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
server hangs
We currently license 3.5.3 Server and are on the last legs of a 4.0 Server trial license.
In 3.5.3, we frequently experienced issues when multiple developers did simultaneous gets of a Visual Studio solution from a Visual SourceSafe database. The SOS Visual Studio IDE would hang and the database would become inaccessible using the SOS client. If the machines were left, they would stay in this state for hours. To get things moving again, we'd usually have to kill Visual Studio and restart the SOS server.
We had hoped to see this change when we used 4.0 Server, but as soon as more developers started using it we began experiencing the same issue with the SOS clients hanging when everyone accessed the database simultaneously.
My guess is that this might be some kind of file locking issue? Are there any steps we can take to find the source?
You may be experiencing the concurrency bug in the VSS automation component.
See if either of these KB articles provides a solution:
http://support.sourcegear.com/viewtopic.php?t=10
http://support.sourcegear.com/viewtopic.php?t=1348
Linda Bauer
SourceGear
Technical Support Manager
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
no luck yet
We tried this this morning; it was set to use apartment threading, but we have already had another hang. Here is some more information on our setup:
1. We have a SourceSafe database installed on a Windows 2003 server. This is running VSS 6.0 D and has always been set to use "Both" for the threading model.
2. Some users in our office connect to this VSS database via a mapped drive, using the version of VSS installed on their personal computers, not using SOS.
3. Our developers always use SOS when referencing this VSS database from Visual Studio. Because our version of SOS (3.5.3) does not support Windows 2003 Server, SOS is on a W2K server that references the VSS database via a mapped drive.
I checked the registry on both the 2003 server, which hosts the VSS database, and the 2000 server, which hosts the SOS server, and added the registry change to the 2000 server, but we had a hang again this morning.
As I said before, we just finished a trial version of 4.0 Server (which was on the 2003 server with the database, and this server always had the registry change), and we were having hangs on that server as well.
However, I'm wondering about the context that the automation component runs in. If I reference VSS from a mapped drive, does it use *my* installed version of the automation component and registry settings, or does it use the server's automation component? If it uses mine, would I need to update this registry setting on the computer of every user who opens the database with VSS on a mapped drive?
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
one more piece of information
When we have a SOS hang, VSS never hangs. We can continue to access the VSS database using VSS from any machine (using mapped drives), but the SOS server will not work until the service is restarted.
In addition, when we had a hang on the 3.5.3 SOS server, it would cause the 4.0 SOS server on a different machine to hang, but the 4.0 server would recover when the 3.5.3 service was restarted. Because these are on different machines, it really sounds like the 3.5.3 service may have been locking a necessary file, like some kind of file locking problem?
VSS clients are totally independent of SOS, so they would be able to connect to the VSS database even if SOS is having problems, as long as VSS is fine.
The SOS Server connects to VSS through the SourceSafe automation component. This can crash or hang, bringing down the SOS Server as well. The automation component is provided by the VSS client on the SOS Server machine, so you don't need to do any registry changes for VSS on client machines.
What's strange here is that your SOS Servers seem to be interdependent. If these are on different machines, the only thing in common would be that they connect to the same srcsafe.ini file of the VSS database.
After the next hang, send me copies of the log.txt files from both SOS 4.0 and SOS 3.5.3 servers. They're in the SOS Server directories.
Linda Bauer
SourceGear
Technical Support Manager
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
This morning, we had 4 developers all doing a get latest version during the same half hour.
They got frustrated and restarted the SOS server after their clients became unresponsive and did not recover in 15 minutes.
Looking at the logs, there is nothing to indicate anything other than heavy use. (I'll send you a copy of our log file as a PM.) The get latest version operation was still in progress at the time they restarted the server.
At this point I am wondering if we are just expecting it to be more responsive than it is capable of being. 4 developers is not what I would consider to be heavy use, but maybe we are being impatient. Are our expectations for responsiveness out of line?
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
Your SOS 3.5.3 log file shows a large number of disconnects and errors like "socket read failed" "connection reset by peer".
This often indicates a network problem, generally a firewall or other device closing the connection. It could also be due to the SOS Server shutting down.
I noted only 4 restarts of the SOS 3.5.3 server between July 15 and August 20 . . . which is strange, because today's date is Aug 11. What date is the SOS Server machine set to?
Are you running SOS 4.0 and 3.5.3 on the same machine? If so, there could be a conflict because SOS 4.0 uses port 8080 and the 3.5.3 log indicates it's using port 8080 as well.
Linda Bauer
SourceGear
Technical Support Manager
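For anyone wanting to rule out the port conflict raised above, here is a minimal sketch, assuming Python is available on the server machine. The function name `port_is_free` is made up for illustration and is not part of SOS; the only value taken from the thread is port 8080. It simply tries to bind the port and reports whether that succeeded, so it should be run while the SOS service under test is stopped.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Bind failed: most likely another process already holds the port.
            return False
    return True


if __name__ == "__main__":
    # 8080 is the port both SOS versions are reported to use in this thread.
    print("port 8080 free:", port_is_free(8080))
```

If this reports the port as taken while one SOS service is stopped, something else on the machine is listening there, which would be worth resolving before looking further.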
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
lbauer wrote: I noted only 4 restarts of the SOS 3.5.3 server between July 15 and August 20 . . . which is strange, because today's date is Aug 11. What date is the SOS Server machine set to?
I just logged in to double-check, and the server date is currently Aug 11. The last modified date on the log file says Aug 11. But you are right, the last date in the log is Aug 20.
lbauer wrote: Are you running SOS 4.0 and 3.5.3 on the same machine? If so, there could be a conflict because SOS 4.0 uses port 8080 and the 3.5.3 log indicates it's using port 8080 as well.
No, they are not running on the same machine. We did have SOS 4.0 Server installed on a different machine, and we opted not to license it at the end of our trial when it did not appear to solve our problems.
lbauer wrote: Your SOS 3.5.3 log file shows a large number of disconnects and errors like "socket read failed" "connection reset by peer".
This often indicates a network problem, generally a firewall or other device closing the connection. It could also be due to the SOS Server shutting down.
Do you have any steps you can recommend to help me troubleshoot the network disconnects? Could they cause a server hang like the one we are seeing?
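One way to start on the disconnects is to see when they cluster. The sketch below is an illustration only: it assumes the log.txt lines look like the SOS 4.0 excerpt posted later in this thread ("9/10/2004 1:14:48 PM - 2: message"), which may not match the 3.5.3 format exactly, and the function names are invented here. It counts lines containing the two error strings quoted above per hour, and separately lists any timestamps later than a given "now", which is the oddity noted with the Aug 20 entries.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape: "<m/d/yyyy h:mm:ss AM|PM> - <message>"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (.*)$")
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"
DISCONNECT_MARKERS = ("socket read failed", "connection reset by peer")


def parse(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    return datetime.strptime(match.group(1), STAMP_FORMAT), match.group(2)


def tally_disconnects(lines):
    """Count disconnect-style errors per 'YYYY-MM-DD HH:00' bucket."""
    per_hour = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if any(marker in message.lower() for marker in DISCONNECT_MARKERS):
            per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return per_hour


def future_entries(lines, now):
    """Return timestamps later than `now`, hinting at a clock or date problem."""
    stamps = [parsed[0] for parsed in map(parse, lines) if parsed is not None]
    return [stamp for stamp in stamps if stamp > now]
```

Calling `tally_disconnects(open("log.txt"))` and comparing the busy hours against when developers reported hangs would show whether the disconnects precede a hang or only follow the service restart.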
-
- Posts: 20
- Joined: Mon Mar 15, 2004 10:46 am
This still sounds like a concurrency problem.
Since you're dealing with different machines and more than one installation of SOS, let's troubleshoot this systematically.
On the machine hosting the SOS 3.5.3 server, find the VSS client Win32 directory. (Might be under C:\Program Files\Microsoft Visual Studio\. .) Locate the ssapi.dll file. Right-click the file and select Properties->Version. Send me a screenshot showing the version.
Next, on the SOS Server machine, find this key:
HKEY_CLASSES_ROOT\CLSID\{783CD4E4-9D54-11CF-B8EE-00608CC9A71F}\InprocServer32
Send me a screenshot of the values in that key.
Linda Bauer
SourceGear
Technical Support Manager
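As an alternative to a screenshot, the values under that key can be dumped as text. This is a rough sketch, assuming Python is installed on the SOS Server machine; it uses the standard-library `winreg` module, which exists only on Windows. The helper names are made up for illustration. `ThreadingModel` is the standard COM value name for an in-process server's threading model; on 64-bit Windows a 32-bit component may be registered in the 32-bit registry view, which this sketch does not handle.

```python
import sys

# Key named in the post above, relative to HKEY_CLASSES_ROOT.
CLSID_SUBKEY = r"CLSID\{783CD4E4-9D54-11CF-B8EE-00608CC9A71F}\InprocServer32"


def read_inproc_values(subkey=CLSID_SUBKEY):
    """Return {value_name: data} for the given HKEY_CLASSES_ROOT subkey (Windows only)."""
    import winreg  # imported lazily because it is missing on other platforms

    values = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, subkey) as key:
        index = 0
        while True:
            try:
                name, data, _value_type = winreg.EnumValue(key, index)
            except OSError:
                break  # raised when there are no more values to enumerate
            # The unnamed default value is reported with an empty name.
            values[name or "(Default)"] = data
            index += 1
    return values


def summarize(values):
    """Format the two values that matter for the threading question."""
    dll = values.get("(Default)", "<not set>")
    model = values.get("ThreadingModel", "<not set>")
    return f"DLL: {dll}; ThreadingModel: {model}"


if __name__ == "__main__" and sys.platform == "win32":
    print(summarize(read_inproc_values()))
```

Running it on both the SOS 3.5.3 machine and the machine that hosted SOS 4.0 gives two short lines that can be pasted into a reply and compared directly.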
No, that's not necessary. The SOS Server uses the automation component provided by the VSS client on the SOS Server machine.
So only the VSS Client on the server machine is important to SOS operations. SOS does not interact with VSS clients on other machines.
Linda Bauer
SourceGear
Technical Support Manager
Okay, we're using the SOS 4.0 server now, and the issue persists.
It looks a lot like the issue described in this topic: http://support.sourcegear.com/viewtopic.php?t=1610
In this last case, someone was doing a big get, the connection to sourcesafe was dropped, and then no one was able to log in.
Here is the end of the log file:
9/10/2004 1:14:48 PM - 2: GetFileList project: $/Development/Angel Dev/Setup/AngelPortal/AngelRoot/Admin/Db/SQL/Queries/Angel55Updates
9/10/2004 1:14:48 PM - 1: Received message number 102.
9/10/2004 1:14:48 PM - 1: Enter GetFileList()
9/10/2004 1:14:48 PM - 1: Ignore remote dates = False
9/10/2004 1:14:48 PM - 1: GetFileList project: $/Development/ANGEL 6.1/ANGEL
9/10/2004 1:14:48 PM - Enter PinStatus()
9/10/2004 1:14:48 PM - Enter PinStatus()
9/10/2004 1:20:21 PM - Connection accepted from 192.168.139.10:18318 on local address 192.168.139.19:8080, session id is 3.
9/10/2004 1:20:21 PM - 3: Enter Authorized()
9/10/2004 1:20:26 PM - 3: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:20:26 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:20:26 PM - 3: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:20:26 PM - 3: Preparing to send the list of databases...
9/10/2004 1:20:26 PM - 3: Sending the challenge to the client.
9/10/2004 1:20:26 PM - 3: Sending the challenge message body to the client.
9/10/2004 1:20:26 PM - 3: Waiting for the client's response...
9/10/2004 1:20:26 PM - 3: Reviewing the client's response...
9/10/2004 1:20:26 PM - 3: Process the client's non-crypto login request.
9/10/2004 1:20:26 PM - 3: Enter Login()
9/10/2004 1:20:26 PM - 3: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:20:26 PM - 3: Client is speaking protocol version 2.0
9/10/2004 1:21:19 PM - Connection accepted from 192.168.139.10:18834 on local address 192.168.139.19:8080, session id is 4.
9/10/2004 1:21:19 PM - 4: Enter Authorized()
9/10/2004 1:21:24 PM - 4: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:21:24 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:21:24 PM - 4: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:21:24 PM - 4: Preparing to send the list of databases...
9/10/2004 1:21:24 PM - 4: Sending the challenge to the client.
9/10/2004 1:21:24 PM - 4: Sending the challenge message body to the client.
9/10/2004 1:21:24 PM - 4: Waiting for the client's response...
9/10/2004 1:21:26 PM - 4: Reviewing the client's response...
9/10/2004 1:21:26 PM - 4: Process the client's non-crypto login request.
9/10/2004 1:21:26 PM - 4: Enter Login()
9/10/2004 1:21:26 PM - 4: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:21:26 PM - 4: Client is speaking protocol version 2.0
9/10/2004 1:23:15 PM - Connection accepted from 192.168.139.10:19823 on local address 192.168.139.19:8080, session id is 5.
9/10/2004 1:23:15 PM - 5: Enter Authorized()
9/10/2004 1:23:19 PM - 5: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:23:19 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:23:19 PM - 5: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:23:19 PM - 5: Preparing to send the list of databases...
9/10/2004 1:23:19 PM - 5: Sending the challenge to the client.
9/10/2004 1:23:19 PM - 5: Sending the challenge message body to the client.
9/10/2004 1:23:19 PM - 5: Waiting for the client's response...
9/10/2004 1:23:20 PM - 5: Reviewing the client's response...
9/10/2004 1:23:20 PM - 5: Process the client's non-crypto login request.
9/10/2004 1:23:20 PM - 5: Enter Login()
9/10/2004 1:23:20 PM - 5: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:23:20 PM - 5: Client is speaking protocol version 2.0
9/10/2004 1:24:01 PM - Connection accepted from 192.168.139.10:20394 on local address 192.168.139.19:8080, session id is 6.
9/10/2004 1:24:01 PM - 6: Enter Authorized()
9/10/2004 1:24:06 PM - 6: Unable to get hostname from address: 192.168.139.10
9/10/2004 1:24:06 PM - The requested name is valid, but no data of the requested type was found
9/10/2004 1:24:06 PM - 6: Connection from: 192.168.139.10 (192.168.139.10)
9/10/2004 1:24:06 PM - 6: Preparing to send the list of databases...
9/10/2004 1:24:06 PM - 6: Sending the challenge to the client.
9/10/2004 1:24:06 PM - 6: Sending the challenge message body to the client.
9/10/2004 1:24:06 PM - 6: Waiting for the client's response...
9/10/2004 1:24:07 PM - 6: Reviewing the client's response...
9/10/2004 1:24:07 PM - 6: Process the client's non-crypto login request.
9/10/2004 1:24:07 PM - 6: Enter Login()
9/10/2004 1:24:07 PM - 6: User 'pmiller' requesting to login to database 'E:\VSS\Development\srcsafe.ini'
9/10/2004 1:24:07 PM - 6: Client is speaking protocol version 2.0
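To make the stall easier to spot in a longer log, the lines can be grouped by session id. The following is a sketch only, assuming every line follows the shape seen in the excerpt above (timestamp, " - ", optional "N: " session prefix, message); the function name is invented for illustration. It records the last message logged for each session, so sessions that stop partway through a login stand out.

```python
import re
from datetime import datetime

# "<m/d/yyyy h:mm:ss AM|PM> - [<session id>: ]<message>"
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - (?:(\d+): )?(.*)$"
)
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def last_message_per_session(lines):
    """Map session id -> (timestamp, message) of the final line logged for it."""
    last = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(2) is None:
            continue  # skip unparsable lines and lines with no session prefix
        stamp = datetime.strptime(match.group(1), STAMP_FORMAT)
        last[int(match.group(2))] = (stamp, match.group(3))
    return last


if __name__ == "__main__":
    with open("log.txt") as handle:
        for session, (stamp, message) in sorted(last_message_per_session(handle).items()):
            print(session, stamp, message)
```

On the excerpt above, sessions 3 through 6 all end at "Client is speaking protocol version 2.0", while sessions 1 and 2 end inside GetFileList, which fits the picture of a large get blocking every later login.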