Mutex Timeout with Linux, v3.16

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Mutex Timeout with Linux, v3.16

Post by James Jeffers » Tue Jan 24, 2006 7:35 am

Using:
Mono 1.1.12.1
Vault 3.1.6
<vault>
<error>
Wait timeout for mutex after 30000 milliseconds.
</error>
<exception>
System.Exception: Wait timeout for mutex after 30000 milliseconds.
in <0x00168> VaultLib.SystemMutex:Take (UInt32 ms)
in <0x00016> VaultClientOperationsLib.CacheMember:TakeSystemMutex ()
in <0x0004b> VaultClientOperationsLib.CacheMember_Repository:.ctor (System.String folder)
in <0x00120> VaultClientOperationsLib.TreeCache:.ctor (Int32 repID, System.String username, System.String uniqueRepositoryID, System.String localStoreBasePath, VaultClientOperationsLib.ClientInstance ci)
in <0x000f8> VaultClientOperationsLib.ClientInstance:SetActiveRepositoryID (Int32 id, System.String username, System.String uniqueRepositoryID, Boolean doRefresh, Boolean updateKnownChangesAll)
in <0x00650> VaultCmdLineClient.VaultCmdLineClient:Login (Boolean bAllowAuto, Boolean bSaveSession)
in <0x0000e> VaultCmdLineClient.VaultCmdLineClient:Login ()
in <0x000c2> VaultCmdLineClient.VaultCmdLineClient:ProcessCommandCheckout (System.Collections.ArrayList strItemArray)
in <0x009ec> VaultCmdLineClient.VaultCmdLineClient:ProcessCommand (VaultCmdLineClient.Args curArg)
</exception>
<result success="no" />
</vault>
I've tried removing any system mutexes with ipcs/ipcrm, without success.

Is this a Mono problem?

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Wed Jan 25, 2006 9:42 am

Can you describe the circumstances a bit more?

For example, does this happen the first time you run a command? Or is it within a shell script? Are there other users logged on to the machine running Vault?

Any other information about the situation?
Jeff Clausius
SourceGear

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Wed Jan 25, 2006 9:46 am

When you say "system" do you mean using Mono/Vault or using the OS in general?

This mutex timeout appears during checkouts and gets. I can still do a "listworkingfolders" or a "help".

Removing the Linux OS system mutexes via ipcrm doesn't solve the problem.

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Wed Jan 25, 2006 9:47 am

More: This happens EVERY time with a get or checkout of a file.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Wed Jan 25, 2006 10:09 am

James Jeffers wrote: When you say "system" do you mean using Mono/Vault or using the OS in general?
I mean Mono/Vault. Basically, I'm trying to determine if someone else is also creating the same named mutex.

What distribution/version of Linux are you using? In our testing, Ubuntu (5.04) and Fedora Core 4 did not exhibit this behavior.
Jeff Clausius
SourceGear

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Wed Jan 25, 2006 10:12 am

We're using RedHat Enterprise Linux v3.

There is only 1 user logged into Vault.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Wed Jan 25, 2006 10:23 am

Can you try a short history query? Does that exhibit the behavior as well?
Jeff Clausius
SourceGear

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Wed Jan 25, 2006 10:31 am

using history:
<vault>
<error>
Wait timeout for mutex after 30000 milliseconds.
</error>
<exception>
System.Exception: Wait timeout for mutex after 30000 milliseconds.
in <0x00168> VaultLib.SystemMutex:Take (UInt32 ms)
in <0x00016> VaultClientOperationsLib.CacheMember:TakeSystemMutex ()
in <0x0004b> VaultClientOperationsLib.CacheMember_Repository:.ctor (System.String folder)
in <0x00120> VaultClientOperationsLib.TreeCache:.ctor (Int32 repID, System.String username, System.String uniqueRepositoryID, System.String localStoreBasePath, VaultClientOperationsLib.ClientInstance ci)
in <0x000f8> VaultClientOperationsLib.ClientInstance:SetActiveRepositoryID (Int32 id, System.String username, System.String uniqueRepositoryID, Boolean doRefresh, Boolean updateKnownChangesAll)
in <0x00650> VaultCmdLineClient.VaultCmdLineClient:Login (Boolean bAllowAuto, Boolean bSaveSession)
in <0x0000e> VaultCmdLineClient.VaultCmdLineClient:Login ()
in <0x002ba> VaultCmdLineClient.VaultCmdLineClient:ProcessCommandHistory (System.String strReposPath)
in <0x01d9d> VaultCmdLineClient.VaultCmdLineClient:ProcessCommand (VaultCmdLineClient.Args curArg)
</exception>
<result success="no" />
</vault>

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Wed Jan 25, 2006 10:42 am

OK. Let's do a couple of things.

1) We need to free up the mutex. I'm trying to find what daemon Mono uses to store system mutexes. I don't normally like to give this advice, but a system reboot would probably restart the daemon. Is that out of the question?

2) Discover why the repository's mutex is not being released. Once step 1 has cleared the mutex, can you try one get/history and then a second get/history? Does the mutex error appear on the second attempt?
Jeff Clausius
SourceGear
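
The two-run check in step 2 could be scripted roughly as follows. This is only a sketch in Python: the function names are made up for illustration, the command to run is supplied by the caller, and the only string taken from this thread is the error text itself.

```python
import subprocess

# Error text as it appears in the <error> output quoted earlier in this thread.
MUTEX_ERROR = "Wait timeout for mutex"


def run_once(cmd):
    """Run cmd (a list of arguments) and report whether its output
    contains the mutex timeout message."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return MUTEX_ERROR in (proc.stdout + proc.stderr)


def mutex_check(cmd, runs=2):
    """Run the same command `runs` times in a row.

    Returns one boolean per run: True means that run hit the mutex timeout.
    """
    return [run_once(cmd) for _ in range(runs)]
```

Pass in the same get or history command line you normally run, as a list of arguments. A result of `[False, True]` would suggest the first run left the mutex owned; `[True, True]` would suggest the mutex was never freed in the first place.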

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Thu Jan 26, 2006 7:27 am

I spent some time last night trying to recreate the problem without any success.

Did something happen within the Vault client prior to receiving this error? Something like one of the previous vault.exe runs was killed or failed in some way?

Also, I hope to have an answer about the named mutex daemon soon.
Jeff Clausius
SourceGear

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Thu Jan 26, 2006 7:35 am

A previous Vault/mono process may have been terminated with a SIGINT.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Thu Jan 26, 2006 8:44 am

I thought something like that might be the case. Off the top of my head, I don't know how SIGINT is handled by the Mono stack. An interrupted process might also explain why the named mutex is left in an owned state.

Do you happen to remember why the process was stopped?
Jeff Clausius
SourceGear

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Wed Feb 01, 2006 7:31 am

Could be that someone didn't want to wait the 30+ seconds for the Vault operation to finish. I can't think of any other reason a SIGINT would be issued.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Mon Feb 06, 2006 11:38 am

I have a response from the Mono team -
Dick Porter of Ximian wrote: Named mutexes are released by a mono process when it exits normally. If a process crashes or is killed it will leave named mutexes locked, but a new process will clean up the old named mutex when it tries to open it again (there is a 60 second window where the old named mutex will still be considered "alive" though.)

I've just realised there is a bug here, in that if a process is already waiting for the named mutex when the process holding it terminates prematurely, then the wait will not finish. I'll look at that now.
- Dick
James, in your case, was the subsequent Vault command run after the 60 second window had passed? Also, is it possible the behavior you are seeing is the same as the bug described by Dick?
Jeff Clausius
SourceGear
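
To make the behavior Dick describes easier to picture, here is a small Python sketch of a lock that treats an old holder as abandoned after a liveness window. It is not Mono's implementation: the lock-file approach, the class name `StaleAwareLock`, and its parameters are all assumptions chosen for illustration. The 60 second window and the 30 second wait come from this thread.

```python
import os
import time


class StaleAwareLock:
    """Hypothetical lock file. A lock whose file is older than
    `stale_after` seconds is treated as abandoned by a dead holder."""

    def __init__(self, path, stale_after=60.0):
        self.path = path
        self.stale_after = stale_after

    def _try_create(self):
        # O_EXCL makes creation fail if the lock file already exists.
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def _is_stale(self):
        try:
            age = time.time() - os.path.getmtime(self.path)
        except FileNotFoundError:
            return False  # holder released it in the meantime
        return age > self.stale_after

    def acquire(self, timeout=30.0, poll=0.05):
        """Wait up to `timeout` seconds; return True if the lock was taken."""
        deadline = time.monotonic() + timeout
        while True:
            if self._try_create():
                return True
            # Staleness is re-checked on every pass, so a waiter that was
            # already blocked when the holder died can still recover.
            if self._is_stale():
                try:
                    # Not race-free between several waiters; a real
                    # implementation would need an atomic takeover.
                    os.remove(self.path)
                except FileNotFoundError:
                    pass
                continue
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)

    def release(self):
        os.remove(self.path)
```

In this sketch a waiter gives up with a timeout while the lock still looks alive, and takes over once the lock file is older than the window. A waiter that never re-checked staleness inside its loop would behave like the bug described above and keep timing out no matter how long ago the holder died.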

James Jeffers
Posts: 95
Joined: Mon Aug 29, 2005 12:39 pm

Post by James Jeffers » Mon Feb 06, 2006 3:41 pm

Jeff,

I've seen this several DAYS after killing the process.
