Vault conversion: praise and suggestions
Moderator: SourceGear
My group is nearing completion of our conversion from SourceSafe to Vault. Overall, I would have to say we are very pleased with the product, and that you all have done a wonderful job with it. To give you an idea of the size of the source tree we are converting, our sgvault database is roughly 5gb at this point based on its SQL Server backup file size and we have about 50 users currently. We're a .NET shop that has source history going back 7 years or more.
We wound up doing the conversion in three phases, and we're nearing the end of the second phase. The first set of code had to convert all at once because it is compiled code and everything must be together for the build to work. That was roughly 30k files. With a *lot* of practice runs, and help from Jeremy S., we were able to get the conversion done with very few errors despite corruption in the SourceSafe database. Our second set of code is about three times larger than the first, but luckily it can be imported in pieces. I can see you cringing, but we are importing into the database while it is in production. I have two issues arising from this, described below. Unfortunately we had no choice but to do it this way due to the time the import was going to take.
Again, we are really impressed with your product, but I do have some suggestions to make it better:
1. I saw elsewhere on your forums a request to have the import not disable users. I agree completely! This is very important. When importing a sizeable code base, the reality as I see it is that you will have to do imports while people are working with the product. For our first phase, I kept doing practice imports while fellow developers were getting ready for the conversion by checking out the new features. Each practice import disabled the other developers, so I had to re-enable them one by one. For our second phase, as I said we couldn't see a way around importing into the live database, and re-enabling the users is tedious.
2. We are seeing directories temporarily become invisible in the repository that is being imported into. This gave us a big scare at first, but the workaround is to reconnect to the repository.
3. The shadow copy facility: please make this easier to configure. I like your identity switcher program, and in retrospect now that I know how to do it I could set it up again much quicker, but it was just tough. The biggest hurdle for me was to realize I had to use a special app pool for the shadow folder service that had the credentials the shadow was running under. I found this in the forums but only after some digging. If the shadow copy were part of the Vault server itself, rather than an external piece, would this be any simpler?
4. Shadow copies: the delay of 30 seconds or more is not working for us. It's often several minutes. This should be very close to real-time. We have a build server, and we've set up shadow copies onto the build server (which is not the Vault server, but for several reasons we want it to stay that way). After developers check things in, they kick off a build on the build server. The changes aren't making it to the build server on time, resulting in build errors (which are then emailed to the whole group). We'll wind up having our build server do a get from Vault each time the build is started, but that only works for the code-behind parts of our app. The .html and .aspx pages do not make it onto the development server right away, and there isn't an event we can hook (like a build) to fire off a get for these types of files.
As a general point, the shadow doesn't really seem useful to me if we can't depend on it being accurate at any given point in time. To resolve this, perhaps you could make the shadow copy multithreaded, so that Mary Megacheckin, who uploads 300mb of files, doesn't cause Larry Littlefiles to wait until Mary's files are copied. Each user could have their own thread. Also, does your shadow copy wake up periodically to check if changes have been made? If so, could I change its configuration so that it checks more often? If the shadow copy could be brought down to something like 5 seconds or less of latency, I think we could make use of it. We used shadows extensively in SourceSafe, and they seemed fast when I was evaluating the product, but now that it's in production this has become probably the biggest issue we have.
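To illustrate what I mean by per-user threads, here's a rough sketch (hypothetical Python, obviously nothing to do with Vault's actual internals): each user gets a dedicated queue and worker thread, so one user's huge check-in never holds up another user's small one.

```python
import queue
import threading

class PerUserShadowCopier:
    """Sketch of the suggestion: one worker queue per user, so Mary's
    300mb check-in doesn't delay Larry's one-file check-in.
    (Hypothetical illustration, not Vault's actual implementation.)"""

    def __init__(self):
        self._queues = {}          # user -> queue.Queue of pending files
        self._lock = threading.Lock()
        self.copied = []           # record of (user, file) copies, for demo

    def _worker(self, q):
        while True:
            item = q.get()
            if item is None:       # shutdown sentinel
                break
            user, filename = item
            # A real shadow service would write the file to disk here.
            self.copied.append((user, filename))
            q.task_done()

    def submit(self, user, filename):
        # Lazily create a queue and worker thread per user.
        with self._lock:
            if user not in self._queues:
                q = queue.Queue()
                threading.Thread(target=self._worker, args=(q,),
                                 daemon=True).start()
                self._queues[user] = q
        self._queues[user].put((user, filename))

    def drain(self):
        # Wait until every user's queue is empty.
        for q in self._queues.values():
            q.join()

copier = PerUserShadowCopier()
for i in range(300):                     # Mary's large check-in
    copier.submit("mary", "big_%d.bin" % i)
copier.submit("larry", "small.txt")      # Larry's file queues independently
copier.drain()
print(len(copier.copied))                # 301 files copied in total
```

Larry's file goes onto its own queue immediately instead of waiting behind Mary's 300 files, which is the whole point.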
5. The "show merges" tool: in general we like it. Could I suggest that some of the steps be folded together, though? It seems like a lot of "next, next, next, next" that could be avoided. Perhaps on screens with nothing to do, such as "no renames", "no share/move operations", just don't show those steps. Also, I noticed inconsistent fonts & sizes through the wizard which make it look a little rough. It's minor, I know, but I was the guy who had to sell this to the rest of the group, who were used to years of moving pins around, and a slicker interface would have made things easier for me in that respect.
6. For "show merges", it would be extremely cool if we could select non-contiguous checkins to merge. It would be super-extremely cool if we could merge something that hasn't even been checked in yet but is in the pending checkins. We frequently make changes in the trunk and branch together, and it would be a major plus if the trunk and branch changes could hit the database in one operation.
7. Import tool: I realize what this tool is doing is outside the normal flow of Vault, but anything you can do to make this thing faster would be huge. There must be some parallelization you can exploit, or a way to index the Vault database so that what it does goes quicker. I spent a very significant amount of time running the import tool and finding ways to get it to finish what it needed to within a single weekend.
I haven't said anything yet about all the features we love now... CVS mode alongside VSS mode, the SHOW BLAME FEATURE (!!!!), the fact that the thing works at all, the speed of incremental gets, the email notifications... there are a lot of things we are getting spoiled by. Congratulations on a wonderful tool.
Jason Taylor
Jason:
Thanks for the post. It is always nice to get positive, constructive criticism which can help improve the product.
Someone else may address the other issues, but I'd like to tackle Shadow Folders.
To answer your question, Vault Shadow Folders is a Vault Server plugin, so there is no configuration change you can make. In this model, once a transaction is committed to the server, Shadow Folders is notified immediately. In fact, if you turn on Shadow Folder Debug logging, I'm sure you would see Shadow Folders start times correspond nearly to the transaction's final commit time.
Also, in your example, although Mary started her transaction first, if Larry's transaction commits first, then Shadow Folders will first get Larry's changes, and then start on Mary's changes once she has finally committed her changes. Larry's changes are not waiting on Mary's.
Can you please answer a couple of questions about Shadow Folders?
- In your Vault Shadow Folder setup, are you using the mapping $/ of a repository or is it some sub folder?
- Once in a while, do you see Shadow Folders work quickly? For instance, suppose Mary checks in a file in one transaction and then checks in another file in a second transaction. After the first change has been shadowed, is the subsequent transaction seen at a faster interval?
- How many shadows do you have configured? In how many repositories?
Jeff Clausius
SourceGear
We are shadowing a lot of files. We have two repositories, one with 10 shadows and one with three. The 10 shadow one is more active. The main trunk shadow, when viewed through Properties in the Vault client has a claimed 140mb tree size, disk space needed 280mb. The second repository's main trunk is 200 / 400. Neither repository is shadowing $/, but we do shadow a level or two down from that, such as trunk and all below, VersionXYZ and all below. Is there a quick way to get file counts through the client or admin tool?
Sometimes I do see the shadow copy happen right away. I hadn't noticed one way or the other the speed of several operations by the same user back to back as opposed to within a single operation.
Thanks for the suggestion to turn on the debug logging, I'll take a look at that.
Jason
Re: Vault conversion: praise and suggestions
As Jeff said, thanks for all the great feedback and constructive criticism. I'll respond to the non-shadow folder parts.
jtaylor wrote:
1. I saw elsewhere on your forums a request to have the import not disable users. I agree completely! This is very important. When importing a sizeable code base, the reality as I see it is that you will have to do imports while people are working with the product. For our first phase, I kept doing practice imports while fellow developers were getting ready for the conversion by checking out the new features. Each practice import disabled the other developers, so I had to re-enable them one by one. For our second phase, as I said we couldn't see a way around importing into the live database, and re-enabling the users is tedious.
If I'm understanding correctly, this is already there, except the interface is not very good. In the screen where you map VSS users to Vault users, there is a checkbox for each user, and the text on the right says to check the users you want to remain active after the import. The checkbox should be in its own column to make it clear what it does. In any case, checking the checkbox will keep the user active after the import.
jtaylor wrote:
2. We are seeing directories temporarily become invisible in the repository that is being imported into. This gave us a big scare at first, but the workaround is to reconnect to the repository.
We've not had other reports of this, but then again no one else seems to do work at the same time as an import. That said, I understand why you want to do it that way. I'd be more concerned about this if it happens at other times, so let us know.
jtaylor wrote:
5. The "show merges" tool: in general we like it. Could I suggest that some of the steps be folded together, though? It seems like a lot of "next, next, next, next" that could be avoided. Perhaps on screens with nothing to do, such as "no renames", "no share/move operations", just don't show those steps. Also, I noticed inconsistent fonts & sizes through the wizard which make it look a little rough. It's minor, I know, but I was the guy who had to sell this to the rest of the group, who were used to years of moving pins around, and a slicker interface would have made things easier for me in that respect.
6. For "show merges", it would be extremely cool if we could select non-contiguous checkins to merge. It would be super-extremely cool if we could merge something that hasn't even been checked in yet but is in the pending checkins. We frequently make changes in the trunk and branch together, and it would be a major plus if the trunk and branch changes could hit the database in one operation.
For 4.0 (due out by the end of this year), we are revamping the Merge Branches wizard into a separate application that will support things like merging non-contiguous changes together. I'll forward these requests.
jtaylor wrote:
7. Import tool: I realize what this tool is doing is outside the normal flow of Vault, but anything you can do to make this thing faster would be huge. There must be some parallelization you can exploit, or a way to index the Vault database so that what it does goes quicker. I spent a very significant amount of time running the import tool and finding ways to get it to finish what it needed to within a single weekend.
Yes, we know the import tool is frustrating and slow, and we hope to find ways to improve it. But moving that much data out of VSS and into Vault is inherently time consuming, and luckily it is a one-time operation.
Thanks Dan for your comments. Regarding #1, the deactivation of users, we do use the checkboxes during the import, but it still takes time. We have about 50 active users, but many more that should stay inactive. Any employee that has ever contributed to the code base shows up in the list because there are checkins made with their login. Over seven years that adds up to a lot of names. Complicating matters, we recently changed domain name policies from first name last initial to first initial last name, and because we want to use the AD authentication, all Vault users are using that scheme as their login name. Therefore, every active user has a corresponding inactive account. We have to be accurate with the activation because we don't own enough licenses to cover 'em all.
Jason
Jason
Jason,
It's too late for you to try this now, and it sounds like it may have been a real pain for you anyway, but...
When I did my import from VSS to Vault I took advantage of the ability to map a VSS user to a different Vault user to unify my old and new VSS users to be the same new account. It made sense to me to do that since those multiple VSS accounts really represented the same person. Of course I didn't have 50 accounts to map!
I agree it would be a very useful enhancement to the VSS Import tool for it to automatically check that enable button for any existing active user.
Mike
Jason:
I think I understand what is happening, but I'll need some evidence to prove my theory.
Can I ask you to place your Vault Server in Daily - Debug mode? What I'd like to do is collect some data (a week or two) of Vault server activity to see how the Vault Shadow Folder client is refreshing its cache files.
Jeff Clausius
SourceGear
Re: Vault conversion: praise and suggestions
Jason, I hope you don't mind me chipping in...
jtaylor wrote:
4. Shadow copies: the delay of 30 seconds or more is not working for us. It's often several minutes. This should be very close to real-time. We have a build server, and we've set up shadow copies onto the build server (which is not the Vault server, but for several reasons we want it to stay that way). After developers check things in, they kick off a build on the build server. The changes aren't making it to the build server on time, resulting in build errors (which are then emailed to the whole group). We'll wind up having our build server do a get from Vault each time the build is started, but that only works for the code-behind parts of our app. The .html and .aspx pages do not make it onto the development server right away, and there isn't an event we can hook (like a build) to fire off a get for these types of files.
Jason Taylor
Maybe you are already on to this, but what you describe sounds like continuous integration. I suggest you investigate CI tools such as CruiseControl.NET (there are others as well, but CC.NET is the one we use). Rather than relying on shadow folders and developers manually kicking off builds, our build server 'polls' Vault for changes and kicks off a build. Yes, it does mean that the build will have to 'get' the files, but if you use 'working directories' the Vault client should be smart enough to get only changed files.
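For what it's worth, here is a rough sketch of the relevant bits of a ccnet.config. Treat it as illustrative only: the server name, credentials, and paths are placeholders, and the exact element names can vary between CC.NET versions, so check the docs for the version you install.

```xml
<cruisecontrol>
  <project name="MyApp">
    <!-- Poll Vault every 60 seconds for new check-ins -->
    <triggers>
      <intervalTrigger seconds="60" />
    </triggers>
    <!-- Vault source control block; autoGetSource fetches changes
         (including .aspx/.html files) before each build -->
    <sourcecontrol type="vault" autoGetSource="true">
      <executable>C:\Program Files\SourceGear\Vault Client\vault.exe</executable>
      <host>vaultserver</host>
      <username>builduser</username>
      <password>secret</password>
      <repository>Default</repository>
      <folder>$/trunk</folder>
    </sourcecontrol>
    <tasks>
      <!-- Build the solution; because the get happens first, the build
           no longer depends on shadow-folder latency -->
      <msbuild>
        <projectFile>MyApp.sln</projectFile>
      </msbuild>
    </tasks>
  </project>
</cruisecontrol>
```

The point is that the get and the build become one atomic sequence triggered by the check-in, so there's no window where stale files can break the build.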
Hope this helps.