DiffMerge and Unicode files

Post by **dfurman** » Thu Feb 08, 2007 2:11 pm

Using Vault 3.5.1 (4786).

We are starting to use Vault to manage T-SQL source code. We have scripted a number of SQL Server 2000 database objects with Enterprise Manager, using the "International Text (Unicode)" option (a requirement for us). This created Unicode files with a BOM, in little-endian byte order. These were checked in, and all worked well.

Now we want to use the DiffMerge tool. The Character Encoding option in DiffMerge is set to Unicode, and it works fine for showing differences between two Unicode files. The problem is with Merge. When Merge is started for the first time, all looks good and we can merge changes. We then save the merged file and close the Merge window. However, DiffMerge saves the merged file in a different Unicode flavor than the original files: there is no BOM, and the byte order is big-endian. If we then invoke Merge again on the same file, one of two things may happen: either the file is displayed incorrectly (one character per line), or DiffMerge stops responding and drives CPU utilization to 100%.
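To check which flavor a given file is actually in, it is enough to look at its raw bytes. Below is a minimal sketch; `detect_utf16_variant` is a hypothetical helper, not part of Vault or DiffMerge. It checks for a UTF-16 BOM (FF FE = little-endian, FE FF = big-endian) and, when there is none, assumes the script is mostly ASCII T-SQL, so that one byte of each 2-byte character is usually 0x00. Which byte position holds the zeros then gives a guess at the byte order.

```python
import codecs
import sys


def detect_utf16_variant(data: bytes) -> str:
    """Guess the UTF-16 flavor of a byte string (heuristic, not a full detector)."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le+bom"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be+bom"
    # No BOM. Assumption: the text is mostly ASCII, so in each 2-byte unit
    # the high byte is 0x00. Count zero bytes at even vs odd offsets.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "utf-16-be (no BOM)"  # e.g. 00 53 00 45 ... for "SE"
    if odd_zeros > even_zeros:
        return "utf-16-le (no BOM)"  # e.g. 53 00 45 00 ... for "SE"
    return "unknown"


if __name__ == "__main__":
    # Print the guessed flavor of every file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, detect_utf16_variant(f.read()))
```

Going by the description above, a script produced by Enterprise Manager should report `utf-16-le+bom`, while a file saved by DiffMerge should report `utf-16-be (no BOM)`.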

The only workaround we have found so far is to open the file saved by DiffMerge in a text editor such as TextPad and save it with a BOM in little-endian byte order. DiffMerge then works happily with the file, until it saves it again.
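The manual TextPad step could be scripted. Below is a minimal sketch using only Python's standard `codecs` module; `to_utf16le_with_bom` and `convert_file` are hypothetical names. It assumes that a file *without* a BOM is big-endian, which matches the DiffMerge behavior described in this thread but is not true of UTF-16 files in general.

```python
import codecs
import sys


def to_utf16le_with_bom(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as little-endian with a leading FF FE BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data  # already in the form Enterprise Manager produces
    if data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")  # strip the FE FF BOM, then decode
    else:
        # Assumption: no BOM means big-endian, as DiffMerge's saved files
        # are described in this thread.
        text = data.decode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def convert_file(path: str) -> None:
    """Rewrite a file in place, only if its bytes actually change."""
    with open(path, "rb") as f:
        original = f.read()
    converted = to_utf16le_with_bom(original)
    if converted != original:
        with open(path, "wb") as f:
            f.write(converted)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        convert_file(path)
```

Because files that already start with FF FE are left untouched, the script is safe to run repeatedly over a whole folder after each merge.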

--
Dimitri Furman

Post by **Beth** » Thu Feb 08, 2007 5:05 pm

This one is going to take a little extra digging. Would you be willing to send us the file at its various stages: the two versions being merged, the merged file, and the saved file that suddenly has a different encoding? The files, or further discussion about sending them, can go to support at sourcegear.com (attn: Beth) with a reference to this thread.

Post by **Beth** » Wed Feb 21, 2007 9:59 am

It was decided that this should be logged as a bug. It will be reviewed for fixing in a future release. When using the default iso-10646-ucs-2 encoding, we don't put a BOM in the file when saving.

As a workaround, you could try using "utf-16" in the Unicode settings (before or in place of "iso-10646-ucs-2"). That will output a file *with* a BOM, but unfortunately it will be in big-endian order.