Diff shows garbage for T-SQL scripts (Vault v2.0.6)

Post by **elyna** » Wed Dec 22, 2004 7:20 am

We store SQL scripts in Vault as plain text. The scripts are perfectly viewable in Notepad, but when we try to Diff between two versions, all we get is garbage characters.

Has anyone run into this? We need this to work!

Thanks,
Eric

Post by **jclausius** » Wed Dec 22, 2004 7:42 am

You'll need to upgrade to Vault 3.0 or use a different diff tool that supports Unicode. Unicode support was added to Vault 3.0.x's Diff/Merge tool. The 2.0.6 Diff/Merge only works with ASCII files.

See the thread "DiffMerge bug? SQL files looks like binary garbage" for more information.

Post by **Perry** » Thu Dec 23, 2004 3:58 pm

I suspect that Unicode is being confused with UCS-2 here.

Perhaps what you meant was not really Unicode, but rather either UCS-2 or UTF-16? Unicode is a character set, and a family of related standards including encodings. UCS-2 and UTF-16 are particular encodings of Unicode (as is UTF-8).

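UCS-2 is the older, fixed-width encoding: every character takes exactly two bytes, so it can only represent the first 65,536 code points (the Basic Multilingual Plane). Microsoft originally used it for Windows NT.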

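UTF-16 is the newer, variable-length encoding that extends UCS-2: characters in the Basic Multilingual Plane still take two bytes, and characters beyond it take four bytes (a surrogate pair). I think Microsoft had to move to it to get past the limits of UCS-2.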
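To make the UCS-2/UTF-16 difference concrete, here is a small Python sketch (Python is only our choice of illustration language) that prints the UTF-16 code units of a few characters. Characters in the Basic Multilingual Plane fit in one 16-bit unit; a character beyond it, such as U+1D11E, needs a surrogate pair, which UCS-2 has no way to express.

```python
def utf16_units(ch: str) -> list:
    """Return the UTF-16 code units of a single character as hex strings."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


# 'A' and 'é' are in the Basic Multilingual Plane; U+1D11E (musical G clef) is not.
for ch in ["A", "\u00e9", "\U0001D11E"]:
    print(f"U+{ord(ch):04X} -> {utf16_units(ch)}")
# U+0041 -> ['0041']
# U+00E9 -> ['00E9']
# U+1D11E -> ['D834', 'DD1E']
```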

UTF-8 is an extremely popular encoding of Unicode that is variable-length (one to four bytes per character) and fully compatible with ASCII: any pure ASCII file is already a valid UTF-8 file, byte for byte. It is widely used on the Internet and the World Wide Web.

Neither UCS-2 nor UTF-16 is compatible with ASCII.
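A quick way to see the ASCII-compatibility point is to encode the same ASCII-only T-SQL fragment in each encoding and compare the bytes. This minimal Python sketch uses only built-in codecs; the sample statement is made up for illustration.

```python
statement = "SELECT 1;"

# Encode the same ASCII-only statement three ways and show the raw bytes.
encodings = {enc: statement.encode(enc) for enc in ("ascii", "utf-8", "utf-16-le")}
for enc, data in encodings.items():
    print(f"{enc:10} {data.hex(' ')}")

# UTF-8 output is byte-for-byte identical to ASCII; UTF-16 puts a zero byte
# next to every ASCII character, which ASCII-only tools cannot handle.
assert encodings["utf-8"] == encodings["ascii"]
```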

NB: Many people have trouble understanding Unicode because much if not most of Microsoft's documentation is quite wrong on this subject.

When files are saved as UTF-8 (which, as I mentioned, is widely used across the world, on the Web and elsewhere), applications such as Notepad have no difficulty at all with files that contain only ASCII characters.

Post by **jclausius** » Thu Dec 23, 2004 9:33 pm

Sorry, my posting was a bit ambiguous.

In reference to T-SQL files, I did intend to use the term Unicode, but I should have been more specific: I was referring to the "International text (Unicode)" option. That option saves the script in a multi-byte Unicode format, which caused problems for the older Diff/Merge, since it was originally designed for files with one-byte characters.
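The "garbage" can be reproduced with a short Python sketch. It assumes, as is typical for Windows tools, that "International text (Unicode)" means UTF-16 little-endian with a byte-order mark (BOM); cp1252 stands in for a tool that reads one byte as one character. The script text is made up.

```python
import codecs

script = "SELECT * FROM Orders\r\n"

# Assumption: the Unicode option writes UTF-16 little-endian with a BOM.
raw = codecs.BOM_UTF16_LE + script.encode("utf-16-le")

# A tool designed for one-byte files maps each byte to one character.
one_byte_view = raw.decode("cp1252")
print(ascii(one_byte_view[:8]))  # '\xff\xfeS\x00E\x00L\x00' (BOM as 'ÿþ', then NULs)

# Decoding as UTF-16 (the BOM tells the codec the byte order) recovers the text.
print(ascii(raw.decode("utf-16")))  # 'SELECT * FROM Orders\r\n'
```

Every real character is followed by a NUL byte, and the BOM shows up as two stray Latin-1 letters, which is exactly the kind of output a one-byte diff displays.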

When you generate SQL scripts in SQL Server, the Options tab lets you choose the character set. The three choices are:
- MS-DOS text (OEM)
- Windows text (ANSI)
- International text (Unicode)
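For illustration only, the three options can be approximated with Python codecs. The mapping below is hypothetical: it assumes a US-English Windows system (OEM = cp437, ANSI = cp1252) and assumes the Unicode option writes UTF-16LE with a BOM. On other locales the OEM and ANSI code pages differ.

```python
import codecs

# Hypothetical mapping, assuming a US-English Windows locale.
SAVE_OPTIONS = {
    "MS-DOS text (OEM)": "cp437",
    "Windows text (ANSI)": "cp1252",
    "International text (Unicode)": "utf-16-le",
}


def save_as(text: str, option: str) -> bytes:
    """Encode text roughly the way the given save option would (assumed)."""
    encoding = SAVE_OPTIONS[option]
    data = text.encode(encoding)
    if encoding == "utf-16-le":
        # Assumed: Unicode files start with a little-endian byte-order mark.
        data = codecs.BOM_UTF16_LE + data
    return data


sample = "-- Caf\u00e9"  # a comment containing one non-ASCII character
for option in SAVE_OPTIONS:
    print(f"{option:30} {save_as(sample, option).hex(' ')}")
```

Under these assumptions the OEM and ANSI files differ only in the byte for "é" (0x82 versus 0xE9), while the Unicode file doubles every character and gains a two-byte BOM.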

I was referring to the third option.

As for Diff/Merge in Vault 3.0, support has been added for many character sets besides ANSI, in particular the Unicode character set that SQL Server tools use when creating these files. The additional character sets available to Diff/Merge can be seen in the iconv directory of the Vault 3.0 client.
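The general idea behind a character-set-aware diff is to decode each version into text first and only then compare lines. This Python sketch uses the standard `difflib` module to show that idea; it is not Vault's implementation (which, per the post above, relies on iconv), and the helper names are hypothetical.

```python
import codecs
import difflib


def unicode_aware_diff(old: bytes, new: bytes, encoding: str) -> list:
    """Decode both versions first, then diff them line by line as text."""
    old_lines = old.decode(encoding).splitlines()
    new_lines = new.decode(encoding).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="v1", tofile="v2", lineterm=""))


def utf16_file(text: str) -> bytes:
    """Build UTF-16LE bytes with a BOM, standing in for a Unicode script."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


v1 = utf16_file("SELECT a\r\nFROM t\r\n")
v2 = utf16_file("SELECT a\r\nFROM t2\r\n")
for line in unicode_aware_diff(v1, v2, "utf-16"):
    print(line)
# --- v1
# +++ v2
# @@ -1,2 +1,2 @@
#  SELECT a
# -FROM t
# +FROM t2
```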

Sorry for the confusion.

Post by **George** » Mon Mar 14, 2005 9:47 am

I recently upgraded to Vault 3.0 to deal with this problem of diffing Unicode SQL scripts. To the best of my knowledge, I am looking at scripts created with Query Analyzer using the Unicode option described earlier in this thread.

When I run the diff tool, I see the output in the attached screenshot. I was hoping I would not need to go in and change the format of each of my files, but I will do so if needed.

Post by **jclausius** » Mon Mar 14, 2005 10:10 am

What is the setting for your Character Encoding (Tools -> Options -> Character Encodings)?

Post by **George** » Mon Mar 14, 2005 8:06 pm

As presently configured:

ANSI = cp1252,latin1
UNICODE = utf-8,iso-10646-ucs-2

The ANSI radio button is selected and its field is editable; the UNICODE field is read-only.

Post by **jclausius** » Mon Mar 14, 2005 9:01 pm

Switch the control to UNICODE, and hit OK. Does that solve the problem for these files?

Post by **George** » Tue Mar 15, 2005 2:31 pm

Yes, indeed: selecting the UNICODE button now makes the content display properly. How should I interpret this control? If everything displays OK, there is nothing to worry about, and if I have a problem similar to the one I saw, I should try changing the selection? Or can I just leave it on UNICODE and expect ANSI content to display without a problem?

Post by **jclausius** » Tue Mar 15, 2005 2:48 pm

The control tells Diff/Merge whether to treat the files as ANSI or to decode them with a different character set (the ones listed in the UNICODE field).

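I believe ASCII files (7-bit characters) map directly into UTF-8, so as long as a file is encoded that way, things will work correctly, provided UTF-8 is listed in the UNICODE section of Diff/Merge's options. This holds only for pure 7-bit ASCII, though: an ANSI (cp1252) file containing characters above 127, such as accented letters, is generally not valid UTF-8.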
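The thread does not describe how Diff/Merge chooses among the listed character sets, so the following is only a hypothetical Python sketch of the kind of fallback George is asking about: trust a byte-order mark if one is present, otherwise try a list of candidate encodings in order. The function name and default candidate list are our own assumptions.

```python
import codecs


def guess_decode(data: bytes, fallbacks=("utf-8", "cp1252")):
    """Return (encoding, text): use a BOM if present, else the first
    fallback encoding that decodes the bytes without error."""
    # A byte-order mark identifies UTF-16 and UTF-8-with-BOM files reliably.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16", data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")
    # Pure 7-bit ASCII always decodes as UTF-8; cp1252 text with bytes
    # above 127 usually fails UTF-8 and falls through to cp1252.
    for encoding in fallbacks:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(ascii(guess_decode(b"SELECT 1")))    # ('utf-8', 'SELECT 1')
print(ascii(guess_decode(b"-- caf\xe9")))  # ('cp1252', '-- caf\xe9')
print(ascii(guess_decode(codecs.BOM_UTF16_LE + "SELECT 1".encode("utf-16-le"))))
```

The order matters: cp1252 accepts almost any byte sequence, so it belongs at the end of the list, and UTF-16 is only trusted when a BOM says so.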

SourceGear Support
