Diff shows garbage for T-SQL scripts (Vault v2.0.6)

elyna
Posts: 4
Joined: Thu Nov 11, 2004 9:40 am

Diff shows garbage for T-SQL scripts (Vault v2.0.6)

Post by elyna » Wed Dec 22, 2004 7:20 am

We store SQL scripts in Vault as plain text. The scripts are perfectly viewable in Notepad, but when we try to diff two versions, all we get is garbage characters.

Has anyone run into this? We need this to work!

Thanks,
Eric

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Wed Dec 22, 2004 7:42 am

You'll need to upgrade to Vault 3.0 or use a different diff tool that supports Unicode. Unicode support was added to the Diff/Merge tool in Vault 3.0.x. The 2.0.6 Diff/Merge only works with ASCII files.

See "DiffMerge bug? SQL files looks like binary garbage" for more information.
Jeff Clausius
SourceGear

Perry

Possibly some confusion here about what Unicode means

Post by Perry » Thu Dec 23, 2004 3:58 pm

I suspect that Unicode is being confused with UCS-2 here.

Perhaps what you meant was not really Unicode, but rather either UCS-2 or UTF-16? Unicode is a character set, and a family of related standards including encodings. UCS-2 and UTF-16 are particular encodings of Unicode (as is UTF-8).

UCS-2 is the older, fixed-width two-byte encoding, limited to the first 65,536 code points, that Microsoft originally used for NT 4.

UTF-16 is a newer, variable-width encoding (two or four bytes per character) that extends UCS-2; I think Microsoft had to move to it to reach characters beyond UCS-2's range.

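UTF-8 is a newer and extremely popular variable-width encoding of Unicode that is backward compatible with ASCII: any pure ASCII file is already valid UTF-8. It is widely used on the Internet and World Wide Web.

Neither UCS-2 nor UTF-16 is compatible with ASCII, because both use at least two bytes per character, so even plain ASCII text ends up with a zero byte next to every character.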
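
To make the compatibility point concrete, here is a minimal sketch in Python (chosen only for illustration; it has nothing to do with Vault itself). It encodes the same ASCII-only text as ASCII, UTF-8, and little-endian UTF-16 and compares the raw bytes. The sample text is made up.

```python
# Encode the same ASCII-only text three ways and compare the raw bytes.
text = "SELECT 1"  # hypothetical sample, ASCII characters only

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

# UTF-8 leaves 7-bit ASCII unchanged, so the two byte strings are identical.
print(ascii_bytes == utf8_bytes)          # True

# UTF-16 uses two bytes per character here, so the file doubles in size.
print(len(utf8_bytes), len(utf16_bytes))  # 8 16

# Every second byte is zero, which a one-byte-per-character tool shows as garbage.
print(utf16_bytes[:4])                    # b'S\x00E\x00'
```

A tool that assumes one byte per character reads those zero bytes as characters of their own, which is one plausible source of the "garbage" reported above.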


NB: Many people have trouble understanding Unicode because much if not most of Microsoft's documentation is quite wrong on this subject.


When using UTF-8 (which, as I mentioned, is widely popular across the world, both on the World Wide Web and elsewhere), applications such as Notepad will have no difficulty at all with files that use only ASCII characters.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Thu Dec 23, 2004 9:33 pm

Sorry, my posting was a bit ambiguous.

In reference to T-SQL files, I did intend to use the term Unicode, but I should have been a little more specific. I was referring to the "International text (Unicode)" option. This option produces a file in a Unicode character set using a multi-byte format, which caused problems in the older Diff/Merge, which was originally designed for files with one-byte characters.

In SQL Server, the dialog for generating SQL scripts has an options tab where you can define the character set. The three options are:
- MS-DOS text (OEM)
- Windows text (ANSI)
- International text (Unicode)

I was referring to the third option.
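
As a rough way to check which kind of file you have, the sketch below looks for a byte-order mark (BOM) at the start of the data. It rests on an assumption that is not confirmed anywhere in this thread: that the "International text (Unicode)" option writes UTF-16 with a leading BOM, while the OEM and ANSI options write single-byte text with no BOM. The function name `sniff_bom` and the sample data are hypothetical, and the code is Python only for illustration.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess the encoding family from a leading byte-order mark, if any.

    Assumption: Unicode script files start with a BOM. UTF-32 is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM: probably a single-byte file (ANSI or OEM), but this is only a guess.
    return "unknown (no BOM)"


# Hypothetical sample: what a UTF-16 little-endian script with a BOM would look like.
sample = codecs.BOM_UTF16_LE + "SELECT 1".encode("utf-16-le")

print(sniff_bom(sample))        # utf-16
print(sniff_bom(b"SELECT 1"))   # unknown (no BOM)
```

To check a real script, read the first few bytes with `open(path, "rb").read(4)` and pass them to the function.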


As for Diff/Merge in Vault 3.0, support has been added for many character sets besides ANSI, including the Unicode character set that the SQL Server tools use when creating these files. The additional character sets used by Diff/Merge can be seen in the Vault 3.0 client's iconv directory.

Sorry for the confusion.
Jeff Clausius
SourceGear

George

A different kind of garbage but still not useable

Post by George » Mon Mar 14, 2005 9:47 am

I recently upgraded to Vault 3.0 in order to cope with this problem of diff/merge on Unicode SQL scripts. To the best of my knowledge, I am looking at scripts created using Query Analyzer with the Unicode option described earlier in this thread.

When I run the diff tool, I see what is shown in the attached screenshot. I was hoping I would not need to go in and change the format of each of my files, but I will do so if needed.
Attachments
SourceGearDiff.gif (9.3 KiB)

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Mon Mar 14, 2005 10:10 am

What is the setting for your Character Encoding (Tools -> Options -> Character Encodings)?
Jeff Clausius
SourceGear

George

Character encodings

Post by George » Mon Mar 14, 2005 8:06 pm

As presently configured

ANSI = cp1252,latin1
UNICODE = utf-8,iso-10646-ucs-2

The radio button for ANSI is selected and its text box is editable; the UNICODE text box is read-only.

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Mon Mar 14, 2005 9:01 pm

Switch the control to UNICODE, and hit OK. Does that solve the problem for these files?
Jeff Clausius
SourceGear

George

Toggling the ANSI/UNICODE setting

Post by George » Tue Mar 15, 2005 2:31 pm

Yes indeed, selecting the UNICODE button now causes the content to display properly. How should I interpret this control? If everything displays correctly, is there nothing to worry about, and if I see a problem like the one above, should I try changing the selection? Or can I just leave it on UNICODE and expect ANSI content to display without a problem?

jclausius
Posts: 3706
Joined: Tue Dec 16, 2003 1:17 pm
Location: SourceGear

Post by jclausius » Tue Mar 15, 2005 2:48 pm

The control tells Diff/Merge whether to treat files as ANSI or to read them using one of the other listed character sets.

I believe ASCII files (7-bit characters) map directly onto UTF-8, so as long as your files contain only those characters, things will work correctly with the UNICODE setting, provided UTF-8 is listed in the UNICODE section of Diff/Merge's options.
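
If you want to check in advance whether a given file falls into that safe category, a small script can do it. The sketch below is Python, used only for illustration and unrelated to how Diff/Merge works internally; the helper names and sample data are hypothetical. It tests whether the bytes are pure 7-bit ASCII and whether they decode as UTF-8.

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical samples: a plain ASCII script and an ANSI (cp1252) string
# containing an accented character.
plain = b"SELECT name FROM users"
ansi = "caf\u00e9".encode("cp1252")  # the accented e becomes the single byte 0xE9

print(is_ascii_only(plain), decodes_as_utf8(plain))  # True True
print(is_ascii_only(ansi), decodes_as_utf8(ansi))    # False False
```

Files that pass the first check should read the same under either setting. ANSI files with accented or other 8-bit characters, like the second sample, are the ones that may not display correctly when read as UTF-8.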
Jeff Clausius
SourceGear
