Diff shows garbage for T-SQL scripts (Vault v2.0.6)
Moderator: SourceGear
We store SQL scripts in Vault as plain text. The scripts display correctly in Notepad, but when we try to diff two versions, all we get is garbage characters.
Has anyone run into this? We need this to work!
Thanks,
Eric
You'll need to upgrade to Vault 3.0 or use a different diff tool that supports Unicode. Unicode support was added to the Diff/Merge tool in Vault 3.0.x; the 2.0.6 Diff/Merge works only with ASCII files.
See the thread "DiffMerge bug? SQL files looks like binary garbage" for more information.
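To see why a one-byte-per-character diff shows garbage, here is a minimal Python sketch. It assumes the script was saved as UTF-16 little-endian with a byte-order mark (BOM). Python's latin-1 codec stands in for any tool that maps each byte to one character; this is not how the 2.0.6 Diff/Merge is actually implemented.

```python
import codecs


def as_single_byte(data: bytes) -> str:
    """Decode bytes one byte per character; latin-1 maps all 256 byte values."""
    return data.decode("latin-1")


script = "SELECT 1"
ascii_bytes = script.encode("ascii")
# Assumption: a UTF-16 LE file that starts with the BOM bytes FF FE.
utf16_bytes = codecs.BOM_UTF16_LE + script.encode("utf-16-le")

print(repr(as_single_byte(ascii_bytes)))  # 'SELECT 1'
print(repr(as_single_byte(utf16_bytes)))  # 'ÿþS\x00E\x00L\x00E\x00C\x00T\x00 \x001\x00'
```

Every ASCII character is followed by a NUL byte, and the BOM shows up as "ÿþ". That is the kind of garbage a byte-oriented diff displays.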
Jeff Clausius
SourceGear
Possibly some confusion here about what Unicode means
I suspect that Unicode is being confused with UCS-2 here.
Perhaps what you meant was not really Unicode, but rather either UCS-2 or UTF-16? Unicode is a character set and a family of related standards, including encodings. UCS-2 and UTF-16 are particular encodings of Unicode (as is UTF-8).
UCS-2 is the older, fixed-width two-byte encoding. It can represent only the characters of the Basic Multilingual Plane (BMP), and it is what Microsoft originally used in Windows NT.
UTF-16 extends UCS-2 with surrogate pairs: characters in the BMP still take two bytes, and characters beyond it take four. Microsoft later moved from UCS-2 to UTF-16 to reach those extra characters.
UTF-8 is a variable-width encoding of Unicode (one to four bytes per character) that is fully compatible with ASCII: a file containing only ASCII characters has exactly the same bytes in UTF-8. It is widely used on the Internet and the World Wide Web.
Neither UCS-2 nor UTF-16 is compatible with ASCII.
NB: Many people have trouble understanding Unicode because much, if not most, of Microsoft's documentation is quite wrong on this subject.
When UTF-8 is used (which, as I mentioned, is widely popular across the world, on the World Wide Web and elsewhere), applications such as Notepad have no difficulty at all with files that contain only ASCII characters.
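The size differences between these encodings are easy to check. Here is a minimal Python sketch; the sample characters are arbitrary choices for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {"utf-8": len(text.encode("utf-8")), "utf-16-le": len(text.encode("utf-16-le"))}


ascii_text = "SELECT 1"
# ASCII-only text produces exactly the same bytes in ASCII and UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(encoded_sizes(ascii_text))    # {'utf-8': 8, 'utf-16-le': 16}
print(encoded_sizes("\u00e9"))      # {'utf-8': 2, 'utf-16-le': 2}  (é, inside the BMP)
print(encoded_sizes("\U0001D11E"))  # {'utf-8': 4, 'utf-16-le': 4}  (G clef, outside the BMP)
```

The last character needs a surrogate pair in UTF-16, and UCS-2 has no way to represent it at all.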
Sorry, my posting was a bit ambiguous.
In reference to T-SQL files, I did intend to use the term Unicode, but I should have been more specific. I was referring to the "International text (Unicode)" option. This option writes the script in a multi-byte Unicode format, which caused problems for the older Diff/Merge because it was originally designed for single-byte character files.
When you generate SQL scripts in SQL Server, the Options tab lets you define the character set. The three options are:
- MS-DOS text (OEM)
- Windows text (ANSI)
- International text (Unicode)
I was referring to the third option.
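If you are not sure which option a given script was saved with, its first bytes usually tell you. The sketch below assumes (this is not stated above) that Unicode scripts from the SQL Server tools begin with a byte-order mark (BOM). It treats anything without one as Windows ANSI (cp1252). `decode_script` is a hypothetical helper name.

```python
import codecs


def decode_script(data: bytes) -> str:
    """Decode a script using its byte-order mark (BOM), falling back to cp1252.

    Hypothetical helper. UTF-32 BOMs are not handled: FF FE 00 00 would be
    taken as UTF-16 LE.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")  # strips the UTF-8 BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the BOM selects the byte order and is stripped
    return data.decode("cp1252")  # no BOM: assume "Windows text (ANSI)"


unicode_file = codecs.BOM_UTF16_LE + "SELECT 'caf\u00e9'".encode("utf-16-le")
ansi_file = "SELECT 'caf\u00e9'".encode("cp1252")
print(decode_script(unicode_file) == decode_script(ansi_file))  # True
```

An "MS-DOS text (OEM)" file also has no BOM, so this sketch cannot tell it apart from an ANSI file.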
As for Diff/Merge in Vault 3.0, support has been added for many character sets besides ANSI, including the Unicode encoding used for files created by the SQL Server tools. The additional character sets available to Diff/Merge can be seen in the Vault 3.0 client's iconv directory.
Sorry for the confusion.
Jeff Clausius
SourceGear
A different kind of garbage but still not useable
I recently upgraded to Vault 3.0 to cope with this problem of diffing and merging Unicode SQL scripts. To the best of my knowledge, I am looking at scripts created with Query Analyzer using the Unicode option described elsewhere in this thread.
When I run the diff tool, I get the output shown in the attachment. I was hoping I would not need to go in and change the format of each of my files, but I will do so if needed.
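If re-saving each file as ANSI does turn out to be necessary, it can be scripted rather than done by hand. This is a minimal Python sketch. It assumes the scripts are UTF-16 with a BOM and that every character they contain exists in cp1252 (Windows ANSI). The function names are hypothetical.

```python
import codecs
import pathlib
from typing import Optional


def utf16_to_ansi(data: bytes) -> Optional[bytes]:
    """Re-encode BOM-marked UTF-16 bytes as cp1252; return None for other input.

    Raises UnicodeEncodeError if a character has no cp1252 equivalent,
    so nothing is silently dropped.
    """
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return None
    return data.decode("utf-16").encode("cp1252")


def convert_file(path: pathlib.Path) -> bool:
    """Convert one file in place; return True if it was rewritten."""
    converted = utf16_to_ansi(path.read_bytes())
    if converted is None:
        return False
    path.write_bytes(converted)  # only reached if the encode above succeeded
    return True
```

For example, `for p in pathlib.Path("scripts").glob("*.sql"): convert_file(p)` would convert a whole (hypothetical) `scripts` folder. Keep a copy of the files first, since the rewrite happens in place.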
Attachment: SourceGearDiff.gif (9.3 KiB)
Character encodings
As presently configured:
ANSI = cp1252,latin1
UNICODE = utf-8,iso-10646-ucs-2
The ANSI radio button is selected and its field is editable; the UNICODE field is read-only.
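One way to read a setting such as "utf-8,iso-10646-ucs-2" is as an ordered list of encodings to try. The sketch below models that reading; it is a guess for illustration, not Vault's documented behavior. Python has no codec named "iso-10646-ucs-2", so the hypothetical `PY_CODEC` table maps it to Python's "utf-16" codec, which also decodes UCS-2 text.

```python
# Hypothetical mapping from the iconv-style names shown in the options
# dialog to Python codec names.
PY_CODEC = {
    "cp1252": "cp1252",
    "latin1": "latin-1",
    "utf-8": "utf-8",
    "iso-10646-ucs-2": "utf-16",  # stand-in: Python has no codec by this name
}


def decode_with_candidates(data: bytes, setting: str):
    """Try each encoding in a comma-separated setting in order.

    Returns (name, text) for the first encoding that decodes without error.
    """
    for name in (part.strip() for part in setting.split(",")):
        try:
            return name, data.decode(PY_CODEC[name])
        except UnicodeDecodeError:
            continue
    raise ValueError(f"no encoding in {setting!r} could decode the data")


script = b"\xff\xfe" + "SELECT 1".encode("utf-16-le")  # UTF-16 LE with BOM
print(decode_with_candidates(script, "cp1252,latin1"))          # cp1252 "succeeds", yielding garbage
print(decode_with_candidates(script, "utf-8,iso-10646-ucs-2"))  # ('iso-10646-ucs-2', 'SELECT 1')
```

The ANSI list accepts these bytes without complaint, because cp1252 assigns a character to almost every byte. That is consistent with the ANSI setting showing garbage instead of an error. UTF-8 rejects the bytes here only because the BOM byte 0xFF can never appear in UTF-8. UTF-16 text without a BOM could pass as UTF-8 with embedded NULs.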
Toggling the ANSI/UNICODE setting
Yes, indeed: selecting the UNICODE button now makes the content display properly. How should I interpret this control? If everything displays OK, is there nothing to worry about, and if I see a problem like the one above, should I try changing the selection? Or can I just leave it on UNICODE and expect ANSI content to display without a problem?
The control tells Diff/Merge whether to treat the files as ANSI or to use a different character set.
I believe ASCII files (7-bit characters) map directly into UTF-8. As long as a file is encoded this way, things will work correctly, provided UTF-8 is listed in the UNICODE section of Diff/Merge's options.
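The point about 7-bit ASCII can be checked directly. The same check also shows the limit of relying on UTF-8: an ANSI file that uses characters above 0x7F (here an accented letter in cp1252) is not valid UTF-8. This is a minimal Python sketch; `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_script = b"SELECT name FROM sysobjects"
ansi_script = "SELECT 'caf\u00e9'".encode("cp1252")  # é is the single byte 0xE9

print(is_valid_utf8(ascii_script))  # True: 7-bit ASCII is already UTF-8
print(is_valid_utf8(ansi_script))   # False: 0xE9 here does not start a valid UTF-8 sequence
```

Leaving the setting on UNICODE therefore looks safe for pure-ASCII scripts. ANSI files with accented or other extended characters may still need the ANSI setting.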
Jeff Clausius
SourceGear