Vault client changing 8-bit characters

Moderator: SourceGear

stevek2
Posts: 70
Joined: Wed Jun 23, 2004 5:53 pm

Vault client changing 8-bit characters

Post by stevek2 » Thu Jan 21, 2010 6:20 pm

I am using Vault Client 4.1.3, and I have added a number of third-party source files (*.c) to Vault. These files contain characters > 127 -- for example, the character value 0xa9 is used (in a comment) to produce a copyright symbol.

When files with these 8-bit characters are added or checked in to Vault, it modifies these characters. In the example I'm using, my .c file has a 0xa9 character, which, when checked in to Vault, is changed to 0xef,0xbf,0xbd.

This seems to be some kind of encoding problem (i.e., my source file uses an ANSI encoding but Vault seems to force a UTF-8 interpretation upon the file), and I don't see any option to configure Vault or change the file encoding.

This is a problem because now the files have changed in length and don't compare exactly to the original. I want Vault to keep the character encoding I gave it. I could change the file property to binary, but (a) there is no easy way to change the file property for hundreds of files, and (b) changing text (source) files to binary prevents Vault from being able to automatically handle merging for me, and I don't want to lose this.

Am I missing something, or is this a limitation in Vault? (Or has it been fixed in 5.x?)

stevek2
Posts: 70
Joined: Wed Jun 23, 2004 5:53 pm

Re: Vault client changing 8-bit characters

Post by stevek2 » Fri Jan 22, 2010 2:50 pm

I did a little more research. The bytes the Vault client is inserting -- 0xEF 0xBF 0xBD -- are the UTF-8 encoding of the Unicode replacement character (U+FFFD), which reinforces the idea that Vault is interpreting my file as UTF-8.
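
For anyone who wants to see the effect outside of Vault, here is a minimal Python sketch of what I think is happening. It is only an illustration of the general decode-as-UTF-8 failure, not Vault's actual code; the sample comment string is made up.

```python
# Illustration only: an "ANSI" (Windows-1252) byte string containing 0xA9,
# pushed through a lenient UTF-8 decode and then re-encoded as UTF-8.
original = b"/* Copyright \xa9 2010 */"

# A lone 0xA9 is not a valid UTF-8 sequence, so a lenient decoder
# substitutes U+FFFD (the Unicode replacement character) for it.
decoded = original.decode("utf-8", errors="replace")
round_tripped = decoded.encode("utf-8")

def as_hex(data: bytes) -> str:
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

print(as_hex(original))
print(as_hex(round_tripped))

# Decoding with the encoding the file was really written in keeps the symbol.
print(original.decode("cp1252"))
```

The single 0xA9 byte turns into three bytes, which is also why the file length changes.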

The release notes for 4.1.4 have the following comment:

* Encoding was accidentally changed when EOL conversion was applied

Hopefully this is the bug I'm referring to and it was fixed in 4.1.4?

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear

Re: Vault client changing 8-bit characters

Post by Beth » Fri Jan 22, 2010 3:34 pm

Do you use keyword expansion at all? If so, do you use it on your .c files?
Beth Kieler
SourceGear Technical Support

stevek2
Posts: 70
Joined: Wed Jun 23, 2004 5:53 pm

Re: Vault client changing 8-bit characters

Post by stevek2 » Fri Jan 22, 2010 3:36 pm

We used to, but I actually just disabled keyword expansion a few days ago for other reasons. So when I repro'd this, it was with keyword expansion disabled.

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear

Re: Vault client changing 8-bit characters

Post by Beth » Tue Jan 26, 2010 9:41 am

I ran a few tests using text files saved with different encodings and have attached my results. The contents of the files are just copyright symbols. I thought maybe keyword expansion would cause the change you're seeing, but I can't make it happen. Also, Vault doesn't appear to be changing the file encoding. This is with the most recent version, though.

Could you try a similar test on your installation to see if you get different results?
1) Create a .txt file in notepad, add some copyright symbols, and save it in the ANSI encoding.
2) Add that file to Vault.
3) View the file using a binary editor. Capture what that looks like then close the file.
4) Check out and edit the file.
5) Edit the file and check it in.
6) View the file using a binary editor. If it looks any different, then capture what that looks like then close the file. Send your results.
7) If the file didn't look different, then turn keyword expansion on.
8) Check out and edit the file.
9) Edit the file by adding in keywords and check it in.
10) View the file using a binary editor. If it looks any different, then capture what that looks like then close the file. Send your results.
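
If a binary editor isn't handy, the "view the bytes" steps above can be approximated with a short Python sketch like the one below. The function names are just illustrative helpers, not part of Vault, and the sample bytes stand in for the real file contents.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD

def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a simple offset + hex listing of the given bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)

def high_byte_offsets(data: bytes) -> list:
    """Return the offsets of every byte with a value above 127."""
    return [i for i, b in enumerate(data) if b > 127]

# Stand-ins for the file before it was added and after a Get.
before = b"(c) \xa9\r\n"
after = b"(c) \xef\xbf\xbd\r\n"

print(hex_dump(before))
print(hex_dump(after))
print("high bytes before:", high_byte_offsets(before))
print("replacement sequence present after:", REPLACEMENT in after)
```

To run it against a real file, read the file in binary mode (for example `open(path, "rb").read()`) and pass the result to `hex_dump`, once before adding the file and once after the Get.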

If you don't want to post results here, then send an email to support at sourcegear.com (attn: Beth) with a link to this forum thread and your results.
Attachments
Comparing Files with different encodings.doc
Beth Kieler
SourceGear Technical Support

stevek2
Posts: 70
Joined: Wed Jun 23, 2004 5:53 pm

Re: Vault client changing 8-bit characters

Post by stevek2 » Tue Jan 26, 2010 2:03 pm

I actually did something pretty similar. I took a new file with ANSI encoding (7-bit ASCII except for a single 0xA9 character, and without any type of BOM at the beginning of the file). I then added that file to Vault (v4.1.3 client/server), and when I did a Get of that file, Vault had converted the 0xA9 character into 0xEF 0xBF 0xBD, the UTF-8 encoding of the Unicode replacement character. I have keyword expansion disabled on the repository.
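
In case it helps to confirm the properties of the test file, here is a rough Python sketch that checks for a UTF-8 BOM, pure ASCII content, and UTF-8 validity. The `describe_bytes` helper and the sample bytes are hypothetical and only for illustration.

```python
import codecs

def describe_bytes(data: bytes) -> dict:
    """Report basic encoding facts about raw file contents."""
    try:
        data.decode("utf-8")  # strict decode: raises on invalid sequences
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        "is_ascii": all(b < 128 for b in data),
        "valid_utf8": valid_utf8,
        "high_bytes": [hex(b) for b in data if b > 127],
    }

# Stand-in for the test file: ASCII plus a single 0xA9, no BOM.
sample = b"/* \xa9 */\r\n"
print(describe_bytes(sample))
```

A file like mine reports no BOM, is not pure ASCII, and is not valid UTF-8, so anything that treats it as UTF-8 has to either fail or substitute the replacement character.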

Could you do a little research on the bug that was fixed in 4.1.4 -- "* Encoding was accidentally changed when EOL conversion was applied" -- would this be causing my problem? I just don't want to have to upgrade right now unless I know that it'll fix the problem I'm seeing.

Beth
Posts: 8550
Joined: Wed Jun 21, 2006 8:24 pm
Location: SourceGear

Re: Vault client changing 8-bit characters

Post by Beth » Wed Jan 27, 2010 8:57 am

This issue is fixed in Vault 4.1.4, so upgrading will solve it.

There is a work-around that might work for you as well. If you go to Tools - Options - Local Files in the Vault client, there is a setting called "Override Native EOL Type." You can try changing that setting to "Do not override." This won't change what's already been changed on disk, but if you haven't checked in changes with UTF-8, then clearing out your working folder and performing a Get will get you to the state you want to be in.
Beth Kieler
SourceGear Technical Support
