Vault client changing 8-bit characters
Moderator: SourceGear
Vault client changing 8-bit characters
I am using Vault Client 4.1.3, and I have added a number of third-party source files (*.c) to Vault. These files contain characters > 127. For example, the character value 0xa9 is used (in a comment) to produce a copyright symbol.
When files with these 8-bit characters are added or checked in to Vault, it modifies these characters. In the example I'm using, my .c file has a 0xa9 character, which, when checked in to Vault, is changed to 0xef, 0xbf, 0xbd.
This seems to be some kind of encoding problem (i.e., my source file uses an ANSI encoding but Vault seems to force a UTF-8 interpretation upon the file), and I don't see any option to configure Vault or change the file encoding.
This is a problem because now the files have changed in length and don't compare exactly to the original. I want Vault to keep the character encoding I gave it. I could change the file property to binary, but (a) there is no easy way to change the file property for hundreds of files, and (b) changing text (source) files to binary prevents Vault from being able to automatically handle merging for me, and I don't want to lose this.
Am I missing something, or is this a limitation in Vault? (Or has it been fixed in 5.x?)
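For anyone trying to work out how many files are exposed to this, here is a minimal Python sketch that sorts files into pure ASCII, valid UTF-8, legacy 8-bit (high bytes that are not valid UTF-8, such as a lone 0xA9), and already damaged (containing the 0xEF 0xBF 0xBD sequence). The function names and the `.c`/`.h` filter are assumptions for illustration; nothing here talks to Vault.

```python
import os

REPLACEMENT = b"\xef\xbf\xbd"  # UTF-8 encoding of U+FFFD, the replacement character


def classify(data: bytes) -> str:
    """Return 'ascii', 'utf8', 'damaged', or 'legacy-8bit' for a file's raw bytes."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decode: fails on a lone 0xA9
    except UnicodeDecodeError:
        return "legacy-8bit"
    # Valid UTF-8; flag it if it already contains the replacement sequence.
    return "damaged" if REPLACEMENT in data else "utf8"


def scan(root: str, suffixes=(".c", ".h")):
    """Yield (path, classification) for matching files under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fh:
                    yield path, classify(fh.read())
```

Files reported as `legacy-8bit` are the ones that would change if something reinterprets them as UTF-8. Files reported as `damaged` may already have been through that conversion, although a file could also contain U+FFFD legitimately.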
Re: Vault client changing 8-bit characters
I did a little more research. The bytes the Vault client is inserting, 0xEF 0xBF 0xBD, are the UTF-8 encoding of the Unicode replacement character (U+FFFD), which reinforces the idea that Vault is interpreting my file as UTF-8.
The release notes for 4.1.4 have the following comment:
* Encoding was accidentally changed when EOL conversion was applied
Hopefully this is the bug I'm referring to and it was fixed in 4.1.4?
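To illustrate why those three bytes point at a UTF-8 interpretation, here is a small Python sketch that reproduces the same byte change outside Vault. It is only a guess at the kind of conversion involved, not Vault's actual code, and the sample comment text is made up.

```python
# A line as it might appear in an ANSI (Windows-1252) source file:
# the copyright sign is the single byte 0xA9.
original = b"/* Copyright \xa9 */"

# Decoding as UTF-8 fails on the lone 0xA9, so a lenient decoder
# substitutes U+FFFD (the replacement character) for it.
as_text = original.decode("utf-8", errors="replace")

# Writing that text back out as UTF-8 turns U+FFFD into EF BF BD.
round_tripped = as_text.encode("utf-8")

print(original.hex(" "))
print(round_tripped.hex(" "))

# Decoding with the right code page keeps the character intact.
intended = original.decode("cp1252")
```

The round-tripped line is two bytes longer than the original, which matches the change in file length, and the original 0xA9 cannot be recovered from the replacement character alone.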
Re: Vault client changing 8-bit characters
Do you use keyword expansion at all? If so, do you use it on your .c files?
Beth Kieler
SourceGear Technical Support
Re: Vault client changing 8-bit characters
We used to, but I actually just disabled keyword expansion a few days ago for other reasons. So when I repro'd this, it was with keyword expansion disabled.
Re: Vault client changing 8-bit characters
I ran a few tests just using text files saved with different encodings and have attached my results. The contents of the files are just copyright symbols. I thought maybe the keywords would cause your switch, but I can't make it happen. Also, Vault doesn't appear to be changing the file encoding. This is with the most recent version though.
Could you try a similar test on your installation to see if you get different results?
1) Create a .txt file in notepad, add some copyright symbols, and save it in the ANSI encoding.
2) Add that file to Vault.
3) View the file using a binary editor. Capture what that looks like then close the file.
4) Check out and edit the file.
5) Edit the file and check it in.
6) View the file using a binary editor. If it looks any different, then capture what that looks like then close the file. Send your results.
7) If the file didn't look different, then turn keyword expansion on.
8) Check out and edit the file.
9) Edit the file by adding in keywords and check it in.
10) View the file using a binary editor. If it looks any different, then capture what that looks like then close the file. Send your results.
If you don't want to post results here, then send an email to support at sourcegear.com (attn: Beth) with a link to this forum thread and your results.
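If a binary editor is not at hand, a short Python sketch along these lines could produce the step 1 file contents and dump the bytes for steps 3, 6 and 10. The helper names are made up for illustration, and it assumes "ANSI" means Windows-1252, which is the usual case on Western Windows installs but may differ on yours.

```python
from pathlib import Path


def ansi_test_bytes(text: str = "Copyright \u00a9 test\r\n") -> bytes:
    """Encode the sample text as Windows-1252, so the copyright sign is 0xA9."""
    return text.encode("cp1252")


def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a plain hex dump: an offset followed by up to `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:08x}  {chunk.hex(' ')}")
    return "\n".join(lines)


def dump_file(path: str) -> str:
    """Hex-dump a file from disk, e.g. the working copy after a Get."""
    return hex_dump(Path(path).read_bytes())
```

Writing `ansi_test_bytes()` to a file with `Path("test.txt").write_bytes(...)` gives the starting file, and comparing the output of `dump_file` before and after each check-in shows whether 0xA9 has turned into 0xEF 0xBF 0xBD.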
Attachment: Comparing Files with different encodings.doc (185 KiB)
Beth Kieler
SourceGear Technical Support
Re: Vault client changing 8-bit characters
I actually did something pretty similar. I took a new file with ANSI encoding (7-bit ASCII except for a single 0xA9 character, and without any BOM at the beginning of the file) and added it to Vault (v4.1.3 client/server). When I then did a Get of that file, Vault had converted the 0xA9 character into 0xEF 0xBF 0xBD, the UTF-8 encoding of the Unicode replacement character. I have keyword expansion disabled on the repository.
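For reference, this is roughly how the test file's properties can be confirmed before adding it: a minimal Python sketch that checks for a byte order mark and lists every byte above 0x7F. The function names are my own, and the BOM table only covers the common UTF-8, UTF-16 and UTF-32 marks.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE mark
# (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the name of the encoding whose BOM starts the data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def high_bytes(data: bytes):
    """Return (offset, value) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]
```

For the file described above, `detect_bom` should return `None` and `high_bytes` should report exactly one entry with the value 0xA9.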
Could you do a little research on the bug that was fixed in 4.1.4 -- "* Encoding was accidentally changed when EOL conversion was applied" -- would this be causing my problem? I just don't want to have to upgrade right now unless I know that it'll fix the problem I'm seeing.
Re: Vault client changing 8-bit characters
This issue is fixed in Vault 4.1.4, so upgrading will solve it.
There is a work-around that might work for you as well. If you go to Vault Tools - Options - Local Files, there is a setting called "Override Native EOL Type." You can try changing that setting to "Do not override." This won't change what's already been changed on disk, but if you haven't checked in changes with UTF-8, then clearing out your working folder and performing a Get will get you to the state you want to be in.
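After the fresh Get, one way to confirm the working folder matches the original third-party files is to compare the two trees byte for byte. The Python sketch below does that with SHA-256 hashes. The function names are hypothetical, and it assumes you still have an untouched copy of the originals somewhere on disk.

```python
import hashlib
from pathlib import Path


def digest(path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(original_root, working_root):
    """List relative paths whose bytes differ from, or are missing in, working_root."""
    original_root, working_root = Path(original_root), Path(working_root)
    changed = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = working_root / rel
        if not dst.is_file() or digest(src) != digest(dst):
            changed.append(str(rel))
    return changed
```

An empty list means every original file came back unchanged. Since this is a strict byte comparison, a file whose only difference is its line endings will also be listed.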
Beth Kieler
SourceGear Technical Support