Editor entering non-english symbols
Moderator: SourceGear
Editor entering non-english symbols
DiffMerge has a long-standing problem in the editor.
My OS has 2 input languages (English and Russian).
DiffMerge accepts only English characters. After I switch to the Russian input language, DiffMerge ignores all Cyrillic characters. Can you help me?
-
- Posts: 534
- Joined: Tue Jun 05, 2007 11:37 am
- Location: SourceGear
- Contact:
Re: Editor entering non-english symbols
What OS and version do you have ?
What version of DiffMerge do you have ?
thanks
Re: Editor entering non-english symbols
I have the same problem.
Russian UTF-8 characters are not displayed properly.
Here is a screenshot: http://goo.gl/tAhzK
This screenshot shows the correct Russian characters (in another program): http://goo.gl/vjXZA
I have Mac OS X 10.7.5 and DiffMerge 3.3.2 (1139) [x86].
Re: Editor entering non-english symbols
Could you check the status bar and see what character encoding DiffMerge
selected for each file? In the right-most field, it should say something like
"UTF-8(BOM)" or it may have 2 encodings with a ":" between them.
Do these files have byte-order-marks (BOM) in them ?
What are your Ruleset settings for these types of files ?
(See the Preferences dialog / Rulesets.)
Does it help if you switch from "System Local/Default Encoding"
to a specific "Named Character Encoding" ?
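For readers following along, here is a minimal Python sketch of the general idea behind a "search for BOM, otherwise use a fallback encoding" setting. It is not DiffMerge's actual implementation; the names `BOMS` and `pick_encoding` are hypothetical and exist only for this illustration.

```python
import codecs

# (BOM bytes, Python codec name). UTF-32 is checked before UTF-16 because
# the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def pick_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Return a codec name from the BOM if present, else the fallback."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return fallback


# The same Russian word, saved with and without a UTF-8 BOM.
word = "\u0410\u0434\u0440\u0435\u0441"  # "Адрес"
with_bom = codecs.BOM_UTF8 + word.encode("utf-8")
without_bom = word.encode("utf-8")

print(pick_encoding(with_bom))                        # BOM decides
print(pick_encoding(without_bom, fallback="cp1251"))  # fallback decides
```

Without a BOM, the result depends entirely on the fallback, which is why a UTF-8 file with no BOM can be shown with the wrong characters when the fallback is a system code page.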
Re: Editor entering non-english symbols
jeffhostetler wrote: Could you check the status bar and see what character encoding DiffMerge
selected for each file? In the right-most field, it should say something like
"UTF-8(BOM)" or it may have 2 encodings with a ":" between them.
Do these files have byte-order-marks (BOM) in them ?
The Russian files are originally encoded in UTF-8. They do not have a BOM signature, because a BOM often causes errors on websites.
In the right-most field of the status bar I can only see the text "default", without a caption.
jeffhostetler wrote: What are your Ruleset settings for these types of files ?
(See the Preferences dialog / Rulesets.)
Initially, it was the "default" ruleset. I tried changing it to "UTF-8 Text Files", but nothing changed.
The settings are:
[ ] Search for Unicode BOM => NOT checked (this check box does not affect the display of the Russian text).
Fallback Character Encoding Options: (*) Use Named Character Encoding Below (CHECKED) => Unicode 8 bit (UTF-8)
jeffhostetler wrote: Does it help if you switch from "System Local/Default Encoding"
to a specific "Named Character Encoding" ?
No, switching rulesets or encoding settings does not affect the appearance of the Russian text.
Re: Editor entering non-english symbols
I'm not sure why it isn't working for you.
Could you send me a zip or tar file with those 2 files ?
Either post it here or email it to me at jeffh at sourcegear.com.
It would help if you also included the contents of the
Support Dialog (available from the About Dialog).
That will show me all of the preference settings and
info on the open files.
Thanks.
Re: Editor entering non-english symbols
I'm sorry, the problem went away by itself ... Maybe I needed to restart the program, which I did not do after the last time I changed the settings.
Now when I run the program it prompts me to set the ruleset for each of the compared files. At the same time, the preview window at the bottom shows the wrong Russian characters (for example "\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81"). But after the files have been opened, everything shows correctly!
Here are the contents of my "Support Dialog" window - https://www.dropbox.com/s/np6n061yarx7j ... t_info.txt
Thank you for your help!
Re: Editor entering non-english symbols
deeprus wrote: ... for example "\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81") ...
The \xd0... characters are the raw UTF-8 multi-byte sequences. That dialog
shows the first few raw bytes of the file and is asking for help to figure out
what the encoding is, so it doesn't yet know whether \xd0\x90 is a Cyrillic capital A
or just 2 bytes from a random code page.
WRT it asking you for each file, that setting can be changed. Before, the
"default" Ruleset was set to assume system default encoding (which is usually
Latin-1 or a code page). You currently have it set to "ask for each file" (which
gives you the most flexibility, but can be annoying). You can also set it to "use
the specific encoding named below" and then force it to UTF-8 in the bottom
combo-box. Take a look at your settings on the "UTF-8 Text Files" Ruleset.
If you wanted, you could add "ini" as a suffix to the "UTF-8 Text Files" Ruleset
and use it rather than the "default" Ruleset.
Let us know if you have any problems getting it to work for you.
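As an illustration of the point about raw bytes, the sequence quoted from the preview window can be decoded in Python. This is only a sketch to show why the same bytes are ambiguous until an encoding is chosen; it says nothing about how DiffMerge itself decodes files.

```python
# Raw bytes copied from the preview text quoted in this thread.
raw = b"\xd0\x90\xd0\xb4\xd1\x80\xd0\xb5\xd1\x81"

# Interpreted as UTF-8, each pair of bytes is one Cyrillic letter.
as_utf8 = raw.decode("utf-8")

# Interpreted as a single-byte code page, every byte becomes its own
# character, so the same data turns into ten unrelated symbols.
as_cp1251 = raw.decode("cp1251")

print(as_utf8)
print(len(raw), len(as_utf8), len(as_cp1251))
```

The ten bytes become five characters under UTF-8 (the first one, \xd0\x90, is the Cyrillic capital A) and ten characters under CP1251.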
Re: Editor entering non-english symbols
jeffhostetler wrote: What OS and version do you have ?
What version of DiffMerge do you have ?
thanks
Sorry, I was away for a long time.
I now have Windows 7 x64, but I had the same problem on Windows XP too.
DiffMerge Version 3.3.2 (1139) [x64]
Codepage: WINDOWS-1251
I can use copy and paste, but that is very inconvenient.
Re: Editor entering non-english symbols
I think you're having a different issue than "deeprus" is/was having.
I think his files were UTF-8 without BOM's.
You mentioned CP1251. Are your files CP1251 rather than UTF-8?
If you have CP-based files, look at the Options dialog and go to the
corresponding Ruleset and select the "Character Encodings" page.
Try changing the "Fallback Character Encoding Options" to
"Use Named Character Encoding Below" and at the bottom, select "CP 1251"
(the exact content and spelling of the drop-down varies by platform).
And see if that helps.
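If you are not sure whether a file is UTF-8 or CP1251, a quick check outside DiffMerge can help. The Python sketch below relies on the fact that typical CP1251 Russian text is not valid UTF-8, so a strict UTF-8 decode fails on it. The helper name `guess_encoding` is hypothetical, and the check is a heuristic only: a pure ASCII file is valid in both encodings and is reported as UTF-8.

```python
def guess_encoding(data: bytes, fallback: str = "cp1251") -> str:
    """Return "utf-8" if the bytes decode strictly as UTF-8, else the fallback."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


word = "\u0410\u0434\u0440\u0435\u0441"  # "Адрес"

print(guess_encoding(word.encode("utf-8")))   # utf-8
print(guess_encoding(word.encode("cp1251")))  # cp1251
```

To test a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is your own file) and pass the bytes to the helper.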
Re: Editor entering non-english symbols
jeffhostetler wrote:I think you're having a different issue than "deeprus" is/was having.
I think his files were UTF-8 without BOM's.
You mentioned CP1251. Are your files CP1251 rather than UTF-8?
If you have CP-based files, look at the Options dialog and go to the
corresponding Ruleset and select the "Character Encodings" page.
Try changing the "Fallback Character Encoding Options" to
"Use Named Character Encoding Below" and at the bottom, select "CP 1251"
(the exact content and spelling of the drop-down varies by platform).
And see if that helps.
Hi!
I have CP1251 files. Fallback Character Encoding Options is set to CP 1251.
I don't have any problems with the display of Russian characters. I cannot enter Russian characters. If I press a key with a Russian character, DiffMerge ignores it.
I can record a video. Do you need it?