I am using DiffMerge and have created a custom ruleset for "Content Handling" to try and get DiffMerge to ignore certain differences on each line being compared. My goal is to ignore the line numbers present on each line (the first 8 bytes of each line), i.e., classify these differences as "unimportant."
The screenshot labeled "default_ruleset" uses the default DiffMerge ruleset, and it comes out as I expect.
The screenshot labeled "my_ruleset" uses a custom ruleset, and the result differs from what I expect. (I expect the first 8 bytes to be black, and the values after each equal sign to be red.) The custom ruleset is:
start pattern=^
end pattern=\d{8}
I would expect that the start pattern of ^ would match the "start of line" and then the end pattern of \d{8} would match the eight digits in positions 1 through 8, and that after the end pattern, the highlighting of differences would continue again.
I have my DiffMerge options set to show "important" differences as red text and "unimportant" differences as black text. I think the regex I have supplied is fairly standard, as regexes go, so I'm confused about why it isn't working.
Any suggestions? Thanks.
What regular expression (regex) features are supported?
Moderator: SourceGear
-
- Posts: 6
- Joined: Mon Jan 28, 2008 11:07 pm
- Attachments
-
- Default_ruleset
- default_ruleset.JPG (38.37 KiB) Viewed 7723 times
-
- My_ruleset
- my_ruleset.JPG (39.34 KiB) Viewed 7723 times
-
- Posts: 534
- Joined: Tue Jun 05, 2007 11:37 am
- Location: SourceGear
- Contact:
There's a problem with doing this.
Your understanding of regexes is correct, but there's a problem
with doing this.
I'm using the regexes to search within each line for patterns such
as quotes and /* */ sequences, alternating the tagging of
context as "important" and "unimportant". To handle things
like multiple quoted strings on a single line (and sequences spanning
multiple lines) consistently, I need to (effectively)
treat the doc as one long line and alternate searches for the start
regex and the end regex.
So, putting a simple ^ as the start pattern causes the second
match to start immediately after the end of the first, rather than
at the start of the next line. (Don't feel bad; I had to track this
down using the debugger.)
I should put a warning in the program that using ^ isn't going to
produce the intended result, or update the alternating search
code to handle a leading ^ (like the ends-at-eol handling).
I should also point out that even if we do get that behavior working
as expected, DiffMerge will still use the data in the unimportant
columns when matching up lines. If what you really want is a way
to compare two source files whose lines have been renumbered, they
aren't going to line up vertically as expected. Completely ignoring
various columns is a feature that has been requested.
Sorry,
jeff
-
- Posts: 6
- Joined: Mon Jan 28, 2008 11:07 pm
Thanks for the explanation, Jeff.
To help me understand, does your algorithm work like what I have below or differently?
The file content is:
11111111 a = aaaaaa;\n22222222 b = bbbbbb;\n33333333 c = cccccc;\n
Search patterns are:
start=^ end=\d{8}
The numbers in the "search_sequence" image mean:
1=start and end of "start pattern"
2=start of "end pattern"
3=end of "end pattern"
Thanks again for your help.
- Attachments
-
- search_sequence.JPG (62.97 KiB) Viewed 7689 times
-
- Posts: 534
- Joined: Tue Jun 05, 2007 11:37 am
- Location: SourceGear
- Contact:
It's a little more involved.
It's a little more involved. Your start regex will match the "left edge"
of the 11111111 and the end regex will match the 11111111. Then the
start pattern will match the "left edge" of the space following the
11111111. Then the end pattern will match the 22222222, or the EOL
if you have that set. And so on.
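The sequence described above can be sketched in a few lines of Python. This is only a model of the alternating start/end search as explained in this thread, not DiffMerge's actual code. The function name `unimportant_spans` is made up for illustration, ends-at-eol is assumed to be off, and slicing the text at each restart point is an assumption chosen so that ^ matches at the left edge of wherever the previous match ended.

```python
import re

def unimportant_spans(text, start_pat, end_pat):
    """Model of the alternating search: return (begin, end) offsets
    of the spans that would be tagged "unimportant"."""
    start_re = re.compile(start_pat)
    end_re = re.compile(end_pat)
    spans = []
    pos = 0
    while pos < len(text):
        # Searching a slice means ^ matches at the restart point,
        # not only at the true beginning of a line (assumption).
        s = start_re.search(text[pos:])
        if s is None:
            break
        begin = pos + s.start()
        after_start = pos + s.end()
        e = end_re.search(text[after_start:])
        if e is None:
            # No end match: the span runs to the end of the document.
            spans.append((begin, len(text)))
            break
        finish = after_start + e.end()
        spans.append((begin, finish))
        # Always move forward, even if both matches were zero-width.
        pos = max(finish, pos + 1)
    return spans

doc = "11111111 a = aaaaaa;\n22222222 b = bbbbbb;\n33333333 c = cccccc;\n"
for begin, finish in unimportant_spans(doc, r"^", r"\d{8}"):
    print(begin, finish, repr(doc[begin:finish]))
```

Under this model the spans come out back to back: the first covers only the 11111111, the second runs from the space after it through the 22222222, and so on, so nothing between the line numbers is left as "important".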
WAIT! I just figured out how to do it!
[1] Create a "Literal" (Important) with start regex ^[^\d], no end regex,
and ends-at-eol set.
[2] Create a "Comment" (Unimportant) with start regex ^ and end
regex \d{8}. (ends-at-eol doesn't matter here, but I'd set it just in case.)
Make sure that [1] comes first in the list in the ruleset.
This worked on my simple 3-line file, as in your initial example.
It assumes that all lines in the file begin with line numbers.
It'll silently hide any lines that don't begin with line numbers, so
be careful.
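To see why the order of the two rules matters, here is a small Python check of which start pattern fires at each restart point. It is a sketch under two assumptions that this thread does not confirm in detail: that the first rule in the list whose start regex matches at the restart point wins, and that ^ matches at the restart point. The names `RULES` and `first_matching_rule` are hypothetical.

```python
import re

# Rules in ruleset order; the first one whose start regex matches wins
# (an assumption about how the list order is used).
RULES = [
    ("Literal (important)", re.compile(r"^[^\d]")),
    ("Comment (unimportant)", re.compile(r"^")),
]

def first_matching_rule(remainder):
    """Return the name of the first rule whose start regex matches
    at the left edge of the remaining text, or None."""
    for name, start_re in RULES:
        if start_re.search(remainder):
            return name
    return None

line = "11111111 a = aaaaaa;"
print(first_matching_rule(line))      # at the true start of the line
print(first_matching_rule(line[8:]))  # just after the 8-digit line number
```

At the true start of the line the Literal rule cannot match because the first character is a digit, so the Comment rule takes the line number. Just after the line number the next character is a space, so the Literal rule matches first and, with ends-at-eol, would claim the rest of the line.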
hope this helps,
jeff
-
- Posts: 6
- Joined: Mon Jan 28, 2008 11:07 pm