Correct wrong occurrences in a string - c#

A buggy control produces text like this:
{{\field{\*\fldinst{HYPERLINK http://yandex.ru }}{\fldrslt{http://yandex.ru\ul0\cf0}}}}\f0\fs24
but correct version is:
{{\field{\*\fldinst{HYPERLINK http://yandex.ru }}{\fldrslt{\ul\cf1 http://yandex.ru}}}}\f0\fs24
I'm a real newbie with regex and other text tools, so I don't know how to replace all occurrences with the correct variant in a proper way. We can't rewrite the control logic right now; it contains a lot of WinAPI code.
Platform is .NET Framework 2.0

Well, basically the regular expression you've generated is OK, as it does its job and finds all occurrences like {http://yandex.com\ul0\cf0}.
If I have understood your goal correctly, the only transformation you need in each match is to turn {http://yandex.com\ul0\cf0} into {\ul\cf1 http://yandex.com}.
This can be done easily with the Regex.Replace overload that takes a MatchEvaluator as an argument.
For example, something like this (note that it is not the most elegant solution, rather a "quick and dirty" one):
var result = Regex.Replace(source_Text, regex_pattern,
    x => x.Groups[0].Value.Replace(@"\ul0\cf0", "").Replace("{", @"{\ul\cf1 "));
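The question does not show the pattern that was actually used, so here is a minimal self-contained sketch with an assumed one. The pattern, the group name url and the RtfHyperlinkFix class are all hypothetical. It uses a plain replacement string instead of a lambda, which also compiles with the C# 2.0 compiler.

```csharp
using System;
using System.Text.RegularExpressions;

static class RtfHyperlinkFix
{
    // Assumed pattern: "{" + text without braces or backslashes + "\ul0\cf0}".
    private const string Pattern = @"\{(?<url>[^{}\\]+)\\ul0\\cf0\}";

    // Moves the formatting in front of the text: {url\ul0\cf0} -> {\ul\cf1 url}
    public static string Fix(string rtf)
    {
        return Regex.Replace(rtf, Pattern, @"{\ul\cf1 ${url}}");
    }

    static void Main()
    {
        string broken = @"{{\field{\*\fldinst{HYPERLINK http://yandex.ru }}{\fldrslt{http://yandex.ru\ul0\cf0}}}}\f0\fs24";
        Console.WriteLine(Fix(broken));
    }
}
```

With the sample from the question, this should print the "correct version" line shown there. If the real URLs can contain braces or backslashes, the character class would need adjusting.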

Related

Equivalent of Substring as a RegularExpression

Ok, I need a general regular expression that will give me the x characters from a string starting at position y like the string's substring function:
input_str.Substring(y,x)
But as a C# regular expression.
Example:
1234567890 Substring(5,3) 678
I know you are thinking: why not just use the Substring function? The short answer is that this is passed as data to an existing function, and in this context it would be inelegant to create a whole separate data parsing mechanism. We'd like to get this working without changing the code.
I feel like this is really obvious--but I'm pretty inexperienced with regular expressions. Thanks in advance for any help.
.{y}(.{x}).* should do it, I think, then just pull out the capture group.
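A minimal sketch of that idea, with y and x substituted into the pattern. The RegexSubstring helper is hypothetical; the leading ^ and RegexOptions.Singleline are additions so that the match is pinned to the start of the input and . also matches newlines.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSubstring
{
    // Skips y characters, captures the next x characters in group 1.
    // Returns null when the input is shorter than y + x characters.
    public static string Extract(string input, int y, int x)
    {
        string pattern = "^.{" + y + "}(.{" + x + "})";
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Extract("1234567890", 5, 3)); // prints 678
    }
}
```

Unlike Substring, which throws when the range is out of bounds, the regex simply fails to match in that case.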

Can I add a regular expression into a .Net Assertion?

I'm trying to pull the page source from a set of pages and run an assertion on the results. This is a test that checks that we are crawling specific pages on our site. Sometimes the results come back with a different case for the URL string, and I'd like to account for that in the assertion where I check the page source. This is probably the wrong way to do it, but I was wondering if there is a way to use the .NET regex commands in the assertion text. I have this as an assertion:
Assert.IsTrue(driver.PageSource.Contains("/explore"));
But is there a way to be sure that I can capture explore, Explore or EXPLORE? I thought I could use (?i) here, but that doesn't seem to work. I'm more used to Perl and its regex capabilities, but with C# and .NET I'm a little lost on where I can and can't use the inline regex commands.
Anthony's answer is valid: you don't really need regex. But if you do want to use it, you can use
Regex.IsMatch(driver.PageSource, "/explore", RegexOptions.IgnoreCase)
You don't need a regular expression to perform a case-insensitive check. Use IndexOf and compare that the result is greater than -1. IndexOf has overloads that allow you to specify if casing matters. Something like
bool containsExplore = driver.PageSource.IndexOf("/explore", StringComparison.InvariantCultureIgnoreCase) > -1;
Assert.IsTrue(containsExplore);
Try:
Regex.Match("string", "regexp", RegexOptions.IgnoreCase).Success
How about using
StringAssert.Matches(string, regex);
In your case (assuming MSTest, where the second argument is a Regex instance), that would translate to
StringAssert.Matches(driver.PageSource, new Regex("/explore", RegexOptions.IgnoreCase));

RegEx replace with calculations?

Is it possible somehow to do a RegEx-replace with a calculation in the result? (in VS2010)
Such as:
Grid\.Row\=\"{[0-9]+}\"
to
Grid.Row="eval(int(\1) + 1)"
You can use a MatchEvaluator to achieve this, like
String s = Regex.Replace("1239", @"\d", m => (Int32.Parse(m.ToString()) + 1).ToString());
Output: 23410
Edit:
I just noticed... if you mean "using the VS2010 find-replace feature" and not "using C#", then the answer is "no", I'm afraid.
You could always use capturing to retrieve any values you need for your calculation and then perform a Regex replace with a new Regex that's constructed from your equation and any values you captured.
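For the C# route, the Grid.Row example from the question could be sketched as follows. The GridRowShift class and the sample markup are made up for illustration, and the pattern uses .NET syntax rather than the VS2010 find-dialog syntax from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class GridRowShift
{
    // Adds 1 to every Grid.Row="N" attribute value found in the text.
    public static string IncrementRows(string markup)
    {
        return Regex.Replace(markup, @"Grid\.Row=""([0-9]+)""",
            m => "Grid.Row=\"" + (int.Parse(m.Groups[1].Value) + 1) + "\"");
    }

    static void Main()
    {
        string markup = "<Button Grid.Row=\"0\" /><Label Grid.Row=\"9\" />";
        Console.WriteLine(IncrementRows(markup));
        // prints <Button Grid.Row="1" /><Label Grid.Row="10" />
    }
}
```

The character class [0-9] is used instead of \d because \d in .NET also matches non-ASCII digits by default.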
If the equation doesn't use anything from the input text, one RegEx would be sufficient. You'd simply construct it by concatenating the static portions together with the computed value(s).
Unfortunately, C# and .NET do not provide an eval method or equivalent. However, it is possible to either use a library for expression parsing (a quick google gave me this .NET Math Expression Parser) or write your own (which is actually pretty easy, check out the Shunting-yard Algorithm and Postfix Notation). Simply capture the group then output the group value to the library/method you have written.
Edit: I see now that you want this for the VS2010 program. This is unachievable unless you write your own VS extension. You could always write a program to do the search and replace, feed your code into it, and then replace the original code with its output.

Conditional Regex Replace in C# without MatchEvaluator

So, I'm trying to make a program to rename some files. For the most part, I want them to look like this:
[Testing]StupidName - 2[720p].mkv
But I would like to be able to change the format if desired. If I use MatchEvaluators, I would have to recompile every time. That's why I don't want to use a MatchEvaluator.
The problem I have is that I don't know how, or whether it's possible, to tell Replace "if a group was found, include this string". The only syntax for this I have ever seen was something like (?<group>:data), but I can't get this to work. If anyone has an idea, I'm all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace, the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want to do is make it so that [] is only included if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because I actually get to compile a function. But this would not be an easy solution for users of the program. The only other idea I have is to write my own Regex.Replace function that accepts special syntax.
It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = @"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
// Each replacement pattern rearranges the same named groups differently.
string[] replacePatterns =
{
    "[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
    "${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
    "(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
    Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.
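One possible way to get the conditional [] from the question without a MatchEvaluator is to capture the brackets inside the optional groups themselves. A group that did not participate in the match is replaced by an empty string, so the brackets disappear along with it. This is a sketch under assumptions: the pattern below is a heavily simplified, hypothetical stand-in for the question's first regex, and the ConditionalBrackets class is made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class ConditionalBrackets
{
    // The optional FanSub and MiscInfo groups include their own [ ] characters.
    private const string Pattern =
        @"^(?<FanSub>\[[^\]]+\])?(?<SeriesTitle>[\w. ]+?)[. _]*-[. _]*" +
        @"(?<EpisodeNumber>\d+)[. _]*(?<MiscInfo>\[[^\]]*\])?(?<Extension>\.[a-zA-Z]+)$";

    // The replacement string stays plain data, so users can edit it without recompiling.
    private const string Replacement =
        "${FanSub}${SeriesTitle} - ${EpisodeNumber}${MiscInfo}${Extension}";

    public static string Rename(string fileName)
    {
        return Regex.Replace(fileName, Pattern, Replacement);
    }

    static void Main()
    {
        Console.WriteLine(Rename("[Testing]StupidName - 2[720p].mkv")); // prints [Testing]StupidName - 2[720p].mkv
        Console.WriteLine(Rename("StupidName - 2.mkv"));                // prints StupidName - 2.mkv
    }
}
```

The trade-off is that the bracket style is fixed by the capture pattern rather than by the replacement pattern, so switching from [] to () would mean changing the regex.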

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification: the reason I need this is to simplify the parsing component of my application. The application is an Assembly language teaching tool that lets students write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages, e.g.
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
Change the code to have a readonly property with the failure index.
Make sure your code uses that Regex rather than Microsoft's.
I guess such an index would only have meaning in some simple cases, like the one in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", which index would you expect to get?
It will depend on the algorithm used for matching...
It could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
It is not possible to tell where a regex fails, so you need to take a different approach: compare strings. Use a regex to remove all the parts that could vary, and compare the result with the string that you know does not change.
I ran into the same problem, came across your question, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
Hope it helps.
