How do I do the following using only regex? - c#

Say I have the following string:
[id={somecomplexuniquestring}test1],
[id={somecomplexuniquestring}test2],[id={somecomplexuniquestring}test3],
[id={somecomplexuniquestring}test4],[id={somecomplexuniquestring}test5],
[id={somecomplexuniquestring}test6],[id={somecomplexuniquestring}test7],
[id={somecomplexuniquestring}test8],[id={somecomplexuniquestring}test9]
Is there a way, using only regex, to get the following result: [id={somecomplexuniquestring}test6]?
The {somecomplexuniquestring} parts are unknown strings, so they cannot be used in the regex.
For example, the following will not work, because the match starts from the very first id: @"\[id=[\s\S]+?test6\]"

Is using RegEx the best solution? You have tagged C#, so would
variableWithString.Split(',').Any(x => x.Contains("test6"));
tell you whether a match exists, or
result = variableWithString.Split(',').Where(x => x.Contains("test6"));
give you the matching value(s) you are seeking?

Doesn't this work?
\[id={.*?}test6\]

This all depends on exactly what the limitations of somecomplexuniquestring are. For example, if you have a guarantee that it does not contain any [ or ] characters, you can use this simple one:
@"\[[^\[\]]*test6\]"
Similarly, if it could contain square brackets but no curly braces, you can do something similar:
@"\[id={[^{}]*}test6\]"
However, if you have no such guarantee, and there's some sort of escaping system for including {} or [] in that string, then you need to let us know how that works before this can be answered properly.
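A minimal sketch of the difference, assuming the unknown strings contain no square brackets. The sample input and its {a1}, {b2}, {c3} placeholders are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample input; the {..} parts stand in for the unknown strings.
string input = "[id={a1}test5],[id={b2}test6],[id={c3}test7]";

// The lazy pattern from the question starts at the first '[' and simply
// expands until it reaches "test6]", so it swallows the preceding item.
string lazy = Regex.Match(input, @"\[id=[\s\S]+?test6\]").Value;

// [^\[\]]* cannot cross a '[' or ']', so the match stays inside one item.
string bounded = Regex.Match(input, @"\[[^\[\]]*test6\]").Value;

Console.WriteLine(lazy);    // [id={a1}test5],[id={b2}test6]
Console.WriteLine(bounded); // [id={b2}test6]
```

The negated character class is what keeps the match from running across item boundaries; laziness alone only limits where the match ends, not where it starts.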

You can use this pattern:
@"\[[^]]*]"
If you want a specific test number you can do this:
@"\[id={[^}]*}test6]"

Related

Using Regex on encoded strings

I have the following regex:
@"{0}(.+?)(?:{1}(.{4}?))*(?:{2}(.+?))?{3}", "\\[\\[\\[", "\\|\\|\\|", "\\/\\/\\/", "\\]\\]\\]"
To find items wrapped in [[[something]]], [[[something///comment]]].
I am using this to parse something on a web response ...
The problem is that in my web response I have a few things encoded as follows:
%5B%5B%5BPedido%20de%20Informa%C3%A7%C3%A3o%5D%5D%5D
So I am not able to identify that it starts with [[[ and ends with ]]], along with the other items.
Is there a way to solve this on the regex side?
You can unescape this string with helper functions like:
Uri.UnescapeDataString("%5B%5B%5BPedido%20de%20Informa%C3%A7%C3%A3o%5D%5D%5D");
will produce:
"[[[Pedido de Informação]]]"
Note: There is also HttpUtility.UrlDecode, but it requires adding a reference to System.Web, which is not always wanted.
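As a rough sketch, decoding first and matching afterwards could look like this. The pattern here is a simplified stand-in for the full regex in the question (it only handles the [[[something]]] form), so treat it as an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

string encoded = "%5B%5B%5BPedido%20de%20Informa%C3%A7%C3%A3o%5D%5D%5D";

// Decode the percent-escapes (including the UTF-8 sequences) before matching.
string decoded = Uri.UnescapeDataString(encoded);

// Simplified pattern: three opening brackets, lazy content, three closing brackets.
Match match = Regex.Match(decoded, @"\[\[\[(.+?)\]\]\]");

Console.WriteLine(decoded);
Console.WriteLine(match.Groups[1].Value);
```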
If unescaping the string is not an option, you can use a Noncapturing Group (?:...) and an Alternation Construct | to allow %5B alternatively to [ (same for %5D and ]).
For example, \\[\\[\\[ could be replaced by (?:\\[\\[\\[|%5B%5B%5B). Adapting the complete regex is left as an exercise to the reader.
Note, however, that this will also match [[[...%5D%5D%5D, which might or might not be a problem in your case.
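A small sketch of that alternation idea, again using a simplified pattern rather than the full regex from the question. The variable names are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

// Each delimiter accepts either the literal brackets or their percent-encoded form.
string open = @"(?:\[\[\[|%5B%5B%5B)";
string close = @"(?:\]\]\]|%5D%5D%5D)";
Regex itemPattern = new Regex(open + "(.+?)" + close);

string plain = itemPattern.Match("[[[something]]]").Groups[1].Value;
string escaped = itemPattern
    .Match("%5B%5B%5BPedido%20de%20Informa%C3%A7%C3%A3o%5D%5D%5D")
    .Groups[1].Value;

// Mixed delimiters are accepted as well, as warned above.
bool mixed = itemPattern.IsMatch("[[[x%5D%5D%5D");

Console.WriteLine(plain);
Console.WriteLine(escaped);
Console.WriteLine(mixed);
```

Note that the captured content of the encoded item is still percent-encoded, so it would need unescaping afterwards anyway.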

Regex to check whether "and,or,not,and not" in a word?

I have a scenario where I have to check whether a word contains "and,or,not,and not", but the regex I have created fails. Can anybody provide me with the correct regex for this?
The regex I have created looks like this:
Regex objAlphaPattern = new Regex(@"^[not|and not|and|not]");
if(objAlphaPattern.IsMatch(searchTerm))
{
//// code
}
But it always returns true.
I have tried "Pen and Pencil" and "Pen Pencil", but both return true. Can anybody help by providing the correct regex?
You're starting with a begin anchor. If you want to check anywhere in the string, not only at the beginning, then you shouldn't have the ^.
Also, you are using [] when you should be using (). Actually in this case you don't even need ().
[] indicates a character class. You just don't need that.
Regex objAlphaPattern = new Regex(@"\b(and|not)\b");
if(objAlphaPattern.IsMatch(searchTerm))
{
//// code
}
That should do the job.
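To illustrate the difference between a character class and alternation, here is a small sketch. The test words are made up, and the patterns are written as verbatim strings (@"...") so that \b reaches the regex engine as a word boundary rather than as a backspace character:

```csharp
using System;
using System.Text.RegularExpressions;

// A character class matches ONE character out of the set n, o, t, |, a, d and space.
Regex charClass = new Regex(@"^[not|and not|and|not]");

// Alternation between whole words, delimited by word boundaries.
Regex words = new Regex(@"\b(and|not)\b");

// "today" contains neither word, but it starts with 't', which is in the class.
Console.WriteLine(charClass.IsMatch("today"));      // True
Console.WriteLine(words.IsMatch("today"));          // False
Console.WriteLine(words.IsMatch("Pen and Pencil")); // True
Console.WriteLine(words.IsMatch("Pen Pencil"));     // False
Console.WriteLine(words.IsMatch("Sandy"));          // False: "and" is inside a word

// In a regular (non-verbatim) string literal, "\b" is the backspace character.
Console.WriteLine("\b" == "\u0008");                // True
```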
I highly recommend The Regex Coach to help you build regex.
I also highly recommend http://www.regular-expressions.info/ as a reference.
EDIT:
I feel I should point out you don't really even need the object instance.
if(System.Text.RegularExpressions.Regex.IsMatch(searchTerm, @"\b(and|not)\b"))
{
//// code
}
You can just use the static method.
That's a very good point, Tim:
@"\band\b|\bnot\b"
Another very good point, stema:
@"\b(and|not)\b"
Try
(not)|(and not)|(and)
instead.
Your regular expression is wrong; it should be (and|not). There is no need to check for and not either, since the pattern already matches at the first and.
You can use an online tool to check your regular expressions; such as http://regexpal.com/?flags=&regex=(and|not)&input=Pen%20and%20Pencil

Does a .NET function exist that lets you pass in a string containing Regular Expressions and then return the possible matches?

Before I look into doing this myself, is there already a function out there that will do this:
I want to pass in a string containing text and RegEx markup and then it returns all possible matches it would look for in a string.
so the following passed to the method
abc|def|xyz
Would return 3 strings in an array or collection:
abc
def
xyz
Because that regex notation says to look for either abc, def or xyz.
I don't want this to search for the term in another string or anything like that, just return the possible matches it could make.
That's a simple example, anything that will do this for me, or shall I start writing the method myself?
With a simple regex as per your example it will work, but as soon as you start dealing with wildcards and repetition, it would have to generate an unbounded number of possible matches, so in some cases the enumeration would never terminate.
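For the simple case only, a sketch could look like the following. ListAlternatives is a hypothetical helper, not a .NET API; it only accepts patterns that are plain literals separated by |, and rejects anything containing other regex metacharacters (or whitespace, since Regex.Escape escapes that too):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the literal alternatives of a pattern like "abc|def|xyz".
static string[] ListAlternatives(string pattern)
{
    string[] parts = pattern.Split('|');

    // Regex.Escape leaves a string unchanged only if it has no metacharacters,
    // so this rejects wildcards, repetition, groups and similar constructs.
    if (parts.Any(p => p.Length == 0 || Regex.Escape(p) != p))
    {
        throw new NotSupportedException("Only plain literal alternatives are supported.");
    }

    return parts;
}

foreach (string alternative in ListAlternatives("abc|def|xyz"))
{
    Console.WriteLine(alternative);
}
```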
No, it does not :)

Conditional Regex Replace in C# without MatchEvaluator

So, I'm trying to make a program to rename some files. For the most part, I want them to look like this:
[Testing]StupidName - 2[720p].mkv
But I would like to be able to change the format, if so desired. If I use MatchEvaluators, I would have to recompile every time. That's why I don't want to use a MatchEvaluator.
The problem I have is that I don't know how, or whether it's possible, to tell Replace to include a given string only if a group was found. The only syntax for this I have ever seen was something like (?<group>:data), but I can't get this to work. If anyone has an idea, I'm all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace on the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want is for [] to be included only if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because I actually get to compile a function. But this would not be an easy solution for users of the program. The only other idea I have is to write my own Regex.Replace function that accepts special syntax.
It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = @"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
string[] replacePatterns =
{
"[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
"${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
"(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.
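One possible way to keep the empty brackets out of the result, which is not part of the answer above, is to move the brackets inside the optional group, so that an unmatched group contributes nothing to the replacement. The sketch below assumes a simplified variant of the third capture regex from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: simplified pattern, with the square brackets captured
// INSIDE the optional FanSub group instead of around it.
string pattern =
    @"^(?<FanSub>\[[^\]]+\])?(?<SeriesTitle>[\w. ]*?)[. _]*" +
    @"(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2})(?<Extension>\.[a-zA-Z]+)$";

// The replacement pattern no longer contains literal brackets;
// a group that did not participate in the match is replaced by an empty string.
string replacement = "${FanSub}${SeriesTitle} - ${EpisodeNumber}${Extension}";

string withoutTag = Regex.Replace("TestFile 101.mkv", pattern, replacement);
string withTag = Regex.Replace("[Testing]TestFile 101.mkv", pattern, replacement);

Console.WriteLine(withoutTag); // TestFile - 01.mkv
Console.WriteLine(withTag);    // [Testing]TestFile - 01.mkv
```

The trade-off is that the bracket style is then fixed by the capture pattern rather than by the replacement pattern.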

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification: the reason I need this is to allow me to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages, i.e.:
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
Change the code to have a readonly property with the failure index.
Make sure your code uses that Regex rather than Microsoft's.
I guess such an index would only have meaning in some simple cases, like the one in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what index are you talking about?
It will depend on the algorithm used for matching...
It could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
It is not possible to tell where a regex fails. As a result, you need to take a different approach: compare strings. Use a regex to remove all the parts that could vary, and compare the result with the string that you know does not change.
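Another way to approximate a failure position, different from the suggestions above, is to split the syntax into a sequence of sub-patterns and match them one after another. FirstFailureIndex below is a hypothetical helper, and the sketch assumes the sub-patterns never need to backtrack into each other:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches each sub-pattern in order, each one anchored
// at the current position with \G. Returns the index at which the first
// sub-pattern fails to match, or -1 if every sub-pattern matched.
static int FirstFailureIndex(string input, string[] parts)
{
    int position = 0;
    foreach (string part in parts)
    {
        Match m = new Regex(@"\G(?:" + part + ")").Match(input, position);
        if (!m.Success)
        {
            return position;
        }
        position += m.Length;
    }
    return -1;
}

// The example from the question: "abc" against "abd" fails at index 2.
Console.WriteLine(FirstFailureIndex("abd", new[] { "a", "b", "c" }));
Console.WriteLine(FirstFailureIndex("abc", new[] { "a", "b", "c" }));
```

Because each part is matched independently, the result is only meaningful for grammars where the parts are unambiguous, such as a token-by-token statement syntax.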
I ran into the same problem, came across this question, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
hope it helps
