Regular expression of simple boolean expression with parenthesis - c#

I'm trying to write a regular expression that matches only the following pattern:
WordWihoutNumbers.WordWihoutNumbers='value'
and patterns with multiple sub expressions like:
WordWihoutNumbers.WordWihoutNumbers='value' OR WordWihoutNumbers.WordWihoutNumbers='value2' AND WordWihoutNumbers.WordWihoutNumbers='value3'
WordWihoutNumbers must be at least two characters and without digits.
For example, these are valid strings:
Hardware.Make=’Lenovo’
Hardware.Make=’Lenovo’ OR User.Sitecode=’PRC’
and these are not:
Hardware.Make=’Lenovo’ OR => because there is nothing after the OR operator
Hardware.Make=’Lenovo => ' missing
Hardware Make=’Lenovo => . missing
Hardware.Make’Lenovo' => = missing
I used RegexBuddy to write the following Regex string:
(?i)(\s)*[a-z][a-z]+(.[a-z][a-z]+)(\s)*=(\s)*'[a-z0-9]+'(\s)*((\s)*(AND|OR)(\s)*[a-z][a-z]+(.[a-z][a-z]+)(\s)*=(\s)*'[a-z0-9]+')*
When I tested it in RegexBuddy it worked fine, but when I use it inside my C# code I always get a 'false' result.
What am I doing wrong?
This is what I did in my C# code:
string expression = "Hardware.Make=’Lenovo’ OR User.Sitecode=’PRC’";
Regex expressionFormat = new Regex(@"(?i)(\s)*[a-z][a-z]+(.[a-z][a-z]+)(\s)*=(\s)*'[a-z0-9]+'(\s)*((\s)*(AND|OR)(\s)*[a-z][a-z]+(.[a-z][a-z]+)(\s)*=(\s)*'[a-z0-9]+')*");
bool result = expressionFormat.IsMatch(expression);
and result is always false.
UPDATE: thanks to @nhahtdh for his comment: my test input used ’ (a typographic apostrophe) instead of '.
I also need to add parentheses validation to this expression, for example:
((WordWihoutNumbers.WordWihoutNumbers='value' OR WordWihoutNumbers.WordWihoutNumbers='value2') AND WordWihoutNumbers.WordWihoutNumbers='value3') is valid but
)WordWihoutNumbers.WordWihoutNumbers='value' OR WordWihoutNumbers.WordWihoutNumbers='value2') AND WordWihoutNumbers.WordWihoutNumbers='value3') is invalid.
Is it possible to implement this using Regex? Do you have any ideas?

Thanks to @nhahtdh, who found my issue.
The problem was that ’ and ' are different code points, and that is why my regular expression didn't match (it was an input problem, not a regex problem).
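The points above can be put together in a rough sketch. This is not the asker's final code: the class and method names are made up for illustration, the dot is escaped and the pattern is anchored (two changes the original pattern did not have), and the parentheses check relies on .NET balancing groups, which is only one possible way to handle nesting.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative, not from the question.
public static class BooleanExpressionValidator
{
    // One comparison such as Hardware.Make='Lenovo'.
    // The dot is escaped and each word needs at least two letters.
    private const string Term = @"[a-z]{2,}\.[a-z]{2,}\s*=\s*'[a-z0-9]+'";

    // Flat form: Term ((AND|OR) Term)*, anchored so the whole input must match.
    private static readonly Regex Flat = new Regex(
        @"^\s*" + Term + @"(?:\s*(?:AND|OR)\s*" + Term + @")*\s*$",
        RegexOptions.IgnoreCase);

    // Parenthesized form: "(" may only precede a term and ")" may only follow one.
    // Each "(" is pushed on the "open" stack, each ")" pops it, and the final
    // conditional fails the match if any "(" is left unclosed.
    private static readonly Regex Nested = new Regex(
        @"^\s*(?:(?<open>\()\s*)*" + Term + @"\s*(?:(?<-open>\))\s*)*" +
        @"(?:(?:AND|OR)\s*(?:(?<open>\()\s*)*" + Term + @"\s*(?:(?<-open>\))\s*)*)*" +
        @"(?(open)(?!))$",
        RegexOptions.IgnoreCase);

    // Maps the typographic quotes U+2018 and U+2019 to the ASCII apostrophe.
    public static string NormalizeQuotes(string input) =>
        input.Replace('\u2018', '\'').Replace('\u2019', '\'');

    public static bool IsValidFlat(string input) => Flat.IsMatch(input);

    public static bool IsValidNested(string input) => Nested.IsMatch(input);
}
```

With this sketch, IsValidFlat rejects an input typed with ’ and accepts it once NormalizeQuotes has been applied. Whether to normalize the input or to reject it is a design choice the question does not settle.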

Related

Dynamic guid extraction using Regex

Hi all, I need to extract the GUIDs from the following string:
<PageFieldFieldValue:FieldValue FieldName='fa564e0f-0c70-4ab9-b863-0177e6ddd247' runat='server'></PageFieldFieldValue:FieldValue>
<PageFieldRichImageField:RichImageField FieldName="3de94b06-4120-41a5-b907-88773e493458" runat="server"></PageFieldRichImageField:RichImageField>
What I need to get is "fa564e0f-0c70-4ab9-b863-0177e6ddd247" and "3de94b06-4120-41a5-b907-88773e493458" in this case. However, these GUIDs are dynamic and will change every time, and there are many more GUIDs in the string I have. I need to get all of them so that I can add them to a collection.
Note: The string is actually the content of an aspx page. All nodes are different but have the same property, "FieldName", which is what I need to get.
I went through the link C# RegEx string extraction and constructed the regex in the same way. Here is what I did:
string s = @"<PageFieldFieldValue:FieldValue FieldName='fa564e0f-0c70-4ab9-b863-0177e6ddd247' runat='server'>
</PageFieldFieldValue:FieldValue>";
Regex reg = new Regex(@"FieldName=(?<ReferenceId>{36})");
Match match = reg.Match(s);
string guid = match.Groups["ReferenceId"].Value;
However, this didn't work for me. I get the exception "parsing "FieldName=(?<ReferenceId>{35})" - Quantifier {x,y} following nothing." while creating the Regex object "reg".
If I don't use {36}, which is supposed to be the length of a GUID:
Regex reg = new Regex(@"FieldName=(?<ReferenceId>)");
I don't get any exception, but I don't get the desired result either. match.Groups["ReferenceId"].Value returns an empty string.
Try something like this:
(?<=FieldName=['"])[a-f\d]{8}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{12}(?=['"])
Explanation
(?<=FieldName=['"]) preceded by FieldName= and " or '
[a-f\d]{8}-[a-f\d][...] the GUID itself (the only part that is actually matched)
(?=['"]) followed by " or '
See this in action at Regex101
The issue you are having is that you are providing the quantifier {36} but not telling it what to quantify: a quantifier needs some character-matching expression right before it. For example, I added a '.' before {36} in your pattern (meaning "match any 36 characters") and it seems to work. I also added the missing apostrophe after "FieldName=":
Regex reg = new Regex(@"FieldName='(?<ReferenceId>.{36})");
Working example: https://regex101.com/r/1tbien/1
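Since the question asks for all GUIDs in a collection, here is a minimal sketch that combines the two answers above using Regex.Matches. The class and method names are invented for illustration, and the pattern assumes the GUIDs are always in the 8-4-4-4-12 hex form shown in the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative.
public static class FieldNameGuids
{
    // Accepts either quote style around the value and captures only the GUID.
    // In a verbatim string a double quote is written as "".
    private static readonly Regex FieldNameGuid = new Regex(
        @"FieldName=['""](?<ReferenceId>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})['""]",
        RegexOptions.IgnoreCase);

    // Returns every FieldName GUID found in the page content, in document order.
    public static List<string> Extract(string pageContent)
    {
        var guids = new List<string>();
        foreach (Match match in FieldNameGuid.Matches(pageContent))
        {
            guids.Add(match.Groups["ReferenceId"].Value);
        }
        return guids;
    }
}
```

Calling FieldNameGuids.Extract on the two sample tags from the question should give a list with both GUIDs, without the surrounding quotes.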

How do I do the following using only regex?

Say I have the following string
[id={somecomplexuniquestring}test1],
[id={somecomplexuniquestring}test2],[id={somecomplexuniquestring}test3],
[id={somecomplexuniquestring}test4],[id={somecomplexuniquestring}test5],
[id={somecomplexuniquestring}test6],[id={somecomplexuniquestring}test7],
[id={somecomplexuniquestring}test8],[id={somecomplexuniquestring}test9]
Is there a way, using only regex, to get the following result: [id={somecomplexuniquestring}test6]
The {somecomplexuniquestring} parts are unknown strings which cannot be used in the regex.
For example, the following will not work @"[id=[\s\S]+?test6]" as it starts from the very first id.
Is using RegEx the best solution? You have tagged C#, so would
variableWithString.Split(',').Any(x => x.Contains("test6"));
tell you whether a match exists, or
result = variableWithString.Split(',').Where(x => x.Contains("test6"));
give you the match value you are seeking?
Doesn't this work?
\[id={.*?}test6\]
This all depends on exactly what the limitations of somecomplexuniquestring are. For example, if you have a guarantee that they do not contain any [ or ] characters, you can use this simple one:
@"\[[^\[\]]*test6\]"
Similarly, if it could contain square brackets but no curly braces, you can do something similar:
@"\[id={[^{}]*}test6\]"
HOWEVER, if you have no such guarantee, and there's some sort of escaping system for including {} or [] in that string, then you need to let us know how that works to properly answer.
You can use this pattern:
@"\[[^]]*]"
If you want a specific test number you can do this:
@"\[id={[^}]*}test6]"
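As a rough illustration of the "no curly braces inside the unknown string" variant, the sketch below wraps it in a small method. The class name, method name and the empty-string return for "no match" are assumptions made for this example, and it only works if the unknown strings never contain { or }.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative.
public static class BracketEntryFinder
{
    // Returns the whole [id={...}suffix] entry, or an empty string when none matches.
    public static string Find(string input, string suffix)
    {
        // [^{}]* cannot run past the closing brace, so the match cannot
        // start at an earlier entry and stretch forward to the wanted suffix.
        string pattern = @"\[id=\{[^{}]*\}" + Regex.Escape(suffix) + @"\]";
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : string.Empty;
    }
}
```

For example, BracketEntryFinder.Find(text, "test6") should return only the test6 entry, because the negated class stops the match from starting at the first id.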

Regex in C# - remove quotes and escaped quotes from a value after another value

I am using HighCharts and am generating script from C# and there's an unfortunate thing where they use inline functions for formatters and events. Unfortunately, I can't output JSON like that from any serializer I know of. In other words, they want something like this:
"labels":{"formatter": function() { return Highcharts.numberFormat(this.value, 0); }}
And with my serializers available to me, I can only get here:
"labels":{"formatter":"function() { return Highcharts.numberFormat(this.value, 0); }"}
These are used for click events as well as formatters, and I absolutely need them.
So I'm thinking regex, but it's been years and years and also I was never a regex wizard.
What kind of Regex replace can I use on the final serialized string to replace any quoted value that starts with function() with the unquoted version of itself? Also, the function itself may have " in it, in which case the quoted string might have \" in it, which would need to also be replaced back down to ".
I'm assuming I can use a variant of the first answer here:
Finding quoted strings with escaped quotes in C# using a regular expression
but I can't seem to make it happen. Please help me for the love of god.
I've put more sweat into this, and I've come up with
serialized = Regex.Replace(serialized, @"""function\(\)[^""\\]*(?:\\.[^""\\]*)*""", "function()$1");
However, my end result is always:
formatter:function()$1
This tells me I'm matching the proper stuff, but my capture isn't working right. Now I feel like I'm probably being an idiot with some C# specific regex situation.
Update: Yes, I was being an idiot. I didn't have a capture around what I really wanted.
serialized = Regex.Replace(serialized, @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1");
that gets my match, but in a case like this:
"formatter":"function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }"
it returns:
"formatter":function() { alert(\"hi!\"); return Highcharts.numberFormat(this.value, 0); }
and I need to get those nasty backslashes out of there. Now I think I'm truly stuck.
Regexp for match
"function\(\) (?<code>.*)"
Replace expression
function() ${code}
Try this : http://regexr.com?30jpf
What it does :
Finds double quotes JUST before a function declaration and immediately after it.
Regex :
(")(?=function()).+(?<=\})(")
Replace groups 1 & 3 with nothing :
3 capturing groups:
group 1: (")
group 2: ()
group 3: (")
string serialized = JsonSerializer.Serialize(chartDefinition);
serialized = Regex.Replace(serialized, @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""", "function()$1").Replace("\\\"", "\"");
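One caveat with the chained .Replace("\\\"", "\"") is that it unescapes every \" in the serialized string, including ones inside ordinary string values. A possible alternative, sketched here with made-up class and method names, is to pass a MatchEvaluator so that only the text inside a matched function is unescaped. It keeps the same pattern as above and assumes the function always starts with exactly function().

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative.
public static class FunctionUnquoter
{
    // Matches a quoted JSON string whose content starts with function(),
    // allowing backslash-escaped characters inside it.
    private static readonly Regex QuotedFunction = new Regex(
        @"""(function\(\)[^""\\]*(?:\\.[^""\\]*)*)""");

    // Drops the surrounding quotes and turns \" back into " inside the function only.
    public static string Unquote(string serialized) =>
        QuotedFunction.Replace(serialized, m => m.Groups[1].Value.Replace("\\\"", "\""));
}
```

Other escape sequences the serializer may emit inside the function body (for example \\ or \n) are left untouched by this sketch and would need their own handling.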

Regex Regular expression in c#

We have implemented the invocation of the Brill tagger from our C# code. We just need to know the correct regular expression for removing everything from a string except a-z, A-Z, full stops and commas. We tried [^a-zA-Z\.\,] in an online regular expression tester and it gives the correct result, but when implemented in C# it does not work properly. We also tried several other combinations, but we are not getting the correct result.
This is the format in which we are writing:
strFileContent = Regex.Replace(strFileContent, @"[^a-zA-Z\.\,]", "");
but we are not getting the desired output. What is wrong?
Regex.Replace(yourString, @"[^a-z\.\,]", string.Empty, RegexOptions.IgnoreCase)
EDIT: I can't see anything wrong with what you are doing, my answer is exactly the same. I tested both in LINQPad and they both return the same result.
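For reference, a small sketch of what this replacement does. The class and method names are invented for the example. Note that the character class also removes whitespace and digits. Whether that is the "undesired output" is only a guess, since the question does not show the actual result.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative.
public static class LetterFilter
{
    // Inside a character class '.' and ',' are literal, so no escaping is needed.
    private static readonly Regex Unwanted = new Regex(@"[^a-zA-Z.,]");

    // Removes every character that is not an ASCII letter, a full stop or a comma.
    public static string KeepLettersDotsCommas(string text) => Unwanted.Replace(text, "");
}
```

If words should stay separated, adding \s to the class (that is, [^a-zA-Z.,\s]) would keep the whitespace as well.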

Regex to check whether "and,or,not,and not" in a word?

I have a scenario where I have to check whether a string contains one of the words "and, or, not, and not", but the regex I have created fails. Can anybody provide the correct regex for this?
The regex I have created looks like this:
Regex objAlphaPattern = new Regex(@"^[not|and not|and|not]");
if(objAlphaPattern.IsMatch(searchTerm))
{
//// code
}
But it always returns true.
I have tried the strings "Pen and Pencil" and "Pen Pencil", but both return true. Can anybody help by providing the correct regex?
You're starting with a begin anchor. If you don't want to only check if it happens at the beginning of the string then you shouldn't have the ^.
Also, you are using [] when you should be using (). Actually in this case you don't even need ().
[] indicates a character class. You just don't need that.
Regex objAlphaPattern = new Regex(@"\b(and|not)\b");
if(objAlphaPattern.IsMatch(searchTerm))
{
//// code
}
That should do the job.
I highly recommend The Regex Coach to help you build regex.
I also highly recommend http://www.regular-expressions.info/ as a reference.
EDIT:
I feel I should point out you don't really even need the object instance.
if(System.Text.RegularExpressions.Regex.IsMatch(searchTerm, @"\b(and|not)\b"))
{
//// code
}
You can just use the static method.
That's a very good point Tim:
@"\band\b|\bnot\b"
Another very good point stema:
@"\b(and|not)\b"
Try
(not)|(and not)|(and)
instead.
Your regular expression is wrong; it should be (and|not). There is no need to check for "and not" either, since any string containing "and not" already matches on the first "and".
You can use an online tool to check your regular expressions; such as http://regexpal.com/?flags=&regex=(and|not)&input=Pen%20and%20Pencil
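Putting the answers together, a minimal sketch might look like this. The class and method names are hypothetical. "or" is included because the question title lists it even though the answers above leave it out, and IgnoreCase is an extra assumption. The verbatim string (@"...") matters: in a regular C# string literal "\b" is a backspace character, not a word boundary.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative.
public static class OperatorWordCheck
{
    // \b word boundaries keep "and" from matching inside words like "Sandy".
    private static readonly Regex OperatorWord = new Regex(
        @"\b(?:and|or|not)\b", RegexOptions.IgnoreCase);

    // True when the search term contains and, or, or not as a whole word.
    public static bool ContainsOperator(string searchTerm) => OperatorWord.IsMatch(searchTerm);
}
```

With this, "Pen and Pencil" is reported as containing an operator, while "Pen Pencil" is not.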
