getting the correct regex to print out in c# - c#

Below is a regex statement I have been working on for quite sometime:
Match parsedRequestData = Regex.Match(requestData, #"^.*\[(.*)\]$");
What this is supposed to be doing is taking the email out of the email below:
2.3|[0246303#up.com]
For clarification, this email comes from a table in SQL Server. There are many emails that are formatted like this in there and the regex is supposed to be getting all of that from inside the brackets. However, it is matching the entirety of this line instead of whats inside of it. So my question is, is there something wrong with my regex statement or do I have something in my code I need to add?

Your regex is storing the email address in capture group 1. Try referencing group 1 like this:
parsedRequestData.Groups[1];
Code Sample:
string requestData = "2.3|[0246303#up.com]";
Match parsedRequestData = Regex.Match(requestData, #"^.*\[(.*)\]$");
if (parsedRequestData.Success)
{
Console.WriteLine(parsedRequestData.Groups[1]);
}
Results:
0246303#up.com

Your regex is OK. All you need is to use the Group[1]
var email = Regex.Match("2.3|[0246303#up.com]", #"^.*\[(.*)\]$").Groups[1].Value;

However, it is matching the entirety of this line instead of whats inside of it.
Unless one uses named match captures, the match capture groups are indexed.
Match.Groups[0].Value is the whole match; it shows all the match captures and all the grouped matched text.
Match.Groups[{1-N}].Value is the match captures in the order of specification in the pattern for anything in a ( ) parenthesis set(s). If there is only one ( ) there will be two indexed groups; 0 as mentioned above, and 1 of the items specified to be captured to N.
You only have one ( ) set so the data you want is found in match capture group 1. Group 0 has the non match capture items along with the match capture data.
If one names the match capture such as (?<MyNameHere> ) one can also access the match via Match.Groups["MyNameHere"].Value.
Suggestion on your pattern away from the answer
Usage of * (zero or more) in patterns can be problematic in that it can significantly increase the time of the parser takes due to backtracking false scenarios.
If one knows there is text to be found, don't tell the parser zero items may happen when that is impossible, change it to + one or more. That slight change can greatly affect the parsing operations, both in time and operations.
Change ^.*\[(.*)\]$ to ^.+\[(.+)\]$.
But to even increase the efficiency of the pattern, focus on the knowns of the characters [ and ] as anchors.
Pattern Restructure To Use Anchors
^[^[]+\[([^\]]+)[\s\]]+$
Why is this pattern better? Because we will look for "[" and "]" as anchors.
Let us break it down
^ - Beginning of the pattern (a hard anchor)
[^ ]+ This is a set notation where the ^ says NOT.
[^\[]+ So we want to capture all text + (one or more) that is NOT a [. This tells the pattern to match up to our anchor [ in the text. Note that we don't have to escape it for regex parser treats all characters in a set [ ] as a literal so [^[] is valid. (To be clear this is a match but don't capture text anchor so we will not find this text in an index above the 0 index; only in 0).
\[ Our literal anchor the "[" character.
([^\]]+) This is our match capture which says match this set where any character is valid but not an "]". Here we have to escape the ] because otherwise it would signify the end of our set.
[\s\]]+ we know the end of our text there will be spaces and the "]" character, so let us match (but not to capture) any combination of spaces and a ] before the end.
$ our final anchor, the end of the file/buffer indicator (or line if the right parser rule is set).

Related

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: #"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute Specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression..
Which means. I need the "abcdef" to match, and ".abcdefg" to not match. Basically negate the whole expression I have above.
You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
C# declaration of the last pattern will look like
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttrubute:
[RegularExpression(
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]
Your regex is an alternation which matches 1 character out of 3 character classes, the first consisting of more than 1 characters, the second a dot at the start of the string and the third a dot at the end of the string.
It works fine because it does match one of the alternations, only not the whole string you want to match.
You could use 3 alternations where the first matches a dot followed by repeating the character class until the end of the string, the second the other way around but this time the dot is at the end of the string.
Or the third using a positive lookahead asserting that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$
Regex demo

C# Regular Expression: Search the first 3 letters of each name

Does anyone know how to say I can get a regex (C#) search of the first 3 letters of a full name?
Without the use of (.*)
I used (.**)but it scrolls the text far beyond the requested name, or
if it finds the first condition and after 100 words find the second condition he return a text that is not the look, so I have to limit in number of words.
Example: \s*(?:\s+\S+){0,2}\s*
I would like to ignore names with less than 3 characters if they exist in name.
Search any name that contains the first 3 characters that start with:
'Mar Jac Rey' (regex that performs search)
Should match:
Marck Jacobs L. S. Reynolds
Marcus Jacobine Reys
Maroon Jacqueline by Reyils
Can anyone help me?
The zero or more quantifier (*) is 'greedy' by default—that is, it will consume as many characters as possible in order to finding the remainder of the pattern. This is why Mar.*Jac will match the first Mar in the input and the last Jac and everything in between.
One potential solution is just to make your pattern 'non-greedy' (*?). This will make it consume as few characters as possible in order to match the remainder of the pattern.
Mar.*?Jac.*?Rey
However, this is not a great solution because it would still match the various name parts regardless of what other text appears in between—e.g. Marcus Jacobine Should Not Match Reys would be a valid match.
To allow only whitespace or at most 2 consecutive non-whitespace characters to appear between each name part, you'd have to get more fancy:
\bMar\w*(\s+\S{0,2})*\s+Jac\w*(\s+\S{0,2})*\s+Rey\w*
The pattern (\s+\S{0,2})*\s+ will match any number of non-whitespace characters containing at most two characters, each surrounded by whitespace. The \w* after each name part ensures that the entire name is included in that part of the match (you might want to use \S* instead here, but that's not entirely clear from your question). And I threw in a word boundary (\b) at the beginning to ensure that the match does not start in the middle of a 'word' (e.g. OMar would not match).
I think what you want is this regular expression to check if it is true and is case insensitive
#"^[Mar|Jac|Rey]{3}"
Less specific:
#"^[\w]{3}"
If you want to capture the first three letters of every words of at least three characters words you could use something like :
((?<name>[\w]{3})\w+)+
And enable ExplicitCapture when initializing your Regex.
It will return you a serie of Match named "name", each one of them is a result.
Code sample :
Regex regex = new Regex(#"((?<name>[\w]{3})\w+)+", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
var match = regex.Matches("Marck Jacobs L. S. Reynolds");
If you want capture also 3 characters words, you can replace the last "\w" by a space. In this case think to handle the last word of the phrase.

My regular expression isn't returning what I need

I have a block of text as such.
google.sbox.p50 && google.sbox.p50(["how to",[["how to tie a tie",0],["how to train your dragon 2 trailer",0],["how to do the cup song",0],["how to get a six pack in 3 minutes",0],["how to make a paper gun that shoots",0],["how to basic",0],["how to love lil wayne",0],["how to sing like your favorite artist",0],["how to be a heartbreaker marina and the diamonds",0],["how to tame a horse in minecraft",0]],{"q":"XJW--0IKH6sqOp0ME-x5B7b_5wY","j":"5","k":1}])
Using \\[([^]]+)\\] I am able to get everything I need, but with a little extra that I don't. I do not need the ["how to",[[. I only need the blocks that are formatted like,
["how to tie a tie",0]
Can someone please help me modify my expression to only get what I need? I've been at it for hours and I can't grasp the idea of RegEx.
Put both the opening and closing square brackets in the negated character class?
\\[([^][]+)\\]
\\[ matches a literal [
\\] matches a literal ]
[^][] is a negated class, which for instance matches any character except ][. It might be a little difficult to see it, but it's equivalent to [^\\]\\[]. Here the double escapes are not required because you are using a character class (just like \\. is equivalent to [.])
([^][]+) captures everything within square brackets, making sure there's no ] or [ inside.
In C#, you can use the # symbol to avoid having to double escape everytime and using this makes the regex like that:
var regex = new Regex(#"\[([^][]+)\]");
Note: This regex will capture everything within square brackets. If you wish to specificly get the format ["how to tie a tie",0], you can be more precise. After all, the regex will only match stuff you make it match:
var regex = new Regex(#"\["[^"]+",0\]");
Here, we have another negated character class: [^"]. This will match any character which is not a quote character.
This one assumes that the digit is always 0, as depicted in your sample text block. If you have multiple possibilities of numbers, you can use the character class [0-9]+:
var regex = new Regex(#"\["[^"]+",[0-9]+\]");
You can use \d+ as well, but this character class also matches other characters which may or may not render the regex worse. If you want to be more even cautious by allowing possible spaces, tabs, newlines, form feeds in between the characters, you can use this regex:
var regex = new Regex(#"\[\s*"[^"]+"\s*,\s*[0-9]+\s*\]");
Conclusion, there might be many regexes which suit what you need, just make sure you know how your data is coming through so you can pick one which has the right amount of freeway.
I think this is what you are looking for to match the format of ["how to tie a tie",0]:
(\["[^"]+",\d\])
( ) - around the whole thing so it all gets captured in this group
\[" - find ["
[^"]+ - find one or more of anything except "
", - find ",
\d - find a number, if you want more than just a single digit, do \d+
\] - match the ending ]
The only variable things in this regex are whatever is within the quotes ([^"]+) and the number (\d+).
Demo
If you don't want the square brackets in the capture group, you can do it like this:
\[("[^"]+",\d+)\]
I assume you don't want to match if there are quotes within your quotes as it would probably break whatever purpose you are using it for, but if you do, this should work:
\[("[^[\]]+",\d+)\]
You must use this pattern
#"\[[^][]+\]"
More informations about square brackets here.
I think you need this one: (\[[^\[^]+?])
What you did mis is the ? (smallest match) and exclude any [ or ]
Seemingly the text in the outer brackets is a JSON representation of an object. Instead of a regular expression I'd just:
strip off the stuff before the bracket + first bracket (google.sbox.p50 && google.sbox.p50() plus strip off the trailing bracket ). There are more ways to do this, and it can be more efficient than regex.
JSON parse the remaining inner part.
From that point you have the object representation, you can leave out the first element of the array what you don't need, plus you have everything else in a traversable form.
There's the session information at the end along with parameters anyway (in {} brackets), so in the end you may end up parsing stuff anyway. Better not to reinvent the wheel (JSON parsing).

Excluding certain patterns in a regex

I'm working on a Regex in C# to exclude certain patterns within a string.
These are the types patterns I want to accept are: "%00" (Hex 00-FF) and any other character without a starting '%'. The patterns I would like to exclude are: "%0" (Values with a starting % and one character after) and/or characters "&<>'/".
So far I have this
Regex correctStringRegex = new Regex(#"(%[0-9a-fA-F]{2})|[^%&<>'/]|(^(%.))",
RegexOptions.IgnoreCase);
Below are examples of what I'm trying to pass and reject.
Passing String %02This is%0A%0Da string%03
Reject String %0%0Z%A&<%0a%
If a string doesn't pass all the requirements I would like to reject the whole string completely.
Any Help would be greatly appreciated!
I suggest this:
^(?:%[0-9a-f]{2}|[^%&<>'/])*$
Explanation:
^ # Start of string
(?: # Match either
%[0-9a-f]{2} # %xx
| # or
[^%&<>'/] # any character except the forbidden ones
)* # any number of times
$ # until end of string.
This ensures that % is only matched when followed by two hexadecimals. Since you're already compiling the regex with the IgnoreCase flag set, you don't need a-fA-F, either.
Hmm, given the comments so far, I think you need a different problem definition. You want to pass or fail a string, using regex, based on whether or not the string contains any invalid patterns. Im assuming a string will fail if there is ANY invalid pattern, rather than the reverse of a string passing if there is any valid pattern.
As such, I would use this regex: %(?![0-9a-f]{2})|[&<>'/]
You would then run this in such a way that a string is invalid if you GET a match, a valid string will not have any matches in this set.
A quick explanation of a rather odd regex. The format (?!) tells the regex "Match the previous symbol if the symbols in this set DONT follow it" ie: Match if suffix not present. So, what im telling it to look for is any instance of % that is not followed by 2 hex characters, or any other invalid character. The assumption is that anything that DOESN'T match this regex is a valid character entry.

.NET regex matching

Broadly: how do I match a word with regex rules for a)the beginning, b)the whole word, and c)the end?
More specifically: How do I match an expression of length >= 1 that has the following rules:
It cannot have any of: ! # #
It cannot begin with a space or =
It cannot end with a space
I tried:
^[^\s=][^!##]*[^\s]$
But the ^[^\s=] matching moves past the first character in the word. Hence this also matches words that begin with '!' or '#' or '#' (eg: '#ab' or '#aa'). This also forces the word to have at least 2 characters (one beginning character that is not space or = -and- one non-space character in the end).
I got to:
^[^\s=(!##)]\1*$
for a regex matching the first two rules. But how do I match no trailing spaces in the word with allowing words of length 1?
Cameron's solution is both accurate and efficient (and should be used for any production code where speed needs to be optimized). The answer presented here is less efficient, but demonstrates a general approach for applying logic using regular expressions.
You can use multiple positive and negative lookahead regex assertions (all applied at one location in the target string - typically the beginning), to apply multiple logical constraints for a match. The commented regex below demonstrates how easy this is to do for this example case. You do need to understand how the regex engine actually matches (and doesn't match), to come up with the correct expressions, but its not hard once you get the hang of it.
foundMatch = Regex.IsMatch(subjectString, #"
# Match 'word' meeting multiple logical constraints.
^ # Anchor to start of string.
(?=[^!##]*$) # It cannot have any of: ! # #, AND
(?![ =]) # It cannot begin with a space or =, AND
(?!.*\S$) # It cannot end with a space, AND
.{1,} # length >= 1 (ok to match special 'word')
\z # Anchor to end of string.
",
RegexOptions.IgnorePatternWhitespace);
This application of "regex-logic" is frequently used for complex password validation.
Your first attempt was very close. You only need to exclude more characters for the first and last parts, and make the last two parts optional:
^[^\s=!##](?:[^!##]*[^\s!##])?$
This ensures that all three sections will not include any of !##. Then, if the word is more than one character long, it will need to end with a not-space, with only select characters filling the space in-between. This is all enforced properly because of the ^ and $ anchors.
I'm not quite sure what your second example matched, since the () should be taken as literal characters when embedded within a character class, not as a capturing group.

Categories