Nothing else but Regex for matching the string

Nothing else but Regex for matching the string - c#

I want to use a regex to check whether a string starts with a number, optionally followed by characters. What regex matches a string that must start with a number and may or may not have characters after it? For example, "30a" or "30" should match, but "a", or any other character or series of characters, should not.

It sounds like the string should begin with one or more numeric characters, optionally followed by other characters. To match any characters after the leading digits, I would use:
\d+.*
To match only alpha numeric characters after the mandatory numeric beginning I would use:
\d+\w*
Note: as pointed out by Dav, if you add a ^ to the start of the expression and a $ to the end of the expression like this ^\d+\w*$ you will ensure the whole string matches. However if you leave those off, you will be able to search the input string for what you need. It just depends on what your needs are.
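To make the difference between the anchored and unanchored forms concrete, here is a minimal sketch. The class and method names (LeadingDigitsDemo, IsWholeMatch, FindFirst) are made up for illustration. Keep in mind that \w also accepts digits and the underscore, so "30_" would pass the whole-string check too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingDigitsDemo
{
    // Whole-string check: one or more digits, then optional word characters.
    public static bool IsWholeMatch(string input) =>
        Regex.IsMatch(input, @"^\d+\w*$");

    // Search: returns the first run of digits (plus trailing word characters)
    // found anywhere in the input, or an empty string if there is none.
    public static string FindFirst(string input) =>
        Regex.Match(input, @"\d+\w*").Value;

    public static void Main()
    {
        Console.WriteLine(IsWholeMatch("30a"));   // True
        Console.WriteLine(IsWholeMatch("30"));    // True
        Console.WriteLine(IsWholeMatch("a"));     // False
        Console.WriteLine(IsWholeMatch("a30"));   // False
        Console.WriteLine(FindFirst("id: 30a;")); // 30a
    }
}
```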

^\d.*
The ^ matches the start of the string, \d matches a single digit, and then the .* matches any number of additional characters.
Thus, the net result is that it will only match if the string begins with a digit.

Related

Using '*' instead of '+' in a regex check

I have a regex check:
Match matchLeft = Regex.Match(Name.Substring(subName.Length), @"\d*");
This basically checks for the first digits at the end of the subName. Now, I have noticed that with the use of * in the regex (* = 0 or more), if the next characters are not digits, it will return nothing. If they are however, it will return the string of digits.
But
If I use @"\d+" instead, it will look for 1 or more digits, and return the first instance of digits, regardless of their position after the substring.
So if I had a string ("abcdef123") and a substring ("abc"):
@"\d*" would match null
@"\d+" would match "123"
Alternatively, if the substring was "abcdef", both would match "123".
So my question is - why does the use of * return nothing if the directly following characters are not digits? Will this occur every time?

When you get the substring you end up with def123. The following are true:
\d+ tries to get at least one match in the string and will greedily match more. It must traverse the string to find the first match, arriving at the 123.
On the other hand, \d* will start at the beginning of the string and will successfully match the start of the string with zero digits. Even though it is greedy, it is completely satisfied with matching zero digits. It is a successful match and is zero-width.
You can change this behavior by using \d*$, which anchors the match at the end of the input string.
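The three cases above can be checked with a short sketch. The helper name Describe and the class name are hypothetical; the input is the "def123" remainder from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarVersusPlusDemo
{
    // Hypothetical helper: describes the first match of a pattern in the input.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return $"success={m.Success}, index={m.Index}, value=\"{m.Value}\"";
    }

    public static void Main()
    {
        string rest = "abcdef123".Substring("abc".Length); // "def123"

        // Zero-width match at the very start: successful, but empty.
        Console.WriteLine(Describe(rest, @"\d*"));  // success=True, index=0, value=""

        // Needs at least one digit, so the engine scans forward to "123".
        Console.WriteLine(Describe(rest, @"\d+"));  // success=True, index=3, value="123"

        // Anchored at the end: the empty match at index 0 no longer qualifies.
        Console.WriteLine(Describe(rest, @"\d*$")); // success=True, index=3, value="123"
    }
}
```

Note that the \d* match reports Success as true even though its Value is empty, which is why checking only Success can be misleading here.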

I think you answered your question yourself. This behavior is default and will occur every time.
See Quantifier Cheat Sheet
A+ One or more As, as many as possible (greedy), giving up characters if the engine needs to backtrack (docile)
A* Zero or more As, as many as possible (greedy), giving up characters if the engine needs to backtrack (docile)
Since \d* can match an empty string, it will do so whenever no digit is found at the current position: the regex engine always tries to return a valid match, and it can match empty substrings at the beginning, at the end, and between characters in a string.

Regex search for string like "$12,56,45" using c#

I want to search for a string like "$12,56,450" using Regex in C#, but my pattern doesn't match it.
Here is my code:
string input = "Total earn for the year $12,56,450";
string pattern = @"\b(?mi)($12,56,450)\b";
Regex regex = new Regex(pattern);
if (regex.Match(input).Success)
{
    return true;
}

This Regex will do the job, (?mi)(\$\d{2},\d{2},\d{3}), and here's a Regex 101 to prove it.
Now let's break it down a little:
\$ matches the literal $ at the beginning of the amount
\d{2} matches any two digits
, matches the literal ,
\d{2} matches any two digits
, matches the literal ,
\d{3} matches any three digits
Now, for the purposes of the demonstration I removed the word boundaries, \b. You don't need them for such a fixed string, and the leading \b would actually prevent a match here: $ is not a word character, and neither is the space before it, so there is no word boundary at that position. Consider the definition of where \b matches:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
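As a rough check of both points (escaping the $ and the effect of \b), here is a small sketch using the input from the question. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyMatchDemo
{
    public const string Input = "Total earn for the year $12,56,450";

    // Hypothetical helper: does the pattern match anywhere in the sample input?
    public static bool Matches(string pattern) => Regex.IsMatch(Input, pattern);

    public static void Main()
    {
        // Unescaped $ is the end-of-input anchor, so nothing can follow it.
        Console.WriteLine(Matches(@"$12,56,450"));        // False

        // Escaped \$ matches the literal dollar sign.
        Console.WriteLine(Matches(@"\$12,56,450"));       // True

        // A leading \b fails: the space and the $ are both non-word characters.
        Console.WriteLine(Matches(@"\b\$12,56,450\b"));   // False

        // A trailing \b is fine: it sits after the word character 0.
        Console.WriteLine(Regex.Match(Input, @"\$\d{2},\d{2},\d{3}\b").Value); // $12,56,450
    }
}
```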

You need to escape $ and some other special regex characters.
Try this: @"(?mi)(\$12,56,450)\b"; (the leading \b has to go, because there is no word boundary between the space and the $).
If you want, you can use \d to match a digit, and \d{2,3} to match a run of 2 or 3 digits.

Explain the Regex mentioned

Can anyone please explain the regex below? It has been used in my application for a very long time, since before I joined, and I am very new to regexes.
/^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$/
As far as I understand
this regex will validate
- for a minimum of 6 chars to a maximum of 10 characters
- will escape the characters like ^ and $
Also, my basic need is a regex for a minimum of 6 characters, with at least one digit and at least one special character.

^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
^ is called an "anchor". It basically means that any following text must be immediately after the "start of the input". So ^B would match "B" but not "AB" because in the second "B" is not the first character.
.* matches 0 or more characters - any character except a newline (by default). This is what's known as a greedy quantifier - the regex engine will match ("consume") all of the characters to the end of the input (or the end of the line) and then work backwards for the rest of the expression (it "gives up" characters only when it must). In a regex, once a character is "matched" no other part of the expression can "match" it again (except for zero-width lookarounds, which are coming next).
(?=.{6,10}) is a lookahead anchor and it matches a position in the input. It finds a place in the input where there are 6 to 10 characters following, but it does not "consume" those characters, meaning that the following expressions are free to match them.
(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) is another lookahead anchor. It matches a position in the input where the following text contains four letters ([a-zA-Z] matches one lowercase or uppercase letter), but any number of other characters (including zero characters) may be between them. For example: "++a5b---C#D" would match. Again, being an anchor, it does not actually "consume" the matched characters - it only finds a position in the text where the following characters match the expression.
(?=.*\d.*\d) Another lookahead. This matches a position where two digits follow (with any number of other characters in between).
.* Already covered this one.
$ This is another kind of anchor that matches the end of the input (or the end of a line - the position just before a newline character). It says that the preceding expression must match characters at the end of the string. When ^ and $ are used together, it means that the entire input must be matched (not just part of it). So /bcd/ would match "abcde", but /^bcd$/ would not match "abcde" because "a" and "e" could not be included in the match.
NOTE
This looks like a password validation regex. If it is, please note that it's broken. The .* at the beginning and end will allow the password to be arbitrarily longer than 10 characters. It could also be rewritten to be a bit shorter. I believe the following will be an acceptable (and slightly more readable) substitute:
^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$
Thanks to @nhahtdh for pointing out the correct way to implement the character length limit.
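To see the length problem in practice, the sketch below runs the original pattern and the suggested substitute against a few sample passwords. The sample strings and the names Original and Bounded are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordRegexDemo
{
    // The pattern from the question (without the /.../ delimiters).
    public static readonly Regex Original = new Regex(
        @"^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$");

    // The shorter substitute with an enforced 6 to 10 character length.
    public static readonly Regex Bounded = new Regex(
        @"^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$");

    public static void Main()
    {
        string[] samples = { "abcd12", "abcd12345678", "abc123" };
        foreach (string s in samples)
        {
            Console.WriteLine($"{s}: original={Original.IsMatch(s)}, bounded={Bounded.IsMatch(s)}");
        }
        // abcd12: original=True, bounded=True
        // abcd12345678: original=True, bounded=False   (12 characters)
        // abc123: original=False, bounded=False        (only 3 letters)
    }
}
```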

Check Cyborgx37's answer for the syntax explanation. I'll do some explanation on the meaning of the regex.
^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
The first .* is redundant: the lookaheads that follow are zero-width assertions that can themselves skip over any characters, and there is a .* at the end.
The regex will match minimum 6 characters, due to the assertion (?=.{6,10}). However, there is no upper limit on the number of characters of the string that the regex can match. This is because of the .* at the end (the .* in the front also contributes).
This (?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) part asserts that there are at least 4 English alphabet characters (uppercase or lowercase). And (?=.*\d.*\d) asserts that there are at least 2 digits (0-9). Since [a-zA-Z] and \d are disjoint sets, these 2 conditions combined make the (?=.{6,10}) redundant.
The syntax of .*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z] is also needlessly verbose. It can be shortened with the use of repetition: (?:.*[a-zA-Z]){4}.
The following regex is equivalent to your original regex. However, I really doubt that your current one and this equivalent rewrite do what you want:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).*$
More explicit on the length, since clarity is always better. The meaning stays the same:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).{6,}$
Recap:
Minimum length = 6
No limit on maximum length
At least 4 English letters, lowercase or uppercase
At least 2 digits 0-9

REGEXPLANATION
/.../: slashes are often used to represent the area where the regex is defined
^: matches beginning of input string
.: this can match any character
*: matches the previous symbol 0 or more times
.{6,10}: matches .(any character) somewhere between 6 and 10 times
[a-zA-Z]: matches any single character between a and z or between A and Z
\d: matches a digit.
$: matches the end of input.
I think that just about does it for all the symbols in the regex you've posted

For your regex request, here is what you would use:
^(?=.{6,}$)(?=.*?\d)(?=.*?[!@#$%&*()+_=?\^-]).*
And here it is unrolled for you:
^ // Anchor the beginning of the string (password).
(?=.{6,}$) // Look ahead: Six or more characters, then the end of the string.
(?=.*?\d) // Look ahead: Anything, then a single digit.
(?=.*?[!@#$%&*()+_=?\^-]) // Look ahead: Anything, then a special character.
.* // Passes our look aheads, let's consume the entire string.
As you can see, the special characters have to be explicitly defined as there is not a reserved shorthand notation (like \w, \s, \d) for them. Here are the accepted ones (you can modify as you wish):
!, @, #, $, %, ^, &, *, (, ), -, +, _, =, ?
The key to understanding regex look aheads is to remember that they do not move the position of the parser. Meaning that (?=...) will start looking at the first character after the last pattern match, as will subsequent (?=...) look aheads.
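A minimal sketch of this pattern in C#, assuming the special-character set listed above (with @ as the second character). The class name, method name and sample passwords are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharPasswordDemo
{
    // At least 6 characters, at least one digit, at least one listed special character.
    private static readonly Regex Pattern = new Regex(
        @"^(?=.{6,}$)(?=.*?\d)(?=.*?[!@#$%&*()+_=?\^-]).*");

    public static bool IsValid(string password) => Pattern.IsMatch(password);

    public static void Main()
    {
        Console.WriteLine(IsValid("abc1!x")); // True
        Console.WriteLine(IsValid("pass_1")); // True  (_ is in the special set)
        Console.WriteLine(IsValid("abcde1")); // False (no special character)
        Console.WriteLine(IsValid("abcde!")); // False (no digit)
        Console.WriteLine(IsValid("a1!"));    // False (shorter than 6)
    }
}
```

All three lookaheads start from position 0, so the order in which the digit and the special character appear in the password does not matter.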

Regular expression to allow only some characters in .net

I was just working on some validation and got stuck on this. I want a text that contains only the characters [a-z], [A-Z], [0-9] and [_].
It should accept any of the above characters any number of times in any order. Any other character marks the text as invalid.
I tried this, but it is not working:
{
    ......
    Regex strPattern = new Regex("[0-9]*[A-Z]*[a-z]*[_]*");
    if (!strPattern.IsMatch(val))
    {
        return false;
    }
    return true;
}

You want this:
Regex strPattern = new Regex("^[0-9A-Za-z_]*$");
Your expression does not work because:
It will accept any number of digits, followed by any number of uppercase letters, followed by any number of lowercase letters, followed by any number of underscores. For example, an underscore followed by a number would not match.
Your pattern is not anchored using the ^ and $ characters. This means that every string will match, because every string contains zero or more of the specified characters. (For example, the string "!@#$" contains zero numbers, etc.!) Anchoring the expression to the start and end of the string means that the entire string must match the entire expression or the match will fail.
This pattern will still accept a zero-length string as valid. If you would like to enforce that the string be at least one character, change the * near the end of the expression to +. (* means "0 or more of the previous token" while + means "1 or more of the previous token.")
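Here is a small sketch comparing the question's pattern with the anchored one, including the * versus + choice for empty input. The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedCharsDemo
{
    // The unanchored pattern from the question: it succeeds on every input.
    public static bool OriginalCheck(string val) =>
        Regex.IsMatch(val, "[0-9]*[A-Z]*[a-z]*[_]*");

    // Anchored, allows the empty string.
    public static bool IsValid(string val) =>
        Regex.IsMatch(val, "^[0-9A-Za-z_]*$");

    // Anchored, requires at least one character.
    public static bool IsValidNonEmpty(string val) =>
        Regex.IsMatch(val, "^[0-9A-Za-z_]+$");

    public static void Main()
    {
        Console.WriteLine(OriginalCheck("!@#$"));   // True  (empty match at index 0)
        Console.WriteLine(IsValid("!@#$"));         // False
        Console.WriteLine(IsValid("_1"));           // True  (order does not matter)
        Console.WriteLine(IsValid("user-01"));      // False (- is not allowed)
        Console.WriteLine(IsValid(""));             // True
        Console.WriteLine(IsValidNonEmpty(""));     // False
    }
}
```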

Try this (anchored with ^ and $ so that the whole string has to match):
new Regex("^[0-9A-Za-z_]*$");

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-: the regex will match the substring abcdef just fine and ignore the other characters, because you didn't tell it to match only at the start or end of the string.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.

In regular expressions the + tells the engine to match one or more characters.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. It would still allow strings that include non-alphanumerics, so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that consists, from beginning to end, of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.
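The three variants discussed above can be compared side by side in a short sketch, assuming C# and .NET's Regex class (the question does not say which engine is used). The method names are invented. One .NET detail to be aware of: $ also matches just before a final newline, so \z is the stricter end anchor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // No anchors: succeeds if any alphanumeric run exists anywhere.
    public static bool Unanchored(string s) => Regex.IsMatch(s, @"[A-Za-z0-9]+");

    // Start anchor only: the string must begin with an alphanumeric run.
    public static bool StartOnly(string s) => Regex.IsMatch(s, @"^[A-Za-z0-9]+");

    // Both anchors: the whole string must be alphanumeric.
    public static bool WholeString(string s) => Regex.IsMatch(s, @"^[A-Za-z0-9]+$");

    public static void Main()
    {
        Console.WriteLine(Unanchored("a-b"));       // True  (matches "a")
        Console.WriteLine(Unanchored("-__--"));     // False (no alphanumeric at all)
        Console.WriteLine(StartOnly("abc-"));       // True  (trailing - is ignored)
        Console.WriteLine(StartOnly("-abc"));       // False
        Console.WriteLine(WholeString("a-b"));      // False
        Console.WriteLine(WholeString("abc123"));   // True

        // In .NET, $ also matches before a final newline; \z does not.
        Console.WriteLine(WholeString("abc\n"));                         // True
        Console.WriteLine(Regex.IsMatch("abc\n", @"^[A-Za-z0-9]+\z"));   // False
    }
}
```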
