Regular expression to accept 5 characters with optional - c#

Hi, I have to write a regular expression that matches a format like A12BC. The first 2 characters (A and 1) are mandatory and the next 3 characters (2, B and C) are optional. Currently my regex works if I give the string value A12BC.
When I give the input A1B it should not match, but my regular expression matches and reports success. Can anyone please help me modify my regex so
that it behaves as described below:
Case "A1" : Should match
Case "A1B" : Should not match (this case is not working)
Case "A12B" : Should match
Case "A12BC" : Should match
Case "A12BCD" : Should not match
My regular expression is as below:
^[a-zA-Z][0-9][0-9]?[a-zA-Z]?[a-zA-Z]?$

To make sure that the third character, if present, is a digit, make the third character mandatory inside an optional group, like this:
^[a-zA-Z][0-9]([0-9][a-zA-Z]?[a-zA-Z]?)?$
This expression says that if the third character is present, it needs to match a digit. The two trailing letters are optional, too.
Note: you can simplify your expression by using the predefined character class \d for digits. (\w is not a drop-in replacement for [a-zA-Z], because it also matches digits and the underscore.) Remember that you need to double the backslashes in "plain" string literals (as opposed to verbatim string literals, in which backslashes are not doubled).

You can use:
^[a-zA-Z][0-9](?:[0-9][a-zA-Z]{0,2})?$
In the 2BC part of the pattern, you have to make the digit mandatory before allowing zero, one or two letters.
(?:[0-9][a-zA-Z]{0,2})? matches an empty string, or a digit, or a digit followed by a letter, or a digit followed by two letters, but not a single letter on its own.
(?:...) is a non-capturing group.
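A minimal C# sketch of how this pattern could be checked against the five cases from the question. The class and method names (CodeFormat, IsValid) are made up for illustration; only Regex comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeFormat
{
    // Letter, digit, then optionally: one digit followed by up to two letters.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z][0-9](?:[0-9][a-zA-Z]{0,2})?$");

    // Hypothetical helper: true when the whole input has the A12BC shape.
    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "A1", "A1B", "A12B", "A12BC", "A12BCD" })
            Console.WriteLine($"{s}: {IsValid(s)}");
    }
}
```

Run as written, this should report True for A1, A12B and A12BC, and False for A1B and A12BCD.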

Related

Regex to match if string is exactly as defined

How can I check that a string is in the correct format? I want the comparison to pass only if the string matches exactly. The following are the correct formats:
0.#
0.##
0.###
0.####
0.#####
There can be up to 10 hash (#) characters after the dot (.), but the string may only consist of 0. followed by hashes; nothing else is allowed.
Can someone please guide me on how to validate a string of this type?

In a regular expression the caret (^) represents start-of-line and the dollar sign ($) represents end-of-line (or the position before a final newline).
A regex for an exact match is just the text you want, enclosed by ^ and $. But you must ensure that special regular expression characters are escaped. For example the regex
^Hello World$
would match exactly on the String "Hello World" and nothing else.
You can also use digits directly. You need to escape the dot "." because a dot in a regular expression means any character except newline. You escape a character by putting a backslash in front of it.
Next you should know about quantifiers. The usual ones are
* -> 0 or many
+ -> 1 or many
{n} -> exactly n times
{n,} -> at least n times
{n,m} -> n to m times
So you can write:
^0\.#{1,10}$
If you use a normal string in C# with quotations (") you must use two backslashes
^0\\.#{1,10}$
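To illustrate the two ways of writing the pattern in C#, here is a small sketch. The names HashFormat, Verbatim, Escaped and IsValid are invented for this example; the point is only that both literals produce the same pattern text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashFormat
{
    // Verbatim literal: the backslash is written once.
    public const string Verbatim = @"^0\.#{1,10}$";
    // Regular literal: the backslash has to be doubled.
    public const string Escaped = "^0\\.#{1,10}$";

    // Hypothetical helper: "0." followed by 1 to 10 hash characters.
    public static bool IsValid(string input) => Regex.IsMatch(input, Verbatim);

    public static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);          // both spell the same pattern
        Console.WriteLine(IsValid("0.#"));               // one hash
        Console.WriteLine(IsValid("0.##########"));      // ten hashes
        Console.WriteLine(IsValid("0.###########"));     // eleven hashes
        Console.WriteLine(IsValid("0x#"));               // dot is literal, so x fails
    }
}
```

The first three lines should print True and the last two False.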

C# Regex specify allowed start and end conditions

I'm trying to create a regex expression with the following requirements:
The value:
Must start with a-z or _, numbers are OK after the first character
Can have parentheses at the end of the string if they are opened and closed with a number inside, e.g. SomeVar(10) is OK, SomeVar(10 is not OK.
Can have a . but only one at a time, and only between letters or numbers. SomeVar.InnerVar is OK, SomeVar..Innevar is not OK.
My try at the regex:
[a-zA-Z_]
??
??

Assuming you want to match an entire string, you may use something like the following:
^[a-zA-Z_](?:\w|(?<=\w)\.(?=\w))*(?:\(\d+\))?$
If you want to match partial strings, you'd need to decide what boundaries are allowed. Otherwise, "SomeVar(10" would produce a match (namely the part that comes before the opening parenthesis).
Notes:
\w matches a lowercase/uppercase letter, a digit, or an underscore. But it also matches Unicode letters and numbers. If you don't want that, you could use [a-zA-Z0-9_] instead.
Similarly, \d matches any Unicode digit. You either use it or use [0-9] depending on your requirements.
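As a rough check of the lookaround-based pattern above against the examples from the question, something like the following could be used. VarName and IsValid are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarName
{
    // Start with a letter or underscore; then word characters, or a single dot
    // that has a word character on both sides; optionally end with "(digits)".
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z_](?:\w|(?<=\w)\.(?=\w))*(?:\(\d+\))?$");

    public static bool IsValid(string value) => Pattern.IsMatch(value);

    public static void Main()
    {
        var samples = new[]
        {
            "SomeVar(10)", "SomeVar(10", "SomeVar.InnerVar", "SomeVar..Innevar", "1abc"
        };
        foreach (var s in samples)
            Console.WriteLine($"{s}: {IsValid(s)}");
    }
}
```

With these inputs, SomeVar(10) and SomeVar.InnerVar should come out True, and the other three False.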

Use
^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*(\([^()]*\))?$
[a-zA-Z_][a-zA-Z0-9_]* - a letter or underscore, then zero or more letters, digits, underscores
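(\([^()]*\))? - optional group: a pair of parentheses may be present or absent; [^()]* allows any characters other than parentheses inside, so use \(\d+\) instead if only a number is allowed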
(\.[a-zA-Z_][a-zA-Z0-9_]*)* - dot is allowed between letter/digit/underscore.

Regular expression one letter followed by at least one numerical value

I am trying to create a regular expression that matches the following.
The name starts with a letter, followed by at least 1 numerical value. For example, this should be valid: "w1234.pdf", but not this: "ww1234.pdf". So far I only have this:
^[a-zA-Z][0-9]{1}?$

For "The name starts with a letter and followed by at least 1 numerical value" you can try the pattern
^[a-zA-Z][0-9]
Explanation:
^ starting anchor
[a-zA-Z] single letter
[0-9] followed by a digit
Please note that the set of valid names is wider than the examples in the question, e.g. a1x456.txt also qualifies: it starts with a letter (a) followed by at least one numerical value (1).

Your regex matches one character in the range [a-zA-Z], followed by one digit. Note that {1}? is useless in this case, since [0-9] already matches one digit.
Change your regex to:
^[a-zA-Z][0-9].+$
if you want to match one digit after the first character, followed by the rest of the name (.+ matches one or more of any character, not only digits).
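The difference between the loose and a stricter reading of the requirement can be sketched like this. FileNameCheck and both method names are hypothetical, and the stricter pattern is an assumption about what might be wanted, not something stated in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameCheck
{
    // Loose reading: one letter, then a digit; anything may follow.
    public static bool StartsWithLetterDigit(string name) =>
        Regex.IsMatch(name, @"^[a-zA-Z][0-9]");

    // Assumed stricter rule: one letter, digits only, then a dot
    // and an alphanumeric extension.
    public static bool IsLetterDigitsExtension(string name) =>
        Regex.IsMatch(name, @"^[a-zA-Z][0-9]+\.[a-zA-Z0-9]+$");

    public static void Main()
    {
        foreach (var name in new[] { "w1234.pdf", "ww1234.pdf", "a1x456.txt" })
            Console.WriteLine(
                $"{name}: loose={StartsWithLetterDigit(name)}, strict={IsLetterDigitsExtension(name)}");
    }
}
```

Both checks should accept w1234.pdf and reject ww1234.pdf; a1x456.txt passes only the loose check.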

Regular Expressions: Determining if a String is either a number or variable

I am trying to combine two Regular Expression patterns to determine if a String is either a double value or a variable. My restrictions are as follows:
The variable can only begin with an _ or alphabetical letter (A-Z, ignoring case), but it can be followed by zero or more _s, letters, or digits.
Here's what I have so far, but I can't get it to work properly.
String varPattern = @"[a-zA-Z_](?: [a-zA-Z_]|\d)*";
String doublePattern = @"(?: \d+\.\d* | \d*\.\d+ | \d+ ) (?: [eE][\+-]?\d+)?";
String pattern = String.Format("({0}) | ({1})",
varPattern, doublePattern);
Regex.IsMatch(word, varPattern, RegexOptions.IgnoreCase)
It seems that it is capturing both Regular Expression patterns, but I need it to be either/or.
For example, _A2 2 is valid using the code above, but _A2 is invalid.
Some examples of valid variables are as follows:
_X6 , _ , A , Z_2_A
And some examples of invalid variables are as follows:
2_X6 , $2 , T_2$
I guess I just need clarification on the pattern format for the Regular Expression. The format is unclear to me.

As noted, the literal whitespace you've put in your regular expressions is part of the regular expression. You won't get a match unless that same whitespace is in the text being scanned. If you want to use whitespace to make your regex more readable, you'll need to specify RegexOptions.IgnorePatternWhitespace. After that, if you want to match any whitespace, you'll have to do so explicitly, e.g. by specifying \s or \x20.
It should be noted that if you do specify RegexOptions.IgnorePatternWhitespace, you can use Perl-style comments (# to end of line) to document your regular expression (as I've done below). For complex regular expressions, someone 5 years from now — who might be you! — will thank you for the kindness.
Your [presumably intended] patterns are also, I think, more complex than they need be. A regular expression to match the identifier rule you've specified is this:
[a-zA-Z_][a-zA-Z0-9_]*
Broken out into its constituent parts:
[a-zA-Z_] # match an upper- or lower-case letter or an underscore, followed by
[a-zA-Z0-9_]* # zero or more occurrences of an upper- or lower-case letter, decimal digit or underscore
A regular expression to match the conventional style of a numeric/floating-point literal is this:
([+-]?[0-9]+)(\.[0-9]+)?([Ee][+-]?[0-9]+)?
Broken out into its constituent parts:
( # a mandatory group that is the integer portion of the value, consisting of
[+-]? # - an optional plus- or minus-sign, followed by
[0-9]+ # - one or more decimal digits
) # followed by
( # an optional group that is the fractional portion of the value, consisting of
\. # - a decimal point, followed by
[0-9]+ # - one or more decimal digits
)? # followed by,
( # an optional group, that is the exponent portion of the value, consisting of
[Ee] # - The upper- or lower-case letter 'E' indicating the start of the exponent, followed by
[+-]? # - an optional plus- or minus-sign, followed by
[0-9]+ # - one or more decimal digits.
)? # Easy!
Note: Some grammars differ as to whether the sign of the value is a unary operator or part
of the value and whether or not a leading + sign is allowed. Grammars also vary as to whether
something like 123245. is valid (e.g., is a decimal point with no fractional digits valid?)
To combine these two regular expressions:
First, group each of them with parentheses (you might want to name the containing groups, as I've done):
(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)
Next, combine with the alternation operation, |:
(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)|(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)
Finally, enclose the whole shebang in an @"..." literal and you should be good to go.
That's about all there is to it.
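Putting the pieces together, a sketch along these lines shows the commented pattern with RegexOptions.IgnorePatternWhitespace and the named groups in use. The class name TokenKind, the Classify method, its return strings, and the added ^ and $ anchors are choices made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenKind
{
    // Whitespace and # comments inside the pattern are ignored by the engine.
    private static readonly Regex Token = new Regex(@"
        ^(?:
            (?<identifier> [a-zA-Z_] [a-zA-Z0-9_]* )   # letter or underscore, then word characters
          |
            (?<number> [+-]? [0-9]+                    # integer part with optional sign
                       (?: \. [0-9]+ )?                # optional fraction
                       (?: [Ee] [+-]? [0-9]+ )? )      # optional exponent
        )$",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns "identifier", "number" or "invalid".
    public static string Classify(string word)
    {
        Match m = Token.Match(word);
        if (!m.Success) return "invalid";
        return m.Groups["identifier"].Success ? "identifier" : "number";
    }

    public static void Main()
    {
        foreach (var w in new[] { "_X6", "_", "Z_2_A", "2_X6", "$2", "T_2$", "1.5e-3", "_A2 2" })
            Console.WriteLine($"{w}: {Classify(w)}");
    }
}
```

Here _X6, _ and Z_2_A should be classified as identifiers, 1.5e-3 as a number, and 2_X6, $2, T_2$ and "_A2 2" as invalid. Checking which named group succeeded is one simple way to tell the two alternatives apart after a single match.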

Spaces are not ignored in regular expressions by default, so for each space in your current expressions it is looking for a space in that string. Add the RegexOptions.IgnorePatternWhitespace flag or remove the spaces from your expressions.
You will also want to add some beginning and end of string anchors (^ and $ respectively) so you do not match just part of a string.

You should avoid having spaces in your regular expressions unless you explicitly set RegexOptions.IgnorePatternWhitespace. To make sure you only get matches on complete words, you should include the beginning-of-line (^) and end-of-line ($) anchors. I would also suggest you write out the entire pattern instead of using String.Format("({0}) | ({1})", ...) as you have here.
The below should work given your examples:
string pattern = @"(?:^[a-zA-Z_][a-zA-Z_\d]*$)|(?:^\d+(?:\.\d+){0,1}(?:[Ee][\+-]\d+){0,1}$)";

Why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters, you need to anchor your regex as follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-: the regex will match the substring abcdef just fine, because you didn't tell it to match only from the start to the end of the string, so it simply ignores the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine explicitly that you are looking for one or more letters or digits, starting at the very beginning of the string and running right to the end of the string. There is no room for other characters to squeeze in or hide, so this will do what you apparently want. Without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
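The effect of the anchors can be seen with a short C# sketch. AnchorDemo and its three methods are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: succeeds if any run of letters or digits occurs somewhere.
    public static bool ContainsAlnum(string s) => Regex.IsMatch(s, "[a-zA-Z0-9]+");

    // Anchored: succeeds only if the whole string is letters or digits.
    public static bool IsAllAlnum(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9]+$");

    // The substring the unanchored pattern actually matched (empty if none).
    public static string FirstMatch(string s) => Regex.Match(s, "[a-zA-Z0-9]+").Value;

    public static void Main()
    {
        foreach (var s in new[] { "abc", "a-b", "_-abcdef-", "-__--" })
            Console.WriteLine(
                $"{s}: contains={ContainsAlnum(s)}, all={IsAllAlnum(s)}, first=\"{FirstMatch(s)}\"");
    }
}
```

For a-b the unanchored check succeeds with the match "a", and for _-abcdef- it succeeds with "abcdef", while the anchored check rejects both. Only abc passes the anchored check, and -__-- fails both.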

In regular expressions the + tells the engine to match one or more occurrences of the preceding element (here, the character class).
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. It would still allow strings that include non-alphanumerics, so long as those characters are not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that consists, from beginning to end, of one or more alphanumeric characters'. Anchoring both ends is what completely excludes non-alphanumerics from the string.
