.Net Regex for Comma Separated string with a strict format - c#

I've got a string that I need to verify for validity, the latter being the case if:
It is completely empty
Or contains a comma-separated string that MUST look like this: 'abc,def,ghi,jkl'.
It doesn't matter how many of these comma-separated values there are, but if the string is not empty, it must adhere to the comma (and only comma) separated format with no whitespace around the commas, and each value may only contain ASCII a-z/A-Z: no special characters or anything.
How would I verify whether strings adhere to the rules, or not?

You can use this regex
^([a-zA-Z]+(,[a-zA-Z]+)*)?$
or
^(?!,)(,?[a-zA-Z])*$
^ is start of string
[a-zA-Z] is a character class that matches a single uppercase or lowercase ASCII letter
+ is a quantifier which matches the preceding character or group 1 or more times
* is a quantifier which matches the preceding character or group 0 or more times
? is a quantifier which matches the preceding character or group 0 or 1 times
$ is end of string

Consider not using regex:
bool isOK = str == "" || str.Split(',').All(part => part != "" && part.All(c=> (c>= 'a' && c<='z') || (c>= 'A' && c<='Z')));
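The two approaches above can be put side by side in a minimal sketch. The class and method names (CommaListCheck, IsValidRegex, IsValidSplit) are made up for illustration, and treating null as invalid is an assumption the question does not cover.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaListCheck
{
    // First pattern from the answer above: empty, or letter runs separated by single commas.
    private static readonly Regex Pattern =
        new Regex(@"^([a-zA-Z]+(,[a-zA-Z]+)*)?$");

    // Regex-based check; null is treated as invalid (assumption).
    public static bool IsValidRegex(string s) => s != null && Pattern.IsMatch(s);

    // Non-regex check: every comma-separated part must be non-empty ASCII letters.
    public static bool IsValidSplit(string s) =>
        s != null &&
        (s.Length == 0 || s.Split(',').All(part =>
            part.Length > 0 &&
            part.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))));
}
```

Both methods should accept "" and "abc,def,ghi,jkl" and reject "abc,", ",abc" and "abc, def". One difference worth knowing: in .NET, $ also matches just before a final newline, so if a trailing "\n" must be rejected, \z is the stricter anchor.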

Related

How to validate 'live' input field with Regex?

Is there a way to validate 'live' input field using Regex in C#?
'live' means that I don't validate complete string, I want to validate the string while it's typed.
For example, I want my string to match the next pattern lalala111#alalala123, so I have to check each new char - is it # ? if it's # then is there a # already in the string? if yes, then I return a null, if no, then I return the char. And of course I have to check chars - is it letter or digit? if yes, then ok, if not, then not ok.
I hope you got my idea.
At the moment I have this code:
private char ValidateEmail(string input, int charIndex, char charToValidate)
{
    if (char.IsLetterOrDigit(charToValidate) &&
        !(charToValidate >= 'а' && charToValidate <= 'я') &&
        !(charToValidate >= 'А' && charToValidate <= 'Я') ||
        charToValidate == '#' ||
        "!#$%&'*+-/=?^_`{|}~#.".Contains(charToValidate.ToString()))
    {
        if ((charToValidate == '#' && input.Contains("#")) ||
            (!input.Contains("#") && charIndex >= 63) ||
            (input.Contains("#") && charIndex >= 192))
            return '\0';
    }
    else
    {
        return '\0';
    }
    return char.ToUpper(charToValidate);
}
It allows only Latin letters with digits and some special characters, and it also allows the first part of the string (before #) to have only 64 letters and the second part (after #) to have only 128 letters. But the code looks ugly, doesn't it? So I want to do all these checks in one elegant regular expression.
You have to use the following code:
Declare this line at the top:
System.Text.RegularExpressions.Regex remail = new System.Text.RegularExpressions.Regex(@"^([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
Next, in either the button click or the leave event, use the following code to check:
if (textemail.Text != "" && !remail.IsMatch(textemail.Text))
{
    errorProvider1.Clear();
    textemail.Focus();
    errorProvider1.SetError(textemail, "Wrong Email ID");
    MessageBox.Show("Wrong Email ID");
    textemail.SelectAll();
    return;
}
After a character has been typed, you want the string that has been entered to match one of the following:
between 1 and 64 (inclusive) acceptable characters.
between 1 and 64 (inclusive) acceptable characters then an # character.
between 1 and 64 (inclusive) acceptable characters then an # character then 128 or fewer acceptable characters.
Note that the last two clauses can be combined to say:
between 1 and 64 (inclusive) acceptable characters then an # character then between 0 and 128 inclusive acceptable characters.
Hence the entire requirement can be expressed as:
between 1 and 64 (inclusive) acceptable characters, optionally followed by an # character then between 0 and 128 inclusive acceptable characters.
The definition of "acceptable characters" is not at all clear from the question. The code within ValidateEmail does range checks of 'а' to 'я' and of 'А' to 'Я'. It also checks "!#$%&'*+-/=?^_`{|}~#.".Contains(...).
The text below assumes acceptable characters actually means the 26 letters, upper and lower case, plus the 10 digits.
The required regular expression is then ^\w{1,64}(#\w{0,128})?$
Note that in .NET \w also matches the underscore and non-ASCII letters and digits, so to allow exactly the ASCII letters and digits, write [a-zA-Z0-9] in place of each \w.
This regular expression can then be used to check the concatenation of the already validated input text plus the character just typed.
If additional characters should be considered acceptable, change the \w (there are two of them). For example, if hyphens (-) are to be allowed as well, change both \w's to [\w-]; the underscore (_) is already covered by \w.
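As a rough sketch of that idea, the check below tests the already accepted text plus the newly typed character against the prefix pattern. The names LiveInputCheck and Accepts are hypothetical, and the sketch assumes that "acceptable" means ASCII letters and digits only, so it uses [a-zA-Z0-9] rather than \w, and \z rather than $ so that a typed newline is rejected.

```csharp
using System.Text.RegularExpressions;

public static class LiveInputCheck
{
    // Assumption: acceptable characters are ASCII letters and digits only.
    // 1-64 characters, optionally followed by '#' and 0-128 more characters.
    private static readonly Regex Partial =
        new Regex(@"^[a-zA-Z0-9]{1,64}(#[a-zA-Z0-9]{0,128})?\z");

    // True when the text accepted so far plus the typed character is still valid.
    public static bool Accepts(string current, char typed) =>
        Partial.IsMatch((current ?? "") + typed);
}
```

A key-press handler could call LiveInputCheck.Accepts(textSoFar, typedChar) and discard the character when it returns false. This only covers characters appended at the end; edits in the middle of the text would need the whole new string to be re-checked.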

Regex Lookahead and lookbehind at most one digit

I'm trying to create a regex pattern with these rules:
8 characters [a-zA-Z]
must contain only one digit, at any position in the string
I created this pattern:
^(?=.*[0-9].*[0-9])[0-9a-zA-Z]{8}$
This pattern works fine, but I want only one digit to be allowed. Examples:
aaaaaaa6 match
aaa7aaaa match
aaa88aaa don't match
aaa884aa don't match
aaawwaaa don't match
You could instead use:
^(?=[0-9a-zA-Z]{8}$)[^\d]*\d[^\d]*$
The first part asserts that the whole string consists of exactly 8 letters or digits. Once this is ensured, the second part ensures that there is only one digit in the match.
EDIT: Explanation:
The anchors ^ and $ denote the start and end of the string.
(?=[0-9a-zA-Z]{8}$) asserts that the string consists of exactly 8 letters or digits. The $ inside the lookahead matters: without it, longer strings that merely begin with 8 letters or digits would also pass.
[^\d]*\d[^\d]* implies that there is exactly one digit character and that the remaining characters are non-digits. Since we have already asserted that the input contains only digits or letters, the non-digit characters here are letters.
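A small sketch for trying the lookahead approach against the examples from the question. The class name OneDigitCheck is made up, and the pattern here puts $ inside the lookahead and uses [0-9] instead of \d (in .NET, \d also matches non-ASCII digits), which are choices of this sketch.

```csharp
using System.Text.RegularExpressions;

public static class OneDigitCheck
{
    // Lookahead: exactly 8 ASCII letters/digits in total.
    // Body: non-digits, one digit, non-digits, so exactly one digit overall.
    private static readonly Regex Pattern =
        new Regex(@"^(?=[0-9a-zA-Z]{8}$)[^0-9]*[0-9][^0-9]*$");

    public static bool IsValid(string s) => s != null && Pattern.IsMatch(s);
}
```

With this pattern, aaaaaaa6 and aaa7aaaa should pass, while aaa88aaa, aaa884aa, aaawwaaa and the nine-character aaaaaaaa6 should be rejected.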
If you want a non-regex solution, I wrote this for a small project:
public static bool ContainsOneDigit(string s)
{
    if (String.IsNullOrWhiteSpace(s) || s.Length != 8)
        return false;
    int nb = 0;
    foreach (char c in s)
    {
        if (!Char.IsLetterOrDigit(c))
            return false;
        if (c >= '0' && c <= '9') // just thought, I could use Char.IsDigit() here ...
            nb++;
    }
    return nb == 1;
}

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of variable r I get True. If I add "=\w" after the \w it's False. I want the characters between # and = to be any alphanumeric value. Anything after the = sign can contain alphanumerics and ' (single quotes). What am I doing wrong here? I have very rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having issues.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of line
([Ii]|[Oo]) either (I or i) or (O or o)
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of line
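To see the named-group backreference at work, here is a minimal sketch that wraps the pattern above. ParamLineCheck and IsValid are illustrative names only.

```csharp
using System.Text.RegularExpressions;

public static class ParamLineCheck
{
    // (?<q>'?) captures an optional opening quote; \k<q> then requires the
    // same text (a quote, or nothing) after the value, so quotes must be balanced.
    private static readonly Regex Pattern =
        new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

    public static bool IsValid(string line) => line != null && Pattern.IsMatch(line);
}
```

Both sample strings from the question, I#parameter='test' and O#paramname=2827, should be accepted, while a value with only one of the two quotes, such as I#parameter='test, should be rejected.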
Use \w+ instead of \w to match one or more characters, or \w* to match zero or more.
Try this:
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you are being a bit more strict with using paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
You could try something like this:
Regex rx = new Regex(@"^([IO])#(\w+)=(.*)$", RegexOptions.IgnoreCase);
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches 1 or more non-quote characters; the second alternative matches a quoted string literal with any internal (embedded) quotes escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
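Putting the three groups and the stricter value alternative together might look like the sketch below. ParamLineParser and TryParse are hypothetical names, and anchoring the value with $ is an assumption of this sketch.

```csharp
using System.Text.RegularExpressions;

public static class ParamLineParser
{
    // Group 1: direction (I or O), group 2: parameter name,
    // group 3: value, either unquoted text or a quoted literal with '' as an escaped quote.
    private static readonly Regex Pattern = new Regex(
        @"^([IO])#(\w+)=(([^']+)|('(('')|([^']+))*'))$",
        RegexOptions.IgnoreCase);

    // Returns true and fills the out parameters when the line has the expected shape.
    public static bool TryParse(string line, out string direction, out string name, out string value)
    {
        Match m = Pattern.Match(line ?? "");
        direction = m.Success ? m.Groups[1].Value : null;
        name = m.Success ? m.Groups[2].Value : null;
        value = m.Success ? m.Groups[3].Value : null;
        return m.Success;
    }
}
```

The value comes back with its surrounding quotes still attached; stripping them and collapsing the doubled quotes is left to the caller.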

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in the above case it's 20) from this string. This sub-string can change dynamically at runtime. It might be any other number from 0 to infinity. To achieve this, my idea is to use a for loop that traverses the whole string: when a ( is found, it starts adding the characters to another character array, and when a ) is encountered, it stops adding characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution, or any function, that can do this efficiently?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
For strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
This way you'll have all you need in m.Groups[1].Value.
As suggested by I4V, assuming you have only that one sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
and in this case, you can get the value you are looking for with res.Value.
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.
Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
const string likes = "Likes (20)";
int open = likes.IndexOf('(');
int close = likes.IndexOf(')', open + 1);
int likesCount = int.Parse(likes.Substring(open + 1, close - open - 1)); // length is the distance between the brackets
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition regexp matches a substring, so unless you indicate otherwise the string you are looking for can occur at any place in your string.
\d stands for digits. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign is regexp for the previous symbol or group repeated 1 or more times.
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses we say that it should be between ( and ). These are special characters in regexp so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want the part inside the parentheses, and not the parentheses themselves, we tell regexp that the digits part is a single group.
We do that by using parentheses in our regexp. \((\d+)\) will still match (20), but now it will note that 20 is a subgroup of this match and we can fetch it with Match.Groups[].
For any string in parentheses things get a little bit harder.
Regex reg = new Regex(@"\((.+)\)");
This would work for many strings (the dot matches any character). But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2) with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to match "any character except a closing parenthesis":
Regex reg = new Regex(@"\(([^\)]+)\)");
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the correct tool for nested parentheses I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group then Saves solution is better.
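The "anything except a closing parenthesis" idea can be sketched as a small helper that returns every bracketed substring. BracketExtractor and Extract are illustrative names, not part of any answer above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketExtractor
{
    // \( and \) match literal brackets; ([^)]+) captures one or more
    // characters that are not a closing bracket.
    private static readonly Regex Pattern = new Regex(@"\(([^)]+)\)");

    // Returns the content of every non-empty (...) pair, in order of appearance.
    public static string[] Extract(string input) =>
        Pattern.Matches(input ?? "")
               .Cast<Match>()
               .Select(m => m.Groups[1].Value)
               .ToArray();
}
```

For "Likes (20)" this should return a single element, 20; for "This is an example(parantesis1)(parantesis2)" it should return parantesis1 and parantesis2 as separate elements. Nested brackets remain out of scope, as noted above.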

How to ignore regex matches in C#?

An input string:
string datar = "aag, afg, agg, arg";
I am trying to get matches: "aag" and "arg", but following won't work:
string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";
What is the correct way of ignoring regex matches in C#?
The obvious way is to use a[a-eh-z]g, but you could also try with a negative lookbehind like this :
string regr = "a[a-z](?<!f|g)g";
Explanation :
a Match the character "a"
[a-z] Match a single character in the range between "a" and "z"
(?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
f|g Match the character "f" or match the character "g"
g Match the character "g"
Character classes aren't quite that fancy. The simple solution is:
a[a-eh-z]g
If you really want to explicitly list out the letters that don't belong, you could try something like:
a[^\W\d_A-Zfg]g
This character class matches everything except:
\W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
\d removes digits so now we have letters and the underscore _.
_ removes the underscore so now we only match letters.
A-Z removes uppercase letters so now we only match lowercase letters.
Finally at this point we can list the individual lowercase letters we don't want to match.
All in all way more complicated than we'd likely ever want. That's regular expressions for ya!
What you're using is Java's set intersection syntax:
a[a-z&&[^fg]]g
..meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:
a[a-z-[fg]]g
...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').
Java demo:
String s = "aag, afg, agg, arg, a%g";
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
    System.out.println(m.group());
}
C# demo:
string s = @"aag, afg, agg, arg, a%g";
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
    Console.WriteLine(m.Value);
}
Output of both is
aag
arg
Try this if you want to match arg and aag:
a[ar]g
If you want to match everything except afg and agg, you need this regex:
a[^fg]g
It seems like you're trying to match any three alphabetic characters, with the condition that the second character cannot be f or g. If this is the case, why not use the following regular expression:
string regr = "a[a-eh-z]g";
Regex: a[a-eh-z]g.
Then use Regex.Matches to get the matched substrings.
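To compare the suggestions side by side, the sketch below runs any of the patterns over an input string and collects the matches. SecondLetterFilter and Find are made-up names for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SecondLetterFilter
{
    // Returns every substring of input matched by pattern, in order.
    public static string[] Find(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Calling Find("aag, afg, agg, arg, a%g", p) with p set to a[a-eh-z]g (explicit ranges), a[a-z-[fg]]g (.NET character class subtraction) or a[a-z](?<!f|g)g (negative lookbehind) should give aag and arg in each case. The pattern a[^fg]g differs: it would also pick up a%g, because the negated class allows any character other than f and g.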
