Help writing a regular expression - c#

I asked a very similar question to this one almost a month ago here.
I am trying very hard to understand regular expressions, but not a bit of it makes any sense. SLak's solution in that question worked well, but when I try to use the Regex Helper at http://gskinner.com/RegExr/ it only matches the first comma of -2.2,1.1-6.9,2.3-12.8,2.3 when given the regex ,|(?<!^|,)(?=-)
In other words I can't find a single regex tool that will even help me understand it. Well, enough whining. I'm now trying to re-write this regex so that I can do a Regex.Split() to split up the string 2.2 1.1-6.9,2.3-12.8 2.3 into -2.2, 1.1, -6.9, 2.3, -12.8, and 2.3.
The difference from the aforementioned question is that there can now be leading and/or trailing whitespace, and that whitespace can act as a delimiter, as can a comma.
I tried using \s|,|(?<!^|,)(?=-) but this doesn't work. I tried using this to split 293.46701,72.238185, but C# just tells me "the input string was not in a correct format". Please note that there is leading and trailing whitespace that SO does not display correctly.
EDIT: Here is the code which is executed, and the variables and values after execution of the code.

If it doesn't have to be Regex, and if it doesn't have to be slow :-) this should do it for you:
var components = "2.2 1.1-6.9,2.3-12.8 2.3".Replace("-", ",-")
    .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
Components would then contain: [2.2 1.1 -6.9 2.3 -12.8 2.3]

Does it need to be split? You could do Regex.Matches(text, @"\-?[\d]+(\.[\d]+)?").
If you need split, Regex.Split(text, @"[^\d.-]+|(?=-)") should work also.
P.S. I used Regex Hero to test on the fly http://regexhero.net
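A minimal sketch of the Split variant, assuming the input may have leading and trailing whitespace as the question describes. The class and method names are made up for illustration, and the empty-piece filter is an addition that the answer above does not mention: splitting on delimiters at the edges of the string produces empty strings, and those are probably what triggers the "input string was not in a correct format" error when parsed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Hypothetical helper; the pattern is the one suggested in the answer above.
    public static string[] SplitNumbers(string text) =>
        Regex.Split(text, @"[^\d.-]+|(?=-)")
             .Where(piece => piece.Length > 0) // drop empty pieces left by edge delimiters
             .ToArray();

    public static void Main()
    {
        foreach (var piece in SplitNumbers(" -2.2 1.1-6.9,2.3-12.8 2.3 "))
        {
            Console.WriteLine(piece);
        }
        // Prints -2.2, 1.1, -6.9, 2.3, -12.8 and 2.3, one per line.
    }
}
```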

Unless I'm missing the point entirely (it's Sunday night and I'm tired ;) ) I think you need to concentrate more on matching the things you do want and not the things you don't want.
Regex argsep = new Regex(@"\-?[0-9]+\.?[0-9]*");
string text_to_split = "-2.2 1.1-6.9,2.3-12.8 2.3 293.46701,72.238185";
var tmp3 = argsep.Matches(text_to_split);
This gives you a MatchCollection of each of the values you wanted.
To break that down and try and give you an understanding of what it's saying, split it up into parts:
\-? Matches a literal minus sign (\ denotes literal characters) zero or one time (?)
[0-9]+ Matches any character from 0 to 9, one or more times (+)
\.? Matches a literal full stop, zero or one time (?)
[0-9]* Matches any character from 0 to 9 again, but this time it's zero or more times (*)
You don't need to worry about things like \s (spaces) for this regex, as the things you're actually trying to match are the positive/negative numbers.
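As a rough, self-contained sketch of the same idea, the matches can be converted to numbers directly. The class and method names are hypothetical, and parsing with CultureInfo.InvariantCulture is an assumption so that "." is always read as the decimal separator regardless of the machine's locale.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchSketch
{
    // Hypothetical helper: find every signed decimal number and parse it.
    public static double[] ExtractNumbers(string text) =>
        Regex.Matches(text, @"-?[0-9]+\.?[0-9]*")
             .Cast<Match>()
             .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
             .ToArray();

    public static void Main()
    {
        var numbers = ExtractNumbers("-2.2 1.1-6.9,2.3-12.8 2.3 293.46701,72.238185");
        Console.WriteLine(numbers.Length); // 8 values are found
        foreach (var n in numbers)
        {
            Console.WriteLine(n.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Because only the numbers are matched, whitespace and commas never reach the parser, so there are no empty strings to trip it up.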

Consider using the string split function. String operations are way faster than regular expressions and much simpler to use/understand.

If the "Matches" approach doesn't work you could perhaps hack something in two steps?
Regex RE = new Regex(@"(-?[\d.]+)|,|\s+");
RE.Split(" -2.2,1.1-6.9,2.3-12.8,2.3 ")
    .Where(s => !string.IsNullOrEmpty(s));
Outputs:
-2.2
1.1
-6.9
2.3
-12.8
2.3

Related

Using RegEx, what's the best way to capture groups of digits, ignoring any whitespace in them

Given the following string...
ABC DEF GHI: 319 022 6543 QRS : 531 450
I'm trying to extract all ranges that start/end with a digit, and which may contain whitespace, but I want that whitespace itself removed.
For instance, the above should yield two results (since there are two 'ranges' that match what I am looking for)...
3190226543
531450
My first thought was this, but this matches the spaces between the letters...
([\d\s])
Then I tried this, but it didn't seem to have any effect...
([\d+\s*])
This one comes close, but it's grabbing the trailing spaces too. Also, this grabs the whitespace, but doesn't remove it.
(\d[\d\s]+)
If it's impossible to remove the spaces in a single statement, I can always post-process the groups if I can properly extract them. That most recent statement comes close, but how do I say it doesn't end with whitespace, but only a digit?
So what's the missing expression? Also, since sometimes people just post an answer, it would be helpful to explain out the RegEx too to help others figure out how to do this. I for one would love not just the solution, but an explanation. :)
Note: I know there can be some variations between RegEx on different platforms so that's fine if those differences are left up to the reader. I'm more interested in understanding the basic mechanics of the regex itself more so than the syntax. That said, if it helps, I'm using both Swift and C#.
You cannot get rid of whitespace from inside the match value within a single match operation. You will need to remove spaces as a post-processing step.
To match a string that starts with a digit and then optionally contains any amount of digits or whitespaces and then a digit you can use
\d(?:[\d\s]*\d)?
Details:
\d - a digit
(?:[\d\s]*\d)? - an optional non-capturing group matching
[\d\s]* - zero or more whitespaces / digits
\d - a digit.
See the regex demo.
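Since the question mentions C#, here is a small sketch of the two-step approach: match with the pattern above, then strip whitespace from each match as a post-processing step. The class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitGroups
{
    // Hypothetical helper: find digit runs that may contain inner whitespace,
    // then remove that whitespace from each match.
    public static string[] Extract(string text) =>
        Regex.Matches(text, @"\d(?:[\d\s]*\d)?")
             .Cast<Match>()
             .Select(m => Regex.Replace(m.Value, @"\s+", ""))
             .ToArray();

    public static void Main()
    {
        var groups = Extract("ABC DEF GHI: 319 022 6543 QRS : 531 450");
        Console.WriteLine(string.Join(", ", groups)); // prints: 3190226543, 531450
    }
}
```

The trailing \d in the pattern is what keeps a match from ending on whitespace: the greedy [\d\s]* backs off until the last character consumed is a digit.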

How to eliminate digits followed by specific string

I have quite a long regex pattern. Here is just a part of it:
string pattern = @"((?<!top=)(?<![A-Za-z])\d)+";
Given the string:
date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'&$select=IdAccountCurrent&$skip=10&$top=10
It matches 2019, 03, 25, 100, 10 and 0.
I want to eliminate the last 0 from the matching result. In other words, all numbers that directly follow top= should not match.
My solution works only if I have one digit after top=. How can I achieve the desired result?
regex101 example
UPDATE: Unfortunately, the suggested solutions are not suited for the whole pattern. I tried to make my example simple but it looks like it's impossible to do.
So my whole regex pattern is:
string pattern = @"((?<!top=)(?<![A-Za-z])\d|-|T\d+|:|\.|\+|(?<=\d)Z)+|\bfalse\b|\btrue\b|\bnull\b|'[^']+'|\(['\d][^\)]+\)";
I need to edit this pattern to eliminate all digits right after top=.
my whole example (please see the last row in this example, last 0 should not be matched)
Just add 0-9 to the lookbehind character class, so that a digit cannot start a match when it is preceded by another digit:
((?<!top=)(?<![A-Za-z0-9])\d+)
See here for a demo.
But you can also just use word boundaries:
(?<!top=)\b(\d+)
See here for a demo.
You can change your regex to this where I've used \b to reject the partial matching of digits,
(?<!top=)(?<![A-Za-z])\b\d+
Demo
The way you wrote your regex, ((?<!top=)(?<![A-Za-z])\d)+ applies the lookbehind conditions to each digit individually and then repeats that group one or more times, which would not have allowed using \b. Hence I removed the outer parentheses and used \b\d+. Hopefully this should give you all your desired matches. Let me know if you face any issues.
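A quick way to check the \b variant against the simplified example from the question is a sketch like the following. The class and method names are hypothetical, and this only covers the short pattern, not the full one from the update.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopFilterSketch
{
    // Hypothetical helper: numbers that are not directly preceded by "top=" or a letter.
    public static string[] FindNumbers(string query) =>
        Regex.Matches(query, @"(?<!top=)(?<![A-Za-z])\b\d+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        var query = "date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'&$select=IdAccountCurrent&$skip=10&$top=10";
        Console.WriteLine(string.Join(", ", FindNumbers(query))); // prints: 2019, 03, 25, 100, 10
    }
}
```

The 1 after top= is rejected by the lookbehind, and the following 0 is rejected because there is no word boundary between two digits, so nothing after top= is matched.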

Why is this regex not allowing this text?

I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if (!Regex.IsMatch(str, @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
    isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: if you prefix the string with @ (a verbatim string), you don't need to double the backslash. A single \ is enough.
Either:
@"\w"
or
"\\w"
Edit: I didn't make this clear: right now, due to the double escaping, you're looking for a literal \ followed by a w. So your input would need the form [some character]\w to match (example: "a\w" or "a\wwwwww" would match).
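To see the effect in isolation, here is a small sketch using simplified, anchored versions of the first alternative from the question's pattern. The class and constant names are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // In a verbatim string, \\w reaches the regex engine as "literal backslash, then w".
    public const string DoubleEscaped = @"^[a-zA-Z]\\w+$";
    // A single backslash gives the intended "word character" class.
    public const string SingleEscaped = @"^[a-zA-Z]\w+$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("baconman", DoubleEscaped)); // False
        Console.WriteLine(Regex.IsMatch(@"a\www", DoubleEscaped));   // True
        Console.WriteLine(Regex.IsMatch("baconman", SingleEscaped)); // True
    }
}
```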
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
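A possible LINQ version of the listed rules might look like this. It is only a sketch: allowing the underscore is an assumption carried over from the \w in the original pattern, and char.IsLetterOrDigit also accepts non-ASCII letters and digits, which may or may not be wanted.

```csharp
using System;
using System.Linq;

public static class UsernameRules
{
    public static bool IsValidUsername(string name) =>
        name != null
        && name.Length >= 5 && name.Length <= 17                  // between 5 and 17 characters
        && name.All(c => char.IsLetterOrDigit(c) || c == '_')     // no spaces or special characters (underscore allowed by assumption)
        && name.Any(char.IsLetter);                               // at least one letter

    public static void Main()
    {
        Console.WriteLine(IsValidUsername("baconman"));  // True
        Console.WriteLine(IsValidUsername("12345"));     // False: no letter
        Console.WriteLine(IsValidUsername("bacon man")); // False: contains a space
        Console.WriteLine(IsValidUsername("abc"));       // False: too short
    }
}
```

Each rule is one line, so adding or relaxing a restriction later does not require reworking a pattern.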
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It does 3 checks, each of which always results in exactly 1 character being matched (so we can perform the length check at the end):
Either the character is any word character \w which is before [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w which is after [a-zA-Z]

C# Regex Validation

Can someone please validate this for me (newbie of regex match cons).
Rather than asking the question, I am writing this:
Regex rgx = new Regex (@"^{3}[a-zA-Z0-9](\d{5})|{3}[a-zA-Z0-9](\d{9})$"
Can someone tell me if it's OK...
The accounts I am trying to match are either of:
1. BAA89345 (8 chars)
2. 12345678 (8 chars)
3. 123456789112 (12 chars)
Thanks in advance.
You can use a Regex tester. Plenty of free ones online. My Regex Tester is my current favorite.
Does the value always start with exactly three characters followed by digits, or can it start with fewer or more than three? If it can, what are the minimum and maximum number of characters allowed before the digits?
You need to place your quantifiers after the characters they are supposed to quantify. Also, character classes need to be wrapped in square brackets. This should work:
@"^(?:[a-zA-Z0-9]{3}|\d{3}\d{4})\d{5}$"
There are several good, automated regex testers out there. You may want to check out regexpal.
Although that may be a perfectly valid match, I would suggest rewriting it as:
^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$
which requires the string to match one of:
[a-zA-Z]{3}\d{5} three alpha and five numbers
\d{8} 8 digits or
\d{12} twelve digits.
Makes it easier to read, too...
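The rewritten pattern can be checked against the three sample accounts from the question with a sketch along these lines (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class AccountCheck
{
    // Three letters plus five digits, or exactly 8 digits, or exactly 12 digits.
    public static bool IsValidAccount(string account) =>
        Regex.IsMatch(account, @"^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$");

    public static void Main()
    {
        Console.WriteLine(IsValidAccount("BAA89345"));     // True
        Console.WriteLine(IsValidAccount("12345678"));     // True
        Console.WriteLine(IsValidAccount("123456789112")); // True
        Console.WriteLine(IsValidAccount("1234567"));      // False: only 7 digits
        Console.WriteLine(IsValidAccount("BA489345"));     // False: third character is not a letter
    }
}
```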
I'm not 100% on your objective, but there are a few problems I can see right off the bat.
When you list the acceptable characters to match, like with a-zA-Z0-9, you need to put them inside brackets, like [a-zA-Z0-9]. Using a ^ at the beginning will negate the contained characters, e.g. [^a-zA-Z0-9].
Word characters can be matched with \w, which is roughly equivalent to [a-zA-Z0-9_] (in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set).
Quantifiers need to appear at the end of the match expression. So, instead of {3}[a-zA-Z0-9], you would need to write [a-zA-Z0-9]{3} (assuming you want to match three instances of a character that matches [a-zA-Z0-9]).

Regular Expression to reject special characters other than commas

I am working in asp.net. I am using Regular Expression Validator
Could you please help me in creating a regular expression for not allowing special characters other than comma. Comma has to be allowed.
I checked in regexlib, however I could not find a match. I tried with ^(a-z|A-Z|0-9)*[^#$%^&*()']*$ . When I add other characters as invalid, it does not work.
Also could you please suggest me a place where I can find a good resource of regular expressions? regexlib seems to be big; but any other place which lists very limited but most used examples?
Also, can I create expressions using C# code? Any articles for that?
[\w\s,]+
works fine, as you can see below.
RegExr is a great place to test your regular expressions with real time results, it also comes with a very complete list of common expressions.
[] character class
\w Matches any word character (alphanumeric & underscore).
\s Matches any whitespace character (spaces, tabs, line breaks).
, includes the comma
+ is a greedy quantifier, which matches the preceding element 1 or more times.
[\d\w\s,]*
Just a guess
As for articles, I got started here and find it to be an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»
For a single character that is not a comma, [^,] should work perfectly fine.
You can try the [\w\s,] character class. It matches only alphanumeric characters, underscores, whitespace, and the comma. If any other character appears in the text, it won't match.
For your second question regarding regular expression resources, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of useful information.
Also, can I create expressions using C# code? Any articles for that?
By this, do you mean you want to know which classes and methods execute regular expressions? Or do you want a tool that will create regular expressions for you?
You can create expressions with C#, something like this usually does the trick:
// Letters, digits and commas only; inside [...] a space, | or / would be taken literally.
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
System.Console.Write("Enter Text");
String s = System.Console.ReadLine();
Match match = regex.Match(s);
if (match.Success)
{
    System.Console.WriteLine("True");
}
else
{
    System.Console.WriteLine("False");
}
System.Console.ReadLine();
You need to import the System.Text.RegularExpressions namespace.
The regular expression above, accepts only numbers, letters (both upper and lower case) and the comma.
For a small introduction to Regular Expressions, I think that the book for MCTS 70-536 can be of a big help, I am pretty sure that you can either download it from somewhere or obtain a copy.
I am assuming that you never messed around with regular expressions in C#, hence I provided the code above.
Hope this helps.
Thank you, all..
[\w\s,]* works
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies so quick..]
Thanks again
(…) denotes a group, not a character set. A character set is written with […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
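A short sketch of how that anchored pattern behaves from C# code, with invented class and method names. Note that because of the * quantifier an empty string also passes, so a required-field check would have to be done separately.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaValidator
{
    // Accepts only ASCII letters, digits and commas; the anchors force the whole input to conform.
    public static bool IsAllowed(string input) =>
        Regex.IsMatch(input, @"^[a-zA-Z0-9,]*$");

    public static void Main()
    {
        Console.WriteLine(IsAllowed("apples,oranges,42")); // True
        Console.WriteLine(IsAllowed("apples & oranges"));  // False: space and ampersand
        Console.WriteLine(IsAllowed("50%"));               // False: percent sign
        Console.WriteLine(IsAllowed(""));                  // True: zero characters is allowed by *
    }
}
```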
