How to exclude digits that follow a specific string - c#

I have quite a long regex pattern. Here is just a part of it:
string pattern = @"((?<!top=)(?<![A-Za-z])\d)+";
Given the string:
date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'&$select=IdAccountCurrent&$skip=10&$top=10
It matches 2019, 03, 25, 100, 10 and 0.
I want to eliminate the last 0 from the matching result. In other words, no digits that come right after top= should match.
My solution works only if there is a single digit after top=. How can I achieve the desired result?
UPDATE: Unfortunately, the suggested solutions don't work with the whole pattern. I tried to keep my example simple, but it looks like that's impossible.
So my whole regex pattern is:
string pattern = @"((?<!top=)(?<![A-Za-z])\d|-|T\d+|:|\.|\+|(?<=\d)Z)+|\bfalse\b|\btrue\b|\bnull\b|'[^']+'|\(['\d][^\)]+\)";
I need to edit this pattern to eliminate all digits right after top=.
In my whole example (please see the last row), the last 0 should not be matched.

Just add 0-9 to the negative lookbehind's character class, so that a digit cannot be preceded by another digit:
((?<!top=)(?<![A-Za-z0-9])\d+)
See here for a demo.
But you can also just use word boundaries:
(?<!top=)\b(\d+)
See here for a demo.
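A minimal C# sketch comparing the question's pattern with the two patterns from this answer on the sample string. TopFilterDemo and FindAll are hypothetical names used only for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class TopFilterDemo
{
    // Sample input from the question.
    public const string Sample =
        "date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'&$select=IdAccountCurrent&$skip=10&$top=10";

    // The question's pattern: the lookbehinds are tested before every single
    // digit, so the "0" of "top=10" (preceded by "1", not "top=") still matches.
    public const string Original = @"((?<!top=)(?<![A-Za-z])\d)+";

    // First suggestion: a run of digits may not start right after
    // "top=", a letter, or another digit.
    public const string NoLetterOrDigitBefore = @"((?<!top=)(?<![A-Za-z0-9])\d+)";

    // Second suggestion: require a word boundary before the digits instead.
    public const string WordBoundary = @"(?<!top=)\b(\d+)";

    // Returns the value of every match, in order.
    public static string[] FindAll(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

With these definitions, FindAll(TopFilterDemo.Original, TopFilterDemo.Sample) gives 2019, 03, 25, 100, 10 and 0, while both suggested patterns give 2019, 03, 25, 100 and 10.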

You can change your regex to the following, where I've used \b to reject partial matches of digits:
(?<!top=)(?<![A-Za-z])\b\d+
The way you wrote your regex, ((?<!top=)(?<![A-Za-z])\d)+, applies the lookbehind conditions to each digit individually and then repeats one or more such digits. That structure leaves no place to put \b, so I removed the outer parentheses and used \b\d+. Hopefully this gives you all your desired matches. Let me know if you face any issues.
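For the whole pattern from the UPDATE, one option not mentioned in the answers, assuming the code runs on .NET (whose regex engine allows variable-length lookbehind), is to widen (?<!top=) to (?<!top=\d*). A digit is then rejected whenever only digits stand between it and top=, so both digits of top=10 are excluded. WholePatternDemo and FindAll are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class WholePatternDemo
{
    // The question's whole pattern with one change: (?<!top=) became
    // (?<!top=\d*). .NET evaluates the lookbehind right to left, so \d*
    // can skip over the digits already seen after "top=".
    public const string Pattern =
        @"((?<!top=\d*)(?<![A-Za-z])\d|-|T\d+|:|\.|\+|(?<=\d)Z)+|\bfalse\b|\btrue\b|\bnull\b|'[^']+'|\(['\d][^\)]+\)";

    // Returns the value of every match, in order.
    public static string[] FindAll(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

On the question's sample string this should produce 2019-03-25, '100' and 10, with no trailing 0. The trade-off is portability: Python's re module, for instance, only accepts fixed-width lookbehind.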

Related

Using RegEx, what's the best way to capture groups of digits, ignoring any whitespace in them

Given the following string...
ABC DEF GHI: 319 022 6543 QRS : 531 450
I'm trying to extract all ranges that start/end with a digit, and which may contain whitespace, but I want that whitespace itself removed.
For instance, the above should yield two results (since there are two 'ranges' that match what I am looking for)...
3190226543
531450
My first thought was this, but this matches the spaces between the letters...
([\d\s])
Then I tried this, but it didn't seem to have any effect...
([\d+\s*])
This one comes close, but it's grabbing the trailing spaces too. Also, it grabs the whitespace but doesn't remove it.
(\d[\d\s]+)
If it's impossible to remove the spaces in a single statement, I can always post-process the groups if I can properly extract them. That most recent statement comes close, but how do I say it doesn't end with whitespace, but only a digit?
So what's the missing expression? Also, since sometimes people just post an answer, it would be helpful to explain the RegEx too, to help others figure out how to do this. I for one would love not just the solution, but an explanation. :)
Note: I know there can be some variations between RegEx on different platforms so that's fine if those differences are left up to the reader. I'm more interested in understanding the basic mechanics of the regex itself than the syntax. That said, if it helps, I'm using both Swift and C#.
You cannot get rid of whitespace from inside the match value within a single match operation. You will need to remove spaces as a post-processing step.
To match a string that starts with a digit and then optionally contains any amount of digits or whitespaces and then a digit you can use
\d(?:[\d\s]*\d)?
Details:
\d - a digit
(?:[\d\s]*\d)? - an optional non-capturing group matching
[\d\s]* - zero or more whitespaces / digits
\d - a digit.
See the regex demo.
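A short C# sketch of the match-then-post-process approach described in this answer: find each range with \d(?:[\d\s]*\d)?, then strip whitespace from every match value. DigitRangeDemo and ExtractDigitRanges are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class DigitRangeDemo
{
    // A digit, optionally followed by digits/whitespace that end in a digit,
    // so a match never starts or ends with whitespace.
    const string Range = @"\d(?:[\d\s]*\d)?";

    // Post-processing step: a regex cannot drop characters from inside a
    // match, so whitespace is removed from each match value afterwards.
    public static string[] ExtractDigitRanges(string input) =>
        Regex.Matches(input, Range).Cast<Match>()
            .Select(m => Regex.Replace(m.Value, @"\s+", ""))
            .ToArray();
}
```

For "ABC DEF GHI: 319 022 6543 QRS : 531 450" this returns 3190226543 and 531450.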

regex find one or two digits but not three digits in a string

I would like to find the match for:
\024jack3hall2\c$
\024jack3hall02\c$
\024jack3hall12\c$
but not for:
\024jack3hall023\c$
The difference is the number of digits in the last part: I want to allow only 1 or 2, not 3.
My try:
\\\\024[a-zA-Z0-9]+[0-9]{1,2}\\[a-zA-Z]{1}\$(?!.)
I tried only on http://regexr.com/ but will implement in C#.
Is it possible to edit my try or I have to write several separate checks?
Why is {1,2} not working? \024jack3hall12343\c$ also matches.
From the examples you have shown, something as simple as:
[^\d](\d{1,2})\\
Should work. It will match 1 or 2 digits followed by a \ as long as they aren't preceded by another digit.
The matched digits are in a capture group if you need them (or you can just remove the brackets if you don't need that).
As for your original effort, right here:
\\\\024[a-zA-Z0-9]+[0-9]{1,2}
You are matching 1 or more characters from the ranges a-z, A-Z or 0-9, so [a-zA-Z0-9]+ will also consume your extra digits when they come at the end of that part, leaving only the last 1 or 2 for [0-9]{1,2}.
Answer:
\\\\024[a-zA-Z0-9]+[^\d](\d{1,2})\\[a-zA-Z]{1}\$(?!.)
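A C# sketch that checks both patterns from this answer against the question's examples. It assumes the real inputs are UNC-style paths starting with two backslashes (for example \\024jack3hall2\c$), which is what the leading \\\\ in the full pattern expects; the short pattern works either way. ShareNameDemo and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class ShareNameDemo
{
    // Short answer: 1 or 2 digits followed by a backslash; the character
    // before the digits must not be a digit. Group 1 holds the digits.
    const string Short = @"[^\d](\d{1,2})\\";

    // Full answer: the asker's pattern with [^\d] inserted, so the 1 or 2
    // digits must directly follow a non-digit and [a-zA-Z0-9]+ can no longer
    // absorb extra digits in front of them.
    // The leading \\\\ matches two literal backslashes (assumed UNC-style input).
    const string Full = @"\\\\024[a-zA-Z0-9]+[^\d](\d{1,2})\\[a-zA-Z]{1}\$(?!.)";

    public static bool MatchesShort(string path) => Regex.IsMatch(path, Short);

    public static bool MatchesFull(string path) => Regex.IsMatch(path, Full);

    // Returns the 1- or 2-digit number before the share name, or "" if none.
    public static string TrailingDigits(string path)
    {
        Match m = Regex.Match(path, Full);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```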
I believe you were not escaping backslash properly.
Here is the correct regex:
\\024[a-zA-Z0-9]+[0-9]{1,2}\\[a-zA-Z]{1}\$(?!.)

C# Regex Validation

Can someone please validate this for me? (I'm new to regex matching.)
Rather than just asking the question, here is what I wrote:
Regex rgx = new Regex(@"^{3}[a-zA-Z0-9](\d{5})|{3}[a-zA-Z0-9](\d{9})$");
Can someone tell me if it's OK...
The accounts I am trying to match are either of:
1. BAA89345 (8 chars)
2. 12345678 (8 chars)
3. 123456789112 (12 chars)
Thanks in advance.
You can use a Regex tester. Plenty of free ones online. My Regex Tester is my current favorite.
Does the value always start with exactly three characters followed by digits, or can it start with fewer or more than three? If it can vary, what are the minimum and maximum numbers of characters before the digits?
You need to place your quantifiers after the characters they are supposed to quantify. Also, character classes need to be wrapped in square brackets. This should work:
@"^(?:[a-zA-Z0-9]{3}|\d{3}\d{4})\d{5}$"
There are several good, automated regex testers out there. You may want to check out regexpal.
Although that may be a perfectly valid match, I would suggest rewriting it as:
^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$
which requires the string to match one of:
[a-zA-Z]{3}\d{5} - three letters followed by five digits,
\d{8} - eight digits, or
\d{12} - twelve digits.
Makes it easier to read, too...
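A minimal C# sketch of the suggested ^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$ check; AccountNumberDemo and IsValid are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class AccountNumberDemo
{
    // Three letters followed by five digits, or exactly 8 digits,
    // or exactly 12 digits.
    const string Pattern = @"^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$";

    public static bool IsValid(string account) => Regex.IsMatch(account, Pattern);
}
```

One .NET detail worth knowing: $ also matches just before a final newline, so "12345678\n" passes this check; use \z instead of $ if that matters.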
I'm not 100% on your objective, but there are a few problems I can see right off the bat.
When you list the acceptable characters to match, like a-zA-Z0-9, you need to put them inside brackets, like [a-zA-Z0-9]. Using a ^ at the beginning will negate the contained characters, e.g. [^a-zA-Z0-9].
Word characters can be matched with \w, which covers [a-zA-Z0-9_] (in .NET it also matches non-ASCII letters and digits unless you use RegexOptions.ECMAScript).
Quantifiers need to appear at the end of the match expression. So, instead of {3}[a-zA-Z0-9], you would need to write [a-zA-Z0-9]{3} (assuming you want to match three instances of a character that matches [a-zA-Z0-9]).

Using regex to match any character until a substring is reached?

I'd like to be able to match a specific sequence of characters, starting with a particular substring and ending with a particular substring. My positive lookahead regex works if there is only one instance to match on a line, but not if there should be multiple matches on a line. I understand this is because (.*) captures everything up to the last place where the positive lookahead matches. I'd like it to capture only up to the first one.
Here is my regex attempt:
##FOO\[(.*)(?=~~)~~(.*)(?=\]##)\]##
Sample input:
##FOO[abc~~hi]## ##FOO[def~~hey]##
Desired output: 2 matches, with 2 matching groups each (abc, hi) and (def, hey).
Actual output: 1 match with 2 groups (abc~~hi]## ##FOO[def, hey)
Is there a way to get the desired output?
Thanks in advance!
Use the question mark: it makes the quantifier lazy, so it matches as few times as possible.
##FOO\[(.*?)(?=~~)~~(.*?)(?=\]##)\]##
This one also works and is easier to read, although it is not as strict:
##FOO\[(.*?)~~(.*?)\]##
The * operator is greedy by default, meaning it eats up as much of the string as possible while still leaving enough to match the remaining regex. You can make it non-greedy (lazy) by appending a ? to it. Make sure to read about the differences between greedy and lazy quantifiers.
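A C# sketch contrasting the greedy pattern from the question with the simpler lazy pattern on the sample input. LazyQuantifierDemo and Pairs are hypothetical names; each result is the two captured groups joined by a comma.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class LazyQuantifierDemo
{
    // The question's pattern: greedy (.*) runs to the LAST "~~" it can use.
    public const string Greedy = @"##FOO\[(.*)(?=~~)~~(.*)(?=\]##)\]##";

    // The simpler lazy pattern: (.*?) stops at the FIRST "~~" and "]##".
    public const string Lazy = @"##FOO\[(.*?)~~(.*?)\]##";

    // Returns "group1,group2" for every match of the pattern.
    public static string[] Pairs(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>()
            .Select(m => m.Groups[1].Value + "," + m.Groups[2].Value)
            .ToArray();
}
```

On "##FOO[abc~~hi]## ##FOO[def~~hey]##" the greedy pattern yields one match (abc~~hi]## ##FOO[def, hey), while the lazy one yields (abc, hi) and (def, hey).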
You could use the String.IndexOf() method instead to find the first occurrence of your substring.
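If you go the String.IndexOf route, a sketch might look like the following. It assumes the markers are always ##FOO[, ~~ and ]## in that order and simply stops at the first incomplete section; IndexOfDemo and Pairs are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class IndexOfDemo
{
    // Walks the input with String.IndexOf, taking each
    // "##FOO[" ... "~~" ... "]##" section in order (no regex involved).
    public static List<(string First, string Second)> Pairs(string input)
    {
        const string Start = "##FOO[", Separator = "~~", End = "]##";
        var result = new List<(string First, string Second)>();
        int pos = 0;
        while (true)
        {
            int start = input.IndexOf(Start, pos, StringComparison.Ordinal);
            if (start < 0) break;
            int firstBegin = start + Start.Length;
            int sep = input.IndexOf(Separator, firstBegin, StringComparison.Ordinal);
            if (sep < 0) break;
            int secondBegin = sep + Separator.Length;
            int end = input.IndexOf(End, secondBegin, StringComparison.Ordinal);
            if (end < 0) break;
            result.Add((input.Substring(firstBegin, sep - firstBegin),
                        input.Substring(secondBegin, end - secondBegin)));
            pos = end + End.Length; // continue after this section's "]##"
        }
        return result;
    }
}
```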

Help writing a regular expression

I asked a very similar question to this one almost a month ago here.
I am trying very hard to understand regular expressions, but not a bit of it makes any sense. SLak's solution in that question worked well, but when I try to use the Regex Helper at http://gskinner.com/RegExr/ it only matches the first comma of -2.2,1.1-6.9,2.3-12.8,2.3 when given the regex ,|(?<!^|,)(?=-)
In other words I can't find a single regex tool that will even help me understand it. Well, enough whining. I'm now trying to re-write this regex so that I can do a Regex.Split() to split up the string 2.2 1.1-6.9,2.3-12.8 2.3 into -2.2, 1.1, -6.9, 2.3, -12.8, and 2.3.
The difference from the aforementioned question is that there can now be leading and/or trailing whitespace, and whitespace can act as a delimiter, as can a comma.
I tried using \s|,|(?<!^|,)(?=-) but this doesn't work. I tried using this to split 293.46701,72.238185, but C# just tells me "the input string was not in a correct format". Please note that there is leading and trailing whitespace that SO does not display correctly.
EDIT: Here is the code which is executed, and the variables and values after execution of the code.
If it doesn't have to be Regex, and if it doesn't have to be slow :-) this should do it for you:
var components = "2.2 1.1-6.9,2.3-12.8 2.3".Replace("-", ",-")
    .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
Components would then contain: [2.2 1.1 -6.9 2.3 -12.8 2.3]
Does it need to be split? You could do Regex.Matches(text, @"\-?[\d]+(\.[\d]+)?").
If you need to split, Regex.Split(text, @"[^\d.-]+|(?=-)") should also work.
P.S. I used Regex Hero to test on the fly http://regexhero.net
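A C# sketch of the Regex.Split suggestion. Split can return empty strings (for example around leading or trailing whitespace, or when the text starts with a minus sign), so this version filters them out; SplitDemo and SplitNumbers are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Split on runs of anything that is not a digit, '.', or '-', and also
    // (zero-width) right before every '-', then drop empty pieces.
    public static string[] SplitNumbers(string text) =>
        Regex.Split(text, @"[^\d.-]+|(?=-)")
            .Where(s => s.Length > 0)
            .ToArray();
}
```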
Unless I'm missing the point entirely (it's Sunday night and I'm tired ;) ) I think you need to concentrate more on matching the things you do want and not the things you don't want.
Regex argsep = new Regex(@"\-?[0-9]+\.?[0-9]*");
string text_to_split = "-2.2 1.1-6.9,2.3-12.8 2.3 293.46701,72.238185";
var tmp3 = argsep.Matches(text_to_split);
This gives you a MatchCollection of each of the values you wanted.
To break that down and try and give you an understanding of what it's saying, split it up into parts:
\-? Matches a literal minus sign (\ denotes literal characters) zero or one time (?)
[0-9]+ Matches any character from 0 to 9, one or more times (+)
\.? Matches a literal full stop, zero or one time (?)
[0-9]* Matches any character from 0 to 9 again, but this time it's zero or more times (*)
You don't need to worry about things like \s (spaces) for this regex, as the things you're actually trying to match are the positive/negative numbers.
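A C# sketch that runs the argsep pattern and converts each match to a double. Parsing with CultureInfo.InvariantCulture is an assumption added here so that '.' is treated as the decimal separator regardless of the machine's regional settings; NumberMatchDemo and ParseAll are hypothetical names.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

static class NumberMatchDemo
{
    // The answer's pattern: optional minus, digits, optional '.', optional digits.
    static readonly Regex ArgSep = new Regex(@"\-?[0-9]+\.?[0-9]*");

    // Converts every match to a double, always using '.' as the decimal point.
    public static double[] ParseAll(string text) =>
        ArgSep.Matches(text).Cast<Match>()
            .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToArray();
}
```

For "-2.2 1.1-6.9,2.3-12.8 2.3 293.46701,72.238185" this returns eight values, from -2.2 through 72.238185.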
Consider using the String.Split method. For simple delimiters, string operations are usually faster than regular expressions and much simpler to use and understand.
If the "Matches" approach doesn't work, you could perhaps hack something in two steps:
Regex RE = new Regex(@"(-?[\d.]+)|,|\s+");
RE.Split(" -2.2,1.1-6.9,2.3-12.8,2.3 ")
.Where(s=>!string.IsNullOrEmpty(s))
Outputs:
-2.2
1.1
-6.9
2.3
-12.8
2.3
