I want a regular expression that accepts inputs such as:
1+2
3
1+22+3
But the following inputs should not be allowed:
+1+2
1+
a+1+b+c
12+
The string must start with a number, and numbers may only be separated by the + character. After every + character there has to be another number.
I tried [^0-9][^+]?. It does delete the + sign at the start, but there is a problem: while deleting the + character, it also removes the number next to it, and this keeps repeating.
How can I do this?
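Please try:
^\d+(\+\d+)*$
Demo: https://regex101.com/r/hfqmYr/2
Where:
^ -> Matches the start of the string
\d -> Matches any digit
\+ -> Matches a literal + character
+ -> Matches the preceding token one or more times
* -> Matches the preceding token zero or more times
$ -> Matches the end of the string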
As mentioned in the comments, it looks like you can use:
^[0-9]+(?:\+[0-9]+)*$
This allows the sample data you listed and rejects the inputs you don't want to allow. See an online demo. The pattern matches:
^ - Start line anchor.
[0-9]+ - 1+ Digits (ASCII).
(?:\+[0-9]+)* - 0+ Times a non-capture group to allow for a literal plus followed by 1+ digits (ASCII).
$ - End line anchor.
As far as I know, \d in .NET also matches digits from other scripts, so you need to spell out the ASCII digits as [0-9] to avoid matching them (unless you use the RegexOptions.ECMAScript option).
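As a rough check of the anchored pattern against the inputs from the question, here is a minimal C# sketch. It assumes a C# 9+ top-level program; the variable names are illustrative only. The last part shows the \d versus [0-9] difference using the Arabic-Indic digit three as a sample non-ASCII digit.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern from the answer: digits, then zero or more "+digits" groups.
var plusList = new Regex(@"^[0-9]+(?:\+[0-9]+)*$");

foreach (var input in new[] { "1+2", "3", "1+22+3", "+1+2", "1+", "a+1+b+c", "12+" })
{
    Console.WriteLine($"{input,-8} -> {plusList.IsMatch(input)}");
}

// By default \d is Unicode-aware in .NET; RegexOptions.ECMAScript limits it to [0-9].
string arabicIndicThree = "\u0663";
bool unicodeDigit = Regex.IsMatch(arabicIndicThree, @"^\d+$");
bool asciiOnly = Regex.IsMatch(arabicIndicThree, @"^\d+$", RegexOptions.ECMAScript);
Console.WriteLine($"default: {unicodeDigit}, ECMAScript: {asciiOnly}");
```

The first three inputs should print True and the remaining four False.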
I have the following input strings and I need a regex to validate the input.
test.test = OK
test.test.1 = OK
test.text* = OK
test.test. = NO
test.test.* = NO
test = NO
This is my regex. It runs, but it does not validate the input as I want:
^[a-z0-9*.\-_\.:]+$
How can I get it to work?
You may use
^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9*_:-]+)+$
See the regex demo (at regexstorm, line endings are CRLF and \r? is used for the multiline string demo purpose only).
Details
^ - start of string
(?!.*[.*]{2}) - no two consecutive characters from the set . and * are allowed anywhere in the string
[a-z0-9*_:-]+ - 1 or more ASCII lowercase letters, digits, *, _, : or -
(?:\.[a-z0-9*_:-]+)+ - 1 or more consecutive occurrences of a dot (\.) followed by 1 or more ASCII lowercase letters, digits, *, _, : or - ([a-z0-9*_:-]+)
$ - end of string.
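To see how this pattern treats the six samples from the question, a small C# sketch follows (C# 9+ top-level program assumed; variable names are illustrative).

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: no two consecutive '.'/'*' characters, and at least one dot-separated part.
var dotted = new Regex(@"^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9*_:-]+)+$");

string[] samples = { "test.test", "test.test.1", "test.text*", "test.test.", "test.test.*", "test" };
foreach (var sample in samples)
{
    Console.WriteLine($"{sample,-12} = {(dotted.IsMatch(sample) ? "OK" : "NO")}");
}
```

The first three samples should come out as OK and the last three as NO, matching the list in the question.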
I think this will solve your problem.
^[a-z0-9]+\.[a-z0-9]+((\.[a-z0-9]+)|\*)?$
Explanation
^ - Start of the string.
[a-z0-9] - any lowercase letter or digit.
+ - one or more of the preceding token.
\. - matches a literal . (period).
((\.[a-z0-9]+)|\*)? - an optional group that is either:
(\.[a-z0-9]+) - a literal . followed by one or more lowercase letters or digits, or
\* - a literal asterisk.
? - makes the preceding group optional.
$ - Anchor to the end of line
From your valid and invalid samples, I conclude the following:
The text contains word characters.
Groups of word characters can be separated by a single dot, like abc.xyz or aaa.bbb.ccc.
A dot can't be the first or last character, so .abc and abc.aaa. are not OK.
Optionally, a star (asterisk) can appear, but only as the last character. Hence test.text* is fine because test.text is fine, but test.text.* is not fine because test.text. is not fine.
Considering these rules, you can use the following regex:
^\w+(\.\w+)*(?<!\.)[*]?$
Explanation:
^ --> Start of string
\w+ --> match a word of one or more length
(\.\w+)* --> match zero or more groups, each a single literal dot followed by one or more word characters
(?<!\.)[*]? --> an optional asterisk at the end of the string, which must not be preceded by a literal dot
$ --> End of input.
Demo
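One thing to watch with the \w-based pattern: because (\.\w+)* allows zero dot groups, it also accepts a bare test, which the question lists as NO. The sketch below compares it with a variant that uses (\.\w+)+ instead. That variant is an assumption about the intended rule (at least one dot required), not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern as given in the answer.
var original = new Regex(@"^\w+(\.\w+)*(?<!\.)[*]?$");
// Assumed variant: '+' instead of '*' so that at least one ".word" group is required.
var requireDot = new Regex(@"^\w+(\.\w+)+(?<!\.)[*]?$");

string[] samples = { "test.test", "test.test.1", "test.text*", "test.test.", "test.test.*", "test" };
foreach (var sample in samples)
{
    Console.WriteLine($"{sample,-12} original={original.IsMatch(sample)} requireDot={requireDot.IsMatch(sample)}");
}
```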
Given 2 different lines I'm parsing, I need to extract the data points into regex match groups.
Example Line 1:
Header values are as follows:
DATE{space}TYPE{space}DESCR{space}VOLUME{space}RATE{space}TOTAL
[11/30/15] [CF] [DISC 1] [28270.18] [0.00150] [-42.41]
Example Line 2:
DATE{space}TYPE{space}DESCR{space}VOLUME{space}RATE{space}TOTAL
[11/30/15] [CF] [OTHER VOLUME FEES] [28186.68] [0.00008] [-2.25]
I'm using the following regex to get matches:
(?<date>^\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}[\d+])\s+(?<type>[A-Za-z]{2})\s+(?<descr>\w+\s+.*?(1))\s+.*?(?<volume>(\d+(?:\.\d+?))\s+.*?(?<rate>([0]?(\d+(?:\.\d+)?)))\s+(?<total>[-+]?\d+[.,]\d+)?.*$")
I can match the first case, but never the second. There will always be a total, but there may NOT always be a volume or rate. In addition, volume can be a whole number, a decimal or a code (e.g. "1B").
What am I missing here?
The description field is an open field and may contain "1". It can have several words in it, or just one.
Your log lines contain 6 fields, but the 4th and 5th can be missing. A common way to match optional fields is to use an optional non-capturing group, (?:...)?. These groups do not create a separate memory buffer for the text they match, which keeps matching cleaner and more efficient.
NOTE that in .NET, there is a way to make all non-named capturing groups non-capturing by use of RegexOptions.ExplicitCapture option.
Your fixed regex may look like
^(?<date>\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2})\s+(?:(?<type>[A-Z]{2})\s+)?(?:(?<descr>\w.*?)\s+)?(?:(?<volume>\d*\.?\d+)\s+)?(?:(?<rate>\d*\.?\d+)\s+)?(?<total>[-+]?\d*[.,]?\d+)\s*$
See the .NET regex demo.
Details
^ - start of a line (when RegexOptions.Multiline is used)
(?<date>\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}) - Group "date": 1-2 digits and then 2 repetitions of -, / or . followed by 1-2 digits (thus, this pattern can be written as (?<date>\d{1,2}(?:[-/.]\d{1,2}){2})).
\s+ - 1 or more whitespaces
(?:(?<type>[A-Z]{2})\s+)? - an optional group matching 2 uppercase ASCII letters, captured into Group "type", and then 1+ whitespaces
(?:(?<descr>\w.*?)\s+)? - an optional group matching a word char (a letter, digit, _ or some other special chars such as diacritics) followed by any 0+ chars other than a newline char (LF), as few as possible, all this captured into Group "descr", and then 1+ whitespaces
(?:(?<volume>\d*\.?\d+)\s+)? - an optional group matching 0+ digits, an optional . and then 1+ digits (that is, floats or integers) captured into Group "volume", then 1+ whitespace chars
(?:(?<rate>\d*\.?\d+)\s+)? - an optional group matching a float or integer values captured into Group "rate", and then 1+ whitespace chars
(?<total>[-+]?\d*[.,]?\d+) - Group "total": an optional - or + followed with 0+ digits, an optional . or , and then 1+ digits (so, positive or negative floats or integers are matched)
\s* - any 0+ trailing whitespaces
$ - end of the line.
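The following C# sketch runs the pattern above over the two example lines and reads the named groups. It assumes the square brackets in the question only mark the fields and are not part of the data. The third line, without volume and rate, is a made-up example.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer, split across lines for readability only.
var lineRegex = new Regex(
    @"^(?<date>\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2})\s+(?:(?<type>[A-Z]{2})\s+)?" +
    @"(?:(?<descr>\w.*?)\s+)?(?:(?<volume>\d*\.?\d+)\s+)?" +
    @"(?:(?<rate>\d*\.?\d+)\s+)?(?<total>[-+]?\d*[.,]?\d+)\s*$");

string[] lines =
{
    "11/30/15 CF DISC 1 28270.18 0.00150 -42.41",
    "11/30/15 CF OTHER VOLUME FEES 28186.68 0.00008 -2.25",
    "11/30/15 CF WIRE FEE -2.25", // hypothetical line without volume and rate
};

foreach (var line in lines)
{
    Match m = lineRegex.Match(line);
    if (!m.Success)
    {
        Console.WriteLine($"no match: {line}");
        continue;
    }
    // Optional groups that did not participate report Success == false.
    string volume = m.Groups["volume"].Success ? m.Groups["volume"].Value : "(none)";
    string rate = m.Groups["rate"].Success ? m.Groups["rate"].Value : "(none)";
    Console.WriteLine($"{m.Groups["date"].Value} | {m.Groups["type"].Value} | {m.Groups["descr"].Value} | {volume} | {rate} | {m.Groups["total"].Value}");
}
```

Note that a description ending in a number followed directly by the total (no volume or rate) is ambiguous with this pattern, since the lazy descr group gives that number up to volume.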
(?<date>^\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}[\d+])\s+(?<type>[A-Z]{2})\s+(?<descr>\w+.*?\s+)(?<volume>\d+[.]?\d+)\s+(?<rate>\d+[.]?\d+)\s+(?<total>[-+]?\d+[.,]\d+?.*$)
Yes. This is a fairly complex regex. But if you have varying spaces inside your grouping, you can use .*?\s+ to end on the last space. This seems to work nicely for all the use cases I have.
Thanks for your comments!
This question may sound stupid. But I have tried several options and none of them worked.
I have the following in a string variable.
string myText="*someText*someAnotherText*";
What I mean by the above is that there can be 0 or more characters before "someText", 0 or more characters after "someText" and before "someAnotherText", and finally 0 or more characters after "someAnotherText".
I tried the following.
string res= Regex.Replace(searchFor.ToLower(), "*", @"\S*");
It didn't work.
Then I tried the following.
string res= Regex.Replace(searchFor.ToLower(), "*", @"\*");
Even that didn't work.
Can someone help, please?
Even though I have mentioned "*" to indicate 0 or more occurrences, it says that I haven't mentioned the number of occurrences.
Unlike the DOS wildcard character, the * character in a regular expression means repeat the previous item (character, group, whatever) 0 or more times. In your regular expression the first * has no preceding character, the second one follows the t character, so will repeat that any number of times.
To get '0 or more of any character' you need to use the composition .* where . is 'any character' and * is '0 or more times'.
In other words, to search for someText followed, any number of characters later, by someAnotherText, you would use the following Regex:
var re = new Regex(@"someText.*someAnotherText");
Note that unless you specify otherwise by putting start/end specifiers in (^ for start of string, $ for end) the Regex will match any substring of the test string.
Tests for the above, all returning true:
re.IsMatch("This is someText, followed by someAnotherText with text after.");
re.IsMatch("someTextsomeAnotherText");
re.IsMatch("start:someTextsomeAnotherText:end");
And so on.
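If the goal is to turn a DOS-style wildcard such as *someText*someAnotherText* into a regex, one possible approach is to escape the whole string and then replace each escaped star with .*. The helper name WildcardToRegex below is hypothetical, and anchoring plus case-insensitive matching are assumptions about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: converts a wildcard pattern where '*' means "any characters" into a Regex.
static Regex WildcardToRegex(string wildcard)
{
    // Regex.Escape turns '*' into "\*"; swap each of those for ".*" (0 or more of any character).
    string body = Regex.Escape(wildcard).Replace(@"\*", ".*");
    // Anchors make the wildcard apply to the whole input (an assumption, as is IgnoreCase).
    return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
}

Regex wildcardRegex = WildcardToRegex("*someText*someAnotherText*");
Console.WriteLine(wildcardRegex.IsMatch("This is someText, followed by someAnotherText with text after."));
Console.WriteLine(wildcardRegex.IsMatch("someTextsomeAnotherText"));
Console.WriteLine(wildcardRegex.IsMatch("someAnotherText only"));
```

By default . does not match a newline, so add RegexOptions.Singleline if the input can span several lines.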
In Regex terms * is a quantifier. Other quantifiers are:
? Match 0 or 1
+ Match 1 or more
{n} Match 'n' times
{n,} Match at least 'n' times
{n,m} Match 'n' to 'm' times
All apply to the preceding term in the Regex.
Placing a ? after another quantifier (including ?) converts it to its lazy form, where it matches as few items as it can. This lets the terms that follow match text that the quantifier would otherwise have consumed.
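A short C# sketch of the greedy versus lazy difference, using a made-up input string:

```csharp
using System;
using System.Text.RegularExpressions;

string tags = "<a><b>";

// Greedy: .+ takes as much as it can, so the match runs to the last '>'.
string greedy = Regex.Match(tags, "<.+>").Value;
// Lazy: .+? takes as little as it can, so the match stops at the first '>'.
string lazy = Regex.Match(tags, "<.+?>").Value;

Console.WriteLine(greedy); // <a><b>
Console.WriteLine(lazy);   // <a>
```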
The regular expression to match 0 or more occurrences of any character is
.*
where . matches any single character and * matches zero or more occurrences of it.
(This answer is a quick reference simplification of the current answer.)
I am new to working with regexes in C# .NET. Say I have a string as follows...
"Working on log #4"
And within this string we can expect the number (4) to vary. How can I use a Regex to extract only that number from the string?
I want to make sure that the string matches the first part:
"Working on log #"
And then extract the integer from it.
Also - I know that I could do this using string.Split(), or .Substring, etc. I just wanted to know how I might use regex's to do this.
Thanks!
"Working on log #(\d+)"
The () create a match group, so you will be able to extract that section.
The \d matches any digit.
The + says "look at the previous token, match it one or more times" so it will make it match one or more digits.
So overall you're capturing a group containing one or more digits, where that group comes after "Working on log #"
Regex rgx = new Regex("Working on log #[0-9]"); is the pattern you want to use. The first part is a string literal; [0-9] says that the character can be any value 0 through 9. If you allow multiple digits, change it to [0-9]{x}, where x is the number of repetitions, or to [0-9]+, since a + after a token means 1 or more of that token is allowed.
You could also just do string.StartsWith("Working on log #") then split on # and use int.TryParse() with the second value to confirm it is in fact a valid integer.
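Try this: (?<=^Working on log #)\d+$. The ^ has to sit inside the lookbehind, because nothing can precede the start of the string. This matches only the number, so there is no need for a capture group. Remove ^ and $ if this is within a larger string.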
^ - start of string
(?<=) - positive lookbehind - ensures what is between = and ) is found before
\d+ - at least one digit
$ - end of string
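A minimal C# sketch of the lookbehind approach. It uses the form with ^ inside the lookbehind for the whole-string case and the unanchored form for a larger string. The second input string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Whole string: the prefix must start at the beginning and the digits must end the string.
Match whole = Regex.Match("Working on log #4", @"(?<=^Working on log #)\d+$");
// Larger string: no anchors, the lookbehind only requires the prefix right before the digits.
Match inside = Regex.Match("Status: Working on log #42, retrying", @"(?<=Working on log #)\d+");

if (whole.Success) Console.WriteLine(int.Parse(whole.Value));   // 4
if (inside.Success) Console.WriteLine(int.Parse(inside.Value)); // 42
```

Because the lookbehind does not consume text, the match value itself is just the digits.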
A capturing group is the solution:
"Working on log #(?<Number>[0-9]+)"
Then you can access the matched groups using the Match.Groups property.
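For example, a small C# sketch that reads the named group through Match.Groups (the input value 17 is arbitrary):

```csharp
using System;
using System.Text.RegularExpressions;

Match logMatch = Regex.Match("Working on log #17", @"Working on log #(?<Number>[0-9]+)");

int logNumber = -1; // stays -1 when the input does not match
if (logMatch.Success)
{
    // The named group holds only the digits, not the literal prefix.
    logNumber = int.Parse(logMatch.Groups["Number"].Value);
}
Console.WriteLine(logNumber); // 17
```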
I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex expression though. So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing with the dash preceding the number, while taking into account that the number following the dash could be negative? I didn't think of this when I first wrote the question. For example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ makes this a verbatim string literal, which eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind, which makes sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
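Both lookarounds can be tried on the inputs from the question with a short C# sketch (the dash is written without a backslash here, since it needs no escaping outside a character class):

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead: digits that are immediately followed by a dash.
string beforeDash = Regex.Match("test test 310-315", @"\d+(?=-)").Value;

// Lookbehind: an optionally negative number that immediately follows a dash.
string afterPlain = Regex.Match("test 310-315", @"(?<=-)-?\d+").Value;
string afterNegative = Regex.Match("test test 310--315", @"(?<=-)-?\d+").Value;

Console.WriteLine(beforeDash);    // 310
Console.WriteLine(afterPlain);    // 315
Console.WriteLine(afterNegative); // -315
```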
(?'number'\d+)- will work (no need to escape the dash). In this example the group containing the single number is the named group 'number'.
If you want to match both groups with an optional sign, try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
To describe it: nothing complicated, just -? to match an optional - and \d+ to match one or more digits. A literal - matches itself.
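A brief C# sketch that reads both named groups from the negative-number example in the question:

```csharp
using System;
using System.Text.RegularExpressions;

Match pair = Regex.Match("test test 310--315", @"(?'first'-?\d+)-(?'second'-?\d+)");

if (pair.Success)
{
    // The dash between the groups is consumed as the separator; the second dash is the sign.
    Console.WriteLine(pair.Groups["first"].Value);  // 310
    Console.WriteLine(pair.Groups["second"].Value); // -315
}
```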
Here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
In the comments section of that page, it suggests escaping the dash with '\-'.
Make sure you escape your escape character \.
In regex syntax, - has a special meaning (a range) inside a character class, and you can escape it with a backslash (\). Because the backslash is itself the escape character in regular C# string literals (to escape quotes or to form certain characters), each backslash has to be escaped with another backslash. So in a regular, non-verbatim string literal the pattern would be written as "\\d+\\-".
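The two ways of writing the same pattern in C# can be compared directly. This is a small sketch, and the variable names are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: every regex backslash is doubled.
string regularLiteral = "\\d+\\-";
// Verbatim string literal: backslashes are written once.
string verbatimLiteral = @"\d+\-";

Console.WriteLine(regularLiteral == verbatimLiteral); // True
// Without a lookahead the dash is part of the match.
Console.WriteLine(Regex.Match("test test 310-315", verbatimLiteral).Value); // 310-
```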
\b\d+(?=\-) - you will want to look ahead for the dash
\b = start at a word boundary
\d = match any decimal digit
+ = match the previous token one or more times
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting