I need to match a string under the following conditions using Regex in C#:
Entire string can only be alphanumeric (including spaces).
Must be a maximum of 15 characters or less (including spaces).
First & last characters can only be a letter.
A single space can appear multiple times in anywhere but the first and last characters of the string. (Multiple spaces together should not be allowed).
Capitalization should be ignored.
Should match the WHOLE word(s).
If any one of these preconditions are broken, a match should not follow.
Here is what i currently have:
^\b([A-z]{1})(([A-z0-9 ])*([A-z]{1}))?\b$
And here are some test strings that should match:
Stack OverFlow
Iamthe greatest
A
superman23s
One Two Three
And some that shouldn't match (note the spaces):
Stack [double_space] Overflow Rocks
23Hello
ThisIsOver15CharactersLong
Hello23
[space_here]hey
etc.
Any suggestions would be much appreciated.
You should use lookaheads
|->matches if all the lookaheads are true
--
^(?=[a-zA-Z]([a-zA-Z\d\s]+[a-zA-Z])?$)(?=.{1,15}$)(?!.*\s{2,}).*$
-------------------------------------- ---------- ----------
| | |->checks if there are no two or more space occuring
| |->checks if the string is between 1 to 15 chars
|->checks if the string starts with alphabet followed by 1 to many requireds chars and that ends with a char that is not space
you can try it here
Try this regex: -
"^([a-zA-Z]([ ](?=[a-zA-Z0-9])|[a-zA-Z0-9]){0,13}[a-zA-Z])$"
Explanation : -
[a-zA-Z] // Match first character letter
( // Capture group
[ ](?=[a-zA-Z0-9]) // Match either a `space followed by non-whitespace` (Avoid double space, but accept single whitespace)
| // or
[a-zA-Z0-9] // just `non-whitespace` characters
){0,13} // from `0 to 13` character in length
[a-zA-Z] // Match last character letter
Update : -
To handle single characters, you can make the pattern after 1st character optional as nicely pointed by #Rawling in comments: -
"^([a-zA-Z](([ ](?=[a-zA-Z0-9])|[a-zA-Z0-9]){0,13}[a-zA-Z])?)$"
^^^ ^^^
use a capture group make it optional
And my version, again using look-aheads:
^(?=.{1,15}$)(?=^[A-Z].*)(?=.*[A-Z]$)(?![ ]{2})[A-Z0-9 ]+$
explained:
^ start of string
(?=.{1,15}$) positive look-ahead: must be between 1 and 15 chars
(?=^[A-Z].*) positive look-ahead: initial char must be alpha
(?=.*[A-Z]$) positive look-ahead: last char must be alpha
(?![ ]{2}) negative look-ahead: string mustn't contain 2 or more consecutive spaces
[A-Z0-9 ]+ if all the look-aheads agree, select only alpha-numeric chars + space
$ end of string
This will also need the IgnoreCase option setting
Related
So a sensor I'm interfacing to either outputs 4 multi-digit integers (separated by spaces) or an error string.
Ideally my regex would return a match for either of the above scenarios and reject any
other outputs - e.g. if only 3 numbers are output. I can then check if there are 4 groups (number output) or 1 group (error string output) in the following c#.
The regex I have matches all the time and returns spaces when there are less than 4 numbers so I still need to check everything.
I've tried putting in ?: but the format breaks. Any regex whizzes up to the challenge? Thanks in advance.
([0-9]+\s)([0-9]+\s)([0-9]+\s)([0-9]+)|([a-zA-Z\s_!]+)
So a numeric example would be 11 222 33 4444 or Sensor is in an error state! An incorrect output would be 222 11 3333 as it only has 3 fields
Also - I need to capture the four numbers (but not the spaces) or the error string.
You can capture either the 4 groups with only digits and match the whitespace chars outside of the group.
Or else match 1+ times any of the listed characters in the character class. Note that \s can also match a newline, and as the \s is in the character class the match can also consist of only spaces for example.
To match the whole string, you can add anchors.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|[a-zA-Z\s_!]+)$
» Regex demo
Another option to match the error string, is to start matching word characters without digits, optionally repeated by a whitespace char and again word characters without digits.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|[^\W\d]+(?:\s+[^\W\d]+)*!?)$
» Regex demo
If there can be a digit in the error message, but you don't want to match only digits or whitespace chars, you can exclude that using a negative lookahead.
^(?:([0-9]+)\s([0-9]+)\s([0-9]+)\s([0-9]+)|(?![\d\s]*$).+)$
» Regex demo
I am trying to extract the fifth and sixth value present in the stream through regex.
The stream is
12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47
I want values 200.00 and 100.00.
I tried ^(?:\S+\s+\n?){3,3} but it's selecting the string from beginning.
Can anybody help me please in getting the values that are present in the middle?
Using a quantifier like {3,3} can be written as {3}, but note that in the example string the values 200.00 and 100.00 are not the 5th and the 6th value.
With your pattern you only get the values at the beginning as the anchor ^ asserts the start of the string.
To get the third and the sixth value, you could also use 2 capture groups by using a quantifier {2} for the parts in between.
^(?:\S+\s+){2}(\S+)(?:\s+\S+){2}\s+(\S+)
^ Start of string
(?:\S+\s+){2} Repeat 2 times matching non whitespace chars followed by whitespace char
(\S+) Capture group 1, match 1+ non whitespace chars
(?:\s+\S+){2}\s+ Repeat 2 times matching whitespace chars and non whitespace chars
(\S+) Capture group 2, match 1+ non whitespace chars
Regex demo
Certainly, if you have access to the code itself, it would be easier to split the string and get nth chunk by its index.
If you are limited to a regex, you can use
(?<=^(?:\S+\s+){2})\S+
(?<=^(?:\S+\s+){5})\S+
Or, if there can be leading whitespaces:
(?<=^\s*(?:\S+\s+){2})\S+
(?<=^\s*(?:\S+\s+){5})\S+
See a .NET regex demo.
Details:
(?<= - start of a positive lookbehind that requires the following sequence of patterns to appear immediately to the left of the current location:
^ - start of string
\s* - zero or more whitespaces
(?:\S+\s+){2} - two occurrences of 1+ non-whitespace chars followed with 1+ whitespace chars
) - end of the lookbehind
\S+ - one or more non-whitespace chars.
I want to check string which look like following
1st radius = 120
and
2nd radius = 'value'
Here is my code
v1 = new Regex(#"^[A-Za-z]+\s[=]\s[A-Za-z]+$");
if (v1.IsMatch(singleLine))`
{
...
...
}
Using #"^[A-Za-z]+\s[=]\s[A-Za-z]+$" this expression 2nd string is matched but not first and when used this #"^[A-Za-z]+\s[=]\s\d{0,3}$" then only matched first one.
And i also want to check for radius = 'val01'
Basing on your effort, it looks as if you were trying to come up with
^[A-Za-z]+\s=\s(?:'[A-Za-z0-9]+'|\d{1,3})$
See the regex demo. Details:
^ - start of string
[A-Za-z]+ - one or more ASCII letters
\s=\s - a = char enclosed with single whitespace chars
(?:'[A-Za-z0-9]+'|\d{1,3}) - a non-capturing group matching either
'[A-Za-z0-9]+' - ', then one or more ASCII letters or digits and then a '
| - or
\d{1,3} - one, two or three digits
$ - end of string (actually, \z is safer when it comes to validating as there can be no final trailing newline after \z, and there can be such a newline after $, but it also depends on how you obtain the input).
If the pattern you tried ^[A-Za-z]+\s[=]\s[A-Za-z]+$ matches the second string radius = 'value', that means that 'value' consists of only chars A-Za-z.
In that case, you could either add matching digits to the second character class:
^[A-Za-z]+\s=\s[A-Za-z0-9]+$
If you either want to match 1-3 digits or at least a single char A-Za-z followed by optional digits:
^[A-Za-z]+\s=\s(?:[0-9]{1,3}|[A-Za-z]+[0-9]*)$
The pattern matches:
^ Start of string
[A-Za-z]+\s=\s Match the first part with chars A-Za-z and the = sign (Note that = does not have to be between square brackets)
(?: Non capture group
[0-9]{1,3} Match 1-3 digits (You can use \d{0,3} but that will also match an emtpy string due to the 0)
| Or
[A-Za-z]+[0-9]* Match 1+ chars A-Za-z followed by optional digits
) Close non capture group
$ End of string
Regex demo
Given 2 different lines I'm parsing, I need to extract the data points into regex match groups.
Example Line 1:
Header values are as follows:
DATE{space}TYPE{space}DESCR{space}VOLUME{space}RATE{space}TOTAL
[11/30/15] [CF] [DISC 1] [28270.18] [0.00150] [-42.41]
Example Line 2:
DATE{space}TYPE{space}DESCR{space}VOLUME{space}RATE{space}TOTAL
[11/30/15] [CF] [OTHER VOLUME FEES] [28186.68] [0.00008] [-2.25]
I'm using the following regex to get matches:
(?<date>^\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}[\d+])\s+(?<type>[A-Za-z]{2})\s+(?<descr>\w+\s+.*?(1))\s+.*?(?<volume>(\d+(?:\.\d+?))\s+.*?(?<rate>([0]?(\d+(?:\.\d+)?)))\s+(?<total>[-+]?\d+[.,]\d+)?.*$")
I can match the first case,but never the second case. there will always be a total, but they may NOT always be volume or rate. In addition, volume can be whole, decimal or code (e.g. "1B").
What am I missing here?
The description field is an open field and may contain "1" in it. I can have several words in it, or just 1.
Your log lines contain 6 fields, but the 4th and 5th can go missing. A common way to match optional fields is using an optional non-capturing group, (?:...)?. These groups do not make a separate memory buffers for the text they match, that is why they are useful to keep matching cleaner and more efficient.
NOTE that in .NET, there is a way to make all non-named capturing groups non-capturing by use of RegexOptions.ExplicitCapture option.
Your fixed regex mau look like
^(?<date>\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2})\s+(?:(?<type>[A-Z]{2})\s+)?(?:(?<descr>\w.*?)\s+)?(?:(?<volume>\d*\.?\d+)\s+)?(?:(?<rate>\d*\.?\d+)\s+)?(?<total>[-+]?\d*[.,]?\d+)\s*$
See the .NET regex demo.
Details
^ - start of a line (when RegexOptions.Multiline is used)
(?<date>\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}) - Group "date": 1-2 digits and then 2 repetitions of -///. followed with 1-2 digits (thus, this pattern can be written as (?<date>\d{1,2}(?:[-/.]\d{1,2}){2})).
\s+ - 1 or more whitespaces
(?:(?<type>[A-Z]{2})\s+)? - an optional group matching 2 uppercase ASCII letters, captured into Group "type", and then 1+ whitespaces
(?:(?<descr>\w.*?)\s+)? - an optional group matching a word char (letter, digit or _ and some other special chars (like diacritics) followed with any 0+ chars other than a newline char LF, as few as possible, all this captured into Group "descr", and then 1+ whitespaces
(?:(?<volume>\d*\.?\d+)\s+)? - an optional group matching 0+ digits, an optional . and then 1+ digits (that is, floats or integers) captured into Group "volume", then 1+ whitespace chars
(?:(?<rate>\d*\.?\d+)\s+)? - an optional group matching a float or integer values captured into Group "rate", and then 1+ whitespace chars
(?<total>[-+]?\d*[.,]?\d+) - Group "total": an optional - or + followed with 0+ digits, an optional . or , and then 1+ digits (so, positive or negative floats or integers are matched)
\s* - any 0+ trailing whitespaces
$ - end of the line.
(?<date>^\d{1,2}[-/.]\d{1,2}[-/.]\d{1,2}[\d+])\s+(?<type>[A-Z]{2})\s+(?<descr>\w+.*?\s+)(?<volume>\d+[.]?\d+)\s+(?<rate>\d+[.]?\d+)\s+(?<total>[-+]?\d+[.,]\d+?.*$)
Yes. This is a fairly complex regex. But if you have varying spaces inside your grouping, you can use .*?\s+ to end on the last space. This seems to work nicely for all the use cases I have.
Thanks for your comments!
orginal question removed
I am looking for a Regular Expression which will format a string containing of special characters, characters and numbers into a string containing only numbers.
There are special cases in which it’s not enough to only replace all non-numeric characters with “” (empty).
1.) Zero in brackets.
If there are only zeros in a bracket (0) these should be removed if it is the first bracket pair. (The second bracket pair containing only zeros should not be removed)
2.) Leading zero.
All leading zero should be removed (ignoring brackets)
Examples for better understanding:
123 (0) 123 would be 123123 (zero removed)
(0) 123 -123 would be 123123(zero and all other non-numeric characters removed)
2(0) 123 (0) would be 21230 (first zero in brackets removed)
20(0)123023(0) would be 201230230 (first zero in brackets removed)
00(0)1 would be 1(leading zeros removed)
001(1)(0) would be 110 (leading zeros removed)
0(0)02(0) would be 20 (leading zeros removed)
123(1)3 would be 12313 (characters removed)
You could use a lookbehind to match (0) only if it's not at the beginning of the string, and replace with empty string as you're doing.
(original solution removed)
Updated again to reflect new requirements
Matches leading zeroes, matches (0) only if it's the first parenthesized item, and matches any non-digit characters:
^[0\D]+|(?<=^[^(]*)\(0\)|\D
Note that most regex engines do not support variable-length lookbehinds (i.e., the use of quantifiers like *), so this will only work in a few regex engines -- .NET's being one of them.
^[0\D]+ # zeroes and non-digits at start of string
| # or
(?<=^[^(]*) # preceded by start of string and only non-"(" chars
\(0\) # "(0)"
| # or
\D # non-digit, equivalent to "[^\d]"
(tested at regexhero.net)
You've changed and added requirements several times now. For multiple rules like this, you're probably better off coding for them individually. It could become complicated and difficult to debug if one condition matches and causes another condition not to match when it should. For example, in separate steps:
Remove parenthesized items as necessary.
Remove non-digit characters.
Remove leading zeroes.
But if you absolutely need these three conditions all matched in a single regular expression (not recommended), here it is.
Regexes get much, much simpler if you can use multiple passes. I think you could do a first pass to drop your (0) if it's not the first thing in a string, then follow it with stripping out the non-digits:
var noMidStrParenZero = Regex.Replace(text, "^([^(]+)\(0\)", "$1");
var finalStr = Regex.Replace(noMidStrParenZero, "[^0-9]", "");
Avoids a lot of regex craziness, and it's also self-documenting to an extent.
EDIT: this version should work with your new examples too.
This regex should be pretty near the one you're searching for.
(^[^\d])|([^\d](0[^\d])?)+
(You can replace everything that is caught by an empty string)
EDIT :
Your request evolved, and is now to complex to be treatd with a single pass. Assuming you always got a space before a bracket group, you can use those passes (keep this order) :
string[] entries = new string[7] {
"800 (0) 123 - 1",
"800 (1) 123",
"(0)321 123",
"1 (0) 1",
"1 (12) (0) 1",
"1 (0) (0) 1",
"(9)156 (1) (0)"
};
foreach (string entry in entries)
{
var output = Regex.Replace(entry , #"\(0\)\s*\(0\)", "0");
output = Regex.Replace(output, #"\s\(0\)", "");
output = Regex.Replace(output, #"[^\d]", "");
System.Console.WriteLine("---");
System.Console.WriteLine(entry);
System.Console.WriteLine(output);
}
(?: # start grouping
^ # start of string
| # OR
^\( # start of string followed by paren
| # OR
\d # a digit
) # end grouping
(0+) # capture any number of zeros
| # OR
([1-9]) # capture any non-zero digit
This works for all of your example strings, but the entire expression does match the ( followed by the zero. You can use Regex.Matches to get the match collection using a global match and then join all of the matched groups into a string to get numbers only (or just remove any non-numbers).