I need a regex to match a number in the second line. Similar input is like this:
^C1.1
xC20
SS3
M 4
Decimal pattern (-?\d+(\.\d+)?) matches all numbers and second number can be get in a loop on the code behind but I need a regular expression to get directly the number in the second line.
/^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)/
This operates by capturing a single line at the beginning of the input:
^ Beginning of the string
[^\r\n]* Anything that isn't a line terminator
\r?\n A newline, optionally preceded by a carriage return
Then all the non digit characters, then your numbers.
Since you've now repeatedly changed your needs, try this on for size:
/(?<=\n\D*)-?\d+(\.\d+)?/
I was able to capture it with this regex.
.*\n\D*(\d*).*\n
Check out group 1 of anything that this matches:
^.*?\r\n.*?(\d+)
If that doesn't work, try this:
^.*?\r\n.*?(\d+)
Both are with multiline NOT set...
I would probably use the captured group in /^.*?\r?\n.*?(-?\d+(?:\.\d+)?)/ where…
^ # beginning of string
.*? # anything...
\r?\n # followed by a new line
.*? # anything...
( # followed by...
-? # an optional negative sign (minus)
\d+ # a number
(?: # -this part not captured explicitly-
\.\d+ # a dot and a number
)? # -and is optional-
)
If it is a flavor that supports lookbehind then there are other alternatives.
Related
I have a string like following,
hi,hello,-LSB-,ASPECT,-RSB-,you
I want to extract sub-string that comes before -LSB-,ASPECT, till comma, hello in this case.
I have written regular expression like
\b\w+[/-/,LSB/-/,ASPECT]
however it extracts entire substring before and inclusing-LSB-,ASPECT, till start like,
hi,hello,-LSB-,ASPECT
Any clue??
The regex for this (using a positive lookahead assertion) would be
[^,]*(?=,-LSB-,ASPECT,)
Explanation:
[^,]* # Match any number of characters except commas
(?= # until the following regex can be matched:
,-LSB-,ASPECT, # the literal text ",-LSB-,ASPECT,".
) # (End of lookahead assertion)
Careful, square brackets create a character class which you don't want in this case.
Live demo
Try this:
(\w+),-LSB-,ASPECT
I am trying to create a RegEx expression that will successfully parse the following line:
"57" "testing123" 82 16 # 13 26 blah blah
What I want is to be able to do is identify the numbers in the line. Currently, what I'm using is this:
[0-9]+
which parses fine. However, where it gets tricky is if the number is in quotes, like "57" is or like "testing123" is, I do not want it to match.
In addition to that, anything after the hash sign (the '#"), I do not want to match anything at all after the hash sign.
So in this example, the matches I should be getting are "82" and "16". Nothing else should match.
Any help on this would be appreciated.
It should be easier for you to build 3 different regexes, and then create the logic that combines them:
Check, whether the string has #, and ignore everything after it.
Check, for all the matches of "\d+", and ignore all of them
Check everything that's left, whether it matches [0-9]+
.Net regular expression can rather easily parse this string. The following pattern should match everything until the comment:
\A # Start of the string
(?>
(?<Quoted> # A quoted string
"" # Open quotes
[^""\\]* # non quotes or backslashes
(?:\\.[^""\\]*)* # but allow escaped characters
"" # Close quotes
)
|
(?<Number> # A number
\d+ # some digits
)
|
\s+ # Whitespace separator
)*
If you also want to match the comment, add:
(?<Comment>
\# .*
)?
\z
You can get your numbers in a single Match, using all captures of the "Number" group:
Match parsed = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
CaptureCollection numbers = parsed.Groups["Number"].Captures;
Missing from this pattern is mainly unquoted string tokens, such as 4 8 this 15that, which can add some complexity, depending on how we'd want it to work.
The C# expression library I am using will not directly support my table/field parameter syntax:
The following are table/field parameter names that are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets to a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.
I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
resultString = Regex.Replace(subjectString,
#"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line matches either...
[^']*$ # only non-quote characters
| # or
[^']*'[^']*' # contains an even number of quote characters
[^']* # plus any number of non-quote characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
I WOuld like to implement textBox in which user can only insert text in pattern like this:
dddddddddd,
dddddddddd,
dddddddddd,
...
where d is a digit. If user leave control with less then 10 digits in a row validation should fail and he should not be able to write in one line more than 10 digits, then acceptable should be only comma ",".
Thanks for help
Match m = Regex.Match(textBox.Text, #"^\d{10},$", RegexOptions.Multiline);
Haven't tried it, but it should work. Please take a look here and here for more information.
I suggest the regex
\A(?:\s*\d{10},)*\s*\d{10}\s*\Z
Explanation:
\A # start of the string
(?: # match the following zero or more times:
\s* # optional whitespace, including newlines
\d{10}, # 10 digits, followed by a comma
)* # end of repeated group
\s* # match optional whitespace
\d{10} # match 10 digits (this time no comma)
\s* # optional whitespace
\Z # end of string
In C#, this would look like
validInput = Regex.IsMatch(subjectString, #"\A(?:\s*\d{10},)*\s*\d{10}\s*\Z");
Note that you need to use a verbatim string (#"...") or double all the backslashes in the regex.
I want to match the first number/word/string in quotation marks/list in the input with Regex. For example, it should match those:
"hello world" gdfigjfoj sogjds
-14.5 fdhdfdfi dfjgdlf
test14 hfghdf hjgfjd
(a (c b 7)) (3 4) "hi"
Any ideas to a regex or how can I start?
Thank you.
If you want to match balanced parenthesis, regex is not the right tool for the job. Some regex implementations do facilitate recursive pattern matching (PHP and Perl, that I know of), but AFAIK, C# cannot do that (EDIT: see Steve's comment below: .NET can do this as well, after all).
You can match up to a certain depth using regex, but that very quickly explodes in your face. For example, this:
\(([^()]|\([^()]*\))*\)
meaning
\( # match the character '('
( # start capture group 1
[^()] # match any character from the set {'0x00'..''', '*'..'ÿ'}
| # OR
\( # match the character '('
[^()]* # match any character from the set {'0x00'..''', '*'..'ÿ'} and repeat it zero or more times
\) # match the character ')'
)* # end capture group 1 and repeat it zero or more times
\) # match the character ')'
will match single nested parenthesis like (a (c b 7)) and (a (x) b (y) c (z) d), but will fail to match (a(b(c))).
Any ideas to a regex or how can I start?
You can start with any tutorial on basic regex, such as this.
[Edit] I missed that you wanted to count parentheses. That cannot be done in regex - nothing that involves counting (except for non-standard lookaheads) can.
For first three cases, you could to use:
^("[^"]*"|[+-]?\d*(?:\.\d+)?|\w+)
For last one, I'm not sure if it's possible with regex to match that last closing parenthesis.
EDIT: using that suggested balanced matching for last one:
^\([^()]*(((?<Open>\()[^()]*)+((?<Close-Open>\))[^()]*)+)*(?(Open)(?!))\)