How to repeat a regular expression? - C#

I have the regular expression ^\d{5}$|^\d{5}-\d{4}*$ which checks a US ZIP code.
But I need to check a list like "zip, zip, zip". How can I do this?
I tried ^(\d{5}$|^\d{5}-\d{4},)*$ but it does not work.

Try
((^|, )(\d{5}|\d{5}-\d{4}))*$
Tester: http://regexr.com?36297
Each match must be preceded by (^|, ), that is, by the beginning of the string or by ", " (a comma followed by a space).
Note that you shouldn't use \d in .NET, because ٠١٢٣٤ are matched by \d! (In .NET, \d includes non-ASCII Unicode digits.) [0-9] is normally better.
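To make the \d remark concrete, here is a minimal sketch that compares \d, [0-9], and \d under RegexOptions.ECMAScript on the Arabic-Indic digits mentioned above. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // Arabic-Indic digits zero to four (U+0660..U+0664), written as escapes.
    public const string ArabicIndic = "\u0660\u0661\u0662\u0663\u0664";

    // Default .NET behaviour: \d matches any Unicode decimal digit.
    public static bool MatchesBackslashD(string s)
    {
        return Regex.IsMatch(s, @"^\d{5}$");
    }

    // An explicit ASCII range only accepts 0-9.
    public static bool MatchesAsciiRange(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]{5}$");
    }

    // With RegexOptions.ECMAScript, \d is restricted to 0-9.
    public static bool MatchesEcmaScript(string s)
    {
        return Regex.IsMatch(s, @"^\d{5}$", RegexOptions.ECMAScript);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesBackslashD(ArabicIndic)); // True
        Console.WriteLine(MatchesAsciiRange(ArabicIndic)); // False
        Console.WriteLine(MatchesEcmaScript(ArabicIndic)); // False
        Console.WriteLine(MatchesAsciiRange("12345"));     // True
    }
}
```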

The expression you appear to need is:
^\d{5}(|-\d{4})(,\d{5}(|-\d{4}))*$
The one you were attempting to write was:
^(\d{5}|\d{5}-\d{4},)*$
but that would require every ZIP to have a trailing comma, which the very last one would not have had.
Breaking down the answer given,
\d{5}(|-\d{4}) is a variant of your original, but simply making the -1234 optional.
(,\d{5}(|-\d{4}))* is the first regular expression preceded by a comma, and allowed zero or more times.

I would use this for speed:
^\d{5}(?:-\d{4})?(?:,\s*\d{5}(?:-\d{4})?)*$
expanded
^
\d{5}
(?: - \d{4} )?
(?:
, \s* \d{5}
(?: - \d{4} )?
)*
$
and this for speed/flexibility:
^\s*\d{5}(?:\s*-\s*\d{4})?(?:\s*,\s*\d{5}(?:\s*-\s*\d{4})?)*\s*$
expanded
^
\s*
\d{5}
(?: \s* - \s* \d{4} )?
(?:
\s* , \s* \d{5}
(?: \s* - \s* \d{4} )?
)*
\s*
$
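As a rough illustration of how the first ("speed") pattern above could be used from C#, the sketch below validates a list and then reads the individual ZIP codes back out. The named group zip, the use of [0-9] instead of \d, and the class and method names are additions made for this example and are not part of the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZipList
{
    // Same shape as the "speed" pattern; both occurrences of the ZIP
    // sub-pattern share the (hypothetical) group name "zip".
    private static readonly Regex ZipListPattern = new Regex(
        @"^(?<zip>[0-9]{5}(?:-[0-9]{4})?)(?:,\s*(?<zip>[0-9]{5}(?:-[0-9]{4})?))*$");

    // Returns the ZIP codes in order, or an empty list if the input is not a valid list.
    public static List<string> Parse(string input)
    {
        var zips = new List<string>();
        Match m = ZipListPattern.Match(input);
        if (!m.Success)
        {
            return zips;
        }
        // .NET keeps every capture of a repeated group, not only the last one.
        foreach (Capture c in m.Groups["zip"].Captures)
        {
            zips.Add(c.Value);
        }
        return zips;
    }

    public static void Main()
    {
        foreach (string s in new[] { "12345, 12345-6789,54321", "12345,", "1234" })
        {
            Console.WriteLine("'{0}' -> [{1}]", s, string.Join(" | ", Parse(s)));
        }
    }
}
```

Reading Group.Captures avoids a second pass with Split once the whole string has been validated.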

Related

Match numbers that are not in the context of Value(x)

I am trying to match the numbers that are not in the context of Value(X) and discard rest of text.
Example text:
lorem ipsum Value (3) dfasdf 654345435ds sdfsdf asdf
asd
F
asdf
sad Value (2)
Example Regex:
Value\((\d)\)
Thanks for help.
The .NET regex engine supports a quantifier in the lookbehind assertion.
What you might do is assert that, from the current position, there is no Value( to the left that is followed by 1+ digits and ) to the right. If that is the case, match 1 or more digits.
The pattern matches:
(?<!\bValue[\p{Zs}\t]*\((?=[0-9]+\)))[0-9]+
(?<! Negative lookbehind, assert what is to the left is not
\bValue Match Value preceded by a word boundary to prevent a partial match
[\p{Zs}\t]*\( Match optional horizontal spaces followed by (
(?=[0-9]+\)) Positive lookahead, assert 1+ digits followed by ) to the right
) Close lookbehind
[0-9]+ Match 1+ digits 0-9
Note that \d matches more than the digits 0-9: it also matches digits from other scripts. If you want to match those, use \d; otherwise use [0-9] instead.
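A small sketch of how the lookbehind pattern above might be run against the sample text from the question. The class and method names are hypothetical, and the sample is shortened to one line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ValueNumbers
{
    // The pattern from the answer above, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"(?<!\bValue[\p{Zs}\t]*\((?=[0-9]+\)))[0-9]+");

    // Returns every run of digits that is not the argument of Value(...).
    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string sample = "lorem ipsum Value (3) dfasdf 654345435ds sdfsdf asdf sad Value (2)";
        foreach (string number in Find(sample))
        {
            Console.WriteLine(number); // 654345435
        }
    }
}
```

One caveat, as far as I can tell: the question only uses single digits inside Value(...). With a multi-digit argument such as Value (35), the lookbehind blocks the match that starts at 3 but not a match that starts at 5, so an extra (?<![0-9]) in front of the digits would probably be needed for that case.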
You are looking for:
(?<!Value *\()\d+
Note that I am assuming that every Value( has a closing bracket.
Explanation:
(?<!Value *\() asserts that what follows it is not preceded by "Value(", "Value (", "Value  (" and so on.
\d+ matches one or more digits.
Something like this ought to do you:
private static readonly Regex rx = new Regex(@"
  (?<!       # A zero-width negative look-behind assertion, consisting of:
    \b       # - a word boundary, followed by
    Value    # - the literal 'Value', followed by
    \s*      # - zero or more whitespace characters, followed by
    [(]      # - a left parenthesis '(', followed by
    \s*      # - zero or more whitespace characters,
  )          # The whole of which is followed by
  (          # A number, consisting of
    -?       # - an optional minus sign, followed by
    \d+      # - 1 or more decimal digits,
  )          # The whole of which is followed by
  (?!        # A zero-width negative look-ahead assertion, consisting of
    \s*      # - zero or more whitespace characters, followed by
    [)]      # - a single right parenthesis ')'
  )          #
  ",
  rxOpts
);
private const RegexOptions rxOpts = RegexOptions.IgnoreCase
                                  | RegexOptions.ExplicitCapture
                                  | RegexOptions.IgnorePatternWhitespace
                                  ;
Then . . .
foreach ( Match m in rx.Matches( someText ) )
{
  string nbr = m.Value;
  Console.WriteLine("Found '{0}'", nbr);
}

Regex to get square brackets containing numbers only but are not within square brackets themselves

Sample String
"[] [ds*[000112]] [1448472995] sample string [1448472995] ***";
The regex should match
[1448472995] [1448472995]
and should not match [000112] since there is outer square bracket.
Currently I have this regex that is matching [000112] as well
const string unixTimeStampPattern = @"\[([0-9]+)]";
This is a good way to do it using balanced text.
( \[ \d+ \] ) # (1)
| # or,
\[ # Opening bracket
(?> # Then either match (possessively):
[^\[\]]+ # non - brackets
| # or
\[ # [ increase the bracket counter
(?<Depth> )
| # or
\] # ] decrease the bracket counter
(?<-Depth> )
)* # Repeat as needed.
(?(Depth) # Assert that the bracket counter is at zero
(?!)
)
\] # Closing bracket
C# sample
string sTestSample = "[] [ds*[000112]] [1448472995] sample string [1448472995] ***";
Regex RxBracket = new Regex(@"(\[\d+\])|\[(?>[^\[\]]+|\[(?<Depth>)|\](?<-Depth>))*(?(Depth)(?!))\]");
Match bracketMatch = RxBracket.Match(sTestSample);
while (bracketMatch.Success)
{
if (bracketMatch.Groups[1].Success)
Console.WriteLine("{0}", bracketMatch);
bracketMatch = bracketMatch.NextMatch();
}
Output
[1448472995]
[1448472995]
You need to use balancing groups to handle this - it looks a bit daunting but isn't all that complicated:
Regex regexObj = new Regex(
@"\[ # Match opening bracket.
\d+ # Match a number.
\] # Match closing bracket.
(?= # Assert that the following can be matched ahead:
(?> # The following group (made atomic to avoid backtracking):
[^\[\]]+ # One or more characters except brackets
| # or
\[ (?<Depth>) # an opening bracket (increase bracket counter)
| # or
\] (?<-Depth>) # a closing bracket (decrease bracket counter, can't go below 0).
)* # Repeat ad libitum.
(?(Depth)(?!)) # Assert that the bracket counter is now zero.
[^\[\]]* # Match any remaining non-bracket characters
\z # until the end of the string.
) # End of lookahead.",
RegexOptions.IgnorePatternWhitespace);
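For reference, a sketch of the same lookahead written on one line and run against the sample string from the question. The class and method names are invented for this example. It assumes, as the pattern itself does, that the brackets after each candidate are balanced up to the end of the string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopLevelBrackets
{
    // Compact form of the commented pattern above: a bracketed number,
    // followed (in a lookahead) only by balanced brackets until the end.
    private static readonly Regex Pattern = new Regex(
        @"\[\d+\](?=(?>[^\[\]]+|\[(?<Depth>)|\](?<-Depth>))*(?(Depth)(?!))[^\[\]]*\z)");

    // Returns the bracketed numbers that are not nested in another pair of brackets.
    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string sample = "[] [ds*[000112]] [1448472995] sample string [1448472995] ***";
        foreach (string hit in Find(sample))
        {
            Console.WriteLine(hit);
        }
    }
}
```

A nested candidate such as [000112] is rejected because the text after it starts with an unmatched ], which the bracket counter cannot consume, so the lookahead never reaches \z.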
Are you just trying to capture the Unix time stamp? Then you can try a simpler one where you specify the number of digits matched in the group.
\[([0-9]{10})\]
Here I limit it to 10 characters since I doubt the time stamp will hit 11 characters anytime soon... To protect against that:
\[([0-9]{10,11})\]
Of course this could lead to false positives if you have a 10-length number in an enclosing bracket.
This will match your expression as expected: http://regexr.com/3csg3 (it uses a lookahead).

Regex to capture parenthesis with hash tag?

So far I have this perfectly working regex:
(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)
It finds any word that starts with a hash tag (e.g. #lolz but not hsshs#jdjd).
The problem is that I also want it to match when parentheses are involved. So each of these should match:
(#lolz wow)
or
(wow #cool)
or
(#cool)
Any idea on how can I make or use my regex to work like that?
The following seemed to work for me ...
\(?#(\w*[A-Za-z_]+\w*)\)?
The way you are using the following in context is overkill..
\w*[A-Za-z_]\w*
\w alone matches word characters (a-z, A-Z, 0-9, _). Also, the non-capturing group (?: wrapped around your lookbehind assertion is not necessary here.
I do believe that the following would suffice by itself.
(?<=^|\s)\(?#(\w+)\)?
Regular expression:
(?<= look behind to see if there is:
^ the beginning of the string
| OR
\s whitespace (\n, \r, \t, \f, and " ")
) end of look-behind
\(? '(' (optional (matching the most amount possible))
# '#'
( group and capture to \1:
\w+ word characters (a-z, A-Z, 0-9, _) (1 or more times)
) end of \1
\)? ')' (optional (matching the most amount possible))
You can also use a negative lookbehind here if you wanted to.
(?<![^\s])\(?#(\w+)\)?
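A short sketch of how the simplified pattern could be applied in C#, collecting the tag text from capture group 1. The class and method names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HashTags
{
    // The simplified pattern from the answer above; group 1 holds the tag text.
    private static readonly Regex Pattern = new Regex(@"(?<=^|\s)\(?#(\w+)\)?");

    // Returns the tag names without '#' or surrounding parentheses.
    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .ToArray();
    }

    public static void Main()
    {
        string sample = "(#lolz wow) (wow #cool) hsshs#jdjd #x";
        Console.WriteLine(string.Join(", ", Find(sample))); // lolz, cool, x
    }
}
```

Note that, unlike the original \w*[A-Za-z_]+\w*, plain \w+ also accepts tags made only of digits, such as #123.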

C#: Regex for string with enclosing single-quotes (and escaping by doubling the quotes)

I did not find a regex for my problem. The example regexes I find always escape with a backslash.
But I need escaping by doubling the enclosing-character.
Example: 'o''reilly'
Result: o'reilly
'(?:''|[^']*)*'
will match a quote-delimited string that may contain double-escaped quotes. So that's your regex to find those strings.
Explanation:
' # Match a single quote.
(?: # Either match... (use (?> instead of (?: if you can)
'' # a doubled quote
| # or
[^']* # anything that's not a quote
)* # any number of times.
' # Match a single quote.
To now remove the quotes correctly, you could do it in two steps:
First, search for (?<!')'(?!') to find all single quotes; replace them with nothing.
Explanation:
(?<!') # Assert that the previous character (if present) isn't a quote
' # Match a quote
(?!') # Assert that the next character (if present) isn't a quote
Second, search for '' and replace all with '.
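As an alternative to the two replacement steps, here is a sketch that validates and unquotes in one pass: it captures the text between the delimiters and then collapses doubled quotes. It uses the atomic-group variant hinted at in the explanation, with [^']+ instead of [^']*. The anchors, the capture group, and the class and method names are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedLiteral
{
    // Whole input must be one quoted literal; group 1 is the raw inner text.
    private static readonly Regex Pattern = new Regex(@"\A'((?>''|[^']+)*)'\z");

    // Returns the unescaped content, or null if the input is not a valid literal.
    public static string Unquote(string literal)
    {
        Match m = Pattern.Match(literal);
        if (!m.Success)
        {
            return null;
        }
        // Inside a valid literal every quote is part of a doubled pair.
        return m.Groups[1].Value.Replace("''", "'");
    }

    public static void Main()
    {
        Console.WriteLine(Unquote("'o''reilly'"));               // o'reilly
        Console.WriteLine(Unquote("'''abc'"));                   // 'abc
        Console.WriteLine(Unquote("'abc") ?? "(not a literal)"); // (not a literal)
    }
}
```

Capturing the inner text first also handles a value that begins or ends with an escaped quote, such as '''abc', where the delimiter sits directly next to a doubled quote.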

Regular expression to find separator dots in formula

The C# expression library I am using will not directly support my table/field parameter syntax:
The following are table/field parameter names that are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets with a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.
I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
resultString = Regex.Replace(subjectString,
@"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line matches either...
[^']*$ # only non-quote characters
| # or
[^']*'[^']*' # contains an even number of quote characters
[^']* # plus any number of non-quote characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
