Regular Expression to check the spaces and minimum entries in C# - c#

I am using C# for programming.
I want to write a regular expression in C# that rejects a leading or trailing space in a sentence but allows spaces in between. The field should require a minimum of 2 characters, with no limit on the maximum, and no special characters (#, $, etc.) are allowed.
Please suggest!

It's not really clear exactly what you want. Your comment -- contradicting the question itself -- suggests something like this, perhaps...
^[A-Za-z0-9]+(?:\s*[A-Za-z0-9]+)+$
This means that the string must start and end with an alphanumeric, and all characters except the first and last must be either alphanumeric or whitespace.
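A quick way to check what the pattern above accepts is to run it against a few inputs. The sketch below uses Python's re module purely for convenience (an assumption, since the question is about C#); the pattern is copied unchanged, and the helper name is_valid_field is made up for illustration.

```python
import re

# Pattern from the answer above, copied unchanged.
FIELD = re.compile(r"^[A-Za-z0-9]+(?:\s*[A-Za-z0-9]+)+$")


def is_valid_field(text: str) -> bool:
    """True when text starts and ends with an alphanumeric, has at least
    two alphanumerics, and contains only alphanumerics or whitespace."""
    return FIELD.match(text) is not None


for sample in ["ab", "hello world 42", "a", " ab", "ab ", "a#b"]:
    print(repr(sample), is_valid_field(sample))
```

Note that the nested quantifiers in (?:\s*[A-Za-z0-9]+)+ can backtrack heavily on long inputs that almost match, so it may be worth capping the input length before applying the pattern.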

Related

Password complexity regex with number or special character

I've got some regex that'll check incoming passwords for complexity requirements. However it's not robust enough for my needs.
((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,20})
It ensures that a password meets minimum length, contains characters of both cases and includes a number.
However I need to modify this so that it can contain a number and/or an allowed special character. I've even been given a list of allowed special characters.
I have two problems, delimiting the special characters and making the first condition do an and/or match for number or special.
I'd really appreciate advice from one of the regex gods round these parts.
The allowed special characters are: #%+\/'!#$^?:.(){}[]~-_
If I understand your question correctly, you're looking for a possibility to require another special character. This could be done as follows (see the last lookahead):
((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!§$%&/(/)]).{8,20})
See a demo for this approach here on regex101.com.
However, you can improve the expression further: the dot-star (.*) runs to the end of the line and backtracks afterwards. If you have a password of, say, 10 characters and four lookaheads need to be fulfilled, you'll need at least 40 steps (even more, as the engine needs to backtrack).
To optimize your expression, you could use negated character classes (the exact opposite of your required characters), so that each lookahead stops at the first character that satisfies it. Additionally, as already pointed out in the comments, do not limit your maximum password length.
In the language of regular expressions, this would come down to:
((?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=.*[!§$%&/(/)]).{8,})
With the first approach, 63 steps are needed, while the optimized version only needs 29 steps (less than half!). Regarding your second question, allowing a digit or a special character, you could simply use an alternation (|) like so:
((?:(?=\D*\d)|(?=.*[!§$%&/(/)]))(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,})
Or put the \d in the brackets as well, like so:
((?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=.*[\d!§$%&/(/)]).{8,})
This one would consider both ConsideredAgoodPassw!rd and C0nsideredAgoodPassword to be good passwords.
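The character class in the expressions above does not use the list of allowed specials given in the question, and the "delimiting" problem is left open. One way to handle it, sketched here in Python's re module as an assumption (the question does not name a language), is to let the library escape the list rather than escaping it by hand. The names ALLOWED_SPECIALS, DIGIT_OR_SPECIAL and is_acceptable are hypothetical.

```python
import re

# Allowed special characters, copied as given in the question.
ALLOWED_SPECIALS = "#%+\\/'!#$^?:.(){}[]~-_"

# re.escape() backslash-escapes regex metacharacters, so the list can be
# placed inside a character class without hand-escaping ], \, ^ or -.
DIGIT_OR_SPECIAL = r"[\d" + re.escape(ALLOWED_SPECIALS) + "]"

PASSWORD = re.compile("".join([
    r"(?=[^a-z]*[a-z])",                # at least one lowercase letter
    r"(?=[^A-Z]*[A-Z])",                # at least one uppercase letter
    r"(?=.*" + DIGIT_OR_SPECIAL + ")",  # a digit or an allowed special
    r".{8,}",                           # at least 8 characters, no maximum
]))


def is_acceptable(password: str) -> bool:
    """True when the whole password satisfies every lookahead above."""
    return PASSWORD.fullmatch(password) is not None


print(is_acceptable("ConsideredAgoodPassw!rd"))
print(is_acceptable("C0nsideredAgoodPassword"))
print(is_acceptable("ConsideredAgoodPassword"))
```

As in the answer, .{8,} still accepts any character in the password; the allowed list is only used to decide whether the "digit or special" requirement is met.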

Noncapturing along with capturing match

I am trying to capture the subdomain from huge lists of domain names. For example, I want to capture "funstuff" from "funstuff.mysite.com". I do not want to capture ".mysite.com" in the match. These occurrences are in a sea of text, so I cannot depend on them being at the start of a line. I know the subdomain will not include any special characters or numbers. So what I have is:
[a-z]{2,10}(?=\.mysite\.com)
The problem is this will work only if the subdomain is NOT preceded by a number or special character. For example, "asdfbasdasdfdfunstuff.mysite.com" will return "fdfunstuff" but "asdfasf23/funstuff.mysite.com" won't make a match.
I can not depend on there being a special character before the subdomain, like a "/" as in "http://funstuff.mysite.com" so that can not be used as part of the condition.
It is OK if the capture gets erroneous text before the subdomain, although 99% of the time it will be preceded by something other than a lowercase letter. I have tried
(?<=[^a-z])[a-z]{2,10}(?=\.mysite\.com)
but for some reason this does not capture text in a situation like:
afb"asdfunstuff.mysite.com
where the quotation mark prevents a match for [a-z]{2,10}. Basically, what I would want to do in that case would be to capture asdfunstuff.mysite.com. How can this be accomplished?
So you've got two problems to solve: first, you want to match ".mysite.com" but not capture it; second, you want to grab up to 10 alphabetic characters in the "subdomain" position.
The first problem can be solved by using a capturing group. The regex
([a-z]{2,10})\.mysite\.com
will capture somewhere between 2 and 10 characters, and the returned match object will expose that in one of its properties (depends on the language). In C#, Regex.Match returns a Match object, and the captured text is available in its Groups collection at index 1.
The second problem can be solved by using the word-boundary character \b. In .NET, this matches where an alphanumeric (i.e. \w) is next to a non-alphanumeric (\W). Other languages (e.g. ECMAScript / JavaScript) work similarly.
So, I suggest the following regex to solve your problem:
\b([a-z]{2,10})\.mysite\.com
Note that numbers are legal in subdomain names, too, so the following might be generally correct (though perhaps not in your specific case):
\b(\w{2,10})\.mysite\.com
where the "word character" \w is equivalent to [a-zA-Z_0-9] in .NET's ECMAScript-compliant mode. (Further reading.)
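To see how the capturing group and \b behave together, here is a minimal sketch. It uses Python's re module as a stand-in (an assumption; the answer discusses .NET), and the helper name find_subdomains and the sample strings are made up for illustration.

```python
import re
from typing import List

# Word boundary, then a captured run of 2-10 lowercase letters,
# then the literal suffix, which is matched but not captured.
SUBDOMAIN = re.compile(r"\b([a-z]{2,10})\.mysite\.com")


def find_subdomains(text: str) -> List[str]:
    """Return the captured group of every match found in text."""
    return SUBDOMAIN.findall(text)


sample = 'visit http://funstuff.mysite.com or "asdfasf23/coolpage.mysite.com"'
print(find_subdomains(sample))

# An 11-letter run directly before the suffix gives no match at all,
# because \b will not let the match start in the middle of a word.
print(find_subdomains('afb"asdfunstuff.mysite.com'))
```

The second call shows a trade-off of \b: a run of letters longer than 10 characters is skipped entirely instead of being truncated, which may or may not be what is wanted.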

Regex for key value pairs

I am not great with regular expressions and I have a need to parse out key/value pairs from a string. An example of the string would be:
Event Name CallingNumber:+15555555555 CallID:12345 CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497
The result I'm looking for would be something like this:
CallingNumber=+15555555555
CallID=12345
CallingName=Doe, John
CallingTime=12-26-2013 14:27:41.645497
The key/value pairs are delimited by a space, but the value is allowed to have a space in it (ex: Doe, John). It would be nice if the values were surrounded by quotes or something, but they are not. Essentially I'm trying to match a word without a space followed by a colon and then any character after the colon until it reaches another word without a space followed by a colon.
An exact match is impossible: the fields are delimited with :, but you also have a time with : in it, and a regex can't easily distinguish the two.
Still, this is what I came up with:
(.+?):(.+?)(?=(?:[^\s]+:)|(?:$))
Again, because of the date, this won't work perfectly.
Here's a fiddle to demonstrate: http://www.rexfiddle.net/Wm3NiK0
Edit: If your "keys" are only letters (not numbers), which avoids the time/date problem, then this will work:
([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))
Here's another fiddle to demonstrate this: http://www.rexfiddle.net/sGQs7YV
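For readers who cannot open the fiddle, the letters-only variant can be tried directly. This is a sketch in Python's re module (an assumption; the thread is about .NET) with the pattern copied unchanged; parse_pairs is a hypothetical helper name.

```python
import re

# Letters-only key, lazy value, then a lookahead for the next "key:" or
# the end of the string. Pattern copied from the answer above.
PAIR = re.compile(r"([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))")


def parse_pairs(line: str) -> dict:
    """Collect every key/value match in line into a dictionary."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(line)}


line = ("Event Name CallingNumber:+15555555555 CallID:12345 "
        "CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497")

for key, value in parse_pairs(line).items():
    print(f"{key}={value}")
```

The time of day survives because "14:" starts with digits, so the lookahead for a letters-only key does not fire there. A value that contains a word followed by a colon would still be split at that point.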
You can apply the regex repeatedly, with a (.*) to return the "yet to be parsed" remainder.
In pseudocode form, this might be:
match string to "^(([^:]*\s)*[^:]*)\s+(.*)$"
    (should grab "Event Name" and leave the rest as $3)
loop:
    keep only $3 as new base string
    match new base string to "^(\w+)[:](.+?)\s+(\w+[:].*)$"
    key = $1, value = $2, new remainder = $3
    repeat until no $1, $2 values are returned
"I'm suing .NET (c#)," good idea! :) Microsoft needs to be put in its place!
Do you have a fixed number of fields, or could they vary in number? Do you expect the same fields each time? In the same order? If a fixed number, you could hard code the number of fields in the regexp, but I still think that trying to do it with just one regexp is asking for a headache. Use some scripting code and break it down piece by piece, first of all splitting it on :\s+. The last word in a group is then stripped off as the name of the next group, and the remainder is the value of the previous group. The first and last groups have to have some special treatment. I think that would be a lot easier and more understandable than trying to do it in one ugly regexp. As a bonus, any number of fields in any order could be handled.

I need a Regular Expression allowing user to input numbers, plus, minus and parentheses

I need a Regular Expression allowing user to input numbers, plus, minus and parentheses.
User can only input:
At most one open parenthesis '('.
At most one close parenthesis ')'.
At most one plus '+'
As many minus '-' but not after each other.
Exactly 11 numbers.
Here are valid inputs:
(0)+12-3-4-56-7890
+)0(12345-678-90
+01234567890
+(01234567890)
01234567890
-01-234+5678-90
(01234567890)
)01234567890(
And following are not valid:
0123456--7890
0((1234567890
01234567890))
++01234567890
123456
++123456789
I'm using C# for programming, and if it helps, the order of the open and close parentheses can become mandatory too, so )01234567890( would not be valid.
Thanks in advance
This regex passes your examples, but might not be exactly what you're looking for. It should point you in the right direction.
^(?!.*-{2,})(?!(?:.*\)){2,})(?!(?:.*\(){2,})(?!\+{2,})(?:\D*\d\D*){11}$
(?!.*-{2,}) Cannot contain two or more consecutive hyphens.
(?!(?:.*\)){2,}) Cannot contain two or more closing parentheses.
(?!(?:.*\(){2,}) Cannot contain two or more opening parentheses.
(?!\+{2,}) Cannot start with two or more addition symbols.
(?:\D*\d\D*){11} Must contain exactly 11 digits, each optionally surrounded by non-digit characters.
However, this is very confusing and fairly inefficient. I bet the regex could be rewritten to be much quicker, but won't be much easier to understand.
I suggest that you follow MisterJack's suggestion instead of pursuing a regex. It'll be easier to maintain.
EDIT
^(?!.*--)(?!.*(\(|\)|\+).*\1)(?:\D*\d\D*){11}$
I've consolidated the parentheses and plus symbol rules into one negative lookahead using a backreference. This also restricts the number of parens and pluses to just one of each. I couldn't get it to restrict to just a certain set of characters, but you might be able to do that in a second pass with another regex.
^ Match from beginning of the string
(?!.*--) Do not allow consecutive hyphens
(?!.*(\(|\)|\+).*\1) Do not allow two or more instances of (, ) or +
(?:\D*\d\D*){11} Must contain 11 digits, allow non-digit characters before and after, such as hyphen.
$ Match to end of string
I tried a negative and positive lookahead to restrict the characters, but couldn't get it to work right. I also tried to replace \D with [()+-] but that didn't work either. Maybe someone else will add a comment to show how to restrict the characters. I'd sure love to see how someone else does it in this regex.
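The "second pass with another regex" mentioned above could look like the sketch below. It is written with Python's re module as an assumption (the question targets C#); the consolidated pattern is copied unchanged, while the ALLOWED pattern and the name is_valid are additions for illustration.

```python
import re

# Consolidated pattern from the EDIT above, copied unchanged.
SHAPE = re.compile(r"^(?!.*--)(?!.*(\(|\)|\+).*\1)(?:\D*\d\D*){11}$")

# Second pass: only digits, parentheses, plus and minus may appear.
ALLOWED = re.compile(r"[0-9()+-]+")


def is_valid(text: str) -> bool:
    """Apply the character whitelist first, then the shape rules."""
    return ALLOWED.fullmatch(text) is not None and SHAPE.match(text) is not None


valid = ["(0)+12-3-4-56-7890", "+)0(12345-678-90", "+01234567890",
         "+(01234567890)", "01234567890", "-01-234+5678-90",
         "(01234567890)", ")01234567890("]
invalid = ["0123456--7890", "0((1234567890", "01234567890))",
           "++01234567890", "123456", "++123456789"]

print(all(is_valid(s) for s in valid))
print(any(is_valid(s) for s in invalid))

# SHAPE alone accepts a stray letter, which the whitelist pass rejects.
print(SHAPE.match("0123456789a0") is not None, is_valid("0123456789a0"))
```

Keeping the whitelist as a separate pattern leaves each expression short enough to read on its own.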
I think that a regular expression isn't your best bet, because it could become too complicated and it can easily be broken.
What I suggest is to parse your input, i.e. to count how many numbers, minuses, pluses and parentheses the user entered, and check that they appear in the right order. An easy way to do this is to loop over the characters of the string and check whether the current char:
is a number (and we keep count of how many numbers we found)
is a minus (and the previous char isn't a minus)
is a plus (and it is the first one)
is a parenthesis (it's the first open parenthesis or it's a closed one and we already found the open parenthesis)
This could do the trick.
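The loop described above can be sketched as follows, in Python as an assumption (the question is about C#, where the same loop translates directly). The function name is_valid_number is hypothetical. This version follows the last bullet literally, so a closing parenthesis is only accepted after an opening one.

```python
DIGITS = "0123456789"


def is_valid_number(text: str) -> bool:
    """Single pass over the characters, applying the checks listed above."""
    digit_count = 0
    seen_plus = seen_open = seen_close = False
    previous = ""
    for ch in text:
        if ch in DIGITS:
            digit_count += 1              # keep count of the digits found
        elif ch == "-":
            if previous == "-":           # no two minuses in a row
                return False
        elif ch == "+":
            if seen_plus:                 # at most one plus
                return False
            seen_plus = True
        elif ch == "(":
            if seen_open:                 # at most one open parenthesis
                return False
            seen_open = True
        elif ch == ")":
            if seen_close or not seen_open:  # one close, only after an open
                return False
            seen_close = True
        else:
            return False                  # any other character is rejected
        previous = ch
    return digit_count == 11


print(is_valid_number("(0)+12-3-4-56-7890"))
print(is_valid_number(")01234567890("))
```

Because the parenthesis order is enforced here, )01234567890( and +)0(12345-678-90 are rejected, which the question says is acceptable. Dropping the "not seen_open" condition would accept them again.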

Shall this Regex do what I expect from it, that is, matching against "A1:B10,C3,D4:E1000"?

I'm currently writing a library where I wish to allow the user to be able to specify spreadsheet cell(s) under four possible alternatives:
A single cell: "A1";
Multiple contiguous cells: "A1:B10"
Multiple separate cells: "A1,B6,I60,AA2"
A mix of 2 and 3: "B2:B12,C13:C18,D4,E11000"
Then, to validate whether the input respects these formats, I intended to use a regular expression to match against. I have consulted this article on Wikipedia:
Regular Expression (Wikipedia)
And I also found this related SO question:
regex matching alpha character followed by 4 alphanumerics.
Based on the information provided within the above-linked articles, I would try with this Regex:
Default Readonly Property Cells(ByVal cellsAddresses As String) As ReadOnlyDictionary(Of String, ICell)
    Get
        Dim validAddresses As Regex = New Regex("A-Za-z0-9:,A-Za-z0-9")
        If (Not validAddresses.IsMatch(cellsAddresses)) Then _
            Throw New FormatException("cellsAddresses")
        ' Proceed with getting the cells from the Interop here...
    End Get
End Property
Questions
1. Is my regular expression correct? If not, please help me understand what expression I could use.
2. Which exception is likely to be the more meaningful, a FormatException or an InvalidExpressionException? I hesitate here, since the error relates to the format in which the property expects the cells to be input; on the other hand, I'm using a (regular) expression to match against.
Thank you kindly for your help and support! =)
I would try this one:
[A-Za-z]+[0-9]+([:,][A-Za-z]+[0-9]+)*
Explanation:
Between [] is a possible group of characters for a single position
[A-Za-z] means characters (letters) from 'A' to 'Z' and from 'a' to 'z'
[0-9] means characters (digits) from 0 to 9
A "+" appended to a part of a regex means: repeat that one or more times
A "*" means: repeat the previous part zero or more times.
( ) can be used to define a group
So [A-Za-z]+[0-9]+ matches one or more letters followed by one or more digits for a single cell-address.
Then that same block is repeated zero or more times, with a ',' or ':' separating the addresses.
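The explanation above can be tried against the four formats from the question. The sketch uses Python's re module as an assumption (the question uses VB.NET), with the pattern copied unchanged and a hypothetical helper name. fullmatch is used because the pattern itself carries no ^ or $ anchors.

```python
import re

# One or more letters plus one or more digits, then zero or more further
# addresses, each introduced by ':' or ','.
ADDRESSES = re.compile(r"[A-Za-z]+[0-9]+([:,][A-Za-z]+[0-9]+)*")


def looks_like_addresses(text: str) -> bool:
    """True when the whole string is a ':' or ',' separated address list."""
    return ADDRESSES.fullmatch(text) is not None


for sample in ["A1", "A1:B10", "A1,B6,I60,AA2", "B2:B12,C13:C18,D4,E11000",
               "A1:", "1A", "A1:B2:C3"]:
    print(sample, looks_like_addresses(sample))
```

The last sample is accepted: since ':' and ',' are interchangeable in this pattern, a chained range such as A1:B2:C3 passes, which may be more permissive than intended.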
Assuming that the column for the spreadsheet is any 1- or 2-letter value and the row is any positive number, a more complex but tighter answer still would be:
^[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?(,[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?)*$
"[A-Z]{1,2}[1-9]\d*" is the expression for a single cell reference. If you replace "[A-Z]{1,2}[1-9]\d*" in the above with <cell>, then the complex expression becomes
^<cell>(:<cell>)?(,<cell>(:<cell>)?)*$
which more clearly shows that it is a cell or a range followed by zero or more further "cell or range" entries, each preceded by a comma.
The row and column indicators could be further refined to give a tighter still, yet more complex expression. I suspect that the above could be simplified with look-ahead or look-behind assertions, but I admit those are not (yet) my strong suit.
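The cell placeholder idea maps naturally onto building the pattern from named pieces in code. A minimal sketch, assuming Python's re module for illustration (the names CELL, CELL_OR_RANGE, ADDRESS_LIST and is_valid_address_list are made up):

```python
import re

# A single cell: one or two capital letters, then a row without leading zero.
CELL = r"[A-Z]{1,2}[1-9]\d*"

# A cell, optionally extended to a range with ':' and a second cell.
CELL_OR_RANGE = rf"{CELL}(:{CELL})?"

# One "cell or range", then zero or more further ones separated by commas.
ADDRESS_LIST = re.compile(rf"^{CELL_OR_RANGE}(,{CELL_OR_RANGE})*$")


def is_valid_address_list(text: str) -> bool:
    """True when text is a comma-separated list of cells and ranges."""
    return ADDRESS_LIST.match(text) is not None


for sample in ["A1", "A1:B10", "A1,B6,I60,AA2", "B2:B12,C13:C18,D4,E11000",
               "A0", "AAA1", "A1:B2:C3"]:
    print(sample, is_valid_address_list(sample))
```

Composing the pattern this way yields the same long expression as above, while a change to the cell format only has to be made in one place.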
I'd go with this one, I think:
(([A-Z]+[1-9]\d*:)?[A-Z]+[1-9]\d*,)*([A-Z]+[1-9]\d*:)?[A-Z]+[1-9]\d*
This only allows capital letters as the prefix. If you want case insensitivity, use RegexOptions.IgnoreCase.
You could simplify this by replacing [A-Z]+[1-9]\d* with plain old [A-Z]\d+, but that will only allow a one-letter prefix, and it also allows stuff like A0 and B01. Up to you.
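The effect of the case-insensitive option and of the simplified cell pattern can be compared side by side. The sketch below uses Python's re module as an assumption, with re.IGNORECASE standing in for .NET's RegexOptions.IgnoreCase; the variable names are made up.

```python
import re

CELL = r"[A-Z]+[1-9]\d*"
PATTERN = rf"(({CELL}:)?{CELL},)*({CELL}:)?{CELL}"

STRICT = re.compile(PATTERN)                  # capital letters only
RELAXED = re.compile(PATTERN, re.IGNORECASE)  # case-insensitive variant
SIMPLE = re.compile(r"[A-Z]\d+")              # the simplified single cell

print(STRICT.fullmatch("B2:B12,C13:C18,D4,E11000") is not None)
print(STRICT.fullmatch("b2:b12") is not None,
      RELAXED.fullmatch("b2:b12") is not None)

# The simplified cell lets A0 and B01 through but rejects a two-letter column.
print([SIMPLE.fullmatch(s) is not None for s in ["A0", "B01", "AA2"]])
print(STRICT.fullmatch("A0") is not None)
```

fullmatch stands in for ^ and $ here, since the pattern in this answer is not anchored.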
EDIT:
Having thought hard about DocMax's mention of lookarounds, and using Hans Kesting's answer as inspiration, it occurs to me that this should work:
^[A-Z]+\d+((,|(?<!:\w*):)[A-Z]+\d+)*$
Or if you want something really twisted:
^([A-Z]+\d+(,|$|(?<!:\w*):))*(?<!,|:)
As in the previous example, replace \d+ with [1-9]\d* if you want to prevent leading zeros.
The idea behind the ,|(?<!:\w*): is that if a group is delimited by a comma, you want to let it through; but if it's a colon, it's only allowed if the previous delimiter wasn't a colon. The (,|$|...) version is madness, but it allows you to do it all with only one [A-Z]+\d+ block.
However! Even though this is shorter, and I'll admit I feel a teeny bit clever about it, I pity the poor fellow who has to come along and maintain it six months from now. It's fun from a code-golf standpoint, but I think it's best for practical purposes to go with the earlier version, which is a lot easier to read.
I think your regex is incorrect; try (([A-Za-z0-9]*)[:,]?)*
Edit: to correct the bug pointed out by Baud: (([A-Za-z0-9]*)[:,]?)*([A-Za-z0-9]+)
And finally, the best version: (([A-Za-z]+[0-9]+)[:,]?)*([A-Za-z]+[0-9]+)
// Ah OK, this probably won't work... but to answer 1: no, I don't think your regex is correct.
( ) form a group
[ ] form a character class (you can use ranges such as A-Z, a-d, 0-9, or just single characters)
? means zero or one
* means zero or more
I'd suggest reading http://www.regular-expressions.info/reference.html .
That's where I learned regexes some time ago ;)
And for building expressions I use Rad Software Regular Expression Designer.
Let's build this step by step.
If you are following an Excel addressing format, to match a single-cell entry in your CSL, you would use the regular expression:
[A-Z]{1,2}[1-9]\d*
This matches the following in sequence:
Any character in A to Z once or twice
Any digit in 1 to 9
Any digit zero or more times
The digit expression will prevent inputting a cell address with leading zeros.
To build the expression that allows for a cell address pair, repeat the expression preceded by a colon as optional.
[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?
Now allow for repeating the pattern preceded by a comma zero or more times and add start and end string delimiters.
^[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?(,[A-Z]{1,2}[1-9]\d*(:[A-Z]{1,2}[1-9]\d*)?)*$
Kind of long and obnoxious, I admit, but after trying enough variants, I can't find a way of shortening it.
Hope this is helpful.
