How to create a regular expression based on some condition

I want to create a regular expression that finds and replaces uppercase characters based on a condition.
Find the first uppercase character of each group of uppercase characters in a string, replace the group with lowercase, and put a * before that first uppercase character.
If a lowercase character follows an uppercase one, replace the uppercase character with lowercase and put a * before it.
Input string: stackOVERFlow
Expected output: stack*over*flow
I tried but could not get it working reliably.
Any ideas on how to write such a regular expression?
Thanks

Well, the expected input and output are slightly inconsistent: you're lowercasing the "F" in "Flow" but not including it between the asterisks.
Anyway, the regex you want is pretty simple: @"[A-Z]+". This matches a run of one or more uppercase ASCII letters. The quantifier has to be greedy here: a lazy +? with nothing after it stops after a single letter, so every uppercase letter would get its own pair of asterisks.
Now, to do the find/replace, you would do something like the following:
Regex.Replace(inputString, @"([A-Z]+)", "*$1*").ToLower();
This finds all runs of one or more uppercase letters and replaces each run with itself surrounded by asterisks. This does the surrounding but not the lowercasing; .NET replacement patterns have no syntax for changing case. However, since the end result of the operation should be a string with all lowercase characters, just call ToLower() on the result and you'll get the expected output.
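If only the matched runs should be lowercased, rather than the whole string, one option is the Regex.Replace overload that takes a MatchEvaluator delegate. The following is a minimal sketch; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarWrapSketch
{
    // Hypothetical helper: wraps each run of uppercase ASCII letters in
    // asterisks and lowercases only that run, leaving the rest untouched.
    public static string WrapUppercaseRuns(string input)
    {
        return Regex.Replace(input, "[A-Z]+",
            m => "*" + m.Value.ToLowerInvariant() + "*");
    }

    public static void Main()
    {
        Console.WriteLine(WrapUppercaseRuns("stackOVERFlow")); // stack*overf*low
    }
}
```

Because the lowercasing happens inside the delegate, characters outside the matches keep their original case.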

KeithS's solution can be simplified a bit:
Regex.Replace("stackOVERFlow", "[A-Z]+", "*$0*").ToLower()
However, this yields stack*overf*low, with the f between the stars. If you want to exclude the last uppercase letter, use the following expression:
Regex.Replace("stackOVERFlow", "[A-Z]+(?=[A-Z])", "*$0*").ToLower()
It yields stack*over*flow.
This uses the pattern find(?=suffix), a positive lookahead: find matches only where suffix follows, and suffix itself does not become part of the match.
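The difference between the two patterns can be checked side by side. This is a small sketch with a hypothetical helper named Star; only the two patterns come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSketch
{
    // Hypothetical helper: surrounds every match of 'pattern' with
    // asterisks, then lowercases the whole result.
    public static string Star(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "*$0*").ToLowerInvariant();
    }

    public static void Main()
    {
        // Plain run of uppercase letters: the F ends up inside the stars.
        Console.WriteLine(Star("stackOVERFlow", "[A-Z]+"));          // stack*overf*low
        // Run that must be followed by another uppercase letter.
        Console.WriteLine(Star("stackOVERFlow", "[A-Z]+(?=[A-Z])")); // stack*over*flow
    }
}
```

One caveat of the lookahead version: it always leaves the last uppercase letter of a run outside the match, even when no lowercase letter follows, so "stackAB" becomes "stack*a*b".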

Related

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
According to the documentation, RegularExpressionAttribute "specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression".
This means I need "abcdefg" to match and ".abcdefg" to not match. Basically, I need to negate the whole expression I have above.

You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape the / character in a .NET regex, since .NET patterns are not wrapped in regex delimiters, and ? needs no escaping inside a character class.
The C# declaration of the last pattern will look like
@"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttribute:
[RegularExpression(
    @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
    ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]
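The question's EDIT asks for the opposite behaviour: valid names should match, and names with a forbidden character or a leading or trailing dot should not. One way to sketch that is with two negative lookaheads and a negated character class. The helper below emulates the whole-string requirement with plain Regex so that it runs without MVC; the class name, the method name, and the pattern itself are assumptions for illustration, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameRuleSketch
{
    // Assumed pattern: no leading dot, no trailing dot, and none of the
    // forbidden characters anywhere in the value.
    public const string ValidPattern =
        @"^(?![.])(?!.*[.]$)[^ \][;/\\?:*""<>|+=]*$";

    // Accepts a value only when the match spans the entire string.
    public static bool IsValid(string value)
    {
        Match m = Regex.Match(value, ValidPattern);
        return m.Success && m.Index == 0 && m.Length == value.Length;
    }

    public static void Main()
    {
        foreach (string s in new[] { "abcdefg", ".abcdefg", "abc.defg", "abcdefg.", "abc[defg" })
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```

With this pattern "abcdefg" and "abc.defg" are accepted, while ".abcdefg", "abcdefg." and "abc[defg" are rejected. The same pattern string could then be passed to [RegularExpression(...)].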

Your regex is an alternation of three branches: a character class that matches any one of the listed characters, a dot at the start of the string, and a dot at the end of the string.
It works with Regex.IsMatch() because one of the branches matches somewhere in the string, but it does not match the whole string, which is what you need here.
You could use three branches where the first matches a dot followed by the repeated character class until the end of the string, and the second does it the other way around, with the dot at the end of the string.
The third uses a positive lookahead to assert that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$
Regex demo

Using a regular expression to find words with lower and upper case characters in parenthesis

I'm creating a program that searches a string for words in parentheses that contain both lowercase and uppercase characters, but I can't figure out what regular expression to use. Example word: (LowerUpper)
Regular expression:
string upperLowerParens = "\\([A-Z][a-z][a-z]+[A-Z]+\\)";

Try this one:
\(([A-Z]+[a-z]+)[A-Za-z]*\)|\(([a-z]+[A-Z]+[A-Za-z]*)\)
Explanation
The pattern is divided into two parts; if either of them matches, the word between the parentheses is matched.
\(([A-Z]+[a-z]+)[A-Za-z]*\) matches something like (LowerUpper), where capital letters appear first
\(([a-z]+[A-Z]+[A-Za-z]*)\) matches something like (upperLower), where lowercase letters appear first
Note that the class is written [A-Za-z]; the range A-z would also cover the punctuation characters that sit between Z and a in ASCII.
If you want to allow spaces just inside the parentheses, you can change your regex to something like this: \( *([A-Z]+[a-z]+)[A-Za-z]* *\)|\( *([a-z]+[A-Z]+[A-Za-z]*) *\)
Check the demo here: Demo

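To ensure the word in parentheses has at least one uppercase and one lowercase letter, and contains ONLY letters, you can try the following (with case-sensitive matching). The lookaheads use [^)]* rather than .*? so that they cannot look past the closing parenthesis into a later group:
(?=\([^)]*[A-Z])(?=\([^)]*[a-z])\([A-Za-z]+\)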
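A lookahead-based pattern of this kind can be tried on a sample sentence. The sketch below uses a variant whose lookaheads are limited to [^)]* so they stay inside one pair of parentheses; the class name, method name, and sample text are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MixedCaseParensSketch
{
    // Letters only between the parentheses, with at least one uppercase
    // and at least one lowercase letter before the closing parenthesis.
    private static readonly Regex Pattern = new Regex(
        @"(?=\([^)]*[A-Z])(?=\([^)]*[a-z])\([A-Za-z]+\)");

    // Returns every parenthesized mixed-case word found in 'text'.
    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string sample = "(LowerUpper) (lower) (UPPER) (upperLower) (abc) (DEF)";
        Console.WriteLine(string.Join(", ", Find(sample))); // (LowerUpper), (upperLower)
    }
}
```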

The problem with the first regex I posted in the question was that it only returned the first three letters. I have since found that the following regex works for the stated example, and I verified it on regex101.com:
[A-Z][a-z][a-z]+[A-Z]+[a-z][a-z]+

C# Regular Expression for String matching

I am looking for a regular expression that returns success only if the input string consists solely of the following characters:
a-zA-Z0-9~!#$^ ()_-+’:.?
Is this regular expression correct?
^[a-zA-Z0-9~!#$^ ()_-+’:.?]+$
I understand what ^ means here, but I am not sure about +$. Also, are there any alternatives to this? By the way, the above regular expression also includes a space character between ^ and (

To check that the input contains only the characters listed above, remove every allowed character and see whether anything is left:
bool invalidCharsExist =
    Regex.Replace(input, @"[a-zA-Z0-9~!#\$\^\ \(\)_\-\+’:\.\?]", "").Length != 0;
BTW: I think the following is a better way to check, although it is not fully equivalent to your regex (char.IsLetterOrDigit also accepts non-ASCII letters and digits):
var specialChars = new HashSet<char>("~!#$^ ()_-+’:.?");
var allValid = input.All(c => char.IsLetterOrDigit(c) || specialChars.Contains(c));

Close, but get rid of that dash in the middle of your character class and put it at the beginning:
^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$
And make sure that when you put it in a string you use a verbatim string literal (the @ prefix), so that backslashes in a pattern are not treated as C# escape sequences:
@"^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$"
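The reason the dash has to move: inside a character class, _-+ is read as a range from _ to +, and because _ sorts after +, .NET rejects the pattern when the Regex is constructed. The sketch below checks both versions; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassRangeSketch
{
    public const string Original = @"^[a-zA-Z0-9~!#$^ ()_-+’:.?]+$";
    public const string Fixed = @"^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$";

    // Returns false when the Regex constructor rejects the pattern.
    public static bool Compiles(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Compiles(Original));                      // False
        Console.WriteLine(Compiles(Fixed));                         // True
        Console.WriteLine(Regex.IsMatch("Hello (world)!", Fixed));  // True
        Console.WriteLine(Regex.IsMatch("a@b", Fixed));             // False
    }
}
```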
As to whether or not you can do it in other ways, sure, for example a negative look-ahead that doesn't actually match anything. I don't think a proper regex optimizer would leave one better than the other, it's just a matter of preference. Do you want something that looks to succeed (selects the entire string if valid), or something that looks to fail (negative look-ahead).
Honestly, if performance is at all important, you should write a good old for loop over the characters (or the equivalent LINQ implementation). Regex won't even be in the ballpark.
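A loop-based check might look like the sketch below. It is an illustration under stated assumptions: the class and method names are made up, only ASCII letters and digits are accepted (as in the regex), and an empty string is rejected to mirror the + quantifier. No performance measurement is implied.

```csharp
using System;

public static class AllowedCharsSketch
{
    // The non-alphanumeric characters from the question's list.
    private const string Extra = "~!#$^ ()_-+’:.?";

    public static bool IsAllowed(string input)
    {
        // Mirror the + quantifier: at least one character is required.
        if (string.IsNullOrEmpty(input)) return false;

        foreach (char c in input)
        {
            bool asciiLetterOrDigit =
                (c >= 'a' && c <= 'z') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9');

            // Reject the first character that is in neither set.
            if (!asciiLetterOrDigit && Extra.IndexOf(c) < 0) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("Hello (world)!")); // True
        Console.WriteLine(IsAllowed("a@b"));            // False
    }
}
```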

the regular expression would be: ^[a-zA-Z0-9~!#$^ ()_\-+’:.?]+$
I personally recommend using https://regex101.com to check regular expressions. Note that it doesn't have C# support, but in general JavaScript's RegExp has similar syntax to C#. What it does give you is a particularly useful explanation of what your expression is doing. Here is its explanation of this expression:
^ assert position at start of the string
[a-zA-Z0-9~!#$^ ()_\-+’:.?]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
0-9 a single character in the range between 0 and 9
~!#$^ ()_ a single character in the list ~!#$^ ()_ literally
\- matches the character - literally
+’:.? a single character in the list +’:.? literally
$ assert position at end of the string
The only issue with the pattern in the question was the unescaped -, which is reserved inside a character class: in the [] notation, - declares a character range such as a-z.

C# regex match behaviour

I've got this line in my code:
Match match = Regex.Match(actualValue, regexValue, RegexOptions.None);
I've got a simple question: why, when I check for success with the line
if(match.Success)
does the match succeed with the following values?
actualValue = "G:1"
regexValue = "A*"
The actual value does not seem to fit the pattern, at least to me, so I am probably missing something.
What I want to achieve is simply to receive an actual value and a regular expression and check whether the actual value fits the regular expression. I thought that's what I did there, but apparently I didn't.
EDIT: Another question: is there a way to treat the * as an "any characters" wildcard? That is, can A* be read as an A followed by any characters?

Your code itself is correct; your regular expression isn't.
Based on your comments on other answers, you're after a regular expression which matches any string which starts with A, and you're assuming that '*' means "any characters". '*' in fact means "match the preceding character zero or more times", and your pattern has no anchor, so the regular expression you've given means "match zero or more 'A' characters anywhere in the string". Since zero occurrences is allowed, it matches absolutely anything, if only as an empty match.
If you're looking for a regular expression that matches the whole string but only if it starts with 'A', the regular expression you're after is ^A.*. The '.' character in a regular expression means "match any character". This regular expression thus means "match the start of the string, followed by an 'A', followed by zero or more other characters" and will thus match the entire string provided it starts with 'A'.
However, you already have the whole string, so this is a little unnecessary - all you really want to do is get an answer to the question "does the string start with an 'A'?". A regular expression that will achieve this is simply '^A'. If it matches, the string started with an 'A'.
Of course, it should be pointed out that you don't need a regular expression to confirm this anyway. If this is genuinely all you want to do (and it's possible you've just put together a simple example, and your real scenario is more complicated), why not just use the StartsWith method?:
bool match = actualValue.StartsWith("A");
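To see why the original check succeeds, it helps to look at what was actually matched. The sketch below uses a hypothetical helper, MatchLength, that returns the length of the first match or -1 when there is none.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarQuantifierSketch
{
    // Hypothetical helper: length of the first match, or -1 if no match.
    public static int MatchLength(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Length : -1;
    }

    public static void Main()
    {
        Console.WriteLine(MatchLength("G:1", "A*"));    // 0: succeeds with an empty match
        Console.WriteLine(MatchLength("G:1", "A+"));    // -1: at least one A is required
        Console.WriteLine(MatchLength("G:1", "^A"));    // -1: does not start with A
        Console.WriteLine(MatchLength("A:1", "^A.*"));  // 3: the whole string
        Console.WriteLine("A:1".StartsWith("A", StringComparison.Ordinal)); // True
    }
}
```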

The regex matches because A* means "look for 0 or more occurrences of 'A'". It will match any string.
If you meant to look for an arbitrary number of 'A', but at least one, try A+ instead.

Looking at the comments it looks like you're trying to match a lot of strings starting with A.
If they're separated by white space you could find all of them using the following:
bool matched = Regex.IsMatch(actualValue, @"\bA\w+");
For "Atest flkjs Apple Ascii cAse" this returns true; the pattern finds Atest, Apple and Ascii, but not the A inside cAse.
If there is only one string you're matching and it starts with A and has no spaces:
bool matched = Regex.IsMatch(actualValue, @"^A\w+$");
This matches "Apple", but not "Apple and orange", as the second string has spaces.
As Chris noted, * is not a wildcard in the way you meant with regex searches. You can find some information to get you started with regexes at regex-info.
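For the EDIT in the question (treating * as an "any characters" wildcard), one possible approach is to translate the wildcard pattern into a regex first: escape everything, then turn each escaped * into .*. This is a sketch under that assumption; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Escapes regex metacharacters, then maps the wildcard * to ".*".
    // The result is anchored so the whole input has to fit.
    public static string WildcardToRegex(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*") + "$";
    }

    public static bool IsWildcardMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, WildcardToRegex(wildcard), RegexOptions.Singleline);
    }

    public static void Main()
    {
        Console.WriteLine(WildcardToRegex("A*"));          // ^A.*$
        Console.WriteLine(IsWildcardMatch("Apple", "A*")); // True
        Console.WriteLine(IsWildcardMatch("G:1", "A*"));   // False
    }
}
```

Escaping first means other characters in the wildcard pattern, such as a dot, are matched literally.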

The Regex class takes the regular expression in its constructor.
An example in your case could be:
if (new Regex("A*").IsMatch(actualValue))
{
    // Do something
}
If you are unsure of the regex pattern, try it out here

How can you match words with more than one character?

I would like to use a regular expression to match all words that contain more than one distinct character, as opposed to words made entirely of the same character.
This should not match: ttttt, rrrrr, ggggggggggggg
This should match: rttttttt, word, wwwwwwwwwu

The following expression will do the trick.
^(?<FIRST>[a-zA-Z])[a-zA-Z]*?(?!\k<FIRST>)[a-zA-Z]+$
- capture the first character into the group FIRST
- match some more characters, lazily, so that the check in the next step is tried at the earliest possible position
- ensure that the next character is different from FIRST using a negative lookahead assertion
- match all remaining characters (at least one, which the assertion has just checked)
Note that it is sufficient to look for a character that is different from the first one, because if no character is different from the first one, all characters are equal.
You can shorten the expression to the following.
^(\w)\w*?(?!\1)\w+$
This will match some more characters other than [a-zA-Z].
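The shortened expression can be checked against the examples from the question. The class and method names in this sketch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DistinctCharsSketch
{
    // First character captured, then somewhere later a character that
    // differs from it.
    private static readonly Regex Pattern = new Regex(@"^(\w)\w*?(?!\1)\w+$");

    public static bool HasTwoDistinctChars(string word)
    {
        return Pattern.IsMatch(word);
    }

    public static void Main()
    {
        foreach (string w in new[] { "ttttt", "rrrrr", "rttttttt", "word", "wwwwwwwwwu" })
        {
            Console.WriteLine(w + " -> " + HasTwoDistinctChars(w));
        }
    }
}
```

The first two words print False and the last three print True.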

I would add all unique words to a list and then use this regex
\b(\w)\1+\b
to find all words made of a single repeated character and get rid of them.

This doesn't use a regular expression, but I believe it will do what you require:
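// Requires: using System.Linq;
public bool Match(string str)
{
    // True for a null or empty string, or when any later character
    // differs from the first one.
    return string.IsNullOrEmpty(str)
        || str.Skip(1).Any(c => !c.Equals(str[0]));
}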

The following RE will do the opposite of what you're asking for: match where a word is composed of the same character. It may still be useful to you though.
\b(\w)\1*\b

\b\w*?(\w)\1*(?:(?!\1)\w)\w*\b
or
\b(\w)(?!\1*\b)\w*\b
This assumes you're plucking the words out of some larger text; that's why it needs the word boundaries and the padding. If you have a list of words and you're just trying to validate the ones that meet the criteria, a much simpler regex would probably do:
(.)(?:(?!\1).)
...because you already know each word contains only word characters. On the other hand, depending on your definition of "word" you might need to replace \w in the first two regexes with something more specific, like [A-Za-z].
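As an illustration of the word-boundary variant, the sketch below pulls the qualifying words out of a longer text with the second pattern above. The class name, method name, and sample text are assumptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordsInTextSketch
{
    // A word whose first character is NOT simply repeated up to the
    // end of the word.
    private static readonly Regex Pattern = new Regex(@"\b(\w)(?!\1*\b)\w*\b");

    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string sample = "ttttt word rrrrr wwwwwwwwwu a";
        Console.WriteLine(string.Join(", ", Find(sample))); // word, wwwwwwwwwu
    }
}
```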
