I'd like to create a regular expression that checks that a string starts with:
L.LL.#
Anything after this first number is irrelevant. Is regex the best approach to solving this problem?
The L implies a letter and the # implies any number.
Use Regex.IsMatch(input, "^[a-zA-Z]\\.[a-zA-Z]{2}\\.\\d")
Assuming L represents a letter in the range A-Z or a-z.
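For illustration, here is that one-liner wrapped in a small helper. The class and method names are hypothetical; the pattern is the one from the answer, written as a verbatim string so the backslashes need no doubling. Since it has a ^ but no trailing $, anything after the first digit is ignored:

```csharp
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // ^ anchors at the start of the string, \. is a literal dot, \d is one digit.
    // There is no trailing $, so whatever follows the first digit is irrelevant.
    private static readonly Regex Prefix = new Regex(@"^[a-zA-Z]\.[a-zA-Z]{2}\.\d");

    public static bool StartsWithPrefix(string input) => Prefix.IsMatch(input);
}
```

A regex seems a reasonable fit for a fixed prefix like this. Note that in .NET \d also matches non-ASCII decimal digits, so [0-9] is the stricter choice if that matters.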
Imagine a string that contains special characters like $§%%,., numbers and letters.
I want to get the chunks of letters and numbers of an arbitrary string as an array of strings.
A good solution seems to be the use of regex, but I don't know how to express [numbers and letters].
// example
"abc" = {"abc"};
"ab .c" = {"ab", "c"}
"ab123,cd2, ,,%&$§56" = {"ab123", "cd2", "56"}
// try
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));
To extract chunks of 1 or more letters/digits you may use:
[A-Za-z0-9]+ # ASCII only letters/digits
[\p{L}0-9]+ # Any Unicode letters and ASCII only digits
[\p{L}\p{N}]+ # Any Unicode letters/digits
C# usage:
string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
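A self-contained sketch of the Unicode-aware variant, checked against the examples from the question (`Chunker` and `GetChunks` are hypothetical names):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class Chunker
{
    // One or more Unicode letters (\p{L}) or numbers (\p{N}) in a row.
    private static readonly Regex Chunk = new Regex(@"[\p{L}\p{N}]+");

    public static string[] GetChunks(string input) =>
        Chunk.Matches(input)
             .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
             .Select(m => m.Value)   // keep only the matched text
             .ToArray();
}
```

Matching the chunks is arguably simpler than Regex.Split with the inverse character class, because Split returns an empty string when the input starts or ends with a separator.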
Yes, regex is indeed a good solution for this.
And in fact, to just match all standard words in the input sequence, this is all you need:
(\w+)
Let me quickly explain:
\w matches any word character. In .NET that means Unicode letters, decimal digits, nonspacing marks and connector punctuation such as _, so for plain ASCII input it behaves like [a-zA-Z0-9_]. You might want to use [a-zA-Z0-9] instead to avoid matching the underscore.
Wrapping an expression in () means that you want to capture that part as a group.
The + means that you want sequences of 1 or more of the preceding characters.
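To see the underscore (and Unicode) difference in practice, here is a minimal comparison; `WordMatcher.AllMatches` is a hypothetical helper:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Returns the text of every non-overlapping match of the pattern in the input.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

With the input "foo_bar baz", the pattern (\w+) yields foo_bar and baz, while [a-zA-Z0-9]+ yields foo, bar and baz. Likewise, \w+ keeps "café" whole, whereas the ASCII class stops at "caf".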
I am looking for a regular expression that returns success only if the input string contains nothing but the following characters:
a-zA-Z0-9~!#$^ ()_-+’:.?
Is this regular expression correct?
^[a-zA-Z0-9~!#$^ ()_-+’:.?]+$
I understand what ^ means here, but I'm not sure about +$. Also, are there any alternatives to this? By the way, the regular expression above also includes a space character between ^ and (.
To check that it only contains the characters listed above, you can remove every allowed character and see whether anything is left:
bool invalidCharsExist =
    Regex.Replace(input, @"[a-zA-Z0-9~!#\$\^\ \(\)_\-\+’:\.\?]", "").Length != 0;
BTW: This is not fully equivalent to your regex (it also accepts non-ASCII letters and digits), but I think it is a better way to check:
var specialChars = new HashSet<char>("~!#$^ ()_-+’:.?");
var allValid = input.All(c => char.IsLetterOrDigit(c) || specialChars.Contains(c));
Close, but get rid of that dash in the middle of your character class and put it at the beginning:
^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$
And make sure that when you put it in a string you use the @ prefix, which makes it a verbatim string literal:
@"^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$"
As for whether you can do it in other ways: sure, for example with a negative look-ahead that doesn't actually match anything. I don't think a decent regex engine would make one noticeably better than the other; it's just a matter of preference. Do you want something that looks for success (matches the entire string if it is valid), or something that looks for failure (a negative look-ahead)?
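A sketch of the two styles side by side, using the pattern with the dash moved to the front. The look-ahead version is an assumed construction of what the answer describes, not code from the answer, and the class name is hypothetical. One visible difference: for an empty string the + version fails, while the look-ahead succeeds because there is nothing disallowed in it.

```csharp
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Looks for success: the whole string must consist of allowed characters.
    // The dash is first in the class, so it is a literal rather than a range.
    // Note: in .NET, $ also matches just before a final '\n'; use \z to rule that out.
    private static readonly Regex WholeString =
        new Regex(@"^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$");

    // Looks for failure: a negative look-ahead asserting that no disallowed
    // character appears anywhere. It consumes nothing (zero-length match).
    private static readonly Regex NoDisallowed =
        new Regex(@"^(?!.*[^-a-zA-Z0-9~!#$^ ()_+’:.?])");

    public static bool ViaWholeMatch(string input) => WholeString.IsMatch(input);
    public static bool ViaLookahead(string input) => NoDisallowed.IsMatch(input);
}
```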
Honestly, if performance is at all important, you should write a good old for loop over the characters (or the equivalent LINQ implementation). Regex won't even be in the ballpark.
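A sketch of such a loop for the character set in the question (class and method names are hypothetical). The explicit ranges keep it ASCII-only like the regex, whereas char.IsLetterOrDigit would also accept non-ASCII letters and digits:

```csharp
public static class AllowedCharsLoop
{
    private const string Specials = "~!#$^ ()_-+’:.?";

    // Accepts a non-empty string made only of ASCII letters, ASCII digits
    // and the special characters listed above.
    public static bool IsValid(string input)
    {
        if (input.Length == 0) return false;   // mirrors the regex's +: at least one char
        foreach (char c in input)
        {
            bool ok = (c >= 'a' && c <= 'z')
                   || (c >= 'A' && c <= 'Z')
                   || (c >= '0' && c <= '9')
                   || Specials.IndexOf(c) >= 0;
            if (!ok) return false;
        }
        return true;
    }
}
```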
The regular expression would be: ^[a-zA-Z0-9~!#$^ ()_\-+’:.?]+$
I personally recommend using https://regex101.com to check regular expressions. Note that at the time of writing it had no C# flavor, but in general JavaScript's RegExp syntax is similar to C#'s. What it does give you is a particularly useful explanation of what your expression is doing; here is this expression's explanation from there:
^ assert position at start of the string
[a-zA-Z0-9~!#$^ ()_\-+’:.?]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
0-9 a single character in the range between 0 and 9
~!#$^ ()_ a single character in the list ~!#$^ ()_ literally
\- matches the character - literally
+’:.? a single character in the list +’:.? literally
$ assert position at end of the string
The only issue with what you put in the question was forgetting to escape the -, which is reserved for special purposes in a regular expression pattern: inside [] the - declares a character range such as a-z. Unescaped, _-+ is read as a range from _ to +, which is in reverse order, so .NET rejects the pattern with an exception.
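A quick way to see this in .NET (the `DashDemo.Parses` helper is hypothetical): constructing a Regex from an invalid pattern throws an ArgumentException, so the unescaped version fails to parse while the escaped and dash-first versions are accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashDemo
{
    // Returns true if the .NET regex parser accepts the pattern.
    public static bool Parses(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)   // invalid patterns throw ArgumentException (or a subclass)
        {
            return false;
        }
    }
}
```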
I am trying to use regular expressions to find a string sequence inside a string.
The pattern I am looking for is:
dd.dd.dddd dd:dd:dd //d is a digit from 0-9
My regex is:
Regex r = new Regex(@"(\d[0-9]{2}.\d[0-9]{2}.\d[0-9]{4}\s\d[0-9]{2}:\d[0-9]{2}:\d[0-9]{2})$");
I am now trying to check whether the string "27.11.2014 09:14:59" matches the regex, but sadly it doesn't.
string str = "27.11.2014 09:14:59";
Regex r = new Regex(@"(\d[0-9]{2}.\d[0-9]{2}.\d[0-9]{4}\s\d[0-9]{2}:\d[0-9]{2}:\d[0-9]{2})$");
bool test = r.IsMatch(str, 0);
//output: test=false
Does anyone know why the string doesn't match that regular expression?
\d[0-9]{2} matches three digits:
\d first digit
[0-9] second digit
{2} causes the previous expression ([0-9]) to match again
If you remove all occurrences of \d, your pattern should work. You should escape all dots (\.), though, because right now they match any character, not just a literal dot.
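Applying both fixes (dropping the extra \d and escaping the dots) gives something like the following sketch. The class name is hypothetical, and [0-9] is kept instead of \d so that only ASCII digits are accepted:

```csharp
using System.Text.RegularExpressions;

public static class TimestampCheck
{
    // dd.dd.dddd dd:dd:dd : each [0-9]{n} is exactly n digits, \. is a literal dot,
    // \s is one whitespace character, and $ anchors the match at the end.
    private static readonly Regex Timestamp =
        new Regex(@"([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2})$");

    public static bool EndsWithTimestamp(string input) => Timestamp.IsMatch(input);
}
```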
As Rawing already said, the regular expression above tries to match one digit too many in each group. For everyone who wants to know what the regular expression should look like:
That's working, at least for me.
I want to check in a C# program whether a user input is a single word. The word may only contain the characters A-Z and a-z: no spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
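A small sketch contrasting the asker's pattern with the anchored one (the class name is hypothetical). Because [A-Za-z]* can match the empty string at any position, IsMatch returns true for every input, which is why the original check never fails:

```csharp
using System.Text.RegularExpressions;

public static class SingleWord
{
    // Unanchored with '*': an empty match is always possible, so IsMatch is always true.
    private static readonly Regex Unanchored = new Regex("[A-Za-z]*");

    // Anchored with '+': the whole input must be one or more ASCII letters.
    private static readonly Regex Anchored = new Regex("^[A-Za-z]+$");

    public static bool LooseCheck(string userInput) => Unanchored.IsMatch(userInput);
    public static bool StrictCheck(string userInput) => Anchored.IsMatch(userInput);
}
```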
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing: it will succeed even if the string contains zero letters, so it matches any input.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know what C#'s regex syntax is, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include ^ and $, it will match any part of the string that has alphabetic characters in it.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word at the start of a string, but only if the string actually starts with a word character. Using that, you can simply check whether A) the string matches, and B) the length of the match equals your full string's length:
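public bool IsSingleWord(string userInput)
{
    // Lazily match from the start up to the first word boundary, i.e. the first word.
    Regex firstWordRegex = new Regex("^\\b.+?\\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    // A single word means the first word spans the whole input.
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}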
Others have already written how to solve the problem you know about. Now I'll talk about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
where \p{L} is any letter and \p{M}* is zero or more diacritic marks (in Unicode, diacritics can be "separated" from base letters, so you can have something like a + ` = à, or you can have precomposed characters like the standard à).
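The exact pattern from this answer is not shown above, so the following is an assumed reconstruction from the pieces it describes, ^(?:\p{L}\p{M}*)+$, wrapped in a hypothetical helper. It accepts both the precomposed à (U+00E0) and an a followed by a combining grave accent (U+0300):

```csharp
using System.Text.RegularExpressions;

public static class UnicodeWord
{
    // Each unit is one letter (\p{L}) followed by any number of combining marks (\p{M});
    // the anchors require the whole string to consist of such units.
    private static readonly Regex Word = new Regex(@"^(?:\p{L}\p{M}*)+$");

    public static bool IsSingleWord(string userInput) => Word.IsMatch(userInput);
}
```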
If you only need the characters a-zA-Z, you could simply iterate over the characters and check whether each one is inside your ranges,
for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could improve performance.
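The pseudocode above, written out as a C# sketch (class and method names are hypothetical). Unlike the regex-free char.IsLetter, the explicit ranges reject accented letters, matching the asker's A-Z/a-z requirement:

```csharp
public static class AsciiWord
{
    // Accepts a non-empty string made only of the ASCII letters a-z and A-Z.
    public static bool IsSingleWord(string userInput)
    {
        if (string.IsNullOrEmpty(userInput)) return false;
        foreach (char c in userInput)
        {
            if (!(('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')))
                return false;
        }
        return true;
    }
}
```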
I need a regular expression for C# that can match the following pattern.
Basically, the input should contain at least one digit and be at least 8 characters long, including the digit. If possible, please explain the regex too.
I'd probably check with two statements. First just check the length, e.g.
string.Length > 7
and then make sure that this regex can find a match...
You can use a look-ahead assertion to verify the length, and then search forward for a digit, thus:
We look-ahead for 8 characters, and if that is successful, then we actually attempt to match "anything, followed by a digit".
But really, don't do this. Just check the length explicitly. It's much clearer.
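The look-ahead pattern itself is not shown above; the sketch below assumes one of the form ^(?=.{8}).*\d and puts it next to the explicit length check (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // (?=.{8}) peeks at 8 characters without consuming them; .*\d then requires a digit.
    private static readonly Regex LookAhead = new Regex(@"^(?=.{8}).*\d");

    public static bool ViaLookAhead(string s) => LookAhead.IsMatch(s);

    // The clearer alternative: check the length explicitly, then look for a digit.
    public static bool ViaLength(string s) =>
        s.Length >= 8 && Regex.IsMatch(s, "[0-9]");
}
```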
Your regular expression pattern should just be \d+ (match 1 or more digits). For your example, it's probably best not to check the minimum length with a regex, since all you care about is that the string has at least 1 digit and is at least 8 characters long.
Regex regEx = new Regex(@"\d+");
bool isValid = regEx.Match(myString).Success && myString.Length >= 8;
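The pattern \d matches a digit (roughly [0-9], but see the caveat below), and the + symbol means "one or more". The @ symbol in front of the string makes it a verbatim string literal, so the C# compiler doesn't try to interpret \d as an escape sequence.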
As mentioned by El Ronnoco in the comments, just \d would match your requirement. Knowing about \d+ is useful for more complicated patterns where you want a few digits in between some strings, etc.
Also: I've just read something that I didn't know. \d matches any character in the Unicode "Number, Decimal Digit" (Nd) category, which is a lot more than just [0-9]. That is something to be aware of: it is fine if you want any digit, but if you only want 0-9, El Ronnoco's answer of [0-9] is sufficient for your pattern.
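A small demonstration of the difference (the helper names are hypothetical): U+0663, ARABIC-INDIC DIGIT THREE, is a Unicode decimal digit, so \d matches it but [0-9] does not. RegexOptions.ECMAScript restricts \d to [0-9]:

```csharp
using System.Text.RegularExpressions;

public static class DigitDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE belongs to the Unicode Nd category.
    public const string ArabicIndicThree = "\u0663";

    public static bool MatchesBackslashD(string s) => Regex.IsMatch(s, @"\d");
    public static bool MatchesAsciiRange(string s) => Regex.IsMatch(s, "[0-9]");

    // With RegexOptions.ECMAScript, \d is equivalent to [0-9].
    public static bool MatchesEcmaScriptD(string s) =>
        Regex.IsMatch(s, @"\d", RegexOptions.ECMAScript);
}
```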