Regex for Name, Streetname, Cityname, etc

I'm programming a web application with ASP.NET MVC and C#.
In a form, a user should enter a name, a street name and a city in separate fields.
Start: The entered value has to start with an 'alphabetic' character, no matter whether the language is English, Chinese, French or anything else. So characters like é, Chinese characters and so on are OK, but characters like *¥##1 are not allowed.
Middle: The same as for the start, plus spaces (but not two spaces in a row).
End: The same as for the start.
This is correct:
A b c
Abcd ef
Abcdef
This is not correct:
1abc
A1 bc
1 2 3
" a b c" (space at the start)
Question:
What is the correct regex for this?
How can I set the length?
In a second case, I also want to allow the digits 0123456789 (in the same way as the letters).
This is what I have: '^[a-zA-Z][a-zA-Z ][a-zA-Z]$'
Thank you

You want to validate strings that contain only words made of letters, separated by a single space.
You may use a regex like
^\p{L}+(?: \p{L}+)*$
Or, if any whitespace is allowed:
^\p{L}+(?:\s\p{L}+)*$
See the regex demo
To make it only match strings of 3 or more chars, use
^(?=.{3})\p{L}+(?:\s\p{L}+)*$
 ^^^^^^^^
Details
^ - start of a string
(?=.{3}) - a positive lookahead that requires any 3 chars (other than a newline) immediately after the start of the string
\p{L}+ - one or more Unicode letters
(?:\s\p{L}+)* - zero or more repetitions of:
\s - any whitespace char
\p{L}+ - one or more Unicode letters
$ - end of string
Note that if you need to use it in ASP.NET, only use this regex for server-side validation: on the client side, the pattern might not be handled correctly by the JavaScript regex engine, which only understands \p{L} in Unicode (u flag) mode.
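A small C# sketch of the first answer's pattern, run against the examples from the question. It assumes .NET 6 or later (top-level statements). The upper length limit of 50 in the second pattern and the digit-friendly third pattern (for the asker's "second case") are illustrative assumptions, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Words made of Unicode letters (\p{L}), separated by exactly one space.
var lettersOnly = new Regex(@"^\p{L}+(?: \p{L}+)*$");

// Same rule plus a length check: 3 to 50 chars (50 is a hypothetical limit).
var lettersWithLength = new Regex(@"^(?=.{3,50}$)\p{L}+(?: \p{L}+)*$");

// Hypothetical "second case": ASCII digits 0-9 are accepted wherever letters are.
var lettersOrDigits = new Regex(@"^[\p{L}0-9]+(?: [\p{L}0-9]+)*$");

string[] samples =
{
    "A b c", "Abcd ef", "Abcdef", "Émile Zola", "北京", // expected to pass the letters-only check
    "1abc", "A1 bc", "1 2 3", " a b c", "a  b"          // expected to fail it
};

foreach (var s in samples)
{
    Console.WriteLine(
        $"\"{s}\": letters={lettersOnly.IsMatch(s)}, " +
        $"length={lettersWithLength.IsMatch(s)}, digits={lettersOrDigits.IsMatch(s)}");
}
```

Keep in mind that $ in .NET also matches right before a final \n, so "Abc\n" passes these checks; using \z instead of $ rejects such input.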

You can use this regex:
^(?:\p{L}+ )*\p{L}+$
\p{L} matches all unicode code points that are in the "Letters" category.
The regex matches zero or more repetitions of \p{L}+ followed by a space (one or more letters plus a space) and then requires one or more letters at the end.
Demo
Example code:
Console.WriteLine(Regex.IsMatch("abc def", @"^(?:\p{L}+ )*\p{L}+$")); // True

Related

Regex expression with decimals, one letter and a question mark

I'm trying to make a SUVAT calculator where one can input a decimal number, a letter (e.g. S), or a question mark if you do not have a value.
Valid inputs include "2.3", "S" and "?", but not values like "2.5s" or "??" (only one type per input box; a decimal AND a letter cannot appear in the same input box).
Is there a regex for this? So far, I only have the regex for the decimal number:
^[0-9]\\d*(\\.\\d+)
I also tried a much simpler one, but I would like a more developed expression for later on:
[0-9sS.?]

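If I got your use case right, then this might work:
^(\?|(\d+\.?\d+)|\S)$
Read it as: the input is either one question mark,
or a numeric value with at least two digits, optionally with a dot between them,
or a single non-whitespace character (\S also accepts digits and symbols; use [A-Za-z] instead if only a letter should be allowed).
You can try it out here: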
https://regex101.com/r/wLGJhJ/1

You can use
@"^(?:[0-9]+(?:\.[0-9]+)?|[A-Za-z?])\z"
Details:
^ - start of string
(?: - start of a non-capturing group:
[0-9]+ - one or more ASCII digits
(?:\.[0-9]+)? - an optional occurrence of . and one or more ASCII digits
| - or
[A-Za-z?] - an ASCII letter or a ? char
) - end of the group
\z - the very end of string.
See a .NET regex demo online.
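A quick C# check of the second pattern against the inputs mentioned in the question (the extra samples "5", ".5" and "2." are my own additions), assuming .NET 6 or later:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: a decimal number, or a single ASCII letter, or a single '?'.
var suvatInput = new Regex(@"^(?:[0-9]+(?:\.[0-9]+)?|[A-Za-z?])\z");

string[] samples = { "2.3", "S", "?", "5", "2.5s", "??", ".5", "2." };

foreach (var s in samples)
{
    Console.WriteLine($"\"{s}\" -> {suvatInput.IsMatch(s)}");
}
```

Because the pattern ends in \z rather than $, an input with a trailing newline such as "2.3\n" is rejected as well.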

Trying to space words using Regex

I have a regex that inserts spaces between words correctly; however, if a string contains an all-caps abbreviation, it does not work.
What I'm trying to do is turn something like "TSTApplicationType" into "TST Application Type".
Currently, I'm using Regex.Replace(value, "([a-z])_?([A-Z])", "$1 $2") to add the spaces between the words; however, this just turns it into "TSTApplication Type".

You may use either of the two:
// Approach 1
Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ")
// Approach 2
Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ")
See regex demo #1 and regex demo #2
Details on Approach 1
\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$) matches
\p{Lu}{2,}(?=\p{Lu}) - 2 or more uppercase letters followed by an uppercase letter
| - or
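(?>\p{Lu}\p{Ll}*)(?!$) - an uppercase letter and then 0 or more lowercase letters, not at the end of the string (the atomic group prevents \p{Ll}* from backtracking, so the last word never gets a trailing space).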
The replacement is the whole match (referenced with $&) and a space.
Details on Approach 2
This is a common approach that inserts a space between an uppercase letter and an uppercase letter followed by a lowercase letter ((?<=\p{Lu})(?=\p{Lu}\p{Ll})), or (|) between a lowercase letter and an uppercase letter ((?<=\p{Ll})(?=\p{Lu})).
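Both replacements can be compared side by side in a short C# sketch (assuming .NET 6 or later). SpaceWordsApproach1 and SpaceWordsApproach2 are hypothetical helper names, and "MyXMLFile" and "HTMLParser" are extra sample inputs not taken from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Approach 1: match an acronym run, or a capitalised word that is not at the end, and append a space.
static string SpaceWordsApproach1(string text) =>
    Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ");

// Approach 2: insert a space at zero-width positions between words.
static string SpaceWordsApproach2(string text) =>
    Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ");

foreach (var input in new[] { "TSTApplicationType", "MyXMLFile", "HTMLParser" })
{
    Console.WriteLine($"{input}: \"{SpaceWordsApproach1(input)}\" / \"{SpaceWordsApproach2(input)}\"");
}
```

For these inputs both approaches produce the same result; Approach 2 never consumes characters, so it only decides where a space goes.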

If you don't mind using Humanizer, it also does this when you call .Humanize() on a string. However, it doesn't preserve the original casing, so it is mainly an option if you actually want to change the casing.
"TSTApplicationType".Humanize(LetterCasing.Title); // TST Application Type

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions are met:
Valid string:
ameya123 (letters and numbers)
ameya (only letters)
AMeya12 (uppercase and lowercase letters and numbers)
Ameya_123 (letters, underscore and numbers)
Ameya_ 123 (letters, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
"   " (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but not in a data annotation in C#.
Please suggest a solution.

Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
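Note: \w is shorthand for [a-zA-Z0-9_] in engines like JavaScript (or PCRE without the u modifier); in .NET, it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.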
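To see the \w difference in practice, here is a C# sketch (assuming .NET 6 or later; the sample "Amélie_1" is made up) that runs this answer's pattern with the default options and with RegexOptions.ECMAScript:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above.
const string pattern = @"^(?!(?:\d+|_+| +)$)[\w ]+$";
const string sample = "Amélie_1"; // hypothetical input containing a non-ASCII letter

// Default .NET behaviour: \w is Unicode-aware, so 'é' counts as a word character.
bool defaultMatch = Regex.IsMatch(sample, pattern);

// RegexOptions.ECMAScript limits \w to [a-zA-Z_0-9], as in JavaScript.
bool ecmaScriptMatch = Regex.IsMatch(sample, pattern, RegexOptions.ECMAScript);

Console.WriteLine($"default: {defaultMatch}, ECMAScript: {ecmaScriptMatch}");
```

This can matter for ASP.NET validation: the same pattern string may accept non-ASCII letters on the server but not in the browser.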

One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you only plan to disallow inputs like ____ or " ", you need to split this lookahead into (?!_+$) and (?!\s+$).
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
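A minimal sketch of how the pattern behaves inside a data annotation, using RegularExpressionAttribute directly instead of a full MVC model (assuming .NET 6 or later). As far as I know, the attribute only accepts a value when the match starts at index 0 and covers the whole string, and it treats null or empty values as valid (pair it with [Required] if they must be rejected), which is why the asker's lookaround-only pattern fails:

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// Pattern from the answer above, used the way a [RegularExpression] annotation would use it.
var attribute = new RegularExpressionAttribute(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

// The asker's original attempt: only lookaheads, so any match it finds is empty.
var original = new RegularExpressionAttribute(@"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)");

string[] valid = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123" };
string[] invalid = { "123", "_", "   ", "Ameya#1" };

foreach (var s in valid)
    Console.WriteLine($"\"{s}\" (expected valid): {attribute.IsValid(s)}, original: {original.IsValid(s)}");

foreach (var s in invalid)
    Console.WriteLine($"\"{s}\" (expected invalid): {attribute.IsValid(s)}");
```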

If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r (tab, space and carriage return), because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy-looking explanation of the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.

This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores and spaces.

This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

Validate characters and numbers in mixed data

I have created the following search patterns:
1) Search for numbers within a given range, excluding specific numbers (1, 2, 8):
string numberPattern = @"^([3-7|9 ]*)$";
2) Search for letters within a given range, excluding specific characters (B, V):
string characterPattern = @"^(?:(?![BV])[A-Z ])+$";
There can be three kinds of input:
Input can be just characters: ANRPIGHSAGASGG
Input can be just numbers: 34567934567967
Input can be letters and numbers: 9ANRPIG34HS56A
Question:
Is there a way to tell the regex that the number pattern should ignore letters, and likewise that the character pattern should ignore numbers? The data can be mixed, in any order, and I don't see another way than splitting numbers and characters into separate lists and then applying the related pattern. Is there a way to accomplish this using only regex?

I suggest using
^[3-79A-Z -[BV]]*$
See the regex demo.
Details:
^ - a start of a string anchor
[3-79A-Z -[BV]]* - zero or more (*) characters:
3-79A-Z - digits from 3 to 7, the digit 9, uppercase ASCII letters and a space, except the ASCII letters B and V (-[BV] is .NET's character class subtraction construct)
$ - end of string anchor.

Put it into a more readable state so you can maintain it.
^(?:[0-9A-Z](?<![128BV]))+$
Explained
^ # Beginning of string
(?: # Cluster group
[0-9A-Z] # Initially allow 0-9 or A-Z
(?<! [128BV] ) # Qualify, not 1,2,8,B,V
)+ # End cluster, must be at least 1 character
$ # End of string
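The two suggested patterns are not identical, which a short C# comparison makes visible (sample inputs beyond the question's three are my own, .NET 6 or later assumed): the subtraction-based class also allows spaces and the empty string but not 0, while the lookbehind version allows 0 but neither spaces nor the empty string.

```csharp
using System;
using System.Text.RegularExpressions;

// First answer: .NET character class subtraction removes B and V from the class.
var subtraction = new Regex(@"^[3-79A-Z -[BV]]*$");

// Second answer: allow 0-9 or A-Z, then reject the char just consumed if it is 1, 2, 8, B or V.
var lookbehind = new Regex(@"^(?:[0-9A-Z](?<![128BV]))+$");

string[] samples = { "ANRPIGHSAGASGG", "34567934567967", "9ANRPIG34HS56A", "A1", "VAN", "0", "A C", "" };

foreach (var s in samples)
    Console.WriteLine($"\"{s}\": subtraction={subtraction.IsMatch(s)}, lookbehind={lookbehind.IsMatch(s)}");
```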

Regex match if a string has length 2 and contains 1 letter and 1 number

Guys, I hate regex and I suck at writing it.
I have a space-separated string that contains several codes I need to pull out. Each code begins with a capital letter and ends with a digit; the code is only two characters long.
I'm trying to create an array of strings from the initial string and I can't get the regular expression right.
Here is what I have:
String[] test = Regex.Split(originalText, "([a-zA-Z0-9]{2})");
I also tried:
String[] test = Regex.Split(originalText, "([A-Z]{1}[0-9]{1})");
I don't have any experience with Regex as I try to avoid writing them whenever possible.
Anyone have any suggestions?
Example input:
AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80
I need to pull out F7, A4, B7. E0 should be ignored.

You want to collect the results, not split on them, right?
Regex regexObj = new Regex(@"\b[A-Z][0-9]\b");
MatchCollection allMatchResults = regexObj.Matches(subjectString);
should do this. The \b anchors are word boundaries, making sure that only entire words (like A1) are extracted, not substrings (like the A1 in TWA101).
If you also need to exclude "words" with non-word characters in them (like E0.M80 in your comment), you need to define your own word boundary, for example:
Regex regexObj = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");
Now A1 only matches when surrounded by whitespace (or start/end-of-string positions).
Explanation:
(?<= # Assert that we can match the following before the current position:
^ # Start of string
| # or
\s # whitespace.
)
[A-Z] # Match an uppercase ASCII letter
[0-9] # Match an ASCII digit
(?= # Assert that we can match the following after the current position:
\s # Whitespace
| # or
$ # end of string.
)
If you also need to find non-ASCII letters/digits, you can use
\p{Lu}\p{N}
instead of [A-Z][0-9]. This finds any uppercase Unicode letter followed by any Unicode number character (like Ä٣), but I guess that's not really what you're after, is it?
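A short C# sketch (assuming .NET 6 or later) that runs both of this answer's patterns on the example input from the question. Note that the input also contains Y7, which both patterns extract, and that only the word-boundary version picks up E0 from E0.M80:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string originalText = "AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80";

// Word-boundary version: '.' is a non-word char, so "E0" in "E0.M80" is also found.
string[] withWordBoundaries = Regex.Matches(originalText, @"\b[A-Z][0-9]\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Whitespace-delimited version: a code must be surrounded by whitespace or the string ends.
string[] whitespaceDelimited = Regex.Matches(originalText, @"(?<=^|\s)[A-Z][0-9](?=\s|$)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", withWordBoundaries));  // F7, A4, Y7, B7, E0
Console.WriteLine(string.Join(", ", whitespaceDelimited)); // F7, A4, Y7, B7
```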

Do you mean that each code looks like "A00"?
Then this is the regex:
"[A-Z][0-9][0-9]"
Very simple... By the way, there's no point writing {1} in a regex. [0-9]{1} means "match exactly one digit", which is exactly like writing [0-9].
Don't give up, simple regexes make perfect sense.

This should be ok:
using System.Linq;
string[] allCodes = Regex.Matches(originalText, @"\b[A-Z]\d\b").Cast<Match>().Select(m => m.Value).ToArray();
It gives you an array with all codes that start with a capital letter followed by a digit, delimited by word boundaries (spaces etc.).
