How to handle validations (RegEx) while localizing an application - C#

We decided to support other languages in our project, and I have started localizing it.
In some text boxes we use validation that allows only certain characters, for example only the letters a to z. When we run our application on an OS in another language, such as Hebrew or Hindi, users cannot enter any text in those text boxes because of this validation.
How can we localize/globalize these rules? How should these scenarios be handled when localizing an application?

Use the Unicode letter category \p{L} in your regex to get the required validation for all languages.
To match any letter character from any language use:
\p{L}
If you also want to match numbers:
[\p{L}\p{Nd}]+
\p{L} ... matches a character of the Unicode category letter.
It is shorthand for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]
\p{Ll} ... matches lowercase letters. (abc)
\p{Lu} ... matches uppercase letters. (ABC)
\p{Lt} ... matches titlecase letters.
\p{Lm} ... matches modifier letters.
\p{Lo} ... matches letters without case. (中文)
\p{Nd} ... matches a character of the unicode category decimal digit.
Just replace: ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
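A minimal C# sketch (a top-level-statements program, .NET 6 or later assumed) comparing the original ASCII-only pattern with the \p{L} version on Hebrew and Hindi input. The variable names are illustrative. The third pattern, which also admits combining marks (\p{M}), is an addition here rather than part of the answer: Devanagari words such as नमस्ते contain a virama and vowel signs of category Mn, so \p{L} alone is not enough for Hindi.

```csharp
using System;
using System.Text.RegularExpressions;

// Original rule: ASCII letters, ASCII digits and whitespace only.
var asciiOnly = new Regex(@"^[a-zA-Z0-9\s]+$");
// Answer's rule: any Unicode letter instead of a-zA-Z.
var anyLetter = new Regex(@"^[\p{L}0-9\s]+$");
// Extra (not from the answer): also allow combining marks, category M.
var lettersAndMarks = new Regex(@"^[\p{L}\p{M}0-9\s]+$");

string[] inputs = { "Hello 123", "שלום 123", "नमस्ते" };
foreach (var input in inputs)
{
    Console.WriteLine(
        $"{input}: ascii={asciiOnly.IsMatch(input)}, " +
        $"letters={anyLetter.IsMatch(input)}, lettersAndMarks={lettersAndMarks.IsMatch(input)}");
}
```

With these inputs the Hebrew string passes the \p{L} pattern, while the Hindi word passes only the pattern that also includes \p{M}.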

Related

Match vocabulary words and phrases

I am writing an application whose logic takes a vocabulary word or phrase as an input parameter, and I am having trouble writing the validation logic for this parameter's value.
Here are the rules I have come up with:
can be up to 4 words (with hyphens or not)
one apostrophe is allowed
only regular letters are allowed (no special characters like !@#$%^&*()={}[]"";|/>/? ¶ © etc.)
numbers are disallowed
case insensitive
multiple languages are supported (English, Russian, Norwegian, etc.), so non-ASCII letters, including Cyrillic, must be accepted
either whole string matches or nothing
A few examples (in 3 languages):
// match:
one two three four
one-two-three-four
one-two-three four
vær så snill
тест регекс
re-read
under the hood
ONe
rabbit's lair
// not-match:
one two three four five
one two three four#
one-two-three-four five
rabbit"s lair
one' two's
one1
1900
Given the expected results above, could someone point me in the right direction on how to create a validation rule like that? If it matters, I will be writing the validation logic in C#, so I have more tools than just Regex at my disposal.
In case it helps, I have been testing several patterns, such as ^[\p{Ll}\p{Lt}]+$ and (?=\S*['-])([a-zA-Z'-]+)$. The first seems to do a good job of allowing just the letters I need (English, Norwegian and Russian), whereas the second makes good use of lookahead.
\p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
\p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
\p{Lt} or \p{Titlecase_Letter}: a letter that appears at the start of a word when only the first letter of the word is capitalized.
\p{L&} or \p{Letter&}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
\p{Lm} or \p{Modifier_Letter}: a special character that is used like a letter.
\p{Lo} or \p{Other_Letter}: a letter or ideograph that does not have lowercase and uppercase variants.
Needless to say, neither of the solutions I have been testing takes into account all the rules I defined above.
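The category list above follows regular-expressions.info, which covers several regex flavors. As far as I know, .NET accepts only the short general-category names (L, Ll, Lu, Lt, Lm, Lo and so on) plus Is-prefixed block names such as IsCyrillic, so \p{L&} and long names like \p{Lowercase_Letter} are rejected when the pattern is constructed. A small sketch to check this on your own runtime; IsSupported is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Text.RegularExpressions;

// Category spellings taken from the list above.
string[] properties = { "L", "Ll", "Lu", "Lt", "Lm", "Lo", "L&", "Lowercase_Letter" };

// Tries to build \p{property}; an unknown property makes the constructor throw.
bool IsSupported(string property)
{
    try
    {
        _ = new Regex($@"\p{{{property}}}");
        return true;
    }
    catch (ArgumentException) // RegexParseException derives from ArgumentException on .NET 7+
    {
        return false;
    }
}

foreach (var p in properties)
    Console.WriteLine($"\\p{{{p}}}: {(IsSupported(p) ? "supported" : "rejected")}");
```

In C#, the letter-only part of the question's rules can therefore be written with \p{L}, or with [\p{Lu}\p{Ll}\p{Lt}] if only cased letters are wanted.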
You can use
\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z
Details:
\A - start of string
(?!(?:[^']*'){2}) - the string cannot contain two or more apostrophes
\p{L}+ - one or more Unicode letters
(?:[\s'-]\p{L}+){0,3} - zero to three occurrences of
[\s'-] - a whitespace, ' or - char
\p{L}+ - one or more Unicode letters
\z - the very end of string.
In C#, you can use it as
var isValid = Regex.IsMatch(text, @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");
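A minimal sketch (top-level statements, .NET 6 or later assumed) that wraps the answer's pattern in a helper and runs some of the question's examples through it. IsValidVocabulary is a hypothetical name, and the normalization to Unicode form C is an assumption added here, not part of the answer: it turns input typed as a base letter plus a combining mark (for example "a" followed by U+030A) into the precomposed letter, which \p{L} can match.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Up to four letter-only words joined by whitespace, hyphens or apostrophes,
// with at most one apostrophe in the whole string.
var vocabularyPattern = new Regex(@"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");

// Normalizing to form C first is an extra step, not part of the answer:
// "a" + U+030A COMBINING RING ABOVE becomes the single letter "å".
bool IsValidVocabulary(string text) =>
    vocabularyPattern.IsMatch(text.Normalize(NormalizationForm.FormC));

string[] samples =
{
    "vær så snill", "тест регекс", "rabbit's lair", "under the hood",
    "one two three four five", "rabbit\"s lair", "one' two's", "one1",
};
foreach (var sample in samples)
    Console.WriteLine($"{sample} -> {IsValidVocabulary(sample)}");
```

The pattern counts hyphen- and apostrophe-separated pieces as words, so "rabbit's lair" is three pieces. That is consistent with the question's examples, where "one-two-three-four five" must be rejected.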

Foreign language characters in Regular expression in C#

In C# code, I am trying to pass Chinese characters: "中文ABC123".
When I use a general alphanumeric pattern, "^[a-zA-Z0-9\s]+$",
the string "中文ABC123" does not match and regex validation fails.
What other expressions do I need to add for C#?
To match any letter character from any language use:
\p{L}
If you also want to match numbers:
[\p{L}\p{Nd}]+
\p{L} ... matches a character of the unicode category letter.
It is shorthand for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]
\p{Ll} ... matches lowercase letters. (abc)
\p{Lu} ... matches uppercase letters. (ABC)
\p{Lt} ... matches titlecase letters.
\p{Lm} ... matches modifier letters.
\p{Lo} ... matches letters without case. (中文)
\p{Nd} ... matches a character of the unicode category decimal digit.
Just replace: ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
Thanks to @Andie2302 for pointing to the right way to do it.
In addition, many languages have combining characters that cannot stand alone and need a base letter to attach to. For example, for the Thai word 'เก็บ', using only \p{L} keeps just 'เกบ': the combining mark is lost from the word.
That's why \p{L} alone will not work for every language.
So, to support almost all languages, match letters and marks together:
[\p{L}\p{M}]+
NOTE:
L stands for 'Letter' (all letters from all languages, but not marks)
M stands for 'Mark' (a mark cannot be displayed alone; it needs a letter to attach to)
If you also need numbers, add:
\p{N} (for example [\p{L}\p{M}\p{N}]+)
NOTE:
N stands for 'Number' (any numeric character, not only the decimal digits matched by \p{Nd})
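A short C# sketch of the Thai example (top-level statements), assuming the word เก็บ is the sequence U+0E40, U+0E01, U+0E47 (MAITAIKHU, category Mn) and U+0E1A. Stripping everything outside \p{L} loses the mark, while a class of [\p{L}\p{M}] keeps it; the last pattern adds \p{N} for numbers. Thai digits such as ๑ are category Nd, so they fall under \p{N}.

```csharp
using System;
using System.Text.RegularExpressions;

// "เก็บ": U+0E40, U+0E01, U+0E47 (a combining mark, category Mn), U+0E1A.
const string thaiWord = "\u0E40\u0E01\u0E47\u0E1A";

// Removing every character that is not a letter drops the combining mark.
string lettersOnly = Regex.Replace(thaiWord, @"[^\p{L}]", "");
// Keeping letters and marks leaves the word intact.
string lettersAndMarks = Regex.Replace(thaiWord, @"[^\p{L}\p{M}]", "");

Console.WriteLine(lettersOnly.Length);            // 3
Console.WriteLine(lettersAndMarks == thaiWord);   // True

// Validation form: letters, marks and numeric characters of any script.
var validator = new Regex(@"^[\p{L}\p{M}\p{N}]+$");
Console.WriteLine(validator.IsMatch(thaiWord + "\u0E51\u0E52\u0E53")); // True (Thai digits)
Console.WriteLine(Regex.IsMatch(thaiWord, @"^\p{L}+$"));              // False
```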
Thanks to this website for very useful information
https://www.regular-expressions.info/unicode.html

Regex validation Comma Separated Words - Foreign Characters

I am developing an application in Arabic and English, so I need a regex that validates a set of comma-separated words. Here is my regex:
^([a-zA-Z]+(,[a-zA-Z]+)*)?$
This works flawlessly for me, but as you can see the characters specified are English only; I want this for Arabic.
Can this expression be altered to accept other characters, Arabic or maybe some other language?
Instead of restricting to a set of alphabetical characters, exclude the characters that mark the end of your word.
^([^,]+(,[^,]+)*)?$
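A sketch (top-level statements) contrasting the answer's "anything but a comma" pattern with a stricter alternative that is an assumption here, not from the answer: it accepts only letters and combining marks from any script, the marks covering Arabic diacritics such as fatha. The sample words are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: a word is any run of characters other than a comma.
var anyButComma = new Regex(@"^([^,]+(,[^,]+)*)?$");
// Stricter alternative (not from the answer): letters and combining marks
// of any script, e.g. Arabic letters plus diacritics such as fatha.
var unicodeWords = new Regex(@"^([\p{L}\p{M}]+(,[\p{L}\p{M}]+)*)?$");

string[] inputs = { "apple,orange", "تفاح,برتقال", "apple,,orange", "apple,123" };
foreach (var input in inputs)
{
    Console.WriteLine(
        $"{input}: anyButComma={anyButComma.IsMatch(input)}, unicodeWords={unicodeWords.IsMatch(input)}");
}
```

The exclusion pattern is simpler and script-agnostic, but it also lets through digits, spaces and punctuation (for example "apple,123"); the category-based pattern rejects those. Arabic text is often typed with the Arabic comma ، (U+060C), which neither pattern treats as a separator; if users may type it, one option is a separator class such as [,،]. If only Arabic script should be accepted, .NET also supports the named block \p{IsArabic}.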
If you really want to match only Arabic characters, see the question "Regular expression for Arabic language".

regular expressions with the Cyrillic alphabet?

I am currently writing some validation that will validate inputted data. I am using regular expressions to do so, working with C#.
Password = @"(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{6,18})$"
Validate Alpha Numeric = [^a-zA-Z0-9ñÑáÁéÉíÍóÓúÚüÜ¡¿{0}]
The above work fine with the Latin alphabet, but how can I extend them to work with the Cyrillic alphabet?
The basic approach to covering ranges of characters using regular expressions is to construct an expression of the form [A-Za-z], where A is the first letter of the range, and Z is the last letter of the range.
The problem is, there is no such thing as "the" Cyrillic alphabet: the alphabet differs slightly from language to language. To cover the Russian version of Cyrillic, use [А-ЯЁа-яё] (Ё and ё lie outside the А-Я and а-я ranges, so they must be added explicitly). You would use a different range, say, for Serbian, because the last letter of its Cyrillic alphabet is Ш, not Я.
Another approach is to list all characters one-by-one. Simply find an authoritative reference for the alphabet that you want to put in a regexp, and put all characters for it into a pair of square brackets:
[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя]
You can use character classes if you need to allow characters of particular language or particular type:
@"\p{IsCyrillic}+" // characters in the Cyrillic Unicode block
@"[\p{Lu}\p{Ll}\p{Lt}]+" // upper, lower and titlecase letters in any language
In your case maybe "not a whitespace" would be enough: @"[^\s]+", or maybe "word character" (which also includes digits and underscores): @"\w+".
Password = @"(?!^[0-9]*$)(?!^[А-Яа-я]*$)^([А-Яа-я0-9]{6,18})$"
Validate Alpha Numeric = [^а-яА-Я0-9ñÑáÁéÉíÍóÓúÚüÜ¡¿{0}]
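A quick check (top-level statements) of the Russian range from the first answer, the same range with Ё/ё added, and \p{IsCyrillic} from the second answer. Ё and ё (U+0401, U+0451) sit outside the А-Я and а-я ranges, so the plain range rejects a word like ёжик, while \p{IsCyrillic} covers the whole U+0400 to U+04FF block, including Serbian letters such as Ђ. The sample words are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Russian range from the answer, the same range with Ё/ё added, and the named block.
var russianRange = new Regex(@"^[А-Яа-я]+$");
var russianRangeWithYo = new Regex(@"^[А-ЯЁа-яё]+$");
var cyrillicBlock = new Regex(@"^\p{IsCyrillic}+$");

string[] words = { "привет", "ёжик", "Ђорђе" };
foreach (var word in words)
{
    Console.WriteLine(
        $"{word}: range={russianRange.IsMatch(word)}, " +
        $"rangeWithYo={russianRangeWithYo.IsMatch(word)}, block={cyrillicBlock.IsMatch(word)}");
}
```

The block is broader than any single alphabet, so it also accepts characters a given language does not use; that is the trade-off against listing the characters explicitly.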

Regex in C# for password

I have regex for validating user passwords to contain:
at least 8 alphanumeric characters
1 uppercase letter
1 lowercase letter
1 digit
Allowed special characters: !@#$%*.~
I am using the following regex:
(?=(.*\w){8,})(?=(.*[A-Z]){1,})(?=(.*[a-z]){1,})(?=(.*[0-9]){1,})(?=(.*[!@#$%*.~]))
However, this does not prevent the user from entering other special characters
such as <, > and &.
How can I restrict input to the allowed special characters?
A single regex to validate everything will ultimately look like line noise.
Instead I suggest:
Use simple String functions to test length
Use Regex to test for character inclusion and validity
^(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z])[a-zA-Z0-9!@#$%*.~]{8,}$
The anchoring (^ and $) is important, by the way.
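A sketch (top-level statements) of the "simple string functions plus one regex" advice. IsValidPassword is a hypothetical helper; it assumes the special characters are allowed but not required (the question's own lookahead required one) and that letters and digits are ASCII. It anchors with \A and \z rather than ^ and $, because in .NET $ also matches just before a trailing newline.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The only characters a password may contain.
var allowedChars = new Regex(@"\A[a-zA-Z0-9!@#$%*.~]+\z");

bool IsValidPassword(string password) =>
    password.Length >= 8                           // length: a plain string check
    && password.Any(c => c is >= 'A' and <= 'Z')   // at least one uppercase letter
    && password.Any(c => c is >= 'a' and <= 'z')   // at least one lowercase letter
    && password.Any(c => c is >= '0' and <= '9')   // at least one digit
    && allowedChars.IsMatch(password);             // nothing outside the allowed set

// The answer's single pattern only demands a non-letter, not specifically a digit.
const string answerPattern = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z])[a-zA-Z0-9!@#$%*.~]{8,}$";

Console.WriteLine(IsValidPassword("Passw0rd"));                 // True
Console.WriteLine(IsValidPassword("Pa&sw0rd"));                 // False: & is not allowed
Console.WriteLine(Regex.IsMatch("Password!", answerPattern));   // True, although there is no digit
Console.WriteLine(IsValidPassword("Password!"));                // False
```

Splitting the rules this way keeps each check readable and makes it easy to report which rule failed.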
