How to validate a string contains only latin characters including diacritics [duplicate]

I'd like to restrict my form input so that users cannot enter non-English characters: for example, all Chinese, Japanese and Cyrillic characters, but also single characters like à, â, ù, û, ü, ô, î, ê. Would this be possible? Do I have to set up a locale in my MVC application, or should I just do regex validation on the textbox? As a side note, I still want to be able to enter numbers and other characters; I only want to exclude non-English letters.
Please advise, thank you.

For this you have to use Unicode character properties and blocks. Each Unicode code point is assigned a set of properties, e.g. that the code point is a letter. Blocks are contiguous ranges of code points.
For more details, see:
regular-expressions.info for some general information about Unicode code points, character properties, scripts and blocks
MSDN for the supported properties and blocks in .net
Those Unicode Properties and blocks are written \p{Name}, where "Name" is the name of the property or block.
When it is an uppercase "P" like this \P{Name}, then it is the negation of the property/block, i.e. it matches anything else.
Some of the available properties (only a short excerpt):
L ==> All letter characters.
Lu ==> Letter, Uppercase
Ll ==> Letter, Lowercase
N ==> All numbers. This includes the Nd, Nl, and No categories.
Pc ==> Punctuation, Connector
P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
Sm ==> Symbol, Math
Some of the available blocks (only a short excerpt):
0000 - 007F ==> IsBasicLatin
0400 - 04FF ==> IsCyrillic
1000 - 109F ==> IsMyanmar
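As a rough illustration of how these properties and blocks behave in .NET, the sketch below checks a few single characters against some of the names listed above. The class name UnicodeClassDemo and its helper Is are made up for this example and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeClassDemo
{
    // Hypothetical helper: true when the whole input is exactly one
    // character matching the given property or block expression.
    public static bool Is(string input, string expression)
    {
        return Regex.IsMatch(input, "^" + expression + "$");
    }

    public static void Main()
    {
        Console.WriteLine(Is("ö", @"\p{L}"));            // True: ö is a letter
        Console.WriteLine(Is("ö", @"\p{IsBasicLatin}")); // False: ö is U+00F6, outside 0000 - 007F
        Console.WriteLine(Is("Я", @"\p{IsCyrillic}"));   // True: Я is U+042F
        Console.WriteLine(Is("5", @"\P{L}"));            // True: a digit is not a letter
        Console.WriteLine(Is("A", @"\p{Lu}"));           // True: uppercase letter
        Console.WriteLine(Is("+", @"\p{Sm}"));           // True: math symbol
    }
}
```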
What I used in the solution:
\P{L} is a negated character property that matches any character that is not a letter ("L" for Letter)
\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F
So your regex would be:
^[\P{L}\p{IsBasicLatin}]+$
In plain words:
This matches a string from start to end (^ and $) when it consists of one or more characters that are each either a non-letter or a character from the ASCII table (code points 0000 - 007F).
A short C# test method:
string[] myStrings = {
    "Foobar",
    "Foo#bar!\"§$%&/()",
    "Föobar",
    "fóÓè"
};
Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");
foreach (string str in myStrings) {
    Match result = reg.Match(str);
    if (result.Success)
        Console.Out.WriteLine("matched ==> " + str);
    else
        Console.Out.WriteLine("failed ==> " + str);
}
Console.ReadLine();
Prints:
matched ==> Foobar
matched ==> Foo#bar!"§$%&/()
failed ==> Föobar
failed ==> fóÓè

You can use a RegularExpression attribute on your ViewModel to restrict that:
public class MyViewModel
{
    [System.ComponentModel.DataAnnotations.RegularExpression("[a-zA-Z]+")]
    public string MyEntry
    {
        get;
        set;
    }
}

You can use the regex [\x00-\x7F]+ or [\u0000-\u007F]+, which covers the ASCII range. I haven't tested it, but I think it should work in C# as well.
Adapted from: Regular expression to match non-English characters?
You can use regex validation on the textbox, and validate on the server as well.

Maybe this will help:
private void Validate(TextBox textBox1)
{
    // Matches any character that is not an ASCII letter, a space or a tab
    Regex rx = new Regex("[^A-Za-z \t]");
    if (rx.IsMatch(textBox1.Text))
        throw new Exception("Your error message");
}
Useful links:
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/84e4f7fa-5fff-427f-8c0e-d478cb38fa12
http://www.c-sharpcorner.com/Forums/Thread/177046/allow-only-20-alphabets-and-numbers-in-textbox-using-reg.aspx

This might help. It is not an efficient way, but it is a simple non-regex validation:
foreach (char c in inputTextField)
{
    if ((int)c > 127)
    {
        // throw an exception, or apply whatever logic you want here
    }
}

Related

REGEX Matching string nonconsecutively

I'm trying to understand how to match a specific string that's held within an array (This string will always be 3 characters long, ex: 123, 568, 458 etc) and I would match that string to a longer string of characters that could be in any order (9841273 for example). Is it possible to check that at least 2 of the 3 characters in the string match (in this example) strMoves? Please see my code below for clarification.
private readonly string[] strSolutions = new string[8] { "123", "159", "147", "258", "357", "369", "456", "789" };
private static string strMoves = "1823742";
foreach (string strResult in strSolutions)
{
    Regex rgxMain = new Regex("[" + strMoves + "]{2}");
    if (rgxMain.IsMatch(strResult))
    {
        MessageBox.Show(strResult);
    }
}
The portion where I have designated "{2}" in Regex is where I expected the result to check for at least 2 matching characters, but my logic is definitely flawed. It will return true IF the two characters are in consecutive order as compared to the string in strResult. If it's not in the correct order it will return false. I'm going to continue to research on this but if anyone has ideas on where to look in Microsoft's documentation, that would be greatly appreciated!
Correct order where it would return true: "144257" when matched to "123"
incorrect order: "35718" when matched to "123"
The 3 is before the 1, so it won't match.

You can use the following solution if you need to find at least two different not necessarily consecutive chars from a specified set in a longer string:
new Regex($@"([{strMoves}]).*(?!\1)[{strMoves}]", RegexOptions.Singleline)
It will look like
([1823742]).*(?!\1)[1823742]
See the regex demo.
Pattern details:
([1823742]) - Capturing group 1: one of the chars in the character class
.* - any zero or more chars as many as possible (due to RegexOptions.Singleline, . matches any char including newline chars)
(?!\1) - a negative lookahead that fails the match if the next char is a starting point of the value stored in the Group 1 memory buffer (since it is a single char here, the next char should not equal the text in Group 1, one of the specified digits)
[1823742] - one of the chars in the character class.
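A minimal sketch of how this pattern could be applied to the arrays from the question. The class MoveMatcher and the method FindCandidates are hypothetical names chosen for this example, and the code assumes strMoves contains only digits, so it can be dropped into a character class without escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MoveMatcher
{
    // Returns the solutions that contain at least two different
    // characters from moves. Assumes moves holds only digits.
    public static List<string> FindCandidates(string[] solutions, string moves)
    {
        var rgx = new Regex($@"([{moves}]).*(?!\1)[{moves}]", RegexOptions.Singleline);
        var hits = new List<string>();
        foreach (string solution in solutions)
        {
            if (rgx.IsMatch(solution))
                hits.Add(solution);
        }
        return hits;
    }

    public static void Main()
    {
        string[] strSolutions = { "123", "159", "147", "258", "357", "369", "456", "789" };
        List<string> hits = FindCandidates(strSolutions, "1823742");
        Console.WriteLine(string.Join(", ", hits)); // 123, 147, 258, 357, 789
    }
}
```

"159", "369" and "456" are skipped because each shares only one digit with the moves string.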

Merging 3 Regular Expressions to make a Slug/URL validation check

I am trying to merge a few working RegEx patterns together (AND them). I don't think I am doing this properly, further, the first RegEx might be getting in the way of the next two.
Slug example (no special characters except for - and _):
(^[a-z0-9-_]+$)
Then I would like to ensure the first character is NOT - or _:
(^[^-_])
Then I would like to ensure the last character is NOT - or _:
([^-_]$)
Match (good Alias):
my-new_page
pagename
Not-Match (bad Alias)
-my-new-page
my-new-page_
!@#$%^&*()
If this RegExp can be simplified, I am more than happy to use the simpler version. I am trying to create validation on a page URL that the user can provide, and I am looking for the user to:
Not start or end with a special character
Start and end with a number or letter
The middle (not the start or end) can include - and _
Once I get that working, I can tweak it for other characters as needed.
In the end I am applying as an Annotation to my model like so:
[RegularExpression(
    @"(^[a-z0-9-_]+$)?(^[^-_])?([^-_]$)",
    ErrorMessage = "Alias is not valid")
]
Thank you, and let me know if I should provide more information.

See regex in use here
^[a-z\d](?:[a-z\d_-]*[a-z\d])?$
^ Assert position at the start of the line
[a-z\d] Match any lowercase ASCII letter or digit
(?:[a-z\d_-]*[a-z\d])? Optionally match the following
[a-z\d_-]* Match any character in the set any number of times
[a-z\d] Match any lowercase ASCII letter or digit
$ Assert position at the end of the line
See code in use here
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        Regex regex = new Regex(@"^[a-z\d](?:[a-z\d_-]*[a-z\d])?$");
        string[] strings = {"my-new_page", "pagename", "-my-new-page", "my-new-page_", "!@#$%^&*()"};
        foreach(string s in strings) {
            if (regex.IsMatch(s))
            {
                Console.WriteLine(s);
            }
        }
    }
}
Result (only positive matches):
my-new_page
pagename
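Since the question applies the pattern as a data annotation, here is a hedged sketch of how the same regex might be attached to a model property and exercised without a running MVC application. AliasModel and AliasCheck are hypothetical names for this example; RegularExpressionAttribute and Validator.TryValidateObject come from System.ComponentModel.DataAnnotations.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model, standing in for the asker's own class.
public class AliasModel
{
    [RegularExpression(@"^[a-z\d](?:[a-z\d_-]*[a-z\d])?$", ErrorMessage = "Alias is not valid")]
    public string Alias { get; set; }
}

public static class AliasCheck
{
    // Runs the data annotation validators against a model instance.
    public static bool IsValid(string alias)
    {
        var model = new AliasModel { Alias = alias };
        var results = new List<ValidationResult>();
        // validateAllProperties must be true, otherwise only [Required] is checked.
        return Validator.TryValidateObject(model, new ValidationContext(model), results, true);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("my-new_page"));  // True
        Console.WriteLine(IsValid("-my-new-page")); // False
        Console.WriteLine(IsValid("my-new-page_")); // False
    }
}
```

Note that RegularExpressionAttribute treats null and empty strings as valid, so a [Required] attribute would be needed as well if the alias is mandatory.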

String Needs to Contain 2 words

I have a textbox on one of my views, and that textbox should not accept anything that has more than 2 words or less than 2 words. This textbox needs 2 words.
Basically this textbox accepts a person's first and last name. I don't want people to only enter one or the other.
Is there a way to check for a space character between 2 words, and for another space character followed by any letter, number, etc. after the 2nd word? I think that if the user accidentally 'fat-fingers' an extra space after the 2nd word, that should be fine because there are still only 2 words.
For example:
/* the _ character means space */
John /* not accepted */
John_ /* not accepted */
John_Smith_a /* not accepted */
John_Smith_ /* accepted */
Any help is appreciated.

There are multiple approaches that you could use to solve this; I'll go over a few.
Using the String.Split() Method
You could use the String.Split() method to break up a string into its individual components based on a delimiter. In this case, you could use a space as a delimiter to get the individual words:
// Get your words, removing any empty entries along the way
var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// Determine how many words you have here
if(words.Length != 2)
{
    // Tell the user they made a horrible mistake not typing two words here
}
Using a Regular Expression
Additionally, you could attempt to resolve this via a Regular Expression using the Regex.IsMatch() method:
// Check for exactly two words (and allow for leading and trailing spaces)
if(!Regex.IsMatch(input, @"^(\s+)?\w+\s+\w+(\s+)?$"))
{
    // There are not two words, do something
}
The expression itself may look a bit scary, but it can be broken down as follows:
^ # This matches the start of your string
(\s+)? # This optionally allows for a single series of one or more whitespace characters
\w+ # This allows for one or more "word" characters that make up your first word
\s+ # Again you allow for a series of whitespace characters, you can drop the + if you just want one
\w+ # Here's your second word, nothing new here
(\s+)? # Finally allow for some trailing spaces (up to you if you want them)
$ # This matches the end of your string, so nothing else may follow the second word
A "word" character \w is a shorthand character class in Regular Expressions. In .NET it matches Unicode letters, decimal digits and connector punctuation such as the underscore; it is equivalent to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is set.
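To make the behaviour concrete, here is a small sketch that runs the sample inputs from the question through an equivalent pattern anchored at both ends with ^ and $ (\s* is the same as (\s+)?). The class TwoWordCheck and the method IsTwoWords are hypothetical names for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoWordCheck
{
    // Two runs of word characters separated by whitespace, with optional
    // leading and trailing whitespace, anchored at both ends.
    private static readonly Regex TwoWords = new Regex(@"^\s*\w+\s+\w+\s*$");

    public static bool IsTwoWords(string input)
    {
        return TwoWords.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsTwoWords("John"));         // False
        Console.WriteLine(IsTwoWords("John "));        // False
        Console.WriteLine(IsTwoWords("John Smith a")); // False
        Console.WriteLine(IsTwoWords("John Smith "));  // True
    }
}
```

Names containing an apostrophe or a hyphen, such as O'Neil or Mary-Ann, would be rejected by this pattern because those characters are not word characters.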
Taking Advantage of Regular Expressions using MVC's RegularExpressionAttribute
Finally, since you are using MVC, you could take advantage of the [RegularExpression] attribute on your model itself:
[RegularExpression(@"^(\s+)?\w+\s+\w+(\s+)?$", ErrorMessage = "Exactly two words are required.")]
public string YourProperty { get; set; }
This will allow you to simply check ModelState.IsValid within your Controller Action to see if your Model has any errors or not:
// This will check your validation attributes like the one mentioned above
if(!ModelState.IsValid)
{
    // You probably have some errors, like not exactly two words
}

Use it like this:
string s = "John Smith a";
// True when there are at least two space-separated parts; use == 2 to require exactly two
if (s.Trim().Split(new char[] { ' ' }).Length > 1)
{
}

The tag implies MVC here, so I would recommend using the RegularExpressionAttribute class:
public class YourModel
{
    [RegularExpression(@"^\w+\s\w+$", ErrorMessage = "You must have exactly two words separated by a space.")]
    public string YourProperty { get; set; }
}
Outside of model validation, the same pattern can be used directly:
Match m = Regex.Match(this.yourTextBox.Text, @"^\w+\s\w+$");
if (m.Success)
{
    //do something
}
else
{
    //do something else
}
With my very limited knowledge of regular expressions, I believe that this will solve your issue.

The cleanest way is to use regular expressions with the IsMatch method like this:
Regex.IsMatch("One Two", @"^\w+\s\w+\s?$")
Returns true if the input is a match.

Try this:
// Trim first so that a trailing space does not count as a third, empty part
if (str.Trim().Split(' ').Length == 2)
{
    //Do Something
}
str is the variable holding your string to compare.

how to prevent non-English characters and allow non-alpha characters

I have a string, and I want to make sure that every letter in it is English.
The other characters, I don't care.
34556#%42%$23$%^*&sdfsfr - valid
34556#%42%$23$%^*&בלה בלה - not valid
Can I do that with Linq? RegEx?
Thanks

In a character class you can list the characters, character ranges, Unicode properties or blocks that you either want to allow or want to reject.
[abc] is a character class that matches a, b or c.
[^abc] is a negated character class that matches any character except a, b or c.
Here in your case I would go this way, no need to define every character:
^[\P{L}A-Za-z]*$
Match from the start to the end of the string everything that is not a letter [^\p{L}] or A-Za-z.
\p{L} Is a Unicode property and matches everything that has the property letter. \P{L} is the negated version, everything that is not a letter.
Test code:
string[] StrInputNumber = { "34556#%42%$23$%^*&sdfsfr", "asdf!\"§$%&/()=?*+~#'", "34556#%42%$23$%^*&בלה בלה", "öäü!\"§$%&/()=?*+~#'" };
Regex ASCIILettersOnly = new Regex(@"^[\P{L}A-Za-z]*$");
foreach (String item in StrInputNumber) {
    if (ASCIILettersOnly.IsMatch(item)) {
        Console.WriteLine(item + " ==> Contains only ASCII letters");
    }
    else {
        Console.WriteLine(item + " ==> Contains non ASCII letters");
    }
}
Some more basic regex explanations: What absolutely every Programmer should know about regular expressions

Maybe you could use:
using System.Linq;
...
static bool IsValid(string str)
{
    return str.All(c => c <= sbyte.MaxValue);
}
This considers all ASCII chars to be "valid" (even control characters). But punctuation and other special characters outside ASCII are not "valid". If str is null, an exception is thrown.
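If non-ASCII punctuation and symbols should stay valid, so that only letters are restricted as the question describes, a LINQ check along the following lines might work. This is a sketch rather than a tested drop-in: EnglishLetterCheck, IsAsciiLetter and HasOnlyEnglishLetters are names invented for this example, and char.IsLetter is used as the counterpart of the \p{L} property.

```csharp
using System;
using System.Linq;

public static class EnglishLetterCheck
{
    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // True when every letter in str is A-Z or a-z; non-letters are ignored.
    // Works per UTF-16 code unit, so letters encoded as surrogate pairs
    // are not detected. Throws if str is null.
    public static bool HasOnlyEnglishLetters(string str)
    {
        return str.All(c => !char.IsLetter(c) || IsAsciiLetter(c));
    }

    public static void Main()
    {
        Console.WriteLine(HasOnlyEnglishLetters("34556#%42%$23$%^*&sdfsfr"));  // True
        Console.WriteLine(HasOnlyEnglishLetters("34556#%42%$23$%^*&בלה בלה")); // False
        Console.WriteLine(HasOnlyEnglishLetters("§€ 42"));                     // True
    }
}
```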

One thing you can try is to list the characters you want to allow in this regex:
bool IsValid(string input) {
    // Regex.IsMatch takes the input first, then the pattern
    return !Regex.IsMatch(input, @"[^A-Za-z0-9'\.&#:?!()$#^]");
}
If the input contains any character other than those specified in the character class, the method returns false.
