How to prevent non-English characters and allow non-alpha characters - C#

I have a string, and I want to make sure that every letter in it is English.
I don't care about the other characters.
34556#%42%$23$%^*&sdfsfr - valid
34556#%42%$23$%^*&בלה בלה - not valid
Can I do that with Linq? RegEx?
Thanks

A character class lets you list the characters, character ranges, Unicode properties, or Unicode blocks that you want to allow or disallow.
[abc] is a character class that allows a, b, and c
[^abc] is a negated character class that matches everything except a, b, and c
Here in your case I would go this way, no need to define every character:
^[\P{L}A-Za-z]*$
This matches, from the start to the end of the string, any character that is either not a letter (\P{L}, equivalent to [^\p{L}]) or one of A-Za-z.
\p{L} is a Unicode property that matches every character with the Letter property. \P{L} is the negated version: everything that is not a letter.
Test code:
string[] StrInputNumber = { "34556#%42%$23$%^*&sdfsfr", "asdf!\"§$%&/()=?*+~#'", "34556#%42%$23$%^*&בלה בלה", "öäü!\"§$%&/()=?*+~#'" };
Regex ASCIILettersOnly = new Regex(@"^[\P{L}A-Za-z]*$");
foreach (String item in StrInputNumber) {
if (ASCIILettersOnly.IsMatch(item)) {
Console.WriteLine(item + " ==> Contains only ASCII letters");
}
else {
Console.WriteLine(item + " ==> Contains non ASCII letters");
}
}
Some more basic regex explanations: What absolutely every Programmer should know about regular expressions
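Since the question also asks about LINQ, the same rule can be sketched without a regex. This is only an illustration: the class and method names are made up, and it assumes that "English letter" means A-Z and a-z, as in the pattern above.

```csharp
using System;
using System.Linq;

public static class EnglishLetterCheck
{
    // Hypothetical helper: true when every letter in the string is an ASCII letter.
    // Non-letters (digits, punctuation, whitespace) are always accepted.
    public static bool HasOnlyEnglishLetters(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.All(c => !char.IsLetter(c) || IsAsciiLetter(c));
    }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
}
```

char.IsLetter covers the same Unicode letter categories as \p{L}, so this should accept and reject the same strings as the regex for the four test inputs.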

Maybe you could use
using System.Linq;
...
static bool IsValid(string str)
{
return str.All(c => c <= sbyte.MaxValue);
}
This considers all ASCII chars to be "valid" (even control characters). But punctuation and other special characters outside ASCII are not "valid". If str is null, an exception is thrown.

One thing you can try is to put the characters you want to allow in this regex:
bool IsValid(string input) {
return !(Regex.IsMatch(input, @"[^A-Za-z0-9'\.&#:?!()$#^]"));
}
If the input contains any character other than those specified in the regex, the method returns false.

Related

Reversing a string with escape characters

I have a string that might contain escape characters. Let's assume this is '\'. I follow the MSDN Escape Sequences definition
I want to reverse this string, but keep the escape sequences.
Example:
string input = @"Hello\_World";
string reversed = @"dlroW\_olleH";
Note that in my input string the backslashes are separate characters. The reversed string is meant to be used in a SQL LIKE statement where the underscore is not meant as a wild card, but literally as an underscore. The backslash in the SQL LIKE functions as an escape character.
The problem is that if a character in my original string is preceded by a backslash, then in my reversed string this backslash should still precede the character: @"\_" (two separate characters) should in reverse still be @"\_".
Bonus points: Reverse escape sequences with numbers '\x0128'
I've tried it as extension functions:
public static string EscapedReverse(this string txt, char escapeChar)
{
IList<char> charList = txt.ToList();
return new string(EscapedReverse(charList, escapeChar).ToArray());
}
public static IEnumerable<char> EscapedReverse(this IList<char> text, char escapeChar)
{
int i = text.Count-1;
// text[i] is the last character of the sequence;
// text[i] is the next character to return, except if text[i-1] is escapeChar
while (i > 0)
{
if(text[i-1] == escapeChar)
{
yield return text[i-1];
yield return text[i];
i -= 2;
}
else
{
yield return text[i];
i -= 1;
}
}
// return the last character
if (i == 0)
yield return text[i];
}
This works. However, my string is converted to array / list twice. I wondered if there would be a smarter method where the elements don't have to be accessed so often?
Addition: what is my problem anyway?
Comments suggested to add more information about my problem.
There is a requirement to show a list of matching elements while an operator is typing in a text box. Most elements he can see start with a similar prefix. The difference the operator searches for is in the end of the name.
Therefore we want to show a list of names ending with the typed characters. So if the operator types "World" he will see a list with all names ending with "World".
The already existing database (change is out of the question) has a table with a NAME and a REVERSEDNAME. Software takes care that if a name is inserted or updated the correct reversed name is inserted / updated. REVERSEDNAME is indexed, so using a WHERE with reversed name is fast.
So if I need to return all names ending with "World", I need to return the names of all records where the REVERSEDNAME starts with the reverse of "WORLD":
SELECT TOP 30 [MYTABLE].[NAME] as Name
FROM [MYTABLE]
WHERE [MYTABLE].REVERSEDNAME LIKE 'dlroW%'
This works fine as long as no wild cards (like underscore) are used. This was solved by the software by escaping the underscore character (I know, bad design, the fact that SQL LIKE uses underscore as wild card should not seep through, but I have to live with this existing software)
So the operator types @"My_World"
My software receives @"My\_World", where the backslash is a separate character
I have to reverse it to @"dlroW\_yM"; note that the backslash is still before the underscore
My Dapper code:
IEnumerable<string> FetchNamesEndingWith(string nameEnd)
{
// here is my reversal procedure:
string reversedNameEnd = nameEnd.EscapedReverse('\\') + "%";
using (var dbConnection = this.CreateOpenDbConnection())
{
return dbConnection.Query<string>(@"
SELECT TOP 30 [MYTABLE].[NAME] as Name
FROM [MYTABLE]
WHERE [MYTABLE].REVERSEDNAME LIKE @param ESCAPE '\'",
new {param = reversedNameEnd});
}
}
MSDN about using escape characters in SQL LIKE
Changing the escape character to a different character doesn't help. The problem is not that the escape character is a backslash, but that reversing my string should keep the escape character in front of the escaped character.
My code works, I only wondered if there would be a better algorithm that doesn't copy the string twice. Not only for this specific problem, but also if in future problems I need to reverse strings and keep certain characters in place.
You can use regular expressions:
var pattern = @"\\x[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}|\\[0-7]{3}|\\.|.";
var rgx = new Regex(pattern);
return new string(
rgx.Matches(txt)
.Cast<Match>()
.OrderByDescending(x => x.Index)
.SelectMany(x => x.Value)
.ToArray());
pattern covers single characters and escape sequences in formats:
\x????
\x??
\???
\?
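For the simpler case in the question (a single escape character followed by one escaped character), the reversal can also be sketched as one forward pass into a pre-sized buffer, so the string is copied only once. This is an illustration rather than a drop-in replacement: the class name is hypothetical, and numeric sequences such as \x0128 are not handled.

```csharp
using System;

public static class EscapedReverseSketch
{
    // Reverses text while keeping each "escapeChar + next character" pair in its
    // original order. Scans forward and fills the result buffer from the end.
    public static string EscapedReverse(string text, char escapeChar)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var result = new char[text.Length];
        int write = text.Length; // next free position is just before this index
        int read = 0;

        while (read < text.Length)
        {
            // An escape character followed by another character forms a 2-char unit.
            bool isEscapePair = text[read] == escapeChar && read + 1 < text.Length;
            int unitLength = isEscapePair ? 2 : 1;

            write -= unitLength;
            text.CopyTo(read, result, write, unitLength);
            read += unitLength;
        }

        return new string(result);
    }
}
```

Scanning forward means an escaped escape character (two backslashes in a row) is treated as one unit, which a backward scan cannot decide without looking further back.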

Using RegEx in c# to check for valid characters

I'm having a hard time understanding regex. I have a scenario where valid characters are a-z, A-Z, 0-9 and a space. So when I try to create a RegEx for invalid characters I have this: [^a-zA-Z0-9 ].
Then I have strings that I want to search based on the RegEx and when it finds an invalid character, it checks if the character before it is invalid.
for example, "test test +?test"
What I want is this: if there are two invalid characters, one after the other, do nothing; otherwise insert a '£' before the invalid character. So the string above will be fine, no changes. However, the string "test test £test" should be changed to "test test ££test".
This is my code..
public string HandleInvalidChars(string message)
{
const string methodName = "HandleInvalidChars";
Regex specialChars = new Regex("[^a-zA-Z0-9 ]");
string strSpecialChars = specialChars.ToString();
//prev character in string which we are going to check
string prevChar;
Match match = specialChars.Match(message);
while (match.Success)
{
//get position of special character
int position = match.Index;
// get character before special character
prevChar = message.Substring(position - 1, 1);
//check if next character is a special character, if not insert ? escape character
try
{
if (!Regex.IsMatch(prevChar, strSpecialChars))
{
message = message.Insert(position, "?");
}
}
catch (Exception ex)
{
_logger.ErrorFormat("{0}: ApplicationException: {1}", methodName, ex);
return message;
}
match = match.NextMatch();
//loop through remainder of string until last character
}
return message;
}
When I test it on the first string it handles the first invalid char, '+', ok but it falls over when it reaches '£'.
Any help is really appreciated.
Thanks :)
What if you would change the RegEx to something like below, to check for only those cases with one special character and not with two?
[a-zA-Z0-9 ]{0,1}[^a-zA-Z0-9 ][a-zA-Z0-9 ]{0,1}
Another thing: I would create a new variable for the return value. As far as I can see, you keep changing the original string while you are still looking for matches in it.
I believe you have overthought it a bit. All you need is to find a forbidden char that is neither preceded nor followed by another forbidden char.
Declare
public string HandleInvalidChars(string message)
{
var pat = @"(?<![^A-Za-z0-9 ])[^A-Za-z0-9 ](?![^A-Za-z0-9 ])";
return Regex.Replace(message, pat, "£$&");
}
and use:
Console.WriteLine(HandleInvalidChars("test test £test"));
// => test test ££test
Console.WriteLine(HandleInvalidChars("test test +?test"));
// => test test +?test
See the online C# demo.
Details
(?<![^A-Za-z0-9 ]) - a negative lookbehind that fails the match if there is a char other than an ASCII letter/digit or space immediately to the left of the current location
[^A-Za-z0-9 ] - a char other than an ASCII letter/digit or space
(?![^A-Za-z0-9 ]) - a negative lookahead that fails the match if there is a char other than an ASCII letter/digit or space immediately to the right of the current location.
The replacement string contains a $&, backreference to the whole match value. Thus, using "£$&" we insert a £ before the match.
See the regex demo.
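To make the lookbehind/lookahead logic concrete, here is a loop-based sketch of the same rule: an invalid character gets a '£' in front of it only when neither neighbour is invalid. The class and method names are invented for illustration, and "valid" is assumed to mean ASCII letters, ASCII digits, and the space character.

```csharp
using System.Text;

public static class IsolatedInvalidCharSketch
{
    // Valid characters per the question: a-z, A-Z, 0-9 and space.
    private static bool IsValid(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == ' ';
    }

    // Builds a new string instead of modifying the input while scanning it.
    public static string MarkIsolatedInvalidChars(string message)
    {
        var sb = new StringBuilder(message.Length + 4);
        for (int i = 0; i < message.Length; i++)
        {
            bool invalid = !IsValid(message[i]);
            bool prevInvalid = i > 0 && !IsValid(message[i - 1]);                  // lookbehind
            bool nextInvalid = i + 1 < message.Length && !IsValid(message[i + 1]); // lookahead

            if (invalid && !prevInvalid && !nextInvalid)
                sb.Append('£');
            sb.Append(message[i]);
        }
        return sb.ToString();
    }
}
```

Because the checks always read from the untouched input, inserting characters never shifts the positions being examined.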

How to check the string has specific set of character in it or not?

I have a requirement to check whether the incoming string has any character followed by - at the beginning.
sample code is:
string name = "e-rob";
if (name.Contains("[a-z]-"))
{
Console.WriteLine(name);
}
else
{
Console.WriteLine("no match found");
}
Console.ReadLine();
The above code is not working. It is not necessarily e- every time; it could be any character followed by -
How can I do this?
Try using some RegEx:
Regex reg = new Regex("^[a-zA-Z]-");
bool check = reg.IsMatch("e-rob");
Or even more concise:
if (Regex.IsMatch("e-rob", "^[a-zA-Z]-")) {
// do stuff for when it matches here
}
The ^[a-zA-Z]- is where the magic happens. Breaking it down piece-by-piece:
^: tells it to start at the beginning of whatever it's checking the pattern against
[a-zA-Z]: tells it to check for one upper- or lower-case letter between A and Z
-: tells it to check for a "-" character directly after the letter
So e-rob or E-rob would both return true where abcdef-g would return false
Also, as a note, in order to use RegEx you need to include
using System.Text.RegularExpressions;
in your class file
Here's a great link to teach you a bit about RegEx which is the best tool ever when you're talking about matching patterns
Try Regex
Regex reg = new Regex("[a-z]-");
if (reg.IsMatch(name.Substring(0, 2)))
{...}
Another way to do this, kind of LINQish:
StartsWithLettersOrDash("Test123");
public bool StartsWithLettersOrDash(string str)
{
string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char [] alphas = (alphabet + alphabet.ToLower()).ToCharArray();
return alphas.Any(z => str.StartsWith(z.ToString() + "-"));
}
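For a check this small, a regex is optional. The following sketch tests the first two characters directly; the class and method names are hypothetical, and "letter" is assumed to mean an ASCII letter, matching the [a-zA-Z] class used above.

```csharp
public static class PrefixCheck
{
    // True when the string starts with one ASCII letter immediately followed by '-'.
    public static bool StartsWithLetterThenDash(string name)
    {
        return name != null
            && name.Length >= 2
            && IsAsciiLetter(name[0])
            && name[1] == '-';
    }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
}
```

The length check comes first so that short or empty strings return false instead of throwing.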

Regex for alphanumeric, at least 1 number and special chars

I am trying to find a regex which will give me the following validation:
The string should contain at least 1 digit and at least 1 special character. Alphanumeric characters are allowed.
I tried the following but this fails:
@"^[a-zA-Z0-9@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+$]"
I tried "password1$" but that failed.
I also tried "Password1!" but that also failed.
Any ideas?
UPDATE
Need the solution to work with C# - currently the suggestions posted as of Oct 22 2013 do not appear to work.
Try this:
Regex rxPassword = new Regex( @"
^ # start-of-line, followed by
(?=.*[0-9]) # a lookahead requiring at least one decimal digit,
(?=.*[!@#]) # a lookahead requiring at least one of the special characters,
(?=.*[a-zA-Z]) # a lookahead requiring at least one upper- or lower-case letter, and then
[a-zA-Z0-9!@#]+ # a sequence of one or more characters drawn from the set consisting of ASCII letters, digits or the punctuation characters ! @ and #
$ # followed by end-of-line
" , RegexOptions.IgnorePatternWhitespace ) ;
The construct (?=regular-expression) is a zero-width positive lookahead assertion.
Sometimes it's a lot simpler to do things one step at a time. The static constructor builds the escaped character class characters from a simple list of allowed special characters. The built-in Regex.Escape method doesn't work here.
public static class PasswordValidator {
private const string ALLOWED_SPECIAL_CHARS = @"@#$%&*+_()':;?.,![]\-";
private static string ESCAPED_SPECIAL_CHARS;
static PasswordValidator() {
var escapedChars = new List<char>();
foreach (char c in ALLOWED_SPECIAL_CHARS) {
if (c == '[' || c == ']' || c == '\\' || c == '-')
escapedChars.AddRange(new[] { '\\', c });
else
escapedChars.Add(c);
}
ESCAPED_SPECIAL_CHARS = new string(escapedChars.ToArray());
}
public static bool IsValidPassword(string input) {
// Length requirement?
if (input.Length < 8) return false;
// First just check for a digit
if (!Regex.IsMatch(input, @"\d")) return false;
// Then check for special character
if (!Regex.IsMatch(input, "[" + ESCAPED_SPECIAL_CHARS + "]")) return false;
// Require a letter?
if (!Regex.IsMatch(input, "[a-zA-Z]")) return false;
// DON'T allow anything else:
if (Regex.IsMatch(input, @"[^a-zA-Z\d" + ESCAPED_SPECIAL_CHARS + "]")) return false;
return true;
}
}
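Taking the "one step at a time" idea further, the same rules can be sketched without any regex, which avoids the escaping problem entirely. This is an illustration under assumptions: the class name is hypothetical, the special-character list is copied from the validator above, and digits are restricted to ASCII 0-9 (the \d class in .NET also matches other Unicode digits).

```csharp
using System.Linq;

public static class PasswordRules
{
    // Same allowed special characters as the validator above; no escaping needed here.
    private const string AllowedSpecialChars = @"@#$%&*+_()':;?.,![]\-";

    public static bool IsValidPassword(string input)
    {
        if (input == null || input.Length < 8) return false;

        bool hasDigit = input.Any(IsAsciiDigit);
        bool hasSpecial = input.Any(IsAllowedSpecial);
        bool hasLetter = input.Any(IsAsciiLetter);
        // Reject anything outside letters, digits and the allowed special characters.
        bool onlyAllowed = input.All(c => IsAsciiLetter(c) || IsAsciiDigit(c) || IsAllowedSpecial(c));

        return hasDigit && hasSpecial && hasLetter && onlyAllowed;
    }

    private static bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static bool IsAllowedSpecial(char c) { return AllowedSpecialChars.IndexOf(c) >= 0; }
}
```

Each rule is a separate boolean, so a caller could report which requirement failed instead of a single true/false.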
This may work. There are two possibilities: the digit comes before the special character, or the digit comes after it. You should use DOTALL mode, so that the dot matches every character (RegexOptions.Singleline in .NET).
^((.*?[0-9].*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*)|(.*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*?[0-9].*))$
This worked for me:
@"(?=^[!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]{8,}$)(?=([!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]\W+){1,})(?=[^0-9][0-9])[!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]*$"
alphanumeric, at least 1 numeric, and special character with a min length of 8
This should do the work
(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+
Tested with JavaScript; not sure about C#, it may need a little adjustment.
What it does is use positive lookaheads to find the required elements of the password.
EDIT
Regular expression is designed to test if there are matches. Since all the patterns are lookahead, no real characters get captured and matches are empty, but if the expression "match", then the password is valid.
But, since the question is about C# (sorry, I don't know C#, I'm just improvising and adapting samples):
string input = "password1!";
string pattern = @"^(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+.*$";
Regex rgx = new Regex(pattern, RegexOptions.None);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0) {
Console.WriteLine("{0} ({1} matches):", input, matches.Count);
foreach (Match match in matches)
Console.WriteLine(" " + match.Value);
}
With a start-of-line anchor added at the beginning and .*$ at the end, the expression will match if the password is valid, and the match value will be the password (I guess).

How do I get a list of all the printable characters in C#?

I'd like to be able to get a char array of all the printable characters in C#, does anybody know how to do this?
edit:
By printable I mean the visible European characters, so yes, umlauts, tildes, accents etc.
This will give you a list with all characters that are not considered control characters:
List<Char> printableChars = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
char c = Convert.ToChar(i);
if (!char.IsControl(c))
{
printableChars.Add(c);
}
}
You may want to investigate the other Char.IsXxxx methods to find a combination that suits your requirements.
Here's a LINQ version of Fredrik's solution. Note that Enumerable.Range yields an IEnumerable<int> so you have to convert to chars first. Cast<char> would have worked in 3.5SP0 I believe, but as of 3.5SP1 you have to do a "proper" conversion:
var chars = Enumerable.Range(0, char.MaxValue+1)
.Select(i => (char) i)
.Where(c => !char.IsControl(c))
.ToArray();
I've created the result as an array as that's what the question asked for - it's not necessarily the best idea though. It depends on the use case.
Note that this also doesn't consider full Unicode characters, only those in the basic multilingual plane. I don't know what it returns for high/low surrogates, but it's worth at least knowing that a single char doesn't really let you represent everything :(
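One way to deal with the surrogate concern is to filter on Unicode categories instead of char.IsControl alone. The sketch below is an assumption about what "printable" should mean, not a standard definition: it drops control, surrogate, unassigned, private-use and format characters, and it keeps spaces and separators. The class and method names are invented for illustration.

```csharp
using System.Globalization;
using System.Linq;

public static class PrintableCharsSketch
{
    // Treats a char as printable unless it falls in one of the "Other" (C*) categories.
    public static bool IsPrintable(char c)
    {
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.Control:          // Cc
            case UnicodeCategory.Format:           // Cf
            case UnicodeCategory.Surrogate:        // Cs
            case UnicodeCategory.PrivateUse:       // Co
            case UnicodeCategory.OtherNotAssigned: // Cn
                return false;
            default:
                return true;
        }
    }

    // Enumerates every UTF-16 code unit (basic multilingual plane only).
    public static char[] GetPrintableBmpChars()
    {
        return Enumerable.Range(0, char.MaxValue + 1)
            .Select(i => (char)i)
            .Where(IsPrintable)
            .ToArray();
    }
}
```

Which characters count as unassigned depends on the Unicode data shipped with the runtime, so the exact array length can differ between .NET versions.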
A LINQ solution (based on Fredrik Mörk's):
Enumerable.Range(char.MinValue, char.MaxValue + 1).Select(c => (char)c).Where(
c => !char.IsControl(c)).ToArray();
TLDR Answer
Use this Regex...
var regex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");
TLDR Explanation
The leading ^ negates the whole character class, so the pattern matches any character that is not in one of these categories:
\p{Cc} : control characters.
\p{Cn} : unassigned characters.
\p{Cs} : surrogate code units (the halves of a UTF-16 surrogate pair).
Working Demo
I test two strings in this demo: "Hello, World!" and "Hello, World!" + (char)4. (char)4 is the END OF TRANSMISSION control character.
using System;
using System.Text.RegularExpressions;
public class Test {
public static MatchCollection getPrintableChars(string haystack) {
var regex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");
var matches = regex.Matches(haystack);
return matches;
}
public static void Main() {
var teststring1 = "Hello, World!";
var teststring2 = "Hello, World!" + (char)4;
var teststring1unprintablechars = getPrintableChars(teststring1);
var teststring2unprintablechars = getPrintableChars(teststring2);
Console.WriteLine("Testing a Printable String: " + teststring1unprintablechars.Count + " Printable Chars Detected");
Console.WriteLine("Testing a String With 1-Unprintable Char: " + teststring2unprintablechars.Count + " Printable Chars Detected");
foreach (Match unprintablechar in teststring1unprintablechars) {
Console.WriteLine("String 1 Printable Char:" + unprintablechar);
}
foreach (Match unprintablechar in teststring2unprintablechars) {
Console.WriteLine("String 2 Printable Char:" + unprintablechar);
}
}
}
Full Working Demo at IDEOne.com
Alternatives
\P{C} : Match only characters outside the "Other" category. Do not match any control, format, surrogate, private-use, or unassigned characters.
\P{Cc} : Match only non-control characters. Do not match any control characters.
[^\p{Cc}\p{Cn}] : Match only non-control characters that have been assigned. Do not match any control or unassigned characters.
[^\p{Cc}\p{Cn}\p{Cs}] : Match only non-control characters that have been assigned and are not surrogates. Do not match any control, unassigned, or surrogate characters.
[^\p{Cc}\p{Cn}\p{Cs}\p{Cf}] : Match only non-control, non-formatting characters that have been assigned and are not surrogates. Do not match any control, unassigned, formatting, or surrogate characters.
Source and Explanation
Take a look at the Unicode Character Properties available that can be used to test within a regex. You should be able to use these regexes in Microsoft .NET, JavaScript, Python, Java, PHP, Ruby, Perl, Golang, and even Adobe. Knowing Unicode character classes is very transferable knowledge, so I recommend using it!
I know ASCII wasn't specifically requested but this is a quick way to get a list of all the printable ASCII characters.
for (Int32 i = 0x20; i <= 0x7e; i++)
{
printableChars.Add(Convert.ToChar(i));
}
See this ASCII table.
Edit:
As stated by Péter Szilvási, the 0x20 and 0x7e in the loop are the hexadecimal representations of the decimal numbers 32 and 126, which are the first and last printable ASCII characters.
public bool IsPrintableASCII(char c)
{
return c >= '\x20' && c <= '\x7e';
}
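Putting the two ASCII snippets together, the printable range can be materialised as the char array the question asked for. This is a small sketch with a hypothetical class name; it covers only ASCII 0x20 through 0x7e, so umlauts and other accented characters are not included.

```csharp
using System.Linq;

public static class AsciiPrintable
{
    // Returns the 95 printable ASCII characters, from space (0x20) to tilde (0x7e).
    public static char[] GetPrintableAscii()
    {
        const int first = 0x20;
        const int last = 0x7e;
        return Enumerable.Range(first, last - first + 1)
            .Select(i => (char)i)
            .ToArray();
    }
}
```

Enumerable.Range takes a start and a count, not a start and an end, which is why the second argument is last - first + 1.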
