Regex for string without spacial characters or spaces [duplicate] - c#

How do I check a string to make sure it contains numbers, letters, or space only?

In C# this is simple:
private bool HasSpecialChars(string yourString)
{
return yourString.Any(ch => ! char.IsLetterOrDigit(ch));
}

The easiest way it to use a regular expression:
Regular Expression for alphanumeric and underscores
Using regular expressions in .net:
http://www.regular-expressions.info/dotnet.html
MSDN Regular Expression
Regex.IsMatch
var regexItem = new Regex("^[a-zA-Z0-9 ]*$");
if(regexItem.IsMatch(YOUR_STRING)){..}

string s = #"$KUH% I*$)OFNlkfn$";
var withoutSpecial = new string(s.Where(c => Char.IsLetterOrDigit(c)
|| Char.IsWhiteSpace(c)).ToArray());
if (s != withoutSpecial)
{
Console.WriteLine("String contains special chars");
}

Try this way.
public static bool hasSpecialChar(string input)
{
string specialChar = #"\|!#$%&/()=?»«#£§€{}.-;'<>_,";
foreach (var item in specialChar)
{
if (input.Contains(item)) return true;
}
return false;
}

String test_string = "tesintg#$234524##";
if (System.Text.RegularExpressions.Regex.IsMatch(test_string, "^[a-zA-Z0-9\x20]+$"))
{
// Good-to-go
}
An example can be found here: http://ideone.com/B1HxA

If the list of acceptable characters is pretty small, you can use a regular expression like this:
Regex.IsMatch(items, "[a-z0-9 ]+", RegexOptions.IgnoreCase);
The regular expression used here looks for any character from a-z and 0-9 including a space (what's inside the square brackets []), that there is one or more of these characters (the + sign--you can use a * for 0 or more). The final option tells the regex parser to ignore case.
This will fail on anything that is not a letter, number, or space. To add more characters to the blessed list, add it inside the square brackets.

Use the regular Expression below in to validate a string to make sure it contains numbers, letters, or space only:
[a-zA-Z0-9 ]

You could do it with a bool. I've been learning recently and found I could do it this way. In this example, I'm checking a user's input to the console:
using System;
using System.Linq;
namespace CheckStringContent
{
class Program
{
static void Main(string[] args)
{
//Get a password to check
Console.WriteLine("Please input a Password: ");
string userPassword = Console.ReadLine();
//Check the string
bool symbolCheck = userPassword.Any(p => !char.IsLetterOrDigit(p));
//Write results to console
Console.WriteLine($"Symbols are present: {symbolCheck}");
}
}
}
This returns 'True' if special chars (symbolCheck) are present in the string, and 'False' if not present.

A great way using C# and Linq here:
public static bool HasSpecialCharacter(this string s)
{
foreach (var c in s)
{
if(!char.IsLetterOrDigit(c))
{
return true;
}
}
return false;
}
And access it like this:
myString.HasSpecialCharacter();

private bool isMatch(string strValue,string specialChars)
{
return specialChars.Where(x => strValue.Contains(x)).Any();
}

Create a method and call it hasSpecialChar with one parameter
and use foreach to check every single character in the textbox, add as many characters as you want in the array, in my case i just used ) and ( to prevent sql injection .
public void hasSpecialChar(string input)
{
char[] specialChar = {'(',')'};
foreach (char item in specialChar)
{
if (input.Contains(item)) MessageBox.Show("it contains");
}
}
in your button click evenement or you click btn double time like that :
private void button1_Click(object sender, EventArgs e)
{
hasSpecialChar(textbox1.Text);
}

While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the [https://www.nuget.org/packages/Extensions.cs][1] package to your project.
Add "using Extensions;" to the top of your code.
"smith23#".IsAlphaNumeric() will return False whereas "smith23".IsAlphaNumeric() will return True. By default the .IsAlphaNumeric() method ignores spaces, but it can also be overridden such that "smith 23".IsAlphaNumeric(false) will return False since the space is not considered part of the alphabet.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().

Based on #prmph's answer, it can be even more simplified (omitting the variable, using overload resolution):
yourString.Any(char.IsLetterOrDigit);

No special characters or empty string except hyphen
^[a-zA-Z0-9-]+$

Related

How do I replace multiple occurrences of substrings using a custom c# function?

I wrote the code below to replace strings but it seemed like there is something wrong with the code. It couldn't replace multiple times. It only can replace the first occurrence. It suppose to replace all occurrences.
//Do not use System.Security.SecurityElement.Escape. I am writing this code for multiple .NET versions.
public string UnescapeXML(string s)
{
if (string.IsNullOrEmpty(s)) return s;
return s.Replace("&", "&").Replace("&apos;", "'").Replace(""", "\"").Replace(">", ">").Replace("<", "<");
}
protected void Page_Load(object sender, EventArgs e)
{
string TestData="TestData&amp;amp;";
Response.Write("using function====>>>>" + UnescapeXML(TestData));
Response.Write("\n\n<BR>Not using function====>>>>" + TestData.Replace("&", ""));
}
Your function is operating correctly, assuming you want a one-pass (per substring) solution.
Your testDate is missing & characters before amp;
Change to:
string TestData="TestData&&&";
From:
string TestData="TestData&amp;amp;";
What is a one-pass (per substring) solution?
Each unique substring1 occurrence is changed.
Then, each unique substring2 occurrence is changed.
...
Then, each unique substringN occurrence is changed.
Recursive Alternative
A recursive solution would involve the following:
After one-single substring replacement, you call the function again
If no replacements occured, return the string
public string ReplaceStuff(string MyStr) {
//Attempt to replace 1 substring
if ( ReplacementOccurred ){
return ReplaceStuff(ModifiedString);
}else{
return MyStr;
}
}
What do you get with Recursion?
Well, given an input string: &amp;, it would first be changed to &, then the next level of recursion would change it to &.
I honestly don't think the author wants the recursive solution, but I could see how passersby might.
Looks like the HttpUtility class will be the best choise in that case.

Check Format of a string

What is the smallest amount of C# possible to Check that a String fits this format #-##### (1 number, a dash then 5 more numbers).
It seems to me that a regular expression could do this quick (again, I wish I knew regular expressions).
So, here is an example:
public bool VerifyBoxNumber (string boxNumber)
{
// psudo code
if (boxNumber.FormatMatch("#-#####")
return true;
return false;
}
If you know real code that will make the above comparison work, please add an answer.
private static readonly Regex boxNumberRegex = new Regex(#"^\d-\d{5}$");
public static bool VerifyBoxNumber (string boxNumber)
{
return boxNumberRegex.IsMatch(boxNumber);
}
return Regex.IsMatch(boxNumber, #"^\d-\d{5}$");
^\d-\d{5}$ would be a regexp that matches only this pattern.

Allow user to only input text?

how do I make a boolean statement to only allow text? I have this code for only allowing the user to input numbers but ca'nt figure out how to do text.
bool Dest = double.TryParse(this.xTripDestinationTextBox.Text, out Miles);
bool MilesGal = double.TryParse(this.xTripMpgTextBox.Text, out Mpg);
bool PriceGal = double.TryParse(this.xTripPricepgTextBox.Text, out Price);
Update: looking at your comment I would advise you to read this article:
User Input Validation in Windows Forms
Original answer: The simplest way, at least if you are using .NET 3.5, is to use LINQ:
bool isAllLetters = s.All(c => char.IsLetter(c));
In older .NET versions you can create a method to do this for you:
bool isAllLetters(string s)
{
foreach (char c in s)
{
if (!char.IsLetter(c))
{
return false;
}
}
return true;
}
You can also use a regular expression. If you want to allow any letter as defined in Unicode then you can use this regular expression:
bool isOnlyLetters = Regex.IsMatch(s, #"^\p{L}+$");
If you want to restrict to A-Z then you can use this:
bool isOnlyLetters = Regex.IsMatch(s, #"^[a-zA-Z]+$");
You can use following code in the KeyPress event:
if (!char.IsLetter(e.KeyChar)) {
e.Handled = true;
}
You can use regular expression, browse here: http://regexlib.com/
You can do one of a few things.
User Regular Expression to check that the string matches your concept of 'Text' only
Write some code that checks each character against a list of valid characters. Or even use char.IsLetter()
Use the MaskedTextBox and set a custom mask to limit input to text only characters
I found hooking up into the text changed event of a textbox and accepting/rejecting the changes an acceptable solution. To "only allow text" is a rather vague definition, but you may as well check if your newly added text (the next char) is a number or not and simply reject all numbers/disallowed characters. This will make users feel they can only enter characters and special characters (like dots, question marks etc.).
private void UTextBox_TextChanged(object sender, EventArgs e)
{
string lastCharacter = this.Text[this.Text.Length-1].ToString();
MatchCollection matches = Regex.Matches(lastCharacter, "[0-9]", RegexOptions.None);
if (matches.Count > 0) //character is a number, reject it.
{
this.Text = Text.Substring(0, Text.Length-1);
}
}

Is there a way of making strings file-path safe in c#?

My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?
Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more 25 invalid characters.
foreach (var c in Path.GetInvalidFileNameChars())
{
fileName = fileName.Replace(c, '-');
}
Or in VB:
'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars
To strip invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());
To replace invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());
To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):
static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces invalid chars with a unique letter (Moves the Char into the letter range of unicode, starting at "A")
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.
First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.
Second, there are a variety of length limitations. Read the full list for NTFS here.
Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
Fourth, what do you do if two processes "arbitrarily" pick the same name?
In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
I agree with Grauenwolf and would highly recommend the Path.GetInvalidFileNameChars()
Here's my C# contribution:
string file = #"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(),
c => file = file.Replace(c.ToString(), String.Empty));
p.s. -- this is more cryptic than it should be -- I was trying to be concise.
Here's my version:
static string GetSafeFileName(string name, char replace = '_') {
char[] invalids = Path.GetInvalidFileNameChars();
return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}
I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.
I say all that w/o doing the profiling -- this one just "felt" nice to me. : )
Here's the function that I am using now (thanks jcollum for the C# example):
public static string MakeSafeFilename(string filename, char replaceChar)
{
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(c, replaceChar);
}
return filename;
}
I just put this in a "Helpers" class for convenience.
If you want to quickly strip out all special characters which is sometimes more user readable for file names this works nicely:
string myCrazyName = "q`w^e!r#t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
myCrazyName,
"\W", /*Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'*/
"",
RegexOptions.IgnoreCase);
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"
Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:
public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
return Regex.Replace(s,
"[" + Regex.Escape(new String(System.IO.Path.GetInvalidPathChars())) + "]",
replacement, //can even use a replacement string of any length
RegexOptions.IgnoreCase);
//not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}
static class Utils
{
public static string MakeFileSystemSafe(this string s)
{
return new string(s.Where(IsFileSystemSafe).ToArray());
}
public static bool IsFileSystemSafe(char c)
{
return !Path.GetInvalidFileNameChars().Contains(c);
}
}
Why not convert the string to a Base64 equivalent like this:
string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName));
If you want to convert it back so you can read it:
UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName));
I used this to save PNG files with a unique name from a random description.
private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
if (e.KeyChar.Equals(24) ||
e.KeyChar.Equals(3) ||
e.KeyChar.Equals(22) ||
e.KeyChar.Equals(26) ||
e.KeyChar.Equals(25))//Control-X, C, V, Z and Y
return false;
if (e.KeyChar.Equals('\b'))//backspace
return false;
char[] charArray = Path.GetInvalidFileNameChars();
if (charArray.Contains(e.KeyChar))
return true;//Stop the character from being entered into the control since it is non-numerical
else
return false;
}
From my older projects, I've found this solution, which has been working perfectly over 2 years. I'm replacing illegal chars with "!", and then check for double !!'s, use your own char.
public string GetSafeFilename(string filename)
{
string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
while (res.IndexOf("!!") >= 0)
res = res.Replace("!!", "!");
return res;
}
I find using this to be quick and easy to understand:
<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function
This works because a string is IEnumerable as a char array and there is a string constructor string that takes a char array.
Many anwer suggest to use Path.GetInvalidFileNameChars() which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting because hackers will always find a way eventually to bypass it.
Here is an example of code you could use :
string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
foreach (char c in filename)
{
if (!whitelist.Contains(c))
{
filename = filename.Replace(c, '-');
}
}

How to validate that a string doesn't contain HTML using C#

Does anyone have a simple, efficient way of checking that a string doesn't contain HTML? Basically, I want to check that certain fields only contain plain text. I thought about looking for the < character, but that can easily be used in plain text. Another way might be to create a new System.Xml.Linq.XElement using:
XElement.Parse("<wrapper>" + MyString + "</wrapper>")
and check that the XElement contains no child elements, but this seems a little heavyweight for what I need.
The following will match any matching set of tags. i.e. <b>this</b>
Regex tagRegex = new Regex(#"<\s*([^ >]+)[^>]*>.*?<\s*/\s*\1\s*>");
The following will match any single tag. i.e. <b> (it doesn't have to be closed).
Regex tagRegex = new Regex(#"<[^>]+>");
You can then use it like so
bool hasTags = tagRegex.IsMatch(myString);
You could ensure plain text by encoding the input using HttpUtility.HtmlEncode.
In fact, depending on how strict you want the check to be, you could use it to determine if the string contains HTML:
bool containsHTML = (myString != HttpUtility.HtmlEncode(myString));
Here you go:
using System.Text.RegularExpressions;
private bool ContainsHTML(string checkString)
{
return Regex.IsMatch(checkString, "<(.|\n)*?>");
}
That is the simplest way, since items in brackets are unlikely to occur naturally.
I just tried my XElement.Parse solution. I created an extension method on the string class so I can reuse the code easily:
public static bool ContainsXHTML(this string input)
{
try
{
XElement x = XElement.Parse("<wrapper>" + input + "</wrapper>");
return !(x.DescendantNodes().Count() == 1 && x.DescendantNodes().First().NodeType == XmlNodeType.Text);
}
catch (XmlException ex)
{
return true;
}
}
One problem I found was that plain text ampersand and less than characters cause an XmlException and indicate that the field contains HTML (which is wrong). To fix this, the input string passed in first needs to have the ampersands and less than characters converted to their equivalent XHTML entities. I wrote another extension method to do that:
public static string ConvertXHTMLEntities(this string input)
{
// Convert all ampersands to the ampersand entity.
string output = input;
output = output.Replace("&", "amp_token");
output = output.Replace("&", "&");
output = output.Replace("amp_token", "&");
// Convert less than to the less than entity (without messing up tags).
output = output.Replace("< ", "< ");
return output;
}
Now I can take a user submitted string and check that it doesn't contain HTML using the following code:
bool ContainsHTML = UserEnteredString.ConvertXHTMLEntities().ContainsXHTML();
I'm not sure if this is bullet proof, but I think it's good enough for my situation.
this also checks for things like < br /> self enclosed tags with optional whitespace. the list does not contain new html5 tags.
internal static class HtmlExts
{
public static bool containsHtmlTag(this string text, string tag)
{
var pattern = #"<\s*" + tag + #"\s*\/?>";
return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
}
public static bool containsHtmlTags(this string text, string tags)
{
var ba = tags.Split('|').Select(x => new {tag = x, hastag = text.containsHtmlTag(x)}).Where(x => x.hastag);
return ba.Count() > 0;
}
public static bool containsHtmlTags(this string text)
{
return
text.containsHtmlTags(
"a|abbr|acronym|address|area|b|base|bdo|big|blockquote|body|br|button|caption|cite|code|col|colgroup|dd|del|dfn|div|dl|DOCTYPE|dt|em|fieldset|form|h1|h2|h3|h4|h5|h6|head|html|hr|i|img|input|ins|kbd|label|legend|li|link|map|meta|noscript|object|ol|optgroup|option|p|param|pre|q|samp|script|select|small|span|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|ul|var");
}
}
Angle brackets may not be your only challenge. Other characters can also be potentially harmful script injection. Such as the common double hyphen "--", which can also used in SQL injection. And there are others.
On an ASP.Net page, if validateRequest = true in machine.config, web.config or the page directive, the user will get an error page stating "A potentially dangerous Request.Form value was detected from the client" if an HTML tag or various other potential script-injection attacks are detected. You probably want to avoid this and provide a more elegant, less-scary UI experience.
You could test for both the opening and closing tags <> using a regular expression, and allow the text if only one of them occcurs. Allow < or >, but not < followed by some text and then >, in that order.
You could allow angle brackets and HtmlEncode the text to preserve them when the data is persisted.
Beware when using the HttpUtility.HtmlEncode method mentioned above. If you are checking some text with special characters, but not HTML, it will evaluate incorrectly. Maybe that's why J c used "...depending on how strict you want the check to be..."

Categories