Regex to find out if the sequence has any special characters

I am looking for a regex that tells me whether a given word sequence contains any special characters.
For example, given the input string
"test?test"
I would like to detect that it has the form
"test(any special char(s) including space)test"

You can just use [^A-Za-z0-9], which matches anything that is not alphanumeric, but of course it depends on what you consider a "special character". If underscore is not special, \W can be a shortcut for anything that is not a word character (A-Za-z0-9_). Note that in .NET, \w also covers Unicode letters and digits unless RegexOptions.ECMAScript is specified.
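As a rough sketch of how the two patterns differ in practice (the class and method names below are made up for illustration, not part of any library):

```csharp
using System.Text.RegularExpressions;

public static class SpecialCharCheck
{
    // Hypothetical helper: true if the input contains any character
    // outside A-Z, a-z and 0-9 (so space and underscore count as special).
    public static bool HasSpecialChar(string input)
    {
        return Regex.IsMatch(input, "[^A-Za-z0-9]");
    }

    // Hypothetical variant using \W: underscore is a word character,
    // so it is not reported as special here.
    public static bool HasNonWordChar(string input)
    {
        return Regex.IsMatch(input, @"\W");
    }
}
```

With these, HasSpecialChar("test?test") and HasSpecialChar("test test") are both true, HasSpecialChar("testtest") is false, and "test_test" is special for the first method but not for the second.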

You don't really need a regex here. If you want to test for alphanumeric characters, you can use LINQ, for example (or just iterate over the letters):
string input = "test test";
bool valid = input.All(Char.IsLetterOrDigit); // false here: the space is neither a letter nor a digit
Char.IsLetterOrDigit checks for all Unicode alphanumeric characters. If you only want the English ones, you can write:
public static bool IsEnglishAlphanumeric(char c)
{
    return ((c >= 'a') && (c <= 'z'))
        || ((c >= 'A') && (c <= 'Z'))
        || ((c >= '0') && (c <= '9'));
}
and use it similarly:
bool valid = input.All(IsEnglishAlphanumeric);

Related

Replace special characters and spaces with empty string with REGEX

I am looking to take a string from an input text box and remove all special characters and spaces.
e.g.
Input: ##HH 4444 5656 3333 2AB##
Output: HH4444565633332AB

If you are dealing with Unicode, this pattern removes one or more characters that are neither a letter nor a number:
[^\p{L}\p{N}]+
See this demo at regexstorm or a C# replace demo at tio.run
\p{L} matches any kind of letter from any language
\p{N} matches any kind of numeric character in any script
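A minimal sketch of using this pattern with Regex.Replace (the class and method names are hypothetical, chosen only for this example):

```csharp
using System.Text.RegularExpressions;

public static class UnicodeCleaner
{
    // Hypothetical helper: removes every run of characters that is
    // neither a Unicode letter (\p{L}) nor a Unicode number (\p{N}).
    public static string KeepLettersAndNumbers(string input)
    {
        return Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
    }
}
```

For the input from the question, KeepLettersAndNumbers("##HH 4444 5656 3333 2AB##") returns "HH4444565633332AB". Unlike [^A-Za-z0-9], this version also keeps precomposed accented letters such as é.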

Let's define what we are going to keep, not what to remove. If we keep only Latin letters and digits, we can write
string result = Regex.Replace(input, "[^A-Za-z0-9]", "");
or (LINQ alternative)
string result = string.Concat(input
    .Where(c => c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'));

String Conversion - remove some characters and replace non-digits with ASCII code

I need to take the value CS5999-1 and convert it to 678359991. Basically replace any alpha character with the equivalent ASCII value and strip the dash. I need to get rid of non-numeric characters and make the value unique (some of the data coming in is all numeric and I determined this will make the records unique).
I have played around with regular expressions and can replace the characters with an empty string, but can't figure out how to replace the character with an ASCII value.
Code is still stuck in .NET 2.0 (Corporate America) in case that matters for any ideas.
I have tried several different methods to do this and no I don't expect SO members to write the code for me. I am looking for ideas.
To replace the alpha characters with an empty string I have used:
strResults = Regex.Replace(strResults, @"[A-Za-z\s]", string.Empty);
This loop replaces each character with itself. Basically, if I could find a way to substitute the ASCII value as the replacement I would have it, but I have tried converting the char value to int and several other methods I found, and all of them come up with an error.
foreach (char c in strMapResults)
{
    strMapResults = strMapResults.Replace(c, c);
}

Check if each character is in the a-z or A-Z range. If so, append its ASCII value to the result, and if it is in the 0-9 range, just append the digit.
public static string AlphaToAscii(string str)
{
    var result = string.Empty;
    foreach (char c in str)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            result += (int)c;   // letter: append its numeric code, e.g. 'C' -> "67"
        else if (c >= '0' && c <= '9')
            result += c;        // digit: keep as is
    }
    return result;
}
All characters outside of the alpha-numeric range (such as -) will be ignored.
If you are running this function on particularly large strings or want better performance you may want to use a StringBuilder instead of +=.

For all characters in the ASCII range, the encoded value is the same as the Unicode code point. This is also true of ISO/IEC 8859-1, and UCS-2, but not of other legacy encodings.
And since UCS-2 is the same as UTF-16 for the values in UCS-2 (which includes all ASCII characters, as per the above), and since .NET char is a UTF-16 unit, all you need to do is just cast to int.
var builder = new StringBuilder(str.Length * 3); // Pre-allocate for the worst case (3 digits per letter)
foreach (char c in str)
{
    if (c >= '0' && c <= '9')
        builder.Append(c);
    else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        builder.Append((int)c);
}
string result = builder.ToString();

If you want to know how you might do this with a regular expression (you mentioned regex in your question), here's one way to do it.
The code below filters all non-digit characters, converting letters to their ASCII representation, and dumping anything else, including all non-ASCII alphabetical characters. Note that treating (int)char as the equivalent of a character's ASCII value is only valid where the character is genuinely available in the ASCII character set, which is clearly the case for A-Za-z.
MatchEvaluator filter = match =>
{
    // The named group is only set when an ASCII letter was matched
    var alpha = match.Groups["asciialpha"].Value;
    return alpha != "" ? ((int) alpha[0]).ToString() : "";
};
var filtered = Regex.Replace("CS5999-1", @"(?<asciialpha>[A-Za-z])|\D", filter);

Try this
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "CS5999-1";
            MatchEvaluator evaluator = new MatchEvaluator(Replace);
            string results = Regex.Replace(input, "[A-Za-z\\-]", evaluator);
        }
        static string Replace(Match match)
        {
            if (match.Value == "-")
            {
                return "";
            }
            else
            {
                byte[] ascii = Encoding.UTF8.GetBytes(match.Value);
                return ascii[0].ToString();
            }
        }
    }
}

Regex Lookahead and lookbehind at most one digit

I'm looking to create a regex pattern with these rules:
8 characters [a-zA-Z]
must contain only one digit, at any place in the string
I created this pattern:
^(?=.*[0-9].*[0-9])[0-9a-zA-Z]{8}$
This pattern works, but I want only one digit to be allowed. Example:
aaaaaaa6 match
aaa7aaaa match
aaa88aaa don't match
aaa884aa don't match
aaawwaaa don't match

You could instead use:
^(?=[0-9a-zA-Z]{8}$)[^\d]*\d[^\d]*$
The first part asserts that the whole string consists of exactly 8 letters or digits. Once this is ensured, the second part ensures that there is exactly one digit in the match.
EDIT: Explanation:
The anchors ^ and $ denote the start and end of the string.
(?=[0-9a-zA-Z]{8}$) asserts that the string consists of exactly 8 letters or digits. The $ inside the lookahead matters: without it, only the first 8 characters would be checked and longer inputs could still match.
[^\d]*\d[^\d]* requires exactly one digit, with only non-digit characters around it. Since the lookahead has already asserted that the input contains only digits and letters, the non-digit characters here are letters.
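Here is a small sketch of the one-digit check against the examples from the question. It assumes the lookahead is anchored with $ so that the length is limited to 8 characters; the class and method names are only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class OneDigitCheck
{
    // Lookahead: the string is 8 ASCII letters or digits.
    // Main part: exactly one digit, everything else non-digit.
    // Note: in .NET, $ also matches before a trailing newline;
    // use \z instead if that matters for your input.
    private static readonly Regex Pattern =
        new Regex(@"^(?=[0-9a-zA-Z]{8}$)[^\d]*\d[^\d]*$");

    public static bool IsValid(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

IsValid returns true for "aaaaaaa6" and "aaa7aaaa", and false for "aaa88aaa", "aaa884aa" and "aaawwaaa", which is the behaviour asked for.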

If you want a non regex solution, I wrote this for a small project :
public static bool ContainsOneDigit(string s)
{
    if (String.IsNullOrWhiteSpace(s) || s.Length != 8)
        return false;
    int nb = 0;
    foreach (char c in s)
    {
        if (!Char.IsLetterOrDigit(c))
            return false;
        if (c >= '0' && c <= '9') // just thought, I could use Char.IsDigit() here ...
            nb++;
    }
    return nb == 1;
}

Regular expression to match one of two characters

What regular expression can I use to make sure input matches either a character 'a' or character 'x'.
I have tried the following but this doesn't work as I had hoped.
char option;
Console.WriteLine("Please make your option");
for (int i = 0; i < options.Length; i++)
{
    Console.WriteLine(options[i]);
}
option = char.Parse(Console.ReadLine());
while (option != 'a' || option != 'x')
{
    Console.WriteLine("'a' or 'x' please!!");
    option = char.Parse(Console.ReadLine());
}
What I want is for one of the two characters to be accepted only...as input.

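Regex.IsMatch(input, "^[ax]$", RegexOptions.IgnoreCase);
will match exactly a, x, A or X. Without the ^ and $ anchors, "[ax]" would match any input that merely contains one of those characters.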

a + x in formal regular-language notation; (a|x) or [ax] in almost every regex system.

No regex is needed; you have a logic error here. You need to use the && (AND) operator instead of || (OR) in your while loop:
while (option != 'a' && option != 'x')
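To see why the || version never leaves the loop, here is a small sketch that isolates the two conditions (the helper names are made up for this example):

```csharp
public static class OptionCheck
{
    // Condition from the question: always true, because no single
    // character can be equal to both 'a' and 'x' at the same time.
    public static bool RejectWithOr(char option)
    {
        return option != 'a' || option != 'x';
    }

    // Corrected condition: true only when the character is
    // neither 'a' nor 'x', so valid input ends the loop.
    public static bool RejectWithAnd(char option)
    {
        return option != 'a' && option != 'x';
    }
}
```

RejectWithOr('a') is true (since 'a' is not 'x'), so even valid input is rejected, while RejectWithAnd('a') and RejectWithAnd('x') are false and RejectWithAnd('b') is true.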

switch statement - validate substrings

the field data has 4 acceptable types of values:
j
47d (where the first one-two characters are between 0 and 80 and third character is d)
9u (where the first one-two characters are between 0 and 80 and third character is u)
3v (where the first character is between 1 and 4 and second character is v).
Otherwise the data should be deemed invalid.
string data = readconsole();
what is the best way of validating this input?
I was considering a combination of .Length and Switch substring checks.
ie.
if (data == "j")
else if (data.substring(1) == "v" && data.substring(0,1) >=1 && data.substring(0,1) <=4)
....
else
writeline("fail");

You can use a regular expression that matches the different kinds of values:
^(j|(\d|[1-7]\d|80)[du]|[1-4]v)$
Example:
if (Regex.IsMatch(data, @"^(j|(\d|[1-7]\d|80)[du]|[1-4]v)$")) ...
Explanation of the regular expression:
^ matches the beginning of string
j matches the literal value "j"
| is the "or" operator
\d matches one digit
[1-7]\d matches "10" - "79"
80 matches "80"
[du] matches either "d" or "u"
[1-4] matches "1" - "4"
v matches "v"
$ matches the end of the string
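A short sketch that wraps the pattern explained above in a helper and runs it over a few sample values (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class FieldValidator
{
    // "j", or 0-80 followed by d or u, or 1-4 followed by v.
    private static readonly Regex Pattern =
        new Regex(@"^(j|(\d|[1-7]\d|80)[du]|[1-4]v)$");

    public static bool IsValid(string data)
    {
        return Pattern.IsMatch(data);
    }
}
```

IsValid accepts "j", "47d", "9u", "3v", "0d" and "80u", and rejects "81d", "5v", "jj" and the empty string. A zero-padded value such as "05d" is also rejected by this pattern, which may or may not be what you want.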

A regular expression will be the most succinct way to validate such rules.

You can use the regular expression:
^(?:j|(?:[0-7]?[0-9]|80)[du]|[1-4]v)$
Another option is to split the value into number and letter, and check the results. This is somewhat longer, but probably easier to maintain in the long run:
// requires: using System.Globalization; using System.Text.RegularExpressions;
public bool IsValid(string s)
{
    if (s == "j")
        return true;
    Match m = Regex.Match(s, @"^(\d+)(\p{L})$");
    if (!m.Success)
        return false;
    char c = m.Groups[2].Value[0];
    int number;
    if (!Int32.TryParse(m.Groups[1].Value, NumberStyles.Integer,
            CultureInfo.CurrentCulture, out number)) //todo: choose culture
        return false;
    // d/u: 0 to 80 inclusive, as stated in the question; v: 1 to 4
    return ((c == 'u' || c == 'd') && number >= 0 && number <= 80) ||
           (c == 'v' && number >= 1 && number <= 4);
}
