Regex to match alphanumeric and spaces

Regex to match alphanumeric and spaces - c#

What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, #"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";

just a FYI
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
would actually be better like
string clean = Regex.Replace(q, #"[^\w\s]", string.Empty);

This:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is ascii hex for 'space' character
you can add more individual characters that you want to be allowed.
If you want for example "?" to be ok in the return string add \x3f.

I got it:
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets

The following regex is for space inclusion in textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.text);
This works fine for me.

I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.

There appear to be two problems.
You're using the ^ outside a [] which matches the start of the line
You're not using a * or + which means you will only match a single character.
I think you want the following regex #"([^a-zA-Z0-9\s])+"

bottom regex with space, supports all keyboard letters from different culture
string input = "78-selim güzel667.,?";
Regex regex = new Regex(#"[^\w\x20]|[\d]");
var result= regex.Replace(input,"");
//selim güzel

The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.

This regex will help you to filter if there is at least one alphanumeric character and zero or more special characters i.e. _ (underscore), \s whitespace, -(hyphen)
string comparer = "string you want to compare";
Regex r = new Regex(#"^([a-zA-Z0-9]+[_\s-]*)+$");
if (!r.IsMatch(comparer))
{
return false;
}
return true;
Create a set using [a-zA-Z0-9]+ for alphanumeric characters, "+" sign (a quantifier) at the end of the set will make sure that there will be at least one alphanumeric character within the comparer.
Create another set [_\s-]* for special characters, "*" quantifier is to validate that there can be special characters within comparer string.
Pack these sets into a capture group ([a-zA-Z0-9]+[_\s-]*)+ to say that the comparer string should occupy these features.

[RegularExpression(#"^[A-Z]+[a-zA-Z""'\s-]*$")]
Above syntax also accepts space

Related

Check if an expression is a match with regex

In C# I have two strings: [I/text] and [S/100x20].
So, the first one is [I/ followed by text and ending in ].
And the second is [S/ followed by an integer, then x, then another integer, and ending in ].
I need to check if a given string is a match of one of this formats. I tried the following:
(?<word>.*?) and (?<word>[0-9]x[0-9])
But this does not seem to work and I am missing the [I/...] and [S/...] parts.
How can I do this?

This should do nicely:
Regex rex = new Regex(#"\[I/[^\]]+\]|\[S/\d+x\d+\]");
If the text in [I/text] is supposed to include only alphanumeric characters then #Oleg's use of the \w instead of [^\]] would be better. Also using + means there needs to be at least one of the preceding character class, and the * allows class to be optional. Adjust as needed..
And use:
string testString1 = "[I/text]";
if(rex.IsMatch(testString1))
{
// should match..
}
string testString2 = "[S/100x20]";
if(rex.IsMatch(testString2))
{
// should match..
}

Following regex does it. Matches the whole string
"(\[I/\w+\])|(\[S/\d+x\d+\])"

([I/\w+])
(S/\d+x\d+])
the above works.
use http://regexr.com?34543 to play with your expressions

Trouble creating a Regex expression

I'm trying to create a regex expression what will accept a certain format of command. The pattern is as follows:
Can start with a $ and have two following value 0-9,A-F,a-f (ie: $00 - $FF)
or
Can be any value except for "&<>'/"
*if the value start with $ the next two values after need to be a valid hex value from 00-ff
So far I have this
Regex correctValue = new Regex("($[0-9a-fA-F][0-9a-fA-F])");
Any help will be greatly appreciated!

You just need to add "\" symbol before your "$" and it works:
string input = "$00";
Match m = Regex.Match(input, #"^\$[0-9a-fA-F][0-9a-fA-F]$");
if (m.Success)
{
foreach (Group g in m.Groups)
Console.WriteLine(g.Value);
}
else
Console.WriteLine("Didn't match");

If I'm following you correctly, the net result you're looking for is any value that is not in the list "&<>'/", since any combination of $ and two alphanumeric characters would also not be in that list. Thus you could make your expression:
Regex correctValue = new Regex("[^&<>'/]");
Update: But just in case you do need to know how to properly match the $00 - $FF, this would do the trick:
Regex correctValue = new Regex("\$[0-9A-Fa-f]{2}");

In Regular Expression $ use for Anchor assertion, and means:
The match must occur at the end of the string or before \n at the end of the line or string.
try using [$] (Character Class for single character) or \$ (Character Escape) instead.

Removing numbers from text using C#

I have a text file for processing, which has some numbers. I want JUST text in it, and nothing else. I managed to remove the punctuation marks, but how do I remove the numbers? I want this using C# code.
Also, I want to remove words with length greater than 10. How do I do that using Reg Expressions?

You can do this with a regex:
string withNumbers = // string with numbers
string withoutNumbers = Regex.Replace(withNumbers, "[0-9]", "");
Use this regex to remove words with more than 10 characters:
[\w]{10, 100}
100 defines the max length to match. I don't know if there is a quantifier for min length...

Only letters and nothing else (because I see you also want to remove the punctuation marks)
Regex.IsMatch(input, #"^[a-zA-Z]+$");

You can also use string.Join:
string s = "asdasdad34534t3sdf43534";
s = string.Join(null, System.Text.RegularExpressions.Regex.Split(s, "[\\d]"));

The Regex.Replace method should do the trick.
// regex to match any digit
var regex = new Regex("\d");
// replace all matches in input with empty string
var output = regex.Replace(input, String.Empty);

Regex to remove all special characters from string?

I'm completely incapable of regular expressions, and so I need some help with a problem that I think would best be solved by using regular expressions.
I have list of strings in C#:
List<string> lstNames = new List<string>();
lstNames.add("TRA-94:23");
lstNames.add("TRA-42:101");
lstNames.add("TRA-109:AD");
foreach (string n in lstNames) {
// logic goes here that somehow uses regex to remove all special characters
string regExp = "NO_IDEA";
string tmp = Regex.Replace(n, regExp, "");
}
I need to be able to loop over the list and return each item without any special characters. For example, item one would be "TRA9423", item two would be "TRA42101" and item three would be TRA109AD.
Is there a regular expression that can accomplish this for me?
Also, the list contains more than 4000 items, so I need the search and replace to be efficient and quick if possible.
EDIT:
I should have specified that any character beside a-z, A-Z and 0-9 is special in my circumstance.

It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:
tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
You should be careful with your current approach because the following two items will be converted to the same string and will therefore be indistinguishable:
"TRA-12:123"
"TRA-121:23"

[^a-zA-Z0-9] is a character class matches any non-alphanumeric characters.
Alternatively, [^\w\d] does the same thing.
Usage:
string regExp = "[^\w\d]";
string tmp = Regex.Replace(n, regExp, "");

This should do it:
[^a-zA-Z0-9]
Basically it matches all non-alphanumeric characters.

You can use:
string regExp = "\\W";
This is equivalent to Daniel's "[^a-zA-Z0-9]"
\W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].

For my purposes I wanted all English ASCII chars, so this worked.
html = Regex.Replace(html, "[^\x00-\x80]+", "")

Depending on your definition of "special character", I think "[^a-zA-Z0-9]" would probably do the trick. That would find anything that is not a small letter, a capital letter, or a digit.

tmp = Regex.Replace(n, #"\W+", "");
\w matches letters, digits, and underscores, \W is the negated version.

If you don't want to use Regex then another option is to use
char.IsLetterOrDigit
You can use this to loop through each char of the string and only return if true.

public static string Letters(this string input)
{
return string.Concat(input.Where(x => char.IsLetter(x) && !char.IsSymbol(x) && !char.IsWhiteSpace(x)));
}

extract last match from string in c#

i have strings in the form [abc].[some other string].[can.also.contain.periods].[our match]
i now want to match the string "our match" (i.e. without the brackets), so i played around with lookarounds and whatnot. i now get the correct match, but i don't think this is a clean solution.
(?<=\.?\[) starts with '[' or '.['
([^\[]*) our match, i couldn't find a way to not use a negated character group
`.*?` non-greedy did not work as expected with lookarounds,
it would still match from the first match
(matches might contain escaped brackets)
(?=\]$) string ends with an ]
language is .net/c#. if there is an easier solution not involving a regex i'd be also happy to know
what really irritates me is the fact, that i cannot use (.*?) to capture the string, as it seems non-greedy does not work with lookbehinds.
i also tried: Regex.Split(str, #"\]\.\[").Last().TrimEnd(']');, but i'm not really pround of this solution either

The following should do the trick. Assuming the string ends after the last match.
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
var search = new Regex("\\.\\[(.*?)\\]$", RegexOptions.RightToLeft);
string ourMatch = search.Match(input).Groups[1]);

Assuming you can guarantee the input format, and it's just the last entry you want, LastIndexOf could be used:
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
int lastBracket = input.LastIndexOf("[");
string result = input.Substring(lastBracket + 1, input.Length - lastBracket - 2);

With String.Split():
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
char[] seps = {'[',']','\\'};
string[] splitted = input.Split(seps,StringSplitOptions.RemoveEmptyEntries);
you get "out match" in splitted[7] and can.also.contain.periods is left as one string (splitted[4])
Edit: the array will have the string inside [] and then . and so on, so if you have a variable number of groups, you can use that to get the value you want (or remove the strings that are just '.')
Edited to add the backslash to the separator to treat cases like '\[abc\]'
Edit2: for nested []:
string input = #"[abc].[some other string].[can.also.contain.periods].[our [the] match]";
string[] seps2 = { "].["};
string[] splitted = input.Split(seps2, StringSplitOptions.RemoveEmptyEntries);
you our [the] match] in the last element (index 3) and you'd have to remove the extra ]

You have several options:
RegexOptions.RightToLeft - yes, .NET regex can do this! Use it!
Match the whole thing with greedy prefix, use brackets to capture the suffix that you're interested in
So generally, pattern becomes .*(pattern)
In this case, .*\[([^\]]*)\], then extract what \1 captures (see this on rubular.com)
References
regular-expressions.info/Grouping with brackets

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex to match alphanumeric and spaces - c#

What am I doing wrong here? string q = "john s!"; string clean = Regex.Replace(q, #"([^a-zA-Z0-9]|^\s)", string.Empty); // clean == "johns". I want "john s";

just a FYI string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty); would actually be better like string clean = Regex.Replace(q, #"[^\w\s]", string.Empty);

This: string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty); \x20 is ascii hex for 'space' character you can add more individual characters that you want to be allowed. If you want for example "?" to be ok in the return string add \x3f.

I got it: string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty); Didn't know you could put \s in the brackets

The following regex is for space inclusion in textbox. Regex r = new Regex("^[a-zA-Z\\s]+"); r.IsMatch(textbox1.text); This works fine for me.

There appear to be two problems. You're using the ^ outside a [] which matches the start of the line You're not using a * or + which means you will only match a single character. I think you want the following regex #"([^a-zA-Z0-9\s])+"

bottom regex with space, supports all keyboard letters from different culture string input = "78-selim güzel667.,?"; Regex regex = new Regex(#"[^\w\x20]|[\d]"); var result= regex.Replace(input,""); //selim güzel

The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.

[RegularExpression(#"^[A-Z]+[a-zA-Z""'\s-]*$")] Above syntax also accepts space

Related

Check if an expression is a match with regex

Trouble creating a Regex expression

Removing numbers from text using C#

Regex to remove all special characters from string?

extract last match from string in c#

Categories

Resources