Remove Punctuation and Spaces from string using Regex

Remove Punctuation and Spaces from string using Regex - c#

I am trying to take out all the punctuation and spaces in a string that I am going to encrypt using a Playfair Cipher. I can't figure out why this line doesn't work.
s = Regex.Replace(s, #"[^\w\s]", string.Empty);

The [^\w\s] means remove anything that's not a word or whitespace character.
Try this instead:
s = Regex.Replace(s, #"[^\w]", string.Empty);
You could also use:
s = Regex.Replace(s, #"\W", string.Empty);
Of course that will leave underscores as those are considered word characters. To remove those as well, try this:
s = Regex.Replace(s, #"[\W_]", string.Empty);
Or this:
s = Regex.Replace(s, #"\W|_", string.Empty);

How about using Linq instead of Regex?
string str = "abc; .d";
var newstr = String.Join("", str.Where(char.IsLetterOrDigit));

The ^ character means not. I use [^A-Za-z0-9-] for my replacements of everything not alpha-numeric with a hyphen.

Your best bet is probably to use [^A-Za-z] since \w contains _ and 0-9 which I'm guessing you wouldn't want to keep.
The following regex would remove anything not a-z or A-Z.
s = Regex.Replace(s, #"[^A-Za-z]", string.Empty);

Related

Find and replace the string in paragraph

I want to empty the value between the hyphn for example need to clear the data in between the range of hyphen prefix and suffix then make it has empty string.
string templateContent = "Template content -macro- -UnitDetails- -testEmail- sending Successfully";
Output
templateContent = "Template content sending Successfully";

templateContent = Regex.Replace(templateContent, #"-\w*-\s?", string.Empty).TrimEnd(' ');
#"-\w*-\s" - is regex pattern for '-Word- '
- - pattern for -
\w - word character.
* - zero or any occurrences of \w
\s - pattern for whitespace character
? - marks \s as optional
TrimEnd(' ') - to remove trailing space if there was a pattern at end of the string

There are many ways to do this, however given your example the following should work
var split = templateContent
.Split(' ')
.Where(x => !x.StartsWith("-") && !x.EndsWith("-"));
var result = string.Join(" ",split);
Console.WriteLine(result);
Output
Template content sending Successfully
Full Demo Here
Note : I personally think regex is better suited to this

You can use regex for this
string regExp = "(-[a-zA-Z]*-)";
string tmp = Regex.Replace(templateContent , regExp, "");
string finalStr = Regex.Replace(tmp, " {2,}", " ");

var resultWithSpaces = Regex.Replace(templateContent, #"-\S+-", string.Empty);
This regular expression looks for two hyphens surrounding one or more characters that are not white space.
It will leave the spaces that were around the removed word. To get rid of those you can do another Regex to replace multiple spaces with a single space.
var result = Regex.Replace(resultWithSpaces, #"\s+", " ");

Regex - Removing specific characters before the final occurance of #

So, I'm trying to remove certain characters [.&#] before the final occurance of an #, but after that final #, those characters should be allowed.
This is what I have so far.
string pattern = #"\.|\&|\#(?![^#]+$)|[^a-zA-Z#]";
string input = "username#middle&something.else#company.com";
// input, pattern, replacement
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
Output: usernamemiddlesomethingelse#companycom
This currently removes all occurances of the specified characters, apart from the final #. I'm not sure how to get this to work, help please?

You may use
[.&#]+(?=.*#)
Or, equivalent [.&#]+(?![^#]*$). See the regex demo.
Details
[.&#]+ - 1 or more ., & or # chars
(?=.*#) - followed with any 0+ chars (other than LF) as many as possible and then a #.
See the C# demo:
string pattern = #"[.&#]+(?=.*#)";
string input = "username#middle&something.else#company.com";
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
// => usernamemiddlesomethingelse#company.com

Just a simple solution (and alternative to complex regex) using Substring and LastIndexOf:
string pattern = #"[.#&]";
string input = "username#middle&something.else#company.com";
string inputBeforeLastAt = input.Substring(0, input.LastIndexOf('#'));
// input, pattern, replacement
string result = Regex.Replace(inputBeforeLastAt, pattern, string.Empty) + input.Substring(input.LastIndexOf('#'));
Console.WriteLine(result);
Try it with this fiddle.

Trim string by strings

How can I trim a string by a whole string instead of a list of single characters?
I want to remove all and whitespaces at beginning and end of an HTML string. But method String.Trim() does only have overloads for set of characters and not for set of strings.

You could use HttpUtility.HtmlDecode(String) and use the resultant as an input for String.Trim()
HttpUtility.HtmlDecode on MSDN
HttpServerUtility.HtmlDecode on MSDN (a wrapper you can access through the Page.Server property)
string stringWithNonBreakingSpaces;
string trimmedString = String.Trim(HttpUtility.HtmlDecode(stringWithNonBreakingSpaces));
Note: This solution would decode all the HTML strings in the input.

The Trim method removes from the current string all leading and trailing white-space characters by default.
Edit: Solution for your problem AFTER your edit:
string input = #" <a href='#'>link</a> ";
Regex regex = new Regex(#"^( |\s)*|( |\s)*$");
string result = regex.Replace(input, String.Empty);
This will remove all trailing and leading spaces and . You can add any string or character group to the expression. If you were to trim all tabs too the regex would simply become:
Regex regex = new Regex(#"^( |\s|\t)*|( |\s|\t)*$");

Not sure if this is what you're looking for?
string str = "hello ";
str.Replace(" ", "");
str.Trim();

Use RegEx, as David Heffernan said. It is rather easy to select all spaces at the start of string: ^(\ | )*

Remove characters using Regex

I have a string. I need to replace all instances of a given array of strings from this original string - how would I do that?
Currently I am using...
var inputString = "this is my original string.";
var replacement = "";
var pattern = string.Join("|", arrayOfStringsToRemove);
Regex.Replace(inputString, pattern, replacement);
This works fine, but unfortunately it breaks down when someone tries to remove a character that has a special meaning in the regex.
How should I do this? Is there a better way?

Build the pattern using Regex.Escape:
StringBuilder pattern = new StringBuilder();
foreach (string s in arrayOfStringsToRemove)
{
pattern.Append("(");
pattern.Append(Regex.Escape(s));
pattern.Append(")|");
}
Regex.Replace(inputString, pattern.ToString(0, pattern.Length - 1), // remove trailing |
replacement);

Look at Regex.Escape

You need to escape special characters with a backslash
\
Sometimes you may need to use two backslashes
\\

You need to escape characters with spacial meaning of course.
var str_to_replace = "removing \[this\]\(link\)";

Regex to match alphanumeric and spaces

What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, #"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";

just a FYI
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
would actually be better like
string clean = Regex.Replace(q, #"[^\w\s]", string.Empty);

This:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is ascii hex for 'space' character
you can add more individual characters that you want to be allowed.
If you want for example "?" to be ok in the return string add \x3f.

I got it:
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets

The following regex is for space inclusion in textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.text);
This works fine for me.

I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.

There appear to be two problems.
You're using the ^ outside a [] which matches the start of the line
You're not using a * or + which means you will only match a single character.
I think you want the following regex #"([^a-zA-Z0-9\s])+"

bottom regex with space, supports all keyboard letters from different culture
string input = "78-selim güzel667.,?";
Regex regex = new Regex(#"[^\w\x20]|[\d]");
var result= regex.Replace(input,"");
//selim güzel

The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.

This regex will help you to filter if there is at least one alphanumeric character and zero or more special characters i.e. _ (underscore), \s whitespace, -(hyphen)
string comparer = "string you want to compare";
Regex r = new Regex(#"^([a-zA-Z0-9]+[_\s-]*)+$");
if (!r.IsMatch(comparer))
{
return false;
}
return true;
Create a set using [a-zA-Z0-9]+ for alphanumeric characters, "+" sign (a quantifier) at the end of the set will make sure that there will be at least one alphanumeric character within the comparer.
Create another set [_\s-]* for special characters, "*" quantifier is to validate that there can be special characters within comparer string.
Pack these sets into a capture group ([a-zA-Z0-9]+[_\s-]*)+ to say that the comparer string should occupy these features.

[RegularExpression(#"^[A-Z]+[a-zA-Z""'\s-]*$")]
Above syntax also accepts space

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Remove Punctuation and Spaces from string using Regex - c#

I am trying to take out all the punctuation and spaces in a string that I am going to encrypt using a Playfair Cipher. I can't figure out why this line doesn't work. s = Regex.Replace(s, #"[^\w\s]", string.Empty);

How about using Linq instead of Regex? string str = "abc; .d"; var newstr = String.Join("", str.Where(char.IsLetterOrDigit));

The ^ character means not. I use [^A-Za-z0-9-] for my replacements of everything not alpha-numeric with a hyphen.

Your best bet is probably to use [^A-Za-z] since \w contains _ and 0-9 which I'm guessing you wouldn't want to keep. The following regex would remove anything not a-z or A-Z. s = Regex.Replace(s, #"[^A-Za-z]", string.Empty);

Related

Find and replace the string in paragraph

Regex - Removing specific characters before the final occurance of #

Trim string by strings

Remove characters using Regex

Regex to match alphanumeric and spaces

Categories

Resources