Regex to replace invalid characters - c#

I don't have much experience with RegEx so I am using many chained String.Replace() calls to remove unwanted characters -- is there a RegEx I can write to streamline this?
string messyText = GetText();
string cleanText = messyText.Trim()
.ToUpper()
.Replace(",", "")
.Replace(":", "")
.Replace(".", "")
.Replace(";", "")
.Replace("/", "")
.Replace("\\", "")
.Replace("\n", "")
.Replace("\t", "")
.Replace("\r", "")
.Replace(Environment.NewLine, "")
.Replace(" ", "");
Thanks

Try this regex:
Regex regex = new Regex(@"[\s,:.;/\\]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
\s is a character class that matches any whitespace character, including everything in [ \t\r\n].
If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:
Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
Where \W is any non-word character. In .NET, \w is Unicode-aware, so \W is not exactly [^a-zA-Z0-9_] unless you pass RegexOptions.ECMAScript.
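As a rough sketch of how the two patterns above compare, the following wraps each in a helper and runs it on a made-up input. The class name, method names, and sample strings are assumptions for illustration, and ToUpperInvariant is used here instead of ToUpper so that the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanTextDemo
{
    // Blacklist: whitespace plus the punctuation listed in the question.
    private static readonly Regex Unwanted = new Regex(@"[\s,:.;/\\]+");

    // Everything that is not a letter or digit (underscore included).
    private static readonly Regex NonAlphanumeric = new Regex(@"[\W_]+");

    public static string StripListed(string text) =>
        Unwanted.Replace(text, "").ToUpperInvariant();

    public static string KeepAlphanumeric(string text) =>
        NonAlphanumeric.Replace(text, "").ToUpperInvariant();

    public static void Main()
    {
        string messyText = " a,b: c.d;\te/f\\g\r\n"; // hypothetical sample input
        Console.WriteLine(StripListed(messyText));      // ABCDEFG
        Console.WriteLine(KeepAlphanumeric("a_b-c!1")); // ABC1
    }
}
```

Because the regex already removes leading and trailing whitespace, the separate Trim() call from the question is not needed with either pattern.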

Character classes to the rescue!
string messyText = GetText();
string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");

You would probably want to use a whitelist approach: there is an ocean of unusual characters, and their effect in combination may not be easy to predict.
A simple regex that removes everything but the allowed characters could look like this:
messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");
The ^ inverts the selection. Apart from the alphanumeric characters, this regex allows the pipe (|, written as \x7C), the comma (\x2C), the period (\x2E) and the underscore (_). You can add and remove characters and character sets as needed.
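A minimal sketch of the whitelist idea follows. It assumes the same allowed set as above, written with literal characters instead of hex escapes (inside a character class, |, the comma and the period need no escaping). The class name and sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistDemo
{
    // Remove every character that is NOT in the allowed set.
    public static string KeepAllowed(string text) =>
        Regex.Replace(text, @"[^a-zA-Z0-9|,._]", "");

    // Same set, spelled with the hex escapes used in the answer above.
    public static string KeepAllowedHex(string text) =>
        Regex.Replace(text, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");

    public static void Main()
    {
        string input = "a|b, c.d_e; <f>!"; // hypothetical sample input
        Console.WriteLine(KeepAllowed(input));    // a|b,c.d_ef
        Console.WriteLine(KeepAllowedHex(input)); // a|b,c.d_ef
    }
}
```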

Related

Find and replace the string in paragraph

I want to remove everything between a pair of hyphens, including the hyphens themselves, and replace it with an empty string. For example:
string templateContent = "Template content -macro- -UnitDetails- -testEmail- sending Successfully";
Output
templateContent = "Template content sending Successfully";
templateContent = Regex.Replace(templateContent, @"-\w*-\s?", string.Empty).TrimEnd(' ');
@"-\w*-\s?" - is regex pattern for '-Word- '
- - pattern for -
\w - word character.
* - zero or more occurrences of \w
\s - pattern for whitespace character
? - marks \s as optional
TrimEnd(' ') - to remove trailing space if there was a pattern at end of the string
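The pattern breakdown above can be tried out with a small sketch like this one. The class and method names are made up for illustration; the first input is the one from the question, the second is a hypothetical case with the macro at the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MacroStripper
{
    // Removes "-Word-" tokens plus one optional trailing whitespace character,
    // then trims a space left behind when a token was last in the string.
    public static string RemoveMacros(string template) =>
        Regex.Replace(template, @"-\w*-\s?", string.Empty).TrimEnd(' ');

    public static void Main()
    {
        string templateContent =
            "Template content -macro- -UnitDetails- -testEmail- sending Successfully";
        Console.WriteLine(RemoveMacros(templateContent)); // Template content sending Successfully
        Console.WriteLine(RemoveMacros("Hello -x-"));     // Hello
    }
}
```

One thing to keep in mind: the pattern does not know what a macro is, so ordinary text such as "a-b-c" also contains a match ("-b-") and would be shortened to "ac".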
There are many ways to do this, however given your example the following should work
var split = templateContent
.Split(' ')
.Where(x => !x.StartsWith("-") && !x.EndsWith("-"));
var result = string.Join(" ",split);
Console.WriteLine(result);
Output
Template content sending Successfully
Full Demo Here
Note : I personally think regex is better suited to this
You can use regex for this
string regExp = "(-[a-zA-Z]*-)";
string tmp = Regex.Replace(templateContent , regExp, "");
string finalStr = Regex.Replace(tmp, " {2,}", " ");
var resultWithSpaces = Regex.Replace(templateContent, @"-\S+-", string.Empty);
This regular expression looks for two hyphens surrounding one or more characters that are not white space.
It will leave the spaces that were around the removed word. To get rid of those you can do another Regex to replace multiple spaces with a single space.
var result = Regex.Replace(resultWithSpaces, @"\s+", " ");

REGEX Adding a string before comma c#

How can I append a known string before each comma in a comma-separated string?
Is there a regex for that, or something that doesn't use a loop?
EX
given string :
email, email2, email3 (etc...)
to
string suffix = "@iou.com";
string desiredResult = "email@iou.com, email2@iou.com, email3@iou.com";
Thank you!!
You can use the [^,\s]+ regex, and replace with "$0"+suffix:
var res = Regex.Replace(original, @"[^,\s]+", "$0"+suffix);
"$0" refers to the entire text matched by the regular expression.
Demo.
Or using LINQ:
Console.WriteLine(string.Join(",",input.Split(',').Select(s => string.Concat(s, suffix))));
You could match the zero-length position after each word. Here's how that might look:
(?<=\w)(?=,|$)
The lookbehind (?<=\w) requires a word character immediately before the position, and the lookahead (?=,|$) requires a comma or the end of the string immediately after it, so the match itself is empty. (An empty named capture group would not do the job here, because Regex.Replace replaces the whole match, not a single group.)
Then you'd just replace that empty match with the appended value, like this:
Regex.Replace(original, @"(?<=\w)(?=,|$)", "@email.com");
Here's an example of that regex in action.
Here you are:
string input = "email, email2, email3";
string suffix = "@iou.com";
//string desiredResult = "email@iou.com, email2@iou.com, email3@iou.com";
Console.WriteLine(Regex.Replace((input + ",")
.Replace(",", suffix + ","), @",$", ""));
Hope this helps.
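One detail the answers above leave open is what happens when the suffix itself contains a $ sign, which has a special meaning in a replacement string. A possible way around that, sketched here with a hypothetical helper name, is to pass a MatchEvaluator lambda, whose return value is inserted literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixAppender
{
    // Appends the suffix to every run of characters that is neither a comma
    // nor whitespace. The lambda result is not scanned for $-substitutions.
    public static string AppendSuffix(string list, string suffix) =>
        Regex.Replace(list, @"[^,\s]+", m => m.Value + suffix);

    public static void Main()
    {
        string input = "email, email2, email3";
        Console.WriteLine(AppendSuffix(input, "@iou.com"));
        // email@iou.com, email2@iou.com, email3@iou.com
    }
}
```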

Regex to remove specific string if exist

I wanna remove the -L from the end of my string if exists
So
ABCD => ABCD
ABCD-L => ABCD
At the moment I'm using something like the line below, which uses the if/else type of arrangement in my Regex; however, I have a feeling that it should be much easier than this.
var match = Regex.Match("...", @"(?(\S+-L$)\S+(?=-L)|\S+)");
How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex(@"-L(\b|$)");
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous. See here for a list of characters that are considered to be word characters in C#.
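To make the difference between the two patterns concrete, here is a small sketch (class and method names are assumptions). It also shows why the pattern should be a verbatim string: in a regular C# string literal, "\b" is a single backspace character, not the two-character regex escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixBoundaryDemo
{
    // Removes "-L" only at the very end of the string.
    public static string RemoveAtEnd(string text) =>
        Regex.Replace(text, @"-L$", "");

    // Removes "-L" wherever it is followed by a word boundary (or the end).
    public static string RemoveAtWordEnd(string text) =>
        Regex.Replace(text, @"-L(\b|$)", "");

    public static void Main()
    {
        Console.WriteLine(RemoveAtEnd("ABCD-L"));                 // ABCD
        Console.WriteLine(RemoveAtEnd("ABCD-L ABCD"));            // ABCD-L ABCD
        Console.WriteLine(RemoveAtWordEnd("ABCD-L ABCD ABCD-L")); // ABCD ABCD ABCD
        Console.WriteLine(RemoveAtWordEnd("ABCD-LONG"));          // ABCD-LONG
        Console.WriteLine("\b".Length);  // 1 (backspace character)
        Console.WriteLine(@"\b".Length); // 2 (backslash followed by 'b')
    }
}
```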
You also can use String.Replace() method to find a specific string inside a string and replace it with another string in this case with an empty string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
Use Regex.Replace function,
Regex.Replace(input, @"(\S+?)-L(?=\s|$)", "$1")
DEMO
Explanation:
(          group and capture to \1:
  \S+?     non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)          end of \1
-L         '-L'
(?=        look ahead to see if there is:
  \s       whitespace (\n, \r, \t, \f, and " ")
 |         OR
  $        before an optional \n, and the end of the string
)          end of look-ahead
You certainly can use Regex for this, but why, when normal string functions are clearer?
Compare this:
text = text.EndsWith("-L")
? text.Substring(0, text.Length - "-L".Length)
: text;
to this:
text = Regex.Replace(text, @"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
public static string RemoveIfEndsWith(this string text, string suffix)
{
return text.EndsWith(suffix)
? text.Substring(0, text.Length - suffix.Length)
: text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.
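A slightly more defensive variant of the extension method might look like the sketch below. The optional StringComparison parameter and the empty-suffix guard are additions that are not in the answer above; the default of StringComparison.Ordinal is an assumption, chosen because EndsWith(string) without a comparison argument is culture-sensitive.

```csharp
using System;

public static class StringSuffixExtensions
{
    // Returns text without the suffix if it ends with it; otherwise returns text unchanged.
    public static string RemoveIfEndsWith(
        this string text,
        string suffix,
        StringComparison comparison = StringComparison.Ordinal)
    {
        if (string.IsNullOrEmpty(suffix) || !text.EndsWith(suffix, comparison))
        {
            return text;
        }
        return text.Substring(0, text.Length - suffix.Length);
    }
}

public static class SuffixExtensionDemo
{
    public static void Main()
    {
        Console.WriteLine("ABCD-L".RemoveIfEndsWith("-L")); // ABCD
        Console.WriteLine("ABCD".RemoveIfEndsWith("-L"));   // ABCD
        Console.WriteLine("abcd-l".RemoveIfEndsWith("-L", StringComparison.OrdinalIgnoreCase)); // abcd
    }
}
```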

Remove Punctuation and Spaces from string using Regex

I am trying to take out all the punctuation and spaces in a string that I am going to encrypt using a Playfair Cipher. I can't figure out why this line doesn't work.
s = Regex.Replace(s, @"[^\w\s]", string.Empty);
The [^\w\s] means remove anything that's not a word or whitespace character.
Try this instead:
s = Regex.Replace(s, @"[^\w]", string.Empty);
You could also use:
s = Regex.Replace(s, @"\W", string.Empty);
Of course that will leave underscores as those are considered word characters. To remove those as well, try this:
s = Regex.Replace(s, @"[\W_]", string.Empty);
Or this:
s = Regex.Replace(s, @"\W|_", string.Empty);
How about using Linq instead of Regex?
string str = "abc; .d";
var newstr = String.Join("", str.Where(char.IsLetterOrDigit));
The ^ character means not. I use [^A-Za-z0-9-] for my replacements of everything not alpha-numeric with a hyphen.
Your best bet is probably to use [^A-Za-z] since \w contains _ and 0-9 which I'm guessing you wouldn't want to keep.
The following regex would remove anything not a-z or A-Z.
s = Regex.Replace(s, @"[^A-Za-z]", string.Empty);
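The variants discussed in this thread can be compared side by side with a sketch like the following. The class name, method names, and sample strings are hypothetical; upper-casing the result is an assumption about what a Playfair implementation would want, not something the question states.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlayfairPrep
{
    // The pattern from the question: spaces and underscores survive.
    public static string KeepWordAndSpace(string s) =>
        Regex.Replace(s, @"[^\w\s]", string.Empty);

    // Letters and digits only (underscore removed as well).
    public static string KeepAlphanumeric(string s) =>
        Regex.Replace(s, @"[\W_]", string.Empty);

    // LINQ alternative to the regex above.
    public static string LettersOrDigitsLinq(string s) =>
        string.Concat(s.Where(char.IsLetterOrDigit));

    // ASCII letters only, upper-cased.
    public static string LettersOnlyUpper(string s) =>
        Regex.Replace(s, @"[^A-Za-z]", string.Empty).ToUpperInvariant();

    public static void Main()
    {
        Console.WriteLine(KeepWordAndSpace("a b, c_d!"));    // a b c_d
        Console.WriteLine(KeepAlphanumeric("a b, c_d!"));    // abcd
        Console.WriteLine(LettersOrDigitsLinq("a b, c_d!")); // abcd
        Console.WriteLine(LettersOnlyUpper("Hide the gold_in the tree-stump, 42!"));
        // HIDETHEGOLDINTHETREESTUMP
    }
}
```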

Trim string by strings

How can I trim a string by a whole string instead of a list of single characters?
I want to remove all &nbsp; and whitespace at the beginning and end of an HTML string. But String.Trim() only has overloads that take a set of characters, not a set of strings.
You could use HttpUtility.HtmlDecode(String) and use the resultant as an input for String.Trim()
HttpUtility.HtmlDecode on MSDN
HttpServerUtility.HtmlDecode on MSDN (a wrapper you can access through the Page.Server property)
string stringWithNonBreakingSpaces = "&nbsp;hello&nbsp;"; // sample input
string trimmedString = HttpUtility.HtmlDecode(stringWithNonBreakingSpaces).Trim();
Note: This solution would decode all the HTML strings in the input.
The Trim method removes from the current string all leading and trailing white-space characters by default.
Edit: Solution for your problem AFTER your edit:
string input = @"&nbsp; <a href='#'>link</a> &nbsp;";
Regex regex = new Regex(@"^(&nbsp;|\s)*|(&nbsp;|\s)*$");
string result = regex.Replace(input, String.Empty);
This will remove all trailing and leading spaces and &nbsp;. You can add any string or character group to the expression. If you were to trim all tabs too the regex would simply become (shown for illustration only, since \s already matches tabs):
Regex regex = new Regex(@"^(&nbsp;|\s|\t)*|(&nbsp;|\s|\t)*$");
Not sure if this is what you're looking for?
string str = "hello&nbsp;";
str = str.Replace("&nbsp;", "");
str = str.Trim();
Use RegEx, as David Heffernan said. It is rather easy to select all spaces at the start of string: ^(\ |&nbsp;)*
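Putting the regex answers together, a general "trim by strings" helper could be sketched as follows. TrimStrings is a hypothetical name, not a framework method; it assumes every token is non-empty, and it always trims ordinary whitespace in addition to the given tokens.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringTrimmer
{
    // Removes any leading and trailing run of the given tokens and whitespace.
    // Tokens are escaped so that regex metacharacters in them are taken literally.
    public static string TrimStrings(string input, params string[] tokens)
    {
        string alternation = string.Join(
            "|",
            tokens.Select(t => Regex.Escape(t)).Concat(new[] { @"\s" }));
        string pattern = $@"^(?:{alternation})+|(?:{alternation})+$";
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        string input = "&nbsp; <a href='#'>link</a> &nbsp;";
        Console.WriteLine(TrimStrings(input, "&nbsp;")); // <a href='#'>link</a>
        Console.WriteLine(TrimStrings("a&nbsp;b", "&nbsp;")); // a&nbsp;b
    }
}
```

Unlike the Replace-based suggestion above, this leaves any &nbsp; in the middle of the string untouched.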
