Replace all non-supported chars with a space - c#

I need to accomplish the following. I have a list of allowed chars (this is for QB: Issues with special characters in QBO API v3 .NET SDK)
var goodChars = "ABCD...abcd...~_-...";
string Sanitize(string input)
{
// TODO: Need to take input and replace all chars not included in "goodChars" with a space
}
I know how to find bad chars with Regex, but this is the reverse: I don't need to look at the matches. I need to look at what does not match and replace only that.

string Sanitize(string input)
{
return new string(input.Select(x => goodChars.Contains(x)?x:' ').ToArray());
}
And as vc 74 suggests, it's better to keep goodChars in a HashSet<char> instead of a string for faster lookups.
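The HashSet<char> variant could look like the following minimal sketch. The class name Sanitizer and the contents of the allowed set are assumptions for illustration only; the real list depends on what the API accepts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sanitizer
{
    // Hypothetical allowed set; substitute the real list of supported characters.
    private static readonly HashSet<char> GoodChars = new HashSet<char>(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789~_-");

    public static string Sanitize(string input)
    {
        // Keep allowed characters, replace everything else with a space.
        return new string(input.Select(c => GoodChars.Contains(c) ? c : ' ').ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("Tom & Jerry's #1"));
    }
}
```

A set lookup is constant time on average, whereas string.Contains scans the allowed list for every input character, so the difference only matters for long inputs or long allowed lists.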

You can use a Regex with a negative pattern
const string pattern = "[^A-Za-z~_-]";
var regex = new Regex(pattern);
string sanitized = regex.Replace(input, " ");
Note that if this code is used frequently, you can store the regex in a static member to avoid recreating (and recompiling) for each invocation.
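A minimal sketch of the static-member idea, assuming the same pattern as above. The class name RegexSanitizer is hypothetical, and RegexOptions.Compiled is an optional extra that trades startup cost for faster matching.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSanitizer
{
    // Built once when the class is first used, then reused by every call.
    private static readonly Regex BadChars =
        new Regex("[^A-Za-z~_-]", RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        // Every character outside the allowed class becomes a space.
        return BadChars.Replace(input, " ");
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("abc#123~x"));
    }
}
```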

Related

C# Regex Replace Greek Letters

I'm getting a string like "thetaetaA" (theta eta A)
I need to rewrite the received string as {\theta}{\eta}A
// C# code with regex to match greek letters
string gl = "alpha|beta|delta|theta|eta";
string received = "thetaetaA";
var greekLetters = Regex.Matches(received, gl);
Could someone please tell me how I can create the required text
{\theta}{\eta}A
If I use a loop and do a replace, it generates the following output
{\th{\eta}}{\eta}A
because theta includes eta.
Regex.Matches() doesn't replace anything; use Regex.Replace(). Capture the words and reference the capture in the replacement, adding the special characters around it. Listing superstrings before their substrings in the alternation is a sensible precaution, although it works either way here: the engine scans left to right, so it finds theta at index 0 before it can reach the eta inside it.
class Program
{
static void Main(string[] args)
{
string gl = "alpha|beta|delta|theta|eta";
string received = "thetaetaA";
string texified = Regex.Replace(received, $"({gl})", @"{\$1}");
Console.WriteLine(texified);
Console.ReadKey();
}
}
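If the list of names grows, the alternation can be built defensively rather than typed by hand. The sketch below is one possible approach, not part of the answer above: it sorts the names longest first and escapes them. The class name GreekTexifier and the Letters array are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreekTexifier
{
    // Hypothetical list; extend with the remaining letter names as needed.
    private static readonly string[] Letters = { "alpha", "beta", "delta", "theta", "eta" };

    // Longest names first, so a shorter name can never win over a longer one
    // that starts at the same position. Regex.Escape guards against metacharacters.
    private static readonly Regex LetterRegex = new Regex(
        "(" + string.Join("|", Letters
            .OrderByDescending(l => l.Length)
            .Select(l => Regex.Escape(l))) + ")");

    public static string Texify(string input)
    {
        // $1 is the captured letter name; the braces and backslash are literal.
        return LetterRegex.Replace(input, @"{\$1}");
    }

    public static void Main()
    {
        Console.WriteLine(Texify("thetaetaA"));
    }
}
```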

C# Regex Formatting String without Input string

I have the following Regex that is used to match incoming packets:
public static class ProtobufConstants
{
public static Regex ContentTypeNameRegex => new Regex("application/protobuf; proto=(.*?)");
}
I also need to write outgoing packet strings in the same format, i.e. create strings similar to "application/protobuf; proto=mynamespace.class1", ideally by using the same regex definition new Regex("application/protobuf; proto=(.*?)");.
To keep this code in one place, is it possible to use this regex as a template and replace the (.*?) parameter with a string (in the example above I would like to substitute "mynamespace.class1")?
I see there is a Regex.Replace(string input, string replacement), but given that the above ContentTypeNameRegex already has the format defined, I don't have an input per se. I just want to format, so I'm not sure what to put here, if anything.
Is it possible to use it in this manner, or do I need to revert to string.Format?
If you just want to replace the matched group with something else, you can change your pattern to:
(application/protobuf; proto=)(.*)
The second group is greedy here because a lazy (.*?) at the very end of a pattern always matches the empty string. That way, you can replace it by doing something like:
Regex re = ContentTypeNameRegex;
string replacement = "mynamespace.class1";
string result = re.Replace(input, "${1}" + replacement);
Use Regex.Replace but use the match evaluator to handle your formatting needs. Here is an example which simply replaces a slash with a dash and vice versa, based on what has been matched.
var text = "001-34/323";
var result = Regex.Replace(text, "[-/]", me => me.Value == "-" ? "/" : "-");
Result
001/34-323
You can do the same with your input, to decide to change it or send it on as is.
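For the original goal of keeping the content-type format in one place, an alternative that does not try to run the regex "in reverse" is to share the literal prefix between the parser and the formatter. This is a sketch under assumptions, not the approach from either answer: the names ProtobufContentType, Prefix, Format and TryParse are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProtobufContentType
{
    // Single source of truth for the fixed part of the header value.
    private const string Prefix = "application/protobuf; proto=";

    // The regex is derived from the same constant, so the two cannot drift apart.
    private static readonly Regex ContentTypeNameRegex =
        new Regex("^" + Regex.Escape(Prefix) + "(.+)$");

    // Outgoing direction: plain concatenation, no regex needed.
    public static string Format(string typeName)
    {
        return Prefix + typeName;
    }

    // Incoming direction: extract the type name if the value has the expected shape.
    public static bool TryParse(string contentType, out string typeName)
    {
        Match m = ContentTypeNameRegex.Match(contentType);
        typeName = m.Success ? m.Groups[1].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string header = Format("mynamespace.class1");
        Console.WriteLine(header);
        if (TryParse(header, out string name))
            Console.WriteLine(name);
    }
}
```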

Match the last bracket

I have a string which contains some text followed by some brackets with different content (possibly empty). I need to extract the last bracket with its content:
atext[d][][ef] // should return "[ef]"
other[aa][][a] // should return "[a]"
xxxxx[][xx][x][][xx] // should return "[xx]"
yyyyy[] // should return "[]"
I have looked into RegexOptions.RightToLeft and read up on lazy vs greedy matching, but I can't for the life of me get this one right.
This regex will work
.*(\[.*\])
A more efficient version that uses a negated character class instead of the inner .*
.*(\[[^\]]*\])
C# Code
string input = "atext[d][][ef]\nother[aa][][a]\nxxxxx[][xx][x][][xx]\nyyyyy[]";
string pattern = "(?m).*(\\[.*\\])";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
while (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
match = match.NextMatch();
}
It may give unexpected results for nested [] or unbalanced []
Alternatively, you could reverse the string using a function similar to this:
public static string Reverse( string s )
{
char[] charArray = s.ToCharArray();
Array.Reverse( charArray );
return new string( charArray );
}
And then you could perform a simple Regex search to just look for the first [someText] group or just use a for loop to iterate through and then stop when the first ] is reached.
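The same "stop at the first ] from the end" idea can also be sketched without reversing the string at all, by searching backwards with LastIndexOf. The class and method names below are hypothetical, and the sketch assumes flat (non-nested) brackets.

```csharp
using System;

public static class BracketFinder
{
    // Returns the last "[...]" group in s, or null if there is none.
    public static string LastBracket(string s)
    {
        int close = s.LastIndexOf(']');
        if (close < 0) return null;

        // Search backwards from the closing bracket for its opening partner.
        int open = s.LastIndexOf('[', close);
        if (open < 0) return null;

        return s.Substring(open, close - open + 1);
    }

    public static void Main()
    {
        Console.WriteLine(LastBracket("atext[d][][ef]"));
        Console.WriteLine(LastBracket("yyyyy[]"));
    }
}
```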
With negative lookahead:
\[[^\]]*\](?!\[)
This is relatively efficient and flexible, without the evil .*. It will also work with longer text that contains multiple instances.
The correct way for .NET is indeed to use the RegexOptions.RightToLeft option with the method Regex.Match(String, String, RegexOptions).
This keeps the pattern simple and efficient: it causes very little backtracking and, since the pattern ends with a literal character (the closing bracket), the engine can quickly search for the positions in the string where the pattern may succeed before its "normal" walk.
public static void Main()
{
string input = @"other[aa][][a]";
string pattern = @"\[[^][]*]";
Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);
if (m.Success)
Console.WriteLine("Found '{0}' at position {1}.", m.Value, m.Index);
}

Regex performance issue on a really big string

I am new to using regexes, so I would really appreciate your help.
I have a really large string (I am parsing an as3 file to json) and I need to locate the trailing commas in the objects.
This is the regex I am using
public static string TrimTraillingCommas(string jsonCode)
{
var regex = new Regex(@"(.*?),\s*(\}|\])", (RegexOptions.Multiline));
return regex.Replace(jsonCode, m => String.Format("{0} {1}", m.Groups[1].Value, m.Groups[2].Value));
}
The problem with it is that it's really slow. Without it, the program completes in 00:00:00.0289668; with it, in 00:00:00.4096293.
Could someone suggest an improved regex or algorithm for replacing those trailing commas faster?
Here is where I start from (the string with the trailing commas)
Here is the end string I need
You can simplify your regular expression by eliminating your capture groups, replacing the purpose of the latter one by a lookahead:
var regex = new Regex(@",\s*(?=\}|\])");
return regex.Replace(jsonCode, " ");
You don't need the first expression .*? and you can convert the alternation
into a character class. That's about the best you could do.
var regex = new Regex(@",[^\S\r\n]*([}\]])");
return regex.Replace(jsonCode, " $1");
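The two suggestions can be combined into a lookahead with a character class. Note that \s* also spans line breaks, whereas [^\S\r\n]* only matches whitespace on the same line, so which one fits depends on whether the closing bracket may sit on the next line. The sketch below assumes it may; the class name JsonCleanup is hypothetical, and the pattern does not distinguish commas inside string literals.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonCleanup
{
    // Comma, optional whitespace (including newlines), then a lookahead for } or ].
    // Built once and reused so the pattern is not re-parsed on every call.
    private static readonly Regex TrailingComma =
        new Regex(@",\s*(?=[}\]])", RegexOptions.Compiled);

    public static string TrimTrailingCommas(string jsonCode)
    {
        // The bracket itself is not consumed, so only the comma and the
        // whitespace after it are replaced by a single space.
        return TrailingComma.Replace(jsonCode, " ");
    }

    public static void Main()
    {
        Console.WriteLine(TrimTrailingCommas("{ \"a\": [1, 2, ],\n}"));
    }
}
```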

A string replace function with support of custom wildcards and escaping these wildcards in C#

I need to write a string replace function with custom wildcards support. I also should be able to escape these wildcards. I currently have a wildcard class with Usage, Value and Escape properties.
So let's say I have a global list called Wildcards. Wildcards has only one member added here:
Wildcards.Add(new Wildcard
{
Usage = @"\Break",
Value = Environment.NewLine,
Escape = @"\\Break"
});
So I need a CustomReplace method to do the trick. It should replace the specified parameter in a given string with another one, just like string.Replace. The only difference is that it must use my custom wildcards.
string test = CustomReplace("Hi there! What's up?", "! ", "!\\Break");
// Value of the test variable should be: "Hi there!\r\nWhat's up?"
// Because \Break is specified in a custom wildcard in Wildcards
// But if I use the value of the wildcard's Escape member,
// it should be replaced with the value of Usage member.
test = CustomReplace("Hi there! What's up?", "! ", "!\\\\Break");
// Value of the test variable should be: "Hi there!\\BreakWhat's up?"
My current method doesn't support escape strings.
It also can't be good when it comes to performance since I call string.Replace two times and each one searches the whole string, I guess.
// My current method. Has no support for escape strings.
string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
foreach (Wildcard wildcard in Wildcards)
{
// Doing this:
// done = done.Replace(wildcard.Escape, wildcard.Usage);
// ...would cause trouble when Escape contains Usage.
done = done.Replace(wildcard.Usage, wildcard.Value);
}
return done;
}
So, do I have to write a replace method which searches the string char by char, with the logic to find and separate both Usage and Escape values, then replaces Escape with Usage while replacing Usage with another given string?
Or do you know an already written one?
Can I use regular expressions in this scenario?
If I can, how? (I have no experience in this, so a pattern would be nice.)
If I do, would it be faster or slower than char by char searching?
Sorry for the long post. I tried to keep it clear, and sorry for any typos and such; English is not my primary language. Thanks in advance.
You can try this:
public string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
var builder = new StringBuilder();
foreach (var wildcard in Wildcards)
{
builder.AppendFormat("({0}|{1})|", Regex.Escape(wildcard.Usage),
Regex.Escape(wildcard.Escape));
}
builder.Length = builder.Length - 1; // Remove the last '|' character
return Regex.Replace(done, builder.ToString(), WildcardEvaluator);
}
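private string WildcardEvaluator(Match match)
{
// A match on Usage expands to the wildcard's Value.
var wildcard = Wildcards.Find(w => w.Usage == match.Value);
if (wildcard != null)
return wildcard.Value;
// A match on Escape collapses to the literal Usage text.
var escaped = Wildcards.Find(w => w.Escape == match.Value);
return escaped != null ? escaped.Usage : match.Value;
}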
I think this is the easiest and fastest solution as there is only one Replace method call for all wildcards.
So if you are happy to just use Regex to fulfil your needs then you should check out this link. It has some great info on using regexes in .NET. The website also has loads of examples of how to construct Regex patterns for many different needs.
A basic example of a Replace on a string with wildcards might look like this...
string input = "my first regex replace";
string result = System.Text.RegularExpressions.Regex.Replace(input, "rep...e", "result");
//result is now "my first regex result"
Notice how the second argument in the Replace function takes a regex pattern string. In this case, the dots act as wildcard characters: each one means "match any single character".
Hopefully this will help you get what you need.
If you define a pattern for both your wildcard and your escape method, you can create a Regex which will find all the wildcards in your text. You can then use a MatchEvaluator to replace them.
class Program
{
static Dictionary<string, string> replacements = new Dictionary<string, string>();
static void Main(string[] args)
{
replacements.Add("\\Break", Environment.NewLine);
string template = @"This is an \\Break escaped newline and this should \Break contain a newline.";
// (?<=(^|[^\\])(\\\\){0,}) will handle double escaped items
string outcome = Regex.Replace(template, @"(?<=(^|[^\\])(\\\\){0,})\\\w+\b", ReplaceMethod);
}
public static string ReplaceMethod(Match m)
{
string replacement = null;
if (replacements.TryGetValue(m.Value, out replacement))
{
return replacement;
}
else
{
//return string.Empty?
//throw new FormatException()?
return m.Value;
}
}
}
