Add prefix to special characters with Regular Expressions - c#

I have a list of special characters that includes ^ $ ( ) % . [ ] * + - ?. I want put % in front of this special characters in a string value.
I need this to generate a Lua script to use in Redis.
For example Test$String? must be change to Test%$String%?.
Is there any way to do this with regular expressions in C#?

In C#, you just need a Regex.Replace:
var LuaEscapedString = Regex.Replace(input, #"[][$^()%.*+?-]", "%$&");
See the regex demo
The [][$^()%.*+?-] character class will match a single character, either a ], [, $, ^, (, ), %, ., *, +, ? or - and will reinsert it back with the $& backreference in the replacement pattern pre-pending with a % character.
A lookahead is just a redundant overhead here (or a show-off trick for your boss).

You can use lookaheads and replace with %
/(?=[]*+$?)[(.-])/
Regex Demo
(?=[]*+$?)[(.-]) Postive lookahead, checks if the character following any one from the altenation []. If yes, substitutes with %

You can use this regex: ([\\^$()%.\\[\\]*+\\-?])
It will match and capture characters inside the character class. Then you can use $1 to reference the captured character and insert % before it, like so: %$1.
Here is an example code and demo:
string input = "Test$String?";
string pattern = "([\\^$()%.\\[\\]*+\\-?])";
string replacement = "%$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);

You can use (?=[\^\$()\%.[]*+-\?]) regex replaced String as "%"

Related

Regex only letters except set of numbers

I'm using Replace(#"[^a-zA-Z]+", "");
leave only letters, but I have a set of numbers or characters that I want to keep as well, ex: 122456 and 112466. But I'm having trouble leaving it only if it's this sequence:
ex input:
abc 1239 asm122456000
I want to:
abscasm122456
tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't applying Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which captures the group (non-capturing group) with the alphabetic character(s) or a set of digits with 6 occurrences.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle
Alternate way,
using .Split() and .All(),
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, #"(122456|112466)|[^a-zA-Z]", "$1");

Regex.IsMatch is not working when text including "$"

Regex.IsMatch method returns the wrong result while checking the following condition,
string text = "$0.00";
Regex compareValue = new Regex(text);
bool result = compareValue.IsMatch(text);
The above code returns as "False". Please let me know if i missed anything.
The Regex class has a special method for escaping characters in a pattern: Regex.Escape()
Change your code like this:
string text = "$0.00";
Regex compareValue = new Regex(Regex.Escape(text)); // Escape characters in text
bool result = compareValue.IsMatch(text);
"$" is a special character in C# regex. Escape it first.
Regex compareValue = new Regex(#"\$0\.00");
bool result = compareValue.IsMatch("$0.00");
Regex expressions: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Both '.' and '$' are special characters and thus you need to escape them if you want to match the character itself. '.' matches any character and '$' matches the end of a string
see: https://regex101.com/r/pK2uY6/1
You have to escape $ since it is a special (reserved) character which means "end of string". In case . means just dot (say, decimal separator) you have to escape it as well (when not escaped, . means "any symbol"):
string pattern = #"\$0\.00";
bool result = RegEx.IsMatch(text, pattern);
As for your original pattern, it has no chance to match any string, since $0.00 means
$ end of string, followed by
0 zero
. any character
0 zero
0 zero
but end of string can't be followed by...

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I wanted to know if there is substrings which starts with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = #"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but failed at input= $(a$() , which inside the expression is empty. I wanted NOT to match when input is $().[ there is nothing between start and end identifiers].
What is wrong with my regex?
Note: [^$] matches a single character but not of $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \(([^$][^\s]*)\) is wrong. It won't allow $ as a first character inside () but it allows it as second or third ,, etc. See the demo here. You need to combine the negated classes in your regex inorder to match any character not of a space or $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = #"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the options.
string pattern = #"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = #">\[[A-z, /, \s]*\]";
string pattern2 = #"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
string replacedValue= rgx2.Replace(match.Value, "");
Console.WriteLine(match.Value);
Console.WriteLine(replacedValue);
What I am getting in first my Console.WriteLine is correct. So I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just the characters but the spaces so I would get something like this 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something, perhaps it is implicitly in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = #">\[[A-Za-z/\s]*\]";
string pattern2 = #"[>,\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand well) I realize that your two steps approach is useless. You only need one replacement using a capture group:
string pattern = #">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character group consisting of single characters like >, ,, , [ and every other character you listed in the square brackets. But I guess you don't want to match space and ,. So if you don't want to match them leave them out like
string pattern2 = #"[>\[\]]";
Alternatively, you could use
string pattern2 = #"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.

Regular expression query

I am looking for regular expression that evaluates below.
0-9 a-Z A-Z - / '
The C# version of this pattern is:
#"[0-9a-zA-Z/'-]"
Used in code:
var regex = new Regex(#"[0-9a-zA-Z/'-]");
or
var regex = new Regex(#"[0-9a-z/'-]", RegexOptions.IgnoreCase);
Note that the - is at the very end of the character class (the part in the brackets). For - to mean a literal hyphen inside a character class, it must be at the beginning or end of the class (i.e. [-blah] or [blah-]), or escaped with a backslash: [ab\-c] will match a, b, c, or -.
Note also the # at the beginning of the quoted string. This isn't important for this pattern, but it's a good habit to get into with C# regex. Regular expressions often contain backslashes, and the #"..." form will allow you to use backslashes in your pattern without having to escape them.
Use bellow code to validate(Regex patterns) Alphabetic and numbers:
String name="123ABCabc";
if(System.Text.RegularExpressions.Regex.Match(name, #"[0-9a-zA-Z_]") == true)
{
return true;
}
else
{
return false;
}
In the case you want to match digits, lower- and higher-case latin characters, "-", "/" and "'" then I would suggest the following:
[0-9a-zA-Z-\/\']

Categories