Regex removing empty spaces when using replace - c#

My situation is not about removing empty spaces, but keeping them. I have strings of the form >[database values] that I would like to find. I created this regex to find them and then remove the >, [ and ] characters. The code below takes a string that comes from a document. The first pattern looks for anything surrounded by >[some stuff]; the second then "removes" the >, [ and ].
using System;
using System.Text.RegularExpressions;
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I get from my first Console.WriteLine is correct, e.g. >[123 sesame St]. But my second output shows that the replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my regex. Am I forgetting something? Perhaps it is implicit in a replace?

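The [A-z, /, \s] and [>, \[, \]] in your patterns also match commas and spaces: every character inside the square brackets is a member of the class, including the ", " you used as separators. Just list the characters without delimiting them, like this: [A-Za-z/\s]. Note also that A-z is not the same as A-Za-z: in ASCII, the range from A to z also covers [, \, ], ^, _ and `.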
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>,\[\]]";
Edit to include Casimir's tip.
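A minimal C# sketch of the difference, assuming a sample value >[Sesame St] (the class and method names are illustrative, not from the question). The first method uses the original class, whose commas and spaces are members; the second lists only >, [ and ].

```csharp
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Original class: ',' and ' ' are members too, so spaces are stripped.
    public static string StripWithCommaClass(string s) =>
        Regex.Replace(s, @"[>, \[, \]]", "");

    // Corrected class: only '>', '[' and ']' are members.
    public static string StripBracketsOnly(string s) =>
        Regex.Replace(s, @"[>\[\]]", "");
}
```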

After rereading your question (if I understand it correctly), I realize that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
The replacement string refers to capture group 1 with $1.
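Put together, the one-step replacement might look like this sketch; the sample text and the Unwrap name are assumptions for illustration. Spaces inside the brackets survive because the pattern never matches them on their own.

```csharp
using System.Text.RegularExpressions;

public static class OneStepDemo
{
    // >\[ opens, ([^]]*) captures everything up to the next ], and ] closes.
    private static readonly Regex Bracketed = new Regex(@">\[([^]]*)]");

    // Replaces each >[...] with its captured contents; spaces are untouched.
    public static string Unwrap(string text) => Bracketed.Replace(text, "$1");
}
```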

By writing [>, \[, \]] in pattern2, you define a character class that matches any single one of the characters listed in the square brackets: >, the comma, the space, [ and ]. But I guess you don't want to match spaces and commas, so leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
This matches either the sequence >[ or a single ], which expresses your intention more precisely.
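To see how the alternation differs from a character class, here is a small sketch (names are illustrative): a lone > or [ is left alone, and only the >[ pair and ] are removed.

```csharp
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Matches either the two-character sequence ">[" or a single "]".
    private static readonly Regex Delimiters = new Regex(@"(>\[|\])");

    public static string Strip(string s) => Delimiters.Replace(s, "");
}
```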

Related

C# Regex Match between with or without new lines

I am trying to match text between two delimiters, [% %], and I want to get everything whether the string contains new lines or not.
Code
string strEmailContent = sr.ReadToEnd();
string commentPatt = @"\[%((\r\n?|\n).*(\r\n?|\n))%\]";
Regex commentRgx = new Regex(commentPatt, RegexOptions.Singleline);
Sample Inputs
//Successful
[%
New Comment
%] other content from input
//Match: [%\r\nNew Comment\r\n%]
//Fail
[% New Comment %]
//Match: false
//Successfully match single line with
string commentPatt = @"\[%(.*)%\]";
//Match: [% New Comment %]
I do not know how to combine these two patterns to match both cases. Can anyone provide any assistance?
To get the text between two delimiters you need lazy matching with .*?, and to also match newlines you need the (?s) (single-line) modifier, so that the dot also matches newline characters. (?s) is the inline equivalent of RegexOptions.Singleline:
(?s)\[%(.*?)%]
Note that (?s)\[%(.*?)%] will still match when a lone % appears inside [%...%].
Note also that the ] does not have to be escaped, since it is in an unambiguous position and can only be interpreted as a literal ].
In C#, you can use
using System.Linq;
using System.Text.RegularExpressions;
var rx = new Regex(@"(?s)\[%(.*?)%]");
var res = rx.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
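A self-contained sketch of this approach, run against both sample inputs from the question (the class and method names are hypothetical). Because .*? is lazy, two comments on one line come back as two separate matches rather than one long one.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentExtractor
{
    // (?s): '.' also matches \r and \n. .*?: stop at the first following "%]".
    private static readonly Regex Comment = new Regex(@"(?s)\[%(.*?)%]");

    // Returns the text inside each [% ... %] pair, in order of appearance.
    public static List<string> Extract(string input) =>
        Comment.Matches(input)
               .Cast<Match>()
               .Select(m => m.Groups[1].Value)
               .ToList();
}
```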
Try this pattern:
\[%([^%]*)%\]
It captures all characters between "[%" and "%]" that are not a "%" character.
Tested @ Regex101.
If you want to "see" the "\r\n" in your results, you'll have to escape them with a String.Replace().
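A sketch of the negated-class variant, with a hypothetical helper that uses String.Replace to make the line breaks visible. A negated class such as [^%] matches \r and \n without any option, so no Singleline flag is needed; the trade-off is that a comment containing a % does not match at all.

```csharp
using System.Text.RegularExpressions;

public static class NoPercentExtractor
{
    // [^%]* spans newlines, since a negated class matches \r and \n.
    private static readonly Regex Comment = new Regex(@"\[%([^%]*)%\]");

    // Returns the first comment body, or null if there is no match.
    public static string FirstComment(string input)
    {
        Match m = Comment.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Escapes CR/LF so they show up as \r and \n when printed.
    public static string ShowLineBreaks(string s) =>
        s.Replace("\r", "\\r").Replace("\n", "\\n");
}
```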

Regex to find special pattern

I have a string to parse. First I have to check whether the string contains a special pattern.
I want to know whether there are substrings which start with "$(",
and end with ")",
and, between those start and end delimiters, contain no whitespace
and no "$" character.
I have a small regex for it in C#:
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but fails on the input $(a$(), where the inner expression $() is empty. I do NOT want a match when the input is $() (i.e., there is nothing between the start and end delimiters).
What is wrong with my regex?
Note: [^$] matches a single character that is not $.
Use the regex below if you want to match $():
\$\(([^\s$]*)\)
Use the regex below if you don't want to match $():
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ repeats the preceding token one or more times.
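A quick sketch comparing the two patterns on the problem input $(a$() from the question (class and method names are illustrative). With * the inner $() produces an empty match; with + it is rejected.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DollarParenDemo
{
    // [^\s$]* allows zero characters, so "$()" matches with an empty group.
    private static readonly Regex AllowEmpty = new Regex(@"\$\(([^\s$]*)\)");
    // [^\s$]+ requires at least one character, so "$()" is rejected.
    private static readonly Regex NonEmpty = new Regex(@"\$\(([^\s$]+)\)");

    public static List<string> AllowEmptyValues(string input) => Values(AllowEmpty, input);
    public static List<string> NonEmptyValues(string input) => Values(NonEmpty, input);

    // Collects capture group 1 of every match.
    private static List<string> Values(Regex rx, string input) =>
        rx.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}
```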
Your regex \(([^$][^\s]*)\) is wrong: it doesn't allow $ as the first character inside (), but it does allow it as the second, third, etc. You need to combine the two negated classes into one in order to match any character that is neither whitespace nor $.
Your current regex does not match $() because [^$] must match one character. The only way I can see it matching $() is when the input contains more than one pair of parentheses, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = @"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
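A sketch of this answer's pattern next to the merged class without ), using the example input above plus $()(something) (names are illustrative). Without ) in the class, the + can swallow the ) of $() and run on to the next closing paren.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenExclusionDemo
{
    // Closing paren excluded: an inner group can never run past the first ')'.
    private static readonly Regex ExcludeParen = new Regex(@"\$\(([^$\s)]+)\)");
    // Closing paren allowed: "$()(something)" is matched as one unit.
    private static readonly Regex AllowParen = new Regex(@"\$\(([^$\s]+)\)");

    public static List<string> WithParenExcluded(string input) => Groups(ExcludeParen, input);
    public static List<string> WithParenAllowed(string input) => Groups(AllowParen, input);

    // Collects capture group 1 of every match.
    private static List<string> Groups(Regex rx, string input) =>
        rx.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}
```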
Simply replace the * with a + and merge the two negated classes:
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

C# regexp negative lookahead

I have a problem replacing characters that come after a specific character. For example, I want to replace the first 'aa' with '33' using this code:
string str = "dc1aaaafg";
string pattern = @"a{2}(?!(1))";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, "33");
But the result is 'dc13333fg': it also replaced the second 'aa' group after the '1'. I need to replace only the first group, giving 'dc133aafg'. How can I achieve this? I have a large string that may need many replacements; this is just an example.
Regex.Replace() is global. It will replace as many times as the pattern matches*.
You could use Regex.Replace(String, String, Int32) to limit the number of operations.
string result = rgx.Replace(str, "33", 1);
Or you can change the pattern to use a lookbehind:
Regex rgx = new Regex(@"(?<=1)a{2}");
string result = rgx.Replace(str, "33");
* Note that Replace() is global, but not incremental: using the expression a{2} on "aaaaaa" with the replacement "ba" results in "bababa", not "bbbbba".
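A compact sketch checking both suggestions and the footnote (the class and method names are illustrative). ReplaceFirst uses the question's own pattern with the count overload; ReplaceAfterOne uses the lookbehind; ReplaceAll shows that inserted replacements are not re-scanned.

```csharp
using System.Text.RegularExpressions;

public static class FirstOnlyDemo
{
    private static readonly Regex QuestionPattern = new Regex(@"a{2}(?!(1))");
    private static readonly Regex TwoAs = new Regex(@"a{2}");

    // Instance overload Replace(input, replacement, count): stop after 1 replacement.
    public static string ReplaceFirst(string s) => QuestionPattern.Replace(s, "33", 1);

    // Lookbehind: only an "aa" immediately preceded by '1' is replaced.
    public static string ReplaceAfterOne(string s) => Regex.Replace(s, @"(?<=1)a{2}", "33");

    // All matches are found in the original input, so "ba" is never re-matched.
    public static string ReplaceAll(string s, string replacement) => TwoAs.Replace(s, replacement);
}
```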
There is an overload of the Replace method in which you can specify the maximum number of replacements. Specify 1 and it will replace only the first match.
string result = rgx.Replace(str, "33", 1);
A regex pattern cannot express that only the first match is relevant.
Use Regex.Match to get the position and length of the first match. Then use Substring (or Remove followed by Insert) to construct a new string from the old string, that has the replacement you want.
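A minimal sketch of this approach; ReplaceFirst is a hypothetical helper name. Note that the replacement is inserted literally, so substitutions such as $1 are not expanded here, unlike in Regex.Replace.

```csharp
using System.Text.RegularExpressions;

public static class ManualReplaceDemo
{
    // Replaces only the first match by splicing the string around Index/Length.
    public static string ReplaceFirst(string input, string pattern, string replacement)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return input;                // nothing to replace
        return input.Remove(m.Index, m.Length)       // cut out the matched text
                    .Insert(m.Index, replacement);   // put the replacement in its place
    }
}
```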
Try a negative lookbehind: (a{2})(?<!\1{2})
(a{2}) # 'a' two times, captured as group 1
(?<! # negative lookbehind
\1{2} # group 1 ('aa') twice: fails if this 'aa' directly follows another 'aa'
)
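A quick check of this pattern (the wrapper name is illustrative). It gives the desired result for the question's input, but it only skips an aa that directly follows another aa; separated pairs such as aaXaa are both still replaced, so it is not a general first-match-only rule.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindBackrefDemo
{
    // (a{2}) captures "aa"; (?<!\1{2}) fails if "aaaa" ends here,
    // i.e. if this "aa" directly follows another "aa".
    public static string ReplacePairs(string s) =>
        Regex.Replace(s, @"(a{2})(?<!\1{2})", "33");
}
```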

Replacing special chars in a string with a single unique char

I have a string like so:
string inputStr = "Name*&^%LastName*##";
The following Regex will replace all the special chars with a '-'
Regex rgx = new Regex("[^a-zA-Z0-9 - _]");
string someStr = rgx.Replace(inputStr, "-");
That produces an output something like:
Name---LastName---
How do I replace '---' with a single '-' so the output looks like this:
Name-LastName
So the question is how do I replace all the special chars with a single '-'?
Regards.
Try this
Regex rgx = new Regex(@"[^a-zA-Z0-9 \- _]+");//note - character is escaped (verbatim string, so \- reaches the regex engine)
or
Regex rgx = new Regex("[^a-zA-Z0-9 _-]+");//or use - as last character
But this will give Name-LastName-. Is that okay?
If you don't want the trailing -, you can also use the following code (credit goes to @MatthewStrawbridge; see the comments):
string someStr = rgx.Replace(inputStr, "-").TrimEnd('-');
This outputs Name-LastName.
Edit: As @pguardiario pointed out in the comments, I updated my answer to escape the -, because - has a special meaning (a range) inside a character class ([]). To use - as a literal, escape it or make it the first or last character of the class.
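A sketch putting the answer's pieces together on the question's input (the class and method names are illustrative). The last method keeps the question's original class for comparison: because " - " there forms a range from space to space, a literal - in the input is treated as a special character.

```csharp
using System.Text.RegularExpressions;

public static class SpecialCharDemo
{
    // '-' placed last is literal; '+' turns each run of specials into one match.
    private static readonly Regex SpecialRun = new Regex(@"[^a-zA-Z0-9 _-]+");

    public static string Collapse(string input) => SpecialRun.Replace(input, "-");

    // Same, then drop any trailing '-' (TrimEnd removes all of them).
    public static string CollapseAndTrim(string input) => Collapse(input).TrimEnd('-');

    // Question's class: " - " is the range ' '..' ', so '-' is NOT allowed.
    public static string WithOriginalClass(string input) =>
        Regex.Replace(input, "[^a-zA-Z0-9 - _]", "*");
}
```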

How to write a regex with group matching?

Here is the data source, lines stored in a txt file:
servers[i]=["name1", type1, location3];
servers[i]=["name2", type2, location3];
servers[i]=["name3", type1, location7];
Here is my code:
string servers = File.ReadAllText("servers.txt");
string pattern = "^servers[i]=[\"(?<name>.*)\", (.*), (?<location>.*)];$";
Regex reg = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match m;
for (m = reg.Match(servers); m.Success; m = m.NextMatch()) {
string name = m.Groups["name"].Value;
string location = m.Groups["location"].Value;
}
No lines are matching. What am I doing wrong?
If you don't care about anything except the server name and location, you don't need to specify the rest of the input in your regex. That lets you avoid having to escape the brackets, as Graeme correctly points out. Try something like:
string pattern = "\"(?<name>.+)\".+\\s(?<location>[^ ]+)];$";
That's
\" = quote mark,
(?<name> = start capture group 'name',
.+ = match one or more chars (could use \w+ here for 1+ word chars)
) = end the capture group
\" = ending quote mark
.+\s = one or more chars, followed by a whitespace character
(?<location> = start capture group 'location',
[^ ]+ = one or more non-space chars
) = end the capture group
];$ = immediately followed by ]; and the end of the line (with RegexOptions.Multiline; otherwise the end of the string)
I tested this using your sample data in Rad Software's free Regex Designer, which uses the .NET regex engine.
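A runnable sketch of this unanchored pattern, assuming the file content uses \n line endings and that RegexOptions.Multiline is kept from the question so $ can match at the end of each line (a sample string stands in for File.ReadAllText; names are illustrative). With Windows \r\n line endings, lines terminated by \r\n would not match, because in Multiline mode $ matches before \n but not before \r.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ServerParserUnanchored
{
    // Quoted name, then anything up to the last whitespace, then location and "];".
    private static readonly Regex Line =
        new Regex("\"(?<name>.+)\".+\\s(?<location>[^ ]+)];$", RegexOptions.Multiline);

    public static List<(string Name, string Location)> Parse(string text)
    {
        var result = new List<(string Name, string Location)>();
        foreach (Match m in Line.Matches(text))
            result.Add((m.Groups["name"].Value, m.Groups["location"].Value));
        return result;
    }
}
```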
I don't know whether C# regexes are the same as Perl's, but if so, you probably want to escape the [ and ] characters. Also, there are extra characters in there. Try this:
string pattern = @"^servers\[i\]=\[""(?<name>.*)"", (.*), (?<location>.*)\];$";
Edited to add: After wondering why my answer was downvoted and then looking at Val's answer, I realized that the "extra characters" were there for a reason. They are what Perl calls "named capture buffers", which I have never used but the original code fragment does. I have updated my answer to include them.
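A sketch of the escaped, anchored pattern as a verbatim string (doubled "" for quotes), with the question's options. The \r? before $ is an addition not in the answer: it is an assumption that lets the pattern also accept lines from a file with Windows \r\n line endings, since in Multiline mode $ matches only before \n. Names are illustrative.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ServerParserAnchored
{
    // [ and ] escaped so they are literal; "" is a literal quote in a verbatim string.
    private static readonly Regex Line = new Regex(
        @"^servers\[i\]=\[""(?<name>.*)"", (.*), (?<location>.*)\];\r?$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    public static List<(string Name, string Location)> Parse(string text)
    {
        var result = new List<(string Name, string Location)>();
        foreach (Match m in Line.Matches(text))
            result.Add((m.Groups["name"].Value, m.Groups["location"].Value));
        return result;
    }
}
```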
try this
string pattern = "servers[i]=[\"(?<name>.*)\", (.*), (?<location>.*)];$";
