Removing White Space: C#

I am trying to remove white space that exists in a String input. My ultimate goal is to create an infix evaluator, but I am having issues with parsing the input expression.
It seems to me that the easy solution to this is using a Regular Expression function, namely Regex.Replace(...)
Here's what I have so far..
infixExp = Regex.Replace(infixExp, "\\s+", string.Empty);
string[] substrings = Regex.Split(infixExp, "(\\()|(\\))|(-)|(\\+)|(\\*)|(/)");
Assuming the user inputs the infix expression (2 + 3) * 4, I would expect that this would break the string into the array {(, 2, +, 3, ), *, 4}; however, after debugging, I am getting the following output:
infixExp = "(2+3)*7"
substrings = {"", (, 2, +, 3, ), "", *, 7}
It appears that the white space is being properly removed from the infix expression, but splitting the resulting string is improper.
Could anyone give me insight as to why? Likewise, if you have any constructive criticism or suggestions, let me know!

If a match is at one end of the string, you will get an empty string next to it. Likewise, if there are two adjacent matches, the string will be split on both of them, so you end up with an empty string in between. Citing MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array. For example, splitting a string on a single hyphen causes the returned array to include an empty string in the position where two adjacent hyphens are found [...].
and
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
Just filter them out in a second step.
Also, please make your life easier and use verbatim strings:
infixExp = Regex.Replace(infixExp, @"\s+", string.Empty);
string[] substrings = Regex.Split(infixExp, @"(\(|\)|-|\+|\*|/)");
The second expression could be simplified even further:
@"([()+*/-])"
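To make the empty strings visible, here is a minimal sketch that prints the raw Regex.Split result and then the filtered tokens. The class name SplitDemo and the helper Tokenize are made up for illustration; only Regex.Replace, Regex.Split and LINQ's Where come from the answers here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: strip whitespace, split on operators and
    // parentheses (kept as tokens via the capturing group), then drop
    // the empty strings that Regex.Split leaves behind.
    public static string[] Tokenize(string infix)
    {
        string compact = Regex.Replace(infix, @"\s+", string.Empty);
        return Regex.Split(compact, @"([()+*/-])")
                    .Where(s => s.Length > 0)
                    .ToArray();
    }

    static void Main()
    {
        // Raw split: an empty string appears before "(" (match at the start)
        // and between ")" and "*" (two adjacent matches).
        string[] raw = Regex.Split("(2+3)*7", @"([()+*/-])");
        Console.WriteLine(string.Join(" | ", raw.Select(s => "\"" + s + "\"")));

        // Filtered tokens, ready for an infix evaluator.
        Console.WriteLine(string.Join(" ", Tokenize("(2 + 3) * 7")));
    }
}
```

Because digits are never split on, a multi-digit operand such as 12 stays a single token.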

Please, ditch Regex. There are better tools to use. You can use String.Trim(), .TrimEnd(), and .TrimStart().
string inputString = " asdf ";
string output = inputString.Trim();
For whitespace within the string, use String.Replace.
string output2 = output.Replace(" ", "");
You will have to expand this to other whitespace characters.
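One way to cover the other whitespace characters without listing them by hand is to filter on char.IsWhiteSpace. This is only a sketch; the class name WhitespaceDemo and the helper RemoveWhitespace are assumptions for illustration, not something from the answer above.

```csharp
using System;
using System.Linq;

static class WhitespaceDemo
{
    // Hypothetical helper: keep every character that char.IsWhiteSpace
    // does not classify as whitespace (spaces, tabs, newlines, ...).
    public static string RemoveWhitespace(string input)
    {
        return string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
    }

    static void Main()
    {
        Console.WriteLine(RemoveWhitespace(" (2 +\t3)\n* 4 "));
    }
}
```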

var result = Regex.Split(input, "(\\d+|\\D)")
.Where(x=>x!="").ToArray();

m.buettner's answer is correct. Also consider that you can do this in one step. From MSDN:
If capturing parentheses are used in a Regex.Split expression, any
captured text is included in the resulting string array.
Therefore, if you include the whitespace in the split pattern but outside the capturing parentheses, you can split on it as well but not include it in the result array:
var substrings = Regex.Split("(2 + 3) * 7", @"([()+*/-])|\s+");
The result:
substrings = {"", ( , 2, "", +, "", 3, ), "", "", *, "", 7}
And your final result would be:
substrings.Where(s => s != String.Empty)

Why not just remove the white spaces and then split the string with normal string handling functions? Like this...
string x = "(2 + 3) * 4";
x = x.Replace(" ", "").Replace("\t",""); //etc...
char[] y = x.ToCharArray();
Why bother making this more complicated than it needs to be?

A non-regex solution would probably be String.Replace - you could simply replace " ", "\t", and other whitespace with the empty string "".

I found the solution I was looking for thanks to all of your replies.
// Ignore all whitespace within the expression.
infixExp = Regex.Replace(infixExp, @"\s+", String.Empty);
// Separate the expression based on the tokens (, ), +, -,
// *, /, and ignore any of the empty Strings that Regex.Split
// adds for adjacent matches or matches at either end.
string[] substrings = Regex.Split(infixExp, @"([()+*/-])");
substrings = substrings.Where(s => s != String.Empty).ToArray();
This separates the string into parts at the arithmetic operators (+, -, *, /) and parentheses, and then removes any empty strings from the resulting array.

Related

How can I split a string by multiple delimiters and keep the delimiters?

I have, for example, this string: "abc({".
Now I want to split it by the "(" delimiter, and I know I can use String.Split for that.
But is there a way to split by this symbol without losing it? With Split I would get string[] = { "abc" , "{" }, and I want { "abc" , "(" , "{" }.
Also, is there a way to do this with multiple delimiters?
Use Regex.Split with a pattern enclosed with a capturing group.
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
See the C# demo:
var s = "abc({";
var results = Regex.Split(s, @"(\()")
.Where(m=>!string.IsNullOrEmpty(m))
.ToList();
Console.WriteLine(string.Join(", ", results));
// => abc, (, {
The (\() regex matches and captures ( symbol into Capturing group 1, and thus the captured part is also output in the resulting string list.
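The same idea should extend to several delimiters by putting an alternation inside the capturing group. The sketch below assumes single-character delimiters and at least one of them; KeepDelimiters and SplitKeeping are hypothetical names. Regex.Escape is used so that characters such as ( and { are treated literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class KeepDelimiters
{
    // Hypothetical helper: split on any of the given characters and keep
    // each delimiter as its own element (assumes at least one delimiter).
    public static string[] SplitKeeping(string input, params char[] delimiters)
    {
        // Builds a pattern such as (\(|\{); the capturing group makes
        // Regex.Split include the matched delimiter in the result.
        string alternatives = string.Join("|",
            delimiters.Select(d => Regex.Escape(d.ToString())));
        return Regex.Split(input, "(" + alternatives + ")")
                    .Where(s => s.Length > 0)
                    .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitKeeping("abc({", '(', '{')));
    }
}
```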

Splitting on “,” but not “/,”

Question: How do I write an expression to split a string on ',' but not '/,'? Later I'll want to replace '/,' with ', '.
Details...
Delimiter: ','
Skip Char: '/'
Example input: "Mister,Bill,is,made,of/,clay"
I want to split this input into an array: {"Mister", "Bill", "is", "made", "of, clay"}
I know how to do this with a char prev, cur; and some indexers, but that seems beta.
Java Regex has a split functionality, but I don't know how to replicate this behavior in C#.
Note: This isn't a duplicate question, this is the same question but for a different language.
I believe you're looking for a negative lookbehind:
var regex = new Regex("(?<!/),");
var result = regex.Split(str);
this will split str on all commas that are not preceded by a slash. If you want to keep the '/,' in the string then this will work for you.
Since you said that you wanted to split the string and later replace the '/,' with ', ', you'll want to do the above first then you can iterate over the result and replace the strings like so:
var replacedResult = result.Select(s => s.Replace("/,", ", "));
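Put together, the two steps above might look like the following sketch. The class name EscapedCommaSplit and the method SplitFields are illustrative assumptions; the pattern and the replacement are the ones from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EscapedCommaSplit
{
    // Hypothetical helper: split on commas not preceded by '/', then turn
    // each remaining "/," into ", " inside the fields.
    public static string[] SplitFields(string input)
    {
        return Regex.Split(input, "(?<!/),")
                    .Select(s => s.Replace("/,", ", "))
                    .ToArray();
    }

    static void Main()
    {
        string[] fields = SplitFields("Mister,Bill,is,made,of/,clay");
        Console.WriteLine(string.Join(" | ", fields));
    }
}
```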
string s = "Mister,Bill,is,made,of/,clay";
var arr = s.Replace("/,"," ").Split(',');
result : {"Mister", "Bill", "is", "made", "of clay"}
Using Regex:
var result = Regex.Split("Mister,Bill,is,made,of/,clay", "(?<=[^/]),");
Just use a Replace to remove the commas from your string :
s.Replace("/,", "//").Split(',').Select(x => x.Replace("//", ","));
You can use this in c#
string regex = @"(?<=[^\/]),";
var match = Regex.Split("Mister,Bill,is,made,of/,clay", regex, RegexOptions.IgnoreCase);
After that you can replace /, and continue your operation as you like

C# regexp negative lookahead

I have a problem replacing characters after a specific character. For example, I want to replace the first 'aa' with '33' using this code.
string str = "dc1aaaafg";
string pattern = @"a{2}(?!(1))";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, "33");
But the result is 'dc13333fg': it also replaced the second group after '1'. I need to replace only the first group, giving 'dc133aafg'. How can I achieve this? I have a large string that may need many replacements; this is just an example.
Regex.Replace() is global. It will replace as many times as the pattern matches*.
You could use Regex.Replace(String, String, Int32) to limit the number of operations.
string result = rgx.Replace(str, "33", 1);
Or you change the pattern to a look-behind.
Regex rgx = new Regex(@"(?<=1)a{2}");
string result = rgx.Replace(str, "33");
* Note that Replace() is global, but not incremental. Using the expression a{2} on "aaaaaa" with the replacement "ba" will result in "bababa", not in "bbbbba".
There is an overload to the Replace method in which you can specify the number of times. Specify 1 and it shall do only the first match.
string result = rgx.Replace(str, "33", 1);
A regex pattern cannot express that only the first match is relevant.
Use Regex.Match to get the position and length of the first match. Then use Substring (or Remove followed by Insert) to construct a new string from the old string, that has the replacement you want.
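A minimal sketch of the Match-then-rebuild approach described above. ReplaceFirst and ReplaceFirstMatch are hypothetical names, and the replacement is inserted literally (no $1-style substitutions), which is an assumption of this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceFirst
{
    // Hypothetical helper: replace only the first match of the pattern.
    // The replacement text is inserted as-is.
    public static string ReplaceFirstMatch(string input, string pattern, string replacement)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return input; // nothing to replace
        }
        // Cut out the matched span and insert the replacement at its position.
        return input.Remove(m.Index, m.Length).Insert(m.Index, replacement);
    }

    static void Main()
    {
        Console.WriteLine(ReplaceFirstMatch("dc1aaaafg", "a{2}", "33"));
    }
}
```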
Try with a negative look behind : (a{2})(?<!\1{2})
(a{2}) # 'a' two times
(?<! # negative look behind
\1{2} # '\1' is the captured group 'a' twice to "jump" over the captured group
)

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in the above case it's 20) from this string. This sub-string can change dynamically at runtime. It might be any other number from 0 to infinity. To achieve this, my idea is to use a for loop that traverses the whole string; when a ( is present, it starts adding the characters to another character array, and when ) is encountered, it stops adding the characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution available or any function that can do that in an efficient way?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
Demo
For a strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
this way you'll have all you need in m.Groups[1].Value.
As suggested from I4V, assuming you have only that sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
and in this case, you can get the value you are looking for with res.Value
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.
Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
const string likes = "Likes (20)";
int open = likes.IndexOf('(');
int close = likes.IndexOf(')');
int likesCount = int.Parse(likes.Substring(open + 1, close - open - 1)); // length = characters between the brackets
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition, a regex matches a substring, so unless you indicate otherwise, the string you are looking for can occur anywhere in your input.
\d stands for a digit. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign means the previous symbol or group repeated one or more times.
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses, we say that it should be between ( and ). These are special characters in regex, so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want the part inside the parentheses, not including the parentheses themselves, we tell the regex that the digits part is a single group.
We do that by using parentheses in our regex. \((\d+)\) will still match (20), but now it will note that 20 is a subgroup of this match, and we can fetch it with Match.Groups[].
For any string in parentheses, things get a little bit harder.
Regex reg = new Regex(@"\((.+)\)");
This would work for many strings (the dot matches any character). But if the input is something like "This is an example(parenthesis1)(parenthesis2)", you would match (parenthesis1)(parenthesis2) with parenthesis1)(parenthesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to match "any character except a closing parenthesis":
Regex reg = new Regex(@"\(([^)]+)\)");
This will find (parenthesis1) as the first match, with parenthesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the right tool for nested parentheses, I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group then Saves solution is better.
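The difference between the greedy dot and the negated character class can be checked with a short sketch. ParenDemo and Contents are hypothetical names, and the sample input is made up for the demonstration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ParenDemo
{
    // Hypothetical helper: return capture group 1 of every match.
    public static string[] Contents(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    static void Main()
    {
        string input = "This is an example(first)(second)";

        // Greedy dot: one match that runs from the first ( to the last ).
        Console.WriteLine(string.Join(" | ", Contents(input, @"\((.+)\)")));

        // Negated class: stops at the first ), so each pair is matched separately.
        Console.WriteLine(string.Join(" | ", Contents(input, @"\(([^)]+)\)")));
    }
}
```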

extract last match from string in c#

I have strings in the form [abc].[some other string].[can.also.contain.periods].[our match]
I want to match the string "our match" (i.e. without the brackets), so I played around with lookarounds and whatnot. I now get the correct match, but I don't think this is a clean solution.
(?<=\.?\[) starts with '[' or '.['
([^\[]*) our match, i couldn't find a way to not use a negated character group
`.*?` non-greedy did not work as expected with lookarounds,
it would still match from the first match
(matches might contain escaped brackets)
(?=\]$) string ends with an ]
The language is .NET/C#. If there is an easier solution not involving a regex, I'd also be happy to know.
What really irritates me is that I cannot use (.*?) to capture the string, as it seems non-greedy matching does not work with lookbehinds.
I also tried Regex.Split(str, @"\]\.\[").Last().TrimEnd(']');, but I'm not really proud of this solution either.
The following should do the trick. Assuming the string ends after the last match.
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
var search = new Regex("\\.\\[(.*?)\\]$", RegexOptions.RightToLeft);
string ourMatch = search.Match(input).Groups[1].Value;
Assuming you can guarantee the input format, and it's just the last entry you want, LastIndexOf could be used:
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
int lastBracket = input.LastIndexOf("[");
string result = input.Substring(lastBracket + 1, input.Length - lastBracket - 2);
With String.Split():
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
char[] seps = {'[',']','\\'};
string[] splitted = input.Split(seps,StringSplitOptions.RemoveEmptyEntries);
You get "our match" in splitted[6], and can.also.contain.periods is left as one string (splitted[4]).
Edit: the array will have the string inside [] and then . and so on, so if you have a variable number of groups, you can use that to get the value you want (or remove the strings that are just '.')
Edited to add the backslash to the separator to treat cases like '\[abc\]'
Edit2: for nested []:
string input = @"[abc].[some other string].[can.also.contain.periods].[our [the] match]";
string[] seps2 = { "].["};
string[] splitted = input.Split(seps2, StringSplitOptions.RemoveEmptyEntries);
You get our [the] match] in the last element (index 3), and you'd have to remove the extra ].
You have several options:
RegexOptions.RightToLeft - yes, .NET regex can do this! Use it!
Match the whole thing with greedy prefix, use brackets to capture the suffix that you're interested in
So generally, pattern becomes .*(pattern)
In this case, .*\[([^\]]*)\], then extract what \1 captures (see this on rubular.com)
References
regular-expressions.info/Grouping with brackets
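The greedy-prefix option can be sketched as follows. LastBracket and LastGroup are hypothetical names, and returning an empty string when nothing matches is an arbitrary choice for this sketch. It assumes the last bracketed group contains no nested or escaped brackets.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastBracket
{
    // Hypothetical helper: the greedy .* consumes as much as possible, so the
    // capturing group ends up on the last [...] pair in the input.
    public static string LastGroup(string input)
    {
        Match m = Regex.Match(input, @".*\[([^\]]*)\]");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
        Console.WriteLine(LastGroup(input));
    }
}
```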
