C# Regex.Split method is adding empty string before parenthesis

I have some code that tokenizes an equation input into a string array:
string infix = "( 5 + 2 ) * 3 + 4";
string[] tokens = tokenizer(infix, @"([\+\-\*\(\)\^\\])");
foreach (string s in tokens)
{
Console.WriteLine(s);
}
Now here is the tokenizer function:
public string[] tokenizer(string input, string splitExp)
{
string noWSpaceInput = Regex.Replace(input, @"\s", "");
Console.WriteLine(noWSpaceInput);
Regex RE = new Regex(splitExp);
return (RE.Split(noWSpaceInput));
}
When I run this, all the characters are split, but an empty string is inserted before the parenthesis characters. How do I remove this?
//empty string here
(
5
+
2
//empty string here
)
*
3
+
4

I would just filter them out:
public string[] tokenizer(string input, string splitExp)
{
string noWSpaceInput = Regex.Replace(input, @"\s", "");
Console.WriteLine(noWSpaceInput);
Regex RE = new Regex(splitExp);
return (RE.Split(noWSpaceInput)).Where(x => !string.IsNullOrEmpty(x)).ToArray();
}

What you're seeing happens because there is nothing before the first separator (the string begins with "("), and because two separator characters sit next to one another in the middle (")*"). Regex.Split returns an empty string for the empty text around them. This is by design.
As you may have found with String.Split, that method accepts an optional StringSplitOptions value that removes empty entries; Regex.Split has no such parameter. In your specific case you can simply ignore any token with a length of 0.
foreach (string s in tokens.Where(tt => tt.Length > 0))
{
Console.WriteLine(s);
}

Well, one option would be to filter them out afterwards:
return RE.Split(noWSpaceInput).Where(x => !string.IsNullOrEmpty(x)).ToArray();

Try this (if you don't want to filter the result):
tokenizer(infix, @"(?=[-+*()^\\])|(?<=[-+*()^\\])");
Perl demo:
perl -E "say join ',', split /(?=[-+*()^])|(?<=[-+*()^])/, '(5+2)*3+4'"
(,5,+,2,),*,3,+,4
Although, in my opinion, it would be better to use a match instead of a split in this case.
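The match-based approach mentioned above could look roughly like this. It is only a sketch: the `Tokenizer` class and `Tokenize` method are names made up for illustration, and the pattern assumes that a token is either an unsigned number (optionally with a decimal part) or one of the single-character operators, with `/` assumed for division.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Hypothetical helper: matches the tokens themselves instead of splitting
    // on separators, so no empty strings can appear in the result.
    public static string[] Tokenize(string input)
    {
        // \d+(\.\d+)?   an unsigned number such as 5 or 2.75
        // [-+*/^()]     a single operator or parenthesis
        // Whitespace between tokens is simply never matched.
        return Regex.Matches(input, @"\d+(\.\d+)?|[-+*/^()]")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Prints each token of the sample expression on its own line.
        foreach (string token in Tokenize("( 5 + 2 ) * 3 + 4"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Because only the tokens are matched, the whitespace-stripping step and the empty-entry filter both become unnecessary. Characters that fit neither alternative are silently skipped, which may or may not be what you want for invalid input.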

I think you can use StringSplitOptions.RemoveEmptyEntries with String.Split (this relies on the tokens being separated by spaces):
static void Main(string[] args)
{
string infix = "( 5 + 2 ) * 3 + 4";
string[] results = infix.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (var result in results)
Console.WriteLine(result);
Console.ReadLine();
}

Related

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.
You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
The ;(?!;) regex matches any ; that is not followed with ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
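As a sketch of the `;(?!;|$)` variant described above (the `LastDelimiterSplit` class and `SplitOnLast` method are illustrative names, not part of any library):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastDelimiterSplit
{
    // Hypothetical helper: splits on a ';' only when it is the last one in a
    // run of semicolons and is not the final character of the string.
    public static string[] SplitOnLast(string input)
    {
        return Regex.Split(input, @";(?!;|$)");
    }

    public static void Main()
    {
        // The trailing ';' stays attached to the last item.
        Console.WriteLine(string.Join(", ", SplitOnLast("hello;there;;you;;;!;")));
        // => hello, there;, you;;, !;
    }
}
```

Since the final `;` is no longer a split point, no empty trailing entry is produced and the `Where` filter is not needed for this input.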
It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach(Match match in matches)
{
// Groups[1] is the captured item without the final separator
Console.WriteLine(match.Groups[1].Value);
}
string x = "hello;there;;you;;;!;";
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var s in splitted)
Console.WriteLine("{0}", s);

Split string by character in C#

I need to split this code by ',' in C#.
Sample string:
'DC0''008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'
I can use string.Split(',') but as you can see 'Comm,erc,' is split up into
comm
erc
also 'DC0''008_' should split up as
'DC0''008_'
not as
'DC0'
'008_'
The expected output should be like this:
'DC0''008_'
'23802.76'
'23802.76'
'23802.76'
'Comm,erc,'
'2f17'
'3f44c0ba-daf1-44f0-a361-'
split can do it but regex will be more complex.
You can use Regex.Matches using this simpler regex:
'[^']*'
and get all quoted strings in a collection.
Code:
MatchCollection matches = Regex.Matches(input, @"'[^']*'");
To print all the matched values:
foreach (Match match in Regex.Matches(input, @"'[^']*'"))
Console.WriteLine("Found {0}", match.Value);
To store all matched values in an ArrayList (requires using System.Collections;):
ArrayList list = new ArrayList();
foreach (Match match in Regex.Matches(input, @"'[^']*'")) {
list.Add(match.Value);
}
EDIT: As per comments below if OP wants to consume '' in the captured string then use this lookaround regex:
'.*?(?<!')'(?!')
(?<!')'(?!') means match a single quote that is not surrounded by another single quote.
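A minimal sketch of the lookaround pattern applied to the sample string from the question (the `QuotedFields` class and `Extract` method are hypothetical names used only for this example):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedFields
{
    // Hypothetical helper: returns each quoted field, keeping a doubled ''
    // inside a field instead of treating it as a field boundary.
    public static string[] Extract(string input)
    {
        // '.*?          opening quote, then as few characters as possible
        // (?<!')'(?!')  closing quote with no quote directly before or after it
        return Regex.Matches(input, @"'.*?(?<!')'(?!')")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "'DC0''008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
        foreach (string field in Extract(input))
        {
            Console.WriteLine(field);
        }
    }
}
```

Note the assumption built into the pattern: a field that is empty ('') or that ends with a doubled quote would not be matched correctly, so this only suits data shaped like the sample.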
You can use this Regex to get all the things inside the commas and apostrophes:
(?<=')[^,].*?(?=')
To convert it into a string array, you can use the following:
var matches = Regex.Matches(strInput, "(?<=')[^,].*?(?=')");
var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
EDIT: If you want it to be able to capture double quotes, then the regex that will match it in every case becomes unwieldy. At this point, it's better to just use a simpler pattern with Regex.Split:
var matches = Regex.Split(strInput, "^'|'$|','")
.Where(x => !string.IsNullOrEmpty(x))
.ToArray();
It is a good idea to modify your string first and then split it, something like below:
string data = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
data = Process(data); // before splitting, replace each comma outside the quotes with a placeholder such as '#'
string[] result = data.Split('#'); // now splits only on the former outer commas (not fully tested)
The Process() function is below:
private string Process(string input)
{
bool flag = false;
string temp="";
char[] data = input.ToCharArray();
foreach(char ch in data)
{
if(ch == '\'' || ch == '"')
if(flag)
flag=false;
else
flag=true;
if(ch == ',')
{
if(flag) //if it is inside ignore else replace with #
temp+=ch;
else
temp+="#";
}
else
temp+=ch;
}
return temp;
}
see output here http://rextester.com/COAH43918
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication15
{
class Program
{
static void Main(string[] args)
{
string str = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
var matches = Regex.Matches(str, "(?<=')[^,].*?(?=')");
var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
foreach (var item in array)
Console.WriteLine("'" + item + "'");
}
}
}

how to remove special char from the string and make new string?

I have a string 4(4X),4(4N),3(3X) from this string I want to make string 4,4,3. If I am getting the string 4(4N),3(3A),2(2X) then I want to make my string 4,3,2.
Please someone tell me how can I solve my problem.
This LINQ query takes, from each comma-separated part of the input string, the substring from the beginning up to the first opening parenthesis:
string input = "4(4N),3(3A),2(2X)";
string result = String.Join(",", input.Split(',')
.Select(s => s.Substring(0, s.IndexOf('('))));
// 4,3,2
This may help:
string inputString = "4(4X),4(4N),3(3X)";
string[] temp = inputString.Split(',');
List<string> result = new List<string>();
foreach (string item in temp)
{
result.Add(item.Split('(')[0]);
}
var whatYouNeed = string.Join(",", result);
You can use regular expressions
String input = @"4(4X),4(4N),3(3X)";
String pattern = @"(\d)\(\1.\)";
// ( ) - first group.
// \d - one digit
// \( and \) - literal parentheses.
// \1 - backreference: repeats whatever the first group matched.
// . - any single character (the letter after the repeated digit).
String result = Regex.Replace(input, pattern, "$1");
// $1 means that each match is replaced by the contents of the first group
// result = 4,4,3
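The pattern above expects a single digit followed by exactly one character inside the parentheses. If the numbers can have several digits or the parenthesized part can vary, a looser pattern that simply deletes every parenthesized group may be easier. This is a sketch under that assumption; `ParenStripper` and `StripParenthesized` are made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenStripper
{
    // Hypothetical helper: deletes every "(...)" group, whatever it contains.
    public static string StripParenthesized(string input)
    {
        // \(      a literal opening parenthesis
        // [^)]*   any run of characters that are not ')'
        // \)      the closing parenthesis
        return Regex.Replace(input, @"\([^)]*\)", "");
    }

    public static void Main()
    {
        Console.WriteLine(StripParenthesized("4(4X),4(4N),3(3X)"));  // 4,4,3
        Console.WriteLine(StripParenthesized("12(12AB),7(7X)"));     // 12,7
    }
}
```

Unlike the backreference version, this does not check that the number inside the parentheses repeats the one outside, and it does not handle nested parentheses.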

How to break a string at each comma?

Hi guys, I have a problem at hand that I can't seem to figure out. I have a string (C#) which looks like this:
string tags = "cars, motor, wheels, parts, windshield";
I need to break this string at every comma and assign each word to a new string by itself, like:
string individual_tag = "car";
I know I have to do some kind of loop here but I'm not really sure how to approach this. Any help will be really appreciated.
No loop needed. Just a call to Split():
var individualStrings = tags.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
You can use one of String.Split methods
Split Method (Char[])
Split Method (Char[], StringSplitOptions)
Split Method (String[], StringSplitOptions)
Let's try the second option:
I pass a comma and a space as the split characters, so the input string is split at every occurrence of either one. Because each ", " leaves an empty string between the comma and the space, the result can contain empty entries; the StringSplitOptions.RemoveEmptyEntries parameter removes them.
string[] tagArray = tags.Split(new char[]{',', ' '},
StringSplitOptions.RemoveEmptyEntries);
OR
string[] tagArray = tags.Split(", ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
You can access each tag like this:
foreach (var t in tagArray )
{
lblTags.Text = lblTags.Text + " " + t; // update the label with the tag values
//System.Diagnostics.Debug.WriteLine(t); // this result can be seen in the Visual Studio Output window
}
Using the Split function will do the job:
string[] s = tags.Split(',');
or
String.Split Method (Char[], StringSplitOptions)
char[] charSeparators = new char[] {',',' '};
string[] words = tags.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
string[] words = tags.Split(',');
You are looking for the C# Split() function.
string[] tagArray = tags.Split(',');
Edit:
string[] tag = tags.Trim().Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
You should definitely use the form supplied by Justin Niessner. There were two key differences that may be helpful depending on the input you receive:
You had spaces after your ,s so it would be best to split on ", "
StringSplitOptions.RemoveEmptyEntries will remove the empty entry that is possible in the case that you have a trailing comma.
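To illustrate both points, here is a small sketch that also tolerates irregular spacing around the commas. It splits on the comma alone and trims each piece afterwards, which is a slightly different technique from splitting on ", "; `TagSplitter` and `SplitTags` are hypothetical names:

```csharp
using System;
using System.Linq;

public static class TagSplitter
{
    // Hypothetical helper: splits on commas, trims each tag, and drops
    // anything that ends up empty (for example after a trailing comma).
    public static string[] SplitTags(string tags)
    {
        return tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(t => t.Trim())
                   .Where(t => t.Length > 0)
                   .ToArray();
    }

    public static void Main()
    {
        // Trailing comma, uneven spacing, and a two-word tag.
        Console.WriteLine(string.Join("|", SplitTags("cars, motor ,spare parts, ")));
        // => cars|motor|spare parts
    }
}
```

Splitting on the comma only, rather than on both ',' and ' ', keeps multi-word tags such as "spare parts" intact.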
Program that splits on commas and spaces [C#]
using System;
class Program
{
static void Main()
{
string s = "there, is, a, cat";
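// RemoveEmptyEntries drops the empty string produced between each ',' and ' '
string[] words = s.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);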
foreach (string word in words)
{
Console.WriteLine(word);
}
}
}
Output
there
is
a
cat

Remove formatting on string literal

Given the C# code:
string foo = @"
abcde
fghijk";
I am trying to remove all formatting, including whitespaces between the lines.
So far the code
foo = foo.Replace("\n","").Replace("\r", "");
works, but the whitespace between lines 2 and 3 is still kept.
I assume a regular expression is the only solution?
Thanks.
I'm assuming you want to keep multiple lines; if not, I'd choose CAbbott's answer.
var fooNoWhiteSpace = string.Join(
Environment.NewLine,
foo.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim())
);
What this does is split the string into lines (foo.Split),
trim whitespace from the start and end of each line (.Select(fooline => fooline.Trim())),
then combine them back together with a new line in between (string.Join).
You could use a regular expression:
foo = Regex.Replace(foo, @"\s+", "");
How about this?
string input = @"
abcde
fghijk";
string output = "";
string[] parts = input.Split('\n');
foreach (var part in parts)
{
// If you want everything on one line... else just + "\n" to it
output += part.Trim();
}
This should remove everything.
If the whitespace is all spaces, you could use
foo = foo.Replace(" ", "");
For any other whitespace that may be in there, do the same. Example:
foo = foo.Replace("\t", "");
Just add a Replace(" ", ""). You're dealing with a string literal, which means all the whitespace is part of the string.
Try something like this:
string test = @"
abcde
fghijk";
EDIT: Added code to filter out only whitespace.
string newString = new string(test.Where(c => Char.IsWhiteSpace(c) == false).ToArray());
Produces the following: abcdefghijk
I've written something similar to George Duckett's answer, but put the logic into a string extension method so it is easier for others to read and consume:
public static class Extensions
{
public static string RemoveTabbing(this string fmt)
{
return string.Join(
System.Environment.NewLine,
fmt.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim()));
}
}
You can then call it like this:
string foo = @"
abcde
fghijk".RemoveTabbing();
I hope that helps someone
