Split string by character in C# - c#

I need to split this string by ',' in C#.
Sample string:
'DC0''008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'
I can use string.Split(','), but as you can see, 'Comm,erc,' then gets split into
Comm
erc
Also, 'DC0''008_' should stay in one piece as
'DC0''008_'
and not be split into
'DC0'
'008_'
The expected output should look like this:
'DC0''008_'
'23802.76'
'23802.76'
'23802.76'
'Comm,erc,'
'2f17'
'3f44c0ba-daf1-44f0-a361-'

Split can do it, but the regex will be more complex.
You can use Regex.Matches with this simpler regex:
'[^']*'
and get all quoted strings in a collection.
Code:
MatchCollection matches = Regex.Matches(input, @"'[^']*'");
To print all the matched values:
foreach (Match match in Regex.Matches(input, @"'[^']*'"))
    Console.WriteLine("Found {0}", match.Value);
To store all matched values in an ArrayList:
ArrayList list = new ArrayList();
foreach (Match match in Regex.Matches(input, @"'[^']*'")) {
    list.Add(match.Value);
}
EDIT: As per the comments below, if the OP wants to keep '' inside the captured string, use this lookaround regex:
'.*?(?<!')'(?!')
(?<!')'(?!') matches a single quote that is neither preceded nor followed by another single quote.
RegEx Demo
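As a rough sketch of how the lookaround pattern above could be applied to the sample string from the question, here it is wrapped in a small helper. The class name QuotedFields and the method name Extract are illustrative assumptions, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFields
{
    // Hypothetical helper: returns every quoted field, keeping the outer quotes.
    // A closing quote only counts when it is neither preceded nor followed by
    // another quote, so a doubled '' stays inside its field.
    public static List<string> Extract(string input)
    {
        var fields = new List<string>();
        foreach (Match match in Regex.Matches(input, @"'.*?(?<!')'(?!')"))
        {
            fields.Add(match.Value);
        }
        return fields;
    }
}
```

Calling QuotedFields.Extract on the sample string should give seven fields, with 'DC0''008_' and 'Comm,erc,' each kept in one piece. An empty field ('') is likely to be mis-parsed by this pattern, so the sketch assumes every field has content.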

You can use this regex to get everything inside the apostrophes, skipping the commas that separate the values:
(?<=')[^,].*?(?=')
Regex101 Explanation
To convert it into a string array, you can use the following:
var matches = Regex.Matches(strInput, "(?<=')[^,].*?(?=')");
var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
EDIT: If you also want it to handle doubled quotes (''), a regex that matches correctly in every case becomes unwieldy. At that point, it's better to use a simpler pattern with Regex.Split:
var matches = Regex.Split(strInput, "^'|'$|','")
    .Where(x => !string.IsNullOrEmpty(x))
    .ToArray();

It is a good idea to modify the string first and then split it, so that you get what you want, something like this:
string data = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
data = Process(data); // before splitting, replace each comma outside the quotes with something else, e.g. '#'
string[] result = data.Split('#'); // the split now works (not thoroughly tested)
The Process() function is below:
private string Process(string input)
{
    bool flag = false; // true while we are inside a quoted value
    string temp = "";
    char[] data = input.ToCharArray();
    foreach (char ch in data)
    {
        if (ch == '\'' || ch == '"')
        {
            flag = !flag; // every quote character toggles the state
        }
        if (ch == ',')
        {
            if (flag) // inside quotes: keep the comma; otherwise replace it with #
                temp += ch;
            else
                temp += "#";
        }
        else
        {
            temp += ch;
        }
    }
    return temp;
}
see output here http://rextester.com/COAH43918
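The same quote-tracking idea can also produce the fields directly, without a placeholder character that might itself occur in the data. The following is only a sketch under that assumption; QuoteAwareSplitter and its Split method are hypothetical names, and only single quotes are treated as delimiters.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Hypothetical helper: splits on commas that sit outside single quotes.
    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;

        foreach (char ch in input)
        {
            if (ch == '\'')
            {
                insideQuotes = !insideQuotes; // every quote toggles the state
            }

            if (ch == ',' && !insideQuotes)
            {
                parts.Add(current.ToString()); // separator: close the current field
                current.Clear();
            }
            else
            {
                current.Append(ch); // quotes and inner commas stay in the field
            }
        }

        parts.Add(current.ToString()); // the last field has no trailing separator
        return parts;
    }
}
```

Because a doubled quote toggles the state twice, a value such as 'DC0''008_' stays in one field. Each returned field keeps its surrounding quotes.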

using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication15
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
            var matches = Regex.Matches(str, "(?<=')[^,].*?(?=')");
            var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
            foreach (var item in array)
                Console.WriteLine("'" + item + "'");
        }
    }
}

Related

C# regex for a string repeated multiple times

I would like to have a regular expression for the string below, where the output would be:
CP_RENOUNCEABLE
CP_RIGHTS_OFFER_TYP
CP_SELLER_FEED_SOURCE
CP_SELLER_ID_BB_GLOBAL
CP_PX
CP_RATIO
CP_RECLASS_TYP
I tried using a regex with
string pattern = @"ISNULL(*)";
string strSearch = @"
LTRIM(RTRIM(ISNULL(CP_RENOUNCEABLE,'x2x'))), ISNULL(CP_RIGHTS_OFFER_TYP,-1), LTRIM(RTRIM(ISNULL(CP_SELLER_FEED_SOURCE,'x2x'))),
LTRIM(RTRIM(ISNULL(CP_SELLER_ID_BB_GLOBAL,'x2x'))),ISNULL(CP_PX,-1), ISNULL(CP_RATIO,-1), ISNULL(CP_RECLASS_TYP,-1);";
string pattern = @"ISNULL(*\)";
foreach (Match match in Regex.Matches(strSearch, pattern))
{
    if (match.Success && match.Groups.Count > 0)
    {
        var text = match.Groups[1].Value;
    }
}
My guess is that a comma follows each of the desired outputs listed in the question, in which case this simple expression might suffice:
(CP_[A-Z_]+),
Demo 1
If my guess isn't right and other characters, such as a space, can follow, we can add a character class to the right of our capturing group, such as this:
(CP_[A-Z_]+)[,\s]
and we would add any character that might occur after our desired strings to [,\s].
Demo 2
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(CP_[A-Z_]+),";
        string input = @"LTRIM(RTRIM(ISNULL(CP_RENOUNCEABLE,'x2x'))), ISNULL(CP_RIGHTS_OFFER_TYP,-1), LTRIM(RTRIM(ISNULL(CP_SELLER_FEED_SOURCE,'x2x'))),
LTRIM(RTRIM(ISNULL(CP_SELLER_ID_BB_GLOBAL,'x2x'))),ISNULL(CP_PX,-1), ISNULL(CP_RATIO,-1), ISNULL(CP_RECLASS_TYP,-1);";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Edit:
For capturing what is in between ISNULL and the first comma, this might work:
ISNULL\((.+?),
Demo 3
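To get only the column names, without the surrounding ISNULL( and the trailing comma, the first capture group of that last pattern can be read from each match. This is a minimal sketch; the class name IsNullArguments and the method name Extract are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IsNullArguments
{
    // Hypothetical helper: collects the first argument of every ISNULL(...) call.
    public static List<string> Extract(string sql)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(sql, @"ISNULL\((.+?),"))
        {
            names.Add(m.Groups[1].Value); // group 1 excludes "ISNULL(" and the comma
        }
        return names;
    }
}
```

The lazy .+? stops at the first comma after each ISNULL(, so the sketch assumes the first argument never contains a comma itself.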

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.
You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed by another ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
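The variant with the end-of-string check can be sketched as follows. LastDelimiterSplitter and its Split method are hypothetical names used only for this illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LastDelimiterSplitter
{
    // Hypothetical helper: splits on the last ';' of each run, but never on a
    // ';' that ends the string, so a trailing ';' stays on the final item.
    public static string[] Split(string input)
    {
        return Regex.Split(input, @";(?!;|$)")
                    .Where(part => !string.IsNullOrEmpty(part))
                    .ToArray();
    }
}
```

For "hello;there;;you;;;!;" this should return hello, there;, you;; and !; (note the ; kept on the last item), whereas the plain ;(?!;) pattern returns ! as the last item.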
It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach (Match match in matches)
{
    Console.WriteLine(match.Captures[0].Value);
}
string x = "hello;there;;you;;;!;";
// Note: this drops every ';', including the repeated ones.
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var s in splitted)
    Console.WriteLine("{0}", s);

Problems with regex in c# only returning a single match

I'm building a regex and I'm missing something, as it's not working properly.
My regex should find everything of the form #anychars# and return all the matches in the sentence, not just a single match.
Here are a few examples
1- #_Title_# and #_Content_# should return two matches: #_Title_# and #_Content_#.
2- Product #_TemplateName_# #_Full_Product_Name_# more text. text text #_Short_Description_# should return 3 matches: #_TemplateName_# #_Full_Product_Name_# and #_Short_Description_#
And so on. Here is what my regex looks like: ^(.*#_.*_#.*)+$
Any thoughts on what I'm doing wrong?
Something as simple as:
#.*?#
Or:
#_.*?_#
If you are trying to match the underscores too (it wasn't clear in the original version of the question). Or:
#_(.*?)_#
Which makes it easier to extract the token between your #_ and _# delimiters as a group.
Any of these should work. The *? is the key: it is non-greedy. Otherwise you match everything between the first and the last #.
So for example:
var str = "Product #_TemplateName_# #_Full_Product_Name_# more text. text text #_Short_Description_#";
var r = new Regex("#_(.*?)_#");
foreach (Match m in r.Matches(str))
{
    Console.WriteLine(m.Value + "\t" + m.Groups[1].Value);
}
Outputs:
#_TemplateName_#     TemplateName
#_Full_Product_Name_#    Full_Product_Name
#_Short_Description_#    Short_Description
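To see the effect of the non-greedy quantifier in isolation, the two variants can be compared by simply counting matches. This is an illustrative sketch; TokenCounter and its two methods are assumed names, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class TokenCounter
{
    // Greedy: .* runs to the last "_#", so the whole span is one match.
    public static int CountGreedy(string input)
    {
        return Regex.Matches(input, "#_.*_#").Count;
    }

    // Lazy: .*? stops at the nearest "_#", so each token is its own match.
    public static int CountLazy(string input)
    {
        return Regex.Matches(input, "#_.*?_#").Count;
    }
}
```

For the input "#_Title_# and #_Content_#", CountGreedy should return 1 and CountLazy should return 2.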
Try this:
string[] inputs = {
    "#Title# and #Content#",
    "Product #TemplateName# #_Full_Product_Name_# more text. text text #_Short_Description_#"
};
string pattern = "(?'string'#[^#]+#)";
foreach (string input in inputs)
{
    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine(string.Join(",", matches.Cast<Match>().Select(x => x.Groups["string"].Value).ToArray()));
}
Console.ReadLine();
Your regular expression is not correct. In addition, you need to loop through the matches if you want all of them.
static void Main(string[] args)
{
    string input = "Product #_TemplateName_# #_Full_Product_Name_# more text. text text #_Short_Description_#",
           pattern = "#_[a-zA-Z_]*_#";
    Match match = Regex.Match(input, pattern);
    while (match.Success)
    {
        Console.WriteLine(match.Value);
        match = match.NextMatch();
    }
    Console.ReadLine();
}
Don't use anchors and change your regex to:
(#[^#]+#)
In a regex, the expression [^#] means any character except #.
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(#[^#]+#)";
        Regex rgx = new Regex(pattern);
        string sentence = "#blah blah# asdfasdfaf #somethingelse#";
        foreach (Match match in rgx.Matches(sentence))
            Console.WriteLine("Found '{0}' at position {1}",
                              match.Value, match.Index);
    }
}

Matching any word enclosed in parentheses in a sentence

I am trying to find a regex to match any word enclosed in parentheses in a sentence.
Suppose, I have a sentence.
"Welcome, (Hello, All of you) to the Stack Over flow."
If my search word is Hello,, All, of or you, it should return true.
A word can contain anything (numbers, symbols) but is separated from the other words by whitespace.
I tried \(([^)]*)\), but this returns all the words enclosed by the parentheses together.
static void Main(string[] args)
{
    string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
    Regex _regex = new Regex(@"\(([^)]*)\)");
    Match match = _regex.Match(ss.ToLower());
    if (match.Success)
    {
        ss = match.Groups[0].Value;
    }
}
Help and Guidance is very much appreciated.
Thanks.
Thanks, everyone, for your time and answers. I finally solved it by changing my code as suggested in Tim's reply.
For people with a similar problem, here is my final code:
static void Main(string[] args)
{
    string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
    Regex _regex = new Regex(@"[^\s()]+(?=[^()]*\))");
    Match match = _regex.Match(ss.ToLower());
    while (match.Success)
    {
        ss = match.Groups[0].Value;
        Console.WriteLine(ss);
        match = match.NextMatch();
    }
}
OK, so it seems that a "word" is anything that's not whitespace and doesn't contain parentheses, and that you want to match a word if the next parenthesis character that follows is a closing parenthesis.
So you can use
[^\s()]+(?=[^()]*\))
Explanation:
[^\s()]+ matches a "word" (should be easy to understand), and
(?=[^()]*\)) makes sure that a closing parenthesis follows:
(?= # Look ahead to make sure the following regex matches here:
[^()]* # Any number of characters except parentheses
\) # followed by a closing parenthesis.
) # (End of lookahead assertion)
I've developed a C# function for you, if you are interested.
using System.Collections.Generic;
using System.Linq;
public static class WordsHelper
{
    public static List<string> GetWordsInsideParenthesis(string s)
    {
        List<int> StartIndices = new List<int>();
        var rtn = new List<string>();
        var numOfOpen = s.Count(m => m == '(');
        var numOfClose = s.Count(m => m == ')');
        if (numOfClose == numOfOpen)
        {
            for (int i = 0; i < numOfOpen; i++)
            {
                int ss = 0, sss = 0;
                if (StartIndices.Count == 0)
                {
                    ss = s.IndexOf('(') + 1;
                    sss = s.IndexOf(')');
                }
                else
                {
                    // Look for the next '(' after the start of the previous group.
                    ss = s.IndexOf('(', StartIndices.Last()) + 1;
                    sss = s.IndexOf(')', ss);
                }
                StartIndices.Add(ss); // remember every start so later groups advance
                var words = s.Substring(ss, sss - ss).Split(' ');
                foreach (string ssss in words)
                {
                    rtn.Add(ssss);
                }
            }
        }
        return rtn;
    }
}
Just call it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var words = WordsHelper.GetWordsInsideParenthesis(text);
Now you'll have the list of words in the words variable.
In general, plain C# string handling can be easier to read than a regex, and for simple cases like this one it may also perform better.
But if you want to stick with a regex, keep the one from Tim Pietzcker, [^\s()]+(?=[^()]*\)), and use it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var values = Regex.Matches(text, @"[^\s()]+(?=[^()]*\))");
Now values contains a MatchCollection.
You can access each value through its index and the Value property, something like this:
string word = values[0].Value;
(?<=[(])[^)]+(?=[)])
Matches the whole text inside the parentheses as a single match
(?<=[(]) checks for a preceding (
[^)]+ matches everything up to but not including a )
(?=[)]) checks for a following )

C# Regex.Split method is adding an empty string before parentheses

I have some code that tokenizes an equation input into a string array:
string infix = "( 5 + 2 ) * 3 + 4";
string[] tokens = tokenizer(infix, @"([\+\-\*\(\)\^\\])");
foreach (string s in tokens)
{
    Console.WriteLine(s);
}
Now here is the tokenizer function:
public string[] tokenizer(string input, string splitExp)
{
    string noWSpaceInput = Regex.Replace(input, @"\s", "");
    Console.WriteLine(noWSpaceInput);
    Regex RE = new Regex(splitExp);
    return (RE.Split(noWSpaceInput));
}
When I run this, all the characters get split, but an empty string is inserted before the parenthesis characters. How do I remove it?
//empty string here
(
5
+
2
//empty string here
)
*
3
+
4
I would just filter them out:
public string[] tokenizer(string input, string splitExp)
{
    string noWSpaceInput = Regex.Replace(input, @"\s", "");
    Console.WriteLine(noWSpaceInput);
    Regex RE = new Regex(splitExp);
    return (RE.Split(noWSpaceInput)).Where(x => !string.IsNullOrEmpty(x)).ToArray();
}
What you're seeing happens because you have nothing followed by a separator (at the beginning of the string, before the opening parenthesis), and then two separator characters next to one another (the )* in the middle). This is by design.
As you may have found with String.Split, that method has an optional enum which you can give to have it remove any empty entries, however, there is no such parameter with regular expressions. In your specific case you could simply ignore any token with a length of 0.
foreach (string s in tokens.Where(tt => tt.Length > 0))
{
    Console.WriteLine(s);
}
Well, one option would be to filter them out afterwards:
return RE.Split(noWSpaceInput).Where(x => !string.IsNullOrEmpty(x)).ToArray();
Try this (if you don't want to filter the result):
tokenizer(infix, @"(?=[-+*()^\\])|(?<=[-+*()^\\])");
Perl demo:
perl -E "say join ',', split /(?=[-+*()^])|(?<=[-+*()^])/, '(5+2)*3+4'"
(,5,+,2,),*,3,+,4
That said, in my opinion it would be better to use a match instead of a split in this case.
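A match-based tokenizer along those lines might look like the sketch below. It assumes that tokens are unsigned integers or one of the single-character operators and parentheses in the pattern; MatchTokenizer and Tokenize are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchTokenizer
{
    // Hypothetical helper: assumes tokens are unsigned integers or one of the
    // single-character operators and parentheses listed in the pattern.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+|[-+*/^()]"))
        {
            tokens.Add(m.Value); // whitespace matches nothing and is skipped
        }
        return tokens;
    }
}
```

Since every match is a token, no empty strings appear and the whitespace does not need to be stripped first. The trade-off is that characters the pattern does not recognize are silently dropped, so real code would probably want to validate the input.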
I think you can use StringSplitOptions.RemoveEmptyEntries with String.Split:
static void Main(string[] args)
{
    string infix = "( 5 + 2 ) * 3 + 4";
    string[] results = infix.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    foreach (var result in results)
        Console.WriteLine(result);
    Console.ReadLine();
}
