C# string manipulation: Regex or Substring?

I have a
string CCstring = "CC01=50 CC02=300 CC03=500 CC04=40";
I want to store the individual values in separate strings like:
for (int i = 1; i <= 4; i++)
{
    string suffix = i.ToString().PadLeft(2, '0');
    string CCindividual = ""; // THIS IS WHERE I WOULD LIKE TO GET MY INDIVIDUAL VALUES, i.e. 50, 300, 500, 40
    Console.WriteLine("CC" + suffix + " = " + CCindividual); // Testing
}
Which string manipulation should I use, Regex or Substring? What would the code snippet look like?

One line:
string[] CCindividual = Regex.Split(CCstring, "CC[0-9]+=").Where(x => x != "").
Select(x => x.Trim()).ToArray<String>();
Not sure this is the most efficient way, though.

Can't you use Split to split first on spaces and then on '='? It's easier than Regex or Substring, IMHO.

Neither. You can use string.Split to get an array:
string CCstring = "CC01=50 CC02=300 CC03=500 CC04=40";
string[] strings = CCstring.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
After that, you are able to do the same for the = using string.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);.
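Put together, the two Split calls might look like the sketch below. The names CcParser and ParseValues are made up for illustration, and the code assumes every fragment has the form key=value; fragments that don't fit are simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class CcParser
{
    // Hypothetical helper: returns the value part of each "key=value" fragment.
    public static List<string> ParseValues(string input)
    {
        var values = new List<string>();

        // First split: one fragment per space-separated entry.
        string[] fragments = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string fragment in fragments)
        {
            // Second split: separate the key from the value.
            string[] pair = fragment.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
            if (pair.Length == 2)
            {
                values.Add(pair[1]);
            }
        }
        return values;
    }
}
```

Calling CcParser.ParseValues(CCstring) on the string from the question should give the four values 50, 300, 500 and 40 as strings.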

Unless you need this code to run very efficiently, you should be more concerned with what is readable for you and your team (sometimes that's Substring, Split, etc., and sometimes that's Regex). Only you can really decide.

Just use String.Split:
string CCstring = "CC01=50 CC02=300 CC03=500 CC04=40";
var result = CCstring.Split(' ')
.Select(s => s.Split('='))
.ToDictionary(kv => kv[0], kv => Convert.ToInt64(kv[1]));

Looking at your CCstring, it would definitely be faster to walk through the characters of the string just once. Of course, that isn't worth the effort unless you have tons of such strings.
So yes, it's easier to just use string.Split once for the spaces, and then once per fragment to split on '='.
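For what it's worth, a single pass over the characters could look roughly like this. It is only a sketch: SinglePassParser and ReadValues are invented names, and it assumes the input is well formed (key=value pairs separated by single spaces, no '=' inside a value).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SinglePassParser
{
    // Hypothetical helper: walks the string once and collects the text after each '='.
    public static List<string> ReadValues(string input)
    {
        var values = new List<string>();
        var current = new StringBuilder();
        bool inValue = false;

        foreach (char c in input)
        {
            if (c == '=')
            {
                inValue = true;               // the value starts after '='
            }
            else if (c == ' ')
            {
                if (inValue) values.Add(current.ToString());
                current.Clear();              // a space ends the current pair
                inValue = false;
            }
            else if (inValue)
            {
                current.Append(c);            // key characters are ignored
            }
        }

        // The last value has no trailing space, so flush it here.
        if (inValue) values.Add(current.ToString());
        return values;
    }
}
```

Whether this actually beats two Split calls on your data is something you would have to measure.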

Related

Splitting string on multi-character delimiter

string IdStr = "ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string[] stringSeparators = new string[] { "ID03I0" };
string[] result;
result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries);
This is the result:
result[0]=10102010210AEMPD4677E
result[1]=20102020208L8159734
result[2]=30102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
Desired result:
result[0]=ID03I010102010210AEMPD4677E
result[1]=ID03I020102020208L8159734
result[2]=ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
As you can see, I want to keep the delimiter ID03I0 in the elements.
NOTE: I know I can include it by hardcoding it. But that's not the way I want to do it.
result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries)
.Select(x => stringSeparators[0] + x).ToArray();
This adds the separator to the beginning of every element in your array.
EDIT: Unfortunately, this approach limits you to a single delimiter. If you need more than one, use Regex instead.
The following Regex pattern should work:
string input = "ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string delimiter = "ID03I0";//Modify it as you need
string pattern = string.Format("(?<=.)(?={0})", delimiter);
string[] result = Regex.Split(input, pattern);
Adapted from this answer.
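If you need more than one delimiter, the same lookahead idea can probably be extended as in the sketch below. DelimiterSplitter and SplitKeepingDelimiters are hypothetical names, and the delimiters are passed through Regex.Escape in case they contain regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Hypothetical helper: splits before every occurrence of any delimiter,
    // so each element keeps the delimiter it starts with.
    public static string[] SplitKeepingDelimiters(string input, params string[] delimiters)
    {
        // Escape each delimiter so characters like '.' or '+' are matched literally.
        string alternatives = string.Join("|", delimiters.Select(Regex.Escape));

        // (?<=.) requires a preceding character, so no split happens at position 0.
        string pattern = "(?<=.)(?=" + alternatives + ")";
        return Regex.Split(input, pattern);
    }
}
```

For example, SplitKeepingDelimiters("ID03I0aID03I0bPT03T0c", "ID03I0", "PT03T0") should return the three elements ID03I0a, ID03I0b and PT03T0c.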

Splitting on “,” but not “/,”

Question: How do I write an expression to split a string on ',' but not '/,'? Later I'll want to replace '/,' with ', '.
Details...
Delimiter: ','
Skip Char: '/'
Example input: "Mister,Bill,is,made,of/,clay"
I want to split this input into an array: {"Mister", "Bill", "is", "made", "of, clay"}
I know how to do this with a char prev, cur; and some indexers, but that seems beta.
Java Regex has a split functionality, but I don't know how to replicate this behavior in C#.
Note: This isn't a duplicate question, this is the same question but for a different language.
I believe you're looking for a negative lookbehind:
var regex = new Regex("(?<!/),");
var result = regex.Split(str);
This will split str on all commas that are not preceded by a slash. If you want to keep the '/,' in the string, then this will work for you.
Since you said that you wanted to split the string and later replace the '/,' with ', ', you'll want to do the above first; then you can iterate over the result and replace the strings like so:
var replacedResult = result.Select(s => s.Replace("/,", ", "));
string s = "Mister,Bill,is,made,of/,clay";
var arr = s.Replace("/,"," ").Split(',');
result : {"Mister", "Bill", "is", "made", "of clay"}
Using Regex:
var result = Regex.Split("Mister,Bill,is,made,of/,clay", "(?<=[^/]),");
Just use Replace to temporarily mask the escaped commas before splitting:
s.Replace("/,", "//").Split(',').Select(x => x.Replace("//", ","));
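You can use this in C# (a lookbehind is needed here, because a plain group such as (?:[^\/]) would consume the character before each comma):
string regex = @"(?<!/),";
var match = Regex.Split("Mister,Bill,is,made,of/,clay", regex, RegexOptions.IgnoreCase);
After that you can replace "/," and continue your operation as you like.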
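As a rough end-to-end sketch of the split-then-replace approach described in this thread (EscapedCommaSplitter is an invented name; the '/' escape character and the ", " replacement come from the question):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedCommaSplitter
{
    // Hypothetical helper: splits on ',' unless it is preceded by '/',
    // then turns each escaped "/," into ", ".
    public static string[] Split(string input)
    {
        return Regex.Split(input, "(?<!/),")
                    .Select(part => part.Replace("/,", ", "))
                    .ToArray();
    }
}
```

With the input "Mister,Bill,is,made,of/,clay" this should produce five elements, the last one being "of, clay".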

Is there a method for removing whitespace characters from a string?

Is there a string class member function (or something else) for removing all spaces from a string? Something like Python's str.strip() ?
You could simply do:
myString = myString.Replace(" ", "");
If you want to remove all white space characters you could use Linq, even if the syntax is not very appealing for this use case:
myString = new string(myString.Where(c => !char.IsWhiteSpace(c)).ToArray());
String.Trim method removes trailing and leading white spaces. It is the functional equivalent of Python's strip method.
LINQ feels like overkill here: converting a string to a list, filtering the list, then turning it back into a string. To remove all whitespace, I would go for a regular expression: Regex.Replace(s, @"\s", ""). This is a common idiom and has probably been optimized.
If you want to remove the spaces at the start or end of the string, have a look at TrimStart(), TrimEnd() and Trim().
If you're looking to replace all whitespace in a string (not just leading and trailing whitespace) based on .NET's determination of what's whitespace or not, you could use a pretty simple LINQ query to make it work.
string whitespaceStripped = new string((from char c in someString
where !char.IsWhiteSpace(c)
select c).ToArray());
Yes, Trim.
String a = "blabla ";
var b = a.Trim(); // or TrimEnd or TrimStart
Yes, String.Trim().
var result = " a b ".Trim();
gives "a b" in result. By default all whitespace is trimmed. If you want to remove only space you need to type
var result = " a b ".Trim(' ');
If you want to remove all spaces in a string you can use string.Replace().
var result = " a b ".Replace(" ", "");
gives "ab" in result. But that is not equivalent to str.strip() in Python.
I don't know much about Python...
If str.strip() just removes whitespace at the start and the end, then you could use str = str.Trim() in .NET; otherwise you could use str = str.Replace(" ", "") to remove all spaces.
If it removes all whitespace, then use
str = new string((from c in str where !char.IsWhiteSpace(c) select c).ToArray());
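To make the difference between the three options concrete, here is a small side-by-side sketch. WhitespaceDemo and its method names are made up for illustration; the calls inside are the ones mentioned in the answers.

```csharp
using System;
using System.Linq;

public static class WhitespaceDemo
{
    // Leading and trailing whitespace only (closest to Python's str.strip()).
    public static string TrimEnds(string s) => s.Trim();

    // Every ' ' character, but not tabs or newlines.
    public static string RemoveSpaces(string s) => s.Replace(" ", "");

    // Every character that char.IsWhiteSpace reports as whitespace.
    public static string RemoveAllWhitespace(string s) =>
        new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
}
```

For the input " a\tb \n", TrimEnds keeps the inner tab and gives "a\tb", RemoveSpaces gives "a\tb\n", and RemoveAllWhitespace gives "ab".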
There are many different ways, some faster than others:
public static string StripTabsAndNewlines(this string s) {
    // StringBuilder (fast)
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.Length; i++) {
        if (!Char.IsWhiteSpace(s[i])) {
            sb.Append(s[i]);
        }
    }
    return sb.ToString();
    // Alternative, LINQ (faster?):
    // return new string(s.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    // Alternative, Regex (slow):
    // return Regex.Replace(s, @"\s+", "");
}
you could use
StringVariable.Replace(" ","")
I'm surprised no one mentioned this:
String.Join("", " all manner\tof\ndifferent\twhite spaces!\n".Split())
string.Split by default splits along the characters that are char.IsWhiteSpace so this is a very similar solution to filtering those characters out by the direct use of char.IsWhiteSpace and it's a one-liner that works in pre-LINQ environments as well.
Strip spaces? Strip whitespaces? Why should it matter? It only matters if we're searching for an existing implementation, but let's not forget how fun it is to program the solution rather than search MSDN (boring).
You should be able to strip any chars from any string by using 1 of the 2 functions below.
You can remove any chars like this
static string RemoveCharsFromString(string textChars, string removeChars)
{
string tempResult = "";
foreach (char c in textChars)
{
if (!removeChars.Contains(c))
{
tempResult = tempResult + c;
}
}
return tempResult;
}
or you can enforce a character set (so to speak) like this
static string EnforceCharLimitation(string textChars, string allowChars)
{
string tempResult = "";
foreach (char c in textChars)
{
if (allowChars.Contains(c))
{
tempResult = tempResult + c;
}
}
return tempResult;
}
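One caveat with the two functions above: concatenating strings in a loop allocates a new string on every iteration. A variant using StringBuilder might look like the sketch below (CharFilter and RemoveChars are hypothetical names, not part of the original answer).

```csharp
using System;
using System.Text;

public static class CharFilter
{
    // Hypothetical StringBuilder-based variant of RemoveCharsFromString.
    public static string RemoveChars(string text, string removeChars)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // IndexOf returns -1 when c is not one of the characters to remove.
            if (removeChars.IndexOf(c) < 0)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For short strings the difference is unlikely to matter; it mostly shows up on long inputs.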

Regular expression to split string and number

I have a string of the form:
codename123
Is there a regular expression that can be used with Regex.Split() to split the alphabetic part and the numeric part into a two-element string array?
I know you asked for the Split method, but as an alternative you could use named capturing groups:
var numAlpha = new Regex("(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)");
var match = numAlpha.Match("codename123");
var alpha = match.Groups["Alpha"].Value;
var num = match.Groups["Numeric"].Value;
string[] splitArray = Regex.Split("codename123", @"(?<=\p{L})(?=\p{N})");
will split between a Unicode letter and a Unicode digit.
Regex is a little heavy handed for this, if your string is always of that form. You could use
"codename123".IndexOfAny(new char[] {'1','2','3','4','5','6','7','8','9','0'})
and two calls to Substring.
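Spelled out, that could look something like this. It is a sketch under the assumption that the input is letters followed by digits; AlphaNumericSplitter is an invented name, and the no-digit case returns an empty numeric part.

```csharp
using System;

public static class AlphaNumericSplitter
{
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Hypothetical helper: splits at the first digit using IndexOfAny and Substring.
    public static string[] Split(string input)
    {
        int firstDigit = input.IndexOfAny(Digits);
        if (firstDigit < 0)
        {
            // No digit found: everything is the alphabetic part.
            return new[] { input, "" };
        }
        return new[] { input.Substring(0, firstDigit), input.Substring(firstDigit) };
    }
}
```

AlphaNumericSplitter.Split("codename123") should give "codename" and "123".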
A little verbose, but
Regex.Split("codename123", @"(?<=[a-zA-Z])(?=\d)");
Can you be more specific about your requirements? Maybe a few other input examples.
IMO, it would be a lot easier to find matches, like:
Regex.Matches("codename123", @"[a-zA-Z]+|\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
rather than to use Regex.Split.
Well, it's a one-liner: Regex.Split("codename123", "^([a-z]+)"); Note that the captured group is included in the result and the array starts with an empty string.
Another simpler way is
string originalstring = "codename123";
string alphabets = string.Empty;
string numbers = string.Empty;
foreach (char item in originalstring)
{
    if (Char.IsLetter(item))
        alphabets += item;
    if (Char.IsNumber(item))
        numbers += item;
}
This code is written in Java; the logic should be the same elsewhere.
public String splitStringAndNumber(String string) {
String pattern = "(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(string);
if (m.find()) {
return (m.group(1) + " " + m.group(2));
}
return "";
}

Perform Trim() while using Split()

Today I was wondering if there is a better way to do what the following code sample does.
string keyword = " abc, foo , bar";
string match = "foo";
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in split)
{
if (s.Trim() == match) { /* do something */ break; }
}
Is there a way to perform trim() without manually iterating through each item? I'm looking for something like 'split by the following chars and automatically trim each result'.
Ah, immediately before posting I found
List<string> parts = line.Split(';').Select(p => p.Trim()).ToList();
in How can I split and trim a string into parts all on one line?
Still I'm curious: Might there be a better solution to this? (Or would the compiler probably convert them to the same code output as the Linq-Operation?)
Another possible option (that avoids LINQ, for better or worse):
string line = " abc, foo , bar";
string[] parts= Array.ConvertAll(line.Split(','), p => p.Trim());
However, if you just need to know if it is there - perhaps short-circuit?
bool contains = line.Split(',').Any(p => p.Trim() == match);
var parts = line
.Split(';')
.Select(p => p.Trim())
.Where(p => !string.IsNullOrWhiteSpace(p))
.ToArray();
I know this is 10 years too late but you could have just split by ' ' as well:
string[] split= keyword.Split(new char[] { ',', ';', ' ' }, StringSplitOptions.RemoveEmptyEntries);
Because you're also splitting by the space char AND instructing the split to remove the empty entries, you'll have what you need.
If spaces only surround the words in the comma-separated string, this will work:
var keyword = " abc, foo , bar";
var array = keyword.Replace(" ", "").Split(',');
if (array.Contains("foo"))
{
Debug.Print("Match");
}
I would suggest using regular expressions on the original string, looking for the pattern "any number of spaces followed by one of your delimiters followed by one or more spaces" and remove those spaces. Then split.
Try this:
string keyword = " abc, foo , bar";
string match = "foo";
string[] split = Regex.Split(keyword.Trim(), @"\s*[,;]\s*");
if (split.Contains(match))
{
// do stuff
}
You're going to find a lot of different methods of doing this, and the differences in performance and accuracy aren't going to be readily apparent. I'd recommend plugging them all into a testing suite like NUnit in order to find both which one comes out on top AND which ones are accurate.
Use small, medium, and large amounts of text in loops to examine the various situations.
Starting with .NET 5, there is an easier option:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries);
You can combine it with the option to remove empty entries:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
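Wrapped in a small helper, the combined options might be used like this. KeywordSplitter is a made-up name, and the sketch assumes .NET 5 or later, since StringSplitOptions.TrimEntries does not exist in earlier versions.

```csharp
using System;

public static class KeywordSplitter
{
    // Hypothetical helper: splits on ',' and ';', trims each entry,
    // and drops entries that are empty after trimming.
    public static string[] Split(string keyword)
    {
        return keyword.Split(
            new char[] { ',', ';' },
            StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
    }
}
```

KeywordSplitter.Split(" abc, foo , bar") should return "abc", "foo" and "bar", so Array.IndexOf(parts, "foo") >= 0 is enough for the match check from the question.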
