Perform Trim() while using Split() - c#

Today I was wondering whether there is a better way to write the following code sample.
string keyword = " abc, foo , bar";
string match = "foo";
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in split)
{
    if (s.Trim() == match)
    {
        // match found; handle it here
        break;
    }
}
Is there a way to perform trim() without manually iterating through each item? I'm looking for something like 'split by the following chars and automatically trim each result'.
Ah, immediately before posting I found
List<string> parts = line.Split(';').Select(p => p.Trim()).ToList();
in How can I split and trim a string into parts all on one line?
Still, I'm curious: might there be a better solution to this? (Or would the compiler probably produce the same output as for the LINQ operation?)

Another possible option (that avoids LINQ, for better or worse):
string line = " abc, foo , bar";
string[] parts= Array.ConvertAll(line.Split(','), p => p.Trim());
However, if you just need to know if it is there - perhaps short-circuit?
bool contains = line.Split(',').Any(p => p.Trim() == match);

var parts = line
.Split(';')
.Select(p => p.Trim())
.Where(p => !string.IsNullOrWhiteSpace(p))
.ToArray();

I know this is 10 years too late but you could have just split by ' ' as well:
string[] split= keyword.Split(new char[] { ',', ';', ' ' }, StringSplitOptions.RemoveEmptyEntries);
Because you're also splitting by the space char AND instructing the split to remove the empty entries, you'll have what you need.

If spaces only surround the words in the comma-separated string (and never occur inside a word), this will work:
var keyword = " abc, foo , bar";
var array = keyword.Replace(" ", "").Split(',');
if (array.Contains("foo"))
{
Debug.Print("Match");
}

I would suggest using a regular expression on the original string: look for the pattern "any number of spaces, followed by one of your delimiters, followed by any number of spaces" and remove those spaces. Then split.
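A minimal sketch of that idea, assuming the delimiters are ',' and ';' as in the question. The class and method names (DelimiterCleanup, SplitClean) are made up for illustration, and the pattern uses \s* on both sides of the delimiter, which is one reasonable reading of the description above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterCleanup
{
    // Hypothetical helper: removes whitespace around ',' or ';', then splits.
    public static string[] SplitClean(string input)
    {
        // Trim the ends, then collapse "  ,  " to "," (the delimiter is kept via $1).
        string compact = Regex.Replace(input.Trim(), @"\s*([,;])\s*", "$1");
        return compact.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] parts = SplitClean(" abc, foo , bar");
        Console.WriteLine(string.Join("|", parts)); // abc|foo|bar
    }
}
```

Whitespace inside a token (for example "Foo Bar") is left alone, because only whitespace adjacent to a delimiter is matched.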

Try this:
string keyword = " abc, foo , bar";
string match = "foo";
string[] split = Regex.Split(keyword.Trim(), @"\s*[,;]\s*");
if (split.Contains(match))
{
// do stuff
}

You're going to find a lot of different ways of doing this, and the differences in performance and accuracy won't be readily apparent. I'd recommend plugging them all into a test framework such as NUnit, both to see which ones give correct results and which one comes out on top.
Use small, medium, and large amounts of text in loops to examine the various situations.
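As a rough sketch of such a comparison, here is a version that uses a plain Stopwatch loop instead of NUnit (that substitution is mine, to keep the example self-contained). The names SplitComparison, WithLinq, WithRegex and Time are hypothetical, and the timings it prints are only indicative, not a proper benchmark:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Candidate 1: split, then trim each part with LINQ.
    public static string[] WithLinq(string s)
    {
        return s.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(p => p.Trim())
                .ToArray();
    }

    // Candidate 2: let the regex swallow the whitespace around each delimiter.
    public static string[] WithRegex(string s)
    {
        return Regex.Split(s.Trim(), @"\s*[,;]\s*");
    }

    // Runs one candidate `iterations` times and returns the elapsed milliseconds.
    public static long Time(Func<string, string[]> split, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) split(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string input = " abc, foo , bar";

        // Accuracy first: both candidates should agree on the sample input.
        Console.WriteLine(WithLinq(input).SequenceEqual(WithRegex(input))); // True

        // Then a crude timing; repeat with medium and large inputs as well.
        Console.WriteLine("LINQ:  " + Time(WithLinq, input, 100000) + " ms");
        Console.WriteLine("Regex: " + Time(WithRegex, input, 100000) + " ms");
    }
}
```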

Starting with .NET 5, there is an easier option:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries);
You can combine it with the option to remove empty entries:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);

Related

Split string by second comma, then by third comma

I have a string which looks like this:
Less than $5,000, $5,000-$9,999, $10,000-$14,999, $45,000-$49,999, $50,000-$54,999
And what I would like to have is:
Less than $5,000 or $5,000-$9,999 or $10,000-$14,999 or $45,000-$49,999 or $50,000-$54,999
What I tried is:
items[1].Split(new char[1] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(e => new AnswerModel { Name = e }).ToList()
Name should be "Less than $5,000" then "$5,000-$9,999" and so on.
But it splits by every comma, I would need to split it first by the second and then by the third.
What would be the best approach?
Maybe you can split by ", "
string s = "Less than $5,000, $5,000-$9,999, $10,000-$14,999, $45,000-$49,999, $50,000-$54,999";
string result = string.Join(" or ", s.Split(new []{", "},StringSplitOptions.None));
Returns your desired result:
Less than $5,000 or $5,000-$9,999 or $10,000-$14,999 or
$45,000-$49,999 or $50,000-$54,999
If your string looks exactly as you pasted here, and it will always have such pattern, then you can split on a string of two characters:
items[1].Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
Update:
But if your string is without spaces, then you should replace first with some unique character like |, and then split over that new character:
items[1].Replace(",$", "|$").Split(new char[1] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Alternatively, you can replace ",$" with " or $" directly; the string is then already in the desired format, and there is no need to split and join again.
This should do the trick.
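A small sketch of the replace-only variant, assuming every separator comma is immediately followed by "$" or " $" as in the sample string. RangeJoiner and JoinWithOr are names invented for this example:

```csharp
using System;

public static class RangeJoiner
{
    // Replaces each separator comma (the one right before '$') with " or ".
    // Commas inside amounts such as "$5,000" are followed by a digit, so they are untouched.
    public static string JoinWithOr(string input)
    {
        return input.Replace(", $", " or $")  // separators written with a space
                    .Replace(",$", " or $");  // separators written without a space
    }

    public static void Main()
    {
        string s = "Less than $5,000, $5,000-$9,999, $10,000-$14,999";
        Console.WriteLine(JoinWithOr(s));
        // Less than $5,000 or $5,000-$9,999 or $10,000-$14,999
    }
}
```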

Using lambdas in C# to perform multiple functions on an array

I have a string which I would like to split on a particular delimiter and then remove starting and trailing whitespace from each member. Currently the code looks like:
string s = "A, B, C ,D";
string[] parts = s.Split(',');
for (int i = 0; i < parts.Length; i++)
{
parts[i] = parts[i].Trim();
}
I feel like there should be a way to do this with lambdas, so that it could fit on one line, but I can't wrap my head around it. I'd rather stay away from LINQ, but I'm not against it as a solution either.
string s = "A, B, C ,D";
string[] parts = s.Split(','); // This line should be able to perform the trims as well
I've been working in Python recently and I think that's what has made me revisit how I think about solutions to problems in C#.
What about:
string[] parts = s.Split(',').Select(x => x.Trim()).ToArray();
var parts = s.Split(',').Select(part => part.Trim());
If you really want to avoid LINQ, you can split on multiple characters and discard the extra "empty" entries you get between the "," and spaces. Note that you can end up getting odd results (e.g. if you have consecutive "," delimiters you won't get the empty string in between them anymore):
s.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This will work for your sample input, but it's very fragile. For example, as @Oscar points out, whitespace inside your tokens will cause them to get split as well. I'd highly recommend you go with one of the LINQ-based options instead.

How to break a string at each comma?

Hi guys I have a problem at hand that I can't seem to figure out, I have a string (C#) which looks like this:
string tags = "cars, motor, wheels, parts, windshield";
I need to break this string at every comma and assign each word to a new string by itself, like:
string individual_tag = "car";
I know I have to do some kind of loop here, but I'm not really sure how to approach this. Any help would be really appreciated.
No loop needed. Just a call to Split():
var individualStrings = tags.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
You can use one of String.Split methods
Split Method (Char[])
Split Method (Char[], StringSplitOptions)
Split Method (String[], StringSplitOptions)
Let's try the second option:
I pass ',' and space as the split characters, so the input string is split at every occurrence of either character. That can leave empty strings in the result; we can remove them with the StringSplitOptions.RemoveEmptyEntries parameter.
string[] tagArray = tags.Split(new char[]{',', ' '},
StringSplitOptions.RemoveEmptyEntries);
OR
string[] tagArray = tags.Split(", ".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
You can access each tag like this:
foreach (var t in tagArray )
{
lblTags.Text = lblTags.Text + " " + t; // update the label with the tag values
//System.Diagnostics.Debug.WriteLine(t); // this output appears in the Visual Studio Output window
}
The Split function will do this for you:
string[] s = tags.Split(',');
or
String.Split Method (Char[], StringSplitOptions)
char[] charSeparators = new char[] {',',' '};
string[] words = tags.Split(charSeparators, StringSplitOptions.RemoveEmptyEntries);
string[] words = tags.Split(',');
You are looking for the C# Split() method.
string[] tagArray = tags.Split(',');
Edit:
string[] tag = tags.Trim().Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
You should definitely use the form supplied by Justin Niessner. There were two key differences that may be helpful depending on the input you receive:
You had spaces after your ,s so it would be best to split on ", "
StringSplitOptions.RemoveEmptyEntries will remove the empty entry that is possible in the case that you have a trailing comma.
Program that splits on commas and spaces [C#]
using System;
class Program
{
static void Main()
{
string s = "there, is, a, cat";
string[] words = s.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
Console.WriteLine(word);
}
}
}
Output
there
is
a
cat

StringSplitOptions.RemoveEmptyEntries doesn't work as advertised

I've come across this several times in the past and have finally decided to find out why.
StringSplitOptions.RemoveEmptyEntries would suggest that it removes empty entries.
So why does this test fail?
var tags = "One, Two, , Three, Foo Bar, , Day , ";
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
tagsSplit.ShouldEqual(new string[] {
"One",
"Two",
"Three",
"Foo Bar",
"Day"
});
The result:
Values differ at index [2]
Expected string length 5 but was 0. Strings differ at index 0.
Expected: "Three"
But was: <string.Empty>
So it fails because instead of "Three", we have an empty string – exactly what StringSplitOptions.RemoveEmptyEntries should prevent.
Most likely because you change the string after the split. You trim the values after splitting them, RemoveEmptyEntries doesn't consider the string " " empty.
The following would achieve what you want, by doing the empty-element removal yourself after trimming:
var tagsSplit = tags.Split(',').
Select(tag => tag.Trim()).
Where( tag => !string.IsNullOrEmpty(tag));
Adjacent delimiters yield an array element that contains an empty
string (""). The values of the StringSplitOptions enumeration specify
whether an array element that contains an empty string is included in
the returned array.
" " by definition is not empty (it is actually whitespace), so it is not removed from resulting array.
If you use .NET Framework 4 or later, you can work around that with the string.IsNullOrWhiteSpace method:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Where(x => !string.IsNullOrWhiteSpace(x))
.Select(s => s.Trim());
RemoveEmptyEntries does not treat whitespace as empty.
Your input string contains many spaces, and a space is not empty: it is a character with its own ASCII code. So the code:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
means:
Split the input on ',' and remove the empty entries, which does not include entries made only of spaces. So
you get an array with some space-only elements.
Then you trim each element, and the space-only elements become empty.
That is why you get this result.
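To make the order of operations concrete, here is a small sketch that runs both orders on the string from the question. SplitThenTrimDemo and its two method names are hypothetical, chosen only for this illustration:

```csharp
using System;
using System.Linq;

public static class SplitThenTrimDemo
{
    // The question's order: remove empty entries first, trim afterwards.
    public static string[] SplitThenTrim(string tags)
    {
        return tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(s => s.Trim())
                   .ToArray();
    }

    // Trim first, then drop whatever became empty.
    public static string[] TrimThenFilter(string tags)
    {
        return tags.Split(',')
                   .Select(s => s.Trim())
                   .Where(s => s.Length > 0)
                   .ToArray();
    }

    public static void Main()
    {
        string tags = "One, Two, , Three, Foo Bar, , Day , ";

        // No entry is empty at split time (each holds at least one space),
        // so RemoveEmptyEntries removes nothing.
        Console.WriteLine(SplitThenTrim(tags).Length);  // 8

        // Trimming first turns the space-only entries into "", which are then filtered out.
        Console.WriteLine(TrimThenFilter(tags).Length); // 5
    }
}
```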
In .NET 5, they added StringSplitOptions.TrimEntries.
Since StringSplitOptions has the [System.Flags] attribute, it means you can write
var splitResult = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
When both RemoveEmptyEntries and TrimEntries are specified, it removes both empty values and values which only contain whitespace, whilst trimming all the remaining values.
Try
var tagsSplit = tags.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This will split by comma and space, and eliminate empty strings. Note that it also splits "Foo Bar" into two entries.
I've searched also for a clean way to exclude whitespace-entries during a Split, but since all options seemed like some kind of workarounds, I've chosen to exclude them when looping over the array.
string[] tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string tag in tagsSplit.Where(t => !string.IsNullOrWhiteSpace(t))) { }
I think this looks cleaner and, as a bonus, .Split(...).ToArray() may be omitted.
Of course, it is an option only when you may loop just after split and do not have to store entries for later use.
As this is a very common need, I went ahead and wrapped the most popular answer in a string extension method:
public static IEnumerable<string> Split_RemoveWhiteTokens(this string s, params char[] separator)
{
return s.Split(separator).
Select(tag => tag.Trim()).
Where(tag => !string.IsNullOrEmpty(tag));
}
To split on ',' as the other examples, use like this:
var result = yourString.Split_RemoveWhiteTokens(',');
Note that the return type is IEnumerable<string>, so you can run additional LINQ queries directly on the result. Call .ToList() if you want to convert the result to a list.
By EmptyEntries it means a case where two delimiters are directly next to each other with nothing in between. Without using this option, it will print a blank line to represent this delimitation. If you use the "RemoveEmptyEntries" option it will not show the delimitation unless there is actually something between the delimiters. A blank space counts as something between the delimiters. If you tried:
One, Two,, Three,
You should find that RemoveEmptyEntries eliminates the delimitation between the two commas and goes straight from Two to Three.
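A quick sketch to check that claim on the suggested input. EmptyEntryDemo and its methods are names made up for this example:

```csharp
using System;

public static class EmptyEntryDemo
{
    // Plain split: adjacent delimiters and a trailing delimiter produce "" entries.
    public static string[] KeepEmpty(string input)
    {
        return input.Split(',');
    }

    // Same split, but entries that are exactly "" are dropped.
    public static string[] DropEmpty(string input)
    {
        return input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "One, Two,, Three,";

        // "One", " Two", "", " Three", ""
        Console.WriteLine(KeepEmpty(input).Length); // 5

        // "One", " Two", " Three" (leading spaces are kept, since nothing is trimmed)
        Console.WriteLine(DropEmpty(input).Length); // 3
    }
}
```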
var tagsSplit = tags.Split(',')
.Where(str => !string.IsNullOrWhiteSpace(str))
.Select(s => s.Trim());

Substring of a variant string

I have the following return of a printer:
{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}
Ok, I need to save, in a list, the return in parts.
e.g.
[0] "Ta005000000000000000000F"
[1] "00000000000000000I"
[2] "00000000000000000N"
...
The problem is that the number of characters varies.
I tried to walk to each space and take the substring up to it, but failed...
Any suggestion?
Use String.Split on a single space, and use StringSplitOptions.RemoveEmptyEntries to make sure that multiple spaces are seen as only one delimiter:
var source = "00000000000000000FS 0000000...etc";
var myArray = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
EDIT: An elegant way to get rid of the braces is to include them as separators in the Split (thanks to Joachim Isaksson in the comments):
var myArray = source.Split(new[] {' ', '{', '}'}, StringSplitOptions.RemoveEmptyEntries);
You could use a Regex for this:
string input = "{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}";
IEnumerable<string> matches = Regex.Matches(input, "[0-9a-zA-Z]+").Cast<Match>().Select(m => m.Value);
You can use String.Split to create an array of substrings. Split lets you specify multiple separator characters and, with StringSplitOptions.RemoveEmptyEntries, ignore the empty entries produced by repeated separators.
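For example, a sketch that returns the parts as a list, as the question asks. PrinterReply and Parse are hypothetical names, and the input is a shortened version of the printer reply:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrinterReply
{
    // Splits on spaces and on the surrounding braces; repeated separators
    // only produce empty entries, which RemoveEmptyEntries drops.
    public static List<string> Parse(string reply)
    {
        return reply.Split(new[] { ' ', '{', '}' }, StringSplitOptions.RemoveEmptyEntries)
                    .ToList();
    }

    public static void Main()
    {
        // Shortened sample; note the double space before the third part.
        string reply = "{Ta005000000000000000000F 00000000000000000I  00000000000000000N}";
        List<string> parts = Parse(reply);
        Console.WriteLine(parts.Count); // 3
        Console.WriteLine(parts[0]);    // Ta005000000000000000000F
    }
}
```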
You could use the Split method of the String class to break the string into the parts you want.
Sample would be:
string input = "Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000";
string[] splits = input.Split(' ');
Console.WriteLine(splits[0]); // Ta005000000000000000000F
And so on.
Just off the bat. Without considering the encompassing braces:
string printMsg = "Ta005000000000000000000F 00000000000000000I " +
"00000000000000000N 00000000000000000FS " +
"00000000000000000IS 00000000000000000NS 00000000000000000";
string[] msgs = printMsg.Split(' ').Select(s => s.Trim()).ToArray();
Could work.
