StringSplitOptions.RemoveEmptyEntries doesn't work as advertised - c#

I've come across this several times in the past and have finally decided to find out why.
StringSplitOptions.RemoveEmptyEntries would suggest that it removes empty entries.
So why does this test fail?
var tags = "One, Two, , Three, Foo Bar, , Day , ";
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
tagsSplit.ShouldEqual(new string[] {
"One",
"Two",
"Three",
"Foo Bar",
"Day"
});
The result:
Values differ at index [2]
Expected string length 5 but was 0. Strings differ at index 0.
Expected: "Three"
But was: <string.Empty>
So it fails because instead of "Three", we have an empty string – exactly what StringSplitOptions.RemoveEmptyEntries should prevent.

Most likely because you change the strings after the split. You trim the values after splitting them, and RemoveEmptyEntries doesn't consider the string " " empty.
The following would achieve what you want, basically creating your own strip empty elements:
var tagsSplit = tags.Split(',').
Select(tag => tag.Trim()).
Where( tag => !string.IsNullOrEmpty(tag));
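To see where the empty string at index 2 comes from, here is a small sketch using the question's input. The variable names raw, trimmed and cleaned are mine, not from the question:

```csharp
using System;
using System.Linq;

var tags = "One, Two, , Three, Foo Bar, , Day , ";

// RemoveEmptyEntries only drops zero-length strings; " " survives the split.
string[] raw = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(raw.Length);                  // 8

// Trimming afterwards turns the whitespace-only entries into empty strings.
string[] trimmed = raw.Select(s => s.Trim()).ToArray();
Console.WriteLine(trimmed[2] == string.Empty);  // True

// Trim first, then filter, so the blanks are removed after they become empty.
string[] cleaned = tags.Split(',')
    .Select(s => s.Trim())
    .Where(s => s.Length > 0)
    .ToArray();
Console.WriteLine(string.Join("|", cleaned));   // One|Two|Three|Foo Bar|Day
```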

Adjacent delimiters yield an array element that contains an empty
string (""). The values of the StringSplitOptions enumeration specify
whether an array element that contains an empty string is included in
the returned array.
" " by definition is not empty (it is actually whitespace), so it is not removed from resulting array.
If you use .NET Framework 4 or later, you can work around that with the string.IsNullOrWhiteSpace method:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Where(x => !string.IsNullOrWhiteSpace(x))
.Select(s => s.Trim());

RemoveEmptyEntries does not remove entries that consist of spaces.
Your input string includes many spaces, and a space is not empty: it is a character like any other, with its own ASCII code. So the code:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
means:
Split the input by ',' and remove the empty entries, which does not include entries made of spaces. So
you get an array with some whitespace-only elements.
Then you trim each element, and the whitespace-only elements become empty strings.
That's why you get them.

In .NET 5, they added StringSplitOptions.TrimEntries.
Since StringSplitOptions has the [System.Flags] attribute, it means you can write
var splitResult = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
When both RemoveEmptyEntries and TrimEntries are specified, it removes both empty values and values which only contain whitespace, whilst trimming all the remaining values.
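As a quick check against the question's input, a minimal sketch (it needs .NET 5 or later, and the Console output is only there for illustration):

```csharp
using System;

var tags = "One, Two, , Three, Foo Bar, , Day , ";

// StringSplitOptions.TrimEntries is only available from .NET 5 onwards.
string[] splitResult = tags.Split(
    new[] { ',' },
    StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

Console.WriteLine(string.Join("|", splitResult)); // One|Two|Three|Foo Bar|Day
```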

Try
var tagsSplit = tags.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This will split by comma and space, and eliminate empty strings. Note that it also splits "Foo Bar" into two entries, "Foo" and "Bar".

I've searched also for a clean way to exclude whitespace-entries during a Split, but since all options seemed like some kind of workarounds, I've chosen to exclude them when looping over the array.
string[] tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string tag in tagsSplit.Where(t => !string.IsNullOrWhiteSpace(t))) { }
I think this looks cleaner and - as a bonus - .Split(...).ToArray() may be omitted.
Of course, it is an option only when you may loop just after split and do not have to store entries for later use.

As this is a very common need, I went ahead and wrapped the most popular answer in a string extension method:
public static IEnumerable<string> Split_RemoveWhiteTokens(this string s, params char[] separator)
{
return s.Split(separator).
Select(tag => tag.Trim()).
Where(tag => !string.IsNullOrEmpty(tag));
}
To split on ',' as the other examples, use like this:
var result = yourString.Split_RemoveWhiteTokens(',')
Note that the return type is IEnumerable<string>, so you can do additional LINQ queries directly on the result. Call .ToList() if you want to convert the result to a list.

By empty entries it means the case where two delimiters are directly next to each other with nothing in between. Without this option, the result contains an empty string to represent that gap. With the RemoveEmptyEntries option, an entry is only returned when there is actually something between the delimiters. A blank space counts as something between the delimiters. If you tried:
One, Two,, Three,
You should find that RemoveEmptyEntries drops the empty entry between the two commas and goes straight from Two to Three.
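A quick check of that input, as a sketch (the variable names are mine). Note that the surviving entries keep their leading spaces, because nothing here trims them:

```csharp
using System;

var input = "One, Two,, Three,";

string[] all = input.Split(new[] { ',' }, StringSplitOptions.None);
string[] nonEmpty = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(all.Length);       // 5: "One", " Two", "", " Three", ""
Console.WriteLine(nonEmpty.Length);  // 3: "One", " Two", " Three"
```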

var tagsSplit = tags.Split(',')
.Where(str => !String.IsNullOrWhiteSpace(str))
.Select(s => s.Trim());

Related

Trim space in <list>string C#

I am working on an application where I have multiple ID in a string that I passed from my view separated by a ';'.
So this is what it looks like "P171;P172".
if (ModelState.IsValid)
{
hiddenIDnumber= hiddenIDnumber.Trim();
List<string> listStrLineElements = hiddenIDnumber.Split(';').ToList();
foreach (string str in listStrLineElements)
The problem is, when I split my hiddenIDnumber, even if I have two numbers, I get a count of 3 and "" is returned (which I believe is an empty space).
When I use a breakpoint I get "P171", "P172" AND "".
This is causing my program to fail because of my FK constraints.
Is there a way to overcome this and somehow "trim" the space out?
Use another overload of string.Split which allows you to ignore empty entries. For example:
List<string> listStrLineElements = hiddenIDnumber
.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
I would say that one way to do this would be to use StringSplitOptions. String.Split has an overload that takes two arguments, for example:
myString.Split(new [] {';'}, StringSplitOptions.RemoveEmptyEntries);
This should prevent any entries in your array that would only be an empty string.
var listStrLineElements = hiddenIDnumber.Split(new char[]{';'}, StringSplitOptions.RemoveEmptyEntries);
Use the parameter StringSplitOptions.RemoveEmptyEntries to automatically remove empty entries from the result list.
You can try:
IList<string> listStrLineElements = hiddenIDnumber.Split(";".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
I prefer this over new [] { ';' } for readability, and I assign the result to an interface type (IList<string>).
You will end up with the number of ;s plus one when you split. Since your comment mentions you have 2 ;s, you will get 3 in your list: before the first semicolon, between the first and the second, and after the second. You are not getting an empty space, you are getting a string.Empty because you have nothing after the last ;
if (ModelState.IsValid)
{
hiddenIDnumber= hiddenIDnumber.Trim(';');
List<string> listStrLineElements = hiddenIDnumber.Split(';').ToList();
foreach (string str in listStrLineElements)
This way you get rid of the ; at the end before you split, and you don't get an empty string back.
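The three approaches from the answers can be compared side by side. This is only a sketch: it assumes the incoming string ends with a trailing ';' (which is what would produce the third, empty entry), and the variable names are made up for the example:

```csharp
using System;

// Assumed input: the IDs from the question, but with a trailing ';'.
var hiddenIDnumber = "P171;P172;";

// Two separators always give three entries; the last one is string.Empty.
string[] plain = hiddenIDnumber.Split(';');

// Either drop empty entries during the split...
string[] noEmpty = hiddenIDnumber.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

// ...or remove the trailing separator before splitting.
string[] trimmedFirst = hiddenIDnumber.TrimEnd(';').Split(';');

Console.WriteLine(plain.Length);        // 3
Console.WriteLine(noEmpty.Length);      // 2
Console.WriteLine(trimmedFirst.Length); // 2
```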

How to remove Whitespace from stringArray formed based on whitespace

I have a string which contains a value like this:
90 524 000 1234567890 2207 1926 00:34 02:40 S
I have broken this string into a string array based on white-space. Now I want to create one more string array in such a way that all the white-space is removed and it contains only the real values.
I also want to get the position of an element in the original string array, based on a selection from the new array formed by removing white-space.
Please help me.
You can use StringSplitOptions.RemoveEmptyEntries via String.Split.
var values = input.Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries: The return value does not include array elements that contain an empty string
When the Split method encounters two consecutive white-space characters, it returns an empty string between them. Using StringSplitOptions.RemoveEmptyEntries removes the empty strings and gives you only the values you want.
You can also achieve this using LINQ
var values = input.Split().Where(x => x != string.Empty).ToArray();
Edit: If I understand you correctly you want the positions of the values in your old array. If so you can do this by creating a dictionary where the keys are the actual values and the values are indexes:
var oldValues = input.Split(' ');
var values = input.Split().Where(x => x != string.Empty).ToArray();
var indexes = values.ToDictionary(x => x, x => Array.IndexOf(oldValues, x));
Then indexes["1234567890"] will give you the position of 1234567890 in the first array.
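One caveat with the dictionary: ToDictionary throws if the same value appears twice. An alternative sketch that keeps each value paired with its original index, so duplicates are fine. The input string here is hypothetical (the question's original spacing is not shown), and the names are mine:

```csharp
using System;
using System.Linq;

// Hypothetical input with runs of spaces between the values.
var input = "90  524   000 S";
string[] oldValues = input.Split(' ');

// Pair each entry with its index in the original array, then drop the empty ones.
var positions = oldValues
    .Select((value, index) => (value, index))
    .Where(p => p.value.Length > 0)
    .ToArray();

foreach (var (value, index) in positions)
    Console.WriteLine($"{value} -> {index}");
// 90 -> 0
// 524 -> 2
// 000 -> 5
// S -> 6
```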
You can use StringSplitOptions.RemoveEmptyEntries:
string[] arr = str.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
Note that I've also added the tab character as a delimiter. There are other white-space characters, like the line separator character; add them as desired. Full list here.
string s = "90 524 000 1234567890 2207 1926 00:34 02:40 S ";
var values = s.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x)).ToArray();

How to get all words over a specified length from a string?

I am writing a text analysis program for an assignment and need to write a function which will return all words over a specified length from a string (in this case all words with over 6 characters).
I have found plenty of examples which show how to return groups of words based on their lengths but none on how to get ALL the words over a specified length
static IEnumerable<string> getWordsWithMinLength(string text, int minLength)
{
string[] words = text.Split();
return words.Where(w => w.Length >= minLength);
}
String [] words = text.Split(new char[] {' '},
System.StringSplitOptions.RemoveEmptyEntries );
String [] filteredWords = words.Where(w => w.Length>6).ToArray();
Create a list of strings var list = new List<string>(),
loop through every word in your text,
if (word.Length > 6) { list.Add(word) },
and when you're done, return list;
Voilà!
At least you used the homework tag, this does scream "hey, do my work for me." What have you tried so far? Where are you having problems?
Break the problem down. It seems like you have 3 logical pieces:
1) From a string, get all words
2) From those words, find all that have a length greater than N
3) Return those words.
Check out String.Split() for #1, and .Where() in Linq to do the filtering.

Why does C# split give me an array ending in an empty line?

I have the following expression:
"<p>What ?</p>\n<pre>Starting Mini</pre>"
When I perform a split as follows:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.None);
Then it gives me three entries:
"<p>What ?</p>\n"
"Starting Mini"
""
Why does it give an empty line as the third entry and how can I avoid this?
The "why" is simply: the input (if you don't remove empty entries) will always "split" at any occurrence of the separator(s), so if the separator(s) appear n times in the string, then the array will be n+1 long. In particular, this essentially lets you know where they occurred in the original string (although when using multiple separators, it doesn't let you know which appeared where).
For example, with a simple example (csv without any escaping etc):
string[] arr = input.Split(','); // even if something like "a,b,c,d,"
// which allows...
int numberOfCommas = arr.Length - 1;
string original = string.Join(",", arr);
The fix is, as already mentioned, to use RemoveEmptyEntries.
Use StringSplitOptions.RemoveEmptyEntries instead to remove empty strings from the result:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
You get this behaviour as specified from Microsoft:
"Adjacent delimiters yield an array element that contains an empty string ("")."
So since the string ends with the closing </pre>, you get an empty array element at the end.
Mailou, instead of passing 'StringSplitOptions.None', try 'StringSplitOptions.RemoveEmptyEntries'. It removes the empty entries.
The reason you are getting this behaviour is that one of your delimiters, </pre>, happens to be at the end of the string.
You may see: string.Split - MSDN
...a delimiter is found at the beginning or end of this instance, the
corresponding array element contains Empty
To overcome this:
Use StringSplitOptions.RemoveEmptyEntries instead of StringSplitOptions.None
StringSplitOptions.RemoveEmptyEntries - MSDN
The return value does not include array elements that contain an empty
string
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
You also need to specify the
StringSplitOptions.RemoveEmptyEntries enumeration value.
With StringSplitOptions.RemoveEmptyEntries, the resulting string[] does not include any empty strings:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
Reference: StringSplitOptions Enumeration
You are getting the empty entry because of the trailing
</pre>
You are instructing the Split function to split by <pre> and </pre>.
As a result, splitting by <pre> gives you
<p>What ?</p>\n
Starting Mini</pre>
And then splitting by </pre> gives
<p>What ?</p>\n
Starting Mini
...

Perform Trim() while using Split()

Today I was wondering if there is a better way to write the following code sample.
string keyword = " abc, foo , bar";
string match = "foo";
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in split)
{
if (s.Trim() == match)
{
// asjdklasd;
break;
}
}
Is there a way to perform trim() without manually iterating through each item? I'm looking for something like 'split by the following chars and automatically trim each result'.
Ah, immediately before posting I found
List<string> parts = line.Split(';').Select(p => p.Trim()).ToList();
in How can I split and trim a string into parts all on one line?
Still I'm curious: Might there be a better solution to this? (Or would the compiler probably convert them to the same code output as the Linq-Operation?)
Another possible option (that avoids LINQ, for better or worse):
string line = " abc, foo , bar";
string[] parts= Array.ConvertAll(line.Split(','), p => p.Trim());
However, if you just need to know if it is there - perhaps short-circuit?
bool contains = line.Split(',').Any(p => p.Trim() == match);
var parts = line
.Split(';')
.Select(p => p.Trim())
.Where(p => !string.IsNullOrWhiteSpace(p))
.ToArray();
I know this is 10 years too late but you could have just split by ' ' as well:
string[] split= keyword.Split(new char[] { ',', ';', ' ' }, StringSplitOptions.RemoveEmptyEntries);
Because you're also splitting by the space char AND instructing the split to remove the empty entries, you'll have what you need.
If spaces just surround the words in the comma-separated string, this will work:
var keyword = " abc, foo , bar";
var array = keyword.Replace(" ", "").Split(',');
if (array.Contains("foo"))
{
Debug.Print("Match");
}
I would suggest using a regular expression on the original string: look for the pattern "any number of spaces followed by one of your delimiters followed by one or more spaces" and remove those spaces. Then split.
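A rough sketch of that idea, assuming the delimiters are ',' and ';' as in the question. The pattern is my own reading of the description (it uses \s* on both sides, so it also matches when no space follows the delimiter), and the variable names are made up:

```csharp
using System;
using System.Text.RegularExpressions;

var keyword = " abc, foo , bar";

// Remove whitespace around each ',' or ';', keeping the delimiter itself.
string normalized = Regex.Replace(keyword.Trim(), @"\s*([,;])\s*", "$1");
string[] parts = normalized.Split(',', ';');

Console.WriteLine(normalized);               // abc,foo,bar
Console.WriteLine(string.Join("|", parts));  // abc|foo|bar
```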
Try this:
string keyword = " abc, foo , bar";
string match = "foo";
string[] split = Regex.Split(keyword.Trim(), @"\s*[,;]\s*");
if (split.Contains(match))
{
// do stuff
}
You're going to find a lot of different methods of doing this, and the differences in performance and accuracy aren't going to be readily apparent. I'd recommend plugging them all into a test suite like NUnit to find both which one comes out on top and which ones are accurate.
Use small, medium, and large amounts of text in loops to examine the various situations.
Starting with .NET 5, there is an easier option:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries);
You can combine it with the option to remove empty entries:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
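The difference between the two calls only shows up when an entry is blank. A small sketch with a made-up input that contains a whitespace-only entry (requires .NET 5 or later):

```csharp
using System;

// Hypothetical input: the middle entry is a single space.
var keyword = "abc, , bar";
var separators = new[] { ',', ';' };

// TrimEntries alone trims the blank entry to "" but keeps it.
string[] trimmedOnly = keyword.Split(separators, StringSplitOptions.TrimEntries);

// Combined with RemoveEmptyEntries, the blank entry is dropped as well.
string[] trimmedNoEmpty = keyword.Split(
    separators,
    StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(trimmedOnly.Length);     // 3: "abc", "", "bar"
Console.WriteLine(trimmedNoEmpty.Length);  // 2: "abc", "bar"
```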
