String splitting produces different results than expected - c#

It does not return what I expected.
I expected something like:
ab
cab
ab
What am I doing wrong?

Don't use .ToCharArray().
It makes Split treat \r and \n as two separate delimiters,
which is why you get empty values.
Something like this should work:
var aa = ("a" + Environment.NewLine + "b" + Environment.NewLine + "c").Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

Since you are splitting on "\r" and "\n" separately, String.Split extracts the empty string that sits between them in "\r\n".
Take a look at StringSplitOptions.RemoveEmptyEntries or use new String[] { "\r\n" } instead of "\r\n".ToCharArray().

You are splitting the string using \r or \n as individual delimiters, not \r\n together.

Environment.NewLine is probably the way to go, but if not, this works:
var ab = "a\r\nb\r\nc";
var abs = ab.Split(new[]{"\r\n"}, StringSplitOptions.None);

This option also works,
string [] b = Regex.Split(abc, "\r\n");

My understanding is that the char array you pass to the Split method is a list of delimiter characters, not a single delimiter made of several characters.
In your case, Split considers '\r' and '\n' to be separate delimiters. So when it encounters the "\r\n" sequence, it returns the string between those two delimiters, which is an empty string.
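To make the difference concrete, here is a minimal sketch. The input string and the SplitDemo class are assumptions for illustration, since the code from the question is not shown above; the input is chosen to match the expected output ab / cab / ab.

```csharp
using System;

static class SplitDemo
{
    // Each char in the array is its own delimiter, so "\r\n" counts as two
    // delimiters with an empty string between them.
    public static string[] SplitOnChars(string text)
    {
        return text.Split("\r\n".ToCharArray());
    }

    // The two-character string "\r\n" is treated as one delimiter.
    public static string[] SplitOnSequence(string text)
    {
        return text.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    static void Main()
    {
        string text = "ab\r\ncab\r\nab"; // assumed input
        Console.WriteLine(SplitOnChars(text).Length);    // 5: "ab", "", "cab", "", "ab"
        Console.WriteLine(SplitOnSequence(text).Length); // 3: "ab", "cab", "ab"
    }
}
```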

Related

My Regex.Split with '\n' takes up two spaces instead of 1

I need to split my text into each word, space, and new line.
Although the words and spaces are properly working, the \n is taking up two spaces only if it's not after a word.
Example: "\nTest\nword", here, the first \n takes up two spaces while the second one takes up one.
How would I write the proper regex?
My code:
string delimiterChars = "([ \r\n])";
wordArray = Regex.Split(myTexy, delimiterChars);
For context, I am using Unity.
In the output, the first element is empty and the second is \n. I don't want the empty element.
Regex.Split will always produce empty items where the matches are consecutive, or when they are at the start/end of string.
Instead, you can use a matching and extracting approach:
string delimiterChars = "[^ \r\n]+|[ \r\n]";
string[] wordArray = Regex.Matches(myTexy, delimiterChars)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
The [^ \r\n]+|[ \r\n] regex matches one or more chars other than a space, CR and LF, or a space, CR or an LF char.
You can use regular expressions to remove leading delimiter characters.
var myTexy = "\nTest\nword";
string delimiterChars = "([ \r\n])";
myTexy = Regex.Replace(myTexy, "^" + delimiterChars, "");
var wordArray = Regex.Split(myTexy, delimiterChars);
The "^" anchor makes the pattern match only at the beginning of the string, so a single leading delimiter is removed.
Also, just so you are aware, the behavior you are seeing is intended and documented:
If a match is found at the beginning or the end of the input string,
an empty string is included at the beginning or the end of the
returned array.
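Given that documented behavior, another way to get rid of the empty items is to filter them out after splitting. This is only a sketch, not something proposed in the answers above; the EmptyFilterDemo class and the SplitKeepingDelimiters name are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EmptyFilterDemo
{
    // Splits on space, CR or LF, keeps each delimiter as its own item
    // (because of the capturing group), and drops the empty strings that
    // Regex.Split inserts at the start/end or between adjacent matches.
    public static string[] SplitKeepingDelimiters(string text)
    {
        return Regex.Split(text, "([ \r\n])")
            .Where(item => item.Length > 0)
            .ToArray();
    }

    static void Main()
    {
        string[] raw = Regex.Split("\nTest\nword", "([ \r\n])");
        Console.WriteLine(raw.Length); // 5: "", "\n", "Test", "\n", "word"

        string[] cleaned = SplitKeepingDelimiters("\nTest\nword");
        Console.WriteLine(cleaned.Length); // 4: "\n", "Test", "\n", "word"
    }
}
```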
Let me know if this is what you are looking for:
String text = "\nTest\nword";
string[] words = Regex.Split(text, @"(\n+)");
Try this:
string myStr = "This is test text";
wordArray = myStr.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

Split string by second comma, then by third comma

I have a string which looks like this:
Less than $5,000, $5,000-$9,999, $10,000-$14,999, $45,000-$49,999, $50,000-$54,999
And what I would like to have is:
Less than $5,000 or $5,000-$9,999 or $10,000-$14,999 or $45,000-$49,999 or $50,000-$54,999
What I tried is:
items[1].Split(new char[1] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(e => new AnswerModel { Name = e }).ToList()
Name should be "Less than $5,000" then "$5,000-$9,999" and so on.
But it splits on every comma, including the ones inside the amounts. I need it to split only on the commas that separate the ranges (the second comma, then the third, and so on).
What would be the best approach?
Maybe you can split by ", "
string s = "Less than $5,000, $5,000-$9,999, $10,000-$14,999, $45,000-$49,999, $50,000-$54,999";
string result = string.Join(" or ", s.Split(new []{", "},StringSplitOptions.None));
Returns your desired result:
Less than $5,000 or $5,000-$9,999 or $10,000-$14,999 or $45,000-$49,999 or $50,000-$54,999
If your string looks exactly as you pasted it here, and it will always follow this pattern, then you can split on a two-character string:
items[1].Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
Update:
But if your string has no spaces after the commas, you can first replace ",$" with some unique character such as |, and then split on that character:
items[1].Replace(",$", "|$").Split(new char[1] { '|' }, StringSplitOptions.RemoveEmptyEntries);
You can also replace ",$" with " or $" directly; the string is then already in the desired format, and there is no need to split and join again.
This should do the trick.
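If the input may or may not have a space after the separating commas, a regex with a lookahead can cover both cases in one step. This is a sketch of an alternative that is not from the answers above; it assumes every range after the first starts with '$', and the RangeDemo class and SplitRanges name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RangeDemo
{
    // Splits only on a comma that is followed (after optional whitespace)
    // by '$'. The commas inside "$5,000" are followed by a digit, so they
    // are left alone.
    public static string[] SplitRanges(string text)
    {
        return Regex.Split(text, @",\s*(?=\$)");
    }

    static void Main()
    {
        string s = "Less than $5,000, $5,000-$9,999,$10,000-$14,999";
        Console.WriteLine(string.Join(" or ", SplitRanges(s)));
        // Less than $5,000 or $5,000-$9,999 or $10,000-$14,999
    }
}
```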

How to split a String by using two identifiers at the same time

I have a string that I want to split into substrings by either '\n' or '\r'. To split on a single character we can use
string[] strsplit = str.Split('\n') ;
but in my case I don't know whether it will be '\n' or '\r'.
Can anyone please tell me whether there is a way to split the string along the lines of the following?
string[] strsplit = str.Split('\n' || '\r') ;
Thanks in advance, and sorry for my bad English.
Split method has overload which accepts array of char:
string[] strsplit = str.Split(new char[] { '\n', '\r' }) ;
As mentioned in comments you can now do it this way:
string[] strsplit = str.Split('\n', '\r') ;
If your lines may end with \n, \r, or \r\n, you can do the following:
someString.Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
Another option, if you want to include Environment.NewLine (\r\n on Windows) explicitly, is:
someString.Split(new[] { "\n", "\r", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
Notice these are now strings (using " instead of ').
You can change the line skip behavior by changing the StringSplitOptions to StringSplitOptions.None
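One caveat when switching to StringSplitOptions.None with string separators: when several separators match at the same position, Split uses the one that comes first in the array. Here is a small sketch of what that means for "\r\n"; the NewlineDemo class and SplitLines name are hypothetical.

```csharp
using System;

static class NewlineDemo
{
    // "\r\n" is listed first so a Windows line break is consumed as a
    // single separator instead of as "\r" followed by "\n".
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
    }

    static void Main()
    {
        string text = "one\r\ntwo\nthree\rfour";

        Console.WriteLine(SplitLines(text).Length); // 4

        // With "\r" ahead of "\r\n", the Windows break is split twice,
        // which leaves an extra empty entry between "one" and "two".
        string[] other = text.Split(new[] { "\n", "\r", "\r\n" }, StringSplitOptions.None);
        Console.WriteLine(other.Length); // 5
    }
}
```

With RemoveEmptyEntries the extra entry is dropped, so the order only becomes visible when empty lines are kept.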
You say you only want to split by either \r or \n, but in practice people usually also want to handle \r\n, since that's the default Windows line break. If you want that too, you'll need to do a little extra work. One way is to use StringReader and let it do the work for you:
var lines = new List<string>();
using (var sr = new StringReader(str)) {
    string line;
    while ((line = sr.ReadLine()) != null) {
        lines.Add(line);
    }
}
string[] strsplit = lines.ToArray();
This has slightly different behavior than dav_i's answer, when it comes to handling multiple empty lines. Just depends on what you're looking for.
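A quick sketch of that difference, using a made-up input that contains a blank line (the ReadLineDemo class and the sample text are assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo
{
    // Reads lines the way StringReader does: "\r", "\n" and "\r\n" each end
    // one line, and blank lines are kept as empty strings.
    public static List<string> ReadLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main()
    {
        string text = "a\r\n\r\nb\nc"; // blank line between "a" and "b"

        Console.WriteLine(ReadLines(text).Count); // 4: "a", "", "b", "c"

        string[] split = text.Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(split.Length); // 3: "a", "b", "c"
    }
}
```

So the StringReader version keeps blank lines, while the char-array split with RemoveEmptyEntries drops them.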
You just have to pass an array of chars to the Split method
string[] arr = "Test\rTest\nTest".Split(new[] { '\r', '\n' });

how to split a string by another string?

I have a string in the following format:
"TestString 1 <^> TestString 2 <^> Test String3"
which I want to split by the "<^>" string.
The following statement gives the output I want:
"TestString 1 <^> TestString 2 <^> Test String3"
.Split("<^>".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
But if my string contains "<", ">" or "^" anywhere in the text, the above Split statement will split on those as well.
Any idea how to split only on the "<^>" string?
By using ToCharArray you are saying "split on any of these characters"; to split on the sequence "<^>" you must use the overload that accepts a string[]:
string[] parts = yourValue.Split(new string[]{"<^>"}, StringSplitOptions.None);
Or in C# 3:
string[] parts = yourValue.Split(new[]{"<^>"}, StringSplitOptions.None);
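A small sketch of the two overloads side by side, using a made-up input that contains stray '<' and '>' characters (the SeparatorDemo class and the sample text are assumptions for illustration):

```csharp
using System;

static class SeparatorDemo
{
    // Splits on '<', '^' or '>' wherever any of them appears.
    public static string[] SplitOnAnyChar(string text)
    {
        return text.Split("<^>".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }

    // Splits only where the full sequence "<^>" appears.
    public static string[] SplitOnSequence(string text)
    {
        return text.Split(new[] { "<^>" }, StringSplitOptions.None);
    }

    static void Main()
    {
        string text = "a < b <^> c > d";
        Console.WriteLine(SplitOnAnyChar(text).Length);  // 4: "a ", " b ", " c ", " d"
        Console.WriteLine(SplitOnSequence(text).Length); // 2: "a < b ", " c > d"
    }
}
```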
Edit: As others have already pointed out, String.Split has a good overload for your use case. The answer below is still correct (as in, it works), but it is not the way to go.
That's because this string.Split overload takes an array of separator chars. Each of them splits the string.
You want: Regex.Split
Regex regex = new Regex(@"<\^>");
string[] substrings = regex.Split("TestString 1 <^> TestString 2 <^> Test String3");
And - a sidenote:
"<^>".ToCharArray()
is really just a fancy way to say
new[]{'<', '^', '>'}
Try another overloaded Split method:
public string[] Split(
    string[] separator,
    StringSplitOptions options
)
So in your case it may look like:
var result = yourString.Split(new string[] { "<^>" }, StringSplitOptions.RemoveEmptyEntries);
Hope this helps.

what is the cleanest way to remove all extra spaces from a user input comma delimited string into an array

A program has users typing in a comma-delimited string into an array:
basketball, baseball, soccer ,tennis
There may be spaces between the commas or maybe not.
If this string was simply split() on the comma, then some of the items in the array may have spaces before or after them.
What is the best way of cleaning this up?
You can use Regex.Split for this:
string[] tokens = Regex.Split("basketball, baseball, soccer ,tennis", @"\s*,\s*");
The regex \s*,\s* can be read as: "match zero or more white space characters, followed by a comma followed by zero or more white space characters".
string[] values = delimitedString.Split(',').Select(s => s.Trim()).ToArray();
string s = "ping pong, table tennis, water polo";
string[] myArray = s.Split(',');
for (int i = 0; i < myArray.Length; i++)
    myArray[i] = myArray[i].Trim();
That will preserve the spaces in the entries.
You can split on either comma or space, and remove the empty entries (those between a comma and a space):
string[] values = delimitedString.Split(new char[]{',',' '}, StringSplitOptions.RemoveEmptyEntries);
Edit:
However, as that doesn't work with values that contain spaces, instead you can split on the possible variations if your values can contain spaces:
string[] values = delimitedString.Split(new string[]{" , ", " ,", ", ", ","}, StringSplitOptions.None);
Split the items on the comma:
string[] splitInput = yourInput.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
and then trim each entry. Trim returns a new string, so assign the result back:
for (int i = 0; i < splitInput.Length; i++)
{
    splitInput[i] = splitInput[i].Trim();
}
Call Replace(" ", "") on the string before you split it. Note that this also removes spaces inside the entries.
I would first split the text by "," and then use the Trim method to remove spaces at the start and end of each string. This way entries that contain spaces are still supported.
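Putting the split-then-trim idea together, here is a minimal sketch that also drops entries left empty by stray commas. Dropping those entries is an assumption about what the caller wants, and the ListParser class and ParseList name are hypothetical.

```csharp
using System;
using System.Linq;

static class ListParser
{
    // Splits on commas, trims whitespace around each entry, and drops
    // entries that are empty after trimming. Spaces inside an entry
    // ("table tennis") are preserved.
    public static string[] ParseList(string input)
    {
        return input.Split(',')
            .Select(item => item.Trim())
            .Where(item => item.Length > 0)
            .ToArray();
    }

    static void Main()
    {
        string[] sports = ParseList("basketball, baseball, soccer ,tennis");
        Console.WriteLine(string.Join("|", sports));
        // basketball|baseball|soccer|tennis

        string[] games = ParseList("ping pong, table tennis,, water polo ");
        Console.WriteLine(string.Join("|", games));
        // ping pong|table tennis|water polo
    }
}
```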
