How to remove whitespace from a string array formed by splitting on whitespace - c#

I have a string which contains a value like this:
90 524 000 1234567890 2207 1926 00:34 02:40 S
I have split this string into a string array on white-space. Now I want to create a second string array with all the white-space entries removed, so that it contains only the real values.
I also want to get the position of an element in the original string array, based on a selection from the new array formed by removing the white-space.
Please help me.

You can use StringSplitOptions.RemoveEmptyEntries via String.Split.
var values = input.Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries: The return value does not include array elements that contain an empty string
When the Split method encounters two consecutive white-space characters it returns an empty string between them. Using StringSplitOptions.RemoveEmptyEntries removes those empty strings and gives you only the values you want.
You can also achieve this using LINQ:
var values = input.Split().Where(x => x != string.Empty).ToArray();
Edit: If I understand you correctly you want the positions of the values in your old array. If so you can do this by creating a dictionary where the keys are the actual values and the values are indexes:
var oldValues = input.Split(' ');
var values = input.Split().Where(x => x != string.Empty).ToArray();
var indexes = values.ToDictionary(x => x, x => Array.IndexOf(oldValues, x));
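Then indexes["1234567890"] will give you the position of 1234567890 in the first array. This assumes every value is unique: ToDictionary throws an ArgumentException on a duplicate key, and Array.IndexOf returns only the first match.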
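Where values can repeat, a minimal sketch that pairs each non-empty entry with its index in the raw split array may be easier to work with. The class and method names (SplitIndexDemo, Indexed) and the sample input with doubled spaces are illustrative assumptions, and the tuple syntax needs C# 7 or later:

```csharp
using System;
using System.Linq;

public static class SplitIndexDemo
{
    // Returns each non-empty entry together with its index in the raw split array.
    public static (string Value, int OldIndex)[] Indexed(string input)
    {
        return input.Split(' ')
            .Select((value, index) => (Value: value, OldIndex: index))
            .Where(pair => pair.Value.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input with doubled spaces, not the exact string from the question
        string input = "90  524 000   1234567890";
        foreach (var pair in Indexed(input))
        {
            Console.WriteLine($"{pair.Value} was at index {pair.OldIndex}");
        }
    }
}
```

Because the index is captured before the empty entries are filtered out, duplicates keep their own positions.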

You can use StringSplitOptions.RemoveEmptyEntries:
string[] arr = str.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
Note that I've also added the tab character as a delimiter. There are other white-space characters, like the line separator character; add them as desired. Full list here.
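Instead of listing delimiters one by one, a null separator array makes Split treat every white-space character as a delimiter. A small sketch of that, where the class name, method name and sample input are made up for illustration:

```csharp
using System;

public static class WhitespaceSplitDemo
{
    public static string[] SplitOnWhitespace(string input)
    {
        // A null separator array tells Split to use all white-space characters as delimiters
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Hypothetical input mixing spaces, a tab and a newline
        string[] parts = SplitOnWhitespace("90 \t524\n000  S ");
        Console.WriteLine(string.Join("|", parts)); // 90|524|000|S
    }
}
```

The cast to char[] is there only to pick the char[] overload unambiguously.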

string s = "90 524 000 1234567890 2207 1926 00:34 02:40 S ";
var values = s.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x)).ToArray();

Related

How to split a string of numbers on the white space character and convert to integers

I'm working on some homework and I need to get an input from the user which is a single line of numbers separated by spaces. I want to split this string and get the individual numbers out so that I can insert them into a Binary Search Tree.
I tried the split function and was able to get rid of the white space, but I'm not sure how to "collect" the individual numbers.
string data;
string[] newdata = { };
Console.WriteLine("Please enter a list of integers with spaces between each number.\n");
data = Console.ReadLine();
newdata = data.Split(null);
Console.WriteLine(String.Join(Environment.NewLine, newdata));
I want to somehow collect the elements from newdata string array and convert them into integers but I'm having a tough time figuring out how to do that.
Well, you could use Linq .Select method combined with .Split method:
List<int> newData = data.Split(' ').Select(int.Parse).ToList();
If the user may type several spaces in a row, Split produces empty strings that int.Parse rejects, so they have to be dropped. For that we can use another overload of the string.Split method that accepts StringSplitOptions:
List<int> newData = data.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries).Select(int.Parse).ToList();
Finally if you want to allow user to enter incorrect data at times and still get collection of valid ints you could use int.TryParse and filter out values that were parsed incorrectly:
List<int> newData = data.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => int.TryParse(s.Trim(), out var n) ? (int?)n : null)
.Where(x => x != null)
.Select(i => i.Value)
.ToList();
Some smart LINQ answers are already provided, here is my extended step by step solution which also allows to ignore invalid numbers:
//Read input string
Console.Write("Input numbers separated by space: ");
string inputString = Console.ReadLine();
//Split by spaces
string[] splittedInput = inputString.Split(' ');
//Create a list to store all valid numbers
List<int> validNumbers = new List<int>();
//Iterate all splitted parts
foreach (string input in splittedInput)
{
//Try to parse the splitted part
if (int.TryParse(input, out int number) == true)
{
//Add the valid number
validNumbers.Add(number);
}
}
//Print all valid numbers
Console.WriteLine(string.Join(", ", validNumbers));
OK as code:
var words = data.Split();
int i;
List<int> integers = new List<int>();
foreach(var s in words)
{
if (int.TryParse(s, out i)) {integers.Add(i);}
}
// now you have a list of integers
// if using decimal, use decimal instead of integer
You can do as follows.
var numbers = Console.ReadLine();
var listOfNumbers = numbers.Split(new[]{" "},StringSplitOptions.RemoveEmptyEntries)
.Select(x=> Int32.Parse(x));
The above lines split the user input based on "whitespace", removing any empty entries in between, and then converts the string numbers to integers.
StringSplitOptions.RemoveEmptyEntries ensures that empty entries are removed. An empty entry arises in a string where two delimiters occur next to each other. For example, in "2  3 4 5" there are two spaces between 2 and 3, which means that when you split the string with a space as the delimiter, you end up with an empty element in the array. Using StringSplitOptions.RemoveEmptyEntries eliminates it.
Depending on whether you are expecting Integers or Decimals, you can use Int32.Parse or Double.Parse (or float/decimal etc)
Furthermore, you can include checks to ensure you have a valid number, otherwise throw an exception. You can alter the query as follows.
var listOfNumbers = numbers.Split(new[]{" "},StringSplitOptions.RemoveEmptyEntries)
.Select(x=>
{
Console.WriteLine(x);
if(Int32.TryParse(x,out var number))
return number;
else
throw new Exception("Element is not a number");
});
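This ensures every element in the list is a valid number; otherwise an exception is thrown. Because Select is evaluated lazily, the exception surfaces only when listOfNumbers is enumerated.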
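As a middle ground between silently skipping bad tokens and throwing, the rejected tokens can be collected so they can be reported back to the user. This is only a sketch; ParseDemo, ParseInts and the rejected list are hypothetical names, not part of any answer above:

```csharp
using System;
using System.Collections.Generic;

public static class ParseDemo
{
    // Parses every white-space separated token; tokens that are not integers go into 'rejected'.
    public static List<int> ParseInts(string line, List<string> rejected)
    {
        var numbers = new List<int>();
        foreach (string token in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (int.TryParse(token, out int n))
            {
                numbers.Add(n);
            }
            else
            {
                rejected.Add(token);
            }
        }
        return numbers;
    }

    public static void Main()
    {
        var rejected = new List<string>();
        List<int> numbers = ParseInts("12  7 abc 40", rejected);
        Console.WriteLine(string.Join(", ", numbers));  // 12, 7, 40
        Console.WriteLine(string.Join(", ", rejected)); // abc
    }
}
```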
Keep the spaces and do the "split" on the space character: data.Split(' ');

Why does C# split give me an array ending in an empty line?

I have the following expression:
"<p>What ?</p>\n<pre>Starting Mini</pre>"
When I perform a split as follows:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.None);
Then it gives me three entries:
"<p>What ?</p>\n"
"Starting Mini"
""
Why does it give an empty line as the third entry and how can I avoid this?
The "why" is simply: the input (if you don't remove empty entries) will always "split" at any occurrence of the separator(s), so if the separator(s) appear n times in the string, then the array will be n+1 long. In particular, this essentially lets you know where they occurred in the original string (although when using multiple separators, it doesn't let you know which appeared where).
For example, with a simple example (csv without any escaping etc):
string[] arr = input.Split(','); // even if something like "a,b,c,d,"
// which allows...
int numberOfCommas = arr.Length - 1;
string original = string.Join(",", arr);
The fix is, as already mentioned, to use RemoveEmptyEntries.
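The n+1 rule can be checked against the string from the question: two separators occur, so three entries come back, and the last one is empty because </pre> ends the string. A short sketch, with TrailingDelimiterDemo and SplitPre as illustrative names:

```csharp
using System;

public static class TrailingDelimiterDemo
{
    private static readonly string[] Separators = { "<pre>", "</pre>" };

    public static string[] SplitPre(string content, StringSplitOptions options)
    {
        return content.Split(Separators, options);
    }

    public static void Main()
    {
        string content = "<p>What ?</p>\n<pre>Starting Mini</pre>";

        // Two separators occur, so None yields three entries (the last one is "")
        Console.WriteLine(SplitPre(content, StringSplitOptions.None).Length);               // 3

        // RemoveEmptyEntries drops the trailing empty string
        Console.WriteLine(SplitPre(content, StringSplitOptions.RemoveEmptyEntries).Length); // 2
    }
}
```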
Use StringSplitOptions.RemoveEmptyEntries instead to remove empty strings from the result:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
You get this behaviour as specified from Microsoft:
"Adjacent delimiters yield an array element that contains an empty string ("")."
So since the string ends with the closing </pre>, you get an empty final array element.
Mailou, instead of giving 'StringSplitOptions.None' try 'StringSplitOptions.RemoveEmptyEntries'. It removes the empty entries.
The reason you are getting this behaviour is that one of your delimiters, </pre>, happens to be at the end of the string.
You may see: string.Split - MSDN
...a delimiter is found at the beginning or end of this instance, the
corresponding array element contains Empty
To overcome this:
Use StringSplitOptions.RemoveEmptyEntries instead of StringSplitOptions.None
StringSplitOptions.RemoveEmptyEntries - MSDN
The return value does not include array elements that contain an empty
string
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
You also need to specify the StringSplitOptions.RemoveEmptyEntries enumeration value.
The resulting string[] will not include any empty strings when you use StringSplitOptions.RemoveEmptyEntries:
var split = content
.Split(new[] { "<pre>", "</pre>" }, StringSplitOptions.RemoveEmptyEntries);
Reference: StringSplitOptions Enumeration
You are getting the empty entry because of the trailing
</pre>
You are instructing Split to split on both <pre> and </pre>.
Splitting on <pre> gives:
<p>What ?</p>\n
Starting Mini</pre>
Splitting that on </pre> then gives:
<p>What ?</p>\n
Starting Mini
...

splitting string into array with a specific number of elements, c#

I have a string which consists of a number of ordered terms separated by newlines (\n), as shown in the following example (note: the string is an element of an array of strings):
term 1
term 2
.......
.......
term n
I want to keep only a specific number of terms, let's say 1000, and discard the rest. I'm trying the following code:
string[] training = traindocs[tr].Trim().Split('\n');
List <string> trainterms = new List<string>();
for (int i = 0; i < 1000; i++)
{
if (i >= training.Length)
break;
trainterms.Add(training[i].Trim().Split('\t')[0]);
}
Can I do this without using a List or any other data structure? I mean, just extract the specific number of terms into the array (training) directly? Thanks in advance.
How about LINQ? The .Take() extension method kind of seems to fit your bill:
List<string> trainterms = traindocs[tr].Trim().Split('\n').Take(1000).ToList();
According to MSDN you can use an overloaded version of the Split method:
public string[] Split(char[] separator, int count, StringSplitOptions options)
Parameters
separator
Type: System.Char[]
An array of Unicode characters that delimit the substrings in this string, an empty array that contains no delimiters, or null.
count
Type: System.Int32
The maximum number of substrings to return.
options
Type: System.StringSplitOptions
StringSplitOptions.RemoveEmptyEntries to omit empty array elements from the array returned; or StringSplitOptions.None to include empty array elements in the array returned.
Return Value
Type: System.String[]
An array whose elements contain the substrings in this string that are delimited by one or more characters in separator. For more information, see the Remarks section.
So something like so:
String str = "A,B,C,D,E,F,G,H,I";
String[] str2 = str.Split(new Char[]{','}, 5, StringSplitOptions.RemoveEmptyEntries);
System.Console.WriteLine(str2.Length);
System.Console.Read();
Would print: 5
EDIT:
Upon further investigation it seems that the count parameter just instructs when the splitting stops. The rest of the string will be kept in the last element.
So the code above would yield the following result: [0] = A, [1] = B, [2] = C, [3] = D, [4] = E,F,G,H,I, which is not something you seem to be after.
To fix this, you would need to do something like so:
String str = "A\nB\nC\nD\nE\nF\nG\nH\nI";
List<String> myList = str.Split(new Char[]{'\n'}, 5, StringSplitOptions.RemoveEmptyEntries).ToList<String>();
myList[myList.Count - 1] = myList[myList.Count - 1].Split(new Char[] { '\n' })[0];
System.Console.WriteLine(myList.Count);
foreach (String str1 in myList)
{
System.Console.WriteLine(str1);
}
System.Console.Read();
The code above will only retain the first 5 (in your case, 1000) elements. Thus, I think that Darin's solution might be cleaner, if you will.
If you want the most efficient (fastest) way, use the overload of String.Split that takes the maximum number of items; as noted above, the last element then holds the unsplit remainder, so it has to be cut or dropped.
If you want the easy way, use LINQ.
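The two approaches can be combined. The sketch below asks Split for one piece more than needed, so the unsplit remainder lands in the extra last slot, then keeps the first pieces with Take and returns an array rather than a List. FirstTermsDemo, FirstTerms and the tab-separated sample data are assumptions made for illustration:

```csharp
using System;
using System.Linq;

public static class FirstTermsDemo
{
    // Returns at most 'limit' terms: the first tab-separated field of each of the first lines.
    public static string[] FirstTerms(string doc, int limit)
    {
        // Ask for limit + 1 pieces so the unsplit remainder ends up in the extra last slot
        string[] pieces = doc.Trim().Split(new[] { '\n' }, limit + 1);

        return pieces.Take(limit)
                     .Select(line => line.Trim().Split('\t')[0])
                     .ToArray();
    }

    public static void Main()
    {
        // Hypothetical document: one "term<TAB>weight" pair per line
        string doc = "term 1\t0.5\nterm 2\t0.3\nterm 3\t0.1\nterm 4\t0.05";
        Console.WriteLine(string.Join(", ", FirstTerms(doc, 2))); // term 1, term 2
    }
}
```

If the document has fewer lines than the limit, Take simply returns what is there.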

StringSplitOptions.RemoveEmptyEntries doesn't work as advertised

I've come across this several times in the past and have finally decided to find out why.
StringSplitOptions.RemoveEmptyEntries would suggest that it removes empty entries.
So why does this test fail?
var tags = "One, Two, , Three, Foo Bar, , Day , ";
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
tagsSplit.ShouldEqual(new string[] {
"One",
"Two",
"Three",
"Foo Bar",
"Day"
});
The result:
Values differ at index [2]
Expected string length 5 but was 0. Strings differ at index 0.
Expected: "Three"
But was: <string.Empty>
So it fails because instead of "Three", we have an empty string – exactly what StringSplitOptions.RemoveEmptyEntries should prevent.
Most likely because you change the strings after the split. You trim the values after splitting them, and RemoveEmptyEntries doesn't consider the string " " empty.
The following would achieve what you want, basically creating your own strip empty elements:
var tagsSplit = tags.Split(',').
Select(tag => tag.Trim()).
Where( tag => !string.IsNullOrEmpty(tag));
Adjacent delimiters yield an array element that contains an empty
string (""). The values of the StringSplitOptions enumeration specify
whether an array element that contains an empty string is included in
the returned array.
" " by definition is not empty (it is actually whitespace), so it is not removed from resulting array.
If you use .NET Framework 4, you could work around that by using the string.IsNullOrWhiteSpace method:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Where(x => !string.IsNullOrWhiteSpace(x))
.Select(s => s.Trim());
RemoveEmptyEntries does not mean "remove spaces".
Your input string includes many spaces. Note that a space is not empty: to the computer it is a character with its own ASCII code. So the code:
var tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim());
means:
Split the input on ',' and remove empty entries, which does not include entries consisting of a space. So you get an array with some space-only elements.
Then you trim each element, and the space-only elements become empty.
That's why you got it.
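The order of operations is what matters, and it can be seen by running both orderings on the string from the question. A minimal sketch, with RemoveEmptyDemo, SplitThenTrim and TrimThenFilter as illustrative names:

```csharp
using System;
using System.Linq;

public static class RemoveEmptyDemo
{
    // Filter first, trim second: space-only entries survive the filter and become "" afterwards.
    public static string[] SplitThenTrim(string tags)
    {
        return tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(s => s.Trim())
                   .ToArray();
    }

    // Trim first, filter second: space-only entries are empty by the time they are filtered.
    public static string[] TrimThenFilter(string tags)
    {
        return tags.Split(',')
                   .Select(s => s.Trim())
                   .Where(s => s.Length > 0)
                   .ToArray();
    }

    public static void Main()
    {
        string tags = "One, Two, , Three, Foo Bar, , Day , ";
        Console.WriteLine(SplitThenTrim(tags).Length);  // 8
        Console.WriteLine(TrimThenFilter(tags).Length); // 5
    }
}
```

In the first ordering the entry at index 2 is an empty string, which matches the failure reported in the question.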
In .NET 5, they added StringSplitOptions.TrimEntries.
Since StringSplitOptions has the [System.Flags] attribute, it means you can write
var splitResult = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
When both RemoveEmptyEntries and TrimEntries are specified, it removes both empty values and values which only contain whitespace, whilst trimming all the remaining values.
Try
var tagsSplit = tags.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This will split on comma and space and eliminate empty strings. Note that it also splits "Foo Bar" into two entries, "Foo" and "Bar".
I've searched also for a clean way to exclude whitespace-entries during a Split, but since all options seemed like some kind of workarounds, I've chosen to exclude them when looping over the array.
string[] tagsSplit = tags.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string tag in tagsSplit.Where(t => !string.IsNullOrWhiteSpace(t))) { }
I think this looks cleaner and, as a bonus, .Split(...).ToArray() may be omitted.
Of course, it is an option only when you may loop just after split and do not have to store entries for later use.
As this is a very common need, I went ahead and wrapped the most popular answer in a string extension method:
public static IEnumerable<string> Split_RemoveWhiteTokens(this string s, params char[] separator)
{
return s.Split(separator).
Select(tag => tag.Trim()).
Where(tag => !string.IsNullOrEmpty(tag));
}
To split on ',' as the other examples, use like this:
var result = yourString.Split_RemoveWhiteTokens(',')
Note that the return type is IEnumerable<string>, so you can run additional LINQ queries directly on the result. Call .ToList() if you want to convert the result to a list.
An "empty entry" is the case where two delimiters are directly next to each other with nothing in between. Without this option, Split returns an empty string to represent that position. With the RemoveEmptyEntries option, an entry is returned only if there is actually something between the delimiters. A blank space counts as something between the delimiters. If you tried:
One, Two,, Three,
You should find that RemoveEmptyEntries eliminates the delimitation between the two commas and goes straight from Two to Three.
var tagsSplit = tags.Split(',')
.Where(str => !String.IsNullOrWhiteSpace(str))
.Select(s => s.Trim());

Splitting a String into only 2 parts

I want to take a string from a textbox (txtFrom) and save the first word and save whatever is left in another part. (the whatever is left is everything past the first space)
Example string = "Bob jones went to the store"
array[0] would give "Bob"
array[1] would give "jones went to the store"
I know there is string[] array = txtFrom.Split(' '); , but that gives me an array of 6 with individual words.
Use String.Split(Char[], Int32) overload like this:
string[] array = txtFrom.Text.Split(new char[]{' '},2);
http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
You simply combine a split with a join to get the first element:
string[] items = source.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string firstItem = items[0];
string remainingItems = string.Join(" ", items.Skip(1).ToList());
You simply take the first item and then reform the remainder back into a string.
char[] delimiterChars = { ' ', ',' };
string text = txtString.Text;
string[] words = text.Split(delimiterChars, 2);
txtString1.Text = words[0].ToString();
txtString2.Text = words[1].ToString();
There is an overload of the String.Split() method which takes an integer representing the number of substrings to return.
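So your method call would become: string[] array = txtFrom.Text.Split(new[] { ' ' }, 2); (on .NET Core 2.0 and later, Split(' ', 2) also compiles, because a Split(char, int) overload was added there).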
You can also try RegularExpressions
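Match M = System.Text.RegularExpressions.Regex.Match(source, @"(.*?)\s(.*)");
string first = M.Groups[1].Value; // Bob
string rest = M.Groups[2].Value; // jones went to the store
The regular expression matches everything up to the first white-space character and stores it in the first group; the ? tells it to make the smallest match possible. The second clause grabs everything after that character and stores it in the second group. The pattern is written as a verbatim string (@"...") because \s is not a valid escape sequence in a regular C# string literal.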
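The Split-based answers index words[1] directly, which throws an IndexOutOfRangeException when the text contains no space. A small sketch that guards against that case, assuming a hypothetical helper named SplitFirstWord (the tuple syntax needs C# 7 or later):

```csharp
using System;

public static class FirstWordDemo
{
    // Splits into the first word and everything after the first space.
    public static (string First, string Rest) SplitFirstWord(string text)
    {
        // count = 2: split at the first space only, the remainder stays in parts[1]
        string[] parts = text.Split(new[] { ' ' }, 2);

        // With no space in the input there is only one part, so Rest falls back to ""
        string rest = parts.Length > 1 ? parts[1] : string.Empty;
        return (parts[0], rest);
    }

    public static void Main()
    {
        var result = SplitFirstWord("Bob jones went to the store");
        Console.WriteLine(result.First); // Bob
        Console.WriteLine(result.Rest);  // jones went to the store
    }
}
```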
