splitting string into array with a specific number of elements, c# - c#

I have a string which consists number of ordered terms separated by lines (\n) as it shown in the following example: (note, the string I have is an element of an array of string)
term 1
term 2
.......
.......
term n
I want to split a specific number of terms, let we say (1000) only and discard the rest of the terms. I'm trying the following code :
string[] training = traindocs[tr].Trim().Split('\n');
List <string> trainterms = new List<string>();
for (int i = 0; i < 1000; i++)
{
if (i >= training.Length)
break;
trainterms.Add(training[i].Trim().Split('\t')[0]);
}
Can I conduct this operation without using List or any other data structure? I mean just extract the specific number of the terms into the the Array (training) directly ?? thanks in advance.

How about LINQ? The .Take() extension method kind of seems to fit your bill:
List<string> trainterms = traindocs[tr].Trim().Split('\n').Take(1000).ToList();

According to MSDN you can use an overloaded version of the split method.
public string[] Split( char[] separator, int count,
StringSplitOptions options )
Parameters
separator Type: System.Char[] An array of Unicode characters that
delimit the substrings in this string, an empty array that contains no
delimiters, or null.
count Type: System.Int32 The maximum number of
substrings to return.
options Type: System.StringSplitOptions
StringSplitOptions.RemoveEmptyEntries to omit empty array elements
from the array returned; or StringSplitOptions.None to include empty
array elements in the array returned.
Return Value
Type: System.String[] An array whose elements contain the substrings
in this string that are delimited by one or more characters in
separator. For more information, see the Remarks section.
So something like so:
String str = "A,B,C,D,E,F,G,H,I";
String[] str2 = str.Split(new Char[]{','}, 5, StringSplitOptions.RemoveEmptyEntries);
System.Console.WriteLine(str2.Length);
System.Console.Read();
Would print: 5
EDIT:
Upon further investigation it seems that the count parameter just instructs when the splitting stops. The rest of the string will be kept in the last element.
So, the code above, would yield the following result:[0] = A, [1] = B, [2] = C, [3] = D, [4] = E,F,G,H,I, which is not something you seem to be after.
To fix this, you would need to do something like so:
String str = "A\nB\nC\nD\nE\nF\nG\nH\nI";
List<String> myList = str.Split(new Char[]{'\n'}, 5, StringSplitOptions.RemoveEmptyEntries).ToList<String>();
myList[myList.Count - 1] = myList[myList.Count - 1].Split(new Char[] { '\n' })[0];
System.Console.WriteLine(myList.Count);
foreach (String str1 in myList)
{
System.Console.WriteLine(str1);
}
System.Console.Read();
The code above will only retain the first 5 (in your case, 1000) elements. Thus, I think that Darin's solution might be cleaner, if you will.

If you want most efficient(fastest) way, you have to use overload of String.Split, passing total number of items required.
If you want easy way, use LINQ.

Related

How to remove Whitespce from stringArray formed based on whitespace

I have a string which contains value like.
90 524 000 1234567890 2207 1926 00:34 02:40 S
Now i have broken this string into string Array based on white-space.Now i want to create one more string array into such a way so that all the white-space gets removed and it contains only real value.
Also i want to get the position of the string array element from the original string array based on the selection from the new string array formed by removing white space.
Please help me.
You can use StringSplitOptions.RemoveEmptyEntries via String.Split.
var values = input.Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries: The return value does not include array elements that contain an empty string
When the Split method encounters two consecutive white-space it will return an empty string.Using StringSplitOptions.RemoveEmptyEntries will remove the empty strings and give you only the values you want.
You can also achieve this using LINQ
var values = input.Split().Where(x => x != string.Empty).ToArray();
Edit: If I understand you correctly you want the positions of the values in your old array. If so you can do this by creating a dictionary where the keys are the actual values and the values are indexes:
var oldValues = input.Split(' ');
var values = input.Split().Where(x => x != string.Empty).ToArray();
var indexes = values.ToDictionary(x => x, x => Array.IndexOf(oldValues, x));
Then indexes["1234567890"] will give you the position of 1234567890 in the first array.
You can use StringSplitOptions.RemoveEmptyEntries:
string[] arr = str.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
Note that i've also added tab character as delimiter. There are other white-space characters like the line separator character, add as desired. Full list here.
string s = "90 524 000 1234567890 2207 1926 00:34 02:40 S ";
s.Split(' ').Where(x=>!String.IsNullOrWhiteSpace(x))

String splitting with a special structure

I have strings of the following form:
str = "[int]:[int],[int]:[int],[int]:[int],[int]:[int], ..." (for undefined number of times).
What I did was this:
string[] str_split = str.Split(',');
for( int i = 0; i < str_split.Length; i++ )
{
string[] str_split2 = str_split[i].Split(':');
}
Unfortunately this breaks when some of the numbers have extra ',' inside a number. For example, we have something like this:
695,000:14,306,000:12,136000:12,363000:6
in which the followings are the numbers, ordered from the left to the right:
695,000
14
306,000
12
136000
12
363000
6
How can I resolve this string splitting problem?
If it is the case that only the number to the left of the colon separator can contain commas, then you could simply express this as:
string s = "695,000:14,306,000:12,136000:12,363000:6";
var parts = Regex.Split(s, #":|(?<=:\d+),");
The regex pattern, which identifies the separators, reads: "any colon, or any comma that follows a colon and a sequence of digits (but not another comma)".
A simple solution is split using : as delimiter. The resultant array will have numbers of the format [int],[int]. Parse through the array and split each entry using , as the delimiter. This will give you an array of [int] numbers.
It might not be the best way to do it and it might not work all the time but here's what I'd do.
string[] leftRightDoubles = str.Split(':');
foreach(string substring in leftRightDoubles){
string[] indivNumbers = str.Split(',');
//if indivNumbers.Length == 2, you know that these two are separate numbers
//if indivNumbers.Length > 2, use heuristics to determine which parts belong to which number
if(indivNumbers.Length > 2) {
for(int i = 0, i < indivNumbers.Length, i++) {
if(indivNumbers[i] != '000') { //Or use some other heuristic
//It's a new number
} else {
//It's the rest of previous number
}
}
}
}
//It's sort of pseudocode with comments (haven't touched C# in a while so I don't want to write full C# code)

Cannot convert string[] to char[]

I have a textbox (textBoxA), I would do a split of the content, being single letters would put them in a char[] array (I will not use the lists). Here is the code I used, where I'm wrong?
char[] but = textBoxA.Text.Split("-".ToCharArray());
If you don't mind the iteration, use Linq :) (using System.Linq;)
char[] but = textBoxA.Text.Split('-').Select(s => Convert.ToChar(s)).ToArray();
Consider what you are doing. String.Split returns an array of Strings (string[]). If you assume that your input will only be individual characters, then you can use:
char[] values = textBoxA.Text.Split(new [] { '-' }, StringSplitOptions.RemoveEmptyEntries).Select(e => e[0]).ToArray( );
Array of string will be returned. See
string[] but = textBoxA.Text.Split("-".ToCharArray());
Also,
string[] but = textBoxA.Text.Split('-');
Splitreturns a string array. If you want a char array with each of the strings in order, you have to loop to the array returned by Split, convert each string individually and Append (or Add, can't recall the correct syntax), the char array that results from the conversion to your destination array.
You can also use..
string s = "A-B-C-D-E";
char[] but = s.Split('-').Select(Convert.ToChar).ToArray();
...which is slightly shorter that one of the answers.

How to get all words over a specified length from a string?

I am writing a text analysis program for an assignment and need to write a function which will return all words over a specified length from a string (in this case all words with over 6 characters).
I have found plenty of examples which show how to return groups of words based on their lengths but none on how to get ALL the words over a specified length
static IEnumerable<string> getWordsWithMinLength(string text, int minLength)
{
string[] words = text.Split();
return words.Where(w => w.Length >= minLength);
}
String [] words = text.Split(new char[] {' '},
System.StringSplitOptions.RemoveEmptyEntries );
String [] filteredWords = words.Where(w => w.Length>6).ToArray();
Create a list of strings var list = new List<string>(),
loop through every word in your text,
if (word.Length > 6) { list.Add(word) },
and when you're done, return list;
VoilĂ !
At least you used the homework tag, this does scream "hey, do my work for me." What have you tried so far? Where are you having problems?
Break the problem down. It seems like you have 3 logical pieces:
1)From a string, get all words
2)From those words, find all that have a length greater than N
3)Return those words.
Check out String.Split() for #1, and .Where() in Linq to do the filtering.

Looking for the simplest way to extract tow strings from another in C#

I have the following strings:
string a = "1. testdata";
string b = "12. testdata xxx";
What I would like is to be able to extract the number into one string and the characters following the number into another. I tried using .IndexOf(".") and then remove, trim and
substrings. If possible I would like to find something simpler as I have this to do in a
lot of parts of my code.
if the format is always the same you could do:
a.Split('.');
Proposed solutions so far are not correct.
First, after Split('.') or Split(".") you will have space in the beginning of second substring.
Second, if you have more than one dot - you'll have to do something yet after the split.
More robust solution is below:
string a = "11. Test string. With dots.";
var res = a.Split(new[] {". "}, 2, StringSplitOptions.None);
string number = res[0];
string val = res[1];
Argument 2 specifies maximum number of strings to return. Thus when you have several dots - it will make a split only at the first.
string[]list = a.Split(".");
string numbers = list[0];
string chars = list[1];

Categories