Issue with .NET String.Split - c#

I'm attempting to parse a text file containing data that is being used on a remote FTP server. The data is delimited by an equals sign (=) and I'm attempting to load each row into two columns in a DataGridView. The code I have written works fine except when an equals character appears in the second column's value. When this happens, the row is dropped, even though I specify a maximum count of 2. I'd prefer not to change the delimiter if possible.
Here is the problematic code:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
StreamReader fileReader = new StreamReader(fileLocation);
String fileData = fileReader.ReadToEnd();
String[] rows = fileData.Split("\n".ToCharArray());
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
String[] items = str.Split(new char[] { delimiter }, 2, StringSplitOptions.RemoveEmptyEntries);
if (items.Length == 2)
{
dataGrid_FileContents.Rows.Add(items[0], items[1]);
}
}
fileReader.Close();
And an example of the file being loaded:
boats=123
cats=234-f
cars==1
It works as intended for the first two rows and then ignores the last row as it ends up creating a String[] with 1 element and two String[]s with zero elements.

Try the following. It will capture the value before and after the first '=', correctly parsing the cars==1 scenario.
String[] items = str.Split(new char[] { delimiter }, 2, StringSplitOptions.None);
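As a rough check, here is a minimal sketch that runs a split with a count of 2 and StringSplitOptions.None against the three sample lines from the question. The names SplitDemo and SplitKeyValue are made up for illustration. Note that the value for the last row keeps its leading equals sign.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits a line at the first '=' only.
    // Returns null when the line contains no '='.
    public static string[] SplitKeyValue(string line)
    {
        string[] items = line.Split(new char[] { '=' }, 2, StringSplitOptions.None);
        return items.Length == 2 ? items : null;
    }

    public static void Main()
    {
        foreach (string line in new[] { "boats=123", "cats=234-f", "cars==1" })
        {
            string[] items = SplitKeyValue(line);
            if (items != null)
                Console.WriteLine(items[0] + " -> " + items[1]);
        }
        // Prints:
        // boats -> 123
        // cats -> 234-f
        // cars -> =1
    }
}
```

If the file uses Windows line endings, splitting on "\n" alone leaves a trailing "\r" on each value, so trimming the row first may be worthwhile.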

A different solution: if you want everything after the first equals sign, you could approach this problem using string.IndexOf:
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
int pos = str.IndexOf(delimiter);
if (pos != -1)
{
string first = str.Substring(0, pos); // second argument is a length, so this takes everything before the delimiter
string second = str.Substring(pos + 1);
dataGrid_FileContents.Rows.Add(first, second);
}
}

Just read all items delimited by '=' in each row.
Then iterate over the items, check that each item is not empty, and use this prepared data to fill the grid.
Here is a snippet that illustrates it:
http://dotnetfiddle.net/msVho2
Your snippet can be transformed into something like the code below:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
using(StreamReader fileReader = new StreamReader(fileLocation))
{
string[] data = new string[2];
while(true)
{
string row = fileReader.ReadLine();
if(row == null)
break;
string[] items = row.Split(delimiter);
int data_index = 0;
foreach(string item in items)
{
if(data_index >= data.Length)
{
//TODO: log warning
break;
}
if(!string.IsNullOrWhiteSpace(item))
{
data[data_index++] = item;
}
}
if(data_index < data.Length)
{
//TODO: log error, only 1 item in row
continue;
}
dataGrid_FileContents.Rows.Add(data[0], data[1]);
}
}

Related

Get Occurrences of Letters in A String

I am trying to count occurrences of letters in a string and almost got the result using the below code snippet:
public static void GetNoofLetters()
{
string str = "AAAAABBCCCDDDD";
int count = 1;
char[] charVal = str.ToCharArray();
List<string> charCnt = new List<string>();
string concat = "";
//Getting each letters using foreach loop
foreach (var ch in charVal)
{
int index = charCnt.FindIndex(c => c.Contains(ch.ToString())); //Checks if there's any existing letter in the list
if(index >= 0) //If letter exists, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
else
{
charCnt.Add(ch.ToString()); //If no matching letter exists, then add it to the list initially
count = 1;
}
}
foreach (var item in charCnt)
{
concat += item;
}
Console.WriteLine(concat.Trim());
}
The code works for the given input sample and returns the output 5A2B3C4D. Simple as that.
But say I have the following second input sample:
string str = "AAAAABBCCCDDDDAA";
Expected output:
5A2B3C4D2A
With the code above, I get the following output instead:
5A2B3C6A
This is caused by the following snippet, which matches a letter anywhere in the list but always overwrites the last entry:
if(index >= 0) //If letter found, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
Is there a better approach that gives the expected output for the second input sample? I believe I'm close and am probably missing something simple.
Code sample: Count Occurrences of Letters
Why not just loop over the value and count? There are two possibilities:
When the character c differs from current (a new run of characters starts), write out the previous run and start a new one.
Otherwise, add 1 to count.
Code:
private static string Compress(string value) {
if (string.IsNullOrEmpty(value))
return value;
char current = '\0';
int count = 0;
StringBuilder result = new StringBuilder(2 * value.Length);
foreach (char c in value) {
if (count != 0 && c != current) {
result.Append(count);
result.Append(current);
count = 0;
}
current = c;
count += 1;
}
result.Append(count);
result.Append(current);
return result.ToString();
}
Please, fiddle yourself
Well, I ended with the following code sample:
public static void Main()
{
string str = "AAAAABBCCCDDDDAABBBBAABB";
int count = 1;
char[] charVal = str.ToCharArray();
List<string> charCnt = new List<string>();
charCnt.Add("");
string concat = "";
//Getting each letters using foreach loop
foreach (var ch in charVal)
{
var lastItem = charCnt.LastOrDefault();
if (lastItem.EndsWith((ch.ToString()))) //If letter exists, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
else
{
charCnt.Add(ch.ToString()); //If no matching letter exists, then add it to the list initially
count = 1;
}
}
foreach (var item in charCnt)
{
concat += item; //Concatenate items from the list
}
Console.WriteLine(concat.Trim());
}
Here's a working sample: Get Occurrences of Letters in A String

How to reverse an array of strings without changing the position of special characters in C#

I'm working on reversing a sentence, and I'm able to do it. But I'm not sure how to reverse each word without changing the positions of the special characters. I'm using a regex, but as soon as it finds a special character it stops reversing the word.
Following is the code:
Console.WriteLine("Enter:");
string w = Console.ReadLine();
string rw = String.Empty;
String[] arr = w.Split(' ');
var regexItem = new Regex("^[a-zA-Z0-9]*$");
StringBuilder appendString = new StringBuilder();
for (int i = 0; i < arr.Length; i++)
{
char[] chararray = arr[i].ToCharArray();
for (int j = chararray.Length - 1; j >= 0; j--)
{
if (regexItem.IsMatch(rw))
{
rw = appendString.Append(chararray[j]).ToString();
}
}
appendString.Append(' ');
}
Console.WriteLine(rw);
Console.ReadLine();
Example : Input
Marshall! Hello.
Expected output
llahsram! olleh.
A basic solution with regex and LINQ. Try it online.
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static string Reverse(string source)
{
// we split by groups to keep delimiters
var parts = Regex.Split(source, @"([^a-zA-Z0-9])");
// if we got a group of valid characters
var results = parts.Select(x => x.All(char.IsLetterOrDigit)
// we reverse it
? new string(x.Reverse().ToArray())
// or we keep the delimiters as is
: x);
// then we concat all of them
return string.Concat(results);
}
The same solution without LINQ. Try it online.
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static bool IsLettersOrDigits(string s)
{
foreach (var c in s)
{
if (!char.IsLetterOrDigit(c))
{
return false;
}
}
return true;
}
public static string Reverse(char[] s)
{
Array.Reverse(s);
return new string(s);
}
public static string Reverse(string source)
{
var parts = Regex.Split(source, @"([^a-zA-Z0-9])");
var results = new List<string>();
foreach(var x in parts)
{
results.Add(IsLettersOrDigits(x)
? Reverse(x.ToCharArray())
: x);
}
return string.Concat(results);
}
This is a solution without LINQ. I wasn't sure about what are considered special characters.
string sentence = "Marshall! Hello.";
List<string> words = sentence.Split(' ').ToList();
List<string> reversedWords = new List<string>();
foreach (string word in words)
{
char[] arr = new char[word.Length];
for( int i=0; i<word.Length; i++)
{
if(!Char.IsLetterOrDigit((word[i])))
{
for ( int x=0; x< i; x++)
{
arr[x] = arr[x + 1];
}
arr[i] = word[i];
}
else
{
arr[word.Length - 1 - i] = word[i];
}
}
reversedWords.Add(new string(arr));
}
string reversedSentence = string.Join(" ", reversedWords);
Console.WriteLine(reversedSentence);
And this is the output:
Updated Output = llahsraM! olleH.
Here is a non-regex version that does what you want:
var sentence = "Hello, john!";
var parts = sentence.Split(' ');
var reversed = new StringBuilder();
var charPositions = sentence.Select((c, idx) => new { Char = c, Index = idx })
.Where(_ => !char.IsLetterOrDigit(_.Char));
for (int i = 0; i < parts.Length; i++)
{
var chars = parts[i].ToCharArray();
for (int j = chars.Length - 1; j >= 0; j--)
{
if (char.IsLetterOrDigit(chars[j]))
{
reversed.Append(chars[j]);
}
}
}
foreach (var ch in charPositions)
{
reversed.Insert(ch.Index, ch.Char);
}
// olleH, nhoj!
Console.WriteLine(reversed.ToString());
Basically the trick is to remember the position of special (i.e. non letter or digit) characters and insert them at the end to those positions.
This solution uses neither LINQ nor Regex. It may not be an efficient answer, but it works properly for small string values.
// This will reverse the string and special characters will just stay there.
public string ReverseString(string rString)
{
StringBuilder ss = new StringBuilder(rString);
int y = 0;
// The idea is to swap values. Like swapping first value with last one. It will keep swapping unless it reaches at the middle of the string where no swapping will be needed.
// This first loop is to detect first values.
for(int i=rString.Length-1;i>=0;i--)
{
// This condition is to check if the values is String or not. If it is not string then it is considered as special character which will just stay there at same old position.
if(Char.IsLetter(Convert.ToChar(rString.Substring(i,1))))
{
// This is second loop which is starting from end to swap values from end with first.
for (int k = y; k < rString.Length; k++)
{
// Again checking last values if values are string or not.
if (Char.IsLetter(Convert.ToChar(rString.Substring(k, 1))))
{
// This is swapping. So st1 is First value in that string
// st2 is the last item in that string
char st1 = Convert.ToChar(rString.Substring(k, 1));
char st2 = Convert.ToChar(rString.Substring(i, 1));
//This is swapping. So last item will go to first position and first item will go to last position, To make sure string is reversed.
// Remember when the string value is Special Character, swapping will move forward without swapping.
// Write by position; IndexOf would return the first occurrence of a repeated letter.
ss[i] = st1;
ss[k] = st2;
y++;
// When the swapping is done for first 2 items. The loop will stop to change the values.
break;
}
else
{
// This is just increment if value was Special character.
y++;
}
}
}
}
return ss.ToString();
}
Thanks!

How to split a string on the nth occurrence?

What I want to do is to split on the nth occurrence of a string (in this case it's "\t"). This is the code I'm currently using and it splits on every occurrence of "\t".
string[] items = input.Split(new char[] {'\t'}, StringSplitOptions.RemoveEmptyEntries);
If input = "one\ttwo\tthree\tfour", my code returns the array of:
one
two
three
four
But let's say I want to split it on every "\t" after the second "\t". So, it should return:
one two
three
four
There is nothing built in.
You can use the existing Split, use Take and Skip with string.Join to rebuild the parts that you originally had.
string[] items = input.Split(new char[] {'\t'},
StringSplitOptions.RemoveEmptyEntries);
string firstPart = string.Join("\t", items.Take(nthOccurrence));
string secondPart = string.Join("\t", items.Skip(nthOccurrence));
string[] everythingSplitAfterNthOccurence = items.Skip(nthOccurrence).ToArray();
An alternative is to iterate over all the characters in the string, find the index of the nth occurrence and substring before and after it (or find the next index after the nth, substring on that etc... etc... etc...).
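A minimal sketch of that index-based alternative, assuming a single-character separator. IndexOfNth and NthSplit are hypothetical names, not part of the framework.

```csharp
using System;

public static class NthSplit
{
    // Hypothetical helper: index of the nth (1-based) occurrence of
    // separator in input, or -1 if there are fewer than n occurrences.
    public static int IndexOfNth(string input, char separator, int n)
    {
        int index = -1;
        for (int found = 0; found < n; found++)
        {
            index = input.IndexOf(separator, index + 1);
            if (index == -1)
                return -1;
        }
        return index;
    }

    public static void Main()
    {
        string input = "one\ttwo\tthree\tfour";
        int pos = IndexOfNth(input, '\t', 2);
        if (pos != -1)
        {
            string before = input.Substring(0, pos);  // "one\ttwo"
            string after = input.Substring(pos + 1);  // "three\tfour"
            Console.WriteLine(before);
            Console.WriteLine(after);
        }
    }
}
```

The part after the nth separator can then be passed to the ordinary Split to get the remaining items individually.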
[EDIT] After re-reading the edited OP, I realise this doesn't do what is now asked. This will split on every nth target; the OP wants to split on every target AFTER the nth one.
I'll leave this here for posterity anyway.
If you were using the MoreLinq extensions you could take advantage of its Batch method.
Your code would then look like this:
string text = "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t13\t14\t15\t16\t17";
var splits = text.Split('\t').Batch(5);
foreach (var split in splits)
Console.WriteLine(string.Join("", split));
I'd probably just use Oded's implementation, but I thought I'd post this for an alternative approach.
The implementation of Batch() looks like this:
public static class EnumerableExt
{
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(this IEnumerable<TSource> source, int size)
{
TSource[] bucket = null;
var count = 0;
foreach (var item in source)
{
if (bucket == null)
bucket = new TSource[size];
bucket[count++] = item;
if (count != size)
continue;
yield return bucket;
bucket = null;
count = 0;
}
if (bucket != null && count > 0)
yield return bucket.Take(count);
}
}
It is likely that you will have to split and re-combine. Something like
int tabIndexToRemove = 3;
string str = "My\tstring\twith\tloads\tof\ttabs";
string[] strArr = str.Split('\t');
int numOfTabs = strArr.Length - 1;
if (tabIndexToRemove > numOfTabs)
throw new IndexOutOfRangeException();
str = String.Empty;
for (int i = 0; i < strArr.Length; i++)
str += i == tabIndexToRemove - 1 ?
strArr[i] : String.Format("{0}\t", strArr[i]);
Result:
My string withloads of tabs
I hope this helps.
// Return a substring of str up to, but not including,
// the nth occurrence of substr (JavaScript)
function getNth(str, substr, n) {
var idx;
var i = 0;
var newstr = '';
do {
idx = str.indexOf(substr);
newstr += str.substring(0, idx);
str = str.substring(idx + substr.length);
} while (++i < n && (newstr += substr))
return newstr;
}

Parsing data have blank array field showing

I am parsing my data output; however, my data has newline characters in it (\n). So when I run my code, the array is built and one of its elements (index 4) is blank. I have tried checking for null, "", and " ". Would anyone know how I can prevent that last element from showing?
char[] returnChar= {'\n' };
string parseText = captcha;
string[] words = parseText.Split(returnChar);
int count = words.Length;
for (int i = 0; i < count; i++)
{
if (words[i] == null)
{
MessageBox.Show("This row is empty: " + i);
}
MessageBox.Show(words[i]);
}
When doing String.Split, define the second parameter - StringSplitOptions.
string[] words =
parseText.Split(returnChar, StringSplitOptions.RemoveEmptyEntries);
This way it will skip over empty elements.
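A small sketch of the difference, using a made-up input that ends with a newline (the class name EmptyEntriesDemo is only for illustration). It also shows why a null check never fires: the blank element is an empty string, not null.

```csharp
using System;

public static class EmptyEntriesDemo
{
    public static void Main()
    {
        string text = "one\ntwo\nthree\n";  // sample input, ends with a newline
        char[] separators = { '\n' };

        string[] all = text.Split(separators);
        string[] nonEmpty = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        Console.WriteLine(all.Length);       // 4: the last element is ""
        Console.WriteLine(all[3] == null);   // False: it is empty, not null
        Console.WriteLine(nonEmpty.Length);  // 3
    }
}
```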

Split large string into smaller chunks in c#

I have a large string separated by newline character. This string contains 100 lines. I want to split these line into small chunks say chunk of 20 also based on newline character.
Let's say the string variable is like this,
Line1 This is line2 Line3 is here I am Line4
Now I want to split this large string variable into small chunks of 2. The result should be 2 strings as,
Line1 This is line2
Line3 is here I am Line4
Using Split function, I am not getting the expected results. Please help me in achieving this.
Thanks in advance,
Vijay
The simple approach (Split on Environment.NewLine, then loop and append):
public static List<string> GetStringSegments(string originalString, int linesPerSegment)
{
List<string> segments = new List<string>();
string[] allLines = originalString.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
int linesProcessed = 0;
for (int i = 0; i < allLines.Length; i++)
{
sb.AppendLine(allLines[i]);
linesProcessed++;
if (linesProcessed == linesPerSegment
|| i == allLines.Length-1)
{
segments.Add(sb.ToString());
sb.Clear();
linesProcessed = 0;
}
}
return segments;
}
The above approach is slightly inefficient since it first splits the string into individual lines, which creates unnecessary strings. A string of 1000 lines will create an array of 1000 strings. We can improve this if we just scan the string and search for \n:
public static List<string> GetStringSegments(string original, int linesPerSegment)
{
List<string> segments = new List<string>();
int startIndex = 0;
int newLinesEncountered = 0;
for (int i = 0; i < original.Length; i++)
{
if (original[i] == '\n')
{
newLinesEncountered++;
}
if (newLinesEncountered == linesPerSegment
|| i == original.Length - 1)
{
segments.Add(original.Substring(startIndex, (i - startIndex + 1)));
startIndex = i + 1;
newLinesEncountered = 0;
}
}
return segments;
}
You can use something like the batch operator from http://www.make-awesome.com/2010/08/batch-or-partition-a-collection-with-linq
string s = "[YOUR DATA]";
var lines = s.Split(new[]{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
foreach(var batch in lines.Batch(20))
{
foreach(var batchLine in batch)
{
Console.WriteLine(batchLine);
}
}
static class LinqEx
{
// from http://www.make-awesome.com/2010/08/batch-or-partition-a-collection-with-linq
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection,
int batchSize)
{
List<T> nextbatch = new List<T>(batchSize);
foreach (T item in collection)
{
nextbatch.Add(item);
if (nextbatch.Count == batchSize)
{
yield return nextbatch;
nextbatch = new List<T>(batchSize);
}
}
if (nextbatch.Count > 0)
yield return nextbatch;
}
}
As several people mentioned, using string.Split will split the whole string into memory, which might be an allocation-heavy operation. This is why we have the TextReader class and its descendants, which should provide better memory performance, and might also be clearer, logically:
// Assumed declarations: the collection of chunks and the running line count.
var newStringCollection = new List<StringBuilder>();
int lineCounter = 0;
StringWriter newStringWriter = null;
using (var reader = new StringReader(myString))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (string.IsNullOrEmpty(line))
continue; // skip blank lines
if (lineCounter % 20 == 0)
{
// start a new chunk every 20 lines
var newString = new StringBuilder();
newStringWriter = new StringWriter(newString);
newStringCollection.Add(newString);
}
newStringWriter.WriteLine(line);
lineCounter++;
}
}
We're using the StringReader to read our big string, one line at a time. And the corresponding StringWriter writes those lines to the new string, one line a time. After every 20 lines, we start a new StringBuilder (and the appropriate StringWriter wrapper).
Split the string by newline.
Then merge the required number of lines together as you consume them.
string s = "Line1\nThis is line2 \nLine3 is here\nI am Line4";
string[] str = s.Split('\n');
List<String> str1 = new List<String>();
for(int i=0; i<str.Length; i+=2)
{
string ss = str[i];
if(i+1 <str.Length)
ss += '\n' + str[i+1];
str1.Add(ss);
}
str = str1.ToArray();
The if condition inside the loop is needed because the length of str may be odd.
var strAray = myLongString.Split('\n').ToList();
var skip=0;
var take=20;
var chunk = strAray.Skip(skip).Take(take).ToList();
while(chunk.Count > 0)
{
foreach(var line in chunk)
{
// use line string
}
skip += take; // advance by a whole chunk, not by one line
chunk = strAray.Skip(skip).Take(take).ToList();
}
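On .NET 6 or later, the split-then-group idea from the answers above can also be sketched with the built-in Enumerable.Chunk method. This is only an illustration under that version assumption; SplitIntoChunks and ChunkDemo are hypothetical names.

```csharp
using System;
using System.Linq;

public static class ChunkDemo
{
    // Hypothetical helper: groups the lines of text into strings of
    // linesPerChunk lines each (the last one may be shorter).
    public static string[] SplitIntoChunks(string text, int linesPerChunk)
    {
        return text
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Chunk(linesPerChunk)                       // requires .NET 6+
            .Select(lines => string.Join("\n", lines))
            .ToArray();
    }

    public static void Main()
    {
        string s = "Line1\nThis is line2\nLine3 is here\nI am Line4";
        foreach (string chunk in SplitIntoChunks(s, 2))
        {
            Console.WriteLine(chunk);
            Console.WriteLine("---");
        }
    }
}
```

Like the other Split-based answers, this still materializes every line in memory, so the scanning or StringReader approaches remain preferable for very large strings.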
