Differentiate between tab-separated, space-separated and comma-separated streams - c#

I am using the following code to read a tab-delimited stream.
using (StreamReader readFile = new StreamReader(path))
{
string line;
string[] row;
while ((line = readFile.ReadLine()) != null)
{
row = line.Split('\t');
parsedData.Add(row);
}
}
However, occasionally a user may supply a space-separated or comma-separated file. How do I automatically detect the delimiter instead of having to change row = line.Split('\t'); to row = line.Split(' '); or row = line.Split(',');?
Thanks.

You can use the string.Split method to split your data on several delimiter characters at once:
var delims = new [] {',', '\t', ' ' };
var result = line.Split(delims, StringSplitOptions.RemoveEmptyEntries);
Or you can use Regex
var result = Regex.Split(line, @"[,\t ]+");

You can't differentiate between them beforehand.
What you can do is try to split on all of them:
row = line.Split('\t', ' ', ',');
This of course assumes that the data between delimiters doesn't contain the delimiters.
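To illustrate that caveat, here is a minimal sketch. The class and method names (`MultiDelimiterSplit`, `SplitRow`) are hypothetical and exist only for this example.

```csharp
using System;

public static class MultiDelimiterSplit
{
    // Splits on tab, space and comma at once, as in the answer above.
    public static string[] SplitRow(string line) => line.Split('\t', ' ', ',');
}
```

A comma-separated row such as `John Smith,42` has two fields, but `SplitRow` returns three pieces, because the space inside the name is also treated as a delimiter.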

You'll have to define what a separator is and how you detect it. If you say: "The separator for a file is the first non-quoted whitespace character I encounter on the first line", then you can read the first line and determine the separator. You can then pass that to the .Split() method.
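One possible rule, sketched below, is to count how often each candidate delimiter occurs in the first line and pick the most frequent one. This is only an illustration: it assumes the first line is representative and contains no quoted fields, and `DelimiterSniffer` and `Detect` are hypothetical names, not part of .NET.

```csharp
using System;
using System.Linq;

public static class DelimiterSniffer
{
    // Candidate delimiters; on a tie the earlier entry wins.
    private static readonly char[] Candidates = { '\t', ',', ' ' };

    // Returns the candidate that occurs most often in the sample line.
    // Assumption: no quoted fields containing embedded delimiters.
    // If no candidate occurs at all, the first candidate (tab) is returned.
    public static char Detect(string firstLine)
    {
        if (firstLine == null) throw new ArgumentNullException(nameof(firstLine));

        char best = Candidates[0];
        int bestCount = -1;
        foreach (char candidate in Candidates)
        {
            int count = firstLine.Count(c => c == candidate);
            if (count > bestCount)
            {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```

The reading loop could then call `DelimiterSniffer.Detect` once on the first line and pass the result to `line.Split(delimiter)` for every row.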

row = line.Split(new char[]{' ', ',', '\t'}, StringSplitOptions.RemoveEmptyEntries);

Related

Split String with multiple delimiters to different arrays in c#

So I have a text file copied into memory that is delimited as follows:
"425,9856\n852,9658\n"
This is a long string with some 30,000 entries in total. What I want to do is create two arrays, one for the values to the left of the comma and one for the values to the right of the comma, and then append to each array, respectively, the next pair of comma-delimited values that follows each "\n".
I have tried splitting using .Split and passing two delimiting values, but it obviously just creates one array with all values sequentially. Such as:
425
9856
852
9658
When what I want is:
array1:
425
852
array2:
9856
9658
Does that make sense?
many thanks
Since you're reading from a file, why not stream the input line-by-line, rather than reading the whole lot into memory in one go?
using var reader = new StreamReader(filePath);
while (reader.ReadLine() is { } line)
{
// Each line is of the form '425,9856', so just split on the comma
var parts = line.Split(',');
firstList.Add(parts[0]);
secondList.Add(parts[1]);
}
You can just split it twice to get what you want
public static void Main()
{
var foo = "425,9856" + Environment.NewLine + "852,9658" + Environment.NewLine;
var array1 = new List<string>();
var array2 = new List<string>();
string[] lines = foo.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None);
foreach(var line in lines)
{
//Console.WriteLine("line: " + line);
var lineSplit = line.Split(',');
//Console.WriteLine("lineSplit: " + lineSplit.Length);
//lineSplit.Dump();
if(lineSplit.Length > 1)
{
array1.Add(lineSplit[0]);
array2.Add(lineSplit[1]);
}
}
Console.WriteLine("Array1: ");
array1.Dump();
Console.WriteLine("Array2: ");
array2.Dump();
}
And here's a working fiddle of it.
You can use RegEx
string row = @"425,9856\n852,9658\n";
string left = @"[^|(?<=n)]\d*(?=,)";
string right = @"(?<=,)\d*(?=\\)";
Regex rgLeft = new Regex(left);
var l = rgLeft.Matches(row).Select(p=> p.Value);
Regex rgRight = new Regex(right);
var r = rgRight.Matches(row).Select(p=> p.Value);

String.Split for new lines working with string[] but not with char[]

check this code:
string t = @"\nazerty \n\nazerty \n\nazerty \nazerty";
string[] firstMethod = t.Split(new char[]{'\n'}, StringSplitOptions.RemoveEmptyEntries);
string[] secondMethod = t.Split(new string[]{@"\n"}, StringSplitOptions.RemoveEmptyEntries);
Why does the first method NOT work while the second one does?
Thx
This isn't working because you are using verbatim strings, i.e.:
string t = @"\nazerty \n\nazerty \n\nazerty \nazerty";
... is equivalent to:
string t = "\\nazerty \\n\\nazerty \\n\\nazerty \\nazerty";
It's likely that you actually wanted the following, which uses newline characters instead of literal backslash-n:
string t = "\nazerty \n\nazerty \n\nazerty \nazerty";
This would be "successfully" split on either new[] { "\n" } or new[] { '\n' } (but not on new[] { @"\n" }, which expects a literal backslash followed by n).
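The difference between the two kinds of literal can be checked with a small sketch. `VerbatimDemo` and `CountParts` are hypothetical names used only for illustration.

```csharp
using System;

public static class VerbatimDemo
{
    // Regular literal: \n is an escape sequence for one newline character.
    public const string Regular = "a\nb";

    // Verbatim literal: the backslash is kept, so this is 'a', '\', 'n', 'b'.
    public const string Verbatim = @"a\nb";

    // Number of non-empty pieces after splitting text on the given separator.
    public static int CountParts(string text, string separator) =>
        text.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

Here `Regular` has three characters and `Verbatim` has four. Splitting `Verbatim` on `"\n"` leaves it in one piece, while splitting it on `@"\n"` yields two pieces.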

MemoryStream to string[]

I read the content of a CSV file from a zip file in memory (the requirement is not to write to disk) into a MemoryStream, and use the following code to get a human-readable string:
string result = Encoding.ASCII.GetString(memoryStream.ToArray());
However, we would like the result to be a string[] to map each row in the CSV file.
Is there a way to handle this automatically?
Thanks
Firstly, there's no need to call ToArray on the memory stream. Just use a StreamReader, and call ReadLine() repeatedly:
memoryStream.Position = 0; // Rewind!
List<string> rows = new List<string>();
// Are you *sure* you want ASCII?
using (var reader = new StreamReader(memoryStream, Encoding.ASCII))
{
string line;
while ((line = reader.ReadLine()) != null)
{
rows.Add(line);
}
}
You can use the Split method to split the string on newlines:
string[] result = Encoding.ASCII
    .GetString(memoryStream.ToArray())
    .Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
Depending on the contents of your CSV file, this can be a much harder problem than you're giving it credit for.
Assume this is your CSV:
id, data1, data2
1, some data, more data
2, "This element has a new line
right in the middle of the field", and that can create problems if you're reading line by line
If you simply read this in line by line with reader.ReadLine(), you're not going to get what you want if you happen to have quoted fields with new lines in the middle (which is generally allowed in CSVs). You need something more like this:
List<string> results = new List<string>();
StringBuilder nextRow = new StringBuilder();
bool inQuote = false;
int next;
// reader is a TextReader such as a StreamReader; Read() returns -1 at end of input
while ((next = reader.Read()) != -1) {
    char nextChar = (char)next;
    if (nextChar == '"') {
        inQuote = !inQuote; // toggle quoted state; the quote character itself is not kept
    } else if (!inQuote && nextChar == '\n') {
        results.Add(nextRow.ToString()); // an unquoted newline ends the row
        nextRow.Length = 0;
    } else {
        nextRow.Append(nextChar);
    }
}
// Add the final row when the input does not end with a newline
if (nextRow.Length > 0) results.Add(nextRow.ToString());
Note that this handles double quotes. Missing quotes will be a problem, but they always are in CSV files.

Adding a Newline

The code below adds the characters \r\n to my string variable but once the string is returned the Newline is ignored.
Here is a snippet of the returned string: Mondavi\r\nrms_processtype
And here is the code where I add a Newline:
char[] charsToTrim = { ',', ' ' };
feed = feed.TrimEnd(charsToTrim) + Environment.NewLine;
Here's the code that throws an error when it attempts to read the "feed" variable:
var dict = feed.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None)
.SelectMany(s => s.Split('|')).ToDictionary(t => t.Split(',')[0], t => t.Split(',')[1]);
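Try StringSplitOptions.RemoveEmptyEntries. Because feed ends with Environment.NewLine, splitting with StringSplitOptions.None produces a trailing empty string, and t.Split(',')[1] then fails on that empty entry.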
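The effect of the two options on a string that ends with a newline can be seen in a minimal sketch. `TrailingNewlineDemo` and `SplitLines` are hypothetical names, and the sample input in the note is made up for illustration.

```csharp
using System;

public static class TrailingNewlineDemo
{
    // Splits on "\r\n" or "\n", mirroring the split in the question.
    public static string[] SplitLines(string feed, StringSplitOptions options) =>
        feed.Split(new[] { "\r\n", "\n" }, options);
}
```

For the input `"a,1\r\nb,2\r\n"`, `StringSplitOptions.None` gives three entries, the last of which is empty, while `StringSplitOptions.RemoveEmptyEntries` gives only the two real rows.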

C# Why do I only get partial results when parsing out a CSV or TSV file?

I am trying to get the second value from each row of a CSV file with 100 rows. I get the first 42 values and then it stops, with no error message (and no error handling at all, for that matter). I am perplexed and am on a timeline. It also happens with a TSV file, which gives the first 43 results. Please help and let me know if it looks strange to you.
I am using a StreamReader, reading each line into a string, splitting it into an array, taking the second value and adding it to a list:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
foreach (var line in path)
{
string s = sr.ReadLine();
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);
Your path variable is a string. That means when you foreach over it, you're getting a sequence of characters - 'C' then ':' then '\' etc. I don't think that's what you mean to do...
Here's a simpler approach using File.ReadLines:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
List<string> stkno = (from line in File.ReadLines(path)
let words = line.Split(',')
select words[1]).ToList();
Or:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
List<string> stkno = File.ReadLines(path)
.Select(line => line.Split(',')[1])
.ToList();
If you're using .NET 3.5 and you don't mind reading the whole file in one go, you can use File.ReadAllLines instead.
You are accidentally iterating over the characters in the file path instead of the lines in the file. This change should fix that:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
while (sr.Peek() >= 0)
{
string s = sr.ReadLine();
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);
How about this:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
var secondWords = from line in File.ReadAllLines(path)
let words = line.Split(',')
select words[1];
var message = string.Join(",", secondWords.ToArray());
I think you mean to do:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
string s;
while ((s = sr.ReadLine()) != null)
{
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);
