Convert a single DataRow into a CSV-like string in C#

How can I convert a single DataRow (of a DataTable) into a CSV-like string with only a few C# commands?
In other words, into a string like "value_1;value_2;...;value_n".

Quick and dirty for a single DataRow:
System.Data.DataRow row = ...;
string csvLine = String.Join(
CultureInfo.CurrentCulture.TextInfo.ListSeparator,
row.ItemArray);
If you do not care about the culture specific separator you may do this to convert a full DataTable:
public static string ToCsv(System.Data.DataTable table)
{
StringBuilder csv = new StringBuilder();
foreach (DataRow row in table.Rows)
csv.AppendLine(String.Join(";", row.ItemArray));
return csv.ToString();
}
Here is a more complex example if you need to handle a little bit of formatting for the values (in case they're not just numbers).
public static string ToCsv(DataTable table)
{
StringBuilder csv = new StringBuilder();
foreach (DataRow row in table.Rows)
{
for (int i = 0; i < row.ItemArray.Length; ++i)
{
if (i > 0)
csv.Append(CultureInfo.CurrentCulture.TextInfo.ListSeparator);
csv.Append(FormatValue(row.ItemArray[i]));
}
csv.AppendLine();
}
return csv.ToString();
}
Or, if you prefer LINQ (and assuming table is not empty):
public static string ToCsv(DataTable table, string separator = null)
{
if (separator == null)
separator = CultureInfo.CurrentCulture.TextInfo.ListSeparator;
return table.Rows
.Cast<DataRow>()
.Select(r => String.Join(separator, r.ItemArray.Select(c => FormatValue(c))))
.Aggregate(new StringBuilder(), (result, line) => result.AppendLine(line))
.ToString();
}
This uses the following private function to format a value. It is a very naive implementation: for non-primitive types you should use a TypeConverter (if any; see this nice library: Universal Type Converter) and quote the text only if needed (RFC 4180, section 2, rule 6):
private static string FormatValue(object value)
{
if (Object.ReferenceEquals(value, null))
return "";
Type valueType = value.GetType();
if (valueType.IsPrimitive || valueType == typeof(DateTime))
return value.ToString();
return String.Format("\"{0}\"",
value.ToString().Replace("\"", "\"\""));
}
Notes
Even though there is an RFC for CSV, many applications do not follow its rules and handle special cases in their own way (Microsoft Excel, for example, uses the locale's list separator instead of a comma, and it doesn't handle newlines in strings as required by the standard).

Here is a start:
StringBuilder line = new StringBuilder();
bool first = true;
foreach (object o in theDataRow.ItemArray) {
string s = o.ToString();
if (s.Contains("\"") || s.Contains(",")) {
s = "\"" + s.Replace("\"", "\"\"") + "\"";
}
if (first) {
first = false;
} else {
line.Append(',');
}
line.Append(s);
}
String csv = line.ToString();
It will handle quoted values and values containing the separator: a value that contains quotation marks or the separator needs to be surrounded by quotation marks, and the quotation marks inside it need to be escaped by doubling them.
Note that the code uses a comma as the separator, as that's what the C in CSV stands for. Some programs might be more comfortable using a semicolon.
Room for improvement: there are also other characters that should trigger quoting, like padding spaces and line breaks.
Note: Even if there is a standard defined for CSV files now, it's rarely followed because many programs were developed long before the standard existed. You just have to adapt to the peculiarities of any program that you need to communicate with.

Related

DateTime.Parse date conversion is not consistent

I have an Excel sheet where one column has a date. I have C# code to take that column, convert it to a date and then insert it to a SQL database.
The conversion is done like this:
r.transactionDate = DateTime.Parse(Convert.ToString(xlRange.Cells[i, 1].Value));
The data transfer is working without an issue.
When the date in the Excel sheet is something like 25/05/2022 (25th of May), it is converted properly and the date I get in the SQL database is the 25/05/2022.
However, the problem is when the date is something like 12/05/2022 (12th May); it is converted to 05/12/2022 (05th of December).
How can I fix this issue?
It's an excel sheet downloaded from Paypal that has all the dates of payments. There is a date column with dates.
Paypal does not offer a download in XLS or XLSX format. What you have is, most likely, a CSV file that Excel can open. It appears as an Excel file in your file system because Excel has registered the CSV file extension.
CSV (Comma-Separated Values) is a text file with all values represented as strings, separated by the , character. It has been around for a very long time, but was never formally specified. Or more precisely, formal specifications for CSV have been created several times, and nobody really knows which one is the right one. CSV files that work perfectly with one program can be unreadable to another.
Excel's CSV import is notorious for using US date formats regardless of your computer's regional settings. Each value is separated and examined individually. If the value looks like a date format then Excel attempts to parse it as a US date. If the first set of digits is in the range 1-12 then it is interpreted as the month. If the first set of digits is 13 or more then it tries the regional date strings, and may fall back to day-first if that fails. (This varies apparently.)
For this reason (among others) I strongly recommend that you never open a CSV file via Excel automation. Ever. In fact you should never open a CSV file with dates in it using Excel unless you know that the file was generated with month-first date formats only. Even then it is a needless waste of resources to open a text file with Excel.
You can do better.
There are plenty of libraries out there that will help you with CSV import. I've used a few (CsvHelper is a reasonable starting place), but usually I just write something simple to do the job. Read the file a line at a time (StreamReader.ReadLine), split each line into a collection of values (via my SplitCSV method), then write a ParseCSV method for the type that takes a collection of strings and returns a configured object. That way I have direct control over the way the input data is interpreted without having Excel mess things up for me.
Here's a simple example:
const string InputFile = @"C:\Temp\test.csv";
static void Main()
{
var rows = ReadCSV(InputFile, RowData.ParseCSV);
foreach (var row in rows)
{
Console.WriteLine($"#{row.RowNum}, Date: {row.SomeDate}, Amount: ${row.Amount:#,0.00}");
}
}
class RowData
{
public int RowNum { get; set; }
public DateTime SomeDate { get; set; }
public decimal Amount { get; set; }
public static RowData ParseCSV(string[] values)
{
if (values is null || values.Length < 3)
return null;
if (!int.TryParse(values[0], out var rownum) ||
!DateTime.TryParse(values[1], out var somedate) ||
!decimal.TryParse(values[2], out var amount)
)
return null;
return new RowData
{
RowNum = rownum,
SomeDate = somedate,
Amount = amount
};
}
}
static IEnumerable<T> ReadCSV<T>(string filename, Func<string[], T> parser, bool skipHeaders = true)
where T : class
{
bool first = true;
foreach (var line in Lines(filename))
{
if (first)
{
first = false;
if (skipHeaders)
continue;
}
T curr = null;
try
{
var values = SplitCSV(line);
curr = parser(values);
}
catch
{
// Do something here if you care about bad data.
}
if (curr != null)
yield return curr;
}
}
static string[] SplitCSV(string line)
{
var sb = new StringBuilder();
return internal_split().ToArray();
// The actual split, done as an enumerator.
IEnumerable<string> internal_split()
{
bool inQuote = false;
foreach (char c in line)
{
if (c == ',' && !inQuote)
{
// yield value
yield return sb.ToString();
sb.Clear();
}
else if (c == '"')
inQuote = !inQuote;
else
sb.Append(c);
}
// yield last field
yield return sb.ToString();
}
}
static IEnumerable<string> Lines(string filename)
{
string line;
using var reader = File.OpenText(filename);
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
(It's not the best way, but it's a way to do it.)
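One caveat about the example above: DateTime.TryParse uses the current culture, so on a machine with US regional settings a day-first string such as 12/05/2022 can still come out as the 5th of December. A minimal sketch of pinning the format explicitly, assuming the file always uses dd/MM/yyyy (an assumption about the export, so check a few rows first); the helper name ParseDayFirst is hypothetical:

```csharp
using System;
using System.Globalization;

public static class DateParsing
{
    // Hypothetical helper: parses a day-first date such as "12/05/2022".
    // The "dd/MM/yyyy" format is an assumption about the input file.
    // Returns null when the text does not match that format.
    public static DateTime? ParseDayFirst(string text)
    {
        bool ok = DateTime.TryParseExact(
            text,
            "dd/MM/yyyy",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out DateTime result);
        return ok ? result : (DateTime?)null;
    }
}
```

In RowData.ParseCSV the call to DateTime.TryParse(values[1], ...) could then be swapped for a check on ParseDayFirst(values[1]), so the interpretation no longer depends on the regional settings of the machine running the import.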

CSV export function: what to do if a string contains the separator character?

I use a function to convert a DataTable to CSV, and I use File.WriteAllText to save it to a file.
private static string DataTableToCSV(DataTable dtable, char seperator)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < dtable.Columns.Count; i++)
{
sb.Append(dtable.Columns[i]);
if (i < dtable.Columns.Count - 1)
sb.Append(seperator);
}
sb.AppendLine();
foreach (DataRow dr in dtable.Rows)
{
for (int i = 0; i < dtable.Columns.Count; i++)
{
sb.Append(dr[i].ToString());
if (i < dtable.Columns.Count - 1)
{
sb.Append(seperator);
}
}
sb.AppendLine();
}
return sb.ToString();
}
Well, the code is working. My problem is that the separator in the CSV is ';'. Now, of course, errors occur when a string in the table contains a semicolon. Is there perhaps an elegant way to solve the problem?
You should consider using a library to handle the CSV part for you. The current accepted answer handles quoting when the value contains the delimiter, but what happens when the value contains or starts with the quote character, or what if the value contains a newline? That approach will create an invalid file. The de-facto CSV standard specifies that fields should be quoted when the value contains a delimiter or a newline, and that quotes should be doubled up to "escape" them.
There are many libraries that can help with this, including one that I'm the author of: Sylvan.Data.Csv. Sylvan handles your scenarios in a very straightforward way:
using Sylvan.Data.Csv;
static string DataTableToCSV(DataTable dtable, char seperator)
{
using var sw = new StringWriter();
var opts = new CsvDataWriterOptions { Delimiter = seperator };
using var csvw = CsvDataWriter.Create(sw, opts);
csvw.Write(dtable.CreateDataReader());
return sw.ToString();
}
I wrote a little helper for formatting every line of my CSV.
private string FormatForCsv(string value) => value != null && value.Contains(';') ? value.Replace(value, "\"" + value + "\"") : value;
So then you can implement using:
sb.Append(FormatForCsv(dr[i]?.ToString()));
Of course you can use the same for the headers too.
I also added a null check when converting dr[i] to a string, just to be on the safe side.
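If a library is not an option, the FormatForCsv helper can be extended to cover the cases mentioned earlier in this thread (embedded quotes and line breaks, not only the separator). This is a rough sketch rather than a full CSV writer; the name QuoteIfNeeded is hypothetical, and it follows the usual convention of doubling embedded quotes:

```csharp
public static class CsvQuoting
{
    // Hypothetical helper: wraps a field in quotes only when it contains
    // the separator, a quote character, or a line break.
    public static string QuoteIfNeeded(string value, char separator)
    {
        if (string.IsNullOrEmpty(value))
            return "";

        bool needsQuotes =
            value.IndexOf(separator) >= 0 ||
            value.IndexOf('"') >= 0 ||
            value.IndexOf('\r') >= 0 ||
            value.IndexOf('\n') >= 0;

        if (!needsQuotes)
            return value;

        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

It would be called the same way as the helper above, for example sb.Append(CsvQuoting.QuoteIfNeeded(dr[i]?.ToString(), seperator)). Whether the consuming program accepts quoted line breaks is a separate question, so test against the actual reader.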

Parse Text File Into Dictionary

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02
This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// "  235235   Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { "  " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}
So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whitespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value, take everything from the end of the previous value to the colon as the next label and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless your entire development team is very proficient with them.
I would try something like this:
1. While the string contains a triple space, replace it with a double space.
2. Replace all ":  " (colon followed by a double space) and ": " with ":".
3. Replace all "  " (double space) with '\n' (new line).
4. If a line doesn't contain ':', skip it. Otherwise use string.Split(':'). This gives you arrays of 2 strings (key and value). Some of them may contain whitespace at the beginning or at the end.
5. Use string.Trim() to get rid of that whitespace.
6. Add the resulting key and value to the Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).
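The steps above can be sketched roughly as follows. This assumes that two pairs on the same line are separated by at least two spaces and that labels contain no colon; the class and method names are hypothetical. Instead of replacing double spaces with new lines, the sketch splits on them directly, which has the same effect for a single line:

```csharp
using System;
using System.Collections.Generic;

public static class LabelValueParser
{
    // Hypothetical helper following the replace-and-split steps above.
    public static Dictionary<string, string> ParseLine(string line)
    {
        var result = new Dictionary<string, string>();

        // Collapse runs of three or more spaces into exactly two.
        while (line.Contains("   "))
            line = line.Replace("   ", "  ");

        // Drop the spaces that follow a colon (double space first).
        line = line.Replace(":  ", ":").Replace(": ", ":");

        // A double space now separates one pair from the next.
        string[] pairs = line.Split(new[] { "  " }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Skip fragments without a colon; split only at the first colon.
            string[] parts = pair.Split(new[] { ':' }, 2);
            if (parts.Length < 2)
                continue;

            // Trim and store; the indexer overwrites a repeated label.
            result[parts[0].Trim()] = parts[1].Trim();
        }
        return result;
    }
}
```

Calling ParseLine once per line read from the file and merging the results would fill the dictionary for the whole file.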
This is probably the dirtiest function I've ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it work? Well, after splitting the line you know what you can expect in every position of the array, so I just play with the even and odd positions.
You will understand it when you debug this function :D. It fills the Dictionary that you need.
I have another idea. Do the values contain spaces? If not, you could do it like this:
1. Ignore white space until you read some other char (the first char of the key).
2. Read the string until ':' occurs.
3. Trim the key that you get.
4. Ignore white space until you read some other char (the first char of the value).
5. Read until you reach a whitespace character.
6. Trim the value that you get.
7. If it is the end, then stop. Otherwise, go back to step 1.
Good luck.
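A minimal sketch of that scan, under the stated assumption that values never contain white space (labels may). The names LabelScanner and Parse are hypothetical, and a repeated label simply overwrites the earlier value:

```csharp
using System.Collections.Generic;

public static class LabelScanner
{
    // Hypothetical helper: walks the text once, character by character.
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        int i = 0;
        while (i < text.Length)
        {
            // Skip white space before the key.
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            if (i >= text.Length) break;

            // Read the key up to the next colon.
            int keyStart = i;
            while (i < text.Length && text[i] != ':') i++;
            if (i >= text.Length) break; // trailing text without a colon is ignored
            string key = text.Substring(keyStart, i - keyStart).Trim();
            i++; // step over the colon

            // Skip white space before the value.
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;

            // Read the value up to the next white space character.
            int valueStart = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            result[key] = text.Substring(valueStart, i - valueStart);
        }
        return result;
    }
}
```

Because line breaks count as white space here, the whole file content can be passed in as one string. A label with an empty value would swallow the next label, so this only fits files where every label has a value.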
Maybe something like this would work, be careful with the ':' character
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend organizing that file in some way so that you'll always know what format it comes in.

What is the best way to parse this string in C#?

I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens) {
string key = token.Split(':')[0];
string value = token.Split(':')[1];
. . . .
}
The issue now is that some of the values have spaces in them so my "simplistic" split at the top no longer works. I wanted to see how I could still parse out the list of key value pairs (given space as a separator character) now that I know there also could be spaces in the value field as split doesn't seem like it's going to be able to work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.
Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
var key = match.Value.Split(':')[0];
var value = match.Value.Split(':')[1];
Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values that contain a colon. If you expect such input, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in a value is preceded by a space, though... I'll think about a solution to this.
This cannot work without changing your split from a space to something else such as a "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
Is this Key "Alfred Bester" & value "Alfred" or Key "Alfred" & value "Bester Alfred"?
string input = "foo:Foobarius Maximus Tiberius Kirk bar:Barforama zap:Zip Brannigan";
foreach (Match match in Regex.Matches(input, @"(\w+):([^:]+)(?![\w+:])"))
{
Console.WriteLine("{0} = {1}",
match.Groups[1].Value,
match.Groups[2].Value
);
}
Gives you:
foo = Foobarius Maximus Tiberius Kirk
bar = Barforama
zap = Zip Brannigan
You could try to Url encode the content between the space (The keys and the values not the : symbol) but this would require that you have control over the Input Method.
Or you could simply use another format (Like XML or JSON), but again you will need control over the Input Format.
If you can't control the input format, you could always use a regular expression that searches for single spaces followed by a word plus a colon.
Update (Thanks Jon Grant)
It appears that you can have spaces in the key and the value. If this is the case you will need to seriously rethink your strategy as even Regex won't help.
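For the case where only the values contain spaces, the "space followed by a word plus a colon" idea can be sketched with Regex.Split and a lookahead. This assumes keys match \w+ and that no value contains a space followed by "word:"; the names KeyValueSplitter and Parse are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyValueSplitter
{
    // Hypothetical helper: splits only at white space that is directly
    // followed by "word:", so spaces inside values are left alone.
    public static Dictionary<string, string> Parse(string input)
    {
        return Regex.Split(input.Trim(), @"\s+(?=\w+:)")
            // Split each pair at its first colon only.
            .Select(pair => pair.Split(new[] { ':' }, 2))
            // Drop fragments that had no colon (e.g. empty input).
            .Where(parts => parts.Length == 2)
            .ToDictionary(parts => parts[0], parts => parts[1]);
    }
}
```

The lookahead keeps the key out of the separator match, so nothing is consumed from the next pair. ToDictionary throws on a repeated key, which may or may not be what you want.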
string input = "key1:value key2:value key3:value";
Dictionary<string, string> dic = input.Split(' ').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
The first will produce an array:
"key:value", "key:value"
Then an array of arrays:
{ "key", "value" }, { "key", "value" }
And then a dictionary:
"key" => "value", "key" => "value"
Note, that Dictionary<K,V> doesn't allow duplicated keys, it will raise an exception in such a case. If such a scenario is possible, use ToLookup().
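A short sketch of the ToLookup variant, assuming the same space-separated key:value input with no spaces inside values; the wrapper class LookupExample is hypothetical:

```csharp
using System.Linq;

public static class LookupExample
{
    // Same pipeline as above, but ToLookup groups the values per key
    // instead of throwing when a key occurs more than once.
    public static ILookup<string, string> Parse(string input)
    {
        return input.Split(' ')
            .Select(x => x.Split(':'))
            .ToLookup(x => x[0], x => x[1]);
    }
}
```

Indexing the lookup with a key returns all values stored for it, in input order, and indexing with a key that is not present returns an empty sequence rather than throwing.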
Using a regular expression can solve your problem:
private void DoSplit(string str)
{
str = str.Trim() + " ";
string patterns = @"\w+:([\w+\s*])+[^!\w+:]";
var r = new System.Text.RegularExpressions.Regex(patterns);
var ms = r.Matches(str);
foreach (System.Text.RegularExpressions.Match item in ms)
{
string[] s = item.Value.Split(new char[] { ':' });
//Do something
}
}
This code will do it (given the rules below). It parses the keys and values and returns them in a Dictionary<string, string> data structure. I have added some code at the end that assumes, given your example, that the last value of the entire string/stream is followed by a [space]:
private Dictionary<string, string> ParseKeyValues(string input)
{
Dictionary<string, string> items = new Dictionary<string, string>();
string[] parts = input.Split(':');
string key = parts[0];
string value;
int currentIndex = 1;
while (currentIndex < parts.Length-1)
{
int indexOfLastSpace=parts[currentIndex].LastIndexOf(' ');
value = parts[currentIndex].Substring(0, indexOfLastSpace);
items.Add(key, value);
key = parts[currentIndex].Substring(indexOfLastSpace + 1);
currentIndex++;
}
value = parts[parts.Length - 1].Substring(0,parts[parts.Length - 1].Length-1);
items.Add(key, value);
return items;
}
Note: this algorithm assumes the following rules:
No spaces in the keys
No colons in the keys
No colons in the values
Without any Regex nor string concat, and as an enumerable (it supposes keys don't have spaces, but values can):
public static IEnumerable<KeyValuePair<string, string>> Split(string text)
{
if (text == null)
yield break;
int keyStart = 0;
int keyEnd = -1;
int lastSpace = -1;
for(int i = 0; i < text.Length; i++)
{
if (text[i] == ' ')
{
lastSpace = i;
continue;
}
if (text[i] == ':')
{
if (lastSpace >= 0)
{
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1, lastSpace - keyEnd - 1));
keyStart = lastSpace + 1;
}
keyEnd = i;
continue;
}
}
if (keyEnd >= 0)
yield return new KeyValuePair<string, string>(text.Substring(keyStart, keyEnd - keyStart), text.Substring(keyEnd + 1));
}
I guess you could take your method and expand upon it slightly to deal with this stuff...
Kind of pseudocode:
List<string> parsedTokens = new List<string>();
string[] tokens = myString.Split(' ');
for (int i = 0; i < tokens.Length; i++)
{
// A token is complete if it is the last item,
// or if the following item contains a colon (i.e. starts a new pair).
if (i == tokens.Length - 1 || tokens[i + 1].IndexOf(':') > -1)
{
parsedTokens.Add(tokens[i]);
}
else
{
// This bit needs to be refined to deal with values with multiple spaces...
parsedTokens.Add(tokens[i] + " " + tokens[i + 1]);
i++; // skip the token that was just merged
}
}
Another approach would be to split on the colon... That way, your first array item would be the name of the first key, second item would be the value of the first key and then name of the second key (can use LastIndexOf to split it out), and so on. This would obviously get very messy if the values can include colons, or the keys can contain spaces, but in that case you'd be pretty much out of luck...

How do i parse a text file in c# [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
How do i parse a text file in c#?
Check out this interesting approach, Linq To Text Files. It is very nice: you only need an IEnumerable<string> method that yields every file.ReadLine(), and then you write the query.
Here is another article that better explains the same technique.
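As a rough illustration of that technique (not taken from the linked articles): an iterator yields one line per ReadLine call, and ordinary LINQ operators run on top of it. The class name, the method names and the rule for what counts as a comment line are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineQueries
{
    // Yields each line lazily, one ReadLine call at a time.
    public static IEnumerable<string> ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Hypothetical query: non-blank lines that do not start with '#'.
    public static List<string> DataLines(string path)
    {
        return ReadLines(path)
            .Where(line => line.Trim().Length > 0
                        && !line.StartsWith("#", StringComparison.Ordinal))
            .ToList();
    }
}
```

The file is only read as far as the query consumes it, so something like ReadLines(path).Take(10) stops after ten lines. The framework's own File.ReadLines offers the same lazy enumeration without the hand-written iterator.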
using (TextReader rdr = new StreamReader(fullFilePath))
{
string line;
while ((line = rdr.ReadLine()) != null)
{
// use line here
}
}
set the variable "fullFilePath" to the full path eg. C:\temp\myTextFile.txt
The algorithm might look like this:
Open Text File
For every line in the file:
Parse Line
There are several approaches to parsing a line.
The easiest from a beginner standpoint is to use the String methods.
System.String at MSDN
If you are up for more of a challenge, then you can use the System.Text.RegularExpressions namespace to parse your text.
RegEx at MSDN
You might want to use a helper class such as the one described at http://www.blackbeltcoder.com/Articles/strings/a-text-parsing-helper-class.
From years of analyzing CSV files, including ones that are broken or have edge cases, here is my code that passes virtually all of my unit tests:
/// <summary>
/// Read in a line of text, and use the Add() function to add these items to the current CSV structure
/// </summary>
/// <param name="s"></param>
public static bool TryParseCSVLine(string s, char delimiter, char text_qualifier, out string[] array)
{
bool success = true;
List<string> list = new List<string>();
StringBuilder work = new StringBuilder();
for (int i = 0; i < s.Length; i++) {
char c = s[i];
// If we are starting a new field, is this field text qualified?
if ((c == text_qualifier) && (work.Length == 0)) {
int p2;
while (true) {
p2 = s.IndexOf(text_qualifier, i + 1);
// for some reason, this text qualifier is broken
if (p2 < 0) {
work.Append(s.Substring(i + 1));
i = s.Length;
success = false;
break;
}
// Append this qualified string
work.Append(s.Substring(i + 1, p2 - i - 1));
i = p2;
// If this is a double quote, keep going!
if (((p2 + 1) < s.Length) && (s[p2 + 1] == text_qualifier)) {
work.Append(text_qualifier);
i++;
// otherwise, this is a single qualifier, we're done
} else {
break;
}
}
// Does this start a new field?
} else if (c == delimiter) {
list.Add(work.ToString());
work.Length = 0;
// Test for special case: when the user has written a casual comma, space, and text qualifier, skip the space
// Checks if the second parameter of the if statement will pass through successfully
// e.g. "bob", "mary", "bill"
if (i + 2 <= s.Length - 1) {
if (s[i + 1].Equals(' ') && s[i + 2].Equals(text_qualifier)) {
i++;
}
}
} else {
work.Append(c);
}
}
list.Add(work.ToString());
// If we have nothing in the list, and it's possible that this might be a tab delimited list, try that before giving up
if (list.Count == 1 && delimiter != DEFAULT_TAB_DELIMITER) {
string[] tab_delimited_array = ParseLine(s, DEFAULT_TAB_DELIMITER, DEFAULT_QUALIFIER);
if (tab_delimited_array.Length > list.Count) {
array = tab_delimited_array;
return success;
}
}
// Return the array we parsed
array = list.ToArray();
return success;
}
However, this function does not actually parse every valid CSV file out there! Some files have embedded newlines in them, and you need to enable your stream reader to parse multiple lines together to return an array. Here's a tool that does that:
/// <summary>
/// Parse a line whose values may include newline symbols or CR/LF
/// </summary>
/// <param name="sr"></param>
/// <returns></returns>
public static string[] ParseMultiLine(StreamReader sr, char delimiter, char text_qualifier)
{
StringBuilder sb = new StringBuilder();
string[] array = null;
while (!sr.EndOfStream) {
// Read in a line
sb.Append(sr.ReadLine());
// Does it parse?
string s = sb.ToString();
if (TryParseCSVLine(s, delimiter, text_qualifier, out array)) {
return array;
}
}
// Fails to parse - return the best array we were able to get
return array;
}
For reference, I placed my open source CSV code on code.google.com.
If you have more than a trivial language, use a parser generator. It drove me nuts but I've heard good things about ANTLR (Note: get the manual and read it before you start. If you have used a parser generator other than it before you will not approach it correctly right off the bat, at least I didn't)
Other tools also exist.
What do you mean by parse? Parse usually means to split the input into tokens, which you might do if you're trying to implement a programming language. If you're just wanting to read the contents of a text file, look at System.IO.FileInfo.
Without really knowing what sort of text file you're on about, it's hard to answer. However, the FileHelpers library has a broad set of tools to help with fixed-length file formats, multirecord, delimited, etc.
A small improvement on Pero's answer:
FileInfo txtFile = new FileInfo(@"c:\myfile.txt");
if (!txtFile.Exists)
{
// error handling
}
using (TextReader rdr = txtFile.OpenText())
{
// use the text file as Pero suggested
}
The FileInfo class gives you the opportunity to "do stuff" with the file before you actually start reading from it. You can also pass it around between functions as a better abstraction of the file's location (rather than using the full path string). FileInfo canonicalizes the path so it's absolutely correct (e.g. turning / into \ where appropriate) and lets you extract extra data about the file -- parent directory, extension, name only, permissions, etc.
To begin with, make sure that you have the following namespaces:
using System.Collections;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;
Next, we build a function that parses any CSV input string into a DataTable:
public DataTable ParseCSV(string inputString) {
DataTable dt=new DataTable();
// declare the Regular Expression that will match versus the input string
Regex re=new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
ArrayList colArray=new ArrayList();
ArrayList rowArray=new ArrayList();
int colCount=0;
int maxColCount=0;
string rowbreak="";
string field="";
MatchCollection mc=re.Matches(inputString);
foreach(Match m in mc) {
// retrieve the field and replace two double-quotes with a single double-quote
field=m.Result("${field}").Replace("\"\"","\"");
rowbreak=m.Result("${rowbreak}");
if (field.Length > 0) {
colArray.Add(field);
colCount++;
}
if (rowbreak.Length > 0) {
// add the column array to the row Array List
rowArray.Add(colArray.ToArray());
// create a new Array List to hold the field values
colArray=new ArrayList();
if (colCount > maxColCount)
maxColCount=colCount;
colCount=0;
}
}
if (rowbreak.Length == 0) {
// this is executed when the last line doesn't
// end with a line break
rowArray.Add(colArray.ToArray());
if (colCount > maxColCount)
maxColCount=colCount;
}
// create the columns for the table
for(int i=0; i < maxColCount; i++)
dt.Columns.Add(String.Format("col{0:000}",i));
// convert the row Array List into an Array object for easier access
Array ra=rowArray.ToArray();
for(int i=0; i < ra.Length; i++) {
// create a new DataRow
DataRow dr=dt.NewRow();
// convert the column Array List into an Array object for easier access
Array ca=(Array)(ra.GetValue(i));
// add each field into the new DataRow
for(int j=0; j < ca.Length; j++)
dr[j]=ca.GetValue(j);
// add the new DataRow to the DataTable
dt.Rows.Add(dr);
}
// in case no data was parsed, create a single column
if (dt.Columns.Count == 0)
dt.Columns.Add("NoData");
return dt;
}
Now that we have a parser for converting a string into a DataTable, all we need now is a function that will read the content from a CSV file and pass it to our ParseCSV function:
public DataTable ParseCSVFile(string path) {
string inputString="";
// check that the file exists before opening it
if (File.Exists(path)) {
StreamReader sr = new StreamReader(path);
inputString = sr.ReadToEnd();
sr.Close();
}
return ParseCSV(inputString);
}
And now you can easily fill a DataGrid with data coming off the CSV file:
protected System.Web.UI.WebControls.DataGrid DataGrid1;
private void Page_Load(object sender, System.EventArgs e) {
// call the parser
DataTable dt=ParseCSVFile(Server.MapPath("./demo.csv"));
// bind the resulting DataTable to a DataGrid Web Control
DataGrid1.DataSource=dt;
DataGrid1.DataBind();
}
Congratulations! You are now able to parse CSV into a DataTable. Good luck with your programming.
