How to write Efficiently write Large 2D arrays to CSV file - c#

EDIT: Using the solution provided, I was able to get it to work (not the most efficient but it works) using the following:
public static void WriteArray(object[,] dataArray, string filePath, bool deleteFileIfExists = true)
{
if (deleteFileIfExists && File.Exists(filePath))
{
File.Delete(filePath);
}
byte columnCount = (byte) dataArray.GetLength(1);
int rowCount = dataArray.GetLength(0);
for(int row = ReadExcel.ExcelIndex; row <= rowCount; row++)
{
List<string> stringList = new List<string>();
for(byte column = ReadExcel.ExcelIndex; column <= columnCount; column++)
{
stringList.Add(dataArray[row, column].ToString());
}
string rowAsString = StringifyRow(stringList.ToArray());
WriteLine(filePath, rowAsString);
}
}
public static string StringifyRow(string[] row)
{
return string.Join(",", row);
}
public static void WriteLine(string filePath, string rowAsString)
{
using (StreamWriter writer = new StreamWriter(filePath, append: true))
{
writer.WriteLine(rowAsString);
writer.Flush();
}
}
I am working on a data consolidation application. This app will read in data as 2D arrays, manipulate some of the columns in memory, consolidate arrays, then write to a csv file.
I am working on a test method of writing to CSV. Currently my array is just over 70K rows and 29 columns. In production, it will always be at least 4 times as many rows. For the test, I did not consolidate any arrays. This is just from one data source.
In order to write to CSV, I have tried numerous methods:
Pass the array (object[,]) to a helper method that loops through each row, creates a string from the values, then appends the string to a list, finally passes the list to the csvwriter to write the list of strings to a file. I get OutOfMemory errors when creating the list of strings.
The same as above but with StringBuilder and no List, just a giant string with stringbuilder (found on .netperls) - same error - OutofMemory exception
No helper method, just pass the array to the csvwriter method, loop through each row, build a string (I've tried this with string builder as well) and write each line one by one to the file. I get an OutOfMemory exception.
Some of my code (some various methods have been commented out) is below:
using System.IO;
using System.Text;
using System.Collections.Generic;
namespace ExcelIO
{
public static class CsvWriter
{
public static void WriteStringListCsv(List<string> data, string filePath, bool deleteIfExists = true)
{
if (deleteIfExists && File.Exists(filePath))
{
File.Delete(filePath);
}
//foreach(string record in data)
//{
// File.WriteAllText(filePath, record);
//}
using (StreamWriter outfile = new StreamWriter(filePath))
{
foreach (string record in data)
{
//trying to write data to csv
outfile.WriteLine(record);
}
}
}
public static void WriteDataArrayToCsv(List<string> data, string filePath, bool deleteIfExists = true)
{
if (deleteIfExists && File.Exists(filePath))
{
File.Delete(filePath);
}
//foreach(string record in data)
//{
// File.WriteAllText(filePath, record);
//}
//using (StreamWriter outfile = new StreamWriter(filePath))
//{
// File.WriteAllText(filePath, )
//}
using (StreamWriter outfile = new StreamWriter(filePath))
{
foreach(string record in data)
{
outfile.WriteLine(record);
}
}
}
public static List<string> ConvertArrayToStringList(object[,] dataArray, char delimiter=',', bool includeHeaders = true)
{
List<string> dataList = new List<string>();
byte colCount = (byte)dataArray.GetLength(1);
int rowCount = dataArray.GetLength(0);
int startingIndex = includeHeaders ? ReadExcel.ExcelIndex : ReadExcel.ExcelIndex + 1;
//StringBuilder dataAsString = new StringBuilder();
for (int rowIndex = startingIndex; rowIndex <= rowCount; rowCount++)
{
StringBuilder rowAsString = new StringBuilder();
//string rowAsString = "";
for (byte colIndex = ReadExcel.ExcelIndex; colIndex <= colCount; colIndex++)
{
//rowAsString += dataArray[rowIndex, colIndex];
//rowAsString += (colIndex == colCount) ? "" : delimiter.ToString();
// Wrap in Quotes
rowAsString.Append($"\"{dataArray[rowIndex, colIndex]}\"");
if (colIndex == colCount)
{
rowAsString.Append(delimiter.ToString());
}
}
// Move to nextLine
//dataAsString.AppendLine();
//outfile.WriteLine(rowAsString);
dataList.Add(rowAsString.ToString());
}
//return dataAsString.ToString();
return dataList;
}
}
}
I've tried everything I've seen from searching online but everything gives me an OutOfMemroy exception when piecing together the rows (even just doing one row at a time and writing that). Is there a better way to efficiently and hopefully quickly write a large 2D array to a csv file?
Any and all tips are greatly appreciated.

With code like below I've never had any memory issues, even with huge data
class Program
{
static void Main(string[] args)
{
StreamWriter writer = new StreamWriter("Filename");
List<Record> data = new List<Record>();
foreach (Record record in data)
{
string line = string.Join(",", record.field1, record.field2, record.field3, record.field4, record.field5);
writer.WriteLine(line);
}
writer.Flush();
writer.Close();
}
}
public class Record
{
public string field1 { get; set; }
public string field2 { get; set; }
public string field3 { get; set; }
public string field4 { get; set; }
public string field5 { get; set; }
}

Related

Read lines of data from CSV then display data

I have to read info from a txt file, store it in a manner (array or list), then display the data. Program must include at least one additional class.
I've hit a wall and can't progress.
string, string, double, string
name,badge,salary,position
name,badge,salary,position
name,badge,salary,position
I'm sorry and I know the code below is disastrous but I'm at a loss and am running out of time.
namespace Employees
{
class Program
{
static void Main()
{
IndividualInfo collect = new IndividualInfo();
greeting();
collect.ReadInfo();
next();
for (int i = 0; i < 5; i++)
{
displayInfo(i);
}
exit();
void greeting()
{
Console.WriteLine("\nWelcome to the Software Development Company\n");
}
void next()
{
Console.WriteLine("\n*Press enter key to display information . . . *");
Console.Read();
}
void displayInfo(int i)
{
Console.WriteLine($"\nSoftware Developer {i + 1} Information:");
Console.WriteLine($"\nName:\t\t\t{collect.nameList[i]}");
}
void exit()
{
Console.WriteLine("\n\n*Press enter key to exit . . . *");
Console.Read();
Console.Read();
}
}
}
}
class IndividualInfo
{
public string Name { get; set; }
//public string Badge{ get; set; }
//public string Position{ get; set; }
//public string Salary{ get; set; }
public void ReadInfo()
{
int i = 0;
string inputLine;
string[] eachLine = new string[4];
string[,] info = new string[5, 4]; // 5 developers, 4x info each
StreamReader file = new StreamReader("data.txt");
while ((inputLine = file.ReadLine()) != null)
{
eachLine = inputLine.Split(',');
for (int x = 0; x < 5; x++)
{
eachLine[x] = info[i, x];
x++;
}
i++;
}
string name = info[i, 0];
string badge = info[i, 1];
string position = info[i, 2];
double salary = Double.Parse(info[i, 3]);
}
public List<string> nameList = new List<string>();
}
So far I think I can collect it with a two-dimensional array, but a List(s) would be better. Also, the code I've posted up there won't run because I can't yet figure out a way to get it to display. Which is why I'm here.
using System.IO;
static void Main(string[] args)
{
using(var reader = new StreamReader(#"C:\test.csv"))
{
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
listB.Add(values[1]);
}
}
}
https://www.rfc-editor.org/rfc/rfc4180
or
using Microsoft.VisualBasic.FileIO;
var path = #"C:\Person.csv"; // Habeeb, "Dubai Media City, Dubai"
using (TextFieldParser csvParser = new TextFieldParser(path))
{
csvParser.CommentTokens = new string[] { "#" };
csvParser.SetDelimiters(new string[] { "," });
csvParser.HasFieldsEnclosedInQuotes = true;
// Skip the row with the column names
csvParser.ReadLine();
while (!csvParser.EndOfData)
{
// Read current line fields, pointer moves to the next line.
string[] fields = csvParser.ReadFields();
string Name = fields[0];
string Address = fields[1];
}
}
http://codeskaters.blogspot.ae/2015/11/c-easiest-csv-parser-built-in-net.html
or
LINQ way:
var lines = File.ReadAllLines("test.txt").Select(a => a.Split(';'));
var csv = from line in lines
select (from piece in line
select piece);
^^Wrong - Edit by Nick
It appears the original answerer was attempting to populate csv with a 2 dimensional array - an array containing arrays. Each item in the first array contains an array representing that line number with each item in the nested array containing the data for that specific column.
var csv = from line in lines
select (line.Split(',')).ToArray();
This question was fully addressed here:
Reading CSV file and storing values into an array

Compare text files in C# and remove duplicate lines

1.txt:
Origination,destination,datetime,price
YYZ,YTC,2016-04-01 12:30,$550
YYZ,YTC,2016-04-01 12:30,$550
LKC,LKP,2016-04-01 12:30,$550
2.txt:
Origination|destination|datetime|price
YYZ|YTC|2016-04-01 12:30|$550
AMV|YRk|2016-06-01 12:30|$630
LKC|LKP|2016-12-01 12:30|$990
I have two text files with ',' and '|' as separators, and I want to create a console app in C# which reads these two files when I pass an origination and destination location from command prompt.
While searching, I want to ignore duplicate lines, and I want to display the results in order by price.
The output should be { origination } -> { destination } -> datetime -> price
Need help how to perform.
Here's a simple solution that works for your example files. It doesn't have any error checking for if the file is in a bad format.
using System;
using System.Collections.Generic;
class Program
{
class entry
{
public string origin;
public string destination;
public DateTime time;
public double price;
}
static void Main(string[] args)
{
List<entry> data = new List<entry>();
//parse the input files and add the data to a list
ParseFile(data, args[0], ',');
ParseFile(data, args[1], '|');
//sort the list (by price first)
data.Sort((a, b) =>
{
if (a.price != b.price)
return a.price > b.price ? 1 : -1;
else if (a.origin != b.origin)
return string.Compare(a.origin, b.origin);
else if (a.destination != b.destination)
return string.Compare(a.destination, b.destination);
else
return DateTime.Compare(a.time, b.time);
});
//remove duplicates (list must be sorted for this to work)
int i = 1;
while (i < data.Count)
{
if (data[i].origin == data[i - 1].origin
&& data[i].destination == data[i - 1].destination
&& data[i].time == data[i - 1].time
&& data[i].price == data[i - 1].price)
data.RemoveAt(i);
else
i++;
}
//print the results
for (i = 0; i < data.Count; i++)
Console.WriteLine("{0}->{1}->{2:yyyy-MM-dd HH:mm}->${3}",
data[i].origin, data[i].destination, data[i].time, data[i].price);
Console.ReadLine();
}
private static void ParseFile(List<entry> data, string filename, char separator)
{
using (System.IO.FileStream fs = System.IO.File.Open(filename, System.IO.FileMode.Open))
using (System.IO.StreamReader reader = new System.IO.StreamReader(fs))
while (!reader.EndOfStream)
{
string[] line = reader.ReadLine().Split(separator);
if (line.Length == 4)
{
entry newitem = new entry();
newitem.origin = line[0];
newitem.destination = line[1];
newitem.time = DateTime.Parse(line[2]);
newitem.price = double.Parse(line[3].Substring(line[3].IndexOf('$') + 1));
data.Add(newitem);
}
}
}
}
I'm not 100% clear on what the output of your program is supposed to be, so I'll leave that part of the implementation up to you. My strategy was to use a constructor method that takes a string (that you will read from a file) and a delimiter (since it varies) and use that to create objects which you can manipulate (e.g. add to hash sets, etc).
PriceObject.cs
using System;
using System.Globalization;
namespace ConsoleApplication1
{
class PriceObject
{
public string origination { get; set; }
public string destination { get; set; }
public DateTime time { get; set; }
public decimal price { get; set; }
public PriceObject(string inputLine, char delimiter)
{
string[] parsed = inputLine.Split(new char[] { delimiter }, 4);
origination = parsed[0];
destination = parsed[1];
time = DateTime.ParseExact(parsed[2], "yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);
price = Decimal.Parse(parsed[3], NumberStyles.Currency, new CultureInfo("en-US"));
}
public override bool Equals(object obj)
{
var item = obj as PriceObject;
return origination.Equals(item.origination) &&
destination.Equals(item.destination) &&
time.Equals(item.time) &&
price.Equals(item.price);
}
public override int GetHashCode()
{
unchecked
{
var result = 17;
result = (result * 23) + origination.GetHashCode();
result = (result * 23) + destination.GetHashCode();
result = (result * 23) + time.GetHashCode();
result = (result * 23) + price.GetHashCode();
return result;
}
}
}
}
Program.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
HashSet<PriceObject> list1 = new HashSet<PriceObject>();
HashSet<PriceObject> list2 = new HashSet<PriceObject>();
using (StreamReader reader = File.OpenText(args[0]))
{
string line = reader.ReadLine(); // this will remove the header row
while (!reader.EndOfStream)
{
line = reader.ReadLine();
if (String.IsNullOrEmpty(line))
continue;
// add each line to our list
list1.Add(new PriceObject(line, ','));
}
}
using (StreamReader reader = File.OpenText(args[1]))
{
string line = reader.ReadLine(); // this will remove the header row
while (!reader.EndOfStream)
{
line = reader.ReadLine();
if (String.IsNullOrEmpty(line))
continue;
// add each line to our list
list2.Add(new PriceObject(line, '|'));
}
}
// merge the two hash sets, order by price
list1.UnionWith(list2);
List<PriceObject> output = list1.ToList();
output.OrderByDescending(x => x.price).ToList();
// display output here, e.g. define your own ToString method, etc
foreach (var item in output)
{
Console.WriteLine(item.ToString());
}
Console.ReadLine();
}
}
}

Regex to get all "cells" form csv file row [duplicate]

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.
Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?
Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.
A CSV parser is now a part of .NET Framework.
Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name)
using (TextFieldParser parser = new TextFieldParser(#"c:\temp\test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
The docs are here - TextFieldParser Class
P.S. If you need a CSV exporter, try CsvExport (discl: I'm one of the contributors)
CsvHelper (a library I maintain) will read a CSV file into custom objects.
using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
var records = csv.GetRecords<Foo>();
}
Sometimes you don't own the objects you're trying to read into. In this case, you can use fluent mapping because you can't put attributes on the class.
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.Property1 ).Name( "Column Name" );
Map( m => m.Property2 ).Index( 4 );
Map( m => m.Property3 ).Ignore();
Map( m => m.Property4 ).TypeConverter<MySpecialTypeConverter>();
}
}
Let a library handle all the nitty-gritty details for you! :-)
Check out FileHelpers and stay DRY - Don't Repeat Yourself - no need to re-invent the wheel a gazillionth time....
You basically just need to define that shape of your data - the fields in your individual line in the CSV - by means of a public class (and so well-thought out attributes like default values, replacements for NULL values and so forth), point the FileHelpers engine at a file, and bingo - you get back all the entries from that file. One simple operation - great performance!
In a business application, i use the Open Source project on codeproject.com, CSVReader.
It works well, and has good performance. There is some benchmarking on the link i provided.
A simple example, copied from the project page:
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));
Console.WriteLine();
}
}
As you can see, it's very easy to work with.
I know its a bit late but just found a library Microsoft.VisualBasic.FileIO which has TextFieldParser class to process csv files.
Here is a helper class I use often, in case any one ever comes back to this thread (I wanted to share it).
I use this for the simplicity of porting it into projects ready to use:
public class CSVHelper : List<string[]>
{
protected string csv = string.Empty;
protected string separator = ",";
public CSVHelper(string csv, string separator = "\",\"")
{
this.csv = csv;
this.separator = separator;
foreach (string line in Regex.Split(csv, System.Environment.NewLine).ToList().Where(s => !string.IsNullOrEmpty(s)))
{
string[] values = Regex.Split(line, separator);
for (int i = 0; i < values.Length; i++)
{
//Trim values
values[i] = values[i].Trim('\"');
}
this.Add(values);
}
}
}
And use it like:
public List<Person> GetPeople(string csvContent)
{
List<Person> people = new List<Person>();
CSVHelper csv = new CSVHelper(csvContent);
foreach(string[] line in csv)
{
Person person = new Person();
person.Name = line[0];
person.TelephoneNo = line[1];
people.Add(person);
}
return people;
}
[Updated csv helper: bug fixed where the last new line character created a new line]
If you need only reading csv files then I recommend this library: A Fast CSV Reader
If you also need to generate csv files then use this one: FileHelpers
Both of them are free and opensource.
This solution is using the official Microsoft.VisualBasic assembly to parse CSV.
Advantages:
delimiter escaping
ignores Header
trim spaces
ignore comments
Code:
using Microsoft.VisualBasic.FileIO;
public static List<List<string>> ParseCSV (string csv)
{
List<List<string>> result = new List<List<string>>();
// To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project.
using (TextFieldParser parser = new TextFieldParser(new StringReader(csv)))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
//parser.ReadLine();
while (!parser.EndOfData)
{
var values = new List<string>();
var readFields = parser.ReadFields();
if (readFields != null)
values.AddRange(readFields);
result.Add(values);
}
}
return result;
}
I have written TinyCsvParser for .NET, which is one of the fastest .NET parsers around and highly configurable to parse almost any CSV format.
It is released under MIT License:
https://github.com/bytefish/TinyCsvParser
You can use NuGet to install it. Run the following command in the Package Manager Console.
PM> Install-Package TinyCsvParser
Usage
Imagine we have list of Persons in a CSV file persons.csv with their first name, last name and birthdate.
FirstName;LastName;BirthDate
Philipp;Wagner;1986/05/12
Max;Musterman;2014/01/02
The corresponding domain model in our system might look like this.
private class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime BirthDate { get; set; }
}
When using TinyCsvParser you have to define the mapping between the columns in the CSV data and the property in you domain model.
private class CsvPersonMapping : CsvMapping<Person>
{
public CsvPersonMapping()
: base()
{
MapProperty(0, x => x.FirstName);
MapProperty(1, x => x.LastName);
MapProperty(2, x => x.BirthDate);
}
}
And then we can use the mapping to parse the CSV data with a CsvParser.
namespace TinyCsvParser.Test
{
[TestFixture]
public class TinyCsvParserTest
{
[Test]
public void TinyCsvTest()
{
CsvParserOptions csvParserOptions = new CsvParserOptions(true, new[] { ';' });
CsvPersonMapping csvMapper = new CsvPersonMapping();
CsvParser<Person> csvParser = new CsvParser<Person>(csvParserOptions, csvMapper);
var result = csvParser
.ReadFromFile(#"persons.csv", Encoding.ASCII)
.ToList();
Assert.AreEqual(2, result.Count);
Assert.IsTrue(result.All(x => x.IsValid));
Assert.AreEqual("Philipp", result[0].Result.FirstName);
Assert.AreEqual("Wagner", result[0].Result.LastName);
Assert.AreEqual(1986, result[0].Result.BirthDate.Year);
Assert.AreEqual(5, result[0].Result.BirthDate.Month);
Assert.AreEqual(12, result[0].Result.BirthDate.Day);
Assert.AreEqual("Max", result[1].Result.FirstName);
Assert.AreEqual("Mustermann", result[1].Result.LastName);
Assert.AreEqual(2014, result[1].Result.BirthDate.Year);
Assert.AreEqual(1, result[1].Result.BirthDate.Month);
Assert.AreEqual(1, result[1].Result.BirthDate.Day);
}
}
}
User Guide
A full User Guide is available at:
http://bytefish.github.io/TinyCsvParser/
Here is a short and simple solution.
using (TextFieldParser parser = new TextFieldParser(outputLocation))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
string[] headers = parser.ReadLine().Split(',');
foreach (string header in headers)
{
dataTable.Columns.Add(header);
}
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
dataTable.Rows.Add(fields);
}
}
Here is my KISS implementation...
using System;
using System.Collections.Generic;
using System.Text;
class CsvParser
{
public static List<string> Parse(string line)
{
const char escapeChar = '"';
const char splitChar = ',';
bool inEscape = false;
bool priorEscape = false;
List<string> result = new List<string>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < line.Length; i++)
{
char c = line[i];
switch (c)
{
case escapeChar:
if (!inEscape)
inEscape = true;
else
{
if (!priorEscape)
{
if (i + 1 < line.Length && line[i + 1] == escapeChar)
priorEscape = true;
else
inEscape = false;
}
else
{
sb.Append(c);
priorEscape = false;
}
}
break;
case splitChar:
if (inEscape) //if in escape
sb.Append(c);
else
{
result.Add(sb.ToString());
sb.Length = 0;
}
break;
default:
sb.Append(c);
break;
}
}
if (sb.Length > 0)
result.Add(sb.ToString());
return result;
}
}
Some time ago I had wrote simple class for CSV read/write based on Microsoft.VisualBasic library. Using this simple class you will be able to work with CSV like with 2 dimensions array. You can find my class by the following link: https://github.com/ukushu/DataExporter
Simple example of usage:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5")
csv.FileSave("c:\\file2.csv");
For reading header only you need is to read csv.Rows[0] cells :)
This code reads csv to DataTable:
public static DataTable ReadCsv(string path)
{
DataTable result = new DataTable("SomeData");
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
bool isFirstRow = true;
//IList<string> headers = new List<string>();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (isFirstRow)
{
foreach (string field in fields)
{
result.Columns.Add(new DataColumn(field, typeof(string)));
}
isFirstRow = false;
}
else
{
int i = 0;
DataRow row = result.NewRow();
foreach (string field in fields)
{
row[i++] = field;
}
result.Rows.Add(row);
}
}
}
return result;
}
Single source file solution for straightforward parsing needs, useful. Deals with all the nasty edge cases. Such as new line normalization and handling new lines in quoted string literals. Your welcome!
If you CSV file has a header you just read out the column names (and compute column indexes) from the first row. Simple as that.
Note that Dump is a LINQPad method, you might want to remove that if you are not using LINQPad.
void Main()
{
var file1 = "a,b,c\r\nx,y,z";
CSV.ParseText(file1).Dump();
var file2 = "a,\"b\",c\r\nx,\"y,z\"";
CSV.ParseText(file2).Dump();
var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
CSV.ParseText(file3).Dump();
var file4 = "\"\"\"\"";
CSV.ParseText(file4).Dump();
}
static class CSV
{
public struct Record
{
public readonly string[] Row;
public string this[int index] => Row[index];
public Record(string[] row)
{
Row = row;
}
}
public static List<Record> ParseText(string text)
{
return Parse(new StringReader(text));
}
public static List<Record> ParseFile(string fn)
{
using (var reader = File.OpenText(fn))
{
return Parse(reader);
}
}
public static List<Record> Parse(TextReader reader)
{
var data = new List<Record>();
var col = new StringBuilder();
var row = new List<string>();
for (; ; )
{
var ln = reader.ReadLine();
if (ln == null) break;
if (Tokenize(ln, col, row))
{
data.Add(new Record(row.ToArray()));
row.Clear();
}
}
return data;
}
public static bool Tokenize(string s, StringBuilder col, List<string> row)
{
int i = 0;
if (col.Length > 0)
{
col.AppendLine(); // continuation
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
while (i < s.Length)
{
var ch = s[i];
if (ch == ',')
{
row.Add(col.ToString().Trim());
col.Length = 0;
i++;
}
else if (ch == '"')
{
i++;
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
else
{
col.Append(ch);
i++;
}
}
if (col.Length > 0)
{
row.Add(col.ToString().Trim());
col.Length = 0;
}
return true;
}
public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
{
while (i < s.Length)
{
var ch = s[i];
if (ch == '"')
{
// escape sequence
if (i + 1 < s.Length && s[i + 1] == '"')
{
col.Append('"');
i++;
i++;
continue;
}
i++;
return true;
}
else
{
col.Append(ch);
i++;
}
}
return false;
}
}
Another one to this list, Cinchoo ETL - an open source library to read and write multiple file formats (CSV, flat file, Xml, JSON etc)
Sample below shows how to read CSV file quickly (No POCO object required)
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
Sample below shows how to read CSV file using POCO object
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
static void CSVTest()
{
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
}
Please check out articles at CodeProject on how to use it.
This parser supports nested commas and quotes in a column:
static class CSVParser
{
public static string[] ParseLine(string line)
{
List<string> cols = new List<string>();
string value = null;
for(int i = 0; i < line.Length; i++)
{
switch(line[i])
{
case ',':
cols.Add(value);
value = null;
if(i == line.Length - 1)
{// It ends with comma
cols.Add(null);
}
break;
case '"':
cols.Add(ParseEnclosedColumn(line, ref i));
i++;
break;
default:
value += line[i];
if (i == line.Length - 1)
{// Last character
cols.Add(value);
}
break;
}
}
return cols.ToArray();
}//ParseLine
static string ParseEnclosedColumn(string line, ref int index)
{// Example: "b"",bb"
string value = null;
int numberQuotes = 1;
int index2 = index;
for (int i = index + 1; i < line.Length; i++)
{
index2 = i;
switch (line[i])
{
case '"':
numberQuotes++;
if (numberQuotes % 2 == 0)
{
if (i < line.Length - 1 && line[i + 1] == ',')
{
index = i;
return value;
}
}
else if (i > index + 1 && line[i - 1] == '"')
{
value += '"';
}
break;
default:
value += line[i];
break;
}
}
index = index2;
return value;
}//ParseEnclosedColumn
}//class CSVParser
Based on unlimit's post on How to properly split a CSV using C# split() function? :
string[] tokens = System.Text.RegularExpressions.Regex.Split(paramString, ",");
NOTE: this doesn't handle escaped / nested commas, etc., and therefore is only suitable for certain simple CSV lists.
If anyone wants a snippet they can plop into their code without having to bind a library or download a package. Here is a version I wrote:
public static string FormatCSV(List<string> parts)
{
string result = "";
foreach (string s in parts)
{
if (result.Length > 0)
{
result += ",";
if (s.Length == 0)
continue;
}
if (s.Length > 0)
{
result += "\"" + s.Replace("\"", "\"\"") + "\"";
}
else
{
// cannot output double quotes since its considered an escape for a quote
result += ",";
}
}
return result;
}
enum CSVMode
{
CLOSED = 0,
OPENED_RAW = 1,
OPENED_QUOTE = 2
}
public static List<string> ParseCSV(string input)
{
List<string> results;
CSVMode mode;
char[] letters;
string content;
mode = CSVMode.CLOSED;
content = "";
results = new List<string>();
letters = input.ToCharArray();
for (int i = 0; i < letters.Length; i++)
{
char letter = letters[i];
char nextLetter = '\0';
if (i < letters.Length - 1)
nextLetter = letters[i + 1];
// If its a quote character
if (letter == '"')
{
// If that next letter is a quote
if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
{
// Then this quote is escaped and should be added to the content
content += letter;
// Skip the escape character
i++;
continue;
}
else
{
// otherwise its not an escaped quote and is an opening or closing one
// Character is skipped
// If it was open, then close it
if (mode == CSVMode.OPENED_QUOTE)
{
results.Add(content);
// reset the content
content = "";
mode = CSVMode.CLOSED;
// If there is a next letter available
if (nextLetter != '\0')
{
// If it is a comma
if (nextLetter == ',')
{
i++;
continue;
}
else
{
throw new Exception("Expected comma. Found: " + nextLetter);
}
}
}
else if (mode == CSVMode.OPENED_RAW)
{
// If it was opened raw, then just add the quote
content += letter;
}
else if (mode == CSVMode.CLOSED)
{
// Otherwise open it as a quote
mode = CSVMode.OPENED_QUOTE;
}
}
}
// If its a comma seperator
else if (letter == ',')
{
// If in quote mode
if (mode == CSVMode.OPENED_QUOTE)
{
// Just read it
content += letter;
}
// If raw, then close the content
else if (mode == CSVMode.OPENED_RAW)
{
results.Add(content);
content = "";
mode = CSVMode.CLOSED;
}
// If it was closed, then open it raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
results.Add(content);
content = "";
}
}
else
{
// If opened quote, just read it
if (mode == CSVMode.OPENED_QUOTE)
{
content += letter;
}
// If opened raw, then read it
else if (mode == CSVMode.OPENED_RAW)
{
content += letter;
}
// It closed, then open raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
content += letter;
}
}
}
// If it was still reading when the buffer finished
if (mode != CSVMode.CLOSED)
{
results.Add(content);
}
return results;
}
For smaller input CSV data LINQ is fully enough.
For example for the following CSV file content:
schema_name,description,utype
"IX_HE","High-Energy data","x"
"III_spectro","Spectrosopic data","d"
"VI_misc","Miscellaneous","f"
"vcds1","Catalogs only available in CDS","d"
"J_other","Publications from other journals","b"
when we read the whole content into single string called data, then
using System;
using System.IO;
using System.Linq;
var data = File.ReadAllText(Path2CSV);
// helper split characters
var newline = Environment.NewLine.ToCharArray();
var comma = ",".ToCharArray();
var quote = "\"".ToCharArray();
// split input string data to lines
var lines = data.Split(newline);
// first line is header, take the header fields
foreach (var col in lines.First().Split(comma)) {
// do something with "col"
}
// we skip the first line, all the rest are real data lines/fields
foreach (var line in lines.Skip(1)) {
// first we split the data line by comma character
// next we remove double qoutes from each splitted element using Trim()
// finally we make an array
var fields = line.Split(comma)
.Select(_ => { _ = _.Trim(quote); return _; })
.ToArray();
// do something with the "fields" array
}

Removing quotes in file helpers

I have a .csv file(I have no control over the data) and for some reason it has everything in quotes.
"Date","Description","Original Description","Amount","Type","Category","Name","Labels","Notes"
"2/02/2012","ac","ac","515.00","a","b","","javascript://"
"2/02/2012","test","test","40.00","a","d","c",""," "
I am using filehelpers and I am wondering what the best way to remove all these quotes would be? Is there something that says "if I see quotes remove. If no quotes found do nothing"?
This messes with the data as I will have "\"515.00\"" with unneeded extra quotes(especially since I want in this case it to be a decimal not a string".
I am also not sure what the "javascript" is all about and why it was generated but this is from a service I have no control over.
edit
this is how I consume the csv file.
using (TextReader textReader = new StreamReader(stream))
{
engine.ErrorManager.ErrorMode = ErrorMode.SaveAndContinue;
object[] transactions = engine.ReadStream(textReader);
}
You can use the FieldQuoted attribute described best on the attributes page here. Note that the attribute can be applied to any FileHelpers field (even if it type Decimal). (Remember that the FileHelpers class describes the spec for your import file.. So when you mark a Decimal field as FieldQuoted, you are saying in the file, this field will be quoted.)
You can even specify whether or not the quotes are optional with
[FieldQuoted('"', QuoteMode.OptionalForBoth)]
Here is a console application which works with your data:
class Program
{
[DelimitedRecord(",")]
[IgnoreFirst(1)]
public class Format1
{
[FieldQuoted]
[FieldConverter(ConverterKind.Date, "d/M/yyyy")]
public DateTime Date;
[FieldQuoted]
public string Description;
[FieldQuoted]
public string OriginalDescription;
[FieldQuoted]
public Decimal Amount;
[FieldQuoted]
public string Type;
[FieldQuoted]
public string Category;
[FieldQuoted]
public string Name;
[FieldQuoted]
public string Labels;
[FieldQuoted]
[FieldOptional]
public string Notes;
}
static void Main(string[] args)
{
var engine = new FileHelperEngine(typeof(Format1));
// read in the data
object[] importedObjects = engine.ReadString(#"""Date"",""Description"",""Original Description"",""Amount"",""Type"",""Category"",""Name"",""Labels"",""Notes""
""2/02/2012"",""ac"",""ac"",""515.00"",""a"",""b"","""",""javascript://""
""2/02/2012"",""test"",""test"",""40.00"",""a"",""d"",""c"","""","" """);
// check that 2 records were imported
Assert.AreEqual(2, importedObjects.Length);
// check the values for the first record
Format1 customer1 = (Format1)importedObjects[0];
Assert.AreEqual(DateTime.Parse("2/02/2012"), customer1.Date);
Assert.AreEqual("ac", customer1.Description);
Assert.AreEqual("ac", customer1.OriginalDescription);
Assert.AreEqual(515.00, customer1.Amount);
Assert.AreEqual("a", customer1.Type);
Assert.AreEqual("b", customer1.Category);
Assert.AreEqual("", customer1.Name);
Assert.AreEqual("javascript://", customer1.Labels);
Assert.AreEqual("", customer1.Notes);
// check the values for the second record
Format1 customer2 = (Format1)importedObjects[1];
Assert.AreEqual(DateTime.Parse("2/02/2012"), customer2.Date);
Assert.AreEqual("test", customer2.Description);
Assert.AreEqual("test", customer2.OriginalDescription);
Assert.AreEqual(40.00, customer2.Amount);
Assert.AreEqual("a", customer2.Type);
Assert.AreEqual("d", customer2.Category);
Assert.AreEqual("c", customer2.Name);
Assert.AreEqual("", customer2.Labels);
Assert.AreEqual(" ", customer2.Notes);
}
}
(Note, your first line of data seems to have 8 fields instead of 9, so I marked the Notes field with FieldOptional).
Here’s one way of doing it:
string[] lines = new string[]
{
"\"Date\",\"Description\",\"Original Description\",\"Amount\",\"Type\",\"Category\",\"Name\",\"Labels\",\"Notes\"",
"\"2/02/2012\",\"ac\",\"ac\",\"515.00\",\"a\",\"b\",\"\",\"javascript://\"",
"\"2/02/2012\",\"test\",\"test\",\"40.00\",\"a\",\"d\",\"c\",\"\",\" \"",
};
string[][] values =
lines.Select(line =>
line.Trim('"')
.Split(new string[] { "\",\"" }, StringSplitOptions.None)
.ToArray()
).ToArray();
The lines array represents the lines in your sample. Each " character must be escaped as \" in C# string literals.
For each line, we start off by removing the first and last " characters, then proceed to split it into a collection of substrings, using the "," character sequence as the delimiter.
Note that the above code will not work if you have " characters occurring naturally within your values (even if escaped).
Edit: If your CSV is to be read from a stream, all your need to do is:
var lines = new List<string>();
using (var streamReader = new StreamReader(stream))
while (!streamReader.EndOfStream)
lines.Add(streamReader.ReadLine());
The rest of the above code would work intact.
Edit: Given your new code, check whether you’re looking for something like this:
for (int i = 0; i < transactions.Length; ++i)
{
object oTrans = transactions[i];
string sTrans = oTrans as string;
if (sTrans != null &&
sTrans.StartsWith("\"") &&
sTrans.EndsWith("\""))
{
transactions[i] = sTrans.Substring(1, sTrans.Length - 2);
}
}
I have the same predicament and I replace the quotes when I load the value into my list object:
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;
namespace WindowsFormsApplication6
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
LoadCSV();
}
private void LoadCSV()
{
List<string> Rows = new List<string>();
string m_CSVFilePath = "<Path to CSV File>";
using (StreamReader r = new StreamReader(m_CSVFilePath))
{
string row;
while ((row = r.ReadLine()) != null)
{
Rows.Add(row.Replace("\"", ""));
}
foreach (var Row in Rows)
{
if (Row.Length > 0)
{
string[] RowValue = Row.Split(',');
//Do something with values here
}
}
}
}
}
}
This code might help which I developed:
using (StreamReader r = new StreamReader("C:\\Projects\\Mactive\\Audience\\DrawBalancing\\CSVFiles\\Analytix_ABC_HD.csv"))
{
string row;
int outCount;
StringBuilder line=new StringBuilder() ;
string token="";
char chr;
string Eachline;
while ((row = r.ReadLine()) != null)
{
outCount = row.Length;
line = new StringBuilder();
for (int innerCount = 0; innerCount <= outCount - 1; innerCount++)
{
chr=row[innerCount];
if (chr != '"')
{
line.Append(row[innerCount].ToString());
}
else if(chr=='"')
{
token = "";
innerCount = innerCount + 1;
for (; innerCount < outCount - 1; innerCount++)
{
chr=row[innerCount];
if(chr=='"')
{
break;
}
token = token + chr.ToString();
}
if(token.Contains(",")){token=token.Replace(",","");}
line.Append(token);
}
}
Eachline = line.ToString();
Console.WriteLine(Eachline);
}
}

Parsing CSV files in C#, with header

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.
Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?
Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.
A CSV parser is now a part of .NET Framework.
Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name)
using (TextFieldParser parser = new TextFieldParser(#"c:\temp\test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
The docs are here - TextFieldParser Class
P.S. If you need a CSV exporter, try CsvExport (discl: I'm one of the contributors)
CsvHelper (a library I maintain) will read a CSV file into custom objects.
using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
var records = csv.GetRecords<Foo>();
}
Sometimes you don't own the objects you're trying to read into. In this case, you can use fluent mapping because you can't put attributes on the class.
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.Property1 ).Name( "Column Name" );
Map( m => m.Property2 ).Index( 4 );
Map( m => m.Property3 ).Ignore();
Map( m => m.Property4 ).TypeConverter<MySpecialTypeConverter>();
}
}
Let a library handle all the nitty-gritty details for you! :-)
Check out FileHelpers and stay DRY - Don't Repeat Yourself - no need to re-invent the wheel a gazillionth time....
You basically just need to define that shape of your data - the fields in your individual line in the CSV - by means of a public class (and so well-thought out attributes like default values, replacements for NULL values and so forth), point the FileHelpers engine at a file, and bingo - you get back all the entries from that file. One simple operation - great performance!
In a business application, i use the Open Source project on codeproject.com, CSVReader.
It works well, and has good performance. There is some benchmarking on the link i provided.
A simple example, copied from the project page:
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));
Console.WriteLine();
}
}
As you can see, it's very easy to work with.
I know its a bit late but just found a library Microsoft.VisualBasic.FileIO which has TextFieldParser class to process csv files.
Here is a helper class I use often, in case any one ever comes back to this thread (I wanted to share it).
I use this for the simplicity of porting it into projects ready to use:
public class CSVHelper : List<string[]>
{
protected string csv = string.Empty;
protected string separator = ",";
public CSVHelper(string csv, string separator = "\",\"")
{
this.csv = csv;
this.separator = separator;
foreach (string line in Regex.Split(csv, System.Environment.NewLine).ToList().Where(s => !string.IsNullOrEmpty(s)))
{
string[] values = Regex.Split(line, separator);
for (int i = 0; i < values.Length; i++)
{
//Trim values
values[i] = values[i].Trim('\"');
}
this.Add(values);
}
}
}
And use it like:
public List<Person> GetPeople(string csvContent)
{
List<Person> people = new List<Person>();
CSVHelper csv = new CSVHelper(csvContent);
foreach(string[] line in csv)
{
Person person = new Person();
person.Name = line[0];
person.TelephoneNo = line[1];
people.Add(person);
}
return people;
}
[Updated csv helper: bug fixed where the last new line character created a new line]
If you need only reading csv files then I recommend this library: A Fast CSV Reader
If you also need to generate csv files then use this one: FileHelpers
Both of them are free and opensource.
This solution is using the official Microsoft.VisualBasic assembly to parse CSV.
Advantages:
delimiter escaping
ignores Header
trim spaces
ignore comments
Code:
using Microsoft.VisualBasic.FileIO;
public static List<List<string>> ParseCSV (string csv)
{
List<List<string>> result = new List<List<string>>();
// To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project.
using (TextFieldParser parser = new TextFieldParser(new StringReader(csv)))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
//parser.ReadLine();
while (!parser.EndOfData)
{
var values = new List<string>();
var readFields = parser.ReadFields();
if (readFields != null)
values.AddRange(readFields);
result.Add(values);
}
}
return result;
}
I have written TinyCsvParser for .NET, which is one of the fastest .NET parsers around and highly configurable to parse almost any CSV format.
It is released under MIT License:
https://github.com/bytefish/TinyCsvParser
You can use NuGet to install it. Run the following command in the Package Manager Console.
PM> Install-Package TinyCsvParser
Usage
Imagine we have list of Persons in a CSV file persons.csv with their first name, last name and birthdate.
FirstName;LastName;BirthDate
Philipp;Wagner;1986/05/12
Max;Musterman;2014/01/02
The corresponding domain model in our system might look like this.
private class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime BirthDate { get; set; }
}
When using TinyCsvParser you have to define the mapping between the columns in the CSV data and the property in you domain model.
private class CsvPersonMapping : CsvMapping<Person>
{
public CsvPersonMapping()
: base()
{
MapProperty(0, x => x.FirstName);
MapProperty(1, x => x.LastName);
MapProperty(2, x => x.BirthDate);
}
}
And then we can use the mapping to parse the CSV data with a CsvParser.
namespace TinyCsvParser.Test
{
[TestFixture]
public class TinyCsvParserTest
{
[Test]
public void TinyCsvTest()
{
CsvParserOptions csvParserOptions = new CsvParserOptions(true, new[] { ';' });
CsvPersonMapping csvMapper = new CsvPersonMapping();
CsvParser<Person> csvParser = new CsvParser<Person>(csvParserOptions, csvMapper);
var result = csvParser
.ReadFromFile(#"persons.csv", Encoding.ASCII)
.ToList();
Assert.AreEqual(2, result.Count);
Assert.IsTrue(result.All(x => x.IsValid));
Assert.AreEqual("Philipp", result[0].Result.FirstName);
Assert.AreEqual("Wagner", result[0].Result.LastName);
Assert.AreEqual(1986, result[0].Result.BirthDate.Year);
Assert.AreEqual(5, result[0].Result.BirthDate.Month);
Assert.AreEqual(12, result[0].Result.BirthDate.Day);
Assert.AreEqual("Max", result[1].Result.FirstName);
Assert.AreEqual("Mustermann", result[1].Result.LastName);
Assert.AreEqual(2014, result[1].Result.BirthDate.Year);
Assert.AreEqual(1, result[1].Result.BirthDate.Month);
Assert.AreEqual(1, result[1].Result.BirthDate.Day);
}
}
}
User Guide
A full User Guide is available at:
http://bytefish.github.io/TinyCsvParser/
Here is a short and simple solution.
using (TextFieldParser parser = new TextFieldParser(outputLocation))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
string[] headers = parser.ReadLine().Split(',');
foreach (string header in headers)
{
dataTable.Columns.Add(header);
}
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
dataTable.Rows.Add(fields);
}
}
Here is my KISS implementation...
using System;
using System.Collections.Generic;
using System.Text;
class CsvParser
{
public static List<string> Parse(string line)
{
const char escapeChar = '"';
const char splitChar = ',';
bool inEscape = false;
bool priorEscape = false;
List<string> result = new List<string>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < line.Length; i++)
{
char c = line[i];
switch (c)
{
case escapeChar:
if (!inEscape)
inEscape = true;
else
{
if (!priorEscape)
{
if (i + 1 < line.Length && line[i + 1] == escapeChar)
priorEscape = true;
else
inEscape = false;
}
else
{
sb.Append(c);
priorEscape = false;
}
}
break;
case splitChar:
if (inEscape) //if in escape
sb.Append(c);
else
{
result.Add(sb.ToString());
sb.Length = 0;
}
break;
default:
sb.Append(c);
break;
}
}
if (sb.Length > 0)
result.Add(sb.ToString());
return result;
}
}
Some time ago I had wrote simple class for CSV read/write based on Microsoft.VisualBasic library. Using this simple class you will be able to work with CSV like with 2 dimensions array. You can find my class by the following link: https://github.com/ukushu/DataExporter
Simple example of usage:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5")
csv.FileSave("c:\\file2.csv");
For reading header only you need is to read csv.Rows[0] cells :)
This code reads csv to DataTable:
public static DataTable ReadCsv(string path)
{
DataTable result = new DataTable("SomeData");
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
bool isFirstRow = true;
//IList<string> headers = new List<string>();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (isFirstRow)
{
foreach (string field in fields)
{
result.Columns.Add(new DataColumn(field, typeof(string)));
}
isFirstRow = false;
}
else
{
int i = 0;
DataRow row = result.NewRow();
foreach (string field in fields)
{
row[i++] = field;
}
result.Rows.Add(row);
}
}
}
return result;
}
Single source file solution for straightforward parsing needs, useful. Deals with all the nasty edge cases. Such as new line normalization and handling new lines in quoted string literals. Your welcome!
If you CSV file has a header you just read out the column names (and compute column indexes) from the first row. Simple as that.
Note that Dump is a LINQPad method, you might want to remove that if you are not using LINQPad.
void Main()
{
var file1 = "a,b,c\r\nx,y,z";
CSV.ParseText(file1).Dump();
var file2 = "a,\"b\",c\r\nx,\"y,z\"";
CSV.ParseText(file2).Dump();
var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
CSV.ParseText(file3).Dump();
var file4 = "\"\"\"\"";
CSV.ParseText(file4).Dump();
}
static class CSV
{
public struct Record
{
public readonly string[] Row;
public string this[int index] => Row[index];
public Record(string[] row)
{
Row = row;
}
}
public static List<Record> ParseText(string text)
{
return Parse(new StringReader(text));
}
public static List<Record> ParseFile(string fn)
{
using (var reader = File.OpenText(fn))
{
return Parse(reader);
}
}
public static List<Record> Parse(TextReader reader)
{
var data = new List<Record>();
var col = new StringBuilder();
var row = new List<string>();
for (; ; )
{
var ln = reader.ReadLine();
if (ln == null) break;
if (Tokenize(ln, col, row))
{
data.Add(new Record(row.ToArray()));
row.Clear();
}
}
return data;
}
public static bool Tokenize(string s, StringBuilder col, List<string> row)
{
int i = 0;
if (col.Length > 0)
{
col.AppendLine(); // continuation
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
while (i < s.Length)
{
var ch = s[i];
if (ch == ',')
{
row.Add(col.ToString().Trim());
col.Length = 0;
i++;
}
else if (ch == '"')
{
i++;
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
else
{
col.Append(ch);
i++;
}
}
if (col.Length > 0)
{
row.Add(col.ToString().Trim());
col.Length = 0;
}
return true;
}
public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
{
while (i < s.Length)
{
var ch = s[i];
if (ch == '"')
{
// escape sequence
if (i + 1 < s.Length && s[i + 1] == '"')
{
col.Append('"');
i++;
i++;
continue;
}
i++;
return true;
}
else
{
col.Append(ch);
i++;
}
}
return false;
}
}
Another one to this list, Cinchoo ETL - an open source library to read and write multiple file formats (CSV, flat file, Xml, JSON etc)
Sample below shows how to read CSV file quickly (No POCO object required)
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
Sample below shows how to read CSV file using POCO object
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
static void CSVTest()
{
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
}
Please check out articles at CodeProject on how to use it.
This parser supports nested commas and quotes in a column:
static class CSVParser
{
public static string[] ParseLine(string line)
{
List<string> cols = new List<string>();
string value = null;
for(int i = 0; i < line.Length; i++)
{
switch(line[i])
{
case ',':
cols.Add(value);
value = null;
if(i == line.Length - 1)
{// It ends with comma
cols.Add(null);
}
break;
case '"':
cols.Add(ParseEnclosedColumn(line, ref i));
i++;
break;
default:
value += line[i];
if (i == line.Length - 1)
{// Last character
cols.Add(value);
}
break;
}
}
return cols.ToArray();
}//ParseLine
static string ParseEnclosedColumn(string line, ref int index)
{// Example: "b"",bb"
string value = null;
int numberQuotes = 1;
int index2 = index;
for (int i = index + 1; i < line.Length; i++)
{
index2 = i;
switch (line[i])
{
case '"':
numberQuotes++;
if (numberQuotes % 2 == 0)
{
if (i < line.Length - 1 && line[i + 1] == ',')
{
index = i;
return value;
}
}
else if (i > index + 1 && line[i - 1] == '"')
{
value += '"';
}
break;
default:
value += line[i];
break;
}
}
index = index2;
return value;
}//ParseEnclosedColumn
}//class CSVParser
Based on unlimit's post on How to properly split a CSV using C# split() function? :
string[] tokens = System.Text.RegularExpressions.Regex.Split(paramString, ",");
NOTE: this doesn't handle escaped / nested commas, etc., and therefore is only suitable for certain simple CSV lists.
If anyone wants a snippet they can plop into their code without having to bind a library or download a package. Here is a version I wrote:
public static string FormatCSV(List<string> parts)
{
string result = "";
foreach (string s in parts)
{
if (result.Length > 0)
{
result += ",";
if (s.Length == 0)
continue;
}
if (s.Length > 0)
{
result += "\"" + s.Replace("\"", "\"\"") + "\"";
}
else
{
// cannot output double quotes since its considered an escape for a quote
result += ",";
}
}
return result;
}
enum CSVMode
{
CLOSED = 0,
OPENED_RAW = 1,
OPENED_QUOTE = 2
}
public static List<string> ParseCSV(string input)
{
List<string> results;
CSVMode mode;
char[] letters;
string content;
mode = CSVMode.CLOSED;
content = "";
results = new List<string>();
letters = input.ToCharArray();
for (int i = 0; i < letters.Length; i++)
{
char letter = letters[i];
char nextLetter = '\0';
if (i < letters.Length - 1)
nextLetter = letters[i + 1];
// If its a quote character
if (letter == '"')
{
// If that next letter is a quote
if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
{
// Then this quote is escaped and should be added to the content
content += letter;
// Skip the escape character
i++;
continue;
}
else
{
// otherwise its not an escaped quote and is an opening or closing one
// Character is skipped
// If it was open, then close it
if (mode == CSVMode.OPENED_QUOTE)
{
results.Add(content);
// reset the content
content = "";
mode = CSVMode.CLOSED;
// If there is a next letter available
if (nextLetter != '\0')
{
// If it is a comma
if (nextLetter == ',')
{
i++;
continue;
}
else
{
throw new Exception("Expected comma. Found: " + nextLetter);
}
}
}
else if (mode == CSVMode.OPENED_RAW)
{
// If it was opened raw, then just add the quote
content += letter;
}
else if (mode == CSVMode.CLOSED)
{
// Otherwise open it as a quote
mode = CSVMode.OPENED_QUOTE;
}
}
}
// If its a comma seperator
else if (letter == ',')
{
// If in quote mode
if (mode == CSVMode.OPENED_QUOTE)
{
// Just read it
content += letter;
}
// If raw, then close the content
else if (mode == CSVMode.OPENED_RAW)
{
results.Add(content);
content = "";
mode = CSVMode.CLOSED;
}
// If it was closed, then open it raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
results.Add(content);
content = "";
}
}
else
{
// If opened quote, just read it
if (mode == CSVMode.OPENED_QUOTE)
{
content += letter;
}
// If opened raw, then read it
else if (mode == CSVMode.OPENED_RAW)
{
content += letter;
}
// It closed, then open raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
content += letter;
}
}
}
// If it was still reading when the buffer finished
if (mode != CSVMode.CLOSED)
{
results.Add(content);
}
return results;
}
For smaller input CSV data LINQ is fully enough.
For example for the following CSV file content:
schema_name,description,utype
"IX_HE","High-Energy data","x"
"III_spectro","Spectrosopic data","d"
"VI_misc","Miscellaneous","f"
"vcds1","Catalogs only available in CDS","d"
"J_other","Publications from other journals","b"
when we read the whole content into single string called data, then
using System;
using System.IO;
using System.Linq;
var data = File.ReadAllText(Path2CSV);
// helper split characters
var newline = Environment.NewLine.ToCharArray();
var comma = ",".ToCharArray();
var quote = "\"".ToCharArray();
// split input string data to lines
var lines = data.Split(newline);
// first line is header, take the header fields
foreach (var col in lines.First().Split(comma)) {
// do something with "col"
}
// we skip the first line, all the rest are real data lines/fields
foreach (var line in lines.Skip(1)) {
// first we split the data line by comma character
// next we remove double qoutes from each splitted element using Trim()
// finally we make an array
var fields = line.Split(comma)
.Select(_ => { _ = _.Trim(quote); return _; })
.ToArray();
// do something with the "fields" array
}

Categories