CsvHelper BadDataFound on a valid CSV file - C#

Our customer started reporting bugs when importing data from a CSV file. After looking at the CSV file, we decided to switch from our custom CSV parser to CsvHelper, but CsvHelper can't read some valid CSV files.
Users can load any CSV file into our application, so we can't use a class map. We use csv.Parser.Read to read each data row as a string[]. We can't change how this CSV file is generated. It is produced by another company, and we can't convince them to change it while the file is in a valid format.
If we use a BadDataFound handler, context.RawRecord is:
"1000084;SMRSTOVACI TRUBICE PBF 12,7/6,4 (1/2\") H;"
the data row in csv file is:
1000084;SMRSTOVACI TRUBICE PBF 12,7/6,4 (1/2") H;;;ks;21,59;26,46;21.00;;;8591735015183;8591735015183;Technik;Kabelový spojovací materiál;Označování, smršťování, izolace;Bužírky, smršťovačky;
This should be a valid CSV file according to RFC 4180.
The code is:
using (var reader = new StreamReader(filePath, Encoding.Default))
{
    using (var csv = new CsvReader(reader))
    {
        csv.Read();
        csv.ReadHeader();
        List<string> badRecord = new List<string>();
        csv.Configuration.BadDataFound = context => badRecord.Add(context.RawRecord);
        header = csv.Context.HeaderRecord.ToList();
        while (true)
        {
            var dataRow = csv.Parser.Read();
            if (dataRow == null)
            {
                break;
            }
            data.Add(dataRow);
        }
    }
}
Can you help me configure CsvHelper so that it can load this row into a string[]? Or can you suggest a different parser that can do that?
Thank you

I believe the issue is caused by the unescaped double quote in the middle of the second field (1/2"). Try configuring CsvHelper to ignore quotes:
using (var reader = new StreamReader(filePath, Encoding.Default))
{
    using (var csv = new CsvReader(reader))
    {
        csv.Configuration.Delimiter = ";";
        csv.Configuration.IgnoreQuotes = true;
        csv.Read();
        csv.ReadHeader();
        List<string> badRecord = new List<string>();
        csv.Configuration.BadDataFound = context => badRecord.Add(context.RawRecord);
        header = csv.Context.HeaderRecord.ToList();
        while (true)
        {
            var dataRow = csv.Parser.Read();
            if (dataRow == null)
            {
                break;
            }
            data.Add(dataRow);
        }
    }
}
Updated for CsvHelper version 27.2.1:
using (var reader = new StreamReader(filePath, Encoding.Default))
{
    List<string> badRecord = new List<string>();
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = ";",
        Mode = CsvMode.NoEscape,
        BadDataFound = context => badRecord.Add(context.RawRecord)
    };
    using (var csv = new CsvReader(reader, config))
    {
        csv.Read();
        csv.ReadHeader();
        header = csv.Context.Reader.HeaderRecord.ToList();
        while (csv.Parser.Read())
        {
            data.Add(csv.Parser.Record);
        }
    }
}
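Strictly speaking, RFC 4180 says that if a field is not enclosed in double quotes, double quotes may not appear inside it. A field like SMRSTOVACI TRUBICE PBF 12,7/6,4 (1/2") H therefore falls outside the strict grammar, and as far as I can tell that is exactly the condition CsvHelper reports through BadDataFound. IgnoreQuotes/NoEscape work around this by treating every quote literally, which also stops properly quoted fields (for example "a;b") from being unquoted.

If you need both behaviours, a lenient splitter is one option. The helper below, LenientCsv.Split, is a hypothetical sketch and not part of CsvHelper. It treats a quote as an enclosure only when the quote opens a field and keeps any other quote as a literal character. It assumes that no record spans multiple lines.

```csharp
using System.Collections.Generic;
using System.Text;

public static class LenientCsv
{
    // Hypothetical helper (not a CsvHelper API): splits one line on the delimiter.
    // A quote at the start of a field opens an RFC 4180 quoted field ("" = escaped quote);
    // a quote anywhere else is kept literally, so 12,7/6,4 (1/2") H survives unchanged.
    // Records spanning several lines are not supported in this sketch.
    public static string[] Split(string line, char delimiter = ';')
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool atFieldStart = true;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // escaped quote inside a quoted field
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    field.Append(c);
                }
            }
            else if (c == delimiter)
            {
                fields.Add(field.ToString());
                field.Clear();
                atFieldStart = true;
                continue;
            }
            else if (c == '"' && atFieldStart)
            {
                inQuotes = true; // opening quote of a quoted field
            }
            else
            {
                field.Append(c); // ordinary character, including a stray mid-field quote
            }
            atFieldStart = false;
        }
        fields.Add(field.ToString());
        return fields.ToArray();
    }
}
```

Usage might look like File.ReadLines(filePath, Encoding.Default).Skip(1).Select(l => LenientCsv.Split(l)), which skips the header line. Note that a trailing ";" produces a trailing empty field, as in the row from the question.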

Related

How to merge multiple CSV files into one with a newline after each dataset

I wrote a method that creates a list of strings. The string values are accounting data.
When I click a button, a new .csv file is created.
It looks like this:
As you can see, there is no newline (carriage return/line feed) at the end of the line.
I would like to combine all of these .csv files into one, with each dataset on its own row.
I tried that manually with the simple cmd command copy *.csv allcsv.csv, but all the datasets are appended to the first row instead of each being added on a new row:
What do I need to add or change in my code to include the newline character at the end of each row?
How could I include the cmd copy command in my method as simply as possible?
private void BuchungssatzBilden(object obj)
{
    // Store the delivery note booking values in a list
    List<string> bs = new List<string>();
    bs.Add(SelItem.Umsatz.ToString());
    bs.Add(SelItem.Gegenkonto);
    bs.Add(SelItem.Beleg);
    bs.Add(SelItem.Buchungsdatum);
    bs.Add(SelItem.Konto);
    bs.Add(SelItem.Kost1);
    bs.Add(SelItem.Kost2);
    bs.Add(SelItem.Text);
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var sr = new StreamWriter(@"C:\" + SelItem.Beleg + SelItem.Text + SelItem.Hv + ".csv", true, Encoding.UTF8))
    {
        using (var csv = new CsvWriter(sr, System.Globalization.CultureInfo.CurrentCulture))
        {
            //csv.Configuration.Delimiter = ";";
            //csv.Configuration.HasHeaderRecord = true;
            foreach (var s in bs)
            {
                csv.WriteField(s);
            }
            csv.Flush();
            stream.Position = 0;
            reader.ReadToEnd();
        }
    }
    MessageBox.Show("CSV erfolgreich erstellt!"); // "CSV created successfully!"
}
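In CsvHelper, CsvWriter.NextRecord() ends the current row. Calling it after the WriteField loop should give each file a trailing line break. For the second part, merging the files without shelling out to copy, here is a minimal sketch that uses only System.IO. CsvMerger.Merge and its parameters are hypothetical names. The sketch assumes each source file holds one dataset and that the output file may live in the same folder, so the output file is skipped when the folder is scanned.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvMerger
{
    // Hypothetical helper: writes the content of every *.csv file in sourceDir to targetPath,
    // one file per line, in file-name order. The target file itself is never read back in.
    public static void Merge(string sourceDir, string targetPath)
    {
        string fullTarget = Path.GetFullPath(targetPath);
        string[] files = Directory.GetFiles(sourceDir, "*.csv")
            .Where(f => !string.Equals(Path.GetFullPath(f), fullTarget, StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .ToArray(); // list the files before the target is created or overwritten

        using (var writer = new StreamWriter(targetPath, false, new UTF8Encoding(true)))
        {
            foreach (string file in files)
            {
                // Drop any trailing line break, then let WriteLine add exactly one.
                string content = File.ReadAllText(file).TrimEnd('\r', '\n');
                writer.WriteLine(content);
            }
        }
    }
}
```

Because each file's trailing newline is trimmed before WriteLine, the result is the same whether or not the source files already end with a line break.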

Compare two csv files in C#

I want to compare two CSV files and write the differences to a file. I currently use the code below to remove a row. Can I change this code so that it compares two CSV files, or is there a better way in C# to compare CSV files?
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Contains(csvseperator))
        {
            string[] split = line.Split(Convert.ToChar(scheidingsteken));
            // Keep every row except the one whose selected column equals value
            if (split[selectedRow] != value)
            {
                line = string.Join(csvseperator, split);
                lines.Add(line);
            }
        }
    }
}
using (StreamWriter writer = new StreamWriter(path, false))
{
    foreach (string line in lines)
        writer.WriteLine(line);
}
Here is another way to find differences between CSV files, using Cinchoo ETL - an open source library
Given the sample CSV files below:
sample1.csv
id,name
1,Tom
2,Mark
3,Angie
sample2.csv
id,name
1,Tom
2,Mark
4,Lu
METHOD 1:
Using Cinchoo ETL, the code below finds the rows that differ in any column:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
}
sampleDiff.csv
id,name
3,Angie
4,Lu
Sample fiddle: https://dotnetfiddle.net/nwLeJ2
METHOD 2:
If you want to find the differences by the id column only:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}
Sample fiddle: https://dotnetfiddle.net/t6mmJW
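For comparison, both methods can be approximated with plain LINQ when the files are simple: the header is on the first line and there are no quoted fields. This is a minimal sketch with hypothetical names (CsvDiff.ByAllColumns, CsvDiff.ByKeyColumn), not Cinchoo's implementation. Except has set semantics, so duplicate rows are reported only once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvDiff
{
    // Rows that appear in only one of the inputs, comparing whole lines (all columns).
    // Skips the header line and assumes no quoted fields.
    public static List<string> ByAllColumns(string[] left, string[] right)
    {
        var a = left.Skip(1).ToList();
        var b = right.Skip(1).ToList();
        return a.Except(b).Concat(b.Except(a)).ToList();
    }

    // Rows whose key column (for example id at index 0) does not occur in the other input.
    public static List<string> ByKeyColumn(string[] left, string[] right, int keyIndex, char separator = ',')
    {
        string Key(string row) => row.Split(separator)[keyIndex];
        var a = left.Skip(1).ToList();
        var b = right.Skip(1).ToList();
        var keysA = new HashSet<string>(a.Select(Key));
        var keysB = new HashSet<string>(b.Select(Key));
        return a.Where(r => !keysB.Contains(Key(r)))
                .Concat(b.Where(r => !keysA.Contains(Key(r))))
                .ToList();
    }
}
```

With the two sample files above, CsvDiff.ByAllColumns(File.ReadAllLines("sample1.csv"), File.ReadAllLines("sample2.csv")) returns 3,Angie and 4,Lu. You can then write those rows, after the header line, to sampleDiff.csv with File.WriteAllLines.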
If you only want to compare one column, you can use this code:
List<string> lines = new List<string>();
List<string> lines2 = new List<string>();
try
{
    using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(pad)))
    using (StreamReader read = new StreamReader(System.IO.File.OpenRead(pad2)))
    {
        string line;
        string line2;
        // With these you can change which column of each file is compared
        int comp1 = 1;
        int comp2 = 1;
        while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
        {
            // Only split lines that actually contain the separator
            if (line.Contains(seperator) && line2.Contains(seperator))
            {
                string[] split = line.Split(Convert.ToChar(seperator));
                string[] split2 = line2.Split(Convert.ToChar(seperator));
                if (split[comp1] != split2[comp2])
                {
                    // It is not the same
                }
                else
                {
                    // It is the same
                }
            }
        }
    }
}
catch (IOException)
{
    // One of the files could not be opened or read
}

Reading only headers from csv

I am trying to read only the headers from a CSV file using CsvHelper, but I can't find CsvHelper's GetFieldHeaders() method.
I took the code from this link: Source
public static String[] GetHeaders(string filePath)
{
    using (CsvReader csv = new CsvReader(new StreamReader("data.csv")))
    {
        int fieldCount = csv.FieldCount;
        string[] headers = csv.GetFieldHeaders(); //Error: doesn't contain a definition
    }
}
But GetFieldHeaders is not available.
Note: I only want to read the headers from the CSV file.
Update: the headers in my CSV files look like this:
Id,Address,Name,Rank,Degree,Fahrenheit,Celcius,Location,Type,Stats
Can anybody tell me what I am missing?
Please try the code below; I hope it helps.
var csv = new CsvReader(new StreamReader("YOUR FILE PATH"));
csv.ReadHeader();
var headers = csv.Parser.RawRecord;
Note: RawRecord returns all the headers together as one string, so you will need to split it on the comma to get each header separately.
I have not tried this library, but a quick look at the documentation suggests this possible solution:
public static String[] GetHeaders(string filePath)
{
    using (CsvReader csv = new CsvReader(new StreamReader("data.csv")))
    {
        csv.Configuration.HasHeaderRecord = true;
        int fieldCount = csv.FieldCount;
        string[] headers = csv.GetFieldHeaders();
    }
}
* See documentation and search for header.
You can try this instead:
public IEnumerable<string> ReadHeaders(string path)
{
    using (var reader = new StreamReader(path))
    {
        var csv = new CsvReader(reader);
        if (csv.Read())
            return csv.FieldHeaders;
    }
    return null;
}
All of the methods in the other answers seem to have been removed in the latest version?
This is what worked for me:
using (var fileReader = File.OpenText("data.csv"))
{
    var csv = new CsvReader(fileReader);
    csv.Configuration.HasHeaderRecord = true;
    csv.Read();
    csv.ReadHeader();
    string[] headers = ((CsvFieldReader)((CsvParser)csv.Parser).FieldReader).Context.HeaderRecord;
}
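CsvHelper's header API has changed between versions. If you only need the header names, here is a version-independent sketch that reads just the first line with System.IO and splits it on the delimiter. CsvHeaders.ReadHeaders is a hypothetical name. The plain split assumes the header names contain no quoted delimiters, which holds for the header shown in the question.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvHeaders
{
    // Hypothetical helper: returns the header names from the first line of the file,
    // or an empty array when the file is empty. Only the first line is read from disk.
    public static string[] ReadHeaders(string filePath, char delimiter = ',')
    {
        string firstLine = File.ReadLines(filePath).FirstOrDefault();
        if (firstLine == null)
        {
            return Array.Empty<string>();
        }
        return firstLine.Split(delimiter).Select(h => h.Trim()).ToArray();
    }
}
```

File.ReadLines is lazy, so FirstOrDefault reads the first line and then closes the file, even for large files.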

Parsing CSV File with \" in C#

I'm using VB's TextFieldParser in C# to parse a CSV file, but I get an error when it reaches \"
using (TextFieldParser csvReader = new TextFieldParser(csvFilePath))
{
    csvReader.SetDelimiters(new string[] { "," });
    csvReader.HasFieldsEnclosedInQuotes = true;
    string[] colFields = csvReader.ReadFields();
    foreach (string column in colFields)
    {
        DataColumn datacolumn = new DataColumn(column);
        datacolumn.AllowDBNull = true;
        csvData.Columns.Add(datacolumn);
    }
    while (!csvReader.EndOfData)
    {
        string[] fieldData = csvReader.ReadFields();
        for (int i = 0; i < fieldData.Length; i++)
        {
            if (fieldData[i] == "")
            {
                fieldData[i] = null;
            }
        }
        csvData.Rows.Add(fieldData);
    }
}
And this is the line in the csv that is causing the error:
"101","Brake System","Level should be between \"MIN\" and \"MAX\" marks."
I don't know how to deal with the \" in C# using TextFieldParser
If the CSV file fits into memory, you could read it in, replace each \" with "", and use a MemoryStream as the input to the TextFieldParser:
string data = File.ReadAllText(@"C:\temp\csvdata.txt").Replace("\\\"", "\"\"");
//TODO: Use the correct Encoding.
using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(data)))
{
    using (TextFieldParser csvReader = new TextFieldParser(ms))
    {
        csvReader.SetDelimiters(new string[] { "," });
        csvReader.HasFieldsEnclosedInQuotes = true;
        string[] colFields = csvReader.ReadFields();
        foreach (string s in colFields)
        {
            Console.WriteLine(s);
        }
    }
}
Which, for your example data, outputs
101
Brake System
Level should be between "MIN" and "MAX" marks.
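If you would rather not rewrite the whole file in memory, you can parse the same data line by line with a splitter that understands backslash-escaped quotes directly. This is a minimal sketch with a hypothetical name (BackslashCsv.ParseLine), not a TextFieldParser feature. It assumes \" is the only escape sequence in the file and that no record spans multiple lines.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BackslashCsv
{
    // Hypothetical helper: splits one line whose quoted fields escape quotes as \" (instead of "").
    public static string[] ParseLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes && c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
            {
                field.Append('"'); // \" inside a quoted field becomes a literal quote
                i++;
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes; // opening or closing quote, not part of the value
            }
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(field.ToString());
                field.Clear();
            }
            else
            {
                field.Append(c);
            }
        }
        fields.Add(field.ToString());
        return fields.ToArray();
    }
}
```

Each string[] returned can be passed to csvData.Rows.Add, as in the question's loop, for example while iterating over File.ReadLines(csvFilePath).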
If you don't mind using a different library, Ctl.Data has a mode (parseMidQuotes: true) specifically to allow parsing broken CSV like this.
using (StreamReader sr = new StreamReader("data.csv"))
{
    var reader = new CsvReader<Record>(sr, parseMidQuotes: true, readHeader: false);
    while (reader.Read())
    {
        Record rec = reader.CurrentObject.Value;
        rec.Description = rec.Description.Replace("\\\"", "\"");
        // use record...
    }
}
And define your Record object.
(Normally it would match the header of the file to the properties, but with a headerless file like yours you need to specify the order with the Column attribute.)
class Record
{
    [Column(Order = 0)]
    public int Id { get; set; }
    [Column(Order = 1)]
    public string Category { get; set; }
    [Column(Order = 2)]
    public string Description { get; set; }
}
(Disclaimer: I'm the author of said library)
Here's how to do it using two different methods without TextFieldParser. TextFieldParser is very slow and not recommended for use in an actual production application.
Here's the simpler method, which uses only String methods and assumes the file is comma-delimited without any quotes or other special CSV formatting:
FileInfo file = new FileInfo("myfile.csv");
using (TextReader reader = file.OpenText())
{
    for (String line = reader.ReadLine(); line != null; line = reader.ReadLine())
    {
        string[] fields = line.Split(new[] { ',' });
        foreach (String f in fields)
        {
            //do whatever you need for each field
        }
    }
}
Now, if you have a more complicated CSV file with things like quoted fields or headers, or if the rows of your CSV map directly to an object you already have, CsvHelper (available on NuGet) might help you.
Not mapped example:
FileInfo file = new FileInfo("myfile.csv");
using (TextReader reader = file.OpenText())
using (CsvReader csv = new CsvReader(reader))
{
    csv.Configuration.Delimiter = ",";
    csv.Configuration.HasHeaderRecord = false;
    csv.Configuration.IgnoreQuotes = true; //if you don't use field quoting
    csv.Configuration.TrimFields = true; //trim fields as you read them
    csv.Configuration.WillThrowOnMissingField = false; //otherwise null fields aren't allowed
    while (csv.Read())
    {
        myStringVar = csv.GetField<string>(0); //gets first field as string
        myIntVar = csv.GetField<int>(1); //gets second field as int
        // ... etc., you get the idea
    }
}
Mapped example:
Mapping class (assumes you have a class named MyClass with the members field1, field2 and field3):
public sealed class MyClassMap : CsvClassMap<MyClass>
{
    public MyClassMap()
    {
        Map(m => m.field1).Index(0);
        Map(m => m.field2).Index(1);
        Map(m => m.field3).Index(2);
    }
}
Parsing code:
FileInfo file = new FileInfo("myfile.csv");
using (TextReader reader = file.OpenText())
using (CsvReader csv = new CsvReader(reader))
{
    csv.Configuration.Delimiter = ",";
    csv.Configuration.HasHeaderRecord = false;
    csv.Configuration.IgnoreQuotes = true; //if you don't use field quoting
    csv.Configuration.TrimFields = true; //trim fields as you read them
    csv.Configuration.WillThrowOnMissingField = false; //otherwise null fields aren't allowed
    csv.Configuration.RegisterClassMap<MyClassMap>(); //adds our mapping class to the reader
    while (csv.Read())
    {
        myObject = csv.GetRecord<MyClass>();
        //do whatever here
    }
}
Neither of these methods cares about unusual characters such as \ in your CSV file.
Disclaimer: I have no relation to CsvHelper, but have had success with it in a few projects in the past in which it has made my life much easier

Reading/writing CSV/tab delimited files in c#

I need to read from and write to CSV/tab-delimited files from .NET.
The difficulty is that I don't know the structure of each file in advance, and I need to load the CSV/tab file into a DataTable, which the FileHelpers library doesn't seem to support.
I've already written this for Excel using OLEDB, but I can't really see a way to write a tab file with it, so I'll go back to a library.
Can anyone help with suggestions?
.NET comes with a CSV/tab-delimited file parser: the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace.
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
It supports the full CSV RFC and has really good error reporting.
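As a sketch of how this applies to the original question (columns unknown in advance, target is a DataTable), the code below loads delimited text whose first line is the header. DelimitedLoader.LoadDataTable is a hypothetical name. It uses Microsoft.VisualBasic.FileIO.TextFieldParser, which is available in .NET Framework and, as far as I know, in .NET Core 3.0 and later. Every column is created as a string column.

```csharp
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class DelimitedLoader
{
    // Hypothetical helper: loads delimited text (first line = header) into a DataTable
    // with one string column per header field. Pass "\t" for tab-delimited, "," for CSV.
    public static DataTable LoadDataTable(TextReader input, string delimiter)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            parser.HasFieldsEnclosedInQuotes = true;

            string[] headers = parser.ReadFields();
            if (headers == null)
            {
                return table; // empty input: no columns, no rows
            }
            foreach (string header in headers)
            {
                table.Columns.Add(header, typeof(string));
            }
            while (!parser.EndOfData)
            {
                // Rows.Add(params object[]) assigns values by column position;
                // it throws if a line has more fields than there are columns.
                string[] fields = parser.ReadFields();
                table.Rows.Add(fields);
            }
        }
        return table;
    }
}
```

For a file on disk, pass new StreamReader(path). Disposing the parser also closes that reader.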
I used this CsvReader; it is really great and very configurable. It handles all kinds of escaping of strings and separators well. The escaping in other quick-and-dirty implementations was poor, but this library is really great at reading. With a few additional lines of code you can also add a cache if you need one.
Writing is not supported, but it is rather trivial to implement yourself. Or take inspiration from this code.
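To illustrate how little code writing takes, here is a sketch that writes a DataTable as delimited text. It quotes a value only when the value contains the delimiter, a quote, or a line break, and it doubles embedded quotes as RFC 4180 describes. DelimitedWriter.WriteDataTable is a hypothetical name and is not part of the CsvReader library mentioned above.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public static class DelimitedWriter
{
    // Hypothetical helper: writes a header line, then one line per DataRow.
    // Values are converted with Convert.ToString (current culture; DBNull becomes "").
    public static void WriteDataTable(DataTable table, TextWriter output, char delimiter)
    {
        string JoinRecord(string[] values) =>
            string.Join(delimiter.ToString(), values.Select(v => Escape(v, delimiter)));

        string[] headers = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToArray();
        output.WriteLine(JoinRecord(headers));

        foreach (DataRow row in table.Rows)
        {
            string[] values = row.ItemArray.Select(v => Convert.ToString(v)).ToArray();
            output.WriteLine(JoinRecord(values));
        }
    }

    // Quotes a value only when it contains the delimiter, a quote or a line break;
    // embedded quotes are doubled ("" inside the quoted value).
    private static string Escape(string value, char delimiter)
    {
        if (value.IndexOfAny(new[] { delimiter, '"', '\r', '\n' }) >= 0)
        {
            return "\"" + value.Replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

For a tab file, call WriteDataTable(table, writer, '\t') with a StreamWriter opened on the target path.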
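Simple example with CsvHelper:
using (TextWriter writer = new StreamWriter(filePath, false, Encoding.UTF8)) // the StreamWriter determines the file encoding
using (var csvWriter = new CsvWriter(writer)) // dispose the CsvWriter too so everything is flushed
{
    csvWriter.Configuration.Delimiter = "\t";
    csvWriter.Configuration.Encoding = Encoding.UTF8;
    csvWriter.WriteRecords(exportRecords);
}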
Here are a couple of CSV reader implementations:
http://www.codeproject.com/KB/database/CsvReader.aspx
http://www.heikniemi.fi/jhlib/ (just one part of the library; includes a CSV writer too)
I doubt there is a standard way to convert CSV to a DataTable or database 'automatically'; you'll have to write code to do that. How to do that is a separate question.
You'll create your DataTable in code and (presuming a header row) can create the columns from the first line of the file. After that, it is simply a matter of reading the file and creating new rows from the data in it.
You could use something like this:
DataTable Tbl = new DataTable();
using (StreamReader sr = new StreamReader(path))
{
    string headerRow = sr.ReadLine();
    string[] headers = headerRow.Split('\t'); //Or ','
    foreach (string h in headers)
    {
        DataColumn dc = new DataColumn(h);
        Tbl.Columns.Add(dc);
    }
    while (!sr.EndOfStream)
    {
        string data = sr.ReadLine();
        string[] cells = data.Split('\t');
        // DataRow has no public constructor; NewRow creates one with this table's schema
        DataRow row = Tbl.NewRow();
        for (int i = 0; i < cells.Length && i < Tbl.Columns.Count; i++)
        {
            row[i] = cells[i];
        }
        Tbl.Rows.Add(row);
    }
}
The above code has not been compiled, so it may have some errors, but it should get you on the right track.
This class can read and write CSV files and may be helpful for you.
Pass the separator character in the serparationChar parameter.
Example:
private DataTable dataTable = null;
private bool IsHeader = true;
private string headerLine = string.Empty;
private List<string> AllLines = new List<string>();
private StringBuilder sb = new StringBuilder();
private char seprateChar = ',';

public DataTable ReadCSV(string path, bool IsReadHeader, char serparationChar)
{
    seprateChar = serparationChar;
    IsHeader = IsReadHeader;
    AllLines = new List<string>(); // don't keep lines from a previous call
    using (StreamReader sr = new StreamReader(path, Encoding.Default))
    {
        while (!sr.EndOfStream)
        {
            AllLines.Add(sr.ReadLine());
        }
        createTemplate(AllLines);
    }
    return dataTable;
}
public void WriteCSV(string path, DataTable dtable, char serparationChar)
{
    AllLines = new List<string>();
    seprateChar = serparationChar;
    sb.Clear(); // don't carry over text from a previous call
    int colCount = 0;
    using (StreamWriter sw = new StreamWriter(path))
    {
        // header line: use the table being written (dtable), not the last table read
        foreach (DataColumn col in dtable.Columns)
        {
            sb.Append(col.ColumnName);
            if (dtable.Columns.Count - 1 > colCount)
                sb.Append(seprateChar);
            colCount++;
        }
        AllLines.Add(sb.ToString());
        // data lines
        for (int i = 0; i < dtable.Rows.Count; i++)
        {
            sb.Clear();
            for (int j = 0; j < dtable.Columns.Count; j++)
            {
                sb.Append(Convert.ToString(dtable.Rows[i][j]));
                if (dtable.Columns.Count - 1 > j)
                    sb.Append(seprateChar);
            }
            AllLines.Add(sb.ToString());
        }
        foreach (string dataline in AllLines)
        {
            sw.WriteLine(dataline);
        }
    }
}
private DataTable createTemplate(List<string> lines)
{
    List<string> headers = new List<string>();
    dataTable = new DataTable();
    if (lines.Count > 0)
    {
        string[] argHeaders = null;
        for (int i = 0; i < lines.Count; i++)
        {
            if (i > 0)
            {
                // remaining lines become rows
                DataRow newRow = dataTable.NewRow();
                string[] argLines = lines[i].Split(seprateChar);
                for (int b = 0; b < argLines.Length && b < dataTable.Columns.Count; b++)
                {
                    newRow[b] = argLines[b];
                }
                dataTable.Rows.Add(newRow);
            }
            else
            {
                // first line becomes the columns
                argHeaders = lines[0].Split(seprateChar);
                foreach (string c in argHeaders)
                {
                    DataColumn column = new DataColumn(c, typeof(string));
                    dataTable.Columns.Add(column);
                }
            }
        }
    }
    return dataTable;
}
I found the best solution here:
http://www.codeproject.com/Articles/415732/Reading-and-Writing-CSV-Files-in-Csharp
I just had to change the read sample by one line (marked "this line added"):
void ReadTest()
{
    // Read sample data from CSV file
    using (CsvFileReader reader = new CsvFileReader("ReadTest.csv"))
    {
        CsvRow row = new CsvRow();
        while (reader.ReadRow(row))
        {
            foreach (string s in row)
            {
                Console.Write(s);
                Console.Write(" ");
            }
            Console.WriteLine();
            row = new CsvRow(); //this line added
        }
    }
}
There is also Cinchoo ETL, an open source library for reading and writing CSV files.
There are a couple of ways you can read a CSV file like this one (emp.csv):
Id, Name
1, Tom
2, Mark
This is how you can read it with this library:
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
    foreach (dynamic item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
If you have a POCO class defined to match the CSV file, like this:
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}
you can parse the same file using this POCO class:
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
    foreach (var item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
