CsvHelper delimiter character same as end-of-line character - C#

I've run into an issue while parsing some CSV-like files. I know how to fix it, but I'd like to confirm whether my fix is the appropriate approach.
The file structure
The file I'm trying to parse has a structure similar to .csv in that its values are separated with a delimiter (in my case |), but unlike the ones I've previously seen, it also has a delimiter at the end of each line, e.g.:
Column1|Column2|Column3|
Row1Val1|Row1Val2|Row1Val3|
Row2Val1|Row2Val2|Row2Val3|
The issue
The problem arose when I wrote some unit tests to cover my service that wraps over the CsvHelper library. Apparently there is some issue when I provide the following configuration:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    HasHeaderRecord = true,
    NewLine = "|\r\n"
};
With the above configuration, csvReader.GetRecords() returns no results. I believe that's because the parser first looks for columns and only then for the end of line - so it tries to parse an empty column without realizing it's actually part of the line delimiter.
(I can paste the code for the GetRecords call as well, but it's basically generic code taken from the examples - the only difference is that I'm using the System.IO.Abstractions library for easier unit testing.)
The attempts to solve the problem
If I remove the NewLine configuration value, the parser works fine when reading the file (even with the end-of-line delimiter character at the end). Then, however, my "write CSV" tests break, since CsvHelper no longer adds the proper line endings to the file.
The question(s)
Is there any way I can configure CsvHelper to cover both cases with one configuration, or should I use two different configurations, depending on whether I'm writing to CSV or reading from it? That seems a little counter-intuitive to me, since it's the same format in both cases, yet different configurations are expected.
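For reference, the two-configuration workaround I have in mind would look something like this (just a sketch of my current idea, not a confirmed recommendation):

```csharp
using System.Globalization;
using CsvHelper.Configuration;

// One configuration for reading: no NewLine override, so the trailing "|"
// is parsed as an extra (empty) column, which the reader tolerates.
var readConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    HasHeaderRecord = true
};

// A second configuration for writing: NewLine = "|\r\n" so every written
// line ends with the delimiter, matching the file format.
var writeConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    HasHeaderRecord = true,
    NewLine = "|\r\n"
};
```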

You could manually write the empty column for each line; then you can keep the configuration the same for both reading and writing.
void Main()
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "|"
    };
    var records = new List<MyClass>
    {
        new MyClass { Column1 = "Row1Val1", Column2 = "Row1Val2", Column3 = "Row1Val3" },
        new MyClass { Column1 = "Row2Val1", Column2 = "Row2Val2", Column3 = "Row2Val3" }
    };
    using (var writer = new StreamWriter("file.csv"))
    using (var csv = new CsvWriter(writer, config))
    {
        csv.WriteHeader<MyClass>();
        csv.WriteField(string.Empty); // trailing "|" after the header
        foreach (var record in records)
        {
            csv.NextRecord();
            csv.WriteRecord(record);
            csv.WriteField(string.Empty); // trailing "|" after each row
        }
    }
    using (var reader = new StreamReader("file.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var importRecords = csv.GetRecords<MyClass>();
        importRecords.Dump(); // LINQPad's Dump(); use Console.WriteLine outside LINQPad
    }
}
public class MyClass
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
    public string Column3 { get; set; }
}

Related

C# - LINQ - Read text files, group by first column and order by last column?

Hi, I have a text file that contains 3 columns, something like this:
contract1;pdf1;63
contract1;pdf2;5
contract1;pdf3;2
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf2;30
contract2;pdf3;5
contract2;pdf4;80
Now I want to write that information into another text file, ordered so that within each contract the records whose last column is "2" or "5" come first, something like this:
contract1;pdf3;2
contract1;pdf2;5
contract1;pdf1;63
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf3;5
contract2;pdf2;30
contract2;pdf4;80
How can I do this?
Thanks.
You can use LINQ to group and sort the lines after reading, then put them back together:
var output = File.ReadAllLines(@"path-to-file")
    .Select(line => line.Split(';'))
    .GroupBy(parts => parts[0])
    .SelectMany(g => g
        .OrderBy(parts => parts[2] == "2" ? 0 : parts[2] == "5" ? 1 : 2) // "2" first, then "5"; OrderBy is stable, so the remaining rows keep file order
        .Select(parts => string.Join(";", parts)));
Then just write them to a file.
I'm not going to write your program for you, but I would recommend this library for reading and writing delimited files:
https://joshclose.github.io/CsvHelper/getting-started/
When you new up the reader make sure to specify your semi-colon delimiter:
using (var reader = new StreamReader("path\\to\\input_file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = ";";
    var records = csv.GetRecords<Row>();
    // manipulate the data as needed here
}
Your "Row" class (choose a more appropriate name for clarity) will specify the schema of the flat file. It sounds like you don't have headers? If not, you can specify the Index of each item.
public class Row
{
    [Index(0)]
    public string MyValue1 { get; set; }
    [Index(1)]
    public string MyValue2 { get; set; }
}
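If the file genuinely has no header row, you would presumably also need to tell the reader not to expect one - something like this (a sketch against the same older-style API used above):

```csharp
using System.Globalization;
using System.IO;
using CsvHelper;

using (var reader = new StreamReader("path\\to\\input_file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = ";";
    csv.Configuration.HasHeaderRecord = false; // no header row: match by [Index] instead
    var records = csv.GetRecords<Row>();
}
```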
After reading the data in, you can manipulate it as needed. If the output format is different from the input format, you should convert the input class into an output class. You could use the AutoMapper library, but for a simple project I would suggest manually converting the input class into the output class.
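A manual conversion can be as simple as a LINQ projection. Here OutputRow is a hypothetical output class - shape it to whatever the output file actually needs:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public string MyValue1 { get; set; }
    public string MyValue2 { get; set; }
}

// Hypothetical output class, for illustration only.
public class OutputRow
{
    public string Value1 { get; set; }
    public string Value2 { get; set; }
}

public static class RowConverter
{
    // Project each input Row into an OutputRow, applying any
    // per-field transformations the output format needs.
    public static List<OutputRow> Convert(IEnumerable<Row> rows) =>
        rows.Select(r => new OutputRow
        {
            Value1 = r.MyValue1,
            Value2 = r.MyValue2
        }).ToList();
}
```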
Lastly, write the data back out:
using (var writer = new StreamWriter("path\\to\\output_file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(records);
}

C# ExcelWorksheet cell value is read incorrectly

I am parsing a csv file using C# and ExcelWorksheet.
I have a cell that contains an integer: 3020191002155959391100
When I parse the cell using
var value = sheet.Cells[rowNumber, columnNumber.Column].Value;
value is 3.0201910021559592E+21
When I parse the cell using sheet.Cells[rowNumber, columnNumber.Column].Text,
the value is 3020191002155960000000.
How do I prevent the rounding off?
The maximum value of an int in C# is
int.MaxValue: 2,147,483,647
Source: https://www.dotnetperls.com/int-maxvalue
Therefore your number is too big to be read as an int.
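To put the sizes in perspective: even long (Int64) maxes out at 9,223,372,036,854,775,807 - 19 digits, while the value above has 22 - so the realistic options are keeping the value as a string or using System.Numerics.BigInteger. A quick sketch:

```csharp
using System;
using System.Numerics;

class BigNumberDemo
{
    static void Main()
    {
        var text = "3020191002155959391100";

        // 22 digits: too large even for long, so TryParse fails.
        Console.WriteLine(long.TryParse(text, out _));  // False

        // BigInteger has no fixed upper bound and keeps every digit.
        var big = BigInteger.Parse(text);
        Console.WriteLine(big);                         // 3020191002155959391100
    }
}
```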
However, upon reading your comments it appears that you're using the Excel reader to read a CSV file, which is the wrong tool for the job. Use a CSV parser such as CsvHelper (https://joshclose.github.io/CsvHelper/), which will make your life easy.
Here's an example of how to read such long numbers as strings using CsvHelper.
First I'll create a class to match your CSV file. I created a dummy CSV file that looks like the following - simply, three long numbers in a row.
3020191002155959391100,3020191002155959391101,3020191002155959391102
Now I create a class like so:
class CSVRecord
{
    public string Data1 { get; set; }
    public string Data2 { get; set; }
    public string Data3 { get; set; }
}
Then you can read all the records in one go. Note the csv.Configuration settings; depending on the complexity of the file you read, you'll have to change those. I advise you to read the documentation and examples at the link provided.
var file = @"data.csv";
var records = new List<CSVRecord>();
using (var reader = new StreamReader(file))
{
    using (var csv = new CsvReader(reader))
    {
        csv.Configuration.HasHeaderRecord = false;
        records = csv.GetRecords<CSVRecord>().ToList();
    }
}
I needed to set the specific column types to string before processing each cell.
columnDataTypes.Add(eDataTypes.String);

How to trim all column values with CsvEngine.CsvToDataTable()?

I am using FileHelpers 3.3.1 to import CSV data and populate DataTables in my C# app. It works well; here is how I'm calling it:
DataTable dt = CsvEngine.CsvToDataTable(fullPath, ',');
The problem is that some column values have padding (spaces to the left and/or right of the values), and those spaces are not being trimmed. My CSV files are large and the performance of my importer app is important, so I really want to avoid looping through the DataTable after the fact and trimming every column value of every row.
Is there a way to invoke a "trim all column values automatically" during the call to CsvToDataTable()?
I know there is a FieldTrim attribute that does this very thing, but I cannot bind rigid classes to my CSV files because I have many different CSV files and they all have different column names and data types. So that's not a practical option for me. It seems like there would be a built-in way to trim using one of the generic CSV parsers like CsvToDataTable().
What is my best option?
The FileHelpers CsvEngine class is quite limited. It is a sealed class, so you cannot easily inherit from it or override its behavior.
If you don't mind a hacky solution, the following works:
// Set the internal TrimChars via reflection
public static class FileBaseExtensions
{
    public static void SetTrimCharsViaReflection(this FieldBase field, char[] value)
    {
        var prop = typeof(FieldBase).GetProperty("TrimChars", BindingFlags.NonPublic | BindingFlags.Instance);
        prop.SetValue(field, value);
    }
}
CsvOptions options = new CsvOptions("Records", ',', filename);
var engine = new CsvEngine(options);
foreach (var field in engine.Options.Fields)
{
    field.SetTrimCharsViaReflection(new char[] { ' ', '\t' });
    field.TrimMode = TrimMode.Both;
}
var dataTable = engine.ReadFileAsDT(filename);
But you would be better off using a standard FileHelperEngine and creating your own version of CsvClassBuilder (source code here) to create the mapping class. You would have to change the AddField method as follows:
public override DelimitedFieldBuilder AddField(string fieldName, string fieldType)
{
    base.AddField(fieldName, fieldType);
    if (base.mFields.Count > 1)
    {
        base.LastField.FieldOptional = true;
        base.LastField.FieldQuoted = true;
        base.LastField.QuoteMode = QuoteMode.OptionalForBoth;
        base.LastField.QuoteMultiline = MultilineMode.AllowForBoth;
        // <New>
        base.LastField.TrimMode = TrimMode.Both;
        base.LastField.TrimChars = " \t"; // trim spaces and tabs
        // </New>
    }
    return base.LastField;
}
If necessary you can lift the code for CsvToDataTable from the source code for CsvEngine which is here.

CsvHelper Ignore case for header names

I have a class:
public class Import
{
    public DateTime Date { get; set; }
    public string Category { get; set; }
}
In the CSV file, the header names can be in lowercase.
How can I ignore case while reading the file?
var reader = new StreamReader(@"///");
var csv = new CsvReader(reader);
var records = csv.GetRecords<Import>().ToList();
If you are using http://joshclose.github.io/CsvHelper/, you can provide some configuration when constructing the CsvReader, or configure it after construction.
using (var stringReader = new StringReader(yourString))
using (var csvReader = new CsvReader(stringReader))
{
    // Ignore header case.
    csvReader.Configuration.PrepareHeaderForMatch = (string header, int index) => header.ToLower();
    return csvReader.GetRecords<Import>().ToList();
}
There is more documentation in the PrepareHeaderForMatch section at https://joshclose.github.io/CsvHelper/api/CsvHelper.Configuration/Configuration/
For more granularity, there are also class mapping instructions, which can be found here:
https://joshclose.github.io/CsvHelper/examples/configuration
Hope that helps.
In the current version of CsvHelper, you have to configure it like this:
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    PrepareHeaderForMatch = args => args.Header.ToLower()
};
using (var reader = new StreamReader(inputFile))
using (var csv = new CsvReader(reader, csvConfig))
{
    ...
}
A blog post from Mak (2022-09-26) describes three different ways to configure CsvHelper:

When your CSV header names don’t match your property names exactly, CsvHelper will throw an exception. For example, if your header name is “title” and your property name is “Title”, it’ll throw an exception like: HeaderValidationException: Header with name ‘Title’[0] was not found.

If you don’t want to (or can’t) change the names to match, then you can configure CsvHelper to map headers to properties with different names. You have three options:

Use the [Name] attribute on properties that need it.
Use CsvConfiguration.PrepareHeaderForMatch when there’s a pattern to the naming differences (such as a casing difference).
Use a ClassMap to explicitly declare how all properties should be mapped.
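Sketches of the first and third options, assuming the Import class from the question (the lowercase header names here are illustrative):

```csharp
using System;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// Option 1: a [Name] attribute on each property whose header differs.
public class Import
{
    [Name("date")]
    public DateTime Date { get; set; }

    [Name("category")]
    public string Category { get; set; }
}

// Option 3: a ClassMap that declares every mapping explicitly.
public sealed class ImportMap : ClassMap<Import>
{
    public ImportMap()
    {
        Map(m => m.Date).Name("date");
        Map(m => m.Category).Name("category");
    }
}

// The map has to be registered before reading, e.g.:
// csv.Context.RegisterClassMap<ImportMap>();
```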
C# – Configuring CsvHelper when the header names are different from the properties

EF Code First renaming of properties in generated POCO classes

I'm using the Entity Data Model Wizard (Code First from Database) to generate the DbContext and POCO classes. Unfortunately, I'm running on a very old database, and all the database columns have lowercase names, frequently with underscores, which look like garbage in C#. With the ~100 tables we have, it'd be really nice if the code generator would put the attribute [Column("column_name")] above everything that wasn't capitalized in the database, or if there were an easy way to tell Visual Studio to look at a file and add that attribute for all lowercase properties that don't already have it (or even all properties, period).
I've looked at some T4 stuff, the reverse POCO generator, etc., but it seemed nearly as time-consuming to get up and running as manually renaming the properties. Is the source for the Entity Data Model Wizard code that runs when you select "ADO.NET Entity Data Model" in the VS Add New Item window available anywhere, so I can start with something that already works?
Or does someone know of an epic find/replace that will take
public string n_addr1 { get; set; }
and give
[Column("n_addr1")]
public string N_addr1 { get; set; }
without knowing what n_addr1 is called, meaning it would have to match on public string and/or { get; set; }?
I did something similar, and I'm going to post the code I used to find the "name" of the property. I edited it so that it works with a file name you pass in. I tested it on one of my classes and it works.
var fileName = @"YOUR FILE NAME";
StringBuilder builder = new StringBuilder();
using (StreamReader sr = new StreamReader(fileName))
{
    while (!sr.EndOfStream)
    {
        var line = sr.ReadLine();
        var match = Regex.Match(line, @"{\s?get;\s?set;\s?}");
        if (match.Success)
        {
            var split = Regex.Split(line, @"{\s?get;\s?set;\s?}");
            var declaration = split[0].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            var last = declaration.Length;
            var name = declaration[last - 1];
            builder.AppendLine(string.Format("[Column(\"{0}\")]", name));
        }
        builder.AppendLine(line);
    }
}
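If you also want to emit the capitalized property (as in the question's example) rather than just the attribute, the replacement line can be built from a slightly stricter match - something like this (illustrative; it only handles simple public auto-properties):

```csharp
using System;
using System.Text.RegularExpressions;

static class PropertyRewriter
{
    // "public string n_addr1 { get; set; }" becomes:
    //   [Column("n_addr1")]
    //   public string N_addr1 { get; set; }
    public static string Rewrite(string line)
    {
        var match = Regex.Match(line, @"^(\s*public\s+\S+\s+)(\w+)(\s*{\s*get;\s*set;\s*}\s*)$");
        if (!match.Success)
            return line; // not a simple auto-property; leave untouched

        var name = match.Groups[2].Value;
        var capitalized = char.ToUpper(name[0]) + name.Substring(1);
        return "[Column(\"" + name + "\")]" + Environment.NewLine
            + match.Groups[1].Value + capitalized + match.Groups[3].Value;
    }
}
```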
