How to trim all column values with CsvEngine.CsvToDataTable()? - C#

I am using FileHelpers 3.3.1 to import CSV data and populate DataTables in my C# app. It works well, and here is how I'm calling it:
DataTable dt = CsvEngine.CsvToDataTable(fullPath, ',');
The problem is that some column values are padded, i.e. they have spaces on the left and/or right side, and those spaces are not being trimmed. My CSV files are large and the performance of my importer app is important, so I really want to avoid looping through the DataTable after the fact and trimming every column value of every row.
Is there a way to invoke a "trim all column values automatically" during the call to CsvToDataTable()?
I know there is a FieldTrim attribute that does this very thing, but I cannot bind rigid classes to my CSV files because I have many different CSV files, all with different column names and data types. So that's not a practical option for me. It seems like there would be a built-in way to trim using one of the generic CSV parsers like CsvToDataTable().
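For reference, the FieldTrim approach looks roughly like this when the schema is fixed (a sketch; it requires a compile-time record class, which is exactly what I can't have here):

using FileHelpers;

[DelimitedRecord(",")]
public class KnownRecord
{
    [FieldTrim(TrimMode.Both)] // trims leading/trailing whitespace during parsing
    public string Name;

    [FieldTrim(TrimMode.Both)]
    public string Value;
}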
What is my best option?

The FileHelpers CsvEngine class is quite limited. It is a sealed class, so you cannot easily inherit from it or override its behavior.
If you don't mind a hacky solution, the following works:
using System.Reflection;
using FileHelpers;

// Sets the internal TrimChars property via reflection
public static class FileBaseExtensions
{
    public static void SetTrimCharsViaReflection(this FieldBase field, char[] value)
    {
        var prop = typeof(FieldBase).GetProperty("TrimChars", BindingFlags.NonPublic | BindingFlags.Instance);
        prop.SetValue(field, value);
    }
}

CsvOptions options = new CsvOptions("Records", ',', filename);
var engine = new CsvEngine(options);
foreach (var field in engine.Options.Fields)
{
    field.SetTrimCharsViaReflection(new char[] { ' ', '\t' });
    field.TrimMode = TrimMode.Both;
}
var dataTable = engine.ReadFileAsDT(filename);
But you would be better off using a standard FileHelperEngine and creating your own version of CsvClassBuilder (source code here) to create the mapping class. You would have to change the AddField method as follows:
public override DelimitedFieldBuilder AddField(string fieldName, string fieldType)
{
    base.AddField(fieldName, fieldType);
    if (base.mFields.Count > 1)
    {
        base.LastField.FieldOptional = true;
        base.LastField.FieldQuoted = true;
        base.LastField.QuoteMode = QuoteMode.OptionalForBoth;
        base.LastField.QuoteMultiline = MultilineMode.AllowForBoth;
        // <New>
        base.LastField.TrimMode = TrimMode.Both;
        base.LastField.TrimChars = " \t"; // trim spaces and tabs
        // </New>
    }
    return base.LastField;
}
If necessary you can lift the code for CsvToDataTable from the source code for CsvEngine which is here.
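Alternatively, if rolling your own class builder feels heavy, FileHelpers ships a runtime class builder that covers this case. Below is a sketch (untested, and it assumes the header values in the first line are valid C# identifiers) that builds the record class from the CSV header and enables trimming on every field:

using System.Data;
using System.IO;
using System.Linq;
using FileHelpers;
using FileHelpers.Dynamic;

public static DataTable CsvToTrimmedDataTable(string path, char delimiter)
{
    // Read the column names from the header row
    string[] headers = File.ReadLines(path).First().Split(delimiter);

    var cb = new DelimitedClassBuilder("RuntimeCsvRecord", delimiter.ToString())
    {
        IgnoreFirstLines = 1 // skip the header row when parsing
    };

    foreach (string header in headers)
    {
        cb.AddField(header.Trim(), typeof(string));
        cb.LastField.TrimMode = TrimMode.Both; // trim during parsing, no second pass
    }

    var engine = new FileHelperEngine(cb.CreateRecordClass());
    return engine.ReadFileAsDT(path);
}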

Related

CsvHelper delimiter character same as end-of-line character

I've run into an issue while parsing some CSV-like files. I know how to fix it, but I'd like to confirm that it's the appropriate way to do so.
The file structure
The file I'm trying to parse has a structure similar to .csv in that its values are separated with a delimiter (in my case it's |), but unlike the files I've previously seen, it also has a delimiter at the end of each line, e.g.:
Column1|Column2|Column3|
Row1Val1|Row1Val2|Row1Val3|
Row2Val1|Row2Val2|Row2Val3|
The issue
The problem arose when I wrote some unit tests to cover my service that wraps the CsvHelper library. Apparently there is some issue when I provide the following configuration:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    HasHeaderRecord = true,
    NewLine = "|\r\n"
};
With the above configuration, csvReader.GetRecords() returns no results. I believe that's because the parser first looks for columns and then for the end of line, so it tries to parse an empty column without realizing it's actually part of the line delimiter.
(I can paste the code for the GetRecords call as well, but it's basically generic code taken from the examples - the only difference is that I'm using the System.IO.Abstractions library for easier unit testing.)
The attempts to solve the problem
If I remove the NewLine configuration value, the parser reads the file fine (even with the end-of-line delimiter character). Then, however, my "write CSV" tests break, since CsvHelper no longer adds the proper line endings to the file.
The question(s)
Is there any way I can configure CsvHelper to cover both cases with one configuration, or should I use two different configurations depending on whether I'm writing CSV or reading it? That seems a little counter-intuitive to me, since it's the same format I'm trying to follow, yet different configurations are expected.
You could manually write the empty column for each line and then you could keep the configuration the same for reading and writing.
void Main()
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "|"
    };
    var records = new List<MyClass>
    {
        new MyClass { Column1 = "Row1Val1", Column2 = "Row1Val2", Column3 = "Row1Val3" },
        new MyClass { Column1 = "Row2Val1", Column2 = "Row2Val2", Column3 = "Row2Val3" }
    };
    using (var writer = new StreamWriter("file.csv"))
    using (var csv = new CsvWriter(writer, config))
    {
        csv.WriteHeader<MyClass>();
        csv.WriteField(string.Empty); // produces the trailing "|" after the header
        foreach (var record in records)
        {
            csv.NextRecord();
            csv.WriteRecord(record);
            csv.WriteField(string.Empty); // produces the trailing "|" after each row
        }
    }
    using (var reader = new StreamReader("file.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var importRecords = csv.GetRecords<MyClass>();
        importRecords.Dump(); // LINQPad's Dump(); substitute Console output elsewhere
    }
}

public class MyClass
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
    public string Column3 { get; set; }
}
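If you'd rather not write the extra empty field, another option (a sketch, with the caveat noted in the comment) is to normalize the lines before handing them to CsvHelper, keeping the plain | configuration for both reading and writing:

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// Strip the trailing '|' from each line, then parse from memory.
// Caveat: this also removes a legitimately empty last column, so it
// only fits files where the final '|' is purely a line terminator.
var cleaned = string.Join(Environment.NewLine,
    File.ReadLines("file.csv").Select(line => line.TrimEnd('|')));

var config = new CsvConfiguration(CultureInfo.InvariantCulture) { Delimiter = "|" };
using (var csv = new CsvReader(new StringReader(cleaned), config))
{
    var records = csv.GetRecords<MyClass>().ToList();
}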

Compare row values of two different CSV files in C#

I know there are similar questions, but I was not able to find the answer to mine. I have two CSV files. Both contain image metadata for the same images; however, the image IDs in the first file are outdated. So I need to take the IDs from the second file and replace the outdated IDs with the new ones. I was thinking of comparing the image Longitude, Latitude, and Altitude values, and where they match in both files, taking the image ID from the second file. The IDs would be used in a new object. Note that the sequence of lines differs between the files, and the first file contains more lines than the second one.
The files structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there some handy library to work with?
I would use the CsvHelper library for CSV read/write, as it is a complete and polished library. For this, you should declare a class to hold your data, with property names matching your CSV file's column names.
public class ImageData
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}
Then, to see whether two lines are equal, you need to check whether each property of a line in one file matches that of a line in the other. You could do this by simply comparing properties, but I'd rather write a comparer for it, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
    public bool Equals(ImageData x, ImageData y)
    {
        return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
    }

    public int GetHashCode(ImageData obj)
    {
        unchecked
        {
            // FNV-style hash combining the three coordinate values
            int hash = (int)2166136261;
            hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
            return hash;
        }
    }
}
The simple explanation is that we override the Equals() method and dictate that two instances of the ImageData class are equal if the three property values match. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips; please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
    List<ImageData> records;
    using (var reader = new StreamReader(filePath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        csv.Configuration.HasHeaderRecord = true;
        records = csv.GetRecords<ImageData>().ToList();
    }
    return records;
}

public static void WriteCSVData(string filePath, List<ImageData> records)
{
    using (var writer = new StreamWriter(filePath))
    using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        csv.WriteRecords(records);
    }
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
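For instance, the generic versions might look like this (a sketch following the same pattern as the two methods above):

public static List<T> ReadCSVData<T>(string filePath)
{
    using (var reader = new StreamReader(filePath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        return csv.GetRecords<T>().ToList();
    }
}

public static void WriteCSVData<T>(string filePath, List<T> records)
{
    using (var writer = new StreamWriter(filePath))
    using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        csv.WriteRecords(records);
    }
}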
Next is the crucial part. First, read the two files into memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now I can go through each line in the 'old' data and see whether there's a corresponding record in the 'new' data. If so, I grab the ID from the new data and replace the old one with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
    var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
    if (replace != null && replace.ImgID != line.ImgID)
    {
        line.ImgID = replace.ImgID;
    }
}
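If the files are large, the linear scan per line can get slow (it is O(n*m)). A sketch of a faster variant, assuming each coordinate triple appears only once in the new file, indexes the new data first:

var comparer = new ImageDataComparer();
var newIds = newData.ToDictionary(x => x, x => x.ImgID, comparer);

foreach (var line in oldData)
{
    if (newIds.TryGetValue(line, out int newId))
    {
        line.ImgID = newId; // single hash lookup instead of a scan
    }
}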
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Now our expected result is that the first 6 lines of the old file get their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
An alternate way to do it, if for some reason you don't want to use CsvHelper, is to write a method that compares two lines of data and determines whether they're equal (ignoring the first column's data):
public static bool DataLinesAreEqual(string first, string second)
{
    if (first == null || second == null) return false;
    var xParts = first.Split(',');
    var yParts = second.Split(',');
    if (xParts.Length != 4 || yParts.Length != 4) return false;
    return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays and update the first file's lines with those from the second file wherever our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);

Read all values from CSV into a List using CsvHelper

So I've been reading that I shouldn't write my own CSV reader/writer, so I've been trying to use the CsvHelper library installed via NuGet. The CSV file is a grey-scale image: the number of rows is the image height and the number of columns is the width. I would like to read the values row-wise into a single List<string> or List<byte>.
The code I have so far is:
using CsvHelper;

public static List<string> ReadInCSV(string absolutePath)
{
    IEnumerable<string> allValues;
    using (TextReader fileReader = File.OpenText(absolutePath))
    {
        var csv = new CsvReader(fileReader);
        csv.Configuration.HasHeaderRecord = false;
        allValues = csv.GetRecords<string>();
    }
    return allValues.ToList<string>();
}
But allValues.ToList<string>() is throwing a:
CsvConfigurationException was unhandled by user code
An exception of type 'CsvHelper.Configuration.CsvConfigurationException' occurred in CsvHelper.dll but was not handled in user code
Additional information: Types that inherit IEnumerable cannot be auto mapped. Did you accidentally call GetRecord or WriteRecord which acts on a single record instead of calling GetRecords or WriteRecords which acts on a list of records?
GetRecords is probably expecting my own custom class, but I just want the values as some primitive type or string. Also, I suspect the entire row is being converted to a single string, instead of each value being a separate string.
According to @Marc L's post you can try this:
public static List<string> ReadInCSV(string absolutePath)
{
    List<string> result = new List<string>();
    string value;
    using (TextReader fileReader = File.OpenText(absolutePath))
    {
        var csv = new CsvReader(fileReader);
        csv.Configuration.HasHeaderRecord = false;
        while (csv.Read())
        {
            for (int i = 0; csv.TryGetField<string>(i, out value); i++)
            {
                result.Add(value);
            }
        }
    }
    return result;
}
If all you need is the string values for each row in an array, you could use the parser directly.
var parser = new CsvParser(textReader);
while (true)
{
    string[] row = parser.Read();
    if (row == null)
    {
        break;
    }
}
http://joshclose.github.io/CsvHelper/#reading-parsing
Update
Version 3 has support for reading and writing IEnumerable properties.
The whole point here is to read all lines of the CSV and deserialize them into a collection of objects. I'm not sure why you want to read it as a collection of strings. A generic ReadAll() would probably work best for you in that case, as stated before. This library shines when you use it for that purpose:
using System.Linq;
...
using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader))
{
    var yourList = csv.GetRecords<YourClass>().ToList();
}
If you don't use ToList(), it will return a single record at a time (for better performance); please read https://joshclose.github.io/CsvHelper/examples/reading/enumerate-class-records
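Streaming then looks roughly like this (a sketch; Process is a placeholder for your own per-record handling, and newer CsvHelper versions require the culture argument shown here):

using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    foreach (var record in csv.GetRecords<YourClass>())
    {
        Process(record); // records are yielded one at a time, nothing is buffered
    }
}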
Please try this; it worked for me.
TextReader reader = File.OpenText(filePath);
CsvReader csvFile = new CsvReader(reader);
csvFile.Configuration.HasHeaderRecord = true;
csvFile.Read();
var records = csvFile.GetRecords<Server>().ToList();
Server is an entity class. This is how I created it:
public class Server
{
    private string details_Table0_ProductName;
    public string Details_Table0_ProductName
    {
        get { return details_Table0_ProductName; }
        set { this.details_Table0_ProductName = value; }
    }

    private string details_Table0_Version;
    public string Details_Table0_Version
    {
        get { return details_Table0_Version; }
        set { this.details_Table0_Version = value; }
    }
}
You are close. It isn't that it's trying to convert the row to a string. CsvHelper tries to map each field in the row to the properties of the type you give it, using the names given in a header row. Further, it doesn't understand how to do this with IEnumerable types (which string implements), so it just throws when its auto-mapping gets to that point in testing the type.
That is a whole lot of complication for what you're doing. If your file format is sufficiently simple, which yours appears to be (a well-known field format, with neither escaped nor quoted delimiters), I see no reason to take on the overhead of importing a library. You should be able to enumerate the values as needed with System.IO.File.ReadLines() and String.Split().
// Pseudo-code... you don't need CsvHelper for this
IEnumerable<string> GetFields(string filepath)
{
    foreach (string row in File.ReadLines(filepath))
    {
        foreach (string field in row.Split(',')) yield return field;
    }
}
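Usage for the grey-scale case could then be a one-liner (assuming every cell parses as a byte):

List<byte> pixels = GetFields(absolutePath).Select(byte.Parse).ToList();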
static void WriteCsvFile(string filename, IEnumerable<Person> people)
{
    using (StreamWriter textWriter = File.CreateText(filename))
    using (var csvWriter = new CsvWriter(textWriter, System.Globalization.CultureInfo.CurrentCulture))
    {
        csvWriter.WriteRecords(people);
    }
}

Fill/update the enum values at runtime in C#

I have a Windows app wherein I need to fill enum values at runtime by reading a text file named "Controls.txt".
As a restriction, I'm not supposed to use a dictionary. Below are the default values available in the enum MyControls. I have to use enums only.
public enum MyControls
{
    Button1 = 0,
    Button2 = 1,
    Button3 = 2,
}
If the Controls.txt file is available, then the content of the enum should change like this:
public enum MyControls
{
    btn1 = 0,
    btn2 = 1,
    btn3 = 2,
}
How do I achieve this? I also came across the link Creating / Modifying Enums at Runtime but could not get the idea.
I strongly think you are trying to solve the wrong problem. The value of an enum is type safety, and I do not think that filling it up dynamically is a good idea. What would really be useful is to have the enum populated from a text file (for example) even before compilation. You can do this using text templates in VS.
You can find an example in my blog post here: http://skleanthous.azurewebsites.net/post/2014/05/21/Creating-enums-from-the-database-and-using-them-in-Entity-framework-5-and-later-in-model-first
Although my example loads from a database, changing it to load from a text file should be trivial.
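The generation step itself can be as simple as a small pre-build console tool (a sketch; it assumes a Controls.txt format of one name per line and writes a MyControls.cs that is compiled like any hand-written enum):

using System.IO;
using System.Linq;

static class EnumGenerator
{
    static void Main()
    {
        // Read the control names and assign sequential values
        string[] names = File.ReadAllLines("Controls.txt")
                             .Where(l => !string.IsNullOrWhiteSpace(l))
                             .ToArray();

        string members = string.Join(",\n    ",
            names.Select((n, i) => $"{n.Trim()} = {i}"));

        File.WriteAllText("MyControls.cs",
            $"public enum MyControls\n{{\n    {members}\n}}\n");
    }
}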
Apart from the fact that I agree with the other answer that you lose type and compile-time safety, using the EnumBuilder class should be the only way (thanks to huMpty duMpty's comment).
// sample "file":
string fileContent = @"
btn1 = 0,
btn2 = 1,
btn3 = 2,
";
var enumBody = fileContent.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
    .Select(line => new { bothToken = line.Trim().Trim(',').Split('=') })
    .Where(x => x.bothToken.Length == 2)
    .Select(x => new { Name = x.bothToken[0].Trim(), Value = int.Parse(x.bothToken[1].Trim()) });

// Note: DefineDynamicAssembly with AssemblyBuilderAccess.RunAndSave is .NET Framework API
AppDomain currentDomain = AppDomain.CurrentDomain;
AssemblyName asmName = new AssemblyName("EnumAssembly");
AssemblyBuilder asmBuilder = currentDomain.DefineDynamicAssembly(asmName, AssemblyBuilderAccess.RunAndSave);
ModuleBuilder mb = asmBuilder.DefineDynamicModule(asmName.Name, asmName.Name + ".dll");
string enumTypeName = string.Format("{0}.{1}", typeof(MyControls).Namespace, typeof(MyControls).Name);
EnumBuilder eb = mb.DefineEnum(enumTypeName, TypeAttributes.Public, typeof(int));
foreach (var element in enumBody)
{
    FieldBuilder fb1 = eb.DefineLiteral(element.Name, element.Value);
}
Type eType = eb.CreateType();

foreach (object obj in Enum.GetValues(eType))
{
    Console.WriteLine("{0}.{1} = {2}", eType, obj, (int)obj);
}
Output:
Namespacename.MyControls.btn1 = 0
Namespacename.MyControls.btn2 = 1
Namespacename.MyControls.btn3 = 2
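Keep in mind that the generated type only exists at runtime, so its values can only be reached through strings or reflection, roughly like this:

object btn2 = Enum.Parse(eType, "btn2");
Console.WriteLine((int)btn2); // 1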
Well, I agree that the use case above is not something I would use myself. I do not, however, agree that there is no use for it. We, for example, use enums to classify string values for machine-learning modules. We write code at runtime to use at runtime, and grouping enums is a great deal faster than grouping and analysing strings. There is nothing good about using strings in large quantities: they are problematic for comparison, memory allocation, garbage collection, grouping, and sorting; there are just too many bytes.
Databases that manage large volumes of data will generate a hash of a string and store it, then compare the hash (not unique, but just a number) and the string in the same statement. This lets T-SQL use the more selective index on the hash field to narrow the search, and then compare the string values to make sure the right value is used. In T-SQL one would do it this way:
SELECT *
FROM Production.Product
WHERE CHECKSUM(N'Bearing Ball') = cs_Pname
AND Name = N'Bearing Ball';
GO
but in .NET we keep thinking that comparing strings is the way to go.
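The same idea translates to C#: store a precomputed hash next to the string and compare the cheap number before falling back to the expensive string comparison. A minimal sketch:

public struct HashedString
{
    public readonly int Hash;
    public readonly string Value;

    public HashedString(string value)
    {
        Value = value;
        Hash = value.GetHashCode(); // computed once, stored alongside the string
    }

    public bool EqualsFast(HashedString other)
    {
        // int comparison first; the full string comparison only runs
        // when the hashes match (or collide)
        return Hash == other.Hash && Value == other.Value;
    }
}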
It makes little sense for me to dump my code here, as it is proprietary, but there are plenty of good samples out there. An article by Bob Dain shows line by line how this can be done and is located here.
A snippet of his solution looks like this:
using System;
using System.Reflection;
using System.IO;

namespace RemoteUser
{
    public class RemoteUserClass
    {
        public RemoteUserClass()
        {
            // Load the remote assembly
            AssemblyName name = new AssemblyName();
            name.CodeBase = "file://" + Directory.GetCurrentDirectory() +
                "/ThirdPartyDll.dll"; // note the path separator
            Assembly assembly = AppDomain.CurrentDomain.Load(name);

            // Instantiate the class
            object remoteObject =
                assembly.CreateInstance("ThirdPartyDll.ThirdPartyClass");
            Type remoteType =
                assembly.GetType("ThirdPartyDll.ThirdPartyClass");

            // Load the enum type
            PropertyInfo flagsInfo =
                remoteType.GetProperty("ThirdPartyBitFields");
            Type enumType = assembly.GetType("ThirdPartyDll.BitFields");

            // Load the enum values
            FieldInfo enumItem1 = enumType.GetField("AnotherSetting");
            FieldInfo enumItem2 = enumType.GetField("SomethingElse");

            // Calculate the new value
            int enumValue1 = (int)enumItem1.GetValue(enumType);
            int enumValue2 = (int)enumItem2.GetValue(enumType);
            int currentValue = (int)flagsInfo.GetValue(remoteObject, null);
            int newValue = currentValue | enumValue1 | enumValue2;

            // Store the new value back in Options.FieldFlags
            object newEnumValue = Enum.ToObject(enumType, newValue);
            flagsInfo.SetValue(remoteObject, newEnumValue, null);

            // Call the method
            MethodInfo method = remoteType.GetMethod("DoSomeGood");
            method.Invoke(remoteObject, null);
        }
    }
}
One can use the System.Reflection.Emit namespace for many things; for example, one can generate a class that produces license keys. One can also write code at runtime, and code that writes and updates code is the future.

C# Linq to CSV Dynamic Object runtime column name

I'm new to using dynamic objects in C#. I am reading a CSV file very similarly to the code found here: http://my.safaribooksonline.com/book/programming/csharp/9780321637208/csharp-4dot0-features/ch08lev1sec3
I can reference the data I need with a static name; however, I cannot find the correct syntax to reference it using a dynamic name at runtime.
For example I have:
var records = from r in myDynamicClass.Records select r;
foreach (dynamic rec in records)
{
    Console.WriteLine(rec.SomeColumn);
}
And this works fine if you know the "SomeColumn" name. I would prefer to have the column name as a string and be able to make the same kind of reference at runtime.
Since one has to create the class which inherits from DynamicObject, simply add an indexer to that class to achieve the result via strings.
The following example uses the same properties found in the book example, i.e. the properties which hold the individual line data and the column names. Below is the indexer on that class that achieves the result:
public class myDynamicClassDataLine : System.Dynamic.DynamicObject
{
    string[] _lineContent;   // Actual line data
    List<string> _headers;   // Associated headers (properties)

    public string this[string indexer]
    {
        get
        {
            string result = string.Empty;
            int index = _headers.IndexOf(indexer);
            if (index >= 0 && index < _lineContent.Length)
                result = _lineContent[index];
            return result;
        }
    }
}
Then access the data like so:
var csv =
@",,SomeColumn,,,
ab,cd,ef,,,"; // "ef" is the value in the "SomeColumn" column

var data = new myDynamicClass(csv); // This holds multiple myDynamicClassDataLine items
Console.WriteLine(data.OfType<dynamic>().First()["SomeColumn"]); // "ef" is the output.
You will need to use reflection. To get the names you would use:
List<string> columnNames = new List<string>(records.GetType().GetProperties().Select(i => i.Name));
You can then loop through your results and output the values for each column like so:
foreach (dynamic rec in records)
{
    foreach (string prop in columnNames)
        Console.Write(rec.GetType().GetProperty(prop).GetValue(rec, null));
}
Try this:
string column = "SomeColumn";
var result = rec.GetType().GetProperty(column).GetValue(rec, null);
