I want to filter a CSV file and perform data validation on the filtered data. I imagine using for loops, but the file has 2 million cells and that would take a long time. I am using the Lumenworks CsvReader to access the file from C#.
I found the method csvfile.Where<>, but I have no idea what to put in the parameters. Sorry, I am still new to coding.
[EDIT] This is my code for loading the file. Thanks for all the help!
//Creating C# table from CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);
//grabs header from the CSV data table
string[] headers = csvReader.GetFieldHeaders(); //this method gets the headers of the CSV file
string[] filteredData = csvReader.Where(...); // this is where I would want to implement the Where method, or some other way to filter the data
//I can access the rows and columns with this
csvTable.Rows[0][0]
csvTable.Columns[0][0]
//After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData)
{
dataToValidate += int.Parse(data);
}
if (dataToValidate == 123)
//data is validated
I would read some of the documentation for the package you are using:
https://github.com/phatcher/CsvReader
https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
To answer the filtering question specifically, so that the result contains only the data you are searching for, consider the following:
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
string searchTerm = "foo";
while (csv.ReadNextRecord())
{
var row = new List<string>();
for (int i = 0; i < csv.FieldCount; i++)
{
if (csv[i].Contains(searchTerm))
{
row.Add(csv[i]);
}
}
if (row.Count > 0) // only keep records that actually contained a match
{
filteredData.Add(row);
}
}
}
This will give you a list of lists of strings that you can enumerate over to do your validation:
int dataToValidate = 0;
foreach (var row in filteredData)
{
foreach (var data in row)
{
// do the validation here, e.g. dataToValidate += int.Parse(data);
}
}
--- Old Answer ---
Without seeing the code you are using to load the file, it is a bit difficult to give you a full answer, and ~2 million cells may be slow no matter what.
Your .Where comes from System.Linq
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0
A simple example using .Where:
//Read the file and return a list of rows that match the where clause
public List<CsvFileStructure> ReadCSV()
{
    List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
        .Select(line => line.Split(','))
        // tokens[x] where x is the column number; assumes Id is column 0 and Value is column 1
        .Select(tokens => new CsvFileStructure { Id = long.Parse(tokens[0]), Value = tokens[1] })
        // Where filters based on whatever you are looking for in the CSV
        .Where(csvFileStructure => csvFileStructure.Id == 1)
        .ToList();
    return data;
}
// Map of your data structure
public class CsvFileStructure
{
public long Id { get; set; }
public string Name { get; set; }
public string Value { get; set; }
}
Modified from this answer:
https://stackoverflow.com/a/10332737/7366061
There is no csvreader.Where method. The "Where" is part of LINQ in C#. The link below shows an example of computing column values in a CSV file using LINQ:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
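To give a flavor of that approach, here is a minimal sketch of summing one column with LINQ; the file path, the header row, and the column index 2 are all assumptions for illustration:
using System.IO;
using System.Linq;

// Sum a numeric column (index 2 here) across every data row of the CSV.
var columnTotal = File.ReadLines(@"C:\data\sample.csv")
    .Skip(1)                                // skip the header row
    .Select(line => line.Split(','))
    .Where(fields => fields.Length > 2)     // ignore short or blank lines
    .Sum(fields => int.Parse(fields[2]));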
I'm using CsvHelper. To write to a .csv file, I need a header based on a class.
I can write the header manually and it works, but I need it to be generated automatically when the file is read.
All the documentation I can find says to use writer.WriteHeader<CSVDataFormat>(); but that isn't working on its own; it seems to need more setup.
Here is the class that the header should be based off:
public class CSVDataFormat
{
public string FirstName { get; set; }
public string LastName { get; set; }
public float Wage { get; set; }
}
Here is the code for the reading and writing:
private void ReadCSV(string ogCsvFile)
{
using (var streamReaderFileDir = new StreamReader(ogCsvFile))
{
using (var streamWriterFileDir = new StreamWriter(Path.Combine(Path.GetDirectoryName(ogCsvFile), "New" + Path.GetFileName(ogCsvFile))))
{
var reader = new CsvReader(streamReaderFileDir);
var writer = new CsvWriter(streamWriterFileDir);
writer.WriteHeader<CSVDataFormat>();
var records = reader.GetRecords<CSVDataFormat>().ToList();
foreach (CSVDataFormat record in records)
{
record.Wage = record.Wage + (record.Wage / 10);
writer.WriteField(record.FirstName);
writer.WriteField(record.LastName);
writer.WriteField(record.Wage);
writer.NextRecord();
}
}
}
}
Update
This is the error I am getting when I run my code:
An unhandled exception of type 'CsvHelper.CsvMissingFieldException' occurred in CsvHelper.dll
Additional information: Fields 'FirstName' do not exist in the CSV file.
You may be confused about how CsvHelper works. This code handles the write side of your read-in/write-out loop:
List<Employee> empList = new List<Employee>();
empList.Add(new Employee { FirstName = "Ziggy", LastName = "Walters", Wage = 132.50F });
empList.Add(new Employee { FirstName = "Zoey", LastName = "Strand", Wage = 76.50F });
using (StreamWriter sw = new StreamWriter(#"C:\Temp\emp.csv"))
using (CsvWriter cw = new CsvWriter(sw))
{
cw.WriteHeader<Employee>();
foreach (Employee emp in empList)
{
emp.Wage *= 1.1F;
cw.WriteRecord<Employee>(emp);
}
}
CSVWriter implements IDisposable, so I put it into a using block as well.
The wage adjustment is slightly streamlined.
Result:
FirstName,LastName,Wage
Ziggy,Walters,145.75
Zoey,Strand,84.15
WriteHeader just writes the first line: the names of the columns/items. Notice that the wages listed are different from what I used to create each one.
For what you are doing, I would read in a typed object instead of iterating the empList. The error listed in your edit means that it could not find a column by that name in the input file (probably because you didn't use the typed overload). The class property names should match the column names exactly (you may also want to configure CsvHelper).
The full in-out loop is only slightly more complex:
using (StreamReader sr = new StreamReader(#"C:\Temp\empIN.csv"))
using (StreamWriter sw = new StreamWriter(#"C:\Temp\empOUT.csv"))
using (CsvWriter cw = new CsvWriter(sw))
using (CsvReader cr = new CsvReader(sr))
{
cw.WriteHeader<Employee>();
var records = cr.GetRecords<Employee>();
foreach (Employee emp in records)
{
emp.Wage *= 1.1F;
cw.WriteRecord<Employee>(emp);
}
}
Results using the output from the first loop as input:
FirstName,LastName,Wage
Ziggy,Walters,160.325
Zoey,Strand,92.565
If there is no header record in the incoming CSV, it won't know how to map the data to the class. You need to add a map:
public class EmployeeMap : CsvHelper.Configuration.CsvClassMap<Employee>
{
public EmployeeMap()
{
Map(m => m.FirstName).Index(0);
Map(m => m.LastName).Index(1);
Map(m => m.Wage).Index(2);
}
}
Mine is nested inside the Employee class. Then give CSVHelper that map:
... before your try to read from the incoming CSV:
cr.Configuration.RegisterClassMap<Employee.EmployeeMap>();
cw.WriteHeader<Employee>();
...
Now it knows how to map csv columns to the properties in your class.
I believe this exception is from the CsvReader and not the CsvWriter. Default CsvConfiguration expects a header and uses AutoMap to generate a PropertyName_to_Index mapping.
Per the documentation, you may need to define a map (see the mapping section).
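For the CSVDataFormat class in the question, such a map might look like the following sketch (using the older CsvClassMap API shown elsewhere in this thread; the map's class name and column order are assumptions):
public sealed class CSVDataFormatMap : CsvHelper.Configuration.CsvClassMap<CSVDataFormat>
{
    public CSVDataFormatMap()
    {
        // Map by position, since the names in the file may not match the properties.
        Map(m => m.FirstName).Index(0);
        Map(m => m.LastName).Index(1);
        Map(m => m.Wage).Index(2);
    }
}
// Register it on the reader before calling GetRecords:
// reader.Configuration.RegisterClassMap<CSVDataFormatMap>();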
I have some class
public class Import
{
public DateTime Date { get; set; }
public string Category { get; set; }
}
In the CSV file, the header names can be lowercase.
How can I ignore case while reading the file?
var reader = new StreamReader(#"///");
var csv = new CsvReader(reader);
var records = csv.GetRecords<Import>().ToList();
If you are using CsvHelper (http://joshclose.github.io/CsvHelper/), you can provide configuration when constructing the CsvReader, or configure it after construction.
using (var stringReader = new StringReader(yourString))
using (var csvReader = new CsvReader(stringReader))
{
// Ignore header case.
csvReader.Configuration.PrepareHeaderForMatch = (string header, int index) => header.ToLower();
return csvReader.GetRecords<Import>().ToList();
}
There is more documentation in the PrepareHeaderForMatch section at https://joshclose.github.io/CsvHelper/api/CsvHelper.Configuration/Configuration/
For more granularity, there are also class mapping instructions, which can be found here:
https://joshclose.github.io/CsvHelper/examples/configuration
Hope that helps.
In the current version of CsvHelper, you have to configure it like this:
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
PrepareHeaderForMatch
= args => args.Header.ToLower()
};
using (var reader = new StreamReader(inputFile))
using (var csv = new CsvReader(reader, csvConfig))
{
...
}
A blog post from Mak (2022-09-26) has three different ways to configure CsvHelper.
When your CSV header names don't match your property names exactly, CsvHelper will throw an exception. For example, if your header name is "title" and your property name is "Title", it'll throw an exception like: HeaderValidationException: Header with name 'Title'[0] was not found.
If you don't want to (or can't) change the names to match, then you can configure CsvHelper to map headers to properties with different names. You have three options:
Use the [Name] attribute on properties that need it (see the sketch below).
Use CsvConfiguration.PrepareHeaderForMatch when there’s a pattern to the
naming differences (such as a casing difference).
Use a ClassMap to explicitly declare how all properties should be mapped.
C# – Configuring CsvHelper when the header names are different from the properties
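As a sketch of option 1, the [Name] attribute could be applied to the Import class from the question like this (using CsvHelper's CsvHelper.Configuration.Attributes namespace; the lowercase header names are an assumption):
using CsvHelper.Configuration.Attributes;

public class Import
{
    // Map the lowercase CSV headers to the Pascal-cased properties.
    [Name("date")]
    public DateTime Date { get; set; }

    [Name("category")]
    public string Category { get; set; }
}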
So I've been reading that I shouldn't write my own CSV reader/writer, and have been trying to use the CsvHelper library installed via NuGet. The CSV file is a grey-scale image, with the number of rows being the image height and the number of columns the width. I would like to read the values row-wise into a single List<string> or List<byte>.
The code I have so far is:
using CsvHelper;
public static List<string> ReadInCSV(string absolutePath)
{
IEnumerable<string> allValues;
using (TextReader fileReader = File.OpenText(absolutePath))
{
var csv = new CsvReader(fileReader);
csv.Configuration.HasHeaderRecord = false;
allValues = csv.GetRecords<string>();
}
return allValues.ToList<string>();
}
But allValues.ToList<string>() is throwing a:
CsvConfigurationException was unhandled by user code
An exception of type 'CsvHelper.Configuration.CsvConfigurationException' occurred in CsvHelper.dll but was not handled in user code
Additional information: Types that inherit IEnumerable cannot be auto mapped. Did you accidentally call GetRecord or WriteRecord which acts on a single record instead of calling GetRecords or WriteRecords which acts on a list of records?
GetRecords is probably expecting my own custom class, but I just want the values as some primitive type or string. I also suspect the entire row is being converted to a single string, instead of each value being a separate string.
According to @Marc L's post, you can try this:
public static List<string> ReadInCSV(string absolutePath) {
List<string> result = new List<string>();
string value;
using (TextReader fileReader = File.OpenText(absolutePath)) {
var csv = new CsvReader(fileReader);
csv.Configuration.HasHeaderRecord = false;
while (csv.Read()) {
for(int i=0; csv.TryGetField<string>(i, out value); i++) {
result.Add(value);
}
}
}
return result;
}
If all you need is the string values for each row in an array, you could use the parser directly.
var parser = new CsvParser( textReader );
while( true )
{
string[] row = parser.Read();
if( row == null )
{
break;
}
}
http://joshclose.github.io/CsvHelper/#reading-parsing
Update
Version 3 has support for reading and writing IEnumerable properties.
The whole point here is to read all the lines of the CSV and deserialize them to a collection of objects. I'm not sure why you want to read it as a collection of strings. A generic ReadAll() would probably work best for you in that case, as stated before. This library shines when you use it for that purpose:
using System.Linq;
...
using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader))
{
var yourList = csv.GetRecords<YourClass>().ToList();
}
If you don't use ToList(), it will return a single record at a time (for better performance); please read https://joshclose.github.io/CsvHelper/examples/reading/enumerate-class-records
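As a minimal sketch of that streaming style (reusing the YourClass placeholder from above):
using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader))
{
    // Records are materialized one at a time as the foreach advances,
    // so the whole file never has to be held in memory at once.
    foreach (var record in csv.GetRecords<YourClass>())
    {
        // process each record here
    }
}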
Please try this. It worked for me.
TextReader reader = File.OpenText(filePath);
CsvReader csvFile = new CsvReader(reader);
csvFile.Configuration.HasHeaderRecord = true;
csvFile.Read();
var records = csvFile.GetRecords<Server>().ToList();
Server is an entity class. This is how I created it:
public class Server
{
private string details_Table0_ProductName;
public string Details_Table0_ProductName
{
get
{
return details_Table0_ProductName;
}
set
{
this.details_Table0_ProductName = value;
}
}
private string details_Table0_Version;
public string Details_Table0_Version
{
get
{
return details_Table0_Version;
}
set
{
this.details_Table0_Version = value;
}
}
}
You are close. It isn't that it's trying to convert the row to a string. CsvHelper tries to map each field in the row to the properties of the type you give it, using the names given in a header row. Further, it doesn't understand how to do this with IEnumerable types (which string implements), so it just throws when its auto-mapping gets to that point in testing the type.
That is a whole lot of complication for what you're doing. If your file format is sufficiently simple, which yours appears to be (a well-known field layout, with neither escaped nor quoted delimiters), I see no reason to take on the overhead of importing a library. You should be able to enumerate the values as needed with System.IO.File.ReadLines() and String.Split().
//pseudo-code...you don't need CsvHelper for this
IEnumerable<string> GetFields(string filepath)
{
foreach(string row in File.ReadLines(filepath))
{
foreach(string field in row.Split(',')) yield return field;
}
}
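Usage is then a one-liner, e.g. var allValues = GetFields(absolutePath).ToList(); which gives you the flat List<string> you were after.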
static void WriteCsvFile(string filename, IEnumerable<Person> people)
{
    // Dispose both writers so the CsvWriter's buffer is flushed to the file.
    using (var textWriter = File.CreateText(filename))
    using (var csvWriter = new CsvWriter(textWriter, System.Globalization.CultureInfo.CurrentCulture))
    {
        csvWriter.WriteRecords(people);
    }
}
I have a class as follows :
public class Test
{
public int Id {get;set;}
public string Name { get; set; }
public string CreatedDate {get;set;}
public string DueDate { get; set; }
public string ReferenceNo { get; set; }
public string Parent { get; set; }
}
and I have a list of Test objects
List<Test> testobjs = new List<Test>();
Now I would like to convert it into csv in following format:
"1,John Grisham,9/5/2014,9/5/2014,1356,0\n2,Stephen King,9/3/2014,9/9/2014,1367,0\n3,The Rainmaker,4/9/2014,18/9/2014,1";
I searched for "Converting list to csv c#" and found solutions such as:
string.Join(",", list.Select(n => n.ToString()).ToArray())
But this will not insert the \n as needed, i.e. after each object.
Is there a fast way to do this, other than building the string myself? Please help.
Use ServiceStack.Text
Install-Package ServiceStack.Text
and then use the string extension methods ToCsv(T)/FromCsv()
Examples:
https://github.com/ServiceStack/ServiceStack.Text
Update:
ServiceStack.Text, which used to be commercial, is now free in v4 as well. No need to specify the version anymore. Happy serializing!
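A minimal sketch of what that looks like with the Test list from the question (ToCsv/FromCsv are the ServiceStack.Text string extensions mentioned above; the round trip is just for illustration):
using ServiceStack.Text;

// Serialize the whole list to CSV text (a header row is emitted first).
string csv = testobjs.ToCsv();

// ...and deserialize CSV text back into objects.
List<Test> roundTripped = csv.FromCsv<List<Test>>();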
Because speed was mentioned in the question, my interest was piqued on just what the relative performances might be, and just how fast I could get it.
I know that StringBuilder was excluded, but it still felt like probably the fastest, and StreamWriter has of course the advantage of writing to either a MemoryStream or directly to a file, which makes it versatile.
So I knocked up a quick test.
I built a list of half a million objects identical to yours.
Then I serialized with CsvSerializer, and with two hand-rolled tight versions, one using a StreamWriter to a MemoryStream and the other using a StringBuilder.
The hand-rolled code was written to cope with quotes but nothing more sophisticated. The code was pretty tight, with the minimum of intermediate strings I could manage and no concatenation... but it is not production code, and it earns no points for style or flexibility.
But the output was identical in all three methods.
The timings were interesting:
Serializing half a million objects, five runs with each method, all times to the nearest whole mS:
StringBuilder 703 734 828 671 718 Avge= 730.8
MemoryStream 812 937 874 890 906 Avge= 883.8
CsvSerializer 1,734 1,469 1,719 1,593 1,578 Avge= 1,618.6
This was on a high end i7 with plenty of RAM.
Other things being equal, I would always use the library.
But if a 2:1 performance difference became critical, or if RAM or other issues turned out to exaggerate the difference on a larger dataset, or if the data were arriving in chunks and was to be sent straight to disk, I might just be tempted...
Just in case anyone's interested, the core of the code (for the StringBuilder version) was:
private void writeProperty(StringBuilder sb, string value, bool first, bool last)
{
if (! value.Contains('\"'))
{
if (!first)
sb.Append(',');
sb.Append(value);
if (last)
sb.AppendLine();
}
else
{
if (!first)
sb.Append(",\"");
else
sb.Append('\"');
sb.Append(value.Replace("\"", "\"\""));
if (last)
sb.AppendLine("\"");
else
sb.Append('\"');
}
}
private void writeItem(StringBuilder sb, Test item)
{
writeProperty(sb, item.Id.ToString(), true, false);
writeProperty(sb, item.Name, false, false);
writeProperty(sb, item.CreatedDate, false, false);
writeProperty(sb, item.DueDate, false, false);
writeProperty(sb, item.ReferenceNo, false, false);
writeProperty(sb, item.Parent, false, true);
}
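For completeness, the driver that ties these together would be roughly the following (a sketch; the header text and method name are assumptions, not part of the benchmark code above):
private string serialize(List<Test> items)
{
    var sb = new StringBuilder();
    sb.AppendLine("Id,Name,CreatedDate,DueDate,ReferenceNo,Parent"); // header row
    foreach (Test item in items)
        writeItem(sb, item);
    return sb.ToString();
}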
If you don't want to load any libraries, then you can create the following method:
private void SaveToCsv<T>(List<T> reportData, string path)
{
var lines = new List<string>();
IEnumerable<PropertyDescriptor> props = TypeDescriptor.GetProperties(typeof(T)).OfType<PropertyDescriptor>();
var header = string.Join(",", props.ToList().Select(x => x.Name));
lines.Add(header);
var valueLines = reportData.Select(row => string.Join(",", header.Split(',').Select(a => row.GetType().GetProperty(a).GetValue(row, null))));
lines.AddRange(valueLines);
File.WriteAllLines(path, lines.ToArray());
}
and then call the method:
SaveToCsv(testobjs, "C:/PathYouLike/FileYouLike.csv")
Your best option would be to use an existing library. It saves you the hassle of figuring it out yourself, and it will probably deal with escaping special characters, adding header lines, etc.
You could use the CsvSerializer from ServiceStack.Text, but there are several others on NuGet.
Creating the CSV will then be as easy as string csv = CsvSerializer.SerializeToCsv(testobjs);
You could use the FileHelpers library to convert a List of objects to CSV.
Consider the given object, add the DelimitedRecord Attribute to it.
[DelimitedRecord(",")]
public class Test
{
public int Id {get;set;}
public string Name { get; set; }
public string CreatedDate {get;set;}
public string DueDate { get; set; }
public string ReferenceNo { get; set; }
public string Parent { get; set; }
}
Once the List is populated (as per the question, it is testobjs):
var engine = new FileHelperEngine<Test>();
engine.HeaderText = engine.GetFileHeader();
string dirPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData) + "\\" + ConfigurationManager.AppSettings["MyPath"];
if (!Directory.Exists(dirPath))
{
Directory.CreateDirectory(dirPath);
}
//File location, where the .csv goes and gets stored.
string filePath = Path.Combine(dirPath, "MyTestFile_" + ".csv");
engine.WriteFile(filePath, testobjs);
This will just do the job for you. I'd been using this to generate data reports for a while until I switched to Python.
PS: Too late to answer but hope this helps somebody.
Use Cinchoo ETL
Install-Package ChoETL
or
Install-Package ChoETL.NETStandard
Sample shows how to use it
List<Test> list = new List<Test>();
list.Add(new Test { Id = 1, Name = "Tom" });
list.Add(new Test { Id = 2, Name = "Mark" });
using (var w = new ChoCSVWriter<Test>(Console.Out)
.WithFirstLineHeader()
)
{
w.Write(list);
}
Output CSV:
Id,Name,CreatedDate,DueDate,ReferenceNo,Parent
1,Tom,,,,
2,Mark,,,,
For more information, go to github
https://github.com/Cinchoo/ChoETL
Sample fiddle: https://dotnetfiddle.net/M7v7Hi
LINQtoCSV is the fastest and lightest CSV library I've found, and it is available on GitHub. It lets you specify options via property attributes.
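A rough sketch of what that looks like (assuming LINQtoCSV's CsvColumn attribute and CsvContext.Write API; the attribute values and file path shown are illustrative):
using LINQtoCSV;

public class Test
{
    [CsvColumn(FieldIndex = 1)]
    public int Id { get; set; }

    [CsvColumn(FieldIndex = 2)]
    public string Name { get; set; }

    // ...remaining properties decorated the same way
}

// Write the list out with a header row.
var context = new CsvContext();
var description = new CsvFileDescription { FirstLineHasColumnNames = true };
context.Write(testobjs, @"C:\temp\test.csv", description);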
Necromancing this one a bit; I ran into the exact same scenario as above and went down the road of using FastMember so we didn't have to adjust the code every time we added a property to the class:
[HttpGet]
public FileResult GetCSVOfList()
{
// Get your list
IEnumerable<MyObject> myObjects =_service.GetMyObject();
//Get the type properties
var myObjectType = TypeAccessor.Create(typeof(MyObject));
var myObjectProperties = myObjectType.GetMembers().Select(x => x.Name);
//Set the first row as your property names
var csvFile = string.Join(',', myObjectProperties);
foreach(var myObject in myObjects)
{
// Use ObjectAccessor in order to maintain column parity
var currentMyObject = ObjectAccessor.Create(myObject);
var csvRow = Environment.NewLine;
foreach (var myObjectProperty in myObjectProperties)
{
csvRow += $"{currentMyObject[myObjectProperty]},";
}
csvRow = csvRow.TrimEnd(',');
csvFile += csvRow;
}
return File(Encoding.ASCII.GetBytes(csvFile), "text/csv", "MyObjects.csv");
}
This should yield a CSV with the first row being the names of the fields and the data rows following. Now... to read in a CSV and turn it back into a list of objects...
Note: the example is in ASP.NET Core MVC, but it should be very similar to .NET Framework. I had also considered ServiceStack.Text, but the license was not easy to follow.
For the best solution, you can read this article: Convert List of Object to CSV File C# - Codingvila
using Codingvila.Models;
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.Mvc;
namespace Codingvila.Controllers
{
public class HomeController : Controller
{
public ActionResult Index()
{
CodingvilaEntities entities = new CodingvilaEntities();
var lstStudents = (from Student in entities.Students
select Student);
return View(lstStudents);
}
[HttpPost]
public FileResult ExportToCSV()
{
#region Get list of Students from Database
CodingvilaEntities entities = new CodingvilaEntities();
List<object> lstStudents = (from Student in entities.Students.ToList()
select new[] { Student.RollNo.ToString(),
Student.EnrollmentNo,
Student.Name,
Student.Branch,
Student.University
}).ToList<object>();
#endregion
#region Create Name of Columns
var names = typeof(Student).GetProperties()
.Select(property => property.Name)
.ToArray();
lstStudents.Insert(0, names.Where(x => x != names[0]).ToArray());
#endregion
#region Generate CSV
StringBuilder sb = new StringBuilder();
foreach (var item in lstStudents)
{
string[] arrStudents = (string[])item;
foreach (var data in arrStudents)
{
//Append data with comma(,) separator.
sb.Append(data + ',');
}
//Append new line character.
sb.Append("\r\n");
}
#endregion
#region Download CSV
return File(Encoding.ASCII.GetBytes(sb.ToString()), "text/csv", "Students.csv");
#endregion
}
}
}