I want to filter a CSV file and perform data validation on the filtered data. I imagine using for loops, but the file has 2 million cells and that would take a long time. I am using the LumenWorks CsvReader to access the file from C#.
I found this method csvfile.Where<> but I have no idea what to put in the parameters. Sorry, I am still new to coding.
[EDIT] This is my code for loading the file. Thanks for all the help!
//Creating C# table from CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);
//grabs header from the CSV data table
string[] headers = csvReader.GetFieldHeaders(); //this method gets the headers of the CSV file
string filteredData[] = csvReader.Where // this is where I would want to implement the where method, or some sort of way to filter the data
//I can access the rows and columns with this
csvTable.Rows[0][0]
csvTable.Columns[0][0]
//After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData) {
dataToValidate += data;
}
if (dataToValidate == 123)
//data is validated
I would read some of the documentation for the package you are using:
https://github.com/phatcher/CsvReader
https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
To answer the filtering question specifically, so that the result contains only the data you are searching for, consider the following:
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
string searchTerm = "foo";
while (csv.ReadNextRecord())
{
var row = new List<string>();
for (int i = 0; i < csv.FieldCount; i++)
{
if (csv[i].Contains(searchTerm))
{
row.Add(csv[i]);
}
}
filteredData.Add(row);
}
}
This will give you a list of lists of strings that you can enumerate to do your validation:
int dataToValidate = 0;
foreach (var row in filteredData)
{
foreach (var data in row)
{
// do the thing
}
}
--- Old Answer ---
Without seeing the code you are using to load the file, it is difficult to give you a full answer; ~2 million cells may be slow no matter what.
Your .Where comes from System.Linq
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0
A simple example using .Where
//Read the file and return the list of records that match the where clause
//(assumes the file has no header row, since every line is parsed as data)
public List<CsvFileStructure> ReadCSV()
{
List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
.Select(line => line.Split(','))
// tokens[x] where x is the column number, assumes ID is column 0
.Select(tokens => new CsvFileStructure { Id = long.Parse(tokens[0]), Value = tokens[1] })
// Where filters based on whatever you are looking for in the CSV
.Where(csvFileStructure => csvFileStructure.Id == 1)
.ToList();
return data;
}
// Map of your data structure
public class CsvFileStructure
{
public long Id { get; set; }
public string Name { get; set; }
public string Value { get; set; }
}
Modified from this answer:
https://stackoverflow.com/a/10332737/7366061
There is no csvreader.Where method. Where is part of LINQ in C#. The link below shows an example of computing column values in a CSV file using LINQ:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
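As a minimal sketch of that idea applied to a DataTable like the csvTable in the question: LINQ's Where can filter the rows and Sum can add up the filtered values. The column names Category and Amount, the sample rows, and the helper names SumWhere and BuildSample are assumptions made up for illustration; they are not from the original file.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableFilterSketch
{
    // Sums the integer values in valueColumn for the rows whose
    // filterColumn equals wanted. Cells are read as strings because
    // a DataTable loaded from a CSV reader holds string values.
    public static int SumWhere(DataTable table, string filterColumn, string wanted, string valueColumn)
    {
        return table.AsEnumerable()
            .Where(row => row.Field<string>(filterColumn) == wanted)
            .Sum(row => int.Parse(row.Field<string>(valueColumn)));
    }

    // Hypothetical sample data standing in for the loaded CSV.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("Category", typeof(string));
        table.Columns.Add("Amount", typeof(string));
        table.Rows.Add("A", "100");
        table.Rows.Add("B", "5");
        table.Rows.Add("A", "23");
        return table;
    }
}
```

With the sample rows, SumWhere(BuildSample(), "Category", "A", "Amount") adds 100 and 23, which can then be compared against an expected total as in the question. Several Where calls can be chained if more than one filter is needed.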
I know there are similar questions, but I was not able to find the answer to mine. I have two CSV files. Both files contain image metadata for the same images; however, the image IDs in the first file are outdated. So I need to take the IDs from the second file and replace the outdated IDs with the new ones. I was thinking of comparing the Longitude, Latitude, and Altitude values of each row and, where they match in both files, taking the image ID from the second file. The IDs would be used in the new object. Note that the order of lines differs between the files, and the first file contains more lines than the second one.
The files structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there a handy library to work with?
I would use the CsvHelper library for reading and writing CSV, as it is a nice, complete library. For this, you should declare a class to hold your data, and its property names must match your CSV file's column names.
public class ImageData
{
public int ImgID { get; set; }
public double Longitude { get; set; }
public double Latitude { get; set; }
public double Altitude { get; set; }
}
Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
public bool Equals(ImageData x, ImageData y)
{
return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
}
public int GetHashCode(ImageData obj)
{
unchecked
{
int hash = (int)2166136261;
hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
return hash;
}
}
}
The simple explanation is that we implement the Equals() method of IEqualityComparer<ImageData> and dictate that two instances of the ImageData class are equal if the three property values match. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
List<ImageData> records;
using (var reader = new StreamReader(filePath))
{
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.HasHeaderRecord = true;
records = csv.GetRecords<ImageData>().ToList();
}
}
return records;
}
public static void WriteCSVData(string filePath, List<ImageData> records)
{
using (var writer = new StreamWriter(filePath))
{
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
}
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
Next is the crucial part. First, read the two files to memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
if (replace != null && replace.ImgID != line.ImgID)
{
line.ImgID = replace.ImgID;
}
}
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Now the expected result is that the first 6 lines of the old file have their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
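One possible refinement, offered as a sketch rather than as part of the answer above: the FirstOrDefault call scans newData once for every old line, which may get slow for large files. Indexing the new IDs by their coordinates first makes each lookup a dictionary hit. The UpdateIds helper and the tuple key are assumptions for illustration (tuple syntax needs C# 7 or later), and ImageData is repeated here only to keep the block self-contained.

```csharp
using System;
using System.Collections.Generic;

public class ImageData
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}

public static class IdUpdater
{
    // Replaces ImgID in oldData with the ID of the record in newData
    // that has exactly the same Longitude, Latitude and Altitude.
    public static void UpdateIds(List<ImageData> oldData, List<ImageData> newData)
    {
        var idByCoordinates = new Dictionary<(double, double, double), int>();
        foreach (var item in newData)
        {
            var key = (item.Longitude, item.Latitude, item.Altitude);
            // Keep the first record for a given position, like FirstOrDefault does.
            if (!idByCoordinates.ContainsKey(key))
            {
                idByCoordinates.Add(key, item.ImgID);
            }
        }

        foreach (var line in oldData)
        {
            var key = (line.Longitude, line.Latitude, line.Altitude);
            if (idByCoordinates.TryGetValue(key, out int newId))
            {
                line.ImgID = newId;
            }
        }
    }
}
```

Like the comparer above, this relies on exact equality of the double values, which is reasonable here because both files store the same textual coordinates.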
An alternate way to do it, if for some reason you didn't want to use the CSVHelper, is to write a method that compares two lines of data and determines if they're equal (by ignoring the first column data):
public static bool DataLinesAreEqual(string first, string second)
{
if (first == null || second == null) return false;
var xParts = first.Split(',');
var yParts = second.Split(',');
if (xParts.Length != 4 || yParts.Length != 4) return false;
return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);
I'm trying to learn some C# here. My goal is to create and write to multiple custom files whose names vary based on a part of the string to be written. Below are some examples:
Let's say strings to be written are basically rows of a csv file:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G
2019-10-29 16:13:11;;18;0;;3;false;false;true;
As you may notice, the first field of each string is a date, and that is the key field used to choose the name of the file to write to.
The first two rows have the same date, so both strings will be written to a single file; the third one goes to a second file since it has a different date.
Expected Result:
First File:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;
Second File:
2019-10-29 16:13:11;;18;0;;3;false;false;true;
Now I have multiple rows like those, and I'd like to print them on different files based on their first value.
I managed to create a class which might represent each row:
class Value {
public DateTime date = DateTime.Now;
public decimal cod = 0;
public decimal quantity = 0;
public decimal price = 0;
//Other irrelevant fields
}
And I also tried to develop a method to write a single Value on given File:
private static void WriteValue(Value content, string folder, string fileName) {
using(StreamWriter writer = new StreamWriter(Path.Combine(folder, fileName), true, Encoding.ASCII)) {
writer.Write(content.date.ToString("yyyyMMdd"));
writer.Write("0000");
writer.Write("I");
writer.Write("C");
writer.Write(content.cod.ToString().PadLeft(14, '0'));
writer.Write(Convert.ToInt64(content.quantity * 100).ToString().PadLeft(8, '0'));
writer.WriteLine();
}
}
And a method to write the Values into files:
static void WriteValues(List<Value> fileContent) {
//Once I have all the Values of the file in a List of Values, I try to write them to files
if(fileContent.Count > 0) {
foreach(Value riga in fileContent) {
//Temp dates, used to compare with the previous date in order to know whether I have to write the Value to a new file or to the same file
string dataTemp = riga.date.ToString("yyyy-MM-dd");
string lastData = string.Empty;
string FileName = "ordinivoa_999999." + DateTime.Now.ToString("yyMMddHHmmssfff");
//If lastData is Empty we are writing first value
if (string.IsNullOrEmpty(lastData)) {
WriteValue(riga, toLinfaFolder, FileName);
lastData = dataTemp;
}
//Else if lastData is equal as last managed date we write on same file
else if (lastData == dataTemp) {
WriteValue(riga, toLinfaFolder, FileName);
}
else {
//Else current date of Value is new, so we write it in another file
string newFileName = "ordinivoa_999999." + DateTime.Now.AddMilliseconds(1).ToString("yyMMddHHmmssfff");
WriteValue(riga, toLinfaFolder, newFileName);
lastData = dataTemp;
}
}
}
}
My issue is that the method above behaves strangely: it writes the first values with equal dates to a single file, which is good, but then writes all other values to a single file, even if they have different dates.
How can I make sure that values are written to the same file only if they have the same date?
You can group equal dates easily with a LINQ query
private static void WriteValues(List<Value> fileContent)
{
var dateGroups = fileContent
.GroupBy(v => $"ordinivoa_999999.{v.date:yyMMddHHmmssfff}");
foreach (var group in dateGroups) {
string path = Path.Combine(toLinfaFolder, group.Key);
using (var writer = new StreamWriter(path, true, Encoding.ASCII)) {
foreach (Value item in group) {
//TODO: write item to file
writer.WriteLine(...
}
}
}
}
Since a DateTime stores values in units of one ten-millionth of a second, two dates that look equal once formatted might still be different. So I suggest grouping on the filename to avoid this effect. I used string interpolation to create and format the file name.
Don't open and close the file for each text line.
At the top of your code file you need a
using System.Linq;
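To illustrate the point about DateTime precision, here is a small sketch (the class and method names are made up for this example): two values that differ by a single tick compare as different, yet format to the same text with the file-name pattern used above.

```csharp
using System;

public static class TickPrecisionSketch
{
    // Returns whether two DateTime values one tick (100 ns) apart are
    // equal as values, and whether they are equal once formatted.
    public static (bool SameValue, bool SameText) Compare()
    {
        var a = new DateTime(2019, 10, 28, 16, 14, 14);
        var b = a.AddTicks(1);

        bool sameValue = a == b;
        bool sameText = a.ToString("yyMMddHHmmssfff") == b.ToString("yyMMddHHmmssfff");
        return (sameValue, sameText);
    }
}
```

Here SameValue is false while SameText is true, which is why grouping on the formatted string and grouping on the raw DateTime can give different groups.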
You are on the right path declaring a class, but you're also doing a whole bunch of unnecessary stuff. Using LINQ, this can be simplified a great deal.
First I define a class, and since all you want to do is write each record, I would use a DateTime field, and a string field for the entire raw record.
class MyRecordOfSomeType
{
public DateTime Date { get; set; }
public string RawData { get; set; }
}
The DateTime field will come in handy when you're doing LINQ.
Now we iterate through your data, split using ;, then create your class instance list.
var data = new List<string>()
{
"2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;",
"2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G",
"2019-10-29 16:13:11;;18;0;;3;false;false;true;"
};
var records = new List<MyRecordOfSomeType>();
foreach (var item in data)
{
var parts = item.Split(';');
DateTime.TryParse(parts[0], out DateTime result);
var rec = new MyRecordOfSomeType() { Date = result, RawData = item };
records.Add(rec);
}
Then we group by date. Note that it's important to group by the Date component of the DateTime structure, otherwise it will consider the Time component as well and you'll have more files than you need.
var groups = records.GroupBy(x => x.Date.Date);
Finally, iterate your groups, and write contents of each group to a new file.
foreach (var group in groups)
{
var fileName = string.Format("ordinivoa_999999_{0}.csv", group.Key.ToString("yyMMddHHmmssfff"));
File.WriteAllLines(fileName, group.Select(x => x.RawData));
}
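To see why grouping on the Date component matters, here is a minimal sketch using the three timestamps from the question. The helper names are assumptions for illustration; it only counts the groups and writes no files.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class GroupByDaySketch
{
    // The three timestamps taken from the sample rows in the question.
    public static List<DateTime> Sample()
    {
        return new[] { "2019-10-28 16:14:14", "2019-10-28 16:13:11", "2019-10-29 16:13:11" }
            .Select(s => DateTime.ParseExact(s, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture))
            .ToList();
    }

    // Grouping on .Date ignores the time of day: one group per calendar day.
    public static int CountGroupsByDay(IEnumerable<DateTime> stamps)
    {
        return stamps.GroupBy(d => d.Date).Count();
    }

    // Grouping on the full value treats every distinct time as its own group.
    public static int CountGroupsByTimestamp(IEnumerable<DateTime> stamps)
    {
        return stamps.GroupBy(d => d).Count();
    }
}
```

For the sample, grouping by day gives 2 groups (so two files), while grouping by the full timestamp gives 3.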
I want to use FileHelpers to read an extremely basic CSV file into C#.
I have a Model that looks like this;
[DelimitedRecord(",")]
[IgnoreFirst()]
public class NavigationButton
{
public int ID;
public string Text;
public string Path;
}
The CSV file looks like this;
I want to be able to read the appropriate lines and create a new NavigationButton for each record read in from the CSV file. I have read them into a DataTable using;
public DataTable GetNavigationButtonNames()
{
var filename = @"C:\Desktop\NavigationButtons.csv";
var engine = new FileHelperEngine(typeof(NavigationButton));
return engine.ReadFileAsDT(filename);
}
but I now cannot loop through the DataTable as it doesn't implement IEnumerable. I would have created a new NavigationButton in a foreach loop and added in the appropriate rows, however this cannot be done the way I have started out.
How can I change this method so that I can loop through the object I read into from the CSV file and create a new button for each row in the CSV file?
Use the generic version of FileHelperEngine<T> instead, then you can do:
var filename = @"C:\Desktop\NavigationButtons.csv";
var engine = new FileHelperEngine<NavigationButton>();
// ReadFile returns an array of NavigationButton
var records = engine.ReadFile(filename);
// then you can do your foreach or just get the button names with LINQ
return records.Select(x => x.Text);
The documentation for ReadFile() is here.
Or there is also ReadFileAsList() if you prefer.
How about this:
List<NavigationButton> buttons = new List<NavigationButton>();
DataTable dt = GetNavigationButtonNames();
foreach (DataRow dr in dt.Rows)
{
buttons.Add(new NavigationButton
{
// DataRow values are objects, so convert to string before parsing
ID = int.Parse(dr["ID"].ToString()),
Text = dr["Text"].ToString(),
Path = dr["Path"].ToString()
});
}
I'm new to using Dynamic Objects in C#. I am reading a CSV file very similarly to the code found here: http://my.safaribooksonline.com/book/programming/csharp/9780321637208/csharp-4dot0-features/ch08lev1sec3
I can reference the data I need with a static name, however I can not find the correct syntax to reference using a dynamic name at run time.
For example I have:
var records = from r in myDynamicClass.Records select r;
foreach(dynamic rec in records)
{
Console.WriteLine(rec.SomeColumn);
}
And this works fine if you know the "SomeColumn" name. I would prefer to have the column name as a string and be able to make the same kind of reference at run time.
Since you have to create the class that inherits from DynamicObject anyway, simply add an indexer to that class to look values up by string.
The following example uses the same fields as the book example: the class holds an individual line's data together with the column names. The indexer below achieves the result:
public class myDynamicClassDataLine : System.Dynamic.DynamicObject
{
string[] _lineContent; // Actual line data
List<string> _headers; // Associated headers (properties)
public string this[string indexer]
{
get
{
string result = string.Empty;
int index = _headers.IndexOf(indexer);
if (index >= 0 && index < _lineContent.Length)
result = _lineContent[index];
return result;
}
}
}
Then access the data such as
var csv =
@",,SomeColumn,,,
ab,cd,ef,,,"; // ef is the "SomeColumn" value
var data = new myDynamicClass(csv); // This holds multiple myDynamicClassDataLine items
Console.WriteLine (data.OfType<dynamic>().First()["SomeColumn"]); // "ef" is the output.
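For completeness, here is a self-contained sketch of such a line class that supports both styles of access. The class name CsvLine, its constructor, and the choice to return an empty string for unknown columns are assumptions for illustration; the book's class may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public class CsvLine : DynamicObject
{
    private readonly List<string> _headers;   // column names
    private readonly string[] _lineContent;   // values of this line

    public CsvLine(List<string> headers, string[] lineContent)
    {
        _headers = headers;
        _lineContent = lineContent;
    }

    // Run-time access by column name held in a string: line["SomeColumn"].
    public string this[string column]
    {
        get
        {
            int index = _headers.IndexOf(column);
            return index >= 0 && index < _lineContent.Length
                ? _lineContent[index]
                : string.Empty; // unknown column: empty string rather than an exception
        }
    }

    // Dynamic access by member name: line.SomeColumn (via a dynamic reference).
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        result = this[binder.Name];
        return true;
    }
}
```

Usage would look like this: with var line = new CsvLine(headers, values), line["SomeColumn"] uses the indexer, while assigning the same object to a dynamic variable allows the line.SomeColumn form from the question.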
You will need to use reflection. To get the names you would use:
List<string> columnNames = new List<string>(records.GetType().GetProperties().Select(i => i.Name));
You can then loop through your results and output the values for each column like so:
foreach(dynamic rec in records)
{
foreach (string prop in columnNames)
Console.Write(rec.GetType().GetProperty (prop).GetValue (rec, null));
}
Try this
string column = "SomeColumn";
var result = rec.GetType().GetProperty (column).GetValue (rec, null);