CsvHelper - Split output files [duplicate]


This question already has answers here:
Split a List into smaller lists of N size [duplicate]
(21 answers)
Closed 3 years ago.
I'm using CsvHelper to write out a LINQ query with millions of rows. I would like to split the output into files of, for instance, 1 million rows each. Can I do that with CsvHelper, or should I use another writing method?
Here is my code:
var _path = UniversalVariables.outputCsvFiles + "entire_output.csv";
var pvQuery = from car in Cars
select car;
if (!Directory.Exists(UniversalVariables.outputCsvFiles))
{
Directory.CreateDirectory(UniversalVariables.outputCsvFiles);
}
using (var sw = new StreamWriter(_path))
using (var csv = new CsvWriter(sw))
{
csv.Configuration.Delimiter = UniversalVariables.csvDelimiter;
csv.Configuration.HasHeaderRecord = true;
csv.WriteHeader<Car>();
csv.NextRecord();
csv.WriteRecords(pvQuery);
sw.Flush();
}

You could use LINQ to split the collection into sub-collections (chunks of size n). For example:
pvQuery.Select((x,index)=>new {Value=x,Index=index})
.GroupBy(x=>(int)(x.Index/numberOfItemsPerGroup))
.Select(x=>x.Select(c=>c.Value));
Making it an extension method:
static class Extensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int numberOfItemsPerGroup)
{
return source.Select((x,index)=>new {Value=x,Index=index})
.GroupBy(x=>(int)(x.Index/numberOfItemsPerGroup))
.Select(x=>x.Select(c=>c.Value));
}
}
Client code
SourceCollection.Split(numberOfItemsPerGroup);
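To tie the chunking back to the original question, each chunk still has to be written to its own file. Note that GroupBy buffers the whole sequence before it yields the first group, which may matter with millions of rows. The following is a minimal streaming sketch, not the answerer's method: ChunkedWriter, WriteInChunks, toLine and pathForPart are hypothetical names, and plain StreamWriter lines stand in for CsvWriter so the block compiles without the CsvHelper package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedWriter
{
    // Writes 'source' to numbered files holding at most 'rowsPerFile' data rows each,
    // repeating the header line in every file. Returns the paths that were written.
    public static List<string> WriteInChunks<T>(
        IEnumerable<T> source,
        int rowsPerFile,
        string header,
        Func<T, string> toLine,        // hypothetical: formats one record as a CSV line
        Func<int, string> pathForPart) // hypothetical: maps a 1-based part number to a file path
    {
        if (rowsPerFile <= 0) throw new ArgumentOutOfRangeException(nameof(rowsPerFile));

        var paths = new List<string>();
        StreamWriter writer = null;
        int rowsInCurrentFile = 0;
        try
        {
            foreach (var item in source)
            {
                // Open a new file when none is open yet or the current one is full
                if (writer == null || rowsInCurrentFile == rowsPerFile)
                {
                    writer?.Dispose();
                    var path = pathForPart(paths.Count + 1);
                    paths.Add(path);
                    writer = new StreamWriter(path);
                    writer.WriteLine(header);
                    rowsInCurrentFile = 0;
                }
                writer.WriteLine(toLine(item));
                rowsInCurrentFile++;
            }
        }
        finally
        {
            // Make sure the last file is flushed and closed even if enumeration throws
            writer?.Dispose();
        }
        return paths;
    }
}
```

Only one record is held in memory at a time, and an empty source produces no files. With CsvHelper, the same loop shape should apply, with the two WriteLine calls replaced by the header and record writes shown in the question.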

Related

How to work with data in List in method C# [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 days ago.
I have this method, where I load data from a CSV file:
public static List<string> LoadStations()
{
using (StreamReader reader = new StreamReader(@"X:\2022-23\ZPR\GDI Jízní řád\jizdniradgdi\Stations.csv"))
{
List<string> stations = new List<string>();
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
string[] values = line.Split(new char[] { ';' }, 1);
values[0] = values[0].Trim();
foreach (var item in values)
{
stations.Add(item);
}
}
return stations;
}
}
How can I work with the data in the list somewhere else?
For example, if I want to write them to the console, but not inside the method?

How can I work with the data in the list somewhere else?
You call the method:
var stations = LoadStations();
Now the variable stations will refer to the list. You could do something like this to write out the contents:
var stations = LoadStations();
foreach(var station in stations)
{
Console.WriteLine(station);
}
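But this doesn't seem right. Because Split is called with a count of 1, it never actually splits anything: each line goes into the list as one unsplit string. If you have a file with 3 columns and 4 rows, you're gonna end up with a list of 4 items, each still holding all 3 columns glued together with semicolons. That seems like a recipe for duplicating work later on.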
Instead, I recommend starting from code more like this:
public static IEnumerable<string[]> LoadStations(string fileName)
{
var lines = File.ReadLines(fileName);
return lines.Select(line => line.Split(';'));
}
This returns all the columns. It does skip the .Trim() call, but we can put that back:
public static IEnumerable<string[]> LoadStations(string fileName)
{
var lines = File.ReadLines(fileName);
return lines.Select(line => line.Split(';').Select(c => c.Trim()).ToArray());
}
Then you can use it like this:
var stations = LoadStations(@"X:\2022-23\ZPR\GDI Jízní řád\jizdniradgdi\Stations.csv");
foreach(var station in stations)
{
Console.WriteLine(station[0]);
}
Notice this doesn't even use a list. Instead, it uses an IEnumerable. The advantage is this means you only need to keep one line in memory at a time, but it was still easy to use with a foreach loop.
Even better: Convert the string array into a class with proper field or property names. This will save you so much work needing to reparse the same data later on.
public static IEnumerable<Station> LoadStations(string fileName)
{
var lines = File.ReadLines(fileName);
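// Split each line into fields first; indexing the raw string would yield single characters
return lines.Select(line => line.Split(';')).Select(fields => new Station() {
Name = fields[0].Trim(),
OtherField = fields[1].Trim(),
IntegerField = int.Parse(fields[2].Trim()),
Etc = fields[N].Trim()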
});
}
var stations = LoadStations(@"X:\2022-23\ZPR\GDI Jízní řád\jizdniradgdi\Stations.csv");
foreach(var station in stations)
{
Console.WriteLine(station.Name);
}
Even better: Use a dedicated CSV parser! There are just so many edge cases around CSV data. We think it's simple, and a given file usually is. But more broadly you will tend towards better performance and consistency pulling a real CSV parser from NuGet.
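As one illustration of such an edge case, a quoted field that contains the delimiter gets cut in two by a plain Split. This is only a small sketch: the sample line is made up, and NaiveSplitDemo and CountFields are hypothetical names.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Counts fields the way the Split-based loaders above would see them.
    // Split knows nothing about quoting, so a ';' inside quotes still splits.
    public static int CountFields(string line)
    {
        return line.Split(';').Length;
    }
}

// Made-up line with two logical fields; the first one is quoted and contains a ';':
//   "Praha; hlavni nadrazi";42
// NaiveSplitDemo.CountFields("\"Praha; hlavni nadrazi\";42") returns 3, not 2.
```

A dedicated parser treats the quoted text as a single field, which is the kind of consistency the advice above is about.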

How to find duplicate keys from a List<KeyValuePair<byte[], string>> fileHashList = new List<KeyValuePair<byte[], string>>(); [duplicate]

This question already has answers here:
Group by array contents
(1 answer)
Easiest way to compare arrays in C#
(19 answers)
Closed 2 years ago.
I have a list of type List<KeyValuePair<byte[], string>> fileHashList = new List<KeyValuePair<byte[], string>>();
foreach (string entry in results)
{
FileInfo fileInfo = new FileInfo(Path.Combine("DirectoryPath", entry));
using (var md5 = MD5.Create())
{
using (var stream = File.OpenRead(fileInfo.FullName))
{
var hash = md5.ComputeHash(stream);
fileHashList.Add(new KeyValuePair<byte[], string>(hash, fileInfo.FullName));
}
}
}
I need to find all the duplicate keys in this list.
I tried this, but it doesn't work in my case: I get "Enumeration yielded no results" even though I have identical keys!
Let me know if any additional data is needed
Thanks

compare rows values of two different CSV files in c#

I know there are similar questions, but I was not able to find an answer to mine. I have two CSV files. Both contain image metadata for the same images; however, the image IDs in the first file are outdated, so I need to take the IDs from the second file and replace the outdated IDs with the new ones. My idea is to compare the Longitude, Latitude, and Altitude values of each row, and where they match in both files, take the image ID from the second file. The IDs would then be used in a new object. Note that the order of the lines differs between the files, and the first file contains more lines than the second one.
The files structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there some handy library to work with?

I would use the CsvHelper library for reading and writing CSV, as it is a nice, complete library. For this, you should declare a class to hold your data, and its property names must match your CSV file's column names.
public class ImageData
{
public int ImgID { get; set; }
public double Longitude { get; set; }
public double Latitude { get; set; }
public double Altitude { get; set; }
}
Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
public bool Equals(ImageData x, ImageData y)
{
return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
}
public int GetHashCode(ImageData obj)
{
unchecked
{
int hash = (int)2166136261;
hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
return hash;
}
}
}
The simple explanation is that we implement the Equals() method of IEqualityComparer<ImageData> and dictate that two instances of the ImageData class are equal if their three coordinate values match; the ID is deliberately left out of the comparison. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
List<ImageData> records;
using (var reader = new StreamReader(filePath))
{
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.HasHeaderRecord = true;
records = csv.GetRecords<ImageData>().ToList();
}
}
return records;
}
public static void WriteCSVData(string filePath, List<ImageData> records)
{
using (var writer = new StreamWriter(filePath))
{
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
}
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
Next is the crucial part. First, read the two files into memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
if (replace != null && replace.ImgID != line.ImgID)
{
line.ImgID = replace.ImgID;
}
}
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Now our expected result is that the first 6 lines of the old file have their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
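The matching loop in this answer calls FirstOrDefault once per old row, so it rescans newData for every line. Because ImageDataComparer also implements GetHashCode, the same comparer can key a Dictionary so that each lookup is a hash probe instead. The sketch below assumes the ImageData and ImageDataComparer types from the answer (repeated in compact form so the block compiles by itself); IdUpdater and UpdateIds are hypothetical names.

```csharp
using System.Collections.Generic;

public class ImageData
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}

public class ImageDataComparer : IEqualityComparer<ImageData>
{
    public bool Equals(ImageData x, ImageData y)
    {
        return x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude;
    }

    public int GetHashCode(ImageData obj)
    {
        unchecked
        {
            // Same FNV-style combination as in the answer; ImgID is not part of the hash
            int hash = (int)2166136261;
            hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
            return hash;
        }
    }
}

public static class IdUpdater
{
    // Replaces ImgID in oldData wherever newData has a row with the same coordinates.
    // Returns the number of rows whose ID was changed.
    public static int UpdateIds(List<ImageData> oldData, List<ImageData> newData)
    {
        var lookup = new Dictionary<ImageData, ImageData>(new ImageDataComparer());
        foreach (var row in newData)
        {
            // Keep the first row per coordinate triple, mirroring FirstOrDefault
            if (!lookup.ContainsKey(row)) lookup.Add(row, row);
        }

        int updated = 0;
        foreach (var line in oldData)
        {
            if (lookup.TryGetValue(line, out var match) && match.ImgID != line.ImgID)
            {
                line.ImgID = match.ImgID;
                updated++;
            }
        }
        return updated;
    }
}
```

For the eleven-row sample this makes no practical difference; it is only worth considering when both files are large.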

An alternate way to do it, if for some reason you didn't want to use the CSVHelper, is to write a method that compares two lines of data and determines if they're equal (by ignoring the first column data):
public static bool DataLinesAreEqual(string first, string second)
{
if (first == null || second == null) return false;
var xParts = first.Split(',');
var yParts = second.Split(',');
if (xParts.Length != 4 || yParts.Length != 4) return false;
return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);

C# How do I write a variable to a file? [duplicate]

This question already has answers here:
How to save a List<string> on Settings.Default?
(4 answers)
Saving from List<T> to txt
(7 answers)
Closed 7 years ago.
I'm new to C# and I'm trying to make a List persist when relaunching the application. Everywhere I go I can't seem to find a simple way to do this, something like Python's Pickle. Any help would be appreciated thank you.

The answer to this really depends on what exactly you want to save. Is it an actual list, as in a List<> object? What does it contain? If it's something simple such as a List<string>, then do:
var list = new List<string>();
list.Add("HELLO");
list.Add("hi");
// save
using (var fs = new FileStream(@"F:\test.xml", FileMode.Create))
{
var serializer = new XmlSerializer(typeof(List<string>));
serializer.Serialize(fs, list);
}
// read
using (var s = new FileStream(@"F:\test.xml", FileMode.Open))
{
var serializer = new XmlSerializer(typeof(List<string>));
List<string> result = (List<string>)serializer.Deserialize(s);
}
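For a List<string> specifically, a lighter alternative to XML may be to store one item per line in a text file. This is only a sketch under the assumption that no item contains a line break; ListStore, Save and Load are hypothetical names, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ListStore
{
    // Saves one item per line; assumes no item contains a line break
    public static void Save(string path, List<string> items)
    {
        File.WriteAllLines(path, items);
    }

    // Loads the items back, or returns an empty list if nothing was saved yet
    public static List<string> Load(string path)
    {
        return File.Exists(path) ? File.ReadAllLines(path).ToList() : new List<string>();
    }
}
```

Calling Load at startup and Save before exit would give the persistence across launches that the question asks about, for this simple case.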

Design Pattern to output separate different files from same input parameter

I have this Food class with 20 properties. I need to use this Food class to output 3 different files, each using a different subset of these 20 fields. For example, File 1 contains only 8 fields, File 2 contains 15 fields, and File 3 contains 18 fields.
So right now, I have these 3 separate methods.
FoodService()
{
void WriteRecommendedFood(IList<Food> foodList);
void WriteRecommendedFoodCalculation(IList<Food> foodList);
void WriteRecommendedFoodAllEligibleFoods(IList<Food> foodList);
}
So I'd write:
public void WriteRecommendedFood(IList<Food> foodList)
{
using (StreamWriter sw = new StreamWriter("file.csv", false))
{
StringBuilder sb = new StringBuilder();
foreach (Food f in foodList)
{
sb.Append(f.Field1);
//Repeat for the # of fields I want to spit out
sw.WriteLine(sb.ToString());
sb.Clear();
}
sw.Close();
}
}
I feel like I'm writing the same code three times (with slight variations).
I started to read up on different design patterns like the Visitor and Strategy patterns, but I'm not sure which design pattern would improve my code. (Note: I only need to output to a comma-delimited file at this time. Also, on the UI side, the user gets to select which files they want to output, anywhere from one to all 3.) Any suggestions?

It seems that the only thing that changes between these three functions is a list of fields that get written. There are several ways you can represent "a field" (and thus a list of fields) in a program; one of the most convenient is doing so as a function that extracts this field's value from a Food instance.
The type of this representation would be Func<Food, object>, so with a List<Func<Food, object>> you are good to go.
public void WriteFoodData(IEnumerable<Food> foodList, IEnumerable<Func<Food, object>> valueProviders)
{
using (StreamWriter sw = new StreamWriter("file.csv", false))
{
foreach (Food f in foodList)
{
// Join the selected field values with commas to form one CSV line (needs using System.Linq)
sw.WriteLine(string.Join(",", valueProviders.Select(provider => provider(f))));
}
}
}
Now you can create a "list of fields" and use it to call this method:
var valueProviders = new List<Func<Food, object>>
{
f => f.Field1,
f => f.Field4,
// etc
};
var foods = /* whatever */
WriteFoodData(foods, valueProviders);

I would remove the responsibility for formatting from your FoodService and inject it instead.
public class FoodService
{
public void WriteRecommendedFood(IList<Food> foodList, IFoodFormatter formatter)
{
// The using block closes the writer, so no explicit Close() is needed
using (StreamWriter sw = new StreamWriter("file.csv", false))
{
foreach (Food f in foodList)
{
sw.WriteLine(formatter.Format(f));
}
}
}
}
}
interface IFoodFormatter
{
string Format(Food f);
}
This way you can create concrete formatters like CalculationFormatter and EligibleFoodsFormatter.
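A concrete formatter could look roughly like the sketch below. The Food class here is a hypothetical three-property stand-in for the 20-property class in the question, and the field choices in each formatter are made up for illustration.

```csharp
using System.Globalization;

// Hypothetical stand-in for the Food class from the question
public class Food
{
    public string Name { get; set; }
    public int Calories { get; set; }
    public decimal Price { get; set; }
}

public interface IFoodFormatter
{
    string Format(Food f);
}

// Writes only the fields needed for the calculation file (assumed subset)
public class CalculationFormatter : IFoodFormatter
{
    public string Format(Food f)
    {
        return f.Name + "," + f.Calories.ToString(CultureInfo.InvariantCulture);
    }
}

// Writes a wider set of fields for the "all eligible foods" file (assumed subset)
public class EligibleFoodsFormatter : IFoodFormatter
{
    public string Format(Food f)
    {
        // Invariant culture keeps the decimal separator a '.', so it cannot clash with the comma delimiter
        return f.Name + ","
            + f.Calories.ToString(CultureInfo.InvariantCulture) + ","
            + f.Price.ToString(CultureInfo.InvariantCulture);
    }
}
```

The service then stays unchanged when a fourth file layout is needed: only a new IFoodFormatter implementation is added and passed in. Values that may themselves contain commas would still need quoting, which this sketch does not handle.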
