I know there are similar questions, but I was not able to find an answer to mine. I have two CSV files. Both contain image metadata for the same images, but the image IDs in the first file are outdated, so I need to take the IDs from the second file and replace the outdated ones. My plan is to compare the Longitude, Latitude, and Altitude values of each row and, wherever they match in both files, take the image ID from the second file. The IDs will be used in a new object. The lines are in a different order in each file, and the first file contains more lines than the second.
The file structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there a handy library to work with?
I would use the CsvHelper library for reading and writing the CSV, as it is a nice, complete library. For this, you declare a class to hold your data, and its property names must match your CSV file's column names.
public class ImageData
{
public int ImgID { get; set; }
public double Longitude { get; set; }
public double Latitude { get; set; }
public double Altitude { get; set; }
}
Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
public bool Equals(ImageData x, ImageData y)
{
return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
}
public int GetHashCode(ImageData obj)
{
unchecked
{
int hash = (int)2166136261;
hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
return hash;
}
}
}
The simple explanation is that we implement the Equals() method of IEqualityComparer<ImageData> and dictate that two ImageData instances are equal if their three coordinate values match. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
List<ImageData> records;
using (var reader = new StreamReader(filePath))
{
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.HasHeaderRecord = true;
records = csv.GetRecords<ImageData>().ToList();
}
}
return records;
}
public static void WriteCSVData(string filePath, List<ImageData> records)
{
using (var writer = new StreamWriter(filePath))
{
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
}
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
Next is the crucial part. First, read the two files to memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
if (replace != null && replace.ImgID != line.ImgID)
{
line.ImgID = replace.ImgID;
}
}
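The loop above scans newData once for every old record. For larger files, a dictionary keyed on the three coordinates should give the same result with a single pass over each list. The following is only a sketch under assumptions: `Record`, `IdRemapper.Remap` and `IdRemapDemo` are hypothetical names (with `Record` standing in for `ImageData`), no CSV library is involved, and coordinates are assumed to match exactly, as they do when both files contain identical text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ImageData, so the sketch compiles on its own.
public class Record
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}

public static class IdRemapper
{
    // Replaces ImgID in oldData wherever newData has a record with
    // exactly the same coordinates. Unmatched records are left alone.
    public static void Remap(List<Record> oldData, List<Record> newData)
    {
        var idByPosition = new Dictionary<(double, double, double), int>();
        foreach (var n in newData)
        {
            var key = (n.Longitude, n.Latitude, n.Altitude);
            // Keep the first occurrence, mirroring FirstOrDefault in the loop above.
            if (!idByPosition.ContainsKey(key))
            {
                idByPosition[key] = n.ImgID;
            }
        }

        foreach (var o in oldData)
        {
            if (idByPosition.TryGetValue((o.Longitude, o.Latitude, o.Altitude), out int id))
            {
                o.ImgID = id;
            }
        }
    }
}

public static class IdRemapDemo
{
    public static void Main()
    {
        var oldData = new List<Record>
        {
            new Record { ImgID = 1, Longitude = 1, Latitude = 2, Altitude = 3 },
            new Record { ImgID = 7, Longitude = 7, Latitude = 8, Altitude = 9 },
        };
        var newData = new List<Record>
        {
            new Record { ImgID = 5702, Longitude = 1, Latitude = 2, Altitude = 3 },
        };

        IdRemapper.Remap(oldData, newData);
        foreach (var r in oldData) Console.WriteLine(r.ImgID);
    }
}
```

The first record has a match and takes the new ID; the second has none and keeps its old ID.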
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Our expected result is that the first 6 lines of the old file have their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
An alternative, if for some reason you don't want to use CsvHelper, is to write a method that compares two lines of data and determines whether they're equal (ignoring the first column):
public static bool DataLinesAreEqual(string first, string second)
{
if (first == null || second == null) return false;
var xParts = first.Split(',');
var yParts = second.Split(',');
if (xParts.Length != 4 || yParts.Length != 4) return false;
return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);
C# Folks! I have 2 List that I want to compare.
Example:
List<string> ONE contains:
A
B
C
List<string> TWO contains:
B
C
I know I can achieve the results of ONE if I do:
ONE.Except(TWO);
Results: A
How can I do the same if my Lists contain a file extension for each element?
List<string> ONE contains:
A.pdf
B.pdf
C.pdf
List<string> TWO contains: (will always have .txt extension)
B.txt
C.txt
Results should = A.pdf
I realized that I need to display the full filename (A.pdf) in a report at the end, so I cannot strip the extension, like I originally did.
Thanks for the help!
EDIT:
This is how I went about it. I am not sure whether it is the "best" or "most performant" way to solve it, but it does seem to work:
foreach (string s in ONE)
{
//since I know TWO will always be .txt
string temp = Path.GetFileNameWithoutExtension(s) + ".txt";
if (TWO.Contains(temp))
{
// yes it exists, do something
}
else
{
// no it does not exist, do something
}
}
This is very straightforward, easy code, but it will not cover the case where your requirement involves more file extensions:
List<string> lstA = new List<string>() { "A.pdf", "B.pdf", "C.pdf" };
List<string> lstB = new List<string>() { "B.txt", "C.txt" };
foreach (var item in lstA)
{
if (lstB.Contains(item.Replace(".pdf",".txt"))==false)
{
Console.WriteLine(item);
}
}
You can implement a custom equality comparer:
class FileNameComparer: IEqualityComparer<String>
{
public bool Equals(String b1, String b2)
{
return Path.GetFileNameWithoutExtension(b1).Equals(Path.GetFileNameWithoutExtension(b2));
}
public int GetHashCode(String a)
{
return Path.GetFileNameWithoutExtension(a).GetHashCode();
}
}
... and pass it to the Except method:
System.Console.WriteLine(string.Join(", ", list1.Except(list2, new FileNameComparer())));
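On the performance question: `List<string>.Contains` inside a loop is a linear scan of the second list for every element, so a set of extension-less names is usually cheaper to probe. A rough sketch of that idea follows; `FileNameDiff.MissingFrom` and `FileNameDiffDemo` are made-up names for illustration, and the comparison is case-sensitive by assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileNameDiff
{
    // Returns the entries of 'source' whose name (ignoring the extension)
    // does not occur in 'other'. Full names from 'source' are preserved,
    // so they can still be shown in a report.
    public static List<string> MissingFrom(IEnumerable<string> source, IEnumerable<string> other)
    {
        var known = new HashSet<string>(
            other.Select(o => Path.GetFileNameWithoutExtension(o)),
            StringComparer.Ordinal);

        return source
            .Where(name => !known.Contains(Path.GetFileNameWithoutExtension(name)))
            .ToList();
    }
}

public static class FileNameDiffDemo
{
    public static void Main()
    {
        var one = new List<string> { "A.pdf", "B.pdf", "C.pdf" };
        var two = new List<string> { "B.txt", "C.txt" };

        // Only A.pdf has no counterpart in 'two'.
        Console.WriteLine(string.Join(", ", FileNameDiff.MissingFrom(one, two)));
    }
}
```

Swapping in `StringComparer.OrdinalIgnoreCase` would treat `a.PDF` and `A.txt` as the same name, if that is what the report needs.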
So I've been reading that I shouldn't write my own CSV reader/writer, so I've been trying to use the CsvHelper library installed via NuGet. The CSV file is a grey-scale image, with the number of rows being the image height and the number of columns the width. I would like to read the values row-wise into a single List<string> or List<byte>.
The code I have so far is:
using CsvHelper;
public static List<string> ReadInCSV(string absolutePath)
{
IEnumerable<string> allValues;
using (TextReader fileReader = File.OpenText(absolutePath))
{
var csv = new CsvReader(fileReader);
csv.Configuration.HasHeaderRecord = false;
allValues = csv.GetRecords<string>();
}
return allValues.ToList<string>();
}
But allValues.ToList<string>() is throwing a:
CsvConfigurationException was unhandled by user code
An exception of type 'CsvHelper.Configuration.CsvConfigurationException' occurred in CsvHelper.dll but was not handled in user code
Additional information: Types that inherit IEnumerable cannot be auto mapped. Did you accidentally call GetRecord or WriteRecord which acts on a single record instead of calling GetRecords or WriteRecords which acts on a list of records?
GetRecords is probably expecting my own custom class, but I'm just wanting the values as some primitive type or string. Also, I suspect the entire row is being converted to a single string, instead of each value being a separate string.
According to @Marc L's post, you can try this:
public static List<string> ReadInCSV(string absolutePath) {
List<string> result = new List<string>();
string value;
using (TextReader fileReader = File.OpenText(absolutePath)) {
var csv = new CsvReader(fileReader);
csv.Configuration.HasHeaderRecord = false;
while (csv.Read()) {
for(int i=0; csv.TryGetField<string>(i, out value); i++) {
result.Add(value);
}
}
}
return result;
}
If all you need is the string values for each row in an array, you could use the parser directly.
var parser = new CsvParser( textReader );
while( true )
{
string[] row = parser.Read();
if( row == null )
{
break;
}
}
http://joshclose.github.io/CsvHelper/#reading-parsing
Update
Version 3 has support for reading and writing IEnumerable properties.
The whole point here is to read all lines of the CSV and deserialize them into a collection of objects. I'm not sure why you want to read it as a collection of strings. A generic ReadAll() would probably work best for you in that case, as stated before. The library shines when you use it for this purpose:
using System.Linq;
...
using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader))
{
var yourList = csv.GetRecords<YourClass>().ToList();
}
If you don't call ToList(), it will yield a single record at a time (for better performance); please read https://joshclose.github.io/CsvHelper/examples/reading/enumerate-class-records
Please try this. It worked for me.
TextReader reader = File.OpenText(filePath);
CsvReader csvFile = new CsvReader(reader);
csvFile.Configuration.HasHeaderRecord = true;
csvFile.Read();
var records = csvFile.GetRecords<Server>().ToList();
Server is an entity class. This is how I created it:
public class Server
{
private string details_Table0_ProductName;
public string Details_Table0_ProductName
{
get
{
return details_Table0_ProductName;
}
set
{
this.details_Table0_ProductName = value;
}
}
private string details_Table0_Version;
public string Details_Table0_Version
{
get
{
return details_Table0_Version;
}
set
{
this.details_Table0_Version = value;
}
}
}
You are close. It isn't that it's trying to convert the row to a string. CsvHelper tries to map each field in the row to the properties on the type you give it, using names given in a header row. Further, it doesn't understand how to do this with IEnumerable types (which string implements), so it just throws when its auto-mapping gets to that point in testing the type.
That is a whole lot of complication for what you're doing. If your file format is sufficiently simple, which yours appears to be (well-known field format, neither escaped nor quoted delimiters), I see no reason to take on the overhead of importing a library. You should be able to enumerate the values as needed with System.IO.File.ReadLines() and String.Split().
//pseudo-code...you don't need CsvHelper for this
IEnumerable<string> GetFields(string filepath)
{
foreach(string row in File.ReadLines(filepath))
{
foreach(string field in row.Split(',')) yield return field;
}
}
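Since the question mentions wanting a List<byte>, the same idea can parse each field as it is read. This is a sketch only, assuming every field is an unquoted integer in the range 0..255; `PixelCsv.ParseBytes` and `PixelCsvDemo` are hypothetical names, and the helper takes lines rather than a path so it can be tried without a file.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PixelCsv
{
    // Flattens CSV rows of grey-scale values into one row-major list.
    // byte.Parse throws FormatException or OverflowException for a field
    // that is not an integer in 0..255.
    public static List<byte> ParseBytes(IEnumerable<string> lines)
    {
        var pixels = new List<byte>();
        foreach (string row in lines)
        {
            if (row.Length == 0) continue; // skip blank lines

            foreach (string field in row.Split(','))
            {
                pixels.Add(byte.Parse(field.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture));
            }
        }
        return pixels;
    }
}

public static class PixelCsvDemo
{
    public static void Main()
    {
        // Two rows of three pixels each, standing in for a 3x2 image.
        var lines = new[] { "0,128,255", "1,2,3" };
        List<byte> pixels = PixelCsv.ParseBytes(lines);
        Console.WriteLine(pixels.Count);
    }
}
```

With a real file, the call would be `PixelCsv.ParseBytes(File.ReadLines(path))`, which streams the file line by line.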
static void WriteCsvFile(string filename, IEnumerable<Person> people)
{
// Dispose both writers so buffered output is flushed and the file handle is released, even if writing throws.
using (StreamWriter textWriter = File.CreateText(filename))
using (var csvWriter = new CsvWriter(textWriter, System.Globalization.CultureInfo.CurrentCulture))
{
csvWriter.WriteRecords(people);
}
}
Apologies if the answer to this is obvious; I'm fairly new to C# and OOP. I've stepped through my code and spent quite some time on Google, but I can't find the answer to my question (quite possibly because I am using the wrong search terms!).
I have the following class that creates a static List<List<string>> and has a method to add items to that list:
public static class WordList
{
static List<List<string>> _WordList; // Static List instance
static WordList()
{
//
// Allocate the list.
//
_WordList = new List<List<string>>();
}
public static void Record(List<string> Words)
{
//
// Record this value in the list.
//
_WordList.Add(Words);
}
}
Elsewhere I create a List<string>, which I pass into the Record() method to be added to _WordList. The problem is that when I add items to WordList, every item in that list ends up with the same value. E.g.:
1st item added contains "Foo" and "bar"
2nd item added contains "Not","Foo" and "bar"
So instead of a list that looks like:
1: "Foo","bar"
2: "Not","Foo","bar"
I end up with:
1: "Not","Foo","bar"
2: "Not","Foo","bar"
I haven't used a List<string[]> instead of a List<List<string>> because the way I am getting the List<string> to add is by reading a text file line by line with a delimiter saying when I should add the List<string> and clear it so I can start again. Therefore I don't know how long an array I need to declare.
Hope this makes some kind of sense! If you need any more of the code posted to help, let me know.
Thanks, in advance.
EDIT
Here is the code for the creation of the List<string> that is passed to the Record() method. I think I see what people are saying about not creating a new instance of the List<string> but I'm not sure how to remedy this in regards to my code. I will have a think about it and post an answer if I come up with one!
public static void LoadWordList(string path)
{
string line;
List<string> WordsToAdd = new List<string>();
StreamReader file = new System.IO.StreamReader(path);
while ((line = file.ReadLine()) != null)
{
if (line.Substring(0, 1) == "$")
{
WordList.Record(WordsToAdd);
WordsToAdd.Clear();
WordsToAdd.Add(line.Replace("$", ""));
}
else
{
WordsToAdd.Add(line.Replace("_"," "));
}
}
file.Close();
}
Instead of
WordList.Record(WordsToAdd);
WordsToAdd.Clear();
WordsToAdd.Add(line.Replace("$", ""));
do
WordList.Record(WordsToAdd);
WordsToAdd = new List<string>();
WordsToAdd.Add(line.Replace("$", ""));
All that your Record method is doing is adding a reference to the List<string> you've passed to it. You then clear that same list, and start adding different strings to it.
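The difference between clearing a list and creating a new one can be seen in isolation. The sketch below uses a hypothetical `AliasDemo` class rather than the `WordList` class from the question, but it applies the same two patterns.

```csharp
using System;
using System.Collections.Generic;

public static class AliasDemo
{
    // Records the same list instance twice, clearing it in between.
    public static List<List<string>> RecordWithClear()
    {
        var all = new List<List<string>>();
        var words = new List<string> { "Foo", "bar" };
        all.Add(words);   // stores a reference, not a copy
        words.Clear();    // also empties all[0], since it is the same object
        words.AddRange(new[] { "Not", "Foo", "bar" });
        all.Add(words);   // the same instance again
        return all;
    }

    // Starts a fresh list after each Add, so the entries stay independent.
    public static List<List<string>> RecordWithNew()
    {
        var all = new List<List<string>>();
        var words = new List<string> { "Foo", "bar" };
        all.Add(words);
        words = new List<string> { "Not", "Foo", "bar" }; // new object; all[0] is untouched
        all.Add(words);
        return all;
    }

    public static void Main()
    {
        Console.WriteLine("With Clear():");
        foreach (var item in RecordWithClear()) Console.WriteLine(string.Join(",", item));

        Console.WriteLine("With new List<string>():");
        foreach (var item in RecordWithNew()) Console.WriteLine(string.Join(",", item));
    }
}
```

In the first method both entries are one and the same list, which is why the output in the question shows identical rows.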
Maybe something like:
public static void Record(IEnumerable<string> Words)
{
_WordList.Add(Words.ToList());
}
Which will force a copy to occur; also, by accepting IEnumerable<string>, it puts fewer restrictions on the code that calls it.
Can you post the code that adds the list? I bet you are doing something like:
create a list l
add it
modify l
add it
This results in a single object (because you created it only once) with multiple references to it, namely from the first value in _WordList, from the second value in _WordList, and from l.
So the right way to do it is:
create list l
add it
create NEW list l
add it
Or in code:
List<string> l = new string[] { "Foo", "bar" }.ToList();
WordList.Record(l);
l = new string[] { "Not", "Foo", "bar" }.ToList();
WordList.Record(l);
You haven't shown how you are adding items to the list. Here's an example which works as expected:
using System;
using System.Collections.Generic;
using System.Linq;
public static class WordList
{
static List<List<string>> _WordList; // Static List instance
static WordList()
{
_WordList = new List<List<string>>();
}
public static void Record(List<string> Words)
{
_WordList.Add(Words);
}
public static void Print()
{
foreach (var item in _WordList)
{
Console.WriteLine("-----");
Console.WriteLine(string.Join(",", item.ToArray()));
}
}
}
class Program
{
static void Main()
{
WordList.Record(new[] { "Foo", "bar" }.ToList());
WordList.Record(new[] { "Not", "Foo", "bar" }.ToList());
WordList.Print();
}
}
I'm still trying to understand KeyValuePairs, but I believe this idea should work. My code below searches through a large string and extracts 2 substrings. One substring (keep in mind the value between the quotes varies) is something like Identity="EDN\username"; the other is something like FrameworkSiteID="Desoto". I was thinking about combining these strings before adding them to the List, but here is my problem: the login strings below are unique strings that I need to use in a SQL statement to select records in SQL Server, and the framew strings need to line up with the login strings (and with all the columns and rows of data coming from SQL Server) when I output this to a text file. Should I make the login strings KEYS and the framew strings VALUES? If so, how do I do that? I hope that makes sense; I can explain further if need be.
Regex reg = new Regex("Identity=\"[^\"]*\"");
Regex reg1 = new Regex("FrameworkSiteID=\"[^\"]*\"");
foreach (FileInfo file in Files)
{
string line = "";
using (StreamReader sr = new StreamReader(file.FullName))
{
while (!String.IsNullOrEmpty(line = sr.ReadLine()))
{
if (line.ToUpper().Contains("IDENTITY="))
{
string login = reg.Match(line).Groups[0].Value;
string framew = reg1.Match(line).Groups[0].Value; //added
IdentityLines.Add(new KeyValuePair<string, string>(file.Name, login + " " + framew));
//This is probably not what I need
}
else
{
IdentityLines.Add(new KeyValuePair<string, string>(file.Name, "NO LOGIN"));
}
}
}
}
KeyValuePair<TKey,TValue> is a structure used by the Dictionary<TKey,TValue> class. Instead of keeping a list of KeyValuePair<TKey,TValue> objects, just create a Dictionary<TKey,TValue> and add keys/values to it.
Example:
Dictionary<string,string> identityLines = new Dictionary<string,string>();
foreach (FileInfo file in Files)
{
string line = "";
using (StreamReader sr = new StreamReader(file.FullName))
{
while (!String.IsNullOrEmpty(line = sr.ReadLine()))
{
if (line.ToUpper().Contains("IDENTITY="))
{
string login = reg.Match(line).Groups[0].Value;
string framew = reg1.Match(line).Groups[0].Value; //added
identityLines.Add(login, framew);
}
}
}
}
This will create an association between logins and framews. If you want to sort these by file, you can make a Dictionary<string, Dictionary<string,string>> and associate each identityLines dictionary with a specific filename. Note that the keys of a Dictionary<TKey, TValue> are unique: Add throws an ArgumentException if you try to add a key that has already been added.
I'm not clear what the purpose of this is. You don't seem to be using the KeyValuePairs as pairs of a Key and a Value. Are you using them as a general pair class? It's a reasonable use (I do this myself), but I'm not sure what help you are seeking.
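One way the per-file Dictionary<string, Dictionary<string,string>> could be filled without tripping over duplicate keys is sketched here. `LoginIndex.TryRecord` and `LoginIndexDemo` are hypothetical names, and keeping the first value for a repeated login is an assumption; overwriting through the indexer would be equally valid.

```csharp
using System;
using System.Collections.Generic;

public static class LoginIndex
{
    // Adds (login, framew) under fileName, creating the inner dictionary on
    // first use. Returns false and keeps the existing value if that login
    // was already recorded for the file, instead of letting Add throw.
    public static bool TryRecord(
        Dictionary<string, Dictionary<string, string>> byFile,
        string fileName, string login, string framew)
    {
        if (!byFile.TryGetValue(fileName, out var logins))
        {
            logins = new Dictionary<string, string>();
            byFile[fileName] = logins;
        }

        if (logins.ContainsKey(login)) return false;

        logins.Add(login, framew);
        return true;
    }
}

public static class LoginIndexDemo
{
    public static void Main()
    {
        var byFile = new Dictionary<string, Dictionary<string, string>>();

        LoginIndex.TryRecord(byFile, "a.log", "Identity=\"EDN\\user1\"", "FrameworkSiteID=\"Desoto\"");
        bool addedAgain = LoginIndex.TryRecord(byFile, "a.log", "Identity=\"EDN\\user1\"", "FrameworkSiteID=\"Other\"");

        Console.WriteLine(addedAgain); // the duplicate login is rejected
        Console.WriteLine(byFile["a.log"]["Identity=\"EDN\\user1\""]);
    }
}
```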
The intended purpose of KeyValuePair is as a helper-class in the implementation of Dictionaries. This would be useful if you are going to look up values based on having a key, though it doesn't seem from your explanation that you are.
Why are you using the filename as the key? Does it matter?
I also don't see why you are loading all of this stuff into a list. Why not just yield them out and use them as they are found?
foreach (FileInfo file in Files)
{
using (StreamReader sr = new StreamReader(file.FullName))
{
for(string line = sr.ReadLine(); !string.IsNullOrEmpty(line); line = sr.ReadLine())
{
if(line.IndexOf("IDENTITY=", StringComparison.InvariantCultureIgnoreCase) != -1)
{
string login = reg.Match(line).Groups[0].Value;
string framew = reg1.Match(line).Groups[0].Value; //added
yield return new KeyValuePair<string, string>(login, framew);
}
}
}
}
On the other hand, if you do want to use them as keyed values:
Dictionary<string, string> logins = new Dictionary<string, string>();
foreach (FileInfo file in Files)
{
using (StreamReader sr = new StreamReader(file.FullName))
{
for(string line = sr.ReadLine(); !string.IsNullOrEmpty(line); line = sr.ReadLine())
{
if(line.IndexOf("IDENTITY=", StringComparison.InvariantCultureIgnoreCase) != -1)
{
string login = reg.Match(line).Groups[0].Value;
string framew = reg1.Match(line).Groups[0].Value; //added
logins.Add(login, framew);
}
}
}
}
Now logins[login] returns the related framew. If you want this to be case-insensitive then use new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase) or new Dictionary<string, string>(StringComparer.CurrentCultureIgnoreCase) as appropriate.
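A short illustration of what the case-insensitive comparer changes, using a hypothetical `CaseInsensitiveLogins` helper and made-up sample values:

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveLogins
{
    // Builds a dictionary whose keys ignore case (invariant culture rules).
    public static Dictionary<string, string> Create()
    {
        return new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);
    }
}

public static class CaseInsensitiveLoginsDemo
{
    public static void Main()
    {
        var logins = CaseInsensitiveLogins.Create();
        logins.Add("EDN\\username", "Desoto");

        // The lookup succeeds even though the casing differs from the stored key.
        Console.WriteLine(logins["edn\\USERNAME"]);

        // A key that differs only in case counts as a duplicate.
        Console.WriteLine(logins.ContainsKey("Edn\\Username"));
    }
}
```

The flip side is that Add treats keys differing only in case as the same key, so it throws for the second one just as it would for an exact duplicate.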
Finally, are you sure there will be no blank lines until the end of the file? If there could be, you should use line != null rather than !string.IsNullOrEmpty() to avoid stopping your file read prematurely.
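The effect of the two loop conditions can be checked on an in-memory string. In this sketch `LineReading` and its two methods are hypothetical helpers, and a StringReader stands in for the StreamReader so no file is needed.

```csharp
using System;
using System.IO;

public static class LineReading
{
    // Stops at the first blank line, or at the end of input if there is none.
    public static int CountUntilBlank(TextReader reader)
    {
        int count = 0;
        for (string line = reader.ReadLine(); !string.IsNullOrEmpty(line); line = reader.ReadLine())
        {
            count++;
        }
        return count;
    }

    // Reads to the end of input; blank lines are counted too.
    public static int CountAll(TextReader reader)
    {
        int count = 0;
        for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        const string text = "first\n\nthird\n"; // the second line is blank

        Console.WriteLine(CountUntilBlank(new StringReader(text)));
        Console.WriteLine(CountAll(new StringReader(text)));
    }
}
```

The first loop never reaches the line after the blank one, which is the premature stop described above.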