C# Read and split from .txt to struct array

I'm trying to make a basic login for my console app. I store the user data in a .txt file like this:
ID;Name;IsAdmin. The txt has several lines.
In the app I want to store user data in a struct User array. I can't seem to find the method to read the file, split and put the different data to the right place. This is what I have so far:
Loading user data to struct array
public static void LoadIDs()
{
int entries = FileHandling.CountRows(usersPath);
User[] users = new User[entries]; //Length depends on how many lines are in the .txt
for (int i = 0; i < users.Length; i++)
{
users[i] = new User(1,"a",false); //ID(int), name, isAdmin [This is where I want to put the data from the .txt]
}
}
Reading and splitting the text
public static string ReadFileToArray(string path)
{
String input = File.ReadAllText(path);
foreach (var record in input.Split('\n'))
{
foreach (var data in record.Split(';'))
{
return data;
}
}
return null;
}
I know that this doesn't work at all this way, but my knowledge is still limited and I cannot think of another solution.

There is a better tool for storing your users. Instead of an array (which forces you to know the length of the data up front) you can use a List, where you can add elements as you read them.
Another change is to replace File.ReadAllText with File.ReadLines. This lets you read your file line by line directly in the loop:
public List<User> BuildUserList(string path)
{
    List<User> result = new List<User>();
    foreach (var record in File.ReadLines(path))
    {
        string[] data = record.Split(';');
        User current = new User();
        current.ID = Convert.ToInt32(data[0]);
        current.Name = data[1];
        current.IsAdmin = Convert.ToBoolean(data[2]);
        result.Add(current);
    }
    return result;
}
Now you can use the list like an array if you need to:
List<User> users = BuildUserList("yourfile.txt");
if(users.Count > 0)
{
Console.WriteLine("Name=" + users[0].Name);
}
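Note that this assumes your User type has a parameterless constructor and settable ID, Name and IsAdmin members (the question only shows a three-argument constructor, and some of the answers below spell the first property Id rather than ID). A minimal sketch of a struct that works with both styles could be:
public struct User
{
    public int ID { get; set; }
    public string Name { get; set; }
    public bool IsAdmin { get; set; }

    public User(int id, string name, bool isAdmin)
    {
        ID = id;
        Name = name;
        IsAdmin = isAdmin;
    }
}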

Assuming each line of your file holds Id;Name;IsAdmin values, I would write something like the code below to extract it. Please note that there are shorter ways to do this, but the following logic is helpful for beginners to understand how it can be achieved.
List<User> userList = new List<User>();
// Read the file located at c:\test.txt (this might be different in your case)
System.IO.StreamReader file = new System.IO.StreamReader(@"c:\test.txt");
string line;
while ((line = file.ReadLine()) != null)
{
    // The following logic reads each line and splits it by the separator before
    // creating a new User instance. Remember to add more defensive logic to
    // cover all cases.
    var extract = line.Split(';');
    userList.Add(new User()
    {
        Id = Convert.ToInt32(extract[0]),
        Name = extract[1],
        IsAdmin = Convert.ToBoolean(extract[2])
    });
}
file.Close();
// At this stage you have a List of User; convert it to an array with the following call:
var userArray = userList.ToArray();

And just as another variant, a LINQ solution could look like this:
var users = (
from string line in System.IO.File.ReadAllLines(@"..filepath..")
let parts = line.Split(';')
where parts.Length == 3
select new User() {
ID = Convert.ToInt32(parts[0]),
Name = parts[1],
IsAdmin = Convert.ToBoolean(parts[2])}
).ToArray();
This can be elegant and short, but error handling may be a bit more difficult.
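If malformed lines are a concern, one hedged variant (method syntax, same assumed User type) pre-filters with TryParse so bad rows are skipped instead of throwing:
var users = System.IO.File.ReadAllLines(@"..filepath..")
    .Select(line => line.Split(';'))
    // keep only rows with three fields whose ID and IsAdmin parts actually parse
    .Where(parts => parts.Length == 3
                    && int.TryParse(parts[0], out _)
                    && bool.TryParse(parts[2], out _))
    .Select(parts => new User()
    {
        ID = int.Parse(parts[0]),
        Name = parts[1],
        IsAdmin = bool.Parse(parts[2])
    })
    .ToArray();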

This will read your file lazily, so it can handle extremely huge files with ease (assuming the rest of your code can):
public IEnumerable<User> ReadUsers(string path)
{
return File.ReadLines(path)
.Select(l=>l.Split(';'))
.Select(l=> new User
{
Id = int.Parse(l[0]),
Name = l[1],
IsAdmin = bool.Parse(l[2])
});
}
or
public IEnumerable<User> ReadUsers(string path)
{
return File.ReadLines(path)
.Select(l=>l.Split(';'))
.Select(l=> new User(int.Parse(l[0]), l[1], bool.Parse(l[2])));
}
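Usage is the same for either version; for example (a sketch, where "users.txt" stands in for your actual path):
foreach (var user in ReadUsers("users.txt"))
{
    Console.WriteLine($"{user.Name} (admin: {user.IsAdmin})");
}

// or, if you really need an array:
User[] users = ReadUsers("users.txt").ToArray();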

Related

Is there a way to filter a CSV file for data validation without for loops. (Lumenworks CSVReader)

I want to be able to filter a CSV file and perform data validation on the filtered data. I imagine using for loops, but the file has 2 million cells and it would take a long time. I am using Lumenworks CSVReader for accessing the file using C#.
I found this method csvfile.Where<> but I have no idea what to put in the parameters. Sorry, I am still new to coding as well.
[EDIT] This is my code for loading the file. Thanks for all the help!
//Creating C# table from CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);
//grabs header from the CSV data table
string[] headers = csvReader.GetFieldHeaders(); //this method gets the headers of the CSV file
string[] filteredData = csvReader.Where // this is where I would want to implement the where method, or some sort of way to filter the data
//I can access the rows and columns with this
csvTable.Rows[0][0]
csvTable.Columns[0][0]
//After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData){
dataToValidate += data;
}
if (dataToValidate == 123)
//data is validated
I would read some of the documentation for the package you are using:
https://github.com/phatcher/CsvReader
https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
To specifically answer the filtering question, so that the result only contains the data you are searching for, consider the following:
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
string searchTerm = "foo";
while (csv.ReadNextRecord())
{
var row = new List<string>();
for (int i = 0; i < csv.FieldCount; i++)
{
if (csv[i].Contains(searchTerm))
{
row.Add(csv[i]);
}
}
filteredData.Add(row);
}
}
This will give you a list of lists of strings that you can enumerate over to do your validation:
int dataToValidate = 0;
foreach (var row in filteredData)
{
foreach (var data in row)
{
// do the thing
}
}
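If the cells you keep are meant to be integers (as in the question's validation), the "do the thing" part could be filled in like this sketch, where int.TryParse guards against non-numeric cells:
int dataToValidate = 0;
foreach (var row in filteredData)
{
    foreach (var data in row)
    {
        // only accumulate cells that actually parse as integers
        if (int.TryParse(data, out int value))
        {
            dataToValidate += value;
        }
    }
}

if (dataToValidate == 123)
{
    // data is validated
}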
--- Old Answer ---
Without seeing the code you are using to load the file, it is a bit difficult to give you a full answer; ~2 million cells may be slow no matter what.
Your .Where comes from System.Linq
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0
A simple example using .Where
// Read the file and return a list of rows that match the where clause
public List<CsvFileStructure> ReadCSV()
{
    List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
        .Select(line => line.Split(','))
        // tokens[x] where x is the column number, assumes ID is column 0
        .Select(tokens => new CsvFileStructure { Id = tokens[0], Value = tokens[1] })
        // Where filters based on whatever you are looking for in the CSV
        .Where(csvFileStructure => csvFileStructure.Id == "1")
        .ToList();
    return data;
}
// Map of your data structure
public class CsvFileStructure
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}
Modified from this answer:
https://stackoverflow.com/a/10332737/7366061
There is no csvreader.Where method. The "where" is part of Linq in C#. The link below shows an example of computing columns in a csv file using Linq:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
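For example, once the DataTable from the question is loaded, a LINQ filter over its rows could look like the sketch below. AsEnumerable() and Field<T>() come from System.Data.DataSetExtensions, and "SomeTextColumn"/"SomeNumericColumn" are placeholders for your real headers:
var filteredRows = csvTable.AsEnumerable()
    .Where(row => row.Field<string>("SomeTextColumn") == "foo")
    .ToList();

// summing a numeric column of the filtered rows for validation
var dataToValidate = filteredRows.Sum(row => Convert.ToInt32(row["SomeNumericColumn"]));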

C# - check which element in a csv is not in an other csv and then write the elements to another csv

My task is to check which of the elements of a column in one csv are not included in the elements of a column in the other csv. There is a country column in both csv files, and the task is to check which countries are in the first csv but not in the second.
I guess I have to solve it with Lists after I read the strings from the two csv files. But I don't know how to check which items in the first list are not in the other list and then put them into a third list.
There are many ways to achieve this. For many real-world CSV applications it is helpful to read the CSV input into a typed in-memory store; there are standard libraries that can assist with this, like CsvHelper, as explained in this canonical post: Parsing CSV files in C#, with header.
However, for this simple requirement we only need to parse the Country values from the master list, in this case the second csv. We don't need to manage, validate or parse any of the other fields in the CSVs.
Build a list of unique Country values from the second csv
Iterate the first csv
Get the Country value
Check against the list of countries from the second csv
Write to the third csv if the country was not found
You can test the following code on .NET Fiddle
NOTE: this code uses StringWriter and StringReader because their interfaces are the same as the file readers and writers in the System.IO namespace, but they remove the complexity associated with file access for this simple requirement.
string inputcsv = @"Id,Field1,Field2,Country,Field3
1,one,two,Australia,three
2,one,two,New Zealand,three
3,one,two,Indonesia,three
4,one,two,China,three
5,one,two,Japan,three";
string masterCsv = @"Field1,Country,Field2
one,Indonesia,...
one,China,...
one,Japan,...";
string errorCsv = "";
// For all in inputCsv where the country value is not listed in the masterCsv
// Write to errorCsv
// Step 1: Build a list of unique Country values
bool csvHasHeader = true;
int countryIndexInMaster = 1;
char delimiter = ',';
List<string> countries = new List<string>();
using (var masterReader = new System.IO.StringReader(masterCsv))
{
string line = null;
if (csvHasHeader)
{
line = masterReader.ReadLine();
// an example of how to find the column index from first principals
if(line != null)
countryIndexInMaster = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
}
while ((line = masterReader.ReadLine()) != null)
{
string country = line.Split(delimiter)[countryIndexInMaster].Trim('"');
if (!countries.Contains(country))
countries.Add(country);
}
}
// Read the input CSV, if the country is not in the master list "countries", write it to the errorCsv
int countryIndexInInput = 3;
csvHasHeader = true;
var outputStringBuilder = new System.Text.StringBuilder();
using (var outputWriter = new System.IO.StringWriter(outputStringBuilder))
using (var inputReader = new System.IO.StringReader(inputcsv))
{
string line = null;
if (csvHasHeader)
{
line = inputReader.ReadLine();
if (line != null)
{
countryIndexInInput = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
outputWriter.WriteLine(line);
}
}
while ((line = inputReader.ReadLine()) != null)
{
string country = line.Split(delimiter)[countryIndexInInput].Trim('"');
if(!countries.Contains(country))
{
outputWriter.WriteLine(line);
}
}
outputWriter.Flush();
errorCsv = outputWriter.ToString();
}
// dump output to the console
Console.WriteLine(errorCsv);
Since you write about solving it with lists, I assume you can load those values from the CSV to the lists, so let's start with:
List<string> countriesIn1st = LoadDataFrom1stCsv();
List<string> countriesIn2nd = LoadDataFrom2ndCsv();
Then you can easily solve it with linq:
List<string> countriesNotIn2nd = countriesIn1st.Where(country => !countriesIn2nd.Contains(country)).ToList();
Now you have your third list with the countries that are in the first list but not in the second. You can save it.
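Equivalently, Except does the same job (and, because it hashes the second list internally, it scales better than the nested Contains); note that it also removes duplicates from the first list:
List<string> countriesNotIn2nd = countriesIn1st.Except(countriesIn2nd).ToList();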

compare rows values of two different CSV files in c#

I know there are similar questions, but I was not able to find the answer to mine. I have two CSV files. Both files contain image metadata for the same images, however the image IDs in the first file are outdated. So I need to take the IDs from the second file and replace the outdated IDs with the new ones. I was thinking of comparing the image Longitude, Latitude and Altitude values, and where they match in both files, taking the image ID from the second file. The IDs would be used in a new object. The sequence of lines in the files is different, and the first file contains more lines than the second one.
The files structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there some handy library to work with?
I would use the CsvHelper library for CSV read/write, as it is a nice, complete library. For this, you should declare a class to hold your data, and its property names must match your CSV file's column names.
public class ImageData
{
public int ImgID { get; set; }
public double Longitude { get; set; }
public double Latitude { get; set; }
public double Altitude { get; set; }
}
Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
public bool Equals(ImageData x, ImageData y)
{
return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
}
public int GetHashCode(ImageData obj)
{
unchecked
{
int hash = (int)2166136261;
hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
return hash;
}
}
}
The simple explanation is that we override the Equals() method and dictate that two instances of the ImageData class are equal if the three property values match. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
List<ImageData> records;
using (var reader = new StreamReader(filePath))
{
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.HasHeaderRecord = true;
records = csv.GetRecords<ImageData>().ToList();
}
}
return records;
}
public static void WriteCSVData(string filePath, List<ImageData> records)
{
using (var writer = new StreamWriter(filePath))
{
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
}
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
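A sketch of that generic version (the CsvHelper calls are identical, only the type parameter changes):
public static List<T> ReadCSVData<T>(string filePath)
{
    using (var reader = new StreamReader(filePath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        return csv.GetRecords<T>().ToList();
    }
}

public static void WriteCSVData<T>(string filePath, List<T> records)
{
    using (var writer = new StreamWriter(filePath))
    using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        csv.WriteRecords(records);
    }
}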
Next is the crucial part. First, read the two files to memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
if (replace != null && replace.ImgID != line.ImgID)
{
line.ImgID = replace.ImgID;
}
}
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Now our expected result is that the first 6 lines of the old file have their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
An alternate way to do it, if for some reason you don't want to use CsvHelper, is to write a method that compares two lines of data and determines whether they're equal (ignoring the first column):
public static bool DataLinesAreEqual(string first, string second)
{
if (first == null || second == null) return false;
var xParts = first.Split(',');
var yParts = second.Split(',');
if (xParts.Length != 4 || yParts.Length != 4) return false;
return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);

LINQ or Lambda for two for loops

The code I have written works fine, this inquiry being purely for educational purposes. I want to know how others would do this better and cleaner. I especially hate the way I use two for loops to get data. There has to be a more efficient way.
I tried to do it with LINQ, but one of them is a class and the other one is just a string[], so I couldn't figure out how to use it.
I have a Document Name table in my SQL database and files in a content folder.
I have two lists: ListOfFileNamesSavedInTheDB and ListOfFileNamesInTheFolder.
Basically, I am getting all file names saved in the database and checking whether each one exists in the folder; if not, I delete the file name from the database.
var clientDocList = documentRepository.Documents.Where(c => c.ClientID == clientID).ToList();
if (Directory.Exists(directoryPath))
{
string[] fileList = Directory.GetFiles(directoryPath).Select(Path.GetFileName).ToArray();
foreach (var clientDoc in clientDocList)
{
bool fileNotExist = true;
foreach (var file in fileList)
{
if (clientDoc.DocFileName.Trim().ToUpper()==file.ToUpper().Trim())
{
fileNotExist = false;
break;
}
}
if (fileNotExist)
{
documentRepository.Delete(clientDoc);
}
}
}
I am not exactly sure how you want your code to work, but I believe you need something like this:
//string TextResult = "";
ClientDocList documentRepository = GetClientDocList();
var directoryPath = "";
var clientID = 1;
var clientDocList = documentRepository.Documents.Where(c => c.ClientID == clientID).ToList();
if (Directory.Exists(directoryPath) || true) // "|| true" only makes the sample run; keep just your condition in real code
{
string[] files = new string[] { "file1", "file5", "file6" };
List<string> fileList = files.Select(x => x.Trim().ToUpper()).ToList(); // I like working with lists, if you want an array it's ok
foreach (var clientDoc in clientDocList.Where(c => !fileList.Contains(c.DocFileName.Trim().ToUpper())))
{
//TextResult += $" {clientDoc.DocFileName} does not exists so you have to delete it from db";
documentRepository.Delete(clientDoc);
}
}
//Console.WriteLine(TextResult);
To be honest, I really don't like this line
fileList = files.Select(x => x.Trim().ToUpper()).ToList()
so I would suggest you add a helper function comparing the list of file names to the specific file name
public static bool TrimContains(List<string> names, string name)
{
return names.Any(x => x.Trim().Equals(name.Trim(), StringComparison.InvariantCultureIgnoreCase));
}
and your final code would become
List<string> fileList = new List<string>() { "file1", "file5", "file6" };
foreach (var clientDoc in clientDocList.Where(c => !TrimContains(fileList, c.DocFileName)))
{
//TextResult += $" {clientDoc.DocFileName} does not exists so you have to delete it from db";
documentRepository.Delete(clientDoc);
}
Instead of retrieving all documents from the database and doing the checking in memory, I suggest checking which documents don't exist in the folder in a single query:
if (Directory.Exists(directoryPath))
{
var fileList = Directory.GetFiles(directoryPath).Select(Path.GetFileName);
var clientDocList = documentRepository.Documents.Where(c => c.ClientID == clientID && !fileList.Contains(c.DocFileName.Trim())).ToList();
documentRepository.Documents.RemoveRange(clientDocList);
}
Note: this is just a sample to demonstrate the idea; it may have a syntax error somewhere since I don't have an IDE with me at the moment, but the idea is there.
This code is not only shorter but also more efficient, since it uses a single query to retrieve documents from the database. I assume the number of files in the folder is not too large for EF to convert the query to SQL.

how to efficiently Comparing two lists with 500k objects and strings

So I have a main directory with subfolders and around 500k images. I know a lot of these images do not exist in my database, and I want to know which ones so that I can delete them.
This is the code I have so far:
var listOfAdPictureNames = ImageDB.GetAllAdPictureNames();
var listWithFilesFromImageFolder = ImageDirSearch(adPicturesPath);
var result = listWithFilesFromImageFolder.Where(p => !listOfAdPictureNames.Any(q => p.FileName == q));
var differenceList = result.ToList();
listOfAdPictureNames is of type List<string>.
Here is the model that I'm returning from ImageDirSearch:
public class CheckNotUsedAdImagesModel
{
public List<ImageDirModel> ListWithUnusedAdImages { get; set; }
}
public class ImageDirModel
{
public string FileName { get; set; }
public string Path { get; set; }
}
and here is the recursive method to get all images from my folder.
private List<ImageDirModel> ImageDirSearch(string path)
{
string adPicturesPath = ConfigurationManager.AppSettings["AdPicturesPath"];
List<ImageDirModel> files = new List<ImageDirModel>();
try
{
foreach (string f in Directory.GetFiles(path))
{
var model = new ImageDirModel();
model.Path = f.ToLower();
model.FileName = Path.GetFileName(f.ToLower());
files.Add(model);
}
foreach (string d in Directory.GetDirectories(path))
{
files.AddRange(ImageDirSearch(d));
}
}
catch (System.Exception excpt)
{
throw new Exception(excpt.Message);
}
return files;
}
The problem I have is that this row:
var result = listWithFilesFromImageFolder.Where(p => !listOfAdPictureNames.Any(q => p.FileName == q));
takes over an hour to complete. I want to know if there is a better way to check whether there are images in my images folder that don't exist in my database.
Here is the method that get all the image names from my database layer:
public static List<string> GetAllAdPictureNames()
{
List<string> ListWithAllAdFileNames = new List<string>();
using (var db = new DatabaseLayer.DBEntities())
{
ListWithAllAdFileNames = db.ad_pictures.Select(b => b.filename.ToLower()).ToList();
}
if (ListWithAllAdFileNames.Count < 1)
return new List<string>();
return ListWithAllAdFileNames;
}
Perhaps Except is what you're looking for. Something like this:
var filesInFolderNotInDb = listWithFilesFromImageFolder.Select(p => p.FileName).Except(listOfAdPictureNames).ToList();
Should give you the files that exist in the folder but not in the database.
Instead of the search being repeated over both lists, it is better to sort the second list, "listOfAdPictureNames" (using any n*log(n) sort), and then check for existence with a binary search. That is the most efficient approach here; the other techniques, including the current one, are quadratic, since every file name is compared against every database entry.
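A sketch of that idea with the names from the question (sort once with an ordinal comparer, then BinarySearch each file name; both sides are already lower-cased by the posted code):
// sort the database names once
var sortedNames = listOfAdPictureNames.OrderBy(n => n, StringComparer.Ordinal).ToList();

// binary-search the sorted list for every file found on disk
var differenceList = listWithFilesFromImageFolder
    .Where(p => sortedNames.BinarySearch(p.FileName, StringComparer.Ordinal) < 0)
    .ToList();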
As I said in my comment, you seem to have recreated the FileInfo class, you don't need to do this, so your ImageDirSearch can become the following
private IEnumerable<string> ImageDirSearch(string path)
{
return Directory.EnumerateFiles(path, "*.jpg", SearchOption.TopDirectoryOnly);
}
There doesn't seem to be much gained by returning the whole file info when you only need the file name. Also, this only finds jpgs, but that can be changed.
The ToLower calls are quite expensive and a bit pointless, and so is the ToList when you are planning on querying again, so you can get rid of both and return an IEnumerable from GetAllAdPictureNames as well.
Then your comparison can use Equals and ignore case:
!listOfAdPictureNames.Any(q => p.Equals(q, StringComparison.InvariantCultureIgnoreCase));
One more thing that will probably help is removing items from the list of file names as they are found; this makes searching the list quicker each time one is removed, since there is less to iterate through.
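A sketch of that last idea, combined with the string-returning ImageDirSearch above (each match is removed so later lookups scan a shorter list):
var remainingNames = ImageDB.GetAllAdPictureNames().ToList();   // copy, since we remove from it
var notInDb = new List<string>();                               // files on disk missing from the database

foreach (var path in ImageDirSearch(adPicturesPath))            // the IEnumerable<string> version shown above
{
    var fileName = Path.GetFileName(path);
    int idx = remainingNames.FindIndex(n => n.Equals(fileName, StringComparison.InvariantCultureIgnoreCase));
    if (idx >= 0)
        remainingNames.RemoveAt(idx);                           // found: shrink the list for the next search
    else
        notInDb.Add(path);                                      // not found: candidate for deletion
}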
