I am working on a project in which I need to read an Excel file and validate its data.
For example, suppose the sheet has a column called "Date Of Birth"; I need to check that each value is a valid date, because a user might enter a number or just letters in that column. I can't ask for Excel validation to be added to the file, so there is no validation in the Excel file and users can put anything in any column.
The other problem is that the Excel headers are not constant; they can change over time.
Because of that, I can't create model classes for the Excel data.
So I use "DocumentFormat.OpenXML" to read the Excel file, and ExpandoObject to store the data set because, as I said, the headers might change.
I created a basic class to hold the basic info for a cell:
public class CellDetail
{
public string CellHeading { get; set; } = string.Empty;
public string CellValue { get; set; } = string.Empty;
public string CellReference { get; set; } = string.Empty;
}
"CellHeading" is the column name, "CellValue" is the value of the cell, and "CellReference" is the reference of the cell, for example B12 (column "B", row 12).
I was able to read the Excel sheet and create the data set. Finally, I created JSON from this data set:
private ExpandoObject ConvertCellToExpandoObject(SpreadsheetDocument spreadsheetDocument, Cell cell)
{
var cellDetail = _excelFileService.GetCellValue(spreadsheetDocument, cell);
dynamic item = new ExpandoObject();
item.Id = cellDetail.CellReference;
item.CellDetail = new CellDetail { CellHeading = cellDetail.CellHeading, CellValue = cellDetail.CellValue, CellReference = cellDetail.CellReference };
return item;
}
public string ExcelDataSetToJSON()
{
List<ExpandoObject> allCellsInOneRow = new List<ExpandoObject>();
List<List<ExpandoObject>> excelDataSet = new();
SpreadsheetDocument spreadsheetDocument;
var sheet = _excelFileService.GetSheetDataBySheetName(FILEPATH, SHEETNAME);
var rowList = sheet.Elements<Row>();
using (spreadsheetDocument = SpreadsheetDocument.Open(FILEPATH, false))
{
foreach (var row in rowList)
{
var cellList = row.Elements<Cell>().Take(6);
foreach (Cell cell in cellList)
{
var cellDetailInExpando = ConvertCellToExpandoObject(spreadsheetDocument, cell);
allCellsInOneRow.Add(cellDetailInExpando);
}
excelDataSet.Add(allCellsInOneRow);
allCellsInOneRow = new List<ExpandoObject>();
}
}
return Newtonsoft.Json.JsonConvert.SerializeObject(excelDataSet);
}
The "ExcelDataSetToJSON" method returns JSON: a list of rows, where each row is a list of objects with an "Id" and a "CellDetail".
Now I need to validate this JSON with JSON schema validation. I haven't created a JSON schema yet. I saw a NuGet package named "Json.NET Schema" that is used to validate JSON, and I need to implement something like that.
I mainly need to check these things in the JSON:
value should be a number,
value should be a string,
value should be within a minimum and maximum range,
value should be a valid date,
value should be one of the given options.
How do I implement my own custom schema and do these validations?
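One way to approach the checks listed above without committing to a particular schema library is a small rule table keyed by column heading. The following is only a sketch under stated assumptions: RuleType, ColumnRule and CellValidator are hypothetical names (they are not part of Json.NET Schema), dates are assumed to arrive as text in "yyyy-MM-dd" format, and columns without a rule are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical rule model; these names are illustrative, not part of Json.NET Schema.
public enum RuleType { Number, Text, Date, OneOf }

public class ColumnRule
{
    public RuleType Type { get; set; }
    public decimal? Min { get; set; }                        // Number only
    public decimal? Max { get; set; }                        // Number only
    public string DateFormat { get; set; } = "yyyy-MM-dd";   // assumed date format
    public string[] Options { get; set; } = new string[0];   // OneOf only
}

public static class CellValidator
{
    // Returns true when the value satisfies the rule; otherwise sets a message.
    public static bool IsValid(string value, ColumnRule rule, out string error)
    {
        error = string.Empty;
        switch (rule.Type)
        {
            case RuleType.Number:
                if (!decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal number))
                    error = "value should be a number";
                else if (rule.Min.HasValue && number < rule.Min.Value)
                    error = "value is below the minimum " + rule.Min.Value;
                else if (rule.Max.HasValue && number > rule.Max.Value)
                    error = "value is above the maximum " + rule.Max.Value;
                break;
            case RuleType.Text:
                if (string.IsNullOrWhiteSpace(value))
                    error = "value should be a non-empty string";
                break;
            case RuleType.Date:
                if (!DateTime.TryParseExact(value, rule.DateFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out _))
                    error = "value should be a date in format " + rule.DateFormat;
                break;
            case RuleType.OneOf:
                if (!rule.Options.Contains(value, StringComparer.OrdinalIgnoreCase))
                    error = "value should be one of: " + string.Join(", ", rule.Options);
                break;
        }
        return error.Length == 0;
    }

    // Validates (heading, value, reference) triples against rules keyed by column heading.
    public static List<string> ValidateCells(
        IEnumerable<(string Heading, string Value, string Reference)> cells,
        IDictionary<string, ColumnRule> rules)
    {
        var errors = new List<string>();
        foreach (var cell in cells)
        {
            // Columns without a rule are skipped, since headers may change over time.
            if (rules.TryGetValue(cell.Heading, out ColumnRule rule) &&
                !IsValid(cell.Value, rule, out string error))
                errors.Add(cell.Reference + " (" + cell.Heading + "): " + error);
        }
        return errors;
    }
}
```

Because the rules live in a dictionary rather than in model classes, they could be loaded from a configuration file and changed whenever the headers change. The cell reference in each message tells the user where the bad value is.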
I've run into an issue while parsing some CSV-like files. I know a way to fix it, but I'd like to confirm whether that's the appropriate way to do it.
The file structure
The file I'm trying to parse has a structure similar to .csv in that its values are separated by a delimiter (in my case |), but unlike the ones I've seen before, it also has a delimiter at the end of each line, e.g.:
Column1|Column2|Column3|
Row1Val1|Row1Val2|Row1Val3|
Row2Val1|Row2Val2|Row2Val3|
The issue
The problem arose when I wrote some unit tests to cover my service that wraps over the CsvHelper library. Apparently there is some issue when I provide the following configuration:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
Delimiter = "|",
HasHeaderRecord = true,
NewLine = "|\r\n"
};
With the above configuration, csvReader.GetRecords() returns no results. I believe that's because the parser first looks for columns and then for the end of the line, so it tries to parse an empty column without realizing that the | is actually part of the line ending.
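To illustrate the "empty column" idea without involving CsvHelper internals (which this sketch does not claim to reproduce), here is what a plain split on | does with a line that ends in the delimiter. TrailingDelimiterDemo is a hypothetical helper name.

```csharp
using System;

public static class TrailingDelimiterDemo
{
    // Splits one line on '|' with no CSV-aware handling at all.
    // A trailing '|' therefore yields one extra, empty field at the end.
    public static string[] NaiveSplit(string line)
    {
        return line.Split('|');
    }
}
```

For "Row1Val1|Row1Val2|Row1Val3|" this returns four fields, the last of which is an empty string. That matches the observation that the file reads fine when the trailing delimiter is treated as one more (empty) column.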
(I can paste the code for the getRecords call as well, but it's basically generic code taken from examples - the only difference is I'm using System.IO.Abstractions library for easier unit testing)
The attempts to solve the problem
If I remove the NewLine configuration value, the parser works fine when reading the file (even though each line ends with the delimiter character). Then, however, my "write CSV" tests break, since CsvHelper no longer adds the proper line endings to the file.
The question(s)
Is there any way I can configure CsvHelper to cover both cases with one configuration, or should I basically use two different configurations, depending on whether I'm writing to CSV or reading from it? This seems a little bit counter-intuitive for me, since it's basically the same format I'm trying to follow, but different configurations are expected?
You could manually write the empty column for each line and then you could keep the configuration the same for reading and writing.
void Main()
{
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
Delimiter = "|"
};
var records = new List<MyClass>
{
new MyClass {Column1 = "Row1Val1", Column2 = "Row1Val2", Column3 = "Row1Val3"},
new MyClass {Column1 = "Row2Val1", Column2 = "Row2Val2", Column3 = "Row2Val3"}
};
using (var writer = new StreamWriter("file.csv"))
using (var csv = new CsvWriter(writer, config))
{
csv.WriteHeader<MyClass>();
csv.WriteField(string.Empty);
foreach (var record in records)
{
csv.NextRecord();
csv.WriteRecord(record);
csv.WriteField(string.Empty);
}
}
using (var reader = new StreamReader("file.csv"))
using (var csv = new CsvReader(reader, config))
{
var importRecords = csv.GetRecords<MyClass>();
importRecords.Dump();
}
}
public class MyClass
{
public string Column1 { get; set; }
public string Column2 { get; set; }
public string Column3 { get; set; }
}
Hi, I have a text file that contains 3 columns, something like this:
contract1;pdf1;63
contract1;pdf2;5
contract1;pdf3;2
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf2;30
contract2;pdf3;5
contract2;pdf4;80
Now I want to write this information into another text file, ordered so that, within each contract, the records whose last column is "2" or "5" come first, something like this:
contract1;pdf3;2
contract1;pdf2;5
contract1;pdf1;63
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf3;5
contract2;pdf2;30
contract2;pdf4;80
How can I do this?
Thanks
You can use LINQ to group and sort the lines after reading, then put them back together:
var output = File.ReadAllLines(@"path-to-file")
.Select(s => s.Split(';'))
.GroupBy(s => s[0])
// "2" first, then "5"; OrderBy is stable, so all other lines keep their original order
.SelectMany(g => g.OrderBy(s => s[2] == "2" ? 0 : s[2] == "5" ? 1 : 2).Select(s => String.Join(";", s)));
Then just write them to a file.
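For completeness, a sketch of the whole round trip, including the write step. It assumes that "2" and "5" go first within each contract and that all remaining lines keep their original order, as in the expected output of the question. ContractSorter and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ContractSorter
{
    // Groups lines by the first column and moves "2" then "5" to the front of each group.
    // OrderBy is a stable sort, so the other lines stay in their original order.
    public static IEnumerable<string> Reorder(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(';'))
            .GroupBy(parts => parts[0])
            .SelectMany(group => group
                .OrderBy(parts => parts[2] == "2" ? 0 : parts[2] == "5" ? 1 : 2)
                .Select(parts => string.Join(";", parts)));
    }

    // Reads the input file, reorders the lines and writes them to the output file.
    public static void Rewrite(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Reorder(File.ReadAllLines(inputPath)));
    }
}
```

Keeping Reorder separate from the file access makes it easy to check the ordering with in-memory data before touching real files.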
I'm not going to write your program for you, but I would recommend this library for reading and writing delimited files:
https://joshclose.github.io/CsvHelper/getting-started/
When you new up the reader make sure to specify your semi-colon delimiter:
using (var reader = new StreamReader("path\\to\\input_file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = ";";
var records = csv.GetRecords<Row>();
// manipulate the data as needed here
}
Your "Row" class (choose a more appropriate name for clarity) specifies the schema of the flat file. It sounds like you don't have headers? If so, you can specify the Index of each property (indexes are zero-based) and set HasHeaderRecord to false in the configuration.
public class Row
{
[Index(1)]
public string MyValue1 { get; set; }
[Index(2)]
public string MyValue2 { get; set; }
}
After reading the data in, you can manipulate it as needed. If the output format is different from the input format, you should convert the input class into an output class. You can use the Automapper library if you would like. However, for a simple project I would suggest to just manually convert the input class into the output class.
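As a rough illustration of the manual conversion suggested above, here is a minimal sketch. InputRow, OutputRow and RowMapper are hypothetical types invented for this example; the real property names depend on your files.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical input shape: one property per column of the source file.
public class InputRow
{
    public string Contract { get; set; }
    public string Document { get; set; }
    public string Code { get; set; }
}

// Hypothetical output shape with a different layout.
public class OutputRow
{
    public string Key { get; set; }
    public string Code { get; set; }
}

public static class RowMapper
{
    // Converts a single input record into the output shape.
    public static OutputRow ToOutput(InputRow row)
    {
        return new OutputRow { Key = row.Contract + "/" + row.Document, Code = row.Code };
    }

    // Converts a whole sequence; the result can be handed to the writer.
    public static List<OutputRow> MapAll(IEnumerable<InputRow> rows)
    {
        return rows.Select(r => ToOutput(r)).ToList();
    }
}
```

For two or three properties this is shorter and easier to debug than configuring a mapping library.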
Lastly, write the data back out:
using (var writer = new StreamWriter("path\\to\\output_file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
I know there are similar questions, but I was not able to find the answer to mine. I have two CSV files. Both contain image metadata for the same images, but the image IDs in the first file are outdated, so I need to take the IDs from the second file and replace the outdated IDs with the new ones. I was thinking of comparing the Longitude, Latitude, and Altitude values of each image, and where they match in both files, taking the image ID from the second file. The IDs would be used in a new object. The lines are in a different order in the two files, and the first file contains more lines than the second one.
The files structure looks as follows:
First file:
ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766
Second file:
ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674
How can this be done in C#? Is there a handy library to work with?
I would use the CsvHelper library for CSV reading and writing, as it is a nice, complete library. For this, you declare a class to hold your data whose property names match your CSV file's column names.
public class ImageData
{
public int ImgID { get; set; }
public double Longitude { get; set; }
public double Latitude { get; set; }
public double Altitude { get; set; }
}
Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:
public class ImageDataComparer : IEqualityComparer<ImageData>
{
public bool Equals(ImageData x, ImageData y)
{
return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
}
public int GetHashCode(ImageData obj)
{
unchecked
{
int hash = (int)2166136261;
hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
return hash;
}
}
}
A simple explanation: we implement the Equals() method of IEqualityComparer and dictate that two instances of the ImageData class are equal if the three property values match. I will show the usage in a bit.
The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:
public static List<ImageData> ReadCSVData(string filePath)
{
List<ImageData> records;
using (var reader = new StreamReader(filePath))
{
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.HasHeaderRecord = true;
records = csv.GetRecords<ImageData>().ToList();
}
}
return records;
}
public static void WriteCSVData(string filePath, List<ImageData> records)
{
using (var writer = new StreamWriter(filePath))
{
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
csv.WriteRecords(records);
}
}
}
You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.
Next is the crucial part. First, read the two files to memory using the methods we just defined.
var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));
Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.
foreach (var line in oldData)
{
var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
if (replace != null && replace.ImgID != line.ImgID)
{
line.ImgID = replace.ImgID;
}
}
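For larger files, scanning the whole new list for every old line can get slow. A possible alternative, sketched below under the assumption that coordinates must match exactly (as in the comparer), is to index the new data by its coordinates first. The ImageData class is repeated only to keep the sketch self-contained; IdUpdater is a hypothetical name.

```csharp
using System.Collections.Generic;

// Same shape as the ImageData class above, repeated so the sketch compiles on its own.
public class ImageData
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}

public static class IdUpdater
{
    // Replaces the ID of every old record whose coordinates also occur in the new data.
    public static void UpdateIds(List<ImageData> oldData, List<ImageData> newData)
    {
        // Value tuples compare and hash by their components, so they work as dictionary keys.
        var idByPosition = new Dictionary<(double, double, double), int>();
        foreach (var item in newData)
        {
            var key = (item.Longitude, item.Latitude, item.Altitude);
            // Keep the first occurrence, mirroring FirstOrDefault in the loop above.
            if (!idByPosition.ContainsKey(key))
                idByPosition.Add(key, item.ImgID);
        }

        foreach (var line in oldData)
        {
            if (idByPosition.TryGetValue((line.Longitude, line.Latitude, line.Altitude), out int id))
                line.ImgID = id;
        }
    }
}
```

Records without a match keep their old ID, which is the same behavior as the loop above.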
Next, simply overwrite the old data file.
WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);
Results
I'm using a simplified version of your data to easily verify our results.
Old Data
ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
New Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
Now the expected result is that the first 6 lines of the old file have their IDs updated, and that's what we get:
Updated Old Data
ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
An alternative, if for some reason you don't want to use CsvHelper, is to write a method that compares two lines of data and determines whether they're equal (ignoring the first column):
public static bool DataLinesAreEqual(string first, string second)
{
if (first == null || second == null) return false;
var xParts = first.Split(',');
var yParts = second.Split(',');
if (xParts.Length != 4 || yParts.Length != 4) return false;
return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}
Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:
var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";
// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);
// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);
// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);
I'm trying to learn some C# here. My goal is to create and write to multiple custom files whose names vary based on part of the string to be written. Below are some examples:
Let's say strings to be written are basically rows of a csv file:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G
2019-10-29 16:13:11;;18;0;;3;false;false;true;
As you may notice, the first field of each string is a date, and that is the key field used to choose the name of the file to write to.
The first two strings have the same date, so both will be written to a single file; the third goes into a second file since it has a different date.
Expected Result:
First File:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;
Second File:
2019-10-29 16:13:11;;18;0;;3;false;false;true;
Now I have multiple rows like those, and I'd like to print them on different files based on their first value.
I managed to create a class which might represent each row:
class Value {
public DateTime date = DateTime.Now;
public decimal cod = 0;
public decimal quantity = 0;
public decimal price = 0;
//Other irrelevant fields
}
And I also tried to develop a method to write a single Value on given File:
private static void WriteValue(Value content, string folder, string fileName) {
using(StreamWriter writer = new StreamWriter(Path.Combine(folder, fileName), true, Encoding.ASCII)) {
// Field names follow the Value class above (date, cod, quantity)
writer.Write(content.date.ToString("yyyyMMdd"));
writer.Write("0000");
writer.Write("I");
writer.Write("C");
writer.Write(content.cod.ToString().PadLeft(14, '0'));
writer.Write(Convert.ToInt64(content.quantity * 100).ToString().PadLeft(8, '0'));
writer.WriteLine();
}
}
And a method to write the Values into files:
static void WriteValues(List<Value> fileContent) {
//Once I got all Values of File in a List of Values, I try to write them in files
if(fileContent.Count > 0) {
foreach(Value riga in fileContent) {
//Temp Dates, used to compare previous Date in order to know if I have to write Value in new File or it can be written on same File
string dataTemp = riga.date.ToString("yyyy-MM-dd");
string lastData = string.Empty;
string FileName = "ordinivoa_999999." + DateTime.Now.ToString("yyMMddHHmmssfff");
//If lastData is Empty we are writing first value
if (string.IsNullOrEmpty(lastData)) {
WriteValue(riga, toLinfaFolder, FileName);
lastData = dataTemp;
}
//Else if lastData is equal as last managed date we write on same file
else if (lastData == dataTemp) {
WriteValue(riga, toLinfaFolder, FileName);
}
else {
//Else current date of Value is new, so we write it in another file
string newFileName = "ordinivoa_999999." + DateTime.Now.AddMilliseconds(1).ToString("yyMMddHHmmssfff");
WriteValue(riga, toLinfaFolder, newFileName);
lastData = dataTemp;
}
}
}
}
My issue is that the method above behaves strangely: it writes the first values with equal dates to a single file, which is good, but it writes all the other values to one file as well, even if they have different dates.
How can I make sure that values are written to the same file only if they have the same date?
You can group equal dates easily with a LINQ query:
private static void WriteValues(List<Value> fileContent)
{
var dateGroups = fileContent
.GroupBy(v => $"ordinivoa_999999.{v.date:yyMMddHHmmssfff}");
foreach (var group in dateGroups) {
string path = Path.Combine(toLinfaFolder, group.Key);
using (var writer = new StreamWriter(path, true, Encoding.ASCII)) {
foreach (Value item in group) {
//TODO: write item to file
writer.WriteLine(...
}
}
}
}
Since a DateTime stores values in units of one ten-millionth of a second, two dates that look equal once formatted might still be different. So I suggest grouping on the file name to avoid this effect. I used string interpolation to create and format the file name.
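A small sketch of that effect: two DateTime values that differ by a single tick (100 nanoseconds) are not equal, yet they produce the same text with the "yyMMddHHmmssfff" format, because the format stops at milliseconds. TickDemo is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class TickDemo
{
    public const string Format = "yyMMddHHmmssfff";

    // True only when both values are exactly the same number of ticks.
    public static bool SameInstant(DateTime a, DateTime b)
    {
        return a == b;
    }

    // True when both values format to the same file name stamp.
    public static bool SameFileStamp(DateTime a, DateTime b)
    {
        return a.ToString(Format, CultureInfo.InvariantCulture)
            == b.ToString(Format, CultureInfo.InvariantCulture);
    }
}
```

With a = new DateTime(2019, 10, 28, 16, 14, 14) and b = a.AddTicks(1), SameInstant returns false while SameFileStamp returns true, which is why grouping on the formatted string is the safer key.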
Don't open and close the file for each text line.
At the top of your code file you need a
using System.Linq;
You are on the right path declaring a class, but you're also doing a lot of unnecessary work. Using LINQ, this can be simplified a great deal.
First I define a class. Since all you want to do is write each record, I would use a DateTime field and a string field for the entire raw record.
class MyRecordOfSomeType
{
public DateTime Date { get; set; }
public string RawData { get; set; }
}
The DateTime field will come in handy when you're doing LINQ.
Now we iterate through your data, split each line on ;, and build the list of class instances.
var data = new List<string>()
{
"2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;",
"2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G",
"2019-10-29 16:13:11;;18;0;;3;false;false;true;"
};
var records = new List<MyRecordOfSomeType>();
foreach (var item in data)
{
var parts = item.Split(';');
DateTime.TryParse(parts[0], out DateTime result);
var rec = new MyRecordOfSomeType() { Date = result, RawData = item };
records.Add(rec);
}
Then we group by date. Note that it's important to group by the Date component of the DateTime structure, otherwise it will consider the Time component as well and you'll have more files than you need.
var groups = records.GroupBy(x => x.Date.Date);
Finally, iterate your groups, and write contents of each group to a new file.
foreach (var group in groups)
{
var fileName = string.Format("ordinivoa_999999_{0}.csv", group.Key.ToString("yyMMddHHmmssfff"));
File.WriteAllLines(fileName, group.Select(x => x.RawData));
}