Validate JSON with a custom JSON schema using C#

I am working on a project in which I need to read an Excel file and validate the data set.
For example, say there is a column called "Date Of Birth" in the Excel sheet. I need to check whether each value in it is a valid date, because users might enter a number or just letters into that column. I can't ask for Excel data validation to be added to the file, so there is no validation in the Excel file and users can enter anything in any column.
The other problem is that the Excel headers are not constant; they can change over time.
Because of that, I can't create model classes for the Excel file.
So I use "DocumentFormat.OpenXml" to read the Excel file, and I use ExpandoObject to store the data set because, as I said, the headers might change.
I created a simple class to store the basic details of each cell:
public class CellDetail
{
    public string CellHeading { get; set; } = string.Empty;
    public string CellValue { get; set; } = string.Empty;
    public string CellReference { get; set; } = string.Empty;
}
"CellHeading" is the column name, "CellValue" is the value of the cell, and "CellReference" is the cell's reference; for example, B12 means column "B", row 12.
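To make the "CellReference" format concrete, here is a small sketch that splits a reference such as B12 into its column letters and row number, and converts the letters to a 1-based column index. This can be handy when reporting which cell failed validation. The class and method names are hypothetical (they are not part of DocumentFormat.OpenXml), and it assumes plain A1-style references without "$" signs or a sheet name.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of DocumentFormat.OpenXml.
// Assumes a plain A1-style reference such as "B12" (no "$" signs, no sheet name).
public static class CellReferenceHelper
{
    // Splits "B12" into its column letters ("B") and its row number (12).
    public static (string Column, int Row) Parse(string cellReference)
    {
        string column = new string(cellReference.TakeWhile(c => char.IsLetter(c)).ToArray());
        int row = int.Parse(cellReference.Substring(column.Length));
        return (column.ToUpperInvariant(), row);
    }

    // Converts column letters to a 1-based index: A = 1, Z = 26, AA = 27.
    public static int ColumnIndex(string column)
    {
        int index = 0;
        foreach (char ch in column.ToUpperInvariant())
        {
            // Excel column names are a bijective base-26 number
            index = index * 26 + (ch - 'A' + 1);
        }
        return index;
    }
}
```

For example, `CellReferenceHelper.Parse("B12")` gives column "B" and row 12, and `CellReferenceHelper.ColumnIndex("AA")` gives 27.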
I was able to read the Excel sheet and create the data set. Finally, I created JSON from this data set:
private ExpandoObject ConvertCellToExpandoObject(SpreadsheetDocument spreadsheetDocument, Cell cell)
{
    var cellDetail = _excelFileService.GetCellValue(spreadsheetDocument, cell);
    dynamic item = new ExpandoObject();
    item.Id = cellDetail.CellReference;
    item.CellDetail = new CellDetail
    {
        CellHeading = cellDetail.CellHeading,
        CellValue = cellDetail.CellValue,
        CellReference = cellDetail.CellReference
    };
    return item;
}
public string ExcelDataSetToJSON()
{
    var excelDataSet = new List<List<ExpandoObject>>();
    var sheet = _excelFileService.GetSheetDataBySheetName(FILEPATH, SHEETNAME);
    var rowList = sheet.Elements<Row>();
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(FILEPATH, false))
    {
        foreach (var row in rowList)
        {
            // A new list for every row; only the first 6 cells of each row are read
            var allCellsInOneRow = new List<ExpandoObject>();
            foreach (Cell cell in row.Elements<Cell>().Take(6))
            {
                var cellDetailInExpando = ConvertCellToExpandoObject(spreadsheetDocument, cell);
                allCellsInOneRow.Add(cellDetailInExpando);
            }
            excelDataSet.Add(allCellsInOneRow);
        }
    }
    return Newtonsoft.Json.JsonConvert.SerializeObject(excelDataSet);
}
The "ExcelDataSetToJSON" method returns a JSON string: an array of rows, where each row is an array of objects with an "Id" property and a "CellDetail" property.
So now I need to validate this JSON with JSON schema validation. I haven't created the JSON schema yet. I saw a NuGet package named "Json.NET Schema" that is used to validate JSON, and I need to implement something like that.
These are the main things I need to check in the JSON:
the value should be a number
the value should be a string
the value should be within a minimum and maximum range
the value should be a valid date
the value should be one of the given options
How do I implement my own custom schema and perform these validations?
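The checks listed above can also be expressed without a JSON schema engine, as one rule per column applied directly to each CellDetail value. This is a swapped-in plain C# approach, not Json.NET Schema, and a minimal sketch only: ColumnRule, ValueKind and CellValidator are hypothetical names, and the rules are assumed to be keyed by column heading so they can be loaded at runtime when the headers change.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical rule model: one ColumnRule per column heading.
public enum ValueKind { String, Number, Date }

public class ColumnRule
{
    public ValueKind Kind { get; set; } = ValueKind.String;
    public decimal? Minimum { get; set; }        // checked only when Kind == Number
    public decimal? Maximum { get; set; }        // checked only when Kind == Number
    public string[] AllowedValues { get; set; }  // null means "any value is allowed"
    public string DateFormat { get; set; }       // e.g. "yyyy-MM-dd"; null accepts any parseable date
}

public static class CellValidator
{
    // Checks one cell value against its rule and returns one message per failed
    // check; an empty list means the value is valid. Cell values are read as
    // text, so a String rule only applies the AllowedValues check.
    public static List<string> Validate(string cellReference, string value, ColumnRule rule)
    {
        var errors = new List<string>();

        if (rule.Kind == ValueKind.Number)
        {
            if (!decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal number))
            {
                errors.Add($"{cellReference}: '{value}' is not a number");
            }
            else
            {
                if (rule.Minimum.HasValue && number < rule.Minimum.Value)
                    errors.Add($"{cellReference}: {value} is below the minimum {rule.Minimum}");
                if (rule.Maximum.HasValue && number > rule.Maximum.Value)
                    errors.Add($"{cellReference}: {value} is above the maximum {rule.Maximum}");
            }
        }
        else if (rule.Kind == ValueKind.Date)
        {
            bool isDate = rule.DateFormat == null
                ? DateTime.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)
                : DateTime.TryParseExact(value, rule.DateFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out _);
            if (!isDate)
                errors.Add($"{cellReference}: '{value}' is not a valid date");
        }

        // "value should be one of the given options" (case-insensitive here, an assumption)
        if (rule.AllowedValues != null && !rule.AllowedValues.Contains(value, StringComparer.OrdinalIgnoreCase))
            errors.Add($"{cellReference}: '{value}' is not one of the allowed options");

        return errors;
    }
}
```

In use, you would keep a `Dictionary<string, ColumnRule>` keyed by CellHeading (created with `StringComparer.OrdinalIgnoreCase`, since the headers vary), look up each cell's rule and collect the messages. Excel stores dates as serial numbers, so depending on how GetCellValue reads date cells you may need to convert them first with `DateTime.FromOADate` before the date check.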

Related

Is there a way to filter a CSV file for data validation without for loops? (LumenWorks CsvReader)

I want to filter a CSV file and perform data validation on the filtered data. I imagine using for loops, but the file has 2 million cells and that would take a long time. I am using the LumenWorks CsvReader to access the file from C#.
I found the method csvfile.Where<> but I have no idea what to put in the parameters. Sorry, I am still new to coding.
[EDIT] This is my code for loading the file. Thanks for all the help!
// Create a C# DataTable from the CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);
// Get the headers from the CSV data
string[] headers = csvReader.GetFieldHeaders(); // this method gets the headers of the CSV file
// This is where I would want to use the Where method, or some other way to filter the data
IEnumerable<int> filteredData = Enumerable.Empty<int>();
// I can access the rows and columns with this
var firstCell = csvTable.Rows[0][0];
var firstColumn = csvTable.Columns[0];
// After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData)
{
    dataToValidate += data;
}
if (dataToValidate == 123)
{
    // data is validated
}
I would start by reading the documentation for the package you are using:
https://github.com/phatcher/CsvReader
https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
To answer the filtering question specifically, so that the result only contains the data you are searching for, consider the following:
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
    string searchTerm = "foo";
    while (csv.ReadNextRecord())
    {
        // Keep only the fields of this record that contain the search term
        var row = new List<string>();
        for (int i = 0; i < csv.FieldCount; i++)
        {
            if (csv[i].Contains(searchTerm))
            {
                row.Add(csv[i]);
            }
        }
        filteredData.Add(row);
    }
}
This gives you a list of lists of strings that you can enumerate over to do your validation:
int dataToValidate = 0;
foreach (var row in filteredData)
{
    foreach (var data in row)
    {
        // do the validation here, e.g. dataToValidate += int.Parse(data);
    }
}
--- Old Answer ---
Without seeing the code you are using to load the file, it is difficult to give you a full answer; also, ~2 million cells may be slow no matter what.
The .Where method comes from System.Linq:
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0
A simple example using .Where:
// Read the file and return a list of records that match the Where clause
public List<CsvFileStructure> ReadCSV()
{
    List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
        .Select(line => line.Split(','))
        // tokens[x], where x is the column number; assumes Id is column 0 and Value is column 1
        .Select(tokens => new CsvFileStructure { Id = long.Parse(tokens[0]), Value = tokens[1] })
        // Where filters based on whatever you are looking for in the CSV
        .Where(csvFileStructure => csvFileStructure.Id == 1)
        .ToList();
    return data;
}
// Map of your data structure
public class CsvFileStructure
{
    public long Id { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}
Modified from this answer:
https://stackoverflow.com/a/10332737/7366061
There is no CsvReader.Where method. Where is part of LINQ in C#. The link below shows an example of computing column values in a CSV file using LINQ:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
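Note that Where does not remove the work: LINQ still visits every row, it just hides the loop. A minimal sketch of the filter-then-sum the question describes, assuming a plain comma-separated file with a header row and no quoted fields containing commas; SumWhere and the column positions are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvLinqFilter
{
    // Sums the integers in column sumColumn for every data row whose column
    // filterColumn equals filterValue. Assumes a header row, a plain comma
    // delimiter and no quoted fields containing commas.
    public static long SumWhere(IEnumerable<string> lines, int filterColumn, string filterValue, int sumColumn)
    {
        return lines
            .Skip(1)                                          // skip the header row
            .Select(line => line.Split(','))                  // split each line into fields
            .Where(fields => fields[filterColumn] == filterValue)
            .Sum(fields => long.Parse(fields[sumColumn]));    // Where/Sum still visit every row
    }
}
```

Because the method takes an `IEnumerable<string>`, you can pass `File.ReadLines(path)`, which streams the file line by line instead of loading it all into memory, e.g. `CsvLinqFilter.SumWhere(File.ReadLines(path), 1, "a", 2) == 123`.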

C#: check which elements in one CSV are not in another CSV, then write them to a third CSV

My task is to check which elements of a column in one CSV are not included in a column of the other CSV. Both CSVs have a country column, and the task is to find the countries that are in the first CSV but not in the second.
I guess I have to solve it with lists after I read the strings from the two CSVs, but I don't know how to check which items in the first list are not in the other list and then put them into a third list.
There are many ways to achieve this. For many real-world CSV applications it is helpful to read the CSV input into a typed in-memory store, and there are standard libraries that can assist with this, like CsvHelper, as explained in this canonical post: Parsing CSV files in C#, with header
However, for this simple requirement we only need to parse the Country values from the master list, in this case the second CSV. We don't need to manage, validate or parse any of the other fields in the CSVs.
Build a list of the unique Country values in the second CSV
Iterate over the first CSV
Get the Country value of each row
Check it against the list of countries from the second CSV
Write the row to the third CSV if the country was not found
You can test the following code on .NET Fiddle.
NOTE: this code uses StringWriter and StringReader because they share the TextWriter and TextReader base classes with the file readers and writers in the System.IO namespace, which lets us remove the complexity associated with file access for this simple requirement.
string inputcsv = @"Id,Field1,Field2,Country,Field3
1,one,two,Australia,three
2,one,two,New Zealand,three
3,one,two,Indonesia,three
4,one,two,China,three
5,one,two,Japan,three";
string masterCsv = @"Field1,Country,Field2
one,Indonesia,...
one,China,...
one,Japan,...";
string errorCsv = "";
// For all rows in inputCsv where the country value is not listed in masterCsv,
// write the row to errorCsv
// Step 1: Build a list of unique Country values
bool csvHasHeader = true;
int countryIndexInMaster = 1;
char delimiter = ',';
// A HashSet keeps the values unique and makes each lookup fast
var countries = new HashSet<string>();
using (var masterReader = new System.IO.StringReader(masterCsv))
{
    string line = null;
    if (csvHasHeader)
    {
        line = masterReader.ReadLine();
        // an example of how to find the column index from first principles
        if (line != null)
            countryIndexInMaster = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
    }
    while ((line = masterReader.ReadLine()) != null)
    {
        string country = line.Split(delimiter)[countryIndexInMaster].Trim('"');
        countries.Add(country); // Add ignores values that are already in the set
    }
}
// Step 2: Read the input CSV; if the country is not in the master list "countries", write the row to errorCsv
int countryIndexInInput = 3;
csvHasHeader = true;
var outputStringBuilder = new System.Text.StringBuilder();
using (var outputWriter = new System.IO.StringWriter(outputStringBuilder))
using (var inputReader = new System.IO.StringReader(inputcsv))
{
    string line = null;
    if (csvHasHeader)
    {
        line = inputReader.ReadLine();
        if (line != null)
        {
            countryIndexInInput = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
            outputWriter.WriteLine(line);
        }
    }
    while ((line = inputReader.ReadLine()) != null)
    {
        string country = line.Split(delimiter)[countryIndexInInput].Trim('"');
        if (!countries.Contains(country))
        {
            outputWriter.WriteLine(line);
        }
    }
    outputWriter.Flush();
    errorCsv = outputWriter.ToString();
}
// dump the output to the console
Console.WriteLine(errorCsv);
Since you mention solving it with lists, I assume you can load those values from the CSVs into lists, so let's start with:
List<string> countriesIn1st = LoadDataFrom1stCsv();
List<string> countriesIn2nd = LoadDataFrom2ndCsv();
Then you can easily solve it with LINQ:
List<string> countriesNotIn2nd = countriesIn1st.Where(country => !countriesIn2nd.Contains(country)).ToList();
Now you have your third list, with the countries that are in the first list but not in the second. You can save it.
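LINQ also has a set-difference operator, Enumerable.Except, which does the same job while building a hash set from the second sequence, so it avoids scanning the whole second list with Contains for every element. Unlike the Where/Contains version, Except also drops duplicates from the result. A minimal sketch; CountryDiff is a hypothetical name, and the trimmed, case-insensitive comparison is an assumption you may not want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountryDiff
{
    // Returns the distinct countries that appear in `first` but not in `second`,
    // in the order they first appear in `first`. Values are trimmed and compared
    // case-insensitively (an assumption; drop the comparer for an exact match).
    public static List<string> CountriesMissingFromSecond(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first
            .Select(c => c.Trim())
            .Except(second.Select(c => c.Trim()), StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

The result can then be written out with, for example, `File.WriteAllLines(thirdCsvPath, missing)`, where `thirdCsvPath` is whatever output path you choose.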

C# ExcelWorksheet cell value is read incorrectly

I am parsing a CSV file using C# and ExcelWorksheet.
I have a cell that contains an integer: 3020191002155959391100
When I read the cell using
var value = sheet.Cells[rowNumber, columnNumber.Column].Value;
the value is 3.0201910021559592E+21.
When I read the cell using sheet.Cells[rowNumber, columnNumber.Column].Text;
the value is 3020191002155960000000.
How do I prevent the rounding?
The maximum value of an int in C# is
int.MaxValue: 2,147,483,647
Source: https://www.dotnetperls.com/int-maxvalue
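Therefore your number is too big to be read as an int. It is even larger than long.MaxValue (9,223,372,036,854,775,807), and the value you get back, 3.0201910021559592E+21, is a double, which only keeps about 15 to 17 significant digits; that is where the rounding comes from.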
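A small sketch using only the base class library shows the difference: parsing the same 22-digit text as a double loses the trailing digits, while decimal (28 to 29 significant digits) and System.Numerics.BigInteger keep all of them. LongNumberDemo is a hypothetical name; reading the column as a string avoids the conversion entirely.

```csharp
using System.Globalization;
using System.Numerics;

public static class LongNumberDemo
{
    // The 22-digit value from the question.
    public const string Raw = "3020191002155959391100";

    // double has a 53-bit significand (about 15-17 significant decimal digits),
    // so the trailing digits of Raw cannot be stored exactly.
    public static double AsDouble() => double.Parse(Raw, CultureInfo.InvariantCulture);

    // decimal keeps 28-29 significant digits, which is enough for this value.
    public static decimal AsDecimal() => decimal.Parse(Raw, CultureInfo.InvariantCulture);

    // BigInteger has no fixed size limit.
    public static BigInteger AsBigInteger() => BigInteger.Parse(Raw, CultureInfo.InvariantCulture);
}
```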
However, from your comments it appears that you're using an Excel reader to read a CSV file, which is the wrong tool for the job. Use a CSV parser such as CsvHelper (https://joshclose.github.io/CsvHelper/), which will make your life easier.
Here's an example of how to read such long numbers as strings using CsvHelper.
First, I'll create a class to match your CSV file. I created a dummy CSV file that looks like the following: simply three long numbers in a row.
3020191002155959391100,3020191002155959391101,3020191002155959391102
Now I create a class like so:
class CSVRecord
{
    public string Data1 { get; set; }
    public string Data2 { get; set; }
    public string Data3 { get; set; }
}
Then you can read all the records in one go. Note the csv.Configuration settings; depending on the complexity of the file you read, you'll have to change them. I advise you to read the documentation and examples at the link provided.
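var file = @"data.csv";
var records = new List<CSVRecord>();
using (var reader = new StreamReader(file))
{
    // Older CsvHelper API. Newer versions pass the configuration to the constructor instead:
    // new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture) { HasHeaderRecord = false })
    using (var csv = new CsvReader(reader))
    {
        csv.Configuration.HasHeaderRecord = false;
        records = csv.GetRecords<CSVRecord>().ToList();
    }
}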
I needed to set the specific column types to string before processing each cell:
columnDataTypes.Add(eDataTypes.String);

LinqToExcel Header Mapping

I am using LinqToExcel to get the content out of an Excel file.
With a header mapping class like the following, I can map the properties of my class to columns in the Excel file:
public class Transaction
{
    [ExcelColumn("Trans Id")]
    public string TradeNumber { get; set; }
    [ExcelColumn("Trans Version")]
    public string TransVersion { get; set; }
}
However, sometimes the incoming file has a different header. For example, sometimes it has the header "Trans Id" and sometimes "Trans ID", and the program cannot convert the column when the header is "Trans ID".
Is there a way to make LinqToExcel compare column name in case insensitive mode?
Or is there a place where I can override LinqToExcel's comparison method?
Thanks!
I tried to use the
public void AddTransformation<TSheetData>(Expression<Func<TSheetData, object>> property, Func<string, object> transformation);
part of the library, but that only deals with the value, not the column name.
I'm not sure this is the best solution, but it worked for me. I tried to find similar workarounds; if you're unable to control the column names, you can look them up first and map them dynamically:
// Get the header information:
// the worksheet names and
// the list of columns (you can narrow this down if you always know their placement)
ExcelQueryFactory HeaderInfo = new ExcelQueryFactory("FILE NAME.xlsx");
List<string> worksheetName = HeaderInfo.GetWorksheetNames().ToList();
IEnumerable<string> columnNames = HeaderInfo.GetColumnNames(worksheetName[0]);
// Find the headers you're looking for, keeping the unedited Excel column name
string TradeNumber_HeaderName = columnNames.FirstOrDefault(a => a.ToUpper().Trim() == "TRANS ID" || a.ToUpper().Trim() == "TRANSID") ?? "Trans ID";
string TransVersion_HeaderName = columnNames.FirstOrDefault(a => a.ToUpper().Trim() == "TRANS VERSION") ?? "Trans Version";
// Whatever your new connection is now, it will use those column names dynamically
ExcelQueryFactory ExcelConn = ...
ExcelConn.AddMapping<Transaction>(x => x.TradeNumber, TradeNumber_HeaderName);
ExcelConn.AddMapping<Transaction>(x => x.TransVersion, TransVersion_HeaderName);
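The header lookup in that answer can be factored into a helper that ignores both case and whitespace, so "Trans Id", "Trans ID" and "TransID" all resolve to whatever header the file actually contains. A sketch in plain C#; HeaderResolver is a hypothetical name and not part of LinqToExcel.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HeaderResolver
{
    // Removes all whitespace and upper-cases the header, so "Trans Id",
    // "Trans ID", " trans id " and "TransID" all normalize to "TRANSID".
    private static string Normalize(string header) =>
        new string(header.Where(ch => !char.IsWhiteSpace(ch)).ToArray()).ToUpperInvariant();

    // Returns the header exactly as spelled in the file if one matches `expected`
    // after normalization; otherwise falls back to `expected` itself.
    public static string Resolve(IEnumerable<string> actualHeaders, string expected)
    {
        string key = Normalize(expected);
        return actualHeaders.FirstOrDefault(h => Normalize(h) == key) ?? expected;
    }
}
```

With it, each mapping becomes a single line, e.g. `ExcelConn.AddMapping<Transaction>(x => x.TradeNumber, HeaderResolver.Resolve(columnNames, "Trans Id"));`.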
You could define the mapping yourself with:
var excelFile = new ExcelQueryFactory(pathToExcelFile);
excelFile.AddMapping<Transaction>(x => x.TradeNumber, "Trans ID");
This is just a suggestion; you would have to create a mapping for each scenario... ugh.
Let us know if AddMapping works for you.

CSV to C# FileHelpers

I want to use FileHelpers to read an extremely basic CSV file into C#.
I have a model that looks like this:
[DelimitedRecord(",")]
[IgnoreFirst()]
public class NavigationButton
{
    public int ID;
    public string Text;
    public string Path;
}
The CSV file looks like this:
I want to read the appropriate lines and create a new NavigationButton for each record read from the CSV file. I have read them into a DataTable using:
public DataTable GetNavigationButtonNames()
{
    var filename = @"C:\Desktop\NavigationButtons.csv";
    var engine = new FileHelperEngine(typeof(NavigationButton));
    return engine.ReadFileAsDT(filename);
}
but now I cannot loop through the DataTable, as it doesn't implement IEnumerable. I would have created a new NavigationButton in a foreach loop and added the appropriate rows, but this can't be done the way I have started out.
How can I change this method so that I can loop through the object I read into from the CSV file and create a new button for each row in the CSV file?
Use the generic version, FileHelperEngine<T>, instead; then you can do:
var filename = @"C:\Desktop\NavigationButtons.csv";
var engine = new FileHelperEngine<NavigationButton>();
// ReadFile returns an array of NavigationButton
var records = engine.ReadFile(filename);
// then you can do your foreach, or just get the button names with LINQ
return records.Select(x => x.Text);
The documentation for ReadFile() is here.
Or there is also ReadFileAsList() if you prefer.
How about this:
List<NavigationButton> buttons = new List<NavigationButton>();
DataTable dt = GetNavigationButtonNames();
foreach (DataRow dr in dt.Rows)
{
    buttons.Add(new NavigationButton
    {
        ID = Convert.ToInt32(dr["ID"]),
        Text = dr["Text"].ToString(),
        Path = dr["Path"].ToString()
    });
}
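DataTable itself is not enumerable, but its Rows collection is, so the same mapping can also be written with LINQ by calling Cast<DataRow>() on it. A self-contained sketch: the FileHelpers attributes are left off the model so it compiles without the package, and NavigationButtonMapper.FromDataTable is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Same shape as the question's model (FileHelpers attributes omitted).
public class NavigationButton
{
    public int ID;
    public string Text;
    public string Path;
}

public static class NavigationButtonMapper
{
    // dt.Rows is a non-generic collection; Cast<DataRow>() turns it into an
    // IEnumerable<DataRow> so each row can be projected into a NavigationButton.
    public static List<NavigationButton> FromDataTable(DataTable dt) =>
        dt.Rows.Cast<DataRow>()
            .Select(dr => new NavigationButton
            {
                ID = Convert.ToInt32(dr["ID"]),
                Text = Convert.ToString(dr["Text"]),
                Path = Convert.ToString(dr["Path"])
            })
            .ToList();
}
```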
