I have a CSV template file containing data file names for different file types, in the format below, which I read and store in a CSVTemplate model.
Note: the CSV data lists the file names for each file type in order.
id,FileType,FileName
1,Excel Files, Excel File 1
2,Excel Files, Excel File 2
3,Excel Files, Excel File 2.1
4,Document Files, Document File 1
5,Document Files, Document File 2
6,Document Files, Document File 3
7,Document Files, Document File 3.1
8,Document Files, Document File 3.2
Model:
public class CSVTemplate
{
    public int id { get; set; }
    public string FileType { get; set; }
    public string FileName { get; set; }
}
Code for reading CSV records (I am using the CsvHelper NuGet package):
public List<CSVTemplate> Fetch_TemplateData()
{
    List<CSVTemplate> csvTemplateData = new List<CSVTemplate>();
    try
    {
        string csvFilePath = @"File1.csv";
        //CsvHelper configuration settings
        var csvconfig = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            NewLine = Environment.NewLine,
            HasHeaderRecord = true,
            DetectDelimiterValues = new string[] { "," },
        };
        //Reading the CSV file
        using (var reader = new StreamReader(csvFilePath))
        {
            //Fetching the CSV records
            using (var csv = new CsvReader(reader, csvconfig))
            {
                csvTemplateData = csv.GetRecords<CSVTemplate>().ToList();
            }
        }
        return csvTemplateData;
    }
    catch (Exception)
    {
        throw;
    }
}
I have the JSON response below from a Web API; it contains the file names in the title property for all file types, nested to multiple levels.
JSON:
{
"Field": [
{
"title": "Excel Files",
"Field": [
{
"title": "Excel File 1"
},
{
"title": "Excel File 2",
"Field": [
{
"title": "Excel File 2.1"
},
{
"title": "Excel File 2.2"
}
]
},
{
"title": "Excel File 3",
"Field": [
{
"title": "Excel File 3.1",
"Field": [
{
"title": "Excel File 3.1.1"
}
]
}
]
}
]
},
{
"title": "Document Files",
"Field": [
{
"title": "Document File 1"
},
{
"title": "Document File 2",
"Field": [
{
"title": "Document File 2.1"
},
{
"title": "Document File 2.2"
}
]
},
{
"title": "Document File 3",
"Field": [
{
"title": "Document File 3.1",
"Field": [
{
"title": "Document File 3.1.1"
}
]
}
]
}
]
}
]
}
The CSV file contains the mandatory file names for all file types. The Web API response (JSON) contains both mandatory and non-mandatory file names. I need to compare the CSV with the Web API data and check whether any mandatory file names are missing.
Model:
public class Field
{
    public string title { get; set; }
    public List<Field> Field { get; set; }
}
public class Root
{
    public List<Field> Field { get; set; }
}
Updated Code:
public bool Validate_TemplateData(List<CSVTemplate> csvData)
{
    bool isTemplateValidate = false;
    try
    {
        string jsonFile = @".\json1.json";
        string jsonString = File.ReadAllText(jsonFile);
        Root jsondata = JsonConvert.DeserializeObject<Root>(jsonString);
        if (jsondata != null)
        {
            bool isMatched = false;
            foreach (var csvRowData in csvData)
            {
                var csvTitle = csvRowData.FileName;
                foreach (var field in jsondata.Field)
                {
                    isMatched = CheckFieldTitle(field, 0, csvTitle);
                    if (isMatched)
                    {
                        break;
                    }
                }
                if (!isMatched)
                {
                    Console.WriteLine($"Title : {csvTitle} Not matched.");
                    break;
                }
            }
            isTemplateValidate = isMatched; // true only when every CSV row matched
        }
    }
    catch (Exception)
    {
        throw;
    }
    return isTemplateValidate;
}
Recursive function to read all nested field names/titles from the JSON:
private static bool CheckFieldTitle(Field field, int level, string csvTitle)
{
    bool isTitleMatched = false;
    try
    {
        var fieldTitle = field.title;
        if (csvTitle.Trim().ToUpper().Equals(fieldTitle.Trim().ToUpper()))
        {
            Console.WriteLine($"Title: {csvTitle} Matched");
            isTitleMatched = true;
        }
        else if (field.Field != null)
        {
            //check in nested fields
            foreach (var subField in field.Field)
            {
                isTitleMatched = CheckFieldTitle(subField, level + 1, csvTitle);
                if (isTitleMatched)
                {
                    return true;
                }
            }
        }
    }
    catch (Exception)
    {
        isTitleMatched = false;
        throw;
    }
    return isTitleMatched;
}
Output:
I want to output the names that are missing from the JSON data but present in the CSV file.
Document File 3.2
Conditions:
If the comparison fails at any level of the loop for any file type, the rest of the comparison should not proceed.
Example: the comparison fails at Excel File 2, so print Excel File 2 and stop comparing all remaining files and file types.
The comparison logic should compare each CSV row against each Web API response field title (JSON). It should not fetch all JSON file names (titles) into one list and the CSV file names into another, and then compare the two lists for differences.
The comparison should follow the order of the JSON data (in this example, first compare all Excel files and then Document files).
Problem:
The code is partially working, but there are numerous loops in the recursive function, which I would like to avoid.
Best Options:
Use a third-party NuGet package (object comparison) for this kind of problem to optimize the code; I have been unable to find one.
Use a LINQ query to optimize the foreach calls in the solution; I have been unable to work one out.
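As an aside, one way to avoid the recursion the question mentions is an explicit stack. Below is a minimal sketch (assuming the Field model above, with a hypothetical helper name) that visits titles in JSON document order and can be called once per CSV row:
// Minimal sketch: iterative depth-first search over the nested fields.
// Visits titles in JSON document order, so the first mismatch reported
// by the caller respects the required ordering.
private static bool ContainsTitle(List<Field> fields, string csvTitle)
{
    var stack = new Stack<Field>();
    // Push in reverse so items are popped in document order.
    for (int i = fields.Count - 1; i >= 0; i--)
        stack.Push(fields[i]);
    while (stack.Count > 0)
    {
        var current = stack.Pop();
        if (string.Equals(current.title?.Trim(), csvTitle?.Trim(),
                          StringComparison.OrdinalIgnoreCase))
            return true;
        if (current.Field != null)
            for (int i = current.Field.Count - 1; i >= 0; i--)
                stack.Push(current.Field[i]);
    }
    return false;
}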
I suggest you group the files by FileType from the JSON and then find the first missing CSV file.
First, convert the JSON into a Dictionary by FileType and gather all the titles into a HashSet<string>:
var jData = JToken.Parse(jsonString);
var jd = jData.SelectToken("Field")
              .ToDictionary(f => (string)f["title"],
                            f => f["Field"].SelectTokens("..title")
                                           .Select(jt => jt.ToString())
                                           .ToHashSet());
Then, convert the CSV file into a list of CSVTemplate. I just did some simple code (not for production use):
var csvTemplateData = csvSrc.Split('\n').Skip(1).Select(line => line.Split(','))
.Select(va => new CSVTemplate {
id = va[0].ToInt(),
FileType = va[1],
FileName = va[2].Trim()
})
.ToList();
(ToInt() is the obvious extension method.)
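For completeness, here is a minimal sketch of that hypothetical extension method:
// Hypothetical helper assumed by the snippet above.
public static class StringExtensions
{
    public static int ToInt(this string s) => int.Parse(s.Trim());
}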
Finally, you can find the first missing filename by checking each CSVTemplate in order to see if it is present:
var firstMissing = csvTemplateData.FirstOrDefault(t => !jd[t.FileType]
.Contains(t.FileName))
?.FileName;
firstMissing will be null if no missing CSV FileName is found, otherwise it will contain the FileName. If you prefer, you can leave off the ?.FileName to get the first missing CSVTemplate.
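For example, to reproduce the output the question asks for (a sketch using the variables above):
// Report the first mandatory file name missing from the JSON, if any.
if (firstMissing != null)
    Console.WriteLine($"Title : {firstMissing} Not matched."); // e.g. Document File 3.2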
Related
I am converting a JSON file to a CSV file. The JSON has multiple nested objects. While converting, I am able to get all the values out of the JSON and into the CSV. However, all the values are being shown as one row with the same heading repeated multiple times. I am using the ChoETL library.
using (var csv = new ChoCSVWriter("file1.csv").WithFirstLineHeader().WithDelimiter(","))
{
    using (var json = new ChoJSONReader("file2.json")
        .WithField("RecordID", jsonPath: "$..Events[*].RecordId")
        .WithField("RecordType", jsonPath: "$..Events[*].RecordType")
        .WithField("EventDate", jsonPath: "$..Events[*].EventDate"))
    {
        csv.Write(json);
    }
}
It shows the results as
Record ID_0 Record ID_1 Record ID_2
123 456 789
Instead of as
Record ID
123
456
789
Here is the JSON File
[
{
"Id": "3e399241",
"IdLineage": [
"sfdsfdsfs",
"sdfdsfdsf"
],
"Individuals": [
{
"Id": "1232112",
"IdLineage": [
"fdsfsd1"
],
"Events": [
{
"RecordId": "2132121321",
"RecordType": "SALE",
"EventDate": "2016-01-04T05:00:00Z"
},
{
"RecordId": "123213212",
"RecordType": "SALE",
"EventDate": "2012-07-16T04:00:00Z"
}
]
},
{
"Id": "ssf2112",
"IdLineage": [],
"Events": [
{
"RecordId": "123213ds21",
"RecordType": "ACXIOMRECORD",
"EventDate": "2017-12-17T03:33:54.875Z"
}
]
},
{
"Id": "asadsad",
"IdLineage": [],
"Events": [
{
"RecordId": "213213sa21",
"RecordType": "SALE",
"EventDate": "2018-03-09T05:00:00Z"
}
]
}
]
}
]
Based on the sample code you posted, you are creating an object from the JSON as below:
{
RecordID : Array,
RecordType: Array,
EventDate: Array
}
This generates CSV in the below format, which is as expected:
RecordID_0, RecordID_1, RecordID_2, RecordType_0, RecordType_1, ....
If you want to create CSV in the below format, you will have to fix the JSON path on each record field:
RecordID, RecordType, EventDate
Sample code:
using (var csv = new ChoCSVWriter("file1.csv").WithFirstLineHeader().WithDelimiter(","))
{
    using (var json = new ChoJSONReader("file2.json")
        .WithField("RecordID", jsonPath: "$..Events.RecordId")
        .WithField("RecordType", jsonPath: "$..Events.RecordType")
        .WithField("EventDate", jsonPath: "$..Events.EventDate"))
    {
        csv.Write(json);
    }
}
UPDATE #1:
After looking at the sample JSON, this is how you can pull the data and produce a CSV file in the expected format:
StringBuilder msg = new StringBuilder();
using (var w = new ChoCSVWriter(msg)
.WithFirstLineHeader()
)
{
using (var r = new ChoJSONReader("Sample32.json")
.WithJSONPath("$..Events[*]")
)
{
w.Write(r);
}
}
Console.WriteLine(msg.ToString());
OUTPUT #1:
RecordId,RecordType,EventDate
2132121321,SALE,1/4/2016 5:00:00 AM
123213212,SALE,7/16/2012 4:00:00 AM
123213ds21,ACXIOMRECORD,12/17/2017 3:33:54 AM
213213sa21,SALE,3/9/2018 5:00:00 AM
UPDATE #2:
You must use LINQ to combine the Ids with the event members. The sample below shows how:
using (var fw = new StreamWriter("Sample32.csv", true))
{
using (var w = new ChoCSVWriter(fw)
.WithFirstLineHeader()
)
{
using (var r = new ChoJSONReader("Sample32.json")
.WithJSONPath("$..Individuals[*]")
)
{
w.Write(r.SelectMany(r1 => ((dynamic[])r1.Events).Select(r2 => new { r1.Id, r2.RecordId, r2.RecordType, r2.EventDate })));
}
}
}
Console.WriteLine(File.ReadAllText("Sample32.csv"));
OUTPUT #2:
Id,RecordId,RecordType,EventDate
1232112,2132121321,SALE,1/4/2016 5:00:00 AM
1232112,123213212,SALE,7/16/2012 4:00:00 AM
ssf2112,123213ds21,ACXIOMRECORD,12/17/2017 3:33:54 AM
asadsad,213213sa21,SALE,3/9/2018 5:00:00 AM
Is it possible to create (from the example below), let's say, 2 instances of Animal and 3 instances of Flower based on data from a single JSON file?
Example code:
class Nature
{
// do something with a specific json file
class Animal
{
string id;
bool isMammal;
}
class Flower
{
string id;
int numberOfPetals;
}
}
Expected result:
An x amount of instances of Animal
An y amount of instances of Flower
PS:
x and y depend on the data obtained from the JSON file.
The workaround I thought of was, instead of creating a JSON file, to create a .txt file containing JSON data fragments, then save the contents of the .txt file into a variable, and finally select each JSON fragment from that variable to work with individually, as if each one were a separate JSON file.
But is there a simpler way of doing this?
You can use the JSON structure instead of maintaining the count:
{
"AnimalCollection" :
[
{ "id":"Animal-1", "IsMammal":true },
{ "id":"Animal-2", "IsMammal":true },
{ "id":"Animal-3", "IsMammal":true }
],
"FlowerCollection":
[
{ "id":"Flower-1", "numberOfPetals":30 },
{ "id":"Flower-2", "numberOfPetals":20 },
{ "id":"Flower-3", "numberOfPetals":10 },
{ "id":"Flower-4", "numberOfPetals":3 }
]
}
Then you can deserialize this using Newtonsoft.Json into the type below:
public class Data
{
    public Animal[] AnimalCollection { get; set; }
    public Flower[] FlowerCollection { get; set; }
}
It will contain 3 animal instances and 4 flower instances from JSON.
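The deserialization call itself is one line; a sketch, assuming the JSON above is saved as data.json:
// Deserialize the whole document in one call (Newtonsoft.Json).
var data = JsonConvert.DeserializeObject<Data>(File.ReadAllText("data.json"));
Console.WriteLine($"{data.AnimalCollection.Length} animals, {data.FlowerCollection.Length} flowers");
// Prints: 3 animals, 4 flowers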
Hope this helps.
You can load the JSON file and convert the data into a DataTable, then create new lists of Animal and Flower.
C# code as follows:
//Declare typed list of Animals
List<Animal> AllAnimals = new List<Animal>();
//Declare typed list of Flowers
List<Flower> AllFlowers = new List<Flower>();
DataTable dt;
//Load JSON file data
using (StreamReader r = new StreamReader(JSONFilePath))
{
    string json = r.ReadToEnd();
    //Convert JSON data to a DataTable
    dt = (DataTable)JsonConvert.DeserializeObject(json, typeof(DataTable));
}
foreach (DataRow row in dt.Rows)
{
    if (row[1].ToString() == "Animal")
    {
        Animal NewAnimal = new Animal();
        NewAnimal.id = row[0].ToString();
        NewAnimal.isMammal = Convert.ToBoolean(row[2]); // isMammal is a bool
        AllAnimals.Add(NewAnimal);
    }
    else
    {
        Flower NewFlower = new Flower();
        NewFlower.id = row[0].ToString();
        NewFlower.numberOfPetals = Convert.ToInt32(row[3]); // numberOfPetals is an int
        AllFlowers.Add(NewFlower);
    }
}
Below is a sample of json data that can be loaded from a file:
[
    {
        "id": "0",
        "EntryType": "Animal",
        "IsMammal": "True",
        "numberOfPetals": ""
    },
    {
        "id": "1",
        "EntryType": "Animal",
        "IsMammal": "True",
        "numberOfPetals": ""
    },
    {
        "id": "2",
        "EntryType": "Flower",
        "IsMammal": "",
        "numberOfPetals": "8"
    },
    {
        "id": "1",
        "EntryType": "Flower",
        "IsMammal": "",
        "numberOfPetals": "6"
    },
    {
        "id": "2",
        "EntryType": "Flower",
        "IsMammal": "",
        "numberOfPetals": "10"
    }
]
I have a very large JSON file; the cars array below can have up to 100,000,000 records. The total file size can vary from 500 MB to 10 GB. I am using Newtonsoft Json.NET.
Input
{
"name": "John",
"age": "30",
"cars": [{
"brand": "ABC",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "1",
"day": "1"
}, {
"brand": "XYZ",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "10",
"day": "01"
}],
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"}
Desired Output
File 1 Json
{
"name": "John",
"age": "30",
"cars": {
"brand": "ABC",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "1",
"day": "1"
},
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"
}
File 2 Json
{
"name": "John",
"age": "30",
"cars": {
"brand": "XYZ",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "10",
"day": "01"
},
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"
}
So I came up with the following code, which kind of works:
public static void SplitJson(Uri objUri, string splitbyProperty)
{
    try
    {
        HttpClient client = new HttpClient();
        using (Stream stream = client.GetStreamAsync(objUri).Result)
        using (StreamReader streamReader = new StreamReader(stream))
        using (JsonTextReader reader = new JsonTextReader(streamReader))
        {
            Node objnode = new Node();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.String && reader.Path.Contains("name") && !reader.Value.ToString().Equals(reader.Path))
                {
                    objnode.name = reader.Value.ToString();
                }
                if (reader.TokenType == JsonToken.Integer && reader.Path.Contains("age") && !reader.Value.ToString().Equals(reader.Path))
                {
                    objnode.age = reader.Value.ToString();
                }
                if (reader.Path.Contains(splitbyProperty) && reader.TokenType == JsonToken.StartArray)
                {
                    int counter = 0;
                    while (reader.Read())
                    {
                        if (reader.TokenType == JsonToken.StartObject)
                        {
                            counter = counter + 1;
                            var item = JsonSerializer.Create().Deserialize<Car>(reader);
                            objnode.cars = new List<Car>();
                            objnode.cars.Add(item);
                            insertIntoFileSystem(objnode, counter);
                        }
                        if (reader.TokenType == JsonToken.EndArray)
                            break;
                    }
                }
            }
        }
    }
    catch (Exception)
    {
        throw;
    }
}
public static void insertIntoFileSystem(Node objNode, int counter)
{
    string fileName = @"C:\Temp\output_" + objNode.name + "_" + objNode.age + "_" + counter + ".json";
    var serialiser = new JsonSerializer();
    using (TextWriter tw = new StreamWriter(fileName))
    {
        using (StringWriter textWriter = new StringWriter())
        {
            serialiser.Serialize(textWriter, objNode);
            tw.WriteLine(textWriter);
        }
    }
}
ISSUE
Any field after the array is not being captured when the file is large. Is there a way to skip, or to do parallel processing of, the reader for the large array in the JSON? In short, I am not able to capture the part below using my code:
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"
You are going to need to process your large JSON file in two passes to achieve the result you want.
In the first pass, split the file into two: create a file containing just the huge array, and a second file which contains all the other information, which will be used as a template for the individual JSON files you ultimately want to create.
In the second pass, read the template file into memory (I'm assuming this part of the JSON is relatively smallish so this should not be a problem), then use a reader to process the array file one item at a time. For each item, combine it with the template and write it to a separate file.
At the end, you can delete the temporary array and template files.
Here is what it might look like in code:
using System.IO;
using System.Text;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
public static void SplitJson(Uri objUri, string arrayPropertyName)
{
    string templateFileName = @"C:\Temp\template.json";
    string arrayFileName = @"C:\Temp\array.json";
    // Split the original JSON stream into two temporary files:
    // one that has the huge array and one that has everything else
    HttpClient client = new HttpClient();
    using (Stream stream = client.GetStreamAsync(objUri).Result)
    using (JsonReader reader = new JsonTextReader(new StreamReader(stream)))
    using (JsonWriter templateWriter = new JsonTextWriter(new StreamWriter(templateFileName)))
    using (JsonWriter arrayWriter = new JsonTextWriter(new StreamWriter(arrayFileName)))
    {
        if (reader.Read() && reader.TokenType == JsonToken.StartObject)
        {
            templateWriter.WriteStartObject();
            while (reader.Read() && reader.TokenType != JsonToken.EndObject)
            {
                string propertyName = (string)reader.Value;
                reader.Read();
                templateWriter.WritePropertyName(propertyName);
                if (propertyName == arrayPropertyName)
                {
                    arrayWriter.WriteToken(reader);
                    templateWriter.WriteStartObject(); // empty placeholder object
                    templateWriter.WriteEndObject();
                }
                else if (reader.TokenType == JsonToken.StartObject ||
                         reader.TokenType == JsonToken.StartArray)
                {
                    templateWriter.WriteToken(reader);
                }
                else
                {
                    templateWriter.WriteValue(reader.Value);
                }
            }
            templateWriter.WriteEndObject();
        }
    }
    // Now read the huge array file and combine each item in the array
    // with the template to make new files
    JObject template = JObject.Parse(File.ReadAllText(templateFileName));
    using (JsonReader arrayReader = new JsonTextReader(new StreamReader(arrayFileName)))
    {
        int counter = 0;
        while (arrayReader.Read())
        {
            if (arrayReader.TokenType == JsonToken.StartObject)
            {
                counter++;
                JObject item = JObject.Load(arrayReader);
                template[arrayPropertyName] = item;
                string fileName = string.Format(@"C:\Temp\output_{0}_{1}_{2}.json",
                    template["name"], template["age"], counter);
                File.WriteAllText(fileName, template.ToString());
            }
        }
    }
    // Clean up temporary files
    File.Delete(templateFileName);
    File.Delete(arrayFileName);
}
Note the above approach will require double the disk space of the original JSON during processing due to the temporary files. If this is a problem, you can modify the code to download the file twice instead (although this will likely increase the processing time). In the first download, create the template JSON and ignore the array; in the second download, advance to the array and process it with the template as before to create the output files.
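For example, the first download in that variant could reuse the property loop above but skip the array with JsonReader.Skip() instead of copying it to a temporary file (a sketch of just the changed branch):
// Changed branch inside the property loop of the first download:
if (propertyName == arrayPropertyName)
{
    reader.Skip();                      // advance past the entire huge array
    templateWriter.WriteStartObject();  // empty placeholder object
    templateWriter.WriteEndObject();
}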
I am using Newtonsoft JSON to parse some output from an API. However, the API result is nearly 2.5 MB, and parsing out the entire file just to find the data I need takes a long time. Here is a snippet of the API output:
{
"response": {
"success": 1,
"current_time": 1416085203,
"raw_usd_value": 0.2,
"usd_currency": "metal",
"usd_currency_index": 5002,
"items": {
"A Brush with Death": {
"defindex": [
30186
],
"prices": {
"6": {
"Tradable": {
"Craftable": [
{
"currency": "metal",
"value": 3,
"last_update": 1414184620,
"difference": -0.165
}
]
}
}
}
},
My code is supposed to find the only object that is a child of the items object with the number '5021' in the defindex array, and pull out the currency and value data. Here is the code I use to find the data:
dynamic result = Newtonsoft.Json.JsonConvert.DeserializeObject(priceFile);
int keyprice = 0;
foreach (var items in result.response.items)
{
    foreach (var item in items)
    {
        string indexstr = item.defindex.ToString();
        if (indexstr.Contains(defindex))
        {
            foreach (var price in item.prices)
            {
                foreach (var quality in price)
                {
                    Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
                    keyprice = quality.Tradable.Craftable[0].value;
                    return keyprice;
                }
            }
        }
    }
}
Ideally, the code should only take up to 10 seconds to run.
I would create a class for the response object, and then use code similar to the following. I tested it on a 2.8 MB JSON file and it averaged about 1.2 seconds. Also try using fastJSON (there is a NuGet package); it is the fastest parser I have found.
string fileName = @"c:\temp\json\yourfile.json";
string json;
using (StreamReader sr = new StreamReader(fileName))
{
    json = sr.ReadToEnd();
}
response myResponse = fastJSON.JSON.ToObject<response>(json);
var item = myResponse.First(i => i.defindex == "5021");
foreach (var price in item.prices)
{
    foreach (var quality in price)
    {
        Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
        keyprice = quality.Tradable.Craftable[0].value;
        return keyprice;
    }
}
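The response class is not shown in the answer; a hypothetical shape matching the posted API snippet might look like this (the names and nesting are assumptions based on the sample JSON, since the item names are dynamic keys):
// Hypothetical model for the API snippet above; the "items" object uses
// item names as keys, so a Dictionary is a natural fit.
public class PriceEntry
{
    public string currency { get; set; }
    public double value { get; set; }
    public long last_update { get; set; }
    public double difference { get; set; }
}
public class Item
{
    public List<int> defindex { get; set; }
    // prices -> quality ("6") -> "Tradable" -> "Craftable" -> entries
    public Dictionary<string, Dictionary<string, Dictionary<string, List<PriceEntry>>>> prices { get; set; }
}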
I have a class named "IndexModel":
public class IndexModel
{
    [ElasticProperty(Index = FieldIndexOption.NotAnalyzed, Store = true)]
    public string ModelNumber { get; set; }
}
Following is how I set up the elastic client:
var uri = new Uri("http://localhost:9200");
var config = new ConnectionSettings(uri);
var client = new ElasticClient(config);
client.Map<IndexModel>(m => m.MapFromAttributes());
I can see the mapped result from response:
Request {
    "indexmodel": {
        "properties": {
            "modelNumber": {
                "type": "string",
                "store": true,
                "index": "not_analyzed"
            }
        }
    }
}
I have one indexed record for this type, where the value of the "ModelNumber" property is "test-123", and following is my query:
var result = client.Search<IndexModel>(s => s.Query(new TermQuery() { Field = Property.Path<IndexModel>(it => it.ModelNumber), Value = "test-123"}));
Here is the final mapped request I got:
Method: POST,
Url: http://localhost:9200/_search,
Request: {
"query": {
"term": {
"modelNumber": {
"value": "test-123"
}
}
}
}
But I cannot get the result. If I change the value of the "ModelNumber" property to "test123", re-index it, and search by the keyword "test123", then it works, so I think the analyzer still analyzed the "ModelNumber" property. Can someone help me? Thanks.
I had the same problem. The solution is to first create the index, then put the mapping, and finally add your data.
Add the type attribute to your model field:
[ElasticProperty(OmitNorms = true, Index = FieldIndexOption.NotAnalyzed)]
var node = new Uri("http://192.168.0.56:9200/");
var settings = new ConnectionSettings(node, defaultIndex: "ticket");
var client = new ElasticClient(settings);
var createIndexResult = client.CreateIndex("ticket");
var mapResult = client.Map<TicketElastic>(c => c.MapFromAttributes().IgnoreConflicts().Type("TicketElastic").Indices("ticket"));
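The code above stops after the mapping; the final step ("add your data") might look like the sketch below. The TicketElastic fields are assumptions, and the index call follows the NEST 1.x API used above.
// Hypothetical final step: index a document only after the mapping
// exists, so the not_analyzed field is stored without analysis.
var ticket = new TicketElastic { /* populate your fields here */ };
var indexResult = client.Index(ticket, i => i.Index("ticket"));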