I am converting a JSON file to a CSV file. The JSON has multiple nested objects. While converting, I am able to get all the values out of the JSON and into the CSV. However, all the values are shown as one row, with the same heading repeated multiple times. I am using the ChoETL library.
using (var csv = new ChoCSVWriter("file1.csv").WithFirstLineHeader().WithDelimiter(","))
{
    using (var json = new ChoJSONReader("file2.json")
        .WithField("RecordID", jsonPath: "$..Events[*].RecordId")
        .WithField("RecordType", jsonPath: "$..Events[*].RecordType")
        .WithField("EventDate", jsonPath: "$..Events[*].EventDate"))
    {
        csv.Write(json);
    }
}
It shows the results as:
RecordID_0 RecordID_1 RecordID_2
123 456 789
Instead of as:
RecordID
123
456
789
Here is the JSON file:
[
  {
    "Id": "3e399241",
    "IdLineage": [
      "sfdsfdsfs",
      "sdfdsfdsf"
    ],
    "Individuals": [
      {
        "Id": "1232112",
        "IdLineage": [
          "fdsfsd1"
        ],
        "Events": [
          {
            "RecordId": "2132121321",
            "RecordType": "SALE",
            "EventDate": "2016-01-04T05:00:00Z"
          },
          {
            "RecordId": "123213212",
            "RecordType": "SALE",
            "EventDate": "2012-07-16T04:00:00Z"
          }
        ]
      },
      {
        "Id": "ssf2112",
        "IdLineage": [],
        "Events": [
          {
            "RecordId": "123213ds21",
            "RecordType": "ACXIOMRECORD",
            "EventDate": "2017-12-17T03:33:54.875Z"
          }
        ]
      },
      {
        "Id": "asadsad",
        "IdLineage": [],
        "Events": [
          {
            "RecordId": "213213sa21",
            "RecordType": "SALE",
            "EventDate": "2018-03-09T05:00:00Z"
          }
        ]
      }
    ]
  }
]
Based on the sample code you posted, you are creating an object from the JSON like below:
{
    RecordID: Array,
    RecordType: Array,
    EventDate: Array
}
This leads to generating the CSV in the format below, which is as expected:
RecordID_0, RecordID_1, RecordID_2, RecordType_0, RecordType_1, ....
If you want to create the CSV in the format below instead, you will have to fix the JSON path on each record field:
RecordID, RecordType, EventDate
Sample code:
using (var csv = new ChoCSVWriter("file1.csv").WithFirstLineHeader().WithDelimiter(","))
{
    using (var json = new ChoJSONReader("file2.json")
        .WithField("RecordID", jsonPath: "$..Events.RecordId")
        .WithField("RecordType", jsonPath: "$..Events.RecordType")
        .WithField("EventDate", jsonPath: "$..Events.EventDate"))
    {
        csv.Write(json);
    }
}
UPDATE #1:
After looking at the sample JSON, this is how you can pull the data and produce a CSV file in the expected format:
StringBuilder msg = new StringBuilder();
using (var w = new ChoCSVWriter(msg)
    .WithFirstLineHeader())
{
    using (var r = new ChoJSONReader("Sample32.json")
        .WithJSONPath("$..Events[*]"))
    {
        w.Write(r);
    }
}
Console.WriteLine(msg.ToString());
OUTPUT #1:
RecordId,RecordType,EventDate
2132121321,SALE,1/4/2016 5:00:00 AM
123213212,SALE,7/16/2012 4:00:00 AM
123213ds21,ACXIOMRECORD,12/17/2017 3:33:54 AM
213213sa21,SALE,3/9/2018 5:00:00 AM
UPDATE #2:
You will need to use LINQ to combine the Ids with the Event members. The sample below shows how:
using (var fw = new StreamWriter("Sample32.csv", true))
{
    using (var w = new ChoCSVWriter(fw)
        .WithFirstLineHeader())
    {
        using (var r = new ChoJSONReader("Sample32.json")
            .WithJSONPath("$..Individuals[*]"))
        {
            w.Write(r.SelectMany(r1 => ((dynamic[])r1.Events).Select(r2 => new { r1.Id, r2.RecordId, r2.RecordType, r2.EventDate })));
        }
    }
}
Console.WriteLine(File.ReadAllText("Sample32.csv"));
OUTPUT #2:
Id,RecordId,RecordType,EventDate
1232112,2132121321,SALE,1/4/2016 5:00:00 AM
1232112,123213212,SALE,7/16/2012 4:00:00 AM
ssf2112,123213ds21,ACXIOMRECORD,12/17/2017 3:33:54 AM
asadsad,213213sa21,SALE,3/9/2018 5:00:00 AM
I have a specific JSON string that I need to match for a REST call. I'm pulling the data from an Excel spreadsheet. One of the sections has repeating inputs like the example below. The data in my spreadsheet is a two-column list of name/value rows (the original screenshot is not included here).
The JSON I need to generate looks like:
"detailInputs": [
{
"name": "SOGrid",
"repeatingInputs": [
{
"inputs": [
{
"name": "ItemNumber",
"value": "XYZ"
},
{
"name": "Quantity",
"value": "1"
}
]
},
{
"inputs": [
{
"name": "ItemNumber",
"value": "ABC"
},
{
"name": "Quantity",
"value": "3"
}
]
}
]
What I've tried so far is below (note that jsonArrayString is the header information formatted in a previous section):
using (var conn = new OleDbConnection(connectionString))
{
    sheetName = "Detail";
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
    using (var rdr = cmd.ExecuteReader())
    {
        var query = rdr.Cast<DbDataRecord>().Select(row => new {
            name = row[0],
            value = row[1],
            //description = row[2]
        });
        var json = JsonConvert.SerializeObject(query);
        jsonArrayString = jsonArrayString + ",\"detailInputs\":[{\"name\":\"SOGrid\",\"repeatingInputs\":[{\"inputs\": " + json + "}]}]}";
    }
}
This is very close, but it puts all of the repeating inputs in one inputs section.
I also tried assigning the values to a dictionary and a list, in hopes of pulling out the appropriate pairs and formatting the JSON from those. Below is the beginning of that attempt, but I'm not familiar enough with unraveling the key/value pairs to format it correctly.
using (var conn = new OleDbConnection(connectionString))
{
    sheetName = "Detail";
    conn.Open();
    int counter = 0;
    var cmd = conn.CreateCommand();
    cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
    var values = new List<Dictionary<string, object>>();
    var ListValues = new List<string>();
    using (var rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            var fieldValues = new Dictionary<string, object>();
            var fieldValuesList = new List<string>();
            for (int i = 0; i < rdr.FieldCount; i++)
            {
                fieldValues.Add(rdr.GetName(i), rdr[i]);
                fieldValuesList.Add(rdr.GetName(i));
            }
            // add the dictionary to the values list
            values.Add(fieldValues);
        }
    }
}
The root question is: how can I create a repeating inputs structure, as shown in the JSON sample, by pulling from the Excel data?
What you want to do is to serialize the contents of the Excel worksheet as the array value of the "repeatingInputs" property, using a specific structure. I would suggest breaking this down into a series of LINQ transformations.
First, introduce a couple of extension methods:
public static class DataReaderExtensions
{
    // Adapted from this answer https://stackoverflow.com/a/1202973
    // To https://stackoverflow.com/questions/1202935/convert-rows-from-a-data-reader-into-typed-results
    // By https://stackoverflow.com/users/3043/joel-coehoorn
    public static IEnumerable<T> SelectRows<T>(this IDataReader reader, Func<IDataRecord, T> select)
    {
        while (reader.Read())
        {
            yield return select(reader);
        }
    }
}

public static class EnumerableExtensions
{
    // Adapted from this answer https://stackoverflow.com/a/419058
    // To https://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq/
    // By https://stackoverflow.com/users/50776/casperone
    public static IEnumerable<List<T>> ChunkWhile<T>(this IEnumerable<T> enumerable, Func<List<T>, T, bool> shouldAdd)
    {
        if (enumerable == null || shouldAdd == null)
            throw new ArgumentNullException();
        return enumerable.ChunkWhileIterator(shouldAdd);
    }

    static IEnumerable<List<T>> ChunkWhileIterator<T>(this IEnumerable<T> enumerable, Func<List<T>, T, bool> shouldAdd)
    {
        List<T> list = new List<T>();
        foreach (var item in enumerable)
        {
            if (list.Count > 0 && !shouldAdd(list, item))
            {
                yield return list;
                list = new List<T>();
            }
            list.Add(item);
        }
        if (list.Count != 0)
        {
            yield return list;
        }
    }
}
The first method packages an IDataReader into an enumerable of typed objects, one for each row. Doing this makes it easier to feed the data reader's contents into subsequent LINQ transformations. The second method breaks a flat enumerable into an enumerable of "chunks" of lists, based on some predicate condition. This will be used to break the rows into chunks at each ItemNumber row.
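For instance, here is a small, hypothetical illustration of how ChunkWhile groups a flat sequence; the row data is made up for demonstration and is not from the question's spreadsheet:

var rows = new[]
{
    new { Name = "ItemNumber", Value = "XYZ" },
    new { Name = "Quantity",   Value = "1" },
    new { Name = "ItemNumber", Value = "ABC" },
    new { Name = "Quantity",   Value = "3" },
};

// Start a new chunk whenever the first name in the current chunk repeats,
// i.e. each chunk runs from one "ItemNumber" row up to (not including) the next.
var chunks = rows.ChunkWhile((list, row) => list[0].Name != row.Name);
// Result: [ [ItemNumber/XYZ, Quantity/1], [ItemNumber/ABC, Quantity/3] ]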
Using these two extension methods we can generate the required JSON as follows:
public static string ExtractRows(string connectionString, string sheetName)
{
    using (var conn = new OleDbConnection(connectionString))
    {
        conn.Open();
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = string.Format("SELECT * FROM [{0}]", sheetName);
            using (var rdr = cmd.ExecuteReader())
            {
                var query = rdr
                    // Wrap the IDataReader in a LINQ enumerator returning an array of key/value pairs for each row.
                    // Project the first two columns into a single anonymous object.
                    .SelectRows(r =>
                    {
                        // Check we have two columns in the row, and the first (Name) column value is non-null.
                        // You might instead check that we have at least two columns.
                        if (r.FieldCount != 2 || r.IsDBNull(0))
                            throw new InvalidDataException();
                        return new { Name = r[0].ToString(), Value = r[1] };
                    })
                    // Break the columns into chunks when the first name repeats
                    .ChunkWhile((l, r) => l[0].Name != r.Name)
                    // Wrap in the container Inputs object
                    .Select(r => new { Inputs = r });

                // Serialize in camel case
                var settings = new JsonSerializerSettings
                {
                    ContractResolver = new CamelCasePropertyNamesContractResolver(),
                };
                return JsonConvert.SerializeObject(query, Formatting.Indented, settings);
            }
        }
    }
}
Which will generate the required value for "repeatingInputs":
[
  {
    "inputs": [
      {
        "name": "ItemNumber",
        "value": "XYZ"
      },
      {
        "name": "Quantity",
        "value": "1"
      }
    ]
  },
  {
    "inputs": [
      {
        "name": "ItemNumber",
        "value": "ABC"
      },
      {
        "name": "Quantity",
        "value": "3"
      }
    ]
  }
]
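The question also asked for the surrounding "detailInputs" wrapper, which the code above leaves out. One way to produce it, not shown in the original answer, is to replace the final serialization line of ExtractRows with something like the sketch below, reusing its query and settings variables (the camel-case resolver turns the property names into detailInputs, name, repeatingInputs, and inputs):

// Hypothetical wrapper: nests the chunked inputs under detailInputs/SOGrid,
// matching the target JSON shape from the question.
var detail = new
{
    DetailInputs = new[]
    {
        new { Name = "SOGrid", RepeatingInputs = query }
    }
};
return JsonConvert.SerializeObject(detail, Formatting.Indented, settings);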
I have a very large JSON file; the cars array below can contain up to 100,000,000 records. The total file size can vary from 500 MB to 10 GB. I am using Newtonsoft Json.NET.
Input
{
  "name": "John",
  "age": "30",
  "cars": [{
    "brand": "ABC",
    "models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
    "year": "2019",
    "month": "1",
    "day": "1"
  }, {
    "brand": "XYZ",
    "models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
    "year": "2019",
    "month": "10",
    "day": "01"
  }],
  "TestCity": "TestCityValue",
  "TestCity1": "TestCityValue1"
}
Desired Output
File 1 JSON
{
  "name": "John",
  "age": "30",
  "cars": {
    "brand": "ABC",
    "models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
    "year": "2019",
    "month": "1",
    "day": "1"
  },
  "TestCity": "TestCityValue",
  "TestCity1": "TestCityValue1"
}
File 2 JSON
{
  "name": "John",
  "age": "30",
  "cars": {
    "brand": "XYZ",
    "models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
    "year": "2019",
    "month": "10",
    "day": "01"
  },
  "TestCity": "TestCityValue",
  "TestCity1": "TestCityValue1"
}
So I came up with the following code, which kind of works:
public static void SplitJson(Uri objUri, string splitbyProperty)
{
    try
    {
        bool readinside = false;
        HttpClient client = new HttpClient();
        using (Stream stream = client.GetStreamAsync(objUri).Result)
        using (StreamReader streamReader = new StreamReader(stream))
        using (JsonTextReader reader = new JsonTextReader(streamReader))
        {
            Node objnode = new Node();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.String && reader.Path.ToString().Contains("name") && !reader.Value.ToString().Equals(reader.Path.ToString()))
                {
                    objnode.name = reader.Value.ToString();
                }
                if (reader.TokenType == JsonToken.Integer && reader.Path.ToString().Contains("age") && !reader.Value.ToString().Equals(reader.Path.ToString()))
                {
                    objnode.age = reader.Value.ToString();
                }
                if (reader.Path.ToString().Contains(splitbyProperty) && reader.TokenType == JsonToken.StartArray)
                {
                    int counter = 0;
                    while (reader.Read())
                    {
                        if (reader.TokenType == JsonToken.StartObject)
                        {
                            counter = counter + 1;
                            var item = JsonSerializer.Create().Deserialize<Car>(reader);
                            objnode.cars = new List<Car>();
                            objnode.cars.Add(item);
                            insertIntoFileSystem(objnode, counter);
                        }
                        if (reader.TokenType == JsonToken.EndArray)
                            break;
                    }
                }
            }
        }
    }
    catch (Exception)
    {
        throw;
    }
}

public static void insertIntoFileSystem(Node objNode, int counter)
{
    string fileName = @"C:\Temp\output_" + objNode.name + "_" + objNode.age + "_" + counter + ".json";
    var serialiser = new JsonSerializer();
    using (TextWriter tw = new StreamWriter(fileName))
    {
        using (StringWriter textWriter = new StringWriter())
        {
            serialiser.Serialize(textWriter, objNode);
            tw.WriteLine(textWriter);
        }
    }
}
ISSUE
Any field after the array is not captured when the file is large. Is there a way to skip the large array, or to process the reader in parallel for it? In short, I am not able to capture the part below using my code:
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"}
You are going to need to process your large JSON file in two passes to achieve the result you want.
In the first pass, split the file into two: create a file containing just the huge array, and a second file which contains all the other information, which will be used as a template for the individual JSON files you ultimately want to create.
In the second pass, read the template file into memory (I'm assuming this part of the JSON is relatively smallish so this should not be a problem), then use a reader to process the array file one item at a time. For each item, combine it with the template and write it to a separate file.
At the end, you can delete the temporary array and template files.
Here is what it might look like in code:
using System.IO;
using System.Text;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static void SplitJson(Uri objUri, string arrayPropertyName)
{
    string templateFileName = @"C:\Temp\template.json";
    string arrayFileName = @"C:\Temp\array.json";

    // Split the original JSON stream into two temporary files:
    // one that has the huge array and one that has everything else
    HttpClient client = new HttpClient();
    using (Stream stream = client.GetStreamAsync(objUri).Result)
    using (JsonReader reader = new JsonTextReader(new StreamReader(stream)))
    using (JsonWriter templateWriter = new JsonTextWriter(new StreamWriter(templateFileName)))
    using (JsonWriter arrayWriter = new JsonTextWriter(new StreamWriter(arrayFileName)))
    {
        if (reader.Read() && reader.TokenType == JsonToken.StartObject)
        {
            templateWriter.WriteStartObject();
            while (reader.Read() && reader.TokenType != JsonToken.EndObject)
            {
                string propertyName = (string)reader.Value;
                reader.Read();
                templateWriter.WritePropertyName(propertyName);
                if (propertyName == arrayPropertyName)
                {
                    arrayWriter.WriteToken(reader);
                    templateWriter.WriteStartObject();  // empty placeholder object
                    templateWriter.WriteEndObject();
                }
                else if (reader.TokenType == JsonToken.StartObject ||
                         reader.TokenType == JsonToken.StartArray)
                {
                    templateWriter.WriteToken(reader);
                }
                else
                {
                    templateWriter.WriteValue(reader.Value);
                }
            }
            templateWriter.WriteEndObject();
        }
    }

    // Now read the huge array file and combine each item in the array
    // with the template to make new files
    JObject template = JObject.Parse(File.ReadAllText(templateFileName));
    using (JsonReader arrayReader = new JsonTextReader(new StreamReader(arrayFileName)))
    {
        int counter = 0;
        while (arrayReader.Read())
        {
            if (arrayReader.TokenType == JsonToken.StartObject)
            {
                counter++;
                JObject item = JObject.Load(arrayReader);
                template[arrayPropertyName] = item;
                string fileName = string.Format(@"C:\Temp\output_{0}_{1}_{2}.json",
                    template["name"], template["age"], counter);
                File.WriteAllText(fileName, template.ToString());
            }
        }
    }

    // Clean up temporary files
    File.Delete(templateFileName);
    File.Delete(arrayFileName);
}
Note the above approach will require double the disk space of the original JSON during processing due to the temporary files. If this is a problem, you can modify the code to download the file twice instead (although this will likely increase the processing time). In the first download, create the template JSON and ignore the array; in the second download, advance to the array and process it with the template as before to create the output files.
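If you go the two-download route, the first, template-only pass might look something like the sketch below. This variant is an assumption, not part of the original answer; it reuses the variables from the code above, and relies on Json.NET's JsonReader.Skip() to advance past the array's contents without buffering them:

// First download: write the template only, skipping the huge array entirely.
using (Stream stream = client.GetStreamAsync(objUri).Result)
using (JsonReader reader = new JsonTextReader(new StreamReader(stream)))
using (JsonWriter templateWriter = new JsonTextWriter(new StreamWriter(templateFileName)))
{
    if (reader.Read() && reader.TokenType == JsonToken.StartObject)
    {
        templateWriter.WriteStartObject();
        while (reader.Read() && reader.TokenType != JsonToken.EndObject)
        {
            string propertyName = (string)reader.Value;
            reader.Read();
            templateWriter.WritePropertyName(propertyName);
            if (propertyName == arrayPropertyName)
            {
                reader.Skip();  // discard the array instead of writing it to disk
                templateWriter.WriteStartObject();  // empty placeholder object
                templateWriter.WriteEndObject();
            }
            else if (reader.TokenType == JsonToken.StartObject ||
                     reader.TokenType == JsonToken.StartArray)
            {
                templateWriter.WriteToken(reader);
            }
            else
            {
                templateWriter.WriteValue(reader.Value);
            }
        }
        templateWriter.WriteEndObject();
    }
}

The second download would then process the array with the template as before, writing the output files directly instead of going through a temporary array file.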
I am using Newtonsoft JSON to parse some output from an API. However, the API result is nearly 2.5 MB, and parsing out the entire file just to find the data I need takes a long time. Here is a snippet of the API output:
{
  "response": {
    "success": 1,
    "current_time": 1416085203,
    "raw_usd_value": 0.2,
    "usd_currency": "metal",
    "usd_currency_index": 5002,
    "items": {
      "A Brush with Death": {
        "defindex": [
          30186
        ],
        "prices": {
          "6": {
            "Tradable": {
              "Craftable": [
                {
                  "currency": "metal",
                  "value": 3,
                  "last_update": 1414184620,
                  "difference": -0.165
                }
              ]
            }
          }
        }
      },
My code is supposed to find the only object that is a child of the items object with the number '5021' in the defindex array, and pull out the currency and value data. Here is the code I use to find the data:
dynamic result = Newtonsoft.Json.JsonConvert.DeserializeObject(priceFile);
int keyprice = 0;
foreach (var items in result.response.items)
{
    foreach (var item in items)
    {
        string indexstr = item.defindex.ToString();
        if (indexstr.Contains(defindex))
        {
            foreach (var price in item.prices)
            {
                foreach (var quality in price)
                {
                    Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
                    keyprice = quality.Tradable.Craftable[0].value;
                    return keyprice;
                }
            }
        }
    }
}
Ideally, the code should only take up to 10 seconds to run.
I would create a class for the response object, and then use code similar to the following. I tested it on a 2.8 MB JSON file and it averaged about 1.2 seconds. Also try fastJSON (there is a NuGet package); it is the fastest parser I have found.
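The answer does not show the response classes, so here is a minimal sketch of what they might look like, inferred from the JSON snippet above. All class and property names here are assumptions, and the query code below is adjusted to match them:

using System.Collections.Generic;

// Hypothetical classes inferred from the API snippet; not part of the original answer.
public class PriceEntry
{
    public string currency { get; set; }
    public decimal value { get; set; }
    public long last_update { get; set; }
    public decimal difference { get; set; }
}

public class TradableInfo
{
    public List<PriceEntry> Craftable { get; set; }
}

public class QualityInfo
{
    public TradableInfo Tradable { get; set; }
}

public class Item
{
    public List<int> defindex { get; set; }
    public Dictionary<string, QualityInfo> prices { get; set; }  // keyed by quality id, e.g. "6"
}

public class ResponseBody
{
    public int success { get; set; }
    public Dictionary<string, Item> items { get; set; }  // keyed by item name
}

public class ApiResult
{
    public ResponseBody response { get; set; }
}

With those in place, the lookup code becomes: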
string fileName = @"c:\temp\json\yourfile.json";
string json;
using (StreamReader sr = new StreamReader(fileName))
{
    json = sr.ReadToEnd();
}
int keyprice = 0;
var myResponse = fastJSON.JSON.ToObject<ApiResult>(json);
var item = myResponse.response.items.Values.First(i => i.defindex.Contains(5051));
foreach (var quality in item.prices.Values)
{
    Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
    keyprice = (int)quality.Tradable.Craftable[0].value;
    return keyprice;
}
I have some JSON data which looks like this:
{
  "response": {
    "_token": "StringValue",
    "code": "OK",
    "user": {
      "userid": "2630944",
      "firstname": "John",
      "lastname": "Doe",
      "reference": "999999999",
      "guid": "StringValue",
      "domainid": "99999",
      "username": "jdoe",
      "email": "jdoe@jdoe.edu",
      "passwordquestion": "",
      "flags": "0",
      "lastlogindate": "2013-02-05T17:54:06.31Z",
      "creationdate": "2011-04-15T14:40:07.22Z",
      "version": "3753",
      "data": {
        "aliasname": {
          "$value": "John Doe"
        },
        "smsaddress": {
          "$value": "5555555555@messaging.sprintpcs.com"
        },
        "blti": {
          "hideemail": "false",
          "hidefullname": "false"
        },
        "notify": {
          "grades": {
            "$value": "0"
          },
          "messages": {
            "$value": "1"
          }
        },
        "beta_component_courseplanexpress_1": {
          "$value": "true"
        }
      }
    }
  }
}
I am using C# with Json.NET to parse through the data. I've been able to successfully get data using this algorithm:
User MyUser = new User();
JToken data = JObject.Parse(json);
MyUser.FirstName = (string) data.SelectToken("response.user.firstname");
//The same for all the other properties.
The problem is with the data field. This field is based mostly on user preferences, and data is only inserted as it is used. The fields are all custom, and developers can put in as many as they want without restriction. Essentially, it's all free-form data. As you can see, the values can also be nested quite deeply.
I've tried to run:
MyUser.Data = JsonConvert.DeserializeObject<List<JToken>>((string)data.SelectToken("response.user.data"));
which doesn't work.
How would you go about converting it to be used in a C# object?
You can access it via the JToken / JArray / JObject methods. For example, this will list all of the keys under the data:
public class StackOverflow_14714085
{
    const string JSON = @"{
        ""response"": {
            ""_token"": ""StringValue"",
            ""code"": ""OK"",
            ""user"": {
                ""userid"": ""2630944"",
                ""firstname"": ""John"",
                ""lastname"": ""Doe"",
                ""reference"": ""999999999"",
                ""guid"": ""StringValue"",
                ""domainid"": ""99999"",
                ""username"": ""jdoe"",
                ""email"": ""jdoe@jdoe.edu"",
                ""passwordquestion"": """",
                ""flags"": ""0"",
                ""lastlogindate"": ""2013-02-05T17:54:06.31Z"",
                ""creationdate"": ""2011-04-15T14:40:07.22Z"",
                ""version"": ""3753"",
                ""data"": {
                    ""aliasname"": {
                        ""$value"": ""John Doe""
                    },
                    ""smsaddress"": {
                        ""$value"": ""5555555555@messaging.sprintpcs.com""
                    },
                    ""blti"": {
                        ""hideemail"": ""false"",
                        ""hidefullname"": ""false""
                    },
                    ""notify"": {
                        ""grades"": {
                            ""$value"": ""0""
                        },
                        ""messages"": {
                            ""$value"": ""1""
                        }
                    },
                    ""beta_component_courseplanexpress_1"": {
                        ""$value"": ""true""
                    }
                }
            }
        }
    }";

    public static void Test()
    {
        var jo = JObject.Parse(JSON);
        var data = (JObject)jo["response"]["user"]["data"];
        foreach (var item in data)
        {
            Console.WriteLine("{0}: {1}", item.Key, item.Value);
        }
    }
}
Json.NET can actually parse to a dynamic, if that is useful to you, which means you can do something like this:
dynamic parsedObject = JsonConvert.DeserializeObject("{ test: \"text-value\" }");
var byIndexer = parsedObject["test"]; // "text-value"
var byMember = parsedObject.test;     // "text-value"
var missing = parsedObject.notHere;   // null
Edit: it might be more suitable for you to iterate the values, though, if you don't know what you are looking for:
dynamic parsedObject = JsonConvert.DeserializeObject("{ test: { inner: \"text-value\" } }");
foreach (dynamic entry in parsedObject)
{
    string name = entry.Name;    // "test"
    dynamic value = entry.Value; // { inner: "text-value" }
}
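Since the data block is free-form, a small recursive walk can also flatten it into path/value pairs. The helper below is a sketch of one approach using JToken; it is hypothetical, not part of the original answers, and assumes the "$value" leaf convention from the sample:

using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

static class FreeFormData
{
    // Hypothetical helper: recursively flatten a free-form JObject into
    // dotted-path/value pairs, unwrapping "$value" leaves along the way.
    public static IEnumerable<KeyValuePair<string, string>> Flatten(JToken token, string path = "")
    {
        if (token is JObject obj)
        {
            foreach (var prop in obj.Properties())
            {
                if (prop.Name == "$value")
                {
                    yield return new KeyValuePair<string, string>(path, (string)prop.Value);
                }
                else
                {
                    var childPath = path.Length == 0 ? prop.Name : path + "." + prop.Name;
                    foreach (var pair in Flatten(prop.Value, childPath))
                        yield return pair;
                }
            }
        }
        else
        {
            yield return new KeyValuePair<string, string>(path, token.ToString());
        }
    }
}

// Usage against the "data" node from the sample, printing lines such as
// "aliasname: John Doe" and "blti.hideemail: false":
// foreach (var kv in FreeFormData.Flatten(data))
//     Console.WriteLine("{0}: {1}", kv.Key, kv.Value);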