Generic code to split large JSON based on node name - C#

I have a very large JSON file; the cars array below can have up to 100,000,000 records, and the total file size can vary from 500 MB to 10 GB. I am using Newtonsoft Json.NET.
Input
{
"name": "John",
"age": "30",
"cars": [{
"brand": "ABC",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "1",
"day": "1"
}, {
"brand": "XYZ",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "10",
"day": "01"
}],
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"}
Desired Output
File 1 Json
{
"name": "John",
"age": "30",
"cars": {
"brand": "ABC",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "1",
"day": "1"
},
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"
}
File 2 Json
{
"name": "John",
"age": "30",
"cars": {
"brand": "XYZ",
"models": ["Alhambra", "Altea", "AlteaXL", "Arosa", "Cordoba", "CordobaVario", "Exeo", "Ibiza", "IbizaST", "ExeoST", "Leon", "LeonST", "Inca", "Mii", "Toledo"],
"year": "2019",
"month": "10",
"day": "01"
},
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"
}
So I came up with the following code, which kinda works:
public static void SplitJson(Uri objUri, string splitbyProperty)
{
try
{
bool readinside = false;
HttpClient client = new HttpClient();
using (Stream stream = client.GetStreamAsync(objUri).Result)
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
Node objnode = new Node();
while (reader.Read())
{
JObject obj = new JObject(reader);
if (reader.TokenType == JsonToken.String && reader.Path.ToString().Contains("name") && !reader.Value.ToString().Equals(reader.Path.ToString()))
{
objnode.name = reader.Value.ToString();
}
if (reader.TokenType == JsonToken.Integer && reader.Path.ToString().Contains("age") && !reader.Value.ToString().Equals(reader.Path.ToString()))
{
objnode.age = reader.Value.ToString();
}
if (reader.Path.ToString().Contains(splitbyProperty) && reader.TokenType == JsonToken.StartArray)
{
int counter = 0;
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
counter = counter + 1;
var item = JsonSerializer.Create().Deserialize<Car>(reader);
objnode.cars = new List<Car>();
objnode.cars.Add(item);
insertIntoFileSystem(objnode, counter);
}
if (reader.TokenType == JsonToken.EndArray)
break;
}
}
}
}
}
catch (Exception)
{
throw;
}
}
public static void insertIntoFileSystem(Node objNode, int counter)
{
string fileName = @"C:\Temp\output_" + objNode.name + "_" + objNode.age + "_" + counter + ".json";
var serialiser = new JsonSerializer();
using (TextWriter tw = new StreamWriter(fileName))
{
using (StringWriter textWriter = new StringWriter())
{
serialiser.Serialize(textWriter, objNode);
tw.WriteLine(textWriter);
}
}
}
ISSUE
Any field after the array is not captured when the file is large. Is there a way to skip the large array, or to process the reader over it in parallel? In short, I am not able to capture the part below using my code:
"TestCity": "TestCityValue",
"TestCity1": "TestCityValue1"}

You are going to need to process your large JSON file in two passes to achieve the result you want.
In the first pass, split the file into two: create a file containing just the huge array, and a second file which contains all the other information, which will be used as a template for the individual JSON files you ultimately want to create.
In the second pass, read the template file into memory (I'm assuming this part of the JSON is relatively smallish so this should not be a problem), then use a reader to process the array file one item at a time. For each item, combine it with the template and write it to a separate file.
At the end, you can delete the temporary array and template files.
Here is what it might look like in code:
using System.IO;
using System.Text;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
public static void SplitJson(Uri objUri, string arrayPropertyName)
{
string templateFileName = @"C:\Temp\template.json";
string arrayFileName = @"C:\Temp\array.json";
// Split the original JSON stream into two temporary files:
// one that has the huge array and one that has everything else
HttpClient client = new HttpClient();
using (Stream stream = client.GetStreamAsync(objUri).Result)
using (JsonReader reader = new JsonTextReader(new StreamReader(stream)))
using (JsonWriter templateWriter = new JsonTextWriter(new StreamWriter(templateFileName)))
using (JsonWriter arrayWriter = new JsonTextWriter(new StreamWriter(arrayFileName)))
{
if (reader.Read() && reader.TokenType == JsonToken.StartObject)
{
templateWriter.WriteStartObject();
while (reader.Read() && reader.TokenType != JsonToken.EndObject)
{
string propertyName = (string)reader.Value;
reader.Read();
templateWriter.WritePropertyName(propertyName);
if (propertyName == arrayPropertyName)
{
arrayWriter.WriteToken(reader);
templateWriter.WriteStartObject(); // empty placeholder object
templateWriter.WriteEndObject();
}
else if (reader.TokenType == JsonToken.StartObject ||
reader.TokenType == JsonToken.StartArray)
{
templateWriter.WriteToken(reader);
}
else
{
templateWriter.WriteValue(reader.Value);
}
}
templateWriter.WriteEndObject();
}
}
// Now read the huge array file and combine each item in the array
// with the template to make new files
JObject template = JObject.Parse(File.ReadAllText(templateFileName));
using (JsonReader arrayReader = new JsonTextReader(new StreamReader(arrayFileName)))
{
int counter = 0;
while (arrayReader.Read())
{
if (arrayReader.TokenType == JsonToken.StartObject)
{
counter++;
JObject item = JObject.Load(arrayReader);
template[arrayPropertyName] = item;
string fileName = string.Format(@"C:\Temp\output_{0}_{1}_{2}.json",
template["name"], template["age"], counter);
File.WriteAllText(fileName, template.ToString());
}
}
}
// Clean up temporary files
File.Delete(templateFileName);
File.Delete(arrayFileName);
}
Note the above approach will require double the disk space of the original JSON during processing due to the temporary files. If this is a problem, you can modify the code to download the file twice instead (although this will likely increase the processing time). In the first download, create the template JSON and ignore the array; in the second download, advance to the array and process it with the template as before to create the output files.
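For reference, here is a sketch of how the first pass of that two-download variant could skip the array (untested; it reuses the property loop above and Json.NET's JsonReader.Skip()):
if (propertyName == arrayPropertyName)
{
    // Discard the huge array instead of copying it to a temporary file
    reader.Skip();
    templateWriter.WriteStartObject(); // empty placeholder object, as before
    templateWriter.WriteEndObject();
}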

Related

Avoid unnecessary for loops in recursive function in C#

I have a CSV template file listing data files of different types, with filenames in the format below, which I read and store in a CSVTemplate model.
Note: within each file type, the CSV rows are ordered by file name.
id,FileType,FileName
1,Excel Files, Excel File 1
2,Excel Files, Excel File 2
3,Excel Files, Excel File 2.1
4,Document Files, Document File 1
5,Document Files, Document File 2
6,Document Files, Document File 3
7,Document Files, Document File 3.1
8,Document Files, Document File 3.2
Model:
public class CSVTemplate
{
public int id { get; set; }
public string FileType { get; set; }
public string FileName { get; set; }
}
Code for reading CSV records (I am using the CsvHelper NuGet package):
public List<CSVTemplate> Fetch_TemplateData()
{
List<CSVTemplate> csvTemplateData = new List<CSVTemplate>();
try
{
string csvFilePath = @"File1.csv";
//CSVHelper Configuration Settings
var csvconfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
NewLine = Environment.NewLine,
HasHeaderRecord = true,
DetectDelimiterValues = new string[] { "," },
};
//Reading CSV File
using (var reader = new StreamReader(csvFilePath))
{
//Fetching CSV Records
using (var csv = new CsvReader(reader, csvconfig))
{
csvTemplateData = csv.GetRecords<CSVTemplate>().ToList();
}
};
return csvTemplateData;
}
catch (Exception)
{
throw;
}
}
I have the JSON response below from a web API; it contains the file names (in title) for all file types, nested to multiple levels.
JSON:
{
"Field": [
{
"title": "Excel Files",
"Field": [
{
"title": "Excel File 1"
},
{
"title": "Excel File 2",
"Field": [
{
"title": "Excel File 2.1"
},
{
"title": "Excel File 2.2"
}
]
},
{
"title": "Excel File 3",
"Field": [
{
"title": "Excel File 3.1",
"Field": [
{
"title": "Excel File 3.1.1"
}
]
}
]
}
]
},
{
"title": "Document Files",
"Field": [
{
"title": "Document File 1"
},
{
"title": "Document File 2",
"Field": [
{
"title": "Document File 2.1"
},
{
"title": "Document File 2.2"
}
]
},
{
"title": "Document File 3",
"Field": [
{
"title": "Document File 3.1",
"Field": [
{
"title": "Document File 3.1.1"
}
]
}
]
}
]
}
]
}
The CSV file contains the mandatory file names for all file types. The web API response (JSON) contains both mandatory and non-mandatory file names. I need to compare the CSV against the web API data and check whether any mandatory filenames are missing.
Model:
public class Field
{
public string title { get; set; }
[JsonProperty("Field")] // a member cannot share its enclosing type's name, so map the JSON name explicitly
public List<Field> Fields { get; set; }
}
public class Root
{
public List<Field> Field { get; set; }
}
Updated Code:
public bool Validate_TemplateData(List<CSVTemplate> csvData)
{
bool isTemplateValidate = false;
try
{
string jsonFile = @".\json1.json";
string jsonString = File.ReadAllText(jsonFile);
Root jsondata = JsonConvert.DeserializeObject<Root>(jsonString);
if (jsondata != null)
{
bool isMatched = false;
foreach (var csvRowData in csvData)
{
var csvTitle = csvRowData.FileName;
foreach (var field in jsondata.Field)
{
isMatched = CheckFieldTitle(field, 0, csvTitle);
if (isMatched)
{
break;
}
}
if (!isMatched)
{
Console.WriteLine($"Title : {csvTitle} Not matched.");
break;
}
}
}
}
catch (Exception)
{
throw;
}
return isTemplateValidate;
}
Recursive function to read all nested field names/titles from the JSON:
private static bool CheckFieldTitle(Field field, int level, string csvTitle)
{
bool isTitleMatched = false;
try
{
var fieldTitle = field.title;
if (csvTitle.Trim().ToUpper().Equals(fieldTitle.Trim().ToUpper()))
{
Console.WriteLine($"Title: {csvTitle} Matched");
isTitleMatched = true;
}
else if (field.Fields != null)
{
//check in nested fields
foreach (var subField in field.Fields)
{
isTitleMatched = CheckFieldTitle(subField, level + 1, csvTitle);
if (isTitleMatched)
{
return true;
}
}
}
}
catch (Exception)
{
isTitleMatched = false;
throw;
}
return isTitleMatched;
}
Output:
I want to output the file names that are present in the CSV file but missing from the JSON data.
Document File 3.2
Conditions:
If the comparison fails at any level of the loop for any file type, the rest of the comparison should not proceed.
Example: if the comparison fails at Excel File 2, print Excel File 2 and stop comparing all remaining files and file types.
The comparison should check each CSV row against each web API field title (JSON). It should not fetch all JSON titles into one list and all CSV FileNames into another and then diff the two lists.
The comparison should follow the order of the JSON data (in this example, first compare all Excel files, then Document files).
Problem:
The code is partially working, but numerous loops happen in the recursive function, which I would like to avoid.
Best Options:
Is there any third-party NuGet package (for object comparison) that would help optimize this kind of code? I was unable to find one.
I was also unable to optimize the foreach calls with a LINQ query.
I suggest you group the files by FileType from the JSON and then find the first missing CSV file.
First, convert the JSON into a Dictionary by FileType and gather all the titles into a HashSet<string>:
var jData = JToken.Parse(jsonString);
var jd = jData.SelectToken("Field")
.ToDictionary(f => f["title"],
f => f["Field"].SelectTokens("..title")
.Select(jt => jt.ToString())
.ToHashSet());
Then, convert the CSV file into a list of CSVTemplate. I just did some simple code (not for production use):
var csvTemplateData = csvSrc.Split('\n').Skip(1).Select(line => line.Split(','))
.Select(va => new CSVTemplate {
id = va[0].ToInt(),
FileType = va[1],
FileName = va[2].Trim()
})
.ToList();
(ToInt() is the obvious extension method.)
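For completeness, a minimal ToInt() might look like this (my sketch, not part of the original answer):
public static class StringExtensions
{
    public static int ToInt(this string s) => int.Parse(s.Trim());
}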
Finally, you can find the first missing filename by checking each CSVTemplate in order to see if it is present:
var firstMissing = csvTemplateData.FirstOrDefault(t => !jd[t.FileType]
.Contains(t.FileName))
?.FileName;
firstMissing will be null if no missing CSV FileName is found, otherwise it will contain the FileName. If you prefer, you can leave off the ?.FileName to get the first missing CSVTemplate.

How to output JSON array as a single field in CSV using ChoETL

I'm using ChoETL to convert JSON to CSV. Currently, if a property in the JSON object is an array, it is output as separate fields in the CSV.
Example:
{
"id", 1234,
"states": [
"PA",
"VA"
]
},
{
"id", 1235,
"states": [
"CA",
"DE",
"MD"
]
},
This results in CSV like this (using pipe as a delimiter):
"id"|"states_0"|"states_1"|"states_2"
"1234"|"PA"|"VA"
"1235"|"CA"|"DE"|"MD"
What I would like is for the array to be output in a single states field as a comma-separated string:
"id"|"states"
"1234"|"PA,VA"
"1235"|"CA,DE,MD"
Here is the code I have in place to perform the parsing and transformation.
public static class JsonCsvConverter
{
public static string ConvertJsonToCsv(string json)
{
var csvData = new StringBuilder();
using (var jsonReader = ChoJSONReader.LoadText(json))
{
using (var csvWriter = new ChoCSVWriter(csvData).WithFirstLineHeader())
{
csvWriter.WithMaxScanRows(1000);
csvWriter.Configuration.Delimiter = "|";
csvWriter.Configuration.QuoteAllFields = true;
csvWriter.Write(jsonReader);
}
}
return csvData.ToString();
}
}
Edited: Removed test code that wasn't useful
You can produce the expected output using the code below:
var csvData = new StringBuilder();
using (var jsonReader = ChoJSONReader.LoadText(json))
{
using (var csvWriter = new ChoCSVWriter(csvData)
.WithFirstLineHeader()
.WithDelimiter("|")
.QuoteAllFields()
.Configure(c => c.UseNestedKeyFormat = false)
.WithField("id")
.WithField("states", m => m.ValueConverter(o => String.Join(",", ((Array)o).OfType<string>())))
)
{
csvWriter.Write(jsonReader);
}
}
Console.WriteLine(csvData.ToString());
Output:
id|states
"1234"|"PA,VA"
"1235"|"CA,DE,MD"
PS: in the next release, this issue will be handled automatically, without needing value converters.

JSON repeating inputs section from Excel

I have a specific JSON string that I need to match for a REST call. I'm pulling the data from an Excel spreadsheet. One of the sections has repeating inputs like below, drawn from a two-column (name, value) worksheet.
The JSON I need to generate looks like:
"detailInputs": [
{
"name": "SOGrid",
"repeatingInputs": [
{
"inputs": [
{
"name": "ItemNumber",
"value": "XYZ"
},
{
"name": "Quantity",
"value": "1"
}
]
},
{
"inputs": [
{
"name": "ItemNumber",
"value": "ABC"
},
{
"name": "Quantity",
"value": "3"
}
]
}
]
}
]
What I've tried so far is below (note: jsonArrayString is the header information formatted in a previous section):
using (var conn = new OleDbConnection(connectionString))
{
sheetName = "Detail";
conn.Open();
var cmd = conn.CreateCommand();
cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
using (var rdr = cmd.ExecuteReader())
{
var query = rdr.Cast<DbDataRecord>().Select(row => new {
name = row[0],
value = row[1],
//description = row[2]
});
var json = JsonConvert.SerializeObject(query);
jsonArrayString = jsonArrayString + ",\"detailInputs\":[{\"name\":\"SOGrid\",\"repeatingInputs\":[{\"inputs\": " + json + "}]}]}";
This is very close, but it puts all of the repeating inputs into a single inputs section.
I also tried assigning the values to a dictionary and a list, hoping to pull out the appropriate pairs and build the JSON from them. Below is the beginning of that attempt, but I'm not familiar enough with unraveling key/value pairs to format the output correctly.
using (var conn = new OleDbConnection(connectionString))
{
sheetName = "Detail";
conn.Open();
int counter = 0;
var cmd = conn.CreateCommand();
cmd.CommandText = $"SELECT * FROM [{sheetName}$]";
var values = new List<Dictionary<string, object>>();
var ListValues = new List<string>();
using (var rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
var fieldValues = new Dictionary<string, object>();
var fieldValuesList = new List<string>();
for (int i = 0; i < rdr.FieldCount; i++)
{
fieldValues.Add(rdr.GetName(i), rdr[i]);
fieldValuesList.Add(rdr.GetName(i));
}
// add the dictionary on the values list
values.Add(fieldValues);
}
The root question is: how can I create a repeating-inputs structure, as shown in the JSON sample, by pulling from the Excel data?
What you want to do is to serialize the contents of the Excel worksheet as the array value of the "repeatingInputs" property, using a specific structure. I would suggest breaking this down into a series of LINQ transformations.
First, introduce a couple of extension methods:
public static class DataReaderExtensions
{
// Adapted from this answer https://stackoverflow.com/a/1202973
// To https://stackoverflow.com/questions/1202935/convert-rows-from-a-data-reader-into-typed-results
// By https://stackoverflow.com/users/3043/joel-coehoorn
public static IEnumerable<T> SelectRows<T>(this IDataReader reader, Func<IDataRecord, T> select)
{
while (reader.Read())
{
yield return select(reader);
}
}
}
public static class EnumerableExtensions
{
// Adapted from this answer https://stackoverflow.com/a/419058
// To https://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq/
// By https://stackoverflow.com/users/50776/casperone
public static IEnumerable<List<T>> ChunkWhile<T>(this IEnumerable<T> enumerable, Func<List<T>, T, bool> shouldAdd)
{
if (enumerable == null || shouldAdd == null)
throw new ArgumentNullException();
return enumerable.ChunkWhileIterator(shouldAdd);
}
static IEnumerable<List<T>> ChunkWhileIterator<T>(this IEnumerable<T> enumerable, Func<List<T>, T, bool> shouldAdd)
{
List<T> list = new List<T>();
foreach (var item in enumerable)
{
if (list.Count > 0 && !shouldAdd(list, item))
{
yield return list;
list = new List<T>();
}
list.Add(item);
}
if (list.Count != 0)
{
yield return list;
}
}
}
The first method packages an IDataReader into an enumerable of typed objects, one for each row. Doing this makes it easier to feed the data reader's contents into subsequent LINQ transformations. The second method breaks a flat enumerable into an enumerable of "chunks" of lists, based on some predicate condition. This will be used to break the rows into chunks at each ItemNumber row.
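To illustrate ChunkWhile in isolation, here is a small hypothetical example using the same predicate as the code below; a new chunk starts whenever the incoming row's name matches the first name of the current chunk:
var rows = new[]
{
    new { Name = "ItemNumber", Value = "XYZ" },
    new { Name = "Quantity",   Value = "1" },
    new { Name = "ItemNumber", Value = "ABC" },
    new { Name = "Quantity",   Value = "3" },
};
// Yields two chunks: [ItemNumber/XYZ, Quantity/1] and [ItemNumber/ABC, Quantity/3]
var chunks = rows.ChunkWhile((list, next) => list[0].Name != next.Name).ToList();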
Using these two extension methods we can generate the required JSON as follows:
public static string ExtractRows(string connectionString, string sheetName)
{
using (var conn = new OleDbConnection(connectionString))
{
conn.Open();
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = string.Format("SELECT * FROM [{0}]", sheetName);
using (var rdr = cmd.ExecuteReader())
{
var query = rdr
// Wrap the IDataReader in a LINQ enumerator returning an array of key/value pairs for each row.
// Project the first two columns into a single anonymous object.
.SelectRows(r =>
{
// Check we have two columns in the row, and the first (Name) column value is non-null.
// You might instead check that we have at least two columns.
if (r.FieldCount != 2 || r.IsDBNull(0))
throw new InvalidDataException();
return new { Name = r[0].ToString(), Value = r[1] };
})
// Break the columns into chunks when the first name repeats
.ChunkWhile((l, r) => l[0].Name != r.Name)
// Wrap in the container Inputs object
.Select(r => new { Inputs = r });
// Serialize in camel case
var settings = new JsonSerializerSettings
{
ContractResolver = new CamelCasePropertyNamesContractResolver(),
};
return JsonConvert.SerializeObject(query, Formatting.Indented, settings);
}
}
}
}
Which will generate the required value for "repeatingInputs":
[
{
"inputs": [
{
"name": "ItemNumber",
"value": "XYZ"
},
{
"name": "Quantity",
"value": "1"
}
]
},
{
"inputs": [
{
"name": "ItemNumber",
"value": "ABC"
},
{
"name": "Quantity",
"value": "3"
}
]
}
]
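If you also need the surrounding "detailInputs" envelope from the question, one option is to wrap the query result before serializing; this is a sketch (the property names simply follow the question's sample) that would replace the final serialization step above:
var payload = new
{
    detailInputs = new[]
    {
        new { name = "SOGrid", repeatingInputs = query }
    }
};
string json = JsonConvert.SerializeObject(payload, Formatting.Indented, settings);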

JSON parsing takes way too long

I am using Newtonsoft JSON to parse some output from an API. However, the API result is nearly 2.5 MB, and parsing the entire file just to find the data I need takes a long time. Here is a snippet of the API output:
{
"response": {
"success": 1,
"current_time": 1416085203,
"raw_usd_value": 0.2,
"usd_currency": "metal",
"usd_currency_index": 5002,
"items": {
"A Brush with Death": {
"defindex": [
30186
],
"prices": {
"6": {
"Tradable": {
"Craftable": [
{
"currency": "metal",
"value": 3,
"last_update": 1414184620,
"difference": -0.165
}
]
}
}
}
},
My code is supposed to find the only child of the items object with the number 5021 in its defindex array, and pull out the currency and value data. Here is the code I use to find the data:
dynamic result = Newtonsoft.Json.JsonConvert.DeserializeObject(priceFile);
int keyprice = 0;
foreach(var items in result.response.items){
foreach(var item in items){
string indexstr = item.defindex.ToString();
if (indexstr.Contains(defindex))
{
foreach(var price in item.prices){
foreach (var quality in price)
{
Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
keyprice = quality.Tradable.Craftable[0].value;
return keyprice;
}
}
}
}
}
Ideally, the code should only take up to 10 seconds to run.
I would create a class for the response object and then use code similar to the following. I tested on a 2.8 MB JSON file and it averaged about 1.2 seconds. Also try fastJSON (there is a NuGet package); it is the fastest parser I have found.
string fileName = @"c:\temp\json\yourfile.json";
string json;
using (StreamReader sr = new StreamReader(fileName))
{
json = sr.ReadToEnd();
}
response myResponse = fastJSON.JSON.ToObject<response>(json);
int keyprice = 0;
var item = myResponse.First(i => i.defindex == "5021");
foreach (var price in item.prices)
{
foreach (var quality in price)
{
Console.WriteLine("{0} {1}", quality.Tradable.Craftable[0].value, quality.Tradable.Craftable[0].currency);
keyprice = quality.Tradable.Craftable[0].value;
return keyprice;
}
}
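The answer above does not show the response class. Here is one possible shape inferred from the API snippet in the question; all class and property names below are assumptions, and the lookup is shown with Json.NET (substitute fastJSON.JSON.ToObject<ApiRoot>(json) if you prefer fastJSON):
public class Listing
{
    public string currency { get; set; }
    public double value { get; set; }
    public long last_update { get; set; }
    public double difference { get; set; }
}

public class ItemEntry
{
    public List<int> defindex { get; set; }
    // quality id -> tradability -> craftability -> listings
    public Dictionary<string, Dictionary<string, Dictionary<string, List<Listing>>>> prices { get; set; }
}

public class ResponseBody
{
    public Dictionary<string, ItemEntry> items { get; set; }
}

public class ApiRoot
{
    public ResponseBody response { get; set; }
}

// Usage sketch: find the item whose defindex array contains 5021
var root = JsonConvert.DeserializeObject<ApiRoot>(json);
var match = root.response.items.Values.First(i => i.defindex.Contains(5021));
var listing = match.prices.Values.First()["Tradable"]["Craftable"][0];
Console.WriteLine("{0} {1}", listing.value, listing.currency);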

How do I get formatted JSON in .NET using C#?

I am using the .NET JSON parser and would like to serialize my config file so that it is readable. So instead of:
{"blah":"v", "blah2":"v2"}
I would like something nicer like:
{
"blah":"v",
"blah2":"v2"
}
My code is something like this:
using System.Web.Script.Serialization;
var ser = new JavaScriptSerializer();
configSz = ser.Serialize(config);
using (var f = (TextWriter)File.CreateText(configFn))
{
f.WriteLine(configSz);
f.Close();
}
You are going to have a hard time accomplishing this with JavaScriptSerializer.
Try JSON.Net.
With minor modifications from JSON.Net example
using System;
using Newtonsoft.Json;
namespace JsonPrettyPrint
{
internal class Program
{
private static void Main(string[] args)
{
Product product = new Product
{
Name = "Apple",
Expiry = new DateTime(2008, 12, 28),
Price = 3.99M,
Sizes = new[] { "Small", "Medium", "Large" }
};
string json = JsonConvert.SerializeObject(product, Formatting.Indented);
Console.WriteLine(json);
Product deserializedProduct = JsonConvert.DeserializeObject<Product>(json);
}
}
internal class Product
{
public String[] Sizes { get; set; }
public decimal Price { get; set; }
public DateTime Expiry { get; set; }
public string Name { get; set; }
}
}
Results
{
"Sizes": [
"Small",
"Medium",
"Large"
],
"Price": 3.99,
"Expiry": "\/Date(1230447600000-0700)\/",
"Name": "Apple"
}
Documentation: Serialize an Object
A shorter sample code for Json.Net library
private static string FormatJson(string json)
{
dynamic parsedJson = JsonConvert.DeserializeObject(json);
return JsonConvert.SerializeObject(parsedJson, Formatting.Indented);
}
If you have a JSON string and want to "prettify" it, but don't want to serialise it to and from a known C# type then the following does the trick (using JSON.NET):
using System;
using System.IO;
using Newtonsoft.Json;
class JsonUtil
{
public static string JsonPrettify(string json)
{
using (var stringReader = new StringReader(json))
using (var stringWriter = new StringWriter())
{
var jsonReader = new JsonTextReader(stringReader);
var jsonWriter = new JsonTextWriter(stringWriter) { Formatting = Formatting.Indented };
jsonWriter.WriteToken(jsonReader);
return stringWriter.ToString();
}
}
}
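For example, with the question's input:
Console.WriteLine(JsonUtil.JsonPrettify(@"{""blah"":""v"", ""blah2"":""v2""}"));
// {
//   "blah": "v",
//   "blah2": "v2"
// }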
Shortest version to prettify existing JSON (using JSON.NET):
JToken.Parse("mystring").ToString()
Input:
{"menu": { "id": "file", "value": "File", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } }}
Output:
{
"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{
"value": "New",
"onclick": "CreateNewDoc()"
},
{
"value": "Open",
"onclick": "OpenDoc()"
},
{
"value": "Close",
"onclick": "CloseDoc()"
}
]
}
}
}
To pretty-print an object:
JToken.FromObject(myObject).ToString()
One-liner using Newtonsoft.Json.Linq:
string prettyJson = JToken.Parse(uglyJsonString).ToString(Formatting.Indented);
For a .NET Core app:
var js = JsonSerializer.Serialize(obj, new JsonSerializerOptions {
WriteIndented = true
});
All this can be done in one simple line:
string jsonString = JsonConvert.SerializeObject(yourObject, Formatting.Indented);
Here is a solution using Microsoft's System.Text.Json library:
static string FormatJsonText(string jsonString)
{
using var doc = JsonDocument.Parse(
jsonString,
new JsonDocumentOptions
{
AllowTrailingCommas = true
}
);
MemoryStream memoryStream = new MemoryStream();
using (
var utf8JsonWriter = new Utf8JsonWriter(
memoryStream,
new JsonWriterOptions
{
Indented = true
}
)
)
{
doc.WriteTo(utf8JsonWriter);
}
return new System.Text.UTF8Encoding()
.GetString(memoryStream.ToArray());
}
You may use the following standard method for getting formatted JSON:
JsonReaderWriterFactory.CreateJsonWriter(Stream stream, Encoding encoding, bool ownsStream, bool indent, string indentChars)
Just set indent to true.
Try something like this
public readonly DataContractJsonSerializerSettings Settings =
new DataContractJsonSerializerSettings
{ UseSimpleDictionaryFormat = true };
public void Keep<TValue>(TValue item, string path)
{
try
{
using (var stream = File.Open(path, FileMode.Create))
{
//var currentCulture = Thread.CurrentThread.CurrentCulture;
//Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
try
{
using (var writer = JsonReaderWriterFactory.CreateJsonWriter(
stream, Encoding.UTF8, true, true, " "))
{
var serializer = new DataContractJsonSerializer(typeof(TValue), Settings);
serializer.WriteObject(writer, item);
writer.Flush();
}
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
}
finally
{
//Thread.CurrentThread.CurrentCulture = currentCulture;
}
}
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
}
}
Pay attention to these lines:
var currentCulture = Thread.CurrentThread.CurrentCulture;
Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
....
Thread.CurrentThread.CurrentCulture = currentCulture;
For some kinds of serializers you should use InvariantCulture to avoid exceptions during deserialization on computers with different regional settings. For example, an invalid format of double or DateTime values sometimes causes them.
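A quick illustration of the problem, using de-DE as a hypothetical example of a comma-decimal culture (requires System.Globalization and System.Threading):
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE");
double d = 3.14;
Console.WriteLine(d.ToString());                             // "3,14" - not a valid JSON number
Console.WriteLine(d.ToString(CultureInfo.InvariantCulture)); // "3.14"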
For deserializing
public TValue Revive<TValue>(string path, params object[] constructorArgs)
{
try
{
using (var stream = File.OpenRead(path))
{
//var currentCulture = Thread.CurrentThread.CurrentCulture;
//Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
try
{
var serializer = new DataContractJsonSerializer(typeof(TValue), Settings);
var item = (TValue) serializer.ReadObject(stream);
if (Equals(item, null)) throw new Exception();
return item;
}
catch (Exception exception)
{
Debug.WriteLine(exception.ToString());
return (TValue) Activator.CreateInstance(typeof(TValue), constructorArgs);
}
finally
{
//Thread.CurrentThread.CurrentCulture = currentCulture;
}
}
}
catch
{
return (TValue) Activator.CreateInstance(typeof (TValue), constructorArgs);
}
}
Thanks!
Using System.Text.Json, set JsonSerializerOptions.WriteIndented = true:
JsonSerializerOptions options = new JsonSerializerOptions { WriteIndented = true };
string json = JsonSerializer.Serialize(yourObject, options);
2023 Update
For those asking how to get formatted JSON in .NET using C# and who want something to use right away, here are the one-line ways to produce an indented JSON string.
There are two well-known JSON serializers:
Newtonsoft Json.NET version:
using Newtonsoft.Json;
var jsonString = JsonConvert.SerializeObject(yourObj, Formatting.Indented);
.NET 7 version:
using System.Text.Json;
var jsonString = JsonSerializer.Serialize(yourObj, new JsonSerializerOptions { WriteIndented = true });
Another System.Text.Json option is to round-trip the string through ExpandoObject:
using System.Text.Json;
...
var parsedJson = JsonSerializer.Deserialize<ExpandoObject>(json);
var options = new JsonSerializerOptions() { WriteIndented = true };
return JsonSerializer.Serialize(parsedJson, options);
First I wanted to add a comment under Duncan Smart's post, but unfortunately I do not have enough reputation yet to leave comments, so I will try it here.
I just want to warn about a side effect: JsonTextReader internally parses the JSON into typed JTokens and then serializes them back.
For example if your original JSON was
{ "double":0.00002, "date":"\/Date(1198908717056)\/"}
After prettifying you get:
{
"double":2E-05,
"date": "2007-12-29T06:11:57.056Z"
}
Of course both JSON strings are equivalent and will deserialize to structurally equal objects, but if you need to preserve the original string values, you need to take this into consideration.
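If preserving the original text matters to you, here is a sketch of a variant of the earlier JsonPrettify that tells the reader not to reinterpret dates and floats (DateParseHandling and FloatParseHandling are Json.NET reader settings):
public static string JsonPrettifyPreserving(string json)
{
    using (var stringReader = new StringReader(json))
    using (var stringWriter = new StringWriter())
    {
        var jsonReader = new JsonTextReader(stringReader)
        {
            DateParseHandling = DateParseHandling.None,     // keep date-like strings as strings
            FloatParseHandling = FloatParseHandling.Decimal // keep 0.00002 from becoming 2E-05
        };
        var jsonWriter = new JsonTextWriter(stringWriter) { Formatting = Formatting.Indented };
        jsonWriter.WriteToken(jsonReader);
        return stringWriter.ToString();
    }
}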
I have something very simple for this. You can pass in virtually any object to be converted to JSON with formatting:
private static string GetJson<T> (T json)
{
return JsonConvert.SerializeObject(json, Formatting.Indented);
}
This worked for me. In case someone is looking for a VB.NET version.
Imports System
Imports System.IO
Imports Newtonsoft.Json
Public Shared Function JsonPrettify(ByVal json As String) As String
Using stringReader = New StringReader(json)
Using stringWriter = New StringWriter()
Dim jsonReader = New JsonTextReader(stringReader)
Dim jsonWriter = New JsonTextWriter(stringWriter) With {
.Formatting = Formatting.Indented
}
jsonWriter.WriteToken(jsonReader)
Return stringWriter.ToString()
End Using
End Using
End Function
.NET 5 has built-in classes for handling JSON parsing, serialization, and deserialization under the System.Text.Json namespace. Below is an example of a serializer that converts a .NET object to a JSON string:
using System.Text.Json;
using System.Text.Json.Serialization;
private string ConvertJsonString(object obj)
{
JsonSerializerOptions options = new JsonSerializerOptions();
options.WriteIndented = true; //Pretty print using indent, white space, new line, etc.
options.NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals; //Allow NANs
string jsonString = JsonSerializer.Serialize(obj, options);
return jsonString;
}
The code below works for me:
JsonConvert.SerializeObject(JToken.Parse(yourobj.ToString()), Formatting.Indented)
For a UTF-8 encoded JSON file using .NET Core 3.1, I was finally able to use JsonDocument, based on this information from Microsoft: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to#utf8jsonreader-utf8jsonwriter-and-jsondocument
// Read the whole file and parse it into a JsonDocument
string allText = File.ReadAllText(filename, Encoding.UTF8);
using JsonDocument jd = JsonDocument.Parse(allText);
var writer = new Utf8JsonWriter(Console.OpenStandardOutput(), new JsonWriterOptions
{
    Indented = true
});
JsonElement root = jd.RootElement;
if (root.ValueKind == JsonValueKind.Object)
{
    writer.WriteStartObject();
    foreach (var jp in root.EnumerateObject())
        jp.WriteTo(writer);
    writer.WriteEndObject();
}
writer.Flush();
