I'm trying to parse a json file and then operate on it to insert data into a SQL Server 2008 Database.
Example:
var sr = new StreamReader("C:\\path to file\\file.json");
var json = JsonSerializer.SerializeToString(sr);
var o = JsonObject.Parse(json);
But I always get this error at the second line - "Timeouts are not supported on this stream."
The Json file looks like this:
"main":{
"prg": [
{
"Id": 1,
"name": "A&E",
more fields
}
"prg": [
{
"Id": 2,
"name": "asda",
more fields
}
}
I need to make something like this
foreach (prg in main)
entity.id = prg.id
entity.name = prg.name
How can I do this, and why do I get that timeout exception?
EDIT: To clarify my question, this is how I do it for an XML file:
XmlDocument sourceDoc = new XmlDocument();
sourceDoc.Load(SourcesElement2); // where SourcesElement2 is the path to my XML
XmlNodeList prg = sourceDoc.GetElementsByTagName("prg");
foreach (XmlNode item in prg)
{
entity.Name= item.SelectSingleNode("name").InnerText;
...
}
I have converted the XML to JSON and I want to do the same thing: for every "prg" node in the JSON file, insert a new item in the database.
EDIT2:
This is what I've done.
using (
StreamReader stream =
File.OpenText(
"C:\\path\\Sources.json")
)
{
JObject sources = (JObject) JToken.ReadFrom(new JsonTextReader(stream));
var a = sources["on"];
var b = a["sources"];
var c = b["prgs"];
foreach (var item in c)
{
var d= item.SelectToken("prg");
// Here d is null
}
}
I have the same question as above: for every "prg" node in the JSON file, how can I insert a new item in the database? (The path to prg is on/sources/prgs/.)
I don't think you want to serialize the stream.
JsonSerializer.SerializeToString(sr)
You want to deserialize from the stream.
JsonSerializer.Deserialize
You might want to use JsonReader for performance reasons.
Your XML example loads the whole file into memory, which you don't want to do for large documents. The reader.Read() pattern is better suited to processing large files.
For everyone with the same problem, here is what I've done (NOTE: this is just an example).
By the way, thank you to everyone who tried to answer my question, and I'm sorry for my mistakes.
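As a rough illustration of the reader.Read() pattern mentioned above, here is a minimal sketch using Json.NET's JsonTextReader rather than the serializer from the question. The class and method names (PrgStreamReader, ReadPrgItems) are hypothetical, and the sketch assumes that "prg" is an array-valued property whose elements are plain objects, as the question's path on/sources/prgs/ suggests.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of Json.NET.
public static class PrgStreamReader
{
    // Yields each object found inside any array-valued property named "prg",
    // loading only one array element into memory at a time.
    public static IEnumerable<JObject> ReadPrgItems(TextReader text)
    {
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                bool isPrgProperty = reader.TokenType == JsonToken.PropertyName
                                     && (string)reader.Value == "prg";
                if (!isPrgProperty) continue;

                reader.Read(); // advance to the property's value
                if (reader.TokenType != JsonToken.StartArray) continue;

                // Walk the array; JObject.Load consumes exactly one element.
                while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                {
                    if (reader.TokenType == JsonToken.StartObject)
                        yield return JObject.Load(reader);
                }
            }
        }
    }
}
```

Each yielded JObject can then be mapped the same way as in the XML loop, for example entity.Name = prg.Value<string>("name"), before the next element is read from the stream.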
List<string> d = new List<string>();
using (
StreamReader stream =
File.OpenText(
"C:\\path\\Sources.json")
)
{
JObject sources = (JObject) JToken.ReadFrom(new JsonTextReader(stream));
var a = sources["on"];
var b = a["sources"];
var c = b["prgs"];
foreach (JObject item in c["prg"].ToList())
{
d.Add(item.Value<string>("name"));
}
}
//part below is just for testing
foreach (var VARIABLE in d)
{
Console.WriteLine(VARIABLE);
}
Console.ReadLine();
I would approach it by converting that JSON object into a C# class and then applying logic to the C# object and/or use DataTables
[Note: There are solutions online that show you how to pass an Object or List<Object> into a DataTable and then pass it "easily" to SQL.]
The first step is still your hiccup in either solution: how do I pull in a large JSON string from the filesystem?
If you have the JSON, use json2csharp.com and/or jsonutils.com to generate the classes needed to deserialize it into your object.
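JsonSerializer se = new JsonSerializer();
using (StreamReader re = new StreamReader(@"C:\path to file\file.json"))
using (JsonTextReader reader = new JsonTextReader(re))
{
    // Deserialize straight from the reader instead of loading the text first
    YourClass DeserializedObject = se.Deserialize<YourClass>(reader);
    Console.WriteLine(DeserializedObject.SomeProperty);
}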
I have a very, very large JSON file (1000+ MB) of identical JSON objects. For example:
[
{
"id": 1,
"value": "hello",
"another_value": "world",
"value_obj": {
"name": "obj1"
},
"value_list": [
1,
2,
3
]
},
{
"id": 2,
"value": "foo",
"another_value": "bar",
"value_obj": {
"name": "obj2"
},
"value_list": [
4,
5,
6
]
},
{
"id": 3,
"value": "a",
"another_value": "b",
"value_obj": {
"name": "obj3"
},
"value_list": [
7,
8,
9
]
},
...
]
Every single item in the root JSON list follows the same structure and thus would be individually deserializable. I already have the C# classes written to receive this data, and deserializing a JSON file containing a single object without the list works as expected.
At first, I tried to just directly deserialize my objects in a loop:
JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
while (!sr.EndOfStream)
{
o = serializer.Deserialize<MyObject>(reader);
}
}
This didn't work; it threw an exception clearly stating that an object is expected, not a list. My understanding is that this command would just read a single object contained at the root level of the JSON file, but since we have a list of objects, this is an invalid request.
My next idea was to deserialize as a C# List of objects:
JsonSerializer serializer = new JsonSerializer();
List<MyObject> o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
while (!sr.EndOfStream)
{
o = serializer.Deserialize<List<MyObject>>(reader);
}
}
This does succeed. However, it only somewhat reduces the issue of high RAM usage. In this case it does look like the application is deserializing items one at a time, and so is not reading the entire JSON file into RAM, but we still end up with a lot of RAM usage because the C# List object now contains all of the data from the JSON file in RAM. This has only displaced the problem.
I then decided to simply try taking a single character off the beginning of the stream (to eliminate the [) by doing sr.Read() before going into the loop. The first object then does read successfully, but subsequent ones do not, with an exception of "unexpected token". My guess is this is the comma and space between the objects throwing the reader off.
Simply removing square brackets won't work since the objects do contain a primitive list of their own, as you can see in the sample. Even trying to use }, as a separator won't work since, as you can see, there are sub-objects within the objects.
My goal is to be able to read the objects from the stream one at a time: read an object, do something with it, discard it from RAM, then read the next object, and so on. This would eliminate the need to load either the entire JSON string or the entire contents of the data into RAM as C# objects.
What am I missing?
This should resolve your problem. Basically it works just like your initial code, except that it only deserializes an object when the reader hits the { character in the stream; otherwise it just skips ahead until it finds another start-object token.
JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
while (reader.Read())
{
// deserialize only when there's "{" character in the stream
if (reader.TokenType == JsonToken.StartObject)
{
o = serializer.Deserialize<MyObject>(reader);
}
}
}
I think we can do better than the accepted answer, using more features of JsonReader to make a more generalized solution.
As a JsonReader consumes tokens from a JSON, the path is recorded in the JsonReader.Path property.
We can use this to precisely select deeply nested data from a JSON file, using regex to ensure that we're on the right path.
So, using the following extension method:
public static class JsonReaderExtensions
{
public static IEnumerable<T> SelectTokensWithRegex<T>(
this JsonReader jsonReader, Regex regex)
{
JsonSerializer serializer = new JsonSerializer();
while (jsonReader.Read())
{
if (regex.IsMatch(jsonReader.Path)
&& jsonReader.TokenType != JsonToken.PropertyName)
{
yield return serializer.Deserialize<T>(jsonReader);
}
}
}
}
The data you are concerned with lies on paths:
[0]
[1]
[2]
... etc
We can construct the following regex to precisely match this path:
var regex = new Regex(@"^\[\d+\]$");
It now becomes possible to stream objects out of your data (without fully loading or parsing the entire JSON) as follows:
IEnumerable<MyObject> objects = jsonReader.SelectTokensWithRegex<MyObject>(regex);
Or, if we want to dig even deeper into the structure, we can be even more precise with our regex:
var regex = new Regex(@"^\[\d+\]\.value$");
IEnumerable<string> objects = jsonReader.SelectTokensWithRegex<string>(regex);
to only extract value properties from the items in the array.
I've found this technique extremely useful for extracting specific data from huge (100 GiB) JSON dumps, directly from HTTP using a network stream (with low memory requirements and no intermediate storage required).
.NET 6
This is easily done with the System.Text.Json.JsonSerializer in .NET 6:
using (FileStream? fileStream = new FileStream("hugefile.json", FileMode.Open))
{
IAsyncEnumerable<Person?> people = JsonSerializer.DeserializeAsyncEnumerable<Person?>(fileStream);
await foreach (Person? person in people)
{
Console.WriteLine($"Hello, my name is {person?.Name}!");
}
}
Here is another easy way to parse a large JSON file, using Cinchoo ETL, an open source library (it uses JSON.NET under the hood to parse the JSON in a streaming manner):
using (var r = ChoJSONReader<MyObject>.LoadText(json))
{
foreach (var rec in r)
Console.WriteLine(rec.Dump());
}
Sample fiddle: https://dotnetfiddle.net/i5qJ5R
Is this what you're looking for? Found on a previous question
The current version of Json.net does not allow you to use the accepted answer code. A current alternative is:
public static object DeserializeFromStream(Stream stream)
{
var serializer = new JsonSerializer();
using (var sr = new StreamReader(stream))
using (var jsonTextReader = new JsonTextReader(sr))
{
return serializer.Deserialize(jsonTextReader);
}
}
Documentation: Deserialize JSON from a file stream
Does anyone know how to convert the below nested JSON to CSV via CHOETL (An ETL framework for .NET)? Thank you!
I'm using this code but it will only return the first equipment record.
CODE:
{
using (var json = new ChoJSONReader("./test.json"))
{
csv.Write(json.Cast<dynamic>().Select(i => new
{
EquipmentId = i.GpsLocation.Equipment[0].EquipmentId,
InquiryValue = i.GpsLocation.Equipment[0].InquiryValue,
Timestamp = i.GpsLocation.Equipment[0].Timestamp
}));
}
}
JSON:
{
"GpsLocation": {
"Equipment": [
{
"EquipmentId": "EQ00001",
"InquiryValue": [
"IV00001"
],
"Timestamp": "2020-01-01 01:01:01.01",
},
{
"EquipmentId": "EQ00002",
"InquiryValue": [
"IV00002"
],
"Timestamp": "2020-01-01 01:01:01.01"
}
]
}
}
As others suggest, the issue is you are only looking at the first element of the array.
It appears that the easiest way to control what you serialise into CSV is by correctly defining your source objects from JSON. JSON Path expressions come in pretty handy.
What I ended up doing here is query all JSON to return an array of Equipment objects regardless of where they are in the hierarchy (which means you may need to filter it a bit better depending on your full JSON).
Then it's pretty easy to define each field based on JSON path and just pass the result to CSVWriter.
Also check out some gotchas that I outlined in the respective comment lines.
void Main()
{
var jsonString = "{\"GpsLocation\":{\"Equipment\":[{\"EquipmentId\":\"EQ00001\",\"InquiryValue\":[\"IV00001\"],\"Timestamp\":\"2020-01-01 01:01:01.01\"},{\"EquipmentId\":\"EQ00002\",\"InquiryValue\":[\"IV00002\"],\"Timestamp\":\"2020-01-01 01:01:01.01\"}]}}";
var jsonReader = new StringReader(jsonString);
var csvWriter = new StringWriter(); // outputs to string, comment out if you want file output
//var csvWriter = new StreamWriter(".\\your_output.csv"); // writes to a file of your choice
using (var csv = new ChoCSVWriter(csvWriter))
using (var json = new ChoJSONReader(jsonReader)
.WithJSONPath("$..Equipment[*]", true) // first, scope the reader to all Equipment objects. Note the second parameter: apparently you need to pass true here, as otherwise it just won't return anything
.WithField("EquipmentId", jsonPath: "$.EquipmentId", isArray: false) // then you scope each field in the array to what you want it to be. Since you want scalar values, pass `isArray: false` for better predictability
.WithField("InquiryValue", jsonPath: "$.InquiryValue[0]", isArray: false) // since your InquiryValue is actually an array, you want to obtain first element here. if you don't do this, fields names and values would go askew
.WithField("Timestamp", jsonPath: "$.Timestamp", fieldType: typeof(DateTime), isArray: false)) // you can also supply field type, otherwise it seems to default to `string`
{
csv.WithFirstLineHeader().Write(json);
}
Console.WriteLine(csvWriter.GetStringBuilder().ToString()); // comment this out if writing to file - you won't need it
}
Update summary:
Pivoted to update the code to rely on JSON Path scoping - this seems to allow for field name manipulation with pretty low effort
Looking at your comment, you could probably simplify your file writer a little bit - use StreamWriter instead of StringWriter - see updated code for example
Here is the working sample of producing CSV from your JSON
string json = @"{
""GpsLocation"": {
""Equipment"": [
{
""EquipmentId"": ""EQ00001"",
""InquiryValue"": [
""IV00001""
],
""Timestamp"": ""2020-02-01 01:01:01.01"",
},
{
""EquipmentId"": ""EQ00002"",
""InquiryValue"": [
""IV00002""
],
""Timestamp"": ""2020-01-01 01:01:01.01""
}
]
}
}";
StringBuilder csv = new StringBuilder();
using (var r = ChoJSONReader.LoadText(json)
.WithJSONPath("$.GpsLocation.Equipment")
.WithField("EquipmentId")
.WithField("InquiryValue", jsonPath: "InquiryValue[0]", fieldType: typeof(string))
.WithField("Timestamp", fieldType: typeof(DateTime))
)
{
using (var w = new ChoCSVWriter(csv)
.WithFirstLineHeader())
w.Write(r);
}
Console.WriteLine(csv.ToString());
Output:
EquipmentId,InquiryValue,Timestamp
EQ00001,IV00001,2/1/2020 1:01:01 AM
EQ00002,IV00002,1/1/2020 1:01:01 AM
Sample fiddle: https://dotnetfiddle.net/hJWtqH
Your code is sound, but the issue is that you're only writing the first variable in the array by using i.GpsLocation.Equipment[0]. Instead, try looping over everything by putting it into a for loop, and changing the [0] to your iterating variable inside of said loop.
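To make the "loop over everything" suggestion concrete, here is a minimal sketch that iterates over every element of the Equipment array instead of indexing [0]. Note that it swaps in Json.NET (JObject) for ChoETL, purely to show the loop; the class and method names (EquipmentCsv, ToCsvLines) are hypothetical, and the values are joined without any CSV escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical helper for illustration only.
public static class EquipmentCsv
{
    // Builds a header line plus one CSV line per element of GpsLocation.Equipment.
    public static List<string> ToCsvLines(string json)
    {
        JObject root = JObject.Parse(json);
        var lines = new List<string> { "EquipmentId,InquiryValue,Timestamp" };

        // Iterate over every array element rather than only Equipment[0].
        foreach (JToken eq in root["GpsLocation"]["Equipment"])
        {
            string id = (string)eq["EquipmentId"];
            // InquiryValue is itself an array; take its first entry, if any.
            string inquiry = (string)eq["InquiryValue"]?.FirstOrDefault();
            string timestamp = (string)eq["Timestamp"];
            lines.Add(string.Join(",", id, inquiry, timestamp));
        }
        return lines;
    }
}
```

The same idea applies to the original ChoETL code: whatever replaces Equipment[0] has to visit each index of the array, so that one output row is produced per equipment record.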
Hi, I'm trying to parse JSON files, but some of them contain lines like the one below. I want to automate removing the invalid part so that each file becomes valid JSON that can be read. The problem is that a single line contains multiple JSON objects.
Example:
{"item":"value"}{"anotheritem":"value"}
Is there any way to remove
{"anotheritem":"value"}
so that it turns into valid JSON that can be parsed?
I tried using StreamReader, because I have a folder with multiple files that contain this invalid JSON.
I managed to detect the invalid JSON, but for some reason I can't get it to parse so that I can use .Remove to remove the invalid part:
using (StreamReader r = new StreamReader(itemDir))
{
string json = r.ReadToEnd();
if (json.Contains("anotheritem"))
{
JObject NoGood = JObject.FromObject(json);
MessageBox.Show(NoGood.ToString());
}
}
The Error:
Object serialized to String. JObject instance expected.
Thank you all for your time and help.
If the objects sit side by side without a space or any other character between them, you can convert your string to a JSON array.
string value = "{\"item\":\"value\"}{\"anotheritem\":\"value\"}";
string arrayValue = "[" + value.Replace("}{", "},{") + "]";
var array = JArray.Parse(arrayValue);
var goodArray = array.OfType<JObject>().Where(o => o.Property("anotheritem") == null);
Edit: see my second answer for a more robust, more modern solution that also supports the built-in .NET Core JSON serializer.
Json.Net
An even better solution: Json.NET has a built-in feature for this exact scenario. See "Read Multiple Fragments With JsonReader" in its documentation.
JsonTextReader has a SupportMultipleContent property that allows consecutive items to be read when it is set to true:
string value = "{\"item\":\"value\"}{\"anotheritem\":\"value\"}";
var reader = new JsonTextReader(new System.IO.StringReader(value));
reader.SupportMultipleContent = true;
var list = new List<JObject>();
while (reader.Read())
{
var item = JObject.Load(reader);
list.Add(item);
}
System.Text.Json
If you want to use System.Text.Json, it's also achievable. There is no SupportMultipleContent property, but Utf8JsonReader will do the job for you.
string value = "{\"item\":\"value\"}{\"anotheritem\":\"value\"}";
var bytes = Encoding.UTF8.GetBytes(value).AsSpan();
var list = new List<JsonDocument>();
while (bytes.Length != 0)
{
var reader = new Utf8JsonReader(bytes);
var item = JsonDocument.ParseValue(ref reader);
list.Add(item);
bytes = bytes.Slice((int) reader.BytesConsumed);
}
I read json values from a text and store it in array using this code.
string[] allLines = System.IO.File.ReadAllLines("D:\\tweets.txt");
I need to extract certain fields from this array containing Json.
My Json is of this type:
{"Name":"John","Id":"45","Time":"11 pm"}
{"Name":"Pear","Id":"34","Time":"3 pm"}
I want to extract each "Name" in one array and each "Id" in one array, something like this.
string[] Name= null;
string[] Id= null;
for (var i = 0; i < allLines[i].length; i++)
{
Name = allLines[i].Name;
Id = allLines[i].Id;
}
I tried another way using JSON parsing as well. This way I can deserialize one row at a time, but I'm confused about how to then obtain the selected fields.
StreamReader streamReader = System.IO.File.OpenText("D:\\tweets.txt");
string lineContent = streamReader.ReadLine();
do
{
if (lineContent != null)
{
var a = JsonConvert.DeserializeObject(lineContent);
}
lineContent = streamReader.ReadLine();
}
while (streamReader.Peek() != -1);
streamReader.Close();
Please help.
I recommend using Json.NET to parse JSON, you can download it as a NuGet package.
It has some great documentation about querying JSON here
Querying your JSON with LINQ would look something like:
// allLines is the string[] from the question; each line holds one JSON object
var names = from line in allLines
            select (string)JObject.Parse(line)["Name"];
This example uses the Newtonsoft Json library to deserialize your JSON into an object. Then, LINQ is used to pull out the lists that you are interested in.
I have written this example as a Console Application in Visual Studio. You will need to create a new Console Application, then copy this code into it.
Also, to use the Newtonsoft library in your new Console Application, you will need to load it from NuGet. To do this in Visual Studio:
Right-click on the project name
Click on Manage NuGet Packages...
In the search box on the top right, enter "newtonsoft" (without the quotes)
Newtonsoft.Json should show up in the list. Click it and press the Install button. This will load the binary and set up the references in your project. After that, you can use the sample code shown in this example.
static void Main(string[] args)
{
TestParseJson();
Console.WriteLine();
Console.WriteLine("Press Any Key to End");
Console.ReadLine(); // Wait for input so we can see our output text
}
// 1 - Construct an object used for deserialization. You will need to make this class match the format of the records in your json text file.
public class User
{
public string Name { get; set; }
public int Id { get; set; }
public DateTime Time { get; set; }
}
// 2 - Simulate Json creation and then use the NewtonSoft library to deserialize. You will need to just extract from the DeserializeObject line down
public static void TestParseJson()
{
// Read list of json objects from file, one line at a time. The json in the test file users.json is in this format:
// {"Name":"John","Id":45,"Time":"2015-11-05T19:18:02.0324468Z"}
// {"Name":"Pear","Id":34,"Time":"2015-11-06T19:18:02.0329474Z"}
var jsonLines = File.ReadLines(@"c:\temp\users.json");
var deserializedUsers = jsonLines.Select(JsonConvert.DeserializeObject<User>).ToList();
// Use Linq to project the list of deserializedUsers into the lists that you want
var names = deserializedUsers.Select(user => user.Name);
var ids = deserializedUsers.Select(user => user.Id);
// Output list of names and ids for debugging purposes
Console.WriteLine("");
Console.WriteLine(" Names:");
foreach (var name in names)
{
Console.WriteLine(" " + name);
}
Console.WriteLine(" Ids:");
foreach (var id in ids)
{
Console.WriteLine(" " + id);
}
}
Your JSON does not actually represent an array, but rather a series of individual objects back-to-back. To be considered a valid JSON array, the objects would need to be enclosed in square brackets and separated by commas (see JSON.org). Regardless, it is still possible to read and parse this non-standard JSON using Json.Net. The JsonTextReader class has a special SupportMultipleContent setting that is designed to cope with this situation. You can process your file using the following code:
List<string> names = new List<string>();
List<string> ids = new List<string>();
JsonSerializer serializer = new JsonSerializer();
using (StreamReader sr = new StreamReader("D:\\tweets.txt"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
reader.SupportMultipleContent = true;
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
JObject jo = JObject.Load(reader);
names.Add((string)jo["Name"]);
ids.Add((string)jo["Id"]);
}
}
}
Console.WriteLine("Names: " + string.Join(", ", names));
Console.WriteLine("Ids: " + string.Join(", ", ids));
Fiddle: https://dotnetfiddle.net/tYVLLr
Are there any libraries in .Net to help compare and find differences between two json objects? I've found some solutions available for JavaScript, but nothing interesting for C#. The point of my question is to create json with changes marked in some way, based on the comparison. So that the user could see where the changes are.
using Microsoft.XmlDiffPatch;
using Newtonsoft.Json;
Convert each JSON to XML and use the MS XmlDiff library, which is available on NuGet. Differences are given in another XML document, which I write to the console here. This is suitable for unit testing, for example.
public bool CompareJson(string expected, string actual)
{
var expectedDoc = JsonConvert.DeserializeXmlNode(expected, "root");
var actualDoc = JsonConvert.DeserializeXmlNode(actual, "root");
var diff = new XmlDiff(XmlDiffOptions.IgnoreWhitespace |
XmlDiffOptions.IgnoreChildOrder);
using (var ms = new MemoryStream())
using (var writer = new XmlTextWriter(ms, Encoding.UTF8))
{
var result = diff.Compare(expectedDoc, actualDoc, writer);
if (!result)
{
ms.Seek(0, SeekOrigin.Begin);
Console.WriteLine(new StreamReader(ms).ReadToEnd());
}
return result;
}
}
I have used different JSON objects than those in your example but it will apply to your case correctly.
private static string GetJsonDiff(string action, string existing, string modified, string objectType)
{
// convert JSON to object
JObject xptJson = JObject.Parse(modified);
JObject actualJson = JObject.Parse(existing);
// read properties
var xptProps = xptJson.Properties().ToList();
var actProps = actualJson.Properties().ToList();
// find differing properties
var auditLog = (from existingProp in actProps
from modifiedProp in xptProps
where modifiedProp.Path.Equals(existingProp.Path)
where !modifiedProp.Value.ToString().Equals(existingProp.Value.ToString())
select new AuditLog
{
Field = existingProp.Path,
OldValue = existingProp.Value.ToString(),
NewValue = modifiedProp.Value.ToString(),
Action = action, ActionBy = GetUserName(),
ActionDate = DateTime.UtcNow.ToLongDateString(),
ObjectType = objectType
}).ToList();
return JsonConvert.SerializeObject(auditLog);
}
I think your best bet is to use JSON.NET to create two JSON objects, then recursively loop through the tree, comparing each node to see if it exists and is equal while you go.
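A minimal sketch of that recursive walk, assuming Json.NET's JToken tree, might look like the following. The class name JsonDiff, the "$" root prefix, and the "(added)/(removed)/(changed)" labels are illustrative choices rather than anything from a library, and arrays are compared position by position, which is only one of several reasonable policies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical helper for illustration only.
public static class JsonDiff
{
    // Returns the path of every node that was added, removed, or changed.
    public static List<string> Compare(JToken expected, JToken actual, string path = "$")
    {
        var diffs = new List<string>();

        if (expected is JObject eo && actual is JObject ao)
        {
            // Union of property names so additions and removals are both seen.
            var names = eo.Properties().Select(p => p.Name)
                          .Union(ao.Properties().Select(p => p.Name));
            foreach (string name in names)
            {
                string child = path + "." + name;
                JToken e = eo[name]; // null when the property is absent
                JToken a = ao[name];
                if (e is null) diffs.Add(child + " (added)");
                else if (a is null) diffs.Add(child + " (removed)");
                else diffs.AddRange(Compare(e, a, child));
            }
        }
        else if (expected is JArray ea && actual is JArray aa)
        {
            // Compare arrays index by index.
            int max = Math.Max(ea.Count, aa.Count);
            for (int i = 0; i < max; i++)
            {
                string child = path + "[" + i + "]";
                if (i >= ea.Count) diffs.Add(child + " (added)");
                else if (i >= aa.Count) diffs.Add(child + " (removed)");
                else diffs.AddRange(Compare(ea[i], aa[i], child));
            }
        }
        else if (!JToken.DeepEquals(expected, actual))
        {
            // Leaf values, or nodes of different kinds, that are not equal.
            diffs.Add(path + " (changed)");
        }
        return diffs;
    }
}
```

The returned paths could then be used to mark the changed nodes in whatever output format the user needs to see.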
I think the best way to go here is to create objects using Newtonsoft Json: http://www.nuget.org/packages/newtonsoft.json/
So, you will have two objects of the same type, which you can easily compare and mark the differences.
private IEnumerable<JProperty> JSONCompare(string expectedJSON, string actualJSON)
{
// convert JSON to object
JObject xptJson = JObject.Parse(expectedJSON);
JObject actualJson = JObject.Parse(actualJSON);
// read properties
var xptProps = xptJson.Properties().ToList();
var actProps = actualJson.Properties().ToList();
// find missing properties
var missingProps = xptProps.Where(expected => !actProps.Any(actual => actual.Name == expected.Name));
return missingProps;
}