This is more of a theoretical question, but I am curious what the difference between these two methods of reading a file is and why I would want to choose one over the other.
I am parsing a JSON configuration file (from local disk). Here is one method of doing it:
// Uses JSON.NET Serializer + StreamReader
using (var sr = new StreamReader(file))
{
    var jtr = new JsonTextReader(sr);
    var jsonSerializer = new JsonSerializer();
    return jsonSerializer.Deserialize<Configuration>(jtr);
}
...and, the second option...
// Reads the entire file and deserializes.
var json = File.ReadAllText(file);
return JsonConvert.DeserializeObject<Configuration>(json);
Is one any better than the other? Is there a case where one or the other should be used?
Again, this is more theoretical, but, I realized I don't really know the answer to it, and a search online didn't produce results that satisfied me. I could see the second being bad if the file was large (it isn't) since it's being read into memory in one shot. Any other reasons?
By reading the Json.NET source you will find that deserialization from a string eventually reaches:
public static object DeserializeObject(string value, Type type, JsonSerializerSettings settings)
{
    ValidationUtils.ArgumentNotNull(value, "value");
    JsonSerializer jsonSerializer = JsonSerializer.CreateDefault(settings);

    // by default DeserializeObject should check for additional content
    if (!jsonSerializer.IsCheckAdditionalContentSet())
        jsonSerializer.CheckAdditionalContent = true;

    using (var reader = new JsonTextReader(new StringReader(value)))
    {
        return jsonSerializer.Deserialize(reader, type);
    }
}
That call simply creates a JsonTextReader over a StringReader and hands it to the same serializer.
So the difference effectively comes down to how huge files are handled.
To temper the previous point: JsonTextReader overrides JsonReader.Close() and, among other things, closes the underlying reader when CloseInput is true.
CloseInput must be true by default here, since the StringReader is never explicitly disposed in the fragment above.
With File.ReadAllText(), the entire JSON needs to be loaded into memory first before deserializing it. Using a StreamReader, the file is read and deserialized incrementally. So if your file is huge, you might want to use the StreamReader to avoid loading the whole file into memory. If your JSON file is small (most cases), it doesn't really make a difference.
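For concreteness, here is a minimal sketch of the streaming variant, assuming the same Configuration type and file path as in the question; the serializer pulls tokens from the reader on demand, so only a small buffer of the file is held in memory at any time.
// Stream-based deserialization: Json.NET reads tokens incrementally
// instead of materializing the whole document as one string.
using (var streamReader = new StreamReader(file))
using (var jsonReader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    return serializer.Deserialize<Configuration>(jsonReader);
}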
Related
I'm trying to store large objects as gzipped JSON text to an Azure blob.
I don't want to hold the serialized data in memory, and I don't want to spool to disk if I can avoid it, but I don't see how to just let it serialize and compress on the fly.
I'm using JSON.NET from Newtonsoft (pretty much the de facto standard JSON serializer for .NET), but the signatures of the methods don't really seem to support on-the-fly streaming.
Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob has an UploadFromStream(Stream source, AccessCondition accessCondition = null, BlobRequestOptions options = null, OperationContext operationContext = null) method, but in order for that to work properly, I need to have the position of the stream be 0, and JsonSerializer.Serialize doesn't do that. It just acts on a stream, and when it's done the stream position is at EOF.
What I'd like to do is something like this:
public void SaveObject(object obj, string path, JsonSerializerSettings settings = null)
{
    using (var jsonStream = new JsonStream(obj, settings ?? _defaultSerializerSettings))
    using (var gzipStream = new GZipStream(jsonStream, CompressionMode.Compress))
    {
        var blob = GetCloudBlockBlob(path);
        blob.UploadFromStream(gzipStream);
    }
}
...the idea being, serialization does not start until something is pulling data (in this case, the GZipStream, which does not compress data until pulled by the blob.UploadFromStream() method) thus it maintains a low overhead. It does not need to be a seekable stream, it just needs to be read on demand.
I trust everyone can see how this would work if you were doing a stream from System.IO.File.OpenRead() instead of new JsonStream(object obj). While it gets a bit more complicated because Json.NET needs to "look ahead" and potentially fill a buffer, they got it working with the CryptoStream and GZipStream and that works real slick.
Is there a way to do this that does not load the entire JSON representation of the object into memory, or spool it to disk first just to regurgitate? If CryptoStreams can do it, we should be able to do it with Json.NET without a large amount of effort. I would think.
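One way this is often sidestepped is to invert the direction: rather than a pull-based stream for the blob to read, push the serialized output straight into a writable blob stream. The following is only a sketch, assuming the storage SDK's CloudBlockBlob.OpenWrite() (which returns a writable stream) and reusing GetCloudBlockBlob and _defaultSerializerSettings from the snippet above; it is not the JsonStream the question imagines, but it keeps the full JSON out of memory and off disk.
public void SaveObject(object obj, string path, JsonSerializerSettings settings = null)
{
    var blob = GetCloudBlockBlob(path); // helper from the question, assumed to exist
    var serializer = JsonSerializer.Create(settings ?? _defaultSerializerSettings);

    // The serializer writes JSON tokens through the gzip stream directly
    // into the blob, so the serialized text is never buffered as a whole.
    using (var blobStream = blob.OpenWrite())
    using (var gzipStream = new GZipStream(blobStream, CompressionMode.Compress))
    using (var textWriter = new StreamWriter(gzipStream))
    using (var jsonWriter = new JsonTextWriter(textWriter))
    {
        serializer.Serialize(jsonWriter, obj);
        jsonWriter.Flush();
    }
}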
I need to save a Google protobuf IMessage object to a JSON file using C#.
Here is some sample code:
using (var input = File.OpenRead(protodatFile))
{
    message.MergeFrom(input); // read the message from the protodat file

    JsonFormatter formatter = new JsonFormatter(
        new JsonFormatter.Settings(false));
    string jsonString = formatter.Format(message);

    System.IO.File.WriteAllText(jsonFile, jsonString);
}
This uses the JsonFormatter from the Google Protobuf library.
The problem: all of the JSON content is written on a single line. When the file is quite big (>50 MB), it is hard to open and view in a text editor.
What is the best way to produce an indented JSON file here?
As an extremely inefficient workaround, one can use Json.NET to re-format the protobuf JSON:
// Re-Parse the one-line protobuf json string into an object:
object parsed = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonString);
// Format this object as JSON with indentation:
string jsonWithIndent = Newtonsoft.Json.JsonConvert.SerializeObject(parsed, Newtonsoft.Json.Formatting.Indented);
Since the original question asked about a 50MB file, this is probably a bad solution for such large-ish files, but given that I couldn't find anything on JsonFormatter.Settings, it was what I reached for.
As of version 3.22.0 of Google.Protobuf you can use the following:
JsonFormatter formatter = new JsonFormatter(JsonFormatter.Settings.Default.WithIndentation());
string output = formatter.Format(msg);
See the corresponding pull request: https://github.com/protocolbuffers/protobuf/pull/9391
I have an XML file. I want to convert it to JSON with C#. However, the XML file is over 20 GB.
I have tried to read XML with XmlReader, then append every node to a JSON file. I wrote the following code:
var path = @"c:\result.json";
TextWriter tw = new StreamWriter(path, true, Encoding.UTF8);
tw.Write("{\"A\":");
using (XmlTextReader xmlTextReader = new XmlTextReader("c:\\muslum.xml"))
{
    while (xmlTextReader.Read())
    {
        if (xmlTextReader.Name == "A")
        {
            var xmlDoc = new XmlDocument();
            var v = xmlTextReader.ReadInnerXml();
            string json = Newtonsoft.Json.JsonConvert.SerializeXmlNode(xmlDoc, Newtonsoft.Json.Formatting.None, true);
            tw.Write(json);
        }
    }
}
tw.Write("}");
tw.Close();
This code is not working; I get an error while converting to JSON. Is there a better way to perform the conversion?
I would do it the following way:
generate classes from the XSD schema using xsd.exe
open the file and read the top-level (i.e. your document-level) tags one by one (with XmlTextReader or XmlReader)
deserialize each tag into an object using the generated classes
serialize each resulting object to JSON and save it to whatever output you need
consider saving in batches of 1000-2000 tags
You are right about serialize/deserialize being slow. Still, doing the work in several threads, preferably with the TPL, will give you good speed. Also consider using the Json.NET serializer; it is really a lot faster than the standard ones (it is the standard one for Web API, though).
I can put some code snippets in the morning if you need them.
We process big (1-10 GB) files this way in order to save data to a SQL Server database.
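Until those snippets arrive, here is a rough sketch of the approach just described. The Item class and element name are placeholders for whatever xsd.exe generates from your schema (it needs System.Xml, System.Xml.Serialization, System.IO, System.Text and Newtonsoft.Json); treat it as a starting point rather than tested code.
// Placeholder for a class generated by xsd.exe; replace with your real types.
public class Item
{
    public string Name { get; set; }
}

public static void ConvertXmlToJson(string xmlPath, string jsonPath)
{
    var xmlSerializer = new XmlSerializer(typeof(Item), new XmlRootAttribute("Item"));

    using (var reader = XmlReader.Create(xmlPath))
    using (var writer = new StreamWriter(jsonPath, false, Encoding.UTF8))
    {
        writer.Write("{\"A\":[");
        bool first = true;

        // Read one top-level element at a time so the 20 GB file is never
        // held in memory as a whole.
        while (reader.ReadToFollowing("Item"))
        {
            using (var subtree = reader.ReadSubtree())
            {
                var item = (Item)xmlSerializer.Deserialize(subtree);
                if (!first) writer.Write(",");
                writer.Write(Newtonsoft.Json.JsonConvert.SerializeObject(item));
                first = false;
            }
        }

        writer.Write("]}");
    }
}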
I'm using an external API to receive XML and deserializing it to an object, but I want a way to keep the original XML that was used, for debugging and auditing.
Here's a sample of how I'm deserializing:
XmlReader reader = this.Execute(url);
return Read<Property>(reader, "property");
Extract of Execute() routine:
StringBuilder sb = new StringBuilder();
Stream s = response.GetResponseStream();
XmlReader reader = XmlReader.Create(s);
return reader;
Read() simply wraps up the native xml serialization:
private T Read<T>(XmlReader reader, string rootElement)
{
    XmlRootAttribute root = new XmlRootAttribute();
    root.ElementName = rootElement;
    root.IsNullable = true;
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T), root);
    object result = xmlSerializer.Deserialize(reader);
    return (T)result;
}
I've had a look around and it appears that once you've used the reader, you can't use it again (it's a forward-only reading stream?). Without trying to change too much, how can I extract the contents of the reader as XML while still benefiting from the built-in serialization with the reader?
What would be nice is to adjust Read with an out param:
private T Read<T>(XmlReader reader, string rootElement, out string sourceXml);
You did not share the code for this.Execute(url), but presumably you build the reader from a stream. First read that stream into a string, keep the string for auditing, and then build the reader over it. If the original stream is not seekable, dispose of it and create a new reader from the string instead.
Also note that XmlSerializer can take a stream instead of a reader, so you could skip the reader entirely and just pass streams between your methods.
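A minimal sketch of that idea, assuming Execute() can be changed to hand back the raw response stream: the stream is read into a string once, the string is kept for auditing, and the reader used for deserialization is rebuilt over that string.
private T Read<T>(Stream responseStream, string rootElement, out string sourceXml)
{
    // Buffer the raw XML once so it can be logged or audited later.
    using (var streamReader = new StreamReader(responseStream))
    {
        sourceXml = streamReader.ReadToEnd();
    }

    var root = new XmlRootAttribute { ElementName = rootElement, IsNullable = true };
    var xmlSerializer = new XmlSerializer(typeof(T), root);

    // Deserialize from a fresh reader built over the buffered string.
    using (var stringReader = new StringReader(sourceXml))
    using (var xmlReader = XmlReader.Create(stringReader))
    {
        return (T)xmlSerializer.Deserialize(xmlReader);
    }
}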
Use Fiddler.
Fiddler is a Web Debugging Proxy which logs all HTTP(S) traffic between your computer and the Internet. Fiddler allows you to inspect traffic, set breakpoints, and "fiddle" with incoming or outgoing data. Fiddler includes a powerful event-based scripting subsystem, and can be extended using any .NET language.
Working with C# Visual Studio 2008, MVC1.
I'm creating an XML file by fetching one from a web service and adding some nodes to it. Now I want to deserialize it into a class, which is the model used to strongly type the view.
First of all, I'm having trouble achieving that without storing the XML in the filesystem, because I don't know how this serialization and deserialization works. I guess there's a way and it's just a matter of time.
But, while searching the web for that, I came across LINQ to XML and now I'm unsure whether it would be better to use it instead.
The XML consists of some client details, and basically I will use all of them.
Any hint?
Thanks!!
You can save an XElement to and from a MemoryStream (no need to save it to a file stream):
MemoryStream ms = new MemoryStream();
XmlWriter xw = XmlWriter.Create(ms);
document.Save(xw);
xw.Flush();
Then if you reset the position back to 0 you can deserialize it using the DataContractSerializer.
ms.Position = 0;
DataContractSerializer serializer = new DataContractSerializer(typeof(Model));
Model model = (Model)serializer.ReadObject(ms);
There are other options for how serialization works, so if this is not what you have, let me know what you are using and I will help.
try this:
XmlSerializer xmls = new XmlSerializer(typeof(XElement));
FileStream FStream;
try
{
    FStream = new FileStream(doctorsPath, FileMode.Open);
    _Doctors = (XElement)xmls.Deserialize(FStream);
    FStream.Close();

    FStream = new FileStream(patientsPath, FileMode.Open);
    _Patients = (XElement)xmls.Deserialize(FStream);
    FStream.Close();

    FStream = new FileStream(treatmentsPath, FileMode.Open);
    _Treatments = (XElement)xmls.Deserialize(FStream);
    FStream.Close();
}
catch
{ }
This will load all of the XML files into our XElement variables. The try-catch block is a form of exception handling that ensures that if one of the calls in the try block throws an exception, the program will jump to the catch section, where nothing will happen. When working with files, especially reading files, it is a good idea to use a try-catch.
LINQ to XML is an excellent feature, and you can always rely on it. You don't need to write to or read from a file; you can feed either a string or a stream to the XDocument.
There are several ways to load XML into an XDocument object; see the appropriate Load (and Parse) functions. Once you have loaded the content, you can easily add or remove elements, and later you can save it to disk if you want.
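As a small illustration, assuming the web service response is already in memory as a string (xmlString here) or a stream (responseStream), the whole round trip can avoid the filesystem:
// Load directly from the in-memory response; no file on disk is involved.
XDocument doc = XDocument.Parse(xmlString);   // or XDocument.Load(responseStream)

// Add the extra nodes before handing the XML on to the deserializer or the view.
doc.Root.Add(new XElement("ImportedOn", DateTime.UtcNow));

// Save to disk only if and when you actually want a copy there.
doc.Save(@"c:\clients.xml");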