JSON.NET: Minify/Format content without re-parsing - c#

Is it possible to minify/format a JSON string using the Newtonsoft JSON.NET library without forcing the system to reparse the code? This is what I have for my methods:
public async Task<string> Minify(string json)
{
    // TODO: Some way to do this without a re-parse?
    var jsonObj = await JsonOpener.GetJsonFromString(json);
    return jsonObj.ToString(Formatting.None);
}

public async Task<string> Beautify(string json)
{
    // TODO: Some way to do this without a re-parse?
    var jsonObj = await JsonOpener.GetJsonFromString(json);
    return FormatJson(jsonObj);
}

private string FormatJson(JToken input)
{
    // We could just do input.ToString(Formatting.Indented), but this allows us
    // to take advantage of JsonTextWriter's formatting options.
    using (var stringWriter = new StringWriter(new StringBuilder()))
    {
        using (var jsonWriter = new JsonTextWriter(stringWriter))
        {
            // Configures indentation character and indentation width
            // (e.g., "indent each level using 2 spaces", or "use tabs")
            ConfigureWriter(jsonWriter);
            var serializer = new JsonSerializer();
            serializer.Serialize(jsonWriter, input);
            return stringWriter.ToString();
        }
    }
}
This code works just fine on small blocks of JSON, but it starts to get bogged down with large blocks of content. If I could just strip out the whitespace without having to go through the parser, I'd imagine it would be much faster.
If I have to reinvent the wheel and strip out all whitespace myself, I will, but I don't know if there are any gotchas that come into play.
For that matter, is there another library better suited to this?
EDIT: My bad, JSON does not support comments natively.

Yes, you can do this using Json.Net. Just connect a JsonTextReader directly to a JsonTextWriter. That way you are reusing the tokenizer logic of the reader and the formatting logic of the writer, but you skip the step of converting the tokens into an intermediate object representation and back (which is the time-consuming part).
Here is how I would break it into helper methods to make it super easy and flexible to use:
public static string Minify(string json)
{
    return ReformatJson(json, Formatting.None);
}

public static string Beautify(string json)
{
    return ReformatJson(json, Formatting.Indented);
}

public static string ReformatJson(string json, Formatting formatting)
{
    using (StringReader stringReader = new StringReader(json))
    using (StringWriter stringWriter = new StringWriter())
    {
        ReformatJson(stringReader, stringWriter, formatting);
        return stringWriter.ToString();
    }
}

public static void ReformatJson(TextReader textReader, TextWriter textWriter, Formatting formatting)
{
    using (JsonReader jsonReader = new JsonTextReader(textReader))
    using (JsonWriter jsonWriter = new JsonTextWriter(textWriter))
    {
        jsonWriter.Formatting = formatting;
        // Copies every token from the reader to the writer without building a JToken tree
        jsonWriter.WriteToken(jsonReader);
    }
}
Here is a short demo: https://dotnetfiddle.net/RevZNU
With this setup you could easily add overloads that work on streams, too, if you need them. For example:
public static void Minify(Stream inputStream, Stream outputStream, Encoding encoding = null)
{
    ReformatJson(inputStream, outputStream, Formatting.None, encoding);
}

public static void Beautify(Stream inputStream, Stream outputStream, Encoding encoding = null)
{
    ReformatJson(inputStream, outputStream, Formatting.Indented, encoding);
}

public static void ReformatJson(Stream inputStream, Stream outputStream, Formatting formatting, Encoding encoding = null)
{
    if (encoding == null)
        encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
    const int bufferSize = 1024;
    // The final "true" argument (leaveOpen) keeps the caller's streams open after the reader/writer are disposed
    using (StreamReader streamReader = new StreamReader(inputStream, encoding, true, bufferSize, true))
    using (StreamWriter streamWriter = new StreamWriter(outputStream, encoding, bufferSize, true))
    {
        ReformatJson(streamReader, streamWriter, formatting);
    }
}
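On the question of gotchas, one worth knowing about: by default JsonTextReader recognizes date-like strings and turns them into date tokens, so a token-copy pass may write such a string back in a slightly different format. The following is a minimal sketch, not a drop-in replacement, of a variant that switches that off and also exposes the indentation options that the question's ConfigureWriter comment mentions. The class name JsonReformatSketch and its parameter names are made up for illustration, and numeric tokens may still be normalized on the way through.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class JsonReformatSketch
{
    // Hypothetical helper: copies tokens from reader to writer, leaving string values untouched.
    public static string Reformat(string json, Formatting formatting, char indentChar = ' ', int indentation = 2)
    {
        using (var stringReader = new StringReader(json))
        using (var stringWriter = new StringWriter())
        {
            using (var jsonReader = new JsonTextReader(stringReader))
            using (var jsonWriter = new JsonTextWriter(stringWriter))
            {
                // Keep date-like strings as plain strings instead of parsing them into dates
                jsonReader.DateParseHandling = DateParseHandling.None;

                jsonWriter.Formatting = formatting;
                jsonWriter.IndentChar = indentChar;   // e.g. ' ' or '\t'
                jsonWriter.Indentation = indentation; // number of IndentChar per nesting level

                jsonWriter.WriteToken(jsonReader);
            }
            return stringWriter.ToString();
        }
    }
}
```

For example, JsonReformatSketch.Reformat(json, Formatting.Indented, '\t', 1) would indent each level with a single tab, while Formatting.None ignores the indentation settings and produces minified output.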

Related

Json serialize error via RecyclableMemoryStream

I use code like this to replace default json serialize method:
private readonly static RecyclableMemoryStreamManager _recyclableMemoryStreamManager =
    new RecyclableMemoryStreamManager(blockSize: 128 * 1024, largeBufferMultiple: 1024 * 1024, maximumBufferSize: 128 * 1024 * 1024);

private ByteArrayContent Serialize(object content, JsonSerializerSettings serializerSettings, Encoding encoding, string mediaType)
{
    var jsonSerializer = Newtonsoft.Json.JsonSerializer.Create(serializerSettings);
    using (var memoryStream = _recyclableMemoryStreamManager.GetStream())
    {
        using (var textWriter = new StreamWriter(memoryStream, encoding, 1024, true))
        {
            using (var jsonTextWriter = new JsonTextWriter(textWriter) { CloseOutput = false })
            {
                jsonSerializer.Serialize(jsonTextWriter, content);
                jsonTextWriter.Flush();
                var arraySegment = new ArraySegment<byte>(memoryStream.GetBuffer(), 0, (int)memoryStream.Length);
                var resContent = new ByteArrayContent(arraySegment.Array, arraySegment.Offset, arraySegment.Count);
                resContent.Headers.ContentType = new MediaTypeHeaderValue(mediaType);
                return resContent;
            }
        }
    }
}
But sometimes the JSON in the HTTP response has a syntax error, with stray fragments after the closing brace:
{
"code": 0,
"msg": null,
"data": [
// ....
]
}
')","foo":"","bar":"baz","flag":0,')","foo":"","bar":"baz","flag":0,
')","foo":"","bar":"baz","flag":0,')","foo":"","bar":"baz","flag":0,
How can I fix this?
I think it may be a buffer reuse error.
Maybe I could change the settings of RecyclableMemoryStreamManager?
_recyclableMemoryStreamManager.AggressiveBufferReturn = true;
The buffer from GetBuffer() is only well-defined for the lifetime of the stream; you dispose the stream when the method exits the using block for memoryStream, which means those buffers are now up for grabs for re-use.
You may wish to use StreamContent instead. It accepts a Stream containing the payload and disposes it once the content has been sent, which gives you exactly the semantics you want here. Note: don't dispose memoryStream yourself. Remove that using (perhaps adding a catch block that does memoryStream?.Dispose(); throw;).
Note also that GetBuffer() is not necessarily the optimal API for RecyclableMemoryStream, since it may use multiple discontiguous buffers internally; there should be a ReadOnlySequence<byte> GetReadOnlySequence() API which allows that usage - however, this still has the same lifetime limitations impacting buffer re-use, so: it wouldn't change anything here.
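To make the lifetime problem concrete, here is a small sketch that uses a plain MemoryStream as a stand-in for the pooled stream (an assumption made for illustration; RecyclableMemoryStream hands its blocks back to a pool rather than being overwritten in place). It shows that GetBuffer() returns the stream's own backing array, so anything that later writes into that memory changes what a holder of the array sees, whereas ToArray() takes an independent copy. The class name BufferLifetimeSketch is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferLifetimeSketch
{
    // Returns what each array contains after the stream's memory has been overwritten.
    public static (string FromGetBuffer, string FromToArray) Run()
    {
        using (var stream = new MemoryStream())
        {
            byte[] first = Encoding.ASCII.GetBytes("{\"code\":0}");
            stream.Write(first, 0, first.Length);

            byte[] shared = stream.GetBuffer(); // the backing array itself, not a copy
            byte[] copy = stream.ToArray();     // a snapshot of the bytes written so far

            // Simulate the memory being reused for another payload of the same length
            stream.Position = 0;
            byte[] second = Encoding.ASCII.GetBytes("XXXXXXXXXX");
            stream.Write(second, 0, second.Length);

            return (Encoding.ASCII.GetString(shared, 0, first.Length),
                    Encoding.ASCII.GetString(copy));
        }
    }
}
```

The array obtained from GetBuffer() now holds the second payload, while the ToArray() copy still holds the original JSON. Copying costs an allocation, which is why handing the stream itself to StreamContent is the more attractive fix.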
Untested, but for consideration:
private HttpContent Serialize(object content, JsonSerializerSettings serializerSettings, Encoding encoding, string mediaType)
{
    var jsonSerializer = JsonSerializer.Create(serializerSettings);
    var memoryStream = _recyclableMemoryStreamManager.GetStream();
    try
    {
        using (var textWriter = new StreamWriter(memoryStream, encoding, 1024, true))
        {
            using var jsonTextWriter = new JsonTextWriter(textWriter) { CloseOutput = false };
            jsonSerializer.Serialize(jsonTextWriter, content);
            jsonTextWriter.Flush();
        }
        memoryStream.Position = 0; // rewind
        var resContent = new StreamContent(memoryStream);
        resContent.Headers.ContentType = new MediaTypeHeaderValue(mediaType);
        return resContent;
    }
    catch
    {
        memoryStream?.Dispose();
        throw;
    }
}
However, I would expect it would be better to serialize directly to the output via the inbuilt JSON media encoder, rather than using an intermediate buffer.

How to test XML De-/Serialization

So I tried to create a very simple XmlFileWriter
public class XmlFileWriter
{
    public void WriteTo<TSerializationData>(string path, TSerializationData data)
    {
        using (StreamWriter streamWriter = new StreamWriter(path))
        {
            XmlSerializer xmlSerializer = new XmlSerializer(typeof(TSerializationData));
            xmlSerializer.Serialize(streamWriter, data);
        }
    }
}
and XmlFileReader
public class XmlFileReader
{
    public TSerializationData ReadFrom<TSerializationData>(string path)
    {
        using (StreamReader streamReader = new StreamReader(path))
        {
            XmlSerializer xmlSerializer = new XmlSerializer(typeof(TSerializationData));
            return (TSerializationData) xmlSerializer.Deserialize(streamReader);
        }
    }
}
I want to create unit tests for both of them with xUnit. Since they are coupled to the filesystem, I was looking for a way to mock it somehow. Many posts highly recommend the System.IO.Abstractions package and the additional TestingHelpers.
I will only show the tests for the reader for now, since both scenarios are very similar. This is what I have so far:
[Fact]
public void ThrowsExceptionIfPathIsInvalid()
{
    XmlFileReader xmlFileReader = new XmlFileReader();
    // Use an empty path since it should be invalid.
    // Assert.Throws matches the exact exception type; StreamReader throws ArgumentException for an empty path.
    Assert.Throws<ArgumentException>(() => xmlFileReader.ReadFrom<object>(string.Empty));
}

[Fact]
public void DeserializesDataFromXmlFile()
{
    // Generate dummy data with default values
    MyDummyClass dummyData = new MyDummyClass();
    XmlFileWriter xmlFileWriter = new XmlFileWriter();
    XmlFileReader xmlFileReader = new XmlFileReader();
    string filePath = "???"; // TODO
    // Generate a new file and use it as a mock file
    xmlFileWriter.WriteTo(filePath, dummyData);
    // Read from that file
    MyDummyClass fileContent = xmlFileReader.ReadFrom<MyDummyClass>(filePath);
    // Compare the result
    Assert.Equal(dummyData, fileContent);
}
I'm struggling with decoupling the real Filesystem. How would I make the XmlSerializer class use a fake filesystem? I installed the abstractions package but I don't know how to use it for this case (for reading and writing).
StreamReader and StreamWriter both have constructors that accept a Stream. I recommend making your methods take streams as parameters too. Your unit tests can then supply a MemoryStream containing your test XML (which can be hardcoded), while your actual application provides a FileStream for the file on disk. Like so:
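public void WriteTo<TSerializationData>(Stream location, TSerializationData data)
{
    // Same as before, except the StreamWriter is constructed from location instead of a path
}

public TSerializationData ReadFrom<TSerializationData>(Stream location)
{
    // Same as before, except the StreamReader is constructed from location instead of a path
}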
Then in your tests you can do:
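using (var ms = new MemoryStream())
{
    // leaveOpen: true keeps ms usable after the writer is disposed
    using (var sw = new StreamWriter(ms, Encoding.UTF8, 1024, leaveOpen: true))
    {
        sw.Write("<xml>This is your dummy XML string, can be anything you want</xml>");
    }
    ms.Position = 0; // rewind so the reader starts at the beginning
    MyDummyClass fileContent = xmlFileReader.ReadFrom<MyDummyClass>(ms);
}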
And if you want to read from a file you can do:
// Using whatever FileMode/FileAccess you need
MyDummyClass fileContent;
using (var fs = File.Open(@"C:\Path\To\File.xml", FileMode.Open, FileAccess.Read))
{
    fileContent = xmlFileReader.ReadFrom<MyDummyClass>(fs);
}
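Putting the pieces together, a write-then-read round trip through a MemoryStream might look like the sketch below. The DummyData class, the XmlStreamIo class, and the RoundTrip helper are hypothetical names introduced for this example. Both the writer and the reader are opened with leaveOpen: true so that the caller keeps ownership of the stream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical test type; XmlSerializer needs a public type with a parameterless constructor.
public class DummyData
{
    public string Name { get; set; } = "default";
    public int Count { get; set; } = 3;
}

public static class XmlStreamIo
{
    public static void WriteTo<T>(Stream location, T data)
    {
        // leaveOpen: true so the caller can rewind and reuse the stream
        using (var writer = new StreamWriter(location, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, data);
        }
    }

    public static T ReadFrom<T>(Stream location)
    {
        using (var reader = new StreamReader(location, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return (T)new XmlSerializer(typeof(T)).Deserialize(reader);
        }
    }

    // Serializes to memory, rewinds, and deserializes again; no file system involved.
    public static DummyData RoundTrip(DummyData input)
    {
        using (var ms = new MemoryStream())
        {
            WriteTo(ms, input);
            ms.Position = 0;
            return ReadFrom<DummyData>(ms);
        }
    }
}
```

In a test, compare the individual properties of the result (or override Equals on the class), because Assert.Equal on two distinct instances of a class that does not override Equals compares references and will fail.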

Use XmlSerializer.CanDeserialize() when deserializing from string

I have a method that returns object from .xml file
(please don't mind resource usage and naming, it's just an example)
public static T FromXMLFile<T>(string filePath)
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    FileStream fs = new FileStream(filePath, FileMode.Open);
    XmlTextReader xmlTextReader = new XmlTextReader(fs);
    if (xmlSerializer.CanDeserialize(xmlTextReader))
    {
        object tempObject = (T)xmlSerializer.Deserialize(xmlTextReader);
        xmlTextReader.Close();
        return (T)tempObject;
    }
    else
        return default(T);
}
Now I would like to do the same, but with a string instead of a file. I came up with something like this (again, a simplified example):
public static T FromString<T>(string inputString)
{
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    T result;
    try
    {
        using (TextReader reader = new StringReader(inputString))
        {
            result = (T)serializer.Deserialize(reader);
        }
        return result;
    }
    catch //temporary solution, finally should stick to .CanDeserialize(xmlTextReader) usage
    {
        return default(T);
    }
}
How would I use .CanDeserialize() in this case?
Rather than using the Deserialize(TextReader) overload, create an XmlReader from the TextReader, and use that XmlReader for both the Deserialize and CanDeserialize calls:
using (TextReader reader = new StringReader(inputString))
using (XmlReader xmlReader = XmlReader.Create(reader))
{
    if (serializer.CanDeserialize(xmlReader))
    {
        result = (T)serializer.Deserialize(xmlReader);
    }
}
This approach - with both read and write - also allows you to supply additional reader/writer settings for fine-grained control of the API.
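For completeness, the fragment above could be folded into the question's FromString method roughly as follows. This is a sketch rather than a tested implementation; SampleItem and XmlStringSketch are hypothetical names used only to make the example self-contained. Note that CanDeserialize only inspects the start of the document, so input that is not well-formed XML can still throw an XmlException.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used to exercise the helper.
public class SampleItem
{
    public int Value { get; set; }
}

public static class XmlStringSketch
{
    public static T FromString<T>(string inputString)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(inputString))
        using (var xmlReader = XmlReader.Create(reader))
        {
            // The same XmlReader is used for the check and for the actual deserialization
            if (serializer.CanDeserialize(xmlReader))
            {
                return (T)serializer.Deserialize(xmlReader);
            }
            return default(T);
        }
    }
}
```

Calling XmlStringSketch.FromString<SampleItem>("<SampleItem><Value>42</Value></SampleItem>") yields an object whose Value is 42. A document whose root element does not match the type is expected to fall through to default(T).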

Can I decompress and deserialize a file using streams?

My application serializes an object using Json.Net, compresses the resulting JSON, then saves it to a file. The application can also load an object from one of these files. These objects can be tens of MB in size, and I'm concerned about memory usage because the existing code creates large strings and byte arrays:
public void Save(MyClass myObject, string filename)
{
    var json = JsonConvert.SerializeObject(myObject);
    var bytes = Compress(json);
    File.WriteAllBytes(filename, bytes);
}

public MyClass Load(string filename)
{
    var bytes = File.ReadAllBytes(filename);
    var json = Decompress(bytes);
    var myObject = JsonConvert.DeserializeObject<MyClass>(json);
    return myObject;
}

private static byte[] Compress(string s)
{
    var bytes = Encoding.Unicode.GetBytes(s);
    using (var ms = new MemoryStream())
    {
        using (var gs = new GZipStream(ms, CompressionMode.Compress))
        {
            gs.Write(bytes, 0, bytes.Length);
            gs.Close();
            return ms.ToArray();
        }
    }
}

private static string Decompress(byte[] bytes)
{
    using (var msi = new MemoryStream(bytes))
    {
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(msi, CompressionMode.Decompress))
            {
                gs.CopyTo(mso);
                return Encoding.Unicode.GetString(mso.ToArray());
            }
        }
    }
}
I was wondering if the Save/Load methods could be replaced with streams? I've found examples of using streams with Json.Net but am struggling to get my head around how to fit in the additional compression stuff.
JsonSerializer has methods to deserialize from a JsonTextReader and to serialize to a StreamWriter, both of which can be created on top of any sort of stream, including a GZipStream. Using them, you can create the following helper methods:
public static partial class JsonExtensions
{
    // Buffer sized as recommended by Bradley Grainger, https://faithlife.codes/blog/2012/06/always-wrap-gzipstream-with-bufferedstream/
    // But anything smaller than 85,000 bytes should be OK, since objects larger than that go on the large object heap. See:
    // https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap
    const int BufferSize = 8192;
    // Disable writing of BOM as per https://datatracker.ietf.org/doc/html/rfc8259#section-8.1
    static readonly Encoding DefaultEncoding = new UTF8Encoding(false);

    public static void SerializeToFileCompressed(object value, string path, JsonSerializerSettings settings = null)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
            SerializeCompressed(value, fs, settings);
    }

    public static void SerializeCompressed(object value, Stream stream, JsonSerializerSettings settings = null)
    {
        using (var compressor = new GZipStream(stream, CompressionMode.Compress))
        using (var writer = new StreamWriter(compressor, DefaultEncoding, BufferSize))
        {
            var serializer = JsonSerializer.CreateDefault(settings);
            serializer.Serialize(writer, value);
        }
    }

    public static T DeserializeFromFileCompressed<T>(string path, JsonSerializerSettings settings = null)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            return DeserializeCompressed<T>(fs, settings);
    }

    public static T DeserializeCompressed<T>(Stream stream, JsonSerializerSettings settings = null)
    {
        using (var compressor = new GZipStream(stream, CompressionMode.Decompress))
        using (var reader = new StreamReader(compressor))
        using (var jsonReader = new JsonTextReader(reader))
        {
            var serializer = JsonSerializer.CreateDefault(settings);
            return serializer.Deserialize<T>(jsonReader);
        }
    }
}
See Performance Tips: Optimize Memory Usage in the Json.NET documentation.
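To try the same layering without touching the file system, the sketch below round-trips a value through an in-memory gzip stream. CompressedJsonRoundTrip and its two methods are hypothetical names for this example. One point to keep in mind: these helpers read and write UTF-8, whereas the question's Compress method used Encoding.Unicode (UTF-16 without a byte order mark), so files produced by the old code would most likely need to be converted once rather than read directly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using Newtonsoft.Json;

public static class CompressedJsonRoundTrip
{
    public static byte[] ToCompressedBytes(object value)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true so that disposing the GZipStream (which flushes the gzip footer)
            // does not close the MemoryStream underneath it
            using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            using (var writer = new StreamWriter(gz, new UTF8Encoding(false)))
            {
                JsonSerializer.CreateDefault().Serialize(writer, value);
            }
            return ms.ToArray();
        }
    }

    public static T FromCompressedBytes<T>(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        using (var gz = new GZipStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(gz, Encoding.UTF8))
        using (var jsonReader = new JsonTextReader(reader))
        {
            return JsonSerializer.CreateDefault().Deserialize<T>(jsonReader);
        }
    }
}
```

The byte array is used here only to keep the example self-contained. For the large objects described in the question, writing straight to a FileStream as in the methods above avoids holding the compressed payload in memory.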
For those looking for a way to use the helper methods from @dbc in UWP apps, I modified the code as follows, where the StorageFile is a file you have write access to.
// Returns Task rather than void so callers can await completion and observe exceptions
public static async Task SerializeToFileCompressedAsync(object value, StorageFile file, JsonSerializerSettings settings = null)
{
    using (var stream = await file.OpenStreamForWriteAsync())
        SerializeCompressed(value, stream, settings);
}

public static void SerializeCompressed(object value, Stream stream, JsonSerializerSettings settings = null)
{
    using (var compressor = new GZipStream(stream, CompressionMode.Compress))
    using (var writer = new StreamWriter(compressor))
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        serializer.Serialize(writer, value);
    }
}

public static async Task<T> DeserializeFromFileCompressedAsync<T>(StorageFile file, JsonSerializerSettings settings = null)
{
    using (var stream = await file.OpenStreamForReadAsync())
        return DeserializeCompressed<T>(stream, settings);
}

public static T DeserializeCompressed<T>(Stream stream, JsonSerializerSettings settings = null)
{
    using (var compressor = new GZipStream(stream, CompressionMode.Decompress))
    using (var reader = new StreamReader(compressor))
    using (var jsonReader = new JsonTextReader(reader))
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        return serializer.Deserialize<T>(jsonReader);
    }
}

Binary Serialization to ResultBuffer in C#

I have a working XML Serializer which serializes a C# object to an entity in AutoCAD. I'd like to be able to do the same thing but with Binary Serialization for the cases in which XML does not work. So far my serialization method looks like this:
public static void BinarySave(Entity entityToWriteTo, Object objToSerialize, string key = "default")
{
    using (MemoryStream stream = new MemoryStream())
    {
        BinaryFormatter serializer = new BinaryFormatter();
        serializer.Serialize(stream, objToSerialize);
        stream.Position = 0;
        ResultBuffer data = new ResultBuffer();
        /*Code to get binary serialization into result buffer*/
        using (Transaction tr = db.TransactionManager.StartTransaction())
        {
            using (DocumentLock docLock = doc.LockDocument())
            {
                if (!entityToWriteTo.IsWriteEnabled)
                {
                    entityToWriteTo = tr.GetObject(entityToWriteTo.Id, OpenMode.ForWrite) as Entity;
                }
                if (entityToWriteTo.ExtensionDictionary == ObjectId.Null)
                {
                    entityToWriteTo.CreateExtensionDictionary();
                }
                using (DBDictionary dict = tr.GetObject(entityToWriteTo.ExtensionDictionary, OpenMode.ForWrite, false) as DBDictionary)
                {
                    Xrecord xrec;
                    if (dict.Contains(key))
                    {
                        xrec = tr.GetObject(dict.GetAt(key), OpenMode.ForWrite) as Xrecord;
                        xrec.Data = data;
                    }
                    else
                    {
                        xrec = new Xrecord();
                        xrec.Data = data;
                        dict.SetAt(key, xrec);
                        tr.AddNewlyCreatedDBObject(xrec, true);
                    }
                    xrec.Dispose();
                }
                tr.Commit();
            }
            data.Dispose();
        }
    }
}
It's heavily based on my XML Serializer except I have no idea how to get the serialized object into a resultbuffer to be added to the Xrecord of entityToWriteTo.
If XML isn't working for you for some reason, I'd suggest trying a different textual data format such as JSON. The free and open-source JSON formatter Json.NET has some support for situations that can trip up XmlSerializer, including:
- Dictionaries.
- Classes lacking default constructors.
- Polymorphic types.
- Complex data conversion and remapping.
Plus, JSON is quite readable so you may be able to diagnose problems in your data by visual examination.
That being said, you can convert the output stream from BinaryFormatter to a base64 string using the following helper methods:
public static class BinaryFormatterHelper
{
    public static string ToBase64String<T>(T obj)
    {
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, obj);
            // GetBuffer() may be longer than the data written, so only encode the first stream.Length bytes.
            return Convert.ToBase64String(stream.GetBuffer(), 0, checked((int)stream.Length)); // Throw an exception on overflow.
        }
    }

    public static T FromBase64String<T>(string data)
    {
        using (var stream = new MemoryStream(Convert.FromBase64String(data)))
        {
            var formatter = new BinaryFormatter();
            var obj = formatter.Deserialize(stream);
            if (obj is T)
                return (T)obj;
            return default(T);
        }
    }
}
The resulting string can then be stored in a ResultBuffer as you would store an XML string.
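If the storage you are writing to limits the length of an individual string value (an assumption here; check the AutoCAD documentation for the actual limits on Xrecord data), the base64 text can be split into fixed-size pieces and joined again before decoding. The sketch below shows only that splitting and joining step. Base64Chunker and the chunk size passed to it are hypothetical, and nothing in it touches the AutoCAD API.

```csharp
using System;
using System.Collections.Generic;

public static class Base64Chunker
{
    // Splits a base64 string into pieces of at most chunkSize characters, preserving order.
    public static List<string> Split(string base64, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var chunks = new List<string>();
        for (int i = 0; i < base64.Length; i += chunkSize)
        {
            chunks.Add(base64.Substring(i, Math.Min(chunkSize, base64.Length - i)));
        }
        return chunks;
    }

    // Concatenates the pieces in order and decodes the result back into bytes.
    public static byte[] Join(IEnumerable<string> chunks)
    {
        return Convert.FromBase64String(string.Concat(chunks));
    }
}
```

Each piece would then be stored as its own string entry, in order, and read back in the same order before calling Join.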
