I need to create a large XML file whose format, unfortunately, I have no control over.
One of the elements in the XML contains hundreds of thousands of child elements (up to 500k), so the XML for that element is quite large, but still smaller than I would expect to cause an OutOfMemoryException.
The code below serialises objects to XML, then uses an XmlReader to stream that XML and pull out the element I need (which is huge). I then try to compress that element (GZip, then Base64), but I can't even get that far, because xmlReader.ReadOuterXml() throws an OutOfMemoryException.
I have tried various ways of doing this, including XmlDocument (.InnerXml/.OuterXml) and XDocument with XElements (.ToString()), but all of them throw OutOfMemoryExceptions.
The reason I'm converting the element to a string in the first place is that my compression method expects a byte[]; I'm not sure whether my approach is flawed here.
Code:
private string GetCompressed(object obj)
{
    XmlSerializer serialiser = new XmlSerializer(obj.GetType());
    // Serialise the object into an in-memory stream so that we can read it back
    using (MemoryStream memoryStream = new MemoryStream())
    {
        // Serialise to the memory stream
        serialiser.Serialize(memoryStream, obj);
        // Reset the position to 0.
        memoryStream.Seek(0, SeekOrigin.Begin);
        // Create an XML reader to stream out the results
        using (XmlReader xmlReader = XmlReader.Create(memoryStream))
        {
            while (xmlReader.Read())
            {
                if (xmlReader.Name == "ElementIWant")
                {
                    return CompressXML(xmlReader.ReadOuterXml()); //<= ReadOuterXml() throws OutOfMemory
                }
            }
        }
    }
    return string.Empty;
}
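One way to avoid building the string at all (a sketch, not taken from the original thread) is to copy the element from the XmlReader straight into a GZipStream with XmlWriter.WriteNode, and Base64-encode only the compressed bytes. CompressElement and its parameters are hypothetical names; the sketch assumes the compressed output, not the raw element, fits comfortably in memory, and the serialised document itself is still held in a MemoryStream as in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Hypothetical helper: finds the first element called elementName in xmlSource and
// returns it GZip-compressed and Base64-encoded, without ever materialising the
// uncompressed element as a single string or byte[].
static string CompressElement(Stream xmlSource, string elementName)
{
    using var reader = XmlReader.Create(xmlSource);
    while (reader.Read())
    {
        // Check NodeType as well as Name, so a matching end tag is never picked up.
        if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
        {
            using var compressed = new MemoryStream();
            var settings = new XmlWriterSettings
            {
                OmitXmlDeclaration = true,
                Encoding = new UTF8Encoding(false) // UTF-8 without a byte-order mark
            };
            // leaveOpen: true keeps 'compressed' open after the GZipStream is disposed.
            using (var gzip = new GZipStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
            using (var writer = XmlWriter.Create(gzip, settings))
            {
                // Copies the current element and its whole subtree, node by node.
                writer.WriteNode(reader, true);
            } // the writer is disposed first, then the GZipStream, flushing everything
            return Convert.ToBase64String(compressed.ToArray());
        }
    }
    return string.Empty;
}
```

To use it with the question's code, pass memoryStream (after the Seek) and "ElementIWant". The XmlWriter flushes into the GZipStream as its buffer fills, so the element is compressed incrementally rather than held in full.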
I am trying to read a JSON file from Cloud Storage and convert it into a Google.Cloud.DocumentAI.V1.Document.
I have done a proof of concept, but it throws the exception Google.Protobuf.InvalidProtocolBufferException: 'Protocol message end-group tag did not match expected tag.'
First I read the .json file into a MemoryStream, then I try to merge it into a Document.
using Google.Cloud.DocumentAI.V1;
public static void StreamToDocument()
{
    byte[] fileContents = File.ReadAllBytes("C:\\upload\\temp.json");
    using (Stream memoryStream = new MemoryStream(fileContents))
    {
        Document doc = new Document();
        var byteArray = memoryStream;
        doc.MergeFrom(byteArray);
    }
}
The error message I am getting is the InvalidProtocolBufferException quoted above.
Is there any other way I can achieve this?
The code that you've specified there expects the data to be binary protobuf content. For JSON, you want:
string json = File.ReadAllText("C:\\upload\\temp.json");
Document document = Document.Parser.ParseJson(json);
I am trying to release the memory used by an XmlDocument object:
using (XmlNodeReader xnrAwards = new XmlNodeReader(ndListItems))
{
    ndListItems.InnerXml = ndListItems.InnerXml.Replace("_x002e_", "_").Replace("ows_", "");
    dsAward.ReadXml(xnrAwards, XmlReadMode.ReadSchema);
    XmlDocument xdocAwards = new XmlDocument();
    xdocAwards.LoadXml(ndListItems.OuterXml);
    xdocAwards.Save(ABCListName + "_XML.xml");
}
Any idea how to release the memory held by this object? This code is giving me an OutOfMemoryException.
Stop using XmlDocument if memory is a concern. It loads the entire document into memory at once, which is what is causing your issues.
Use a stream-based reader instead: XmlReader.
It reads the file through a buffer, a chunk at a time, instead of loading the entire thing.
using (XmlReader reader = XmlReader.Create(file)) {
    while (reader.Read()) {
        // Process each node here
    }
}
Note that this is a forward-only reader, and it's not as straightforward to use as XmlDocument, but because it reads your data in buffered chunks it should keep you from running into further memory exceptions, as long as you don't accumulate the whole document yourself.
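If you're curious about the buffering mechanism: XmlReader reads the underlying stream into an internal character buffer and parses it incrementally, one node per Read() call, so only a small window of the file is in memory at any time. A similar effect is often achieved for text files with yield return (which the compiler turns into a state machine, essentially a switch statement). Here is a blog post of someone doing that with a text file: http://jamesmccaffrey.wordpress.com/2011/02/04/processing-a-text-file-with-the-c-yield-return-pattern/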
ref: http://msdn.microsoft.com/en-us/library/vstudio/system.xml.xmlreader
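If each record in the file is small, the forward-only loop can be combined with LINQ to XML so that each record is still convenient to work with. A sketch, assuming the file consists of many repeated elements with one name; StreamElements and elementName are hypothetical. XNode.ReadFrom loads only the current element, and the helper itself uses yield return, so callers can foreach over the records lazily.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: yields each element called elementName as a small XElement
// while the XmlReader moves forward through the rest of the file.
static IEnumerable<XElement> StreamElements(XmlReader reader, string elementName)
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
        {
            // ReadFrom loads just this element and leaves the reader on the next node,
            // so Read() must not be called here as well (it would skip a sibling).
            yield return (XElement)XNode.ReadFrom(reader);
        }
        else
        {
            reader.Read();
        }
    }
}
```

Process each XElement inside the foreach and let it go out of scope; that way only one record is held in memory at a time instead of the whole document.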
I want to read XML at runtime without saving it to a path.
From my searching, I found that in a console application I can use Console.Out to display the result:
xmlSerializer.Serialize(Console.Out, patient);
In a Windows / web application we need to set a path, like:
StreamWriter streamWriter = new StreamWriter(@"C:\test.xml");
but I need to read the XML without saving it. I am using a web service where I need to read it and decide whether it is valid or not.
I hope I have explained this clearly.
Use the XmlDocument object.
There are several ways to load the XML: you can use XmlDocument.Load() and pass it your URL (or a stream), or use XmlDocument.LoadXml() to load the XML from a string.
You could use the XmlDocument.LoadXml method to read the received XML. There is no need to save it to disk.
try
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(receivedXMLStr);
    // well-formed XML
}
catch (XmlException)
{
    // not well-formed XML
}
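If you only need a yes/no decision, a lighter sketch is to read the string with an XmlReader over a StringReader, which checks the syntax without building a DOM or touching the disk. IsWellFormed is a hypothetical name. Note that, like LoadXml, this only checks that the XML is well-formed; validating against a schema would additionally require setting XmlReaderSettings.ValidationType to ValidationType.Schema and adding the schema to XmlReaderSettings.Schemas.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: checks entirely in memory that an XML string is well-formed.
// Reading every node with XmlReader surfaces syntax errors as XmlException.
static bool IsWellFormed(string xml, out string error)
{
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml));
        while (reader.Read())
        {
            // Nothing to do per node; we only care whether parsing succeeds.
        }
        error = string.Empty;
        return true;
    }
    catch (XmlException ex)
    {
        error = ex.Message; // includes line and position of the problem
        return false;
    }
}
```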
Use LINQ to XML:
XElement doc;
try
{
    doc = XElement.Load(yourStream);
}
catch (XmlException)
{
    // invalid XML
    return;
}
foreach (XElement node in doc.Descendants())
{
    string value = node.Value; // value of this node
    var attributes = node.Attributes(); // all the attributes of this node
}
Thanks to all of you for your replies. I want to load my XML without saving it to a local path, because saving creates a lot of XML files.
I finally found a solution: serialise the object into a MemoryStream and load the XML from there. I think this solution is very simple and efficient:
XmlDocument doc = new XmlDocument();
System.Xml.Serialization.XmlSerializer serializer2 = new System.Xml.Serialization.XmlSerializer(Patients.GetType());
using (System.IO.MemoryStream stream = new System.IO.MemoryStream())
{
    serializer2.Serialize(stream, Patients);
    stream.Position = 0; // rewind so Load() reads from the start
    doc.Load(stream);
}
You need to use XmlSerializer.Deserialize to read the XML. Follow the steps below:
Create a target class. Its structure should mirror the XML.
After creating the class, use the code below to load your XML into the target object:
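TargetType result = null;
XmlSerializer worker = new XmlSerializer(typeof(TargetType));
// Deserialize() takes a Stream, TextReader or XmlReader rather than a string, so wrap the XML in a StringReader
result = (TargetType)worker.Deserialize(new StringReader("<xml>.....</xml>"));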
The XML is now loaded into the object 'result', ready for you to use.
I have the following method that I use to serialize various objects to XML. I then write the XML to a file. All the objects have the proper [DataContract] and [DataMember] attributes.
public static string Serialize<T>(T item)
{
    var builder = new StringBuilder();
    var serializer = new DataContractSerializer(typeof(T));
    using (var xmlWriter = XmlWriter.Create(builder))
    {
        serializer.WriteObject(xmlWriter, item);
        return builder.ToString();
    }
}
The serialization works fine; however, I am missing the end of the content. That is, the string does not contain the full XML document: the end gets truncated. Sometimes the string ends right in the middle of a tag.
There does not seem to be a maximum length that triggers the issue: I have strings of 18k that are incomplete, and strings of 80k that are incomplete as well.
The XML structure is fairly simple and only about 6-8 nodes deep.
Am I missing something?
xmlWriter isn't flushed at the point you call ToString(); try:
using (var xmlWriter = XmlWriter.Create(builder))
{
    serializer.WriteObject(xmlWriter, item);
}
return builder.ToString();
This does the ToString() after the Dispose() on xmlWriter, meaning it will flush any buffered data to the output (builder in this case).
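If you prefer to keep the return inside the using block, calling Flush() explicitly has the same effect. A sketch that does this and also round-trips the result to check that nothing was cut off; the Deserialize helper and the List<string> test data are illustrative assumptions, not part of the original method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Variant of the corrected method: flush explicitly, then read the builder.
static string Serialize<T>(T item)
{
    var builder = new StringBuilder();
    var serializer = new DataContractSerializer(typeof(T));
    using (var xmlWriter = XmlWriter.Create(builder))
    {
        serializer.WriteObject(xmlWriter, item);
        xmlWriter.Flush(); // push the writer's buffered output into builder first
        return builder.ToString();
    }
}

// Hypothetical check: reads the string back; truncated XML would make ReadObject throw.
static T Deserialize<T>(string xml)
{
    var serializer = new DataContractSerializer(typeof(T));
    using var reader = XmlReader.Create(new StringReader(xml));
    return (T)serializer.ReadObject(reader);
}
```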
I'm trying to load an XML document into an XPathDocument object in C#.
My XML documents include this line:
trés dégagée + rade
and when the parser arrives there it gives me this error:
"An error occurred while parsing EntityName"
I assume that's caused by the character "é". Does anybody know how I can avoid this error? My idea is to insert an entity declaration into the XML document and then replace all special characters with entities, but that's a lot of work and I'm not sure it would work. Do you have any other, simpler ideas?
Thanks a lot
Was about to post this and just then the servers went down. I think I've rewritten it correctly from memory:
I think the problem is that, by default, XPathDocument uses an XmlTextReader to parse the contents of the supplied file, and this XmlTextReader uses an EntityHandling setting of ExpandEntities.
In other words, when you rely on the default settings, the XmlTextReader will check the input XML and try to resolve all entities. A better way is to take full control over the XmlReaderSettings yourself (I always do):
string myXMLFile = "SomeFile.xml";
string fileContent = LoadXML(myXMLFile);

private string LoadXML(string xml)
{
    XPathDocument xDoc;
    XmlReaderSettings xrs = new XmlReaderSettings();
    // The following line does the "magic".
    xrs.CheckCharacters = false;
    using (XmlReader xr = XmlReader.Create(xml, xrs))
    {
        xDoc = new XPathDocument(xr);
    }
    if (xDoc != null)
    {
        XPathNavigator xNav = xDoc.CreateNavigator();
        return xNav.OuterXml;
    }
    else
        // Unable to load file
        return null;
}
Typically this is caused by a mismatch between the encoding used to read the file and the file's actual encoding.
At a guess, I would say the file is UTF-8 encoded but you are reading it with a different default encoding.
Try adding more details to your question to get a more definitive answer.
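Following that suggestion, a sketch that makes the encoding explicit instead of relying on a default: decode the file with a StreamReader using the encoding it was actually saved in, and hand the resulting TextReader to XmlReader, so the reader no longer has to guess the encoding from the raw bytes. This assumes you know the file's real encoding; LoadWithEncoding is a hypothetical name, and the ISO-8859-1 data in the check is only an example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: decodes the bytes with an encoding you choose, then passes the
// decoded characters to XmlReader/XPathDocument. If the stream starts with a
// byte-order mark, StreamReader uses the encoding the BOM indicates instead.
static XPathDocument LoadWithEncoding(Stream source, Encoding encoding)
{
    using var text = new StreamReader(source, encoding);
    using var reader = XmlReader.Create(text);
    return new XPathDocument(reader); // the whole document is loaded here
}
```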