C# - XML - Compression

C# - XML - Compression - c#

I have a situation where I am generating a XML file to be submitted to a webservice, sometimes due to the amount of data it exceeds 30mb or 50mb.
I need to compress the file, using c#, .net framework 4.0, rather one of the nodes which has most of the data.. I have no idea how i am going to do it .. is it possible if someone can give me a example of how I can get this done please.
the xml file looks like this
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<HeaderTalk xmlns="http://www.w3schools.com/xml">
<EnvelopeVersion>2.0</EnvelopeVersion>
<Header>
<MessageDetails>
<Class>CHAR-CLM</Class>
</MessageDetails>
<SenderDetails>
<IDAuthentication>
<SenderID>aaaaaa</SenderID>
<Authentication>
<Method>MD5</Method>
<Role>principal</Role>
<Value>a3MweCsv60kkAgzEpXeCqQ==</Value>
</Authentication>
</IDAuthentication>
<EmailAddress>Someone#somewhere.com</EmailAddress>
</SenderDetails>
</Header>
<TalkDetails>
<ChannelRouting>
<Channel>
<URI>1953</URI>
<Product>My product</Product>
<Version>2.0</Version>
</Channel>
</ChannelRouting>
</TalkDetails>
<Body>
<envelope xmlns="http://www.w3schools.com/xml/">
<PeriodEnd>2013-08-13</PeriodEnd>
<IRmark Type="generic">zZrxvJ7JmMNaOyrMs9ZOaRuihkg=</IRmark>
<Sender>Individual</Sender>
<Report>
<AuthOfficial>
<OffName>
<Fore>B</Fore>
<Sur>M</Sur>
</OffName>
<Phone>0123412345</Phone>
</AuthOfficial>
<DefaultCurrency>GBP</DefaultCurrency>
<Claim>
<OrgName>B</OrgName>
<ref>AB12345</ref>
<Repayment>
<Account>
<Donor>
<Fore>Barry</Fore>
</Donor>
<Total>7.00</Total>
</Account>
<Account>
<Donor>
<Fore>Anthony</Fore>
</Donor>
<Total>20.00</Total>
</Account>
</Repayment>
</Claim>
</Report>
</envelope>
</Body>
</HeaderTalk>
The CLAIM node is what I want to Compress , as it can be Millions of records that get included in the XML.
I am a novice in coding, it has taken a long time for me to get this XML generated, and been searching to find a way to compress the node but I just cant get it to work..
the Result needs to be exactly same till the DefaultCurrency node..
and then
</AuthOfficial>
<DefaultCurrency>GBP</DefaultCurrency>
<CompressedPart Type="zip">UEsDBBQAAAAIAFt690K1</CompressedPart>
</Report>
</envelope>
</Body>
</HeaderTalk>
or
</AuthOfficial>
<DefaultCurrency>GBP</DefaultCurrency>
<CompressedPart Type="gzip">UEsDBBQAAAAIAFt690K1</CompressedPart>
</Report>
</envelope>
</Body>
</HeaderTalk>
Thank you everyone in advance please. Or if someone can suggest where I can look and get some idea, on what I want to do.
to create the file , I am simple iterating through a Dataset and Writing the nodes using XmlElements and setting innertexts to my values ..
The Code I have used to write is ..
//claim
XmlElement GovtSenderClaim = xmldoc.CreateElement("Claim");
XmlElement GovtSenderOrgname = xmldoc.CreateElement("OrgName");
GovtSenderOrgname.InnerText = Charity_name;
GovtSenderClaim.AppendChild(GovtSenderOrgname);
XmlElement GovtSenderHMRCref = xmldoc.CreateElement("ref");
GovtSenderHMRCref.InnerText = strref ;
GovtSenderClaim.AppendChild(GovtSenderref);
XmlElement GovtSenderRepayments = xmldoc.CreateElement("Repayment");
while (reader.Read())
{
XmlElement GovtSenderAccount = xmldoc.CreateElement("Account");
XmlElement GovtSenderDonor = xmldoc.CreateElement("Donor");
XmlElement GovtSenderfore = xmldoc.CreateElement("Fore");
GovtSenderfore.InnerText = reader["EmployeeName_first_name"].ToString();
GovtSenderDonor.AppendChild(GovtSenderfore);
GovtSenderAccount .AppendChild(GovtSenderDonor);
XmlElement GovtSenderTotal = xmldoc.CreateElement("Total");
GovtSenderTotal.InnerText = reader["Total"].ToString();
GovtSenderAccount .AppendChild(GovtSenderTotal);
GovtSenderRepayments.AppendChild(GovtSenderAccount );
}
GovtSenderClaim.AppendChild(GovtSenderRepayments);
GovtSenderReport.AppendChild(GovtSenderClaim);
and the rest of the nodes to close..

You can try this: it will compress only the nodes you select. It's a little different from what you asked, because it will replace the content of the element, leaving the element + its attributes as they were.
{
// You are using a namespace!
XNamespace ns = "http://www.w3schools.com/xml/";
var xml2 = XDocument.Parse(xml);
// Compress
{
// Will compress all the XElement that are called Claim
// You should probably select the XElement in a better way
var nodes = from p in xml2.Descendants(ns + "Claim") select p;
foreach (XElement el in nodes)
{
CompressElementContent(el);
}
}
// Decompress
{
// Will decompress all the XElement that are called Claim
// You should probably select the XElement in a better way
var nodes = from p in xml2.Descendants(ns + "Claim") select p;
foreach (XElement el in nodes)
{
DecompressElementContent(el);
}
}
}
public static void CompressElementContent(XElement el)
{
string content;
using (var reader = el.CreateReader())
{
reader.MoveToContent();
content = reader.ReadInnerXml();
}
using (var ms = new MemoryStream())
{
using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Compress))
{
// So that the BOM isn't written we use build manually the encoder.
// See for example http://stackoverflow.com/a/2437780/613130
// But note that false is implicit in the parameterless constructor
using (StreamWriter sw = new StreamWriter(defl, new UTF8Encoding()))
{
sw.Write(content);
}
}
string base64 = Convert.ToBase64String(ms.ToArray());
el.ReplaceAll(new XText(base64));
}
}
public static void DecompressElementContent(XElement el)
{
var reader = el.CreateReader();
reader.MoveToContent();
var content = reader.ReadInnerXml();
var bytes = Convert.FromBase64String(content);
using (var ms = new MemoryStream(bytes))
{
using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Decompress))
{
using (StreamReader sr = new StreamReader(defl, Encoding.UTF8))
{
el.ReplaceAll(ParseXmlFragment(sr));
}
}
}
}
public static IEnumerable<XNode> ParseXmlFragment(StreamReader sr)
{
var settings = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment
};
using (var xmlReader = XmlReader.Create(sr, settings))
{
xmlReader.MoveToContent();
while (xmlReader.ReadState != ReadState.EndOfFile)
{
yield return XNode.ReadFrom(xmlReader);
}
}
}
The decompress is quite complex, because it's difficult to replace the content of an Xml. In the end I split the content XNode by Xnode in ParseXmlFragment and ReplaceAll in DecompressElementContent.
As a sidenote, you have two similar-but-different namespaces in you XML: http://www.w3schools.com/xml and http://www.w3schools.com/xml/
This other variant will do exactly what you asked (so it will create a CompressedPart node) minus the attribute with the type of compression.
{
XNamespace ns = "http://www.w3schools.com/xml/";
var xml2 = XDocument.Parse(xml);
// Compress
{
// Here the ToList() is necessary, because we will replace the selected elements
var nodes = (from p in xml2.Descendants(ns + "Claim") select p).ToList();
foreach (XElement el in nodes)
{
CompressElementContent(el);
}
}
// Decompress
{
// Here the ToList() is necessary, because we will replace the selected elements
var nodes = (from p in xml2.Descendants("CompressedPart") select p).ToList();
foreach (XElement el in nodes)
{
DecompressElementContent(el);
}
}
}
public static void CompressElementContent(XElement el)
{
string content = el.ToString();
using (var ms = new MemoryStream())
{
using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Compress))
{
// So that the BOM isn't written we use build manually the encoder.
using (StreamWriter sw = new StreamWriter(defl, new UTF8Encoding()))
{
sw.Write(content);
}
}
string base64 = Convert.ToBase64String(ms.ToArray());
var newEl = new XElement("CompressedPart", new XText(base64));
el.ReplaceWith(newEl);
}
}
public static void DecompressElementContent(XElement el)
{
var reader = el.CreateReader();
reader.MoveToContent();
var content = reader.ReadInnerXml();
var bytes = Convert.FromBase64String(content);
using (var ms = new MemoryStream(bytes))
{
using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Decompress))
{
using (StreamReader sr = new StreamReader(defl, Encoding.UTF8))
{
var newEl = XElement.Parse(sr.ReadToEnd());
el.ReplaceWith(newEl);
}
}
}
}

I need to compress the file, using c#, .net framework 4.0, rather one of the nodes
You can use GZip compression. Something like
public static void Compress(FileInfo fileToCompress)
{
using (FileStream originalFileStream = fileToCompress.OpenRead())
{
if ((File.GetAttributes(fileToCompress.FullName) & FileAttributes.Hidden) != FileAttributes.Hidden & fileToCompress.Extension != ".gz")
{
using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz"))
{
using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
{
originalFileStream.CopyTo(compressionStream);
Console.WriteLine("Compressed {0} from {1} to {2} bytes.",
fileToCompress.Name, fileToCompress.Length.ToString(), compressedFileStream.Length.ToString());
}
}
}
}
}
public static void Decompress(FileInfo fileToDecompress)
{
using (FileStream originalFileStream = fileToDecompress.OpenRead())
{
string currentFileName = fileToDecompress.FullName;
string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);
using (FileStream decompressedFileStream = File.Create(newFileName))
{
using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
{
decompressionStream.CopyTo(decompressedFileStream);
Console.WriteLine("Decompressed: {0}", fileToDecompress.Name);
}
}
}
}
Another possible way of doing is by Deflate. See here. The major difference between GZipStream and Deflate stream will be that GZipStream will add CRC to ensure the data has no error.

What you're asking is possible, but somewhat involved. You'll need to create the compressed node in memory and then write it. I don't know how you're writing your XML, so I'll assume that you have something like:
open xml writer
write <MessageDetails>
write <SenderDetails>
write other nodes
write Claim node
write other stuff
close file
To write your claim node, you'll want to write to an in-memory stream and then base64 encode it. The resulting string is what you write to the file as your <CompressedPart>.
string compressedData;
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gz = new GZipStream(CompressionMode.Compress, true))
{
using (XmlWriter writer = XmlWriter.Create(gz))
{
writer.WriteStartElement("Claim");
// write claim stuff here
writer.WriteEndElement();
}
}
// now base64 encode the memory stream buffer
byte[] buff = ms.GetBuffer();
compressedData = Convert.ToBase64String(buff, 0, buff.Length);
}
Your data is then in the compressedData string, which you can write as element data.
As I said in my comment, GZip will typically take 80% off your raw XML size, so that 50 MB becomes 10 MB. But base64 encoding will add 33% to the compressed size. I'd expect the result to be approximately 13.5 MB.
Update
Based on your additional code, what you're trying to do doesn't look too difficult. I think what you want to do is:
// do a bunch of stuff
GovtSenderClaim.AppendChild(GovtSenderRepayments);
// start of added code
// compress the GovtSenderClaim element
// This code writes the GovtSenderClaim element to a compressed MemoryStream.
// We then read the MemoryStream and create a base64 encoded representation.
string compressedData;
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gz = new GZipStream(CompressionMode.Compress, true))
{
using (StreamWriter writer = StreamWriter(gz))
{
GovtSenderClaim.Save(writer);
}
}
// now base64 encode the memory stream buffer
byte[] buff = ms.ToArray();
compressedData = Convert.ToBase64String(buff, 0, buff.Length);
}
// compressedData now contains the compressed Claim node, encoded in base64.
// create the CompressedPart element
XElement CompressedPart = xmldoc.CreateElement("CompressedPart");
CompressedPart.SetAttributeValue("Type", "gzip");
CompressedPart.SetValue(compressedData);
GovtSenderReport.AppendChild(CompressedPart);
// GovtSenderReport.AppendChild(GovtSenderClaim);

This is what I have done to make it work ..
public void compressTheData(string xml)
{
XNamespace ns = "http://www.w3schools.com/xml/";
var xml2 = XDocument.Load(xml);
// Compress
{
var nodes = (from p in xml2.Descendants(ns + "Claim") select p).ToList();
foreach (XElement el in nodes)
{
CompressElementContent(el);
}
}
xml2.Save(xml);
}
public static void CompressElementContent(XElement el)
{
string content = el.ToString();
using (var ms = new MemoryStream())
{
using (GZipStream defl = new GZipStream(ms, CompressionMode.Compress))
{
using (StreamWriter sw = new StreamWriter(defl))
{
sw.Write(content);
}
}
string base64 = Convert.ToBase64String(ms.ToArray());
XElement newEl = new XElement("CompressedPart", new XText(base64));
XAttribute attrib = new XAttribute("Type", "gzip");
newEl.Add(attrib);
el.ReplaceWith(newEl);
}
}
Thank you everyone for your inputs.

Related

XMLSerializer - issue with UTF-8 vs UTF-16 Code

I am trying to serialize a simple object (5 string properties) into XML to save to a DB Image field. Then I need to DeSerialize it back into a string later in the program.
However, I am getting some errors - caused by the XML being saved thinking it is in UTF-16 - however, when I load it from the DB back into a string - it thinks it is a UTF 8 String.
The error I get is
InnerException {"There is no Unicode byte order mark. Cannot switch to Unicode."} System.Exception {System.Xml.XmlException}
-- Message "There is an error in XML document (0, 0)." string
Is this happening because of the two different ways I save and load the string to/from the DB? On the save I am using a StringBuilder - but on the load from DB I am using just a String.
Thoughts?
Serialize and Save to DB
// Now Save the OBject XML to the Query Tables
var serializer = new XmlSerializer(ExportConfig.GetType());
StringBuilder StringResult = new StringBuilder();
using (var writer = XmlWriter.Create(StringResult))
{
serializer.Serialize(writer, ExportConfig);
}
//MessageBox.Show("XML : " + StringResult);
// Now Save to the Query
try
{
string UpdateSQL = "Update ZQryRpt "
+ " Set ExportConfig = " + TAGlobal.QuotedStr(StringResult.ToString())
+ " where QryId = " + TAGlobal.QuotedStr(((DataRowView)bindingSource_zQryRpt.Current).Row["QryID"].ToString())
;
ExecNonSelectSQL(UpdateSQL, uniConnection_Config);
}
catch (Exception Error)
{
MessageBox.Show("Error Setting ExportConfig: " + Error.Message);
}
Load from DB And Deserialize
byte[] binaryData = (byte[])((DataRowView)bindingSource_zQryRpt.Current).Row["ExportConfig"];
string XMLStored = System.Text.Encoding.UTF8.GetString(binaryData, 0, binaryData.Length);
if (XMLStored.Length > 0)
{
IIDExportObject ExportConfig = new IIDExportObject();
var serializer = new XmlSerializer(ExportConfig.GetType());
//StringBuilder StringResult = new StringBuilder(XMLStored);
// Load the XML from the Query into the StringBuilder
// Now we need to build a Stream from the String to use in the XMLReader
byte[] byteArray = Encoding.UTF8.GetBytes(XMLStored);
MemoryStream stream = new MemoryStream(byteArray);
using (var reader = XmlReader.Create(stream))
{
ExportConfig = (IIDExportObject)serializer.Deserialize(reader);
}
}

John - thank you very much for the comment! It allowed me to complete the code and find a solution.
As you noted - using a stream reader was the solution - but I could not read the first line because there was only one 'line' in my string. However, I could use the line
using (StreamReader sr = new StreamReader(stream, false))
Which allows me to read the stream and ignore the "Byte Order Mark Detection" set to false.
string XMLStored = MainFormRef.GetExportConfigForCurrentQuery();
if (XMLStored.Length > 0)
{
IIDExportObject ExportConfig = new IIDExportObject();
try
{
var serializer = new XmlSerializer(ExportConfig.GetType());
// Now we need to build a Stream from the String to use in the XMLReader
byte[] byteArray = Encoding.UTF8.GetBytes(XMLStored);
MemoryStream stream = new MemoryStream(byteArray);
// Now we need to use a StreamReader to get around UTF8 vs UTF16 issues
// A little cumbersome - but it works
using (StreamReader sr = new StreamReader(stream, false))
{
using (var reader = XmlReader.Create(sr))
{
ExportConfig = (IIDExportObject)serializer.Deserialize(reader);
}
}
}
catch
{
}
I am not sure this is the best solution - but it works. I will be curious to see if anyone else has a better way of dealing with this.

Thanks to G Bradley, I took his answer and generalized it a bit to make it a bit easier to call.
public static string SerializeToXmlString<T>(T objectToSerialize)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = false;
settings.Encoding = Encoding.UTF8;
StringBuilder builder = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(builder, settings))
{
serializer.Serialize(writer, objectToSerialize);
}
return builder.ToString();
}
public static T DeserializeFromXmlString<T>(string xmlString)
{
if (string.IsNullOrWhiteSpace(xmlString))
return default;
var serializer = new XmlSerializer(typeof(T));
byte[] byteArray = Encoding.UTF8.GetBytes(xmlString);
MemoryStream stream = new MemoryStream(byteArray);
using (StreamReader sr = new StreamReader(stream, false))
{
using (var reader = XmlReader.Create(sr))
{
return (T)serializer.Deserialize(reader);
}
}
}

Edit ZipArchive in memory .NET

I'm trying to edit a XmlDocument file contained in a Zip file:
var zip = new ZipArchive(myZipFileInMemoryStream, ZipArchiveMode.Update);
var entry = zip.GetEntry("filenameToEdit");
using (var st = entry.Open())
{
var xml = new XmlDocument();
xml.Load(st);
foreach (XmlElement el in xml.GetElementsByTagName("Relationship"))
{
if(el.HasAttribute("Target") && el.GetAttribute("Target").Contains(".dat")){
el.SetAttribute("Target", path);
}
}
xml.Save(st);
}
After executing this code the contained file is not changed. IF instead of xml.Save(st); I write the xml to disk, I got the edited one.
Why is the edited file not written to the zip? How do I fix it?
EDIT:
I updated the code:
var tmp = new MemoryStream();
using (var zip = new ZipArchive(template, ZipArchiveMode.Read, true))
{
var entry = zip.GetEntry("xml");
using (var st = entry.Open())
{
var xml = new XmlDocument();
xml.Load(st);
foreach (XmlElement el in xml.GetElementsByTagName("Relationship"))
{
if (el.HasAttribute("Target") && el.GetAttribute("Target").Contains(".dat"))
{
el.SetAttribute("Target", path);
}
}
xml.Save(tmp);
}
}
using (var zip = new ZipArchive(template, ZipArchiveMode.Update, true))
{
var entry = zip.GetEntry("xml");
using (var st = entry.Open())
{
tmp.Position = 0;
tmp.CopyTo(st);
}
}
In this way the zip file is edited, but it works only if the length of the streams is equal. If tmp is shorter the rest of the st is still in the file...
Hints?

I use this code to create a Zip InMemory (using the DotNetZip Library) :
MemoryStream saveStream = new MemoryStream();
ZipFile arrangeZipFile = new ZipFile();
arrangeZipFile.AddEntry("test.xml", "content...");
arrangeZipFile.Save(saveStream);
saveStream.Seek(0, SeekOrigin.Begin);
saveStream.Flush(); // might be useless, because it's in memory...
After that I have a valid Zip inside the MemoryStream. I'm not sure why I added the Flush() - I would guess this is redundant.
To edit an existing Zip you could read it in a MemoryStream and instead of creating "new ZipFile()" use "new ZipFile(byteArray...)".

Writing to Filestream and copying to MemoryStream

I want to overwrite or create an xml file on disk, and return the xml from the function. I figured I could do this by copying from FileStream to MemoryStream. But I end up appending a new xml document to the same file, instead of creating a new file each time.
What am I doing wrong? If I remove the copying, everything works fine.
public static string CreateAndSave(IEnumerable<OrderPage> orderPages, string filePath)
{
if (orderPages == null || !orderPages.Any())
{
return string.Empty;
}
var xmlBuilder = new StringBuilder();
var writerSettings = new XmlWriterSettings
{
Indent = true,
Encoding = Encoding.GetEncoding("ISO-8859-1"),
CheckCharacters = false,
ConformanceLevel = ConformanceLevel.Document
};
using (var fs = new FileStream(filePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
try
{
XmlWriter xmlWriter = XmlWriter.Create(fs, writerSettings);
xmlWriter.WriteStartElement("PRINT_JOB");
WriteXmlAttribute(xmlWriter, "TYPE", "Order Confirmations");
foreach (var page in orderPages)
{
xmlWriter.WriteStartElement("PAGE");
WriteXmlAttribute(xmlWriter, "FORM_TYPE", page.OrderType);
var outBound = page.Orders.SingleOrDefault(x => x.FlightInfo.Direction == FlightDirection.Outbound);
var homeBound = page.Orders.SingleOrDefault(x => x.FlightInfo.Direction == FlightDirection.Homebound);
WriteXmlOrder(xmlWriter, outBound, page.ContailDetails, page.UserId, page.PrintType, FlightDirection.Outbound);
WriteXmlOrder(xmlWriter, homeBound, page.ContailDetails, page.UserId, page.PrintType, FlightDirection.Homebound);
xmlWriter.WriteEndElement();
}
xmlWriter.WriteFullEndElement();
MemoryStream destination = new MemoryStream();
fs.CopyTo(destination);
Log.Progress("Xml string length: {0}", destination.Length);
xmlBuilder.Append(Encoding.UTF8.GetString(destination.ToArray()));
destination.Flush();
destination.Close();
xmlWriter.Flush();
xmlWriter.Close();
}
catch (Exception ex)
{
Log.Warning(ex, "Unhandled exception occured during create of xml. {0}", ex.Message);
throw;
}
fs.Flush();
fs.Close();
}
return xmlBuilder.ToString();
}
Cheers
Jens

FileMode.OpenOrCreate is causing the file contents to be overwritten without shortening, leaving any 'trailing' data from previous runs. If FileMode.Create is used the file will be truncated first. However, to read back the contents you just wrote you will need to use Seek to reset the file pointer.
Also, flush the XmlWriter before copying from the underlying stream.
See also the question Simultaneous Read Write a file in C Sharp (3817477).
The following test program seems to do what you want (less your own logging and Order details).
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Threading.Tasks;
namespace ReadWriteTest
{
class Program
{
static void Main(string[] args)
{
string filePath = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.Personal),
"Test.xml");
string result = CreateAndSave(new string[] { "Hello", "World", "!" }, filePath);
Console.WriteLine("============== FIRST PASS ==============");
Console.WriteLine(result);
result = CreateAndSave(new string[] { "Hello", "World", "AGAIN", "!" }, filePath);
Console.WriteLine("============== SECOND PASS ==============");
Console.WriteLine(result);
Console.ReadLine();
}
public static string CreateAndSave(IEnumerable<string> orderPages, string filePath)
{
if (orderPages == null || !orderPages.Any())
{
return string.Empty;
}
var xmlBuilder = new StringBuilder();
var writerSettings = new XmlWriterSettings
{
Indent = true,
Encoding = Encoding.GetEncoding("ISO-8859-1"),
CheckCharacters = false,
ConformanceLevel = ConformanceLevel.Document
};
using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
{
try
{
XmlWriter xmlWriter = XmlWriter.Create(fs, writerSettings);
xmlWriter.WriteStartElement("PRINT_JOB");
foreach (var page in orderPages)
{
xmlWriter.WriteElementString("PAGE", page);
}
xmlWriter.WriteFullEndElement();
xmlWriter.Flush(); // Flush from xmlWriter to fs
xmlWriter.Close();
fs.Seek(0, SeekOrigin.Begin); // Go back to read from the begining
MemoryStream destination = new MemoryStream();
fs.CopyTo(destination);
xmlBuilder.Append(Encoding.UTF8.GetString(destination.ToArray()));
destination.Flush();
destination.Close();
}
catch (Exception ex)
{
throw;
}
fs.Flush();
fs.Close();
}
return xmlBuilder.ToString();
}
}
}
For the optimizers out there, the StringBuilder was unnecessary because the string is formed whole and the MemoryStream can be avoided by just wrapping fs in a StreamReader. This would make the code as follows.
public static string CreateAndSave(IEnumerable<string> orderPages, string filePath)
{
if (orderPages == null || !orderPages.Any())
{
return string.Empty;
}
string result;
var writerSettings = new XmlWriterSettings
{
Indent = true,
Encoding = Encoding.GetEncoding("ISO-8859-1"),
CheckCharacters = false,
ConformanceLevel = ConformanceLevel.Document
};
using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
{
try
{
XmlWriter xmlWriter = XmlWriter.Create(fs, writerSettings);
xmlWriter.WriteStartElement("PRINT_JOB");
foreach (var page in orderPages)
{
xmlWriter.WriteElementString("PAGE", page);
}
xmlWriter.WriteFullEndElement();
xmlWriter.Close(); // Flush from xmlWriter to fs
fs.Seek(0, SeekOrigin.Begin); // Go back to read from the begining
var reader = new StreamReader(fs, writerSettings.Encoding);
result = reader.ReadToEnd();
// reader.Close(); // This would just flush/close fs early(which would be OK)
}
catch (Exception ex)
{
throw;
}
}
return result;
}

I know I'm late, but there seems to be a simpler solution. You want your function to generate xml, write it to a file and return the generated xml. Apparently allocating a string cannot be avoided (because you want it to be returned), same for writing to a file. But reading from a file (as in your and SensorSmith's solutions) can easily be avoided by simply "swapping" the operations - generate xml string and write it to a file. Like this:
var output = new StringBuilder();
var writerSettings = new XmlWriterSettings { /* your settings ... */ };
using (var xmlWriter = XmlWriter.Create(output, writerSettings))
{
// Your xml generation code using the writer
// ...
// You don't need to flush the writer, it will be done automatically
}
// Here the output variable contains the xml, let's take it...
var xml = output.ToString();
// write it to a file...
File.WriteAllText(filePath, xml);
// and we are done :-)
return xml;
IMPORTANT UPDATE: It turns out that the XmlWriter.Create(StringBuider, XmlWriterSettings) overload ignores the Encoding from the settings and always uses "utf-16", so don't use this method if you need other encoding.

Using DataContractJsonSerializer to create a Non XML Json file

I want to use the DataContractJsonSerializer to serialize to file in JsonFormat.
The problem is that the WriteObjectmethod only has 3 options XmlWriter, XmlDictionaryWriter and Stream.
To get what I want I used the following code:
var js = new DataContractJsonSerializer(typeof(T), _knownTypes);
using (var ms = new MemoryStream())
{
js.WriteObject(ms, item);
ms.Position = 0;
using (var sr = new StreamReader(ms))
{
using (var writer = new StreamWriter(path, false))
{
string jsonData = sr.ReadToEnd();
writer.Write(jsonData);
}
}
}
Is this the only way or have I missed something?

Assuming you're just trying to write the text to a file, it's not clear why you're writing it to a MemoryStream first. You can just use:
var js = new DataContractJsonSerializer(typeof(T), _knownTypes);
using (var stream = File.Create(path))
{
js.WriteObject(stream, item);
}
That's rather simpler, and should do what you want...

I am actually quite terrified to claim to know something that Jon Skeet doesn't, but I have used code similar to the following which produces the Json text file and maintains proper indentation:
var js = new DataContractJsonSerializer(typeof(T), _knownTypes);
using (var stream = File.Create(path))
{
using (var writer = JsonReaderWriterFactory.CreateJsonWriter(stream, Encoding.UTF8, true, true, "\t"))
{
js.WriteObject(writer, item);
writer.Flush();
}
}
(as suggested here.)

C# - From NetworkStream to XElement managing different encoding

I've an application that search XML over the network (using TcpClient), these XML have various encoding (one site in UTF8, other in Windows-1252). I would like encode all of these XML in UTF-8 (always) to be sure I'm clean.
How can I do the conversion from the NetworkStream to an XElement encoding correctly all data?
I've this :
NetworkStream _clientStream = /* ... */;
MemoryStream _responseBytes = new MemoryStream();
// serverEncoding -> Xml Encoding I get from server
// _UTF8Encoder -> Local encoder (always UTF8)
try
{
_clientStream.CopyTo(_responseBytes);
if (serverEncoding != _UTF8Encoder)
{
MemoryStream encodedStream = new MemoryStream();
string line = null;
using (StreamReader reader = new StreamReader(_responseBytes))
{
using (StreamWriter writer = new StreamWriter(encodedStream))
{
while ((line = reader.ReadLine()) != null)
{
writer.WriteLine(
Encoding.Convert(serverEncoding, _UTF8Encoder, serverEncoding.GetBytes(line))
);
}
}
}
_responseBytes = encodedStream;
}
_responseBytes.Position = 0;
using (XmlReader reader = XmlReader.Create(_responseBytes))
{
xmlResult = XElement.Load(reader, LoadOptions.PreserveWhitespace);
}
}
catch (Exception ex)
{ }
Have you a better solution (and by ignoring all '\0' ?).
Edit
This works :
byte[] b = _clientStream.ReadToEnd();
var text = _UTF8Encoder.GetString(b, 0, b.Length);
xmlResult = XElement.Parse(text, LoadOptions.PreserveWhitespace);
But this not :
using (var reader = new StreamReader(_clientStream, false))
xmlResult = XElement.Load(reader, LoadOptions.PreserveWhitespace);
I don't understand why ...

You can simply create a StreamReader around the NetworkStream, passing the encoding of the stream, then pass it to XElement.Load:
XElement elem
using(var reader = new StreamReader = new StreamReader(_clientStream, serverEncoding))
elem = XElement.Load(reader);
There is no point in manually transcoding it to a different encoding.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# - XML - Compression - c#

Related

XMLSerializer - issue with UTF-8 vs UTF-16 Code

Edit ZipArchive in memory .NET

Writing to Filestream and copying to MemoryStream

Using DataContractJsonSerializer to create a Non XML Json file

C# - From NetworkStream to XElement managing different encoding

Categories

Resources