Why is Windows Azure storage blob XML truncated? - c#

I'm trying to save an xml file into a blob. I have no error, everything seems fine except when I navigate to the blob url I see a blank page. If I look at the source code of the web page I can see my xml but truncated.
Here is the code.
StringBuilder fileString = new StringBuilder();
XmlWriterSettings xmlSettings = new XmlWriterSettings
{
    Encoding = new UTF8Encoding(false)
};
using (XmlWriter writer = XmlWriter.Create(fileString, xmlSettings))
{
    // bla bla
}
CloudBlockBlob fileBlob = container.GetBlockBlobReference("site.xml");
fileBlob.UploadText(fileString.ToString());

I found the solution in another post. I never pinned down the exact problem, although it has to do with the text always being encoded as UTF-16 despite the writer being set up as UTF-8. I am now using a Stream and it works fine.
MemoryStream fileString = new MemoryStream();
XmlWriterSettings xmlSettings = new XmlWriterSettings
{
    Encoding = Encoding.UTF8,
    Indent = true
};
using (XmlWriter writer = XmlWriter.Create(fileString, xmlSettings))
{
    // bla bla
}
CloudBlockBlob fileBlob = container.GetBlockBlobReference("site.xml");
fileBlob.UploadText(StreamToString(fileString));

private static string StreamToString(Stream stream)
{
    // Rewind before reading, otherwise the reader starts at the end of the stream
    stream.Position = 0;
    var reader = new StreamReader(stream);
    return reader.ReadToEnd();
}

If you have decided to use a Stream, you can also upload it using UploadFromStream instead of UploadText, which encodes the string into a sequence of bytes, creates a memory stream, and calls UploadFromStream anyway.
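To make the encoding point concrete, here is a minimal sketch (not the poster's actual code) that writes the same tiny document once through a StringBuilder and once through a MemoryStream. The element name "site" and the class and method names are made up for illustration. The blob upload itself is left as a comment, since it needs a real container; the idea is simply that the rewound stream could be handed to UploadFromStream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingDeclarationDemo
{
    // Writes <site/> into a StringBuilder. The settings ask for UTF-8,
    // but the writer sits on top of a StringWriter, which reports UTF-16.
    public static string ViaStringBuilder()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("site");
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Writes <site/> straight into a stream, so the requested encoding is honoured.
    public static MemoryStream ViaStream()
    {
        var stream = new MemoryStream();
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("site");
            writer.WriteEndElement();
        }
        stream.Position = 0; // rewind so a consumer reads from the start
        return stream;
    }

    public static void Main()
    {
        Console.WriteLine(ViaStringBuilder());
        using (MemoryStream stream = ViaStream())
        {
            Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()));
            // Assumption: with a real container you would upload the stream here, e.g.
            // fileBlob.UploadFromStream(stream);
        }
    }
}
```

The first line printed should declare utf-16 and the second utf-8, which is the mismatch described in the question.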

Related

XmlWriter Encoding in C#

I have an XmlDocument and I am saving it with an XmlWriter, using this post. Despite setting the Encoding to Utf-8 and the file getting saved with Utf-8 encoding in fact, the xml declaration in the file has the "utf-16" as the value of the encoding attribute.
I can't see where the error is in my code:
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings
{
    Encoding = Encoding.UTF8
};
using (XmlWriter writer = XmlWriter.Create(sb, settings))
{
    xDoc.Save(writer);
}
using (
    StreamWriter sw = new StreamWriter(
        new FileStream(strXmlName, FileMode.Create, FileAccess.Write),
        Encoding.UTF8
    )
)
{
    sw.Write(sb.ToString());
}
The reason for this is covered in the question @dbc links to in the comments: The overload of XmlWriter.Create that accepts a StringBuilder will create a StringWriter, which has its encoding set to UTF-16.
However, in this case it's not clear why you're using a StringBuilder when your goal is to write to a file. You could create an XmlWriter for the file directly:
var settings = new XmlWriterSettings
{
    Indent = true
};
using (var writer = XmlWriter.Create(strXmlName, settings))
{
    xDoc.WriteTo(writer);
}
The encoding here will default to UTF-8.
As an aside, I'd suggest you check out the much newer XDocument and friends; it's a much friendlier API than XmlDocument.
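For what the XDocument route could look like, here is a rough sketch. The document content, the temp file, and the class and method names are placeholders, not anything from the question. It saves directly to a file path and reads the text back so you can look at the declaration.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class XDocumentSaveDemo
{
    // Saves a small document to the given path and returns the file's text.
    public static string SaveAndReadBack(string path)
    {
        var doc = new XDocument(
            new XElement("Root",
                new XElement("Leaf", "data")));

        // No StringBuilder in between: the document goes straight to the file.
        doc.Save(path);
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".xml");
        try
        {
            Console.WriteLine(SaveAndReadBack(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because no StringWriter is involved, the declaration in the saved file should say utf-8 rather than utf-16.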

Hyphen in XML causes XmlWriter to fail with UTF-8 error

I have XML with the element: <DESCRIPTION>fault – No reply</DESCRIPTION>
I am transforming the Xml from a web-service as follows based on Jon Skeet's code https://stackoverflow.com/a/427737/197229 (the original Xml validates fine):
public sealed class StringWriterUTF8 : StringWriter
{
public override Encoding Encoding
{
get { return Encoding.UTF8; }
}
}
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader streamReader = new StreamReader(stream);
string xml = streamReader.ReadToEnd();
logger.Log().Debug(String.Format("Received Xml:\n{0}", xml));
if(Transform != null)
{
using (var stringReader = new StringReader(xml))
using (var xmlReader = XmlReader.Create(stringReader))
using (var stringWriter = new StringWriterUTF8())
using (var xmlTextWriter = XmlWriter.Create(stringWriter, new XmlWriterSettings()
{ Indent= true}))
{
Transform.Transform(xmlReader,xmlTextWriter);
xml = stringWriter.ToString();
logger.Log().Debug(String.Format("Transformed Xml:\n{0}", xml));
}
}
Everything looks great... but the generated XML is failing validation when I try to use it, even though to the naked eye it looks fine. If I remove that hyphen, there are no problems.
I don't understand why the original XML is fine and the .Net classes are getting tripped up, but if I try and validate the Xml in Notepad++ I get this:
Input is not proper UTF-8, indicate encoding ! Bytes: 0x96 0x20 0x4E 0x6F
How can I resolve this? All I want to do is receive Xml and transform it to a new Xml file without encoding weirdness!

How to get Xml as string from XDocument?

I am new to LINQ to XML. After you have built XDocument, how do you get the OuterXml of it like you did with XmlDocument?
You only need to use the overridden ToString() method of the object:
XDocument xmlDoc ...
string xml = xmlDoc.ToString();
This works with all XObjects, like XElement, etc.
I don't know when this changed, but today (July 2017) when trying the answers out, I got
"System.Xml.XmlDocument"
Instead of ToString(), you can use the originally intended way of accessing the XmlDocument content: writing the xml doc to a stream.
XmlDocument xml = ...;
string result;
using (StringWriter writer = new StringWriter())
{
xml.Save(writer);
result = writer.ToString();
}
Several responses give a slightly incorrect answer.
XDocument.ToString() omits the XML declaration (and, according to @Alex Gordon, may return invalid XML if it contains encoded unusual characters like &).
Saving XDocument to StringWriter will cause .NET to emit encoding="utf-16", which you most likely don't want (if you save XML as a string, it's probably because you want to later save it as a file, and the de facto standard for saving files is UTF-8; .NET saves text files as UTF-8 unless specified otherwise).
@Wolfgang Grinfeld's answer is heading in the right direction, but it's unnecessarily complex.
Use the following:
var memory = new MemoryStream();
xDocument.Save(memory);
string xmlText = Encoding.UTF8.GetString(memory.ToArray());
This will return XML text with UTF-8 declaration.
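A small side-by-side sketch of the three approaches discussed so far may help. The sample document and all names here are invented for illustration. Note that the string decoded from the MemoryStream may begin with a byte order mark character, depending on how the document was saved.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XDocumentToTextDemo
{
    public static XDocument Sample() =>
        new XDocument(new XElement("Root", new XElement("Leaf", "data")));

    // ToString(): markup only, no XML declaration.
    public static string ViaToString() => Sample().ToString();

    // StringWriter: declaration is present but says utf-16.
    public static string ViaStringWriter()
    {
        using (var writer = new StringWriter())
        {
            Sample().Save(writer);
            return writer.ToString();
        }
    }

    // MemoryStream: declaration is present and says utf-8.
    public static string ViaMemoryStream()
    {
        using (var memory = new MemoryStream())
        {
            Sample().Save(memory);
            return Encoding.UTF8.GetString(memory.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(ViaToString());
        Console.WriteLine(ViaStringWriter());
        Console.WriteLine(ViaMemoryStream());
    }
}
```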
Doing XDocument.ToString() may not get you the full XML.
In order to get the XML declaration at the start of the XML document as a string, use the XDocument.Save() method:
var ms = new MemoryStream();
using (var xw = XmlWriter.Create(new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"))))
new XDocument(new XElement("Root", new XElement("Leaf", "data"))).Save(xw);
var myXml = Encoding.GetEncoding("ISO-8859-1").GetString(ms.ToArray());
Use ToString() to convert XDocument into a string:
string result = string.Empty;
XElement root = new XElement("xml",
    // XCData emits a real CDATA section; a literal "<![CDATA[...]]>" string would be escaped as plain text
    new XElement("MsgType", new XCData("text")),
    new XElement("Content", new XCData("Hi, this is Wilson Wu Testing for you! You can ask any question but no answer can be replied....")),
    new XElement("FuncFlag", 0)
);
result = root.ToString();
While @wolfgang-grinfeld's answer is technically correct (it also produces the XML declaration, unlike plain .ToString()), the code generates a UTF-8 byte order mark (BOM), which for some reason the XDocument.Parse(string) method cannot process: it throws a "Data at the root level is invalid. Line 1, position 1." error.
So here is another solution without the BOM:
var utf8Encoding =
    new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (var memory = new MemoryStream())
using (var writer = XmlWriter.Create(memory, new XmlWriterSettings
{
    OmitXmlDeclaration = false,
    Encoding = utf8Encoding
}))
{
    CompanyDataXml.Save(writer);
    writer.Flush();
    return utf8Encoding.GetString(memory.ToArray());
}
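If you want to check the BOM-free approach end to end, something along these lines should do. It is only a sketch: ToUtf8XmlString is a hypothetical helper name, and the sample document is made up. It serializes without a BOM and then feeds the result back into XDocument.Parse.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class BomFreeXmlDemo
{
    // Hypothetical helper: XDocument -> string with a utf-8 declaration and no BOM.
    public static string ToUtf8XmlString(XDocument doc)
    {
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        using (var memory = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = utf8NoBom };
            using (XmlWriter writer = XmlWriter.Create(memory, settings))
            {
                doc.Save(writer);
            } // disposing the writer flushes everything into the stream
            return utf8NoBom.GetString(memory.ToArray());
        }
    }

    public static void Main()
    {
        var doc = new XDocument(new XElement("Root", new XElement("Leaf", "data")));
        string xml = ToUtf8XmlString(doc);
        Console.WriteLine(xml);

        // Round trip: the string starts with '<', so Parse accepts it.
        XDocument parsed = XDocument.Parse(xml);
        Console.WriteLine(parsed.Root.Name);
    }
}
```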
I found this example in the Microsoft .NET 6 documentation for the XDocument.Save method. I think it answers the original question (what is the XDocument equivalent of XmlDocument.OuterXml), and also addresses the concerns that others have pointed out already. By using XmlWriterSettings you can predictably control the string output.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.save
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
using (XmlWriter xw = XmlWriter.Create(sb, xws)) {
XDocument doc = new XDocument(
new XElement("Child",
new XElement("GrandChild", "some content")
)
);
doc.Save(xw);
}
Console.WriteLine(sb.ToString());
Looking at these answers, I see a lot of unnecessary complexity and inefficiency in pursuit of generating the XML declaration automatically. But since the declaration is so simple, there isn't much value in generating it. Just KISS (keep it simple, stupid):
// Extension method
public static string ToStringWithDeclaration(this XDocument doc, string declaration = null)
{
declaration ??= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";
return declaration + doc.ToString();
}
// Usage
string xmlString = doc.ToStringWithDeclaration();
// Or
string xmlString = doc.ToStringWithDeclaration("...");
Using XmlWriter instead of ToString() can give you more control over how the output is formatted (such as if you want indentation), and it can write to other targets besides string.
The reason to target a memory stream is performance. It lets you skip the step of storing the XML in a string (since you know the data must end up in a different encoding eventually, whereas string is always UTF-16 in C#). For instance, for an HTTP request:
// Extension method
public static ByteArrayContent ToByteArrayContent(
this XDocument doc, XmlWriterSettings xmlWriterSettings = null)
{
xmlWriterSettings ??= new XmlWriterSettings();
using (var stream = new MemoryStream())
{
using (var writer = XmlWriter.Create(stream, xmlWriterSettings))
{
doc.Save(writer);
}
var content = new ByteArrayContent(stream.GetBuffer(), 0, (int)stream.Length);
content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
return content;
}
}
// Usage (XDocument -> UTF-8 bytes)
var content = doc.ToByteArrayContent();
var response = await httpClient.PostAsync("/someurl", content);
// Alternative (XDocument -> string -> UTF-8 bytes)
var content = new StringContent(doc.ToStringWithDeclaration(), Encoding.UTF8, "text/xml");
var response = await httpClient.PostAsync("/someurl", content);

Serializing an object as UTF-8 XML in .NET

Proper object disposal removed for brevity, but I'd be shocked if this is the simplest way to encode an object as UTF-8 in memory. There has to be an easier way, doesn't there?
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
memoryStream.Seek(0, SeekOrigin.Begin);
var streamReader = new StreamReader(memoryStream, System.Text.Encoding.UTF8);
var utf8EncodedXml = streamReader.ReadToEnd();
No, you can use a StringWriter to get rid of the intermediate MemoryStream. However, to make the XML declaration say UTF-8 you need to use a StringWriter which overrides the Encoding property:
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding => Encoding.UTF8;
}
Or if you're not using C# 6 yet:
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding { get { return Encoding.UTF8; } }
}
Then:
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
serializer.Serialize(writer, entry);
utf8 = writer.ToString();
}
Obviously you can make Utf8StringWriter into a more general class which accepts any encoding in its constructor - but in my experience UTF-8 is by far the most commonly required "custom" encoding for a StringWriter :)
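A sketch of that more general class could look like the following. EncodingStringWriter and Item are names invented here for illustration, not part of the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical generalisation: a StringWriter that reports whatever encoding it was given.
public sealed class EncodingStringWriter : StringWriter
{
    private readonly Encoding _encoding;

    public EncodingStringWriter(Encoding encoding)
    {
        _encoding = encoding ?? throw new ArgumentNullException(nameof(encoding));
    }

    // Only the reported encoding changes; the buffered string is still UTF-16 in memory.
    public override Encoding Encoding => _encoding;
}

public class Item
{
    public int X { get; set; }
}

public static class EncodingStringWriterDemo
{
    public static string Serialize(Item item, Encoding encoding)
    {
        var serializer = new XmlSerializer(typeof(Item));
        using (var writer = new EncodingStringWriter(encoding))
        {
            serializer.Serialize(writer, item);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Serialize(new Item { X = 1 }, Encoding.UTF8));
    }
}
```

The declared encoding in the output follows whatever Encoding instance is passed to the constructor, so the UTF-8 case is just one call among many.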
Now as Jon Hanna says, this will still be UTF-16 internally, but presumably you're going to pass it to something else at some point, to convert it into binary data... at that point you can use the above string, convert it into UTF-8 bytes, and all will be well - because the XML declaration will specify "utf-8" as the encoding.
EDIT: A short but complete example to show this working:
using System;
using System.Text;
using System.IO;
using System.Xml.Serialization;

public class Test
{
    public int X { get; set; }

    static void Main()
    {
        Test t = new Test();
        var serializer = new XmlSerializer(typeof(Test));
        string utf8;
        using (StringWriter writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, t);
            utf8 = writer.ToString();
        }
        Console.WriteLine(utf8);
    }

    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }
}
Result:
<?xml version="1.0" encoding="utf-8"?>
<Test xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<X>0</X>
</Test>
Note the declared encoding of "utf-8" which is what we wanted, I believe.
Your code doesn't get the UTF-8 into memory as you read it back into a string again, so it's no longer in UTF-8, but back in UTF-16 (though ideally it's best to consider strings at a higher level than any encoding, except when forced to do so).
To get the actual UTF-8 octets you could use:
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
byte[] utf8EncodedXml = memoryStream.ToArray();
I've left out the same disposal you've left. I slightly favour the following (with normal disposal left in):
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
using(var memStm = new MemoryStream())
using(var xw = XmlWriter.Create(memStm))
{
serializer.Serialize(xw, entry);
var utf8 = memStm.ToArray();
}
Which is much the same amount of complexity, but does show that at every stage there is a reasonable choice to do something else, the most pressing of which is to serialise to somewhere other than to memory, such as to a file, TCP/IP stream, database, etc. All in all, it's not really that verbose.
Very good answer using inheritance; just remember to add a constructor that passes the StringBuilder through if you want to supply your own:
public class Utf8StringWriter : StringWriter
{
public Utf8StringWriter(StringBuilder sb) : base (sb)
{
}
public override Encoding Encoding { get { return Encoding.UTF8; } }
}
I found this blog post which explains the problem very well, and defines a few different solutions:
(dead link removed)
I've settled for the idea that the best way to do it is to completely omit the XML declaration when in memory. It actually is UTF-16 at that point anyway, but the XML declaration doesn't seem meaningful until it has been written to a file with a particular encoding; and even then the declaration is not required. It doesn't seem to break deserialization, at least.
As @Jon Hanna mentions, this can be done with an XmlWriter created like this:
XmlWriter writer = XmlWriter.Create (output, new XmlWriterSettings() { OmitXmlDeclaration = true });
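Here is a rough, self-contained sketch of that idea, including a round trip to show that deserialization still works without the declaration. The Point class and the helper names are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Point
{
    public int X { get; set; }
}

public static class NoDeclarationDemo
{
    // Serializes to a string with the XML declaration left out.
    public static string Serialize(Point value)
    {
        var serializer = new XmlSerializer(typeof(Point));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var output = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                serializer.Serialize(writer, value);
            }
            return output.ToString();
        }
    }

    // Reads the declaration-free string back into an object.
    public static Point Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Point));
        using (var reader = new StringReader(xml))
        {
            return (Point)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Point { X = 3 });
        Console.WriteLine(xml);
        Console.WriteLine(Deserialize(xml).X);
    }
}
```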

Memory efficiency: Passing HTML code of an aspx page through code-behind

My goal is to generate the aspx code of a page as a string. I am calling the code-behind code below through an asynchronous request in JavaScript, and I am getting the response back through Response.Write.
string html = string.Empty;
using (var memoryStream = new MemoryStream())
{
    using (var streamWriter = new StreamWriter(memoryStream))
    {
        var htmlWriter = new HtmlTextWriter(streamWriter);
        base.Render(htmlWriter);
        htmlWriter.Flush();
        memoryStream.Position = 0;
        using (var streamReader = new StreamReader(memoryStream))
        {
            html = streamReader.ReadToEnd();
            streamReader.Close();
        }
    }
}
Response.Write(html);
Response.End();
Is the above code memory efficient? I am thinking of using "yield", since it evaluates lazily. Can you advise on the memory efficiency of the above code?
Use a StringWriter instead of the MemoryStream, the StreamWriter and the StreamReader:
string html;
using (StringWriter stream = new StringWriter()) {
    using (HtmlTextWriter writer = new HtmlTextWriter(stream)) {
        base.Render(writer);
    }
    html = stream.ToString();
}
Response.Write(html);
Response.End();
The StringWriter uses a StringBuilder internally. The ToString method calls ToString on the StringBuilder, so it returns the internal string buffer as the string. That means that the string is only created once, and not copied back and forth.
Your method stores an html copy at html variable, and another at memoryStream. Try this:
base.Render(new HtmlTextWriter(Response.Output));
Response.End();
While this can work, I'm not sure what you are trying to accomplish.
