Convert XmlDocument to String - c#

Here is how I'm currently converting XMLDocument to String
StringWriter stringWriter = new StringWriter();
XmlTextWriter xmlTextWriter = new XmlTextWriter(stringWriter);
xmlDoc.WriteTo(xmlTextWriter);
return stringWriter.ToString();
The problem with this method is that if I have " ((quotes) which I have in attributes) it escapes them.
For Instance:
<Campaign name="ABC">
</Campaign>
Above is the expected XML. But it returns
<Campaign name=\"ABC\">
</Campaign>
I can do String.Replace "\" but is that method okay? Are there any side-effects? Will it work fine if the XML itself contains a "\"

Assuming xmlDoc is an XmlDocument object whats wrong with xmlDoc.OuterXml?
return xmlDoc.OuterXml;
The OuterXml property returns a string version of the xml.

There aren't any quotes. It's just VS debugger. Try printing to the console or saving to a file and you'll see. As a side note: always dispose disposable objects:
using (var stringWriter = new StringWriter())
using (var xmlTextWriter = XmlWriter.Create(stringWriter))
{
xmlDoc.WriteTo(xmlTextWriter);
xmlTextWriter.Flush();
return stringWriter.GetStringBuilder().ToString();
}

If you are using Windows.Data.Xml.Dom.XmlDocument version of XmlDocument (used in UWP apps for example), you can use yourXmlDocument.GetXml() to get the XML as a string.

As an extension method:
public static class Extensions
{
public static string AsString(this XmlDocument xmlDoc)
{
using (StringWriter sw = new StringWriter())
{
using (XmlTextWriter tx = new XmlTextWriter(sw))
{
xmlDoc.WriteTo(tx);
string strXmlText = sw.ToString();
return strXmlText;
}
}
}
}
Now to use simply:
yourXmlDoc.AsString()

" is shown as \" in the debugger, but the data is correct in the string, and you don't need to replace anything. Try to dump your string to a file and you will note that the string is correct.

you can use xmlDoc.InnerXml property to get xml in string

Related

XDocument will not parse html entities (e.g. ) but XmlDocument will

I am currently converting our old parsers that run on XmlDocument to the XDocument. I do this mainly to get the Linq querying and the added linenumber info.
My xml contains an element like this:
<?xml version="1.0"?>
<fulltext>
hello this is a failed textnode
and I don't know how to parse it.
</fulltext>
My problem is that while XmlDocument seems to have no problem reading that node with:
var xmlDocument = new XmlDocument();
var physicalPath = GetPhysicalPath(uploadFolderFile);
try
{
xmlDocument.Load(physicalPath);
}
catch (XmlException xmlException)
{
_log.Warn("Problems with the document", xmlException);
}
The example above parses the document fine but when I try to do:
XDocument xmlDocument;
var physicalPath = GetPhysicalPath(uploadFolderFile);
var xmlStream = new System.IO.StreamReader(physicalPath);
try
{
xmlDocument = XDocument.Load(xmlStream, LoadOptions.SetLineInfo | LoadOptions.SetBaseUri);
}
catch (XmlException)
{
_log.Warn("Trying to clean document for HexaDecimal", xmlException);
}
It fails to read the document because of the character
The special character seems to be allowed in XML version 1.1 but changing the description doesn't help.
I have thought about just parsing the document with XmlDocument and then converting it; but that seems to be counterintuitive. Can anybody help with this problem?
Ok...so I sort of found a solution to this problem.
First of all I try to parse the xml using the following code:
private XDocument GetXmlDocument(String physicalPath)
{
XDocument xmlDocument;
var xmlStream = new System.IO.StreamReader(physicalPath);
try
{
xmlDocument = XDocument.Load(xmlStream, LoadOptions.SetLineInfo);
}
catch (XmlException)
{
//_log.Warn("Trying to clean document for HexaDecimal", xmlException);
xmlDocument = XmlSanitizingStream.TryToCleanXMLBeforeParsing(physicalPath);
}
return xmlDocument;
}
If it fails to load the document, then I will try to clean it using the technique used in this blogpost:
http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
It will not remove the character I mentioned before, but it will remove any character not allowed by the XML standard.
Then, after sanitizing the XML, I add an XMLReader and set its settings to not check characters:
public static XDocument TryToCleanXMLBeforeParsing(String physicalPath)
{
string xml;
Encoding encoding;
using (var reader = new XmlSanitizingStream(File.OpenRead(physicalPath)))
{
xml = reader.ReadToEnd();
encoding = reader.CurrentEncoding;
}
byte[] encodedString;
if (encoding.Equals(Encoding.UTF8)) encodedString = Encoding.UTF8.GetBytes(xml);
else if (encoding.Equals(Encoding.UTF32)) encodedString = Encoding.UTF32.GetBytes(xml);
else encodedString = Encoding.Unicode.GetBytes(xml);
var ms = new MemoryStream(encodedString);
ms.Flush();
ms.Position = 0;
var settings = new XmlReaderSettings {CheckCharacters = false};
XmlReader xmlReader = XmlReader.Create(ms, settings);
var xmlDocument = XDocument.Load(xmlReader);
ms.Close();
return xmlDocument;
}
Since I've cleaned the document removing illegal characters before I add the ignore characters to the reader, I am pretty sure that I do not read a malformed XML document. Worst case scenario is I get a malformed XML and it will throw an error anyways.
I only use this for parsing and it should only be used to read the data. This will not make the XML well-formed and will in many cases throw exceptions elsewhere in your code. I am only using this because I cannot change what the customer is sending us and I have to read it as is.

Convert utf-8 XML document to utf-16 for inserting into SQL

I have an XML document that has been created using utf-8 encoding. I want to store that document in a sql 2008 xml column but I understand I need to convert it to utf-16 in order to do that.
I've tried using XDocument to do this but I'm not getting a valid XML result after the conversion. Here is what I've tried to do the conversion on (Utf8StringWriter is a small class that inherits from StringWriter and overloads Encoding):
XDocument xDoc = XDocument.Parse(utf8Xml);
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
{ Encoding = writer.Encoding, Indent = true });
xDoc.WriteTo(xml);
string utf16Xml = writer.ToString();
The data in the utf16Xml is invalid and when trying to insert into the database I get the error:
{"XML parsing: line 1, character 38, unable to switch the encoding"}
However the initial utf8Xml data is definitely valid and contains all the info I need.
UPDATE:
The initial XML is obtained by using XMLSerializer (with an Utf8StringWriter class) to create the xml string from an existing object model (engine). The code for this is:
public static void Serialise<T>(T engine, ref StringWriter writer)
{
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() { Encoding = writer.Encoding });
XmlSerializer xs = new XmlSerializer(engine.GetType());
xs.Serialize(xml, engine);
}
I have to leave this like this as that code is out of my control to change.
Before I even send the utf16Xml string to the failing database call I can view it via the Visual Studio debugger and I notice that the entire string is not present and instead I get a string literal was not closed error on the XML viewer.
The error is on first line XDocument xDoc = XDocument.Parse(utf8Xml);. Most likely you converted utf8 stream into a string (utf8xml), but encoding specified in the string is still utf-8, so XML reader fails. If it is true than load XML directly from stream using Load instead of converting it to string first.
Set the encoding of the document to UTF-16 after you have parsed it from utf8xml
XDocument xDoc = XDocument.Parse(utf8Xml);
xDoc.Declaration.Encoding = "utf-16";
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
{ Encoding = writer.Encoding, Indent = true });
xDoc.WriteTo(xml);
string utf16Xml = writer.ToString();
Here's what I had to do to make it work. This just converts the XML to utf-16
string getUtf16Xml(System.Xml.XmlDocument xmlDoc)
{
System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(xmlDoc.OuterXml);
xDoc.Declaration.Encoding = "utf-16";
return xDoc.ToString();
}
Then I can save the results to the DB.

creating new xml file by C#

how we can create new xml file by C# in the following situation:
its winform.
our xml code is in a string str = "<><><><>...etc"
now i want to create an xml file myxml.xml which contains everything in str,
please answer in the context and thanks...
need simple source code
XmlDocument.LoadXml
using System;
using System.Xml;
public class Sample {
public static void Main() {
// Create the XmlDocument.
XmlDocument doc = new XmlDocument();
doc.LoadXml("<item><name>wrench</name></item>"); //Your string here
// Save the document to a file and auto-indent the output.
XmlTextWriter writer = new XmlTextWriter("data.xml",null);
writer.Formatting = Formatting.Indented;
doc.Save(writer);
}
}
Hi how about creating a method just for this:
private static void CreateXMLFile(string xml, string filePath)
{
// Create the XmlDocument.
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml); //Your string here
// Save the document to a file and auto-indent the output.
XmlTextWriter writer = new XmlTextWriter(filePath, null);
writer.Formatting = Formatting.Indented;
doc.Save(writer);
}
If the only thing you want to do is to save the string as a file (and you don't want to do do XML-specific stuff like checking well-formedness or automatic indenting), you can simply use File.WriteAllText().

How to get Xml as string from XDocument?

I am new to LINQ to XML. After you have built XDocument, how do you get the OuterXml of it like you did with XmlDocument?
You only need to use the overridden ToString() method of the object:
XDocument xmlDoc ...
string xml = xmlDoc.ToString();
This works with all XObjects, like XElement, etc.
I don't know when this changed, but today (July 2017) when trying the answers out, I got
"System.Xml.XmlDocument"
Instead of ToString(), you can use the originally intended way accessing the XmlDocument content: writing the xml doc to a stream.
XmlDocument xml = ...;
string result;
using (StringWriter writer = new StringWriter())
{
xml.Save(writer);
result = writer.ToString();
}
Several responses give a slightly incorrect answer.
XDocument.ToString() omits the XML declaration (and, according to #Alex Gordon, may return invalid XML if it contains encoded unusual characters like &).
Saving XDocument to StringWriter will cause .NET to emit encoding="utf-16", which you most likely don't want (if you save XML as a string, it's probably because you want to later save it as a file, and de facto standard for saving files is UTF-8 - .NET saves text files as UTF-8 unless specified otherwise).
#Wolfgang Grinfeld's answer is heading in the right direction, but it's unnecessarily complex.
Use the following:
var memory = new MemoryStream();
xDocument.Save(memory);
string xmlText = Encoding.UTF8.GetString(memory.ToArray());
This will return XML text with UTF-8 declaration.
Doing XDocument.ToString() may not get you the full XML.
In order to get the XML declaration at the start of the XML document as a string, use the XDocument.Save() method:
var ms = new MemoryStream();
using (var xw = XmlWriter.Create(new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"))))
new XDocument(new XElement("Root", new XElement("Leaf", "data"))).Save(xw);
var myXml = Encoding.GetEncoding("ISO-8859-1").GetString(ms.ToArray());
Use ToString() to convert XDocument into a string:
string result = string.Empty;
XElement root = new XElement("xml",
new XElement("MsgType", "<![CDATA[" + "text" + "]]>"),
new XElement("Content", "<![CDATA[" + "Hi, this is Wilson Wu Testing for you! You can ask any question but no answer can be replied...." + "]]>"),
new XElement("FuncFlag", 0)
);
result = root.ToString();
While #wolfgang-grinfeld's answer is technically correct (as it also produces the XML declaration, as opposed to just using .ToString() method), the code generated UTF-8 byte order mark (BOM), which for some reason XDocument.Parse(string) method cannot process and throws Data at the root level is invalid. Line 1, position 1. error.
So here is a another solution without the BOM:
var utf8Encoding =
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (var memory = new MemoryStream())
using (var writer = XmlWriter.Create(memory, new XmlWriterSettings
{
OmitXmlDeclaration = false,
Encoding = utf8Encoding
}))
{
CompanyDataXml.Save(writer);
writer.Flush();
return utf8Encoding.GetString(memory.ToArray());
}
I found this example in the Microsoft .NET 6 documentation for XDocument.Save method. I think it answers the original question (what is the XDocument equivalent for XmlDocument.OuterXml), and also addresses the concerns that others have pointed out already. By using the XmlWritingSettings you can predictably control the string output.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.save
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
using (XmlWriter xw = XmlWriter.Create(sb, xws)) {
XDocument doc = new XDocument(
new XElement("Child",
new XElement("GrandChild", "some content")
)
);
doc.Save(xw);
}
Console.WriteLine(sb.ToString());
Looking at these answers, I see a lot of unnecessary complexity and inefficiency in pursuit of generating the XML declaration automatically. But since the declaration is so simple, there isn't much value in generating it. Just KISS (keep it simple, stupid):
// Extension method
public static string ToStringWithDeclaration(this XDocument doc, string declaration = null)
{
declaration ??= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";
return declaration + doc.ToString();
}
// Usage
string xmlString = doc.ToStringWithDeclaration();
// Or
string xmlString = doc.ToStringWithDeclaration("...");
Using XmlWriter instead of ToString() can give you more control over how the output is formatted (such as if you want indentation), and it can write to other targets besides string.
The reason to target a memory stream is performance. It lets you skip the step of storing the XML in a string (since you know the data must end up in a different encoding eventually, whereas string is always UTF-16 in C#). For instance, for an HTTP request:
// Extension method
public static ByteArrayContent ToByteArrayContent(
this XDocument doc, XmlWriterSettings xmlWriterSettings = null)
{
xmlWriterSettings ??= new XmlWriterSettings();
using (var stream = new MemoryStream())
{
using (var writer = XmlWriter.Create(stream, xmlWriterSettings))
{
doc.Save(writer);
}
var content = new ByteArrayContent(stream.GetBuffer(), 0, (int)stream.Length);
content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
return content;
}
}
// Usage (XDocument -> UTF-8 bytes)
var content = doc.ToByteArrayContent();
var response = await httpClient.PostAsync("/someurl", content);
// Alternative (XDocument -> string -> UTF-8 bytes)
var content = new StringContent(doc.ToStringWithDeclaration(), Encoding.UTF8, "text/xml");
var response = await httpClient.PostAsync("/someurl", content);

Apply XSLT on in-memory XML and returning in-memory XML

I am looking for a static function in the .NET framework which takes an XML snippet and an XSLT file, applies the transformation in memory, and returns the transformed XML.
I would like to do this:
string rawXml = invoiceTemplateDoc.MainDocumentPart.Document.InnerXml;
rawXml = DoXsltTransformation(rawXml, #"c:\prepare-invoice.xslt"));
// ... do more manipulations on the rawXml
Alternatively, instead of taking and returning strings, it could be taking and returning XmlNodes.
Is there such a function?
You can use the StringReader and StringWriter classes :
string input = "<?xml version=\"1.0\"?> ...";
string output;
using (StringReader sReader = new StringReader(input))
using (XmlReader xReader = XmlReader.Create(sReader))
using (StringWriter sWriter = new StringWriter())
using (XmlWriter xWriter = XmlWriter.Create(sWriter))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("transform.xsl");
xslt.Transform(xReader, xWriter);
output = sWriter.ToString();
}
A little know feature is that you can in fact transform data directly into a XmlDocument DOM or into a LINQ-to-XML XElement or XDocument (via the CreateWriter() method) without having to pass through a text form by getting an XmlWriter instance to feed them with data.
Assuming that your XML input is IXPathNavigable and that you have loaded a XslCompiledTransform instance, you can do the following:
XmlDocument target = new XmlDocument(input.CreateNavigator().NameTable);
using (XmlWriter writer = target.CreateNavigator().AppendChild()) {
transform.Transform(input, writer);
}
You then have the transformed document in the taget document. Note that there are other overloads on the transform to allow you to pass XSLT arguments and extensions into the stylesheet.
If you want you can write your own static helper or extension method to perform the transform as you need it. However, it may be a good idea to cache the transform since loading and compiling it is not free.
Have you noticed that there is the XsltCompiledTransform class?

Categories