Avoid XmlDocument validating namespaces in C# - c#

I'm trying to find a way of indenting a HTML file, I've been using XMLDocument and just using a XmlTextWriter.
However I am unable to format it correctly for HTML documents because it checks the doctype and tries to download it.
Is there a "dumb" indenting mechanism that doesnt validate or check the document and does a best effort indentation? The files are 4-10Mb in size and they are autogenerated, we have to handle it internal - its fine, the user can wait, I just want to avoid forking to a new process etc.
Here's my code for reference
using (MemoryStream ms = new MemoryStream())
using (XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.Unicode))
{
XmlDocument doc = new XmlDocument();
// LoadSettings the unformatted XML text string into an instance
// of the XML Document Object Model (DOM)
doc.LoadXml(content);
// Set the formatting property of the XML Text Writer to indented
// the text writer is where the indenting will be performed
xtw.Formatting = Formatting.Indented;
// write dom xml to the xmltextwriter
doc.WriteContentTo(xtw);
// Flush the contents of the text writer
// to the memory stream, which is simply a memory file
xtw.Flush();
// set to start of the memory stream (file)
ms.Seek(0, SeekOrigin.Begin);
// create a reader to read the contents of
// the memory stream (file)
using (StreamReader sr = new StreamReader(ms))
return sr.ReadToEnd();
}
Essentially, right now I use a MemoryStream, XmlTextWriter and XmlDocument, once indented I read it back from the MemoryStream and return it as a string. Failures happen for XHTML documents and some HTML 4 documents because its trying to grab the dtds. I tried setting XmlResolver as null but to no avail :(

Without access to the specific X[H]TML causing the problems, it's hard to know if this will work, but have you tried using XDocument instead?
XDocument xdoc = XDocument.Parse(xml);
string formatted = xdoc.ToString();

Related

Transforming two xml files with XslCompiledTransform

I have two xml files that I want to transform using XslCompiledTransform. Trouble is i have to do that in one transformation. I'm using .Transform method for the first file, while the other file is being referred to within xsl script. What I need as a result is html output that contains some data from both xml files. My code is:
XsltSettings settings = new XsltSettings(true, true);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
myXslTransform.Load(openFileDialog1.FileName, settings, new XmlUrlResolver());
string HTMLoutput;
StringWriter writer = new StringWriter();
myXslTransform.Transform("file1.xml", null, writer);
HTMLoutput = writer.ToString();
writer.Close();
I catch following exception: "An error occurred while loading document'file2.xml" and InnerException: "For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method."
So how do I do what InnerExcetion tells me to do when XmlReader is being used by .Transform method? Or is there any other way to achieve this kind of transformation?
Use an XmlReader for file1.xml with XmlReaderSettings allowing Dtds, I think any secondary XML documents loaded with the document function are then loaded with the same settings.

Deserialize on the fly, or LINQ to XML

Working with C# Visual Studio 2008, MVC1.
I'm creating an xml file by fetching one from a WebService and adding some nodes to it. Now I wanted to deserialize it to a class which is the model used to strongtyped the View.
First of all, I'm facing problems to achieve that without storing the xml in the filesystem cause I don't know how this serialize and deserialize work. I guess there's a way and it's a matter of time.
But, searching for the previous in the web I came accross LINQ to XML and now I doubt whether is better to use it.
The xml would be formed by some clients details, and basically I will use all of them.
Any hint?
Thanks!!
You can save a XElement to and from a MemoryStream (no need to save it to a file stream)
MemoryStream ms = new MemoryStream();
XmlWriter xw = XmlWriter.Create(ms);
document.Save(xw);
xw.Flush();
Then if you reset the position back to 0 you can deserialize it using the DataContractSerializer.
ms.Position = 0;
DataContractSerializer serializer = new DataContractSerializer(typeof(Model));
Model model = (model) serializer.ReadObject(ms);
There are other options for how serialization works, so if this is not what you have, let me know what you are using and I will help.
try this:
XmlSerializer xmls = new XmlSerializer(typeof(XElement));
FileStream FStream;
try
{
FStream = new FileStream(doctorsPath, FileMode.Open);
_Doctors = (XElement)xmls.Deserialize(FStream); FStream.Close();
FStream = new FileStream(patientsPath, FileMode.Open);
_Patients = (XElement)xmls.Deserialize(FStream)
FStream.Close();
FStream = new FileStream(treatmentsPath, FileMode.Open);
_Treatments = (XElement)xmls.Deserialize(FStream);
FStream.Close();
}
catch
{ }
This will load all of the XML files into our XElement variables. The try – catch block is a form of exception handling that ensures that if one of the functions in the try block throws an exception, the program will jump to the catch section where nothing will happen. When working with files, especially reading files, it is a good idea to work with try – catch.
LINQ to XML is an excellent feature. You can always rely on that. You don't need to write or read or data from file. You can specify either string or stream to the XDocument
There are enough ways to load an XML element to the XDocument object. See the appropriate Load functions. Once you load the content, you can easily add/remove the elements and later you can save to disk if you want.

XDocument : is it possible to force the load of a malformed XML file?

I have a malformed XML file. The root tag is not closed by a tag. The final tag is missing.
When I try to load my malformed XML file in C#
StreamReader sr = new StreamReader(path);
batchFile = XDocument.Load(sr); // Exception
I get an exception "Unexpected end of file has occurred. The following elements are not closed: batch. Line 54, position 1."
Is it possible to ignore the close tag or to force the loading? I noticed that all my XML tools ((like XML notepad) ) automaticly fix or ignore the problem. I can not fix the XML file. This one copme from a third party software and sometimes the file is correct.
You cant do it with XDocument because this class loads all document in memory and parse it completly.
But its possible to process document with XmlReader it would get you to read and process complete document and at the end youll get missing tag exeption.
I suggest using Tidy.NET to cleanup messy input
Tidy.NET has a nice API to get a list of problems (MessageCollection) in your 'XML' and you can use it to fix the text stream in memory. The simplest thing would be to fix one error at a time, thought that will not perform too well with many errors. Otherwise, you might fix errors in reverse document order so that the offsets of messages stay valid while doing the fixes
Here is an example to convert HTML input into XHTML:
Tidy tidy = new Tidy();
/* Set the options you want */
tidy.Options.DocType = DocType.Strict;
tidy.Options.DropFontTags = true;
tidy.Options.LogicalEmphasis = true;
tidy.Options.Xhtml = true;
tidy.Options.XmlOut = true;
tidy.Options.MakeClean = true;
tidy.Options.TidyMark = false;
/* Declare the parameters that is needed */
TidyMessageCollection tmc = new TidyMessageCollection();
MemoryStream input = new MemoryStream();
MemoryStream output = new MemoryStream();
byte[] byteArray = Encoding.UTF8.GetBytes("Put your HTML here...");
input.Write(byteArray, 0 , byteArray.Length);
input.Position = 0;
tidy.Parse(input, output, tmc);
string result = Encoding.UTF8.GetString(output.ToArray());
What you could do is add the closing tag to the xml in memory and then load it.
So after loading the xml into the streamreader, manipulate the data before you do the xml load

XMLReader from a string content

I'm trying to generate XML from another XML using a XslTransform. I get both files (source XML and XSL transformation file) as string content, so I'm trying to pass the XSL file to XslTransform.Load() method as XmlReader. Now the XmlReader has to be created form a source string containing XSL file, so i try doing it like this:
MemoryStream memStream = new MemoryStream();
byte[] data = Encoding.Default.GetBytes(transformation.XsltContent);
memStream.Write(data, 0, data.Length);
memStream.Position = 0;
XmlReader reader = XmlReader.Create(memStream);
and also tried using a StringReader:
XmlReader reader = XmlReader.Create(new StringReader(transformation.XsltContent));
Unfortunately, bot methods don't seems to work, the input seems to be ok, I even tried creating some basic one-element XML to pass, won't work either - reader contains {None}.
Could someone point out what seems to be the problem here?
XmlReader xmlReader = XmlReader.Create(new StringReader(YourStringValue));
The StringReader -> XmlReader approach is fine, you should stick to it. The reader reports none because it hasn't been read yet. Try calling Read() on it to see what happens then. The transformation will also call read on it.

LINQ to XML - Adding a node to a .csproj file

I've written a code generator, that generates C# files. If the file being generated is new, I need to add a reference to it to our .csproj file. I have the following method that adds a node to a .csproj file.
private static void AddToProjectFile(string projectFileName, string projectFileEntry)
{
StreamReader streamReader = new StreamReader(projectFileName);
XmlTextReader xmlReader = new XmlTextReader(streamReader);
XElement element;
XNamespace nameSpace;
// Load the xml document
XDocument xmlDoc = XDocument.Load(xmlReader);
// Get the xml namespace
nameSpace = xmlDoc.Root.Name.Namespace;
// Close the reader so we can save the file back.
streamReader.Close();
// Create the new element we want to add.
element = new XElement(nameSpace + "Compile", new XAttribute("Include", projectFileEntry));
// Add the new element.
xmlDoc.Root.Elements(nameSpace + "ItemGroup").ElementAt(1).Add(element);
xmlDoc.Save(projectFileName);
}
This method works fine. However, it doesn't add the node on a new line. It will append it to the previous line in the .csproj file. This makes for a bit of a mess when doing TFS merging. How can I add the new node on a new line?
Why are you using StreamReader and then XmlTextReader? Just pass the filename to the XDocument.Load. Then everything works as you would expect.
If you create the reader on your own XDocument can't modify its settings and thus the reader will report whitespaces which are then stored in the XLinq tree and when written out they disable automatic formatting in the writer. So you can either set IgnoreWhitespaces to true on your reader, or pass the input just as a filename, which will let XDocument use its own settings which will include IgnoreWhitespaces.
As a side note, please don't use XmlTextReader, a more spec compliant XML reader is created when you call XmlReader.Create.

Categories