Alternatives to XDocument and XmlDocument for loading xml files in C#?

I want to change an attribute inside an xml file using C#.
Here is a sample XML file
<?xml version="1.0" encoding="us-ascii"?>
<Client>
<Age>25</Age>
<Weight>50</Weight>
</Client>
I tried loading the xml file using both XmlDocument and XDocument. They both take so much time (more than 5 minutes) to load.
Here is the code I am using to load the file:
string filePath = @"myFile.xml";
XmlDocument xmlData = new XmlDocument();
xmlData.Load(filePath);
According to Google, the problem is that XDocument and XmlDocument load all the DTDs for the XML file, which is why it takes so long. Is there a workaround for this? Or maybe an alternative that allows me to change an attribute without loading all the DTDs?

You can control how DTDs are cached, parsed or used for validation with XmlReaderSettings and still use XDocument.
If you can take the time to cache the DTDs and changing them isn't part of your test, you could take the hit once and cache them.
If that's too much time or they aren't available and they aren't needed for your tests, you could skip DTD processing.
using (var reader = XmlReader.Create(_,
    new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Ignore,
        ValidationType = ValidationType.None,
        //DtdProcessing = DtdProcessing.Parse,
        //ValidationType = ValidationType.DTD,
        XmlResolver = new XmlUrlResolver
        {
            CachePolicy = new RequestCachePolicy(RequestCacheLevel.CacheIfAvailable),
            //CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore),
        }
    }))
{
    var doc = XDocument.Load(reader);
    //…
}
XmlReaderSettings has many other properties that sometimes come in handy.
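To tie this back to the question (changing a value without waiting on DTD downloads), here is a minimal sketch. The inline sample XML, the example.invalid DTD URL and the LoadWithoutDtd helper name are assumptions made for illustration; only XmlReaderSettings, DtdProcessing.Ignore and XDocument.Load come from the answer above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class DtdFreeLoad
{
    // Hypothetical helper: parses XML text while skipping any DOCTYPE declaration.
    public static XDocument LoadWithoutDtd(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // do not parse or fetch the DTD
            XmlResolver = null                    // never resolve external resources
        };
        using (var reader = XmlReader.Create(input, settings))
        {
            return XDocument.Load(reader);
        }
    }

    static void Main()
    {
        // Sample document with an external DTD reference (made-up URL).
        const string xml =
            "<!DOCTYPE Client SYSTEM \"http://example.invalid/client.dtd\">" +
            "<Client><Age>25</Age><Weight>50</Weight></Client>";

        XDocument doc = LoadWithoutDtd(new StringReader(xml));
        doc.Root.Element("Age").Value = "26"; // change the value in memory
        Console.WriteLine(doc.Root.Element("Age").Value);
        // doc.Save("myFile.xml") would write the change back to disk.
    }
}
```

If the document relies on entities declared in the DTD, ignoring the DTD will likely make those references fail, so this suits files that carry the DOCTYPE only for validation.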

Related

How to validate xml, not containing xmlns=..., with c# XmlSerializer?

I am working with MISMO 2.3.1, a DTD-based schema. I converted the DTD to XSD and then generated C# code to serialize/deserialize object representations of the XML doc.
Given a valid MISMO 2.3.1 XML doc, I can deserialize it into my generated C# class.
I have code working that uses XmlSerializer along with XmlReaderSettings and an XmlSchemas collection, reading in my converted XSD.
If I put xmlns="http://mySchema..." in the root element and try to validate intentionally invalid XML, it works as expected: my validation event gets pinged with an accurate description.
If I take out the xmlns attribute, then I get "could not find schema information for element [my root element]".
Any idea how to validate XML that comes in without the xmlns spec? Are there any settings to tell the serializer "use this schema when you come across this element"?
Thanks in advance!

static void Main() {
    var settings = new XmlReaderSettings();
    settings.NameTable = new NameTable();
    var nsMgr = new XmlNamespaceManager(settings.NameTable);
    nsMgr.AddNamespace("", "http://example.com/2013/ns"); // <-- set default namespace
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(null, @"C:\XSDSchema.xsd"); // <-- set schema location for the default namespace
    // the third argument is xml:lang, which is not needed here
    var parserCtx = new XmlParserContext(settings.NameTable, nsMgr, null, XmlSpace.Default);
    using (var reader = XmlReader.Create(@"C:\file.xml", settings, parserCtx)) {
        var serializer = new XmlSerializer(typeof(Foo));
        Foo f = (Foo)serializer.Deserialize(reader);
    }
}

C# XDocument Load with multiple roots

I have an XML file with no root. I cannot change this. I am trying to parse it, but XDocument.Load won't do it. I have tried to set ConformanceLevel.Fragment, but I still get an exception thrown. Does anyone have a solution to this?
I tried with XmlReader, but things are messed up and can't get it work right. XDocument.Load works great, but if I have a file with multiple roots, it doesn't.

XmlReader itself does support reading of xml fragment - i.e.
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create("fragment.xml", settings))
{
// you can work with reader just fine
}
However XDocument.Load does not support reading of fragmented xml.
Quick and dirty way is to wrap the nodes under one virtual root before you invoke the XDocument.Parse. Like:
var fragments = File.ReadAllText("fragment.xml");
var myRootedXml = "<root>" + fragments + "</root>";
var doc = XDocument.Parse(myRootedXml);
This approach is limited to small xml files - as you have to read file into memory first; and concatenating large string means moving large objects in memory - which is best avoided.
If performance matters, you should read nodes into XDocument one by one via XmlReader, as explained in @Martin-Honnen's excellent answer (https://stackoverflow.com/a/18203952/2440262).
If you use API that takes for granted that XmlReader iterates over valid xml, and performance matters, you can use joined-stream approach instead:
using (var jointStream = new MultiStream())
using (var openTagStream = new MemoryStream(Encoding.ASCII.GetBytes("<root>"), false))
using (var fileStream =
    File.Open(@"fragment.xml", FileMode.Open, FileAccess.Read, FileShare.Read))
using (var closeTagStream = new MemoryStream(Encoding.ASCII.GetBytes("</root>"), false))
{
    jointStream.AddStream(openTagStream);
    jointStream.AddStream(fileStream);
    jointStream.AddStream(closeTagStream);
    using (var reader = XmlReader.Create(jointStream))
    {
        // now you can work with reader as if it is reading valid xml
    }
}
MultiStream - see for example https://gist.github.com/svejdo1/b9165192d313ed0129a679c927379685
Note: XDocument loads the whole xml into memory. So don't use it for large files - instead use XmlReader for iteration and load just the crispy bits as XElement via XNode.ReadFrom(...)
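As a rough illustration of that note, the sketch below walks a document with XmlReader and materializes only the elements of interest as XElement via XNode.ReadFrom. The StreamingElements class, the Stream helper and the item element name are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StreamingElements
{
    // Hypothetical helper: yields each element with the given name as its own XElement.
    public static IEnumerable<XElement> Stream(XmlReader reader, string elementName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node that follows it, so Read() must not be called here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    static void Main()
    {
        const string xml = "<items><item id=\"1\"/><item id=\"2\"/><other/></items>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement item in Stream(reader, "item"))
            {
                Console.WriteLine((string)item.Attribute("id"));
            }
        }
    }
}
```

Only one matched element is held in memory at a time, so the same loop should scale to files that would be too large for XDocument.Load. Passing a reader created with ConformanceLevel.Fragment lets it run over a rootless file as well.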

The only in-memory tree representations in the .NET Framework that can deal with fragments are XmlDocumentFragment in .NET's DOM implementation, so you would need to create an XmlDocument and a fragment with e.g.
XmlDocument doc = new XmlDocument();
XmlDocumentFragment frag = doc.CreateDocumentFragment();
frag.InnerXml = stringWithXml; // for instance
// frag.InnerXml = File.ReadAllText("fragment.xml");
or XPathDocument, which you can create using an XmlReader with ConformanceLevel set to Fragment:
XPathDocument doc;
using (XmlReader xr =
XmlReader.Create("fragment.xml",
new XmlReaderSettings()
{
ConformanceLevel = ConformanceLevel.Fragment
}))
{
doc = new XPathDocument(xr);
}
// now create an XPathNavigator to read out data, e.g.
XPathNavigator nav = doc.CreateNavigator();
Obviously XPathNavigator is read-only.
If you want to use LINQ to XML then I agree with the suggestions made that you need to create an XElement as a wrapper. Instead of pulling in a string with the file contents you could however use XNode.ReadFrom with an XmlReader e.g.
public static class MyExtensions
{
public static IEnumerable<XNode> ParseFragment(XmlReader xr)
{
xr.MoveToContent();
XNode node;
while (!xr.EOF && (node = XNode.ReadFrom(xr)) != null)
{
yield return node;
}
}
}
then
XElement root = new XElement("root",
MyExtensions.ParseFragment(XmlReader.Create(
"fragment.xml",
new XmlReaderSettings() {
ConformanceLevel = ConformanceLevel.Fragment })));
That might work better and more efficiently than reading everything into a string.

If you wanted to use XmlDocument.Load() then you would need to wrap the content in a root node.
Or you could try something like this, loading each top-level element into its own XmlDocument:
while (!xmlReader.EOF)
{
    if (xmlReader.NodeType == XmlNodeType.Element)
    {
        XmlDocument d = new XmlDocument();
        // ReadOuterXml() also advances the reader past the element
        d.LoadXml(xmlReader.ReadOuterXml());
    }
    else
    {
        xmlReader.Read();
    }
}

An XML document cannot have more than one root element; exactly one is required. One thing you can do is take all the fragment elements, wrap them in a root element, and parse the result with XDocument.
This would be the best and easiest approach that one could think of.

How to prevent XXE attack (XmlDocument in .NET)

We had a security audit on our code, and they mentioned that our code is vulnerable to an XML External Entity (XXE) attack. I am using the following code:
string OurOutputXMLString=
"<ce><input><transaction><length>00000</length><tran_type>Login</tran_type></transaction><user><user_id>ce_userid</user_id><subscriber_name>ce_subscribername</subscriber_name><subscriber_id>ce_subscriberid</subscriber_id><group_id>ce_groupid</group_id><permissions></permissions></user><consumer><login_details><username>UnitTester9</username><password>pDhE5AsKBHw85Sqgg6qdKQ==</password><pin>tOlkiae9epM=</pin></login_details></consumer></input></ce>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(OurOutputXMLString);
In the audit report they say that it's failing because an XML entity can contain URLs that can resolve outside of intended control. XML entity resolver will attempt to resolve and retrieve external references. If attacker-controlled XML can be submitted to one of these functions, then the attacker could gain access to information about an internal network, local filesystem, or other sensitive data.
To avoid this I wrote the following code but it doesn't work.
MemoryStream stream =
new MemoryStream(System.Text.Encoding.Default.GetBytes(OurOutputXMLString));
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
But I can see here that reader does not have any value to load into xmlDoc(XmlDocument).
Can anyone help where I am missing things?

External resources are resolved using the XmlResolver provided via the XmlDocument.XmlResolver property. If your XML documents should not contain any external resource (for example DTDs or schemas), simply set this property to null:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.LoadXml(OurOutputXMLString);
If you want to filter where these URLs come from (for example to allow only certain domains) just derive your own class from XmlUrlResolver and override the ResolveUri() method. There you can check what the URL is and sanitize it (for example you can allow only URLs within your local network or from trusted sources).
For example:
class CustomUrlResolver : XmlUrlResolver
{
    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        Uri uri = new Uri(baseUri, relativeUri);
        if (IsUnsafeHost(uri.Host))
            return null;
        return base.ResolveUri(baseUri, relativeUri);
    }
    private bool IsUnsafeHost(string host)
    {
        return false; // placeholder: put the real host check here
    }
}
Where IsUnsafeHost() is a custom function that checks whether the given host is allowed. See this post here on SO for a few ideas. Just return null from ResolveUri() to protect your code from this kind of attack. If the URI is allowed, you can simply return the result of the default XmlUrlResolver.ResolveUri() implementation.
To use it:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = new CustomUrlResolver();
xmlDoc.LoadXml(OurOutputXMLString);
For more details about how XML external resources are resolved just read Resolving External Resources on MS Docs. If your code is more complex than this example then you should definitely read Remarks section for XmlDocument.XmlResolver property.

So it's better to use
new XmlDocument { XmlResolver = null };
Interestingly, from .NET 4.5.2 and 4.6 the default resolver behaves differently and, as far as I have seen, no longer implicitly uses an XmlUrlResolver to resolve URLs or locations.
//In pre 4.5.2 it is a security issue.
//In 4.5.2 it will not resolve any more the url references in dtd and such,
//Still better to avoid the below since it will trigger security warnings.
new XmlDocument();

Setting XmlReaderSettings.DtdProcessing to DtdProcessing.Prohibit works totally fine in .NET 4.7.2. Here is what I used to test.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE demo
[
<!ELEMENT demo ANY >
<!ENTITY % extentity SYSTEM "https://www.hl7.org/documentcenter/public/wg/structure/CDA.xsl">
%extentity;
]>
<test>
Some random content
</test>
I saved the above content in a file and read the file with the following fragment of C# code.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
//The following stream should be the filestream of the above content.
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
I get the following exception.
For security reasons DTD is prohibited in this XML document. To enable DTD
processing set the DtdProcessing property on XmlReaderSettings to Parse and
pass the settings into XmlReader.Create method.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
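Building on that test, a minimal sketch of a loader that treats the exception as a rejection. The SafeXml class, the LoadRejectingDtd helper and both sample strings are assumptions for illustration. Reading from a StringReader avoids the byte-encoding step used in the question.

```csharp
using System;
using System.IO;
using System.Xml;

static class SafeXml
{
    // Hypothetical helper: returns null when the input carries a DOCTYPE or is malformed.
    public static XmlDocument LoadRejectingDtd(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // any DOCTYPE raises XmlException
            XmlResolver = null                      // never resolve external resources
        };
        var doc = new XmlDocument { XmlResolver = null };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                doc.Load(reader);
            }
            return doc;
        }
        catch (XmlException)
        {
            return null;
        }
    }

    static void Main()
    {
        string plain = "<ce><input><tran_type>Login</tran_type></input></ce>";
        string withDtd = "<!DOCTYPE ce [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><ce>&x;</ce>";

        Console.WriteLine(LoadRejectingDtd(plain) != null);   // plain XML loads
        Console.WriteLine(LoadRejectingDtd(withDtd) != null); // DOCTYPE is rejected
    }
}
```

Returning null keeps the sketch short. Real code would probably log the XmlException or rethrow it so that rejected input is visible to the caller.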

Validate xml against dtd from string

I have an xml file that refers to a local dtd file. But the problem is that my files are being compressed into a single file (I am using Unity3D and it puts all my textfiles into one binary). This question is not Unity3D specific, it is useful for anyone that tries to load a DTD schema from a string.
I have thought of a workaround to load the xml and load the dtd separately and then add the dtd file to the XmlSchemas of my document. Like so:
private void ReadConfig(string filePath)
{
// load the xml file
TextAsset text = (TextAsset)Resources.Load(filePath);
StringReader sr = new StringReader(text.text);
sr.Read(); // skip BOM, Unity3D catch!
// load the dtd file
TextAsset dtdAsset = (TextAsset)Resources.Load("Configs/condigDtd");
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(...); // my dtd should be added into this schemaset somehow, but it's only a string and not a filepath.
XmlReaderSettings settings = new XmlReaderSettings() { ValidationType = ValidationType.DTD, ProhibitDtd = false, Schemas = schemaSet};
XmlReader r = XmlReader.Create(sr, settings);
XmlDocument doc = new XmlDocument();
doc.Load(r);
}
The xml starts like this, but the dtd cannot be found. Not strange, because the xml file was loaded as a string, not from a file.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Scene SYSTEM "configDtd.dtd">

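XmlSchema has a Read method that takes a Stream and a ValidationEventHandler. Note that it parses XSD (XML Schema) content, not DTD syntax, so this approach assumes the DTD has been converted to an XSD first. If the schema is a string, you could convert it to a stream
System.Text.Encoding encode = System.Text.Encoding.UTF8;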
MemoryStream ms = new MemoryStream(encode.GetBytes(myDTD));
create the XmlSchema
XmlSchema mySchema = XmlSchema.Read(ms, DTDValidation);
add this schema to the XmlDocument containing the xml you are validating
myXMLDocument.Schemas.Add(mySchema);
myXMLDocument.Schemas.Compile();
myXMLDocument.Validate(DTDValidation);
The DTDValidation() handler would contain code handling what to do if the xml is invalid.
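For validating against the DTD text itself, one possible alternative is a custom XmlResolver that hands the reader the DTD from memory whenever the DOCTYPE's SYSTEM reference is resolved. This is a sketch under assumptions, not the answer's method: InMemoryDtdResolver, DtdFromString and the sample DTD are hypothetical, and the resolver deliberately returns the same DTD for every external reference.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Hypothetical resolver: answers every external reference with the same DTD text.
class InMemoryDtdResolver : XmlResolver
{
    private readonly byte[] _dtdBytes;

    public InMemoryDtdResolver(string dtdText)
    {
        _dtdBytes = Encoding.UTF8.GetBytes(dtdText);
    }

    // Must be overridden on .NET Framework; credentials are unused here.
    public override ICredentials Credentials { set { } }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // The requested URI is ignored; the in-memory DTD is always returned.
        return new MemoryStream(_dtdBytes, false);
    }
}

static class DtdFromString
{
    // Returns the validation messages; an empty list means no errors were reported.
    public static List<string> Validate(string xml, string dtdText)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD,
            XmlResolver = new InMemoryDtdResolver(dtdText)
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens while reading
        }
        return errors;
    }

    static void Main()
    {
        string dtd = "<!ELEMENT Scene (Node*)><!ELEMENT Node EMPTY>";
        string xml = "<!DOCTYPE Scene SYSTEM \"configDtd.dtd\"><Scene><Node/></Scene>";
        Console.WriteLine(Validate(xml, dtd).Count);
    }
}
```

In the Unity case from the question, dtdText would presumably come from the TextAsset holding the DTD. Whether Unity's Mono runtime calls the resolver in exactly this way is something to verify there.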

Problem validating XML against DTD in C#

This has been bugging me for a couple of days. I'm trying to load XML from an uploaded file into an XmlDocument object and get the following yellow screen of death:
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.
Here's my code. You can clearly see I'm setting ProhibitDtd to false.
public static XmlDocument LoadXml(FileUpload fu)
{
var settings = new XmlReaderSettings
{
ProhibitDtd = false,
ValidationType = ValidationType.DTD
};
var sDtdPath = string.Format(@"{0}", HttpContext.Current.Server.MapPath("/includes/dtds/2.3/archivearticle.dtd"));
settings.Schemas.Add(null, sDtdPath);
var r = XmlReader.Create(new StreamReader(fu.PostedFile.InputStream), settings);
var document = new XmlDocument();
document.Load(r);
return document;
}

Add XmlResolver=null to your XmlReaderSettings. This will prevent the xmlDocument from trying to access the DTD. If you need to validate, do that in a separate operation.
