Let's say we want to load an xml (cXML) and validate it against a DTD that we have stored locally. Here's the code for this:
XmlPreloadedResolver resolver = new XmlPreloadedResolver(XmlKnownDtds.None);
resolver.Add(new Uri(DocTypeSystemId), File.ReadAllText(@"C:\cXml.dtd"));
XmlReaderSettings settings = new XmlReaderSettings
{
    ValidationType = ValidationType.DTD,
    DtdProcessing = DtdProcessing.Parse
};
settings.ValidationEventHandler += Settings_ValidationEventHandler;
XmlParserContext context = new XmlParserContext(null, null, "cXML", null,
    DocTypeSystemId, null, null, null, XmlSpace.None);
XmlReader reader = XmlReader.Create(stream, settings, context);
XDocument doc = XDocument.Load(reader);
Unfortunately, if the cXML input already contains a DOCTYPE declaration, the XmlReader throws an XmlException with the message: "Cannot have multiple DTDs. Line 2, position 1."
If we remove the DOCTYPE from the input, a "No DTD found." warning is raised and the XML isn't validated.
It seems that XmlReader has a hard time using an XmlParserContext.
If instead the reader is an instance of the obsolete XmlTextReader:
XmlTextReader textReader = new XmlTextReader(stream, XmlNodeType.Document, context);
XmlValidatingReader reader = new XmlValidatingReader(textReader);
reader.ValidationType = ValidationType.DTD;
reader.ValidationEventHandler += Settings_ValidationEventHandler;
Then there is no exception for multiple DTDs and the xml is validated.
Obviously there is a difference between how XmlTextReader and XmlReader function. They both seem to output a warning when the XML is missing a DOCTYPE, which halts validation. The calls involved appear to be XmlValidatingReaderImpl.ProcessCoreReaderEvent() and DtdValidator.Validate() (where schemaInfo.SchemaType == SchemaType.DTD is false, perhaps because no DTD exists).
With all this in mind, it seems better to change or add the DOCTYPE declaration in the input XML than to battle with XmlParserContext and the different reader implementations.
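One way the "change or add the DOCTYPE" idea could look is sketched below. It is an untested sketch: it assumes the input fits in memory, that DocTypeSystemId is the URI the local DTD is registered under (the value shown is only a placeholder), and that re-serializing the document before validation is acceptable. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Resolvers;

static class CxmlDtdValidation
{
    // Placeholder system identifier (an assumption); it only has to match
    // the URI under which the local DTD is registered in the resolver.
    const string DocTypeSystemId = "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd";

    public static XDocument LoadAndValidate(Stream input, string localDtdPath)
    {
        // 1. Parse with DTD processing ignored so an existing DOCTYPE is skipped.
        var looseSettings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        XDocument doc;
        using (XmlReader loose = XmlReader.Create(input, looseSettings))
        {
            doc = XDocument.Load(loose);
        }

        // 2. Remove any DOCTYPE that is still present and add the one we control.
        doc.DocumentType?.Remove();
        doc.AddFirst(new XDocumentType(doc.Root.Name.LocalName, null, DocTypeSystemId, null));

        // 3. Re-parse with DTD validation, serving the DTD from the local file.
        var resolver = new XmlPreloadedResolver(XmlKnownDtds.None);
        resolver.Add(new Uri(DocTypeSystemId), File.ReadAllText(localDtdPath));
        var strictSettings = new XmlReaderSettings
        {
            ValidationType = ValidationType.DTD,
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };
        strictSettings.ValidationEventHandler += (sender, e) =>
            Console.WriteLine($"{e.Severity}: {e.Message}");

        using (var text = new StringReader(doc.ToString(SaveOptions.DisableFormatting)))
        using (XmlReader strict = XmlReader.Create(text, strictSettings))
        {
            return XDocument.Load(strict);
        }
    }
}
```

The document is parsed twice, which is simple but not suitable for very large inputs.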
Related
I am trying to validate an XML file against an XSD, but for some reason the following error occurs.
The strange thing is that other programs that do the same validation do not report an error.
How can I change the settings so that the XSD validation does not check for missing attributes and just ignores them?
Code:
static void ValidateXml(string xmlFile, string xsdFile)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.Schemas.Add(null, xsdFile);
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler += ValidationEventHandler;
    using (XmlReader reader = XmlReader.Create(xmlFile, settings)) // failing here
    {
        while (reader.Read())
        {
            // Reading and doing nothing with the xml file.
        }
    }
}
Error Message:
System.Xml.Schema.XmlSchemaValidationException: "The 'http://www.w3.org/XML/1998/namespace:lang' attribute is not declared."
I tried it using XmlDocument, but that doesn't work for me because I am reading XML files of up to 2 GB.
I am working with MISMO 2.3.1, a DTD-based schema. I converted the DTD to XSD and then generated C# code to serialize/deserialize object representations of the XML doc.
Given a valid MISMO 2.3.1 XML doc, I can deserialize it into my generated C# class.
I have code working that uses XmlSerializer along with XmlReaderSettings and an XmlSchemas collection, reading in my converted XSD.
If I put xmlns="http://mySchema..." in the root element and try to validate intentionally invalid XML, it works as expected: my validation event fires with an accurate description.
If I take out the xmlns attribute, then I get "could not find schema information for element [my root element]".
Any idea on how to validate xml that comes in without the xmlns spec? Any settings to say to the serializer "use this schema when you come across this element"?
Thanks in advance!
static void Main() {
    var settings = new XmlReaderSettings();
    settings.NameTable = new NameTable();
    var nsMgr = new XmlNamespaceManager(settings.NameTable);
    nsMgr.AddNamespace("", "http://example.com/2013/ns"); // <-- set default namespace
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(null, @"C:\XSDSchema.xsd"); // <-- set schema location for the default namespace
    var parserCtx = new XmlParserContext(settings.NameTable, nsMgr, XmlSpace.Default);
    using (var reader = XmlReader.Create(@"C:\file.xml", settings, parserCtx)) {
        var serializer = new XmlSerializer(typeof(Foo));
        Foo f = (Foo)serializer.Deserialize(reader);
    }
}
I have a function that transforms an XML file using an XSL style sheet. It does the job fine, but when I want to delete the transformed file I sometimes get the following error: System.IO.IOException: The process cannot access the file
The function is like this:
XslTransform transform = new XslTransform();
transform.Load("xsl_style_sheet");
transform.Transform("fullpath/xmlfilename", "fullpath/transformedFileName");
XElement xEle = XElement.Load("fullpath/transformedFileName");
I do whatever I need with xEle, and at the end I want to delete "fullpath/transformedFileName", but sometimes I get the dreaded System.IO.IOException: The process cannot access the file
Can anyone please help? A million thanks.
Use the XslCompiledTransform class (XslTransform is obsolete) and the overload of Transform that accepts an XmlReader and an XmlWriter. You can call Dispose on them, and they will take care of closing and disposing the underlying streams.
// Load the style sheet.
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("xsl_style_sheet");
// Create the writer.
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = "\t";
using (XmlWriter writer = XmlWriter.Create("fullpath/transformedFileName", settings))
{
    using (XmlReader reader = XmlReader.Create("fullpath/xmlfilename"))
    {
        reader.MoveToContent();
        xslt.Transform(reader, writer);
    }
}
using (XmlReader reader = XmlReader.Create("fullpath/transformedFileName"))
{
    XElement xEle = XElement.Load(reader);
    // do all other stuff you need to do here
    // after this the file will be closed
}
We had a security audit on our code, and it reported that our code is vulnerable to XML External Entity (XXE) attacks. I am using the following code:
string OurOutputXMLString=
"<ce><input><transaction><length>00000</length><tran_type>Login</tran_type></transaction><user><user_id>ce_userid</user_id><subscriber_name>ce_subscribername</subscriber_name><subscriber_id>ce_subscriberid</subscriber_id><group_id>ce_groupid</group_id><permissions></permissions></user><consumer><login_details><username>UnitTester9</username><password>pDhE5AsKBHw85Sqgg6qdKQ==</password><pin>tOlkiae9epM=</pin></login_details></consumer></input></ce>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(OurOutputXMLString);
In the audit report they say that it's failing because an XML entity can contain URLs that can resolve outside of intended control. XML entity resolver will attempt to resolve and retrieve external references. If attacker-controlled XML can be submitted to one of these functions, then the attacker could gain access to information about an internal network, local filesystem, or other sensitive data.
To avoid this I wrote the following code but it doesn't work.
MemoryStream stream =
new MemoryStream(System.Text.Encoding.Default.GetBytes(OurOutputXMLString));
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
But I can see here that reader does not have any value to load into xmlDoc(XmlDocument).
Can anyone help where I am missing things?
External resources are resolved using the XmlResolver provided via the XmlDocument.XmlResolver property. If your XML documents **should not contain any external resource** (for example DTDs or schemas), simply set this property to null:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.LoadXml(OurOutputXMLString);
If you want to filter where these URLs come from (for example to allow only certain domains) just derive your own class from XmlUrlResolver and override the ResolveUri() method. There you can check what the URL is and sanitize it (for example you can allow only URLs within your local network or from trusted sources).
For example:
class CustomUrlResolver : XmlUrlResolver
{
    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        // Let the base class build the URI; new Uri(baseUri, relativeUri) throws when baseUri is null.
        Uri uri = base.ResolveUri(baseUri, relativeUri);
        if (uri.IsAbsoluteUri && IsUnsafeHost(uri.Host))
            return null;
        return uri;
    }
    private bool IsUnsafeHost(string host)
    {
        // Placeholder: put your own host check here.
        return false;
    }
}
Where IsUnsafeHost() is a custom function that checks whether the given host is allowed. See this post here on SO for a few ideas. Just return null from ResolveUri() to protect your code from this kind of attack. If the URI is allowed, you can simply return the result of the default XmlUrlResolver.ResolveUri() implementation.
To use it:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = new CustomUrlResolver();
xmlDoc.LoadXml(OurOutputXMLString);
For more details about how XML external resources are resolved just read Resolving External Resources on MS Docs. If your code is more complex than this example then you should definitely read Remarks section for XmlDocument.XmlResolver property.
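IsUnsafeHost() is left as a stub in the example above. One possible way to fill it in, assuming only a short allowlist of hosts should ever be contacted, is sketched here. The HostPolicy class and the host name schemas.example.com are hypothetical and only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class HostPolicy
{
    // Hypothetical allowlist (an assumption); replace with the hosts you actually trust.
    static readonly HashSet<string> AllowedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "schemas.example.com"
        };

    // A host is treated as unsafe unless it is explicitly allowed.
    public static bool IsUnsafeHost(string host)
    {
        return string.IsNullOrEmpty(host) || !AllowedHosts.Contains(host);
    }
}
```

Denying by default means a forgotten entry fails closed instead of opening a request to an unexpected server.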
So it's better to use
new XmlDocument { XmlResolver = null };
Interestingly, from .NET 4.5.2 and 4.6 the default resolver behaves differently and, as far as I have seen, no longer implicitly uses an XmlUrlResolver to resolve URLs or locations.
//In pre 4.5.2 it is a security issue.
//In 4.5.2 it will not resolve any more the url references in dtd and such,
//Still better to avoid the below since it will trigger security warnings.
new XmlDocument();
Setting XmlReaderSettings.DtdProcessing to DtdProcessing.Prohibit works totally fine in .NET 4.7.2. Here is what I used to test.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE demo
[
<!ELEMENT demo ANY >
<!ENTITY % extentity SYSTEM "https://www.hl7.org/documentcenter/public/wg/structure/CDA.xsl">
%extentity;
]>
<test>
Some random content
</test>
I saved the above content in a file and read the file with the following fragment of C# code.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
//The following stream should be the filestream of the above content.
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
I get the following exception.
For security reasons DTD is prohibited in this XML document. To enable DTD
processing set the DtdProcessing property on XmlReaderSettings to Parse and
pass the settings into XmlReader.Create method.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
This has been bugging me for a couple of days. I'm trying to load XML from an uploaded file into an XmlDocument object and get the following yellow screen of death:
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.
Here's my code. You can clearly see I'm setting ProhibitDtd to false.
public static XmlDocument LoadXml(FileUpload fu)
{
    var settings = new XmlReaderSettings
    {
        ProhibitDtd = false,
        ValidationType = ValidationType.DTD
    };
    var sDtdPath = string.Format(@"{0}", HttpContext.Current.Server.MapPath("/includes/dtds/2.3/archivearticle.dtd"));
    settings.Schemas.Add(null, sDtdPath);
    var r = XmlReader.Create(new StreamReader(fu.PostedFile.InputStream), settings);
    var document = new XmlDocument();
    document.Load(r);
    return document;
}
Add XmlResolver=null to your XmlReaderSettings. This will prevent the xmlDocument from trying to access the DTD. If you need to validate, do that in a separate operation.
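A minimal sketch of that suggestion follows. It assumes a framework version where DtdProcessing is available (it replaces the obsolete ProhibitDtd) and that validation is done elsewhere. The UploadedXml class name is made up, and the method takes a plain Stream instead of the FileUpload control.

```csharp
using System.IO;
using System.Xml;

static class UploadedXml
{
    public static XmlDocument Load(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            // Allow a DOCTYPE in the input (replaces ProhibitDtd = false).
            DtdProcessing = DtdProcessing.Parse,
            // No resolver, so the reader does not try to fetch the external DTD.
            XmlResolver = null
        };

        // The document gets no resolver either.
        var document = new XmlDocument { XmlResolver = null };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            document.Load(reader);
        }
        return document;
    }
}
```

With the FileUpload control from the question, the call would be UploadedXml.Load(fu.PostedFile.InputStream). Because the DTD is never loaded, entities declared only in the external DTD will not resolve.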