Reference to undeclared entity &copy - c#

When I process the XML files using C#, I get this error. I searched previous questions and found the reason. I understand these entities are not predefined in XML and must be included in DTD. It is included in the DTD. My XML files include the following DTD.
<!DOCTYPE doc PUBLIC "-//Location//EN"
"NAME.dtd" [
<!ENTITY C-1FHY "SD FFF">
<!ENTITY Ca- "XX">
]>
I also need to read content from this XML file, so I used XmlReader.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
XmlReader doc = XmlReader.Create(f, settings);
while (doc.Read())
{
// Process each node here.
}
If I ignore the DTD, it throws the error. If I parse it, it says it couldn't find the DTD in the folder where each file is. If I copy the DTD into the same folder as the file, I don't have any problem.
My problem is that there are 500+ docs in 60+ subfolders, and I can't put a copy of the DTD in every folder. Is there a way to store a single copy of the DTD in one path and link to it from the code?

You can make a custom XmlUrlResolver that remaps the file location:
public class XmlUrlOverrideResolver : XmlUrlResolver
{
public Dictionary<string, string> DtdFileMap { get; private set; }
public XmlUrlOverrideResolver()
{
this.DtdFileMap = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
}
public override Uri ResolveUri(Uri baseUri, string relativeUri)
{
string remappedLocation;
if (DtdFileMap.TryGetValue(relativeUri, out remappedLocation))
return new Uri(remappedLocation);
var value = base.ResolveUri(baseUri, relativeUri);
return value;
}
}
And then use it like:
var resolver = new XmlUrlOverrideResolver();
resolver.DtdFileMap[@"NAME.dtd"] = @"C:\Location\Of\File\name.dtd";
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.XmlResolver = resolver;
// Proceed as before.
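Because the question involves hundreds of files spread over many subfolders, one resolver instance can be shared across all of them. The following is a minimal sketch rather than a definitive implementation: the resolver class is repeated from the answer above so the block compiles on its own, while `DtdBatchReader.ReadAll` and its parameters are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Resolver from the answer above, repeated so this sketch is self-contained.
public class XmlUrlOverrideResolver : XmlUrlResolver
{
    public Dictionary<string, string> DtdFileMap { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        // Remap known DTD names to the single shared copy; fall back otherwise.
        if (relativeUri != null && DtdFileMap.TryGetValue(relativeUri, out var remapped))
            return new Uri(remapped);
        return base.ResolveUri(baseUri, relativeUri);
    }
}

public static class DtdBatchReader
{
    // Hypothetical helper: reads every *.xml file under rootFolder (recursively),
    // resolving the DOCTYPE system identifier to one DTD file at dtdPath.
    public static int ReadAll(string rootFolder, string systemId, string dtdPath)
    {
        var resolver = new XmlUrlOverrideResolver();
        resolver.DtdFileMap[systemId] = dtdPath;

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };

        int count = 0;
        foreach (var file in Directory.EnumerateFiles(rootFolder, "*.xml", SearchOption.AllDirectories))
        {
            using (var reader = XmlReader.Create(file, settings))
            {
                while (reader.Read()) { /* process nodes here */ }
            }
            count++;
        }
        return count; // number of files read without an exception
    }
}
```

A call such as `DtdBatchReader.ReadAll(root, "NAME.dtd", dtdPath)` would then walk the whole tree with a single DTD copy; the three arguments are placeholders for your own paths.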

Related

C# Validate XML file using a DTD file without the DOCTYPE string

I am trying to write a C# class that validates an XML file using a DTD file located in another folder, one that is not at the relative location given in the DOCTYPE string. So far, my code looks like this:
var settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationType = ValidationType.DTD;
settings.XmlResolver = new XmlUrlResolver();
settings.ValidationEventHandler += new ValidationEventHandler(IsLoaded);
using (var reader = XmlReader.Create(new StringReader(xmlString), settings))
{
while (reader.Read()) { }
reader.Close();
}
So far this works fine loading the DTD file from the DOCTYPE string included in the XML file, but the DTD file itself must be kept in a folder that is relative to where the program is being executed. Is there a way to customize the XmlResolver class so that it gets the DTD file from another location on my hard drive, such as an absolute path that is passed in, instead of using the DOCTYPE string?

Deserialize XML Fragment with Namespace using C#

I'm having issues deserializing the following XML fragment (from OneNote):
<one:OE creationTime="2015-03-21T18:32:38.000Z" lastModifiedTime="2015-03-21T18:32:38.000Z" objectID="{649CA68C-C596-4F89-9885-1553A953529E}{30}{B0}" alignment="left" quickStyleIndex="1" selected="partial">
<one:List>
<one:Bullet bullet="2" fontSize="11.0" />
</one:List>
<one:T><![CDATA[Bullet point one]]></one:T>
</one:OE>
The following code is used to deserialize the above fragment. The OE class has the following attributes:
[System.CodeDom.Compiler.GeneratedCodeAttribute("System.Xml", "4.0.30319.34230")]
[System.SerializableAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(Namespace = "http://schemas.microsoft.com/office/onenote/2013/onenote")]
[System.Xml.Serialization.XmlRootAttribute("OE", Namespace = "http://schemas.microsoft.com/office/onenote/2013/onenote", IsNullable = true)]
public partial class OE : EntityBase<OE>
{
...
}
And the actual method to deserialize the fragment is in the base class, EntityBase:
public static T Deserialize(string xml)
{
System.IO.StringReader stringReader = null;
try
{
stringReader = new System.IO.StringReader(xml);
return ((T)(Serializer.Deserialize(System.Xml.XmlReader.Create(stringReader))));
}
finally
{
if ((stringReader != null))
{
stringReader.Dispose();
}
}
}
The deserialize method is called as follows:
var element = OE.Deserialize(xmlString);
Where the variable xmlString is the XML fragment given above. On calling the Deserialize method, I get the following error:
There is an error in XML document (1,2). ---> System.Xml.XmlException: 'one' is an undeclared prefix. Line 1, position 2.
I have spent some time looking at the attributes declaring the namespaces in the OE class, but everything appears to be correct. Can anyone point out the mistake I'm making?
The answer given by matrixanomaly is correct, but unfortunately, the OneNote namespace given is incorrect. I'm working with OneNote 2013 and not 2010. The actual code I used to deserialize the same XML fragment as given in my question is as follows:
public static OE DeserializeFragment(string xmlFragment)
{
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(OE));
System.IO.StringReader stringReader = null;
try
{
stringReader = new System.IO.StringReader(xmlFragment);
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("one", "http://schemas.microsoft.com/office/onenote/2013/onenote");
XmlParserContext context = new XmlParserContext(null, nsManager, null, XmlSpace.None);
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();
xmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment;
return ((OE)(serializer.Deserialize(System.Xml.XmlReader.Create(stringReader, xmlReaderSettings, context))));
}
finally
{
if ((stringReader != null))
{
stringReader.Dispose();
}
}
}
I think you need the original namespace declaration for one. This is because one is a namespace prefix, and elements like OE and List belong to the namespace that OneNote binds to that prefix; that binding (the xmlns:one declaration) isn't present in the fragment you posted. Prefixes exist to avoid naming collisions when different XML vocabularies get mixed together. See the w3schools link for further explanation.
So a workaround would be to add the namespace declaration, such as xmlns:one="http://schemas.microsoft.com/office/onenote/2010/onenote", to the root element of each fragment (it doesn't seem optimal, but it works), and then deserialize it as you've done.
I don't have OneNote handy so that namespace declaration was from this forum post.
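One way to supply the missing declaration without editing the fragment's own start tag is to wrap the fragment in a throwaway root element that declares the prefix. This is only a sketch under stated assumptions: `FragmentWrapper`, `ParseWithPrefix`, and the `wrapper` element name are hypothetical, and the namespace URI is the 2013 one from the XmlRootAttribute in the question rather than the 2010 one quoted above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FragmentWrapper
{
    // Namespace URI copied from the XmlRootAttribute in the question (OneNote 2013).
    public const string OneNoteNs = "http://schemas.microsoft.com/office/onenote/2013/onenote";

    // Hypothetical helper: wraps the fragment in a temporary root element that
    // declares the "one" prefix, parses it, and returns the first child element.
    public static XElement ParseWithPrefix(string fragment)
    {
        string wrapped = "<wrapper xmlns:one=\"" + OneNoteNs + "\">" + fragment + "</wrapper>";
        return XElement.Parse(wrapped).Elements().First();
    }
}
```

The returned element carries the full namespace on its name, so it could then be handed to further processing, for example through `CreateReader()`.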
An alternative way of deserializing XML fragments is through XmlReader and XmlReaderSettings, where you can set the conformance level to Fragment and add a predefined namespace.
Example adapted from this MSDN blog
XmlDocument doc = new XmlDocument();
NameTable nt = new NameTable();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
nsmgr.AddNamespace("one", "http://schemas.microsoft.com/office/onenote/2010/onenote");
XmlParserContext context = new XmlParserContext(null, nsmgr, null, XmlSpace.None);
XmlReaderSettings xset = new XmlReaderSettings();
xset.ConformanceLevel = ConformanceLevel.Fragment;
XmlReader rd = XmlReader.Create(new StringReader(XMLSource), xset, context);
doc.Load(rd);
I personally prefer using XmlReader and XmlReaderSettings. They seem like more work, since you have to create the reader and set things up, but this looks to be the more robust way.
Or, if you don't want to deal with custom namespaces and you won't run into naming collisions, just programmatically remove the one: prefix and move on with deserializing. That's more of a string-manipulation approach, though.

How to validate xml, not containing xmlns=..., with c# XmlSerializer?

I am working with MISMO 2.3.1, a DTD-based schema. I converted the DTD to XSD and then generated C# code to serialize/deserialize object representations of the XML doc.
Given a valid MISMO 2.3.1 XML doc, I can deserialize it into my generated C# class.
I have code working that uses XmlSerializer along with XmlReaderSettings and an XmlSchemas collection, reading in my converted XSD.
If I put xmlns="http://mySchema..." in the root element and try to validate intentionally invalid XML, it works as expected: my validation event fires with an accurate description.
If I take out the xmlns attribute, then I get "could not find schema information for element [my root element]".
Any idea on how to validate xml that comes in without the xmlns spec? Any settings to say to the serializer "use this schema when you come across this element"?
Thanks in advance!
static void Main() {
var settings = new XmlReaderSettings();
settings.NameTable = new NameTable();
var nsMgr = new XmlNamespaceManager(settings.NameTable);
nsMgr.AddNamespace("", "http://example.com/2013/ns"); // <-- set default namespace
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, @"C:\XSDSchema.xsd"); // <-- set schema location for the default namespace
var parserCtx = new XmlParserContext(settings.NameTable, nsMgr, null, XmlSpace.Default); // third argument is xml:lang
using (var reader = XmlReader.Create(@"C:\file.xml", settings, parserCtx)) {
var serializer = new XmlSerializer(typeof(Foo));
Foo f = (Foo)serializer.Deserialize(reader);
}
}

How to prevent XXE attack (XmlDocument in .NET)

We had a security audit on our code, and it reported that our code is vulnerable to an XML External Entity (XXE) attack. I am using the following code:
string OurOutputXMLString=
"<ce><input><transaction><length>00000</length><tran_type>Login</tran_type></transaction><user><user_id>ce_userid</user_id><subscriber_name>ce_subscribername</subscriber_name><subscriber_id>ce_subscriberid</subscriber_id><group_id>ce_groupid</group_id><permissions></permissions></user><consumer><login_details><username>UnitTester9</username><password>pDhE5AsKBHw85Sqgg6qdKQ==</password><pin>tOlkiae9epM=</pin></login_details></consumer></input></ce>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(OurOutputXMLString);
In the audit report they say that it's failing because an XML entity can contain URLs that can resolve outside of intended control. XML entity resolver will attempt to resolve and retrieve external references. If attacker-controlled XML can be submitted to one of these functions, then the attacker could gain access to information about an internal network, local filesystem, or other sensitive data.
To avoid this I wrote the following code but it doesn't work.
MemoryStream stream =
new MemoryStream(System.Text.Encoding.Default.GetBytes(OurOutputXMLString));
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
But I can see here that reader does not have any value to load into xmlDoc(XmlDocument).
Can anyone help where I am missing things?
External resources are resolved using the XmlResolver provided via the XmlDocument.XmlResolver property. If your XML documents should not contain any external resources (for example DTDs or schemas), simply set this property to null:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.LoadXml(OurOutputXMLString);
If you want to filter where these URLs come from (for example to allow only certain domains) just derive your own class from XmlUrlResolver and override the ResolveUri() method. There you can check what the URL is and sanitize it (for example you can allow only URLs within your local network or from trusted sources).
For example:
class CustomUrlResolver : XmlUrlResolver
{
public override Uri ResolveUri(Uri baseUri, string relativeUri)
{
Uri uri = new Uri(baseUri, relativeUri);
if (IsUnsafeHost(uri.Host))
return null;
return base.ResolveUri(baseUri, relativeUri);
}
private bool IsUnsafeHost(string host)
{
return false;
}
}
Here IsUnsafeHost() is a custom function that checks whether the given host is allowed or not. See this post here on SO for a few ideas. Just return null from ResolveUri() to protect your code from this kind of attack. If the URI is allowed, you can simply return the result of the default XmlUrlResolver.ResolveUri() implementation.
To use it:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = new CustomUrlResolver();
xmlDoc.LoadXml(OurOutputXMLString);
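The `IsUnsafeHost()` placeholder in the resolver above always returns false, so it filters nothing as written. One possible way to fill it in is an allow-list of trusted hosts. The sketch below is an illustration only: `HostPolicy` and the host `schemas.example.com` are hypothetical and should be replaced with whatever your application actually trusts.

```csharp
using System;
using System.Collections.Generic;

public static class HostPolicy
{
    // Hypothetical allow-list; replace with the hosts your application trusts.
    private static readonly HashSet<string> AllowedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "schemas.example.com" };

    // Returns true when the host is NOT on the allow-list.
    // Empty hosts (for example file: URIs) are treated as unsafe in this sketch.
    public static bool IsUnsafeHost(string host)
    {
        return string.IsNullOrEmpty(host) || !AllowedHosts.Contains(host);
    }
}
```

An allow-list is used here instead of a block-list because anything not explicitly trusted is rejected by default.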
For more details about how XML external resources are resolved just read Resolving External Resources on MS Docs. If your code is more complex than this example then you should definitely read Remarks section for XmlDocument.XmlResolver property.
So it's better to use
new XmlDocument { XmlResolver = null };
Interestingly, from .NET 4.5.2 and 4.6 the default resolver behaves differently and, as far as I have seen, no longer implicitly uses an XmlUrlResolver up front to resolve URLs or locations.
// Before 4.5.2 this is a security issue.
// From 4.5.2 it no longer resolves URL references in DTDs and the like.
// It is still better to avoid the line below, since it will trigger security warnings.
new XmlDocument();
Setting XmlReaderSettings.DtdProcessing to DtdProcessing.Prohibit works fine in .NET 4.7.2. Here is what I used to test.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE demo
[
<!ELEMENT demo ANY >
<!ENTITY % extentity SYSTEM "https://www.hl7.org/documentcenter/public/wg/structure/CDA.xsl">
%extentity;
]>
<test>
Some random content
</test>
I saved the above content in a file and read the file with the following fragment of C# code.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.MaxCharactersFromEntities = 6000;
//The following stream should be the filestream of the above content.
XmlReader reader = XmlReader.Create(stream, settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
I get the following exception.
For security reasons DTD is prohibited in this XML document. To enable DTD
processing set the DtdProcessing property on XmlReaderSettings to Parse and
pass the settings into XmlReader.Create method.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
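The behaviour shown in the stack trace above can be wrapped in a small helper that rejects any input carrying a DOCTYPE. This is a minimal sketch, not a complete hardening recipe: `SafeXmlLoader.TryLoad` is a hypothetical name, and returning null for rejected input is just one possible convention.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SafeXmlLoader
{
    // Hypothetical helper: returns the loaded document, or null when the input
    // contains a DOCTYPE declaration or is otherwise not well-formed XML.
    public static XmlDocument TryLoad(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // any DOCTYPE raises XmlException
            XmlResolver = null                      // never fetch external resources
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                var doc = new XmlDocument { XmlResolver = null };
                doc.Load(reader);
                return doc;
            }
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

Reading from a StringReader also sidesteps the byte-encoding step that the MemoryStream version in the question needs.");
if (ok == null || ok.DocumentElement.Name != "test") throw new Exception("plain XML should load");
var rejected = SafeXmlLoader.TryLoad("<!DOCTYPE demo [<!ENTITY x \"y\">]><demo>&x;</demo>");
if (rejected != null) throw new Exception("XML with a DOCTYPE should be rejected");
</test>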

Problem validating XML against DTD in C#

This has been bugging me for a couple of days. I'm trying to load XML from an uploaded file into an XmlDocument object and get the following yellow screen of death:
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.
Here's my code. You can clearly see I'm setting ProhibitDtd to false.
public static XmlDocument LoadXml(FileUpload fu)
{
var settings = new XmlReaderSettings
{
ProhibitDtd = false,
ValidationType = ValidationType.DTD
};
var sDtdPath = string.Format(@"{0}", HttpContext.Current.Server.MapPath("/includes/dtds/2.3/archivearticle.dtd"));
settings.Schemas.Add(null, sDtdPath);
var r = XmlReader.Create(new StreamReader(fu.PostedFile.InputStream), settings);
var document = new XmlDocument();
document.Load(r);
return document;
}
Add XmlResolver=null to your XmlReaderSettings. This will prevent the xmlDocument from trying to access the DTD. If you need to validate, do that in a separate operation.
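As a rough sketch of that suggestion, the loader below allows a DOCTYPE to be present but gives the reader no resolver, so the external DTD is not fetched. It assumes a framework version where `DtdProcessing` is available as the replacement for the obsolete `ProhibitDtd` property; `NoExternalDtdLoader` is a hypothetical name, and validation is deliberately left out.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NoExternalDtdLoader
{
    // Hypothetical helper: loads a document that has a DOCTYPE without
    // retrieving the external DTD it points to. No validation is performed.
    public static XmlDocument Load(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // replaces the obsolete ProhibitDtd = false
            XmlResolver = null                   // external DTDs and entities are not fetched
        };
        using (var reader = XmlReader.Create(input, settings))
        {
            var document = new XmlDocument { XmlResolver = null };
            document.Load(reader);
            return document;
        }
    }
}
```

In the question's code this would be called with `fu.PostedFile.InputStream`.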
