I'm trying to read an XML file with DTD validation, but no matter what I do, it seems like the program doesn't read my DTD file. I have reduced the problem to a small XML file and a small DTD file:
test.xml - Located at c:\test.xml
<?xml version="1.0"?>
<!DOCTYPE Product SYSTEM "test.dtd">
<Product ProductID="123">
<ProductName>Rugby jersey</ProductName>
</Product>
test.dtd - located at c:\test.dtd
<!ELEMENT Product (ProductName)>
<!ATTLIST Product ProductID CDATA #REQUIRED>
<!ELEMENT ProductName (#PCDATA)>
My C# program looks like this
namespace XML_to_csv_converter
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
ReadXMLwithDTD();
}
public void ReadXMLwithDTD()
{
// Set the validation settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.DTD;
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
settings.IgnoreWhitespace = true;
// Create the XmlReader object.
XmlReader reader = XmlReader.Create("c:/test.xml", settings);
// Parse the file.
while (reader.Read())
{
System.Console.WriteLine("{0}, {1}: {2} ", reader.NodeType, reader.Name, reader.Value);
}
}
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Warning)
Console.WriteLine("Warning: Matching schema not found. No validation occurred." + e.Message);
else // Error
Console.WriteLine("Validation error: " + e.Message);
}
}
}
This results in the output:
XmlDeclaration, xml: version="1.0"
DocumentType, Product:
Validation error: The 'Product' element is not declared.
Element, Product:
Validation error: The 'ProductName' element is not declared.
Element, ProductName:
Text, : Rugby jersey
EndElement, ProductName:
EndElement, Product:
I have tried placing the files in different locations, and I have tried both relative and absolute paths. I also copied an example from the Microsoft website, and it resulted in the same problem. Does anyone have an idea what the problem could be? Is there any way to see whether the program was able to load the DTD file?
I cannot comment, so I am adding this as an answer to supplement the correct answer by Jim:
// SET THE RESOLVER
settings.XmlResolver = new XmlUrlResolver();
This is a breaking change between .NET 4.5.1 and .NET 4.5.2 / .NET 4.6: before the change, the resolver defaulted to XmlUrlResolver. Got stung by this.
You need to add the resolver.
XmlReaderSettings settings = new XmlReaderSettings();
// SET THE RESOLVER
settings.XmlResolver = new XmlUrlResolver();
settings.ValidationType = ValidationType.DTD;
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
settings.IgnoreWhitespace = true;
As long as the two files are in the same directory, this will work.
Alternatively, you need to provide a URL to the DTD.
XmlUrlResolver can also be overridden to provide additional semantics to the resolution process.
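As a rough sketch of such an override (and one possible way to see whether the reader actually asks for the DTD), a subclass can log every resource it is asked to resolve. The class name LoggingXmlResolver is hypothetical and not part of the original answer; only the GetEntity override of XmlUrlResolver is assumed.

```csharp
using System;
using System.Xml;

// Hypothetical helper (not from the original answer): logs each external
// resource, such as the DTD, that the reader asks the resolver to fetch.
public class LoggingXmlResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        Console.WriteLine("Resolving: " + absoluteUri);
        // Delegate to the default loading; this throws if the resource cannot be opened.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

Assigning it with settings.XmlResolver = new LoggingXmlResolver(); in place of the plain XmlUrlResolver should then show which DTD location the reader tries to open, if any.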
I am trying to use the XDocument class and XmlSchemaSet class to validate an XML file.
The XML file already exists but I want to add in just a single element consisting of a couple other elements and I only want to validate this node.
Here is an example of the XML file. The piece I would like to validate is the TestConfiguration node:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Root>
<AppType>Test App</AppType>
<LabelMap>
<Label0>
<Title>Tests</Title>
<Indexes>1,2,3</Indexes>
</Label0>
</LabelMap>
<TestConfiguration>
<CalculateNumbers>true</CalculateNumbers>
<RoundToDecimalPoint>3</RoundToDecimalPoint>
</TestConfiguration>
</Root>
Here is my xsd so far:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="TestConfiguration"
targetNamespace="MyApp_ConfigurationFiles" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="TestConfiguration">
<xs:complexType>
<xs:sequence>
<xs:element name="CalculateNumbers" type="xs:boolean" minOccurs="1" maxOccurs="1"/>
<xs:element name="RoundToDecimalPoint" type="xs:int" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Here is the code I use to validate it:
private bool ValidateXML(string xmlFile, string xsdFile)
{
string xsdFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location) ?? string.Empty, xsdFile);
Logger.Info("Validating XML file against XSD schema file.");
Logger.Info("XML: " + xmlFile);
Logger.Info("XSD: " + xsdFilePath);
try
{
XDocument xsdDocument = XDocument.Load(xsdFilePath);
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(XmlSchema.Read(new StringReader(xsdDocument.ToString()), this.XmlValidationEventHandler));
XDocument xmlDocument = XDocument.Load(xmlFile);
xmlDocument.Validate(schemaSet, this.XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info("Error parsing XML file: " + xmlFile);
throw new Exception(e.Message);
}
Logger.Info("XML validated against XSD.");
return true;
}
Even when I validate the full XML file, validation passes, which causes problems later when I try to load the XML file into the class generated by xsd2code. The error is: <Root xmlns=''> was not expected.
How can I validate just the TestConfiguration piece?
Thanks
You have a few issues here:
1 - Validating the entire document succeeds when it should fail.
This happens because the root node is unknown to the schema, and encountering an unknown node is treated as a validation warning, not a validation error, even if that unknown node is the root element. To enable warnings while validating, you need to set XmlSchemaValidationFlags.ReportValidationWarnings. However, there's no way to pass this flag to XDocument.Validate(). The question "XDocument.Validate is always successful" shows one way to work around this.
Having done this, you must also throw an exception in your validation handler when ValidationEventArgs.Severity == XmlSeverityType.Warning.
(As for requiring a certain root element in your XSD, this is apparently not possible.)
2 - You need a convenient way to validate elements as well as documents, so you can validate your <TestConfiguration> piece.
3 - Your XSD and XML are inconsistent.
Your XSD specifies that your elements are in the XML namespace MyApp_ConfigurationFiles, in the line targetNamespace="MyApp_ConfigurationFiles" elementFormDefault="qualified". In fact, the XML elements shown in your question are not in any namespace.
If the XSD is correct, your XML root node needs to look like:
<Root xmlns="MyApp_ConfigurationFiles">
If the XML is correct, your XSD needs to look like:
<xs:schema id="TestConfiguration"
elementFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
After you have resolved the XSD and XML inconsistency from #3, you can solve issues #1 and #2 by introducing the following extension methods that validate both documents and elements:
public static class XNodeExtensions
{
public static void Validate(this XContainer node, XmlReaderSettings settings)
{
if (node == null)
throw new ArgumentNullException();
using (var innerReader = node.CreateReader())
using (var reader = XmlReader.Create(innerReader, settings))
{
while (reader.Read())
;
}
}
public static void Validate(this XContainer node, XmlSchemaSet schemaSet, XmlSchemaValidationFlags validationFlags, ValidationEventHandler validationEventHandler)
{
var settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.ValidationFlags |= validationFlags;
if (validationEventHandler != null)
settings.ValidationEventHandler += validationEventHandler;
settings.Schemas = schemaSet;
node.Validate(settings);
}
}
Then, to validate the entire document, do:
try
{
var xsdDocument = XDocument.Load(xsdFilePath);
var schemaSet = new XmlSchemaSet();
using (var xsdReader = xsdDocument.CreateReader())
schemaSet.Add(XmlSchema.Read(xsdReader, this.XmlSchemaEventHandler));
var xmlDocument = XDocument.Load(xmlFile);
xmlDocument.Validate(schemaSet, XmlSchemaValidationFlags.ReportValidationWarnings, XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info("Error parsing XML file: " + xmlFile);
throw new Exception(e.Message);
}
And to validate a specific node, you can use the same extension methods:
XNamespace elementNamespace = "MyApp_ConfigurationFiles";
var elementName = elementNamespace + "TestConfiguration";
try
{
var xsdDocument = XDocument.Load(xsdFilePath);
var schemaSet = new XmlSchemaSet();
using (var xsdReader = xsdDocument.CreateReader())
schemaSet.Add(XmlSchema.Read(xsdReader, this.XmlSchemaEventHandler));
var xmlDocument = XDocument.Load(xmlFile);
var element = xmlDocument.Root.Element(elementName);
element.Validate(schemaSet, XmlSchemaValidationFlags.ReportValidationWarnings, this.XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info(string.Format("Error validating element {0} of XML file: {1}", elementName, xmlFile));
throw new Exception(e.Message);
}
Now validating the entire document fails while validating the {MyApp_ConfigurationFiles}TestConfiguration node succeeds, using the following validation event handlers:
void XmlSchemaEventHandler(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Error)
throw new XmlException(e.Message);
else if (e.Severity == XmlSeverityType.Warning)
Logger.Info(e.Message);
}
void XmlValidationEventHandler(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Error)
throw new XmlException(e.Message);
else if (e.Severity == XmlSeverityType.Warning)
throw new XmlException(e.Message);
}
I have an XML file to which I want to add predefined namespaces. The code is as follows:
private const string uri = "http://www.w3.org/TR/html4/";
private static readonly List<string> namespaces = new List<string> { "lun" };
public static XElement AddNameSpaceAndLoadXml(string xmlFile) {
var nameSpaceManager = new XmlNamespaceManager(new NameTable());
// add custom namespace to the manager and take the prefix from the collection
namespaces.ToList().ForEach(name => {
nameSpaceManager.AddNamespace(name, string.Concat(uri, name));
});
XmlParserContext parserContext = new XmlParserContext(null, nameSpaceManager, null, XmlSpace.Default);
using (var reader = XmlReader.Create(xmlFile, null, parserContext)) {
return XElement.Load(reader);
}
}
The problem is that the resulting XML in memory does not show the namespaces I added. They are also not declared on the root element but next to the tag that uses them. The XML is shown below.
The XML shows p3:read_data when it should be lun:read_data.
How do I get the namespace declared on the root tag and avoid the incorrect prefix?
Sample Input xml:
<config file-suffix="perf">
<overview-graph title="Top 5 LUN Reads" max-series="5" remove-series="1">
<counters lun:read_data=""/>
</overview-graph>
</config>
Output xml expected:
<config file-suffix="perf" xmlns:lun="http://www.w3.org/TR/html4/lun">
<overview-graph title="Top 5 LUN Reads" max-series="5" remove-series="1">
<counters lun:read_data="" />
</overview-graph>
</config>
Output that is coming using the above code:
<config file-suffix="perf" >
<overview-graph title="Top 5 LUN Reads" max-series="5" remove-series="1">
<counters p3:read_data="" xmlns:p3="http://www.w3.org/TR/html4/lun"/>
</overview-graph>
</config>
I am not sure if there is a better way, but adding the namespace manually seems to work.
using (var reader = XmlReader.Create(xmlFile, null, parserContext)) {
var newElement = XElement.Load(reader);
newElement.Add(new XAttribute(XNamespace.Xmlns + "lun", string.Concat(uri, "lun")));
return newElement;
}
I don't know offhand of a way to generalize this, however (obviously you can add the whole set by enumerating it, but outputting only the namespaces actually used might be interesting).
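Adding the whole set by enumerating it could look roughly like the sketch below. The helper name NamespaceHelper.DeclarePrefixes and its parameters are hypothetical; it assumes, as in the question, that each namespace URI is the base URI followed by the prefix.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class NamespaceHelper
{
    // Hypothetical helper: declares every prefix on the given root element.
    public static void DeclarePrefixes(XElement root, string baseUri, IEnumerable<string> prefixes)
    {
        foreach (var prefix in prefixes)
        {
            // SetAttributeValue replaces an existing declaration instead of
            // throwing on a duplicate attribute, unlike Add.
            root.SetAttributeValue(XNamespace.Xmlns + prefix, baseUri + prefix);
        }
    }
}
```

With the fields from the question, the call would be NamespaceHelper.DeclarePrefixes(newElement, uri, namespaces); right after XElement.Load(reader).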
I am trying to read an XML feed to get the last post date. My xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
<title>mysite</title>
<atom:link href="http://www.mysite.com/news/feed/" rel="self" type="application/rss+xml" />
<link>http://www.mysite.com/news</link>
<description>mysite</description>
<lastBuildDate>Tue, 22 Nov 2011 16:10:27 +0000</lastBuildDate>
<language>en</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>http://wordpress.org/?v=3.0.4</generator>
<item>
<title>My first post!</title>
<link>http://www.mysite.com/news/2011/11/22/docstore-v2-released/</link>
<comments>http://www.mysite.com/news/2011/11/22/docstore-v2-released/#comments</comments>
<pubDate>Tue, 22 Nov 2011 16:10:27 +0000</pubDate>
<dc:creator>mysite</dc:creator>
<category><![CDATA[News]]></category>
<category><![CDATA[Promotions]]></category>
<category><![CDATA[docstore]]></category>
I didn't show all of the xml since it is rather long.
My method, so far, looks like this:
private void button1_Click(object sender, EventArgs e)
{
var XmlDoc = new XmlDocument();
// setup the XML namespace manager
var mgr = new XmlNamespaceManager(XmlDoc.NameTable);
// add the relevant namespaces to the XML namespace manager
mgr.AddNamespace("ns", "http://purl.org/rss/1.0/modules/content/");
var webClient = new WebClient();
var stream = new MemoryStream(webClient.DownloadData("http://www.mysite.com/news/feed/"));
XmlDoc.Load(stream);
// **USE** the XML namespace in your XPath !!
XmlElement NodePath = (XmlElement)XmlDoc.SelectSingleNode("/ns:Response");
while (NodePath != null)
{
foreach (XmlNode Xml_Node in NodePath)
{
Console.WriteLine(Xml_Node.Name + ": " + Xml_Node.InnerText);
}
}
}
I'm having a problem with it telling me:
Namespace Manager or XsltContext needed. This query has a prefix,
variable, or user-defined function.
All I want to pull out of this xml code is the 'lastBuildDate'. I'm going in circles trying to get this code right.
Can someone tell me what I am doing wrong here?
Thank you!
You're not using the namespace manager.
// **USE** the XML namespace in your XPath !!
XmlElement NodePath = (XmlElement)XmlDoc.SelectSingleNode("/ns:Response", mgr);
Since there is only one such element, you could go directly to it with an XPath. That element is also not in any namespace (the feed declares only prefixed namespaces), so you do not need to do anything special to get to it. What about:
private const string XPATH_BUILD_DATE = "/rss/channel/lastBuildDate";
private void button1_Click(object sender, EventArgs e){
var xmlDoc = new XmlDocument();
var webClient = new WebClient();
var stream = new MemoryStream(webClient.DownloadData("http://www.mysite.com/news/feed/"));
xmlDoc.Load(stream);
XmlElement xmlNode = (XmlElement)xmlDoc.SelectSingleNode(XPATH_BUILD_DATE);
Console.WriteLine(xmlNode.Name + ": " + xmlNode.InnerText);
}
If, however, you did need to dig into elements in a different namespace, you can also do that with XPath (for example, getting the dc:creator):
/rss/channel/item[1]/*[local-name() = 'creator']
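An alternative to local-name() is to register the prefix with the namespace manager and pass the manager to SelectSingleNode. The sketch below uses a trimmed-down stand-in for the feed (an assumption, so that it runs without a network call); the dc namespace URI is the one declared in the question's XML.

```csharp
using System;
using System.Xml;

public static class FeedExample
{
    public static void Main()
    {
        // Trimmed-down stand-in for the feed in the question.
        const string xml =
            "<rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">" +
            "<channel><item><dc:creator>mysite</dc:creator></item></channel></rss>";

        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xml);

        // The prefix used in the XPath only has to match the one registered here,
        // not the one used in the document; the namespace URI is what counts.
        var mgr = new XmlNamespaceManager(xmlDoc.NameTable);
        mgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        XmlNode creator = xmlDoc.SelectSingleNode("/rss/channel/item[1]/dc:creator", mgr);
        if (creator != null)
            Console.WriteLine(creator.Name + ": " + creator.InnerText); // prints "dc:creator: mysite"
    }
}
```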
I'm using System.Xml to get attributes from my XML file.
It seems that the following code, which I found somewhere, finds nodes correctly but doesn't recognize attributes (which is weird, because I created this XML file with System.Xml too):
DataSet task_data = new DataSet("Root");
adapter.Fill(task_data); // MySqlDataAdapter is being used here
task_data.WriteXml(path, XmlWriteMode.WriteSchema);
So I don't know why other XML files found on the internet work while mine, which was created with the same module, doesn't...
using System;
using System.Xml;
using System.IO;
public class Catalog
{
private XmlDocument xmldoc;
private string path = @"C:\Users\Me\Desktop\task.xml";
public static void Main()
{
Catalog c = new Catalog();
}
public Catalog()
//Constructor
{
FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
xmldoc = new XmlDocument();
xmldoc.Load(fs);
DisplayCatalog();
}
// Method for Displaying the catalog
private void DisplayCatalog()
{
XmlNodeList xmlnode = xmldoc.GetElementsByTagName("task");
Console.WriteLine("Here is the list of catalogs\n\n");
for (int i = 0; i < xmlnode.Count; i++)
{
XmlAttributeCollection xmlattrc = xmlnode[i].Attributes; //HERE IS THE PROBLEM!!!
Console.Write(xmlnode[i].FirstChild.Name);
Console.WriteLine(":\t\t" + xmlnode[i].FirstChild.InnerText);
Console.Write(xmlnode[i].LastChild.Name);
Console.WriteLine(":\t" + xmlnode[i].LastChild.InnerText);
Console.WriteLine();
}
Console.WriteLine("Catalog Finished");
}
//end of class
}
The XML you linked to contains no attributes, only child elements:
<?xml version="1.0" standalone="yes"?>
<Root>
<task>
<TaskId>1</TaskId>
<TaskDelegatorNote>Presentation</TaskDelegatorNote>
<StartTime>PT10H</StartTime>
<EndTime>PT13H</EndTime>
<TaskEndDate>2011-01-02T00:00:00+00:00</TaskEndDate>
<TaskContractorNote>Done</TaskContractorNote>
<TaskStatus>3</TaskStatus>
<LastModification>Me, 2003-05-15 13:48:59</LastModification>
</task>
<task>
<TaskId>2</TaskId>
<TaskDelegatorNote>It must be done.</TaskDelegatorNote>
<StartTime>PT10H</StartTime>
<EndTime>PT13H</EndTime>
<TaskEndDate>2011-01-02T00:00:00+00:00</TaskEndDate>
<TaskContractorNote />
<TaskStatus>2</TaskStatus>
<LastModification>Admin, 2009-08-04 10:30:49</LastModification>
</task>
</Root>
Here's an XML snippet with a TaskId attribute:
<task TaskId="1">
</task>
To fix this, change
Console.Write(xmlattrc[0].Name);
Console.WriteLine(":\t\t" + xmlattrc[0].Value);
to
Console.Write(xmlnode[0].ChildNodes[0].Name);
Console.WriteLine(":\t\t" + xmlnode[0].ChildNodes[0].Value);
Your output would be
Here is the list of catalogs
TaskId:
TaskId: 1
LastModification: Me, 2003-05-15 13:48:59
TaskId:
TaskId: 2
LastModification: Admin, 2009-08-04 10:30:49
Catalog Finished
Press any key to continue . . .
You should also look at LINQ to XML for other ways of projecting your XML nodes.
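A minimal sketch of such a projection, assuming the element names from the XML shown above (the class and method names are hypothetical):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TaskProjection
{
    // Hypothetical helper: prints two child elements of every <task> element.
    public static void Print(string path)
    {
        XDocument doc = XDocument.Load(path);

        var tasks = doc.Root
            .Elements("task")
            .Select(t => new
            {
                // Casting an XElement to string yields its text, or null if the element is missing.
                TaskId = (string)t.Element("TaskId"),
                LastModification = (string)t.Element("LastModification")
            });

        foreach (var task in tasks)
            Console.WriteLine("TaskId: {0}, LastModification: {1}", task.TaskId, task.LastModification);
    }
}
```

Because the values are child elements and not attributes, Element(...) is used here; Attribute(...) would be the counterpart for a file that stores them as attributes.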
I have looked at many examples for validating an XML file against a DTD, but have not found one that allows me to use a proxy. I have a cXml file as follows (abbreviated for display) which I wish to validate:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.018/InvoiceDetail.dtd">
<cXML payloadID="123456" timestamp="2009-12-10T10:05:30-06:00">
<!-- content snipped -->
</cXML>
I am trying to create a simple C# program to validate the xml against the DTD. I have tried code such as the following but cannot figure out how to get it to use a proxy:
private static bool isValid = false;
static void Main(string[] args)
{
try
{
XmlTextReader r = new XmlTextReader(args[0]);
XmlReaderSettings settings = new XmlReaderSettings();
XmlDocument doc = new XmlDocument();
settings.ProhibitDtd = false;
settings.ValidationType = ValidationType.DTD;
settings.ValidationEventHandler += new ValidationEventHandler(v_ValidationEventHandler);
XmlReader validator = XmlReader.Create(r, settings);
while (validator.Read()) ;
validator.Close();
// Check whether the document is valid or invalid.
if (isValid)
Console.WriteLine("Document is valid");
else
Console.WriteLine("Document is invalid");
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
static void v_ValidationEventHandler(object sender, ValidationEventArgs e)
{
isValid = false;
Console.WriteLine("Validation event\n" + e.Message);
}
The exception I receive is
System.Net.WebException: The remote server returned an error: (407) Proxy Authentication Required.
which occurs on the line while (validator.Read()) ;
I know I can validate against a DTD locally, but I don't want to change the xml DOCTYPE since that is what the final form needs to be (this app is solely for diagnostic purposes). For more information about the cXML spec, you can go to cxml.org.
I appreciate any assistance.
Thanks
It's been a while since your question, so sorry if it's a bit late!
Here's what seems to be the approved way to do it:
1 - Create your very own proxy assembly:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Configuration;
namespace ProxyAssembly
{
public class MyProxy:IWebProxy
{
#region IWebProxy Members
ICredentials IWebProxy.Credentials
{
get
{
return new NetworkCredential(ConfigurationSettings.AppSettings["googleProxyUser"],ConfigurationSettings.AppSettings["googleProxyPassword"],ConfigurationSettings.AppSettings["googleProxyDomain"]);
}
set { }
}
public Uri GetProxy(Uri destination)
{
return new Uri(ConfigurationSettings.AppSettings["googleProxyUrl"]);
}
public bool IsBypassed(Uri host)
{
return Convert.ToBoolean(ConfigurationSettings.AppSettings["bypassProxy"]);
}
#endregion
}
}
2 - Put the needed keys into the appSettings section of your web.config:
<add key="googleProxyUrl" value="http://proxy.that.com:8080"/>
<add key="googleProxyUser" value="service"/>
<add key="googleProxyPassword" value="BadDay"/>
<add key="googleProxyDomain" value="corporation"/>
<add key="bypassProxy" value="false"/>
3 - Put a defaultProxy section into your web.config:
<configuration>
<system.net>
<defaultProxy>
<module type="ProxyAssembly.MyProxy, ProxyAssembly"/>
</defaultProxy>
</system.net>
</configuration>
Now ALL requests from your application will go through the proxy. That's ALL requests, i.e. I don't think that you can choose to use this programmatically; every resource request will try to go through the proxy, e.g. validating XML using DTD docs, web service calls, etc.
Cheers,
Lance
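If an application-wide proxy is more than is wanted, one possibility worth trying, assuming .NET Framework 4.5 or later where XmlUrlResolver exposes a Proxy property, is to set the proxy on the resolver used by the validating reader. The helper name and its parameters below are placeholders, not part of the answer above.

```csharp
using System.Net;
using System.Xml;

public static class ProxyValidation
{
    // Hypothetical helper: proxyUrl, user, password and domain are placeholders.
    public static XmlReaderSettings CreateSettings(string proxyUrl, string user, string password, string domain)
    {
        var resolver = new XmlUrlResolver
        {
            // Only requests made through this resolver use the proxy.
            Proxy = new WebProxy(proxyUrl)
            {
                Credentials = new NetworkCredential(user, password, domain)
            }
        };

        return new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD,
            XmlResolver = resolver
        };
    }
}
```

The settings would then be passed together with the file path, for example XmlReader.Create(args[0], settings), so that the reader created from these settings is the one that fetches the DTD.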