Querying XML Without Worrying about Namespaces - c#

I have XML with and without a prefix on elements, but no namespaces defined for any of them. When I try to load this, it gives me an error on XDocument.Load (at least, I think that's where it happens) that certain prefixes are not defined. Is there a way to tell the framework to ignore any namespace prefixes? I'm using LINQ to XML, but could use something else if available.
I can't necessarily pre-define them because I'm going to be working with a variety of documents that may or may not have a prefix defined and no definitive xmlns declaration.

Aren't prefixes supposed to represent an abbreviation for a namespace? I believe you need to clean up those prefixes that have no namespace associated with them before processing the document, since it isn't valid XML otherwise. A quick regex that replaces every </prefix: with </ and every <prefix: with < should do it.
To do this, replace the following regex matches:
</[\w.-]+: with </
and <[\w.-]+: with < (do not change the ordering).
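A minimal sketch of this prefix stripping, assuming the patterns are restricted to name characters so that a match cannot run past an unprefixed tag to a colon in the text. The helper name StripTagPrefixes and the sample document are hypothetical, and prefixed attributes (for example xsi:type) are not handled:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: removes "prefix:" from element tags only.
string StripTagPrefixes(string xml)
{
    // Closing tags first, then opening tags, as the answer suggests.
    var withoutClosing = Regex.Replace(xml, @"</[\w.-]+:", "</");
    return Regex.Replace(withoutClosing, @"<[\w.-]+:", "<");
}

// Sample input with an undeclared prefix "x" and a colon in text content.
var raw = "<root><x:item>1</x:item><plain>a: b</plain></root>";
var cleaned = XDocument.Parse(StripTagPrefixes(raw));
Console.WriteLine(cleaned.Root.Element("item").Value);
```

Once the prefixes are gone, the document loads with XDocument.Parse and can be queried by plain element names.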

An approach to what you want to do may be using XmlDocument:
XmlDocument d = new XmlDocument();
using (var textReader = new XmlTextReader(@"test.xml"))
{
    textReader.Namespaces = false;
    d.Load(textReader);
}
You will lose the power of querying the data using the syntax of LINQ to XML.
You can actually use LINQ to XML and ignore the namespaces by adding the following line for each prefix in the file:
nameSpaceManager.AddNamespace("prefixName", "urn:ignore");
where nameSpaceManager is of type XmlNamespaceManager.
But from your question I sense that this is not a reasonable solution.
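One way the XmlNamespaceManager idea might be wired into LINQ to XML is to hand the manager to the reader through an XmlParserContext. This is a sketch under that assumption, not something the answer spells out. The helper name LoadWithPlaceholderNamespaces and the sample XML are hypothetical, and the prefixes still have to be known before loading:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: binds every listed prefix to a placeholder namespace
// so the parser no longer rejects them as undeclared.
XDocument LoadWithPlaceholderNamespaces(string xml, string[] prefixes)
{
    var nameSpaceManager = new XmlNamespaceManager(new NameTable());
    foreach (var prefix in prefixes)
        nameSpaceManager.AddNamespace(prefix, "urn:ignore");

    var context = new XmlParserContext(null, nameSpaceManager, null, XmlSpace.None);
    using (var reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings(), context))
    {
        return XDocument.Load(reader);
    }
}

var loaded = LoadWithPlaceholderNamespaces(
    "<root><prefixName:item>1</prefixName:item><item>2</item></root>",
    new[] { "prefixName" });

// Compare on LocalName so the placeholder namespace does not matter.
var items = loaded.Descendants().Where(e => e.Name.LocalName == "item").ToList();
Console.WriteLine(items.Count);
```

Querying on Name.LocalName keeps the rest of the code indifferent to whether an element carried a prefix.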

Related

XmlDocument.SelectSingleNode omit prefixes and namespaces

This question is a follow up of the answered question:
XmlDocument.SelectSingleNode and prefix + xmlNamespace issue
The problem is that it's possible that, in the future, the namespace prefixes of the XML we receive will change without warning, so we would like to know if there is any way of using SelectSingleNode while omitting the prefix of an element.
(We know we could remove all the prefixes from the incoming XML, but it would require more steps. We would still consider it a valid answer if code is provided.)
It doesn't matter if the prefix names change, as long as the namespace URIs do not change.
The prefix name you use in your code and the one in the XML document do not have to match, e.g.
namespaces.AddNamespace("foo", "http://exception.do29.imq.es/xsd");
XmlNode nodemsg = xmldocu.SelectSingleNode("//foo:message", namespaces);
It's possible to omit it using * in the XPath, for example:
//*[local-name()='ElementName']
Based on the question XPath select node with namespace
As an extra this tool was very useful to test different xpaths.... http://xpathvisualizer.codeplex.com/
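A short sketch comparing the two answers above on the same document. The namespace URI comes from the answer; the ax prefix, the response element and the message text are made up for illustration:

```csharp
using System;
using System.Xml;

// Hypothetical incoming document: it uses the prefix "ax", not "foo".
var xmldocu = new XmlDocument();
xmldocu.LoadXml(
    "<ax:response xmlns:ax='http://exception.do29.imq.es/xsd'>" +
    "<ax:message>boom</ax:message></ax:response>");

// Option 1: bind our own prefix to the same namespace URI.
var namespaces = new XmlNamespaceManager(xmldocu.NameTable);
namespaces.AddNamespace("foo", "http://exception.do29.imq.es/xsd");
XmlNode byUri = xmldocu.SelectSingleNode("//foo:message", namespaces);

// Option 2: ignore namespaces entirely and match on the local name.
XmlNode byLocalName = xmldocu.SelectSingleNode("//*[local-name()='message']");

Console.WriteLine(byUri.InnerText);
Console.WriteLine(byLocalName.InnerText);
```

Option 1 keeps matching if the sender renames the prefix but breaks if the URI changes. Option 2 survives both, at the cost of also matching a message element from any other namespace.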

Validate xml subset using schema subset in c# .net XmlDocument

Currently I have a solution that builds an XML document in a number of sections and then validates the final concatenated xml against a single schema. Is it possible to use a subset of the same schema to validate each section individually?
The answer is yes in most of the cases. For a disclaimer, in theory someone could intentionally write an XML Schema that would make some of my proposals impossible, but then that would be just bad practice in XSD authoring.
For a straightforward solution, the following assumptions should be true:
A section is well formed XML; you're concatenating XmlElement nodes. E.g.:
<section-element ... attribute content>
... more content
</section-element>
Each of the sections being merged has a matching global element declaration in your XML Schema set. If you use the xsi:type attribute for any of your sections, things might get a bit tricky, but not hard to fix.
The validation would be common code, where the XmlReader would be an XmlNodeReader on the node you're concatenating. Use the XmlReaderSettings as usual...
The above would work for any XSD (you don't have a design time dependency of knowing the XSD). For anything below, the code would have to match your XSD...
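The common validation code described above could look roughly like the sketch below. The schema, the header element and the helper name ValidateSection are hypothetical; the only assumption carried over from the answer is that each section has a matching global element declaration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical schema with one global element declaration per section.
const string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='header'>
    <xs:complexType>
      <xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// Validates a single section node and returns the collected messages.
List<string> ValidateSection(XmlNode section, XmlSchemaSet schemaSet)
{
    var errors = new List<string>();
    var settings = new XmlReaderSettings
    {
        ValidationType = ValidationType.Schema,
        Schemas = schemaSet
    };
    // Also report warnings, e.g. a section with no matching global element.
    settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
    settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

    using (var nodeReader = new XmlNodeReader(section))
    using (var validating = XmlReader.Create(nodeReader, settings))
    {
        while (validating.Read()) { } // reading drives the validation
    }
    return errors;
}

var sectionSchemas = new XmlSchemaSet();
sectionSchemas.Add(null, XmlReader.Create(new StringReader(xsd)));

var sectionDoc = new XmlDocument();
sectionDoc.LoadXml("<header><title>Report</title></header>");
Console.WriteLine(ValidateSection(sectionDoc.DocumentElement, sectionSchemas).Count);
```

The same XmlSchemaSet can be reused for every section and for the final concatenated document.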
If you don't have the matching global elements in the XML Schema then you have to look at the type of each matching local element declaration. If the type is global, then you can easily create, in memory, dummy elements that match your sections, of the global type (assuming a Venetian Blind authoring style).
If even the type is anonymous (more of a Russian Doll style), then you can even fake that, by creating a global element with a type that is a copy of the anonymous type - all in memory.

Loading XML Document - Name cannot begin with the zero character

I am trying to load something which claims to be an XML document into any type of .NET XML object: XElement, XmlDocument, or XmlTextReader. All of them throw an exception:
Name cannot begin with the '0' character, hexadecimal value 0x30
The error relates to this bit of 'XML':
<chart_value
color="ff4400"
alpha="100"
size="12"
position="cursor"
decimal_char="."
0=""
/>
I believe the problem is the author should not have named an attribute as 0.
If I could change this I would, but I do not have control of this feed. I suppose those who use it are using more permissive tools. Is there any way I can load this as XML without throwing an error?
There is no XML declaration either, nor a namespace or contract definition. I was thinking I might have to turn it into a string and do a replace, but this is not very elegant. I was wondering if there were any other options.
As many have said, this is not XML.
Having said that, it's almost XML and WANTS to be XML, so I don't think you should use a regex to screw around inside of it (here's why).
Wherever you're getting the stream, dump it into a string, change 0= to something like zero= and try parsing it.
Don't forget to reverse the operation if you have to return-to-sender.
If you're reading from a file, you can do something like this:
var txt = File.ReadAllText(@"\path\to\wannabe.xml");
var clean = txt.Replace("0=", "zero=");
var doc = new XmlDocument();
doc.LoadXml(clean);
This is not guaranteed to remove all potential XML problems -- but it should remove the one you have.
Just replace the Numeric value with '_'
Example: "0=" replace to "_0="
I hope that will fix the problem, thanks.
It might claim to be an XML document, but the claim is clearly false, so you should reject the document.
The only good way to deal with bad XML is to find out what bit of software is producing it, and either fix it or throw it away. All the benefits of XML go out of the window if people start tolerating stuff that's nearly XML but not quite.
The 0="" obviously uses an invalid attribute name 0. You'd probably have to do a find/replace to try and fix the XML if you cannot fix it at the source that created it. You might be able to use RegEx to try to do more efficient manipulation of the XML string.
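If a regex is used for the find/replace, it can be narrowed so that it only touches names that start with a digit and follow whitespace, which leaves values such as alpha="100" alone. The sketch below assumes that convention and borrows the _ prefix from the earlier answer; the helper name is hypothetical. The pattern can still hit text such as " 1=" inside an attribute value or element content, so it is a workaround rather than a parser:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: prefixes digit-leading attribute names with "_".
string PrefixNumericAttributeNames(string almostXml)
{
    // Whitespace, then a name starting with a digit, then "=".
    return Regex.Replace(almostXml, @"(?<=\s)(\d[\w.-]*)(?=\s*=)", "_$1");
}

var feed = "<chart_value color=\"ff4400\" alpha=\"100\" 0=\"\" />";
var element = XElement.Parse(PrefixNumericAttributeNames(feed));
Console.WriteLine(element.Attribute("_0") != null);
```

As with the plain string replace, the rename has to be reversed if the document is sent back to its source.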

replacing substring inside attributes of XmlDocument

I'm using C# with .net 3.5 and have a few cases where I want to replace some substrings in the XML attributes of an XmlDocument with something else.
One case is to replace the single quote character with &apos; and the other is to clean up some files that contain valid XML but whose attribute values are no longer appropriate (say, replace any attribute value that starts with "myMachine" with "newMachine").
Is there a simple way to do this, or do I need to go through each attribute of every node (recursively)?
One way to approach it is to select a list of the correct elements using Linq to XML, and then iterate over that list. Here's an example one-liner:
XDocument doc = XDocument.Load(path);
doc.XPathSelectElements("//element[@attribute-name = 'myMachine']").ToList().ForEach(x => x.SetAttributeValue("attribute-name", "newMachine"));
You could also do a more traditional iteration.
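A more traditional iteration might look like the sketch below, which covers the "starts with" case from the question by walking every attribute of every element. The helper name ReplaceAttributePrefix and the sample XML are hypothetical:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: rewrites every attribute value that starts with
// oldPrefix and returns how many attributes were changed.
int ReplaceAttributePrefix(XDocument doc, string oldPrefix, string newPrefix)
{
    // ToList() takes a snapshot so the tree is not modified while enumerating.
    var matches = doc.Descendants()
        .Attributes()
        .Where(a => a.Value.StartsWith(oldPrefix, StringComparison.Ordinal))
        .ToList();

    foreach (var attribute in matches)
        attribute.Value = newPrefix + attribute.Value.Substring(oldPrefix.Length);

    return matches.Count;
}

var servers = XDocument.Parse(
    "<servers><server host='myMachine01'/><server host='other' backup='myMachine02'/></servers>");
Console.WriteLine(ReplaceAttributePrefix(servers, "myMachine", "newMachine"));
```

Descendants().Attributes() visits the attributes of all elements, so no explicit recursion is needed.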
I suggest taking a look at LINQ to XML. There's a collection of code snippets that can help you get started here - LINQ To XML Tutorials with Examples
LINQ to XML should allow you to do what you're looking to do, and you'll probably find it easy once you've played with it a bit.

Declare namespaces within XPath expression

My application needs to evaluate XPath expression against some XML data. Expression is provided by user at runtime. So, I cannot create XmlNamespaceManager to pass to XPathEvaluate because I don't know prefixes and namespaces at compile time.
Is there any possibility to specify namespaces declaration within xpath expression?
Answers to comments:
XML data has one default namespace but there can be nested elements with any namespaces. User knows namespaces of the data he works with.
User-provided xpath expression is to be evaluated against many XML documents, and every document can have its own prefixes for the same namespaces.
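If the same prefix can be bound to different namespaces and prefixes aren't known in advance, then the only pure XPath way to specify such expressions is to use this form of referring to elements:
*[local-name() = 'someName' and namespace-uri() = 'exactNamespace']
(An unprefixed name test such as someName only matches elements in no namespace in XPath 1.0, so the name has to be tested through local-name().)
So, a particular XPath expression would be:
/*/*[local-name() = 'a' and namespace-uri() = 'defaultNS']
/*[local-name() = 'b' and namespace-uri() = 'NSB']
/*[local-name() = 'c' and namespace-uri() = 'defaultNS']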
I don't know any way to define a namespace prefix in an XPath expression.
But you can write the XPath expression to be agnostic of namespace-prefixes by using local-name() and namespace-uri() functions where appropriate.
Or if you know the XML-namespaces in advance, you can register an arbitrary prefix for them in the XmlNamespaceManager and tell your user to use that prefix in the XPath expression. It doesn't matter if the XML document itself registers a different prefix or no prefix at all. Path resolution is based on the namespace alone, not on the prefix.
Another option would be to scan the document at runtime (use XmlReader for low resource overhead if you haven't loaded it already) and then add the used mappings in the document in the XmlNamespaceManager. I'm not sure if you can get the namespaces and prefixes from XmlDocument, but I see no direct method to do it. It's easy with XmlReader though, since it exposes NamespaceURI and Prefix members for each node.
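The scanning approach could be sketched as follows. The helper name CollectNamespaces and the sample document are hypothetical, and the sketch makes two simplifying assumptions: only element prefixes are collected, and the first binding of a prefix wins if the document rebinds it later:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: walks the document once and registers every
// element prefix it sees together with its namespace URI.
XmlNamespaceManager CollectNamespaces(string xml, XmlNameTable nameTable)
{
    var manager = new XmlNamespaceManager(nameTable);
    using (var reader = XmlReader.Create(new StringReader(xml)))
    {
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            // Skip the default namespace: XPath 1.0 cannot address an empty prefix.
            if (reader.Prefix.Length > 0 && !manager.HasNamespace(reader.Prefix))
                manager.AddNamespace(reader.Prefix, reader.NamespaceURI);
        }
    }
    return manager;
}

var xml = "<root xmlns:p='urn:example:b'><p:item>42</p:item></root>";
var doc = new XmlDocument();
doc.LoadXml(xml);

var ns = CollectNamespaces(xml, doc.NameTable);
Console.WriteLine(doc.SelectSingleNode("/root/p:item", ns).InnerText);
```

With the manager built this way, a user-supplied expression can use the same prefixes that appear in the document being queried.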
Is there any possibility to specify namespaces declaration within xpath expression?
The answer is no - it's always done in the calling environment (which is actually more flexible).
An alternative would be to use XQuery, which does allow declaring namespaces in the query prolog.
UPDATE (2020)
In XPath 3.1 you can use the syntax /*/Q{http://my-namespace}a.
Sadly, though, if you're still using Microsoft software, then the situation hasn't changed since 2011 - you're still stuck with XPath 1.0 with all its shortcomings.
