I have an XSD file and want to get a list of the names of all the elements in it. I don't mean stuff like <xs:sequence> and so on, just the "real stuff", that actually can appear in XML that are valid according to the XSD.
Real stuff is a bit vague
But if you just want want all elements it's just a it of Xpath.
If you want a tree, then you can't avoid sequence etc.
If you have things like xs:choice in there you have even more issues.
Then there's attributes...
From SimpleContent or ComplexType...
Might be easier to generate a 'blank' xml document from the xsd and then get what you want out of that. That's a fair chunk of code as well though. Might be one lying around you can borrow though.
If you don't actually want to do this from your code, you could use the XML Schema Definition Tool (Xsd.exe) to create source code for runtime objects.
From there you can use Xml serialization to create valid Xml samples for your given Xsd schema.
Since you're trying to code for this, I would assume you want to do this against different XML Schema files, over and over; if true, it would be then important to understand if you really have to embed this in your codebase, or if it can be used as an external tool.
If you really want to do it, most of all you need is in System.Xml.Schema package. Start with an XmlSchemaSet to load and compile your XSD files. Then using an iterator on GlobalElements, go over the global elements that can show as your root elements in XML document and traverse those (for what you need, use the PSVI properties); as someone else was mentioning, there will be types to go through, compositors, etc.; and then there's more: abstract elements (those can't show up in XML, neither references to abstract elements, instead members of substitution groups), prohibited attributes, restricted types, etc.
I've recently answered another post that may be related to your need; your posted XML Schema may look like this:
root/ship/engine/#MaxSpeed,A,1..1,True
root/ship/crew/#function,A,1..1,True
root/ship/#Name,A,1..1,True
root/ship/#class,A,1..1,True
root/ship/special_abilities/hull/#separable,A,0..1,False
root/ship/special_abilities/hull/#canCarryWesley,A,0..1,False
root/ship/special_abilities/hull/#capableOfLanding,A,0..1,False
If you want, you can deal only with the first column; the generated XPath shows only those items (elements or attributes) that have data; processing something like the above might be much easier (split the string using /, elements are all but #, etc.)
Related
First I load the file in a structure
XElement xTree = XElement.Load(xml_file);
Then I create an enumerable collection of the elements.
IEnumerable<XElement> elements = xTree.Elements();
And iterate elements
foreach (XElement el in elements)
{
}
The problem is - when I fail to parse the element (a user made a typo or inserted a wrong value) - how can I report exact line in the file?
Is there any way to tie an element to its corresponding line in the file?
One way to do it (although not a proper one) –
When you find a wrong value, add an invalid char (e.g. ‘<’) to it.
So instead of: <ExeMode>Bla bla bla</ExeMode>
You’ll have: <ExeMode><Bla bla bla</ExeMode>
Then load the XML again with try / catch (System.Xml.XmlException ex).
This XmlException has LineNumber and LinePosition.
If there is a limited set of acceptable values, I believe XML Schemas have the concept of an enumerated type -- so write a schema for the file and have the parser validate against that. Assuming the parser you're using supports Schemas, which most should by now.
I haven't looked at DTDs in decades, but they may have the same kind of capability.
Otherwise, you would have to consider this semantic checking rather than syntactic checking, and that makes it your application's responsibility. If you are using a SAX parser and interpreting the data as you go, you may be able to get the line number; check your parser's features.
Otherwise the best answer I've found is to report the problem using an xpath to the affected node/token rather than a line number. You may be able to find a canned solution for that, either as a routine to run against a DOM tree or as a state machine you can run alongside your SAX code to track the path as you go.
(All the "maybe"s are because I haven't looked at what's currently available in a very long time, and because I'm trying to give an answer that is valid for all languages and parser implementations. This should still get you pointed in some useful directions.)
supposing O have these two equivalent (at least they are supposed to be) XML schemas. The actual XML will eventually be parsed by C#. I think the second way is 'more correct' since I will get attributes as actual attrbutes, instead of child elements, correct?
<?xml version="1.0" encoding="ISO-8859-1"?>
<switch>
<switch_name>switch1</switch_name>
<software_version>1</software_version>
<vendor>Cisco</vendor>
<ip_address>1.1.1.1</ipaddress>
<linecard>
<model_type>12345</model_type>
<fcport>
<slot> 1</slot>
<port> 1</port>
<speed>4</speed>
</fcport>
</linecard>
</switch>
<switch>
<switch name="switch1" version="1" vendor="Cisco" ip_address="1.1.1.1">
<linecard model="12345">
<fcport slot="1" port="1" speed="4">
</fcport>
<linecard>
</switch>
</xml>
Neither one is strictly more "correct" than the other, both will work for your example. Neither breaks any rules.
That said, I think I agree with W3Schools on this one, in that data should go inside child elements rather than attributes. Especially things like IP addresses just FEEL like data that should be a child element rather than an attribute. Attributes I typically use for metadata, such as auto generated IDs.
This is especially true if you later want to account for expansion -- for example, what if you want to associate multiple IPs? With child elements you can just add another element, but with attributes you have to come up with a new attribute name for each addition (ip1, ip2, ip3...).
There are no "right" way of representing data in XML when choosing between using elements or attributes for properties of an entity. Choose whatever works for you.
Generally elements give more freedom as you may have sub-elements eventually. I.e. if property is list of some sort representing it as comma-separated value in attribute looks very non-XML.
Side note: "XML schema" usually means different thing - structured schema for XML... what you have I'd call "representation of data in XML".
The classic article on how to choose between elements and attributes is here:
http://xml.coverpages.org/elementsAndAttrs.html
I note that at the end of the page it quotes John Cowan quoting me: "Beginnners always ask this question. Those with a little experience express their opinions passionately. Experts tell you there is no right answer'."
Currently I have a solution that builds an XML document in a number of sections and then validates the final concatenated xml against a single schema. Is it possible to use a subset of the same schema to validate each section individually?
The answer is yes in most of the cases. For a disclaimer, in theory someone could intentionally write an XML Schema that would make some of my proposals impossible, but then that would be just bad practice in XSD authoring.
For a straightforward solution, the following assumptions should be true:
A section is well formed XML; you're concatenating XmlElement nodes. E.g.:
<section-element ... attribute content>
... more content
</section-element>
Each of the sections being merged has a matching global element declaration in your XML Schema set. If you use the xsi:type attribute for any of your sections, things might get a bit tricky, but not hard to fix.
The validation would be common code, where the XmlReader would be an XmlNodeReader on the node you're concatenating. Use the XmlReaderSettings as usual...
The above would work for any XSD (you don't have a design time dependency of knowing the XSD). For anything below, the code would have to match your XSD...
If you don't have the matching global elements in the XML Schema then you have to look at the type of each matching local element declaration. If the type is global, then you can easily create, in memory, dummy elements that match your sections, of the global type (assuming a Venetian Blind authoring style).
If even the type is anonymous (more of a Russian Doll style), then you can even fake that, by creating a global element with a type that is a copy of the anonymous type - all in memory.
Yes, I really am going to ask about parsing XML with regexes... here goes.
I have some XML-ish data, and I need to parse it. I can't do it completely with an XMLDocument or similar because it's not proper XML, and I'm not sure I can (or want to) change the format. The main problem is tags which have special meaning, and look like this:
<$ something_here $>
C#'s XmlDocument falls over parsing that, and I assume other methods will too. I could, with a lot of work, change the above to something like
<some_special_tag><![CDATA[ something_here ]]></some_special_tag>
But that's ugly, and I don't really want to. The reason it would be time consuming to change is that I have hundreds, maybe thousands of XML documents which would need to be changed.
At the moment, I'm parsing the document with regexes. I only need to pick out a couple of specific tags (not the ones above), and it seems to be working, but I'm uncomfortable with it. I'm doing something like this at the moment:
...
MatchCollection mc = Regex.Matches(Template, "<tagname.*?/tagname>"); // or similar
foreach (Match m in mc) {
try {
XmlDocument xd = new XmlDocument();
xd.LoadXml(m.Value);
...
This at least means I'm not using regexes exclusively :)
Can anyone think of a better way? Is there some way of getting XmlDocument to politely ignore the $ character that causes it to fall over? It doesn't seem likely, but I thought I should at least get some opinions.
No, there is no way to get XmlDocument to parse a document which isn't xml, no matter how close to xml it might look!
If its possible to do then I would definitely recommend that you convert your documents to be actual xml (or at least some recognised document format). Trying to create and maintain a reliable working parser for any format is quite a lot of work, let alone a format that doesn't appear to be rigeriously defined.
Using a some_special_tag element to identify special sections seems like a good idea to me. If necessary you can use a different namespace to ensure no clashes with other elements in your document - this is in fact exactly the way that xslt works ("special" tags are used to mean special things, like templates or nodes that should be replaced) and exactly what xml was designed to support.
Also I don't understand why you would need to place the something_here bit in CDATA sections. All characters that "break" xml can be escaped fairly easily (for example by writing < as <). CDATA sections are generally only used when the contents of a node needs so much escaping that its easier and less messy to just to use CDATA sections instead.
Update: Regarding migration to a new format, can you not use both methods? Attempt to parse the document as an XML document (or if there are performance concerns then perform some other test to quickly determine if the document is in the "old" or "new" format such as checking for a version attribute in the root element) - if it doesn't work then fall back to the old method.
This way as long as everything is working fine (which is will be as long as nothing changes) users don't need to modify their documents, however if they run into problems or want to use any new features then explain to them that they must update their document to the new format.
Depending on how well your current "parser" works, you may even be able to provide an upgrade utility that automatically performns the conversion (as best it can).
Can't you replace <$ something_here $> to that big CDATA section at run-time and then load the XML document as usual?
Is there a simple way to compare two XML structures to determine if they have the same structure and data?
I have a function that returns an XmlNode and I am trying to write unit tests for it. I store the correct XML result in a file. Durring the test I load the file into an XmlDocument, locate the proper XmlNode and compare against the result of the function. A straight compare does not work (as expected) and InnerXml does not work either.
I am considering removing all whitespace from InnerXml and comparing that, or writing my own compare to walk the tree, but I don't like either option much.
XNode.DeepEquals. Read the caveats before using it.
If you must use XmlDocument and its supporting types, consider using Microsoft's XmlDiffPatch, which performs customizable diff-operations on XML data structures.
Like CodeToGlory answered, XNode.DeepEquals() might fit your bill, check the remarks section on the MSDN page.
If you are stuck with XmlDocument (instead of XDocument), the answer is: No, there is no simple (existing way) to do it. XmlNode does not override Equals(), or provide an alternative. But it is not impossible to write, and that same Remarks section can be used as a starting point for a tree-walk algorithm.
Do get a clear picture of your requirements first, concerning Attributes, comments, CDATA nodes etc.