Get all possible XPath expressions with XPathNavigator class?


I already have an algorithm to retrieve the XPath expressions of an XML document:
Avoid recursion on this function XML related
However, it is imperfect and insecure, and it needs a lot of additional decoration to format the obtained expressions.
(For an example of the desired formatting, see: Get avaliable XPaths and its element names using HtmlAgilityPack)
I recently discovered the XPathNavigator class, and to make my current code more reliable I would like to know whether it can retrieve all the XPath expressions of an XML document. That way my algorithm could rely on the .NET Framework's own logic and rules instead of the imperfect logic of a single programmer.
I am looking for a solution in C# or VB.NET.
This is what I tried:
Dim xDoc As XDocument =
<?xml version="1.0"?>
<Document>
<Tests>
<Test>
<Name>A</Name>
<Value>0.01</Value>
<Result>Pass</Result>
</Test>
<Test>
<Name>A</Name>
<Value>0.02</Value>
<Result>Pass</Result>
</Test>
<Test>
<Name>B</Name>
<Value>1.01</Value>
<Result>Fail</Result>
</Test>
</Tests>
</Document>
Dim ms As New MemoryStream
xDoc.Save(ms, SaveOptions.None)
ms.Seek(0, SeekOrigin.Begin)
Dim xpathDoc As New XPathDocument(ms)
Dim xpathNavigator As XPathNavigator = xpathDoc.CreateNavigator
Dim nodes As XPathNodeIterator = xpathNavigator.Select(".//*")
For Each item As XPathNavigator In nodes
Console.WriteLine(item.Name)
Next
With that code I only managed to get this (undesired) kind of output:
Document
Tests
Test
Name
Value
Result
Test
Name
Value
Result
Test
Name
Value
Result
Test
Name
Value
Result
Is it possible to extract all the XPath expressions using the XPathNavigator class?

No, that's not possible. There are many, many ways to select a particular node with XPath. You might settle on some notion of the "canonical" XPath for any given node, but even that sounds hard to specify, and XPathNavigator has no such notion built in to help you.
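To make the idea of a "canonical" path concrete, here is a minimal sketch that walks upward from each element with XPathNavigator and emits one positional path per element. The class and method names (CanonicalXPath, PathOf, AllPaths) are hypothetical, and the convention chosen (qualified element names, with a 1-based index only when same-named siblings exist) is just one assumption among many possible ones; nothing like this is built into XPathNavigator. It also ignores namespaces, attributes, and text nodes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CanonicalXPath
{
    // Builds one positional path (e.g. /Document/Tests/Test[2]/Name) for an element.
    // This is one arbitrary "canonical" convention, not a framework feature.
    public static string PathOf(XPathNavigator element)
    {
        var steps = new Stack<string>();
        XPathNavigator current = element.Clone();

        while (current.NodeType == XPathNodeType.Element)
        {
            int index = 1;          // 1-based position among same-named siblings
            bool hasTwin = false;   // true when another sibling shares the name

            // Count same-named element siblings that precede the current node.
            XPathNavigator sibling = current.Clone();
            while (sibling.MoveToPrevious())
            {
                if (sibling.NodeType == XPathNodeType.Element && sibling.Name == current.Name)
                {
                    index++;
                    hasTwin = true;
                }
            }

            // If none precede it, check whether any same-named sibling follows.
            sibling = current.Clone();
            while (!hasTwin && sibling.MoveToNext())
            {
                if (sibling.NodeType == XPathNodeType.Element && sibling.Name == current.Name)
                    hasTwin = true;
            }

            steps.Push(hasTwin ? current.Name + "[" + index + "]" : current.Name);
            if (!current.MoveToParent()) break;
        }

        // The stack enumerates from the last push (the root element) downward.
        return "/" + string.Join("/", steps);
    }

    // Returns the path of every element in document order.
    public static List<string> AllPaths(string xml)
    {
        var result = new List<string>();
        XPathNavigator root = new XPathDocument(new StringReader(xml)).CreateNavigator();
        foreach (XPathNavigator element in root.Select("//*"))
            result.Add(PathOf(element));
        return result;
    }
}
```

For the document in the question this produces paths of the form /Document/Tests/Test[2]/Value. Documents that use a default namespace would need prefixes or local-name() tests before the generated paths could be evaluated as XPath 1.0.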

Related

How to find nodes in XML File using Line number C#?

I have the XML file below. When I use it in a process, it gives me an error like this:
[2016-06-22 19:29:53 IST] ERROR: "Line 15 column 20: character content of element "electronics" invalid; must be equal to "foods", "goods" or "mobiles" at XPath
/products/product[1]/item_type"
The reported XPath, /products/product[1], is not correct; it is returned by their system.
So I have to find the node and replace it with the original value using the line number only.
Order.xml
<products>
<product>
<country>AE</country>
<sale>false</sale>
<item_type>foods</item_type>
</product>
<product>
<country>IN</country>
<sale>false</sale>
<item_type>goods</item_type>
</product>
<product>
<country>US</country>
<sale>false</sale>
<item_type>electronics</item_type>
</product>
<product>
<country>AM</country>
<sale>false</sale>
<item_type>mobiles</item_type>
</product>
</products>
Please let me know how to search by line number (15) in an XML document.

You should not rely on line/column numbers when dealing with XML. XML does not care much about whitespace, and formatting is merely a convenience for readability. Although the position is reported correctly in your example, I would not trust that it will always be that way in future. Interestingly, the XPath is also incorrect, but let's not dwell on that for the moment.
You need to take back control of the problem. You should create your own validating reader and capture the validation error while the XML is being parsed. How you do this will depend on which framework you're using: either XmlValidatingReader or XmlReader with the appropriate XmlReaderSettings.
You can set up a callback/event handler to capture the error as the XML file is being read, so you can be certain you're in the correct place and have all the information you need to handle the error. Using an XmlReader will also allow you to continue processing the entire file rather than stopping at the first error.
The code is too big for SO and we'd require a lot more information from you to write it, but a Google search will find lots of examples, including this one from Microsoft: https://msdn.microsoft.com/en-us/library/w5aahf2a(v=vs.100).aspx
Construct a new XmlReaderSettings instance.
Add an XML schema to the Schemas property of the XmlReaderSettings instance.
Specify Schema as the ValidationType.
Specify ValidationFlags and a ValidationEventHandler to handle schema validation errors and warnings encountered during validation.
Pass the XmlReaderSettings object to the Create method of the XmlReader class along with the XML document, creating a schema-validating XmlReader.
Call Read() to run through the xml from start to end.
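The steps above can be sketched roughly as follows. The XSD embedded here is an assumption reconstructed from the error message (the real schema was not posted), and the names OrderValidator and Validate are hypothetical. The handler collects every error with its line and column instead of stopping at the first one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class OrderValidator
{
    // Assumed schema: item_type is restricted to foods, goods or mobiles.
    private const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='products'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='product' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='country' type='xs:string'/>
              <xs:element name='sale' type='xs:boolean'/>
              <xs:element name='item_type'>
                <xs:simpleType>
                  <xs:restriction base='xs:string'>
                    <xs:enumeration value='foods'/>
                    <xs:enumeration value='goods'/>
                    <xs:enumeration value='mobiles'/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates the XML text and returns one message per validation problem.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();

        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) =>
            errors.Add(e.Severity + " at line " + e.Exception.LineNumber +
                       ", column " + e.Exception.LinePosition + ": " + e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Read to the end so that every error is reported, not just the first.
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

Because the handler runs while the reader is positioned at the offending node, the same callback is a natural place to record which element failed, rather than reverse-engineering it from a line number afterwards.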

Would this work for you?
string[] lines = File.ReadAllLines(@"C:\order.xml", Encoding.UTF8);
var line15 = lines[14];

XmlDocument doc = new XmlDocument();
doc.Load(@"C:\order.xml");
XmlElement el =(XmlElement)doc.SelectSingleNode("/products/product[1]/item_type");
if(el != null) { el.ParentNode.RemoveChild(el); }
You can remove the node using the code above. With a suitable XPath you can remove every item_type that is not equal to foods, goods or mobiles.

Try this,
var xml = XDocument.Load(@"C:\order.xml", LoadOptions.SetLineInfo);
var lineNumbers = xml.Descendants()
.Where(x => !x.Descendants().Any() && //exact node contains the value
x.Value.Contains("foods"))
.Cast<IXmlLineInfo>()
.Select(x => x.LineNumber);
int getLineNumber = lineNumbers.First();

Find Element when XPath is variable

I am trying to compose an algorithm that will take XML as input, and find a value associated with a particular element, but the position of the element within the XML body varies. I have seen many examples of using XDocument.Descendants() but most (if not all) the examples expect the structure to be consistent, and descendants well known. I presume I will need to recurse the XML to find this value, but before heading that way, ask the general population.
What is the best way to find an element in an XDocument when the path for the element may be different on each call? Just need the first occurrence found, not in any particular order. Can be first occurrence found by going wide, or by going deep.
For example, if I am trying to find the element "FirstName", and if the input XML for Call one looks like..
<?xml version="1.0"?>
<PERSON><Name><FirstName>BOB</FirstName></Name></PERSON>
and the next call looks like:
<?xml version="1.0"?>
<PERSONS><PERSON><Name><FirstName>BOB</FirstName></Name></PERSON></PERSONS>
What do you recommend? Is there a "Find" option in XDocument that I have not seen?
UPDATE:
The simple example above works with lazyberezovsky's answer using XDocument.Descendants, but my real XML does not. My problematic XML...
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<To s:mustUnderstand="1" xmlns="http://schemas.microsoft.com/ws/2005/05/addressing/none">http://localhost:52087/Service1.svc</To>
<Action s:mustUnderstand="1" xmlns="http://schemas.microsoft.com/ws/2005/05/addressing/none">http://tempuri.org/IService1/GetDataUsingDataContract</Action>
</s:Header>
<s:Body>
<GetDataUsingDataContract xmlns="http://tempuri.org/">
<composite xmlns:a="http://schemas.datacontract.org/2004/07/WcfService2" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:BoolValue>false</a:BoolValue>
<a:Name>
<a:FirstName>BOB</a:FirstName>
</a:Name>
<a:StringValue i:nil="true" />
</composite>
</GetDataUsingDataContract>
</s:Body>
</s:Envelope>
UPDATE:
lazyberezovsky helped immensely by showing me how Descendants is supposed to work. Be careful of namespaces in XML. Lesson learned. I found another article with similar issues:
Search XDocument using LINQ without knowing the namespace
Resolved using the following snippet...
var xdoc = XDocument.Parse(xml);
var name = (from p in xdoc.Descendants() where p.Name.LocalName == "FirstName" select p.Value).FirstOrDefault();
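The namespace pitfall described in the updates can be illustrated with a small sketch. The helper names (FirstNameFinder, ByLocalName, ByQualifiedName) are hypothetical. Descendants(name) matches on the full XName, namespace included, so a bare "FirstName" never matches an element that lives in a namespace, whereas comparing Name.LocalName ignores the namespace entirely.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FirstNameFinder
{
    // Ignores namespaces: matches the first element whose local name equals localName.
    public static string ByLocalName(string xml, string localName)
    {
        return XDocument.Parse(xml)
            .Descendants()
            .Where(e => e.Name.LocalName == localName)
            .Select(e => e.Value)
            .FirstOrDefault();
    }

    // Namespace-aware: matches only elements in the given namespace.
    // Pass XNamespace.None to look for elements that have no namespace.
    public static string ByQualifiedName(string xml, XNamespace ns, string localName)
    {
        // Casting a null XElement to string yields null rather than throwing.
        return (string)XDocument.Parse(xml).Descendants(ns + localName).FirstOrDefault();
    }
}
```

When the namespace is known and stable, the qualified lookup is stricter and avoids accidental matches. The local-name version trades that safety for tolerance of changing namespaces.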

When you are using Descendants for finding first occurrence of element, you don't need to know
structure of file. Following code will work for both your cases:
var xdoc = XDocument.Load(path_to_xml);
var name = (string)xdoc.Descendants("FirstName").FirstOrDefault();
Same with XPath
var name = (string)xdoc.XPathSelectElement("//FirstName[1]");

Without knowing all the possible permutations of the XML document (which is very unusual by the way) I don't think anyone could hope to give you any worthwhile recommendations.

"Just need the first occurrence found, not in any particular order." I think Descendants does the trick. Look at this:
string xml = @"<?xml version=""1.0""?>
<PERSONS>
<PERSON>
<Name>
<FirstName>BOB</FirstName>
</Name>
</PERSON>
</PERSONS>";
XDocument doc = XDocument.Parse(xml);
Console.WriteLine(string.Join(",", doc.Descendants("FirstName").Select(e =>(string)e)));
xml = @"<?xml version=""1.0""?>
<PERSON>
<Name>
<FirstName>BOB</FirstName>
</Name>
</PERSON>";
doc = XDocument.Parse(xml);
Console.WriteLine(string.Join(",", doc.Descendants("FirstName").Select(e =>(string)e)));

How to deal with namespaces in XML in XmlDocument c#

I have several XML documents, all of which have the same structure (element names, attribute names and hierarchy).
However, some of the elements and attributes have custom namespaces in each XML document which are not known at design time. They change, don't ask...
How can I deal with this when traversing the documents using a single set of XPath?
Should I remove all the namespaces before processing?
Can I automatically register all namespaces with an XmlNamespaceManager?
Any thoughts?
Update: some examples (with namespace declarations omitted for clarity):
<root>
<child attr="val" />
</root>
<root>
<x:child attr="val" />
</root>
<root>
<y:child z:attr="val" />
</root>
Thanks

Suppose you have following xml:
<root xmlns="first">
<el1 xmlns="second">
<el2 xmlns="third">...
You can write your queries to ignore namespaces in the following way:
/*[local-name()='root']/*[local-name()='el1']/*[local-name()='el2']
etc.
Of course, you can iterate over the whole document to collect the namespaces and load them into a namespace manager. But in the general case this forces you to visit every node in the document. In that case it will be faster to just treat the document as a tree of objects and not use XPath.
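As a rough illustration of the local-name() approach, the sketch below builds such a query from a list of element names and runs it with XmlDocument. The names NamespaceAgnostic, Step and SelectByLocalNames are hypothetical helpers, not part of System.Xml.

```csharp
using System;
using System.Xml;

public static class NamespaceAgnostic
{
    // Turns an element name into a step that matches it in any namespace.
    public static string Step(string localName)
    {
        return "*[local-name()='" + localName + "']";
    }

    // Selects the first node reached by following the given local names from the root.
    public static XmlNode SelectByLocalNames(string xml, params string[] path)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        string xpath = "/" + string.Join("/", Array.ConvertAll(path, Step));
        return doc.SelectSingleNode(xpath);
    }
}
```

Attributes can be matched the same way with a step such as @*[local-name()='attr']. The price of this approach is that two elements with the same local name in different namespaces can no longer be told apart.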

I believe you'll find some good insight in this Stackoverflow thread
XPath + Namespace Driving me crazy
In my opinion you have either of two solutions:
1- If the set of all possible namespaces is known beforehand, you can register them all in an XmlNamespaceManager before you begin parsing
2- Use namespace-agnostic XPath selectors
Of course, you can always scrub the XML document of any inline namespaces and start your parsing on a clean, uniform XML document without namespaces, but honestly I don't see the gain in adding this overhead step.

Scott Hanselman has a nice article about extracting all of the XML Namespaces in an XML document. Presumably, when you get all of the XML Namespaces, you can just iterate over all of them and register them in your namespace manager.
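A minimal sketch of that idea, not taken from the article itself: visit every element, read the namespace declarations made on it, and register each prefixed one. The names NamespaceCollector and Collect are hypothetical. Two limitations are assumed away here: default (unprefixed) namespaces are skipped because XPath 1.0 cannot address them without a prefix, and a prefix that is bound to different URIs in different places simply keeps the last binding seen.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class NamespaceCollector
{
    // Registers every prefixed namespace declared anywhere in the document.
    public static XmlNamespaceManager Collect(XmlDocument doc)
    {
        var manager = new XmlNamespaceManager(doc.NameTable);
        XPathNavigator navigator = doc.CreateNavigator();

        foreach (XPathNavigator element in navigator.Select("//*"))
        {
            // Local scope: only the declarations made on this element itself.
            foreach (KeyValuePair<string, string> ns in element.GetNamespacesInScope(XmlNamespaceScope.Local))
            {
                if (ns.Key.Length > 0)   // skip default namespace declarations
                    manager.AddNamespace(ns.Key, ns.Value);
            }
        }
        return manager;
    }
}
```

The resulting manager can be passed to SelectSingleNode or SelectNodes. Note that this only helps when the prefixes used in the queries match the prefixes used in each document, which the question suggests may not hold.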

You could try something like this to strip the namespaces:
//Implemented based on interface, not part of algorithm
public string RemoveAllNamespaces(string xmlDocument)
{
return RemoveAllNamespaces(XElement.Parse(xmlDocument)).ToString();
}
//Core recursion function
private XElement RemoveAllNamespaces(XElement xmlDocument)
{
if (!xmlDocument.HasElements)
{
XElement xElement = new XElement(xmlDocument.Name.LocalName);
xElement.Value = xmlDocument.Value;
return xElement;
}
return new XElement(xmlDocument.Name.LocalName, xmlDocument.Elements().Select(el => RemoveAllNamespaces(el)));
}
See Peter Stegnar's answer here for more details:
How to remove all namespaces from XML with C#?

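You can also use direct node tests with namespace wildcards, which match any namespace (or lack thereof). Note that the *:name form is XPath 2.0 syntax, so it needs an XPath 2.0 (or XQuery) processor; the XPath 1.0 engine built into System.Xml does not accept it:
$your-document/*:root/*:child/@*:attr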

How do I find a XML node by path in Linq-to-XML

If I get the path to a specific node as a string, can I somehow easily find that node using LINQ or a method of XElement (or XDocument)?
There are so many different types of XML objects that, as an added bonus, it would be nice if you could point me to a guide on why and how to use the different types.
EDIT: OK, after being pointed towards XPathSelectElement I'm trying it out so I can accept the right answer, but I can't quite get it to work. This is the XML I'm trying out:
<Product>
<Name>SomeName</Name>
<Type>SomeType</Type>
<Quantity>Alot</Quantity>
</Product>
and my code
string path = "Product/Name";
string name = xml.XPathSelectElement(path).Value;
Note that my string is coming from elsewhere, so I guess it doesn't have to be a literal (at least in debug mode it looks like the one above). I've also tried adding / in front. It gives me a null reference.

Try using the XPathSelectElement extension method of XElement. You can pass the method an XPath expression to evaluate. For example:
XElement myElement = rootElement.XPathSelectElement("//Book[@ISBN='22542']");
Edit:
In reply to your edit, check your XPath expression. If your document only contains that small snippet then /Product/Name will work as the leading slash performs a search from the root of the document:
XElement element = document.XPathSelectElement("/Product/Name");
If there are other products and <Product> is not the root node you'll need to modify the XPath you're using.
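One plausible cause of the null reference (an assumption, since the question does not show how xml was created) is that xml is an XElement for <Product> itself. A relative path is evaluated from the node it is called on, so "Product/Name" then looks for a <Product> child inside <Product> and finds nothing. The sketch below contrasts the two starting points; PathLookup and its methods are hypothetical names.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class PathLookup
{
    // Evaluates the path with the document node as context.
    public static string FromDocument(string xml, string xpath)
    {
        XElement hit = XDocument.Parse(xml).XPathSelectElement(xpath);
        return hit == null ? null : hit.Value;
    }

    // Evaluates the path with the root element as context.
    public static string FromElement(string xml, string xpath)
    {
        XElement hit = XElement.Parse(xml).XPathSelectElement(xpath);
        return hit == null ? null : hit.Value;
    }
}
```

Checking the result for null before reading Value, as done here, also avoids the exception when a path legitimately matches nothing.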

You can also use XPathEvaluate
XDocument document = XDocument.Load("temp.xml");
var found = document.XPathEvaluate("/documents/items/item") as IEnumerable<object>;
foreach (var obj in found)
{
Console.Out.WriteLine(obj);
}
Given the following xml:
<?xml version="1.0" encoding="utf-8" ?>
<documents>
<items>
<item name="Jamie"></item>
<item name="John"></item>
</items>
</documents>
This should print each item element found under the items node.

Parse XML in C#

Hello, I want to know how I can parse the content of this simple XML file in C#. I can have multiple "in" elements, and from those I want to use the date, min, max and state child values.
<out>
<in>
<id>16769</id>
<date>29-10-2010</date>
<now>12</now>
<min>12</min>
<max>23</max>
<state>2</state>
<description>enter text</description>
</in>
<in>
<id>7655</id>
<date>12-10-2010</date>
<now>1</now>
<min>1</min>
<max>2</max>
<state>0</state>
<description>enter text</description>
</in>
</out>

The System.Xml namespace has all sorts of tools for parsing, reading, and writing XML data. By the way, your XML isn't well-formed; you've got two <out> elements, but only one </out> element.

Linq to xml is also helpful for parsing xml -
http://msdn.microsoft.com/en-us/library/bb387098.aspx
Also -
http://msdn.microsoft.com/library/bb308960.aspx
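For the XML in the question, a LINQ to XML version might look like the sketch below. The Reading class and ReadingParser are hypothetical names, and the date is kept as a string because the dd-MM-yyyy format shown in the question is not the XML date format that the built-in conversions expect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Holds the four values the question asks for.
public class Reading
{
    public string Date { get; set; }
    public int Min { get; set; }
    public int Max { get; set; }
    public int State { get; set; }
}

public static class ReadingParser
{
    // Projects every <in> element under the root into a Reading.
    public static List<Reading> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("in")
            .Select(e => new Reading
            {
                Date = (string)e.Element("date"),
                Min = (int)e.Element("min"),     // throws if <min> is missing
                Max = (int)e.Element("max"),
                State = (int)e.Element("state")
            })
            .ToList();
    }
}
```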

You need System.Xml, starting with XmlDocument.Load(filename).
Once you have the XmlDocument loaded, you can drill down into it as needed using the built-in .Net XML object model, starting from XmlDocument level. You can walk the tree recursively in a pretty intuitive way, capturing what you want from each XmlNode as you go.
Alternatively (and preferably) you can quickly locate all XmlNodes in your XmlDocument that match certain conditions using XPath - examples here. An example of usage in C# is XmlNode.SelectNodes.
using System;
using System.IO;
using System.Xml;
public class Sample {
public static void Main() {
XmlDocument doc = new XmlDocument();
doc.Load("booksort.xml");
XmlNodeList nodeList;
XmlNode root = doc.DocumentElement;
nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");
//Change the price on the books.
foreach (XmlNode book in nodeList)
{
book.LastChild.InnerText="15.95";
}
Console.WriteLine("Display the modified XML document....");
doc.Save(Console.Out);
}
}

Examples can be found here http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx

This might be beyond what you want to do, but worth mentioning...
I hate parsing XML. Seriously, I almost refuse to do it, especially since .NET can do it for me. What I would do is create an "In" object that has the properties above. You probably have one already, or it would take 60 seconds to create. You'll also need a List of In objects called "Out".
Then just deserialize the XML into the objects. This takes just a few lines of code. Here is an example. BTW, this makes changing and re-saving the data just as easy.
How to serialize/deserialize
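A minimal sketch of that approach with XmlSerializer, assuming the element names from the question. The property names and the OutReader helper are hypothetical, and the date is left as a string because dd-MM-yyyy is not the format the serializer expects for a DateTime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Maps the <out> root element and its repeated <in> children.
[XmlRoot("out")]
public class Out
{
    [XmlElement("in")]
    public List<In> Items { get; set; } = new List<In>();
}

// Maps one <in> element; each attribute names the XML child it reads from.
public class In
{
    [XmlElement("id")] public int Id { get; set; }
    [XmlElement("date")] public string Date { get; set; }
    [XmlElement("now")] public int Now { get; set; }
    [XmlElement("min")] public int Min { get; set; }
    [XmlElement("max")] public int Max { get; set; }
    [XmlElement("state")] public int State { get; set; }
    [XmlElement("description")] public string Description { get; set; }
}

public static class OutReader
{
    // Deserializes the XML text into the object graph above.
    public static Out Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Out));
        using (var reader = new StringReader(xml))
        {
            return (Out)serializer.Deserialize(reader);
        }
    }
}
```

Saving works the same way in reverse: the same XmlSerializer instance can write a modified Out object back out with Serialize.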
