How do I read an element in a specific XML file in C#? - c#

My XML file:
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 11" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="1006" height="636" resolution="300" originalCoords="1" rotation="RotatedUpsidedown">
<block blockType="Text" blockName="" l="979" t="613" r="1006" b="636"><region><rect l="979" t="613" r="1006" b="636"/></region>
<text>
<par lineSpacing="890">
<line baseline="17" l="985" t="620" r="1006" b="636"><formatting lang="EnglishUnitedStates"><charParams l="985" t="620" r="1006" b="636" suspicious="1">r</charParams></formatting></line></par>
</text>
</block>
<block blockType="Barcode" blockName="" l="242" t="21" r="772" b="116"><region><rect l="242" t="21" r="772" b="116"/></region>
<text>
<par><line baseline="0" l="0" t="0" r="0" b="0"><formatting lang="">049102580225180310</formatting></line></par>
</text>
<barcodeInfo type="INTERLEAVED25"/>
</block>
</page>
</document>
I want to extract the number 049102580225180310, located in <formatting>..</formatting>
I tried this code:
XElement racine = XElement.Load("test_XML.xml");
var query = from xx in racine.Elements(XName.Get("block"))
select new
{
CodeBar= xx.Attribute(XName.Get("formatting")).Value
};
But I get nothing.

Here's a console program, that gets the 2nd formatting (where lang='') node.
using System;
using System.Xml;
namespace ConsoleApplication1 {
class Program {
static void Main(string[] args) {
XmlDocument xml = new XmlDocument();
xml.Load("c:\\temp\\test.xml");
NameTable nt = new NameTable();
XmlNamespaceManager nsmgr;
nsmgr = new XmlNamespaceManager(nt);
nsmgr.AddNamespace("html", xml.DocumentElement.NamespaceURI);
XmlNode ndFormat = xml.SelectSingleNode("//html:formatting[@lang='']", nsmgr);
if (ndFormat != null) {
Console.WriteLine(ndFormat.InnerText);
}
}
}
}

You have a couple issues here:
Your XML has a default namespace in the root node:
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0"
Thus all child elements are in this namespace, and so when querying for elements by their Name the appropriate namespace must be used when constructing an XName for which to search.
The <formatting> nodes are not direct children of the <block> nodes, they are nested within several levels of XML. Also, they are XML elements, not XML attributes.
Similarly the <block> elements are not direct children of the <document> root element, they are nested inside a <page> element.
In such cases XElement.Descendants(name) can be used to find nested elements by name.
Thus your query should be:
var ns = racine.Name.Namespace; // The root default namespace used by all the elements in the XML.
var query = from block in racine.Descendants(ns + "block")
from formatting in block.Descendants(ns + "formatting")
select new
{
CodeBar= (string)formatting,
};
Sample fiddle that outputs the values of the two <formatting> elements:
{ CodeBar = r }
{ CodeBar = 049102580225180310 }
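If only the barcode value is wanted, the same idea can be narrowed down with a filter on the blockType attribute. The following is a minimal sketch, not the answerer's code: the class name BarcodeReader and the method ReadBarcodes are made up for illustration, and it assumes the barcode block is always marked with blockType="Barcode" as in the sample file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named here only for illustration.
public static class BarcodeReader
{
    // Returns the text of every <formatting> element that sits inside a
    // <block> whose blockType attribute equals "Barcode".
    public static string[] ReadBarcodes(string xml)
    {
        XElement root = XElement.Parse(xml);

        // The default namespace declared on the root applies to all child elements.
        XNamespace ns = root.Name.Namespace;

        return root.Descendants(ns + "block")
                   // Unprefixed attributes are in no namespace, so a plain name works.
                   .Where(b => (string)b.Attribute("blockType") == "Barcode")
                   .SelectMany(b => b.Descendants(ns + "formatting"))
                   .Select(f => f.Value.Trim())
                   .ToArray();
    }
}
```

It could be called as BarcodeReader.ReadBarcodes(System.IO.File.ReadAllText("test_XML.xml")).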

Related

Reading a XML File using XPath Expression in C#

I am currently having a problem with reading a XML file using XPath expression. I have used the XmlDocument class. When I try reading a particular node from the XML, I get an empty list. The node which I am trying to read is the ID below ProductionRequest.
Here is the XML file which I tried to read:
<?xml version="1.0" encoding="iso-8859-1"?>
<ProductionSchedule xmlns="http://www.wbf.org/xml/b2mml-v02">
<ID>00000020000000</ID>
<Location>
<EquipmentID>8283</EquipmentID>
<EquipmentElementLevel>Site</EquipmentElementLevel>
<Location>
<EquipmentID>0</EquipmentID>
<EquipmentElementLevel>Area</EquipmentElementLevel>
</Location>
</Location>
<ProductionRequest>
<ID>0009300000000</ID>
<ProductProductionRuleID>W001</ProductProductionRuleID>
<StartTime>2017-04-20T23:57:20</StartTime>
<EndTime>2017-04-20T24:00:00</EndTime>
</ProductionRequest>
</ProductionSchedule>
This is the code which I used to read the above XML
using System;
using System.Xml.Linq;
using System.Xml;
using System.Xml.XPath;
namespace XML
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
string fullName = "F:\\Programming\\XML\\Example XML.xml";
XmlDocument xreader = new XmlDocument();
xreader.Load(fullName);
XmlNode root = xreader.DocumentElement;
XmlNodeList xnList1 =
xreader.SelectNodes("/ProductionSchedule/ProductionRequest/ID");
}
}
}
I could not find the cause of this problem. Could anyone help me with this? I look forward to your input.
Your XML declares the namespace http://www.wbf.org/xml/b2mml-v02 on the root node <ProductionSchedule>.
You are using the XPath expression /ProductionSchedule/ProductionRequest/ID, but unprefixed names in XPath match only elements in no namespace, so this expression selects nothing from this document.
You need to use the XPath expression below to get the IDs of all <ProductionRequest> nodes.
XmlNodeList xnList1 = xreader.SelectNodes("//*[name()='ProductionSchedule']/*[name()='ProductionRequest']/*[name()='ID']");
Or you can register the namespace manually, like this:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xreader.NameTable);
nsmgr.AddNamespace("x", "http://www.wbf.org/xml/b2mml-v02");
XmlNodeList xnList1 = xreader.SelectNodes("//x:ProductionSchedule/x:ProductionRequest/x:ID", nsmgr);
Finally, you can read each ID from the nodes in the variable xnList1 like this:
foreach (XmlNode id in xnList1)
{
Console.WriteLine(id.InnerText);
}
Output:
0009300000000
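The namespace-manager approach can also be written without hardcoding the URI, by taking it from the root element. This is a sketch under assumptions: the class ProductionRequestIds and its Read method are hypothetical names, the prefix "b" is arbitrary, and the document is assumed to declare a namespace on its root as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, named here only for illustration.
public static class ProductionRequestIds
{
    // Returns the text of every ProductionRequest/ID element.
    public static List<string> Read(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind an arbitrary prefix to the namespace used by the root element.
        // Assumption: the root element is in a non-empty namespace.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("b", doc.DocumentElement.NamespaceURI);

        var ids = new List<string>();
        foreach (XmlNode id in doc.SelectNodes(
                     "/b:ProductionSchedule/b:ProductionRequest/b:ID", nsmgr))
        {
            ids.Add(id.InnerText);
        }
        return ids;
    }
}
```

The prefix in the XPath expression does not have to match anything in the file; only the namespace URI it is bound to matters.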

Modify an XML node with spacing and quotes as attributes

I am trying to access a child node of an XML document, but my first XML node has spacing and quotes as attributes.
var xml = @"<Envelope xsd "http">
<Catalog>
<Price>
<Value Default ="yes">P1</Value>
</Price>
</Catalog>
</Envelope>";
I'm trying to change the value of the Default attribute from "yes" to "1", but node is always null.
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
var node = doc.SelectSingleNode("/*/Catalog/Price/Value");
Any ideas?
I do not think that is valid XML. Did you perhaps mean the following?
using System;
using System.Globalization;
using System.Xml;
namespace ConsoleApplication9
{
class Program
{
private static void Main(string[] args)
{
//Valid XML
string xml = @"<Envelope xsd='http'>
<Catalog>
<Price>
<Value Default='yes'>P1</Value>
</Price>
</Catalog>
</Envelope>";
var doc = new XmlDocument();
doc.LoadXml(xml);
//Select the Value Node
XmlNode node = doc.SelectSingleNode("/*/Catalog/Price/Value");
//Set the Default attribute to 1
node.Attributes["Default"].Value = 1.ToString(CultureInfo.InvariantCulture);
//Check the output
Console.WriteLine(doc.InnerXml.ToString(CultureInfo.InvariantCulture));
//Press enter to exit
Console.ReadLine();
}
}
}
Just saying.
See http://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx.
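You can also use // instead of /*: // matches a node at any depth below the document root, while /* matches only the root element.
This might appear a little hardcoded, but it should work with LINQ to XML (System.Xml.Linq and System.Linq):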
XDocument doc = XDocument.Parse(xml); // Descendants() belongs to LINQ to XML, not XmlDocument
XNamespace ns = "http"; //set the namespace of the root node here (XNamespace.None if there is none)
//the following is where you change the value to 1
doc.Descendants(ns+"Envelope").First().Descendants(ns+"Catalog").Descendants(ns+"Price").Elements(ns+"Value").Attributes("Default").First().SetValue("1");
Also, the XML looks a little wrong to me. As someone mentioned, the root node needs to be corrected.

Getting an XElement with a namespace via XPathSelectElements

I have an XML e.g.
<?xml version="1.0" encoding="utf-8"?>
<A1>
<B2>
<C3 id="1">
<D7>
<E5 id="abc" />
</D7>
<D4 id="1">
<E5 id="abc" />
</D4>
<D4 id="2">
<E5 id="abc" />
</D4>
</C3>
</B2>
</A1>
This is my sample code:
var xDoc = XDocument.Load("Test.xml");
string xPath = "//B2/C3/D4";
//or string xPath = "//B2/C3/D4[@id='1']";
var eleList = xDoc.XPathSelectElements(xPath).ToList();
foreach (var xElement in eleList)
{
Console.WriteLine(xElement);
}
It works perfectly, but if I add a namespace to the root node A1, this code doesn't work.
Upon searching for solutions, I found this one, but it uses the Descendants() method to query the XML. From my understanding, this solution would fail if I was searching for <E5> because the same tag exists for <D7>, <D4 id="1"> and <D4 id="2">
My requirement is to search if a node exists at a particular XPath. If there is a way of doing this using Descendants, I'd be delighted to use it. If not, please guide me on how to search using the name space.
My apologies in case this is a duplicate.
To keep using XPath, you can use something like this:
var xDoc = XDocument.Parse(@"<?xml version='1.0' encoding='utf-8'?>
<A1 xmlns='urn:sample'>
<B2>
<C3 id='1'>
<D7><E5 id='abc' /></D7>
<D4 id='1'><E5 id='abc' /></D4>
<D4 id='2'><E5 id='abc' /></D4>
</C3>
</B2>
</A1>");
// Notice this
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
nsmgr.AddNamespace("sample", "urn:sample");
string xPath = "//sample:B2/sample:C3/sample:D4";
var eleList = xDoc.XPathSelectElements(xPath, nsmgr).ToList();
foreach (var xElement in eleList)
{
Console.WriteLine(xElement);
}
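Since the stated requirement is to check whether a node exists at a particular XPath, the snippet above can be reduced to a boolean test. The following is only a sketch: the class XPathExistence and the method Exists are hypothetical names, and the prefix and namespace URI are passed in by the caller rather than discovered from the document.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper, named here only for illustration.
public static class XPathExistence
{
    // Returns true when at least one element matches xPath.
    // prefix and namespaceUri describe the single namespace used in xPath.
    public static bool Exists(XDocument doc, string xPath, string prefix, string namespaceUri)
    {
        var nsmgr = new XmlNamespaceManager(new NameTable());
        nsmgr.AddNamespace(prefix, namespaceUri);

        // XPathSelectElement returns the first match, or null when nothing matches.
        return doc.XPathSelectElement(xPath, nsmgr) != null;
    }
}
```

For the sample document, Exists(xDoc, "/sample:A1/sample:B2/sample:C3/sample:D4[@id='1']/sample:E5", "sample", "urn:sample") checks for an E5 under one specific D4 only, which avoids the ambiguity with the E5 under D7.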
but it uses the Descendants() method to query the XML. From my understanding, this solution would fail if I was searching for <E5> because the same tag exists for <D7>, <D4 id="1"> and <D4 id="2">
I'm pretty sure you're not quite understanding how that works. From the MSDN documentation:
Returns a filtered collection of the descendant elements for this document or element, in document order. Only elements that have a matching XName are included in the collection.
So in your case, just do this:
xDoc.Root
.Descendants("E5")
.Where(n => n.Parent.Name.LocalName == "D4");
Try this
var xDoc = XDocument.Parse("<A1><B2><C3 id=\"1\"><D7><E5 id=\"abc\" /></D7><D4 id=\"1\"><E5 id=\"abc\" /></D4><D4 id=\"2\"><E5 id=\"abc\" /></D4></C3></B2></A1>");
foreach (XElement item in xDoc.Element("A1").Elements("B2").Elements("C3").Elements("D4"))
{
Console.WriteLine(item.Element("E5").Value);//to get the value of E5
Console.WriteLine(item.Element("E5").Attribute("id").Value);//to get the value of attribute
}

Get InnerText from Collection

Is there a way to get the inner text of a node when the node is inside a collection?
Currently I have this:
Collection<string> DependentNodes = new Collection<string>();
foreach (XmlNode node in nodes)
{
for (int i = 0; i < node.ChildNodes.Count; i++)
{
DependentNodes.Add(node.ChildNodes[i].InnerXml);
//the reason i'm using InnerXml is that it will return all the child node of testfixture in one single line,then we can find the category & check if there's dependson
}
}
string selectedtestcase = "abc_somewords";
foreach (string s in DependentNodes)
{
if(s.Contains(selectedtestcase))
{
MessageBox.Show("aaa");
}
}
When I debug, the string s contains the following (all on a single line):
<testfixture name="1" description="a">
<categories>
<category>abc_somewords</category>
</categories>
<test name="a" description="a">
<dependencies>
<dependson typename="dependsonthis" />
</dependencies>
</test>
</testfixture>
What I'm trying to do is this: when we reach testfixture "1", find "abc_somewords", then look for the dependson node (if any) and get its typename (which is "dependsonthis").
Could you use LINQ to XML? Something like the below might be a decent start:
xml.Elements("categories").Where(x => x.Element("category").Value.Contains(selectedtestcase));
This is off the top of my head, so it might need refining.
P.S. Use XElement.Load or XElement.Parse to get your XML into XElements.
Since you are already working with XmlNode, you could use an XPath expression to select the desired testfixture node and then the dependency value:
XmlDocument doc = // ...
XmlNode node = doc.SelectSingleNode("//testfixture[contains(categories/category, \"abc\")]/test/dependencies/dependson");
if (node != null)
{
MessageBox.Show(node.Attributes["typename"].Value);
}
This selects the dependson node which belongs to a testfixture node with a category containing "abc". node.Attributes["typename"].Value returns the value of the typename attribute.
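When several testfixture nodes can match, the same XPath idea can select the typename attributes directly and collect all of them. This is a sketch rather than the answerer's code: DependencyFinder and FindTypeNames are hypothetical names, and the category text is spliced into the XPath string, so it is assumed not to contain a double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, named here only for illustration.
public static class DependencyFinder
{
    // Returns the typename of every dependson node that belongs to a
    // testfixture having a category whose text contains the given value.
    public static List<string> FindTypeNames(string xml, string category)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Assumption: category contains no double quote, because it is
        // embedded in an XPath string literal.
        string xPath =
            "//testfixture[categories/category[contains(., \"" + category + "\")]]" +
            "/test/dependencies/dependson/@typename";

        var names = new List<string>();
        foreach (XmlNode attribute in doc.SelectNodes(xPath))
        {
            names.Add(attribute.Value); // the selected nodes are attributes
        }
        return names;
    }
}
```

Ending the expression in /@typename makes SelectNodes return the attribute nodes themselves, so no separate Attributes lookup is needed.</testfixture>"
    + "<testfixture name='2'><categories><category>another_value</category></categories><test name='b'><dependencies><dependson typename='secondentry'/></dependencies></test></testfixture>"
    + "<testfixture name='3'><categories><category>abc_somewords</category></categories><test name='c'><dependencies><dependson typename='thirdentry'/></dependencies></test></testfixture>"
    + "</root>";
var names = DependencyFinder.FindTypeNames(xml, "abc_somewords");
if (names.Count != 2) throw new Exception("expected two matches");
if (names[0] != "dependsonthis" || names[1] != "thirdentry") throw new Exception("unexpected typename values");
</test>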
Edited:
Updated XPath expression to the more specific question information
Assumptions
As you are looping in your code and wanting to create a collection I'm assuming the actual Xml File has several testfixture nodes inside such as the below assumed example:
<root>
<testfixture name="1" description="a">
<categories>
<category>abc_somewords</category>
</categories>
<test name="a" description="a">
<dependencies>
<dependson typename="dependsonthis" />
</dependencies>
</test>
</testfixture>
<testfixture name="2" description="a">
<categories>
<category>another_value</category>
</categories>
<test name="b" description="a">
<dependencies>
<dependson typename="secondentry" />
</dependencies>
</test>
</testfixture>
<testfixture name="3" description="a">
<categories>
<category>abc_somewords</category>
</categories>
<test name="c" description="a">
<dependencies>
<dependson typename="thirdentry" />
</dependencies>
</test>
</testfixture>
</root>
The Code using Linq to Xml
To use LINQ you must reference the following namespaces:
using System.Linq;
using System.Xml.Linq;
Using Linq To Xml on the above assumed xml file structure would look like this:
// To Load Xml Content from File.
XDocument doc1 = XDocument.Load(@"C:\MyXml.xml");
Collection<string> DependentNodes = new Collection<string>();
var results =
doc1.Root.Elements("testfixture")
.Where(x => x.Element("categories").Element("category").Value.Contains("abc_somewords"))
.Elements("test").Elements("dependencies").Elements("dependson").Attributes("typename").ToArray();
foreach (XAttribute attribute in results)
{
DependentNodes.Add(attribute.Value.Trim());
}
Result
The resulting Collection will contain the following:
dependsonthis
thirdentry
As you can see, only the text of the typename attribute has been extracted, and only for dependson nodes that were in a testfixture node containing a category node with the value abc_somewords.
Additional Notes
If you read the xml from a string you can also use this:
// To Load Xml Content from a string.
XDocument doc = XDocument.Parse(myXml);
If your complete XML structure is different, feel free to post it and I will change the code to match.
Have Fun.
I don't know what the "nodes" you are using are.
Here is code for your requirement, as I understood it.
Collection<XmlNode> DependentNodes = new Collection<XmlNode>();
XmlDocument xDoc = new XmlDocument();
xDoc.Load(@"Path_Of_Your_xml");
foreach (XmlNode node in xDoc.SelectNodes("testfixture")) // This selects only the root node. Supply a different XPath if your requirement changes.
{
for (int i = 0; i < node.ChildNodes.Count; i++)
{
DependentNodes.Add(node.ChildNodes[i]);
}
}
string selectedtestcase = "abc_somewords";
foreach (var s in DependentNodes)
{
if (s.InnerText.Contains(selectedtestcase))
{
Console.Write("aaa");
}
}
using System;
using System.Xml;
namespace ConsoleApplication6
{
class Program
{
private const string XML = "<testfixture name=\"1\" description=\"a\">" +
"<categories>" +
"<category>abc_somewords</category>" +
"</categories>" +
"<test name=\"a\" description=\"a\">" +
"<dependencies>" +
"<dependson typename=\"dependsonthis\" />" +
"</dependencies>" +
"</test>" +
"</testfixture>";
static void Main(string[] args)
{
var document = new XmlDocument();
document.LoadXml(XML);
var testfixture = document.SelectSingleNode("//testfixture[@name = 1]");
var category = testfixture.SelectSingleNode(".//category[contains(text(), 'abc_somewords')]");
if(category != null)
{
var depends = testfixture.SelectSingleNode(".//dependson"); // ".//" keeps the search inside this testfixture
Console.Out.WriteLine(depends.Attributes["typename"].Value);
}
Console.ReadKey();
}
}
}
Output: dependsonthis

Filter XDocument more efficiently

I would like to filter XML elements from an XML document with high performance.
Take for instance this XML file with contacts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="asistentes.xslt"?>
<contactlist evento="Cena Navidad 2010" empresa="company">
<contact type="1" id="1">
<name>Name1</name>
<email>xxxx@zzzz.es</email>
<confirmado>SI</confirmado>
</contact>
<contact type="1" id="2">
<name>Name2</name>
<email>xxxxxxxxx@zzzze.es</email>
<confirmado>Sin confirmar</confirmado>
</contact>
</contactlist>
My current code to filter from this XML document:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
string xml = @" the xml above";
XDocument doc = XDocument.Parse(xml);
foreach (XElement element in doc.Descendants("contact")) {
Console.WriteLine(element);
var id = element.Attribute("id").Value;
var valor = element.Descendants("confirmado").ToList()[0].Value;
var email = element.Descendants("email").ToList()[0].Value;
var name = element.Descendants("name").ToList()[0].Value;
if (valor.ToString() == "SI") { }
}
}
}
What would be the best way to optimize this code to filter on <confirmado> element content?
var doc = XDocument.Parse(xml);
var query = from contact in doc.Root.Elements("contact")
let confirmado = (string)contact.Element("confirmado")
where confirmado == "SI"
select new
{
Id = (int)contact.Attribute("id"),
Name = (string)contact.Element("name"),
Email = (string)contact.Element("email"),
Valor = confirmado
};
foreach (var contact in query)
{
...
}
Points of interest:
doc.Root.Elements("contact") selects only the <contact> elements in the document root, instead of searching the whole document for <contact> elements.
The XElement.Element method returns the first child element with the given name. No need to convert the child elements to a list and take the first element.
The XElement and XAttribute classes provide a wide selection of convenient conversion operators.
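One practical consequence of the conversion operators is that they tolerate missing nodes: casting a null XElement to string, or a null XAttribute to int?, yields null instead of throwing, whereas .Value on a missing element throws a NullReferenceException. The sketch below illustrates this; ContactFilter and ConfirmedIds are hypothetical names, and skipping contacts without an id is a choice made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named here only for illustration.
public static class ContactFilter
{
    // Returns the ids of all contacts whose <confirmado> element is "SI".
    // Contacts lacking <confirmado> or an id attribute are skipped.
    public static List<int> ConfirmedIds(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Root.Elements("contact")
                  // (string) on a missing element gives null, so no exception here.
                  .Where(c => (string)c.Element("confirmado") == "SI")
                  // (int?) on a missing attribute gives null.
                  .Select(c => (int?)c.Attribute("id"))
                  .Where(id => id.HasValue)
                  .Select(id => id.Value)
                  .ToList();
    }
}
```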
You could use LINQ:
foreach (XElement element in doc.Descendants("contact").Where(c => c.Element("confirmado").Value == "SI"))
