Using XmlReader and XPath on a large XML file in C#

So I have this rather large XML file I need to parse, and I don't want to load the whole file into memory. The XML looks something like this:
<root>
<node attrib ="true">
<child childattrib=1>
</child>
</node>
<node attrib ="false">
<child childattrib=1>
</child>
</node>
</root>
What I want to do is go through each element named node and check whether its attribute matches my search criteria, and I want to do it using XPath.
I found "Parse xml in c# : combine xmlreader and linq to xml", which helps me isolate the node in question.
But I can't use XPath on the parent node. I guess I'll have to create an XmlDocument and load it from the reader, but I can't get it to work the way I want.

Attribute values need double quotes around them (childattrib=1 should be childattrib="1"). Try the following, which combines XmlReader and LINQ to XML. When reading large XML files, use XmlReader so that the whole document is never held in memory.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication74
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                while (!reader.EOF)
                {
                    // Skip ahead unless the reader is already positioned on a <node> element.
                    if (reader.Name != "node")
                    {
                        reader.ReadToFollowing("node");
                    }
                    if (!reader.EOF)
                    {
                        // Loads only this <node> subtree and advances the reader past it.
                        XElement node = (XElement)XElement.ReadFrom(reader);
                        if ((Boolean)node.Attribute("attrib"))
                        {
                            // attrib="true": process the matching node here
                        }
                    }
                }
            }
        }
    }
}
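Since the question asks for XPath specifically, here is a minimal sketch of one way to combine the two: stream with XmlReader, materialize one node element at a time, and run an XPath expression against just that element using the extension methods in System.Xml.XPath. The inline sample XML (with quoted attribute values), the class name and the XPath expression are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement extension method

class StreamingXPathSketch
{
    static void Main()
    {
        // Inline sample standing in for the large file (assumption for the demo).
        const string xml = @"<root>
  <node attrib=""true""><child childattrib=""1"" /></node>
  <node attrib=""false""><child childattrib=""1"" /></node>
</root>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "node")
                {
                    // Materializes only this <node> subtree and advances the reader past it,
                    // so no extra Read() call is needed in this branch.
                    XElement node = (XElement)XNode.ReadFrom(reader);

                    // The XPath is evaluated with the detached <node> element as context.
                    XElement child = node.XPathSelectElement("self::node[@attrib='true']/child");
                    if (child != null)
                    {
                        Console.WriteLine(child);
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For a real file, XmlReader.Create(FILENAME) would replace the StringReader. Only one node subtree is in memory at a time, so the XPath expression can reach into that subtree but not up to root or across to sibling nodes.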

Related

Read an XML document and removing nodes

I have an XML file similar to the one below. I read the XML and find the EmpId node; based on other data, I determine whether to keep or delete that node. Currently I have a process that reads the file and creates a list of Record Ids that need to be removed. In the XML below I want to keep EmpId = Emp1 and remove EmpId = Emp2. The whole node is removed.
I believe the best approach is to read the XML first to determine which nodes to keep, and then go through the XML again and remove the necessary nodes.
What's the best approach to remove these nodes?
I'm open to creating a new XML document and adding only the nodes that need to be kept. Based on the data I'm reading, it's 50/50 whether I'll be removing more nodes or keeping more nodes.
...
<HeaderNode>
<Details>
<SubmissionId>1</SubmissionId>
<EmpDetail>
<RecordId>1</RecordId>
<EmpDempgraphic>
<EmpId>Emp1</EmpId>
</EmpDempgraphic>
</EmpDetail>
<EmpDetail>
<RecordId>2</RecordId>
<EmpDempgraphic>
<EmpId>Emp2</EmpId>
</EmpDempgraphic>
</EmpDetail>
</Details>
</HeaderNode>
...
Using LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace XML
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            // Find the EmpDetail whose RecordId is 2; FirstOrDefault returns null if there is none.
            XElement EmpDetail = doc.Descendants("EmpDetail").Where(x => (int)x.Element("RecordId") == 2).FirstOrDefault();
            if (EmpDetail != null)
            {
                EmpDetail.Remove();
            }
            // Call doc.Save(...) afterwards to write the modified document.
        }
    }
}
Unless the document is huge, just load it into an XDocument, Remove the elements you don't want, and Save the document back out.
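As a rough sketch of that load, remove and save approach when you already have a list of Record Ids: the HashSet named idsToRemove, the class name and the inline sample are assumptions made for illustration. The Remove() extension method on a sequence of nodes takes a snapshot of the sequence before removing anything, so it is safe to call directly on the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class RemoveByRecordIdSketch
{
    static void Main()
    {
        // Inline sample mirroring the structure in the question.
        XDocument doc = XDocument.Parse(@"<HeaderNode>
  <Details>
    <SubmissionId>1</SubmissionId>
    <EmpDetail><RecordId>1</RecordId><EmpDempgraphic><EmpId>Emp1</EmpId></EmpDempgraphic></EmpDetail>
    <EmpDetail><RecordId>2</RecordId><EmpDempgraphic><EmpId>Emp2</EmpId></EmpDempgraphic></EmpDetail>
  </Details>
</HeaderNode>");

        // Hypothetical list of Record Ids produced by the first pass.
        var idsToRemove = new HashSet<int> { 2 };

        doc.Descendants("EmpDetail")
           .Where(e =>
           {
               // (int?) yields null instead of throwing when RecordId is missing.
               int? id = (int?)e.Element("RecordId");
               return id.HasValue && idsToRemove.Contains(id.Value);
           })
           .Remove(); // removes every matching EmpDetail from its parent

        Console.WriteLine(doc);
        // For a file on disk, doc.Save(path) would write the result back out.
    }
}
```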

C# - How to move a node and its sub-nodes to another place within an XML document?

I have an XML file structured like this:
<root>
<list>
<item1>item 1</item1>
<item2>item 2</item2>
<item3>item 3</item3>
<item4>item 4</item4>
</list>
<generated-items>
<item5>item 5</item5>
<item6>item 6</item6>
</generated-items>
</root>
What I want to do is move the generated-items node and its sub-nodes so that it becomes a sub-node of the list node.
So the final result should be like this:
<root>
<list>
<item1>item 1</item1>
<item2>item 2</item2>
<item3>item 3</item3>
<item4>item 4</item4>
<generated-items>
<item5>item 5</item5>
<item6>item 6</item6>
</generated-items>
</list>
</root>
Hope someone can help me find the best solution for this.
Use LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            XElement root = doc.Root;
            XElement list = root.Element("list");
            XElement items = root.Element("generated-items");
            // Add a copy of generated-items under list, then remove the original.
            list.Add(new XElement(items));
            items.Remove();
        }
    }
}
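A possible variation, sketched under the assumption that you would rather move the existing element than copy it: detach the element first and then add it. LINQ to XML only clones a node on Add when that node already has a parent, so a detached element is attached as-is. The trimmed sample XML and the class name are made up for the demo.

```csharp
using System;
using System.Xml.Linq;

class MoveElementSketch
{
    static void Main()
    {
        // Shortened version of the XML in the question.
        XDocument doc = XDocument.Parse(@"<root>
  <list><item1>item 1</item1></list>
  <generated-items><item5>item 5</item5></generated-items>
</root>");

        XElement list = doc.Root.Element("list");
        XElement items = doc.Root.Element("generated-items");

        items.Remove();  // detach; the variable still references the element
        list.Add(items); // a parentless element is attached directly, not cloned

        Console.WriteLine(doc);
        // For a file on disk, doc.Save(path) would persist the change.
    }
}
```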

C# - How to get a child element's value where another child element equals a value

I'm trying to get a child element's value where another child element's value equals a given value.
For example, I have this XML file:
<CATALOG>
<game>
<name>Assassins Creed Origins</name>
<picture>pic1</picture>
<torrent>file1</torrent>
</game>
<game>
<name>mylifeisdone</name>
<picture>pic2</picture>
<torrent>file2</torrent>
</game>
</CATALOG>
I want to get the picture value where name equals mylifeisdone.
Using LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            List<XElement> games = doc.Descendants("game").ToList();
            // The (string) casts return null instead of throwing when an element is missing.
            string picture = games.Where(x => (string)x.Element("name") == "mylifeisdone").Select(x => (string)x.Element("picture")).FirstOrDefault();
        }
    }
}
The easiest way I could think of is using XDocument:
XDocument doc = XDocument.Parse(@"
<CATALOG>
  <game>
    <name>Assassins Creed Origins</name>
    <picture>pic1</picture>
    <torrent>file1</torrent>
  </game>
  <game>
    <name>mylifeisdone</name>
    <picture>pic2</picture>
    <torrent>file2</torrent>
  </game>
</CATALOG>");
var picture = doc.Descendants("game")
    .First(g => g.Element("name").Value == "mylifeisdone")
    .Element("picture").Value;
This first gets all elements "game" and searches for the first element, of which the name element has the value "mylifeisdone"; after that, it retrieves the value of the "picture" element.
Note: you may need the namespace System.Xml.Linq and, if you are reading the XML from a file, use XDocument.Load("path").
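For comparison, the same lookup can probably also be written as a single XPath expression with the XPathSelectElement extension method from System.Xml.XPath. This is only a sketch; the class name and the "(not found)" fallback text are made up for the example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement extension method

class PictureLookupSketch
{
    static void Main()
    {
        XDocument doc = XDocument.Parse(@"<CATALOG>
  <game><name>Assassins Creed Origins</name><picture>pic1</picture><torrent>file1</torrent></game>
  <game><name>mylifeisdone</name><picture>pic2</picture><torrent>file2</torrent></game>
</CATALOG>");

        // Select the picture element of the game whose name child equals the search value.
        XElement pictureElement =
            doc.XPathSelectElement("/CATALOG/game[name='mylifeisdone']/picture");

        // Casting a null XElement to string gives null rather than throwing.
        string picture = (string)pictureElement;
        Console.WriteLine(picture ?? "(not found)");
    }
}
```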

Iteration on XML tags in C#

How would I code an iteration to loop through the parent tags in an XML file like the one below?
<collection>
<parent>
<child1>DHL</child1>
<child2>9000000131</child2>
<child3>ISS Gjøvik</child13>
<child4>ISS Gjøvik</child4>
<child5>ISS Gjøvik</child5>
<child6>9999000000136</child6>
</parent>
<parent>
<child1>DHL</child1>
<child2>9000000132</child2>
<child3>ISS Gjøvik</child13>
<child4>ISS Gjøvik</child4>
<child5>ISS Gjøvik</child5>
<child6>9999000000136</child6>
</parent>
<parent>
<child1>DHL</child1>
<child2>9000000134</child2>
<child3>ISS Gjøvik</child13>
<child4>ISS Gjøvik</child4>
<child5>ISS Gjøvik</child5>
<child6>9999000000136</child6>
</parent>
</collection>
I need to insert the value of child1 as the primary key into the DB.
Have you tried the XmlReader? What do you have so far? Please show us some code. Just a reminder, StackOverflow is a helpdesk, not a programming service.
I see DHL in one of the tags. If that refers to the postal delivery company, they have an API (SDK) that is easy to use from within .NET code..
If you want to use XML (de)serialization, then I would suggest that you start by reading the System.Xml.Serialization namespace documentation. Microsoft has provided more than enough documentation and examples.
Link to namespace docs: https://msdn.microsoft.com/en-us/library/system.xml.serialization(v=vs.110).aspx
Here are some examples that contain everything you would need to deserialize the XML document into a POCO class:
https://msdn.microsoft.com/en-us/library/58a18dwa(v=vs.110).aspx
Assuming your xml is in the string variable xml:
var xdoc = XDocument.Parse(xml);
foreach (var parentEls in xdoc.Root.Elements("parent"))
{
string child1Value = parentEls.Element("child1").Value;
// your logic using child1 value
}
Note that your xml is malformed - <child3> is closed by </child13>.
Using LINQ to XML to parse everything:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            // Project each <parent> into an anonymous object holding its six child values.
            var results = doc.Descendants("parent").Select(x => new {
                child1 = (string)x.Element("child1"),
                child2 = (string)x.Element("child2"),
                child3 = (string)x.Element("child3"),
                child4 = (string)x.Element("child4"),
                child5 = (string)x.Element("child5"),
                child6 = (string)x.Element("child6")
            }).ToList();
        }
    }
}

Query an XDocument for elements by name at any depth

I have an XDocument object. I want to query for elements with a particular name at any depth using LINQ.
When I use Descendants("element_name"), I only get elements that are direct children of the current level. I'm looking for the equivalent of "//element_name" in XPath...should I just use XPath, or is there a way to do it using LINQ methods?
Descendants should work absolutely fine. Here's an example:
using System;
using System.Xml.Linq;
class Test
{
    static void Main()
    {
        string xml = @"
<root>
  <child id='1'/>
  <child id='2'>
    <grandchild id='3' />
    <grandchild id='4' />
  </child>
</root>";
        XDocument doc = XDocument.Parse(xml);
        foreach (XElement element in doc.Descendants("grandchild"))
        {
            Console.WriteLine(element);
        }
    }
}
Results:
<grandchild id="3" />
<grandchild id="4" />
An example indicating the namespace:
String TheDocumentContent =
@"
<TheNamespace:root xmlns:TheNamespace = 'http://www.w3.org/2001/XMLSchema' >
  <TheNamespace:GrandParent>
    <TheNamespace:Parent>
      <TheNamespace:Child theName = 'Fred' />
      <TheNamespace:Child theName = 'Gabi' />
      <TheNamespace:Child theName = 'George'/>
      <TheNamespace:Child theName = 'Grace' />
      <TheNamespace:Child theName = 'Sam' />
    </TheNamespace:Parent>
  </TheNamespace:GrandParent>
</TheNamespace:root>
";
XDocument TheDocument = XDocument.Parse( TheDocumentContent );
//Example 1:
var TheElements1 =
    from
        AnyElement
    in
        TheDocument.Descendants( "{http://www.w3.org/2001/XMLSchema}Child" )
    select
        AnyElement;
ResultsTxt.AppendText( TheElements1.Count().ToString() );
//Example 2:
var TheElements2 =
    from
        AnyElement
    in
        TheDocument.Descendants( "{http://www.w3.org/2001/XMLSchema}Child" )
    where
        AnyElement.Attribute( "theName" ).Value.StartsWith( "G" )
    select
        AnyElement;
foreach ( XElement CurrentElement in TheElements2 )
{
    ResultsTxt.AppendText( "\r\n" + CurrentElement.Attribute( "theName" ).Value );
}
You can do it this way:
xml.Descendants().Where(p => p.Name.LocalName == "Name of the node to find")
where xml is a XDocument.
Be aware that the property Name returns an object that has a LocalName and a Namespace. That's why you have to use Name.LocalName if you want to compare by name.
Descendants will do exactly what you need, but be sure that you have included a namespace name together with element's name. If you omit it, you will probably get an empty list.
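To illustrate that last point, here is a small sketch with a made-up namespace URI (urn:example), made-up element names and a made-up class name. An XNamespace combined with the local name matches the qualified elements, while the bare local name matches nothing in this document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class NamespaceQuerySketch
{
    static void Main()
    {
        // "urn:example" is a hypothetical namespace used only for this demo.
        XDocument doc = XDocument.Parse(@"<n:root xmlns:n='urn:example'>
  <n:Parent>
    <n:Child theName='Fred' />
    <n:Child theName='Gabi' />
  </n:Parent>
</n:root>");

        XNamespace ns = "urn:example";

        // Namespace + local name: finds both Child elements.
        int qualified = doc.Descendants(ns + "Child").Count();

        // Local name only: looks for Child in no namespace, so nothing matches here.
        int unqualified = doc.Descendants("Child").Count();

        Console.WriteLine("qualified: " + qualified);
        Console.WriteLine("unqualified: " + unqualified);
    }
}
```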
There are two ways to accomplish this:
LINQ to XML
XPath
The following are samples of these approaches. With LINQ to XML:
List<XElement> result = doc.Root.Element("emails").Elements("emailAddress").ToList();
If you use XPath, you need to do some manipulation with the IEnumerable:
IEnumerable<XElement> mails = ((IEnumerable)doc.XPathEvaluate("/emails/emailAddress")).Cast<XElement>();
Note that
var res = doc.XPathEvaluate("/emails/emailAddress");
on its own returns a plain object, so without the cast you end up with either a null reference or no usable results.
I am using the XPathSelectElements extension method, which works the same way as the XmlDocument.SelectNodes method:
using System;
using System.Xml.Linq;
using System.Xml.XPath; // for XPathSelectElements
namespace testconsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            XDocument xdoc = XDocument.Parse(
                @"<root>
                    <child>
                        <name>john</name>
                    </child>
                    <child>
                        <name>fred</name>
                    </child>
                    <child>
                        <name>mark</name>
                    </child>
                </root>");
            foreach (var childElem in xdoc.XPathSelectElements("//child"))
            {
                string childName = childElem.Element("name").Value;
                Console.WriteLine(childName);
            }
        }
    }
}
Following @Francisco Goldenstein's answer, I wrote an extension method:
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
namespace Mediatel.Framework
{
    public static class XDocumentHelper
    {
        // Returns all descendant elements whose local name matches, ignoring namespaces.
        public static IEnumerable<XElement> DescendantElements(this XDocument xDocument, string nodeName)
        {
            return xDocument.Descendants().Where(p => p.Name.LocalName == nodeName);
        }
    }
}
This is my variant of the solution, based on LINQ and the Descendants method of the XDocument class:
using System;
using System.Linq;
using System.Xml.Linq;
class Test
{
    static void Main()
    {
        XDocument xml = XDocument.Parse(@"
<root>
  <child id='1'/>
  <child id='2'>
    <subChild id='3'>
      <extChild id='5' />
      <extChild id='6' />
    </subChild>
    <subChild id='4'>
      <extChild id='7' />
    </subChild>
  </child>
</root>");
        xml.Descendants().Where(p => p.Name.LocalName == "extChild")
            .ToList()
            .ForEach(e => Console.WriteLine(e));
        Console.ReadLine();
    }
}
Results:
<extChild id="5" />
<extChild id="6" />
<extChild id="7" />
For more details on the Descendants method, see its documentation.
We know the above is true (Jon is never wrong), but real-life needs can go a little further.
<ota:OTA_AirAvailRQ
xmlns:ota="http://www.opentravel.org/OTA/2003/05" EchoToken="740" Target=" Test" TimeStamp="2012-07-19T14:42:55.198Z" Version="1.1">
<ota:OriginDestinationInformation>
<ota:DepartureDateTime>2012-07-20T00:00:00Z</ota:DepartureDateTime>
</ota:OriginDestinationInformation>
</ota:OTA_AirAvailRQ>
For example, a common problem is how to get the EchoToken attribute in the XML document above, or how to find an element by one of its attributes.
You can find it by matching on the element's local name, which ignores the namespace, like below:
doc.Descendants().Where(p => p.Name.LocalName == "OTA_AirAvailRQ").Attributes("EchoToken").FirstOrDefault().Value
You can also find an element by the value of one of its attributes.
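The snippet for the attribute-value case did not survive in this answer, so the following is only a guess at what such a lookup might look like, not the original code. It also shows the namespace-qualified way to reach OTA_AirAvailRQ. The class and variable names are assumptions, and the sample XML is a trimmed copy of the document above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class EchoTokenSketch
{
    static void Main()
    {
        XDocument doc = XDocument.Parse(
@"<ota:OTA_AirAvailRQ xmlns:ota=""http://www.opentravel.org/OTA/2003/05"" EchoToken=""740"" Version=""1.1"">
  <ota:OriginDestinationInformation>
    <ota:DepartureDateTime>2012-07-20T00:00:00Z</ota:DepartureDateTime>
  </ota:OriginDestinationInformation>
</ota:OTA_AirAvailRQ>");

        XNamespace ota = "http://www.opentravel.org/OTA/2003/05";

        // Namespace-qualified lookup; unprefixed attributes such as EchoToken are in no namespace.
        string echoToken = (string)doc.Descendants(ota + "OTA_AirAvailRQ")
                                      .Attributes("EchoToken")
                                      .FirstOrDefault();

        // Lookup by attribute value: first element (root included) whose EchoToken equals "740".
        XElement byValue = doc.Root.DescendantsAndSelf()
                              .FirstOrDefault(e => (string)e.Attribute("EchoToken") == "740");

        Console.WriteLine(echoToken);
        Console.WriteLine(byValue?.Name.LocalName);
    }
}
```

Casting the attribute to string instead of reading .Value avoids a NullReferenceException when the element or attribute is missing.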
(The code and instructions are for C# and may need to be slightly altered for other languages.)
This example works well if you want to read from a parent node that has many children. For example, look at the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<emails>
<emailAddress>jdoe@set.ca</emailAddress>
<emailAddress>jsmith@hit.ca</emailAddress>
<emailAddress>rgreen@set_ig.ca</emailAddress>
</emails>
With the code below (keeping in mind that the XML file is stored in resources; see the links at the end of this answer for help on resources), you can obtain each email address within the "emails" tag.
XDocument doc = XDocument.Parse(Properties.Resources.EmailAddresses);
var emailAddresses = (from emails in doc.Descendants("emailAddress")
select emails.Value);
foreach (var email in emailAddresses)
{
//Comment out if using WPF or Windows Form project
Console.WriteLine(email.ToString());
//Remove comment if using WPF or Windows Form project
//MessageBox.Show(email.ToString());
}
Results
jdoe@set.ca
jsmith@hit.ca
rgreen@set_ig.ca
Note: For console applications and for WPF or Windows Forms projects, you must add the "using System.Xml.Linq;" directive at the top of your file. For a console application you will also need to add a reference to this assembly before adding the using directive. A console application also has no resource file by default under the "Properties" folder, so you have to add the resource file manually. The MSDN articles below explain this in detail.
Adding and Editing Resources
How to: Add or Remove Resources
