Parsing this XML - C#

I tried a lot of code samples, but nothing worked.
I have this XML:
<books>
<book>
<title>first title</title>
<publisher>first publisher</publisher>
<description>first description</description>
<published>1410</published>
</book>
<book>
<title>second book</title>
<publisher>second publisher</publisher>
<description>second description</description>
<published>1914</published>
</book>
[another book]
[another book2]
</books>
And I want output like this:
first title | first publisher | first description | 1410
second book | second publisher | second description | 1914
[another books]
"My" Code:
var xdoc = XDocument.Load(@"5.xml");
var entries = from e in xdoc.Descendants("book")
select new
{
Title = (string)e.Element("title"),
Description = (string)e.Element("description")
};
// I don't know what this does. I found it, but I don't know what to do next.
I can parse the first book, but I can't parse multiple books. Sorry for my English.

If you want to use an XDocument you may try the following:
using System;
using System.Xml.Linq;
class Program
{
static void Main()
{
var doc = XDocument.Load("5.xml");
var books = doc.Descendants("book");
foreach (var book in books)
{
string title = book.Element("title").Value;
string publisher = book.Element("publisher").Value;
string description = book.Element("description").Value;
string published = book.Element("published").Value;
Console.WriteLine("{0}\t{1}\t{2}\t{3}", title, publisher, description, published);
}
}
}
If, on the other hand, the XML you are trying to parse is very big and cannot fit into memory, it is better to use an XmlReader, which allows you to process it record by record:
using System;
using System.Xml;
class Program
{
static void Main()
{
using (var reader = XmlReader.Create("5.xml"))
{
string title = null, publisher = null, description = null, published = null;
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "book")
{
Console.WriteLine("{0}\t{1}\t{2}\t{3}", title, publisher, description, published);
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
{
title = reader.ReadInnerXml();
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "publisher")
{
publisher = reader.ReadInnerXml();
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "description")
{
description = reader.ReadInnerXml();
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "published")
{
published = reader.ReadInnerXml();
}
}
}
}
}
With this approach you can deal with arbitrarily large XML files.
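One caveat with the loop above: ReadInnerXml() returns the raw markup and leaves the reader on the node after the end tag, so the next Read() can step over a start tag (or the closing </book>) when there is no whitespace between elements. Here is a minimal sketch of a variant that avoids this, assuming every field contains plain text only. The class name BookStream and the helper Get are made up for illustration; ReadSubtree and ReadElementContentAsString are standard XmlReader methods.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class BookStream
{
    // Yields one "title | publisher | description | published" line per <book>.
    public static IEnumerable<string> ReadBooks(TextReader input)
    {
        using (var reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing("book"))
            {
                var fields = new Dictionary<string, string>();
                // ReadSubtree confines the inner loop to the current <book>.
                using (var book = reader.ReadSubtree())
                {
                    book.Read(); // positions the subtree reader on <book>
                    book.Read(); // moves to the first node inside it
                    while (!book.EOF)
                    {
                        if (book.NodeType == XmlNodeType.Element)
                        {
                            string name = book.Name;
                            // Reads the text and already advances past the end tag,
                            // so there is no extra Read() in this branch.
                            fields[name] = book.ReadElementContentAsString();
                        }
                        else
                        {
                            book.Read(); // skip whitespace and the closing </book>
                        }
                    }
                }
                yield return string.Join(" | ",
                    Get(fields, "title"), Get(fields, "publisher"),
                    Get(fields, "description"), Get(fields, "published"));
            }
        }
    }

    // Hypothetical helper: a missing field becomes an empty string.
    private static string Get(Dictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : "";
    }

    static void Main()
    {
        // Compact XML (no whitespace between tags) is the case that trips up
        // a loop that calls Read() after every ReadInnerXml().
        const string xml =
            "<books><book><title>first title</title><publisher>first publisher</publisher>" +
            "<description>first description</description><published>1410</published></book></books>";
        foreach (string line in ReadBooks(new StringReader(xml)))
            Console.WriteLine(line);
    }
}
```

For a file on disk you would pass a StreamReader (or adapt the method to call XmlReader.Create with a path) instead of the StringReader used in Main.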

You can use this code to parse your XML:
XDocument xDoc = XDocument.Load("5.xml");
var books = (from b in xDoc.Descendants("book")
select new
{
title = (string) b.Element("title"),
publisher = (string) b.Element("publisher"),
description = (string) b.Element("description"),
published = (string) b.Element("published")
}).ToList();
foreach (var book in books)
{
Console.WriteLine("{0} | {1} | {2} | {3}", book.title, book.publisher, book.description, book.published);
}

Related

How do I get the number of child nodes of an XML node with C# XmlReader?

This is my XML structure:
<classes>
<Base Name="node1">
<Book Name="child01" CoverArtName="C102.jpg" CoverBaseFolder="" Tooltip="" PluginBook=""/>
<Book Name="child02" CoverArtName="C102.jpg" CoverBaseFolder="" Tooltip="" PluginBook=""/>
<Book Name="child03" CoverArtName="C102.jpg" CoverBaseFolder="" Tooltip="" PluginBook=""/>
</Base >
<Base Name="node2">
<Book Name="child01" CoverArtName="C102.jpg" CoverBaseFolder="" Tooltip="" PluginBook=""/>
<Book Name="child02" CoverArtName="C102.jpg" CoverBaseFolder="" Tooltip="" PluginBook=""/>
</Base >
<Base Name="node3">
</Base >
</classes>
How can I get the number of children of each node with XmlReader?
Update:
I read my XML with this code:
List<Bases> base7 = new List<Bases>();
XmlReader xmlReader = XmlReader.Create("Books.xml");
while (xmlReader.Read())
{
if ((xmlReader.NodeType == XmlNodeType.Element) && (xmlReader.Name == "Base"))
{
if (xmlReader.HasAttributes)
Console.WriteLine(xmlReader.GetAttribute("Name") + ": " + xmlReader.GetAttribute("CoverBaseFolder"));
//Base Name
base7.Add(new Bases() { BaseName = xmlReader.GetAttribute("Name"), Basefolder = xmlReader.GetAttribute("CoverBaseFolder") });
}
}
mainbox.ItemsSource = base7;
The desired output is a list of items, each with the name of a node and the number of child elements of that node.
This can be done easily with LINQ to XML:
var list = XElement.Load("test.xml")
.Elements("Base")
.Select(e => new
{
Name = e.Attribute("Name").Value,
Count = e.Elements().Count()
})
.ToList();
But if you want to use the XmlReader, for example to work with XML that does not fit in memory, the code is much more cumbersome:
var bases = new List<Base>();
using (var xmlReader = XmlReader.Create("test.xml"))
{
while (xmlReader.Read())
{
if ((xmlReader.NodeType == XmlNodeType.Element) && (xmlReader.Name == "Base"))
{
var name = xmlReader.GetAttribute("Name");
int count = 0;
using (var innerReader = xmlReader.ReadSubtree())
{
while (innerReader.Read())
{
if (innerReader.NodeType == XmlNodeType.Element && innerReader.Name == "Book")
count++;
}
}
bases.Add(new Base { Name = name, Count = count });
}
}
}
class Base
{
public string Name { get; set; }
public int Count { get; set; }
}
To count the child nodes, it is convenient to use the ReadSubtree method.
The XmlReader class has many useful methods. Using the ReadToFollowing method shortens the code slightly:
var bases = new List<Base>();
using (var xmlReader = XmlReader.Create("test.xml"))
{
while (xmlReader.ReadToFollowing("Base"))
{
string name = xmlReader.GetAttribute("Name");
int count = 0;
using (var innerReader = xmlReader.ReadSubtree())
{
while (innerReader.ReadToFollowing("Book"))
count++;
}
bases.Add(new Base { Name = name, Count = count });
}
}

Efficient Parsing of XML

Good day,
I'm writing a program in C# .NET to manage the products of my store.
By following a given link, I can retrieve an XML file that contains all the products that I can list on my storefront.
The XML structure looks like this:
<Product StockCode="103-10440">
<lastUpdated><![CDATA[Fri, 20 May 2016 17:00:03 GMT]]></lastUpdated>
<StockCode><![CDATA[103-10440]]></StockCode>
<Brand><![CDATA[3COM]]></Brand>
<BrandID><![CDATA[14]]></BrandID>
<ProdName><![CDATA[BIG FLOW BLOWING JUNCTION FLEX BLOCK, TAKES 32, 40]]> </ProdName>
<ProdDesc/>
<Categories>
<TopCat><![CDATA[Accessories]]></TopCat>
<TopCatID><![CDATA[24]]></TopCatID>
</Categories>
<ProdImg/>
<ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>
<ProdQty><![CDATA[0]]></ProdQty>
<ProdExternalURL><![CDATA[http://pinnacle.eliance.co.za/#!/product/4862]]></ProdExternalURL>
</Product>
Here are the entries I'm looking for:
lastUpdated
StockCode
Brand
ProdName
ProdDesc
TopCat <--- nested in Categories tag.
ProdImg
ProdPriceExclVAT
ProdQty
ProdExternalURL
This is all fine to handle, and in fact I did:
public ProductList Parse() {
XmlDocument doc = new XmlDocument();
doc.Load(XMLLink);
XmlNodeList ProductNodeList = doc.GetElementsByTagName("Product");
foreach (XmlNode node in ProductNodeList) {
Product Product = new Product();
for (int i = 0; i < node.ChildNodes.Count; i++) {
if (node.ChildNodes[i].Name == "StockCode") {
Product.VariantSKU = Convert.ToString(node.ChildNodes[i].InnerText);
}
if (node.ChildNodes[i].Name == "Brand") {
Product.Vendor = Convert.ToString(node.ChildNodes[i].InnerText);
}
if (node.ChildNodes[i].Name == "ProdName") {
Product.Title = Convert.ToString(node.ChildNodes[i].InnerText);
Product.SEOTitle = Product.Title;
Product.Handle = Product.Title;
}
if (node.ChildNodes[i].Name == "ProdDesc") {
Product.Body = Convert.ToString(node.ChildNodes[i].InnerText);
Product.SEODescription = Product.Body;
if (Product.Body == "") {
Product.Body = "ERROR";
Product.SEODescription = "ERROR";
}
}
if (node.ChildNodes[i].Name == "Categories") {
if (!tempList.Categories.Contains(node.ChildNodes[i].ChildNodes[0].InnerText)) {
if (!tempList.Categories.Contains("All")) {
tempList.Categories.Add("All");
}
tempList.Categories.Add(node.ChildNodes[i].ChildNodes[0].InnerText);
}
Product.Type = Convert.ToString(node.ChildNodes[i].ChildNodes[0].InnerText);
}
if (node.ChildNodes[i].Name == "ProdImg") {
Product.ImageSrc = Convert.ToString(node.ChildNodes[i].InnerText);
if (Product.ImageSrc == "") {
Product.ImageSrc = "ERROR";
}
Product.ImageAlt = Product.Title;
}
if (node.ChildNodes[i].Name == "ProdPriceExclVAT") {
float baseprice = float.Parse(node.ChildNodes[i].InnerText);
double Costprice = ((baseprice * 0.14) + (baseprice * 0.15) + baseprice);
Product.VariantPrice = Costprice.ToString("0.##");
}
}
Product.Supplier = "Pinnacle";
if (!tempList.Suppliers.Contains(Product.Supplier)) {
tempList.Suppliers.Add(Product.Supplier);
}
tempList.Products.Add(Product);
}
return tempList;
}
}
The problem, however, is that this approach takes about 10 seconds to finish, and this is only the first of several such files that I have to parse.
I am looking for the most efficient way to parse this XML file and get the data for all the fields mentioned above.
EDIT:
I benchmarked the code both with a pre-downloaded copy of the file and with the file downloaded from the server at runtime:
With local copy: 5 seconds.
Without local copy: 7.30 seconds.
With large XML files you should use an XmlReader, so that the whole document does not have to be loaded into memory at once. The code below reads one Product at a time.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
XmlReader reader = XmlReader.Create("filename");
while(!reader.EOF)
{
if (reader.Name != "Product")
{
reader.ReadToFollowing("Product");
}
if (!reader.EOF)
{
XElement product = (XElement)XElement.ReadFrom(reader);
string lastUpdated = (string)product.Element("lastUpdated");
}
}
}
}
}
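To go from "one Product at a time" to the fields the question lists, here is a minimal sketch that also picks up the nested TopCat. I have not measured it against the original file, so treat it as an illustration of the pattern rather than a guaranteed speed-up. The root element <Products>, the class name ProductStream and the trimmed-down sample data are assumptions, since the question only shows a single <Product>.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ProductStream
{
    // Streams <Product> elements; only the current one is materialized.
    public static IEnumerable<XElement> ReadProducts(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Product")
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node after </Product>, so Read() is not called again here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    static void Main()
    {
        // <Products> is a hypothetical root; the real feed's root is not shown.
        const string xml =
            "<Products>" +
            "<Product StockCode='103-10440'>" +
            "<StockCode><![CDATA[103-10440]]></StockCode>" +
            "<Categories><TopCat><![CDATA[Accessories]]></TopCat></Categories>" +
            "<ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>" +
            "</Product>" +
            "</Products>";

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement p in ReadProducts(reader))
            {
                string stockCode = (string)p.Element("StockCode");
                // TopCat is nested inside Categories; ?. tolerates a missing parent.
                string topCat = (string)p.Element("Categories")?.Element("TopCat");
                // Invariant culture, so "30296.79" parses the same on every machine.
                decimal price = decimal.Parse(
                    (string)p.Element("ProdPriceExclVAT"), CultureInfo.InvariantCulture);
                Console.WriteLine("{0} | {1} | {2}", stockCode, topCat,
                    price.ToString(CultureInfo.InvariantCulture));
            }
        }
    }
}
```

The cast to string reads CDATA content the same way as plain text, so no special handling is needed for the <![CDATA[...]]> sections.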

How to get value from a specific child element in XML using XmlReader?

Here's the XML string.
<?xml version="1.0" encoding="utf-16"?>
<questionresponses>
<question id="dd7e3bce-57ee-497a-afe8-e3d8d25e2671">
<text>Question 1?</text>
<response>abcdefg</response>
<correctresponse>123</correctresponse>
</question>
<question id="efc43b1d-048f-4ba9-9cc0-1cc09a7eeaf2">
<text>Question 2?</text>
<response>12345678</response>
<correctresponse>123</correctresponse>
</question>
</questionresponses>
So how could I get the value of the <response> element for a given question id? Say, if I give the id value "dd7e3bce-57ee-497a-afe8-e3d8d25e2671", I'd like to have the string value abcdefg returned as the result.
var xmlstr = "content from above xml example";
using (var reader = XmlReader.Create(new StringReader(xmlstr)))
{
while(reader.Read())
{
if(reader.IsStartElement())
{
var attr = reader["id"];
if(attr != null && attr == "dd7e3bce-57ee-497a-afe8-e3d8d25e2671")
{
if(reader.ReadToDescendant("response"))
{
result = reader.Value; // <= getting empty string? So what's wrong?
break;
}
}
}
}
}
You might need to do it like this. I think the problem is that the reader stays on the <response> element and does not move to its text node, which is why you are getting an empty string:
if(reader.ReadToDescendant("response"))
{
reader.Read(); // this moves the reader to the next node, which is the text
result = reader.Value; // this should now return the value
break;
}
The above works for me; you can try it on your end.
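The reason reader.Value is empty in the question is that an element node has no value of its own; the text lives in a separate child node. As an alternative to the extra Read(), here is a minimal sketch that lets ReadElementContentAsString() fetch the text. The class name QuestionLookup and the shortened ids in Main are made up for illustration, and the sketch assumes <response> contains text only.

```csharp
using System;
using System.IO;
using System.Xml;

static class QuestionLookup
{
    // Returns the <response> text of the question with the given id,
    // or null when no such question (or no <response>) exists.
    public static string FindResponse(string xml, string questionId)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("question"))
            {
                if (reader.GetAttribute("id") == questionId &&
                    reader.ReadToDescendant("response"))
                {
                    // Reads the text content of <response> in one call.
                    return reader.ReadElementContentAsString();
                }
            }
        }
        return null;
    }

    static void Main()
    {
        const string xml =
            "<questionresponses>" +
            "<question id='q1'><text>Question 1?</text><response>abcdefg</response></question>" +
            "<question id='q2'><text>Question 2?</text><response>12345678</response></question>" +
            "</questionresponses>";
        Console.WriteLine(FindResponse(xml, "q1"));
        Console.WriteLine(FindResponse(xml, "q2"));
    }
}
```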
I would use LINQ to XML:
XDocument doc = XDocument.Parse(xmlstr);
// <question> elements are children of the root, not of the document itself
string response = doc.Root.Elements("question")
.Where(x => (string)x.Attribute("id") == id)
.Single()
.Element("response")
.Value;
if (reader.NodeType == XmlNodeType.Element)
{
if(reader.Name == "response")
{
reader.Read();
var res = reader.Value;
}
}
// It works for me.
You can use this function to get a response for specific questions from XML stored in QuestionXML.xml.
private string getResponse(string questionID)
{
string response = string.Empty;
using (StreamReader sr = new StreamReader("QuestionXML.xml", true))
{
XmlDocument xmlDoc1 = new XmlDocument();
xmlDoc1.Load(sr);
XmlNodeList itemNodes = xmlDoc1.GetElementsByTagName("question");
if (itemNodes.Count > 0)
{
foreach (XmlElement node in itemNodes)
{
if (node.Attributes["id"].Value.ToString() == questionID.Trim())
{
response = node.SelectSingleNode("response").InnerText;
break;
}
}
}
}
return response;
}

Specific XML Attribute values into Class List

It's been a while since I last tried to program, and I have never worked with XML before. I have an internal website that displays XML:
<Source>
<AllowsDuplicateFileNames>YES</AllowsDuplicateFileNames>
<Description>The main users ....</Description>
<ExportSWF>FALSE</ExportSWF>
<HasDefaultPublishDir>NO</HasDefaultPublishDir>
<Id>28577db1-956c-41f6-b775-a278c39e20a1</Id>
<IsAssociated>YES</IsAssociated>
<LogoURL>http://servername:8080/logos/9V0.png</LogoURL>
<Name>Portal1</Name>
<RequiredParameters>
<RequiredParameter>
<Id>user_name</Id>
<Name>UserID</Name>
<PlaceHolder>username</PlaceHolder>
<ShowAsDescription>true</ShowAsDescription>
</RequiredParameter>
</RequiredParameters>
I don't want the values in the child tags. There are times when there will be more than one portal, hence the need to use a list. I only need the values inside the Name and Id tags. Also, if there is a blank Id tag, I don't want to store either of them.
My current approach to this is not working as expected:
String URLString = "http://servername:8080/roambi/SourceManager";
XmlTextReader reader = new XmlTextReader(URLString);
List<Portal> lPortals = new List<Portal>();
String sPortal = "";
String sId = "";
while (reader.Read())
{
//Get Portal ID
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Id")
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
sId = reader.Value;
}
}
//Get Portal Name
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Name")
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
sPortal = reader.Value;
}
//Fill Portal List with Name and ID
if (sId != "" && sPortal != "")
{
lPortals.Add(new Portal
{
Portalname = sPortal,
Portalid = sId
});
}
}
}
foreach (Portal i in lPortals)
{
Console.WriteLine(i.Portalname + " " + i.Portalid);
}
See my standard class
class Portal
{
private String portalname;
private String portalid;
public String Portalname
{
get { return portalname; }
set { portalname = value; }
}
public String Portalid
{
get { return portalid; }
set { portalid = value; }
}
}
Please give me some advice and point me in the right direction; as I said, it's been a while since I last programmed. My current output is as follows:
Portal1 28577db1-956c-41f6-b775-a278c39e20a1
UserID user_name
UserID is in a child node, and I do not want to display child nodes.
It's much easier with the XDocument class:
String URLString = "http://servername:8080/roambi/SourceManager";
XmlTextReader reader = new XmlTextReader(URLString);
XDocument doc = XDocument.Load(reader);
// assuming there's some root-node whose children are Source nodes
var portals = doc.Root
.Elements("Source")
.Select(source => new Portal
{
Portalname = (string) source.Element("Name"),
Portalid = (string) source.Element("Id")
})
.Where(p => !string.IsNullOrEmpty(p.Portalid)) // also skips a missing <Id>
.ToList();
For each <Source> node in your XML, the code above selects only its direct child nodes (<Name> and <Id>) and builds the corresponding Portal instances, so the nested <RequiredParameter> values are ignored.
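To see that Element() really looks at direct children only, here is a small self-contained sketch that parses an inline string instead of the URL. The root element <Sources>, the class name PortalDemo and the second, blank-Id source are assumptions added for illustration; the question does not show the root of the real document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class PortalDemo
{
    // Returns (Name, Id) pairs for every <Source> that has a non-blank <Id>.
    public static List<(string Name, string Id)> ReadPortals(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("Source")
            // Element() only sees direct children, so the <Id> and <Name>
            // inside <RequiredParameter> are never picked up.
            .Select(s => (Name: (string)s.Element("Name"), Id: (string)s.Element("Id")))
            .Where(p => !string.IsNullOrWhiteSpace(p.Id))
            .ToList();
    }

    static void Main()
    {
        // <Sources> is a hypothetical root element.
        const string xml =
            "<Sources>" +
            "<Source><Id>28577db1-956c-41f6-b775-a278c39e20a1</Id><Name>Portal1</Name>" +
            "<RequiredParameters><RequiredParameter><Id>user_name</Id><Name>UserID</Name>" +
            "</RequiredParameter></RequiredParameters></Source>" +
            "<Source><Id></Id><Name>NoId</Name></Source>" +
            "</Sources>";

        foreach (var portal in ReadPortals(xml))
            Console.WriteLine(portal.Name + " " + portal.Id);
    }
}
```

The tuple return type keeps the sketch short; in the real code you would build Portal instances as in the answer above.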

readElements XML with XmlReader and Linq

My aim is to read this xml file stream:
<?xml version="1.0" encoding="utf-16"?>
<events>
<header>
<seq>0</seq>
</header>
<body>
<orderBookStatus>
<id>100093</id>
<status>Opened</status>
</orderBookStatus>
<orderBook>
<instrumentId>100093</instrumentId>
<bids>
<pricePoint>
<price>1357.1</price>
<quantity>20</quantity>
</pricePoint>
<pricePoint>
<price>1357.0</price>
<quantity>20</quantity>
</pricePoint>
<pricePoint>
<price>1356.9</price>
<quantity>71</quantity>
</pricePoint>
<pricePoint>
<price>1356.8</price>
<quantity>20</quantity>
</pricePoint>
</bids>
<offers>
<pricePoint>
<price>1357.7</price>
<quantity>51</quantity>
</pricePoint>
<pricePoint>
<price>1357.9</price>
<quantity>20</quantity>
</pricePoint>
<pricePoint>
<price>1358.0</price>
<quantity>20</quantity>
</pricePoint>
<pricePoint>
<price>1358.1</price>
<quantity>20</quantity>
</pricePoint>
<pricePoint>
<price>1358.2</price>
<quantity>20</quantity>
</pricePoint>
</offers>
<lastMarketClosePrice>
<price>1356.8</price>
<timestamp>2011-05-03T20:00:00</timestamp>
</lastMarketClosePrice>
<dailyHighestTradedPrice />
<dailyLowestTradedPrice />
<valuationBidPrice>1357.1</valuationBidPrice>
<valuationAskPrice>1357.7</valuationAskPrice>
<lastTradedPrice>1328.1</lastTradedPrice>
<exchangeTimestamp>1304501070802</exchangeTimestamp>
</orderBook>
</body>
</events>
Based on the post here: http://blogs.msdn.com/b/xmlteam/archive/2007/03/24/streaming-with-linq-to-xml-part-2.aspx
I created a function:
public IEnumerable<XElement> readElements(XmlReader r, string matchName)
{
//r.MoveToContent();
while (r.Read())
{
switch (r.NodeType)
{
case XmlNodeType.Element:
{
if (r.Name == matchName)
{
XElement el = XElement.ReadFrom(r) as XElement;
if (el != null)
yield return el;
} break;
}
}
}
}
which I planned to use in the following way:
IEnumerable<XElement> xBids = readElements(xmlReader, "bids");
publishPricePoint(xBids, "bids");
IEnumerable<XElement> xOffers = readElements(xmlReader, "offers");
publishPricePoint(xOffers, "offers");
where the method publishPricePoint looks like this:
public void publishPricePoint(IEnumerable<XElement> ie, string side)
{
PricePoint p = new PricePoint();
var ci = CultureInfo.InvariantCulture.Clone() as CultureInfo;
ci.NumberFormat.NumberDecimalSeparator = ".";
var bids = (from b in ie.Elements() select b).ToList();
foreach (XElement e in bids)
{
p.price = decimal.Parse(e.Element("price").Value, ci);
p.qty = int.Parse(e.Element("quantity").Value, ci);
OnPricePointReceived(this, new MessageEventArgs(p, side));
}
}
The problem is that in this piece of code:
IEnumerable<XElement> xBids = readElements(xmlReader, "bids");
publishPricePoint(xBids, "bids");
IEnumerable<XElement> xOffers = readElements(xmlReader, "offers");
publishPricePoint(xOffers, "offers");
only the first two lines work, i.e. only the bids can be read, not the offers. What is wrong with this? To me, it looks like the XmlReader disappears after the bids have been read.
Thank you for your help.
================== Other solution =================
while (xmlReader.Read())
{
#region reading bids
if (xmlReader.IsStartElement("bids"))
{
readingBids = true;
readingOffers = false;
}
if (xmlReader.NodeType == XmlNodeType.EndElement && xmlReader.Name == "bids")
{
readingBids = false;
readingOffers = false;
}
if (readingBids == true)
{
if (xmlReader.IsStartElement("price"))
price = xmlReader.ReadElementContentAsDecimal();
if (xmlReader.IsStartElement("quantity"))
{
qty = xmlReader.ReadElementContentAsInt();
OnPricePointReceived(this, new MessageEventArgs(price, qty, "bid"));
}
}
#endregion
#region reading offers
if (xmlReader.IsStartElement("offers"))
{
readingBids = false;
readingOffers = true;
}
if (xmlReader.NodeType == XmlNodeType.EndElement && xmlReader.Name == "offers")
{
readingBids = false;
readingOffers = false;
}
if (readingOffers == true)
{
if (xmlReader.IsStartElement("price"))
price = xmlReader.ReadElementContentAsDecimal();
if (xmlReader.IsStartElement("quantity"))
{
qty = xmlReader.ReadElementContentAsInt();
OnPricePointReceived(this, new MessageEventArgs(price, qty, "offer"));
}
}
#endregion
}
I think you will have to close and reopen the XmlReader. The first call keeps reading until the end of the stream while looking for bids, so by the time you ask for offers the reader is simply in the EOF state.
Your solution requires reading everything twice, which is not very efficient.
Unless your XML is very large (e.g. > 100 MB), it would be much faster to read it all into an XDocument and filter the bids and offers out with LINQ.
Edit: OK, so your data is continuously streamed. That means you can't use a single-tag filter, because you'd be skipping the others.
A basic idea:
- Read every element with XElement.ReadFrom().
- Push the elements you want into (separate) queues.
- You'll want asynchronous processing. Use the TPL or the (beta) async/await features.
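The single-pass idea can be sketched roughly as follows, leaving out the asynchronous part: one loop over the reader picks up both <bids> and <offers> and sorts their price points into two queues. The class name OrderBookSplitter and the method name Split are made up for illustration; XNode.ReadFrom is the standard LINQ to XML call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class OrderBookSplitter
{
    // Reads the stream once and routes each <pricePoint> to the matching queue.
    public static void Split(XmlReader reader, Queue<XElement> bids, Queue<XElement> offers)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element &&
                (reader.Name == "bids" || reader.Name == "offers"))
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node after it, so Read() is not called in this branch.
                var side = (XElement)XNode.ReadFrom(reader);
                Queue<XElement> target = side.Name.LocalName == "bids" ? bids : offers;
                foreach (XElement point in side.Elements("pricePoint"))
                    target.Enqueue(point);
            }
            else
            {
                reader.Read();
            }
        }
    }

    static void Main()
    {
        const string xml =
            "<events><body><orderBook>" +
            "<bids><pricePoint><price>1357.1</price><quantity>20</quantity></pricePoint>" +
            "<pricePoint><price>1357.0</price><quantity>20</quantity></pricePoint></bids>" +
            "<offers><pricePoint><price>1357.7</price><quantity>51</quantity></pricePoint></offers>" +
            "</orderBook></body></events>";

        var bids = new Queue<XElement>();
        var offers = new Queue<XElement>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
            Split(reader, bids, offers);

        Console.WriteLine("bids: " + bids.Count + ", offers: " + offers.Count);
    }
}
```

For a live feed you would replace the plain queues with something a consumer can wait on (for example BlockingCollection<XElement>) and publish each price point as it arrives.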
Why don't you do something like this?
XDocument document = XDocument.Load(@"XMLFile1.xml");
// select the individual <pricePoint> elements under <bids> and <offers>
var bidResults = (from br in document.Descendants("bids").Elements("pricePoint")
select br).ToList();
var offerResults = (from or in document.Descendants("offers").Elements("pricePoint")
select or).ToList();
Then you can iterate with foreach (XElement element in bidResults) to get the bid data, and do the same for the offers:
foreach (XElement xElement in offerResults)
{
Offer off = new Offer();
off.price = xElement.Element("price") != null ? xElement.Element("price").Value : "";
off.quantity = xElement.Element("quantity") != null ? xElement.Element("quantity").Value : "";
}
