How can I parse the information from this XML? - c#

This is an example of the XML I want to scrape:
http://www.dreamincode.net/forums/xml.php?showuser=335389
Notice that the contactinformation tag has many contact elements, each similar but with different values.
For example, given the contact element whose title is AIM, how can I get the content of the value tag that belongs to the same contact element?
That's where I'm stuck. Thanks!
Basically: I need to find the AIM title tag, note where it is, and find the value element within that same contact. I hope this makes the question clearer.

Use LINQ to XML:
var doc = XDocument.Load(@"http://www.dreamincode.net/forums/xml.php?showuser=335389");
var aimElements = doc.Descendants("contact").Where(a => a.Element("title").Value == "AIM").Select(a => a.Element("value").Value);
This gives you a sequence of strings holding the text of the value element for every contact whose title is AIM. If you expect only one match, append First() or FirstOrDefault().
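A minimal sketch of the same query wrapped in a helper that tolerates missing elements. The class name, method name, and inline sample XML are hypothetical; the sample only imitates the shape of the profile XML described in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ContactLookup
{
    // Hypothetical sample shaped like the profile XML in the question.
    public const string SampleXml = @"<ipb><profile><contactinformation>
        <contact><title>AIM</title><value>exampleAimName</value></contact>
        <contact><title>MSN</title><value>exampleMsnName</value></contact>
      </contactinformation></profile></ipb>";

    // Returns the <value> text of the first <contact> whose <title> equals
    // the given title, or null when no such contact exists.
    public static string FindContactValue(XDocument doc, string title)
    {
        return doc.Descendants("contact")
                  // Casting an XElement to string yields null for a missing
                  // element instead of throwing like .Value would.
                  .Where(c => (string)c.Element("title") == title)
                  .Select(c => (string)c.Element("value"))
                  .FirstOrDefault();
    }
}
```

Calling ContactLookup.FindContactValue(XDocument.Parse(ContactLookup.SampleXml), "AIM") should return the AIM contact's value; for the live feed, XDocument.Load with the URL would replace XDocument.Parse.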

Using an xpath like the one below will get you the contact/value node where contact/title is "AIM":
/ipb/profile/contactinformation/contact[title='AIM']/value
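One way to evaluate that XPath from C# is the XPathSelectElement extension method in System.Xml.XPath. This is only a sketch: the class name and the inline sample are hypothetical, and the ipb/profile/contactinformation nesting is taken from the XPath above rather than from the live feed.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ContactXPath
{
    // Hypothetical sample that follows the element nesting used in the XPath.
    public const string SampleXml = @"<ipb><profile><contactinformation>
        <contact><title>AIM</title><value>exampleAimName</value></contact>
        <contact><title>MSN</title><value>exampleMsnName</value></contact>
      </contactinformation></profile></ipb>";

    // Returns the text of contact/value where contact/title is "AIM",
    // or null when the document has no such contact.
    public static string FindAimValue(XDocument doc)
    {
        XElement value = doc.XPathSelectElement(
            "/ipb/profile/contactinformation/contact[title='AIM']/value");
        return value?.Value;
    }
}
```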

Have you tried to parse the XML rather than "scraping" it?

Related

Parsing xml tags with colons inside using C#

I'm working in Xamarin on an Android app that parses XML from this website: http://video.cazin.net/rss.php and populates a ListView. In particular, I have a problem getting the value from this tag:
<media:thumbnail url="http://video.cazin.net/uploads/thumbs/2d07f1e49-1.jpg" width="480" height="360"/>
I created namespace:
xmlNameSpaceManager.AddNamespace("ab", "http://search.yahoo.com/mrss/");
and then tried to get the value of the url attribute:
XmlNodeList xmlNode = document.SelectNodes("rss/channel/item");
if (xmlNode[i].SelectSingleNode("//ab:thumbnail[@url='http://video.cazin.net/rss.php']", xmlNameSpaceManager) != null)
{
var thumbnail = xmlNode[i].SelectSingleNode("//ab:thumbnail=[@url='http://video.cazin.net/rss.php']", xmlNameSpaceManager);
feedItem.Thumbnail = thumbnail.Value;
}
I also tried something like this:
//ab:thumbnail/@url
but then I got the value of only the first image. I'm sure the problem is here somewhere, because I have the same code parsing images from another XML tag without a colon in its name and it works correctly. Has anyone had a similar experience and knows what I should put in those brackets? Thanks
Your current query is searching for a thumbnail element where the url attribute is equal to http://video.cazin.net/rss.php - there are none that match this.
Your 'I also tried' query of //ab:thumbnail/@url is closer, but the // means that the query starts from the root of the document, so you get all the URLs (but you only take the first).
If you want the match to respect the current node context, you need to include that context in the query; it is represented by a leading dot (.). So .//ab:thumbnail/@url would find all url attributes of thumbnail elements contained in the current node. You can see the result in this fiddle.
I would strongly suggest you use LINQ to XML instead, however. It's a lot nicer to work with than the old XmlDocument API. For example, you could find all item thumbnail urls using this code:
var doc = XDocument.Load("http://video.cazin.net/rss.php");
XNamespace media = "http://search.yahoo.com/mrss/";
var thumbnailUrls = doc.Descendants("item")
    .Descendants(media + "thumbnail")
    .Attributes("url");
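For anyone staying with XmlDocument, the relative query described above could be sketched as follows. The class name, method name, and inline sample feed are hypothetical; only the namespace URI, the "ab" prefix, and the rss/channel/item path come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ThumbnailReader
{
    // Hypothetical two-item feed using the media namespace from the question.
    public const string SampleRss = @"<rss xmlns:media='http://search.yahoo.com/mrss/'>
  <channel>
    <item><title>First</title><media:thumbnail url='http://example.com/1.jpg' width='480' height='360'/></item>
    <item><title>Second</title><media:thumbnail url='http://example.com/2.jpg' width='480' height='360'/></item>
  </channel>
</rss>";

    // Returns one thumbnail URL per <item> that has a thumbnail.
    public static List<string> ReadThumbnailUrls(string rssXml)
    {
        var document = new XmlDocument();
        document.LoadXml(rssXml);

        // The prefix used in the query ("ab") need not match the one in the
        // document ("media"); only the namespace URI has to match.
        var ns = new XmlNamespaceManager(document.NameTable);
        ns.AddNamespace("ab", "http://search.yahoo.com/mrss/");

        var urls = new List<string>();
        foreach (XmlNode item in document.SelectNodes("rss/channel/item"))
        {
            // The leading "." keeps the search inside the current <item>;
            // without it, "//" would restart from the document root.
            XmlNode url = item.SelectSingleNode(".//ab:thumbnail/@url", ns);
            if (url != null)
            {
                urls.Add(url.Value);
            }
        }
        return urls;
    }
}
```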

count number of "elements" in an XML tag using c#

I'm using C# in reading an XML file and counting how many "elements" there are in an XML tag, like this for example...
<Languages>English, Deutsche, Francais</Languages>
There are 3 "elements" inside the Languages tag: English, Deutsche, and Francais. I need to know how to count them and return how many there are. The contents of the tag may change over time, because the XML file has to accommodate additional languages whenever needed.
If this is not possible, please suggest workarounds for the problem. Thank you.
EDIT: I haven't come up with the code to read the XML file, but I'm also interested in learning how to.
EDIT 2: revisions made to question
string xml = @"<Languages>English, Deutsche, Francais</Languages>";
var doc = XDocument.Parse(xml);
string languages = doc.Elements("Languages").FirstOrDefault().Value;
int count = languages.Split(',').Count();
Your edits indicate that you're not simply trying to pull comma-separated strings out of an XML element. In that case, the way the XML is stored in the first place is incorrect. As another poster commented, it should be:
<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>
Then, to get the count of languages:
string xml = @"<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>";
var doc = XDocument.Parse(xml);
int count = doc.Element("Languages").Elements().Count();
First, an "ideal" solution: do not put more than one piece of information in a single tag. Rather, put each language in its own tag, like this:
<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>
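If this is not possible, retrieve the content of the tag with multiple languages, split it using allLanguages.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries), and obtain the count from the length of the resulting array. Without RemoveEmptyEntries, the space after each comma produces an empty entry and inflates the count.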
Ok, but just to be clear, an XML Element has a very specific meaning. In fact, the entire codeblock you have is an XML Element.
XElement xElm = new XElement("Languages", "English, Deutsche, Francais");
string[] elements = xElm.Value.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
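The split-based workaround can be sketched as a small helper that also trims the space left after each comma. The class and method names are hypothetical, and the sketch assumes language names never contain a comma.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LanguageCounter
{
    // Reads a comma-separated <Languages> element and returns the trimmed,
    // non-empty names. The count is the Length of the returned array.
    public static string[] ReadLanguages(string xml)
    {
        XElement languages = XDocument.Parse(xml).Element("Languages");
        if (languages == null)
        {
            return new string[0];
        }

        return languages.Value
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => name.Trim())
            .Where(name => name.Length > 0)
            .ToArray();
    }
}
```

For the example in the question, LanguageCounter.ReadLanguages(xml).Length gives the number of languages, and the array itself holds the cleaned names.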

How to generate xpath by looking for a string in an HTML document?

I have an HTML document, and I want to find the XPath to an element containing a certain string.
To elaborate a bit more:
My HTML document is created dynamically and I have no specific names for the <div>s. The divs I am interested in look (more or less) like this:
<div>Country: China</div>
<div>Type: Earphones</div>
I want to get the whole string "Country: China". In order to do so, I want to find the xpath to this div by searching for "Country:" in the HTML.
I hope I was specific enough... Thank you!
Here are a couple ways:
//div[contains(child::text(), "Country:")]
//div/child::text()[contains(., "Country:")]/parent::node()
If you want to try things out within a browser, try an in-browser XPath bookmarklet.
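The first expression can be tried from C# with XPathSelectElement, as in the sketch below. It assumes the markup is well-formed XML (real-world HTML often is not and would need an HTML parser); the class name, method name, and sample markup are hypothetical.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DivFinder
{
    // Hypothetical well-formed sample built from the divs in the question.
    public const string SampleHtml =
        "<html><body><div>Country: China</div><div>Type: Earphones</div></body></html>";

    // Returns the full text of the first <div> whose first text node contains
    // the label, or null when nothing matches.
    public static string FindDivText(string xhtml, string label)
    {
        XDocument doc = XDocument.Parse(xhtml);
        // The label is spliced into the expression as a double-quoted XPath
        // string, so this sketch assumes it contains no double quote.
        XElement div = doc.XPathSelectElement(
            "//div[contains(text(), \"" + label + "\")]");
        return div?.Value;
    }
}
```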

XPATH query, HtmlAgilityPack and Extracting Text

I have been trying to extract links with a class called "tim_new", and I have been given a solution as well.
The solution, the snippet and the necessary information are given here
The XPath query was "//a[@class='tim_new']". My question is: how did this query differentiate between the first line of the snippet (given in the link above) and the second line of the snippet?
More specifically, what is the literal translation (in English) of this XPath query?
Furthermore, I want to write a few lines of code to extract the text written against NSE:
<div class="FL gL_12 PL10 PT15">BSE: 523395 | NSE: 3MINDIA | ISIN: INE470A01017</div>
Would appreciate help in forming the necessary selection query.
My code is written as:
IEnumerable<string> NSECODE = doc.DocumentNode.SelectSingleNode("//div[@NSE:]");
But this doesn't look right. Would appreciate some help.
The XPath in the first selection reads "select all a elements in the document that have an attribute named class with a value of tim_new". The part in brackets is not what you're returning; it's the criterion applied to the search.
I don't have the HTML Agility Pack, but if you are trying to query the divs that have "NSE:" in their text, your XPath for the second query should just be "//div", and then you'll want to filter using LINQ.
Something like
var nodes =
doc.DocumentNode.SelectNodes("//div[text()]").Where(a => a.InnerText.IndexOf("NSE:") > -1);
So in English, "Return all the div elements that immediately contain text to LINQ, then check that the inner text value contains NSE:".
Again, I'm not sure the syntax is perfect, but that's the idea.
The XPath "//div[@NSE:]" would return all divs that have an attribute named NSE:, which is invalid anyway because ":" is reserved as the namespace-prefix separator and cannot end a name. You're looking for the text of the element, not one of its attributes.
Hope that helps.
Note: If you have nested divs that both contain text as in <div>NSE: some text<div>NSE: more text</div></div> you're going to get duplicate results.
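Once the div's inner text is in hand, the value after "NSE:" still has to be picked out of the pipe-separated string. A minimal sketch of that step, using only string operations; the class and method names are hypothetical, and it assumes the "label: value | label: value" layout shown in the question.

```csharp
using System;

public static class ExchangeCodes
{
    // Looks for "label: value" in a pipe-separated string and returns the
    // trimmed value, or null when the label is absent.
    public static string ExtractCode(string text, string label)
    {
        foreach (string part in text.Split('|'))
        {
            // Split on the first colon only, so values that themselves
            // contain a colon stay intact.
            string[] pair = part.Split(new[] { ':' }, 2);
            if (pair.Length == 2 && pair[0].Trim() == label)
            {
                return pair[1].Trim();
            }
        }
        return null;
    }
}
```

With the HTML Agility Pack, the text argument would presumably be the InnerText of the matching div from the query above.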

Xpath, retrieving node value

I get this return value from SharePoint. I have included only the first part of the XML snippet:
<Result ID=\"1,New\" xmlns=\"http://schemas.microsoft.com/sharepoint/soap/\">
<ErrorCode>0x00000000</ErrorCode><ID /><z:row ows_ID=\"9\"
It populates an XmlNode object.
How can I get the value of ows_ID using XPath?
My code so far...
XmlNode results = list.UpdateListItems("MySharePointList", batch);
Update
So far I have this : results.FirstChild.ChildNodes[2].Attributes["ows_ID"].Value
But I am not sure how reliable it is, can anyone improve on it?
I don't know if it's necessarily an improvement, but it might be more readable, though more verbose:
/*[local-name() = 'Result']/*[local-name() = 'row']/@ows_ID
There is probably more to the fragment you posted so this XPath query might need a fixup when used against the actual xml result.
The function, local-name(), lets you ignore namespaces, which can be both a boon and a curse. :)
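A sketch of running that local-name() query with XmlDocument. The class name and the inline sample are hypothetical: the sample closes the elements that the truncated snippet leaves open, and the URI bound to the z prefix is a placeholder because the snippet does not show it.

```csharp
using System;
using System.Xml;

public static class RowIdByLocalName
{
    // Hypothetical, completed version of the fragment in the question.
    // "urn:example:rowset" is a placeholder for the real z namespace URI.
    public const string SampleXml =
        @"<Result ID='1,New' xmlns='http://schemas.microsoft.com/sharepoint/soap/' xmlns:z='urn:example:rowset'>
  <ErrorCode>0x00000000</ErrorCode><ID /><z:row ows_ID='9' />
</Result>";

    // Returns the ows_ID attribute value, or null when it is not found.
    public static string ReadRowId(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // local-name() compares element names without their namespace,
        // so no XmlNamespaceManager is needed for this query.
        XmlNode attribute = doc.SelectSingleNode(
            "/*[local-name() = 'Result']/*[local-name() = 'row']/@ows_ID");
        return attribute?.Value;
    }
}
```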
When you start from the root:
/Result/z:row/@ows_ID
You can also narrow the search if there are multiple Result elements:
/Result[@ID='1,New']/z:row/@ows_ID
<xsl:value-of select="Result/b:row/@ows_ID"/>
or
<xsl:value-of select="Result/b:row[@ows_ID = '9']"/>
depending on what value you want.
You probably need to make sure the z namespace prefix is declared correctly - that's implementation dependent. Here's how you do it in Java's XPath implementation.
Then to select the value of the ows_ID attribute, you need to navigate to the element itself, then use @ows_ID to get the value.
The specific xpath calls depend on what library you use (e.g. libxml xpath implementation).
But the generic xpath statement would be:
"//z:row[@ows_ID='9']"
This will select all z:row nodes with an attribute ows_ID of value 9.
You can modify this query to match all z:row nodes or only those with a specific attribute.
For details look here: W3Schools XPath syntax
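In .NET, prefixes used in an XPath query are declared through XmlNamespaceManager. The sketch below shows the idea under stated assumptions: the class name and sample are hypothetical, the "sp" prefix is an arbitrary choice, and "urn:example:rowset" is a placeholder for the real z namespace URI, which the snippet does not show. Note that in XPath 1.0 an unprefixed name such as Result does not match an element in a default namespace, so the sketch gives Result a prefix too.

```csharp
using System;
using System.Xml;

public static class RowIdByNamespace
{
    // Hypothetical, completed version of the fragment in the question.
    public const string SampleXml =
        @"<Result ID='1,New' xmlns='http://schemas.microsoft.com/sharepoint/soap/' xmlns:z='urn:example:rowset'>
  <ErrorCode>0x00000000</ErrorCode><ID /><z:row ows_ID='9' />
</Result>";

    // Returns the ows_ID attribute value, or null when it is not found.
    public static string ReadRowId(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        // Result lives in the document's default namespace, so the query
        // needs a prefix bound to that URI.
        ns.AddNamespace("sp", "http://schemas.microsoft.com/sharepoint/soap/");
        // Placeholder URI; replace it with the one declared for z in the
        // actual response.
        ns.AddNamespace("z", "urn:example:rowset");

        XmlNode attribute = doc.SelectSingleNode("/sp:Result/z:row/@ows_ID", ns);
        return attribute?.Value;
    }
}
```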
