Parsing XML with C# - c#

I have an XML file, which I uploaded here: http://dl.dropbox.com/u/10773282/2011/result.xml. It's machine-generated XML, so you might need an XML viewer/editor to read it.
I use this C# code to get the elements in CoverageDSPriv/Module/*.
using System;
using System.Xml;
using System.Xml.Linq;
namespace HIR {
class Dummy {
static void Main(String[] argv) {
XDocument doc = XDocument.Load("result.xml");
var coveragePriv = doc.Descendants("CoverageDSPriv"); //.First();
var cons = coveragePriv.Elements("Module");
foreach (var con in cons)
{
var id = con.Value;
Console.WriteLine(id);
}
}
}
}
Running the code, I get this result.
hello.exe6144008016161810hello.exehello.exehello.exe81061hello.exehello.exe!17main_main40030170170010180180011190190012200200013hello.exe!107testfunctiontestfunction(int)40131505001460600158080216120120017140140018AA
I expect to get
hello.exe
61440
...
However, I get just one line of long string.
Q1 : What might be wrong?
Q2 : How to get the # of elements in cons? I tried cons.Count, but it doesn't work.
Q3 : If I need to get the nested value of <CoverageDSPriv><Module><ModuleName>, I use this code:
var coveragePriv = doc.Descendants("CoverageDSPriv"); //.First();
var cons = coveragePriv.Elements("Module").Elements("ModuleName");
I can live with this, but if the elements are deeply nested, I might be wanting to have direct way to get the elements. Are there any other ways to do that?
ADDED
var cons = coveragePriv.Elements("Module").Elements();
solves this issue, but for the NamespaceTable, it again prints out all the elements in one line.
hello.exe
61440
0
8
0
1
6
1
61810hello.exehello.exehello.exe81061hello.exehello.exe!17main_main40030170170010180180011190190012200200013hello.exe!107testfunctiontestfunction(int)40131505001460600158080216120120017140140018
Or perhaps LINQ to XML queries would be a better solution, as in this post.

It looks to me like you only have one element named Module -- so .Value is simply returning you the InnerText of that entire element. Were you intending this instead?
coveragePriv.Element("Module").Elements();
This would return all the child elements of the Module element, which seems to be what you're after.
Update:
<NamespaceTable> is a child of <Module> but you appear to want to handle it similarly to <Module> in that you want to write out each child element. Thus, one brute-force approach would be to add another loop for <NamespaceTable>:
foreach (var con in cons)
{
if (con.Name == "NamespaceTable")
{
foreach (var nsElement in con.Elements())
{
var nsId = nsElement.Value;
Console.WriteLine(nsId);
}
}
else
{
var id = con.Value;
Console.WriteLine(id);
}
}
Alternatively, perhaps you'd rather just flatten them altogether via .Descendants():
var cons = coveragePriv.Element("Module").Descendants();
foreach (var con in cons)
{
    var id = con.Value;
    Console.WriteLine(id);
}
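Q2 and Q3 can be sketched in the same style. This is only a minimal sketch: the sample XML below is a hypothetical stand-in for result.xml (only CoverageDSPriv, Module, ModuleName and NamespaceTable come from the question; ImageSize, NamespaceName and BlocksCovered are made-up names), and CoverageSketch is just an illustrative class name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CoverageSketch
{
    // Hypothetical stand-in for result.xml; the real file has more elements.
    public const string SampleXml =
        @"<CoverageDSPriv>
            <Module>
              <ModuleName>hello.exe</ModuleName>
              <ImageSize>61440</ImageSize>
              <NamespaceTable>
                <NamespaceName>hello.exe</NamespaceName>
                <BlocksCovered>8</BlocksCovered>
              </NamespaceTable>
            </Module>
          </CoverageDSPriv>";

    // Q1: keep only leaf elements, so Value never concatenates nested text.
    public static List<string> LeafValues(XDocument doc)
    {
        return doc.Descendants("CoverageDSPriv")
                  .Elements("Module")
                  .First()
                  .Descendants()
                  .Where(e => !e.HasElements)
                  .Select(e => e.Value)
                  .ToList();
    }

    // Q2: Count() is a LINQ extension method, so it needs "using System.Linq"
    // and parentheses; IEnumerable<XElement> has no Count property.
    public static int ModuleChildCount(XDocument doc)
    {
        return doc.Descendants("CoverageDSPriv")
                  .Elements("Module")
                  .Elements()
                  .Count();
    }

    // Q3: Descendants(name) reaches a nested element at any depth in one call.
    public static List<string> ModuleNames(XDocument doc)
    {
        return doc.Descendants("ModuleName").Select(e => e.Value).ToList();
    }
}
```

Calling CoverageSketch.LeafValues(XDocument.Parse(CoverageSketch.SampleXml)) and writing each entry with Console.WriteLine gives one value per line. Note that Descendants("ModuleName") would also match elements with that name elsewhere in the document, so chaining Elements() is still the safer choice when the same name appears at several levels.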

XElement.Value can give unexpected results. With the .NET XML APIs you are largely in charge of traversing the XML tree yourself. If the element contains only text, Value may return what you want, but if it contains other elements you get the concatenated text of all of them.
I have done a lot of xml parsing and I find there are way better ways to handle XML depending on what you are doing with the data.
1) You can look into XSLT transforms if you plan on outputting this data as text, more xml, or html. This is a great way to convert the data to some other readable format. We use this when we want to display our metadata on our website in html.
2) Look into XML Serialization. C# makes this very easy, and it is great to use because you can then work with a regular C# object when consuming the data. MS even has tools to create the serialization class from the XML. I usually start with that, clean it up and add my own tweaks to make it work as I wish. The best way to check the class is to serialize an object to XML and see if that matches what you have.
3) Try Linq to XML. It will allow you to query the XML as if it were a database. It is a little slower generally but unless you need absolute performance it works very well for minimizing your work.

Related

Parsing an XML file that can be singleline or multiline

I have an XML file that can be one-line:
<webshop><item></item><item></item></webshop>
or multiline:
<webshop>
<item>
</item>
<item>
</item>
</webshop>
or mixed:
<webshop>
<item></item>
<item></item>
</webshop>
Each tag also has a short variant like <webshop/> and <item/> where the tag is opened and closed in one pair of < > brackets.
Each tag can appear any number of times, but the <item></item> or <item/> tag will only appear inside <webshop> ... </webshop>. Also, the entire XML tag hierarchy is much larger than just these two tags (but I kept it simple for this question), and each tag can have attributes.
I'm trying to parse such an xmlfile using an xmlreader in c#, but I always run into a problem.
If I try:
while(reader.ReadToFollowing("webshop"))
{
Console.WriteLine("webshop");
//get attributes of webshop tag and do something...
while(reader.ReadToFollowing("item"))
{
Console.WriteLine("Item");
//get attributes of item tag and do something...
}
}
I never get all the data when the XML is single-line, mixed, or when the tags are self-closing (<item/> instead of <item></item>). Most of the time, the reader just stops after one instance of <webshop> or <item>.
Is there a robust way to parse this xml, even if the exact lining is not known beforehand? I want to loop over all webshops, and for each webshop loop all over items, and then do something with this data.
Here's a very simple Linq to XML way to read your xml file:
var xml = @"<webshop><item></item><item></item></webshop>";
var reader = XDocument.Parse(xml);
var webshops = from w in reader.Elements("webshop")
select w;
foreach(var shop in webshops)
{
var items = from i in shop.Elements("item")
select i;
//can now grab any attributes of the items
}
Without more details on the attributes in these elements, I can't provide much more detail in an example, but I think this is enough to show you how it can be done.
If you aren't going to do any filtering and just want all of the webshop elements and then their constituent item subelements, you can simplify what I have above like so:
var webshops = reader.Elements("webshop");
foreach(var shop in webshops)
{
var items = shop.Elements("item");
//can now grab any attributes of the items
}
I originally included the more verbose way of structuring the queries in case you wanted to do any filtering or something more complex than simply selecting the given elements. This simplified method will produce the same results as my first example.
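If you want to stay with XmlReader, a single Read() loop that checks the node type is one way to avoid the nested ReadToFollowing calls. In the nested version, the inner ReadToFollowing("item") keeps advancing until it finds no more items anywhere in the document, which takes the reader past any later <webshop>. The sketch below is an assumption about what you need: it only counts items per webshop, and the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class WebshopReaderSketch
{
    // Returns the number of <item> elements inside each <webshop>,
    // in document order. Line breaks in the input make no difference.
    public static List<int> CountItemsPerWebshop(string xml)
    {
        var counts = new List<int>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Self-closing tags such as <item/> are also reported as
                // Element nodes (with IsEmptyElement set to true).
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                if (reader.Name == "webshop")
                {
                    // Attributes of the webshop could be read here,
                    // e.g. with reader.GetAttribute(...).
                    counts.Add(0);
                }
                else if (reader.Name == "item" && counts.Count > 0)
                {
                    // Attributes of the item could be read here.
                    counts[counts.Count - 1]++;
                }
            }
        }
        return counts;
    }
}
```

For example, CountItemsPerWebshop("<shops><webshop><item/><item></item></webshop><webshop/></shops>") yields 2 for the first webshop and 0 for the second; the <shops> root is hypothetical and stands in for the larger hierarchy mentioned in the question.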
Please take a look at the answer in this Stack Overflow discussion:
binding xml elements to model in MVC4
Basically, there are many ways to read XML files in your C# code. It all depends on what you are trying to achieve and how flexible it has to be. I personally prefer XmlSerializer, as it translates the XML into C# objects. The only downside is that you have to define classes for the XML to translate into.
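A minimal sketch of the XmlSerializer approach for the webshop example might look like this. The name and id attributes are assumptions (the question does not say which attributes the tags carry), and the class names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("webshop")]
public class Webshop
{
    // Hypothetical attribute; replace with the real ones.
    [XmlAttribute("name")]
    public string Name { get; set; }

    // Each <item> child becomes one entry in the list.
    [XmlElement("item")]
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Item
{
    // Hypothetical attribute; replace with the real ones.
    [XmlAttribute("id")]
    public string Id { get; set; }
}

public static class WebshopSerializerSketch
{
    // Deserializes one <webshop> document into plain C# objects.
    public static Webshop Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Webshop));
        using (var reader = new StringReader(xml))
        {
            return (Webshop)serializer.Deserialize(reader);
        }
    }
}
```

After Load, the items can be looped over with a plain foreach on shop.Items, and <item/> and <item></item> are treated the same way.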

Trouble getting data out of a xml file

I am trying to parse out some information from Google's geocoding API but I am having a little trouble with efficiently getting the data out of the xml. See link for example
All I really care about is getting the short_name from address_component where the type is administrative_area_level_1 and the long_name from administrative_area_level_2
However with my test program my XPath query returns no results for both queries.
public static void Main(string[] args)
{
using(WebClient webclient = new WebClient())
{
webclient.Proxy = null;
string locationXml = webclient.DownloadString("http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
using(var reader = new StringReader(locationXml))
{
var doc = new XPathDocument(reader);
var nav = doc.CreateNavigator();
Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_1]/short_name").InnerXml);
Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_2]/long_name").InnerXml);
}
}
}
Can anyone help me find what I am doing wrong, or recommending a better way?
You need to put the value of the node you're looking for in quotes:
".../address_component[type='administrative_area_level_1']/short_name"
↑ ↑
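To see the difference without calling the web service, here is a small sketch that runs the XPath against an inline sample. The sample XML is a trimmed, hypothetical response (the real one has many more elements), and the class and method names are made up for illustration.

```csharp
using System.IO;
using System.Xml.XPath;

static class GeocodeXPathSketch
{
    // Trimmed, hypothetical stand-in for the geocoding response.
    public const string SampleXml = @"<GeocodeResponse>
  <result>
    <address_component>
      <long_name>Santa Clara</long_name>
      <short_name>Santa Clara</short_name>
      <type>administrative_area_level_2</type>
      <type>political</type>
    </address_component>
    <address_component>
      <long_name>California</long_name>
      <short_name>CA</short_name>
      <type>administrative_area_level_1</type>
      <type>political</type>
    </address_component>
  </result>
</GeocodeResponse>";

    // Returns the text of the first node matching the XPath, or null if none.
    public static string SelectValue(string xml, string xpath)
    {
        using (var reader = new StringReader(xml))
        {
            XPathNavigator nav = new XPathDocument(reader).CreateNavigator();
            XPathNavigator node = nav.SelectSingleNode(xpath);
            return node == null ? null : node.Value;
        }
    }
}
```

With the quotes, [type='administrative_area_level_1'] compares the text of the type elements to a string literal. Without them, XPath reads administrative_area_level_1 as the name of a child element, finds none, and the predicate is never true, so SelectSingleNode returns null.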
I'd definitely recommend using LINQ to XML instead of XPathNavigator. It makes XML querying a breeze, in my experience. In this case I'm not sure exactly what's wrong... but I'll come up with a LINQ to XML snippet instead.
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;
class Test
{
public static void Main(string[] args)
{
using(WebClient webclient = new WebClient())
{
webclient.Proxy = null;
string locationXml = webclient.DownloadString
("http://maps.google.com/maps/api/geocode/xml?address=1600"
+ "+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
XElement root = XElement.Parse(locationXml);
XElement result = root.Element("result");
Console.WriteLine(result.Elements("address_component")
.Where(x => (string) x.Element("type") ==
"administrative_area_level_1")
.Select(x => x.Element("short_name").Value)
.First());
Console.WriteLine(result.Elements("address_component")
.Where(x => (string) x.Element("type") ==
"administrative_area_level_2")
.Select(x => x.Element("long_name").Value)
.First());
}
}
}
Now this is more code (1)... but I personally find it easier to get right than XPath, because the compiler is helping me more.
EDIT: I feel it's worth going into a little more detail about why I generally prefer code like this over using XPath, even though it's clearly longer.
When you use XPath within a C# program, you have two different languages - but only one is in control (C#). XPath is relegated to the realm of strings: Visual Studio doesn't give an XPath expression any special handling; it doesn't understand that it's meant to be an XPath expression, so it can't help you. It's not that Visual Studio doesn't know about XPath; as Dimitre points out, it's perfectly capable of spotting errors if you're editing an XSLT file, just not a C# file.
This is the case whenever you have one language embedded within another and the tool is unaware of it. Common examples are:
SQL
Regular expressions
HTML
XPath
When code is presented as data within another language, the secondary language loses a lot of its tooling benefits.
While you can context switch all over the place, pulling out the XPath (or SQL, or regular expressions etc) into their own tooling (possibly within the same actual program, but in a separate file or window) I find this makes for harder-to-read code in the long run. If code were only ever written and never read afterwards, that might be okay - but you do need to be able to read code afterwards, and I personally believe the readability suffers when this happens.
The LINQ to XML version above only ever uses strings for pure data - the names of elements etc - and uses code (method calls) to represent actions such as "find elements with a given name" or "apply this filter". That's more idiomatic C# code, in my view.
Obviously others don't share this viewpoint, but I thought it worth expanding on to show where I'm coming from.
Note that this isn't a hard and fast rule of course... in some cases XPath, regular expressions etc are the best solution. In this case, I'd prefer the LINQ to XML, that's all.
1 Of course I could have kept each Console.WriteLine call on a single line, but I don't like posting code with horizontal scrollbars on SO. Note that writing the correct XPath version with the same indentation as the above and avoiding scrolling is still pretty nasty:
Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/" +
"address_component[type='administrative_area_level_1']" +
"/short_name").InnerXml);
In general, long lines work a lot better in Visual Studio than they do on Stack Overflow...
I would recommend just typing the XPath expression as part of an XSLT file in Visual Studio. You'll get error messages "as you type" -- this is an excellent XML/XSLT/XPath editor.
For example, I am typing:
<xsl:apply-templates select="@* | node() x"/>
and immediately get in the Error List window the following error:
Error 9 Expected end of the expression, found 'x'. @* | node() -->x<--
XSLTFile1.xslt 9 14 Miscellaneous Files
Only when the XPath expression does not raise any errors (I might also test that it selects the intended nodes, too), would I put this expression into my C# code.
This ensures that I will have no XPath -- syntax and semantic -- errors when I run the C# program.
dtb's response is accurate. I wanted to add that you can use xpath testing tools like the link below to help find the correct xpath:
http://www.bit-101.com/xpath/
string url = @"http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false";
string value = "administrative_area_level_1";
using(WebClient client = new WebClient())
{
string wcResult = client.DownloadString(url);
XDocument xDoc = XDocument.Parse(wcResult);
var result = xDoc.Descendants("address_component")
.Where(p=>p.Descendants("type")
.Any(q=>q.Value.Contains(value))
);
}
The result is an enumeration of the address_component elements that have at least one type node containing the value you're searching for. For the query above, it holds a single XElement with the following data.
<address_component>
<long_name>California</long_name>
<short_name>CA</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
I would really recommend spending a little time learning LINQ in general, because it's very useful for manipulating and querying in-memory objects and for querying databases, and it tends to be easier than XPath when working with XML. My favorite site to reference is http://www.hookedonlinq.com/

Best way to replace XML Text

I have a web service which returns the following XML:
<Validacion>
<Es_Valido>NK7+22XrSgJout+ZeCq5IA==</Es_Valido>
</Validacion>
<Estatus>
<Estatus>dqrQ7VtQmNFXmXmWlZTL7A==</Estatus>
</Estatus>
<Generales>
<Nombre>V4wb2/tq9tEHW80tFkS3knO8i4yTpJzh7Jqi9MxpVVE=</Nombre>
<Apellido>jXyRpjDQvsnzZz+wsq6b42amyTg2np0wckLmQjQx1rCJc8d3dDg6toSdSX200eGi</Apellido>
<Ident_Clie>IYbofEiD+wOCJ+ujYTUxgsWJTnGfVU+jcQyhzgQralM=</Ident_Clie> <Fec_Creacion>hMI2YyE5h2JVp8CupWfjLy24W7LstxgmlGoDYjPev0r8TUf2Tav9MBmA2Xd9Pe8c</Fec_Creacion>
<Nom_Asoc>CF/KXngDNY+nT99n1ITBJJDb08/wdou3e9znoVaCU3dlTQi/6EmceDHUbvAAvxsKH9MUeLtbCIzqpJq74e QfpA==</Nom_Asoc>
<Fec_Defuncion />
</Generales>
The text inside the tags is encrypted, and I need to decrypt it. I've come up with a regular expression solution, but I don't think it's optimal. Is there a better way to do this? Thanks!
I wouldn't use a regular expression. Load the XML with something like LINQ to XML, find every element which just has a text child, and replace the contents of that child with the decrypted form.
Do you know which elements will be encrypted? That would make it even easier. Basically you'll want something along the lines of:
// It's possible that modifying elements while executing Descendants()
// would be okay, but I'm not sure
List<XElement> elements = doc.Descendants().ToList();
foreach (XElement element in elements)
{
if (ShouldDecrypt(element)) // Whatever this would do
{
element.Value = Decrypt(element.Value);
}
}
(I'm assuming you know how to do the actual decryption part, of course.)
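As a rough sketch of what ShouldDecrypt could be, one option is to treat every non-empty leaf element as encrypted. That rule is an assumption, as are the class and method names below, and the decryption itself is passed in as a delegate because the question doesn't say which cipher is used.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class DecryptSketch
{
    // Replaces the text of every non-empty leaf element under root with
    // decrypt(text). Empty elements such as <Fec_Defuncion /> are skipped.
    public static void DecryptLeafText(XContainer root, Func<string, string> decrypt)
    {
        // Take a snapshot first, so that assigning Value cannot interfere
        // with the lazy Descendants() enumeration.
        var leaves = root.Descendants()
                         .Where(e => !e.HasElements && !string.IsNullOrEmpty(e.Value))
                         .ToList();

        foreach (XElement leaf in leaves)
        {
            leaf.Value = decrypt(leaf.Value);
        }
    }
}
```

Usage would be along the lines of DecryptSketch.DecryptLeafText(doc, Decrypt), where Decrypt is your own method. Note that the response shown in the question has several top-level elements, so it would need to be wrapped in a single root element before XDocument.Parse will accept it.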
Never use regular expressions to parse XML. XmlReader and XmlDocument, both found in System.Xml, are far better tools for the job.
Do you know the type of encryption used? Look here to get the basics on the cryptography capabilities in .NET.

Efficient way to construct an XML document from a string containing well formed XML for navigation?

I have a string that contains well-formed XML. I want to navigate the XML in that string to extract the text in certain nodes. How do I efficiently accomplish this using a built-in .NET class? Which .NET XML class would you use, and why?
Many thanks for your help.
Note 1: Linq is not available to me.
Note 2: Editing the XML is not important. Read-only access is what I need.
For speed, use an XmlReader:
using (StringReader sr = new StringReader(myString))
using (XmlReader xr = XmlReader.Create(sr))
{
while (xr.Read())
{
if (xr.NodeType == XmlNodeType.Element && xr.Name == "foo")
{
Console.WriteLine(xr.ReadString());
}
}
}
The above prints out the text content of every element named "foo" in the XML document. (Well, sort of. ReadString doesn't handle nested elements very gracefully.)
Using an XPathDocument is slower, because the entire document gets parsed before you can start searching it, but it has the merit of simplicity:
using (StringReader sr = new StringReader(myString))
{
XPathDocument d = new XPathDocument(sr);
foreach (XPathNavigator n in d.CreateNavigator().Select("//foo/text()"))
{
Console.WriteLine(n.Value);
}
}
If you're not concerned with performance or memory utilization, it's simplest to use an XmlDocument:
XmlDocument d = new XmlDocument();
d.LoadXml(myString);
foreach (XmlNode n in d.SelectNodes("//foo/text()"))
{
Console.WriteLine(n.Value);
}
For navigation? Probably XPathDocument:
string s = @"<xml/>";
XPathDocument doc = new XPathDocument(new StringReader(s));
From MSDN,
Provides a fast, read-only, in-memory representation of an XML document by using the XPath data model.
Unlike XmlDocument etc, it is optimised for readonly usage; more efficient but less powerful (i.e. you can't edit it). For notes on how to query it, see here.
I would use XmlDocument.LoadXml() to get a DOM from the string (Load() expects a file name, stream or reader rather than the XML text itself). Then you can traverse it using the appropriate DOM methods or XPath as needed.
It depends on the structure of the XML. If it is relatively simple, then the most efficient way is to wrap the string in a StringReader, and then wrap that in an XmlReader. The benefit is that you won't have to create an XML tree in memory, copying data from the string; you'll just read nodes one by one.
If the document structure is complicated enough, you might need (or want) a DOM - in which case XDocument.Parse should do the trick.

Xml Comparison in C#

I'm trying to compare two Xml files using C# code.
I want to ignore Xml syntax differences (i.e. prefix names).
For that I am using Microsoft's XML Diff and Patch C# API.
It works for some Xml's but I couldn't find a way to configure it to work with the following two Xml's:
XML A:
<root xmlns:ns="http://myNs">
<ns:child>1</ns:child>
</root>
XML B:
<root>
<child xmlns="http://myNs">1</child>
</root>
My questions are:
Am I right that these two xml's are semantically equal (or isomorphic)?
Can Microsoft's XML Diff and Patch API be configured to support it?
Are there any other C# utilities to do this?
The documents are isomorphic as can be shown by the program below. I think if you use XmlDiffOptions.IgnoreNamespaces and XmlDiffOptions.IgnorePrefixes to configure Microsoft.XmlDiffPatch.XmlDiff, you get the result you want.
using System.Linq;
using System.Xml.Linq;
namespace SO_794331
{
class Program
{
static void Main(string[] args)
{
var docA = XDocument.Parse(
    @"<root xmlns:ns=""http://myNs""><ns:child>1</ns:child></root>");
var docB = XDocument.Parse(
    @"<root><child xmlns=""http://myNs"">1</child></root>");
var rootNameA = docA.Root.Name;
var rootNameB = docB.Root.Name;
var equalRootNames = rootNameB.Equals(rootNameA);
var descendantsA = docA.Root.Descendants();
var descendantsB = docB.Root.Descendants();
for (int i = 0; i < descendantsA.Count(); i++)
{
var descendantA = descendantsA.ElementAt(i);
var descendantB = descendantsB.ElementAt(i);
var equalChildNames = descendantA.Name.Equals(descendantB.Name);
var valueA = descendantA.Value;
var valueB = descendantB.Value;
var equalValues = valueA.Equals(valueB);
}
}
}
}
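The same comparison can be condensed with XNode.DeepEquals, as a rough sketch. DeepEquals also compares attributes, and namespace declarations such as xmlns:ns="..." are stored as attributes, so this sketch removes them first; the element names keep their namespaces regardless. Treating the declarations as irrelevant is an assumption about what "semantically equal" should mean here, and the class name is made up for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

static class XmlCompareSketch
{
    // True if both documents have the same element names (including
    // namespaces), attributes and text, regardless of prefixes used.
    public static bool SemanticallyEqual(string xmlA, string xmlB)
    {
        return XNode.DeepEquals(Normalize(xmlA), Normalize(xmlB));
    }

    // Parses the XML and strips namespace declaration attributes, which
    // only affect how names are written, not what the names are.
    static XElement Normalize(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        root.DescendantsAndSelf()
            .Attributes()
            .Where(a => a.IsNamespaceDeclaration)
            .Remove();
        return root;
    }
}
```

For the two documents in the question this returns true, and it returns false as soon as the text of child differs or child is in a different namespace.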
I know that your focus isn't on unit tests, but XMLUnit can compare two XML files, and I think it's able to handle your example. Maybe you could look at the code and figure out your solution.
I've got an answer by Martin Honnen in XML and the .NET Framework MSDN Forum.
In short he suggests to use XQuery 1.0's deep-equal function and supplies some C# implementations. Seems to work.
It might be an idea to load XmlDocument instances from each xml file, and compare the XML DOM instead? Providing the correct validation is done on each, that should give you a common ground for a comparison, and should allow standard difference reporting. Possibly even the ability to update one from the other with the delta.
Those documents aren't semantically equivalent. The top-level element of the first is in the http://myNS namespace, while the top-level element of the second is in the default namespace.
The child elements of the two documents are equivalent. But the documents themselves aren't.
Edit:
There's a world of difference between xmlns:ns='http://myNS' and xmlns='http://myNS', which I appear to have overlooked. Anyway, those documents are semantically equivalent and I was just mistaken.
