Linq duplicate elements when iterating over XML - c#

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<stock-items>
  <stock-item>
    <name>Loader 34</name>
    <sku>45GH6</sku>
    <vendor>HITINANY</vendor>
    <useage>Lifter 45 models B to C</useage>
    <typeid>01</typeid>
    <version>01</version>
    <reference>33</reference>
    <comments>EOL item. No Re-order</comments>
    <traits>
      <header>56765</header>
      <site>H4</site>
      <site>A6</site>
      <site>V1</site>
    </traits>
    <type-validators>
      <actions>
        <endurance-tester>bake/shake</endurance-tester>
      </actions>
      <rules>
        <results-file>Test-Results.txt</results-file>
        <file-must-contain file-name="Test-Results.xml">
          <search>
            <term>[<![CDATA[<"TEST TYPES 23 & 49 PASSED"/>]]></term>
            <search-type>exactMatch</search-type>
          </search>
        </file-must-contain>
      </rules>
    </type-validators>
  </stock-item>
</stock-items>
I'm trying to get the rules fragment from the XML above into a string so it can be added to a database. Currently the search element and its contents are added twice. I know why this is happening but can't figure out how to prevent it.
Here's my code:
var Rules = from rules in Type.Descendants("rules")
            select rules.Descendants();
StringBuilder RulesString = new StringBuilder();
foreach (var rule in Rules)
{
    foreach (var item in rule)
    {
        RulesString.AppendLine(item.ToString());
    }
}
Console.WriteLine(RulesString);
Finally, any elements in rules are optional, and some of these elements may or may not contain other child elements up to 4 or 5 levels deep. TIA
UPDATE:
To try to make it clearer what I'm trying to achieve:
From the XML above I should end up with a string containing everything in the rules element, exactly like this:
<results-file>Test-Results.txt</results-file>
<file-must-contain file-name="Test-Results.xml">
  <search>
    <term>[<![CDATA[<"TEST TYPES 23 & 49 PASSED"/>]]></term>
    <search-type>exactMatch</search-type>
  </search>
</file-must-contain>

The objective is to extract the entire contents of the rules element as-is, taking into account that the rules element may or may not contain child elements several levels deep.
If you just want the entirety of the rules element as a string (rather than caring about its contents as XML), you don't need to dig into its contents; just get the element as an XNode and call ToString() on it:
The following example uses this method to retrieve indented XML.
XElement xmlTree = new XElement("Root",
    new XElement("Child1", 1)
);
Console.WriteLine(xmlTree);
This example produces the following output:
<Root>
  <Child1>1</Child1>
</Root>
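Calling ToString() on the <rules> element itself also emits the surrounding <rules> and </rules> tags, while the update asks for the contents only. The duplicates in the question most likely come from Descendants(), which returns every element at every depth, so <search> is printed once inside <file-must-contain> and again on its own (as are <term> and <search-type>). A minimal sketch of one fix, assuming the asker's `Type` variable is the <stock-item> XElement (the class and method names are hypothetical): walk only the direct children with Elements(), because each child's ToString() already carries everything nested beneath it, however deep.

```csharp
using System.Text;
using System.Xml.Linq;

public static class RulesExtractor
{
    // Hypothetical helper: returns the markup inside every <rules> element under
    // stockItem, without the <rules> tags themselves and without duplicates.
    public static string GetRulesContent(XElement stockItem)
    {
        var sb = new StringBuilder();
        foreach (XElement rules in stockItem.Descendants("rules"))
        {
            // Elements() yields only the direct children of <rules>; each child's
            // ToString() includes all of its nested content (CDATA included),
            // so no nested element is emitted a second time.
            foreach (XElement child in rules.Elements())
            {
                sb.AppendLine(child.ToString());
            }
        }
        return sb.ToString();
    }
}
```

Elements() skips any bare text or comments sitting directly inside <rules>; switching to rules.Nodes() would keep those too.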

If you want to prevent duplicates, then you will need to use Distinct() or GroupBy() after parsing the XML and before building the string.
I'm still not fully understanding exactly what the output should be, so I can't provide a clear solution on what exactly to use, or how, in terms of locating duplicates. If you can refine the original post, that would help:
We need the structure of the XML as it would appear in your scenario, nesting and all.
We need an example of the final string.
Saving it to a DB doesn't really matter for this post, so you only need to mention that briefly once, if at all.

Related

Count how many differences in 2 XML files

Imagine one XML as:
<foo>
<node1>Some value</node1>
<node2>BB</node2>
<node3>TTTTT</node3>
<node4>XXXX</node4>
</foo>
and another XML as:
<foo>
<node1>Something Else</node1>
<node4>XXXX</node4>
<node5>TTTTT</node5>
</foo>
The difference count here is 3
a) node1 value is different
b) node2 is missing in 2nd XML
c) node5 is missing in 1st XML
I've tried using the XmlDiff class, but the result is too cumbersome for my needs.
Schema:
Root named "foo" and a single set of children with one value each.
Question:
What's the simplest and fastest way to code this in C#?
One way to do this might be to generate a list of XPath assertions from your first document, in the form:
/foo/node1 = "Some value"
/foo/node2 = "BB"
/foo/node3 = "TTTTT"
/foo/node4 = "XXXX"
and then apply these assertions to the second document to count how many of them are true. Because this won't catch data that is absent on the first document and present in the second, you might want to do the inverse as well. It's not perfect, of course, for example it won't catch differences in element order. But you haven't actually defined what you mean by a significant difference, and you could adjust the XPath expressions to assert what you consider significant. For example you could vary the last assertion to:
count(/foo/node4[. = "XXXX"]) = 1
The simplest and fastest way to code this, of course, is not in C#, unless that happens to be the only programming language you know. Using XSLT or XQuery would be much better.
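If you do want to stay in C#, the XPath-assertion idea can be sketched with LINQ to XML plus the XPathSelectElement extension from System.Xml.XPath. This is a minimal sketch, assuming the flat schema from the question (one root, uniquely named children, one text value each); the class and method names are hypothetical. The inverse pass counts only paths that are absent from the first document, so a value mismatch is not counted twice.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class XmlDiffCounter
{
    // Hypothetical helper: counts differences between two flat documents
    // of the form <foo><childN>value</childN>...</foo>.
    public static int CountDifferences(XDocument first, XDocument second)
    {
        int differences = 0;

        // Forward pass: each child of the first document becomes the assertion
        // "/foo/childName = value", evaluated against the second document.
        foreach (XElement child in first.Root.Elements())
        {
            string path = "/" + first.Root.Name.LocalName + "/" + child.Name.LocalName;
            XElement match = second.XPathSelectElement(path);
            if (match == null || match.Value != child.Value)
                differences++; // missing in the second document, or a different value
        }

        // Inverse pass: catch elements present only in the second document.
        foreach (XElement child in second.Root.Elements())
        {
            string path = "/" + second.Root.Name.LocalName + "/" + child.Name.LocalName;
            if (first.XPathSelectElement(path) == null)
                differences++;
        }
        return differences;
    }
}
```

With the two documents from the question this returns 4 rather than 3, because node3 (present only in the first document) is also counted as missing.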
Have you considered using XNode.DeepEquals, with the root (in this case 'foo') of each XML file being your node? The MSDN page on how to use it is here:
http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.deepequals.aspx
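A minimal sketch of the XNode.DeepEquals suggestion (the wrapper name is hypothetical). Keep in mind that DeepEquals answers only "identical or not" rather than giving a count, and it is sensitive to the order of child elements. XElement.Parse discards insignificant whitespace by default, so indentation differences alone do not make two documents unequal.

```csharp
using System.Xml.Linq;

public static class XmlEquality
{
    // Hypothetical helper: true when both root elements have the same names,
    // attributes, child order and values.
    public static bool SameContent(string xmlA, string xmlB)
    {
        XElement a = XElement.Parse(xmlA);
        XElement b = XElement.Parse(xmlB);
        return XNode.DeepEquals(a, b);
    }
}
```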

Parsing an XML file that can be singleline or multiline

I have an XML file that can be one-line:
<webshop><item></item><item></item></webshop>
or multiline:
<webshop>
  <item>
  </item>
  <item>
  </item>
</webshop>
or mixed:
<webshop>
  <item></item>
  <item></item>
</webshop>
Each tag also has a short variant like <webshop/> and <item/> where the tag is opened and closed in one pair of < > brackets.
Each tag can appear any number of times, but the <item></item> or <item/> tag will only appear inside <webshop> ... </webshop>. Also, the entire XML tag hierarchy is much larger than just these two tags (but I kept it simple for this question), and each tag can have attributes.
I'm trying to parse such an XML file using an XmlReader in C#, but I always run into a problem.
If I try:
while (reader.ReadToFollowing("webshop"))
{
    Console.WriteLine("webshop");
    //get attributes of webshop tag and do something...
    while (reader.ReadToFollowing("item"))
    {
        Console.WriteLine("Item");
        //get attributes of item tag and do something...
    }
}
I never get all the data when the XML is single-line, mixed, or the tags close themselves (<item/> instead of <item></item>). Most of the time, the reader just stops after one instance of <webshop> or <item>.
Is there a robust way to parse this XML, even if the exact line layout is not known beforehand? I want to loop over all webshops, and for each webshop loop over all items, and then do something with this data.
Here's a very simple LINQ to XML way to read your XML file:
var xml = @"<webshop><item></item><item></item></webshop>";
var reader = XDocument.Parse(xml);
var webshops = from w in reader.Elements("webshop")
               select w;
foreach (var shop in webshops)
{
    var items = from i in shop.Elements("item")
                select i;
    //can now grab any attributes of the items
}
Without more details on the attributes in these elements, I can't provide much more detail in an example, but I think this is enough to show you how it can be done.
If you aren't going to do any filtering and just want all of the webshop elements and then their constituent item subelements, you can simplify what I have above like so:
var webshops = reader.Elements("webshop");
foreach (var shop in webshops)
{
    var items = shop.Elements("item");
    //can now grab any attributes of the items
}
I originally included the more verbose way of structuring the queries in case you wanted to do any filtering or wanted to do something more complex than simply selecting the given elements. This simplified method will produce the same results as my first example.
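For comparison, if you prefer to keep the XmlReader from the question: one likely reason that loop stops after the first <webshop> is that the inner ReadToFollowing("item") is not limited to the current webshop; it keeps reading to the end of the input, so the outer loop has nothing left to find. A minimal sketch, assuming only item counts are needed (the class and method names are hypothetical), that scopes the inner loop with ReadSubtree(). It is indifferent to line breaks and handles both <item/> and <item></item>.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class WebshopReader
{
    // Hypothetical helper: returns the number of <item> elements in each
    // <webshop>, in document order.
    public static List<int> CountItemsPerShop(string xml)
    {
        var counts = new List<int>();
        // Fragment mode lets several top-level <webshop> elements be read.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.ReadToFollowing("webshop"))
            {
                // Webshop attributes can be read here with reader.GetAttribute(...).
                int items = 0;
                // ReadSubtree() returns a reader that stops at this webshop's end tag.
                using (XmlReader shop = reader.ReadSubtree())
                {
                    while (shop.ReadToFollowing("item"))
                    {
                        items++; // item attributes: shop.GetAttribute(...)
                    }
                }
                // The outer reader now sits on this webshop's end tag (or on the
                // element itself when it was written as <webshop/>).
                counts.Add(items);
            }
        }
        return counts;
    }
}
```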
Please take a look at the answer in this Stack Overflow discussion:
binding xml elements to model in MVC4
Basically, there are many ways to read XML files in your C# code. It all depends on what you are trying to achieve and how flexible it has to be. I personally prefer XmlSerializer, as it translates the XML into C# objects. The only downside is that you have to define classes for the XML to translate into.
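A minimal XmlSerializer sketch for the webshop/item shape. The `name` and `id` attributes, and the class names, are hypothetical stand-ins for whatever attributes the real file carries. The serializer does not care about line breaks and accepts both <item/> and <item></item>.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("webshop")]
public class Webshop
{
    [XmlAttribute("name")] // hypothetical attribute
    public string Name { get; set; }

    // [XmlElement] maps each repeated <item> child straight into the list.
    [XmlElement("item")]
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Item
{
    [XmlAttribute("id")] // hypothetical attribute
    public string Id { get; set; }
}

public static class WebshopLoader
{
    public static Webshop Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Webshop));
        using (var reader = new StringReader(xml))
        {
            return (Webshop)serializer.Deserialize(reader);
        }
    }
}
```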

How to read XPathNodelist

Programming in C#, I have an Xml.XPathNodeList object, "ResultsPartRel.nodeList". Debugging it in Visual Studio, I can read "Results View; Expanding the Results View will enumerate the IEnumerable".
Questions:
1.- Which is the best way to read those nodes?
2.- I wrote the following code, but I don't get the expected results: I get the same result twice. (ResultsPartRel.nodeList contains 2 nodes.)
List<string> childrenName = new List<string>();
foreach (XmlElement node in ResultsPartRel.nodeList)
{
    string nameChildren = node.SelectSingleNode("//related_id/Item/keyed_name").InnerText;
    childrenName.Add(nameChildren);
}
Thank you in advance.
EDIT
<related_id>
  <Item>
    <classification>Component</classification>
    <id></id>
    <keyed_name>glass</keyed_name> <!-- I want to get this InnerText -->
  </Item>
</related_id>
<source_id>968C45A47942454DA9B34245A9F72A8C</source_id>
<itemtype>5E9C5A12CC58413A8670CF4003C57848</itemtype>
Well, we really need to see the XML sample and a verbal explanation of which data you want to extract. Currently you call node.SelectSingleNode(...), which looks as if you want to select a path relative to node, but then you use an absolute path starting with //, and that is why you get the same result twice.
So you want node.SelectSingleNode(".//related_id/Item/keyed_name") or perhaps even node.SelectSingleNode("related_id/Item/keyed_name"), depending on the XML you have.
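A small sketch that makes the difference visible. The <Relationship> wrapper element and the helper name are hypothetical, because the question does not show which element holds each <related_id>. Run with an absolute path, every node returns the first match in the whole document; run with a relative path, each node returns its own value.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class KeyedNameReader
{
    // Hypothetical helper: evaluates xpathPerNode once for each <Relationship>
    // element and collects the InnerText of the result (null if nothing matched).
    public static List<string> ReadKeyedNames(XmlDocument doc, string xpathPerNode)
    {
        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//Relationship"))
        {
            // A path starting with "//" ignores the context node and searches the
            // whole document; a path without it is resolved relative to node.
            XmlNode keyed = node.SelectSingleNode(xpathPerNode);
            names.Add(keyed?.InnerText);
        }
        return names;
    }
}
```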
You can get the first element. (A path starting with "//" searches the whole document for all matching tags, so you will probably get more results.) When you want only the first element, write "(//related_id/Item/keyed_name)[1]".
Or you can write the exact path (this is the safest way). To make it easy for yourself, there is a Firefox extension, XPath Checker: load the document in Firefox, then right-click the element and show its XPath. Then you get an exact path.

Find a node in an XML file - performance improvement in C#

Say I need to find a particular node in an XML file, using C#.
<node attribute="my-target-attribute">
"my-target-attribute" is a variable input at runtime.
The node is in no particular place in the XML file, I basically just need to scan the entire XML hierarchy until I find a node with the matching attribute.
Is there any way I can pre-process the XML so finding the node will be faster? I need to keep the original XML structure in place. The XML file could potentially have 10,000 nodes.
You can certainly preprocess the XML to make lookups faster:
Dictionary<string, XmlElement> elementMap = new Dictionary<string, XmlElement>();
AddElementToMap(doc.DocumentElement, elementMap);
...
// Recursively records each element under the value of its "attribute" attribute.
private void AddElementToMap(XmlElement elm, Dictionary<string, XmlElement> elementMap)
{
    elementMap[elm.GetAttribute("attribute")] = elm;
    // Only direct children named "node" are visited at each level.
    foreach (XmlElement child in elm.SelectNodes("node"))
    {
        AddElementToMap(child, elementMap);
    }
}
Once you've done this, a lookup is simple:
XmlElement elm = elementMap[value];
That code assumes that every element in your document is named "node", that every one has an attribute named "attribute", and that all of the attribute values are unique. The code's more complicated if any of those conditions are untrue, but not exceptionally so.
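Relaxing those assumptions is indeed not much harder. A minimal sketch, swapping in LINQ to XML for XmlDocument (the class and method names are hypothetical): index every element of any name, at any depth, that carries the attribute, and keep the first element in document order when a value repeats. Building the index is one pass over the document; each lookup afterwards is a dictionary hit. The index must be rebuilt if the XML is modified.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class AttributeIndex
{
    // Hypothetical helper: maps attribute value -> element for every element
    // (any name, any depth) that has the given attribute.
    public static Dictionary<string, XElement> Build(XDocument doc, string attributeName)
    {
        var map = new Dictionary<string, XElement>();
        // Descendants() walks the whole tree in document order.
        foreach (XElement element in doc.Descendants())
        {
            XAttribute attr = element.Attribute(attributeName);
            // Elements without the attribute are skipped; duplicate values keep the first hit.
            if (attr != null && !map.ContainsKey(attr.Value))
                map.Add(attr.Value, element);
        }
        return map;
    }
}
```

Lookups can then use map.TryGetValue(value, out XElement found) to avoid a KeyNotFoundException for missing values.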
You could use XSLT to transform the XML so that the node is at a known depth. Then, when you select with XPath, you can select accordingly without using the // operator.
With VTD-XML (http://vtd-xml.sf.net) you can index the XML document into VTD+XML, which eliminates the overhead of parsing:
http://www.codeproject.com/KB/XML/VTD-XML-indexing.aspx
Similar to another answer, you can use an XPath expression such as SelectNodes("//*[@attribute='my-target-attribute']"). The // operator searches nodes at all depths.
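A minimal sketch of the XPath search with XmlDocument (the helper name is hypothetical). This scans the whole document on every call, so it suits occasional lookups rather than many repeated ones. It assumes the value contains no apostrophe, since the value is spliced into an XPath string literal.

```csharp
using System.Xml;

public static class AttributeSearch
{
    // Hypothetical helper: returns the first element (any name, any depth) whose
    // "attribute" attribute equals value, or null if there is none.
    // Assumption: value contains no apostrophe (') character.
    public static XmlElement FindFirst(XmlDocument doc, string value)
    {
        return (XmlElement)doc.SelectSingleNode("//*[@attribute='" + value + "']");
    }
}
```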

How to tell if a string is xml?

We have a string field which can contain XML or plain text. The XML contains no <?xml header, and no root element, i.e. is not well formed.
We need to be able to redact XML data, emptying element and attribute values, leaving just their names, so I need to test if this string is XML before it's redacted.
Currently I'm using this approach:
string redact(string eventDetail)
{
    string detail = eventDetail.Trim();
    // Treat the value as plain text unless it both starts with '<' and ends with '>'.
    if (!detail.StartsWith("<") || !detail.EndsWith(">")) return eventDetail;
    ...
Is there a better way?
Are there any edge cases this approach could miss?
I appreciate I could use XmlDocument.LoadXml and catch XmlException, but this feels like an expensive option, since I already know that a lot of the data will not be in XML.
Here's an example of the XML data, apart from missing a root element (which is omitted to save space, since there will be a lot of data), we can assume it is well formed:
<TableName FirstField="Foo" SecondField="Bar" />
<TableName FirstField="Foo" SecondField="Bar" />
...
Currently we are only using attribute based values, but we may use elements in the future if the data becomes more complex.
SOLUTION
Based on multiple comments (thanks guys!)
string redact(string eventDetail)
{
    if (string.IsNullOrEmpty(eventDetail)) return eventDetail; //+1 for unit tests :)
    string detail = eventDetail.Trim();
    // Only values that both start with '<' and end with '>' are treated as XML candidates.
    if (!detail.StartsWith("<") || !detail.EndsWith(">")) return eventDetail;
    XmlDocument xml = new XmlDocument();
    try
    {
        // Wrap in a synthetic root so a rootless fragment can be loaded.
        xml.LoadXml(string.Format("<Root>{0}</Root>", detail));
    }
    catch (XmlException e)
    {
        log.WarnFormat("Data NOT redacted. Caught {0} loading eventDetail {1}", e.Message, eventDetail);
        return eventDetail;
    }
    ... // redact
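The elided redact step, emptying attribute and element values while keeping their names, might look something like this minimal sketch. It swaps in LINQ to XML rather than the XmlDocument used above, and the class and method names are hypothetical. It wraps the fragment in a synthetic root in the same way as the SOLUTION.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class Redactor
{
    // Hypothetical helper: blanks every attribute value and removes every text
    // (and CDATA) node, keeping element and attribute names.
    public static string RedactFragment(string fragment)
    {
        XElement root = XElement.Parse("<Root>" + fragment + "</Root>");

        foreach (XAttribute attr in root.Descendants().Attributes().ToList())
            attr.Value = string.Empty;

        // XCData derives from XText, so CDATA sections are removed as well.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
            text.Remove();

        // Serialize the children without the synthetic <Root> wrapper.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```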
If you're going to accept not well formed XML in the first place, I think catching the exception is the best way to handle it.
One possibility is to mix both solutions. You can use your redact method and try to load it (inside the if). This way, you'll only try to load what is likely to be a well-formed xml, and discard most of the non-xml entries.
If your goal is reliability then the best option is to use XmlDocument.LoadXml to determine if it's valid XML or not. A full parse of the data may be expensive but it's the only way to reliably tell if it's valid XML or not. Otherwise any character you don't examine in the buffer could cause the data to be illegal XML.
It depends on how accurate a test you want. Considering that you already don't have the official <?xml declaration, you're already trying to detect something that isn't strictly an XML document. Ideally you'd parse the text with a full XML parser (as you suggest, LoadXml); anything it rejects isn't XML. The question is, do you care if you accept a non-XML string? For instance,
are you OK with accepting
<the quick brown fox jumped over the lazy dog's back>
as XML and stripping it? If so, your technique is fine. If not, you have to decide how tight a test you want and code a recognizer with that degree of tightness.
How is the data coming to you? What is the other type of data surrounding it? Perhaps there is a better way; perhaps you can tokenise the data you control, and then infer that anything that is not within those tokens is XML, but we'd need to know more.
Failing a cute solution like that, I think what you have is fine (for validating that it starts and ends with those characters).
We need to know more about the data format really.
If the XML contains no root element (i.e. it's an XML fragment, not a full document), then the following would be a perfectly valid sample as well, but wouldn't match your detector:
foo<bar/>baz
In fact, almost any text string would be a valid XML fragment, as long as it contains no unescaped < or & characters (consider if the original XML document was just the root element wrapping some text, and you take the root element tags away)!
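One way to test fragment well-formedness directly, without wrapping the text in a synthetic root, is an XmlReader in fragment mode. A minimal sketch (the helper name is hypothetical): it accepts bare text and several top-level elements, and it still catches the exception on failure, so it does not avoid the cost discussed in the question.

```csharp
using System.IO;
using System.Xml;

public static class FragmentCheck
{
    // Hypothetical helper: true when text parses as a well-formed XML fragment
    // (multiple top-level elements and bare text are allowed).
    public static bool IsWellFormedFragment(string text)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Reading to the end forces a full well-formedness check.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

By this test, foo<bar/>baz and plain text both count as XML, which is exactly the point made above; the starts-with/ends-with check is what narrows the candidates down.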
try
{
    XmlDocument myDoc = new XmlDocument();
    myDoc.LoadXml(myString);
}
catch (XmlException ex)
{
    //take care of the exception
}
