Parsing XML in C#


I am still working on a project, and I am enjoying it greatly.
I wanted to see if I could implement a live-updating feed using XML. At the moment I don't even know how to parse this particular type of XML, as all the tutorials I have found are for parsing node values etc.,
but I was thinking of something along the lines of this:
<Object name="ObjectName" type="ObjectType" size="ObjectSize" image="ObjectImage" />
If you could help me understand how to access the values inside that node, that would be amazing, and if it is not too much to ask, a short explanation so I understand. I know how to parse XML that looks like this using XElement:
<Object>
<Name>ObjectName</Name>
<Type>ObjectType</Type>
<Size>ObjectSize</Size>
<Image>ObjectImage</Image>
</Object>
I just can't seem to parse the example at the top. I don't mind whether it's LINQ, as long as it is in C#; maybe you could tell me why you would choose one over the other? Also, do you have any idea how I could check whether the file has changed, so I could implement a live update?
Thanks for your help
John

The example at the top uses attributes instead of sub-elements but it's just as easy to work with:
XElement element = XElement.Parse(xml);
string name = (string) element.Attribute("name");
string type = (string) element.Attribute("type");
string size = (string) element.Attribute("size");
string image = (string) element.Attribute("image");
I usually prefer the explicit string conversion to the Value property: if you apply the conversion to a null reference (a missing attribute), you just get a null string instead of a NullReferenceException. Of course, if a missing attribute indicates a programming error, an exception is more appropriate and the Value property is fine. (The same logic applies to converting XElement values, by the way.)
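A minimal, self-contained sketch of the same idea, covering both a single element and a feed with many <Object> elements (the LINQ side of the question). ObjectInfo, ObjectParser and the <Feed> root used in the usage note are hypothetical names, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for the four attributes in the question.
public class ObjectInfo
{
    public string Name { get; set; }
    public string Type { get; set; }
    public string Size { get; set; }
    public string Image { get; set; }
}

public static class ObjectParser
{
    // Reads one <Object ... /> element. The explicit (string) conversion
    // returns null for a missing attribute instead of throwing.
    public static ObjectInfo FromElement(XElement element) => new ObjectInfo
    {
        Name = (string)element.Attribute("name"),
        Type = (string)element.Attribute("type"),
        Size = (string)element.Attribute("size"),
        Image = (string)element.Attribute("image")
    };

    // LINQ version: every <Object> element anywhere in the document.
    public static List<ObjectInfo> FromFeed(string xml) =>
        XDocument.Parse(xml).Descendants("Object").Select(e => FromElement(e)).ToList();
}
```

For example, ObjectParser.FromFeed("<Feed><Object name=\"a\" /><Object name=\"b\" /></Feed>") yields two items, and any attribute absent from an element simply comes back as null.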

If you have a domain object that represents your document (usually the case), then the XmlSerializer is quite easy to use.
using System.Xml.Serialization;
[XmlRoot("Object")]
public class Item
{
public string Name { get; set; }
public string Type { get; set; }
public string Size { get; set; }
public string Image { get; set; }
}
Usage:
XmlSerializer ser = new XmlSerializer(typeof(Item));
Item item = (Item)ser.Deserialize(someXmlStream);
I find using this approach easier than manual parsing when an entire document represents a domain object of some kind.
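Note that the Item class above maps each property to a child element, which matches the second (<Name>...</Name>) form. For the attribute form from the question, the properties would need [XmlAttribute]. A sketch under that assumption; AttributeItem and FromXml are hypothetical names.

```csharp
using System.IO;
using System.Xml.Serialization;

// Attribute-mapped variant for <Object name="..." type="..." size="..." image="..." />.
[XmlRoot("Object")]
public class AttributeItem
{
    // [XmlAttribute] reads the value from an attribute instead of a child element.
    [XmlAttribute("name")] public string Name { get; set; }
    [XmlAttribute("type")] public string Type { get; set; }
    [XmlAttribute("size")] public string Size { get; set; }
    [XmlAttribute("image")] public string Image { get; set; }

    public static AttributeItem FromXml(string xml)
    {
        var ser = new XmlSerializer(typeof(AttributeItem));
        using var reader = new StringReader(xml);
        return (AttributeItem)ser.Deserialize(reader);
    }
}
```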

You can also use XElement.FirstAttribute to get the first attribute on the element and then XAttribute.NextAttribute to loop through them all. This doesn't rely on you knowing which attributes are present.
XAttribute attribute = element.FirstAttribute;
while (attribute != null)
{
// Do stuff
attribute = attribute.NextAttribute;
}
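A runnable version of that loop that collects every attribute into a dictionary keyed by local name. AttributeWalker is a hypothetical name; element.Attributes() would give the same sequence as a LINQ-friendly IEnumerable<XAttribute>.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class AttributeWalker
{
    // Walks the FirstAttribute/NextAttribute chain without knowing names in advance.
    // Note: xmlns declarations are attributes too (XAttribute.IsNamespaceDeclaration).
    public static Dictionary<string, string> ToDictionary(XElement element)
    {
        var result = new Dictionary<string, string>();
        XAttribute attribute = element.FirstAttribute;
        while (attribute != null)
        {
            result[attribute.Name.LocalName] = attribute.Value;
            attribute = attribute.NextAttribute;
        }
        return result;
    }
}
```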

Related

Is there a way to construct a deeply nested Dictionary with nice syntax and IntelliSense support?

I am trying to create an object that contains the parsed register values of a robot. I have an ASCII text file that contains representations of the variables and arrays. However, I am struggling to think of an easy way to use the deeply nested values. Ideally, the syntax to use the deeply nested objects would be something like Registers["PositionRegister"]["CurrentPosition"]["X_Coordinate"] and the dictionary would be something like this:
(There was a JSON representation here of what I wanted the dictionary to look like, but people kept suggesting JSON serialization...)
However, since I am parsing the file and constructing the object at the same time, I don't know how deep the nesting will go until the parsing is complete.
I've tried using a simple Register class that can contain a dictionary of sub-Registers:
public class Register
{
public Dictionary<string, Register>? subRegisters;
public string name { get; set; }
public string value { get; set; }
}
However, the usage turns into super unintuitive syntax like motionRegister.subRegisters["Register1"].subRegisters["SubRegister1"].subRegisters["Value1"].Value and I'm duplicating information by using the name as the key.
I've also tried using only nested Dictionaries like:
public Dictionary<string, object> CreateRegisters()
{
Dictionary<string, object> TopLevelRegisters = new();
Dictionary<string, object> SubRegisters = new();
Dictionary<string, object> SubSubRegisters = new();
SubSubRegisters.Add("SubSubElement1", "5678");
SubRegisters.Add("SubElement1", "1234");
SubRegisters.Add("SubElement2", SubSubRegisters);
SubRegisters.Add("SubElement3", "1357");
TopLevelRegisters.Add("Register1", SubRegisters);
return TopLevelRegisters;
}
but they end up being super difficult to use since IntelliSense doesn't know what the object will be until runtime. I would return it as another Dictionary, but I don't know how deep the nesting will have to go.
I'm sure that there's a simple solution, but I can't seem to find it.

The closest thing I could come up with is to
Subclass Dictionary<>, and define the subclass in terms of itself (allows for arbitrary depth, and prevents the need for what you call the "unintuitive syntax" of a sub-dictionary manifesting in the path)
Hide the existing indexer with a new implementation (allows for auto-construction of a new level)
Provide a Value property for storing the value of the leaf node.
Provide a ToString() that returns Value (allows for the elimination of .Value from the syntax in certain cases, such as concatenation of strings, WriteLine, etc.)
NOTE: A Name property is dropped altogether because the name can be determined based on the dictionary key.
This code will look something like this:
public class RecursiveDictionary : Dictionary<string, RecursiveDictionary>
{
public string? Value { get; set; }
public override string? ToString() => Value;
public new RecursiveDictionary this[string key]
{
get
{
if (!TryGetValue(key, out var subDictionary))
base[key] = subDictionary = new RecursiveDictionary();
return subDictionary;
}
set => base[key] = value;
}
}
During parsing, you only have to write out each path to a terminal Value; or, if you keep track of where you are in the parsing, simply set the Value of the current (sub)dictionary. (Side note: it doesn't matter how you build it, or whether the source is proprietary (your case), JSON, or some other format.)
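A minimal sketch of the parsing side, assuming (hypothetically) that each register is written as one dotted path per line, such as PositionRegister.CurrentPosition.X_Coordinate=5. The real file format isn't shown, and RegisterLoader is an invented name; the class is repeated so the block compiles on its own.

```csharp
using System.Collections.Generic;

public class RecursiveDictionary : Dictionary<string, RecursiveDictionary>
{
    public string Value { get; set; }
    public override string ToString() => Value;

    // Getter auto-creates a missing level, so paths can be written in one go.
    public new RecursiveDictionary this[string key]
    {
        get
        {
            if (!TryGetValue(key, out var sub))
                base[key] = sub = new RecursiveDictionary();
            return sub;
        }
        set => base[key] = value;
    }
}

public static class RegisterLoader
{
    // Hypothetical input format: "A.B.C=value", one entry per line.
    public static RecursiveDictionary Load(IEnumerable<string> lines)
    {
        var root = new RecursiveDictionary();
        foreach (var line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue; // skip lines that carry no value
            RecursiveDictionary node = root;
            foreach (var key in line.Substring(0, eq).Split('.'))
                node = node[key]; // creates intermediate levels as needed
            node.Value = line.Substring(eq + 1);
        }
        return root;
    }
}
```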
Here's an example construction:
var rd = new RecursiveDictionary();
rd["x"].Value = "Hi!";
rd["x"]["y"].Value = "VALUE";
rd["a"]["b"]["c"]["d"].Value = "VALUETWO";
Notice, I didn't have to allocate RecursiveDictionary for every level; this is because the get portion of the indexer does that for me.
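A side effect of that auto-creating getter is that merely reading a misspelled key, such as rd["typo"], adds a new empty entry. If that matters, a lookup that goes through the inherited Dictionary.TryGetValue avoids it. A sketch, with RegisterPaths.TryGetPath as a hypothetical helper (the class is repeated so the block stands alone):

```csharp
using System.Collections.Generic;

public class RecursiveDictionary : Dictionary<string, RecursiveDictionary>
{
    public string Value { get; set; }
    public override string ToString() => Value;

    public new RecursiveDictionary this[string key]
    {
        get
        {
            if (!TryGetValue(key, out var sub))
                base[key] = sub = new RecursiveDictionary();
            return sub;
        }
        set => base[key] = value;
    }
}

public static class RegisterPaths
{
    // Read-only lookup: uses TryGetValue, so missing levels are never created.
    public static bool TryGetPath(RecursiveDictionary root, out RecursiveDictionary node, params string[] path)
    {
        node = root;
        foreach (var key in path)
        {
            if (!node.TryGetValue(key, out var next))
            {
                node = null;
                return false;
            }
            node = next;
        }
        return true;
    }
}
```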
With static IntelliSense (i.e. while the program is not running), you can hover over rd and see that it is a RecursiveDictionary,
hover over Value and see that it is a string,
and hover over one of the [ or ] and see that it is an indexer on the dictionary.
For the dynamic case (at runtime under the debugger), I'm not going to show all the code for the various accesses or the IntelliSense for them, but both can be emulated in a watch window. The same expressions could just as easily be lines of Console.WriteLine(...) with you hovering over the various locations.
CAVEAT: The ToString() might give you some unexpected results depending on the situation. Are you concatenating strings, are you viewing in the debugger, are you using in a WriteLine(), are you passing a sub-dictionary to another method, etc. If that becomes problematic, then go for a slightly less terse syntax that requires you to always get the value explicitly via the Value property.

C# - XML deserialization - ignore elements with attribute

I need to deserialize some xml to c# objects. This is my class:
[XmlRoot("root")]
[Serializable]
public class MyRoot
{
[XmlElement("category")]
public List<Category> Categories { get; set; }
}
I'm deserializing like this:
root = (MyRoot)new XmlSerializer(typeof(MyRoot)).Deserialize(new StringReader(client.DownloadString(XmlUrl)));
But I want to ignore some Category elements with specified "id" attribute values. Is there some way I can do this?

Implementing IXmlSerializable is one way to go, but perhaps an easier path would be simply modifying the XML (using LINQ or XSLT?) ahead of time:
HashSet<string> badIds = new HashSet<string>();
badIds.Add("1");
badIds.Add("excludeme");
XDocument xd = XDocument.Parse(client.DownloadString(XmlUrl));
var badCategories = xd.Root.Descendants("category").Where(x => badIds.Contains((string)x.Attribute("id")));
// Where() never returns null, and Remove() on an empty sequence does nothing
badCategories.Remove();
MyRoot root = (MyRoot)new XmlSerializer(typeof(MyRoot)).Deserialize(xd.Root.CreateReader());
You could do something similar on the resulting collection instead, but only if the id is deserialized, which you may not otherwise want or need.
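A self-contained sketch of the pre-filtering approach, with an in-memory XML string standing in for client.DownloadString(XmlUrl). The question doesn't show the Category class, so the id attribute and <name> child here are assumptions, as is the CategoryFilter name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical Category shape; the question does not show it.
public class Category
{
    [XmlAttribute("id")] public string Id { get; set; }
    [XmlElement("name")] public string Name { get; set; }
}

[XmlRoot("root")]
public class MyRoot
{
    [XmlElement("category")]
    public List<Category> Categories { get; set; }
}

public static class CategoryFilter
{
    public static MyRoot Load(string xml, ISet<string> badIds)
    {
        XDocument xd = XDocument.Parse(xml);
        // Remove() snapshots the matches before detaching them from the tree.
        xd.Root.Descendants("category")
          .Where(c => badIds.Contains((string)c.Attribute("id")))
          .Remove();
        var ser = new XmlSerializer(typeof(MyRoot));
        using var reader = xd.Root.CreateReader();
        return (MyRoot)ser.Deserialize(reader);
    }
}
```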

Another approach is to have a property named something like ImportCategories with the [XmlElement("category")] attribute, and then have Categories as a property that returns a filtered list from ImportCategories using LINQ.
Then your code would do the deserialization and then use root.Categories.
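A sketch of that idea. The excluded ids and the Category shape are hypothetical; [XmlIgnore] matters on Categories, because XmlSerializer otherwise treats a get-only collection property as something to fill.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical Category shape; the question does not show it.
public class Category
{
    [XmlAttribute("id")] public string Id { get; set; }
    [XmlElement("name")] public string Name { get; set; }
}

[XmlRoot("root")]
public class MyRoot
{
    // Hypothetical ids to hide; static members are not serialized.
    private static readonly HashSet<string> ExcludedIds = new HashSet<string> { "1", "excludeme" };

    // Filled by XmlSerializer from every <category> element.
    [XmlElement("category")]
    public List<Category> ImportCategories { get; set; } = new List<Category>();

    // Filtered view used by the rest of the code.
    [XmlIgnore]
    public List<Category> Categories =>
        ImportCategories.Where(c => !ExcludedIds.Contains(c.Id)).ToList();
}
```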

To do this the Microsoft way, you would need to implement the IXmlSerializable interface on the class that you want to serialize:
https://msdn.microsoft.com/en-us/library/system.xml.serialization.ixmlserializable(v=vs.110).aspx
It's going to require some hand-coding on your part: you basically have to implement the WriteXml and ReadXml methods, which give you an XmlWriter and an XmlReader respectively, to do what you need to do.
Just remember to keep your classes pretty atomic, so that you don't end up custom-serializing for the entire object graph (ugh).

XML LINQ query returns nothing

I'm trying to parse an xml file using LINQ, but as I understand the query returns null. (It's WP7)
Here's the code:
var resultQuery = from q in XElement.Parse(result).Elements("Question")
select new Question
{
QuestionId = q.Attribute("id").Value,
Type = q.Element("Question").Attribute("type").Value,
Subject = q.Element("Subject").Value,
Content = q.Element("Content").Value,
Date = q.Element("Date").Value,
Timestamp = q.Element("Timestamp").Value,
Link = q.Element("Link").Value,
CategoryId = q.Element("Category").Attribute("id").Value,
UserId = q.Element("UserId").Value,
UserNick = q.Element("UserNick").Value,
UserPhotoURL = q.Element("UserPhotoURL").Value,
NumAnswers = q.Element("NumAnswers").Value,
NumComments = q.Element("NumComments").Value,
};
"result" is the xml string, just like this one.
http://i48.tinypic.com/1ex5s.jpg (couldn't post properly formatted text so here's a pic : P )
Error:
http://i48.tinypic.com/2uyk2ok.jpg
Sorry if I haven't explained it properly, or if this has already been asked (I tried searching, but it didn't help).

You have run into an XML namespace problem. When you are just querying "Question", the string is translated into an XName with the default namespace. There are no elements in the default namespace in your XML, only elements in the urn:yahoo:answers namespace (see the top level element, where it says xmlns="urn:yahoo:answers").
You need to query the correct XML namespace, like this:
// XNamespace has no public constructor; it is converted implicitly from a string
XNamespace ns = "urn:yahoo:answers";
var resultQuery = from q in XElement.Parse(result).Elements(ns + "Question")
select q;
When picking out the individual properties, remember to add the namespace also.
XName is a class that represents an XML name, which might have a namespace defined by XNamespace. Both classes have an implicit conversion operator from string, which is why the calls work when you just specify a string name, but only when the elements are in the default (empty) namespace.
The implicitness makes it much easier to work with XML namespaces, but when you don't know the mechanism behind it, it gets confusing very quickly. The XName class documentation has some excellent examples.
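A small runnable illustration. The sample XML is a hypothetical minimal stand-in for the Yahoo! Answers response (the ResultSet root name is assumed; Question, Subject and the id attribute come from the question's code), and YahooQuestions is an invented name. Unprefixed attributes such as id are in no namespace, so they are looked up without ns.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class YahooQuestions
{
    public static List<(string Id, string Subject)> Read(string xml)
    {
        XNamespace ns = "urn:yahoo:answers";
        return (from q in XElement.Parse(xml).Elements(ns + "Question")
                // elements need ns; the unprefixed id attribute does not
                select (Id: (string)q.Attribute("id"),
                        Subject: (string)q.Element(ns + "Subject"))).ToList();
    }
}
```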

Two ways to fix it:
Start from the root element, since Elements only searches one level: XDocument.Parse(result).Root.Elements("Question")
Use the Descendants method since that will search the entire xml tree.

XML linq query lists first elements but not all

I have this XML file that I parse into its elements and create a list of a custom object Module.
XDocument kobra = XDocument.Load(new StringReader(results.OuterXml));
XNamespace ns = "#RowsetSchema";
var kobraNodeList = from s in kobra.Descendants(ns + "row")
select new Module
{
id = s.Attribute("ows_ID").Value,
name = s.Attribute("ows_Title").Value,
sourceFile = s.Attribute("ows_Source_x0020_Message_x0020_File_").Value,
scope = Scope.KOBRA,
component = string.Empty
};
and here's my Module struct:
public struct Module
{
public string name;
public Scope scope;
public string component;
public int wordCound;
public string id;
public string sourceFile;
}
The code works fine, but things get weird when I try to convert the var kobraNodeList into a list of Modules, I get a System.NullReferenceException at the AddRange line:
this.moduleList = new List<Module>();
this.moduleList.AddRange(kobraNodeList);
When trying to debug, I notice that although kobraNodeList.Count() also throws a System.NullReferenceException, kobraNodeList.Any() returns true, and kobraNodeList.First() returns a perfectly valid and correct Module struct with the desired data.
The XML file is valid, and if I replace the linq query with this:
var kobraNodeList = from s in kobra.Descendants(ns + "row")
select s;
I get a valid list of XElement, which I can Count() ok.
Can someone explain to me what's wrong? BTW, I'm using .NET 3.5.

That looks like one or more of the elements returned by kobra.Descendants is missing the ows_ID, ows_Title or ows_Source_x0020_Message_x0020_File_ attribute.
LINQ uses deferred execution, so it won't try to build the sequence until you ask for the items. When you call Any() or First(), it only needs the first item in the sequence, which tells me that the first element in kobra.Descendants does have all of the required attributes.
However, one of the items after the first is probably missing at least one of those attributes, so you end up asking for the Value of a null attribute.
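A small reproduction of that behaviour, assuming a hypothetical two-row document whose second row lacks ows_Title (the #RowsetSchema namespace is left out for brevity). DeferredDemo is an invented name. Any() and First() only evaluate the projection for the first row, while Count() runs it for every row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Same shape as the question's query: .Value is evaluated lazily, per row.
    public static IEnumerable<string> Titles(XDocument doc) =>
        from s in doc.Descendants("row")
        select s.Attribute("ows_Title").Value;

    // Count() has to run the projection on every row, reaching the bad one.
    public static bool CountThrows(XDocument doc)
    {
        try
        {
            Titles(doc).Count();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```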

Inside
select new Module
{
// properties...
}
You could be running into a NullReferenceException as you access .Value on elements that might not exist in the XML document. Your first object in the collection is likely fine, hence your results when using Any() or First(). Subsequent items could be missing elements/attributes you are trying to use.
Try this as a replacement instead of using .Value directly.
id = (string)s.Attribute("whatever") // etc.
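A sketch of the query rewritten with casts, using a trimmed-down hypothetical ModuleRow struct instead of the full Module (Scope and the other fields are omitted). A row missing an attribute now yields a null field rather than an exception, so ToList() (or AddRange) completes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down version of the question's Module struct.
public struct ModuleRow
{
    public string id;
    public string name;
    public string sourceFile;
}

public static class SafeQuery
{
    public static List<ModuleRow> Load(XDocument doc) =>
        (from s in doc.Descendants("row")
         select new ModuleRow
         {
             // (string) on a missing attribute gives null instead of throwing
             id = (string)s.Attribute("ows_ID"),
             name = (string)s.Attribute("ows_Title"),
             sourceFile = (string)s.Attribute("ows_Source_x0020_Message_x0020_File_")
         }).ToList();
}
```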

One of your lines such as s.Attribute("ows_Source_x0020_Message_x0020_File_") will be returning null for one of the records so s.Attribute("ows_Source_x0020_Message_x0020_File_").Value would cause the null reference exception.

Databinding question: DataGridView <=> XDocument (using LINQ-to-XML)

Learning LINQ has been a lot of fun so far, but despite reading a couple books and a bunch of online resources on the topic, I still feel like a total n00b. Recently, I just learned that if my query returns an Anonymous type, the DataGridView I'm populating will be ReadOnly (because, apparently Anonymous types are ReadOnly.)
Right now, I'm trying to figure out the easiest way to:
Get a subset of data from an XML file into a DataGridView,
Allow the user to edit said data,
Stick the changed data back into the XML file.
So far I have Steps 1 and 2 figured out:
public class Container
{
public string Id { get; set; }
public string Barcode { get; set; }
public float Quantity { get; set; }
}
// For use with the Distinct() operator
public class ContainerComparer : IEqualityComparer<Container>
{
public bool Equals(Container x, Container y)
{
return x.Id == y.Id;
}
public int GetHashCode(Container obj)
{
return obj.Id.GetHashCode();
}
}
var barcodes = (from src in xmldoc.Descendants("Container")
where src.Descendants().Count() > 0
select
new Container
{
Id = (string)src.Element("Id"),
Barcode = (string)src.Element("Barcode"),
Quantity = float.Parse((string)src.Element("Quantity").Attribute("value"))
}).Distinct(new ContainerComparer());
dataGridView1.DataSource = barcodes.ToList();
This works great at getting the data I want from the XML into the DataGridView so that the user has a way to manipulate the values.
Upon doing a Step-thru trace of my code, I'm finding that the changes to the values made in DataGridView are not bound to the XDocument object and as such, do not propagate back.
How do we take care of Step 3? (getting the data back to the XML) Is it possible to Bind the XML directly to the DataGridView? Or do I have to write another LINQ statement to get the data from the DGV back to the XDocument?
Suggestions?

So I think the problem that you have is that there is no relationship between the objects you are binding to and the XML source document.
What you are doing is creating a heap of objects, pushing in some strings and a float, and then binding the grid view to that list of objects. Each object only knows the data that was assigned to its properties; it has no knowledge of where that data came from. When you call "select new something()" you are creating a new object, and that new object doesn't know or care that it was created using LINQ to XML...
The easiest way I can think of to resolve it would be to change the setter of your container properties so that they load the XML, change the element they are supposed to represent, and then save the xml again. Perhaps giving the Container a reference to the element or document would make this easier.
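A sketch of the "give the Container a reference to the element" variant, assuming the element layout implied by the question's query (<Id>, <Barcode>, <Quantity value="...">). BoundContainer is a hypothetical name. Setters write straight into the XDocument, so after editing you would only need xmldoc.Save(...).

```csharp
using System.Globalization;
using System.Xml.Linq;

// Hypothetical Container variant that wraps the <Container> element it came from.
public class BoundContainer
{
    private readonly XElement _element;

    public BoundContainer(XElement element) { _element = element; }

    // Id is the key, so it is exposed read-only.
    public string Id => (string)_element.Element("Id");

    public string Barcode
    {
        get => (string)_element.Element("Barcode");
        set => _element.SetElementValue("Barcode", value);
    }

    // Invariant culture keeps "2.5" parseable regardless of the user's locale.
    public float Quantity
    {
        get => float.Parse((string)_element.Element("Quantity").Attribute("value"), CultureInfo.InvariantCulture);
        set => _element.Element("Quantity").SetAttributeValue("value", value.ToString(CultureInfo.InvariantCulture));
    }
}
```

Binding would then look like dataGridView1.DataSource = xmldoc.Descendants("Container").Where(src => src.Elements().Any()).Select(src => new BoundContainer(src)).ToList();. The Distinct() call is dropped here, because with duplicates an edit would only reach one of the matching elements.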
The other way would be to hook into the grid view events so that when rows are edited you can capture the changes and write them to the XML file.
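A sketch of the write-back step such a handler could call, assuming the question's Container class and that Id uniquely identifies a <Container> (which the Distinct() by Id already implies). ContainerWriter.WriteBack is a hypothetical name; it could be invoked from the grid's CellValueChanged event with the bound list, or once just before xmldoc.Save(...).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public class Container
{
    public string Id { get; set; }
    public string Barcode { get; set; }
    public float Quantity { get; set; }
}

public static class ContainerWriter
{
    // Copies edited values onto every <Container> element whose <Id> matches.
    public static void WriteBack(XDocument doc, IEnumerable<Container> edited)
    {
        var byId = edited.ToDictionary(c => c.Id);
        foreach (var element in doc.Descendants("Container"))
        {
            string id = (string)element.Element("Id");
            if (id == null || !byId.TryGetValue(id, out var c))
                continue; // not shown in the grid, leave untouched
            element.SetElementValue("Barcode", c.Barcode);
            element.Element("Quantity")?.SetAttributeValue(
                "value", c.Quantity.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```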
