Parsing XML file using C#?


I'm new to both XML and C#. I'm trying to find an efficient way to parse a given XML file and retrieve the relevant numerical values based on the "proj_title" value (heat_run or any other possible value). For example, I want to calculate the duration of a particular test run (proj_end value minus proj_start value).
ex.xml:
<proj ID="2">
<proj_title>heat_run</proj_title>
<proj_start>100</proj_start>
<proj_end>200</proj_end>
</proj>
...
We can't search by proj ID, since this value is not fixed from test run to test run. The file above is huge (~8 MB) and contains ~2000 elements named proj_title. Is there an efficient way to first find all proj_title elements whose value is "heat_run", and then retrieve the proj_start and proj_end values for each of them using C#?
Here's my current C# code:
using System.Xml;
public class Parser
{
    public static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load("ex.xml");
        // ~2000 elements named proj_title
        // Is there a more efficient way to look only for proj_title="heat_run"?
        XmlNodeList heat_run_nodes = xmlDoc.GetElementsByTagName("proj_title");
    }
}

8 MB really isn't very large at all by modern standards. Personally, I'd use LINQ to XML:
// Requires: using System; using System.Linq; using System.Xml.Linq;
XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
    .Where(x => (string) x == "heat_run")
    .Select(x => x.Parent) // Just for simplicity
    .Select(x => new {
        Start = (int) x.Element("proj_start"),
        End = (int) x.Element("proj_end")
    });
foreach (var project in projects)
{
    Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}
(Obviously adjust this to your own requirements - it's not really clear what you need to do based on the question.)
Alternative query:
var projects = doc.Descendants("proj")
.Where(x => (string) x.Element("proj_title") == "heat_run")
.Select(x => new {
Start = (int) x.Element("proj_start"),
End = (int) x.Element("proj_end")
});
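Since the question ultimately wants durations (proj_end minus proj_start), here is a minimal sketch along the lines of the alternative query that returns them directly. It assumes every matching proj element has integer proj_start and proj_end children; ProjectDurations and ForTitle are illustrative names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ProjectDurations
{
    // Returns (Start, End, Duration) for every <proj> whose <proj_title> equals the title.
    // Assumes each matching <proj> has integer <proj_start> and <proj_end> children;
    // the (int) casts throw if either element is missing.
    public static List<(int Start, int End, int Duration)> ForTitle(XDocument doc, string title)
    {
        return doc.Descendants("proj")
            .Where(p => (string)p.Element("proj_title") == title)
            .Select(p => (Start: (int)p.Element("proj_start"), End: (int)p.Element("proj_end")))
            .Select(t => (t.Start, t.End, Duration: t.End - t.Start))
            .ToList();
    }
}
```

For example, ProjectDurations.ForTitle(XDocument.Load("ex.xml"), "heat_run") yields one entry per heat_run block, and passing a different title covers the "any other possible values" case.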

You can use XPath to find all nodes that match, for example:
XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");
matches will contain all proj nodes that match the criteria. The leading // searches the whole document, whereas a bare proj would only match the root element. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp
See also the MSDN documentation on SelectNodes.
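A minimal XmlDocument/XPath sketch that computes each heat_run duration. It uses the descendant axis //proj[...] so proj elements are found at any depth, and it assumes the title contains no apostrophe, since the value is spliced into the XPath string. XPathDurations is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Xml;

static class XPathDurations
{
    // Selects every <proj> at any depth whose <proj_title> child equals the title and
    // returns proj_end - proj_start for each. Assumes both children are present.
    public static List<int> ForTitle(XmlDocument xmlDoc, string title)
    {
        var durations = new List<int>();
        XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='" + title + "']");
        foreach (XmlNode proj in matches)
        {
            // XmlConvert parses culture-invariantly, matching how XML stores numbers.
            int start = XmlConvert.ToInt32(proj.SelectSingleNode("proj_start").InnerText);
            int end = XmlConvert.ToInt32(proj.SelectSingleNode("proj_end").InnerText);
            durations.Add(end - start);
        }
        return durations;
    }
}
```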

Use XDocument and the LINQ API.
http://msdn.microsoft.com/en-us/library/bb387098.aspx
If the performance is not what you expect after trying it, you'll have to look for a SAX parser.
A SAX parser does not load the whole document into memory and then apply an XPath expression to it. It works in an event-driven way, and in some cases this can be a lot faster and use much less memory.
There are probably SAX parsers for .NET out there; I haven't used them myself for .NET, but I have for C++.
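In .NET the closest built-in relative of a SAX parser is XmlReader: a forward-only, pull-based reader that never holds the whole document in memory. A minimal sketch, assuming the same proj/proj_title/proj_start/proj_end layout as the question; StreamingDurations is a hypothetical name. It combines XmlReader with XNode.ReadFrom so each proj element can still be queried with LINQ to XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StreamingDurations
{
    // Forward-only scan: only the current <proj> is turned into an XElement,
    // so memory use does not grow with the size of the file.
    public static List<int> ForTitle(TextReader input, string title)
    {
        var durations = new List<int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "proj")
                {
                    // ReadFrom consumes <proj>...</proj> and leaves the reader on the
                    // following node, so Read() must not be called again here.
                    var proj = (XElement)XNode.ReadFrom(reader);
                    if ((string)proj.Element("proj_title") == title)
                        durations.Add((int)proj.Element("proj_end") - (int)proj.Element("proj_start"));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return durations;
    }
}
```

For the question's file it could be called with a File.OpenText("ex.xml") reader inside a using block. The loop positions the reader manually instead of calling ReadToFollowing after ReadFrom, so adjacent proj elements with no whitespace between them are not skipped.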

Related

How to search entire XML file for keyword?

I'm learning C#, and one of the things I'm trying to do is read in an XML file and search it.
I have found a few examples where I can search specific nodes (a name or ISBN, for example) for specific keywords.
What I'm looking to do is search the entire XML file to find all possible matches of a keyword.
I know a List has a Contains method for finding items; is there a similar function for searching an XML file?
I'm using the sample books.xml file that is included when Visual Studio is installed.

If you only want to search for keywords that appear in leaf nodes' text, try the following
(using this sample books.xml):
// Requires: using System; using System.Linq; using System.Xml.Linq;
string keyword = "com";
var doc = XDocument.Load("books.xml");
var query = doc.Descendants()
    .Where(x => !x.HasElements &&
                x.Value.IndexOf(keyword, StringComparison.InvariantCultureIgnoreCase) >= 0);
foreach (var element in query)
    Console.WriteLine(element);
Output:
<genre>Computer</genre>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
<genre>Computer</genre>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>

If the keyword is already known, you can treat the XML as a plain text file and scan it with a StreamReader. But if you are looking for an element in the XML, you can use XmlTextReader; consider this example:
// Requires: using System.Xml;
// (Since .NET 2.0, XmlReader.Create(xmlPath) is the recommended way to create a reader.)
using (XmlTextReader reader = new XmlTextReader(xmlPath))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            // do your code here, e.g. inspect reader.Name
        }
    }
}
Hope it helps. :)
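Building on the reader loop above, a sketch of a streaming keyword search: it tracks the currently open elements on a stack and reports the name of the element whose own text contains the keyword, ignoring case. It assumes only text and CDATA content needs searching (attributes are skipped); KeywordSearch is a hypothetical name, and XmlReader.Create is used in place of XmlTextReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class KeywordSearch
{
    // Returns the name of the innermost open element for each text/CDATA node that
    // contains the keyword (case-insensitive); one entry per matching text node.
    public static List<string> ElementsContaining(TextReader input, string keyword)
    {
        var hits = new List<string>();
        var open = new Stack<string>();  // names of the elements currently open
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Empty elements like <a/> produce no EndElement, so don't push them.
                        if (!reader.IsEmptyElement) open.Push(reader.Name);
                        break;
                    case XmlNodeType.EndElement:
                        open.Pop();
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        if (reader.Value.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                            hits.Add(open.Peek());
                        break;
                }
            }
        }
        return hits;
    }
}
```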

For example, you could use LINQ to XML. This example searches for the keyword in both elements and attributes, matching their names and values exactly:
// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
private static IEnumerable<XElement> FindElements(string filename, string name)
{
    XElement x = XElement.Load(filename);
    // Exact, case-sensitive matches on element/attribute local names and values
    return x.Descendants()
        .Where(e => e.Name.LocalName.Equals(name) ||
                    e.Value.Equals(name) ||
                    e.Attributes().Any(a => a.Name.LocalName.Equals(name) ||
                                            a.Value.Equals(name)));
}
and use it:
string s = "search value";
foreach (XElement x in FindElements("In.xml", s))
Console.WriteLine(x.ToString());
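FindElements above only reports exact matches. If "all possible matches" should include substrings, one variant (a sketch; XmlKeywordFinder is a hypothetical name) checks element names, attribute names and values, and each element's own direct text, ignoring case. Looking only at direct text nodes avoids reporting every ancestor of a matching leaf, because XElement.Value concatenates all descendant text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XmlKeywordFinder
{
    static bool Has(string text, string keyword) =>
        text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;

    // Elements whose name, own direct text, or any attribute name/value contains the keyword.
    public static IEnumerable<XElement> FindContaining(XElement root, string keyword)
    {
        return root.DescendantsAndSelf().Where(e =>
            Has(e.Name.LocalName, keyword) ||
            e.Nodes().OfType<XText>().Any(t => Has(t.Value, keyword)) ||
            e.Attributes().Any(a => Has(a.Name.LocalName, keyword) || Has(a.Value, keyword)));
    }
}
```

Usage would look like XmlKeywordFinder.FindContaining(XElement.Load("books.xml"), "com").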

Update XML file from List<T>

I have an XML file containing records like this:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfCLocation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<CLocation>
<CId>5726</CId>
<Long>0</Long>
<Lat>0</Lat>
<Status>Pending</Status>
</CLocation>
<CLocation>
<CId>5736</CId>
<Long>0</Long>
<Lat>0</Lat>
<Status>Processed</Status>
</CLocation>
</ArrayOfCLocation>
I load these records into a List like this (Location stands for my record class):
XDocument xDocument = XDocument.Load(filePath);
List<Location> list = xDocument.Descendants("CLocation")
    .Select(c => new Location
    {
        CId = Convert.ToInt32(c.Descendants("CId").FirstOrDefault().Value),
        Lat = Convert.ToDouble(c.Descendants("Lat").FirstOrDefault().Value),
        Long = Convert.ToDouble(c.Descendants("Long").FirstOrDefault().Value),
        Status = (Status)Enum.Parse(typeof(Status), c.Descendants("Status").FirstOrDefault().Value)
    })
    .Where(c => c.Status == Status.Pending)
    .Take(listCount)
    .ToList();
Now I update the objects in the collection above (setting their Lat/Long fields),
and after processing them I want to write these records back into the XML file.
Can anyone guide me to an efficient way to update these objects back into the XML file?

You could do something like this:
foreach (var location in list)
{
var elem = xDocument.Root.Elements()
.Single(e => (int)e.Element("CId") == location.CId);
elem.Element("Long").ReplaceNodes(location.Long);
elem.Element("Lat").ReplaceNodes(location.Lat);
}
You can then save the modified xDocument back to the file with xDocument.Save(filePath).
If you find this is not efficient enough, there are several ways to speed things up. For example, create a Dictionary of elements keyed by CId, so that the whole document is not searched every time.
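A sketch of the Dictionary idea, assuming CId values are unique (ToDictionary throws on duplicates). The Location class here is a hypothetical stand-in containing only the members the update needs; in real code it would be the poster's own type.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the poster's record class (only the members used here).
class Location
{
    public int CId { get; set; }
    public double Lat { get; set; }
    public double Long { get; set; }
}

static class LocationUpdater
{
    // Builds a CId -> <CLocation> index once, then updates each element with a
    // dictionary lookup instead of rescanning the whole document per object.
    public static void Apply(XDocument xDocument, IEnumerable<Location> list)
    {
        Dictionary<int, XElement> byId = xDocument.Root.Elements("CLocation")
            .ToDictionary(e => (int)e.Element("CId"));
        foreach (Location location in list)
        {
            XElement elem;
            if (!byId.TryGetValue(location.CId, out elem)) continue;  // skip unknown ids
            elem.SetElementValue("Long", location.Long);
            elem.SetElementValue("Lat", location.Lat);
        }
    }
}
```

After LocationUpdater.Apply(xDocument, list), xDocument.Save(filePath) writes the changes out. SetElementValue formats the doubles with XmlConvert, so the output does not depend on the current culture.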
But if you have huge files, loading them whole into memory might not be possible or a good idea. Using XmlReader and XmlWriter will work for files of any size, but they are not as easy to use.
Another option to consider is XML serialization. It's made specifically for converting XML into your objects and back.
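A minimal XmlSerializer sketch for the file in the question. It assumes a public CLocation class whose property names match the element names; XmlSerializer names the root of a List<CLocation> ArrayOfCLocation by default, which is exactly the root the sample file uses. CLocationFile is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public enum Status { Pending, Processed }

// Property names and order mirror the sample file's <CLocation> children.
public class CLocation
{
    public int CId { get; set; }
    public double Long { get; set; }
    public double Lat { get; set; }
    public Status Status { get; set; }
}

public static class CLocationFile
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(List<CLocation>));

    // Reads the whole <ArrayOfCLocation> document into objects.
    public static List<CLocation> Read(TextReader input) =>
        (List<CLocation>)Serializer.Deserialize(input);

    // Writes the (possibly modified) list back out as <ArrayOfCLocation>.
    public static void Write(TextWriter output, List<CLocation> locations) =>
        Serializer.Serialize(output, locations);
}
```

This replaces element-by-element edits with "read everything, change the objects, write everything", at the cost of rewriting the entire file on each save.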
Also, the code you have could be simplified quite a lot, and in the process made faster:
List<Location> list = xDocument.Root.Elements("CLocation")
    .Select(c => new Location
    {
        CId = (int)c.Element("CId"),
        Lat = (double)c.Element("Lat"),
        Long = (double)c.Element("Long"),
        Status = (Status)Enum.Parse(typeof(Status), c.Element("Status").Value)
    })
    .Where(c => c.Status == Status.Pending)
    .Take(listCount)
    .ToList();

C#, XML parsing: get data between tags

I have a string :
responsestring = "<?xml version="1.0" encoding="utf-8"?>
<upload><image><name></name><hash>SOmetext</hash>"
How can I get the value between <hash> and </hash>?
My attempts :
responseString.Substring(responseString.LastIndexOf("<hash>") + 6, 8); // this sort of works , but won't work in every situation.
I also tried messing around with XmlReader, but couldn't find the solution.
Thanks.

Try
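// Requires: using System.Linq; using System.Xml.Linq;
XDocument doc = XDocument.Parse(responseString);
var a = from hash in doc.Descendants("hash")
        select hash.Value;
You will need references to the System.Core and System.Xml.Linq assemblies. Note that Parse throws an XmlException unless the string is well-formed XML, so the sample string needs its closing </image> and </upload> tags.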

Others have suggested LINQ to XML solutions, which is what I'd use as well, if possible.
If you're stuck with .NET 2.0, use XmlDocument or even XmlReader.
But don't try to manipulate the raw string yourself using Substring and IndexOf. Use an XML API of some description. Otherwise you will get it wrong. It's a matter of using the right tool for the job. Parsing XML properly is a significant chunk of work - work that's already been done.
Now, just to make this a full answer, here's a short but complete program using your sample data:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
string response = @"<?xml version='1.0' encoding='utf-8'?>
<upload><image><name></name><hash>Some text</hash></image></upload>";
XDocument doc = XDocument.Parse(response);
foreach (XElement hashElement in doc.Descendants("hash"))
{
string hashValue = (string) hashElement;
Console.WriteLine(hashValue);
}
}
}
Obviously that will loop over all the hash elements. If you only want one, you could use doc.Descendants("hash").Single() or doc.Descendants("hash").First() depending on your requirements.
Note that both the conversion I've used here and the Value property will return the concatenation of all text nodes within the element. Hopefully that's okay for you - or you could get just the first text node which is a direct child if necessary.
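A sketch of that last suggestion: return only the first text node that is a direct child of the element, rather than the concatenation of all descendant text. HashText is a hypothetical name; CDATA sections are included because XCData derives from XText.

```csharp
using System.Linq;
using System.Xml.Linq;

static class HashText
{
    // (string)element concatenates ALL descendant text; this returns only the first
    // text node that is a direct child, or null if there is none.
    public static string FirstDirectText(XElement element) =>
        element.Nodes().OfType<XText>().Select(t => t.Value).FirstOrDefault();
}
```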

var val = XElement.Parse(responseString);
string hash = val.Descendants("hash").First().Value;

Make sure your XML is well-formed and escape the double quotes with backslashes. Then apply the following code:
XDocument resp = XDocument.Parse("<hash>SOmetext</hash>");
var r= from element in resp.Elements()
where element.Name == "hash"
select element;
foreach (var item in r)
{
Console.WriteLine(item.Value);
}

You can use an XmlReader and/or XPath queries to get all the data you need.

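// reader: an XmlReader over the response, e.g. XmlReader.Create(new StringReader(responseString))
string value = reader.ReadToFollowing("hash") ? reader.ReadElementContentAsString() : null;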
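A self-contained sketch of both suggestions, using the well-formed sample from the complete program above. HashReader is a hypothetical name. ReadElementContentAsString returns the decoded text, whereas ReadInnerXml returns markup (so entities such as &amp; stay escaped).

```csharp
using System.IO;
using System.Xml;

static class HashReader
{
    // Streaming: stop at the first <hash>; returns null if there is none.
    public static string WithXmlReader(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return reader.ReadToFollowing("hash") ? reader.ReadElementContentAsString() : null;
        }
    }

    // XPath: load the whole document and select the first <hash> at any depth.
    public static string WithXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//hash");
        return node == null ? null : node.InnerText;
    }
}
```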

Updating XmlDocument using linq (possibly)

I have an XmlDocument object in memory, here is a sample of the data:
<terms>
<letter name="F">
<term>
<name>fascículo</name>
<translation language="en">fascicle</translation>
<definition>pequeño paquete de fibras nerviosas o musculares</definition>
</term>
(There are many terms in the actual document)
I want to be able to find a term by its name node, and then add an element as a child of the term
<terms>
<letter name="F">
<term>
<name>fascículo</name>
<translation language="en">fascicle</translation>
<definition>pequeño paquete de fibras nerviosas o musculares</definition>
<image>hi there</image>
</term>
Now, I can achieve this using XPath: find the node, then create a new node with the new values, blah blah.
But that all seems a little long-winded in the world of LINQ.
This is what I have so far:
private static XmlDocument AddImages(XmlDocument termDoc)
{
XDocument xDoc = XDocument.Load(new XmlNodeReader(termDoc));
using (CsvReader csv = new CsvReader(new StreamReader("spa2engimages.csv"), false))
{
csv.ReadNextRecord();
csv.ReadNextRecord();
XElement selectedTerm;
string name, imageref;
while (csv.ReadNextRecord())
{
imageref = csv[0].ToString();
name = csv[3].ToString();
selectedTerm = xDoc.Descendants("term").Single(t => t.Descendants("name").Single().Value == name);
//now want to add a new node and save it back in to the termDoc somehow
}
}
return termDoc;
}
But I am a bit lost from there. Any ideas?

The following will add the element for you
xDoc.Descendants("term").Single(t => t.Descendants("name").Single().Value == name).Add(new XElement("image", "hi there"));
The biggest issue I see, and what makes this clunky, is that you need to switch back and forth between XmlDocument and XDocument. My recommendation: if you are going to use XmlDocument, use XPath; if you want to use LINQ, use XDocument. This constant switching will kill performance and maintainability.
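A sketch of the XDocument-only route recommended above. It uses FirstOrDefault rather than Single, so a CSV row with no matching term is skipped instead of throwing; TermImages is a hypothetical name. If the rest of the code really needs an XmlDocument, the modified XDocument can be copied back once at the end through XNode.CreateReader, rather than switching on every row.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class TermImages
{
    // Finds the <term> whose <name> equals termName and appends <image>imageRef</image>.
    // Returns false (instead of throwing) when no such term exists.
    public static bool AddImage(XDocument xDoc, string termName, string imageRef)
    {
        XElement term = xDoc.Descendants("term")
            .FirstOrDefault(t => (string)t.Element("name") == termName);
        if (term == null) return false;
        term.Add(new XElement("image", imageRef));
        return true;
    }

    // Copies the finished XDocument into a new XmlDocument, once, at the end.
    public static XmlDocument ToXmlDocument(XDocument xDoc)
    {
        var xmlDoc = new XmlDocument();
        using (XmlReader reader = xDoc.CreateReader())
        {
            xmlDoc.Load(reader);
        }
        return xmlDoc;
    }
}
```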

This is how to do it with XPath, just for clarity (imageNode being an XmlElement created with termDoc.CreateElement("image")):
termDoc.SelectSingleNode("//term[name='" + name + "']").AppendChild(imageNode);
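Concatenating name into the XPath breaks when the name contains an apostrophe. An XmlDocument-only sketch that sidesteps this by selecting every term and comparing the name in C#; TermImagesXPath is a hypothetical name, and the linear scan is an assumption that suits moderate document sizes.

```csharp
using System.Xml;

static class TermImagesXPath
{
    // Appends <image>imageRef</image> to the first <term> whose <name> equals name.
    // The name is compared in C#, so apostrophes in it cannot break the XPath.
    public static bool AddImage(XmlDocument termDoc, string name, string imageRef)
    {
        foreach (XmlElement term in termDoc.SelectNodes("//term"))
        {
            XmlNode nameNode = term.SelectSingleNode("name");
            if (nameNode != null && nameNode.InnerText == name)
            {
                XmlElement image = termDoc.CreateElement("image");
                image.InnerText = imageRef;
                term.AppendChild(image);
                return true;  // stop before the node list is iterated further
            }
        }
        return false;
    }
}
```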

Converting one XML document into another XML document

I want to convert an XML document containing many elements within a node (around 150) into another XML document with a slightly different schema but mostly the same element names. Do I have to manually map each element/node between the two documents? That would mean hardcoding 150 lines of mapping and element names, something like this:
XElement newOrder = new XElement("Order");
newOrder.Add(new XElement("OrderId", (string)oldOrder.Element("OrderId")));
newOrder.Add(new XElement("OrderName", (string)oldOrder.Element("OrderName")));
// ... and so on for each of the ~150 elements
The newOrder document may contain additional nodes, which will be set to null if nothing is found for them in oldOrder. So do I have any choice other than hardcoding 150 element names like OrderId, OrderName and so on, or is there a better, more maintainable way?

Use an XSLT transform instead. You can use the built-in .NET XslCompiledTransform class to do the transformation, which saves you from having to type out stacks of code. If you don't already know XSL/XSLT, learning it is something that'll look good on your CV :)
Good luck!

Use an XSLT transformation to translate your old xml document into the new format.
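A minimal sketch of the XSLT approach with XslCompiledTransform. The identity template copies every node unchanged, so only elements whose names or structure differ need a template of their own; the OldOrderId to OrderId rename is a hypothetical example, not taken from the question. OrderTransform is likewise a hypothetical name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class OrderTransform
{
    // Identity transform plus one override: everything is copied as-is except
    // <OldOrderId>, which is emitted as <OrderId> (a hypothetical rename).
    const string Xslt = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='OldOrderId'>
    <OrderId><xsl:apply-templates select='@*|node()'/></OrderId>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string oldXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(Xslt)))
            xslt.Load(stylesheet);

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(oldXml)))
            xslt.Transform(input, null, output);
        return output.ToString();
    }
}
```

With this layout, adding or renaming a handful of elements means adding a few templates, while the other ~150 elements pass through untouched.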

XElement.Add has an overload that takes object[].
List<string> elementNames = GetElementNames();
newOrder.Add(
elementNames
.Select(name => GetElement(name, oldOrder))
.Where(element => element != null)
.ToArray()
);
//
public XElement GetElement(string name, XElement source)
{
XElement result = null;
XElement original = source.Elements(name).FirstOrDefault();
if (original != null)
{
result = new XElement(name, (string)original);
}
return result;
}
