How to search entire XML file for keyword?

How to search entire XML file for keyword? - c#

Im learning C# and one of the things I am trying to do is read in an XML file and search it.
I have found a few examples where I can search specific nodes (if its a name or ISBN for example) for specific key words.
What I was looking to do was to search the entire XML file in order to find all possible matches of a keyword.
I know LIST allows a "contains" to find keywords, is there a similar function for searching an XML file?
Im using the generic books.xml file that is included when visual studio is installed.

If you only want to search for keyword appear in leaf nodes' text, try the following
(using this sample books.xml):
string keyword = "com";
var doc = XDocument.Load("books.xml");
var query = doc.Descendants()
.Where(x => !x.HasElements &&
x.Value.IndexOf(keyword, StringComparison.InvariantCultureIgnoreCase) >= 0);
foreach (var element in query)
Console.WriteLine(element);
Output:
<genre>Computer</genre>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
<genre>Computer</genre>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>

If you are looking for a keyword that you already know, you can parse the XML just as simple text file and use StreamReader for parsing it. But if you are looking for an element in XML you can use XmlTextReader(), please consider this example:
using (XmlTextReader reader = new XmlTextReader(xmlPath))
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
//do your code here
}
}
}
Hope it helps. :)

For example, you could use LINQ TO XML. This example searches for the keyword both in elements and attributes - in their names and values.
private static IEnumerable<XElement> FindElements(string filename, string name)
{
XElement x = XElement.Load(filename);
return x.Descendants()
.Where(e => e.Name.ToString().Equals(name) ||
e.Value.Equals(name) ||
e.Attributes().Any(a => a.Name.ToString().Equals(name) ||
a.Value.Equals(name)));
}
and use it:
string s = "search value";
foreach (XElement x in FindElements("In.xml", s))
Console.WriteLine(x.ToString());

Related

How can I find a specific XML element programmatically?

I have this chunk of XML
<EnvelopeStatus>
<CustomFields>
<CustomField>
<Name>Matter ID</Name>
<Show>True</Show>
<Required>True</Required>
<Value>3</Value>
</CustomField>
<CustomField>
<Name>AccountId</Name>
<Show>false</Show>
<Required>false</Required>
<Value>10804813</Value>
<CustomFieldType>Text</CustomFieldType>
</CustomField>
I have this code below:
// TODO find these programmatically rather than a strict path.
var accountId = envelopeStatus.SelectSingleNode("./a:CustomFields", mgr).ChildNodes[1].ChildNodes[3].InnerText;
var matterId = envelopeStatus.SelectSingleNode("./a:CustomFields", mgr).ChildNodes[0].ChildNodes[3].InnerText;
The problem is, sometimes the CustomField with 'Matter ID' might not be there. So I need a way to find the element based on what 'Name is', i.e. a programmatic way of finding it. I can't rely on indexes being accurate.

You can use this code to read innertext from a specific element:
XmlDocument doc = new XmlDocument();
doc.Load("your.xml");
XmlNodeList Nodes= doc.SelectNodes("/EnvelopeStatus/CustomField");
if (((Nodes!= null) && (Nodes.Count > 0)))
{
foreach (XmlNode Level1 in Nodes)
{
if (Level1.ChildNodes[1].Name == "name")
{
string text = Convert.ToInt32(Level1.ChildNodes[1].InnerText.ToString());
}
}
}

You can often find anything in a XML document by utilizing the XPath capabilities that is available directly in the .NET Framework versions.
Maybe create a small XPath parser helper class
public class EnvelopeStatusParser
{
public XmlNodeList GetNodesWithName(XmlDocument doc, string name)
{
return doc.SelectNodes($"//CustomField[Name[text()='{name}']]");
}
}
and then call it like below to get all CustomFields which have a Name that equals what you need to search for
// Creating the XML Document in some form - here reading from file
XmlDocument doc = new XmlDocument();
doc.Load(#"envelopestatus.xml");
var parser = new EnvelopeStatusParser();
var matchingNodes = parser.GetNodesWithName(doc, "Matter ID");
Console.WriteLine(matchingNodes);
matchingNodes = parser.GetNodesWithName(doc, "NotHere");
Console.WriteLine(matchingNodes);
There exist numerous XPath cheat sheets around - like this one from LaCoupa - xpath-cheatsheet which can be quiet helpful to fully utilize XPath on XML structures.

output xml with a certain attribute

Sorry for being a total noob here, but maybe s/o can point me in the right direction!?
I got code off the net and am trying to make it work, but it will not :(
the xml is this:
<bla>
<blub>
<this that="one">test one</this>
<this that="two">test two</this>
<this that="three">test three</this>
</blub>
</bla>
And I want to have "this" displayed, where "that" == "two", as you can probably tell ;)
but it will only work for the first (one"), not for "two" or "three".
Why does it not continue to the 2nd and 3rd element? Thanks for any advice!
XElement tata = XElement.Load(#"\\somewhere\test.xml");
var tutu = from titi in tata.Elements("blub")
where (string)titi.Element("this").Attribute("that") == "two"
select titi.Element("this");
foreach (XElement soso in tutu)
{
Console.WriteLine(soso.Value);
}

Your problem is here:
where (string)titi.Element("this").Attribute("that") == "two"
In particular the Element(), which gets the first (in document order) child element with the specified XName which happens to be test one in your case.
Instead, use XDocument rather than XElement and look at all elements in the document:
XDocument tata = XDocument.Load(#"\\somewhere\test.xml");
var tutu = from titi in tata.Descendants("this")
where titi.Attribute("that").Value == "two"
select titi;

Parsing XML file using C#?

I'm new to both XML and C#; I'm trying to find a way to efficiently parse a given xml file to retrieve relevant numerical values, base on the "proj_title" value=heat_run or any other possible values. For example, calculating the duration of a particular test run (proj_end val-proj_start val).
ex.xml:
<proj ID="2">
<proj_title>heat_run</proj_title>
<proj_start>100</proj_start>
<proj_end>200</proj_end>
</proj>
...
We can't search by proj ID since this value is not fixed from test run to test run. The above file is huge: ~8mb, and there's ~2000 tags w/ the name proj_title. is there an efficient way to first find all tag names w/ proj_title="heat_run", then to retrieve the proj start and end value for this particular proj_title using C#??
Here's my current C# code:
public class parser
{
public static void Main()
{
XmlDocument xmlDoc= new XmlDocument();
xmlDoc.Load("ex.xml");
//~2000 tags w/ proj_title
//any more efficient way to just look for proj_title="heat_run" specifically?
XmlNodeList heat_run_nodes=xmlDoc.GetElementsByTagName("proj_title");
}
}

8MB really isn't very large at all by modern standards. Personally I'd use LINQ to XML:
XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
.Where(x => (string) x == "heat_run")
.Select(x => x.Parent) // Just for simplicity
.Select(x => new {
Start = (int) x.Element("proj_start"),
End = (int) x.Element("proj_end")
});
foreach (var project in projects)
{
Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}
(Obviously adjust this to your own requirements - it's not really clear what you need to do based on the question.)
Alternative query:
var projects = doc.Descendants("proj")
.Where(x => (string) x.Element("proj_title") == "heat_run")
.Select(x => new {
Start = (int) x.Element("proj_start"),
End = (int) x.Element("proj_end")
});

You can use XPath to find all nodes that match, for example:
XmlNodeList matches = xmlDoc.SelectNodes("proj[proj_title='heat_run']")
matches will contain all proj nodes that match the critera. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp
MSDN Documentation on SelectNodes

Use XDocument and use the LINQ api.
http://msdn.microsoft.com/en-us/library/bb387098.aspx
If the performance is not what you expect after trying it, you have to look for a sax parser.
A Sax parser will not load the whole document in memory and try to apply an xpath expression on everything in memory. It works more in an event driven approach and in some cases this can be a lot faster and does not use as much memory.
There are probably sax parsers for .NET around there, haven't used them myself for .NET but I did for C++.

Reading a single node from XML file and using it as a condition

I am simply trying to read a particular node from an XML and use it as a string variable in a condition. This gets me to the XML file and gives me the whole thing.
string url = #"http://agent.mtconnect.org/current";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(url);
richTextBox1.Text = xmlDoc.InnerXml;
But I need the power state "ON" of "OFF" (XML section below, can view the whole XML online)
<Events><PowerState dataItemId="p2" timestamp="2013-03-11T12:27:30.275747" name="power" sequence="4042868976">ON</PowerState></Events>
I have tried everything I know of. I am just not that familiar with XML files. and the other posts get me nowhere.
HELP PLEASE!

You may try LINQ2XML for that:
string value = (string) (XElement.Load("http://agent.mtconnect.org/current")
.Descendants().FirstOrDefault(d => d.Name.LocalName == "PowerState"))

If you wanted to avoid LINQ, or if it is not working for you you can use straight XML traversal for this:
string url = #"http://agent.mtconnect.org/current";
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.Load(url);
System.Xml.XmlNamespaceManager theNameManager = new System.Xml.XmlNamespaceManager(xmlDoc.NameTable);
theNameManager.AddNamespace("mtS", "urn:mtconnect.org:MTConnectStreams:1.2");
theNameManager.AddNamespace("m", "urn:mtconnect.org:MTConnectStreams:1.2");
theNameManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
System.Xml.XmlElement DeviceStreams = (System.Xml.XmlElement)xmlDoc.SelectSingleNode("descendant::mtS:DeviceStream", theNameManager);
System.Xml.XmlNodeList theStreams = DeviceStreams.SelectNodes("descendant::mtS:ComponentStream", theNameManager);
foreach (System.Xml.XmlNode CompStream in theStreams)
{
if (CompStream.Attributes["component"].Value == "Electric")
{
System.Xml.XmlElement EventElement = (System.Xml.XmlElement)CompStream.SelectSingleNode("descendant::mtS:Events", theNameManager);
System.Xml.XmlElement PowerElement = (System.Xml.XmlElement)EventElement.SelectSingleNode("descendant::mtS:PowerState", theNameManager);
Console.Out.WriteLine(PowerElement.InnerText);
Console.In.Read();
}
}
When traversing any document with a default namespace in the root node, I have found it is imperative to have a namespace manager. Without it the document is just un-navigable.
I created this code in a console application. It worked for me. Also I am no guru and I may be making some mistakes here. I am not sure if there is some way to have the default namespace referenced without naming it (mtS). Anyone who knows how to make this cleaner or more efficient please comment.
EDIT:
For one less level of 'clunk' you can change this:
if (CompStream.Attributes["component"].Value == "Electric")
{
Console.Out.WriteLine(((System.Xml.XmlElement)CompStream.SelectSingleNode("descendant::mtS:Events", theNameManager)).InnerText;);
Console.In.Read();
}
because there is only one element in there and its innerText is all you will get.

How to read XML attributes in C#?

I have several XML files that I wish to read attributes from. My main objective is to apply syntax highlighting to rich text box.
For example in one of my XML docs I have: <Keyword name="using">[..] All the files have the same element: Keyword.
So, how can I get the value for the attribute name and put them in a collection of strings for each XML file.
I am using Visual C# 2008.

The other answers will do the job - but the syntax highlighting thingy and the several xml files you say you have makes me thinks you need something faster, why not use a lean and mean XmlReader?
private string[] getNames(string fileName)
{
XmlReader xmlReader = XmlReader.Create(fileName);
List<string> names = new List<string>();
while (xmlReader.Read())
{
//keep reading until we see your element
if (xmlReader.Name.Equals("Keyword") && (xmlReader.NodeType == XmlNodeType.Element))
{
// get attribute from the Xml element here
string name = xmlReader.GetAttribute("name");
// --> now **add to collection** - or whatever
names.Add(name);
}
}
return names.ToArray();
}
Another good option would be the XPathNavigator class - which is faster than XmlDoc and you can use XPath.
Also I would suggest to go with this approach only IFF after you try with the straightforward options you're not happy with performance.

You could use XPath to get all the elements, then a LINQ query to get the values on all the name atttributes you find:
XDocument doc = yourDocument;
var nodes = from element in doc.XPathSelectElements("//Keyword")
let att = element.Attribute("name")
where att != null
select att.Value;
string[] names = nodes.ToArray();
The //Keyword XPath expression means, "all elements in the document, named "Keyword".
Edit: Just saw that you only want elements named Keyword. Updated the code sample.

Like others, I would suggest using LINQ to XML - but I don't think there's much need to use XPath here. Here's a simple method to return all the keyword names within a file:
static IEnumerable<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value);
}
Nice and declarative :)
Note that if you're going to want to use the result more than once, you should call ToList() or ToArray() on it, otherwise it'll reload the file each time. Of course you could change the method to return List<string> or string[] by -adding the relevant call to the end of the chain of method calls, e.g.
static List<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value)
.ToList();
}
Also note that this just gives you the names - I would have expected you to want the other details of the elements, in which case you'd probably want something slightly different. If it turns out you need more, please let us know.

You could use LINQ to XML.
Example:
var xmlFile = XDocument.Load(someFile);
var query = from item in xmlFile.Descendants("childobject")
where !String.IsNullOrEmpty(item.Attribute("using")
select new
{
AttributeValue = item.Attribute("using").Value
};

You'll likely want to use XPath. //Keyword/#name should get you all of the keyword names.
Here's a good introduction: .Net and XML XPath Queries

**<Countries>
<Country name ="ANDORRA">
<state>Andorra (general)</state>
<state>Andorra</state>
</Country>
<Country name ="United Arab Emirates">
<state>Abu Z¸aby</state>
<state>Umm al Qaywayn</state>
</Country>**
public void datass(string file)
{
string file = HttpContext.Current.Server.MapPath("~/App_Data/CS.xml");
XmlDocument doc = new XmlDocument();
if (System.IO.File.Exists(file))
{
//Load the XML File
doc.Load(file);
}
//Get the root element
XmlElement root = doc.DocumentElement;
XmlNodeList subroot = root.SelectNodes("Country");
for (int i = 0; i < subroot.Count; i++)
{
XmlNode elem = subroot.Item(i);
string attrVal = elem.Attributes["name"].Value;
Response.Write(attrVal);
XmlNodeList sub = elem.SelectNodes("state");
for (int j = 0; j < sub.Count; j++)
{
XmlNode elem1 = sub.Item(j);
Response.Write(elem1.InnerText);
}
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to search entire XML file for keyword? - c#

Related

How can I find a specific XML element programmatically?

output xml with a certain attribute

Parsing XML file using C#?

Reading a single node from XML file and using it as a condition

How to read XML attributes in C#?

Categories

Resources