Regular expression over where clause? - c#

I'm trying to match specific terms in an XML file and save the results. Here is the XML text, held in a string:
<node1>
<node2>
<text>Goodbye</text>
</node2>
<node2>
<text>Welcome James</text>
</node2>
<node2>
<text>Welcome John</text>
</node2>
<node2>
<text>See you later!</text>
</node2>
</node1>
I want to use LINQ to select any text that has "Welcome" in it. However, the name after "Welcome" (e.g. "Welcome James") can change. Is there an easy way to select the nodes containing "Welcome" followed by any name, using regular expressions?
Here's the C# code:
private static void Test(string stream)
{
XDocument doc = XDocument.Parse(stream); //stream contains the xml written above
var list = from node in doc.Descendants("node2")
where node.Element("text").Value == "Welcome .*"
select node.Element("text").Value;
foreach (var x in list)
Console.WriteLine(x);
}

For a scenario as simple as yours there is no need to use regular expressions. You can use the String.StartsWith(String) method that determines whether a string starts with a specified string as follows:
private static void Test(string stream)
{
XDocument doc = XDocument.Parse(stream);
var list = from node in doc.Descendants("node2")
where node.Element("text").Value.StartsWith("Welcome")
select node.Element("text").Value;
foreach (var x in list)
{
Console.WriteLine(x);
}
}

Regex regex = new Regex("Welcome");
XDocument doc = XDocument.Parse(stream); //stream contains the xml written above
var list = from node in doc.Descendants("node2")
where regex.IsMatch(node.Element("text").Value)
select node.Element("text").Value;

The simplest way to get Regex matching is to use the static Regex.IsMatch(String, String) function.
If you want better performance, you can compile the regex beforehand (see proxon's answer).
As Marius mentions though, String.StartsWith is good enough for your specific example.
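To make the static Regex.IsMatch suggestion concrete, here is a minimal sketch against the sample XML from the question. The class name WelcomeFilter, the helper FindWelcomes and the anchored pattern are illustrative assumptions, not something taken from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class WelcomeFilter
{
    // Hypothetical helper: returns the text of every node2/text element
    // that starts with "Welcome" followed by whitespace and a name.
    public static List<string> FindWelcomes(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return (from node in doc.Descendants("node2")
                let text = (string)node.Element("text") // null if <text> is missing
                where text != null && Regex.IsMatch(text, @"^Welcome\s+\w+")
                select text).ToList();
    }

    static void Main()
    {
        string xml =
            "<node1>" +
            "<node2><text>Goodbye</text></node2>" +
            "<node2><text>Welcome James</text></node2>" +
            "<node2><text>Welcome John</text></node2>" +
            "<node2><text>See you later!</text></node2>" +
            "</node1>";

        foreach (string match in FindWelcomes(xml))
            Console.WriteLine(match);
    }
}
```

The explicit cast to string is used instead of .Value so that a node2 without a text child is skipped rather than throwing a NullReferenceException.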

Related

Xml Linq Find Element After another Element

I have been searching for this for a long time, but I can't get it working based on other people's code. I need to find the element right after "user" (which is "robot") and write out its value (which one depends on the user's input). I am programming a chat-bot. This is my XML file:
<Answers>
<user>This is a message</user><robot>Here is an answer</robot>
<user>This is another message</user><robot>Here is another answer</robot>
</Answers>
In my C# code I am trying something like this:
private static void Main(string[] args)
{
XDocument doc = XDocument.Load("C:\\bot.xml");
var userPms = doc.Descendants("user");
var robotPm = doc.Descendants("robot");
string userInput = Console.ReadLine();
foreach (var pm in userPms.Where(pm => pm.Value == userInput))
{
Console.WriteLine // robotPm.FindNextTo(pm)
}
}
Simply put, I want to compare the console input with each "user" element in the XML and, if they are equal, write the robot's answer that corresponds to that user input.
Thank you for your help.
Simply use NextNode
Console.WriteLine(((XElement)pm.NextNode).Value);
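But keep in mind that this depends on every <robot> immediately following its <user>. Document order is preserved when the file is parsed, but the pairing is only implied by position, so an inserted comment or a reordered file breaks it. A better approach would be to group each question with its answer:
<item>
<q>question1</q>
<a>answer1</a>
</item>
<item>
<q>question2</q>
<a>answer2</a>
</item>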
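With the grouped layout, the lookup no longer depends on sibling position. A rough sketch, assuming the items sit under the same Answers root as in the question; BotLookup and FindAnswer are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class BotLookup
{
    // Hypothetical helper: returns the <a> text of the first <item>
    // whose <q> text equals the user's input, or null if none matches.
    public static string FindAnswer(XDocument doc, string userInput)
    {
        return doc.Descendants("item")
                  .Where(item => (string)item.Element("q") == userInput)
                  .Select(item => (string)item.Element("a"))
                  .FirstOrDefault();
    }

    static void Main()
    {
        // Assumed document shape: <item> pairs under an <Answers> root.
        XDocument doc = XDocument.Parse(
            "<Answers>" +
            "<item><q>question1</q><a>answer1</a></item>" +
            "<item><q>question2</q><a>answer2</a></item>" +
            "</Answers>");

        Console.WriteLine(FindAnswer(doc, "question2") ?? "(no answer found)");
    }
}
```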

How to replace all "values" in an XML document with "0.0" using C# (preferably LINQ)?

This is not a homework; I need this for my unit tests.
Sample input: <rows><row><a>1234</a><b>Hello</b>...</row><row>...</rows>.
Sample output: <rows><row><a>0.0</a><b>0.0</b>...</row><row>...</rows>.
You may assume that the document starts with <rows> and that parent node has children named <row>. You do not know the name of nodes a, b, etc.
For extra credit: how to make this work with an arbitrary well-formed, "free-form" XML?
I have tried this with a regex :) without luck. I could make it "non-greedy on the right", but not on the left. Thanks for your help.
EDIT: Here is what I tried:
private static string ReplaceValuesWithZeroes(string gridXml)
{
Assert.IsTrue(gridXml.StartsWith("<row>"), "Xml representation must start with '<row>'.");
Assert.IsTrue(gridXml.EndsWith("</row>"), "Xml representation must end with '</row>'.");
gridXml = "<deleteme>" + gridXml.Trim() + "</deleteme>"; // Fake parent.
var xmlDoc = XDocument.Parse(gridXml);
var descendants = xmlDoc.Root.Descendants("row");
int rowCount = descendants.Count();
for (int rowNumber = 0; rowNumber < rowCount; rowNumber++)
{
var row = descendants.ElementAt(0);
Assert.AreEqual<string>(row.Value /* Does not work */, String.Empty, "There should be nothing between <row> and </row>!");
Assert.AreEqual<string>(row.Name.ToString(), "row");
var rowChildren = row.Descendants();
foreach (var child in rowChildren)
{
child.Value = "0.0"; // Does not work.
}
}
// Not the most efficient but still fast enough.
return xmlDoc.ToString().Replace("<deleteme>", String.Empty).Replace("</deleteme>", String.Empty);
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement el in doc.SelectNodes("//*[not(*)]"))
el.InnerText = "0.0";
xml = doc.OuterXml;
or to be more selective about non-empty text nodes:
foreach (XmlText el in doc.SelectNodes("//text()[.!='']"))
el.InnerText = "0.0";
XDocument xml = XDocument.Load(myXmlFile);
foreach (var element in xml.Descendants("row").SelectMany(r => r.Elements()))
{
element.Value = "0.0";
}
Note that this general search for Descendants("row") is not very efficient, but it satisfies the 'arbitrary format' requirement.
You should take a look at HTML Agility Pack. It allows you to treat HTML documents as if they were well-formed XML, so you can parse them and change values.
I think you can use Regex.Replace method in C#. I used the below regex to replace all the XML elements values:
[>]+[a-zA-Z0-9]+[<]+
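This matches a '>' followed by a run of letters or digits and then a '<', i.e. element text made up only of alphanumeric characters. Note that the match includes the surrounding '>' and '<', so the replacement string has to put them back (for example ">0.0<"), and text containing spaces or punctuation is not matched.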
I was able to use this successfully in Notepad++. You can write a small program as well using this.
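For the "arbitrary well-formed XML" part of the question, one LINQ to XML option is to overwrite every element that has no child elements. This is only a sketch under that assumption; ZeroOut and ReplaceLeafValues are made-up names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ZeroOut
{
    // Hypothetical helper: sets the value of every leaf element
    // (an element without child elements) to the given replacement.
    public static string ReplaceLeafValues(string xml, string replacement)
    {
        XDocument doc = XDocument.Parse(xml);

        // Materialize the leaves first so that changing values
        // cannot disturb the lazy Descendants() traversal.
        foreach (XElement leaf in doc.Descendants().Where(e => !e.HasElements).ToList())
        {
            leaf.Value = replacement;
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        string input = "<rows><row><a>1234</a><b>Hello</b></row><row><a>5</a></row></rows>";
        Console.WriteLine(ReplaceLeafValues(input, "0.0"));
    }
}
```

Two caveats: empty elements such as <name></name> also count as leaves and will receive the replacement text, and text that sits next to child elements (mixed content) is left untouched.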

C# , xml parsing. get data between tags

I have a string :
responsestring = "<?xml version="1.0" encoding="utf-8"?>
<upload><image><name></name><hash>SOmetext</hash>"
How can I get the value between <hash> and </hash>?
My attempts :
responseString.Substring(responseString.LastIndexOf("<hash>") + 6, 8); // this sort of works , but won't work in every situation.
I also tried messing around with XmlReader, but couldn't find a solution.
ty
Try
XDocument doc = XDocument.Parse(str);
var a = from hash in doc.Descendants("hash")
select hash.Value;
You will need references to the System.Core and System.Xml.Linq assemblies.
Others have suggested LINQ to XML solutions, which is what I'd use as well, if possible.
If you're stuck with .NET 2.0, use XmlDocument or even XmlReader.
But don't try to manipulate the raw string yourself using Substring and IndexOf. Use an XML API of some description. Otherwise you will get it wrong. It's a matter of using the right tool for the job. Parsing XML properly is a significant chunk of work - work that's already been done.
Now, just to make this a full answer, here's a short but complete program using your sample data:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
string response = @"<?xml version='1.0' encoding='utf-8'?>
<upload><image><name></name><hash>Some text</hash></image></upload>";
XDocument doc = XDocument.Parse(response);
foreach (XElement hashElement in doc.Descendants("hash"))
{
string hashValue = (string) hashElement;
Console.WriteLine(hashValue);
}
}
}
Obviously that will loop over all the hash elements. If you only want one, you could use doc.Descendants("hash").Single() or doc.Descendants("hash").First() depending on your requirements.
Note that both the conversion I've used here and the Value property will return the concatenation of all text nodes within the element. Hopefully that's okay for you - or you could get just the first text node which is a direct child if necessary.
var val = XElement.Parse(responseString);
val.Descendants(...).First().Value
Get your xml well formed and escape the double quotes with backslash. Then apply the following code
XDocument resp = XDocument.Parse("<hash>SOmetext</hash>");
var r= from element in resp.Elements()
where element.Name == "hash"
select element;
foreach (var item in r)
{
Console.WriteLine(item.Value);
}
You can use an xmlreader and/or xpath queries to get all desired data.
XmlReader_Object.ReadToFollowing("hash");
string value = XmlReader_Object.ReadInnerXml();
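A fuller sketch of the XmlReader route, reading from a string instead of a file. HashReader and ReadFirstHash are hypothetical names, and the sample string is a well-formed version of the data in the question.

```csharp
using System;
using System.IO;
using System.Xml;

static class HashReader
{
    // Hypothetical helper: returns the text of the first <hash> element,
    // or null when the document contains no such element.
    public static string ReadFirstHash(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // ReadToFollowing advances to the next element with this name.
            return reader.ReadToFollowing("hash")
                ? reader.ReadElementContentAsString()
                : null;
        }
    }

    static void Main()
    {
        string xml = "<upload><image><name></name><hash>Some text</hash></image></upload>";
        Console.WriteLine(ReadFirstHash(xml));
    }
}
```

ReadElementContentAsString is used here rather than ReadInnerXml because the element holds plain text; ReadInnerXml would return any nested markup verbatim.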

How to read XML attributes in C#?

I have several XML files that I wish to read attributes from. My main objective is to apply syntax highlighting to rich text box.
For example in one of my XML docs I have: <Keyword name="using">[..] All the files have the same element: Keyword.
So, how can I get the value for the attribute name and put them in a collection of strings for each XML file.
I am using Visual C# 2008.
The other answers will do the job, but the syntax highlighting and the several XML files you mention make me think you need something faster. Why not use a lean and mean XmlReader?
private string[] getNames(string fileName)
{
XmlReader xmlReader = XmlReader.Create(fileName);
List<string> names = new List<string>();
while (xmlReader.Read())
{
//keep reading until we see your element
if (xmlReader.Name.Equals("Keyword") && (xmlReader.NodeType == XmlNodeType.Element))
{
// get attribute from the Xml element here
string name = xmlReader.GetAttribute("name");
// --> now **add to collection** - or whatever
names.Add(name);
}
}
return names.ToArray();
}
Another good option would be the XPathNavigator class - which is faster than XmlDoc and you can use XPath.
Also, I would suggest going with this approach only if you have tried the straightforward options and are not happy with their performance.
You could use XPath to get all the elements, then a LINQ query to get the values of all the name attributes you find:
XDocument doc = yourDocument;
var nodes = from element in doc.XPathSelectElements("//Keyword")
let att = element.Attribute("name")
where att != null
select att.Value;
string[] names = nodes.ToArray();
The //Keyword XPath expression means "all elements in the document named Keyword".
Edit: Just saw that you only want elements named Keyword. Updated the code sample.
Like others, I would suggest using LINQ to XML - but I don't think there's much need to use XPath here. Here's a simple method to return all the keyword names within a file:
static IEnumerable<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value);
}
Nice and declarative :)
Note that if you're going to want to use the result more than once, you should call ToList() or ToArray() on it, otherwise it'll reload the file each time. Of course you could change the method to return List<string> or string[] by adding the relevant call to the end of the chain of method calls, e.g.
static List<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value)
.ToList();
}
Also note that this just gives you the names - I would have expected you to want the other details of the elements, in which case you'd probably want something slightly different. If it turns out you need more, please let us know.
You could use LINQ to XML.
Example:
var xmlFile = XDocument.Load(someFile);
var query = from item in xmlFile.Descendants("childobject")
where !String.IsNullOrEmpty((string)item.Attribute("using"))
select new
{
AttributeValue = item.Attribute("using").Value
};
You'll likely want to use XPath. //Keyword/@name should get you all of the keyword names.
Here's a good introduction: .Net and XML XPath Queries
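As a sketch of how an attribute-selecting XPath expression can be run from LINQ to XML: XPathEvaluate returns an object that, for a node-set, can be enumerated and cast to XAttribute. The Keywords root element and the names KeywordNames/GetNames are assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class KeywordNames
{
    // Hypothetical helper: collects the name attribute of every Keyword element.
    public static List<string> GetNames(XDocument doc)
    {
        // For a node-set expression the result is an enumerable of the matched nodes.
        var result = (IEnumerable)doc.XPathEvaluate("//Keyword/@name");
        return result.Cast<XAttribute>().Select(a => a.Value).ToList();
    }

    static void Main()
    {
        // Assumed file shape; the last Keyword has no name and is simply not matched.
        XDocument doc = XDocument.Parse(
            "<Keywords><Keyword name='using'/><Keyword name='class'/><Keyword/></Keywords>");

        Console.WriteLine(string.Join(", ", GetNames(doc)));
    }
}
```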
<Countries>
<Country name ="ANDORRA">
<state>Andorra (general)</state>
<state>Andorra</state>
</Country>
<Country name ="United Arab Emirates">
<state>Abu Z¸aby</state>
<state>Umm al Qaywayn</state>
</Country>
</Countries>
public void datass()
{
string file = HttpContext.Current.Server.MapPath("~/App_Data/CS.xml");
XmlDocument doc = new XmlDocument();
if (System.IO.File.Exists(file))
{
//Load the XML File
doc.Load(file);
}
//Get the root element
XmlElement root = doc.DocumentElement;
XmlNodeList subroot = root.SelectNodes("Country");
for (int i = 0; i < subroot.Count; i++)
{
XmlNode elem = subroot.Item(i);
string attrVal = elem.Attributes["name"].Value;
Response.Write(attrVal);
XmlNodeList sub = elem.SelectNodes("state");
for (int j = 0; j < sub.Count; j++)
{
XmlNode elem1 = sub.Item(j);
Response.Write(elem1.InnerText);
}
}
}

Get html tags embedded in xml using linq

I have a basic xml file that looks like this.
<root>
<item>
<title><p>some title</p></title>
</item>
...
</root>
What I want, is to get the whole title string including the html tag of the xml using linq and displaying it in a repeater .
I can get the title with no problem, but the <p> tag is being stripped out.
If I use
title = item.Element("title").ToString(), it works somehow but I get all the xml tag as well - meaning the title is not displayed in html.
I already tried encoding the "<" as "&lt;", but doing this makes the XML hard to read.
What would be a possible solution besides using CDATA and encoding?
Cheers
Terry
Create a reader from the title element and read InnerXml:
static void Main(string[] args)
{
string xml = "<root><item><title><p>some title</p></title></item></root>";
XDocument xdoc = XDocument.Parse(xml);
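XElement te = xdoc.Descendants("title").First();
string title = null; // will hold the inner XML of <title>
using (XmlReader reader = te.CreateReader())
{
if (reader.Read()) // move onto the <title> element itself
title = reader.ReadInnerXml();
}
Console.WriteLine(title);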
}
See Best way to get InnerXml of an XElement? for some ideas on how to get the "InnerXml" of an XElement.
XElement x = XElement.Parse(your xml);
var y= x.Descendants("title").Descendants();
Then iterate y for a list of the contents of the title elements.
BTW, LINQPad (http://www.linqpad.net) is a handy free tool for trying out LINQ-XML.
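Another way to get the inner markup without a reader is to concatenate the string form of the element's child nodes. A minimal sketch; the class name InnerXml and the helper GetInnerXml are made up for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class InnerXml
{
    // Hypothetical helper: joins the markup of all child nodes,
    // which gives the element's content without its own tags.
    public static string GetInnerXml(XElement element)
    {
        return string.Concat(
            element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    static void Main()
    {
        XDocument xdoc = XDocument.Parse(
            "<root><item><title><p>some title</p></title></item></root>");

        XElement title = xdoc.Descendants("title").First();
        Console.WriteLine(GetInnerXml(title));
    }
}
```

Because Nodes() covers text nodes as well as elements, a title that mixes plain text with tags keeps both.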
