I have an XDocument object which contains XHTML and I am looking to add ABBR elements into the string. I have a List that I am looping through to look for values which need to be wrapped in ABBR elements.
Let's say I have an XElement which contains XHTML like so:
<p>Some text will go here</p>
I need to adjust the value of the XElement to look like this:
<p>Some text <abbr title="Will Description">will</abbr> go here</p>
How do I do this?
UPDATE:
I am wrapping the value "will" with the HTML element ABBR.
This is what I have so far:
// Loop through them
foreach (XElement xhtmlElement in allElements)
{
// Don't process this element if it has child elements as they
// will also be processed through here.
if (!xhtmlElement.Elements().Any())
{
string innerText = GetInnerText(xhtmlElement);
foreach (var abbrItem in AbbreviationItems)
{
if (innerText.ToLower().Contains(abbrItem.Description.ToLower()))
{
var abbrElement = new XElement("abbr",
new XAttribute("title", abbrItem.Abbreviation),
abbrItem.Description);
innerText = Regex.Replace(innerText, abbrItem.Description, abbrElement.ToString(),
RegexOptions.IgnoreCase);
xhtmlElement.Value = innerText;
}
}
}
}
The problem with this approach is that when I set the XElement Value property, it is encoding the XML tags (correctly treating it as a string rather than XML).
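If innerText contains well-formed markup, parse it and replace the element's child nodes instead of assigning a string. Because innerText is a mix of text and elements, wrap it in a temporary root element before parsing:
xhtmlElement.ReplaceNodes(XElement.Parse("<root>" + innerText + "</root>").Nodes());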
instead of
xhtmlElement.Value = innerText;
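A minimal sketch of the difference between the two assignments, assuming the string is well-formed XML once wrapped. The class ValueVsParse and its method names are hypothetical; only XElement.Value, XElement.Parse, Nodes() and ReplaceNodes are LINQ to XML APIs.

```csharp
using System.Xml.Linq;

public static class ValueVsParse
{
    // Assigning a string to Value stores it as a single text node,
    // so any markup in it is escaped (&lt;abbr ...) when serialized.
    public static XElement AssignAsText(string innerXml)
    {
        var p = new XElement("p");
        p.Value = innerXml;
        return p;
    }

    // Parsing the string first yields real nodes. The temporary <wrapper>
    // root is needed because the string mixes text and elements; its child
    // nodes are copied into <p> by ReplaceNodes.
    public static XElement AssignAsNodes(string innerXml)
    {
        var p = new XElement("p");
        p.ReplaceNodes(XElement.Parse("<wrapper>" + innerXml + "</wrapper>").Nodes());
        return p;
    }
}
```

With "Some text <abbr title=\"Will Description\">will</abbr> go here" as input, the first method produces a <p> with no child elements and escaped markup, while the second produces a real <abbr> child element.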
Alternatively, you can:
convert the element to a string,
edit that string to insert the new tag,
and then parse the result and replace the old element with the new one.
This might be what you're looking for:
var element = new XElement("div");
var xml = "<p>Some text will go here</p>";
element.Add(XElement.Parse(xml));
//Element to replace/rewrite
XElement p = element.Element("p");
var value = p.ToString();
var newValue = value.Replace("will", "<abbr title='Will Description'>will</abbr>");
p.ReplaceWith(XElement.Parse(newValue));
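Another option that avoids round-tripping through strings is to split the matching text nodes and insert XElement objects directly. A minimal sketch, assuming a whole-word, case-insensitive match is wanted and that the elements are not in a namespace (for namespaced XHTML the new element would need the XHTML XNamespace). AbbrWrapper and WrapWord are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class AbbrWrapper
{
    // Wraps each case-insensitive, whole-word occurrence of `word` in
    // <abbr title="...">, editing the XML tree directly so nothing is escaped.
    public static void WrapWord(XElement element, string word, string title)
    {
        var pattern = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);

        // Snapshot the text nodes (at any depth) before changing the tree,
        // skipping text that is already inside an <abbr>.
        var textNodes = element.DescendantNodes()
                               .OfType<XText>()
                               .Where(t => t.Parent.Name != "abbr")
                               .ToList();

        foreach (XText text in textNodes)
        {
            string value = text.Value;
            var pieces = new List<XNode>();
            int last = 0;

            foreach (Match m in pattern.Matches(value))
            {
                if (m.Index > last)
                    pieces.Add(new XText(value.Substring(last, m.Index - last)));
                pieces.Add(new XElement("abbr", new XAttribute("title", title), m.Value));
                last = m.Index + m.Length;
            }

            if (pieces.Count == 0)
                continue; // no match in this text node

            if (last < value.Length)
                pieces.Add(new XText(value.Substring(last)));

            text.ReplaceWith(pieces); // swap the text node for text + <abbr> pieces
        }
    }
}
```

Calling WrapWord(p, "will", "Will Description") on <p>Some text will go here</p> gives the markup from the question. Because text already inside an <abbr> is skipped, running it once per item in the abbreviation list does not nest tags.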
Is it possible to change the name of a tag from code? Something like this:
var tag = doc.QuerySelector("i");
tag.TagName = "em";
This won't work, because TagName is read-only.
But, what are my options for getting to the same end? Would I have to construct an entirely new tag and set the InnerHtml to the contents of the old tag, then delete and swap? Is this even possible?
If you mean replacing elements in an HTML string, it can be done this way:
private static string RefineImageElement(string htmlContent)
{
var parser = new HtmlParser();
var document = parser.ParseDocument(htmlContent);
foreach (var element in document.All)
{
if (element.LocalName == "img")
{
var newElement = document.CreateElement("v-img");
newElement.SetAttribute("src", element.Attributes["src"] == null ? "" :
element.Attributes["src"].Value);
newElement.SetAttribute("alt", "Article Image");
element.Insert(AdjacentPosition.BeforeBegin, newElement.OuterHtml);
element.Remove();
}
}
return document.FirstElementChild.OuterHtml;
}
To change an element's name, you can replace the OuterHtml of the initial element with a combination of:
the new element's opening tag
the initial element's InnerHtml
the new element's closing tag
Here is an example:
var raw = @"<div>
<i>foo</i>
</div>";
var parser = new AngleSharp.Parser.Html.HtmlParser();
var doc = parser.Parse(raw);
var tag = doc.QuerySelector("i");
tag.OuterHtml = $"<em>{tag.InnerHtml}</em>";
Console.WriteLine(doc.DocumentElement.OuterHtml);
Output:
<html><head></head><body><div>
<em>foo</em>
</div></body></html>
I had the same question the other day. In the end, the only solution I found was to create another element, copy every attribute from the original element, and then remove the original element.
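The same create, copy and replace approach can be sketched with System.Xml's XmlDocument, where XmlElement.Name is also read-only. This is not AngleSharp code, so it only applies if the markup is well-formed XML; Rename is a hypothetical helper.

```csharp
using System.Xml;

public static class XmlRename
{
    // XmlElement.Name is read-only, so build a new element with the desired
    // name, copy the attributes, move the children, then swap it in.
    public static XmlElement Rename(XmlElement oldElement, string newName)
    {
        XmlDocument doc = oldElement.OwnerDocument;
        XmlElement newElement = doc.CreateElement(newName, oldElement.NamespaceURI);

        // Clone each attribute, since an attribute can belong to only one element.
        foreach (XmlAttribute attr in oldElement.Attributes)
            newElement.Attributes.Append((XmlAttribute)attr.CloneNode(true));

        // AppendChild removes each child from its old parent as it moves it.
        while (oldElement.HasChildNodes)
            newElement.AppendChild(oldElement.FirstChild);

        oldElement.ParentNode.ReplaceChild(newElement, oldElement);
        return newElement;
    }
}
```

For example, renaming the <i> in <div><i class="x">foo <b>bar</b></i></div> to "em" keeps the class attribute and the nested <b>.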
I'm currently using a loop that gives me a variable, which then needs to be fed into an XPath query to get any nodes with an attribute equal to that variable. So far, I've learned that XPath lets you select nodes from an XML document using
root.SelectNodes("Element[@Attribute='SpecificValue']")
However, I'd like to know if there's a way to insert a predefined variable in place of the specific value, so that I can grab a different set of nodes on each iteration of my loop.
For example something like this:
string attribValue = "test";
root.SelectNodes("Element[@Attribute = attribValue]")
Use string formatting:
string attribValue = "test";
string expression = String.Format("Element[@Attribute = '{0}']", attribValue);
root.SelectNodes(expression);
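One caveat with plain string formatting: XPath 1.0 string literals have no escape sequences, so a value containing an apostrophe breaks the '{0}' form. A sketch of a safer literal builder (XPathQuery, Literal, ForAttribute and CountPerValue are hypothetical names) that switches quote style and falls back to concat() when the value contains both kinds of quote:

```csharp
using System.Xml;

public static class XPathQuery
{
    // Builds an XPath 1.0 string literal for any value.
    public static string Literal(string value)
    {
        if (!value.Contains("'")) return "'" + value + "'";
        if (!value.Contains("\"")) return "\"" + value + "\"";
        // Both quote kinds present: split on ' and rejoin with concat().
        return "concat('" + value.Replace("'", "', \"'\", '") + "')";
    }

    // The expression from the answer, with the value quoted safely.
    public static string ForAttribute(string attribValue) =>
        string.Format("Element[@Attribute = {0}]", Literal(attribValue));

    // Example loop: one SelectNodes call per value, returning the match counts.
    public static int[] CountPerValue(XmlNode root, string[] values)
    {
        var counts = new int[values.Length];
        for (int i = 0; i < values.Length; i++)
            counts[i] = root.SelectNodes(ForAttribute(values[i])).Count;
        return counts;
    }
}
```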
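Using LINQ to XML:
XDocument doc = XDocument.Load("input.xml"); // the path is a placeholder
XElement root = doc.Root;
string attribValue = "test";
// The (string) cast yields null instead of throwing when the attribute is missing.
var results = root.Descendants("Element").Where(x => (string)x.Attribute("Attribute") == attribValue).ToList();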
I want to replace inner text of HTML tags with another text.
I am using HtmlAgilityPack
I use this code to extract all text nodes:
HtmlDocument doc = new HtmlDocument();
doc.Load("some path");
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']")) {
// How to replace node.InnerText with some text ?
}
But InnerText is read-only. How can I replace the text with other text and save the result to a file?
Try the code below. In addition to your condition, this XPath expression looks only at leaf text nodes inside <body> and filters out the text content of <script> tags. You may need to add further filtering.
var nodes = doc.DocumentNode.SelectNodes("//body//text()[(normalize-space(.) != '') and not(parent::script) and not(*)]");
foreach (HtmlNode htmlNode in nodes)
{
htmlNode.ParentNode.ReplaceChild(HtmlTextNode.CreateNode(htmlNode.InnerText + "_translated"), htmlNode);
}
Strangely, I found that InnerHtml isn't read-only. When I set it like this:
aElement.InnerHtml = "sometext";
the value of InnerText also changed to "sometext".
The HtmlTextNode class has a Text property* which works perfectly for this purpose.
Here's an example:
var textNodes = doc.DocumentNode.SelectNodes("//body/text()").Cast<HtmlTextNode>();
foreach (var node in textNodes)
{
node.Text = node.Text.Replace("foo", "bar");
}
And if we have an HtmlNode whose direct text we want to change, we can do something like the following:
HtmlNode node = //...
var textNode = (HtmlTextNode)node.SelectSingleNode("text()");
textNode.Text = "new text";
Or we can use node.SelectNodes("text()") in case it has more than one.
* Not to be confused with the readonly InnerText property.
This is not homework; I need this for my unit tests.
Sample input: <rows><row><a>1234</a><b>Hello</b>...</row><row>...</rows>.
Sample output: <rows><row><a>0.0</a><b>0.0</b>...</row><row>...</rows>.
You may assume that the document starts with <rows> and that parent node has children named <row>. You do not know the name of nodes a, b, etc.
For extra credit: how to make this work with an arbitrary well-formed, "free-form" XML?
I have tried this with a regex :) without luck. I could make it "non-greedy on the right", but not on the left. Thanks for your help.
EDIT: Here is what I tried:
private static string ReplaceValuesWithZeroes(string gridXml)
{
Assert.IsTrue(gridXml.StartsWith("<row>"), "Xml representation must start with '<row>'.");
Assert.IsTrue(gridXml.EndsWith("</row>"), "Xml representation must end with '</row>'.");
gridXml = "<deleteme>" + gridXml.Trim() + "</deleteme>"; // Fake parent.
var xmlDoc = XDocument.Parse(gridXml);
var descendants = xmlDoc.Root.Descendants("row");
int rowCount = descendants.Count();
for (int rowNumber = 0; rowNumber < rowCount; rowNumber++)
{
var row = descendants.ElementAt(0);
Assert.AreEqual<string>(row.Value /* Does not work */, String.Empty, "There should be nothing between <row> and </row>!");
Assert.AreEqual<string>(row.Name.ToString(), "row");
var rowChildren = row.Descendants();
foreach (var child in rowChildren)
{
child.Value = "0.0"; // Does not work.
}
}
// Not the most efficient but still fast enough.
return xmlDoc.ToString().Replace("<deleteme>", String.Empty).Replace("</deleteme>", String.Empty);
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement el in doc.SelectNodes("//*[not(*)]"))
el.InnerText = "0.0";
xml = doc.OuterXml;
Or, to be more selective and target only non-empty text nodes:
foreach (XmlText el in doc.SelectNodes("//text()[.!='']"))
el.InnerText = "0.0";
XDocument xml = XDocument.Load(myXmlFile);
foreach (var element in xml.Descendants("row").SelectMany(r => r.Elements()))
{
element.Value = "0.0";
}
Note that this general search for Descendants("row") is not very efficient, but it satisfies the 'arbitrary format' requirement.
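For the "extra credit" case of arbitrary XML, the leaf-element idea from the //*[not(*)] XPath answer can be expressed with LINQ to XML, which is what the asker's attempt used. A minimal sketch; ZeroFill and ReplaceLeafValues are hypothetical names:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ZeroFill
{
    // Sets every leaf element (no child elements) to `replacement`,
    // whatever the element names are. Self-closing elements such as <x/>
    // have no value and are left alone.
    public static string ReplaceLeafValues(string xml, string replacement = "0.0")
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() snapshots the leaves before their content is changed.
        var leaves = doc.Descendants().Where(e => !e.HasElements && !e.IsEmpty).ToList();
        foreach (XElement leaf in leaves)
            leaf.Value = replacement;

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because it never looks at element names, it works for the <rows>/<row> layout and for free-form documents alike.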
You should take a look at the HTML Agility Pack. It lets you treat HTML documents as well-formed XML, so you can parse them and change values.
I think you can use the Regex.Replace method in C#. I used the regex below to replace all the XML element values:
[>]+[a-zA-Z0-9]+[<]+
This matches a '>', followed by letters or digits, followed by a '<', i.e. simple text content between two tags. Since the brackets are part of the match, the replacement string has to put them back (for example >0.0<).
I used this successfully in Notepad++; you could also write a small program that does the same.
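A minimal C# version of the regex approach, using the answer's pattern unchanged (RegexZeroFill is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class RegexZeroFill
{
    // Pattern from the answer: '>' + letters/digits + '<', i.e. simple text
    // between two tags. The brackets are part of the match, so the
    // replacement has to put them back.
    public static string Replace(string xml) =>
        Regex.Replace(xml, "[>]+[a-zA-Z0-9]+[<]+", ">0.0<");
}
```

Values containing spaces, punctuation or entities (for example Hello world or 12.5) are not matched by this character class, which is one reason the parser-based answers are more robust.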
How to get a value of XElement without getting child elements?
An example:
<?xml version="1.0" ?>
<someNode>
someValue
<child>1</child>
<child>2</child>
</someNode>
If I use XElement.Value for <someNode>, I get the "somevalue<child>1</child><child>2</child>" string, but I want to get only "somevalue" without the "<child>1</child><child>2</child>" substring.
You can do it slightly more simply than using Descendants - the Nodes method only returns the direct child nodes:
XElement element = XElement.Parse(
@"<someNode>somevalue<child>1</child><child>2</child></someNode>");
var firstTextValue = element.Nodes().OfType<XText>().First().Value;
Note that this will work even in the case where the child elements came before the text node, like this:
XElement element = XElement.Parse(
@"<someNode><child>1</child><child>2</child>some value</someNode>");
var firstTextValue = element.Nodes().OfType<XText>().First().Value;
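Because Nodes() returns only direct children, the same idea can also collect all of the element's own text rather than just the first text node, which matters when text appears both before and after the child elements. A sketch (DirectText is a hypothetical name). For contrast, Value concatenates the text of every descendant, so for the compact example above it returns "somevalue12".

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DirectText
{
    // Joins the text nodes that are direct children of `element`
    // (XCData counts too, since it derives from XText), ignoring text
    // that lives inside child elements.
    public static string Get(XElement element) =>
        string.Concat(element.Nodes().OfType<XText>().Select(t => t.Value));
}
```

For the indented sample in the question, the result also contains the surrounding line breaks and indentation, so call Trim() on it if needed.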
There is no direct way. You'll have to iterate and select. For instance:
var doc = XDocument.Parse(
@"<someNode>somevalue<child>1</child><child>2</child></someNode>");
var textNodes = from node in doc.DescendantNodes()
where node is XText
select (XText)node;
foreach (var textNode in textNodes)
{
Console.WriteLine(textNode.Value);
}
I think what you want is the first descendant node, so something like:
var value = ((XText)element.DescendantNodes().First()).Value;
where element is the XElement representing your <someNode> element.
You can also ask specifically for the first text node (which is "somevalue"), so you could do:
var value = element.DescendantNodes().OfType<XText>().First().Value;