Why does LabelControl not render a DIV as HTML (AllowHtmlString=true)? - c#

I want to center part of the text in a LabelControl.
I found this example in the documentation:
https://documentation.devexpress.com/WindowsForms/9536/Controls-and-Libraries/Editors-and-Simple-Controls/Simple-Editors/Examples/How-to-Format-Text-in-LabelControl-Using-HTML-Tags
So I wrote this code:
labelControl1.Text = "<div style=\"text-align:center;\">center</div><br>" +
"<size=14>Size = 14<br>" +
"Bold <i>Italic</i> <u>Underline</u><br>" +
"<color=255, 0, 0>Sample Text</color></size>";
labelControl1.AllowHtmlString = true;
labelControl1.Appearance.TextOptions.WordWrap = WordWrap.Wrap;
labelControl1.Appearance.Options.UseTextOptions = true;
labelControl1.AutoSizeMode = LabelAutoSizeMode.Vertical;
But it didn't work: the <div> is not rendered as HTML.
What is the problem with it?

According to the HTML Text Formatting documentation, the LabelControl.AllowHtmlString property supports these tags and "pseudotags" (tags that do not exist in the current HTML standard but can be used for rendering in the label control):
Normal HTML tags
<b> - bold text
<i> - italic text
<s> - strikethrough
<u> - underline
<br> - line break (XHTML equivalent is <br />)
Pseudotags
<color> (equivalent to CSS color)
<backcolor> (equivalent to CSS background-color)
<size> (equivalent to CSS font-size)
<image=value> (equivalent to HTML <img src="value">)
<href=url> (equivalent to HTML <a href="url">)
<nbsp> (equivalent to HTML &nbsp;)
The HTML <div> tag is not among the supported tags listed above, so it is rendered as plain text instead.

According to the documentation, only specific HTML tags are supported, and div is not in the list.
Depending on your requirements, you might split the text into two labels, one centered (AutoSize=False, TextAlign=MiddleCenter) and one with HTML.
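The two-label approach could look roughly like the sketch below. The answer names AutoSize=False and TextAlign=MiddleCenter, which are the standard WinForms Label properties; the sketch assumes both labels are DevExpress LabelControls and uses AutoSizeMode = None with HAlignment = Center as the equivalent settings. The LabelPair class and its method names are hypothetical.

```csharp
using DevExpress.Utils;
using DevExpress.XtraEditors;

public static class LabelPair
{
    // Plain label that only centers its text; no HTML tags involved.
    public static LabelControl CreateCenteredLabel(string text)
    {
        var label = new LabelControl { Text = text };
        // Fixed size, so the label is wider than its text and centering is visible.
        label.AutoSizeMode = LabelAutoSizeMode.None;
        label.Appearance.TextOptions.HAlignment = HorzAlignment.Center;
        label.Appearance.Options.UseTextOptions = true;
        return label;
    }

    // Label that renders the supported tags and pseudotags (no <div>).
    public static LabelControl CreateHtmlLabel(string html)
    {
        var label = new LabelControl { Text = html };
        label.AllowHtmlString = true;
        label.Appearance.TextOptions.WordWrap = WordWrap.Wrap;
        label.Appearance.Options.UseTextOptions = true;
        label.AutoSizeMode = LabelAutoSizeMode.Vertical;
        return label;
    }
}
```

The caller still has to give both labels the same width, stack them vertically, and add them to the form's Controls collection.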

Related

Regex removes too much text

In our CMS we are using some tags which should be replaced on exporting for other systems.
The code for replacing is stated below:
var rxStr = "<div[^<]+class=([\"'])related-document-content\\1.*</div>";
var rx = new System.Text.RegularExpressions.Regex(rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
bodyText = rx.Replace(bodyText, "");
Our problem occurs when there are two instances of the tag in bodyText:
<p>First paragraph</p>
<div class='related-document-content' id='457'>First related text</div>
<p>Second paragraph</p>
<div class='related-document-content' id='458'>Second related text</div>
<p>Third paragraph</p>
When the code runs, it also removes the second paragraph, and the output is:
<p>First paragraph</p>
<p>Third paragraph</p>
Can anyone help me adjust the code so that only the div tags get removed?
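Besides the obvious "use an HTML parser/writer instead":
There are two problems with your pattern.
First, your rxStr looks for "anything but the next open tag" with <div[^<]+.
Instead it should look for "anything but the current tag's end" with <div[^>]+, followed by the closing > of the tag.
Second, .* is greedy: it runs to the last </div> it can find and swallows everything in between, which is why the second paragraph disappears. Make it lazy with .*? so the match stops at the first </div>:
// Added [^>]*> towards the end to consume the rest of the opening tag.
// .*? is lazy, so it stops at the first </div>.
// The () around the div content is a capture group, so you can debug which matches were found.
var rxStr = "<div[^>]+class=([\"'])related-document-content\\1[^>]*>(.*?)</div>";
If the content of a div can span several lines, also pass RegexOptions.Singleline so that . matches newlines.
If the innerHTML of your div is actually text-only, use [^<]* instead of .*?:
var rxStr = "<div[^>]+class=([\"'])related-document-content\\1[^>]*>([^<]*)</div>";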
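A small sketch for comparing the question's pattern with the text-only pattern. It assumes the sample markup sits on a single line (without RegexOptions.Singleline, . stops at a line break, so the overrun only shows up when both divs share a line). The RelatedDivStripper class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RelatedDivStripper
{
    // Pattern from the question: the greedy .* runs to the last </div> on the line.
    private static readonly Regex Greedy = new Regex(
        "<div[^<]+class=([\"'])related-document-content\\1.*</div>",
        RegexOptions.IgnoreCase);

    // Text-only pattern: [^<]* cannot cross into the next tag,
    // so each match ends at the closing tag of its own div.
    private static readonly Regex TextOnly = new Regex(
        "<div[^>]+class=([\"'])related-document-content\\1[^>]*>[^<]*</div>",
        RegexOptions.IgnoreCase);

    public static string StripGreedy(string bodyText)
    {
        return Greedy.Replace(bodyText, "");
    }

    public static string StripTextOnly(string bodyText)
    {
        return TextOnly.Replace(bodyText, "");
    }
}
```

With the five sample lines joined into one string, StripGreedy returns only the first and third paragraphs, while StripTextOnly keeps all three paragraphs and drops just the two divs.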

check if html contains tags and whitespaces

I want to check whether an HTML string contains only whitespace. I used HtmlAgilityPack as suggested in this post. It works fine when the HTML string has only text and formatting tags in it, like:
"< title></title><p style=\"margin: 0em 0px;\"><\br></p>"
but when it contains other tags, such as an image, HtmlAgilityPack also marks it as empty:
"< title></title><p style="margin: 0em 0px;"><img src="https://www.google.com.pk/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"></p><p style="margin:0em 0px;"><\br></p>"
I want to check that the HTML, once rendered in a browser, is not just empty space.

Extract content within a div tag ignoring other tags inside

Below is the sample html source
<div id="page2" dir="ltr">
<p>This text I dont want to extract</p>
This is the text which I want to extract
</div>
Irrespective of the attributes of the div tag, I want to extract only the div tag's own text, ignoring the text of any other tags nested inside it.
In the above example I do not want to extract the text within the <p></p> tag, but I do want the text directly within the <div></div> tag, i.e. "This is the text which I want to extract".
XmlNodeList DivNodeList = xDoc.GetElementsByTagName("div");
string DivInnerText;
for (int i = 0; i < DivNodeList.Count; i++)
{
    if (!DivNodeList[i].InnerXml.Contains("p"))
    {
        DivInnerText = DivNodeList[i].InnerText.Trim();
        Div_List.Add(DivInnerText);
    }
}
But the above code does not work as expected. Because I only extract the text when no p tag is present, a div that contains a p tag is skipped entirely. Moreover, the InnerText of the div combines the text of all the tags inside it.
Any help on this is greatly appreciated.
For HTML processing, you should try the HtmlAgilityPack library.
Your requirement should be easy to do.
Take a look : http://www.c-sharpcorner.com/UploadFile/9b86d4/getting-started-with-html-agility-pack/
Using jQuery, you can achieve this with:
$("#page2").clone().children().remove().end().text();
The credit should go to "DotNetWala"; check his answer here.
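Staying with the XmlDocument approach from the question, one way to get only the div's own text is to walk its direct child nodes and keep the text nodes, which is roughly what the jQuery one-liner does. This is a minimal sketch assuming the markup loads as well-formed XML; the DivTextExtractor class and DirectDivText method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class DivTextExtractor
{
    // Returns, for each div, the text that sits directly inside it,
    // skipping the text of nested elements such as <p>.
    public static List<string> DirectDivText(XmlDocument xDoc)
    {
        var result = new List<string>();
        foreach (XmlNode div in xDoc.GetElementsByTagName("div"))
        {
            var ownText = new StringBuilder();
            foreach (XmlNode child in div.ChildNodes)
            {
                // Only direct text children count; element children are ignored.
                if (child.NodeType == XmlNodeType.Text)
                {
                    ownText.Append(child.Value);
                }
            }

            string text = ownText.ToString().Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

For the sample markup in the question, this yields a single entry containing the text that follows the p element.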

How can I wrap a <span> around matched words in HTML without breaking the HTML

Using C# - WinForms
I have a valid HTML string which may or may not contain various HTML elements such as <a>.
I need to search this HTML and highlight certain keywords - the highlighting is done by adding a <span> around the text with inline styling. I should not be doing this for <a> tags, or any other HTML tag that isn't actually visible to the user.
e.g. currently I am doing this:
html = html.Replace(phraseToCount, "<span style=\"background: #FF0000; color: #FFFFFF; font-weight: bold;\">" + phraseToCount + "</span>");
This kind of works but it breaks <a> tags. So in the example below only the 1st instance of the word cereal should end up with a <span> around it:
<p>To view more types of cereal click here.</p>
How could I do this?
EDIT - more info.
This will be running in a Winforms app as the best way to get the HTML is using the WebBrowser control - I will be scraping web pages and highlighting various words.
You're handling HTML as plain text. You don't want that. You only want to search through the "InnerText" of your HTML elements, as in <p attribute="value">innertext</p>. Not through tags, comments, styles and script and whatever else can be included in your document.
In order to do that properly, you need to parse the HTML, and then obtain all elements' InnerTexts and do your logic on that.
In fact, InnerText is a simplification: when you have an element like <p>FooBar<span>BarBaz</span></p> where "Baz" is to be replaced, then you need to actually recursively iterate all the nodes in the DOM, and only replace text nodes, because writing into the InnerText property will remove all child nodes.
For how to do that, you'd want to use a library. You don't want to build an HTML parser on your own. See for example C#: HtmlAgilityPack extract inner text, Extracting Inner text from HTML BODY node with Html Agility Pack, How can i parse InnerText of <option> tag with HtmlAgilityPack?, Parsing HTML with CSQuery, HtmlAgilityPack - get all nodes in a document and so on.
The most relevant one seems to be How can I retrieve all the text nodes of a HTMLDocument in the fastest way in C#?:
HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//text()");
foreach (HtmlTextNode node in coll.Cast<HtmlTextNode>())
{
    node.Text = node.Text.Replace(...);
}
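Applied to the highlighting problem, the text-node approach might look like the sketch below. It assumes HtmlAgilityPack, a plain keyword with no markup characters in it, and that text inside a, script and style elements should be left alone. The KeywordHighlighter class, the XPath filter and the shortened inline style are illustrative choices, not taken from the question.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class KeywordHighlighter
{
    private const string Open = "<span style=\"background: #FF0000;\">";
    private const string Close = "</span>";

    public static string Highlight(string html, string phrase)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text nodes only, excluding those inside links, scripts and styles.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]");
        if (nodes == null)
        {
            return html; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            var textNode = node as HtmlTextNode;
            if (textNode != null)
            {
                // Text holds the raw markup of the node, so the span is written out as HTML.
                textNode.Text = textNode.Text.Replace(phrase, Open + phrase + Close);
            }
        }

        // Serialize the modified tree back to a string.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Attribute values such as href are never touched, because only text nodes are rewritten.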
Here's how you would do what @CodeCaster suggested, using CsQuery:
string str = "<p>To view more types of cereal click here cereal.</p>";
var cq = CQ.Create(str);
foreach (IDomElement node in cq.Elements)
{
    PerformActionOnTextNodeRecursively(node, domNode => domNode.NodeValue = domNode.NodeValue.Replace("cereal", "<span>cereal</span>"));
}
Console.WriteLine(cq.Render());
private static void PerformActionOnTextNodeRecursively(IDomNode node, Action<IDomNode> action)
{
    foreach (var childNode in node.ChildNodes)
    {
        if (childNode.NodeType == NodeType.TEXT_NODE)
        {
            action(childNode);
        }
        else
        {
            PerformActionOnTextNodeRecursively(childNode, action);
        }
    }
}
Hope it helps.

Error while using XPath to parse text from HTML

The HTML content I need to parse is the text in the marquee element, as given below. I'm using C# with HTML Agility Pack to parse it, but a NullReferenceException is thrown.
C# code is
var ht1 = ht.DocumentNode.SelectSingleNode("html/body/table/tbody/tr/td[2]/div[2]/marquee/text()").InnerText;
Part of HTML:
<html>
-<body ...
-<table id=..
-<tbody>
-<tr>
+<td.........
-<td
+<div ......
-<div style="width:100%;padding:0;margin:0;border
-style:solid;border-width:0;border-color:darkred;">
<marquee width="100%" height="20" bgcolor="" style="color:
darkorchid; font-size: 14" loop="3" behavior="scroll"
scrolldelay="90 scrollamount="5" align="middle" border="0">
your scrolling text - these are some samples - think of
possibilities</marquee>
<div>
Did you look at the actual source of the HTML file? If you only look at the HTML shown by browser tools such as Firebug in Firefox, you will see additional tbody tags that are not actually in the file.
Therefore use:
var ht1 = ht.DocumentNode.SelectSingleNode("html/body/table/tr/td[2]/div[2]/marquee/text()").InnerText;
You usually do not need text(), because InnerText already returns the text content of a node, whereas text() selects the individual text nodes rather than the concatenated text.
Therefore use:
var ht1 = ht.DocumentNode.SelectSingleNode("html/body/table/tr/td[2]/div[2]/marquee").InnerText;
That page does not seem to be well-formed HTML.
This worked for me though:
ht.DocumentNode.SelectSingleNode(@"html/head/table[1]/tbody/tr/td[1]/td/div[2]/marquee").InnerText;
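The exception itself comes from calling InnerText on the null that SelectSingleNode returns when the path matches nothing. A defensive sketch, assuming HtmlAgilityPack and that the page contains at most one marquee element of interest, so the shorter //marquee path can stand in for the absolute one; MarqueeReader is a hypothetical name.

```csharp
using HtmlAgilityPack;

public static class MarqueeReader
{
    // Returns the marquee text, or null when the document has no marquee element.
    public static string ReadMarquee(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A relative search does not depend on tbody or on how the parser
        // nests a page that is not well formed.
        HtmlNode marquee = doc.DocumentNode.SelectSingleNode("//marquee");
        if (marquee == null)
        {
            return null;
        }
        return marquee.InnerText.Trim();
    }
}
```

Checking for null before reading InnerText turns a wrong path into a visible "not found" result instead of an exception.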
