String concatenation or HtmlGenericControl html controls? - c#

I am trying to generate a complex list of items using a ListView. For every item I must create something like this:
<div>
<ul>
<li>foo</li>
<li>bar</li>
.... Dynamic count of <li>
</ul>
<span>Some dynamic text</span>
.. bunch of other dynamically generated html
</div>
My question is: what is the better way to generate the HTML? By using string concatenation like this
StringBuilder sb = new StringBuilder();
sb.Append("<div>");
.......
sb.Append("</div>");
Or by using HtmlGenericControl like this:
HtmlGenericControl htmlItem = new HtmlGenericControl( "div" );
....
using( TextWriter textWriter = new StringWriter( ) )
using( HtmlTextWriter htmlWriter = new HtmlTextWriter( textWriter ) )
{
HtmlGenericControl htmlItem = null;
CreateMenuItem( menuItem, 0, null );
htmlItem.RenderControl( htmlWriter );
return textWriter.ToString( );
}

I prefer this way because it is much more readable. Looking at it, I can easily imagine what my output will look like.
StringBuilder sb = new StringBuilder();
sb.Append("<div>");
sb.Append("<ul>");
sb.Append("<li>Item1</li>");
sb.Append("<li>Item2</li>");
sb.Append("<li>Item3</li>");
sb.Append("</ul>");
sb.Append("</div>");

HtmlTextWriter is good because:
It is the cleanest option, and the markup is nicely indented when it is rendered.
It writes directly to the output stream, which helps performance; a StringBuilder writes nothing to the output stream until ToString is called on it.
It supports encoding HTML automatically.
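As a rough illustration of the points above, here is a minimal sketch of building the question's div/ul/li structure with HtmlTextWriter. The class name ListHtml and the method Render are hypothetical, and the sketch assumes the .NET Framework System.Web assembly is referenced. It writes into a StringWriter so the result can be returned as a string; inside a page or control you would typically use the writer that ASP.NET hands you instead.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Web.UI; // HtmlTextWriter lives in System.Web (.NET Framework)

public static class ListHtml
{
    // Hypothetical helper: renders <div><ul><li>...</li></ul><span>...</span></div>.
    public static string Render(IEnumerable<string> items, string caption)
    {
        using (var text = new StringWriter())
        using (var html = new HtmlTextWriter(text))
        {
            html.RenderBeginTag(HtmlTextWriterTag.Div);
            html.RenderBeginTag(HtmlTextWriterTag.Ul);
            foreach (var item in items)
            {
                html.RenderBeginTag(HtmlTextWriterTag.Li);
                html.WriteEncodedText(item); // HTML-encodes the dynamic text
                html.RenderEndTag();         // closes the li
            }
            html.RenderEndTag();             // closes the ul
            html.RenderBeginTag(HtmlTextWriterTag.Span);
            html.WriteEncodedText(caption);
            html.RenderEndTag();             // closes the span
            html.RenderEndTag();             // closes the div
            return text.ToString();
        }
    }
}
```

Each RenderBeginTag is paired with a RenderEndTag, so the writer keeps track of nesting and indentation, and WriteEncodedText keeps characters such as < and & in the item text from being interpreted as markup.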

Related

How do I hide images that have a certain class when creating a pdf from html?

I am having an issue trying to hide image elements that contain a certain class when converting the html to pdf, using iTextSharp (5.x).
I do not have access over the original Html as it comes from another source, however, I can do basic things like Regex and string.replace in C# after I get it.
A simple example of the Html string would be something like this:
<div>
<div>
<img src="somepath/desktop.jpg" class="img-desktop">Desktop</img>
<img src="somepath/mobile.jpg" class="img-mobile">Mobile</img>
</div>
</div>
This string is then getting created into a PDF using the XMLWorker in iTextSharp.
I need to hide the second image and, more generically, any image element with the "img-mobile" class.
What I've tried:
Add img.img-mobile {display:none} to the CSS that is sent in when creating the pdf
Add img.img-mobile {width:0;height:0} to the CSS
Add @media print { img.img-mobile: display:none} to the CSS
Add @media print { img.img-mobile: width:0;height:0} to the CSS
Use Regex to find img elements with that class, then loop through the matches, replace the source with an empty source, and replace the original html of each match with the new string (my Regex isn't grabbing any matches, unfortunately)
var pattern = "<img.*?class=\"img-mobile.*\"\\s?>.*</img>";
var mobileImages = Regex.Matches(innerHtml, pattern);
var srcPattern = "src=\".*\" ";
foreach (var imageElement in mobileImages)
{
var replaceString = Regex.Replace(imageElement.ToString(), srcPattern, " ");
innerHtml.Replace(imageElement.ToString(), replaceString);
}
I am quickly running out of ideas on how to handle this... The only saving grace is that the Html that comes in is consistent since a tool is generating it, somewhere else. So, when a user "adds an image to that html" it will always be structured the same, so Regex and replace methods are acceptable, although a CSS method would be much more preferred...
Even if you're a Regex expert and your input is predictable as mentioned, parsing HTML is hard. A better and easier way is to use a tested/proven parser, which is available in pretty much every programming language. For .NET it's HtmlAgilityPack. If you know a bit of XPath, which is quite similar to CSS selectors, it's pretty simple to set up and select the specific nodes you want to remove:
string RemoveImage(string htmlToParse)
{
var hDocument = new HtmlDocument()
{
OptionWriteEmptyNodes = true,
OptionAutoCloseOnEnd = true
};
hDocument.LoadHtml(htmlToParse);
var root = hDocument.DocumentNode;
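// SelectNodes returns null (not an empty collection) when nothing matches
var imagesDesktop = root.SelectNodes("//img[@class='img-desktop']");
if (imagesDesktop != null)
{
foreach (var image in imagesDesktop)
{
// remove the node that follows the image (its label text), then the image itself
var imageText = image.NextSibling;
if (imageText != null) imageText.Remove();
image.Remove();
}
}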
return root.WriteTo();
}
And then pass your parsed HTML to iTextSharp:
var parsedHtml = RemoveImage(HTML);
using (var xmlSnippet = new StringReader(parsedHtml))
{
using (FileStream stream = new FileStream(
outputFile,
FileMode.Create,
FileAccess.Write))
{
using (var document = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(
document, stream
);
document.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(
writer, document, xmlSnippet
);
}
}
}
This works for me with the HTML snippet you provided.
UPDATE, after comment about 'approved' code:
Aah, the dreaded CCB. Know how that goes. :( If HtmlAgilityPack doesn't pass, here's an alternate solution, although it's probably not the best Regex ever written. ;)
const string HTML = @"
<div>
<p class='img-desktop'>Paragraph</p>
<div>
<img src='somepath/desktop.jpg' class='img-desktop'>Desktop</img>
<img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
</div>
<div>
<img src='somepath/desktop.jpg' alt='img-desktop' title='img-desktop' class=""img-desktop"">Desktop
</IMG>
<img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
</div>
</div>";
public void Go()
{
var regex = new Regex(
// initial update
// @"<img[^>]*class='?""?'?img-desktop""?[^>]*>.*?</img>",
// after seeing accepted answer, noticed a bad copy/paste.
// above works, but for readability should have been this:
@"<img[^>]*class='?""?img-desktop""?'?[^>]*>.*?</img>",
// and also noticed above can be shortened to this, which works too
// @"<img[^>]*class=[^>]*img-desktop[^>]*>.*?</img>"
RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline
);
Console.WriteLine(regex.Replace(HTML, ""));
}
The Regex gives you a little extra leeway in case the actual HTML you're dealing with isn't exactly as posted above.
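The examples in this answer match the img-desktop class, while the question asks to drop images with the img-mobile class. A minimal sketch of the shortened pattern pointed at img-mobile instead might look like the following. The class name MobileImageFilter and the method Strip are hypothetical, and the sketch assumes, as the question states, that every image is written with an explicit closing </img> tag.

```csharp
using System.Text.RegularExpressions;

public static class MobileImageFilter
{
    // Same shape as the shortened pattern above, with the class name swapped.
    // [^>]* cannot cross a '>', so the class test stays inside a single <img ...> tag.
    private static readonly Regex MobileImage = new Regex(
        @"<img[^>]*class=[^>]*img-mobile[^>]*>.*?</img>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the markup with every matching image (and its label text) removed.
    public static string Strip(string html)
    {
        return MobileImage.Replace(html, "");
    }
}
```

The cleaned string can then be handed to the StringReader used in the iTextSharp snippet earlier in this answer.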

Change rendered html

I have a DLL that's rendering a div at the top of my page, and I need to remove it. Is it possible to edit the HTML that is about to be rendered, so that I can remove the div before it's displayed?
protected override void Render(HtmlTextWriter writer)
{
// setup a TextWriter to capture the markup
TextWriter tw = new StringWriter();
HtmlTextWriter htw = new HtmlTextWriter(tw);
// render the markup into our surrogate TextWriter
base.Render(htw);
// get the captured markup as a string
string pageSource = tw.ToString();
// render the markup into the output stream verbatim
writer.Write(pageSource);
// remove the viewstate field from the captured markup
//string viewStateRemoved = Regex.Replace(pageSource,
// "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\".*?\" />",
// "", RegexOptions.IgnoreCase);
// the page source, without the viewstate field, is in viewStateRemoved
// do what you like with it
}
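The override above captures the page markup as a string before anything reaches the browser, so the unwanted div can be cut out of that string before it is written. Here is a minimal sketch of such a filter. The names MarkupFilter and RemoveDivById are hypothetical, and it assumes the div carries an id attribute and contains no nested div elements; with nested divs the lazy match would stop at the first closing tag, and an HTML parser would be the safer tool.

```csharp
using System.Text.RegularExpressions;

public static class MarkupFilter
{
    // Removes the first-level <div id="..."> ... </div> block with the given id.
    // Assumption: the block does not contain another <div>.
    public static string RemoveDivById(string markup, string id)
    {
        var pattern = "<div[^>]*\\bid\\s*=\\s*[\"']" + Regex.Escape(id)
                    + "[\"'][^>]*>.*?</div>";
        return Regex.Replace(markup, pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

In the Render override, the idea would be to write MarkupFilter.RemoveDivById(pageSource, "banner") instead of pageSource, where "banner" stands in for whatever id the DLL's div actually uses.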

Manipulating HTML from the asp.net code-behind

I am able to get the HTML from the code-behind, like this one:
protected override void OnPreRenderComplete(EventArgs e)
{
StringWriter sw = new StringWriter();
base.Render(new HtmlTextWriter(sw));
sbHtml = sw.GetStringBuilder();
Response.Write(sbHtml + "<!-- processed by code-behind -->");
}
But I need to remove the HTML from the Page, any help?
If I understand correctly, you want to manipulate sbHtml and then write it out.
sbHtml = sw.GetStringBuilder();
sbHtml.Replace("anything", "to anything");
Response.Write(sbHtml);
(Or is it something else?)
Did you want a method like this to strip the HTML?
public static string StripHTML(string HTMLText)
{
var reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
return reg.Replace(HTMLText, "").Replace("&nbsp;", "");
}
You can put an <asp:PlaceHolder> on the page and set its contents to whatever you want. Add/remove/whatever.

Memory efficiency: Passing Html code of aspx page through codebehind

My goal is to generate the aspx code of a page as a string. I am calling the code-behind below through an asynchronous request in JavaScript, and I am getting the response back through Response.Write.
string html = string.Empty;
using (var memoryStream = new MemoryStream())
{
using (var streamWriter = new StreamWriter(memoryStream))
{
var htmlWriter = new HtmlTextWriter(streamWriter);
base.Render(htmlWriter);
htmlWriter.Flush();
memoryStream.Position = 0;
using (var streamReader = new StreamReader(memoryStream))
{
html = streamReader.ReadToEnd();
streamReader.Close();
}
}
}
Response.Write(html);
Response.End();
Is the above code memory efficient? I am thinking of using "yield", since it evaluates lazily. Can you comment on the memory efficiency of the above code?
Use a StringWriter instead of the MemoryStream, the StreamWriter and the StreamReader:
string html;
using (StringWriter stream = new StringWriter()) {
using (HtmlTextWriter writer = new HtmlTextWriter(stream)) {
base.Render(writer);
}
html = stream.ToString();
}
Response.Write(html);
Response.End();
The StringWriter uses a StringBuilder internally. The ToString method calls ToString on the StringBuilder, so it returns the internal string buffer as the string. That means that the string is only created once, and not copied back and forth.
Your method keeps one copy of the HTML in the html variable and another in memoryStream. Try this instead:
base.Render(new HtmlTextWriter(Response.Output));
Response.End();
While this can work, I'm not sure what you are trying to accomplish.

LoadControl in WCF

Since I don't have access to the TemplateControl or Page from a WCF service, I was wondering whether it is possible to render a custom control. If so, how would one do it?
private string GetRenderedHtmlFrom(Control control)
{
StringBuilder stringBuilder = new StringBuilder();
StringWriter sw = new System.IO.StringWriter(stringBuilder);
// wrap the StringWriter declared above (the original referenced an undefined textWriter)
HtmlTextWriter htmlWriter = new HtmlTextWriter(sw);
control.RenderControl(htmlWriter);
return stringBuilder.ToString();
}
Thanks
This actually wasn't achievable, and I ended up abandoning the idea. The rough solution I implemented was to load an HTML page, use string.Format() to manipulate it, return the result as a string, and let the JavaScript 'load the control'.
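The post does not show the workaround's code, so the following is only a sketch of what a string.Format-based template fill could look like. The names HtmlTemplate and Fill are hypothetical. It assumes the template uses {0}, {1}, ... placeholders and that any literal braces in the markup (for example in inline CSS or script) are doubled, since string.Format treats single braces as placeholders.

```csharp
using System.Globalization;
using System.Net;

public static class HtmlTemplate
{
    // Fills {0}, {1}, ... placeholders; each value is HTML-encoded first
    // so that dynamic text cannot inject markup into the template.
    public static string Fill(string template, params string[] values)
    {
        var encoded = new object[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            encoded[i] = WebUtility.HtmlEncode(values[i]);
        }
        return string.Format(CultureInfo.InvariantCulture, template, encoded);
    }
}
```

In the scenario described above, the template string would come from the loaded HTML page, and the WCF operation would return the filled string for the JavaScript to insert into the page.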
