Manipulating HTML document before displaying into WPF WebBrowser control

I need to modify a page's HTML before showing it in the WPF WebBrowser control.
Test page: http://aksmod.ru/skajrim-mod-kukri-ot-aksyonov-v5-0/
First I tried AngleSharp.Scripting, but it doesn't work correctly (the ads don't load):
var config = new Configuration().WithDefaultLoader().WithJavaScript();
var document = BrowsingContext.New(config).OpenAsync(address).Result;
//do something
return document.DocumentElement.OuterHtml;
Later I tried handling LoadCompleted instead, but the result was the same:
private void Wb_LoadCompleted(object sender, NavigationEventArgs e)
{
    Console.WriteLine("Loaded");
    string url = e.Uri.ToString();
    if (!(url.StartsWith("http://") || url.StartsWith("https://")))
    { }
    if (e.Uri.AbsolutePath != wb.Source.AbsolutePath)
    { }
    else
    {
        Console.WriteLine("Full Loaded");
        HTMLDocument html = (HTMLDocument)wb.Document;
        var value = html.getElementsByTagName("html").item(index: 0);
        //do something
        wb.NavigateToString(value.OuterHtml);
    }
}
The event just doesn't fire for this page (although it works fine for some other sites).
So, what am I missing?
Update 1
MCVE
XAML
<Grid>
    <WebBrowser Name="wb" />
</Grid>
Code behind
public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
        wb.Navigated += Wb_Navigated;
        wb.LoadCompleted += Wb_LoadCompleted;
        wb.Navigate("http://aksmod.ru/skajrim-mod-kukri-ot-aksyonov-v5-0/");
    }
    private void Wb_LoadCompleted(object sender, NavigationEventArgs e)
    {
        Console.WriteLine("Loaded");
        string url = e.Uri.ToString();
        if (!(url.StartsWith("http://") || url.StartsWith("https://")))
        { }
        if (e.Uri.AbsolutePath != wb.Source.AbsolutePath)
        { }
        else
        {
            Console.WriteLine("Full Loaded");
            HTMLDocument html = (HTMLDocument)wb.Document;
            var value = html.getElementsByTagName("html").item(index: 0);
            //do something
            wb.NavigateToString(value.OuterHtml);
        }
    }
    private void Wb_Navigated(object sender, NavigationEventArgs e)
    {
        // Suppress script error dialogs by setting Silent on the underlying COM browser.
        FieldInfo fiComWebBrowser = typeof(WebBrowser)
            .GetField("_axIWebBrowser2",
                BindingFlags.Instance | BindingFlags.NonPublic);
        if (fiComWebBrowser == null) return;
        object objComWebBrowser = fiComWebBrowser.GetValue(wb);
        if (objComWebBrowser == null) return;
        objComWebBrowser.GetType().InvokeMember(
            "Silent", BindingFlags.SetProperty, null, objComWebBrowser,
            new object[] { true });
        Console.WriteLine("Navigated");
    }
}

The ads are embedded as iframes within the page you linked. In my case, the ad URL loaded in the iframe is something like https://cdn.254a.com/images/hosted/elv/retargeting/v5/728x90.html?... (check with the browser's inspector tool).
The ad probably does not allow itself to be framed by your page (check what the ad returns in the X-Frame-Options response header). If that is the issue, it should be possible to put a proxy in front of the ad and let the proxy change the X-Frame-Options header.
If the ad URL is https (and not just http), the proxy would have to act as a man-in-the-middle; see the accepted answer to "What's the point of the X-Frame-Options header?". Alternatively, you could replace the ad URL with your proxy's URL and pass the original URL as a query argument: the proxy acts as the HTTPS client, fetches the content, modifies the header, and returns the content to your page over plain HTTP.
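To check the header mentioned above from code rather than from the inspector, something along these lines might do. It is only a sketch: the class and method names (FrameHeaderCheck, GetFrameOptions, BlocksCrossOriginFraming, IsBlockedAsync) are made up for illustration, and it looks at X-Frame-Options only.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class FrameHeaderCheck
{
    // Returns the X-Frame-Options value(s) of a response, or null when the header is absent.
    public static string GetFrameOptions(HttpResponseMessage response)
    {
        IEnumerable<string> values;
        return response.Headers.TryGetValues("X-Frame-Options", out values)
            ? string.Join(", ", values)
            : null;
    }

    // DENY and SAMEORIGIN both stop a page on a different origin from framing the content.
    public static bool BlocksCrossOriginFraming(string frameOptions)
    {
        if (string.IsNullOrWhiteSpace(frameOptions)) return false;
        string value = frameOptions.Trim();
        return value.Equals("DENY", StringComparison.OrdinalIgnoreCase)
            || value.Equals("SAMEORIGIN", StringComparison.OrdinalIgnoreCase);
    }

    // Fetches the given URL and reports whether its X-Frame-Options header forbids framing.
    public static async Task<bool> IsBlockedAsync(string adUrl)
    {
        using (var client = new HttpClient())
        using (var response = await client.GetAsync(adUrl))
        {
            return BlocksCrossOriginFraming(GetFrameOptions(response));
        }
    }
}
```

A Content-Security-Policy header with a frame-ancestors directive can have the same effect, so a false result here does not prove the ad can be framed.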

You can use Html Agility Pack (http://html-agility-pack.net) to manipulate the HTML code in C#.
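As a rough illustration of that approach, the sketch below loads an HTML string, removes every iframe element, and returns the modified markup. It assumes the HtmlAgilityPack NuGet package is referenced; the HtmlCleaner class and RemoveIframes method are hypothetical names, and removing iframes is just an example of a modification.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, shown only to illustrate the library calls.
public static class HtmlCleaner
{
    // Parses the markup, removes all <iframe> elements, and returns the resulting HTML.
    public static string RemoveIframes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var frames = doc.DocumentNode.SelectNodes("//iframe");
        if (frames != null)
        {
            // Copy to a list first so removal does not disturb the enumeration.
            foreach (var frame in frames.ToList())
            {
                frame.Remove();
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string could then be passed to wb.NavigateToString. Note that this works on the markup as downloaded; it does not run the page's scripts.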

Related

Change font of html content converted to string using web browser control

I am using this to convert HTML content to XAML, but I need to change the font size of the content. So I am trying to use this to change the font size, but I am getting doc as null. Any idea why?
Here's my code:
public static void DocumentPropertyChanged(DependencyObject target, DependencyPropertyChangedEventArgs e)
{
    WebBrowser browser = target as WebBrowser;
    var doc = browser.Document as HTMLDocument;
    if (browser != null)
    {
        string document = e.NewValue as string;
        browser.NavigateToString(document);
    }
    if (doc != null)
    {
        doc.execCommand("FontSize", false, 12);
        doc.execCommand("FontFamily", false, "Arial");
    }
}

Try this:
public static void DocumentPropertyChanged(DependencyObject target, DependencyPropertyChangedEventArgs e)
{
    if (!(target is WebBrowser)) // Handles null and other weird things.
        throw new Exception("target is not a WebBrowser!");
    WebBrowser browser = target as WebBrowser;
    string document = e.NewValue as string;
    if (document == null)
        throw new Exception("e.NewValue is not a string!");
    browser.NavigateToString(document);
    var doc = browser.Document as HTMLDocument;
    if (doc != null)
    {
        doc.execCommand("FontSize", false, 12);
        doc.execCommand("FontFamily", false, "Arial");
    }
    else
    {
        throw new Exception("browser.Document is not an HTMLDocument!");
    }
}
I think this requires a reference to Microsoft.mshtml.dll and a using mshtml; statement, if you haven't done that yet.
I've added all of the throws, because I'm not 100% confident where the problem is, having not run this code.
So for what it's worth, I hope this helps.
Edit...
The documentation states that NavigateToString loads the content asynchronously.
Assigning doc after the navigation, as in the code above, appears to work for very short documents, but that cannot be relied on. Assigning doc before navigation does not work.
A better solution is to handle the WebBrowser.LoadCompleted event, which fires once the content has finished loading, before interacting with the WebBrowser.Document property:
XAML:
<WebBrowser Name="browser" LoadCompleted="Browser_LoadCompleted"/>
CS:
public static void DocumentPropertyChanged(DependencyObject target, DependencyPropertyChangedEventArgs e)
{
    WebBrowser browser = target as WebBrowser;
    if (browser != null)
    {
        string document = e.NewValue as string;
        browser.NavigateToString(document);
    }
}
private void Browser_LoadCompleted(object sender, NavigationEventArgs e)
{
    // Use the control that raised the event rather than relying on a field name.
    var doc = ((WebBrowser)sender).Document as HTMLDocument;
    if (doc != null)
    {
        doc.execCommand("FontSize", false, 12);
        doc.execCommand("FontFamily", false, "Arial");
    }
}
Note: This code will run your execCommand calls (which I have not tested, by the way) on every document loaded into the WebBrowser. If that is a problem, it can be fixed.

Cannot get rendered html via WebBrowser

I want to get the HTML code of a website. In a browser I can usually just click "View Page Source" in the context menu or something similar. But how can I automate it? I've tried the WebBrowser class, but sometimes it doesn't work. I am not a web developer, so I don't really know whether my approach even makes sense. I think the main problem is that I sometimes get HTML in which not all of the page's scripts have run yet, so it is incomplete. I have this problem with, for example, this site: http://www.sreality.cz/en/search/for-sale/praha
My code (I've tried to keep it small but runnable on its own):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Windows.Forms;
namespace WebBrowserForm
{
    internal static class Program
    {
        [STAThread]
        private static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            for (int i = 0; i < 10; i++)
            {
                Form1 f = new Form1();
                f.ShowDialog();
            }
            // Now I can check Form1.List and see that some html is final and some is not
        }
    }
    public class Form1 : Form
    {
        public static List<string> List = new List<string>();
        private const string Url = "http://www.sreality.cz/en/search/for-sale/praha";
        private System.Windows.Forms.WebBrowser webBrowser1;
        public Form1()
        {
            this.webBrowser1 = new System.Windows.Forms.WebBrowser();
            this.SuspendLayout();
            this.webBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
            this.webBrowser1.Name = "webBrowser1";
            this.webBrowser1.TabIndex = 0;
            this.ResumeLayout(false);
            Load += new EventHandler(Form1_Load);
            this.webBrowser1.ObjectForScripting = new MyScript();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Navigate(Url);
            webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
        }
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
            {
                // Final html for 99% of web pages, but unfortunately not for all
                string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
                webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
                Application.DoEvents();
                webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
                Application.DoEvents();
            }
        }
        [ComVisible(true)]
        public class MyScript
        {
            public void CallServerSideCode()
            {
                HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
                string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
                // here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
                List.Add(renderedHtml);
                ((Form1)Application.OpenForms[0]).Close();
            }
        }
    }
}
I would expect to be able to get the final HTML in the webBrowser1_DocumentCompleted method. It usually works, but not with this site. So I tried to get the HTML from my own code, called from within the page: the CallServerSideCode method. What is strange is that sometimes I get the final HTML (basically the same as when I do it manually in a browser) and sometimes I don't. I think the problem is that my script starts before the whole page has been rendered instead of after. But I am not really sure, since this kind of thing is far from my comfort zone and I don't really understand what I am doing; I'm just trying to apply something I found on the internet.
So, does anyone know what is wrong with the code? Or, even more importantly, how to easily get the final HTML of the site?
Any help appreciated.

You should use the WebClient class to download the HTML page. No display control is necessary.
The method you want is DownloadString.
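A minimal sketch of that suggestion follows. PageDownloader and DownloadHtml are hypothetical names, and the UTF-8 encoding is an assumption about the page.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical wrapper around WebClient.DownloadString.
public static class PageDownloader
{
    // Downloads the resource at the given address and returns it as text.
    public static string DownloadHtml(Uri address)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // assumption: the page is served as UTF-8
            return client.DownloadString(address);
        }
    }
}
```

Usage would be something like PageDownloader.DownloadHtml(new Uri("http://www.sreality.cz/en/search/for-sale/praha")). Keep in mind that WebClient returns the markup exactly as the server sends it and does not execute scripts, so content that the page builds with JavaScript will not be in the result.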

It may help to append the call to your external function at the end of the body and wrap it in jQuery's DOM-ready function. I mean something like this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
    {
        // Final html for 99% of web pages, but unfortunately not for all
        string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
        webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
        // Append a script element that calls back into the host once the DOM is ready
        HtmlElement body = webBrowser1.Document.GetElementsByTagName("body")[0];
        HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
        IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
        element.text = "$(function() { window.external.CallServerSideCode(); });";
        body.AppendChild(scriptEl);
    }
}
[ComVisible(true)]
public class MyScript
{
    public void CallServerSideCode()
    {
        HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
        string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
        // here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
        List.Add(renderedHtml);
        ((Form1)Application.OpenForms[0]).Close();
    }
}

WebClient on Store Universal Apps

I'm using this code in a Windows desktop app to get the values of a combobox; I then need to select one of them, which updates the page with new information using JavaScript:
private WebBrowser withEventsField_wb;
WebBrowser wb {
    get { return withEventsField_wb; }
    set {
        if (withEventsField_wb != null) {
            withEventsField_wb.Navigated -= navigated;
        }
        withEventsField_wb = value;
        if (withEventsField_wb != null) {
            withEventsField_wb.Navigated += navigated;
        }
    }
}
private void Form1_Load(object sender, EventArgs e)
{
    wb = new WebBrowser();
    wb.Navigate("https://academicos.ubi.pt/online/horarios.aspx?p=a");
}
// The handler must match the WebBrowserNavigatedEventHandler signature.
private void navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    HtmlElementCollection allelements = wb.Document.All;
    HtmlElement year = default(HtmlElement);
    foreach (HtmlElement webpageelement in allelements) {
        if (webpageelement.GetAttribute("id").Contains("ContentPlaceHolder1_ddlAnoLect")) {
            year = webpageelement;
            HtmlElementCollection yoptions = year.Children;
            foreach (HtmlElement yopt in yoptions) {
                ComboBox1.Items.Add(yopt.InnerText);
            }
        }
    }
}
Now I'm trying to do the same in a Universal App (Windows Phone/Windows), but I can't get it to work. I know that I have to use HttpClient, but it does not behave like a WebBrowser. The web browser here is created only in code, to fetch the data I need, and for each step of data I retrieve the website does not do a normal page refresh but uses jQuery to load the new information.
Any help?

Well, after a lot of searching I found something that helps and even gave me another idea:
http://blog.gauravchouhan.com/tag/advance-web-scraping-using-c/

Ajax toolkit file upload is not called

I have two AjaxFileUpload controls (AJAX Control Toolkit) on the same page:
<ajaxToolkit:AjaxFileUpload
    id="AjaxFileUpload1"
    AllowedFileTypes="jpg,jpeg,gif,png"
    OnUploadComplete="ajaxUpload2_OnUploadComplete"
    runat="server" />
<ajaxToolkit:AjaxFileUpload
    id="ajaxUpload1"
    AllowedFileTypes="jpg,jpeg,gif,png"
    OnUploadComplete="ajaxUpload1_OnUploadComplete"
    runat="server" />
and the code-behind:
protected void ajaxUpload2_OnUploadComplete(object sender, AjaxControlToolkit.AjaxFileUploadEventArgs e)
{
    string filePath = "~/Images/" + e.FileName;
    filePath = filePath.Split('\\').Last();
    Session["img2"] = filePath.ToString();
    AjaxFileUpload1.SaveAs(MapPath(filePath));
}
protected void ajaxUpload1_OnUploadComplete(object sender, AjaxControlToolkit.AjaxFileUploadEventArgs e)
{
    string filePath = "~/Images/" + e.FileName;
    filePath = filePath.Split('\\').Last();
    Session["img1"] = filePath.ToString();
    ajaxUpload1.SaveAs(MapPath(filePath));
}
The problem: whenever I upload with AjaxFileUpload1, it works and calls ajaxUpload2_OnUploadComplete. But if I upload with ajaxUpload1, ajaxUpload2_OnUploadComplete is called again and ajaxUpload1_OnUploadComplete is never called.
Why?
Thanks.

We ran into the same problem yesterday and found out that you cannot have more than one instance of AjaxFileUpload on the same page.
If you look at the source code, you'll see that this control uses a constant GUID to identify its events. Since the GUID is a constant, all instances of AjaxFileUpload use the same GUID...
Result:
the first instance swallows all the events...
Here is the GUID in action:
private const string ContextKey = "{DA8BEDC8-B952-4d5d-8CC2-59FE922E2923}";
(...)
if (this.Page.Request.QueryString["contextkey"] == ContextKey && this.Page.Request.Files.Count > 0)

We customized the September 2012 toolkit as follows. Hopefully this is only a temporary workaround and the issue will be fixed in a future release:
OLD
private const string ContextKey = "{DA8BEDC8-B952-4d5d-8CC2-59FE922E2923}";
NEW
private string ContextKey = "";
OLD
public AjaxFileUpload()
    : base(true, HtmlTextWriterTag.Div)
{
}
NEW
public AjaxFileUpload()
    : base(true, HtmlTextWriterTag.Div)
{
    // Give each instance created during this request its own context key (1, 2, 3, ...)
    if (HttpContext.Current.Items["lastAjaxFileUploadContextKey"] == null)
    {
        HttpContext.Current.Items["lastAjaxFileUploadContextKey"] = 1;
    }
    else
    {
        HttpContext.Current.Items["lastAjaxFileUploadContextKey"] = (int)HttpContext.Current.Items["lastAjaxFileUploadContextKey"] + 1;
    }
    ContextKey = HttpContext.Current.Items["lastAjaxFileUploadContextKey"].ToString();
}

There actually is a way to use multiple AjaxFileUpload controls on a single page, with each control firing its own event. The solution is very simple; it involves overriding one of Microsoft's client-side functions for the AjaxFileUpload control to inject information on the control that actually caused the upload complete event, then using a single event handler for all of the AjaxFileUpload controls as a "switchboard", which will subsequently fire the correct event handler for the control which created the event server-side.
Here's how to do it:
Add this script block somewhere after the head element of your page. If you're using master pages, put this in a placeholder for HTML content:
<script type="text/javascript">
    Sys.Extended.UI.AjaxFileUpload.Control.prototype.doneAndUploadNextFile = function (c) {
        var a = new XMLHttpRequest(), b = this;
        a.open("POST", "?contextKey=" + this._contextKey + "&done=1&guid=" + c._id + "&uplCtrlID=" + b.get_id(), true);
        a.onreadystatechange = function () {
            if (a.readyState == 4) {
                if (a.status == 200) {
                    b.raiseUploadComplete(Sys.Serialization.JavaScriptSerializer.deserialize(a.responseText));
                    b._processor.startUpload();
                }
                else {
                    b.setFileStatus(c, "error", Sys.Extended.UI.Resources.AjaxFileUpload_error);
                    b.raiseUploadError(a);
                    throw "error raising upload complete event and start new upload";
                }
            }
        };
        a.send(null);
    }
</script>
This code is the same function being used to call your page and trigger the UploadComplete event, only modified to add an extra parameter - uplCtrlID - which will contain the ID of the control that REALLY caused the event.
Set up your server side code as follows:
//set the OnUploadComplete property on all of your AjaxFileUpload controls to this method
protected void anyUploader_UploadComplete(object sender, AjaxFileUploadEventArgs e)
{
    //call the correct upload complete handler if possible
    if (Request.QueryString["uplCtrlID"] != null)
    {
        //uplCtrlID (the query string param we injected with the overridden JS function)
        //contains the ID of the uploader.
        //We'll use that to fire the appropriate event handler...
        if (Request.QueryString["uplCtrlID"] == FileUploaderA.ClientID)
            FileUploaderA_UploadComplete(FileUploaderA, e);
        else if (Request.QueryString["uplCtrlID"] == FileUploaderB.ClientID)
            FileUploaderB_UploadComplete(FileUploaderB, e);
        //etc (or use a switch block - whatever suits you)
    }
}
protected void FileUploaderA_UploadComplete(AjaxFileUpload sender, AjaxFileUploadEventArgs e)
{
    //logic here
}
protected void FileUploaderB_UploadComplete(AjaxFileUpload sender, AjaxFileUploadEventArgs e)
{
    //logic here
}
You're all set. Multiple AjaxFileUpload controls on the same page, no problems.

HttpModule - get HTML content or controls for modifications

I tried something like this:
HttpApplication app = s as HttpApplication; //s is sender of the OnBeginRequest event
System.Web.UI.Page p = (System.Web.UI.Page)app.Context.Handler;
System.Web.UI.WebControls.Label lbl = new System.Web.UI.WebControls.Label();
lbl.Text = "TEST TEST TEST";
p.Controls.Add(lbl);
When running this I get "Object reference not set to an instance of an object." on the last line.
How do I insert two lines of text (ASP.NET/HTML) at specific locations in the original file?
And how do I figure out the extension of the requested file? (I only want to apply this to .aspx files.)

It's simpler than you think:
public void Init(HttpApplication app)
{
    app.PreRequestHandlerExecute += OnPreRequestHandlerExecute;
}
private void OnPreRequestHandlerExecute(object sender, EventArgs args)
{
    HttpApplication app = sender as HttpApplication;
    if (app != null)
    {
        // The handler is only a Page for .aspx requests; for anything else the cast yields null
        Page page = app.Context.Handler as Page;
        if (page != null)
        {
            page.PreRender += OnPreRender;
        }
    }
}
private void OnPreRender(object sender, EventArgs args)
{
    Page page = sender as Page;
    if (page != null)
    {
        page.Controls.Clear(); // Or do whatever you want with your page...
    }
}
If the PreRender event isn't sufficient, you can subscribe to whatever page event you need in the PreRequestHandlerExecute event handler.

I'm not sure, but I don't think you can use an HttpModule to alter the Page's control tree (please correct me if I'm wrong). You CAN modify the HTML markup, however; you'll have to write a "response filter" for this. For an example, see http://aspnetresources.com/articles/HttpFilters.aspx, or google for "httpmodule response filter".
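For readers who have not seen one, a response filter is simply a Stream that ASP.NET writes the rendered output into. The sketch below is not taken from the linked article: InsertBeforeBodyEndFilter is a hypothetical class that buffers everything written to it and, when closed, inserts a snippet before the last closing body tag. It assumes the whole response fits in memory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical filter: buffers the response, then inserts a snippet before </body>.
public class InsertBeforeBodyEndFilter : Stream
{
    private readonly Stream _inner;                 // the stream the real output goes to
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly Encoding _encoding;
    private readonly string _snippet;
    private bool _closed;

    public InsertBeforeBodyEndFilter(Stream inner, Encoding encoding, string snippet)
    {
        _inner = inner;
        _encoding = encoding;
        _snippet = snippet;
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    // Output may arrive in several chunks, so collect it all before rewriting.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _buffer.Write(buffer, offset, count);
    }

    // Nothing is passed on until Close, so there is nothing to flush here.
    public override void Flush() { }

    // Rewrites the buffered markup once and forwards it to the inner stream.
    public override void Close()
    {
        if (_closed) return;
        _closed = true;

        string html = _encoding.GetString(_buffer.ToArray());
        int index = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
        if (index >= 0)
        {
            html = html.Insert(index, _snippet);
        }
        byte[] output = _encoding.GetBytes(html);
        _inner.Write(output, 0, output.Length);
        _inner.Flush();
        _inner.Close();
        base.Close();
    }
}
```

In a module it would be installed by assigning it to the response, for instance app.Response.Filter = new InsertBeforeBodyEndFilter(app.Response.Filter, app.Response.ContentEncoding, "<p>TEST</p>"); which event handler is the best place for that assignment depends on the application.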

It seems like the HttpFilter solution is doing the trick here :o)
If I had used MOSS/.NET 2.x+ I could have used Rune's version or just added my tags in a master page...
Great suggestions. After testing the solution, I'll accept miies.myopenid.com's answer, as it seems to solve the actual issue.

There have been some changes in how you write HttpModules in IIS7 as compared to IIS6 or 5, so it might be that my suggestion is not valid if you are using IIS7.
If you use the static Current property of HttpContext you can get a reference to the current context. The HttpContext class has properties for both the Request (HttpRequest type) and the Response (HttpResponse type), and depending on which event you are handling (Application.EndRequest, maybe?) you can perform various actions on these objects.
If you want to change the content of the page being delivered, you will probably want to do this as late as possible, so the EndRequest event is probably the best place for it.
You can check which file type was requested by looking at the Request.Url property, perhaps together with the System.IO.Path class. Try something like this:
string requestPath = HttpContext.Current.Request.Url.AbsolutePath;
string extension = System.IO.Path.GetExtension(requestPath);
// Compare case-insensitively so that "Default.ASPX" also matches
bool isAspx = extension.Equals(".aspx", StringComparison.OrdinalIgnoreCase);
Modifying the content is harder. You may be able to do it in one of the events of the Context object, but I am not sure.
One possible approach is to write your own custom Page-derived class that checks for a value in the Context.Items collection. If the value is found, you add a Label to a PlaceHolder control and set the text of the label to whatever you want.
Something like this should work:
Add the following code to a class that implements IHttpModule:
public void Init(HttpApplication context)
{
    context.BeginRequest += new EventHandler(BeginRequest);
}
void BeginRequest(object sender, EventArgs e)
{
    HttpContext context = HttpContext.Current;
    HttpRequest request = context.Request;
    string requestPath = request.Url.AbsolutePath;
    string extension = System.IO.Path.GetExtension(requestPath);
    // Compare case-insensitively so that "Default.ASPX" also matches
    bool isAspx = extension.Equals(".aspx", StringComparison.OrdinalIgnoreCase);
    if (isAspx)
    {
        // Add whatever you need of custom logic for adding the content here
        context.Items["custom"] = "anything here";
    }
}
Then add the following class to the App_Code folder:
using System;
using System.Web.UI.WebControls;
public class CustomPage : System.Web.UI.Page
{
    public CustomPage()
    { }
    protected override void OnPreRender(EventArgs e)
    {
        base.OnPreRender(e);
        // Nothing to add unless the module flagged this request
        if (Context.Items["custom"] == null)
        {
            return;
        }
        PlaceHolder placeHolder = this.FindControl("pp") as PlaceHolder;
        if (placeHolder == null)
        {
            return;
        }
        Label addedContent = new Label();
        addedContent.Text = Context.Items["custom"].ToString();
        placeHolder.Controls.Add(addedContent);
    }
}
Then modify your pages like this:
public partial class _Default : CustomPage
Note that the inheritance is changed from System.Web.UI.Page to CustomPage.
And finally, add PlaceHolder controls to your aspx files wherever you want your custom content.
