WebBrowser - empty DocumentText - c#

I'm trying to use the WebBrowser class, but of course it doesn't work.
My code:
WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");
while(browser.DocumentText == "")
{
continue;
}
string html = browser.DocumentText;
browser.DocumentText is always "". Why?

You should use the DocumentCompleted event, and if you don't have a Windows Forms application, you might also need an ApplicationContext so that a message loop runs.
using System;
using System.Windows.Forms;
static class Program
{
[STAThread]
static void Main()
{
Context ctx = new Context();
Application.Run(ctx); // pumps messages until ExitThread() is called
Console.WriteLine(ctx.Html); // the downloaded HTML
}
}
class Context : ApplicationContext
{
// keep a reference so the browser stays alive while the message loop runs
private readonly WebBrowser browser;
public string Html { get; set; }
public Context()
{
browser = new WebBrowser();
browser.AllowNavigation = true;
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
browser.Navigate("http://www.google.com");
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Html = ((WebBrowser)sender).DocumentText;
this.ExitThread();
}
}

The WebBrowser isn't going to do its job until the current thread finishes its work and gets back to processing messages. If you change your code to something like this:
WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");
browser.Navigated += (s, e) =>
{
var html = browser.DocumentText;
};
The variable will be set.
But, as others have mentioned, DocumentCompleted is a better event to attach to, because by the time it fires the entire document has loaded (hence the name).
WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");
browser.DocumentCompleted += (s, e) =>
{
var html = browser.DocumentText;
html.ToString();
};
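The original while loop never finishes because it spins on the thread that owns the browser without ever processing Windows messages, so the browser cannot make progress. As a minimal sketch of that point, here is a polling version that does pump messages. FetchByPolling is a hypothetical helper name, it assumes it is called on an STA thread, and it relies on Application.DoEvents, which is generally discouraged in favour of the DocumentCompleted event used in these answers.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class PollingFetch
{
    // Hypothetical helper. Must run on an STA thread; returns null on timeout.
    public static string FetchByPolling(string url, TimeSpan timeout)
    {
        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;
            browser.Navigate(url);

            DateTime deadline = DateTime.UtcNow + timeout;
            // Unlike the original loop, each iteration lets pending window
            // messages run, which is what allows the page to load.
            while (browser.ReadyState != WebBrowserReadyState.Complete)
            {
                if (DateTime.UtcNow > deadline)
                    return null;
                Application.DoEvents();
                Thread.Sleep(50);
            }
            return browser.DocumentText;
        }
    }
}
```

Called from a [STAThread] Main, PollingFetch.FetchByPolling("http://www.google.com", TimeSpan.FromSeconds(30)) returns the page's DocumentText, or null if loading did not finish within 30 seconds.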

Attach to the DocumentCompleted event; the code is as follows:
browser.DocumentCompleted += (s, e) =>
{
string html = browser.DocumentText;
};

If you need the DocumentText, you should handle the DocumentCompleted event:
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
The handler looks like this:
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
string text = wb.DocumentText;
}

Try something like this:
string address = "http://www.google.com/";
string url = address;
// add a scheme if the address doesn't already have one
if (!url.StartsWith("http://") && !url.StartsWith("https://"))
{
url = "http://" + url;
}
browser.Navigate(new Uri(url));
Replace the corresponding code in your while loop where necessary.
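A slightly more robust variant of the same prefix check, as a sketch: NormalizeUrl is a hypothetical helper that also trims whitespace, compares the scheme case-insensitively, and returns a Uri. The Uri constructor throws UriFormatException for malformed input, so bad addresses fail early instead of inside Navigate.

```csharp
using System;

static class UrlHelper
{
    // Hypothetical helper: prepends "http://" when the address has no http/https scheme.
    public static Uri NormalizeUrl(string address)
    {
        if (address == null) throw new ArgumentNullException(nameof(address));

        string url = address.Trim();
        if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            url = "http://" + url;
        }
        // Throws UriFormatException if the result is still not a valid absolute URI.
        return new Uri(url);
    }
}
```

Usage: browser.Navigate(UrlHelper.NormalizeUrl("www.google.com")); navigates to http://www.google.com/.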

Related

Running a Webbrowser thread in a task

I have a program that starts two long-running tasks. One of the tasks is a web scraper in which I have to use the WebBrowser ActiveX control so that I can get the rendered page. In order to use the control I have to start a thread so that I can set the apartment state for the message loop. When I do this, the program works fine, at least for the first page that is fetched. On subsequent pages/calls, the WebBrowser times out and its state seems to remain at "uninitialized". In tracing my code, I never see HandleDestroyed fire for the WebBrowser.
What do I need to do to properly destroy the WebBrowser control and/or my own class so that it can be reused?
public static string AXFetch(string url, string ua)
{
TestBrowser TB = new TestBrowser();
Thread th = new Thread(() => TB.MakeLiRequest(url,ua));
th.SetApartmentState(ApartmentState.STA);
th.Start();
th.Join(new TimeSpan(0, 0, 90)); //90 second timeout
string SiteData = TB.DocumentText;
TB = null;
return SiteData;
}
class TestBrowser
{
public string DocumentText = "";
private bool DocCompleted = false;
private const int PageTimeOut = 90; // seconds; value assumed, not shown in the question
public TestBrowser()
{
}
private void reset_fetch_status()
{
this.DocCompleted = false;
this.DocumentText = "";
}
public void MakeLiRequest(string url, string UA)
{
reset_fetch_status();
using (WebBrowser wb = new WebBrowser())
{
wb.Visible = false;
wb.AllowNavigation = true;
wb.ScriptErrorsSuppressed = true;
wb.DocumentCompleted += this.wb_DocumentCompleted;
wb.Navigate(url, "_self", null, "User-Agent: " + UA + "\r\n");
WaitForPage();
wb.Url = null;
wb.DocumentCompleted -= this.wb_DocumentCompleted;
}
}
private void HandleDestroyed(Object sender, EventArgs e)
{
// This never seems to fire, and I don't know why
Logging.DoLog("You are in the Control.HandleDestroyed event.");
}
private bool WaitForPage()
{
int timer = 0;
while (this.DocCompleted == false)
{
Application.DoEvents();
System.Threading.Thread.Sleep(100);
++timer;
if (timer == (PageTimeOut * 10))
{
Console.WriteLine("WebBrowser Timeout has been reached");
Application.Exit();
return false;
}
}
return true;
}
private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
if (wb.ReadyState == WebBrowserReadyState.Complete)
{
this.DocumentText = wb.DocumentText;
this.DocCompleted = true;
}
}
}
OnHandleDestroyed is a protected method, so it can only be called from within the parent form's own class.
If you try to call it on the WebBrowser control, you get this error:
Error 1 Cannot access protected member
'System.Windows.Forms.Control.OnHandleDestroyed(System.EventArgs)' via a
qualifier of type 'System.Windows.Forms.WebBrowser'; the qualifier must be of type 'stackoverflowpost47036339.Form1' (or derived from it)
Also, you are not hooking the handler up. And since you haven't given your WebBrowser a parent form, it can't be called. This is how you would hook it up to the parent form:
// wherever form1 is created:
form1.HandleDestroyed += Form1_HandleDestroyed;
void Form1_HandleDestroyed(object sender, EventArgs e)
{
// runs when form1's window handle is destroyed
}

FinishLoadingFrameEvent event is called once when loading multiple urls

I have a list of URLs in a loop, loading one URL at a time, but the FinishLoadingFrameEvent is raised only once.
My complete code looks like this:
private List<string> urls = //fetch from db;
ManualResetEvent waitEvent = new ManualResetEvent(false);
BrowserView webView = new WPFBrowserView();
string path = //my local path;
public MainWindow()
{
InitializeComponent();
mainLayout.Children.Add((UIElement)webView.GetComponent());
webView.Browser.FinishLoadingFrameEvent += delegate (object sender, FinishLoadingEventArgs e)
{
System.Threading.Thread.Sleep(5000);
if (e.IsMainFrame)
{
DOMDocument document = e.Browser.GetDocument();
var html = document.DocumentElement.InnerHTML;
System.IO.File.WriteAllText(path, html);
waitEvent.Set();
}
};
foreach (var url in urls)
{
webView.Browser.LoadURL(url);
waitEvent.WaitOne();
waitEvent.Reset();
}
}
Am I missing something?
Your code seems to work as expected for my set of URLs.
Here is the complete sample code with all the modifications:
public partial class MainWindow : Window
{
private List<string> urls = new List<string>
{ "google.com", "microsoft.com", "teamdev.com", "teamdev.com/dotnetbrowser" };
ManualResetEvent waitEvent = new ManualResetEvent(false);
BrowserView webView = new WPFBrowserView();
string path = "html.txt";
public MainWindow()
{
InitializeComponent();
mainLayout.Children.Add((UIElement)webView.GetComponent());
webView.Browser.FinishLoadingFrameEvent += delegate (object sender,
FinishLoadingEventArgs e)
{
//System.Threading.Thread.Sleep(5000);
if (e.IsMainFrame)
{
DOMDocument document = e.Browser.GetDocument();
var html = document.DocumentElement.InnerHTML;
System.IO.File.WriteAllText(path, html);
waitEvent.Set();
}
};
foreach (var url in urls)
{
Debug.WriteLine($"Loading {url}");
webView.Browser.LoadURL(url);
waitEvent.WaitOne();
Debug.WriteLine($"{url} loaded");
waitEvent.Reset();
}
}
}
Note that I have commented out the Thread.Sleep call in the event handler. Uncommenting it just makes everything run much slower; it still works.

.NET WebBrowser - get HTML element ID by clicking on an image

I have a C# WinForms project with a WebBrowser control. I'm loading an HTML page with images into the WebBrowser. Each image has a different ID:
<img src="F:\Temp\file12948.jpg" id="12948" width="180px">
Is there a way to pass the ID into a variable when clicking on the image so I can use the ID in my code? The path to the image can also be used as I can extract the number from there.
I have already searched here there and everywhere for a solution but can't find anything related.
You can dynamically attach a handler to each image's onclick event:
using System.Linq; // needed for OfType<HtmlElement>()
using System.Windows.Forms;
public class TestForm : Form
{
WebBrowser _WebBrowser = null;
public TestForm()
{
_WebBrowser = new WebBrowser();
_WebBrowser.ScriptErrorsSuppressed = true;
_WebBrowser.Dock = DockStyle.Fill;
this.Controls.Add(_WebBrowser);
WebBrowserDocumentCompletedEventHandler Completed = null;
Completed = (s, e) =>
{
//add onclick event dynamically
foreach (var img in _WebBrowser.Document.GetElementsByTagName("img").OfType<HtmlElement>())
{
img.AttachEventHandler("onclick", (_, __) => OnClick(img));
}
_WebBrowser.DocumentCompleted -= Completed;
};
_WebBrowser.DocumentCompleted += Completed;
var imgurl = "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png";
//_WebBrowser.Navigate("http://edition.cnn.com/2017/09/09/us/hurricane-irma-cuba-florida/index.html");
_WebBrowser.DocumentText = $"<html> <img src='{imgurl}' id=123 /> </html>";
}
void OnClick(HtmlElement img)
{
MessageBox.Show(img.GetAttribute("id"));
}
}
One simple way is to use browser navigation. When the image is clicked, navigate to a special URL; then handle the Navigating event, and if the URL is the special URL, cancel the navigation and handle the data.
public MainWindow()
{
InitializeComponent();
// wrap the image in a link to the special "messages" host
br.NavigateToString(@"<a href=""http://messages/?id=12948""><img src=""F:\Temp\file12948.jpg"" id=""12948"" width=""180px"" ></a>");
br.Navigating += this.Br_Navigating;
}
private void Br_Navigating(object sender, NavigatingCancelEventArgs e)
{
if (e.Uri != null && e.Uri.Host == "messages") // e.Uri can be null for NavigateToString content
{
MessageBox.Show(e.Uri.Query);
e.Cancel = true;
}
}
This works if you have some control over the HTML. You could also set the URL from JS if you don't want to add the anchor.
Edit
The version above is for a WPF application. The WinForms version is as follows:
public Form1()
{
InitializeComponent();
webBrowser1.DocumentText = @"<a href=""http://messages/?id=12948""><img src=""F:\Temp\file12948.jpg"" id=""12948"" width=""180px"" ></a>";
webBrowser1.Navigating += this.webBrowser1_Navigating;
}
private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
if (e.Url.Host == "messages")
{
MessageBox.Show(e.Url.Query);
e.Cancel = true;
}
}
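If the special URL carries the ID as a query parameter, the Navigating handler needs to pull it back out. A minimal sketch, assuming the hypothetical format http://messages/?id=12948 (the "messages" host matches the handlers in this answer; the "id" parameter name and the ImageClickUrl class are illustrative assumptions):

```csharp
using System;

static class ImageClickUrl
{
    // Hypothetical format: http://messages/?id=<imageId>
    public const string Host = "messages";

    public static string BuildHref(string imageId) =>
        "http://" + Host + "/?id=" + Uri.EscapeDataString(imageId);

    // Returns true (and the id) only for our special "click" links.
    public static bool TryGetImageId(Uri url, out string imageId)
    {
        imageId = null;
        if (url == null || !string.Equals(url.Host, Host, StringComparison.OrdinalIgnoreCase))
            return false;

        // Query looks like "?id=12948"; split it into key=value pairs.
        foreach (string pair in url.Query.TrimStart('?').Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq > 0 && pair.Substring(0, eq) == "id")
            {
                string value = Uri.UnescapeDataString(pair.Substring(eq + 1));
                if (value.Length == 0) return false;
                imageId = value;
                return true;
            }
        }
        return false;
    }
}
```

In the WinForms handler this becomes: if (ImageClickUrl.TryGetImageId(e.Url, out string id)) { e.Cancel = true; MessageBox.Show(id); }. Any other navigation, including the initial about:blank one, falls through untouched.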

C# using webbrowser documenttext, document stays null

I hope the title was clear enough, but I will try to explain...
I'm using C# WinForms (.NET 4.5).
The thing is that I'm creating a WebBrowser control and trying to set its content with wb.DocumentText. But when I try to loop through the elements, it says that the document is empty (wb.Document is null).
Here's my code:
WebBrowser wb = new WebBrowser();
wb.DocumentText = leMessage;
HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
// Do Some Stuff
}
leMessage holds an HTML newsletter message and there are some a tags in it.
I've already tried this: wb.Document.Body.InnerHtml = leMessage; but that didn't work either...
What did I miss or do wrong?
Setting WebBrowser.DocumentText is asynchronous. You need to handle DocumentCompleted before you can access the DOM, and keep pumping Windows messages. Here's a complete example of web scraping that uses async/await to keep the convenient linear code flow. Just alter the navigation part:
// NavigateAsync, ct and timeout are defined in the linked example
await NavigateAsync(ct, () => this.webBrowser.DocumentText = leMessage, timeout);
HtmlElementCollection elems = this.webBrowser.Document.GetElementsByTagName("a");
This way you could do it in a loop. In a nutshell:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace WinformsApp2
{
public partial class MainForm : Form
{
public MainForm()
{
InitializeComponent();
}
const string leMessage = "<a href='http://example.com'>Go there</a>";
private async void MainForm_Load(object sender, EventArgs e)
{
var wb = new WebBrowser();
TaskCompletionSource<bool> tcs = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender2, e2) => tcs.TrySetResult(true);
for (int i = 0; i < 3; i++)
{
tcs = new TaskCompletionSource<bool>();
wb.DocumentCompleted += documentCompletedHandler;
try {
wb.DocumentText = leMessage;
await tcs.Task;
}
finally {
wb.DocumentCompleted -= documentCompletedHandler;
}
HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
Debug.Print(elem.OuterHtml);
}
}
}
}
}
You need to loop through the elements after the DocumentCompleted event has fired, so your code needs a handler like this:
webBrowser1.DocumentCompleted+=new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
private void webBrowser1_DocumentCompleted(object sender,WebBrowserDocumentCompletedEventArgs e)
{
//here you can to loop your elements
}
Try this:
WebBrowser wb;
private void Form1_Load(object sender, EventArgs e)
{
wb = new WebBrowser();
wb.DocumentCompleted += wb_DocumentCompleted;
wb.DocumentText = "<html><body><a href='#'>Test</a></body></html>";
}
void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElementCollection elems = ((WebBrowser)sender)
.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
// Do Some Stuff
}
}
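The await-based pattern from the first answer can also be packaged as a small reusable helper. This is a sketch, not the helper from the linked example: LoadHtmlAsync is a hypothetical name, and the timeout via Task.WhenAny and Task.Delay is an added assumption. It has to be awaited on the UI thread (for example from an async void Load handler) so that DocumentCompleted can be delivered.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

static class WebBrowserExtensions
{
    // Hypothetical helper: sets DocumentText and returns the DOM once DocumentCompleted
    // fires, or throws TimeoutException if it does not fire within the timeout.
    public static async Task<HtmlDocument> LoadHtmlAsync(this WebBrowser wb, string html, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = (s, e) => tcs.TrySetResult(true);

        wb.DocumentCompleted += handler;
        try
        {
            wb.DocumentText = html; // returns before the DOM is ready
            Task finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout));
            if (finished != tcs.Task)
                throw new TimeoutException("DocumentCompleted was not raised in time.");
        }
        finally
        {
            // Unsubscribe so repeated calls don't accumulate handlers.
            wb.DocumentCompleted -= handler;
        }
        return wb.Document; // now safe to query
    }
}
```

Usage inside an async event handler: HtmlDocument doc = await wb.LoadHtmlAsync(leMessage, TimeSpan.FromSeconds(10)); after which doc.GetElementsByTagName("a") can be looped as in the question.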

C#: Stop loop until an event

I have a for loop, and inside it there is a Navigate call for a browser. It's supposed to load different sites, but the problem is that it starts loading one site and, before that site has finished loading, starts loading the next one. So I need to pause the loop until the page has finished loading.
I started writing a handler for when the ProgressChanged event reaches 100%, but then realized I have no idea what to do next. I think it's a start, though.
Please help, Thanks!
Edit: I am using Forms as Roland said.
I assume you are doing Windows Forms programming. The event you want is DocumentCompleted. Here's an example:
public Uri MyURI { get; set; }
public Form1()
{
InitializeComponent();
MyURI = new Uri("http://stackoverflow.com");
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
webBrowser1.Url = MyURI;
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if(e.Url == MyURI)
MessageBox.Show("Page Loaded");
}
For a list of URIs it's straightforward:
public int CurrentIndex = 0;
List<Uri> Uris;
public Form1()
{
InitializeComponent();
Uris = new List<Uri> { new Uri("http://stackoverflow.com"), new Uri("http://google.com/") };
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
webBrowser1.Url = Uris[CurrentIndex];
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = (WebBrowser)sender;
if (e.Url == Uris[CurrentIndex])
{
CurrentIndex++;
if (CurrentIndex < Uris.Count)
{
browser.Url = Uris[CurrentIndex];
}
}
}
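With .NET 4.5 async/await, the same sequencing can be written as the ordinary loop the question asked for, by awaiting a task that completes on DocumentCompleted. A minimal sketch: NavigateAndWaitAsync and LoadAllAsync are hypothetical names, and the timeout is an added assumption. Instead of comparing e.Url with the requested URI as the answer does, it compares e.Url with the browser's own Url. This is a common heuristic for skipping frame completions that also tolerates redirects, but it is an assumption, not a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

static class SequentialNavigation
{
    // Hypothetical helper: completes when the top-level document has loaded.
    public static async Task NavigateAndWaitAsync(WebBrowser wb, Uri uri, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = (s, e) =>
        {
            // Frames raise DocumentCompleted too; only the top-level
            // document's URL matches the browser's current Url.
            if (e.Url == wb.Url)
                tcs.TrySetResult(true);
        };

        wb.DocumentCompleted += handler;
        try
        {
            wb.Navigate(uri);
            if (await Task.WhenAny(tcs.Task, Task.Delay(timeout)) != tcs.Task)
                throw new TimeoutException("Timed out loading " + uri);
        }
        finally
        {
            wb.DocumentCompleted -= handler;
        }
    }

    // The loop only advances after the previous page has finished loading.
    public static async Task<List<string>> LoadAllAsync(WebBrowser wb, IEnumerable<Uri> uris, TimeSpan timeoutPerPage)
    {
        var pages = new List<string>();
        foreach (Uri uri in uris)
        {
            await NavigateAndWaitAsync(wb, uri, timeoutPerPage);
            pages.Add(wb.DocumentText);
        }
        return pages;
    }
}
```

For example, from an async void Form1_Load handler: var pages = await SequentialNavigation.LoadAllAsync(webBrowser1, Uris, TimeSpan.FromSeconds(30));. The UI stays responsive because each await returns control to the message loop.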
