I have a simple app I am developing that needs to iterate through a list of URLs, passing each one to a WebBrowser's Navigate method in a foreach loop. I was hoping to see the DocumentCompleted event fire after each call to Navigate, but it only seems to fire after the whole form has finished loading, and thus after the loop has completed.
I guess I am missing something fundamental here, but some help and advice would be great!
Thanks!
Here is a sample of code that I am trying...
This foreach loop runs in the Form Load event of the WinForms form I am using:
int id = 0;
foreach (DataRow row in quals.Rows)
{
    string URN = row["LAIM_REF"].ToString();
    string URN_formatted = URN.Replace("/", "_");
    string URL = "http://URL_I_AM_GOING_TOO/";
    string FullURL = URL + URN_formatted;
    wbrBrowser.ScriptErrorsSuppressed = true;
    wbrBrowser.Refresh();
    wbrBrowser.Navigate(FullURL);
    id += 1;
    label1.Text = id.ToString();
}
At the point the loop gets to the line:
wbrBrowser.Navigate(FullURL);
I was hoping that the event:
private void wbrBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
...
}
would fire, allowing me to run processing against each of the URLs visited in the loop.
Thanks!
I used:
while (wbrBackground.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }
after the Navigate function and it now works as expected.
Related
I have a for loop that changes the URL
for (int i = 1; i < max; i += 50)
{
    completed = false;
    string currkey = country;
    crawler.Navigate(new Uri("http://www.example.net/func.php?dom=" + currkey + "&key=&start=" + i));
    Console.WriteLine("Navigating to " + "http://www.example.net/func.php?dom=" + currkey + "&key=&start=" + i);
    while (!completed)
    {
        Application.DoEvents();
        Thread.Sleep(500);
    }
}
This is my DocumentCompleted handler:
crawler.Refresh();
Console.WriteLine("Getting universities");
getUniversities();
Console.WriteLine("Finished getting universities");
completed = true;
When I get rid of the for loop and use a single link, it navigates to the website correctly, but when I use the for loop to load the websites in order, the web browser seems to get stuck on the second iteration.
Example:
currkey = United States
In the first iteration, the website link will be http://www.example.net/func.php?dom="United States"&key=&start=1, and on the next one it will be http://www.example.net/func.php?dom="United States"&key=&start=51. The navigation gets stuck when trying to load the second link.
I have used the boolean completed to note that the current iteration is finished, but it is still stuck.
Any kind of help is appreciated
Your Thread.Sleep call is blocking the WebBrowser from continuing to load. What you should be doing is attaching to the DocumentCompleted event, and then loading the next page. Please don't use this while/sleep combination in WinForms - you should use the events that the controls expose.
Attach the event:
crawler.DocumentCompleted += CrawlerDocumentCompleted;
Event handler:
private void CrawlerDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//The document has loaded - now do something
}
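For the loop in the question, a compact sketch of that pattern (assuming the crawler control and that country and max are accessible where this runs) queues the URLs and lets the handler advance the crawl. Note that DocumentCompleted can fire once per frame, so guarding on e.Url is a common precaution:

// Sketch only: enqueue the URLs, then let DocumentCompleted drive the navigation.
private readonly Queue<Uri> pending = new Queue<Uri>();

private void StartCrawl()
{
    for (int i = 1; i < max; i += 50)
        pending.Enqueue(new Uri("http://www.example.net/func.php?dom=" + country + "&key=&start=" + i));

    crawler.DocumentCompleted += CrawlerDocumentCompleted;
    if (pending.Count > 0)
        crawler.Navigate(pending.Dequeue());
}

private void CrawlerDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url != crawler.Url) return;    // ignore frame completions

    getUniversities();                   // process the page that just loaded

    if (pending.Count > 0)
        crawler.Navigate(pending.Dequeue());
}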
A final thought
As it looks like you are implementing a crawler, why are you using the WebBrowser control in WinForms to navigate at all? Surely all you are interested in is the HTML that the server serves up? Or is the page using JavaScript to load additional elements into the DOM, requiring you to use the WebBrowser?
You could use the WebClient class and the DownloadString or DownloadStringAsync methods. See https://msdn.microsoft.com/en-us/library/fhd1f0sw(v=vs.110).aspx
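A minimal sketch of that approach, using the same hypothetical URL format as the question (WebClient lives in System.Net):

// Fetch the raw HTML for each page without a browser control.
using (var client = new WebClient())
{
    for (int i = 1; i < max; i += 50)
    {
        string url = "http://www.example.net/func.php?dom=" + country + "&key=&start=" + i;
        string html = client.DownloadString(url);
        // parse html here, e.g. extract the university rows
    }
}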
I'm having this weird problem within the application I'm currently working on.
string searchText = "onMouseOver=\"CallList_onMouseOver(this);\" id=\"";
List<int> searchOrders = AllIndexesOf(scraper.clientBrowser.DocumentText, searchText);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < searchOrders.Count; i++)
{
    string order = scraper.clientBrowser.DocumentText.Substring(searchOrders[i] + searchText.Length, 6);
    scraper.clientBrowser.Document.GetElementById(order).InvokeMember("Click");
    for (int j = 0; j < scraper.clientBrowser.Document.Window.Frames.Count; j++)
    {
        if (scraper.clientBrowser.Document.Window.Frames[j].Document != null && scraper.clientBrowser.Document.Window.Frames[j].Document.Body != null)
        {
            string orderText = scraper.clientBrowser.Document.Window.Frames[j].Document.Body.InnerText ?? "Nope";
            //MessageBox.Show(j + Environment.NewLine + orderText);
            if (!orderText.Contains("Nope"))
            {
                sb.AppendLine(orderText + Environment.NewLine);
            }
        }
    }
}
Clipboard.SetText(sb.ToString());
The thing is, whenever I uncomment the MessageBox.Show, I can clearly see that orderText is filled with a value other than "Nope", the StringBuilder gets filled, and the correct text is copied.
However, if I comment out the MessageBox.Show, the outcome of this loop is always "Nope". I'm stuck here; I have no idea what could cause something like this.
The scraper.clientBrowser is a System.Windows.Forms.WebBrowser.
Update:
I solved the issue by waiting for the document to be loaded; I created this mechanism:
public bool DocumentLoaded
{
    get { return documentLoaded; }
    set { documentLoaded = value; }
}

private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    this.DocumentLoaded = true;
    this.clientBrowser = sender as WebBrowser;
}

void clientBrowser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    this.DocumentLoaded = false;
}
Then in the class I'm using:
while (!scraper.DocumentLoaded)
{
    System.Threading.Thread.Sleep(100);
}
It sounds like you need to ensure that the page is fully loaded; there might be a race condition. I would suggest wiring up the WebBrowser.DocumentCompleted event, and then attempting your scraping logic.
Update
I overlooked this initially, but it certainly has something to do with your issue: the line where you invoke a click, scraper.clientBrowser.Document.GetElementById(order).InvokeMember("Click");. This is done inside the iteration, and it will more than likely manipulate the DOM, will it not? I suggest going about this problem entirely differently. What are you trying to achieve exactly (not how you're trying to do it)?
With this alone, I would suggest that you refer to this SO Q/A and look at how they're waiting for the click to finish.
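A rough sketch of that waiting pattern, assuming the click triggers a navigation or postback, the code runs on the UI thread, and the scraper's DocumentLoaded flag from the update above is reused:

// Hypothetical helper: click an element, then wait until DocumentCompleted
// has set DocumentLoaded again before reading the frames.
private void ClickAndWait(string elementId)
{
    scraper.DocumentLoaded = false;
    scraper.clientBrowser.Document.GetElementById(elementId).InvokeMember("Click");

    while (!scraper.DocumentLoaded)
    {
        Application.DoEvents();                 // keep the message pump alive so the event can fire
        System.Threading.Thread.Sleep(100);
    }
}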
Only one thing I can guess here:
When you uncomment MessageBox.Show, the clientBrowser uses the time the message box is displayed to finish loading the page. Then, when you press OK on the message box, the page has loaded completely, so you get the result. When you comment it out, you don't wait for the page to load, so the result is different.
I'm developing a WinForms application (C#) to read HTML from a website.
When I click the button, textBox1 doesn't get updated every second; it waits until the foreach has finished.
I want the function to update the textbox once per second.
How do I do that?
This is the code that runs when button1 is clicked:
private void button1_Click(object sender, EventArgs e)
{
    string url = "http://truyentranh8.com/danh_sach_truyen/";
    var web = new HtmlWeb();
    var doc = web.Load(url);
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tbody/tr/td[@class='tit']/a[@class='tipsy']"))
    {
        textBox1.Text += node.InnerText + "\n";
        Thread.Sleep(1000);
    }
}
Thread.Sleep in your case puts the main thread into sleep mode. It can't update the UI until it gets released and the button1_Click method is over, so you don't see the text change every second. All you'll see is the text being updated all at once.
So make it asynchronous. If you're using .NET 4.5, you can use async/await and make life simple.
private async void button1_Click(object sender, EventArgs e)
{
    string url = "http://truyentranh8.com/danh_sach_truyen/";
    var web = new HtmlWeb();
    var doc = web.Load(url);
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tbody/tr/td[@class='tit']/a[@class='tipsy']"))
    {
        textBox1.Text += node.InnerText + "\n";
        await Task.Delay(1000);
    }
}
If you are interested, I have written an article on this subject.
Do not use Thread.Sleep on an event thread for this task.
The problem is that the UI is not getting a chance to update, as it redraws on the thread that is blocked. As such, the UI update only appears after all the thread-blocking code ends and the Click handler is exited.
Use an appropriate Timer instead, or if feeling hackish, read up about DoEvents. Alternatively, consider doing the long-running task in a BackgroundWorker - the UserState of the ProgressChanged event can be used to report partial updates, already marshalled back to the appropriate thread.
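A minimal sketch of the BackgroundWorker approach, assuming the same textBox1 and the HtmlAgilityPack scrape from the question (field and handler names here are illustrative):

// Sketch: scrape off the UI thread and report each title back via ReportProgress;
// ProgressChanged runs on the UI thread, so it can touch textBox1 safely.
private readonly BackgroundWorker worker = new BackgroundWorker { WorkerReportsProgress = true };

private void Form1_Load(object sender, EventArgs e)
{
    worker.DoWork += (s, args) =>
    {
        var web = new HtmlWeb();
        var doc = web.Load("http://truyentranh8.com/danh_sach_truyen/");
        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tbody/tr/td[@class='tit']/a[@class='tipsy']"))
        {
            worker.ReportProgress(0, node.InnerText);   // UserState carries the text
            Thread.Sleep(1000);                         // safe here: this is not the UI thread
        }
    };
    worker.ProgressChanged += (s, args) =>
        textBox1.Text += (string)args.UserState + "\n";
}

private void button1_Click(object sender, EventArgs e)
{
    if (!worker.IsBusy) worker.RunWorkerAsync();
}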
Use DoEvents to refresh the form every time you change something in the UI:
private void button1_Click(object sender, EventArgs e)
{
    string url = "http://truyentranh8.com/danh_sach_truyen/";
    var web = new HtmlWeb();
    var doc = web.Load(url);
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//tbody/tr/td[@class='tit']/a[@class='tipsy']"))
    {
        textBox1.Text += node.InnerText + "\n";
        Application.DoEvents();
    }
}
I need to read values from a PLC and display them in a form whenever a PLC tag value changes.
There will be a list of tags that I need to monitor. Whenever a tag value changes, I need to call a function (a different function for each tag).
This is what I have done so far for capturing the tag value changes:
After connecting to the PLC, I create the list of tags.
I read the tag values in a timer.
While reading, I compare each value with the old value; if there is any change in value, I raise an event.
This is working fine with 4 or 5 tags. When the tag count is higher, say 100, some of the tag change events are not firing.
This is what I have done so far.
public delegate void DataChangedEventHandler(string TagName, string NewValue);

private Timer tmr = new Timer();
public event DataChangedEventHandler OnDataChanged;

private void StartTimer(DataTable dt)
{
    AddTagList(dt);
    SetInitialValues();
    tmr.Tick += timerticks;
    tmr.Interval = 250;
    tmr.Enabled = true;
    tmr.Start();
}

private void StopTimer()
{
    tmr.Enabled = false;
    tmr.Stop();
}
I add the list of tags:
private List<string> TagValues = new List<string>();
private List<string> oldValues = new List<string>();
private List<string> newValues = new List<string>();

private void AddTagList(DataTable dt)
{
    int ILoop = 0;
    foreach (DataRow row in dt.Rows)
    {
        TagValues.Add((string)row["Path"]);
        ILoop = ILoop + 1;
    }
}
To set the initial values of the tags:
private void SetInitialValues()
{
    int iLoop = 0;
    foreach (string vals in TagValues)
    {
        var rd = ReadTag(vals);
        oldValues.Add(rd.ToString());
        newValues.Add(rd.ToString());
        iLoop = iLoop + 1;
    }
    //newValues = oldValues
}
And the main data-change part:
private void timerticks(object sender, EventArgs eventArgs)
{
    int iLoop = 0;
    foreach (string vals in TagValues)
    {
        oldValues[iLoop] = ReadTag(vals).ToString();
        if (oldValues[iLoop] != newValues[iLoop])
        {
            newValues[iLoop] = oldValues[iLoop];
            if (OnDataChanged != null)
            {
                OnDataChanged(vals, newValues[iLoop]);
            }
        }
        iLoop = iLoop + 1;
    }
}
My Queries:
1. What will happen if an event is raised while a previously raised event is still in progress (its handler has not completed)? Is this the reason I am missing some data-change events?
2. How can I raise an event automatically whenever a member of the list changes value?
3. Is there a better way to handle the timer-read-raise-event approach?
What will happen if an event is raised while an already raised event is still in progress
The event won't be raised, not until your code is done executing the previous one. Clearly you'll run into trouble when the events you fire take too long, longer than the timer interval (250 ms in your code). The more tags you have, or the more of them that can change within one scan, the greater the odds that these events take longer than one interval and you thus miss a tag change.
You'll need to de-couple the scanning from the event processing. You can do so with a worker thread that does nothing but check for tag changes in a loop. If it sees any, it puts an update notification in a thread-safe queue. Another thread, like your UI thread, can empty the queue and process the notifications. The queue now acts as a buffer, providing enough storage to be able to keep up with a sudden burst of tag changes.
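A rough sketch of that producer/consumer idea, reusing the TagValues/newValues lists and ReadTag method from the question (BlockingCollection lives in System.Collections.Concurrent; note the handlers run on the consumer thread, so any UI work inside them still needs Control.Invoke):

// Sketch only: one thread scans, one thread raises events; the queue buffers bursts.
private readonly BlockingCollection<KeyValuePair<string, string>> changes =
    new BlockingCollection<KeyValuePair<string, string>>();

private void StartScanning()
{
    // Producer: scan the tags in a loop, never blocked by event handlers.
    new Thread(() =>
    {
        while (true)
        {
            for (int i = 0; i < TagValues.Count; i++)
            {
                string current = ReadTag(TagValues[i]).ToString();
                if (current != newValues[i])
                {
                    newValues[i] = current;
                    changes.Add(new KeyValuePair<string, string>(TagValues[i], current));
                }
            }
            Thread.Sleep(250);
        }
    }) { IsBackground = true }.Start();

    // Consumer: drain the queue and raise the events.
    new Thread(() =>
    {
        foreach (var change in changes.GetConsumingEnumerable())
        {
            var handler = OnDataChanged;
            if (handler != null)
                handler(change.Key, change.Value);
        }
    }) { IsBackground = true }.Start();
}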
Wouldn't it be better to create a class holding the old and new value, and then a map with the tag as the key to access that old/new instance?
Otherwise it seems you have a lot of things floating around that need to be kept in sync.
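Something along these lines (a sketch; the names are illustrative):

// Sketch: one object per tag, keyed by tag path, instead of three parallel lists.
class TagState
{
    public string OldValue { get; set; }
    public string NewValue { get; set; }
}

private readonly Dictionary<string, TagState> tags = new Dictionary<string, TagState>();

// Populate once: tags[path] = new TagState { OldValue = v, NewValue = v };
// In the scan: compare ReadTag(path) with tags[path].NewValue and update in one place.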
I'm using WatiN to parse my web site. I have a button that starts the process. I open a browser window and navigate where I need to go, then I create a new task that calls a method called DoWork.
My problem is that if I call a new method at the end of DoWork to do something else, I get strange results when I try to have the program navigate my website. However, if I don't call this new method from DoWork and instead just hook the new method up to a button click, all works fine. So my question is: am I not properly calling my new method from the background method, DoWork?
Code:
IE browser = new IE("http://www.mywebsite.com/");
string startYear;
string endYear;
int NumRows;
Task myThread;

public Form1()
{
    InitializeComponent();
}

private void Start_Click(object sender, EventArgs e)
{
    startYear = txtStartYear.Text;
    endYear = txtEndYear.Text;
    //website navigation work removed for brevity
    browser.Button(Find.ById("ContentPlaceHolder1_btnApplyFilter")).Click();
    int numRows = browser.Div(Find.ById("scroller1")).Table(Find.First()).TableRows.Count - 1;
    NumRows = numRows;
    lblTotalRows.Text = numRows.ToString();
    myThread = Task.Factory.StartNew(() => DoWork());
}

public void DoWork()
{
    List<string> myList = new List<string>(NumRows);
    txtStartYear.Text = startYear;
    txtEndYear.Text = endYear;
    for (int i = 1; i < NumRows; i++)
    {
        TableRow newTable = browser.Div(Find.ById("scroller1")).Table(Find.First()).TableRows[i];
        string coll = string.Format("{0},{1},{2},{3},{4}", newTable.TableCells[0].Text, newTable.TableCells[1].Text, newTable.TableCells[2].Text, newTable.TableCells[3].Text, newTable.TableCells[4].Text);
        myList.Add(coll);
        label1.Invoke((MethodInvoker)delegate
        {
            label1.Text = i.ToString();
        });
    }
    //database work removed for brevity.
    browser.Button(Find.ById("btnFilter")).Click();
    newMethod();
}

public void newMethod()
{
    int start = int.Parse(startYear);
    start++;
    startYear = start.ToString();
    int end = int.Parse(endYear);
    end++;
    endYear = end.ToString();
    browser.SelectList(Find.ById("selStartYear")).SelectByValue(startYear);
    browser.SelectList(Find.ById("selEndYear")).SelectByValue(endYear);
    //removed for brevity
}
}
To reiterate: if I call newMethod from DoWork, the line browser.SelectList(Find.ById("selStartYear")).SelectByValue(startYear) doesn't behave properly, but if I remove the call to newMethod from DoWork and just hook newMethod up to a button, it works fine. I'm wondering if it has to do with DoWork being a background task?
When I say it doesn't behave properly, I mean that when you select an item from the drop-down list, the page auto-posts back; however, the above line of code selects the item but the page doesn't post back, which shouldn't be possible. If I don't call the method within DoWork, I don't have this issue.
You're modifying a UI element from a non-UI thread. You've already got code which deals with that within DoWork, via Control.Invoke - you need to do the same kind of thing for newMethod. It would probably be easiest just to invoke the whole method in the UI thread:
// At the end of DoWork
Action action = newMethod;
label1.BeginInvoke(action);
(I'm using label1.BeginInvoke as I'm not sure whether the browser itself is a "normal" control - but using label1 will get to the right thread anyway. If browser.BeginInvoke compiles, that would be clearer.)
I suspect it's a problem with the select list control. When I browse websites, I sometimes select drop-down items with the keyboard. Sometimes it just doesn't post back, while using the mouse always guarantees a postback.
I think you might be better off adding an extra button and doing a browser.Button(Find.ById("btnFilter")).Click(); kind of call to invoke the postback.
If the functions in the browser don't perform the proper cross-thread checking, what Jon Skeet said should help.