I'm having a problem screenscraping some data from this website using the MSHTML COM component. I have a WebBrowser control on my WPF form.
The code that retrieves the HTML elements is in the WebBrowser's LoadCompleted event handler. After I set the value on the HTMLInputElement and call the click method on the HTMLInputButtonElement, the page refuses to submit the request and display the next page.
When I analysed the HTML, I found that the button's onclick attribute calls a JavaScript function that processes the request, so I'm not sure whether calling that JavaScript function is causing the problem. Funnily enough, when I take my code out of the LoadCompleted method and put it inside a button click event handler, it does take me to the next page, which the LoadCompleted version doesn't. But doing that defeats the point of trying to screenscrape the page automatically.
Another thought: when the code is inside the LoadCompleted method, perhaps the HTMLInputButtonElement is not yet fully rendered on the page, which results in the click event not firing. However, when I inspected the object at run time it did hold the submit button element, and the state reported that loading had completed, which baffles me even more.
Here is the code I used inside the LoadCompleted method and the click method on the button:
private void browser_LoadCompleted(object sender, NavigationEventArgs e)
{
    HTMLDocument dom = (HTMLDocument)browser.Document;
    IHTMLElementCollection elementCollection = dom.getElementsByName("PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1.");
    HTMLInputElement inputBox = null;
    if (elementCollection.length > 0)
    {
        foreach (HTMLInputElement element in elementCollection)
        {
            if (element.name.Equals("PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1."))
            {
                inputBox = element;
            }
        }
    }
    inputBox.value = "Test";
    elementCollection = dom.getElementsByName("SUBMIT.DUM_CONTROLS.XTRACKING.1-1.");
    HTMLInputButtonElement submitButton = null;
    if (elementCollection.length > 0)
    {
        foreach (HTMLInputButtonElement element in elementCollection)
        {
            if (element.name.Equals("SUBMIT.DUM_CONTROLS.XTRACKING.1-1."))
            {
                submitButton = element;
            }
        }
    }
    submitButton.click();
}
FYI: this is the URL of the web page I'm trying to access using MSHTML:
http://track.dhl.co.uk/tracking/wrd/run/wt_xtrack_pw.entrypoint
There are many possibilities:
You may try putting your code in other events, such as on Navigation Completed or on Download Completed.
You may need to explicitly evaluate the OnClick event after the click() function.
Using the MS WebBrowser control is easier than using the MSHTML COM.
To make life easier, you may just use a webscraping library such as the IRobotSoft ActiveX control to automate your entire process.
Delay in OnBeforeNavigate can cause click actions to fail.
We have noticed that with some submit actions OnBeforeNavigate is called twice, especially where onClick is used. The first call is before the onClick action is performed, the second is after it is complete.
Turn off your BHO, put a breakpoint on onClick, step over the submit action (return jsSubmit()), then wait a bit; you should be able to reproduce the same issue without your automation.
Any delay >150ms on the second call to OnBeforeNavigate causes some failure in page load/navigation to the result.
Edit:
Having tried our own automation of this DHL page we don't currently have an issue with the timing described above.
In my Xamarin.Forms app I am using three different types of buttons. These are their on-click behaviours:
Navigation Buttons:
I use them to navigate to another page when I will need to come back to the previous page later.
btn_navigate.Clicked += (sender, e) => {
    Navigation.PushAsync(new Page());
};
Error: If the new page takes a few seconds to load, quickly clicking these buttons several times opens several pages.
Non-returnable navigation Buttons:
I use them to navigate to another page when I want to prevent the user from coming back to the previous page. I insert the new page before the current one and then destroy the current page.
btn_non_returnable_navigate.Clicked += (sender, e) => {
    Navigation.InsertPageBefore(new Page(), this);
    Navigation.PopAsync();
};
Error: If the new page takes a few seconds to load, quickly clicking these buttons several times throws an exception: Before must be in the pushed stack of the current context. This is because the first click creates the new page and destroys the current one; the second click then refers to a current page that has already been destroyed, so it throws the exception.
HTTP request Buttons:
I use them to send an HTTP request to the server. Usually, after the HTTP request completes, the app navigates to another page. These are obviously the most important ones.
btn_http_request.Clicked += async (sender, e) => {
    Uri uri = new Uri("http://192.168.0.1:8080/request");
    HttpClient client = new HttpClient();
    // GetAsync returns a Task, so it has to be awaited to get the response
    HttpResponseMessage http_response = await client.GetAsync(uri);
    ....
    Navigation.InsertPageBefore(new Page(), this);
    await Navigation.PopAsync();
};
Error: If the request takes a few seconds to get an answer, quickly clicking these buttons several times sends several HTTP requests. That should not happen.
All of these buttons are combined on the same page multiple times, so if the user clicks multiple buttons repeatedly, the app breaks.
These issues are caused by the asynchronous nature of the button actions. I think they could be solved by using a lock for these buttons so that they can only be clicked once at a time.
Do Buttons have a property like this? If not, how could I create a Button extension that fixes these problems? I would like as simple a solution as possible. I need to handle this issue in a lot of Buttons, and a complicated solution would make the code hard to manage.
In WPF this code works as I need, but in Xamarin there is no RoutedEventArgs:
C# Button extension
public partial class ButtonEx : Button {
    public bool Active;
    public ButtonEx() {
        InitializeComponent();
        Active = true;
    }
    private void Extension_Click(object sender, RoutedEventArgs e) {
        if (Active) {
            Active = false;
        } else {
            e.Handled = true;
        }
    }
}
XAML Button extension
<Button x:Class="test.ButtonEx"
        ..
        Click="Extension_Click">
</Button>
C# in the calling code
...
ButtonEx buttonEx = new ButtonEx();
//Click function of the button
buttonEx.Click += (sender, e) => {
    //This code only runs if buttonEx.Active was true
    ...
    (sender as ButtonEx).Active = true;
};
...
When you click the extended button it raises two events: first Extension_Click, and then the Click function of the button. The second only runs when buttonEx.Active is true; the first event allows or blocks the second.
Is there an alternative like this for Xamarin.Forms?
Thank you
The best practice for this, including in WPF, is to disable the button while it is performing an action and shouldn't or can't perform additional ones. That is by the book. Technically, someone could make a new kind of button that does this automatically. I would guess that no one has done so because it is not worth the effort, or brings at least as much trouble as it resolves. Unfortunately, if you need something like that you will probably have to make it yourself, but it is definitely possible.
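As a rough illustration of the disable-while-busy approach in Xamarin.Forms, a small helper could wrap each handler. This is only a sketch: ButtonGuard and RunOnceAsync are hypothetical names, not part of Xamarin.Forms, and it assumes every button action can be expressed as a function returning a Task.

```csharp
using System;
using System.Threading.Tasks;
using Xamarin.Forms;

// Hypothetical helper (not part of Xamarin.Forms): runs an action with the
// button disabled so that repeated taps are ignored until the action finishes.
public static class ButtonGuard
{
    public static async Task RunOnceAsync(Button button, Func<Task> action)
    {
        // Already busy: ignore the extra click.
        if (!button.IsEnabled)
            return;

        button.IsEnabled = false;
        try
        {
            await action();
        }
        finally
        {
            // Re-enable even if the action throws.
            button.IsEnabled = true;
        }
    }
}

// Possible usage inside a page, with the navigation button from the question:
//
// btn_navigate.Clicked += async (sender, e) =>
//     await ButtonGuard.RunOnceAsync(btn_navigate,
//         () => Navigation.PushAsync(new Page()));
```

The same wrapper could be used around the HTTP request handler, so the request and the navigation that follows it both complete before the button accepts another click.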
I'm creating a simple web scraper. I want it to download data from a specific web page. However, the data I want only appears after clicking on a div. I'm trying to find that div, invoke the click event on it, and then download the page source (after the hidden data is shown). The data probably appears on the page after some JavaScript executes. I had to set WebBrowser.ScriptErrorsSuppressed to true, because too many errors would pop up. Currently I'm using the following code:
WebBrowser browser = new WebBrowser();
//Navigate etc...
foreach (HtmlElement el in browser.Document.GetElementsByTagName("div"))
{
    if (el.GetAttribute("className").ToString().Equals(className))
    {
        el.InvokeMember("click");
        foreach (HtmlElement child in el.Children)
        {
            child.InvokeMember("click");
        }
    }
}
browser.Document.GetElementsByTagName("body")[0].InvokeMember("click");
while (browser.ReadyState != WebBrowserReadyState.Complete)
{
    Application.DoEvents();
    Debug.WriteLine("State: " + browser.ReadyState);
    System.Threading.Thread.Sleep(50);
}
string source = browser.DocumentText;
This doesn't work. The hidden data isn't shown. I've tried using RaiseEvent instead of InvokeMember, and changing click to onclick. Nothing worked.
Btw. the code invokes click on every child, because I'm not sure which one makes the data appear.
Does anyone know what goes wrong?
I am currently working on a WP7 app for my university and need a temporary solution to a problem. The solution is to load a web page using the web browser control for WP7, for example: http://m.iastate.edu/laundry/
As you can see on the web page, there are certain elements I want to hide, for example the back button. For now, I handle the back button like this:
private void webBrowser1_Navigating(object sender, NavigatingEventArgs e)
{
    // Handle loading animations
    // Handle what happens when the "back" button is pressed
    Uri home = new Uri("http://m.iastate.edu/");
    // If the address being loaded is home,
    // cancel the navigation and go back to the
    // app's home page.
    if (e.Uri.Equals(home))
    {
        e.Cancel = true;
        NavigationService.Navigate(new Uri("/MainPage.xaml", UriKind.Relative));
    }
}
Now that works beautifully, except for the part that there is a back button on the hardware.
So my second option is to completely hide the back button ONLY on that page, and not its children. So not on http://m.iastate.edu/laundry/l/0
I am still debating on just parsing the data and displaying it in my own style, but I'm not sure if that's completely needed seeing how the data needs constant internet service and is already in a well-put format. Plus, I feel like that would be a waste of resources? Throw in your opinions on that too :)
Thanks!
You should inject a script in the page with InvokeScript.
Here is the kind of Javascript code you need to remove the back button:
// get the first child element of the header
var backButton = document.getElementsByTagName("header")[0].firstChild;
// check if it looks like a back button
if (backButton && backButton.innerText == "Back") {
    // it looks like a back button, remove it
    document.getElementsByTagName("header")[0].removeChild(backButton);
}
Call this script with InvokeScript:
webBrowser1.InvokeScript("eval", "(function() { " + script + " })()");
Warning: IsScriptEnabled must be set to true on the web browser control.
If removing the back button depends on the page, just test the navigating URI in C# and inject the script if needed.
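A minimal sketch of that conditional injection is shown below. BackButtonRemover and RemoveIfNeeded are hypothetical names, and the sketch assumes it is called from the control's LoadCompleted handler with e.Uri (so that the page's DOM already exists) rather than from Navigating; that choice of event is an assumption, not something confirmed above.

```csharp
using System;
using Microsoft.Phone.Controls;

// Hypothetical helper: injects the back-button removal script only on the
// laundry landing page from the question, not on its child pages.
public static class BackButtonRemover
{
    // Page on which the back button should be hidden (URL from the question).
    private static readonly Uri LaundryHome = new Uri("http://m.iastate.edu/laundry/");

    // Same logic as the JavaScript above, written as a single string.
    private const string Script =
        "var header = document.getElementsByTagName('header')[0];" +
        "var backButton = header && header.firstChild;" +
        "if (backButton && backButton.innerText == 'Back') {" +
        "  header.removeChild(backButton);" +
        "}";

    // Requires IsScriptEnabled = true on the WebBrowser control.
    public static void RemoveIfNeeded(WebBrowser browser, Uri loadedUri)
    {
        if (loadedUri == null || !loadedUri.Equals(LaundryHome))
            return;

        browser.InvokeScript("eval", "(function() { " + Script + " })()");
    }
}

// Possible usage in the page's code-behind (assumed handler name):
//
// private void webBrowser1_LoadCompleted(object sender, NavigationEventArgs e)
// {
//     BackButtonRemover.RemoveIfNeeded(webBrowser1, e.Uri);
// }
```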
I have a RadGrid within a RadAjaxPanel that has a View button that displays a user control in a jQuery popup, also within a RadAjaxPanel, that displays details of the grid record with a delete button. Clicking the delete button causes a partial postback that causes the record to be deleted and the grid to be rebound, removing the deleted record from it.
What I then need to do is run some client script to close the popup. I have tried:
private void RiskEditor_DeleteClick( object sender, EventArgs e )
{
    this.grdRiskAnalysis.Rebind();
    ScriptManager.RegisterStartupScript(this.RadAjaxPanelRiskEditor,
        this.RadAjaxPanelRiskEditor.GetType(),
        "closepopup",
        "delayClosePopup($j(this).closest('.ui-dialog'), 1000);",
        true);
}
In this example, RadAjaxPanelRiskEditor is the AjaxPanel that the User Control is in, but I have also tried registering the script with the panel that the grid is in. Neither works.
Can someone explain where I am going wrong and how to achieve this?
Thanks
Stewart
There are a couple of things I would try:
I would use ScriptManager.RegisterClientScriptBlock instead of RegisterStartupScript, which according to MSDN "Registers a startup script block for every asynchronous postback".
If that does not work, Microsoft provides some client-side JavaScript: http://msdn.microsoft.com/en-us/library/bb397536. You can use the Sys.WebForms.PageRequestManager endRequest event (http://msdn.microsoft.com/en-us/library/bb383810.aspx) to execute specific JavaScript code, so you could put your close-dialog code there, with some conditional logic.
I've been working on a project where we have a WebBrowser object on the form. Its purpose is to load HTML forms to be viewed. At the moment, though, you are able to edit the contents of the HTML form, which is not desired.
I want to simply display this HTML form of information to the user, but not allow them to alter the textboxes, checkboxes or anything of that nature on the form.
I tried using the Navigating event and setting e.Cancel = true;. This halted the control from even loading the page. And if I set it to only execute e.Cancel = true; after the form had loaded, I could still change text boxes and such on the form, as the Navigating event only seemed to be raised at random.
Does anyone know of a way to get a WebBrowser object to be read only?
Cheers!
You can apply the contentEditable attribute to the body tag of the document.
Document.Body.SetAttribute("contentEditable", "false");
This will make your document read-only for the user.
You could try accessing all form elements on the page and set the readonly attribute on the tag. Something like:
var inputs = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement element in inputs)
{
    element.SetAttribute("readonly", "readonly");
}
You'd obviously have to repeat the process for all form elements (select etc.), but it should work.
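A sketch of what repeating the process for the other form elements might look like is below. ReadOnlyForm and Lock are hypothetical names. One caveat the sketch works around: the readonly attribute only applies to text-like inputs and textareas, so checkboxes, radio buttons and select lists are disabled instead. Treat the exact attribute handling of the embedded browser as an assumption to verify against your pages.

```csharp
using System.Windows.Forms;

// Hypothetical helper: makes the form controls of a loaded document non-editable.
public static class ReadOnlyForm
{
    public static void Lock(HtmlDocument document)
    {
        if (document == null)
            return;

        foreach (HtmlElement input in document.GetElementsByTagName("input"))
        {
            string type = (input.GetAttribute("type") ?? "").ToLowerInvariant();
            if (type == "checkbox" || type == "radio")
            {
                // readonly has no effect on these, so disable them instead
                input.SetAttribute("disabled", "disabled");
            }
            else
            {
                input.SetAttribute("readonly", "readonly");
            }
        }

        foreach (HtmlElement textArea in document.GetElementsByTagName("textarea"))
        {
            textArea.SetAttribute("readonly", "readonly");
        }

        foreach (HtmlElement select in document.GetElementsByTagName("select"))
        {
            // select elements have no readonly attribute
            select.SetAttribute("disabled", "disabled");
        }
    }
}

// Possible usage once the document has loaded:
//
// ReadOnlyForm.Lock(webBrowser1.Document);
```

Note that disabled controls are not included when a form is submitted, which should not matter when the form is only being displayed.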
I have been running into this issue as well. Thanks to steavy I have been able to come up with a solution:
Hook up to the DocumentCompleted event (you can do this in the designer):
myWebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(myWebBrowser_DocumentCompleted);
Then make it read-only in the event handler:
private void myWebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    myWebBrowser.Document.Body.SetAttribute("contentEditable", "false");
}
I do this in the event handler, once the document is fully loaded, because I sometimes ran into a NullReferenceException: the body wasn't loaded yet and the line would throw.