I am trying to use a WebBrowser control in my application, in which I want to block scripts and frames.
I used the extended web browser control in this answer to have access to download control flags.
So, I used it as follows in the form constructor:
webBrowser1.DownloadControlFlags = (int)WebBrowserDownloadControlFlags.DLIMAGES
+ (int)WebBrowserDownloadControlFlags.NOFRAMES
+ (int)WebBrowserDownloadControlFlags.NO_SCRIPTS
+ (int)WebBrowserDownloadControlFlags.NO_FRAMEDOWNLOAD
+ (int)WebBrowserDownloadControlFlags.NO_JAVA
+ (int)WebBrowserDownloadControlFlags.NO_DLACTIVEXCTLS
+ (int)WebBrowserDownloadControlFlags.NO_BEHAVIORS
+ (int)WebBrowserDownloadControlFlags.NO_RUNACTIVEXCTLS
+(int)WebBrowserDownloadControlFlags.SILENT;
It seems to work, but I have one injected script that I do want to run. I inject it after the document has loaded (in the DocumentCompleted event):
IHTMLDocument2 doc2 = webBrowser1.Document.DomDocument as IHTMLDocument2;
IHTMLScriptElement script = (IHTMLScriptElement)doc2.createElement("SCRIPT");
script.type = "text/javascript";
script.text = @"// Highlight Words Script ....";
IHTMLElementCollection nodes = doc.getElementsByTagName("head");
foreach (IHTMLElement elem in nodes)
{
//Append script
HTMLHeadElement head = (HTMLHeadElement)elem;
head.appendChild((IHTMLDOMNode)script);
}
But it doesn't run when I call it:
wb.Document.InvokeScript("findString", new string[] { toWord });
How can I run my script while I have suppressed running the document scripts?
Can I let scripts run but block script errors and undesired behaviours using other flags?
I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Forms application).
This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException:
temperatura was null.
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
temperature = 0;
var temp = doc.DocumentNode.SelectSingleNode(
"//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
if (temp == null)
{
return false;
}
var text = temp.InnerText.EndsWith("&deg;") ?
temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
temp.InnerText;
return int.TryParse(text, out temperature);
}
XPath lets you select your target more precisely. With your absolute path, even a small change in the HTML structure will break your application. Some points:
// searches anywhere in the document
You look for any div whose class contains "temperature" and, inside that node:
you look for a child div whose class contains "temp"
If you get that node (!= null), you try to convert the degrees (after removing the trailing degree sign)
And check:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
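As a side note, the degree-stripping step in TryGetTemperature can be isolated into a small helper. This is only a sketch: TemperatureText.TryParseDegrees is a hypothetical name (not part of HtmlAgilityPack), and it assumes the text is an integer optionally followed by a degree sign, either literal or encoded as &deg;.

```csharp
using System.Globalization;
using System.Net;

public static class TemperatureText
{
    // Hypothetical helper, not part of HtmlAgilityPack: turns text such as
    // "23&deg;" or "23°" into the integer 23.
    public static bool TryParseDegrees(string raw, out int degrees)
    {
        degrees = 0;
        if (string.IsNullOrWhiteSpace(raw))
        {
            return false;
        }

        // Decode HTML entities first, so "&deg;" becomes "°".
        string decoded = WebUtility.HtmlDecode(raw).Trim();

        // Drop a trailing degree sign if there is one.
        string number = decoded.TrimEnd('°').Trim();

        // Parse with the invariant culture so the result does not depend
        // on the machine's regional settings.
        return int.TryParse(number, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out degrees);
    }
}
```

Inside TryGetTemperature you could then write return TemperatureText.TryParseDegrees(temp.InnerText, out temperature); instead of cutting a fixed number of characters off the end.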
UPDATE
I updated TryGetTemperature a bit because the degree sign is HTML-encoded. The main problem is the HTML itself. When you request the source code, you get HTML that the browser later updates dynamically. So the HTML you get is not useful to you: it doesn't contain the temperature.
So, I see two alternatives:
You can use a browser control (Common Controls -> WebBrowser, in the form Toolbox next to Button, Label...), insert it into your form and Navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has been downloaded, and then get the source code from the control. Also, I suppose you'll want to hide the browser control. Be careful: sometimes the browser doesn't work correctly if you hide it. In that case, you can use a visible Form positioned off the desktop and handle the activation events to keep that window from being activated. Also hide it from the task switcher (Alt+Tab). Things get harder this way, but sometimes it is the only way.
The simple way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use this URL and you get valid HTML.
Can somebody tell me how to use the Optimus (headless browser) NuGet package with C# to get the response from a URL? I also want the JavaScript on the page to be executed automatically, as with PhantomJS.
Quite a simple bit of kit:
Create an Engine component first (common for dynamic and static pages):
Engine engine = new Engine();
Open the URL of the HTML document you want to retrieve:
a) Not waiting for any elements added in with javascript:
engine.OpenUrl("http://google.com").Wait();
b) Waiting for any elements added in with javascript:
engine.OpenUrl("http://google.com");
and then one of:
engine.WaitDesappearingOfId("some-id");
engine.WaitId("some-id");
engine.WaitDocumentLoad();
engine.WaitSelector("#some-id");
engine.WaitSelector(".some-class");
now you open the url, there are two ways of doing this -
load the document (prior to any javascript being executed):
More complete examples:
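public static string dynamicLoadingPage()
{
var engine = new Engine();
engine.OpenUrl("https://html5test.com");
// Wait for the element that the page's JavaScript adds, then take the first match.
var tagWithValue = engine.WaitSelector("#score strong").FirstOrDefault();
// tagWithValue is null if nothing matched, so guard before reading InnerHTML.
var score = tagWithValue?.InnerHTML;
System.Console.WriteLine("Score: " + score);
return score;
}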
Otherwise:
static void staticLoadingPage()
{
var engine = new Engine();
engine.OpenUrl("http://google.com").Wait();
Console.WriteLine("The first document child node is: " + engine.Document.FirstChild);
Console.WriteLine("The first document body child node is: " + engine.Document.Body.FirstChild);
Console.WriteLine("The first element tag name is: " + engine.Document.ChildNodes.OfType<HtmlElement>().First().TagName);
Console.WriteLine("Whole document innerHTML length is: " + engine.Document.DocumentElement.InnerHTML.Length);
}
First I tried to run from a WebBrowser Control
WebBrowser webBrowser1 = new WebBrowser();
webBrowser1.Visible = false;
webBrowser1.Navigate("about:blank");
webBrowser1.Document.Write("<html><head></head><body></body></html>");
HtmlElement head = webBrowser1.Document.GetElementsByTagName("head")[0];
dynamic scriptEl = webBrowser1.Document.CreateElement("script");
scriptEl.DomElement.text = "function test(fn) { try{ window[fn](); } catch(ex) { return 'abc '.trim(); } }"
+ "function sayHello() { alert('ha'); throw 'error with spaces '; }";
head.AppendChild(scriptEl);
var result = webBrowser1.Document.InvokeScript("test", new object[] { "sayHello" });
It works almost perfectly. It knows what window and alert are. The only problem is that it apparently runs ECMAScript 3, so when I tested "abc ".trim() it couldn't execute.
My second attempt was Javascript .NET.
using (JavascriptContext context = new JavascriptContext())
{
// Setting external parameters for the context
//context.SetParameter("console", new SystemConsole());
context.SetParameter("message", "Hello World ! ");
// Script
string script = @"
alert(message.trim());
";
// Running the script
context.Run(script);
}
Unfortunately it doesn't know what alert, window, document, console... are, unless I tell it by setting context parameters.
What else is there? Maybe I should try a headless browser and invoke it using Process?
If you want to run JavaScript server side, I would recommend using PhantomJS. It allows you to run a full WebKit browser from the command line using JavaScript and command line arguments.
JavaScript is definitely not just for client-side scripting any more. As Cameron said PhantomJS is excellent if you need the DOM. If you don't, NodeJS is the clear choice with a wealth of libraries.
I need to add a new option to a selectList in one of my unit tests, and I can't figure out how to do it.
The Dropdown currently has 2 options, I want to add a third, and use it.
I tried to use JavaScript injection using http://stevenharman.net/blog/archive/2007/07/10/add-option-elements-to-a-select-list-with-javascript.aspx as a base, but that failed. I get exceptions that crash the IE browser every time, and the text "RunScript failed" gets printed into my logs even though I don't use that text in my error output.
Is this possible in Watin? Or has Open Source Failed me?
Using the code in the link you provided, with one small change I've gotten it to work.
My changes
Changed the ID to the ID of my dropdown (of course!)
Changed the $ in the element get to 'document.getElementById'. With the $ in there instead I don't see any obvious errors or anything like that; just no action taken.
The 'New Option' is added to the dropdown as the last item and it is the selected item.
string js = "";
js = js + "var theSelectList = document.getElementById('myDropDownID'); ";
js = js + " AddSelectOption(theSelectList, \"My Option\", \"123\", true);";
js = js + " function AddSelectOption(selectObj, text, value, isSelected) ";
js = js + "{";
js = js + " if (selectObj != null && selectObj.options != null)";
js = js + "{";
js = js + " selectObj.options[selectObj.options.length] = new Option(text, value, false, isSelected);";
js = js + "}}";
myIE.Document.Eval(js);
My setup
WatiN 2.0
IE8
Win7
Checked when the dropdown has 1 entry and 2 entries; both scenarios had "My Option" added without issue.
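If you need this for more than one dropdown, the string building above can be wrapped in a small helper. This is only a sketch: SelectScripts.BuildAddOption is a hypothetical name (not part of WatiN), it always marks the new option as selected, and its escaping only covers backslashes and single quotes.

```csharp
public static class SelectScripts
{
    // Hypothetical helper, not part of WatiN: builds a script that appends
    // a new, selected option to the dropdown with the given ID.
    public static string BuildAddOption(string selectId, string text, string value)
    {
        return
            "var list = document.getElementById('" + Escape(selectId) + "');" +
            "if (list != null && list.options != null) {" +
            "list.options[list.options.length] = new Option('" +
            Escape(text) + "', '" + Escape(value) + "', false, true);" +
            "}";
    }

    // Escapes backslashes and single quotes so the value fits inside a
    // single-quoted JavaScript string. Newlines and other control
    // characters are not handled.
    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("'", "\\'");
    }
}
```

The result is passed to the browser the same way as before: myIE.Document.Eval(SelectScripts.BuildAddOption("myDropDownID", "My Option", "123"));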
I'm doing some web automation via C# and a WebBrowser. There's a link which I need to 'click', but since it fires a Javascript function, apparently the code needs to be executed rather than just having the element clicked (i.e. element.InvokeMember("click")). Here's the href for the element, which opens an Ajax form:
javascript:__doPostBack("ctl00$cphMain$lnkNameserverUpdate", "")
I've tried:
webBrowser1.Document.InvokeScript("javascript:__doPostBack", new object[] { "ctl00$cphMain$lnkNameserverUpdate", "" });
and:
webBrowser1.Document.InvokeScript("__doPostBack", new object[] { "ctl00$cphMain$lnkNameserverUpdate", "" });
and a few other things. The code gets hit, but the script doesn't get fired. Any ideas would be most appreciated.
Gregg
BTW Here's the full element in case it's useful:
NS51.DOMAINCONTROL.COM<br/>NS52.DOMAINCONTROL.COM<br/>
Have a look at this link:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.objectforscripting.aspx
I've actually used this in the past, and it works perfectly.
HtmlDocument doc = browser.Document;
HtmlElement head = doc.GetElementsByTagName("head")[0];
HtmlElement s = doc.CreateElement("script");
s.SetAttribute("text","function sayHello() { alert('hello'); }");
head.AppendChild(s);
browser.Document.InvokeScript("sayHello");