Optimus headless browser with C# - c#

Can somebody tell me how to use the Optimus (headless browser) NuGet package with C# to get the response from a URL? I also want the JavaScript on the page to be executed automatically, as in PhantomJS.

Quite a simple bit of kit:
Create an Engine component first (common for dynamic and static pages):
Engine engine = new Engine();
Open the URL of the HTML document you want to retrieve:
a) Not waiting for any elements added by JavaScript:
engine.OpenUrl("http://google.com").Wait();
b) Waiting for any elements added by JavaScript:
engine.OpenUrl("http://google.com")
and then either:
engine.WaitDesappearingOfId("some-id")
engine.WaitId("some-id")
engine.WaitDocumentLoad()
engine.WaitSelector("#some-id")
engine.WaitSelector(".some-class")
More complete examples:
public static void dynamicLoadingPage()
{
    var engine = new Engine();
    engine.OpenUrl("https://html5test.com");
    // Wait for the score element, which the page's JavaScript adds.
    var tagWithValue = engine.WaitSelector("#score strong").FirstOrDefault();
    if (tagWithValue != null)
    {
        System.Console.WriteLine("Score: " + tagWithValue.InnerHTML);
    }
}
Otherwise:
static void staticLoadingPage()
{
    var engine = new Engine();
    // Block until the document itself has been opened.
    engine.OpenUrl("http://google.com").Wait();
    Console.WriteLine("The first document child node is: " + engine.Document.FirstChild);
    Console.WriteLine("The first document body child node is: " + engine.Document.Body.FirstChild);
    Console.WriteLine("The first element tag name is: " + engine.Document.ChildNodes.OfType<HtmlElement>().First().TagName);
    Console.WriteLine("Whole document innerHTML length is: " + engine.Document.DocumentElement.InnerHTML.Length);
}

Related

How to run certain scripts while suppressing scripts in web browser control

I am trying to use a WebBrowser control in my application, in which I want to block scripts and frames.
I used the extended web browser control from this answer to get access to the download control flags.
So, I used it as follows in the form constructor:
webBrowser1.DownloadControlFlags = (int)WebBrowserDownloadControlFlags.DLIMAGES
+ (int)WebBrowserDownloadControlFlags.NOFRAMES
+ (int)WebBrowserDownloadControlFlags.NO_SCRIPTS
+ (int)WebBrowserDownloadControlFlags.NO_FRAMEDOWNLOAD
+ (int)WebBrowserDownloadControlFlags.NO_JAVA
+ (int)WebBrowserDownloadControlFlags.NO_DLACTIVEXCTLS
+ (int)WebBrowserDownloadControlFlags.NO_BEHAVIORS
+ (int)WebBrowserDownloadControlFlags.NO_RUNACTIVEXCTLS
+(int)WebBrowserDownloadControlFlags.SILENT;
It seems to work, but I have an injected script that I do want to run. I inject it after the document has loaded (in the DocumentCompleted event):
IHTMLDocument2 doc2 = webBrowser1.Document.DomDocument as IHTMLDocument2;
IHTMLScriptElement script = (IHTMLScriptElement)doc2.createElement("SCRIPT");
script.type = "text/javascript";
script.text = @"// Highlight Words Script ....";
IHTMLElementCollection nodes = doc.getElementsByTagName("head");
foreach (IHTMLElement elem in nodes)
{
//Append script
HTMLHeadElement head = (HTMLHeadElement)elem;
head.appendChild((IHTMLDOMNode)script);
}
But it doesn't run when I call it:
wb.Document.InvokeScript("findString", new string[] { toWord });
How can I run my script while I have suppressed running the document scripts?
Can I let scripts run but block script errors and undesired behaviours using other flags?
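A side note on the flag list in the question: adding the values with + gives the intended result only as long as every flag occupies its own bit and appears exactly once, so bitwise OR is the more usual way to combine them. A minimal sketch of the difference, using a made-up DemoFlags enum whose values are placeholders and not the real download control flag values:

```csharp
using System;

// Hypothetical flags for illustration only; these are NOT the real
// WebBrowserDownloadControlFlags values.
[Flags]
public enum DemoFlags
{
    None = 0,
    NoScripts = 1,
    NoFrames = 2,
    Silent = 4
}

public static class FlagCombining
{
    // Bitwise OR: naming the same flag twice still sets its bit only once.
    public static int WithOr(params DemoFlags[] flags)
    {
        int result = 0;
        foreach (DemoFlags flag in flags)
        {
            result |= (int)flag;
        }
        return result;
    }

    // Addition: a repeated flag carries over into the neighbouring bit.
    public static int WithPlus(params DemoFlags[] flags)
    {
        int result = 0;
        foreach (DemoFlags flag in flags)
        {
            result += (int)flag;
        }
        return result;
    }
}
```

With these placeholder values, listing NoScripts twice through WithPlus produces the same number as NoFrames alone, while WithOr still produces NoScripts.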

Not getting correct data from span

I've been making a custom user handler for Jessecar's SteamBot. That part is unrelated to the problem I'm having, but essentially I've made it so you can set the bot to play a specific game by App ID, and I've been using this to idle in games for Steam Trading Cards. The only issue is that the only way I can check whether it's finished is by checking its inventory against how many cards are supposed to drop. That isn't too much of a hassle, but the main reason I created this was efficiency, and doing this every time kind of defeats the purpose.
Because of this, I tried getting data from the bot's badge page for the game it's playing. This is what I have so far:
else if (message.StartsWith(".updateidle"))
{
var webGet = new HtmlWeb();
var SteamID64 = Bot.SteamClient.SteamID.ConvertToUInt64();
string htmlget = "http://www.steamcommunity.com/profiles/" + SteamID64 + "/gamecards/" + newgame;
var doc = webGet.Load(htmlget);
HtmlNode hoursNode = doc.DocumentNode.SelectSingleNode("//div[@class=\"badge_title_stats_playtime\"]");
string hours = Regex.Match(hoursNode.InnerText, @"[0-9\.,]+").Value;
var cards = doc.DocumentNode.SelectSingleNode("div[@class='badge_title_stats_drops']/span").InnerText;
if (hours == string.Empty)
{
hours = "0.0";
}
Bot.SteamFriends.SendChatMessage(OtherSID, type, "I have been idling for " + hours + " hours on game " + newgame + " and have " + cards + " card drops remaining.");
}
Getting the hours works fine. If the bot has no time on that game, the value doesn't appear, so I just check whether it's empty and set it to 0.0. The cards, however, appear as either "No card drops remaining" or a number followed by " card drops remaining", and the code gets neither. I tried using the same method as for the hours, keeping the value only if it's a number, and it still returns "0". The same result goes for this...
I also tried checking whether the string is empty, because that could mean there are no card drops remaining, as there would be no numbers. I also looked online for methods of getting span data inside a div, or span data in general, and none of them worked; they'd just return "0". And if you can't already tell, I do have the HTML Agility Pack.
Building on my previous answer, which I have decided not to edit since the follow-up here is going to be large: I am using both Selenium and Html Agility Pack for this. First I log in using Selenium (I am using Mono, by the way). After that I authorize my PC manually (if yours is already authorized, skip this step), then go to the console and press any key to proceed with getting the card info. I gather the card info for all games in this case. I can't identify which game still has card drops, as that has not been implemented yet.
class MainClass
{
public static void Main(string[] args)
{
string userName = "username";
string password ="password";
string steamProfile = "steamprofile";
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
using (var driver = new FirefoxDriver())
{
// Go to the home page
driver.Navigate().GoToUrl("https://store.steampowered.com//login/?redir=0");
// Get the page elements
var userNameField = driver.FindElementById("input_username");
var userPasswordField = driver.FindElementById("input_password");
//var loginButton = driver.FindElementById("login_btn_signin");
var loginButton = driver.FindElementByXPath("//button[@class='btnv6_blue_hoverfade btn_medium']");
// Type user name and password
userNameField.SendKeys(userName);
userPasswordField.SendKeys(password);
// and click the login button
loginButton.Click();
System.Threading.Thread.Sleep(5000);
//Type authorization code and enter manually.
System.Console.ReadKey();
driver.Navigate().GoToUrl("http://steamcommunity.com/profiles/"+steamProfile+"/badges");
driver.GetScreenshot().SaveAsFile(@"screen.png", ImageFormat.Png); // Debugging purposes, as I was first using PhantomJS
htmlDoc.LoadHtml(driver.PageSource);
Console.Clear();
}
HtmlNodeCollection col = htmlDoc.DocumentNode.SelectNodes("//span[@class='progress_info_bold']");
foreach (HtmlNode n in col)
{
Console.WriteLine(n.InnerText);
}
}
}
The output in my case
5 of 29 tasks completed
No card drops remaining
No card drops remaining
No card drops remaining
4 card drops remaining
3 card drops remaining
This code also gives you the badge progress. You must figure out for yourself how to filter your data in Html Agility Pack (read up on XPath). I also recommend that you use Selenium, since you can start a Steam game from your web page using it.
Remember that the XPath I gave you in my first answer, which is also used in the code above, finds ALL ("//") the spans that have a class equal to "progress_info_bold".
You need to be more specific about which nodes to pick. I strongly discourage you from ever using regex to navigate the inner text or inner HTML of an HTML document.
To find the HtmlNodes that say whether there are any more cards to drop, try using this XPath:
"//span[@class='progress_info_bold']"
These nodes will either contain the text:
"No card drops remaining"
or
number+" card drops remaining"
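Once the span text is in hand, turning it into a number is a small string operation on that one node. A minimal sketch, assuming the text always has one of the two shapes listed above; the CardDropText class and its Parse method are names made up for this illustration:

```csharp
using System;

public static class CardDropText
{
    // Returns the number of remaining drops, 0 for "No card drops remaining",
    // or null when the text is not a card drop message at all
    // (for example the "5 of 29 tasks completed" badge progress line).
    public static int? Parse(string innerText)
    {
        if (string.IsNullOrWhiteSpace(innerText))
        {
            return null;
        }

        string text = innerText.Trim();
        if (text.IndexOf("card drop", StringComparison.OrdinalIgnoreCase) < 0)
        {
            return null;
        }

        if (text.StartsWith("No ", StringComparison.OrdinalIgnoreCase))
        {
            return 0;
        }

        // Assumed shape: "<number> card drops remaining"
        string firstToken = text.Split(' ')[0];
        int count;
        return int.TryParse(firstToken, out count) ? count : (int?)null;
    }
}
```

Calling CardDropText.Parse(n.InnerText) inside the foreach loop above would then separate the card drop lines from the badge progress line.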

How do I open XML from link in razor?

The task is quite simple, connect to another webservice using XML.
In the current pages (classic ASP) we use the following code:
zoekpcode=UCASE(Request.Querystring("zoekpcode")) ' postal code
zoeknr=Request.Querystring("zoeknr") ' house number
PC=Trim(Replace(zoekpcode," ",""))
NR=Trim(Replace(zoeknr," ",""))
strGetAddress="https://ws1.webservices.nl/rpc/get-simplexml/addressReeksPostcodeSearch/*~*/*~*/" & PC & NR
set xml = Server.CreateObject("Microsoft.XMLHTTP")
xml.open "GET", strGetAddress , false
xml.send ""
strStatus = xml.Status
If Len(PC)>5 and Len(NR)>0 Then
strRetval = Trim(xml.responseText)
End If
set xml = nothing
'Do something with the result string
One of the possible links could be: https://ws1.webservices.nl/rpc/get-simplexml/addressReeksPostcodeSearch/~/~/1097ZD49
Currently I'm looking for a way to do this in Razor (C#), but all I seem to be able to find on Google is how to do it in JavaScript.
I've tried (most combinations of) the following terms:
razor
xmlhttp
comobject
XML from url
-javascript
Results were mostly about JavaScript or razorblades.
Based on other results (such as a search for COM objects in Razor), it seems that COM objects aren't available in Razor.
I did find this question (How to use XML with WebMatrix razor (C#)) on Stack Overflow that seems to answer my question partially, but is it also possible with a link to an external system (the web service mentioned above)?
I have covered the consumption of Web Services in Razor web pages here: http://www.mikesdotnetting.com/Article/209/Consuming-Feeds-And-Web-Services-In-Razor-Web-Pages.
If your web service is a SOAP one, you are best off using Visual Studio (the free Express edition is fine) to add a service reference and then work from there. Otherwise you can use LINQ to XML to load the XML directly into an XDocument, as in the ATOM example in the article:
var xml = XDocument.Load("https://ws1.webservices.nl/rpc/get-simplexml/blah/blah");
Then use the System.Xml.Linq APIs to query the document.
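As a rough sketch of what such a query can look like: the element names in the sample payload below (response, entry, street, city) are invented for the illustration and are not the real schema returned by the web service, and AddressXml is a made-up helper class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AddressXml
{
    // Hypothetical payload; the real service's element names will differ.
    public const string SampleXml =
        "<response>" +
        "<entry><street>Examplestraat</street><city>Amsterdam</city></entry>" +
        "<entry><street>Voorbeeldlaan</street><city>Utrecht</city></entry>" +
        "</response>";

    // Collects the trimmed text of every element with the given local name,
    // wherever it occurs in the document.
    public static List<string> ValuesOf(XDocument document, string elementName)
    {
        return document
            .Descendants()
            .Where(e => e.Name.LocalName == elementName)
            .Select(e => e.Value.Trim())
            .ToList();
    }
}
```

For a quick check, AddressXml.ValuesOf(XDocument.Parse(AddressXml.SampleXml), "city") walks the sample document; against the live service, the XDocument would come from XDocument.Load with the service URL instead.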
With the help of Ralf I came to the following code:
public static XmlDocument getaddress(string pcode, string number){
string serverresponse = "";
string getlocation = "https://ws1.webservices.nl/rpc/get-simplexml/addressReeksPostcodeSearch/*~*/*~*/" + pcode + number; // Use the method's own parameters
HttpWebRequest req = (HttpWebRequest) WebRequest.Create(getlocation);
using (var r = req.GetResponse()) {
using (var s = new StreamReader(r.GetResponseStream())) {
serverresponse = s.ReadToEnd();
}
}
XmlDocument loader = new XmlDocument();
loader.LoadXml(serverresponse);
return loader;
}
public static string getvalue(XmlDocument document, string node){
string returnval = "";
var results = document.SelectNodes(node);
foreach(XmlNode aNode in results){
returnval = returnval + "," + aNode.InnerText;
}
return returnval.Length > 0 ? returnval.Substring(1) : returnval; // Drop the leading comma; empty when nothing matched
}

WatiN: add a new option to a SelectList?

I need to add a new option to a selectList in one of my unit tests, and I can't figure out how to do it.
The Dropdown currently has 2 options, I want to add a third, and use it.
I tried to use JavaScript injection using http://stevenharman.net/blog/archive/2007/07/10/add-option-elements-to-a-select-list-with-javascript.aspx as a base, but that failed. I get exceptions that crash the IE browser every time, and the text "RunScript failed" gets printed into my logs even though I don't use that text in my error output.
Is this possible in WatiN? Or has Open Source Failed me?
Using the code in the link you provided, with one small change I've gotten it to work.
My changes
Changed the ID to the ID of my dropdown (of course!)
Changed the $ in the element get to 'document.getElementById'. With the $ in there instead I don't see any obvious errors or anything like that; just no action taken.
The 'New Option' is added to the dropdown as the last item and it is the selected item.
string js = "";
js = js + "var theSelectList = document.getElementById('myDropDownID'); ";
js = js + " AddSelectOption(theSelectList, \"My Option\", \"123\", true);";
js = js + " function AddSelectOption(selectObj, text, value, isSelected) ";
js = js + "{";
js = js + " if (selectObj != null && selectObj.options != null)";
js = js + "{";
js = js + " selectObj.options[selectObj.options.length] = new Option(text, value, false, isSelected);";
js = js + "}}";
myIE.Document.Eval(js);
My setup
WatiN 2.0
IE8
Win7
Checked when the dropdown has 1 entry and 2 entries; both scenarios had "My Option" added without issue.
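If the option text or value comes from test data instead of a literal, it may be worth escaping it before splicing it into the script. A minimal sketch, assuming System.Web's HttpUtility.JavaScriptStringEncode is available in the test project; SelectOptionScript and Build are names invented for this example:

```csharp
using System.Web;

public static class SelectOptionScript
{
    // Builds a script that appends a selected option to the select element
    // with the given id. Each argument is emitted as a quoted, escaped
    // JavaScript string literal.
    public static string Build(string selectId, string text, string value)
    {
        string id = HttpUtility.JavaScriptStringEncode(selectId, true);
        string optionText = HttpUtility.JavaScriptStringEncode(text, true);
        string optionValue = HttpUtility.JavaScriptStringEncode(value, true);

        return
            "var theSelectList = document.getElementById(" + id + ");" +
            "if (theSelectList != null && theSelectList.options != null) {" +
            "theSelectList.options[theSelectList.options.length] = " +
            "new Option(" + optionText + ", " + optionValue + ", false, true);" +
            "}";
    }
}
```

The result can be passed to myIE.Document.Eval in place of the hand-built js string, for example SelectOptionScript.Build("myDropDownID", "My Option", "123").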

Getting the executed output of an aspx page after a short delay

I have an aspx page which has some javascript code like
<script>
setTimeout("document.write('" + place.address + "');",1);
</script>
As is clear from the code, it writes something to the page after a very short delay of 1 ms. I have created another page that requests this page with a query string and reads its output. The problem is:
I cannot avoid the delay: simply writing document.write(place.address); prints nothing, because it takes a little time to get the values, whereas with setTimeout and a delay of 1 ms it always returns a value.
If I request the output from another page using
System.Net.WebClient wc = new System.Net.WebClient();
System.IO.StreamReader sr = new System.IO.StreamReader(wc.OpenRead("http://localhost:4859/Default.aspx?lat=" + lat + "&lng=" + lng));
string strData = sr.ReadToEnd();
I get the source code of the document instead of the desired output.
I would like either to avoid that delay or to delay the output of the client request, so that I get the desired value and not the source code.
The JS on default.aspx is
<script type="text/javascript">
var geocoder;
var address;
function initialize() {
geocoder = new GClientGeocoder();
var qs=new Querystring();
if(qs.get("lat") && qs.get("lng"))
{
geocoder.getLocations(new GLatLng(qs.get("lat"),qs.get("lng")),showAddress);
}
else
{
document.write("Invalid Access Or Not valid lat long is provided.");
}
}
function getAddress(overlay, latlng) {
if (latlng != null) {
address = latlng;
geocoder.getLocations(latlng, showAddress);
}
}
function showAddress(r) {
place = r.Placemark[0];
setTimeout("document.write('" + place.address + "');",1);
//document.write(place.address);
}
</script>
and the code on requestClient.aspx is:
System.Net.WebClient wc = new System.Net.WebClient();
System.IO.StreamReader sr = new System.IO.StreamReader(wc.OpenRead("http://localhost:4859/Default.aspx?lat=" + lat + "&lng=" + lng));
string strData = sr.ReadToEnd();
I'm not a JavaScript expert, but I believe using document.write after the page has finished loading is a bad thing. You should be creating an html element that your JavaScript can manipulate, once the calculation is complete.
Elaboration
In your page markup, create a placeholder for where you want the address to appear:
<p id="address">Placeholder For Address</p>
In your JavaScript function, update that placeholder:
function showAddress(r) {
place = r.Placemark[0];
setTimeout("document.getElementById('address').innerHTML = '" + place.address + "';",1);
}
string strData = sr.ReadToEnd();
I get the source code of the document instead of the desired output
(Could you give a sample of the output? I don't think I've seen a web scraper work that way, so that would help me to be sure. If not, this is a good example of a web scraper.)
Exactly what are you doing with the string "strData"? If you are just writing it out, I recommend putting it in a server-side control (like a Literal). If at all possible, I'd recommend doing this server side using .NET rather than waiting 1 ms in JavaScript, which isn't ideal because 1 ms may or may not be long enough on a particular user's machine (hence "client side"). If I had to do it client side in a case like this, I would use the element.onload event to determine whether the page has finished loading.
