Website parsing: WebBrowser or HttpWebResponse (C#)

I ran into difficulties trying to parse data out of my banking website. I would like to export my transaction history automatically on a daily basis, but the internet banking site offers no such automated functionality.
I am currently experimenting with simulating form filling and clicks to reach the download page and fetch the CSV file, which I can then parse.
I have tried different methods without success; please point me in the right direction.
public static void getNABLogin()
{
try
{
Console.WriteLine("ENTER to begin");
//Console.ReadLine();
System.Net.HttpWebRequest wr = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("https://ib.nab.com.au/nabib/index.jsp");
wr.Timeout = 1000;
wr.Method = "GET";
wr.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36";
wr.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
wr.Headers.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6");
wr.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
//wr.Connection = "Keep-Alive";
wr.Host = "ib.nab.com.au";
wr.KeepAlive = true;
wr.CookieContainer = new CookieContainer();
//////////This part will get me to the correct login page at least////////////////////
// System.IO.Stream objStreamReceive ;
// System.Text.Encoding objEncoding;
// System.IO.StreamReader objStreamRead;
// WebResponse objResponse;
//string strOutput = string.Empty;
//objResponse = wr.GetResponse();
//objStreamReceive = objResponse.GetResponseStream();
//objEncoding = System.Text.Encoding.GetEncoding("utf-8");
//objStreamRead = new StreamReader(objStreamReceive, objEncoding); // Set function return value
//strOutput = objStreamRead.ReadToEnd();
///////////////////////////////
System.Net.HttpWebResponse wresp = (System.Net.HttpWebResponse)wr.GetResponse();
System.Windows.Forms.WebBrowser wb = new System.Windows.Forms.WebBrowser();
wb.DocumentStream = wresp.GetResponseStream();
wb.ScriptErrorsSuppressed = true;
wb.DocumentCompleted += (sndr, e) =>
{
/////////////After dumping the document text into a text file, I get a different page/////////////////
//////////////I get the normal website instead of login page////////////////////////
System.IO.StreamWriter file = new System.IO.StreamWriter("C:\\temp\\test.txt");
Console.WriteLine(wb.DocumentText);
file.WriteLine(wb.DocumentText);
System.Windows.Forms.HtmlDocument d = wb.Document;
System.Windows.Forms.HtmlElementCollection ctrlCol = d.GetElementsByTagName("script");
foreach (System.Windows.Forms.HtmlElement tag in ctrlCol)
{
tag.SetAttribute("src", string.Format("https://ib.nab.com.au{0}", tag.GetAttribute("src")));
}
ctrlCol = d.GetElementsByTagName("input");
foreach (System.Windows.Forms.HtmlElement tag in ctrlCol)
{
if (tag.GetAttribute("name") == "userid")
{
tag.SetAttribute("value", "123456");
}
else if (tag.GetAttribute("name") == "password")
{
tag.SetAttribute("value", "nabPassword");
}
file.WriteLine(tag.GetAttribute("name"));
}
file.Close();
// object y = wb.Document.InvokeScript("validateLogin");
};
while (wb.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
{
System.Windows.Forms.Application.DoEvents();
}
}
catch(Exception e)
{
System.IO.StreamWriter file = new System.IO.StreamWriter("C:\\temp\\error.txt");
file.WriteLine(e.Message);
Console.WriteLine(string.Format("error: {0}", e.Message));
Console.ReadLine();
}
}
I call this method from a separate thread (as you probably know, WebBrowser needs an STA thread to work).
As explained in the code comments, I get the login page correctly using the HttpWebResponse approach, but when I load it into the WebBrowser using DocumentStream, I end up with a different page.
My next question is what to do once I reach the login page: how can I simulate filling in data and clicking? (My current theory is to post the data using HttpWebRequest.)
Please shed some light on this. Any comments or information are very much appreciated.
Thank you very much in advance.

You can drive a real browser with Selenium to navigate to the page you want, then parse the page with HtmlAgilityPack. Both have C# support, and a very simple console application can do the rest.
Selenium
http://www.seleniumhq.org/docs/02_selenium_ide.jsp#chapter02-reference
HtmlAgilityPack
https://htmlagilitypack.codeplex.com/wikipage?title=Examples
You can fill in and submit a form like this with Selenium and C#:
//Navigate to the site
driver.Navigate().GoToUrl("http://www.google.com.au");
// Find the text input element by its name
IWebElement query = driver.FindElement(By.Name("q"));
// Enter something to search for
query.SendKeys("Selenium");
// Now submit the form
query.Submit();
// Google's search is rendered dynamically with JavaScript.
// Wait for the page to load, timeout after 5 seconds
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(5));
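// Compare case-insensitively: the page title begins with the capitalised query
wait.Until((d) => { return d.Title.StartsWith("selenium", StringComparison.OrdinalIgnoreCase); });
And you can parse data (a table in this example) like this with HtmlAgilityPack:
var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");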
for (int ii = 0; ii < cols.Count; ii=ii+2)
{
string name = cols[ii].InnerText.Trim();
int age = int.Parse(cols[ii+1].InnerText.Split(' ')[1]);
}

Related

Getting latest app version from play store xamarin

How can I get the latest Android app version from the Google Play Store? Previously I did so using the code below:
using (var webClient = new System.Net.WebClient())
{
var searchString = "itemprop=\"softwareVersion\">";
var endString = "</";
//possible network error if phone gets disconnected
string jsonString = webClient.DownloadString(PlayStoreUrl);
var pos = jsonString.IndexOf(searchString, StringComparison.InvariantCultureIgnoreCase) + searchString.Length;
var endPos = jsonString.IndexOf(endString, pos, StringComparison.Ordinal);
appStoreversion = Convert.ToDouble(jsonString.Substring(pos, endPos - pos).Trim());
System.Diagnostics.Debug.WriteLine($"{currentVersion} :: {appStoreversion}");
System.Diagnostics.Debug.WriteLine($"{appStoreversion > currentVersion}");
if ((appStoreversion.ToString() != currentVersion.ToString() && (appStoreversion > currentVersion)))
{
IsUpdateRequired = true;
}
}
The code below even throws an exception:
var document =
Jsoup.Connect("https://play.google.com/store/apps/details?id=" + "com.spp.in.spp" + "&hl=en")
.Timeout(30000)
.UserAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
.Referrer("http://www.google.com")
.Get();
Exception:
Android.OS.NetworkOnMainThreadException: Exception of type
'Android.OS.NetworkOnMainThreadException' was thrown. at
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw ()
But the Play Store now seems to have changed its page, so the existing functionality is broken. A few similar threads are already available here, but they seem to be outdated.
This will return a string-based version, at least until Google changes the html page contents again.
var version = await Task.Run(async () =>
{
var uri = new Uri($"https://play.google.com/store/apps/details?id={PackageName}&hl=en");
using (var client = new HttpClient())
using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
{
request.Headers.TryAddWithoutValidation("Accept", "text/html");
request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
using (var response = await client.SendAsync(request).ConfigureAwait(false))
{
try
{
response.EnsureSuccessStatusCode();
var responseHTML = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
var rx = new Regex(@"(?<=""htlgb"">)(\d{1,3}\.\d{1,3}\.{0,1}\d{0,3})(?=<\/span>)", RegexOptions.Compiled);
MatchCollection matches = rx.Matches(responseHTML);
return matches.Count > 0 ? matches[0].Value : "Unknown";
}
catch
{
return "Error";
}
}
}
}
);
Console.WriteLine(version);
Android throws NetworkOnMainThreadException when an application attempts a networking operation on its main thread; network operations on Android need to be performed off the main UI thread. The easiest way is to use a Task to push the work onto a thread in the default thread pool.
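Since the snippet above returns the version as a string, the Convert.ToDouble comparison from the question would not cope with values such as "1.2.3". A minimal sketch of a string-based comparison using System.Version follows. The names VersionCheck and IsUpdateRequired are hypothetical, and it assumes both versions are plain dotted numbers with at least a major and a minor part.

```csharp
using System;

public static class VersionCheck
{
    // Returns true only when both strings parse as dotted versions
    // and the store version is strictly newer than the installed one.
    public static bool IsUpdateRequired(string storeVersion, string installedVersion)
    {
        Version store;
        Version installed;
        if (!Version.TryParse(storeVersion, out store) ||
            !Version.TryParse(installedVersion, out installed))
        {
            // Non-version results such as "Unknown" or "Error" end up here.
            return false;
        }
        return store > installed;
    }
}
```

It could be called as VersionCheck.IsUpdateRequired(version, installedVersionString), where installedVersionString stands for whatever string the app reports as its own version.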

Download File to Byte[] with Selenium C#

I need to download a file to byte[] using Selenium in C#.
The problem is the file is downloaded via a button which does a javascript call:
javascript:__doPostBack('ctl00$MainContent$gvOutputs','Select$0')
If I could get a URL, I could just use this C# code:
using (WebClient wc = new WebClient())
{
wc.Headers[HttpRequestHeader.Cookie] = this.GetCookieHeaderString(); //Get Cookie from Selenium window
return wc.DownloadData(sourceURL);
}
Unfortunately this won't work, as I don't have the URL.
The __doPostBack call makes a POST request which looks like this:
__EVENTTARGET=ctl00%24MainContent%24gvOutputs
__EVENTARGUMENT=Select%240
__VIEWSTATE=sEM2tcQczKVsK5kzEN2x19Gxco%....
__VIEWSTATEGENERATOR=B935C9B7
__VIEWSTATEENCRYPTED=
__EVENTVALIDATION=kOyxw5ZKBd1yygTXmUR%....
I suppose that if there were a way to get those variables, I could create the POST in C#. However, I'm not sure how to get them.
I can click the link in Selenium, but that forces a download to the client's computer.
One option would be to monitor the download directory and read the file from there, but I'm trying to avoid this brute-force method.
I was hoping for a better answer. However, I ended up setting a specific download path for every Selenium instance, then monitoring that directory and associating the files with the Selenium instance. A little bit hacky, but it works.
I am a bit late, but I just found a solution for my case.
You can use this chunk of JS code to make the POST request:
var xhr = new XMLHttpRequest();
xhr.open('POST', '<THE-POST-REQUEST-URL>', true);
xhr.responseType = 'arraybuffer';
xhr.onload = function () {
if (this.status === 200) {
var fileArray = this.response;
var filename = '';
var disposition = xhr.getResponseHeader('Content-Disposition');
if (disposition && disposition.indexOf('attachment') !== -1) {
var filenameRegex = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/;
var matches = filenameRegex.exec(disposition);
if (matches != null && matches[1]) filename = matches[1].replace(/['"]/g, '');
}
var byteArray = new Uint8Array(fileArray);
var obj = {
'fileName': filename,
'fileData': byteArray
}
callback(obj);
}
else {
callback('server error - file might not be found');
}
};
xhr.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
xhr.send('<POST-REQUEST-ARGUMENTS>');
Then call it from Selenium:
// A plain verbatim string is used (not an interpolated one), so the JavaScript braces need no escaping.
var response = driver.ExecuteAsyncScript(
@"var callback = arguments[arguments.length - 1];
// The rest of the code from the last example
"
);
if (response is Dictionary<string, object> fileDict)
{
var fileName = fileDict["fileName"]?.ToString();
var fileData = fileDict["fileData"];
if (fileData is ReadOnlyCollection<object> readonlyCollection)
{
var byteArray = readonlyCollection.Select(i => (byte)(int)(long) i).ToArray();
var tempFile = "PATH/TO/FILE";
File.WriteAllBytes(tempFile, byteArray);
}
else
{
throw new Exception("fileData is not a byte array");
}
}
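The question's other idea, replaying the postback by hand, needs the hidden fields encoded as a form body. The sketch below shows only that encoding step, under stated assumptions: FormBody.Encode is a hypothetical helper, and the field values are assumed to have been read from the page already (for example from the hidden inputs named __VIEWSTATE and __EVENTVALIDATION).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs,
    // percent-encoding both names and values.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" +
            Uri.EscapeDataString(f.Value ?? string.Empty)));
    }
}
```

The resulting string could then be sent either as the xhr.send(...) argument above or as the body of a C# POST request carrying the Selenium session cookies.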

How to get the response body in a coded web test?

I wrote a webtest that calls a web-service.
I want to get the response body and do some validation on it.
public override IEnumerator<WebTestRequest> GetRequestEnumerator()
{
WebTestRequest request2 = new WebTestRequest("webservice");
request2.Headers.Add("Content-Type", "application/json");
request2.Method = "POST";
request2.Encoding = System.Text.Encoding.GetEncoding("utf-8");
StringHttpBody request2Body = new StringHttpBody();
request2Body.ContentType = "application/json";
request2Body.InsertByteOrderMark = false;
request2Body.BodyString = @"{ <body>}";
request2.Body = request2Body;
WebTestResponse res = new WebTestResponse();
console.WriteLine(res.BodyBytes);
yield return request2;
request2 = null;
}
When I ran the above code, I didn't get any response on my console.
How can I get the response body using a coded web test?
There are at least three problems with the code in the question:
The code in the question does not perform the request before doing the WriteLine. The two statements WebTestResponse res = new WebTestResponse(); and console.WriteLine(res.BodyBytes); just create a new WebTestResponse object (with all default values) and then try to print part of its contents. The request is issued by the code that calls your GetRequestEnumerator method.
The console object is not defined. The normal console has an uppercase first letter, i.e. Console.
When a web test executes, I am not sure where its "console" output will go. The standard output of a web test is not, as far as I know, a well-defined thing.
An easy way to get at the response body is to use the PostRequest method of a WebTestRequestPlugin. As a starting point:
public class BodyContentsDemo : WebTestRequestPlugin
{
public override void PostRequest(object sender, PostRequestEventArgs e)
{
byte[] bb = e.Response.BodyBytes;
string ss = e.Response.BodyString;
// The conditional is parenthesised because + binds tighter than == and ?:
e.WebTest.AddCommentToResult(
"BodyBytes is " +
(bb == null ? "null"
: bb.Length.ToString() + " bytes"));
e.WebTest.AddCommentToResult(
"BodyString is " +
(ss == null ? "null"
: ss.Length.ToString() + " chars"));
// Use bb or ss.
}
}
Note the use of AddCommentToResult to provide logging information to the web test results log.
I finally found a solution after struggling for a couple of days to capture the response text from a Web Performance test. Hope this helps:
public override IEnumerator<WebTestRequest> GetRequestEnumerator()
{
WebTestRequest request2 = new WebTestRequest("webservice");
request2.Headers.Add("Content-Type", "application/json");
request2.Method = "POST";
request2.Encoding = System.Text.Encoding.GetEncoding("utf-8");
StringHttpBody request2Body = new StringHttpBody();
request2Body.ContentType = "application/json";
request2Body.InsertByteOrderMark = false;
request2Body.BodyString = @"{<body>}";
request2.Body = request2Body;
yield return request2;
/*This will generate a new string which can be part of your filename when you run performance tests*/
String randomNo = DateTime.Now.ToString("MMddyyyyHHmmss");
/*This will generate a new file each time your WebRequest runs so you know what the server is returning when you perform webtests*/
/*You can use some Json parser if your response is Json and capture and validate the response*/
System.IO.File.WriteAllText(@"C:\Users\XXXX\PerformanceTestRequests\LastResponse" + randomNo + ".txt", this.LastResponse.BodyString);
request2 = null;
}

How to display XML values in a list box or on a label?

Given below is my code to get weather details from World Weather Online. The code works fine and I get the weather details in the variable "WP_XMLdoc", but the values are in XML format. How can I get each value separately and display it on a label or text box?
public static XmlDocument WeatherAPI(string sLocation)
{
HttpWebRequest WP_Request;
HttpWebResponse WP_Response = null;
XmlDocument WP_XMLdoc = null;
string sKey = "********************"; //The API key generated by World Weather Online
string sRequestUrl = "http://api.worldweatheronline.com/free/v1/weather.ashx?format=xml&"; //The request URL for XML format
try
{
//Here we are concatenating the parameters
WP_Request = (HttpWebRequest)WebRequest.Create(string.Format(sRequestUrl + "q=" + sLocation + "&key=" + sKey));
WP_Request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4";
//Making the request
WP_Response = (HttpWebResponse)WP_Request.GetResponse();
WP_XMLdoc = new XmlDocument();
//Assigning the response to our XML object
WP_XMLdoc.Load(WP_Response.GetResponseStream());
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
WP_Response.Close();
return WP_XMLdoc; // Here we get the five values from the website in XML format. Now I want to display those values from "WP_XMLdoc" in text boxes or labels.
}
A better option is the XDocument object, which gives you better control than XmlDocument.
Here is a console application that I wrote to demonstrate.
The main methods you will be using are Elements(...) and Descendants(...):
using System;
using System.Linq;
using System.Net;
using System.Xml;
using System.Xml.Linq;
public class Program
{
public static void Main()
{
var result = WeatherAPI("London");
// loop through all weather instances...
foreach (var w in result.Descendants("weather"))
{
Console.WriteLine("Weather");
Console.WriteLine("=================");
foreach (var e in w.Elements())
{
Console.WriteLine(string.Format("Key {0} - Value {1}", e.Name, e.Value));
}
}
// if you want to select specific element then use this.
var currentCondition = result.Descendants("current_condition").FirstOrDefault();
if (currentCondition != null)
{
Console.WriteLine("Current Condition");
Console.WriteLine("=================");
foreach (var e in currentCondition.Elements())
{
Console.WriteLine(string.Format("Key {0} - Value {1}", e.Name, e.Value));
}
}
Console.ReadLine();
}
public static XDocument WeatherAPI(string sLocation)
{
HttpWebRequest webRequest;
HttpWebResponse webResponse = null;
XDocument xmlResult = null;
var apiKey = "Your key"; //The API key generated by World Weather Online
var apiEndpoint = "http://api.worldweatheronline.com/free/v1/weather.ashx?format=xml&";
try
{
//Here we are concatenating the parameters
webRequest =
(HttpWebRequest) WebRequest.Create(string.Format(apiEndpoint + "q=" + sLocation + "&key=" + apiKey));
webRequest.UserAgent =
@"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4";
//Making the request
webResponse = (HttpWebResponse) webRequest.GetResponse();
xmlResult = XDocument.Load(webResponse.GetResponseStream());
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
finally
{
if (webResponse != null)
{
webResponse.Close();
}
}
return xmlResult;
// xmlResult holds the parsed XML; individual values can be read from it and shown in text boxes or labels.
}
}
XML output
<data>
<request>
<type>City</type>
<query>London, United Kingdom</query>
</request>
<current_condition>
<observation_time>04:53 AM</observation_time>
<temp_C>17</temp_C>
<temp_F>63</temp_F>
<weatherCode>143</weatherCode>
<weatherIconUrl><![CDATA[http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0006_mist.png]]></weatherIconUrl>
<weatherDesc><![CDATA[Mist]]></weatherDesc>
<windspeedMiles>6</windspeedMiles>
<windspeedKmph>9</windspeedKmph>
<winddirDegree>20</winddirDegree>
<winddir16Point>NNE</winddir16Point>
<precipMM>0.0</precipMM>
<humidity>94</humidity>
<visibility>1</visibility>
<pressure>1014</pressure>
<cloudcover>75</cloudcover>
</current_condition>
<weather>
<date>2014-09-20</date>
<tempMaxC>22</tempMaxC>
<tempMaxF>71</tempMaxF>
<tempMinC>10</tempMinC>
<tempMinF>50</tempMinF>
<windspeedMiles>8</windspeedMiles>
<windspeedKmph>13</windspeedKmph>
<winddirection>N</winddirection>
<winddir16Point>N</winddir16Point>
<winddirDegree>350</winddirDegree>
<weatherCode>119</weatherCode>
<weatherIconUrl><![CDATA[http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0003_white_cloud.png]]></weatherIconUrl>
<weatherDesc><![CDATA[Cloudy ]]></weatherDesc>
<precipMM>0.6</precipMM>
</weather>
</data>
What is the difference between Elements and Descendants?
Elements finds only those elements that are direct descendants, i.e. immediate children.
Descendants finds children at any level, i.e. children, grandchildren, etc.
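The distinction can be checked on a trimmed-down document shaped like the response above. This is only an illustrative sketch: the class ElementsVsDescendants, its method names, and the shortened sample XML are made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ElementsVsDescendants
{
    // A shortened document with the same shape as the weather response.
    public const string SampleXml =
        "<data><current_condition><temp_C>17</temp_C></current_condition>" +
        "<weather><date>2014-09-20</date><tempMaxC>22</tempMaxC></weather></data>";

    public static int CountChildren(XDocument doc)
    {
        // Elements(): only the immediate children of <data>.
        return doc.Root.Elements().Count();
    }

    public static int CountAll(XDocument doc)
    {
        // Descendants(): every element nested anywhere under <data>.
        return doc.Root.Descendants().Count();
    }

    public static string ReadValue(XDocument doc, string name)
    {
        // Text of the first element with that name, or null if there is none.
        return (string)doc.Descendants(name).FirstOrDefault();
    }
}
```

With XDocument.Parse(ElementsVsDescendants.SampleXml), a single value such as ReadValue(doc, "temp_C") could be assigned straight to the Text property of a label or text box.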

How to get all cookies from WatiN

I am using WatiN to log in to a website. I need to get all the cookies and set them on an HttpWebRequest so that I can download a file from this website (a bank). I am using Fiddler to see all the cookies, and I can see that I am missing some. If I pause my program in the debugger and hard-code all the cookies from Fiddler into my cookie container, the file downloads, so I am certain that I just need to get the cookies from WatiN and my mission is complete.
So how can I get all the cookies?
My WatiN code:
using (var browser = new IE("https://bankxxx.com"))
{
try
{
browser.WaitForComplete();
try
{
// browser.Visible = false;
browser.TextField(Find.ById("userID")).TypeText(strUser);
Thread.Sleep(1000);
browser.TextField(Find.ById("numID")).Value = strUserId;
browser.TextField(Find.ById("userPassword")).TypeText(strPass);
linkExist = browser.Image(Find.ById("inputSend")).Exists;
if (linkExist) browser.Image(Find.ById("inputSend")).Click();
browser.WaitForComplete();
linkExist = false;
}
catch (Exception ex)
{
successful = false;
clsUtils.WriteToLog("Fail to connect -" + ex.Message, true);
ErrorLog += "Fail to connect -" + ex.Message + Environment.NewLine;
}
//At this point I am logged in to the website
//I tried these too, but I get the same cookies
//CookieContainer cookies23 = browser.GetCookieContainerForUrl(new Uri("bank.com"));
//CookieCollection cookies34 = browser.GetCookiesForUrl(new Uri("bank"));
string cookies = browser.Eval("document.cookie");
CookieContainer _cookies = GeneralUtils.GetCc(cookies, "bank.com");
//Then my HttpWebRequest (not shown in full); it works 100% when the cookies are correct
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create("bank.com");
postRequest.CookieContainer = new CookieContainer();
postRequest.CookieContainer = _cookies;.......
}
My GetCc function, which builds a CookieContainer from the string and adds the domain:
public static CookieContainer GetCc(string cookie, string Domain)
{
CookieContainer Cc = new CookieContainer();
string[] arrCookie;
string[] allcookies = cookie.Split(';');
for (int i = 0; i < allcookies.Length; i++)
{
arrCookie = allcookies[i].Split('=');
Cookie TCookie = new Cookie();
TCookie.Name = arrCookie[0].Trim().ToString();
TCookie.Value = arrCookie[1].Trim().ToString();
TCookie.Domain = Domain;
Cc.Add(TCookie);
}
return Cc;
}
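One thing worth noting about GetCc is that Split('=') cuts a cookie value apart whenever the value itself contains '=', and arrCookie[1] fails on an empty fragment. A more defensive variant might look like the sketch below; CookieStringParser and its methods are hypothetical names. Separately, document.cookie never exposes cookies flagged HttpOnly, which may be why some of the cookies visible in Fiddler are missing.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieStringParser
{
    // Splits "a=1; b=2" into pairs, cutting each item at the first '=' only,
    // so values that themselves contain '=' survive intact.
    public static List<KeyValuePair<string, string>> Parse(string cookieString)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        if (string.IsNullOrWhiteSpace(cookieString)) return pairs;

        foreach (var item in cookieString.Split(';'))
        {
            int eq = item.IndexOf('=');
            if (eq <= 0) continue; // no '=' at all, or nothing before it
            string name = item.Substring(0, eq).Trim();
            string value = item.Substring(eq + 1).Trim();
            if (name.Length == 0) continue; // name was only whitespace
            pairs.Add(new KeyValuePair<string, string>(name, value));
        }
        return pairs;
    }

    // Wraps the parsed pairs in a CookieContainer for the given domain.
    public static CookieContainer ToContainer(string cookieString, string domain)
    {
        var container = new CookieContainer();
        foreach (var pair in Parse(cookieString))
        {
            container.Add(new Cookie(pair.Key, pair.Value, "/", domain));
        }
        return container;
    }
}
```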
